KR20240021677A

KR20240021677A - Apparatus for processing cyber threat information, method for processing cyber threat information, and computationally-readable storage medium for storing a program processing cyber threat information

Info

Publication number: KR20240021677A
Application number: KR1020220185453A
Authority: KR
Inventors: 김기홍; 박성은; 최민준; 이현종
Original assignee: 주식회사 샌즈랩
Priority date: 2022-08-10
Filing date: 2022-12-27
Publication date: 2024-02-19

Abstract

The disclosed embodiment provides a cyber threat information processing method comprising: receiving a non-executable file, performing at least one feature analysis related to a cyber threat of the input non-executable file, and generating analysis information; detecting whether the non-executable file contains malicious behavior based on feature information obtained by selectively combining the generated pieces of analysis information; when malicious behavior is detected in the non-executable file, generating classification information on the attack technique and attack group classification information according to the malicious behavior; and providing cyber threat information to a user based on the information generated for the non-executable file.

Description

Cyber threat information processing device, cyber threat information processing method, and computer-readable storage medium storing a program for processing cyber threat information {APPARATUS FOR PROCESSING CYBER THREAT INFORMATION, METHOD FOR PROCESSING CYBER THREAT INFORMATION, AND COMPUTATIONALLY-READABLE STORAGE MEDIUM FOR STORING A PROGRAM PROCESSING CYBER THREAT INFORMATION}

The disclosed embodiments relate to a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information.

Damage from cyber security threats, which are becoming increasingly sophisticated and center on new or variant malware, is growing. To reduce this damage even slightly and respond early, response technologies are being advanced in parallel through multidimensional pattern composition and various kinds of complex analysis. However, recent cyber attacks are not being adequately handled within a controllable range; rather, the threat is increasing day by day. These cyber attacks go beyond existing ICT (Information and Communication Technology) infrastructure and threaten finance, transportation, the environment, health, and other areas that directly affect our lives.

One of the base technologies underlying most existing detection of and response to cyber security threats builds a database of patterns for cyber attacks or malicious code in advance and applies suitable monitoring technology wherever data flows need to be watched. Existing technologies have developed on the basis of identifying and responding to a threat when a data flow or code matching a monitored pattern is detected. Such conventional technology has the advantage of detecting threats quickly and accurately when they match a previously obtained pattern, but for new or variant threats whose pattern has not been obtained or that evade the pattern, detection is impossible or analysis takes a very long time.

Even when artificial intelligence analysis is used, conventional technology focuses on advancing the techniques that detect and analyze the malicious code itself. However, no foundational technology exists for fundamentally responding to cyber security threats, so such methods alone are limited and make it difficult to respond to new malicious code or its variants.

For example, technology that only detects and analyzes malicious code that has already been discovered cannot cope with decoy information or fake information intended to deceive the detection or analysis system, which causes confusion.

For mass-produced malicious code, for which ample training data exists, sufficient feature information can be obtained, so it is possible to determine whether the code is malicious and to distinguish its type. However, APT (Advanced Persistent Threat) attacks, which are produced in relatively small numbers and attack with precision, often do not match the training data, and most of them are targeted attacks, so existing technologies have limitations even when they are advanced.

In addition, the methods and expressions used to describe malicious code, attack code, or cyber threats have conventionally differed depending on the analyst's position or analytical perspective. For example, the way malicious code and attack behavior are described has not been standardized worldwide, so even when the same incident or the same malicious code is detected, experts in the field describe it differently, which causes confusion. Even malware detection names are not unified, so that for the same malicious file, exactly which attack was carried out may not be identified or may be recorded differently. As a result, identified attack techniques could not be described in a normalized and standardized manner.

Conventional malicious code detection and analysis methods emphasize detection of the malicious code itself, so when malicious code that performs very similar malicious behavior is created by different attackers, those attackers cannot be told apart.

In connection with the above problems, because the conventional approach relies on detection methods focused on individual cases, it was difficult to predict what cyber threat attacks would occur in the near future.

An object of the embodiments disclosed below is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information that can detect and respond to malicious code even when it does not exactly match data learned by artificial intelligence, and that can respond to variants of malicious code.

Another object of the embodiments is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information that can identify the malicious code, the attack technique, the attacker, and an attack prediction method within a very short time, even for a variant of malicious code.

Another object of the embodiments is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information that can provide, in a normalized and standardized manner, information on malicious code for which detection names are not unified or whose cyber attack technique cannot be accurately described.

Another object of the embodiments is to provide a cyber threat information processing device, a cyber threat information processing method, and a storage medium storing a program for processing cyber threat information that can distinguish the different attackers who create malicious code performing very similar malicious behavior, and that can predict what cyber threat attacks will occur in the future.

Another object of the embodiments is to provide specific examples that can more clearly detect and recognize, when executed files produce the same result but differ in how they execute, whether the resulting difference reflects a substantially different attack technique or an attack carried out by a different attack group.

Another object of the embodiments is to provide specific examples that can identify cyber threat information, attack techniques, and attack groups for the various file types contained in a file, even when the file is a non-executable file rather than an executable file.

The generated analysis information may include static feature information related to the cyber threat of the non-executable file.

The generated analysis information includes dynamic feature information related to the cyber threat of the non-executable file, and the dynamic feature information may be generated by hooking the system calls that a reader program associated with the non-executable file requests from the operating system, based on the data in memory at the hooking point and on information obtained from the functions executed before the hooking point and their parameters.

The generated analysis information may include feature information obtained by performing API hooking when an application associated with the non-executable file is executed and reading the data in memory at the hooking point.

In another aspect, the disclosed embodiment provides a cyber threat information processing device comprising: a storage device that stores data; and a processor that executes a program on an input file, wherein the processor, through an application programming interface (API), performs at least one feature analysis related to a cyber threat of the input non-executable file and generates analysis information; detects whether the non-executable file contains malicious behavior based on feature information obtained by selectively combining the generated pieces of analysis information; when malicious behavior is detected in the non-executable file, generates classification information on the attack technique and attack group classification information according to the malicious behavior; and provides cyber threat information to a user based on the information generated for the non-executable file.

In another aspect, the disclosed embodiment provides a computer-readable storage medium storing a program for processing cyber security threat information, the program: performing at least one feature analysis related to a cyber threat of an input non-executable file and generating analysis information; detecting whether the non-executable file contains malicious behavior based on feature information obtained by selectively combining the generated pieces of analysis information; when malicious behavior is detected in the non-executable file, generating classification information on the attack technique and attack group classification information according to the malicious behavior; and providing cyber threat information to a user based on the information generated for the non-executable file.

According to the embodiments disclosed below, malicious code can be detected and responded to even when it does not exactly match data learned through machine learning, and variants of malicious code can be handled.

According to the embodiments, even for a variant of malicious code, the malicious code, the attack technique, and the attacker can be identified within a very short time, and the attack techniques that a specific attacker will use in the future can be predicted.

According to the embodiments, the way a cyber attack is implemented can be accurately identified based on whether the code is malicious, the attack technique, the attack identifier, and the attacker, and this can be provided as a standardized model. According to the embodiments, information on malicious code for which detection names are not unified or whose cyber attack technique cannot be accurately described can be provided in a normalized and standardized manner.

In addition, the embodiments can predict the likelihood that previously unknown malicious code will be created and the attackers capable of developing it, and can provide a means of predicting what cyber threat attacks will occur in the future.

According to the embodiments, even when executed files produce the same result, different attack techniques or different attack groups that arise from differences in the execution process can be more clearly detected and recognized.

According to the embodiments, even for a non-executable file rather than an executable file, cyber threat information, attack techniques, and attack groups can be identified for the various file types it contains.

FIG. 1 is a diagram illustrating an embodiment of a cyber threat information processing method.
FIG. 2 is a diagram illustrating an example of obtaining static analysis information in the process of generating analysis information according to the disclosed embodiment.
FIG. 3 is a diagram illustrating an example of obtaining dynamic analysis information in the process of generating analysis information according to the disclosed embodiment.
FIG. 4 is a diagram illustrating an example of obtaining in-depth analysis information in the process of generating analysis information according to the disclosed embodiment.
FIG. 5 is a diagram illustrating, as an example of in-depth analysis, disassembling malicious code to determine that a file contains malicious behavior.
FIG. 6 is a diagram illustrating an example of calculating correlation analysis information in the process of generating analysis information according to the disclosed embodiment.
FIG. 7 is a diagram illustrating an example of a process of obtaining correlation analysis information according to the disclosed embodiment.
FIG. 8 is a diagram illustrating an example of generating prediction information for cyber threat information according to an embodiment.
FIG. 9 is a diagram illustrating examples of malicious code queries for providing cyber threat information according to an embodiment.
FIG. 10 is a diagram illustrating an embodiment of a cyber threat information processing device.
FIG. 11 is a diagram illustrating an example for explaining in detail the function of the static analysis module of the analysis framework according to the disclosed embodiment.
FIG. 12 is a diagram illustrating an example for explaining in detail the function of the dynamic analysis module of the analysis framework according to the disclosed embodiment.
FIG. 13 is a diagram illustrating an example for explaining in detail the function of the in-depth analysis module of the analysis framework according to the disclosed embodiment.
FIG. 14 is a diagram illustrating an example for explaining in detail the function of the correlation analysis module of the analysis framework according to the disclosed embodiment.
FIG. 15 is a diagram illustrating an example for explaining in detail the function of the prediction information generation module of the prediction framework according to the disclosed embodiment.
FIG. 16 is a diagram illustrating an example of performing static analysis according to the disclosed embodiment.
FIG. 17 is a diagram illustrating an example of performing dynamic analysis according to the disclosed embodiment.
FIG. 18 is a diagram illustrating an example of performing in-depth analysis according to the disclosed embodiment.
FIG. 19 is a diagram illustrating an example of matching attack techniques with code extracted from binary code according to the disclosed embodiment.
FIG. 20 is a diagram illustrating an example of matching a code set including OP-CODE with an attack technique according to the disclosed embodiment.
FIG. 21 is a diagram illustrating a flow of processing cyber threat information according to the disclosed embodiment.
FIG. 22 is a diagram illustrating values obtained by converting OP-CODE and ASM-CODE into normalized code according to the disclosed embodiment.
FIG. 23 is a diagram illustrating vectorized values of OP-CODE and ASM-CODE according to the disclosed embodiment.
FIG. 24 is a diagram illustrating an example of converting block units of code into hash values according to the disclosed embodiment.
FIG. 25 is a diagram illustrating an example of an ensemble machine learning model according to the disclosed embodiment.
FIG. 26 is a diagram illustrating a flow of learning and classifying data with machine learning according to the disclosed embodiment.
FIG. 27 is a diagram illustrating an example of labeling training data by identifying attack identifiers and attackers according to the disclosed embodiment.
FIG. 28 is a diagram illustrating a result of identifying attack identifiers according to an embodiment.
FIG. 29 is a diagram illustrating gram data patterns according to attack identifier according to an embodiment.
FIG. 30 is a diagram illustrating the performance of an embodiment that processes the disclosed cyber threat information.
FIG. 31 is a diagram illustrating an example of the detection names provided by engines that detect cyber threat information.
FIG. 32 is a diagram illustrating an example of new malicious code and a new attack method according to an embodiment.
FIG. 33 is a diagram for explaining an example of identifying attack techniques and attack groups at the function level.
FIG. 34 is a diagram for explaining an example of identifying attack techniques and attack groups when a function is split.
FIG. 35 is a diagram illustrating an example of obtaining feature information related to a cyber threat according to an embodiment.
FIG. 36 is a diagram illustrating a process of obtaining control flow using branch instructions according to an embodiment.
FIG. 37 is a diagram illustrating a case in which an instruction sequence is generated by combining the instructions of a control block according to the instruction combination principle illustrated in the second example.
FIG. 38 is a diagram for explaining another example of generating instruction sequences that include feature information using the instructions in a control block.
FIG. 39 is a diagram for explaining yet another example of generating instruction sequences that include feature information using the instructions in a control block.
FIG. 40 is a diagram for explaining yet another example of generating instruction sequences that include feature information using the instructions in a control block.
FIG. 41 is a diagram illustrating an example of generating instruction sequences according to the examples described above.
FIG. 42 is a diagram illustrating another embodiment of the disclosed cyber threat information processing device.
FIG. 43 is a diagram illustrating another embodiment of the disclosed cyber threat information processing method.
FIG. 44 is a diagram conceptually showing the structure of a non-executable file and the reader program for that non-executable file.
FIG. 45 is a block diagram of an embodiment capable of obtaining cyber threat information for a non-executable file.
FIG. 46 is a diagram illustrating an example, among those for obtaining cyber threat information for a file, in which the file analysis unit performs a first type of analysis of the file.
FIG. 47 is a diagram illustrating an example, among those for obtaining cyber threat information for a file, in which the file analysis unit performs a second type of analysis of the file.
FIG. 48 is a diagram illustrating the targets extracted and the information extracted by dynamically executing a non-executable file in the second type of analysis according to an embodiment.
FIG. 49 is a diagram illustrating an example, among those for obtaining cyber threat information for a file, in which the file analysis unit performs a third type of analysis of the file.
FIG. 50 is a diagram illustrating API hooking list information used when the third analysis unit performs mild dynamic analysis according to an embodiment.
FIG. 51 is a diagram for explaining the feature processing unit in an embodiment capable of obtaining cyber threat information for a non-executable file.
FIG. 52 is an example diagram comparing the importance of feature information extracted from non-executable files according to the disclosed embodiment.
FIG. 53 is an example diagram for explaining the classification model of the attack technique classification unit according to the disclosed embodiment.
FIG. 54 is a diagram illustrating attack techniques identified by selectively combining several analysis techniques for a non-executable file according to the disclosed example.
FIG. 55 is an example diagram for explaining the classification model of the attack group classification unit according to the disclosed embodiment.
FIG. 56 is a diagram illustrating the execution of the reader program and the system calls for the non-executable file described above.
FIG. 57 is a diagram for explaining an example of hooking a system call in program code according to an embodiment.
FIG. 58 is a diagram illustrating an example in which cyber threat information can be tracked through dynamic analysis according to an embodiment.
FIG. 59 is a diagram illustrating another embodiment of the disclosed cyber threat information processing device.
FIG. 60 is a diagram illustrating another embodiment of the disclosed cyber threat information processing method.

Hereinafter, embodiments are described in detail by way of example with reference to the accompanying drawings. In the embodiments, frameworks, modules, application programming interfaces, and the like may be implemented as devices combined with physical hardware or as software.

When an embodiment is implemented as software, it may be stored in a storage medium, installed on a computer or the like, and executed by a processor.

Embodiments of the cyber threat information processing device and the cyber threat information processing method are described in detail below.

FIG. 1 is a diagram illustrating an embodiment of a cyber threat information processing method. The embodiment is described as follows.

A file input to the cyber threat information processing device is preprocessed (S1000).

Preprocessing yields identification information by which the file can be identified. An example of file preprocessing is as follows.

Various kinds of meta information can be obtained from the received file, including the file's source information, collection information describing how the file was obtained, and user information for the file. For example, when the file contains a URL (uniform resource locator) or was included in an e-mail, collection information for the file can be obtained. The user information may include information on the user who created, uploaded, or last saved the file. During preprocessing, the meta information obtained for the file may also include IP (Internet Protocol) information, country information derived from it, and API (Application Programming Interface) key information, for example the API information of the user who requested the analysis.
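The meta information listed above can be pictured as a single record assembled during preprocessing. The following is a minimal sketch only; the `FileMeta` class, the `build_meta` function, the submission keys, and the IP-to-country table are all hypothetical names introduced for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileMeta:
    """Hypothetical container for the meta information gathered in preprocessing."""
    source: Optional[str] = None         # where the file came from (e.g. a URL)
    collected_via: Optional[str] = None  # "url" or "email", if known
    user: Optional[str] = None           # user who created, uploaded, or last saved the file
    ip: Optional[str] = None             # IP information
    country: Optional[str] = None        # country derived from the IP information
    api_key: Optional[str] = None        # API key of the user who requested the analysis


def build_meta(submission: dict, ip_to_country: dict) -> FileMeta:
    """Build a FileMeta from a submission dict (keys are assumptions of this sketch)."""
    if "url" in submission:
        collected_via = "url"
    elif "email_id" in submission:
        collected_via = "email"
    else:
        collected_via = None
    ip = submission.get("ip")
    return FileMeta(
        source=submission.get("url") or submission.get("email_id"),
        collected_via=collected_via,
        user=submission.get("uploader"),
        ip=ip,
        country=ip_to_country.get(ip),  # None when the IP is missing or unknown
        api_key=submission.get("api_key"),
    )
```

In this sketch the country lookup is a plain dictionary; a real deployment would presumably use a geolocation database, which the disclosure does not specify.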

A hash value of the file may also be extracted during preprocessing. If the hash value is already known to the cyber threat information processing device, the type or risk level of the file can be identified from it.

If the file is not already known, analysis information for identifying the file type can be obtained by looking up the hash value and file information in previously stored information or, if necessary, on an external reference website. For example, information on the file type can be obtained from external reference websites such as C-TAS (Cyber Threats Analysis System) operated by the Korea Internet & Security Agency, the system operated by the CTA (Cyber Threat Alliance), or VirusTotal.

For example, the file can be searched for on such a site using its hash value under a hash function such as MD5 (Message-Digest algorithm 5), SHA-1 (Secure Hash Algorithm 1), or SHA-256. The file can then be identified using the search result.
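The hash values mentioned above can be computed in a single pass over the file. The following is a minimal sketch using Python's standard `hashlib`; the function name `file_digests` and the chunk size are illustrative assumptions and are not part of the disclosure.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return the MD5, SHA-1 and SHA-256 hex digests of the file at `path`.

    Hypothetical helper: the file is read in chunks so that large samples
    do not have to fit in memory.
    """
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Any of the three digests could then serve as the lookup key on a reference site; which site and which query interface are used is left open here.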

As an example of analyzing a file, when the input file is transmitted over a mobile network, the packets carried in the network traffic may be reassembled using a network packet reassembly technique, and the input file may be stored if it is suspected mobile malicious code. The packet reassembly technique reassembles the series of packets in the collected network traffic that correspond to one piece of executable code, and if the file carried by the reassembled packets is suspected mobile malicious code, the file is stored.

If the suspected mobile malicious code cannot be extracted from the transmitted file at this stage, it may instead be downloaded and stored by directly accessing the download URL contained in the file.

Malicious activity analysis information related to the input file is generated (S2000).

The analysis information on malicious activity related to the input file may include static analysis information, which analyzes information about the file itself, or dynamic analysis information, which determines whether malicious activity occurs by executing information obtained from the input file.

The analysis information in this step may include in-depth analysis information, which uses information processed from an executable file related to the input file or performs memory analysis related to the file.

In-depth analysis may include artificial intelligence analysis so that malicious activity can be identified accurately.

The analysis information in this step may also include correlation analysis information, which relates analysis information already stored for the file and newly generated analysis information to one another in order to estimate associations with an attack activity or an attacker.

In this step, multiple pieces of analysis information may be aggregated so that they can be provided as an overall analysis result.

For example, the static analysis information, dynamic analysis information, in-depth analysis information, and correlation analysis information for a single file may be analyzed together in order to identify the attack technique and the attacker accurately. The integrated analysis removes duplicated portions among the pieces of analysis information, and information common to several of them can be used to increase accuracy.

For example, cyber threat indicators of compromise (IoCs) collected through various analyses and channels may be standardized by normalizing them against one another or by performing enrichment.
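One possible reading of the normalization and deduplication described above is sketched below. The record layout (`kind`, `value`, `source`), the defanging rules, and the function names are assumptions made for illustration; the disclosure does not specify a schema. Recording which analyses reported each indicator is used here as a very simple stand-in for enrichment.

```python
def normalize_ioc(kind, value):
    """Bring one indicator into a canonical form (hypothetical rules)."""
    value = value.strip()
    if kind in ("domain", "url", "ip"):
        # Undo common "defanging" such as evil[.]example or hxxp://...
        value = value.replace("[.]", ".")
        if value.lower().startswith("hxxp"):
            value = "http" + value[4:]
    if kind in ("domain", "md5", "sha1", "sha256"):
        value = value.lower()
    if kind == "domain":
        value = value.rstrip(".")
    return kind, value


def merge_iocs(records):
    """Deduplicate indicators and remember every analysis that reported them."""
    merged = {}
    for rec in records:
        key = normalize_ioc(rec["kind"], rec["value"])
        entry = merged.setdefault(
            key, {"kind": key[0], "value": key[1], "sources": set()}
        )
        entry["sources"].add(rec["source"])
    return list(merged.values())
```

An indicator reported by more than one analysis ends up as a single entry with several sources, which is one way the common information could be used to raise confidence.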

In the embodiment of acquiring analysis information, it is not necessary to produce all of the analysis information described above, nor to produce it in the order given. For example, only one of static analysis and dynamic analysis may be performed, and dynamic analysis may be performed before static analysis.

In-depth analysis does not necessarily have to follow static or dynamic analysis, and correlation analysis may also be performed without in-depth analysis information.

Accordingly, the order of the processing that acquires the above analysis information may be changed, and the processing may be performed selectively. In addition, the acquisition of analysis information and the generation of prediction information described above may be performed in parallel on the basis of the information obtained from the file. For example, correlation analysis information may be generated even before dynamic analysis has completed. Likewise, dynamic analysis and in-depth analysis may proceed simultaneously.
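The selective and parallel execution described above could be arranged as in the following sketch, which uses a thread pool from Python's standard library. The function `run_selected_analyses` and the idea of passing analyzers as a name-to-callable mapping are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def run_selected_analyses(sample, analyzers):
    """Run whichever analyses were selected, concurrently.

    `analyzers` maps a name (e.g. "static", "dynamic") to a callable that
    takes the sample and returns that analysis result. Any subset may be
    given, and no ordering between the analyses is assumed.
    """
    with ThreadPoolExecutor(max_workers=len(analyzers) or 1) as pool:
        futures = {name: pool.submit(fn, sample) for name, fn in analyzers.items()}
        return {name: future.result() for name, future in futures.items()}
```

Because each analyzer receives the sample independently, an analyzer that needs file identification could perform its own preprocessing as part of its step.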

In this case, because the preprocessing process (S1000) illustrated above serves to obtain or identify information about the file, it may be performed as part of each analysis step when static analysis, dynamic analysis, in-depth analysis, or correlation analysis is performed individually or in parallel.

Detailed embodiments of this step are described below.

Prediction information on malicious activity related to the input file may be generated (S3000).

To increase the accuracy of the analysis, the data set of the various pieces of information analyzed above may be used to generate prediction information on whether malicious activity has occurred, the attack technique, the attacker group, and so on.

Prediction information may be generated through artificial intelligence analysis of a data set that has already been analyzed. Generating prediction information is not a mandatory step; when a suitably analyzed data set has been prepared for artificial intelligence analysis and the conditions are satisfied, prediction information on future malicious attack activity can be generated.

The embodiment performs artificial intelligence based machine learning on the various pieces of analysis information. The embodiment may generate prediction information based on the data set of the analyzed information. For example, additional analysis information may be generated from data learned by the artificial intelligence, and the newly generated analysis information may in turn be used as new training data input to the artificial intelligence.

Here, the prediction information may include malicious code author information, malicious code attack method information, malicious code attack group prediction information, malicious code similarity prediction information, and malicious code spread prediction information.

The generated prediction information may include first prediction information, which predicts the risk level of the malicious code itself, and second prediction information, which predicts the attacker, attack group, similarity, spread, and the like of the malicious code.

Predictive analysis information including the first prediction information and the second prediction information may be stored in a server or a database.

Detailed embodiments of this are described below.

After post-processing of the analysis information or prediction information, cyber threat information related to the input file is provided (S4000).

The embodiment determines the type of malicious code and its risk level based on the analysis information or prediction information. The embodiment also generates profiling information for the malicious code. Accordingly, the result of analyzing the file itself, or the result of additional and predictive analysis, can be stored. The generated profiling information includes labeling of the attack technique or the attacker for the malicious code.

The cyber threat information may include the preprocessed information described above, the generated or identified analysis information, the generated prediction information, an aggregation of these pieces of information, or information determined on the basis of them.

The cyber threat information that is provided may draw on analysis information stored in a database in relation to the input file, or may include the information analyzed or predicted above.

According to the embodiment, when a user queries cyber threat information not only about malicious activity of an input file but also about already stored files or malicious activity, the corresponding information can be provided.

Such integrated analysis information may be stored in a server or database in a standardized format in association with the corresponding file. Because it is stored in a standardized format, the integrated analysis information can be used to search or query cyber threat information.

Additional examples of a user's query of cyber threat information are described in detail below.

FIG. 2 illustrates an example of obtaining static analysis information in the process of generating analysis information according to the disclosed embodiment.

The step of acquiring static analysis information according to the disclosed embodiment may include obtaining and analyzing structure information of the input file (S2110).

The embodiment may first analyze the basic structure information of the identified file in an environment in which the file is not executed. In this step, even though file types differ, for example ELF (Executable and Linkable Format), PE (Portable Executable), or APK (Android Application Package), the file structure or the information that can be extracted from that structure is obtained or analyzed.
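A first step of structure analysis is often to recognize the container format from its leading bytes. The sketch below checks the well-known signatures of the three formats named above without executing anything; the function name `identify_format` and the returned labels are illustrative assumptions, and a real analyzer would go on to parse headers and sections.

```python
import io
import struct
import zipfile


def identify_format(data):
    """Guess the container format of `data` (bytes) from its signature."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header stores the offset of the PE signature at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    if data[:4] == b"PK\x03\x04":
        # An APK is a ZIP archive that carries an AndroidManifest.xml entry.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                if "AndroidManifest.xml" in archive.namelist():
                    return "APK"
        except zipfile.BadZipFile:
            pass
        return "ZIP"
    return "unknown"
```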

For reference, in the static analysis illustrated here, identification of the file may be performed in the preprocessing step described above, in which case the analysis of step S2110 may be performed together with the preprocessing step.

Pattern analysis of the input file may then be performed (S2120).

Here the file pattern of the identified file is analyzed: without performing any action on the file, the file itself is opened and the various strings that can be extracted from it are examined to obtain the pattern of the file.
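String extraction of this kind can be approximated by scanning the raw bytes for runs of printable ASCII, similar in spirit to the Unix `strings` utility. The following is a minimal sketch; the function name and the default minimum length of four characters are assumptions.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of at least `min_len` printable ASCII characters in `data`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

The resulting strings (URLs, file names, commands and so on) could then be matched against known patterns; that matching step is not shown here.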

Information related to the creation of the input file may be obtained and analyzed (S2130).

The embodiment may obtain unique information or meta information that the file carries, for example file author information or, for an executable file, code signing information.

Environment information of the input file may then be analyzed (S2140).

Here, information such as the system environment components that the target file requires can be obtained.

Various other information related to the input file is then analyzed and stored (S2150). Such static information about the file itself, obtained without executing the file, may be stored in a specific data format, for example JSON (JavaScript Object Notation).
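Storing the static information as JSON might look like the following sketch. The field names (`sha256`, `format`, `strings`, `signer`) are hypothetical and chosen only to mirror the kinds of information discussed in steps S2110 to S2150; the disclosure does not define a schema.

```python
import json


def static_report(sha256, file_format, strings, signer=None):
    """Serialize static analysis results for one file as a JSON document."""
    record = {
        "sha256": sha256,       # identifier of the analyzed file
        "format": file_format,  # structure information (S2110)
        "strings": strings,     # extracted string patterns (S2120)
        "signer": signer,       # creation-related information (S2130), if any
    }
    return json.dumps(record, ensure_ascii=False, indent=2, sort_keys=True)
```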

An example of static analysis is analysis of the file itself, which can reveal whether coding-level vulnerabilities exist, problems in the call structure of interfaces or functions, or the binary structure of the file.

The example of analyzing static information disclosed above is shown as a flow chart for convenience, but the steps need not be performed in the order described above or shown in the drawing. Depending on the file, it is also not necessary to perform every step disclosed in the drawing; to obtain static analysis information, only some steps, for example structure information analysis, creation-related information analysis, and environment information analysis, may be performed selectively. That is, the order of the steps and the choice of which steps to perform may vary at the discretion of a person skilled in the art.

Examples of obtaining static analysis information according to the disclosed embodiment are briefly described as follows.

As an example of performing static analysis, when a hash value of the input file is extracted in the preprocessing process, the extracted hash value may be compared with hash values already stored for malicious code to analyze whether the input file is malicious code. Based on this analysis, it can be detected whether the file contains malicious code.
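The hash comparison described above reduces to a lookup in a store of known malicious hashes. The sketch below assumes, purely for illustration, that the store is an in-memory dictionary mapping SHA-256 digests to a label; a deployment would more likely query a database.

```python
import hashlib


def lookup_known_malware(data, known_hashes):
    """Return the stored label if the SHA-256 of `data` is known, else None.

    `known_hashes` is a hypothetical mapping of lowercase hex digest -> label.
    """
    digest = hashlib.sha256(data).hexdigest()
    return known_hashes.get(digest)
```

A miss (`None`) only means the hash is not yet stored; it does not show that the file is benign, which is why the other analyses remain necessary.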

If the input file is mobile data, code information of the suspected mobile malicious code is extracted from the input file. Here, code information means information that can be extracted from the code itself without executing the suspected mobile malicious code, and may include, for example, hash information, code size information, file header information, identifiable string information contained in the code, and operating platform information.

As described, the static analysis information obtained in this way may be stored in association with the corresponding file.

FIG. 3 illustrates an example of obtaining dynamic analysis information in the process of generating analysis information according to the disclosed embodiment.

Dynamic analysis information can be obtained from the result data of executing the identified file in its execution environment, based on at least one of the file information identified in preprocessing or the static analysis information.

The step of acquiring dynamic analysis information according to the disclosed embodiment detects vulnerable or dangerous anomalies by analyzing the various input and output data in the environment in which the file is running, or by analyzing changes in the file's interaction with the execution environment while it runs. In general, the file is executed directly in a virtualized environment and analyzed for abnormal behavior.

To perform dynamic analysis, the embodiment creates and prepares a dynamic analysis environment in which to execute the input file (S2210). Once the type of the input file has been identified, the execution environment required for that file type is known. For example, it can be identified whether the file runs on a Windows operating system, a Linux operating system, or a mobile device operating system.

In the prepared analysis environment, the obtained file is executed to determine whether it is malicious code (S2220).

To obtain dynamic analysis information, the file may be executed in this execution environment and the events that occur in the system may be collected (S2230). For example, events on the file itself, processes, memory, the registry, and the network system, or events that change the settings of each of these, may be collected. The collected events are then analyzed individually or in aggregate.

After the collected results are aggregated, the environment for dynamic analysis is restored (S2240).

The results obtained in this way may be stored as dynamic analysis information associated with the corresponding file.
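The four steps S2210 to S2240 form a prepare, execute, collect, restore cycle. The sketch below models that cycle with a context manager so that the environment is restored even if execution fails. `FakeSandbox` is a stand-in written for this example only; it does not run anything and is not an interface from the disclosure or from any sandbox product.

```python
from contextlib import contextmanager


class FakeSandbox:
    """Placeholder for a virtual machine or emulator (illustrative only)."""

    def __init__(self, os_name):
        self.os_name = os_name
        self.events = []

    def execute(self, sample_name):
        # A real sandbox would run the sample and record system events.
        self.events.append({"type": "process", "detail": f"started {sample_name}"})

    def restore(self):
        # A real sandbox would revert to a clean snapshot here.
        self.events = []


@contextmanager
def analysis_environment(os_name):
    sandbox = FakeSandbox(os_name)  # S2210: create and prepare the environment
    try:
        yield sandbox
    finally:
        sandbox.restore()           # S2240: restore the environment


def dynamic_analysis(sample_name, os_name):
    with analysis_environment(os_name) as sandbox:
        sandbox.execute(sample_name)      # S2220: execute the file
        collected = list(sandbox.events)  # S2230: collect the events
    return collected
```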

Examples of collecting and analyzing dynamic analysis information according to the embodiment of acquiring such information are briefly described below.

As one embodiment of dynamic analysis, when the input file is identified as a file that runs on a mobile device operating system, the file is executed directly on a mobile terminal, or in an emulator or virtualized environment configured identically to the mobile terminal environment. All changes that occur on the terminal after the suspected mobile malicious code in the file is executed, that is, the behavior information, are then extracted and recorded. The behavior information differs depending on the operating system (OS) environment of the terminal, but typically includes event information such as process, file, memory, and network information.

As another embodiment of dynamic analysis, even when the hash value of the input file is not extracted in the preprocessing process but is instead extracted at the user terminal, the hash value of the file extracted at the terminal can be received through the intelligence platform.

If the hash value of the file is not already stored in the database, the received file is executed on a virtual or real operating system, the behavior that occurs during execution is collected in real time, and the collected dynamic analysis information can be compared with the information already stored in the database.

If the result of the comparison exceeds a predefined risk level, it can be determined that the input file contains malicious code, and the hash value of the file can be stored in the database for later use in static analysis and the like.

Depending on the malicious code, the first process, which is the acting subject, may itself perform actions that are dangerous to the system. In some cases, however, the first process additionally creates a second process as a child process, and the second process performs the malicious activity on the system.

In such a case, an embodiment of dynamic analysis stores the events that the initial first process causes in the execution system, additionally extracts or identifies the second process, which is a child of the first process, and may also store the events of malicious activity by the second process. In this example, dynamic analysis can thus determine whether the identified file contains malicious code by comprehensively analyzing the event information of the initial first process together with that of the second and third processes connected to it.
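Following child processes amounts to walking the process tree rooted at the first process and keeping every event that belongs to a process in that tree. The sketch below assumes a hypothetical event format in which each event has a `pid` and process-creation events additionally carry a `child_pid`; real monitoring tools use their own formats.

```python
from collections import defaultdict, deque


def events_for_process_tree(events, root_pid):
    """Return the events raised by `root_pid` or any of its descendants."""
    children = defaultdict(list)
    for event in events:
        if event["type"] == "process_create":
            children[event["pid"]].append(event["child_pid"])

    # Breadth-first walk from the first process to all later processes.
    tracked = {root_pid}
    queue = deque([root_pid])
    while queue:
        pid = queue.popleft()
        for child in children[pid]:
            if child not in tracked:
                tracked.add(child)
                queue.append(child)

    return [event for event in events if event["pid"] in tracked]
```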

If the execution result of the input file shows none of the known characteristics of malicious code, the file may be difficult to detect even when it does have the characteristics of malicious code. In such a case, another embodiment of dynamic analysis can detect malicious activity of the executing process by monitoring and analyzing the network processes through which the identified file communicates with the outside when it is executed.

For example, when the identified file is executed, the network events through which it communicates with the outside can be monitored. The process identifier (PID) of the process that created a local address object upon execution of the file is stored. Then, when a network event related to the execution of the file occurs, local address object information can be extracted from the IRP (Interior Router Protocol) information of that network event.

Dynamic analysis that determines malicious activity can be performed by comparing the local address object created by that process ID with the local address objects related to the network event. For example, whether the activity is malicious can be determined by checking the pattern of the packets sent and received in the network event, or the C&C (command and control) server that triggers the packet transmission.

As yet another embodiment of dynamic analysis, ARP information may be monitored to prevent Address Resolution Protocol (ARP) spoofing attacks. In general, ARP or the Neighbor Discovery Protocol (NDP) may be used to map between the IP (internet protocol) address and the MAC (media access control) address of a device in a local area network.

An ARP spoofing attack is carried out when an attacker, in transmitting IP packets, sends an ARP message that corresponds to the attacker's own MAC address rather than to the MAC address of the receiving network device. A network device that receives this message then sends its packets to the attacker rather than to the legitimate IP address.

To counter such attacks, the embodiment can determine whether an ARP spoofing attack has occurred by comparing the ARP information collected directly from the network devices with the ARP information in the SNMP (Simple Network Management Protocol) information of the network devices included in the virtual network.

That is, in one embodiment of dynamic analysis, the host sends an ARP information request message to the devices connected to the network, and first ARP information contained in the returned ARP response messages is compared with second ARP information contained in the SNMP information of the devices connected to the virtual network; if the first ARP information and the second ARP information differ, it can be determined that an ARP spoofing attack has occurred.

Using this form of dynamic analysis, the embodiment can detect ARP spoofing attacks and prevent leakage of confidential information stored on the host device.
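The comparison of the first and second ARP information can be sketched as a comparison of two IP-to-MAC tables. How the tables are collected (ARP requests on the wire, SNMP queries) is outside this example; both are assumed to be plain dictionaries, and the function names are hypothetical.

```python
def _normalize_mac(mac):
    """Compare MAC addresses regardless of case or separator style."""
    return mac.lower().replace("-", ":")


def find_arp_mismatches(arp_replies, snmp_arp_table):
    """Return {ip: (mac_from_arp_reply, mac_from_snmp)} for conflicting entries.

    `arp_replies` holds the first ARP information (from ARP responses) and
    `snmp_arp_table` the second ARP information (from SNMP). Addresses that
    appear in only one table are ignored in this sketch.
    """
    mismatches = {}
    for ip, mac in arp_replies.items():
        expected = snmp_arp_table.get(ip)
        if expected is not None and _normalize_mac(expected) != _normalize_mac(mac):
            mismatches[ip] = (mac, expected)
    return mismatches
```

A non-empty result would correspond to the case in which an ARP spoofing attack is suspected.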

Another embodiment of the dynamic analysis method is a method capable of analyzing malicious code that evades virtual environments. Here, a terminal connected to a management server over a network may boot using a first OS (operating system) image stored on the management server. After the terminal has booted and the malicious code has been analyzed on the first OS, the terminal receives a second OS image from the management server and performs initialization using the received second OS image. The terminal then transmits to the management server the signature of the malicious code whose analysis has finished. Accordingly, even if malicious activity occurred after the malicious code was analyzed on the first OS, the management server has the terminal delete the first OS and boot from the second OS, which is identical to the original OS image, thereby preventing malicious activity from occurring on the terminal.

Malicious code may communicate with an external server in order to issue additional commands and receive files.

However, if the server needed for dynamic analysis has stopped, the dynamic analysis may take a very long time, and dynamic analysis may also be impossible when the relevant behavior has been blocked in advance.

To analyze network behavior through dynamic analysis, information must be extracted and analyzed about the command and control server (C&C server) used by the malicious code, the download server used to fetch additional malicious code, or the communication packets through which pieces of malicious code exchange information with one another or with the hacker. If the relevant server is not operating, however, such information cannot be extracted.

Yet another embodiment of the dynamic analysis method disclosed here allows dynamic analysis to be performed even when the server has stopped operating.

For example, dynamic analysis may proceed by having a network connection induction device handle the terminal's connection requests between a client terminal infected with malicious code and the management server. The network connection induction device may receive a connection request from the terminal and forward it to the C&C server that triggers the malicious code's behavior. If the network connection induction device does not receive a response packet from the C&C server within a certain time, it transmits a separate virtual response packet, together with the connection request, to the terminal.

Thereafter, data related to malicious code analysis can be extracted from what is received from the terminal.

When a virtual response packet is used, it is sufficient for the virtual response packet to have a packet format that establishes a TCP session. With the TCP (Transmission Control Protocol) commonly used by malicious code, establishing the TCP session alone is enough for the client terminal to produce the data packets it transmits. The important information needed for dynamic analysis of the malicious code can then be extracted from those data packets. In this way, dynamic analysis can be performed through the operation of the network connection induction device even when the management server is not operating.
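The idea that a bare TCP session is enough to elicit the client's first data packet can be illustrated with an ordinary listening socket. In the sketch below the operating system's completion of the TCP handshake stands in for the virtual response packet, and the first payload the client sends is captured. This is a simplified illustration under that assumption, not the network connection induction device itself: the forwarding to the C&C server and the timeout decision are not modeled, and `capture_first_payload` is a hypothetical name.

```python
import socket


def capture_first_payload(listener, timeout=2.0, max_bytes=4096):
    """Accept one connection on `listener` and return (peer, first_payload).

    `listener` must already be bound and listening. No application-level
    reply is sent; only the TCP session is established.
    """
    listener.settimeout(timeout)
    conn, peer = listener.accept()
    with conn:
        conn.settimeout(timeout)
        payload = conn.recv(max_bytes)
    return peer, payload
```

In an analysis setup, the sample's traffic would have to be redirected to the address of such a listener, for instance by DNS or routing changes inside the sandbox; how that redirection is done is left open here.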

In this way, the embodiment can analyze the events that occur when the received file is executed and store the resulting dynamic analysis information in the database.

Figure 4 discloses an example of obtaining in-depth analysis information in the process of generating analysis information according to the disclosed embodiment.

The step of obtaining in-depth analysis information according to the disclosed embodiment is characterized in that an executable file contained in the received file is disassembled and analyzed at the machine language level to identify the attack technique that causes the malicious behavior, or the attacker.

In-depth analysis information may be obtained using the results of the static or dynamic analysis described above, and an executable file may be judged to be a file that causes malicious behavior according to the analyst's interpretation criteria.

In-depth analysis information may also include analysis information about the file itself or information obtained by processing the file several times, and the in-depth analysis may be performed based on information that is already stored.

In-depth analysis may include the steps of disassembling, extracting the disassembled machine-language-level code, identifying attack behavior (TTP), identifying the attacker, and performing taint analysis.

A detailed example is given below with reference to the drawing.

If the input file includes an executable file, in-depth analysis disassembles the executable file (S2410).

Disassembled assembly code may include an OP-CODE (operation code) and operands. The OP-CODE, which may also be called an instruction code, denotes a machine language instruction, and an operand denotes the information needed to execute it, that is, the target data or memory location of the machine language instruction.

Hereinafter, for convenience, the part of the disassembled assembly code other than the OP-CODE is referred to as ASM-CODE. Accordingly, ASM-CODE may include the operand part.

Through disassembling, an executable file in object code form is converted into code of a specific form, for example assembler language code, that is, disassembled code. OP-CODE and ASM-CODE having a certain format can be extracted from this disassembled code (S2420).
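As a rough illustration of step S2420, the sketch below splits disassembler text output into OP-CODE and ASM-CODE pairs. It assumes one instruction per line in the form `mnemonic operands`; the function names are hypothetical, and instruction prefixes, labels, and address columns are not handled.

```python
from typing import List, Tuple


def split_instruction(line: str) -> Tuple[str, str]:
    """Split one disassembled line into (OP-CODE, ASM-CODE)."""
    parts = line.strip().split(None, 1)
    if not parts:
        return "", ""
    opcode = parts[0].lower()
    # Everything after the mnemonic (the operands) is treated as ASM-CODE.
    asm_code = parts[1].strip() if len(parts) > 1 else ""
    return opcode, asm_code


def extract_codes(disassembly: str) -> List[Tuple[str, str]]:
    """Extract (OP-CODE, ASM-CODE) pairs from multi-line disassembly text."""
    return [split_instruction(l) for l in disassembly.splitlines() if l.strip()]


sample = """push ebp
mov ebp, esp
call 0x401000
ret"""
pairs = extract_codes(sample)
```

Here `pairs` holds one tuple per instruction, with an empty ASM-CODE for operand-less instructions such as `ret`.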

The extracted disassembled code may be converted into a data format of a certain form. Examples of this conversion are disclosed below.

In-depth analysis can identify attack behavior based on the extracted disassembled code or on the data converted into that format (S2430).

The OP-CODE in disassembled code is the part of a machine language instruction that specifies the operation to be performed, and the OP-CODEs that give rise to a given attack behavior or attack technique in cyber security (Tactics, Techniques, and Procedures; hereinafter TTP) may have very similar values or formats for that attack behavior. Analyzing the OP-CODE and ASM-CODE therefore makes it possible to distinguish a specific attack behavior.

Disassembled code is extracted from the executable file, and the extracted disassembled code can be separated by executed function.

For example, the OP-CODE and ASM-CODE extracted from the disassembled code, or code recombined from the disassembled code, may be converted into a hash value by a fuzzy hashing method or a CTPH (context triggered piecewise hashes) method, or into code of a certain format derived from that hash value.
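To make the idea of context triggered piecewise hashing concrete, here is a deliberately simplified toy version. It is not compatible with ssdeep or any production CTPH tool; the window size, trigger value, and the use of SHA-256 for chunk digests are assumptions chosen for brevity.

```python
import hashlib
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def toy_ctph(data: bytes, window: int = 4, trigger: int = 8) -> str:
    """Toy context-triggered piecewise hash: one character per chunk."""
    signature = []
    chunk_start = 0
    for i in range(len(data)):
        # Rolling value over the last `window` bytes decides chunk boundaries,
        # so boundaries depend on local content rather than absolute offsets.
        rolling = sum(data[max(0, i - window + 1): i + 1])
        if rolling % trigger == trigger - 1:
            digest = hashlib.sha256(data[chunk_start:i + 1]).digest()
            signature.append(ALPHABET[digest[0] % 64])
            chunk_start = i + 1
    if chunk_start < len(data):  # hash the trailing chunk, if any
        digest = hashlib.sha256(data[chunk_start:]).digest()
        signature.append(ALPHABET[digest[0] % 64])
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> float:
    """Similarity of two signatures in [0.0, 1.0]."""
    return SequenceMatcher(None, sig_a, sig_b).ratio()
```

Because chunk boundaries follow the content, a small edit in one function tends to change only the signature characters near it, which is what makes signatures of related code comparable.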

The embodiment can identify the attack behavior by converting the disassembled code of the executable file into a certain format and matching it against the detailed attack behavior elements commonly recognized by groups of cyber security experts.

Attack behavior (TTP) may also be identified based on a database that stores the matching relationships between previously extracted disassembled code and each attack behavior (TTP). In this case, similarity matching between each attack behavior (TTP) and the fuzzy hash values of the extracted disassembled code under the CTPH algorithm, or the data obtained by converting them into a certain format, can be performed at high speed.
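A database lookup of this kind might look like the sketch below. The `TTP_DB` contents are invented for illustration (the opcode sequences are not real signatures), the 0.7 threshold is arbitrary, and `difflib.SequenceMatcher` stands in for whatever similarity measure the embodiment actually uses.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Hypothetical database: technique labels mapped to made-up opcode sequences.
TTP_DB: Dict[str, List[str]] = {
    "T1055 Process Injection": ["push", "push", "call", "mov", "push", "call", "test", "jz"],
    "T1056.001 Keylogging": ["mov", "call", "cmp", "jne", "movzx", "push", "call"],
}


def match_ttp(opcodes: List[str], threshold: float = 0.7) -> Optional[Tuple[str, float]]:
    """Return (technique, score) for the best match, or None below threshold."""
    best: Optional[Tuple[str, float]] = None
    for ttp, reference in TTP_DB.items():
        score = SequenceMatcher(None, opcodes, reference).ratio()
        if best is None or score > best[1]:
            best = (ttp, score)
    if best is not None and best[1] >= threshold:
        return best
    return None
```

A sequence that differs from a stored entry in a single instruction still matches that entry, while an unrelated sequence returns `None`.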

One example of a database storing attack behaviors recognized by such security expert groups is a database that stores information such as MITRE ATT&CK. MITRE ATT&CK is one database of real-world security attack techniques and behaviors; by representing specific attack techniques and behaviors as elements of a matrix, it allows them to be identified in a consistent data set format.

MITRE ATT&CK classifies the attack techniques of hackers or malicious code by attack stage and expresses them as a matrix of CVE codes (Common Vulnerabilities and Exposures codes).

The embodiment identifies a specific attack behavior among various attack behaviors by analyzing the disassembled code, and by matching the identified type of attack behavior to attack code that expert groups recognize as actually being used, it allows the identified attack behavior to be expressed in elements that are both expert and commonly understood.

Because the OP-CODE in disassembled code is a machine language instruction that triggers a specific action, the OP-CODEs of files that cause the same attack behavior may be very similar. However, the OP-CODEs contained in different files that cause the same attack behavior are not exactly identical, so the embodiment may perform artificial-intelligence-based machine learning on the disassembled code containing the OP-CODE. Once machine learning has been performed, it can identify whether attack code with similarity at or above a threshold is present and which attack technique that code uses.

Therefore, even if the disassembled code of files that cause the same malicious behavior is not completely identical, a file that performs the malicious behavior can be identified from its disassembled code.

Algorithms such as Perceptron, Logistic Regression, Support Vector Machines, and Multilayer Perceptron may be used as the machine learning algorithm.
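As one hedged illustration of the first algorithm listed, the sketch below trains a plain perceptron on opcode-frequency features. The vocabulary, training samples, and labels are toy values invented for this example; the patent does not specify features or training data.

```python
from typing import List, Sequence, Tuple


def featurize(opcodes: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Relative frequency of each vocabulary opcode in the sequence."""
    total = len(opcodes) or 1
    return [list(opcodes).count(op) / total for op in vocab]


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """1 = flagged as malicious, 0 = not flagged."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


def train_perceptron(
    samples: Sequence[Sequence[float]], labels: Sequence[int],
    epochs: int = 50, lr: float = 1.0,
) -> Tuple[List[float], float]:
    """Classic perceptron rule: adjust weights only on misclassification."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = y - predict(weights, bias, x)
            if err:
                weights = [w + lr * err * v for w, v in zip(weights, x)]
                bias += lr * err
    return weights, bias


# Toy data (hypothetical): xor-heavy code labelled malicious, mov-heavy benign.
VOCAB = ["call", "xor", "mov"]
train = [
    (["xor", "xor", "call", "xor"], 1),
    (["xor", "call", "xor", "mov"], 1),
    (["mov", "mov", "call", "mov"], 0),
    (["mov", "call", "mov", "xor"], 0),
]
X = [featurize(ops, VOCAB) for ops, _ in train]
y = [label for _, label in train]
weights, bias = train_perceptron(X, y)
```

The toy set is linearly separable, so the perceptron converges; real opcode data would likely need richer features and one of the other listed models.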

By using an AI (Artificial Intelligence) algorithm to match the fuzzy hash values of the disassembled code, by similarity, against previously learned attack code for attack techniques such as those in MITRE ATT&CK, the file can finally be detected as malicious code.

Using the results of artificial intelligence machine learning, the embodiment can identify more quickly and accurately the attack behavior corresponding to the disassembled code, or the vulnerable elements of that attack behavior.

Specific embodiments of this are disclosed in detail below with reference to the drawings.

An embodiment of in-depth analysis may also include a step of identifying the attacker behind similar attack behavior, using the disassembled code and the results of artificial-intelligence-based machine learning (S2440). Specific examples of attacker identification are likewise described later.

An embodiment of in-depth analysis may also include taint analysis, which, even for fileless malicious code, can determine whether attack behavior is present by analyzing the system's memory at a specific point in time (S2450).
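The text does not detail how taint analysis is carried out, so the following is only a generic sketch of taint propagation over a simplified instruction trace. The `(op, dst, src)` instruction format, the location names, and the propagation rules are all assumptions for illustration.

```python
from typing import Iterable, Set, Tuple

BINARY_OPS = {"add", "sub", "xor", "or", "and"}


def propagate_taint(
    instructions: Iterable[Tuple[str, str, str]], sources: Set[str]
) -> Set[str]:
    """Return the locations tainted after running (op, dst, src) instructions."""
    tainted = set(sources)
    for op, dst, src in instructions:
        if op == "mov":
            # A move overwrites dst: it inherits or loses taint from src.
            if src in tainted:
                tainted.add(dst)
            else:
                tainted.discard(dst)
        elif op in BINARY_OPS:
            # A binary operation mixes src into dst, so taint can only spread.
            if src in tainted:
                tainted.add(dst)
    return tainted


trace = [
    ("mov", "eax", "net_buf"),  # eax receives untrusted network data
    ("mov", "ebx", "0x10"),     # ebx holds a constant
    ("add", "ebx", "eax"),      # ebx now depends on untrusted data
    ("mov", "eax", "0x0"),      # eax is overwritten with a constant
]
tainted = propagate_taint(trace, {"net_buf"})
```

After the trace, `ebx` is tainted while `eax` is clean again; a detector would then check whether any tainted location reaches a sensitive operation.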

In-depth analysis is based on processing the disassembled code of the executable file; the identification of the attack technique or attacker and the taint analysis that follow from it may be performed optionally.

The final in-depth analysis information produced in this way can be stored in the database as the in-depth analysis information corresponding to the file.

Figure 5 discloses, as an example of in-depth analysis, disassembling malicious code to determine that a file contains malicious behavior.

As described above, disassembling an executable file yields OP-CODE and ASM-CODE, which are code in assembly language form.

For example, a specific function A in an EXE executable file, when passed through a disassembler, may be converted into disassembled code containing OP-CODE.

If the EXE executable file is malicious code that causes malicious behavior, disassembling the function or code portion responsible for that behavior yields a disassembled code set that causes the malicious behavior.

The disassembled code set may include an OP-CODE set, or a combined set of OP-CODE and ASM-CODE, corresponding to the malicious behavior or malicious code.

Even when the malicious behavior is the same, the algorithm of the malicious code that carries it out and the disassembly result of the executable file are not exactly the same, so artificial-intelligence-based similarity analysis is used to identify whether input malicious code corresponds to a specific disassembled code set.

Malicious behavior that corresponds to a specific disassembled code set in this way can be mapped to an expert, publicly shared catalog of attack methods or techniques such as MITRE ATT&CK and used to identify the attack technique (TTP).

Alternatively, the OP-CODE set, or the combined set of OP-CODE and ASM-CODE, in specific disassembled code can be mapped to the attack technique elements defined in MITRE ATT&CK and used to determine the attack technique.

This figure shows an example of the correspondence among an executable file, the disassembled code set of that executable file, and the attack technique corresponding to the attack technique elements in MITRE ATT&CK.

Figure 6 discloses an example of calculating correlation analysis information in the process of generating analysis information according to the disclosed embodiment.

The various pieces of analysis information obtained above can be used as indicators of compromise, and correlation analysis information indicating the relationships among attackers or attack techniques is generated based on those indicators.

An indicator of compromise (IoC) refers to any of various pieces of information that identify actual or potential cyber security threats, attack behavior, or malicious behavior occurring on a system or network. For example, indicators of compromise (IoC) include files that point to such behavior, various traces appearing in log information, the file itself, paths, and other information from which such behavior can be inferred.

Using the static, dynamic, and in-depth analysis information already obtained and the identified file, the following relationships between the analysis information and attack behavior can be obtained: relationships of IP information (S2510), relationships of hostnames contained in emails or belonging to websites (S2520), relationships of URLs (S2530), and relationships of the file's code signing (codesign) (S2540).

The process of obtaining correlation analysis information illustrated here is an example; the illustrated order need not be followed, and not every relationship need be analyzed. For example, the relationships to related files can be derived using only the IP and URL relationships between the analysis information and the attack behavior. Such correlation analysis information can be used to infer the attack technique or attacker accurately.

Even when static, dynamic, or in-depth analysis does not identify the attack behavior or attacker, the relationships among the analyzed information can yield information from which the attack behavior and attacker can be estimated. This is described in detail below with reference to the drawings.

This correlation analysis information is stored continuously and cumulatively for received files, and the stored correlation analysis information can be updated each time a new file is subsequently received.

Indicators of compromise are obtained based on the various pieces of analysis information described above.

The indicators of compromise (IoC) can then be used to obtain various kinds of correlation information that can identify the attack behavior or attacker (S2550).

These indicators of compromise (IoC) can later be used to obtain correlation analysis information for inferring attack techniques. Correlation analysis, and examples of using it to trace attackers or infer attack behavior, are described in detail below.

The obtained correlation analysis information can in turn be stored in a server or database in association with the corresponding file.

As described, the information analyzed above can be collected and standardized through deduplication, normalization, and enrichment. For example, static analysis information, dynamic analysis information, in-depth analysis information, and correlation analysis information may be provided to the user or stored in a standardized format for later updating or regenerating cyber threat information.

Here, duplicated or shared portions of the individual analysis information can be removed, and enrichment can be performed on data that is lacking.
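The deduplication, normalization, and enrichment steps could be sketched as below. The record layout, the `sha256` key used to identify a file, and the `enrichment` lookup table are assumptions made for this example, not fields named in the source.

```python
from typing import Dict, Iterable, List


def normalize(record: Dict[str, str]) -> Dict[str, str]:
    """Lower-case keys, trim values, drop empty fields."""
    out: Dict[str, str] = {}
    for key, value in record.items():
        value = value.strip()
        if value:
            out[key.strip().lower()] = value
    if "sha256" in out:
        out["sha256"] = out["sha256"].lower()
    return out


def merge_analysis(
    records: Iterable[Dict[str, str]],
    enrichment: Dict[str, Dict[str, str]],
) -> List[Dict[str, str]]:
    """Merge per-file records, then fill missing fields from enrichment."""
    merged: Dict[str, Dict[str, str]] = {}
    for raw in records:
        rec = normalize(raw)
        key = rec.get("sha256")
        if key is None:
            continue  # cannot attribute the record to a file
        # Records for the same file collapse into one entry (deduplication).
        merged.setdefault(key, {}).update(rec)
    for key, rec in merged.items():
        for field, value in enrichment.get(key, {}).items():
            rec.setdefault(field, value)  # enrich only where data is lacking
    return list(merged.values())
```

Using `setdefault` during enrichment means externally supplied data never overwrites a value the analysis itself produced.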

The information can then be provided as cyber threat information in response to a user's query or according to service policy. The provision of cyber threat information is also described in detail below.

This cyber threat information may be provided directly to the user, or it may first be used to generate the cyber threat prediction information described below and then be provided at the user's request or according to the service.

Figure 7 is a diagram disclosing an example of the process of obtaining correlation analysis information according to the disclosed embodiment.

In this figure, files A-1 (10), A-2 (20), and B-1 (30) denote files that can cause malicious behavior, and server (A) (110) and server (B) (120) denote C&C servers that trigger malicious behavior.

Suppose that file A-1 (10) is received and dynamically analyzed according to the disclosed embodiment, and it is confirmed that file A-1 (10) connects to server (A) (110) when executed.

The embodiment can obtain the stored analysis information of file A-2 (20), which is similar to file A-1 (10), from a database that stores various analysis information about malicious code. From the analysis information of file A-2 (20), it can be determined that the same server, server (A) (110), makes use of both file A-1 (10) and file A-2 (20), and from this it can be inferred that server (A) (110) belongs to a hacker using the same attack technique or the same server.

If file A-2 (20), a file that has already been analyzed, connects not only to server (A) (110) but also to server (B) (120), the embodiment can store the information of server (B) (120) as a relationship of file A-2 (20).

If file B-1 (30) is an entirely different file from file A-1 (10) and file A-2 (20), but its analysis information records a connection to server (B) (120), then even though the file formats differ, server (A) (110) and server (B) (120) may belong to the same attacker group or to attacker groups using the same technique.

Thus, analyzing the relationships among the various analysis information associated with files yields grouping information about the attackers and attack techniques behind malicious behavior, and this correlation analysis information can be used to identify an attacker or attacker group.
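The chain of inference in this example (A-1 and A-2 share server (A), A-2 and B-1 share server (B)) amounts to finding connected components in a file-to-server graph. The sketch below shows one way to do that; the node names and the extra `C-1` entry are hypothetical, and treating every shared server as evidence of a common group is a simplification.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def group_by_shared_servers(contacts: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group files and servers that are linked, directly or transitively."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for file_id, server in contacts:
        graph[file_id].add(server)
        graph[server].add(file_id)

    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for node in list(graph):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:  # depth-first walk over one connected component
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            component.add(cur)
            stack.extend(graph[cur] - seen)
        groups.append(component)
    return groups


contacts = [
    ("file:A-1", "server:A"),
    ("file:A-2", "server:A"),
    ("file:A-2", "server:B"),
    ("file:B-1", "server:B"),
    ("file:C-1", "server:C"),  # unrelated activity, added for contrast
]
groups = group_by_shared_servers(contacts)
```

With these contacts, the three files from the figure and both servers fall into a single group, while the unrelated pair forms its own.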

An example describing cyber threat prediction information is disclosed below.

Cyber threat prediction information can be generated using at least one of the file's identification information and the obtained analysis information, or based on a data set that aggregates them.

Figure 8 discloses an example of generating prediction information for cyber threat information according to an embodiment. An example of generating such prediction information is described below with reference to the drawing.

Once a data set of analysis information has been secured, prediction information about attack behavior that may occur later can be generated from that data set.

If the data set built from the analysis information extracted as above is processed into an artificial-intelligence training data set, and artificial intelligence analysis is performed on the processed training data set, various kinds of prediction information about attack behavior can be generated.

The data set of prediction information generated in this way can in turn be repeatedly generated or processed into a new training data set.

The embodiment in this figure discloses an example in which artificial intelligence learning on the data set of the above analysis information generates prediction information on the malicious code's author (S3110), prediction information on the malicious code's attack method (S3120), prediction information on the malicious code's attack group (S3130), malicious code similarity prediction information (S3140), malicious code spread prediction information (S3150), and so on.

The order of the prediction information here is only an example, and the order in which prediction information is obtained may be changed. For example, the order of the malicious code similarity prediction information (S3140) and the malicious code spread prediction information (S3150) may be swapped, and the remaining prediction information need not be generated in the illustrated order either.

In addition to the illustrated similarity prediction information, further prediction information related to cyber threat information can also be generated.

The prediction information generated for malicious code may be stored in the database divided into risk prediction information, which predicts the code's own risk level; prediction information that separately predicts the attacker, attack group, similarity, spread, and so on; and comprehensive prediction information for the malicious code, which presents that prediction information as a whole.

Using the analysis information and prediction information of the cyber threat information described above, the type of malicious code associated with an input file can be identified and its risk level determined.

Profiling information containing the history of malicious code associated with the input file may also be generated and stored, and the stored analysis information, prediction information, risk level, or profiling information may be further processed so that users can query it easily.

An example of providing cyber threat information to a user is as follows.

Because a great deal of correlation analysis information can be generated for a single file, handling indicators of compromise (IoC) may require a very large volume of data communication. To respond quickly to cyber security threats, the embodiment can share, store, query, and update this information within a short time.

Based on the analysis information described above, when a security event occurs, the embodiment can use encrypted socket communication to request a lookup of the indicators of compromise (IoC) related to that event from an IoC storage server, or from other user terminals using P2P socket communication. Of the responses from the IoC storage server and the other user terminals, the one that arrives first can then be used as the indicators of compromise (IoC).

As another example of providing cyber threat information, when information about malicious code analyzed as above is queried from the terminal used by a user, the queried information can be provided as follows.

For example, once the terminal used by the user has calculated the hash value of a file, it can send the server a query, in text form, asking whether the calculated hash value corresponds to malicious code. The server that receives the hash value and query passes the hash value to the database in which malicious code information is stored as described above and receives the lookup result. The server can then return the result to the user terminal as a text value corresponding to the hash value.
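The hash-based lookup can be sketched as follows. The choice of SHA-256, the in-memory `MALWARE_DB` dictionary, and the `"<hash> <verdict>"` reply format are assumptions; the source states only that a hash value is sent and a text value is returned.

```python
import hashlib
from typing import Dict

# Hypothetical stand-in for the malicious code database.
MALWARE_DB: Dict[str, str] = {
    hashlib.sha256(b"malicious sample bytes").hexdigest(): "malicious",
}


def file_hash(content: bytes) -> str:
    """Terminal side: compute the hash that will be sent as a text query."""
    return hashlib.sha256(content).hexdigest()


def lookup(hash_text: str) -> str:
    """Server side: look the hash up and answer with a text value."""
    key = hash_text.strip().lower()
    verdict = MALWARE_DB.get(key, "unknown")
    return f"{key} {verdict}"


reply_known = lookup(file_hash(b"malicious sample bytes"))
reply_other = lookup(file_hash(b"harmless document"))
```

Sending only the hash keeps the query small and avoids uploading the file itself.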

저장된 악성 코드에 대한 정보를 기반으로 사용자의 요청에 따라 사이버 위협 정보를 제공하는 다른 예를 도면을 참조하여 설명하면 다음과 같다. Another example of providing cyber threat information at the user's request based on information about stored malicious code is described with reference to the drawings as follows.

도 9는 실시 예에 따라 사이버 위협 정보를 제공하기 위한 악성 코드 질의들의 예를 개시한다. 9 discloses an example of malicious code queries for providing cyber threat information according to an embodiment.

사이버 위협 정보 처리에 대한 실시 예는 위와 같이 산출한 분석 정보와 예측 정보를 기초로 식별한 악성 코드를 여러 가지 메타 정보와 함께 저장할 수 있다.In an embodiment of cyber threat information processing, malicious code identified based on the analysis information and prediction information calculated as above may be stored along with various meta information.

As described above, a user can submit queries such as those illustrated to the database in which malicious code information is stored.

Referring to Query (A), the user can query the database storing cyber threat information according to the embodiment for malicious code by categories such as the period associated with the malicious code, the quantity of a specific malicious code, detection name, file type, distribution site, code signing, and file size.

The database storing the cyber threat information then returns, through the server, the cyber threat information or malicious code information corresponding to the query.

As another example, as illustrated in Query (B) of this figure, the user can query by a specific date associated with malicious code, the quantity of a specific malicious code, file type, whether a distribution site exists, whether a child process is created, and so on.

As illustrated in Query (C), the user can query information about malicious code using the period associated with the malicious code, the quantity of a specific malicious code, file type, distribution site information, file name information, attack behavior resulting from execution of the malicious code, and file size.

Query (D) illustrates querying information about malicious code using the period associated with the malicious code, the quantity of a specific malicious code, file type, distribution site address, and statistical information on the malicious code.

As described, in an embodiment of the cyber threat information processing method, the analysis information and prediction information are stored in the database together with information on the malicious code matching the above conditions, so that malicious code information corresponding to a user's query can be provided.

Accordingly, the server can obtain from the database information on malicious code that matches the query conditions and transmit it to the user.
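The query handling described above can be sketched as a filter over stored records. This is a minimal illustration only: the `MalwareRecord` fields, the `query` function, and the sample data are assumptions chosen for the example and do not reflect the actual schema or query syntax of the embodiment.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class MalwareRecord:
    """Hypothetical record layout; the real database schema is not given in the text."""
    sha256: str
    detection_name: str
    file_type: str
    size: int
    first_seen: date


def query(records: Iterable[MalwareRecord], *,
          start: Optional[date] = None, end: Optional[date] = None,
          file_type: Optional[str] = None, max_size: Optional[int] = None,
          limit: Optional[int] = None) -> List[MalwareRecord]:
    """Return records matching every condition that was supplied."""
    hits = []
    for rec in records:
        if start is not None and rec.first_seen < start:
            continue
        if end is not None and rec.first_seen > end:
            continue
        if file_type is not None and rec.file_type != file_type:
            continue
        if max_size is not None and rec.size > max_size:
            continue
        hits.append(rec)
    # limit=None keeps every hit; otherwise only the first `limit` are returned.
    return hits[:limit]


# Made-up sample data for illustration.
db = [
    MalwareRecord("aa" * 32, "Trojan.Sample", "pdf", 120_000, date(2022, 3, 1)),
    MalwareRecord("bb" * 32, "Worm.Sample", "exe", 900_000, date(2022, 5, 9)),
    MalwareRecord("cc" * 32, "Dropper.Sample", "pdf", 40_000, date(2022, 6, 2)),
]
result = query(db, start=date(2022, 1, 1), end=date(2022, 12, 31),
               file_type="pdf", max_size=100_000, limit=10)
```

Here the period, file type, size, and quantity conditions correspond to categories named in Queries (A) through (D); further categories such as distribution site or child-process creation would be additional optional filters of the same kind.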

As illustrated, the user can look up malicious code information using various meta information of a file. The user can thereby learn in advance how malicious code may damage or threaten the information or systems that must be protected.

FIG. 10 is a diagram illustrating an embodiment of a cyber threat information processing apparatus. The embodiment of this figure conceptually illustrates the apparatus, which is described with reference to the figure as follows.

The disclosed cyber threat information processing apparatus includes a physical device 2000, comprising a server 2100 and a database 2200, and a platform 10000 including an application programming interface (API) that runs on the physical device 2000. Hereinafter, the platform 10000 is referred to as a cyber threat intelligence platform (CTIP), or simply the intelligence platform 10000.

The server 2100 includes a computing device such as a central processing unit (CPU) or a processor, and can store data in and read data from the database 2200.

The server 2100 computes and processes incoming security-related data, and executes files to generate various security events and process the related data. The server 2100 can also control the input and output of various cyber security data and store data processed by the intelligence platform 10000 in the database 2200.

The server 2100 may include a network device for data input or a network security device. The central processing unit, processor, or computing device of the server 2100 may execute the frameworks illustrated in the following figures or the modules within those frameworks.

The intelligence platform 10000 according to the embodiment provides an application programming interface (API) for processing cyber threat information. For example, the intelligence platform 10000 can receive files or data from a network security device connected to a network, or from cyber malicious-behavior prevention software that scans for and detects malicious behavior.

For example, the intelligence platform 10000 according to the embodiment can provide functions such as a SIEM (Security Information and Event Management) API that provides security events, an EDR (Environmental Data Retrieval) API that provides data about the execution environment, and a firewall API that monitors and controls network traffic according to a defined security policy. The intelligence platform 10000 can also serve as an API for an IPS (Intrusion Prevention System), which plays a role similar to a firewall between internal and external networks.

The application programming interface (API) 1100 of the intelligence platform 10000 according to the embodiment can receive, from various client devices 1010, 1020, and 1030, files containing malicious code that carries out cyber security attacks.

The intelligence platform 10000 according to the embodiment may include a preprocessing unit (not shown), an analysis framework 1210, a prediction framework 1220, an AI engine 1230, and a post-processing unit (not shown).

The preprocessing unit of the intelligence platform 10000 preprocesses the various files received from the client devices 1010, 1020, and 1030 so that cyber threat information can be analyzed for them.

For example, the preprocessing unit can process a received file to obtain various meta information from it, including source information of the file, collection information on how the file was obtained, and user information of the file. For example, when the file contains a URL (uniform resource locator) or was included in an email, collection information for the file can be obtained. The user information may include information on the user who created, uploaded, or last saved the file. During preprocessing, IP (internet protocol) information, country information derived from it, API (Application Programming Interface) key information, and the like can be obtained as meta information of the file.

The preprocessing unit (not shown) of the intelligence platform 10000 can extract the hash value of the input file. If the hash value is already known to the cyber threat information processing apparatus, the type of the file can be identified on that basis.

If the file is not already known, the hash value and file information can be looked up on reference internet sites for cyber threat information, such as the C-TAS (Cyber Threats Analysis System) in operation, the operating system of the CTA (Cyber Threat Alliance), or VirusTotal, to obtain analysis information for identifying the file type.

As described, the hash value of the input file may be the output of a hash function such as MD5 (Message-Digest algorithm 5), SHA-1 (Secure Hash Algorithm 1), or SHA-256.
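As a brief illustration of the hash extraction step, the following sketch computes the three digests named above with Python's standard `hashlib` module. The function name `file_digests` is an assumption for the example; the text does not specify how the preprocessing unit computes or stores the values.

```python
import hashlib


def file_digests(data: bytes) -> dict:
    """Compute the MD5, SHA-1, and SHA-256 digests of a file's bytes as hex strings."""
    # Note: MD5 may be unavailable on systems running in a restricted (FIPS) mode.
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("md5", "sha1", "sha256")}


digests = file_digests(b"")  # digests of an empty input, used as a sanity check
```

Any of these digests can then serve as the lookup key when checking whether a file is already known.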

The analysis framework 1210 can generate analysis information on malicious code from the input file.

The analysis framework 1210 may include analysis modules for various analysis methods, such as a static analysis module 1211, a dynamic analysis module 1213, an in-depth analysis module 1215, and a correlation analysis module 1217.

The static analysis module 1211 can analyze malicious-code-related information about the file itself as analysis information on malicious behavior related to the input file.

The dynamic analysis module 1213 can analyze malicious-code-related information by performing various actions based on the information obtained from the input file.

The in-depth analysis module 1215 can analyze malicious-code-related information by using information obtained by processing an executable file related to the input file, or by performing memory analysis related to the executable file. The in-depth analysis module 1215 may include artificial intelligence analysis so that malicious behavior can be identified accurately.

The correlation analysis module 1217 may include correlation analysis information from which relationships to attack behavior or attackers can be inferred, obtained by correlating analysis information already stored or newly generated for the input file.

The analysis framework 1210 can combine the analysis results on the characteristics and behavior of malicious code produced by the static analysis module 1211, the dynamic analysis module 1213, the in-depth analysis module 1215, and the correlation analysis module 1217, and provide the combined final information to the user.

For example, the analysis framework 1210 can analyze the static analysis information, dynamic analysis information, in-depth analysis information, correlation analysis information, and the like for a single file in an integrated manner to accurately identify the attack technique and the attacker. The analysis framework 1210 removes duplicated portions among the pieces of analysis information and uses information common to them to raise accuracy.

The analysis framework 1210 can standardize the information it provides; for example, it normalizes or enriches cyber threat indicators of compromise (IoCs) collected through various analyses and channels. It can then generate final, standardized analysis information on the malicious code or malicious behavior.
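One way to picture the deduplication and normalization described above is the sketch below. The normalization rules (lowercasing, undoing "defanged" notation, trimming a trailing slash) and the names `normalize_ioc` and `merge_iocs` are assumptions for illustration; the text does not state which rules the analysis framework applies.

```python
from collections import Counter


def normalize_ioc(raw: str) -> str:
    """Bring one indicator to a canonical form (hypothetical rules)."""
    value = raw.strip().lower()
    # Undo common "defanging" so that equal indicators compare equal.
    value = value.replace("hxxp://", "http://").replace("hxxps://", "https://")
    value = value.replace("[.]", ".")
    return value.rstrip("/")


def merge_iocs(reports: dict) -> dict:
    """Merge per-module indicator lists; the value counts how many modules agree."""
    counts = Counter()
    for iocs in reports.values():
        # A set removes duplicates inside a single module's report.
        counts.update({normalize_ioc(i) for i in iocs})
    return dict(counts)


# Made-up module outputs using documentation-only addresses and domains.
reports = {
    "static": ["hxxp://Example[.]test/a", "198.51.100.7"],
    "dynamic": ["http://example.test/a/", "198.51.100.7 ", "203.0.113.9"],
}
merged = merge_iocs(reports)
```

In this sketch an indicator reported by more than one module receives a higher count, which is one simple reading of using common information to raise accuracy.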

The static analysis module 1211, the dynamic analysis module 1213, the in-depth analysis module 1215, and the correlation analysis module 1217 of the analysis framework 1210 may apply machine learning or deep learning techniques based on artificial intelligence analysis to the data under analysis in order to raise the accuracy of the analyzed data.

The AI engine 1230 may run artificial intelligence analysis algorithms to generate the analysis information of the analysis framework 1210.

This information may be stored in the database 2200, and the server 2100 may, at the request of a user or client, provide the analysis information on malicious code or malicious behavior stored in the database 2200 as cyber threat intelligence information.

The prediction framework 1220 may include a plurality of prediction information generation modules according to the type of prediction information, such as a first prediction information generation module 1221 and a second prediction information generation module 1223. To raise analysis accuracy, the prediction framework 1220 can use data sets of the various analyzed information described above to generate prediction information on whether malicious behavior occurs, the attack technique, the attacker group, and the like.

Based on the data set of analysis information produced by the analysis framework 1210, the prediction framework 1220 can run artificial intelligence analysis algorithms using the AI engine 1230 to generate prediction information on malicious behavior related to the input file.

The AI engine 1230 learns from the data set of analysis information through artificial-intelligence-based machine learning to generate additional analysis information, and the additional analysis information can in turn be used as new training data input to the artificial intelligence.

The prediction information generated by the prediction framework 1220 may include malicious code creator information, malicious code attack method information, malicious code attack group prediction, malicious code similarity prediction information, malicious code spread prediction information, and the like.

Having generated prediction information on various malicious code or attack behavior as described above, the prediction framework 1220 can store the generated prediction information in the database 2200. It can then provide the prediction information to the user on request or when signs of an attack appear.

As described, the server 2100 can provide cyber threat information related to the input file after post-processing the analysis information or prediction information stored in the database 2200.

The processor of the server 2100 determines the type of malicious code and its risk level based on the generated analysis information or prediction information.

The processor of the server 2100 can generate profiling information on malicious code. The database 2200 can store the results of analyzing the file itself through file analysis, as well as the results of additional and predictive analysis.

The cyber threat information provided to the user by the server 2100 may include the preprocessed information described above, the generated or identified analysis information, the generated prediction information, an aggregation of these pieces of information, or information determined on the basis of them.

Such integrated analysis information may be stored in a standardized format in the server or database in association with the corresponding file. Stored in a standardized format, it can be used to search or look up cyber threat information.

FIG. 11 shows an example for describing in detail the function of the static analysis module of the analysis framework according to the disclosed embodiment. The operation of the static analysis module is illustrated with reference to this figure as follows.

As disclosed, the analysis framework 15000 of the intelligence platform 10000 may include a static analysis module 15100.

The static analysis module 15100 can analyze the file itself; based on the file or its meta information, it can obtain information about the file that may be associated with malicious behavior, such as whether coding-based vulnerabilities exist, problems in the call structure of interfaces or functions, or the binary structure of the file.

The static analysis module 15100 may include a file structure analysis module 15101, a file pattern analysis module 15103, a file production information analysis module 15105, a file environment analysis module 15107, and a file-related analysis module 15109.

Within the static analysis module 15100, the file structure analysis module 15101 can analyze the basic structure information of an identified file in an environment in which the file is not executed.

For example, even when files are of different types, such as ELF (Executable and Linkable Format), PE (Portable Executable), or APK (Android Application Package), the file structure analysis module 15101 obtains or analyzes the file structure of each file or the information that can be extracted from that structure.
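A common first step in handling several file formats without executing them is to inspect the leading "magic" bytes. The sketch below is an assumption about how such identification could look, not the module's documented method; the function name `identify_format` is hypothetical.

```python
import io
import zipfile


def identify_format(data: bytes) -> str:
    """Guess the container format from leading bytes, without executing the file."""
    if data.startswith(b"\x7fELF"):
        return "ELF"
    if data.startswith(b"MZ"):
        # "MZ" marks the DOS header; a stricter check would also verify the
        # "PE\0\0" signature at the offset stored in that header.
        return "PE"
    if data.startswith(b"PK\x03\x04"):
        # An APK is a ZIP archive that contains AndroidManifest.xml.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                if "AndroidManifest.xml" in archive.namelist():
                    return "APK"
        except zipfile.BadZipFile:
            pass
        return "ZIP"
    return "unknown"


# Build a tiny in-memory archive that looks like an APK for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("AndroidManifest.xml", "<manifest/>")
apk_like = buffer.getvalue()
```

Once the format is known, a format-specific parser can extract headers, sections, or manifest entries for the structure analysis.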

The file pattern analysis module 15103 can perform pattern analysis of a file; without taking any action on the identified file, it opens the file itself and checks the various strings and the like that can be extracted, thereby obtaining the pattern of the file.
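String extraction of this kind can be sketched as a scan for runs of printable ASCII bytes, similar in spirit to the Unix `strings` utility. The function name `printable_strings` and the minimum length of 4 are assumptions for the example.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII characters found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Made-up bytes: two meaningful strings separated by non-printable bytes.
blob = b"\x00\x01http://example.test\x00ab\x00cmd.exe /c\xff"
found = printable_strings(blob)
```

Strings such as URLs, commands, or registry paths recovered this way can then be matched against known patterns.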

The file production information analysis module 15105 can obtain and analyze information related to the production of the input file. It can obtain unique information or meta information held by the file, for example file creator information and, for an executable file, code signing information.

The file environment analysis module 15107 can analyze environment information of the input file. It can obtain information such as the system environment components that the target file requires.

The file-related analysis module 15109 can analyze various other meta information related to the input file.

Without executing the input file, the static analysis module 15100 can convert the static information of the file itself, obtained and analyzed as disclosed, into a data format such as JSON (JavaScript Object Notation) and store it in the database 2200.
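The conversion to JSON might look like the following sketch. The field names in `static_record` are hypothetical; the text does not define the stored schema.

```python
import hashlib
import json


def static_record(name: str, data: bytes) -> dict:
    """Collect a few static properties of a file without executing it."""
    return {
        "file_name": name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "header_hex": data[:8].hex(),  # first bytes, useful for format checks
    }


# Serialize the record; sort_keys gives a stable field order for storage or diffing.
document = json.dumps(static_record("sample.bin", b"MZ\x90\x00payload"), sort_keys=True)
```

The resulting string can be stored as a document keyed by the file's hash.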

The server 2100 can provide the user with the static analysis information on the file stored in the database 2200.

The static analysis module 15100 of the analysis framework 15000 can compare the hash value of the input file with hash values of malicious code already stored in the database 2200 to analyze whether the input file is malicious code. The information analyzed on the malicious code of the input file can be stored in the database 2200.
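The hash comparison against stored values can be illustrated with an in-memory SQLite table standing in for the database 2200. The table name `malicious_hashes`, its columns, and the helper functions are assumptions for this sketch.

```python
import hashlib
import sqlite3
from typing import Optional

# In-memory stand-in for the database of known malicious hashes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE malicious_hashes (sha256 TEXT PRIMARY KEY, detection_name TEXT)"
)


def register(db: sqlite3.Connection, data: bytes, detection_name: str) -> None:
    """Store the SHA-256 of a sample that was judged malicious."""
    digest = hashlib.sha256(data).hexdigest()
    db.execute("INSERT OR REPLACE INTO malicious_hashes VALUES (?, ?)",
               (digest, detection_name))


def lookup(db: sqlite3.Connection, data: bytes) -> Optional[str]:
    """Return the detection name if the file's hash is already known, else None."""
    digest = hashlib.sha256(data).hexdigest()
    row = db.execute(
        "SELECT detection_name FROM malicious_hashes WHERE sha256 = ?", (digest,)
    ).fetchone()
    return row[0] if row else None


register(conn, b"evil-bytes", "Trojan.Sample")
known = lookup(conn, b"evil-bytes")
unknown = lookup(conn, b"benign-bytes")
```

A hit lets the static analysis return a verdict immediately; a miss is the case in which further analysis is needed.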

When the input file is mobile data, the static analysis module 15100 of the analysis framework 15000 can extract code information of suspected mobile malicious code from the input file. The code information of the suspected malicious code may include hash information, code size information, file header information, identifiable string information contained in the code, operating platform information, and the like.

The static analysis module 15100 of the analysis framework 15000 can detect whether malicious code is present in the file based on the analysis information it has produced. The static analysis information related to the detected malicious code can be stored in the database 2200.

FIG. 12 shows an example for describing in detail the function of the dynamic analysis module of the analysis framework according to the disclosed embodiment. The operation of the dynamic analysis module is illustrated with reference to this figure as follows.

The analysis framework 15000 of the illustrated intelligence platform 10000 may include a dynamic analysis module 15200. The dynamic analysis module 15200 can obtain dynamic analysis information from the result data of executing a file, identified on the basis of at least one of the preprocessed file information or the static analysis information, in an execution environment.

The dynamic analysis module 15200 can detect vulnerable or dangerous anomalies by analyzing various input and output data in the environment where the file is running, or by analyzing changes in interaction with the execution environment when the file is executed. The dynamic analysis module 15200 can create a virtualized environment or the like and execute the file directly in it to analyze whether anomalies occur.

The dynamic analysis module 15200 of the analysis framework 15000 may include an environment preparation module 15201, a file execution module 15203, a behavior collection module 15205, an analysis result aggregation module 15207, and an analysis environment restoration module 15209.

The environment preparation module 15201 creates and prepares a dynamic analysis environment for executing the executable file related to the input file. Once the type of the executable file has been identified, the environment preparation module 15201 can identify which execution environment each file type requires. For example, it can identify whether a file runs on a Windows operating system, a Linux operating system, or a mobile device operating system. The environment preparation module 15201 can then prepare the identified environment for executing the executable file.

The file execution module 15203 executes the file in the analysis environment prepared by the environment preparation module 15201 to determine whether the executable file contains malicious code.

The behavior collection module 15205 can collect events that occur in the system while the file runs in the execution environment in order to obtain dynamic analysis information. For example, the behavior collection module 15205 can collect events concerning the file itself, processes, memory, the registry, and the network, or events that change the settings of each of these systems.

The analysis result aggregation module 15207 analyzes the events collected by the behavior collection module 15205 individually or in aggregate.

After the collected results have been aggregated, the analysis environment restoration module 15209 restores the environment for dynamic analysis.
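The five stages above (prepare, execute, collect, aggregate, restore) can be sketched as a single function. This is a structural illustration only: the temporary directory is a stand-in for a real sandbox or virtual machine and provides no isolation, and the `execute` callable is a hypothetical hook that would run the sample and yield observed events.

```python
import shutil
import tempfile
from collections import Counter
from pathlib import Path


def run_dynamic_analysis(sample_name: str, execute) -> dict:
    """Run the prepare/execute/collect/aggregate/restore cycle once."""
    # 1. Environment preparation (placeholder for creating a VM or emulator).
    workdir = Path(tempfile.mkdtemp(prefix="sandbox-"))
    try:
        # 2. File execution and 3. behavior collection, both delegated to the hook.
        events = list(execute(workdir / sample_name))
        # 4. Result aggregation: count events per category.
        summary = Counter(kind for kind, _detail in events)
        return {"sample": sample_name, "events": events, "summary": dict(summary)}
    finally:
        # 5. Environment restoration happens even if execution raised an error.
        shutil.rmtree(workdir, ignore_errors=True)


seen = {}


def fake_execute(path: Path):
    """Stand-in for real execution: records the sandbox path and returns canned events."""
    seen["dir"] = path.parent
    seen["existed"] = path.parent.is_dir()
    return [("process", "spawn child"),
            ("network", "connect 203.0.113.9:443"),
            ("process", "exit")]


report = run_dynamic_analysis("sample.bin", fake_execute)
```

Placing the restoration step in a `finally` clause mirrors the requirement that the analysis environment be restored after every run, whether or not the sample behaved as expected.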

The dynamic analysis module 15200 can store the results obtained in this way in the database 2200 as dynamic analysis information corresponding to the file or to the malicious code in the file.

An example in which the dynamic analysis module 15200 collects and analyzes dynamic analysis information according to the above embodiment is briefly described as follows.

As one embodiment of dynamic analysis, when the input file is identified as a file that runs on a mobile device operating system, the dynamic analysis module 15200 can create, for the file, an emulator or virtualized environment configured identically to a mobile terminal or mobile terminal environment. The dynamic analysis module 15200 can then execute the file directly in the created emulator or virtualized environment. The dynamic analysis module 15200 can extract and record all changes that occur in the terminal after the suspected mobile malicious code in the file is executed, that is, behavior information. The behavior information may include event information such as process, file, memory, and network information, even when the operating system (OS) environment of the terminal differs.

As another embodiment of dynamic analysis, even when the hash value of the input file was not extracted during preprocessing but was extracted at the user terminal, the dynamic analysis module 15200 can receive the hash value extracted at the terminal through the intelligence platform 10000.

If the hash value of the file is not already stored in the database 2200, the dynamic analysis module 15200 can execute the received file in a virtual or real operating system, collect in real time the behavior that occurs during execution, and compare the collected dynamic analysis information with information already stored in the database 2200.

If the comparison result exceeds a predefined risk level, the input file can be judged to contain malicious code, and the dynamic analysis module 15200 can store the hash value of the file corresponding to the malicious code in the database 2200. The stored malicious hash value can later be used for static analysis and the like.
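The threshold decision can be pictured with the sketch below. The behavior names, weights, and threshold value are invented for illustration; the text does not specify how the risk level is computed or what value is predefined.

```python
# Hypothetical weights per observed behavior and a hypothetical threshold.
WEIGHTS = {
    "process_injection": 40,
    "c2_connection": 35,
    "registry_autorun": 25,
    "file_drop": 10,
}
RISK_THRESHOLD = 60

# Stand-in for the stored malicious hashes that static analysis can reuse later.
known_malicious_hashes = set()


def risk_score(observed) -> int:
    """Sum the weights of the distinct behaviors observed during execution."""
    return sum(WEIGHTS.get(behavior, 0) for behavior in set(observed))


def judge(sha256: str, observed) -> bool:
    """Flag the file as malicious when its score exceeds the threshold."""
    malicious = risk_score(observed) > RISK_THRESHOLD
    if malicious:
        known_malicious_hashes.add(sha256)
    return malicious


flagged = judge("ab" * 32, ["c2_connection", "process_injection"])
cleared = judge("cd" * 32, ["file_drop", "file_drop"])
```

Storing the hash on a positive verdict is what lets a later static check recognize the same file without executing it again.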

However, when the platform and server that perform dynamic analysis are stopped, dynamic analysis may take a very long time, and dynamic analysis may also be impossible when the behavior in question has been blocked in advance.

When analyzing network behavior, the dynamic analysis module 15200 according to the embodiment can extract and analyze information such as the command and control server (C&C server) used by the malicious code, a download server for downloading additional malicious code, or communication packets through which pieces of malicious code exchange information with one another or with a hacker.

The dynamic analysis module 15200 disclosed here can allow dynamic analysis to be performed even when the server 2100 has stopped operating.

For example, a network connection induction device (not shown) may be placed between a client terminal infected with malicious code and the intelligence platform 10000 or the server 2100 to process the terminal's connection requests so that dynamic analysis can proceed.

The network connection induction device (not shown) can receive a connection request from the terminal and forward it to the C&C server that triggers the malicious code behavior.

If the network connection induction device does not receive a response packet from the C&C server within a certain time, it transmits a separate virtual response packet together with a connection request to the terminal.

가상의 응답 패킷을 이용하는 예는 가상의 응답 패킷 TCP 세션을 생성하기 위한 패킷 형식이면 충분하다. 악성 코드가 사용하는 일반적인 TCP (Transmission Control Protocol) 프로토콜은 TCP 세션만 생성하도록 상기 클라이언트 단말이 전송하는 데이터 패킷을 생성할 수 있다. 그리고 상기 데이터 패킷으로부터 악성 코드의 동적 분석에 필요한 중요 정보들을 추출할 수 있다. 이와 같이 하면 관리 서버가 동작하지 않더라도 네트워크 접속 유도 장치의 동작을 이용하여 동적 분석을 수행할 수 있다.For an example of using a virtual response packet, a packet format for creating a virtual response packet TCP session is sufficient. The general TCP (Transmission Control Protocol) protocol used by malicious code can generate data packets transmitted by the client terminal to create only a TCP session. And important information required for dynamic analysis of malicious code can be extracted from the data packet. In this way, even if the management server is not operating, dynamic analysis can be performed using the operation of the network connection guidance device.
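The timeout fallback described above can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the function name `relay_with_fallback`, the `forward_to_c2` callable, the placeholder payload `VIRTUAL_RESPONSE`, and the default timeout are all assumptions introduced here, and no real network traffic is involved.

```python
import queue
import threading
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for a packet that is just enough to open a TCP session.
VIRTUAL_RESPONSE = b"VIRTUAL-RESPONSE"


def relay_with_fallback(
    request: bytes,
    forward_to_c2: Callable[[bytes], Optional[bytes]],
    timeout_s: float = 0.2,
) -> Tuple[bytes, bool]:
    """Forward a connection request to the C&C server.

    Returns (response, is_virtual). If the server fails or does not answer
    within timeout_s, a virtual response is returned instead so that the
    infected terminal keeps talking and its packets can be captured.
    """
    result: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=1)

    def worker() -> None:
        try:
            result.put(forward_to_c2(request))
        except OSError:
            # Connection refused, unreachable host, and similar failures.
            result.put(None)

    threading.Thread(target=worker, daemon=True).start()
    try:
        response = result.get(timeout=timeout_s)
    except queue.Empty:
        response = None  # the C&C server did not answer in time

    if response is None:
        return VIRTUAL_RESPONSE, True
    return response, False
```

A caller would capture whatever the terminal sends after receiving the returned response; when `is_virtual` is true, that traffic was obtained without a live C&C server.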

FIG. 13 shows an example for describing in detail the function of the in-depth analysis module of the analysis framework according to the disclosed embodiment. The operation of the in-depth analysis module is described below with reference to this drawing.

The analysis framework 15000 of the intelligence platform 10000 may include an in-depth analysis module 15300. The in-depth analysis module 15300 can disassemble an executable file contained in the received file and analyze it at the machine-language level to identify the attack technique that causes the malicious behavior, or the attacker behind it.

The in-depth analysis module 15300 can obtain in-depth analysis information on the basis of the static or dynamic analysis described above, and can also analyze an executable file, according to the analyst's interpretation criteria, using files that cause malicious behavior.

The in-depth analysis module 15300 may include analysis information on the file itself or information obtained by processing the file several times, and can generate in-depth analysis information based on information that is already stored.

The in-depth analysis module 15300 may also include a disassembling module 15301, a machine language code extraction module 15303, an attack behavior (TTP) identification module 15305, an attacker identification module 15307, and a taint analysis module 15309.

The in-depth analysis module 15300 of the analysis framework 15000 can use the AI engine 1230 to run an artificial-intelligence-based machine learning algorithm and obtain in-depth analysis information as the result.

The disassembling module 15301 disassembles the executable file when the input file contains one.

When an executable file is disassembled, its object code is converted into code of a specific format, for example assembly-language code.

The machine language code extraction module 15303 can extract disassembled code that includes OP-CODE (operation code) of a certain format and ASM-CODE. Here, OP-CODE of a certain format means the OP-CODE portion related to malicious code, and the disassembled code containing the extracted OP-CODE refers to the portion related to malicious code or malicious behavior.

The machine language code extraction module 15303 can convert the disassembled code into a data format of a certain form. Examples of this conversion are disclosed below.
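As a rough illustration of extracting OP-CODE and ASM-CODE pairs from disassembler output, the sketch below parses plain-text assembly lines. It assumes that "ASM-CODE" corresponds to the operand text of each instruction and that comments start with `;` and labels end with `:`; the `Instruction` type and `parse_disassembly` function are hypothetical names, not part of the disclosed module.

```python
from typing import List, NamedTuple


class Instruction(NamedTuple):
    opcode: str    # the operation, e.g. "mov"
    operands: str  # the remaining text, treated here as the ASM-CODE part


def parse_disassembly(text: str) -> List[Instruction]:
    """Split disassembler text into (opcode, operands) pairs."""
    instructions = []
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if not line or line.endswith(":"):    # skip blank lines and labels
            continue
        opcode, _, operands = line.partition(" ")
        instructions.append(Instruction(opcode.lower(), operands.strip()))
    return instructions
```

Each resulting pair can then be converted into whatever fixed data format the later analysis steps expect.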

Attack behavior can be identified by matching the disassembled code of an executable file to the detailed elements of attack behavior that cybersecurity expert groups commonly recognize.

The attack behavior (TTP) identification module 15305 can identify attack behaviors, attack techniques, and attack processes based on the extracted disassembled code or on data converted into a certain format.

The attack behavior (TTP) identification module 15305 can identify the attack behavior by matching a fuzzy hash value, computed from the disassembled code of the executable file, to the detailed elements of attack behavior that cybersecurity expert groups commonly recognize.

The attack behavior (TTP) identification module 15305 can identify the attack behavior (TTP) based on the database 2200, which stores the matching relationships between previously extracted disassembled code and each attack behavior (TTP), or on an external reference database. Using the machine learning of the AI engine 1230, the attack behavior (TTP) identification module 15305 can rapidly compute the matching similarity between fuzzy hash values of the extracted disassembled code, produced by an algorithm such as CTPH, and each attack behavior (TTP), and thereby classify the attack behavior or attack technique.

OP-CODE in disassembled code is the part of a machine-language instruction that specifies the operation to be performed. Disassembled code containing OP-CODE that causes a cybersecurity attack technique or attack behavior (Tactics, Techniques, and Procedures; hereinafter TTP) can have very similar values or formats for each such attack behavior. Therefore, analyzing the disassembled code, which is a combination of OP-CODE and ASM-CODE, makes it possible to distinguish a specific type of attack behavior.

For example, the attack behavior (TTP) identification module 15305 can convert the disassembled code extracted from an executable file into a hash value using fuzzy hashing or CTPH (context triggered piecewise hashes).
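To make the idea of a context triggered piecewise hash concrete, here is a deliberately simplified toy version. It is not the algorithm used by the embodiment or by tools such as ssdeep: the window size, trigger modulus, use of SHA-256 for piece hashes, and the names `toy_ctph` and `similarity` are all assumptions chosen for readability.

```python
import hashlib
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _piece_char(piece: bytes) -> str:
    """Map one piece of input to a single signature character."""
    return ALPHABET[hashlib.sha256(piece).digest()[0] % 64]


def toy_ctph(data: bytes, window: int = 4, modulus: int = 8) -> str:
    """Cut the input wherever a rolling value over the last `window` bytes
    hits a trigger, and emit one character per piece."""
    signature = []
    piece_start = 0
    for i in range(len(data)):
        rolling = sum(data[max(0, i - window + 1): i + 1])
        if rolling % modulus == modulus - 1:  # trigger depends on local context only
            signature.append(_piece_char(data[piece_start:i + 1]))
            piece_start = i + 1
    if piece_start < len(data):               # trailing piece without a trigger
        signature.append(_piece_char(data[piece_start:]))
    return "".join(signature)


def similarity(a: bytes, b: bytes) -> float:
    """Score two inputs in [0, 1] by comparing their signatures."""
    return SequenceMatcher(None, toy_ctph(a), toy_ctph(b)).ratio()
```

Because the cut points depend only on nearby bytes, a local change alters only the signature characters around it, which is why two variants of the same code can still receive similar signatures.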

Algorithms such as Perceptron, Logistic Regression, Support Vector Machines, and Multilayer Perceptron can be used as the machine learning algorithm of the AI engine 1230 that operates together with the attack behavior (TTP) identification module 15305. Ensemble machine learning algorithms or natural language processing algorithms can also be used in the AI engine 1230. Examples are disclosed in detail below.

As one example of a database in which a group of security experts has recorded attack behaviors, MITRE ATT&CK is a database of real-world security attack techniques and behaviors. The attack behavior (TTP) identification module 15305 makes it possible to identify the hash value, converted from the disassembled code containing the extracted OP-CODE, by a certain data set format or identifier in the MITRE ATT&CK database.

MITRE ATT&CK expresses the elements vulnerable to the attack techniques of hackers or malicious code as a matrix of CVE codes (Common Vulnerabilities and Exposures codes).

The embodiment identifies a specific attack behavior among many by analyzing the disassembled code, and matches the identified type of attack behavior to the elements of attack behavior recognized by expert organizations, so that the identification is expressed in professional, commonly recognized terms.

As described, OP-CODE is a machine-language instruction that causes a specific action, so the disassembled code of files that cause the same attack behavior can be very similar. However, an attack behavior and the disassembled code of the file that causes it do not correspond exactly, so the code may differ in places.

The attack behavior (TTP) identification module 15305 has the AI engine 1230 perform machine learning on code obtained by converting the extracted disassembled code into a certain format. Therefore, even when the OP-CODEs of files that cause the same malicious behavior are not completely identical, the attack behavior (TTP) identification module 15305 can identify the attack behavior by using machine learning to match the fuzzy hash value based on the extracted OP-CODE to the corresponding attack element.

The attack behavior (TTP) identification module 15305 can use an AI algorithm to match the similarity of disassembled code to attack techniques such as those in MITRE ATT&CK, and finally detect that the file is malicious code.
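A simple way to picture similarity-based matching that tolerates small code differences is shown below. This sketch substitutes Python's `difflib.SequenceMatcher` for the machine learning step, and the reference opcode sequences are invented placeholders rather than real signatures; only the identifiers T1055 (Process Injection) and T1059 (Command and Scripting Interpreter) are actual ATT&CK technique IDs. The name `match_technique` and the threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Placeholder reference data: the opcode sequences are made up for illustration.
EXAMPLE_REFERENCE: Dict[str, List[str]] = {
    "T1055": ["push", "push", "call", "mov", "push", "call", "test", "jz"],
    "T1059": ["lea", "push", "call", "add", "ret"],
}


def match_technique(
    opcodes: List[str],
    reference: Dict[str, List[str]],
    threshold: float = 0.6,
) -> Tuple[Optional[str], float]:
    """Return the best-matching technique ID and its similarity score,
    or (None, score) when nothing reaches the threshold."""
    best_id: Optional[str] = None
    best_score = 0.0
    for technique_id, ref_opcodes in reference.items():
        score = SequenceMatcher(None, opcodes, ref_opcodes, autojunk=False).ratio()
        if score > best_score:
            best_id, best_score = technique_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score
```

A sample that differs from a reference in a single instruction still scores well above the threshold, which mirrors the point that files causing the same behavior need not have identical OP-CODEs.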

A specific example of this is described later.

The attacker identification module 15307 can also identify the attacker who causes similar attack behavior, using the extracted disassembled code and the results of artificial-intelligence-based machine learning. Specific examples of attacker identification are likewise described later.

The taint analysis module 15309 can determine whether attack behavior is present, even for fileless malicious code, by analyzing the system's memory at a specific point in time.

The in-depth analysis module 15300 can store, in the database 2200, the in-depth analysis information corresponding to the file or to the malicious code identified from the file.

FIG. 14 shows an example for describing in detail the function of the correlation analysis module of the analysis framework according to the disclosed embodiment. The operation of the correlation analysis module is described below with reference to this drawing.

The analysis framework 15000 of the intelligence platform 10000 may include a correlation analysis module 15400. The correlation analysis module 15400 generates correlation analysis information so that the various pieces of analysis information produced by the analysis framework 15000 are expressed, based on cyber threat compromise information (indicators of compromise, IoC), as correlations among attackers or attack techniques.

The correlation analysis module 15400 may include a first correlation analysis module 15401 that analyzes the correlation of IP information between analysis information and attack behavior, a second correlation analysis module 15403 that analyzes the correlation of hostnames contained in emails, websites, and the like, a third correlation analysis module 15405 that analyzes the correlation of URLs, a fourth correlation analysis module 15407 that analyzes the correlation of the code signing (codesign) of files, and a fifth correlation analysis module 15407 that analyzes the correlations among attack techniques.
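One way to picture indicator-based correlation is to link any two analyzed samples that share an IP address, hostname, URL, or code-signing identity. The sketch below is an assumption about how such linking could be done, not the module's disclosed logic; the `correlate` function, the indicator type names, and the sample data (reserved documentation addresses and example domains) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

Indicator = Tuple[str, str]  # (indicator type, value), e.g. ("ip", "192.0.2.10")


def correlate(
    samples: Dict[str, Dict[str, Set[str]]]
) -> Dict[Tuple[str, str], Set[Indicator]]:
    """Map each pair of sample IDs to the indicators they have in common."""
    # Inverted index: indicator -> IDs of the samples in which it was observed.
    index: Dict[Indicator, Set[str]] = defaultdict(set)
    for sample_id, indicators in samples.items():
        for kind, values in indicators.items():
            for value in values:
                index[(kind, value)].add(sample_id)

    links: Dict[Tuple[str, str], Set[Indicator]] = defaultdict(set)
    for indicator, sample_ids in index.items():
        for a, b in combinations(sorted(sample_ids), 2):
            links[(a, b)].add(indicator)
    return dict(links)


EXAMPLE_SAMPLES = {
    "s1": {"ip": {"192.0.2.10"}, "hostname": {"c2.example.com"}},
    "s2": {"ip": {"192.0.2.10"}, "url": {"http://example.org/payload"}},
    "s3": {"hostname": {"c2.example.com"}, "codesign": {"Example Signer"}},
}
```

With this data, `s1` is linked to `s2` through the shared IP address and to `s3` through the shared hostname, while `s2` and `s3` have no direct link.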

The modules shown in this drawing are only examples. Even if not shown, the correlation analysis module 15400 may include modules that analyze various other correlations among the analyzed information in order to determine the attack technique and the attacker. For example, the correlation analysis module 15400 may include an integrated analysis module that collects or merges the generated correlation information.

The correlation analysis module 15400 can generate correlation analysis information used to infer the attack technique or the attacker accurately.

The correlation analysis module 15400 stores analysis information on received files or malicious code continuously and cumulatively and, each time a new file or piece of malicious code is analyzed, updates the related correlation analysis information and stores it in the database 2220.

The correlation analysis module 15400 can obtain cyber threat compromise information based on the various kinds of analysis information described above (static analysis information, dynamic analysis information, in-depth analysis information, and so on).

Using the cyber threat compromise information (IoC), the correlation analysis module 15400 can obtain various kinds of correlation information that identify the attack behavior or the attacker, and can store the resulting correlation analysis information in the database 2200.

As disclosed above, the analysis framework 15000 of the intelligence platform 10000 can consolidate the analyzed information and, through deduplication, standardization, and enrichment, store the standardized information in the database 2220.

The intelligence platform 10000 can store static analysis information, dynamic analysis information, in-depth analysis information, and correlation analysis information in the database 2200 in a standardized format in order to update or regenerate cyber threat information.

Here, the intelligence platform 10000 can remove the portions that are duplicated or shared among the pieces of analysis information and perform enrichment of data where it is lacking.

Through post-processing, the intelligence platform 10000 can store the standardized information in a format such as STIX or TAXII, which are standards designed to prevent cyber attacks.
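The deduplication, enrichment, and standardization steps can be illustrated with the following sketch. It is a simplified assumption about the post-processing, and the output only approximates a STIX 2.1 Indicator object (it is not validated against the specification). The record layout, `deduplicate`, and `to_stix_indicator` are hypothetical names.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, List, Optional

# STIX pattern templates for a few observable types.
_PATTERNS = {
    "ipv4": "[ipv4-addr:value = '{}']",
    "domain": "[domain-name:value = '{}']",
    "url": "[url:value = '{}']",
}


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge records that describe the same indicator.

    Later duplicates enrich the first record with fields it was missing."""
    merged: Dict[tuple, Dict[str, str]] = {}
    for record in records:
        key = (record["type"], record["value"])
        target = merged.setdefault(key, dict(record))
        for field, value in record.items():
            target.setdefault(field, value)
    return list(merged.values())


def to_stix_indicator(record: Dict[str, str], now: Optional[datetime] = None) -> dict:
    """Render one record as a simplified STIX 2.1 Indicator dictionary."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": "indicator--" + str(uuid.uuid4()),
        "created": timestamp,
        "modified": timestamp,
        "pattern": _PATTERNS[record["type"]].format(record["value"]),
        "pattern_type": "stix",
        "valid_from": timestamp,
    }
```

The resulting dictionaries can be serialized with `json.dumps` for storage or exchange; TAXII, by contrast, concerns how such objects are transported and is not modeled here.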

The server 2100 can provide the analysis information generated by the analysis framework 15000 as standardized cyber threat information in response to a user's query or in accordance with a service policy. The method of providing cyber threat information is also described in detail below.

Such cyber threat information may be provided at the user's request or in accordance with the service.

FIG. 15 shows an example for describing in detail the function of the prediction information generation module of the prediction framework according to the disclosed embodiment. The operation of the prediction framework is described below with reference to this drawing.

The prediction framework 17000 of the illustrated intelligence platform 10000 may include a prediction information generation module 17100. The prediction information generation module 17100 may include multiple information prediction modules, depending on the prediction information to be generated. In this example, the prediction information generation module 17100 includes a first information prediction module 1711, a second information prediction module 1713, a third information prediction module 1715, a fourth information prediction module 1717, and a fifth information prediction module 1719.

The prediction framework 17000 can use the analysis information generated by the analysis framework illustrated earlier (not shown here). The prediction framework 17000 processes data sets derived from the various kinds of analysis information into learning data sets for artificial intelligence, and the AI engine 1230 can perform artificial intelligence analysis based on the processed learning data sets.

Through the operation of the prediction framework 17000 and the AI engine 1230, various kinds of prediction information related to attack behavior can be generated.

In this example, the first information prediction module 1711 can generate prediction information on the creator of the malicious code through artificial intelligence learning. The second information prediction module 1713 can generate prediction information on the attack method of the malicious code, and the third information prediction module 1715 can generate prediction information on the attack group behind the malicious code. The fourth information prediction module 1717 generates malicious code similarity prediction information, and the fifth information prediction module 1719 generates malicious code spread prediction information.

Specific examples of generating prediction information are described later.

The prediction framework 17000 can store the generated prediction information in the database 2200.

For example, for a specific piece of malicious code, the prediction framework 17000 can generate malicious code risk prediction information, which predicts the risk posed by that malicious code itself, and store it in the database 2200.

The prediction framework 17000 can also store, in the database 2200, the prediction information on the creator, attack method, attack group, similarity, and spread predicted for a specific piece of malicious code.

As disclosed, the intelligence platform 10000 can generate the type of malicious code and its risk level based on the analysis information or the prediction information. The intelligence platform 10000 can also generate profiling information on the malicious code.

The intelligence platform 10000 can store, in the database 2200, the results of its own analysis of a file through file analysis as well as the results of additional and predictive analysis.

The cyber threat information provided by the intelligence platform 10000 may include the preprocessed information described above, the generated analysis information, the generated prediction information, an aggregation of these, or information produced by further post-processing based on them.

Accordingly, the cyber threat information provided may include integrated analysis information related to the input file.

The integrated analysis information provided by the illustrated intelligence platform 10000 can be stored by the server 2100 in the database 2200 in a standardized format in correspondence with the input file. Because it is stored in a standardized format, the integrated analysis information can be used to search for or look up cyber threat information.

Detailed embodiments for each processing step or module are disclosed below.

FIG. 16 shows an example of performing static analysis according to the disclosed embodiment. An example of a static analysis method according to the embodiment is described below with reference to the drawing.

As described, the file type can be identified in a preprocessing step before static analysis or in the initial stage of static analysis. For convenience, this drawing illustrates the case where ELF, EXE, and ARK files are identified as the file types, but application of the embodiment is not limited to these.
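File type identification of this kind is commonly done by inspecting the leading bytes of a file. The sketch below assumes that approach; the source does not say how the embodiment identifies types. The ELF and DOS/PE ("MZ") signatures are standard, while using the ZIP signature as a stand-in for the "ARK" archive type is purely an assumption, as is the function name `identify_file_type`.

```python
from typing import List, Tuple

# (leading bytes, label). The ZIP entry is an assumed stand-in for archives.
_SIGNATURES: List[Tuple[bytes, str]] = [
    (b"\x7fELF", "ELF"),
    (b"MZ", "EXE"),
    (b"PK\x03\x04", "ARCHIVE"),
]


def identify_file_type(data: bytes) -> str:
    """Return a coarse file type label based on the leading bytes."""
    for magic, label in _SIGNATURES:
        if data.startswith(magic):
            return label
    return "UNKNOWN"
```

The returned label can then select which parser or analysis path handles the file.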

Static analysis or detection of malicious code can operate by comparing the characteristics of the file itself, as above, with a database of previously identified patterns.

The static information extractor can obtain structural information by parsing the structure of the input file.

Patterns in the structure of the parsed file can be compared with the patterns of malicious code already stored in the database (DB) 2200.

The structural characteristics and patterns of the parsed file can serve as its meta information.

Although not shown in the example above, a machine learning engine can also be used in the static analysis of the disclosed embodiment. The database 2200 can store data sets containing the learned characteristics of previously stored malicious code.

The AI engine can learn, through machine learning, the meta information obtained from the parsed file as above and compare it with the data sets already stored in the database 2200 to determine whether the file is malicious code.

When static analysis determines that a file is malicious code, the structural characteristics of that file can in turn be stored as part of the data sets related to malicious code.

FIG. 17 shows an example of performing dynamic analysis according to the disclosed embodiment. An example of a dynamic analysis method according to the embodiment is described below with reference to the drawing.

As described, the file type can be identified in a preprocessing step before dynamic analysis or in the initial stage of dynamic analysis. As before, this example illustrates for convenience the case where ELF, EXE, and ARK files are identified as the file types.

The type of file subject to dynamic analysis can be identified through preprocessing. Each identified file can be executed in a virtual environment according to its kind and type.

For example, if the identified file is an ELF file, it can pass through a waiting queue and be executed on the operating system of a Linux virtual machine (VM).

Events that occur when the ELF file is executed can be recorded in a behavior log.

In this way, a Windows, Linux, or mobile operating system is built as a virtual system for each identified file type, and the execution events of the virtual system are then recorded.

The recorded execution events can then be compared with the execution events of malicious code already stored in the database 2200. Although not illustrated above, in dynamic analysis as well, the recorded execution events can be learned through machine learning to determine whether the learned data is similar to the execution events of previously stored malicious code.
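Comparing recorded execution events with stored ones can be pictured with a set-overlap measure. The sketch below uses Jaccard similarity as an assumed, simple stand-in for the comparison or learning step; the event tuples, family names, and the functions `jaccard` and `most_similar_family` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Event = Tuple[str, str]  # (event type, target), e.g. ("file_write", "/tmp/x")


def jaccard(a: Iterable[Event], b: Iterable[Event]) -> float:
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def most_similar_family(
    observed: Set[Event],
    known: Dict[str, Set[Event]],
    threshold: float = 0.5,
) -> Tuple[Optional[str], float]:
    """Return the stored family whose events overlap most with the observed
    behavior log, or (None, score) if no family reaches the threshold."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name, events in known.items():
        score = jaccard(observed, events)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

A learned model would replace this fixed measure in practice, but the inputs and outputs would look much the same: a behavior log in, a closest known behavior and a score out.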

Dynamic analysis requires a virtual environment to be built for each kind of file, so the analysis and detection system can become large.

FIG. 18 shows an example of performing in-depth analysis according to the disclosed embodiment. An example of an in-depth analysis method according to the embodiment is described below with reference to the drawing.

As described, the file type can be identified in a preprocessing step before in-depth analysis or in the initial stage of in-depth analysis. The disclosed example illustrates the case where the identified files are ELF, EXE, and ARK executable binary files.

Disassembling an executable binary file makes it possible to analyze the structure of its functions in terms of the instruction set of the CPU (Central Processing Unit).

Unlike dynamic analysis, in-depth analysis operates on code extracted by disassembling the binary file, so the analysis can be carried out with a relatively simple system. In-depth analysis can also perform artificial intelligence analysis, without a separate engine, on data produced by normalizing the extracted code.

In this drawing, the disassembled code is expressed as a combination of OP-CODE and ASM-CODE.

The embodiment can combine the two kinds of code, OP-CODE and ASM-CODE, and extract meaningful code blocks from the combined code.

A code block of disassembled code containing OP-CODE and ASM-CODE can be converted into a certain format in order to identify whether the code is related to malicious code, which malicious code it is, or which attacker developed it.

There are several ways to convert the data of a code block for this determination. The data conversion steps applied to the disassembled code can be chosen selectively according to data processing speed and accuracy, but this drawing shows only the normalization and vectorization steps.

Normalization and vectorization can be performed on the code blocks extracted from the combined OP-CODE and ASM-CODE.

That is, code blocks are extracted from the combination of OP-CODE and ASM-CODE in the binary code, the feature information of each code block is vectorized, and the result is compared with data learned from various feature information in order to identify attack behavior and the like.
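The normalization and vectorization steps can be sketched as follows. The specific choices here are assumptions, since the source does not define them: normalization replaces numeric constants and addresses with a generic `IMM` token, and vectorization counts opcode n-grams. The names `normalize` and `vectorize` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

_HEX = re.compile(r"\b0x[0-9a-f]+\b")
_DEC = re.compile(r"\b\d+\b")


def normalize(instruction: str) -> str:
    """Lower-case an instruction and mask constants so that the same logic
    compiled at different addresses yields the same text."""
    text = instruction.strip().lower()
    text = _HEX.sub("IMM", text)
    text = _DEC.sub("IMM", text)
    return re.sub(r"\s+", " ", text)


def vectorize(block: Iterable[str], n: int = 2) -> "Counter[Tuple[str, ...]]":
    """Turn a code block into a sparse vector of opcode n-gram counts."""
    opcodes = [normalize(line).split(" ", 1)[0] for line in block]
    return Counter(zip(*(opcodes[i:] for i in range(n))))
```

For instance, `call 0x401000` and `call 0x402000` normalize to the same text, so two builds of the same routine produce the same vector even though their addresses differ.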

Because the code blocks extracted in this way can all differ even for the same executable file, the embodiment can use machine learning or artificial intelligence (AI) to judge and classify the extracted code blocks as malicious code.

The embodiment then trains on the final data, after normalization and vectorization, using artificial intelligence. The learned data can be compared with the data on attack techniques (TTP) and on attackers or attack groups already stored in the database 2200 to obtain information such as whether the code is malicious.

실시 예는 악성 코드의 핵심 부분인 구성 요소를 MITRE ATT&CK 모델을 기반으로 분류하고 구분할 수 있다. In the embodiment, components that are a core part of malicious code can be classified and distinguished based on the MITER ATT&CK model.

이에 대한 구체적인 실시 예는 이하에서 더욱 상세하게 개시된다.Specific examples of this are disclosed in more detail below.

Figure 19 shows an example of matching attack techniques to code extracted from binary code according to the disclosed embodiment. A standardized model is used here as an example of how attack techniques are matched.

The MITRE ATT&CK® Framework is used as the example of a standardized model.

For example, what counts as “malicious behavior” in cybersecurity has often been interpreted differently from analyst to analyst, depending on each analyst's own insight.

Internationally, experts have made considerable efforts to standardize the “malicious behavior” that occurs on systems so that everyone interprets it in the same way. MITRE (https://attack.mitre.org), a non-profit research and development organization that carries out national-security-related work with support from the U.S. federal government, studied the definition of “malicious behavior” and, as a result, created and published the ATT&CK® Framework. The framework is designed so that everyone can define the same “malicious behavior” for a given cyber threat or piece of malware.

The MITRE ATT&CK® Framework (hereinafter MITRE ATT&CK®) organizes information on attackers' latest attack techniques; ATT&CK stands for Adversarial Tactics, Techniques, and Common Knowledge. MITRE ATT&CK® is standardized data produced by observing real cyber attacks, analyzing the tactics and techniques behind adversary behaviors, and classifying and cataloging information on the attack techniques of various attack groups.

MITRE ATT&CK® takes a slightly different perspective from the traditional cyber kill chain: it systematizes (patterns) threat tactics and techniques in order to improve the detection of advanced attacks. ATT&CK originally began at MITRE as an effort to document the TTPs, that is, the tactics, techniques, and procedures, of hacking attacks used against enterprise environments running the Windows operating system. It has since developed into a framework that can identify an attacker's actions by mapping TTP information based on analysis of the consistent behavior patterns that attackers exhibit.

The malicious behavior referred to in the disclosed embodiment can be expressed by matching malicious code to attack techniques on the basis of a standardized model such as MITRE ATT&CK®. Whatever standardized model is used, the malicious code can be identified and classified element by element and matched to attack identifiers.

The example in this figure shows conceptually how the malicious behavior of malicious code is matched to attack techniques based on the MITRE ATT&CK model.

An executable file (EXE) may contain various functions (Function A, B, C, D, E, …, N, …, Z) that are carried out when the file is executed. A function group containing at least one of those functions can carry out one tactic.

In the example of this figure, functions A, B, and C correspond to tactic A, and functions D, B, and F correspond to tactic B. Similarly, functions Z, R, and C correspond to tactic C, and functions K and F correspond to tactic D.
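
The correspondence in this example can be pictured as a lookup from function groups to tactics. The following is only a minimal sketch: the table mirrors the figure's example, while the name `match_tactics` and the rule that a tactic is reported only when every function in its group is present are assumptions made for illustration, not something the embodiment specifies.

```python
# Function groups from the figure's example (tactic name -> functions).
TACTIC_FUNCTIONS = {
    "Tactic A": {"A", "B", "C"},
    "Tactic B": {"D", "B", "F"},
    "Tactic C": {"Z", "R", "C"},
    "Tactic D": {"K", "F"},
}


def match_tactics(functions_in_file):
    """Return the tactics whose whole function group occurs in the file.

    Assumption for illustration: a tactic matches only if all of its
    functions are present; a real matcher could be more tolerant.
    """
    present = set(functions_in_file)
    return sorted(
        tactic
        for tactic, group in TACTIC_FUNCTIONS.items()
        if group <= present  # subset test
    )


# A file containing functions A, B, C, F, and K covers tactics A and D.
print(match_tactics(["A", "B", "C", "F", "K"]))
```

A function may belong to several groups (B and F do here), so one function can contribute to more than one tactic.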

The embodiment may associate the set of functions corresponding to each tactic with a specific portion of the disassembled code. The database stores the attack identifiers (T-IDs) of tactics, techniques, and procedures (TTPs) that can correspond to disassembled code already learned by artificial intelligence.

The attack identifiers (T-IDs) of the tactics, techniques, and procedures (TTPs) follow a standardized model; the example in this figure uses MITRE ATT&CK® as the standardized model of cyber threat information.

Accordingly, the embodiment can match the result data extracted from the disassembled code of a binary file to standardized attack identifiers. A more specific method of matching attack identifiers is disclosed below.

Figure 20 shows an example of matching code sets containing OP-CODE to attack techniques according to the disclosed embodiment.

Most artificial intelligence engines identify malicious code using a data set learned from various feature information of malicious code. This determines whether code is malicious, but it has been difficult to explain why the code is malicious. If, as illustrated, the code is instead mapped to the identifier of a standardized attack method (TTP), the threat elements it contains can be identified. The embodiment can therefore convey cyber threat information accurately to security administrators and enable them to manage it systematically over the long term.

When generating a data set for AI training to identify attack methods (TTPs) from disassembled code, the embodiment does not merely distinguish the identifier or label of each TTP but can also reflect, as an important factor, the characteristics of how the TTP was implemented.

Even when malicious code implements the same attack method (TTP), different developers cannot be expected to produce identical code. That is, the description of a TTP is written in human natural language, but the way it is implemented and coded differs from developer to developer.

These differences in coding stem from the developer's skill and from the developer's manner or habits of implementing program logic, and they appear as differences in the binary code, or in the OP-CODE and ASM-CODE obtained by disassembling it.

Therefore, if attack identifiers are assigned or matched merely according to the type of the resulting attack method (TTP), it is difficult to accurately identify the attacker or attacker group that created the malicious code.

Conversely, if modeling is performed with the characteristics of the disassembled OP-CODE and ASM-CODE reflected as important variables, it becomes possible to identify the developer of a specific piece of malware or attack tool, or even the tool that generated it automatically.

Using the unique characteristics of the combined disassembled OP-CODE and ASM-CODE, the disclosed embodiment can generate threat intelligence, which is of great importance in modern cyber warfare. That is, based on these unique characteristics, the embodiment can identify both how attack code or malicious code operates and who developed it and with what intent.

Based on the characteristics of the attacks that the attacker continues to carry out, vulnerable systems can then be reinforced, enabling an active and preemptive response to cybersecurity threats.

Under this concept, the embodiment yields results entirely different in performance from an approach that merely identifies the attack technique from the attack result on the basis of OP-CODE alone.

To accurately identify and classify the coding techniques used to implement an attack method (TTP), the embodiment may generate a data set of disassembled code based on the combined features of the disassembled OP-CODE and ASM-CODE. When a model is built to identify unique characteristics from this data set, it can identify not only the attack method (TTP) but also the developer's characteristics, that is, who the developer (or which automated authoring tool) is.

This figure shows an example of matching OP-CODE data sets, modeled as described above, to attack identifiers.

In this example, the first OP-CODE set (OP-CODE set #1) is matched to attack technique identifier T1011, and the second OP-CODE set (OP-CODE set #2) is matched to attack technique identifier T2013. The third OP-CODE set (OP-CODE set #3) can be matched to attack technique identifier T1488, and the N-th OP-CODE set (OP-CODE set #N) is matched to an arbitrary attack technique identifier T1XXX. The standardized model MITRE ATT&CK® expresses attack technique identifiers element by element in a matrix format, but the embodiment can additionally identify the attacker or the attack tool beyond the attack technique identifier.

For convenience, this figure shows OP-CODE data sets, but identifying attack techniques from data sets of disassembled code that contain both OP-CODE and ASM-CODE allows finer-grained attack techniques to be identified than OP-CODE data sets alone.

Depending on the embodiment, analyzing combinations of disassembled-code data sets can identify not only the attack technique identifier but also the attacker or attack group.

Accordingly, the embodiment can provide technology that is more advanced than existing technology in terms of acquiring intelligence information, and can also solve problems that the conventional security field has not been able to solve.

Obtaining accurate intelligence information in such a complex environment requires fast data processing and algorithms. Additional embodiments in this regard, and their performance, are disclosed below.

Figure 21 illustrates a flow of processing cyber threat information according to the disclosed embodiment.

This figure is described taking as an example the case where the identified file is an executable binary file such as ELF, EXE, or ARK. The processing in this step relates to the in-depth analysis disclosed above.

First, a detailed example of the first step, extracting disassembled code that contains OP-CODE, is as follows.

Compiling source code produces an executable file.

In each operating system (OS) environment in which it is to run, the original source code is turned by a compiler into new data in a form suited to machine processing. The resulting binary data is not in a form suited to human reading, so a person cannot directly read a file built as an executable and grasp its internal logic.

However, for purposes such as vulnerability analysis of security systems, the reverse process is performed to interpret or analyze the machine code; as described above, this is called disassembly. Disassembly can be performed to suit the central processing unit (CPU) and word size (32-bit, 64-bit, etc.) of the particular operating system.

Disassembling each of the ELF, EXE, and ARK executables given as examples yields disassembled assembly code.

The disassembled code may contain code in which OP-CODE and ASM-CODE are combined.

The embodiment may analyze an executable file with a disassembly tool and extract the OP-CODE and ASM-CODE from it.

The disclosed embodiment does not use the extracted OP-CODE and ASM-CODE as they are, but reorganizes them function by function to rebuild the OP-CODE array. When the OP-CODE array is rebuilt, the original binary data can be included as well so that the data can be fully interpreted. Through this rearrangement, the new combination of OP-CODE and ASM-CODE provides the base data for identifying not only the attack technique but also the attacker.

The second step, processing the assembly data (ASM), is described in detail as follows.

Assembly data processing separates out only the OP-CODE and the necessary ASM-CODE, and then analyzes similarity and extracts information from data that has been reorganized into a form that is easy for a person or a computer to read.

In this step, the disassembled assembly data can be converted into a fixed data format.

These data format conversions are intended to increase processing speed and analysis accuracy; the conversion methods described below need not all be applied and may be applied selectively.

Various functions can be extracted from the assembly data of the rearranged combination of OP-CODE and ASM-CODE.

A single disassembled executable file contains, on average, about 7,000 to 12,000 functions, depending on the size of the program. Some of these functions were implemented by the programmer as needed, and others are provided by default by the operating system.

Analysis of actual ASM-CODE shows that about 87% to 91% of the functions are provided by default by the operating system (OS supported), and that only about 10% of the ASM-CODE was actually written by the programmer for the program logic. The functions provided by the operating system are default functions, identified by function name, that are contained in the various DLL and SO files installed by default with the operating system. These OS-provided functions can be analyzed and stored in advance and then filtered out of the data to be analyzed. Isolating only the code that needs to be analyzed in this way increases the speed and performance of subsequent processing.

To perform functional analysis of a program accurately, the embodiment may split the OP-CODE into functions and process it function by function. The embodiment may take the functions contained in the assembly code as the minimum unit of all semantic analysis.

To increase analysis performance and processing speed, the embodiment may filter out operator-level functions whose meaning is unclear and may also exclude functions whose amount of information is below a threshold. Whether and how strictly functions are filtered can be set differently depending on the embodiment.
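
The two filters described above (dropping OS-provided functions and dropping functions that carry too little information) can be sketched as follows. This is a rough illustration under stated assumptions: `KNOWN_OS_FUNCTIONS`, `select_functions`, and the use of instruction count as the measure of "amount of information" are all hypothetical choices, not details given by the embodiment.

```python
# Hypothetical list of function names already known to ship with the OS.
KNOWN_OS_FUNCTIONS = {"memcpy", "strlen", "CreateFileW"}


def select_functions(functions, min_instructions=3):
    """Keep only the functions worth analyzing.

    `functions` maps a function name to its list of instructions.
    A function is dropped if it is a known OS-provided function or if it
    has fewer than `min_instructions` instructions (an assumed stand-in
    for the information threshold).
    """
    return {
        name: body
        for name, body in functions.items()
        if name not in KNOWN_OS_FUNCTIONS and len(body) >= min_instructions
    }


sample = {
    "memcpy": ["push", "mov", "rep movsb", "ret"],      # OS-provided
    "sub_401000": ["push", "mov", "call", "ret"],       # programmer code
    "sub_401020": ["ret"],                              # too little information
}
print(sorted(select_functions(sample)))
```

Raising `min_instructions` trades coverage for speed, which matches the statement that the degree of filtering can be set per embodiment.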

The embodiment may remove, from the OP-CODE organized by function, the comment data that the disassembler adds to its output. The embodiment may also rearrange the disassembled code.

For example, the disassembled code output by the disassembler may have the order [ASM-CODE, OP-CODE, parameter].

The embodiment may remove the parameter data from the assembly data and reorder the disassembled code from the above order into the order [OP-CODE, ASM-CODE]. Disassembled code reordered in this way is easy to normalize or vectorize, and processing speed can be increased markedly.
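
The reordering step can be illustrated with a short sketch. It assumes each disassembled instruction has already been parsed into a three-field tuple in the order [ASM-CODE, OP-CODE, parameter]; the function name `reorder` and the sample values (which treat the ASM-CODE field as hexadecimal instruction bytes and the OP-CODE field as a mnemonic) are assumptions for illustration only.

```python
def reorder(records):
    """Drop the parameter field and emit (op_code, asm_code) pairs.

    `records` is a list of (asm_code, op_code, parameter) tuples, i.e. the
    order in which the disassembler is assumed to output each instruction.
    """
    return [(op_code, asm_code) for asm_code, op_code, _parameter in records]


# Hypothetical disassembler output for a function prologue.
disassembled = [
    ("55", "push", "ebp"),
    ("8bec", "mov", "ebp, esp"),
]
print(reorder(disassembled))
```

Discarding the parameter field removes values such as addresses and register operands that vary between builds, which is one plausible reason the reordered pairs are easier to normalize.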

In particular, within disassembled code in the [OP-CODE, ASM-CODE] form, the ASM-CODE parts differ in length and are therefore not easy to compare with one another. To confirm the uniqueness of the assembly data, the data can be normalized into a data format of a specific size. For example, to confirm the uniqueness of disassembled code in the [OP-CODE, ASM-CODE] form, the embodiment may convert the data part into a fixed-length data set that is easy to normalize, such as CRC (cyclic redundancy check) data.

As one example, in disassembled code in the [OP-CODE, ASM-CODE] form, the OP-CODE part may be converted into CRC data of a first length and the ASM-CODE part into CRC data of a second length.
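
As a minimal sketch of this normalization, the example below maps the OP-CODE part to a 16-bit CRC and the ASM-CODE part to a 32-bit CRC using Python's standard library. The specific lengths (16 and 32 bits), the particular CRC variants, and the name `normalize_pair` are assumptions; the embodiment only states that a first and a second length are used.

```python
import binascii
import zlib


def normalize_pair(op_code, asm_code):
    """Map an (OP-CODE, ASM-CODE) pair to fixed-length hex strings.

    Assumed lengths: 16 bits for the OP-CODE part (CRC-CCITT via
    binascii.crc_hqx) and 32 bits for the ASM-CODE part (zlib.crc32).
    """
    op_crc = binascii.crc_hqx(op_code.encode("utf-8"), 0)
    asm_crc = zlib.crc32(asm_code.encode("utf-8"))
    # Zero-padded hex keeps every record the same width regardless of input.
    return f"{op_crc:04x}", f"{asm_crc:08x}"


short = normalize_pair("push", "55")
long_ = normalize_pair("mov", "c7850cffffff00000000")
print(short, long_)
```

Both results have the same width even though the second ASM-CODE value is ten times longer, which is the property that makes the records directly comparable.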

The normalized data converted from the OP-CODE and from the ASM-CODE each preserve the uniqueness of the respective code before conversion. To speed up similarity determination on the normalized data, which retains this uniqueness, the normalized data may be vectorized.

As described, normalization and vectorization, as data conversion steps, may be applied selectively to increase processing speed and analysis accuracy.

Detailed examples of the normalization and vectorization processes are disclosed again below.

The third step, the data analysis process that analyzes the disassembled code, is described in detail as follows.

In this process too, various data format conversions can be used to increase processing speed and analysis accuracy; the conversion methods disclosed below need not all be applied, and some of them may be applied selectively.

In this step, the similarity to malicious code is analyzed on the basis of the per-function data sets in the converted disassembled code.

To compute the similarity between codes, the embodiment may convert the vectorized OP-CODE and ASM-CODE data sets back into byte data.

From the reconverted byte data, block-level hash values can be extracted, and a hash value for the entire data can be generated from the block-level unique values.

So that block-level comparison of the byte data can be performed efficiently, hash values are extracted in a designated unit, giving a unique value for each block, and these values are compared.

A fuzzy hashing technique can be used to extract hash values in the designated unit and to compare the similarity of two or more pieces of data. For example, the embodiment may use CTPH (Context Triggered Piecewise Hashing), one form of fuzzy hashing, to compare the hash values extracted block by block with the hash values of some units of previously stored malicious code and thereby determine similarity.
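
The idea behind context-triggered piecewise hashing can be sketched in a few lines. This is a deliberately simplified illustration, not the algorithm used by any particular fuzzy-hashing tool: the boundary rule (sum of the last `window` bytes), the per-piece hash (CRC-32 reduced to one of 64 characters), and the use of `difflib.SequenceMatcher` to score two signatures are all assumptions chosen to keep the example self-contained.

```python
import zlib
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def piecewise_signature(data, window=4, trigger=8):
    """Cut `data` into pieces at content-defined boundaries and emit one
    character per piece."""
    signature = []
    start = 0
    for i in range(len(data)):
        # The "context" is the last `window` bytes ending at position i.
        context = data[max(0, i - window + 1): i + 1]
        at_boundary = sum(context) % trigger == trigger - 1
        if at_boundary or i == len(data) - 1:
            piece = data[start: i + 1]
            signature.append(ALPHABET[zlib.crc32(piece) % 64])
            start = i + 1
    return "".join(signature)


def similarity(a, b):
    """Score two byte strings in [0, 1] by comparing their signatures."""
    return SequenceMatcher(
        None, piecewise_signature(a), piecewise_signature(b)
    ).ratio()


original = bytes(range(256)) * 4
print(similarity(original, original))
```

Because boundaries depend on local content rather than fixed offsets, inserting or changing a few bytes tends to alter only the nearby signature characters, which is what allows partially matching code to score as similar.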

In summary, based on the fact that the combined OP-CODE and ASM-CODE implements specific functionality in units of functions, the embodiment generates a unique value for the disassembled OP-CODE and ASM-CODE in order to confirm the uniqueness of each piece of functionality. Based on this, block-level unique values are extracted from the OP-CODE and ASM-CODE of the disassembled code and the similarity calculation is performed.

A detailed example of extracting block-level hash values is also disclosed below with reference to the drawings.

As described, the embodiment may use block-level hash values when performing the similarity calculation.

The extracted block-level hash values consist of string data (byte data), and the similarity between codes can be compared once this string data is expressed as numerical values. Comparing the bytes of billions of disassembled-code data sets directly could take an enormous amount of time to produce a single similarity result.

Therefore, the embodiment may convert the string data (byte data) into numerical values; with such values, similarity analysis can be performed quickly using artificial intelligence techniques.

The embodiment may vectorize the string data (byte data) of the extracted block-level hash values as N-gram data. The embodiment in this figure illustrates the case in which block-level hash values are vectorized as 2-gram data to increase computation speed. The block-level hash values need not be converted into 2-gram data, however; they may also be vectorized as 3-gram, 4-gram, …, N-gram data. As N increases, the N-gram data reflects the characteristics of the data more accurately, but the processing time also increases.

As described, the byte conversion, the hash conversion, and the N-gram conversion may be applied selectively to increase processing speed and analysis accuracy.

The 2-gram data given as an example has at most 65,536 dimensions. As the dimensionality of the training data increases, the distribution of the data becomes sparse, which can adversely affect classification performance. Higher dimensionality also increases the time and space complexity of learning from the data.
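
The 65,536 figure follows from the number of possible byte pairs (256 × 256). A small sketch of byte N-gram counting makes this concrete; the function name `ngram_vector` and the choice of a sparse `Counter` instead of a dense array are assumptions for illustration.

```python
from collections import Counter


def ngram_vector(data, n=2):
    """Count overlapping byte n-grams in `data`.

    Each n-gram is keyed by its big-endian integer value, so for n=2 the
    key is first_byte * 256 + second_byte, in the range 0..65535.
    Only n-grams that actually occur are stored (a sparse vector).
    """
    return Counter(
        int.from_bytes(data[i: i + n], "big")
        for i in range(len(data) - n + 1)
    )


vector = ngram_vector(b"ABAB")
# "AB" occurs twice and "BA" once; A is byte 65 and B is byte 66.
print(vector[65 * 256 + 66], vector[66 * 256 + 65])
```

With `n=3` the key space grows to 256³ (about 16.7 million) possible values, which illustrates why larger N captures more detail at a higher processing cost.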

To address these problems, the embodiment may apply any of various natural language processing algorithms based on text representations. This embodiment is described using the TF-IDF (Term Frequency-Inverse Document Frequency) technique as an example.

As one example of processing the similarity of the training data at this stage, the TF-IDF technique can be used to select the features (patterns) that are meaningful for determining an attack identifier or class (T-ID) among high-dimensional data. TF-IDF is generally used by search engines to find highly similar documents, and it is calculated with the following equations.

[Equation 1]

$$\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}$$

Here, $\mathrm{tf}(t, d)$ is the frequency rate of a specific word $t$ in a specific document $d$, where $f_{t,d}$ is the number of times $t$ occurs in $d$; the more often the word is repeated, the higher the value.

[Equation 2]

$$\mathrm{idf}(t, D) = \log \frac{|D|}{|\{d \in D : t \in d\}|}$$

$\mathrm{idf}(t, D)$ is the (log-scaled) reciprocal of the proportion of documents $d$ in the collection $D$ that contain the specific word $t$; the more commonly a word appears across documents, the lower the value.

[Equation 3]

$$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$$

$\mathrm{tfidf}(t, d, D)$ is the product of $\mathrm{tf}(t, d)$ and $\mathrm{idf}(t, D)$, and quantifies which word is more relevant to which document.

The TF-IDF method uses the term frequency of Equation 1 and the inverse document frequency of Equation 2 (a particular reciprocal of the document frequency) to weight each word according to its importance in the document-term matrix, as in Equation 3.

In the embodiment, a pattern in block-level code plays the role of a word, and the attack identifier (T-ID) plays the role of the document that contains it. Accordingly, calculating TF-IDF for the patterns extracted from block-level code makes it possible to extract the patterns that appear frequently within a specific attack identifier (T-ID), or to remove code whose patterns are unrelated to a specific attack identifier (T-ID).

For example, if a particular pattern A appears in every attack identifier (T-ID), the TF-IDF value of pattern A will be low, and the pattern can be judged unnecessary for distinguishing actual attack identifiers (T-IDs). Natural-language similarity algorithms such as TF-IDF may also be carried out through the training of a machine learning algorithm.

실시 예는 이러한 불필요한 패턴을 제거하여 불필요한 연산을 줄이고 추론 시간을 단축시킬 수 있다.Embodiments can reduce unnecessary calculations and shorten inference time by removing these unnecessary patterns.
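The pattern-filtering idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corpus, the pattern names (`A` to `G`), the normalized-count form of the term frequency, and the natural-log form of the inverse document frequency are all hypothetical choices, not values taken from the embodiment.

```python
import math
from collections import Counter

def tf(pattern, doc):
    """Equation 1 (assumed form): share of the document occupied by `pattern`."""
    return Counter(doc)[pattern] / len(doc)

def idf(pattern, docs):
    """Equation 2 (assumed form): log of (documents / documents containing `pattern`)."""
    containing = sum(1 for doc in docs.values() if pattern in doc)
    return math.log(len(docs) / containing)

def tfidf(pattern, tid, docs):
    """Equation 3: product of term frequency and inverse document frequency."""
    return tf(pattern, docs[tid]) * idf(pattern, docs)

# Hypothetical toy corpus: each "document" lists the patterns seen for one T-ID.
docs = {
    "T1027": ["A", "B", "B", "C"],
    "T1055": ["A", "D", "D", "E"],
    "T1059": ["A", "F", "G", "G"],
}

scores = {
    tid: {p: tfidf(p, tid, docs) for p in set(doc)}
    for tid, doc in docs.items()
}

# Pattern "A" occurs in every T-ID, so its idf is log(1) = 0 and it is dropped.
kept = {tid: sorted(p for p, s in ps.items() if s > 0) for tid, ps in scores.items()}
print(kept)
```

Because pattern `A` is present in every document, its score is zero for every T-ID and it is filtered out, which mirrors the "pattern A" case described above.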

In detail, the embodiment can apply similarity algorithms based on various natural-language-processing text representations to the converted block-level code data. Removing code whose patterns are unrelated to attack identifiers by means of the similarity algorithm can greatly shorten both the algorithms performed afterwards and the machine-learning classification process.

The embodiment can perform classification modeling to classify attack identifier patterns based on the features or patterns in block-level code. The embodiment learns whether a vectorized block-level code feature or pattern matches the pattern of a known attack identifier, and classifies it into a precise attack technique or implementation method. For code judged to contain a pattern similar to malicious code, the embodiment uses several ensemble machine learning models to classify the exact attack implementation method, that is, the attack identifier and the attacker.

An ensemble machine learning model is a technique that builds multiple classification nodes from the prepared data and combines the predictions of those nodes to produce an accurate prediction. As described above, ensemble machine learning models are run to classify which attack implementation method, that is, which attack identifier or attacker, the features or patterns of words in block-level code correspond to.

When ensemble machine learning models are applied, a threshold for classifying the prepared data can be set to prevent over-detection and false detection. Only data at or above the set detection threshold is classified, and data that does not reach the detection threshold may be left unclassified.

As described, several data format conversions can be used to speed up data processing and to analyze the data accurately. Specific embodiments that apply the data conversion methods described above to ensemble machine learning models are described in detail below.

The fourth step, profiling, which identifies the attack technique (TTP) and assigns labels, is described as follows.

An example was described above in which, based on previously analyzed attack code or malicious code, the disassembled code of input binary data, including OP-CODE and ASM-CODE, is vectorized through feature extraction.

The vectorized data is learned through machine learning modeling and then classified into specific attack techniques, and the classified code is labeled during the profiling process.

Labeling can be performed in two main parts: one attaches the unique index of the attack identifier defined in a standardized model, and the other records information about the user who wrote the attack code.

Labels are assigned according to the attack identifiers (T-IDs) of a standardized model, for example MITRE ATT&CK, so that accurate information can be delivered to the user without additional work.

Labels are also assigned so that not only the attack identifier but also the attacker who implemented it can be distinguished. Accordingly, the attack identifier, the attacker, and the corresponding implementation method can all be identified.

The embodiment enables advanced profiling based on what has been learned from data sets of previously classified disassembled code (OP-CODE, ASM-CODE, or a combination thereof). The embodiment can also use data from the static, dynamic, or correlation analysis disclosed above as reference data for labeling. Therefore, even for a data set that has not been analyzed before, profiling data can be obtained quickly and efficiently by considering the results of the static, dynamic, and correlation analyses together.

The third step, in which code with patterns similar to malicious code is learned and the learned data is classified, and the fourth step, in which the classified data is profiled, can be carried out together by a machine learning algorithm.

A detailed example is disclosed below, and an actual example of a profiled data set is also illustrated below with reference to the drawings.

FIG. 22 is a diagram illustrating, as an example of data conversion in the disclosed embodiment, the values obtained by converting the OP-CODE and ASM-CODE of disassembled code into normalized codes.

As described, disassembling an executable file outputs data in which OP-CODE and ASM-CODE are combined.

The embodiment can remove the annotation data output for each function from the disassembled data and reorder the OP-CODE, ASM-CODE, and corresponding parameters to make processing easier.

The reorganized OP-CODE and ASM-CODE are converted into normalized code data. This figure uses CRC data as the example of normalized code data.

As one example, OP-CODE can be converted to CRC-16 and ASM-CODE to CRC-32.

In the first row of the example table, the OP-CODE push instruction is converted to the CRC-16 value 0x45E9, and the ASM-CODE 55 is converted to the CRC-32 value 0xC9034AF6.

In the second row, the OP-CODE mov instruction is converted to the CRC-16 value 0x10E3, and the ASM-CODE 8B EC is converted to the CRC-32 value 0x3012FD2C. In the third row, the OP-CODE lea instruction is converted to the CRC-16 value 0xAACE, and the ASM-CODE 8D 45 0C is converted to the CRC-32 value 0x9214A6AA.

In the fourth row, the OP-CODE push instruction is converted to the CRC-16 value 0x45E9, and the ASM-CODE 50 is converted to the CRC-32 value 0xB969BE79.

Unlike this example, normalized code data other than CRC data, or code data of a different length, may also be used.

Converting disassembled code into normalized code in this way preserves the uniqueness of each code while making the subsequent computation, similarity calculation, and vectorization easier and faster.
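A minimal sketch of this normalization step is given below. The text does not state which CRC-16 variant or which input encoding produces the values in FIG. 22, so the choices here (CRC-CCITT through `binascii.crc_hqx` over the ASCII mnemonic, and `zlib.crc32` over the raw instruction bytes) are assumptions, and the printed values are not expected to reproduce the figure. The helper names are hypothetical.

```python
import binascii
import zlib

def normalize_opcode(mnemonic: str) -> int:
    """Map an opcode mnemonic to a 16-bit code (assumed variant: CRC-CCITT)."""
    return binascii.crc_hqx(mnemonic.encode("ascii"), 0)

def normalize_asm(hex_bytes: str) -> int:
    """Map instruction bytes such as '8B EC' to a 32-bit code (CRC-32)."""
    return zlib.crc32(bytes.fromhex(hex_bytes))

# The four (OP-CODE, ASM-CODE) pairs named in the description of FIG. 22.
rows = [("push", "55"), ("mov", "8B EC"), ("lea", "8D 45 0C"), ("push", "50")]
normalized = [(normalize_opcode(op), normalize_asm(asm)) for op, asm in rows]

for (op, asm), (op_crc, asm_crc) in zip(rows, normalized):
    print(f"{op:<5} 0x{op_crc:04X}   {asm:<9} 0x{asm_crc:08X}")
```

The property that matters for the later steps is visible regardless of the CRC variant: the two `push` rows receive the same 16-bit code, while their different instruction bytes receive different 32-bit codes.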

FIG. 23 is a diagram illustrating, as an example of data conversion in the disclosed embodiment, the vectorized values of the OP-CODE and ASM-CODE of disassembled code.

This figure illustrates the results of vectorizing the normalized OP-CODE (CRC-16 in the example above) and the normalized ASM-CODE (CRC-32 in the example above).

The vectorized value of the normalized OP-CODE (OP-CODE Vector) and the vectorized value of the normalized ASM-CODE (ASM-CODE Vector) are shown in table form in this figure.

The OP-CODE Vector and ASM-CODE Vector values in each row of this figure correspond to the normalized OP-CODE and normalized ASM-CODE values in the same row of FIG. 22.

For example, the CRC values 0x45E9 and 0xB969BE79 in the fourth row of the table in FIG. 22 are vectorized to 17897 and 185 105 121 44, respectively, in the fourth row of the table in this figure.

Vectorizing the normalized data in this way converts each disassembled OP-CODE instruction and ASM-CODE into a vectorized value that retains its unique characteristics.
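For the OP-CODE column, the vectorized value quoted above is consistent with reading the CRC-16 hexadecimal code as a decimal integer, as the short sketch below shows. The function name is hypothetical, and the layout of the ASM-CODE vector is not spelled out in the text, so it is deliberately left out here.

```python
def opcode_vector(crc16_hex: str) -> int:
    """Interpret a normalized CRC-16 code such as '0x45E9' as a decimal integer."""
    return int(crc16_hex, 16)

push_vector = opcode_vector("0x45E9")
mov_vector = opcode_vector("0x10E3")
print(push_vector, mov_vector)  # 17897 4323
```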

FIG. 24 is a diagram illustrating, as an example of data conversion in the disclosed embodiment, the conversion of block units of code into hash values.

To perform similarity analysis, each vectorized OP-CODE and ASM-CODE data set is converted back into byte data. The reconverted byte data can be converted into block-level hash values, and a hash value for the entire reconverted byte data is then generated from those block-level hash values.

To compute the reconverted hash value, the embodiment may use hashes such as MD5 (Message-Digest algorithm 5), SHA-1 (Secure Hash Algorithm 1), or SHA-256, and may use a fuzzy hash function for judging the similarity between data.

In the table in this figure, the first row shows the human-readable characters that may be included in the data. The values contained in a block unit of the reconverted byte data may include such readable characters.

Each character corresponds to an ASCII value (ascii val) in the second row: 97, 98, 99, 100, ..., 48, 49.

The data containing the character values of the first row can be segmented into blocks over which the ASCII values can be summed.

The third row of the table shows, for each block unit of four characters, the sum of the ASCII values corresponding to the characters in that block.

For the first block, the value is 394, which is the sum (ascii sum) of the ASCII values 97, 98, 99, and 100 of the characters in that block.

The last row shows the block-level ASCII sums converted to a Base64 representation. The letter K represents the sum of the first block.

In this way, the signature Kaq6KaU can be obtained for the data.
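The block-sum signature described for FIG. 24 can be reproduced with the sketch below. Two things are assumptions rather than statements from the text: the input string `abcdefghijklmnopqrstuvwxyz01` is inferred from the ASCII values 97, 98, 99, 100, ..., 48, 49, and the mapping from a block sum to a letter is taken to be the sum modulo 64 used as an index into the standard Base64 alphabet (394 mod 64 = 10, which is `K`).

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def block_signature(data: str, block_size: int = 4) -> str:
    """Sum the ASCII values of each block and map each sum to one Base64 letter."""
    letters = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        ascii_sum = sum(ord(ch) for ch in block)
        letters.append(BASE64_ALPHABET[ascii_sum % 64])
    return "".join(letters)

sample = "abcdefghijklmnopqrstuvwxyz01"  # assumed input, see the note above
signature = block_signature(sample)
print(signature)  # Kaq6KaU
```

With these assumptions the seven blocks sum to 394, 410, 426, 442, 458, 474, and 340, which map to `K`, `a`, `q`, `6`, `K`, `a`, and `U`.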

Based on such signatures, the similarity between two pieces of block-level data can be calculated.

In this embodiment, hash values are computed with a fuzzy hash function for similarity judgment over the block units contained in the code within the reconverted byte data, and similarity is judged from the computed hash values. CTPH (Context Triggered Piecewise Hashing) is given as an example of a fuzzy hash function for similarity judgment, but other fuzzy hash functions capable of computing the similarity of data may also be used.
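One simple way to turn two signatures into a similarity score is to scale their edit distance to a 0 to 100 range, as sketched below. This is an assumption made for illustration: it is not the scoring used by any particular CTPH implementation, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def similarity(sig_a: str, sig_b: str) -> int:
    """Score from 0 (nothing in common) to 100 (identical signatures)."""
    longest = max(len(sig_a), len(sig_b))
    if longest == 0:
        return 100
    return round(100 * (1 - edit_distance(sig_a, sig_b) / longest))

same = similarity("Kaq6KaU", "Kaq6KaU")
near = similarity("Kaq6KaU", "Kaq7KaU")  # one block differs
far = similarity("Kaq6KaU", "AAAAAAA")   # every block differs
print(same, near, far)
```

Because each signature letter summarizes one block, a change confined to one block changes only one letter, so the score degrades gradually instead of flipping completely as it would with a cryptographic hash.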

FIG. 25 is a diagram illustrating an example of an ensemble machine learning model according to the disclosed embodiment.

The embodiment can use an ensemble machine learning model to accurately classify the attack identifier (T-ID) of a file judged to be malicious code.

The hash values of the block units composed of string data (byte data) are quantified based on N-gram feature information, and similarity can then be computed with a technique such as TF-IDF to determine whether they correspond to an attack identifier (T-ID) or a class to be classified.

To improve the performance of attack technique identification by reducing unnecessary computation, the embodiment can remove unnecessary patterns from these hash values based on similarity.

The data from which unnecessary patterns have been removed can then be modeled with ensemble machine learning to classify attack identifiers.

Methods for combining the learning results of the multiple classification nodes of an ensemble machine learning model include voting, bagging, and boosting. An ensemble machine learning model that appropriately combines these methods can help increase the classification accuracy on the training data.

Here, as one example, a method of classifying attack identifiers more accurately is described for the case of applying Random Forest, a bagging method.

The Random Forest method generates a large number of decision trees to reduce the classification error of a single decision tree and obtain generalized classification results. The embodiment can apply a Random Forest learning algorithm using one or more decision trees to the prepared data. Here, the prepared data means the block-level fuzzy hash values from which unnecessary patterns have been removed.

To judge the similarity of block-level hash values, a decision tree model with at least one node is run. According to the information gain of the decision tree, the comparison conditions can be optimized for feature values (here, the number of occurrences of classification patterns based on block-level hash values) that can distinguish one or more classes (attack identifiers; T-IDs).

To this end, a decision tree such as the one illustrated in the figure can be generated.

In this figure, the upper rectangles (2510, 2520, 2530, 2540) are internal (non-terminal) nodes that represent the conditions separating the classes, and the lower rectangles (2610, 2620, 2630) are terminal nodes that represent the classes into which the data is classified.
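The way information gain selects the comparison condition at an internal node can be sketched as follows. The feature (the occurrence count of one pattern), the candidate thresholds, and the four labeled samples are hypothetical, and entropy is used as the impurity measure as one common choice.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(samples, threshold):
    """Gain from splitting (pattern_count, t_id) samples on count <= threshold."""
    labels = [tid for _, tid in samples]
    left = [tid for count, tid in samples if count <= threshold]
    right = [tid for count, tid in samples if count > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Hypothetical samples: how often one pattern occurs, and the labeled T-ID.
samples = [(0, "T1027"), (1, "T1027"), (5, "T1055"), (7, "T1055")]

# Pick the candidate threshold with the highest information gain.
best_gain, best_threshold = max((information_gain(samples, t), t) for t in (0, 1, 5))
print(best_gain, best_threshold)
```

In this toy data the condition "count <= 1" separates the two T-IDs perfectly, so it yields the full one bit of gain and would be chosen as the node's comparison condition.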

For example, when a Random Forest model is applied as the ensemble machine learning model, it is a classification model that applies the ensemble technique using one or more decision trees. A variety of decision trees are built by varying the features of the input data given to each decision tree in the Random Forest model. Classification is performed with each of the generated decision tree models, and the final class is determined by majority voting. Because the tests at each node can proceed in parallel, computational efficiency is high.
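The majority-voting step can be sketched as below. The three "trees" are hand-written stand-ins that each look at a different hypothetical feature; in an actual random forest they would be learned from bootstrapped training data, so this shows only how the votes are combined.

```python
from collections import Counter

# Hypothetical stand-ins for learned decision trees, each using a different feature.
def tree_a(features):
    return "T1027" if features["pattern_x"] > 2 else "T1055"

def tree_b(features):
    return "T1027" if features["pattern_y"] == 0 else "T1055"

def tree_c(features):
    return "T1055" if features["pattern_z"] > 4 else "T1027"

FOREST = [tree_a, tree_b, tree_c]

def classify(features):
    """Return the majority class and the share of trees that voted for it."""
    votes = Counter(tree(features) for tree in FOREST)
    label, count = votes.most_common(1)[0]
    return label, count / len(FOREST)

sample = {"pattern_x": 3, "pattern_y": 1, "pattern_z": 0}
label, share = classify(sample)
print(label, share)
```

Here two of the three trees vote for T1027, so it becomes the final class with a vote share of two thirds. The vote share can also serve as a rough per-class probability.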

When classifying classes, thresholds can be set to prevent over-detection and false detection: values at or below the lower threshold are discarded, and classification is performed only on data at or above the detection threshold.
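A minimal sketch of this thresholding is shown below. The threshold values 0.2 and 0.7, the function name, and the use of per-class probabilities as the thresholded quantity are assumptions chosen for illustration.

```python
def classify_with_thresholds(probabilities, lower=0.2, detection=0.7):
    """Return the winning T-ID, or None when no class is confident enough."""
    # Discard classes at or below the lower threshold.
    candidates = {tid: p for tid, p in probabilities.items() if p > lower}
    # Classify only when a remaining class reaches the detection threshold.
    confident = {tid: p for tid, p in candidates.items() if p >= detection}
    if not confident:
        return None
    return max(confident, key=confident.get)

clear_case = classify_with_thresholds({"T1027": 0.9, "T1055": 0.1})
unclear_case = classify_with_thresholds({"T1027": 0.55, "T1055": 0.45})
print(clear_case, unclear_case)
```

Returning `None` for the ambiguous case corresponds to leaving data unclassified when it does not reach the detection threshold.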

FIG. 26 is a diagram illustrating a flow of learning and classifying data with machine learning according to the disclosed embodiment.

Profiling of the input data may include a classification step (S2610) and a learning step (S2620).

In an embodiment, the learning step (S2620) may include (a) a hash value extraction process, (b) an N-gram pattern extraction process, (c) a natural language processing analysis (TF-IDF analysis) process, (d) a pattern selection process, and (e) a model training process.

In an embodiment, the classification step (S2610) may include (a) a hash value extraction process, (b) an N-gram pattern extraction process, (f) a pattern selection process, and (g) a classification process by vectorization.

Of the profiling steps according to the embodiment, the classification step (S2610) is described first, as follows.

Input data is received from a set of executable files or from processed files.

The input data is received either from sets of executable files stored in a database or from the processing described above, which delivers input data containing an executable file. The input data may be vectorized data obtained by converting disassembled code that includes OP-CODE and ASM-CODE.

A fuzzy hash value is extracted from the disassembled code that serves as the input data (a), and N-gram pattern data is extracted for a specific function (b). At this point, 2-gram pattern data containing patterns judged to be similar to malicious code can be selected from the existing set of semantic patterns (f).

The N-gram data of the selected patterns is converted into vectorized data, and the vectorized data can be classified as a function whose semantic pattern has been determined (g).

Of the profiling steps according to the embodiment, the learning step (S2620) is performed as follows.

If the input data is a new file, a fuzzy hash value is extracted from the disassembled code that serves as the input data (a).

The extracted fuzzy hash value is vectorized into N-gram data (2-gram in this example) (b).

Natural language processing analysis such as TF-IDF is performed on the extracted patterns (c).

Among the data sets whose patterns are related to existing attack identifiers (T-IDs), those with high similarity are selected and the rest are filtered out (d). At this point, by comparison with the data sets stored in the existing set of semantic patterns, sample data sets can be selected that contain some or all of the features of a data set whose pattern is related to an attack identifier (T-ID).

The vectorized N-gram data can be trained on the basis of the extracted sample data sets (e).
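Step (b) above, turning a fuzzy hash value into 2-gram data and then into a count vector, can be sketched as follows. Treating the hash as a character string, using overlapping character 2-grams, and building the vocabulary from a single signature are assumptions made to keep the example short. The signature `Kaq6KaU` is the one from the FIG. 24 discussion.

```python
from collections import Counter

def ngrams(text: str, n: int = 2):
    """Return the overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_vector(text: str, vocabulary, n: int = 2):
    """Count how often each vocabulary n-gram occurs in `text`."""
    counts = Counter(ngrams(text, n))
    return [counts[gram] for gram in vocabulary]

signature = "Kaq6KaU"
grams = ngrams(signature)
vocabulary = sorted(set(grams))  # in practice built from the whole training set
vector = ngram_vector(signature, vocabulary)

print(grams)       # ['Ka', 'aq', 'q6', '6K', 'Ka', 'aU']
print(vocabulary)  # ['6K', 'Ka', 'aU', 'aq', 'q6']
print(vector)      # [1, 2, 1, 1, 1]
```

A fixed vocabulary shared across all samples gives every hash a vector of the same length, which is what a tree-based classifier needs as input.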

The vectorized N-gram data is input to the classification model to obtain a probability for each attack identifier (T-ID). For example, the result may be that the vectorized N-gram data has a probability of A% of being the attack identifier (T-ID) T1027 and a probability of (100-A)% of being the attack identifier T1055.

The classification model may be an ensemble machine learning model, such as a random forest, that includes at least one decision tree.

Based on the classification model, it can be determined which attack technique or attacker the vectorized N-gram data corresponds to.

The input data is classified and labeled (g) according to the classification result of the classification model (e) or the result of selecting from the existing stored patterns (f).

The result of the final labeling is illustrated with reference to the following drawing.

FIG. 27 is a diagram illustrating an example in which input data is learned and classified and then labeled with an attack identifier and an attacker according to the disclosed embodiment.

This figure shows, in table form, the output of the profiler: the attack identifier, the attacker or attack group, the fuzzy hash value corresponding to the assembly code, and the corresponding N-gram data (shown here as 2-gram data).

According to the embodiment, once profiling is complete, classified data related to the implementation of attack methods can be obtained as follows.

Through profiling according to the embodiment, the data can be labeled with an attack identifier (T-ID) and with an attacker or attacker group (Attacker or Group).

As described, the attack identifier (T-ID) can follow a standardized model, and this example shows the result of assigning the attack identifiers (T-IDs) provided by MITRE ATT&CK®.

As described above, a label can also be added for the identified attacker or attacker group (Attacker or Group). This figure shows an example in which the attacker TA504 is identified in the attacker or attacker group label.

SHA-256 (size) indicates the fuzzy hash value and the data size of the malicious code corresponding to each attack identifier (T-ID) or attacker group (Attacker or Group). As described, such malicious code may correspond to a rearrangement and combination of OP-CODE and ASM-CODE.

The value in the section marked N-gram is the N-gram pattern data corresponding to the attack identifier (T-ID) or attacker group and to the fuzzy hash value of the malicious code. In this example, part of the 2-gram data is shown.

As illustrated in this figure, the attack identifier (T-ID) or attacker group corresponding to the fuzzy hash value and N-gram pattern data of the malicious code (OP-CODE and ASM-CODE) can be labeled and stored.

The labeled data illustrated here can be used as reference data for ensemble machine learning and also as reference data for the classification model.
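One possible in-memory shape for a labeled profiling record, with the four columns described for FIG. 27, is sketched below. The class name, field names, hash placeholders, sizes, and the pairing of T-IDs with the attacker TA504 are all hypothetical, chosen only to show how such records could be stored and looked up by attack identifier.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ProfiledRecord:
    t_id: str        # attack identifier label, e.g. a MITRE ATT&CK T-ID
    attacker: str    # attacker or attacker group label
    code_hash: str   # hash value of the OP-CODE / ASM-CODE data
    size: int        # data size in bytes
    ngrams: tuple    # N-gram pattern data (2-grams here)

# Hypothetical records; the hash strings are placeholders, not real digests.
records = [
    ProfiledRecord("T1027", "TA504", "hash-placeholder-1", 1024, ("Ka", "aq")),
    ProfiledRecord("T1055", "TA504", "hash-placeholder-2", 2048, ("q6", "6K")),
]

# Index the records by T-ID so they can serve as reference data for a classifier.
by_tid = defaultdict(list)
for record in records:
    by_tid[record.t_id].append(record)

print(sorted(by_tid))
```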

The performance results of the disclosed embodiments are illustrated below.

FIG. 28 is a diagram showing the results of identifying attack identifiers according to an embodiment.

This figure illustrates a Euclidean distance matrix, which can represent the similarity between two data sets.

In this figure, a bright area means that the similarity between the two data sets is low, and a dark area means that the similarity between the two data sets is high.

In this figure, T10XX denotes an attack identifier (T-ID), and the characters T, K, and L in parentheses each denote the attacker group that created the attack technique for that attack identifier (T-ID).

That is, the rows and columns represent the attack identifiers (T-IDs) created by each attacker group (T, K, L), and the rows and columns have the same meaning. For example, T1055(L) denotes the T1055 attack created by attacker group L, and T1055(K) denotes the same attack method T1055 created by attacker group K.

Because the samples of each data set include the data set's own samples, computing the distance to every sample yields a distribution with high identity along the diagonal from the upper left to the lower right.

This figure shows that samples with the same attack identifier (T-ID) exhibit similar characteristics even when the attacker groups differ. For example, for the attack identifier T1027, the similarity can be rated high whether the attack group is T or K, as long as the attack techniques are similar.

Therefore, when learning is performed on data sets extracted as in the embodiment above, the characteristics of the same attack technique (T-ID) implemented by the same attacker are clearly identified (darkest areas), and the same attack technique (T-ID) implemented by different attackers shows high similarity (medium-dark areas).

Accordingly, classifying attack techniques by extracting and applying sample data based on the combination of OP-CODE and ASM-CODE makes it possible to reliably classify a specific attack technique or identifier (T-ID) even when the attackers differ. Conversely, the combination of OP-CODE and ASM-CODE makes it possible not only to clearly identify specific code implemented inside malicious code but also to identify the attack implementation method, including the attacker and the attack identifier.
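A Euclidean distance matrix of the kind plotted in FIG. 28 can be computed as sketched below. The three feature vectors and their labels are hypothetical; small distances correspond to the dark (high-similarity) cells of the figure.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(vectors):
    """Pairwise distances, keyed by row label and then by column label."""
    names = list(vectors)
    return {r: {c: euclidean(vectors[r], vectors[c]) for c in names} for r in names}

# Hypothetical feature vectors, labeled as T-ID(attacker group).
vectors = {
    "T1027(T)": [3, 0, 1],
    "T1027(K)": [3, 1, 1],
    "T1055(K)": [0, 4, 2],
}
matrix = distance_matrix(vectors)

for row, cols in matrix.items():
    print(row, [round(d, 2) for d in cols.values()])
```

The diagonal is zero because each sample is compared with itself, and in this toy data the two T1027 samples from different groups are closer to each other than either is to the T1055 sample, which is the pattern described for the figure.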

Figure 29 is a diagram illustrating gram data patterns according to attack identifiers, according to an embodiment.

This figure illustrates how the pattern of gram data differs between attack identifiers (T-IDs). For example, when malicious code containing attack identifier T1027 and malicious code containing attack identifier T1055 are each converted into 2-gram pattern data and classified according to the embodiment, each attack identifier (T-ID) shows a different gram pattern.

That is, according to the embodiment that identifies attack techniques within malicious code based on the combination of OP-CODE and ASM-CODE, the patterns of gram data can be separated by attack identifier (T-ID).
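As a rough illustration of the 2-gram conversion described above, the following sketch counts contiguous opcode pairs per attack identifier. The opcode sequences are invented for illustration and are not taken from the figure; the helper name `ngram_counts` is likewise an assumption.

```python
from collections import Counter


def ngram_counts(tokens, n=2):
    """Count the contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical opcode sequences, invented for illustration only.
samples = {
    "T1027": ["xor", "mov", "xor", "mov", "loop"],
    "T1055": ["push", "push", "call", "mov", "call"],
}

# One 2-gram pattern (a bag of opcode pairs) per attack identifier.
patterns = {tid: ngram_counts(ops) for tid, ops in samples.items()}

for tid, pattern in patterns.items():
    print(tid, dict(pattern))
```

In this toy input the two identifiers share no 2-gram at all; real samples would overlap, and a classifier would rely on differences in the relative frequencies.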

This result means that, according to the present embodiment, the various attack identifiers (T-IDs) hidden in malicious code can be clearly identified even when the attacker is the same.

Figure 30 is a diagram illustrating the performance of the disclosed embodiment for processing cyber threat information.

Of the performance characteristics of the disclosed embodiment, this figure illustrates the computation speed for classifying attack identifiers or attackers.

The horizontal axis represents the amount of data stored in the database, and the vertical axis represents the time required to classify attack identifiers.

When each sample is compared N:1 (N to 1) in the conventional manner while the number of fuzzy hash entries stored in the database is increased, the processing time may increase exponentially with the number of entries. For example, when only the similarity of hash values or fuzzy hash values is compared (labeled ssdeep), the time required increases greatly with the amount of data being compared.

However, when the decision tree model of the ensemble machine learning model of the embodiment is used, the inference time for classifying attack identifiers and the like does not increase even as the number of data entries increases.

In other words, the decision tree model, which generates an optimized comparison tree, can process nodes in parallel, so its computation speed is not significantly affected even as the number of data entries increases.

Figure 31 is a diagram showing an example in which detection engines that detect cyber threat information provide detection names.

Various engines have been developed in the field of malware detection and are used to detect cyber threat information. Even though the ability to detect malicious code has improved as artificial intelligence analysis has spread, that ability is of little use if the detected malicious code cannot be properly described and the information conveyed.

This figure illustrates well-known overseas detection engines (3210) available on the VirusTotal site (left) and the detection name that each engine assigns to the same malicious code (right).

Because the same malicious code is not identified and communicated consistently, it is difficult to determine why the malicious code was detected. As a result, it has been difficult for security personnel to decide, based on this information, which objects to act on, and difficult to respond to the risk posed by security threats.

However, the disclosed embodiment provides cyber threat information using the matrix elements of attack identifiers, and combinations thereof, provided by standardized models such as MITRE ATT&CK, and provides information about malicious code using standardized identifiers (T-IDs), thereby greatly improving versatility and efficiency.

Below, examples in which attackers can be tracked and new attacks predicted based on the disclosed embodiment are described in further detail.

Figure 32 is a diagram showing an example of illustrating new malicious code and attack methods according to an embodiment.

Developers are very likely to follow their own habits when writing code, for example in how they declare variable names, structure function calls, and pass parameters. Because programs are developed from a flow of logic and from experience, it is very difficult to change these habits completely.

On this basis, the embodiment can track attackers by using such artifacts in the code as the developer's fingerprint.

When learning data is constructed based on the attack identifiers (T-IDs) of malicious code, the developer can be identified using the feature information described above. The disassembled code of malicious code reflects the developer's unique characteristics and habits.

To implement a specific attack technique, a particular hacker may use techniques of their own without being aware of it, and the more complex the code, the more likely it is that a specific developer can be pinpointed.

In addition, combining the OP-CODE and ASM-CODE code blocks for each attack identifier (T-ID) can also be used to detect new or variant malicious code that is not yet known.

This figure discloses an example of creating a new combination of TTPs that does not currently exist by combining disassembled OP-CODE and ASM-CODE according to the embodiment.

In this example, T1044, T1039, T1211, …, T-N each denote an attack identifier (T-ID).

The OP-CODE sets 1 to N corresponding to each attack identifier denote the code sets included in malicious code for that attack identifier.

As illustrated here, suppose that the malicious code labeled malware contains a combination of OP-CODE 1 of T1044, OP-CODE 2 of T1039, OP-CODE 3 of T1211, and OP-CODE 1 of T-N, all of which belong to previously known attack identifiers. Malicious code containing such a set of OP-CODE combinations may be already known or may be unknown.

In a similar way, a new attack technique can be found that includes OP-CODE 3 of T1044, OP-CODE N of T1039, OP-CODE 4 of T1211, and OP-CODE 2 of T-N.

Alternatively, a new and unknown attack technique can be found that includes OP-CODE 4 of T1044, OP-CODE 4 of T1039, OP-CODE 2 of T1211, and OP-CODE 3 of T-N.
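The enumeration of OP-CODE set combinations described above can be sketched abstractly as a Cartesian product over per-identifier set labels. This is only a minimal sketch: the labels are integers standing in for OP-CODE sets, the choice of four sets per identifier and the `known` combination are assumptions, and no actual code is involved.

```python
from itertools import product

# Labels of the OP-CODE sets available per attack identifier (assumed: four each).
opcode_sets = {
    "T1044": [1, 2, 3, 4],
    "T1039": [1, 2, 3, 4],
    "T1211": [1, 2, 3, 4],
}

# Combinations already observed in analyzed samples (hypothetical).
known = {(1, 2, 3)}

tids = list(opcode_sets)
# Every way of picking one OP-CODE set per identifier, minus the known ones.
candidates = [
    combo
    for combo in product(*(opcode_sets[tid] for tid in tids))
    if combo not in known
]

print(len(candidates), "combinations not yet observed")
```

Each remaining tuple is merely a label for a combination that would still have to be checked, for example by the dynamic analysis the description mentions.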

For convenience, the example above finds attack techniques using only combinations of OP-CODE. However, when disassembled code is generated by combining OP-CODE and ASM-CODE, it is possible not only to find attack techniques but also to identify the attacker or attack group.

Similarly, a new code set can be generated by recombining disassembled code that includes OP-CODE and ASM-CODE. Not only the OP-CODE corresponding to functions of the executable file but also the ASM-CODE indicating the targets or storage locations used by the executable file can be reconstructed, or recombined disassembled code can be generated.

When such reconstructed disassembled code is learned through machine learning and compared with previously analyzed malicious code, it becomes possible not only to identify new, fine-grained attack techniques and the attackers who create them, but also to predict future attacks.

Such new combinations of TTPs and attack paths can produce new cyber threats or malware attack methods that have not existed before. The embodiment can check whether combining existing disassembled code sets in this way yields code capable of an attack. Whether the code is capable of an attack can also be confirmed through tests such as dynamic analysis.

Therefore, by combining disassembled code sets, the embodiment can provide information for responding to future security threats, enabling a preemptive response.

For example, based on the combined code, code can be generated that reflects values such as how frequently each attack technique (TTP) is used or how likely it is to succeed when used.

Alternatively, attack code or malicious code consisting of new code block combinations with a high probability of success can be generated in advance through artificial intelligence learning. This information can then be used to generate patterns that existing security products can respond to, or to provide information for strengthening the security of vulnerable parts of an internal system.

Below, other embodiments of the cyber threat information processing apparatus and method disclosed above are described.

The cyber threat information processing disclosed above makes it possible to analyze the features of threat information at the function level. However, even for programs that produce the same result, the attack technique or attack group may be difficult to identify in some cases. For example, the attack technique or attack group may be difficult to identify clearly when functions are used differently depending on the logic of the program that contains them, or when functions are split even though the program logic is unchanged.

An embodiment is disclosed below that can more clearly detect and recognize whether a difference in attack technique or attack group that arises from a difference in the execution process, where the execution result is the same, reflects a substantially different attack technique or an attack carried out by a different attack group.

Figure 33 is a diagram for explaining an example of identifying attack techniques and attack groups at the function level.

In this example, assume that an executable file (e.g., EXE) has been disassembled and the functions contained in the executable file have been identified. The identified functions are shown as Function 1, Function 2, Function 3, and Function 4.

Among the identified functions, Function 2 may include instructions that perform the operations of the function. Here, the instructions included in Function 2 are shown as Instruction 1, Instruction 2, Instruction 3, Instruction 4, Instruction 5, Instruction 6, and Instruction 7.

In a program, however, a single function may be executed as several separate sub-functions. In this example, assume that Function 2 is executed as two separate sub-functions. The instructions of Function 2 can then be divided between these two sub-functions.

Here, for convenience of explanation, one sub-function of Function 2 includes Instruction 1, Instruction 2, and Instruction 3, and the other sub-function includes Instruction 4, Instruction 5, Instruction 6, and Instruction 7.

In the program, however, the sub-functions may be contained in the single function Function 2.

When feature information related to cyber threats is extracted at the function level, one piece of feature information corresponding to Function 2 (cyber threat feature information A, or simply feature information A) can be identified.

When the function-level feature information related to cyber threats described above is analyzed according to the embodiments described earlier, attack techniques and attack groups can be identified.

Figure 34 is a diagram for explaining an example of identifying attack techniques and attack groups when a function is split.

This embodiment produces the same result as the example above, but here one of the functions is explicitly split into sub-functions in the program.

That is, among the functions identified from the executable file, Function 2 is split in the program into Function 2-1 and Function 2-2. Even when Function 2 is split into Function 2-1 and Function 2-2, the program logic is the same as when the single function Function 2 is executed.

Although the program logic is the same, when Function 2 is simply split into two functions (Function 2-1 and Function 2-2), the feature information corresponding to each function (feature information B and feature information C) differs, so the attack technique and attack group identified from the feature information may differ.

Therefore, even when attack techniques or attack groups are identified from several functions that execute the same program logic as a single function, the following embodiments allow them to be identified as the same attack technique and attack group.

The following embodiments identify attack techniques and attack groups based on feature information that takes into account the control flow and order of the instructions executed by the various functions in a program.

When feature information is based on the flow and order of the instructions within the functions of a program, the same feature information can be obtained as long as substantially the same logic is implemented, even if the functions in the program differ.
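The effect of splitting a function can be sketched as follows. This is a toy illustration under stated assumptions: a truncated SHA-256 digest stands in for whatever feature extraction the embodiment actually uses, and the instruction names simply mirror Instruction 1 to Instruction 7 from the figures.

```python
import hashlib


def feature(instructions):
    """Toy feature: a digest of the instruction sequence (stand-in only)."""
    return hashlib.sha256(" ".join(instructions).encode()).hexdigest()[:16]


# Function 2 as a single function holding Instruction 1..7.
function_2 = [f"Instruction{i}" for i in range(1, 8)]

# The same logic split into Function 2-1 and Function 2-2.
function_2_1 = function_2[:3]
function_2_2 = function_2[3:]

feature_a = feature(function_2)      # function-level feature of Function 2
feature_b = feature(function_2_1)    # function-level feature of Function 2-1
feature_c = feature(function_2_2)    # function-level feature of Function 2-2

# Feature taken over the instructions in execution order, ignoring the
# function boundary introduced by the split.
sequence_feature = feature(function_2_1 + function_2_2)

print(feature_a, feature_b, feature_c, sequence_feature)
```

The function-level features B and C both differ from A, while the feature computed over the instruction order matches A, which is the property the sequence-based embodiments rely on.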

Even when the form of a program that causes a cyber threat is modified slightly, or the program is a variant, the attack technique and attack group can be clearly identified based on this feature information.

Below, an example of profiling the control flow of the instructions within a function and identifying their order is disclosed.

Figure 35 discloses an example of obtaining feature information related to a cyber threat according to an embodiment.

Here, the executable file labeled EXE can be disassembled to obtain control blocks (ControlBlock) that contain various functions.

After the control flow implied by the relationships among the instructions in the obtained control blocks is determined, the order of the control blocks along that control flow is checked, and an instruction sequence can be obtained on that basis.

Cyber threat feature information can then be identified from the obtained instruction sequence.

Detailed embodiments for obtaining control blocks, or the code blocks corresponding to them, have already been disclosed above.

In this example, the control blocks obtained by disassembling the executable file (EXE) are shown as ControlBlock1, ControlBlock2, ControlBlock3, …, ControlBlock6.

Each of the control blocks ControlBlock1, ControlBlock2, ControlBlock3, …, ControlBlock6 may correspond to an instruction set. As explained above, the instruction sets may differ from one another while the logic executed within them is the same.

Therefore, the control flow of the control blocks is analyzed to identify whether the control blocks execute the same logic.

To make the embodiment easier to explain, a graph of the control flow of the code blocks during program execution is generated and used in the description below.

For example, the instructions in the instruction set of ControlBlock1 are denoted C1, C2, C3, …, C6 according to their execution order. To make this easier to follow, the instructions are shown according to their execution order as a control flow graph (CFG).

The order of the instructions can be obtained from the control flow graph shown in this example; here the order is obtained by depth-first search (DFS). Depth-first search adds an instruction to a search tree as a node, follows an instruction reachable from that node, adds it to the search tree as a child node at the next level, and repeats this process.

In this way, the order in which instructions are applied along the control flow can be obtained for the instruction set corresponding to each control block.

In this example, the order of the instructions in instruction set 1, corresponding to ControlBlock1, along the control flow may be (C1, C2, C4, C5, C3, C6).

The order of the instructions in instruction set 2, corresponding to ControlBlock2, along the control flow may be (C2, C4, C5).

The order of the instructions in instruction set 3, corresponding to ControlBlock3, along the control flow may be (C3, C6).

An instruction sequence can then be generated from the obtained instruction order, and feature information about cyber threats can be distinguished according to the instruction sequence.

The example here shows six instruction sequences obtained by ordering instruction set 1, corresponding to ControlBlock1, along the control flow, with one piece of feature information extracted from each of the six instruction sequences.
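The orders listed above can be reproduced with a preorder depth-first traversal. In this sketch the edge layout of the graph is an assumption chosen to be consistent with the three orders given in the text (C1 leads to C2 and C3, C2 to C4 and C5, C3 to C6); the figure itself is not reproduced here.

```python
def dfs_order(cfg, start):
    """Return the nodes reachable from start in depth-first (preorder) order."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push successors in reverse so the leftmost one is visited first.
        stack.extend(reversed(cfg.get(node, [])))
    return order


# Assumed control flow graph: successors are listed left to right.
cfg = {"C1": ["C2", "C3"], "C2": ["C4", "C5"], "C3": ["C6"]}

# One instruction sequence per starting node, six in total.
sequences = {node: dfs_order(cfg, node) for node in ["C1", "C2", "C3", "C4", "C5", "C6"]}

for start, seq in sequences.items():
    print(start, "->", seq)
```

Starting from C1, C2, and C3 yields (C1, C2, C4, C5, C3, C6), (C2, C4, C5), and (C3, C6) respectively; the leaf nodes C4, C5, and C6 each yield a one-element sequence.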

In this way, even if a function in a program is split, or is changed into functions that execute substantially the same logic, cyber threat information arising from the same logic can still be distinguished.

Below, various examples are disclosed of obtaining instruction sequences using the control flows within control blocks that contain various functions.

First, an example of obtaining the control flows within the control blocks is disclosed.

Control blocks are obtained by disassembling the executable file.

Among the instructions inside a control block, those that reference a specific location within the same control block, or a control block outside it, can be identified. Instructions that branch in the code in this way are referred to here as branch instruction types.

Examples of branch instruction types include CALL and JUMP instructions. These instructions can reference a specific location within their own control block or a control block outside it.

Therefore, by identifying the reference addresses of these branch instructions, the control flow of the instructions can be obtained.
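Collecting the reference addresses of branch instructions can be sketched as below. The tuple layout `(address, mnemonic, operand)`, the small set of x86-style mnemonics, and the sample addresses are all assumptions for illustration; a real disassembler would supply this information in its own format.

```python
# Assumed subset of branch-type mnemonics; a real analysis would cover more.
BRANCH_MNEMONICS = {"call", "jmp", "je", "jne"}


def branch_targets(instructions):
    """Return (source address, target address) pairs for direct branches.

    instructions: list of (address, mnemonic, operand) tuples.
    Indirect branches (e.g. through a register) have no literal target
    and are skipped in this sketch.
    """
    refs = []
    for addr, mnemonic, operand in instructions:
        if mnemonic.lower() in BRANCH_MNEMONICS and operand.startswith("0x"):
            refs.append((addr, int(operand, 16)))
    return refs


# Hypothetical control block.
block = [
    (0x1000, "push", "ebp"),
    (0x1001, "call", "0x2000"),
    (0x1006, "jne", "0x1010"),
    (0x1008, "call", "eax"),   # indirect: target unknown statically
    (0x100A, "ret", ""),
]

refs = branch_targets(block)
print([(hex(src), hex(dst)) for src, dst in refs])
```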

Figure 36 illustrates a process of obtaining the control flow using branch-type instructions according to an embodiment.

A disassembled control block (cblk1) is extracted, and the instructions of the branch instruction type within the extracted control block (cblk1) are identified.

Among the reference addresses targeted by the branch-type instructions in the code, the references that point to a location outside the control block (cblk1) are identified (outgoing references, labeled outgoing-ref).

The left side of this figure illustrates an example of analyzing particular outgoing references.

In this example, a reference that is not an outgoing reference but instead points to a location inside the control block (cblk1) itself (Reference A) may be ignored. That is, because Reference A points inside the control block (cblk1), it need not be considered when generating the control flow.

The control flow can then be generated by distinguishing the case in which an outgoing reference of the control block (cblk1) points to the start address or first instruction of another control block (cblk2) (Reference B) from the case in which it points to an internal address or internal instruction of another control block (cblk3) (Reference C).

In this example, Reference B points to the start address or first instruction of the target control block (cblk2), so the target control block (cblk2) can be included in the control flow as it is.

Reference C, on the other hand, points to instruction 2 (instr2) inside the target control block, so when the control flow is generated, a new control block (cblk3-2) containing instruction 2 (instr2) through the last instruction of that control block (cblk3) can be included in the control flow.

The right side of this figure shows the control flow generated for the control block (cblk1) according to the example described above.

Analyzing the control flow of the control block (cblk1) according to the outgoing reference analysis on the left yields the control flow for the control block (cblk1).

In the control flow generated in this way, when the first control block (cblk1) references the start address or first instruction of the second control block (cblk2), the second control block (cblk2) can be included as a vertex of the control flow.

When the first control block (cblk1) points to an internal or intermediate location or instruction of the third control block (cblk3), the third control block (cblk3) is split at the referenced instruction, and a new control block (cblk3-2) whose first instruction is the referenced instruction can be included as a vertex of the generated control flow.

According to the embodiment, when a branch instruction of a given control block is an outgoing reference, the control flow can be generated according to the location or instruction that the outgoing reference points to.

The control flow generated for a given control block includes a second control block as a vertex when the outgoing reference points to the start of the second control block. When the outgoing reference points to an intermediate location in a third control block, the generated control flow includes as a vertex a new control block whose first instruction is the instruction at the referenced location.

In the example of this figure, Reference A of the first control block (cblk1) points inside the first control block (cblk1) and is therefore ignored. Reference B of the first control block (cblk1) points to the start address of the second control block (cblk2), so the second control block is included as a vertex. Reference C of the first control block (cblk1) points inside the third control block (cblk3), so a new control block (cblk3-2) starting from instruction 2 of the third control block (cblk3) can be created and included as a vertex.

The example in this figure shows the generated control flow as a control flow graph (CFG), in which the child vertices are arranged from the left of the graph in ascending order of the start address of their control blocks (cblk).
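The handling of References A, B, and C, together with the left-to-right ordering of vertices, can be sketched as follows. The addresses, the representation of a control block as a sorted list of instruction addresses, and the helper name `resolve_reference` are assumptions made for illustration.

```python
def resolve_reference(source, target_addr, blocks):
    """Map a branch target to the vertex it contributes, or None if ignored.

    blocks: name -> sorted list of instruction addresses in that block.
    """
    for name, addrs in blocks.items():
        if target_addr not in addrs:
            continue
        if name == source:
            return None                      # internal reference: ignored
        if target_addr == addrs[0]:
            return (name, addrs)             # block start: use block as is
        idx = addrs.index(target_addr)
        return (f"{name}-2", addrs[idx:])    # mid-block: split off a new block
    return None                              # target not in any known block


# Hypothetical layout: three control blocks and the three references of cblk1.
blocks = {
    "cblk1": [0x1000, 0x1004, 0x1008],
    "cblk2": [0x2000, 0x2004],
    "cblk3": [0x3000, 0x3004, 0x3008],
}
references = {"A": 0x1004, "B": 0x2000, "C": 0x3004}

vertices = [
    v
    for v in (resolve_reference("cblk1", t, blocks) for t in references.values())
    if v is not None
]
# Child vertices are placed left to right in ascending start-address order.
vertices.sort(key=lambda v: v[1][0])

print([name for name, _ in vertices])
```

With this layout Reference A is dropped, Reference B contributes cblk2 unchanged, and Reference C contributes a new block cblk3-2 that begins at the second instruction of cblk3.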

이하에서는 위와 같이 실행파일이 디스어셈블된 제어블록들의 레퍼런스 관계를 탐색하여 생성한 인스트럭션 시퀀스에 따라 상기 실행 파일의 사이버 위협 특징 정보를 얻는 예를 이하에서 개시한다.Below, an example of obtaining cyber threat characteristic information of an executable file according to an instruction sequence generated by searching the reference relationship of disassembled control blocks of the executable file as described above will be described below.

레퍼런스 관계에 따라 생성되는 인스트럭션 시퀀스들은 사이버 위협 정보의 특징을 나타낼 수 있다. Instruction sequences generated according to reference relationships can represent characteristics of cyber threat information.

When the control flow generation described above uses a depth-first search (DFS), instruction sequences can be generated by merging the instructions of the control blocks in an order determined by specific rules.

The following illustrates ways of combining instructions into instruction sequences from which features of cyber threat information can be obtained.

As a first example of combining instruction sequences, when instruction sequences are generated according to the reference relationships of the instructions within control blocks, an instruction sequence can be generated by a depth-first search over the instructions that are meaningful to the control flow.

Here, the instructions that are meaningful to the control flow are those that remain after removing, from the instructions in a control block, NOP (no-operation) instructions, RET (return) family instructions, and branch family instructions such as JUMP and CALL.

When a control flow graph is generated, instructions of these families only produce edges of the graph and do not form part of the actual instruction sequence. Therefore, when the instructions in the control flow graph are combined in depth-first order, they contribute nothing to the resulting instruction sequence.

In other words, the first example of generating instruction sequences according to the reference relationships of the instructions within control blocks combines only the meaningful instructions that can belong to the actual instruction sequence; branch instructions and instructions that merely make a reference are left out of the combination.

Because instructions are combined by depth-first search over the control flow graph, the instruction sequence is generated without the branch family instructions or the instructions that merely make a reference.
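A minimal sketch of this first example is given below. It assumes AT&T-syntax mnemonics, represents each control block as a list of instruction strings keyed by start address, and visits child vertices in ascending address order. The names `build_sequence` and `is_edge_only`, and the sample blocks, are hypothetical.

```python
def is_edge_only(instruction):
    """True for NOP, RET-family and branch-family (jump/call) mnemonics."""
    op = instruction.split()[0]
    return op.startswith(("nop", "ret", "call", "j"))


def build_sequence(cfg, blocks, entry):
    """Depth-first walk of the CFG that concatenates the remaining instructions."""
    sequence, visited = [], set()

    def visit(addr):
        if addr in visited:
            return
        visited.add(addr)
        sequence.extend(i for i in blocks[addr] if not is_edge_only(i))
        for child in sorted(cfg.get(addr, ())):  # lower start address first
            visit(child)

    visit(entry)
    return sequence


blocks = {
    0x100003f60: ["movl $0, -4(%rbp)", "callq 0x100003ed0", "xorl %eax, %eax", "retq"],
    0x100003ed0: ["addl %esi, %edi", "nop", "jmp 0x100003ee0"],
}
cfg = {0x100003f60: [0x100003ed0]}
sequence = build_sequence(cfg, blocks, 0x100003f60)
```

The call, jump, return, and NOP instructions determine only which block is visited next; none of them appears in `sequence`.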

As a second example of generating instruction sequences according to the reference relationships of the instructions within control blocks, the stack frame may be adjusted when a control block is called by a CALL family instruction among the instructions in a control block.

A stack frame is the space created in the stack area to separate one function from another. For example, a stack frame may hold parameters, a return address, and local variables; it is created when a function is called and destroyed when the function returns.

In general, a stack frame is tracked with a stack pointer (sp), which indicates the top of the stack, and a base pointer (bp), which points to specific data on the stack. When the stack frame changes, the stack pointer (sp) and the base pointer (bp) may change.

Instructions that manipulate these stack frame pointers act as noise in the logic of the control flow, so they are not used when instruction sequences are combined, for example by depth-first search. Just as branch family instructions are not used when combining instruction sequences, as illustrated above, instructions related to the stack frame are not used either.

FIG. 37 illustrates generating an instruction sequence by combining the instructions of control blocks according to the combination rule of the second example.

When a control block is called by a CALL family instruction, the instructions related to the stack frame are unrelated to the logic of the control flow, so an instruction sequence can be generated without using them in the combination.

This drawing shows the control blocks of the sample code labeled app1 and of the sample code labeled app2. Sample codes app1 and app2 produce the same result. In this example, app1 repeats the same code, whereas app2 does not repeat the code but achieves the same behavior by having a function named fool1 call fool2.

Taking the control block of the app2 sample code as an example, the stack frame may be initialized before the body of the control block begins (0x100003eb0 to 0x100003eb4).

Here, (pushq %rbp) saves the base pointer, and (movq %rsp, %rbp) stores the stack pointer in the base pointer.

Then (subq $16, %rsp) moves the stack pointer to the top of the stack; on the stack, the top has a lower address than the base.

The stack may be cleaned up before the control block in the app2 sample code returns (0x100003ef9 to 0x100003efd).

Here, (addq $16, %rsp) moves the stack pointer back to the base (bottom), which has the effect of discarding the values on the stack.

Finally, (popq %rbp) restores the previously saved base pointer.

Therefore, when app1 is called afterwards, the instructions related to the stack frame that precede the call are unrelated to the control flow and are not considered when instructions are combined to generate an instruction sequence.

In this way, when the stack frame is adjusted because code has been separated into functions, that is, when the instructions related to the stack frame are unrelated to the logic of the control flow, the instruction sequence is generated without taking those instructions into account.
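The second example can be sketched as a simple filter over the prologue and epilogue instructions quoted above. The regular expressions and the name `drop_frame_instructions` are assumptions for illustration; a real disassembly may set up frames in other ways that this sketch does not cover.

```python
import re

# Patterns for the frame set-up and tear-down instructions quoted in the text
# (AT&T syntax, x86-64). The list is illustrative and not exhaustive.
FRAME_PATTERNS = [
    re.compile(r"^pushq\s+%rbp$"),
    re.compile(r"^movq\s+%rsp,\s*%rbp$"),
    re.compile(r"^(subq|addq)\s+\$\w+,\s*%rsp$"),
    re.compile(r"^popq\s+%rbp$"),
]


def drop_frame_instructions(instructions):
    """Remove instructions that only adjust the stack frame."""
    return [
        ins for ins in instructions
        if not any(p.match(ins.strip()) for p in FRAME_PATTERNS)
    ]


block = [
    "pushq %rbp",
    "movq %rsp, %rbp",
    "subq $16, %rsp",
    "movl %edi, -4(%rbp)",
    "addq $16, %rsp",
    "popq %rbp",
    "retq",
]
filtered = drop_frame_instructions(block)
```

Only the instruction that carries program logic and the return remain in this sketch; the return would be dropped separately by the branch and return rule of the first example.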

Another example of generating instruction sequences that contain feature information from the instructions within control blocks is now disclosed.

When such instruction sequences are generated, the edge weights of the graph obtained from control flow analysis can be reflected in the instruction sequences.

Graphs reflecting the edge weights obtained from control flow analysis are compared in the drawing described below.

FIG. 38 is a diagram for explaining another example of generating instruction sequences that contain feature information from the instructions within control blocks.

Sample codes app1 and app3, which produce the same result, are shown here.

In this example, the control block of the app1 sample code on the left has a structure in which code with the same logic, differing only in its variables, is repeated twice.

The app3 sample code on the right does not repeat the code; instead, the code is turned into a function that is called twice.

The two sample codes in this drawing produce the same result. When an instruction sequence is generated from the app3 sample code, however, the instructions of the control block that is called twice (0x100003ef0) can be added twice to the graph obtained from control flow analysis.

Thus, when instruction sequences are generated from the instructions within control blocks, instructions that are called repeatedly can be reflected through the edge weights of the control flow graph. In this way, instructions that are called multiple times carry a corresponding weight in the generated instruction sequence.
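One way to picture the edge weight is to count the call sites between two blocks and emit the callee's instructions that many times. The sketch below assumes an acyclic call structure; `edge_weights`, `expand`, and the placeholder instruction names are hypothetical.

```python
from collections import Counter


def edge_weights(call_sites):
    """Count call sites per (caller, callee) edge; the count is the edge weight."""
    return Counter(call_sites)


def expand(entry, blocks, weights):
    """Emit a block's instructions, then each callee once per unit of edge weight.

    Assumes the call structure has no cycles.
    """
    sequence = list(blocks[entry])
    for (src, dst), weight in sorted(weights.items()):
        if src == entry:
            for _ in range(weight):
                sequence.extend(expand(dst, blocks, weights))
    return sequence


# app3-like layout: the block at 0x100003f40 calls 0x100003ef0 twice.
blocks = {0x100003f40: ["setup"], 0x100003ef0: ["load", "add"]}
weights = edge_weights([(0x100003f40, 0x100003ef0), (0x100003f40, 0x100003ef0)])
sequence = expand(0x100003f40, blocks, weights)
```

Because the edge carries weight 2, the callee's instructions appear twice, which matches the shape of a sequence produced from code that simply repeats the same logic inline.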

FIG. 39 is a diagram for explaining yet another example of generating instruction sequences that contain feature information from the instructions within control blocks.

A fourth embodiment of generating instruction sequences that contain feature information from the instructions within control blocks is as follows.

The sample codes app1, app2, and app3 illustrated in this drawing are as described above.

Sample code app1 repeats the same code. Sample code app2 does not repeat the code but achieves the same behavior by having a function named fool1 call fool2. Sample code app3 calls the function fool2 twice.

Even when instruction sequences are generated from codes that perform the same logic, offsets differ from file to file, so the instruction sequences may differ depending on the operands of the instructions in each file.

As this drawing shows, the operands differ across the samples even for the same function.

Because of the operand values shown in the boxes in this drawing, the instruction sequence that is meant to represent features of cyber threat information can be affected.

Therefore, when instruction sequences that contain feature information are generated from the instructions within control blocks, the operands may be removed and the instruction sequence generated from the opcodes (OP-codes) alone.
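Operand removal can be sketched in a few lines. The function name `opcodes_only` and the two sample listings are hypothetical; the listings differ only in offsets and call targets.

```python
def opcodes_only(instructions):
    """Keep the mnemonic of each instruction and discard its operands."""
    return [ins.split(None, 1)[0] for ins in instructions if ins.strip()]


# Two hypothetical listings with the same logic but different offsets.
file_a = ["movl $5, -8(%rbp)", "callq 0x100003ef0", "retq"]
file_b = ["movl $5, -12(%rbp)", "callq 0x100004a10", "retq"]
```

Both listings reduce to the same opcode sequence, so file-specific offsets no longer affect the feature.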

FIG. 40 is a diagram for explaining yet another example of generating instruction sequences that contain feature information from the instructions within control blocks.

As a fifth embodiment of generating instruction sequences that contain feature information from the instructions within control blocks, when an instruction sequence is generated from the instructions within control blocks, instructions that merely pass parameters can act as noise in the logic flow.

In the control block of the sample code shown in this drawing, the function at 0x100003ef0 is called twice, and parameters are passed before each call.

Instructions that are involved only in parameter passing merely add noise when the control flow is generated and make no meaningful contribution to the actual feature information or to the corresponding instruction sequence, so they are excluded.
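The disclosure does not state how parameter-passing instructions are recognized. The sketch below uses one possible heuristic, which is purely an assumption: a `mov` into a System V AMD64 argument register that is followed directly by a call is treated as parameter passing. The name `drop_parameter_moves` is hypothetical.

```python
# Integer argument registers of the System V AMD64 calling convention
# (64-bit and 32-bit names). Using them here is an assumed heuristic.
ARG_REGS = (
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9",
    "%edi", "%esi", "%edx", "%ecx", "%r8d", "%r9d",
)


def drop_parameter_moves(instructions):
    """Drop moves into argument registers that directly precede a call."""
    kept, pending = [], []
    for ins in instructions:
        op = ins.split()[0]
        if op.startswith("mov") and ins.rstrip().endswith(ARG_REGS):
            pending.append(ins)  # candidate: decide once the next instruction is seen
        elif op.startswith("call"):
            pending.clear()      # the pending moves only set up this call
            kept.append(ins)
        else:
            kept.extend(pending)  # not followed by a call, so keep them
            pending.clear()
            kept.append(ins)
    kept.extend(pending)
    return kept


code = [
    "movl $1, %edi",
    "movl $2, %esi",
    "callq 0x100003ef0",
    "movl %eax, -4(%rbp)",
    "movl $3, %edi",
    "addl %edi, %eax",
]
result = drop_parameter_moves(code)
```

The two moves before the call are removed, while the later move into `%edi` is kept because it feeds an arithmetic instruction rather than a call.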

The foregoing has disclosed examples of generating an instruction sequence that corresponds to feature information of cyber threat information, based on the instructions contained in control blocks, when an executable file is disassembled into assembly code.

The examples above can be applied in combination, so an instruction sequence can be generated according to one or more of the five examples described above.

FIG. 41 discloses an example of generating instruction sequences according to the examples described above.

Combining the instructions in control blocks with regard to their properties, order, and references yields an instruction sequence that contains feature information such as cyber threat information.

One example of generating an instruction sequence in this way removes branch family instructions that cause the code to branch, such as JUMP and CALL, according to the reference relationships of the instructions within control blocks, and generates the instruction sequence along the control flow.

Another example removes instructions that are unrelated to the logic of the control flow when the stack frame is adjusted because code has been separated into functions, and then generates the instruction sequence.

Yet another example reflects edge weights in the control flow graph of the instructions. Instructions that are called multiple times are then represented in the generated instruction sequence according to their weight in the graph obtained from control flow analysis.

Yet another example removes the operands, because operands in disassembled code vary with file-specific offsets, and generates the instruction sequence from the opcodes (OP-codes) alone.

Yet another example excludes instructions that are involved only in parameter passing, because they make no meaningful contribution to the instruction sequence.

Applying at least one of these examples makes it possible to generate, based on the control flow within the disassembled control blocks, an instruction sequence that can contain feature information of cyber threat information.

An instruction sequence can be generated starting from the main code (0000000100003f60 <_main>) contained in each of the sample codes app1, app2, and app3 illustrated above.

The generated instruction sequence can be normalized and vectorized as disclosed above, and the vectorized content can then be converted into a hash code. The resulting hash code can contain unique feature information of the cyber threat information. Using the artificial intelligence techniques disclosed above, the attack technique and attack group can be identified from the cyber threat feature information contained in the hash code.
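The normalize, vectorize, and hash steps can be pictured with the sketch below. It does not reproduce the hashing scheme of the disclosure: opcode n-gram counts stand in for the feature vector and a SimHash-style fingerprint stands in for the fuzzy hash, both as assumptions. All function names are hypothetical.

```python
import hashlib
from collections import Counter


def ngrams(opcodes, n=3):
    """Sliding windows of n consecutive opcodes, joined into strings."""
    return [" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def simhash(opcodes, n=3, bits=64):
    """SimHash-style fingerprint of an opcode sequence (assumed stand-in)."""
    counts = Counter(ngrams(opcodes, n))  # the "vector": n-gram frequencies
    totals = [0] * bits
    for gram, weight in counts.items():
        digest = hashlib.sha256(gram.encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        for b in range(bits):
            totals[b] += weight if (h >> b) & 1 else -weight
    return sum(1 << b for b in range(bits) if totals[b] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


seq = ["movl", "addl", "movl", "addl", "xorl"]
fingerprint = simhash(seq)
```

Identical instruction sequences give identical fingerprints, and similar sequences tend to give fingerprints with a small Hamming distance, which is the property a similarity-preserving hash is used for.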

In this drawing, the row labeled CFG shows the graphs obtained from control flow analysis of sample codes app1, app2, and app3, respectively.

In this example, the graph for sample code app1 is expressed as 0:100003f60 -> 1:100003ed0, and the graph for sample code app2 is expressed as 0:100003f60 -> 1:100003f00 -> 2:100003ed0.

The graph for sample code app3 is expressed as 0:100003f60 -> 1:100003f40 -> 2:100003ef0. Here, an edge weight of 2 is reflected on the control flow 1:100003f40 -> 2:100003ef0.

Each of these control flow graphs is generated by applying at least one of the five examples illustrated above.

The row labeled Instruction Sequence shows the instruction sequences for sample codes app1, app2, and app3, respectively. Although sample codes app1, app2, and app3 are not identical, they produce the same result, and it can be seen that the instruction sequences generated by the methods illustrated above are all the same.

The last row, labeled Fuzzy Hash, shows the instruction sequences for sample codes app1, app2, and app3 converted into hash codes. The hash information of the control blocks of each sample code can serve as feature information.

As this example shows, sample codes app1, app2, and app3 differ slightly in their code but have the same meaning from the standpoint of cyber threat information. That is, the hash codes of sample codes app1, app2, and app3 are identical, and the feature information of the codes is accordingly identical.

FIG. 42 is a diagram illustrating another embodiment of the disclosed cyber threat information processing apparatus.

Another embodiment of the cyber threat information processing apparatus may include a server (2100) including a processor, a database (2200), and an intelligence platform (10000).

The database (2200) may store already classified malicious code or pattern codes of malicious code.

The processor of the server (2100) may execute a first execution module (18501) that disassembles an executable file received through an application programming interface (API) (1100) to obtain disassembled code.

The processor of the server (2100) may execute a second execution module (18503) that generates an instruction sequence based on the control flow defined by the relationships among the instructions in the disassembled code.

Examples of the process performed by the second execution module (18503) are illustrated in FIGS. 35 to 41.

The processor of the server (2100) may execute a third module (18505) that converts the generated instruction sequence into a feature data set related to cyber threat information. The feature data set may consist of feature vector data and hash function values.

The processor of the server (2100) may run an artificial intelligence engine (1230) and execute a fourth execution module (18507) that determines, based on the converted data set of the specific format, whether it is similar to the stored malicious code and, according to that determination, classifies the converted data set into at least one standardized attack identifier.

Examples of the process performed by the fourth execution module (18507) have been described with reference to FIGS. 19, 20, 21, 25, 26, and 27, among others.

FIG. 43 is a diagram illustrating another embodiment of the disclosed cyber threat information processing method.

Disassembled code is obtained by disassembling an executable file (S4100).

An instruction sequence is generated based on the control flow defined by the relationships among the instructions in the disassembled code (S4200).

Examples of obtaining an instruction sequence based on the control flow defined by the relationships among the instructions in the code are illustrated in detail in FIGS. 35 to 41.

The generated instruction sequence is converted into a feature data set related to cyber threat information (S4300).

The generated instruction sequences may be converted into feature vector data and then into hash function values. Examples of converting a code block containing an instruction sequence into vector data and hash function values have been disclosed in detail above; for example, the embodiments of FIGS. 21 to 24 may be used for the data conversion.

Cyber threat information is obtained by training an artificial intelligence model on the feature data set related to cyber threat information (S4400). Examples of classifying attack techniques or attack groups by training an artificial intelligence model on data containing cyber threat feature information have been disclosed in detail above; for example, the embodiments of FIGS. 25 to 28 may be applied to the learning model and the classification model.
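As a rough picture of how a hash-valued feature might be mapped to an attack identifier, the sketch below uses a nearest-neighbour lookup over labelled fingerprints. This lookup is a stand-in chosen for brevity and is not the learned model of the disclosure; the function name, the distance threshold, and the labels `technique_A` and `technique_B` are hypothetical.

```python
def classify(fingerprint, labelled, max_distance=8):
    """Return the label of the closest stored fingerprint, or None if too far.

    `labelled` maps stored integer fingerprints to attack identifiers.
    Distance is the number of differing bits (Hamming distance).
    """
    def distance(known):
        return bin(known ^ fingerprint).count("1")

    best = min(labelled, key=distance, default=None)
    if best is None or distance(best) > max_distance:
        return None
    return labelled[best]


# Hypothetical labelled fingerprints; real ones would come from analysed samples.
labelled = {0b1111: "technique_A", 0xF000: "technique_B"}
```

A trained classifier would replace the lookup in practice; the sketch only shows the input and output shape of the step, with a fingerprint going in and an identifier (or no match) coming out.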

Accordingly, patterns related to a specific attack identifier can be identified from code blocks generated by extracting only the instruction sequences that are involved in cyber threats. In addition, an accurate attack identifier can be determined probabilistically from the data associated with the selected attack identifier. The attack group can also be identified in the manner illustrated above.

The server can provide the obtained cyber threat information back to the user. By querying the API for information about an executable file, or by submitting the executable file, the user can obtain specific cyber threat information related to that file, such as detailed information on attack techniques and attack groups.

The embodiments disclosed above process cyber threat information by analyzing executable files for a system at the assembly language level.

The following discloses embodiments that identify and process cyber threat information from non-executable files. Recently, and particularly because of the COVID-19 pandemic, economic, social, educational, and other activities have shifted toward non-face-to-face formats, and numerous online platforms for online commerce, remote work, distance education, and the like have been expanding. As a result, the number of non-executable files shared online has increased, and attackers increasingly exploit this to carry out phishing attacks or APT (Advanced Persistent Threat) attacks through various non-executable files.

However, general users still lack awareness of non-executable malware, and existing antivirus products, having been developed for executable files, do not detect malicious non-executable files well. Moreover, even when a malicious non-executable file is detected, the reason for the detection is in most cases not adequately explained. It is therefore necessary both to detect malicious non-executable files and to present the basis for the detection. With this in mind, embodiments that identify and obtain cyber threat information from non-executable files are disclosed in detail below.

For reference, a non-executable file here means a file whose outward format is that of a non-executable file and which requires a separate executable program in order to be opened. Non-executable files are explained more precisely with reference to the drawings.

FIG. 44 is a diagram conceptually showing the structure of a non-executable file and a reader program for that file.

Non-executable files, typified by document files with extensions such as PDF or DOC, can embed within them text, scripts, media files such as images, and even other executable or non-executable files, as shown in this drawing.

As in the example of this drawing, a non-executable file can contain scripts, text, or media. A non-executable file may also contain an executable file or another non-executable file.

The contents of a non-executable file are viewed when an executable that can read the file (a non-executable file reader program) runs and loads it. A malicious non-executable file, while being loaded by the reader program (that is, while the reader program is running), can induce the reader program to perform actions such as the following.

When a malicious non-executable file is opened, a script containing malicious behavior may be executed, for example. The script may in turn connect to a malware distribution server and download and run the malware, or it may extract and run an embedded executable file that contains malicious behavior.

In addition, when a malicious non-executable file is opened, an embedded non-executable file containing malicious behavior may be extracted and opened, or a media file containing malicious behavior may be extracted and opened.

이하에서는 비실행형 악성파일을 탐지하고 그에 따른 공격 기법 및 공격 그룹을 식별할 수 있는 실시 예들을 개시한다. 개시하는 실시 예들은 인공 지능 모델을 활용하여 비실행형 파일에 대해 정상 또는 악성을 분류하거나, 비실행형 파일의 공격 그룹을 식별하거나 또는 비실행형 파일의 공격 행위를 식별할 수 있다.Hereinafter, embodiments that can detect non-executable malicious files and identify attack techniques and attack groups accordingly are disclosed. The disclosed embodiments may utilize an artificial intelligence model to classify non-executable files as normal or malicious, identify attack groups of non-executable files, or identify attack actions of non-executable files.

FIG. 45 is a block diagram of an embodiment for obtaining cyber threat information of a non-executable file.

This embodiment includes a file analysis unit 4300, a feature processing unit (Feature Fusion) 4400, a malicious detection unit (Malicious Document Detector) 4500, an attack technique classification unit (Attack Technique Classifier) 4610, and an attack group classification unit (Attack Group Classifier) 4620.

The file analysis unit 4300 receives a non-executable file (unknown document) and can analyze various kinds of cyber threat information in it.

The file analysis unit 4300 may include a first analysis unit 4310, a second analysis unit 4320, and a third analysis unit 4330, each of which can analyze feature information of the input non-executable file.

The feature processing unit 4400 extracts feature vectors from the feature information analyzed by the file analysis unit 4300 and converts them into a form suitable for the malicious detection unit 4500 to determine whether the file is malicious.

Based on artificial intelligence techniques, the malicious detection unit 4500 can detect whether the data converted from the input feature vectors includes malicious behavior. If the malicious detection unit 4500 determines that the input data contains no cyber threat information, it judges the file to be a normal document.

For data that the malicious detection unit 4500 has detected as malicious, the attack technique classification unit 4610 and the attack group classification unit 4620 can classify, based on artificial intelligence techniques, the attack technique (e.g., T1204.001) and the attack group (e.g., G001), respectively, according to a cyber threat information framework.

This example illustrates that, according to the cyber threat information framework, the attack behavior contained in the non-executable file corresponds to the attack technique T1204.001 and that the group that created the attack behavior is the attack group G001.

The illustrated blocks may be implemented in hardware, or implemented in software and each executed by a processor of a server. Detailed examples of each part of the block diagram are described below.
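
As a reading aid, the data flow of the block diagram can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names `ThreatReport` and `process_document` are hypothetical, the analyzers and models are passed in as plain callables, and feature fusion is reduced to simple concatenation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

Vector = List[float]


@dataclass
class ThreatReport:
    """Hypothetical result record; not a structure defined in the source."""
    malicious: bool
    techniques: List[str] = field(default_factory=list)
    group: Optional[str] = None


def process_document(
    data: bytes,
    analyzers: Sequence[Callable[[bytes], Vector]],        # stand-ins for units 4310/4320/4330
    detector: Callable[[Vector], bool],                    # stand-in for unit 4500
    technique_classifier: Callable[[Vector], List[str]],   # stand-in for unit 4610
    group_classifier: Callable[[Vector], str],             # stand-in for unit 4620
) -> ThreatReport:
    # Feature fusion (unit 4400) is assumed here to be plain concatenation.
    fused: Vector = []
    for analyze in analyzers:
        fused.extend(analyze(data))

    # A file judged not malicious is reported as a normal document.
    if not detector(fused):
        return ThreatReport(malicious=False)

    # Only files detected as malicious reach the two classifiers.
    return ThreatReport(
        malicious=True,
        techniques=technique_classifier(fused),
        group=group_classifier(fused),
    )
```

The classifiers are only invoked after a positive detection, which mirrors the ordering described above; real models would replace the callables.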

FIG. 46 illustrates an example in which a unit included in the file analysis unit, among the examples for obtaining cyber threat information of a file, performs a first type of analysis on the file.

The first analysis unit 4310 analyzes the input file itself; for convenience, this is described here as a kind of static analysis.

The first analysis unit 4310 performs static analysis such as extracting and analyzing malicious payloads and scripts contained inside the document of a non-executable file, and identifying hidden attachments or malicious data disguised as other files.

The first analysis unit 4310 performs a static feature extraction step, a static feature processing step, and a static feature conversion step. When implemented in hardware, the first analysis unit 4310 may include a static feature extraction unit 4312, a static feature processing unit 4315, and a static feature conversion unit 4317.

Based on static analysis, the first analysis unit 4310 can separate out files contained inside a non-executable file, for example a document, and analyze the separated files. It can also extract hidden malicious payloads in the non-executable file and scripts that can execute them, as well as information about the format of the document.

For example, the static feature extraction unit 4312 can extract URI information (URIs), scripts, embedded files, action-related information (actions), textual contents, and document metadata from inside the non-executable file.

From the embedded files, for example, the static feature extraction unit 4312 can extract image files (Images) or attachments of various other formats (Attachments).

The static feature processing unit 4315 can process the static feature information (URIs, scripts, embedded files, actions, etc.) extracted by the static feature extraction unit 4312, performing additional analysis and processing appropriate to each kind of static feature information.

The static feature processing unit 4315 can subdivide and process the extracted information so that the feature information used to distinguish attack techniques and identify attack groups reflects the attacker's intent.

For example, the static feature processing unit 4315 can parse a URI with a URI parser to obtain URI metadata. From this it can determine the attacker's intent, such as inducing the download of a malicious file for secondary infection or inducing a connection from the document to an external phishing website.
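
A minimal sketch of what URI metadata extraction could look like, using only the Python standard library. The function name `uri_metadata` and the particular fields returned are assumptions chosen for illustration; the source does not specify which URI attributes are used.

```python
import ipaddress
import posixpath
from urllib.parse import urlsplit


def uri_metadata(uri: str) -> dict:
    """Return a few illustrative metadata fields for one URI (hypothetical field set)."""
    parts = urlsplit(uri)
    host = parts.hostname or ""

    # A raw IP address instead of a domain name is one possible signal of interest.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    # The extension of the requested path hints at what would be downloaded.
    extension = posixpath.splitext(parts.path)[1].lower()

    return {
        "scheme": parts.scheme,
        "host": host,
        "port": parts.port,
        "host_is_ip": host_is_ip,
        "path_extension": extension,
        "has_query": bool(parts.query),
    }
```

For instance, a URI whose path ends in `.exe` suggests a download-oriented intent, while a login-style path on an unfamiliar host may suggest phishing; how such fields are weighted is left to the downstream model.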

The static feature processing unit 4315 can obtain script metadata by analyzing the extracted scripts, and from this can learn which scripting language the attacker prefers for exploiting vulnerabilities or carrying out malicious behavior.

The static feature processing unit 4315 can check the embedded files for hidden payload identifiers and obtain the payload type of each embedded file, and from this can learn which technique the attacker applies to conceal the malicious payload.

The static feature processing unit 4315 can also check the type of each attached file among the embedded files to determine its true file type, and from this can learn what data the attacker included as attachments inside the document and what was disguised.
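
One common way to determine a true file type is to compare the leading "magic" bytes of the content with the type implied by the file name. The sketch below assumes this approach; the source does not state how the true file type is determined, and the signature table, the extension table, and the function names are illustrative only.

```python
import os

# Well-known leading byte signatures (a small illustrative subset).
MAGIC_SIGNATURES = [
    (b"MZ", "pe-executable"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-container"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole-compound"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]

# Type that each file extension claims to be (hypothetical mapping).
EXTENSION_TYPES = {
    ".exe": "pe-executable",
    ".pdf": "pdf",
    ".zip": "zip-container",
    ".docx": "zip-container",
    ".doc": "ole-compound",
    ".png": "png",
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
}


def true_file_type(data: bytes) -> str:
    """Identify the content type from its leading bytes, or 'unknown'."""
    for magic, name in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def is_disguised(filename: str, data: bytes) -> bool:
    """True when the extension claims one type but the content is another."""
    declared = EXTENSION_TYPES.get(os.path.splitext(filename)[1].lower())
    actual = true_file_type(data)
    return declared is not None and actual != "unknown" and declared != actual
```

An attachment named `invoice.png` whose content starts with `MZ` would be flagged, which is the kind of disguise the paragraph above refers to.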

The static feature processing unit 4315 can classify the various actions contained in the non-executable file and obtain action metadata, and from this can learn which actions or techniques are used to trigger malicious behavior.

In this way, the static feature processing unit 4315 can obtain attacker intent information from the various kinds of extracted static analysis information. It can also obtain information such as which files are contained in the non-executable file in an abnormal form and whether such a file is a script.

The static feature conversion unit 4317 converts the static feature information produced by the static feature processing unit 4315. For example, it normalizes or vectorizes the static feature information, as described above, so that the feature processing unit 4400 can process cyber threat information based on it.

FIG. 47 illustrates an example in which a unit included in the file analysis unit, among the examples for obtaining cyber threat information of a file, performs a second type of analysis on the file.

The second analysis unit 4320 can extract cyber threat information by analyzing the non-executable file based on dynamic analysis. The second analysis unit 4320 can open the non-executable file in a corresponding program, such as a reader program, and extract information on the behavior that occurs during actual execution.

Hereinafter, for convenience, the second analysis unit 4320 is described as performing a dynamic analysis step.

For dynamic analysis of a non-executable file, the second analysis unit 4320 builds a safely isolated virtual environment and runs the corresponding program for the non-executable file in that environment.

The second analysis unit 4320 can analyze which parameters are used when a process created by opening the non-executable file in the corresponding program invokes a system call.

The second analysis unit 4320 performs an execution step, a dynamic feature extraction step, and a feature conversion step. When implemented in hardware, the second analysis unit 4320 may include an execution unit 4322, a dynamic feature extraction unit 4325, and a dynamic feature conversion unit 4327.

The sandbox reader (Sandbox Document Reader) of the execution unit 4322 opens the input non-executable file with the corresponding program in the virtual environment.

The system call analysis unit (System Call Hooking) of the execution unit 4322 monitors whether processes spawned by the running program invoke specific system calls, and can thereby analyze which parameters the behavior is performed with.

Based on dynamic analysis, the system call analysis unit (System Call Hooking) of the execution unit 4322 can obtain the monitored system calls and the parameter data that can be extracted for each of them.

For example, when the Send API is called while the program is running, the system call analysis unit (System Call Hooking) of the execution unit 4322 can analyze the corresponding packet data and obtain system call parameter information, such as which packet data is transmitted over the network and how much.

The system call analysis unit (System Call Hooking) of the execution unit 4322 can trace back the stack of the system calls executed by the reader program of the non-executable file and analyze the resulting trace information. This trace information includes the execution order of the functions leading to each system call and the variables used by those functions.

A detailed embodiment of the system call analysis unit (System Call Hooking) is described further below.

The dynamic feature extraction unit 4325 can extract and collect the results of execution by the execution unit 4322 in the virtual environment. For example, it can collect the various commands issued as a script runs, and the communication type, IP address, and port number of network connections made as the reader program runs.

The dynamic feature extraction unit 4325 can collect the packet data downloaded while the reader program runs, or collect information on the path of the target file or the packet contents from the packet payloads.

As another example, the dynamic feature extraction unit 4325 may obtain information on the programs that are launched when a file is executed or opened and on their target files.

The dynamic feature conversion unit 4327 converts the information collected or extracted by the dynamic feature extraction unit 4325. For example, it normalizes or vectorizes the feature information extracted by the dynamic feature extraction unit 4325 so that cyber threat information can be processed based on it.

FIG. 48 illustrates the targets extracted and the information obtained through dynamic execution of a non-executable file in the second type of analysis according to an embodiment.

When a non-executable file is opened with a reader program, the program may perform various actions. This figure illustrates script execution/opening, server connection, download, file extraction, and file execution/opening as categories of performed actions, but many other actions are possible.

When a script is executed as a result of running the reader program on a non-executable file, functions such as WinExec and System may be called through the system call API (System Call API). Calling these functions can execute a command-line command; here, execution of powershell.exe is shown as an example.

When the reader program of a non-executable file connects to another server, Socket may be called through the system call API (System Call API); here, AF_INET is shown as an example of the resulting communication-type parameter. When Connect is called through the system call API, the port number can also be obtained as a parameter.

As further illustrated, when a non-executable file is opened with a reader program, functions such as Send, SendTo, Recv, RecvFrom, Fopen, Fwrite, CreateFile, WriteFile, CreateProcess, and ShellExecute may be called through the system call API (System Call API), depending on the category of the performed action. Examples of the parameters that can be extracted for each system call API function are shown in the right-hand section of the figure.
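
The relationship between hooked functions, action categories, and captured parameters can be sketched as a lookup table plus a summarizing pass over the recorded calls. This is an illustrative sketch only: the assignment of each function to a category is an assumption inferred from the text (the figure itself is not reproduced here), and `API_CATEGORY`, `summarize_calls`, and the parameter names in the usage example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumed mapping from hooked function name to action category.
API_CATEGORY: Dict[str, str] = {
    "WinExec": "script_execution",
    "System": "script_execution",
    "Socket": "server_connection",
    "Connect": "server_connection",
    "Send": "download",
    "SendTo": "download",
    "Recv": "download",
    "RecvFrom": "download",
    "Fopen": "file_extraction",
    "Fwrite": "file_extraction",
    "CreateFile": "file_extraction",
    "WriteFile": "file_extraction",
    "CreateProcess": "file_execution",
    "ShellExecute": "file_execution",
}

CallRecord = Tuple[str, Dict[str, object]]  # (function name, captured parameters)


def summarize_calls(calls: Iterable[CallRecord]) -> Dict[str, object]:
    """Group recorded calls by category, keeping counts and raw parameters."""
    counts: Counter = Counter()
    params: Dict[str, List[Dict[str, object]]] = {}
    for name, args in calls:
        category = API_CATEGORY.get(name)
        if category is None:
            continue  # functions outside the monitored list are ignored
        counts[category] += 1
        params.setdefault(category, []).append(args)
    return {"counts": dict(counts), "params": params}


# Usage with made-up records standing in for a sandbox trace.
trace = [
    ("WinExec", {"command": "powershell.exe"}),
    ("Socket", {"family": "AF_INET"}),
    ("Connect", {"port": 443}),
    ("Sleep", {"ms": 1000}),
]
summary = summarize_calls(trace)
```

The per-category counts and parameters would then be passed to the dynamic feature conversion step for normalization or vectorization.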

FIG. 49 illustrates an example in which a unit included in the file analysis unit, among the examples for obtaining cyber threat information of a file, performs a third type of analysis on the file.

The third analysis unit 4330 obtains features of cyber threat information for a non-executable file based on information stored in memory at the execution preparation stage. Because it analyzes data in memory immediately before dynamic execution in the virtual environment, the third analysis unit 4330 is hereinafter described, for convenience, as performing a mild dynamic analysis step.

When performing the mild dynamic analysis step, the third analysis unit 4330 can extract and analyze opcode and operand information held in memory, or deobfuscated malicious payload data, at the stage where malicious behavior is being prepared as the file is executed.

The third analysis unit 4330 does not extract the parameters generated while the dynamic analysis described above is running. Instead, immediately before dynamic execution in the virtual environment, it applies API hooking to key system functions that malicious behavior necessarily involves, places the process in a suspended state when such a function is called, and dumps the information loaded in memory at that moment.

To this end, the third analysis unit 4330 performs an execution preparation step, a memory extraction step, a data extraction step, and a feature conversion step. When implemented as separate hardware, the third analysis unit 4330 may include an execution preparation unit 4331, a memory extraction unit 4333, a data extraction unit 4335, and a feature conversion unit 4337.

The third analysis unit 4330 can obtain malicious payload data from memory and analyze it, based on information from the stage at which malicious behavior is being prepared.

In the execution preparation step, the execution preparation unit 4331 prepares the non-executable file (target file) and the reader program (application) in user space. In kernel space, it can prepare the file system, network system, or memory for the events that occur when the reader application runs.

The execution preparation unit 4331 also holds API hooking list information so that key system functions can be hooked immediately before the application runs. Detailed API hooking list information is illustrated in a later figure.

When a function on the API hooking list is called, the memory extraction unit 4333 suspends the process and dumps the data stored in memory at that moment to extract information. From the data captured just before the function executes, the memory extraction unit 4333 can obtain analysis information that may constitute cyber threat information.

The data extraction unit 4335 can obtain opcodes, operand data, and deobfuscated data from the data dumped from memory by the memory extraction unit 4333.

For example, the data extraction unit 4335 can disassemble the data dumped from memory by the memory extraction unit 4333 and classify the disassembled data into opcodes, operand data, deobfuscated data, and so on.

Here, the data extraction unit 4335 obtains the data to be analyzed not from the entire executable file but as converted data covering the opcodes, operand data, and deobfuscated data that correspond to the functions on the API hooking list.

The feature conversion unit 4337 normalizes or vectorizes the obtained opcodes, operand data, and deobfuscated data so that cyber threat information can be processed based on them.
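
One simple form of normalization and vectorization is a relative-frequency vector over a fixed opcode vocabulary. The sketch below assumes that approach purely for illustration; the source does not specify the encoding, and `OPCODE_VOCAB` and `opcode_vector` are hypothetical names. The input is assumed to be a list of mnemonics already produced by a disassembler.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical fixed vocabulary; a real system would choose its own.
OPCODE_VOCAB = ("mov", "push", "pop", "call", "jmp", "xor", "add", "sub")


def opcode_vector(
    mnemonics: Sequence[str],
    vocab: Sequence[str] = OPCODE_VOCAB,
) -> List[float]:
    """Map a mnemonic sequence to relative frequencies over the vocabulary."""
    counts = Counter(m.lower() for m in mnemonics)
    total = sum(counts[op] for op in vocab)
    if total == 0:
        # No known opcode present: return an all-zero vector.
        return [0.0] * len(vocab)
    # Mnemonics outside the vocabulary are ignored; the rest sum to 1.
    return [counts[op] / total for op in vocab]
```

Because the vector length is fixed by the vocabulary, dumps of different sizes yield comparable inputs for the downstream models.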

FIG. 50 illustrates API hooking list information used when the third analysis unit performs mild dynamic analysis according to an embodiment.

The illustrated API hooking list information shows API categories in the left column and, in the right column, the APIs in each category that may be included in the API hooking list.

Windows OS Native API, HTML DOM Parser API, and VBS Script Engine API are shown as example API categories.

For the Windows OS Native API category, APIs that can be used for API hooking are shown; for the HTML DOM Parser API category, 7 APIs are shown; and for the VBS Script Engine API category, 11 APIs are shown.

FIG. 51 illustrates the feature processing unit in an embodiment for obtaining cyber threat information of a non-executable file.

As described, the first analysis unit 4310 and the second analysis unit 4320 can obtain and analyze static feature information and dynamic feature information, respectively, for a non-executable file.

Meanwhile, the third analysis unit 4330 can obtain and analyze cyber threat information of the non-executable file from the memory contents captured by hooking the APIs of the application that runs in connection with the non-executable file in the virtual environment. In the disclosed embodiment, the analysis by the third analysis unit 4330 is referred to as mild dynamic analysis.

The feature processing unit 4400 can selectively collect and process the static feature information, dynamic feature information, and mild dynamic feature information extracted by the first analysis unit 4310, the second analysis unit 4320, and the third analysis unit 4330, respectively.

The malicious detection unit 4500 can determine, based on the information processed by the feature processing unit 4400, whether the non-executable file contains cyber threat information.

The attack technique classification unit 4610 can classify in detail, according to a specific framework, the attack behavior or attack technique of the cyber threat information detected by the malicious detection unit 4500.

The attack group classification unit 4620 can classify by whom the attack behavior of the cyber threat information detected by the malicious detection unit 4500 was planned or carried out.

The feature processing unit 4400 may generate feature information using one of the static feature information, dynamic feature information, and mild dynamic feature information, or by combining two or more of them.

The feature processing unit 4400 generates feature information by selectively combining the extracted information according to the characteristics of the extracted static, dynamic, and mild dynamic feature information, or in consideration of the classification model for attack techniques or attack groups.

For example, the feature information used to classify attack techniques may differ from that used to classify attack groups, or the feature information may be combined with each piece assigned a different importance. This is illustrated in detail in a later figure.

Accordingly, the feature processing unit 4400 can use at least one of the extracted static, dynamic, and mild dynamic feature information, either selectively or in combination.

For example, if only the mild dynamic feature information, unlike the static and dynamic feature information, carries assembly-code-level information, the mild dynamic feature information may be left out of the attack group classification model.

In that case, the malicious detection unit 4500 or the attack technique classification unit 4610 detects maliciousness or classifies attack techniques using all of the static, dynamic, and mild dynamic feature information, while the attack group classification unit 4620 separately classifies attack groups using only the static and dynamic feature information.
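
The selective combination described in this example can be sketched as a per-task table of which feature sources are concatenated. The table below encodes only the example case given in the text; the names `FEATURE_SETS` and `fuse` are hypothetical, and plain concatenation is an assumed stand-in for whatever fusion the embodiment actually uses.

```python
from typing import Dict, List, Tuple

# Which feature sources each model consumes, following the example in the text.
FEATURE_SETS: Dict[str, Tuple[str, ...]] = {
    "detector": ("static", "dynamic", "mild_dynamic"),
    "technique": ("static", "dynamic", "mild_dynamic"),
    "group": ("static", "dynamic"),  # mild dynamic features left out
}


def fuse(features: Dict[str, List[float]], task: str) -> List[float]:
    """Concatenate the feature vectors selected for the given task."""
    fused: List[float] = []
    for source in FEATURE_SETS[task]:
        # A source that produced no features contributes nothing.
        fused.extend(features.get(source, []))
    return fused


# Usage with toy vectors.
example = {"static": [0.1, 0.2], "dynamic": [0.3], "mild_dynamic": [0.4, 0.5]}
technique_input = fuse(example, "technique")
group_input = fuse(example, "group")
```

Keeping the selection in a table makes it straightforward to change which sources feed each model without touching the analyzers.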

Because the extracted feature information differs in importance and characteristics, malicious detection, attack technique classification, and attack group classification can each be performed on feature information selected or combined accordingly.

Meanwhile, the malicious detection unit 4500 determines whether a non-executable file is malicious based on a machine learning model. For example, when the feature processing unit 4400 has processed at least one of the static, dynamic, and mild dynamic feature information, the malicious detection unit 4500 can detect maliciousness based on the feature vector data corresponding to that feature information.

An example of determining maliciousness based on feature vector data was described in detail above.

FIG. 52 is an example diagram comparing the importance of feature information extracted from a non-executable file according to the disclosed embodiment.

In this example graph, the horizontal axis is the feature index and the vertical axis is the importance score. The feature importance for the attack group model (Group model) and for the attack technique identification model (TID model) peaks at different feature indices.

As explained above, this means that the feature information that indicates the attack technique and the feature information that indicates the attack group have different characteristics.

Accordingly, the feature processing unit 4400 can, depending on the characteristics of the feature information, select the static, dynamic, and mild dynamic feature information differently or combine them selectively for malicious detection, attack technique classification, and attack group classification, so that each detection or classification model runs on an appropriate input.

FIG. 53 is an example diagram for explaining the classification model of the attack technique classification unit according to the disclosed embodiment.

This figure shows an example in which the attack technique classification unit according to the embodiment classifies attack techniques and outputs the result.

As disclosed, when a non-executable file is determined to be malicious because it contains cyber threat information, the attack technique classification unit runs a machine learning model on the cyber threat feature vector data output by the feature processing unit and classifies the attack techniques of the non-executable file.

When the attack technique classification unit classifies attack techniques using a machine learning model, the model can be trained using the class labels of the training data as the ground truth. The training data includes independent variables, which are the feature vector data, and a dependent variable, which is the class label.

In general, the dependent variable can be a single integer value (single label) in which the class label represents one index number.

However, since one file can contain several attack techniques, the attack technique classification unit can use a multi-label technique that defines the dependent variable as a vector of T elements rather than as a single integer value. In other words, the attack technique classification unit receives feature vector data and, as a multi-label classification, classifies it into a binary vector whose elements correspond to attack techniques.

As a multi-output classification model, the attack technique classification unit can train a binary classification model for each class label, producing T classification models, where T is the number of attack techniques that can be classified.
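The one-binary-model-per-label scheme can be sketched in plain Python as below. The tiny logistic-regression class `BinaryModel`, the helpers `fit_binary_relevance` and `predict`, and the toy data are assumptions chosen for illustration; the embodiment does not state which binary classifier is used.

```python
import math
from typing import List


class BinaryModel:
    """Small logistic-regression stand-in for one per-technique model."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict_proba(self, x: List[float]) -> float:
        z = sum(wj * xj for wj, xj in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X: List[List[float]], y: List[int],
            lr: float = 0.5, epochs: int = 500) -> None:
        for _ in range(epochs):
            for x, target in zip(X, y):
                error = self.predict_proba(x) - target  # log-loss gradient
                self.w = [wj - lr * error * xj for wj, xj in zip(self.w, x)]
                self.b -= lr * error


def fit_binary_relevance(X: List[List[float]], Y: List[List[int]]) -> List[BinaryModel]:
    """Train T independent binary models, one per column of Y."""
    models = []
    for i in range(len(Y[0])):
        model = BinaryModel(len(X[0]))
        model.fit(X, [row[i] for row in Y])
        models.append(model)
    return models


def predict(models: List[BinaryModel], x: List[float]) -> List[float]:
    """Return one probability per technique for the input vector x."""
    return [m.predict_proba(x) for m in models]


# Toy data: label i is present exactly when feature i is 1.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
Y = [[1, 0], [0, 1], [1, 1], [0, 0]]
models = fit_binary_relevance(X, Y)
scores = predict(models, [1.0, 0.0])
```

Because each label has its own model, a file can receive a high score for several techniques at once, which a single-label classifier cannot express.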

Expressed simply as a formula, the predicted value $\hat{y}$, a T-dimensional vector, and the predicted value $o_i$ of the i-th attack technique classification model $f_i$ for an input vector $x$ can be defined as follows.

$$\hat{y} = [o_1, o_2, \ldots, o_T], \qquad o_i = f_i(x), \quad i = 1, \ldots, T$$

Under single-label classification, the class label (the dependent variable) would be the one attack technique identified as T1059.005. Under the multi-label scheme described above, it can be expressed as a multidimensional vector such as [1, 1, 0] over the attack technique identifiers T1059.005, T1564.007, and T1204.002.
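The label encoding in this example can be sketched as follows. The helper names `encode_labels` and `decode_labels` are hypothetical; only the three technique identifiers and the vector [1, 1, 0] come from the text.

```python
from typing import Iterable, List, Sequence

# Label order taken from the example in the text.
TECHNIQUE_IDS = ("T1059.005", "T1564.007", "T1204.002")


def encode_labels(observed: Iterable[str],
                  technique_ids: Sequence[str] = TECHNIQUE_IDS) -> List[int]:
    """Turn a set of technique IDs into a binary vector of length T."""
    observed_set = set(observed)
    return [1 if tid in observed_set else 0 for tid in technique_ids]


def decode_labels(vector: Sequence[int],
                  technique_ids: Sequence[str] = TECHNIQUE_IDS) -> List[str]:
    """Recover the technique IDs whose position in the vector is set to 1."""
    return [tid for tid, flag in zip(technique_ids, vector) if flag == 1]


label_vector = encode_labels(["T1059.005", "T1564.007"])
```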

As shown at the bottom of this figure, the attack technique classification unit can output a probability for each of the three attack techniques.

FIG. 54 is a diagram illustrating attack techniques identified by selectively combining several analysis techniques for a non-executable file according to the disclosed embodiment.

This figure lists the identifier (technique ID), the name, and the description of each attack technique.

For example, the attack technique identifier T1059.001 is named Command and Scripting Interpreter: PowerShell, and denotes an attack technique in which a non-executable file performs malicious behavior using a PowerShell script.

The attack technique identifier T1059.005 used in the example above is named Command and Scripting Interpreter: Visual Basic, and denotes an attack technique in which a non-executable file performs malicious behavior using the Visual Basic programming language.

FIG. 55 is an example diagram for explaining the classification model of the attack group classification unit according to the disclosed embodiment.

Unlike the embodiment illustrated in FIGS. 27 and 28, the attack group classification unit can classify attack groups based on a classification model.

The attack group classification unit can classify the attack group that intended the attack based on the feature vector data output by the feature processing unit.

As an example, the attack group classification unit can perform clustering analysis on the feature vector data and group data with similar characteristics into one group.

The attack group classification unit can assign clustering identification information to each of the groups clustered according to the structure of the document extracted from the non-executable file, its content, the attachments used for the attack, the form of the malicious data, and so on.

The attack group classification unit can then train a decision tree model on the training data labeled with the assigned clustering identification information (or grouping identification information), so that the model classifies the clustered groups.

The example in this figure shows a decision tree that identifies which features distinguish the groups according to the clustering identification information (or grouping identification information).

The top box represents the root node. The root node, which holds the clustering identification information, is split sequentially from decision nodes into sub-nodes according to the various features of the non-executable or executable file, yielding the tree structure of the trained decision tree model.

Here, the decision nodes and sub-nodes are also each drawn as boxes.

When the attack group classification unit classifies attack groups, it can obtain the clustering together with group profiling information for each group. For example, the attack group classification unit can provide group profiling analysis information covering several conditions, such as the language of the text in the document, the type of content in the document, whether the document contains a particular script, and whether it contains an action that is performed automatically when the document is executed.

The example in this figure shows the attack group classification unit classifying groups using a tree structure; after the sixth split, the final leaf nodes form a classification model that can distinguish the groups from one another.

The final leaf nodes of this tree can serve as group profiling information that distinguishes the groups, for example whether the text of the document is in English, whether metadata is included and how long it is, or whether the document includes content.

For example, the group profiling information can include: (1) the text in the document is in English, (2) the document contains no media content, (3) the document contains JavaScript, and (4) the document has an action that is performed automatically when it is executed.
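How a path through such a tree yields both a group and its profile can be sketched as below. The tree here is hand-built rather than trained, and the feature names (`text_is_english`, `has_media`, `has_javascript`, `has_auto_action`) and cluster IDs are assumptions modeled on the four example conditions; the tree in FIG. 55 is not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A decision node (feature set) or a leaf node (group_id set)."""
    feature: Optional[str] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    group_id: Optional[str] = None


def classify(node: Node, doc: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Walk from the root to a leaf; return the group and the path conditions."""
    profile: List[str] = []
    while node.group_id is None:
        value = bool(doc.get(node.feature, False))
        profile.append(f"{node.feature}={value}")
        node = node.if_true if value else node.if_false
    return node.group_id, profile


# Hypothetical tree: each leaf stands for one clustering ID.
tree = Node(
    "text_is_english",
    if_true=Node(
        "has_media",
        if_true=Node(group_id="cluster_2"),
        if_false=Node(
            "has_javascript",
            if_true=Node(
                "has_auto_action",
                if_true=Node(group_id="cluster_0"),
                if_false=Node(group_id="cluster_1"),
            ),
            if_false=Node(group_id="cluster_1"),
        ),
    ),
    if_false=Node(group_id="cluster_3"),
)

doc = {"text_is_english": True, "has_media": False,
       "has_javascript": True, "has_auto_action": True}
group, profile = classify(tree, doc)
```

The list of conditions collected along the path plays the role of the group profiling information: it states which feature values led to the assigned group.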

The following describes a detailed embodiment of the system call analysis unit (System Call Hooking) of the dynamic analysis disclosed above. As disclosed above, there are cases in which whether a non-executable file is malicious is determined based on static analysis features.

However, static analysis features alone often cannot explain in detail why a non-executable file is judged to contain malicious behavior or how that behavior occurs. By running the reader program and loading the non-executable file, the process by which the malicious behavior occurs can be identified accurately and explained.

When the reader program associated with a non-executable file runs, it operates through a combination of system calls provided by the operating system.

When the reader program runs on the Windows operating system, system calls such as the following can be used.

FIG. 56 is a diagram illustrating the execution of the reader program for a non-executable file and the system calls described above.

A non-executable file can contain scripts, media files, executable files, other non-executable files, text, and so on. The non-executable file can be executed by its corresponding reader program. If the reader program runs on the Windows operating system, the various system calls illustrated in this figure can be used depending on what the non-executable file contains.

For example, when a script in the non-executable file is executed, system calls such as WinExec, CreateProcess, and ShellExecute are used, and when a connection to a server is made, system calls such as Socket and connect are used. When a download is performed as a result of executing the non-executable file, system calls such as send, sendto, recv, and recvfrom can be used. When a file is extracted, system calls such as fopen, fwrite, CreateFile, and WriteFile can be used; when a file is executed, WinExec, CreateProcess, and system; and when a file is opened, ShellExecute and system.
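The correspondence between behaviors and system calls listed above can be written as a lookup table. In the sketch below the call names are taken from the text, while the behavior labels, the table name `BEHAVIOR_CALLS`, and the function `candidate_behaviors` are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Iterable, List

# Behavior -> system calls, as enumerated in the description of FIG. 56.
BEHAVIOR_CALLS: Dict[str, FrozenSet[str]] = {
    "script execution": frozenset({"WinExec", "CreateProcess", "ShellExecute"}),
    "server connection": frozenset({"Socket", "connect"}),
    "download": frozenset({"send", "sendto", "recv", "recvfrom"}),
    "file extraction": frozenset({"fopen", "fwrite", "CreateFile", "WriteFile"}),
    "file execution": frozenset({"WinExec", "CreateProcess", "system"}),
    "file open": frozenset({"ShellExecute", "system"}),
}


def candidate_behaviors(hooked_calls: Iterable[str]) -> List[str]:
    """List every behavior consistent with at least one hooked call."""
    seen = set(hooked_calls)
    return [name for name, calls in BEHAVIOR_CALLS.items() if seen & calls]


network_behaviors = candidate_behaviors(["connect", "recv"])
```

Several calls appear under more than one behavior (WinExec, for instance), so a single hooked call only narrows the candidates; the parameters and call context are needed to decide between them.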

These system calls made by the reader program can be hooked at the moment they are called (marked as point A in the figure).

When a system call is hooked at point A, the parameter values passed to the system call and the associated memory contents can be obtained by dumping them.

Although only the Windows operating system is illustrated here, the same embodiment can be applied to other operating systems, such as mobile operating systems or Linux.

FIG. 57 is a diagram for explaining an example of hooking a system call in program code according to an embodiment.

In this figure, the send call has the function signature shown.

In this program code, the information transmitted by the call can be determined by dumping the memory data indicated by [buf] and [len].

In this way, dumping the parameter values and memory contents passed in the system calls made by the reader program of a non-executable file reveals what actions the malicious behavior triggers and what information it uses.
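The idea of intercepting a call and dumping its parameters can be illustrated with a Python-level analogy. This is not OS-level hooking: the decorator `hook`, the log `HOOK_LOG`, and the stand-in `send` function are assumptions for illustration, with `send` only mirroring the parameter order of the Winsock `send(s, buf, len, flags)` call.

```python
import functools
from typing import Any, Callable, Dict, List

HOOK_LOG: List[Dict[str, Any]] = []


def hook(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so that each call records its arguments before running."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        HOOK_LOG.append({"call": func.__name__, "args": args, "kwargs": kwargs})
        return func(*args, **kwargs)

    return wrapper


@hook
def send(sock: int, buf: bytes, length: int, flags: int = 0) -> int:
    """Stand-in for a network send; returns the number of bytes 'sent'."""
    return min(length, len(buf))


sent = send(3, b"GET /payload HTTP/1.1\r\n", 23)

# Dump the first `len` bytes of `buf` captured at the hook point.
_, buf_arg, len_arg = HOOK_LOG[-1]["args"]
dumped = buf_arg[:len_arg]
```

The dumped bytes are what the caller was about to transmit, which corresponds to reading the memory indicated by [buf] and [len] at the hook point.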

FIG. 58 discloses an example of tracking cyber threat information through dynamic analysis according to an embodiment.

In the embodiment, when a reader program on a particular operating system uses a system call, stack trace information of the reader program can be generated at the hooking point.

The example in this figure shows the process of obtaining the order of the malicious actions, and their content from the related variables, through stack trace information generated after hooking the system call WinExec on the Windows operating system.

The stack trace at the point where the final step, the WinExec system call, is hooked is as follows. According to the generated stack trace information, before the WinExec system call the functions were called in the order main -> find_lastest_target -> get_script.

To the right of each box containing a function in this figure, the local variables used by that function are shown. For example, the find_lastest_target function uses count and targets as local variables.

Finally, the WinExec system call was made in the get_script function. Thus, when malicious behavior occurs, its specific mechanism can be explained using the stack trace information.

That is, following the reverse order of the calling functions related to the system call in the stack trace information, the following explanation can be provided.

(1) An attempt is made to execute the suspicious command lpCmdLine through the system call WinExec

(2) Through the reader program, the functions are executed in the order main -> find_lastest_target -> get_script

(3) The local variables of each function are set as follows, with a description of each local variable

(a) main:

target_list -- description of the local variable

(b) find_lastest_target:

count -- description of the local variable

targets -- description of the local variable

(c) get_script:

script_src -- description of the local variable

cmd -- description of the local variable
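The explanation above can be sketched with Python's own call stack standing in for the reader program's stack. The function names and local variables follow the example; the stand-in `win_exec`, the helpers `capture_stack` and `explain`, the extra parameter `target`, and the sample values are assumptions, and no command is actually executed.

```python
import inspect
from typing import Any, Dict, List

TRACES: List[Dict[str, Any]] = []


def capture_stack() -> List[Dict[str, Any]]:
    """Return the callers of the hooked call, outermost first, with locals."""
    frame = inspect.currentframe()
    frame = frame.f_back.f_back  # skip this helper and the hooked call
    trace: List[Dict[str, Any]] = []
    while frame is not None:
        name = frame.f_code.co_name
        trace.append({"function": name, "locals": dict(frame.f_locals)})
        if name == "main":  # stop at the program entry point
            break
        frame = frame.f_back
    trace.reverse()
    return trace


def win_exec(lpCmdLine: str) -> None:
    """Stand-in for a hooked WinExec: records the stack, runs nothing."""
    TRACES.append({"command": lpCmdLine, "stack": capture_stack()})


def get_script(target: str) -> None:
    script_src = target + ".vbs"
    cmd = "wscript.exe " + script_src
    win_exec(cmd)


def find_lastest_target(targets: List[str]) -> None:
    count = len(targets)
    get_script(targets[count - 1])


def main() -> None:
    target_list = ["sample_a", "sample_b"]
    find_lastest_target(target_list)


def explain(record: Dict[str, Any]) -> List[str]:
    """Build the three-part explanation from one captured trace."""
    stack = record["stack"]
    lines = [
        f"(1) WinExec was asked to run: {record['command']}",
        "(2) Call order: " + " -> ".join(f["function"] for f in stack),
    ]
    for entry in stack:
        names = ", ".join(sorted(entry["locals"]))
        lines.append(f"(3) {entry['function']}: {names}")
    return lines


main()
report = explain(TRACES[-1])
```

Each captured frame carries the values of its local variables, so the report could also print those values; the sketch lists only the variable names to keep the output short.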

According to the embodiment, when a non-executable file is executed in a reader program and malicious behavior occurs, the system call that the reader program makes to the operating system is hooked, and the order of the functions related to that system call together with the variables of those functions is used to provide the specific mechanism of the malicious behavior.

The processor can run a reader program that receives and executes a non-executable file. In this case, when the reader program executing the non-executable file makes a system call to the operating system, the processor can generate stack trace information of the reader program at the hooking point of the system call. The processor can then obtain, from the generated stack trace information, the calling functions that lead to the system call and the variables corresponding to those calling functions, and provide explanatory information about them.

The explanatory information can indicate that a command causing a cyber threat is executed by the system call. The explanatory information can include the order in which the calling functions were called before the hooking point of the system call. The explanatory information can also include a description of each variable corresponding to the calling functions.

FIG. 59 is a diagram illustrating another embodiment of the disclosed cyber threat information processing device.

The processor of the server 2100 can receive a non-executable file through the application programming interface (API) 1100.

The processor of the server 2100 can run a first feature analysis module 18601 that analyzes and extracts static feature information related to cyber threats from the non-executable file received through the API.

A detailed example of the static feature analysis performed by the first feature analysis module 18601 is described with reference to FIG. 46 and elsewhere.

The processor of the server 2100 can run a second feature analysis module 18603 that analyzes and extracts dynamic feature information related to cyber threats from the non-executable file received through the API.

Detailed examples of the dynamic feature analysis performed by the second feature analysis module 18603 are disclosed in FIGS. 47, 48, and 56 to 58.

When the second feature analysis module 18603 analyzes dynamic feature information, it can hook the system calls that the reader program of the non-executable file requests from the operating system and dump the memory data present at that moment to obtain cyber threat information.

The second feature analysis module 18603 can obtain information on the mechanism of the malicious behavior from the order of the functions called immediately before the hooked system call and the parameters corresponding to those functions.

The processor of the server 2100 can run a third feature analysis module 18605 that analyzes and extracts mild dynamic feature information related to cyber threats from the non-executable file received through the API.

Detailed examples of the mild dynamic feature analysis performed by the third feature analysis module 18605 are disclosed in FIGS. 49 and 50.

The third feature analysis module 18605 can apply API hooking to the main functions of the application that executes the non-executable file, put the process into a suspended state when such a function is called, and dump the information loaded in memory at that moment.

The third feature analysis module 18605 can disassemble the memory data to obtain op-codes, operand data, and deobfuscated data, and obtain feature information related to cyber threat information based on the obtained data.

The processor of the server 2100 can run a feature processing module 18607 that selectively combines the cyber threat feature information analyzed by the first feature analysis module 18601, the second feature analysis module 18603, and the third feature analysis module 18605 to generate feature data related to cyber threat information.

A detailed embodiment of the feature processing module 18607 is disclosed in FIG. 51.

The processor of the server 2100 can run a malicious detection module 18608 that detects whether the non-executable file received through the API contains malicious behavior, based on the feature information processed by the feature processing module 18607.

When the result of the malicious detection module 18608 shows that the non-executable file contains malicious behavior, the processor of the server 2100 can run the AI engine 1230 to execute a classification module 18609 that classifies the attack technique and attack group of the malicious behavior.

Detailed embodiments in which the classification module 18609 generates information on the attack techniques and attack groups of the non-executable file are disclosed in FIGS. 52 to 55.

FIG. 60 is a diagram illustrating another embodiment of the disclosed cyber threat information processing method.

A non-executable file is received, and at least one feature analysis related to a cyber threat of the input non-executable file is performed (S4500).

Examples of analyzing the static feature information, dynamic feature information, and mild dynamic feature information related to cyber threats of a non-executable file have been disclosed.

A detailed example of static feature analysis is shown in FIG. 46, and detailed examples of dynamic feature analysis are shown in FIGS. 47, 48, and 56 to 58. Detailed examples of mild dynamic feature analysis are disclosed in FIGS. 49 and 50.

Whether the non-executable file contains malicious behavior can be detected based on feature information obtained by selectively combining the analysis information from the at least one feature analysis (S4600).

When the non-executable file contains malicious behavior, classification information on the attack technique and attack group classification information can be generated (S4700). Detailed embodiments of generating information on the attack techniques and attack groups of a non-executable file are disclosed in FIGS. 52 to 55.

The cyber threat information on the non-executable file analyzed as above is provided to the user (S4800).
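The four steps S4500 to S4800 can be sketched as a single function that wires the stages together. Everything inside the stages is a toy stub: the analyzers, the detection rule, the returned technique ID, and the cluster name are assumptions used only to show the control flow, not the models of the embodiment.

```python
from typing import Any, Callable, Dict, List

Analyzer = Callable[[bytes], List[float]]


def process_file(data: bytes,
                 analyzers: Dict[str, Analyzer],
                 detect: Callable[[List[float]], bool],
                 classify_techniques: Callable[[List[float]], List[str]],
                 classify_group: Callable[[List[float]], str]) -> Dict[str, Any]:
    # S4500: run each available feature analysis.
    analysis = {name: fn(data) for name, fn in analyzers.items()}
    # Combine the analysis results in a fixed (alphabetical) order.
    features = [v for name in sorted(analysis) for v in analysis[name]]
    # S4600: detect whether the file contains malicious behavior.
    malicious = detect(features)
    report: Dict[str, Any] = {"malicious": malicious, "techniques": [], "group": None}
    if malicious:
        # S4700: classify attack techniques and attack group.
        report["techniques"] = classify_techniques(features)
        report["group"] = classify_group(features)
    # S4800: the report is what would be provided to the user.
    return report


# Toy stubs standing in for the real analysis modules and models.
analyzers = {
    "static": lambda d: [1.0 if b"AutoOpen" in d else 0.0],
    "dynamic": lambda d: [1.0 if b"WinExec" in d else 0.0],
}
stages = dict(
    analyzers=analyzers,
    detect=lambda f: sum(f) >= 2.0,
    classify_techniques=lambda f: ["T1059.005"],
    classify_group=lambda f: "cluster_0",
)

report_bad = process_file(b"Sub AutoOpen() ... WinExec", **stages)
report_ok = process_file(b"plain text", **stages)
```

Classification runs only when detection reports malicious behavior, which mirrors the conditional step S4700.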

Therefore, according to the disclosed embodiment, cyber threat information on attack techniques and attack groups can be provided accurately, and malware variants can be handled, even when programs that produce the same result differ in the logic of the program containing the functions, or use the functions differently (for example, by splitting them) without any change in logic.

According to the embodiment, even when malicious behavior is contained in a non-executable file, it can be detected accurately and cyber threat information on the corresponding attack technique and attack group can be provided.

Therefore, according to the disclosed embodiment, malware that does not exactly match the data learned through machine learning can still be detected and handled, and malware variants can be addressed.

1010, 1020, 1030: Client
1100: Application programming interface
1210, 15000: Analysis framework
1211, 15100: Static analysis module
1213, 15200: Dynamic analysis module
1215, 15300: In-depth analysis module
1217, 15400: Relationship analysis module
1220, 17000: Prediction framework
1221: First predictive information generation module
1223: Second predictive information generation module
1230: AI Engine
2000: Physical devices
2200: Database
2100: Server
2510, 2520, 2530, 2540, 2610, 2620, 2630: Nodes of the decision tree
10000: Intelligence Platform
15101: File structure analysis module
15103: File pattern analysis module
15105: File production information analysis module
15107: File environment analysis module
15109: File-related analysis module
15201: Environment preparation module
15203: File execution module
15205: Behavior collection module
15207: Analysis result collection module
15209: Analysis environment recovery module
15301: Disassembly module
15303: Machine language code extraction module
15305: Attack technique identification module
15307: Attacker identification module
15309: Taint analysis module
15401: First association module
15403: Second association module
15405: Third association module
15407: Fourth association module
15409: Fifth association module
17100: Prediction information generation module
17101: First information prediction module
17103: Second information prediction module
17105: Third information prediction module
17107: Fourth information prediction module
17109: Fifth information prediction module
18000: Framework
18100: Analysis and prediction module
18101, 18103, 18105: First module, second module, third module
18501, 18503, 18505, 18507: First execution module, second execution module, third execution module, fourth execution module
18601, 18603, 18605: First feature analysis module, second feature analysis module, third feature analysis module
18607: Feature processing module
18608: Malicious detection module
18609: Classification module

Claims

A cyber threat information processing method comprising:
receiving a non-executable file, performing at least one feature analysis related to a cyber threat of the input non-executable file, and generating analysis information;
detecting whether the non-executable file contains malicious activity based on feature information obtained by selectively combining the generated analysis information;
when malicious activity is detected in the non-executable file, generating classification information on the attack technique and attack group classification information according to the malicious activity; and
providing cyber threat information to a user based on the generated information on the non-executable file.

The method of claim 1,
wherein the generated analysis information
includes static feature information related to the cyber threat of the non-executable file.

The cyber threat information processing method according to claim 1,
wherein the generated analysis information
includes dynamic feature information related to a cyber threat of the non-executable file, and
wherein the dynamic feature information is generated by hooking a system call that a reader program associated with the non-executable file requests from the operating system, based on information obtained from data in memory at the time of the hooking and from the functions executed, and their parameters, before the time of the hooking.

The cyber threat information processing method according to claim 1,
wherein the generated analysis information
includes feature information obtained by performing API hooking when an application associated with the non-executable file is executed, the feature information being obtained from data in memory at the time of the hooking.

A cyber threat information processing device comprising:
a storage device that stores data; and
a processor that executes a program on an input file,
wherein the processor:
performs, through an application programming interface (API), at least one feature analysis related to a cyber threat of the input non-executable file and generates analysis information;
detects whether the non-executable file contains malicious behavior based on feature information obtained by selectively combining the generated at least one piece of analysis information;
when malicious behavior is detected in the non-executable file, generates classification information on an attack technique according to the malicious behavior and attack group classification information; and
provides cyber threat information to a user based on the generated information of the non-executable file.

The cyber threat information processing device according to claim 5,
wherein the generated analysis information
includes static feature information related to a cyber threat of the non-executable file.

The cyber threat information processing device according to claim 5,
wherein the generated analysis information
includes dynamic feature information related to a cyber threat of the non-executable file, and
wherein the dynamic feature information is generated by hooking a system call that a reader program associated with the non-executable file requests from the operating system, based on information obtained from data in memory at the time of the hooking and from the functions executed, and their parameters, before the time of the hooking.

The cyber threat information processing device according to claim 5,
wherein the generated analysis information
includes feature information obtained by performing API hooking when an application associated with the non-executable file is executed, the feature information being obtained from data in memory at the time of the hooking.

A computer-readable storage medium storing a program for processing cyber security threat information, the program:
performing at least one feature analysis related to a cyber threat of an input non-executable file and generating analysis information;
detecting whether the non-executable file contains malicious behavior based on feature information obtained by selectively combining the generated at least one piece of analysis information;
when malicious behavior is detected in the non-executable file, generating classification information on an attack technique according to the malicious behavior and attack group classification information; and
providing cyber threat information to a user based on the generated information of the non-executable file.
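
For illustration only, and not as part of the claims, the four recited steps (feature analysis, detection, classification, and provision to the user) might be sketched as follows. Everything in the sketch is an assumption made for the example: the keyword table, the technique labels, the threshold of 2, the group label `hypothetical-group-A`, and all function and class names are hypothetical and are not taken from the disclosure. Only a simple static keyword count is used as the feature analysis; the dynamic, hooking-based analysis is not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical keyword-to-technique table; keys and labels are illustrative only.
TECHNIQUE_RULES = {
    b"/JavaScript": "embedded-script execution",
    b"/OpenAction": "automatic action on open",
    b"/Launch": "external program launch",
}


@dataclass
class ThreatReport:
    malicious: bool
    score: int
    techniques: list = field(default_factory=list)
    attack_group: str = "unknown"


def analyze_static(data: bytes) -> dict:
    """Step 1: generate analysis information (here, keyword counts only)."""
    return {key: data.count(key) for key in TECHNIQUE_RULES}


def combine_features(*analyses: dict) -> dict:
    """Selectively combine analysis results, keeping only non-zero features."""
    combined = {}
    for analysis in analyses:
        for key, value in analysis.items():
            if value:
                combined[key] = combined.get(key, 0) + value
    return combined


def detect(features: dict, threshold: int = 2) -> tuple:
    """Step 2: flag the file when the summed feature count reaches the threshold."""
    score = sum(features.values())
    return score >= threshold, score


def classify(features: dict) -> tuple:
    """Step 3: map features to technique labels and a placeholder group label."""
    techniques = sorted(TECHNIQUE_RULES[key] for key in features)
    group = "hypothetical-group-A" if b"/Launch" in features else "unknown"
    return techniques, group


def process(data: bytes) -> ThreatReport:
    """Step 4: assemble the information that would be provided to the user."""
    features = combine_features(analyze_static(data))
    malicious, score = detect(features)
    if not malicious:
        return ThreatReport(False, score)
    techniques, group = classify(features)
    return ThreatReport(True, score, techniques, group)


sample = b"%PDF-1.7 /OpenAction 5 0 R /JavaScript (app.alert(1))"
report = process(sample)
print(report.malicious, report.techniques)
```

Classification runs only after detection succeeds, mirroring the conditional wording of the third step; a real system would replace the keyword count and threshold with whatever analysis and detection models the embodiment actually uses.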