CN112115465B

CN112115465B - Method and system for detecting typical attack behavior of malicious code

Info

Publication number: CN112115465B
Application number: CN202010826647.5A
Authority: CN
Inventors: 薛静锋; 韩伟杰; 王勇; 张继; 单纯
Original assignee: Beijing Institute of Technology BIT; People's Liberation Army Strategic Support Force Aerospace Engineering University
Current assignee: Beijing Institute of Technology BIT; People's Liberation Army Strategic Support Force Aerospace Engineering University
Priority date: 2020-08-17
Filing date: 2020-08-17
Publication date: 2022-11-04
Anticipated expiration: 2040-08-17
Also published as: CN112115465A

Abstract

The invention discloses a method and system for detecting typical attack behaviors of malicious code. It belongs to the technical field of network security and enables comprehensive characterization of the attack process of typical malicious behaviors of malicious code. The technical scheme is as follows. Malicious code is run in a sandbox environment, and a dynamic system call API sequence and an original ontology knowledge sequence are extracted from the generated dynamic analysis report. A classification contribution degree is calculated for each API, and the APIs are sorted by classification contribution degree from largest to smallest to obtain a maliciousness ranking sequence. APIs are selected in order from this sequence as search starting points. For each starting point, the position A at which it occurs in the original ontology knowledge sequence is found, a forward traversal search and a backward traversal search are performed from position A, and the ontology knowledge tuples whose APIs belong to the same behavior type as the search starting point are extracted to form an ontology knowledge string. The typical attack behavior of the malicious code represented by the ontology knowledge string is taken as the detection result.

Description

Method and system for detecting typical attack behavior of malicious code

Technical Field

The invention relates to the technical field of network security, in particular to a method and a system for detecting typical attack behaviors of malicious codes.

Background

In today's cyberspace, malicious code has become the main weapon attackers rely on to launch network attacks. Its attack mechanisms are increasingly complex and its destructive capabilities increasingly powerful, making it the main threat to cyberspace. To address this challenge, researchers mainly use machine learning for automated analysis and detection: features of malicious code are extracted through static, dynamic or hybrid analysis, and a classifier is then trained with a machine learning method to perform automatic detection and classification.

Current research on malicious code focuses mainly on accurate detection, that is, extracting relevant features of a program and finally judging whether it is malicious. While this enables effective detection, significant shortcomings remain in building a thorough understanding of malicious code.

Because current research only yields a verdict on whether a program is malicious and does not comprehensively analyze the attack process of its typical malicious behaviors, the typical attack behaviors of malicious code are neither mined nor understood. As a result, malicious code is difficult to understand comprehensively, and targeted protective measures are hard to formulate.

Disclosure of Invention

In view of this, the invention provides a method and a system for detecting typical attack behaviors of malicious codes, which can realize comprehensive characterization of the attack process of typical malicious behaviors of malicious codes and realize comprehensive mining and cognition of typical attack behaviors of malicious codes by constructing ontology knowledge strings for characterizing typical attack behaviors of malicious codes.

In order to achieve the purpose, the technical scheme of the invention comprises the following steps:

S1. Run the malicious code in a sandbox environment, and extract a dynamic system call API sequence and an original ontology knowledge sequence from the generated dynamic analysis report.

S2. Calculate a classification contribution degree for each API in the dynamic system call API sequence, and sort the APIs by classification contribution degree from largest to smallest (that is, by maliciousness) to obtain a maliciousness ranking sequence.

S3. Select APIs in order from the maliciousness ranking sequence as search starting points.

S4. Find the position A at which the search starting point occurs in the original ontology knowledge sequence, perform a forward traversal search and a backward traversal search from position A in the original ontology knowledge sequence, and extract the ontology knowledge tuples whose APIs belong to the same behavior type as the search starting point to form an ontology knowledge string.

S5. Take the typical attack behavior of the malicious code represented by the ontology knowledge string as the detection result.

Further, the dynamic system call API sequence comprises all APIs called by the system in the running process of the malicious code; the original ontology knowledge sequence consists of ontology knowledge tuples corresponding to each API; each ontology tuple contains the API and its operands.

Further, the forward traversal search is specifically as follows: take the API at the position preceding position A. If its behavior type is consistent with that of the API at the search starting point, add its ontology knowledge tuple to the forward ontology knowledge substring, update position A to the preceding position, and repeat the forward traversal search until the behavior type of the API at the preceding position is inconsistent with that of the API at the search starting point.

The backward traversal search is specifically as follows: take the API at the position following position A. If its behavior type is consistent with that of the API at the search starting point, add its ontology knowledge tuple to the backward ontology knowledge substring, update position A to the following position, and repeat the backward traversal search until the behavior type of the API at the following position is inconsistent with that of the API at the search starting point.

After the forward traversal search and the backward traversal search are finished, the forward ontology knowledge substring and the backward ontology knowledge substring are combined into an ontology knowledge string.

Further, in S3, the API ranked i-th by maliciousness is taken in turn as the search starting point, with i initially 1. In S4, a check on the number of ontology knowledge tuples in the ontology knowledge string is added: if the number is smaller than a set threshold, i is incremented by 1 and the method returns to S3. The threshold is set empirically.

Further, the behavior types of the API mainly include file operations, system operations, process/thread operations, registry operations, storage operations, kernel operations, network operations, device operations, window operations, and text operations.

Another embodiment of the invention provides a typical attack behavior detection system for malicious codes, which comprises a data acquisition module, a data preprocessing module, an ontology knowledge string extraction module and a behavior detection module;

The data acquisition module is used for running malicious code in a sandbox environment, extracting a dynamic system call API sequence and an original ontology knowledge sequence from the generated dynamic analysis report, and sending them to the data preprocessing module.

The data preprocessing module is used for calculating a classification contribution degree for each API in the dynamic system call API sequence, sorting the APIs by classification contribution degree from largest to smallest (that is, by maliciousness) to obtain a maliciousness ranking sequence, and sending the maliciousness ranking sequence to the ontology knowledge string extraction module.

The ontology knowledge string extraction module is used for selecting APIs in order from the maliciousness ranking sequence as search starting points; finding the position A at which the search starting point occurs in the original ontology knowledge sequence; performing a forward traversal search and a backward traversal search from position A in the original ontology knowledge sequence; extracting the ontology knowledge tuples whose APIs belong to the same behavior type as the search starting point to form an ontology knowledge string; and sending the ontology knowledge string to the behavior detection module.

The behavior detection module is used for taking the typical attack behavior of the malicious code represented by the ontology knowledge string as the detection result.

The backward traversal search is specifically as follows: take the API at the position following position A. If its behavior type is consistent with that of the API at the search starting point, add its ontology knowledge tuple to the backward ontology knowledge substring, update position A to the following position, and repeat the backward traversal search until the behavior type of the API at the following position is inconsistent with that of the API at the search starting point.

Further, in the ontology knowledge string extraction module, the API ranked i-th by maliciousness is taken in turn as the search starting point, with i initially 1. The module also checks the number of ontology knowledge tuples in the ontology knowledge string: if the number is smaller than a set threshold, i is incremented by 1, the search starting point is updated, a new ontology knowledge string is extracted, and the new ontology knowledge string is sent to the behavior detection module. The threshold is set empirically.

Advantageous effects:

The method and system provided by the invention build on the fact that dynamic system call information effectively represents the behavior of malicious code: the malicious code is analyzed dynamically, a dynamic system call API sequence is extracted, the contribution degree of each API is calculated, and the APIs are ranked accordingly. In addition, because ontology knowledge can effectively describe the process of a program's behavior, an ontology model is introduced to construct a knowledge representation framework for malicious code. On this basis, the ontology knowledge sequence of the malicious code is traversed using the classification contribution degree of the dynamic system calls and the behavior type information of the APIs, and a meaningful ontology knowledge string is extracted from the original ontology knowledge sequence. The extracted ontology knowledge string effectively reflects how the typical malicious behaviors of the malicious code are carried out, establishes an ontology knowledge representation framework for those behaviors, and supports a systematic understanding of the typical attack behaviors of malicious code.

Drawings

Fig. 1 is a flowchart of a typical attack behavior detection method for malicious code according to an embodiment of the present invention;

Fig. 2 is a process for generating an ontology knowledge sequence based on API and ontology knowledge association provided by an embodiment of the present invention;

Fig. 3 is a block diagram of a typical attack behavior detection system for malicious code according to another embodiment of the present invention.

Detailed Description

The invention is described in detail below by way of example with reference to the accompanying drawings.

The invention provides a typical attack behavior detection method for malicious codes, and the flow of the typical attack behavior detection method is shown in figure 1.

The principle of the invention is as follows:

API information can effectively characterize the behavior of a program and is therefore often used for that purpose. In addition, research has found that calculating the classification contribution degree of each API identifies the APIs that are most evidently malicious and thus best describe the program's behavior: the higher an API's classification contribution degree, the more evident its maliciousness. Research has also found that a malicious program usually executes system calls of the same type consecutively while carrying out a malicious operation. That is, such runs of consecutive system calls are often the concrete manifestation of a typical malicious operation.
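The text does not state how the classification contribution degree is computed. The following is only a minimal sketch of the ranking idea, assuming a stand-in score: the fraction of malicious samples that call an API minus the fraction of benign samples that call it. The function names `contribution_scores` and `rank_by_contribution` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def contribution_scores(
    malicious: Iterable[Set[str]], benign: Iterable[Set[str]]
) -> Dict[str, float]:
    """Score each API by how much more often malicious samples call it.

    ASSUMPTION: P(api | malicious) - P(api | benign) is a stand-in metric;
    the source does not define its classification contribution degree.
    """
    malicious, benign = list(malicious), list(benign)
    mal_counts = Counter(api for sample in malicious for api in sample)
    ben_counts = Counter(api for sample in benign for api in sample)
    scores: Dict[str, float] = {}
    for api in set(mal_counts) | set(ben_counts):
        p_mal = mal_counts[api] / len(malicious) if malicious else 0.0
        p_ben = ben_counts[api] / len(benign) if benign else 0.0
        scores[api] = p_mal - p_ben
    return scores


def rank_by_contribution(sequence: List[str], scores: Dict[str, float]) -> List[str]:
    """Return the distinct APIs of one sample, highest score first."""
    distinct = list(dict.fromkeys(sequence))  # ties keep first-seen order
    return sorted(distinct, key=lambda api: scores.get(api, 0.0), reverse=True)


# Illustrative data only: each set holds the APIs observed in one sample.
scores = contribution_scores(
    malicious=[{"SetFilePointer", "GetFileType"}, {"SetFilePointer"}],
    benign=[{"GetFileType"}, {"GetFileType"}],
)
ranking = rank_by_contribution(["GetFileType", "SetFilePointer", "GetFileType"], scores)
```

Any other per-API relevance measure could be substituted for the score without changing the ranking step.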

Therefore, the invention firstly calculates the classification contribution degree of the system calling API, selects the API with higher classification contribution degree, and then carries out traversal search on the original ontology knowledge sequence based on the behavior type of the API. The behavior types of the API mainly include file operation, system operation, process/thread operation, registry operation, storage operation, kernel operation, network operation, device operation, window operation, text operation, and the like. According to the behavior type of the API, the ontology knowledge substrings belonging to the same behavior type are selected from the ontology knowledge sequence, and the extracted ontology knowledge substrings can represent a complete typical attack behavior operation process of the malicious codes.

As shown in fig. 1, the method comprises the following steps:

The dynamic system call API sequence includes all APIs that the system calls during the execution of the malicious code.

For example, the dynamic system call API sequence may be expressed as:

$(api_1, api_2, \ldots, api_n)$

where $api_i$ is the i-th API in the API sequence and n is the total number of APIs in the API sequence.

The original ontology knowledge sequence consists of the ontology knowledge tuples corresponding to each API; each ontology knowledge tuple contains the API and its operands (operation objects). That is, each tuple represents the operation information of one API in the API sequence of the malicious code, and is formally expressed as:

$Onto_i = [api_i, object_i] \quad (1 \le i \le n)$

where $api_i$ denotes the i-th API in the API sequence and $object_i$ denotes the operation object of $api_i$. Thus, the ontology knowledge sequence corresponding to the dynamic system call API sequence can be established as:

$(Onto_1, Onto_2, \ldots, Onto_n)$
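As an illustration of the tuple form [api, object], the sketch below builds an ontology knowledge sequence from a list of call records. The record layout (dictionaries with `api` and `object` keys), the `Onto` type, the function name, and the file name are assumptions made for this example; the source does not describe the format of the dynamic analysis report.

```python
from typing import Dict, List, NamedTuple, Sequence


class Onto(NamedTuple):
    """One ontology knowledge tuple: an API and the object it operates on."""
    api: str
    obj: str


def build_ontology_sequence(calls: Sequence[Dict[str, str]]) -> List[Onto]:
    """Map each call record, in call order, to an ontology knowledge tuple.

    ASSUMPTION: every record carries the keys "api" and "object".
    """
    return [Onto(call["api"], call["object"]) for call in calls]


# Illustrative records only; "dropped.exe" is a made-up operation object.
onto_seq = build_ontology_sequence([
    {"api": "NtCreateFile", "object": "dropped.exe"},
    {"api": "NtWriteFile", "object": "dropped.exe"},
])
```

Keeping the tuples in call order matters, because the later traversal relies on neighbouring positions in this sequence.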

S3. Select APIs in order from the maliciousness ranking sequence as search starting points.

S4. Find the position A at which the search starting point occurs in the original ontology knowledge sequence, perform a forward traversal search and a backward traversal search from position A in the original ontology knowledge sequence, and extract the ontology knowledge tuples whose APIs belong to the same behavior type as the search starting point to form an ontology knowledge string.

Malicious code usually executes system calls of the same type consecutively while carrying out a malicious operation, so such runs of consecutive system calls are often the concrete manifestation of a typical malicious operation. Therefore, to generate a meaningful ontology knowledge sequence, the ontology knowledge string is constructed by extracting, from the originally generated ontology knowledge sequence, the ontology knowledge tuples whose APIs belong to the same behavior type, based on the classification contribution degree and the behavior type information of the APIs. The extracted ontology knowledge string accurately reflects the operation process of a typical malicious behavior of the program and establishes an ontology knowledge representation framework for the typical malicious behaviors of malicious code.

In the embodiment of the present invention, the forward traversal search specifically includes:

and taking the API corresponding to the previous position of the position A, if the behavior type of the API corresponding to the previous position is consistent with the behavior type of the API corresponding to the search starting point, adding the ontology knowledge tuple of the API corresponding to the previous position into the forward ontology knowledge sub-string, updating the position A to be the previous position, and repeatedly executing forward traversal search until the behavior type of the API corresponding to the previous position is inconsistent with the behavior type of the API corresponding to the search starting point.

The backward traversal search specifically includes:

and taking the API corresponding to the next position of the position A, if the behavior type of the API corresponding to the next position is consistent with the behavior type of the API corresponding to the search starting point, adding the ontology knowledge tuple of the API corresponding to the next position into the backward ontology knowledge sub-string, updating the position A to be the next position, and repeatedly executing backward traversal search until the behavior type of the API corresponding to the next position is inconsistent with the behavior type of the API corresponding to the search starting point.

And after the forward traversal search and the backward traversal search are finished, combining the obtained forward ontology knowledge substrings and the obtained backward ontology knowledge substrings into an ontology knowledge string.
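The two traversals and the final merge can be sketched as follows. This is an illustrative reading of the description, not a reference implementation: the function name, the `behavior_of` lookup table, and the choices to include the starting tuple itself and to return the result in call order are assumptions. The API names other than SetFilePointer, NtCreateFile and NtWriteFile are illustrative only.

```python
from typing import Dict, List, Tuple

OntoTuple = Tuple[str, str]  # (api, operation object)


def extract_knowledge_string(
    onto_seq: List[OntoTuple], start: int, behavior_of: Dict[str, str]
) -> List[OntoTuple]:
    """Collect the contiguous run of same-type tuples around position `start`."""
    target = behavior_of[onto_seq[start][0]]

    # Forward traversal: step to earlier positions while the type matches.
    forward: List[OntoTuple] = []
    pos = start - 1
    while pos >= 0 and behavior_of.get(onto_seq[pos][0]) == target:
        forward.append(onto_seq[pos])
        pos -= 1

    # Backward traversal: step to later positions while the type matches.
    backward: List[OntoTuple] = []
    pos = start + 1
    while pos < len(onto_seq) and behavior_of.get(onto_seq[pos][0]) == target:
        backward.append(onto_seq[pos])
        pos += 1

    # ASSUMPTION: merge in call order and keep the starting tuple itself.
    return forward[::-1] + [onto_seq[start]] + backward


behavior_of = {
    "RegOpenKeyExW": "registry",
    "NtCreateFile": "file",
    "SetFilePointer": "file",
    "NtWriteFile": "file",
    "CreateProcessInternalW": "process/thread",
}
sequence = [
    ("RegOpenKeyExW", "key"),
    ("NtCreateFile", "a.exe"),
    ("SetFilePointer", "a.exe"),
    ("NtWriteFile", "a.exe"),
    ("CreateProcessInternalW", "a.exe"),
]
knowledge_string = extract_knowledge_string(sequence, 2, behavior_of)
```

Starting from SetFilePointer at index 2, the traversal stops at the registry call on one side and the process call on the other, leaving the three file operations.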

S5. Take the typical attack behavior of the malicious code represented by the ontology knowledge string as the detection result. The ontology knowledge string extracted in S4 effectively represents the implementation process of a typical attack behavior of the malicious code and helps researchers establish a systematic understanding of it.

In the embodiment of the present invention, if the ontology knowledge string extracted from a highly malicious API contains too few ontology knowledge tuples, it may not suffice to detect a typical attack behavior of the malicious code. Therefore, in S3, the API ranked i-th by maliciousness is taken in turn as the search starting point, with i initially 1. In S4, a check on the number of ontology knowledge tuples in the ontology knowledge string is added: if the number is smaller than a set threshold, i is incremented by 1 and the method returns to S3. The threshold is set empirically, for example to a small value such as 3 or 4.
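The threshold check amounts to a loop over the ranked APIs that stops at the first sufficiently long string. The sketch below assumes the extraction step is passed in as a callable and that position A is the first occurrence of the API in the sequence; the source specifies neither, and all names here are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

OntoTuple = Tuple[str, str]  # (api, operation object)
Extractor = Callable[[Sequence[OntoTuple], int], List[OntoTuple]]


def first_adequate_string(
    ranked_apis: Sequence[str],
    onto_seq: Sequence[OntoTuple],
    extract: Extractor,
    threshold: int = 3,
) -> Optional[List[OntoTuple]]:
    """Try start points in ranking order until a string has enough tuples."""
    apis_in_order = [api for api, _ in onto_seq]
    for api in ranked_apis:  # i = 1, 2, ... in the numbering of the text
        if api not in apis_in_order:
            continue
        position = apis_in_order.index(api)  # ASSUMPTION: first occurrence
        string = extract(onto_seq, position)
        if len(string) >= threshold:
            return string
    return None  # no candidate reached the threshold


def window_stub(seq: Sequence[OntoTuple], pos: int) -> List[OntoTuple]:
    """Stand-in extractor for the demo: short for API "A", longer otherwise."""
    width = 2 if seq[pos][0] == "A" else 3
    return list(seq[pos:pos + width])


demo_seq = [("A", "x"), ("B", "y"), ("C", "z"), ("D", "w")]
result = first_adequate_string(["A", "B"], demo_seq, window_stub, threshold=3)
```

Here the string for "A" has only two tuples, so the loop moves on to "B", whose string meets the threshold of 3.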

A specific example of generating an ontology knowledge string is shown in Fig. 2. The ontology knowledge string represents the process by which the malicious code generates a malicious executable file. The extraction process in this example is as follows:

(1) Based on the classification contribution degree, SetFilePointer, which has a high classification contribution degree, is selected as the current analysis API; its behavior type is file operation. The ontology knowledge statement corresponding to SetFilePointer is located in the original ontology knowledge sequence, and the sequence is then traversed forward and backward.

(2) In the forward traversal, the behavior types of GetFileType and NtCreateFile are found to be consistent with that of SetFilePointer, so the ontology knowledge statements corresponding to these APIs are added to the forward ontology knowledge substring.

(3) In the backward traversal, the behavior types of NtAllocateVirtualMemory, ntTaaddFile, NtCreateFile, GetFileType and NtWriteFile are found to be consistent with that of SetFilePointer, so the ontology knowledge statements corresponding to these APIs are added to the backward ontology knowledge substring.

(4) The forward and backward ontology knowledge substrings are combined to form a complete ontology knowledge representation string.

(5) Manual support is also needed during the analysis. In the ontology knowledge sequence of this sample, the file operation is followed by a process operation whose purpose is to execute the malicious file created during the file operation. The process operation and the file operation are therefore combined to form a complete malicious behavior process.
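Step (5) is described as manual work. A helper that lists consecutive runs of the same behavior type might make such adjacency easier for an analyst to spot; the sketch below is one possible aid and not part of the described method. The function name, the `unknown` fallback label, and the process API `CreateProcessInternalW` are assumptions chosen for illustration.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def behavior_runs(
    apis: List[str], behavior_of: Dict[str, str]
) -> List[Tuple[str, List[str]]]:
    """Split an API sequence into maximal runs of one behavior type."""
    return [
        (behavior, list(run))
        for behavior, run in groupby(
            apis, key=lambda api: behavior_of.get(api, "unknown")
        )
    ]


# Illustrative mapping and sequence; only the file APIs appear in the text.
types = {
    "NtCreateFile": "file",
    "SetFilePointer": "file",
    "NtWriteFile": "file",
    "CreateProcessInternalW": "process/thread",
}
runs = behavior_runs(
    ["NtCreateFile", "SetFilePointer", "NtWriteFile", "CreateProcessInternalW"],
    types,
)
```

In this toy sequence the file run is immediately followed by a process run, which is the pattern the analyst merges by hand in step (5).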

Another embodiment of the present invention further provides a typical attack behavior detection system for malicious codes, which is shown in fig. 3 and includes a data acquisition module, a data preprocessing module, an ontology string extraction module, and a behavior detection module.

The data acquisition module is used for running malicious code in a sandbox environment, extracting a dynamic system call API sequence and an original ontology knowledge sequence from the generated dynamic analysis report, and sending them to the data preprocessing module.

The data preprocessing module is used for calculating a classification contribution degree for each API in the dynamic system call API sequence, sorting the APIs by classification contribution degree from largest to smallest (that is, by maliciousness) to obtain a maliciousness ranking sequence, and sending the maliciousness ranking sequence to the ontology knowledge string extraction module.

The ontology knowledge string extraction module is used for selecting APIs in order from the maliciousness ranking sequence as search starting points; finding the position A at which the search starting point occurs in the original ontology knowledge sequence; performing a forward traversal search and a backward traversal search from position A in the original ontology knowledge sequence; extracting the ontology knowledge tuples whose APIs belong to the same behavior type as the search starting point to form an ontology knowledge string; and sending the ontology knowledge string to the behavior detection module.

The behavior detection module is used for taking the typical attack behavior of the malicious code represented by the ontology knowledge string as the detection result.
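The data flow between the four modules can be pictured as a chain of stages. The sketch below wires them together as caller-supplied callables; the class name, the field names, and the signatures are assumptions for illustration, and the stand-in lambdas do not implement the real stages.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

OntoTuple = Tuple[str, str]  # (api, operation object)


@dataclass
class DetectionPipeline:
    """Chains the four modules; each stage is a caller-supplied callable."""
    acquire: Callable[[str], Tuple[List[str], List[OntoTuple]]]
    preprocess: Callable[[List[str]], List[str]]
    extract: Callable[[List[str], List[OntoTuple]], List[OntoTuple]]
    detect: Callable[[List[OntoTuple]], List[OntoTuple]]

    def run(self, sample_path: str) -> List[OntoTuple]:
        api_seq, onto_seq = self.acquire(sample_path)  # data acquisition
        ranked = self.preprocess(api_seq)              # maliciousness ranking
        string = self.extract(ranked, onto_seq)        # ontology knowledge string
        return self.detect(string)                     # detection result


# Stand-in stages with made-up data, only to show how the modules connect.
pipeline = DetectionPipeline(
    acquire=lambda path: (["A", "B"], [("A", "x"), ("B", "y")]),
    preprocess=lambda apis: sorted(apis, reverse=True),
    extract=lambda ranked, onto: [t for t in onto if t[0] == ranked[0]],
    detect=lambda string: string,
)
detected = pipeline.run("sample.bin")
```

Passing the stages in as callables keeps each module replaceable, for example when a different sandbox report format or ranking metric is used.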

In the embodiment of the invention, the dynamic system call API sequence comprises all APIs called by the system in the running process of the malicious code; the original ontology knowledge sequence consists of ontology knowledge tuples corresponding to each API; each ontology tuple contains the API and its operands.

In the embodiment of the present invention, the forward traversal search specifically includes: and taking the API corresponding to the previous position of the position A, if the behavior type of the API corresponding to the previous position is consistent with the behavior type of the API corresponding to the search starting point, adding the ontology knowledge tuple of the API corresponding to the previous position into the forward ontology knowledge sub-string, updating the position A to be the previous position, and repeatedly executing forward traversal search until the behavior type of the API corresponding to the previous position is inconsistent with the behavior type of the API corresponding to the search starting point.

In the embodiment of the invention, in the ontology knowledge string extraction module, the API ranked i-th by maliciousness is taken in turn as the search starting point, with i initially 1. The ontology knowledge string extraction module also checks the number of ontology knowledge tuples in the ontology knowledge string: if the number is smaller than a set threshold, i is incremented by 1, the search starting point is updated, a new ontology knowledge string is extracted, and the new ontology knowledge string is sent to the behavior detection module. The threshold is set empirically.

In the embodiment of the present invention, the behavior types of the API mainly include file operation, system operation, process/thread operation, registry operation, storage operation, kernel operation, network operation, device operation, window operation, and text operation.
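The ten behavior types lend themselves to an enumeration plus a lookup table from API name to type. The sketch below is one possible encoding; the identifiers are hypothetical, and the table contains only the four APIs that the worked example explicitly treats as file operations.

```python
from enum import Enum
from typing import Dict, Optional


class BehaviorType(Enum):
    FILE = "file"
    SYSTEM = "system"
    PROCESS_THREAD = "process/thread"
    REGISTRY = "registry"
    STORAGE = "storage"
    KERNEL = "kernel"
    NETWORK = "network"
    DEVICE = "device"
    WINDOW = "window"
    TEXT = "text"


# Only these four APIs are taken from the worked example (file operations);
# further entries would have to come from the analyst's own catalogue.
API_BEHAVIOR: Dict[str, BehaviorType] = {
    "SetFilePointer": BehaviorType.FILE,
    "GetFileType": BehaviorType.FILE,
    "NtCreateFile": BehaviorType.FILE,
    "NtWriteFile": BehaviorType.FILE,
}


def behavior_of(api: str) -> Optional[BehaviorType]:
    """Return the behavior type of an API, or None if it is not catalogued."""
    return API_BEHAVIOR.get(api)
```

Returning None for uncatalogued APIs makes such calls act as run boundaries during traversal, which is a design choice of this sketch rather than something the text prescribes.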

In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. The typical attack behavior detection method for the malicious code is characterized by comprising the following steps of:

S1, running malicious code in a sandbox environment, and extracting a dynamic system call API sequence and an original ontology knowledge sequence from the generated dynamic analysis report;

S2, calculating a classification contribution degree for each API in the dynamic system call API sequence, and sorting the APIs by classification contribution degree from largest to smallest, that is, by maliciousness, to obtain a maliciousness ranking sequence;

S3, selecting APIs in order from the maliciousness ranking sequence as search starting points;

S4, finding the position A at which the search starting point occurs in the original ontology knowledge sequence, performing a forward traversal search and a backward traversal search from position A in the original ontology knowledge sequence, and extracting the ontology knowledge tuples whose APIs belong to the same behavior type as the search starting point to form an ontology knowledge string;

S5, taking the typical attack behavior of the malicious code represented by the ontology knowledge string as the detection result.

2. The method of claim 1, wherein the dynamic system call API sequence includes all APIs for system calls during the malicious code runtime;

the original ontology knowledge sequence consists of ontology knowledge tuples corresponding to each API; each ontology knowledge tuple contains the API and its operands.

3. The method of claim 1, wherein the forward traversal search is specifically:

taking an API corresponding to the previous position of the position A, if the behavior type of the API corresponding to the previous position is consistent with the behavior type of the API corresponding to the search starting point, adding an ontology knowledge tuple of the API corresponding to the previous position into a forward ontology knowledge sub-string, updating the position A to be the previous position, and repeatedly executing forward traversal search until the behavior type of the API corresponding to the previous position is inconsistent with the behavior type of the API corresponding to the search starting point;

the backward traversal search specifically includes:

taking the API corresponding to the next position of the position A, if the behavior type of the API corresponding to the next position is consistent with the behavior type of the API corresponding to the search starting point, adding the ontology knowledge tuple of the API corresponding to the next position into the backward ontology knowledge sub-string, updating the position A to be the next position, and repeatedly executing backward traversal search until the behavior type of the API corresponding to the next position is inconsistent with the behavior type of the API corresponding to the search starting point;

4. The method according to claim 1, 2 or 3, wherein in S3, the API ranked i-th by maliciousness is taken in turn as the search starting point, and the initial value of i is 1;

in S4, a check on the number of ontology knowledge tuples in the ontology knowledge string is added, and if the number of ontology knowledge tuples in the ontology knowledge string is smaller than a set threshold, i is incremented by 1 and the method returns to S3;

the set threshold is set empirically.

5. The method of claim 1, 2 or 3, wherein the types of behavior of the API primarily include file operations, system operations, process/thread operations, registry operations, storage operations, kernel operations, network operations, device operations, window operations, and text operations.

6. The typical attack behavior detection system for the malicious code is characterized by comprising a data acquisition module, a data preprocessing module, an ontology knowledge string extraction module and a behavior detection module;

the data acquisition module is used for running malicious code in a sandbox environment, extracting a dynamic system call API sequence and an original ontology knowledge sequence from the generated dynamic analysis report, and sending the dynamic system call API sequence and the original ontology knowledge sequence to the data preprocessing module;

the data preprocessing module is used for calculating a classification contribution degree for each API in the dynamic system call API sequence, sorting the APIs by classification contribution degree from largest to smallest, that is, by maliciousness, to obtain a maliciousness ranking sequence, and sending the maliciousness ranking sequence to the ontology knowledge string extraction module;

the ontology knowledge string extraction module is used for selecting APIs in order from the maliciousness ranking sequence as search starting points; finding the position A at which the search starting point occurs in the original ontology knowledge sequence, performing a forward traversal search and a backward traversal search from position A in the original ontology knowledge sequence, extracting the ontology knowledge tuples whose APIs belong to the same behavior type as the search starting point to form an ontology knowledge string, and sending the ontology knowledge string to the behavior detection module;

7. The system of claim 6, wherein the dynamic system call API sequence includes all APIs for system calls during the running of the malicious code;

the original ontology knowledge sequence consists of ontology knowledge tuples corresponding to each API; each ontology tuple contains the API and its operands.

8. The system of claim 6, wherein the forward traversal search is specifically:

the backward traversal search specifically includes:

9. The system according to claim 6, 7 or 8, wherein in the ontology string extracting module, malicious ith bit APIs are sequentially taken as search starting points, and the initial value of i is 1;

the ontology knowledge string extraction module is additionally used for judging the number of ontology knowledge groups in the ontology knowledge string, if the number of ontology knowledge groups in the ontology knowledge string is smaller than a set threshold value, i is increased by 1, a search starting point is updated, a new ontology knowledge string is obtained again and sent to the behavior detection module;

the set threshold is set empirically.

10. The system of claim 6, 7 or 8, wherein the types of behavior of the API primarily include file operations, system operations, process/thread operations, registry operations, store operations, kernel operations, network operations, device operations, window operations, and text operations.