CN115859277B

CN115859277B - Host intrusion detection method based on system call sequence

Info

Publication number: CN115859277B
Application number: CN202310072261.3A
Authority: CN
Inventors: 李涛; 唐聪; 何俊江; 兰小龙; 方文波; 陈姿妤
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2023-02-07
Filing date: 2023-02-07
Publication date: 2023-05-02
Anticipated expiration: 2043-02-07
Also published as: CN115859277A

Abstract

The invention discloses a host intrusion detection method based on a system call sequence, which relates to the technical field of computer security and comprises the following steps: S1: capturing system call information and dividing it into a plurality of system call sequences; S2: defining an abnormal activity track represented by an abnormal sequence; S3: storing the mapping relations among features of different granularities; S4: converting the relation mapping diagram into an abstract behavior tree; S5: pruning the abstract behavior tree; S6: converting the captured system call sequence into a leaf node sequence, and extracting features from the new leaf node sequence; S7: performing feature dimension reduction on the extracted feature vector; S8: taking the feature vector after dimension reduction as the input of a machine learning model, and classifying the corresponding leaf node sequence as abnormal or normal. The method solves the prior-art problems of excessively high vector dimension and excessively long time consumption in the feature extraction process, and can reduce the hardware cost required for host deployment.

Description

Host intrusion detection method based on system call sequence

Technical Field

The invention relates to the technical field of computer security, in particular to a host intrusion detection method based on a system call sequence.

Background

Existing intrusion detection systems (IDS) can be categorized into network-based intrusion detection systems (NIDS) and host-based intrusion detection systems (HIDS). A NIDS is typically deployed at a backbone network node and identifies network intrusion events by inspecting network traffic, whereas a HIDS is deployed on a host and monitors various host data, such as logs, directories, files, and registries, to detect and prevent malicious activity. In contrast to NIDS, HIDS can detect internal attacks and advanced persistent threats (Advanced Persistent Threat, abbreviated APT), and can therefore be regarded as the last line of defense for securing network assets.

Currently, among host intrusion detection systems, those based on machine learning/deep learning offer the best performance and are the most widely applied. To construct good features for detection, system call information is used because it is the most primitive and finest-grained information of an operating system; the system call sequence has therefore become the feature most widely and most frequently used by HIDS for constructing an intrusion detection engine.

Feature extraction is a critical task for intrusion detection systems, but because this operation is itself very time consuming, some attacks may be carried out before the feature selection/extraction task is completed. At present, the typical feature extraction methods for constructing a system-call-based intrusion detection engine are the N-gram sliding window, TF-IDF (Term Frequency-Inverse Document Frequency) and the window frequency method (which combines N-gram with TF-IDF). The N-gram method scans the whole system call sequence and extracts fragments of N consecutive system calls from it; it retains the order information of system call execution, but does not consider how important each extracted feature is for distinguishing intrusions. In contrast, the TF-IDF method can distinguish the importance of different features, but cannot preserve the order information of the system calls. The window frequency method combines the advantages of N-gram and TF-IDF and makes up for the shortcomings of each.

The window frequency method flow is shown in fig. 2, and the specific steps are as follows:

A. System call information is captured from the system log and divided into system call sequences S1, S2, ..., Si of different lengths for subsequent data processing.

B. The system call sequences S1, S2, ..., Si are labeled as normal or abnormal, which facilitates the construction of the subsequent machine learning intrusion detection engine.

C. Features are extracted from the system call sequences by the window frequency method, which converts each sequence into a feature vector suitable as input to a machine learning intrusion detection engine. First, N-gram is used to take feature fragments of fixed length N from the system call sequence (for example, with a fixed length of 3, the feature fragments of S1 are [4 168 42, 168 42 102, ..., 168 168 4, 168 4 240]). Then the TF-IDF method is used to assign a weight to each fragment (for example, the weight of '4 168 42' is '0.01045553'). In this way, different system call sequences can be converted into vector representations suitable for the intrusion detection engine.

D. The system call sequence vector representation and the corresponding classification labels are sent to a machine learning model for training, and a machine learning engine for intrusion detection can be constructed through training of a large number of system call sequence data.
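The fragment extraction in step C can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the prior art: the function name `ngrams` and the sample trace `s1` are assumptions made for the example, and only the first and last fragments of the trace follow the S1 example quoted above.

```python
from collections import Counter

def ngrams(sequence, n):
    """Slide a window of length n over the sequence and return every fragment."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]

# Hypothetical trace, chosen so that it starts and ends like the S1 example.
s1 = [4, 168, 42, 102, 168, 168, 4, 240]

fragments = ngrams(s1, 3)
counts = Counter(fragments)  # how often each fragment occurs in the trace

for fragment in fragments:
    print(fragment, counts[fragment])
```

The per-fragment counts are the raw material that a TF-IDF weighting would then turn into weights.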

However, the existing window frequency method extracts features directly from the original system call sequence. To meet the detection engine's need for feature fragments of different lengths, several different fixed lengths must be set for capturing fragments, which causes the number of feature fragments to increase exponentially. As a result, the dimension of the extracted feature vector is too high and feature extraction takes too long, so an intrusion detection engine built with the window frequency method consumes a large amount of storage and computing resources.

Disclosure of Invention

The invention aims to solve the problems in the prior art that the vector dimension generated during feature extraction is too high and the time consumed is too long, and to reduce the hardware cost required of the deployment host. To this end, it provides a host intrusion detection method and device based on a system call sequence.

In a first aspect, the present invention provides a method for intrusion detection based on system calls, comprising a system call feature extraction stage and a leaf node sequence detection stage, wherein

The system call feature extraction stage includes:

s1, capturing system call information, dividing the captured system call information into a plurality of system call sequences, and marking corresponding sequence labels;

s2, defining an abnormal activity track represented by the abnormal sequence through different granularity characteristic characterization modes;

s3, storing the mapping relation between the features with different granularities by using a relation mapping diagram;

s4, converting the relation mapping diagram into an abstract behavior tree;

s5, pruning is carried out on the abstract behavior tree, and the structure of the pruned abstract behavior tree is stored;

the leaf node sequence detection stage comprises the following steps:

s6, mapping leaf nodes through an abstract behavior tree, converting the captured system call sequence into a leaf node sequence, and extracting features from the new leaf node sequence by using a window frequency method;

s7, performing feature dimension reduction on the extracted feature vector;

s8, taking the feature vector after dimension reduction as the input of a machine learning model, and dividing the corresponding leaf node sequence into two types of abnormality and normal.

Optionally, in step S2, the granularity characteristic characterization mode includes:

an original system call sequence feature representation mode, a system behavior feature representation mode and a system kernel module feature representation mode, whose feature granularities are fine-grained, low-level coarse-grained and high-level coarse-grained, respectively.

Optionally, step S3, storing the mapping relationship between the features with different granularities using a relationship map includes:

the mapping relation between the original system call and the system behavior is many-to-one, the mapping relation between the system behavior and the system kernel module is many-to-one, and the mapping relation between the features with different granularities is stored through the relation mapping diagram;

optionally, step S4, converting the relational mapping map into an abstract behavior tree includes:

and converting the graph storage mode of the relation map into a tree storage mode, and storing the relation map by using an abstract tree structure.

Optionally, step S5, pruning the abstract behavior tree, and storing the pruned abstract behavior tree structure includes:

and selecting to cut off different leaf nodes each time, measuring the pruning effect through the accuracy of the model, considering that the current abstract behavior tree meets the feature extraction requirement when the accuracy reaches a certain preset threshold, and storing the current abstract behavior tree structure.

Optionally, step S7, performing feature dimension reduction on the extracted feature vector includes:

and carrying out feature dimension reduction on the extracted feature vector through singular value decomposition.

Optionally, step S8, taking the feature vector after the dimension reduction as an input of a machine learning model, and classifying the corresponding leaf node sequence into two types of abnormality and normal, including:

the feature vectors after dimension reduction and the classification labels are divided into a training set, a testing set and a verification set, wherein the training set is used for training the model and determining its parameters, the testing set is used for determining the network structure and adjusting the hyperparameters of the model, and the verification set is used for checking the generalization capability of the model; different machine learning algorithm models are selected, their parameters are chosen, and cross-validation is used to evaluate model effects.

In a second aspect, the present invention provides a system call based intrusion detection apparatus, including a system call feature extraction unit and a leaf node sequence detection unit, wherein

The system call feature extraction unit includes:

the capturing unit is used for capturing system call information, dividing the captured system call information into a plurality of system call sequences and marking corresponding sequence labels;

the granularity unit is used for defining an abnormal activity track represented by the abnormal sequence through different granularity characteristic characterization modes;

a mapping unit, configured to store a mapping relationship between the features with different granularities using a relationship map;

the tree conversion unit is used for converting the relation mapping diagram into an abstract behavior tree;

the pruning unit is used for pruning the abstract behavior tree and storing the pruned abstract behavior tree structure;

the leaf node sequence detection unit includes:

the leaf conversion unit is used for carrying out leaf node mapping through the abstract behavior tree, converting the captured system call sequence into a leaf node sequence, and carrying out feature extraction from the new leaf node sequence by using a window frequency method;

the dimension reduction unit is used for carrying out feature dimension reduction on the extracted feature vector;

and the output unit is used for taking the feature vector after the dimension reduction as the input of a machine learning model, and dividing the corresponding leaf node sequence into two types of abnormality and normal.

Optionally, the granularity unit, the granularity characteristic characterization mode includes:

Optionally, the mapping unit, configured to store the mapping relationship between the features with different granularities using a relationship map, includes:

optionally, the tree conversion unit for converting the relationship map into an abstract behavior tree includes:

Optionally, the pruning unit is configured to prune the abstract behavior tree, and the saving the pruned abstract behavior tree structure includes:

Optionally, the dimension reduction unit is configured to perform feature dimension reduction on the extracted feature vector, and includes:

Optionally, the output unit is configured to divide the feature vector after the dimension reduction into two types of abnormality and normal according to the corresponding leaf node sequence, and includes:

Compared with the prior art, the technical scheme of the invention has the following advantages:

The invention reduces the number of feature fragments generated by feature extraction, the dimension of the feature vectors built from them, and the time cost of feature extraction. The intrusion detection method constructed by the invention saves the computing and storage resources required for deployment, lowers the minimum hardware requirement on the deployment host, resolves the conflict between cost and performance, and can help customers realize an efficient and intelligent intrusion detection scheme on lower-end configurations.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.

FIG. 1 is a flow chart of the invention;

FIG. 2 is a window method implementation flow;

FIG. 3 is a mapping relationship between three granularity characterization modes;

FIG. 4 is a schematic pruning diagram of an abstract behavior tree.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings and examples.

Example 1

As shown in FIG. 1, the present invention provides a method for detecting host intrusion based on a system call sequence, which mainly comprises a system call feature extraction stage and a leaf node sequence detection stage, wherein

The system call feature extraction phase is as follows:

s1, capturing system call information, dividing the captured system call information into a plurality of system call sequences, and marking corresponding sequence labels.

Capturing system call information through a specific system call capturing program in an original system call information capturing stage; the captured system call log information is divided into a plurality of system call sequences, and corresponding category labels are marked.

Different operating systems provide a process tracing system call interface through which a process can be traced. With it, a parent process controls a child process and can change the core image of the child process, including reading and writing data in the child process's address space. The basic principle of capturing original system call information is that, once process tracing is in use, all signals sent to the traced child process are forwarded to the parent process while the child process is blocked. After the parent process receives a signal, it can inspect and modify the stopped child process and then let the child process continue running. In this way the system call information is captured; it is then processed into multiple original system call sequences, and each sequence is given a corresponding label to facilitate subsequent feature extraction and model training.
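The division and labeling part of S1 might look like the sketch below. It assumes, purely for illustration, that the capture program emits `(pid, system call number)` records and that one sequence is formed per process; the text does not specify the record format or the division rule, and the function names and sample records are hypothetical. The tracing itself is not shown.

```python
from collections import defaultdict

def split_into_sequences(records):
    """Group (pid, syscall_number) records into one ordered sequence per process."""
    sequences = defaultdict(list)
    for pid, syscall_no in records:
        sequences[pid].append(syscall_no)
    return dict(sequences)

def label_sequences(sequences, abnormal_pids):
    """Attach a label to each sequence: 1 for abnormal, 0 for normal."""
    return [(seq, 1 if pid in abnormal_pids else 0)
            for pid, seq in sorted(sequences.items())]

# Hypothetical captured records, in arrival order.
records = [(101, 5), (102, 4), (101, 125), (102, 78), (101, 6)]
labeled = label_sequences(split_into_sequences(records), abnormal_pids={102})
print(labeled)
```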

S2, defining abnormal activity tracks represented by the abnormal sequences through different granularity characteristic characterization modes.

Optionally, the granularity characteristic characterization modes include: an original system call sequence feature representation mode, a system behavior feature representation mode and a system kernel module feature representation mode, whose feature granularities are fine-grained, low-level coarse-grained and high-level coarse-grained, respectively.

As shown in fig. 2, the feature granularities of the original system call sequence feature representation, the system behavior feature representation and the system kernel module feature representation are fine-grained, low-level coarse-grained and high-level coarse-grained, respectively. The abnormal activity track represented by an abnormal sequence is defined through these three feature representation modes of different granularity. More than one hundred system call numbers (fine granularity) are mapped to seventy system behaviors (low-level coarse granularity) according to the behavior each system call represents, and the seventy system behaviors are mapped to seven kernel function sub-modules (high-level coarse granularity) according to their function. Each original system call sequence can therefore be converted into a system behavior sequence and a system kernel module sequence.

And S3, storing the mapping relation between the features with different granularities by using a relation mapping diagram.

Optionally, storing the mapping between the features of different granularities using a relationship map includes: the mapping relation between the original system call and the system behavior is many-to-one, the mapping relation between the system behavior and the system kernel module is many-to-one, and the mapping relation between the features with different granularities is stored through the relation mapping diagram.

As shown in FIG. 3, the three granularity characterization modes have specific mapping relations with one another, and the mapping relations among the three are represented by a relation mapping diagram. Intrusion behaviors can be detected through the original system call sequence, the system behavior sequence or the system kernel module sequence. The original system call sequence has a good detection effect, whereas the system behavior sequence and the system kernel module sequence have poorer detection effects; the time cost and performance cost of the three sequences decrease in that order. To balance detection effect against resource cost, a multi-granularity mixed sequence is constructed from the three sequences.
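The two many-to-one mappings can be pictured as two lookup tables applied one after the other. The sketch below is an assumption-laden illustration: the table contents are small hypothetical excerpts (the node names are borrowed from the worked example later in this description), and the function names are invented for the example.

```python
# Hypothetical excerpt: system call number -> system behavior (many-to-one).
CALL_TO_BEHAVIOR = {5: "fs-xattr", 6: "fs-xattr", 78: "fs-stat",
                    122: "kernel-sched", 125: "kernel-sched"}

# Hypothetical excerpt: system behavior -> kernel module (many-to-one).
BEHAVIOR_TO_MODULE = {"fs-xattr": "fs", "fs-stat": "fs", "kernel-sched": "kernel"}

def to_behaviors(calls):
    """Fine granularity -> low-level coarse granularity."""
    return [CALL_TO_BEHAVIOR[c] for c in calls]

def to_modules(calls):
    """Fine granularity -> high-level coarse granularity, via the behaviors."""
    return [BEHAVIOR_TO_MODULE[b] for b in to_behaviors(calls)]

calls = [5, 125, 6, 78]
behaviors = to_behaviors(calls)
modules = to_modules(calls)
print(behaviors)
print(modules)
```

The coarser the granularity, the fewer distinct symbols remain in the sequence, which is what later keeps the number of feature fragments down.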

S4, converting the relation mapping graph into an abstract behavior tree.

Optionally, converting the relational map to an abstract behavior tree comprises: and converting the graph storage mode of the relation map into a tree storage mode, and storing the relation map by using an abstract tree structure.

The mapping relations in the relation mapping diagram are many-to-one, the same as the child-to-parent node relation of a tree structure. The mapping relations are therefore stored in a tree structure (named the abstract behavior tree), which facilitates subsequent adjustment of the tree.
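Because every node has exactly one parent, the tree can be held as a simple child-to-parent map. The following sketch shows one possible representation under that assumption; the function `build_tree`, the artificial `"root"` node and the sample tables are hypothetical and are not taken from the original text.

```python
from collections import defaultdict

def build_tree(call_to_behavior, behavior_to_module, root="root"):
    """Return a child -> parent map and a parent -> children map for the tree."""
    parent = {}
    for call, behavior in call_to_behavior.items():
        parent[call] = behavior
    for behavior, module in behavior_to_module.items():
        parent[behavior] = module
        parent[module] = root  # all kernel modules hang under one root
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    return parent, dict(children)

# Hypothetical excerpts of the two many-to-one mappings.
call_to_behavior = {5: "fs-xattr", 6: "fs-xattr", 78: "fs-stat", 125: "kernel-sched"}
behavior_to_module = {"fs-xattr": "fs", "fs-stat": "fs", "kernel-sched": "kernel"}

parent, children = build_tree(call_to_behavior, behavior_to_module)
leaves = [node for node in parent if node not in children]
print(sorted(leaves))
```

Before any pruning, the leaves are exactly the system call numbers.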

S5, pruning is carried out on the abstract behavior tree, and the structure of the pruned abstract behavior tree is stored.

Optionally, pruning the abstract behavior tree, and saving the pruned abstract behavior tree structure includes: and selecting to cut off different leaf nodes each time, measuring the pruning effect through the accuracy of the model, considering that the current abstract behavior tree meets the feature extraction requirement when the accuracy reaches a certain preset threshold, and storing the current abstract behavior tree structure.

As shown in fig. 4, the leaf nodes of the initial abstract behavior tree consist only of fine-grained features. Pruning is performed on the abstract behavior tree, cutting off different leaf nodes each time; after pruning, the leaf nodes of the abstract behavior tree consist of feature representations of different granularities (fine granularity, low-level coarse granularity and high-level coarse granularity). After several rounds of pruning, the leaf nodes of the finally retained abstract behavior tree consist of system call nodes, system behavior nodes and system kernel module nodes. Each system call number corresponds to one leaf node of the abstract behavior tree, so every system call sequence composed of system call numbers can be converted into a new leaf node sequence through the tree.
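One way to read the pruning step is as a greedy search: collapse a candidate node into a leaf, re-evaluate the model, and keep the change only if accuracy stays at or above the threshold. The sketch below follows that reading and should be taken as an assumption, not as the procedure of the invention; `prune`, `leaf_of`, the `PARENT` excerpt and the `stub_accuracy` function (a stand-in for training and scoring a real model) are all hypothetical.

```python
def leaf_of(call, parent, collapsed):
    """Return the leaf that represents `call` after pruning.

    `collapsed` holds the internal nodes whose subtrees were cut away,
    so each of them has itself become a leaf.
    """
    leaf, node = call, call
    while node in parent:
        node = parent[node]
        if node in collapsed:
            leaf = node  # keep climbing: a higher collapsed ancestor wins
    return leaf

def prune(candidates, evaluate, threshold):
    """Greedily collapse candidates while accuracy stays at or above threshold."""
    collapsed = set()
    for node in candidates:
        trial = collapsed | {node}
        if evaluate(trial) >= threshold:
            collapsed = trial
    return collapsed

# Hypothetical child -> parent excerpt of an abstract behavior tree.
PARENT = {5: "fs-xattr", 6: "fs-xattr", 125: "kernel-sched", 3: "io-rw", 4: "io-rw",
          "fs-xattr": "fs", "kernel-sched": "kernel", "io-rw": "io",
          "fs": "root", "kernel": "root", "io": "root"}

def stub_accuracy(collapsed):
    # Stand-in for training and scoring a model on the re-mapped sequences.
    return 0.5 if "kernel" in collapsed else 0.9

collapsed = prune(["fs-xattr", "kernel", "io"], stub_accuracy, threshold=0.8)
leaves = [leaf_of(call, PARENT, collapsed) for call in (5, 125, 3)]
print(leaves)
```

In this toy run the resulting leaves mix all three granularities: a system behavior node, an untouched system call number and a kernel module node.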

The leaf node sequence detection stage comprises the following steps:

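S6, performing leaf node mapping through the abstract behavior tree, converting the captured system call sequence into a leaf node sequence, and extracting features from the new leaf node sequence by using the window frequency method.

Compared with the original sequence, the leaf node sequence retains the information stored in the original sequence while greatly reducing the vector dimension generated during feature extraction, which significantly reduces the time cost and computation cost.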

S7, performing feature dimension reduction on the extracted feature vector.

Optionally, performing feature dimension reduction on the extracted feature vector through singular value decomposition;

Machine learning models place high demands on the feature dimension, so the extracted feature vectors need to be reduced in dimension. The singular value decomposition method is fast and effective, and can reduce the extracted feature vectors to a specified dimension.
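As a sketch, the reduction step can be written as a standard truncated singular value decomposition. The symbols below ($X$, $U_r$, $\Sigma_r$, $V_r$, $Z$, $m$, $n$, $r$) are notation assumed for this illustration and do not appear in the original text.

```latex
\begin{aligned}
X &\approx U_r \Sigma_r V_r^{\top}, \qquad X \in \mathbb{R}^{m \times n},\quad r < n \\
Z &= X V_r = U_r \Sigma_r \in \mathbb{R}^{m \times r}
\end{aligned}
```

Here each row of $X$ is one extracted feature vector, $U_r$, $\Sigma_r$ and $V_r$ keep only the $r$ largest singular values and their singular vectors, and the matching row of $Z$ is the $r$-dimensional representation passed on to the classifier.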

S8, taking the feature vector after dimension reduction as the input of a machine learning model, and dividing the corresponding leaf node sequence into two types of abnormality and normal.

Optionally, the feature vectors after dimension reduction and the classification labels are divided into a training set, a testing set and a verification set, wherein the training set is used for training the model and determining its parameters, the testing set is used for determining the network structure and adjusting the hyperparameters of the model, and the verification set is used for checking the generalization capability of the model; finally, an efficient and low-cost intrusion detection engine is obtained.

The data are divided into a training set and a testing set, different machine learning algorithm models are selected and their parameters chosen, and the model effect is evaluated by cross-validation so that parameters can be tuned continuously. In this way an intrusion detection engine with high accuracy and low cost is constructed, and the engine is finally deployed on a host to realize intrusion detection on the host.
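The data handling around model selection can be sketched without committing to any particular classifier. The helper names, the 80/20 ratio, the fixed seed and the fold count below are assumptions chosen for the illustration; the text does not state them.

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=0):
    """Shuffle the samples and split them into a training part and a test part."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]

def k_fold(samples, k):
    """Yield (train, validation) pairs, using each of the k folds once for validation."""
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, folds[i]

data = list(range(10))  # stand-ins for (feature vector, label) pairs
train, test = train_test_split(data)
for fold_train, fold_val in k_fold(train, 4):
    print(len(fold_train), len(fold_val))
```

A candidate model would be fitted on each `fold_train`, scored on the matching `fold_val`, and the averaged score used to compare parameter settings before the held-out part is touched.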

A system call sequence is used below to illustrate the principles and processes of the present invention for feature fragment reduction.

Original system call sequence T: {5 125 6 5 3 6 91 4 78 78 78 125 122 192};

if T is taken as the whole corpus of feature extraction, extracting features from the corpus by using a window frequency method; and the fixed length of the characteristic fragments is set to be K, and the number of the generated characteristic fragments is N (characteristic vector dimension)

If k=1, the extracted feature segments are:

[5]，[125]，[6]，[3]，[91]，[4]，[78]，[122]，[192]；N=T(1)->9；

if k=2, the extracted feature segments are:

[5 125]，[125 6]，[6 5]，[5 3]，[3 6]，[6 91]，[91 4]，[4 78]，[78 78]，[78 125]，[125 122]，[122 192]；N=T(2)->12；

if k=3, the extracted feature segments are:

[5 125 6]，[125 6 5]，[6 5 3]，…，[125 122 192]；N=T(3)->12；

if k=4, the extracted feature segments are:

[5 125 6 5]，[125 6 5 3]，[6 5 3 6]，…，[78 125 122 192]；N=T(4)->12；

if k=5, the extracted feature segments are:

[5 125 6 5 3]，[125 6 5 3 6]，[6 5 3 6 91]，…，[78 78 125 122 192]；N=T(5)->11；

when k=1-5, the number of feature fragments generated is: T(1-5) = T(1) + T(2) + T(3) + T(4) + T(5) = 56;

when the method is adopted, the abstract behavior tree is utilized to convert the original system call sequence into the following leaf node sequence

L：{fs-xattr kernel-sched fs-xattr fs-xattr io fs-xattr kernel-capability io fs-stat fs-stat fs-stat kernel-sched kernel-sched ipc-sem}；
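The conversion from T to L is an element-by-element lookup. In the sketch below the table `LEAF_OF` is read off by aligning the two sequences of this example position by position, with T written out so that 5 and 3 are separate calls, matching the fragments listed for k=1; in practice the table would be derived from the pruned abstract behavior tree. The variable names are assumptions made for the illustration.

```python
# Leaf lookup read off this example (system call number -> leaf node).
LEAF_OF = {5: "fs-xattr", 6: "fs-xattr", 125: "kernel-sched", 122: "kernel-sched",
           3: "io", 4: "io", 91: "kernel-capability", 78: "fs-stat", 192: "ipc-sem"}

T = [5, 125, 6, 5, 3, 6, 91, 4, 78, 78, 78, 125, 122, 192]
L = [LEAF_OF[call] for call in T]

print(" ".join(L))
print(len(set(T)), "distinct calls ->", len(set(L)), "distinct leaf nodes")
```

The sequence keeps its length and order; only the alphabet shrinks, from nine distinct call numbers to six distinct leaf nodes.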

At this time, L is taken as a whole corpus of feature extraction, and features are extracted from the corpus by using a window frequency method; setting the fixed length value of the characteristic fragments as K, and setting the number of the generated characteristic fragments as N;

if k=1, the extracted feature segments are:

[fs-xattr]，[kernel-sched]，[io]，[kernel-capability]，[fs-stat]，[ipc-sem]；N=L(1)->6；

if k=2, the extracted feature segments are:

[fs-xattr kernel-sched]，[kernel-sched fs-xattr]，…，[kernel-sched ipc-sem]；N=L(2)->12；

if k=3, the extracted feature segments are:

[fs-xattr kernel-sched fs-xattr]，…，[kernel-sched kernel-sched ipc-sem]；N=L(3)->12；

if k=4, the extracted feature segments are:

[fs-xattr kernel-sched fs-xattr fs-xattr]，…，[fs-stat kernel-sched kernel-sched ipc-sem]；N=L(4)->11；

if k=5, the extracted feature segments are:

[fs-xattr kernel-sched fs-xattr fs-xattr io]，…，[fs-stat fs-stat kernel-sched kernel-sched ipc-sem]；N=L(5)->10；

when k=1-5, the number of feature fragments generated is: L(1-5) = L(1) + L(2) + L(3) + L(4) + L(5) = 51;

from the above, it can be seen that L(i) <= T(i) and L(i-j) <= T(i-j), where 0 < i < j and i, j are positive integers; this illustrates the principle of the present invention;
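The counts for the leaf node sequence can be reproduced by collecting the distinct windows of each length. This is a small check written for illustration; the function name `distinct_fragments` is an assumption.

```python
def distinct_fragments(sequence, k):
    """Return the set of distinct length-k windows in the sequence."""
    return {tuple(sequence[i:i + k]) for i in range(len(sequence) - k + 1)}

L = ("fs-xattr kernel-sched fs-xattr fs-xattr io fs-xattr kernel-capability io "
     "fs-stat fs-stat fs-stat kernel-sched kernel-sched ipc-sem").split()

counts = {k: len(distinct_fragments(L, k)) for k in range(1, 6)}
total = sum(counts.values())

print(counts)  # {1: 6, 2: 12, 3: 12, 4: 11, 5: 10}
print(total)   # 51
```

Because a window is counted once however often it recurs, the number of fragments is the size of the vocabulary, and therefore the dimension of the resulting feature vector.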

in an actual application scenario, the corpus is far larger than the one in this example and often consists of tens of thousands of original system call sequences, so the advantages of the invention become more evident as the corpus grows;

therefore, the ADFA-LD data set is used as a corpus, and the advantages of the method for reducing the feature fragments can be evaluated on the corpus;

the feature segments T (k) generated on the ADFA-LD corpus using the original system call sequence and the feature segments L (k) generated by the leaf nodes of the present invention are represented by Table 1 below:

TABLE 1

The above describes the feature segment extraction method of the present invention, and the vectorization method of the feature segment of the present invention is explained next;

a corpus composed of a selected plurality of leaf node sequences is defined as $D = \{d_1, d_2, \ldots, d_N\}$. The i-th leaf node sequence in the corpus is defined as $d_i = (B_i, y_i)$, where $B_i$ represents the feature fragments contained in the leaf node sequence and $y_i$ is the label (normal sequence or malicious sequence) corresponding to the leaf node sequence: $y_i = 0$ indicates that the sequence is a normal sequence and $y_i = 1$ indicates that the sequence is a malicious sequence; $N$ represents the number of leaf node sequences in the corpus.

The present invention uses the tf-idf technique to convert feature fragments into vectors, an input format suitable for the various classifier models;

the calculation of the tf-idf value of a feature fragment is described in detail as follows:

the calculation formula of the word frequency $\mathrm{tf}_i$ is shown as (1), where $\mathrm{tf}_i$ represents the word frequency of the i-th feature fragment $b_i$, $n_i$ represents the number of times the feature fragment $b_i$ appears in the whole corpus, and $\sum_k n_k$ represents the total number of all feature fragments contained in the corpus;

$$\mathrm{tf}_i = \frac{n_i}{\sum_k n_k} \tag{1}$$

the calculation formula of the inverse file frequency $\mathrm{idf}_i$ is shown as (2), where $\mathrm{idf}_i$ represents the inverse file frequency of the i-th feature fragment $b_i$, $N$ represents the total number of leaf node sequences contained in the corpus, $|\{j : b_i \in B_j\}|$ represents the number of leaf node sequences that contain the feature fragment $b_i$, and 1 is added to avoid the case where the denominator is 0 (when a feature fragment in the test set does not appear in the corpus of the training set);

$$\mathrm{idf}_i = \log \frac{N}{1 + |\{j : b_i \in B_j\}|} \tag{2}$$

therefore, the calculation formula of tf-idf is shown as (3): the tf-idf value of the feature fragment $b_i$ is equal to its word frequency $\mathrm{tf}_i$ multiplied by its inverse file frequency $\mathrm{idf}_i$;

$$\mathrm{tfidf}_i = \mathrm{tf}_i \times \mathrm{idf}_i \tag{3}$$

with the tf-idf value of $b_i$ defined as $\mathrm{tfidf}_i$, a leaf node sequence comprising a plurality of feature fragments $b_1, b_2, \ldots, b_m$ is converted into the vector representation $(\mathrm{tfidf}_1, \mathrm{tfidf}_2, \ldots, \mathrm{tfidf}_m)$;
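The vectorization can be sketched as follows. The sketch assumes the definitions given in the surrounding text: word frequency as the fragment's count over the whole corpus divided by the total number of fragments, and inverse file frequency as the logarithm of the number of sequences over one plus the number of sequences containing the fragment. Two further choices are assumptions of this illustration only: the natural logarithm is used, and a sequence's vector holds a fragment's weight where the fragment occurs and zero elsewhere. The function name and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(corpus):
    """corpus: list of fragment lists, one list per leaf node sequence."""
    totals = Counter(frag for seq in corpus for frag in seq)         # count per fragment
    grand_total = sum(totals.values())                               # all fragments
    doc_freq = Counter(frag for seq in corpus for frag in set(seq))  # sequences containing it
    vocab = sorted(totals)
    n_seq = len(corpus)
    weight = {f: (totals[f] / grand_total) * math.log(n_seq / (1 + doc_freq[f]))
              for f in vocab}
    vectors = [[weight[f] if f in seq else 0.0 for f in vocab] for seq in corpus]
    return vocab, vectors

# Toy corpus: each fragment is written as one string for brevity.
corpus = [["io fs-stat", "fs-stat fs-stat"],
          ["io fs-stat", "fs-stat ipc-sem"],
          ["io fs-stat", "io fs-stat"]]

vocab, vectors = tfidf_vectors(corpus)
for row in vectors:
    print([round(v, 4) for v in row])
```

With this form of the inverse file frequency, a fragment that occurs in every sequence receives a slightly negative weight, since the ratio inside the logarithm drops below one.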

In general, the transformed feature vector may have a higher dimension when the fixed length value of the feature segment is set to be larger or it is desired to include a plurality of feature segments of different lengths.

In order to reduce the dimension of the feature vector quickly, the invention adopts the SVD method, because SVD has higher computational efficiency than the principal component analysis method.

And then taking the feature vectors after dimension reduction as the input of various machine learning models (four machine learning classification models), and finally, dividing the corresponding leaf node sequences into two types of abnormality and normal.

Compared with the prior art, in which the window frequency method extracts features directly from the original system call sequence, the method first maps the original system call sequence to a leaf node sequence and then extracts feature fragments from the leaf node sequence. This markedly slows the growth in the number of feature fragments and reduces the time consumed by feature extraction; in addition, the accuracy of the resulting intrusion detection engine is improved to a certain extent. Performance was evaluated on the ADFA-LD data set. When the fixed length n of the feature fragments extracted by the window method is set to 3, the original system call sequences and the leaf node sequences generate 18316 and 8632 feature fragments respectively, so the original system call sequences generate 112.19% more fragments than the leaf node sequences. When the fragment length is set to 1-5 (all feature fragments with fixed lengths from 1 to 5), the leaf node sequences and the original system call sequences generate 135485 and 160035 feature fragments respectively, a reduction of 15.34% for the leaf node sequences. The performance of intrusion detection engines built with machine learning models such as SVM was evaluated comprehensively using four indexes: precision, recall, F1 score and false alarm rate. With the fixed length set to 1 through 5 respectively, this yields 20 index values; compared with the values produced by the detection engine built according to the prior art, 16 of them are superior. Furthermore, the average time of the characteristic extraction is reduced by 1.02s, 6.0256 s, 32 140.43s and 3723 s.
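The two percentages can be checked from the quoted fragment counts. The working below is added for illustration; it shows that 15.34% is measured against the larger count, whereas 112.19% is the same kind of difference measured against the smaller count, which corresponds to a reduction of about 52.87% when measured against the larger one.

```latex
\begin{aligned}
\frac{160035 - 135485}{160035} &= \frac{24550}{160035} \approx 15.34\% \\
\frac{18316 - 8632}{8632} &= \frac{9684}{8632} \approx 112.19\% \\
\frac{18316 - 8632}{18316} &= \frac{9684}{18316} \approx 52.87\%
\end{aligned}
```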

Compared with the prior art, the method reduces the number of feature fragments generated by feature extraction, the dimension of the feature vectors built from them, and the time cost of feature extraction. It saves the computing and storage resources required for deployment, lowers the minimum hardware requirement on the deployment host, resolves the conflict between cost and performance, and helps customers realize an efficient and intelligent intrusion detection scheme on lower-end configurations.

The foregoing has shown and described the basic principles and main features of the present invention and the advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims

1. A method for intrusion detection based on system call, characterized by comprising the following steps:

1) A system call feature extraction stage;

s1: capturing system call information, dividing the captured system call information into a plurality of system call sequences, and marking corresponding sequence labels;

s2: defining the abnormal activity tracks represented by abnormal sequences through feature representation modes of different granularities;

the feature representation modes of different granularities comprise:

an original system call sequence feature representation mode, a system behavior feature representation mode and a system kernel module feature representation mode, whose feature granularities are a fine-grained representation, a low-level coarse-grained representation and a high-level coarse-grained representation, respectively;

s3: storing the mapping relation between the features with different granularities by using a relation mapping diagram, wherein the mapping relation comprises the following steps:

s4: converting the relation mapping diagram into an abstract behavior tree;

s5: pruning is carried out on the abstract behavior tree, and the structure of the abstract behavior tree after pruning is stored;

2) A leaf node sequence detection stage;

s6: mapping system calls to leaf nodes through the abstract behavior tree, thereby converting a captured system call sequence into a leaf node sequence, and extracting features from the new leaf node sequence by using the window frequency method;

s7: performing feature dimension reduction on the extracted feature vector;

s8: taking the feature vector after dimension reduction as the input of a machine learning model, and classifying the corresponding leaf node sequence as either abnormal or normal.

2. The method of intrusion detection based on system call according to claim 1, wherein step S4: converting the relationship map into an abstract behavior tree, comprising:

3. The method of intrusion detection based on system call according to claim 1, wherein step S5: pruning the abstract behavior tree, and storing the pruned abstract behavior tree structure, wherein the pruning method comprises the following steps:

4. The method of intrusion detection based on system call according to claim 1, wherein step S7: performing feature dimension reduction on the extracted feature vector, including:

5. The method of intrusion detection based on system call according to claim 1, wherein step S8: taking the feature vector after dimension reduction as the input of a machine learning model, and classifying the corresponding leaf node sequence as either abnormal or normal, comprises:

the dimension-reduced feature vectors and their classification labels are divided into a training set, a testing set and a verification set, wherein the training set is used for training the model and determining its parameters, the testing set is used for determining the network structure and adjusting the hyperparameters of the model, and the verification set is used for checking the generalization capability of the model; different machine learning algorithm models are selected for parameter selection, and cross-validation is used for evaluating model performance.
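The data partitioning and cross-validation described above can be sketched as follows. This is only an illustration under stated assumptions: the 60/20/20 split ratio, the fixed random seed, the helper names `split_dataset` and `k_fold_indices`, and the toy samples are hypothetical and are not specified in the claims.

```python
import random


def split_dataset(samples, train=0.6, test=0.2, seed=0):
    """Shuffle samples and split them into training, testing and verification sets.

    The 60/20/20 ratio is an assumed default, not a value from the source.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_test = round(len(items) * test)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])


def k_fold_indices(n_samples, k):
    """Yield (train_idx, held_out_idx) index lists for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, held_out in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, held_out


# Toy (feature vector, label) pairs standing in for dimension-reduced features.
samples = [([0.1 * i, 1.0], i % 2) for i in range(10)]

train_set, test_set, verify_set = split_dataset(samples)
folds = list(k_fold_indices(len(train_set), 3))
print(len(train_set), len(test_set), len(verify_set), len(folds))
```

Each fold holds out a different part of the training set, so a candidate model can be scored on data it was not fitted to before the final check on the held-back set.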