CN114896594A - Malicious code detection device and method based on image feature multi-attention learning - Google Patents

Malicious code detection device and method based on image feature multi-attention learning

Info

Publication number
CN114896594A
Authority
CN
China
Prior art keywords
features
feature
image
malicious
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210408579.XA
Other languages
Chinese (zh)
Inventor
武志超
谭振华
王卫东
吴建
Current Assignee
Northeastern University China
Original Assignee
Northeastern University China
Priority date
Filing date
Publication date
Application filed by Northeastern University China
Priority to CN202210408579.XA
Publication of CN114896594A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/55 Detecting local intrusion or implementing counter-measures
    • G06F 21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F 21/562 Static detection
    • G06F 21/563 Static detection by source code analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Virology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a malicious code detection device and method based on image feature multi-attention learning, belonging to the technical field of malicious code identification. The device comprises a code-image converter, a feature extractor, and a classifier. An original malicious code file to be detected is converted into a grayscale image, which is defined as a malicious image; low-level semantic features are extracted from the malicious image to obtain a feature F; key features F' are extracted from F; a pixel-correlation feature F'' is extracted from F'; higher-order features are extracted from F'' to obtain a feature M; a pixel-correlation feature M' is extracted from M; and the deep image feature M' is mapped to the sample label space, classifying the malicious image into a specific malware category. The device and method achieve a higher recognition rate while reducing the complexity of the convolutional neural network model.

Description

Malicious code detection device and method based on image feature multi-attention learning
Technical Field
The invention relates to the technical field of malicious code identification, in particular to a malicious code detection device and method based on image feature multi-attention learning.
Background
The rapid development of information technology has made enterprise network security face increasingly complex challenges. Malware is one of the major threats to network security. Malware refers to software or code that is installed and run on a user's computer or other terminal without the user's explicit consent, violating the user's legitimate interests. Malicious code detection aims to identify malicious programs on a computer or terminal before they can cause greater harm.
Traditional malicious code detection methods are divided into static detection and dynamic detection. Static detection analyzes the file content and structure of malicious code, including byte code, assembly instructions, functions, and so on. Its disadvantage is that it struggles to identify complex malware variants produced by techniques such as file encryption, packing, and polymorphism. Dynamic detection identifies malware by executing it and analyzing its behavior; this must be done while the malware is active, and it carries significant time and hardware costs. In short, both static and dynamic detection consume large amounts of time, labor, and hardware resources, and it is difficult for them to meet the requirement of fast and efficient malicious code identification.
With the development of deep learning, researchers have proposed malicious code detection methods based on image processing. These methods escape the time- and labor-intensive drawbacks of traditional approaches: the malicious code is converted into an image, and a convolutional neural network classifies the image, thereby detecting the malicious code. Image classification is a core task of computer vision and is mainly performed with convolutional neural networks in deep learning: given an input picture, the network assigns it to one label among a known set of classes. Many existing convolutional neural network models handle image classification well and achieve good accuracy. However, the neural network models used for image-based malicious code detection are structurally too complex. Consider the VGGNet-16 network model structure shown in fig. 1 and the ResNet-50 network model structure shown in fig. 2: each square in the diagram represents an operation in a convolutional neural network; Input represents the data input; Conv represents a convolution operation (for example, "Conv3x3, 64" denotes a convolution with a 3x3 kernel and 64 channels); MaxPool and AvgPool represent the maximum and average pooling operations; and the FC layer together with Softmax constitutes the classifier of the convolutional neural network, which assigns the data to a specific class. In networks as complex as VGGNet-16 and ResNet-50, the overly complex structure results in a large amount of computation.
Moreover, the image converted from malicious code, i.e. the malicious image, has two notable characteristics. First, a malicious image contains key features and non-key features: when malicious code is converted into an image, the tail of the binary file is usually padded with zeros, which render as black, so that black region is unrelated to the original code. The regions converted from code, containing the three shades black, white, and gray, are therefore usually called key features (for example, the image features inside the white frame in fig. 3), while the remaining black region is called non-key features. Second, malicious code carries semantic information and code correlation: the original code has semantic correlations that make it function, and when the code is converted into pixels in order, the pixels are arranged in the same order and thus inherit that correlation. After the code is converted sequentially into an image, the correlations within the malicious code are mapped onto pixel correlations in the malicious image. Existing image-based malware classification methods are not designed around these characteristics of malicious images and cannot deeply extract the key features and inter-pixel correlation features, which limits the identification accuracy of malicious image classification models.
Disclosure of Invention
Aiming at the problems in the prior art that neural network models for classifying malicious images are overly complex, computationally expensive, and weak at extracting deep features of malicious images, the invention provides a malicious code detection device and method based on image feature multi-attention learning, with the aim of reducing the complexity of the image classification neural network model while improving malicious code identification accuracy.
The technical scheme adopted by the invention is as follows:
the invention provides a malicious code detection device based on image feature multi-attention learning, which comprises:
the code-image converter is used for converting an input original malicious code file into a gray image and defining the gray image as a malicious image to be sent to the feature extractor;
the characteristic extractor is used for extracting key characteristics and correlation characteristics among pixels from the received malicious images so as to obtain deep-level characteristics in the malicious images;
and the classifier is used for classifying the malicious images according to the deep level features extracted by the feature extractor and classifying the malicious images into specific malicious software categories.
Further, according to the malicious code detection device based on image feature multi-attention learning, the feature extractor takes a convolutional neural network as a basic network and comprises three structures, namely a CNN module, a spatial attention module and a self-attention module.
Further, according to the malicious code detection device based on image feature multi-attention learning, the feature extractor specifically includes:
the first CNN module is used for receiving the malicious image sent by the code-image converter, extracting low-level semantic features from the malicious image through a convolutional layer, passing the extracted features through an activation function to a pooling layer, where the maximum pooling operation reduces the feature dimensions to generate a feature F, and sending the feature F to the spatial attention module;
the spatial attention module is used for extracting key features from the feature F; specifically, higher-order features are first extracted from F through a convolutional layer, and weights are assigned to the extracted features on the principle that key features receive higher weights; then the channel information of the features is aggregated along the spatial dimension using the maximum pooling operation and the average pooling operation, generating two two-dimensional feature maps F_avg and F_max, which are stacked together by a concatenation operation to obtain, after the data channels are compressed, key features carrying weight information; finally, these key features are multiplied with the feature F to obtain the weighted key features F', which are sent to the first self-attention module;
the first self-attention module is used for extracting pixel-correlation features from the key features F'; specifically, a convolution operation is first performed on F' to linearly map it into three feature matrices Q, K, and V; the transpose of Q is then multiplied by K to obtain a correlation matrix; the correlation matrix is multiplied by V to obtain a new matrix; finally, this new matrix, which carries both weight information and correlation features, is point-multiplied with the key features F' received from the spatial attention module, endowing the original key features F' with pixel correlation and yielding the pixel-correlation features F'', which are sent to the second CNN module;
the second CNN module is used for extracting higher-order features from the features F'': high-level semantic features are extracted from F'', the extracted features are passed through an activation function to a pooling layer, feature dimension reduction is performed by the maximum pooling operation to obtain a feature M, and the feature M is sent to the second self-attention module;
the second self-attention module is used for extracting pixel-correlation features from the feature M; specifically, a convolution operation is first performed on M to linearly map it into three feature matrices Q, K, and V; the transpose of Q is multiplied by K to obtain a correlation matrix; the correlation matrix is multiplied by V to obtain a new matrix; finally, this new matrix, carrying the correlation features, is point-multiplied with the higher-order feature M received from the second CNN module, endowing M with pixel correlation and yielding the pixel-correlation features M', which are sent to the classifier.
Further, according to the malicious code detection apparatus based on image feature multi-attention learning, the classifier is composed of at least 1 linear layer and 1 softmax classifier.
Further, according to the malicious code detection device based on image feature multi-attention learning, the classifier consists of 3 linear layers and 1 softmax classifier which are connected together.
The invention provides a malicious code detection method based on image feature multi-attention learning, which comprises the following steps:
step 100: converting an original malicious code file to be detected into a gray image, and defining the gray image as a malicious image;
step 200: extracting low-level semantic features in the malicious image to obtain features F;
step 300: extracting key features F' from the features F;
step 400: extracting the pixel-correlation feature F'' from the key features F';
step 500: extracting higher-order features from the feature F'' to obtain a feature M;
step 600: extracting a correlation characteristic M' of the pixel from the characteristic M;
step 700: and mapping the deep image features M' to a sample mark space, so that the malicious images are classified into specific malicious software categories.
Further, in the above malicious code detection method based on image feature multi-attention learning, the method of extracting low-level semantic features from the malicious image to obtain the feature F in step 200 is the same as the method of extracting higher-order features from the feature F'' to obtain the feature M in step 500, and specifically comprises: first extracting semantic features from the malicious image or the feature F'' through a convolutional layer, then passing the extracted semantic features through an activation function to a pooling layer, performing feature dimension reduction by the maximum pooling operation, and finally generating the feature F or the feature M.
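As a minimal illustration of the convolution, activation, and max-pooling sequence shared by steps 200 and 500, the following PyTorch sketch may help; the kernel size, channel counts, and input resolution are assumptions, since the patent's layer parameters (Table 2) are given only as an image:

```python
import torch
import torch.nn as nn

class CNNBlock(nn.Module):
    """One CNN module: a convolution extracts semantic features, an
    activation function passes them on, and max pooling reduces the
    feature dimensions (as in steps 200 and 500)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.act(self.conv(x)))

x = torch.randn(1, 1, 64, 64)   # one single-channel (grayscale) malicious image
f = CNNBlock(1, 32)(x)          # feature F: channels grow, spatial size halves
```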
Further, in the above malicious code detection method based on image feature multi-attention learning, the method of extracting the key features F' from the feature F in step 300 comprises: first, higher-order features are extracted from F through a convolutional layer and weights are assigned to the extracted features on the principle that key features receive higher weights; then the channel information of the features is aggregated along the spatial dimension using the maximum pooling operation and the average pooling operation, generating two two-dimensional feature maps F_avg and F_max, which are stacked together by a concatenation operation to obtain, after the data channels are compressed, key features carrying weight information; finally, these compressed key features with weight information are multiplied with the feature F to obtain the weighted key features F'.
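A hedged sketch of the spatial-attention computation of step 300 in PyTorch. The channel-wise average and max pooling, the stacking of F_avg and F_max, the channel compression, and the multiplication with F follow the text; the 7x7 compression kernel and the sigmoid used to turn the compressed map into weights are assumptions not stated in the source:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Step 300: aggregate channel information along the spatial
    dimension (average and max pooling), stack the two 2-D maps
    F_avg and F_max, compress them to one weight map, and multiply
    the map with F to obtain the weighted key features F'."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.compress = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        f_avg = f.mean(dim=1, keepdim=True)   # F_avg: 2-D average map
        f_max = f.amax(dim=1, keepdim=True)   # F_max: 2-D max map
        stacked = torch.cat([f_avg, f_max], dim=1)
        weights = torch.sigmoid(self.compress(stacked))
        return f * weights                    # F' = F weighted by the key-feature map

f = torch.randn(1, 32, 32, 32)
f_prime = SpatialAttention()(f)
```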
Further, in the above malicious code detection method based on image feature multi-attention learning, the method of extracting the pixel-correlation feature F'' from the key features F' in step 400 is the same as the method of extracting the pixel-correlation feature M' from the feature M in step 600, and specifically comprises: first performing a convolution operation on the feature F' or the feature M to linearly map it into three feature matrices Q, K, and V; then multiplying the transpose of Q by K to obtain a correlation matrix; then multiplying the correlation matrix by V to obtain a new matrix; and finally point-multiplying this new matrix, which carries weight information and correlation features, with the key features F' or the feature M, endowing the original features with pixel correlation and obtaining the pixel-correlation feature F'' or M'.
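The Q/K/V computation of steps 400 and 600 can be sketched as follows. The 1x1 convolutions for the linear mappings, the reduced channel width of Q and K, and the softmax normalization of the correlation matrix are assumptions beyond the text, which only specifies multiplying the transpose of Q by K, multiplying by V, and point-multiplying with the input feature:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelSelfAttention(nn.Module):
    """Steps 400/600: convolutions linearly map the input to Q, K, V;
    transpose(Q) @ K gives a pixel-correlation matrix; multiplying V
    by it gives a new matrix whose entries carry correlation info;
    point-multiplying with the input endows it with pixel correlation."""
    def __init__(self, channels: int, reduced: int = 8):
        super().__init__()
        self.q = nn.Conv2d(channels, reduced, kernel_size=1)
        self.k = nn.Conv2d(channels, reduced, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.q(x).flatten(2)                         # (b, reduced, h*w)
        k = self.k(x).flatten(2)                         # (b, reduced, h*w)
        v = self.v(x).flatten(2)                         # (b, c, h*w)
        corr = F.softmax(q.transpose(1, 2) @ k, dim=-1)  # (b, h*w, h*w)
        out = (v @ corr).view(b, c, h, w)                # new matrix
        return out * x                                   # point-multiply with input

x = torch.randn(1, 32, 16, 16)
x_corr = PixelSelfAttention(32)(x)
```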
Further, in any of the above devices or methods for malicious code detection based on image feature multi-attention learning, the key features refer to features in the regions of the grayscale image converted from malicious code that contain the three shades black, white, and gray.
The beneficial effects of the above technical scheme are as follows. On the basis of a convolutional neural network, the device of the invention combines a spatial attention module and two self-attention modules into a neural network model that classifies malicious images by extracting their deep features. Although the model contains several modules, their structure is simple and effective: each module uses only a small number of convolutional and pooling layers, so the complexity and computation of the model are greatly reduced compared with the large stacks of convolutional and pooling layers used by the VGGNet and ResNet networks. Each module extracts deep features of the malicious image, namely the key features and the pixel-correlation features, addressing both the complexity of neural network models for malicious image classification and their insufficient ability to extract deep image features. In experiments on the public Malimg malicious code dataset, the system and method reached an identification accuracy of 96.38%, exceeding the 96.10% of VGGNet, which demonstrates that the device and method achieve a higher recognition rate while reducing the complexity of the convolutional neural network model.
Drawings
FIG. 1 is a schematic diagram of a VGGNet-16 network model structure;
FIG. 2 is a schematic diagram of a ResNet-50 network model structure;
FIG. 3 is an exemplary diagram of key features in a malicious image;
FIG. 4 is a schematic structural diagram of a malicious code detection apparatus based on multi-attention learning of image features according to this embodiment;
FIG. 5 is a flowchart illustrating a malicious code detection method based on multi-attention learning of image features according to this embodiment;
FIG. 6(a) is an image converted from malicious code example 1 belonging to the Fakerean family; (b) is an image converted from malicious code example 2 belonging to the Fakerean family; (c) is an image converted from malicious code example 3 belonging to the Fakerean family;
FIG. 7(a) is an image converted from malicious code example 1 belonging to the Dontovo family; (b) is an image converted from malicious code example 2 belonging to the Dontovo family; (c) is an image converted from malicious code example 3 belonging to the Dontovo family;
FIG. 8 is a diagram of the classification results of malware families on the Malimg dataset by the system of the present invention.
Detailed Description
To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present application are given in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
In this embodiment, the software environment is a WINDOWS 10 system and the simulation environment is PyCharm 2021.3.3 x64. The malicious code file samples come from the Malimg dataset, which contains 25 malware families, i.e. 25 malware types, with the number of samples of each type shown in Table 1. In the embodiment of the invention, the 25 malware families are represented by the numbers 0-24 respectively, so that the classifier in the device of the invention finally outputs a specific number to represent each malware family. In this embodiment, the correspondence between the 25 malware family names and the numbers 0-24 is as follows: Allaple.L: 0; Allaple.A: 1; Yuner.A: 2; Lolyda.AA1: 3; Lolyda.AA2: 4; Lolyda.AA3: 5; C2LOP.P: 6; C2LOP.gen!g: 7; Instantaccess: 8; Swizzor.gen!I: 9; Swizzor.gen!E: 10; VB.AT: 11; Fakerean: 12; Alueron.gen!J: 13; Malex.gen!J: 14; Lolyda.AT: 15; Adialer.C: 16; Wintrim.BX: 17; Dialplatform.B: 18; Dontovo.A: 19; Obfuscator.AD: 20; Agent.FYI: 21; Autorun.K: 22; Rbot!gen: 23; Skintrim.N: 24.
TABLE 1 Malimg dataset malware type and sample number
[Table 1 is reproduced as an image in the original publication.]
Fig. 4 is a schematic structural diagram of the malicious code detection apparatus based on multi-attention learning of image features according to the present embodiment, and as shown in fig. 4, the apparatus includes three parts, namely a code-image converter, a feature extractor and a classifier. The code-image converter is used for converting an input original malicious code file into a gray image, defining the gray image as a malicious image and sending the malicious image to the feature extractor; the feature extractor is used for extracting key features and correlation features from the received malicious image so as to obtain deep features in the malicious image; the classifier is used for classifying the malicious images according to the deep level features extracted by the feature extractor and classifying the malicious images into specific malicious software family categories. The feature extractor and the classifier form a neural network model in the apparatus of the present invention, which is called a FA model (Fusion Attention model).
As shown in fig. 4, the feature extractor, which takes a convolutional neural network as a basic network, includes three structures, namely a CNN module, a spatial attention module, and a self-attention module, and specifically includes:
the first CNN module is used for receiving the malicious image sent by the code-image converter, extracting low-level semantic features from the malicious image through a convolutional layer, passing the extracted features through an activation function to a pooling layer, performing feature dimension reduction by the maximum pooling operation to reduce the network model parameters, finally generating a feature F, and sending the feature F to the spatial attention module;
the spatial attention module is used for extracting key features from the feature F; specifically, higher-order features are first extracted from F through a convolutional layer, and weights are assigned to the extracted features on the principle that key features receive higher weights; then the channel information of the features is aggregated along the spatial dimension using the maximum pooling operation and the average pooling operation, generating two two-dimensional feature maps F_avg and F_max, which are stacked together by a concatenation operation to obtain, after the data channels are compressed, key features carrying weight information; finally, these key features with weight information are multiplied with the feature F to obtain the weighted key features F', which are sent to the first self-attention module;
the first self-attention module is used for extracting pixel-correlation features from the key features F'; specifically, a convolution operation is first performed on F' to complete a linear mapping into three feature matrices Q, K, and V, which differ only in the size of their output channels; after the three matrices are obtained, the transpose of Q is multiplied by K to obtain a correlation matrix; the correlation matrix is multiplied by V to obtain a new matrix, in which every pixel point is related to the key features F' received from the spatial attention module and which contains weight information; finally, this new matrix, carrying weight information and correlation features, is point-multiplied with the key features F' received from the spatial attention module, endowing the original key features F' with pixel correlation, yielding the pixel-correlation features F'', which are sent to the second CNN module;
the second CNN module is used for extracting higher-order features from the features F'': high-level semantic features are extracted from F'', the extracted features are passed through an activation function to a pooling layer, feature dimension reduction is performed by the maximum pooling operation to reduce the network model parameters, a feature M is finally generated, and the feature M is sent to the second self-attention module;
the second self-attention module is used for extracting pixel-correlation features from the feature M; specifically, a convolution operation is first performed on M to complete a linear mapping into three feature matrices Q, K, and V; after the three matrices are obtained, the transpose of Q is multiplied by K to obtain a correlation matrix; the correlation matrix is multiplied by V to obtain a new matrix, in which every pixel point is correlated with the higher-order feature M received from the second CNN module; finally, this new matrix, carrying the correlation features, is point-multiplied with the feature M, endowing M with pixel correlation, yielding the pixel-correlation features M', which are sent to the classifier.
As shown in fig. 4, the classifier consists of at least 1 linear layer and 1 softmax classifier. In this embodiment, the classifier is composed of 3 linear layers and 1 softmax classifier connected together, and is used for mapping the deep image features extracted by the feature extractor to the sample label space, i.e. classifying the deep image features received by the classifier. The linear layer is a model classification structure in the deep learning framework PyTorch, used to apply a linear transformation to the obtained deep image features and generate the input features required by softmax. In the 3 linear layers of this embodiment, each layer reduces the dimension of the features, so the neural network parameters decrease layer by layer. The softmax classifier normalizes the features and assigns weights; finally each feature matrix is assigned to a specific malware family category and the corresponding number representing that family is produced, thereby obtaining the corresponding malware family category. Table 2 shows the detailed parameters of each layer of the FA model in this embodiment.
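A sketch of the three-linear-layer classifier described above; the input feature width of 2048 and the intermediate widths 512 and 128 are assumed values, since Table 2 is available only as an image. (In PyTorch training code one would normally feed the pre-softmax logits to `nn.CrossEntropyLoss` and apply softmax only at inference.)

```python
import torch
import torch.nn as nn

class Classifier(nn.Module):
    """Three linear layers of decreasing width followed by softmax:
    each layer reduces the feature dimension in turn, and the final
    probabilities name one of the 25 Malimg malware families."""
    def __init__(self, in_features: int = 2048, num_classes: int = 25):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_features, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
            nn.Softmax(dim=1),
        )

    def forward(self, m: torch.Tensor) -> torch.Tensor:
        return self.layers(m.flatten(1))

probs = Classifier()(torch.randn(4, 2048))  # a batch of 4 deep feature vectors
family = probs.argmax(dim=1)                # numbers 0-24 naming the families
```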
TABLE 2 detailed parameters of layers in the FA model
(Table 2 is reproduced only as an image in the original publication; its per-layer parameters are not recoverable here.)
Fig. 5 is a schematic flowchart of a malicious code detection method based on image feature multi-attention learning, which aims to train and verify a malicious code detection apparatus based on image feature multi-attention learning, and includes the following steps:
step 1: converting an original malicious code file into a gray image, normalizing all the gray images, and dividing the normalized gray images into a training set and a test set according to a certain proportion;
step 1.1: converting the original malicious code file into a gray image;
in the present embodiment, as described above, the original malicious code file samples come from the Malimg data set. Each original malicious code file in the acquired Malimg data set contains a binary bit string, such as 011100110101100101101101010. The binary bit string in the malicious code file is converted into a grayscale image as follows. First, the binary bit string is extracted using the open and write functions of the Python language and stored on the computer. Taking each byte of the malicious code file as a unit, the binary bit string is split and read in 8-bit groups; each 8-bit vector is then converted by a base-conversion operation in Python into a decimal unsigned integer, mapping it into the range 0-255. The base conversion for each byte is computed as:
I = Σ_{i=0}^{7} b_i · 2^(7−i)    (1)
(b_i denotes the i-th bit of the byte, counted from the most significant bit)
where I is the resulting mapping value. For example, the bytes 01100000 and 10101100 yield the mapping values 96 and 172, respectively, through the base-conversion formula. In this way every byte is mapped to a value between 0 and 255, and the values are finally rendered as grayscale pixels in the interval 0 (black) to 255 (white), producing the image sample data set shown in figs. 6 and 7.
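The byte-to-integer mapping of Step 1.1 can be sketched in Python. The helper names are illustrative assumptions; the patent specifies only the use of Python's open/write functions and the 8-bit-per-byte mapping into 0-255.

```python
# Sketch of the per-byte base conversion described in Step 1.1.
# Helper names are illustrative, not from the patent.

def bits_to_int(bits: str) -> int:
    """Map an 8-bit string to its decimal value: I = sum of b_i * 2^(7-i)."""
    return sum(int(b) << (7 - i) for i, b in enumerate(bits))

def bitstring_to_pixels(bitstring: str) -> list:
    """Split a binary bit string into 8-bit groups and map each to 0-255."""
    usable = len(bitstring) - len(bitstring) % 8  # drop any trailing partial byte
    return [bits_to_int(bitstring[i:i + 8]) for i in range(0, usable, 8)]
```

For the worked examples in the text, `bits_to_int("01100000")` gives 96 and `bits_to_int("10101100")` gives 172.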
Step 1.2: normalizing all the gray level images to construct an image sample data set;
the image is processed into a fixed size of uniform size. The reason for this is that an FA model (Fusion Attention model) composed of a feature extractor and a classifier requires input of images of uniform size, and small-sized images have better recognition effect. In order to prevent overfitting of the FA model, random crop is carried out on image data by using a RandomCrop method in a deep learning framework pyrrch, so that the data enhancement effect can be achieved. And normalizing the data by using Normalization operation, so that the data can be mapped into a range of 0-1, the calculation amount of the FA model is reduced, and the precision and the convergence speed of the FA model are improved.
Step 1.3: dividing the image sample data set into a training set and a test set according to a certain proportion;
after all malicious code files are converted into grayscale images, they form an image sample data set, which must be divided in order to subsequently train and test the constructed model. In this embodiment, the image sample data set is split into a training set and a test set at a ratio of 8:2 using the Python language. The data set is traversed with a for loop, new training-set and test-set folders are created with the os.path.join method of the Python os package, and the assigned pictures are copied into the two folders.
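Step 1.3 can be sketched as follows. The directory layout (one sub-folder per malware family) and the function name are assumptions; the patent specifies only the 8:2 ratio, a for loop over the data set, and the use of os.path.join to build the new folders.

```python
import os
import random
import shutil

# Sketch of the 8:2 train/test split described in Step 1.3.
def split_dataset(src_dir, dst_dir, train_ratio=0.8, seed=0):
    rng = random.Random(seed)
    for family in sorted(os.listdir(src_dir)):       # traverse the data set
        files = sorted(os.listdir(os.path.join(src_dir, family)))
        rng.shuffle(files)
        cut = int(len(files) * train_ratio)
        for subset, names in (("train", files[:cut]), ("test", files[cut:])):
            out = os.path.join(dst_dir, subset, family)
            os.makedirs(out, exist_ok=True)          # new train/test folders
            for name in names:
                shutil.copy(os.path.join(src_dir, family, name), out)
```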
Step 2: inputting the image sample data in the training set into the malicious code detection device based on the image feature multi-attention learning shown in fig. 4, and performing forward propagation on the malicious code detection device based on the image feature multi-attention learning to obtain a prediction result.
Step 2.1: extracting low-level semantic features in an input image through a first CNN module to obtain a feature F;
in this embodiment, specifically, the malicious images in the training set are input to the first CNN module. In the first CNN module, low-level semantic features are first extracted from the input image by the convolutional layer; the extracted features are then passed through the activation function into the pooling layer, where maximum pooling performs feature dimension reduction and finally generates the feature F;
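The CNN module of Step 2.1 (and Step 2.4) can be sketched as a PyTorch module. The channel counts and kernel size are assumptions; the actual per-layer parameters are given only in Table 2, which appears as an image in the original.

```python
import torch
import torch.nn as nn

# Sketch of one CNN module: convolution -> activation -> max pooling.
class CNNModule(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(2)  # feature dimension reduction (halves H and W)

    def forward(self, x):
        return self.pool(self.act(self.conv(x)))
```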
step 2.2: extracting key features F' from the features F through a space attention module;
in the spatial attention module of the present embodiment, a convolutional layer first extracts further high-order features from the feature F and assigns weights to them, on the principle that key features receive higher weights; next, maximum pooling and average pooling aggregate the channel information of the features along the spatial dimension, generating the two-dimensional feature maps F_avg and F_max, which are stacked together by a concatenation operation, yielding key features with weight information after the data channel has been compressed; finally, these key features are multiplied by the feature F to obtain the weighted key features F';
the operation formula of the space attention module for feature extraction is as follows:
M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])) = σ(f^{7×7}([F_avg; F_max]))    (2)
wherein M_s represents the entire feature-extraction operation of the spatial attention module; F is the input feature; σ represents the activation function; f^{7×7} represents a convolution operation using a 7×7 convolution kernel; AvgPool and MaxPool represent average pooling and maximum pooling, whose outputs are written F_avg and F_max in equation (2); F_avg, F_max ∈ R^{1×H×W}, where R denotes a feature matrix and 1×H×W denotes a feature with 1 channel, height H and width W;
step 2.3: extracting a correlation feature F 'of the pixel on the key feature F' by a first self-attention module;
in the first self-attention module of this embodiment, a convolution operation is first applied to the feature F' as a linear mapping, yielding the three feature matrices Q, K and V; the transpose of Q is then multiplied by K to obtain a correlation matrix; the correlation matrix is multiplied by V to obtain a new matrix; finally, the new matrix, carrying weight information and correlation features, is dot-multiplied with the key feature F' received from the spatial attention module, producing the pixel-correlation feature F'' in the image;
the operation formula of the self-attention module is as follows:
Attention(Q, K, V) = softmax(B_{i,j}) V · F    (3)
B_{i,j} = Q(x_i)^T K(x_j)    (4)
wherein softmax represents the activation function; Q, K, V represent the three feature matrices; B_{i,j} represents the relationship weight that the i-th position generates for the j-th position, i.e., the relationship weight between different pixels; F represents the original input features; Q(x) and K(x) represent the feature matrices Q and K generated by the convolution operation; Q(x_i)^T represents the transpose of the generated feature matrix Q.
Step 2.4: extracting higher-order features from the features F' through a second CNN module to obtain features M;
in the second CNN module of the present embodiment, high-level semantic features are first extracted from the feature F'' by the convolutional layer; the extracted features are then passed through the activation function into the pooling layer, where maximum pooling performs feature dimension reduction and finally generates the feature M.
Step 2.5: extracting a correlation feature M' of the pixel from the feature M through a second self-attention module;
in the second self-attention module of this embodiment, a convolution operation is first applied to the feature M as a linear mapping, yielding the three feature matrices Q, K and V; the transpose of Q is then multiplied by K to obtain a correlation matrix; the correlation matrix is multiplied by V to obtain a new matrix; finally, the new matrix, carrying the correlation features, is dot-multiplied with the high-order feature M received from the second CNN module, endowing M with pixel correlation and producing the pixel-correlation feature M' in the image.
Step 2.6: the deep image features M' are mapped to a sample label space by a classifier, i.e., the features are classified.
In the classifier of this embodiment, the feature M' is first successively reduced in dimension by the 3 linear layers, so that the neural network parameters decrease layer by layer; the softmax classifier then normalizes the features and assigns weights, finally assigning each feature matrix to a specific malicious family category and generating the number that identifies it, from which the corresponding malware family is obtained.
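The classifier of Step 2.6 can be sketched as follows. The intermediate layer dimensions are assumptions (the real values are in the image-only Table 2); 25 output classes matches the number of families in the Malimg data set.

```python
import torch
import torch.nn as nn

# Sketch of the classifier: 3 linear layers with decreasing dimensions
# followed by softmax normalization over the family categories.
class Classifier(nn.Module):
    def __init__(self, in_dim: int, n_classes: int = 25):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.Linear(256, 64),
            nn.Linear(64, n_classes),  # parameters decrease layer by layer
        )

    def forward(self, m):
        # flatten the deep image features, then map to class probabilities
        return torch.softmax(self.layers(m.flatten(1)), dim=1)
```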
Step 3: calculating a loss value from the prediction result obtained by forward propagation of the malicious code detection device based on image-feature multi-attention learning, performing backward propagation, and updating the parameters of the device.
In this embodiment, stochastic gradient descent is used to update the parameters of the malicious code detection device based on image-feature multi-attention learning, and the loss function used for training is:
L = −Σ_{i=0}^{c−1} y_i · log(p_i)    (5)
wherein p = [p_0, …, p_{c−1}] represents a probability distribution, each element p_i representing the probability that the sample belongs to malware class i; y = [y_0, …, y_{c−1}] is the vector representation of the sample label, with y_i = 1 when the sample belongs to class i and y_i = 0 otherwise; c represents the number of sample classes.
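Equation (5) is the standard cross-entropy loss, which can be sketched directly; with a one-hot label y, only the term for the true class contributes.

```python
import math

# Sketch of the cross-entropy loss of equation (5):
# L = -sum_{i=0}^{c-1} y_i * log(p_i), with y one-hot over the c classes.
def cross_entropy(p, y):
    return -sum(yi * math.log(pi) for pi, yi in zip(p, y) if yi > 0)
```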
During training, the network layers initialize their parameters with an Initialization method; the training batch size is set to 128, the initial learning rate to 0.005, and the number of training epochs to 100. During training, the device is tested on the test set every 10 epochs, the test results are written to a log file, and the parameter file of the device is saved as a .pth file.
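A hypothetical minimal training step with the stated hyperparameters (SGD with learning rate 0.005; in the full setup, batches of 128 would be drawn from the training set for 100 epochs). A tiny linear model stands in for the FA model here.

```python
import torch
import torch.nn as nn

# Minimal training-step sketch: forward pass, loss, back-propagation,
# and a stochastic-gradient-descent parameter update.
torch.manual_seed(0)
model = nn.Linear(8, 3)                                  # stand-in for the FA model
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()          # back-propagation
    optimizer.step()         # SGD parameter update
    return loss.item()
```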
Step 4: ACC, the ratio of correctly predicted samples to total samples, is used as the measure of detection performance. The trained malicious code detection device based on image-feature multi-attention learning is tested and evaluated with the test set; after the current device is tuned according to the evaluation result, it is retrained by the method of step 2 and then tested and evaluated again. Training, testing, and evaluation are repeated until an optimal device meeting the metric is obtained, which is taken as the final malicious code detection device based on image-feature multi-attention learning.
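The ACC metric described above can be sketched in a few lines:

```python
# Sketch of the ACC metric: the ratio of correctly predicted samples
# to the total number of samples.
def accuracy(predicted, actual):
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```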
In order to better illustrate the experimental results and performance improvement of the device, the experimental results of different models on the Malimg data set are compared, and the specific results are shown in table 3. The model of the combination of the feature extractor and the classifier in the device is referred to as a FA model (Fusion Attention model) for short.
TABLE 3 comparison of the experimental results of different models
(Table 3 is reproduced only as images in the original publication; the per-model results are not recoverable here.)
As can be seen from Table 3, the recognition performance of the FA model exceeds that of the basic CNN model, and on the Malimg data set the FA model is comparable to the complex network VGGNet. The device is therefore effective both at reducing the complexity of the neural network model and at extracting the deep-level features of the image.
In addition to the accuracy comparison between models, the family classification results of the device of the present invention on the Malimg data set are shown in fig. 8. The horizontal axis of fig. 8 represents the true label and the vertical axis the predicted label; a 0 indicates a correct prediction, and the other numbers give the count of samples of the horizontal-axis family predicted as the vertical-axis family. Most families are predicted well, maintaining 100% recognition accuracy, with only a few sample prediction errors: 4 samples of class Allaple.L are predicted as Allaple.A, and 5 samples of Allaple.A as Allaple.L; 3 samples of class Lolyda.AA2 are predicted as Lolyda.AA1; 7 samples of class C2LOP.P are predicted as C2LOP.gen!g; and 6 samples of class Swizzor.gen!I are predicted as Swizzor.gen!E. The prediction errors thus occur between two subclasses of the same general family; owing to the homology between such subclasses, there is little difference between them. This shows that the FA model in the device of the present invention captures the features of the same family well, and that extracting features through pixel correlation helps the model identify members of the same family.
In conclusion, the device and method of the present invention meet the requirements of efficient and accurate malicious code identification. They solve the problem of the high time, labor, and resource costs of traditional malicious code identification, as well as the problems, in existing image-based methods, of model complexity and insufficient extraction of the deep-level features of malicious code.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims (10)

1. A malicious code detection device based on image feature multi-attention learning is characterized by comprising:
the code-image converter is used for converting an input original malicious code file into a gray image and defining the gray image as a malicious image and sending the malicious image to the feature extractor;
the characteristic extractor is used for extracting key characteristics and correlation characteristics among pixels from the received malicious images so as to obtain deep-level characteristics in the malicious images;
and the classifier is used for classifying the malicious images according to the deep level features extracted by the feature extractor and classifying the malicious images into specific malicious software categories.
2. The apparatus according to claim 1, wherein the feature extractor is based on a convolutional neural network, and comprises three structures, namely a CNN module, a spatial attention module and a self-attention module.
3. The apparatus for detecting malicious code based on image feature multi-attention learning according to claim 2, wherein the feature extractor specifically comprises:
the first CNN module is used for receiving the malicious images sent by the code-image converter, extracting low-level semantic features in the input images from the malicious images through a convolutional layer, inputting the extracted low-level semantic features into a pooling layer through an activation function, performing feature dimension reduction through maximum pooling operation to generate features F, and sending the features F to the spatial attention module;
the spatial attention module is used for extracting key features from the feature F; specifically, high-order features are further extracted from the feature F through a convolutional layer, and weights are assigned to the extracted features on the principle that key features receive higher weights; then the channel information of the features is aggregated in the spatial dimension using maximum pooling and average pooling, generating the two-dimensional feature maps F_avg and F_max, which are stacked together by a concatenation operation to obtain key features with weight information after the data channel is compressed; finally, these key features are multiplied by the feature F to obtain the weighted key features F', which are sent to the first self-attention module;
the first self-attention module is used for extracting the correlation characteristics of the pixels on the key characteristics F ', and specifically, firstly, performing convolution operation on the characteristics F ' to linearly map the characteristics F ' to obtain Q, K, V characteristic matrixes; then multiplying the output transpose of the matrix Q and the output of the matrix K to obtain a correlation matrix; then multiplying the correlation matrix with the matrix V to obtain a new matrix; finally, carrying out point multiplication on the new matrix with the weight information and the correlation characteristics and the key characteristics F 'received from the space attention module, giving pixel correlation to the original key characteristics F', obtaining pixel correlation characteristics F 'in the image, and sending the pixel correlation characteristics F' to the second CNN module;
the second CNN module is used for extracting higher-order features from the features F ', extracting high-level semantic features from the features F', transferring the extracted high-level semantic features to the pooling layer through an activation function, performing feature dimension reduction through maximum pooling operation to obtain features M, and sending the features M to the second self-attention module;
the second self-attention module is used for extracting the correlation characteristics of the pixels on the characteristics M, and specifically, firstly, convolution operation is carried out on the characteristics M to linearly map the characteristics M to obtain Q, K, V characteristic matrixes; then multiplying the output transpose of the matrix Q and the output of the matrix K to obtain a correlation matrix; then multiplying the correlation matrix with the matrix V to obtain a new matrix; and finally, performing dot multiplication on the new matrix with the correlation characteristics and the high-order characteristics M received from the second CNN module, giving pixel correlation to the characteristics M, obtaining pixel correlation characteristics M 'in the image, and sending the pixel correlation characteristics M' to the classifier.
4. The apparatus according to claim 1, wherein the classifier is composed of at least 1 linear layer and 1 softmax classifier.
5. The apparatus according to claim 4, wherein the classifier is composed of 3 linear layers and 1 softmax classifier connected together.
6. A malicious code detection method based on image feature multi-attention learning is characterized by comprising the following steps:
step 100: converting an original malicious code file to be detected into a gray image, and defining the gray image as a malicious image;
step 200: extracting low-level semantic features in the malicious image to obtain a feature F;
step 300: extracting key features F' from the features F;
step 400: extracting a correlation characteristic F 'of the pixel on the key characteristic F';
step 500: extracting higher-order features from the features F' to obtain features M;
step 600: extracting a correlation characteristic M' of the pixel from the characteristic M;
step 700: and mapping the deep image features M' to a sample mark space, so that the malicious images are classified into specific malicious software categories.
7. The method for detecting malicious codes based on image feature multi-attention learning according to claim 6, wherein the method for extracting the low-level semantic features in the malicious images to obtain the features F in the step 200 is the same as the method for extracting the features M at a higher order in the features F ″ in the step 500, and specifically comprises: firstly, semantic features are extracted once from a malicious image or feature F' through a convolutional layer, then the extracted semantic features are transmitted into a pooling layer through an activation function, dimension reduction of the features is carried out through maximum pooling operation, and finally the feature F or the feature M is generated.
8. The method for detecting malicious codes based on image feature multi-attention learning according to claim 6, wherein the method for extracting key features F' from the feature F in step 300 is as follows: first, high-order features are further extracted from the feature F through a convolutional layer, and weights are assigned to the extracted features on the principle that key features receive higher weights; then, the channel information of the features is aggregated in the spatial dimension using maximum pooling and average pooling, generating the two-dimensional feature maps F_avg and F_max, which are stacked together by a concatenation operation to obtain key features with weight information after the data channel is compressed; finally, these key features with weight information are multiplied by the feature F to obtain the weighted key features F'.
9. The method for detecting malicious codes based on image feature multi-attention learning according to claim 6, wherein the method for extracting the correlation feature F ″ of the pixel on the key feature F 'in the step 400 is the same as the method for extracting the correlation feature M' of the pixel on the feature M in the step 600, and specifically comprises: firstly, carrying out convolution operation on the feature F 'or the feature M to linearly map the feature F' or the feature M to obtain Q, K, V three feature matrixes; then multiplying the output transpose of the matrix Q and the output of the matrix K to obtain a correlation matrix; then multiplying the correlation matrix with the matrix V to obtain a new matrix; and finally, carrying out point multiplication on the new matrix with the weight information and the correlation characteristic and the key feature F 'or the feature M to endow the original key feature F' or the feature M with pixel correlation, and obtaining the pixel correlation characteristic F 'or the pixel correlation characteristic M' in the image.
10. The apparatus or method for detecting malicious codes based on image feature multi-attention learning as claimed in any preceding claim, wherein the key features refer to features in a region containing three shades of black, white and gray in a gray level image converted by malicious codes.
CN202210408579.XA 2022-04-19 2022-04-19 Malicious code detection device and method based on image feature multi-attention learning Pending CN114896594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210408579.XA CN114896594A (en) 2022-04-19 2022-04-19 Malicious code detection device and method based on image feature multi-attention learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210408579.XA CN114896594A (en) 2022-04-19 2022-04-19 Malicious code detection device and method based on image feature multi-attention learning

Publications (1)

Publication Number Publication Date
CN114896594A true CN114896594A (en) 2022-08-12

Family

ID=82717370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210408579.XA Pending CN114896594A (en) 2022-04-19 2022-04-19 Malicious code detection device and method based on image feature multi-attention learning

Country Status (1)

Country Link
CN (1) CN114896594A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117216797A (en) * 2022-12-01 2023-12-12 丰立有限公司 System and method for protecting data file
CN117216797B (en) * 2022-12-01 2024-05-31 丰立有限公司 System and method for protecting data file

Similar Documents

Publication Publication Date Title
CN110084281B (en) Image generation method, neural network compression method, related device and equipment
CN110602113B (en) Hierarchical phishing website detection method based on deep learning
CN112613501A (en) Information auditing classification model construction method and information auditing method
CN111428557A (en) Method and device for automatically checking handwritten signature based on neural network model
CN111885035A (en) Network anomaly detection method, system, terminal and storage medium
CN112989358B (en) Method and device for improving robustness of source code vulnerability detection based on deep learning
CN110175248B (en) Face image retrieval method and device based on deep learning and Hash coding
CN109033833B (en) Malicious code classification method based on multiple features and feature selection
CN111651762A (en) Convolutional neural network-based PE (provider edge) malicious software detection method
CN113806746A (en) Malicious code detection method based on improved CNN network
CN113344826B (en) Image processing method, device, electronic equipment and storage medium
CN110717953A (en) Black-white picture coloring method and system based on CNN-LSTM combined model
CN116910752B (en) Malicious code detection method based on big data
CN114896594A (en) Malicious code detection device and method based on image feature multi-attention learning
CN112597925B (en) Handwriting recognition/extraction and erasure method, handwriting recognition/extraction and erasure system and electronic equipment
CN116758379B (en) Image processing method, device, equipment and storage medium
CN117407875A (en) Malicious code classification method and system and electronic equipment
CN111242114B (en) Character recognition method and device
CN111898544A (en) Character and image matching method, device and equipment and computer storage medium
CN114896598B (en) Malicious code detection method based on convolutional neural network
CN111488950A (en) Classification model information output method and device
CN116361791A (en) Malicious software detection method based on API packet reconstruction and image representation
CN114638984B (en) Malicious website URL detection method based on capsule network
WO2023173546A1 (en) Method and apparatus for training text recognition model, and computer device and storage medium
CN114741697A (en) Malicious code classification method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination