CN110096878A

CN110096878A - A malware detection method

Info

Publication number: CN110096878A
Application number: CN201910341543.2A
Authority: CN
Inventors: 袁明
Original assignee: Wuhan Zhimei Interconnection Technology Co Ltd
Current assignee: Wuhan Zhimei Interconnection Technology Co Ltd
Priority date: 2019-04-26
Filing date: 2019-04-26
Publication date: 2019-08-06

Abstract

The invention provides a malware detection method. The software binary is first converted into a picture, which turns the problem into an image classification problem. Transfer learning is then performed on the pictures with Inception-Resnet-v2, a CNN-based deep learning model. Through training on a large amount of data, the model finds features automatically and classifies normal software and malware, without extensive feature engineering. The Inception-Resnet-v2 model achieves very good image classification results on the ImageNet dataset. The invention takes the model pre-trained on ImageNet and performs transfer learning on the software pictures, which achieves very good results, with an accuracy of 95% or more.

Description

A malware detection method

Technical field

The invention relates to the technical field of computer and server security protection, and in particular to a malware detection method.

Background art

Malware (malicious software), also known as "rogue software", generally refers to software that is spread through networks, portable storage devices and similar channels and that deliberately causes unintended failures and information security problems, such as leaks of private or confidential data, system damage or data loss, on personal computers, servers, smart devices, computer networks and the like, while trying in various ways to prevent users from removing it. Malware takes forms that include binary executables, scripts and active content. By this definition, computer viruses, computer worms, Trojan horses, ransomware, spyware, scareware, software that exploits vulnerabilities, and even some adware all fall within the category of malware.

Malware poses a serious security threat to devices, and it usually disguises itself by various means, which makes it very hard to detect. Earlier detection approaches mostly relied on MD5 database matching, signature matching, and observation of run-time behaviour in a sandbox. However, a slight modification of the malware's code is enough to defeat these approaches, so new malware cannot be found in time, and detection based on them is now outdated. The more advanced current approach is to train deep learning models on large amounts of normal software and malware in order to identify malware. The main patents are "WO2017084586A1: Method, system and device for inferring malicious code rules based on deep learning" and "CN102243699A: Deep learning detection method for unknown malicious code". Such deep learning methods, however, require a great deal of feature engineering to extract features from the binary bytes of the software, such as byte length, n-gram hidden Markov probabilities and information entropy. Some also apply word embedding to the binary characters to train on malware features, and can handle only one software format, such as Android software. All of this requires heavy workload and computational cost. Detection accuracy also depends entirely on how the features are processed, and generalization is weak, so large numbers of other types of malware cannot be detected.

Existing malware detection techniques fall mainly into static detection based on the malware binary and dynamic detection based on running the malware in a sandbox. These techniques perform very poorly on malware that has been obfuscated, packed or mutated. Whether the code is reverse-engineered to judge if it is harmful, or the malware's behaviour is observed in a sandbox, these detection methods are costly, demand highly skilled security analysts, and give very unstable accuracy.

Summary of the invention

The object of the invention is to address the existing state of the art by providing a malware detection method that reduces manual feature extraction, supports multiple file formats, improves interpretability, and correspondingly improves detection accuracy.

To achieve the above object, the invention adopts the following technical solution:

The invention is a malware detection method in which the binary code of software is converted into a picture and a CNN-based deep learning model performs transfer learning on the picture. Through training on data, the deep learning model finds features automatically and thereby classifies normal software and malware.

Further, the deep learning model is Inception-Resnet-v2.

The malware detection method of the invention specifically comprises the following steps:

(1) Data preparation: a large amount of normal software and malware is collected for training. The collected software covers most common formats, chiefly exe, apk, jar, doc, docx, xls, xlsx, ppt, pptx, pdf, csv, txt, png, log, tsv, html, js, css and xml. Malicious code may be hidden in any of these formats, so training on files of these formats allows the widest possible range of malware to be detected.

(2) Converting the binary file to a picture: given the binary file of a piece of malware, the binary stream is read as a vector of 8-bit non-negative integers. An 8-bit value ranges from 0 to 255, which matches the 0-255 pixel values of a grayscale image, so every 8 bits of the binary stream are mapped to one pixel, with the value used as the pixel value. These pixel values are then arranged into a two-dimensional matrix according to the file size, yielding a picture;

(3) Picture preprocessing: after the software to be trained on has been converted to pictures, the converted pictures are resized to a uniform height and width;

(4) Building the model: transfer learning is performed with the Inception-Resnet-v2 model. Inception-Resnet-v2 is a CNN-based deep neural network that uses a large number of 1x1, 1x3, 3x1, 1x7 and 7x1 convolution kernels to extract features from the picture. These kernels form several modules, namely Inception modules, which are then stacked to form the Inception model, so that the model becomes deeper and can better extract higher-order features of the picture. So that shallow-layer information of the picture can be passed to deeper layers, a Resnet network, i.e. a skip connection, is added to each Inception module. Specifically, Inception-Resnet-v2 is first trained on the ImageNet dataset. ImageNet is a large-scale image dataset from which Inception-Resnet-v2 can learn many general features, such as horizontal and vertical contours. This training yields an image classification model, and training on the malware samples starts from this model;

(5) Training the model: the model is built with the TensorFlow deep learning framework. The pictures must be converted to TensorFlow's tf-record data format, and the tf-record data are then split into a training set and a test set. The training set is used to train the model and the test set to assess its generalization ability. The model parameters are then adjusted according to the assessment results, so that the trained model can be deployed on its own to classify and detect malware.
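Step (2) above can be illustrated with a minimal sketch. It assumes the picture width is supplied by the caller (in the source the width is chosen from the file size) and that a final partial row is padded with zero bytes; the function name `bytes_to_gray_image` and the padding rule are illustrative assumptions, not part of the source.

```python
import math


def bytes_to_gray_image(data: bytes, width: int) -> list[list[int]]:
    """Map each byte (0-255) to one grayscale pixel and reshape into rows."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("input file is empty")
    height = math.ceil(len(data) / width)
    # Assumption: pad the last, partial row with zeros (black pixels).
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Seven bytes laid out four pixels per row give a 2 x 4 picture.
image = bytes_to_gray_image(b"MZ\x90\x00\x03\xff\x10", width=4)
print(image)
```

Each row of the result is one scan line of the grayscale picture, so any image library can save or resize it afterwards.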

The beneficial effects of the invention are as follows. The invention proposes visualizing malware as pictures and then detecting it with deep learning. First, analysing pictures removes the need to decompile the software or run it in a sandbox. Second, the picture mapped from malware has good interpretability: malicious code may be hidden in any part of the software, but once the software is converted to a picture the code can be found fairly easily wherever it is, and even if the malicious code is changed slightly to form a new variant, it remains detectable in the picture. Deep learning has achieved very good results in image detection, so malware converted to pictures can also be detected with high accuracy.

After model training is complete and the model parameters are saved, a new software file is converted to a picture and the deep CNN model then classifies the picture. The process is simple: the software no longer needs to be run, the binary no longer needs semantic analysis, and no manual features need to be extracted from the file. The method also supports a wide range of file formats and is effective not just for one class of malware but has strong generalization ability. It outperforms other approaches in both detection efficiency and detection accuracy.
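The detection flow for a new file can be sketched as follows, starting from the pixel matrix of the converted file. The sketch assumes nearest-neighbour sampling for the resize (the source does not name a resampling method) and uses a hypothetical `classify` callable in place of the trained model; `resize_nearest`, `detect` and the 0.5 threshold are illustrative names and values.

```python
from typing import Callable, List

Image = List[List[int]]


def resize_nearest(pixels: Image, size: int) -> Image:
    """Resize a grayscale matrix to size x size by nearest-neighbour sampling."""
    if not pixels or not pixels[0]:
        raise ValueError("picture is empty")
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[r * src_h // size][c * src_w // size] for c in range(size)]
        for r in range(size)
    ]


def detect(pixels: Image, classify: Callable[[Image], float],
           size: int = 299, threshold: float = 0.5) -> bool:
    """Return True when the (hypothetical) model scores the picture as malware."""
    return classify(resize_nearest(pixels, size)) >= threshold


# Stand-in classifier for illustration only; a real one wraps the trained model.
is_malware = detect([[0, 255], [128, 64]], classify=lambda img: 0.9)
```

No part of this flow executes the file under inspection; only its bytes are read.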

Brief description of the drawings:

Fig. 1 is a flow chart of the invention;

Fig. 2 is a flow chart of the file conversion in the invention;

Fig. 3 is a network structure diagram of the Inception module in the invention;

Fig. 4 is a flow chart of model training in the invention.

Detailed description of the embodiments:

In the malware detection method of the invention, the binary code of software is converted into a picture, and a deep learning model based on Inception-Resnet-v2, a multi-layer CNN, is then built. After supervised learning on a large number of labelled normal software and malware samples, a model capable of detecting malware is obtained. The process is shown in Fig. 1.

The present embodiment specifically includes the following steps:

(2) Converting the binary file to a picture: as shown in Fig. 2, given the binary file of a piece of malware, the binary stream is read as a vector of 8-bit non-negative integers. An 8-bit value ranges from 0 to 255, which matches the 0-255 pixel values of a grayscale image, so every 8 bits of the binary stream are mapped to one pixel, with the value used as the pixel value. These pixel values are arranged into a two-dimensional matrix according to the file size, which yields a picture together with its width and height. The correspondence between file size and image width is given in Table 1 below:

(3) Picture preprocessing: after the software to be trained on has been converted to pictures, the converted pictures are resized to a uniform height and width. In this embodiment the pictures are scaled to 299*299;

(4) Building the model: transfer learning is performed with the Inception-Resnet-v2 model. Inception-Resnet-v2 is a CNN-based deep neural network that uses a large number of 1x1, 1x3, 3x1, 1x7 and 7x1 convolution kernels to extract features from the picture. These kernels form several modules, namely Inception modules, which are then stacked to form the Inception model, so that the model becomes deeper and can better extract higher-order features of the picture. So that shallow-layer information of the picture can be passed to deeper layers, a Resnet network, i.e. a skip connection, is added to each Inception module. The network structure of the Inception module is shown in Fig. 3.

Inception-Resnet-v2 is a very large model and needs a large amount of data to learn effective features, but the malware samples that can be collected are very limited, so retraining the entire model would not give good results. This embodiment therefore trains on the malware samples by transfer learning. Specifically, Inception-Resnet-v2 is first trained on the ImageNet dataset. ImageNet is a large-scale image dataset from which Inception-Resnet-v2 can learn many general features, such as horizontal and vertical contours. This training yields an image classification model, and training on the malware samples starts from this model. At this point only the last two layers of the model need to be retrained, which greatly reduces the difficulty and time of training;
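The transfer-learning setup described above might look roughly like the following sketch, which uses the Keras application `tf.keras.applications.InceptionResNetV2` bundled with TensorFlow. Several choices here are assumptions rather than details from the source: the "last two layers" are modelled as two new dense layers on top of a frozen base, the hidden width of 256 is arbitrary, the output is a single sigmoid unit for the normal/malware decision, and the grayscale pictures are assumed to have been replicated to three channels and scaled to the range the pretrained network expects.

```python
import tensorflow as tf


def build_detector(weights="imagenet"):
    """Frozen Inception-ResNet-v2 base plus a small trainable head."""
    base = tf.keras.applications.InceptionResNetV2(
        include_top=False,          # drop the 1000-class ImageNet classifier
        weights=weights,            # ImageNet-pretrained weights by default
        input_shape=(299, 299, 3),
        pooling="avg",
    )
    base.trainable = False          # keep the pretrained features fixed

    inputs = tf.keras.Input(shape=(299, 299, 3))
    x = base(inputs, training=False)
    # Assumption: two new dense layers stand in for the retrained layers.
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
    )
    return model
```

Calling `build_detector()` downloads the ImageNet weights on first use; only the two dense layers are updated by a later `model.fit(...)`, which is what keeps training cheap when few malware samples are available.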

(5) Training the model: the model is built with the TensorFlow deep learning framework. The pictures must be converted to TensorFlow's tf-record data format, and the tf-record data are then split into a training set and a test set. The training set is used to train the model and the test set to assess its generalization ability. The model parameters are then adjusted according to the assessment results until a model with both high precision and high recall is obtained, so that the trained model can be deployed on its own to classify and detect malware.
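The split and the two evaluation metrics named in step (5) can be sketched in plain Python. This is an illustration only: it works on in-memory lists rather than tf-record files, the 20% test fraction and the label convention (1 for malware, 0 for normal software) are assumptions, and `split_dataset` and `precision_recall` are hypothetical helper names.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and return (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def precision_recall(y_true, y_pred):
    """Precision and recall for the malware class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


train, test = split_dataset(range(10))
precision, recall = precision_recall([1, 1, 0, 0], [1, 0, 1, 0])
```

Precision measures how many files flagged as malware really are malware, and recall measures how many malware files are caught, so both need to be high before the model is deployed.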

Of course, the above are only preferred embodiments of the invention and do not limit its scope of use. Any equivalent change made within the principle of the invention shall therefore fall within the scope of protection of the invention.

Claims

1. A malware detection method, characterized in that: the binary code of software is converted into a picture, and a CNN-based deep learning model performs transfer learning on the picture, so that through training on data the deep learning model finds features automatically and thereby classifies normal software and malware.

2. The malware detection method according to claim 1, characterized in that: the deep learning model is Inception-Resnet-v2.

3. The malware detection method according to claim 1 or 2, characterized in that it specifically comprises the following steps:

(1) Data preparation: a large amount of normal software and malware is collected for training;

(2) Converting the binary file to a picture: given the binary file of a piece of malware, the binary stream is read as a vector of 8-bit non-negative integers; an 8-bit value ranges from 0 to 255, which matches the 0-255 pixel values of a grayscale image, so every 8 bits of the binary stream are mapped to one pixel, with the value used as the pixel value; these pixel values are arranged into a two-dimensional matrix according to the file size, yielding a picture;

(4) Building the model: Inception-Resnet-v2 is trained on the ImageNet dataset, which yields an image classification model, and training on the malware samples starts from this model;

4. The malware detection method according to claim 3, characterized in that: transfer learning is performed with the Inception-Resnet-v2 model; Inception-Resnet-v2 is a CNN-based deep neural network that uses a large number of 1x1, 1x3, 3x1, 1x7 and 7x1 convolution kernels to extract features from the picture; these kernels form several modules, namely Inception modules, which are then stacked to form the Inception model so that the model becomes deeper; and in step (4), so that shallow-layer information of the picture can be passed to deeper layers, a Resnet network, i.e. a skip connection, is added to each Inception module.