CN110096878A - A malware detection method - Google Patents

A malware detection method

Info

Publication number
CN110096878A
Authority
CN
China
Prior art keywords
picture
malware
model
training
inception
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910341543.2A
Other languages
Chinese (zh)
Inventor
袁明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Zhimei Interconnection Technology Co Ltd
Original Assignee
Wuhan Zhimei Interconnection Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Zhimei Interconnection Technology Co Ltd
Priority to CN201910341543.2A
Publication of CN110096878A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a malware detection method. A software binary is first converted into an image, which turns detection into an image classification problem. Transfer learning is then performed on the images using Inception-Resnet-v2, a CNN-based deep learning model. By training on a large amount of data, the model finds features automatically and uses them to classify software as benign or malicious, without extensive feature engineering. The Inception-Resnet-v2 model achieves very good image classification results on the ImageNet dataset. The present invention takes the model pre-trained on ImageNet and applies transfer learning to the software images, which achieves very good results, with an accuracy of 95% or higher.

Description

A malware detection method
Technical field
The present invention relates to the technical field of computer and server security protection, and in particular to a malware detection method.
Background art
Malware (malicious software), also known as "rogue software", generally refers to software that is spread through networks, portable storage devices, and similar channels, that deliberately causes unintended failures and information security problems such as leaks of private or confidential data, system damage, or data loss on personal computers, servers, smart devices, computer networks, and the like, and that tries in various ways to stop users from removing it. Malware takes forms including binary executables, scripts, and active content. By definition, computer viruses, computer worms, Trojan horses, ransomware, spyware, scareware, software that exploits vulnerabilities, and even some adware all fall into the malware category.
Malware poses a serious security threat to devices, and it usually disguises itself by various means, which makes it very hard to detect. Earlier detection approaches mostly relied on MD5 database matching, signature matching, and observing runtime behavior in a sandbox. However, slightly modifying the malware's code is enough to defeat these approaches, so new malware cannot be found in time, and such detection methods are now outdated. A more advanced current approach trains deep learning models on large numbers of benign and malicious samples in order to identify malware; the main patents are "WO2017084586A1: Method, system and device for inferring malicious code rules based on deep learning methods" and "CN102243699A: Deep learning detection method for unknown malicious code". However, such deep learning methods require a great deal of feature engineering: features are extracted from the software's binary bytes, such as byte length, hidden Markov probabilities of n-grams, and information entropy. Some methods also apply word embedding to the binary characters to train malware features, and they work only for one software format, such as Android software. All of this requires a very large workload and computing cost. Detection accuracy also depends entirely on how the features are processed, and generalization is weak, so large numbers of other kinds of malware cannot be detected.
Existing malware detection techniques mainly consist of static detection based on the malware binary and dynamic detection based on running the malware in a sandbox. These techniques perform poorly on malware that has been obfuscated, packed, or mutated. Whether the code is reverse-engineered to decide if it is harmful or the malware's behavior is observed in a sandbox, these detection approaches are costly, demand highly skilled security analysts, and give very unstable accuracy.
Summary of the invention
In view of the state of the prior art, the object of the present invention is to provide a malware detection method that reduces manual feature extraction, supports multiple file formats, improves interpretability, and correspondingly improves detection accuracy.
To achieve the above object, the present invention adopts the following technical solution:
The present invention is a malware detection method in which the binary code of software is converted into an image, and a CNN-based deep learning model performs transfer learning on the images. Through training on data, the deep learning model finds features automatically and thereby classifies software as benign or malicious.
Further, the deep learning model is Inception-Resnet-v2.
The malware detection method of the present invention specifically comprises the following steps:
(1) Data preparation: collect a large number of benign and malicious software samples for training. The collected samples cover most common file formats, mainly exe, apk, jar, doc, docx, xls, xlsx, ppt, pptx, pdf, csv, txt, png, log, tsv, html, js, css, and xml. Malicious code may be hidden in any of these formats, so training on files in these formats allows the widest possible range of malware to be detected.
(2) Convert the binary file to an image: given the binary file of a piece of malware, the binary stream that is read is a vector of 8-bit non-negative integers. An 8-bit value ranges from 0 to 255, which corresponds to the 0-255 pixel values of a grayscale image, so every 8 bits of the binary stream are mapped to one pixel whose value is the byte value. Depending on the file size, these pixel values are reshaped into a two-dimensional matrix, which yields an image;
(3) Image preprocessing: after the software to be trained on has been converted into images, adjust the images so that they all have the same height and width;
(4) Build the model: use the Inception-Resnet-v2 model for transfer learning. Inception-Resnet-v2 is a CNN-based deep neural network that uses a large number of 1x1, 1x3, 3x1, 1x7, and 7x1 convolution kernels to extract features from an image. These kernels are combined into modules, namely Inception modules, and the modules are stacked to form the Inception model; the added depth lets the model extract more high-order features of the image. So that shallow-layer information about the image can be passed to deeper layers, a Resnet connection, that is, a skip connection, is added to each Inception module. Specifically, Inception-Resnet-v2 is first trained on the ImageNet dataset. ImageNet is a large-scale image dataset from which Inception-Resnet-v2 can learn many general features, such as horizontal contours and vertical contours. This training yields an image classification model, and training on the malware samples starts from this model;
(5) Train the model: build the model with the TensorFlow deep learning framework. The images are converted to TensorFlow's tf-record data format, and the tf-record data are then divided into a training set and a test set. The training set is used to train the model and the test set is used to evaluate its generalization ability; the model parameters are then adjusted according to the evaluation results, so that the trained model can be deployed on its own to classify and detect malware.
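As an illustration only, steps (2) and (3) could be sketched in Python as follows. The fixed row width of 256, the zero padding of the last row, the nearest-neighbour resizing, and all function names are assumptions made for this sketch; the text derives the image width from the file size and does not specify how images are resized.

```python
import math


def bytes_to_rows(data, width):
    """Map each byte (0-255) to one grayscale pixel and wrap rows at `width`."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("input is empty")
    height = math.ceil(len(data) / width)
    # Assumption: pad the final row with zeros so the matrix is rectangular.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(rows, out_h, out_w):
    """Resize a 2-D pixel matrix to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(rows), len(rows[0])
    return [
        [rows[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def file_to_image(path, width=256, size=299):
    """Read a file as raw bytes and return a size x size grayscale matrix."""
    with open(path, "rb") as f:
        data = f.read()
    return resize_nearest(bytes_to_rows(data, width), size, size)
```

Calling `file_to_image("sample.bin")` (a hypothetical path) returns a list of rows of integer pixel values; a real pipeline would likely use an imaging library for resizing and storage.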
The beneficial effects of the present invention are as follows. The present invention proposes visualizing malware as images and then detecting it with deep learning. First, analyzing images removes the need to decompile the software or run it in a sandbox. Second, the image rendered from malware is fairly interpretable: malicious code may be hidden in any part of the software, and once the software is converted into an image the code can be found relatively easily wherever it is. Even if the malicious code has been changed slightly to form a new variant, it remains detectable in the image. Deep learning has achieved very good results in image detection, so malware converted into images can also be detected with high accuracy.
After model training is complete and the model parameters have been saved, a new software file is handled by converting it into an image and then classifying the image with the deep CNN model. The process is simple: the software no longer needs to be run, the binary no longer needs semantic analysis, and no manual features need to be extracted from the file. The method also supports a wide range of file formats, so it is not limited to one class of malware but generalizes strongly. It outperforms other approaches in both detection efficiency and detection accuracy.
Brief description of the drawings:
Fig. 1 is a flow chart of the present invention;
Fig. 2 is a flow chart of file conversion in the present invention;
Fig. 3 is the network structure of an Inception module in the present invention;
Fig. 4 is a flow chart of model training in the present invention.
Detailed description of the embodiments:
In the malware detection method of the present invention, the binary code of software is converted into an image, and a deep learning model based on Inception-Resnet-v2, a multi-layer CNN, is then built. After supervised learning on a large number of labeled benign and malicious samples, a model capable of detecting malware is obtained. The process is shown in Fig. 1.
This embodiment specifically comprises the following steps:
(1) Data preparation: collect a large number of benign and malicious software samples for training. The collected samples cover most common file formats, mainly exe, apk, jar, doc, docx, xls, xlsx, ppt, pptx, pdf, csv, txt, png, log, tsv, html, js, css, and xml. Malicious code may be hidden in any of these formats, so training on files in these formats allows the widest possible range of malware to be detected.
(2) Convert the binary file to an image: as shown in Fig. 2, given the binary file of a piece of malware, the binary stream that is read is a vector of 8-bit non-negative integers. An 8-bit value ranges from 0 to 255, which corresponds to the 0-255 pixel values of a grayscale image, so every 8 bits of the binary stream are mapped to one pixel whose value is the byte value. Depending on the file size, these pixel values are reshaped into a two-dimensional matrix. This yields an image and also determines its width and height. The correspondence between file size and image width is given in Table 1.
(3) Image preprocessing: after the software to be trained on has been converted into images, adjust the images so that they all have the same height and width; this embodiment scales each image to 299*299;
(4) Build the model: use the Inception-Resnet-v2 model for transfer learning. Inception-Resnet-v2 is a CNN-based deep neural network that uses a large number of 1x1, 1x3, 3x1, 1x7, and 7x1 convolution kernels to extract features from an image. These kernels are combined into modules, namely Inception modules, and the modules are stacked to form the Inception model; the added depth lets the model extract more high-order features of the image. So that shallow-layer information about the image can be passed to deeper layers, a Resnet connection, that is, a skip connection, is added to each Inception module. The network structure of an Inception module is shown in Fig. 3.
Because Inception-Resnet-v2 is a very large model, it needs a large amount of data to learn effective features, but the number of malware samples that can be collected is very limited. Retraining the entire model would not give good results, so this embodiment trains on the malware samples by transfer learning. Specifically, Inception-Resnet-v2 is first trained on the ImageNet dataset. ImageNet is a large-scale image dataset from which Inception-Resnet-v2 can learn many general features, such as horizontal contours and vertical contours. This training yields an image classification model, and training on the malware samples starts from this model. At this point only the last two layers of the model need to be retrained, which greatly reduces the difficulty and time of training;
(5) Train the model: build the model with the TensorFlow deep learning framework. The images are converted to TensorFlow's tf-record data format, and the tf-record data are then divided into a training set and a test set. The training set is used to train the model and the test set is used to evaluate its generalization ability. The model parameters are then adjusted according to the evaluation results until a model with both high precision and high recall is obtained, so that the trained model can be deployed on its own to classify and detect malware.
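The data split and evaluation described in step (5) could be sketched as follows, in plain Python rather than with TensorFlow's tf-record tooling. The 20% test fraction, the fixed random seed, the label convention (1 for malware, 0 for benign software), and the function names are assumptions made for this sketch; the text does not state them.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle labelled samples and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one sample in the test set.
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the positive (malware) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision penalizes benign files flagged as malware, while recall penalizes malware that is missed, which is why the embodiment asks for both to be high on the held-out test set.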
Of course, the above is only a preferred embodiment of the present invention and does not limit its scope of use; therefore, any equivalent change made within the principle of the present invention shall fall within the scope of protection of the present invention.

Claims (4)

1. A malware detection method, characterized in that: the binary code of software is converted into an image, a CNN-based deep learning model performs transfer learning on the images, and through training on data the deep learning model finds features automatically, thereby classifying software as benign or malicious.
2. The malware detection method according to claim 1, characterized in that: the deep learning model is Inception-Resnet-v2.
3. The malware detection method according to claim 1 or 2, characterized in that it specifically comprises the following steps:
(1) Data preparation: collect a large number of benign and malicious software samples for training;
(2) Convert the binary file to an image: given the binary file of a piece of malware, the binary stream that is read is a vector of 8-bit non-negative integers; an 8-bit value ranges from 0 to 255, which corresponds to the 0-255 pixel values of a grayscale image, so every 8 bits of the binary stream are mapped to one pixel whose value is the byte value; depending on the file size, these pixel values are reshaped into a two-dimensional matrix, which yields an image;
(3) Image preprocessing: after the software to be trained on has been converted into images, adjust the images so that they all have the same height and width;
(4) Build the model: train Inception-Resnet-v2 on the ImageNet dataset to obtain an image classification model, and start training on the malware samples from this model;
(5) Train the model: build the model with the TensorFlow deep learning framework; convert the images to TensorFlow's tf-record data format and then divide the tf-record data into a training set and a test set; the training set is used to train the model and the test set is used to evaluate its generalization ability; the model parameters are then adjusted according to the evaluation results, so that the trained model can be deployed on its own to classify and detect malware.
4. The malware detection method according to claim 3, characterized in that: the Inception-Resnet-v2 model is used for transfer learning; Inception-Resnet-v2 is a CNN-based deep neural network that uses a large number of 1x1, 1x3, 3x1, 1x7, and 7x1 convolution kernels to extract features from an image; these kernels are combined into modules, namely Inception modules, and the modules are stacked to form the Inception model so that the model becomes deeper; and in step (4), so that shallow-layer information about the image can be passed to deeper layers, a Resnet connection, that is, a skip connection, is added to each Inception module.
CN201910341543.2A 2019-04-26 2019-04-26 A malware detection method Withdrawn CN110096878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910341543.2A CN110096878A (en) 2019-04-26 2019-04-26 A malware detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910341543.2A CN110096878A (en) 2019-04-26 2019-04-26 A malware detection method

Publications (1)

Publication Number Publication Date
CN110096878A (en) 2019-08-06

Family

ID=67445964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910341543.2A Withdrawn CN110096878A (en) 2019-04-26 2019-04-26 A malware detection method

Country Status (1)

Country Link
CN (1) CN110096878A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103235A (en) * 2017-02-27 2017-08-29 广东工业大学 A kind of Android malware detection method based on convolutional neural networks
CN108985060A (en) * 2018-07-04 2018-12-11 中共中央办公厅电子科技学院 A kind of extensive Android Malware automated detection system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JAEMIN JUNG, JONGMOO CHOI et al.: "Android Malware Detection using Convolutional Neural", INTERNATIONAL CONFERENCE ON RESEARCH IN ADAPTIVE AND CONVERGENT SYSTEMS *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717412A (en) * 2019-09-23 2020-01-21 广东工业大学 Method and system for detecting malicious PDF document
CN110704842A (en) * 2019-09-27 2020-01-17 山东理工大学 Malicious code family classification detection method
CN110879888A (en) * 2019-11-15 2020-03-13 新华三大数据技术有限公司 Virus file detection method, device and equipment
CN111259397A (en) * 2020-02-12 2020-06-09 四川大学 Malware classification method based on Markov graph and deep learning
CN111259397B (en) * 2020-02-12 2022-04-19 四川大学 Malware classification method based on Markov graph and deep learning
CN111581640A (en) * 2020-04-02 2020-08-25 北京兰云科技有限公司 Malicious software detection method, device and equipment and storage medium
CN111552964A (en) * 2020-04-07 2020-08-18 哈尔滨工程大学 Malicious software classification method based on static analysis
CN111651762A (en) * 2020-04-21 2020-09-11 浙江大学 Convolutional neural network-based PE (provider edge) malicious software detection method
CN114510717A (en) * 2022-01-25 2022-05-17 上海斗象信息科技有限公司 ELF file detection method and device and storage medium
CN114756860A (en) * 2022-02-22 2022-07-15 广州大学 Malicious software detection method based on meta-path

Similar Documents

Publication Publication Date Title
CN110096878A (en) A malware detection method
Aboaoja et al. Malware detection issues, challenges, and future directions: A survey
Kumar et al. Malicious code detection based on image processing using deep learning
EP4058916A1 (en) Detecting unknown malicious content in computer systems
Hou et al. Droiddelver: An android malware detection system using deep belief network based on api call blocks
US11481492B2 (en) Method and system for static behavior-predictive malware detection
Gao et al. Malware classification for the cloud via semi-supervised transfer learning
Sabhadiya et al. Android malware detection using deep learning
CN110765458A (en) Malicious software detection method and device based on deep learning
KR102007809B1 (en) A exploit kit detection system based on the neural net using image
Zhao et al. Maldeep: A deep learning classification framework against malware variants based on texture visualization
CN107944274A (en) A kind of Android platform malicious application off-line checking method based on width study
CN109614795B (en) Event-aware android malicious software detection method
CN109858248A (en) Malice Word document detection method and device
CN104715194B (en) Malware detection method and apparatus
Allix et al. Large-scale machine learning-based malware detection: confronting the" 10-fold cross validation" scheme with reality
CN113901465A (en) Heterogeneous network-based Android malicious software detection method
Zhang et al. MalCaps: a capsule network based model for the malware classification
CN106650434B (en) A kind of virtual machine anomaly detection method and system based on I/O sequence
Wu A systematical study for deep learning based android malware detection
Kornish et al. Malware classification using deep convolutional neural networks
Yoo et al. The image game: exploit kit detection based on recursive convolutional neural networks
Chen et al. Android malware classification using XGBoost based on images patterns
Suryotrisongko et al. Topic modeling for cyber threat intelligence (cti)
Ye et al. Android malware detection technology based on lightweight convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 2019-08-06)