CN110096878A - A malware detection method - Google Patents
A malware detection method
- Publication number
- CN110096878A (application number CN201910341543.2A)
- Authority
- CN
- China
- Prior art keywords
- picture
- malware
- model
- training
- inception
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a malware detection method. The binary of a piece of software is first converted into an image, which turns the detection problem into an image classification problem. Transfer learning is then performed on the images with Inception-Resnet-v2, a CNN-based deep learning model. Through training on large amounts of data, the model automatically finds the features it uses to distinguish normal software from malware, so extensive feature engineering is not needed. The Inception-Resnet-v2 model achieves very good image classification results on the ImageNet dataset. The present invention takes the model pretrained on ImageNet and applies transfer learning to the software images, achieving very good results with an accuracy above 95%.
Description
Technical field
The present invention relates to the technical field of computer and server security protection, and in particular to a malware detection method.
Background technique
Malware (malicious software), also known as "rogue software", generally refers to software that is spread through networks, portable storage devices, and similar channels and that deliberately causes leaks of private or confidential data, system damage, data loss, and other unintended failures and information security problems on personal computers, servers, smart devices, computer networks, and so on, while trying to stop users from removing it by various means. Malware can take the form of executable binaries, scripts, active content, and so on. By definition, computer viruses, computer worms, Trojan horses, ransomware, spyware, scareware, software that exploits vulnerabilities, and even some adware all fall within the category of malware.
Malware poses a serious security threat to devices. It usually disguises itself by various means, which makes it very difficult to detect. Past detection approaches mostly relied on MD5 signature-database matching, signature matching, and observing runtime behavior in a sandbox. However, slightly modifying the code of a malware sample is enough to defeat these approaches, so new malware cannot be found in time and these detection methods have become outdated. A relatively advanced approach at present is to train on large numbers of normal and malicious programs with deep learning in order to identify malware; representative patents include "WO2017084586A1: Method, system and device for inferring malicious code rules based on a deep learning method" and "CN102243699A: Deep-learning detection method for unknown malicious code". However, such deep learning methods require extensive feature engineering, extracting features from the binary bytes of the software such as byte length, hidden Markov probabilities of n-grams, and information entropy. Some methods also apply word embedding to binary strings to learn malware features, and they only work for one software format, such as Android software. This requires a very large workload and computational cost; detection accuracy depends entirely on feature processing, generalization is weak, and many other kinds of malware cannot be detected.
Existing malware detection techniques mainly consist of static detection based on the malware binary and dynamic detection based on running the malware in a sandbox. These techniques perform poorly on malware that has been obfuscated, packed, or mutated. Whether the code is reverse-engineered or the malware's behavior in a sandbox is observed to judge whether it is harmful, these approaches are costly, place high demands on the skills of security analysts, and yield highly unstable accuracy.
Summary of the invention
The object of the invention is to address the shortcomings of the prior art by providing a malware detection method that reduces manual feature-extraction work, supports multiple file formats, improves interpretability, and correspondingly improves detection accuracy.
To achieve the above object, the present invention adopts the following technical solution:
In the malware detection method of the present invention, the binary code of software is converted into an image, and a CNN-based deep learning model performs transfer learning on the image; through training on data, the deep learning model automatically finds features and thereby classifies software as normal or malicious.
Further, the deep learning model is Inception-Resnet-v2.
The malware detection method of the invention specifically comprises the following steps:
(1) Data preparation: a large number of normal programs and malware samples are collected for training. The collected files cover most common formats, mainly exe, apk, jar, doc, docx, xls, xlsx, ppt, pptx, pdf, csv, txt, png, log, tsv, html, js, css, and xml. Malicious code may be hidden in any of these formats, so training on files of all of them allows the widest possible range of malware to be detected.
(2) Converting binary files to images: given the binary file of a malware sample, its binary stream is read as a vector of 8-bit unsigned integers. An 8-bit value ranges from 0 to 255, matching the 0-255 pixel values of a grayscale image, so each 8-bit byte of the binary stream is mapped to one pixel whose value is the byte value. Depending on the file size, these pixel values are arranged into a two-dimensional matrix, which yields an image;
(3) Image preprocessing: after the software to be trained on has been converted to images, the images are resized to a uniform width and height;
(4) Building the model: transfer learning is performed with the Inception-Resnet-v2 model. Inception-Resnet-v2 is a CNN-based deep neural network that extracts image features with many 1x1, 1x3, 3x1, 1x7, and 7x1 convolution kernels. These kernels are grouped into modules, namely Inception modules, and the modules are stacked to form the Inception model; the added depth lets the model extract higher-order image features more effectively. So that shallow image information can be passed to deeper layers, a ResNet-style residual connection, i.e., a skip connection, is added to each Inception module. Specifically, Inception-Resnet-v2 is trained on the ImageNet dataset. ImageNet is a large-scale image dataset that allows Inception-Resnet-v2 to learn many general features, such as horizontal and vertical contours. Training yields an image classification model, and training on the malware samples then starts from this model;
(5) Training the model: the model is built with the TensorFlow deep learning framework. The images are converted to TensorFlow's TFRecord data format, and the TFRecord data are then split into a training set and a test set. The training set is used to train the model and the test set to evaluate its generalization ability; the model parameters are then adjusted according to the evaluation results, so that the trained model can be deployed on its own to classify and detect malware.
The beneficial effects of the invention are as follows. The invention proposes visualizing malware as images and then detecting it with deep learning. First, because the analysis is performed on images, the software no longer needs to be decompiled or run in a sandbox. Second, the images produced from malware are fairly interpretable: malicious code may be hidden in any part of the software, but once the file is converted to an image the malicious code can be found relatively easily wherever it sits, and even if the malicious code is changed slightly to produce a new variant, it remains detectable in the image. Deep learning has achieved very good results in image recognition, so malware converted to images can also be detected with high accuracy.
After model training is complete and the model parameters are saved, each new input software file is converted to an image, which is then classified with the deep CNN model. The process is simple: the software no longer needs to be run, the binary no longer needs semantic analysis, and no manual features need to be extracted from the file. The method also supports a wide range of file formats; rather than being effective only against one class of malware, it generalizes strongly. It outperforms other approaches in both detection efficiency and detection accuracy.
Description of the drawings:
Fig. 1 is a flowchart of the invention;
Fig. 2 is a flowchart of file conversion in the invention;
Fig. 3 shows the network structure of the Inception module in the invention;
Fig. 4 is a model training flowchart of the invention.
Specific embodiment:
In the malware detection method of the invention, the binary code of software is converted into an image, and a deep learning model based on Inception-Resnet-v2 is then built. Inception-Resnet-v2 is a multi-layer CNN; after supervised learning on a large number of labeled normal programs and malware samples, a model capable of detecting malware is obtained. The process is shown in Fig. 1.
This embodiment specifically comprises the following steps:
(1) Data preparation: a large number of normal programs and malware samples are collected for training. The collected files cover most common formats, mainly exe, apk, jar, doc, docx, xls, xlsx, ppt, pptx, pdf, csv, txt, png, log, tsv, html, js, css, and xml. Malicious code may be hidden in any of these formats, so training on files of all of them allows the widest possible range of malware to be detected.
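Step (1) amounts to building a labelled file list. The sketch below assumes a hypothetical directory layout with `normal/` and `malware/` subfolders (the source does not say how samples are stored or labelled) and keeps only the formats listed above; `collect_samples` and `is_supported` are illustrative names, not part of the described method.

```python
from pathlib import Path

# File formats listed in step (1) of the description.
SUPPORTED_EXTENSIONS = {
    "exe", "apk", "jar", "doc", "docx", "xls", "xlsx", "ppt", "pptx",
    "pdf", "csv", "txt", "png", "log", "tsv", "html", "js", "css", "xml",
}


def is_supported(filename):
    """True if the file extension (case-insensitive) is one of the listed formats."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


def collect_samples(root):
    """Collect (path, label) pairs from a hypothetical layout:

    root/normal/...  -> label 0 (normal software)
    root/malware/... -> label 1 (malware)
    Files in any other format are skipped.
    """
    samples = []
    for label, subdir in enumerate(("normal", "malware")):
        folder = Path(root) / subdir
        if not folder.is_dir():
            continue
        for path in sorted(folder.rglob("*")):
            if path.is_file() and is_supported(path.name):
                samples.append((path, label))
    return samples
```

Labelling by folder is only one convenient choice; any source of ground-truth labels would serve the same purpose.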
(2) Converting binary files to images: as shown in Fig. 2, given the binary file of a malware sample, its binary stream is read as a vector of 8-bit unsigned integers. An 8-bit value ranges from 0 to 255, matching the 0-255 pixel values of a grayscale image, so each 8-bit byte of the binary stream is mapped to one pixel whose value is the byte value. Depending on the file size, these pixel values are arranged into a two-dimensional matrix, which yields an image together with its width and height. The correspondence between file size and image width is given in Table 1.
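The byte-to-pixel mapping of step (2) can be sketched in a few lines of Python. The source sets the image width from the file size using Table 1, whose values are not reproduced here, so `choose_width` below is a hypothetical stand-in that simply aims for a roughly square image; zero-padding the last row is likewise an assumption.

```python
import math


def choose_width(num_bytes):
    """Hypothetical stand-in for Table 1: ceil(sqrt(n)) gives a roughly square image."""
    return math.isqrt(num_bytes - 1) + 1 if num_bytes > 0 else 1


def bytes_to_grayscale(data, width=None):
    """Map each byte (0-255) to one grayscale pixel and cut the stream into rows.

    The last row is zero-padded (an assumption; the source does not say how
    a partial final row is handled).
    """
    pixels = list(data)  # iterating over bytes yields ints in 0..255
    if width is None:
        width = choose_width(len(pixels))
    height = -(-len(pixels) // width)  # ceiling division
    pixels += [0] * (width * height - len(pixels))
    return [pixels[r * width:(r + 1) * width] for r in range(height)]
```

In use, the file would be read in binary mode (`open(path, "rb").read()`) and the resulting rows passed on to the preprocessing of step (3).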
(3) Image preprocessing: after the software to be trained on has been converted to images, the images are resized to a uniform width and height; in this embodiment, each image is scaled to 299x299 pixels;
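Step (3) only requires every image to end up at 299x299. The source does not name an interpolation method, so the sketch below assumes plain nearest-neighbour sampling on the list-of-rows images produced in step (2); `resize_nearest` is an illustrative name.

```python
def resize_nearest(image, out_h=299, out_w=299):
    """Nearest-neighbour resize of a 2-D list of pixel values.

    The interpolation method is an assumption; the source only states that
    images are scaled to 299x299.
    """
    in_h = len(image)
    in_w = len(image[0]) if in_h else 0
    if in_h == 0 or in_w == 0:
        raise ValueError("image must be non-empty")
    # Output pixel (r, c) copies the input pixel at the proportionally scaled position.
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```

Models pretrained on ImageNet expect three-channel input; replicating the grayscale channel three times is one common option, but the source does not state how this is handled.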
(4) Building the model: transfer learning is performed with the Inception-Resnet-v2 model. Inception-Resnet-v2 is a CNN-based deep neural network that extracts image features with many 1x1, 1x3, 3x1, 1x7, and 7x1 convolution kernels. These kernels are grouped into modules, namely Inception modules, and the modules are stacked to form the Inception model; the added depth lets the model extract higher-order image features more effectively. So that shallow image information can be passed to deeper layers, a ResNet-style residual connection, i.e., a skip connection, is added to each Inception module. The network structure of the Inception module is shown in Fig. 3.
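To make the module description of step (4) concrete, the toy sketch below works on a single channel in pure Python. It assumes zero "same" padding and shows two ideas from the text: a 1x7 convolution followed by a 7x1 convolution covers a 7x7 neighbourhood, and the skip connection adds the module input back to the branch output. A real Inception-ResNet module has several parallel multi-channel branches and 1x1 convolutions, which are omitted; `scale` is an illustrative optional factor on the residual branch.

```python
def conv2d_same(image, kernel):
    """2-D cross-correlation (as used in CNNs) with zero padding; output has the input's size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # centre offsets for odd-sized kernels
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def relu(matrix):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in matrix]


def residual_block(x, k1x7, k7x1, scale=1.0):
    """Toy one-channel residual unit: F(x) = conv7x1(relu(conv1x7(x))); returns x + scale * F(x)."""
    branch = conv2d_same(relu(conv2d_same(x, k1x7)), k7x1)
    # Skip connection: the input is added back unchanged, so shallow information reaches deeper layers.
    return [[xv + scale * bv for xv, bv in zip(x_row, b_row)] for x_row, b_row in zip(x, branch)]
```

The factorised pair uses 7 + 7 = 14 weights per input/output channel pair instead of 49 for a full 7x7 kernel, which is why asymmetric 1xn and nx1 kernels are attractive in deep networks.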
Because Inception-Resnet-v2 is a very large model, it needs a large amount of data to learn effective features, but the number of malware samples that can be collected is very limited, and retraining the whole model would not give good results. This embodiment therefore trains on the malware samples by transfer learning. Specifically, Inception-Resnet-v2 is first trained on the ImageNet dataset. ImageNet is a large-scale image dataset that allows Inception-Resnet-v2 to learn many general features, such as horizontal and vertical contours. Training yields an image classification model; starting from this model, the malware samples are trained, and at this stage only the last two layers of the model need to be retrained, which greatly reduces both the difficulty and the time of model training;
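Retraining only the last two layers is mainly bookkeeping: every earlier layer is marked non-trainable so that the optimizer leaves its ImageNet weights alone. The pure-Python sketch below models this with a hypothetical `Layer` record; the layer names and parameter counts in the usage lines are invented for illustration and are not taken from Inception-Resnet-v2.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical layer record: a name, a parameter count, and a trainable flag."""
    name: str
    num_params: int
    trainable: bool = True


def freeze_all_but_last(layers, n=2):
    """Freeze every layer except the last n; return the names of layers left trainable."""
    cutoff = len(layers) - n
    for i, layer in enumerate(layers):
        layer.trainable = i >= cutoff
    return [layer.name for layer in layers if layer.trainable]


def trainable_params(layers):
    """Total number of parameters the optimizer would still update."""
    return sum(layer.num_params for layer in layers if layer.trainable)


# Illustrative usage with made-up sizes.
layers = [Layer("stem", 1_000), Layer("inception_resnet_blocks", 50_000),
          Layer("dense", 2_000), Layer("logits", 10)]
print(freeze_all_but_last(layers))  # ['dense', 'logits']
print(trainable_params(layers))     # 2010
```

In Keras the same effect is obtained by setting the `trainable` attribute of each layer to `False` (for example on a model built from `tf.keras.applications.InceptionResNetV2(weights="imagenet")`) for all layers except those to be retrained, before compiling the model.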
(5) Training the model: the model is built with the TensorFlow deep learning framework. The images are converted to TensorFlow's TFRecord data format, and the TFRecord data are then split into a training set and a test set. The training set is used to train the model and the test set to evaluate its generalization ability; the model parameters are adjusted according to the evaluation results until a model with high precision and recall is obtained, which can then be deployed on its own to classify and detect malware.
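Step (5) evaluates the model on a held-out test set and aims for high precision and recall. The sketch below assumes an 80/20 split with a fixed shuffle seed (the source gives no ratio) and computes both metrics for the malware class (label 1); the TFRecord conversion, which the source performs with TensorFlow, is left out.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples with a fixed seed; return (train, test) lists."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (malware) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged files that are malware
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of malware that was flagged
    return precision, recall
```

Precision reflects how many alarms are false positives on normal software, while recall reflects how much malware slips through; the text asks for both to be high.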
Of course, the above are only preferred embodiments of the invention and do not limit its scope of use; therefore, all equivalent changes made according to the principles of the invention shall fall within the scope of protection of the invention.
Claims (4)
1. A malware detection method, characterized in that: the binary code of software is converted into an image, and a CNN-based deep learning model performs transfer learning on the image, so that through training on data the deep learning model automatically finds features and thereby classifies software as normal or malicious.
2. The malware detection method according to claim 1, characterized in that the deep learning model is Inception-Resnet-v2.
3. The malware detection method according to claim 1 or 2, characterized in that it specifically comprises the following steps:
(1) Data preparation: collecting a large number of normal programs and malware samples for training;
(2) Converting binary files to images: given the binary file of a malware sample, reading its binary stream as a vector of 8-bit unsigned integers, whose values range from 0 to 255 and correspond to the 0-255 pixel values of a grayscale image; mapping each 8-bit byte of the binary stream to one pixel whose value is the byte value; and arranging these pixel values into a two-dimensional matrix according to the file size to obtain an image;
(3) Image preprocessing: after the software to be trained on has been converted to images, resizing the images to a uniform width and height;
(4) Building the model: training Inception-Resnet-v2 on the ImageNet dataset to obtain an image classification model, and training on the malware samples starting from this model;
(5) Training the model: building the model with the TensorFlow deep learning framework; converting the images to TensorFlow's TFRecord data format; splitting the TFRecord data into a training set and a test set, the training set being used to train the model and the test set to evaluate its generalization ability; and adjusting the model parameters according to the evaluation results, so that the trained model can be deployed on its own to classify and detect malware.
4. The malware detection method according to claim 3, characterized in that: transfer learning is performed with the Inception-Resnet-v2 model, Inception-Resnet-v2 being a CNN-based deep neural network that extracts image features with many 1x1, 1x3, 3x1, 1x7, and 7x1 convolution kernels; these kernels form modules, namely Inception modules, which are stacked to constitute the Inception model, deepening the model; and in step (4), so that shallow image information can be passed to deeper layers, a ResNet residual connection, i.e., a skip connection, is added to each Inception module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910341543.2A CN110096878A (en) | 2019-04-26 | 2019-04-26 | A malware detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910341543.2A CN110096878A (en) | 2019-04-26 | 2019-04-26 | A malware detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110096878A | 2019-08-06 |
Family ID: 67445964
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910341543.2A (CN110096878A, withdrawn) | A malware detection method | 2019-04-26 | 2019-04-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110096878A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107103235A (en) * | 2017-02-27 | 2017-08-29 | Guangdong University of Technology | Android malware detection method based on convolutional neural networks |
CN108985060A (en) * | 2018-07-04 | 2018-12-11 | Beijing Electronic Science and Technology Institute | Large-scale automated Android malware detection system and method |
- 2019-04-26: CN201910341543.2A (CN), published as CN110096878A; status: not active (withdrawn)
Non-Patent Citations (1)
Title |
---|
JAEMIN JUNG, JONGMOO CHOI et al.: "Android Malware Detection using Convolutional Neural", INTERNATIONAL CONFERENCE ON RESEARCH IN ADAPTIVE AND CONVERGENT SYSTEMS * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717412A (en) * | 2019-09-23 | 2020-01-21 | Guangdong University of Technology | Method and system for detecting malicious PDF document |
CN110704842A (en) * | 2019-09-27 | 2020-01-17 | Shandong University of Technology | Malicious code family classification detection method |
CN110879888A (en) * | 2019-11-15 | 2020-03-13 | New H3C Big Data Technologies Co., Ltd. | Virus file detection method, device and equipment |
CN111259397A (en) * | 2020-02-12 | 2020-06-09 | Sichuan University | Malware classification method based on Markov graph and deep learning |
CN111259397B (en) * | 2020-02-12 | 2022-04-19 | Sichuan University | Malware classification method based on Markov graph and deep learning |
CN111581640A (en) * | 2020-04-02 | 2020-08-25 | Beijing Lanyun Technology Co., Ltd. | Malicious software detection method, device and equipment and storage medium |
CN111552964A (en) * | 2020-04-07 | 2020-08-18 | Harbin Engineering University | Malicious software classification method based on static analysis |
CN111651762A (en) * | 2020-04-21 | 2020-09-11 | Zhejiang University | Convolutional neural network-based PE (Portable Executable) malware detection method |
CN114510717A (en) * | 2022-01-25 | 2022-05-17 | Shanghai Douxiang Information Technology Co., Ltd. | ELF file detection method and device and storage medium |
CN114756860A (en) * | 2022-02-22 | 2022-07-15 | Guangzhou University | Malicious software detection method based on meta-path |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110096878A (en) | A malware detection method | |
Aboaoja et al. | Malware detection issues, challenges, and future directions: A survey | |
Kumar et al. | Malicious code detection based on image processing using deep learning | |
EP4058916A1 (en) | Detecting unknown malicious content in computer systems | |
Hou et al. | Droiddelver: An android malware detection system using deep belief network based on api call blocks | |
US11481492B2 (en) | Method and system for static behavior-predictive malware detection | |
Gao et al. | Malware classification for the cloud via semi-supervised transfer learning | |
Sabhadiya et al. | Android malware detection using deep learning | |
CN110765458A (en) | Malicious software detection method and device based on deep learning | |
KR102007809B1 (en) | A exploit kit detection system based on the neural net using image | |
Zhao et al. | Maldeep: A deep learning classification framework against malware variants based on texture visualization | |
CN107944274A (en) | Offline detection method for malicious Android applications based on broad learning |
CN109614795B (en) | Event-aware Android malware detection method |
CN109858248A (en) | Malicious Word document detection method and device |
CN104715194B (en) | Malware detection method and apparatus |
Allix et al. | Large-scale machine learning-based malware detection: confronting the "10-fold cross validation" scheme with reality |
CN113901465A (en) | Heterogeneous network-based Android malicious software detection method |
Zhang et al. | MalCaps: a capsule network based model for the malware classification |
CN106650434B (en) | Virtual machine anomaly detection method and system based on I/O sequences |
Wu | A systematical study for deep learning based android malware detection | |
Kornish et al. | Malware classification using deep convolutional neural networks | |
Yoo et al. | The image game: exploit kit detection based on recursive convolutional neural networks | |
Chen et al. | Android malware classification using XGBoost based on images patterns | |
Suryotrisongko et al. | Topic modeling for cyber threat intelligence (cti) | |
Ye et al. | Android malware detection technology based on lightweight convolutional neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20190806 |