CN111814147A - Android malicious software detection method based on model library - Google Patents


Info

Publication number
CN111814147A
CN111814147A CN202010495937.6A CN202010495937A CN111814147A CN 111814147 A CN111814147 A CN 111814147A CN 202010495937 A CN202010495937 A CN 202010495937A CN 111814147 A CN111814147 A CN 111814147A
Authority
CN
China
Prior art keywords
models
model
base
data set
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010495937.6A
Other languages
Chinese (zh)
Inventor
李涛
余东豪
余鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Science and Engineering WUSE
Wuhan University of Science and Technology WHUST
Original Assignee
Wuhan University of Science and Engineering WUSE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-06-03
Filing date
2020-06-03
Publication date
2020-10-23
Application filed by Wuhan University of Science and Engineering WUSE
Priority to CN202010495937.6A
Publication of CN111814147A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an Android malware detection method based on a model library, comprising the following steps: 1) building a data set from Android applications and labeling it; 2) dividing the training set into data set 1 and data set 2 in an 8:2 ratio, and training each algorithm in the algorithm set on data set 1 to generate base models (BaseModel); 3) randomly combining the base models to obtain models (Models); 4) evaluating the models on data set 2 to obtain the detection accuracy of each model; 5) adjusting the weights of the base models; 6) repeating steps 4) and 5) until the best detection result no longer changes; 7) ranking the models to obtain the k models with the best recognition performance; 8) computing the accuracy, recall and F1 score of the k models on the test set, and using the best-performing model for Android malware detection. The method is suited to detecting malware among Android applications that belong to multiple populations.

Description

Android malicious software detection method based on model library
Technical Field
The invention relates to an information security technology, in particular to an android malicious software detection method based on a model library.
Background
Many applications carry multiple class labels, that is, they belong to multiple populations. Because of differences in their features, a single APP may belong to several populations, and application scenarios cross and overlap, so the population of an APP cannot be marked out accurately. A recognizer trained on one population therefore has difficulty detecting applications that belong to multiple populations, and maliciousness detection of Android applications cannot be carried out directly from the population perspective. A method based on building a model library is therefore proposed: because malicious detection of multi-population APPs is difficult, multiple recognizers must be combined for detection, and quickly finding the optimal combination of recognizers becomes the key to solving the problem.
Disclosure of Invention
The invention aims to solve the technical problem of providing a method for detecting android malicious software based on a model library aiming at the defects in the prior art.
The technical scheme adopted by the invention for solving the technical problems is as follows: an android malicious software detection method based on a model library comprises the following steps:
1) collecting Android applications to build a data set, labeling the data set, and dividing the labeled data set into a test set and a training set;
2) dividing the training set into data set 1 and data set 2 in an 8:2 ratio, and training each algorithm in the algorithm set on data set 1 to generate base models (BaseModel);
the algorithm set is a set of packaged classification algorithms, including SVM, RF and FC;
3) randomly combining the base models (BaseModel) to obtain models (Models); a model is an ensemble recognizer composed of several base models, and comprises its base models and the weight $w_i$ of each base model in the model,
where $w_i = n_i / N$,
$n_i$ is the number of instances of base model $\mathrm{BaseModel}_i$ in the model, and $N$ is the total number of base models in the model;
4) evaluating the models on data set 2 to obtain the detection result of each model;
5) adjusting the weights of the base models;
6) repeating steps 4) and 5) until the number of weight adjustments reaches a set value or the best detection result no longer changes;
7) ranking the models to obtain the k models with the best recognition performance, and confirming their base-model combinations and weights;
8) computing the accuracy, recall and F1 score of the k models on the test set, and using the best-performing model for Android malware detection.
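Steps 4) to 6) amount to an evaluate-and-adjust loop with two stopping conditions. The following is a minimal Python sketch under stated assumptions, not the patented implementation: `evaluate` and `adjust` are hypothetical callables standing in for the evaluation on data set 2 and for the weight adjustment, `max_rounds` stands in for the "set value", and `patience` is an assumed way of deciding that the best result "no longer changes". Adjusting from the best model found so far is also an assumption.

```python
from typing import Callable, Tuple, TypeVar

M = TypeVar("M")


def tune_model(model: M,
               evaluate: Callable[[M], float],
               adjust: Callable[[M], M],
               max_rounds: int = 50,
               patience: int = 5) -> Tuple[M, float]:
    """Repeat evaluate/adjust until the round budget is spent or the best score stalls."""
    best_model, best_score = model, evaluate(model)
    stale = 0  # consecutive rounds without improvement (assumed stopping rule)
    for _ in range(max_rounds):
        candidate = adjust(best_model)   # step 5): adjust the weights
        score = evaluate(candidate)      # step 4): evaluate on data set 2
        if score > best_score:
            best_model, best_score = candidate, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_model, best_score


# Toy usage: the "model" is an integer and the score peaks at 3.
best, score = tune_model(0,
                         evaluate=lambda m: -abs(m - 3),
                         adjust=lambda m: m + 1,
                         max_rounds=10,
                         patience=2)
```

Keeping the two stopping conditions as parameters makes it easy to compare a fixed adjustment budget with an early stop.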
According to the scheme, the base models (BaseModel) are randomly combined into models (Models) in step 3) as follows: first determine the upper limit on the total number of base models in a model, then draw a random number less than or equal to that upper limit as the number of base models, and then combine that many base models to form the model.
According to the scheme, the weights of the base models in step 5) are adjusted by random weight adjustment.
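The random combination can be illustrated with a short Python sketch. It assumes base models are identified by name, that at least one base model is always chosen, and that sampling is done with replacement (so a base model may appear more than once, which is what makes a per-model count meaningful). The function name `random_model` is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def random_model(library: Sequence[str], upper_limit: int,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Draw a model size in [1, upper_limit] and fill it with random base models."""
    if upper_limit < 1 or not library:
        raise ValueError("need a non-empty library and upper_limit >= 1")
    rng = rng or random.Random()
    size = rng.randint(1, upper_limit)  # randint is inclusive on both ends
    # Sampling with replacement is an assumption of this sketch.
    return [rng.choice(library) for _ in range(size)]


library = ["svm", "rf", "fc"]
example = random_model(library, upper_limit=5, rng=random.Random(0))
```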
According to the scheme, the weights of the base models in step 5) are adjusted as follows:
if the weight of a base model $\mathrm{BaseModel}_p$ is greater than a set threshold, another base model $\mathrm{BaseModel}_q$ is selected at random, and the weights of both are set to $(w_p + w_q)/2$.
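A minimal Python sketch of this averaging rule follows, assuming the weights are held in a dictionary keyed by base-model name and that a single pass over the base models is made; the function name `average_overweight` is hypothetical. Because both weights are set to their mean, the sum of all weights is unchanged.

```python
import random
from typing import Dict, Optional


def average_overweight(weights: Dict[str, float], threshold: float,
                       rng: Optional[random.Random] = None) -> Dict[str, float]:
    """For each base model above `threshold`, average its weight with a random partner."""
    rng = rng or random.Random()
    new = dict(weights)  # leave the caller's dictionary untouched
    for p in list(new):
        if new[p] > threshold and len(new) > 1:
            q = rng.choice([name for name in new if name != p])
            mean = (new[p] + new[q]) / 2
            new[p] = new[q] = mean
    return new


adjusted = average_overweight({"svm": 0.7, "rf": 0.2, "fc": 0.1},
                              threshold=0.5,
                              rng=random.Random(0))
```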
According to the scheme, the weights of the base models in step 5) are adjusted as follows:
selecting m base models from the model evaluated in step 4) according to a preset probability P, and replacing them with n randomly generated base models, which completes the weight adjustment among the base models and forms a new model, where $Np \ge m \ge 1$ and $m \ge n \ge 1$.
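The replacement adjustment can be sketched in Python as below. Several points are assumptions of this sketch rather than statements from the text: "Np" is read as the model size N multiplied by the probability P, the upper bound on m is rounded down and never allowed to fall below 1, m and n are drawn uniformly within their bounds, and replacements are drawn from a library of base-model names. The function name `replace_members` is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def replace_members(model: List[str], library: Sequence[str], p: float,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Remove m random base models and add n random ones, with N*p >= m >= n >= 1."""
    if not model or not library:
        raise ValueError("model and library must be non-empty")
    rng = rng or random.Random()
    total = len(model)
    max_m = min(total, max(1, math.floor(total * p)))  # assumed reading of "Np"
    m = rng.randint(1, max_m)
    n = rng.randint(1, m)
    drop = set(rng.sample(range(total), m))             # positions to remove
    kept = [name for i, name in enumerate(model) if i not in drop]
    added = [rng.choice(library) for _ in range(n)]
    return kept + added


old_model = ["svm", "rf", "rf", "fc"]
new_model = replace_members(old_model, ["svm", "rf", "fc"], p=0.5,
                            rng=random.Random(1))
```

Since n never exceeds m, the new model is never larger than the old one, and since n is at least 1 it is never empty.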
According to the scheme, in step 5), if the weight of a base model in the new model exceeds the set threshold, that weight is adjusted to the set threshold.
The invention has the following beneficial effects:
the model established by the detection method is suitable for detecting Android malware across multiple populations, and its accuracy can meet the set requirement.
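The capping rule is a one-line operation; a small Python sketch, assuming weights are stored in a dictionary keyed by base-model name (the function name `clamp_weights` is hypothetical):

```python
from typing import Dict


def clamp_weights(weights: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Cap every weight at `threshold`; weights at or below it are unchanged."""
    return {name: min(w, threshold) for name, w in weights.items()}


capped = clamp_weights({"svm": 0.8, "rf": 0.2}, threshold=0.6)
```

Note that after capping the weights may no longer sum to 1; the text does not say whether they are renormalized, so this sketch leaves them as they are.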
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a method of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, a method for detecting android malware based on a model library includes the following steps:
1) collecting Android applications to build a data set, labeling the data set, and dividing the labeled data set into a test set and a training set;
in this embodiment, 98 types of crawled applications are used as the data set; applications labeled benign are positive samples and applications labeled malicious are negative samples. Applications are drawn from multiple populations of the data set to provide training data and test data for building the model library. The data set is divided into a training set TrainSet and a test set TestSet in a 9:1 ratio.
2) dividing the training set into data set 1 and data set 2 in an 8:2 ratio, and training each algorithm in the algorithm set on data set 1 to generate base models (BaseModel);
the algorithm set is a set of packaged classification algorithms, including SVM, RF and FC;
the algorithm set is: $\mathrm{Algorithms} = \{\mathrm{Algorithm}_1, \mathrm{Algorithm}_2, \ldots, \mathrm{Algorithm}_n\}$; each algorithm is implemented and packaged so that all algorithms share a uniform input and output format;
a base model is a recognizer generated by training a single algorithm on the multi-population application data set;
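The two splits and the uniform algorithm interface of steps 1) and 2) can be sketched in Python as follows. Everything named here is an assumption for illustration: the toy trainers `train_majority` and `train_threshold` merely stand in for packaged algorithms such as SVM, RF and FC, the labels 1 and 0 are arbitrary class codes, and the synthetic one-feature data replaces real application features.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, ...]
Sample = Tuple[Features, int]                       # (feature vector, label)
Predictor = Callable[[Features], int]               # uniform output: a class label
Trainer = Callable[[Sequence[Sample]], Predictor]   # uniform input: labeled samples


def split(data: Sequence[Sample], ratio: float,
          rng: random.Random) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle a copy of `data` and cut it at `ratio` (0.9 for 9:1, 0.8 for 8:2)."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


def train_majority(samples: Sequence[Sample]) -> Predictor:
    """Toy algorithm: always predict the most common training label."""
    labels = [label for _, label in samples]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority


def train_threshold(samples: Sequence[Sample]) -> Predictor:
    """Toy algorithm: threshold the first feature midway between the class means."""
    ones = [x[0] for x, y in samples if y == 1]
    zeros = [x[0] for x, y in samples if y == 0]
    if not ones or not zeros:
        return train_majority(samples)
    mean_one, mean_zero = sum(ones) / len(ones), sum(zeros) / len(zeros)
    cut = (mean_one + mean_zero) / 2
    if mean_one >= mean_zero:
        return lambda features: 1 if features[0] > cut else 0
    return lambda features: 1 if features[0] < cut else 0


def build_library(algorithms: Dict[str, Trainer],
                  data_set_1: Sequence[Sample]) -> Dict[str, Predictor]:
    """Train every algorithm in the set on data set 1; each result is a base model."""
    return {name: train(data_set_1) for name, train in algorithms.items()}


rng = random.Random(0)
data = [((float(i),), int(i >= 50)) for i in range(100)]   # synthetic samples
train_set, test_set = split(data, 0.9, rng)                # 9:1
data_set_1, data_set_2 = split(train_set, 0.8, rng)        # 8:2
model_library = build_library(
    {"majority": train_majority, "threshold": train_threshold}, data_set_1)
```

Because every trainer takes labeled samples and returns a predictor with the same signature, later steps can combine base models without knowing which algorithm produced them.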
3) randomly combining the base models (BaseModel) to obtain models (Models) for all combinations; a model is an ensemble recognizer composed of several base models, and comprises its base models and the weight of each base model in the model;
the models (Models) are obtained by random combination of the base models as follows: first determine the upper limit on the total number of base models in a model, then draw a random number less than or equal to that upper limit as the number of base models, and then combine that many base models to form the model.
Model library:
$\mathrm{ModelLibrary} = \{\mathrm{BaseModel}_1, \mathrm{BaseModel}_2, \ldots, \mathrm{BaseModel}_n\}$.
the ModelLibrary consists of all base models and stores the base models generated by all algorithms in the algorithm set;
a model is an ensemble recognizer composed of several base models, and comprises its base models and the weight $w_i$ of each base model in the model,
where $w_i = n_i / N$,
$n_i$ is the number of instances of base model $\mathrm{BaseModel}_i$ in the model, and $N$ is the total number of base models in the model;
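As a worked example of the initial weight $w_i = n_i / N$, the Python sketch below counts how often each base model occurs in a model and divides by the model size. The text does not say how the weights are used to combine predictions, so `weighted_vote` is purely an assumed combination rule (a weighted majority over binary labels), and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def initial_weights(model: List[str]) -> Dict[str, float]:
    """w_i = n_i / N, with n_i the count of base model i and N the model size."""
    counts = Counter(model)
    total = len(model)
    return {name: n_i / total for name, n_i in counts.items()}


def weighted_vote(weights: Dict[str, float], votes: Dict[str, int]) -> int:
    """Assumed rule: predict 1 when the weight behind label 1 exceeds half the total."""
    support = sum(w for name, w in weights.items() if votes[name] == 1)
    return 1 if support > sum(weights.values()) / 2 else 0


weights = initial_weights(["svm", "rf", "rf", "fc"])
# svm and fc each occur once out of four, rf occurs twice out of four.
label = weighted_vote(weights, {"svm": 1, "rf": 1, "fc": 0})
```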
4) evaluating the models on data set 2 to obtain the recognition accuracy of each model;
5) adjusting the weights of the base models;
1. Random weight adjustment
The weights of the base models are adjusted by random weight adjustment: if the weight of a base model $\mathrm{BaseModel}_p$ is greater than a set threshold, another base model $\mathrm{BaseModel}_q$ is selected at random, and the weights of both are set to $(w_p + w_q)/2$.
2. Replacement adjustment
Selecting m base models from the model evaluated in step 4) according to a preset probability P, and replacing them with n randomly generated base models, which completes the weight adjustment among the base models and forms a new model, where $Np \ge m \ge 1$ and $m \ge n \ge 1$.
In the process of forming a new model, if the weight of a base model exceeds the set threshold, that weight is adjusted to the set threshold.
6) repeating steps 4) and 5) until the number of weight adjustments reaches a set value or the best detection result no longer changes;
7) ranking the models to obtain the k models with the best recognition performance, and confirming their base-model combinations and weights;
8) computing the accuracy, recall and F1 score of the k models on the test set, and using the best-performing model for Android malware detection, where k is generally 3 to 10.
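Steps 7) and 8) rank the models and then compare the top k on the test set. A minimal Python sketch of the metric computation and the ranking, assuming binary labels and treating accuracy as the fraction of correct predictions; the names `metrics` and `top_k`, and the example scores, are hypothetical.

```python
from typing import Dict, List, Sequence


def metrics(y_true: Sequence[int], y_pred: Sequence[int],
            positive: int = 1) -> Dict[str, float]:
    """Accuracy, recall and F1 for binary labels, with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Names of the k models with the highest validation score, best first."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


result = metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
best_two = top_k({"model_a": 0.90, "model_b": 0.95, "model_c": 0.80}, k=2)
```

The `positive` parameter matters here because the embodiment treats benign applications as positive samples, so recall is measured with respect to whichever class is passed in.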
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (6)

1. An Android malware detection method based on a model library, characterized by comprising the following steps:
1) collecting Android applications to build a data set, labeling the data set, and dividing the labeled data set into a test set and a training set;
2) dividing the training set into data set 1 and data set 2 in an 8:2 ratio, and training each algorithm in the algorithm set on data set 1 to generate base models (BaseModel);
the algorithm set is a set of packaged classification algorithms, including SVM, RF and FC;
3) randomly combining the base models (BaseModel) to obtain models (Models); a model is an ensemble recognizer composed of several base models, and comprises its base models and the weight $w_i$ of each base model in the model,
where $w_i = n_i / N$,
$n_i$ is the number of instances of base model $\mathrm{BaseModel}_i$ in the model, and $N$ is the total number of base models in the model;
4) evaluating the models on data set 2 to obtain the detection accuracy of each model;
5) adjusting the weights of the base models in the model;
6) repeating steps 4) and 5) until the best detection result no longer changes;
7) ranking the models to obtain the k models with the best recognition performance, and confirming their base-model combinations and weights;
8) computing the accuracy, recall and F1 score of the k models on the test set, and using the best-performing model for Android malware detection.
2. The Android malware detection method based on a model library of claim 1, wherein the models (Models) are obtained by random combination of the base models (BaseModel) in step 3) as follows: first determine the upper limit on the total number of base models in a model, then draw a random number less than or equal to that upper limit as the number of base models, and then combine that many base models to form the model.
3. The method of claim 1, wherein the weights of the base models in step 5) are adjusted by random weight adjustment.
4. The method of claim 3, wherein the weights of the base models in step 5) are adjusted as follows:
after random weight adjustment, if the weight of a base model $\mathrm{BaseModel}_p$ is greater than a set threshold, another base model $\mathrm{BaseModel}_q$ is selected at random, and the weights of both are set to $(w_p + w_q)/2$.
5. The Android malware detection method based on a model library of claim 1, wherein the weights of the base models in step 5) are adjusted as follows:
selecting m base models from the model evaluated in step 4) according to a preset probability P, and replacing them with n randomly generated base models, which completes the weight adjustment among the base models and forms a new model, where $Np \ge m \ge 1$ and $m \ge n \ge 1$.
6. The method of claim 1, wherein in step 5), if the weight of a base model exceeds a predetermined threshold, that weight is adjusted to the predetermined threshold.
CN202010495937.6A 2020-06-03 2020-06-03 Android malicious software detection method based on model library Pending CN111814147A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010495937.6A CN111814147A (en) 2020-06-03 2020-06-03 Android malicious software detection method based on model library

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010495937.6A CN111814147A (en) 2020-06-03 2020-06-03 Android malicious software detection method based on model library

Publications (1)

Publication Number Publication Date
CN111814147A true CN111814147A (en) 2020-10-23

Family

ID=72848349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010495937.6A Pending CN111814147A (en) 2020-06-03 2020-06-03 Android malicious software detection method based on model library

Country Status (1)

Country Link
CN (1) CN111814147A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190080240A1 (en) * 2017-09-08 2019-03-14 SparkCognition, Inc. Execution of a genetic algorithm with variable evolutionary weights of topological parameters for neural network generation and training
CN108985060A (en) * 2018-07-04 2018-12-11 中共中央办公厅电子科技学院 A kind of extensive Android Malware automated detection system and method
US20200125896A1 (en) * 2018-10-19 2020-04-23 Institute For Information Industry Malicious software recognition apparatus and method
CN109508545A (en) * 2018-11-09 2019-03-22 北京大学 A kind of Android Malware classification method based on rarefaction representation and Model Fusion
CN110263539A (en) * 2019-05-15 2019-09-20 湖南警察学院 A kind of Android malicious application detection method and system based on concurrent integration study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李涛;: "基于SVM的恶意PDF检测研究", 现代计算机(专业版), no. 08, 15 March 2018 (2018-03-15) *
杜炜;李剑;: "基于半监督学习的安卓恶意软件检测及其恶意行为分析", 信息安全研究, no. 03, 5 March 2018 (2018-03-05) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116155630A (en) * 2023-04-21 2023-05-23 北京邮电大学 Malicious traffic identification method and related equipment
CN116155630B (en) * 2023-04-21 2023-07-04 北京邮电大学 Malicious traffic identification method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination