CN111814147A - Android malicious software detection method based on model library - Google Patents
- Publication number
- CN111814147A
- Authority
- CN
- China
- Prior art keywords
- models
- model
- base
- data set
- weight
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an Android malware detection method based on a model library, comprising the following steps: 1) collecting Android applications to build a data set, labeling the data set, and dividing it into a training set and a test set; 2) dividing the training set into data set 1 and data set 2 in the ratio 8:2, and training each algorithm in the algorithm set on data set 1 to generate base models (BaseModel); 3) randomly combining base models to obtain models (Models); 4) evaluating the models on data set 2 to obtain the detection accuracy of each model; 5) adjusting the weights between the base models; 6) repeating steps 4) to 5) until the best detection result no longer changes; 7) ranking the models to obtain the k models with the best recognition performance; 8) calculating the accuracy, recall and F1 score of the k models on the test set, and using the best-performing model for Android malware detection. The method is suitable for detecting Android malware in applications that belong to multiple populations.
Description
Technical Field
The invention relates to information security technology, and in particular to an Android malware detection method based on a model library.
Background
Many applications carry multiple class labels, that is, they belong to multiple populations. Because of differences in their features, a single app can belong to several populations, and application scenarios cross and overlap, so the population of an app cannot be labeled accurately. As a result, a recognizer trained on a single population has difficulty detecting applications that span multiple populations, and malicious-app detection on Android cannot be carried out directly from a population perspective. A method for building a model library is therefore proposed to address the difficulty of detecting malicious apps that belong to multiple populations. Such detection requires combining multiple recognizers, so quickly finding the optimal combination of recognizers becomes the key to solving the problem.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects of the prior art, an Android malware detection method based on a model library.
The technical solution adopted by the invention is as follows: an Android malware detection method based on a model library, comprising the following steps:
1) collecting Android applications to build a data set, labeling the data set, and dividing the labeled data set into a test set and a training set;
2) dividing the training set into data set 1 and data set 2 in the ratio 8:2, and training each algorithm in the algorithm set on data set 1 to generate base models (BaseModel);
the algorithm set packages several classification algorithms, including SVM, RF and FC;
3) randomly combining base models (BaseModel) to obtain models (Models); a model is an ensemble recognizer composed of several base models, and it comprises all of its base models together with the weight $w_i$ of each base model in the model,
where $w_i = n_i / N$,
$n_i$ being the number of occurrences of base model $\mathrm{BaseModel}_i$ in the model and $N$ the total number of base models in the model;
4) evaluating the models on data set 2 to obtain the detection result of each model;
5) adjusting the weights between the base models;
6) repeating steps 4) to 5) until the number of weight adjustments reaches a set value or the best detection result no longer changes;
7) ranking the models to obtain the k models with the best recognition performance, and fixing the combination and weights of their base models;
8) calculating the accuracy, recall and F1 score of the k models on the test set, and using the best-performing model for Android malware detection.
According to the above scheme, the models in step 3) are obtained by randomly combining base models as follows: first an upper limit on the total number of base models in a model is set; then a random number no greater than this upper limit is drawn as the number of base models; finally, that many base models are combined to form the model.
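The random combination in step 3) and the weight rule $w_i = n_i / N$ can be sketched in Python as below. This is a minimal, hypothetical illustration, not the applicant's implementation. It assumes that base models are identified by name strings ("SVM", "RF", "FC"), that base models are drawn from the model library with replacement (the only way a base model can appear more than once and receive $n_i > 1$), and that the random size is drawn uniformly from 1 to the upper limit. The helper names `weights_from_members` and `random_model` are illustrative.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def weights_from_members(members: Sequence[str]) -> Dict[str, float]:
    """Weight of each base model: w_i = n_i / N.

    n_i is how often BaseModel_i occurs in the model; N is len(members).
    """
    counts = Counter(members)
    total = len(members)
    return {base_model: n / total for base_model, n in counts.items()}


def random_model(
    library_ids: Sequence[str], upper_limit: int, rng: random.Random
) -> Tuple[List[str], Dict[str, float]]:
    """Randomly combine base models into one model (assumption: draws with replacement)."""
    size = rng.randint(1, upper_limit)  # random number of base models, <= upper limit
    members = [rng.choice(library_ids) for _ in range(size)]
    return members, weights_from_members(members)


print(weights_from_members(["RF", "RF", "SVM", "FC"]))  # {'RF': 0.5, 'SVM': 0.25, 'FC': 0.25}
```

For a model made of RF, RF, SVM and FC, $N = 4$, so RF receives $2/4 = 0.5$ and SVM and FC receive $1/4$ each.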
According to the above scheme, the weights between the base models in step 5) are adjusted by random weight adjustment.
According to the above scheme, the weights between the base models in step 5) are adjusted as follows:
if the weight $w_p$ of a base model $\mathrm{BaseModel}_p$ exceeds a set threshold, another base model $\mathrm{BaseModel}_q$ is selected at random and the weights of both are set to $(w_p + w_q)/2$.
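The averaging rule can be illustrated with a short Python sketch. It assumes that a model's weights are held in a dictionary keyed by base-model name, that every base model is checked once in a fixed (sorted) order, and that the partner $\mathrm{BaseModel}_q$ is drawn uniformly from the other base models of the same model; the text does not fix these details, and `average_overweight` is a hypothetical helper name.

```python
import random
from typing import Dict


def average_overweight(
    weights: Dict[str, float], threshold: float, rng: random.Random
) -> Dict[str, float]:
    """If w_p > threshold, pick another base model q at random and set
    w_p = w_q = (w_p + w_q) / 2. Returns a new dict; the input is not modified."""
    adjusted = dict(weights)
    for p in sorted(adjusted):  # fixed order so runs are reproducible with a seeded rng
        if adjusted[p] <= threshold:
            continue
        others = sorted(q for q in adjusted if q != p)
        if not others:  # a single-member model has no partner to share weight with
            continue
        q = rng.choice(others)
        mean = (adjusted[p] + adjusted[q]) / 2
        adjusted[p] = adjusted[q] = mean
    return adjusted
```

Because both weights become their mean, the total weight of the model is unchanged. For example, with weights {RF: 0.5, SVM: 0.25, FC: 0.25} and a threshold of 0.4, RF is averaged with one randomly chosen partner, so RF and that partner both end at 0.375 while the third base model keeps 0.25.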
According to the above scheme, the weights between the base models in step 5) are adjusted as follows:
$m$ base models of the model evaluated in step 4) are selected according to a preset probability $P$ and replaced with $n$ randomly generated base models, which completes the weight adjustment among the base models and forms a new model, where $NP \ge m \ge 1$ and $m \ge n \ge 1$.
According to the above scheme, in step 5), if the weight of a base model in the new model exceeds the set threshold, the weight of that base model is set to the threshold.
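One possible reading of the replacement adjustment and the threshold cap is sketched below in Python. The text fixes only the bounds $NP \ge m \ge 1$ and $m \ge n \ge 1$. As hypothetical choices, this sketch draws $m$ uniformly from 1 to $\lfloor NP \rfloor$ (using 1 when $NP < 1$), draws $n$ uniformly from 1 to $m$, draws the new base models from the model library with replacement, and recomputes the weights as $w_i = n_i / N$ before capping. Capped weights are not renormalized, because the text does not say they should be, so they may sum to less than 1. `replace_and_cap` is an illustrative name.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def replace_and_cap(
    members: Sequence[str],
    library_ids: Sequence[str],
    p: float,
    threshold: float,
    rng: random.Random,
) -> Tuple[List[str], Dict[str, float]]:
    """Replace m randomly chosen base models with n new random ones, then cap weights."""
    big_n = len(members)
    upper = max(1, min(big_n, int(big_n * p)))  # enforce 1 <= m <= N*P (and m <= N)
    m = rng.randint(1, upper)
    n = rng.randint(1, m)                        # enforce 1 <= n <= m
    removed = set(rng.sample(range(big_n), m))   # positions of the m base models to drop
    new_members = [bm for i, bm in enumerate(members) if i not in removed]
    new_members += [rng.choice(library_ids) for _ in range(n)]
    counts = Counter(new_members)
    total = len(new_members)
    weights = {bm: c / total for bm, c in counts.items()}          # w_i = n_i / N
    capped = {bm: min(w, threshold) for bm, w in weights.items()}  # cap at the threshold
    return new_members, capped
```

Applied to a six-member model with $P = 0.5$, this replaces one to three members, so the new model has between four and six members and no weight above the threshold.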
The invention has the following beneficial effect:
models built with this detection method are suitable for detecting Android malware in applications that belong to multiple populations, and their accuracy can meet the set requirements.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a method of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in Fig. 1, an Android malware detection method based on a model library comprises the following steps:
1) collecting Android applications to build a data set, labeling the data set, and dividing the labeled data set into a test set and a training set;
in this embodiment, crawled applications from 98 categories are used as the data set; applications labeled benign are used as positive samples and applications labeled malicious as negative samples. Applications are drawn from multiple populations of the data set to provide training and test data for building the model library. The data set is divided into a training set TrainSet and a test set TestSet in the ratio 9:1.
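The two splits (9:1 into TrainSet and TestSet, then 8:2 of the training set into data set 1 and data set 2 in step 2) can be sketched in Python. This sketch assumes a plain random shuffle with a fixed seed; the text does not say whether the split is stratified by label or by population, and the samples below are placeholders, not the crawled data. `split_by_ratio` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_ratio(samples: Sequence[T], ratio: float, seed: int) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and cut it so the first part holds `ratio` of them."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


# Placeholder samples (app_id, label): 1 = benign (positive), 0 = malicious (negative).
apps = [(f"app{i}", i % 2) for i in range(100)]

train_set, test_set = split_by_ratio(apps, 0.9, seed=0)          # TrainSet : TestSet = 9:1
data_set_1, data_set_2 = split_by_ratio(train_set, 0.8, seed=1)  # data set 1 : data set 2 = 8:2
print(len(train_set), len(test_set), len(data_set_1), len(data_set_2))  # 90 10 72 18
```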
2) dividing the training set into data set 1 and data set 2 in the ratio 8:2, and training each algorithm in the algorithm set on data set 1 to generate base models (BaseModel);
the algorithm set packages several classification algorithms, including SVM, RF and FC;
the algorithm set is $\mathrm{Algorithms} = \{\mathrm{Algorithm}_1, \mathrm{Algorithm}_2, \ldots, \mathrm{Algorithm}_n\}$; each algorithm is implemented and wrapped so that all algorithms share a uniform input and output format;
a base model is a recognizer generated by training a single algorithm on application data from multiple populations;
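The uniform input/output format of the algorithm set, and the base models it produces, can be sketched as below. To stay self-contained, the sketch uses two simple stand-in learners (a nearest-centroid classifier and a majority-class baseline) instead of SVM, RF and FC. In practice one might wrap, for example, scikit-learn's `SVC`, `RandomForestClassifier` and `MLPClassifier` behind the same interface, but that is an assumption, not something the text specifies. The names `Algorithm`, `train` and `build_model_library` are hypothetical, and one base model is trained per algorithm on data set 1.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
BaseModel = Callable[[Vector], int]  # a trained recognizer: feature vector -> label


class Algorithm(ABC):
    """Uniform wrapper: every algorithm trains on (X, y) and returns a BaseModel."""

    name: str

    @abstractmethod
    def train(self, X: List[Vector], y: List[int]) -> BaseModel: ...


class NearestCentroid(Algorithm):
    name = "nearest_centroid"

    def train(self, X, y):
        centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

        def predict(x: Vector) -> int:
            # label whose centroid is closest in squared Euclidean distance
            return min(
                centroids,
                key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
            )

        return predict


class MajorityClass(Algorithm):
    name = "majority"

    def train(self, X, y):
        label = max(sorted(set(y)), key=y.count)  # ties go to the smallest label
        return lambda x: label


def build_model_library(algorithms: List[Algorithm], X, y) -> Dict[str, BaseModel]:
    """Train one base model per algorithm and collect them, keyed by algorithm name."""
    return {alg.name: alg.train(X, y) for alg in algorithms}


X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y = [0, 0, 1, 1]
library = build_model_library([NearestCentroid(), MajorityClass()], X, y)
print(library["nearest_centroid"]([0.95, 1.0]))  # 1
```

Keeping every algorithm behind one `train` signature is what lets later steps treat all base models interchangeably when they are combined into models.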
3) randomly combining base models (BaseModel) to obtain models (Models) covering all combinations; a model is an ensemble recognizer composed of several base models, and it comprises all of its base models together with the weight of each base model in the model;
the models are obtained by random combination of base models as follows: first an upper limit on the total number of base models in a model is set; then a random number no greater than this upper limit is drawn as the number of base models; finally, that many base models are combined to form the model.
Model library:
$\mathrm{ModelLibrary} = \{\mathrm{BaseModel}_1, \mathrm{BaseModel}_2, \ldots, \mathrm{BaseModel}_n\}$
the ModelLibrary consists of all base models and stores the base models generated by every algorithm in the algorithm set;
a model is an ensemble recognizer composed of several base models, and it comprises all of its base models together with the weight $w_i$ of each base model in the model,
where $w_i = n_i / N$,
$n_i$ being the number of occurrences of base model $\mathrm{BaseModel}_i$ in the model and $N$ the total number of base models in the model;
4) evaluating the models on data set 2 to obtain the recognition accuracy of each model;
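The text does not state how the outputs of the base models are combined inside a model. A common choice, assumed here purely for illustration, is weighted majority voting with the weights $w_i$; the Python sketch below scores a model on data set 2 that way. `weighted_vote` and `evaluate_accuracy` are hypothetical names, the base models are plain functions from a feature value to an integer label, and ties are broken toward the smaller label.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

BaseModel = Callable[[float], int]


def weighted_vote(votes: List[int], weights: List[float]) -> int:
    """Return the label with the largest summed weight (ties: smallest label)."""
    tally: Dict[int, float] = defaultdict(float)
    for label, w in zip(votes, weights):
        tally[label] += w
    return max(sorted(tally), key=lambda label: tally[label])


def evaluate_accuracy(
    model: Dict[str, float],        # base-model id -> weight w_i
    library: Dict[str, BaseModel],  # base-model id -> trained recognizer
    data_set_2: List[Tuple[float, int]],
) -> float:
    """Step 4) sketch: fraction of data set 2 that the weighted ensemble labels correctly."""
    ids = list(model)
    correct = 0
    for x, y in data_set_2:
        prediction = weighted_vote([library[i](x) for i in ids], [model[i] for i in ids])
        correct += int(prediction == y)
    return correct / len(data_set_2)


# Toy example with hypothetical one-feature base models.
library = {"a": lambda x: int(x > 0.5), "b": lambda x: 1, "c": lambda x: 0}
model = {"a": 0.5, "b": 0.25, "c": 0.25}
data = [(0.9, 1), (0.1, 0), (0.7, 1), (0.2, 1)]
print(evaluate_accuracy(model, library, data))  # 0.75
```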
5) adjusting the weights between the base models;
1. Random weight adjustment
The weights between the base models are adjusted randomly: if the weight $w_p$ of a base model $\mathrm{BaseModel}_p$ exceeds a set threshold, another base model $\mathrm{BaseModel}_q$ is selected at random and the weights of both are set to $(w_p + w_q)/2$.
2. Replacement adjustment
$m$ base models of the model evaluated in step 4) are selected according to a preset probability $P$ and replaced with $n$ randomly generated base models, which completes the weight adjustment among the base models and forms a new model, where $NP \ge m \ge 1$ and $m \ge n \ge 1$.
While the new model is being formed, if the weight of a base model exceeds a set threshold, the weight of that base model is set to the threshold.
6) repeating steps 4) to 5) until the number of weight adjustments reaches a set value or the best detection result no longer changes;
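The stopping rule of step 6) can be sketched as a generic loop in Python. `optimize`, `evaluate` and `adjust` are hypothetical names: `evaluate` scores one model on data set 2, and `adjust` returns a weight-adjusted copy of a model, whichever adjustment strategy is used. The loop stops after `max_rounds` adjustments or as soon as the best score is unchanged after a round. Following the text literally, every model is replaced by its adjusted version even when that lowers its score; keeping the better of the two versions would be a reasonable variant, but the text does not specify it.

```python
import random
from typing import Callable, List, Tuple, TypeVar

M = TypeVar("M")


def optimize(
    models: List[M],
    evaluate: Callable[[M], float],
    adjust: Callable[[M, random.Random], M],
    max_rounds: int,
    rng: random.Random,
) -> Tuple[List[M], List[float]]:
    """Repeat steps 4)-5) until max_rounds is reached or the best score stops changing."""
    scores = [evaluate(m) for m in models]          # step 4)
    best = max(scores)
    for _ in range(max_rounds):
        models = [adjust(m, rng) for m in models]   # step 5)
        scores = [evaluate(m) for m in models]      # step 4) again
        new_best = max(scores)
        if new_best == best:                        # best value no longer changes
            break
        best = new_best
    return models, scores


# Toy run: models are integers, adjustment adds 1, and the score saturates at 5.
final_models, final_scores = optimize(
    [0, 2], lambda m: min(m, 5), lambda m, r: m + 1, 10, random.Random(0)
)
print(final_models, final_scores)  # [4, 6] [4, 5]
```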
7) ranking the models to obtain the k models with the best recognition performance, and fixing the combination and weights of their base models;
8) calculating the accuracy, recall and F1 score of the k models on the test set, and using the best-performing model for Android malware detection, where k is generally 3 to 10.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (6)
1. An Android malware detection method based on a model library, characterized by comprising the following steps:
1) collecting Android applications to build a data set, labeling the data set, and dividing the labeled data set into a test set and a training set;
2) dividing the training set into data set 1 and data set 2 in the ratio 8:2, and training each algorithm in the algorithm set on data set 1 to generate base models (BaseModel);
the algorithm set packages several classification algorithms, including SVM, RF and FC;
3) randomly combining base models (BaseModel) to obtain models (Models); a model is an ensemble recognizer composed of several base models, and it comprises all of its base models together with the weight $w_i$ of each base model in the model,
where $w_i = n_i / N$,
$n_i$ being the number of occurrences of base model $\mathrm{BaseModel}_i$ in the model and $N$ the total number of base models in the model;
4) evaluating the models on data set 2 to obtain the detection accuracy of each model;
5) adjusting the weights between the base models in each model;
6) repeating steps 4) to 5) until the best detection result no longer changes;
7) ranking the models to obtain the k models with the best recognition performance, and fixing the combination and weights of their base models;
8) calculating the accuracy, recall and F1 score of the k models on the test set, and using the best-performing model for Android malware detection.
2. The Android malware detection method based on a model library according to claim 1, wherein in step 3) the models are obtained by random combination of base models as follows: first an upper limit on the total number of base models in a model is set; then a random number no greater than this upper limit is drawn as the number of base models; finally, that many base models are combined to form the model.
3. The method according to claim 1, wherein the weights between the base models in step 5) are adjusted by random weight adjustment.
4. The method according to claim 3, wherein the weights between the base models in step 5) are adjusted as follows:
after random weight adjustment, if the weight $w_p$ of a base model $\mathrm{BaseModel}_p$ exceeds a set threshold, another base model $\mathrm{BaseModel}_q$ is selected at random and the weights of both are set to $(w_p + w_q)/2$.
5. The Android malware detection method based on a model library according to claim 1, wherein the weights between the base models in step 5) are adjusted as follows:
$m$ base models of the model evaluated in step 4) are selected according to a preset probability $P$ and replaced with $n$ randomly generated base models, which completes the weight adjustment among the base models and forms a new model, where $NP \ge m \ge 1$ and $m \ge n \ge 1$.
6. The method according to claim 1, wherein in step 5), if the weight of a base model exceeds a predetermined threshold, the weight of that base model is set to the predetermined threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010495937.6A CN111814147A (en) | 2020-06-03 | 2020-06-03 | Android malicious software detection method based on model library |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010495937.6A CN111814147A (en) | 2020-06-03 | 2020-06-03 | Android malicious software detection method based on model library |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111814147A (en) | 2020-10-23 |
Family
ID=72848349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010495937.6A Pending CN111814147A (en) | 2020-06-03 | 2020-06-03 | Android malicious software detection method based on model library |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111814147A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190080240A1 (en) * | 2017-09-08 | 2019-03-14 | SparkCognition, Inc. | Execution of a genetic algorithm with variable evolutionary weights of topological parameters for neural network generation and training |
CN108985060A (en) * | 2018-07-04 | 2018-12-11 | 中共中央办公厅电子科技学院 (Electronic Science and Technology Institute of the General Office of the CPC Central Committee) | A kind of extensive Android Malware automated detection system and method |
US20200125896A1 (en) * | 2018-10-19 | 2020-04-23 | Institute For Information Industry | Malicious software recognition apparatus and method |
CN109508545A (en) * | 2018-11-09 | 2019-03-22 | 北京大学 (Peking University) | A kind of Android Malware classification method based on rarefaction representation and Model Fusion |
CN110263539A (en) * | 2019-05-15 | 2019-09-20 | 湖南警察学院 (Hunan Police Academy) | A kind of Android malicious application detection method and system based on concurrent integration study |
Non-Patent Citations (2)
Title |
---|
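Li Tao (李涛), "Research on SVM-based malicious PDF detection" (基于SVM的恶意PDF检测研究), 现代计算机(专业版) (Modern Computer, Professional Edition), no. 08, 15 March 2018 (2018-03-15) * |
Du Wei (杜炜), Li Jian (李剑), "Android malware detection based on semi-supervised learning and analysis of its malicious behavior" (基于半监督学习的安卓恶意软件检测及其恶意行为分析), 信息安全研究 (Information Security Research), no. 03, 5 March 2018 (2018-03-05) * |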
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116155630A (en) * | 2023-04-21 | 2023-05-23 | 北京邮电大学 (Beijing University of Posts and Telecommunications) | Malicious traffic identification method and related equipment |
CN116155630B (en) * | 2023-04-21 | 2023-07-04 | 北京邮电大学 (Beijing University of Posts and Telecommunications) | Malicious traffic identification method and related equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106991047B (en) | Method and system for predicting object-oriented software defects | |
US20170063893A1 (en) | Learning detector of malicious network traffic from weak labels | |
CN109871954B (en) | Training sample generation method, abnormality detection method and apparatus | |
CN113469366B (en) | Encrypted traffic identification method, device and equipment | |
CN110111113B (en) | Abnormal transaction node detection method and device | |
CN106817248A (en) | A kind of APT attack detection methods | |
CN103927483A (en) | Decision model used for detecting malicious programs and detecting method of malicious programs | |
CN112733146B (en) | Penetration testing method, device and equipment based on machine learning and storage medium | |
CN109145030B (en) | Abnormal data access detection method and device | |
CN110493262B (en) | Classification-improved network attack detection method and system | |
CN110879881B (en) | Mouse track recognition method based on feature component hierarchy and semi-supervised random forest | |
CN106203103A (en) | The method for detecting virus of file and device | |
CN110851817A (en) | Terminal type identification method and device | |
CN110287311A (en) | File classification method and device, storage medium, computer equipment | |
CN117081858A (en) | Intrusion behavior detection method, system, equipment and medium based on multi-decision tree | |
CN108241662A (en) | The optimization method and device of data mark | |
Jeon et al. | Faketalkerdetect: Effective and practical realistic neural talking head detection with a highly unbalanced dataset | |
CN111814147A (en) | Android malicious software detection method based on model library | |
CN110598794A (en) | Classified countermeasure network attack detection method and system | |
CN114285587B (en) | Domain name identification method and device and domain name classification model acquisition method and device | |
CN110197068B (en) | Android malicious application detection method based on improved grayish wolf algorithm | |
CN111224919B (en) | DDOS (distributed denial of service) identification method and device, electronic equipment and medium | |
CN112001424A (en) | Malicious software open set family classification method and device based on countermeasure training | |
CN114626048B (en) | Computer login system and method based on verification code identification | |
CN114513473B (en) | Traffic class detection method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |