CN111324893B

CN111324893B - Detection method and background system for android malicious software based on sensitive mode

Info

Publication number: CN111324893B
Application number: CN202010097459.3A
Authority: CN
Inventors: 廖丹; 陈锐; 黄畅; 李慧; 张明; 陈雪
Original assignee: Tianfu Co Innovation Center University Of Electronic Science And Technology Of China; University of Electronic Science and Technology of China
Current assignee: Tianfu Co Innovation Center University Of Electronic Science And Technology Of China; University of Electronic Science and Technology of China
Priority date: 2020-02-17
Filing date: 2020-02-17
Publication date: 2022-05-10
Anticipated expiration: 2040-02-17
Also published as: CN111324893A

Abstract

The invention discloses a sensitive-pattern-based detection method for Android malware and a background system. The detection method comprises: obtaining the APK file of the Android software to be detected, disassembling the APK file to extract permission data and API (application programming interface) call data, and filtering the extracted data to form a data sample; reading the sensitive pattern clusters constructed from the data samples of a plurality of Android software, and, taking the number of sensitive pattern clusters as the dimension, constructing the data sample into a feature vector based on existence and maximum inclusion degree; and inputting the feature vector into the trained malware detection model and outputting a detection result. The scheme also provides a background system of an application store in which the sensitive-pattern-based Android malware detection method is integrated.

Description

Detection method and background system for android malicious software based on sensitive mode

Technical Field

The invention relates to the field of communication security detection, in particular to a detection method and a background system of android malicious software based on a sensitive mode.

Background

With the rapid development of mobile communication technology, mobile communication devices such as smartphones and tablet computers are increasingly widely used. To provide a good user experience, a variety of mobile operating systems have emerged, with the Android system holding a large share of the market. According to a global smartphone operating system market research report issued by IDC (International Data Corporation), the Android system is far ahead with a market share of 86.7%.

In order to protect android mobile users from malicious software and create a secure and healthy mobile communication environment, researchers in academia and industry have proposed techniques and tools for detecting malicious software. The detection method is mainly divided into static analysis and dynamic analysis according to the content of the analysis.

Traditional static analysis is based on a signature mechanism: the detection system maintains a signature database of known malware, and when the signature of the software to be detected is found in the database, the software is judged to be malware. The disadvantage of this approach is that it cannot detect unknown malware or malware that uses obfuscation techniques.

To address this problem, more static features have been introduced into malware analysis. For example, malware usually has to request the corresponding permissions before performing malicious behaviors such as reading the contact list or sending short messages, so some research works propose permission-based detection methods that distinguish malware from normal software by comparing how the related permissions are used. API calls are the underlying implementation of an application's functional behavior and reflect its behavior characteristics to a great extent. By means of data flow analysis, the API calls that pose a high security threat to users can be identified, and these API calls help to recognize malware. In addition to permissions and API calls, some research has also analyzed Android application components, including Activity, Service, Broadcast Receiver, and Content Provider, to improve detection accuracy.

Some researchers believe that string features suffer from the curse of dimensionality and that structured features are better suited to processing massive data. For example, by constructing a Dalvik opcode graph and analyzing its topology, properties such as the number of nodes, graph probability density, and graph distance are used to characterize malware. In addition, a function call graph (FCG) can be generated from the call dependencies among methods, and malware can be classified and detected by computing the similarity between graphs.

Unlike static analysis, which directly parses the APK file, dynamic analysis has to run the application software and observe and collect its runtime data. The runtime environment is usually a controllable emulation platform on which interactions are carried out under near-real conditions. Taint analysis is a commonly used means in dynamic analysis. For example, a tool called TaintDroid uses the Android virtualization architecture to integrate taint propagation at four granularities (messages, variables, methods, and files), tracks multiple sources of sensitive information simultaneously, and identifies malicious behaviors of application software by monitoring sensitive data. To capture operating-system-level and Java-level semantics at the same time, some studies collect detailed native instruction traces and Dalvik instruction traces in order to track information leakage through Java and native components.

It has been found that most malware requires network connectivity when performing malicious activities, and therefore, analyzing network traffic is also an effective means for detecting malware. By analyzing the IP address, the port number and other connection information in the message, the difference between the malicious software and the normal software can be found. In addition, there are related researches for monitoring abnormal behaviors of application software from a hardware perspective, for example, a power consumption perception-based malware detection framework is proposed, which can generate a corresponding power signature according to a history of power consumption, and reduce detection overhead by adopting noise filtering and data compression techniques. By collecting the information related to the data such as CPU and memory during the operation of the system components, the malicious software can be found according to the abnormal situation of resource occupation.

Through analysis and comparison of the prior art, the scheme finds that these technologies have certain limitations. For example, in static analysis, modeling with a single feature often risks overfitting, while introducing too many features leads to excessive complexity and the curse of dimensionality. Although dynamic analysis can improve the generalization capability of the detection model to some extent, its resource cost is relatively high because it requires executing the application software.

Disclosure of Invention

In view of the above shortcomings of the prior art, the invention provides a sensitive-pattern-based Android malware detection method and a background system that can detect software with high precision without launching the software.

In order to achieve the purpose of the invention, the invention adopts the technical scheme that:

In a first aspect, a sensitive-pattern-based method for detecting Android malware is provided, which includes:

acquiring the APK file of the Android software to be detected, disassembling the APK file to extract permission data and API call data, and filtering the extracted data to form a data sample;

reading the sensitive pattern clusters constructed from the data samples of a plurality of Android software, and, taking the number of sensitive pattern clusters as the dimension, constructing the data sample into a feature vector based on existence and maximum inclusion degree;

and inputting the feature vector into the trained malware detection model, and outputting a detection result.

Further, the method for constructing the malware detection model comprises the following steps:

acquiring a plurality of Android software, disassembling their APK files to extract permission data and API call data, and filtering the extracted data, so that each Android software forms one data sample;

extracting all frequent item sets in the transaction data set formed by all data samples, each frequent item set serving as a sensitive pattern;

calculating the Jaro distance of any two sensitive patterns as their text similarity and the cosine similarity of any two sensitive patterns as their support similarity, and then taking each sensitive pattern as a cluster;

calculating the similarity between two clusters based on the text similarity and the support similarity of the sensitive patterns;

judging whether the maximum similarity is smaller than a set threshold; if not, merging the two clusters with the maximum similarity into one cluster and returning to the previous step; otherwise, taking all current clusters as the sensitive pattern clusters and entering the next step;

constructing, for each data sample, a feature vector based on existence and maximum inclusion degree, with the number of sensitive pattern clusters as the dimension;

and training a multi-layer gradient boosting decision tree with the training set formed by all the feature vectors to obtain the malware detection model.

In a second aspect, a background system of an application store is provided, which includes a detection method based on sensitive pattern android malware, and the detection method is integrated in the background system.

The beneficial effects of the invention are as follows: the detection method reveals the difference between malware and normal software from the perspective of sensitive permissions and API calls, constructs each data sample into a feature vector by means of the sensitive pattern clusters, and detects malware with the constructed malware detection model; the application software does not need to be executed during detection, which reduces the resource cost.

In addition, the malware detection model constructed by the scheme is formed by training the feature vectors constructed by the data samples based on the sensitive pattern clusters, and has high precision and good generalization capability, so that the detection accuracy in malware detection can be ensured.

Drawings

FIG. 1 is a flow chart of a method for detecting android malware in a sensitive mode.

FIG. 2 is a flow chart of a method of constructing a malware detection model.

FIG. 3 is a training process of a multi-level gradient boosting decision tree.

Fig. 4 shows the support of different sensitive patterns (in part) in malware and normal software.

Fig. 5 is a comparison graph of the detection performance of the detection method of the present embodiment and various detection methods in the prior art.

Detailed Description

The following description of the embodiments of the present invention is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments; to those skilled in the art, various changes are apparent as long as they remain within the spirit and scope of the invention as defined and determined by the appended claims, and everything created using the inventive concept is protected.

Referring to FIG. 1, FIG. 1 shows a flow chart of the sensitive-pattern-based Android malware detection method; as shown in FIG. 1, the method 100 includes steps 101 to 103.

In step 101, the APK file of the Android software to be detected is obtained, the APK file is disassembled to extract permission data and API call data, and the extracted data is then filtered to form a data sample.

Android software has numerous static features; considering the complexity and resource cost of the method, only its permission and API call information is analyzed. In the scheme, Apktool (a reverse engineering tool) is used to disassemble the APK file of the Android software, and the permission data and API call data are extracted from the AndroidManifest.xml file and the smali files, respectively.
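The extraction described above might look roughly as follows, assuming the APK has already been decoded by Apktool so that the manifest is plain XML and the bytecode is available as smali text. The function names and the regular expression are illustrative assumptions, not part of the scheme.

```python
import re
import xml.etree.ElementTree as ET

# Namespace that prefixes android:* attributes in a decoded manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Matches smali invoke instructions such as
#   invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
INVOKE_RE = re.compile(r"invoke-\S+\s+\{[^}]*\},\s*(L[^;]+;)->([^\s(]+)\(")


def permissions_from_manifest(manifest_xml):
    """Return the short names of all permissions requested in the manifest."""
    root = ET.fromstring(manifest_xml)
    names = (e.get(ANDROID_NS + "name") for e in root.iter("uses-permission"))
    return {n.rsplit(".", 1)[-1] for n in names if n}


def api_calls_from_smali(smali_text):
    """Return the method names invoked in a piece of smali code, as 'name()'."""
    return {m.group(2) + "()" for m in INVOKE_RE.finditer(smali_text)}


# Hypothetical inputs, shaped like Apktool output.
manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

smali = """
    invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
    invoke-virtual {v1}, Landroid/net/ConnectivityManager;->getActiveNetworkInfo()Landroid/net/NetworkInfo;
"""

extracted = permissions_from_manifest(manifest) | api_calls_from_smali(smali)
print(sorted(extracted))
```

Representing each sample as a set of permission names and `name()` strings keeps it directly comparable with the sensitive patterns quoted later in the description.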

In implementation, the extracted data is preferably filtered as follows:

acquiring the dangerous permissions published on the official Android website and the sensitive API list provided by SuSi, and taking them together as a standard database;

and comparing the extracted data with the data in the standard database, and deleting the data not found in the standard database to form a data sample.

The dangerous permissions published on the official Android website and the sensitive API list provided by SuSi cover multiple aspects such as network connection, phone state, contact lists, short messages, mail, account information, and geographic location, which ensures comprehensive coverage by the standard database.

By deleting part of the data in the above way, redundant information and noise in the extracted data can be removed, so as to reduce the complexity of analysis.
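A minimal sketch of this filtering step, assuming each sample is a set of strings. The entries of the standard database below are placeholders chosen for illustration and have not been checked against the actual dangerous-permission list or the SuSi list.

```python
# Placeholder excerpts standing in for the real lists (assumption).
DANGEROUS_PERMISSIONS = {"READ_PHONE_STATE", "SEND_SMS", "READ_CONTACTS"}
SENSITIVE_APIS = {"getDeviceId()", "getLine1Number()", "getActiveNetworkInfo()"}
STANDARD_DATABASE = DANGEROUS_PERMISSIONS | SENSITIVE_APIS


def filter_sample(extracted, standard=STANDARD_DATABASE):
    """Keep only the items that appear in the standard database."""
    return frozenset(extracted) & standard


raw = {"SEND_SMS", "VIBRATE", "getDeviceId()", "toString()"}
print(sorted(filter_sample(raw)))
```

Items such as `VIBRATE` or `toString()` are dropped because they are absent from the placeholder database, which is the noise reduction the paragraph above describes.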

In step 102, a sensitive pattern cluster constructed based on a plurality of android software data samples is read (the sensitive pattern cluster is constructed by steps S1 to S6 in the method for constructing the malware detection model), and the data samples are constructed into feature vectors based on existence and maximum inclusion degrees by taking the number of the sensitive pattern cluster as a dimension.

In step 103, the feature vector is input into the trained malware detection model, and the detection result is output.

Referring to FIG. 2, FIG. 2 shows a flow chart of a method of building a malware detection model; as shown in fig. 2, the method S includes steps S1 to S7.

In step S1, a plurality of Android software is acquired, their APK files are disassembled to extract permission data and API call data, and the extracted data is then filtered, so that each Android software forms one data sample.

Step S1 is implemented in the same way as step 101 and is not described again here.

In step S2, all frequent item sets in the transaction data set formed by all data samples are extracted, each frequent item set serving as a sensitive pattern; the frequent item sets can be extracted with the existing Apriori algorithm or FP-growth algorithm.
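To make the notion of a frequent item set concrete, here is a small sketch of the classic Apriori algorithm over samples represented as sets. It is a textbook version for illustration only, with a hypothetical function name, and not the extraction method proposed by the scheme.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}

    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


samples = [
    {"INTERNET", "SEND_SMS", "getDeviceId()"},
    {"INTERNET", "SEND_SMS"},
    {"INTERNET", "getDeviceId()"},
    {"SEND_SMS", "getDeviceId()"},
    {"INTERNET", "SEND_SMS", "getDeviceId()"},
]
for itemset, sup in sorted(apriori(samples, 0.5).items(), key=lambda kv: len(kv[0])):
    print(sorted(itemset), sup)
```

In this toy data every single item and every pair is frequent at a minimum support of 0.5, while the full triple appears in only two of five samples and is discarded.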

The scheme also provides a new method for extracting the frequent item sets in the transaction data set, which comprises the following steps:

A1, traversing the transaction data set D, and generating the corresponding FP tree and head pointer table T according to the minimum support;

A2, for each element item k in the head pointer table T, adding the element item k to the set Q, then adding the element item k to the frequent item set list L, and obtaining from the FP tree the conditional pattern base B_k corresponding to the element item k;

A3, traversing the conditional pattern base B_k, and recording the element items in it and their count values in the head pointer table t_k corresponding to the element item k;

A4, deleting from the head pointer table t_k the element items whose conditional frequency is less than the minimum support or equal to the conditional support;

A5, if the head pointer table t_k is not empty, going to step A6, otherwise going to step A9;

A6, filtering and sorting the conditional pattern base B_k according to the head pointer table t_k;

A7, traversing the updated conditional pattern base B_k, and generating the closed conditional FP tree corresponding to the element item k;

A8, updating the FP tree with the closed conditional FP tree, updating the head pointer table T with the head pointer table t_k, and returning to step A2;

A9, outputting the frequent item set list L.

The novel method for extracting a plurality of frequent item sets in the transaction data set can greatly reduce the scale of a search space through a pruning strategy, improve the efficiency of finding effective frequent item sets, and reduce the number of frequent item sets to a great extent.

The scheme applies this new extraction method to mine sensitive patterns from malware and normal software, and finds obvious differences between the two in different combination patterns. As can be seen in Fig. 4, some sensitive patterns have significantly higher support in malware than in normal software, for example:

{READ_PHONE_STATE, INTERNET, SEND_SMS, getDeviceId(), getActiveNetworkInfo()} has a support of 0.75 in malware and 0.42 in normal software. Other sensitive patterns show the opposite: {ACCESS_NETWORK_STATE, BLUETOOTH, getMessage()} has a support of up to 0.71 in normal software but only 0.36 in malware. To construct features that effectively distinguish malware from normal software, the scheme removes sensitive patterns whose support is similar in the two types of software.

A sensitive pattern in this scheme is a combination of sensitive permissions and API calls that frequently appear in malware or in normal software.
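The removal of patterns with similar support in the two classes could be sketched as below. The text gives no criterion for "similar", so the `min_gap` parameter and the function names are purely hypothetical.

```python
def class_support(pattern, samples):
    """Fraction of samples that contain every item of the pattern."""
    return sum(pattern <= s for s in samples) / len(samples)


def discriminative_patterns(patterns, malware, benign, min_gap):
    """Keep patterns whose support differs by at least min_gap between classes.

    min_gap is an assumed parameter; the scheme does not state how
    'similar support' is decided.
    """
    kept = []
    for p in patterns:
        sup_a = class_support(p, malware)  # support in the malware class
        sup_b = class_support(p, benign)   # support in the normal software class
        if abs(sup_a - sup_b) >= min_gap:
            kept.append((p, sup_a, sup_b))
    return kept


malware = [{"INTERNET", "SEND_SMS"}, {"INTERNET", "SEND_SMS"},
           {"INTERNET"}, {"INTERNET", "SEND_SMS"}]
benign = [{"INTERNET"}, {"INTERNET", "BLUETOOTH"}, {"INTERNET"}, {"BLUETOOTH"}]
patterns = [frozenset({"INTERNET"}), frozenset({"INTERNET", "SEND_SMS"})]

for p, sup_a, sup_b in discriminative_patterns(patterns, malware, benign, 0.3):
    print(sorted(p), sup_a, sup_b)
```

The pair of supports returned for each kept pattern is the same two-component quantity that the support similarity in step S3 is computed from.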

In step S3, the Jaro distance of any two sensitive modes is calculated as the text similarity and the cosine similarity of any two sensitive modes is calculated as the support similarity, and then each sensitive mode is taken as a cluster.

In implementation, the Jaro distance of any two sensitive patterns is preferably calculated as follows:

where d_Jaro^ij is the Jaro distance between sensitive pattern sp_i and sensitive pattern sp_j; m_ij is the number of words matched between sp_i and sp_j; |s_1| and |s_2| are the numbers of words in sp_i and sp_j, respectively.

For example, for the sensitive pattern {ACCESS_NETWORK_STATE, getDeviceId(), getLine1Number(), getMessage(), getText()} and the sensitive pattern {ACCESS_NETWORK_STATE, getDeviceId(), getMessage(), getText(), INTERNET}, the number of matched words is 4, and their Jaro distance is:
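As a rough illustration of word-level Jaro matching, the sketch below assumes the standard Jaro similarity applied to words instead of characters, including its usual matching window and transposition term, neither of which appears in the definitions above. The function name is hypothetical.

```python
def jaro_similarity(s1, s2):
    """Standard Jaro similarity over two sequences of words (assumed definition)."""
    if not s1 and not s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Two words match if they are equal and at most `window` positions apart.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    m = 0
    for i, word in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == word:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: matched words that appear in a different order.
    seq1 = [w for w, flag in zip(s1, matched1) if flag]
    seq2 = [w for w, flag in zip(s2, matched2) if flag]
    t = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3.0


sp_i = ["ACCESS_NETWORK_STATE", "getDeviceId()", "getLine1Number()",
        "getMessage()", "getText()"]
sp_j = ["ACCESS_NETWORK_STATE", "getDeviceId()", "getMessage()",
        "getText()", "INTERNET"]
print(round(jaro_similarity(sp_i, sp_j), 3))
```

Under this assumed definition the two example patterns have 4 matched words and no transpositions, giving (4/5 + 4/5 + 4/4) / 3, or about 0.867. The value intended by the scheme may differ if its formula differs from the standard one.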

The cosine similarity of any two sensitive patterns sp_i and sp_j is calculated from the support vector of sp_i and the support vector of sp_j, where sup^a_i and sup^b_i respectively denote the support of the sensitive pattern sp_i in malware and in normal software; the symbol a represents the malware class and b represents the normal software class.
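The support similarity can be illustrated with the ordinary cosine similarity of two vectors, assuming that the support vector of a pattern is the pair (support in malware, support in normal software). That pairing is an interpretation of the definitions above, and the function name is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero support vectors
    return dot / (norm_u * norm_v)


# Support values quoted earlier in the description, as (malware, normal).
sup_1 = (0.75, 0.42)  # {READ_PHONE_STATE, INTERNET, SEND_SMS, ...}
sup_2 = (0.36, 0.71)  # {ACCESS_NETWORK_STATE, BLUETOOTH, getMessage()}
print(cosine_similarity(sup_1, sup_2))
```

Two patterns whose supports lean toward the same class yield a value close to 1, while patterns that lean toward opposite classes yield a lower value.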

According to the scheme, the similarity in the step S4 is calculated by combining the two aspects of text similarity and support similarity, so that the stability of the malware detection model obtained by later training is higher, and the detection is more accurate when the malware detection model is applied to malware detection.

In step S4, the similarity between two clusters is calculated based on the text similarity and the support similarity of the two sensitive patterns:

sim_max(C_i,C_j)＝max({sim(sp_i,sp_j)|sp_i∈C_i,sp_j∈C_j})

sim_min(C_i,C_j)＝min({sim(sp_i,sp_j)|sp_i∈C_i,sp_j∈C_j})

where sim(sp_i,sp_j) is the similarity between sensitive pattern sp_i and sensitive pattern sp_j; w is a weight in [0,1]; C_i and C_j are both clusters; sim(C_i,C_j) is the similarity between C_i and C_j; sim_max(C_i,C_j) is the maximum similarity between C_i and C_j; sim_min(C_i,C_j) is the minimum similarity between C_i and C_j; and the two quantities combined in sim(sp_i,sp_j) are the Jaro distance and the cosine similarity between sp_i and sp_j.

In step S5, it is determined whether the maximum similarity is smaller than a set threshold; if not, the two clusters with the maximum similarity are merged into one cluster and the method returns to step S4; otherwise, all current clusters are taken as the sensitive pattern clusters and the method proceeds to step S6.
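The merge loop of steps S3 to S5 can be sketched as a simple agglomerative procedure. How sim_max and sim_min are combined into sim(C_i,C_j) is not recoverable from this text, so the sketch takes the combination as a parameter and defaults to their mean, which is only an assumption. The pairwise similarity is also passed in, and a Jaccard overlap is used below purely as a stand-in.

```python
from itertools import combinations


def cluster_patterns(patterns, pair_sim, threshold,
                     combine=lambda s_max, s_min: 0.5 * (s_max + s_min)):
    """Merge the most similar clusters until the best similarity < threshold.

    `combine` is a hypothetical stand-in for the cluster similarity formula.
    """
    clusters = [[p] for p in patterns]  # S3: every pattern starts as a cluster
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):  # S4
            sims = [pair_sim(x, y) for x in clusters[a] for y in clusters[b]]
            s = combine(max(sims), min(sims))
            if s > best:
                best, best_pair = s, (a, b)
        if best < threshold:  # S5: stop merging
            break
        a, b = best_pair      # a < b, so deleting index b keeps a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def jaccard(x, y):
    """Toy pairwise similarity used only for this demonstration."""
    return len(x & y) / len(x | y)


toy = [frozenset({"A", "B", "C"}), frozenset({"A", "B", "D"}),
       frozenset({"X", "Y"}), frozenset({"X", "Y", "Z"})]
for cluster in cluster_patterns(toy, jaccard, threshold=0.4):
    print([sorted(p) for p in cluster])
```

With the toy data the two overlapping pairs are merged and the loop then stops, because the similarity between the two remaining clusters is 0, below the threshold.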

In step S6, a feature vector based on existence and maximum inclusion degree is constructed for each data sample, with the number of sensitive pattern clusters as the dimension. Step S6 is implemented as follows:

To represent each Android software sample explicitly, the scheme constructs a feature vector based on existence and maximum inclusion degree. The feature dimension is the number of sensitive pattern clusters. If the data sample contains any pattern of a cluster, the corresponding feature value is 1; otherwise, the inclusion degree of the data sample with respect to each pattern in the cluster is calculated, and the maximum inclusion degree is taken as the feature value. Assuming the set of permissions and API calls in the data sample is PA_h, the feature vector of the data sample is constructed as follows:

V_h＝{v_h1,v_h2...v_hi...v_hn}

v_hi＝1 if some sp_j∈C_i satisfies sp_j⊆PA_h; otherwise v_hi＝max({inclu(sp_j,PA_h)|sp_j∈C_i})

inclu(sp_j,PA_h)＝|sp_j∩PA_h|/|sp_j|

where V_h is the feature vector corresponding to the h-th data sample; v_hi is the i-th element of V_h; n is the number of sensitive pattern clusters; PA_h is the h-th data sample; sp_j is a sensitive pattern; |sp_j| is the number of elements in the sensitive pattern sp_j; |sp_j∩PA_h| is the number of element items common to sp_j and PA_h; inclu(sp_j,PA_h) is the inclusion degree, that is, the proportion of the element items common to sp_j and PA_h in the total number of element items of sp_j.
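The construction of the feature vector follows directly from the rule stated in step S6 and can be sketched as below, with patterns and samples represented as sets. The function names are illustrative.

```python
def inclusion(pattern, sample):
    """Share of the pattern's items that also occur in the sample."""
    return len(pattern & sample) / len(pattern)


def feature_vector(sample, clusters):
    """One feature per cluster: 1 if any pattern is fully contained,
    otherwise the maximum inclusion degree over the cluster's patterns."""
    vector = []
    for cluster in clusters:
        if any(p <= sample for p in cluster):
            vector.append(1.0)
        else:
            vector.append(max(inclusion(p, sample) for p in cluster))
    return vector


clusters = [
    [frozenset({"INTERNET", "SEND_SMS"})],
    [frozenset({"BLUETOOTH", "getMessage()"}),
     frozenset({"ACCESS_NETWORK_STATE", "BLUETOOTH", "getMessage()", "INTERNET"})],
    [frozenset({"READ_CONTACTS", "getDeviceId()"})],
]
sample = {"INTERNET", "SEND_SMS", "getDeviceId()"}
print(feature_vector(sample, clusters))
```

The resulting vector has one entry per cluster, so its length stays fixed no matter how many permissions or API calls a sample contains.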

In step S7, a training set composed of all feature vectors is used to train the multi-layer gradient boosting decision tree, so as to obtain a malware detection model.

The multi-layer gradient boosting decision tree (mGBDT) is a hierarchical model with strong feature learning ability, formed by stacking several regression GBDT layers as building blocks and training them jointly with a variant of target propagation. For each GBDT layer, the mapping F_i:o_i-1→o_i (o_i denotes the output of the i-th layer) has, at the t-th iteration, a corresponding pseudo-inverse mapping G_i, which can be obtained by minimizing the inverse loss function.

For the inverse loss function, injecting Gaussian noise ε into the output o_i-1 enhances the robustness and generalization capability of the model.

FIG. 3 shows the training process of the multi-layer gradient boosting decision tree. The whole process comprises multiple iterations. In each iteration, the pseudo-inverse mapping G_i of each layer is updated in sequence from beginning to end and the corresponding pseudo label z_i-1 is calculated; then, based on the forward loss function L_i (L_i＝||F_i(o_i-1)-z_i||), the mapping F_i is updated in sequence from front to back and the new output o_i of each layer is obtained. After the specified number of iterations, the construction of each GBDT layer is complete.

In addition, the scheme also provides a background system of the application store, which comprises a detection method of android malicious software based on a sensitive mode, wherein the detection method is integrated in the background system.

In order to verify the effectiveness of the method proposed by the invention, the results thereof are analyzed by means of relevant experiments as follows:

Data set and experiment platform

In the experiment, the malware samples of the data set come from VirusShare and number 8183. The scheme also downloaded 9058 normal software samples from several official application stores such as *** play and 360 assistants; to guarantee the quality of the data set, VirusTotal was used to perform secondary verification on the downloaded normal software, and 8745 normal software samples were finally used in the experiments.

All experiments in the scheme were completed on one PC equipped with a dual-core 3.7 GHz processor and 8 GB of memory, running Windows 10 (64-bit).

First, the effectiveness of the new method for extracting frequent item sets from the transaction data set (the new extraction method of the scheme) is evaluated:

Its performance is compared with that of the conventional FP-growth algorithm. In the data set used in the scheme, the number of sensitive permissions and API calls contained in each sample varies, ranging from a few to several hundred. Table 1 shows the number of frequent item sets mined and the mining time of the two methods at different minimum supports.

TABLE 1 Performance comparison of the new extraction method of the present scheme with the conventional FP-growth algorithm

As can be seen from table 1, due to the addition of the pruning strategy, the number of frequent item sets mined by the new extraction method according to the scheme is less than that of the conventional FP-growth algorithm, and the difference is more obvious along with the reduction of the minimum support degree. Therefore, the new extraction method of the scheme can greatly reduce the generation of redundant mode information. In addition, the mining efficiency of the new extraction method is higher than that of the traditional FP-growth algorithm, and the mining of frequent item sets can be completed in a shorter time.

Multi-layer gradient boosting decision tree performance

In the scheme, a multi-layer gradient boosting decision tree (mGBDT) is used to train the detection model. To evaluate its performance, it is compared with traditional machine learning algorithms such as the support vector machine (SVM), decision tree (DT), and random forest (RF), as well as with XGBoost, which is highly competitive in the field of ensemble learning. The evaluation metrics are accuracy, precision, and recall.

As can be seen from FIG. 5, the multi-layer gradient boosting decision tree clearly outperforms the other algorithms, in particular the support vector machine, decision tree, and random forest, over which its accuracy is 3% to 6% higher. Although XGBoost achieves higher accuracy and recall, mGBDT achieves higher precision, which is also important for a malware detection system.

The beneficial effects and applications of the technology of the invention are as follows: the sensitive-pattern-based Android malware detection method reveals the difference between malware and normal software from the perspective of sensitive permissions and API calls; the new extraction method quickly obtains sensitive patterns that effectively distinguish malware from normal software; and the detection model built with the multi-layer gradient boosting decision tree algorithm has high precision and good generalization capability.

The technology can be integrated in a background system of an application store in practical application, detection and evaluation are carried out on application software to be put on shelf, and high-risk application release is prohibited.

Claims

1. The detection method of the android malicious software based on the sensitive mode is characterized by comprising the following steps:

reading a sensitive mode cluster constructed by data samples based on a plurality of android software, and constructing the data samples into feature vectors based on existence and maximum inclusion degree by taking the number of the sensitive mode cluster as dimensionality:

wherein V_h is the feature vector corresponding to the h-th data sample; v_hi is the i-th element of V_h; n is the number of sensitive pattern clusters; PA_h is the h-th data sample; sp_j is a sensitive pattern; |sp_j| is the number of elements in the sensitive pattern sp_j; |sp_j∩PA_h| is the number of element items common to sp_j and PA_h; inclu(sp_j,PA_h) is the inclusion degree, that is, the proportion of the element items common to sp_j and PA_h in the total number of element items of sp_j;

inputting the feature vector into a trained malicious software detection model, and outputting a detection result;

the method for acquiring the sensitive mode cluster comprises the following steps:

and judging whether the maximum similarity is smaller than a set threshold value or not, if not, combining the two clusters with the maximum similarity into one cluster, returning to the previous step, and otherwise, taking all the current clusters as sensitive mode clusters.

2. The method for detecting android malware based on sensitive patterns according to claim 1, wherein the method for constructing the malware detection model comprises:

acquiring a sensitive mode cluster;

3. The method for detecting android malware based on sensitive patterns as claimed in claim 2, wherein the calculation formula of the Jaro distance of any two sensitive patterns is:

wherein d_Jaro^ij is the Jaro distance between sensitive pattern sp_i and sensitive pattern sp_j; m_ij is the number of words matched between sp_i and sp_j; |s_1| and |s_2| are respectively the numbers of words in sp_i and sp_j.

4. The method for detecting android malware based on sensitive patterns according to claim 2, wherein a calculation formula of cosine similarity of any two sensitive patterns is as follows:

wherein the formula relates the cosine similarity between sensitive pattern sp_i and sensitive pattern sp_j to the support vector of sp_i and the support vector of sp_j; sup^a_i and sup^b_i respectively represent the support of the sensitive pattern sp_i in malware and in normal software, the symbol a representing the malware class and b representing the normal software class.

5. The method for detecting android malware based on sensitive patterns as claimed in claim 2, wherein the calculation formula of the similarity between two clusters is:

wherein sim(sp_i,sp_j) is the similarity between sensitive pattern sp_i and sensitive pattern sp_j; w is a weight in [0,1]; C_i and C_j are both clusters; sim(C_i,C_j) is the similarity between C_i and C_j; sim_max(C_i,C_j) is the maximum similarity between C_i and C_j; sim_min(C_i,C_j) is the minimum similarity between C_i and C_j; and the two quantities combined in sim(sp_i,sp_j) are the Jaro distance and the cosine similarity between sp_i and sp_j.

6. The method for detecting android malware based on sensitive patterns as claimed in any one of claims 1 to 5, wherein the method for extracting a plurality of frequent item sets in a transaction data set comprises:

A1, traversing the transaction data set D, and generating the corresponding FP tree and head pointer table T according to the minimum support;

A2, for each element item k in the head pointer table T, adding the element item k to the set Q, then adding the element item k to the frequent item set list L, and obtaining from the FP tree the conditional pattern base B_k corresponding to the element item k;

A3, traversing the conditional pattern base B_k, and recording the element items in it and their count values in the head pointer table t_k corresponding to the element item k;

A4, deleting from the head pointer table t_k the element items whose conditional frequency is less than the minimum support or equal to the conditional support;

A5, if the head pointer table t_k is not empty, going to step A6, otherwise going to step A9;

A6, filtering and sorting the conditional pattern base B_k according to the head pointer table t_k;

A7, traversing the updated conditional pattern base B_k, and generating the closed conditional FP tree corresponding to the element item k;

A8, updating the FP tree with the closed conditional FP tree, updating the head pointer table T with the head pointer table t_k, and returning to step A2;

A9, outputting the frequent item set list L.

7. The method for detecting android malware based on sensitive patterns of any one of claims 1-5, wherein Apriori algorithm and FP-growth algorithm are adopted to extract a plurality of frequent item sets in transaction data sets.

8. The method for detecting android malware based on sensitive patterns as claimed in any one of claims 1 to 5, wherein the method for filtering the extracted data is implemented as follows:

and comparing the extracted data with the data in the standard database, and deleting the data which are not located in the standard database.

9. A backend system of an application store, comprising the detection method of the sensitive-pattern-based android malware according to any one of claims 1 to 8, the detection method being integrated in the backend system.