CN112651024A

CN112651024A - Method, device and equipment for malicious code detection

Info

Publication number: CN112651024A
Application number: CN202011593644.8A
Authority: CN
Inventors: 杨吉云; 张恒; 周洁; 向涛; 钟世刚
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2021-04-13

Abstract

The application relates to the technical field of communication security and discloses a method for malicious code detection, comprising: acquiring a first function call graph of an APK file to be tested; acquiring a first sensitive API call sequence of the APK file to be tested according to the first function call graph; acquiring a first feature vector of the APK file to be tested according to the first sensitive API call sequence; and inputting the first feature vector into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Compared with the prior art, the method takes the function call relationships of the APK file to be tested into account when detecting malicious code, and can therefore detect Android malicious code more accurately. The application also discloses a device and equipment for malicious code detection.

Description

Method, device and equipment for malicious code detection

Technical Field

The present application relates to the technical field of communication security, and for example, to a method, an apparatus, and a device for malicious code detection.

Background

At present, with the rapid development of the mobile internet and the popularization of smart devices, malicious code poses a growing threat to the system security of smart devices and to the information security of their users. In recent years, the Android system has come to dominate the smart terminal market, and owing to its openness it now holds a market share of more than eighty percent. The Android platform has therefore become a main target of malware attacks. Malicious applications are created to carry out different types of attacks, such as stealing private user information, sending messages without user permission, and enticing users to visit malicious websites, all of which pose a serious threat to smartphone users. How to accurately detect Android malicious code and protect user privacy has become a hot topic in recent years.

In the process of implementing the embodiments of the present disclosure, it is found that at least the following problems exist in the related art:

In the prior art, the call relationships between functions in an application program are not considered, so the accuracy of detecting Android malicious code is low.

Disclosure of Invention

The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview nor is intended to identify key/critical elements or to delineate the scope of such embodiments but rather as a prelude to the more detailed description that is presented later.

The embodiment of the disclosure provides a method, a device and equipment for malicious code detection, so that the accuracy of Android malicious code detection can be improved.

In some embodiments, the method for malicious code detection comprises:

acquiring a first function call graph of an APK file to be tested;

acquiring a first sensitive API calling sequence of the APK file to be tested according to the first function calling graph;

acquiring a first feature vector of the APK file to be tested according to the first sensitive API calling sequence;

and inputting the first feature vector into a preset malicious code detection model to obtain a detection result of whether the APK file to be detected is a malicious code.

In some embodiments, the apparatus for malicious code detection comprises a processor and a memory storing program instructions, the processor being configured to, when executing the program instructions, perform the method for malicious code detection as described above.

In some embodiments, the equipment includes the above-described apparatus for malicious code detection.

The method, device and equipment for malicious code detection provided by the embodiments of the present disclosure can achieve the following technical effects: a first function call graph (FCG) of an APK file to be tested is acquired, a first sensitive API call sequence is acquired according to the first function call graph, and a first feature vector of the APK file to be tested is acquired according to the first sensitive API call sequence; the first feature vector is then input into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Compared with the prior art, the function call relationships of the APK file to be tested are taken into account during malicious code detection, so Android malicious code can be detected more accurately.

The foregoing general description and the following description are exemplary and explanatory only and are not restrictive of the application.

Drawings

One or more embodiments are illustrated by way of example in the corresponding accompanying drawings. These illustrations do not limit the embodiments. Elements bearing the same reference numerals in the drawings denote like elements, and wherein:

FIG. 1 is a schematic diagram of a method for malicious code detection provided by an embodiment of the present disclosure;

FIG. 2 is a diagram illustrating a process of reconstructing a second function call graph according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a second sensitive API call graph provided by embodiments of the present disclosure;

FIG. 4 is a schematic diagram of an apparatus for malicious code detection according to an embodiment of the present disclosure.

Detailed Description

So that the manner in which the features and elements of the disclosed embodiments can be understood in detail, a more particular description of the disclosed embodiments, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. In the following description of the technology, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may be practiced without these details. In other instances, well-known structures and devices may be shown in simplified form in order to simplify the drawing.

The terms "first," "second," and the like in the description and in the claims, and the above-described drawings of embodiments of the present disclosure, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the present disclosure described herein may be made. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.

The term "plurality" means two or more unless otherwise specified.

In the embodiments of the present disclosure, the character "/" indicates that the preceding and following objects are in an "or" relationship. For example, A/B represents: A or B.

The term "and/or" describes an association between objects and indicates that three relationships may exist. For example, A and/or B represents: A alone, B alone, or both A and B.

As shown in FIG. 1, an embodiment of the present disclosure provides a method for malicious code detection, including:

step S101, acquiring a first Function Call Graph (FCG) of an APK (Android application package) file to be tested;

step S102, acquiring a first sensitive API (Application Programming Interface) calling sequence of the APK file to be tested according to the first function calling graph;

step S103, acquiring a first feature vector of the APK file to be tested according to the first sensitive API calling sequence;

and step S104, inputting the first feature vector into a preset malicious code detection model, and obtaining a detection result of whether the APK file to be detected is a malicious code.

With the method for malicious code detection provided by the embodiments of the present disclosure, the first function call graph (FCG) of the APK file to be tested is acquired, the first sensitive API call sequence is acquired according to the first function call graph, and the first feature vector of the APK file to be tested is acquired according to the first sensitive API call sequence; the first feature vector is then input into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Compared with the prior art, the function call relationships of the APK file to be tested are taken into account during malicious code detection, so Android malicious code can be detected more accurately.

Optionally, the FlowDroid tool is called to extract the first function call graph (FCG) from the APK file to be tested. Optionally, the first function call graph consists of the function names in the APK file to be tested and the call relationships between those functions. Optionally, a node in the first function call graph is a function in the APK file to be tested. Optionally, the functions comprise one or more of: a custom function of the APK file to be tested, a function interface described in the official Android documentation, a Google API, and the like. Optionally, an edge in the first function call graph is a call from one function to another function.

Optionally, acquiring the first sensitive API call sequence of the APK file to be tested according to the first function call graph includes: matching the nodes in the first function call graph against the API nodes in a preset sensitive API set; determining a node in the first function call graph to be a first sensitive API node if it is the same as an API node in the sensitive API set, or determining it to be a first non-sensitive API node if it differs from every API node in the sensitive API set; deleting the first non-sensitive API nodes and their corresponding edges to obtain a first sensitive API call graph of the APK file to be tested; and, where at least one edge exists between first sensitive API nodes in the first sensitive API call graph, adding a directed edge between the two corresponding first sensitive API nodes in the first sensitive API call graph to obtain the first sensitive API call sequence. In this way, the first non-sensitive API nodes, which are irrelevant to malicious behavior, and their corresponding edges are deleted to obtain the first sensitive API call graph of the APK file to be tested. Compared with the first function call graph, the first sensitive API call graph has far fewer nodes and edges, so the efficiency of malicious code detection can be improved. In addition, the extracted first sensitive API call sequence is related to suspicious behavior, such as malicious code, in the APK file to be tested, so it is a more effective feature, and using it as a feature can improve the accuracy of malicious code detection.
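The graph reduction described above can be illustrated with a minimal sketch. It rests on assumptions that the disclosure does not state: the function call graph is represented as a plain dictionary from caller name to callee names, and a directed edge is added between two sensitive API nodes whenever the second is reachable from the first through non-sensitive nodes only. The function name `build_sensitive_call_graph` and the sample API names are hypothetical.

```python
from collections import deque


def build_sensitive_call_graph(fcg, sensitive_apis):
    """Return the set of directed edges between sensitive API nodes.

    fcg: dict mapping a caller name to an iterable of callee names
         (assumed representation of the function call graph).
    sensitive_apis: iterable of names in the preset sensitive API set.
    Assumption: an edge (a, b) is added when b can be reached from a
    along a path whose intermediate nodes are all non-sensitive.
    """
    sensitive = set(sensitive_apis)
    nodes = set(fcg) | {callee for callees in fcg.values() for callee in callees}
    edges = set()
    for start in nodes & sensitive:
        seen = {start}
        queue = deque(fcg.get(start, ()))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            if node in sensitive:
                # Stop at the first sensitive node met on this path.
                edges.add((start, node))
            else:
                # Non-sensitive nodes are skipped over, not kept.
                queue.extend(fcg.get(node, ()))
    return edges


example_fcg = {
    "main": ["getDeviceId", "helper"],
    "getDeviceId": ["wrap"],
    "wrap": ["sendTextMessage"],
    "helper": ["log"],
}
example_edges = build_sensitive_call_graph(
    example_fcg, {"getDeviceId", "sendTextMessage"}
)
```

In this toy graph the non-sensitive node `wrap` is removed and a single reconstructed edge from `getDeviceId` to `sendTextMessage` remains.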

Optionally, acquiring the first feature vector of the APK file to be tested according to the first sensitive API call sequence includes: for each first sensitive API call sequence, judging whether it appears in the APK file to be tested; if so, setting the feature value corresponding to that first sensitive API call sequence to "1", and otherwise setting it to "0", thereby obtaining the first feature vector.

Optionally, before the inputting the first feature vector into the preset malicious code detection model, the method further includes: obtaining a second function call graph FCG of the APK file sample, wherein the type of the APK file sample comprises: malicious APK file samples and benign APK file samples; acquiring a second sensitive API calling sequence of the APK file sample according to the second function calling graph; obtaining the classification characteristics of the APK file samples according to the second sensitive API calling sequence; acquiring a second feature vector of the APK file sample according to the classification features; and training according to the second feature vector to construct a malicious code detection model.

Optionally, a FlowDroid tool is invoked to extract the second function call graph FCG from the APK file sample. Optionally, the second function call graph FCG is a function in the APK file sample and a call relationship between the functions. Optionally, the nodes in the second function call graph FCG are functions in APK file samples. Optionally, the function comprises: one or more of a custom function of the APK file sample, a function interface of an Android official document, an API of Google, and the like, and optionally, an edge in the second function call graph FCG is a call from one function to another function.

Optionally, before acquiring the second function call graph (FCG) of the APK file samples, the method further includes: preprocessing the APK file samples. Optionally, preprocessing the APK file samples includes: calling VirusTotal to filter all APK file samples. Optionally, for the malicious APK file sample data set, APK file samples identified as malicious by fewer than one antivirus engine in VirusTotal (that is, by none) are deleted. Optionally, for the benign APK file sample data set, APK file samples identified as malicious by one or more antivirus engines in VirusTotal are deleted.
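As an illustration of this filtering rule, the following sketch works on detection counts that are assumed to have been collected from VirusTotal beforehand; it does not call VirusTotal itself. The function name `filter_samples`, the label strings and the `min_malicious_hits` parameter are hypothetical.

```python
def filter_samples(detections, labels, min_malicious_hits=1):
    """Keep only samples whose label agrees with the antivirus verdicts.

    detections: dict mapping sample name to the number of antivirus
                engines that flagged it (assumed to be precomputed).
    labels: dict mapping sample name to "malicious" or "benign".
    """
    kept = []
    for name, label in labels.items():
        hits = detections.get(name, 0)
        if label == "malicious" and hits < min_malicious_hits:
            continue  # claimed malicious, but too few engines agree
        if label == "benign" and hits >= 1:
            continue  # claimed benign, but at least one engine disagrees
        kept.append(name)
    return kept


example_detections = {"a.apk": 12, "b.apk": 0, "c.apk": 0, "d.apk": 3}
example_labels = {
    "a.apk": "malicious",
    "b.apk": "malicious",
    "c.apk": "benign",
    "d.apk": "benign",
}
example_kept = filter_samples(example_detections, example_labels)
```

Here `b.apk` is dropped from the malicious set because no engine flags it, and `d.apk` is dropped from the benign set because three engines flag it.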

Optionally, obtaining a second sensitive API call sequence of the APK file sample according to the second function call graph includes: acquiring a second sensitive API call graph of the APK file sample according to the second function call graph; and acquiring a second sensitive API calling sequence according to the second sensitive API calling graph.

Optionally, obtaining a second sensitive API call graph of the APK file sample according to the second function call graph includes: matching the nodes in the second function call graph with API nodes in a preset sensitive API set, and determining the corresponding nodes in the second function call graph as second sensitive API nodes under the condition that the nodes in the second function call graph are the same as the API nodes in the sensitive API set; or, under the condition that the node in the second function call graph is different from the API node in the sensitive API set, determining the corresponding node in the second function call graph as a second non-sensitive API node; and deleting the second non-sensitive API node and the corresponding edge thereof to obtain a second sensitive API call graph of the APK file sample. In this way, the second non-sensitive API node irrelevant to the malicious behavior and the corresponding edge thereof are deleted to obtain the second sensitive API call graph of the APK file sample, and the second sensitive API call graph reduces a large number of nodes and edges relative to the second function call graph, so that the malicious code detection efficiency can be improved.

Optionally, the sensitive API set is obtained by the SUSI tool.

In some embodiments, the second function call graph is reconstructed, that is, the second non-sensitive API nodes and their corresponding edges in the second function call graph are deleted to obtain the second sensitive API call graph. With reference to FIG. 2 and FIG. 3, FIG. 2 is a schematic diagram of the process of reconstructing the second function call graph, and FIG. 3 is a schematic diagram of the reconstructed second sensitive API call graph. In these figures, a white circle represents a non-sensitive API call node, a gray circle represents a sensitive API call node, a solid line represents a call relationship in the function call graph, and a dotted line represents a reconstructed edge.

Optionally, obtaining a second sensitive API call sequence according to the second sensitive API call graph includes: and under the condition that at least one edge exists between every two second sensitive API nodes in the second sensitive API call graph, adding a directed edge between two corresponding second sensitive API nodes in the second sensitive API call graph to obtain a second sensitive API call sequence.
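The disclosure does not spell out how call sequences are read off the reconstructed graph. One possible reading, sketched below purely as an assumption, is that each simple directed path of at least two nodes in the sensitive API call graph is one call sequence, up to a chosen maximum length. The function name `enumerate_call_sequences` and the `max_len` parameter are hypothetical.

```python
def enumerate_call_sequences(edges, max_len=3):
    """List simple directed paths of 2..max_len nodes as call sequences.

    edges: iterable of (source, target) pairs between sensitive API nodes.
    Assumption: a call sequence is a simple path in the sensitive API
    call graph; the source text does not confirm this definition.
    """
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    sequences = set()

    def extend(path):
        if len(path) >= 2:
            sequences.add(tuple(path))
        if len(path) == max_len:
            return
        for nxt in successors.get(path[-1], ()):
            if nxt not in path:  # keep the path simple (no repeated node)
                extend(path + [nxt])

    for start in successors:
        extend([start])
    return sequences


example_sequences = enumerate_call_sequences({("A", "B"), ("B", "C")})
```

The `max_len` bound keeps the enumeration finite on dense graphs; without some bound the number of paths can grow very quickly.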

In this way, the second non-sensitive API node irrelevant to the malicious behavior and the corresponding edge thereof are deleted to obtain the second sensitive API call graph of the APK file sample, so that the second sensitive API call sequence is obtained, the second sensitive API call sequence relevant to the suspicious behavior such as the malicious code in the APK file sample can be extracted, the second sensitive API call sequence is more effective, and the malicious code detection model obtained by training the second sensitive API call sequence as the characteristic is more accurate in malicious code detection.

Optionally, obtaining the classification characteristic of the APK file sample according to the second sensitive API call sequence includes: acquiring the support degree of a second sensitive API calling sequence; and determining a second sensitive API calling sequence corresponding to the support degree meeting the preset condition as the classification characteristic of the APK file sample. Therefore, a frequent second sensitive API calling sequence, namely a frequent behavior pattern of the application program can be obtained, the malicious code can be detected according to the frequent behavior pattern of the application program, and the accuracy of detecting the malicious code can be improved. Meanwhile, the problem that the second sensitive API call sequences among the APK file samples are unbalanced when the second sensitive API call sequences are extracted from the APK file samples can be solved, namely, only one second sensitive API call sequence is extracted from some APK file samples, and dozens or hundreds of second sensitive API call sequences are extracted from some APK file samples.

Optionally, the support degree meeting the preset condition includes: a support degree greater than or equal to a set threshold.

Optionally, acquiring the classification features of the APK file samples according to the second sensitive API call sequences includes: mining the second sensitive API call sequences to extract frequent second sensitive API call sequences; and determining the frequent second sensitive API call sequences as the classification features. Optionally, the frequent second sensitive API call sequences include high-frequency subsequences that are commonly used by applications in the benign APK file sample set and in the malicious APK file sample set. In some embodiments, malicious code and benign code exhibit different behavior patterns, that is, different combinations of API call sequences, and the regularities of each can be discovered through their high-frequency subsequences. The potential relationships between the second sensitive API call sequences are mined to obtain the frequent second sensitive API call sequences, and the malicious APK file sample data set and the benign APK file sample data set are mined separately to discover the respective behavior patterns of malicious and benign APK file samples. Malicious and benign applications can thus be distinguished more effectively, improving both the accuracy and the efficiency of detection. Meanwhile, the behavior patterns of malicious applications can be discovered, which makes their intent easier to understand.

Optionally, the frequent sensitive API call sequence is a second sensitive API call sequence whose support is greater than or equal to a threshold.

In some embodiments, the number of elements included in a sensitive API call sequence is referred to as the length of the sequence. For example, a sensitive API call sequence of length x is denoted as an x-sensitive API call sequence. Optionally, mining the second sensitive API call sequences includes: obtaining the 1-second sensitive API call sequences in the second sensitive API call sequence data set to obtain a candidate set $C_1$, and deleting the 1-second sensitive API call sequences whose support degree is smaller than the set threshold to obtain a set $L_1$ of frequent 1-second sensitive API call sequences; joining and pruning $L_1$ to generate 2-second sensitive API call sequences and obtain a candidate set $C_2$, and deleting the 2-second sensitive API call sequences whose support degree is smaller than the set threshold to obtain a set $L_2$ of frequent 2-second sensitive API call sequences; joining and pruning $L_2$ to generate 3-second sensitive API call sequences and obtain a candidate set $C_3$, and deleting the 3-second sensitive API call sequences whose support degree is smaller than the set threshold to obtain a set $L_3$ of frequent 3-second sensitive API call sequences; and so on, until a set $L_n$ of frequent n-second sensitive API call sequences is obtained, wherein n is a positive integer.
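The level-wise procedure above resembles Apriori-style mining and can be sketched as follows. The sketch makes two simplifying assumptions that are not taken from the disclosure: a pattern "occurs" in a sequence when it appears as a contiguous run, and support is the plain fraction of sequences containing the pattern rather than the weighted support degree. All function names are hypothetical.

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence as a contiguous run."""
    m = len(pattern)
    return any(
        tuple(sequence[i:i + m]) == pattern
        for i in range(len(sequence) - m + 1)
    )


def support(database, pattern):
    """Fraction of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database) / len(database)


def mine_frequent_sequences(database, min_support):
    """Level-wise mining: C1 -> L1 -> C2 -> L2 -> ... until no candidates."""
    items = sorted({api for seq in database for api in seq})
    candidates = [(api,) for api in items]  # candidate set C1
    frequent = {}
    while candidates:
        scored = {c: support(database, c) for c in candidates}
        level = {c: s for c, s in scored.items() if s >= min_support}  # Lk
        frequent.update(level)
        # Join step: glue a and b when a without its first element equals
        # b without its last element. For contiguous patterns, both
        # length-k sub-patterns of the result are then already frequent,
        # so no separate pruning pass is needed in this simplified setting.
        candidates = sorted(
            {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
        )
    return frequent


example_db = [("A", "B", "C"), ("A", "B"), ("B", "C"), ("A", "B", "C")]
example_frequent = mine_frequent_sequences(example_db, min_support=0.5)
```

With a threshold of 0.5, the pair `("A", "C")` is discarded at the second level, so no longer candidate is ever built from it, which is the effect the join-and-prune step is meant to achieve.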

Optionally, after mining the second sensitive API call sequence, retaining all sub-sequences of the second sensitive API call sequence obtained by mining. Therefore, malicious applications and benign applications can be better distinguished, and the accuracy of malicious code detection is improved.

Optionally, the support degree of a second sensitive API call sequence is obtained by calculating:

$$w\text{-}suppp_k(s_i) = S_k(s_i) \times A_k(s_i)$$

wherein $w\text{-}suppp_k(s_i)$ is the support degree of the second sensitive API call sequence $s_i$ in APK file samples of type $k$; $S_k(s_i)$ is the frequency with which $s_i$ occurs among the second sensitive API call sequences of all APK file samples of type $k$; $A_k(s_i)$ is the frequency with which $s_i$ occurs across all APK file samples of type $k$; and type $k$ is either malicious APK file samples or benign APK file samples.

Optionally, the frequency with which the second sensitive API call sequence $s_i$ occurs among the second sensitive API call sequences of all APK file samples of type $k$ is obtained by calculating:

$$S_k(s_i) = \frac{Occ(s_i, NS_k)}{NS_k}$$

wherein $NS_k$ is the number of second sensitive API call sequences of all APK file samples of type $k$, and $Occ(s_i, NS_k)$ is the number of times the API sequence $s_i$ occurs among the second sensitive API call sequences of all APK file samples of type $k$.

Optionally, the frequency with which the second sensitive API call sequence $s_i$ occurs across all APK file samples of type $k$ is obtained by calculating:

$$A_k(s_i) = \frac{Occ(s_i, NA_k)}{NA_k}$$

wherein $NA_k$ is the total number of APK file samples of type $k$, and $Occ(s_i, NA_k)$ is the number of APK file samples of type $k$ that include the sensitive API call sequence $s_i$.
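The support degree defined above can be computed as in the following sketch. It assumes that both frequencies are simple ratios of the stated counts, that is, occurrences divided by the number of sequences for S_k, and containing samples divided by the number of samples for A_k. The function name `weighted_support` and the data layout (one list of extracted sequences per sample) are hypothetical.

```python
def weighted_support(samples, sequence):
    """Support degree of one call sequence within one sample type.

    samples: list with one entry per APK sample of type k; each entry is
             the list of call sequences extracted from that sample.
    sequence: the call sequence s_i to score.
    Assumption: S_k and A_k are plain ratios of the counts described
    in the text.
    """
    target = tuple(sequence)
    all_sequences = [tuple(s) for sample in samples for s in sample]
    ns_k = len(all_sequences)  # number of sequences over all samples
    na_k = len(samples)        # number of samples
    s_k = all_sequences.count(target) / ns_k
    a_k = sum(
        any(tuple(s) == target for s in sample) for sample in samples
    ) / na_k
    return s_k * a_k


AB, BC, CD = ("A", "B"), ("B", "C"), ("C", "D")
example_samples = [[AB, BC], [AB, CD], [BC, CD], [CD, BC]]
example_support = weighted_support(example_samples, AB)
```

In this example the sequence `("A", "B")` accounts for 2 of the 8 extracted sequences and appears in 2 of the 4 samples, so its support degree is 0.25 times 0.5. Multiplying the two ratios dampens sequences that are numerous only because a few samples emit them many times.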

Optionally, after acquiring the classification features of the APK file samples according to the second sensitive API call sequences, the method further includes: acquiring, within the classification features, the intersection of the sensitive API call sequence set of the malicious APK file samples and the sensitive API call sequence set of the benign APK file samples; and deleting the intersection from the classification features. In this way, only the frequent call sequences that appear exclusively in malicious APK file samples or exclusively in benign APK file samples are selected as classification features, so that malware and benign software can be better distinguished. In addition, deleting the frequent call sequences that appear in both the malicious and the benign APK file sample sets reduces the number of classification features and improves detection efficiency.
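Removing the shared sequences amounts to taking a symmetric difference of the two frequent sets, as the short sketch below shows. The function name `discriminative_features` is hypothetical, and the result is sorted only to give the features a stable order.

```python
def discriminative_features(malicious_frequent, benign_frequent):
    """Keep frequent sequences that occur in exactly one of the two sets."""
    malicious = set(malicious_frequent)
    benign = set(benign_frequent)
    shared = malicious & benign  # sequences frequent in both sample types
    return sorted((malicious | benign) - shared)


example_features = discriminative_features(
    {("A", "B"), ("B", "C")},
    {("B", "C"), ("D",)},
)
```

The sequence `("B", "C")` is frequent in both sets and is therefore dropped, leaving one malicious-only and one benign-only feature.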

Optionally, acquiring the second feature vector of an APK file sample according to the classification features includes: for each second sensitive API call sequence in the classification features, judging whether it appears in the APK file sample; if so, setting the feature value corresponding to that second sensitive API call sequence to "1", and otherwise setting it to "0", thereby obtaining the second feature vector. Optionally, the label of a malicious APK file sample is set to "1", and the label of a benign APK file sample is set to "0".
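The binary encoding and labeling described above can be sketched as follows. The function name `build_training_set` and the data layout (each sample given as its extracted sequences plus a malicious flag) are assumptions made for illustration.

```python
def build_training_set(samples, feature_sequences):
    """Encode samples as 0/1 feature vectors with 0/1 labels.

    samples: list of (sequences, is_malicious) pairs, where sequences is
             the list of call sequences extracted from one APK sample.
    feature_sequences: ordered list of classification features.
    """
    vectors, labels = [], []
    for sequences, is_malicious in samples:
        present = {tuple(s) for s in sequences}
        vectors.append(
            [1 if tuple(f) in present else 0 for f in feature_sequences]
        )
        labels.append(1 if is_malicious else 0)
    return vectors, labels


example_feature_list = [("A", "B"), ("D",)]
example_vectors, example_labels_out = build_training_set(
    [
        ([("A", "B"), ("B", "C")], True),
        ([("D",)], False),
    ],
    example_feature_list,
)
```

The resulting vectors and labels are in the row-per-sample form that typical random forest implementations accept as training input; the same encoding, without labels, would apply to an APK file to be tested.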

Optionally, training according to the second feature vectors to construct the malicious code detection model includes: training on the second feature vectors with a Random Forest (RF) algorithm to construct the malicious code detection model. The random forest algorithm is prior art and is not described further herein.

Optionally, the first feature vector of the APK file to be tested is input into the trained malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Optionally, if the output detection result is "1", the APK file to be tested is malware; if the output detection result is "0", the APK file to be tested is benign software. In this way, machine learning is applied to malicious code analysis and detection, achieving better performance than traditional machine learning algorithms together with a higher degree of automation and higher accuracy.

With the method for malicious code detection provided by the embodiments of the present disclosure, higher-level semantics can be constructed from frequent sensitive API sequence behavior patterns, which makes it easier for a user to discover potential malicious behavior patterns, so that malicious code in Android applications can be detected more accurately.

As shown in FIG. 4, an apparatus for malicious code detection includes a processor 100 and a memory 101. Optionally, the apparatus may also include a communication interface 102 and a bus 103. The processor 100, the communication interface 102, and the memory 101 may communicate with each other via the bus 103. The communication interface 102 may be used for information transfer. The processor 100 may call logic instructions in the memory 101 to perform the method for malicious code detection of the above-described embodiments.

In addition, the logic instructions in the memory 101 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products.

The memory 101, which is a computer-readable storage medium, may be used for storing software programs, computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 100 executes functional applications and data processing, i.e., implements the method for malicious code detection in the above embodiments, by executing program instructions/modules stored in the memory 101.

The memory 101 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. In addition, the memory 101 may include a high-speed random access memory, and may also include a nonvolatile memory.

With the apparatus for malicious code detection provided by the embodiments of the present disclosure, a first function call graph (FCG) of the APK file to be tested is acquired, a first sensitive API call sequence is acquired according to the first function call graph, and a first feature vector of the APK file to be tested is acquired according to the first sensitive API call sequence; the first feature vector is then input into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Compared with the prior art, the function call relationships of the APK file to be tested are taken into account during malicious code detection, so Android malicious code can be detected more accurately.

An embodiment of the present disclosure provides equipment that includes the above-described apparatus for malicious code detection.

Optionally, the equipment comprises a computer, a server, or the like.

The equipment acquires a first function call graph (FCG) of the APK file to be tested, acquires a first sensitive API call sequence according to the first function call graph, and acquires a first feature vector of the APK file to be tested according to the first sensitive API call sequence; the first feature vector is then input into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Compared with the prior art, the function call relationships of the APK file to be tested are taken into account during malicious code detection, so Android malicious code can be detected more accurately.

Embodiments of the present disclosure provide a computer-readable storage medium storing computer-executable instructions configured to perform the above-described method for malicious code detection.

Embodiments of the present disclosure provide a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the above-described method for malicious code detection.

The computer-readable storage medium described above may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.

The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product, where the computer software product is stored in a storage medium and includes one or more instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present disclosure. And the aforementioned storage medium may be a non-transitory storage medium comprising: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes, and may also be a transient storage medium.

The above description and drawings sufficiently illustrate embodiments of the disclosure to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. Furthermore, the words used in the specification are words of description only and are not intended to limit the claims. As used in the description of the embodiments and the claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application is meant to encompass any and all possible combinations of one or more of the associated listed items. Furthermore, the terms "comprises" and/or "comprising," when used in this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other like elements in a process, method or apparatus that comprises the element. In this document, each embodiment may be described with emphasis on its differences from other embodiments, and the same or similar parts of the respective embodiments may be referred to each other. For the methods, products, and the like disclosed in the embodiments, where they correspond to the method section disclosed in the embodiments, reference may be made to the description of the method section for the relevant details.

Those of skill in the art would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software may depend upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments. It can be clearly understood by the skilled person that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments disclosed herein, the disclosed methods and products (including but not limited to devices, apparatuses, etc.) may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units may be merely a logical division, and in actual implementation there may be another division; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units can be selected according to actual needs to implement the present embodiment. In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. In the description corresponding to the flowcharts and block diagrams in the figures, operations or steps corresponding to different blocks may also occur in different orders than disclosed in the description, and sometimes there is no specific order between the different operations or steps. For example, two sequential operations or steps may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims

1. A method for malicious code detection, comprising:

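acquiring a first function call graph of an APK file to be tested;

acquiring a first sensitive API call sequence of the APK file to be tested according to the first function call graph;

acquiring a first feature vector of the APK file to be tested according to the first sensitive API call sequence;

and inputting the first feature vector into a preset malicious code detection model to obtain a detection result of whether the APK file to be tested is malicious code.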

2. The method of claim 1, wherein before inputting the first feature vector into a preset malicious code detection model, the method further comprises:

obtaining a second function call graph of an APK file sample, wherein the type of the APK file sample comprises: malicious APK file samples and benign APK file samples;

acquiring a second sensitive API call sequence of the APK file sample according to the second function call graph;

obtaining the classification feature of the APK file sample according to the second sensitive API call sequence;

acquiring a second feature vector of the APK file sample according to the classification feature;

and training according to the second feature vector to construct a malicious code detection model.

3. The method of claim 2, wherein obtaining a second sensitive API call sequence of the APK file samples according to the second function call graph comprises:

acquiring a second sensitive API call graph of the APK file sample according to the second function call graph;

and acquiring the second sensitive API calling sequence according to the second sensitive API calling graph.

4. The method of claim 3, wherein obtaining the second sensitive API call graph for the APK file samples from the second function call graph comprises:

matching the nodes in the second function call graph with API nodes in a preset sensitive API set, and determining the corresponding nodes in the second function call graph as second non-sensitive API nodes under the condition that the nodes in the second function call graph are different from the API nodes in the sensitive API set;

and deleting the second non-sensitive API node and the corresponding edge thereof to obtain a second sensitive API call graph of the APK file sample.
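The node-filtering step of claim 4 can be illustrated with a minimal sketch. It assumes the function call graph is held as a dictionary mapping each caller to a set of callees; the function name, the graph representation, and the example API names are hypothetical and are not specified in the application.

```python
def sensitive_api_call_graph(call_graph, sensitive_apis):
    """Drop non-sensitive nodes and every edge that touches them.

    call_graph: dict mapping a caller node to a set of callee nodes
                (assumed representation, not taken from the application).
    sensitive_apis: the preset sensitive API set.
    """
    # Collect every node, including nodes that only appear as callees.
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes |= set(callees)

    # A node is kept only if it matches an API in the sensitive API set.
    kept = nodes & set(sensitive_apis)

    # Rebuild the graph from kept nodes; edges to removed nodes disappear.
    return {
        node: {callee for callee in call_graph.get(node, ()) if callee in kept}
        for node in kept
    }


# Hypothetical example data.
example_graph = {
    "onCreate": {"getDeviceId", "helper"},
    "helper": {"sendTextMessage"},
    "getDeviceId": {"sendTextMessage"},
}
example_sensitive = {"getDeviceId", "sendTextMessage"}
filtered = sensitive_api_call_graph(example_graph, example_sensitive)
```

In this sketch an edge survives only when both of its endpoints are sensitive API nodes, which is one straightforward reading of "deleting the second non-sensitive API node and the corresponding edge thereof".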

5. The method of claim 3, wherein obtaining the second sensitive API call sequence from the second sensitive API call graph comprises:

in a case where at least one edge exists between two second sensitive API nodes in the second sensitive API call graph, adding a directed edge between the two corresponding second sensitive API nodes to obtain the second sensitive API call sequence.

6. The method of claim 2, wherein obtaining the classification feature of the APK file sample according to the second sensitive API call sequence comprises:

acquiring the support degree of the second sensitive API call sequence;

and determining the second sensitive API call sequence whose support degree meets a preset condition as the classification feature of the APK file sample.

7. The method of claim 6, wherein the support degree of the second sensitive API call sequence is obtained by computing:

$w\text{-}supp_k(s_i) = S_k(s_i) \cdot A_k(s_i)$

wherein $w\text{-}supp_k(s_i)$ is the support degree of the second sensitive API call sequence $s_i$ in APK file samples of type $k$; $S_k(s_i)$ is the frequency of occurrence of the second sensitive API call sequence $s_i$ in the second sensitive API call sequences of all APK file samples of type $k$; $A_k(s_i)$ is the frequency of occurrence of the second sensitive API call sequence $s_i$ in all APK file samples of type $k$; and $k$ is either the malicious APK file sample type or the benign APK file sample type.
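The support computation of claims 6 and 7 can be sketched as follows. The claims do not spell out how the two frequencies are normalized, so this sketch makes two assumptions: the first factor is the share of all sequence occurrences across type-k samples that are the given sequence, and the second factor is the share of type-k samples that contain the sequence at least once. The "preset condition" is assumed to be a minimum threshold. All function names and the example data are hypothetical.

```python
from collections import Counter


def weighted_support(samples):
    """Return {sequence: S_k * A_k} for the samples of one type k.

    samples: one entry per APK sample; each entry is a list of sensitive
    API call sequences (tuples of API names). Assumed representation.
    """
    total_sequences = sum(len(seqs) for seqs in samples)
    if total_sequences == 0:
        return {}
    # Numerator of S_k: total occurrences of each sequence over all samples.
    occurrences = Counter(seq for seqs in samples for seq in seqs)
    # Numerator of A_k: number of samples containing each sequence.
    containing = Counter(seq for seqs in samples for seq in set(seqs))
    return {
        seq: (occurrences[seq] / total_sequences) * (containing[seq] / len(samples))
        for seq in occurrences
    }


def select_features(support, threshold):
    """Keep sequences whose support meets the (assumed) minimum threshold."""
    return {seq for seq, value in support.items() if value >= threshold}


# Hypothetical malicious-type samples.
malicious_samples = [
    [("getDeviceId", "sendTextMessage"), ("getLine1Number", "sendTextMessage")],
    [("getDeviceId", "sendTextMessage")],
]
support = weighted_support(malicious_samples)
features = select_features(support, threshold=0.5)
```

Multiplying the two factors favors sequences that are both frequent overall and spread across many samples of the type, rather than concentrated in a single sample.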

8. The method according to any one of claims 2 to 7, wherein after obtaining the classification feature of the APK file sample according to the second sensitive API call sequence, the method further comprises:

acquiring the intersection of the sensitive API call sequence set of the malicious APK file sample and the sensitive API call sequence set of the benign APK file sample in the classification characteristic;

and deleting the intersection from the classification characteristic.
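The intersection removal of claim 8 amounts to a set difference. A minimal sketch, assuming the malicious and benign classification features are each held as a set of call sequences (the function name and example data are hypothetical):

```python
def remove_shared_sequences(malicious_features, benign_features):
    """Delete sequences that appear in both the malicious and benign sets."""
    shared = malicious_features & benign_features
    return malicious_features - shared, benign_features - shared


# Hypothetical classification features.
malicious_set = {("getDeviceId", "sendTextMessage"), ("openConnection", "read")}
benign_set = {("openConnection", "read"), ("getLastKnownLocation", "openConnection")}
malicious_only, benign_only = remove_shared_sequences(malicious_set, benign_set)
```

Sequences common to both types carry no discriminating information under this scheme, so only the type-specific sequences remain as features.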

9. An apparatus for malicious code detection, comprising a processor and a memory storing program instructions, wherein the processor is configured to perform the method for malicious code detection according to any of claims 1 to 8 when executing the program instructions.

10. A device comprising the apparatus for malicious code detection of claim 9.