CN112651024A - Method, device and equipment for malicious code detection - Google Patents


Info

Publication number
CN112651024A
Authority
CN
China
Prior art keywords
sensitive api
apk file
sequence
malicious code
sensitive
Prior art date
Legal status
Pending
Application number
CN202011593644.8A
Other languages
Chinese (zh)
Inventor
杨吉云
张恒
周洁
向涛
钟世刚
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202011593644.8A
Publication of CN112651024A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The application relates to the technical field of communication security and discloses a method for malicious code detection, comprising: acquiring a first function call graph of an APK file to be tested; acquiring a first sensitive API call sequence of the APK file to be tested according to the first function call graph; acquiring a first feature vector of the APK file to be tested according to the first sensitive API call sequence; and inputting the first feature vector into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Compared with the prior art, the method takes the function call relationships of the APK file to be tested into account during detection and can therefore detect Android malicious code more accurately. The application also discloses an apparatus and a device for malicious code detection.

Description

Method, device and equipment for malicious code detection
Technical Field
The present application relates to the technical field of communication security, and in particular to a method, an apparatus, and a device for malicious code detection.
Background
At present, with the rapid development of the mobile Internet and the popularization of smart devices, malicious code poses an increasing threat to the system security of smart devices and to the security of user information. In recent years, Android has taken most of the smart-terminal market; owing to its openness, its share has exceeded eighty percent, and the Android platform has become a main target of malware attacks. Malicious applications are created to carry out different types of attacks, such as stealing users' private information, sending messages without the user's permission, and enticing users to visit malicious websites, which pose a serious threat to smartphone users. How to accurately detect Android malicious code and protect user privacy has therefore become a hot topic in recent years.
In the process of implementing the embodiments of the present disclosure, it is found that at least the following problems exist in the related art:
in the prior art, the call relationships between functions in an application are not considered, so the accuracy of Android malicious code detection is low.
Disclosure of Invention
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview; it is intended neither to identify key/critical elements nor to delineate the scope of such embodiments, but serves as a prelude to the more detailed description presented later.
The embodiment of the disclosure provides a method, a device and equipment for malicious code detection, so that the accuracy of Android malicious code detection can be improved.
In some embodiments, the method for malicious code detection comprises:
acquiring a first function call graph of an APK file to be tested;
acquiring a first sensitive API calling sequence of the APK file to be tested according to the first function calling graph;
acquiring a first feature vector of the APK file to be tested according to the first sensitive API calling sequence;
and inputting the first feature vector into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code.
In some embodiments, the apparatus for malicious code detection comprises a processor and a memory storing program instructions, the processor being configured to, when executing the program instructions, perform the method for malicious code detection as described above.
In some embodiments, the device includes the above-described apparatus for malicious code detection.
The method, apparatus and device for malicious code detection provided by the embodiments of the present disclosure can achieve the following technical effects: a first function call graph FCG of the APK file to be tested is acquired, a first sensitive API call sequence is acquired according to the first function call graph, and a first feature vector of the APK file to be tested is acquired according to the first sensitive API call sequence; the first feature vector is then input into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Compared with the prior art, the function call relationships of the APK file to be tested are considered during malicious code detection, so Android malicious code can be detected more accurately.
The foregoing general description and the following description are exemplary and explanatory only and are not restrictive of the application.
Drawings
One or more embodiments are illustrated by way of example, and not by way of limitation, in the accompanying drawings, in which elements having the same reference numerals denote like elements, and wherein:
FIG. 1 is a schematic diagram of a method for malicious code detection provided by an embodiment of the present disclosure;
FIG. 2 is a diagram illustrating a process of reconstructing a second function call graph according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a second sensitive API call graph provided by embodiments of the present disclosure;
FIG. 4 is a schematic diagram of an apparatus for malicious code detection according to an embodiment of the present disclosure.
Detailed Description
So that the manner in which the features and elements of the disclosed embodiments can be understood in detail, a more particular description of the disclosed embodiments, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. In the following description of the technology, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may be practiced without these details. In other instances, well-known structures and devices may be shown in simplified form in order to simplify the drawing.
The terms "first," "second," and the like in the description, the claims, and the above-described drawings of embodiments of the present disclosure are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the present disclosure described herein can be implemented. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.
The term "plurality" means two or more unless otherwise specified.
In the embodiments of the present disclosure, the character "/" indicates an "or" relationship between the preceding and following objects. For example, A/B means: A or B.
The term "and/or" describes an association between objects and indicates that three relationships may exist. For example, A and/or B means: A alone, B alone, or both A and B.
As shown in fig. 1, an embodiment of the present disclosure provides a method for malicious code detection, including:
step S101, acquiring a first Function Call Graph (FCG) of an APK (Android application package) file to be tested;
step S102, acquiring a first sensitive API (Application Programming Interface) calling sequence of the APK file to be tested according to the first function calling graph;
step S103, acquiring a first feature vector of the APK file to be tested according to the first sensitive API calling sequence;
and step S104, inputting the first feature vector into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code.
With this method for malicious code detection, the first function call graph FCG of the APK file to be tested is acquired, the first sensitive API call sequence is acquired according to the first function call graph, and the first feature vector of the APK file to be tested is acquired according to the first sensitive API call sequence; the first feature vector is then input into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Compared with the prior art, the function call relationships of the APK file to be tested are considered during malicious code detection, so Android malicious code can be detected more accurately.
Optionally, the FlowDroid tool is invoked to extract the first function call graph FCG from the APK file to be tested. Optionally, the first function call graph FCG consists of the function names in the APK file to be tested and the call relationships between the functions. Optionally, each node in the first function call graph FCG is a function in the APK file to be tested. Optionally, the functions include one or more of the custom functions of the APK file to be tested, function interfaces from the official Android documentation, Google APIs, and the like. Optionally, each edge in the first function call graph FCG is a call from one function to another.
Optionally, acquiring the first sensitive API call sequence of the APK file to be tested according to the first function call graph includes: matching the nodes in the first function call graph against the API nodes in a preset sensitive API set; when a node in the first function call graph is the same as an API node in the sensitive API set, determining that node to be a first sensitive API node; otherwise, when the node differs from the API nodes in the sensitive API set, determining it to be a first non-sensitive API node; deleting the first non-sensitive API nodes and their corresponding edges to obtain a first sensitive API call graph of the APK file to be tested; and, when at least one edge exists between two first sensitive API nodes in the first sensitive API call graph, adding a directed edge between the two corresponding first sensitive API nodes to obtain the first sensitive API call sequence. In this way, the first non-sensitive API nodes, which are irrelevant to malicious behavior, are deleted together with their edges; compared with the first function call graph, the first sensitive API call graph has far fewer nodes and edges, which improves detection efficiency. Moreover, the extracted first sensitive API call sequences relate to suspicious behavior, such as malicious code, in the APK file to be tested, so they are more informative features, and using them as features improves the accuracy of malicious code detection.
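The graph reconstruction and sequence extraction above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the call graph is assumed to arrive as plain (caller, callee) pairs (for example, exported from FlowDroid, whose API is not shown here); a reconstructed edge is read as connecting two sensitive nodes whenever the original graph contains a call path between them that passes only through deleted non-sensitive nodes, in line with the dotted "reconstructed edges" described for FIG. 2 and FIG. 3; and a "call sequence" is taken to be a maximal simple path starting at a sensitive node with no incoming edge. The function names `reconstruct_sensitive_graph` and `call_sequences` and the example API names are hypothetical.

```python
from collections import defaultdict


def reconstruct_sensitive_graph(fcg_edges, sensitive_apis):
    """Contract a function call graph onto its sensitive API nodes.

    fcg_edges: iterable of (caller, callee) pairs (assumed input format).
    sensitive_apis: set of API names treated as sensitive.
    Returns {sensitive node: set of sensitive nodes it reaches through
    non-sensitive nodes only}; direct sensitive-to-sensitive calls are kept.
    """
    succ = defaultdict(set)
    nodes = set()
    for caller, callee in fcg_edges:
        succ[caller].add(callee)
        nodes.update((caller, callee))
    sensitive = nodes & set(sensitive_apis)
    graph = {s: set() for s in sensitive}
    for src in sensitive:
        stack, seen = list(succ[src]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in sensitive:
                graph[src].add(node)      # stop at the first sensitive node: edge src -> node
            else:
                stack.extend(succ[node])  # walk through code whose nodes are deleted
    return graph


def call_sequences(graph):
    """Enumerate maximal simple paths from root nodes (exponential in the worst case)."""
    has_pred = {v for u, targets in graph.items() for v in targets if v != u}
    # If every node has a predecessor (e.g. a cycle), start from every node.
    roots = sorted(n for n in graph if n not in has_pred) or sorted(graph)
    sequences = []

    def walk(path):
        nxt = [v for v in sorted(graph[path[-1]]) if v not in path]
        if not nxt:
            sequences.append(tuple(path))
        for v in nxt:
            walk(path + [v])

    for root in roots:
        walk([root])
    return sequences


# Hypothetical example: two sensitive calls linked through the helper b().
edges = [("main", "a"), ("a", "getDeviceId"), ("getDeviceId", "b"),
         ("b", "sendTextMessage"), ("main", "openConnection")]
sensitive = {"getDeviceId", "sendTextMessage", "openConnection"}
g = reconstruct_sensitive_graph(edges, sensitive)
print(call_sequences(g))  # [('getDeviceId', 'sendTextMessage'), ('openConnection',)]
```

Path enumeration can grow exponentially on dense graphs, so a practical version would probably cap the sequence length.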
Optionally, acquiring the first feature vector of the APK file to be tested according to the first sensitive API call sequence includes: for each sensitive API call sequence, judging whether it appears in the APK file to be tested; if so, setting the corresponding feature value to "1", otherwise setting it to "0", thereby obtaining the first feature vector.
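A sketch of the 0/1 encoding, assuming that a feature sequence "appears" in an APK when it occurs as a contiguous run inside at least one of that APK's sensitive API call sequences; the text does not define the matching rule, so this is an assumption. The helper names and example sequences are hypothetical.

```python
def contains(seq, sub):
    """True if `sub` occurs as a contiguous run of calls inside `seq`."""
    n = len(sub)
    return any(tuple(seq[i:i + n]) == tuple(sub) for i in range(len(seq) - n + 1))


def feature_vector(apk_sequences, features):
    """0/1 vector: 1 if a feature sequence occurs in any of the APK's sequences."""
    return [1 if any(contains(s, f) for s in apk_sequences) else 0 for f in features]


# Hypothetical classification features and call sequences of one APK under test.
features = [("getDeviceId", "sendTextMessage"), ("openConnection",), ("getLastKnownLocation",)]
apk = [("getDeviceId", "sendTextMessage"), ("openConnection", "getOutputStream")]
print(feature_vector(apk, features))  # [1, 1, 0]
```

The same function could produce the second feature vectors of the training samples, with the "1"/"0" labels kept alongside.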
Optionally, before the inputting the first feature vector into the preset malicious code detection model, the method further includes: obtaining a second function call graph FCG of the APK file sample, wherein the type of the APK file sample comprises: malicious APK file samples and benign APK file samples; acquiring a second sensitive API calling sequence of the APK file sample according to the second function calling graph; obtaining the classification characteristics of the APK file samples according to the second sensitive API calling sequence; acquiring a second feature vector of the APK file sample according to the classification features; and training according to the second feature vector to construct a malicious code detection model.
Optionally, the FlowDroid tool is invoked to extract the second function call graph FCG from the APK file sample. Optionally, the second function call graph FCG consists of the functions in the APK file sample and the call relationships between them. Optionally, each node in the second function call graph FCG is a function in the APK file sample. Optionally, the functions include one or more of the custom functions of the APK file sample, function interfaces from the official Android documentation, Google APIs, and the like. Optionally, each edge in the second function call graph FCG is a call from one function to another.
Optionally, before the second function call graph FCG of the APK file samples is obtained, the method further includes preprocessing the APK file samples. Optionally, preprocessing the APK file samples includes invoking VirusTotal to filter all APK file samples. Optionally, for the malicious APK file sample data set, APK file samples identified as malicious by fewer than one VirusTotal antivirus engine (that is, by no engine) are deleted. Optionally, for the benign APK file sample data set, APK file samples identified as malicious by one or more VirusTotal antivirus engines are deleted.
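The filtering rule can be written as a single predicate. This sketch assumes the per-sample detection counts have already been read from VirusTotal reports; it makes no VirusTotal API calls, and the `Sample` class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sha256: str
    dataset: str    # "malicious" or "benign": the data set the sample came from
    positives: int  # number of VirusTotal engines that flagged the sample as malicious


def keep(sample):
    """Apply the preprocessing rule to one sample."""
    if sample.dataset == "malicious":
        return sample.positives >= 1  # drop if flagged by fewer than one engine
    return sample.positives == 0      # benign set: drop if flagged by one or more engines


samples = [Sample("m1", "malicious", 12), Sample("m2", "malicious", 0),
           Sample("b1", "benign", 0), Sample("b2", "benign", 2)]
print([s.sha256 for s in samples if keep(s)])  # ['m1', 'b1']
```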
Optionally, obtaining a second sensitive API call sequence of the APK file sample according to the second function call graph includes: acquiring a second sensitive API call graph of the APK file sample according to the second function call graph; and acquiring a second sensitive API calling sequence according to the second sensitive API calling graph.
Optionally, obtaining a second sensitive API call graph of the APK file sample according to the second function call graph includes: matching the nodes in the second function call graph with API nodes in a preset sensitive API set, and determining the corresponding nodes in the second function call graph as second sensitive API nodes under the condition that the nodes in the second function call graph are the same as the API nodes in the sensitive API set; or, under the condition that the node in the second function call graph is different from the API node in the sensitive API set, determining the corresponding node in the second function call graph as a second non-sensitive API node; and deleting the second non-sensitive API node and the corresponding edge thereof to obtain a second sensitive API call graph of the APK file sample. In this way, the second non-sensitive API node irrelevant to the malicious behavior and the corresponding edge thereof are deleted to obtain the second sensitive API call graph of the APK file sample, and the second sensitive API call graph reduces a large number of nodes and edges relative to the second function call graph, so that the malicious code detection efficiency can be improved.
Optionally, the sensitive API set is obtained by the SUSI tool.
In some embodiments, the second function call graph is reconstructed, that is, the second non-sensitive API nodes and their corresponding edges are deleted from the second function call graph to obtain the second sensitive API call graph. Referring to FIG. 2 and FIG. 3, FIG. 2 is a schematic diagram of the process of reconstructing the second function call graph, and FIG. 3 is a schematic diagram of the reconstructed second sensitive API call graph. A white circle represents a non-sensitive API node, a gray circle represents a sensitive API node, a solid line represents a call relationship in the function call graph, and a dotted line represents a reconstructed edge; the second non-sensitive API nodes and their corresponding edges are deleted.
Optionally, acquiring the second sensitive API call sequence according to the second sensitive API call graph includes: when at least one edge exists between two second sensitive API nodes in the second sensitive API call graph, adding a directed edge between the two corresponding second sensitive API nodes to obtain the second sensitive API call sequence.
In this way, the second non-sensitive API nodes, which are irrelevant to malicious behavior, are deleted together with their edges to obtain the second sensitive API call graph of the APK file sample, from which the second sensitive API call sequences are obtained. These sequences relate to suspicious behavior, such as malicious code, in the APK file samples, so they are more informative, and a malicious code detection model trained with them as features detects malicious code more accurately.
Optionally, acquiring the classification features of the APK file samples according to the second sensitive API call sequences includes: acquiring the support of each second sensitive API call sequence; and determining the second sensitive API call sequences whose support meets a preset condition as the classification features of the APK file samples. In this way, frequent second sensitive API call sequences, that is, frequent behavior patterns of applications, are obtained, and detecting malicious code according to these patterns can improve detection accuracy. This also alleviates the imbalance in the number of second sensitive API call sequences extracted from different APK file samples: some samples yield only one second sensitive API call sequence, while others yield dozens or hundreds.
Optionally, the support meeting the preset condition includes: a support greater than or equal to a set threshold.
Optionally, acquiring the classification features of the APK file samples according to the second sensitive API call sequences includes: mining the second sensitive API call sequences to extract frequent second sensitive API call sequences, and determining the frequent second sensitive API call sequences as the classification features. Optionally, the frequent second sensitive API call sequences are high-frequency subsequences commonly used by the applications in the benign and malicious APK file sample sets. In some embodiments, malicious code and benign code exhibit different behavior patterns, that is, different combinations of API call sequences, and their respective regularities can be discovered through high-frequency subsequences. The latent relationships between the second sensitive API call sequences are mined to obtain frequent second sensitive API call sequences, and the malicious and benign APK file sample data sets are mined separately to discover the respective behavior patterns of malicious and benign APK file samples. Malicious and benign applications can thus be distinguished more effectively, improving detection accuracy and efficiency. The behavior patterns of malicious applications are also revealed, which makes their intent easier to understand.
Optionally, the frequent sensitive API call sequence is a second sensitive API call sequence whose support is greater than or equal to a threshold.
In some embodiments, the number of elements in a sensitive API call sequence is called the length of the sequence, and a sensitive API call sequence of length x is denoted an x-sensitive API call sequence. Optionally, mining the second sensitive API call sequences includes: obtaining all second sensitive API call sequences of length 1 from the second sensitive API call sequence data set as the candidate set $C_1$, and deleting those whose support is smaller than the set threshold to obtain the set $L_1$ of frequent 1-sequences; joining and pruning $L_1$ to generate second sensitive API call sequences of length 2 as the candidate set $C_2$, and deleting those whose support is smaller than the set threshold to obtain the set $L_2$ of frequent 2-sequences; joining and pruning $L_2$ to generate sequences of length 3 as the candidate set $C_3$, and deleting those whose support is smaller than the set threshold to obtain the set $L_3$ of frequent 3-sequences; and so on, until the set $L_n$ of frequent n-sequences is obtained, where n is a positive integer.
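The level-wise mining reads like the Apriori algorithm applied to sequences. The text does not define the join and prune steps, so this sketch assumes a GSP-style rule: two frequent k-sequences are joined when the last k - 1 calls of one equal the first k - 1 calls of the other, and a candidate is pruned when any of its contiguous length-k runs is not frequent. The support function is passed in, so the weighted support w-supp described in this section could be plugged in; the fraction-of-sequences support in the example is a hypothetical stand-in. Per the description, the miner would be run separately on the malicious and benign sample sets. All names are hypothetical.

```python
def contiguous_subsequences(seq, k):
    """All length-k contiguous runs of `seq`, as tuples."""
    return {tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def mine_frequent_sequences(sequences, support, threshold):
    """Level-wise (Apriori-style) mining; returns {k: L_k}.

    support: callable giving the support of a candidate tuple.
    """
    max_len = max((len(s) for s in sequences), default=0)  # guards against endless joins
    frequent = {}
    candidates = set().union(*(contiguous_subsequences(s, 1) for s in sequences))  # C_1
    k = 1
    while candidates and k <= max_len:
        level = {c for c in candidates if support(c) >= threshold}  # L_k
        if not level:
            break
        frequent[k] = level
        # Join: a and b overlap on k - 1 calls, so a + b[-1:] has length k + 1.
        joined = {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
        # Prune: every length-k contiguous run of a candidate must be frequent (C_{k+1}).
        candidates = {c for c in joined if contiguous_subsequences(c, k) <= level}
        k += 1
    return frequent


def fraction_containing(sequences):
    """Hypothetical support: share of sequences that contain the candidate."""
    def support(c):
        hits = sum(1 for s in sequences if c in contiguous_subsequences(s, len(c)))
        return hits / len(sequences)
    return support


seqs = [("A", "B", "C"), ("A", "B"), ("B", "C"), ("A", "C")]
print(mine_frequent_sequences(seqs, fraction_containing(seqs), 0.5))
```

Because every frequent k-sequence is built from frequent shorter runs, the mined result naturally retains all of its frequent sub-sequences.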
Optionally, after mining the second sensitive API call sequence, retaining all sub-sequences of the second sensitive API call sequence obtained by mining. Therefore, malicious applications and benign applications can be better distinguished, and the accuracy of malicious code detection is improved.
Optionally, the support of a second sensitive API call sequence is obtained by calculating:

$$w\text{-}supp_k(s_i) = S_k(s_i) \times A_k(s_i)$$

where $w\text{-}supp_k(s_i)$ is the support of the second sensitive API call sequence $s_i$ in the APK file samples of type $k$; $S_k(s_i)$ is the frequency with which $s_i$ occurs in the second sensitive API call sequences of all APK file samples of type $k$; $A_k(s_i)$ is the frequency with which $s_i$ occurs in all APK file samples of type $k$; and type $k$ is either malicious APK file samples or benign APK file samples.

Optionally, the frequency with which the second sensitive API call sequence $s_i$ occurs in the second sensitive API call sequences of all APK file samples of type $k$ is obtained by calculating

$$S_k(s_i) = \frac{Occ(s_i, NS_k)}{NS_k}$$

where $NS_k$ is the number of second sensitive API call sequences of all APK file samples of type $k$, and $Occ(s_i, NS_k)$ is the number of occurrences of $s_i$ in the second sensitive API call sequences of all APK file samples of type $k$.

Optionally, the frequency with which the second sensitive API call sequence $s_i$ occurs in all APK file samples of type $k$ is obtained by calculating

$$A_k(s_i) = \frac{Occ(s_i, NA_k)}{NA_k}$$

where $NA_k$ is the total number of APK file samples of type $k$, and $Occ(s_i, NA_k)$ is the number of APK file samples of type $k$ that contain the sensitive API call sequence $s_i$.
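A direct transcription of the weighted support w-supp_k(s_i) = S_k(s_i) * A_k(s_i). Two readings here are assumptions: Occ(s_i, NS_k) is taken as the number of type-k call sequences that contain s_i at least once as a contiguous run, and the samples of one type are passed as lists of call sequences. The function names are hypothetical.

```python
def contains(seq, sub):
    """True if `sub` occurs as a contiguous run of calls inside `seq`."""
    n = len(sub)
    return any(tuple(seq[i:i + n]) == tuple(sub) for i in range(len(seq) - n + 1))


def weighted_support(candidate, samples):
    """w-supp_k(s_i) = S_k(s_i) * A_k(s_i) over the samples of one type k.

    samples: list of APKs of type k, each given as a list of call sequences.
    """
    all_sequences = [seq for apk in samples for seq in apk]
    ns_k, na_k = len(all_sequences), len(samples)  # NS_k and NA_k
    if ns_k == 0 or na_k == 0:
        return 0.0
    occ_seq = sum(1 for seq in all_sequences if contains(seq, candidate))             # Occ(s_i, NS_k)
    occ_apk = sum(1 for apk in samples if any(contains(s, candidate) for s in apk))  # Occ(s_i, NA_k)
    return (occ_seq / ns_k) * (occ_apk / na_k)  # S_k(s_i) * A_k(s_i)


# Hypothetical type-k samples: three APKs and their call sequences.
malicious = [[("A", "B"), ("C",)], [("A", "B", "C")], [("C",)]]
print(weighted_support(("A", "B"), malicious))  # S_k = 2/4, A_k = 2/3, so about 0.333
```

Multiplying the two frequencies favors sequences that are both common among call sequences and spread across many APKs, rather than repeated heavily inside a few.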
Optionally, after the classification features of the APK file samples are obtained according to the second sensitive API call sequences, the method further includes: acquiring, within the classification features, the intersection of the sensitive API call sequence set of the malicious APK file samples and that of the benign APK file samples; and deleting the intersection from the classification features. In this way, only frequent call sequences that appear exclusively in malicious or exclusively in benign APK file samples are selected as classification features, which better distinguishes malware from benign software; deleting the frequent call sequences common to both sample sets also reduces the number of classification features and improves detection efficiency.
Optionally, acquiring the second feature vector of the APK file sample according to the classification features includes: for each second sensitive API call sequence in the classification features, judging whether it appears in the APK file sample; if so, setting the corresponding feature value to "1", otherwise setting it to "0", thereby obtaining the second feature vector. Optionally, the label of a malicious APK file sample is set to "1" and the label of a benign APK file sample is set to "0".
Optionally, training according to the second feature vectors to construct the malicious code detection model includes: training on the second feature vectors with the Random Forest (RF) algorithm to construct the malicious code detection model. This is prior art and is not described in detail here.
Optionally, the first feature vector of the APK file to be tested is input into the trained malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Optionally, when the output detection result is "1", the APK file to be tested is malware; when the output detection result is "0", the APK file to be tested is benign software. Machine learning is thus applied to malicious code analysis and detection, achieving better performance than traditional machine learning algorithms together with a higher degree of automation and accuracy.
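Deleting the intersection amounts to keeping the symmetric difference of the two frequent-sequence sets. A sketch with hypothetical inputs; sorting is an added assumption that gives the feature vector a fixed column order.

```python
def class_specific_features(malicious_frequent, benign_frequent):
    """Drop sequences frequent in both classes; keep the rest as features."""
    common = malicious_frequent & benign_frequent  # the intersection to delete
    return sorted((malicious_frequent | benign_frequent) - common)


# Hypothetical frequent sequences mined from each sample set.
mal = {("getDeviceId", "sendTextMessage"), ("openConnection",)}
ben = {("openConnection",), ("getSystemService",)}
print(class_specific_features(mal, ben))
# [('getDeviceId', 'sendTextMessage'), ('getSystemService',)]
```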
The method for malicious code detection provided by the embodiments of the present disclosure uses frequent sensitive API sequence behavior patterns to construct higher-level semantics, which helps users discover potential malicious behavior patterns and thus detect malicious code in Android applications more accurately.
As shown in FIG. 4, an apparatus for malicious code detection includes a processor 100 and a memory 101. Optionally, the apparatus may further include a communication interface 102 and a bus 103. The processor 100, the communication interface 102, and the memory 101 may communicate with each other via the bus 103. The communication interface 102 may be used for information transfer. The processor 100 may call logic instructions in the memory 101 to perform the method for malicious code detection of the above embodiments.
In addition, when implemented in the form of software functional units and sold or used as independent products, the logic instructions in the memory 101 may be stored in a computer-readable storage medium.
The memory 101, which is a computer-readable storage medium, may be used for storing software programs, computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 100 executes functional applications and data processing, i.e., implements the method for malicious code detection in the above embodiments, by executing program instructions/modules stored in the memory 101.
The memory 101 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. In addition, the memory 101 may include a high-speed random access memory, and may also include a nonvolatile memory.
With the apparatus for malicious code detection provided by the embodiments of the present disclosure, a first function call graph FCG of the APK file to be tested is acquired, a first sensitive API (Application Programming Interface) call sequence is acquired according to the first function call graph, and a first feature vector of the APK file to be tested is acquired according to the first sensitive API call sequence; the first feature vector is then input into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Compared with the prior art, the function call relationships of the APK file to be tested are considered during malicious code detection, so Android malicious code can be detected more accurately.
The embodiment of the disclosure provides a device, which includes the above apparatus for malicious code detection.
Optionally, the device comprises a computer or server or the like.
The device acquires a first function call graph FCG of the APK file to be tested, acquires a first sensitive API (Application Programming Interface) call sequence according to the first function call graph, and acquires a first feature vector of the APK file to be tested according to the first sensitive API call sequence; the first feature vector is then input into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be tested is malicious code. Compared with the prior art, the function call relationships of the APK file to be tested are considered during malicious code detection, so Android malicious code can be detected more accurately.
Embodiments of the present disclosure provide a computer-readable storage medium storing computer-executable instructions configured to perform the above-described method for malicious code detection.
Embodiments of the present disclosure provide a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the above-described method for malicious code detection.
The computer-readable storage medium described above may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
The technical solutions of the embodiments of the present disclosure may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes one or more instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present disclosure. The aforementioned storage medium may be a non-transitory storage medium, including a USB flash drive (U-disk), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium capable of storing program code; it may also be a transitory storage medium.
The above description and drawings sufficiently illustrate embodiments of the disclosure to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in, or substituted for, those of other embodiments. Furthermore, the words used in the specification are words of description only and are not intended to limit the claims. As used in the description of the embodiments and in the claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application is meant to encompass any and all possible combinations of one or more of the associated listed items. Furthermore, the terms "comprises" and/or "comprising", when used in this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other identical elements in the process, method, or apparatus that comprises the element. In this document, each embodiment may be described with emphasis on its differences from other embodiments, and for the same or similar parts the embodiments may be referred to each other. For the methods, products, and the like disclosed in the embodiments, where they correspond to the method section of the embodiments, reference may be made to the description of the method section for the relevant details.
Those skilled in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the disclosed embodiments. Those skilled in the art will clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, apparatuses, and units described above, and these are not repeated here.
In the embodiments disclosed herein, the disclosed methods, products (including but not limited to devices, apparatuses, etc.) may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be merely a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to implement the present embodiment. In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. In the description corresponding to the flowcharts and block diagrams in the figures, operations or steps corresponding to different blocks may also occur in different orders than disclosed in the description, and sometimes there is no specific order between the different operations or steps. For example, two sequential operations or steps may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims (10)

1. A method for malicious code detection, comprising:
obtaining a first function call graph of an APK file to be detected;
obtaining a first sensitive API call sequence of the APK file to be detected according to the first function call graph;
obtaining a first feature vector of the APK file to be detected according to the first sensitive API call sequence; and
inputting the first feature vector into a preset malicious code detection model to obtain a detection result indicating whether the APK file to be detected is malicious code.
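The four steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function call graph is assumed to be an adjacency map from caller to callees already produced by a static-analysis tool (for example Androguard), a sensitive API call sequence is simplified to a (caller, callee) pair of sensitive APIs, the feature vector is assumed to be a binary presence vector over an ordered feature list, and `model` stands for any pre-built classifier. The names `sensitive_sequences`, `feature_vector`, and `detect` are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

CallGraph = Dict[str, Set[str]]   # caller -> callees (assumed FCG representation)
CallSeq = Tuple[str, str]         # simplified call sequence: (caller API, callee API)


def sensitive_sequences(fcg: CallGraph, sensitive: Set[str]) -> Set[CallSeq]:
    """Step 2 (simplified): keep caller->callee edges whose two ends are sensitive APIs."""
    return {
        (caller, callee)
        for caller, callees in fcg.items()
        if caller in sensitive
        for callee in callees
        if callee in sensitive
    }


def feature_vector(seqs: Set[CallSeq], features: List[CallSeq]) -> List[int]:
    """Step 3: binary presence vector over a fixed, ordered feature list."""
    return [1 if f in seqs else 0 for f in features]


def detect(
    fcg: CallGraph,
    sensitive: Set[str],
    features: List[CallSeq],
    model: Callable[[List[int]], bool],
) -> bool:
    """Step 4: feed the vector to a pre-built model; True is read as 'malicious'."""
    return model(feature_vector(sensitive_sequences(fcg, sensitive), features))


if __name__ == "__main__":
    # Toy graph with hypothetical node names; step 1 (building the FCG) is not shown.
    toy_fcg = {"Main.onCreate": {"api_a", "Main.helper"}, "api_a": {"api_b"}}

    def toy_model(vec: List[int]) -> bool:
        return sum(vec) >= 1  # placeholder for the trained detection model

    print(detect(toy_fcg, {"api_a", "api_b"}, [("api_a", "api_b")], toy_model))  # prints True
```

Keeping the model as a plain callable leaves the choice of classifier open, which matches the claim's wording of a "preset" model.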
2. The method of claim 1, wherein before inputting the first feature vector into a preset malicious code detection model, the method further comprises:
obtaining a second function call graph of APK file samples, wherein the types of the APK file samples comprise malicious APK file samples and benign APK file samples;
obtaining a second sensitive API call sequence of the APK file samples according to the second function call graph;
obtaining classification features of the APK file samples according to the second sensitive API call sequence;
obtaining a second feature vector of the APK file samples according to the classification features; and
performing training according to the second feature vector to construct the malicious code detection model.
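Claim 2 does not name the learning algorithm. The sketch below assumes binary presence vectors over the classification features and uses a nearest-centroid classifier purely as a stand-in for whatever model is actually trained; `to_vector`, `train_centroids`, and `predict` are hypothetical names, and the feature and sample values are toy data.

```python
from typing import Dict, List, Sequence, Tuple

CallSeq = Tuple[str, str]


def to_vector(sample_seqs: Sequence[CallSeq], features: List[CallSeq]) -> List[int]:
    """Second feature vector: 1 where the sample contains a classification feature."""
    present = set(sample_seqs)
    return [1 if f in present else 0 for f in features]


def train_centroids(vectors: List[List[int]], labels: List[str]) -> Dict[str, List[float]]:
    """Stand-in 'training': the mean feature vector (centroid) of each label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], vec: List[int]) -> str:
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(label: str) -> float:
        return sum((c - v) ** 2 for c, v in zip(centroids[label], vec))
    return min(centroids, key=sq_dist)


if __name__ == "__main__":
    features = [("api_a", "api_b"), ("api_c", "api_d")]  # hypothetical classification features
    samples = [[("api_a", "api_b")], [("api_a", "api_b"), ("x", "y")], [("api_c", "api_d")]]
    labels = ["malicious", "malicious", "benign"]
    model = train_centroids([to_vector(s, features) for s in samples], labels)
    print(predict(model, to_vector([("api_a", "api_b")], features)))  # prints malicious
```

Any classifier that accepts fixed-length numeric vectors could replace the centroid step without changing how the vectors are built.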
3. The method of claim 2, wherein obtaining a second sensitive API call sequence of the APK file samples according to the second function call graph comprises:
obtaining a second sensitive API call graph of the APK file samples according to the second function call graph; and
obtaining the second sensitive API call sequence according to the second sensitive API call graph.
4. The method of claim 3, wherein obtaining the second sensitive API call graph for the APK file samples from the second function call graph comprises:
matching nodes in the second function call graph against API nodes in a preset sensitive API set, and determining a node in the second function call graph as a second non-sensitive API node when it does not match any API node in the sensitive API set; and
deleting the second non-sensitive API nodes and their corresponding edges to obtain the second sensitive API call graph of the APK file samples.
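The pruning in claim 4 can be sketched as below, assuming the function call graph is held as a mapping from each caller node to the set of its callees and that "matching" means exact equality of node names. The function name `prune_to_sensitive` and the toy node names are hypothetical.

```python
from typing import Dict, Set

CallGraph = Dict[str, Set[str]]  # caller -> callees


def prune_to_sensitive(fcg: CallGraph, sensitive_apis: Set[str]) -> CallGraph:
    """Delete every node not found in the sensitive API set, plus its edges."""
    pruned: CallGraph = {}
    for caller, callees in fcg.items():
        if caller not in sensitive_apis:
            continue  # non-sensitive node: removed with its outgoing edges
        # keep only edges whose callee is also sensitive
        pruned[caller] = {c for c in callees if c in sensitive_apis}
    # sensitive nodes that only appear as callees are kept as nodes without out-edges
    for callees in fcg.values():
        for callee in callees:
            if callee in sensitive_apis:
                pruned.setdefault(callee, set())
    return pruned


if __name__ == "__main__":
    toy_fcg = {"main": {"api_a", "helper"}, "helper": {"api_b"}, "api_a": {"api_b"}}
    print(prune_to_sensitive(toy_fcg, {"api_a", "api_b"}))
    # prints {'api_a': {'api_b'}, 'api_b': set()}
```

In this literal reading, the path `main -> helper -> api_b` disappears together with the non-sensitive nodes; only direct sensitive-to-sensitive edges survive.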
5. The method of claim 3, wherein obtaining the second sensitive API call sequence from the second sensitive API call graph comprises:
adding, whenever at least one edge exists between two second sensitive API nodes in the second sensitive API call graph, a directed edge between the two corresponding second sensitive API nodes, to obtain the second sensitive API call sequence.
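Claim 5 does not fix the length of a call sequence. As a minimal sketch, assume each directed edge of the second sensitive API call graph is read off as one two-element sequence (caller API, callee API); sorting only makes the output deterministic. The function name `call_sequences` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

SensitiveCallGraph = Dict[str, Set[str]]  # sensitive caller API -> sensitive callee APIs
CallSeq = Tuple[str, str]


def call_sequences(graph: SensitiveCallGraph) -> List[CallSeq]:
    """One directed (caller, callee) sequence per edge between sensitive API nodes."""
    return [
        (caller, callee)
        for caller in sorted(graph)          # sorted only for deterministic output
        for callee in sorted(graph[caller])
    ]


if __name__ == "__main__":
    toy_graph = {"api_a": {"api_b", "api_c"}, "api_b": {"api_c"}, "api_c": set()}
    print(call_sequences(toy_graph))
    # prints [('api_a', 'api_b'), ('api_a', 'api_c'), ('api_b', 'api_c')]
```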
6. The method of claim 2, wherein obtaining the classification characteristic of the APK file sample according to the second sensitive API call sequence comprises:
obtaining the support degree of the second sensitive API call sequence; and
determining a second sensitive API call sequence whose support degree meets a preset condition as a classification feature of the APK file samples.
7. The method of claim 6, wherein the support degree of the second sensitive API call sequence is obtained by calculating
$$w\text{-}supp_k(s_i) = S_k(s_i) \times A_k(s_i)$$
wherein $w\text{-}supp_k(s_i)$ is the support degree of the second sensitive API call sequence $s_i$ in the APK file samples of type $k$; $S_k(s_i)$ is the frequency of occurrence of $s_i$ in the second sensitive API call sequences of all APK file samples of type $k$; $A_k(s_i)$ is the frequency of occurrence of $s_i$ in all APK file samples of type $k$; and type $k$ is either the malicious APK file samples or the benign APK file samples.
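The support degree of claims 6 and 7 can be computed as below. The claims call $S_k(s_i)$ and $A_k(s_i)$ frequencies without fixing their normalization, so this sketch assumes $S_k(s_i)$ is the share of $s_i$ among all call sequences pooled over the type-$k$ samples and $A_k(s_i)$ is the share of type-$k$ samples that contain $s_i$. The "preset condition" of claim 6 is assumed to be a minimum-support threshold `min_support`. The names `weighted_support` and `select_features` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

CallSeq = Tuple[str, str]


def weighted_support(samples: Sequence[List[CallSeq]]) -> Dict[CallSeq, float]:
    """w-supp_k(s) = S_k(s) * A_k(s) for every sequence s seen in the type-k samples.

    Assumed normalizations (the claim does not fix them):
      S_k(s) = occurrences of s / all sequences pooled over the type-k samples
      A_k(s) = type-k samples containing s / number of type-k samples
    """
    pooled = Counter(seq for sample in samples for seq in sample)
    per_sample = Counter(seq for sample in samples for seq in set(sample))
    total_sequences = sum(pooled.values())
    n_samples = len(samples)
    # the comprehension is empty (no division) when there are no sequences
    return {
        seq: (count / total_sequences) * (per_sample[seq] / n_samples)
        for seq, count in pooled.items()
    }


def select_features(samples: Sequence[List[CallSeq]], min_support: float) -> Set[CallSeq]:
    """Claim 6 'preset condition', assumed here to be w-supp >= min_support."""
    return {seq for seq, w in weighted_support(samples).items() if w >= min_support}


if __name__ == "__main__":
    malicious = [[("a", "b"), ("a", "b"), ("b", "c")], [("a", "b")]]
    print(weighted_support(malicious))  # {('a', 'b'): 0.75, ('b', 'c'): 0.125}
    print(select_features(malicious, 0.5))  # {('a', 'b')}
```

The product rewards sequences that are both frequent overall ($S_k$) and spread across many samples ($A_k$), so a sequence repeated many times inside a single APK does not dominate the ranking on its own.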
8. The method according to any one of claims 2 to 7, further comprising, after obtaining the classification features of the APK file samples according to the second sensitive API call sequence:
obtaining, from the classification features, the intersection of the set of sensitive API call sequences of the malicious APK file samples and the set of sensitive API call sequences of the benign APK file samples; and
deleting the intersection from the classification features.
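Claim 8 can be sketched with plain set operations, assuming the classification features are held as one set of call sequences per sample type. Removing the intersection leaves only sequences selected for exactly one type. The function name `drop_shared_features` is hypothetical.

```python
from typing import Set, Tuple

CallSeq = Tuple[str, str]


def drop_shared_features(malicious: Set[CallSeq], benign: Set[CallSeq]) -> Set[CallSeq]:
    """Classification features with the malicious/benign intersection deleted."""
    shared = malicious & benign           # sequences selected for both sample types
    return (malicious | benign) - shared  # same result as malicious ^ benign


if __name__ == "__main__":
    mal = {("a", "b"), ("b", "c")}
    ben = {("b", "c"), ("c", "d")}
    print(sorted(drop_shared_features(mal, ben)))  # [('a', 'b'), ('c', 'd')]
```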
9. An apparatus for malicious code detection, comprising a processor and a memory storing program instructions, wherein the processor is configured to perform the method for malicious code detection according to any of claims 1 to 8 when executing the program instructions.
10. A device comprising the apparatus for malicious code detection of claim 9.
CN202011593644.8A 2020-12-29 2020-12-29 Method, device and equipment for malicious code detection Pending CN112651024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011593644.8A CN112651024A (en) 2020-12-29 2020-12-29 Method, device and equipment for malicious code detection

Publications (1)

Publication Number Publication Date
CN112651024A (en) 2021-04-13

Family

ID=75363773

Country Status (1)

Country Link
CN (1) CN112651024A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120290649A1 (en) * 2011-05-10 2012-11-15 Telefonica, S.A. Method of characterizing a social network communication using motifs
CN106951780A (en) * 2017-02-08 2017-07-14 Institute of Information Engineering, Chinese Academy of Sciences Static detection method and device for repackaged malicious applications
CN107169355A (en) * 2017-04-28 2017-09-15 Beijing Institute of Technology Worm homology analysis method and apparatus
WO2017193036A1 (en) * 2016-05-05 2017-11-09 Cylance Inc. Machine learning model for malware dynamic analysis
CN109753800A (en) * 2019-01-02 2019-05-14 Chongqing University of Posts and Telecommunications Android malicious application detection method and system fusing frequent item sets and the random forest algorithm
CN110263538A (en) * 2019-05-13 2019-09-20 Chongqing University Malicious code detection method based on system behavior sequences
CN111324893A (en) * 2020-02-17 2020-06-23 University of Electronic Science and Technology of China Detection method and backend system for Android malware based on sensitive patterns
CN111400708A (en) * 2020-03-11 2020-07-10 Chongqing University Method and device for malicious code detection

Non-Patent Citations (7)

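Title
孙伟; 孙雅杰; 夏孟友: "A static detection method for repackaged Android malicious applications", Journal of Information Security Research, no. 08, 5 August 2017 (2017-08-05), pages 692-700 *
杨吉云; 范佳文; 周洁; 高凌云: "Android malicious code detection method incorporating behavior patterns", Journal of Frontiers of Computer Science and Technology, vol. 16, no. 8, 31 August 2022 (2022-08-31), pages 1792-1799 *
杨吉云; 陈钢; 鄢然; 吕建斌: "An Android malicious code detection method based on *** behavior sequence features", Journal of Chongqing University, vol. 43, no. 9, 30 September 2020 (2020-09-30), pages 54-63 *
梁俊鹏: "Android malicious application detection method based on an API frequent pattern mining algorithm", Journal of Chongqing University of Arts and Sciences (Social Sciences Edition), no. 05, 10 September 2016 (2016-09-10), pages 93-97 *
范铭; 刘烃; 刘均; 罗夏朴; 于乐; 管晓宏: "A survey of Android malware detection methods", Scientia Sinica Informationis, no. 08, 31 August 2020 (2020-08-31), pages 1148-1177 *
荣俸萍; 方勇; 左政; 刘亮: "MACSPMD: Malicious code detection based on malicious API call sequence pattern mining", Computer Science, no. 05, 15 May 2018 (2018-05-15), pages 131-138 *
黄琨茗; 张磊; 赵奎; 刘亮: "Malicious code detection based on longest frequent sequence mining", Journal of Sichuan University (Natural Science Edition), no. 04, 28 July 2020 (2020-07-28), pages 681-688 *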

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN112989347A (en) * 2021-04-15 2021-06-18 Chongqing University Method, device and equipment for identifying malicious software
CN114925364A (en) * 2022-05-19 2022-08-19 Chongqing University Android malicious application detection method based on reconstructed API
CN114925364B (en) * 2022-05-19 2024-06-07 Chongqing University Android malicious application detection method based on reconstructed API
CN117574371A (en) * 2023-11-28 2024-02-20 Xinjiang Exit and Entry Frontier Inspection General Station of the People's Republic of China (Border Management Corps of the Public Security Department of Xinjiang Uygur Autonomous Region) Malicious code detection system for entropy sensitive calling feature of edge computing platform
CN117354067A (en) * 2023-12-06 2024-01-05 Nanjing Xianwei Information Technology Co., Ltd. Malicious code detection method and system
CN117354067B (en) * 2023-12-06 2024-02-23 Nanjing Xianwei Information Technology Co., Ltd. Malicious code detection method and system

Similar Documents

Publication Publication Date Title
EP3654217B1 (en) Malware detection
CN112651024A (en) Method, device and equipment for malicious code detection
US8769692B1 (en) System and method for detecting malware by transforming objects and analyzing different views of objects
CN109586282B (en) Power grid unknown threat detection system and method
Zhao et al. A review of computer vision methods in network security
CN107547495B (en) System and method for protecting a computer from unauthorized remote management
CN111371778B (en) Attack group identification method, device, computing equipment and medium
CN111368289B (en) Malicious software detection method and device
CN109756467B (en) Phishing website identification method and device
Malisa et al. Mobile application impersonation detection using dynamic user interface extraction
CN113221032A (en) Link risk detection method, device and storage medium
Bai et al. DBank: Predictive Behavioral Analysis of Recent Android Banking Trojans
CN111400708B (en) Method and device for malicious code detection
Visu et al. Software-defined forensic framework for malware disaster management in Internet of Thing devices for extreme surveillance
EP3113065B1 (en) System and method of detecting malicious files on mobile devices
CN109800569A (en) Program identification method and device
CN112784269A (en) Malicious software detection method and device and computer storage medium
Ndagi et al. Machine learning classification algorithms for adware in android devices: a comparative evaluation and analysis
US11423099B2 (en) Classification apparatus, classification method, and classification program
CN108229168B (en) Heuristic detection method, system and storage medium for nested files
CN113190847A (en) Confusion detection method, device, equipment and storage medium for script file
CN112966264A (en) XSS attack detection method, device, equipment and machine-readable storage medium
CN114817913A (en) Code detection method and device, computer equipment and storage medium
CN111881446A (en) Method and device for identifying malicious codes of industrial internet
JP7031438B2 (en) Information processing equipment, control methods, and programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination