CN117591119A - Mass APK source code feature extraction and similarity analysis method - Google Patents

Info

Publication number
CN117591119A
CN117591119A CN202311441226.0A CN202311441226A CN117591119A CN 117591119 A CN117591119 A CN 117591119A CN 202311441226 A CN202311441226 A CN 202311441226A CN 117591119 A CN117591119 A CN 117591119A
Authority
CN
China
Prior art keywords
similarity
source code
apk
file
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311441226.0A
Other languages
Chinese (zh)
Other versions
CN117591119B (en)
Inventor
段东圣
侯炜
张露晨
佟玲玲
段运强
秦韬
李美燕
任博雅
鲁睿
张林波
孙旷怡
陈新兴
张绪川
王鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Priority to CN202311441226.0A
Publication of CN117591119A
Application granted
Publication of CN117591119B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/53 Decompilation; Disassembly
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Stored Programmes (AREA)

Abstract

The invention relates to the technical field of software detection and discloses a method for massive APK source code feature extraction and similarity analysis. Two APK files are first input; the AndroidManifest file and the localized language configuration files of each APK package are extracted by source code analysis and decompilation, and the SMALI or JAVA source code is extracted. The APK core source code directory, third-party package directories and system resource directories are identified by package name index, startup class index and fixed directory identification, and a source code tree is generated. The files in the core source code directory are analyzed, a file HASH is calculated, and the string declaration feature representation in each source code file is extracted as a weighting feature. The similarity of the two source code tree structures to be analyzed is then calculated, with different weights applied according to the type of source code directory. The method reduces analysis resource consumption and time, improves the accuracy of source code similarity analysis, and enables high-performance analysis in large-scale APK data analysis scenarios.

Description

Mass APK source code feature extraction and similarity analysis method
Technical Field
The invention relates to the technical field of software detection, in particular to a method for extracting massive APK source code characteristics and analyzing the similarity.
Background
In the technical field of APK (Android application package) source code similarity analysis, significant progress has been made in recent years, specifically in the following areas:
1. Code comparison algorithms: more efficient and accurate code comparison algorithms have been developed for comparing and analyzing similarities between APK source codes. These algorithms can identify differences between versions of an application and identify reused code segments.
2. Code clone detection: clone detection techniques can identify cloned code segments, i.e., duplicated code, in APK source code. This is important for code maintenance and refactoring, and helps developers reduce repetitive work and improve code quality.
3. Feature extraction and representation: researchers have proposed various feature extraction and representation methods for capturing similar features in APK source code. For example, an AST (abstract syntax tree) is used to represent code structure, and TF-IDF (term frequency-inverse document frequency) is used to represent keywords in code.
4. Machine learning and deep learning: machine learning and deep learning techniques are applied to APK source code similarity analysis to improve the accuracy of similarity matching and detection. For example, Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs) may be used to learn representations of APK source code and their similarity.
Existing related techniques include the following:
One existing approach calculates, from space vectors and a cosine similarity method, a cosine similarity that represents the similarity of source code, thereby helping a development team identify source code with repeated or similar logic and providing a basis for decisions in scenarios such as code refactoring and service merging.
1) At present, mainstream APP source code similarity analysis algorithms generally analyze the similarity of APP packages and source code using the AndroidManifest file content and a source code diff algorithm. All source code files of the APP are obtained by decompilation, each source code file is traversed and compared line by line with the diff algorithm, and the surrounding context is also matched. Such similarity analysis consumes considerable computing resources and is slow; it is generally suited to content management scenarios such as Git and SVN, and is not suited to massive APP analysis scenarios.
2) Current mainstream source code similarity analysis techniques mainly compare the content similarity of two source code files and are not readily applicable to massive numbers of APPs in practical business applications. Most such techniques operate on a pair of source code files, whereas an APP is a package combining a large number of source code and resource files, so they are difficult to apply directly in this scenario. In addition, during similarity analysis of APK packages, packing (shelling) and obfuscation change the file names, variable names and business logic of the APP source code; the content of the same source code differs after packing or obfuscation, and it is difficult to restore the source code to its original state through reverse engineering and unpacking. As a result, source code similarity analysis techniques cannot guarantee stable APP similarity results. To address these problems, a method for massive APK source code feature extraction and similarity analysis is needed.
Disclosure of Invention
The invention aims to provide a method for massive APK source code feature extraction and similarity analysis. By extracting the AndroidManifest file of the APK, constructing the directory structure and source code file map of the APK package, optimizing the analysis process and increasing the weight of specific items, the invention reduces analysis resource consumption and time, improves the accuracy of source code similarity analysis, and enables high-performance analysis in large-scale APK data analysis scenarios.
The invention is realized in the following way:
The invention provides a method for massive APK source code feature extraction and similarity analysis, which comprises the following steps:
S1: First, two APK files are input. The AndroidManifest file and the localized language configuration files of each APK package are extracted by source code analysis and decompilation, and the SMALI or JAVA source code is extracted. To extract the AndroidManifest file, the APK is decompiled with the existing APK analysis tools apktool and jadx; if an exception occurs during decompilation, the APK information is extracted by decompressing the archive and parsing it according to the Android package structure specification, and the decompiled smali source code and the AndroidManifest file are finally output.
S2: The APK core source code directory, third-party package directories and system resource directories are identified by package name index, startup class index and fixed directory identification, and a source code tree is generated. A directory feature set is compiled from the default source file organization of Android Studio, the mainstream IDE, and from community conventions. The core code directory is resolved layer by layer through the package name and the startup class structure, that is, through the naming pattern of the package name and the location of the startup class;
S3: The files in the core source code directory are analyzed, a file HASH is calculated, and the string declaration feature representation in each source code file is extracted as a weighting feature. The AndroidManifest file includes the APP name, package name, permissions, attributes and service declarations. Step S3 specifically comprises the following steps:
S3.1: First, the input configuration file is tokenized: it is split into individual vocabulary units by attribute or name, and symbols or characters without specific meaning are filtered out;
S3.2: Feature extraction: for each vocabulary unit, its hash value is calculated and multiplied by a weight, which is set according to the importance or frequency of the vocabulary unit;
S3.3: Feature combination: the feature vectors of all vocabulary units are combined, so that a fixed-length vector represents the whole text;
S3.4: Computing the SimHash: the combined feature vectors are weighted and summed; each position whose value is greater than 0 is set to 1, and otherwise to -1, finally yielding a binary SimHash value;
S3.5: The SimHash values of different texts are compared, and the similarity of two SimHash values is measured by their Hamming distance.
The source code feature representation is obtained by collecting the variables and attributes in the smali or java source code and aggregating three elements of each collected variable (name, type and occurrence frequency) into the feature representation word set of the current source code file.
The similarity of source code feature representations is calculated by comparing the coverage of the variable intersection and checking the consistency of the occurrence frequencies and types of the variables; if the variable intersection exceeds a threshold of 70%, the two source code feature representations are considered similar.
S4: The similarity of the two source code tree structures to be analyzed is calculated, and the similarity is weighted differently according to the type of source code directory: a similar core source code directory has weight +2, a third-party package directory +1, and a system resource directory 0;
S5: The source code files at the end nodes of each tree are evaluated: identical source code files have weight +2, and source code files with similar feature representations have weight +1. Specifically:
S5.1: First, the node similarity weight is calculated for the source code file at each end node of each tree;
S5.2: Whether the file HASH values are the same is determined; if so, the weight is +2, otherwise +0;
S5.3: Whether the file feature representations are similar is determined; if so, the weight is +1, otherwise +0;
S5.4: The weight is output.
S6: The similarity of the two trees is calculated by averaging a bidirectional comparison to produce the source code tree similarity. Specifically, s1 is the coverage of tree A in tree B, s2 is the coverage of tree B in tree A, and the directory structure similarity S is calculated and output as (s1+s2)/2, as shown in formula (1);
S7: The SimHash algorithm is used to analyze the similarity of the AndroidManifest files and of the localized language configurations of the two APPs, and the similarity ratio is output by calculating the Hamming distance. If the two input APPs are A and B, the AndroidManifest file is denoted C (Config), giving Ca and Cb, and the localized language configuration is denoted L (Language), giving La and Lb. The similarity attributes S_C and S_L are output by comparing the AndroidManifest files and the language configuration files of the two APPs, as shown in formulas (2) and (3);
$$S_C = \mathrm{similarity}(\mathrm{simhash}(C_a), \mathrm{simhash}(C_b)) \tag{2}$$
$$S_L = \mathrm{similarity}(\mathrm{simhash}(L_a), \mathrm{simhash}(L_b)) \tag{3}$$
S8: Finally, the tree structure similarity, the AndroidManifest similarity and the localized language configuration similarity are combined in the ratio x:y:z, where x, y and z are the weight coefficients of the three similarities, set according to the importance and contribution of each similarity result in the final similarity calculation. The final APP similarity S is calculated by weighted summation, as shown in formula (4);
Further, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a main controller, implements the method described in any one of the above.
Compared with the prior art, the invention has the following beneficial effects:
1. By extracting the AndroidManifest file of the APK, constructing the directory structure and source code file map of the APK package, optimizing the analysis process and increasing the weight of specific items, the invention reduces analysis resource consumption and time, improves the accuracy of source code similarity analysis, and enables high-performance analysis in large-scale APK data analysis scenarios.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some examples of the present invention and should therefore not be considered as limiting its scope; a person skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of the present invention for computing end nodes of each tree;
FIG. 3 is a code operation diagram of the present invention for obtaining variable names, types, and occurrence frequencies of a current source code file.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. The following detailed description of the embodiments, as presented in the figures, is therefore not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
Referring to FIGS. 1-3, a method for massive APK source code feature extraction and similarity analysis comprises the following steps. S1: First, two APK files are input. The AndroidManifest file and the localized language configuration files of each APK package are extracted by source code analysis and decompilation, and the SMALI or JAVA source code is extracted. To extract the AndroidManifest file, the APK is decompiled with the existing APK analysis tools apktool and jadx; if an exception occurs during decompilation, the APK information is extracted by decompressing the archive and parsing it according to the Android package structure specification, and the decompiled smali source code and the AndroidManifest file are finally output.
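The fallback described in step S1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `extract_apk` is hypothetical, only the `apktool d` command is attempted (jadx is omitted), and the fallback merely unzips the archive instead of parsing it according to the Android package structure specification.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path


def extract_apk(apk_path, out_dir, timeout=600):
    """Decode an APK with apktool if possible, else fall back to plain unzip.

    Returns the name of the method that produced the output
    ("apktool" or "unzip"). Hypothetical helper for illustration only.
    """
    apk, out = Path(apk_path), Path(out_dir)
    if shutil.which("apktool"):  # only try the tool when it is on PATH
        try:
            subprocess.run(
                ["apktool", "d", "-f", "-o", str(out), str(apk)],
                check=True, capture_output=True, timeout=timeout,
            )
            return "apktool"
        except (subprocess.CalledProcessError,
                subprocess.TimeoutExpired, OSError):
            pass  # decompilation failed: fall through to the unzip fallback
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(apk) as archive:  # an APK is a ZIP archive
        archive.extractall(out)
    return "unzip"
```

In a real APK the unzipped `AndroidManifest.xml` is stored in Android's binary XML format, so the fallback path would still need a decoding step before the manifest can be tokenized.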
S2: The APK core source code directory, third-party package directories and system resource directories are identified by package name index, startup class index and fixed directory identification, and a source code tree is generated. A directory feature set is compiled from the default source file organization of Android Studio, the mainstream IDE, and from community conventions. The core code directory is resolved layer by layer through the package name and the startup class structure, that is, through the naming pattern of the package name and the location of the startup class;
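The directory identification in step S2 might look like the sketch below. The names `SYSTEM_ROOTS` and `classify_directory`, and the contents of the directory feature set, are assumptions made for illustration; the patent does not list its feature set, and this sketch treats every directory that is neither core nor system as a third-party package.

```python
# Hypothetical directory feature set; a real one would be compiled from
# Android Studio defaults and community conventions.
SYSTEM_ROOTS = ("android", "androidx", "java", "javax", "kotlin")


def _is_under(path, root):
    """True if path is the root directory itself or lies below it."""
    return path == root or path.startswith(root + "/")


def classify_directory(path, package_name, startup_class=None):
    """Label a source directory as 'core', 'system' or 'third_party'."""
    path = path.strip("/")
    # The package name gives the first core root, e.g. com/example/app.
    core_roots = {package_name.replace(".", "/")}
    if startup_class:  # e.g. "com.example.app.ui.MainActivity"
        # The directory holding the startup class is also treated as core.
        core_roots.add(startup_class.rsplit(".", 1)[0].replace(".", "/"))
    if any(_is_under(path, root) for root in core_roots):
        return "core"
    if any(_is_under(path, root) for root in SYSTEM_ROOTS):
        return "system"
    return "third_party"
```

For example, `classify_directory("com/example/app/ui", "com.example.app")` yields `"core"`, while a path such as `okhttp3/internal` falls through to `"third_party"`.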
S3: The files in the core source code directory are analyzed, a file HASH is calculated, and the string declaration feature representation in each source code file is extracted as a weighting feature. The AndroidManifest file includes the APP name, package name, permissions, attributes and service declarations. Step S3 specifically comprises the following steps:
S3.1: First, the input configuration file is tokenized: it is split into individual vocabulary units by attribute or name, and symbols or characters without specific meaning are filtered out;
S3.2: Feature extraction: for each vocabulary unit, its hash value is calculated and multiplied by a weight, which is set according to the importance or frequency of the vocabulary unit;
S3.3: Feature combination: the feature vectors of all vocabulary units are combined, so that a fixed-length vector represents the whole text;
S3.4: Computing the SimHash: the combined feature vectors are weighted and summed; each position whose value is greater than 0 is set to 1, and otherwise to -1, finally yielding a binary SimHash value;
S3.5: The SimHash values of different texts are compared, and the similarity of two SimHash values is measured by their Hamming distance.
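Steps S3.1 to S3.5 can be illustrated with a small SimHash sketch. Several choices here are assumptions rather than details from the patent: the tokenizer regex, the use of token frequency as the weight, MD5 truncated to 64 bits as the token hash, and storing non-positive positions as 0 bits (instead of -1) so that the fingerprint fits in one integer.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # fingerprint length; an assumed value


def tokenize(text):
    """S3.1: split into vocabulary units and drop meaningless symbols."""
    return re.findall(r"[A-Za-z0-9_.]+", text)


def simhash(text):
    """S3.2-S3.4: weighted bit voting over the hashed tokens."""
    acc = [0] * BITS
    for token, weight in Counter(tokenize(text)).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit token hash
        for i in range(BITS):
            # A set bit votes +weight, a clear bit votes -weight.
            acc[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, value in enumerate(acc):
        if value > 0:  # positive sum -> bit 1, otherwise bit stays 0
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """S3.5: number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def similarity(a, b):
    """Map the Hamming distance to a similarity ratio in [0, 1]."""
    return 1.0 - hamming_distance(a, b) / BITS
```

Two identical configuration texts produce the same fingerprint and therefore a similarity of 1.0; the ratio decreases as more bit positions differ.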
The source code feature representation is obtained by collecting the variables and attributes in the smali or java source code and aggregating three elements of each collected variable (name, type and occurrence frequency) into the feature representation word set of the current source code file. Sample feature representation data are shown in Table 1:
Table 1. Sample feature representation data
The similarity of source code feature representations is calculated by comparing the coverage of the variable intersection and checking the consistency of the occurrence frequencies and types of the variables; if the variable intersection exceeds a threshold of 70%, the two source code feature representations are considered similar.
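A possible reading of this comparison is sketched below. The data layout (a dict mapping variable name to a `(type, frequency)` pair), the function name `features_similar`, and the choice of the larger variable set as the coverage denominator are all assumptions; the patent only states the 70% threshold.

```python
def features_similar(a, b, threshold=0.7):
    """Compare two feature representation word sets.

    a, b: dict mapping variable name -> (type, occurrence frequency).
    A variable counts as shared only if its name, type and frequency
    all agree. Coverage is measured against the larger set (assumed).
    """
    if not a or not b:
        return False
    shared = [name for name in a.keys() & b.keys() if a[name] == b[name]]
    coverage = len(shared) / max(len(a), len(b))
    return coverage > threshold
```

With four variables per file, three fully matching variables give a coverage of 0.75 and count as similar, while two give 0.5 and do not.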
S4: The similarity of the two source code tree structures to be analyzed is calculated, and the similarity is weighted differently according to the type of source code directory: a similar core source code directory has weight +2, a third-party package directory +1, and a system resource directory 0;
S5: The source code files at the end nodes of each tree are evaluated: identical source code files have weight +2, and source code files with similar feature representations have weight +1. Specifically:
S5.1: First, the node similarity weight is calculated for the source code file at each end node of each tree;
S5.2: Whether the file HASH values are the same is determined; if so, the weight is +2, otherwise +0;
S5.3: Whether the file feature representations are similar is determined; if so, the weight is +1, otherwise +0;
S5.4: The weight is output.
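The end-node weighting of steps S5.1 to S5.4 can be sketched as follows. The helper names and the use of SHA-256 as the file HASH are assumptions, and the sketch applies both checks in sequence, so a file pair with identical hashes and similar features scores 3; the patent does not say whether the two weights are meant to accumulate.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """File HASH of a source file; SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


def leaf_weight(hash_a: str, hash_b: str, features_are_similar: bool) -> int:
    """S5.2-S5.4: node similarity weight for one pair of end-node files."""
    weight = 0
    if hash_a == hash_b:        # S5.2: identical file content
        weight += 2
    if features_are_similar:    # S5.3: similar feature representation
        weight += 1
    return weight               # S5.4: output the weight
```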
S6: The similarity of the two trees is calculated by averaging a bidirectional comparison to produce the source code tree similarity. Specifically, s1 is the coverage of tree A in tree B, s2 is the coverage of tree B in tree A, and the directory structure similarity S is calculated and output as (s1+s2)/2, as shown in formula (1);
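The bidirectional coverage average of step S6, combined with the directory-type weights of step S4, could be sketched like this. Representing each tree as a dict from directory path to directory type, and defining coverage as the matched weight divided by the total weight, are assumptions; the patent does not define coverage precisely.

```python
# Directory-type weights from step S4.
DIR_WEIGHT = {"core": 2, "third_party": 1, "system": 0}


def coverage(tree_a, tree_b):
    """Weighted share of tree_a's directories that also occur in tree_b.

    Each tree maps a directory path to its type ('core', 'third_party'
    or 'system'). This coverage definition is an assumption.
    """
    total = sum(DIR_WEIGHT[kind] for kind in tree_a.values())
    if total == 0:
        return 0.0
    hit = sum(DIR_WEIGHT[kind] for path, kind in tree_a.items()
              if path in tree_b)
    return hit / total


def tree_similarity(tree_a, tree_b):
    """Formula (1): average of the two directional coverages."""
    s1 = coverage(tree_a, tree_b)
    s2 = coverage(tree_b, tree_a)
    return (s1 + s2) / 2
```

Averaging both directions keeps the score symmetric: a small APP fully contained in a large one does not score 1.0, because the large tree is only partly covered by the small one.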
S7: The SimHash algorithm is used to analyze the similarity of the AndroidManifest files and of the localized language configurations of the two APPs, and the similarity ratio is output by calculating the Hamming distance. If the two input APPs are A and B, the AndroidManifest file is denoted C (Config), giving Ca and Cb, and the localized language configuration is denoted L (Language), giving La and Lb. The similarity attributes S_C and S_L are output by comparing the AndroidManifest files and the language configuration files of the two APPs, as shown in formulas (2) and (3);
$$S_C = \mathrm{similarity}(\mathrm{simhash}(C_a), \mathrm{simhash}(C_b)) \tag{2}$$
$$S_L = \mathrm{similarity}(\mathrm{simhash}(L_a), \mathrm{simhash}(L_b)) \tag{3}$$
S8: Finally, the tree structure similarity, the AndroidManifest similarity and the localized language configuration similarity are combined in the ratio x:y:z, where x, y and z are the weight coefficients of the three similarities, set according to the importance and contribution of each similarity result in the final similarity calculation. The final APP similarity S is calculated by weighted summation, as shown in formula (4);
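The weighted summation of step S8 can be sketched as below. Formula (4) is not reproduced in this text, so the normalization by x + y + z and the default coefficient values are assumptions chosen so that the result stays in [0, 1] when the three inputs do.

```python
def final_similarity(s_tree, s_c, s_l, x=0.6, y=0.3, z=0.1):
    """Combine the three similarities in the ratio x:y:z.

    s_tree: source code tree similarity (step S6)
    s_c:    AndroidManifest similarity (formula (2))
    s_l:    localized language configuration similarity (formula (3))
    The default weights and the normalization are hypothetical.
    """
    total = x + y + z
    if total <= 0:
        raise ValueError("weight coefficients must sum to a positive value")
    return (x * s_tree + y * s_c + z * s_l) / total
```

For instance, with the ratio 2:1:1 and inputs 0.8, 0.5 and 1.0, the combined score is (1.6 + 0.5 + 1.0) / 4 = 0.775.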
In this embodiment, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a main controller, implements the method described in any one of the above.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A method for massive APK source code feature extraction and similarity analysis, characterized in that the method comprises the following steps:
S1: first, inputting two APK files, extracting the AndroidManifest file and the localized language configuration files of each APK package by source code analysis and decompilation, and extracting the SMALI or JAVA source code;
S2: identifying the APK core source code directory, third-party package directories and system resource directories by package name index, startup class index and fixed directory identification, and generating a source code tree;
S3: analyzing the files in the core source code directory, calculating a file HASH, and extracting the string declaration feature representation in each source code file as a weighting feature;
S4: calculating the similarity of the two source code tree structures to be analyzed, and weighting the similarity differently according to the type of source code directory, wherein a similar core source code directory has weight +2, a third-party package directory +1, and a system resource directory 0;
S5: evaluating the source code files at the end nodes of each tree, wherein identical source code files have weight +2 and source code files with similar feature representations have weight +1;
S6: calculating the similarity of the two trees by averaging a bidirectional comparison to produce the source code tree similarity, specifically obtaining s1 as the coverage of tree A in tree B, obtaining s2 as the coverage of tree B in tree A, and calculating and outputting the directory structure similarity S as (s1+s2)/2, as shown in formula (1);
S7: analyzing, with the SimHash algorithm, the similarity of the AndroidManifest files and of the localized language configurations of the two APPs, outputting the similarity ratio by calculating the Hamming distance, and outputting the similarity attributes S_C and S_L, as shown in formulas (2) and (3);
$$S_C = \mathrm{similarity}(\mathrm{simhash}(C_a), \mathrm{simhash}(C_b)) \tag{2}$$
$$S_L = \mathrm{similarity}(\mathrm{simhash}(L_a), \mathrm{simhash}(L_b)) \tag{3}$$
S 8 : finally, three data of tree structure similarity, android management similarity and localization language configuration similarity are weighted and summed according to the proportion of x to y to z to calculate output similarity, and x, y and z are weight coefficients of three similarity of tree structure similarity, android management similarity and localization language configuration similarity, and final APP similarity S is calculated through weighted and summed; as shown in formula (4);
2. the method for extracting and analyzing characteristics and similarity of massive APK source codes according to claim 1, wherein in step S 5 Specifically, the method comprises the following steps:
S 5.1 : firstly, calculating node similarity weight of a source code file of an end node of each tree;
S 5.2 : judging whether the file HASH is the same, if so, weighting by +2, otherwise, weighting by +0;
S 5.3 : judging whether the file characteristic representations are similar or not, if so, weighting by +1, otherwise, weighting by +0;
S 5.4 : and outputting the weight.
3. The method for extracting and analyzing characteristics of massive APK source codes and the similarity according to claim 1 is characterized in that in step S1, an android maniffect file of an APK package is extracted through a source code analysis decompilation method, APK is decompiled through an apktool and jadx existing APK analysis tool, if abnormality occurs in the decompilation process, APK information is extracted through compression package decompression and then based on an android package body structure specification analysis mode, decompiled to a smali source code is finally output, and the android maniffect file is obtained.
4. The method for extracting and analyzing characteristics and similarity of massive APK source codes according to claim 1, wherein in step S 2 Summarizing and constructing a catalog feature set based on a source code file organization mode of Android Studio mainstream IDE defaults and community consensus; and analyzing the core code file catalogue layer by layer through the package name and the starting class structure, and analyzing the core code catalogue through the naming mode of the package name and the position of the starting class.
5. The method for extracting and analyzing characteristics and similarity of massive APK source codes according to claim 1, wherein the android management file includes APP name, package name, authority, attribute, service statement, in step S 3 Specifically, the method comprises the following steps:
S 3.1 : firstly, word segmentation is carried out on an input configuration file, the configuration file is segmented into individual vocabulary units according to attributes or names, and symbols or characters without specific meanings are filtered;
S 3.2 : for each vocabulary unit, calculating a hash value of the vocabulary unit, multiplying the hash value by a weight value, setting the weight value according to the importance or frequency of the vocabulary, and extracting the characteristics;
S 3.3 : combining the feature vectors of each vocabulary unit, and using a vector with a fixed length to represent the whole text for feature combination;
S 3.4 : computing SimHash: weighting and summarizing the combined feature vectors, and setting the corresponding position of each feature vector to be 1 if the value at the position is greater than 0; otherwise, setting the binary value as-1 to finally obtain a binary SimHash value;
S 3.5 : comparing SIMHASH values of different texts, and measuring the similarity of two SIMHASH values by using hamming distance.
6. The method for extracting and analyzing characteristics of massive APK source codes and similarity according to claim 1, wherein the characteristics of the source codes are summarized to form a characteristic representation word set of a current source code file by acquiring variables and attributes in smal i or java source codes based on three element information such as name, type and occurrence frequency of the acquired variables.
7. The method for massive APK source code feature extraction and similarity analysis according to claim 6, wherein the similarity calculation for the source code features compares the degree of coverage of the intersection of variables and calculates the consistency of the occurrence frequency and type of the variables; if the intersection of variables exceeds a threshold of 70%, the current source code features are considered similar.
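One possible reading of this comparison is sketched below. The claim does not say what the intersection is measured against, so the sketch assumes a Jaccard-style ratio: variables whose name, type, and frequency all agree, divided by the union of variable names. The function names and the dictionary layout `{name: (type, frequency)}` are likewise assumptions.

```python
def feature_similarity(features_a, features_b):
    """Share of variables that agree in name, type, and frequency.

    Each argument maps a variable name to a (type, frequency) tuple.
    """
    all_names = set(features_a) | set(features_b)
    if not all_names:
        return 0.0
    # A variable counts toward the intersection only when both files declare
    # it with the same type and the same occurrence frequency.
    consistent = [
        name
        for name in set(features_a) & set(features_b)
        if features_a[name] == features_b[name]
    ]
    return len(consistent) / len(all_names)


def is_similar(features_a, features_b, threshold=0.70):
    # The 70% threshold comes from the claim; "exceeds" is read as strictly greater.
    return feature_similarity(features_a, features_b) > threshold
```

Requiring an exact frequency match is the strictest interpretation of "consistency"; a tolerance on the frequency difference would be an equally plausible variant.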
8. A computer-readable storage medium storing a computer program which, when executed by a main controller, implements the method of any one of the preceding claims 1-7.
CN202311441226.0A 2023-11-01 2023-11-01 Mass APK source code feature extraction and similarity analysis method Active CN117591119B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311441226.0A CN117591119B (en) 2023-11-01 2023-11-01 Mass APK source code feature extraction and similarity analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311441226.0A CN117591119B (en) 2023-11-01 2023-11-01 Mass APK source code feature extraction and similarity analysis method

Publications (2)

Publication Number Publication Date
CN117591119A (en) 2024-02-23
CN117591119B (en) 2024-05-31

Family

ID=89909022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311441226.0A Active CN117591119B (en) 2023-11-01 2023-11-01 Mass APK source code feature extraction and similarity analysis method

Country Status (1)

Country Link
CN (1) CN117591119B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109445834A (en) * 2018-10-30 2019-03-08 北京计算机技术及应用研究所 The quick comparative approach of program code similitude based on abstract syntax tree
CN109800575A (en) * 2018-12-06 2019-05-24 成都网安科技发展有限公司 A kind of safety detection method of Android application program
CN110034921A (en) * 2019-04-18 2019-07-19 成都信息工程大学 The webshell detection method of hash is obscured based on cum rights
CN114995880A (en) * 2022-05-23 2022-09-02 北京计算机技术及应用研究所 Binary code similarity comparison method based on SimHash

Also Published As

Publication number Publication date
CN117591119B (en) 2024-05-31

Similar Documents

Publication Publication Date Title
US9003529B2 (en) Apparatus and method for identifying related code variants in binaries
US7076486B2 (en) Method and system for efficiently identifying differences between large files
CN110990273B (en) Clone code detection method and device
WO2022048363A1 (en) Website classification method and apparatus, computer device, and storage medium
CN107273474A (en) Autoabstract abstracting method and system based on latent semantic analysis
CN108984155B (en) Data processing flow setting method and device
CN102915365A (en) Hadoop-based construction method for distributed search engine
CN105260387B (en) A kind of Association Rule Analysis method towards magnanimity transaction database
CN112231416B (en) Knowledge graph body updating method and device, computer equipment and storage medium
CN112269593A (en) Method and apparatus for converting sequencing scripts to reuse JCL in different coding environments
CN116775497B (en) Database test case generation demand description coding method
CN115033890A (en) Comparison learning-based source code vulnerability detection method and system
CN106570153A (en) Data extraction method and system for mass URLs
CN111898135A (en) Data processing method, data processing apparatus, computer device, and medium
WO2016093839A1 (en) Structuring of semi-structured log messages
CN108647334B (en) Video social network homology analysis method under spark platform
CN113887182A (en) Table generation method, device, equipment and storage medium
CN105573726B (en) A kind of rules process method and equipment
CN109977977A (en) A kind of method and corresponding intrument identifying potential user
CN117591119B (en) Mass APK source code feature extraction and similarity analysis method
CN107622201B (en) A kind of Android platform clone's application program rapid detection method of anti-reinforcing
CN116991412A (en) Code processing method, device, electronic equipment and storage medium
CN115794105A (en) Micro-service extraction method and device and electronic equipment
CN114118058A (en) Emotion analysis system and method based on fusion of syntactic characteristics and attention mechanism
CN110493088B (en) Mobile internet traffic classification method based on URL

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant