CN117591119B - Mass APK source code feature extraction and similarity analysis method - Google Patents

Mass APK source code feature extraction and similarity analysis method

Info

Publication number
CN117591119B
Authority
CN
China
Prior art keywords
similarity
source code
apk
file
extracting
Prior art date: 2023-11-01
Legal status
Active
Application number
CN202311441226.0A
Other languages
Chinese (zh)
Other versions
CN117591119A (en)
Inventor
段东圣
侯炜
张露晨
佟玲玲
段运强
秦韬
李美燕
任博雅
鲁睿
张林波
孙旷怡
陈新兴
张绪川
王鹏
Current Assignee
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date: 2023-11-01
Filing date: 2023-11-01
Publication date: 2024-05-31
2023-11-01: Application CN202311441226.0A filed by National Computer Network and Information Security Management Center (priority claimed from CN202311441226.0A)
2024-02-23: Publication of CN117591119A
Application granted
2024-05-31: Publication of CN117591119B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/53 Decompilation; Disassembly
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Stored Programmes (AREA)

Abstract

The invention relates to the technical field of software detection and discloses a method for extracting source code features from massive numbers of APKs and analyzing their similarity. Two APK files are first input; for each APK package, the AndroidManifest file and the localized language configuration files are extracted by source code parsing and decompilation, together with the SMALI or Java source code. The core source code directory, the third-party package directories and the system resource directories of each APK are identified by package name indexing, startup class indexing and fixed directory identification, and a source code tree is generated. The files in the core source code directory are parsed, a file HASH is computed, and the string declaration features in each source file are extracted as a weighted feature representation. Finally, the similarity of the two source code tree structures is computed, with different weights applied according to the type of source code directory. The method reduces analysis resource consumption and processing time, improves the accuracy of source code similarity analysis, and enables high-performance analysis of large-scale APK datasets.

Description

Mass APK source code feature extraction and similarity analysis method
Technical Field
The invention relates to the technical field of software detection, and in particular to a method for extracting source code features from massive numbers of APKs and analyzing their similarity.
Background
APK (Android application package) source code similarity analysis has developed considerably in recent years, mainly in the following areas:
1. Code comparison algorithms: more efficient and accurate code comparison algorithms have been developed to compare APK source code and analyze the similarities between them. These algorithms can identify differences between versions of an application and locate reused code segments.
2. Code clone detection: clone detection techniques identify cloned (duplicated) code segments in APK source code. This matters for code maintenance and refactoring, helping developers avoid repetitive work and improve code quality.
3. Feature extraction and representation: researchers have proposed various methods for extracting and representing the features that make APK source code similar, for example using an AST (abstract syntax tree) to represent code structure and TF-IDF (term frequency-inverse document frequency) to represent the keywords in code.
4. Machine learning and deep learning: these techniques have been applied to APK source code similarity analysis to improve the accuracy of similarity matching and detection; for example, convolutional neural networks (CNNs) or recurrent neural networks (RNNs) can be used to learn representations of APK source code and the similarity between them.
Among the existing related technologies, one approach finally computes, from the resulting space vectors, a cosine similarity that represents the similarity of the source code. This helps development teams identify source code with duplicated or similar logic and provides a basis for decisions such as code refactoring and service merging. The existing technologies nevertheless have the following shortcomings:
1) Current mainstream APP source code similarity algorithms generally compare APP packages and source code using the contents of the AndroidManifest file and a source code diff algorithm: all source files of the APP are obtained by decompilation, each file is traversed and compared line by line with the diff algorithm, and the surrounding context is also matched. Algorithms of this kind suit content management scenarios such as Git and SVN, but not the analysis of massive numbers of APPs.
2) Current mainstream source code similarity techniques mainly measure how similar the contents of two individual source files are, and are not convenient to apply to massive numbers of APPs in real business settings. An APP, however, is a package combining a large number of source files, so these techniques are hard to apply directly in this scenario. Moreover, packing (shelling) and obfuscation change the file names, variable names and business logic of APP source code, so the same source code looks different after such processing, and even after reverse engineering and unpacking it is difficult to restore the code to its original state. As a result, file-level source code similarity techniques cannot guarantee stable APP similarity results. To address these problems, a method for extracting features from massive numbers of APK source codes and analyzing their similarity is needed.
Disclosure of Invention
The invention aims to provide a method for extracting features from massive numbers of APK source codes and analyzing their similarity. By extracting the APK's manifest (management) files, building a map of the APK package's directory structure and source files, streamlining the analysis process and adding weights to specific items, the invention reduces analysis resource consumption and processing time, improves the accuracy of source code similarity analysis, and achieves high-performance analysis of large-scale APK datasets.
The invention is realized as follows:
The invention provides a method for extracting features from massive numbers of APK source codes and analyzing their similarity, comprising the following steps:
S1: Two APK files are input. For each APK package, the AndroidManifest file and the localized language configuration files are extracted by source code parsing and decompilation, and the SMALI or Java source code is extracted. To obtain the AndroidManifest file, the APK is first decompiled with the existing APK analysis tools apktool and jadx. If decompilation fails with an exception, the APK is instead decompressed as an archive and its information is extracted by parsing it according to the Android package structure specification. Finally, the decompiled smali source code and the AndroidManifest file are output.
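The S1 extraction flow (decompile first, fall back to unpacking the archive) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes apktool is on the PATH and uses its `apktool d <apk> -o <dir> -f` form; jadx (for Java output) could be invoked the same way with `jadx -d <dir> <apk>`. The function name `decompile_apk` and the returned keys are hypothetical. The fallback only unpacks the APK as a ZIP archive; the manifest it recovers is still in Android's binary XML format and needs a separate decoder, which corresponds to the "package structure specification" parsing and is not shown here.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path


def decompile_apk(apk_path, out_dir):
    """Hypothetical S1 helper: apktool first, plain unzip as the fallback."""
    out = Path(out_dir)
    apktool = shutil.which("apktool")
    if apktool is not None:
        try:
            # `apktool d` decodes resources and disassembles dex files to smali.
            subprocess.run(
                [apktool, "d", apk_path, "-o", str(out / "apktool"), "-f"],
                check=True, capture_output=True, timeout=600,
            )
            root = out / "apktool"
            return {
                "mode": "apktool",
                "manifest": root / "AndroidManifest.xml",  # decoded, plain XML
                "smali_dirs": sorted(p for p in root.glob("smali*") if p.is_dir()),
            }
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
            pass  # decompilation failed: fall through to the archive path
    # Fallback: an APK is a ZIP archive, so recover its raw entries.
    raw = out / "raw"
    with zipfile.ZipFile(apk_path) as zf:
        zf.extractall(raw)
        names = zf.namelist()
    return {
        "mode": "unzip",
        "manifest": raw / "AndroidManifest.xml",  # still binary AXML here
        "dex_files": sorted(n for n in names if n.startswith("classes") and n.endswith(".dex")),
        "has_resources_arsc": "resources.arsc" in names,  # compiled string tables
    }
```

Keeping the fallback separate means a malformed or packed APK still yields its dex files and resource table for later steps, even when full decompilation fails.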
S2: The core source code directory, the third-party package directories and the system resource directories of the APK are identified by package name indexing, startup class indexing and fixed directory identification, and a source code tree is generated. A directory feature set is summarized from the default source file organization of Android Studio, the mainstream IDE, and from community conventions. The core code directories are then analyzed layer by layer according to the package name and the startup class structure, using the naming pattern of the package name and the location of the startup class.
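A minimal sketch of the S2 directory classification, assuming smali paths relative to a smali root (for example `com/example/app/MainActivity.smali`). The `SYSTEM_PREFIXES` tuple is a hypothetical stand-in for the patent's unpublished directory feature set, and the rule "anything outside the app's own namespaces and the platform namespaces is third-party" is an assumption. Both the package-name directory and the startup class's directory count as core, because the startup class can sit outside the declared package.

```python
from pathlib import PurePosixPath

# Hypothetical "fixed directory" feature set: platform and runtime namespaces.
SYSTEM_PREFIXES = ("android/", "androidx/", "java/", "javax/", "kotlin/", "kotlinx/", "dalvik/")


def classify(rel_path, package_name, startup_class):
    """Label a smali path as "core", "system" or "third_party"."""
    path = rel_path.replace("\\", "/")
    core_dirs = {package_name.replace(".", "/") + "/"}  # package name index
    if "." in startup_class:
        # Startup class index: its directory is core even outside the package.
        core_dirs.add(startup_class.rsplit(".", 1)[0].replace(".", "/") + "/")
    if any(path.startswith(d) for d in core_dirs):
        return "core"
    if path.startswith(SYSTEM_PREFIXES):
        return "system"
    return "third_party"  # everything else is treated as bundled library code


def build_tree(rel_paths, package_name, startup_class):
    """Nest paths into a dict tree whose leaves hold the directory category."""
    tree = {}
    for rel in rel_paths:
        parts = PurePosixPath(rel.replace("\\", "/")).parts
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = classify(rel, package_name, startup_class)
    return tree
```

Obfuscation can rename library namespaces (for example to short names such as `a/b`), so a production feature set would likely need signals beyond path prefixes.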
S3: The files in the core source code directory are parsed, a file HASH is computed for each, and the string declaration features in each source file are extracted as a weighted feature representation. The AndroidManifest file contains the APP name, package name, permissions, attributes and service declarations. Step S3 specifically comprises the following sub-steps:
S3.1: Tokenization: the input configuration file is split into individual lexical units by attribute or name, and symbols or characters without specific meaning are filtered out;
S3.2: Feature extraction: a hash value is computed for each lexical unit and multiplied by a weight, which is set according to the importance or frequency of the unit;
S3.3: Feature combination: the weighted feature vectors of all lexical units are combined so that the whole text is represented by a single fixed-length vector;
S3.4: SimHash computation: the combined weighted vectors are summed position by position; each position whose sum is greater than 0 is set to 1 and every other position is set to -1, and reading these signs as bits yields a binary SimHash value;
S3.5: SimHash comparison: the SimHash values of different texts are compared, and the Hamming distance between two SimHash values measures their similarity.
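Steps S3.1 to S3.5 describe a weighted SimHash. The Python sketch below shows one way to implement it; the tokenizer regex, the MD5-derived 64-bit token hash, the default weight of 1 per token and the 64-bit fingerprint length are all assumptions, since the patent does not specify them. The +1/-1 sign vector of S3.4 is encoded as bits 1/0, and similarity is reported as 1 minus the normalized Hamming distance.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # fingerprint length (assumed)


def tokenize(text):
    """S3.1: split into name/attribute units and drop meaningless symbols."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z_.]+", text) if t.strip("._")]


def token_hash(token):
    """Stable 64-bit token hash (the built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text, weights=None):
    """S3.2-S3.4: weighted bit voting over token hashes."""
    weights = weights or {}
    acc = [0] * BITS
    for token, freq in Counter(tokenize(text)).items():
        w = weights.get(token, 1) * freq  # importance x frequency
        h = token_hash(token)
        for i in range(BITS):
            # each hash bit votes +w if set, -w otherwise
            acc[i] += w if (h >> i) & 1 else -w
    # positive sum -> bit 1; zero or negative (the patent's -1) -> bit 0
    return sum(1 << i for i in range(BITS) if acc[i] > 0)


def hamming(a, b):
    """S3.5: number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def similarity(a, b):
    """Similarity ratio in [0, 1] derived from the Hamming distance."""
    return 1.0 - hamming(a, b) / BITS
```

Passing a `weights` dict such as `{"android.permission.internet": 3}` lets selected tokens (permissions, service names) dominate the fingerprint, which is the role S3.2 gives to the weight value.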
Source code features are obtained from the variables and attributes in the smali or Java source code: the name, type and occurrence frequency of each variable are collected to form the feature word set of the current source file.
To compare two source code feature representations, the coverage of the intersection of their variables is measured, taking into account the consistency of the variables' occurrence frequencies and types; if the variable intersection exceeds a threshold of 70%, the two feature representations are considered similar.
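The variable-based feature representation and the 70% test can be sketched as follows. Assumptions: only smali `.field` declarations are treated as "variables" (local variable names appear only in `.local` debug directives, which release builds often strip); a variable's frequency is the number of `name:type` occurrences in the file; and the "coverage of the intersection" is modelled as a weighted Jaccard index over (name, type) pairs, so a pair matches only when name and type agree and differing frequencies lower the score. The helper names are hypothetical.

```python
import re
from collections import Counter

# `.field` declarations, e.g. `.field private static final TAG:Ljava/lang/String; = "A"`
FIELD_DECL = re.compile(r"^[ \t]*\.field[ \t]+(?:[\w-]+[ \t]+)*([\w$]+):([^\s=]+)", re.MULTILINE)


def extract_features(smali_text):
    """Map (name, type) -> occurrence count for every field declared in a smali file."""
    feats = Counter()
    for name, typ in FIELD_DECL.findall(smali_text):
        # Count the declaration plus every `->name:type` reference in the file.
        ref = re.compile(r"(?<![\w$])" + re.escape(name) + ":" + re.escape(typ))
        feats[(name, typ)] = len(ref.findall(smali_text))
    return feats


def overlap(a, b):
    """Weighted Jaccard: shared (name, type) keys, frequencies compared by min/max."""
    keys = set(a) | set(b)
    union = sum(max(a[k], b[k]) for k in keys)
    if union == 0:
        return 1.0
    return sum(min(a[k], b[k]) for k in keys) / union


def features_similar(a, b, threshold=0.7):
    return overlap(a, b) > threshold  # "exceeds a threshold of 70%"
```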
S4: The similarity of the two source code tree structures to be analyzed is calculated, with similarity weighted differently according to directory type: a core source code directory has weight +2, a third-party package directory +1, and a system resource directory 0;
S5: The source code file at each end (leaf) node of each tree is scored: an identical source file adds weight +2, and a similar source file feature representation adds weight +1. Specifically:
S5.1: First, the node similarity weight is calculated for the source code file at each leaf node of each tree;
S5.2: If the file HASHes are identical, the weight is +2; otherwise it is +0;
S5.3: If the file feature representations are similar, the weight is +1; otherwise it is +0;
S5.4: The weight is output.
S6: The similarity of the two trees is calculated by bidirectional comparison and averaging, yielding the source code tree similarity: S_1 is the coverage of tree A in tree B, S_2 is the coverage of tree B in tree A, and the output directory structure similarity S_T is their mean, as shown in formula (1):
$$S_T = \frac{S_1 + S_2}{2} \tag{1}$$
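Steps S4 to S6 leave open how directory weights and node weights are combined. The sketch below assumes one plausible reading: each leaf file is matched by identical relative path, its node score (S5: +2 for an identical HASH, +1 for similar features) is multiplied by its directory weight (S4: core 2, third-party 1, system 0), and a one-way coverage is the earned weighted score divided by the maximum possible. The trees are flattened to `path -> FileNode` maps; `FileNode`, `coverage` and the multiplicative combination are hypothetical, and `features_similar` is a weighted-Jaccard stand-in for the 70% variable-intersection rule.

```python
from collections import Counter
from dataclasses import dataclass, field

DIR_WEIGHT = {"core": 2, "third_party": 1, "system": 0}  # S4 directory-type weights
MAX_NODE_SCORE = 3  # S5: identical HASH (+2) plus similar features (+1)


@dataclass
class FileNode:
    category: str  # "core", "third_party" or "system" (from S2)
    file_hash: str  # file HASH from S3, e.g. a SHA-256 hex digest
    features: Counter = field(default_factory=Counter)  # (name, type) -> count


def features_similar(a, b, threshold=0.7):
    """Weighted-Jaccard stand-in for the 70% variable-intersection test."""
    keys = set(a) | set(b)
    union = sum(max(a[k], b[k]) for k in keys)
    if union == 0:
        return True
    return sum(min(a[k], b[k]) for k in keys) / union > threshold


def node_score(a, b):
    """S5.2-S5.3: +2 for an identical HASH, +1 for similar features."""
    score = 2 if a.file_hash == b.file_hash else 0
    if features_similar(a.features, b.features):
        score += 1
    return score


def coverage(tree_a, tree_b):
    """Weighted share of tree_a's leaf files matched at the same path in tree_b."""
    got = total = 0
    for path, node in tree_a.items():
        w = DIR_WEIGHT[node.category]
        total += w * MAX_NODE_SCORE
        other = tree_b.get(path)
        if other is not None:
            got += w * node_score(node, other)
    return got / total if total else 0.0


def tree_similarity(tree_a, tree_b):
    """S6 / formula (1): average of the two one-way coverages."""
    s1 = coverage(tree_a, tree_b)
    s2 = coverage(tree_b, tree_a)
    return (s1 + s2) / 2
```

Because system directories carry weight 0 they never affect the score. Path-based matching will miss files whose paths were renamed by obfuscation; matching by file HASH first would be one way to mitigate that.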
S7: The similarity of the two APPs' AndroidManifest files and of their localized language configurations is analyzed with the SimHash algorithm, and a similarity ratio is output from the Hamming distance. Denote the two input APPs by A and B, their AndroidManifest files by C (Config), i.e. C_a and C_b, and their localized language configurations by L (Language), i.e. L_a and L_b. Comparing the AndroidManifest and language configuration files of the two APPs yields the similarity values S_C and S_L, as shown in formulas (2) and (3):
$$S_C = \mathrm{similarity}\big(\mathrm{simhash}(C_a),\ \mathrm{simhash}(C_b)\big) \tag{2}$$
$$S_L = \mathrm{similarity}\big(\mathrm{simhash}(L_a),\ \mathrm{simhash}(L_b)\big) \tag{3}$$
S8: Finally, the tree structure similarity, the AndroidManifest similarity and the localized language configuration similarity are combined by a weighted sum in the ratio x : y : z, where x, y and z are the weight coefficients of the three similarities, set according to the importance and degree of contribution of each result in the final similarity calculation. The final APP similarity S is calculated by this weighted sum, as shown in formula (4):
$$S = x\,S_T + y\,S_C + z\,S_L \tag{4}$$
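Formulas (2) to (4) reduce to two small calculations once the SimHash fingerprints and the tree similarity are available. The sketch below assumes 64-bit fingerprints and maps a Hamming distance d to 1 - d/64; the function names are hypothetical, and the weights x = 0.5, y = 0.3, z = 0.2 in the usage example are purely illustrative, since the patent leaves x, y and z to be set by importance.

```python
def hamming_to_similarity(distance, bits=64):
    """S7: map a SimHash Hamming distance to a similarity ratio in [0, 1]."""
    if not 0 <= distance <= bits:
        raise ValueError("distance must be between 0 and bits")
    return 1.0 - distance / bits


def app_similarity(s_t, s_c, s_l, x, y, z):
    """S8 / formula (4): weighted sum of tree, manifest and language similarity."""
    return x * s_t + y * s_c + z * s_l


# Illustrative values: manifests differ in 8 of 64 bits, language files match exactly.
s_c = hamming_to_similarity(8)
s_l = hamming_to_similarity(0)
s = app_similarity(0.8, s_c, s_l, x=0.5, y=0.3, z=0.2)
```

With non-negative weights that satisfy x + y + z = 1, and each input in [0, 1], the final S also lies in [0, 1].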
Further, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a main controller, implements any of the methods described above.
Compared with the prior art, the invention has the beneficial effects that:
1. By extracting the APK's manifest (management) files, building a map of the APK package's directory structure and source files, streamlining the analysis process and adding weights to specific items, the invention reduces analysis resource consumption and processing time, improves the accuracy of source code similarity analysis, and achieves high-performance analysis of large-scale APK datasets.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings show only some examples of the present invention and should therefore not be regarded as limiting its scope; a person skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of scoring the leaf nodes of each tree in the present invention;
FIG. 3 is a code diagram of the present invention for obtaining the variable names, types and occurrence frequencies of the current source code file.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. The following detailed description of the embodiments shown in the figures is therefore not intended to limit the claimed scope of the invention but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to FIGS. 1-3, the method for extracting features from massive numbers of APK source codes and analyzing their similarity proceeds as follows. S1: Two APK files are input. For each APK package, the AndroidManifest file and the localized language configuration files are extracted by source code parsing and decompilation, and the SMALI or Java source code is extracted. To obtain the AndroidManifest file, the APK is first decompiled with the existing APK analysis tools apktool and jadx. If decompilation fails with an exception, the APK is instead decompressed as an archive and its information is extracted by parsing it according to the Android package structure specification. Finally, the decompiled smali source code and the AndroidManifest file are output.
S2: The core source code directory, the third-party package directories and the system resource directories of the APK are identified by package name indexing, startup class indexing and fixed directory identification, and a source code tree is generated. A directory feature set is summarized from the default source file organization of Android Studio, the mainstream IDE, and from community conventions. The core code directories are then analyzed layer by layer according to the package name and the startup class structure, using the naming pattern of the package name and the location of the startup class.
S3: The files in the core source code directory are parsed, a file HASH is computed for each, and the string declaration features in each source file are extracted as a weighted feature representation. The AndroidManifest file contains the APP name, package name, permissions, attributes and service declarations. Step S3 specifically comprises the following sub-steps:
S3.1: Tokenization: the input configuration file is split into individual lexical units by attribute or name, and symbols or characters without specific meaning are filtered out;
S3.2: Feature extraction: a hash value is computed for each lexical unit and multiplied by a weight, which is set according to the importance or frequency of the unit;
S3.3: Feature combination: the weighted feature vectors of all lexical units are combined so that the whole text is represented by a single fixed-length vector;
S3.4: SimHash computation: the combined weighted vectors are summed position by position; each position whose sum is greater than 0 is set to 1 and every other position is set to -1, and reading these signs as bits yields a binary SimHash value;
S3.5: SimHash comparison: the SimHash values of different texts are compared, and the Hamming distance between two SimHash values measures their similarity.
Source code features are obtained from the variables and attributes in the smali or Java source code: the name, type and occurrence frequency of each variable are collected to form the feature word set of the current source file. Sample feature representation data are shown in Table 1:
Table 1 Sample feature representation data
To compare two source code feature representations, the coverage of the intersection of their variables is measured, taking into account the consistency of the variables' occurrence frequencies and types; if the variable intersection exceeds a threshold of 70%, the two feature representations are considered similar.
S4: The similarity of the two source code tree structures to be analyzed is calculated, with similarity weighted differently according to directory type: a core source code directory has weight +2, a third-party package directory +1, and a system resource directory 0;
S5: The source code file at each end (leaf) node of each tree is scored: an identical source file adds weight +2, and a similar source file feature representation adds weight +1. Specifically:
S5.1: First, the node similarity weight is calculated for the source code file at each leaf node of each tree;
S5.2: If the file HASHes are identical, the weight is +2; otherwise it is +0;
S5.3: If the file feature representations are similar, the weight is +1; otherwise it is +0;
S5.4: The weight is output.
S6: The similarity of the two trees is calculated by bidirectional comparison and averaging, yielding the source code tree similarity: S_1 is the coverage of tree A in tree B, S_2 is the coverage of tree B in tree A, and the output directory structure similarity S_T is their mean, as shown in formula (1):
$$S_T = \frac{S_1 + S_2}{2} \tag{1}$$
S7: The similarity of the two APPs' AndroidManifest files and of their localized language configurations is analyzed with the SimHash algorithm, and a similarity ratio is output from the Hamming distance. Denote the two input APPs by A and B, their AndroidManifest files by C (Config), i.e. C_a and C_b, and their localized language configurations by L (Language), i.e. L_a and L_b. Comparing the AndroidManifest and language configuration files of the two APPs yields the similarity values S_C and S_L, as shown in formulas (2) and (3):
$$S_C = \mathrm{similarity}\big(\mathrm{simhash}(C_a),\ \mathrm{simhash}(C_b)\big) \tag{2}$$
$$S_L = \mathrm{similarity}\big(\mathrm{simhash}(L_a),\ \mathrm{simhash}(L_b)\big) \tag{3}$$
S8: Finally, the tree structure similarity, the AndroidManifest similarity and the localized language configuration similarity are combined by a weighted sum in the ratio x : y : z, where x, y and z are the weight coefficients of the three similarities, set according to the importance and degree of contribution of each result in the final similarity calculation. The final APP similarity S is calculated by this weighted sum, as shown in formula (4):
$$S = x\,S_T + y\,S_C + z\,S_L \tag{4}$$
In this embodiment, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a main controller, implements any of the methods described above.
The above describes only preferred embodiments of the present invention and is not intended to limit it; various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A method for extracting features from massive numbers of APK source codes and analyzing their similarity, characterized by comprising the following steps:
S1: inputting two APK files, extracting the AndroidManifest file and the localized language configuration files of each APK package by source code parsing and decompilation, and extracting the SMALI or Java source code;
S2: identifying the APK's core source code directory, third-party package directories and system resource directories by package name indexing, startup class indexing and fixed directory identification, and generating a source code tree;
S3: parsing the files in the core source code directory, computing a file HASH, and extracting the string declaration features in each source file as a weighted feature representation;
S4: computing the similarity of the two source code tree structures to be analyzed, weighted differently according to directory type: core source code directory +2, third-party package directory +1, system resource directory 0;
S5: scoring the source code file at each leaf node of each tree, where an identical source file adds weight +2 and a similar source file feature representation adds weight +1;
S6: computing the similarity of the two trees by bidirectional comparison and averaging to generate the source code tree similarity, where S_1 is the coverage of tree A in tree B, S_2 is the coverage of tree B in tree A, and the tree structure similarity S_T is output as their mean, as shown in formula (1):
$$S_T = \frac{S_1 + S_2}{2} \tag{1}$$
S7: analyzing, with the SimHash algorithm, the similarity of the AndroidManifest files and of the localized language configurations of the two APPs, outputting similarity ratios by computing Hamming distances, and outputting the similarity values S_C and S_L, as shown in formulas (2) and (3):
$$S_C = \mathrm{similarity}\big(\mathrm{simhash}(C_a),\ \mathrm{simhash}(C_b)\big) \tag{2}$$
$$S_L = \mathrm{similarity}\big(\mathrm{simhash}(L_a),\ \mathrm{simhash}(L_b)\big) \tag{3}$$
S8: finally, combining the tree structure similarity, the AndroidManifest similarity and the localized language configuration similarity by a weighted sum in the ratio x : y : z, where x, y and z are the weight coefficients of the three similarities, and calculating the final APP similarity S by the weighted sum, as shown in formula (4):
$$S = x\,S_T + y\,S_C + z\,S_L \tag{4}$$
2. The method for extracting features from massive numbers of APK source codes and analyzing their similarity according to claim 1, wherein step S5 specifically comprises:
S5.1: first, calculating the node similarity weight of the source code file at each leaf node of each tree;
S5.2: determining whether the file HASHes are identical; if so, adding weight +2, otherwise +0;
S5.3: determining whether the file feature representations are similar; if so, adding weight +1, otherwise +0;
S5.4: outputting the weight.
3. The method for extracting and analyzing characteristics of massive APK source codes and the similarity according to claim 1, wherein in step S1, a AndroidManifest file of an APK package extracted by a source code analysis decompilation method is decompiled to an APK through apktool and jadx existing APK analysis tools, if an abnormality occurs in the decompilation process, APK information is extracted by compressing the package to decompress and then analyzing based on android package body structure specifications, and finally decompiled to a smali source code is output, and a AndroidManifest file is obtained.
4. The method for extracting and analyzing massive APK source code features and similarity according to claim 1, wherein in step S 2, a catalog feature set is constructed by summarizing based on source code file organization modes of Android Studio mainstream IDE defaults and community consensus; and analyzing the core code file catalogue layer by layer through the package name and the starting class structure, and analyzing the core code catalogue through the naming mode of the package name and the position of the starting class.
5. The method for extracting and analyzing characteristics and similarity of massive APK source codes according to claim 1, wherein AndroidManifest files include APP names, package names, rights, attributes and service statements, and in step S 3, the method is specifically executed as follows:
S 3.1: firstly, word segmentation is carried out on an input configuration file, the configuration file is segmented into individual vocabulary units according to attributes or names, and symbols or characters without specific meanings are filtered;
S 3.2: for each vocabulary unit, calculating a hash value of the vocabulary unit, multiplying the hash value by a weight value, setting the weight value according to the importance or frequency of the vocabulary, and extracting the characteristics;
S 3.3: combining the feature vectors of each vocabulary unit, and using a vector with a fixed length to represent the whole text for feature combination;
S 3.4: calculation SimHash: weighting and summarizing the combined feature vectors, and setting the corresponding position of each feature vector to be 1 if the value at the position is greater than 0; otherwise, setting the value to be-1, and finally obtaining a binary SimHash value;
S 3.5: comparing SIMHASH, comparing SimHash values of different texts, using hamming distance to measure similarity of two SimHash values.
6. The method for extracting and analyzing characteristics of massive APK source codes and similarity according to claim 1, wherein the characteristics of the source codes are collected to form a characteristic representation word set of a current source code file by acquiring variables and attributes in smal i or java source codes based on naming, types and occurrence frequency three-element information of the acquired variables.
7. The mass APK source code feature extraction and similarity analysis method according to claim 6, wherein the similarity of source code features is calculated by comparing the coverage of the intersection of the variable sets and computing the consistency of the variables' occurrence frequencies and types; if the variable intersection exceeds a threshold of 70%, the current source code features are considered similar.
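The comparison in claim 7 can be sketched as follows, with caveats. The claim does not define how coverage or consistency is normalized. This version therefore assumes that coverage is the number of shared variable names divided by the size of the smaller set, and that consistency averages exact type agreement with a min/max frequency ratio over the shared variables. Only coverage is compared with the 70% threshold, as the claim states. The input format and the function name `compare_features` are hypothetical.

```python
def compare_features(a, b, threshold=0.7):
    """a, b: {variable_name: (type, frequency)} for two source files (assumed format).

    Returns (similar, coverage, consistency).
    """
    if not a or not b:
        return False, 0.0, 0.0
    shared = a.keys() & b.keys()
    # Coverage of the intersection, relative to the smaller variable set (assumption).
    coverage = len(shared) / min(len(a), len(b))
    if shared:
        scores = []
        for name in shared:
            (ta, fa), (tb, fb) = a[name], b[name]
            type_ok = 1.0 if ta == tb else 0.0
            # Frequency agreement: 1.0 when equal, smaller as counts diverge.
            freq_ratio = min(fa, fb) / max(fa, fb) if max(fa, fb) > 0 else 1.0
            scores.append((type_ok + freq_ratio) / 2)
        consistency = sum(scores) / len(scores)
    else:
        consistency = 0.0
    # The claim's decision rule: intersection above the threshold means "similar".
    return coverage > threshold, coverage, consistency
```

For two files with four variables each, three of them shared, coverage is 0.75 and the pair is reported as similar.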
8. A computer-readable storage medium storing a computer program which, when executed by a main controller, implements the method of any one of claims 1 to 7.
CN202311441226.0A 2023-11-01 2023-11-01 Mass APK source code feature extraction and similarity analysis method Active CN117591119B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311441226.0A CN117591119B (en) 2023-11-01 2023-11-01 Mass APK source code feature extraction and similarity analysis method

Publications (2)

Publication Number Publication Date
CN117591119A CN117591119A (en) 2024-02-23
CN117591119B true CN117591119B (en) 2024-05-31

Family

ID=89909022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311441226.0A Active CN117591119B (en) 2023-11-01 2023-11-01 Mass APK source code feature extraction and similarity analysis method

Country Status (1)

Country Link
CN (1) CN117591119B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109445834A (en) * 2018-10-30 2019-03-08 北京计算机技术及应用研究所 Fast comparison method for program code similarity based on abstract syntax tree
CN109800575A (en) * 2018-12-06 2019-05-24 成都网安科技发展有限公司 Security detection method for Android application programs
CN110034921A (en) * 2019-04-18 2019-07-19 成都信息工程大学 Webshell detection method based on weighted fuzzy hash
CN114995880A (en) * 2022-05-23 2022-09-02 北京计算机技术及应用研究所 Binary code similarity comparison method based on SimHash

Also Published As

Publication number Publication date
CN117591119A (en) 2024-02-23

Similar Documents

Publication Publication Date Title
US9003529B2 (en) Apparatus and method for identifying related code variants in binaries
US7076486B2 (en) Method and system for efficiently identifying differences between large files
CN107273474A (en) Automatic abstract extraction method and system based on latent semantic analysis
CN107797916B (en) DDL statement auditing method and device
CN116775497B (en) Database test case generation demand description coding method
CN114817243A (en) Method, device and equipment for establishing database joint index and storage medium
CN109067708A (en) Webpage backdoor detection method, apparatus, device and storage medium
WO2016093839A1 (en) Structuring of semi-structured log messages
CN113887182A (en) Table generation method, device, equipment and storage medium
CN109977977A (en) Method for identifying potential users and corresponding apparatus
CN117591119B (en) Mass APK source code feature extraction and similarity analysis method
CN110990834B (en) Static detection method, system and medium for android malicious software
CN107622201B (en) Anti-hardening rapid detection method for cloned applications on the Android platform
CN113886520B (en) Code retrieval method, system and computer readable storage medium based on graph neural network
CN115794105A (en) Micro-service extraction method and device and electronic equipment
CN116991412A (en) Code processing method, device, electronic equipment and storage medium
CN114297046A (en) Event obtaining method, device, equipment and medium based on log
CN114118058A (en) Emotion analysis system and method based on fusion of syntactic characteristics and attention mechanism
Neznanov et al. Analyzing Social Networks Services Using Formal Concept Analysis Research Toolbox.
CN117725555B (en) Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium
Grace et al. Efficiency calculation of mined web navigational patterns
Zhang et al. C4.5 Algorithm Based on the Sample Selection and Cosine Similarity
JP6783741B2 (en) Distance measuring device, communication system, creating device and distance measuring program
Choudhury et al. Sentimental analysis of Twitter data on Hadoop
CN113886388A (en) Scene-based identification unifying method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant