CN107229563B - Cross-architecture binary program vulnerability function association method - Google Patents


Publication number
CN107229563B
CN107229563B CN201610178368.6A CN201610178368A CN107229563B CN 107229563 B CN107229563 B CN 107229563B CN 201610178368 A CN201610178368 A CN 201610178368A CN 107229563 B CN107229563 B CN 107229563B
Authority
CN
China
Prior art keywords
function
tested
vulnerability
similarity
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610178368.6A
Other languages
Chinese (zh)
Other versions
CN107229563A (en)
Inventor
石志强
常青
陈昱
王猛涛
孙利民
朱红松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201610178368.6A
Publication of CN107229563A
Application granted
Publication of CN107229563B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3616 Software analysis for verifying properties of programs using software metrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Stored Programmes (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a cross-architecture binary program vulnerability function correlation method. The method comprises the following steps: 1) carrying out reverse analysis on a binary file of a binary program to obtain a function library to be tested; then, acquiring a function call graph, a function control flow graph and basic function attributes according to the function library to be tested; 2) extracting the characteristics of each function to be tested according to the function call graph, the function control flow graph and the function basic attribute; then, calculating the numerical similarity of each function to be tested and the vulnerability function according to the extracted features and the features of the vulnerability function; 3) for each function to be tested, constructing a weighted bipartite graph of the function to be tested and the vulnerability function respectively, and calculating the overall similarity of the function to be tested and the vulnerability function by adopting a bipartite graph algorithm; 4) if the overall similarity of the function to be tested and the vulnerability function is larger than a set judgment threshold value, the function to be tested is judged to be a suspected vulnerability function, otherwise, the function to be tested is judged to be a normal function. The method is simple to implement and easy to popularize.

Description

Cross-architecture binary program vulnerability function association method
Technical Field
The invention relates to the field of binary program vulnerability mining and reverse analysis, in particular to a cross-architecture binary program vulnerability function association method, and belongs to the technical field of computer program detection.
Background
With the rapid development of global information technology and the rapid popularization of information systems and information products, computer software has become an important component of the development of world economy, science and technology, military and society. Practice shows that most information security events are initiated by attackers through software bugs. Therefore, the security vulnerability is a decisive factor directly affecting the information security system, and it is necessary to analyze and utilize the software vulnerability. Vulnerability analysis can be divided into a source code level and a binary level according to the objects to be analyzed. The vulnerability analysis technology of the source code level is to directly analyze a program written in a high-level language. An analyst can find coding errors and design defects in a program by a series of vulnerability analysis technologies by utilizing rich and complete semantic information in a source code. In practical application, however, a large amount of commercial software exists in the form of binary codes, and source codes are difficult to obtain. Therefore, binary program vulnerability analysis is becoming an important branch of the information security field.
The earliest application scenario was computing the similarity of two binary files compiled for the same architecture in order to associate their functions. Because both files target the same instruction set, the assembly code obtained by disassembly can be treated as character strings and compared directly. In 2013, a semantic-template method was proposed for quickly locating similar code segments, Arun Lakhotia proposed calculating the similarity of basic blocks using string edit distance, and Yaniv David proposed a related approach. However, even for the same source code, the assembly code obtained by disassembly differs greatly when the binary files are compiled with different optimization options, which means that methods relying on the surface form of the assembly code are sensitive to compilation optimization options. Researchers therefore shifted their focus to information that depends less on the surface form, and began to extract semantic information from program segments as features for function association, including association across architectures.
At present, a cross-architecture binary program vulnerability correlation technology which is simple to implement and high in accuracy is lacked.
Disclosure of Invention
The invention aims to provide a cross-architecture binary program vulnerability function association method. The method mainly comprises the following steps: performing reverse analysis on the binary file to obtain a function library to be tested, and calculating the numerical similarity between the function to be tested and the vulnerability function; intercepting local structure information of two functions to be compared from a function call graph to form two structure subgraphs; hierarchically abstracting the two structural subgraphs into weighted bipartite graphs, calculating the maximum weight matching of the weighted bipartite graphs by adopting a bipartite graph matching algorithm, weighting and summing the maximum weight matching as the overall similarity of two functions, and sequencing the two functions according to the overall similarity; and calculating a judgment threshold value based on the ROC curve, judging the function with the similarity larger than the judgment threshold value as a suspected vulnerability function, performing next analysis, otherwise, judging the function as a normal function, and not processing the function.
The technical innovations of the method are the algorithm for reconstructing the function control flow graph, used when calculating the numerical similarity, and the structured matching algorithm, used when calculating the overall similarity. The method combines the numerical information and the structural information of a function. Feature extraction does not depend on a specific instruction set, so function association can be performed on binary files of different architectures, with high accuracy and a simple implementation.
In order to achieve the purpose, the invention adopts the following technical scheme:
A cross-architecture binary program vulnerability function association method mainly comprises the following 3 steps:
1) Calculate the numerical similarity of the function to be tested and the vulnerability function. First, reverse-analyze the binary file to obtain a function library to be tested. Extract the calling relations among the functions to be tested (namely the function call graph), the control flow graph information inside each function, and the basic attribute information of each function, and convert them to numerical form to obtain the feature vector of each function. Then train the integrated (ensemble) classifier, using a self-compiled, multi-platform function set with symbol tables as training samples. Finally, calculate the similarity of each feature of the function to be tested and the vulnerability function to form a similarity vector, and feed the similarity vector into the integrated classifier to predict the numerical similarity.
2) Construct a weighted bipartite graph and calculate the overall similarity with a bipartite graph matching algorithm. Extract the local structure information of the two functions to be compared from the function call graph to form two structural subgraphs; the number of layers extracted can be chosen according to actual needs. Abstract the two structural subgraphs layer by layer into weighted bipartite graphs, in which the node set consists of the functions contained in the corresponding layers of the two structural subgraphs, the edge set consists of the similarity relations between any two such functions, and each edge weight is the numerical similarity calculated in the previous step. Then calculate the maximum weight matching of each weighted bipartite graph layer by layer with a bipartite graph matching algorithm, and take the weighted sum as the overall similarity of the function to be tested and the vulnerability function.
3) Make the judgment using a threshold calculated from the ROC curve. Obtain the vector of overall similarities between the set of functions to be tested and the vulnerability function and draw the ROC curve. Take the threshold corresponding to the highest point of the Y-X curve as the judgment threshold; a function whose similarity is greater than the judgment threshold is judged to be a suspected vulnerability function, and otherwise a normal function. If each point of the ROC curve is (X, Y), the curve formed by the points (X, Y-X) is the Y-X curve based on the ROC curve, where the domain of X is denoted M.
The invention can obtain the following beneficial effects:
In calculating the numerical similarity of the function to be tested and the vulnerability function, nine kinds of features are considered: call relation features, stack space features, character string features, code scale features, path sequence features, path basic features, degree sequence features, degree basic features and graph scale features. Together they reflect the typical characteristics of a function fairly completely, and because their extraction does not depend on a specific instruction set, vulnerability association can be performed on binary files compiled for two different architectures. The features are extracted from the IDA analysis results by a purpose-written IDA plug-in. Because IDA behaves differently when it reverse-analyzes binary files of different architectures to construct the function control flow graph, the control flow graph is reconstructed before features are extracted from it.
To fuse the numerical information and the structural information of a function, the invention extracts part of the function call graph and constructs weighted bipartite graphs on which the maximum weight matching is calculated. On the assumption that function nodes closer to the function to be tested contribute more to the match, the function nodes are layered by their hop count from the function to be tested. Maximum weight bipartite matching is performed on the function nodes of each layer with the Kuhn-Munkres algorithm to obtain the similarity of that layer, and the similarities of the layers are weighted and summed to obtain the overall similarity of the functions. In calculating the overall similarity of the function pair to be matched, the method thus takes into account, through the calling information among functions, the influence of the similarity of other function pairs. It is more objective and accurate than a method that uses numerical values only.
Compared with the prior art, the method and the device do not depend on a specific instruction set, can be used for carrying out vulnerability association on binary files with different architectures, and are simple to implement and easy to popularize.
Drawings
FIG. 1 is a schematic flow diagram of the method;
FIG. 2 is a schematic diagram of reconstructing a function control flow graph;
FIG. 3 is a schematic diagram of the layering of structural subgraphs;
FIG. 4 is a schematic diagram of constructing a weighted bipartite graph;
FIG. 5 is a schematic diagram of determining an optimal threshold value based on a ROC curve.
Detailed Description
The cross-architecture binary program vulnerability association method is implemented as follows:
1) Write an IDA plug-in to reverse-analyze the binary file and obtain the function library to be tested, together with the basic function attributes, the function call graph and the function control flow graphs.
2) Calculate the numerical similarity of the function to be tested and the vulnerability function. The process comprises three steps: numerical feature extraction, similarity calculation and neural network similarity prediction.
In the numerical feature extraction stage, numerical features are extracted from three sources: the basic attributes of the function, the function call graph and the function control flow graph. Nine kinds of features are extracted for each function to be tested: call relation features, character string features, stack space features, code scale features, path sequence features, path basic features, degree sequence features, degree basic features and graph scale features. Together, these nine kinds of features reflect the typical properties of a function fairly completely.
The function call graph is analyzed to calculate, for each function to be tested, the number of times it is called by other functions, the number of times it calls other functions, and the number of distinct functions it calls (the call count after de-duplication). These form the call relation features.
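The call relation features can be illustrated with a minimal Python sketch. The representation of the call graph as a list of (caller, callee) pairs, one per call site, and the names `call_relation_features`, `times_called`, `calls_made` and `unique_callees` are assumptions made for illustration; the patent does not specify a data format.

```python
def call_relation_features(call_edges, func):
    """Count the call relation features of one function.

    call_edges: list of (caller, callee) pairs, one entry per call site
                (hypothetical format, not taken from the patent).
    """
    # Calls made to func by other functions (self-recursion is excluded).
    times_called = sum(
        1 for caller, callee in call_edges if callee == func and caller != func
    )
    # Every call func makes, repeats included.
    callees = [callee for caller, callee in call_edges if caller == func]
    return {
        "times_called": times_called,
        "calls_made": len(callees),
        "unique_callees": len(set(callees)),  # count after de-duplication
    }


edges = [("main", "parse"), ("main", "parse"), ("main", "log"), ("parse", "log")]
main_features = call_relation_features(edges, "main")
log_features = call_relation_features(edges, "log")
```

Here `main` calls `parse` twice and `log` once, so it makes three calls to two distinct functions, while `log` is called twice and calls nothing.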
The basic attributes of the function are analyzed: the stack space is calculated to form the stack space feature; the number of jump instructions, the number of instructions and the code size are calculated to form the code scale features; and the number of referenced character strings and the set of referenced character strings are calculated to form the character string features.
Features cannot be extracted directly from the function control flow graph (CFG) produced by IDA, so the CFG is processed before it is analyzed. In a few cases, the CFGs of the same function under different architectures differ greatly; for example, the CFGs of the memcap_main function of busybox under the ARM architecture and the MIPS architecture are very different. This is because the CPU instruction set of each platform is handled by its own IDA processor module, and the processor modules use different strategies to generate the CFG. For example, in the rmdir_main function of busybox, the bl instruction on the ARM platform splits a basic block, whereas jal (also a function call instruction) on the MIPS platform does not. To unify the basic block division rules, the CFG is reconstructed with the following algorithm:
a) Identify the head and tail addresses of all basic blocks of the function and the endpoint addresses of the original edges.
b) Sort all basic blocks in ascending order of head address, and count the in-degree and out-degree of each basic block.
c) Scan the basic blocks in ascending order of head address. If the out-degree of the nth basic block is 0 and the in-degree of the (n+1)th basic block is 0, merge the two basic blocks into a new nth basic block, delete the original nth and (n+1)th basic blocks, and re-point every edge that has the head address of the original (n+1)th basic block as an endpoint address so that it uses the head address of the nth basic block instead. If the out-degree of the nth basic block is 0 and the in-degree of the (n+1)th basic block is not 0, add an edge pointing from the nth basic block to the (n+1)th basic block, whose start point is the head address of the nth basic block and whose end point is the head address of the (n+1)th basic block.
d) The reconstruction is complete when the last basic block has been scanned.
The source code of the CFG reconstruction algorithm, implemented in Python, is as follows. The input parameter bbList is a list of the head and tail addresses of all basic blocks, edgeList is a list of all original edges from the IDA analysis, and startPoint is the function entry address. The output toDic is a dictionary of all edges of the reconstructed CFG, and bbDic is a dictionary of all basic blocks after reconstruction. The reconstruction effect for the memcap_main function of busybox is shown in FIG. 2.
[Source code listing: image GDA0002390733210000051 in the original publication]
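The original listing is an image and is not reproduced in this text. As a rough illustration only, steps a) to d) can be sketched in Python as follows. This is not the patent's code: the function name `rebuild_cfg`, the tuple formats, and the rule that the block at the entry address is never merged into its predecessor are assumptions. The parameter and return names mirror bbList, edgeList, startPoint, toDic and bbDic from the description.

```python
def rebuild_cfg(bb_list, edge_list, start_point):
    """Rebuild a CFG so that basic-block boundaries do not depend on the
    IDA processor module.

    bb_list:     iterable of (head, tail) addresses, one per basic block
    edge_list:   iterable of (source head, destination head) pairs
    start_point: function entry address (assumed never to be merged away)
    Returns (to_dic, bb_dic): successor heads per block, tail per block head.
    """
    blocks = sorted(bb_list)          # step b: ascending head address
    edges = set(edge_list)
    if not blocks:
        return {}, {}
    merged = [blocks[0]]
    for head, tail in blocks[1:]:     # step c: scan in address order
        prev_head, _ = merged[-1]
        out_prev = sum(1 for src, _ in edges if src == prev_head)
        in_cur = sum(1 for _, dst in edges if dst == head)
        if out_prev == 0 and in_cur == 0 and head != start_point:
            # Merge: the previous block absorbs this one, and edges that
            # started at this block now start at the merged block.
            merged[-1] = (prev_head, tail)
            edges = {(prev_head if src == head else src, dst)
                     for src, dst in edges}
            continue
        if out_prev == 0 and in_cur != 0:
            # Add the missing fall-through edge n -> n+1.
            edges.add((prev_head, head))
        merged.append((head, tail))
    bb_dic = dict(merged)
    to_dic = {h: sorted(dst for src, dst in edges if src == h) for h in bb_dic}
    return to_dic, bb_dic


# A block split after a call instruction (no edge between 0 and 12).
to_dic, bb_dic = rebuild_cfg(
    [(0, 8), (12, 16), (20, 24), (28, 32)],
    [(12, 20), (12, 28), (20, 28)],
    start_point=0,
)
```

In this example the first two blocks are merged into one block spanning addresses 0 to 16, and the edges that left the second block now leave the merged block. The sketch recomputes degrees on every step for clarity rather than speed.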
The function control flow graph is analyzed: the in-degree and out-degree of each node (namely each basic block) are calculated and the adjacency matrix of the directed CFG is constructed; the CFG is then converted into an undirected graph, the degree of each node is calculated, and the adjacency matrix of the undirected CFG is constructed. Degree analysis is performed on both adjacency matrices: the ascending in-degree sequence and the ascending out-degree sequence are calculated from the directed adjacency matrix, and the ascending degree sequence is calculated from the undirected adjacency matrix. The ascending in-degree sequence, the ascending out-degree sequence and the ascending degree sequence form the degree sequence features.
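A minimal Python sketch of the degree analysis, assuming the directed CFG is given as a 0/1 adjacency matrix stored as a list of lists (the function name `degree_sequences` and this representation are illustrative choices, not the patent's):

```python
def degree_sequences(adj):
    """Return ascending in-degree, out-degree and undirected degree
    sequences, plus the undirected adjacency matrix.

    adj[i][j] == 1 means there is an edge from basic block i to block j.
    """
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    in_deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Drop edge direction: i and j are neighbours if either edge exists.
    undirected = [
        [1 if (adj[i][j] or adj[j][i]) else 0 for j in range(n)]
        for i in range(n)
    ]
    deg = [sum(row) for row in undirected]
    return sorted(in_deg), sorted(out_deg), sorted(deg), undirected


# Diamond-shaped CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
diamond = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
in_seq, out_seq, deg_seq, undirected = degree_sequences(diamond)
```

Sorting the sequences makes them independent of basic-block numbering, which is what allows two CFGs from different binaries to be compared.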
From the ascending degree sequence, the maximum degree, the average degree and the probability sequence of degrees (the degree distribution) are calculated, and the entropy of the graph is calculated from the degree probability sequence; together these form the degree basic features. Path analysis is performed on the undirected CFG adjacency matrix: the shortest distance between any two nodes (namely basic blocks) is calculated with the Floyd algorithm or the Dijkstra algorithm to form the path sequence features, and the average path length, diameter and radius of the graph are calculated to form the path basic features. Basic attribute analysis is performed on the directed CFG adjacency matrix: the number of nodes, the number of edges, the link ratio of the graph, the graph density and the clustering coefficient of the graph are calculated to form the CFG graph scale features.
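Some of these quantities can be sketched in Python under stated assumptions: the entropy is taken to be the Shannon entropy (base 2) of the degree distribution, the graph is assumed to be connected with at least two nodes, and the function names are illustrative. The patent does not give the exact formulas.

```python
import math
from collections import Counter


def degree_entropy(degrees):
    """Shannon entropy of the degree distribution (assumed definition)."""
    total = len(degrees)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(degrees).values()
    )


def floyd_warshall(undirected):
    """All-pairs shortest path lengths on a 0/1 adjacency matrix."""
    n = len(undirected)
    dist = [
        [0 if i == j else (1 if undirected[i][j] else math.inf)
         for j in range(n)]
        for i in range(n)
    ]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def path_features(dist):
    """Average path length, diameter and radius of a connected graph."""
    n = len(dist)
    eccentricity = [max(row) for row in dist]
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    return {
        "avg_path_length": sum(pairs) / len(pairs),
        "diameter": max(eccentricity),
        "radius": min(eccentricity),
    }


# Path graph 0 - 1 - 2 - 3
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
degrees = [sum(row) for row in path_graph]
entropy = degree_entropy(degrees)
features = path_features(floyd_warshall(path_graph))
```

For the path graph, half of the nodes have degree 1 and half have degree 2, so the entropy is 1 bit; the diameter is 3 and the radius is 2.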
Following the steps above, nine kinds of features are extracted for each function: the call relation features, character string features, stack space features, code scale features, path sequence features, path basic features, degree sequence features, degree basic features and graph scale features.
In the feature similarity calculation stage, a calculation method is chosen according to the form of each feature: a numerical similarity method, a sequence similarity method based on the string edit distance algorithm, or a set similarity method based on Jaccard similarity. The resulting similarities of the individual features of the two functions being compared form the input vector of the integrated classifier.
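The three kinds of similarity measure can be sketched in Python as follows. The normalizations are assumptions: the patent names the methods but not the formulas, so the relative-difference form for numeric values (valid for non-negative features) and the division of the edit distance by the longer sequence length are illustrative choices.

```python
def numeric_similarity(a, b):
    """Assumed form: 1 - |a - b| / max(a, b), for non-negative values."""
    if a == b:
        return 1.0
    return 1.0 - abs(a - b) / max(abs(a), abs(b))


def edit_distance(s, t):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(t) + 1))
    for i, x in enumerate(s, 1):
        cur = [i]
        for j, y in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def sequence_similarity(s, t):
    """Edit distance normalized to [0, 1] by the longer length (assumed)."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(s, t) / longest


def jaccard_similarity(a, b):
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    return 1.0 if not (a or b) else len(a & b) / len(a | b)


# One similarity per feature forms the classifier's input vector.
similarity_vector = [
    numeric_similarity(8, 10),                         # e.g. instruction count
    sequence_similarity([0, 1, 1, 2], [0, 1, 2, 2]),   # e.g. degree sequence
    jaccard_similarity({"usage", "%s"}, {"%s", "ok"}), # e.g. string set
]
```

Each measure returns a value in [0, 1] for the inputs shown, so the features contribute on a comparable scale.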
In the stage where the integrated (ensemble) classifier predicts the numerical similarity, the classifier is first trained with a self-compiled, multi-platform function set with symbol tables as training samples. Specifically, the same source code is compiled with different compilers and different optimization options for different architectures to obtain several binary executable files. Each binary executable file is reverse-analyzed to obtain a function library, and the multi-dimensional features of each function are extracted. Based on these features, the similarity is calculated for each pair of functions taken from two different function libraries and used as an input vector of the integrated classifier. If the two function names are the same, the label is 1 and the pair is a positive sample; if the names differ, the label is 0 and the pair is a negative sample. Several initial classifiers are established. For each classifier, a sub-training sample set is constructed by drawing 80% of the samples from the initial sample set with replacement, so that the sub-training sample sets are independent and identically distributed. Each sub-training sample set is input into its classifier for training, and the classifier parameters are adjusted according to the prediction results until the results meet the requirement, at which point training of that classifier is complete. The trained integrated classifier is then used to predict the numerical similarity: features are extracted for the vulnerability function and each function to be tested, and the similarity vector is calculated as a test sample. Each classifier in the trained integrated classifier produces a predicted value, and the weighted average of the predicted values is taken as the final prediction, namely the numerical similarity.
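The sampling and averaging scheme can be sketched in Python. The base learner here is a simple nearest-centroid scorer chosen only to keep the example self-contained; it is a stand-in, not the classifier used in the patent, and all function names, the number of models and the equal default weights are assumptions.

```python
import math
import random


def bootstrap(samples, fraction, rng):
    """Draw fraction * len(samples) samples with replacement."""
    k = max(1, int(len(samples) * fraction))
    return [rng.choice(samples) for _ in range(k)]


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_base(samples):
    """Stand-in base learner: remember the positive and negative centroids."""
    pos = centroid([x for x, y in samples if y == 1])
    neg = centroid([x for x, y in samples if y == 0])
    return pos, neg


def predict_base(model, x):
    """Score in [0, 1]; closer to the positive centroid gives a higher score."""
    pos, neg = model
    d_pos, d_neg = math.dist(x, pos), math.dist(x, neg)
    return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)


def train_ensemble(samples, n_models=5, fraction=0.8, seed=0):
    """Train each base learner on its own 80% bootstrap sample.
    Assumes the sample set contains both labels."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        subset = bootstrap(samples, fraction, rng)
        if {y for _, y in subset} == {0, 1}:  # need both classes to train
            models.append(train_base(subset))
    return models


def numerical_similarity(models, x, weights=None):
    """Weighted average of the base learners' predictions."""
    weights = weights or [1.0] * len(models)
    scores = [predict_base(m, x) for m in models]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


training = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([1.0, 0.9], 1),
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.0, 0.1], 0),
]
models = train_ensemble(training)
same = numerical_similarity(models, [0.95, 0.9])
different = numerical_similarity(models, [0.05, 0.1])
```

A similarity vector close to the positive training samples receives a score above 0.5, and one close to the negative samples a score below 0.5.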
For example, suppose training samples for the matching pattern MIPS-O2 → ARM-O2 are needed.
Step one: For the MIPS architecture, compile the openssl source code with the -O2 optimization option into a binary file named openssl-MIPS-O2; for the ARM architecture, compile the openssl source code with the -O2 optimization option into a binary file named openssl-ARM-O2.
Step two: Reverse-analyze the two binary files to obtain two function libraries. The function library of openssl-MIPS-O2 has m functions, named X_1-MIPS-O2, X_2-MIPS-O2, ..., X_m-MIPS-O2; the function library of openssl-ARM-O2 has n functions, named Y_1-ARM-O2, Y_2-ARM-O2, ..., Y_n-ARM-O2. Features are calculated for all functions of the two libraries, giving m + n feature vectors in total.
Step three: Calculate the function similarity vectors between the two libraries to obtain m × n similarity vectors. If X_i = Y_j, then function X_i-MIPS-O2 of the openssl-MIPS-O2 library and function Y_j-ARM-O2 of the openssl-ARM-O2 library can be considered the same function; the label column is set to 1 and the pair is a positive sample. Otherwise the pair is a negative sample.
Step four: To balance positive and negative samples and to speed up processing, similarity calculation and labeling are carried out on 100 openssl-MIPS-O2 functions and 100 openssl-ARM-O2 functions at a time, which yields 100 positive samples and 9900 negative samples. All positive samples are kept, and 100 of the 9900 negative samples are drawn at random as negative samples.
This yields min(m, n) positive samples and the same number of negative samples as the initial sample set for the matching pattern MIPS-O2 → ARM-O2.
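Steps three and four can be sketched in Python. The sketch assumes each function library is a dictionary from function name to extracted features and that batches are formed from the names present in both libraries; the function name `build_samples` and its parameters are illustrative, not from the patent.

```python
import random


def build_samples(lib_a, lib_b, similarity_vector, batch=100, seed=0):
    """Build a balanced sample set of (similarity vector, label) pairs.

    lib_a, lib_b:      dicts mapping function name -> extracted features
    similarity_vector: callable comparing two feature objects
    batch:             number of functions paired at a time
    """
    rng = random.Random(seed)
    shared = sorted(set(lib_a) & set(lib_b))  # names found in both libraries
    samples = []
    for start in range(0, len(shared), batch):
        names = shared[start:start + batch]
        positives, negatives = [], []
        for a in names:
            for b in names:
                vec = similarity_vector(lib_a[a], lib_b[b])
                if a == b:
                    positives.append((vec, 1))  # same name: same function
                else:
                    negatives.append((vec, 0))
        # Keep every positive; draw an equal number of negatives at random.
        negatives = rng.sample(negatives, min(len(positives), len(negatives)))
        samples.extend(positives + negatives)
    return samples


mips = {"f": 1.0, "g": 2.0, "h": 3.0}
arm = {"f": 1.1, "g": 2.1, "h": 2.9, "arm_only": 9.0}
samples = build_samples(mips, arm, lambda x, y: [abs(x - y)], batch=2)
```

With a batch size of 100 and a full batch, each round compares 100 × 100 pairs and keeps 100 positive and 100 negative samples, matching the numbers in step four.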
3) Construct weighted bipartite graphs and calculate the overall similarity with a bipartite graph matching algorithm (such as the Kuhn-Munkres algorithm).
The algorithm comprises the following steps:
a) Extract the local structure information of the functions to be compared from the function call graphs to form two structural subgraphs; the number of layers extracted can be chosen according to experimental results.
b) Layer each extracted structural subgraph by hop count from the function to be compared (if the structural subgraph comes from the function call graph of the binary file containing the vulnerability function, this is the vulnerability function; if it comes from the function call graph of the binary file containing the function to be tested, this is the function to be tested), and assign each layer a weight according to its importance to the function to be compared, as shown in FIG. 3.
c) Abstract each pair of corresponding layers of the two subgraphs into a weighted complete bipartite graph, in which the node set consists of the functions contained in the corresponding layers, the edge set consists of the similarity relations between any two functions taken one from each layer, and each edge weight is the numerical similarity of the two functions, as shown in FIG. 4. This yields several weighted bipartite graphs.
d) For each weighted bipartite graph, calculate the maximum weight matching with a bipartite graph matching algorithm and take it as the similarity of the corresponding layer.
e) Weight and sum the similarities of the layers to obtain the overall similarity of the functions to be compared.
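Steps a) to e) can be sketched in Python under several assumptions: the call graph is treated as undirected when counting hops, the layer weights are supplied by the caller, and the maximum weight matching is found by brute force over permutations rather than by the Kuhn-Munkres algorithm. Brute force gives the same optimum on small layers but takes factorial time, so it is a stand-in for illustration only. All names are hypothetical.

```python
from itertools import permutations


def layers_by_hops(call_graph, root, depth):
    """Breadth-first layering: layers[k] holds the functions k hops from root.

    call_graph: dict mapping function name -> set of neighbouring names.
    """
    seen, layers, frontier = {root}, [[root]], [root]
    for _ in range(depth):
        nxt = []
        for f in frontier:
            for g in sorted(call_graph.get(f, ())):
                if g not in seen:
                    seen.add(g)
                    nxt.append(g)
        layers.append(nxt)
        frontier = nxt
    return layers


def max_weight_matching(left, right, sim):
    """Weight of the best one-to-one matching between two layers."""
    if not left or not right:
        return 0.0
    if len(left) > len(right):  # always permute the larger side
        left, right, sim = right, left, (lambda a, b, s=sim: s(b, a))
    return max(
        sum(sim(a, b) for a, b in zip(left, perm))
        for perm in permutations(right, len(left))
    )


def overall_similarity(layers_a, layers_b, sim, weights):
    """Weighted sum of the per-layer matching weights."""
    return sum(
        w * max_weight_matching(la, lb, sim)
        for w, la, lb in zip(weights, layers_a, layers_b)
    )


graph_v = {"v": {"a1", "a2"}, "a1": {"v"}, "a2": {"v"}}
graph_t = {"t": {"b1", "b2", "b3"}, "b1": {"t"}, "b2": {"t"}, "b3": {"t"}}
table = {
    ("v", "t"): 0.9,
    ("a1", "b1"): 0.8, ("a1", "b2"): 0.1, ("a1", "b3"): 0.2,
    ("a2", "b1"): 0.7, ("a2", "b2"): 0.6, ("a2", "b3"): 0.3,
}
score = overall_similarity(
    layers_by_hops(graph_v, "v", 1),
    layers_by_hops(graph_t, "t", 1),
    lambda a, b: table[(a, b)],
    weights=[0.6, 0.4],
)
```

In this example the best layer-1 matching pairs a1 with b1 and a2 with b2 (weight 1.4), so the overall similarity is 0.6 × 0.9 + 0.4 × 1.4 = 1.10. For realistic layer sizes a polynomial-time assignment solver would be needed in place of the permutation search.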
4) Make the judgment using a threshold calculated from the ROC curve. Obtain the vector of overall similarities between the set of functions to be tested and the vulnerability function and draw the ROC curve. The horizontal axis of the ROC curve is the false positive rate, FP/(FP + TN); the vertical axis is the true positive rate, TP/(TP + FN). The ROC curve shows how the false positive rate and the true positive rate change as the threshold varies, and can be used to compare the performance of classifiers. Ideally, the best classifier lies at the upper left corner, which means that it achieves a high true positive rate at a low false positive rate: real vulnerability functions are detected and few normal functions are misjudged as vulnerability functions. The point of the ROC curve closest to the upper left corner corresponds to the best threshold with the least error, that is, the point on the training set where the total of false positives and false negatives is smallest, which is the point where Y-X is largest, as shown in FIG. 5. The threshold corresponding to the highest point of the Y-X curve is therefore used as the judgment threshold: a function whose similarity is greater than the judgment threshold is judged to be a suspected vulnerability function, and otherwise a normal function.
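The threshold selection can be sketched in Python. The sketch assumes labeled similarity scores are available, that both classes are present, and that each distinct score is tried as a candidate threshold; the function name `best_threshold` is illustrative.

```python
def best_threshold(scores, labels):
    """Return (threshold, Y - X) for the threshold that maximizes
    true positive rate (Y) minus false positive rate (X).

    A function is flagged when its score is strictly greater than the
    threshold, matching the decision rule in the text.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_gap = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        gap = tp / pos - fp / neg  # Y - X at this threshold
        if gap > best_gap:
            best_t, best_gap = t, gap
    return best_t, best_gap


scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 1, 0, 0]
threshold, gap = best_threshold(scores, labels)
suspected = [s for s in scores if s > threshold]
```

In this example a threshold of 0.4 flags four of the five true matches and none of the three non-matches, giving Y - X = 0.8, the highest value among the candidates.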
In summary, the present invention discloses a cross-architecture binary program vulnerability correlation technique. The above description of the embodiments is not intended to limit the invention, and those skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention is defined by the scope of the claims.

Claims (5)

1. A cross-architecture binary program vulnerability function correlation method comprises the following steps:
1) carrying out reverse analysis on a binary file of a binary program to obtain a function library to be tested; then, acquiring a function call graph, a function control flow graph and basic function attributes according to the function library to be tested;
2) extracting the characteristics of each function to be tested according to the function call graph, the function control flow graph and the function basic attribute; then, calculating the numerical similarity of each function to be tested and the vulnerability function according to the extracted features and the features of the vulnerability function;
3) for each function to be tested, constructing a weighted bipartite graph of the function to be tested and the vulnerability function respectively, and calculating the overall similarity of the function to be tested and the vulnerability function by adopting a bipartite graph algorithm;
4) if the overall similarity of the function to be tested and the vulnerability function is larger than a set judgment threshold value, the function to be tested is judged to be a suspected vulnerability function, otherwise, the function to be tested is judged to be a normal function.
2. The method of claim 1, wherein the numerical similarity is calculated by:
21) compiling the same source code into a plurality of binary executable files for different architectures; then reverse-analyzing each binary executable file to obtain a function library and extracting the features of each function;
22) selecting one function from each of two different function libraries, and calculating the similarity of the two selected functions from the extracted features as an input vector of the ensemble classifier; if the two functions have the same name, the label is 1 and the corresponding input vector is taken as a positive sample; otherwise the corresponding input vector is taken as a negative sample; an initial sample set is thereby obtained; wherein the ensemble classifier comprises a plurality of classifiers;
23) drawing samples from the initial sample set to construct a plurality of independent and identically distributed sub-training sample sets, which serve as the training samples of the respective classifiers in the ensemble classifier;
24) inputting the sub-training sample sets into the corresponding classifiers for training; using the trained classifiers to predict on the vulnerability function and each function to be tested based on their features; and taking the weighted average of the resulting predicted values as the numerical similarity.
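Steps 22) to 24) can be illustrated with a minimal Python sketch. It is not the patented implementation: the weak learner `train_stump` (a threshold on the mean of the similarity vector), the use of sampling with replacement to build the sub-training sets, and all function names are assumptions made for illustration; the claim does not specify the classifier type or the weights.

```python
import random

def bootstrap_sets(samples, n_sets, rng):
    """Draw n_sets sub-training sets by sampling with replacement (assumed reading of step 23)."""
    return [[rng.choice(samples) for _ in samples] for _ in range(n_sets)]

def train_stump(subset):
    """Hypothetical weak learner: a threshold on the mean of the similarity vector."""
    pos = [sum(v) / len(v) for v, label in subset if label == 1]
    neg = [sum(v) / len(v) for v, label in subset if label == 0]
    # Fall back to 0.5 if a sub-training set happens to miss one class.
    if not pos or not neg:
        return 0.5
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, vector):
    """Return 1.0 if the stump considers the pair to be the same function, else 0.0."""
    return 1.0 if sum(vector) / len(vector) >= threshold else 0.0

def numerical_similarity(thresholds, weights, vector):
    """Weighted average of the individual classifier predictions (step 24)."""
    return sum(w * predict(t, vector) for t, w in zip(thresholds, weights)) / sum(weights)

# Each sample is (similarity vector, label); label 1 means the two function names matched.
samples = [([0.9, 0.8], 1), ([0.95, 0.85], 1), ([0.2, 0.1], 0), ([0.3, 0.25], 0)]
stumps = [train_stump(s) for s in bootstrap_sets(samples, 5, random.Random(0))]
weights = [1.0] * len(stumps)  # equal weights, an assumption
score = numerical_similarity(stumps, weights, [0.9, 0.9])
```

Here `score` plays the role of the numerical similarity between one function to be tested and the vulnerability function; in practice the stumps would be replaced by whatever classifiers the ensemble actually uses.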
3. The method of claim 1 or 2, wherein the function control flow graph obtained in step 1) is reconstructed by:
a) identifying the head and tail addresses of all basic blocks of a function in the function control flow graph, and the endpoint addresses of the original edges;
b) sorting all basic blocks in ascending order of head address, and counting the in-degree and out-degree of each basic block;
c) scanning the basic blocks in ascending order of head address: if the out-degree of the nth basic block is 0 and the in-degree of the (n + 1)th basic block is 0, merging the two basic blocks into a new nth basic block, deleting the original nth and (n + 1)th basic blocks, and changing every edge that has the head address of the original (n + 1)th basic block as an endpoint address so that it has the head address of the nth basic block as that endpoint address; if the out-degree of the nth basic block is 0 and the in-degree of the (n + 1)th basic block is not 0, adding an edge pointing from the nth basic block to the (n + 1)th basic block, one endpoint of the edge being the head address of the nth basic block and the other endpoint being the head address of the (n + 1)th basic block.
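The reconstruction in steps a) to c) can be sketched in Python as follows. The data layout (blocks as `(head, tail)` address pairs, edges as `(source head, destination head)` pairs) and the name `reconstruct_cfg` are assumptions for illustration, and degrees are recomputed on demand rather than counted once up front. The sketch follows the claim literally.

```python
def reconstruct_cfg(blocks, edges):
    """blocks: list of (head, tail) address pairs; edges: list of (src_head, dst_head)."""
    blocks = sorted(blocks)  # step b: ascending order of head address
    edges = list(edges)

    def out_degree(head):
        return sum(1 for src, _ in edges if src == head)

    def in_degree(head):
        return sum(1 for _, dst in edges if dst == head)

    i = 0
    while i + 1 < len(blocks):
        (head, _tail), (next_head, next_tail) = blocks[i], blocks[i + 1]
        if out_degree(head) == 0:
            if in_degree(next_head) == 0:
                # Merge: the new block spans both, and every edge endpoint that
                # referred to the absorbed block's head now refers to the merged head.
                blocks[i:i + 2] = [(head, next_tail)]
                edges = [(head if s == next_head else s,
                          head if d == next_head else d) for s, d in edges]
                continue  # re-check the merged block against its new neighbour
            # Otherwise add an edge from block n to block n + 1.
            edges.append((head, next_head))
        i += 1
    return blocks, edges

blocks, edges = reconstruct_cfg(
    [(0x10, 0x1F), (0x20, 0x2F), (0x30, 0x3F), (0x40, 0x4F)],
    [(0x20, 0x40)],
)
```

In this example the first two blocks are merged into `(0x10, 0x2F)`, the original edge is re-anchored at `0x10`, and a new edge is added from the block at `0x30` to the block at `0x40`.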
4. The method of claim 1, wherein the overall similarity is calculated by:
a) extracting the local structure information of the function to be tested from the function call graph to form a structure subgraph a, and extracting the local structure information of the vulnerability function from the function call graph in which the vulnerability function is located to form a structure subgraph b;
b) dividing structure subgraph a into layers according to hop count from the function to be tested and assigning weights according to importance to the function to be tested, thereby abstracting each layer of structure subgraph a into a weighted bipartite graph; dividing structure subgraph b into layers according to hop count from the vulnerability function and assigning weights according to importance to the vulnerability function, thereby abstracting each layer of structure subgraph b into a weighted bipartite graph; wherein the node set consists of the functions contained in the corresponding layer, the edge set is the similarity relation between any two functions in the node set, and each edge weight is the numerical similarity of the two functions it connects;
c) using a bipartite matching algorithm to calculate, layer by layer, the maximum weight matching of each weighted bipartite graph as the similarity of that layer;
d) taking the weighted sum of the layer similarities as the overall similarity between the function to be tested and the vulnerability function.
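Steps c) and d) can be sketched in Python as below. The sketch assumes each layer has already been reduced to a matrix of numerical similarities, with rows for the functions around the function to be tested and columns for those around the vulnerability function. The exhaustive bitmask search stands in for any bipartite matching algorithm and only suits small layers; the names `max_weight_matching` and `overall_similarity` and the layer weights are hypothetical.

```python
from functools import lru_cache

def max_weight_matching(weights):
    """Maximum total weight of a one-to-one matching; weights[i][j] is the
    numerical similarity between left node i and right node j."""
    if not weights or not weights[0]:
        return 0.0
    n_right = len(weights[0])

    @lru_cache(maxsize=None)
    def best(i, used):
        if i == len(weights):
            return 0.0
        score = best(i + 1, used)  # option: leave left node i unmatched
        for j in range(n_right):
            if not used & (1 << j):
                score = max(score, weights[i][j] + best(i + 1, used | (1 << j)))
        return score

    return best(0, 0)

def overall_similarity(layers, layer_weights):
    """Weighted sum of per-layer matching scores (step d)."""
    return sum(w * max_weight_matching(m) for m, w in zip(layers, layer_weights))

# Two hop layers: a 2x2 layer and a 1x1 layer, with assumed layer weights.
layers = [[[0.75, 0.25], [0.5, 0.5]], [[0.5]]]
result = overall_similarity(layers, [0.5, 0.25])  # 0.5 * 1.25 + 0.25 * 0.5
```

For larger layers a polynomial-time assignment algorithm such as the Hungarian method would be the usual replacement for the exhaustive search.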
5. The method of claim 1, wherein the decision threshold is determined by: drawing an ROC curve from the obtained overall similarities between the functions to be tested and the vulnerability function, and taking the threshold corresponding to the highest point of the Y-X curve as the decision threshold.
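One way to read this threshold selection is sketched below in Python. It assumes that Y and X are the ROC ordinate and abscissa (true positive rate and false positive rate), so that the highest point of Y-X is the maximum of their difference, and that labelled similarity scores are available. The name `pick_threshold` and the sample data are hypothetical.

```python
def pick_threshold(scores, labels):
    """Return the candidate threshold that maximises TPR - FPR.
    A function is flagged when its overall similarity is greater than the threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    best_gap, best_t = float("-inf"), None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        gap = tp / positives - fp / negatives  # Y - X at this threshold
        if gap > best_gap:
            best_gap, best_t = gap, t
    return best_t

# Overall similarities with ground-truth labels (1 = truly the vulnerability function).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
threshold = pick_threshold(scores, labels)
```

With this separable sample, every true match scores above the selected threshold and every non-match scores at or below it.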
CN201610178368.6A 2016-03-25 2016-03-25 Cross-architecture binary program vulnerability function association method Active CN107229563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610178368.6A CN107229563B (en) 2016-03-25 2016-03-25 Cross-architecture binary program vulnerability function association method

Publications (2)

Publication Number Publication Date
CN107229563A CN107229563A (en) 2017-10-03
CN107229563B true CN107229563B (en) 2020-07-10

Family

ID=59932522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610178368.6A Active CN107229563B (en) 2016-03-25 2016-03-25 Cross-architecture binary program vulnerability function association method

Country Status (1)

Country Link
CN (1) CN107229563B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017061270A1 (en) * 2015-10-09 2017-04-13 日本電信電話株式会社 Vulnerability discovering device, vulnerability discovering method, and vulnerability discovering program
CN107944278A (en) * 2017-12-11 2018-04-20 北京奇虎科技有限公司 A kind of kernel leak detection method and device
CN107967152B (en) * 2017-12-12 2020-06-19 西安交通大学 Software local plagiarism evidence generation method based on minimum branch path function birthmarks
CN109472145A (en) * 2017-12-29 2019-03-15 北京安天网络安全技术有限公司 A kind of code reuse recognition methods and system based on graph theory
CN108268777B (en) * 2018-01-18 2020-06-30 中国人民大学 Similarity detection method for carrying out unknown vulnerability discovery by using patch information
CN108491228B (en) * 2018-03-28 2020-03-17 清华大学 Binary vulnerability code clone detection method and system
CN109740347B (en) * 2018-11-23 2020-07-10 中国科学院信息工程研究所 Method for identifying and cracking fragile hash function of intelligent device firmware
CN109670318B (en) * 2018-12-24 2021-03-02 中国科学院软件研究所 Vulnerability detection method based on cyclic verification of nuclear control flow graph
CN110083534B (en) * 2019-04-19 2023-03-31 西安邮电大学 Software plagiarism detection method based on reduction-constrained shortest path birthmarks
CN110414238A (en) * 2019-06-18 2019-11-05 中国科学院信息工程研究所 The search method and device of homologous binary code
CN110598417B (en) * 2019-09-05 2021-02-12 北京理工大学 Software vulnerability detection method based on graph mining
CN110674346A (en) * 2019-10-11 2020-01-10 北京达佳互联信息技术有限公司 Video processing method, device, equipment and storage medium
CN110943981B (en) * 2019-11-20 2022-04-08 中国人民解放军战略支援部队信息工程大学 Cross-architecture vulnerability mining method based on hierarchical learning
CN111046385B (en) * 2019-11-22 2022-04-22 北京达佳互联信息技术有限公司 Software type detection method and device, electronic equipment and storage medium
CN110968874B (en) * 2019-11-28 2023-04-14 腾讯科技(深圳)有限公司 Vulnerability detection method, device, server and storage medium
CN111310178B (en) * 2020-01-20 2024-01-23 武汉理工大学 Firmware vulnerability detection method and system in cross-platform scene
CN111914260B (en) * 2020-06-22 2023-03-31 西安交通大学 Binary program vulnerability detection method based on function difference
CN112540787A (en) * 2020-12-14 2021-03-23 北京知道未来信息技术有限公司 Program reverse analysis method and device and electronic equipment
CN112800425B (en) * 2021-02-03 2024-06-21 南京大学 Code analysis method and device based on graph calculation
CN114610606B (en) * 2022-02-25 2023-03-03 中国人民解放军国防科技大学 Binary system module similarity matching method and device based on arrival-fixed value analysis

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101315599A (en) * 2007-05-29 2008-12-03 北京航空航天大学 Method and device for detecting similarity of source codes
CN101398758A (en) * 2008-10-30 2009-04-01 北京航空航天大学 Detection method of code copy
CN101739337A (en) * 2009-12-14 2010-06-16 北京理工大学 Method for analyzing characteristic of software vulnerability sequence based on cluster
CN101814053A (en) * 2010-03-29 2010-08-25 中国人民解放军信息工程大学 Method for discovering binary code vulnerability based on function model
CN101968766A (en) * 2010-10-21 2011-02-09 上海交通大学 System for detecting software bug triggered during practical running of computer program
KR20150047241A (en) * 2013-10-24 2015-05-04 한양대학교 산학협력단 Method and apparatus for determing plagiarism of program using control flow graph
CN105045715A (en) * 2015-07-27 2015-11-11 电子科技大学 Programming mode and mode matching based bug clustering method
CN105160206A (en) * 2015-10-08 2015-12-16 中国科学院数学与系统科学研究院 Method and system for predicting protein interaction target point of drug

Also Published As

Publication number Publication date
CN107229563A (en) 2017-10-03

Similar Documents

Publication Publication Date Title
CN107229563B (en) Cross-architecture binary program vulnerability function association method
CN111783100B (en) Source code vulnerability detection method for code graph representation learning based on graph convolution network
Du et al. Deepstellar: Model-based quantitative analysis of stateful deep learning systems
CN111639344B (en) Vulnerability detection method and device based on neural network
CN105868108B (en) The unrelated binary code similarity detection method of instruction set based on neural network
CN110232280B (en) Software security vulnerability detection method based on tree structure convolutional neural network
CN112541180A (en) Software security vulnerability detection method based on grammatical features and semantic features
CN113672931B (en) Software vulnerability automatic detection method and device based on pre-training
CN112668013B (en) Java source code-oriented vulnerability detection method for statement-level mode exploration
CN110162972B (en) UAF vulnerability detection method based on statement joint coding deep neural network
CN115146279A (en) Program vulnerability detection method, terminal device and storage medium
CN112364352A (en) Interpretable software vulnerability detection and recommendation method and system
Ge et al. AMDroid: android malware detection using function call graphs
CN113326187A (en) Data-driven intelligent detection method and system for memory leakage
Gao et al. Malware detection using attributed CFG generated by pre-trained language model with graph isomorphism network
CN116150757A (en) Intelligent contract unknown vulnerability detection method based on CNN-LSTM multi-classification model
Assefa et al. Intelligent phishing website detection using deep learning
CN111400713A (en) Malicious software family classification method based on operation code adjacency graph characteristics
CN117725592A (en) Intelligent contract vulnerability detection method based on directed graph annotation network
Gu et al. Hierarchical attention network for interpretable and fine-grained vulnerability detection
CN116975881A (en) LLVM (LLVM) -based vulnerability fine-granularity positioning method
Huo et al. The application of 1D-CNN in microsoft malware detection
CN113076089B (en) API (application program interface) completion method based on object type
Mahyari A hierarchical deep neural network for detecting lines of codes with vulnerabilities
Diwan et al. VDGraph2Vec: Vulnerability Detection in Assembly Code using Message Passing Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant