CN108427643B - Binary program fuzzy test method based on multi-population genetic algorithm - Google Patents
- Publication number
- CN108427643B
- Authority
- CN
- China
- Prior art keywords
- population
- edges
- test data
- program
- edge
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3684—Test management for test design, e.g. generating new test cases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention relates to a binary program fuzz testing method based on a multi-population genetic algorithm and belongs to the field of binary vulnerability discovery in information security. The method uses a multi-population genetic algorithm and first abstracts each test input as an individual represented by a chromosome. A main population and sub-populations 1 and 2 are then initialized, either randomly or from initial data; the number of newly discovered edges on each test input's execution path and the number of edges associated with that test input serve as the fitness measure. Next, the best individuals of the sub-populations are selected by fitness ranking and migrated to the main population. Finally, genetic operators (crossover and mutation) are applied to the main population and to each sub-population, producing new individuals that are traced and executed in the next round. The method effectively improves the coverage of program execution paths, can cover specific execution paths, provides clear guidance for test-data generation, and has good application and popularization value.
Description
Technical Field
The invention relates to a binary program fuzz testing method and belongs to the field of binary vulnerability discovery in information security.
Background
Fuzz testing is currently the most common vulnerability-discovery technique in the security field and offers good overall effectiveness. It feeds randomly constructed or mutated test cases to the target software and monitors whether execution exhibits anomalies such as crashes, which indicate potential vulnerabilities. The higher the code coverage achieved by the test data a fuzzing system generates, the more likely it is to find a bug, so code coverage can serve as the criterion for judging the quality of generated test data. In fuzz testing the source code of the program under test is usually unavailable, so the input data format is unknown. Mutation-based approaches generate new test data by directly modifying existing test data; however, because the mutations are random, they cannot achieve high code coverage and their vulnerability-discovery performance is poor. The present invention therefore provides a binary program fuzz testing method based on a multi-population genetic algorithm that improves the code coverage of the test data generated by mutation-based fuzz testing.
The basic problem addressed by the method is that test data generated randomly by mutation-based approaches cannot achieve high code coverage. Existing binary program fuzz testing methods for inputs of unknown format fall into two categories:
1. Symbolic execution methods
Symbolic execution treats the test data as symbolic values. It collects the constraints produced while the program processes these symbolic values, then solves them to generate new test data that exercises new execution paths. In theory this can reach 100% code coverage, but on complex programs symbolic execution suffers from path explosion, which severely limits its applicability.
2. Evolutionary algorithm methods
Evolutionary methods encode test data in a suitable form so that its generation can be guided; genetic algorithms are the most widely used. Current genetic-algorithm approaches require one or more execution paths to be predefined manually; they generate test data that follows these preset paths, so other paths cannot be tested.
In summary, existing binary program fuzz testing methods for inputs of unknown format are unsuitable for complex programs and test few execution paths. The invention therefore provides a binary program fuzz testing method based on a multi-population genetic algorithm.
Disclosure of Invention
The invention aims to provide a binary program fuzz testing method based on a multi-population genetic algorithm that addresses two shortcomings of existing fuzz testing for binaries with unknown input formats: they are unsuitable for complex programs, and they test few execution paths.
The design principle of the invention is as follows:
Test data are converted into individuals of a main population and of sub-populations, and new test data are generated from the changes that individuals undergo during evolution. At the same time, new individuals from the sub-populations influence the evolution of the main population: a migration operator ensures that good information is shared and exchanged among the populations, which accelerates convergence (and thus path coverage) while preserving the diversity of the individuals. The overall flow is shown in FIG. 1.
The technical scheme of the invention is realized by the following steps:
Step 1.1: convert the test data into individuals of the populations.
Step 1.2: randomly initialize the main population and the sub-populations.
Step 2: locate basic blocks.
Step 2.1: execute the program under QEMU instrumentation and collect basic-block information during execution.
Step 2.2: insert code before each basic block that writes the program's execution-path information to an external file.
Step 3: monitor whether the program under test crashes, and record its execution path.
Step 3.1: after execution, record the basic-block sequence and convert it into an edge sequence, where an edge is a jump between two consecutive basic blocks.
Step 3.2: merge identical edges in the edge sequence to obtain an edge set, which serves as the program's execution-path information.
Step 3.3: apply Steps 3.1 and 3.2 to every individual $X_i$ to obtain the program's execution-path information for every test input, i.e. the edge set $E_i$.
Step 4: compute the fitness of the test data and select the best individuals.
Step 4.1: compute the increase in the number of discovered edges after a test input is executed; this is the fitness value $f_1$.
Step 4.2: update the set of all edges discovered so far, and compute the number of edges associated with the test input; this is the fitness value $f_2$.
Step 4.3: rank by fitness, comparing $f_1$ first and then $f_2$, and select the best individuals (test inputs).
Step 5: migrate individuals from the sub-populations to the main population, and apply crossover and mutation within each population.
Step 5.1: migrate a suitable number of the best individuals from the sub-populations to the main population.
Step 5.2: perform crossover within each population.
Step 5.3: perform mutation within each population.
Step 5.4: run the newly obtained best individuals of the main population and the sub-populations on the program under test, and repeat Steps 3 to 5.
Advantageous effects
Compared with symbolic execution and other evolutionary algorithms, the test data generated in the same amount of time achieve higher code coverage, which shows that the multi-population genetic algorithm gives clear guidance to fuzz-test data generation. Crash-inducing vulnerabilities are found markedly more efficiently than with AFL, and the generated test data can cover specific program execution paths.
Drawings
FIG. 1 is a flow chart of the binary program fuzz testing method based on a multi-population genetic algorithm of the present invention.
Fig. 2 is a schematic diagram illustrating an exemplary basic block of the present invention.
Detailed Description
In order to better illustrate the objects and advantages of the present invention, embodiments of the method of the present invention are described in further detail below with reference to examples.
The specific process is as follows:
Step 1.1: the main population and the sub-populations each consist of a number of individuals, and each individual is abstracted as a chromosome, so the $i$-th individual in a population is $X_i = (x_{i,1}, x_{i,2}, x_{i,3}, \ldots, x_{i,D})$. Population initialization assigns a value to each gene $x_{i,d}$ of $X_i$; each $x_{i,d}$ represents one byte, and the length $D$ is the number of bytes in the test data.
Step 1.2: the invention initializes the main population and the sub-populations by random assignment. Two sub-populations, class1 and class2, are used; sub-population class2 is initialized by negating the binary code of each individual in sub-population class1.
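A minimal Python sketch of Steps 1.1 and 1.2, assuming every chromosome has the same length $D$ and that the main population and sub-population 1 are both filled with uniformly random bytes. The population size, the seed, and all function names are illustrative assumptions, not taken from the patent; "negating the binary code" is read here as a bitwise NOT of each byte (XOR with 0xFF).

```python
import random
from typing import List, Tuple

Population = List[bytearray]

def random_individual(length: int, rng: random.Random) -> bytearray:
    """A chromosome X_i: gene x_{i,d} is one byte, and D = length."""
    return bytearray(rng.randrange(256) for _ in range(length))

def complement(individual: bytearray) -> bytearray:
    """Negate the binary code of every gene (bitwise NOT within one byte)."""
    return bytearray(b ^ 0xFF for b in individual)

def init_populations(size: int, length: int,
                     seed: int = 0) -> Tuple[Population, Population, Population]:
    """Return (main, class1, class2); class2 mirrors class1 bit for bit."""
    rng = random.Random(seed)
    main = [random_individual(length, rng) for _ in range(size)]
    sub1 = [random_individual(length, rng) for _ in range(size)]   # class1
    sub2 = [complement(x) for x in sub1]                           # class2
    return main, sub1, sub2
```

Deriving class2 from class1 places the two sub-populations at opposite corners of the byte space, which is presumably the point: together they start from maximally different inputs.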
Step 2: locate basic blocks.
Step 2.1: QEMU instrumentation can obtain basic-block information during program execution. QEMU emulates the program by dividing it into basic blocks, which it translates and executes.
Step 2.2: before QEMU executes each basic block, a snippet of code is inserted that outputs information about the block currently being executed. As QEMU emulates the program, this yields the basic-block sequence of the execution, i.e. the program's execution-path information, which is recorded in an external file.
Step 3: trace the execution of the program under test.
Step 3.1: each basic block in the program is represented by its entry address $b$, and tracing the program's execution yields a basic-block sequence $(b_1, b_2, b_3, \ldots, b_n)$. A jump between two consecutive basic blocks on the execution path is defined as $e = (b_k, b_{k+1})$; $e$ is then an edge of the execution path whose nodes are basic blocks (as shown in FIG. 2), and the execution path can be represented as the edge sequence $E_e = (e_1, e_2, e_3, \ldots, e_{n-1})$.
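The edge definition in Step 3.1 can be sketched in Python as follows. The block addresses in the example trace are hypothetical, and parsing the QEMU-generated trace file into a list of entry addresses is not shown.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]  # (b_k, b_{k+1}): entry addresses of consecutive blocks

def to_edges(blocks: Sequence[int]) -> List[Edge]:
    """(b_1, ..., b_n) -> E_e = (e_1, ..., e_{n-1}), with e_k = (b_k, b_{k+1})."""
    return list(zip(blocks, blocks[1:]))

# Hypothetical trace in which one loop body runs twice.
trace = [0x400000, 0x400010, 0x400000, 0x400010]
print([(hex(src), hex(dst)) for src, dst in to_edges(trace)])
```

A trace of $n$ blocks always yields $n - 1$ edges, including repeats; repeats are merged in the next step.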
Step 3.2: merging identical edges in the sequence $E_e$ yields a set of edges with occurrence-count information, $E'_e = (e'_1, e'_2, e'_3, \ldots, e'_{n-1})$. An edge may occur a different number of times in different executions, so the counts are divided into 8 classes: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, and 128 or more. Each class can be represented by a different bit of a byte, which simplifies the implementation. After classification, a new edge set $E''_e = \{e''_1, e''_2, e''_3, \ldots, e''_{n-1}\}$ is obtained.
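A sketch of Step 3.2, assuming each class is encoded as a single bit, from 0x01 for one occurrence to 0x80 for 128 or more. The eight classes coincide with the hit-count buckets used by AFL, so this sketch borrows that bit layout; the patent itself only states that different bits of a byte are used. Representing $E''_e$ as a dictionary from edge to class bit is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[int, int]  # (source block address, destination block address)

def count_class(count: int) -> int:
    """Map an occurrence count to one of 8 classes, one bit of a byte per class."""
    if count < 1:
        raise ValueError("an edge in the set occurs at least once")
    if count <= 3:
        return 1 << (count - 1)  # 1 -> 0x01, 2 -> 0x02, 3 -> 0x04
    if count <= 7:
        return 0x08
    if count <= 15:
        return 0x10
    if count <= 31:
        return 0x20
    if count <= 127:
        return 0x40
    return 0x80                  # 128 or more

def classify_edges(edge_sequence: Iterable[Edge]) -> Dict[Edge, int]:
    """Merge identical edges (E'_e), then replace each count by its class bit (E''_e)."""
    counts = Counter(edge_sequence)
    return {edge: count_class(n) for edge, n in counts.items()}
```

Bucketing means that a loop running 9 times and one running 12 times look identical, so minor count changes do not register as new behavior.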
Step 3.3: the basic-block sequence of each individual $X_i$ in the main population and the sub-populations is processed in this way, finally yielding the program's execution-path information, i.e. the edge set $E_i = \{e_{i,1}, e_{i,2}, e_{i,3}, \ldots\}$.
Step 4: compute and rank fitness, then select the best individuals of each population.
Step 4.1: let the set of all edges discovered during the whole fuzzing process be $E_t = \{e_{t,1}, e_{t,2}, e_{t,3}, \ldots\}$. The number of edges newly discovered by a test input after execution in the program under test is computed as $f_1(X_i) = \mathrm{card}(E_i - E_t)$, where $\mathrm{card}(A)$ is the number of elements of set $A$.
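$f_1$ is a plain set difference. In this sketch an element of $E_i$ may be a bare edge or an (edge, class) pair; the patent does not pin this down, so the function only assumes that $E_i$ and $E_t$ use the same form.

```python
from typing import AbstractSet, Hashable

def f1(e_i: AbstractSet[Hashable], e_t: AbstractSet[Hashable]) -> int:
    """f1(X_i) = card(E_i - E_t): elements of E_i not yet in the global set E_t."""
    return len(e_i - e_t)

# Worked example: E_t = {(1, 2)} and E_i = {(1, 2), (2, 3), (3, 1)}.
print(f1({(1, 2), (2, 3), (3, 1)}, {(1, 2)}))  # 2
```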
Step 4.2: update the sets $E_t$ and $W_t$ for the population. For any edge $e_{t,i}$ in $E_t$, let $X_{t,i}$ be the last test input that found this edge; this gives the set $W_t = \{(e_{t,1}, X_{t,1}), (e_{t,2}, X_{t,2}), (e_{t,3}, X_{t,3}), \ldots\}$, which pairs each edge with one test input. The value $f_2$ is computed from $W_t$ as $f_2(X_i) = \sum_{e \in E_t} R(W(e), X_i)$, where $W(e)$ is the test input that $W_t$ associates with edge $e$, and $R(x, y)$ is a binary function that returns 1 when $x$ and $y$ are the same and 0 otherwise.
Step 4.3: fitness values of the sub-populations are computed and ranked independently of the main population. First, $f_1$ is computed for every individual in the population; then the sets $E_t$ and $W_t$ are updated; finally, $f_2$ is computed for every individual. When the fitness of two individuals is compared, $f_1$ is compared first, and $f_2$ is compared only if the $f_1$ values are equal. This allows the best individuals to be selected from the main population and the sub-populations.
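The order of operations in Steps 4.1 to 4.3 matters: $f_1$ is measured against $E_t$ before the population updates it, while $f_2$ is measured afterwards. The Python sketch below assumes that test inputs are `bytes` (so they can be stored as $W(e)$ and compared by $R$) and that $W_t$ is a dictionary from edge to the last test input that reached it; both representations are assumptions. $f_2$ counts the edges in $E_t$ whose recorded test input equals $X_i$, and ranking relies on Python's lexicographic tuple ordering, which compares $f_1$ first and falls back to $f_2$ only on ties.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[int, int]
Fitness = Tuple[int, int]  # (f1, f2)

def evaluate(population: Sequence[bytes], edge_sets: Sequence[Set[Edge]],
             e_t: Set[Edge], w_t: Dict[Edge, bytes]) -> List[Fitness]:
    """Compute (f1, f2) for one population; E_t and W_t are updated in place."""
    # Step 4.1: f1 uses E_t as it was before this population is folded in.
    f1 = [len(e_i - e_t) for e_i in edge_sets]
    # Step 4.2: merge each E_i into E_t; W_t keeps the last input seen on each edge.
    for x_i, e_i in zip(population, edge_sets):
        e_t |= e_i
        for edge in e_i:
            w_t[edge] = x_i
    # f2(X_i): number of edges e in E_t with R(W(e), X_i) = 1.
    owners = Counter(w_t.values())
    f2 = [owners[x_i] for x_i in population]
    return list(zip(f1, f2))

def rank(population: Sequence[bytes], fitness: Sequence[Fitness]) -> List[bytes]:
    """Step 4.3: best first; tuples compare f1 first and f2 only on ties."""
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    return [population[i] for i in order]
```

Because each population is ranked independently, `evaluate` and `rank` would presumably be called once per population, with all calls sharing the same global $E_t$ and $W_t$.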
Step 5: migrate individuals from the sub-populations to the main population, and apply crossover and mutation within each population.
Step 5.1: the top 20% of individuals in each sub-population are added to the best individuals of the main population, and this pool then undergoes crossover and mutation.
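A sketch of the migration in Step 5.1, assuming each population has already been ranked best-first as in Step 4.3. The patent does not say how 20% is rounded for small populations; this sketch rounds up, which is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def top_percent(ranked: Sequence[T], percent: int = 20) -> List[T]:
    """Best `percent`% of a best-first list, rounded up (rounding is an assumption)."""
    k = (len(ranked) * percent + 99) // 100
    return list(ranked[:k])

def migrate(main_elite: Sequence[T], *ranked_subs: Sequence[T],
            percent: int = 20) -> List[T]:
    """Step 5.1: pool the main population's elite with the top 20% of each sub-population."""
    pool = list(main_elite)
    for sub in ranked_subs:
        pool.extend(top_percent(sub, percent))
    return pool
```

Integer arithmetic is used for the percentage so that the rounding does not depend on floating-point error.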
Step 5.2: crossover uses a 2-opt exchange. For the main population, with chromosome length $D$, four random numbers between 0 and $D$ are generated as crossover points, and the segments between each pair of crossover points are exchanged between the two chromosomes. Similarly, sub-population 1 (class1) and sub-population 2 (class2) use one pair and three pairs of crossover points, respectively.
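One way to read the 2-opt exchange of Step 5.2 is as a multi-point crossover: draw $2k$ cut points in $[0, D]$, pair them in sorted order, and swap the bytes between each pair across the two parents, with $k = 2$ for the main population and $k = 1$ or $3$ for sub-populations 1 and 2. This is a sketch under that reading, not the patent's implementation. It assumes both parents have the same length; otherwise only the common prefix of length `min(len(a), len(b))` takes part in the swap. Coincident cut points simply yield an empty segment.

```python
import random
from typing import Tuple

def segment_swap(a: bytes, b: bytes, pairs: int,
                 rng: random.Random) -> Tuple[bytes, bytes]:
    """Swap the bytes between `pairs` pairs of random cut points in [0, D]."""
    d = min(len(a), len(b))            # D; only the common prefix takes part
    points = sorted(rng.randint(0, d) for _ in range(2 * pairs))
    child_a, child_b = bytearray(a), bytearray(b)
    # Pair cut points in sorted order: (p0, p1), (p2, p3), ...; segments never overlap.
    for lo, hi in zip(points[0::2], points[1::2]):
        child_a[lo:hi], child_b[lo:hi] = b[lo:hi], a[lo:hi]
    return bytes(child_a), bytes(child_b)

# Hypothetical per-population settings from Step 5.2: 4, 2 and 6 cut points.
CROSSOVER_PAIRS = {"main": 2, "sub1": 1, "sub2": 3}
```

For example, `segment_swap(parent1, parent2, CROSSOVER_PAIRS["sub1"], random.Random(0))` performs the single-pair crossover used for sub-population 1.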
Step 5.3: when mutating the main population, two random numbers between 0 and $D$ are generated as mutation points, and the genes at those points are replaced with randomly generated values. Similarly, sub-population 1 and sub-population 2 generate one and two mutation points, respectively.
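A matching sketch of the mutation in Step 5.3, where `points` is the number of mutation points for the population being mutated, as listed in the step. Positions are drawn independently, so the same gene may occasionally be hit twice; whether the original method avoids that is not stated.

```python
import random

def mutate(x: bytes, points: int, rng: random.Random) -> bytes:
    """Overwrite the genes at `points` random positions in [0, D) with random bytes."""
    child = bytearray(x)
    if not child:                           # nothing to mutate in an empty input
        return bytes(child)
    for _ in range(points):
        pos = rng.randrange(len(child))     # mutation point
        child[pos] = rng.randrange(256)     # randomly generated replacement gene
    return bytes(child)
```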
Step 5.4: run the newly obtained best individuals of the main population and the sub-populations on the program under test, and repeat Steps 3 to 5.
Test results: the experiments measured the number of new edges found in three programs under test within a fixed time. The results show that, in both groups of tests, the test data generated by the invention achieve higher code coverage in the same time, exceeding AFL by more than 27%. In addition, each of 9 programs under test that contain vulnerabilities was tested 100 times and the results were averaged; across all 9 programs, the invention found crash vulnerabilities 13% more efficiently than AFL.
The above detailed description is intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above detailed description is only exemplary of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (6)
1. A binary program fuzz testing method based on a multi-population genetic algorithm, characterized by comprising the following steps:
Step 3: first obtain the basic-block sequence and the corresponding edge sequence; then merge identical edges to obtain the edge set $E_e$ containing occurrence-count information; then divide each edge's occurrence count into 8 classes, each represented by a different bit of a byte, obtaining a new edge set $E'_e$ after classification; the set $E'_e$ is the program's execution-path information;
Step 4.1: define the $i$-th individual in the population as $X_i = (x_{i,1}, x_{i,2}, x_{i,3}, \ldots, x_{i,D})$, the set of all edges found during the whole fuzzing process as $E_t = \{e_{t,1}, e_{t,2}, e_{t,3}, \ldots\}$, and the program's execution-path information, i.e. its edge set, as $E_i = \{e_{i,1}, e_{i,2}, e_{i,3}, \ldots\}$; define the number of elements of a set $A$ as $\mathrm{card}(A)$; compute the number of edges newly found by the test input after execution in the program under test as $f_1(X_i) = \mathrm{card}(E_i - E_t)$;
Step 4.2: for any edge $e_{t,i}$ in the set $E_t = \{e_{t,1}, e_{t,2}, e_{t,3}, \ldots\}$ of all edges found during the whole fuzzing process, let $X_{t,i}$ be the last test input that found this edge, giving a set $W_t = \{(e_{t,1}, X_{t,1}), (e_{t,2}, X_{t,2}), (e_{t,3}, X_{t,3}), \ldots\}$ in which each edge corresponds to one test input; compute the $f_2$ value from $W_t$ with the function $f_2(X_i) = \sum_{e \in E_t} R(W(e), X_i)$, where $W(e)$ is the test input that $W_t$ associates with edge $e$, $R(x, y)$ is a binary function that returns 1 when $x$ and $y$ are the same and 0 otherwise, and $X_i = (x_{i,1}, x_{i,2}, x_{i,3}, \ldots, x_{i,D})$ is the $i$-th individual in the population;
Step 4.3: to compare the fitness of two test inputs, first compare their $f_1$ values; if these are equal, update the sets $E_t$ and $W_t$, then compute and compare the $f_2$ values of the test inputs;
Step 5: the crossover process uses a 2-opt exchange with crossover points generated randomly between 0 and $D$; taking the main population as the reference, different crossover and mutation rates are set for the different sub-populations, one lower and one higher than those of the main population, which prevents the algorithm from converging prematurely.
2. The binary program fuzz testing method based on a multi-population genetic algorithm according to claim 1, characterized in that: in Step 3, the execution path is recorded by recording the starting points of basic blocks; identical edges are merged to obtain a set of edges containing occurrence-count information; the occurrence counts are then divided into $n$ classes, each represented by one or more bytes; finally, a set of edges $E'_e$ representing the program's execution-path information is obtained, in which each edge element carries its occurrence-count class.
3. The binary program fuzz testing method based on a multi-population genetic algorithm according to claim 1, characterized in that: Step 4.2 defines the set $W_t = \{(e_{t,1}, X_{t,1}), (e_{t,2}, X_{t,2}), (e_{t,3}, X_{t,3}), \ldots\}$ to record edges and the test data associated with them, and uses it to compute the $f_2$ value in the fitness calculation, i.e. the number of edges, among all found edges, that are associated with the test input.
4. The binary program fuzz testing method based on a multi-population genetic algorithm according to claim 1, characterized in that: Step 4 computes, via $f_1(X_i) = \mathrm{card}(E_i - E_t)$, the increase in the number of elements of the set $E_t$ of all discovered edges after the current test input is executed, and uses it as the individual's fitness value $f_1$.
5. The binary program fuzz testing method based on a multi-population genetic algorithm according to claim 1, characterized in that: the fitness calculation in Step 4 considers the newly found edges ($f_1$) first and only then all edges associated with the test input, because test data that find a new execution path are more significant.
6. The binary program fuzz testing method based on a multi-population genetic algorithm according to claim 1, characterized in that: in the genetic operations of Step 5, sub-population 1 (class1) has lower crossover and mutation rates than the main population, and sub-population 2 (class2) has higher crossover and mutation rates than the main population, which prevents the algorithm from converging prematurely.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810233482.3A CN108427643B (en) | 2018-03-21 | 2018-03-21 | Binary program fuzzy test method based on multi-population genetic algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108427643A (en) | 2018-08-21 |
CN108427643B (en) | 2020-12-08 |
Family
ID=63158791
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810233482.3A Expired - Fee Related CN108427643B (en) | 2018-03-21 | 2018-03-21 | Binary program fuzzy test method based on multi-population genetic algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108427643B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111338952B (en) * | 2020-02-25 | 2024-03-29 | 杭州世平信息科技有限公司 | Fuzzy test method and device for path coverage rate feedback |
CN112463638B (en) * | 2020-12-11 | 2022-09-20 | 清华大学深圳国际研究生院 | Fuzzy test method based on neural network and computer readable storage medium |
CN113268432B (en) * | 2021-06-24 | 2023-09-01 | 广东电网有限责任公司计量中心 | Electric energy meter driver testing method and system based on evolutionary algorithm |
CN116089317B (en) * | 2023-04-10 | 2023-06-27 | 江西财经大学 | Multipath testing method and system based on path similarity table and individual migration |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20000039405A (en) * | 1998-12-12 | 2000-07-05 | 이계철 | Method for arranging binary tree using genetic algorithm |
CN102385550A (en) * | 2010-08-30 | 2012-03-21 | 北京理工大学 | Detection method for software vulnerability |
CN103914383A (en) * | 2014-04-04 | 2014-07-09 | 福州大学 | Fuzz testing system on basis of multi-swarm collaboration evolution genetic algorithm |
Non-Patent Citations (1)
Title |
---|
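"Evolutionary feature selection algorithm based on relative classification information entropy" (基于相对分类信息熵的进化特征选择算法); 翟俊海 (Zhai Junhai) et al.; Pattern Recognition and Artificial Intelligence (模式识别与人工智能); 2016-08-31; Vol. 28, No. 8; pp. 682-690 * |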
Also Published As
Publication number | Publication date |
---|---|
CN108427643A (en) | 2018-08-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108427643B (en) | Binary program fuzzy test method based on multi-population genetic algorithm | |
Nguyen et al. | Multiple reference points-based decomposition for multiobjective feature selection in classification: Static and dynamic mechanisms | |
Liu et al. | A variable importance-based differential evolution for large-scale multiobjective optimization | |
Zhang et al. | A local boosting algorithm for solving classification problems | |
Marcoulides et al. | Specification searches in structural equation modeling with a genetic algorithm | |
Tambe et al. | Barcode identification for single cell genomics | |
CN110597715A (en) | Test sample optimization method based on fuzzy test | |
Ye et al. | A ternary bitwise calculator based genetic algorithm for improving error correcting output codes | |
Manikandan et al. | Feature selection on high dimensional data using wrapper based subset selection | |
Storato et al. | K2mem: discovering discriminative k-mers from sequencing data for metagenomic reads classification | |
Minku et al. | Clustering and co-evolution to construct neural network ensembles: an experimental study | |
Purshouse et al. | An adaptive divide-and-conquer methodology for evolutionary multi-criterion optimisation | |
Lanzarini et al. | A new binary pso with velocity control | |
Burks et al. | Higher-order Markov models for metagenomic sequence classification | |
Gkalelis et al. | Linear subclass support vector machines | |
Błażej et al. | The quality of genetic code models in terms of their robustness against point mutations | |
US20180239866A1 (en) | Prediction of genetic trait expression using data analytics | |
KR20030032395A (en) | Method for Analyzing Correlation between Multiple SNP and Disease | |
Krachunov et al. | Machine learning models in error and variant detection in high-variation high-throughput sequencing datasets | |
Silva et al. | Improvement in the prediction of the translation initiation site through balancing methods, inclusion of acquired knowledge and addition of features to sequences of mRNA | |
Gao et al. | Exploring cancer biomarker genes from gene expression data via natureinspired multiobjective optimization | |
Yatskou et al. | Identification of single nucleotide genetic polymorphism sites using machine learning methods | |
Banik | Effect of the side effect machines in edit metric decoding | |
CN109905340B (en) | Feature optimization function selection method and device and electronic equipment | |
Watts et al. | Adapting random forests to predict obesity-associated gene expression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20201208 |