CN108427643B - Binary program fuzzy test method based on multi-population genetic algorithm - Google Patents

Publication number
CN108427643B
Authority
CN
China
Prior art keywords
population
edges
test data
program
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810233482.3A
Other languages
Chinese (zh)
Other versions
CN108427643A (en)
Inventor
罗森林
侯留洋
潘丽敏
焦龙龙
张笈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201810233482.3A
Publication of CN108427643A
Application granted
Publication of CN108427643B
Legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a binary program fuzz testing method based on a multi-population genetic algorithm, and belongs to the field of binary vulnerability mining in information security. The method first abstracts each test data individual as a chromosome. A main population and sub-populations 1 and 2 are then initialized, either randomly or from initial data. Fitness is measured by recording the number of newly discovered edges in the execution path of the test data and the number of edges associated with the test data. Next, the individuals of each sub-population are sorted by fitness, and the best of them are migrated to the main population. Finally, the main population and the sub-populations each undergo genetic operations (crossover and mutation) to produce new individuals, which enter a new round of traced execution. The method can effectively improve the coverage of program execution paths, can cover specific program execution paths, provides clear guidance for test data generation, and has good application and popularization value.

Description

Binary program fuzzy test method based on multi-population genetic algorithm
Technical Field
The invention relates to a binary program fuzz testing method, and belongs to the field of binary vulnerability mining in information security.
Background
Fuzz testing is currently the most commonly used vulnerability mining method in the security field and has good overall effectiveness. It supplies randomly constructed or mutated test cases to a target software system and monitors whether execution shows anomalies such as crashes, in order to determine whether the target software has potential vulnerabilities. The higher the code coverage of the test data generated by a fuzz testing system, the more likely it is to find a vulnerability, so code coverage can serve as a criterion for evaluating the quality of test data generation. In fuzz testing, the source code of the program under test is generally not available, so the format of the input data is unknown. The mutation-based approach generates new test data by directly modifying existing test data. However, because the mutation is random, it cannot achieve high code coverage, and its vulnerability mining performance is poor. The present invention therefore provides a binary program fuzz testing method based on a multi-population genetic algorithm to improve the code coverage of the test data generated by mutation-based fuzz testing.
The basic problem to be solved by the method is that test data randomly generated by the mutation-based approach cannot achieve high code coverage. Existing binary program fuzz testing methods for unknown input data formats fall into two types:
1. Symbolic execution methods
Symbolic execution treats the test data as symbolic values. It collects constraint information while the program processes the symbolic values, then solves the constraints to generate new test data that exercises a new execution path. In theory this approach can reach 100% code coverage, but for complex programs symbolic execution suffers from path explosion, which severely limits its range of application.
2. Evolutionary algorithm methods
Methods based on evolutionary algorithms convert the test data into a suitable format so that test data generation can be guided; among them, genetic algorithms are widely used. At present, using a genetic algorithm requires one or more execution paths to be predefined manually, so that test data conforming to the preset execution paths is generated, and other paths cannot be tested.
In summary, existing binary program fuzz testing methods for unknown input data formats are unsuitable for complex programs or test only a few execution paths. The invention therefore provides a binary program fuzz testing method based on a multi-population genetic algorithm.
Disclosure of Invention
The invention aims to provide a binary program fuzz testing method based on a multi-population genetic algorithm, in order to solve the problems that existing binary program fuzz testing methods for unknown input data formats are unsuitable for complex programs and test few execution paths.
The design principle of the invention is as follows:
Test data is converted into individuals of a main population and of sub-populations, and new test data is generated from the changes that individuals undergo during evolution. At the same time, new individuals of the sub-populations influence the evolution of the main population: a migration operator ensures that good information is shared and exchanged between populations, which speeds up convergence, i.e. the growth of program execution path coverage, while preserving the diversity of individuals. The overall flow is shown in Fig. 1.
The technical scheme of the invention is realized by the following steps:
Step 1, population initialization.
Step 1.1, convert the test data into individuals in the population.
Step 1.2, randomly initialize the main population and the sub-populations.
Step 2, basic block location.
Step 2.1, execute the program under Qemu instrumentation and obtain basic block information during program execution.
Step 2.2, insert code before each basic block and output the program execution path information to an external file.
Step 3, monitor whether the program under test crashes and record the program execution path.
Step 3.1, record the basic block sequence after the program has executed; the basic block sequence can be converted into an edge sequence, where an edge is a jump between two consecutive basic blocks.
Step 3.2, merge identical edges in the edge sequence to obtain an edge set as the program execution path information.
Step 3.3, process all individuals $X_i$ according to steps 3.1 and 3.2 to obtain the program execution path information corresponding to all test data, i.e. the edge sets $E_i$.
Step 4, calculate the fitness of the test data and select the best individuals of each population.
Step 4.1, calculate the increase in the number of discovered edges after the test data is executed, as the $f_1$ value of the fitness.
Step 4.2, update the set of all edges discovered so far, and calculate the number of edges associated with the test data, as the $f_2$ value of the fitness.
Step 4.3, rank fitness by comparing $f_1$ values first and $f_2$ values second, and select the best individuals (test data).
Step 5, migrate from the sub-populations to the main population, and apply crossover and mutation within each population.
Step 5.1, migrate a suitable number of the best individuals from the sub-populations to the main population.
Step 5.2, perform crossover within each population.
Step 5.3, perform mutation within each population.
Step 5.4, feed the newly obtained best individuals of the main population and the sub-populations into the program under test for execution, and repeat steps 3 to 5.
Advantageous effects
Compared with symbolic execution analysis and other evolutionary algorithms, the test data generated by the invention in the same time achieves higher code coverage, which shows that the multi-population genetic algorithm provides clear guidance for test data generation in fuzz testing. The efficiency of finding crash vulnerabilities is clearly improved compared with AFL, and the generated test data can cover specific program execution paths.
Drawings
FIG. 1 is a flow chart of the binary program fuzzy test method based on multi-population genetic algorithm of the present invention.
Fig. 2 is a schematic diagram illustrating an exemplary basic block of the present invention.
Detailed Description
In order to better illustrate the objects and advantages of the present invention, embodiments of the method of the present invention are described in further detail below with reference to examples.
The specific process is as follows:
Step 1, population initialization.
Step 1.1, the main population and the sub-populations each consist of a number of individuals, each of which can be abstracted as a chromosome. The $i$-th individual in a population can be expressed as $X_i = (x_{i,1}, x_{i,2}, x_{i,3}, \ldots, x_{i,D})$. Population initialization assigns a value to each gene $x_{i,d}$ of $X_i$, where each $x_{i,d}$ represents one byte and the length $D$ is the number of bytes of the test data.
Step 1.2, the invention initializes the main population and the sub-populations by random assignment. Two sub-populations, class1 and class2, are used; the class2 sub-population is initialized by negating the binary code of each individual of the class1 sub-population.
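The initialization in step 1 can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names, the population size, the chromosome length and the fixed random seed are all assumptions made for the example, and bitwise negation is taken to mean flipping every bit of every byte.

```python
import random


def random_individual(length, rng):
    """One chromosome: `length` genes, each gene one byte (0-255)."""
    return bytes(rng.randrange(256) for _ in range(length))


def negate(individual):
    """Flip every bit of every byte of a chromosome."""
    return bytes(b ^ 0xFF for b in individual)


def init_populations(size, length, seed=0):
    """Return (main, class1, class2); names and sizes are hypothetical."""
    rng = random.Random(seed)
    main = [random_individual(length, rng) for _ in range(size)]
    class1 = [random_individual(length, rng) for _ in range(size)]
    # class2 is derived from class1 by bitwise negation, as in step 1.2.
    class2 = [negate(ind) for ind in class1]
    return main, class1, class2


main_pop, sub1, sub2 = init_populations(size=4, length=8)
```

Deriving class2 from class1 in this way places the two sub-populations at opposite corners of the byte space, which is one plausible reading of why the negation is used.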
Step 2, basic block location.
Step 2.1, Qemu instrumentation makes it possible to obtain basic block information during program execution. Qemu emulates the execution of a program by dividing it into basic blocks, which it translates and executes.
Step 2.2, a piece of code that outputs information about the currently executing basic block is inserted before Qemu executes each basic block. When Qemu emulates the program, this yields the basic block sequence corresponding to the execution, i.e. the execution path information of the program, which is recorded to an external file.
Step 3, traced execution of the program under test.
Step 3.1, each basic block in the program is represented by its entry address $b$, and tracing the program execution yields a basic block sequence $(b_1, b_2, b_3, \ldots, b_n)$. A jump between two consecutive basic blocks in an execution path is defined as $e = (b_k, b_{k+1})$; $e$ is then an edge in the program execution path with basic blocks as nodes (as shown in Fig. 2), and the program execution path can be represented as an edge sequence $E_e = (e_1, e_2, e_3, \ldots, e_{n-1})$.
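As a small illustration of step 3.1, the conversion from a basic block sequence to an edge sequence might look like the sketch below. The entry addresses are made-up values, and the function name is an assumption for the example.

```python
def edge_sequence(blocks):
    """Turn a basic block sequence (b_1..b_n) into the edge sequence (e_1..e_{n-1}).

    Each edge is the pair (b_k, b_{k+1}) of two consecutive entry addresses.
    """
    return list(zip(blocks, blocks[1:]))


# Hypothetical entry addresses recorded during one traced execution.
trace = [0x400000, 0x400010, 0x400020, 0x400010, 0x400020, 0x400030]
edges = edge_sequence(trace)
```

A trace of n blocks gives n-1 edges, and a loop in the program shows up as the same edge appearing more than once in the sequence.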
Step 3.2, identical edges in the sequence $E_e$ are merged to obtain a set of edges that carries occurrence-count information, $E'_e = (e'_1, e'_2, e'_3, \ldots, e'_{n-1})$. The number of occurrences of the same edge may differ between program executions, so the count is divided into 8 classes: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, and 128 or more. These 8 classes can be represented by different bits of one byte, which simplifies implementation. After classification, a new edge set $E''_e = \{e''_1, e''_2, e''_3, \ldots, e''_{n-1}\}$ is obtained.
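The merging and classification of step 3.2 can be sketched as below. The text says only that the 8 classes are represented by different bits of a byte, so the assignment of the lowest bit to the class "1" and the highest bit to "128 or more" is an assumption, as are the function names.

```python
from collections import Counter

# Upper bounds of the first seven classes: 1, 2, 3, 4-7, 8-15, 16-31, 32-127.
_UPPER_BOUNDS = (1, 2, 3, 7, 15, 31, 127)


def count_class_bit(count):
    """Map an occurrence count (>= 1) to a byte with exactly one bit set."""
    if count < 1:
        raise ValueError("count must be at least 1")
    for bit, upper in enumerate(_UPPER_BOUNDS):
        if count <= upper:
            return 1 << bit
    return 1 << 7  # eighth class: 128 or more


def classified_edge_set(edge_seq):
    """Merge identical edges, then tag each edge with its count class."""
    counts = Counter(edge_seq)
    return {(edge, count_class_bit(n)) for edge, n in counts.items()}


example = classified_edge_set([("a", "b"), ("a", "b"), ("b", "c")])
```

With this representation, two executions that take the same edge 4 times and 6 times produce the same set element, while 3 times and 4 times produce different elements.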
Step 3.3, for each individual $X_i$ in the main population and the sub-populations, the corresponding basic block sequence is processed as described above, finally giving the execution path information of the program, i.e. the edge set $E_i = \{e_{i,1}, e_{i,2}, e_{i,3}, \ldots\}$.
Step 4, fitness calculation and sorting, followed by selection of the best individuals of each population.
Step 4.1, the set of all edges discovered during the whole fuzz testing process is defined as $E_t = \{e_{t,1}, e_{t,2}, e_{t,3}, \ldots\}$. The number $f_1$ of edges newly discovered when the test data is executed in the program under test is calculated as $f_1(X_i) = \mathrm{card}(E_i - E_t)$, where $\mathrm{card}(A)$ is the number of elements in the set $A$.
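The $f_1$ value is a plain set difference, which the following sketch makes concrete. The edge labels are placeholder strings chosen for the example rather than real block addresses.

```python
def f1(edges_i, edges_seen):
    """Number of edges in E_i that are not yet in E_t: card(E_i - E_t)."""
    return len(set(edges_i) - set(edges_seen))


# Hypothetical state: two edges already known, one test case covering three.
E_t = {("b1", "b2"), ("b2", "b3")}
E_i = {("b2", "b3"), ("b3", "b4"), ("b4", "b5")}
new_edge_count = f1(E_i, E_t)
```

Here the test case re-covers one known edge and contributes two unseen ones, so only the two unseen edges count towards $f_1$.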
Step 4.2, the sets $E_t$ and $W_t$ of the population are updated. For any edge $e_{t,i}$ in $E_t$, let the last test data to discover this edge be $X_{t,i}$; this gives a set that maps each edge to its test data, $W_t = \{(e_{t,1}, X_{t,1}), (e_{t,2}, X_{t,2}), (e_{t,3}, X_{t,3}), \ldots\}$. The function given in formula image GDA0002755971780000041, which from the surrounding definitions reads
$$f_2(X_i) = \sum_{e \in E_t} R(W(e), X_i),$$
computes the number $f_2$ of edges in the set $W_t$ that are associated with the test data $X_i$, where $W(e)$ returns from $W_t$ the test data corresponding to the edge $e$, and $R(x, y)$ is a binary function that returns 1 when $x$ and $y$ are the same and 0 otherwise.
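A sketch of the bookkeeping behind $f_2$ is given below. It assumes that $W_t$ is stored as a dictionary from edge to test data, and that every edge covered by an execution is re-assigned to the test data that just ran (one reading of "the last test data to discover this edge"). Both choices, and all names, are assumptions for the illustration.

```python
def R(x, y):
    """Binary function: 1 when x and y are the same, 0 otherwise."""
    return 1 if x == y else 0


def record_execution(edges_seen, last_finder, individual, edges_i):
    """Update E_t (edges_seen) and W_t (last_finder) after one execution."""
    edges_seen.update(edges_i)
    for edge in edges_i:
        last_finder[edge] = individual  # assumption: later executions overwrite


def f2(individual, edges_seen, last_finder):
    """Sum of R(W(e), X_i) over all edges e in E_t."""
    return sum(R(last_finder[edge], individual) for edge in edges_seen)


E_t, W_t = set(), {}
x1, x2 = b"AAAA", b"BBBB"  # hypothetical test data
record_execution(E_t, W_t, x1, {("b1", "b2"), ("b2", "b3")})
record_execution(E_t, W_t, x2, {("b2", "b3"), ("b3", "b4")})
```

After the two executions, the edge shared by both test cases belongs to `x2`, so `x1` keeps one associated edge and `x2` has two.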
Step 4.3, fitness values are calculated and sorted for each sub-population independently of the main population. First, $f_1$ is calculated for each individual in the population; then the sets $E_t$ and $W_t$ of the population are updated; finally, $f_2$ is calculated for each individual. When the fitness of two individuals is compared, their $f_1$ values are compared first, and their $f_2$ values are compared only when the $f_1$ values are equal. In this way the best individuals are selected from the main population and the sub-populations.
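The two-level comparison in step 4.3 amounts to a lexicographic sort on the pair $(f_1, f_2)$. A minimal sketch, with made-up fitness values and a hypothetical triple layout `(individual, f1, f2)`:

```python
def rank_by_fitness(scored):
    """Sort (individual, f1, f2) triples best first: f1 decides, f2 breaks ties."""
    return sorted(scored, key=lambda item: (item[1], item[2]), reverse=True)


# Hypothetical individuals with already computed f1 and f2 values.
scored = [(b"a", 0, 5), (b"b", 2, 1), (b"c", 2, 3), (b"d", 1, 9)]
ranked = rank_by_fitness(scored)
best_first = [individual for individual, _, _ in ranked]
```

Individual `d` has the largest $f_2$ but ranks below `b` and `c`, because $f_2$ is consulted only when the $f_1$ values are equal.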
Step 5, migration from the sub-populations to the main population, and crossover and mutation within each population.
Step 5.1, the top 20% of individuals in the sub-populations are added to the best individuals of the main population for crossover and mutation.
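The migration operator of step 5.1 can be sketched as follows. The text does not say how to round when 20% of the sub-population is not a whole number, so rounding down with a minimum of one migrant is an assumption, as are the function and parameter names.

```python
def migrate(main_elite, ranked_sub, percent=20):
    """Return main_elite plus the top `percent` of a sub-population.

    `ranked_sub` must already be sorted best first.
    """
    # Assumption: round down, but migrate at least one individual.
    count = max(1, len(ranked_sub) * percent // 100)
    return list(main_elite) + list(ranked_sub[:count])


main_elite = [b"m1", b"m2"]
ranked_sub = [bytes([i]) for i in range(10)]  # hypothetical, best first
merged = migrate(main_elite, ranked_sub)
```

The sub-population itself is left unchanged, so it keeps evolving on its own while its best individuals also take part in the main population's crossover and mutation.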
Step 5.2, crossover uses 2-opt exchange. For the main population, where the chromosome of an individual has length $D$, four random numbers between 0 and $D$ are first generated as crossover points, and the segments between two crossover points in the chromosomes are then exchanged pairwise. Similarly, sub-population 1 (class1) and sub-population 2 (class2) use one pair and three pairs of crossover points, respectively.
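One possible reading of this crossover is sketched below: the random points are sorted and grouped into consecutive pairs, and the segment between each pair is swapped between two parent chromosomes of equal length. The pairing of points, the exchange between two parents, and the names `crossover` and `CROSSOVER_PAIRS` are assumptions, since the text does not spell these details out.

```python
import random

# Pairs of crossover points per population, as stated in step 5.2.
CROSSOVER_PAIRS = {"main": 2, "class1": 1, "class2": 3}


def crossover(parent_a, parent_b, pairs, rng):
    """Swap the segments between `pairs` pairs of random points in [0, D]."""
    length = len(parent_a)
    if len(parent_b) != length:
        raise ValueError("parents must have the same length")
    points = sorted(rng.randint(0, length) for _ in range(2 * pairs))
    child_a, child_b = bytearray(parent_a), bytearray(parent_b)
    # Sorted points give non-overlapping segments (p0, p1), (p2, p3), ...
    for start, end in zip(points[0::2], points[1::2]):
        child_a[start:end] = parent_b[start:end]
        child_b[start:end] = parent_a[start:end]
    return bytes(child_a), bytes(child_b)


rng = random.Random(1)
a, b = bytes(8), b"\xff" * 8
child_a, child_b = crossover(a, b, CROSSOVER_PAIRS["main"], rng)
```

Because each position is either swapped or left alone, the two children together always contain exactly the genes of the two parents at every position.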
Step 5.3, for mutation in the main population, two random numbers between 0 and $D$ are generated as mutation points, and the genes at the mutation points are replaced with randomly generated values. Similarly, sub-population 1 and sub-population 2 randomly generate one and two mutation points, respectively.
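The mutation step can be sketched in the same style. Mutation points are drawn here as valid byte indices 0 to D-1, and the same position may be drawn twice; both are assumptions. The per-population point counts follow the text, and the dictionary name is hypothetical.

```python
import random

# Mutation points per population, as stated in step 5.3.
MUTATION_POINTS = {"main": 2, "class1": 1, "class2": 2}


def mutate(individual, points, rng):
    """Replace the genes at `points` random positions with random byte values."""
    child = bytearray(individual)
    if not child:
        return bytes(child)  # nothing to mutate in an empty chromosome
    for _ in range(points):
        position = rng.randrange(len(child))
        child[position] = rng.randrange(256)
    return bytes(child)


original = bytes(16)
mutant = mutate(original, MUTATION_POINTS["main"], random.Random(3))
```

At most `points` genes differ from the original; fewer differ when a position repeats or the random value happens to equal the old one.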
Step 5.4, the newly obtained best individuals of the main population and the sub-populations are fed into the program under test for execution, and steps 3 to 5 are repeated.
Test results: the experiment measured the edges newly discovered in three programs under test within a specified time. The results show that, in both groups of tests, the generated test data achieves higher code coverage in the same time, an improvement of more than 27% over AFL. In addition, 9 programs under test containing vulnerabilities were each tested 100 times and the results averaged; across all 9 programs, the efficiency of the invention in finding crash vulnerabilities is 13% higher than that of AFL.
The above detailed description is intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above detailed description is only exemplary of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A binary program fuzz testing method based on a multi-population genetic algorithm, characterized by comprising the following steps:
step 3, first obtaining the basic block sequence and the corresponding edge sequence, then merging identical edges to obtain the edge set $E_e$ containing occurrence-count information, then dividing the number of occurrences of each edge into 8 classes represented by different bits of one byte, obtaining a new edge set $E'_e$ after classification, and finally obtaining the execution path information of the program, i.e. the edge set $E'_e$;
step 4.1, defining the $i$-th individual in the population as $X_i = (x_{i,1}, x_{i,2}, x_{i,3}, \ldots, x_{i,D})$, the set of all edges discovered during the whole fuzz testing process as $E_t = \{e_{t,1}, e_{t,2}, e_{t,3}, \ldots\}$, the execution path information of the program, i.e. the edge set, as $E_i = \{e_{i,1}, e_{i,2}, e_{i,3}, \ldots\}$, and the number of elements in a set $A$ as $\mathrm{card}(A)$, and calculating the number of edges newly discovered after the test data is executed in the program under test by $f_1(X_i) = \mathrm{card}(E_i - E_t)$;
step 4.2, for any edge $e_{t,i}$ in the set $E_t = \{e_{t,1}, e_{t,2}, e_{t,3}, \ldots\}$ of all edges discovered during the whole fuzz testing process, letting the last test data to discover this edge be $X_{t,i}$, thereby obtaining a set that maps each edge to its test data, $W_t = \{(e_{t,1}, X_{t,1}), (e_{t,2}, X_{t,2}), (e_{t,3}, X_{t,3}), \ldots\}$, and using the function given in formula image FDA0002755971770000011, which from the surrounding definitions reads
$$f_2(X_i) = \sum_{e \in E_t} R(W(e), X_i),$$
to compute the number $f_2$ of edges in the set $W_t$ that are associated with the test data $X_i$, where $W(e)$ returns from $W_t$ the test data corresponding to the edge $e$, $R(x, y)$ is a binary function that returns 1 when $x$ and $y$ are the same and 0 otherwise, and $X_i = (x_{i,1}, x_{i,2}, x_{i,3}, \ldots, x_{i,D})$ is the $i$-th individual in the population;
step 4.3, comparing the fitness of two test data by first comparing their $f_1$ values and, if these are equal, updating the sets $E_t$ and $W_t$ and finally calculating and comparing the $f_2$ values of the test data;
step 5, using 2-opt exchange in the crossover process with crossover points randomly generated between 0 and $D$, and setting different crossover rates and mutation rates for the different sub-populations with the main population as the reference, one lower than the main population and the other higher, thereby preventing the algorithm from falling into premature convergence.
2. The binary program fuzz testing method based on a multi-population genetic algorithm of claim 1, characterized in that: in step 3, the execution path is recorded by recording the start of each basic block; identical edges are merged to obtain a set of edges containing edge-count information; the counts are then divided into n classes, each of which is represented by one or more bytes; and finally a set $E'_e$ of edges representing the program execution path information is obtained, in which each element contains the occurrence-count class of its edge.
3. The binary program fuzz testing method based on a multi-population genetic algorithm of claim 1, characterized in that: in step 4.2, the set $W_t = \{(e_{t,1}, X_{t,1}), (e_{t,2}, X_{t,2}), (e_{t,3}, X_{t,3}), \ldots\}$ is defined to record edges and the test data associated with them, and is used to obtain the $f_2$ value in the fitness calculation by the function given in formula image FDA0002755971770000021, which from the definitions in claim 1 reads
$$f_2(X_i) = \sum_{e \in E_t} R(W(e), X_i),$$
i.e. the number of edges, among all discovered edges, that are associated with the test data.
4. The binary program fuzz testing method based on a multi-population genetic algorithm of claim 1, characterized in that: in step 4, $f_1(X_i) = \mathrm{card}(E_i - E_t)$ is used to calculate the increase in the number of elements of the set $E_t$ of all discovered edges after the current test data is executed, and this increase is taken as the value of the fitness index $f_1$ of the individual.
5. The binary program fuzz testing method based on a multi-population genetic algorithm of claim 1, characterized in that: in the fitness calculation of step 4, the newly discovered edges $f_1$ are considered first, and all edges associated with the test data are considered second, because test data that discovers a new execution path is more significant.
6. The binary program fuzz testing method based on a multi-population genetic algorithm of claim 1, characterized in that: in the genetic operations of step 5, the crossover rate and mutation rate of sub-population 1 (class1) are lower than those of the main population, and the crossover rate and mutation rate of sub-population 2 (class2) are higher than those of the main population, thereby preventing the algorithm from falling into premature convergence.
CN201810233482.3A 2018-03-21 2018-03-21 Binary program fuzzy test method based on multi-population genetic algorithm Expired - Fee Related CN108427643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810233482.3A CN108427643B (en) 2018-03-21 2018-03-21 Binary program fuzzy test method based on multi-population genetic algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810233482.3A CN108427643B (en) 2018-03-21 2018-03-21 Binary program fuzzy test method based on multi-population genetic algorithm

Publications (2)

Publication Number Publication Date
CN108427643A CN108427643A (en) 2018-08-21
CN108427643B true CN108427643B (en) 2020-12-08

Family

ID=63158791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810233482.3A Expired - Fee Related CN108427643B (en) 2018-03-21 2018-03-21 Binary program fuzzy test method based on multi-population genetic algorithm

Country Status (1)

Country Link
CN (1) CN108427643B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111338952B (en) * 2020-02-25 2024-03-29 杭州世平信息科技有限公司 Fuzzy test method and device for path coverage rate feedback
CN112463638B (en) * 2020-12-11 2022-09-20 清华大学深圳国际研究生院 Fuzzy test method based on neural network and computer readable storage medium
CN113268432B (en) * 2021-06-24 2023-09-01 广东电网有限责任公司计量中心 Electric energy meter driver testing method and system based on evolutionary algorithm
CN116089317B (en) * 2023-04-10 2023-06-27 江西财经大学 Multipath testing method and system based on path similarity table and individual migration

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000039405A (en) * 1998-12-12 2000-07-05 이계철 Method for arranging binary tree using genetic algorithm
CN102385550A (en) * 2010-08-30 2012-03-21 北京理工大学 Detection method for software vulnerability
CN103914383A (en) * 2014-04-04 2014-07-09 福州大学 Fuzz testing system on basis of multi-swarm collaboration evolution genetic algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于相对分类信息熵的进化特征选择算法";翟俊海等;《模式识别与人工智能》;20160831;第28卷(第8期);第682-690页 *

Also Published As

Publication number Publication date
CN108427643A (en) 2018-08-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20201208