CN116108449B - Software fuzzy test method, device, equipment and storage medium - Google Patents


Info

Publication number
CN116108449B
Authority
CN
China
Prior art keywords
field
software
program
corresponding relation
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310067956.2A
Other languages
Chinese (zh)
Other versions
CN116108449A (en)
Inventor
张超
王准
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202310067956.2A
Publication of CN116108449A
Application granted
Publication of CN116108449B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a software fuzz testing method, device, equipment and storage medium. The method comprises: inputting an initial seed file into the software to be tested, and acquiring the correspondence among the input bytes, the binary program instructions and the program basic blocks generated during testing; integrating the smallest runs of consecutive input bytes into fields using a minimum clustering algorithm to obtain field boundary information; determining the correspondence between program basic blocks and fields; inputting the program basic block information corresponding to each field into a pre-trained neural network model to determine the field type of the field; recording the field types in a file of a preset format to obtain a format template file; and using a fuzz testing tool to fuzz test the software based on the format template file, recording the mutation execution results, and adaptively optimizing the fuzz testing accordingly, thereby improving the efficiency of software testing.

Description

Software fuzzy test method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer and software technologies, and in particular, to a software fuzzy testing method, device, equipment and storage medium.
Background
With the continued spread of information technology, informatization now reaches every aspect of social production and daily life, and people place higher demands on software security. Discovering software security vulnerabilities in advance and repairing them in a targeted way is of great significance to maintaining social order and stable development. Software fuzz testing (fuzzing) is one of the effective methods for discovering software vulnerabilities.
In the prior art, black-box fuzz testing is a common software fuzz testing method. In black-box fuzz testing, an initial seed file is input into a program, and a worker must analyze and test according to the program's output and crash state.
However, the inventors found that the prior art has at least the following technical problem: black-box fuzz testing requires a great deal of manpower and expert knowledge, and its test efficiency is low.
Disclosure of Invention
The application provides a software fuzz testing method, device, equipment and storage medium to address the problem of low test efficiency.
In a first aspect, the present application provides a software fuzz testing method, including:
inputting an initial seed file into software to be tested, and acquiring input bytes and executed binary program instructions used in the test process;
acquiring a first corresponding relation between a binary program instruction and a program basic block;
determining a second corresponding relation between the input byte and the binary program instruction by adopting a dynamic taint analysis method;
determining a third corresponding relation between the input byte and the basic block of the program according to the first corresponding relation and the second corresponding relation;
according to the third corresponding relation, integrating the minimum continuous bytes in the input bytes into fields by utilizing a minimum clustering algorithm to obtain field boundary information;
determining a fourth corresponding relation between the basic block and the field according to the third corresponding relation and the field boundary information;
inputting the basic block information of the program corresponding to the field into a pre-trained neural network model according to the fourth corresponding relation, and determining the field type corresponding to the field;
recording the field types in a file with a preset format according to a format information model to obtain a format template file corresponding to the initial seed file;
and carrying out fuzzy test on the software based on the format template file by adopting a fuzzy test tool, recording a mutation execution result, and carrying out self-adaptive optimization of the software fuzzy test according to the mutation execution result.
In one possible design, according to the fourth correspondence, inputting the program basic block information corresponding to the field into the pre-trained neural network model, determining the field type corresponding to the field includes: obtaining a program basic block corresponding to the field according to the fourth corresponding relation; vectorizing the program basic blocks corresponding to the fields to obtain vectorization information of binary program basic blocks corresponding to the fields; and inputting the vectorization information of the binary program basic blocks corresponding to the fields into a pre-trained neural network model to obtain the field types corresponding to the fields.
In one possible design, using a fuzz testing tool to fuzz test the software based on the format template file, recording the mutation execution result, and adaptively optimizing the software fuzz testing according to the mutation execution result includes: using the fuzz testing tool to input the initial seed file and its corresponding format template file into the software to be tested, and mutating the initial seed file based on the format information to generate new test cases; judging whether the code coverage increment of a test case in the fuzz test is greater than or equal to a first preset value, and if so, re-extracting the format template file of that test case; and recording the number of mutation executions of the format template file corresponding to each test case during fuzz testing, calculating the average of those numbers, and, when the code coverage growth rate of the format template files corresponding to the test cases is less than a second preset value, mutating and executing again the test cases whose format templates have a number of mutation executions below the average.
In one possible design, the determining the second correspondence between the input bytes and the binary program instructions using dynamic taint analysis includes: and adopting a dynamic binary instrumentation tool to perform dynamic taint analysis processing on the input bytes and the binary program instructions, and obtaining the corresponding relation between the input byte offset and the registers or the memories related to the binary program instructions as a second corresponding relation between the input bytes and the binary program instructions.
In one possible design, before inputting the program basic block information corresponding to the field into the pre-trained neural network model and determining the field type corresponding to the field, the method further includes: acquiring field boundary information of a sample file and the field types of its sample fields; obtaining the program basic block information corresponding to each sample field according to the field boundary information, and vectorizing that program basic block information; and training a neural network model based on the field types of the sample fields and the vectorized program basic block information to obtain the pre-trained neural network model.
In a second aspect, the present application provides a software fuzz testing apparatus, including:
the first acquisition module is used for inputting the initial seed file into the software to be tested and acquiring input bytes and executed binary program instructions used in the test process;
the second acquisition module is used for acquiring a first corresponding relation between the binary program instruction and the program basic block;
the first determining module is used for determining a second corresponding relation between the input bytes and the binary program instruction by adopting a dynamic taint analysis method;
the second determining module is used for determining a third corresponding relation between the input byte and the basic program block according to the first corresponding relation and the second corresponding relation;
the clustering module is used for integrating the minimum continuous bytes in the input bytes into fields by utilizing a minimum clustering algorithm according to the third corresponding relation to obtain field boundary information;
the third determining module is used for determining a fourth corresponding relation between the basic block and the field according to the third corresponding relation and the field boundary information;
the fourth determining module is used for inputting the basic block information of the program corresponding to the field into the pre-trained neural network model according to the fourth corresponding relation, and determining the field type corresponding to the field;
the recording module is used for recording the field types in a file with a preset format according to the format information model to obtain a format template file corresponding to the initial seed file;
and the testing module is used for fuzz testing the software based on the format template file using a fuzz testing tool, recording the mutation execution result, and adaptively optimizing the software fuzz testing according to the mutation execution result.
In one possible design, the fourth determining module is configured to obtain a program basic block corresponding to the field according to the fourth corresponding relationship; vectorizing the program basic blocks corresponding to the fields to obtain vectorization information of binary program basic blocks corresponding to the fields; and inputting the vectorization information of the binary program basic blocks corresponding to the fields into a pre-trained neural network model to obtain the field types corresponding to the fields.
In a third aspect, the present application provides a computer device comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the software fuzz testing method described above in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, the present application provides a computer storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the software fuzz testing method of the first aspect and the various possible designs of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, including a computer program, which when executed by a processor implements the software fuzzing method of the first aspect and the various possible designs of the first aspect.
According to the software fuzz testing method, device, equipment and storage medium of the present application, the correspondence between the input bytes and the binary program instructions generated during testing is obtained by dynamic taint analysis. The input bytes are combined into fields by the minimum clustering algorithm, yielding the field boundary information. The program basic block information corresponding to each field is input into the neural network to obtain the field type, and the field types are recorded in a format template file of the preset format. A fuzz testing tool then fuzz tests the software based on the format template file and performs adaptive optimization according to the mutation execution results, which improves the efficiency of software testing.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, a brief description will be given below of the drawings that are needed in the embodiments or the prior art descriptions, it being obvious that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
Fig. 1 is a schematic diagram of an application scenario of a software fuzz testing method provided in an embodiment of the present application;
Fig. 2 is a flowchart of a software fuzz testing method according to one embodiment of the present application;
Fig. 3 is a flowchart of a software fuzz testing method according to another embodiment of the present application;
Fig. 4 is a flowchart of a software fuzz testing method according to another embodiment of the present application;
Fig. 5 is a schematic structural diagram of a software fuzz testing device according to an embodiment of the present application;
Fig. 6 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
To address the low test efficiency of the prior art, the embodiments of the present application provide the following technical solution: an initial seed file is input into the software to be tested; the input bytes used during testing are combined into fields to obtain field boundaries; the field type of each field is analyzed and a format template file is generated; the software is fuzz tested based on the different format templates; and the number of mutation executions of each format template file is recorded. The embodiments are explained in detail below.
Fig. 1 is a schematic diagram of an application scenario of a software fuzz testing method provided in an embodiment of the present application. As shown in Fig. 1, the computer device 101 inputs an initial seed file into the software to be tested, performs the software test, and sends the test result to the display terminal 102 for display.
The following describes the technical solution of the present application and how the technical solution of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a schematic flowchart of a software fuzz testing method provided in an embodiment of the present application. The execution body of this embodiment may be the computer device in the embodiment shown in Fig. 1 or any other computer processing device; this embodiment imposes no limitation here. As shown in Fig. 2, the method includes:
S201: the initial seed file is input into the software to be tested, and the input bytes used and the binary program instructions executed during testing are obtained.
Where the initial seed file broadly refers to various types of inputs including, but not limited to, files in a file system, command line inputs, network message inputs, and the like.
Specifically, the initial seed file is input into the software program to be tested, the software program to be tested starts to run, input bytes used in the initial seed file are obtained, and binary program instructions executed in the software running process are obtained.
S202: a first correspondence between binary program instructions and program basic blocks is obtained.
Specifically, the open-source static analysis tool angr is used to analyze the program's control flow graph and obtain the first correspondence between binary program instructions and program basic blocks.
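By way of illustration only, the first correspondence of S202 can be pictured as a lookup from an instruction address to the basic block that contains it. The sketch below assumes the control flow graph has already been reduced to a list of hypothetical (start address, size) pairs; it does not call angr, and the function name and sample addresses are invented for the example.

```python
from bisect import bisect_right

def build_instruction_to_block(blocks):
    """Build a lookup for the first correspondence.

    blocks: list of (start_addr, size) pairs, one per basic block
            (hypothetical output of a control flow graph analysis).
    Returns a function mapping an instruction address to the start
    address of its basic block, or None if no block contains it.
    """
    ordered = sorted(blocks)
    starts = [start for start, _ in ordered]

    def lookup(addr):
        # Find the last block whose start address is <= addr.
        i = bisect_right(starts, addr) - 1
        if i < 0:
            return None
        start, size = ordered[i]
        return start if addr < start + size else None

    return lookup

# Hypothetical blocks; note the gap between 0x1018 and 0x1020.
lookup = build_instruction_to_block([(0x1000, 0x10), (0x1010, 0x08), (0x1020, 0x20)])
print(lookup(0x1004), lookup(0x1018))
```

Sorting once and using binary search keeps each lookup logarithmic in the number of blocks, which matters when every executed instruction in a trace has to be resolved.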
S203: a second correspondence between the input bytes and the binary program instructions is determined using dynamic taint analysis.
Specifically, a dynamic binary instrumentation tool is adopted, and based on a dynamic taint analysis method, input bytes and binary program instructions are processed to obtain a corresponding relation between input byte offset and a register or a memory related to the binary program instructions, and the corresponding relation is used as a second corresponding relation between the input bytes and the binary program instructions.
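The propagation idea behind the second correspondence can be sketched on a toy instruction trace. This is an assumption-laden simplification rather than the instrumentation described above: the trace format, the ("input", k) source marker, and the function name are hypothetical, and a real dynamic binary instrumentation tool would track registers and memory at much finer granularity.

```python
def taint_trace(trace):
    """Propagate input-byte taint through a toy instruction trace.

    trace: list of (instr_addr, dst, srcs). Each source is either a
           location name (register or memory cell) or the tuple
           ("input", k), meaning input byte at offset k.
    Returns {instr_addr: set of input offsets the instruction uses}.
    """
    taint = {}    # location name -> set of input byte offsets
    touched = {}  # instruction address -> set of input byte offsets
    for addr, dst, srcs in trace:
        labels = set()
        for src in srcs:
            if isinstance(src, tuple) and src[0] == "input":
                labels.add(src[1])
            else:
                labels |= taint.get(src, set())
        # Writing to dst replaces whatever taint it held before.
        taint[dst] = labels
        if labels:
            touched.setdefault(addr, set()).update(labels)
    return touched

# Hypothetical trace: two loads from the input, then a combining operation.
trace = [
    (0x1000, "eax", [("input", 0)]),
    (0x1003, "ebx", [("input", 1)]),
    (0x1006, "eax", ["eax", "ebx"]),
    (0x1009, "ecx", []),
]
print(taint_trace(trace))
```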
In this embodiment, the dynamic binary instrumentation tool may be Intel Pin, whose instrumentation tools are known as "Pintools".
S204: and determining a third corresponding relation between the input byte and the basic block of the program according to the first corresponding relation and the second corresponding relation.
Specifically, a third correspondence between the input byte and the program basic block is determined based on the correspondence between the input byte and the binary program instruction and the correspondence between the binary program instruction and the program basic block.
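The composition in S204 amounts to joining two mappings. A minimal sketch, assuming the second correspondence is available as a dictionary from byte offset to instruction addresses and the first as a dictionary from instruction address to block identifier (both data shapes and all names are hypothetical):

```python
from collections import defaultdict

def bytes_to_blocks(byte_to_instrs, instr_to_block):
    """Derive the third correspondence (input byte -> basic blocks).

    byte_to_instrs: {byte_offset: set of instruction addresses}  (second correspondence)
    instr_to_block: {instruction address: block identifier}      (first correspondence)
    """
    third = defaultdict(set)
    for offset, instrs in byte_to_instrs.items():
        for addr in instrs:
            block = instr_to_block.get(addr)
            if block is not None:  # skip instructions outside any known block
                third[offset].add(block)
    return dict(third)

# Hypothetical data: byte 0 is used in blocks B0 and B1, byte 1 only in B0.
second = {0: {0x1000, 0x1006}, 1: {0x1003}}
first = {0x1000: "B0", 0x1003: "B0", 0x1006: "B1"}
print(bytes_to_blocks(second, first))
```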
S205: and integrating the minimum continuous bytes in the input bytes into fields by utilizing a minimum clustering algorithm according to the third corresponding relation to obtain field boundary information.
Specifically, using the minimum clustering algorithm, the input bytes used within the same basic block are grouped, and each run of consecutive offsets is taken as a field. If fields derived from different basic blocks overlap, the smallest unit among them is taken as the field. This yields the field boundary information.
The field boundary information consists of the field's offset within the input bytes and the field's length.
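One possible reading of the clustering in S205 is sketched below; it is an interpretation, not necessarily the exact algorithm of the application. Consecutive offsets are merged only while they are used by the same set of basic blocks, so that where candidate fields from different blocks overlap, the input is split into the smallest shared units. The function name and sample data are hypothetical.

```python
def cluster_fields(byte_to_blocks):
    """Group input bytes into fields as (offset, length) pairs.

    byte_to_blocks: {byte_offset: set of block identifiers} (third correspondence).
    Two adjacent bytes join the same field only if their offsets are
    consecutive and they are used by exactly the same basic blocks.
    """
    fields = []
    prev_off, prev_sig = None, None
    for off in sorted(byte_to_blocks):
        sig = frozenset(byte_to_blocks[off])
        if prev_off is not None and off == prev_off + 1 and sig == prev_sig:
            start, length = fields[-1]
            fields[-1] = (start, length + 1)  # extend the current field
        else:
            fields.append((off, 1))           # start a new field
        prev_off, prev_sig = off, sig
    return fields

# Block A uses bytes 0-3, block B uses bytes 0-1, block C uses byte 8.
third = {0: {"A", "B"}, 1: {"A", "B"}, 2: {"A"}, 3: {"A"}, 8: {"C"}}
print(cluster_fields(third))
```

In this example the overlap between the candidates 0-3 (from A) and 0-1 (from B) is resolved into two fields, bytes 0-1 and bytes 2-3, each reported as an offset and a length.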
S206: and determining a fourth corresponding relation between the basic block and the field according to the third corresponding relation and the field boundary information.
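As a rough sketch of S206, assuming fields are (offset, length) pairs and the third correspondence is a dictionary from byte offset to block identifiers (hypothetical data shapes), the fourth correspondence can be formed by taking, for each field, the union of the blocks that use any of its bytes:

```python
def fields_to_blocks(fields, byte_to_blocks):
    """Derive the fourth correspondence (field -> basic blocks).

    fields: list of (offset, length) pairs (field boundary information).
    byte_to_blocks: {byte_offset: set of block identifiers}.
    """
    fourth = {}
    for start, length in fields:
        blocks = set()
        for off in range(start, start + length):
            blocks |= byte_to_blocks.get(off, set())
        fourth[(start, length)] = blocks
    return fourth

# Hypothetical input: two fields covering bytes 0-1 and 2-3.
mapping = fields_to_blocks(
    [(0, 2), (2, 2)],
    {0: {"A", "B"}, 1: {"A", "B"}, 2: {"A"}, 3: {"A"}},
)
print(mapping)
```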
S207: and according to the fourth corresponding relation, inputting the program basic block information corresponding to the field into the pre-trained neural network model, and determining the field type corresponding to the field.
The field type is one or more of the following: length, enumeration, magic number, character string, check code, and offset. A length field gives a number of data bytes or the length of an array; an enumeration field can take only certain specific values; a magic number is a hard-coded special byte sequence, commonly found in file headers and the like; a character string is an encoded character sequence, such as ASCII or Unicode; a check code is a special field used to verify the integrity of other bytes in the input; and an offset is a field that indicates the position of another specific part of the input.
Specifically, according to the fourth corresponding relation, obtaining a program basic block corresponding to the field; vectorizing the program basic blocks corresponding to the fields to obtain vectorization information of binary program basic blocks corresponding to the fields; and inputting the vectorization information of the binary program basic blocks corresponding to the fields into a pre-trained neural network model to obtain the field types corresponding to the fields.
In this embodiment, the binary program instructions of the program basic blocks corresponding to the field may be obtained according to the second correspondence. Using the open-source projects VEX and Keras, those instructions are vectorized by one-hot encoding to obtain the vectorization information of the binary program instructions corresponding to the field. The per-instruction vectorization information is then integrated to obtain the vectorization information of the binary program basic blocks corresponding to the field, which is input into the pre-trained neural network model to obtain the field type corresponding to the field.
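One-hot encoding of instructions can be illustrated without VEX or Keras. The sketch below is a simplification under stated assumptions: the opcode vocabulary is hypothetical, instructions are reduced to bare mnemonics, and "integration" is taken to mean stacking the per-instruction vectors into one matrix per basic block.

```python
# Hypothetical opcode vocabulary; a real one would be derived from the lifted instructions.
VOCAB = ["mov", "cmp", "add", "jmp", "call"]

def one_hot(op):
    """Return a one-hot vector for an opcode; the last slot marks unknown opcodes."""
    vec = [0] * (len(VOCAB) + 1)
    idx = VOCAB.index(op) if op in VOCAB else len(VOCAB)
    vec[idx] = 1
    return vec

def vectorize_block(ops):
    """Stack one-hot instruction vectors into a matrix for one basic block."""
    return [one_hot(op) for op in ops]

matrix = vectorize_block(["mov", "cmp", "xor"])
for row in matrix:
    print(row)
```

Each row contains exactly one 1, so the matrix preserves instruction order, which a convolutional model can exploit.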
S208: and recording the field types in a file with a preset format according to the format information model to obtain a format template file corresponding to the initial seed file.
In this embodiment, the preset file format may be a "pit file", which records the offset, length, and field type of each input field.
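To make the idea of a format template concrete, the sketch below serializes field offsets, lengths, and types to XML using Python's standard library. The element and attribute names (Template, Field, offset, length, type) are invented for illustration and are not the actual schema of a pit file.

```python
import xml.etree.ElementTree as ET

def build_template(fields):
    """Serialize field records into a hypothetical XML format template.

    fields: list of dicts with keys "offset", "length" and "type".
    """
    root = ET.Element("Template")
    for f in fields:
        ET.SubElement(
            root,
            "Field",
            offset=str(f["offset"]),
            length=str(f["length"]),
            type=f["type"],
        )
    return ET.tostring(root, encoding="unicode")

# Hypothetical fields recovered from a seed file.
template_xml = build_template([
    {"offset": 0, "length": 4, "type": "magic number"},
    {"offset": 4, "length": 2, "type": "length"},
])
print(template_xml)
```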
S209: and carrying out fuzzy test on the software based on the format template file by adopting a fuzzy test tool, recording a mutation execution result, and carrying out self-adaptive optimization of the software fuzzy test according to the mutation execution result.
Specifically, a fuzz testing tool fuzz tests the software based on the format template file: the input bytes of the initial seed file are mutated to obtain new test cases, format template analysis is performed on the test cases that meet preset conditions, and the mutation test results are recorded and used for adaptive optimization.
In summary, in the software fuzz testing method provided by this embodiment, the correspondence between the input bytes and the binary program instructions generated during testing is obtained by dynamic taint analysis. The input bytes are combined into fields by the minimum clustering algorithm, yielding the field boundary information. The program basic block information corresponding to each field is input into the neural network to obtain the field type, and the field types are recorded in a format template file of the preset format. A fuzz testing tool then fuzz tests the software based on the format template file and performs adaptive optimization according to the mutation execution results, which improves the efficiency of software testing.
Fig. 3 is a flowchart of a software fuzz testing method according to another embodiment of the present application. This embodiment describes S209 in detail on the basis of the embodiment provided in Fig. 2. As shown in Fig. 3, the method includes:
S301: using a fuzz testing tool, the initial seed file and its corresponding format template file are input into the software to be tested, and the initial seed file is mutated based on the format information to generate new test cases.
The format information is information recorded in the format template, and includes field boundary information, namely the offset and the length of a field in an input byte, and a corresponding field type.
Specifically, the input bytes in the initial seed file are mutated based on the format information to generate new test cases. Different targeted mutation strategies are adopted for different field types, which significantly raises the probability of generating valid input and improves software testing efficiency.
In this embodiment, the fuzz testing tool may be AFL. The targeted mutation strategy may, for example, substitute specific integer values for a length-type field or specific enumeration values for an enumeration-type field.
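A type-aware mutator along these lines might look like the following sketch. It is not AFL's mutation engine: the list of boundary values, the little-endian encoding, the fallback bit flip, and all names are assumptions made for the example, and each enumeration value is assumed to be a bytes object of the same length as the field.

```python
import random

# Hypothetical boundary values that tend to stress length handling.
INTERESTING_LENGTHS = [0, 1, 0x7F, 0x80, 0xFF, 0x7FFF, 0xFFFF, 0x7FFFFFFF, 0xFFFFFFFF]

def mutate_field(data, field, rng, enum_values=()):
    """Mutate one field of a seed according to its type.

    data: the seed as bytes.
    field: dict with "offset", "length" and "type".
    rng: a random.Random instance, so that runs are reproducible.
    enum_values: candidate byte strings for enumeration fields,
                 each assumed to be exactly field["length"] bytes long.
    """
    start, length, ftype = field["offset"], field["length"], field["type"]
    out = bytearray(data)
    if ftype == "length":
        # Replace the field with a boundary value that fits (little-endian assumed).
        max_value = (1 << (8 * length)) - 1
        candidates = [v for v in INTERESTING_LENGTHS if v <= max_value]
        out[start:start + length] = rng.choice(candidates).to_bytes(length, "little")
    elif ftype == "enumeration" and enum_values:
        out[start:start + length] = rng.choice(list(enum_values))
    else:
        # Fallback for other types: flip a single random bit inside the field.
        i = start + rng.randrange(length)
        out[i] ^= 1 << rng.randrange(8)
    return bytes(out)

seed = b"\x04\x00ABCD"
rng = random.Random(0)
print(mutate_field(seed, {"offset": 0, "length": 2, "type": "length"}, rng))
```

Restricting each mutation to the bytes of one field leaves the rest of the input intact, which is what lets structurally valid test cases survive early parsing checks.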
S302: judging whether the code coverage rate increment corresponding to the test case in the fuzzy test is larger than or equal to a first preset value, and if the code coverage rate increment corresponding to the test case is larger than or equal to the first preset value, re-extracting the format template file of the test case.
Specifically, the coverage facility built into the fuzz testing tool is used to obtain the code coverage corresponding to the test case. A function in the fuzz testing tool, modified in advance, judges whether the code coverage increment is greater than or equal to the first preset value; if it is, the format template file of the test case is extracted again.
Illustratively, the code coverage may be a percentage not exceeding 100%, such as 80% or 90%.
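The check in S302 can be sketched as a comparison of coverage sets. The representation of coverage as sets of edge identifiers, the normalization by a total edge count, and the function name are assumptions for illustration; the application does not specify how the increment is measured.

```python
def should_reextract(covered_before, covered_after, total_edges, first_threshold):
    """Decide whether a test case's coverage gain justifies re-extracting its template.

    covered_before: set of edge identifiers covered before running the test case.
    covered_after: set of edge identifiers covered by the test case.
    total_edges: total number of edges in the program (assumed known and positive).
    first_threshold: the first preset value, as a fraction between 0 and 1.
    """
    newly_covered = len(covered_after | covered_before) - len(covered_before)
    gain = newly_covered / total_edges
    return gain >= first_threshold

# Hypothetical run: two new edges out of ten in total.
print(should_reextract({1, 2}, {2, 3, 4}, 10, 0.2))
```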
S303: the number of mutation executions of the format template file corresponding to each test case is recorded during fuzz testing, and the average of those numbers is calculated; when the code coverage growth rate of the format template files corresponding to the test cases is less than a second preset value, the test cases whose format templates have a number of mutation executions below the average are mutated and executed again.
Specifically, the test cases are tested with the testing tool, and a function in the fuzz testing tool, modified in advance, judges whether the code coverage growth rate of the format template files corresponding to the test cases is less than the second preset value. If it is, the input bytes of the test cases whose format templates have a number of mutation executions below the average are mutated again to obtain new test cases, and the software is fuzz tested with the new test cases.
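The re-allocation rule of S303 can be sketched as follows. For simplicity the sketch assumes a single overall coverage growth rate rather than one per template; that simplification, the data shapes, and the names are hypothetical.

```python
def select_for_remutation(mutation_counts, growth_rate, second_threshold):
    """Pick the templates that should receive additional mutation rounds.

    mutation_counts: {template name: number of mutation executions so far}.
    growth_rate: recent code coverage growth rate (assumed to be one global value).
    second_threshold: the second preset value.
    Returns the names of templates executed fewer times than the average,
    or an empty list while coverage is still growing fast enough.
    """
    if not mutation_counts or growth_rate >= second_threshold:
        return []
    mean = sum(mutation_counts.values()) / len(mutation_counts)
    return sorted(name for name, n in mutation_counts.items() if n < mean)

# Hypothetical counts: the average is 50, so t1 and t2 are under-explored.
counts = {"t1": 10, "t2": 40, "t3": 100}
print(select_for_remutation(counts, growth_rate=0.001, second_threshold=0.01))
```

Triggering the rule only when growth stalls avoids diverting effort from templates that are still producing new coverage.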
In summary, in the software fuzz testing method provided by this embodiment, the software to be tested is tested based on the initial seed file using the modified fuzz testing tool, and the testing energy is redistributed according to the increment and the growth rate of the code coverage corresponding to the test cases, which further improves the efficiency of software testing.
Fig. 4 is a flowchart of a software fuzz testing method according to another embodiment of the present application. On the basis of the embodiment provided in Fig. 2, this embodiment describes in detail the model training performed before S206. As shown in Fig. 4, the method includes:
S401: the field boundary information of the sample file and the field types of the sample fields are acquired.
The sample file includes, but is not limited to, a file in a file system of known format information, command line input, network message input, etc.
Specifically, the field boundary information and field types of the sample file are extracted automatically using the AutoIt scripting language and the 010 Editor software, and are sent to a display terminal for inspection and correction by a tester.
S402: and obtaining program basic block information corresponding to the sample field according to the field boundary information, and vectorizing the program basic block information of the sample.
Specifically, a first corresponding relation between a binary program instruction and a program basic block is obtained, a second corresponding relation between an input byte contained in a sample field and the binary program instruction is determined by adopting a dynamic taint analysis method, a third corresponding relation between the input byte and the program basic block is determined according to the first corresponding relation and the second corresponding relation, further program basic block information corresponding to the sample field is obtained, vectorization is carried out on the program basic block information of the sample, and binary program basic block vectorization information is obtained.
S403: Train a neural network model based on the field types of the sample fields and the vectorized program basic block information to obtain the pre-trained neural network model.
Specifically, the field types of the sample fields and the vectorized program basic block information are input into the neural network model as training data to obtain the pre-trained neural network model.
The neural network model may be, for example, a convolutional neural network model.
In summary, according to the software fuzzy test method provided by the embodiment, the neural network model is trained according to the field boundary information and the field type of the sample field, so that the accuracy of the neural network model is improved, and the efficiency of the software fuzzy test is further improved.
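The application does not specify how program basic block information is vectorized. As one possible sketch, assuming that each basic block is available as a list of opcode mnemonics, a normalized opcode histogram can serve as the vector, and each vector can be paired with its known field type to form training data. The vocabulary `VOCAB`, the function names, and the sample data below are all hypothetical.

```python
from collections import Counter

# Hypothetical opcode vocabulary; a real one would be derived from the binaries.
VOCAB = ["mov", "cmp", "add", "jmp", "call"]


def vectorize_blocks(blocks, vocab=VOCAB):
    """Return a normalized opcode histogram over all basic blocks of one field."""
    counts = Counter(op for block in blocks for op in block if op in vocab)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[op] / total for op in vocab]


def build_training_set(samples):
    """Turn (field_type, blocks) pairs into (vector, label) training pairs."""
    return [(vectorize_blocks(blocks), field_type) for field_type, blocks in samples]


# Hypothetical labelled sample fields and the basic blocks that process them.
samples = [
    ("magic", [["cmp", "jmp"]]),
    ("length", [["mov", "add"], ["cmp", "jmp"]]),
]
training_set = build_training_set(samples)
```

The resulting pairs could then be fed to whichever neural network model is chosen, for example a convolutional neural network as mentioned above; the training loop itself is omitted here.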
Fig. 5 is a schematic structural diagram of a software fuzzy test apparatus according to an embodiment of the present application. As shown in fig. 5, the software fuzzy test apparatus includes: a first acquisition module 501, a second acquisition module 502, a first determination module 503, a second determination module 504, a clustering module 505, a third determination module 506, a fourth determination module 507, a recording module 508, and a test module 509.
a first acquisition module 501, configured to input an initial seed file into the software to be tested, and acquire the input bytes used and the binary program instructions executed during the test;
a second acquisition module 502, configured to obtain a first correspondence between a binary program instruction and a program basic block;
a first determining module 503, configured to determine a second correspondence between the input byte and the binary program instruction by using a dynamic taint analysis method;
a second determining module 504, configured to determine a third correspondence between the input byte and the basic block of the program according to the first correspondence and the second correspondence;
the clustering module 505 is configured to integrate, according to the third correspondence, a minimum continuous byte in the input bytes into a field by using a minimum clustering algorithm, so as to obtain field boundary information;
a third determining module 506, configured to determine a fourth corresponding relationship between the program basic block and the field according to the third corresponding relationship and the field boundary information;
a fourth determining module 507, configured to input, according to a fourth correspondence, program basic block information corresponding to the field to the pre-trained neural network model, and determine a field type corresponding to the field;
the recording module 508 is configured to record the field type in a file with a preset format according to the format information model, so as to obtain a format template file corresponding to the initial seed file;
the test module 509 is configured to perform a fuzzy test on the software based on the format template file by using a fuzzy test tool, record a mutation execution result, and perform adaptive optimization of the fuzzy test on the software according to the mutation execution result.
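The clustering and recording modules can be illustrated with a short sketch. It assumes one simple reading of the clustering step, namely that adjacent input bytes processed by the same set of program basic blocks belong to the same field, and it assumes JSON as the preset template format. Both choices, and the names `cluster_fields` and `to_template`, are hypothetical and are not taken from the application.

```python
import json


def cluster_fields(byte_to_blocks, length):
    """Merge adjacent offsets whose basic-block sets are identical into fields.

    Returns a list of fields with half-open byte ranges [start, end) and the
    blocks associated with each field (a form of the fourth correspondence).
    Bytes that no block touched share the empty set and are grouped together.
    """
    fields = []
    start = 0
    for offset in range(1, length + 1):
        prev = set(byte_to_blocks.get(offset - 1, ()))
        cur = set(byte_to_blocks.get(offset, ())) if offset < length else None
        if cur != prev:  # boundary reached (or end of input)
            fields.append({"start": start, "end": offset, "blocks": sorted(prev)})
            start = offset
    return fields


def to_template(fields, field_types):
    """Record field boundaries and predicted types as a JSON template string."""
    return json.dumps(
        [{"start": f["start"], "end": f["end"], "type": t}
         for f, t in zip(fields, field_types)]
    )


# Hypothetical third correspondence for a 4-byte input.
third = {0: {"bb0"}, 1: {"bb0"}, 2: {"bb1"}, 3: {"bb0", "bb1"}}
fields = cluster_fields(third, length=4)
# The type labels would come from the pre-trained model; these are placeholders.
template = to_template(fields, ["magic", "length", "payload"])
```

With this toy input, bytes 0 and 1 form one field, while bytes 2 and 3 each form a field of their own.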
In a possible implementation manner, the fourth determining module 507 is specifically configured to obtain a program basic block corresponding to the field according to the fourth corresponding relationship; vectorizing the program basic blocks corresponding to the fields to obtain vectorization information of binary program basic blocks corresponding to the fields; and inputting the vectorization information of the binary program basic blocks corresponding to the fields into a pre-trained neural network model to obtain the field types corresponding to the fields.
In one possible implementation manner, the test module 509 is specifically configured to: input the initial seed file and the format template file corresponding to the initial seed file into the software to be tested by using a fuzzy test tool, and mutate the initial seed file based on the format information to generate new test cases; judge whether the code coverage rate increment corresponding to a test case in the fuzzy test is greater than or equal to a first preset value and, if so, re-extract the format template file of that test case; and record the number of mutation executions of the format template file corresponding to each test case during the fuzzy test, calculate the average number of mutation executions, and, when the code coverage rate increase speed of the format template file corresponding to a test case is less than a second preset value, perform mutation execution again on the test cases whose format templates have fewer mutation executions than the average.
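The two scheduling rules of the test module can be sketched as follows. This is a hedged illustration only: it assumes that the coverage increase speed is tracked per format template, and the `Template` record, its field names, and the threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str
    mutation_count: int = 0             # mutation executions recorded so far
    coverage_growth_rate: float = 0.0   # hypothetical unit: new edges per minute


def needs_reextraction(coverage_increment, first_threshold):
    """Rule 1: re-extract the template when the coverage gain is large enough."""
    return coverage_increment >= first_threshold


def select_for_remutation(templates, second_threshold):
    """Rule 2: pick slow-growing templates that were mutated less than average."""
    if not templates:
        return []
    mean = sum(t.mutation_count for t in templates) / len(templates)
    return [
        t.name
        for t in templates
        if t.coverage_growth_rate < second_threshold and t.mutation_count < mean
    ]


templates = [
    Template("a", mutation_count=10, coverage_growth_rate=0.1),
    Template("b", mutation_count=2, coverage_growth_rate=0.1),
    Template("c", mutation_count=3, coverage_growth_rate=5.0),
]
selected = select_for_remutation(templates, second_threshold=1.0)
```

Here the average mutation count is 5, so only template `b` is selected: `a` has already been mutated more than average, and `c` is still gaining coverage quickly.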
In one possible implementation manner, the first determining module 503 is specifically configured to perform dynamic taint analysis processing on the input byte and the binary program instruction by using a dynamic binary instrumentation tool, so as to obtain a corresponding relationship between the input byte offset and a register or a memory related to the binary program instruction, as a second corresponding relationship between the input byte and the binary program instruction.
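The idea of tracking which input byte offsets reach which registers or memory locations can be shown with a toy taint propagation over a recorded instruction trace. This is a pure-Python sketch, not the API of any dynamic binary instrumentation tool; the trace format `(address, destination, sources)` and the location names are assumptions made for illustration.

```python
def propagate_taint(trace, input_locations):
    """Propagate input-byte taint through a simplified instruction trace.

    trace: list of (addr, dst, srcs), where dst is a register or memory name
           (or None for instructions such as cmp) and srcs lists the operands read.
    input_locations: location name -> set of input byte offsets stored there.
    Returns (byte offset -> instruction addresses, final taint state).
    """
    taint = {loc: set(offs) for loc, offs in input_locations.items()}
    byte_to_insns = {}
    for addr, dst, srcs in trace:
        used = set()
        for src in srcs:
            used |= taint.get(src, set())
        for off in used:
            byte_to_insns.setdefault(off, set()).add(addr)
        if dst is not None:
            taint[dst] = used  # writing dst replaces its previous taint
    return byte_to_insns, taint


# Hypothetical trace: two loads from the input buffer, an add, and a compare.
input_locations = {"mem:0x2000": {0}, "mem:0x2001": {1}}
trace = [
    (0x1000, "eax", ["mem:0x2000"]),   # mov eax, [0x2000]
    (0x1004, "ebx", ["mem:0x2001"]),   # mov ebx, [0x2001]
    (0x1008, "eax", ["eax", "ebx"]),   # add eax, ebx
    (0x100C, None, ["eax"]),           # cmp eax, imm
]
byte_to_insns, taint = propagate_taint(trace, input_locations)
```

The returned `byte_to_insns` mapping plays the role of the second corresponding relation: after the add, both input bytes taint `eax`, so the compare is attributed to offsets 0 and 1.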
In one possible implementation manner, the software fuzzy test device further includes a training module 510, specifically configured to: obtain field boundary information of the sample file and the field types of the sample fields; obtain program basic block information corresponding to the sample fields according to the field boundary information, and vectorize the program basic block information of the sample; and train a neural network model based on the field types of the sample fields and the vectorized program basic block information to obtain a pre-trained neural network model.
The device provided in this embodiment may be used to implement the technical solutions of the foregoing method embodiments. Its implementation principle and technical effects are similar and are not repeated here.
Fig. 6 is a schematic hardware structure diagram of a computer device according to an embodiment of the present application. As shown in fig. 6, the computer device of the present embodiment includes a processor 601 and a memory 602, wherein:
the memory 602 is configured to store computer-executable instructions;
the processor 601 is configured to execute the computer-executable instructions stored in the memory to implement the steps performed by the computer device in the above embodiments. For details, refer to the relevant description in the method embodiments above.
Alternatively, the memory 602 may be separate or integrated with the processor 601.
When the memory 602 is provided separately, the computer device further comprises a bus 603 for connecting the memory 602 and the processor 601.
The embodiment of the application also provides a computer storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the software fuzzy test method described above is implemented.
The embodiment of the application also provides a computer program product comprising a computer program; when the computer program is executed by a processor, the software fuzzy test method described above is implemented.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to implement the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one unit. The units formed by the modules can be realized in a form of hardware or a form of hardware and software functional units.
The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a computer device, or a network device, etc.) or processor to perform some steps of the methods of the various embodiments of the present application.
It should be understood that the above processor may be a central processing unit (Central Processing Unit, abbreviated as CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, abbreviated as DSP), application specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution.
The memory may comprise high-speed RAM and may further comprise non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, or the like.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type of volatile or nonvolatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short). It is also possible that the processor and the storage medium reside as discrete components in an electronic device or a master device.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (9)

1. A software fuzzy test method, comprising:
inputting an initial seed file into software to be tested, and acquiring input bytes and executed binary program instructions used in the test process;
acquiring a first corresponding relation between the binary program instruction and a program basic block;
determining a second correspondence between the input bytes and the binary program instructions by using a dynamic taint analysis method;
determining a third corresponding relation between the input byte and the program basic block according to the first corresponding relation and the second corresponding relation;
according to the third corresponding relation, integrating the smallest continuous bytes in the input bytes into fields by utilizing a smallest clustering algorithm to obtain field boundary information;
determining a fourth corresponding relation between the program basic block and the field according to the third corresponding relation and the field boundary information;
inputting the basic block information of the program corresponding to the field into a pre-trained neural network model according to the fourth corresponding relation, and determining the field type corresponding to the field;
recording the field type in a file with a preset format according to a format information model to obtain a format template file corresponding to the initial seed file;
and carrying out fuzzy test on the software based on the format template file by adopting a fuzzy test tool, recording a variation execution result, and carrying out self-adaptive optimization of the software fuzzy test according to the variation execution result.
2. The method according to claim 1, wherein the inputting the program basic block information corresponding to the field into the pre-trained neural network model according to the fourth correspondence, determining the field type corresponding to the field, includes:
obtaining a program basic block corresponding to the field according to the fourth corresponding relation;
vectorizing the program basic blocks corresponding to the fields to obtain vectorization information of binary program basic blocks corresponding to the fields;
and inputting the vectorization information of the binary program basic blocks corresponding to the fields into a pre-trained neural network model to obtain the field types corresponding to the fields.
3. The method of claim 1, wherein the carrying out fuzzy test on the software based on the format template file by adopting a fuzzy test tool, recording a variation execution result, and carrying out self-adaptive optimization of the software fuzzy test according to the variation execution result comprises:
inputting the initial seed file and a format template file corresponding to the initial seed file into software to be tested by adopting a fuzzy test tool, and generating a new test case for the variation of the initial seed file based on format information;
judging whether the code coverage rate increment corresponding to the test case in the fuzzy test is larger than or equal to a first preset value, and if the code coverage rate increment corresponding to the test case is larger than or equal to the first preset value, re-extracting the format template file of the test case;
recording the variation execution times of the format template file corresponding to the test case in the fuzzy test process, calculating the average value of the variation execution times, and performing the variation execution again on the test case corresponding to the format template with the variation execution times smaller than the average value when the code coverage rate increase speed of the format template file corresponding to the test case is smaller than a second preset value.
4. The method of claim 1, wherein said determining a second correspondence between said input bytes and said binary program instructions using dynamic taint analysis comprises:
and adopting a dynamic binary instrumentation tool to perform dynamic taint analysis processing on the input byte and the binary program instruction to obtain a corresponding relation between the input byte offset and a register or a memory related to the binary program instruction, wherein the corresponding relation is used as a second corresponding relation between the input byte and the binary program instruction.
5. The method according to any one of claims 1 to 4, wherein the inputting the program basic block information corresponding to the field into the pre-trained neural network model, before determining the field type corresponding to the field, further comprises:
acquiring field boundary information of a sample file and a field type of a sample field;
obtaining program basic block information corresponding to the sample field according to the field boundary information, and vectorizing the program basic block information of the sample;
training a neural network model based on the field type of the sample field and the vectorized program basic block information to obtain a pre-trained neural network model.
6. A software fuzzy test apparatus, comprising:
the first acquisition module is used for inputting the initial seed file into the software to be tested and acquiring input bytes and executed binary program instructions used in the test process;
the second acquisition module is used for acquiring a first corresponding relation between the binary program instruction and the program basic block;
the first determining module is used for determining a second corresponding relation between the input byte and the binary program instruction by adopting a dynamic taint analysis method;
the second determining module is used for determining a third corresponding relation between the input byte and the program basic block according to the first corresponding relation and the second corresponding relation;
the clustering module is used for integrating the minimum continuous bytes in the input bytes into fields by utilizing a minimum clustering algorithm according to the third corresponding relation to obtain field boundary information;
a third determining module, configured to determine a fourth correspondence between the program basic block and the field according to the third correspondence and field boundary information;
a fourth determining module, configured to input, according to the fourth correspondence, program basic block information corresponding to the field to a pre-trained neural network model, and determine a field type corresponding to the field;
the recording module is used for recording the field types in a file with a preset format according to a format information model to obtain a format template file corresponding to the initial seed file;
and the testing module is used for carrying out fuzzy testing on the software based on the format template file by adopting a fuzzy testing tool, recording a variation execution result and carrying out self-adaptive optimization of the software fuzzy testing according to the variation execution result.
7. The apparatus of claim 6, wherein the fourth determining module is configured to obtain a basic block of the program corresponding to the field according to a fourth correspondence; vectorizing the program basic blocks corresponding to the fields to obtain vectorization information of binary program basic blocks corresponding to the fields; and inputting the vectorization information of the binary program basic blocks corresponding to the fields into a pre-trained neural network model to obtain the field types corresponding to the fields.
8. A computer device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing computer-executable instructions stored in the memory causes the at least one processor to perform the software fuzzy test method of any one of claims 1 to 5.
9. A computer storage medium having stored therein computer executable instructions which, when executed by a processor, implement the software fuzzy test method of any one of claims 1 to 5.
CN202310067956.2A 2023-01-12 2023-01-12 Software fuzzy test method, device, equipment and storage medium Active CN116108449B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310067956.2A CN116108449B (en) 2023-01-12 2023-01-12 Software fuzzy test method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116108449A CN116108449A (en) 2023-05-12
CN116108449B true CN116108449B (en) 2024-02-23

Family

ID=86257699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310067956.2A Active CN116108449B (en) 2023-01-12 2023-01-12 Software fuzzy test method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116108449B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117827685B (en) * 2024-03-05 2024-04-30 国网浙江省电力有限公司丽水供电公司 Fuzzy test input generation method, device, terminal and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622558A (en) * 2012-03-01 2012-08-01 北京邮电大学 Excavating device and excavating method of binary system program loopholes
CN103440201A (en) * 2013-09-05 2013-12-11 北京邮电大学 Dynamic taint analysis device and application thereof to document format reverse analysis
CN107025175A (en) * 2017-05-12 2017-08-08 北京理工大学 A kind of fuzz testing seed use-case variable-length field pruning method
CN108416219A (en) * 2018-03-18 2018-08-17 西安电子科技大学 A kind of Android binary files leak detection method and system
CN112905184A (en) * 2021-01-08 2021-06-04 浙江大学 Pile-insertion-based industrial control protocol grammar reverse analysis method under basic block granularity

Similar Documents

Publication Publication Date Title
CN110263538B (en) Malicious code detection method based on system behavior sequence
CN104881611A (en) Method and apparatus for protecting sensitive data in software product
CN116108449B (en) Software fuzzy test method, device, equipment and storage medium
CN111092894A (en) Webshell detection method based on incremental learning, terminal device and storage medium
CN111338622B (en) Supply chain code identification method, device, server and readable storage medium
CN111400695B (en) Equipment fingerprint generation method, device, equipment and medium
CN112181430A (en) Code change statistical method and device, electronic equipment and storage medium
CN114238980A (en) Industrial control equipment vulnerability mining method, system, equipment and storage medium
CN113901463A (en) Concept drift-oriented interpretable Android malicious software detection method
CN114285587A (en) Domain name identification method and device and domain name classification model acquisition method and device
CN110070383B (en) Abnormal user identification method and device based on big data analysis
CN114792007A (en) Code detection method, device, equipment, storage medium and computer program product
CN115828244A (en) Memory leak detection method and device and related equipment
CN113946826A (en) Method, system, equipment and medium for analyzing and monitoring vulnerability fingerprint silence
Alexandra-Cristina et al. Material survey on source code plagiarism detection in programming courses
CN113254352A (en) Test method, device, equipment and storage medium for test case
JP2022505341A (en) Systems and methods for selectively instrumenting programs according to performance characteristics
CN113177784B (en) Address type identification method and device
Ahn et al. Data embedding scheme for efficient program behavior modeling with neural networks
CN117688564B (en) Detection method, device and storage medium for intelligent contract event log
CN116578979B (en) Cross-platform binary code matching method and system based on code features
CN114880637B (en) Account risk verification method and device, computer equipment and storage medium
CN110928788B (en) Service verification method and device
Demanou et al. A Dynamic Model Selection Approach to Mitigate the Change of Balance Problem in Cross-Version Bug Prediction.
CN117648242A (en) Application abnormality detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant