CN114610320B - LLVM-based variable type information restoration and comparison method and system


Publication number
CN114610320B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210279549.3A
Other languages
Chinese (zh)
Other versions
CN114610320A (en)
Inventor
纪守领
刘丁豪
何钦铭
陈建海
刘二腾
许端清
王文海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN202210279549.3A
Publication of CN114610320A
Application granted
Publication of CN114610320B

Abstract

The invention discloses an LLVM-based variable type information repair and comparison method and system, comprising a variable type information repair analysis and a variable type comparison analysis. The variable type information repair analysis comprises compiling the target program source code into LLVM IR, extracting target variables, matching the LLVM IR variable types against the source code information, and storing the type analysis results. The repair analysis and the comparison analysis are implemented as two LLVM analysis passes, and the comparison analysis uses the results of the repair analysis. The method and system address the problems that, in current LLVM IR, type comparison analysis cannot be performed or is inaccurate because type information is missing and/or union types are involved.

Description

LLVM-based variable type information restoration and comparison method and system
Technical Field
The invention belongs to the technical field of software program analysis, and particularly relates to a variable type information restoration and comparison method and system based on LLVM.
Background
With the rapid development of computer software, code size and functional complexity keep growing, and so does the demand for software analysis such as bug detection and compiler optimization. LLVM is one of the most popular program analysis frameworks today. It converts source code written in many programming languages into the LLVM intermediate representation (IR), which carries rich semantic information in a uniform format, and it lets developers design and implement custom analysis passes over the IR. It is widely used in fields such as compiler optimization, automated vulnerability discovery, automated vulnerability repair, patch analysis and clone detection.
LLVM redesigned its IR type system once, in version 3.0, and the main framework of that type system has remained in use since. In the current LLVM type system, all types are divided into the void type, function types and first-class types; first-class types comprise single value types, the label type, the token type, the metadata type and aggregate types; aggregate types in turn comprise array types, structure types and opaque structure types. In this type system, two variable types within the same context (LLVMContext) can be compared by pointer comparison, which greatly improves the efficiency of program analysis. Variable type comparison underlies many higher-level program analyses, such as global call graph construction, control flow integrity protection and pointer alias analysis, so building a complete type comparison method is of great significance.
However, type information can be lost when source code is compiled into LLVM IR. For example, after some structure types and function types are compiled into LLVM IR, function pointer fields in the structure or some parameters of the function are compiled into a null pointer type, and some structure types lose their structure name. In addition, the LLVM IR type system has no separate type for the union type of C/C++: a union type variable is compiled as a structure type variable and is converted to the required type by a type cast at the point of use. When a structure type contains a field of union type, the same structure type variable used in different contexts may therefore have field members of different types. These problems clearly affect type comparison: types that should be identified as equivalent are identified as non-equivalent, which in turn causes false positives, false negatives or analysis errors in higher-level tasks built on type analysis (such as type-based pointer alias analysis and type-based indirect call target analysis). If the higher-level task is security related (for example, control flow integrity), such problems can also seriously threaten program security and stability.
Existing solutions to these problems are incomplete. The type comparison method implemented in LLVM uses different comparison strategies for different types, but it performs no additional checking or handling for missing type information or for comparisons involving union types. TypeDive, an indirect call target identification tool based on multi-layer type analysis, compares types across different contexts by comparing their string representations; this is efficient for simple types such as single value types and label types, but it cannot cope with missing type information or with comparisons involving union types.
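As a rough illustration of why comparison inside one context can be done by pointer comparison, the following Python sketch models a context that hands out exactly one object per distinct type. The class names `IRType` and `TypeContext` are hypothetical stand-ins and are not LLVM APIs; the sketch only mirrors the idea of uniqued types.

```python
class IRType:
    """Toy type node; it has no __eq__, so comparison falls back to identity."""

    def __init__(self, kind, params):
        self.kind = kind
        self.params = params


class TypeContext:
    """Hypothetical stand-in for a context that uniques the types it creates."""

    def __init__(self):
        self._interned = {}

    def get(self, kind, *params):
        # The key fully describes the type, so equal descriptions share one object.
        key = (kind, params)
        if key not in self._interned:
            self._interned[key] = IRType(kind, params)
        return self._interned[key]


ctx = TypeContext()
i32_a = ctx.get("int", 32)
i32_b = ctx.get("int", 32)

# Inside one context an identity check is enough to decide type equality.
same_in_context = i32_a is i32_b

# A second context builds its own objects, so identity no longer helps.
other_ctx = TypeContext()
same_across_contexts = other_ctx.get("int", 32) is i32_a
```

The second check shows why comparisons across contexts need another strategy, such as the string comparison or the repaired type names discussed in this document.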
Disclosure of Invention
In view of the above, the present invention aims to provide an LLVM-based variable type information repair and comparison method and system, so as to solve the problems that, in current LLVM IR, type comparison analysis cannot be performed or is inaccurate because type information is missing and/or union types are involved.
To achieve the above object, an embodiment provides an LLVM-based variable type information repair and comparison method, including the following steps:
Step 1, acquiring the target program source code and compiling it into LLVM intermediate representation with debugging information;
Step 2, extracting target variables from the LLVM intermediate representation, wherein the target variables are variables that are relevant to the analysis task and involve a structure with missing type information or a structure of union type;
Step 3, according to the debugging information, obtaining, for each structure contained in a target variable and its intermediate representation type, the corresponding source code structure and its source code definition type in the target program source code; comparing the intermediate representation type of the structure with the source code definition type of the corresponding source code structure; and outputting each structure whose comparison result is inconsistent, together with its corresponding source code structure, as a structure pair;
Step 4, for each structure pair, repairing the variable type information using the source code definition type of the source code structure, and storing it in a repair database;
Step 5, when the intermediate representation types of two variables are to be compared, retrieving the structure information stored in the repair database to repair the missing type information of the structures, and then comparing the intermediate representation types of the variables.
In one embodiment, step 1 comprises:
configuring the compilation environment, and preparing a compiler and the target program source code according to actual requirements;
configuring the compilation options for the target program source code, including enabling the option that retains debugging information;
executing the compilation, checking the correctness and completeness of the LLVM intermediate representation once compilation finishes, and, after it passes the check, outputting and storing the LLVM intermediate representation with debugging information.
In one embodiment, step 2 comprises:
Step 2-1, extracting the LLVM variables to be analyzed from the LLVM intermediate representation according to the analysis task;
Step 2-2, extracting the intermediate representation type of each LLVM variable, and selecting as candidate LLVM variables those whose intermediate representation type is a pointer type containing a structure, an array type containing a structure, or a structure type;
Step 2-3, selecting from the candidate LLVM variables those with missing structure type information or containing a union type, and outputting them as target variables.
In one embodiment, before the target variables are extracted from the LLVM intermediate representation, the method further comprises: checking the version information and debugging information of the LLVM intermediate representation that has been read; extracting the target variables when the version information matches the current analysis framework and debugging information is present; otherwise terminating target variable extraction and raising an alarm to request manual handling.
In one embodiment, step 3 comprises:
Step 3-1, obtaining a target variable together with its debugging information and intermediate representation type;
Step 3-2, when the intermediate representation type is a pointer type, an array type or a structure type, executing steps 3-3 to 3-6; otherwise, the type is considered outside the comparison range and the comparison result is taken to be consistent;
Step 3-3, when the intermediate representation type is a pointer type, obtaining the type of the variable the pointer points to, extracting from the target program source code, according to the debugging information, the variable corresponding to the pointed-to variable and its source code definition type, taking the type of the pointed-to variable as the intermediate representation type, and jumping back to step 3-2;
Step 3-4, when the intermediate representation type is an array type, obtaining the type of the array member variables, extracting from the target program source code, according to the debugging information, the variable corresponding to the array member variable and its source code definition type, taking the type of the array member variable as the intermediate representation type, and jumping back to step 3-2;
Step 3-5, when the intermediate representation type is a structure type, obtaining the structure contained in the target variable and its type, extracting the corresponding source code structure and its source code definition type from the target program source code according to the debugging information, and proceeding to step 3-6; then obtaining the types of the member variables of the structure, extracting from the target program source code, according to the debugging information, the variables corresponding to the member variables and their source code definition types, taking the types of the member variables as the intermediate representation types, and jumping back to step 3-2;
Step 3-6, comparing the type of the structure with the source code definition type of the source code structure, and, if they are inconsistent, outputting a structure pair consisting of the structure and the source code structure, wherein type inconsistency includes inconsistency caused by a structure whose type name is union and inconsistency caused by missing structure type information.
In one embodiment, step 4 comprises:
when the type information of the structure in a structure pair is missing, using the source code definition type of the source code structure as the missing type of the structure to repair the type information, and storing the result in the repair database as a key-value (K-V) pair, with the intermediate representation type of the structure as the key and the source code definition type of the corresponding source code structure as the value;
when the type of the structure in a structure pair is a union type, representing the union type with a custom character string, and storing the result in the repair database as a K-V pair, with the intermediate representation type of the structure as the key and the custom character string as the value.
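The two storage rules of step 4 can be sketched as a small key-value store. This is a minimal illustration under stated assumptions: the key is taken to be the printed intermediate representation type, the marker `"<union>"` is a hypothetical choice of custom character string, and JSON is used only as one possible external storage format.

```python
import json

# Hypothetical custom character string used to mark union types.
UNION_MARKER = "<union>"


def build_repair_db(struct_pairs):
    """Map each IR struct type to its repaired source-level type name.

    Each pair is (ir_type_text, source_kind, source_name), where source_kind
    is "struct" or "union" as recorded in the debugging information.
    """
    db = {}
    for ir_type_text, source_kind, source_name in struct_pairs:
        if source_kind == "union":
            # Union types are stored as the custom marker, not as a name.
            db[ir_type_text] = UNION_MARKER
        else:
            # A missing struct name is repaired with the source-level name.
            db[ir_type_text] = source_name
    return db


pairs = [
    ("{ i32, i8* }", "struct", "file_ops"),
    ("{ i64 }", "union", "value"),
]
repair_db = build_repair_db(pairs)

# Serialising the mapping lets a later comparison pass reload it.
stored = json.dumps(repair_db, sort_keys=True)
reloaded = json.loads(stored)
```

Keeping the store outside the IR matches the stated goal that the repair analysis runs once and its results are reused by later comparisons.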
In one embodiment, step 5 comprises:
Step 5-1, the higher-level program analysis task extracts, from the LLVM intermediate representation corresponding to the target program source code, the intermediate representation types of the two variables to be compared;
Step 5-2, calling the type comparison method of the LLVM analysis framework to compare the intermediate representation types of the two variables; outputting the comparison result if it is consistent, and executing step 5-3 if it is inconsistent;
Step 5-3, when the intermediate representation types contain structure types: if the type name information of both structure types is non-empty, removing "struct" from the structure type names, comparing the type name information, and outputting the comparison result if it is consistent; if the type name information of either structure type is empty, executing step 5-4;
Step 5-4, for each structure type whose type name information is empty, querying the repair database for its structure information; if no corresponding structure information is found, the comparison result is taken to be inconsistent and is output; if corresponding structure information is found, executing step 5-5;
Step 5-5, if the type information contained in the retrieved structure information is a custom character string, the comparison result is taken to be unknown and is output; if it is not a custom character string, it is used as the type name information of the structure type whose type name information was empty, the comparison is performed, and the comparison result is output.
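Steps 5-2 to 5-5 can be read as a fallback chain, sketched below in Python. The sketch makes several assumptions that are not in the text: a structure is modelled as a `(name, body)` tuple, plain tuple equality stands in for the framework's own type comparison, `"<union>"` is a hypothetical custom character string, and non-matching names in step 5-3 are reported as different.

```python
UNION_MARKER = "<union>"  # hypothetical custom character string


def strip_struct_prefix(name):
    """Drop a leading "struct." so names can be compared directly."""
    return name[len("struct."):] if name.startswith("struct.") else name


def resolve_name(struct, repair_db):
    """Return the struct's own name, or its repaired name, or None."""
    name, body = struct
    if name:
        return strip_struct_prefix(name)
    return repair_db.get(body)


def compare_structs(a, b, repair_db):
    """Return "equal", "different" or "unknown" for two (name, body) structs."""
    if a == b:                      # step 5-2: direct comparison succeeded
        return "equal"
    names = [resolve_name(a, repair_db), resolve_name(b, repair_db)]
    if None in names:               # step 5-4: no repair entry was found
        return "different"
    if UNION_MARKER in names:       # step 5-5: union types cannot be decided
        return "unknown"
    return "equal" if names[0] == names[1] else "different"


db = {"{ i32, i8* }": "file_ops", "{ i64 }": UNION_MARKER}
r_repaired = compare_structs(("struct.file_ops", "{ i32, void ()* }"),
                             ("", "{ i32, i8* }"), db)
r_union = compare_structs(("", "{ i64 }"), ("struct.value", "{ i64 }"), db)
r_missing = compare_structs(("", "{ i8 }"), ("struct.x", "{ i8 }"), db)
```

Returning a third value for union types keeps the caller free to decide how conservative to be, which is one plausible reading of the "unknown" result in step 5-5.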
To achieve the above object, an embodiment of the present invention provides an LLVM-based variable type information repair and comparison system, including:
a compiling module, configured to acquire the target program source code and compile it into LLVM intermediate representation with debugging information;
an extraction module, configured to extract target variables from the LLVM intermediate representation, wherein the target variables are variables that are relevant to the analysis task and involve a structure with missing type information or a structure of union type;
a type matching module, configured to obtain, according to the debugging information, for each structure contained in a target variable and its intermediate representation type, the corresponding source code structure and its source code definition type in the target program source code, to compare the intermediate representation type of the structure with the source code definition type of the corresponding source code structure, and to output each structure whose comparison result is inconsistent, together with its corresponding source code structure, as a structure pair;
a type repair module, configured to repair, for each structure pair, the variable type information using the source code definition type of the source code structure, and to store it in a repair database;
an analysis and comparison module, configured to retrieve, when the intermediate representation types of two variables are to be compared, the structure information stored in the repair database to repair the missing type information of the structures, and then to compare the intermediate representation types of the variables.
Compared with the prior art, the invention has at least the following beneficial effects:
(1) Missing variable type information in LLVM IR is repaired by combining the target program source code with the LLVM intermediate representation, and the repaired type names are used to identify and compare composite type information, giving higher accuracy and more robust analysis.
(2) Before the intermediate representation types of variables are repaired, target variable extraction and intermediate representation type checking narrow the set of variables to be analyzed, so that only variables the user cares about and that may have lost information are analyzed; the user can also add custom variable screening rules at this stage. The method is therefore efficient.
(3) The LLVM-based variable type information repair and comparison system is highly portable. Type information repair and comparison are implemented as LLVM analysis passes (LLVM Pass), so the system can be embedded in many kinds of higher-level analysis tasks built on the LLVM analysis framework; developers of such programs can use it without modifying the LLVM analysis framework, and it provides a flexible type comparison service.
(4) The method and system are efficient. For a given set of LLVM intermediate representation, the type information repair analysis needs to be executed only once; through external storage, its results can be reused many times in later type comparisons. The analysis modifies neither the existing source code nor the LLVM intermediate representation, and does not affect other LLVM-based program analysis tools.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an LLVM-based variable type information repair and comparison method provided by an embodiment;
FIG. 2 is a flow chart of target variable extraction provided by an embodiment;
FIG. 3 is a flow chart of an LLVM-based variable type information repair and comparison system provided by an embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
Type information can be lost in the LLVM analysis framework, and the loss of structure type information in particular seriously affects the accuracy of type comparison analysis. To address this, the embodiment provides an LLVM-based variable type information repair and comparison method and system, consisting mainly of a variable type information repair part and a variable type comparison analysis part, both implemented as LLVM analysis passes. Specifically, the repair analysis pass identifies the variables in the target program that need type repair, completes the type repair task and stores the repair results; the comparison analysis pass reads in variable type comparison requests sent by a higher-level program analysis task, completes the comparison and returns the result.
Fig. 1 is a flowchart of a variable type information repair and comparison method based on LLVM provided in an embodiment. As shown in fig. 1, the variable type information repairing and comparing method based on LLVM provided by the embodiment includes the following steps:
Step 1, acquiring and compiling target program source codes into LLVM intermediate representations with debugging information.
In an embodiment, compiling the target program source code into LLVM IR includes: the user provides the source code of the target software to be analyzed, configures compilation settings that meet the relevant requirements, enables the option that retains debugging information, compiles the target source code into LLVM IR, and stores the resulting LLVM IR file.
When configuring the compilation environment, the user prepares a suitable version of a compiler according to actual requirements; the compiler may be, but is not limited to, Clang.
When configuring the compilation options, the user must enable the option that retains debugging information and configures the other options according to actual needs. Ways of enabling debugging information include, but are not limited to, adding the -g option; ways of configuring the other compilation options include, but are not limited to, configuring a Makefile.
The user then starts the compilation and, once it finishes, checks whether the output LLVM IR file is correct and complete. After it passes the check, the output LLVM IR file is stored locally. Storage options include, but are not limited to, writing to a MySQL database, or to a local hard disk or memory.
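For a single C file compiled with Clang, the configuration described above might look like the command assembled below. This is only a sketch: the helper name `build_ir_compile_command` and the file names are hypothetical, and a real project would normally pass the same flags through its build system. The flags `-g`, `-S` and `-emit-llvm` are standard Clang options for debug information and textual LLVM IR output.

```python
import shlex


def build_ir_compile_command(source, output, extra_flags=()):
    """Return a Clang command line that emits textual LLVM IR with debug info."""
    # -g keeps debugging information; -S -emit-llvm writes a readable .ll file.
    return ["clang", "-g", "-S", "-emit-llvm", *extra_flags, source, "-o", output]


cmd = build_ir_compile_command("target.c", "target.ll", extra_flags=["-O0"])

# A shell-ready form, convenient for logging or for pasting into a Makefile rule.
printable = shlex.join(cmd)
```

The list form can be handed to `subprocess.run(cmd, check=True)` on a machine where Clang is installed.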
Step 2, extracting target variables from the LLVM intermediate representation.
In an embodiment, target variable extraction includes: the analysis program reads the LLVM IR with debugging information, scans all of it, and extracts, as required, all variables that are relevant to the analysis task and have missing type information, and all variables involving union types. The variables with missing type information are of structure type, and the variables involving union types are structure type variables whose type name contains union.
FIG. 2 is a flow chart of target variable extraction provided by an embodiment. As shown in fig. 2, target variable extraction specifically includes:
Step 2-1, acquiring and checking the LLVM IR file.
In an embodiment, the analysis program reads the LLVM IR file to be analyzed, checks its version information and checks whether it contains debugging information. If the LLVM IR version does not match the version that the current implementation of the type completion analysis pass can process, or the input LLVM IR file contains no debugging information, the subsequent analysis is terminated, an alarm is issued and manual handling is requested.
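The check in step 2-1 could be approximated on a textual IR file as follows. This sketch rests on assumptions: it treats the presence of the `!llvm.dbg.cu` named metadata as evidence of debugging information, and it uses the major version in the compile unit's `producer` string as a stand-in for the IR version, which the text does not specify. The function name `check_ir` is hypothetical.

```python
import re


def check_ir(ir_text, supported_major):
    """Return (ok, reason) for a textual LLVM IR module."""
    # Compile units with debug info are listed under !llvm.dbg.cu.
    if "!llvm.dbg.cu" not in ir_text:
        return False, "no debug information"
    # Assumption: the producer string identifies the emitting compiler version.
    match = re.search(r'producer: "[^"]*clang version (\d+)', ir_text)
    if match is None:
        return False, "producer version not found"
    if int(match.group(1)) != supported_major:
        return False, "unsupported producer version"
    return True, "ok"


SAMPLE_IR = '''
!llvm.dbg.cu = !{!0}
!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 14.0.0", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "target.c", directory: "/tmp")
'''

accepted = check_ir(SAMPLE_IR, supported_major=14)
rejected = check_ir("define void @f() {\n  ret void\n}\n", supported_major=14)
```

A caller would stop the analysis and raise the alarm described above whenever the first element of the result is false.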
Step 2-2, extracting the LLVM variables to be analyzed.
In an embodiment, after the LLVM IR file passes the check, the LLVM variables to be analyzed are extracted from the LLVM intermediate representation according to the analysis task. The variables are extracted according to custom rules designed for the analysis task, including, but not limited to, extracting all global variables, function definitions and all instructions within functions.
Step 2-3, screening candidate LLVM variables from the LLVM variables.
In an embodiment, the intermediate representation type of each LLVM variable is extracted, and the variables whose intermediate representation type is a pointer type containing a structure, an array type containing a structure, or a structure type are selected as candidate LLVM variables, which serve as the basis for determining the target variables. Note that the LLVM intermediate representation contains not only the LLVM variables but also the intermediate representation type corresponding to each of them.
In one embodiment, the step of screening the candidate LLVM variables among the LLVM variables includes:
Step 2-3-1, when the intermediate representation type is a pointer type, an array type or a structure type, executing steps 2-3-2 to 2-3-4; otherwise, stopping the check;
Step 2-3-2, when the intermediate representation type is a structure type, taking the input LLVM variable corresponding to the structure type as a candidate LLVM variable;
Step 2-3-3, when the intermediate representation type is a pointer type, obtaining the type of the variable the pointer points to; when that type is a structure type, the original pointer type is considered a pointer type containing a structure, and the input LLVM variable corresponding to it is taken as a candidate LLVM variable; otherwise, taking the type of the pointed-to variable as the intermediate representation type and jumping back to step 2-3-1;
Step 2-3-4, when the intermediate representation type is an array type, obtaining the type of the array member variables; when that type is a structure type, the original array type is considered an array type containing a structure, and the input LLVM variable corresponding to the array type containing the structure is taken as a candidate LLVM variable; otherwise, taking the type of the array member variable as the intermediate representation type and jumping back to step 2-3-1.
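Steps 2-3-1 to 2-3-4 amount to unwrapping pointers and arrays until a structure or some other type is reached. The Python sketch below illustrates this on a toy type model; the classes `Scalar`, `Struct`, `Pointer` and `Array` and the variable names are hypothetical, and the model assumes typed pointers as the text does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    name: str


@dataclass(frozen=True)
class Struct:
    name: str


@dataclass(frozen=True)
class Pointer:
    pointee: object


@dataclass(frozen=True)
class Array:
    element: object
    length: int


def contains_struct(ty):
    """Unwrap pointers and arrays until a struct or another type is reached."""
    while True:
        if isinstance(ty, Struct):        # step 2-3-2
            return True
        if isinstance(ty, Pointer):       # step 2-3-3
            ty = ty.pointee
        elif isinstance(ty, Array):       # step 2-3-4
            ty = ty.element
        else:                             # step 2-3-1: outside the three kinds
            return False


variables = {
    "counter": Scalar("i32"),
    "ops": Pointer(Pointer(Struct("struct.file_ops"))),
    "dev_table": Array(Struct("struct.device"), 8),
    "buffer": Array(Scalar("i8"), 64),
}
candidates = sorted(name for name, ty in variables.items() if contains_struct(ty))
```

Only `ops` and `dev_table` survive the filter, since the other two variables never reach a structure type.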
Step 2-4, screening target variables from the candidate LLVM variables and outputting them.
In an embodiment, variables with missing structure type information or containing a union type are selected from the candidate LLVM variables and output as target variables.
In one embodiment, the step of screening the target variables from the candidate LLVM variables includes:
Step 2-4-1, when the intermediate representation type is a structure type, checking whether the structure type name is empty; if it is empty, the structure type information is considered missing and the input candidate LLVM variable is taken as a target variable; if it is not empty and the structure type name contains union, the type is considered a union type and the input candidate LLVM variable is likewise taken as a target variable; otherwise, obtaining the types of all member variables of the structure, taking them as the intermediate representation types, and jumping back to step 2-4;
Step 2-4-2, when the intermediate representation type is a pointer type, checking whether the type name of the structure the pointer points to is empty; if it is empty, the structure type information is considered missing and the input candidate LLVM variable is taken as a target variable; if it is not empty and the type name of the pointed-to structure contains union, the type is considered a union type and the input candidate LLVM variable is likewise taken as a target variable; otherwise, obtaining the type of the variable the pointer points to, taking it as the intermediate representation type, and jumping back to step 2-4;
Step 2-4-3, when the intermediate representation type is an array type, checking whether the type name of the structure contained in the array is empty; if it is empty, the structure type information is considered missing and the input candidate LLVM variable is taken as a target variable; if it is not empty and the type name of the contained structure contains union, the type is considered a union type and the input candidate LLVM variable is likewise taken as a target variable; otherwise, obtaining the type of the array member variables, taking it as the intermediate representation type, and jumping back to step 2-4;
Wherein the candidate LLVM variables include a structure type variable, an array type variable, and a pointer type variable.
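The screening in steps 2-4-1 to 2-4-3 can be sketched as one recursive check. The type classes below are a hypothetical toy model, an empty string stands for a lost structure name, and the `seen` set is an addition of this sketch so that self-referential structures do not recurse forever.

```python
from dataclasses import dataclass, field


@dataclass
class Scalar:
    name: str


@dataclass
class Struct:
    name: str                      # "" models a struct whose name was lost
    members: list = field(default_factory=list)


@dataclass
class Pointer:
    pointee: object


@dataclass
class Array:
    element: object


def classify(ty, seen=None):
    """Return "missing", "union", or None when no repair is needed."""
    seen = set() if seen is None else seen
    if id(ty) in seen:             # already visited, e.g. a linked-list node
        return None
    seen.add(id(ty))
    if isinstance(ty, Struct):
        if ty.name == "":
            return "missing"
        if "union" in ty.name:
            return "union"
        for member in ty.members:  # otherwise descend into the members
            result = classify(member, seen)
            if result is not None:
                return result
        return None
    if isinstance(ty, Pointer):
        return classify(ty.pointee, seen)
    if isinstance(ty, Array):
        return classify(ty.element, seen)
    return None


node = Struct("struct.node")
node.members.append(Pointer(node))  # self-referential struct
outer = Array(Struct("struct.outer", [Scalar("i32"), Struct("union.u")]))
```

A candidate variable becomes a target variable whenever `classify` returns something other than `None` for its type.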
Step 3, matching the structures of the target variables with the corresponding structures in the source code information to form structure pairs.
In an embodiment, according to the debugging information, the structure in the target program source code that corresponds to each structure contained in a target variable, and the definition type of that corresponding structure, are obtained; the type of the structure is compared with the definition type of the corresponding structure; and each structure whose comparison result is inconsistent is output with its corresponding structure as a structure pair.
In one embodiment, the matching process of the structure of the target variable and the corresponding structure of the source code information includes:
Step 3-1, acquiring a target variable, together with the debugging information and the intermediate representation type corresponding to the target variable. Specifically, the debugging information of a variable may be extracted through LLVM MDNode.
Step 3-2, when the intermediate representation type is determined to be a pointer type, an array type or a structure type, executing steps 3-3 to 3-6; otherwise, considering the type to be outside the comparison range and the type comparison result to be consistent;
Step 3-3, when the intermediate representation type is a pointer type, obtaining the type of the variable the pointer points to, extracting from the target program source code, according to the debugging information, the variable corresponding to the pointed-to variable and its source code definition type, taking the type of the pointed-to variable as the intermediate representation type, and jumping back to step 3-2;
Step 3-4, when the intermediate representation type is an array type, obtaining the type of the array member variable, extracting from the target program source code, according to the debugging information, the variable corresponding to the array member variable and its source code definition type, taking the type of the array member variable as the intermediate representation type, and jumping back to step 3-2;
Step 3-5, when the intermediate representation type is a structure type, acquiring the structure contained in the target variable and its type, extracting the corresponding source code structure and its source code definition type from the target program source code according to the debugging information, and entering step 3-6; then obtaining the types of the sub-member variables of the structure, extracting from the target program source code, according to the debugging information, the variables corresponding to the sub-member variables and their source code definition types, taking the types of the sub-member variables as intermediate representation types, and jumping back to step 3-2;
Step 3-6, comparing the type of the structure with the definition type of the corresponding structure, and outputting a structure pair consisting of the structure and the corresponding structure if the types are inconsistent, wherein type inconsistency includes inconsistency caused by a structure type name containing union and inconsistency caused by missing structure type information.
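The traversal in steps 3-2 to 3-6 can be sketched as a lockstep walk over two type trees. The following Python sketch is illustrative only: `IRType` and `SrcType` are hypothetical models of the IR type and of the source-level type recovered from debugging information, `match_structs` is an assumed name, and only the two inconsistency cases named in step 3-6 are checked.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class IRType:
    """Hypothetical model of an LLVM IR type."""
    kind: str
    name: str = ""
    element: Optional["IRType"] = None
    members: List["IRType"] = field(default_factory=list)


@dataclass
class SrcType:
    """Hypothetical model of the source type described by debug information."""
    kind: str
    name: str = ""
    element: Optional["SrcType"] = None
    members: List["SrcType"] = field(default_factory=list)


def match_structs(ir: Optional[IRType], src: Optional[SrcType],
                  pairs: Optional[List[Tuple[IRType, SrcType]]] = None,
                  seen: Optional[Set[int]] = None) -> List[Tuple[IRType, SrcType]]:
    """Collect (IR struct, source struct) pairs whose types are inconsistent."""
    pairs = [] if pairs is None else pairs
    seen = set() if seen is None else seen
    if ir is None or src is None or id(ir) in seen:
        return pairs
    seen.add(id(ir))
    if ir.kind in ("pointer", "array"):          # steps 3-3 and 3-4
        match_structs(ir.element, src.element, pairs, seen)
    elif ir.kind == "struct":                    # step 3-5
        if not ir.name or "union" in ir.name:    # step 3-6: inconsistent type
            pairs.append((ir, src))
        for ir_m, src_m in zip(ir.members, src.members):
            match_structs(ir_m, src_m, pairs, seen)
    return pairs                                 # other kinds: outside comparison range
```

Walking both trees together keeps each IR structure aligned with its source definition, so a structure with a lost name can still be paired with the name recorded in the debugging information.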
Step 4, repairing the type of the structure by using the definition type of the corresponding structure, and storing the repaired type in a repair database.
In an embodiment, the structure pairs constructed in step 3 are obtained, and different repair modes are adopted according to the structure type in each structure pair, including:
When the type information of the structure in the structure pair is missing, the source code definition type of the source code structure is used as the missing type of the structure to repair the type information; the intermediate representation type of the structure is taken as the Key, the source code definition type of the corresponding structure is taken as the Value, and the two are stored in the repair database as a K-V key-value pair;
When the type of the structure in the structure pair is a complex type, that is, its type name carries the union character string, the complex type is represented by a custom character string; the intermediate representation type of the structure is taken as the Key, the custom character string is taken as the Value, and the two are stored in the repair database as a K-V key-value pair.
It should be noted that a MySQL database may be used to store the above structure information in the form of key-value pairs; other methods or techniques capable of storing the structure information are not excluded. The custom character string may be, for example, escape, and is not limited as long as it does not conflict with any existing structure type name.
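As an illustration of the key-value storage described above, the following Python sketch uses the standard-library `sqlite3` module purely as a stand-in for the MySQL database mentioned in the embodiment. The table name, the column names, the `UNION_MARKER` value and the use of a printed IR type string as the Key are all assumptions made for this example.

```python
import sqlite3
from typing import Optional

UNION_MARKER = "__union__"  # hypothetical custom string for complex (union) types


def open_repair_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the repair database and create the key-value table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repair ("
        "ir_type TEXT PRIMARY KEY, src_type TEXT NOT NULL)"
    )
    return conn


def store_repair(conn: sqlite3.Connection, ir_type: str,
                 src_type: str, is_union: bool) -> None:
    """Store Key = IR type, Value = source definition type or the custom marker."""
    value = UNION_MARKER if is_union else src_type
    conn.execute(
        "INSERT OR REPLACE INTO repair (ir_type, src_type) VALUES (?, ?)",
        (ir_type, value),
    )
    conn.commit()


def lookup_repair(conn: sqlite3.Connection, ir_type: str) -> Optional[str]:
    """Return the stored Value for the Key, or None if the Key is absent."""
    row = conn.execute(
        "SELECT src_type FROM repair WHERE ir_type = ?", (ir_type,)
    ).fetchone()
    return row[0] if row else None
```

A lookup that returns `None` corresponds to a Key that cannot be found, and a lookup that returns the marker corresponds to a complex type.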
Step 5, calling the structure information in the repair database to repair the intermediate representation type, and then performing the intermediate representation type comparison analysis.
In the embodiment, when the intermediate representation type comparison analysis is performed on two variables to be compared and analyzed, the structure body information stored in the repair database is called to repair the missing type information of the structure body included in the intermediate representation type, and then the variable intermediate representation type comparison analysis is performed.
In one embodiment, the comparative analysis process includes:
Step 5-1, the upper program analysis task extracts the intermediate representation type of the two variables to be analyzed and compared from the intermediate representation of the LLVM corresponding to the source code of the target program;
In the embodiment, an analysis program reads a variable type comparison analysis request sent by an upper program analysis task, extracts LLVM intermediate representation types of two variable types to be compared from the request, and if the type extraction fails, terminates a subsequent flow, sends out an alarm and requests manual processing; otherwise, step 5-2 is entered. Wherein the upper level program analysis tasks include other program analysis tasks requiring variable type comparison, such as indirect call analysis based on type analysis, alias analysis based on type information, etc., which send a type comparison request to the variable type comparison analysis stream.
Step 5-2, calling a type comparison method of the LLVM analysis framework to carry out comparison analysis on the intermediate representation types of the two variables, outputting a comparison result if the comparison result is consistent, and executing step 5-3 if the comparison result is inconsistent;
In an embodiment, the type comparison method provided by LLVM is the FunctionComparator::cmpTypes() method.
Step 5-3, when the intermediate representation types are determined to include structure types: if the type name information of both structure types is non-empty, removing struct from the structure type names, comparing the structure type name information, and outputting the comparison result if it is consistent; if there is a structure type whose type information is empty, executing step 5-4;
Step 5-4, for the structure type whose type information is empty, querying the structure information from the repair database; if no structure information corresponding to that structure type can be found, that is, the corresponding Key cannot be found, considering the comparison result to be inconsistent and outputting it; if the structure information can be found, that is, the Key is found, executing step 5-5;
Step 5-5, if the type information contained in the retrieved structure information is the custom character string, that is, the Value corresponding to the Key is the custom character string, considering the comparison result to be unknown and outputting it; if the type information is not the custom character string, taking it as the type name information of the structure type whose type name information is empty, comparing it with the type name information of the other structure type, and outputting the comparison result.
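The decision flow of steps 5-2 to 5-5 can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions: the result of the LLVM built-in comparison is passed in as a boolean, the repair database is modeled as a plain dictionary, `UNION_MARKER` is a hypothetical custom string, stored Values are assumed to be bare type names, and a name mismatch in step 5-3 is assumed to yield an inconsistent result.

```python
from typing import Dict

UNION_MARKER = "__union__"  # hypothetical custom string stored for complex types


def strip_struct(name: str) -> str:
    """Remove a leading 'struct.' from an IR structure type name."""
    prefix = "struct."
    return name[len(prefix):] if name.startswith(prefix) else name


def compare_struct_types(name_a: str, key_a: str, name_b: str, key_b: str,
                         repair_db: Dict[str, str], builtin_equal: bool) -> str:
    """Return 'consistent', 'inconsistent' or 'unknown' for two structure types."""
    if builtin_equal:                       # step 5-2: built-in comparison agreed
        return "consistent"
    names = []
    for name, key in ((name_a, key_a), (name_b, key_b)):
        if not name:                        # step 5-4: empty name, query the database
            repaired = repair_db.get(key)
            if repaired is None:            # Key not found
                return "inconsistent"
            if repaired == UNION_MARKER:    # step 5-5: complex type
                return "unknown"
            name = repaired                 # step 5-5: use the repaired type name
        names.append(strip_struct(name))    # step 5-3: compare names without 'struct'
    return "consistent" if names[0] == names[1] else "inconsistent"
```

Returning a three-valued result lets an upper-level analysis task decide for itself how conservatively to treat the unknown case.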
FIG. 3 is a flow chart of an LLVM-based variable type information repair and comparison system provided by an embodiment. As shown in FIG. 3, the variable type information repair and comparison system provided in the embodiment includes:
The compiling module, used for acquiring the target program source code and compiling it into an LLVM intermediate representation with debugging information;
The extraction module, used for extracting target variables from the LLVM intermediate representation, wherein the target variables comprise structures that are related to the analysis task and whose type information is missing or whose type is a complex type;
The type matching module, used for obtaining, according to the debugging information, the source code structure in the target program source code, and its source code definition type, that correspond to each structure contained in the target variable and its intermediate representation type, comparing the intermediate representation type of the structure with the source code definition type of the corresponding source code structure, and outputting each structure whose comparison result is inconsistent together with its corresponding source code structure to form a structure pair;
The type repair module, used for repairing, for each structure pair, the variable type information by using the source code definition type of the source code structure, and storing the repaired information in a repair database;
The analysis and comparison module, used for calling, when intermediate representation type comparison analysis is performed on two variables to be compared, the structure information stored in the repair database to repair the missing type information of the structures, and then performing the variable intermediate representation type comparison analysis.
It should be noted that, when the LLVM-based variable type information repair and comparison system provided in the foregoing embodiment performs variable type information repair and comparison, the division into the functional modules described above is used only for illustration. In practice, the functions may be allocated to different functional modules as required; that is, the internal structure of the terminal or server may be divided into different functional modules to complete all or part of the functions described above. In addition, the LLVM-based variable type information repair and comparison system and the LLVM-based variable type information repair and comparison method provided in the foregoing embodiments belong to the same concept; the specific implementation of the system is detailed in the method embodiment and is not repeated here.
The LLVM-based variable type information repair and comparison system provided by the embodiment can run as an independent LLVM analysis flow, provides a type comparison query interface, in a pluggable manner, to other upper-layer tasks that require type comparison analysis, and is highly portable.
The foregoing describes the preferred embodiments and advantages of the invention in detail. It should be understood that the foregoing description is merely illustrative of the presently preferred embodiments of the invention, and that any changes, additions, substitutions and equivalents of those embodiments are intended to be included within the scope of the invention.

Claims (9)

1. The variable type information restoration and comparison method based on LLVM is characterized by comprising the following steps:
Step 1, acquiring and compiling a target program source code into LLVM intermediate representation with debugging information;
Step 2, extracting target variables from the LLVM intermediate representation, wherein the target variables comprise structures that are related to the analysis task and whose type information is missing or whose type is a complex type;
Step 3, obtaining, according to the debugging information, the source code structure in the target program source code, and its source code definition type, that correspond to each structure contained in the target variable and its intermediate representation type, comparing the intermediate representation type of the structure with the source code definition type of the corresponding source code structure, and outputting each structure whose comparison result is inconsistent together with its corresponding source code structure to form a structure pair;
step 4, for each structure body pair, repairing variable type information by utilizing the source code definition type of the source code structure body and storing the variable type information in a repairing database;
Step 5, when performing intermediate representation type comparison analysis on two variables to be compared, calling the structure information stored in the repair database to repair the missing type information of the structures, and then performing the variable intermediate representation type comparison analysis, wherein the step comprises the following steps:
Step 5-1, the upper program analysis task extracts the intermediate representation type of the two variables to be analyzed and compared from the intermediate representation of the LLVM corresponding to the source code of the target program;
Step 5-2, calling a type comparison method of the LLVM analysis framework to carry out comparison analysis on the intermediate representation types of the two variables, outputting a comparison result if the comparison result is consistent, and executing step 5-3 if the comparison result is inconsistent;
Step 5-3, when the intermediate representation types are determined to include structure types: if the type name information of both structure types is non-empty, removing struct from the structure type names, comparing the structure type name information, and outputting the comparison result if it is consistent; if there is a structure type whose type information is empty, executing step 5-4;
Step 5-4, for the structure type whose type information is empty, querying the structure information from the repair database; if no structure information corresponding to that structure type can be found, considering the comparison result to be inconsistent and outputting it; if the structure information can be found, executing step 5-5;
Step 5-5, if the type information contained in the retrieved structure information is the custom character string, considering the comparison result to be unknown and outputting it; if the type information is not the custom character string, taking it as the type name information of the structure type whose type name information is empty, comparing the type name information, and outputting the comparison result.
2. The LLVM-based variable type information repair and comparison method as set forth in claim 1, wherein step 1 comprises:
configuring a compiling environment, and preparing a compiler and target program source codes according to actual requirements;
Configuring compiling options of source codes of target programs, including enabling a reserved debugging information option;
executing the compiling flow, checking the correctness and the integrity of the LLVM intermediate representation after the compiling is finished, and outputting and storing the LLVM intermediate representation with the debugging information after checking the correctness.
3. The LLVM-based variable type information repair and comparison method of claim 1, wherein step 2 comprises:
Step 2-1, extracting LLVM variables to be analyzed from the LLVM intermediate representation according to the analysis task;
Step 2-2, extracting an intermediate representation type of the LLVM variable in the LLVM intermediate representation, and screening pointer types containing the structure bodies, array types containing the structure bodies and LLVM variables corresponding to the structure body types from the intermediate representation types as candidate LLVM variables;
Step 2-3, screening, from the candidate LLVM variables, the variables whose structure type information is missing or which contain a complex type, and outputting them as target variables.
4. The LLVM-based variable type information repair and comparison method as set forth in claim 3, wherein the step 2-2 comprises:
Step 2-2-1, when the intermediate representation type is determined to be a pointer type, an array type or a structure type, executing steps 2-2-2 to 2-2-4; otherwise, stopping the determination;
Step 2-2-2, when the intermediate representation type is a structure type, taking the input LLVM variable corresponding to the structure type as a candidate LLVM variable;
Step 2-2-3, when the intermediate representation type is a pointer type, acquiring the type of the variable the pointer points to; when the type of the pointed-to variable is determined to be a structure type, considering the original pointer type to be a pointer type containing a structure, and taking the input LLVM variable corresponding to that pointer type as a candidate LLVM variable; otherwise, taking the type of the pointed-to variable as the intermediate representation type, and jumping back to step 2-2-1;
Step 2-2-4, when the intermediate representation type is an array type, acquiring the type of the array member variable; when the type of the array member variable is determined to be a structure type, considering the original array type to be an array type containing a structure, and taking the input LLVM variable corresponding to that array type as a candidate LLVM variable; otherwise, taking the type of the array member variable as the intermediate representation type, and jumping back to step 2-2-1.
5. The LLVM-based variable type information repair and comparison method as set forth in claim 3, wherein the step 2-3 comprises:
Step 2-3-1, when the intermediate representation type is a structure type, checking whether the structure type name is empty; if it is empty, considering that the structure type information is missing, and taking the input candidate LLVM variable as a target variable; if it is not empty and the structure type name contains union, considering the structure to be a complex type, and likewise taking the input candidate LLVM variable as a target variable; otherwise, obtaining the types of all sub-member variables of the structure, taking the types of the sub-member variables as intermediate representation types, and jumping back to step 2-3;
Step 2-3-2, when the intermediate representation type is a pointer type, checking whether the type name of the structure the pointer points to is empty; if it is empty, considering that the structure type information is missing, and taking the input candidate LLVM variable as a target variable; if it is not empty and the type name contains union, considering the structure to be a complex type, and likewise taking the input candidate LLVM variable as a target variable; otherwise, obtaining the type of the variable the pointer points to, taking that type as the intermediate representation type, and jumping back to step 2-3;
Step 2-3-3, when the intermediate representation type is an array type, checking whether the type name of the structure contained in the array is empty; if it is empty, considering that the structure type information is missing, and taking the input candidate LLVM variable as a target variable; if the type name is not empty and contains union, considering the structure to be a complex type, and likewise taking the input candidate LLVM variable as a target variable; otherwise, obtaining the type of the array member variable, taking that type as the intermediate representation type, and jumping back to step 2-3;
Wherein the candidate LLVM variables include a structure type variable, an array type variable, and a pointer type variable.
6. The LLVM-based variable type information repair and comparison method of claim 1, further comprising, prior to extracting the target variable from the LLVM intermediate representation: and checking the read version information and debugging information of the LLVM intermediate representation, extracting the target variable when the version information is matched with the current analysis framework and the debugging information exists, otherwise, terminating the target variable extraction and sending out an alarm to request manual processing.
7. The LLVM-based variable type information repair and comparison method of claim 1, wherein step 3 comprises:
Step 3-1, obtaining a target variable, and debug information and an intermediate representation type corresponding to the target variable;
Step 3-2, when the intermediate representation type is determined to be a pointer type, an array type or a structure type, executing steps 3-3 to 3-6; otherwise, considering the type to be outside the comparison range and the type comparison result to be consistent;
Step 3-3, when the intermediate representation type is a pointer type, obtaining the type of the variable the pointer points to, extracting from the target program source code, according to the debugging information, the variable corresponding to the pointed-to variable and its source code definition type, taking the type of the pointed-to variable as the intermediate representation type, and jumping back to step 3-2;
Step 3-4, when the intermediate representation type is an array type, obtaining the type of the array member variable, extracting from the target program source code, according to the debugging information, the variable corresponding to the array member variable and its source code definition type, taking the type of the array member variable as the intermediate representation type, and jumping back to step 3-2;
Step 3-5, when the intermediate representation type is a structure type, acquiring the structure contained in the target variable and its type, extracting the corresponding source code structure and its source code definition type from the target program source code according to the debugging information, and entering step 3-6; then obtaining the types of the sub-member variables of the structure, extracting from the target program source code, according to the debugging information, the variables corresponding to the sub-member variables and their source code definition types, taking the types of the sub-member variables as intermediate representation types, and jumping back to step 3-2;
Step 3-6, comparing the type of the structure with the source code definition type of the source code structure, and outputting a structure pair consisting of the structure and the source code structure if the types are inconsistent, wherein type inconsistency includes inconsistency caused by a structure type name containing union and inconsistency caused by missing structure type information.
8. The LLVM-based variable type information repair and comparison method as set forth in claim 1, wherein the step 4 comprises:
When the type information of the structure in the structure pair is missing, the source code definition type of the source code structure is used as the missing type of the structure to repair the type information; the intermediate representation type of the structure is taken as the Key, the source code definition type of the corresponding structure is taken as the Value, and the two are stored in the repair database as a K-V key-value pair;
When the type of the structure in the structure pair is a complex type, the complex type is represented by a custom character string; the intermediate representation type of the structure is taken as the Key, the custom character string is taken as the Value, and the two are stored in the repair database as a K-V key-value pair.
9. A LLVM-based variable type information repair and comparison system, comprising:
the compiling module is used for acquiring and compiling the source code of the target program into LLVM intermediate representation with debugging information;
An extraction module, used for extracting target variables from the LLVM intermediate representation, wherein the target variables comprise structures that are related to the analysis task and whose type information is missing or whose type is a complex type;
The type matching module, used for obtaining, according to the debugging information, the source code structure in the target program source code, and its source code definition type, that correspond to each structure contained in the target variable and its intermediate representation type, comparing the intermediate representation type of the structure with the source code definition type of the corresponding source code structure, and outputting each structure whose comparison result is inconsistent together with its corresponding source code structure to form a structure pair;
the type restoration module is used for restoring variable type information by utilizing the source code definition type of the source code structure body for each structure body pair and storing the variable type information in a restoration database;
the analysis and comparison module is used for calling the structural body information stored in the repair database to repair the missing type information of the structural body when the intermediate representation type comparison analysis is carried out on the two variables to be compared and analyzed, and then carrying out the variable intermediate representation type comparison analysis, and comprises the following steps:
Step 5-1, the upper program analysis task extracts the intermediate representation type of the two variables to be analyzed and compared from the intermediate representation of the LLVM corresponding to the source code of the target program;
Step 5-2, calling a type comparison method of the LLVM analysis framework to carry out comparison analysis on the intermediate representation types of the two variables, outputting a comparison result if the comparison result is consistent, and executing step 5-3 if the comparison result is inconsistent;
Step 5-3, when the intermediate representation types are determined to include structure types: if the type name information of both structure types is non-empty, removing struct from the structure type names, comparing the structure type name information, and outputting the comparison result if it is consistent; if there is a structure type whose type information is empty, executing step 5-4;
Step 5-4, for the structure type whose type information is empty, querying the structure information from the repair database; if no structure information corresponding to that structure type can be found, considering the comparison result to be inconsistent and outputting it; if the structure information can be found, executing step 5-5;
Step 5-5, if the type information contained in the retrieved structure information is the custom character string, considering the comparison result to be unknown and outputting it; if the type information is not the custom character string, taking it as the type name information of the structure type whose type name information is empty, comparing the type name information, and outputting the comparison result.
CN202210279549.3A 2022-03-21 LLVM (LLVM) -based variable type information restoration and comparison method and system Active CN114610320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210279549.3A CN114610320B (en) 2022-03-21 LLVM (LLVM) -based variable type information restoration and comparison method and system

Publications (2)

Publication Number Publication Date
CN114610320A CN114610320A (en) 2022-06-10
CN114610320B true CN114610320B (en) 2024-06-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant