CN114490327A

CN114490327A - Error detection method and device

Info

Publication number: CN114490327A
Application number: CN202111600097.6A
Authority: CN
Inventors: 李斌; 秦伯钦
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2021-12-24
Filing date: 2021-12-24
Publication date: 2022-05-13

Abstract

The application discloses an error detection method and an error detection device, which are used for improving the accuracy and efficiency of lock-related error detection in a tested program. The method comprises the following steps: determining variables used for representing the holding of the lock by the thread in the middle-level intermediate code of the tested program and a function call graph; the variable is a variable with a life cycle, and the function call graph comprises a call relation among functions in the tested program; determining at least one variable sequence pair by traversing the function call graph, wherein the variable sequence pair comprises two variables; for each variable sequence pair, mapping each variable in the variable sequence pair into corresponding lock information respectively; and determining a lock graph of a processing relation between locks in the tested program based on the information of the locks corresponding to the variables in the variable sequence pair, and determining error information related to the locks in the tested program based on the lock graph.

Description

Error detection method and device

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for error detection.

Background

Blockchain has broad application prospects in security-sensitive fields such as financial services, transportation, and copyright protection, and its security is of particular concern. Currently, blockchains face a large number of security threats, such as smart contract vulnerabilities, 51% attacks, double-spending attacks, denial-of-service attacks, wallet theft through phishing, malicious mining, and so on. Because blockchains are decentralized and tamper-resistant and are often used in fields such as finance, once a security vulnerability appears after deployment, its severity and scope of impact are far greater than in a traditional distributed system; moreover, repairing the vulnerability may cause the blockchain to fork, which greatly increases the difficulty of repair.

Disclosure of Invention

The embodiment of the application provides an error detection method and device, which are used for improving the accuracy and efficiency of lock-related error detection in a tested program.

The error detection method provided by the embodiment of the application comprises the following steps:

determining, in the middle-level intermediate code of the tested program, variables that represent the holding of a lock by a thread, and determining a function call graph; each such variable is a variable with a life cycle, and the function call graph comprises the call relations among functions in the tested program;

determining at least one variable sequence pair by traversing the function call graph, wherein the variable sequence pair comprises two variables; for each variable sequence pair, mapping each variable in the variable sequence pair into information of a corresponding lock object respectively;

and determining a lock graph of a processing relation between locks in the tested program based on the information of the lock objects corresponding to the variables in the variable sequence pair, and determining error information related to the locks in the tested program based on the lock graph.

According to the method, variables used for representing holding of a thread on a lock in middle-level intermediate code of a tested program and a function call graph are determined, and at least one variable sequence pair is determined by traversing the function call graph, wherein the variable sequence pair comprises two variables; for each variable sequence pair, mapping each variable in the variable sequence pair into information of a corresponding lock object respectively; and determining a lock graph of a processing relation between locks in the tested program based on the information of the lock objects corresponding to the variables in the variable sequence pair, and determining error information related to the locks in the tested program based on the lock graph, so that the accuracy and efficiency of error detection related to the locks in the tested program can be improved.

Optionally, determining at least one variable sequence pair by traversing the function call graph specifically includes:

determining a set of the currently live variables by traversing the function call graph, and determining at least one variable sequence pair by using the set of currently live variables.

Optionally, for each variable sequence pair, mapping each variable in the variable sequence pair to information of a corresponding lock object respectively, specifically including:

generating, for each variable, a mapping relation from the variable to the lock object, or a mapping relation from the variable to the structure type in which the lock object is located and the field in which the lock object is located;

and mapping, based on the mapping relation, each variable in each variable sequence pair to the corresponding lock object, or to the structure type and the field in which the corresponding lock object is located.

Optionally, the mapping relationship is generated by:

for each variable, tracking the lock object that generated the variable, starting from the function call statement of the variable;

if the tracked lock object comes from inside the function or from a global variable, recording the mapping relation from the variable to the lock object;

if the tracked lock object comes from a function parameter and the function parameter is a structure, recording the mapping relation from the variable to the structure type in which the lock object is located and the field in which the lock object is located.

Optionally, the lock object of each variable is tracked in the following manner:

for each of the variables:

establishing the life cycle of the variable by using the survival statement and the ending statement of the variable;

establishing a move chain of the variable by using the move statements of the variable according to the life cycle of the variable;

based on the move chain of the variable, tracking the lock object that generated the variable, starting from the function call statement of the variable.

An error detection device provided by an embodiment of the present application includes:

a first unit, configured to determine, in the middle-level intermediate code of a tested program, variables that represent the holding of a lock by a thread, and to determine a function call graph; each such variable is a variable with a life cycle, and the function call graph comprises the call relations among functions in the tested program;

a second unit, configured to determine at least one variable sequence pair by traversing the function call graph, where the variable sequence pair includes two variables; for each variable sequence pair, mapping each variable in the variable sequence pair into information of a corresponding lock object respectively;

a third unit, configured to determine a lock graph of the processing relations among locks in the tested program based on the information of the lock objects corresponding to the variables in the variable sequence pair, and to determine error information related to the locks in the tested program based on the lock graph.

Optionally, the mapping relationship is generated by:

for each of the variables:

Another embodiment of the present application provides a computing device, which includes a memory and a processor, wherein the memory is used for storing program instructions, and the processor is used for calling the program instructions stored in the memory and executing any one of the methods according to the obtained program.

Furthermore, according to an embodiment, for example, a computer program product for a computer is provided, which comprises software code portions for performing the steps of the method as defined above, when said product is run on a computer. The computer program product may include a computer-readable medium having software code portions stored thereon. Further, the computer program product may be directly loaded into an internal memory of the computer and/or transmitted via a network through at least one of an upload process, a download process, and a push process.

Another embodiment of the present application provides a computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform any one of the methods described above.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic specific flow chart of a Bug detection method provided in an embodiment of the present application;

Fig. 2 is a schematic workflow diagram of a constraint solver provided in the embodiment of the present application;

Fig. 3 is a source code diagram of a lock-related blocking Bug in an Ethereum blockchain program according to an embodiment of the present application;

Fig. 4 is a schematic general flow chart of a Bug detection method provided in the embodiment of the present application;

Fig. 5 is a schematic structural diagram of a Bug detection device provided in the embodiment of the present application;

Fig. 6 is a schematic structural diagram of a Bug detection device provided in the embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The embodiment of the application provides an error detection method and device, which are used for improving the accuracy and efficiency of detecting lock-related blocking Bugs in blockchain programs written in Rust.

The method and the device are based on the same application concept, and because the principles of solving the problems of the method and the device are similar, the implementation of the device and the method can be mutually referred, and repeated parts are not repeated.

The terms "first," "second," and the like in the description and in the claims of the embodiments of the application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The following examples and embodiments are to be understood as merely illustrative examples. Although this specification may refer to "an", "one", or "some" example or embodiment(s) in several places, this does not imply that each such reference relates to the same example or embodiment, nor that the feature only applies to a single example or embodiment. Individual features of different embodiments may also be combined to provide other embodiments. Furthermore, terms such as "comprising" and "comprises" should be understood as not limiting the described embodiments to consist of only those features that have been mentioned; such examples and embodiments may also include features, structures, elements, modules, etc. not specifically mentioned.

Various embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the display sequence of the embodiment of the present application only represents the sequence of the embodiment, and does not represent the merits of the technical solutions provided by the embodiments.

Interpretation of terms:

Blockchain: a blockchain is a growing list of records (called blocks) linked together cryptographically. Each block contains the cryptographic hash of the previous block, a corresponding timestamp, and transaction information. A distributed ledger built with blockchain technology lets both parties record transactions efficiently and verify them permanently. Blockchain offers decentralization, openness, autonomy, tamper resistance, anonymity, and so on.

Rust: Rust is an emerging systems programming language, and its safety and efficiency are the main reasons for its popularity in the blockchain field. Rust's safety guarantees ensure that memory Bugs such as use-after-free and concurrency Bugs such as data races do not exist in safe Rust code, but they cannot ensure that a program is free of blocking Bugs.

Blocking Bug: a blocking Bug is a Bug that causes one or more threads in a multi-threaded program to be blocked permanently and unable to proceed. A blocking Bug can cause a blockchain program to hang or even crash and bring huge losses to users. In the most serious case, an attacker can trigger the blocking of a large number of nodes across the whole network by submitting a carefully constructed block, which makes a 51% attack possible and undermines the security of the entire blockchain. In the Rust implementation of the mainstream blockchain Ethereum, blocking Bugs account for 67.5% of the total number of Bugs.

In blockchain programs written in Rust, blocking Bugs are mainly lock-related blocking Bugs, chiefly deadlock-like Bugs.

Since the characteristics and mechanisms of the Rust language differ greatly from those of other programming languages such as C, C++, and Java, conventional Bug detection tools are difficult to apply to blockchain programs written in Rust. The embodiment of the application uses the characteristics and mechanisms of Rust to achieve efficient and accurate Bug detection.

In Rust, each variable has its own lifecycle (Life), which begins when the variable is created; when the lifecycle is over, the variable is automatically released. The locking mechanism of Rust relies on this lifecycle mechanism. When a lock instruction (e.g., mutex.lock()) is executed, a LockGuard object is returned, which represents a hold on the lock. When the lifecycle of the LockGuard object ends, the LockGuard object is released, so the lock is unlocked implicitly without any explicit unlock operation.
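The implicit unlocking described above can be observed with the standard library. The following is a minimal sketch, assuming `std::sync::Mutex` as the lock and its `MutexGuard` as the "LockGuard"; the function name `guard_scope_demo` is hypothetical and not part of the application.

```rust
use std::sync::Mutex;

/// Returns (locked_inside_scope, locked_after_scope) for a demo mutex.
pub fn guard_scope_demo() -> (bool, bool) {
    let mutex = Mutex::new(0_i32);
    let locked_inside;
    {
        // `lock()` returns a guard; holding the guard means holding the lock.
        let mut guard = mutex.lock().unwrap();
        *guard += 1;
        // While the guard is alive, a second acquisition attempt fails.
        locked_inside = mutex.try_lock().is_err();
    } // The guard's scope ends here: the lock is released with no unlock call.
    let locked_after = mutex.try_lock().is_err();
    (locked_inside, locked_after)
}
```

Calling `guard_scope_demo()` reports that the lock is held inside the inner scope and free after it, even though the source contains no unlock statement.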

Therefore, the embodiment of the application extracts the information of the lock object by means of the lifecycle information of the LockGuard object provided by the Rust mid-level intermediate code (MIR), instead of relying on locking and unlocking instructions.

No unlock instruction is called in Rust source code or MIR. Although the final low-level code does contain a call to the operating-system-level unlock instruction, that call is typically reached only through four to five layers of release function calls, and confirming whether a lock instruction and an unlock instruction operate on the same lock (alias locks) requires expensive and imprecise inter-procedural alias analysis. In the prior art, obtaining alias locks requires computing not only the alias information inside each function but also inter-procedural alias information, such as alias information at function entries and exits and the access chains of escaping variables accessed inside functions. In the embodiment of the application, alias locks can be obtained cheaply and efficiently by computing only the alias information inside each function (that is, intra-procedural alias analysis) and relying on Rust's rich type system.

In summary, the present application mainly aims to provide a blocking Bug detection method and a blocking Bug detection tool for blockchain programs written in Rust, and in particular relates to the detection of lock-related blocking Bugs.

The lock-related blocking Bug detection uses the lock mechanism and lifecycle information specific to Rust, avoids the inefficient and imprecise inter-procedural alias analysis of traditional methods, and can construct a lock graph quickly and accurately, thereby achieving efficient and accurate detection of lock-related blocking Bugs.

The tool provided by the embodiment of the application can be integrated into the existing compiling tool, so that the blocking bugs existing in the mainstream block chain program at present can be efficiently detected, the source code positions and the trigger paths of the blocking bugs can be accurately reported, and a developer can be helped to locate and repair the bugs in the development and test stages.

In the embodiment of the present application, a module for implementing lock-related blocking Bug detection is referred to as a lock-related blocking Bug static detector. Specific examples are given below.

Because Rust has no explicit unlocking mechanism and instead unlocks implicitly and automatically through the lifecycle of the LockGuard, there is no unlock statement in the source code or MIR. Traditional deadlock detection algorithms therefore cannot easily work directly on the source code or MIR and must rely on low-level code (such as LLVM IR or assembly). Such methods collect locking and unlocking primitives and judge whether the locks operated on by these primitives are the same; the operating system's unlock statement is reached only through multiple layers of function calls, so they depend on complicated inter-procedural alias analysis, which is inefficient and imprecise.

Rust provides lifecycle (Life) and move statement semantics. Each variable has a lifecycle, and when the lifecycle ends the variable is released automatically. When a variable is moved to a new variable, its ownership is transferred to the new variable and the old variable becomes inaccessible.

Rust also provides an automatic unlocking mechanism, and the lock function returns a LockGuard type variable whose lifecycle is equal to its scope.

The lifecycle exists in Rust's intermediate code MIR in the form of "pseudo-instructions", which are used by the Rust compiler for borrow checking. There are two kinds of pseudo-instruction: StorageLive (the variable becomes live) and StorageDead (the variable ends). Before the lock function is called, the compiler inserts a StorageLive(LockGuard) statement. If the LockGuard is moved into a sub-function, it leaves the current scope; when the sub-function call returns, its scope has ended, and a StorageDead(LockGuard) statement is inserted at that point. The embodiment of the application can therefore tell whether the lock has been released without tracing into the sub-function, which avoids inter-procedural tracking from locking to unlocking.

Further, since a LockGuard can only be moved and not copied, the embodiment of the application does not need to worry about aliasing of LockGuards: two LockGuards never have access to the same underlying data at the same time. Alias analysis of LockGuards is thus avoided entirely and degenerates into tracking a well-defined move chain, which gives the exact move relationships among LockGuards. An alias exists when a data location in memory can be accessed through multiple symbolic names in the program. If a LockGuard could be copied, multiple LockGuards would access the same underlying data, which would create an alias problem. Because a LockGuard can only be moved, only one such LockGuard exists at any time, which avoids the alias problem.
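The move-only behaviour can be illustrated with the standard library. This is a minimal sketch, assuming `std::sync::MutexGuard` plays the role of the LockGuard; the functions `consume` and `move_demo` are hypothetical names introduced for illustration.

```rust
use std::sync::{Mutex, MutexGuard};

/// Takes ownership of the guard; the lock is released when `guard` is
/// dropped at the end of this function.
fn consume(mut guard: MutexGuard<'_, Vec<u32>>) -> usize {
    guard.push(42);
    guard.len()
}

/// Returns the vector length seen by `consume` and whether the lock is
/// free again after the call.
pub fn move_demo() -> (usize, bool) {
    let mutex = Mutex::new(Vec::new());
    let guard = mutex.lock().unwrap();
    // The guard is moved, not copied: `guard` is unusable after this call,
    // so there is never a second handle to the same held lock.
    let len = consume(guard);
    // `consume` has returned, so the moved guard was dropped inside it.
    let released = mutex.try_lock().is_ok();
    (len, released)
}
```

Because the guard has a single owner at every point, following its moves is enough to know where the lock is held; no copy of the guard can keep the lock alive elsewhere.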

Finally, when tracking from a LockGuard to its Lock, if the Lock is located in the same function as the LockGuard or is a global variable, the embodiment of the application can find the Lock directly by using intra-procedural alias analysis. Otherwise, the Lock comes from a function parameter; in that case a heuristic is used: the structure (struct) type of the function parameter and the field where the Lock is located are recorded and used as the basis for judging aliases. On the blockchain programs used as experimental subjects, this method achieves an accuracy of more than 98 percent. The specific explanation is as follows:

Alias analysis analyzes alias information in a program, typically to find out which pointers point to the same memory address. Intra-procedural alias analysis considers only variables within a single function; inter-procedural alias analysis must analyze aliases across functions and is therefore more complex. A LockGuard is obtained by calling the lock function on a Lock object. The embodiment of the application needs to trace each LockGuard back to its Lock object, and this falls into three cases:

Case one: the Lock object and the LockGuard are variables in the same function;

Case two: the Lock object is a global variable;

Case three: the Lock object is derived from a function parameter.

For the first two cases, the embodiment of the present application can find the corresponding Lock object directly from the LockGuard by using intra-procedural alias analysis.

For the third case, inter-procedural alias analysis could be used to find the corresponding Lock object. However, extensive observation shows that the function parameter from which the Lock object originates is usually of a structure type, with the Lock object being one of its fields. Therefore, Lock objects can be considered the same whenever the structure type and the field number are the same. The embodiment of the present application therefore records the structure type and the field number to represent the Lock object, without needing to find the Lock object itself. This greatly improves the precision and efficiency of detecting lock-related blocking Bugs in blockchain programs written in Rust.

Thus, referring to fig. 1, the lock-related blocking Bug static detection procedure provided by the embodiment of the present application is as follows:

S101, compiling to obtain the mid-level intermediate code (MIR) of the Rust program.

The Rust program is the tested program. The embodiment of the present application is described by taking a Rust program as the tested program, but is not limited thereto.

Specifically, an API provided by the Rust compiler is called to obtain the mid-level intermediate code (MIR) of the Rust program, where the MIR consists of a series of functions.

S102, traversing all functions and local variables thereof in MIR, obtaining all variables of the LockGuard type, and generating a function call graph (Callgraph). The Callgraph includes the calling relationship between functions in the Rust program.

Specifically, for each function in the MIR, all the local variables it contains are traversed, and if the type of the local variable is LockGuard, it is recorded.

Any function in the MIR may contain call instructions that call other functions. Connecting each calling function to each function it calls with a directed edge, for all functions in the MIR, yields the call graph (Callgraph) of all functions in the MIR. In a subsequent step (S108), the Callgraph is traversed so that lock-related blocking Bugs can be detected across functions.

It should be noted that the functions in the Callgraph do not include functions that create threads. After the Callgraph is generated, the functions that create threads (e.g., thread::spawn) need to be deleted from the Callgraph to ensure that the detection of lock-related blocking Bugs takes place within a single thread.
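As an illustration of the Callgraph construction, the sketch below assumes the call instructions have already been extracted from the MIR as (caller, callee) name pairs. The names `CallGraph` and `build_call_graph` are hypothetical, and calls to thread-creating functions are filtered out while the graph is built rather than deleted afterwards, which is a simplification of the step described above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Call graph: caller name -> set of callee names.
pub type CallGraph = BTreeMap<String, BTreeSet<String>>;

/// Builds a call graph from (caller, callee) pairs, skipping calls to
/// thread-creating functions so that analysis stays within one thread.
pub fn build_call_graph(calls: &[(&str, &str)], spawn_fns: &[&str]) -> CallGraph {
    let mut graph = CallGraph::new();
    for &(caller, callee) in calls {
        // Every caller gets a node, even if all its calls are filtered out.
        let callees = graph.entry(caller.to_string()).or_default();
        if !spawn_fns.contains(&callee) {
            callees.insert(callee.to_string());
        }
    }
    graph
}
```

For example, given the calls `main -> foo`, `main -> std::thread::spawn`, and `foo -> bar`, with `std::thread::spawn` listed as a thread-creating function, the resulting graph keeps only the edges `main -> foo` and `foo -> bar`.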

S103, using the Def-Use Chain provided by the MIR, obtaining the following four kinds of statements for each LockGuard variable: move, StorageLive, StorageDead, and the function call that returns the variable.

The Def-Use Chain is a definition-use chain, consisting of the definition statement of a variable and the statements that use the variable and can be reached directly from that definition.

Specifically, every variable necessarily has a StorageLive and a StorageDead statement, whereas a move statement and a function call statement returning the variable are not necessarily present. For each LockGuard variable, its Def-Use Chain is traversed, and if all four kinds of statements exist, they are recorded for that LockGuard variable.

S104, for each LockGuard variable for which the four statements have been recorded, establishing the life cycle range of the LockGuard variable by using its StorageLive and StorageDead statements.

For each LockGuard variable, the life cycle starts at the StorageLive statement of the variable and ends at its StorageDead statement; all statements between the two constitute the life cycle of the LockGuard variable.

S105, for each LockGuard variable for which the four statements have been recorded, establishing the move chain of the LockGuard variable by using its life cycle and its move statements.

Establishing the move chain of the LockGuard variables means merging the life cycles of LockGuards that share a move statement into a single LockGuard; that is, the first LockGuard on the move chain represents all LockGuards on the chain. For example, if the move statement of LockGuard A is the same as the move statement of LockGuard B, the life cycles of LockGuard A and LockGuard B are merged and the two are treated as one LockGuard.
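The merging of a move chain into its first LockGuard might be sketched as follows. The sketch assumes that each LockGuard local is identified by a `u32` and that each move statement is given as a `(from, to)` pair; both the representation and the names `chain_head` and `merge_move_chains` are hypothetical. Merging of the life cycle ranges themselves is not modeled.

```rust
use std::collections::HashMap;

/// Follows `moved_from` links back to the first guard on the move chain.
pub fn chain_head(moved_from: &HashMap<u32, u32>, mut guard: u32) -> u32 {
    // Each guard is moved from at most one source, so the walk terminates
    // as long as the recorded moves contain no cycle (assumed here).
    while let Some(&src) = moved_from.get(&guard) {
        guard = src;
    }
    guard
}

/// Maps every guard to its chain head, given `(from, to)` move statements.
pub fn merge_move_chains(guards: &[u32], moves: &[(u32, u32)]) -> HashMap<u32, u32> {
    let moved_from: HashMap<u32, u32> =
        moves.iter().map(|&(from, to)| (to, from)).collect();
    guards
        .iter()
        .map(|&g| (g, chain_head(&moved_from, g)))
        .collect()
}
```

With guards 1 to 4 and moves `1 -> 2` and `2 -> 3`, guards 1, 2, and 3 are all represented by guard 1, while guard 4 represents itself.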

S106, using the move chain of each LockGuard variable and intra-procedural alias analysis, tracking the Lock object that generated the LockGuard, starting from the function call statement of the LockGuard variable.

S107, if the tracked Lock object comes from inside the function or from a global variable, recording the mapping relation from the LockGuard to the Lock object; if the tracked Lock object comes from a function parameter and the parameter is a structure, recording the mapping relation from the LockGuard to the structure type and the field where the Lock object is located.

For example:

Case one:

The Lock object comes from inside the function. That is, the LockGuard is traced to its Lock object, and if the Lock object is found to be located in the same function foo as the LockGuard, the entry LockGuard -> local variable Lock is recorded in the mapping table (Map).

Case two:

If the Lock object comes from a global variable, the entry LockGuard -> global variable Lock is recorded in the Map.

Case three:

If the Lock object comes from the function parameter b and the parameter is of the structure type Bar, the entry LockGuard -> (structure type Bar, field 0) is recorded in the Map.
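The three kinds of Map entry can be pictured with a small data structure. This is a sketch under assumptions: guards are identified by `u32`, and the names `LockId`, `example_map`, and `GLOBAL_LOCK` are hypothetical; only `foo`, `Bar`, and field 0 come from the example above.

```rust
use std::collections::HashMap;

/// How a lock is identified once a guard has been traced back to it.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum LockId {
    /// Case one: the lock is a local variable of the guard's own function.
    Local { function: String, local: String },
    /// Case two: the lock is a global variable.
    Global(String),
    /// Case three: the lock is a field of a struct-typed parameter; the
    /// (struct type, field index) pair stands in for the lock object.
    StructField { struct_type: String, field: usize },
}

/// Builds an example mapping table from guard ids to lock identities.
pub fn example_map() -> HashMap<u32, LockId> {
    let mut map = HashMap::new();
    map.insert(1, LockId::Local { function: "foo".into(), local: "lock".into() });
    map.insert(2, LockId::Global("GLOBAL_LOCK".into()));
    // Guards 3 and 4 are taken in different functions, but both come from
    // field 0 of a parameter of type `Bar`, so they map to the same identity.
    map.insert(3, LockId::StructField { struct_type: "Bar".into(), field: 0 });
    map.insert(4, LockId::StructField { struct_type: "Bar".into(), field: 0 });
    map
}
```

Comparing the entries for guards 3 and 4 yields equality, which is exactly the heuristic: the same structure type and field number are treated as the same lock without locating the lock object itself.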

And S108, determining at least one LockGuard sequence pair by traversing the function call graph.

A pair is a container holding two elements, in which the positions of the first element and the second element are fixed and cannot be exchanged. A LockGuard sequence pair thus contains two LockGuard variables in a fixed order.

Specifically: starting from the entry function, the Callgraph is searched depth-first, and the set of currently live LockGuard variables is recorded at the beginning and the end of each basic block (BasicBlock). At each function call (except calls to functions that return a LockGuard), the set of live LockGuard variables is passed to the called function. Whenever a StorageLive statement of a LockGuard is encountered, LockGuard sequence pairs (live LockGuard, this LockGuard) are recorded, one for each LockGuard in the current live set, and this LockGuard is then added to the live set; whenever a StorageDead statement of a LockGuard is encountered, that LockGuard is removed from the live set. That is, each LockGuard sequence pair consists of a LockGuard variable that was already live and the LockGuard variable that has newly become live.

How the set of live LockGuard variables and the LockGuard sequence pairs are determined is illustrated by the following example:

At the start and end positions of every basic block (denoted B) of every function (denoted F), a set of currently live LockGuard variables is maintained; the two sets are denoted BEFORE[F][B] and AFTER[F][B] respectively and are initialized to the empty set. All [F][B] are added to a queue Q and processed as follows until the queue Q is empty or no BEFORE or AFTER set changes any more:

Step one: pop the first basic block [F][B] from the queue Q, and merge BEFORE[F][B] into AFTER[F][B].

Step two: process the basic block [F][B] as follows:

if [F][B] contains the StorageDead statement of a LockGuard, remove that LockGuard from AFTER[F][B];

if [F][B] contains the StorageLive statement of a LockGuard, establish an ordered pair (each LockGuard in AFTER[F][B], current LockGuard) for each LockGuard in AFTER[F][B], and add the current LockGuard to AFTER[F][B].

Step three, finding all basic blocks P of [ F ] [ B ] which need subsequent processing, wherein the method comprises the following steps:

if the end of B is a function call, finding all called functions F 'at [ F ] [ B ] according to Callgraph, and taking the first basic block [ F' ] [ B '] of the functions F' as P;

if the end of B is function return, finding out all positions (F ') [ B') ] of called F according to Callgraph, and taking the direct successor basic block of [ F '] [ B' ]asP;

if the end of B is neither a function call nor a function return, then the immediate successor basic block of B is taken as P. With respect to the immediate successor: the end of each basic block B may point to 0, 1 or more basic blocks, the pointed basic block being the direct successor of B.

Step four, if the AFTER [ F ] [ B ] is changed AFTER the step two, all SUCCs of the B are added into the queue Q, the AFTER [ F ] [ B ] is merged into the BEFORE [ F ] [ P ], and the step one is returned.
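As a minimal sketch of steps one to four, the following Python fragment runs the queue-based procedure over a toy program. The program encoding (nested dictionaries, the "live"/"dead" statement tags, the "call"/"return"/"goto" terminators) and the names find_sequence_pairs and PROGRAM are assumptions made for illustration only; an actual implementation would read these facts from the MIR and the Callgraph.

```python
from collections import deque


def find_sequence_pairs(program):
    """Return the set of (already held guard, newly created guard) pairs."""
    keys = [(f, b) for f, blocks in program.items() for b in blocks]
    before = {k: set() for k in keys}
    after = {k: set() for k in keys}

    # Reverse call graph: callee -> [(calling function, calling block)].
    call_sites = {}
    for f, b in keys:
        term = program[f][b]["term"]
        if term[0] == "call":
            call_sites.setdefault(term[1], []).append((f, b))

    def successors(f, b):
        # Step three: blocks that must be processed after [F][B].
        term = program[f][b]["term"]
        if term[0] == "call":      # first block of the called function
            callee = term[1]
            return [(callee, next(iter(program[callee])))]
        if term[0] == "return":    # block following every call site of f
            return [(cf, program[cf][cb]["term"][2])
                    for cf, cb in call_sites.get(f, [])]
        return [(f, target) for target in term[1]]  # plain jump or branch

    pairs = set()
    queue = deque(keys)
    while queue:
        f, b = queue.popleft()
        live = set(before[(f, b)])                  # step one
        for kind, guard in program[f][b]["stmts"]:  # step two
            if kind == "dead":
                live.discard(guard)
            else:  # "live": pair every held guard with the new one
                pairs.update((held, guard) for held in live)
                live.add(guard)
        if live != after[(f, b)]:                   # step four
            after[(f, b)] = live
            for succ in successors(f, b):
                before[succ] |= live
                queue.append(succ)
    return pairs


# Hypothetical toy program: outer() holds guard_a while calling inner(),
# and inner() creates and releases guard_b.
PROGRAM = {
    "outer": {
        "bb0": {"stmts": [("live", "guard_a")], "term": ("call", "inner", "bb1")},
        "bb1": {"stmts": [("dead", "guard_a")], "term": ("return",)},
    },
    "inner": {
        "bb0": {"stmts": [("live", "guard_b"), ("dead", "guard_b")],
                "term": ("return",)},
    },
}

print(find_sequence_pairs(PROGRAM))
```

In this sketch AFTER is recomputed from BEFORE on every visit rather than updated in place; because BEFORE sets only grow and the set of LockGuard variables is finite, the loop terminates.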

When the constraint solver is enabled, the path and the condition set of each LockGuard sequence pair are also recorded, that is, which paths lead from the first variable of the sequence pair to the second variable, and which branch conditions lie on those paths.

The depth-first search uses the Depth-First Search (DFS) algorithm, an algorithm for traversing or searching a tree or a graph. The algorithm explores each branch as deeply as possible. When all edges of a node v have been explored, the search backtracks to the node from which v was discovered. This process continues until all nodes reachable from the source node have been discovered.
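For illustration, a depth-first traversal of a call graph can be sketched as follows. The adjacency-list encoding and the names dfs_order and CALL_GRAPH are hypothetical and are not part of the application.

```python
def dfs_order(call_graph, entry):
    """Return the functions reachable from entry in depth-first visiting order."""
    visited = set()
    order = []

    def visit(function):
        if function in visited:
            return
        visited.add(function)
        order.append(function)
        # Go as deep as possible along each callee before backtracking.
        for callee in call_graph.get(function, []):
            visit(callee)

    visit(entry)
    return order


# Hypothetical call graph: main calls a and b, and both call c.
CALL_GRAPH = {"main": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}

print(dfs_order(CALL_GRAPH, "main"))
```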

A basic block (Basic Block) is a straight-line code sequence with no branches in except at its entry and no branches out except at its exit. This restricted form makes basic blocks very easy to analyze.

The constraint solver (SMT solver) is a tool for determining whether a mathematical formula has a solution.

A path may contain branch statements, each of which has multiple BasicBlocks as successors. The branch condition is contained in the branch statement and decides which BasicBlock is selected as the successor during actual execution.

S109, based on the mapping relationship determined in step S107, obtain a lock graph by mapping the LockGuard in each LockGuard sequence pair to the corresponding Lock object, or to the type and field of the structure in which the corresponding Lock object is located; and, based on the lock graph, determine an information report of the lock-related blocking Bugs.

The lock graph reflects the order in which locks are processed: if the lock graph contains a path that starts from a lock and eventually points back to the same lock, a deadlock exists, that is, a lock-related blocking Bug exists.

Specifically: when the constraint solver is enabled, it is determined whether the conditions of each LockGuard sequence pair can be satisfied. If they cannot, the path of the sequence pair is unreachable and the sequence pair is deleted; if they can, the path of the sequence pair is reachable. Then, the LockGuard in each remaining LockGuard sequence pair is mapped to the corresponding Lock (or structure type and field), which yields the lock graph.

Here, the conditions of a LockGuard sequence pair are the condition set recorded for that sequence pair in step S108. The conjunction of all elements of the condition set gives a mathematical formula, and the constraint solver determines whether this formula has a solution. If it has, the conditions of the LockGuard sequence pair can be satisfied; otherwise, they cannot.

In step S107, a mapping relationship from LockGuard to Lock (or structure type + field) is recorded, and this mapping relationship is used to replace the LockGuard in each sequence pair with the mapped Lock.

For example: if LockGuard1 maps to Lock1 and LockGuard2 maps to Lock2, the sequence pair (LockGuard1, LockGuard2) maps to (Lock1, Lock2), and a directed edge from Lock1 to Lock2 is established. All LockGuard sequence pairs are processed in this way, and the resulting directed edges form a directed graph over Locks, namely the lock graph.

As another example, if the sequence pair (LockGuard1, LockGuard2) is mapped to (structType1 + Field1, structType2 + Field2), a directed edge is created from structType1 + Field1 to structType2 + Field2. Here, structType denotes a structure type and Field denotes a field.
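The replacement of LockGuards by their mapped Locks can be sketched as follows. The function name build_lock_graph and the representation of a lock as a (structure type, field) tuple are assumptions made for this illustration.

```python
def build_lock_graph(sequence_pairs, guard_to_lock):
    """Map each (guard, guard) pair to a directed edge between lock identities."""
    graph = {}
    for first_guard, second_guard in sequence_pairs:
        source = guard_to_lock[first_guard]
        target = guard_to_lock[second_guard]
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())  # every lock appears as a node
    return graph


# Hypothetical mapping recorded in step S107.
GUARD_TO_LOCK = {
    "LockGuard1": ("structType1", "Field1"),
    "LockGuard2": ("structType2", "Field2"),
}

lock_graph = build_lock_graph([("LockGuard1", "LockGuard2")], GUARD_TO_LOCK)
print(lock_graph)
```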

After the lock graph is obtained, a traditional deadlock detection algorithm based on cycle detection can be used to find lock-related blocking Bugs on the lock graph. Finally, a Bug information report is produced, which includes the source code positions of the LockGuard sequence pairs, the blocking trigger paths, the conditions, and so on.

In step S108, the LockGuard sequence pairs are obtained. For each LockGuard variable, the StorageLive statement recorded in step S104 gives the source code position of that StorageLive statement, which is the source code position of the LockGuard variable. The source code positions of a LockGuard sequence pair are the source code positions of the two LockGuard variables in the pair.

The path of the LockGuard sequence pair recorded in step S108 is the blocking trigger path in the Bug information report. The condition set of the LockGuard sequence pair gives the conditions in the Bug information report.
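One common cycle-detection approach is a depth-first search that marks the nodes on the current search path; the application does not specify which algorithm is used, so the following is only a sketch. The name find_cycle and the graph encoding (every lock is a key and maps to its successor locks) are assumptions.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for successor in graph[node]:
            if color[successor] == GREY:
                # Back edge: the cycle is the path segment from successor.
                return path[path.index(successor):] + [successor]
            if color[successor] == WHITE:
                found = visit(successor)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_cycle({"Lock1": ["Lock2"], "Lock2": ["Lock1"]}))
```

A self-loop, such as an edge from (Miner, 0) to itself, is reported as a cycle of length one, which corresponds to acquiring the same lock twice.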

For example, as shown in FIG. 2, for the LockGuard sequence pair (A, B), all possible execution paths from A to B are traversed, the condition of each branch jump on a path is collected, and the conditions are input to a constraint solver (e.g., z3) to determine whether the path is reachable. If the constraint solver finds that all conditions of a path can be satisfied together, the LockGuard sequence pair is reachable along that path; if there is no solution, the LockGuard sequence pair is unreachable along that path. For example, assume that two execution paths exist from A to B. The first execution path, for example the path formed by branch 1 to branch n in FIG. 2, has the conditions a > 1, b < 2 and a = 3; one solution satisfying all of these conditions is a = 3, b = 1, so the constraint solver can compute and output a solution. The second execution path, for example the path formed by branch 1' to branch n' in FIG. 2, has the conditions a < 1, b > 3 and a > 2; no solution satisfies all of these conditions, so the constraint solver reports that the conditions cannot be satisfied.
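The two condition sets of this example can be checked with a small stand-in for the constraint solver. The sketch below does not use z3; it simply enumerates integer assignments in a small, arbitrarily chosen range, so unlike a real SMT solver it can only report unsatisfiability within that range. The names find_solution, PATH_1 and PATH_2 are hypothetical.

```python
from itertools import product


def find_solution(conditions, variables, domain=range(-10, 11)):
    """Return an assignment satisfying every condition, or None if none is found."""
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # The conjunction of all branch conditions must hold.
        if all(condition(assignment) for condition in conditions):
            return assignment
    return None


# Conditions collected on the first path: a > 1, b < 2, a = 3.
PATH_1 = [lambda v: v["a"] > 1, lambda v: v["b"] < 2, lambda v: v["a"] == 3]
# Conditions collected on the second path: a < 1, b > 3, a > 2.
PATH_2 = [lambda v: v["a"] < 1, lambda v: v["b"] > 3, lambda v: v["a"] > 2]

print(find_solution(PATH_1, ["a", "b"]))  # some assignment with a = 3 and b < 2
print(find_solution(PATH_2, ["a", "b"]))  # None: a < 1 and a > 2 conflict
```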

Through path traversal, all execution paths of a LockGuard sequence pair can be exhausted, so the method is complete. The constraint solver outputs a solution only when it finds one that satisfies the conditions, in which case the path of the LockGuard sequence pair is necessarily reachable, so the method is sound.

To further improve detection efficiency or to provide more detailed information to the user, in practical applications the embodiment of the present application may adopt a simplified or a complex configuration for the path traversal and solving module. In the simplified configuration, the constraint solving function may be turned off; the advantage is higher detection efficiency while potential blocking logic errors can still be found, and the disadvantage is possible false positives. In the complex configuration, the constraint solver may be made to output the set of all solutions that satisfy the conditions; the advantage is that detailed blocking condition information is provided to the user, which helps the user avoid all blocking conditions when repairing the blocking Bug, and the disadvantage is potentially high overhead.

For example, the source code of a lock-related blocking Bug in an Ethereum program is shown in FIG. 3.

The life cycle of the LockGuard on line 6 spans from line 6 to line 15. When the condition on line 7 is triggered, line 9 is entered and a new LockGuard is created there, while the life cycle of the LockGuard on line 6 has not yet ended. The LockGuards on line 6 and line 9 are therefore recorded, in that order, as a LockGuard sequence pair.

Since line 9 can be reached from line 6 only through the branch on line 7, path traversal finds only one execution path for this pair, and the path has only one condition, namely that gas_priority is Fixed. The constraint solver finds that this condition can be satisfied (because no other condition conflicts with it), so the execution path is judged to be reachable.

By tracking the source of each LockGuard, it is found that the Locks generating the two LockGuards both come from the first field of the Miner structure, that is, the lock graph contains a ring pointing to the node itself, so the lock-related blocking Bug is successfully found in the given blockchain program. A specific example is as follows:

The following is part of the MIR (simplified here) that the compiler produces for the above code:

let _0: &Miner;

let _3: MutexGuard<U256>;

_1 = _0;

_2 = &(*_1).0;

_3 = lock(&_2);

Here, only the statements relevant to LockGuard1 on line 6 are listed. _3 is of LockGuard type (that is, LockGuard1), and the function call that creates it takes _2 as its argument; intra-procedural alias analysis then shows that _2 is an alias of the first field of _1, and that _1 and _0 are aliases. _0 is a function parameter that points to the type Miner, which is a struct. LockGuard1 is therefore mapped to (Miner, field 0). LockGuard2 on line 9 is mapped to (Miner, field 0) in the same way. The original edge from LockGuard1 to LockGuard2 thus becomes an edge from (Miner, 0) to (Miner, 0), that is, the lock graph contains a ring pointing to the node itself, so the lock-related blocking Bug is successfully found in the given blockchain program.
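The backward tracing from _3 to (Miner, field 0) can be sketched as follows. The dictionary encoding of the simplified MIR statements, the operation tags "copy", "field_ref" and "lock", and the function name resolve_lock are all assumptions made for this illustration and do not reflect an actual compiler interface.

```python
# Hypothetical encoding of the simplified MIR listed above.
TYPES = {"_0": "&Miner", "_3": "MutexGuard<U256>"}
ASSIGNMENTS = {
    "_1": ("copy", "_0"),          # _1 = _0
    "_2": ("field_ref", "_1", 0),  # _2 = &(*_1).0
    "_3": ("lock", "_2"),          # _3 = lock(&_2)
}


def resolve_lock(guard, assignments, types):
    """Trace a guard back to (structure type, field index) of its lock."""
    operation = assignments[guard]
    if operation[0] != "lock":
        raise ValueError(f"{guard} is not created by a lock call")
    place = operation[1]
    # Skip plain copies until the field reference is reached.
    while assignments[place][0] == "copy":
        place = assignments[place][1]
    kind, base, field = assignments[place]
    if kind != "field_ref":
        raise ValueError(f"{place} is not a field reference")
    # Follow aliases of the base until a variable with a declared type.
    while base in assignments and assignments[base][0] == "copy":
        base = assignments[base][1]
    return types[base].lstrip("&"), field


print(resolve_lock("_3", ASSIGNMENTS, TYPES))
```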

Finally, the Bug information is reported to the user, as shown in Table 1. The Bug information is formatted as a list containing at least one record; each record holds the information of one LockGuard sequence pair and includes the following contents:

a list of the source code positions of the LockGuard sequence pair, each entry containing: file name, function name, starting line number, starting column number, ending line number and ending column number;

a list of the blocking trigger path, each entry containing: file name, function name and line number;

a list of conditions, each entry containing: file name, function name, starting line number, starting column number, ending line number, ending column number and variable name.

This information can help developers to quickly locate and repair blocking bugs during the development phase.

Table 1: one record in the lock-related blocking Bug information report format:

Here, minute.rs is the file name and set_minute_gas_price is the function name; the numbers in 6:10:6:30 are the starting line number, starting column number, ending line number and ending column number respectively, and the same applies to 9:11:9:31. Gas_primer in the third row is the variable name, and Fixed is a value.
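As an illustration of the position format, a string such as 6:10:6:30 can be split into its four components as follows. The type name SourceSpan and the function name parse_span are hypothetical.

```python
from typing import NamedTuple


class SourceSpan(NamedTuple):
    start_line: int
    start_column: int
    end_line: int
    end_column: int


def parse_span(text):
    """Parse 'start_line:start_column:end_line:end_column' into a SourceSpan."""
    parts = [int(part) for part in text.split(":")]
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {text!r}")
    return SourceSpan(*parts)


print(parse_span("6:10:6:30"))
```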

In summary, the embodiment of the present application exploits two language facilities, namely the life cycle information specific to MIR, the intermediate code produced during Rust compilation, and Rust's RAII-based automatic unlocking mechanism. As a result, costly and imprecise inter-procedural alias analysis is completely avoided when generating the lock graph, which improves the efficiency and accuracy of static detection of lock-related blocking Bugs in Rust.

Traditional lock-related blocking Bug detection requires tracking explicit unlock primitives and performing complex inter-procedural alias analysis, which is not suitable for blockchain programs written in Rust. By using the life cycle and lock mechanism of Rust, the present application generates the lock graph without inter-procedural alias analysis, which greatly improves the accuracy and efficiency of static detection of lock-related blocking Bugs in blockchain programs written in Rust. The static detection algorithm can complete detection of each of the mainstream blockchain programs such as Ethereum, Solana and Polkadot within 30 min and report the Bug information, with an accuracy of more than 98%. In other methods, the inter-procedural alias analysis step alone takes hours, and it is difficult to find a tool capable of detecting deadlock Bugs in blockchain programs written in Rust.

The blockchain blocking Bug information format provided by the embodiment of the present application is detailed, comprehensive and user-friendly, and is independent of the implementation language and design architecture of the blockchain. According to developer feedback, it effectively helps developers quickly locate and repair blocking Bugs in the development stage.

In summary, referring to fig. 4, an error detection method provided in the embodiment of the present application includes:

S401, determining variables used for representing the holding of the lock by the thread in the middle-level intermediate code of the tested program and a function call graph; the variable is a variable with a life cycle, and the function call graph comprises a call relation among functions in the tested program;

the variable used for representing the holding of the lock by the thread is, for example, the LockGuard variable.

S402, determining at least one variable sequence pair by traversing the function call graph, wherein the variable sequence pair comprises two variables; for each variable sequence pair, mapping each variable in the variable sequence pair into information of a corresponding lock object respectively;

S403, determining a lock graph of a processing relation between locks in the tested program based on information of lock objects corresponding to the variables in the variable sequence pair, and determining error information related to the locks in the tested program based on the lock graph.

Optionally, the mapping relationship is generated by:

for each of the variables:

establishing a moving chain (that is, a move chain) of the variable by using the move statements of the variable according to the life cycle of the variable;

The following describes an apparatus or device provided in the embodiments of the present application, where technical features the same as or corresponding to those described in the above methods are explained or illustrated, and are not further described later.

Referring to fig. 5, an error detection apparatus provided in an embodiment of the present application includes:

a first unit 11, configured to determine a variable used for representing the holding of a lock by a thread in the middle-level intermediate code of the tested program, and a function call graph; the variable is a variable with a life cycle, and the function call graph comprises a call relation among functions in the tested program;

a second unit 12, configured to determine at least one variable sequence pair by traversing the function call graph, where the variable sequence pair includes two variables; for each variable sequence pair, mapping each variable in the variable sequence pair into information of a corresponding lock object respectively;

a third unit 13, configured to determine a lock graph of a processing relation between locks in the tested program based on information of lock objects corresponding to the variables in the variable sequence pair, and determine error information related to the locks in the tested program based on the lock graph.

and determining a set of the variables which are currently alive by traversing the function call graph, and determining at least one sequence pair of the variables by utilizing the set of the variables which are currently alive.

Optionally, the mapping relationship is generated by:

for each of the variables:

It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The embodiment of the present application provides a computing device, which may specifically be a desktop computer, a portable computer, a smart phone, a tablet computer, a Personal Digital Assistant (PDA), and the like. The computing device may include a Central Processing Unit (CPU), memory, input/output devices, etc., the input devices may include a keyboard, mouse, touch screen, etc., and the output devices may include a Display device, such as a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), etc.

The memory may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides the processor with program instructions and data stored in the memory. In the embodiments of the present application, the memory may be used for storing a program of any one of the methods provided by the embodiments of the present application.

The processor is used for executing any one of the methods provided by the embodiment of the application according to the obtained program instructions by calling the program instructions stored in the memory.

For example, referring to fig. 6, another error detection apparatus provided in an embodiment of the present application includes:

a memory 620 for storing program instructions;

a processor 600, configured to call the program instructions stored in the memory, and execute, according to the obtained program:

Optionally, the mapping relationship is generated by:

Optionally, the lock object of each variable is tracked by specifically using the following method:

for each of the variables:

based on the moving chain of the variable, starting from the function call statement of the variable, the lock object that generated the variable is tracked.

A transceiver 610 (optional) for receiving and transmitting data under the control of the processor 600.

Where in fig. 6, the bus architecture may include any number of interconnected buses and bridges, with various circuits being linked together, particularly one or more processors represented by processor 600 and memory represented by memory 620. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface. The transceiver 610 may be a number of elements including a transmitter and a receiver that provide a means for communicating with various other apparatus over a transmission medium. For different user devices, the user interface 630 may also be an interface capable of interfacing with a desired device externally, including but not limited to a keypad, display, speaker, microphone, joystick, etc.

The processor 600 is responsible for managing the bus architecture and general processing, and the memory 620 may store data used by the processor 600 in performing operations.

Alternatively, the processor 600 may be a CPU (central processing unit), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a CPLD (Complex Programmable Logic Device).

Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method of any of the above embodiments. The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Embodiments of the present application provide a computer-readable storage medium for storing computer program instructions for an apparatus provided in the embodiments of the present application, which includes a program for executing any one of the methods provided in the embodiments of the present application. The computer-readable storage medium may be a non-transitory computer-readable medium.

The computer-readable storage medium can be any available medium or data storage device that can be accessed by a computer, including but not limited to magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.

It should be understood that:

Embodiments suitable for implementation as software code or as part thereof and for operation using a processor or processing functionality are software code independent and may be specified using any known or future developed programming language, such as a high-level programming language, for example Objective-C, C, C++, C#, Java, Python, JavaScript or another scripting language, or a low-level programming language, such as machine language or assembler.

The implementation of the embodiments is hardware independent and may be implemented using any known or future developed hardware technology or any mixture thereof, such as a microprocessor or CPU (central processing unit), MOS (metal oxide semiconductor), CMOS (complementary MOS), BiMOS (bipolar MOS), BiCMOS (bipolar CMOS), ECL (emitter coupled logic) and/or TTL (transistor-transistor logic).

Embodiments may be implemented as separate devices, apparatus, units, components or functions, or in a distributed fashion where, for example, one or more processors or processing functions may be used or shared in a process, or one or more processing segments or processing portions may be used and shared in a process, where one physical processor or more than one physical processor may be used to implement one or more processing portions dedicated to a particular process as described.

The apparatus may be implemented by a semiconductor chip, a chipset, or a (hardware) module comprising such a chip or chipset.

Embodiments may also be implemented as any combination of hardware and software, such as an ASIC (application specific IC (integrated circuit)) component, FPGA (field programmable gate array) or CPLD (complex programmable logic device) component, or DSP (digital signal processor) component.

Embodiments may also be implemented as a computer program product, comprising a computer usable medium having a computer readable program code embodied therein, the computer readable program code adapted to perform a process as described in the embodiments, wherein the computer usable medium may be a non-transitory medium.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method of error detection, the method comprising:

2. The method according to claim 1, wherein determining at least one variable order pair by traversing the function call graph comprises:

3. The method according to claim 1, wherein for each variable-ordered pair, mapping each variable in the variable-ordered pair to information of a corresponding lock object respectively comprises:

4. The method of claim 3, wherein the mapping relationship is generated by:

5. The method of claim 4, wherein the lock object of each variable is tracked by:

for each of the variables:

6. An error detection apparatus, comprising:

a first unit, configured to determine variables used for representing the holding of the lock by the thread in the middle-level intermediate code of the tested program and a function call graph; the variable is a variable with a life cycle, and the function call graph comprises a call relation among functions in the tested program;

7. The apparatus of claim 6, wherein determining at least one variable order pair by traversing the function call graph comprises:

8. The apparatus according to claim 6, wherein for each variable-ordered pair, mapping each variable in the variable-ordered pair to information of a corresponding lock object respectively comprises:

9. The apparatus of claim 8, wherein the mapping relationship is generated by:

10. The apparatus of claim 9, wherein the lock object for each variable is tracked by:

for each of the variables:

11. A computing device, comprising:

a memory for storing program instructions;

a processor for calling program instructions stored in said memory to execute the method of any one of claims 1 to 5 in accordance with the obtained program.

12. A computer program product for a computer, characterized in that it comprises software code portions for performing the method according to any one of claims 1 to 5 when said product is run on the computer.

13. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 5.