CN111400724A - Operating system vulnerability detection method, system and medium based on code similarity analysis - Google Patents

Info

Publication number
CN111400724A
Authority
CN
China
Prior art keywords
code
vulnerability
operating system
function
tested
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010381909.1A
Other languages
Chinese (zh)
Other versions
CN111400724B (en)
Inventor
任怡
汪哲
谭郁松
周凯
黄辰林
李宝
阳国贵
王晓川
丁滟
张建锋
谭霜
蹇松雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202010381909.1A
Publication of CN111400724A
Application granted
Publication of CN111400724B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method, a system and a medium for detecting operating system vulnerabilities based on code similarity analysis. The method comprises: locating vulnerability code segments to form an operating system vulnerability code library; screening an operating system source code library; generating code attribute graph sets for the vulnerability code segments and for the operating system source code library respectively; and extracting features from the code attribute graph sets of the fragile functions in the vulnerability code segments and of each function to be tested in the operating system source code library, calculating their similarity, detecting whether fragile code has been reused, and outputting the result. Existing methods for detecting the reuse of fragile code have insufficient detection capability when applied to large-scale software systems such as operating systems. To address this, the method screens the operating system source code library and progressively filters on multiple vulnerability features, which improves and optimizes the existing detection flow and achieves high running efficiency together with good accuracy.

Description

Operating system vulnerability detection method, system and medium based on code similarity analysis
Technical Field
The invention relates to the technical field of computer program detection and operating system vulnerability analysis, in particular to a method, a system and a medium for detecting the vulnerability of an operating system based on code similarity analysis.
Background
Code reuse means that a section of code in one piece of software is copied directly, or used after slight modification, in other software as a component of the latter's code. At present, developing software from existing code components or templates has become a common practice in software engineering. In 2018, Black Duck analyzed and audited anonymized data from more than 1,100 commercial codebases, covering industries such as big data, network security, enterprise software, financial services, healthcare, Internet of Things, automotive, manufacturing and the mobile application market. The audit results show that 96% of the scanned codebases contained open source components. Code reuse is also quite evident in the code of large software such as general-purpose operating systems; for example, the kernel of early versions of Apple's Mac OS was developed by reusing the open source FreeBSD kernel. Code reuse effectively improves software development efficiency, but it also spreads fragile code. Because the operating system is important software in an information system, a vulnerability that reaches it through reused code can be exploited in attacks, so discovering and repairing such vulnerabilities early is of great importance.
At present, formal verification tools for software vulnerabilities are generally suitable for code bases of tens of thousands to hundreds of thousands of lines. They are effective only for some classes of vulnerability, such as integer overflow, buffer overflow, null pointers and memory leakage, and have difficulty identifying fragile code that formally conforms to the syntactic and semantic rules of the program, such as backdoor code and authentication bypass. Operating systems, particularly general-purpose operating systems, have very large code bases, often running to tens of millions of lines in the kernel code alone, and they keep growing. Therefore, because of code reuse, detecting vulnerabilities in operating systems based on code similarity analysis has become an important research direction in the field of operating system vulnerability detection.
Scholars at home and abroad have proposed various detection methods based on different theories of code similarity analysis and have developed numerous detection tools. According to the features selected, these methods can be divided into four types: text-based, lexical-based, syntax-based and semantic-based. Text-based and lexical-based methods have low complexity and high efficiency, but they cannot detect reused code that has been changed by adding, deleting or reordering statements. Syntax-based and semantic-based methods need to construct a syntax tree or a program dependency graph and compare isomorphism relations; they can achieve higher detection precision, but their computational complexity is high and their running efficiency is low, so their running time is often excessive for large-scale software such as an operating system. Therefore, for large-scale software such as an operating system, a method must trade off accuracy against efficiency. Little work at home or abroad has addressed this, and in particular there is a lack of methods and tools designed for the composition characteristics of operating system software packages.
Disclosure of Invention
The technical problems to be solved by the invention are as follows: existing methods for detecting the reuse of fragile code have insufficient detection capability, low accuracy and low efficiency when applied to large-scale software systems such as operating systems. To address these problems, the invention screens the operating system source code library and progressively filters on a set of vulnerability features, namely basic information, the token sequence and the control flow paths. This improves and optimizes the existing detection flow for reused fragile code and achieves good accuracy while ensuring high running efficiency.
In order to solve the technical problems, the invention adopts the technical scheme that:
A method for detecting operating system vulnerabilities based on code similarity analysis comprises the following implementation steps:
1) locating vulnerability code segments related to an operating system based on information provided by a public vulnerability database to form an operating system vulnerability code library;
2) screening an operating system source code library according to the detection target of the vulnerability code segments in the operating system vulnerability code library;
3) analyzing the vulnerability code segments of the operating system vulnerability code library and the screened operating system source code library, and generating, according to the generation rule of the code attribute graph, a code attribute graph set of all fragile functions in the vulnerability code segments and a code attribute graph set of each function to be tested in the operating system source code library, wherein the fragile functions are the one or more functions contained in a vulnerability code segment;
4) extracting features from the code attribute graph sets of all fragile functions in the vulnerability code segments and from the code attribute graph set of each function to be tested in the operating system source code library, calculating the similarity, detecting whether fragile code has been reused, and outputting the result.
Optionally, the detailed steps of step 1) include:
1.1) acquiring vulnerability code information collected in a public vulnerability database;
1.2) for each vulnerability code: judging whether the public vulnerability database provides a patch file for the vulnerability code; if it does, directly locating the file and code segment related to the vulnerability code through the information in the patch file; if it does not, obtaining from the database's description the version number of the software in which the vulnerability was repaired, and locating the file and code segment related to the vulnerability by comparing the two software versions before and after the repair;
1.3) extracting the related complete code in units of functions, constructing a code sample set and recording it as the operating system vulnerability code library, wherein one code sample in the code sample set comprises one or more vulnerability code segments related to the vulnerability.
Optionally, the operating system vulnerability code library is saved in the form of a code file in units of functions.
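The version-comparison branch of step 1.2) can be illustrated with a minimal sketch based on Python's standard `difflib` module. The helper name `changed_line_numbers` and the sample C snippets are hypothetical, and mapping the reported lines back to their enclosing functions (step 1.3) is left out.

```python
import difflib

def changed_line_numbers(before: str, after: str) -> list[int]:
    """Return 1-based line numbers of `before` that the repaired version touches."""
    a = before.splitlines()
    b = after.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            # Lines i1..i2-1 (0-based) of the old version were altered or removed.
            changed.extend(range(i1 + 1, i2 + 1))
        elif tag == "insert":
            # Pure insertion: report the old line just before the insertion point.
            changed.append(max(i1, 1))
    return sorted(set(changed))

# Hypothetical pre-repair and post-repair versions of one function.
before = (
    "int f(char *s) {\n"
    "    char buf[8];\n"
    "    strcpy(buf, s);\n"
    "    return 0;\n"
    "}\n"
)
after = before.replace("strcpy(buf, s);", "strncpy(buf, s, sizeof(buf) - 1);")
print(changed_line_numbers(before, after))
```

The reported line numbers point into the pre-repair version, which is the version that holds the fragile code to be extracted.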
Optionally, the detailed steps of step 2) include:
2.1) extracting a code sample from an operating system vulnerability code library;
2.2) judging whether the number of the code samples exceeds a preset threshold value, if so, directly taking the whole operating system source code library as the screened operating system source code library, and skipping to execute the step 3); otherwise, skipping to execute the next step;
2.3) screening the source code library according to the type of the software package to which the vulnerability code segment belongs, and retaining only code whose software package type is the same;
2.4) calculating the number of code lines contained in each function in the vulnerability code segment, and taking the minimum and maximum values as the line-count range of the vulnerability code segment; traversing each function to be tested in the operating system source code library, rejecting the function if its number of code lines deviates from the line-count range by more than a preset threshold, and otherwise adding it to the screening result, thereby obtaining the screened operating system source code library; then skipping to step 3).
Optionally, the detailed steps of step 4) include:
4.1) reading the code attribute graph set of the fragile functions, and extracting the basic information features, token sequence features and control flow path features of the fragile functions as the vulnerability feature set;
4.2) traversing and acquiring a function to be tested in the source code library of the operating system as a current function to be tested, and acquiring a code attribute diagram of the current function to be tested;
4.3) extracting the basic information of the current function to be tested from its code attribute graph, and calculating the cosine distance between each fragile function and the current function to be tested; if all of the cosine distances are higher than a preset threshold, judging that the current function to be tested is not similar to the vulnerability code segment and skipping to step 4.6); otherwise, proceeding to the next step;
4.4) extracting the token sequence of the current function to be tested, and calculating the edit distance between the token sequence of each fragile function and that of the current function to be tested; if all of the edit distances are larger than a preset threshold, judging that the current function to be tested is not similar code and skipping to step 4.6); otherwise, proceeding to the next step;
4.5) extracting a control flow path set of the current function to be tested, and comparing the control flow path set with the characteristics of the fragile function to judge whether the current function to be tested is similar to the fragile code segment;
4.6) judging whether the functions to be detected in the source code library of the operating system are all detected completely, and if not, skipping to execute the step 4.2); otherwise, outputting the detection result of the similar codes and ending.
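The token comparison in step 4.4) can be sketched as follows. The regular-expression tokenizer is a deliberately simple stand-in for lexical analysis (the method itself reads tokens from the code attribute graph), and the names `tokenize` and `edit_distance` are hypothetical. The distance is the standard Levenshtein distance computed over tokens rather than characters.

```python
import re

# Simplified C-like tokenizer: identifiers, integers, a few two-character
# operators, then any other single punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||->|\+\+|--|[^\s\w]")

def tokenize(code: str) -> list[str]:
    """Split source text into a flat token sequence."""
    return TOKEN_RE.findall(code)

def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

fragile_tokens = tokenize("strcpy(buf, s);")
candidate_tokens = tokenize("strncpy(buf, s, n);")
print(fragile_tokens)
print(edit_distance(fragile_tokens, candidate_tokens))
```

Working on tokens instead of characters makes the distance insensitive to whitespace and identifier length, which suits a coarse pre-filter applied before the more expensive control flow comparison.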
Optionally, the detailed steps of step 4.3) include:
4.3.1) first classifying the return value types in the code attribute graph of the current function to be tested into basic types, structure types, pointer types and the void type, and mapping these types to different values; then reading the return value type and the parameter types of the function to be tested, quantizing them according to the data type, and recording the values; the basic types comprise numerical types and character types, and the structure types comprise data structures such as arrays, structs, unions and enums;
4.3.2) reading the total number of statements and the number of loops in the current function to be tested, and concatenating them with the values of the return value type and the parameter types to form the basic information vector V_B of the current function to be tested;
4.3.3) calculating the cosine distance between the basic information vector V_B of each fragile function and that of the current function to be tested; if all of the cosine distances are higher than the preset threshold, judging that the current function to be tested is not similar to the vulnerability code segment and skipping to step 4.6); otherwise, proceeding to the next step.
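Steps 4.3.1) to 4.3.3) can be sketched as below. Two points are assumptions of this sketch rather than statements of the method: the category codes 0 to 3 simply mirror the example values given in the embodiment, and parameter types are summarized as one count per category so that functions with different numbers of parameters yield vectors of equal length (the text does not say how such vectors are aligned). The function names are hypothetical.

```python
import math

# Example codes for the four type categories (assumed values).
TYPE_CODES = {"basic": 0, "struct": 1, "pointer": 2, "void": 3}

def basic_info_vector(return_type, param_types, statement_count, loop_count):
    """Build a fixed-length basic information vector V_B for one function."""
    # Assumption: one counter per type category instead of one entry per parameter.
    param_counts = [0] * len(TYPE_CODES)
    for t in param_types:
        param_counts[TYPE_CODES[t]] += 1
    return [TYPE_CODES[return_type], *param_counts, statement_count, loop_count]

def cosine_distance(u, v):
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm

fragile = basic_info_vector("pointer", ["pointer", "basic"], 12, 1)
candidate = basic_info_vector("pointer", ["pointer", "basic"], 14, 1)
print(fragile, candidate, cosine_distance(fragile, candidate))
```

Because cosine distance ignores vector magnitude, two functions with the same shape but proportionally different sizes remain close, so this stage acts as a cheap first filter.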
Optionally, the detailed steps of step 4.5) include:
4.5.1) first obtaining the set of control flow paths of the current function to be tested from its code attribute graph, wherein each path in the set consists of statement nodes and control flow edges; then extracting the statement type of each node along a control flow path and storing the types in order as a type sequence, thereby converting the control flow path set into a type sequence set S_t;
4.5.2) calculating the Jaccard distance between the type sequence set S_t of each fragile function and that of the current function to be tested; if there is a fragile function whose Jaccard distance to the function to be tested is smaller than a preset threshold, the current function to be tested is considered similar code of the vulnerability code segment corresponding to that fragile function, and the result is recorded; otherwise, the current function to be tested is judged to be dissimilar to the vulnerability code segment corresponding to the fragile function.
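Steps 4.5.1) and 4.5.2) can be illustrated with a short sketch. It assumes the control flow paths have already been enumerated and that each node is a dictionary with a `type` key; that node layout, the statement type names and the function names are hypothetical.

```python
def type_sequence_set(paths):
    """Convert control flow paths (lists of nodes) into a set of type sequences S_t."""
    return {tuple(node["type"] for node in path) for path in paths}

def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def path(*types):
    """Helper that builds a path from statement type names."""
    return [{"type": t} for t in types]

fragile_paths = [
    path("Declaration", "Call", "Return"),
    path("Declaration", "If", "Return"),
]
candidate_paths = [
    path("Declaration", "Call", "Return"),
    path("Declaration", "If", "Call", "Return"),
]

s_fragile = type_sequence_set(fragile_paths)
s_candidate = type_sequence_set(candidate_paths)
print(jaccard_distance(s_fragile, s_candidate))
```

Comparing type sequences rather than statement text means that renamed variables or changed constants do not affect the distance, while an added or removed branch does.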
In addition, the invention also provides an operating system vulnerability detection system based on code similarity analysis, which comprises:
the vulnerability code segment acquisition program unit is used for positioning the vulnerability code segments related to the operating system based on the information provided by the public vulnerability database to form an operating system vulnerability code library;
the operating system source code library screening program unit is used for screening the operating system source code library according to the detection target of the vulnerability code segment in the operating system vulnerability code library;
the code attribute graph generating program unit is used for analyzing the vulnerability code segments of the operating system vulnerability code library and the screened operating system source code library, and generating, according to the generation rule of the code attribute graph, a code attribute graph set of all the fragile functions in the vulnerability code segments and a code attribute graph set of each function to be tested in the operating system source code library, wherein the fragile functions are the one or more functions contained in a vulnerability code segment;
and the similar feature matching program unit is used for extracting features from the code attribute graph sets of all the fragile functions in the vulnerability code segments and from the code attribute graph set of each function to be tested in the operating system source code library, calculating the similarity, detecting whether fragile code has been reused, and outputting the result.
In addition, the invention also provides an operating system vulnerability detection system based on code similarity analysis, which comprises a computer device, wherein the computer device is programmed or configured to execute the steps of the operating system vulnerability detection method based on code similarity analysis, or a computer program which is programmed or configured to execute the operating system vulnerability detection method based on code similarity analysis is stored on a memory of the computer device.
Furthermore, the present invention also provides a computer-readable storage medium having stored thereon a computer program programmed or configured to execute the code similarity analysis-based operating system vulnerability detection method.
Compared with the prior art, the invention has the following advantages. First, to address the reuse of fragile code in operating systems, the operating system source code library is preprocessed: before similarity matching, the set of code to be detected is screened using the software package type classification and the function code size, which effectively narrows the range of code to be detected and improves detection efficiency. Second, the invention parses code into a code attribute graph, which avoids the need to compile the code, integrates the advantages of the abstract syntax tree, the control flow graph and the program dependency graph, and preserves code features from several aspects. Third, the invention exploits the comprehensive information and fast traversal of the code attribute graph to extract information from multiple angles, namely function basic information, lexical features and control dependence features, and detects progressively stage by stage, which ensures detection precision and effectively reduces the impact of a complex flow on efficiency. In summary, existing methods for detecting the reuse of fragile code have insufficient detection capability, low accuracy and low efficiency when applied to large-scale software systems such as operating systems; by screening the operating system source code library and progressively filtering on a vulnerability feature set consisting of basic information, the token sequence and the control flow paths, the invention improves and optimizes the existing detection flow and achieves good accuracy while ensuring high running efficiency.
Drawings
FIG. 1 is a schematic flow chart of the method provided by the present invention.
FIG. 2 is a flowchart of the steps of obtaining a vulnerability code fragment according to the present invention.
FIG. 3 is a flowchart illustrating the steps of screening the source code library according to the present invention.
FIG. 4 is a flowchart of the matching procedure for extracting features according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples.
As shown in fig. 1, the implementation steps of the operating system vulnerability detection method based on code similarity analysis in this embodiment include:
1) positioning vulnerability code segments related to an operating system based on information provided by a public vulnerability database to form an operating system vulnerability code library;
2) screening an operating system source code library according to the detection target of the vulnerability code segment in the operating system vulnerability code library;
3) analyzing the vulnerability code segments of the operating system vulnerability code library and the screened operating system source code library, and generating, according to the generation rule of the code attribute graph (CPG), a code attribute graph set of all fragile functions in the vulnerability code segments and a code attribute graph set of each function to be tested in the operating system source code library, wherein the fragile functions are the one or more functions contained in a vulnerability code segment; the code attribute graph is a joint data structure combining the properties of an abstract syntax tree, a control flow graph and a program dependency graph; the code attribute graph of a section of code is obtained by representing the abstract syntax tree, control flow graph and program dependency graph of that code as attribute graphs and then merging them; an attribute graph is essentially a directed multigraph in which each node carries attributes and their corresponding values, nodes are connected by labeled directed edges, and the attributes and labels are assigned to nodes and edges by functions;
4) extracting features from the code attribute graph sets of all fragile functions in the vulnerability code segments and from the code attribute graph set of each function to be tested in the operating system source code library, calculating the similarity, detecting whether fragile code has been reused, and outputting the result.
In this embodiment, the one or more functions included in a vulnerability code segment are referred to as fragile functions, and the functions in the source code library are referred to as functions to be tested. The vulnerability codes in the public vulnerability database are classified according to whether a corresponding repair patch is provided, the files and code segments containing the vulnerability codes are located by the method specified below, the related complete code is extracted in units of functions, and the operating system vulnerability code library is constructed. As shown in fig. 2, the detailed steps of step 1) include:
1.1) acquiring the vulnerability code information (including high-risk vulnerability information and common vulnerability code information) collected in a public vulnerability database (such as CVE); it should be noted that a software vulnerability is an instance of a software defect that objectively exists and can be exploited by an attacker, and that vulnerability code is the code directly related to the occurrence of the vulnerability, which in this embodiment is a code segment in units of functions;
1.2) for each vulnerability code: judging whether the public vulnerability database provides a patch file for the vulnerability code; if it does, directly locating the file and code segment related to the vulnerability code through the information in the patch file; if it does not, obtaining from the database's description the version number of the software in which the vulnerability was repaired, and locating the file and code segment related to the vulnerability by comparing the two software versions before and after the repair;
1.3) extracting the related complete code by taking a function as a unit, constructing a code sample set and recording the code sample set as an operating system vulnerability code base, wherein one code sample in the code sample set comprises one or more vulnerability code segments related to the vulnerability.
In this embodiment, the operating system vulnerability code library is stored in the form of a code file in units of functions.
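The patch-file branch of step 1.2) can be sketched by reading a unified diff. The sketch relies on two conventions of that format: the `--- ` header names the pre-repair file, and the text after the second `@@` of a hunk header usually carries the enclosing function signature when the patch was produced by `git diff` or `diff -p`. The function name `locate_from_patch` and the sample patch (including its file path) are hypothetical, and a removed source line that itself begins with `-- ` would be misread as a file header by this simplified parser.

```python
import re

# Hunk header of a unified diff: @@ -old_start[,old_len] +new_start[,new_len] @@ context
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")

def locate_from_patch(patch_text: str) -> list[tuple[str, int, str]]:
    """Return (file path, start line in the old file, hunk context) per hunk."""
    results = []
    current_file = None
    for line in patch_text.splitlines():
        if line.startswith("--- "):
            path = line[4:].split("\t")[0].strip()
            # git-style patches prefix the old path with "a/".
            current_file = path[2:] if path.startswith("a/") else path
            continue
        match = HUNK_RE.match(line)
        if match and current_file is not None:
            results.append((current_file, int(match.group(1)), match.group(5).strip()))
    return results

# Hypothetical patch for illustration only.
patch = "\n".join([
    "--- a/fs/example.c",
    "+++ b/fs/example.c",
    "@@ -10,5 +10,5 @@ static int copy_name(char *dst, const char *src)",
    " {",
    "     char buf[8];",
    "-    strcpy(buf, src);",
    "+    strncpy(buf, src, sizeof(buf) - 1);",
    "     return 0;",
    " }",
])
print(locate_from_patch(patch))
```

The file path and start line identify where to extract the complete pre-repair function for the vulnerability code library.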
A sample to be detected is extracted from the vulnerability code library, and a screening strategy for the operating system source code library is selected according to the total number of samples, so as to reduce the range and scale of the code to be detected. The threshold used for this decision is determined by the relative efficiency of the screening process and of code analysis: if the time cost of screening repeatedly is higher than that of analyzing the complete operating system source code library directly, the screening stage is skipped. As shown in fig. 3, the detailed steps of step 2) include:
2.1) extracting a code sample from an operating system vulnerability code library;
2.2) judging whether the number of the code samples exceeds a preset threshold value, if so, directly taking the whole operating system source code library as the screened operating system source code library, and skipping to execute the step 3); otherwise, skipping to execute the next step;
2.3) screening the source code library according to the type of the software package to which the vulnerability code segment belongs, and retaining only code whose software package type is the same, wherein software packages are classified by function into types such as kernel, desktop environment, network-related, hardware driver, upper-layer application and development environment;
2.4) calculating the number of code lines contained in each function in the vulnerability code segment, and taking the minimum and maximum values as the line-count range of the vulnerability code segment; traversing each function to be tested in the operating system source code library, rejecting the function if its number of code lines deviates from the line-count range by more than a preset threshold, and otherwise adding it to the screening result, thereby obtaining the screened operating system source code library; then skipping to step 3).
As can be seen from step 2.2), if the number of samples to be detected is greater than the preset threshold, the source code library is not screened. Screening would otherwise have to be repeated, and a code attribute graph set rebuilt, each time a vulnerability code segment is detected. Instead, the method skips to step 3) and builds the code attribute graph set from the entire source code library, so that it can be reused when the remaining samples are detected.
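The screening logic of steps 2.2) to 2.4) can be sketched as follows. The `Function` record, the parameter names and the interpretation of the line-count rule (a candidate is kept when its line count lies within the fragile functions' range widened by a tolerance on each side) are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Function:
    name: str
    package_type: str   # e.g. "kernel", "network", "driver"
    line_count: int

def screen_source_library(candidates, fragile_functions, sample_count,
                          sample_threshold, line_tolerance):
    """Return the functions to be tested that survive screening."""
    # Step 2.2: with many samples, skip screening and keep the whole library.
    if sample_count > sample_threshold:
        return list(candidates)
    # Step 2.3: keep only code from the same software package types.
    package_types = {f.package_type for f in fragile_functions}
    # Step 2.4: line-count range of the fragile functions, widened by a tolerance.
    low = min(f.line_count for f in fragile_functions) - line_tolerance
    high = max(f.line_count for f in fragile_functions) + line_tolerance
    return [c for c in candidates
            if c.package_type in package_types and low <= c.line_count <= high]

fragile = [Function("vuln_a", "kernel", 20), Function("vuln_b", "kernel", 35)]
library = [
    Function("k_small", "kernel", 5),
    Function("k_mid", "kernel", 30),
    Function("k_edge", "kernel", 40),
    Function("net_mid", "network", 30),
]
kept = screen_source_library(library, fragile, sample_count=1,
                             sample_threshold=10, line_tolerance=5)
print([f.name for f in kept])
```

Both filters use only metadata that is available without parsing, so screening stays far cheaper than building code attribute graphs for the rejected functions.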
The function code contained in the vulnerability code samples and the code in the screened operating system source code library are analyzed, and a code attribute graph set of the vulnerability code segments and a code attribute graph set of the operating system source code are constructed according to the generation rule of the code attribute graph. The Code Property Graph (CPG) is an extensible and language-independent representation designed by Yamaguchi et al. for incremental and distributed code analysis. The code attribute graph is essentially a directed multigraph with labeled edges. It integrates three representations, the Abstract Syntax Tree (AST), the Control Flow Graph (CFG) and the Program Dependency Graph (PDG), and combines their advantages: it supports robust code parsing, allows efficient traversals to be designed for various code characteristics, and can be effectively used for identifying common vulnerabilities such as buffer overflow, integer overflow and memory leakage. Since the method of this embodiment only applies the basic code attribute graph (CPG) and its generation rule, and does not improve on the code attribute graph (CPG) method itself, the specific implementation details of the code attribute graph (CPG) are not described here.
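The attribute graph underlying the CPG, a directed multigraph whose nodes carry key-value attributes and whose edges carry labels, can be sketched as a small data structure. This is an illustration of the data model only, not the representation used by any particular CPG tool; the class name, the edge labels `AST`, `CFG` and `PDG`, and the node attributes are hypothetical.

```python
class PropertyGraph:
    """Directed multigraph with attributed nodes and labeled edges."""

    def __init__(self):
        self.nodes = {}   # node id -> dict of attributes
        self.edges = {}   # source id -> list of (label, target id)

    def add_node(self, node_id, **attributes):
        self.nodes[node_id] = dict(attributes)

    def add_edge(self, source, label, target):
        # Several edges with different labels may join the same pair of nodes.
        self.edges.setdefault(source, []).append((label, target))

    def successors(self, source, label):
        """Targets reachable from `source` over edges carrying `label`."""
        return [t for lbl, t in self.edges.get(source, []) if lbl == label]

# Graph for the C fragment: if (x > 0) y = x;
g = PropertyGraph()
g.add_node(0, type="IfStatement", code="if (x > 0) y = x;")
g.add_node(1, type="Condition", code="x > 0")
g.add_node(2, type="ExpressionStatement", code="y = x")
g.add_edge(0, "AST", 1)   # syntax: the condition is a child of the if statement
g.add_edge(0, "AST", 2)   # syntax: the assignment is a child of the if statement
g.add_edge(1, "CFG", 2)   # control flow: the assignment may run after the condition
g.add_edge(1, "PDG", 2)   # dependence: the assignment is controlled by the condition

print(g.successors(0, "AST"), g.successors(1, "CFG"), g.successors(1, "PDG"))
```

Keeping all three edge kinds in one graph lets a single traversal switch between syntactic structure, execution order and dependence, which is what the feature extraction in step 4) relies on.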
As shown in fig. 4, the detailed steps of step 4) include:
4.1) reading the code attribute graph set of the fragile functions, and extracting the basic information features, token sequence features and control flow path features of the fragile functions as the vulnerability feature set; the basic information features are basic structural features of a function and include one or more of the return value type, the parameter types, the number of statements and the number of loops; the token sequence feature is the sequence of tokens produced by lexical analysis of the function code, is a one-dimensional vector containing all tokens in the function code, and can be obtained by traversing the attributes of the nodes in the code attribute graph in order; the control flow path feature is the set of all control flow paths in the function, where each control flow path consists of the start node and end node of the path together with all edges and nodes the path passes through in between;
4.2) traversing and acquiring a function to be tested in the source code library of the operating system as a current function to be tested, and acquiring a code attribute diagram of the current function to be tested;
4.3) extracting the basic information of the current function to be tested from the code attribute diagram, calculating the cosine distance between each fragile function and the current function to be tested, if the cosine distance between each fragile function and the current function to be tested is higher than a preset threshold value, judging that the current function to be tested is not similar to the fragile code segment, and skipping to execute the step 4.6); otherwise, skipping to execute the next step;
4.4) extracting a token set of the current function to be tested, calculating the edit distance between each fragile function and the token set of the current function to be tested, if the edit distances are all larger than a preset threshold value, determining that the current function to be tested is not a similar code, and skipping to execute the step 4.6);
4.5) extracting a control flow path set of the current function to be tested, and comparing the control flow path set with the characteristics of the fragile function to judge whether the current function to be tested is similar to the fragile code segment;
4.6) judging whether the functions to be detected in the source code library of the operating system are all detected completely, and if not, skipping to execute the step 4.2); otherwise, outputting the detection result of the similar codes and ending.
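The token comparison of step 4.4) can be sketched as follows. This is a minimal sketch that assumes the standard Levenshtein distance computed over whole tokens; the source does not specify which edit distance variant is used, and the names `edit_distance` and `passes_token_stage` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (lists of strings)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # deleting the first i tokens of a
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def passes_token_stage(fragile_token_seqs, candidate_tokens, threshold):
    """A candidate is discarded only if ALL edit distances exceed the threshold."""
    return any(edit_distance(f, candidate_tokens) <= threshold
               for f in fragile_token_seqs)
```

Working on tokens rather than characters means that renaming one identifier counts as a single substitution, regardless of how long the identifier is.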
In this embodiment, the detailed steps of step 4.3) include:
4.3.1) first classifying data types into a basic type, a structure type, a pointer type and a void type, and mapping these types to distinct values (for example, to 0, 1, 2 and 3 respectively in this embodiment); then reading the return value type and the parameter types of the current function to be tested from its code attribute graph, quantizing them according to this mapping, and recording the values; the basic types include numeric types and character types, and the structure types include data structures such as arrays, structs, unions and enums;
4.3.2) reading the total number of statements and the number of loops in the current function to be tested, and concatenating them with the values of the return value type and the parameter types to form the basic information vector V_B of the current function to be tested;
4.3.3) calculating the cosine distance between the basic information vector V_B of each fragile function and that of the current function to be tested; if all of these cosine distances are higher than the preset threshold, judging that the current function to be tested is not similar to the fragile code segment, and skipping to step 4.6); otherwise, proceeding to the next step.
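Steps 4.3.1) to 4.3.3) can be illustrated with a short Python sketch. The type codes follow the 0 to 3 mapping given in this embodiment. Everything else is an assumption: the source does not say how a variable number of parameter types is packed into a fixed-length vector, so this sketch counts parameters per type category, and the function names are hypothetical.

```python
import math

# Type categories and their codes, as in the example mapping of this embodiment.
TYPE_CODES = {"basic": 0, "struct": 1, "pointer": 2, "void": 3}


def basic_info_vector(return_type, param_types, n_statements, n_loops):
    """Build V_B = [return code, per-category parameter counts..., statements, loops].

    Assumption: parameters are summarized as one count per type category so
    that every function yields a vector of the same length.
    """
    counts = [0] * len(TYPE_CODES)
    for p in param_types:
        counts[TYPE_CODES[p]] += 1
    return [TYPE_CODES[return_type], *counts, n_statements, n_loops]


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical fragile function: returns a pointer, takes one basic and one
# pointer parameter, has 12 statements and 2 loops.
fragile_vb = basic_info_vector("pointer", ["basic", "pointer"], 12, 2)
```

Because this vector is cheap to compute, it is suited to the role it plays in the method: a coarse first filter that runs before the more expensive token and control flow comparisons.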
In this embodiment, the detailed steps of step 4.5) include:
4.5.1) first obtaining the set of control flow paths of the current function to be tested from its code attribute graph, wherein each path in the set consists of statement nodes and control flow edges; then extracting the statement type of each node in a control flow path and storing the types in order as a type sequence, so that the control flow path set is converted into a type sequence set S_t;
4.5.2) calculating the Jaccard distance between the type sequence set S_t of each fragile function and that of the current function to be tested; if there is a fragile function whose Jaccard distance to the type sequence set S_t of the current function to be tested is smaller than a preset threshold, the current function to be tested is considered to be a similar code of the fragile code segment corresponding to that fragile function, and the result is recorded; otherwise, the current function to be tested is judged to be dissimilar to the fragile code segment corresponding to the fragile function.
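Steps 4.5.1) and 4.5.2) can be sketched in Python as follows. The sketch assumes that the control flow graph is given as a plain adjacency dictionary and that only loop-free (simple) paths are enumerated; the source does not state how loops are handled. The node names, statement type names and function names are hypothetical.

```python
def control_flow_paths(cfg, entry, exit_node):
    """Enumerate simple paths from entry to exit; cfg maps node -> successor list."""
    paths = []

    def walk(node, path):
        path = path + [node]
        if node == exit_node:
            paths.append(path)
            return
        for nxt in cfg.get(node, []):
            if nxt not in path:  # skip back edges so loops cannot recurse forever
                walk(nxt, path)

    walk(entry, [])
    return paths


def type_sequence_set(paths, node_types):
    """Convert each path into a tuple of statement types and collect them as S_t."""
    return {tuple(node_types[n] for n in p) for p in paths}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


node_types = {"entry": "Entry", "if": "IfStatement", "then": "ExpressionStatement",
              "else": "CallStatement", "ret": "ReturnStatement"}
# Fragile function: an if/else that rejoins at the return statement.
cfg_fragile = {"entry": ["if"], "if": ["then", "else"], "then": ["ret"], "else": ["ret"]}
# Function to be tested: the same shape without the else branch.
cfg_tested = {"entry": ["if"], "if": ["then"], "then": ["ret"]}

st_fragile = type_sequence_set(control_flow_paths(cfg_fragile, "entry", "ret"), node_types)
st_tested = type_sequence_set(control_flow_paths(cfg_tested, "entry", "ret"), node_types)
distance = jaccard_distance(st_fragile, st_tested)
```

Comparing statement types instead of statement text makes this check insensitive to renamed variables and changed constants, which the token stage would still count as differences.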
In summary, the operating system vulnerability detection method based on code similarity analysis of this embodiment proceeds as follows. First, for the operating system under analysis, the relevant vulnerability code segments are extracted using a public vulnerability database, and an operating system vulnerability code library is built from the resulting set of vulnerability code segments. Then, depending on the number of vulnerability code samples to be detected, it is decided whether to screen the code according to the composition characteristics of the operating system, and the code is further screened by function size so as to reduce the amount of code to be tested. Next, code attribute graphs are generated for the screened code. Finally, a step-by-step detection method is adopted in which the function basic information, the token set and the control flow paths together form the vulnerability feature set, similarity detection is performed on them in sequence, and the result is output. The method addresses the insufficient detection capability, low accuracy and low efficiency of existing detection methods when faced with the large-scale code of an operating system: exploiting the reuse of vulnerable code within operating systems, it screens the target source code library efficiently in the preprocessing stage, adopts a more comprehensive code representation, selects code features in a targeted manner and designs the comparison flow accordingly, so that vulnerability code detection on an operating system achieves comprehensive detection capability and good accuracy while maintaining high operating efficiency.
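The step-by-step detection summarized above amounts to a cascade of filters ordered from cheapest to most expensive. A minimal Python sketch follows. It assumes that a candidate must pass every stage against the same fragile function and that "within threshold" means less than or equal; the source states the thresholds only informally. The function name `detect_similar`, the feature keys, and the absolute-difference placeholder distances are all hypothetical stand-ins for the cosine, edit and Jaccard distances.

```python
def detect_similar(fragile_features, candidates, stages):
    """Run each function to be tested through the ordered filter stages.

    fragile_features: list of feature dicts, one per fragile function.
    candidates: dict mapping function name -> feature dict.
    stages: ordered list of (feature key, distance function, threshold).
    Returns (candidate name, [matching fragile function names]) pairs.
    """
    results = []
    for name, feats in candidates.items():
        survivors = fragile_features
        for key, distance, threshold in stages:
            survivors = [f for f in survivors
                         if distance(f[key], feats[key]) <= threshold]
            if not survivors:  # dissimilar to every fragile function: stop early
                break
        if survivors:
            results.append((name, [f["name"] for f in survivors]))
    return results


# Placeholder features and distances, for demonstration only.
def abs_diff(a, b):
    return abs(a - b)


fragile = [{"name": "vuln_copy", "basic": 10, "tokens": 5, "paths": 3}]
candidates = {
    "f1": {"basic": 11, "tokens": 5, "paths": 3},  # close on every feature
    "f2": {"basic": 50, "tokens": 5, "paths": 3},  # rejected by the first stage
    "f3": {"basic": 10, "tokens": 5, "paths": 9},  # rejected by the last stage
}
stages = [("basic", abs_diff, 2), ("tokens", abs_diff, 1), ("paths", abs_diff, 1)]
matches = detect_similar(fragile, candidates, stages)
```

Most functions in a large source code library are expected to drop out at the first, cheapest stage, which is where the efficiency gain of the cascade comes from.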
In addition, the present embodiment further provides an operating system vulnerability detection system based on code similarity analysis, including:
the vulnerability code segment acquisition program unit is used for positioning the vulnerability code segments related to the operating system based on the information provided by the public vulnerability database to form an operating system vulnerability code library;
the operating system source code library screening program unit is used for screening the operating system source code library according to the detection target of the vulnerability code segment in the operating system vulnerability code library;
the code attribute graph generating program unit is used for analyzing the vulnerability code segment of the vulnerability code library of the operating system and the source code library of the operating system after screening, and generating a code attribute graph set of all the vulnerability functions in the vulnerability code segment and a code attribute graph set of each function to be tested in the source code library of the operating system according to a generating rule of the code attribute graph, wherein the vulnerability function refers to one or more functions contained in the vulnerability code segment;
And the similar feature matching program unit is used for respectively extracting features from the code attribute graph sets of all the fragile functions in the fragile code segment and the code attribute graph set of each function to be tested in the source code library of the operating system, calculating similarity, detecting whether multiplexing of the fragile codes exists or not, and outputting results.
In addition, the embodiment also provides an operating system vulnerability detection system based on code similarity analysis, which includes a computer device programmed or configured to execute the steps of the operating system vulnerability detection method based on code similarity analysis, or a computer program programmed or configured to execute the operating system vulnerability detection method based on code similarity analysis is stored on a memory of the computer device.
Furthermore, the present embodiment also provides a computer-readable storage medium having stored thereon a computer program programmed or configured to execute the aforementioned operating system vulnerability detection method based on code similarity analysis.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application, wherein instructions executed by a processor of a computer create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. An operating system vulnerability detection method based on code similarity analysis is characterized by comprising the following implementation steps:
1) positioning vulnerability code segments related to an operating system based on information provided by a public vulnerability database to form an operating system vulnerability code library;
2) screening an operating system source code library according to the detection target of the vulnerability code segment in the operating system vulnerability code library;
3) analyzing the vulnerability code segment of the vulnerability code library of the operating system and the source code library of the operating system after screening, and generating a code attribute graph set of all vulnerability functions in the vulnerability code segment and a code attribute graph set of each function to be tested in the source code library of the operating system according to a generation rule of a code attribute graph, wherein the vulnerability function refers to one or more functions contained in the vulnerability code segment;
4) and respectively extracting features from the code attribute graph sets of all fragile functions in the fragile code segment and the code attribute graph set of each function to be tested in the source code library of the operating system, calculating the similarity, detecting whether the multiplexing of the fragile codes exists or not, and outputting a result.
2. The operating system vulnerability detection method based on code similarity analysis of claim 1, characterized in that the detailed steps of step 1) include:
1.1) acquiring vulnerability code information collected in a public vulnerability database;
1.2) for each vulnerability code: judging whether the public vulnerability database provides a patch file for the vulnerability code; if a patch file is provided, directly locating the files and code segments related to the vulnerability code through the information in the patch file; if no patch file is provided, obtaining the version number of the software in which the vulnerability was fixed according to the description in the public vulnerability database, and locating the files and code segments related to the vulnerability by comparing the differences between the software versions before and after the fix;
1.3) extracting the related complete code by taking a function as a unit, constructing a code sample set and recording the code sample set as an operating system vulnerability code base, wherein one code sample in the code sample set comprises one or more vulnerability code segments related to the vulnerability.
3. The method for detecting the vulnerability of the operating system based on the code similarity analysis according to claim 2, wherein the vulnerability code library of the operating system is saved in the form of a code file in units of functions.
4. The operating system vulnerability detection method based on code similarity analysis of claim 2, characterized in that the detailed steps of step 2) include:
2.1) extracting a code sample from an operating system vulnerability code library;
2.2) judging whether the number of the code samples exceeds a preset threshold value, if so, directly taking the whole operating system source code library as the screened operating system source code library, and skipping to execute the step 3); otherwise, skipping to execute the next step;
2.3) screening and reserving codes with the same type as the software package to which the operating system vulnerability code base belongs according to the type information of the software package to which the vulnerability code segment belongs;
2.4) calculating the number of code lines contained in each function in the vulnerability code segment, and taking the minimum and maximum values to form the line-count range of the vulnerability code segment; traversing each function to be tested in the operating system source code library, and if the number of code lines of the function to be tested falls outside the line-count range by more than a preset threshold, rejecting the function to be tested, otherwise adding it to the screening result, thereby obtaining the screened operating system source code library, and skipping to step 3).
5. The operating system vulnerability detection method based on code similarity analysis of claim 1, characterized in that the detailed steps of step 4) include:
4.1) reading the code attribute graph set of the fragile functions, and extracting the basic information features, token set features and control flow path features of each fragile function as the fragile feature set;
4.2) traversing the operating system source code library to obtain a function to be tested as the current function to be tested, and obtaining the code attribute graph of the current function to be tested;
4.3) extracting the basic information of the current function to be tested from the code attribute graph, and calculating the cosine distance between each fragile function and the current function to be tested; if all of these cosine distances are higher than a preset threshold, judging that the current function to be tested is not similar to the fragile code segment, and skipping to step 4.6); otherwise, proceeding to the next step;
4.4) extracting the token set of the current function to be tested, and calculating the edit distance between the token set of each fragile function and that of the current function to be tested; if all of these edit distances are larger than a preset threshold, judging that the current function to be tested is not a similar code, and skipping to step 4.6); otherwise, proceeding to the next step;
4.5) extracting the control flow path set of the current function to be tested, and comparing it with the corresponding feature of each fragile function to judge whether the current function to be tested is similar to the fragile code segment;
4.6) judging whether all functions to be tested in the operating system source code library have been checked; if not, skipping to step 4.2); otherwise, outputting the detection result of the similar codes and ending.
6. The operating system vulnerability detection method based on code similarity analysis of claim 5, characterized in that the detailed step of step 4.3) comprises:
4.3.1) first classifying data types into a basic type, a structure type, a pointer type and a void type, and mapping these types to distinct values; then reading the return value type and the parameter types of the current function to be tested from its code attribute graph, quantizing them according to this mapping, and recording the values; the basic types include numeric types and character types, and the structure types include data structures such as arrays, structs, unions and enums;
4.3.2) reading the total number of statements and the number of loops in the current function to be tested, and concatenating them with the values of the return value type and the parameter types to form the basic information vector V_B of the current function to be tested;
4.3.3) calculating the cosine distance between the basic information vector V_B of each fragile function and that of the current function to be tested; if all of these cosine distances are higher than the preset threshold, judging that the current function to be tested is not similar to the fragile code segment, and skipping to step 4.6); otherwise, proceeding to the next step.
7. The operating system vulnerability detection method based on code similarity analysis of claim 5, characterized in that the detailed step of step 4.5) comprises:
4.5.1) first obtaining the set of control flow paths of the current function to be tested from its code attribute graph, wherein each path in the set consists of statement nodes and control flow edges; then extracting the statement type of each node in a control flow path and storing the types in order as a type sequence, so that the control flow path set is converted into a type sequence set S_t;
4.5.2) calculating the Jaccard distance between the type sequence set S_t of each fragile function and that of the current function to be tested; if there is a fragile function whose Jaccard distance to the type sequence set S_t of the current function to be tested is smaller than a preset threshold, the current function to be tested is considered to be a similar code of the fragile code segment corresponding to that fragile function, and the result is recorded; otherwise, the current function to be tested is judged to be dissimilar to the fragile code segment corresponding to the fragile function.
8. An operating system vulnerability detection system based on code similarity analysis, comprising:
the vulnerability code segment acquisition program unit is used for positioning the vulnerability code segments related to the operating system based on the information provided by the public vulnerability database to form an operating system vulnerability code library;
the operating system source code library screening program unit is used for screening the operating system source code library according to the detection target of the vulnerability code segment in the operating system vulnerability code library;
the code attribute graph generating program unit is used for analyzing the vulnerability code segment of the vulnerability code library of the operating system and the source code library of the operating system after screening, and generating a code attribute graph set of all the vulnerability functions in the vulnerability code segment and a code attribute graph set of each function to be tested in the source code library of the operating system according to a generating rule of the code attribute graph, wherein the vulnerability function refers to one or more functions contained in the vulnerability code segment;
and the similar feature matching program unit is used for respectively extracting features from the code attribute graph sets of all the fragile functions in the fragile code segment and the code attribute graph set of each function to be tested in the source code library of the operating system, calculating similarity, detecting whether multiplexing of the fragile codes exists or not, and outputting results.
9. An operating system vulnerability detection system based on code similarity analysis, comprising a computer device, characterized in that the computer device is programmed or configured to execute the steps of the operating system vulnerability detection method based on code similarity analysis of any one of claims 1 to 7, or the computer device has stored on its memory a computer program programmed or configured to execute the operating system vulnerability detection method based on code similarity analysis of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the operating system vulnerability detection method based on code similarity analysis of any of claims 1-7.
CN202010381909.1A 2020-05-08 2020-05-08 Operating system vulnerability detection method, system and medium based on code similarity analysis Active CN111400724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010381909.1A CN111400724B (en) 2020-05-08 2020-05-08 Operating system vulnerability detection method, system and medium based on code similarity analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010381909.1A CN111400724B (en) 2020-05-08 2020-05-08 Operating system vulnerability detection method, system and medium based on code similarity analysis

Publications (2)

Publication Number Publication Date
CN111400724A true CN111400724A (en) 2020-07-10
CN111400724B CN111400724B (en) 2023-09-12

Family

ID=71437372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010381909.1A Active CN111400724B (en) 2020-05-08 2020-05-08 Operating system vulnerability detection method, system and medium based on code similarity analysis

Country Status (1)

Country Link
CN (1) CN111400724B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101697121A (en) * 2009-10-26 2010-04-21 哈尔滨工业大学 Method for detecting code similarity based on semantic analysis of program source code
JP2012164211A (en) * 2011-02-08 2012-08-30 Hitachi Ltd Software similarity evaluation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JANG J et al.: ReDebug: finding unpatched code clones in entire OS distributions *
CHANG Chao et al.: Defect discovery method based on reused code detection *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214399B (en) * 2020-09-16 2023-01-10 北京京航计算通讯研究所 API misuse defect detection system based on sequence pattern matching
CN112214399A (en) * 2020-09-16 2021-01-12 北京京航计算通讯研究所 API misuse defect detection system based on sequence pattern matching
CN112115053A (en) * 2020-09-16 2020-12-22 北京京航计算通讯研究所 API misuse defect detection method based on sequence pattern matching
CN112148359A (en) * 2020-10-10 2020-12-29 中国人民解放军国防科技大学 Distributed code clone detection and search method, system and medium based on subblock filtering
CN112487434A (en) * 2020-11-05 2021-03-12 杭州孝道科技有限公司 Application software self-adaptive safety protection method
CN112651028A (en) * 2021-01-05 2021-04-13 西安工业大学 Vulnerability code clone detection method based on context semantics and patch verification
CN112733156A (en) * 2021-01-29 2021-04-30 中国人民解放军国防科技大学 Intelligent software vulnerability detection method, system and medium based on code attribute graph
CN112733156B (en) * 2021-01-29 2024-04-12 中国人民解放军国防科技大学 Intelligent detection method, system and medium for software vulnerability based on code attribute graph
CN113609487B (en) * 2021-07-16 2023-05-12 深圳开源互联网安全技术有限公司 Method for detecting backdoor code through static analysis
CN113609487A (en) * 2021-07-16 2021-11-05 深圳开源互联网安全技术有限公司 Method for detecting backdoor code by static analysis
CN113722238B (en) * 2021-11-01 2022-04-26 北京大学 Method and system for realizing rapid open source component detection of source code file
CN113722238A (en) * 2021-11-01 2021-11-30 北京大学 Method and system for realizing rapid open source component detection of source code file
CN115586920A (en) * 2022-12-13 2023-01-10 北京安普诺信息技术有限公司 Fragile code segment clone detection method and device, electronic equipment and storage medium
CN115586920B (en) * 2022-12-13 2023-03-14 北京安普诺信息技术有限公司 Fragile code segment clone detection method and device, electronic equipment and storage medium
CN117909978A (en) * 2024-03-14 2024-04-19 福建银数信息技术有限公司 Analysis management method and system based on big data security
CN117909978B (en) * 2024-03-14 2024-06-28 福建银数信息技术有限公司 Analysis management method and system based on big data security

Also Published As

Publication number Publication date
CN111400724B (en) 2023-09-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant