CN107688748B - Vulnerable code clone detection method and device based on vulnerability fingerprints - Google Patents


Info

Publication number
CN107688748B
Authority
CN
China
Prior art keywords
code
loophole
fingerprint
fragility
bitmap
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710789364.6A
Other languages
Chinese (zh)
Other versions
CN107688748A (en
Inventor
魏强
刘臻
林超
麻荣宽
柳晓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA Information Engineering University
Original Assignee
PLA Information Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA Information Engineering University filed Critical PLA Information Engineering University
Priority to CN201710789364.6A priority Critical patent/CN107688748B/en
Publication of CN107688748A publication Critical patent/CN107688748A/en
Application granted granted Critical
Publication of CN107688748B publication Critical patent/CN107688748B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Storage Device Security (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention relates to a method and device for detecting vulnerable code clones based on vulnerability fingerprints. The method comprises: collecting code samples and establishing a vulnerability database; selecting a vulnerability, querying its patch information, and obtaining vulnerable code samples; constructing a code parser; preprocessing the vulnerable code samples with the code parser to obtain a normalized intermediate representation; dividing the intermediate representation into code blocks of s lines, computing the feature value of each code block with a hash function, and combining the feature values into a vulnerability fingerprint; preprocessing the code under test with the code parser to obtain its feature value sequence; and mapping the vulnerability fingerprint to an n-bit bitmap, using the bitmap to identify whether the feature value sequence contains a clone of the vulnerable code, and outputting the result. The invention handles the code modifications commonly found in code clones, strikes a better balance between detection efficiency and detection accuracy, and maintains good accuracy while efficiently scanning large-scale targets.

Description

Vulnerable code clone detection method and device based on vulnerability fingerprints
Technical field
The invention belongs to the technical field of software vulnerability discovery, and in particular relates to a method and device for detecting vulnerable code clones based on vulnerability fingerprints.
Background art
Vulnerable code is the key code that gives rise to a software vulnerability, and cloning vulnerable code can introduce the same vulnerability during development. As Internet applications become more widespread, growing demand for software has created a need for efficient development, so reuse of existing components and code templates has become routine practice, and open source software (OSS) has become a good way to improve development efficiency and quality while reducing programming cost. However, the many vulnerabilities in OSS naturally propagate through code cloning into a large number of software vulnerabilities, which poses a serious threat to system security. The US company Black Duck has pointed out that about two thirds of commercial applications contain code with known vulnerabilities, and predicted that the number of attacks exploiting vulnerabilities in open source code would increase by 20% in 2017. Research on detecting vulnerable code clones is therefore essential. Code cloning is usually accompanied by several kinds of code modification, such as comment changes, variable renaming, data type changes, operator changes, statement reordering, code block reordering, redundant code insertion, and equivalent transformation of control structures.
Roy et al. proposed a generally accepted classification of code clones, and the following definition of modification levels in code cloning is used here: Level 1, changing the layout of the code and editing comments by modifying spaces and tabs, with the code itself left unmodified; Level 2, changing the data types and function return values of variables, and renaming identifiers and variables; Level 3, adding or deleting some code statements and modifying expressions or function calls, without changing the original function of the code; Level 4, adjusting the code structure without changing its semantics, such as reordering code blocks or applying equivalent transformations to control structures. A higher modification level directly increases the difficulty of code clone detection. Level 1 and Level 2 modifications can be detected effectively by existing work, whereas further modifications generally incur greater computational overhead. In practice, however, Level 3 and Level 4 modifications are quite common.
As an important means of vulnerability discovery at the source code level, a variety of methods for detecting vulnerable code clones have been proposed in China and abroad. For example, Jang et al. proposed ReDeBug, a system for quickly finding vulnerable code clones in operating system code bases. It looks for cloned code using a sliding window algorithm and a Bloom filter, with the line of code as the detection granularity. ReDeBug has a clear advantage in speed and supports detection over large code bases, but it cannot cope with common code modifications such as variable renaming or data type changes, and it shows relatively high false negative and false positive rates. Li et al. proposed CLORIFI for buffer overflow vulnerabilities: it searches for clones of known vulnerable code with an n-token algorithm to locate potential vulnerabilities, and then verifies them with concolic testing to reduce false positives. CLORIFI improves detection accuracy, but the limitations of its clone detection algorithm lead to a relatively high false negative rate, and detection consumes considerable resources. Gan Shuitao et al. proposed CVdetector, a vulnerable code clone detection method based on feature matrices. It builds vulnerability feature matrices and feature vectors by traversing the key nodes of the parse tree of a vulnerable code fragment, and detects multiple types of vulnerabilities with a clustering algorithm. Although the time overhead of this method grows linearly with the amount of code scanned, there is still considerable room for improvement in efficiency.
Kim et al. proposed VUDDY, a method for efficiently detecting function-level vulnerable code clones. By comparing function signatures and filtering on function length, it achieves high efficiency and scalability and can identify clones of known vulnerabilities with high accuracy. However, it copes only with simple code modifications and does not support common ones such as statement reordering or redundant code insertion, so its application scenarios are quite limited.
Existing methods for detecting vulnerable code reuse usually convert the target code into an intermediate representation, such as a parse tree or a control flow graph, and then find vulnerabilities caused by code cloning by matching identical structures or features against known cases. A complex intermediate representation may help improve accuracy but also incurs higher computational cost, while a high level of abstraction may be efficient but loses necessary vulnerability-related details. How to balance efficiency and accuracy at acceptable cost, while effectively handling the modifications commonly found in code clones, is therefore of significant research interest.
Summary of the invention
To address the shortcomings of the prior art, the invention provides a method and device for detecting vulnerable code clones based on vulnerability fingerprints. It addresses the high false negative rate, low detection efficiency, and limited applicability that existing vulnerable code clone detection shows when code has been modified to varying degrees. Vulnerability features are obtained by preprocessing vulnerable code samples and extracting features from them, a vulnerability fingerprint is generated, and the fingerprint is used to identify and locate matches in the code under test. The method thereby copes with multiple kinds of modification in code clones, maintains good accuracy while efficiently scanning large-scale targets, and has a wider scope of application.
According to the design scheme provided by the invention, a vulnerable code clone detection method based on vulnerability fingerprints comprises the following steps:
Step 1), selecting a vulnerability v for constructing a fingerprint, querying its patch information from a public vulnerability information database, and obtaining the corresponding vulnerable code samples from the patch;
Step 2), constructing a code parser;
Step 3), preprocessing the vulnerable code samples with the code parser to obtain a normalized intermediate representation;
Step 4), dividing the intermediate representation into code blocks of s lines, computing the feature value of each code block with a hash function, and combining the feature values into a vulnerability fingerprint;
Step 5), preprocessing the code under test with the code parser, and computing its feature values with the hash function using a sliding window method to obtain its feature value sequence seq_f;
Step 6), mapping the vulnerability fingerprint to an n-bit bitmap bitmap_v, and using bitmap_v to identify whether the feature value sequence seq_f contains a clone of the vulnerable code; if so, recording and outputting it.
In the above, obtaining the corresponding vulnerable code samples from the patch in step 1) specifically means: obtaining all diff files in the patch and using them as the vulnerable code samples.
In the above, constructing the code parser in step 2) means: using ANTLR to generate a code parser for C/C++ from the C/C++ lexical and syntactic definitions.
In the above, before the preprocessing in step 3), the code is first converted to lowercase; redundant spaces, tabs, line breaks, and all comments are deleted; and the indentation style is changed to Lisp style.
In the above, the preprocessing in step 3) comprises: uniformly replacing, by category, the function and parameter names, variable identifiers, data types, string constants, and function call names in the code.
In the above, the preprocessing in step 3) comprises: recording all formal parameter names from the function declaration and replacing each parameter in the function body with the symbol _PARAM, while replacing the function name in the function declaration with the symbol _FUNCDEC; replacing all variables defined in the function with the symbol _DATA; replacing all data types declared in the ISO C standard with the symbol _TYPE, and likewise replacing all user-defined structures in the code, while retaining the keyword struct in structure declarations; replacing string constants with the symbol _STR; and replacing each function call with the symbol _FUNCTION, retaining its usage and parameter values.
In the above, step 4) comprises the following:
Step 41), determining the code changes in the vulnerability patch from the diff file, and choosing a code block size of s lines;
Step 42), for the intermediate representation obtained after preprocessing, denoting each deleted s-line code block as block_D and each added s-line code block as block_A; when the deleted or added code is shorter than s lines it is padded to s lines, or it is split into multiple consecutive code blocks, with the last code block padded with context;
Step 43), computing the feature value of each block_D and block_A with the hash function, and prefixing each feature value with an addition mark or a deletion mark to distinguish them; the feature values are combined by type into the added feature sequence seq_A and the deleted feature sequence seq_D, forming the feature vector V_diff = [seq_A, seq_D] for one diff file;
Step 44), performing the above steps for every diff file in the vulnerability patch to obtain the corresponding feature vectors V_diff1, V_diff2, ..., V_diffn; for a vulnerability v whose patch contains n diff files, all the feature vectors together make up the vulnerability fingerprint F_v = {V_diff1, V_diff2, ..., V_diffn}.
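Steps 41) to 44) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`block_hash`, `split_blocks`, `diff_vector`, `fingerprint`), the input layout (lists of already preprocessed lines), the use of MD5 truncated to 8 bytes, and the rule of padding a short final block with the context lines that follow it are all assumptions made for illustration.

```python
import hashlib

def block_hash(lines):
    """Feature value of one block: first 8 bytes of MD5, as hex (assumed truncation)."""
    data = "\n".join(lines).encode("utf-8")
    return hashlib.md5(data).hexdigest()[:16]

def split_blocks(changed, context, s=4):
    """Split changed lines into s-line blocks.

    A final block shorter than s lines is padded with following context
    lines (a simplification of the padding rule in step 42).
    """
    blocks = []
    for i in range(0, len(changed), s):
        block = list(changed[i:i + s])
        if len(block) < s:
            block += list(context[:s - len(block)])
        blocks.append(block)
    return blocks

def diff_vector(added, deleted, context, s=4):
    """Build V_diff = [seq_A, seq_D]; '+' and '-' mark added and deleted values."""
    seq_a = ["+" + block_hash(b) for b in split_blocks(added, context, s)]
    seq_d = ["-" + block_hash(b) for b in split_blocks(deleted, context, s)]
    return [seq_a, seq_d]

def fingerprint(diffs, s=4):
    """F_v: one V_diff per diff file of the patch (hypothetical dict layout)."""
    return [diff_vector(d["added"], d["deleted"], d["context"], s) for d in diffs]
```

With s = 4, five added lines yield two added blocks (the second padded with three context lines), and a single deleted line yields one padded deleted block.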
Preferably, step 6) comprises the following:
Step 61), mapping the vulnerability fingerprint F_v to an m-bit bitmap bitmap_v, where m satisfies a constraint whose formula is missing from this text; in the initial state every bit of bitmap_v is set to 0;
Step 62), traversing the feature value sequence seq_f of the code under test and checking whether each feature value is present in the vulnerability fingerprint F_v; if it is, the bit corresponding to that feature value in bitmap_v is set to 1, otherwise nothing is changed; at the same time, a tag Tag composed of the name of the function to which the seq_f entry belongs and the name of the file containing it is attached to that bit;
Step 63), judging code fragments that satisfy the preset conditions to be vulnerable code clones and outputting them, the preset conditions comprising condition 1 and condition 2, where
Condition 1: none of the code added in any diff is present in the code under test, and all of the code deleted in every diff is fully present in the code under test;
Condition 2: in bitmap_v, for the bits corresponding to all feature values within the same V_diff, the file names in the tags are equal.
Preferably, condition 1 is judged from bitmap_v, in which it appears as follows: all values in V_diff carrying the addition mark have their corresponding bits in bitmap_v equal to 0, and all values in V_diff carrying the deletion mark have their corresponding bits in bitmap_v equal to 1.
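The bitmap check of steps 61) to 63) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the bitmap is given one bit per fingerprint feature value (the text's own constraint on the bitmap size is not reproduced here), feature values are compared without their addition or deletion marks, and the name `detect_clone` and the tuple layouts are hypothetical.

```python
def detect_clone(fingerprint, target_features):
    """Return True if the code under test satisfies conditions 1 and 2.

    fingerprint: list of V_diff = (seq_a, seq_d), each a list of hash values.
    target_features: list of (hash, function_name, file_name) tuples.
    """
    # Step 61: one bit per fingerprint feature value (assumed layout), all 0.
    index = {}
    for seq_a, seq_d in fingerprint:
        for h in list(seq_a) + list(seq_d):
            index.setdefault(h, len(index))
    bitmap = [0] * len(index)
    tags = [None] * len(index)

    # Step 62: set the bit of every target feature found in the fingerprint
    # and attach the (function name, file name) tag to that bit.
    for h, func, fname in target_features:
        pos = index.get(h)
        if pos is not None:
            bitmap[pos] = 1
            tags[pos] = (func, fname)

    # Step 63: check both conditions for every V_diff.
    for seq_a, seq_d in fingerprint:
        # Condition 1: added blocks absent (bits 0), deleted blocks present (bits 1).
        if any(bitmap[index[h]] for h in seq_a):
            return False
        if not all(bitmap[index[h]] for h in seq_d):
            return False
        # Condition 2: all set bits of one V_diff carry the same file name.
        files = {tags[index[h]][1] for h in seq_d}
        if len(files) > 1:
            return False
    return True
```

Code that still contains every deleted block and none of the added blocks is reported as a clone; code that already contains an added block (that is, patched code) is not.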
A device for detecting vulnerable code clones based on vulnerability fingerprints comprises: a collection module, a sample acquisition module, a code parser construction module, a sample preprocessing module, a vulnerability fingerprint generation module, a target code preprocessing module, and a code clone detection module, where
the collection module collects code samples using extraction techniques and establishes a vulnerability database;
the sample acquisition module selects a vulnerability v for constructing a fingerprint, queries its patch information from a public vulnerability information database, and obtains the corresponding vulnerable code samples from the patch;
the code parser construction module generates a code parser with ANTLR using an island grammar;
the sample preprocessing module preprocesses the vulnerable code samples with the code parser to obtain a normalized intermediate representation;
the vulnerability fingerprint generation module divides the intermediate representation into code blocks of s lines, computes the feature value of each code block with a hash function, and combines the feature values into a vulnerability fingerprint;
the target code preprocessing module preprocesses the code under test with the code parser, and computes its feature values with the hash function using a sliding window method to obtain the feature value sequence seq_f;
the code clone detection module maps the vulnerability fingerprint to an n-bit bitmap bitmap_v, and uses bitmap_v to identify whether the feature value sequence seq_f contains a clone of the vulnerable code; if so, it records and outputs it.
Beneficial effects of the invention:
1. The invention obtains vulnerability features by preprocessing vulnerable code samples and extracting features from them, generates a vulnerability fingerprint, and uses the fingerprint to identify and locate matches in the code under test. It thereby copes with multiple kinds of modification in code clones, maintains good accuracy while efficiently scanning large-scale targets, has a wider scope of application, and can effectively reuse the knowledge in existing vulnerability information databases. Fingerprints are built from lightweight features and matched with an efficient recognition method to improve detection efficiency, and the code modifications commonly seen in code cloning are handled. With high detection efficiency, a low false positive rate, and strong scalability, the invention can provide useful support for vulnerability detection in software source code.
2. The invention improves the accuracy of vulnerable code clone detection, copes with a variety of common code modifications, and is widely applicable. Compared with the prior art, it effectively addresses the following shortcomings of current methods: 1) they cannot cope with modifications common in code clones such as format changes, variable renaming, junk code insertion, and statement reordering; 2) detection methods based on complex intermediate representations are computationally intensive and inefficient, and cannot maintain detection efficiency while improving accuracy. Maintaining good accuracy while efficiently scanning large-scale targets is of considerable significance for computer network security and vulnerability discovery.
Brief description of the drawings:
Fig. 1 is a flow diagram of the method of the invention;
Fig. 2 is a flowchart of vulnerability fingerprint generation in an embodiment;
Fig. 3 is a flowchart of code clone detection using the bitmap in an embodiment;
Fig. 4 is a schematic diagram of the device of the invention;
Fig. 5 is a schematic diagram of the implementation process in an embodiment;
Fig. 6 is a schematic diagram of the vulnerability fingerprint generated in an embodiment;
Fig. 7 is a schematic diagram of the fingerprint bitmaps corresponding to the two cases, vulnerable code clone and non-cloned code, in an embodiment.
Detailed description of the embodiments:
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and the technical solutions.
Vulnerable code is the key code that gives rise to a software vulnerability, and cloning vulnerable code can introduce the same vulnerability during development. To address problems in existing vulnerable code clone detection, such as weak handling of code modifications and low detection efficiency, this embodiment provides a vulnerable code clone detection method based on vulnerability fingerprints which, as shown in Fig. 1, comprises the following steps:
Step 11, selecting a vulnerability v for constructing a fingerprint, querying its patch information from a public vulnerability information database, and obtaining the corresponding vulnerable code samples from the patch;
Step 12, constructing a code parser;
Step 13, preprocessing the vulnerable code samples with the code parser to obtain a normalized intermediate representation;
Step 14, dividing the intermediate representation into code blocks of s lines, computing the feature value of each code block with a hash function, and combining the feature values into a vulnerability fingerprint;
Step 15, preprocessing the code under test with the code parser, and computing its feature values with the hash function using a sliding window method to obtain its feature value sequence seq_f;
Step 16, mapping the vulnerability fingerprint to an n-bit bitmap bitmap_v, and using bitmap_v to identify whether the feature value sequence seq_f contains a clone of the vulnerable code; if so, recording and outputting it.
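The sliding window computation in step 15 can be sketched in Python as follows. The sketch assumes that the window advances one line at a time, that the window size equals the block size s, and that the hash is MD5 truncated to 8 bytes; the function name `feature_sequence` is hypothetical, and the input is taken to be lines that have already been preprocessed.

```python
import hashlib

def feature_sequence(lines, s=4):
    """Hash every window of s consecutive preprocessed lines (stride 1, assumed)."""
    seq = []
    for i in range(max(len(lines) - s + 1, 0)):
        window = "\n".join(lines[i:i + s]).encode("utf-8")
        # First 8 bytes of the MD5 digest, as 16 hex characters.
        seq.append(hashlib.md5(window).hexdigest()[:16])
    return seq
```

A file of N preprocessed lines yields N - s + 1 feature values, so every s-line block of a fingerprint has a chance to match regardless of where the cloned code starts.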
Extracting vulnerable code features with the code parser and the vulnerability fingerprint handles the code modifications commonly found in code clones and strikes a better balance between detection efficiency and detection accuracy. Maintaining good accuracy while efficiently scanning large-scale targets is of considerable significance for computer network security and vulnerability discovery.
A diff file is a common representation of a code modification; it consists of a sequence of code lines carrying special marks. In another embodiment of the invention, obtaining the corresponding vulnerable code samples from the patch specifically means: obtaining all diff files in the patch and using them as the vulnerable code samples.
The code parser is an important component for lexical analysis and parsing of code, and it is the basis of preprocessing and feature extraction. ANTLR, an open source parser generator, can automatically generate a syntax tree from its input and display it visually. For reasons of efficiency and robustness, this scheme does not use a parser integrated in a compiler (such as llvm or gcc); instead it builds the parser with the open source project ANTLR v4, generating a C/C++ code parser from the corresponding ANTLR v4 island grammar.
To further improve detection efficiency and accuracy, before the vulnerable code samples are preprocessed with the code parser, the code is first converted to lowercase; redundant spaces, tabs, line breaks, and all comments are deleted; and the indentation style is changed to Lisp style.
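The clean-up described above can be approximated with a short Python sketch. It uses regular expressions rather than the ANTLR-generated parser of this scheme, so comment markers inside string literals are not handled correctly, and the change of indentation style to Lisp style is not modelled; the function name `clean_source` is hypothetical.

```python
import re

def clean_source(code):
    """Lowercase C/C++ text, drop comments, and collapse redundant whitespace."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", " ", code)                     # line comments
    code = code.lower()
    # Collapse runs of spaces and tabs, then drop lines that became empty.
    lines = [re.sub(r"\s+", " ", ln).strip() for ln in code.splitlines()]
    return [ln for ln in lines if ln]
```

For example, `clean_source("int  Foo(int A) { /* note */\n\n\treturn A; // done\n}\n")` returns `["int foo(int a) {", "return a;", "}"]`.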
In the above, preprocessing the vulnerable code samples with the code parser comprises: uniformly replacing, by category, the function and parameter names, variable identifiers, data types, string constants, and function call names in the code.
Further, preprocessing the vulnerable code samples with the code parser comprises:
A) Function and parameter name replacement: all formal parameter names are recorded from the function declaration, and each parameter in the function body is replaced with the symbol _PARAM; at the same time, the function name in the function declaration is replaced with the symbol _FUNCDEC.
B) Variable identifier replacement: all variables defined in the function are replaced with the symbol _DATA.
C) Data type replacement: all data types declared in the ISO C standard are replaced with the symbol _TYPE, and all user-defined structures in the code are replaced as well; the keyword "struct" in structure declarations is retained during replacement so that data types can still be classified properly. Member variables declared in user-defined structures are not replaced, nor are variable type modifiers such as "signed" or "unsigned".
D) String constant replacement: string constants are replaced with the symbol _STR. Strings containing format specifiers such as "%s", "%d", or "%f" are not replaced.
E) Function call name replacement: each function call is replaced with the symbol _FUNCTION, retaining its usage and parameter values.
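Replacements A) to E) can be illustrated with a token-level Python sketch. It is only an approximation of the parser-based replacement described here: the parameter names, local variable names, and declared function name are passed in by the caller instead of being read from a parse tree, user-defined structure names are not replaced, and the type and keyword sets are deliberately incomplete. The function name `abstract_line` and its arguments are hypothetical.

```python
import re

# Deliberately small, illustrative sets (assumptions, not the full ISO C lists).
C_TYPES = {"int", "char", "long", "short", "float", "double", "void"}
KEYWORDS = {"if", "else", "for", "while", "return", "sizeof", "switch",
            "struct", "signed", "unsigned"}

# One pass over string literals and identifiers, so placeholders are never re-scanned.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*')

def abstract_line(line, params=(), local_vars=(), func_name=None):
    """Replace names in one lowercase C line with the placeholder symbols."""
    def repl(m):
        tok = m.group(0)
        if tok.startswith('"'):
            # D) keep strings that carry format specifiers, replace the rest.
            return tok if re.search(r"%[sdf]", tok) else "_STR"
        if tok in KEYWORDS:
            return tok                      # struct, signed, unsigned are kept
        if tok in C_TYPES:
            return "_TYPE"                  # C)
        if tok == func_name:
            return "_FUNCDEC"               # A) declared function name
        if tok in params:
            return "_PARAM"                 # A) formal parameters
        if tok in local_vars:
            return "_DATA"                  # B)
        if m.string[m.end():].lstrip().startswith("("):
            return "_FUNCTION"              # E) any other name followed by "("
        return tok
    return TOKEN.sub(repl, line)
```

For example, `abstract_line('int n = strlen(buf) + copy(dst, "hi");', params=("buf", "dst"), local_vars=("n",))` returns `'_TYPE _DATA = _FUNCTION(_PARAM) + _FUNCTION(_PARAM, _STR);'`, so two clones that differ only in names and literal values normalize to the same line.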
In another embodiment of the invention, dividing the intermediate representation into code blocks of s lines, computing the feature value of each code block with a hash function, and combining the feature values into a vulnerability fingerprint comprises, as shown in Fig. 2, the following:
41) determining the code changes in the vulnerability patch from the diff file, and choosing a code block size of s lines;
42) for the intermediate representation obtained after preprocessing, denoting each deleted s-line code block as block_D and each added s-line code block as block_A; when the deleted or added code is shorter than s lines it is padded to s lines, or it is split into multiple consecutive code blocks, with the last code block padded with context;
43) computing the feature value of each block_D and block_A with the hash function, and prefixing each feature value with an addition mark or a deletion mark to distinguish them; the feature values are combined by type into the added feature sequence seq_A and the deleted feature sequence seq_D, forming the feature vector V_diff = [seq_A, seq_D] for one diff file;
44) performing the above steps for every diff file in the vulnerability patch to obtain the corresponding feature vectors V_diff1, V_diff2, ..., V_diffn; for a vulnerability v whose patch contains n diff files, all the feature vectors together make up the vulnerability fingerprint F_v = {V_diff1, V_diff2, ..., V_diffn}.
In yet another embodiment of the invention, mapping the vulnerability fingerprint to an n-bit bitmap bitmap_v and using bitmap_v to identify whether the feature value sequence seq_f contains a clone of the vulnerable code comprises, as shown in Fig. 3, the following:
61) mapping the vulnerability fingerprint F_v to an m-bit bitmap bitmap_v, where m satisfies a constraint whose formula is missing from this text; in the initial state every bit of bitmap_v is set to 0;
62) traversing the feature value sequence seq_f of the code under test and checking whether each feature value is present in the vulnerability fingerprint F_v; if it is, the bit corresponding to that feature value in bitmap_v is set to 1, otherwise nothing is changed; at the same time, a tag Tag composed of the name of the function to which the seq_f entry belongs and the name of the file containing it is attached to that bit;
63) judging code fragments that satisfy the preset conditions to be vulnerable code clones and outputting them, the preset conditions comprising condition 1 and condition 2, where
Condition 1: none of the code added in any diff is present in the code under test, and all of the code deleted in every diff is fully present in the code under test;
Condition 2: in bitmap_v, for the bits corresponding to all feature values within the same V_diff, the file names in the tags are equal.
Preferably, condition 1 is judged from bitmap_v, in which it appears as follows: all values in V_diff carrying the addition mark have their corresponding bits in bitmap_v equal to 0, and all values in V_diff carrying the deletion mark have their corresponding bits in bitmap_v equal to 1.
Corresponding to the above method, an embodiment of the invention also provides a device for detecting vulnerable code clones based on vulnerability fingerprints which, as shown in Fig. 4, comprises: a collection module 201, a sample acquisition module 202, a code parser construction module 203, a sample preprocessing module 204, a vulnerability fingerprint generation module 205, a target code preprocessing module 206, and a code clone detection module 207, where
the collection module 201 collects code samples using extraction techniques and establishes a vulnerability database;
the sample acquisition module 202 selects a vulnerability v for constructing a fingerprint, queries its patch information from a public vulnerability information database, and obtains the corresponding vulnerable code samples from the patch;
the code parser construction module 203 uses ANTLR to generate a code parser for C/C++ from the C/C++ lexical and syntactic definitions;
the sample preprocessing module 204 preprocesses the vulnerable code samples with the code parser to obtain a normalized intermediate representation;
the vulnerability fingerprint generation module 205 divides the intermediate representation into code blocks of s lines, computes the feature value of each code block with a hash function, and combines the feature values into a vulnerability fingerprint;
the target code preprocessing module 206 preprocesses the code under test with the code parser, and computes its feature values with the hash function using a sliding window method to obtain the feature value sequence seq_f;
the code clone detection module 207 maps the vulnerability fingerprint to an n-bit bitmap bitmap_v, and uses bitmap_v to identify whether the feature value sequence seq_f contains a clone of the vulnerable code; if so, it records and outputs it.
The effectiveness of the invention is further illustrated below with a concrete example; as shown in Fig. 5, the implementation process is as follows:
1. Select the vulnerability for which a fingerprint is to be constructed, and obtain all diff files in its patch from the CVE vulnerability information database as the vulnerable code samples.
A diff file is a common representation of a code modification; it consists of a sequence of code lines carrying special marks. A "+" before a code line indicates added code, a "-" indicates deleted code, and a line with no mark is unmodified.
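Reading the "+" and "-" marks out of a diff file can be sketched in Python as follows. The sketch assumes the unified diff format, in which file header lines begin with "--- " and "+++ " and hunk headers begin with "@@"; the function name `changed_lines` is hypothetical, and a deleted line whose own text begins with "-- " would be mistaken for a header by this simple check.

```python
def changed_lines(diff_text):
    """Collect the added and deleted code lines of a unified diff."""
    added, deleted = [], []
    for line in diff_text.splitlines():
        if line.startswith(("+++ ", "--- ", "@@")):
            continue  # file headers and hunk headers carry no code
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            deleted.append(line[1:])
        # Lines without a mark are unmodified context and are skipped here.
    return added, deleted
```

The two returned lists correspond to the added code and the deleted code from which block_A and block_D are later formed.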
Taking the patch for vulnerability CVE-2016-6198 as an example, the patch modifies two functions in the files fs/namei.c and fs/open.c, which can be expressed as the following two diff files:
Table 1: diff (1) in the patch for vulnerability CVE-2016-6198
Table 2: diff (2) in the patch for vulnerability CVE-2016-6198
2. generating code parser, resolver is constructed using open source projects ANTLR v4, utilizes the corresponding island ANTLR v4 Grammer generates C/C++ code parser.
3. Code preprocessing
The vulnerable code samples are preprocessed with the code parser generated in step 2: all code is converted to lowercase, and redundant spaces, tabs, newlines and all comments are removed. The indentation style is changed to Lisp style. Preprocessing then proceeds as follows:
Function and parameter name replacement: all formal parameter names are recorded from the function declaration, and every occurrence of a parameter in the function body is replaced with the symbol _PARAM. The function name in the function declaration is replaced with the symbol _FUNCDEC.
Variable identifier replacement: all variables defined in the function are replaced with the symbol _DATA.
Data type replacement: all data types declared in the ISO C standard are replaced with the symbol _TYPE, as are all user-defined structures in the code. The keyword "struct" in structure declarations is retained during replacement so that data types can be classified properly. Member variables declared in user-defined structures are not replaced, nor are type modifiers such as "signed" and "unsigned".
String constant replacement: string constants are replaced with the symbol _STR. Strings containing format specifiers such as "%s", "%d" or "%f" are not replaced.
Function call name replacement: the name of each called function is replaced with the symbol _FUNCTION, while its usage and argument values are retained.
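The replacement rules above can be sketched roughly as follows. This is a simplified, regex-based illustration and not the ANTLR-generated parser the method relies on: the function `normalize`, the token pattern and the sample C function are all assumptions made for the example. Parameter and variable names are supplied by the caller rather than recovered by parsing, and the Lisp-style indentation step is omitted.

```python
import re

ISO_C_TYPES = {"void", "char", "short", "int", "long", "float", "double"}
KEYWORDS = {"if", "else", "for", "while", "do", "switch", "case", "default",
            "return", "sizeof", "struct", "signed", "unsigned", "break",
            "continue", "goto", "static", "const"}

# One left-to-right pass, so text inside strings and comments is never
# mistaken for identifiers.
TOKEN = re.compile(r'''
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<keep>'(?:\\.|[^'\\])*'|\d\w*|(?:\.|->)\s*[a-z_]\w*)
  | (?P<struct>struct\s+[a-z_]\w*)
  | (?P<ident>[a-z_]\w*)(?P<call>\s*\()?
''', re.S | re.X)


def normalize(code, func_name, params, local_vars):
    """Return the normalized, non-empty lines of one C function."""
    params = {p.lower() for p in params}
    local_vars = {v.lower() for v in local_vars}
    func_name = func_name.lower()

    def replace(m):
        if m.group("comment"):
            return " "
        if m.group("string"):
            # Strings carrying format specifiers (%s, %d, %f) are kept.
            return m.group(0) if re.search(r"%[sdf]", m.group(0)) else "_STR"
        if m.group("keep"):
            # Character literals, numbers and struct member accesses.
            return m.group(0)
        if m.group("struct"):
            return "struct _TYPE"
        name, call = m.group("ident"), m.group("call") or ""
        if name in KEYWORDS:
            return m.group(0)
        if name in ISO_C_TYPES:
            return "_TYPE" + call
        if name in params:
            return "_PARAM" + call
        if name in local_vars:
            return "_DATA" + call
        if call:
            return ("_FUNCDEC" if name == func_name else "_FUNCTION") + call
        return m.group(0)

    text = TOKEN.sub(replace, code.lower())
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


# Hypothetical input function, not taken from any real patch.
SAMPLE = """
/* copy name */
static int Copy_Name(struct dentry *d, char *buf) {
    int len = strlen(buf);  // length
    printk("bad name");
    printk("%s", buf);
    return d->len + len;
}
"""

for line in normalize(SAMPLE, "Copy_Name", ["d", "buf"], ["len"]):
    print(line)
```

The sketch keeps one normalized statement per line, because the later fingerprinting step groups code into blocks of s lines.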
The result of preprocessing diff (1) shown in Table 1 is as follows:
4. Fingerprint generation
The MD5 algorithm with an 8-byte output is used as the hash function for computing the feature values of code blocks, and the vulnerability fingerprint is generated through the following steps:
4.1 Determine the code changes in the vulnerability patch from the diff file and select an appropriate code block size of s lines; s is usually 4.
4.2 In the preprocessed code, take each s-line block of code deleted in the diff file as block_D and each s-line block of code added in the diff file as block_A. When the deleted or added code is shorter than s lines, it is padded to an s-line block; otherwise it is split into multiple consecutive blocks, and the last block is padded with context so that the total is a multiple of s lines.
4.3 Compute the feature value of each block_D and block_A with the hash function. A "+" mark before a feature value denotes addition and a "-" mark denotes deletion. The feature values are grouped by type into the added feature sequence seq_A and the deleted feature sequence seq_D, which together form the feature vector V_diff = [seq_A, seq_D] of one diff file.
4.4 Perform the above steps for every diff file in the patch of the vulnerability to obtain the corresponding feature vectors V_diff1, V_diff2, …, V_diffn. For a vulnerability v whose patch contains n diff files, all the feature vectors together constitute the vulnerability fingerprint F_v = {V_diff1, V_diff2, …, V_diffn}.
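Steps 4.1 to 4.4 can be sketched as below. The helper names (`feature_value`, `to_blocks`, `diff_vector`, `fingerprint`) are hypothetical. Two details are assumptions of this sketch rather than statements of the patent: the 8-byte feature value is obtained by truncating the standard 16-byte MD5 digest, and padding takes context lines in the order they are supplied.

```python
import hashlib


def feature_value(block_lines):
    """Hash one code block (assumption: first 8 bytes of the MD5 digest)."""
    data = "\n".join(block_lines).encode("utf-8")
    return hashlib.md5(data).digest()[:8].hex()


def to_blocks(changed, context, s=4):
    """Split changed lines into s-line blocks, padding the tail with context."""
    lines = list(changed)
    pad = (-len(lines)) % s          # lines missing to reach a multiple of s
    lines += context[:pad]           # the last block stays short if context runs out
    return [lines[i:i + s] for i in range(0, len(lines), s)]


def diff_vector(added, deleted, context, s=4):
    """Build V_diff = [seq_A, seq_D] for one diff file."""
    seq_a = ["+" + feature_value(b) for b in to_blocks(added, context, s)]
    seq_d = ["-" + feature_value(b) for b in to_blocks(deleted, context, s)]
    return [seq_a, seq_d]


def fingerprint(diffs, s=4):
    """Build F_v from (added, deleted, context) triples, one per diff file."""
    return [diff_vector(a, d, c, s) for a, d, c in diffs]


# Placeholder lines stand in for preprocessed code.
F_v = fingerprint([(["a1"], ["d1", "d2", "d3", "d4", "d5"], ["c1", "c2", "c3"])])
print(F_v)
```

Here one added line is padded into a single block, while five deleted lines are split into two blocks with the second padded by context.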
Applying the above steps to the patch for vulnerability CVE-2016-6198 shown in Tables 1 and 2 yields the vulnerability fingerprint shown in Fig. 6.
5. Computing the feature value sequence of the code to be detected
After parsing and preprocessing, each function f in the code under test is divided by a sliding-window algorithm into code blocks of identical size, and the corresponding feature value sequence seq_f is computed with the same hash function as in step 4. The sliding-window size size_win is set by the code block size chosen for the vulnerability fingerprint. The feature value generation algorithm is described in Algorithm 1.
The algorithm takes the preprocessed function under test and the window size as input, where the window size size_win equals the code block size s determined in step 4. The feature value of the code block in each window is computed with the same hash function as in step 4, and the feature values of all windows form the feature sequence seq_f of the code under test, which is used for detection in the next step.
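A minimal sketch of this sliding-window computation follows. The function name `feature_sequence` is hypothetical; the window advancing one line at a time and the truncation of the MD5 digest to 8 bytes are assumptions of the sketch, since the text above fixes only the window size and the reuse of the step 4 hash function.

```python
import hashlib


def feature_sequence(function_lines, size_win=4):
    """Hash every size_win-line window of one preprocessed function."""
    seq_f = []
    # A function shorter than the window produces no feature values.
    for start in range(len(function_lines) - size_win + 1):
        window = function_lines[start:start + size_win]
        data = "\n".join(window).encode("utf-8")
        # Assumption: the 8-byte feature value is the truncated MD5 digest.
        seq_f.append(hashlib.md5(data).digest()[:8].hex())
    return seq_f


# Placeholder lines stand in for a preprocessed function body.
lines = ["l1", "l2", "l3", "l4", "l5", "l6"]
print(feature_sequence(lines, size_win=4))
```

Sliding by a single line means a cloned block is found wherever it starts inside the function, at the cost of hashing roughly one window per line.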
6. Vulnerability identification
6.1 Bitmap mapping
For the vulnerability fingerprint F_v of vulnerability v, this scheme maps it to an m-bit bitmap bitmap_v, where m satisfies $m = \sum_{i=1}^{n} \left( |seq_A^{(i)}| + |seq_D^{(i)}| \right)$, with $seq_A^{(i)}$ and $seq_D^{(i)}$ denoting the two sequences of V_diffi; that is, the size of the bitmap equals the total number of feature values contained in the fingerprint F_v. Each bit of the bitmap indicates whether the corresponding hash value in the fingerprint is present, which simplifies fingerprint identification. In the initial state, every bit of bitmap_v is set to 0.
6.2 Feature sequence traversal
For the seq_f obtained in step 5, traverse seq_f and check whether each feature value is present in the vulnerability fingerprint F_v. If it is, the bit corresponding to that feature value in bitmap_v is set to 1; otherwise nothing is modified. At the same time, a tag Tag composed of the name of the function to which seq_f belongs and the name of the file containing it is attached to that bit, so that the location where the vulnerable code clone occurs can be traced.
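Steps 6.1 and 6.2 can be sketched as follows. The names `map_to_bitmap` and `traverse` are hypothetical, feature values are stored without their "+" or "-" marks, and the choice to number bits in the order seq_A then seq_D for each V_diff is an assumption of this sketch; the text above only requires one bit per feature value.

```python
def map_to_bitmap(fingerprint):
    """Create an all-zero bitmap with one bit per feature value in F_v."""
    positions = {}  # feature value -> bit positions in bitmap_v
    m = 0
    for seq_a, seq_d in fingerprint:
        for value in seq_a + seq_d:
            positions.setdefault(value, []).append(m)
            m += 1
    return [0] * m, positions


def traverse(seq_f, tag, bitmap, positions, tags):
    """Set the bit of every fingerprint value found in seq_f and attach the tag."""
    for value in seq_f:
        for pos in positions.get(value, ()):
            bitmap[pos] = 1
            tags.setdefault(pos, []).append(tag)


# Placeholder feature values; a tag is (file name, function name).
F_v = [[["a1"], ["d1", "d2"]]]
bitmap_v, positions = map_to_bitmap(F_v)
tags = {}
traverse(["x", "d1", "d2"], ("fs/x.c", "foo"), bitmap_v, positions, tags)
print(bitmap_v, tags)
```

Keeping the tags in a side table keyed by bit position leaves the bitmap itself a plain list of 0 and 1 values.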
6.3 Vulnerability determination
Each vulnerable code clone must satisfy both of the following conditions:
Condition 1:
None of the code added in the diffs exists in the code under test, and all of the code deleted in the diffs exists completely in the code under test.
In the bitmap this condition appears as follows:
all values marked "+" in V_diff have their corresponding bits in bitmap_v equal to 0, and all values marked "-" in V_diff have their corresponding bits in bitmap_v equal to 1.
Condition 2:
In bitmap_v, for the bits corresponding to all feature values within the same V_diff, the file names in the tags must be equal. This reduces the false positives that can arise in large-scale code.
The bitmaps corresponding to two samples are shown in Fig. 7. In Fig. 7, (a) satisfies all conditions and is regarded as a vulnerable code clone; (b) does not satisfy the conditions and is not regarded as a vulnerable code clone.
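The two conditions can be checked with a sketch like the one below. The function name `is_vulnerable_clone` and the data layout are assumptions: bits are numbered in the order seq_A then seq_D for each V_diff, and `tags` maps a bit position to a list of (file name, function name) pairs. Condition 2 is interpreted here as requiring at least one file name common to all matched bits of a V_diff.

```python
def is_vulnerable_clone(fingerprint, bitmap, tags):
    """Apply condition 1 and condition 2 to every V_diff in the fingerprint."""
    pos = 0
    for seq_a, seq_d in fingerprint:
        added = range(pos, pos + len(seq_a))
        deleted = range(pos + len(seq_a), pos + len(seq_a) + len(seq_d))
        pos += len(seq_a) + len(seq_d)

        # Condition 1: no added value present, every deleted value present.
        if any(bitmap[i] for i in added):
            return False
        if not all(bitmap[i] for i in deleted):
            return False

        # Condition 2: the matched bits of this V_diff share a file name.
        files = [{name for name, _ in tags.get(i, [])} for i in deleted]
        if files and not set.intersection(*files):
            return False
    return True


# Placeholder data in the spirit of the two cases described for Fig. 7.
F_v = [[["a1"], ["d1", "d2"]]]
same_file = {1: [("fs/x.c", "foo")], 2: [("fs/x.c", "foo")]}
print(is_vulnerable_clone(F_v, [0, 1, 1], same_file))
print(is_vulnerable_clone(F_v, [1, 1, 1], same_file))
```

Returning as soon as one V_diff fails keeps the check cheap when a fingerprint spans many diff files.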
Through the above steps, detection of the vulnerable code clones present in the code under test is completed, and the file name and function name in Tag are reported as the location where the clone occurs, thereby achieving the purpose of vulnerable code clone detection.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
The units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Those of ordinary skill in the art may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present invention.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented with one or more integrated circuits; accordingly, each module/unit in the above embodiments may be implemented in hardware or as a software function module. The present invention is not limited to any particular combination of hardware and software.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A method for detecting vulnerable code clones based on a vulnerability fingerprint, characterized in that it comprises the following steps:
Step 1): select a vulnerability v for which a fingerprint is to be built, query the vulnerability patch information in a public vulnerability information database, and obtain the corresponding vulnerable code samples in the patch;
Step 2): build a code parser;
Step 3): preprocess the vulnerable code samples using the code parser to obtain a normalized intermediate representation;
Step 4): divide the intermediate representation into code blocks of s lines, compute the feature values of the code blocks with a hash function, and combine them to generate the vulnerability fingerprint;
Step 5): preprocess the code to be detected using the code parser, and compute the feature values of the code to be detected by a sliding-window method using the hash function to obtain its feature value sequence seq_f;
Step 6): map the vulnerability fingerprint to an n-bit bitmap bitmap_v, and use bitmap_v to identify whether a vulnerable code clone exists in the feature value sequence seq_f; if one exists, record and output it;
wherein obtaining the corresponding vulnerable code samples in the patch in step 1) specifically means: obtaining all the corresponding diff files in the patch and using the obtained diff files as the vulnerable code samples;
and step 4) comprises the following:
Step 41): determine the code changes in the vulnerability patch from the diff file, and select a code block size of s lines;
Step 42): for the intermediate representation obtained after preprocessing, denote each deleted s-line code block as block_D and each added s-line code block as block_A; when the deleted or added code is shorter than s lines it is padded, or it is split into multiple consecutive code blocks, the last of which is padded with context;
Step 43): compute the feature value of each block_D and block_A with the hash function, placing an addition mark or a deletion mark before each feature value to distinguish them; group the feature values by type into the added feature sequence seq_A and the deleted feature sequence seq_D, forming the feature vector V_diff = [seq_A, seq_D] of one diff file;
Step 44): perform steps 41) to 43) for all diff files in the vulnerability patch to obtain the corresponding feature vectors V_diff1, V_diff2, …, V_diffn; for a vulnerability v with n diff files in its patch, all the feature vectors together constitute the vulnerability fingerprint F_v = {V_diff1, V_diff2, …, V_diffn}.
2. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1, characterized in that building the code parser in step 2) means: using ANTLR to generate a C/C++ code parser from the C/C++ lexical and syntax definitions.
3. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1, characterized in that, before the preprocessing in step 3), the code is first converted to lowercase, redundant spaces, tabs, newlines and all comments are deleted, and the indentation style is changed to Lisp style.
4. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1, characterized in that the preprocessing in step 3) comprises: uniformly replacing, respectively, the function and parameter names, variable identifiers, data types, string constants and function call names in the code.
5. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1 or 4, characterized in that the preprocessing in step 3) comprises: recording all formal parameter names from the function declaration, replacing every parameter in the function body with the symbol _PARAM, and replacing the function name in the function declaration with the symbol _FUNCDEC; replacing all variables defined in the function with the symbol _DATA; replacing all data types declared in the ISO C standard with the symbol _TYPE, and likewise replacing all user-defined structures in the code, while retaining the keyword struct in structure declarations; replacing string constants with the symbol _STR; and replacing each function call with the symbol _FUNCTION while retaining its usage and argument values.
6. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1, characterized in that step 6) comprises the following:
Step 61): map the vulnerability fingerprint F_v to an m-bit bitmap bitmap_v, where m satisfies $m = \sum_{i=1}^{n} \left( |seq_A^{(i)}| + |seq_D^{(i)}| \right)$, with $seq_A^{(i)}$ and $seq_D^{(i)}$ denoting the two sequences of V_diffi; in the initial state, every bit of bitmap_v is set to 0;
Step 62): for the feature value sequence seq_f of the code to be detected, traverse seq_f and check whether each feature value is present in the vulnerability fingerprint F_v; if it is, set the bit corresponding to that feature value in bitmap_v to 1, and otherwise make no modification; at the same time, attach to that bit a tag Tag composed of the name of the function to which seq_f belongs and the name of the file containing it;
Step 63): determine that a code fragment satisfying the preset conditions is a vulnerable code clone and output it, the preset conditions comprising condition 1 and condition 2, wherein
Condition 1: none of the code added in the diffs exists in the code under test, and all of the code deleted in the diffs exists completely in the code under test;
Condition 2: in bitmap_v, for the bits corresponding to all feature values within the same V_diff, the file names in the tags are equal.
7. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 6, characterized in that condition 1 is judged from bitmap_v, in which it appears as follows: all values with an addition mark in V_diff have their corresponding bits in bitmap_v equal to 0, and all values with a deletion mark in V_diff have their corresponding bits in bitmap_v equal to 1.
8. A device for detecting vulnerable code clones based on a vulnerability fingerprint, characterized in that it is implemented on the basis of the method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1 and comprises: a collection module, a sample acquisition module, a code parser construction module, a sample preprocessing module, a vulnerability fingerprint generation module, a to-be-tested code preprocessing module and a code clone detection module, wherein
the collection module collects code samples using an extraction technique and builds a vulnerability database;
the sample acquisition module selects a vulnerability v for which a fingerprint is to be built, queries the vulnerability patch information in a public vulnerability information database, and obtains the corresponding vulnerable code samples in the patch;
the code parser construction module generates the code parser with ANTLR using an island grammar;
the sample preprocessing module preprocesses the vulnerable code samples using the code parser to obtain a normalized intermediate representation;
the vulnerability fingerprint generation module divides the intermediate representation into code blocks of s lines, computes the feature values of the code blocks with a hash function, and combines them to generate the vulnerability fingerprint;
the to-be-tested code preprocessing module preprocesses the code to be detected using the code parser, and computes the feature values of the code to be detected by a sliding-window method using the hash function to obtain the feature value sequence seq_f;
the code clone detection module maps the vulnerability fingerprint to an n-bit bitmap bitmap_v and uses bitmap_v to identify whether a vulnerable code clone exists in the feature value sequence seq_f; if one exists, it is located and output.
CN201710789364.6A 2017-09-05 2017-09-05 Fragility Code Clones detection method and its device based on loophole fingerprint Active CN107688748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710789364.6A CN107688748B (en) 2017-09-05 2017-09-05 Fragility Code Clones detection method and its device based on loophole fingerprint

Publications (2)

Publication Number Publication Date
CN107688748A CN107688748A (en) 2018-02-13
CN107688748B true CN107688748B (en) 2019-09-24

Family

ID=61155236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710789364.6A Active CN107688748B (en) 2017-09-05 2017-09-05 Fragility Code Clones detection method and its device based on loophole fingerprint

Country Status (1)

Country Link
CN (1) CN107688748B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant