CN113742731A - Data collection method for code vulnerability intelligent detection - Google Patents
Data collection method for code vulnerability intelligent detection
- Publication number
- CN113742731A
- Authority
- CN
- China
- Prior art keywords
- code
- vulnerability
- detection
- judgment
- data set
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3688—Test management for test execution, e.g. scheduling of test suites
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Stored Programmes (AREA)
Abstract
A data collection method for intelligent code vulnerability detection first constructs an initial code vulnerability data set, then uses a trained machine learning model to process unlabeled code, and expands the data set according to the model's labels and the manual labels. The initial data set is built by combining the results of code vulnerability detection tools with the judgments of testers. The machine learning model is trained on this initial data set; for unlabeled code, the model's judgment and the testers' judgment are combined to decide whether a reported vulnerability is a false alarm, and the data set is expanded according to the false alarms found.
Description
Technical Field
The invention belongs to the field of software engineering. It relates in particular to the application of code vulnerability false-alarm detection and machine learning in software engineering, and is used to construct and collect code vulnerability data sets.
Background
As modern software products grow more complex, manual testing alone can no longer detect software vulnerabilities quickly enough. Traditional vulnerability discovery techniques are theoretically mature: vulnerabilities can be mined from code by model checking, fuzz testing, symbolic execution, and binary comparison. These techniques are now largely automated and can scan code under test for specific types of vulnerabilities with minimal human intervention. However, automated code vulnerability detection tools still face problems, for example:
1) Code vulnerability detection tools must trade detection efficiency against accuracy. Whether a tool analyzes the syntax or the execution paths of the code, it must build a complex analysis model, which easily leads to an excessively large solution space or to path explosion. Because of these technical limits, accurate analysis takes considerable time, which is often unacceptable in practice.
2) Code vulnerability detection tools rely on rules predefined by human experts, so the vulnerabilities they detect are often limited to specific types. Manually defined vulnerability rules are highly subjective and rarely cover every case, and imperfect rules cause both false negatives (missed vulnerabilities) and false positives (false alarms).
3) The detection capability of a code vulnerability detection tool is fixed. For programs with a low level of security, most reported vulnerabilities are real. However, as bugs are fixed and programs become more secure, the proportion of false alarms rises. Unless the tools' capabilities improve, developers spend most of their time manually checking and labeling invalid vulnerability reports.
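To make the "path explosion" in item 1) concrete: assuming a function is just n sequential if-statements with independent conditions, a path-sensitive analysis has to consider 2^n execution paths. The sketch below simply enumerates them; the function names are hypothetical, real analyzers prune infeasible paths, and loops can make the count unbounded.

```python
from itertools import product


def enumerate_paths(num_branches: int):
    """Yield one path per true/false assignment of independent if-statements."""
    return product((True, False), repeat=num_branches)


def count_paths(num_branches: int) -> int:
    """Count the enumerated paths; equals 2 ** num_branches."""
    return sum(1 for _ in enumerate_paths(num_branches))


for n in (1, 10, 20):
    print(n, count_paths(n))  # 2, 1024, 1048576 paths respectively
```

Adding a single branch doubles the work, which is why exhaustive path analysis quickly becomes too slow in practice.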
In summary, missed reports and false alarms are very common when automatic detection tools are used. The problem of excessive false alarms can be addressed by improving the detection model. With continuing breakthroughs in machine learning and deep learning, machine learning can help code vulnerability detection tools improve detection accuracy and reduce the false-alarm rate. However, the accuracy of a machine learning model depends heavily on the size of its data set, and overfitting may occur when too little data is available for training.
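One way to make the "false-alarm rate" concrete is the share of tool-reported vulnerabilities that testers reject. In classifier terms this is the false discovery rate of the tool's reports, FP / (FP + TP), not the false-positive rate FP / (FP + TN). A minimal sketch with invented numbers; the function name is hypothetical:

```python
def false_alarm_ratio(tester_confirms: list[bool]) -> float:
    """Fraction of tool-reported alarms that testers judged not to be real vulnerabilities.

    tester_confirms[i] is True if the tester confirmed alarm i as a real vulnerability.
    """
    if not tester_confirms:
        return 0.0
    false_alarms = sum(1 for confirmed in tester_confirms if not confirmed)
    return false_alarms / len(tester_confirms)


# Invented example: the tool reports 10 alarms and testers confirm only 3.
print(false_alarm_ratio([True] * 3 + [False] * 7))  # 0.7
```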
Existing techniques for constructing and collecting code vulnerability data sets have the following problems:
1) Vulnerability data are collected in different ways and vary in quality, and the data sets use different formats. Because no universal, efficient data set exists, researchers can build data sets automatically only by web crawling.
2) Existing data sets do not account for the new code vulnerabilities that keep appearing as software is used. Because the data sets cannot be updated, the detection model cannot be improved effectively.
Disclosure of Invention
To address these shortcomings of the prior art, the invention solves the following technical problem: in intelligent code vulnerability detection, the original data set cannot be expanded, which limits the accuracy of vulnerability false-alarm detection.
To solve this problem, the invention adopts the following technical solution: a data set expansion method for intelligent code vulnerability detection, comprising the following steps:
1) submitting the original code to an automatic vulnerability detection tool for detection;
2) giving the original code to a tester for vulnerability labeling;
3) comparing the detection results with the tester's labels, identifying the vulnerabilities that are false alarms, and constructing an initial code vulnerability data set;
4) using a machine learning model to learn the relationship between vulnerable code and whether a report is a false alarm;
5) processing unlabeled code with the trained machine learning model;
6) submitting the vulnerable code that the model identifies as a false alarm to a tester for review;
7) adding the vulnerability and the review result to the previously constructed code vulnerability data set.
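The seven steps above can be read as one collection loop. The sketch below wires them together with injected callables standing in for the detection tool, the tester and the learning algorithm. `Finding`, `Record`, `build_initial_dataset`, `expand_dataset` and every `toy_*` function are hypothetical, and the toy "model" is a deliberately trivial token rule rather than a real learner. It follows step 7 and stores the review result whichever way the tester decides (claim 4 describes a narrower variant that adds only segments confirmed as false alarms). Only findings the model flags as false alarms reach the tester.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    code: str       # code segment flagged by the detection tool
    vuln_type: str  # vulnerability type taken from the tool's report


@dataclass
class Record:
    code: str
    vuln_type: str
    is_false_alarm: bool  # True if the tester rejected the tool's finding


def build_initial_dataset(sources, detect, tester) -> List[Record]:
    """Steps 1-3: run the tool, let the tester judge each finding, label false alarms."""
    return [
        Record(f.code, f.vuln_type, is_false_alarm=not tester(f))
        for src in sources
        for f in detect(src)
    ]


def expand_dataset(dataset, new_sources, detect, tester, train) -> List[Record]:
    """Steps 4-7: train, screen new findings, review predicted false alarms, append in place."""
    model = train(dataset)                     # step 4: learn from the labeled data
    for src in new_sources:
        for f in detect(src):                  # tool report on unlabeled code
            if model(f):                       # step 5: model predicts a false alarm
                confirmed = tester(f)          # step 6: tester reviews it
                dataset.append(Record(f.code, f.vuln_type, is_false_alarm=not confirmed))  # step 7
    return dataset


# Toy stand-ins (hypothetical, for illustration only).
def toy_detect(src: str) -> List[Finding]:
    # "Tool": flags every line that calls strcpy.
    return [Finding(line, "buffer-overflow") for line in src.splitlines() if "strcpy" in line]


def toy_tester(f: Finding) -> bool:
    # "Tester": treats a call as a real vulnerability unless it is guarded by a sizeof check.
    return "sizeof" not in f.code


def toy_train(records: List[Record]) -> Callable[[Finding], bool]:
    # "Model": tokens seen only in false-alarm records mark a finding as a likely false alarm.
    fa = {t for r in records if r.is_false_alarm for t in r.code.split()}
    real = {t for r in records if not r.is_false_alarm for t in r.code.split()}
    markers = fa - real
    return lambda f: any(t in markers for t in f.code.split())


dataset = build_initial_dataset(
    ["strcpy(buf, src);\nif (n < sizeof buf) strcpy(buf, src);"], toy_detect, toy_tester)
n_initial = len(dataset)  # 2 labeled records
expand_dataset(dataset, ["strcpy(dst, s);\nif (m < sizeof dst) strcpy(dst, s);"],
               toy_detect, toy_tester, toy_train)
print(n_initial, len(dataset))  # 2 3
```

Retraining by calling `expand_dataset` again on the grown list gives the iterative training that the next paragraph describes.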
With this technical solution, the invention provides a method for expanding a code vulnerability data set: the original data set is continuously expanded while the false-alarm detection model is in use, the model can then be retrained iteratively, and higher accuracy is achieved in subsequent detection.
Drawings
FIG. 1 is an overall flow chart of the present invention.
Detailed Description
To explain the technical content, objectives, and results of the invention in detail, a specific embodiment is described below:
1) Run a code vulnerability detection tool automatically and obtain the vulnerability types and vulnerability locations from its detection report.
2) Extract the code segments that may contain vulnerabilities, and have a tester judge whether each one actually contains a vulnerability.
3) Compare the tool's detection results with the tester's judgments. If they agree, the detection is considered correct; if a vulnerability reported by the tool is not labeled as a vulnerability by the tester, it is considered a false alarm. The vulnerable source code segments, together with the judgment of whether each is a false alarm, form an initial code vulnerability false-alarm data set.
4) Train a machine learning algorithm on this data set to learn the relationship between the vulnerable source code text and whether a report is a false alarm, yielding a trained model.
5) After the code vulnerability detection tool has detected vulnerabilities in other code, process the vulnerable code indicated in the detection report with the trained model. If the model identifies a code segment as a false alarm, the segment is handed to a tester for judgment; if the tester confirms that it contains no vulnerability, it is labeled as a false alarm and added to the data set.
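Step 4) leaves the learning algorithm open. As one hypothetical choice, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing on tokens of the flagged code segments, where label `True` means "false alarm". The tokenizer, the class name `NaiveBayesFalseAlarm` and the four training snippets are invented for illustration; a real data set would be far larger.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, integer literals, and single non-space characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list:
    return TOKEN_RE.findall(code)


class NaiveBayesFalseAlarm:
    """Multinomial naive Bayes over code tokens; label True means 'false alarm'."""

    def fit(self, codes, labels):
        if set(labels) != {True, False}:
            raise ValueError("need at least one example of each label")
        self.doc_counts = Counter(labels)  # used for the class priors
        self.token_counts = {True: Counter(), False: Counter()}
        for code, label in zip(codes, labels):
            self.token_counts[label].update(tokenize(code))
        self.vocab = set(self.token_counts[True]) | set(self.token_counts[False])
        return self

    def _log_score(self, tokens, label):
        counts = self.token_counts[label]
        denom = sum(counts.values()) + len(self.vocab)  # add-one (Laplace) smoothing
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for t in tokens:
            if t in self.vocab:  # ignore tokens never seen in training
                score += math.log((counts[t] + 1) / denom)
        return score

    def predict(self, code: str) -> bool:
        tokens = tokenize(code)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


training_codes = [
    "strcpy(buf, input);",                            # real vulnerability
    "gets(line);",                                    # real vulnerability
    "strncpy(buf, input, sizeof(buf) - 1);",          # false alarm
    "if (len < sizeof(buf)) memcpy(buf, src, len);",  # false alarm
]
training_labels = [False, False, True, True]

model = NaiveBayesFalseAlarm().fit(training_codes, training_labels)
print(model.predict("memcpy(dst, src, sizeof(dst));"))  # True: predicted false alarm
print(model.predict("gets(buf);"))                      # False: predicted real vulnerability
```

Naive Bayes is used here only because it fits in a few lines of standard-library Python; any classifier that maps code text to a false-alarm label would fill the same role in step 5).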
Claims (4)
1. A data collection method for intelligent detection of code vulnerabilities, characterized in that an initial code vulnerability data set is constructed by combining the results of a code vulnerability detection tool with the judgment of a tester; a machine learning model for false-alarm judgment is then trained on the initial code vulnerability data set; finally, the vulnerabilities that are false alarms are determined by combining the judgment of the machine learning model with the judgment of the tester, and these vulnerabilities are added to the code vulnerability data set.
2. The data collection method for intelligent detection of code vulnerabilities according to claim 1, wherein the results of the code vulnerability detection tool are combined with the judgment of the tester as follows: first, the detection reports of several different code vulnerability detection tools are integrated as the final result of tool detection; then, the tester judges each vulnerability detected by the tools, and if the tester judges that it is not a vulnerability, a false-alarm result is recorded; the data items of the data set include the vulnerable code segment, the vulnerability type, and whether it is a false alarm.
3. The data collection method for intelligent detection of code vulnerabilities according to claim 1, characterized in that the code vulnerability data set of claim 2 is used to train a machine learning model for false-alarm judgment, and after training the model can predict, for a newly given code segment, whether it is a false-alarm vulnerability.
4. The data collection method for intelligent detection of code vulnerabilities according to claim 1, wherein the vulnerabilities that are false alarms are determined by combining the judgment of the machine learning model with the judgment of the tester: each code segment that may contain a vulnerability is first input into the machine learning model of claim 3 to judge whether it is a false alarm produced during vulnerability detection; if the model judges it to be a false alarm, the code segment is handed to a tester for inspection; if the inspection confirms that the code segment is not actually a vulnerability, it is added to the initial data set.
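Claim 2 does not say how the reports of several tools are integrated. The sketch below assumes a simple union, de-duplicated on a hypothetical (file, line, vulnerability type) key, and then emits rows with the three data items the claim lists. `Alarm`, `merge_reports`, `to_records`, the two tool reports and the tester verdict are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Alarm:
    file: str       # assumed location fields; the claim does not specify them
    line: int
    vuln_type: str
    code: str       # the flagged code segment


def merge_reports(*reports: List[Alarm]) -> List[Alarm]:
    """Union of several tools' alarms, keeping the first alarm seen per (file, line, type)."""
    merged: Dict[Tuple[str, int, str], Alarm] = {}
    for report in reports:
        for alarm in report:
            merged.setdefault((alarm.file, alarm.line, alarm.vuln_type), alarm)
    return list(merged.values())


def to_records(alarms: List[Alarm], tester_confirms: Callable[[Alarm], bool]) -> List[dict]:
    """One data set row per alarm: code segment, vulnerability type, false-alarm flag."""
    return [
        {
            "code_segment": a.code,
            "vuln_type": a.vuln_type,
            "is_false_alarm": not tester_confirms(a),
        }
        for a in alarms
    ]


# Toy reports from two hypothetical tools; both flag io.c line 12.
tool_a = [Alarm("io.c", 12, "buffer-overflow", "strcpy(buf, s);"),
          Alarm("io.c", 30, "null-deref", "*p = 0;")]
tool_b = [Alarm("io.c", 12, "buffer-overflow", "strcpy(buf, s);"),
          Alarm("net.c", 7, "format-string", "printf(msg);")]

merged = merge_reports(tool_a, tool_b)  # 3 distinct alarms
# Hypothetical tester verdict: only the null-dereference report is rejected.
rows = to_records(merged, lambda a: a.vuln_type != "null-deref")
print([r["is_false_alarm"] for r in rows])  # [False, True, False]
```

A union keeps every alarm any tool raised, which suits the goal of collecting candidate false alarms; an intersection or voting scheme would be an equally plausible reading of "integrating" the reports.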
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010487163.2A CN113742731A (en) | 2020-05-27 | 2020-05-27 | Data collection method for code vulnerability intelligent detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113742731A (en) | 2021-12-03 |
Family ID: 78727931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010487163.2A Pending CN113742731A (en) | 2020-05-27 | 2020-05-27 | Data collection method for code vulnerability intelligent detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113742731A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942497A (en) * | 2013-09-11 | 2014-07-23 | 杭州安恒信息技术有限公司 | Forensics type website vulnerability scanning method and system |
CN104486141A (en) * | 2014-11-26 | 2015-04-01 | 国家电网公司 | Misdeclaration self-adapting network safety situation predication method |
CN110517469A (en) * | 2019-08-08 | 2019-11-29 | 武汉兴图新科电子股份有限公司 | A kind of intelligent alarm convergence method suitable for audio-video convergence platform |
CN110753047A (en) * | 2019-10-16 | 2020-02-04 | 杭州安恒信息技术股份有限公司 | Method for reducing false alarm of vulnerability scanning |
CN110929267A (en) * | 2019-11-29 | 2020-03-27 | 深信服科技股份有限公司 | Code vulnerability detection method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109308411B (en) | Method and system for hierarchically detecting software behavior defects based on artificial intelligence decision tree | |
CN105471882A (en) | Behavior characteristics-based network attack detection method and device | |
CN112733156B (en) | Intelligent detection method, system and medium for software vulnerability based on code attribute graph | |
CN111209570B (en) | Method for creating safe closed loop process based on MITRE ATT&CK | |
CN113392784B (en) | Automatic editing method for application security detection task based on vulnerability fingerprint identification | |
CN112147221B (en) | Steel rail screw hole crack identification method and system based on ultrasonic flaw detector data | |
CN110309073A (en) | Mobile applications user interface mistake automated detection method, system and terminal | |
CN115277180B (en) | Block chain log anomaly detection and tracing system | |
Yang et al. | Vuldigger: A just-in-time and cost-aware tool for digging vulnerability-contributing changes | |
CN116578980A (en) | Code analysis method and device based on neural network and electronic equipment | |
CN115952503A (en) | Application safety testing method and system integrating black, white and gray safety detection technology | |
CN117336055A (en) | Network abnormal behavior detection method and device, electronic equipment and storage medium | |
CN116383833A (en) | Method and device for testing software program code, electronic equipment and storage medium | |
CN115964757A (en) | Drainage basin environment monitoring and disposal method and device based on block chain | |
CN113779590B (en) | Source code vulnerability detection method based on multidimensional characterization | |
CN117368651B (en) | Comprehensive analysis system and method for faults of power distribution network | |
CN117114420B (en) | Image recognition-based industrial and trade safety accident risk management and control system and method | |
CN111855825B (en) | Rail head nuclear injury identification method and system based on BP neural network | |
CN113742731A (en) | Data collection method for code vulnerability intelligent detection | |
CN104751059A (en) | Function template based software behavior analysis method | |
CN104035866A (en) | Software behavior evaluation method and device based on system calling and analysis | |
CN115062315A (en) | Multi-tool inspection-based security code examination method and system | |
CN113051161A (en) | API misuse detection method based on historical code change information | |
CN113836539A (en) | Power engineering control system leak full-flow disposal system and method based on precise test | |
CN115795467A (en) | Intelligent evaluation method for computer software bugs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |