CN111753303A - Multi-granularity code vulnerability detection method based on deep learning and reinforcement learning - Google Patents

Publication number
CN111753303A
Authority
CN
China
Prior art keywords
code
vulnerability
vulnerability detection
learning
representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010747186.2A
Other languages
Chinese (zh)
Other versions
CN111753303B (en)
Inventor
蒋远
苏小红
王甜甜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202010747186.2A
Publication of CN111753303A
Application granted
Publication of CN111753303B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Virology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a multi-granularity code vulnerability detection method based on deep learning and reinforcement learning, comprising the following steps: 1) parsing the source code to obtain the corresponding intermediate code representation; 2) slicing the intermediate code to obtain code segments smaller than the source program; 3) converting each input code segment into a low-dimensional continuous real-valued vector using a code segment representation method; 4) feeding the vector representation of the code segment into a coarse-grained code vulnerability detection model based on deep learning to judge whether the code segment contains a vulnerability; and 5) constructing a fine-grained code vulnerability detection model based on reinforcement learning to predict the specific code lines that cause the vulnerability in code segments judged to be vulnerable. The invention provides a complete multi-granularity code vulnerability detection framework, applies reinforcement learning to fine-grained code vulnerability detection for the first time, and proposes a new staged code representation learning model that makes full use of the semantic information of a program, thereby improving the accuracy and practicality of vulnerability detection.

Description

Multi-granularity code vulnerability detection method based on deep learning and reinforcement learning
Technical Field
The invention relates to a code vulnerability detection method, in particular to a multi-granularity code vulnerability detection method based on deep learning and reinforcement learning technologies.
Background
Software vulnerabilities are defects introduced during the software life cycle that attackers can exploit to bypass a system's access control and illegally obtain elevated privileges, allowing them to operate the system at will, for example by triggering privileged commands, accessing sensitive information, impersonating identities, or monitoring system operation. If a security-related vulnerability is not identified and repaired in time, it is easily exploited by a malicious attacker, who can then intrude into the system and cause unreliable results or serious security problems such as arbitrary command execution and arbitrary file reading.
Code analysis is a main means of checking for and discovering inherent defects and exploitation paths in software code, and it has long been a research focus in information security and software security. However, as software systems grow larger and more complex, vulnerabilities occur more frequently and attack techniques keep advancing. Traditional vulnerability detection tools based on predefined rules can no longer meet the needs of modern software development, and more researchers have turned to vulnerability detection methods based on machine learning and deep learning. Machine-learning-based methods rely on experts to define code features manually (e.g., software complexity metrics, function calls, code changes, and system calls) and then use a machine learning model to classify code as vulnerable or non-vulnerable. Because the definition of these features is subjective, such methods generally suit only specific projects and generalize poorly. Moreover, the code fed to the machine learning model is generally coarse-grained, so the exact location of the vulnerable code lines cannot be determined. Deep-learning-based methods do not require experts to define features manually and can derive vulnerability patterns automatically from large amounts of historical data. They are expected to change how source code vulnerabilities are detected, shifting the construction of vulnerability patterns for many vulnerability types from manual expert definition to automatic generation and markedly improving detection effectiveness.
However, research on these methods has only just begun. Most work focuses on coarse-grained vulnerability detection, for example at the function or file level, and research on the "vulnerability structure" of code is severely lacking. Knowledge of the vulnerability structure lets a detection tool not only judge whether the code contains a vulnerability but also indicate the specific form of the vulnerability and the position where it occurs. For deep-learning-based vulnerability detection to be applied well in practice, research on code vulnerability structure is necessary.
VulDeeLocator (Z. Li, D. Zou, S. Xu, Z. Chen, Y. Zhu, and H. Jin, "VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector," arXiv preprint arXiv:2001.02350, 2020) is the only work that could be found at present that achieves statement-level fine-grained vulnerability localization based on deep learning. However, judging from its experimental results, the fine-grained detection of VulDeeLocator is not clearly superior to traditional rule-based vulnerability detection tools, because its network structure cannot adequately capture the semantic information of the program and the code structure related to the vulnerability.
Disclosure of Invention
The invention aims to provide a multi-granularity code vulnerability detection method based on deep learning and reinforcement learning that can not only detect vulnerable software modules (such as functions) at a coarse granularity but also locate, at a finer granularity, the code statements within a module that may cause the vulnerability; that is, it realizes multi-granularity code vulnerability detection. In addition, the accuracy of a vulnerability detection model depends on the accuracy of its input, namely the accuracy of the code segment representation. Because existing token-based code representation methods cannot make full use of code semantic information, the invention proposes a new Staged Code Representation Learning method. The method first learns a vector representation of each statement in the code and then learns the vector representation of the entire program from the statement vectors. Separating the statement representation from the program representation lets the learned code vector capture subtler structural (syntactic) differences between programs as well as more complex semantic information.
The object of the invention is achieved by the following technical solution:
A multi-granularity code vulnerability detection method based on deep learning and reinforcement learning proceeds as follows. First, the source code is parsed to obtain the corresponding intermediate code representation. Compared with the source code, the intermediate code representation captures more program control flow and variable definition-use information. Second, key points that may cause a vulnerability are taken as the slicing criterion, and the intermediate code is sliced to obtain code segments (code gadgets) smaller than the source program, which shortens the model's input sequence and keeps vulnerability-irrelevant statements from drowning out important information. Third, the input code segment is converted into a low-dimensional continuous real-valued vector using the code segment representation learning model proposed by the invention. Next, the vector representation of the code segment is fed into a coarse-grained code vulnerability detection model based on deep learning, which judges whether the corresponding code segment contains a vulnerability. Finally, if the coarse-grained model detects a vulnerability in the input code segment, the fine-grained detection model based on reinforcement learning makes a further judgment, namely finding the code lines that may cause the vulnerability.
The method specifically comprises the following steps:
Step 1: Perform static analysis on the source program using the Clang tool to obtain the intermediate code representation of the program;
Step 2: Extract key points that may cause a vulnerability, generate the slicing criterion, slice the intermediate code, and combine the forward slice and the backward slice to obtain a code segment of the program;
Step 3: Represent the code segment as a low-dimensional continuous real-valued vector using the code segment representation learning method;
Step 4: Feed the vector representation of the code segment into a coarse-grained code vulnerability detection model based on deep learning and judge whether the code segment contains a vulnerability;
Step 5: Construct a fine-grained vulnerability detection model based on reinforcement learning and predict the specific code lines that cause the vulnerability in code segments judged to be vulnerable.
Compared with the prior art, the invention has the following advantages:
1. Whereas existing methods can detect vulnerabilities only at a coarse granularity, the invention both detects vulnerable code modules at a coarse granularity and locates vulnerable statements at a fine granularity, which improves the practicality of data-driven vulnerability detection and the interpretability of the model's predictions.
2. Compared with existing token-based code representation methods, the new staged code representation learning method makes full use of the local and global semantic information of a program, improves the accuracy of the generated code representation vector, and thereby improves the vulnerability detection capability of the model.
3. The invention is the first to apply reinforcement learning to the fine-grained code vulnerability detection task. The model repeatedly tries different combinations of candidate vulnerable statements on the training examples, and the number of truly vulnerable statements contained in each combination is fed back to the policy module as a signal for adjusting the model's behavior. By accumulating these signals, the model automatically learns the code structure related to vulnerabilities.
Drawings
FIG. 1 is a general flowchart of the multi-granularity code vulnerability detection method proposed by the present invention.
FIG. 2 shows the statement node in the syntax tree that corresponds to a function call key point.
FIG. 3 shows the statement node in the syntax tree that corresponds to an array definition key point.
FIG. 4 shows the statement node in the syntax tree that corresponds to a pointer definition key point.
FIG. 5 shows the statement node in the syntax tree that corresponds to an expression statement key point.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings, but it is not limited thereto. Any modification or equivalent replacement of the technical solution that does not depart from its spirit and scope shall fall within the protection scope of the present invention.
The invention realizes coarse-grained code vulnerability identification with deep learning and fine-grained vulnerable statement localization with reinforcement learning. First, the source code is parsed to obtain the corresponding intermediate code representation (e.g., LLVM IR). Second, the intermediate code is sliced, with key points that may cause a vulnerability as the slicing criterion, to obtain code segments (code gadgets) smaller than the source program. Third, the code segment representation learning method converts the input code segments into low-dimensional continuous real-valued vectors. Next, the vector representation of the code segment is fed into the coarse-grained vulnerability detection model based on deep learning, which judges whether the corresponding code segment contains a vulnerability. Finally, if the coarse-grained model detects a vulnerability in the input code segment, the fine-grained vulnerability detection model based on reinforcement learning makes a further judgment, namely finding the code lines that cause the vulnerability.
As shown in fig. 1, the specific steps are as follows:
Step 1: Perform static analysis on the source program using the Clang tool to obtain the intermediate code representation of the program.
Step 2: Extract key points that may cause a vulnerability, generate the slicing criterion, slice the intermediate code, and combine the forward slice and the backward slice to obtain a code segment of the program. The specific steps are as follows:
Step 21: Parse the program into an abstract syntax tree using program analysis techniques;
Step 22: Traverse the generated abstract syntax tree with a node matching algorithm to find four kinds of syntax tree nodes that may cause code vulnerabilities: (1) function call nodes (i.e., Callee), as shown in FIG. 2; (2) array definition statement nodes (i.e., IdentifierDeclStatement nodes whose statement contains the "[" and "]" characters), as shown in FIG. 3; (3) pointer definition statement nodes (i.e., IdentifierDeclStatement nodes whose statement contains the "*" character), as shown in FIG. 4; (4) expression statement nodes (i.e., ExpressionStatement), as shown in FIG. 5;
Step 23: Filter the four kinds of nodes extracted from the program and select the syntax tree nodes that satisfy the conditions as key points that may cause a vulnerability. For example, if the identifier (Identifier) of a Callee node is in a predefined list of library functions that may cause vulnerabilities, the parent node (CallExpression) of that Callee node is taken as the statement S in the slicing criterion;
Step 24: Extract the variables involved in statement S as V in the slicing criterion, and combine S and V to obtain the slicing criterion (S, V) used to extract slices;
Step 25: Parse the code into a program dependence graph using program analysis techniques, perform forward and backward slicing on the program dependence graph according to the slicing criterion generated in step 24, and combine the forward and backward slices to obtain the program slices related to the four kinds of key points that may cause a vulnerability;
Step 26: Convert the program slice of the source program into a program slice of the intermediate code according to the correspondence between the source program and the intermediate code.
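The key-point filter of step 23 and the slicing of steps 24 and 25 can be illustrated with a minimal Python sketch. It is not the patented implementation: the statements, the dependence edges, the `RISKY_CALLS` list, and the helper names are all hypothetical, and the sketch slices on the key statement S as a whole rather than on each variable in V.

```python
from collections import deque

RISKY_CALLS = {"memset", "memcpy", "strcpy"}  # hypothetical "predefined library function list"

def is_call_key_point(statement):
    """Step 23 for call nodes: the callee name must be in the risky-function list."""
    callee = statement.split("(", 1)[0].strip()
    return "(" in statement and callee in RISKY_CALLS

def reachable(graph, start):
    """Return every node reachable from `start` by following edges in `graph`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Flip every edge so that a statement points at the statements it depends on."""
    flipped = {}
    for src, dsts in graph.items():
        for dst in dsts:
            flipped.setdefault(dst, set()).add(src)
    return flipped

def slice_for_key_line(pdg, key_line):
    """Step 25: union of the forward and backward slices of the key statement."""
    forward = reachable(pdg, key_line)
    backward = reachable(reverse(pdg), key_line)
    return sorted(forward | backward)

# Hypothetical program, one statement per line number.
statements = {
    1: "char buf[100];",
    2: "int n = read_len();",
    3: "memset(buf, 'A', n);",
    4: "int unrelated = 0;",
    5: "buf[n] = 0;",
    6: 'printf("%d", unrelated);',
}
# Hypothetical dependence edges: line -> lines that depend on it.
pdg = {1: {3, 5}, 2: {3, 5}, 3: {5}, 4: {6}}

key_lines = [line for line, text in statements.items() if is_call_key_point(text)]
gadget_lines = slice_for_key_line(pdg, key_lines[0])
print(key_lines, gadget_lines)  # [3] [1, 2, 3, 5]
```

Lines 4 and 6 are dropped because they neither influence nor depend on the `memset` call, which is the effect the slicing step aims for: a shorter input that keeps only statements related to the key point.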
Step 3: Represent the code segment as a low-dimensional continuous real-valued vector using the code segment representation learning method. The specific steps are as follows:
Step 31: Split the code segment into statements;
Step 32: Construct a CNN-based Statement Encoding Network (SENet), whose schematic is shown in the upper part of FIG. 1;
Step 33: Split each statement obtained in step 31 into a token sequence using the space as the separator, feed the sequence into the statement encoding network, and output the vector representation of the statement;
Step 34: Construct an LSTM-based Program Encoding Network (PENet), whose schematic is shown in the upper right part of FIG. 1;
Step 35: Take the vector representations of the statements in the code segment obtained in step 33 as the input of the program encoding network, and output the hidden-layer vector of the last time step as the vector representation of the code segment.
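The two-stage encoding of steps 31 to 35 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the weights are random and untrained, bias terms are omitted, the dimensions `EMB` and `HID`, the convolution window of two tokens, and the example code segment are all hypothetical, and the functions are stand-ins for SENet and PENet rather than the networks described in the patent.

```python
import math
import random

rng = random.Random(0)
EMB, HID = 4, 3  # hypothetical embedding and hidden sizes

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tokenize(gadget):
    """Steps 31 and 33: one statement per line, tokens separated by spaces."""
    return [line.split() for line in gadget.splitlines() if line.strip()]

embeddings = {}                          # token -> vector, created on first use
conv_w = rand_matrix(HID, 2 * EMB)       # filters over windows of two tokens
lstm_w = {gate: rand_matrix(HID, 2 * HID) for gate in "ifog"}

def encode_statement(tokens):
    """SENet stand-in: convolution over token windows, ReLU, max-over-time pooling."""
    vecs = [embeddings.setdefault(t, [rng.uniform(-0.1, 0.1) for _ in range(EMB)])
            for t in tokens]
    if len(vecs) == 1:
        vecs = vecs + [[0.0] * EMB]      # pad so that at least one window exists
    windows = [vecs[i] + vecs[i + 1] for i in range(len(vecs) - 1)]
    feats = [[max(0.0, a) for a in matvec(conv_w, w)] for w in windows]
    return [max(col) for col in zip(*feats)]

def encode_gadget(stmt_vecs):
    """PENet stand-in: LSTM over statement vectors; returns the last hidden state."""
    h, c = [0.0] * HID, [0.0] * HID
    for x in stmt_vecs:
        z = x + h                        # concatenate input and previous hidden state
        i = [sigmoid(a) for a in matvec(lstm_w["i"], z)]
        f = [sigmoid(a) for a in matvec(lstm_w["f"], z)]
        o = [sigmoid(a) for a in matvec(lstm_w["o"], z)]
        g = [math.tanh(a) for a in matvec(lstm_w["g"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

gadget = "char buf [ 100 ] ;\nmemset ( buf , 'A' , n ) ;\nbuf [ n ] = 0 ;"
statements = tokenize(gadget)
gadget_vec = encode_gadget([encode_statement(s) for s in statements])
print(len(statements), len(gadget_vec))  # 3 3
```

Encoding each statement separately before the sequence model sees it is what keeps the statement-level vectors available for the fine-grained stage in step 5.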
Step 4: Feed the vector representation of the code segment into the coarse-grained code vulnerability detection model based on deep learning and judge whether the code segment contains a vulnerability. The specific steps are as follows:
Step 41: Construct a coarse-grained code vulnerability detection model (DetectrNet, DNet) based on a fully connected network with a single hidden layer, whose schematic is shown in the lower middle part of FIG. 1;
Step 42: Take the vector representation of the code segment as the input of the model and output the probability that the code segment contains a vulnerability;
Step 43: Take the predicted probability and the true label as the input of a cross-entropy loss function, calculate the prediction error, and update the parameters of the code segment representation learning model (i.e., the CNN-based SENet and the LSTM-based PENet) and of the coarse-grained vulnerability detection model (DNet) using the back-propagation algorithm.
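A minimal sketch of steps 41 to 43 follows, assuming a tanh hidden layer, a sigmoid output, and plain gradient descent on a single example. The layer sizes, learning rate, input vector, and label are hypothetical, and the sketch updates only the detector's own weights, whereas the patent also propagates the error back into SENet and PENet.

```python
import math
import random

rng = random.Random(0)
IN, HID = 3, 4  # hypothetical input and hidden sizes
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(IN)] for _ in range(HID)]
b1 = [0.0] * HID
W2 = [rng.uniform(-0.5, 0.5) for _ in range(HID)]
b2 = 0.0

def forward(x):
    """Return the hidden activations and the predicted vulnerability probability."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    logit = sum(w * h for w, h in zip(W2, hidden)) + b2
    return hidden, 1.0 / (1.0 + math.exp(-logit))

def cross_entropy(p, y):
    """Binary cross-entropy between predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def train_step(x, y, lr=0.1):
    """One back-propagation step on one example; returns the loss before the update."""
    global b2
    hidden, p = forward(x)
    d_logit = p - y  # gradient of the loss with respect to the logit
    for j in range(HID):
        # Gradient reaching hidden unit j, computed before W2[j] is changed.
        d_hidden = d_logit * W2[j] * (1 - hidden[j] ** 2)
        W2[j] -= lr * d_logit * hidden[j]
        for k in range(IN):
            W1[j][k] -= lr * d_hidden * x[k]
        b1[j] -= lr * d_hidden
    b2 -= lr * d_logit
    return cross_entropy(p, y)

x, y = [0.2, -0.1, 0.4], 1  # hypothetical code segment vector and label (1 = vulnerable)
losses = [train_step(x, y) for _ in range(50)]
print(round(losses[0], 4), round(losses[-1], 4))
```

The loss on this single example shrinks over the updates, which is all the sketch is meant to show; a real training loop would iterate over many labeled code segments.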
Step 5: Construct a fine-grained vulnerability detection model (Policy Network, PNet) based on reinforcement learning and predict the specific code lines that cause the vulnerability in code segments judged to be vulnerable. The specific steps are as follows:
Step 51: Construct the fine-grained code vulnerability prediction model PNet based on reinforcement learning, whose schematic is shown on the right side of FIG. 1;
Step 52: Concatenate the vector representation of the current program statement with the vector representation of its context, and use the result as the reinforcement learning state at time step t;
Step 53: From the state representation at time t, predict the action taken by the agent (policy). If the action is "relevant", the statement input at time t is one that may cause the code vulnerability; if the action is "irrelevant", the input statement does not cause the code vulnerability. The same action prediction is carried out for every statement of the code segment to generate an action sequence;
Step 54: Calculate the reward according to the formula
[Formula image: BDA0002608772020000091]
where U is the code lines marked as vulnerability-related in the action sequence predicted in step 53, and V is the code lines where the real vulnerability is located;
Step 55: Update the parameters of PNet according to the classic REINFORCE algorithm and the policy gradient algorithm, so that PNet automatically learns the code structure related to the vulnerability.
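Steps 52 to 55 can be sketched as a REINFORCE update for a per-statement Bernoulli policy. Important assumptions: the reward formula of step 54 is available only as an image, so the Jaccard overlap between U and V used below is a stand-in chosen for illustration, not the patent's formula. The logistic policy, the mean-vector context, the statement vectors, and the learning rate are likewise hypothetical.

```python
import math
import random

rng = random.Random(0)

def reward(predicted, actual):
    """ASSUMED stand-in reward: Jaccard overlap between predicted lines U and true lines V."""
    u, v = set(predicted), set(actual)
    return len(u & v) / len(u | v) if (u | v) else 1.0

def policy_prob(theta, state):
    """Probability that the policy chooses the action "relevant" for one statement."""
    z = sum(t * s for t, s in zip(theta, state))
    return 1.0 / (1.0 + math.exp(-z))

def run_episode(theta, states):
    """Step 53: sample one action per statement; 1 = relevant, 0 = irrelevant."""
    probs = [policy_prob(theta, s) for s in states]
    actions = [1 if rng.random() < p else 0 for p in probs]
    return actions, probs

def reinforce_update(theta, states, true_lines, lr=0.5, baseline=0.0):
    """Step 55: theta += lr * (R - baseline) * sum_t grad log pi(a_t | s_t)."""
    actions, probs = run_episode(theta, states)
    predicted = [i for i, a in enumerate(actions) if a == 1]
    r = reward(predicted, true_lines)
    for state, a, p in zip(states, actions, probs):
        for k, s in enumerate(state):
            # For a logistic Bernoulli policy, grad log pi(a | s) = (a - p) * s.
            theta[k] += lr * (r - baseline) * (a - p) * s
    return r

# Hypothetical statement vectors for a four-statement code segment.
stmt_vecs = [[0.1, 0.0], [0.9, 0.2], [0.8, 0.3], [0.0, 0.1]]
# Assumed context vector: the mean of all statement vectors.
context = [sum(col) / len(stmt_vecs) for col in zip(*stmt_vecs)]
# Step 52: the state is the statement vector concatenated with the context vector.
states = [v + context for v in stmt_vecs]

theta = [0.0] * len(states[0])
rewards = [reinforce_update(theta, states, true_lines=[1, 2]) for _ in range(200)]
print(sum(rewards[:20]) / 20, sum(rewards[-20:]) / 20)
```

Because the reward arrives only after the whole action sequence is sampled, every statement's action in an episode is reinforced by the same scalar, which is the sense in which the model learns from "combinations" of candidate statements.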
Example:
The detection process of the proposed multi-granularity code vulnerability detection method is analyzed below using a specific vulnerability instance from the Software Assurance Reference Dataset (SARD). The contents of the four source code files related to the vulnerability instance are shown in Tables 1 to 4. First, steps 1 and 2 of the embodiment are executed: the source code is converted into intermediate code, and a program slice of the intermediate code is generated with the dangerous function memset, which may cause a vulnerability, as the key point, as shown in Table 5. In this slice, the actual vulnerable statements are lines 10, 11, 34, and 35. Then steps 3 and 4 are executed to learn the vector representation of each line of code and of the whole program; the program vector is used as the input of the coarse-grained vulnerability detection model (DNet), and the model predicts 1, meaning that the slice contains a vulnerability. Finally, step 5 is executed with the vector representation of each statement as the input of the fine-grained vulnerability detection model (PNet). The predicted action sequence is shown in Table 6, where 0 indicates that the corresponding code statement does not contain a vulnerability and 1 indicates that it may contain one. The positions of the 1s in Table 6 are also 10, 11, 34, and 35, so the fine-grained vulnerability detection model accurately identifies the specific vulnerability locations.
This example shows that the proposed method not only detects vulnerable code at a coarse granularity but also locates the specific code statements that may cause the vulnerability.
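Reading vulnerable line positions off a predicted action sequence can be sketched as follows. The sequence length of 40 and the 1-based indexing are assumptions made for illustration; only the positions 10, 11, 34, and 35 come from the example.

```python
def vulnerable_lines(actions):
    """Return the 1-based positions whose action is 1 ("relevant")."""
    return [pos for pos, action in enumerate(actions, start=1) if action == 1]

# Hypothetical 40-statement action sequence with 1s at the positions named in the example.
actions = [0] * 40
for pos in (10, 11, 34, 35):
    actions[pos - 1] = 1

print(vulnerable_lines(actions))  # [10, 11, 34, 35]
```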
TABLE 1 CWE124_Buffer_Underwrite__char_declare_memmove_53a.c
Figure BDA0002608772020000101
TABLE 2 CWE124_Buffer_Underwrite__char_declare_memmove_53b.c
Figure BDA0002608772020000102
TABLE 3 CWE124_Buffer_Underwrite__char_declare_memmove_53c.c
Figure BDA0002608772020000111
TABLE 4 CWE124_Buffer_Underwrite__char_declare_memmove_53d.c
Figure BDA0002608772020000112
TABLE 5 Program slice of the intermediate code generated with memset as the key point
Figure BDA0002608772020000121
Figure BDA0002608772020000131
TABLE 6 Action sequence predicted by the fine-grained vulnerability detection model for the slice in Table 5
Figure BDA0002608772020000132

Claims (4)

1. A multi-granularity code vulnerability detection method based on deep learning and reinforcement learning, characterized by comprising the following steps:
Step 1: Perform static analysis on a source program using the Clang tool to obtain the intermediate code representation of the program;
Step 2: Extract key points that may cause a vulnerability, generate the slicing criterion, slice the intermediate code, and combine the forward slice and the backward slice to obtain a code segment of the program;
Step 3: Represent the code segment as a low-dimensional continuous real-valued vector using a code segment representation learning method;
Step 4: Feed the vector representation of the code segment into a coarse-grained code vulnerability detection model based on deep learning and judge whether the code segment contains a vulnerability;
Step 5: Construct a fine-grained vulnerability detection model based on reinforcement learning and predict the specific code lines that cause the vulnerability in code segments judged to be vulnerable.
2. The multi-granularity code vulnerability detection method based on deep learning and reinforcement learning according to claim 1, wherein the specific steps of step 3 are as follows:
Step 31: Split the code segment into statements;
Step 32: Construct a CNN-based statement encoding network SENet;
Step 33: Split each statement obtained in step 31 into a token sequence using the space as the separator, feed the sequence into the statement encoding network, and output the vector representation of the statement;
Step 34: Construct an LSTM-based program encoding network PENet;
Step 35: Take the vector representations of the statements in the code segment obtained in step 33 as the input of the program encoding network, and output the hidden-layer vector of the last time step as the vector representation of the code segment.
3. The multi-granularity code vulnerability detection method based on deep learning and reinforcement learning according to claim 1, wherein the specific steps of step 4 are as follows:
Step 41: Construct a coarse-grained code vulnerability detection model DNet based on a fully connected network with a single hidden layer;
Step 42: Take the vector representation of the code segment as the input of the model and output the probability that the code segment contains a vulnerability;
Step 43: Take the predicted probability and the true label as the input of a cross-entropy loss function, calculate the prediction error, and update the parameters of the staged code representation learning model and of the coarse-grained vulnerability detection model DNet using the back-propagation algorithm, wherein the representation learning model comprises the CNN-based SENet and the LSTM-based PENet.
4. The multi-granularity code vulnerability detection method based on deep learning and reinforcement learning according to claim 1, wherein the specific steps of step 5 are as follows:
Step 51: Construct a fine-grained code vulnerability prediction model PNet based on reinforcement learning;
Step 52: Concatenate the vector representation of the current program statement with the vector representation of its context, and use the result as the reinforcement learning state at time step t;
Step 53: From the state representation at time t, predict the action taken by the agent (policy); if the action is "relevant", the statement input at time t is one that may cause the code vulnerability; if the action is "irrelevant", the input statement does not cause the code vulnerability; the same action prediction is carried out for every statement of the code segment to generate an action sequence;
Step 54: Calculate the reward according to the formula
[Formula image: FDA0002608772010000031]
where U is the code lines marked as vulnerability-related in the action sequence predicted in step 53, and V is the code lines where the real vulnerability is located;
Step 55: Update the parameters of PNet according to the classic REINFORCE algorithm and the policy gradient algorithm, so that PNet automatically learns the code structure related to the vulnerability.
CN202010747186.2A 2020-07-29 2020-07-29 Multi-granularity code vulnerability detection method based on deep learning and reinforcement learning Active CN111753303B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010747186.2A CN111753303B (en) 2020-07-29 2020-07-29 Multi-granularity code vulnerability detection method based on deep learning and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010747186.2A CN111753303B (en) 2020-07-29 2020-07-29 Multi-granularity code vulnerability detection method based on deep learning and reinforcement learning

Publications (2)

Publication Number Publication Date
CN111753303A (en) 2020-10-09
CN111753303B (en) 2023-02-07

Family

ID=72712297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010747186.2A Active CN111753303B (en) 2020-07-29 2020-07-29 Multi-granularity code vulnerability detection method based on deep learning and reinforcement learning

Country Status (1)

Country Link
CN (1) CN111753303B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885999A (en) * 2017-11-08 2018-04-06 华中科技大学 A kind of leak detection method and system based on deep learning
CN109657473A (en) * 2018-11-12 2019-04-19 华中科技大学 A kind of fine granularity leak detection method based on depth characteristic
US20190138731A1 (en) * 2016-04-22 2019-05-09 Lin Tan Method for determining defects and vulnerabilities in software code
CN110222512A (en) * 2019-05-21 2019-09-10 华中科技大学 A kind of software vulnerability intelligent measurement based on intermediate language and localization method and system
CN110245496A (en) * 2019-05-27 2019-09-17 华中科技大学 A kind of source code leak detection method and detector and its training method and system
CN111459799A (en) * 2020-03-03 2020-07-28 西北大学 Software defect detection model establishing and detecting method and system based on Github

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Z.JIN 等: ""Current and Future Research of Machine Learning Based Vulnerability Detection,"", 《2018 EIGHTH INTERNATIONAL CONFERENCE ON INSTRUMENTATION & MEASUREMENT, COMPUTER, COMMUNICATION AND CONTROL (IMCCC)》 *
王婷 等: ""代码克隆检测方法研究进展"", 《现代计算机》 *
王甜甜: ""结构语义相似的程序识别方法研究"", 《中国优秀博士学位论文全文数据库信息科技辑》 *
苏小红 等: ""一种新的过程间静态切片快速算法"", 《哈尔滨工业大学学报》 *
邢文静: ""基于程序切片的二进制代码漏洞智能检测研究"", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
邵云飞: ""融合主题模型与词向量的短文本分类方法研究"", 《中国优秀博士学位论文全文数据库信息科技辑》 *
陈肇炫 等: ""基于抽象语法树的智能化漏洞检测***"", 《信息安全学报》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699377A (en) * 2020-12-30 2021-04-23 哈尔滨工业大学 Function-level code vulnerability detection method based on slice attribute graph representation learning
CN113901472A (en) * 2021-09-08 2022-01-07 燕山大学 Dual-granularity lightweight vulnerability code slice quality evaluation method
CN113901472B (en) * 2021-09-08 2023-08-08 燕山大学 Dual-granularity lightweight vulnerability code slice quality assessment method
CN113946830A (en) * 2021-10-09 2022-01-18 暨南大学 Multi-mode detection-based Android APP vulnerability fine-grained detection method
CN113946830B (en) * 2021-10-09 2024-05-07 暨南大学 Android APP vulnerability fine-granularity detection method based on multi-mode detection
CN114969763A (en) * 2022-06-20 2022-08-30 哈尔滨工业大学 Fine-grained vulnerability detection method based on seq2seq code representation learning
CN114969763B (en) * 2022-06-20 2024-07-16 哈尔滨工业大学 Fine granularity vulnerability detection method based on seq2seq code representation learning
CN115080982A (en) * 2022-06-24 2022-09-20 哈尔滨工业大学 Combined attack resisting method for vulnerability detection model
CN115080982B (en) * 2022-06-24 2024-07-19 哈尔滨工业大学 Combined anti-attack method for vulnerability detection model
CN115422092A (en) * 2022-11-03 2022-12-02 杭州金衡和信息科技有限公司 Software bug positioning method based on multi-method fusion
WO2024130686A1 (en) * 2022-12-23 2024-06-27 Huawei Technologies Co., Ltd. Methods, systems, apparatuses, and computer-readable media for training neural network to learn computer code change representations
CN116166276A (en) * 2023-04-25 2023-05-26 芯瞳半导体技术(山东)有限公司 Control flow analysis method, device, equipment, medium and product

Also Published As

Publication number Publication date
CN111753303B (en) 2023-02-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant