CN116484025A - Vulnerability knowledge graph construction method, vulnerability knowledge graph evaluation equipment and storage medium - Google Patents

Vulnerability knowledge graph construction method, vulnerability knowledge graph evaluation equipment and storage medium

Info

Publication number
CN116484025A
Authority
CN
China
Prior art keywords
data
vulnerability
entity
knowledge graph
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310706545.3A
Other languages
Chinese (zh)
Inventor
王志强
薛培阳
于欣月
罗乐琦
张珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING ELECTRONIC SCIENCE AND TECHNOLOGY INSTITUTE
Original Assignee
BEIJING ELECTRONIC SCIENCE AND TECHNOLOGY INSTITUTE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING ELECTRONIC SCIENCE AND TECHNOLOGY INSTITUTE
Priority to CN202310706545.3A
Publication of CN116484025A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a vulnerability knowledge graph construction method, an evaluation method, equipment and a storage medium, and relates to the technical field of computer applications. The construction method comprises: obtaining vulnerability data from data sources, wherein the data sources comprise the NVD database, the CNVD database, the CWE database and the Exploit-db database; performing data processing on the vulnerability data to obtain preprocessed data; obtaining entity data and relation data between entities according to the preprocessed data, wherein the entities comprise a vulnerability entity, a product version entity, a vendor entity, a vulnerability verification entity, a vulnerability exploitation entity and a related link entity; fusing the entity data and the relation data by using a preset data fusion algorithm to obtain fused data; and constructing a security vulnerability knowledge graph according to the fused data. By integrating a large amount of data to construct the vulnerability knowledge graph, the method covers more vulnerability information.

Description

Vulnerability knowledge graph construction method, vulnerability knowledge graph evaluation equipment and storage medium
Technical Field
The invention relates to the technical field of computer application, in particular to a vulnerability knowledge graph construction method, an assessment method, equipment and a storage medium.
Background
Knowledge graph technology combines the theories and methods of disciplines such as applied mathematics, graphics, information visualization and information science with scientometric methods such as citation analysis and co-occurrence analysis. It uses visualized graphs to display the core structure, development history, frontier fields and overall knowledge architecture of a discipline, with the aim of multi-disciplinary fusion. It presents a complex knowledge field through data mining, information processing, knowledge measurement and graph drawing, reveals how the field develops over time, and provides a practical and valuable reference for research. Knowledge graphs are applied in popular artificial intelligence scenarios such as voice assistants, chat robots and intelligent question answering, and cover many fields such as the Internet, finance, government affairs and medical treatment.
With the development of the Internet, network security problems keep emerging, and more and more security vulnerabilities are maliciously exploited by lawbreakers, causing adverse effects. Most network attack events exploit vulnerabilities that have not been patched in the system. Many enterprises that have deployed security devices and software are still intruded upon through vulnerabilities, resulting in significant economic loss. Users lack an effective way to manage vulnerabilities, so the vulnerabilities eventually become an effective avenue of attack. Thousands of network security vulnerabilities are discovered and published each year; as attackers' means keep changing, the network security risk to users grows with the number of published vulnerabilities. Traditional vulnerability databases are usually managed in a decentralized way and each draws on a single data source: different vulnerability databases are built for different requirements, so the massive amount of vulnerability data cannot be fully utilized.
Disclosure of Invention
The problem addressed by the invention is that, with a single data source, vulnerability data cannot be fully utilized to construct a vulnerability knowledge graph.
In order to solve the above problem, the present invention provides a method for constructing a vulnerability knowledge graph, comprising:
obtaining vulnerability data from data sources, wherein the data sources comprise an NVD database, a CNVD database, a CWE database and an Exploit-db database;
performing data processing on the vulnerability data to obtain preprocessed data;
obtaining entity data and relation data between entities according to the preprocessed data, wherein the entities comprise a vulnerability entity, a product version entity, a vendor entity, a vulnerability verification entity, a vulnerability exploitation entity and a related link entity;
fusing the entity data and the relation data by using a preset data fusion algorithm to obtain fused data;
and constructing a security vulnerability knowledge graph according to the fused data, wherein the security vulnerability knowledge graph comprises a plurality of nodes, and the nodes correspond to the entity data.
Optionally, the obtaining vulnerability data from data sources comprises:
acquiring NVD vulnerability data from the NVD database by using an API (application program interface), wherein the NVD vulnerability data is in JSON format and comprises CVE numbers, CVE-affected products, versions, vendors and NVD related links;
acquiring CNVD vulnerability data from the CNVD database, wherein the CNVD vulnerability data is in a structured XML format and comprises CNVD numbers, CNVD-affected products and CNVD related links;
acquiring CWE vulnerability data from the CWE database, wherein the CWE vulnerability data is in a structured CSV format and comprises CWE numbers and vulnerability name information;
and acquiring Exploit-db vulnerability data from the Exploit-db database, wherein the Exploit-db vulnerability data is in a structured CSV format and comprises the CVE number, the Exploit-db number and the file path information of the exploit code.
Optionally, the entity data comprises structured entity data and unstructured entity data, and the obtaining entity data and relation data between entities according to the preprocessed data comprises:
extracting unstructured data and structured data from the preprocessed data;
obtaining the unstructured entity data from the unstructured data by deep learning, and obtaining the structured entity data from the structured data;
and obtaining the relation data between the entities according to predefined entity relation information, the unstructured entity data and the structured entity data.
Optionally, the obtaining the unstructured entity data from the unstructured data by deep learning comprises:
inputting the unstructured data into a pre-trained BERT model to obtain word vector information;
inputting the word vector information into a BiLSTM model for bidirectional training to obtain context semantic feature vector information;
decoding the context semantic feature vector information by using a CRF model to obtain an entity tag sequence;
and obtaining the unstructured entity data according to the entity tag sequence.
Optionally, the constructing a security vulnerability knowledge graph according to the fused data, wherein the nodes correspond to the entity data, comprises:
obtaining a node set and an edge set according to the fused data;
constructing the vulnerability knowledge graph according to the node set and the edge set;
wherein the node set is the set of all nodes corresponding to the entity data, and the edge set is the set of edges representing the relation data between the entities.
Optionally, the performing data processing on the vulnerability data to obtain preprocessed data comprises:
performing data cleaning on the vulnerability data to obtain the preprocessed data.
According to the vulnerability knowledge graph construction method, vulnerability data in the NVD database, the CNVD database, the CWE database and the Exploit-db database is acquired using a different method for each source, and a large amount of data is integrated so that the vulnerability data is more complete; the acquired vulnerability data is then processed to improve data quality. Entity data and relation data between entities are extracted from the large amount of processed data and fused to obtain fused data, and the security vulnerability knowledge graph is constructed according to the fused data. Integrating a large amount of data allows the graph to cover more vulnerability information.
The invention also provides a device for constructing the vulnerability knowledge graph, which comprises:
a vulnerability data unit, used for obtaining vulnerability data from data sources, wherein the data sources comprise an NVD database, a CNVD database, a CWE database and an Exploit-db database;
a preprocessed data unit, used for performing data processing on the vulnerability data to obtain preprocessed data;
an entity data and relation data unit, used for obtaining entity data and relation data between entities according to the preprocessed data, wherein the entities comprise a vulnerability entity, a product version entity, a vendor entity, a vulnerability verification entity, a vulnerability exploitation entity and a related link entity;
a fused data unit, used for fusing the entity data and the relation data by using a preset data fusion algorithm to obtain fused data;
and a security vulnerability knowledge graph unit, used for constructing a security vulnerability knowledge graph according to the fused data, wherein the security vulnerability knowledge graph comprises a plurality of nodes, and the nodes correspond to the entity data.
The device for constructing the vulnerability knowledge graph has the same advantages as the method for constructing the vulnerability knowledge graph compared with the prior art, and is not described in detail herein.
The invention also provides a vulnerability assessment method, which comprises:
obtaining vulnerability association information according to the vulnerability knowledge graph obtained by the above vulnerability knowledge graph construction method, wherein the vulnerability association information comprises the vulnerability type and the vulnerability level;
and evaluating the risk coefficient of the vulnerability according to the vulnerability association information.
The vulnerability assessment method has the same advantages over the prior art as the vulnerability knowledge graph construction method, and they are not described in detail herein.
The invention also provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the vulnerability knowledge graph construction method and/or the steps of the vulnerability assessment method when executing the computer program.
The computer device has the same advantages over the prior art as the vulnerability knowledge graph construction method, and they are not described in detail herein.
The invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and when the computer program is read and run by a processor, the steps of the vulnerability knowledge graph construction method and/or the steps of the vulnerability assessment method are implemented.
The computer readable storage medium has the same advantages over the prior art as the vulnerability knowledge graph construction method, and they are not described in detail herein.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is an application environment diagram of a method for constructing a vulnerability knowledge graph in an embodiment of the invention;
FIG. 2 is a flow chart of a method for constructing a vulnerability knowledge graph in an embodiment of the invention;
FIG. 3 is a schematic diagram of a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of a device for constructing a vulnerability knowledge graph according to an embodiment of the present invention;
FIG. 5 is a flow chart of a method for evaluating vulnerabilities according to an embodiment of the present invention;
FIG. 6 is a diagram showing an internal structure of a computer device in the embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is an application environment diagram of a method for constructing a vulnerability knowledge graph in an embodiment of the invention. Referring to fig. 1, the method for constructing the vulnerability knowledge graph is applied to a system for constructing the vulnerability knowledge graph. The vulnerability knowledge graph construction system comprises a terminal 110 and a server 120. The terminal 110 and the server 120 are connected through a network. The terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
Referring to fig. 2, the embodiment provides a method for constructing a vulnerability knowledge graph, which includes:
Step 210, obtaining vulnerability data from data sources, wherein the data sources comprise an NVD database, a CNVD database, a CWE database and an Exploit-db database;
Specifically, the method automatically collects multi-source heterogeneous vulnerability-related data, with different data formats and data structures, from the plurality of databases in the data sources.
Step 220, performing data processing on the vulnerability data to obtain preprocessed data;
Specifically, the collected vulnerability data is processed using relevant techniques, including mathematical statistics, data mining or predefined cleaning rules, so that data that does not meet the data processing specification is converted into data that meets the data quality requirements.
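As an illustration of the "predefined cleaning rules" mentioned for step 220, the following minimal Python sketch drops noise fields, trims whitespace, normalises the identifier and removes duplicates. The field names (`id`, `submitter`, `description_es`, `score_details`) and the sample records are hypothetical; the patent does not specify a record schema.

```python
from typing import Dict, Iterable, List

# Hypothetical names for fields treated as noise; not taken from the patent.
DROP_FIELDS = {"submitter", "description_es", "score_details"}


def clean_records(records: Iterable[Dict[str, str]], key: str = "id") -> List[Dict[str, str]]:
    """Apply simple cleaning rules and de-duplicate on the identifier field."""
    seen = set()
    cleaned = []
    for rec in records:
        ident = rec.get(key, "").strip().upper()
        if not ident or ident in seen:
            continue  # skip records without an identifier and repeated records
        seen.add(ident)
        # Keep only string fields that are not in the noise list, trimmed.
        row = {k: v.strip() for k, v in rec.items()
               if k not in DROP_FIELDS and isinstance(v, str)}
        row[key] = ident
        cleaned.append(row)
    return cleaned


# Placeholder records, invented for this example.
raw = [
    {"id": " cve-0000-0001 ", "description": " Buffer overflow ", "submitter": "alice"},
    {"id": "CVE-0000-0001", "description": "Buffer overflow"},
    {"id": "", "description": "no identifier"},
]
result = clean_records(raw)
```

Only the first record survives here: the second is a duplicate once the identifier is normalised, and the third has no identifier.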
Step 230, obtaining entity data and relation data between entities according to the preprocessed data, wherein the entities comprise a vulnerability entity, a product version entity, a vendor entity, a vulnerability verification entity, a vulnerability exploitation entity and a related link entity;
Specifically, for the network security field, vulnerability data from the Internet is fused and represented in a standardized way. Important terms in the network security field are analyzed and organized, and existing network security ontologies are reused to make ontology construction more efficient. The characteristics of vulnerability data sources such as NVD and CNVD are analyzed to determine the set of concept classes used to represent the ontology, the set of relations among concept classes, and the set of attributes. The constructed vulnerability ontology is then evaluated to check whether it covers the existing data sources, and it is adjusted, optimized and refined according to the evaluation results so that it is fully applicable. The result is a vulnerability knowledge graph ontology comprising vulnerabilities, products, product versions, vendors, weaknesses, vulnerability verification, vulnerability exploitation and related links.
Step 240, fusing the entity data and the relation data by using a preset data fusion algorithm to obtain fused data;
Specifically, information such as vulnerability numbers, vulnerability descriptions, vulnerability scores and affected products in the NVD and CNVD data is analyzed, and the heterogeneous data is integrated and disambiguated under a unified specification. This provides more comprehensive knowledge sharing, addresses data quality problems, and improves the quality and coverage of the knowledge graph.
Step 250, constructing a security vulnerability knowledge graph according to the fused data, wherein the security vulnerability knowledge graph comprises a plurality of nodes, and the nodes correspond to the entity data.
Specifically, each entity is a node. Relationship extraction yields the relationships between the entities in the vulnerability source code; the relationship between any two entities is one edge, and relationships between entities can be one-to-many or many-to-many.
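The node and edge description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` and `Edge` types, the `build_graph` helper and the placeholder identifiers are hypothetical, and the patent does not name a graph library or storage backend.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Node(NamedTuple):
    type: str  # entity type, e.g. "Vulnerability"
    name: str  # entity identifier or name


class Edge(NamedTuple):
    head: Node
    relation: str
    tail: Node


def build_graph(triples: Iterable[Tuple[Node, str, Node]]) -> Tuple[Set[Node], Set[Edge]]:
    """Derive a node set and an edge set from (head, relation, tail) triples."""
    nodes: Set[Node] = set()
    edges: Set[Edge] = set()
    for head, relation, tail in triples:
        nodes.update((head, tail))              # every entity becomes one node
        edges.add(Edge(head, relation, tail))   # every relation becomes one edge
    return nodes, edges


# Placeholder entities, invented for this example.
vuln = Node("Vulnerability", "CVE-0000-0001")
product = Node("Product", "demo-app")
v1 = Node("ProductVersion", "demo-app 1.0")
v2 = Node("ProductVersion", "demo-app 1.1")

triples = [
    (vuln, "affect", v1),
    (vuln, "affect", v2),          # one-to-many: one vulnerability, two versions
    (product, "has_version", v1),
    (product, "has_version", v2),
    (vuln, "affect", v1),          # duplicate triple, collapsed by the set
]
nodes, edges = build_graph(triples)
```

Using sets means that repeated entities and repeated relations coming from different sources collapse into a single node or edge, which is one simple way to keep the graph free of duplicates.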
In this embodiment, vulnerability data in the NVD database, the CNVD database, the CWE database and the Exploit-db database is acquired using a different method for each source, and a large amount of data is integrated so that the vulnerability data is more complete; the acquired vulnerability data is then processed to improve data quality. Entity data and relation data between entities are extracted from the large amount of processed data and fused to obtain fused data, and the security vulnerability knowledge graph is constructed according to the fused data. Integrating a large amount of data allows the graph to cover more vulnerability information.
Optionally, the obtaining vulnerability data from data sources comprises:
acquiring NVD vulnerability data from the NVD database by using an API (application program interface), wherein the NVD vulnerability data is in JSON format and comprises CVE numbers, CVE-affected products, versions, vendors and NVD related links;
acquiring CNVD vulnerability data from the CNVD database, wherein the CNVD vulnerability data is in a structured XML format and comprises CNVD numbers, CNVD-affected products and CNVD related links;
acquiring CWE vulnerability data from the CWE database, wherein the CWE vulnerability data is in a structured CSV format and comprises CWE numbers and vulnerability name information;
and acquiring Exploit-db vulnerability data from the Exploit-db database, wherein the Exploit-db vulnerability data is in a structured CSV format and comprises the CVE number, the Exploit-db number and the file path information of the exploit code.
Specifically, for the NVD database, vulnerability-related and product-related data is acquired through the API, and the data format is structured JSON. For the CNVD database, scripts are written to batch-download the vulnerability sharing data files provided on the CNVD official website, and the data format is structured XML. For the CWE database, the vulnerability information data file provided on the CWE official website is downloaded in a structured CSV format. For the Exploit-db database, the data file in its GitLab open-source repository is downloaded, and the data format is structured CSV. For vulnerability verification information, the GitHub open-source repositories inthewilddb and PoC-in-Github are selected as data sources, and the data is obtained by parsing a SQLite database file and by requesting an API. Data cleaning, de-duplication, format conversion and similar operations are performed on the collected raw data to improve its quality and validity. Erroneous entries, which are common in the CNVD data, are corrected manually, and the large amount of unimportant and meaningless information present in the raw data, such as vulnerability submitters, Spanish-language vulnerability descriptions and vulnerability scoring details, is removed.
According to the method for constructing the vulnerability knowledge graph, vulnerability data is collected from a plurality of databases. The NVD, CNVD, CWE and Exploit-db databases are each collected with a different method, yielding data files in different formats; these files are then integrated into a single set of vulnerability data with a uniform format, establishing a more comprehensive database.
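The idea of bringing JSON, XML and CSV sources into one uniform record format can be sketched with the Python standard library as follows. The element and column names (`id`, `product`, `number`, `vulnerability`) and the inline sample data are hypothetical placeholders; they are not the actual NVD, CNVD or Exploit-db schemas, which would need to be checked against each source.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Dict, List

Record = Dict[str, str]


def from_json(text: str, source: str) -> List[Record]:
    """Parse a JSON array of objects (hypothetical keys 'id' and 'product')."""
    return [{"source": source, "id": item["id"], "product": item["product"]}
            for item in json.loads(text)]


def from_xml(text: str, source: str) -> List[Record]:
    """Parse XML with hypothetical <vulnerability> elements."""
    root = ET.fromstring(text)
    return [{"source": source,
             "id": node.findtext("number", ""),
             "product": node.findtext("product", "")}
            for node in root.iter("vulnerability")]


def from_csv(text: str, source: str) -> List[Record]:
    """Parse CSV with a header row (hypothetical columns 'id' and 'product')."""
    return [{"source": source, "id": row["id"], "product": row.get("product", "")}
            for row in csv.DictReader(io.StringIO(text))]


# Inline sample data, invented for this example.
json_text = '[{"id": "CVE-0000-0001", "product": "demo-app"}]'
xml_text = ("<vulnerabilities><vulnerability><number>CNVD-0000-00001</number>"
            "<product>demo-app</product></vulnerability></vulnerabilities>")
csv_text = "id,product\nEDB-0001,demo-app\n"

records = (from_json(json_text, "NVD")
           + from_xml(xml_text, "CNVD")
           + from_csv(csv_text, "Exploit-db"))
```

After this step every record has the same three keys regardless of where it came from, so the later cleaning and fusion steps can treat all sources alike.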
Optionally, the entity data comprises structured entity data and unstructured entity data, and the obtaining entity data and relation data between entities according to the preprocessed data comprises:
extracting unstructured data and structured data from the preprocessed data;
obtaining the unstructured entity data from the unstructured data by deep learning, and obtaining the structured entity data from the structured data;
and obtaining the relation data between the entities according to predefined entity relation information, the unstructured entity data and the structured entity data.
Specifically, structured data and unstructured data are extracted from the preprocessed data. Structured data is data held in a database that can be logically expressed with a two-dimensional table structure: the data is organized in rows, each row represents the information of one entity, and every row has the same attributes. Unstructured data, by contrast, is information that has no predefined data model or is not organized in a predefined way. Because of these properties, entity extraction can be applied directly to structured data. Entity extraction, also referred to as named entity learning or named entity recognition, means automatically recognizing named entities from the data. Because entities are the most basic elements of a knowledge graph, the completeness, precision and recall of extraction directly affect the quality of the knowledge base, so entity extraction is the most fundamental and critical step in knowledge extraction. For structured data, templates are written for the target entities and then matched against the original corpus to obtain the structured entity data. Unstructured data is not so easily organized or formatted, and collecting, processing and analyzing it is a significant challenge; given these characteristics, a deep learning BERT-BiLSTM-CRF baseline model is used to obtain the unstructured entity data from the unstructured data. Relationships between entities are then extracted: the identified entities are traversed one by one, and the relationship between each pair is extracted according to the predefined definitions, yielding the relation data between the entities.
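The template-matching idea for structured entity extraction can be illustrated with regular expressions over identifier patterns. This is a minimal sketch, not the templates used in the patent, which are not disclosed: the `TEMPLATES` table and `match_templates` helper are hypothetical, and the patterns assume the common `CVE-YYYY-NNNN`, `CNVD-YYYY-NNNNN` and `CWE-N` identifier shapes.

```python
import re
from typing import List, Tuple

# Hypothetical templates: one regular expression per target entity type.
TEMPLATES = {
    "Vulnerability": re.compile(r"\b(?:CVE-\d{4}-\d{4,}|CNVD-\d{4}-\d+)\b"),
    "Weakness": re.compile(r"\bCWE-\d+\b"),
}


def match_templates(text: str) -> List[Tuple[str, str]]:
    """Return (entity type, matched text) pairs found in the corpus text."""
    found = []
    for entity_type, pattern in TEMPLATES.items():
        for match in pattern.finditer(text):
            found.append((entity_type, match.group(0)))
    return found


# Placeholder sentence, invented for this example.
sample = "CVE-2099-0001 (CWE-79) is also tracked as CNVD-2099-00001."
entities = match_templates(sample)
```

Identifier-like fields are a good fit for this approach because their shape is fixed; free-text fields such as vendor or product names inside a description are the case the deep learning model is meant to handle.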
Vulnerability data sources such as NVD and CNVD are analyzed, and eight entities are defined: vulnerability (Vulnerability), product (Product), product version (ProductVersion), vendor (Vendor), weakness (Weakness), vulnerability verification (POC), vulnerability exploitation (Exploit) and related links (Reference). Existing ontology relations are generalized, and the relation types between entities are manually screened and defined so that they cover the relations between entities in the vulnerability knowledge graph. The relations between entities are defined as an influence relation (affect), an exploitation relation (has_exploit), a reference relation (has_reference), a POC relation (has_POC), a cause-effect relation (lead_to), a version relation (has_version) and a product relation (has_product). As shown in Table 1, there are 7 relation types between entities, and each relation in the table is unidirectional, pointing from Entity 1 to Entity 2.
TABLE 1
Entity 1 | Relationship | Entity 2
Vulnerability | affect | ProductVersion
Vulnerability | has_exploit | Exploit
Vulnerability | has_reference | Reference
Vulnerability | has_POC | POC
Weakness | lead_to | Vulnerability
Product | has_version | ProductVersion
Vendor | has_product | Product
According to the method for constructing the vulnerability knowledge graph, structured data and unstructured data are processed separately. Unstructured entity data is extracted from the unstructured data with a deep learning BERT-BiLSTM-CRF baseline model, which achieves higher precision than other models and somewhat improved recall, so entity extraction is more complete. The relation data between the entities is obtained at the same time, making the final result more accurate.
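The seven unidirectional relation types of Table 1 can be encoded as a small schema and used to derive relation triples from the entities found in one record. The sketch below is one possible reading of "extracting the relations between the identified entities one by one according to definition"; the `extract_relations` helper and the sample entities are hypothetical, and pairing every entity with every other entity in a record is a simplifying assumption.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Entity = Tuple[str, str]            # (entity type, entity name)
Triple = Tuple[Entity, str, Entity]

# Relation schema copied from Table 1: (Entity 1 type, relationship, Entity 2 type).
SCHEMA = [
    ("Vulnerability", "affect", "ProductVersion"),
    ("Vulnerability", "has_exploit", "Exploit"),
    ("Vulnerability", "has_reference", "Reference"),
    ("Vulnerability", "has_POC", "POC"),
    ("Weakness", "lead_to", "Vulnerability"),
    ("Product", "has_version", "ProductVersion"),
    ("Vendor", "has_product", "Product"),
]
RELATION_BY_TYPES: Dict[Tuple[str, str], str] = {(h, t): r for h, r, t in SCHEMA}


def extract_relations(entities: List[Entity]) -> List[Triple]:
    """Traverse all ordered entity pairs and keep those allowed by the schema."""
    triples = []
    for head, tail in permutations(entities, 2):
        relation = RELATION_BY_TYPES.get((head[0], tail[0]))
        if relation is not None:
            triples.append((head, relation, tail))
    return triples


# Placeholder entities from a single record, invented for this example.
record_entities = [
    ("Vendor", "ExampleCorp"),
    ("Product", "demo-app"),
    ("ProductVersion", "demo-app 1.0"),
    ("ProductVersion", "demo-app 1.1"),
    ("Vulnerability", "CVE-0000-0001"),
    ("Weakness", "CWE-79"),
]
relation_triples = extract_relations(record_entities)
```

Because ordered pairs are checked against the schema, direction is preserved: a `Weakness` leads to a `Vulnerability`, never the reverse. A record that mentions several products would need a finer rule than all-pairs matching to avoid linking a product to another product's versions.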
Optionally, obtaining the unstructured entity data from the unstructured data by deep learning includes:
inputting the unstructured data into a pre-trained BERT model to obtain word vector information;
inputting the word vector information into a BiLSTM model for bidirectional training to obtain context semantic feature vector information;
decoding the context semantic feature vector information by using a CRF model to obtain an entity tag sequence;
and obtaining the unstructured entity data according to the entity tag sequence.
Specifically, vendor information, product name information and product version information are extracted from the unstructured data. First, the text is input into the pre-trained language model BERT. BERT is pre-trained with a masked language model (Masked Language Model) and next sentence prediction (Next Sentence Prediction), so it can make full use of context information to generate dynamic word vectors with richer semantics, which handles word ambiguity well. The resulting word vector information is then input into a BiLSTM model for bidirectional training. The BiLSTM model extracts context-bearing text information for entity classification: a forward LSTM and a backward LSTM process the sequence in its original order and in reverse order respectively, capture long-distance and contextual semantic features, convert the vector sequence into a labeling probability matrix, and output the context feature vectors of the hidden states. Finally, the output of the BiLSTM module is decoded by a CRF model. The CRF model can add constraints to the final predicted labels to ensure that they are valid, and it learns these constraints automatically from the training data set during training. The optimal predicted label sequence is thus obtained, each entity in the sequence is accurately classified to output the entity tag sequence, and the entity data is obtained from the entity tag sequence.
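The effect of the CRF constraints on decoding can be illustrated with a small Viterbi decoder in plain Python. This is a sketch under stated assumptions: the per-token scores stand in for the BiLSTM output, the transition scores stand in for what a trained CRF would learn, and the tag names (`B-VENDOR`, `I-VENDOR`, `O`) assume a BIO labeling scheme that the source does not spell out.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions:   one dict per token mapping tag -> score (stand-in for the
                 BiLSTM output).
    transitions: dict mapping (previous_tag, tag) -> score (stand-in for the
                 constraints a CRF learns); missing pairs count as 0.0.
    """
    # best[t][tag] = (score of the best path ending in `tag` at step t,
    #                 tag chosen at step t - 1 on that path)
    best = [{tag: (emissions[0][tag], None) for tag in tags}]
    for step in emissions[1:]:
        column = {}
        for tag in tags:
            prev_tag = max(
                tags,
                key=lambda p: best[-1][p][0] + transitions.get((p, tag), 0.0),
            )
            score = (
                best[-1][prev_tag][0]
                + transitions.get((prev_tag, tag), 0.0)
                + step[tag]
            )
            column[tag] = (score, prev_tag)
        best.append(column)

    # Backtrack from the best final tag to recover the whole path.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


TAGS = ["O", "B-VENDOR", "I-VENDOR"]
# Hand-set illustrative scores for a two-token input.
emissions = [
    {"O": 1.0, "B-VENDOR": 0.8, "I-VENDOR": 0.0},
    {"O": 0.2, "B-VENDOR": 0.1, "I-VENDOR": 1.0},
]
# Penalise the invalid move from "O" straight to an inside tag.
transitions = {("O", "I-VENDOR"): -10.0}
path = viterbi_decode(emissions, transitions, TAGS)
```

Picking the best tag for each token independently would give `O` followed by `I-VENDOR`, which is not a valid label sequence; with the transition penalty the decoder prefers `B-VENDOR` followed by `I-VENDOR`.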
According to the method for constructing the vulnerability knowledge graph, the BERT model improves the extraction of text information, the BiLSTM model uses context-bearing text information for entity classification, and the CRF model keeps the labels within an entity consistent. Extracting unstructured entity data with the BERT-BiLSTM-CRF baseline model therefore improves entity extraction from unstructured data.
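The last step, obtaining entity data from the entity tag sequence, can be sketched as grouping consecutive tags into spans. The sketch assumes BIO-style tags (`B-X` opens an entity of type `X`, `I-X` continues it, `O` is outside any entity) and the label names `VENDOR`, `PRODUCT` and `VERSION`; neither the tagging scheme nor the label names are stated in the source.

```python
def tags_to_entities(tokens, tags):
    """Group tokens labelled B-X / I-X into (entity_type, text) spans."""
    entities = []
    span_type, span_tokens = None, []
    # A trailing "O" sentinel flushes a span that runs to the end of the input.
    for token, tag in zip(list(tokens) + [""], list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == span_type
        if not continues:
            # The current span (if any) ends here.
            if span_tokens:
                entities.append((span_type, " ".join(span_tokens)))
            span_type, span_tokens = None, []
        if prefix == "B" or continues:
            if prefix == "B":
                span_type = label
            span_tokens.append(token)
    return entities


# Made-up sentence and tags for illustration only.
tokens = ["Acme", "Corp", "Widget", "1.2", "is", "affected"]
tags = ["B-VENDOR", "I-VENDOR", "B-PRODUCT", "B-VERSION", "O", "O"]
entities = tags_to_entities(tokens, tags)
```

An `I-` tag that does not continue an open span of the same type is dropped here; joining tokens with a space is a simplification, since a BERT tokenizer produces word pieces that would need to be merged back first.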
Optionally, the constructing a vulnerability knowledge graph according to the fusion data includes:
obtaining a node set and an edge set according to the fusion data;
constructing the vulnerability knowledge graph according to the node set and the edge set;
the node set is a set of all nodes corresponding to the entity data, and the edge set is a set of edges of the relation data between the entities.
Specifically, the vulnerability knowledge graph is composed of the node set and the edge set, wherein the node set comprises a plurality of nodes, each node represents an entity existing in the real world, the edge set comprises a plurality of edges, and each edge is a relationship between the entities.
According to the method for constructing the vulnerability knowledge graph, the node set and the edge set are obtained through the fusion data, and the vulnerability knowledge graph is constructed according to the node set and the edge set, so that the vulnerability knowledge graph can store knowledge in a graph mode, and related vulnerability knowledge can be searched more conveniently by establishing semantic links between the data.
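A graph made of a node set and an edge set can be held in two plain containers. The class below is a minimal in-memory sketch; the class name `KnowledgeGraph`, its methods and the placeholder identifiers are assumptions for illustration, and the source does not say which graph store is used.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # Node set: (entity_type, name) -> attribute dict.
    nodes: dict = field(default_factory=dict)
    # Edge set: (head key, relation, tail key) triples.
    edges: set = field(default_factory=set)

    def add_node(self, entity_type, name, **attributes):
        """Add a node, or update its attributes if it already exists."""
        key = (entity_type, name)
        self.nodes.setdefault(key, {}).update(attributes)
        return key

    def add_edge(self, head, relation, tail):
        """Add a directed edge between two existing nodes."""
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((head, relation, tail))

    def neighbours(self, key, relation=None):
        """Nodes reachable from `key` over one edge, optionally filtered."""
        return sorted(
            tail
            for head, rel, tail in self.edges
            if head == key and (relation is None or rel == relation)
        )


graph = KnowledgeGraph()
vuln = graph.add_node("Vulnerability", "CVE-0000-0001", severity="HIGH")
weakness = graph.add_node("Weakness", "CWE-79")
graph.add_edge(weakness, "lead_to", vuln)
```

Keeping edges in a set means that adding the same relationship twice, for example when two sources report it, leaves the graph unchanged.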
Optionally, performing data processing on the vulnerability data to obtain the preprocessed data includes:
performing data cleaning on the vulnerability data to obtain the preprocessed data.
Specifically, the data columns to be analyzed are selected from the data set, and the other columns that do not take part in the analysis are hidden to avoid interference. Repeated data values are deleted, with only the first record of each set of duplicates retained. The original data may also contain missing values, that is, cells in the data set that hold no data. Because missing values affect the results of data analysis, they need to be filled in.
The method for constructing the vulnerability knowledge graph in this embodiment deletes repeated data and fills in missing data through data cleaning, which improves data quality, lays a foundation for subsequent use of the data and increases accuracy.
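The three cleaning steps (select columns, drop duplicates keeping the first record, fill missing cells) can be sketched in plain Python. The function name `clean_records`, the use of one key column to detect duplicates and the use of fixed default values for filling are all assumptions; the source does not state how duplicates are identified or how missing values are completed.

```python
def clean_records(records, columns, key, defaults):
    """Keep only `columns`, drop records whose `key` value was already seen
    (the first record wins), and fill missing or empty cells from `defaults`."""
    seen = set()
    cleaned = []
    for record in records:
        identity = record.get(key)
        if identity in seen:
            continue  # a later duplicate: only the first record is retained
        seen.add(identity)
        row = {}
        for column in columns:
            value = record.get(column)
            row[column] = defaults[column] if value in (None, "") else value
        cleaned.append(row)
    return cleaned


# Placeholder rows for illustration only.
raw = [
    {"id": "CVE-0000-0001", "severity": "HIGH", "note": "ignored column"},
    {"id": "CVE-0000-0001", "severity": "LOW"},
    {"id": "CVE-0000-0002", "severity": ""},
]
cleaned = clean_records(
    raw,
    columns=["id", "severity"],
    key="id",
    defaults={"id": "", "severity": "UNKNOWN"},
)
```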
In some more specific embodiments, as shown in FIG. 3, a knowledge graph G is first initialized and built from the NVD database. After the total number of CVEs is obtained from the NVD, the vulnerability-related data for each CVE is fetched from the NVD API and parsed; a Vulnerability node is added, and Weakness nodes and relationships are added. For each set of affected product versions, the product and vendor information is extracted and Product and Vendor nodes and their relationships are added, and for each product version a ProductVersion node and relationship are added. Reference nodes and relationships are then added, POC information is extracted at the same time, and POC nodes and relationships are added.
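The per-CVE steps above can be sketched as one function that turns a parsed record into nodes and edges. The record layout used here (`cve_id`, `weaknesses`, `affected`, `references`, `pocs`) is a simplified, hypothetical shape chosen for the example; it is not the NVD API schema, and fetching and parsing the real response is left out.

```python
def add_nvd_record(nodes, edges, record):
    """Add one parsed CVE record to the node set and edge set.

    `record` is a simplified, hypothetical dict, not the NVD API schema.
    """
    vuln = ("Vulnerability", record["cve_id"])
    nodes.add(vuln)

    for cwe_id in record.get("weaknesses", []):
        weakness = ("Weakness", cwe_id)
        nodes.add(weakness)
        edges.add((weakness, "lead_to", vuln))

    for vendor_name, product_name, version in record.get("affected", []):
        vendor = ("Vendor", vendor_name)
        # Qualify names so equal product names from different vendors stay apart.
        product = ("Product", f"{vendor_name}:{product_name}")
        product_version = ("ProductVersion", f"{vendor_name}:{product_name}:{version}")
        nodes.update([vendor, product, product_version])
        edges.add((vendor, "has_product", product))
        edges.add((product, "has_version", product_version))
        edges.add((vuln, "affect", product_version))

    for relation, entity_type, field_name in (
        ("has_reference", "Reference", "references"),
        ("has_poc", "POC", "pocs"),
    ):
        for url in record.get(field_name, []):
            node = (entity_type, url)
            nodes.add(node)
            edges.add((vuln, relation, node))


nodes, edges = set(), set()
add_nvd_record(nodes, edges, {
    "cve_id": "CVE-0000-0001",  # placeholder identifier
    "weaknesses": ["CWE-79"],
    "affected": [("acme", "widget", "1.0"), ("acme", "widget", "1.1")],
    "references": ["https://example.com/advisory"],
    "pocs": [],
})
```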
Secondly, the knowledge graph is extended from the CNVD database. The CNVD vulnerability file is downloaded and parsed to obtain the vulnerability-related data it contains. For each record, if its CVE number already exists among the Vulnerability nodes, only that Vulnerability node is updated with the additional CNVD information. If the CVE number does not exist among the Vulnerability nodes, that is, the vulnerability is unique to the CNVD, a Vulnerability node is added, named entity recognition is performed on each original product entry in the original product data set to identify the vendor, product and version information, and Vendor, Product and ProductVersion nodes and relationships are added. Reference nodes and relationships are then added.
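The update-or-add decision for each CNVD record can be sketched as follows. The field names (`cve_id`, `cnvd_id`, `title`, `severity`), the choice to key CNVD-only vulnerabilities by their CNVD number, and the rule that existing values are kept when both sources supply a field are assumptions made for this sketch.

```python
def merge_cnvd_record(vulnerabilities, record):
    """Merge one parsed CNVD record into the Vulnerability nodes.

    `vulnerabilities` maps a CVE number (or, for CNVD-only entries, a CNVD
    number) to an attribute dict. Returns True when a new node was created,
    which is the case where named entity recognition on the product strings
    would follow.
    """
    cve_id = record.get("cve_id")
    if cve_id in vulnerabilities:
        node = vulnerabilities[cve_id]
        # Only add what is missing; values already present are kept.
        for field_name, value in record.items():
            node.setdefault(field_name, value)
        return False
    vulnerabilities[record["cnvd_id"]] = dict(record)
    return True


# Placeholder identifiers for illustration only.
vulnerabilities = {"CVE-0000-0001": {"cve_id": "CVE-0000-0001", "severity": "HIGH"}}
created_first = merge_cnvd_record(vulnerabilities, {
    "cnvd_id": "CNVD-0000-00001",
    "cve_id": "CVE-0000-0001",
    "severity": "medium",
    "title": "demo entry",
})
created_second = merge_cnvd_record(vulnerabilities, {
    "cnvd_id": "CNVD-0000-00002",
    "cve_id": None,
    "title": "only in CNVD",
})
```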
Finally, the knowledge graph is extended from the Exploit-db and CWE data. The files_exploits.csv file provided by Exploit-db is downloaded and parsed to obtain the exploit-related data; for each record, the exploit information is extracted, and Exploit nodes and relationships are added. The weakness file provided by the CWE is downloaded and parsed to obtain the weakness-related data; for each record, the weakness information is extracted, and the corresponding Weakness node is updated.
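Reading the exploit CSV and linking each exploit to its vulnerability can be sketched with the standard `csv` module. The column names `id`, `file` and `cve`, and the use of `;` to separate several CVE numbers, are hypothetical; the real file's header should be checked before relying on them.

```python
import csv
import io


def load_exploits(csv_text):
    """Parse exploit rows into Exploit nodes and has_exploit edges.

    Column names ("id", "file", "cve") are assumptions of this sketch.
    """
    exploits, edges = {}, []
    for row in csv.DictReader(io.StringIO(csv_text)):
        exploit = ("Exploit", row["id"])
        exploits[exploit] = {"file": row["file"]}
        for cve_id in row["cve"].split(";"):
            cve_id = cve_id.strip()
            if cve_id:  # rows without a CVE number produce a node but no edge
                edges.append((("Vulnerability", cve_id), "has_exploit", exploit))
    return exploits, edges


# Made-up sample rows for illustration only.
sample = (
    "id,file,cve\n"
    "1,exploits/a.py,CVE-0000-0001\n"
    "2,exploits/b.c,\n"
    "3,exploits/c.rb,CVE-0000-0001;CVE-0000-0002\n"
)
exploits, exploit_edges = load_exploits(sample)
```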
According to the vulnerability knowledge graph construction method, vulnerability data in the NVD database, the CNVD database, the CWE database and the Exploit-db database is acquired using a method suited to each source, and integrating this large amount of data makes the vulnerability data more complete. The acquired vulnerability data is processed to improve data quality. Entity data and the relationship data between entities are extracted from the processed data and fused to obtain fusion data, and the security vulnerability knowledge graph is constructed from the fusion data; by integrating a large amount of data, the graph covers more vulnerability information.
Corresponding to the above method for constructing a vulnerability knowledge graph, as shown in fig. 4, a further embodiment of the present invention further provides a device for constructing a vulnerability knowledge graph, including:
the vulnerability data unit 10 is used for acquiring vulnerability data in a data source, wherein the data source comprises an NVD database, a CNVD database, a CWE database and an Exploit-db database;
a preprocessing data unit 20, configured to perform data processing on the vulnerability data to obtain preprocessed data;
entity data and relationship data unit 30, where the entity data and relationship data unit is configured to obtain relationship data between entity data and entities according to the pre-processing data, where the entities include a vulnerability entity, a product version entity, a vendor entity, a vulnerability verification entity, a vulnerability exploitation entity, and related link entities;
a fusion data unit 40, configured to fuse the entity data and the relationship data by using a preset data fusion algorithm to obtain fusion data;
the security vulnerability knowledge graph unit 50 is configured to construct a security vulnerability knowledge graph according to the fusion data, where the security vulnerability knowledge graph includes a plurality of nodes, and the nodes correspond to the entity data.
The device for constructing the vulnerability knowledge graph has the same advantages as the method for constructing the vulnerability knowledge graph compared with the prior art, and is not described in detail herein.
Corresponding to the above method for constructing the vulnerability knowledge graph, as shown in fig. 5, a further embodiment of the present invention further provides a method for evaluating a vulnerability, including the following steps:
step 510, obtaining association information of a vulnerability from the vulnerability knowledge graph obtained by the above vulnerability knowledge graph construction method, wherein the association information of the vulnerability comprises the vulnerability type and the vulnerability level;
and step 520, evaluating the risk coefficient of the vulnerability according to the vulnerability association information.
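The source does not give a formula for the risk coefficient. As a purely illustrative sketch of how the two pieces of association information could be combined, the function below multiplies a score for the vulnerability level by a weight for the vulnerability type; every number, the level names and the use of CWE identifiers as types are hypothetical.

```python
# Hypothetical scores and weights, chosen only to make the example concrete.
LEVEL_SCORE = {"low": 0.3, "medium": 0.6, "high": 0.9}
TYPE_WEIGHT = {"CWE-79": 0.8, "CWE-89": 1.0}
DEFAULT_TYPE_WEIGHT = 0.5


def risk_coefficient(vuln_type, level):
    """Combine vulnerability type and level into a value between 0 and 1."""
    weight = TYPE_WEIGHT.get(vuln_type, DEFAULT_TYPE_WEIGHT)
    return round(LEVEL_SCORE[level] * weight, 2)


high_risk = risk_coefficient("CWE-89", "high")
unknown_type_risk = risk_coefficient("CWE-0000", "low")
```

In practice the type and level would be read from the attributes of the Vulnerability node and its linked Weakness node in the graph rather than passed in by hand.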
The vulnerability assessment method has the same advantages over the prior art as the vulnerability knowledge graph construction method, which are not described in detail herein.
FIG. 6 illustrates an internal block diagram of a computer device in one embodiment. The computer device may specifically be the terminal 110 (or the server 120) in FIG. 1. As shown in FIG. 6, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system, and may also store a computer program that, when executed by the processor, causes the processor to implement the method for constructing the vulnerability knowledge graph. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the method for constructing the vulnerability knowledge graph and/or the method for evaluating a vulnerability. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, trackball or touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present invention and is not limiting of the computer device to which the present invention may be applied, and that a particular computer device may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
Another embodiment of the present invention provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: obtaining vulnerability data in a data source, wherein the data source comprises an NVD database, a CNVD database, a CWE database and an Exploit-db database;
performing data processing on the vulnerability data to obtain preprocessed data;
obtaining relation data between entity data and entities according to the preprocessing data, wherein the entities comprise a vulnerability entity, a product version entity, a manufacturer entity, a vulnerability verification entity, a vulnerability utilization entity and related link entities;
fusing the entity data and the relation data by using a preset data fusion algorithm to obtain fusion data;
and constructing a security vulnerability knowledge graph according to the fusion data, wherein the security vulnerability knowledge graph comprises a plurality of nodes, and the nodes correspond to the entity data.
In one embodiment, the processor further implements the above-mentioned method for constructing the vulnerability knowledge graph and/or the step of the vulnerability assessment method when executing the computer program.
Another embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of: obtaining vulnerability data in a data source, wherein the data source comprises an NVD database, a CNVD database, a CWE database and an Exploit-db database;
performing data processing on the vulnerability data to obtain preprocessed data;
obtaining relation data between entity data and entities according to the preprocessing data, wherein the entities comprise a vulnerability entity, a product version entity, a manufacturer entity, a vulnerability verification entity, a vulnerability utilization entity and related link entities;
fusing the entity data and the relation data by using a preset data fusion algorithm to obtain fusion data;
and constructing a security vulnerability knowledge graph according to the fusion data, wherein the security vulnerability knowledge graph comprises a plurality of nodes, and the nodes correspond to the entity data.
In one embodiment, the computer program when executed by the processor further implements the above-mentioned method for constructing a vulnerability knowledge graph and/or the step of the method for evaluating a vulnerability.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware, where the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The method for constructing the vulnerability knowledge graph is characterized by comprising the following steps of:
obtaining vulnerability data in a data source, wherein the data source comprises an NVD database, a CNVD database, a CWE database and an Exploit-db database;
performing data processing on the vulnerability data to obtain preprocessed data;
obtaining relation data between entity data and entities according to the preprocessing data, wherein the entities comprise a vulnerability entity, a product version entity, a manufacturer entity, a vulnerability verification entity, a vulnerability utilization entity and related link entities;
fusing the entity data and the relation data by using a preset data fusion algorithm to obtain fusion data;
and constructing a security vulnerability knowledge graph according to the fusion data, wherein the security vulnerability knowledge graph comprises a plurality of nodes, and the nodes correspond to the entity data.
2. The method for constructing a vulnerability knowledge graph according to claim 1, wherein the obtaining vulnerability data of a data source comprises:
acquiring NVD vulnerability data in the NVD database by using an API (application program interface), wherein the NVD vulnerability data is in a JSON format and comprises CVE numbers, CVE influence products, versions, manufacturers and NVD related links;
obtaining CNVD vulnerability data in the CNVD database, wherein the CNVD vulnerability data is in a structured XML format, and comprises CNVD numbers, CNVD influence products and CNVD related links;
acquiring CWE vulnerability data in the CWE database, wherein the CWE vulnerability data is in a structured CSV format, and the CWE vulnerability data comprises a CWE number and vulnerability name information;
and acquiring Exploit-db vulnerability data in the Exploit-db database, wherein the Exploit-db vulnerability data is in a structured CSV format, and the Exploit-db vulnerability data comprises the CVE number, the Exploit-db number and vulnerability exploitation code file path information.
3. The method for constructing a vulnerability knowledge graph according to claim 1, wherein the entity data includes structured entity data and unstructured entity data, and the obtaining relationship data between the entity data and the entity according to the preprocessing data includes:
extracting unstructured data and structured data of the preprocessed data;
obtaining unstructured entity data through the unstructured data by deep learning, and obtaining the structured entity data according to the structured data;
and obtaining the relation data among the entities according to the predefined entity relation information, the unstructured entity data and the structured entity data.
4. The method for constructing a vulnerability knowledge graph according to claim 3, wherein the obtaining unstructured entity data from the unstructured data by deep learning comprises:
inputting the unstructured data into a pre-trained BERT model to obtain word vector information;
inputting the word vector information into a BiLSTM model for bidirectional training to obtain context semantic feature vector information;
decoding the context semantic feature vector information by using a CRF model to obtain an entity tag sequence;
and obtaining the unstructured entity data according to the entity tag sequence.
5. The method for constructing a vulnerability knowledge graph according to claim 1, wherein the constructing the vulnerability knowledge graph according to the fusion data comprises:
obtaining a node set and an edge set according to the fusion data;
constructing the vulnerability knowledge graph according to the node set and the edge set;
the node set is a set of all nodes corresponding to the entity data, and the edge set is a set of edges of the relation data between the entities.
6. The method for constructing a vulnerability knowledge graph according to claim 1, wherein the performing data processing on the vulnerability data to obtain preprocessed data includes:
performing data cleaning on the vulnerability data to obtain the preprocessed data.
7. The device for constructing the vulnerability knowledge graph is characterized by comprising:
the vulnerability data unit is used for acquiring vulnerability data in a data source, wherein the data source comprises an NVD database, a CNVD database, a CWE database and an Exploit-db database;
the preprocessing data unit is used for carrying out data processing on the vulnerability data to obtain preprocessed data;
the entity data and the relation data unit are used for obtaining relation data between the entity data and the entity according to the preprocessing data, wherein the entity comprises a vulnerability entity, a product version entity, a manufacturer entity, a vulnerability verification entity, a vulnerability utilization entity and a related link entity;
the fusion data unit is used for fusing the entity data and the relation data by using a preset data fusion algorithm to obtain fusion data;
the security vulnerability knowledge graph unit is used for constructing a security vulnerability knowledge graph according to the fusion data, wherein the security vulnerability knowledge graph comprises a plurality of nodes, and the nodes correspond to the entity data.
8. A method for evaluating vulnerabilities, comprising:
obtaining association information of a vulnerability from the vulnerability knowledge graph obtained by the method for constructing the vulnerability knowledge graph according to any one of claims 1 to 6, wherein the association information of the vulnerability comprises the vulnerability type and the vulnerability level;
and evaluating the risk coefficient of the vulnerability according to the vulnerability association information.
9. Computer device, characterized in that it comprises a memory in which a computer program is stored and a processor, which when executing the computer program implements the method of constructing a vulnerability knowledge graph according to any one of claims 1 to 6 and/or the method of assessing vulnerabilities according to claim 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when read and executed by a processor, implements the method of constructing a vulnerability knowledge graph according to any one of claims 1 to 6 and/or the method of evaluating vulnerabilities according to claim 8.
CN202310706545.3A 2023-06-15 2023-06-15 Vulnerability knowledge graph construction method, vulnerability knowledge graph evaluation equipment and storage medium Pending CN116484025A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310706545.3A CN116484025A (en) 2023-06-15 2023-06-15 Vulnerability knowledge graph construction method, vulnerability knowledge graph evaluation equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116484025A true CN116484025A (en) 2023-07-25

Family

ID=87223443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310706545.3A Pending CN116484025A (en) 2023-06-15 2023-06-15 Vulnerability knowledge graph construction method, vulnerability knowledge graph evaluation equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116484025A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230725