CN117910567A

CN117910567A - Vulnerability knowledge graph construction method based on safety dictionary and deep learning network

Info

Publication number: CN117910567A
Application number: CN202410317361.2A
Authority: CN
Inventors: 韩庆良; 史文征; 于志波; 张晓溪; 赵波; 房运德
Original assignee: Dopp Information Technology Co ltd
Current assignee: Dopp Information Technology Co ltd
Priority date: 2024-03-20
Filing date: 2024-03-20
Publication date: 2024-04-19

Abstract

The invention discloses a vulnerability knowledge graph construction method based on a security dictionary and a deep learning network, belonging to the technical fields of network security and artificial intelligence natural language processing. The method comprises the following steps: collecting and statistically analyzing internet data resources, constructing a network model, network model processing, sorting out the vulnerability knowledge graph relationships, and tracing through the vulnerability knowledge graph. The method realizes the construction of the vulnerability knowledge graph, effectively improves construction and operation efficiency, and at the same time greatly reduces the error rate during operation.

Description

Vulnerability knowledge graph construction method based on safety dictionary and deep learning network

Technical Field

The invention relates to the technical field of network security and the technical field of artificial intelligence natural language processing, in particular to a vulnerability knowledge graph construction method based on a security dictionary and a deep learning network.

Background

Network security means that the hardware, software, and data of a network system are protected from damage, modification, and disclosure, whether accidental or malicious, so that the system runs continuously, reliably, and normally and network services are not interrupted.

The security dictionary is used for defining the security condition of the network based on the running and service states of the network.

Deep learning learns the inherent regularities and representation hierarchies of sample data, and the information obtained during learning helps in interpreting data such as text, images, and sound. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize text, images, and sound. Deep learning is a complex class of machine learning algorithms whose results in speech and image recognition far exceed those of earlier techniques.

A large number of vulnerability database platforms currently exist at home and abroad, but their vulnerability data suffer from defects such as being single-sourced and inconsistently structured. At the same time, the big-data internet environment contains a large amount of vulnerability information. To better explore the potential links among vulnerabilities, the various vulnerability data need to be aggregated and analyzed to generate a knowledge graph.

Based on the above, the present inventors found that:

traditional knowledge graph construction relies on manual work or rules to extract vulnerability-related entities and relations, which greatly reduces the construction and operation efficiency of the knowledge graph and greatly increases its error rate during operation.

Accordingly, in view of the above, the existing approach is studied and improved, and a vulnerability knowledge graph construction method based on a security dictionary and a deep learning network is provided, with the aim of achieving greater practical value.

Disclosure of Invention

1. Technical problem to be solved

In view of the problems existing in the prior art, the invention aims to provide a vulnerability knowledge graph construction method based on a security dictionary and a deep learning network.

2. Technical solution

To solve the above problems, the invention adopts the following technical solution.

A vulnerability knowledge graph construction method based on a security dictionary and a deep learning network comprises the following specific implementation steps:

Step one, collecting and statistically analyzing internet data resources: network resources are collected and statistically analyzed based on word frequency; the experience of network security operators is then incorporated, and a security dictionary is constructed through word selection;

Step two, constructing a network model: based on the internet data resources of step one, a BERT model trained on a large-scale training set is selected as the pre-trained model and serves as the entry point of the overall network; the classification result is then output through a bidirectional long short-term memory network and a conditional random field;

Step three, network model processing: the multi-source vulnerability text data is passed through the security dictionary and the artificial intelligence inference model respectively, an entity decision is made according to a weight ratio, and disambiguation and alignment are performed on the selected entities;

Step four, sorting out the vulnerability knowledge graph relationships: once the entities are determined, the vulnerability knowledge graph model is populated according to category content, and new entities, attributes, and relations are established;

Step five, tracing through the vulnerability knowledge graph: a data search is performed starting from the key entity nodes to obtain the related weaknesses and vulnerabilities, completing the trace.

Further, in step one, the content of the security dictionary is derived from statistical analysis of vulnerability description data and from the experience of security operators.

Further, in step one, when the security dictionary is constructed, common keywords of the security field are determined in the security dictionary and used as an auxiliary tool for constructing the graph.

Further, in step two, when the network entry point is selected, the overall network consists of BERT, a bidirectional long short-term memory network, and a conditional random field.

Further, in step two, when the classification result is output, the classification categories are stored according to the classification result, and a system block diagram of the index directory for the classification is established.

Further, in step three, when the entity decision is made, category information is determined and an artificial intelligence classification model is constructed to extract entities.

Further, in step three, the specific steps of entity extraction are as follows:

Step 1: sorting out all valuable entity categories involved in vulnerabilities and their related descriptions;

Step 2: resolving the model's deviations in understanding semantic information, thereby improving the accuracy of the model;

Step 3: using the artificial intelligence model to understand and reason over the entire input sentence, thereby further improving entity extraction accuracy.

Further, in step three, when the multi-source vulnerability text data passes through the security dictionary and the artificial intelligence inference model respectively, the artificial intelligence model faces the challenge of extracting entity information in varied contexts from vulnerability-related multi-source data; the model interprets the text consistently by taking context into account, and the pre-trained model is trained on this data so that it understands the contextual meaning of the corpus.

Further, in step three, the vulnerability description text covers the following entity types: system, threat, and software.

Further, in step four, when new entities, attributes, and relations are established, a knowledge graph prototype is designed, and the entity, attribute, and relation triples are determined according to the sorted category information, so that the vulnerability knowledge graph is constructed dynamically.

3. Advantageous effects

Compared with the prior art, the invention has the advantages that:

① In constructing a vulnerability knowledge graph for network security, this solution combines the keywords in the security dictionary and completes, in sequence, the collection and statistical analysis of internet data resources, the construction of the network model, network model processing, the sorting out of the vulnerability knowledge graph relationships, and tracing through the vulnerability knowledge graph. The vulnerability knowledge graph for network security vulnerabilities is thereby constructed; construction and operation efficiency are effectively improved, and the error rate during operation is greatly reduced;

② In this solution, the content of the security dictionary is derived from statistical analysis of vulnerability description data and from the experience of security operators, so the dictionary can better assist the artificial intelligence model in extracting entities and in judging the accuracy of the extracted entities;

③ For vulnerability-related multi-source data, in particular irregular text data, the artificial intelligence model faces the challenge of extracting entity information in varied contexts. By taking context into account and training the pre-trained model on this data, the model better understands the contextual meaning of the corpus;

④ In this solution, the vulnerability description text contains multiple types of entities, for example system, threat, and software. How the vulnerability knowledge graph model is designed, and how the extracted content is integrated into entity, attribute, and relation triples, determines the execution efficiency and tracing accuracy of downstream tasks.

Drawings

FIG. 1 is a flow chart of a method of constructing a vulnerability knowledge graph of the present invention;

FIG. 2 is a schematic illustration of a vulnerability knowledge graph model of the present invention;

FIG. 3 is a schematic diagram of the artificial intelligence network used for entity extraction in the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention; it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments, and that all other embodiments obtained by persons of ordinary skill in the art without making creative efforts based on the embodiments in the present invention are within the protection scope of the present invention.

Examples:

Referring to FIGS. 1-3, a vulnerability knowledge graph construction method based on a security dictionary and a deep learning network comprises the following specific implementation steps:

Keywords are extracted to construct a security dictionary by combining the word frequencies of existing vulnerability description texts with the experience of security operators. A long short-term memory entity extraction network based on the pre-trained model is trained on annotated vulnerability description corpora. In a practical application scenario, the security dictionary first assists the artificial intelligence model in extracting entities from the input corpus; disambiguation and alignment are then performed on the entities; finally, the graph is constructed from the extracted entities according to the vulnerability graph model.
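By way of non-limiting illustration, the disambiguation and alignment operation described above may be sketched as a lookup against an alias table that maps different surface forms of an entity to one canonical name. The alias table, the function name `align_entities`, and the normalization rule (lower-casing and collapsing whitespace) are assumptions made for this sketch; the specification does not state how alignment is implemented.

```python
def align_entities(mentions, alias_table):
    """Map extracted mentions to canonical entity names and drop duplicates.

    mentions:    list of entity strings as extracted from text.
    alias_table: dict from a normalized surface form to a canonical name
                 (hypothetical; assumed to be maintained by security operators).
    """
    aligned = []
    for mention in mentions:
        # Normalize: lower-case and collapse runs of whitespace.
        key = " ".join(mention.lower().split())
        # Unknown mentions are kept as-is (only trimmed).
        canonical = alias_table.get(key, mention.strip())
        if canonical not in aligned:
            aligned.append(canonical)
    return aligned
```

In this sketch, two mentions that differ only in case or spacing resolve to the same graph node, which keeps the graph from accumulating duplicate entities.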

Referring to FIG. 1, in step one, the content of the security dictionary is derived from statistical analysis of vulnerability description data and from the experience of security operators.

Using the statistical analysis results and the experience captured in the security dictionary, entity extraction can be performed better and the accuracy of the extracted entities can be judged.

Referring to FIG. 1, in step one, when the security dictionary is constructed, common keywords of the security field are determined in the security dictionary and used as an auxiliary tool for constructing the graph.

When the security dictionary is constructed, the keywords serve as search nodes, which enables precise search, improves search efficiency and accuracy, and in turn improves the quality of the vulnerability knowledge graph.
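As a minimal sketch of step one, the word-frequency statistics and operator experience may be combined as follows. The tokenization rule, the stop-word list, the `min_count` threshold, and the function name `build_security_dictionary` are assumptions for illustration only; the specification does not fix any of them.

```python
import re
from collections import Counter

def build_security_dictionary(descriptions, expert_terms, stopwords, min_count=2):
    """Return a keyword set: frequent corpus tokens plus operator-supplied terms."""
    counts = Counter()
    for text in descriptions:
        # Lower-case tokens of letters/digits, optionally joined by '-' or '_'.
        tokens = re.findall(r"[a-z0-9]+(?:[-_][a-z0-9]+)*", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    # Word selection by frequency (threshold is an assumed parameter).
    frequent = {token for token, n in counts.items() if n >= min_count}
    # Operator experience is modeled here as a hand-written term list.
    return frequent | {term.lower() for term in expert_terms}
```

A call such as `build_security_dictionary(texts, ["privilege escalation"], {"the", "in"})` would return the tokens that occur at least twice in `texts`, together with the operator-supplied phrase.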

Referring to FIGS. 1 and 3, in step two, when the network entry point is selected, the overall network consists of BERT, a bidirectional long short-term memory network, and a conditional random field.

The vulnerability description text is input into the BERT model word by word to generate word vectors; the word vectors pass through the bidirectional long short-term memory network to produce category probabilities; finally, the category of each word is determined by the conditional random field.
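The final stage of this pipeline, in which the conditional random field determines the category of each word, can be illustrated with a plain-Python Viterbi decoder. This is only a sketch of the decoding step: the BERT and bidirectional long short-term memory stages are not implemented here, and the per-word scores (`emissions`), the transition scores, and the label names are hypothetical stand-ins for what those stages and CRF training would produce.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions:   list (one entry per word) of dicts label -> score,
                 standing in for the BiLSTM output.
    transitions: dict (previous_label, current_label) -> score;
                 missing pairs score 0.0.
    """
    # best[i][label] = best score of any path ending at word i with that label.
    best = [{label: emissions[0][label] for label in labels}]
    back = []  # back[i][label] = best previous label for word i + 1
    for emission in emissions[1:]:
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev]
                           + transitions.get((prev, cur), 0.0)
                           + emission[cur])
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda label: best[-1][label])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

The transition scores are what distinguish this stage from picking the most probable label for each word independently: a strongly negative score for moving from an outside label directly to an inside-entity label, for example, prevents label sequences that are locally likely but structurally invalid.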

Referring to FIG. 1, in step two, when the classification result is output, the classification categories are stored according to the classification result, and a system block diagram of the index directory for the classification is established.

The output classification results are organized a second time into a system block diagram and stored, which makes subsequent indexing of the classification results more convenient and more efficient.
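One possible reading of this storage step is an index that groups the classified items by category so that later lookups go directly to the relevant category. The sketch below assumes the classification output is a list of `(text, category)` pairs; that representation and the function name `build_category_index` are illustrative assumptions, not details given in the specification.

```python
from collections import defaultdict

def build_category_index(tagged_entities):
    """Group (text, category) pairs into a category -> [texts] index."""
    index = defaultdict(list)
    for text, category in tagged_entities:
        # Keep first-seen order and skip repeated entries.
        if text not in index[category]:
            index[category].append(text)
    return dict(index)
```

With such an index, retrieving all software entities is a single dictionary lookup rather than a scan over every classification result.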

Referring to FIG. 1, in step three, when the entity decision is made, category information is determined and an artificial intelligence classification model is constructed to extract entities.

When entities are extracted, the entity decision relies on the entity extraction results, which ensures the accuracy of the entity decision, avoids decision deviation, and ultimately ensures the accuracy of the constructed vulnerability knowledge graph.
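The entity decision "according to a weight ratio" between the security dictionary and the artificial intelligence model might, for example, be realized as a weighted vote. The weights (0.4 and 0.6), the acceptance threshold (0.5), and the function name `decide_entities` are hypothetical values chosen for this sketch; the specification does not disclose the actual ratio.

```python
def decide_entities(dict_hits, model_hits,
                    dict_weight=0.4, model_weight=0.6, threshold=0.5):
    """Fuse dictionary matches with model confidences.

    dict_hits:  set of candidate spans matched by the security dictionary.
    model_hits: dict span -> model confidence in [0, 1].
    Returns a dict of accepted spans -> fused score.
    """
    decisions = {}
    for span in set(dict_hits) | set(model_hits):
        dictionary_vote = 1.0 if span in dict_hits else 0.0
        score = (dict_weight * dictionary_vote
                 + model_weight * model_hits.get(span, 0.0))
        if score >= threshold:
            decisions[span] = round(score, 3)
    return decisions
```

Under these assumed weights, a dictionary match alone is not sufficient to accept a span, and a model prediction alone needs high confidence, so the two sources corroborate each other.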

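Referring to FIG. 1, in step three, the specific steps of entity extraction are as follows:

Step 1: sorting out all valuable entity categories involved in vulnerabilities and their related descriptions;

Step 2: resolving the model's deviations in understanding semantic information, thereby improving the accuracy of the model;

Step 3: using the artificial intelligence model to understand and reason over the entire input sentence, thereby further improving entity extraction accuracy.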

Referring to FIG. 1, in step three, when the multi-source vulnerability text data passes through the security dictionary and the artificial intelligence inference model respectively, the artificial intelligence model faces the challenge of extracting entity information in varied contexts from vulnerability-related multi-source data; the model interprets the text consistently by taking context into account, and the pre-trained model is trained on this data so that it understands the contextual meaning of the corpus.

For vulnerability-related multi-source data, especially irregular text data, sentence meaning is understood in combination with the context, so that the same sentence is not interpreted ambiguously or inconsistently, and the accuracy of understanding multi-source vulnerability text is improved.

Referring to FIGS. 1 and 2, in step three, the vulnerability description text covers the following entity types: system, threat, and software.

System, threat, and software are all entity types found in vulnerability description text. How the vulnerability knowledge graph model is designed, and how the extracted content is integrated into entity, attribute, and relation triples, determines the execution efficiency and tracing accuracy of downstream tasks.

The knowledge graph is composed of 7 entities and 5 attributes. Vulnerability and weakness entities form the main elements of the knowledge graph, while other entities, such as software and systems, are the key nodes from which security operators trace; by searching the data from these key entity nodes, the related weaknesses and vulnerabilities are obtained and tracing is completed.
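Tracing from a key entity node can be pictured as a bounded breadth-first search over the graph's relation triples. In the sketch below, the relation names, the node identifiers, the `max_hops` limit, and the function name `trace` are all hypothetical; the specification states only that a data search from key entity nodes yields the related weaknesses and vulnerabilities.

```python
from collections import defaultdict, deque

def trace(triples, node_types, start,
          targets=("vulnerability", "weakness"), max_hops=2):
    """Return vulnerability/weakness nodes within max_hops of a key node.

    triples:    iterable of (head, relation, tail) tuples.
    node_types: dict node -> entity type.
    """
    # Treat relations as undirected for the purpose of tracing.
    neighbors = defaultdict(set)
    for head, _relation, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    seen, queue, found = {start}, deque([(start, 0)]), []
    while queue:
        node, hops = queue.popleft()
        if node != start and node_types.get(node) in targets:
            found.append(node)
        if hops < max_hops:
            for nxt in sorted(neighbors[node]):  # sorted for stable output
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return found
```

Starting from a software node, the search first reaches the vulnerabilities that affect it and then the weaknesses those vulnerabilities instantiate, which mirrors the tracing path a security operator would follow.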

Referring to FIG. 1, in step four, when new entities, attributes, and relations are established, a knowledge graph prototype is designed, and the entity, attribute, and relation triples are determined according to the sorted category information, so that the vulnerability knowledge graph is constructed dynamically.

The dynamically constructed vulnerability knowledge graph can provide the expressive power needed for similar types of network security research, improving the practicality of the knowledge graph as a whole.
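A minimal in-memory sketch of dynamic graph construction is given below: entities carry a type and attributes, and relations are stored as triples that can be added incrementally as new vulnerability text is processed. The class name `VulnerabilityGraph`, its methods, and the example attribute and relation names are assumptions for illustration; the specification does not name a storage backend or schema.

```python
class VulnerabilityGraph:
    """Incrementally built graph of entities, attributes, and relation triples."""

    def __init__(self):
        self.nodes = {}     # entity name -> {"type": ..., attribute: value, ...}
        self.edges = set()  # (head, relation, tail) triples

    def add_entity(self, name, entity_type, **attributes):
        # Create the entity if it is new, then merge in any attributes.
        node = self.nodes.setdefault(name, {"type": entity_type})
        node.update(attributes)

    def add_relation(self, head, relation, tail):
        # Both endpoints must already exist as entities.
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("both endpoints must be added as entities first")
        self.edges.add((head, relation, tail))  # a set makes re-adding harmless
```

Because entities are merged by name and relations are held in a set, the same extraction result can be applied repeatedly without creating duplicates, which suits a graph that is updated as new sources arrive.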

The above description covers only the preferred embodiments of the present invention, and the protection scope of the invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed herein and according to the technical solution of the invention and its improvements, shall fall within the protection scope of the present invention.

Claims

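1. A vulnerability knowledge graph construction method based on a security dictionary and a deep learning network, characterized in that the construction method comprises the following specific implementation steps:

Step one, collecting and statistically analyzing internet data resources: network resources are collected and statistically analyzed based on word frequency; the experience of network security operators is then incorporated, and a security dictionary is constructed through word selection;

Step two, constructing a network model: based on the internet data resources of step one, a BERT model trained on a large-scale training set is selected as the pre-trained model and serves as the entry point of the overall network; the classification result is then output through a bidirectional long short-term memory network and a conditional random field;

Step three, network model processing: the multi-source vulnerability text data is passed through the security dictionary and the artificial intelligence inference model respectively, an entity decision is made according to a weight ratio, and disambiguation and alignment are performed on the selected entities;

Step four, sorting out the vulnerability knowledge graph relationships: once the entities are determined, the vulnerability knowledge graph model is populated according to category content, and new entities, attributes, and relations are established;

Step five, tracing through the vulnerability knowledge graph: a data search is performed starting from the key entity nodes to obtain the related weaknesses and vulnerabilities, completing the trace.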

2. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that: in step one, the content of the security dictionary is derived from statistical analysis of vulnerability description data and from the experience of security operators.

3. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that: in step one, when the security dictionary is constructed, common keywords of the security field are determined in the security dictionary and used as an auxiliary tool for constructing the graph.

4. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that: in step two, when the network entry point is selected, the overall network consists of BERT, a bidirectional long short-term memory network, and a conditional random field.

5. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that: in step two, when the classification result is output, the classification categories are stored according to the classification result, and a system block diagram of the index directory for the classification is established.

6. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that: in step three, when the entity decision is made, category information is determined and an artificial intelligence classification model is constructed to extract entities.

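7. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 6, characterized in that: in step three, the specific steps of entity extraction are as follows:

Step 1: sorting out all valuable entity categories involved in vulnerabilities and their related descriptions;

Step 2: resolving the model's deviations in understanding semantic information, thereby improving the accuracy of the model;

Step 3: using the artificial intelligence model to understand and reason over the entire input sentence, thereby further improving entity extraction accuracy.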

8. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that: in step three, when the multi-source vulnerability text data passes through the security dictionary and the artificial intelligence inference model respectively, the artificial intelligence model faces the challenge of extracting entity information in varied contexts from vulnerability-related multi-source data; the model interprets the text consistently by taking context into account, and the pre-trained model is trained on this data so that it understands the contextual meaning of the corpus.

9. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that: in step three, the vulnerability description text covers the following entity types: system, threat, and software.

10. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that: in step four, when new entities, attributes, and relations are established, a knowledge graph prototype is designed, and the entity, attribute, and relation triples are determined according to the sorted category information, so that the vulnerability knowledge graph is constructed dynamically.