CN117910567A - Vulnerability knowledge graph construction method based on safety dictionary and deep learning network - Google Patents


Info

Publication number
CN117910567A
Authority
CN
China
Prior art keywords
knowledge graph
vulnerability
dictionary
network
model
Prior art date
Legal status
Pending
Application number
CN202410317361.2A
Other languages
Chinese (zh)
Inventor
韩庆良
史文征
于志波
张晓溪
赵波
房运德
Current Assignee
Dopp Information Technology Co ltd
Original Assignee
Dopp Information Technology Co ltd
Priority date
2024-03-20
Filing date
2024-03-20
Publication date
2024-04-19
Application filed by Dopp Information Technology Co ltd
Priority to CN202410317361.2A
Publication of CN117910567A
Current legal status: Pending


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a vulnerability knowledge graph construction method based on a security dictionary and a deep learning network, belonging to the technical fields of network security and artificial-intelligence natural language processing. The method comprises the following specific steps: collecting and statistically analyzing Internet data resources, constructing a network model, processing data with the network model, organizing the relationships of the vulnerability knowledge graph, and tracing through the vulnerability knowledge graph. The method enables construction of a vulnerability knowledge graph, effectively improves construction and operating efficiency, and greatly reduces the error rate during operation.

Description

Vulnerability knowledge graph construction method based on safety dictionary and deep learning network
Technical Field
The invention relates to the technical fields of network security and artificial-intelligence natural language processing, and in particular to a vulnerability knowledge graph construction method based on a security dictionary and a deep learning network.
Background
Network security means that the hardware, software and data of a network system are protected from damage, alteration or disclosure caused by accidental or malicious factors, so that the system keeps operating continuously, reliably and normally and network services are not interrupted.
A security dictionary defines the security conditions of a network on the basis of the network's operating and service states.
Deep learning learns the inherent regularities and representation hierarchies of sample data, and the information obtained during learning helps interpret data such as text, images and sound. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize text, images and sound. Deep learning is a complex class of machine learning algorithms whose results in speech and image recognition far exceed those of earlier techniques.
Many vulnerability database platforms currently exist in China and abroad, but the vulnerability data they hold are limited in scope and heterogeneous in structure. Because the big-data Internet environment contains a large amount of vulnerability information, aggregating and analyzing the various vulnerability data into a knowledge graph makes it possible to explore potential links between vulnerabilities more effectively.
Based on the above, the inventors found that:
traditional knowledge graphs rely on manual effort or hand-written rules to extract vulnerability-related entities and relations, which greatly reduces construction and operating efficiency and substantially increases the error rate during operation.
Accordingly, the existing approach has been studied and improved, and a vulnerability knowledge graph construction method based on a security dictionary and a deep learning network is provided to achieve higher practical value.
Disclosure of Invention
1. Technical problem to be solved
In view of the problems in the prior art, the invention aims to provide a vulnerability knowledge graph construction method based on a security dictionary and a deep learning network that enables construction of a vulnerability knowledge graph.
2. Technical solution
To solve the above problems, the invention adopts the following technical solution.
A vulnerability knowledge graph construction method based on a safety dictionary and a deep learning network comprises the following specific implementation steps:
Step one, collecting and statistically analyzing Internet data resources: carry out word-frequency statistical analysis on the collected network resources, combine the results with the experience of network security operators, and select words to construct a security dictionary;
Step two, constructing the network model: based on the Internet data resources from step one, select a BERT model pre-trained on a large-scale training set as the entry point of the overall network, and output classification results through a bidirectional long short-term memory (BiLSTM) network and a conditional random field (CRF);
Step three, processing with the network model: pass the multi-source vulnerability text data through the security dictionary and the artificial intelligence inference model separately, make entity decisions according to a weight ratio, and perform disambiguation and alignment on the selected entities;
Step four, organizing the vulnerability knowledge graph relationships: with the entities determined, populate the vulnerability knowledge graph model according to category and establish new entities, attributes and relations;
Step five, tracing through the vulnerability knowledge graph: search the data starting from key entity nodes to obtain the related weaknesses and vulnerabilities, thereby completing the tracing.
Further, in step one, the content of the security dictionary is derived from statistical analysis of vulnerability descriptions and the experience of security operators.
Further, in step one, when the security dictionary is constructed, common keywords in the security field are identified in it and used as auxiliary tools for constructing the graph.
Further, in step two, when the network entry point is selected, the selected overall network consists of BERT, a bidirectional long short-term memory network and a conditional random field.
Further, in step two, when the classification results are output, the categories are stored according to the classification results, and a hierarchical block diagram of the classification index directory is established.
Further, in step three, when entity decisions are made, category information is determined and an artificial intelligence classification model is constructed to extract entities.
Further, in step three, entity extraction comprises the following specific steps:
Step 1: sort out all valuable entity categories involved in vulnerabilities and their descriptions;
Step 2: correct the model's misunderstandings of semantic information, improving the model's accuracy;
Step 3: use the artificial intelligence model to understand and reason over the entire input sentence, further improving entity extraction accuracy.
Further, in step three, when the multi-source vulnerability text data passes through the security dictionary and the artificial intelligence inference model separately, the artificial intelligence model faces the challenge of extracting entity information from vulnerability-related multi-source data in widely varying contexts; it therefore interprets the text in combination with its context, and the pre-trained model is further trained on this data so that it understands the contextual meaning of the corpus.
Further, in step three, the vulnerability description text covers systems, threats and software.
Further, in step four, when new entities, attributes and relations are established, a prototype of the knowledge graph is designed, and entity, attribute and relation triples are determined from the organized category information, so that the vulnerability knowledge graph is constructed dynamically.
3. Advantageous effects
Compared with the prior art, the invention has the following advantages:
① In this scheme, the vulnerability knowledge graph for network security is built by combining keywords from the security dictionary with, in sequence, collecting and statistically analyzing Internet data resources, constructing the network model, processing with the network model, organizing the graph relationships and tracing through the graph. A vulnerability knowledge graph constructed this way is built and operated more efficiently, and its error rate during operation is greatly reduced;
② The content of the security dictionary comes from statistical analysis of vulnerability descriptions and the experience of security operators, so the dictionary can better assist the artificial intelligence model in extracting entities and in judging the accuracy of the extracted entities;
③ For vulnerability-related multi-source data, especially irregular text, the artificial intelligence model must extract entity information across varied contexts; by understanding the context and training the pre-trained model on this data, it better captures the contextual meaning of the corpus;
④ The vulnerability description text contains many kinds of entities, such as systems, threats and software. The design of the vulnerability knowledge graph model, that is, how the extracted content is integrated into entity, attribute and relation triples, determines the execution efficiency and tracing accuracy of downstream tasks.
Drawings
FIG. 1 is a flow chart of a method of constructing a vulnerability knowledge graph of the present invention;
FIG. 2 is a schematic illustration of a vulnerability knowledge graph model of the present invention;
FIG. 3 is a schematic diagram of the artificial intelligence entity-extraction network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by persons of ordinary skill in the art without creative effort on the basis of these embodiments fall within the protection scope of the present invention.
Examples:
Referring to FIGS. 1-3, a vulnerability knowledge graph construction method based on a security dictionary and a deep learning network comprises the following specific implementation steps:
Step one, collecting and statistically analyzing Internet data resources: carry out word-frequency statistical analysis on the collected network resources, combine the results with the experience of network security operators, and select words to construct a security dictionary;
Step two, constructing the network model: based on the Internet data resources from step one, select a BERT model pre-trained on a large-scale training set as the entry point of the overall network, and output classification results through a bidirectional long short-term memory (BiLSTM) network and a conditional random field (CRF);
Step three, processing with the network model: pass the multi-source vulnerability text data through the security dictionary and the artificial intelligence inference model separately, make entity decisions according to a weight ratio, and perform disambiguation and alignment on the selected entities;
Step four, organizing the vulnerability knowledge graph relationships: with the entities determined, populate the vulnerability knowledge graph model according to category and establish new entities, attributes and relations;
Step five, tracing through the vulnerability knowledge graph: search the data starting from key entity nodes to obtain the related weaknesses and vulnerabilities, thereby completing the tracing.
Keywords are extracted by combining word-frequency statistics of existing vulnerability description texts with the experience of security operators, and the security dictionary is built from them. A BiLSTM entity-extraction network built on the pre-trained model is then trained on an annotated corpus of vulnerability descriptions. In an actual application scenario, the security dictionary first assists the artificial intelligence model in extracting entities from the input corpus; the entities are then disambiguated and aligned; finally, the extracted entities are assembled into a graph according to the vulnerability graph model.
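The text does not say how disambiguation and alignment are performed. The sketch below assumes the simplest form of alignment, a hand-maintained alias table (the `ALIASES` entries and both function names are hypothetical): each extracted mention is normalized and mapped to a canonical entity name, so that different surface forms collapse into one graph node. Resolving mentions that are ambiguous in themselves would additionally need the context-aware model described for step three.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical alias table mapping normalized surface forms to canonical names.
ALIASES: Dict[str, str] = {
    "ms windows": "Microsoft Windows",
    "microsoft windows": "Microsoft Windows",
    "windows": "Microsoft Windows",
    "openssl": "OpenSSL",
}


def normalize(mention: str) -> str:
    # Strip surrounding punctuation, collapse internal whitespace, lower-case.
    return re.sub(r"\s+", " ", mention.strip(" .,;:()\"'")).lower()


def align_entities(
    mentions: Iterable[str], aliases: Dict[str, str] = ALIASES
) -> Dict[str, List[str]]:
    """Group extracted mentions under a canonical entity name."""
    aligned: Dict[str, List[str]] = {}
    for mention in mentions:
        # Unknown mentions keep their own (trimmed) spelling as the canonical name.
        canonical = aliases.get(normalize(mention), mention.strip())
        aligned.setdefault(canonical, []).append(mention)
    return aligned


print(align_entities(["MS  Windows", "windows", "OpenSSL.", "nginx"]))
# {'Microsoft Windows': ['MS  Windows', 'windows'], 'OpenSSL': ['OpenSSL.'], 'nginx': ['nginx']}
```

Keeping the original mentions under each canonical name preserves provenance, which is useful when an alias later turns out to be wrong.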
Referring to FIG. 1, in step one, the content of the security dictionary includes statistical analysis of vulnerability descriptions and the experience of security operators.
The data-analysis results and operator experience captured in the security dictionary allow entity extraction to be carried out more effectively and the accuracy of the extracted entities to be judged.
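Step one is described only at the level of word-frequency statistics plus operator judgment. A minimal sketch, assuming English descriptions, unigram counts, a hypothetical stop-word list, and operator judgment expressed as two hypothetical inputs (`operator_terms` to add, `exclude` to drop), might look like this:

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Hypothetical stop-word list; the source does not say how generic words are filtered.
STOPWORDS = {"a", "an", "the", "in", "of", "to", "and", "allows", "via"}


def tokenize(text: str) -> List[str]:
    # Lower-case and split on non-word characters (suits the English example text).
    return re.findall(r"\w+", text.lower())


def build_security_dictionary(
    descriptions: Iterable[str],
    operator_terms: Iterable[str],
    exclude: Iterable[str] = (),
    top_k: int = 50,
    min_count: int = 2,
) -> Set[str]:
    # 1. Word-frequency statistics over the vulnerability descriptions.
    counts = Counter(
        tok for text in descriptions for tok in tokenize(text) if tok not in STOPWORDS
    )
    frequent = {term for term, n in counts.most_common(top_k) if n >= min_count}
    # 2. Operator experience: drop terms judged too generic, add curated phrases.
    frequent -= {t.lower() for t in exclude}
    return frequent | {t.lower() for t in operator_terms}


descriptions = [
    "Buffer overflow in OpenSSL allows remote attackers to execute code.",
    "SQL injection in the login form allows remote attackers to read data.",
    "Buffer overflow in the kernel driver allows local privilege escalation.",
]
print(sorted(build_security_dictionary(
    descriptions, ["SQL injection", "Privilege escalation"], exclude=["remote", "attackers"]
)))
```

Because only unigrams are counted, a phrase such as "buffer overflow" enters as two separate words; counting n-grams as well, or having operators add the phrase, keeps it whole. Chinese text would first need word segmentation, since `\w+` treats a run of CJK characters as a single token.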
Referring to FIG. 1, in step one, when the security dictionary is constructed, common keywords in the security field are identified in it and used as auxiliary tools for constructing the graph.
Once the security dictionary is constructed, its keywords serve as search nodes, enabling precise search, improving search efficiency and accuracy, and thereby improving the quality of the vulnerability knowledge graph.
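The text says the dictionary keywords act as search nodes but does not name a matching algorithm. Forward maximum matching, a common dictionary-lookup technique swapped in here as an assumption, scans the token sequence and, at each position, takes the longest dictionary entry that starts there (the function name and `max_len` limit are hypothetical):

```python
from typing import List, Sequence, Set, Tuple


def dictionary_match(
    tokens: Sequence[str], dictionary: Set[str], max_len: int = 3
) -> List[Tuple[int, int, str]]:
    """Forward maximum matching over a token sequence.

    Returns (start, end, phrase) for each dictionary hit, with end exclusive.
    Multi-word entries are stored space-joined, e.g. "sql injection".
    """
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, shrinking down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                hits.append((i, i + n, phrase))
                i += n
                break
        else:
            i += 1  # no entry starts here; move on one token
    return hits


tokens = "a sql injection in the login form".split()
print(dictionary_match(tokens, {"sql", "sql injection", "login"}))
# [(1, 3, 'sql injection'), (5, 6, 'login')]
```

Preferring the longest entry is what stops the shorter keyword "sql" from splitting the phrase "sql injection". For Chinese text the same loop is usually run over characters rather than words.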
Referring to FIGS. 1 and 3, in step two, when the network entry point is selected, the selected overall network consists of BERT, a bidirectional long short-term memory network and a conditional random field.
The vulnerability description text is fed into the BERT model token by token to generate word vectors; the word vectors pass through the BiLSTM network, which produces per-category probabilities; finally, the CRF determines the category of each token.
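In a BERT-BiLSTM-CRF tagger the CRF layer chooses the tag sequence jointly rather than token by token, which is what lets it reject invalid sequences such as an `I-` tag directly after `O`. The sketch below shows only that decoding step (the Viterbi algorithm) in plain Python; the emission scores stand in for BiLSTM outputs, the transition scores for learned CRF parameters, and the tag names and numbers are made up for illustration.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]:   score of tag j at token t (BiLSTM output in the described network)
    transitions[i][j]: score of tag i being followed by tag j (learned CRF parameters)
    Start and end transition scores are omitted to keep the sketch short.
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best score of any path ending in each tag so far
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for a path that moves into tag j at this token.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


# Illustrative numbers: token-by-token argmax would give O, I-SOFTWARE, O,
# which is invalid because an I- tag may not follow O.
TAGS = ["O", "B-SOFTWARE", "I-SOFTWARE"]
TRANSITIONS = [
    [0.0, 0.0, -100.0],  # from O: O -> I-SOFTWARE is effectively forbidden
    [0.0, 0.0, 0.0],     # from B-SOFTWARE
    [0.0, 0.0, 0.0],     # from I-SOFTWARE
]
EMISSIONS = [
    [2.0, 1.0, 0.0],
    [0.0, 0.0, 1.5],
    [2.0, 0.0, 0.0],
]
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))  # ['B-SOFTWARE', 'I-SOFTWARE', 'O']
```

In a trained model the transition matrix is learned rather than set by hand; the large negative entry here only mimics what training typically learns for forbidden transitions.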
Referring to FIG. 1, in step two, when the classification results are output, the categories are stored according to the classification results, and a hierarchical block diagram of the classification index directory is established.
The output classification results are reorganized into a hierarchical block diagram and stored, which makes subsequent indexing of the classification results more convenient and efficient.
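The classification output is a per-token tag sequence, so before anything can be stored by category the tags have to be grouped into entity spans. A minimal sketch, assuming BIO tags with hypothetical category names and using a plain dictionary as a stand-in for the index directory described above:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def bio_to_entities(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (category, text) entity spans."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    category: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != category):
            # A new entity starts (a stray I- tag is treated leniently as B-).
            if current:
                entities.append((category, " ".join(current)))
            current, category = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the current entity
        else:
            # Tag "O": close any open entity.
            if current:
                entities.append((category, " ".join(current)))
            current, category = [], None
    if current:
        entities.append((category, " ".join(current)))
    return entities


def build_category_index(entities: Sequence[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Store each distinct entity under its category, sorted for stable lookup."""
    index = defaultdict(set)
    for category, text in entities:
        index[category].add(text)
    return {category: sorted(texts) for category, texts in index.items()}


tokens = ["Apache", "Struts", "on", "Linux", "allows", "remote", "code", "execution"]
tags = ["B-SOFTWARE", "I-SOFTWARE", "O", "B-SYSTEM", "O", "B-THREAT", "I-THREAT", "I-THREAT"]
print(build_category_index(bio_to_entities(tokens, tags)))
```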
Referring to FIG. 1, in step three, when entity decisions are made, category information is determined and an artificial intelligence classification model is constructed to extract entities.
Entity decisions are made on the basis of the entity extraction results, which keeps the decisions accurate, avoids decision bias, and ultimately ensures that the vulnerability knowledge graph is constructed accurately.
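Step three decides entities "according to a weight ratio" between the security dictionary and the inference model, without giving the formula. One plausible reading, sketched here purely as an assumption, is a linear blend of a binary dictionary hit and the model's confidence, accepted above a threshold; `weight`, `threshold` and the function name are hypothetical, not values from the source.

```python
from typing import Dict, Iterable, List, Set


def decide_entities(
    candidates: Iterable[str],
    dictionary: Set[str],
    model_scores: Dict[str, float],
    weight: float = 0.4,
    threshold: float = 0.5,
) -> List[str]:
    """Accept candidates whose weighted score reaches the threshold.

    fused = weight * dictionary_hit + (1 - weight) * model_confidence,
    where dictionary_hit is 1.0 if the lower-cased candidate is in the
    security dictionary and 0.0 otherwise. weight and threshold are assumed values.
    """
    accepted = []
    for cand in candidates:
        dict_hit = 1.0 if cand.lower() in dictionary else 0.0
        fused = weight * dict_hit + (1.0 - weight) * model_scores.get(cand, 0.0)
        if fused >= threshold:
            accepted.append(cand)
    return accepted


dictionary = {"openssl", "sql injection"}
scores = {"OpenSSL": 0.3, "SQL injection": 0.9, "login page": 0.6, "buffer": 0.9}
print(decide_entities(["OpenSSL", "SQL injection", "login page", "buffer"], dictionary, scores))
```

With these values a dictionary hit lifts a low-confidence mention such as "OpenSSL" (0.4 + 0.6 × 0.3 = 0.58) over the threshold, while a mention absent from the dictionary needs a model confidence of at least 0.5 / 0.6 ≈ 0.83 to be accepted.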
Referring to FIG. 1, in step three, entity extraction comprises the following specific steps:
Step 1: sort out all valuable entity categories involved in vulnerabilities and their descriptions;
Step 2: correct the model's misunderstandings of semantic information, improving the model's accuracy;
Step 3: use the artificial intelligence model to understand and reason over the entire input sentence, further improving entity extraction accuracy.
Referring to FIG. 1, in step three, when the multi-source vulnerability text data passes through the security dictionary and the artificial intelligence inference model separately, the artificial intelligence model faces the challenge of extracting entity information from vulnerability-related multi-source data in widely varying contexts; it therefore interprets the text in combination with its context, and the pre-trained model is further trained on this data so that it understands the contextual meaning of the corpus.
For vulnerability-related multi-source data, especially irregular text, interpreting sentence meaning in combination with context prevents the same sentence from being read ambiguously or inconsistently, improving the accuracy with which multi-source vulnerability text is understood.
Referring to FIGS. 1 and 2, in step three, the vulnerability description text covers systems, threats and software.
Systems, threats and software are all entity types found in vulnerability description text. The design of the vulnerability knowledge graph model, that is, how the extracted content is integrated into entity, attribute and relation triples, determines the execution efficiency and tracing accuracy of downstream tasks.
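To make the triple format concrete, here is a minimal sketch that turns one extracted vulnerability record into entity-relation triples and entity-attribute triples. The record fields, relation names and attribute names are hypothetical; the source does not list its entity types or attributes by name, so these are placeholders rather than the patent's schema.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical mapping from record fields to relation names.
RELATIONS: Dict[str, str] = {
    "software": "affects_software",
    "system": "affects_system",
    "threat": "leads_to_threat",
}
ATTRIBUTES = ("severity", "published")  # hypothetical attribute fields


def record_to_triples(record: dict) -> List[Triple]:
    vuln = record["id"]
    triples: List[Triple] = []
    # Entity-relation triples: vulnerability -> software / system / threat.
    for field, relation in RELATIONS.items():
        for value in record.get(field, []):
            triples.append(Triple(vuln, relation, value))
    if "weakness" in record:
        triples.append(Triple(vuln, "has_weakness", record["weakness"]))
    # Entity-attribute triples: the tail is a literal value rather than an entity.
    for attr in ATTRIBUTES:
        if attr in record:
            triples.append(Triple(vuln, attr, record[attr]))
    return triples


record = {
    "id": "VULN-0001",  # placeholder identifier
    "software": ["OpenSSL"],
    "system": ["Linux"],
    "threat": ["remote code execution"],
    "weakness": "CWE-120",  # CWE-120: classic buffer overflow
    "severity": "high",
}
for triple in record_to_triples(record):
    print(triple)
```

Keeping relation triples and attribute triples in one list is a simplification; a graph store would normally route attribute triples to node properties instead of edges.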
The knowledge graph consists of 7 entity types and 5 attributes. Vulnerability and weakness entities are the main elements of the graph, while other entities such as software and systems are the key nodes from which security operators trace sources; searching the data from these key entity nodes yields the related weaknesses and vulnerabilities and completes the tracing.
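Tracing from a key node can be read as a bounded graph search. A minimal sketch, assuming the graph is held as plain `(head, relation, tail)` triples, that relations may be followed in either direction, and that each node's type is known from extraction (the type labels, function names and `max_hops` default are assumptions):

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(triples: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency over relation triples so tracing can start from any node."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def trace(
    adj: Dict[str, Set[str]],
    start: str,
    node_types: Dict[str, str],
    wanted: Tuple[str, ...] = ("vulnerability", "weakness"),
    max_hops: int = 2,
) -> Dict[str, List[str]]:
    """Breadth-first search from a key entity node, collecting nodes of the wanted types."""
    found: Dict[str, Set[str]] = {kind: set() for kind in wanted}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        kind = node_types.get(node)
        if node != start and kind in found:
            found[kind].add(node)
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return {kind: sorted(nodes) for kind, nodes in found.items()}
```

With `max_hops=2`, starting from a software node reaches the vulnerabilities that affect it in one hop and their weaknesses in two; a larger bound would also reach vulnerabilities linked only through shared weaknesses or other software.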
Referring to FIG. 1, in step four, when new entities, attributes and relations are established, a prototype of the knowledge graph is designed, and entity, attribute and relation triples are determined from the organized category information, so that the vulnerability knowledge graph is constructed dynamically.
The dynamically constructed vulnerability knowledge graph has enough expressive power for similar types of network security research, improving the practicality of the graph as a whole.
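Dynamic construction means the graph grows as new triples arrive while still conforming to the designed prototype. A minimal in-memory sketch, assuming a hypothetical prototype schema that fixes which entity types each relation may connect (the class, schema entries and method names are illustrative; a real deployment would more likely use a graph database):

```python
from typing import Dict, Set, Tuple

# Hypothetical prototype schema: relation -> (head entity type, tail entity type).
SCHEMA: Dict[str, Tuple[str, str]] = {
    "affects_software": ("vulnerability", "software"),
    "affects_system": ("vulnerability", "system"),
    "has_weakness": ("vulnerability", "weakness"),
}


class VulnerabilityGraph:
    """In-memory graph that grows as new triples are extracted (illustrative only)."""

    def __init__(self, schema: Dict[str, Tuple[str, str]] = SCHEMA):
        self.schema = schema
        self.entities: Dict[str, str] = {}                # entity name -> entity type
        self.relations: Set[Tuple[str, str, str]] = set() # (head, relation, tail)
        self.attributes: Dict[Tuple[str, str], str] = {}  # (entity, attribute) -> value

    def add_entity(self, name: str, etype: str) -> bool:
        """Register an entity; returns True only if it was not already present."""
        if name in self.entities:
            return False
        self.entities[name] = etype
        return True

    def add_relation(self, head: str, relation: str, tail: str) -> None:
        """Add a relation triple after checking it against the prototype schema."""
        if relation not in self.schema:
            raise ValueError(f"unknown relation: {relation}")
        expected = self.schema[relation]
        actual = (self.entities.get(head), self.entities.get(tail))
        if actual != expected:
            raise ValueError(f"{relation} expects {expected}, got {actual}")
        self.relations.add((head, relation, tail))

    def set_attribute(self, entity: str, attribute: str, value: str) -> None:
        if entity not in self.entities:
            raise KeyError(entity)
        self.attributes[(entity, attribute)] = value


graph = VulnerabilityGraph()
graph.add_entity("VULN-0001", "vulnerability")
graph.add_entity("OpenSSL", "software")
graph.add_relation("VULN-0001", "affects_software", "OpenSSL")
graph.set_attribute("VULN-0001", "severity", "high")
```

Rejecting triples that do not match the schema keeps newly extracted content consistent with the prototype, at the cost of needing a schema update before a genuinely new relation type can be stored.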
The above are only preferred embodiments of the present invention, and the scope of the invention is not limited to them. Any application of the technical solution of the present invention, and any improvement to it, made by a person skilled in the art within the technical scope disclosed herein falls within the protection scope of the present invention.

Claims (10)

1. A vulnerability knowledge graph construction method based on a security dictionary and a deep learning network, characterized in that the construction method comprises the following specific implementation steps:
Step one, collecting and statistically analyzing Internet data resources: carry out word-frequency statistical analysis on the collected network resources, combine the results with the experience of network security operators, and select words to construct a security dictionary;
Step two, constructing the network model: based on the Internet data resources from step one, select a BERT model pre-trained on a large-scale training set as the entry point of the overall network, and output classification results through a bidirectional long short-term memory (BiLSTM) network and a conditional random field (CRF);
Step three, processing with the network model: pass the multi-source vulnerability text data through the security dictionary and the artificial intelligence inference model separately, make entity decisions according to a weight ratio, and perform disambiguation and alignment on the selected entities;
Step four, organizing the vulnerability knowledge graph relationships: with the entities determined, populate the vulnerability knowledge graph model according to category and establish new entities, attributes and relations;
Step five, tracing through the vulnerability knowledge graph: search the data starting from key entity nodes to obtain the related weaknesses and vulnerabilities, thereby completing the tracing.
2. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that, in step one, the content of the security dictionary includes statistical analysis of vulnerability descriptions and the experience of security operators.
3. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that, in step one, when the security dictionary is constructed, common keywords in the security field are identified in the security dictionary and used as auxiliary tools for constructing the graph.
4. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that, in step two, when the network entry point is selected, the selected overall network consists of BERT, a bidirectional long short-term memory network and a conditional random field.
5. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that, in step two, when the classification results are output, the categories are stored according to the classification results, and a hierarchical block diagram of the classification index directory is established.
6. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that, in step three, when entity decisions are made, category information is determined and an artificial intelligence classification model is constructed to extract entities.
7. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 6, characterized in that, in step three, entity extraction comprises the following specific steps:
Step 1: sort out all valuable entity categories involved in vulnerabilities and their descriptions;
Step 2: correct the model's misunderstandings of semantic information, improving the model's accuracy;
Step 3: use the artificial intelligence model to understand and reason over the entire input sentence, further improving entity extraction accuracy.
8. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that, in step three, when the multi-source vulnerability text data passes through the security dictionary and the artificial intelligence inference model separately, the artificial intelligence model, facing the challenge of extracting entity information from vulnerability-related multi-source data in varied contexts, interprets the text in combination with its context, and the pre-trained model is trained on this data to understand the contextual meaning of the corpus.
9. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that, in step three, the vulnerability description text covers systems, threats and software.
10. The vulnerability knowledge graph construction method based on a security dictionary and a deep learning network according to claim 1, characterized in that, in step four, when new entities, attributes and relations are established, a prototype of the knowledge graph is designed, and entity, attribute and relation triples are determined from the organized category information, so that the vulnerability knowledge graph is constructed dynamically.
CN202410317361.2A 2024-03-20 2024-03-20 Vulnerability knowledge graph construction method based on safety dictionary and deep learning network Pending CN117910567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410317361.2A CN117910567A (en) 2024-03-20 2024-03-20 Vulnerability knowledge graph construction method based on safety dictionary and deep learning network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410317361.2A CN117910567A (en) 2024-03-20 2024-03-20 Vulnerability knowledge graph construction method based on safety dictionary and deep learning network

Publications (1)

Publication Number Publication Date
CN117910567A (en) 2024-04-19

Family

ID=90686336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410317361.2A Pending CN117910567A (en) 2024-03-20 2024-03-20 Vulnerability knowledge graph construction method based on safety dictionary and deep learning network

Country Status (1)

Country Link
CN (1) CN117910567A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109347801A (en) * 2018-09-17 2019-02-15 武汉大学 A kind of vulnerability exploit methods of risk assessment based on multi-source word insertion and knowledge mapping
CN110941716A (en) * 2019-11-05 2020-03-31 北京航空航天大学 Automatic construction method of information security knowledge graph based on deep learning
CN113971398A (en) * 2021-10-20 2022-01-25 西安交通大学 Dictionary construction method for rapid entity identification in network security field
CN115238029A (en) * 2022-06-22 2022-10-25 国网天津市电力公司电力科学研究院 Construction method and device of power failure knowledge graph
CN115859304A (en) * 2022-12-19 2023-03-28 南京理工大学 Vulnerability discovery knowledge graph construction method fusing ATT and CK frameworks
CN116244446A (en) * 2022-12-30 2023-06-09 中国人民解放军战略支援部队信息工程大学 Social media cognitive threat detection method and system
CN117240575A (en) * 2023-10-10 2023-12-15 国网青海省电力公司电力科学研究院 Network attack data processing method, device, equipment and medium
CN117312493A (en) * 2023-09-08 2023-12-29 中国中医科学院中医药信息研究所 Multi-strategy knowledge extraction system
CN117371523A (en) * 2023-10-24 2024-01-09 重庆邮电大学 Education knowledge graph construction method and system based on man-machine hybrid enhancement



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination