CN114915446B

CN114915446B - Intelligent network security detection method integrating priori knowledge

Info

Publication number: CN114915446B
Application number: CN202210340432.1A
Authority: CN
Inventors: 沈毅 (Shen Yi); 薛鹏飞 (Xue Pengfei); 李振汉 (Li Zhenhan); 马慧敏 (Ma Huimin); 李倩玉 (Li Qianyu); 施凡 (Shi Fan)
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-04-02
Filing date: 2022-04-02
Publication date: 2023-08-29
Anticipated expiration: 2042-04-02
Also published as: CN114915446A

Abstract

The invention provides an intelligent network security detection method integrating prior knowledge, comprising the following steps: defining a characterization form for the vulnerability ontology, then extracting and storing vulnerability knowledge to form a vulnerability knowledge base; acquiring information from the environment under test; constructing a state information matrix and feeding it to both the agent and the vulnerability knowledge base; constructing an agent; determining the agent's behavior strategy based on the state information matrix acquired from the environment under test and the potential vulnerability information obtained from the vulnerability knowledge base; and executing specific behaviors according to that strategy, after which a reward module calculates reward information from the execution results and their effect on the environment and feeds it back to guide the agent in updating its strategy. The scheme automates network security detection of the network environment, improves detection efficiency, and addresses the difficulty of applying automated network security detection in complex environments.

Description

Intelligent network security detection method integrating priori knowledge

Technical Field

The invention relates to the field of cyberspace security, and in particular to an intelligent network security detection method integrating prior knowledge.

Background

Periodic security testing is an important process for assessing the resilience and compliance of assets, particularly with respect to confidentiality, availability and integrity. Network security detection, i.e., penetration testing (PT), is widely recognized as the best way to evaluate the security of digital assets by identifying and exploiting vulnerabilities. During PT, security specialists face complex environments and repetitive operations, so automating PT tasks is an efficient way to save manpower and resources. Early research focused on improving PT systems by optimizing the planning phase, which was modeled as an attack graph or decision tree problem to reflect the sequential nature of the decisions made in practice. However, because these methods are static and limited to the planning phase, most of this work addresses vulnerability assessment rather than PT itself.

In recent years, machine learning (ML) has opened up new ways to solve complex problems effectively, and in some cases it has proven able to handle difficult problems faster and more accurately than humans. There are three types of ML: supervised learning, unsupervised learning and reinforcement learning. Supervised and unsupervised learning have been used for intrusion detection, malware detection, privacy-preserving systems and the like, but developing such a security solution first requires preparing a large-scale training dataset. In a real-time, continuous setting such as network security detection, it is difficult to prepare a behavior dataset in advance, so neither supervised nor unsupervised learning is well suited to automated PT. Reinforcement learning (RL) is a type of machine learning that learns by exploring the environment and accumulating experience; an RL agent can adapt to a real-time, continuous environment without a prior dataset.

In 2013, Sarraute et al. proposed the 4AL decomposition algorithm, which divides a large network into smaller networks according to its structure and solves each one with a partially observable Markov decision process (POMDP). In 2014, Durkota et al. proposed an algorithm for computing optimal attack strategies on attack graphs with action costs and failure probabilities, converting the optimal path planning problem on the attack graph into a Markov decision process (MDP) and generating an optimal attack strategy to guide network security detection. In 2017, Shmaryahu et al. modeled PT as a partially observable contingent planning problem and designed a contingent planning tree algorithm to plan attack paths. In the same year, Alexander Preschner introduced POMDPs into industrial control systems in an attempt to verify their security automatically. In 2018, Ghanem and Chen modeled the system as a POMDP and tested it using an external POMDP solver. In 2019, Zhou et al. described PT as an MDP and proposed an attack planning algorithm based on network information gain (NIG-AP), which uses network information as the reward to guide the agent toward the best response action and to discover hidden attack paths from an intruder's perspective. In 2020, Hu et al. built an automated network security detection framework based on deep reinforcement learning that automatically finds the best attack path for a given topology. In 2021, Zennaro et al. formalized simple capture-the-flag (CTF) challenges as network security detection problems and solved them with model-free reinforcement learning.

Research based on POMDPs supports the hypothesis that reinforcement learning can improve the accuracy and reliability of network security detection. However, because network security detection environments contain many hosts with complex configurations, solving the POMDP exactly is very difficult. MDP-based reinforcement learning can in principle learn without a model, but in practice it may still need to rely on some form of prior knowledge to solve the problem.

Disclosure of Invention

To solve these technical problems, the invention provides an intelligent network security detection method integrating prior knowledge, which addresses the shortcomings of prior-art automated network security detection methods: low efficiency, limited practicality, and difficulty of application to real large-scale network scenarios.

According to a first aspect of the present invention, there is provided an intelligent network security detection method integrating prior knowledge, the method comprising the following steps:

step S1: defining a characterization form for the vulnerability ontology, in which vulnerabilities are characterized by concepts, attributes and relations; extracting vulnerability knowledge from the acquired knowledge source according to that characterization form; and storing the extracted vulnerability knowledge to form a vulnerability knowledge base;

step S2: constructing an environment information acquisition module that collects the following basic information from the environment under test: live host IP addresses, operating systems, open ports and service information; and storing the collected basic information classified and numbered by category;

step S3: judging whether a preset target has been reached; if so, ending the method; if not, proceeding to step S4; the preset target is to accomplish network security detection of a specific target;

step S4: acquiring basic information of the environment under test through the environment information acquisition module; based on the numbered information, obtaining the network topology, host privileges and host configuration information of the environment under test and constructing a state information matrix; and feeding the state information matrix to both the agent and the vulnerability knowledge base;

step S5: constructing an agent, and determining the agent's behavior strategy based on the environment state information matrix and the potential vulnerability information obtained from the vulnerability knowledge base;

step S6: executing specific behaviors according to the agent's behavior strategy so that they act on the environment under test; a reward module calculates reward information from the execution results of the behaviors and their effect on the environment and feeds it back to the agent to guide it in updating its strategy; then returning to step S3.
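The S3 to S6 loop can be sketched as follows. This is a minimal sketch under stated assumptions: the `env`, `kb` and `agent` objects and their methods (`target_reached`, `observe`, `match`, `select`, `execute`, `update`) are hypothetical interfaces, the reward values of +1 and -1 are illustrative, and the toy classes exist only to make the loop runnable.

```python
def run_detection(env, kb, agent, max_steps=100):
    """Steps S3-S6 as a loop; returns the number of actions taken."""
    for step in range(max_steps):
        if env.target_reached():                  # S3: preset target reached?
            return step
        state = env.observe()                     # S4: state information matrix
        candidates = kb.match(state)              # S4/S5: potential vulnerabilities
        action = agent.select(state, candidates)  # S5: behavior strategy
        success = env.execute(action)             # S6: act on the environment
        reward = 1.0 if success else -1.0         # S6: reward module (assumed values)
        agent.update(state, action, reward, env.observe())  # S6: strategy update
    return max_steps


# Hypothetical toy stand-ins so the loop can run end to end.
class ToyEnv:
    def __init__(self, needed=3):
        self.needed, self.successes = needed, 0

    def target_reached(self):
        return self.successes >= self.needed

    def observe(self):
        return (self.successes,)

    def execute(self, action):
        ok = action == "exploit"
        self.successes += int(ok)
        return ok


class ToyKB:
    def match(self, state):
        return ["exploit", "connect"]


class ToyAgent:
    def __init__(self):
        self.rewards = []

    def select(self, state, candidates):
        return candidates[0]

    def update(self, state, action, reward, next_state):
        self.rewards.append(reward)
```

With `agent = ToyAgent()`, `run_detection(ToyEnv(3), ToyKB(), agent)` returns 3 after three successful exploit actions, and `agent.rewards` holds three rewards of 1.0.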

Further, the vulnerability ontology is characterized by concepts, attributes and relations. A vulnerability is a defect in the concrete implementation of hardware, software or protocols, or in a system security policy; its attributes describe the potential conditions under which the vulnerability exists; and its relations describe the interactions among vulnerabilities. The attributes include how the vulnerability is exploited, the effects and impact it produces, whether an exploit for it exists, whether a service containing the vulnerability exists, and the operating system corresponding to that service; the relations include intersection relations, inheritance relations and attribute relations.

Further, in step S3:

the preset target is to accomplish network security detection of a specific target, including detection of a specific host in the network environment and/or detection of a single host starting from a given initial host.

Further, the state information matrix of the environment under test is defined as follows:

where h_i h_j denotes the connection between the i-th and j-th hosts under test (0: not connected; 1: connected); h_i h_i denotes the privilege level obtained on host i; because different network security detection environments contain different numbers of network nodes, the number of nodes is set to a fixed value; p_k(h_i) indicates whether host i contains the attribute numbered k; and privilege(h_i) denotes the agent's privilege on host i.

According to a second aspect of the present invention, there is provided an intelligent network security detection apparatus integrating prior knowledge, the apparatus comprising:

a vulnerability knowledge base construction module, configured to define a characterization form for the vulnerability ontology, in which vulnerabilities are characterized by concepts, attributes and relations; to extract vulnerability knowledge from the acquired knowledge source according to that characterization form; and to store the extracted vulnerability knowledge to form a vulnerability knowledge base;

a detection module, configured to construct an environment information acquisition module that collects the following basic information from the environment under test: live host IP addresses, operating systems, open ports and service information, and stores the collected basic information classified and numbered by category;

a judging module, configured to judge whether a preset target has been reached, the preset target being network security detection of a specific target;

a state information matrix construction module, configured to acquire basic information of the environment under test through the environment information acquisition module; to obtain, based on the numbered information, the network topology, host privileges and host configuration information of the environment under test and construct a state information matrix; and to feed the state information matrix to both the agent and the vulnerability knowledge base;

a behavior determination module, configured to construct an agent and to determine the agent's behavior strategy based on the environment state information matrix and the potential vulnerability information obtained from the vulnerability knowledge base;

and an updating module, configured to execute specific behaviors according to the agent's behavior strategy so that they act on the environment under test, with a reward module calculating reward information from the execution results of the behaviors and their effect on the environment and feeding it back to the agent to guide it in updating its strategy; and then to trigger the judging module.

According to a third aspect of the present invention, there is provided an intelligent network security detection system integrating prior knowledge, comprising:

a processor for executing a plurality of instructions;

a memory for storing a plurality of instructions;

wherein the plurality of instructions are stored by the memory and loaded and executed by the processor to perform the method described above.

According to a fourth aspect of the present invention, there is provided a computer-readable storage medium storing a plurality of instructions, the instructions being loaded and executed by a processor to perform the method described above.

The scheme aims at automated, intelligent network security detection: it integrates prior knowledge and provides a network security detection method based on a knowledge graph and reinforcement learning, combining these two artificial intelligence techniques into an automated model. It addresses the difficulty of automating network security detection when network environments are complex and vulnerabilities are numerous and diverse, and achieves the following effects: (1) the vulnerability knowledge base built on a knowledge graph makes it easy to manage and query potential vulnerability information for the current host and network, and helps the agent identify the effective behaviors available to it; (2) the agent's behavior selection changes from randomly exploring available behaviors to selecting the optimal behavior among them, which greatly improves its learning efficiency; (3) automated, intelligent network security detection of the network environment becomes possible.

The foregoing is only an overview of the technical solution of the present invention. To make the technical means of the invention clearer so that it can be implemented according to the description, the invention is described in detail below with reference to preferred embodiments and the accompanying drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention, illustrate the invention and together with the description serve to explain the invention. In the drawings:

FIG. 1 is a flow chart of an intelligent network security detection method incorporating a priori knowledge in accordance with an embodiment of the present invention;

FIG. 2 is a schematic diagram of an intelligent network security detection model with integrated prior knowledge according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a vulnerability knowledge base construction mode according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a vulnerability ontology according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an agent neural network according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an intelligent network security detection device with integrated a priori knowledge according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to specific embodiments of the present invention and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

First, the intelligent network security detection method integrating prior knowledge according to an embodiment of the present invention is described with reference to FIGS. 1-2. As shown in FIGS. 1-2, the method comprises the following steps:

Network security detection is a sequential decision problem: at each step, the detection behavior to adopt is judged from the current state. It can therefore be modeled as a Markov decision process and automated using reinforcement learning techniques. The agent obtains its state from the environment, queries the prior knowledge base (which serves as the agent's knowledge library) for feasible actions, and selects the final action from those feasible actions according to its learned experience.

The method builds a knowledge graph to store vulnerability-related information and matches possible vulnerabilities against the current state information. It implements reinforcement-learning-based automated network security detection: the state information the agent acquires from the environment is first matched against known vulnerabilities, and the next action is then selected from the matched actions.

In step S1:

Vulnerability information is massive, scattered and fragmented, and collecting it is a prerequisite for building the vulnerability knowledge base. It is collected mainly by querying and retrieving it from the Internet. Four vulnerability-related standards and specifications are currently widely recognized: CVE (Common Vulnerabilities and Exposures), CPE (Common Platform Enumeration), CVSS (Common Vulnerability Scoring System) and CNNVD (China National Vulnerability Database of Information Security). CVE is currently the most trusted international system for disclosing and publishing security vulnerabilities; CPE is a standardized method for describing and identifying the applications, operating systems and hardware devices present among an enterprise's computing assets; CVSS is an open industry standard for assessing vulnerability severity that helps determine the urgency and priority of a response; and CNNVD is China's authoritative national vulnerability database. Vulnerability information is collected from CVE, CPE, CVSS and CNNVD and integrated, and the result serves as the knowledge source.
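The integration step can be illustrated with a small merge routine. This is a sketch under assumptions: every record is assumed to be linkable to a CVE ID through a hypothetical `cve_id` field, earlier sources are assumed to win on conflicts, and the records below are placeholders rather than real vulnerability data.

```python
def merge_sources(*sources):
    """Merge records from several sources into one record per CVE ID.

    Each source is a list of dicts sharing a hypothetical "cve_id" key.
    Earlier sources take precedence; later sources only fill in fields
    that are still missing or None.
    """
    merged = {}
    for source in sources:
        for record in source:
            entry = merged.setdefault(record["cve_id"], {})
            for name, value in record.items():
                if value is not None and entry.get(name) is None:
                    entry[name] = value
    return merged


# Illustrative placeholder records, not real vulnerability data.
cve_feed = [{"cve_id": "CVE-EXAMPLE-1", "description": "path traversal in a web server"}]
cvss_feed = [{"cve_id": "CVE-EXAMPLE-1", "cvss": 9.0, "description": None}]
cnnvd_feed = [{"cve_id": "CVE-EXAMPLE-1", "cnnvd_id": "CNNVD-EXAMPLE-1", "cvss": 5.0}]
kb_records = merge_sources(cve_feed, cvss_feed, cnnvd_feed)
```

Here the merged record keeps the description from the first feed, the score 9.0 from the second (the later 5.0 is ignored), and the CNNVD identifier from the third. A real pipeline would need an explicit conflict-resolution policy rather than "first source wins".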

A characterization form for the vulnerability ontology is defined based on a knowledge graph. The ontology characterizes vulnerabilities by concepts, attributes and relations: a vulnerability is a defect in the concrete implementation of hardware, software or protocols, or in a system security policy; its attributes describe the potential conditions under which it exists; and its relations describe the interactions among vulnerabilities. The attributes include how the vulnerability is exploited, the effects and impact it produces, whether an exploit for it exists, whether a service containing it exists, and the operating system corresponding to that service; the relations include intersection relations, inheritance relations and attribute relations.

In this embodiment, the defined vulnerability ontology is shown in FIG. 4. Vulnerability information must first be characterized sensibly, so a vulnerability ontology is built from expert experience to capture the associations among common operating systems, common services and common vulnerabilities, and thus to better describe how vulnerability-related knowledge is connected. In the ontology structure shown in FIG. 4, each number is an internal number consistent with the numbering used in the environment state information, so the agent can easily match vulnerabilities against its input state information.

Vulnerability knowledge is extracted from the acquired knowledge source according to the characterization form of the vulnerability ontology. In this embodiment, because the knowledge source draws on different specifications and standards and may contain duplicate or differently structured information, named entity recognition is used to refine and clean it, after which relation extraction is performed, completing the knowledge extraction of vulnerability information.

The extracted vulnerability knowledge is then stored to form the vulnerability knowledge base: vulnerability knowledge whose importance exceeds a preset threshold and that relates to the states of the network and hosts under test is selected and stored in the Neo4j graph database.

When a security expert performs network security detection, the expert judges which vulnerabilities may exist in the current environment from the scanned network and host state information and exploits them to carry out the detection. These judgments rest on the expert's accumulated knowledge, so the lack of such expert experience is one of the major challenges facing automated network security detection. The invention therefore builds a prior knowledge base that plays the role of expert experience: it collects vulnerability information, including vulnerability IDs, severity levels, sources and the functions that exploiting each vulnerability can achieve; extracts useful information with named entity recognition; normalizes the extracted information; and builds a knowledge reasoning model to manage the vulnerability information.
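The ontology of FIG. 4 can be approximated in code as concepts with attributes plus typed relations. This is a minimal sketch, not the structure in the figure: the class and field names are hypothetical, chosen to mirror the attributes and the three relation kinds listed above, and the internal numbers are assumed to match the numbering in the state matrix.

```python
from dataclasses import dataclass, field

RELATION_KINDS = {"intersection", "inheritance", "attribute"}


@dataclass
class Vulnerability:
    number: int          # internal number, shared with the state matrix
    exploit_method: str  # how the vulnerability is exploited
    effect: str          # effect and impact of exploitation
    has_exploit: bool    # whether an exploit exists
    service: int         # number of the service containing the vulnerability
    os: int              # number of the OS that runs that service


@dataclass
class Relation:
    head: int
    kind: str
    tail: int


@dataclass
class VulnerabilityOntology:
    vulns: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add(self, vuln):
        self.vulns[vuln.number] = vuln

    def relate(self, head, kind, tail):
        if kind not in RELATION_KINDS:
            raise ValueError(f"unknown relation kind: {kind}")
        self.relations.append(Relation(head, kind, tail))

    def related(self, number, kind):
        return [r.tail for r in self.relations if r.head == number and r.kind == kind]

    def for_service(self, service, os):
        """Vulnerabilities whose service and OS attributes match the given numbers."""
        return [v.number for v in self.vulns.values() if v.service == service and v.os == os]
```

Because the numbers are shared with the state matrix, a lookup such as `for_service(20, 3)` can be driven directly by the service and OS numbers observed on a host.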
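A full named entity recognition model is beyond a short example, so the sketch below swaps in a simple rule-based extractor for one narrow case: turning CPE 2.3 formatted strings attached to a vulnerability into (vulnerability, AFFECTS, product) triples. The `AFFECTS` label and the placeholder CVE ID are hypothetical, and the parser ignores backslash-escaped colons that the CPE specification allows.

```python
CPE23_FIELDS = ["part", "vendor", "product", "version", "update", "edition",
                "language", "sw_edition", "target_sw", "target_hw", "other"]


def parse_cpe23(cpe):
    """Split a CPE 2.3 formatted string into named components.

    Simplification: escaped colons ("\\:") inside a component are not handled.
    """
    parts = cpe.split(":")
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe}")
    return dict(zip(CPE23_FIELDS, parts[2:]))


def extract_triples(vuln_id, cpes):
    """Rule-based stand-in for relation extraction: one AFFECTS triple per CPE."""
    triples = []
    for cpe in cpes:
        c = parse_cpe23(cpe)
        target = f"{c['vendor']}:{c['product']}:{c['version']}"
        triples.append((vuln_id, "AFFECTS", target))
    return triples
```

For example, `extract_triples("CVE-EXAMPLE-1", ["cpe:2.3:a:apache:http_server:2.4.49:*:*:*:*:*:*:*"])` yields a single triple linking the placeholder ID to `apache:http_server:2.4.49`.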
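The threshold filter and the Neo4j write can be sketched without a live database by generating parameterized Cypher statements. Assumptions: "importance" is taken to be a CVSS-style score, the threshold of 7.0 is illustrative, and the node labels and property names are hypothetical. With the official Neo4j Python driver, each pair could be executed via `session.run(query, params)`.

```python
# Parameterized Cypher; labels and property names are hypothetical.
MERGE_QUERY = (
    "MERGE (v:Vulnerability {id: $cve_id}) "
    "SET v.score = $score "
    "MERGE (s:Service {id: $service}) "
    "MERGE (v)-[:AFFECTS]->(s)"
)


def build_writes(records, threshold=7.0):
    """Keep records whose importance exceeds the threshold and return
    (query, parameters) pairs ready to send to the graph database."""
    writes = []
    for r in records:
        if r["score"] > threshold:
            params = {"cve_id": r["cve_id"], "score": r["score"], "service": r["service"]}
            writes.append((MERGE_QUERY, params))
    return writes
```

Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating vulnerability or service nodes.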

In step S3:

the preset target is to accomplish network security detection of a specific target in the network under test, including detection of a specific host in the network environment and/or detection of a single host starting from a given host in the network under test.

In step S4:

basic information of the environment under test is acquired through the environment information acquisition module and classified and numbered according to its content; an environment state matrix is then constructed from the numbered environment information as the agent's input. This matrix contains the network topology, host configurations and other information the agent has explored so far. Based on the information an expert scans from the environment during actual network security detection, the state information matrix of the environment under test for the automated network security detection model is defined as follows:

where h_i h_j denotes the connection between the i-th and j-th hosts under test (0: not connected; 1: connected); h_i h_i denotes the privilege level obtained on host i under test; because different network security detection environments contain different numbers of network nodes, the number of nodes is set to a fixed value; p_k(h_i) indicates whether host i under test contains the attribute numbered k; and privilege(h_i) denotes the agent's privilege on host i under test.

The state information matrix serves as input to both the agent and the vulnerability knowledge base: as the agent's input, it provides scenario information; as the knowledge base's input, it is used to predict potential vulnerabilities in the environment under test, giving the agent a more accurate behavior space.
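Because the matrix itself is not reproduced here, the layout below is an assumption consistent with the definitions above and the 100×200 input size given for FIG. 5: rows are hosts, the first 100 columns hold connectivity h_i h_j with the privilege level on the diagonal, and the next 100 columns hold the attribute flags p_k(h_i). Treating connectivity as symmetric is also an assumption.

```python
def build_state_matrix(links, privileges, attributes, max_nodes=100, n_attrs=100):
    """Build a max_nodes x (max_nodes + n_attrs) state matrix (assumed layout).

    links:      iterable of (i, j) host pairs that can communicate -> h_i h_j = 1
    privileges: {i: level} privilege level obtained on host i      -> h_i h_i
    attributes: {i: [k, ...]} attribute numbers present on host i  -> p_k(h_i) = 1
    """
    state = [[0] * (max_nodes + n_attrs) for _ in range(max_nodes)]
    for i, j in links:
        state[i][j] = state[j][i] = 1  # connectivity treated as symmetric (assumption)
    for i, level in privileges.items():
        state[i][i] = level            # diagonal holds the privilege level
    for i, attrs in attributes.items():
        for k in attrs:
            state[i][max_nodes + k] = 1
    return state
```

Unused rows (when the network has fewer than 100 hosts) simply stay zero, which is one way to keep the input size fixed as the text requires.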
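Predicting potential vulnerabilities from the state matrix can be illustrated with a simple rule: a vulnerability is a candidate on a host if every attribute it requires appears in that host's attribute columns. This is a hedged stand-in for the knowledge reasoning model, which the text does not specify; the encoding of the knowledge base as "vulnerability number to required attribute numbers" is hypothetical.

```python
def potential_vulns(state, kb, max_nodes=100):
    """Return (host, vulnerability) pairs whose required attributes all appear
    in that host's attribute columns of the state matrix.

    kb maps a vulnerability number to the set of attribute numbers it needs
    (hypothetical encoding of the knowledge base's service/OS attributes).
    """
    found = []
    for host, row in enumerate(state):
        present = {k for k, v in enumerate(row[max_nodes:]) if v}
        for vuln, required in kb.items():
            if required and required <= present:
                found.append((host, vuln))
    return found
```

With a two-host toy matrix and `max_nodes=2`, a host whose attribute columns contain {0, 2} matches a vulnerability requiring {0, 2}, but not one requiring {0, 1}.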

In step S5:

the agent determines a behavior based on the output of the vulnerability prior knowledge base. A behavior is the agent's output and represents its decision for the current environment. The agent's outputs comprise connection behaviors between hosts in the network under test and vulnerability exploitation behaviors; a connection behavior is the agent moving laterally between hosts. For exploitation, possible vulnerabilities in the environment under test are identified by analyzing the input state and matching it against the vulnerability information in the knowledge base; the exploits for these possible vulnerabilities are combined with the inter-host connection behaviors found in the environment to form a behavior library from which the agent can choose. To keep the model stable during learning, the size of the behavior library is fixed, and the agent selects and executes actions from this given library, unlike earlier approaches in which the agent freely explores the entire action space.

In a reinforcement-learning-based automated network security detection model, the agent receives state information obtained from the environment as input and outputs the probability of taking each action. Because network security detection environments are complex, the state space grows exponentially with network scale and host configuration, so traditional tabular methods such as Q-learning are unsuitable for automated network security detection. The invention therefore introduces deep reinforcement learning, which uses a neural network to approximate the Q function and thereby handles the oversized state space. However, once prior knowledge is introduced, the actions available to the agent differ from state to state, so the number of valid output nodes varies. The reinforcement learning algorithm must therefore be redesigned for this situation to achieve intelligent decision-making.

The neural network used to update the agent's Q values is shown in FIG. 5. It contains three convolutional layers, the third of which is connected to a fully connected layer; no pooling layer is used, because most values in the state matrix are 0 or 1 and the matrix is sparse. The environment state matrix acquired from the environment under test is the input to the first convolutional layer, and the fully connected layer outputs the features of the environment state matrix. In this embodiment, the maximum number of network nodes is set to 100, and the host configuration dimension (covering services, ports and similar information) is also set to 100, so the input matrix is 100×200. Besides the vulnerability exploitation behaviors, a connection behavior is added, representing a move from the current host to another host, so the agent's output action space is set to 11. Other hyperparameters, such as the number of convolutional layers, the number of kernels per layer and the kernel size of each layer, must be determined experimentally, and the output-layer activation function and loss function are chosen according to the task requirements.
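A fixed-size behavior library can be sketched as a list of slots plus a validity mask. The slot layout is an assumption: the text fixes the output size at 11 and adds one connection behavior, so here slots 0 to 9 hold matched exploitation behaviors (padded when fewer are matched) and the last slot holds the connection behavior. Dropping surplus matches is also an assumption.

```python
def build_action_library(exploits, reachable, size=11):
    """Fixed-size behavior library plus a validity mask (slot layout assumed).

    exploits:  list of (host, vulnerability) pairs matched from the knowledge base
    reachable: hosts the agent could move to laterally
    Slots 0..size-2 hold exploitation behaviors (padded with None);
    the last slot holds the single connection behavior.
    """
    slots = size - 1
    chosen = list(exploits)[:slots]  # surplus matches are dropped (assumption)
    padding = slots - len(chosen)
    actions = [("exploit", h, v) for h, v in chosen] + [None] * padding
    mask = [True] * len(chosen) + [False] * padding
    actions.append(("connect", tuple(reachable)))
    mask.append(bool(reachable))
    return actions, mask
```

The mask lets the network keep a constant number of output nodes while only the slots backed by real behaviors can be chosen.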
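The text says the algorithm must be redesigned for state-dependent action sets but does not say how. One common technique, swapped in here as an assumption, is action masking: keep a fixed number of Q-value outputs and restrict epsilon-greedy selection to the valid slots.

```python
import random


def select_action(q_values, mask, epsilon=0.1, rng=random):
    """Epsilon-greedy selection restricted to valid actions (action masking).

    q_values: one Q-value per output slot
    mask:     True where the slot holds a behavior valid in the current state
    rng:      any object with random() and choice(), e.g. the random module
    """
    valid = [i for i, ok in enumerate(mask) if ok]
    if not valid:
        raise ValueError("no valid action in the current state")
    if rng.random() < epsilon:
        return rng.choice(valid)                      # explore among valid slots only
    return max(valid, key=lambda i: q_values[i])      # exploit the best valid slot
```

With `q_values=[5.0, 1.0, 9.0]` and `mask=[True, True, False]`, greedy selection returns slot 0: the highest raw Q-value belongs to an invalid slot and is ignored.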
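Since the convolution hyperparameters are left to experiment, it helps to compute how many features reach the fully connected layer for a given choice. The layer settings below (channels, kernel size, stride, no padding) are hypothetical; only the 100×200 input and the 11 outputs come from the text.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(h=100, w=200, layers=((16, 4, 2), (32, 4, 2), (32, 3, 1))):
    """Channels x height x width after three unpadded conv layers.

    Each layer is (out_channels, kernel, stride); the defaults are hypothetical.
    No pooling layers, matching the text.
    """
    channels = 1
    for out_channels, kernel, stride in layers:
        h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
        channels = out_channels
    return channels * h * w


N_ACTIONS = 11                                  # output action space from the text
fc_params = flattened_features() * N_ACTIONS + N_ACTIONS  # weights + biases
```

With these settings the spatial size shrinks from 100×200 to 49×99, 23×48 and finally 21×46, giving 32 × 21 × 46 = 30,912 inputs to the fully connected layer. Without any striding the same three layers would feed over a million features, which is one reason to tune these values experimentally.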

In step S6:

rewards are the feedback on the agent's behaviors. They are critical to reinforcement learning: they determine the agent's learning direction and convergence speed and affect the correctness and effectiveness of its decisions. Rewards are divided into positive and negative feedback: the agent obtains a positive reward when it successfully completes a behavior and a negative reward when a behavior fails. On receiving a reward, the agent adjusts its neural network parameters accordingly, updating its strategy so that it makes more accurate predictions. These steps are repeated until the target of step S3 is met.
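The reward rule and the value the network is trained toward can be sketched as follows. The reward magnitudes are illustrative, and the one-step Q-learning target y = r + γ·max Q(s', a') over valid next actions is the standard deep Q-learning choice, assumed here because the text does not name its update rule.

```python
def compute_reward(success, r_success=1.0, r_fail=-1.0):
    """Positive reward for a successful behavior, negative for a failed one
    (magnitudes are illustrative)."""
    return r_success if success else r_fail


def td_target(reward, next_q, next_mask, gamma=0.9, done=False):
    """One-step Q-learning target over the valid next actions only."""
    if done:
        return reward
    valid = [q for q, ok in zip(next_q, next_mask) if ok]
    return reward + gamma * max(valid) if valid else reward
```

For example, a successful behavior (reward 1.0) followed by a state whose only valid action has Q-value 2.0 gives a target of 1.0 + 0.5 × 2.0 = 2.0 with γ = 0.5. The network parameters would then be adjusted to reduce the gap between Q(s, a) and this target.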

FIG. 6 is a schematic structural diagram of an intelligent network security detection device integrating prior knowledge according to an embodiment of the present invention. As shown in FIG. 6, the device includes:

An embodiment of the invention further provides an intelligent network security detection system integrating prior knowledge, comprising:

a processor for executing a plurality of instructions;

a memory for storing a plurality of instructions;

An embodiment of the invention further provides a computer-readable storage medium storing a plurality of instructions, the instructions being loaded and executed by a processor to perform the method described above.

It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.

In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the elements is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.

An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a physical server, a network cloud server, or the like, on which a Windows or Windows Server operating system is installed) to execute some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The above describes only preferred embodiments of the present invention and is not intended to limit the present invention in any way. Any simple modification, equivalent variation, or alteration made to the above embodiments in accordance with the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims

1. An intelligent network security detection method integrating prior knowledge, characterized by comprising the following steps:

2. The method of claim 1, wherein the vulnerability ontology is characterized in terms of the concepts, attributes, and relationships of vulnerabilities; a vulnerability refers to a defect in the specific implementation of hardware, software, or protocols, or in a system security policy; the attributes of a vulnerability refer to the potential conditions for its existence; and the relationships refer to the interaction relationships among vulnerabilities; wherein the attributes comprise the exploitation mode of the vulnerability, the effects and impacts produced by the vulnerability, whether an exploit for the vulnerability exists, whether a service containing the vulnerability exists, and the operating system corresponding to the service containing the vulnerability; and the relationships comprise intersection relationships, inheritance relationships, and attribute relationships.

3. The method according to claim 2, wherein step S2 comprises:

4. The method according to claim 3, wherein the state information matrix of the environment to be tested is defined as follows:

5. An intelligent network security detection device incorporating a priori knowledge, the device comprising:

6. An intelligent network security detection system incorporating a priori knowledge, comprising:

a processor for executing a plurality of instructions;

a memory for storing a plurality of instructions;

wherein the plurality of instructions are stored by the memory and are loaded and executed by the processor to perform the method of any one of claims 1 to 4.

7. A computer-readable storage medium having stored therein a plurality of instructions, the plurality of instructions being loaded and executed by a processor to perform the method of any one of claims 1 to 4.
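The vulnerability ontology of claim 2 (vulnerability attributes plus typed relationships) maps naturally onto a small record type and a relation list. The sketch below is illustrative only: the class names, field names, placeholder IDs `VULN-A`/`VULN-B`, and the rule that `candidates` returns only vulnerabilities with an available exploit are assumptions, not the patent's schema. It also shows one hypothetical way the knowledge base could supply potential vulnerability information for a service and operating system observed in the environment to be tested.

```python
from dataclasses import dataclass, field
from enum import Enum


class RelationType(Enum):
    """The three relationship kinds listed in claim 2."""
    INTERSECTION = "intersection"
    INHERITANCE = "inheritance"
    ATTRIBUTE = "attribute"


@dataclass
class Vulnerability:
    """One vulnerability with the claim-2 attributes (field names are hypothetical)."""
    vuln_id: str
    exploitation_mode: str    # how the vulnerability is exploited
    effect: str               # effects and impacts it produces
    exploit_available: bool   # whether an exploit exists
    service: str              # service containing the vulnerability
    operating_system: str     # OS corresponding to that service


@dataclass
class VulnerabilityKnowledgeBase:
    vulns: dict = field(default_factory=dict)      # vuln_id -> Vulnerability
    relations: list = field(default_factory=list)  # (src_id, RelationType, dst_id)

    def add(self, vuln: Vulnerability) -> None:
        self.vulns[vuln.vuln_id] = vuln

    def relate(self, src: str, rel: RelationType, dst: str) -> None:
        self.relations.append((src, rel, dst))

    def candidates(self, service: str, operating_system: str) -> list:
        """Exploitable vulnerabilities matching an observed service and OS (assumed rule)."""
        return [v for v in self.vulns.values()
                if v.service == service
                and v.operating_system == operating_system
                and v.exploit_available]

    def related(self, vuln_id: str, rel: RelationType) -> list:
        """IDs linked from vuln_id by the given relationship type."""
        return [dst for src, r, dst in self.relations if src == vuln_id and r is rel]


# Hypothetical entries; the IDs are placeholders, not real CVE numbers.
kb = VulnerabilityKnowledgeBase()
kb.add(Vulnerability("VULN-A", "remote code execution", "system-level access", True, "smb", "windows"))
kb.add(Vulnerability("VULN-B", "denial of service", "service crash", False, "smb", "windows"))
kb.relate("VULN-B", RelationType.INHERITANCE, "VULN-A")
print([v.vuln_id for v in kb.candidates("smb", "windows")])  # ['VULN-A']
```

Keeping relationships as explicit typed triples, rather than nesting them inside each record, makes it straightforward to add further relationship kinds or to export the knowledge base to a graph store later.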