WO2024124640A1 - Node analysis method and device based on a threat analysis graph - Google Patents

Node analysis method and device based on a threat analysis graph (基于威胁分析图谱的节点分析方法及装置)

Info

Publication number
WO2024124640A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
target
data
graph
representation
Application number
PCT/CN2022/144095
Other languages
English (en)
French (fr)
Inventor
刘浩然
王占一
吴萌
黄朝文
白敏
汪列军
Original Assignee
奇安信科技集团股份有限公司
奇安信网神信息技术(北京)股份有限公司
Application filed by 奇安信科技集团股份有限公司 and 奇安信网神信息技术(北京)股份有限公司
Publication of WO2024124640A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Definitions

  • The present application relates to the field of network security technology, and in particular to a node analysis method and device based on a threat analysis graph.
  • APT: Advanced Persistent Threat.
  • Log data is usually collected by monitoring the network layer and is then analyzed to extract threat intelligence from the massive volume of logs.
  • The embodiments of the present application provide a node analysis method and device based on a threat analysis graph.
  • An embodiment of the present application provides a node analysis method based on a threat analysis graph, comprising:
  • The node representation of the target node includes the node data of the target node and the node data of the neighboring nodes of the target node;
  • Extracting the target subgraph data associated with the seed node from the threat analysis graph stored in the graph database includes:
  • The target association data includes node data and edge data;
  • The node data of the seed node and the target association data are combined to obtain the target subgraph data.
  • Determining the node representation of the target node in the target subgraph data includes:
  • The node representation of the target node is determined based on the graph embedding vector of each node.
  • Determining the graph embedding vector of each node in the target subgraph data includes:
  • When the current business scenario is a search for structurally similar nodes, the graph embedding vector of each node in the target subgraph data is determined based on a structural similarity algorithm;
  • When the current business scenario is a search for content-similar nodes, the graph embedding vector of each node in the target subgraph data is determined based on a content similarity algorithm.
  • Determining the node representation of the target node based on the graph embedding vector of each node includes:
  • The target graph neural network model is obtained by training on graph embedding vector samples of multiple nodes.
  • The target graph neural network model includes an acquisition (neighbor sampling) module and an aggregation module;
  • Inputting the graph embedding vector of each node into the target graph neural network model to obtain the node representation of the target node output by the model includes:
  • The node aggregation information is determined as the node representation of the target node.
  • Analyzing the target node based on the node representation of the target node includes:
  • The node data of the target node is stored in an IOC (Indicator of Compromise) graph database, an alarm is raised for the target node, or the node data of the target node and of its associated nodes is displayed.
  • Analyzing the target node based on the node representation of the target node includes:
  • The node representation of the target node is compared with the node representations of other nodes to determine nodes similar to the target node.
  • An embodiment of the present application further provides a node analysis device based on a threat analysis graph, including:
  • a first extraction unit, configured to extract target data from source data and use the target data as a seed node, the target data being data with security risks;
  • a second extraction unit, configured to extract target subgraph data associated with the seed node from the threat analysis graph stored in the graph database;
  • a determination unit, configured to determine a node representation of a target node in the target subgraph data, the node representation of the target node including the node data of the target node and the node data of its neighboring nodes; and
  • an analysis unit, configured to analyze the target node based on the node representation of the target node.
  • An embodiment of the present application further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the node analysis method based on the threat analysis graph described in the first aspect are implemented.
  • An embodiment of the present application further provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the node analysis method based on the threat analysis graph described in the first aspect.
  • An embodiment of the present application further provides a computer program product storing executable instructions that, when executed by a processor, cause the processor to implement the steps of the node analysis method based on the threat analysis graph described in the first aspect.
  • The node analysis method and device based on the threat analysis graph take target data with security risks, extracted from the source data, as the seed node. They then extract the target subgraph data associated with the seed node from the threat analysis graph and determine the node representation of the target node in that subgraph data; this representation includes the node data of the target node and of its neighboring nodes. Finally, they perform correlation analysis on the target node based on that representation. Because node representations are computed only for target nodes in the subgraph associated with the seed node, there is no need to compute over and analyze all of the graph data in the threat analysis graph, which improves the efficiency of data analysis.
  • FIG. 1 is a first schematic flowchart of a node analysis method based on a threat analysis graph provided in an embodiment of the present application;
  • FIG. 2 is a schematic diagram of target subgraph data extraction provided in an embodiment of the present application;
  • FIG. 3 is a second schematic flowchart of a node analysis method based on a threat analysis graph provided in an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of an initial autoencoder model provided in an embodiment of the present application;
  • FIG. 5 is a schematic diagram of converting target subgraph data into the node representation of a target node according to an embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of a node analysis system based on a threat analysis graph according to an embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of a node analysis device based on a threat analysis graph provided in an embodiment of the present application;
  • FIG. 8 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present application.
  • FIG. 1 is a flowchart of a node analysis method based on a threat analysis graph provided in an embodiment of the present application. As shown in FIG. 1, the node analysis method based on a threat analysis graph includes the following steps:
  • Step 101: Extract target data from source data and use the target data as a seed node; the target data is data with security risks.
  • The source data can be sandbox data, crawler data, Indicator of Compromise (IOC) data, and so on.
  • IOC: Indicator of Compromise.
  • An IOC is a type of threat intelligence, namely intelligence on the remote command-and-control servers that an attacker uses to control victim hosts.
  • IOCs usually include domain names, Internet Protocol (IP) addresses, uniform resource locators (URLs), Secure Sockets Layer (SSL) certificates, file hashes, and so on.
  • IP: Internet Protocol.
  • URL: uniform resource locator.
  • SSL: Secure Sockets Layer.
  • Massive amounts of data are generated every day, including but not limited to network behavior data generated by malicious samples running in sandboxes, Internet threat risk data collected by web crawlers, and threat intelligence data from open-source security reports.
  • Multiple kinds of source data are collected regularly, and target data with security risks, such as domain names, URLs, and IP addresses, is extracted from them as seed nodes.
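The patent does not specify how domains, URLs, and IP addresses are pulled out of the source data. As an illustration of Step 101, the sketch below shows one minimal approach, assuming plain-text input and simple regular expressions. The function name `extract_seed_nodes` and all patterns are hypothetical. The sketch only finds *candidate* indicators; deciding whether each one actually carries a security risk is a separate step. Real IOC extraction also needs far more care, for example with defanged indicators, IPv6, internationalized domains, and allow-lists of benign domains.

```python
import ipaddress
import re

# Deliberately simple patterns (assumption); production IOC extraction needs much more care.
URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b", re.IGNORECASE
)


def extract_seed_nodes(text: str) -> set[tuple[str, str]]:
    """Return (node type, value) pairs for candidate seed nodes found in text."""
    seeds: set[tuple[str, str]] = set()
    for url in URL_RE.findall(text):
        # Drop trailing punctuation picked up from surrounding prose.
        seeds.add(("url", url.rstrip(".,;:)")))
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects out-of-range octets such as 999
        except ValueError:
            continue
        seeds.add(("ip", candidate))
    for domain in DOMAIN_RE.findall(text):
        # Hosts inside URLs are also reported as domain candidates.
        seeds.add(("domain", domain.lower()))
    return seeds
```

For example, a sandbox report line such as `beacon to 203.0.113.5 and http://bad.example.com/payload` yields an `ip`, a `url`, and a `domain` candidate. Each candidate could then be looked up as a vertex in the graph database.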
  • Step 102: Extract target subgraph data associated with the seed node from the threat analysis graph stored in the graph database.
  • The graph database may be NebulaGraph, a distributed graph database; here it stores a threat analysis graph on the scale of tens of billions.
  • The threat analysis graph contains multiple nodes and multiple edges; a node represents an entity, and an edge represents the relationship between two entities.
  • An undirected data-association graph is first defined from the data types of the graph data (node types and edge types) and the relationships between the graph data. The direction of each relationship is then added to construct the threat analysis graph, which is put into practical use.
  • The relationship network is very flexible and can present heterogeneous information in a unified view. Through the built-in services provided by NebulaGraph, graph data and the relationships between graph data can be queried according to different rules.
  • Node types include but are not limited to V(IP), V(domain), and V(URL), where V denotes a node; edge types include but are not limited to E(connect), E(release), E(download), and E(delivery), where E denotes an edge.
  • When data initiates a network connection, it may connect to an IP address, domain name, or URL; this relationship is called connect. Data may be used to release files; this relationship is called release. Data may be used to download files; this relationship is called download. An IP address, domain name, or URL may also be used to distribute malicious files; this relationship is called delivery. All of the above data types and associations can be defined from threat intelligence for a specified network environment, so the threat analysis graph can be fed directly into the threat intelligence analysis process. Moreover, because the threat analysis graph fits the user's network environment, it makes it easier to obtain threat intelligence relevant to that environment.
  • The subgraph extraction module uses the vertex and edge query services provided by NebulaGraph to flexibly extract target subgraph data of different scales associated with the seed node.
  • The extracted target subgraph data is saved as point (vertex) data and edge data, for example in JSON format.
  • The main fields of the point data include but are not limited to: a field identifying the content of the node, a field indicating the node type, and a field indicating the node's unique identifier in the graph data.
  • For example, the node data can be: {"name": "b**du.com", "label": "domain", "vertexId": "0005d1b1f7fde4c98455d29ece315570"}
  • The field name stores the node content;
  • the field label indicates that the node type is a domain name;
  • the field vertexId indicates that the hash value of the node is 0005d1b1f7fde4c98455d29ece315570.
  • The main fields of the edge data include but are not limited to: fields indicating the unique identifiers of the two connected nodes in the graph data and a field indicating the edge type.
  • For example, the edge data can be: {"srcId": "2c238667ca0068cead9c529e06b8675d", "dstId": "d878b8a1a12e3920a6a713f12a3d18e2", "label": "contain"}
  • The field srcId indicates that the hash value of node 1 is 2c238667ca0068cead9c529e06b8675d;
  • the field dstId indicates that the hash value of node 2 is d878b8a1a12e3920a6a713f12a3d18e2;
  • the field label indicates that the edge type is contain (include).
  • The edge is directed from the node identified by srcId to the node identified by dstId.
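The point and edge records above map naturally onto two small record types. The sketch below parses them, assuming one JSON object per line (JSON Lines). The patent only says the data "can be in json format", so the line-per-record layout and the names `Vertex`, `Edge`, and `load_subgraph` are hypothetical. The field names `name`, `label`, `vertexId`, `srcId`, and `dstId` are taken from the examples in the text.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    name: str       # node content, e.g. a domain name
    label: str      # node type, e.g. "domain"
    vertex_id: str  # unique id of the node in the graph (a hash value)


@dataclass(frozen=True)
class Edge:
    src_id: str  # id of the source node (the edge points away from it)
    dst_id: str  # id of the destination node
    label: str   # edge type, e.g. "contain"


def parse_vertex(record: str) -> Vertex:
    data = json.loads(record)
    return Vertex(name=data["name"], label=data["label"], vertex_id=data["vertexId"])


def parse_edge(record: str) -> Edge:
    data = json.loads(record)
    return Edge(src_id=data["srcId"], dst_id=data["dstId"], label=data["label"])


def load_subgraph(vertex_lines, edge_lines):
    """Parse JSON Lines records into an id -> Vertex map and a list of Edge objects."""
    vertices = {v.vertex_id: v for v in map(parse_vertex, vertex_lines)}
    edges = [parse_edge(line) for line in edge_lines]
    return vertices, edges
```

Keying vertices by `vertexId` lets each edge's `srcId` and `dstId` be resolved back to node content in constant time.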
  • Step 103: Determine a node representation of a target node in the target subgraph data; the node representation of the target node includes the node data of the target node and the node data of its neighboring nodes.
  • After the target subgraph data is extracted, the neighbor nodes of each target node are determined within the target subgraph data. The node data of the target node and of its neighbor nodes is then aggregated to obtain the node representation of the target node. There can be one or more target nodes; the specific number is determined by actual needs.
  • Step 104: Analyze the target node based on the node representation of the target node.
  • Once the node representation of each target node is obtained, threat analysis, similarity analysis, and the like may be performed on it.
  • The node analysis method based on the threat analysis graph takes target data with security risks, extracted from the source data, as the seed node. It then extracts the target subgraph data associated with the seed node from the threat analysis graph and determines the node representation of the target node in that subgraph data; this representation includes the node data of the target node and of its neighboring nodes. Finally, it performs correlation analysis on the target node based on that representation. Because node representations are computed only for target nodes in the subgraph associated with the seed node, there is no need to compute over and analyze all of the graph data in the threat analysis graph, which improves the efficiency of data analysis.
  • In some embodiments, step 102 may be implemented as follows:
  • target association data within a preset number of hops of the seed node is extracted from the threat analysis graph stored in the graph database; the target association data includes node data and edge data;
  • the node data of the seed node and the target association data are combined to obtain the target subgraph data.
  • The preset number of hops may be 1, 2, 3, and so on, and can be set as required.
  • Specifically, the vertex and edge query services provided by the graph database NebulaGraph can be used to extract node data and edge data of different scales associated with the seed node. The node data of the seed node is then combined with this associated node data and edge data to obtain the target subgraph data; the size of the target subgraph data is determined by the preset number of hops.
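NebulaGraph offers its own subgraph queries (for example, the nGQL `GET SUBGRAPH` statement), and a deployment would normally push this work to the database. To keep the example self-contained, the sketch below performs the same kind of preset-hop expansion in memory with breadth-first search. It makes the following assumptions:

- Edges are `(src_id, dst_id, label)` tuples.
- Edge direction is ignored during expansion.
- An edge is kept whenever it is traversed from a node closer than the hop limit.

These semantics may differ in detail from NebulaGraph's. The function name is hypothetical.

```python
from collections import deque


def extract_k_hop_subgraph(edges, seeds, hops):
    """Collect the nodes and edges within `hops` of any seed node.

    edges: iterable of (src_id, dst_id, label) tuples; direction is ignored
    seeds: iterable of seed node ids
    hops:  the preset number of hops (1, 2, 3, ...)
    """
    adjacency = {}
    for edge in edges:
        src, dst, _ = edge
        adjacency.setdefault(src, []).append(edge)
        adjacency.setdefault(dst, []).append(edge)

    depth = {seed: 0 for seed in seeds}  # hop distance from the nearest seed
    queue = deque(depth)
    sub_edges = set()
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the preset number of hops
        for edge in adjacency.get(node, []):
            src, dst, _ = edge
            neighbor = dst if src == node else src
            sub_edges.add(edge)
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth), sub_edges
```

Larger `hops` values grow the subgraph quickly on densely connected infrastructure (for example, shared hosting IPs). This growth is why the preset number of hops is a tunable parameter.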
  • FIG. 2 is a schematic diagram of target subgraph data extraction provided in an embodiment of the present application. As shown in FIG. 2, seed nodes 202 are extracted from source data 201. Based on the seed nodes 202, the subgraph extraction module 203 then extracts target subgraph data 205 from the threat analysis graph in graph database 204.
  • The source data 201 can be sandbox data, crawler data, or IOC data; nodes A, B, C, E, and F are shown as example seed nodes 202.
  • In the node analysis method based on the threat analysis graph provided in this embodiment, the target subgraph data associated with the seed node is extracted using the vertex and edge query services of NebulaGraph, which makes extraction convenient.
  • FIG. 3 is a second flowchart of a node analysis method based on a threat analysis graph provided in an embodiment of the present application. As shown in FIG. 3, step 103 can be implemented by the following steps:
  • Step 1031: Determine the graph embedding vector of each node in the target subgraph data.
  • Determining the graph embedding vector of each node in the target subgraph data may be implemented as follows:
  • when the current business scenario is a search for structurally similar nodes, the graph embedding vector of each node in the target subgraph data is determined based on a structural similarity algorithm;
  • when the current business scenario is a search for content-similar nodes, the graph embedding vector of each node in the target subgraph data is determined based on a content similarity algorithm.
  • The target subgraph data consists of edge data and node data.
  • The network relationships in the target subgraph data are non-Euclidean data, which are inconvenient to process and compute on directly.
  • Euclidean space, by contrast, is a vector space with a much richer set of methods and tools.
  • Graph embedding maps graph data into low-dimensional dense vectors. This solves the problem that graph data is hard to feed efficiently into machine learning algorithms, and it allows computation in Euclidean space.
  • Graph embedding is more practical than an adjacency matrix because it packs node attributes into a lower-dimensional vector, and vector operations are simpler and faster than operations on graphs.
  • The purpose of graph embedding is to represent nodes and edges as vectors.
  • Here, graph embedding converts the node data of each node in the target subgraph data into the corresponding graph embedding vector.
  • Graph embedding captures the topological structure of the target subgraph data, and embedding more node attributes can yield better results in downstream tasks.
  • The algorithm can be chosen according to the business scenario. In a scenario searching for structurally similar nodes, a structural similarity algorithm can be used to determine the graph embedding vector of each node in the target subgraph data; in a scenario searching for content-similar nodes, a content similarity algorithm can be used instead.
  • The content similarity algorithm is a graph embedding algorithm for the nodes and relationships in a graph structure, including but not limited to TransE; the resulting embeddings can be used widely in subsequent graph-based tasks.
  • A piece of content can be represented as a triple (srcId, label, dstId).
  • For example, the triple can be written as: {"srcId": "2c238667ca0068cead9c529e06b8675d", "dstId": "d878b8a1a12e3920a6a713f12a3d18e2", "label": "contain"}.
  • srcId and dstId are both nodes, each represented by the node's hash value (MD5) in the target subgraph data.
  • contain is the relation, represented by the edge in the target subgraph data.
  • The dimension of the graph embedding vector ranges from 64 to 512 and can be chosen flexibly according to downstream task performance and business needs.
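TransE, named above as one content similarity algorithm, learns vectors so that for a true triple the head embedding plus the relation embedding lands near the tail embedding (h + r ≈ t). The sketch below shows only TransE's distance function and the margin ranking loss it is usually trained with, assuming NumPy vectors. Here `srcId` supplies the head, `label` (for example "contain") the relation, and `dstId` the tail. The training loop, negative sampling, and embedding normalization are omitted. The function names are hypothetical.

```python
import numpy as np


def transe_distance(head, relation, tail, norm=1):
    """TransE dissimilarity ||h + r - t||; lower means a more plausible triple."""
    return float(np.linalg.norm(head + relation - tail, ord=norm))


def margin_ranking_loss(positive, corrupted, margin=1.0):
    """Hinge loss: a true triple should be at least `margin` closer than a corrupted one.

    positive, corrupted: (head, relation, tail) tuples of vectors; a corrupted
    triple typically replaces the head or tail with a random node.
    """
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*corrupted))
```

With a dimension in the 64 to 512 range mentioned above, each node hash and each edge type would get one such vector. Nodes playing similar roles in many triples end up close together.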
  • The structural similarity algorithm is as follows: the edge type statistics of each node in the target subgraph data are input into the target autoencoder model to obtain the graph embedding vector of each node output by the model.
  • The target autoencoder model is trained on edge type statistics sample information of each node in graph structure samples.
  • The training proceeds as follows. A large number of graph structure samples are obtained, and the edge type statistics sample information of each node in each sample is determined. This information is input into a pre-created initial autoencoder model, which performs feature analysis on it and outputs edge type statistics prediction information. A loss function is constructed from the prediction information and the sample information, and the model parameters of the initial autoencoder model are optimized with this loss function until a convergence condition is reached, which completes training.
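The text does not define "edge type statistics" precisely. The sketch below assumes one plausible reading: for each node, count its outgoing and incoming edges of each type, giving a fixed-length vector that can serve as autoencoder input. The edge-type list and the in/out split are assumptions; the type names are taken from the types mentioned in the text. The function name is hypothetical.

```python
# Hypothetical fixed ordering of edge types (names taken from the text).
EDGE_TYPES = ("connect", "release", "download", "delivery", "contain")


def edge_type_statistics(edges, edge_types=EDGE_TYPES):
    """Map node id -> [outgoing count per edge type..., incoming count per edge type...].

    edges: iterable of (src_id, dst_id, label) tuples
    """
    index = {t: i for i, t in enumerate(edge_types)}
    n = len(edge_types)
    stats = {}
    for src, dst, label in edges:
        if label not in index:
            continue  # unknown edge types are ignored in this sketch
        stats.setdefault(src, [0] * (2 * n))[index[label]] += 1      # outgoing
        stats.setdefault(dst, [0] * (2 * n))[n + index[label]] += 1  # incoming
    return stats
```

Two nodes that, for example, both connect to many IPs and release a few files get similar vectors, which is what lets the encoder place structurally similar nodes near each other.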
  • FIG. 4 is a schematic structural diagram of the initial autoencoder model provided in an embodiment of the present application.
  • In FIG. 4, layer 1 is the input layer, layer 2 is the middle hidden layer, and layer 3 is the output layer.
  • The input layer (1) and the middle hidden layer (2), that is, the part in the dotted box, are used as the target autoencoder model.
  • The initial autoencoder model can be a three-layer deep neural network (DNN); the number of layers can be increased, or another network structure can be used. Vector dimensionality reduction (such as PCA) or other encoding techniques can also be used; this application does not limit this.
  • DNN: deep neural network.
  • PCA: principal component analysis, a vector dimensionality reduction technique.
  • Data can be fed to the target autoencoder model in batches for prediction. After prediction, each node in the target subgraph data corresponds to one graph embedding vector, whose dimension equals the encoding-layer dimension of the target autoencoder model (or of the other encoding structure used).
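The three-layer structure of FIG. 4 can be sketched with NumPy alone. The sketch trains the whole network to reconstruct its input and then keeps only the input-to-hidden part (the dotted box) as the encoder that produces graph embedding vectors. It rests on the following assumptions, none of which the patent specifies:

- a tanh hidden layer and a linear output layer;
- a squared-error reconstruction loss;
- full-batch gradient descent;
- `log1p` scaling of the raw counts.

The class name is hypothetical.

```python
import numpy as np


class EdgeStatAutoencoder:
    """Minimal input -> hidden (code) -> output autoencoder trained by gradient descent."""

    def __init__(self, n_in, n_code, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_code))  # input -> hidden
        self.b1 = np.zeros(n_code)
        self.W2 = rng.normal(0.0, 0.1, (n_code, n_in))  # hidden -> output
        self.b2 = np.zeros(n_in)

    def encode(self, x):
        # Only this part is kept after training; its output is the graph embedding.
        return np.tanh(x @ self.W1 + self.b1)

    def reconstruct(self, x):
        return self.encode(x) @ self.W2 + self.b2  # linear output layer

    def reconstruction_error(self, x):
        return float(np.mean((self.reconstruct(x) - x) ** 2))

    def fit(self, x, epochs=500, lr=0.05):
        n = x.shape[0]
        for _ in range(epochs):
            h = self.encode(x)
            x_hat = h @ self.W2 + self.b2
            # Gradient of the per-sample squared error, averaged over samples.
            grad_out = 2.0 * (x_hat - x) / n
            grad_W2 = h.T @ grad_out
            grad_b2 = grad_out.sum(axis=0)
            grad_h = (grad_out @ self.W2.T) * (1.0 - h ** 2)  # back through tanh
            grad_W1 = x.T @ grad_h
            grad_b1 = grad_h.sum(axis=0)
            self.W2 -= lr * grad_W2
            self.b2 -= lr * grad_b2
            self.W1 -= lr * grad_W1
            self.b1 -= lr * grad_b1
        return self
```

Usage, under the same assumptions: scale the per-node edge-type counts with `np.log1p`, call `fit` on a large set of training nodes, then call `encode` in batches on the nodes of the target subgraph. The embedding dimension is `n_code`, the width of the hidden layer.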
  • Step 1032: Determine the node representation of the target node based on the graph embedding vector of each node.
  • The graph embedding vector of each node is input into a target graph neural network model to obtain the node representation of the target node output by the model.
  • The target graph neural network model is obtained by training on graph embedding vector samples of multiple nodes.
  • The message passing paradigm updates the information of a central node by aggregating the information of its adjacent nodes. It generalizes the convolution operator to irregular data and thereby connects graphs with neural networks; because it is simple yet powerful, it is widely used.
  • The present application determines the node representation of the target node based on the graph embedding vector of each node and the target graph neural network model.
  • The target graph neural network model includes an acquisition (neighbor sampling) module and an aggregation module. Inputting the graph embedding vector of each node into the model to obtain the node representation of the target node may be implemented as follows:
  • the node aggregation information is determined as the node representation of the target node.
  • The target graph neural network model can have multiple mainstream graph neural network algorithms built in to meet the needs of different security scenarios, including but not limited to GraphSAGE.
  • GraphSAGE is used as an example below.
  • GraphSAGE is a graph neural network algorithm that addresses a limitation of the graph convolutional network (GCN). GCN training requires the adjacency matrix of the entire graph, so it depends on the specific graph structure and can generally be used only for transductive learning.
  • GCN stacks multiple layers of aggregation functions; each layer aggregates the information of a node and its neighbors to obtain the node's feature vector for the next layer.
  • GraphSAGE uses only the neighborhood information of each node and does not depend on the global graph structure, so it can also produce representations for nodes not seen during training (inductive learning).
  • GraphSAGE includes a sampling module and an aggregation module.
  • Neighboring nodes are sampled using the connection information between nodes. The information of adjacent nodes is then aggregated through multiple layers of aggregation functions to obtain node aggregation information, which is determined as the node representation of the target node.
  • The aggregation function can be any of the following: a mean aggregator, a graph convolution aggregator (GCN aggregator), a long short-term memory aggregator (LSTM aggregator), or a pooling aggregator.
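For reference, the layer update with a mean aggregator, as described in the original GraphSAGE paper (Hamilton et al., 2017), can be written as follows. The notation is:

- $h_v^{(k)}$: the representation of node $v$ after layer $k$, where $h_v^{(0)}$ would be the graph embedding vector from Step 1031;
- $\mathcal{N}_S(v)$: the sampled set of neighbors of $v$;
- $W^{(k)}$: the weight matrix of layer $k$;
- $\Vert$: vector concatenation;
- $\sigma$: a nonlinearity such as ReLU.

The patent does not state which exact variant it uses.

```latex
\begin{aligned}
h_{\mathcal{N}_S(v)}^{(k)} &= \operatorname{mean}\bigl(\{\, h_u^{(k-1)} : u \in \mathcal{N}_S(v) \,\}\bigr) \\
h_v^{(k)} &= \sigma\bigl( W^{(k)} \,[\, h_v^{(k-1)} \,\Vert\, h_{\mathcal{N}_S(v)}^{(k)} \,] \bigr) \\
h_v^{(k)} &\leftarrow h_v^{(k)} \,/\, \lVert h_v^{(k)} \rVert_2
\end{aligned}
```

Swapping the mean for an LSTM or an element-wise max over transformed neighbor vectors gives the LSTM and pooling aggregators listed above.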
  • FIG. 5 is a schematic diagram of converting the target subgraph data into the node representation of the target node according to an embodiment of the present application.
  • The target subgraph data 501 includes nodes A, B, C, D, E, and F.
  • The connections between the six nodes are shown in FIG. 5.
  • FIG. 5 shows how the node information of neighbor nodes is passed to the target node.
  • The neighbor nodes of node B are nodes A and C.
  • The node data of nodes A and C is linearly transformed and aggregated into node B.
  • The node data of node B and the transformed node data of nodes A and C are then linearly transformed again to obtain the node aggregation information of node B.
  • The neighbor nodes of node C are nodes A, B, E, and F.
  • The node data of nodes A, B, E, and F is linearly transformed and aggregated into node C.
  • The node data of node C and the transformed node data of nodes A, B, E, and F are then linearly transformed again to obtain the node aggregation information of node C.
  • The neighbor node of node D is node A.
  • The node data of node A is linearly transformed and aggregated into node D.
  • The node data of node D and the transformed node data of node A are then linearly transformed again to obtain the node aggregation information of node D.
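The FIG. 5 walk-through can be sketched as a single GraphSAGE-style mean-aggregator layer in NumPy. The sketch makes the following assumptions:

- The FIG. 5 edges (A-B, A-C, B-C, A-D, C-E, C-F) are inferred from the neighbor lists above and treated as undirected.
- The weight matrix is supplied by the caller rather than trained.
- The mean of the neighbors is taken first, and one linear map is then applied to the concatenation [self, neighbor mean]. This is the common GraphSAGE-mean formulation and differs slightly from the per-neighbor transform described in the text.
- ReLU and L2 normalization follow the original GraphSAGE design.

`build_neighbors` and `sage_mean_layer` are hypothetical names.

```python
import numpy as np

# FIG. 5 edges, inferred from the neighbor lists in the text (undirected).
FIG5_EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("C", "E"), ("C", "F")]


def build_neighbors(edges):
    """Undirected adjacency list: node -> list of neighbor nodes."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    return neighbors


def sage_mean_layer(features, neighbors, weight, num_samples=None, seed=0):
    """One GraphSAGE-style layer with a mean aggregator.

    features:    dict node -> 1-D feature vector (e.g. its graph embedding vector)
    neighbors:   dict node -> list of neighbor nodes
    weight:      array of shape (2 * dim, out_dim) applied to [self || neighbor mean]
    num_samples: if set, sample at most this many neighbors per node
    """
    rng = np.random.default_rng(seed)
    out = {}
    for node, h_self in features.items():
        nbrs = neighbors.get(node, [])
        if num_samples is not None and len(nbrs) > num_samples:
            # Uniform neighbor sampling without replacement.
            picked = rng.choice(len(nbrs), size=num_samples, replace=False)
            nbrs = [nbrs[i] for i in picked]
        if nbrs:
            h_nbr = np.mean([features[u] for u in nbrs], axis=0)
        else:
            h_nbr = np.zeros_like(h_self)
        h = np.maximum(np.concatenate([h_self, h_nbr]) @ weight, 0.0)  # ReLU
        norm = np.linalg.norm(h)
        out[node] = h / norm if norm > 0 else h  # L2 normalization
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    feats = {n: rng.normal(size=8) for n in "ABCDEF"}
    W = rng.normal(size=(16, 4))
    reps = sage_mean_layer(feats, build_neighbors(FIG5_EDGES), W, num_samples=3)
    print({n: r.round(3) for n, r in reps.items()})
```

Stacking two such layers lets information from two hops away (for example, from E to A through C) reach a node. That is the multi-layer aggregation the text describes.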
  • The target graph neural network model can be trained as follows. Graph embedding vector samples of multiple nodes are input into an initial graph neural network model, which may use the GraphSAGE algorithm. The initial model collects the node data of the neighbor nodes of each sample node and, using the aggregation function, aggregates the node data of the sample node with that of its neighbor nodes to obtain the node representation of the sample node. A loss function is constructed from this node representation and the graph embedding vector of the sample node, and the initial model is optimized with this loss function until a convergence condition is reached, yielding the target graph neural network model.
  • In this method, the node representation of the target node is determined from the graph embedding vector of each node and the target graph neural network model. This adds the information of the target node's neighbors to the target node, so its node representation carries more information and subsequent analysis based on that representation is more accurate.
  • In some embodiments, step 104 may be implemented as follows:
  • the node data of the target node is stored in the IOC graph database, an alarm is raised for the target node, or the node data of the target node and of its associated nodes is displayed.
  • Here, the target graph neural network model is used to detect, analyze, and track threat events.
  • The graph neural network analyzes the node representation of the target node to obtain a threat risk coefficient for the target node, which is then compared with a preset coefficient value.
  • If the threat risk coefficient of the target node is greater than the preset coefficient value, the target node is a risk node.
  • In that case, the node data of the target node can be treated as IOC data and stored in the IOC graph database so that security experts can view it. Alternatively, an alarm can be raised for the target node to give early warning of the risk node. The node data of the target node and of its neighboring nodes can also be displayed visually to assist security experts in operations, analysis, and countermeasures.
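The patent does not say how the threat risk coefficient is computed from a node representation. The sketch below assumes a hypothetical linear scoring head with a sigmoid, so that coefficients fall in (0, 1). It flags every node whose coefficient exceeds the preset coefficient value. The weights `w` and `b`, the default threshold of 0.8, and both function names are illustrative only; in practice the scoring head would be trained jointly with, or on top of, the graph neural network.

```python
import numpy as np


def risk_coefficient(node_repr, w, b):
    """Hypothetical scoring head: sigmoid(w . x + b), a value in (0, 1)."""
    return float(1.0 / (1.0 + np.exp(-(np.dot(node_repr, w) + b))))


def flag_risk_nodes(node_reprs, w, b, preset_value=0.8):
    """Return ids of nodes whose risk coefficient exceeds the preset coefficient value.

    node_reprs: dict node id -> node representation vector
    """
    return [
        node_id
        for node_id, rep in node_reprs.items()
        if risk_coefficient(rep, w, b) > preset_value
    ]
```

Each flagged id would then feed one or more of the follow-up actions described above: storing the node in the IOC graph database, raising an alarm, or displaying it with its neighbors.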
  • The target node can also be further judged and analyzed manually.
  • The node data of the target node is stored in the IOC graph database.
  • The method provided in this embodiment can use the target graph neural network model to continuously monitor the massive data generated every day, predicting unknown risk nodes and giving early warning of risk nodes. It can also display the node data of the target node and of its neighboring nodes to assist security experts in operations, analysis, and countermeasures.
  • In other embodiments, step 104 may be implemented as follows:
  • the node representation of the target node is compared with the node representations of other nodes to determine nodes similar to the target node.
  • Node representations of multiple nodes can be obtained, and the similarity between the node representation of the target node and those of other nodes can be computed to determine nodes similar to the target node. In this way, if the target node is determined to be a risk node, nodes similar to it are also treated as risk nodes.
  • the node analysis method based on the threat analysis graph provided in the embodiment of the present application can use the target graph neural network model to continuously monitor the massive data generated daily, and realize the search for similar nodes.
  • the node analysis system based on a threat analysis graph can be deployed on the server side.
  • the node analysis system based on a threat analysis graph includes a graph data storage module 601, a subgraph extraction module 602, a graph embedding module 603, a graph calculation module 604, a data post-processing module 605, and a data acquisition module 606, wherein the data acquisition module 606 is used to collect source data; the graph data storage module 601 is used to store the threat analysis graph and provide query services; the subgraph extraction module 602 is used to extract the target subgraph data from the threat analysis graph based on the seed node; the graph embedding module 603 is used to determine the graph embedding vector of each node in the target subgraph data; the graph calculation module 604 is used to determine the node representation of the target node based on the target graph neural network model and the graph embedding vector of each node; and the data post-processing module 605 is used to perform threat analysis and similar-node search based on the node representation of the target node.
  • the node analysis method based on the threat analysis graph provided in the embodiment of the present application performs correlation analysis on massive heterogeneous multi-source data based on the threat analysis graph, combined with the network infrastructure used by APT organizations, to compute unknown risk nodes, provide early warning of risk nodes and search for similar nodes.
  • FIG7 is a schematic diagram of the structure of a node analysis device based on a threat analysis graph according to an embodiment of the present application.
  • the node analysis device 700 based on a threat analysis graph includes a first extraction unit 701, a second extraction unit 702, a determination unit 703 and an analysis unit 704; wherein:
  • the first extraction unit 701 is used to extract target data from the source data and use the target data as a seed node; the target data is data with security risks;
  • a second extraction unit 702 is used to extract target subgraph data associated with the seed node from the threat analysis graph stored in the graph database;
  • a determination unit 703 is used to determine a node representation of a target node in the target subgraph data; the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node;
  • the analyzing unit 704 is configured to analyze the target node based on the node representation of the target node.
  • the node analysis device based on the threat analysis graph uses the target data with security risks extracted from the source data as the seed node, extracts the target subgraph data associated with the seed node in the threat analysis graph, determines the node representation of the target node in the target subgraph data, and the node representation includes the node data of the target node and the node data of the neighboring nodes of the target node, and finally performs a correlation analysis on the target node based on the node representation of the target node. It can be seen that the present application only determines the node representation of the target node in the target subgraph data associated with the seed node, and there is no need to calculate and analyze all the graph data in the threat analysis graph, thereby improving the efficiency of data analysis.
  • the second extraction unit 702 is specifically configured to:
  • the target association data includes node data and edge data;
  • the node data of the seed node and the target associated data are combined to obtain the target subgraph data.
  • the determining unit 703 is specifically configured to:
  • the node representation of the target node is determined based on the graph embedding vector of each node.
  • the determining unit 703 is further specifically configured to:
  • when the current business scenario includes a business scenario of searching for structurally similar nodes, a graph embedding vector of each node in the target subgraph data is determined based on a structural similarity algorithm;
  • when the current business scenario includes a business scenario of searching for content-similar nodes, a graph embedding vector of each node in the target subgraph data is determined based on a content similarity algorithm.
  • the determining unit 703 is further specifically configured to:
  • the target graph neural network model is obtained by training based on graph embedding vector samples of multiple nodes.
  • the target graph neural network model includes an acquisition module and an aggregation module
  • the determining unit 703 is further specifically configured to:
  • the node aggregation information is determined as a node representation of the target node.
  • the analysis unit 704 is specifically used for:
  • the node data of the target node is stored in an IOC graph database, or an alarm is issued for the target node, or the node data of the target node and the node data of the associated nodes of the target node are displayed.
  • the analysis unit 704 is specifically used for:
  • the node representation of the target node is compared and analyzed with the node representations of other nodes to determine nodes similar to the target node.
  • FIG8 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present application.
  • the electronic device may include: a processor 810, a communications interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communications interface 820, and the memory 830 communicate with each other through the communication bus 840.
  • the processor 810 may call the logic instructions in the memory 830 to execute the following method: extract target data from the source data, and use the target data as a seed node; the target data is data with security risks;
  • the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node;
  • the target node is analyzed based on the node representation of the target node.
  • the logic instructions in the above-mentioned memory 830 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product.
  • the technical solution of the present application in essence, or the part that contributes to the prior art, or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium, including several instructions to enable a computer device (which can be a personal computer, server, or network device, etc.) to perform all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), disk or optical disk, etc.
  • the embodiment of the present application further provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the node analysis method based on the threat analysis graph provided in the above embodiments is implemented, for example, including: extracting target data from source data, and using the target data as a seed node; the target data is data with security risks;
  • the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node;
  • the target node is analyzed based on the node representation of the target node.
  • the present application further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the node analysis method based on the threat analysis graph provided by the above methods, the method comprising: extracting target data from source data, and using the target data as a seed node; the target data is data with security risks;
  • the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node;
  • the target node is analyzed based on the node representation of the target node.
  • the device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
  • each implementation method can be implemented by means of software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
  • the above technical solution in essence, or the part that contributes to the prior art, can be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in certain parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

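Embodiments of the present application provide a node analysis method and apparatus based on a threat analysis graph, relating to the field of network security technology. The method includes: extracting target data from source data and using the target data as a seed node, the target data being data with security risks; extracting, from a threat analysis graph stored in a graph database, target subgraph data associated with the seed node; determining a node representation of a target node in the target subgraph data, the node representation of the target node including node data of the target node and node data of neighboring nodes of the target node; and analyzing the target node based on the node representation of the target node. The present application determines node representations only for target nodes in the target subgraph data associated with the seed node, so there is no need to compute and analyze all graph data in the threat analysis graph, which improves data analysis efficiency.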

Description

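Node analysis method and apparatus based on a threat analysis graph
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202211600664.2, entitled "Node analysis method and apparatus based on a threat analysis graph" and filed on December 12, 2022, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of network security technology, and in particular to a node analysis method and apparatus based on a threat analysis graph.
Background
In the field of network security, the activities of Advanced Persistent Threat (APT) organizations are highly covert. However, the network traffic controlled by APT organizations can be captured through network-layer detection, so the attack behavior of APT organizations can be analyzed based on network-layer detection.
In the related art, log data is usually obtained through network-layer detection and then analyzed to derive threat intelligence from massive volumes of log data.
However, although the log data obtained in the related art is abundant, it is also highly redundant, so analyzing massive log data directly reduces the efficiency of data analysis.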
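Summary
To address the problems in the prior art, embodiments of the present application provide a node analysis method and apparatus based on a threat analysis graph.
Specifically, the embodiments of the present application provide the following technical solutions:
In a first aspect, an embodiment of the present application provides a node analysis method based on a threat analysis graph, including:
extracting target data from source data, and using the target data as a seed node, the target data being data with security risks;
extracting, from a threat analysis graph stored in a graph database, target subgraph data associated with the seed node;
determining a node representation of a target node in the target subgraph data, the node representation of the target node including node data of the target node and node data of neighboring nodes of the target node;
analyzing the target node based on the node representation of the target node.
Further, extracting, from the threat analysis graph stored in the graph database, the target subgraph data associated with the seed node includes:
searching the threat analysis graph for target associated data within a preset number of hops of the seed node, the target associated data including node data and edge data;
combining the node data of the seed node with the target associated data to obtain the target subgraph data.
Further, determining the node representation of the target node in the target subgraph data includes:
determining a graph embedding vector of each node in the target subgraph data;
determining the node representation of the target node based on the graph embedding vector of each node.
Further, determining the graph embedding vector of each node in the target subgraph data includes:
obtaining a current business scenario;
when the current business scenario includes a scenario of searching for structurally similar nodes, determining the graph embedding vector of each node in the target subgraph data based on a structural similarity algorithm;
when the current business scenario includes a scenario of searching for content-similar nodes, determining the graph embedding vector of each node in the target subgraph data based on a content similarity algorithm.
Further, determining the node representation of the target node based on the graph embedding vector of each node includes:
inputting the graph embedding vector of each node into a target graph neural network model to obtain the node representation of the target node output by the target graph neural network model;
wherein the target graph neural network model is trained on graph embedding vector samples of multiple nodes.
Further, the target graph neural network model includes a collection module and an aggregation module;
inputting the graph embedding vector of each node into the target graph neural network model to obtain the node representation of the target node output by the target graph neural network model includes:
inputting the graph embedding vector of each node into the collection module, collecting, by the collection module, the node data of each neighboring node of the target node from the graph embedding vectors of the nodes, and sending the node data of each neighboring node and the node data of the target node to the aggregation module;
aggregating, by the aggregation module, the node data of each neighboring node and the node data of the target node to obtain node aggregation information;
determining the node aggregation information as the node representation of the target node.
Further, analyzing the target node based on the node representation of the target node includes:
determining a threat risk coefficient of the target node based on the node representation of the target node, the threat risk coefficient indicating the risk level of the target node;
when it is determined that the threat risk coefficient of the target node is greater than a preset coefficient value, storing the node data of the target node in an indicator of compromise (IOC) graph database, or issuing an alarm for the target node, or displaying the node data of the target node and the node data of the associated nodes of the target node.
Further, analyzing the target node based on the node representation of the target node includes:
comparing the node representation of the target node with the node representations of other nodes to determine nodes similar to the target node.
In a second aspect, an embodiment of the present application further provides a node analysis apparatus based on a threat analysis graph, including:
a first extraction unit, configured to extract target data from source data and use the target data as a seed node, the target data being data with security risks;
a second extraction unit, configured to extract, from a threat analysis graph stored in a graph database, target subgraph data associated with the seed node;
a determination unit, configured to determine a node representation of a target node in the target subgraph data, the node representation of the target node including node data of the target node and node data of neighboring nodes of the target node;
an analysis unit, configured to analyze the target node based on the node representation of the target node.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the node analysis method based on a threat analysis graph according to the first aspect.
In a fourth aspect, an embodiment of the present application further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the node analysis method based on a threat analysis graph according to the first aspect.
In a fifth aspect, an embodiment of the present application further provides a computer program product storing executable instructions which, when executed by a processor, cause the processor to implement the steps of the node analysis method based on a threat analysis graph according to the first aspect.
In the node analysis method and apparatus based on a threat analysis graph provided by the embodiments of the present application, target data with security risks extracted from source data is used as a seed node; target subgraph data associated with the seed node is extracted from the threat analysis graph; the node representation of a target node in the target subgraph data is determined, the node representation including the node data of the target node and the node data of its neighboring nodes; and finally the target node is analyzed based on its node representation. Because node representations are determined only for target nodes in the target subgraph data associated with the seed node, there is no need to compute and analyze all graph data in the threat analysis graph, which improves data analysis efficiency.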
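Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a first schematic flowchart of the node analysis method based on a threat analysis graph according to an embodiment of the present application;
FIG. 2 is a schematic diagram of target subgraph data extraction according to an embodiment of the present application;
FIG. 3 is a second schematic flowchart of the node analysis method based on a threat analysis graph according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of the initial autoencoder model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of converting target subgraph data into node representations of target nodes according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of the node analysis system based on a threat analysis graph according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of the node analysis apparatus based on a threat analysis graph according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the physical structure of the electronic device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.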
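FIG. 1 is a first schematic flowchart of the node analysis method based on a threat analysis graph according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step 101: extract target data from source data, and use the target data as a seed node; the target data is data with security risks.
The source data may be sandbox data, web crawler data, indicator of compromise (IOC) data, or the like. An IOC is a kind of threat intelligence, namely intelligence on the remote command-and-control servers that an attacker uses to control compromised hosts. IOCs typically include domain names (domain), Internet Protocol (IP) addresses, uniform resource locators (URL), Secure Sockets Layer (SSL) certificates, hashes (HASH), and so on.
For example, cyberspace produces massive amounts of data every day, including but not limited to network behavior data generated when malicious samples run in a sandbox, threat risk data crawled from the Internet by web crawlers, and threat intelligence data from open-source security reports. Multiple sources are collected on a schedule, and target data with security risks, such as domains, URLs and IPs, is extracted from the source data as seed nodes.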
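As a rough illustration of step 101, the sketch below pulls candidate URLs, IPv4 addresses and domain names out of free-text source records and returns them as labeled seed nodes. The regular expressions and the function name `extract_seed_nodes` are assumptions for illustration; the application does not specify how target data is recognized, and a real pipeline would also filter or score candidates before treating them as risky.

```python
import re

# Illustrative patterns only (assumption): production IOC extraction would also
# handle defanged indicators such as hxxp or [.], IPv6, and TLD validation.
URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b", re.IGNORECASE
)


def extract_seed_nodes(records):
    """Return sorted (label, value) seed nodes found in raw text records."""
    seeds = set()
    for text in records:
        for url in URL_RE.findall(text):
            seeds.add(("URL", url))
        # Blank out URLs so their host names are not also counted as bare domains.
        rest = URL_RE.sub(" ", text)
        for ip in IPV4_RE.findall(rest):
            seeds.add(("IP", ip))
        for domain in DOMAIN_RE.findall(rest):
            seeds.add(("domain", domain.lower()))
    return sorted(seeds)
```

For example, a sandbox line such as "sample beaconed to http://evil.example.com/payload.exe and 203.0.113.7" yields one URL seed and one IP seed; the URL's host is not repeated as a separate domain seed because URLs are blanked out before the domain pattern runs.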
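Step 102: extract, from the threat analysis graph stored in the graph database, target subgraph data associated with the seed node.
The graph database may be NebulaGraph, a distributed graph database, which stores a threat analysis graph on the scale of tens of billions of elements. The threat analysis graph contains multiple nodes and multiple edges, where a node represents an entity and an edge represents an association between two entities. An undirected data association graph is first determined from the data types of the graph data (node types and edge types) and the associations between graph data; the threat analysis graph is then built on this basis by adding the direction of each association, and is put into practical use. Such a relationship network is highly flexible and can present heterogeneous information in a unified view. Through the built-in services provided by NebulaGraph, graph data and the associations among graph data can be queried according to different rules. Node types include but are not limited to V(IP), V(domain) and V(URL), where V denotes a node (vertex); edge types include but are not limited to E(connect), E(release), E(download) and E(delivery), where E denotes an edge. When a data item initiates a network connection, it may connect to an IP, a domain or a URL, and this relationship is connect; one data item may be used to release a file, and the relationship is release; one data item may be used to download a file, and the relationship is download; and an IP, domain or URL may be used to distribute malicious files, and the relationship is delivery. All of these data types and associations can be derived from threat intelligence in a specified network environment, so the threat analysis graph can be used directly in threat intelligence analysis. Moreover, because the threat analysis graph is suited to the user's network environment, threat intelligence suited to that environment can be obtained from it more easily and conveniently.
For example, association analysis of nodes in cyberspace data does not require full-graph analysis: the associated data within a few hops of the seed node already covers sufficient information. Using the vertex and edge query services provided by NebulaGraph, the subgraph extraction module can flexibly extract target subgraph data of different sizes associated with the seed node. The extracted target subgraph data is saved as vertex data and edge data, for example in JSON format.
The main fields used in the vertex data include but are not limited to: a field identifying the content of the node data, a field indicating the node type, and a field giving the node's unique identifier in the graph data. For example, a node record may be {"name":"b**du.com","label":"domain","vertexId":"0005d1b1f7fde4c98455d29ece315570"}, where the field name stores the node data, the field label indicates that the node type is domain, and the field vertexId indicates that the hash value of the node is 0005d1b1f7fde4c98455d29ece315570.
The main fields used in the edge data include but are not limited to: fields giving the unique identifiers of the endpoint nodes in the graph data and a field indicating the edge type. For example, an edge record may be {"srcId":"2c238667ca0068cead9c529e06b8675d","dstId":"d878b8a1a12e3920a6a713f12a3d18e2","label":"contain"}. The field srcId indicates that the hash value of node 1 is 2c238667ca0068cead9c529e06b8675d, the field dstId indicates that the hash value of node 2 is d878b8a1a12e3920a6a713f12a3d18e2, and the field label indicates that the edge type is contain. The edge points from the node identified by srcId to the node identified by dstId.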
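The vertex and edge records above can be loaded with a few lines of standard-library Python. This sketch assumes one JSON object per line (the text only says the data may be stored as JSON) and uses the field names `name`, `label`, `vertexId`, `srcId` and `dstId` shown in the examples; `load_subgraph` is a hypothetical helper name.

```python
import json
from collections import defaultdict


def load_subgraph(vertex_lines, edge_lines):
    """Parse JSON-lines vertex and edge records into lookup structures.

    Returns (vertices, out_edges): vertices maps vertexId -> {"name", "label"};
    out_edges maps srcId -> list of (dstId, edge label), keeping edge direction.
    """
    vertices = {}
    for line in vertex_lines:
        record = json.loads(line)
        vertices[record["vertexId"]] = {"name": record["name"], "label": record["label"]}
    out_edges = defaultdict(list)
    for line in edge_lines:
        record = json.loads(line)
        # Direction is srcId -> dstId, as described for the edge records.
        out_edges[record["srcId"]].append((record["dstId"], record["label"]))
    return vertices, dict(out_edges)
```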
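Step 103: determine the node representation of the target node in the target subgraph data; the node representation of the target node includes the node data of the target node and the node data of the neighboring nodes of the target node.
For example, once the target subgraph data is extracted, the neighboring nodes of each target node are determined in the target subgraph data, and the node data of the target node is aggregated with the node data of its neighboring nodes to obtain the node representation of the target node. There may be one or more target nodes; the specific number can be determined according to actual needs.
Step 104: analyze the target node based on the node representation of the target node.
For example, once the node representation of each target node is obtained, threat analysis, similarity analysis and the like can be performed based on it.
In the node analysis method based on a threat analysis graph provided by this embodiment, target data with security risks extracted from source data is used as a seed node, target subgraph data associated with the seed node is extracted from the threat analysis graph, and the node representation of a target node in the target subgraph data is determined, the node representation including the node data of the target node and the node data of its neighboring nodes; finally, the target node is analyzed based on its node representation. Because node representations are determined only for target nodes in the target subgraph data associated with the seed node, there is no need to compute and analyze all graph data in the threat analysis graph, which improves data analysis efficiency.
In an embodiment, step 102 may be implemented as follows:
searching the threat analysis graph for target associated data within a preset number of hops of the seed node, the target associated data including node data and edge data;
combining the node data of the seed node with the target associated data to obtain the target subgraph data.
The preset number of hops may be 1, 2, 3, etc., and can be set as required.
For example, using the vertex and edge query services provided by the graph database NebulaGraph, node data and edge data of different sizes associated with the seed node can be extracted from NebulaGraph, and the node data of the seed node is then combined with the associated node data and edge data to obtain the target subgraph data; the size of the target subgraph data is determined by the preset number of hops. FIG. 2 is a schematic diagram of target subgraph data extraction according to an embodiment of the present application. As shown in FIG. 2, seed nodes 202 are extracted from source data 201, and the subgraph extraction module 203 extracts target subgraph data 205 from the threat analysis graph in the graph database 204 based on the seed nodes 202. In FIG. 2, the source data 201 may be sandbox data, crawler data or IOC data, and the seed nodes 202 are exemplified by node A, node B, node C, node E and node F.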
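The hop-limited extraction can be sketched as a breadth-first search over an in-memory edge list. In the described system this query is served by NebulaGraph's own vertex and edge query services; the pure-Python `k_hop_subgraph` below is only a hypothetical stand-in showing which nodes and edges a preset hop count selects. Ignoring edge direction during traversal and returning the induced subgraph are assumptions.

```python
from collections import deque


def k_hop_subgraph(edges, seeds, hops):
    """Return (nodes, edges) of the subgraph induced by all nodes within `hops` of a seed.

    edges: iterable of (src, dst, label). Traversal ignores direction, but the
    returned edges keep their original orientation and label.
    """
    edges = list(edges)
    neighbors = {}
    for src, dst, _ in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    depth = {seed: 0 for seed in seeds}  # seed nodes are always part of the result
    queue = deque(depth)
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand past the preset number of hops
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    kept = set(depth)
    sub_edges = [e for e in edges if e[0] in kept and e[1] in kept]
    return kept, sub_edges
```

Returning the induced subgraph also keeps edges between two boundary nodes; a stricter variant would keep only the edges actually traversed.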
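The node analysis method based on a threat analysis graph provided by this embodiment extracts the target subgraph data associated with the seed node through the vertex and edge query services provided by the graph database NebulaGraph, which makes extraction convenient.
In an embodiment, FIG. 3 is a second schematic flowchart of the node analysis method based on a threat analysis graph according to an embodiment of the present application. As shown in FIG. 3, step 103 may be implemented by the following steps:
Step 1031: determine the graph embedding vector of each node in the target subgraph data.
Optionally, determining the graph embedding vector of each node in the target subgraph data may be implemented as follows:
obtaining the current business scenario;
when the current business scenario includes a scenario of searching for structurally similar nodes, determining the graph embedding vector of each node in the target subgraph data based on a structural similarity algorithm;
when the current business scenario includes a scenario of searching for content-similar nodes, determining the graph embedding vector of each node in the target subgraph data based on a content similarity algorithm.
For example, the target subgraph data consists of edge data and node data, and the network relationships in it are non-Euclidean data that are inconvenient to process and compute on directly. Euclidean space, by contrast, is a vector space with a richer set of tools. Graph embedding is the process of mapping graph data to low-dimensional dense vectors; it addresses the difficulty of feeding graph data efficiently into machine learning algorithms and allows computation in Euclidean space. Graph embeddings are more practical than adjacency matrices, because they pack node attributes into vectors of much smaller dimension, and vector operations are simpler and faster than operations on graphs. The purpose of graph embedding is to represent nodes and edges as vectors; that is, graph embedding converts the node data of each node in the target subgraph data into a corresponding graph embedding vector. Graph embeddings capture the topology of the target subgraph data, and embedding and encoding more attributes can yield better results in downstream tasks. Specifically, the algorithm can be chosen according to the business scenario: in a scenario of searching for structurally similar nodes, a structural similarity algorithm can be used to determine the graph embedding vector of each node in the target subgraph data; in a scenario of searching for content-similar nodes, a content similarity algorithm can be used.
The content similarity algorithm is an algorithm for learning graph embedding representations of the nodes and relations in a graph structure, including but not limited to TransE, and can be widely applied to subsequent graph-based tasks. A piece of content can be represented as a triple (srcId, label, dstId); for example, a triple can be expressed as {"srcId":"2c238667ca0068cead9c529e06b8675d","dstId":"d878b8a1a12e3920a6a713f12a3d18e2","label":"contain"}. In this triple, srcId and dstId are both nodes and can be represented by the hash values (MD5) of the nodes in the target subgraph data, while contain is a relation, represented by an edge in the target subgraph data. The dimension of the graph embedding vector is usually between 64 and 512 and can be chosen flexibly according to the actual performance of downstream tasks and business needs.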
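For the content-similarity branch, a minimal TransE sketch in NumPy follows. It uses the standard TransE idea that a true triple (head, relation, tail) should satisfy e_head + r ≈ e_tail, with tails corrupted at random to form negatives. The function names, the hinge loss on squared L2 distances, the hyperparameters and the mapping of hashed node IDs to integer indices are all assumptions; the application only names TransE as one possible algorithm.

```python
import numpy as np


def transe_score(E, R, h, r, t):
    """TransE dissimilarity ||e_h + r - e_t||_2; lower means more plausible."""
    return float(np.linalg.norm(E[h] + R[r] - E[t]))


def train_transe(triples, n_entities, n_relations, dim=64, margin=1.0,
                 lr=0.01, epochs=100, seed=0):
    """Minimal SGD training with a margin ranking loss and random tail corruption.

    triples: list of (head_idx, relation_idx, tail_idx) integer triples.
    """
    rng = np.random.default_rng(seed)
    bound = 6.0 / np.sqrt(dim)
    E = rng.uniform(-bound, bound, (n_entities, dim))
    R = rng.uniform(-bound, bound, (n_relations, dim))
    R /= np.linalg.norm(R, axis=1, keepdims=True)
    for _ in range(epochs):
        E /= np.linalg.norm(E, axis=1, keepdims=True)  # keep entities on the unit sphere
        for h, r, t in triples:
            t_neg = int(rng.integers(n_entities))  # corrupted tail (may rarely equal t)
            pos = E[h] + R[r] - E[t]
            neg = E[h] + R[r] - E[t_neg]
            # Hinge on squared distances: margin + ||pos||^2 - ||neg||^2.
            if margin + pos @ pos - neg @ neg > 0:
                E[h] -= lr * 2 * (pos - neg)
                R[r] -= lr * 2 * (pos - neg)
                E[t] += lr * 2 * pos
                E[t_neg] -= lr * 2 * neg
    return E, R
```

In practice the MD5 `vertexId` strings and edge labels would first be mapped to contiguous integer indices, and `dim` would be chosen in the 64 to 512 range mentioned above.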
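The structural similarity algorithm works as follows: edge-type statistics are computed for each node in the target subgraph data and input into a target autoencoder model, which outputs the graph embedding vector of each node. The target autoencoder model is trained on edge-type statistics samples of the nodes in graph structure samples.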
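The structural branch first needs the edge-type statistics that feed the autoencoder. The application does not define these statistics precisely; the sketch below assumes they are per-node counts of each edge type, split into outgoing and incoming edges, over a fixed edge-type ordering. `edge_type_stats` is a hypothetical name.

```python
from collections import Counter


def edge_type_stats(edges, edge_types):
    """Per-node edge-type count vector: out-degree per type, then in-degree per type.

    edges: iterable of (src, dst, label); edge_types: fixed ordering of labels.
    """
    out_counts, in_counts = Counter(), Counter()
    nodes = set()
    for src, dst, label in edges:
        out_counts[(src, label)] += 1
        in_counts[(dst, label)] += 1
        nodes.update((src, dst))
    # Missing (node, type) pairs read as 0 from a Counter.
    return {
        n: [out_counts[(n, t)] for t in edge_types] + [in_counts[(n, t)] for t in edge_types]
        for n in sorted(nodes)
    }
```

Because the vector only counts relationship types, two nodes with different content but the same pattern of connections get identical inputs, which is what makes the resulting embedding structural rather than content-based.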
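Specifically, the target autoencoder model is trained as follows: a large number of graph structure samples are obtained, and the edge-type statistics sample of each node in each graph structure sample is determined. The edge-type statistics samples are input into a pre-created initial autoencoder model, which performs feature analysis on them and outputs edge-type statistics predictions. A loss function is then constructed from the predictions and the samples, and the model parameters of the initial autoencoder model are optimized based on the loss function until a convergence condition is met, completing training. In other words, a trained model is obtained through inductive learning, and the part of the trained model from the input layer to the middle hidden layer is taken as the target autoencoder model. FIG. 4 is a schematic structural diagram of the initial autoencoder model according to an embodiment of the present application. As shown in FIG. 4, layer 1 is the input layer, layer 2 is the middle hidden layer, and layer 3 is the output layer; input layer 1 and middle hidden layer 2, that is, the part inside the dashed box, are used as the target autoencoder model.
It should be noted that the initial autoencoder model may be a three-layer deep neural network (DNN); the number of layers can be increased, or another network structure can be used. Vector dimensionality reduction (such as PCA) or other encoding techniques may also be used; the present application does not limit this.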
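The training and truncation described above can be sketched with a three-layer autoencoder in plain NumPy: train input, hidden and output layers to reconstruct the edge-type statistics, then keep only the input-to-hidden half as the encoder. The tanh activation, mean-squared-error loss, full-batch gradient descent, the `log1p` scaling of counts, and all sizes are assumptions chosen for a small runnable example, not values from the application.

```python
import numpy as np


def train_autoencoder(X, hidden=8, lr=0.05, epochs=500, seed=0):
    """Train a 3-layer autoencoder (input -> tanh hidden -> linear output) with MSE.

    Returns (encode, losses); `encode` keeps only the input-to-hidden half,
    analogous to the dashed-box encoder of FIG. 4.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0.0, 0.1, (hidden, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)   # hidden layer (the embedding)
        Y = H @ W2 + b2            # reconstruction of the input statistics
        err = Y - X
        losses.append(float(np.mean(err ** 2)))
        dY = 2.0 * err / err.size  # gradient of the mean squared error
        dW2, db2 = H.T @ dY, dY.sum(axis=0)
        dZ = (dY @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        dW1, db1 = X.T @ dZ, dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses
```

`encode(X)` then yields one vector per node whose dimension equals the hidden (encoding) layer size; for large subgraphs the rows of `X` can simply be passed through `encode` in batches.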
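It should be noted that when there are too many edge-type statistics for the nodes in the target subgraph data, they can be fed into the target autoencoder model in batches for prediction. After prediction, each node in the target subgraph data corresponds to one graph embedding vector whose dimension equals the dimension of the encoding layer of the target autoencoder model or other encoding structure.
Step 1032: determine the node representation of the target node based on the graph embedding vector of each node.
Optionally, the graph embedding vector of each node is input into a target graph neural network model to obtain the node representation of the target node output by the target graph neural network model.
The target graph neural network model is trained on graph embedding vector samples of multiple nodes.
For example, much real-world data takes the form of graphs. Graph neural network models are a newer family of machine learning models that have been shown to make full use of the structural information in graph data, and they have enabled practical solutions to many real problems, such as node classification, similarity detection of malicious samples, malware detection and fraud detection. The message-passing paradigm updates a central node's information by aggregating the information of its adjacent nodes; it generalizes the convolution operator to irregular data and thereby connects graphs with neural networks. Because it is simple and powerful, the message-passing paradigm is widely used. The present application determines the node representation of the target node based on the graph embedding vector of each node and the target graph neural network model.
In an embodiment, the target graph neural network model includes a collection module and an aggregation module, and inputting the graph embedding vector of each node into the target graph neural network model to obtain the node representation of the target node may be implemented as follows:
inputting the graph embedding vector of each node into the collection module, collecting, by the collection module, the node data of each neighboring node of the target node from the graph embedding vectors of the nodes, and sending the node data of each neighboring node and the node data of the target node to the aggregation module;
aggregating, by the aggregation module, the node data of each neighboring node and the node data of the target node to obtain node aggregation information;
determining the node aggregation information as the node representation of the target node.
The target graph neural network model may have several mainstream graph neural network algorithms built in to meet the needs of different security scenarios, including but not limited to GraphSAGE. GraphSAGE is used as an example below.
GraphSAGE is a graph neural network algorithm that addresses the limitations of the graph convolutional network (GCN). Training a GCN requires the adjacency matrix of the entire graph and depends on the specific graph structure, so a GCN can generally only be used for transductive learning. GraphSAGE uses multiple layers of aggregation functions; each layer aggregates the information of a node and its neighbors to produce the feature vector for the next layer. Because GraphSAGE uses a node's neighborhood information, it does not depend on the global graph structure. GraphSAGE includes a sampling module and an aggregation module: it first samples neighboring nodes using the connection information between nodes, then repeatedly aggregates the information of adjacent nodes through multiple layers of aggregation functions to obtain node aggregation information, which is taken as the node representation of the target node. The aggregation function may be any one of the following: a mean aggregator, a GCN aggregator, an LSTM aggregator, or a pooling aggregator.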
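A single GraphSAGE layer with the mean aggregator can be written in a few lines: sample up to a fixed number of neighbors, average their vectors, concatenate the result with the node's own vector, apply a weight matrix and a nonlinearity, and L2-normalize. This is a forward-pass sketch only; the weight matrix would come from training, and the sampling is a simplification of GraphSAGE's fixed-size sampling. `sage_mean_layer` and its parameters are hypothetical names.

```python
import numpy as np


def sage_mean_layer(h, neighbors, W, sample_size=None, rng=None):
    """One GraphSAGE layer with the mean aggregator.

    h: dict node -> feature vector of length d; neighbors: dict node -> list of nodes;
    W: array of shape (2 * d, d_out) applied to [h_v || mean of sampled neighbors].
    """
    if rng is None:
        rng = np.random.default_rng(0)
    out = {}
    for v, hv in h.items():
        hv = np.asarray(hv, dtype=float)
        nbrs = list(neighbors.get(v, []))
        if sample_size is not None and len(nbrs) > sample_size:
            # Simplified uniform sampling: no replacement, and nodes with
            # fewer neighbors than sample_size keep all of them.
            picked = rng.choice(len(nbrs), size=sample_size, replace=False)
            nbrs = [nbrs[i] for i in picked]
        if nbrs:
            h_nbr = np.mean([np.asarray(h[u], dtype=float) for u in nbrs], axis=0)
        else:
            h_nbr = np.zeros_like(hv)
        z = np.maximum(np.concatenate([hv, h_nbr]) @ W, 0.0)  # ReLU(W^T [h_v || h_N(v)])
        norm = np.linalg.norm(z)
        out[v] = z / norm if norm > 0 else z  # L2 normalization of the output
    return out
```

Stacking two such layers, each fed the previous layer's output, gives every node information from its 2-hop neighborhood.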
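FIG. 5 is a schematic diagram of converting target subgraph data into node representations of target nodes according to an embodiment of the present application. As shown in FIG. 5, the target subgraph data 501 includes nodes A, B, C, D, E and F, connected as shown in the figure. FIG. 5 illustrates one round of passing neighbor information to a target node. The neighbors of node B are nodes A and C: the node data of A and C is linearly transformed and aggregated into node B, and the node data of B together with the transformed node data of A and C is linearly transformed again to obtain the node aggregation information of node B. The neighbors of node C are nodes A, B, E and F: their node data is linearly transformed and aggregated into node C, and the node data of C together with the transformed node data of A, B, E and F is linearly transformed again to obtain the node aggregation information of node C. The neighbor of node D is node A: the node data of A is linearly transformed and aggregated into node D, and the node data of D together with the transformed node data of A is linearly transformed again to obtain the node aggregation information of node D.
This process of passing neighbor information to a target node is performed multiple times. The neighbors of node A (nodes B, C and D) have each already been updated once; their updated node information is linearly transformed, aggregated and linearly transformed again to produce the node aggregation information of node A, which is used as the node representation of node A.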
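The effect of repeating the propagation can be checked by tracking which nodes' original data can reach each node after k rounds. The edge list below is reconstructed from the neighbor sets stated for FIG. 5 (B: A, C; C: A, B, E, F; D: A) and may omit edges that are drawn in the figure but not mentioned in the text; `receptive_fields` and `FIG5_EDGES` are hypothetical names.

```python
def receptive_fields(edges, rounds):
    """Which nodes' initial data can reach each node after `rounds` of message passing."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    fields = {v: {v} for v in neighbors}
    for _ in range(rounds):
        # Each round, a node absorbs what its neighbors had absorbed in the previous round.
        fields = {v: fields[v].union(*(fields[u] for u in neighbors[v])) for v in neighbors}
    return fields


# Undirected edges reconstructed from the neighbor lists given for FIG. 5 (assumption).
FIG5_EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "E"), ("C", "F")]
```

After one round node A only sees B, C and D; after two rounds it also sees E and F through C, which is why node A's representation is computed from neighbors that have already been updated once.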
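It should be noted that the target graph neural network model may be trained as follows: graph embedding vector samples of multiple nodes are input into an initial graph neural network model, which may use the GraphSAGE algorithm. The initial model collects the node data of the neighbors of each sample node and aggregates the node data of the sample node and its neighbors with an aggregation function to obtain the node representation of the sample node. A loss function is constructed from the node representation and the graph embedding vector of the sample node, and the initial model is optimized based on the loss function until a convergence condition is met, yielding the target graph neural network model.
The node analysis method based on a threat analysis graph provided by this embodiment determines the node representation of the target node based on the graph embedding vector of each node and the target graph neural network model, adding information about the target node's neighbors to the target node. The node representation therefore carries more information, which improves the accuracy of subsequent analysis based on it.
In an embodiment, step 104 may be implemented as follows:
determining a threat risk coefficient of the target node based on the node representation of the target node, the threat risk coefficient indicating the risk level of the target node;
when it is determined that the threat risk coefficient of the target node is greater than a preset coefficient value, storing the node data of the target node in the IOC graph database, or issuing an alarm for the target node, or displaying the node data of the target node and the node data of the associated nodes of the target node.
For example, the goal is a highly automated platform and toolchain: a unified platform that can ingest massive heterogeneous multi-source data and use the target graph neural network model to detect, analyze and track threat events. Once the node representation of the target node output by the target graph neural network model is obtained, a graph neural network analyzes the node representation to obtain the threat risk coefficient of the target node, which is then compared with the preset coefficient value. If the threat risk coefficient of the target node is greater than the preset coefficient value, the target node is a risk node. In that case, the node data of the target node can be treated as IOC data and stored in the IOC graph database so that security experts can view it there; alternatively, an alarm can be issued for the target node, providing early warning of risk nodes; in addition, the node data of the target node and of its neighbors can be displayed visually to assist security experts in operations, analysis and countermeasures.
It should be noted that when the threat risk coefficient of the target node is determined to be greater than the preset coefficient value, the target node may additionally be reviewed and analyzed manually, and the node data of the target node is stored in the IOC graph database only after an analyst confirms that the target node is a high-risk node.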
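The decision logic of this embodiment can be sketched as a scoring head plus a threshold check. The application does not say how the graph neural network turns a node representation into a threat risk coefficient; the sketch assumes a logistic-regression head with already-trained weights, a coefficient in the range 0 to 1, and an illustrative threshold of 0.8. `RiskRouter`, its fields and the status strings are hypothetical, and the in-memory lists stand in for the IOC graph database and the alerting channel.

```python
import math
from dataclasses import dataclass, field


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


@dataclass
class RiskRouter:
    weights: list            # hypothetical classifier-head weights over the node representation
    bias: float
    threshold: float = 0.8   # stands in for the "preset coefficient value"
    require_review: bool = False
    ioc_store: list = field(default_factory=list)  # stand-in for the IOC graph database
    alerts: list = field(default_factory=list)     # stand-in for the alerting channel

    def risk(self, rep):
        """Threat risk coefficient in (0, 1) for one node representation."""
        return sigmoid(sum(w * x for w, x in zip(self.weights, rep)) + self.bias)

    def handle(self, node_id, node_data, rep):
        score = self.risk(rep)
        if score <= self.threshold:
            return score, "ignored"
        if self.require_review:
            return score, "pending_review"  # analyst confirms before anything is stored
        self.ioc_store.append(node_data)
        self.alerts.append(f"risk node {node_id}: coefficient {score:.2f}")
        return score, "stored_and_alerted"
```

The sketch stores and alerts together for brevity; the text presents storing, alerting and displaying as alternatives, and `require_review=True` models the optional manual confirmation before storage.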
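The node analysis method based on a threat analysis graph provided by this embodiment can use the target graph neural network model to continuously monitor the massive data generated every day, enabling prediction of unknown risk nodes and early warning of risk nodes. It can also display the node data of the target node and of its neighbors to assist security experts in operations, analysis and countermeasures.
In an embodiment, step 104 may be implemented as follows:
comparing the node representation of the target node with the node representations of other nodes to determine nodes similar to the target node.
For example, node representations of multiple nodes can be obtained, and the node representation of the target node can be compared with those of other nodes for similarity to determine nodes similar to the target node. In this way, if the target node is determined to be a risk node, nodes similar to it can also be regarded as risk nodes.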
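One simple way to realize the comparison is cosine similarity between representation vectors, ranked to return the closest nodes. Cosine similarity and the name `most_similar` are assumptions; the application does not fix the similarity measure, and at scale an approximate nearest-neighbor index would typically replace the brute-force matrix product used here.

```python
import numpy as np


def most_similar(target, reps, k=3):
    """Rank other nodes by cosine similarity to `target`'s representation.

    reps: dict node -> representation vector. Returns [(node, similarity), ...].
    """
    names = [n for n in reps if n != target]
    M = np.array([reps[n] for n in names], dtype=float)
    q = np.asarray(reps[target], dtype=float)
    # Small epsilon avoids division by zero for all-zero vectors.
    sims = M @ q / (np.linalg.norm(M, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(-sims)[:k]
    return [(names[i], float(sims[i])) for i in order]
```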
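The node analysis method based on a threat analysis graph provided by this embodiment can use the target graph neural network model to continuously monitor the massive data generated every day and enables the search for similar nodes.
FIG. 6 is a schematic structural diagram of the node analysis system based on a threat analysis graph according to an embodiment of the present application. The system may be deployed on the server side. As shown in FIG. 6, it includes a graph data storage module 601, a subgraph extraction module 602, a graph embedding module 603, a graph computation module 604, a data post-processing module 605 and a data acquisition module 606. The data acquisition module 606 collects source data; the graph data storage module 601 stores the threat analysis graph and provides query services; the subgraph extraction module 602 extracts target subgraph data from the threat analysis graph based on seed nodes; the graph embedding module 603 determines the graph embedding vector of each node in the target subgraph data; the graph computation module 604 determines the node representation of the target node based on the target graph neural network model and the graph embedding vector of each node; and the data post-processing module 605 performs threat analysis on the target node based on its node representation, determines nodes similar to the target node based on the node representations of the target node and other nodes, and, when the target node is determined to be a risk node, displays the node data of the target node and of its associated nodes.
Based on the threat analysis graph and combined with the network infrastructure used by APT organizations, the node analysis method provided by this embodiment performs association analysis on massive heterogeneous multi-source data to compute unknown risk nodes, provide early warning of risk nodes and search for similar nodes.
FIG. 7 is a schematic structural diagram of the node analysis apparatus based on a threat analysis graph according to an embodiment of the present application. As shown in FIG. 7, the node analysis apparatus 700 includes a first extraction unit 701, a second extraction unit 702, a determination unit 703 and an analysis unit 704, where:
the first extraction unit 701 is configured to extract target data from source data and use the target data as a seed node, the target data being data with security risks;
the second extraction unit 702 is configured to extract, from a threat analysis graph stored in a graph database, target subgraph data associated with the seed node;
the determination unit 703 is configured to determine a node representation of a target node in the target subgraph data, the node representation of the target node including node data of the target node and node data of neighboring nodes of the target node;
the analysis unit 704 is configured to analyze the target node based on the node representation of the target node.
In the node analysis apparatus based on a threat analysis graph provided by this embodiment, target data with security risks extracted from source data is used as a seed node, target subgraph data associated with the seed node is extracted from the threat analysis graph, and the node representation of a target node in the target subgraph data is determined, the node representation including the node data of the target node and the node data of its neighboring nodes; finally, the target node is analyzed based on its node representation. Because node representations are determined only for target nodes in the target subgraph data associated with the seed node, there is no need to compute and analyze all graph data in the threat analysis graph, which improves data analysis efficiency.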
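Based on any of the above embodiments, the second extraction unit 702 is specifically configured to:
search the threat analysis graph for target associated data within a preset number of hops of the seed node, the target associated data including node data and edge data;
combine the node data of the seed node with the target associated data to obtain the target subgraph data.
Based on any of the above embodiments, the determination unit 703 is specifically configured to:
determine a graph embedding vector of each node in the target subgraph data;
determine the node representation of the target node based on the graph embedding vector of each node.
Based on any of the above embodiments, the determination unit 703 is further configured to:
obtain the current business scenario;
when the current business scenario includes a scenario of searching for structurally similar nodes, determine the graph embedding vector of each node in the target subgraph data based on a structural similarity algorithm;
when the current business scenario includes a scenario of searching for content-similar nodes, determine the graph embedding vector of each node in the target subgraph data based on a content similarity algorithm.
Based on any of the above embodiments, the determination unit 703 is further configured to:
input the graph embedding vector of each node into a target graph neural network model to obtain the node representation of the target node output by the target graph neural network model;
wherein the target graph neural network model is trained on graph embedding vector samples of multiple nodes.
Based on any of the above embodiments, the target graph neural network model includes a collection module and an aggregation module;
the determination unit 703 is further configured to:
input the graph embedding vector of each node into the collection module, collect, by the collection module, the node data of each neighboring node of the target node from the graph embedding vectors of the nodes, and send the node data of each neighboring node and the node data of the target node to the aggregation module;
aggregate, by the aggregation module, the node data of each neighboring node and the node data of the target node to obtain node aggregation information;
determine the node aggregation information as the node representation of the target node.
Based on any of the above embodiments, the analysis unit 704 is specifically configured to:
determine a threat risk coefficient of the target node based on the node representation of the target node, the threat risk coefficient indicating the risk level of the target node;
when it is determined that the threat risk coefficient of the target node is greater than a preset coefficient value, store the node data of the target node in the IOC graph database, or issue an alarm for the target node, or display the node data of the target node and the node data of the associated nodes of the target node.
Based on any of the above embodiments, the analysis unit 704 is specifically configured to:
compare the node representation of the target node with the node representations of other nodes to determine nodes similar to the target node.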
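FIG. 8 is a schematic diagram of the physical structure of the electronic device according to an embodiment of the present application. As shown in FIG. 8, the electronic device may include a processor 810, a communications interface 820, a memory 830 and a communication bus 840, where the processor 810, the communications interface 820 and the memory 830 communicate with one another through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the following method: extracting target data from source data and using the target data as a seed node, the target data being data with security risks;
extracting, from a threat analysis graph stored in a graph database, target subgraph data associated with the seed node;
determining a node representation of a target node in the target subgraph data, the node representation of the target node including node data of the target node and node data of neighboring nodes of the target node;
analyzing the target node based on the node representation of the target node.
In addition, when implemented as software functional units and sold or used as an independent product, the logic instructions in the memory 830 may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In another aspect, an embodiment of the present application further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the node analysis method based on a threat analysis graph provided by the above embodiments, for example including: extracting target data from source data and using the target data as a seed node, the target data being data with security risks;
extracting, from a threat analysis graph stored in a graph database, target subgraph data associated with the seed node;
determining a node representation of a target node in the target subgraph data, the node representation of the target node including node data of the target node and node data of neighboring nodes of the target node;
analyzing the target node based on the node representation of the target node.
In yet another aspect, the present application further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the node analysis method based on a threat analysis graph provided by the above methods, the method including: extracting target data from source data and using the target data as a seed node, the target data being data with security risks;
extracting, from a threat analysis graph stored in a graph database, target subgraph data associated with the seed node;
determining a node representation of a target node in the target subgraph data, the node representation of the target node including node data of the target node and node data of neighboring nodes of the target node;
analyzing the target node based on the node representation of the target node.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. A person of ordinary skill in the art can understand and implement them without creative effort.
From the description of the above implementations, a person skilled in the art can clearly understand that the implementations may be realized by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solutions in essence, or the part contributing to the prior art, may be embodied as a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate, not limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.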

Claims (12)

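  1. A node analysis method based on a threat analysis graph, comprising:
    extracting target data from source data, and using the target data as a seed node, the target data being data with security risks;
    extracting, from a threat analysis graph stored in a graph database, target subgraph data associated with the seed node;
    determining a node representation of a target node in the target subgraph data, the node representation of the target node comprising node data of the target node and node data of neighboring nodes of the target node; and
    analyzing the target node based on the node representation of the target node.
  2. The node analysis method based on a threat analysis graph according to claim 1, wherein extracting, from the threat analysis graph stored in the graph database, the target subgraph data associated with the seed node comprises:
    searching the threat analysis graph for target associated data within a preset number of hops of the seed node, the target associated data comprising node data and edge data; and
    combining the node data of the seed node with the target associated data to obtain the target subgraph data.
  3. The node analysis method based on a threat analysis graph according to claim 1, wherein determining the node representation of the target node in the target subgraph data comprises:
    determining a graph embedding vector of each node in the target subgraph data; and
    determining the node representation of the target node based on the graph embedding vector of each node.
  4. The node analysis method based on a threat analysis graph according to claim 3, wherein determining the graph embedding vector of each node in the target subgraph data comprises:
    obtaining a current business scenario;
    when the current business scenario comprises a scenario of searching for structurally similar nodes, determining the graph embedding vector of each node in the target subgraph data based on a structural similarity algorithm; and
    when the current business scenario comprises a scenario of searching for content-similar nodes, determining the graph embedding vector of each node in the target subgraph data based on a content similarity algorithm.
  5. The node analysis method based on a threat analysis graph according to claim 3, wherein determining the node representation of the target node based on the graph embedding vector of each node comprises:
    inputting the graph embedding vector of each node into a target graph neural network model to obtain the node representation of the target node output by the target graph neural network model;
    wherein the target graph neural network model is trained on graph embedding vector samples of multiple nodes.
  6. The node analysis method based on a threat analysis graph according to claim 5, wherein the target graph neural network model comprises a collection module and an aggregation module;
    inputting the graph embedding vector of each node into the target graph neural network model to obtain the node representation of the target node output by the target graph neural network model comprises:
    inputting the graph embedding vector of each node into the collection module, collecting, by the collection module, the node data of each neighboring node of the target node from the graph embedding vectors of the nodes, and sending the node data of each neighboring node and the node data of the target node to the aggregation module;
    aggregating, by the aggregation module, the node data of each neighboring node and the node data of the target node to obtain node aggregation information; and
    determining the node aggregation information as the node representation of the target node.
  7. The node analysis method based on a threat analysis graph according to any one of claims 1 to 6, wherein analyzing the target node based on the node representation of the target node comprises:
    determining a threat risk coefficient of the target node based on the node representation of the target node, the threat risk coefficient indicating the risk level of the target node; and
    when it is determined that the threat risk coefficient of the target node is greater than a preset coefficient value, storing the node data of the target node in an indicator of compromise (IOC) graph database, or issuing an alarm for the target node, or displaying the node data of the target node and the node data of the associated nodes of the target node.
  8. The node analysis method based on a threat analysis graph according to any one of claims 1 to 6, wherein analyzing the target node based on the node representation of the target node comprises:
    comparing the node representation of the target node with the node representations of other nodes to determine nodes similar to the target node.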
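  9. A node analysis apparatus based on a threat analysis graph, comprising:
    a first extraction unit, configured to extract target data from source data and use the target data as a seed node, the target data being data with security risks;
    a second extraction unit, configured to extract, from a threat analysis graph stored in a graph database, target subgraph data associated with the seed node;
    a determination unit, configured to determine a node representation of a target node in the target subgraph data, the node representation of the target node comprising node data of the target node and node data of neighboring nodes of the target node; and
    an analysis unit, configured to analyze the target node based on the node representation of the target node.
  10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the node analysis method based on a threat analysis graph according to any one of claims 1 to 8.
  11. A non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the node analysis method based on a threat analysis graph according to any one of claims 1 to 8.
  12. A computer program product storing executable instructions which, when executed by a processor, cause the processor to implement the node analysis method based on a threat analysis graph according to any one of claims 1 to 8.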
PCT/CN2022/144095 2022-12-12 2022-12-30 Node analysis method and apparatus based on a threat analysis graph WO2024124640A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211600664.2 2022-12-12
CN202211600664.2A CN116248325A (zh) 2022-12-12 2022-12-12 Node analysis method and apparatus based on a threat analysis graph

Publications (1)

Publication Number Publication Date
WO2024124640A1 (zh) 2024-06-20

Family

ID=86626633

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/144095 WO2024124640A1 (zh) 2022-12-12 2022-12-30 Node analysis method and apparatus based on a threat analysis graph

Country Status (2)

Country Link
CN (1) CN116248325A (zh)
WO (1) WO2024124640A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180032724A1 (en) * 2015-04-16 2018-02-01 Nec Laboratories America, Inc. Graph-based attack chain discovery in enterprise security systems
US20200396230A1 (en) * 2019-06-13 2020-12-17 International Business Machines Corporation Real-time alert reasoning and priority-based campaign discovery
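CN113364802A (zh) * 2021-06-25 2021-09-07 中国电子科技集团公司第十五研究所 Method and apparatus for assessing the threat level of security alerts
CN114584351A (zh) * 2022-02-21 2022-06-03 北京恒安嘉新安全技术有限公司 Monitoring method and apparatus, electronic device, and storage medium
CN114928493A (zh) * 2022-05-23 2022-08-19 昆明元叙网络科技有限公司 Threat intelligence generation method based on threat attack big data, and AI security system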

Non-Patent Citations (2)

Title
MEICONG LI; WEI HUANG; YONGBIN WANG; WENQING FAN: "The optimized attribute attack graph based on APT attack stage model", 2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), IEEE, 14 October 2016 (2016-10-14), pages 2781 - 2785, XP033094970, DOI: 10.1109/CompComm.2016.7925204 *
SONG, XIAOFENG ET AL.: "Research on Security Defense System for Military Information Network Based on Big Data Engine", ELECTRONIC INFORMATION WARFARE TECHNOLOGY, no. 3, 15 May 2019 (2019-05-15) *

Also Published As

Publication number Publication date
CN116248325A (zh) 2023-06-09
