
CN117879968A - Multi-dimensional industrial network behavior anomaly detection method

Info

Publication number: CN117879968A
Application number: CN202410163731.1A
Authority: CN
Inventors: 詹东阳; 张雯琦; 张宏莉; 余翔湛; 叶麟; 方滨兴
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2024-02-05
Filing date: 2024-02-05
Publication date: 2024-04-12

Abstract

The invention provides a multi-dimensional industrial network behavior anomaly detection method and belongs to the technical field of network behavior detection. The method comprises the following steps: S1, constructing a behavior analysis model based on multi-view association analysis; S2, constructing a multi-dimensional cross-domain shared learning model that takes the multi-dimensional view of the industrial control network and the shared node embedding as input and outputs a new shared node embedding; S3, constructing a single-domain specific learning model so that the model can evaluate and detect anomalies in multiple dimensions; S4, performing single-view specific learning in each of the dimensions to strengthen the learning of the embedded information in each dimension, reconstructing a predicted behavior value from the embedded features of each dimension, and calculating an anomaly score in each dimension from the deviation between the predicted behavior and the actual behavior; S5, setting an anomaly score threshold and raising an anomaly alarm when the anomaly score is within the anomaly score threshold. The method addresses the lack of multi-dimensional recognition and understanding of network behavior and the poor efficiency of processing high-dimensional data.

Description

Multi-dimensional industrial network behavior anomaly detection method

Technical Field

The invention relates to a network behavior anomaly detection method, in particular to a multidimensional industrial network behavior anomaly detection method, and belongs to the technical field of network behavior detection.

Background

In the prior art, an anomaly detection method based on deep learning or a network behavior auditing method based on statistical analysis is generally adopted to detect network behaviors.

The anomaly detection method based on deep learning exploits the data analysis and pattern recognition capabilities of deep learning. Its advantages are as follows:

(1) Multi-level feature learning: deep learning networks capture complex data patterns by extracting progressively higher-level features through multiple layers. Such hierarchical feature learning is particularly effective for understanding the deep structure of data, especially for large-scale data sets.

(2) Automatic feature extraction: unlike traditional machine learning methods, deep learning methods learn and extract features from data automatically, without manual feature design or selection. This reduces the dependence on expert knowledge and enhances the generalization capability of the model.

(3) Nonlinear modeling capability: through nonlinear activation functions, a deep learning model can capture complex nonlinear relationships in the data, which is critical for modeling the complex patterns common in the real world.

(4) Big-data driven: deep learning methods excel at processing large amounts of data, which makes them well suited to today's data-rich environments. As the amount of data increases, the performance of a deep learning model generally improves.

(5) End-to-end learning: deep learning can map raw data directly to the final result, reducing the need for preprocessing and feature engineering. This simplifies the modeling process and potentially improves overall performance.

(6) Multiple network architectures: deep learning provides a variety of network architectures, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs) and autoencoders, each suited to different types of data and tasks. This flexibility allows the solution to be tailored to specific anomaly detection requirements.

(7) Time series analysis capability: for time series data such as financial market data or network traffic, deep learning models (particularly RNNs and their variants) can effectively capture temporal dependencies, which is critical for predicting and detecting anomalies in time series.

(8) Imbalanced data processing: anomaly detection often faces imbalanced data, i.e. normal data far outnumber anomalous data. Deep learning methods can address this problem through techniques such as resampling and cost-sensitive learning.

These features give anomaly detection based on deep learning strong performance in a variety of application scenarios, especially where high-dimensional, complex data sets must be processed. However, when processing high-dimensional data, such methods may suffer from the curse of dimensionality, resulting in high computational complexity and low model training efficiency.

Network behavior auditing methods based on statistical analysis are a traditional and effective technique for monitoring and analyzing network traffic and user behavior. Their main features are as follows:

(1) Data aggregation and preprocessing: before statistical analysis, network data usually need to be aggregated and preprocessed to reduce noise and irrelevant information. Preprocessing includes data cleaning, normalization, variable conversion and similar steps that ensure the quality and consistency of the data.

(2) Descriptive statistical analysis: basic statistical measures (e.g. mean, variance, standard deviation) are used to describe the overall characteristics of network behavior. Descriptive statistics help in understanding the distribution and major trends of the data.

(3) Anomaly detection: a normal pattern (baseline) of network behavior is established, and real-time data are compared with this baseline to identify abnormal behavior. Anomalies are generally defined as significant deviations from normal behavior patterns.

(4) Time series analysis: the time series characteristics of network behavior data are analyzed to identify periodicity, trends or incidents. Time series analysis may be used to predict future network behavior or to detect abrupt behavior changes.

(5) Statistical hypothesis testing: hypothesis tests are used to determine whether observed data patterns are statistically significant. These tests help distinguish random variation from genuine changes in behavior patterns.

(6) Clustering and classification: cluster analysis is used to identify natural groups or patterns in network behavior. Classification methods can be used to distinguish normal behavior from abnormal behavior.

(7) Association rule mining: association rules in network behavior data are discovered, and associations and sequence patterns between events are identified. This can be used to understand the causal relationships of network behavior.

(8) Visualization: network behavior data are presented visually through tools such as charts and heat maps.

However, network behavior auditing methods based on statistical analysis rely on a single or limited data dimension, for example considering only time or only traffic. This limits a comprehensive understanding of behavior patterns in complex network environments.

Disclosure of Invention

The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention. Its purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.

In view of the above, the invention provides a multi-dimensional industrial network behavior anomaly detection method for solving the technical problems of lack of multi-dimensional recognition and understanding of network behavior and poor efficiency of processing high-dimensional data in the prior art.

Scheme one: a multi-dimensional industrial network behavior anomaly detection method, comprising the following steps:

S1, constructing a behavior analysis model based on multi-view association analysis;

S2, constructing a multi-dimensional cross-domain shared learning model that takes the multi-dimensional view of the industrial control network and the shared node embedding as input and outputs a new shared node embedding;

S3, constructing a single-domain specific learning model that learns the node features within each dimension on the single-dimensional view of each industrial control system, so that the model can evaluate and detect anomalies in multiple dimensions;

S4, performing single-view specific learning in each of the dimensions to strengthen the learning of the embedded information in each dimension, reconstructing a predicted behavior value from the embedded features of each dimension, and calculating an anomaly score in each dimension from the deviation between the predicted behavior and the actual behavior;

S5, setting an anomaly score threshold, and raising an anomaly alarm when the anomaly score is within the anomaly score threshold.

Preferably, S1 specifically comprises the following steps:

S11, acquiring the initial data of the industrial control network, preprocessing the data, and extracting multi-dimensional features;

the initial data of the industrial control network consist of the sensor feature values of N sensors within a time span $T_{train}$ and the network data; the sensor feature values are expressed as:

$$s_{train} = \left[ s^{(1)}, \ldots, s^{(T_{train})} \right] \qquad (1)$$

wherein $s_{train}$ denotes the initial training data vector, and $s^{(t)}$ denotes the feature values of the N sensors in the industrial control system at time scale t;

at each time scale t, the sensor values $s^{(t)} \in \mathbb{R}^{N}$ form an N-dimensional vector representing the feature values of the N sensors in the industrial control system;

for each domain d, d ∈ {1, ..., D}: since the information of each domain is interrelated and the domains share an identical set of nodes, the cross-domain graph is constructed as an undirected weighted multigraph $G = (V, \varepsilon)$, which contains a node set V with N nodes and an edge set $\varepsilon$ with D different types, i.e. $\varepsilon = \{\varepsilon_1, \ldots, \varepsilon_D\}$;

The multi-dimensional features include: control behavior sequence, control behavior invariant, data behavior sequence, data behavior invariant and multi-view data change ratio;

S12, constructing the single-dimensional views and node embeddings of the industrial control network based on the multi-dimensional features;

in the physical dimension, a weighted directed communication graph is constructed from the data transmitted between sensors to represent the initial associations among all nodes in the physical dimension; in the network dimension, a weighted directed communication graph is constructed from the network data packets transmitted between nodes; the initial node embeddings of each dimension are generated from the data processed in the two dimensions;

S13, combining the multiple single-dimensional views of the industrial control network into a multi-view representation to obtain the multi-dimensional view of the industrial control network and the shared node embedding;

after the single-graph representations and node embeddings in the two dimensions are obtained, a shared graph representation that simultaneously represents the node associations in both dimensions is constructed by exploiting the common node set.
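The construction of the per-dimension communication graphs in S12 can be illustrated with a minimal sketch. It assumes that each observed transfer is recorded as a (source, target) pair and that the edge weight is simply the number of transfers between the pair; the text does not specify the weighting, and the record format, node names and function name are hypothetical.

```python
from collections import defaultdict

def build_weighted_digraph(transfers):
    """Return {(source, target): weight} for a list of (source, target) records.

    Assumption: the weight of a directed edge is the number of observed
    transfers from source to target.
    """
    weights = defaultdict(int)
    for src, dst in transfers:
        weights[(src, dst)] += 1
    return dict(weights)

# Hypothetical records: sensor-to-sensor data transfers (physical dimension)
# and packet transfers between the same nodes (network dimension).
physical_records = [("s1", "s2"), ("s1", "s2"), ("s2", "s3")]
network_records = [("s1", "s2"), ("s3", "s1")]

physical_graph = build_weighted_digraph(physical_records)
network_graph = build_weighted_digraph(network_records)

# Both views are defined over the same node set, which is what allows
# a shared graph representation to be built later.
physical_nodes = {n for edge in physical_graph for n in edge}
network_nodes = {n for edge in network_graph for n in edge}
```

The direction of each edge is preserved because ("s1", "s2") and ("s2", "s1") are distinct keys.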

Preferably, S2 specifically comprises the following steps:

S21, obtaining a weight matrix from the multigraph, performing graph convolution with the weight matrix and the shared node embedding as inputs, and iterating the multi-layer graph convolution to obtain the newly learned shared node embedding;

S22, passing the new shared node embedding through the output layer to obtain the reconstructed predicted values of the multi-dimensional node features;

S23, splitting the composite multi-dimensional node embedding vector proportionally into two vectors, corresponding to the node embeddings in the physical dimension and the network dimension respectively, and inputting each, together with the specific graph representation of its dimension, into one of two domain-specific convolution layers; following the idea of residual networks, the input is fed back to the model output, wherein the graph representation of each dimension obtained by calculating cosine values is used as the input of the domain-specific convolution layer in the next round, and the node embedding of each dimension in the output is reconstructed into a predicted value that serves as the final output of the trained model;

S24, in each round, solving a new graph representation structure from the new shared node embedding to perform a new round of embedding learning.
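The splitting and residual feedback described in S23 can be sketched as follows. This is only an illustration under two assumptions that the text does not state: the split point is the length of the physical-dimension embedding, and the residual feedback is an element-wise addition of the layer input to the layer output. Function names and values are hypothetical.

```python
def split_embedding(composite, physical_dim):
    """Split a composite embedding into (physical part, network part).

    Assumption: the first physical_dim entries belong to the physical
    dimension and the remainder to the network dimension.
    """
    return composite[:physical_dim], composite[physical_dim:]

def residual_output(layer_input, layer_output):
    """Residual-style feedback: add the layer input to the layer output."""
    return [x + y for x, y in zip(layer_input, layer_output)]

# Hypothetical composite embedding of one node: 3 physical + 2 network entries.
composite = [1.0, 2.0, 3.0, 4.0, 5.0]
physical_part, network_part = split_embedding(composite, physical_dim=3)

# Hypothetical output of a domain-specific layer for the network part.
network_layer_out = [0.5, -1.0]
network_final = residual_output(network_part, network_layer_out)
```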

Preferably, S24 specifically comprises: calculating the similarity between the embedding vector of node i and that of each candidate neighbor node j; selecting the k candidate neighbor nodes with the largest normalized dot product with node i as the neighbors of node i; and forming the adjacency matrix A from all nodes and their neighbor nodes, wherein the value of k is selected according to expectations and prior knowledge;

the neighbors are selected according to the similarity between the embedding vector of node i and those of its candidate neighbor nodes:

$$A_{ji} = \mathbb{1}\left\{ j \in \mathrm{TopK}\left( \{ e_{ki} : k \in C_i \} \right) \right\} \qquad (3)$$

wherein $A_{ji}$ denotes the entry of the adjacency matrix indicating whether node j is a neighbor of node i; $C_i$ denotes the set of candidate neighbor nodes of node i; and $e_{ki}$ denotes the normalized dot product between the embedding vector of node i and the embedding vector of candidate node k;
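The neighbor selection of formula (3) can be sketched in a few lines. The sketch assumes that the normalized dot product is the cosine similarity of the two embedding vectors and that the candidate set of each node is all other nodes; the embedding values and function names are hypothetical.

```python
import math

def cosine(u, v):
    """Normalized dot product (cosine similarity) of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def topk_adjacency(embeddings, k):
    """Build A with A[j][i] = 1 if j is among the k most similar candidates of i."""
    n = len(embeddings)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Assumption: the candidate set C_i is every node except i itself.
        candidates = [j for j in range(n) if j != i]
        ranked = sorted(
            candidates,
            key=lambda j: cosine(embeddings[i], embeddings[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            adj[j][i] = 1
    return adj

# Hypothetical 2-dimensional embeddings of four nodes.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
adjacency = topk_adjacency(embeddings, k=1)
```

With k = 1, nodes 0 and 1 select each other, as do nodes 2 and 3, because their embeddings point in nearly the same direction.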

the working process of the shared convolution layer is as follows:

wherein the output term denotes the shared embedding of node i at layer l+1 on the multigraph G; the input term denotes the input shared feature matrix of node i; the neighbor-set term denotes the neighbor set of node i obtained from the adjacency matrix $A_s$ learned from the multigraph G; W denotes the trained shared weight matrix; $\alpha_{i,j}$ denotes the attention coefficient between nodes i and j; and the final term denotes the shared node embedding output by the l-th layer of the cross-domain shared graph convolution layer.
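A minimal sketch of how a layer built from these quantities could aggregate neighbor information is given below. It is not the formula of the invention: it assumes a graph-attention style update in which each node combines its own transformed features with those of its neighbors, weighted by attention coefficients, followed by ReLU. The dot-product softmax used for the attention coefficients, the inclusion of the node itself, and all names and values are assumptions made for illustration.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def attention(i, nbrs, Z):
    """Softmax over dot-product scores between node i and each node in nbrs.

    Assumption: this scoring rule stands in for the attention coefficients
    alpha_{i,j}, whose exact definition is not given in the text.
    """
    scores = [sum(a * b for a, b in zip(Z[i], Z[j])) for j in nbrs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def shared_conv_layer(H, neighbors, W):
    """One attention-weighted aggregation step over shared embeddings H."""
    Z = [matvec(W, h) for h in H]  # apply the shared weight matrix W
    out = []
    for i in range(len(H)):
        nbrs = [i] + neighbors[i]  # assumption: the node itself is included
        alpha = attention(i, nbrs, Z)
        agg = [
            sum(a * Z[j][d] for a, j in zip(alpha, nbrs))
            for d in range(len(Z[i]))
        ]
        out.append(relu(agg))
    return out

# Hypothetical inputs: three nodes, 2-dimensional embeddings, identity weights.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0], 2: [0, 1]}
W = [[1.0, 0.0], [0.0, 1.0]]
out = shared_conv_layer(H, neighbors, W)
```

Stacking several such calls, each followed by a recomputation of the neighbor sets, would correspond to the iterated multi-layer convolution described in S21 and S24.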

Preferably, S3 comprises the steps of:

S31, the single-domain specific learning model takes the single-graph structure and the specific node embedding of each dimension as input; a weight matrix is obtained from the graph structure of each dimension, and GCN convolution is performed with the weight matrix and the node embedding as inputs; based on the learned graph structure, the information of each node is fused with that of its neighbors, and the fused features are used as the final node embedding;

S32, the model uses the ReLU function as the activation function to introduce nonlinearity between the layers of the neural network; the node embeddings learned on each specific domain are fed, together with the newly calculated graph representation structure of that domain, into the next round, and the multi-layer graph convolution and feature extraction operations are iterated until the model converges or reaches the preset number of training rounds, yielding the final learned node embeddings;

based on the cross-domain shared learning module, the convolution layer of the single-domain specific learning model is defined as:

wherein the output term denotes the node embedding on a single view output by the (l+1)-th domain-specific convolution layer; the input term denotes the node embedding matrix in each dimension at the l-th layer, which is calculated from the shared node embedding matrix of the l-th layer; the neighbor-set term denotes the neighbor set of node i obtained from the adjacency matrix $A_s$ learned on the graph of the corresponding dimension; $W_d$ denotes the trained shared weight matrix; $\alpha_{i,j}$ denotes the attention coefficient between nodes i and j; and the final term denotes the output of the l-th specific graph convolution layer, i.e. the set of node embeddings of each dimension.

Preferably, S4 specifically comprises: converting the new shared node embeddings learned in each dimension, through linear mapping in the output layer, into time-series predicted values reconstructed from the embedded features; directly performing a multi-dimensional cross comparison between the predicted values and the current data flow values of the industrial control network; performing anomaly assessment in each dimension according to the deviation between the predicted and actual values and the threshold; and taking the assessment results in all dimensions as the final anomaly detection result of the industrial control system.
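The per-dimension scoring and threshold comparison can be sketched as follows. The sketch assumes that the anomaly score of a dimension is the mean absolute deviation between predicted and actual values and that a score above the threshold counts as anomalous; neither the exact score definition nor the comparison direction is fixed by the text, and all values and names are hypothetical.

```python
def anomaly_scores(predicted, actual):
    """Per-dimension score: mean absolute deviation of predicted from actual."""
    return {
        dim: sum(abs(p - a) for p, a in zip(predicted[dim], actual[dim]))
        / len(actual[dim])
        for dim in predicted
    }

def alarms(scores, thresholds):
    """Flag a dimension when its score exceeds its threshold (assumed direction)."""
    return {dim: scores[dim] > thresholds[dim] for dim in scores}

# Hypothetical predicted and observed values in the two dimensions.
predicted = {"physical": [1.0, 2.0, 3.0], "network": [10.0, 10.0]}
actual = {"physical": [1.0, 2.0, 6.0], "network": [10.0, 11.0]}
thresholds = {"physical": 0.5, "network": 1.0}

scores = anomaly_scores(predicted, actual)
flags = alarms(scores, thresholds)
```

Here the physical dimension is flagged (score 1.0 against a threshold of 0.5) while the network dimension is not (score 0.5 against a threshold of 1.0), which shows how the per-dimension results localize a deviation.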

Preferably, the method further comprises S6, multi-gradient descent optimization based on multi-task learning, comprising the following steps:

S61, in each training iteration, the model calculates the relative weight of each task from the parameter information of that task; the weights are dynamically adjusted during training to balance the model's learning across the tasks; the gradient of each task is calculated and the parameter vector is continuously updated with these gradients until it satisfies the KKT conditions, at which point a Pareto optimal solution of the original problem is found;

S62, introducing the relative weights into the loss function, so that the loss of each task is weighted to reflect its importance; and performing backpropagation on the model with the weighted loss function to update the model parameters;

S63, iterating the optimization multiple times, during which the multi-gradient descent optimizer adaptively balances the weights among the tasks.
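How relative task weights can be derived from task gradients may be illustrated for the two-task case. The sketch uses the well-known minimum-norm formulation of multiple-gradient descent, in which the weights (a, 1 - a) minimize the norm of the combined gradient a·g1 + (1 - a)·g2 with a in [0, 1]; the text does not state that this exact formulation is used, so it is an assumption, as are the function names and gradient values.

```python
def min_norm_weights(g1, g2):
    """Weights (a, 1 - a) minimizing ||a*g1 + (1 - a)*g2||^2 for a in [0, 1]."""
    diff = [x - y for x, y in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5, 0.5  # identical gradients: any weighting is equivalent
    a = sum((y - x) * y for x, y in zip(g1, g2)) / denom
    a = min(1.0, max(0.0, a))  # keep the weight inside [0, 1]
    return a, 1.0 - a

def combined_direction(g1, g2):
    """Weighted sum of the two task gradients using the min-norm weights."""
    a, b = min_norm_weights(g1, g2)
    return [a * x + b * y for x, y in zip(g1, g2)]

# Orthogonal task gradients: both tasks receive equal weight.
weights_orthogonal = min_norm_weights([1.0, 0.0], [0.0, 1.0])

# Opposing task gradients: the combined direction vanishes, which is the
# stationarity condition at which no step improves both tasks at once.
direction_opposing = combined_direction([1.0, 0.0], [-1.0, 0.0])
```

A zero combined direction corresponds to the KKT condition mentioned in S61; otherwise, a step along the negative combined direction does not increase either task loss to first order.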

Scheme two: an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the multi-dimensional industrial network behavior anomaly detection method of scheme one.

Scheme three: a computer-readable storage medium storing a computer program which, when executed by a processor, implements the multi-dimensional industrial network behavior anomaly detection method of scheme one.

The beneficial effects of the invention are as follows:

1. The invention performs anomaly detection by combining multi-dimensional information and provides an ICS-oriented cross-domain multigraph representation method: by analyzing the behaviors among ICS elements in different dimensions, the cross-domain features among these elements are represented on a multigraph structure, which provides more comprehensive information for the subsequent anomaly detection model;

2. The invention provides ICS anomaly detection based on a graph neural network, which uses the graph structure to learn the node representations of the shared domain and the specific domains simultaneously, so that the different dimensions and anomaly detection are trained jointly; the method follows a multi-task learning mechanism in which a multi-gradient descent optimization algorithm combines the tasks into a complete and efficient model, giving the model universality and generalization capability;

3. The method is evaluated on the SWaT dataset, which contains multi-dimensional data, to demonstrate its effectiveness and advantages over other baseline methods. The experimental results show that the proposed model significantly outperforms other state-of-the-art methods in anomaly detection performance, which provides a new approach for research and application in the field of ICS anomaly detection.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:

FIG. 1 is a flow chart of a method for detecting behavioral anomalies in a multi-dimensional industrial network;

FIG. 2 is a schematic diagram of a behavior analysis model structure of multi-view correlation analysis;

FIG. 3 is a schematic diagram showing a multi-figure structure;

FIG. 4 is a schematic diagram of the multi-graph representation structure;

FIG. 5 is a schematic diagram of a multi-dimensional cross-domain shared learning model;

FIG. 6 is a flow chart of interaction of a multi-dimensional cross-domain shared learning model with a single-domain specific learning model;

FIG. 7 is a flow chart of a multi-gradient descent optimization algorithm.

Detailed Description

In order to make the technical solutions and advantages of the embodiments of the present invention more apparent, the following detailed description of exemplary embodiments of the present invention is provided in conjunction with the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention and not exhaustive of all embodiments. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.

Example 1: this embodiment is described with reference to FIGS. 1 to 7. Compared with a traditional network, an industrial control network has more complex features, involves a wider variety of devices, has denser network nodes, and uses relatively fragile protocols, and therefore carries more security risks. In view of the high dimensionality, strong correlation, strict real-time requirements, limited resources and difficulty of updating that characterize data generated by industrial control networks, the invention studies a network behavior analysis and audit method based on industrial features and directed at attack behavior in industrial control networks. It achieves breakthroughs in key technologies such as protocol parsing of industrial control network traffic, industrial feature extraction and abnormal behavior detection, and realizes behavior analysis and audit of network behavior in a typical process industry environment.

The method uses industrial network data stream files to study a behavior analysis and audit system oriented to the industrial Internet of Things. Industrial network data are obtained by monitoring two channels in real time, and a protocol parsing algorithm parses the data packets layer by layer; at the application layer, at least 25 protocols specific to industrial control networks can be parsed. Multi-dimensional feature extraction is then performed on the preliminary protocol parsing results, extracting the required feature information along five main dimensions: control behavior sequence, control behavior invariant, data behavior sequence, data behavior invariant and multi-view. Behavior prediction modeling is then carried out on the learned information, and the cross comparison of the differences between predicted and actual features against thresholds in multiple dimensions serves as the basis for the final anomaly detection and behavior audit.

Based on the characteristics of the industrial control network, the method explores the potential nonlinear associations among nodes from the information of two dimensions, the physical space and the network space. It uses complementary information from related domains to alleviate the sparsity problem, learns the cross-domain shared representation and the domain-specific representations separately, predicts the future behavior of the industrial control system based on the reconstructed node relation graph, and finally raises anomaly alarms and interprets deviations based on the predictions. In this way, intrinsic associations can be extracted from complex high-dimensional industrial control data, the accuracy of the anomaly detection model is improved without losing excessive efficiency, and anomalous deviations can be interpreted more reliably based on the reconstructed association graph, effectively ensuring the safety of the industrial control system.

The method specifically comprises the following steps:

S1, constructing a behavior analysis model based on multi-view association analysis;

the initial data of the industrial control network consist of the sensor feature values (i.e. the time series of physical readings) of N sensors within a time span $T_{train}$ and the network data; the sensor feature values are expressed as formula (1):

$$s_{train} = \left[ s^{(1)}, \ldots, s^{(T_{train})} \right] \qquad (1)$$

for each domain d, d ∈ {1, ..., D}: because the information of each domain is interrelated and the domains share an identical set of nodes, the cross-domain graph is constructed as an undirected weighted multigraph $G = (V, \varepsilon)$, which contains a node set V with N nodes and an edge set $\varepsilon$ with D different types, i.e. $\varepsilon = \{\varepsilon_1, \ldots, \varepsilon_D\}$;

The sensor values, the vectors and the node values are in a corresponding relation in the actual application process, namely the number of the sensors is consistent with the vector dimension and the number of the nodes;

in the physical dimension, a weighted directed communication graph is constructed from the data transmitted between sensors to represent the initial associations among all nodes in the physical dimension; in the network dimension, a weighted directed communication graph is constructed from the network data packets transmitted between nodes; the initial node embeddings of each dimension are generated from the data processed in the two dimensions (see FIG. 3);

after the single-graph representations and node embeddings in the two dimensions are obtained, a shared graph representation that simultaneously represents the node associations in both dimensions is constructed by exploiting the common node set;

specifically, on the basis of the above, the probability of node correlation is calculated from the node embedding matrix of each dimension:

wherein $x_i$ denotes the embedding matrix of node i; $x_j$ denotes the embedding matrix of node j; and $C_i$ denotes the set of neighbor nodes of node i.

The highest-probability neighbors of each node are identified by applying a TopK threshold, generating a graph structure for each dimension; the edges of the different dimensions are then merged into one graph in which multiple edges are allowed between any two nodes, producing a multi-dimensional graph that effectively captures the complex relations in the data; to form the shared node embedding in the multigraph structure, the network-dimension embedding vector is appended to the physical-dimension embedding vector, yielding a composite multi-dimensional node embedding with richer features as the input of the multigraph learning module (see FIG. 4);
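The merging of per-dimension edges into a multigraph and the concatenation of embeddings can be sketched as follows. The edge-list format, node names, weights and embedding values are hypothetical; the only behavior taken from the text is that parallel edges of different dimensions are kept side by side and that the network-dimension vector is appended after the physical-dimension vector.

```python
def merge_into_multigraph(edges_by_dim):
    """Merge per-dimension edge lists into one list of (u, v, dimension, weight).

    Edges are tagged with their dimension rather than collapsed, so two nodes
    may be joined by several edges, one per dimension.
    """
    multigraph = []
    for dim, edges in edges_by_dim.items():
        for u, v, w in edges:
            multigraph.append((u, v, dim, w))
    return multigraph

def concat_embeddings(physical, network):
    """Append each node's network embedding to its physical embedding."""
    return {node: physical[node] + network[node] for node in physical}

# Hypothetical per-dimension graphs over the same node set.
edges_by_dim = {
    "physical": [("s1", "s2", 0.8), ("s2", "s3", 0.6)],
    "network": [("s1", "s2", 0.3)],
}
multigraph = merge_into_multigraph(edges_by_dim)

# Hypothetical embeddings: 2 physical entries and 1 network entry per node.
physical_emb = {"s1": [0.1, 0.2], "s2": [0.3, 0.4], "s3": [0.5, 0.6]}
network_emb = {"s1": [0.9], "s2": [0.8], "s3": [0.7]}
composite = concat_embeddings(physical_emb, network_emb)
```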

the ICS element information from the different dimensions is thus fused into a unified representation; this fused representation of multiple behavior domains can be used to analyze the correlations and characteristics among the different dimensions of ICS behavior, providing a data basis for detecting abnormal behaviors in the network.

S21, a weight matrix is obtained from the multigraph, and GCN (Graph Convolutional Network) convolution is performed with the weight matrix and the shared node embedding as inputs; the ReLU function is used as the activation function to introduce nonlinearity between the layers of the neural network; the learned abstract embedding is mapped to a node feature matrix representation through a linear output layer; a new graph representation structure is obtained by calculating cosine values between the learned node embeddings; and the multi-layer graph convolution is iterated repeatedly to obtain the newly learned shared node embedding;

S23, the composite multi-dimensional node embedding vector is split proportionally into two vectors, corresponding to the node embeddings in the physical dimension and the network dimension respectively, and each is input, together with the specific graph representation of its dimension, into one of two domain-specific convolution layers; similar to the shared-domain convolution layer, the domain-specific convolution layer also uses the GCN algorithm for convolution learning, but additionally feeds the input back to the model output following the idea of residual networks; the graph representation of each dimension obtained by calculating cosine values is used as the input of the domain-specific convolution layer in the next round, and the node embedding of each dimension in the output is reconstructed into a predicted value that serves as the final output of the trained model.

S24, so that the graph structures participating in the convolution better represent the node associations, a new graph representation structure is solved in each round from the new shared node embedding by the cosine value calculation method;

the similarity between the embedding vector of node i and that of each candidate neighbor node j is calculated; the k candidate neighbor nodes with the largest normalized dot product with node i are selected as the neighbors of node i, and the adjacency matrix A is formed from all nodes and their neighbors, wherein the value of k is selected according to expectations and prior knowledge;

$$A_{ji} = \mathbb{1}\left\{ j \in \mathrm{TopK}\left( \{ e_{ki} : k \in C_i \} \right) \right\} \qquad (3)$$

the working process of the shared convolution layer is as follows:

wherein the output term denotes the shared embedding of node i at layer l+1 on the multigraph G; the input term denotes the input shared feature matrix of node i; the neighbor-set term denotes the neighbor set of node i obtained from the adjacency matrix $A_s$ learned from the multigraph G; W denotes the trained shared weight matrix; $\alpha_{i,j}$ denotes the attention coefficient between nodes i and j; and the final term denotes the shared node embedding output by the l-th layer of the cross-domain shared graph convolution layer; the structure of the multi-dimensional cross-domain shared learning model is shown in FIG. 5;

s3, the learning task of the single-domain specific learning module is more specific and fine-grained, and the learning task is responsible for learning node characteristics in dimensions on a single-dimensional view of each industrial control system, so that the model can evaluate and detect anomalies in multiple dimensions; therefore, a single-domain specific learning model is constructed, node characteristics in the dimension are learned on a single-dimension view of each industrial control system, and the model is used for evaluating and detecting the abnormality in multiple dimensions;

wherein,node embedding on a single view representing the output of the l+1 specific domain convolutional layer; />Representing the node embedding matrix in each dimension of the first layer, the node embedding matrix being shared by the first layer +.>Calculating to obtain;representation +.>Upper learned adjacency matrix A _s The neighbor set of the node i is obtained; w (W) _d Representing a trained shared weight matrix; alpha _i,j Attention coefficients representing nodes i and j; />Representing the output of the convolution layer of the first specific graph, namely the embedded set of nodes in each dimension; the interaction flow of the multidimensional cross-domain shared learning model and the single-domain specific learning model is shown in FIG. 6;

the new shared node embedding learned in each dimension is converted, by linear mapping at the output layer, into a time-series prediction reconstructed from the embedded features; the predicted value is compared directly, across dimensions, with the current data flow value of the industrial control network; anomaly assessment is performed according to the deviation between the predicted and actual values in each dimension and a threshold; and the assessment results in each dimension are used as the anomaly detection result of the industrial control system;
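The per-dimension assessment can be sketched as follows. The text does not fix the exact scoring rule, so this sketch assumes the anomaly score of a dimension is the largest absolute deviation between predicted and actual values over its nodes, and that an alarm is raised when the score exceeds that dimension's threshold. The function names, the dimension labels and the "exceeds" convention are hypothetical.

```python
def anomaly_scores(predicted, actual):
    """Per-dimension anomaly score (assumption: max absolute deviation).

    predicted, actual -- dict: dimension name -> list of per-node values
    """
    return {
        dim: max(abs(p - a) for p, a in zip(predicted[dim], actual[dim]))
        for dim in predicted
    }

def detect(predicted, actual, thresholds):
    """Flag each dimension whose score exceeds its threshold (assumed rule)."""
    scores = anomaly_scores(predicted, actual)
    return {dim: scores[dim] > thresholds[dim] for dim in scores}

# Illustrative values only; "flow" and "level" are made-up dimension names.
predicted = {"flow": [1.0, 2.0], "level": [0.5, 0.5]}
actual = {"flow": [1.1, 2.0], "level": [0.5, 3.0]}
alarms = detect(predicted, actual, {"flow": 0.5, "level": 1.0})
```

Keeping one score and one threshold per dimension means an alarm can be traced back to the dimension in which the behavior deviated.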

S6, multi-gradient descent optimization based on multi-task learning, comprising the following steps:

S61, in each training process, the model calculates the relative weight of each task according to the parameter information of each task. These weights are dynamically adjusted during training to balance the learning of the model among the multiple tasks. Properly balancing these proportions lets the cross-domain shared convolution learn and optimize better, which in turn affects the learning and training of the whole model. Dynamically adjusting the weights is equivalent to finding a Pareto-optimal solution of the multi-objective optimization problem, which must satisfy the Karush-Kuhn-Tucker (KKT) conditions [37]. Therefore, the parameter vector can be continuously updated by calculating the gradient of each task until the gradients satisfy the KKT conditions, at which point a Pareto-optimal solution of the original problem has been found.

S62, introducing the relative weights into the loss functions, so that the loss function of each task is weighted to reflect the importance of that task. The gradients of this weighted loss function are then back-propagated through the model to update its parameters. In this way, the model optimizes multiple tasks simultaneously rather than optimizing each task individually.

S63, through repeated iterative optimization, the multi-gradient descent optimizer adaptively balances the weights among the tasks, so that the optimization result of each task is fully used and the training efficiency and accuracy of the model are improved. Because the multi-gradient descent optimizer can calculate the gradients of all tasks in a single backward pass, the computational cost is reduced and the convergence and training speed of the model are improved.
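The KKT conditions mentioned in S61 can be written out in the form commonly used for multiple gradient descent. The notation below is an assumption introduced for illustration (T tasks, shared parameters θ^{sh}, task-specific parameters θ^{t}, task losses L^{t}, task weights α^{t}); the source does not give these symbols.

```latex
% KKT (Pareto-stationarity) conditions
\exists\, \alpha^{1},\dots,\alpha^{T} \ge 0,\quad
\sum_{t=1}^{T}\alpha^{t} = 1,\quad
\sum_{t=1}^{T}\alpha^{t}\,\nabla_{\theta^{sh}}\mathcal{L}^{t}(\theta^{sh},\theta^{t}) = 0,
\qquad
\nabla_{\theta^{t}}\mathcal{L}^{t}(\theta^{sh},\theta^{t}) = 0 \quad \forall t .

% Task weights: minimum-norm point in the convex hull of the task gradients
\min_{\alpha^{1},\dots,\alpha^{T}}
\left\| \sum_{t=1}^{T}\alpha^{t}\,\nabla_{\theta^{sh}}\mathcal{L}^{t}(\theta^{sh},\theta^{t}) \right\|_{2}^{2}
\quad \text{s.t.}\quad \sum_{t=1}^{T}\alpha^{t} = 1,\ \ \alpha^{t} \ge 0 .
```

If the minimum is zero, the current point satisfies the KKT conditions. Otherwise the minimizing combination of gradients gives a direction that decreases every task loss, and its coefficients can serve as the relative task weights of S62.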

The model of the invention consists mainly of multi-layer neural networks whose layers are interrelated, so the weights to be updated are more complex and the allocation of weights among multiple tasks is particularly critical. A common approach is to minimize a weighted linear combination of the task losses, but this is effective only when the tasks do not compete, which is rare. In the model of the invention the learning tasks in the individual dimensions clearly compete, and the allocation of the competing weights affects prediction accuracy, so this approach can hardly achieve the best optimization result for the learning model of the invention.

The invention therefore adopts an optimization algorithm based on the Multiple Gradient Descent Algorithm (MGDA), which uses gradient-based optimization and provably converges to a point on the Pareto set, and which is computed in a single backward pass without explicit task-specific gradients, making the computational overhead of the method negligible; by adaptively balancing the weight distribution among the tasks, the multi-gradient descent optimizer allocates to each task a reasonable weight for the next round of learning according to that task's objective and training gradient, so that the model achieves a better training result.

The multi-gradient descent optimizer allocates reasonable weights to the tasks of the multi-view association learning model when each task optimizes on its own loss. It takes the parameters of each task after each round of learning as input, computes a weight for the loss value of each task by solving for the Pareto optimal solution over the task parameter sequences, multiplies each loss by its weight as a relative proportion, and uses the sum as the loss value of the parent task, namely the multidimensional cross-domain shared learning layer, so as to optimize the whole network.
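The weighting step can be sketched with a Frank-Wolfe style solver for the minimum-norm point in the convex hull of the task gradients, as commonly used with MGDA. This is a minimal illustration under stated assumptions: the inputs are per-task gradient vectors with respect to the shared parameters (the text above speaks of task parameters as input), and the function names `min_norm_weights` and `weighted_loss` are hypothetical.

```python
def min_norm_weights(grads, max_iter=250, tol=1e-6):
    """Frank-Wolfe search for simplex weights that minimise the norm of
    the weighted sum of task gradients (illustrative sketch).

    grads -- list of per-task gradient vectors (lists of floats)
    """
    T = len(grads)
    # Gram matrix of pairwise gradient dot products.
    M = [[sum(a * b for a, b in zip(gi, gj)) for gj in grads] for gi in grads]
    alpha = [1.0 / T] * T                      # start from uniform weights
    for _ in range(max_iter):
        Ma = [sum(M[r][t] * alpha[t] for t in range(T)) for r in range(T)]
        r = min(range(T), key=lambda k: Ma[k])  # best simplex vertex
        aa = sum(alpha[t] * Ma[t] for t in range(T))
        ab, bb = Ma[r], M[r][r]
        denom = aa - 2.0 * ab + bb
        if denom <= 1e-12:                      # already at that vertex
            break
        gamma = min(1.0, max(0.0, (aa - ab) / denom))  # exact line search
        if gamma < tol:                         # no further improvement
            break
        alpha = [(1.0 - gamma) * a for a in alpha]
        alpha[r] += gamma
    return alpha

def weighted_loss(losses, alpha):
    """Parent-task loss: weighted sum of the per-task losses."""
    return sum(a * l for a, l in zip(alpha, losses))
```

For two tasks with gradients `[2.0, 0.0]` and `[0.0, 1.0]`, the solver assigns the smaller weight to the task with the larger gradient, so neither task dominates the update of the shared layer.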

The invention is compared with methods such as GDN, TranAD, DTAAD, USAD, MAD-GAN and LSTM-NDT, using accuracy, recall and F1 score as the evaluation criteria. On the SWaT data set, in which anomaly detection is performed on a six-process water treatment industrial control system, the best experimental results reach an F1 score of 83.5%, an accuracy of 83.4% and a recall of 84.5%, outperforming the other current methods;

The method creatively extracts features of the industrial control network from multiple dimensions. Using the idea of cross-fusion, it explores in depth the nonlinear relations in the industrial control network through cross-domain shared learning, and represents the structure and feature embeddings of the nodes with a more comprehensively constructed graph. Based on the richer and more comprehensive results of cross-domain shared learning, it then learns the features of each dimension in a targeted manner, fully learns the latent features of the industrial control network in each dimension, predicts the behavior of the network more accurately from those latent features, and judges the current behavior against the predicted values. The model therefore detects anomalies better than models that consider the information of the industrial control network in only a single dimension.

Example 2: the computer device of the present invention may be a device including a processor and a memory, such as a single-chip microcomputer including a central processing unit. The processor implements the steps of the multi-dimensional industrial network behavior anomaly detection method when executing the computer program stored in the memory.

The processor may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, flash card (Flash Card), at least one disk storage device, flash memory device, or other volatile solid-state storage device.

Example 3: computer-readable storage medium embodiments.

The computer readable storage medium of the present invention may be any form of storage medium that is read by a processor of a computer device, including but not limited to a nonvolatile memory, a volatile memory, a ferroelectric memory, etc., on which a computer program is stored, and when the processor of the computer device reads and executes the computer program stored in the memory, the steps of a multi-dimensional industrial network behavior abnormality detection method described above may be implemented.

The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately expanded or reduced according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims

1. The method for detecting the behavior abnormality of the multi-dimensional industrial network is characterized by comprising the following steps of:

s1, constructing a behavior analysis model of multi-view association analysis;

2. The method for detecting abnormal behavior of a multi-dimensional industrial network according to claim 1, wherein S1 specifically comprises the following steps:

for each domain D, d= {1,.. Because the information of each domain is correlated and the domains share a completely consistent node set, the cross-domain graph is constructed as an undirected weighted multigraph G, which contains a node set V with N nodes and an edge set ε with D different types, i.e. ε = {ε_1, ..., ε_D};

3. The method for detecting abnormal behavior of a multi-dimensional industrial network according to claim 1, wherein S2 specifically comprises the following steps:

4. The method for detecting abnormal behavior of a multi-dimensional industrial network according to claim 3, wherein S24 is specifically implemented by calculating the similarity between the embedding vector of node i and that of each of its candidate neighbor nodes j; selecting the k candidate neighbor nodes with the largest normalized dot product with node i as the neighbors of node i, and forming an adjacency matrix A from all the nodes and their neighbor nodes; wherein the value of k is selected according to expectations and prior knowledge;

A_ji = 1{j ∈ TopK({e_ki : k ∈ C_i})} (3)

the working process of the shared convolution layer is as follows:

wherein the terms of the formula denote, respectively: the shared embedding of node i at layer l+1 on the multigraph G; the input shared feature matrix of node i; the neighbor set of node i obtained from the adjacency matrix A_s learned from the multigraph G; W, the trained shared weight matrix; α_i,j, the attention coefficient between nodes i and j; and the shared node embedding output by layer l of the cross-domain shared graph convolutional layer.

5. The method for detecting abnormal behavior of a multi-dimensional industrial network according to claim 1, wherein S3 comprises the steps of:

wherein the terms of the formula denote, respectively: the node embedding on a single view output by the (l+1)-th domain-specific convolutional layer; the node embedding matrix in each dimension at layer l, computed from the shared node embedding of layer l; the neighbor set of node i obtained from the adjacency matrix A_s learned on the corresponding view; W_d, the trained shared weight matrix; α_i,j, the attention coefficient between nodes i and j; and the output of the l-th domain-specific graph convolutional layer, namely the set of node embeddings in each dimension.

6. The method for detecting abnormal behavior of a multi-dimensional industrial network according to claim 1, wherein in S4 the new shared node embedding learned in each dimension is converted, by linear mapping at the output layer, into a time-series prediction reconstructed from the embedded features; the predicted value is compared directly, across dimensions, with the current data flow value of the industrial control network; anomaly assessment is performed according to the deviation between the predicted and actual values in each dimension and a threshold; and the assessment results in each dimension are finally used as the anomaly detection result of the industrial control system.

7. The method for detecting behavioral anomalies in a multi-dimensional industrial network of claim 1, further comprising S6 a multi-gradient descent optimization based on multi-task learning, including,

S61, in each training process, the model calculates the relative weight of each task according to the parameter information of each task; the weights are dynamically adjusted during training to balance the learning of the model among the plurality of tasks; the gradient of each task is calculated and the parameter vector is continuously updated using the gradients until the parameter vector meets the KKT conditions, whereupon the Pareto optimal solution of the original problem is found;

8. An electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of a multi-dimensional industrial network behavior anomaly detection method according to any one of claims 1-7 when the computer program is executed.

9. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements a multi-dimensional industrial network behavior anomaly detection method according to any one of claims 1 to 7.