CN115146299B

CN115146299B - Safety trusteeship service method based on knowledge graph and domain adaptation

Info

Publication number: CN115146299B
Application number: CN202211083553.9A
Authority: CN
Inventors: 孙捷; 车洵; 梁小川; 胡牧; 金奎�; 孙翰墨; 程佳
Original assignee: Nanjing Zhongzhiwei Information Technology Co ltd
Current assignee: Big Data Security Technology Co ltd
Priority date: 2022-09-06
Filing date: 2022-09-06
Publication date: 2022-12-09
Anticipated expiration: 2042-09-06
Also published as: CN115146299A

Abstract

The invention discloses a safe trusteeship service method based on knowledge graph and domain adaptation, comprising the following steps: inputting the prepared network emergency response knowledge graph set into a knowledge graph redundancy processing module, removing redundant features in different knowledge graphs, extracting features with high correlation, and using the processed features as a source domain of a server side; training a network security event inference model by using a source domain at a server side, and broadcasting parameters of the trained network security event inference model to each client side; each client takes the network security log file as a target domain, different source domain-target domain pairs are constructed for different clients, and a network security event inference model of the client is trained; after the training of the network security event inference model of the client is finished, the client uploads the parameters of the reinforced inference module in the local network security event inference model to the server; the method has the characteristic of effectively improving the processing efficiency of the network security event.

Description

Safety trusteeship service method based on knowledge graph and domain adaptation

Technical Field

The invention relates to the technical field of network security, in particular to a safe hosting service method based on a knowledge graph and domain adaptation.

Background

With the increasing innovation and development of internet technology, the network security problem is also more severe, the scale of network attack is increasingly organized, the attack means is continuously changed, diversification and structuralization are realized, and the network emergency response work is also more important.

At present, the traditional network emergency response carries out plan matching according to the content of an alarm event based on characteristics such as keywords or indexes, and the traditional means has the defects of low matching efficiency and low matching accuracy, so that the network security event which is increasingly complicated and has high comprehensive degree is difficult to solve. Meanwhile, the traditional means needs a large amount of manual work to participate in data analysis, has high requirements on data formatting of characteristic information, and is difficult to meet the accuracy requirement of plan matching after a network security incident occurs. The network security emergency response knowledge graph comprises a large amount of characteristic information of an attack means and a corresponding solution, the knowledge graph stores data through a graph structure, the data relation stored by the structure is a non-single relation, and redundant information in the graph is more, so that the subsequent processing is not facilitated. The safety hosting Service (MSS) submits part of heavy and repeated safety operation work to a professional cloud Service provider, and the professional safety operation team develops continuous analysis and operation Service. The enterprise turning to the security hosting service provider can relieve the pressure of the enterprise on information security every day, and by means of the advantages of the security hosting service provider in some security fields, the short boards of the enterprise in security construction or operation management can be supplemented, so that the security management efficiency is improved, and therefore, a security hosting service method based on knowledge graph and domain adaptation is urgently needed to be provided to solve the problems.

Disclosure of Invention

To achieve the above object, the inventor provides a safe hosting service method based on knowledge graph and domain adaptation, comprising the following steps:

s1: preparing a network emergency response knowledge graph set, inputting the network emergency response knowledge graph set into a knowledge graph redundancy processing module, removing redundant features in different knowledge graphs through a graph capsule neural network with self-adaptive feature selection, extracting features with high correlation, and fusing the network emergency response knowledge graph set into a new feature set serving as a source domain of a server side;

s2: training a network security event inference model by using a source domain at a server side, wherein the network security event inference model adopts sub-capsules to encode the characteristics in the source domain, and strengthens the semantic information after encoding through a local reconstruction module;

s3: assembling a plurality of sub-capsules into component capsules, inputting the component capsules into a reasoning module for decoding, generating a network security emergency response plan by the reasoning module through decoding semantic information, and broadcasting parameters of a trained network security event reasoning model to each client;

s4: each client takes the network security log file as a target domain, different source domain-target domain pairs are constructed for different clients, a network security event reasoning model of the client is trained, a reinforced reasoning module is added to the network security event reasoning model of each client during reasoning, and a proper network security emergency response plan is selected according to the result of the reinforced reasoning module;

s5: after the training of the network security event inference model of the client is finished, the client uploads the parameters of the reinforced inference module in the local network security event inference model to the server.

As a preferred embodiment of the present invention, the S1 further includes the steps of:

s101: giving a set of network emergency response knowledge graphs, denoted as S ^N For S ^N The first knowledge-graph S in ¹ Performing redundant processing and relevant feature selection of features in the knowledge graph, and performing redundant processing and relevant feature selection on the knowledge graph S ¹ Respectively constructing node capsules by using nodes a and nodes b of the middle p-th layer

And

s102: calculating the feature mapping vectors of the node a and the node b through the feature mapping layer of the capsule diagram neural network

The expression is as follows:

wherein,

a node capsule representing the ith neighbor node of the p-th layer node a, m represents the number of neighbor nodes of the node a, and represents the node capsule of the node a when i =0,

the node capsule represents the jth neighbor node of the pth layer of node b, n is the number of neighbor nodes of the node b, when j =0, the node capsule represents the node b, and MLP is a multilayer perceptron;

measuring the correlation of two node capsules by using a mutual information function Kinfo, wherein the expression is as follows:

wherein,

to represent

The transpose of (a) is performed,

represent

Exp represents an exponential function with a natural constant e as a base number;

s103: for knowledge graph S ¹ Executing the step S102 on any two nodes in the same layer, carrying out self-adaptive selection on the node feature mapping in each layer, and removing the feature mapping with high redundancy between layersUntil all layers are calculated, and a compressed node feature mapping set in S1 is obtained

The expression is as follows:

wherein f is ^r Set of feature maps representing the r-th layer, f ^s Representing a feature mapping set of the s-th layer, and softmax representing a normalized exponential function;

s104: for S ^N The rest knowledge graphs in (1) are circulated from S102 to S103, and a feature mapping set F of all knowledge graphs is obtained ^S ，

F is to be ^S The source domain is used for training a network security event inference model of the server side.

As a preferred embodiment of the present invention, the S2 includes the steps of:

s201: the network security event reasoning model at the server side adopts two capsule encoders based on a self-attention mechanism to carry out reasoning on the network security events in the source domain

Encoding to generate two sub-capsules

And

the expression is as follows:

wherein Encoder _key Is a key capsule feature extractor composed of a residual error network-50, encoder _value Is a value capsule feature extractor, consisting of a residual error network-50;

s202: and (3) performing characteristic reconstruction on the two sub-capsules by using a local reconstruction module, wherein the characteristic reconstruction is used for abundant semantic information, and the expression is as follows:

wherein

And respectively representing the key capsule and the value capsule after the characteristic reconstruction, and representing the characteristic reconstruction vector by tau and mu, wherein the characteristic reconstruction vector is obtained by automatically learning a network security event inference model of the client during training.

As a preferred embodiment of the present invention, the S3 further includes the steps of:

s301: assembling the output characteristics of the sub-capsules into component capsules, wherein the expression is as follows:

wherein,

representing in the source domain

The resulting part-capsule is then ready for use,

representing two weight parameters, which are automatically learned by a network security event inference model of a client during training and are used for controlling the weight of a key capsule and a value capsule in the characteristics;

s302: for source field F ^S The other subsets execute the steps S201, S202 and S301, the component capsules are spliced together and then input into an inference module, and decoding is carried out in the inference module, namely, a semantic information result in a coding stage is converted into a three-dimensional embedded expression through upsampling, and the expression is as follows:

where Cat denotes the part capsule splicing operation and Decoder denotes the Decoder, consisting of 4 3 x 3-dimensional convolutions, F _Decoder Is a three-dimensional embedded representation.

s303: generating a network security emergency plan, namely performing jump connection on the decoded features and the sub-capsule features in the encoding stage, constructing the network security emergency plan according to the time sequence information, recording the network security emergency plan as Play, setting that no more than m events to be treated exist in one network security emergency plan, wherein the expression is as follows:

wherein, in [ a ] _m-1 ，t _m-1 ]In (a) _m-1 Representing an event to be treated, t _m-1 Indicating the order of the events in the network security emergency protocol,

representing the jth source domain in the source domain

Produced byThe components of the capsule are taken together,

representing jump connection, FAM representing a feature aggregation layer which is composed of convolution of 3 x 3 and double upsampling, and PAM representing a pyramid pooling layer for processing feature vectors of different shapes;

s304: the overall loss function expression of the network security event inference model of the server side is as follows:

wherein, DICE is a similarity measure function, and its expression is:

representing the kth source domain in the source domain

The key capsules in (1) and the key capsules with reconstructed characteristics;

representing the kth source domain in the source domain

A medium value capsule and a reconstructed value capsule;

the expression of the inference module penalty function is:

wherein,

for computing the kth source domain in the source domain

Resulting component capsules

And the p-th event to be disposed [ a ] in the network security emergency plan _p ，t _p ]Relative entropy between;

s305: sending parameters of a network security event inference model of a server side to each Client side ⁱ And i is more than 0 and less than M +1, which indicates that M clients are shared.

As a preferred embodiment of the present invention, the S4 further includes:

s401: for each Client ⁱ Fixing the sub-capsule coding of the network security event inference model of the server end and the parameters of the local reconstruction module, and training the inference module and the reinforced inference module;

client of ith station ⁱ The corresponding target domain is Clog ⁱ Using the domain alignment penalty χ based on the information entropy, the expression is:

wherein,

representing a source domain F ^S For the target domain Clog on the ith client ⁱ Mathematical expectation value of, here

Is each subset in the source domain

And target domain Clog on the ith client ⁱ Relative entropy between, the expression:

wherein, log represents a logarithmic operation;

s402: the method is characterized in that a reinforced reasoning module is introduced into a network security time reasoning model of the client, parameters of the reinforced reasoning module in each client are different, and the reinforced reasoning module reasons an original result of the network security event reasoning model according to local configuration and is used for improving the accuracy rate of network security event handling and the robustness of the model.

As a preferred embodiment of the present invention, the S4 further includes:

s403: when the client is the 1 st client, the final network security emergency plan expression generated by the 1 st client is as follows:

wherein,

the representation server side guides the network security emergency plan generated by the network security event reasoning model in the client side,

represents the final network security emergency plan, clog, on the 1 st client ¹ Representing a target domain on the 1 st client, wherein G and G are convolution layers of 1 × 1, and Refine is a reinforced inference layer which is formed by connecting two groups of activation functions and convolution of 3 × 3 through residual errors;

s404: the loss function expression of the reinforced reasoning module is as follows:

wherein,

s405: the total loss function for model training on the 1 st client is:

wherein

A loss function representing an inference module of a network security event inference model on the server,

a reinforced inference module loss function representing a network security event inference model on a client, a domain alignment loss based on information entropy, | | | | sweet wind ₂ Representing a vector 2 norm operation.

As a preferred embodiment of the present invention, the S5 further includes: when the source domain of the server side is updated in the future, the server finely adjusts the network security event inference model according to the parameters, and the server and the client side only carry out model parameter interaction and do not relate to private information.

Different from the prior art, the technical scheme has the following beneficial effects:

(1) The method takes network security events of different terminals as a plurality of target domains, the server side uses a network security knowledge graph as a source domain, the server side trains a model well and transmits model parameters to the client side, so that inference on different client sides is guided, namely, only model parameter information is transmitted between the server and the client side without transmitting privacy information such as network security log files, the network security events can be analyzed and inferred through the client side, and a network security emergency response plan is automatically matched and disposed.

(2) The conventional network security knowledge graph has a large number of redundant features, the redundant features interfere model training, and accordingly the generalization effect of the model is poor.

Drawings

FIG. 1 is a diagram illustrating the overall architecture of a method according to an embodiment;

FIG. 2 is a diagram of a server side architecture in accordance with an embodiment;

fig. 3 is a diagram of a client architecture in accordance with an embodiment.

Detailed Description

To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.

Referring to fig. 1 to 3, as shown in the figure, the present embodiment provides a security hosting service method based on a knowledge graph and domain adaptation, including the following steps:

s1: preparing a network emergency response knowledge graph set, inputting the network emergency response knowledge graph set into a knowledge graph redundancy processing module, removing redundant features in different knowledge graphs through a graph capsule neural network with self-adaptive feature selection by the knowledge graph redundancy processing module, extracting features with high correlation, and fusing the network emergency response knowledge graph set into a new feature set serving as a source domain of a server side;

In the above embodiment, S1 further includes the steps of:

s101: giving a set of network emergency response knowledge maps, denoted S ^N For S ^N First knowledge-graph S in (1) ¹ Performing redundant processing and relevant feature selection of features in the knowledge graph, and performing redundant processing and relevant feature selection on the knowledge graph S ¹ Respectively constructing node capsules by nodes a and nodes b of the p-th layer

And

as shown in the knowledge-graph redundancy process in fig. 2, S102: calculating the feature mapping vectors of the node a and the node b in the S101 through the feature mapping layer of the capsule diagram neural network

Namely, the characteristics of two nodes and the relationship between the characteristics are expressed by using the characteristic mapping vector, so that the purpose of removing redundancy is achieved, and the expression is as follows:

wherein,

the node capsule represents the jth neighbor node of the pth layer node b, n is the number of neighbor nodes of the node b, and represents the node capsule of the node b when j = 0; MLP is a multilayer perceptron, feature representations of node capsules with low correlation are discarded, feature representations of node capsules with high correlation are reserved, and the effect of adaptive feature selection is achieved.

wherein,

to represent

The transpose of (a) is performed,

to represent

s103: for knowledge graph S ¹ Executing the step S102 on any two nodes in the same layer, carrying out self-adaptive selection on the node feature mapping in each layer, removing the high-redundancy feature mapping between layers until all the layers are calculated, and obtaining the compressed S ¹ Middle node feature mapping set

The expression is as follows:

s104: for S ^N The rest of the knowledge graphs are circulated from S102 to S103 to obtain the feature mapping set F of all the knowledge graphs ^S ，

As shown in the network security event inference model of FIG. 2, let F ^S The source domain is used for training a network security event inference model of the server side.

In the above embodiment, the S2 includes the steps of:

Encoding to generate two sub-capsules

And

the expression is as follows:

wherein Encoder _key Is a key capsule feature extractor consisting of a residual error network-50 (ResNet)-50) composition, encoder _value Is a value capsule feature extractor, consisting of a residual error network-50 (ResNet-50);

s202: the local reconstruction module is used for reconstructing the characteristics of the two sub-capsules, so that the two sub-capsules have richer semantic information, and the expression is as follows:

wherein

The key capsule and the value capsule after characteristic reconstruction are respectively represented, and the tau and the mu represent characteristic reconstruction vectors which are automatically learned by a network security event inference model of a client during training, so that the reconstructed characteristics have richer semantic information.

In the above embodiment, the S3 further includes the following steps:

wherein,

representing in the source domain

The resulting part-capsule is then ready for use,

s302: for source field F ^S The rest subsets execute the steps S201, S202 and S301, the component capsules are spliced together and then input into an inference module, and decoding is carried out in the inference module, namely, a semantic information result in a coding stage is converted into a three-dimensional embedded expression through upsampling, and the expression is as follows:

where Cat denotes the part capsule splicing operation and Decoder denotes the Decoder, consisting of 4 3 x 3-dimensional convolutions, F _Decoder Is a three-dimensional embedded representation;

in the above embodiment, the S3 further includes the following steps:

s303: the second operation in the inference module is network security emergency plan generation, namely jump connection is carried out on the decoded features and the sub-capsule features in the encoding stage, the network security emergency plan is constructed according to the time sequence information and recorded as Play, no more than m events to be treated in one network security emergency plan are set, and the expression is as follows:

representing the jth source domain in the source domain

The resulting part-capsule is then ready for use,

representing jump connection, FAM representing a feature aggregation layer which is composed of 3 × 3 convolution and double upsampling, and PAM representing a pyramid pooling layer, so that feature vectors of different shapes can be conveniently processed;

wherein, DICE is a similarity measure function, and its expression is:

representing the kth source domain in the source domain

representing the kth source domain in the source domain

A medium value capsule and a reconstructed value capsule;

the expression of the inference module penalty function is:

wherein,

for computing the kth source domain in the source domain

Resulting part capsule

in this embodiment, the MSS sends parameters of a network security event inference model of a server to each Client ⁱ And O < i < M +1, which means that M clients are shared.

In the above embodiment, S4 further includes the step of:

for each Client ⁱ As shown in fig. 3, the sub-capsule coding and local reconstruction module of the network security event inference model at the server end is fixed, and only the inference module and the reinforced inference module are trained;

client side of the ith station ⁱ For example, the corresponding target domain is Clog ⁱ To solve the problem of inconsistent content distribution in the source domain and the target domain, the present embodiment uses a domain alignment loss χ based on the information entropy, and the expression is:

wherein,

Is each subset in the source domain

And a target domain Clog on the ith client ⁱ Relative entropy between, the expression:

wherein, log represents a logarithmic operation;

because the physical and software environments of each client are inconsistent, a reinforced reasoning module is additionally introduced into the network security time reasoning model of the client, parameters of the reinforced reasoning module in each client are different, and the reinforced reasoning module can reasoned the original result of the network security event reasoning model according to the configuration of the local computer, so that the accuracy rate of handling the network security event and the robustness of the model can be improved;

taking the 1 st client as an example, the final network security emergency plan expression generated by the 1 st client is as follows:

wherein,

the network security emergency plan generated by the network security event inference model in the client is guided by the presentation server,

the loss function expression of the reinforced reasoning module is as follows:

wherein,

thus, the total loss function for model training on the 1 st client is:

wherein

After the training of the network security event inference model on the client is completed, the parameters of the enhanced inference module are uploaded to the server, as shown in fig. 1, when the source domain of the server end is updated in the future, the server can finely tune the network security event inference model through the parameters, and the server and the client only carry out model parameter interaction and do not relate to private information.

In order to verify the accuracy of the method, using Malware Training Sets and mistre D3fend (network emergency response knowledge graph), the Malware Training Sets are a machine learning data set, and are intended to provide a useful classification data set for researchers who wish to use machine learning techniques to deeply study Malware analysis. Forming a contrast experiment by adopting 4 different model structures and the method used by the method, and calculating the accuracy of the semantic feature similarity of the data set; the experimental results are shown in the table below, where the F1 value = correct rate recall 2/(correct rate + recall) is used to characterize the actual average of both accuracy and recall.

The results of the BiGRU bidirectional gating cycle unit, the Siamese-BiGRU twin neural network-bidirectional gating cycle unit, the Linkage hierarchical clustering and the BERT + WMD distance model (self-coding language model) are compared, so that the method has high accuracy, the accuracy reaches 87.1 percent, and the recall rate reaches 85.1 percent, which shows that the method can deduce more effective samples; the F1 value is a harmonic mean of the accuracy and the recall rate, and the F1 value of the method reaches 86.1 percent. The experimental results prove that the method can effectively reason the network security events and generate the network security emergency plan.

In addition, the method takes the network security events of different terminals as a plurality of target domains, the server side uses the network security knowledge graph as a source domain, the model is trained well at the server side and the model parameters are transmitted to the client side, so that inference on the target domains is guided on different client sides, namely, only the model parameter information is transmitted between the server and the client side without transmitting privacy information such as network security log files and the like, the network security events can be analyzed and inferred through the client side, and the network security emergency response plan is automatically matched and handled. The conventional network security knowledge graph has a large number of redundant features, the redundant features interfere model training, and accordingly the generalization effect of the model is poor.

It should be noted that, although the above embodiments have been described herein, the invention is not limited thereto. Therefore, based on the innovative concepts of the present invention, the technical solutions of the present invention can be directly or indirectly applied to other related technical fields by changing and modifying the embodiments described herein or by using the equivalent structures or equivalent processes of the content of the present specification and the attached drawings, and are included in the scope of the present invention.

Claims

1. A safe hosting service method based on knowledge graph and domain adaptation is characterized by comprising the following steps:

s5: after the training of the network security event inference model of the client is finished, the client uploads the parameters of the reinforced inference module in the local network security event inference model to the server;

the S1 further comprises the following steps:

s101: giving a set of network emergency response knowledge maps, denoted S ^N For S ^N First knowledge-graph S in (1) ¹ Performing redundant processing and correlation feature selection on features in the knowledge graph to obtain a knowledge graph S ¹ Respectively constructing node capsules by nodes a and nodes b of the p-th layer

And

The expression is as follows:

wherein,

representing node capsules of a jth neighbor node of a pth layer node b, wherein n is the number of neighbor nodes of the node b, when j =0, representing the node capsules of the node b, and MLP is a multilayer perceptron;

wherein,

to represent

The transpose of (a) is performed,

to represent

The expression is as follows:

s104: for S ^N The rest of the knowledge graphs are circulated from S102 to S103 to obtain the feature mapping set F of all the knowledge graphs ^S ,

F is to be ^S For use as source domainTraining a network security event reasoning model at a server side;

the S2 comprises the following steps:

Encoding to generate two sub-capsules

And

the expression is as follows:

wherein

2. The knowledge-graph and domain-adaptation based secure hosting service method according to claim 1, wherein the S3 further comprises the steps of:

wherein,

representing in the source domain

The resulting part-capsule is then ready for use,

wherein Cat representsPart capsule splicing operation, decoder, consisting of 4 3 by 3 dimensional convolutions, F _Decoder Is a three-dimensional embedded representation.

3. The knowledge-graph and domain-adaptation based secure hosting service method according to claim 2, wherein the S3 further comprises the steps of:

s303: generating a network security emergency plan, namely performing jump connection on the decoded features and the sub-capsule features in the encoding stage, constructing the network security emergency plan according to the time sequence information, recording the network security emergency plan as Play, setting that no more than m events to be handled exist in one network security emergency plan, wherein the expression is as follows:

wherein, in [ a ] _m-1 ,t _m-1 ]In (a) _m-1 Representing an event to be treated, t _m-1 Indicating the sequence of the event in the network security emergency protocol,

representing the jth source domain in the source domain

The resulting part-capsule is then ready for use,

wherein, DICE is a similarity measure function, and its expression is:

representing the kth source domain in the source domain

The key capsule in (1) and the key capsule after characteristic reconstruction;

representing the kth source domain in the source domain

The value of (1) and the value after reconstruction capsule;

the expression of the inference module penalty function is:

wherein,

for computing the kth source domain in the source domain

Resulting part capsule

And the p-th event to be disposed [ a ] in the network security emergency plan _p ,t _p ]Relative entropy between;

s305: sending parameters of a network security event inference model of a server side to each Client side ⁱ ,0<i<And M +1, representing M clients in total.

4. The knowledge-graph and domain-adaptation based secure hosting service method according to claim 3, wherein the S4 further comprises the steps of:

wherein,

Is each subset in the source domain

wherein, log represents a logarithmic operation;

5. The knowledge-graph and domain-adaptation based secure hosting service method of claim 4, wherein the S4 further comprises the steps of:

wherein,

wherein,

s405: the total loss function for model training on the 1 st client is:

wherein

6. The knowledge-graph and domain-adaptation based secure hosting service method according to claim 1, wherein the S5 further comprises the steps of: when the source domain of the server end is updated in the future, the server finely adjusts the network security event reasoning model through the parameters, and the server and the client only carry out model parameter interaction without involving private information.