CN114037091B

CN114037091B - Expert joint evaluation-based network security information sharing system, method, electronic equipment and storage medium

Info

Publication number: CN114037091B
Application number: CN202111332573.0A
Authority: CN
Inventors: 叶麟; 胡灵娟; 黄洁润; 胡振鹏; 彭凤杰; 杨晓丽; 杨立炳; 叶甜甜; 成燕; 梁稚媛; 张宏莉; 杨语晨; 尹公主; 孟超
Original assignee: Harbin Institute of Technology; Shanghai Pudong Development Bank Co Ltd
Current assignee: Harbin Institute of Technology; Shanghai Pudong Development Bank Co Ltd
Priority date: 2021-11-11
Filing date: 2021-11-11
Publication date: 2024-05-28
Anticipated expiration: 2041-11-11
Also published as: CN114037091A

Abstract

The application discloses a network security information sharing system and method based on expert joint evaluation, together with an electronic device and a storage medium, and belongs to the technical field of network security. The application introduces a dynamically weighted expert committee and the idea of active learning into the process of information sharing and risk assessment. The central node uses the trained risk classifier to assess the risk category of all security information and feeds the assessment results back to the experts at each node, who refine their own analysis accordingly. The central node then backs up all security information together with the risk categories output by the risk classifier and uploads the data to the upper-level data service processing center. This improves the accuracy of risk assessment in the network security information sharing mechanism and helps strengthen the analysis capability of its members.

Description

Expert joint evaluation-based network security information sharing system, method, electronic equipment and storage medium

Technical Field

The application relates to a network security information sharing system, method, electronic device and storage medium, in particular to ones based on expert joint evaluation, and belongs to the technical field of network security.

Background

In recent years, network security attacks have grown in intensity, and no single organization can defend itself with isolated, sparse and fragmented security information. Establishing a network security information sharing mechanism is therefore a central concern of current network security work. Such a mechanism can effectively alleviate information asymmetry, mobilize and coordinate society as a whole for real-time collective prevention and control, and improve the effectiveness of network security management based on security big data. Within the mechanism, the security information of each department is exchanged and aggregated, and the risk assessment model learns the characteristics of security risk categories from the aggregated information, which avoids the one-sidedness of earlier approaches that learned from single-node information.

There has been some research on information sharing mechanisms, but little of it addresses network security information sharing specifically. Prior work mainly adopts the following schemes:

(1) Information sharing mechanisms based on blockchain. Blockchain technology offers decentralization, pseudonymity and joint maintainability, and many systems build information sharing on it. Dorri et al. devised a distributed privacy-preserving security architecture based on a consortium chain, in which transaction data is packaged by block manager nodes authenticated by a trusted third party, and each node connects to the blockchain and generates transactions at time intervals. Kang et al. proposed a two-stage soft security enhancement scheme that separates miner selection from block verification: a reputation-based voting scheme ensures secure miner selection in the first stage, and in the second stage each new block is further verified and audited by the standby miners. However, because network security information is highly confidential, it cannot be passed directly over a blockchain.

(2) Information sharing mechanisms based on incentive theory. Information sharing mechanisms depend heavily on the quality of the information each node provides, so many systems apply incentive theory to strengthen the nodes' willingness to share and their analysis capability. Huan et al. analyzed the random disturbance of market demand caused by sudden events and proposed a dairy supply chain information sharing mechanism based on overall benefit. Zhang et al. proposed a construction safety production information sharing and credit evaluation mechanism based on incentive theory, applying incentive measures of different strength to construction enterprises of different levels in combination with the analytic hierarchy process.

(3) Information sharing mechanisms based on game theory. During information sharing, each node judges and selects information independently and acts to maximize its own benefit, so the process is a continuous game. Wu et al. analyzed the total benefit of technical innovation with an evolutionary game model and studied how the benefit distribution ratio influences the information sharing mechanism. Lv Lu et al. used game theory to build an evolutionary game model of information resource sharing between military and civilian enterprises, and analyzed the stability of the equilibrium points of the replicator dynamic equation to obtain the evolutionarily stable strategy. However, in a network security information sharing mechanism most participating nodes are the security departments of their organizations, whose benefits cannot be quantified directly, so game theory is not applicable.

In a network security information sharing mechanism, the transmitted information is highly private, the analysis process depends heavily on expert knowledge, and the analysis capability of members varies widely, so the above schemes cannot be applied directly.

Disclosure of Invention

In view of the above, the application provides a network security information sharing system, method, electronic device and storage medium based on expert joint evaluation, to address the large gap in analysis capability among members and the low accuracy of risk assessment in existing network security information sharing mechanisms. The application introduces a dynamically weighted expert committee and the idea of active learning into the information sharing and risk assessment process, which improves the accuracy of risk assessment in the mechanism and helps strengthen the analysis capability of its members.

The technical scheme of the application is realized as follows:

Scheme I: a network security information sharing system based on expert joint evaluation, comprising n end nodes and a central node.

Each end node comprises a data preprocessing module, an expert data labeling module and an expert weight updating module; the central node comprises an initial risk classifier training module, an expert annotation accuracy calculation module, an update judging module and a data uploading module.

The data preprocessing module is used for cleaning the data and preliminarily labeling the risk category of the network security information content.

The initial risk classifier training module is used for receiving network security data from each end node, aggregating and organizing it, selecting a small number of samples that carry risk labels to train an initial risk classifier, classifying the unlabeled samples in the aggregated information with the initial risk classifier, and distributing the samples whose uncertainty is higher than a threshold ε to the end nodes, where ε is set according to the application scenario.

The expert data labeling module is used for having the expert committee of each end node label the distributed samples.

The expert annotation accuracy calculation module is used for obtaining the weight of each expert in the expert committees of the end nodes and deriving each sample's label and its probability distribution by weighted voting.

The expert weight updating module is used for updating the expert weights in the expert committee according to the sample labeling accuracy.

The update judging module is used for adding those expert-labeled samples whose confidence is greater than a constant λ to the initial risk classifier training module and incrementally training the risk classifier, where λ is set according to the application scenario.

The data uploading module is used for backing up all security information together with the risk categories output by the risk classifier and then uploading them to the upper-level data service processing center.

Scheme II: a network security information sharing method based on expert joint evaluation, comprising the following steps:

Step one: each end node preprocesses the network security information to be uploaded.

The preprocessing comprises data cleaning and preliminary labeling of the risk category of the network security information content.

Step two: the central node receives network security data from all end nodes, aggregates and organizes it, and selects a small number of samples that carry risk labels to train an initial risk classifier.

Step three: the unlabeled samples in the aggregated information are classified with the risk classifier, and the samples whose uncertainty is higher than a threshold ε are distributed to the expert committee formed by the nodes for labeling; the threshold ε is set freely according to the specific application scenario.

Step four: the weight of each expert in the expert committees of the end nodes is obtained, and each sample's label and its probability distribution are derived by weighted voting.

Step five: the expert weights in the expert committee are updated according to the sample labeling accuracy.

Step six: the expert-labeled samples whose confidence is greater than a constant λ are added to the initial risk classifier training module and the risk classifier is trained incrementally, where λ is set according to the application scenario; steps two through four are repeated until one of the following conditions is met:

① no new unlabeled samples can be selected for the expert committee;

② the preset number of iterations is reached.

Step seven: the central node uses the trained risk classifier to assess the risk category of all security information and feeds the results back to the experts at each node, who refine their own analysis accordingly.

Step eight: the central node backs up all security information together with the risk categories output by the risk classifier and uploads them to the upper-level data service processing center.
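The iteration in steps three through six can be sketched as a small driver loop. This is a minimal illustration under stated assumptions, not the patented implementation: the callables `train`, `predict_proba` and `committee_label` are hypothetical placeholders for the risk classifier and the expert committee, entropy is computed in base 2, and the classifier is retrained from scratch each round rather than incrementally.

```python
import math

def entropy(probs):
    """Shannon entropy (base 2, an assumption) of a class-probability vector."""
    return -sum(p * math.log(p, 2) for p in probs if p > 0.0)

def active_learning_rounds(train, predict_proba, committee_label,
                           labeled, unlabeled, eps, lam, max_rounds):
    """Classify, select uncertain samples, collect committee labels, retrain.

    train(labeled) -> model; predict_proba(model, x) -> class probabilities;
    committee_label(x) -> (label, confidence). All three are hypothetical
    callables supplied by the caller.
    """
    model = train(labeled)
    for _ in range(max_rounds):                  # stopping condition 2
        uncertain = [x for x in unlabeled
                     if entropy(predict_proba(model, x)) > eps]
        if not uncertain:                        # stopping condition 1
            break
        for x in uncertain:
            label, confidence = committee_label(x)
            if confidence > lam:                 # keep confident labels only
                labeled.append((x, label))
                unlabeled.remove(x)
        model = train(labeled)                   # retrain on the enlarged set
    return model, labeled, unlabeled

# Toy stand-ins: a 1-D boundary at the mean of the labeled x values.
def toy_train(labeled):
    return sum(x for x, _ in labeled) / len(labeled)

def toy_predict_proba(model, x):
    p1 = 1.0 / (1.0 + math.exp(-(x - model)))
    return [1.0 - p1, p1]

def toy_committee(x):
    return (1 if x > 0 else 0), 0.9

model, labeled, unlabeled = active_learning_rounds(
    toy_train, toy_predict_proba, toy_committee,
    labeled=[(-2.0, 0), (2.0, 1)], unlabeled=[-3.0, -0.1, 0.2, 3.0],
    eps=0.9, lam=0.8, max_rounds=3)
```

In the toy run only the two samples near the decision boundary are sent to the committee; the loop then stops early because no remaining sample exceeds the entropy threshold.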

Further: in the first step, the network security information content comprises security information assets and various element indexes; the data cleansing processing operation includes checking data consistency, processing invalid values and missing values.
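As an illustration of the cleaning operations named above (consistency check, invalid values, missing values), the following sketch filters a list of records. The record layout (`features`, `risk`), the choice to drop rather than impute bad records, and the label set are all assumptions made for the example; the application does not specify them.

```python
def clean_records(records, n_features, valid_labels):
    """Drop records with missing, inconsistent, or invalid fields."""
    cleaned = []
    for rec in records:
        features = rec.get("features")
        label = rec.get("risk")
        # Missing values: no feature list, or a None entry inside it.
        if features is None or any(v is None for v in features):
            continue
        # Consistency: every record must carry the same number of features.
        if len(features) != n_features:
            continue
        # Invalid values: a preliminary risk label, if present, must be known.
        if label is not None and label not in valid_labels:
            continue
        cleaned.append(rec)
    return cleaned

raw = [
    {"features": [0.2, 1.0, 3.5], "risk": "high"},
    {"features": [0.1, None, 2.0], "risk": "low"},     # missing value
    {"features": [0.4, 0.9], "risk": "low"},           # inconsistent length
    {"features": [0.3, 0.8, 1.1], "risk": "severe"},   # unknown label
    {"features": [0.5, 0.7, 2.2], "risk": None},       # unlabeled but valid
]
cleaned = clean_records(raw, n_features=3, valid_labels={"low", "high"})
```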

Further: in step three, the uncertainty is measured by the amount of information in the sample's class probability distribution computed by the risk classifier, that is, by its information entropy:

$$H(x) = -\sum_{i=1}^{n} p_i \log p_i$$

where n is the total number of risk categories, p_i is the probability that the classifier assigns sample x to the i-th category, and H(x) is the information entropy of sample x.
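The entropy-based selection can be sketched in a few lines. The logarithm base is not stated in the text; base 2 is assumed here so that a two-class distribution has entropy between 0 and 1, and the function names are illustrative.

```python
import math

def entropy(probs, base=2.0):
    """Information entropy of a class-probability vector; 0*log(0) counts as 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

def select_uncertain(prob_rows, eps):
    """Indices of samples whose entropy exceeds the threshold eps."""
    return [k for k, probs in enumerate(prob_rows) if entropy(probs) > eps]

# Three samples with classifier outputs for two risk categories.
rows = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
picked = select_uncertain(rows, eps=0.6)
```

Only the first sample, where the classifier is undecided, exceeds the threshold and would be passed to the expert committee.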

Further: the specific operation of the fourth step is as follows:

Step four, one: if this is the first round of training, initialize the weight of each expert in the expert committee; otherwise, obtain the expert weights from the previous round.

The initial weight is defined as 1/N, where N is the total number of members of the expert committee, and W_j denotes the weight of the j-th expert.

Step four, two: the experts of each node label the sample, and the votes of the experts are aggregated by weighted voting to give the final sample label, calculated as follows:

$$\hat{y}(x) = \arg\max_{i} \sum_{j=1}^{N} W_j \, V(y_{ij})$$

where \hat{y}(x) is the final label of sample x after expert voting, i indexes the risk categories, V(y_{ij}) indicates whether the j-th expert classified the sample's risk as category i (1 if so, 0 otherwise), the summation runs over the votes of all experts, W_j is the weight of the j-th expert, and N is the total number of experts in the expert committee.
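A minimal sketch of the weighted vote follows. It assumes each expert casts one hard vote per sample and takes the normalized per-class vote mass as the sample's probability distribution; that normalization and the tie-breaking rule (lowest class index wins) are assumptions, since the text does not spell them out.

```python
def weighted_vote(votes, weights, n_classes):
    """votes[j] is the class index chosen by expert j; weights[j] is W_j."""
    scores = [0.0] * n_classes
    for label, w in zip(votes, weights):
        scores[label] += w            # adds W_j * V(y_ij) for the voted class i
    total = sum(scores)
    dist = [s / total for s in scores]
    final = max(range(n_classes), key=lambda i: scores[i])
    return final, dist

# Three experts with the initial weight 1/N each.
label, dist = weighted_vote([1, 0, 1], [1 / 3, 1 / 3, 1 / 3], n_classes=2)

# A heavily weighted expert can outvote two lighter ones.
label2, dist2 = weighted_vote([0, 1, 1], [0.6, 0.2, 0.2], n_classes=2)
```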

Further: the specific operation of the fifth step is as follows:

Step five, one: calculate the Kullback-Leibler (KL) divergence of each data label given by each expert in the committee, and evaluate the expert's classification accuracy by the sum of the KL divergences over all labels.

Further: the KL divergence of expert j's classification of sample x is calculated as follows:

$$D_{KL}\big(P(x) \,\|\, Q_j(x)\big) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i}$$

where P(x) = [p_1, p_2, ..., p_n] is the risk category probability distribution of sample x after voting, and Q_j(x) = [q_1, q_2, ..., q_n] is the risk category probability distribution of sample x computed by expert j.

Step five, two: update the expert weights according to the expert classification accuracy calculated in step five, one.

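Further: the expert weights are updated as follows: first, the sum of the KL divergences over all samples is calculated for each expert; then the logarithm of the reciprocal of that sum is taken and normalized. The calculation formula is:

$$W_j = \frac{\log\left(1 \Big/ \sum_{m=1}^{M} D_{KL}\big(P(x_m) \,\|\, Q_j(x_m)\big)\right)}{\sum_{k=1}^{N} \log\left(1 \Big/ \sum_{m=1}^{M} D_{KL}\big(P(x_m) \,\|\, Q_k(x_m)\big)\right)}$$

where D_{KL}(P(x_m) || Q_j(x_m)) is the KL divergence of expert j's classification of sample x_m, M is the total number of samples, and N is the total number of experts involved in the evaluation.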
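The KL-based weight update can be sketched as below. This follows the verbal description literally (logarithm of the reciprocal of each expert's KL sum, then normalization) and is an interpretation, not a confirmed implementation. The direction of the divergence, the `tiny` guard against log(0), and the function names are assumptions. Note that a literal reading gives a non-positive score when an expert's KL sum reaches 1 or more; how that case is handled is not stated in the text, and the sketch does not address it.

```python
import math

def kl_divergence(p, q, tiny=1e-12):
    """D_KL(P || Q) for discrete distributions; `tiny` guards against log(0)."""
    return sum(pi * math.log(pi / max(qi, tiny))
               for pi, qi in zip(p, q) if pi > 0.0)

def update_weights(kl_sums, tiny=1e-12):
    """Map each expert's summed KL divergence to a normalized weight."""
    scores = [math.log(1.0 / max(s, tiny)) for s in kl_sums]
    total = sum(scores)
    return [s / total for s in scores]

# Expert 0 agrees most closely with the committee result, expert 2 least.
weights = update_weights([0.1, 0.2, 0.5])
```

With these inputs the weights sum to 1 and decrease as the KL sum grows, so experts whose labels track the voted result gain influence in the next round.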

Scheme III: an electronic device comprising a processor and a memory for storing a computer program capable of running on the processor,

wherein the processor is configured to execute the steps of the method of Scheme II when running the computer program.

Scheme IV: a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of Scheme II.

The beneficial effects of the application are as follows:

The application provides a network security information sharing mechanism based on expert joint evaluation. By introducing a dynamically weighted expert committee and the idea of active learning into the information sharing and risk assessment process, it improves the accuracy of risk assessment in the mechanism and helps strengthen the analysis capability of its members.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:

Fig. 1 is a block diagram of a network security information sharing system based on expert joint evaluation according to the first embodiment of the present application;

Fig. 2 is a flow chart of a network security information sharing method based on expert joint evaluation according to the second embodiment of the present application;

Fig. 3 is a schematic structural diagram of an electronic device according to the present application.

Detailed Description

The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and not limiting of the application. It should be noted that, for convenience of description, only the portions related to the application are shown in the drawings.

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

The application provides a network security information sharing mechanism based on expert joint evaluation. In this mechanism, the upper-level department shares network security information, risk assessment results and the like with each lower-level department; each lower-level department reports information on threats, attacks and the like to the upper-level department; and the lower-level departments exchange information useful for network security protection with one another. By introducing active learning based on an expert committee, the application assesses the risk of shared information more accurately while also evaluating the labeling accuracy of the committee members, which improves the risk assessment capability of the system and ultimately the working efficiency of the information sharing system.

Example 1

Fig. 1 shows a block diagram of a network security information sharing system based on expert joint evaluation according to embodiment 1 of the present application.

A network security information sharing system based on expert joint evaluation comprises n end nodes and a central node. Each end node comprises a data preprocessing module, an expert data labeling module and an expert weight updating module; the central node comprises an initial risk classifier training module, an expert annotation accuracy calculation module, an update judging module and a data uploading module.

The data preprocessing module is used for cleaning the data and preliminarily labeling the risk category of the network security information content.

The initial risk classifier training module is used for receiving network security data from each end node, aggregating and organizing it, selecting a small number of samples that carry risk labels to train an initial risk classifier, classifying the unlabeled samples in the aggregated information with the initial risk classifier, and distributing the samples whose uncertainty is higher than a threshold ε to the end nodes, where ε is set according to the application scenario.

The expert data labeling module is used for having the expert committee of each end node label the distributed samples.

The expert annotation accuracy calculation module is used for obtaining the weight of each expert in the expert committees of the end nodes and deriving each sample's label and its probability distribution by weighted voting.

The expert weight updating module is used for updating the expert weights in the expert committee according to the sample labeling accuracy.

The update judging module is used for adding those expert-labeled samples whose confidence is greater than a constant λ to the initial risk classifier training module and incrementally training the risk classifier, where λ is set according to the application scenario.

The data uploading module is used for backing up all security information together with the risk categories output by the risk classifier and then uploading them to the upper-level data service processing center.

Example 2

To better explain the purpose and advantages of this embodiment, the second embodiment of the present application provides a network security information sharing method based on expert joint evaluation (see Fig. 2), described in detail below. Simulation experiments were carried out on a computer according to the following steps:

S1: each end node preprocesses the network security information to be uploaded.

The simulation experiments use database transaction security risk assessment data from a scientific research institution, comprising 170 initial training records and 2300 test records. Each record has 12 input features, each a specific factor influencing the risk assessment; the output is a risk assessment level, either low risk or high risk. The training labels were assigned by security experts, and the test labels were assigned by real users according to actual conditions. To simulate the uploading of network security information by the end nodes, different experts each labeled a different part of the training data using the analytic hierarchy process (AHP), and the labeling results of all parties were merged to obtain the complete training set.

S2: the central node receives network security information from each end node, aggregates and cleans it, and selects a small number of samples that carry risk labels to train an initial risk classifier.

In the experiment, the initial risk classifier is simulated with an SVM whose penalty parameter is set to 0.3. The SVM classifier is trained on the training data labeled by the experts.

S3: the unlabeled samples in the aggregated information are classified with the risk classifier, and the samples whose uncertainty is higher than a threshold are distributed to the expert committee formed by the nodes for labeling.

The test data is classified with the risk classifier, the information entropy of each sample is computed from the classification result, and the samples whose information entropy is greater than the threshold ε are added to the set to be labeled by the experts. Here ε is a constant between 0 and 1 that controls the number of samples the experts need to label.

S4: the weight of each expert in the expert committee is obtained, and each sample's label and its probability distribution are derived by weighted voting.

S4.1: if this is the first round of training, the weight of each expert in the expert committee is initialized; otherwise, the expert weights from the previous round are obtained.

The initial weight is defined as 1/N, where N is the total number of expert committee members. Each expert is also simulated with an SVM classifier, but with a different penalty parameter value. In the experiment, the penalty parameters are set at equal intervals within the range [0.1, 0.6] according to the number of experts N. In the first round, each expert classifier is trained on the original training data, giving a set of experts with different scoring preferences. In practice, the experts in the committee may also be simulated with different kinds of classifiers, such as XGBoost or LightGBM.
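The equally spaced penalty parameters and uniform initial weights can be sketched as follows. Whether both endpoints of [0.1, 0.6] are included is not stated; the sketch assumes they are, and `penalty_grid` and `initial_weights` are illustrative names. Each value would then be passed as the penalty parameter of one simulated expert's SVM.

```python
def penalty_grid(n_experts, low=0.1, high=0.6):
    """Equally spaced penalty parameters, one per simulated expert."""
    if n_experts == 1:
        return [low]
    step = (high - low) / (n_experts - 1)
    return [low + k * step for k in range(n_experts)]

def initial_weights(n_experts):
    """Uniform starting weights 1/N for the expert committee."""
    return [1.0 / n_experts] * n_experts

penalties = penalty_grid(6)      # approximately 0.1, 0.2, ..., 0.6
weights = initial_weights(6)
```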

S4.2: each expert labels the samples, and the votes of the experts are aggregated by weighted voting to give the final sample labels.

The expert SVM classifiers predict the test data, and the weighted average of the experts' predictions gives the final labeling result.

S5: the expert weights in the expert committee are updated according to the sample labeling accuracy.

S5.1: the Kullback-Leibler (KL) divergence of each data label given by each expert in the committee is calculated, and the expert's classification accuracy is evaluated by the sum of the KL divergences over all labels.

For each sample, the KL divergence between the expert's predicted distribution and the final result distribution is calculated; summing over all samples gives the difference between the expert's classification and the final distribution.

S5.2: the expert weights are updated according to the expert classification accuracy calculated in S5.1.

The updated weight of each expert is obtained by taking the logarithm of the reciprocal of that expert's KL divergence sum and normalizing over all experts.

S6: the expert-labeled samples with higher confidence are added to the training set, and the risk classifier is trained incrementally.

The data labeled by the expert committee is added to the training set, and the risk classifier is retrained using the committee's final label for each sample as its label.

S2 to S4 are repeated until one of the following conditions is met:

② The preset number of iterations is reached.

In the experimental process, the iteration number is set to 3.

S7: the central node uses the trained risk classifier to assess the risk category of all security information and feeds the results back to the experts at each node, who refine their own analysis accordingly.

The classification results on the final test set are fed back to each expert; each expert adds them to its training data and retrains, which simulates the improvement process.

S8: the central node backs up all security information together with the risk categories output by the risk classifier and uploads them to the upper-level data service processing center.

The classification results on the final test set are output, and each metric is calculated.

To verify the effect of the proposed network security information sharing mechanism on security risk assessment, it was compared with three baselines: no information sharing mechanism, no expert committee, and an expert committee with fixed weights. The precision, recall, F1 score and accuracy of the risk assessment were recorded in each case; the results are shown in Table 1.

Table 1. Comparison of experimental results with and without the network security information sharing mechanism

Method	Precision	Recall	F1 score	Accuracy
No sharing	0.548	0.828	0.659	0.572
No expert committee	0.663	0.702	0.682	0.673
Fixed expert weights	0.672	0.711	0.693	0.680
Method of the application	0.678	0.720	0.699	0.690

As Table 1 shows, without a sharing mechanism each risk classifier can access only part of the training data, so what it learns is limited and its precision and accuracy are low. Without the expert committee mechanism, the expert knowledge of the parties cannot be exchanged and fused, and the results are worse than with the committee. In addition, the dynamic expert weight algorithm of the application improves on fixed weights (accuracy 0.690 versus 0.680), which shows that the weight updating method benefits the information sharing mechanism.
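For reference, the four metrics reported in Table 1 can be computed from a binary confusion matrix as sketched below. Treating "high risk" as the positive class is an assumption, and the counts in the example are made up for illustration; they are not the experiment's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Illustrative counts only: 8 true positives, 2 false positives,
# 2 false negatives, 8 true negatives.
precision, recall, f1, accuracy = binary_metrics(tp=8, fp=2, fn=2, tn=8)
```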

Example 3

An electronic device according to a third embodiment of the present application is shown in fig. 3, and is in the form of a general-purpose computing device. Components of an electronic device may include, but are not limited to: one or more processors or processing units, a memory for storing a computer program capable of running on the processor, a bus connecting the different system components (including the memory, the one or more processors or processing units).

Wherein the one or more processors or processing units are configured to execute the steps of the method according to embodiment two when the computer program is run. The processor may be of a type that includes a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.

Here, the bus represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Example 4

A fourth embodiment of the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method described in the second embodiment.

The storage medium shown in the present application may be a computer readable signal medium or a storage medium, or any combination of the two. The storage medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this patent, a storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the storage medium may include a data signal that propagates in baseband or as part of a carrier wave, in which computer readable program code is carried. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A storage medium may also be any computer-readable medium that can transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The foregoing embodiments have further described the objects, technical solutions and advantageous effects of the present application in detail, and it should be understood that the foregoing embodiments are merely examples of the present application and are not intended to limit the scope of the present application, and any modifications, equivalent substitutions, improvements, etc. made on the basis of the technical solutions of the present application should be included in the scope of the present application.

Claims

1. A network security information sharing system based on expert joint evaluation, comprising: n end nodes, a central node;

2. The network security information sharing method based on expert joint evaluation is characterized by comprising the following steps of:

② Reaching preset iteration times;

3. The network security information sharing method based on expert joint evaluation according to claim 2, wherein in the first step, the network security information content includes security information assets and various element indexes; the data cleansing processing operation includes checking data consistency, processing invalid values and missing values.

4. The network security information sharing method based on expert joint evaluation as claimed in claim 3, wherein in the third step, uncertainty is measured by the information quantity of the sample class probability distribution calculated by the risk classifier, namely, the uncertainty is calculated by adopting an information entropy mode:

5. The network security information sharing method based on expert joint evaluation as claimed in claim 4, wherein the specific operation of the fourth step is as follows:

The initial weight is defined as 1/N, wherein N is the total number of members of the expert committee, and W_j denotes the weight of the j-th expert.

6. The network security information sharing method based on expert joint evaluation as claimed in claim 5, wherein the specific operation of the fifth step is as follows:

7. The network security information sharing method based on expert joint evaluation according to claim 6, wherein in the fifth step, the specific calculation formula of the classification KL divergence of the expert j to the sample x is as follows:

Where P(x) = [p_1, p_2, ..., p_n] represents the risk category probability distribution of sample x after voting, and Q_j(x) = [q_1, q_2, ..., q_n] represents the risk category probability distribution of sample x calculated by expert j.

8. The network security information sharing method based on expert joint evaluation according to claim 7, wherein in the fifth step, the updating expert weight process is as follows:

firstly, calculating the KL divergence sum of all samples by each expert; then taking the logarithm of the inverse sum of KL divergence and normalizing; the specific calculation formula is as follows:

9. An electronic device, characterized in that: comprising a processor and a memory for storing a computer program capable of running on the processor,

Wherein the processor is adapted to perform the steps of the method of any of claims 2 to 8 when the computer program is run.

10. A storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method according to any of claims 2 to 8.