CN114124503B

CN114124503B - Intelligent network sensing method for optimizing efficiency of progressive concurrent cache

Info

Publication number: CN114124503B
Application number: CN202111350286.2A
Authority: CN
Inventors: 韩道岐; 陆月明; 王东滨; 杨键; 王皓; 吕陆琴
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-11-15
Filing date: 2021-11-15
Publication date: 2022-09-27
Anticipated expiration: 2041-11-15
Also published as: CN114124503A

Abstract

The invention discloses an intelligent network perception method that optimizes efficiency through level-by-level concurrent caching. It belongs to the field of network security and comprises the following steps. First, home internet communication gateway equipment is built: a high-speed, high-capacity memory and network card cores are integrated on an SoC (system on chip) based embedded computing platform, and high-capacity on-chip SPM (Scratch Pad Memory) storage resources are built to cache the traffic obtained by mirroring. Based on DPDK technology, traffic from multiple physical network cards is collected concurrently in separate partitions. Data packets are received and distributed through two-stage lock-free concurrency and a user-mode protocol stack. Then, through parsing and concurrent detection of each protocol field, the IP protocol data fields are extracted, filtered and cached, and are distributed through three channels to subsystems with different capabilities for detection. Finally, the results of the various detections are fused, abnormal samples are stored, normal samples are sampled, and rich context information is associated. The invention meets the deployment requirements of a high-efficiency security situation awareness system in future smart home network environments.

Description

Intelligent network sensing method for optimizing efficiency of progressive concurrent cache

Technical Field

The invention belongs to the technical field of network security, and particularly relates to an intelligent network perception method for optimizing efficiency by level-by-level concurrent caching.

Background

With the proliferation and growing intelligence of Internet of Things devices and the large-scale adoption of cloud services, a new edge computing architecture is rising rapidly. The new architecture is deployed around the user: by building a large-scale autonomous Internet of Things system, it provides user-oriented services such as unified device management, data caching, privacy protection and intelligent services, so that the user can manage devices and make decisions from a single point. A user can use intelligent, high-performance computing devices to build multiple redundant local aggregation service centers, which addresses problems such as response time, battery life, sensor networking intelligence, bandwidth saving, and data security and privacy.

However, conventional intrusion detection and situation awareness systems suffer from high false alarm rates and poor vulnerability and application-layer detection capability. A new generation of situation awareness and information security assessment models is needed that can comprehensively perceive and fuse multi-dimensional data of different types, such as network traffic, logs, assets and internal management data (structured data such as state, audit and relationships), and that, after associating and expanding context information, improves attack detection, anomaly alerting and correlated early warning. Such a model is oriented to future 6G high-performance smart home security gateways: it connects to the home local area network through a mirror port and acquires network traffic data at high speed.

The lightweight intrusion detection and defense system so constructed can be deployed and integrated into Internet of Things edge devices in a lightweight manner, realizing full-dimensional feature extraction and user profiling. To suit the massive time-series data sources of the Internet of Things, a lightweight, layered and sharded big data management technique is also needed to realize progressively desensitized storage across memory, local hard disk and cloud disk. Through iteration, suitable detection models and matching data sets are selected for the freshness requirements of different time-series periods, forming the ability to evolve automatically and adapt to different environments.

As edge computing technology matures, its real-time performance and proximity to user services can support a household's security requirements for real-time detection and response. The 6G high-speed gateway may become key network equipment in future households. Based on the mirror port of the 6G gateway and a transparent connection to the home local area network, it can acquire basic network traffic, asset data and log records, fuse scattered data and redundant multi-source alarms, deeply analyze message content, and present and predict the comprehensive security situation of gateway devices, hosts, security devices, mobile devices, Internet of Things devices and other equipment in a future smart home. Research on a next-generation intelligent network security situation awareness model with full-member linkage is therefore urgent.

Disclosure of Invention

Existing centralized network situation awareness systems rely on centralized collection, detection and storage, neglect user data rights, user experience and environmental constraints, and cannot effectively solve real-time situation awareness and threat assessment in a smart home environment. To address this, the invention provides an intelligent network perception method that optimizes efficiency through level-by-level concurrent caching.

The network flow acquisition, detection and storage method comprises the following specific steps:

Step one, on an SoC-based embedded computing platform, a single network card core is paired with a single CPU core to form an independent partition; in each partition, a plurality of high-capacity on-chip storage resource SPM (Scratch Pad Memory) blocks are built from integrated memory and accessed directly by memory address, thereby forming the home internet communication gateway equipment.

Each SPM block stores 2M of data, including network card data and CPU-processed data. A plurality of SPM blocks form a cache partition to meet the packet processing rate requirements of different types of network cards. The data stored in each SPM block is accessed directly by memory address, and continuous stream access across blocks is performed using a start address and payload length, without transfer over a data bus. This realizes zero-copy direct data access in user mode and loss-free continuous stream reading from a large cache.
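The addressing scheme described above (direct access by memory address, and continuous reads across blocks given a start address and payload length) can be illustrated with a minimal Python sketch. It is only an analogy in ordinary process memory, not the on-chip implementation: the class `SpmPartition`, its methods, and the reading of "2M" as 2 MiB are assumptions made for illustration.

```python
BLOCK_SIZE = 2 * 1024 * 1024  # assumption: "2M" is read as 2 MiB per SPM block


class SpmPartition:
    """Toy stand-in for a cache partition made of fixed-size SPM blocks."""

    def __init__(self, num_blocks: int) -> None:
        # One contiguous buffer models blocks that are addressable back to back.
        self._buf = bytearray(num_blocks * BLOCK_SIZE)
        self._view = memoryview(self._buf)

    def write(self, start: int, payload: bytes) -> int:
        """Store payload at a start address and return the payload length."""
        self._view[start:start + len(payload)] = payload
        return len(payload)

    def read(self, start: int, length: int) -> memoryview:
        """Return a zero-copy view, even when the range crosses a block boundary."""
        return self._view[start:start + length]


partition = SpmPartition(num_blocks=2)
start = BLOCK_SIZE - 4                        # last 4 bytes of block 0
length = partition.write(start, b"packet-data")
view = partition.read(start, length)          # spans block 0 and block 1
```

The reader receives a `memoryview` over the same buffer rather than a copy, which mirrors the single-copy idea in the text: data is written once and later stages only pass around a start address and a length.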

Step two, the home internet communication gateway equipment is connected to each internet-connected device in the home; each network card collects the traffic of different devices in parallel, and user-mode multithreaded concurrent processing is performed based on DPDK (Data Plane Development Kit) technology;

the specific process is as follows:

First, a distribution main process receives packets from each network card; after the 5-tuple of each packet is sorted, a unique concurrent UFID is constructed by a string hash, calculated as follows:

UFID = shash(sort(srcip, srcport, dstip, dstport), protocol) (1)

where srcip, srcport, dstip and dstport are the source IP address, source port, destination IP address and destination port, respectively. The sort step guarantees that the packets sent and received in both directions of a flow splice the four fields into a string in the same order; after the protocol type field is appended, the shash algorithm computes the hash ID of the string;

Then, according to the number C of concurrent channels, the hash of the bidirectional packets of the same flow is computed and assigned to the corresponding queue Qi for storage, and the packets are distributed to the processing sub-processes through the queues.
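Formula (1) and the queue assignment can be sketched as follows. The sketch makes two assumptions that the text does not fix: the sort is applied to the two (IP, port) endpoints as pairs, and `zlib.crc32` stands in for the unspecified `shash` string hash. The function names `ufid` and `queue_index` are hypothetical.

```python
import zlib


def ufid(srcip: str, srcport: int, dstip: str, dstport: int, protocol: str) -> int:
    """Direction-independent flow ID in the spirit of formula (1)."""
    # Sort the two endpoints so both directions splice to the same string.
    endpoints = sorted([(srcip, srcport), (dstip, dstport)])
    key = "|".join(f"{ip}:{port}" for ip, port in endpoints) + "|" + protocol
    # Assumption: zlib.crc32 stands in for the unspecified "shash" string hash.
    return zlib.crc32(key.encode("utf-8"))


def queue_index(flow_id: int, channels: int) -> int:
    """Map a flow to one of C queues by taking the hash modulo C."""
    return flow_id % channels


C = 8  # hypothetical number of concurrent channels
forward = ufid("192.168.1.10", 51000, "10.0.0.5", 443, "tcp")
reverse = ufid("10.0.0.5", 443, "192.168.1.10", 51000, "tcp")
```

Because `forward` and `reverse` are equal, both directions of a flow land in the same queue, so one sub-process sees the whole conversation without any locking between queues.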

Step three, according to IP protocol detection and routing configuration, the processed traffic packets are distributed in parallel to a three-channel subsystem for detection;

after each sub-process completes extraction, filtering and caching of the IP protocol data fields of each traffic packet, the packet is processed through three subsystems, which detect in parallel and fuse the records of abnormal network behavior; the three-channel subsystem comprises an application layer abnormal rule detection subsystem, a network layer traffic characteristic and abnormal model detection subsystem, and a PCAP (packet capture) and forensic file detection subsystem;

the specific steps are as follows:

Step 301, the processed traffic packets are distributed to the application layer abnormal rule detection subsystem through an IP tunnel; the metadata on which rules are defined and the application layer protocol metadata are extracted, and rule matching detection is performed by evaluating logical expressions;

the metadata comprises the network TCP/IP packet 5-tuple, each protocol field, the number of uplink bytes, the number of downlink bytes and other statistical fields;

the application layer protocol metadata includes, for example, the URL, request code, response code, submitted data and response data of the HTTP protocol;

rule matching detection is used to find abnormal behaviors with obvious network traffic and protocol field characteristics and to raise alarms in real time.
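Rule matching by logical expression over extracted metadata can be sketched as below. The rule names, thresholds and metadata field names (`up_bytes`, `http_url`, `http_response_code`, and so on) are hypothetical examples, not rules defined by the method.

```python
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]
Rule = Callable[[Metadata], bool]

# Hypothetical rules, each written as a logical expression over metadata fields.
RULES: Dict[str, Rule] = {
    "large_upload": lambda m: m["up_bytes"] > 10_000_000 and m["dstport"] != 443,
    "admin_probe": lambda m: m.get("http_response_code") == 404
    and "admin" in str(m.get("http_url", "")),
}


def match_rules(meta: Metadata) -> List[str]:
    """Return the names of all rules whose expression evaluates to True."""
    return [name for name, rule in RULES.items() if rule(meta)]


record = {
    "srcip": "192.168.1.20",
    "dstport": 8080,
    "up_bytes": 25_000_000,
    "down_bytes": 1200,
    "http_url": "/admin/login",
    "http_response_code": 404,
}
alerts = match_rules(record)
```

Here `record` combines 5-tuple and byte-count metadata with HTTP metadata, so both example rules fire; a real deployment would load its rules from configuration rather than hard-code them.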

Step 302, the extracted application layer protocol metadata is forwarded to machine learning models through a distributed message queue; the network layer traffic characteristic and abnormal model detection subsystem performs near-real-time deep detection to find fuzzy user behavior labels based on semantics, similarity and the like, such as activity labels for malicious behavior, abnormal data transmission and information probing;

specifically: for network traffic that carries and triggers the execution of malicious behavior at the application layer, multiple analysis models are established, such as characteristic field scanning, state machine reasoning, machine learning model classification, abnormal semantics and abnormal logic flow, and the application behavior type of the network traffic is labeled from multiple dimensions;

Step 303, at the same time, the IP protocol data fields in the extracted metadata are stored directly into a time-series database, stream processing and statistics over multiple indicators are performed, and the statistical indicators are forwarded to machine learning models through the distributed message queue; the network layer traffic characteristic and abnormal model detection subsystem performs long-period network behavior anomaly detection and classification detection to find unknown anomalies expressed by time-series statistical indicators;

Step 304, a PCAP backup file is written through a locally created virtual network card; the PCAP file is scanned according to subscription rules, resource files transmitted over the network are extracted as forensic evidence, and the PCAP files and forensic files are forwarded through the distributed message queue to the PCAP and forensic file detection subsystem for virus detection.

Meanwhile, a classified history record is kept as evidence of the data resource files sent and received by the user;

Step four, performing iterative fusion on each detection result output by the three-channel subsystem;

the specific iterative fusion process comprises the following steps:

First, the detection results output by the three-channel subsystem are collected periodically; deep learning models are trained, tested and generated using training data enhanced through dimension reduction, vectorization, sparsification and reconstruction, and a sustainable self-learning training environment for the models is built using reinforcement learning and generative adversarial algorithms.

Then, the trained deep learning models are ranked and selected by the criteria of high accuracy and low time consumption; the weights of long-term historical models are reduced in a geometric (exponential) ratio, and time-series exponential moving average weighted fusion learning is performed to obtain a fused detection result;

the fusion is performed according to three topics:

1) integrating the multi-source deep learning models obtained from the network layer, application layer, user layer and relation layer, and performing ensemble learning through Bayesian inference;

2) establishing a relation set and delineating a composite alarm event set through a knowledge graph;

3) constructing typical behavior labels from the user's typical behavior profile, and dividing single-event detection results into group event sets and labeling them.

Finally, a Bayesian network model is built based on each model's detection accuracy on historical samples, the joint detection probability of each class for a new sample is computed by inference, and the deep learning models are continuously updated with the results.

Step five, the fused detection result data set is queried periodically, and the PCAP and forensic files are processed.

First, the retention periods of events of different security levels are defined and the data storage requirements are divided accordingly. Then, for result samples that need long-term storage, the associated files are filtered, reduced and indexed to form reduced files and sample indexes.

Step six, daily traffic statistical indicators, sampled normal traffic and detection results are transferred in batches to a big data platform and stored by category, and event data of different security levels is progressively cleaned up according to the defined life cycle.
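The life-cycle cleanup in steps five and six can be pictured with a small sketch that selects events whose age exceeds the retention period of their security level. The level names, the retention periods and the event fields `level` and `time` are assumptions for illustration; the text does not specify them.

```python
from datetime import datetime, timedelta

# Hypothetical retention periods per security level.
RETENTION = {
    "high": timedelta(days=365),
    "medium": timedelta(days=90),
    "low": timedelta(days=7),
}


def expired(events, now):
    """Return the events whose age exceeds the retention period of their level."""
    return [e for e in events if now - e["time"] > RETENTION[e["level"]]]


now = datetime(2022, 1, 1)
events = [
    {"id": 1, "level": "high", "time": datetime(2021, 6, 1)},
    {"id": 2, "level": "medium", "time": datetime(2021, 6, 1)},
    {"id": 3, "level": "low", "time": datetime(2021, 12, 30)},
]
to_clean = expired(events, now)
```

Only the medium-level event is selected: it is older than 90 days, while the high-level event of the same age is still inside its one-year window.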

The invention has the advantages that:

1) The method uses Internet of Things and edge computing technology to extract big data and perform intelligent detection at the user side, realizes a level-by-level concurrent caching method for processing network data streams, reduces energy consumption, improves system performance, and meets the deployment requirements of a high-efficiency security situation awareness system in future smart home network environments.

2) The method provides a network security situation awareness model with full-member linkage that sinks security services to each edge network, bringing them closer to the specific user requirements of each type of environment;

3) The method collects and processes the protocol metadata of each layer defined by the Open Systems Interconnection (OSI) reference model, analyzes communication protocols and statistical characteristics layer by layer, and establishes through experiments the relationship between these characteristics and common attack types; it provides a high-efficiency hardware implementation scheme and a software extension mechanism, and realizes concurrency and caching for each layer of network data stream processing;

4) The method provides a three-channel subsystem that concurrently extracts the characteristics of each layer and selects matching models for detection; through multi-source, multi-modal iterative fusion of the detection results, it improves detection accuracy, interpretability of results and adaptability to environmental changes.

Drawings

FIG. 1 is a flow chart of the intelligent network perception method of the present invention, which optimizes efficiency through level-by-level concurrent caching;

FIG. 2 is a structure diagram of the level-by-level concurrent network traffic collection, detection and storage system implemented by the present invention;

FIG. 3 is a functional diagram of a physical device based on 5G edge computing gateway technology;

FIG. 4 is a flow chart of the lightweight collection, detection, analysis and storage process of the present invention;

FIG. 5 is a flow chart of the iterative fusion of multi-source, multi-modal detection results of the present invention.

Detailed Description

The invention is explained in detail below with reference to the figures and examples.

Aimed at the AIoT network environment of the smart home and the high-efficiency requirements of the Internet of Things, the invention provides a solution for lightweight network traffic collection and intelligent network security situation analysis based on edge computing technology. It performs level-by-level concurrent partitioning and miniaturized data collection, detection and storage, thereby solving the detection, fusion grading and response of abnormal network events.

As shown in fig. 1, the network traffic collection, detection and storage method specifically includes the following steps:

Step one, on an SoC-based embedded computing platform, a single network card core is paired with a single CPU core to form an independent partition; in each partition, a plurality of storage resource SPM (Scratch Pad Memory) blocks are built from an integrated high-speed, high-capacity memory and accessed directly by memory address, thereby forming the home internet communication gateway equipment.

Each network card core is paired with one CPU core. The SPM organizes data in blocks, each block stores 2M of data, and a large-scale cache partition is built from 2K to 20K SPM blocks to suit the packet processing rate requirements of different types of network cards;

the data processed by the network card and the CPU is stored directly in the SPM blocks; in subsequent processing, the data stored in each SPM block is accessed directly by memory address, and continuous access across blocks is performed using a start address and payload length, without transfer over a data bus.

A dedicated addressing access channel is established from each SPM to its owning computing core and is mapped as huge-page memory, which serves as both the network card cache and the user-mode cache. The on-chip network card core therefore copies captured data into the SPM area only once, and subsequent processing uses the SPM memory transparently through NUMA awareness and the huge-page memory mechanism, achieving zero data copying and reducing the delay and power consumption caused by copy operations.

Step two, the home internet communication gateway equipment is connected to each internet-connected device in the home; each network card collects the traffic of different devices in parallel, and user-mode multithreaded concurrent processing is performed based on DPDK technology;

based on DPDK technology, partitioned two-stage lock-free concurrent traffic collection from multiple physical network cards is realized; the specific process is as follows:

First, a distribution main process receives packets from each network card; after the 5-tuple of each packet is sorted, a unique concurrent UFID is constructed by a string hash, calculated as follows:

UFID = shash(sort(srcip, srcport, dstip, dstport), protocol) (1)

Each process and processing sub-thread is bound to a corresponding CPU core, which reduces process and thread scheduling overhead.

The hash algorithm is chosen through performance comparison as the one that consumes the least CPU computation. Data packets are received through two-stage lock-free concurrency and a user-mode protocol stack, and high-speed traffic above 10G is decomposed into dozens of parallel processes, making full use of the capability of general-purpose multi-core processors;

the MID obtained after the modulo operation can serve as the sensor ID partition for subsequent deep packet inspection, the sensor ID partition of the time-series database, the stream ID for PCAP file retrieval, and the pre-partition ID of the big data platform, supporting traffic splitting and concurrent processing in each stage.

Step three, after each sub-process completes extraction, filtering and caching of the IP protocol data fields of the traffic packets, the packets are distributed through three channels to subsystems with different capabilities, which detect in parallel and fuse the records of abnormal network behavior. The three channels are a compute-intensive application layer abnormal rule detection subsystem, a memory-intensive network layer traffic characteristic and abnormal model detection subsystem, and a large-capacity-storage PCAP and forensic file detection subsystem;

data is extracted, filtered and cached centrally in batches and forwarded asynchronously to multiple models, which detect in parallel and fuse the records of abnormal network behavior; the specific steps are as follows:

Step 301, the processed traffic packets are distributed to the application layer abnormal rule detection subsystem through an IP tunnel; the metadata on which rules are defined and the application layer protocol metadata are extracted in real time, and preliminary rule matching detection is performed by evaluating logical expressions;

rule matching detection is used to find abnormal behaviors with obvious network traffic and protocol field characteristics, such as port scanning, DDoS and brute-force cracking.

Step 302, the extracted application layer protocol metadata is forwarded to machine learning models through a distributed message queue, and the network layer traffic characteristic and abnormal model detection subsystem performs near-real-time deep detection;

specifically: for network traffic that carries and triggers the execution of malicious behavior at the application layer, such as malicious code, viruses, trojans and application layer vulnerabilities, multiple analysis models are established, such as characteristic field scanning, state machine reasoning, machine learning model classification, abnormal semantics and abnormal logic flow, and the application behavior type of the network traffic is labeled from multiple dimensions;

Step 303, at the same time, stream processing is performed on the extracted IP protocol data fields, and the resulting flow characteristics, time, associated statistical indicators such as host and service, and the 5-tuple are stored periodically into a time-series database; the data is forwarded to machine learning models through the distributed message queue, and the network layer traffic characteristic and abnormal model detection subsystem performs long-period network behavior anomaly detection and classification detection;

the statistical characteristics are linear combinations of the mean, standard deviation, information entropy and random-process probability density distribution of long-period indicators over minutes, hours, days, weeks, months and years, computed on packet length, frequency and other statistical fields;
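As an illustration of the per-window statistics named above, the following sketch computes the mean, standard deviation and information entropy of packet lengths within one time window. The function names and the choice of Shannon entropy over the empirical value distribution are assumptions; the text does not prescribe a particular estimator.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_features(packet_lengths):
    """Summary statistics for one time window of packet lengths."""
    return {
        "mean": mean(packet_lengths),
        "std": pstdev(packet_lengths),
        "entropy": entropy(packet_lengths),
    }


features = window_features([60, 60, 1500, 1500])
```

The same function can be applied to windows of different lengths (minute, hour, day and so on), and the resulting indicators stored in the time-series database as inputs to the long-period models.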

Step 304, a PCAP backup file is written to disk through a locally created virtual network card; the PCAP file is scanned according to subscription rules, resource files transmitted over the network are extracted as forensic evidence, and the files are forwarded through the distributed message queue to the PCAP and forensic file detection subsystem for virus detection;

the PCAP file stores the original network messages; files of various types are extracted from it and distributed through the distributed message queue to the PCAP and forensic file detection subsystem. Meanwhile, a classified history record is kept as evidence of the data resource files sent and received by the user, which facilitates query, forensics and source tracing and helps discover abnormal leakage of user data resource files.

Step four, the detection results output by the three channels are iteratively fused as multi-source, multi-modal detection results;

the specific iterative fusion process is as follows:

Step 401, a self-learning, self-adaptive twin environment is established; typical data packets and typical statistical characteristics are continuously extracted from the environment, and the models are trained iteratively to enhance their detection capability;

in the self-adaptive environment, typical samples of each analysis topic generated in the actual environment are added automatically using a periodic sliding window; typical data packets are extracted using similarity algorithms such as clustering, compression learning algorithms such as rule and model migration, and stimulation algorithms such as adversarial generation; samples and features are extracted by dimension reduction, vectorization, sparsification, reconstruction and similar methods; attack, defense and honeypot nodes are additionally set up to actively produce and label real samples in the actual environment; based on a deep learning GAN (generative adversarial network) model, deep learning models are trained, tested and generated, and multi-stage iterative adversarial training produces reinforced samples. A training set and a validation set are divided, the test set is progressively improved, and the models' day-to-day self-learning capability is built.

Step 402, in the newly constructed twin environment, for data of different sources and characteristics, the deep learning detection models trained in step 401 are ranked to find the several best ones; their detection results are then fused, ensemble learning is performed through Bayesian inference, a composite alarm event set is delineated, and single-event detection results are labeled by group event set.

According to the naive Bayes inference formula, the product of each model's output probability for each class and the prior probability of that class is calculated, and the class with the highest probability is taken as the discrimination result;

Gaussian process probability parameter estimation and iterative ensemble learning over a typical-sample sampling loop are further derived. A relation set is constructed and a composite alarm event set is delineated through a knowledge graph. Typical behavior labels are constructed from the user's typical behavior profile, and single-event detection results are divided into group event sets and labeled. Multiple alarms are combined through association rules modeled from expert knowledge or through automatically mined association rules, forming a relation graph in the form of a knowledge graph. The relation graph presents graph structure characteristics such as topics, core events, closeness of relationships and attack processes, so that results are fused automatically and hierarchically and the details that need attention and analysis are reduced.

Assume there are n models of different types. To identify a test sample X, ensemble learning needs an ensemble-model training set of m samples; as m grows, the accuracy keeps improving, while the detection computation time stays fixed. Let there be K classes C_k (k = 1, 2, …, K), and let the number of training samples in class C_k be m_k. The prior probability of class C_k is:

P(Y = C_k) = (m_k + λ)/(m + Kλ) (2)

where λ is a discount factor between 0 and 1.

The accuracy with which a model n_j identifies class C_k in the training set is defined as the conditional probability P(n_j(X) | Y = C_k).

For an instance X, each model n_j runs a test and predicts a class for X. For each class C_k, the conditional probabilities P(n_j(X) | Y = C_k) of the models that vote for C_k and the prior probability of C_k are calculated; the product of these conditional probabilities and the class prior is formed, and the class with the maximum posterior probability is taken as the class C of X.
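The prior of formula (2) and the maximum-posterior decision can be sketched as a naive Bayes combiner. This is a sketch under an assumption: each model's conditional probabilities are taken from a full confusion table, `likelihoods[j][true_class][predicted_class]`, of which the per-class accuracy described above is the diagonal. The function names, the table layout and the numeric values are hypothetical.

```python
from typing import Dict, List


def class_priors(class_counts: Dict[str, int], lam: float = 0.5) -> Dict[str, float]:
    """Formula (2): P(Y = C_k) = (m_k + lam) / (m + K * lam)."""
    m = sum(class_counts.values())
    k = len(class_counts)
    return {c: (m_k + lam) / (m + k * lam) for c, m_k in class_counts.items()}


def fuse(
    predictions: List[str],
    likelihoods: List[Dict[str, Dict[str, float]]],
    priors: Dict[str, float],
) -> str:
    """Pick the class maximizing prior times the product of model likelihoods."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for j, predicted in enumerate(predictions):
            # P(model j outputs `predicted` | true class is `cls`)
            score *= likelihoods[j][cls][predicted]
        scores[cls] = score
    return max(scores, key=scores.get)


priors = class_priors({"normal": 90, "attack": 10})
likelihoods = [
    {"normal": {"normal": 0.95, "attack": 0.05}, "attack": {"normal": 0.1, "attack": 0.9}},
    {"normal": {"normal": 0.9, "attack": 0.1}, "attack": {"normal": 0.2, "attack": 0.8}},
]
decision = fuse(["attack", "attack"], likelihoods, priors)
```

With both models voting "attack", the attack class wins despite its small prior, because the models rarely output "attack" on normal traffic in the assumed tables.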

The stage to which each detection result event belongs is inferred from the activity characteristics and sequence relationships of each stage of the kill chain. After a complete activity flow is established, different group event sets are divided through correlation analysis and iterative feature clustering.

Step 403, for data over long time spans and different time-series period ranges, the set of best detection models found is ranked, clustering is used to reduce the number of classification labels, and time-series exponential moving average weighted fusion learning is performed by reducing the weights of long-term historical models in a geometric ratio.

Models trained on recent data better match the current environmental characteristics, which continuously improves the system's ability to detect sudden anomalies and to adjust promptly to environmental changes;

attacks and security problems in cyberspace emerge endlessly and evolve in strongly periodic iterations, so the relevant models and samples must evolve over time to discover unknown attacks and improve the recognition accuracy of new attacks in time. Samples are divided by time-series period, such as week, month, season, year and 3 years, and multiple periodic models mp are trained on them, which balances freshness and comprehensiveness and reduces conflicts. Meanwhile, by accumulating different historical periods, iterative training can be performed and the optimal model mpm for each period selected. In real-time detection, the probability results of each model's decision are weighted and fused from the most recent to the oldest, with the long-term models exponentially attenuated, so that a quick decision can be made.
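One way to picture the near-to-far weighted fusion with exponential attenuation is the sketch below. The exact weighting formula is not given in the text, so the weights `alpha * (1 - alpha) ** i`, the normalization and the parameter `alpha` are assumptions in the style of a standard exponential moving average.

```python
def ema_fuse(probabilities, alpha=0.5):
    """Fuse per-model probabilities ordered from most recent to oldest.

    Assumption: model i gets weight alpha * (1 - alpha) ** i, so each older
    model counts for a fixed fraction of the one before it; the weights are
    normalized to sum to 1.
    """
    weights = [alpha * (1 - alpha) ** i for i in range(len(probabilities))]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total


# Hypothetical anomaly probabilities from the weekly, monthly and yearly models.
fused = ema_fuse([0.9, 0.5, 0.1], alpha=0.5)
```

The fused value is pulled toward the most recent model's 0.9 and sits above the plain average of 0.5, which reflects the preference for models trained on recent data.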

In post-hoc detection, the Bayesian inference method of step 402 can additionally be used: an ensemble-model training sample set with wide coverage is established, and all models mp trained by time-series division are fused to make a more accurate judgment.

The specific fusion function is performed according to three topics, including:

1) performing ensemble learning through Bayesian inference;

Weighted integration is performed along the time dimension, for example by applying the Monte Carlo method and clustering to dynamically fuse and reduce classification labels, adjust the fitness weights of the models, and gradually self-adapt to a new environment. Joint inference follows a multi-classifier ensemble scheme: for normal/abnormal classification, for example, a statistical model is first used to obtain abnormal samples accurately, a deep learning model then discovers potential unknown anomalies, and finally whitelist rules filter the abnormal samples to reduce the false alarm rate. Lastly, Bayesian inference is applied: a Bayesian network model is built from the models' historical sample detection accuracy, the joint detection probability of each class for a new sample is inferred, and the network model is continuously updated with the results.
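As a hedged illustration of combining detector verdicts through their historical accuracy, the sketch below uses a naive-Bayes odds update, which treats detectors as conditionally independent. That is a simplification of the Bayesian network described above, not the patent's model; the `Detector` class, the rates and the prior are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Detector:
    name: str
    tpr: float  # P(alarm | anomalous), estimated from historical samples
    fpr: float  # P(alarm | normal), estimated from historical samples


def posterior_anomaly(prior: float, verdicts: Iterable[Tuple[Detector, bool]]) -> float:
    """Posterior P(anomalous) after observing each detector's alarm/no-alarm verdict.

    Assumes conditional independence between detectors (naive Bayes).
    """
    odds = prior / (1.0 - prior)
    for det, alarmed in verdicts:
        if alarmed:
            odds *= det.tpr / det.fpr
        else:
            odds *= (1.0 - det.tpr) / (1.0 - det.fpr)
    return odds / (1.0 + odds)


# Hypothetical historical rates for a statistical and a deep learning model.
statistical = Detector("statistical", tpr=0.8, fpr=0.05)
deep = Detector("deep_learning", tpr=0.9, fpr=0.1)

both_alarm = posterior_anomaly(0.01, [(statistical, True), (deep, True)])
only_deep = posterior_anomaly(0.01, [(statistical, False), (deep, True)])
```

Updating `tpr` and `fpr` from newly confirmed samples would correspond to the continuous update of the network model mentioned in the text.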

A composite alarm event combines multiple alarms, through association rules modeled from expert knowledge or association rules derived automatically from statistics, into a relation graph in the form of a knowledge graph. From the relation graph, graph structure features such as topics, core events, closeness of relations and attack processes can be presented. The results are fused automatically and hierarchically, which reduces the amount of detail that needs attention and analysis.

The user portrait analyzes user behavior labels so that the regular pattern of a user's typical daily activities can be identified from network data. Statistical indexes such as frequency, volume of data sent and received, and duration can be computed to form probability-distribution characteristic indexes such as hourly and daily mean, standard deviation, information entropy and Gaussian process. When user behavior changes, these indexes fluctuate strongly, and the occurrence probability, mean and standard deviation range of the various activities over the time series can also be observed.
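The statistical indexes named above can be sketched in a few lines. This is an illustration only: the sample values, the 3-sigma threshold and the function names are assumptions, and the Gaussian process index is not covered.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, Sequence


def shannon_entropy(labels: Sequence[str]) -> float:
    """Information entropy (in bits) of the observed activity labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def build_profile(values: Sequence[float]) -> Dict[str, float]:
    """Mean and population standard deviation of one index, e.g. bytes per hour."""
    return {"mean": mean(values), "std": pstdev(values)}


def deviates(value: float, profile: Dict[str, float], k: float = 3.0) -> bool:
    """True if a new observation lies more than k standard deviations from the mean."""
    return abs(value - profile["mean"]) > k * profile["std"]


# Hypothetical hourly traffic volumes (in MB) for one user.
hourly_mb = [2, 4, 4, 4, 5, 5, 7, 9]
profile = build_profile(hourly_mb)
```

A large jump in entropy or a value flagged by `deviates` would be one way to observe the fluctuation that the text associates with a change in user behavior.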

Step five: periodically query the fused detection result data set and process the PCAP and forensic files.

First, the retention periods of events of different security levels are defined and the data storage requirements are divided accordingly. Then, for the result samples that need long-term storage, the related files are filtered, reduced and indexed to form reduced files and sample indexes, which reduces the size of the stored files and improves the efficiency of historical data retrieval.

Data such as traffic statistical indexes, sampled normal traffic and detection results are transferred to the big data platform in batches every day, and the reduced files are likewise uploaded to the big data platform in daily batches.

An association is established between the data and the files, and context information tag data are extracted to enrich the topic and relationship information. Combined with the automatic training process, the resulting historical data allow the detection capability of the models to be improved continuously. For classified storage, event data of different security levels are cleaned up step by step according to the defined life cycle.
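The life-cycle cleanup by security level can be sketched as follows. The level names and retention periods are hypothetical placeholders, since the text only states that retention periods are defined per security level.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumed retention periods in days per security level (not from the patent).
RETENTION_DAYS: Dict[str, int] = {"low": 7, "medium": 90, "high": 365}


def is_expired(event: dict, now: datetime) -> bool:
    """True if the event is older than the retention period of its security level."""
    return now - event["time"] > timedelta(days=RETENTION_DAYS[event["level"]])


def clean_events(events: List[dict], now: datetime) -> List[dict]:
    """Keep only the events that are still inside their retention period."""
    return [e for e in events if not is_expired(e, now)]


now = datetime(2024, 1, 31)
events = [
    {"id": 1, "level": "low", "time": datetime(2024, 1, 20)},     # 11 days old
    {"id": 2, "level": "low", "time": datetime(2024, 1, 28)},     # 3 days old
    {"id": 3, "level": "high", "time": datetime(2023, 6, 1)},     # 244 days old
    {"id": 4, "level": "medium", "time": datetime(2023, 9, 1)},   # 152 days old
]
kept = clean_events(events, now)
```

Running such a job once per retention tier gives the step-by-step cleanup described above, with higher security levels surviving longer.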

The embodiment is as follows:

As shown in fig. 2, a home internet communication gateway device is first set up; through mirroring configuration it obtains all traffic at the internet egress and the traffic of the several sub-network segments inside the home. Next, the mirrored network traffic acquisition and processing module integrates a high-speed, high-capacity memory and network card cores on an SoC (system on chip) based embedded computing platform, constructs a large-capacity on-chip storage resource, the SPM (scratch-pad memory), and caches the mirrored traffic. Based on DPDK technology, traffic from several physical network cards is acquired concurrently in separate partitions, and data packets are received and distributed through two-stage lock-free concurrency and a user-mode protocol stack. Then the protocol field parsing and concurrent detection processing module extracts, filters and caches the IP protocol data fields and distributes them through three channels to subsystems with different capabilities for detection. Finally, the localized sample library and model efficiency analysis module fuses the results of the various detections, stores abnormal samples, samples normal traffic and associates rich context information. Samples are enriched through attack-and-defense environment simulation and deep-learning adversarial generation; model efficiency is trained and evaluated, and the best models are continuously improved and integrated. Meanwhile, the cloud-platform-based sample and file reduction and index storage module uploads the discovered files that need long-term storage, together with the forensic files, to the database and distributed storage of the cloud platform every day, where they are stored by category and automatically cleaned according to their retention period.

Fig. 3 is a functional diagram of the physical device based on 5G edge computing gateway technology provided in this example. A high-performance 5G communication gateway deployed in the smart home mirrors the internet traffic and the internal home network traffic; several groups of concurrent acquisition agent processes distribute the traffic to three computing and detection models, forming a multi-layer, multi-mode feature data set; and the iterative model optimization layer continuously trains and integrates multiple models to form a joint inference capability.

The flow of collecting, detecting, analyzing and storing network traffic in this embodiment is shown in fig. 4, and the specific steps are as follows:

Step 301: the embedded computing platform uses on-chip storage resources;

On the SoC platform, a high-speed, high-capacity memory and network card cores are integrated to provide a large-capacity on-chip storage resource, the SPM. The on-chip network card core controls the caching of network traffic, and data enters the SPM area after a single copy. The user-mode packet reading process can be bound to the CPU of the corresponding partition and accesses the SPM area transparently through a huge-page memory mechanism.

Step 302: two-stage concurrent lock-free flow acquisition;

The user-mode processes are based on DPDK technology; each is bound to one of several mirror network cards on different network segments, and traffic is acquired concurrently in separate partitions. A unique concurrency identifier, the UFID, is constructed by hashing the sorted string of the packet's 5-tuple; the home queue Qi is then selected by taking the UFID modulo the number C of concurrent channels, and the packet is distributed to a processing sub-process through that queue.
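The UFID construction and queue selection can be sketched as below. Two points are assumptions: SHA-1 stands in for the hash function, which the description does not specify, and the sort is applied to the two (ip, port) endpoints so that both directions of a flow yield the same string. The function names are hypothetical.

```python
import hashlib


def ufid(srcip: str, srcport: int, dstip: str, dstport: int, protocol: str) -> int:
    """Direction-independent flow id built from the packet 5-tuple."""
    # Sorting the two endpoints makes A->B and B->A produce the same key.
    first, second = sorted([(srcip, srcport), (dstip, dstport)])
    key = f"{first[0]}:{first[1]}|{second[0]}:{second[1]}|{protocol}"
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big")


def queue_index(flow_id: int, channels: int) -> int:
    """Index i of the home queue Qi: the UFID modulo the number C of channels."""
    return flow_id % channels


C = 8  # hypothetical number of concurrent channels
forward = ufid("10.0.0.2", 51000, "93.184.216.34", 443, "tcp")
reverse = ufid("93.184.216.34", 443, "10.0.0.2", 51000, "tcp")
qi = queue_index(forward, C)
```

Because both directions map to the same queue, one sub-process sees the whole flow and no lock is needed to share per-flow state between workers.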

Step 303: three-channel concurrent detection;

After a sub-process has finished extracting, filtering and caching the IP protocol data fields, it distributes them through three channels to the following subsystems with different capabilities for detection.

1. Application layer abnormal rule detection subsystem. Traffic is distributed to this subsystem in IP tunnel mode. The subsystem extracts the metadata on which rules are defined and the application layer protocol metadata and performs preliminary rule-matching detection; once application layer data has been obtained, it is forwarded through a distributed message queue to machine learning models for further in-depth detection and user behavior relationship analysis;

2. Network layer traffic characteristic and anomaly model detection subsystem. The extracted IP protocol data fields are stored directly in a time-series database on the internal network for stream processing and statistics of various indexes. The processed traffic characteristics and the associated statistical indexes (by time, host, service and so on) are forwarded through a distributed message queue to machine learning models for anomaly detection and attack classification;

3. PCAP and forensic file detection subsystem. PCAP backup files are written to disk through a locally created virtual network card; the PCAP files are scanned according to subscription rules to recover, as forensic evidence, the resource files transmitted over the network; and these files are forwarded through a distributed message queue to file detection models for virus detection.

Step 304: multi-source and multi-mode iterative fusion;

New samples are collected and labeled automatically: adapting to the environment, typical samples of each analysis topic generated by the environment are added using an automatic periodic sliding window. Attack, defense and honeypot nodes are additionally established to actively produce and label real samples in the actual environment. Two kinds of deep learning model, a detection model and a generation model, are trained, and augmented samples are produced through multi-stage iterative adversarial training. The training and validation sets are partitioned, and the test set is gradually refined.

Models are ranked on data from different sources and with different characteristics to find the optimal detection model, and on data from different time-series period ranges to find the optimal detection model set for each range. During real-time detection, fusion learning uses time-series exponential moving weights; during after-the-fact comprehensive detection, ensemble learning uses Bayesian inference.

Step 305: result data reduction and association index;

The fused detection result data set is queried periodically, and the processes that handle the PCAP and forensic files are started. The result samples that need long-term storage are filtered, reduced and indexed to form reduced files and sample indexes.

Step 306: asynchronous daily batch transfer, reuse and classified storage of data;

Data such as traffic statistical indexes, sampled normal traffic and detection results are transferred to the big data platform in batches every day, and the reduced files are uploaded to the big data platform in daily batches. The data are stored by category according to the required retention period, and event data of different security levels are cleaned up step by step according to the defined life cycle.

The process of performing iterative fusion of multi-source and multi-mode detection results and continuously adapting, updating and enriching the sample set in this embodiment is shown in fig. 5, and includes the following steps:

Step 401: establish an automatic collection and labeling mechanism for new samples;

Typical samples of each analysis topic generated by the actual environment are collected periodically. By establishing attack, defense and honeypot nodes, real samples in the actual environment are actively produced and labeled. By training two kinds of deep learning model, a detection model and a generation model, augmented samples are produced through multi-stage iterative adversarial training.

Step 402: analyzing and extracting a typical sample;

De-duplication and fuzzification are performed. Cluster analysis extracts the class-center samples that match the labels and finds outliers. Tree-model classification analysis and PCA (principal component analysis) are used to obtain key features, and similar samples are clustered by those key features.

Step 403: divide time-series periodic data sets and extract features;

New time-series periodic data sets are established by week, month, season, year and 3 years, and a feature processing program is run on the extracted sample data to form feature data sets.

Step 404: extraction extracts 70% of the data as training and validation sets.

Step 405: and training the plurality of machine learning models, and storing the trained model parameters.

Step 406: extracting a 30% typical test data set;

step 407: and (4) evaluating the models obtained in each time sequence period through the test data set, and sorting and selecting the models according to the standards of high accuracy and low time consumption.
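Steps 404 to 407 can be sketched as a 70/30 split followed by a ranking on accuracy and time. The shuffle seed, the tie-breaking order (accuracy first, then time) and the evaluation figures are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_70_30(samples: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle and split into 70% training/validation data and 30% test data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


def rank_models(evaluations: List[dict]) -> List[dict]:
    """Order models by highest accuracy, breaking ties by lowest time consumption."""
    return sorted(evaluations, key=lambda e: (-e["accuracy"], e["seconds"]))


train_val, test = split_70_30(range(10))

# Hypothetical evaluation results of the per-period models on the test set.
evaluations = [
    {"model": "week", "accuracy": 0.95, "seconds": 1.2},
    {"model": "month", "accuracy": 0.97, "seconds": 3.0},
    {"model": "year", "accuracy": 0.95, "seconds": 0.8},
]
ranking = [e["model"] for e in rank_models(evaluations)]
```

A weighted score that trades accuracy against latency would be an equally valid reading of "high accuracy and low time consumption"; the lexicographic order here is only the simplest choice.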

Step 408: to obtain information matched to the network protocols, the data are divided into a network layer, an application layer, a user layer and a relationship layer, and the key protocol elements, statistics and processed feature samples are extracted for each layer.

Step 409: using the network layer samples, training is performed using the previously evaluated optimal model.

Step 410: the previously evaluated optimal model is used for training with application layer samples.

Step 411: training is performed using the previously evaluated optimal model using the user and relationship layer samples.

Step 412: integrating multi-mode time sequence index mobile weight;

in the real-time detection process, the optimal model of each time sequence range is fused and evaluated, the weighting is carried out from near to far, and the long-term model is exponentially attenuated.

Step 413: Bayesian inference integration of the multi-source models;

In the comprehensive after-the-fact detection process, the multi-source models obtained in steps 409, 410 and 411 are integrated, and Bayesian inference is used for ensemble learning.

Step 414: evaluate each optimal sub-model from the preceding steps and compare it with the multi-mode time-series exponential moving weight integrated model and the multi-source Bayesian inference integrated model.

Step 415: at the end of each time-series stage, once the sample set iteration and the training of new models are complete, the models whose accuracy has improved on the newly built training and test data sets are extracted and put into effect in the environment. The process then enters the next round of data extraction, model building and comparison of new and old evaluations.

The invention discloses the following. First, a method that integrates a high-speed, high-capacity memory and network card cores on an SoC (system on chip) based embedded computing platform to construct a large-capacity on-chip storage resource, the SPM (scratch-pad memory): the on-chip network card core needs to copy data only once for it to enter an SPM area, and the SPM memory is used transparently by combining NUMA awareness with a huge-page memory mechanism. Second, a DPDK-based technique for partitioned concurrent traffic acquisition from several physical network cards: data packets are received through two-stage lock-free concurrency and a user-mode protocol stack, and high-speed traffic above 10G is decomposed across dozens of parallel processes. Third, a method for detecting IP protocol data fields: the fields are extracted, filtered and cached and, to match the resource requirements of each detection class, distributed through three channels to subsystems with different capabilities for detection; a batch of data is extracted, filtered and cached together, forwarded asynchronously to several models, and the abnormal records are detected and fused in parallel. Fourth, an iterative fusion algorithm for multi-source, multi-mode detection results: it adapts to the environment, adds typical samples of each analysis topic generated by the environment using an automatic periodic sliding window, trains two kinds of deep learning model (detection and generation) in multi-stage iterative adversarial training, ranks the models to find the optimal detection model sets for different time-series period ranges, and fuses the multi-mode detection results while reducing the weights of long-term historical models in a geometric ratio, which continuously improves the detection capability of the system. Fifth, a method for reducing and storing results: the fused detection result data set is queried and the PCAP and forensic files are processed; the retention periods of events of different security levels are defined, and the result samples that need long-term storage are filtered, reduced and indexed to form reduced files and sample indexes, which reduces storage volume and improves retrieval efficiency; finally, data such as traffic statistical indexes, sampled normal traffic and detection results are transferred to the big data platform in batches every day, the reduced files are uploaded in daily batches, and, for classified storage, event data of different security levels are cleaned up step by step according to the defined life cycle.

For network traffic acquisition and analysis in the fully connected, high-bandwidth 6G setting, the invention provides a new high-efficiency hardware implementation scheme and a software extension mechanism, realizes a multi-level concurrency and caching method for network data stream processing, reduces energy consumption, improves system performance, and meets the deployment requirements of an efficient security situation awareness system in the future smart home network environment.

Claims

1. An intelligent network sensing method for optimizing efficiency of a progressive concurrent cache, characterized by specifically comprising the following steps: on an SoC-based embedded computing platform, pairing a single network card core with a single CPU core to form independent partitions, and constructing in each partition, from integrated memory, a plurality of large-capacity on-chip storage resource SPM (Scratch Pad Memory) blocks that are accessed directly by memory address, thereby constituting a home internet communication gateway device;

then, the home internet communication gateway device is connected to each internet device in the home, each network card collects the traffic of different internet devices in parallel, and user-mode multithreaded concurrent processing is carried out based on Data Plane Development Kit (DPDK) technology;

the specific process is as follows:

UFID = shash(sort(srcip, srcport, dstip, dstport), protocol)

wherein srcip, srcport, dstip and dstport are respectively the source IP address, source port, destination IP address and destination port; the sort processing ensures that, for the packets sent and received in both directions, the four fields are concatenated into a string in the same order, and after the protocol type field is appended, the hash id of the string is calculated using the shash algorithm;

then, according to the number C of concurrent channels, the bidirectional data packets of the same flow, after the hash calculation on their string, are assigned to the corresponding queue Qi for storage and are distributed to a processing sub-process through the queue;

the processed flow packets are parallelly distributed to a three-channel subsystem for detection according to IP protocol detection and routing configuration; carrying out iterative fusion on each detection result output by the three-channel subsystem;

the three-channel subsystem comprises an application layer abnormal rule detection subsystem, a network layer traffic characteristic and abnormal model detection subsystem, and a PCAP (packet capture) and forensic file detection subsystem;

the three-channel subsystem parallel detection method comprises the following specific steps:

step 301, after each sub-process completes extraction, filtering and caching of the IP protocol data fields of each traffic packet, distributing the traffic packets to the application layer abnormal rule detection subsystem in IP tunnel mode, extracting the metadata on which rules are defined and the application layer protocol metadata, and performing rule matching detection through computational logic expressions;

step 302, forwarding the extracted application layer protocol metadata to a machine learning model through a distributed message queue, performing near-real-time deep detection by using a network layer traffic characteristic and abnormal model detection subsystem, and finding out a user behavior label with fuzzy semantics and similarity;

step 304, backing up the PCAP file through a locally created virtual network card, scanning the PCAP file according to subscription rules to recover, as forensic evidence, the resource files transmitted over the network, and forwarding them through a distributed message queue to the PCAP and forensic file detection subsystem for virus detection;

finally, periodically querying the fused detection result data set and processing the PCAP and forensic files; and transferring the daily traffic statistical indexes, sampled normal traffic and detection results to a big data platform in batches, storing the data by category, and cleaning up event data of different security levels step by step according to a defined life cycle.

2. The intelligent network sensing method for optimizing efficiency of a progressive concurrent cache as claimed in claim 1, wherein each large-capacity on-chip storage resource SPM block stores 2M of data, including network card data and CPU-processed data; a plurality of SPM blocks form a cache partition to meet the packet processing rate requirements of different types of network card; and the data stored on each SPM block is accessed directly by memory address, with continuous stream access across blocks performed by start address and load data length, without transmission over a data bus.

3. The intelligent network sensing method for optimizing performance of progressive concurrent caching as claimed in claim 1, wherein in step 301, the metadata comprises the 5-tuple of the TCP/IP network packet, the protocol fields, the number of uplink bytes and the number of downlink bytes;

the application layer protocol metadata comprises the URL (Uniform Resource Locator) of the HTTP protocol, the request code, the response code, the submitted data and the response data.

4. The intelligent network-aware method for progressive concurrent cache performance optimization according to claim 1, wherein the step 302 specifically comprises: aiming at network traffic carried and triggered by malicious behaviors to be executed in an application layer, a plurality of analysis models of characteristic field scanning, state machine reasoning, machine learning model classification, abnormal semantics and abnormal logic flow are established, and the application behavior types of the network traffic are labeled from a plurality of dimensions.

5. The intelligent network-aware method for progressive concurrent cache optimization performance according to claim 1, wherein the iterative fusion specific process comprises:

firstly, periodically collecting the detection results output by the three-channel subsystem, training detection and generation deep learning models after dimension reduction, vectorization, sparse reconstruction and augmentation of the training data, and constructing a sustainable self-learning training environment for the models by using reinforcement learning and adversarial generation algorithms;

specific fusion is performed according to three topics including:

3) constructing a typical behavior label, dividing a single event detection result and labeling a group event set through the typical behavior portrait of the user;

and finally, modeling a Bayesian network model based on the historical sample detection accuracy, calculating the joint detection probability of each classification of the new sample by inference, and continuously updating the deep learning model by using the result.

6. The intelligent network sensing method for optimizing performance of progressive concurrent caching as claimed in claim 1, wherein said regularly querying the fused detection result dataset specifically comprises:

firstly, defining the storage periods of events with different security levels, and dividing the storage requirements of data;

then, according to the result sample which needs to be stored for a long time, the related files are filtered, reduced and indexed to form a reduced file and a sample index.