CN115085985B - Memory high-efficiency range base number measuring method for network security monitoring - Google Patents
- Publication number
- CN115085985B (application number CN202210631403.0A)
- Authority
- CN
- China
- Prior art keywords
- range
- base number
- network
- bitmap
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
A memory-efficient range cardinality measurement method for network security monitoring. Duplicate cardinality information is eliminated through key aggregation to improve the accuracy of range cardinality estimation, and an adaptive counter dynamically adjusts memory usage according to the distribution of network flows, achieving optimal memory allocation and enabling efficient, high-accuracy security monitoring of a network in resource-constrained scenarios. The invention solves the error problem caused by accumulating single-host cardinalities and the memory redundancy problem caused by unbalanced data distribution in network flows; it can identify and track abnormal ranges in the network and realizes network security monitoring at different granularities.
Description
Technical Field
The invention relates to the technical field of network security monitoring, and in particular to a memory-efficient range cardinality measurement method for network security monitoring.
Background
Cardinality estimation is an important task in network security monitoring. Unlike flow size and frequency estimation, cardinality estimation aims to give the number of distinct elements in a data flow and is non-additive. Many cardinality measurement methods have been proposed for single-host cardinality estimation, but range cardinality estimation is rarely studied. Range cardinality refers to the overall cardinality of a set of flows. For example, when the key is set to the source address, the cardinality of the hosts within a range of source addresses can be monitored; when the key is set to a commodity number, the number of accesses to a given group of commodities by different customers can be counted, enabling better commodity recommendation; when the key is set to the destination address, it is possible to monitor whether any host in the current network range is under a DDoS attack.
Current cardinality measurement methods fall into three categories according to their counting mode: bitmap-based methods, methods based on probabilistic counting algorithms, and bit-sharing methods.
Bitmap-based cardinality measurement: estimation relies on the properties of the bitmap structure. Because the memory used by a bitmap grows linearly with the data volume, existing bitmap-based methods consume too much memory to measure range cardinality in practical scenarios;
Cardinality measurement based on probabilistic counting: a probabilistic estimation algorithm stores the probability of occurrence of distinct elements as its information. Compared with a bitmap structure, it can record accurate cardinality information in a smaller memory space, so these methods are widely used in practice. Although they achieve memory-efficient, high-accuracy cardinality estimation, they mainly measure the cardinality of a single host and cannot remove duplicate information during range cardinality estimation, which causes a large accumulated error;
Bit-sharing cardinality measurement: the counter memory used by low-cardinality hosts is shared with high-cardinality hosts, further reducing memory overhead. Research on bit-sharing methods focuses mainly on memory optimization, and a series of well-performing algorithms has been proposed; however, these methods also yield low-accuracy range cardinality estimates because they cannot remove duplicate cardinality information.
In summary, although the above structures can estimate range cardinality through multi-point query accumulation or multi-level cardinality measurement, these approaches suffer from excessive memory usage or low estimation accuracy and cannot be applied in real network security measurement scenarios. It is therefore desirable to design a range cardinality estimation algorithm with high memory efficiency and high accuracy for monitoring high-speed network flows, so as to achieve efficient network security monitoring.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide a memory-efficient range cardinality measurement method for network security monitoring that solves the error problem caused by accumulating single-host cardinalities, solves the memory redundancy problem caused by unbalanced data distribution in network flows, can identify and track abnormal ranges in a network, and realizes network security monitoring at different granularities.
To achieve this purpose, the invention adopts the following technical scheme:
A memory-efficient range cardinality measurement method for network security monitoring improves the accuracy of range cardinality estimation by eliminating duplicate cardinality information through key aggregation, uses an adaptive counter to dynamically adjust memory usage according to the distribution of network flows, achieves optimal memory allocation, and meets the need for efficient, high-accuracy security monitoring of a network in resource-constrained scenarios.
A memory-efficient range cardinality measurement method for network security monitoring comprises the following operation functions:
(1) The function f(k) aggregates keys within a predefined range into the same bucket. A locality-sensitive hash function maps original high-dimensional data into separate buckets according to data similarity; in a similar way, the memory-efficient range cardinality measurement method treats keys within a predefined range as similar data and aggregates them with a scalar version of the locality-sensitive hash function. Specifically, given a preset measurement range r and a data stream domain D, the hash function is:
f(k) = ⌊(a·k + b) / l⌋
where l = D/r is the number of buckets of the locality-sensitive hash function and b ~ U(0, l). The formula shows that l/a consecutive keys are mapped into one bucket, so a = l/r;
(2) The functions h_i(f(k)) (1 ≤ i ≤ d) are d hash functions, h_i(f(k)) = (f(k) + c_i) mod w, where c_i is a random value in the set {0, 1, ..., w−1}; they process the data to obtain its index position in each row of the range cardinality measurement structure;
(3) The function g(v) encodes an element into a fixed-length binary sequence in which each bit is 0 or 1 with probability 1/2; the Murmur3 hash function is used to achieve a uniform, independent bit distribution. In addition, the function ρ_x(g(v)) takes the x rightmost bits of the sequence and returns their decimal value.
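The three operation functions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `f`, `h`, `g`, and `rho` mirror the symbols above, `hashlib.blake2b` stands in for Murmur3 (which is not in the Python standard library), the offset `b` is passed in as an integer rather than sampled, and the 64-bit sequence length is an arbitrary choice.

```python
import hashlib

def f(k, D, r, b=0):
    """Key aggregation f(k) = floor((a*k + b) / l) with l = D/r and a = l/r."""
    l = D // r                          # number of buckets
    # (a*k + b) / l == (l*k + b*r) / (l*r); integer arithmetic avoids rounding error
    return (l * k + b * r) // (l * r)

def h(fk, c_i, w):
    """Row hash h_i(f(k)) = (f(k) + c_i) mod w, with c_i drawn from {0, ..., w-1}."""
    return (fk + c_i) % w

def g(v):
    """Encode element v as a 64-bit integer (BLAKE2b used here instead of Murmur3)."""
    digest = hashlib.blake2b(str(v).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def rho(bits, x):
    """rho_x: decimal value of the x rightmost bits of the sequence."""
    return bits & ((1 << x) - 1)

# With D = 1024, r = 16 and b = 0, keys 0..15 share bucket 0 and key 16 opens bucket 1.
print(f(15, 1024, 16), f(16, 1024, 16))   # 0 1
print(rho(0b001101, 3))                   # 5
```

With a non-zero offset `b` the bucket boundaries shift, but each bucket still receives exactly r consecutive keys.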
A memory-efficient range cardinality measurement method for network security monitoring comprises the following functional modules:
Cardinality counter SC_ij: the cardinality counter uses a bitmap structure to record range cardinality information and provide an approximate range cardinality estimate; it consists of bits with initial value 0, denoted B_ij[1], …, B_ij[m]; the size of SC_ij expands adaptively as the network flow changes, raising the upper limit of cardinality estimation;
Congestion degree CD_ij: the ratio of bits set to 1 in the bitmap; a large value means that SC_ij holds many elements, i.e., the current cell stores high-cardinality information;
Expansion times ET_ij: records the number of times SC_ij has been expanded; each expansion doubles the space of SC_ij;
Abnormal range address ID_ij: records the abnormal range address in the cell; this address is replaced with a probability that depends on the value of UT_ij;
Update times UT_ij: records the number of updates triggered on SC_ij, i.e., the number of times a bit is set from 0 to 1; since the hash function g maps each measured element uniformly, an abnormal range containing many distinct elements causes SC_ij to be updated many times.
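The per-cell state described by these modules could be held in a small record such as the one below. This is only a sketch: the class name `Cell`, its field names, and the default `c = 4` are hypothetical, the congestion degree is recomputed from the bitmap instead of being kept as a separate counter, and bit positions are 0-based here whereas the text numbers them from 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Cell:
    c: int = 4                                     # bitmap length starts at m = 2**c
    bits: List[int] = field(default_factory=list)  # SC_ij: bitmap B_ij
    et: int = 0                                    # ET_ij: number of expansions so far
    range_id: Optional[int] = None                 # ID_ij: recorded abnormal range address
    ut: int = 0                                    # UT_ij: update count for range_id

    def __post_init__(self):
        if not self.bits:
            self.bits = [0] * (1 << self.c)        # all bits start at 0

    @property
    def cd(self) -> float:
        """CD_ij: fraction of bits currently set to 1."""
        return sum(self.bits) / len(self.bits)

cell = Cell()
cell.bits[3] = 1
print(len(cell.bits), cell.cd)   # 16 0.0625
```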
The memory high-efficiency range base number measuring method for network security monitoring comprises the following steps:
1) Collecting network flow: collecting network flow information of a plurality of network nodes by using a range base number measurement algorithm; with the dynamic change of network flow, the range base number measurement algorithm dynamically adjusts the memory usage, and the optimal memory usage of multiple nodes in different environments is met;
2) Network flow fusion: after the acquisition is finished, the cardinal number information of each node is fused by utilizing the fusibility of a range cardinal number measurement algorithm, so that the cardinal number information of the whole network is obtained, and the comprehensive network safety monitoring is ensured;
3) Network flow analysis: multi-range cardinality information is queried using the estimation operation of the range cardinality measurement algorithm, and abnormal ranges in the network are detected by comparing the cardinality information with security rules, thereby performing security monitoring; the anomaly is traced to its source according to the recorded actual address of the abnormal range, and corresponding security defense measures are taken.
The memory-efficient range cardinality measurement structure comprises d × w cells, where the cell in the i-th row and j-th column is denoted L(i, j), with 1 ≤ i ≤ d and 1 ≤ j ≤ w.
The specific process of network flow acquisition in the step 1) is as follows:
In the initialization stage, all cells in the range cardinality measurement structure are 0. When a network flow <k, v> arrives, the function f(k) is first used to perform key aggregation on k, giving the range to which the flow belongs; second, the functions h_i(f(k)) are computed to obtain the cell index of the flow in each row i (1 ≤ i ≤ d); finally, a binary representation of v is obtained using the function g(v). Given a congestion threshold ε and a bitmap length m = 2^c, each cell is updated according to one of two cases:
Case one: when B_ij[ρ_{c+ET_ij}(g(v))] = 1, no operation is required;
Case two: when B_ij[ρ_{c+ET_ij}(g(v))] = 0, first set B_ij[ρ_{c+ET_ij}(g(v))] = 1 and update the congestion counter CD_ij; the following operations are then performed:
Check whether the condition CD_ij = ε holds. If it does, the space of the counter SC_ij has been filled by distinct elements, and a counter expansion is executed to obtain a larger estimation upper limit. The bitmap expansion proceeds as follows: first, the bitmap length is expanded from 2^{c+ET_ij} to 2^{c+ET_ij+1} and ET_ij is incremented by 1; then a data transfer operation is performed. Because the specific hash values of the elements behind each bit are not available, exact cardinality information cannot be preserved during the transfer, so the range cardinality measurement structure uses an equal-probability hash function p(·) to randomly map the row number to 0 or 1. If p(i) = 1, the information of each bit in the original bitmap is kept at its pre-expansion position; otherwise, the information of each bit in the original bitmap is moved to the newly added part of the sequence.
Check whether the condition ID_ij = f(k) holds. If it does, execute UT_ij += 1; otherwise, decrement UT_ij with a probability determined by a preset constant b. If UT_ij becomes 0, ID_ij is replaced by f(k) and UT_ij is set to 1.
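The two update cases and the expansion step can be sketched as below. Several details are assumptions made only for illustration: the row hash `p` is replaced by row parity, the decrement probability is taken to be `b ** (-UT)` (a decay rule used by some heavy-hitter sketches; the text only says the probability depends on a preset constant b), the threshold test uses `>=` instead of strict equality to stay robust with floating-point ratios, and cells are plain dictionaries with 0-based bit positions.

```python
import random

def p(i):
    """Hypothetical stand-in for p(.): maps a row number to 0 or 1."""
    return i % 2

def new_cell(c):
    return {"bits": [0] * (1 << c), "et": 0, "id": None, "ut": 0}

def expand(cell, i):
    """Double the bitmap; old bits stay in place if p(i) = 1, else move to the new half."""
    old = cell["bits"]
    blank = [0] * len(old)
    cell["bits"] = old + blank if p(i) == 1 else blank + old
    cell["et"] += 1

def update(cell, i, range_key, hv, c, eps, b=1.08, rng=random):
    """Process one flow: range_key is f(k), hv is g(v), i is the row number."""
    idx = hv & ((1 << (c + cell["et"])) - 1)        # rho_{c+ET}(g(v))
    if cell["bits"][idx] == 1:
        return                                       # case one: nothing to do
    cell["bits"][idx] = 1                            # case two: record the new element
    if sum(cell["bits"]) / len(cell["bits"]) >= eps: # congestion reached the threshold
        expand(cell, i)
    if cell["id"] == range_key:
        cell["ut"] += 1
    else:
        # assumed decay rule: decrement with probability b ** (-UT)
        if cell["ut"] > 0 and rng.random() < b ** (-cell["ut"]):
            cell["ut"] -= 1
        if cell["ut"] == 0:
            cell["id"], cell["ut"] = range_key, 1

cell = new_cell(2)
update(cell, 1, 7, 0b00, c=2, eps=0.5)
update(cell, 1, 7, 0b01, c=2, eps=0.5)
print(cell)   # {'bits': [1, 1, 0, 0, 0, 0, 0, 0], 'et': 1, 'id': 7, 'ut': 2}
```

After the expansion, an element whose old position moved may be counted again on its next arrival, which is the loss of exact information that the text attributes to the transfer step.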
The specific process of network flow fusion in the step 2) is as follows:
Multiple range cardinality measurement structures are fused into one through a merge operation. Given a set of range cardinality measurement structures with the same parameters, denoted {LS_1, ..., LS_Y}, the cell L(i, j) is merged as follows:
Cardinality counter fusion: a bitwise OR is performed over the bitmaps B_ij^1, ..., B_ij^Y of the structures, i.e., B_ij = B_ij^1 ∨ ... ∨ B_ij^Y, where the symbol ∨ denotes bitwise OR; note that the bitmap lengths of the different cells should be normalized before the bitmaps are merged;
Congestion counter fusion: the value of CD_ij is calculated from the final merged bitmap;
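A minimal sketch of the cell merge follows. It assumes that length normalization reuses the same doubling rule as bitmap expansion (old bits stay in place when p(i) = 1, otherwise they move to the new half), that all bitmap lengths are powers of two, and that `p` is row parity; the function names are hypothetical.

```python
def p(i):
    """Hypothetical stand-in for p(.): maps a row number to 0 or 1."""
    return i % 2

def normalize(bits, target_len, i):
    """Double a bitmap until it reaches target_len, following the row's p(i) rule."""
    while len(bits) < target_len:
        blank = [0] * len(bits)
        bits = bits + blank if p(i) == 1 else blank + bits
    return bits

def merge_cells(bitmaps, i):
    """Bitwise-OR the bitmaps of one cell L(i, j) taken from several structures."""
    target = max(len(b) for b in bitmaps)
    merged = [0] * target
    for b in bitmaps:
        for pos, bit in enumerate(normalize(b, target, i)):
            merged[pos] |= bit
    cd = sum(merged) / target          # congestion degree recomputed after merging
    return merged, cd

merged, cd = merge_cells([[1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0]], i=1)
print(merged, cd)   # [1, 1, 0, 0, 0, 0, 0, 0] 0.25
```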
The specific process of network flow estimation in step 3) is as follows:
The query operation returns an estimate of the cardinality of a specified range. Assuming a data domain D and a range size r, the data stream is divided into the intervals [0, r), [r, 2r), .... The range cardinality measurement structure supports the following query operations:
Single-range cardinality query: for a single data interval Q, the range index or any key in the range is used to query the single-range cardinality information, as follows: first, the cell indexes L(i, h_i(Q)) (1 ≤ i ≤ d) of range Q in the measurement structure are obtained; next, CD_ij and ET_ij of each cell are checked, and the single-range cardinality RS_i(Q) is computed with a linear estimation method
from the bitmap counter SC_ij and the term λ = ln((2 − ε)^2 / (4(1 − ε))), which compensates for the cardinality estimation error caused by bitmap expansion; finally, the range cardinality estimate of the single data interval is the minimum of the set {RS_1(Q), ..., RS_d(Q)};
Multi-range cardinality query: when the query range contains several sub-ranges, the large range is divided into small ranges and the cardinality information of the small ranges is merged to deduplicate the cardinality. For two sub-ranges Q_1 and Q_2, a bitwise OR is used to merge their bitmaps in each row; if the two bitmaps differ in size, the shorter one is expanded to match the longer one. After the bitmaps are merged, range cardinality estimation proceeds as in the single-range algorithm.
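One way to see where the compensation term λ comes from, assuming the linear estimation step is the standard linear-counting estimator, which estimates the cardinality of a bitmap of length M with congestion degree CD as −M·ln(1 − CD). This is a reading offered for illustration, and the symbols M, n̂_before, and n̂_after are introduced here. An expansion is triggered when CD = ε; doubling the bitmap keeps the number of set bits and halves the congestion degree, so the estimate drops by exactly M·λ:

```latex
\begin{align*}
\hat n_{\text{before}} &= -M \ln(1-\epsilon), \\
\hat n_{\text{after}}  &= -2M \ln\!\left(1-\tfrac{\epsilon}{2}\right), \\
\hat n_{\text{before}} - \hat n_{\text{after}}
  &= M\left[\,2\ln\!\left(1-\tfrac{\epsilon}{2}\right) - \ln(1-\epsilon)\right]
   = M \ln\frac{(2-\epsilon)^2}{4(1-\epsilon)} = M\lambda .
\end{align*}
```

Since (2 − ε)^2 − 4(1 − ε) = ε^2 > 0, λ is positive, so each expansion lowers the raw estimate and a term proportional to λ restores it. For ε = 0.5, λ = ln(2.25 / 2) ≈ 0.118.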
The specific process of detecting the abnormal range in the step 3) is as follows:
The memory-efficient range cardinality measurement method identifies abnormal ranges from the values recorded in the d × w cells: if the value in a cell exceeds a certain threshold, the range address recorded in that cell is used to compute the range cardinality; if the computed range cardinality still exceeds a preset threshold, the address is identified as an abnormal range.
The invention has the beneficial effects that:
according to the distribution condition of the network flow cardinality, the range cardinality measurement algorithm can dynamically adjust the memory usage. The host cardinality information is stored by adopting a dynamic memory allocation mechanism, a small counter can be used for monitoring a low-cardinality host to reduce unnecessary memory usage, and a large counter is used for monitoring a high-cardinality host to ensure cardinality measurement accuracy. Therefore, the algorithm provided by the invention realizes the high-efficiency host radix measurement of the memory in the network security monitoring.
The dynamic extensible counter is used, the range radix measurement algorithm can monitor the high radix host and the low radix host at the same time, and the dynamic extensible characteristic can ensure the accuracy of radix measurement of any size.
The range radix measurement method of the present invention supports any range of radix measurements. By utilizing the estimation operation of the range base number measurement algorithm, the base number information of a single-range or multi-range interval can be accurately measured (a single IP network or a plurality of IP networks), and the multi-granularity network safety monitoring is realized.
The range cardinality measurement method of the invention can be distributively deployed at each node of the network to collect network flow information. By utilizing the fusibility of the range base number measurement algorithm, the information of each node can be fused accurately to obtain the global network state.
The invention provides a rapid abnormal range identification method based on a range base number measurement method. The identification algorithm does not need to calculate the cardinal number information of the whole range, and can quickly locate and trace the source of the abnormal range according to the special function module of the range cardinal number measurement algorithm, thereby realizing high-efficiency network safety monitoring.
Compared with the prior art, the invention has the following advantages:
Economy: the range cardinality measurement algorithm accurately measures multi-range cardinality when memory resources are limited, has high resource utilization and high throughput, supports distributed measurement, is easy to deploy, and has high commercial value.
Real-time performance: the range cardinality measurement algorithm is based on hash operations, which keeps data collection and querying computationally efficient; collection and estimation are cheap, so network flow information can be collected in real time in a high-speed network environment.
Generality: the range cardinality measurement algorithm can be extended to a variety of data flow measurement tasks, for example range frequency measurement, range flow size measurement, and range commodity popularity tracking.
High accuracy: the range cardinality measurement algorithm uses key aggregation and a dynamically expandable counter to measure network flows of all forms accurately.
Extensibility: the invention is not limited to range cardinality measurement; by modifying how the memory-efficient measurement method collects and stores the data stream, it can be further extended to range feature estimation.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is a diagram of memory efficient range radix measurement architecture for the method of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
A memory-efficient range cardinality measurement method for network security monitoring comprises the following operation functions:
(1) The function f(k) aggregates keys within a predefined range into the same bucket. A locality-sensitive hash function maps original high-dimensional data into separate buckets according to data similarity; in a similar way, the memory-efficient range cardinality measurement algorithm treats keys within a predefined range as similar data and aggregates them with a scalar version of the locality-sensitive hash function. Specifically, given a preset measurement range r and a data stream domain D, the hash function is:
f(k) = ⌊(a·k + b) / l⌋
where l = D/r is the number of buckets of the locality-sensitive hash function and b ~ U(0, l). The formula shows that l/a consecutive keys are mapped into one bucket, so a = l/r;
(2) The functions h_i(f(k)) (1 ≤ i ≤ d) are d mutually independent hash functions, h_i(f(k)) = (f(k) + c_i) mod w, where c_i is a random value in the set {0, 1, ..., w−1}; they process the data to obtain its index position in each row of the range cardinality measurement structure;
(3) The function g(v) encodes an element into a fixed-length binary sequence in which each bit is independent of the others and is 0 or 1 with probability 1/2; the Murmur3 hash function is used to achieve a uniform, independent bit distribution. In addition, the function ρ_x(g(v)) takes the x rightmost bits of the sequence and returns their decimal value, e.g.
ρ_3(001101) = 101_2 = 5.
The functional module for realizing the operation function comprises:
Cardinality counter SC_ij: the cardinality counter uses a bitmap structure to record range cardinality information and provide an approximate range cardinality estimate; it consists of bits with initial value 0, denoted B_ij[1], …, B_ij[m]. Unlike a conventional bitmap structure, the size of SC_ij expands adaptively as the network flow changes, raising the upper limit of cardinality estimation;
Congestion degree CD_ij: the ratio of bits set to 1 in the bitmap; a large value means that SC_ij holds many elements, i.e., the current cell stores high-cardinality information;
Expansion times ET_ij: records the number of times SC_ij has been expanded; each expansion doubles the space of SC_ij;
Abnormal range address ID_ij: records the abnormal range address in the cell; this address is replaced with a probability that depends on the value of UT_ij;
Update times UT_ij: records the number of updates triggered on SC_ij, i.e., the number of times a bit is set from 0 to 1; since the hash function g maps each measured element uniformly, an abnormal range containing many distinct elements causes SC_ij to be updated many times.
As shown in fig. 1, a memory high-efficiency range radix measurement method for network security monitoring includes the following steps:
1) Collecting network flow: collecting network flow information of a plurality of network nodes by using a range cardinality measurement algorithm; along with the dynamic change of network flow, the range base number measurement algorithm can dynamically adjust the memory usage, so that the optimal memory usage of multiple nodes in different environments is met;
2) Network flow fusion: after the acquisition is finished, the cardinal number information of each node is fused by utilizing the fusibility of a range cardinal number measurement algorithm, so that the cardinal number information of the whole network is obtained, and the comprehensive network safety monitoring is ensured;
3) Network flow analysis: using the estimation operation of the range cardinality measurement algorithm, multi-range cardinality information can be queried, and abnormal ranges in the network are detected by comparing the cardinality information with security rules, thereby performing security monitoring; according to the recorded actual address of the abnormal range, the anomaly can be traced to its source and corresponding security defense measures can be taken.
As shown in fig. 2, the memory-efficient range cardinality measurement structure comprises d × w cells, where the cell in the i-th row and j-th column is denoted L(i, j), with 1 ≤ i ≤ d and 1 ≤ j ≤ w.
The specific process of network flow acquisition in the step 1) is as follows:
In the initialization stage, all cells in the range cardinality measurement structure are 0. When a network flow <k, v> arrives, the function f(k) is first used to perform key aggregation on k, giving the range to which the flow belongs; second, the functions h_i(f(k)) are computed to obtain the cell index of the flow in each row i (1 ≤ i ≤ d); finally, a binary representation of v is obtained using the function g(v). Given a congestion threshold ε and a bitmap length m = 2^c, each cell is updated according to one of two cases:
Case one: when B_ij[ρ_{c+ET_ij}(g(v))] = 1, no operation is required;
Case two: when B_ij[ρ_{c+ET_ij}(g(v))] = 0, first set B_ij[ρ_{c+ET_ij}(g(v))] = 1 and update the congestion counter CD_ij; the following operations are then performed:
Check whether the condition CD_ij = ε holds. If it does, the space of the counter SC_ij has been filled by distinct elements, and a counter expansion is executed to obtain a larger estimation upper limit. The bitmap expansion proceeds as follows: first, the bitmap length is expanded from 2^{c+ET_ij} to 2^{c+ET_ij+1} and ET_ij is incremented by 1; then a data transfer operation is performed. Because the specific hash values of the elements behind each bit are not available, exact cardinality information cannot be preserved during the transfer, so the range cardinality measurement structure uses an equal-probability hash function p(·) to randomly map the row number to 0 or 1. If p(i) = 1, the information of each bit in the original bitmap is kept at its pre-expansion position; otherwise, the information of each bit in the original bitmap is moved to the newly added part of the sequence.
Check whether the condition ID_ij = f(k) holds. If it does, execute UT_ij += 1; otherwise, decrement UT_ij with a probability determined by a preset constant b. If UT_ij becomes 0, ID_ij is replaced by f(k) and UT_ij is set to 1. The rationale for this update scheme is that the cardinality of an abnormal range is much greater than that of a normal range, so an abnormal range is more likely to trigger updates of SC_ij and therefore has a higher UT_ij.
The specific process of network flow fusion in the step 2) is as follows:
Multiple range cardinality measurement structures are fused into one through a merge operation. Given a set of range cardinality measurement structures with the same parameters, denoted {LS_1, ..., LS_Y}, the cell L(i, j) is merged as follows:
Cardinality counter fusion: a bitwise OR is performed over the bitmaps B_ij^1, ..., B_ij^Y of the structures, i.e., B_ij = B_ij^1 ∨ ... ∨ B_ij^Y, where the symbol ∨ denotes bitwise OR; note that the bitmap lengths of the different cells should be normalized before the bitmaps are merged;
Congestion counter fusion: the value of CD_ij is calculated from the final merged bitmap;
By virtue of its merge operation, the range cardinality measurement structure can collect distributed range cardinality information and analyze and process it centrally; the merged structure provides the range cardinality information of the global network.
The specific process of network flow estimation in step 3) is as follows:
The query operation returns an estimate of the cardinality of a specified range. Assuming a data domain D and a range size r, the data stream is divided into the intervals [0, r), [r, 2r), .... The range cardinality measurement structure supports the following query operations:
Single-range cardinality query: for a single data interval Q (e.g., Q = [r, 2r)), the range index or any key in the range is used to query the single-range cardinality information, as follows: first, the cell indexes L(i, h_i(Q)) (1 ≤ i ≤ d) of range Q in the measurement structure are obtained; next, CD_ij and ET_ij of each cell are checked, and the single-range cardinality RS_i(Q) is computed with a linear estimation method
from the bitmap counter SC_ij and the term λ = ln((2 − ε)^2 / (4(1 − ε))), which compensates for the cardinality estimation error caused by bitmap expansion; finally, the range cardinality estimate of the single data interval is the minimum of the set {RS_1(Q), ..., RS_d(Q)};
Multi-range cardinality query: when the query range contains several sub-ranges, the large range is divided into small ranges and the cardinality information of the small ranges is merged to deduplicate the cardinality. For example, Q = [r, 3r) can be divided into Q = [Q_1, Q_2], where Q_1 = [r, 2r) and Q_2 = [2r, 3r). A bitwise OR is used to merge the bitmaps of Q_1 and Q_2 in each row; if the two bitmaps differ in size, the shorter one is expanded to match the longer one. After the bitmaps are merged, range cardinality estimation proceeds as in the single-range algorithm.
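The multi-range query can be illustrated with a short sketch that splits a query interval into range indexes and shows why OR-ing bitmaps deduplicates shared elements. It assumes equal-length bitmaps that have never been expanded and uses the plain linear-counting estimator without the compensation term; `split_range`, `or_bitmaps`, and `linear_count` are hypothetical helper names.

```python
import math

def split_range(lo, hi, r):
    """Indexes of the size-r intervals [i*r, (i+1)*r) that cover the query [lo, hi)."""
    return list(range(lo // r, math.ceil(hi / r)))

def or_bitmaps(a, b):
    """Bitwise OR of two equal-length bitmaps."""
    return [x | y for x, y in zip(a, b)]

def linear_count(bits):
    """Linear-counting estimate: -M * ln(fraction of zero bits)."""
    zeros = bits.count(0)
    if zeros == 0:
        return float("inf")            # a saturated bitmap has no finite estimate
    return -len(bits) * math.log(zeros / len(bits))

r = 256
print(split_range(r, 3 * r, r))        # [1, 2]  i.e. Q_1 = [r, 2r), Q_2 = [2r, 3r)

q1 = [1, 1, 0, 0, 0, 0, 0, 0]          # bitmap of Q_1
q2 = [1, 0, 1, 0, 0, 0, 0, 0]          # bitmap of Q_2; bit 0 is shared with Q_1
merged = linear_count(or_bitmaps(q1, q2))
summed = linear_count(q1) + linear_count(q2)
print(merged < summed)                 # True: the shared bit is counted only once
```

Summing the two per-range estimates counts the element behind the shared bit twice, which is the accumulated error that the merge avoids.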
The specific process of detecting the abnormal range in the step 3) is as follows:
To monitor abnormal ranges, the memory-efficient range cardinality measurement algorithm needs to identify the cells with high cardinality. Unlike the traditional detection method, which traverses all counters and computes their cardinalities, the algorithm identifies abnormal ranges from the values recorded in the d × w cells: if the value in a cell exceeds a certain threshold, the range address recorded in that cell is used to compute the range cardinality; if the computed range cardinality still exceeds a preset threshold, the address is identified as an abnormal range. The rationale for this detection scheme is that a normal range has a small cardinality, so the cell holding it is expanded few times, whereas an abnormal range has a large cardinality and a higher number of expansions.
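The two-stage check can be sketched as follows. Using the expansion count ET_ij as the screened cell value is a reading of the rationale above, not something the text states explicitly; the function name, the dictionary layout of a cell, both thresholds, and the `estimate` callable (standing in for the range cardinality query) are hypothetical.

```python
def detect_abnormal_ranges(cells, et_threshold, card_threshold, estimate):
    """Return recorded range addresses that pass both the cell screen and the full check.

    cells: d rows of w dicts with keys "id" (recorded range address) and "et".
    estimate: callable mapping a range address to its estimated range cardinality.
    """
    # Stage 1: cheap screen on per-cell state, without computing any cardinality.
    suspects = {
        cell["id"]
        for row in cells
        for cell in row
        if cell["id"] is not None and cell["et"] > et_threshold
    }
    # Stage 2: run the range cardinality query only for the suspects.
    return sorted(rid for rid in suspects if estimate(rid) > card_threshold)

cells = [
    [{"id": 3, "et": 4}, {"id": 9, "et": 0}],
    [{"id": 3, "et": 5}, {"id": 7, "et": 3}],
]
cardinalities = {3: 5000, 7: 20, 9: 10}          # made-up estimates for the demo
print(detect_abnormal_ranges(cells, 2, 1000, cardinalities.get))   # [3]
```

Range 7 passes the first screen but is dropped by the cardinality check, and range 9 is never queried at all, which is where the saving over a full traversal comes from.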
Claims (5)
1. A memory-efficient range cardinality measurement method for network security monitoring, characterized in that: the estimation accuracy of the range cardinality is improved by eliminating duplicate cardinality information through key aggregation; an adaptive counter dynamically adjusts memory usage according to the distribution of network flows, achieving optimal memory allocation; and efficient, high-accuracy network security monitoring is thereby achieved in resource-constrained scenarios;
the memory-efficient range cardinality measurement method for network security monitoring comprises the following steps:
1) Network flow collection: a range cardinality measurement algorithm collects network flow information from a plurality of network nodes; as network flows change dynamically, the algorithm dynamically adjusts its memory usage so that memory use is optimal for multiple nodes in different environments;
2) Network flow fusion: after collection is complete, the cardinality information of each node is fused by means of the mergeability of the range cardinality measurement algorithm, yielding the cardinality information of the whole network and ensuring comprehensive network security monitoring;
3) Network flow analysis: multi-range cardinality information is queried through the estimation operation of the range cardinality measurement algorithm; abnormal ranges in the network are detected by comparing the cardinality information with security rules, and security monitoring is carried out; the source of an anomaly is traced from the recorded actual address of the abnormal range, and corresponding security defense measures are taken;
the specific process of network flow acquisition in the step 1) is as follows:
in the initialization stage, all cells in the range cardinality measurement structure are 0; when a network flow $\langle k, v \rangle$ arrives, the function $f(k)$ first performs key aggregation on $k$ to obtain the range to which the flow belongs; second, the functions $h_i(f(k))$ are computed to obtain the flow's cell index in each row $i$ ($1 \le i \le d$); finally, the function $g(v)$ yields a binary representation of $v$; given the congestion degree $\epsilon$ and bitmap length $m = 2^c$, each cell is updated in one of the following two ways:
case one: when $B_{ij}[\rho_{c+ET_{ij}}(g(v))] = 1$, no operation is required;
case two: when $B_{ij}[\rho_{c+ET_{ij}}(g(v))] = 0$, first set $B_{ij}[\rho_{c+ET_{ij}}(g(v))] = 1$ and update the congestion counter $CD_{ij}$ [update formula missing in source]; the following operations are then performed:
check whether the condition $CD_{ij} = \epsilon$ holds; if it does, the space of the counter $SC_{ij}$ has been filled by distinct elements, and the counter expansion operation must be executed to obtain a larger estimation upper bound; the bitmap expansion proceeds as follows: first, the bitmap length is expanded [original and expanded lengths missing in source] and $ET_{ij}$ is incremented by 1; then a data transfer operation is performed; because the specific hash values of the elements behind each bit are not stored, no exact cardinality information is available during the transfer, so the range cardinality measurement structure uses a hash function $p(\cdot)$ to map the row number randomly to 0 or 1; if $p(i) = 1$, the information of each bit in the original bitmap is kept at its pre-expansion position [formula missing in source]; otherwise, the information of each bit in the original bitmap is moved to the expanded part [formula missing in source];
check whether the condition $ID_{ij} = f(k)$ holds; if it does, increment $UT_{ij}$ by 1; otherwise, decrement $UT_{ij}$ with a probability [formula missing in source] in which $b$ is a preset constant; if $UT_{ij}$ becomes 0, replace $ID_{ij}$ with $f(k)$ and set $UT_{ij}$ to 1;
the specific process of network flow fusion in the step 2) is as follows:
multiple range cardinality measurement structures are fused into one by a merge operation; given a set of range cardinality measurement structures with the same parameters, denoted $\{LS_1, \ldots, LS_Y\}$, the cell $L(i, j)$ is merged as follows:
cardinality counter fusion: the bitmaps of the multiple range cardinality measurement structures are combined with a bitwise OR [formula missing in source]; note that the bitmap lengths of the different cells must be normalized before the bitmaps are merged;
congestion counter fusion: the value of $CD_{ij}$ is computed from the final merged bitmap;
The specific process of network flow estimation in step 3) is as follows:
the query operation returns an estimate of the cardinality information of a specific range; assuming a data domain $D$ and a range size $r$, the data stream is divided into intervals [formula missing in source]; the range cardinality measurement structure supports the following query operations:
single-range cardinality query: for a single data interval $Q$, the range index or any key in the range is used to query the single-range cardinality information, as follows: first, the cell indexes $L(i, h_i(Q))$ ($1 \le i \le d$) of $Q$ in the range cardinality measurement structure are obtained; next, the $CD_{ij}$ and $ET_{ij}$ of each cell are read, and the single-range cardinality information is computed by a linear estimation method: [formula missing in source]
where [symbol missing in source] is a quantity of the bitmap counter $SC_{ij}$, and $\lambda = \ln\frac{(2-\epsilon)^2}{4(1-\epsilon)}$ is a compensation term for the cardinality estimation error caused by bitmap expansion; finally, the range cardinality estimate for the single data interval is the minimum of the set $\{RS_1(Q), \ldots, RS_d(Q)\}$;
multi-range cardinality query: when the query range comprises several sub-ranges, the large range is divided into small ranges, and the cardinality information of the small ranges is combined so that each element is counted only once; the bitmaps of $Q_1$ and $Q_2$ in each row are merged with a bitwise OR; if the two bitmaps differ in size, the shorter one is first expanded to the length of the longer one; after the bitmaps are merged, range cardinality estimation proceeds exactly as in the single-range case.
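One way to see where the compensation term $\lambda$ comes from is sketched below. It assumes that the "linear estimation method" is the classical linear-counting estimator $\hat n = -m \ln(1-\rho)$ for a bitmap of length $m$ with a fraction $\rho$ of its bits set; that estimator is an assumption of this note, since the source's own formula is not reproduced in the text. At the moment of expansion the fill ratio equals $\epsilon$; after doubling, the same set bits occupy a bitmap of length $2m$ with fill ratio $\epsilon/2$.

```latex
\begin{aligned}
\hat n_{\text{before}} &= -m \ln(1-\epsilon), \\
\hat n_{\text{after}}  &= -2m \ln\!\left(1-\tfrac{\epsilon}{2}\right)
                        = -m \ln\frac{(2-\epsilon)^2}{4}, \\
\hat n_{\text{before}} - \hat n_{\text{after}}
                       &= m \ln\frac{(2-\epsilon)^2}{4(1-\epsilon)}
                        = m\lambda .
\end{aligned}
```

Under this reading, each expansion of a length-$m$ bitmap lowers the raw estimate by $m\lambda$, which the compensation term adds back. The loss is never negative, because $(2-\epsilon)^2 - 4(1-\epsilon) = \epsilon^2 \ge 0$.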
2. The method of claim 1, wherein the operation functions comprise:
(1) The function $f(k)$ aggregates keys within a predefined range into the same bucket; a locality-sensitive hash function maps original high-dimensional data into separate buckets according to data similarity; by analogy, the memory-efficient range cardinality measurement method treats keys within a predefined range as similar data and aggregates them with a scalar version of the locality-sensitive hash function; specifically, given a preset measurement range $r$ and a data stream domain $D$, the hash function is: [formula missing in source]
where $l = D/r$ is the number of buckets of the locality-sensitive hash function and $b \sim U(0, l)$; the above formula shows that $l/a$ consecutive keys are mapped into one bucket, so $a = l/r$;
(2) The functions $h_i(f(k))$ ($1 \le i \le d$) are $d$ mutually independent hash functions, $h_i(f(k)) = (f(k) + c_i) \bmod w$, where $c_i$ is a random value from the set $\{0, 1, \ldots, w-1\}$; they map the data to its index position in each row of the range cardinality measurement structure;
(3) The function $g(v)$ encodes an element as a binary sequence of length [missing in source] in which each bit follows a geometric distribution with parameter 1/2; the Murmur3 hash function is used to achieve a uniform, independent bit distribution; in addition, the function $\rho_x(g(v))$ takes the $x$ rightmost bits and returns their decimal value.
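The three functions can be sketched as below, with several assumptions that are not in the source: the formula for $f(k)$ is not reproduced in the text, so the standard scalar locality-sensitive form $\lfloor (a k + b) / l \rfloor$ is assumed; `hashlib.blake2b` from the Python standard library is used as a stand-in for Murmur3; and the names `make_f`, `make_h`, `g` and `rho`, the 64-bit output width, and the sample parameters are hypothetical.

```python
import hashlib
import math


def make_f(domain_size, r, b=0.0):
    """Key aggregation f(k), assuming the form floor((a*k + b) / l)."""
    l = domain_size / r          # number of buckets, l = D / r
    a = l / r                    # so that l / a = r consecutive keys share a bucket

    def f(k):
        return math.floor((a * k + b) / l)

    return f


def make_h(w, offsets):
    """Row hashes h_i(x) = (x + c_i) mod w, one per offset c_i."""
    return [lambda x, c=c: (x + c) % w for c in offsets]


def g(v):
    """Hash element v to a 64-bit integer (BLAKE2b stand-in for Murmur3)."""
    digest = hashlib.blake2b(str(v).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def rho(x, bits):
    """Return the x rightmost bits of `bits` as an integer."""
    return bits & ((1 << x) - 1)


f = make_f(domain_size=1024, r=16)       # b = 0 keeps the example deterministic
rows = make_h(w=8, offsets=[3, 5])       # d = 2 rows, offsets drawn from {0..7}
cell_columns = [h(f(40)) for h in rows]  # key 40 falls in range index 2
bit_position = rho(4, g("10.0.0.1"))     # index into a bitmap of length 2**4
```

In practice the source draws $b$ from $U(0, l)$ and each $c_i$ at random; fixed values are used here only so that the example is reproducible.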
3. The method of claim 1, wherein the functional modules comprise:
cardinality counter $SC_{ij}$: the cardinality counter uses a bitmap structure to record range cardinality information and provide an approximate estimate of the range cardinality; it consists of $m$ bits with initial value 0, denoted $B_{ij}[1], \ldots, B_{ij}[m]$; the size of $SC_{ij}$ expands adaptively with changes in network flows to raise the upper bound of the cardinality estimate;
congestion degree $CD_{ij}$: the proportion of bits in the bitmap that are set to 1; a large value indicates that $SC_{ij}$ holds many elements, that is, the cell currently stores large-cardinality information;
expansion count $ET_{ij}$: records the number of times $SC_{ij}$ has been expanded; each expansion doubles the space of $SC_{ij}$;
abnormal range address $ID_{ij}$: records the abnormal range address in the cell; the address is replaced probabilistically depending on the value of $UT_{ij}$;
update count $UT_{ij}$: records the number of updates to $SC_{ij}$, i.e. the number of times a bit is set from 0 to 1; because the hash function $g$ maps each element to be measured uniformly, an abnormal range with many distinct elements causes $SC_{ij}$ to be updated many times.
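A single cell holding these five modules, together with the update rule of claim 1, might look like the sketch below. Several details are assumptions rather than the patented method: the class and field names are hypothetical; the decay probability formula is missing from the text, so an exponential form `b ** -UT` with `b = 1.08` is used purely for illustration; the expansion test uses `>=` instead of strict equality for robustness; and `keep_low` stands in for the row hash $p(i)$.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Cell:
    c: int                        # initial bitmap length is m = 2**c
    eps: float                    # congestion degree that triggers expansion
    b: float = 1.08               # preset constant (value assumed)
    bits: list = field(default_factory=list)   # SC_ij
    ones: int = 0                 # set bits; CD_ij = ones / len(bits)
    et: int = 0                   # ET_ij, number of expansions
    range_id: object = None       # ID_ij, recorded range address
    ut: int = 0                   # UT_ij, update count

    def __post_init__(self):
        if not self.bits:
            self.bits = [0] * (1 << self.c)

    def congestion(self):
        return self.ones / len(self.bits)

    def update(self, range_key, gv, keep_low=True, rng=random):
        # rho_{c+ET}(g(v)): the c + ET rightmost bits select the position.
        pos = gv & (len(self.bits) - 1)
        if self.bits[pos]:
            return                # case one: bit already set, nothing to do
        self.bits[pos] = 1        # case two: record the new element
        self.ones += 1
        if self.congestion() >= self.eps:
            zeros = [0] * len(self.bits)
            self.bits = self.bits + zeros if keep_low else zeros + self.bits
            self.et += 1          # each expansion doubles the bitmap
        if self.range_id == range_key:
            self.ut += 1
        else:
            # Assumed decay probability b ** -UT (formula missing in source).
            if self.ut > 0 and rng.random() < self.b ** -self.ut:
                self.ut -= 1
            if self.ut == 0:
                self.range_id, self.ut = range_key, 1


cell = Cell(c=2, eps=0.5)
cell.update("r1", 0b00)   # first element: ID becomes "r1", UT = 1
cell.update("r1", 0b01)   # fill ratio reaches 0.5, so the bitmap doubles
cell.update("r1", 0b01)   # same bit again: case one, no change
```

Keeping a running count of set bits avoids rescanning the bitmap to recompute the congestion degree on every update.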
4. The method of claim 1, wherein: the memory-efficient range cardinality measurement structure comprises $d \times w$ cells, the cell in row $i$ and column $j$ being denoted $L(i, j)$, with $1 \le i \le d$ and $1 \le j \le w$.
5. The method according to claim 1, wherein the specific process of detecting the abnormal range in step 3) is as follows:
the memory-efficient range cardinality measurement method screens the $d \times w$ cells by their stored values: if a cell's value exceeds a given threshold, the range address in that cell is used to compute the range cardinality; and if the computed range cardinality also exceeds a preset threshold, the address is identified as an abnormal range.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210631403.0A CN115085985B (en) | 2022-06-06 | 2022-06-06 | Memory high-efficiency range base number measuring method for network security monitoring |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210631403.0A CN115085985B (en) | 2022-06-06 | 2022-06-06 | Memory high-efficiency range base number measuring method for network security monitoring |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115085985A (en) | 2022-09-20 |
CN115085985B (en) | 2023-03-31 |
Family
ID=83249991
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210631403.0A Active CN115085985B (en) | 2022-06-06 | 2022-06-06 | Memory high-efficiency range base number measuring method for network security monitoring |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115085985B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116599865B (en) * | 2023-05-17 | 2024-05-24 | 广州天懋信息***股份有限公司 | Distributed traffic deduplication statistical method, device, equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709001A (en) * | 2016-12-22 | 2017-05-24 | 西安电子科技大学 | Cardinality estimation method aiming at streaming big data |
CN110955685A (en) * | 2019-11-29 | 2020-04-03 | 北京锐安科技有限公司 | Big data base estimation method, system, server and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112966006A (en) * | 2021-03-11 | 2021-06-15 | 北京明略昭辉科技有限公司 | Method, apparatus, electronic device and storage medium for cardinality estimation |
CN113360532B (en) * | 2021-06-07 | 2022-11-15 | 东南大学 | Network flow cardinality online real-time estimation method based on outline structure |
CN113904795B (en) * | 2021-08-27 | 2024-06-04 | 北京工业大学 | Flow rapid and accurate detection method based on network security probe |
CN113992541B (en) * | 2021-09-11 | 2023-03-31 | 西安电子科技大学 | Network flow measuring method, system, computer equipment, storage medium and application |
Also Published As
Publication number | Publication date |
---|---|
CN115085985A (en) | 2022-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111612039B (en) | Abnormal user identification method and device, storage medium and electronic equipment | |
CN110830322B (en) | Network flow measuring method and system based on probability measurement data structure Sketch | |
US20060271547A1 (en) | Cluster storage collection based data management | |
WO2017020747A1 (en) | Method and device for detecting slow disk | |
CN112926635B (en) | Target clustering method based on iterative self-adaptive neighbor propagation algorithm | |
CN112911627B (en) | Wireless network performance detection method, device and storage medium | |
CN115085985B (en) | Memory high-efficiency range base number measuring method for network security monitoring | |
CN110297715B (en) | Online load resource prediction method based on periodic feature analysis | |
CN113992541B (en) | Network flow measuring method, system, computer equipment, storage medium and application | |
CN115248757A (en) | Hard disk health assessment method and storage device | |
CN111782700B (en) | Data stream frequency estimation method, system and medium based on double-layer structure | |
Sanchez et al. | Fast trajectory clustering using hashing methods | |
CN114389974B (en) | Method, device and medium for searching abnormal flow node in distributed training system | |
CN117596010A (en) | Efficient high-precision base number measurement method and system for network anomaly detection | |
CN111540202B (en) | Similar bayonet determining method and device, electronic equipment and readable storage medium | |
CN111865690B (en) | Opportunistic network link prediction method based on network structure and time sequence | |
Wang et al. | Stull: Unbiased online sampling for visual exploration of large spatiotemporal data | |
CN109543712B (en) | Method for identifying entities on temporal data set | |
CN112765219B (en) | Stream data abnormity detection method for skipping steady region | |
CN114881102A (en) | Rare class detection method for numerical data | |
CN105488192A (en) | Point cloud data K neighborhood search method | |
CN110109960A (en) | A kind of data acquisition expansion control system and its collecting method | |
CN113435501B (en) | Clustering-based metric space data partitioning and performance measuring method and related components | |
CN115858895B (en) | Multi-source heterogeneous data processing method and system for smart city | |
Fu et al. | Jump Filter: A Dynamic Sketch for Big Data Governance. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||