CN115085985B - Memory-efficient range cardinality measurement method for network security monitoring - Google Patents

Info

Publication number
CN115085985B
Authority
CN
China
Prior art keywords
range
base number
network
bitmap
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210631403.0A
Other languages
Chinese (zh)
Other versions
CN115085985A (en
Inventor
靖旭阳
闫峥
王普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210631403.0A
Publication of CN115085985A
Application granted
Publication of CN115085985B
Active (current legal status)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A memory-efficient range cardinality measurement method for network security monitoring. Key aggregation eliminates duplicate cardinality information and so improves the accuracy of range cardinality estimation, and an adaptive counter adjusts memory use dynamically according to the distribution of network flows, so that memory is allocated optimally and a network can be monitored efficiently and accurately in resource-constrained scenarios. The invention addresses the error caused by accumulating single-host cardinalities and the memory redundancy caused by unbalanced data distribution in network flows; it can identify and trace abnormal ranges in the network and supports network security monitoring at different granularities.

Description

Memory-efficient range cardinality measurement method for network security monitoring
Technical Field
The invention relates to the technical field of network security monitoring, and in particular to a memory-efficient range cardinality measurement method for network security monitoring.
Background
Cardinality estimation is an important task in network security monitoring. Unlike flow size and frequency estimation, cardinality estimation gives the number of distinct elements in a data flow and is non-additive. Many cardinality measurement methods have been proposed for single-host cardinality estimation, but range cardinality estimation has rarely been studied. Range cardinality is the overall cardinality of a set of flows. For example, when the key is the source address, the cardinality of the hosts in a range of source addresses can be monitored; when the key is a product number, the number of distinct customers accessing a group of products can be counted, enabling better product recommendation; when the key is the destination address, it is possible to monitor whether any host in the current network range is under a DDoS attack.
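A small illustration of why cardinality is non-additive: summing per-host distinct counts over a range overstates the range cardinality whenever hosts share elements. The host addresses and peer labels below are made-up values used only for this sketch.

```python
# Hypothetical flows: source host -> set of distinct peers it contacted.
flows = {
    "10.0.0.1": {"a", "b", "c"},
    "10.0.0.2": {"b", "c", "d"},
    "10.0.0.3": {"c", "d", "e"},
}

# Adding up single-host cardinalities counts shared peers repeatedly.
summed = sum(len(peers) for peers in flows.values())

# The range cardinality is the number of distinct peers over the whole range.
range_cardinality = len(set().union(*flows.values()))

print(summed, range_cardinality)  # 9 5
```

The gap between the two numbers is the accumulated error that duplicate removal is meant to eliminate.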
Current cardinality measurement methods fall into three categories according to how they count: bitmap-based methods, methods based on probabilistic counting algorithms, and bit-sharing methods.
Bitmap-based cardinality measurement: estimation relies on the properties of the bitmap structure. Because bitmap memory usage grows linearly with the data volume, existing bitmap-based methods use too much memory to measure range cardinality in practical scenarios;
Cardinality measurement based on probabilistic counting algorithms: a probabilistic estimation algorithm stores, as its information, the probability with which distinct elements occur. Compared with a bitmap structure, these methods record cardinality information accurately in a smaller memory space and are therefore widely used in practice. Although they achieve memory-efficient, high-accuracy cardinality estimation, they mainly measure the cardinality of a single host and cannot remove duplicate information during range cardinality estimation, which leads to a large accumulated error;
Bit-sharing cardinality measurement: these methods share the counter memory used by small-cardinality hosts with large-cardinality hosts, further reducing memory overhead. Research on bit-sharing methods has focused mainly on memory optimization and has produced a series of well-performing cardinality measurement algorithms, but bit-sharing methods also estimate range cardinality with low accuracy because they cannot remove duplicate cardinality information.
In summary, although the structures above can estimate range cardinality through multi-point query accumulation or multi-stage cardinality measurement, they suffer from excessive memory use or low estimation accuracy and cannot be applied in real network security measurement scenarios. A range cardinality estimation algorithm with high memory efficiency and high accuracy is therefore needed to monitor high-speed network flows and achieve efficient network security monitoring.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide a memory-efficient range cardinality measurement method for network security monitoring that addresses the error caused by accumulating single-host cardinalities and the memory redundancy caused by unbalanced data distribution in network flows, and that can identify and trace abnormal ranges in a network and support network security monitoring at different granularities.
To achieve this purpose, the invention adopts the following technical solution:
A memory-efficient range cardinality measurement method for network security monitoring improves the accuracy of range cardinality estimation by eliminating duplicate cardinality information through key aggregation, and uses an adaptive counter to adjust memory use dynamically according to the distribution of network flows, achieving optimal memory allocation and enabling efficient, accurate security monitoring of a network in resource-constrained scenarios.
A memory-efficient range cardinality measurement method for network security monitoring comprises the following operation functions:
(1) The function $f(k)$ aggregates keys within a predefined range into the same bucket. A locality-sensitive hash function maps original high-dimensional data into separate buckets according to data similarity; by analogy, the memory-efficient range cardinality measurement method treats keys within a predefined range as similar data and aggregates them with a scalar version of a locality-sensitive hash function. Specifically, given a preset measurement range $r$ and a data stream domain $D$, the hash function is:
$$f(k) = \left\lfloor \frac{a \cdot k + b}{l} \right\rfloor$$
where $l = D/r$ is the number of buckets of the locality-sensitive hash function and $b \sim U(0, l)$. The formula shows that $l/a$ consecutive keys are mapped into one bucket, so $a = l/r$;
(2) The functions $h_i(f(k))$ ($1 \le i \le d$) are $d$ hash functions, $h_i(f(k)) = (f(k) + c_i) \bmod w$, where $c_i$ is a random value from the set $\{0, 1, \ldots, w-1\}$; they give the index position of the data in each row of the range cardinality measurement structure;
(3) The function $g(v)$ encodes an element into a binary sequence of length […] in which each bit follows a geometric distribution with parameter 1/2. The Murmur3 hash function is used to achieve a uniform, independent bit distribution. In addition, the function $\rho_x(g(v))$ takes the rightmost $x$ bits and returns their decimal value.
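The key aggregation and row indexing functions can be sketched as follows. This is a minimal illustration, assuming that $f(k)$ has the scalar locality-sensitive form $\lfloor (a \cdot k + b)/l \rfloor$ with $l = D/r$ and $a = l/r$; the names `make_f` and `make_h`, the seeds, and the 0-based row index are hypothetical choices made for this example.

```python
import random

def make_f(domain_size, r, b=None, seed=0):
    """Build f(k): about r consecutive keys share one bucket."""
    l = domain_size // r              # number of buckets, l = D / r
    a = l / r                         # chosen so that l / a = r keys share a bucket
    if b is None:
        b = random.Random(seed).uniform(0, l)   # random offset b ~ U(0, l)

    def f(k):
        return int((a * k + b) // l)
    return f

def make_h(d, w, seed=0):
    """Build h(i, bucket) = (bucket + c_i) mod w with one random c_i per row."""
    rng = random.Random(seed)
    c = [rng.randrange(w) for _ in range(d)]

    def h(i, bucket):                 # rows are 0-based in this sketch
        return (bucket + c[i]) % w
    return h

f = make_f(domain_size=1024, r=16, b=10.5)   # fixed b so the output is reproducible
h = make_h(d=3, w=8)
print([f(k) for k in (13, 14, 29, 30)])      # [0, 1, 1, 2]
print([h(i, f(20)) for i in range(3)])       # one column index per row
```

With the offset fixed at 10.5, keys 14 to 29 (16 consecutive keys) fall into bucket 1, which matches the statement that $l/a = r$ consecutive keys share a bucket.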
A memory-efficient range cardinality measurement method for network security monitoring comprises the following functional modules:
Cardinality counter $SC_{ij}$: the cardinality counter uses a bitmap structure to record range cardinality information and provide an approximate range cardinality estimate. It consists of $m$ bits, each initially 0, denoted $B_{ij}[1], \ldots, B_{ij}[m]$. The size of $SC_{ij}$ expands adaptively as the network flow changes, raising the upper limit of the cardinality estimate;
Congestion degree $CD_{ij}$: the proportion of bits in the bitmap that are set to 1. A large value means that $SC_{ij}$ holds many elements, that is, the current cell stores information about a large cardinality;
Expansion count $ET_{ij}$: records the number of times $SC_{ij}$ has been expanded; each expansion doubles the space of $SC_{ij}$;
Abnormal range address $ID_{ij}$: records the address of the abnormal range in the cell; this address is replaced probabilistically according to the value of $UT_{ij}$;
Update count $UT_{ij}$: records the number of updates to $SC_{ij}$ that the recorded range has triggered, that is, the number of times a bit is set from 0 to 1. Because the hash function $g$ maps each measured element uniformly, an abnormal range with many distinct elements causes $SC_{ij}$ to be updated many times.
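One way to picture the five per-cell modules is as a small record type. The sketch below is illustrative only: the class name `Cell`, the field names, and the default `c = 4` are assumptions, and the congestion degree is recomputed from the bitmap on demand rather than maintained incrementally.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Cell:
    c: int = 4                                     # initial bitmap length is 2**c (assumed default)
    bits: List[int] = field(default_factory=list)  # SC_ij: the bitmap
    et: int = 0                                    # ET_ij: number of expansions so far
    range_id: Optional[int] = None                 # ID_ij: candidate abnormal range
    ut: int = 0                                    # UT_ij: 0 -> 1 updates credited to range_id

    def __post_init__(self):
        if not self.bits:
            self.bits = [0] * (1 << self.c)        # all bits start at 0

    @property
    def cd(self) -> float:                         # CD_ij: fraction of bits set to 1
        return sum(self.bits) / len(self.bits)

def make_structure(d: int, w: int, c: int = 4):
    """A d x w grid of independent cells, indexed as grid[i][j]."""
    return [[Cell(c=c) for _ in range(w)] for _ in range(d)]

grid = make_structure(d=2, w=3)
grid[0][0].bits[1] = 1
print(grid[0][0].cd, grid[0][1].cd)   # 0.0625 0.0
```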
The memory-efficient range cardinality measurement method for network security monitoring comprises the following steps:
1) Network flow collection: network flow information is collected at multiple network nodes using the range cardinality measurement algorithm. As network flows change dynamically, the algorithm adjusts memory usage dynamically, so that memory is used optimally at multiple nodes in different environments;
2) Network flow fusion: after collection, the cardinality information of each node is fused by exploiting the mergeability of the range cardinality measurement algorithm, yielding the cardinality information of the whole network and ensuring comprehensive network security monitoring;
3) Network flow analysis: the estimation operation of the range cardinality measurement algorithm is used to query multi-range cardinality information; abnormal ranges in the network are detected by comparing the cardinality information with security rules, and security monitoring is carried out. The anomaly is traced to its source using the recorded actual address of the abnormal range, and corresponding security defense measures are taken.
The memory-efficient range cardinality measurement structure comprises $d \times w$ cells; the cell in row $i$ and column $j$ is denoted $L(i, j)$, with $1 \le i \le d$ and $1 \le j \le w$.
The specific process of network flow collection in step 1) is as follows:
In the initialization stage, all cells in the range cardinality measurement structure are 0. When a network flow $\langle k, v \rangle$ arrives, the function $f(k)$ first performs key aggregation on $k$ to obtain the range of the network flow; next, the functions $h_i(f(k))$ are computed to obtain the cell index of the network flow in each row $i$ ($1 \le i \le d$); finally, the function $g(v)$ gives the binary representation of $v$. Given a congestion threshold $\epsilon$ and an initial bitmap length $m = 2^c$, each cell is updated in one of two ways:
Case 1: when $B_{ij}[\rho_{c+ET_{ij}}(g(v))] = 1$, no operation is required;
Case 2: when $B_{ij}[\rho_{c+ET_{ij}}(g(v))] = 0$, first set $B_{ij}[\rho_{c+ET_{ij}}(g(v))] = 1$ and update the congestion counter $CD_{ij}$ [formula image BDA0003680051300000053 in the original publication].
The following operations are then performed:
Check whether $CD_{ij} = \epsilon$ holds. If it does, the space of the counter $SC_{ij}$ has been filled by distinct elements, and the counter must be expanded to obtain a larger estimation upper limit. The bitmap is expanded as follows. First, the bitmap length is expanded from $2^{c+ET_{ij}}$ to $2^{c+ET_{ij}+1}$, and $ET_{ij}$ is incremented by 1. Then a data transfer operation is performed. Because the specific hash values of the elements behind each bit are not available, exact cardinality information cannot be preserved in the transfer, so the range cardinality measurement structure uses a hash function $p(\cdot)$ to map the row number randomly to 0 or 1. If $p(i) = 1$, the information of each bit in the original bitmap is kept at its pre-expansion position [formula image BDA0003680051300000061 in the original publication]; otherwise, the information of each bit in the original bitmap is moved to the newly added part of the sequence [formula image BDA0003680051300000062 in the original publication].
Check whether $ID_{ij} = f(k)$ holds. If it does, execute $UT_{ij} \mathrel{+}= 1$; otherwise, decrease $UT_{ij}$ with probability [formula image BDA0003680051300000063 in the original publication], where $b$ is a preset constant. If $UT_{ij}$ becomes 0, replace $ID_{ij}$ with $f(k)$ and set $UT_{ij}$ to 1.
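The bitmap expansion step described above can be sketched as follows. This is a minimal illustration under stated assumptions: a bit at position x either stays at x (when p(i) = 1) or moves to x + m in the doubled bitmap (otherwise), since the exact position formulas are not legible in this text. The function names are hypothetical, and the threshold test uses >= rather than strict equality so that the sketch is robust to floating-point ratios.

```python
def expand_bitmap(bits, keep_in_place):
    """Double a bitmap; old bits stay in the lower half or move to the upper half."""
    m = len(bits)
    if keep_in_place:            # p(i) = 1: bits keep their pre-expansion positions
        return bits + [0] * m
    return [0] * m + bits        # p(i) = 0: bits move into the newly added half

def maybe_expand(bits, et, epsilon, keep_in_place):
    """Expand once when the fraction of 1-bits has reached the threshold epsilon."""
    if sum(bits) / len(bits) >= epsilon:
        return expand_bitmap(bits, keep_in_place), et + 1
    return bits, et

print(maybe_expand([1, 0, 1, 1], 0, 0.75, True))    # ([1, 0, 1, 1, 0, 0, 0, 0], 1)
print(maybe_expand([1, 0, 1, 1], 0, 0.75, False))   # ([0, 0, 0, 0, 1, 0, 1, 1], 1)
```

Either placement keeps the number of 1-bits unchanged while halving the congestion degree, which is what raises the upper limit of the estimate.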
The specific process of network flow fusion in step 2) is as follows:
Multiple range cardinality measurement structures are fused into one by a merge operation. Given a set of range cardinality measurement structures with the same parameters, denoted $\{LS_1, \ldots, LS_Y\}$, cell $L(i, j)$ is merged as follows:
Cardinality counter fusion: a bitwise OR is applied to the bitmaps $B^1_{ij}, \ldots, B^Y_{ij}$ of the range cardinality measurement structures, i.e. $B_{ij} = B^1_{ij} \lor \cdots \lor B^Y_{ij}$, where the symbol $\lor$ denotes bitwise OR. Note that the bitmap lengths of the different cells must be normalized before the bitmaps are merged;
Congestion degree counter fusion: the value of $CD_{ij}$ is calculated from the final merged bitmap;
Expansion count fusion: the value of $ET_{ij}$ is the maximum of $ET^1_{ij}, \ldots, ET^Y_{ij}$, i.e. $ET_{ij} = \max_{1 \le y \le Y} ET^y_{ij}$;
Update count fusion: $UT_{ij}$ is [formula image BDA0003680051300000069 in the original publication];
ID record update: $ID_{ij}$ is the range address that has the largest update count among the merged structures.
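The merge of one cell across several structures can be sketched as below. Several points are assumptions made for illustration: cells are plain dictionaries, bitmap lengths differ only by powers of two, shorter bitmaps are normalized by the same doubling rule used for expansion, and the merged update count is taken from the structure with the largest count (the source gives that formula only as an image).

```python
def normalize(bits, target_len, keep_in_place):
    """Double a bitmap until it reaches target_len (lengths assumed to differ by powers of 2)."""
    while len(bits) < target_len:
        m = len(bits)
        bits = bits + [0] * m if keep_in_place else [0] * m + bits
    return bits

def merge_cells(cells, keep_in_place=True):
    """cells: dicts with keys 'bits', 'et', 'id', 'ut' for the same L(i, j)."""
    target = max(len(c["bits"]) for c in cells)
    merged = [0] * target
    for c in cells:
        for pos, bit in enumerate(normalize(c["bits"], target, keep_in_place)):
            merged[pos] |= bit                      # bitwise OR removes duplicates
    top = max(cells, key=lambda c: c["ut"])         # structure with the largest UT
    return {
        "bits": merged,
        "et": max(c["et"] for c in cells),          # ET is the maximum expansion count
        "cd": sum(merged) / target,                 # CD recomputed from the merged bitmap
        "id": top["id"],                            # ID of the range with the largest UT
        "ut": top["ut"],                            # ASSUMPTION: merged UT taken as that maximum
    }

node_a = {"bits": [1, 0, 0, 0], "et": 0, "id": 3, "ut": 2}
node_b = {"bits": [0, 1, 0, 0, 0, 0, 1, 0], "et": 1, "id": 8, "ut": 5}
print(merge_cells([node_a, node_b]))
```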
The specific process of network flow estimation in step 3) is as follows:
The query operation returns an estimate of the cardinality of a specified range. Assume the data domain is $D$ and the range size is $r$, so that the data stream is divided into the intervals $[0, r), [r, 2r), \ldots, [D - r, D)$. The range cardinality measurement structure supports the following query operations:
Single-range cardinality query: for a single data interval $Q$, the range index or any key in the range is used to query the single-range cardinality, as follows. First, the cell indexes $L(i, h_i(Q))$ ($1 \le i \le d$) of $Q$ in the range cardinality measurement structure are obtained. Next, $CD_{ij}$ and $ET_{ij}$ of each cell are checked, and the single-range cardinality $RS_i(Q)$ is calculated by linear estimation:
[formula image BDA0003680051300000071 in the original publication]
where [formula image BDA0003680051300000072 in the original publication] relates to the bitmap counter $SC_{ij}$, and $\lambda = \ln\left((2-\epsilon)^2 / (4(1-\epsilon))\right)$ is a compensation term for the cardinality estimation error caused by bitmap expansion. Finally, the range cardinality estimate of the single data interval is the minimum of the set $\{RS_1(Q), \ldots, RS_d(Q)\}$;
Multi-range cardinality query: when the query range spans several sub-ranges, the large range is divided into several small ranges, and the cardinality information of the small ranges is merged so that duplicates are removed. For two sub-ranges $Q_1$ and $Q_2$, a bitwise OR is used to merge their bitmaps in each row; if the two bitmaps differ in size, the shorter one is expanded to match the longer one. After the bitmaps are merged, the range cardinality is estimated in the same way as for a single range.
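The query flow can be sketched with the classical linear-counting estimator standing in for the estimation formula, which is available here only as an image. The sketch therefore omits the compensation term λ and is not the exact formula of the method; the function names are hypothetical and sub-range bitmaps are assumed to have equal lengths already.

```python
import math

def linear_count(bits):
    """Classical linear-counting estimate: -m * ln(fraction of zero bits)."""
    m = len(bits)
    zeros = m - sum(bits)
    if zeros == 0:
        return float("inf")      # a completely full bitmap cannot be estimated
    return -m * math.log(zeros / m)

def query(rows):
    """rows: one bitmap per row i for the queried range; return the minimum estimate."""
    return min(linear_count(bits) for bits in rows)

def query_multi(rows_q1, rows_q2):
    """OR the per-row bitmaps of two sub-ranges, then estimate as for a single range."""
    merged = [[x | y for x, y in zip(b1, b2)] for b1, b2 in zip(rows_q1, rows_q2)]
    return query(merged)

single = query([[1, 0, 0, 0], [1, 1, 0, 0]])
same_element_twice = query_multi([[1, 0, 0, 0]], [[1, 0, 0, 0]])
print(round(single, 4), round(same_element_twice, 4))   # 1.1507 1.1507
```

Because the sub-range bitmaps are OR-ed before estimation, an element seen in both sub-ranges sets the same bit once and is not counted twice.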
The specific process of detecting abnormal ranges in step 3) is as follows:
The memory-efficient range cardinality measurement method identifies abnormal ranges from the values in the $d \times w$ cells. If the value in a cell exceeds a certain threshold, the range address recorded in that cell is used to calculate the range cardinality; if the calculated range cardinality also exceeds a preset threshold, the address is identified as an abnormal range.
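The two-stage check can be sketched as follows, assuming that the per-cell value being compared is the update count $UT_{ij}$ (the text does not name it explicitly). The function name, the thresholds, and the `estimate` callable are hypothetical.

```python
def detect_abnormal_ranges(cells, ut_threshold, cardinality_threshold, estimate):
    """cells: iterable of (range_id, ut) pairs read from the d*w cells.
    estimate: callable mapping a range address to its estimated cardinality."""
    # Stage 1: cheap filter on the per-cell counters.
    candidates = {rid for rid, ut in cells if rid is not None and ut > ut_threshold}
    # Stage 2: confirm each candidate with a range cardinality query.
    return sorted(rid for rid in candidates if estimate(rid) > cardinality_threshold)

cells = [(1, 10), (2, 3), (1, 12), (4, 20), (None, 0)]
estimates = {1: 500, 4: 50}           # made-up cardinalities for the candidates
print(detect_abnormal_ranges(cells, 5, 100, estimates.get))   # [1]
```

Only candidates that pass the first filter are queried, so the cardinality of the whole domain never has to be computed.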
The invention has the following beneficial effects:
The range cardinality measurement algorithm adjusts memory usage dynamically according to the distribution of network flow cardinalities. Host cardinality information is stored with a dynamic memory allocation mechanism: small counters monitor low-cardinality hosts to avoid unnecessary memory use, and large counters monitor high-cardinality hosts to ensure measurement accuracy. The proposed algorithm thus achieves memory-efficient host cardinality measurement in network security monitoring.
With dynamically expandable counters, the range cardinality measurement algorithm can monitor high-cardinality and low-cardinality hosts at the same time, and the expandability maintains measurement accuracy for cardinalities of any size.
The range cardinality measurement method of the invention supports cardinality measurement over any range. Using the estimation operation of the algorithm, the cardinality of a single range or of multiple ranges (a single IP network or several IP networks) can be measured accurately, enabling multi-granularity network security monitoring.
The range cardinality measurement method of the invention can be deployed in a distributed manner at each node of the network to collect network flow information. By exploiting the mergeability of the algorithm, the information from each node can be fused accurately to obtain the global network state.
The invention provides a fast abnormal-range identification method based on the range cardinality measurement method. The identification algorithm does not need to calculate cardinality information for the entire range; using the dedicated functional modules of the algorithm, it can quickly locate and trace abnormal ranges, enabling efficient network security monitoring.
Compared with the prior art, the invention has the following advantages:
Economy: the range cardinality measurement algorithm can measure multi-range cardinality accurately when memory resources are limited, has high resource utilization and high throughput, supports distributed measurement, is easy to deploy, and has high commercial value.
Real-time performance: the algorithm is based on hash operations, which keeps data collection and querying computationally efficient; its collection and estimation overhead is low, so it can collect network flow information in real time in high-speed network environments.
Generality: the algorithm can be extended to a variety of data flow measurement tasks, for example range frequency measurement, range network flow size measurement, and range product popularity tracking.
High accuracy: the algorithm uses key aggregation and dynamically expandable counters to measure network flows of all forms accurately.
Extensibility: the invention is not limited to range cardinality measurement; by modifying how the memory-efficient measurement method collects and stores the data stream, it can be extended to range feature estimation.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention.
FIG. 2 is a diagram of the memory-efficient range cardinality measurement structure of the method of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
A memory-efficient range cardinality measurement method for network security monitoring comprises the following operation functions:
(1) The function $f(k)$ aggregates keys within a predefined range into the same bucket. A locality-sensitive hash function maps original high-dimensional data into separate buckets according to data similarity; by analogy, the memory-efficient range cardinality measurement algorithm treats keys within a predefined range as similar data and aggregates them with a scalar version of a locality-sensitive hash function. Specifically, given a preset measurement range $r$ and a data stream domain $D$, the hash function is:
$$f(k) = \left\lfloor \frac{a \cdot k + b}{l} \right\rfloor$$
where $l = D/r$ is the number of buckets of the locality-sensitive hash function and $b \sim U(0, l)$. The formula shows that $l/a$ consecutive keys are mapped into one bucket, so $a = l/r$;
(2) The functions $h_i(f(k))$ ($1 \le i \le d$) are $d$ mutually independent hash functions, $h_i(f(k)) = (f(k) + c_i) \bmod w$, where $c_i$ is a random value from the set $\{0, 1, \ldots, w-1\}$; they give the index position of the data in each row of the range cardinality measurement structure;
(3) The function $g(v)$ encodes an element into a binary sequence of length […] in which each bit follows a geometric distribution with parameter 1/2 (the bits in the sequence are independent of one another and each occurs with probability […]). The Murmur3 hash function is used to achieve a uniform, independent bit distribution. In addition, the function $\rho_x(g(v))$ takes the rightmost $x$ bits and returns their decimal value, e.g.
$\rho_3(001101) = 101_2 = 5$.
The functional modules that implement these operation functions comprise:
Cardinality counter $SC_{ij}$: the cardinality counter uses a bitmap structure to record range cardinality information and provide an approximate range cardinality estimate. It consists of $m$ bits, each initially 0, denoted $B_{ij}[1], \ldots, B_{ij}[m]$. Unlike a conventional bitmap structure, the size of $SC_{ij}$ expands adaptively as the network flow changes, raising the upper limit of the cardinality estimate;
Congestion degree $CD_{ij}$: the proportion of bits in the bitmap that are set to 1. A large value means that $SC_{ij}$ holds many elements, that is, the current cell stores information about a large cardinality;
Expansion count $ET_{ij}$: records the number of times $SC_{ij}$ has been expanded; each expansion doubles the space of $SC_{ij}$;
Abnormal range address $ID_{ij}$: records the address of the abnormal range in the cell; this address is replaced probabilistically according to the value of $UT_{ij}$;
Update count $UT_{ij}$: records the number of updates to $SC_{ij}$ that the recorded range has triggered, that is, the number of times a bit is set from 0 to 1. Because the hash function $g$ maps each measured element uniformly, an abnormal range with many distinct elements causes $SC_{ij}$ to be updated many times.
As shown in FIG. 1, the memory-efficient range cardinality measurement method for network security monitoring comprises the following steps:
1) Network flow collection: network flow information is collected at multiple network nodes using the range cardinality measurement algorithm. As network flows change dynamically, the algorithm adjusts memory usage dynamically, so that memory is used optimally at multiple nodes in different environments;
2) Network flow fusion: after collection, the cardinality information of each node is fused by exploiting the mergeability of the range cardinality measurement algorithm, yielding the cardinality information of the whole network and ensuring comprehensive network security monitoring;
3) Network flow analysis: the estimation operation of the range cardinality measurement algorithm is used to query multi-range cardinality information, and abnormal ranges in the network are detected by comparing the cardinality information with security rules so that security monitoring can be carried out. Using the recorded actual address of the abnormal range, the anomaly can be traced to its source and corresponding security defense measures can be taken.
As shown in FIG. 2, the memory-efficient range cardinality measurement structure comprises $d \times w$ cells; the cell in row $i$ and column $j$ is denoted $L(i, j)$, with $1 \le i \le d$ and $1 \le j \le w$.
The specific process of network flow collection in step 1) is as follows:
In the initialization stage, all cells in the range cardinality measurement structure are 0. When a network flow $\langle k, v \rangle$ arrives, the function $f(k)$ first performs key aggregation on $k$ to obtain the range of the network flow; next, the functions $h_i(f(k))$ are computed to obtain the cell index of the network flow in each row $i$ ($1 \le i \le d$); finally, the function $g(v)$ gives the binary representation of $v$. Given a congestion threshold $\epsilon$ and an initial bitmap length $m = 2^c$, each cell is updated in one of two ways:
Case 1: when $B_{ij}[\rho_{c+ET_{ij}}(g(v))] = 1$, no operation is required;
Case 2: when $B_{ij}[\rho_{c+ET_{ij}}(g(v))] = 0$, first set $B_{ij}[\rho_{c+ET_{ij}}(g(v))] = 1$ and update the congestion counter $CD_{ij}$ [formula image BDA0003680051300000121 in the original publication].
The following operations are then performed:
Check whether $CD_{ij} = \epsilon$ holds. If it does, the space of the counter $SC_{ij}$ has been filled by distinct elements, and the counter expansion operation must be executed to obtain a larger estimation upper limit. The bitmap is expanded as follows. First, the bitmap length is expanded from $2^{c+ET_{ij}}$ to $2^{c+ET_{ij}+1}$, and $ET_{ij}$ is incremented by 1. Then a data transfer operation is performed. Because the specific hash values of the elements behind each bit are not available, exact cardinality information cannot be preserved in the transfer, so the range cardinality measurement structure uses a hash function $p(\cdot)$ to map the row number randomly to 0 or 1. If $p(i) = 1$, the information of each bit in the original bitmap is kept at its pre-expansion position [formula image BDA0003680051300000124 in the original publication]; otherwise, the information of each bit in the original bitmap is moved to the newly added part of the sequence [formula image BDA0003680051300000125 in the original publication].
Check whether $ID_{ij} = f(k)$ holds. If it does, execute $UT_{ij} \mathrel{+}= 1$; otherwise, decrease $UT_{ij}$ with probability [formula image BDA0003680051300000126 in the original publication], where $b$ is a preset constant. If $UT_{ij}$ becomes 0, replace $ID_{ij}$ with $f(k)$ and set $UT_{ij}$ to 1. The rationale for this update scheme is that the cardinality of an abnormal range is much greater than that of a normal range, so an abnormal range is more likely to trigger updates of $SC_{ij}$ and therefore has a higher $UT_{ij}$.
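The replacement rule for the recorded range can be sketched as follows. The decay probability is given in the source only as an image, so this example assumes an exponential-decay form $b^{-UT_{ij}}$, which is one common choice for counters of this kind and not necessarily the formula used by the method. The function name, the default `b = 1.08`, and the `rng` parameter are hypothetical.

```python
import random

def update_tracker(range_id, ut, key_range, b=1.08, rng=random):
    """Return the new (ID, UT) after `key_range` flipped a bit from 0 to 1."""
    if range_id == key_range:
        return range_id, ut + 1          # same range: credit one more update
    # ASSUMPTION: decay with probability b**(-UT); the source formula is not legible.
    if rng.random() < b ** (-ut):
        ut -= 1
    if ut <= 0:
        return key_range, 1              # the recorded range is replaced by the new one
    return range_id, ut

print(update_tracker(7, 3, 7))       # (7, 4)
print(update_tracker(None, 0, 5))    # (5, 1): an empty cell adopts the first range
```

Under this assumption a range with a large count is rarely decremented, so a high-cardinality range tends to stay recorded, in line with the rationale given above.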
The specific process of network flow fusion in step 2) is as follows:
Multiple range cardinality measurement structures are fused into one by a merge operation. Given a set of range cardinality measurement structures with the same parameters, denoted $\{LS_1, \ldots, LS_Y\}$, cell $L(i, j)$ is merged as follows:
Cardinality counter fusion: a bitwise OR is applied to the bitmaps $B^1_{ij}, \ldots, B^Y_{ij}$ of the range cardinality measurement structures, i.e. $B_{ij} = B^1_{ij} \lor \cdots \lor B^Y_{ij}$, where the symbol $\lor$ denotes bitwise OR. Note that the bitmap lengths of the different cells must be normalized before the bitmaps are merged;
Congestion degree counter fusion: the value of $CD_{ij}$ is calculated from the final merged bitmap;
Expansion count fusion: the value of $ET_{ij}$ is the maximum of $ET^1_{ij}, \ldots, ET^Y_{ij}$, i.e. $ET_{ij} = \max_{1 \le y \le Y} ET^y_{ij}$;
Update count fusion: $UT_{ij}$ is [formula image BDA0003680051300000133 in the original publication];
ID record update: $ID_{ij}$ is the range address that has the largest update count among the merged structures.
Through the merge operation, the range cardinality measurement structure can collect distributed range cardinality information for centralized analysis and processing, and the merged structure gives the range cardinality information of the global network.
The specific process of network flow estimation in step 3) is as follows:
The query operation returns an estimate of the cardinality of a specified range. Assume the data domain is D and the range size is r, so that the data stream is divided into the intervals [0, r), [r, 2r), and so on. The range cardinality measurement structure supports the following query operations:
Single-range cardinality query: for a single data interval Q (e.g., Q = [r, 2r)), the range index or any key in the range is used to query the single-range cardinality, as follows. First, the cell indexes L(i, h_i(Q)) (1 ≤ i ≤ d) of range Q in the measurement structure are obtained. Next, CD_ij and ET_ij of each cell are read, and the single-range cardinality RS_i(Q) is computed by the linear estimation method (the estimation formula and the definition of its term relating to the bitmap counter SC_ij appear only as images in the source), where λ = ln((2 − ε)^2 / (4(1 − ε))) is a compensation term for the cardinality estimation error caused by bitmap expansion. Finally, the range cardinality estimate for the single data interval is the minimum of the set {RS_1(Q), ..., RS_d(Q)};
Multi-range cardinality query: when the query range comprises several sub-ranges, the large range is divided into small ranges and the cardinality information of the small ranges is combined so that duplicates are counted only once. For example, Q = [r, 3r) can be split into Q = [Q_1, Q_2], where Q_1 = [r, 2r) and Q_2 = [2r, 3r). The bitmaps of Q_1 and Q_2 in each row are combined with a bitwise OR; if the two bitmaps differ in size, the shorter bitmap is expanded to the length of the longer one. After the bitmaps are combined, the range cardinality is estimated in the same way as for a single range.
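The role of the compensation term λ can be checked numerically. The sketch below assumes that "linear estimation" refers to the standard linear-counting estimator n ≈ −m·ln(1 − CD) for a bitmap of length m; that reading is an assumption, since the estimation formula itself is not recoverable from the text. Under it, expanding a bitmap of length m at congestion ε to length 2m (so that the fraction of set bits becomes ε/2) lowers the estimate by exactly m·λ. The function names are hypothetical.

```python
import math


def linear_estimate(m: int, cd: float) -> float:
    """Standard linear-counting estimate for a bitmap of length m with fill ratio cd."""
    return -m * math.log(1.0 - cd)


def lam(eps: float) -> float:
    """Compensation term lambda = ln((2 - eps)^2 / (4 * (1 - eps)))."""
    return math.log((2.0 - eps) ** 2 / (4.0 * (1.0 - eps)))


def expansion_loss(m: int, eps: float) -> float:
    """Estimate lost when a bitmap of length m at congestion eps is doubled."""
    return linear_estimate(m, eps) - linear_estimate(2 * m, eps / 2.0)


# The loss caused by one expansion equals m * lambda, for any m and eps in (0, 1).
for m, eps in ((64, 0.5), (8, 0.3), (1024, 0.9)):
    assert math.isclose(expansion_loss(m, eps), m * lam(eps), rel_tol=1e-9)
```

This is why a per-expansion correction proportional to λ can restore the estimate that the doubled bitmap would otherwise understate.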
The specific process of detecting the abnormal range in the step 3) is as follows:
To monitor abnormal ranges, the memory-efficient range cardinality measurement algorithm needs to identify cells with high cardinality. Unlike the traditional detection method, which traverses all counters and computes every cardinality, the algorithm screens for abnormal ranges using the values in the d × w cells: if a cell's value exceeds a certain threshold, the range address stored in that cell is used to compute the range cardinality, and if the computed cardinality still exceeds a preset threshold, the address is identified as an abnormal range. The rationale for this detection scheme is that a normal range has a small cardinality, so the cell it occupies is expanded only a few times, whereas an abnormal range has a large cardinality and a higher expansion count.
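The two-stage screening can be sketched as follows. The sketch assumes that the value screened in each cell is the expansion count ET_ij, which the rationale above suggests but the text does not state explicitly; `detect_abnormal_ranges`, `CellView` and the `estimate` callback (standing in for the single-range cardinality query) are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple

CellView = Tuple[Optional[int], int]  # (ID_ij, ET_ij) of one cell


def detect_abnormal_ranges(
    grid: Sequence[Sequence[CellView]],
    et_threshold: int,
    cardinality_threshold: float,
    estimate: Callable[[int], float],
) -> List[int]:
    """Return range addresses that pass both the cell screen and the cardinality check."""
    # Stage 1: cheap screen over the d x w cells (ASSUMED to use the expansion count).
    suspects = {
        rid
        for row in grid
        for rid, et in row
        if rid is not None and et > et_threshold
    }
    # Stage 2: query the cardinality only for the recorded suspect addresses.
    return sorted(r for r in suspects if estimate(r) > cardinality_threshold)
```

Only the addresses that survive the first stage incur a cardinality query, which is the saving over traversing and estimating every counter.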

Claims (5)

1. A memory-efficient range cardinality (base number) measuring method for network security monitoring, characterized in that: duplicate cardinality information is eliminated through key aggregation to improve the estimation accuracy of the range cardinality; an adaptive counter dynamically adjusts memory usage according to the distribution of network flows, achieving optimal memory allocation; and efficient, high-precision security monitoring of the network is thereby achieved in resource-limited scenarios;
the memory-efficient range cardinality measuring method for network security monitoring comprises the following steps:
1) Network flow collection: network flow information of a plurality of network nodes is collected using the range cardinality measurement algorithm; as network flows change dynamically, the algorithm dynamically adjusts its memory usage so that memory use is optimal for nodes in different environments;
2) Network flow fusion: after collection is finished, the cardinality information of the nodes is fused using the mergeability of the range cardinality measurement algorithm, yielding the cardinality information of the whole network and ensuring comprehensive network security monitoring;
3) Network flow analysis: multi-range cardinality information is queried using the estimation operation of the range cardinality measurement algorithm; abnormal ranges in the network are detected by comparing the cardinality information with security rules, and security monitoring is carried out; the source of an anomaly is traced from the recorded actual address of the abnormal range, and corresponding security defense measures are taken;
the specific process of network flow acquisition in the step 1) is as follows:
in the initialization stage, all cells in the range cardinality measurement structure are set to 0; when a network flow <k, v> arrives, the function f(k) first performs key aggregation on k to obtain the range to which the network flow belongs; second, the function h_i(f(k)) is computed to obtain the cell index of the network flow in each row i (1 ≤ i ≤ d); finally, the function g(v) yields a binary representation of v; given a congestion threshold ε and a bitmap length m = 2^c, each cell is updated according to one of the following two cases:
Case one: when B_ij[ρ_{c+ET_ij}(g(v))] = 1, no operation is required;
Case two: when B_ij[ρ_{c+ET_ij}(g(v))] = 0, first set B_ij[ρ_{c+ET_ij}(g(v))] = 1 and update the congestion counter CD_ij (the update expression appears only as an image in the source). The following operations are then performed:
Check whether CD_ij = ε holds; if it does, the space of the counter SC_ij has been filled by distinct elements, and a counter expansion must be performed to obtain a larger estimation upper limit. The bitmap expansion proceeds as follows. First, the bitmap length is expanded from 2^{c+ET_ij} to 2^{c+ET_ij+1}, and ET_ij is incremented by 1. Then a data transfer is performed. Because the specific hash values of the elements behind each bit are not available, no exact cardinality information can be carried over; the range cardinality measurement structure therefore uses an independent hash function p(·) to map the row number randomly to 0 or 1. If p(i) = 1, each bit of the original bitmap is kept at its pre-expansion position; otherwise, each bit of the original bitmap is moved to the newly added part of the expanded bitmap.
Check whether ID_ij = f(k) holds; if it does, execute UT_ij += 1; otherwise, decrease UT_ij with a probability determined by a preset constant b (the probability expression appears only as an image in the source); if UT_ij becomes 0, replace ID_ij with f(k) and set UT_ij to 1;
the specific process of network flow fusion in the step 2) is as follows:
multiple range cardinality measurement structures are fused into one by a merge operation; given a set of range cardinality measurement structures with the same parameters, denoted {LS_1, ..., LS_Y}, the merging steps for cell L(i, j) are as follows:
Cardinality counter fusion: the bitmaps B_ij^1, ..., B_ij^Y of the Y range cardinality measurement structures are combined with a bitwise OR, i.e. B_ij = B_ij^1 ∨ B_ij^2 ∨ ... ∨ B_ij^Y, where the symbol ∨ denotes bitwise OR; note that the bitmap lengths of the different cells must be normalized before the bitmaps are merged;
Congestion degree counter fusion: the value of CD_ij is calculated from the final merged bitmap;
Expansion count fusion: ET_ij takes the maximum of the expansion counts of the merged structures, i.e. ET_ij = max(ET_ij^1, ..., ET_ij^Y);
Update count fusion: UT_ij is given by an expression that appears only as an image in the source;
ID record update: ID_ij is set to the address of the range that has the maximum value of a quantity shown only as an image in the source;
The specific process of network flow estimation in step 3) is as follows:
the query operation returns an estimate of the cardinality of a specified range; assume the data domain is D and the range size is r, so that the data stream is divided into the intervals [0, r), [r, 2r), and so on; the range cardinality measurement structure supports the following query operations:
Single-range cardinality query: for a single data interval Q, the range index or any key in the range is used to query the single-range cardinality, as follows: first, the cell indexes L(i, h_i(Q)) (1 ≤ i ≤ d) of range Q in the measurement structure are obtained; next, CD_ij and ET_ij of each cell are read, and the single-range cardinality RS_i(Q) is computed by the linear estimation method (the estimation formula and the definition of its term relating to the bitmap counter SC_ij appear only as images in the source), where λ = ln((2 − ε)^2 / (4(1 − ε))) is a compensation term for the cardinality estimation error caused by bitmap expansion; finally, the range cardinality estimate for the single data interval is the minimum of the set {RS_1(Q), ..., RS_d(Q)};
Multi-range cardinality query: when the query range comprises several sub-ranges, the large range is divided into small ranges and the cardinality information of the small ranges is combined so that duplicates are counted only once; the bitmaps of the sub-ranges Q_1 and Q_2 in each row are combined with a bitwise OR; if the two bitmaps differ in size, the shorter bitmap is expanded to the length of the longer one; after the bitmaps are combined, the range cardinality is estimated in the same way as for a single range.
2. The method of claim 1, wherein the arithmetic function comprises:
(1) The function f(k) aggregates keys within a predefined range into the same bucket; a locality-sensitive hash function maps original high-dimensional data into separate buckets according to data similarity, and by analogy the memory-efficient range cardinality measuring method treats keys within a predefined range as similar data and aggregates them with a scalar version of the locality-sensitive hash function; specifically, given a preset measurement range r and a data stream domain D, the hash function is defined by an expression that appears only as an image in the source, where l = D/r is the number of buckets of the locality-sensitive hash function and b ~ U(0, l); under that expression, l/a consecutive keys are mapped into one bucket, so a = l/r;
(2) The functions h_i(f(k)) (1 ≤ i ≤ d) are d mutually independent hash functions, h_i(f(k)) = (f(k) + c_i) mod w, where c_i is a random value in the set {0, 1, ..., w − 1}; they give the index position of the data in each row of the range cardinality measurement structure;
(3) The function g(v) encodes an element as a binary sequence of a given length in which each bit obeys a geometric distribution with parameter 1/2; the Murmur3 hash function is used to achieve a uniform, independent bit distribution; in addition, the function ρ_x(g(v)) takes the x rightmost bits of g(v) and returns their decimal value.
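The three operation functions can be sketched as below, with several loudly labeled assumptions. The exact expression for f(k) is not recoverable from the text, so the sketch assumes the form floor((a·k + b)/l), which is consistent with the stated l = D/r and a = l/r and simplifies to floor(k/r + b/l). Python's standard library has no Murmur3, so `hashlib.blake2b` is swapped in as a stand-in for g. All function names and parameters are illustrative.

```python
import hashlib
import math
from typing import Sequence


def f(k: int, r: int, domain: int, b: float = 0.0) -> int:
    """Key aggregation. ASSUMED form floor((a*k + b) / l) with l = D/r and a = l/r,
    computed in the equivalent form floor(k/r + b/l)."""
    l = domain / r
    return math.floor(k / r + b / l)


def h(i: int, bucket: int, offsets: Sequence[int], w: int) -> int:
    """Column index in row i: (f(k) + c_i) mod w, with c_i taken from offsets."""
    return (bucket + offsets[i]) % w


def g(v: str) -> int:
    """64-bit hash of an element. blake2b is a stand-in for Murmur3 (not in the stdlib)."""
    digest = hashlib.blake2b(v.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def rho(x: int, gv: int) -> int:
    """Value of the x rightmost bits of g(v)."""
    return gv & ((1 << x) - 1)
```

With b = 0 and r = 100, the keys 100 through 199 share bucket 1, which matches the statement that consecutive keys of one range fall into one bucket.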
3. The method of claim 1, wherein the functional module comprises:
Cardinality counter SC_ij: the cardinality counter uses a bitmap structure to record range cardinality information and provide an approximate estimate of the range cardinality; it consists of m bits with initial value 0, denoted B_ij[1], ..., B_ij[m]; the size of SC_ij is adaptively expanded as network flows change, raising the upper limit of the cardinality estimate;
Congestion degree CD_ij: the ratio of bits set to 1 in the bitmap; a large CD_ij means that SC_ij holds many elements, that is, the current cell stores a large cardinality;
Expansion count ET_ij: records the number of times SC_ij has been expanded; each expansion doubles the space of SC_ij;
Abnormal range address ID_ij: records the abnormal range address in the cell; the address is replaced probabilistically depending on the value of UT_ij;
Update count UT_ij: records the number of updates of SC_ij that the recorded range has triggered, i.e., the number of times a bit is set from 0 to 1; because the hash function g maps each measured element uniformly, an abnormal range with many distinct elements causes many updates of SC_ij.
4. The method of claim 1, wherein: the memory-efficient range cardinality measurement structure comprises d × w cells, the cell in the i-th row and j-th column is denoted L(i, j), and the value ranges are 1 ≤ i ≤ d and 1 ≤ j ≤ w, respectively.
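The d × w layout of cells and the five per-cell fields named in claims 3 and 4 can be pictured with the following sketch. `RangeSketch`, `Cell`, `cell` and `bits_in_use` are hypothetical names chosen for illustration, and the list-of-integers bitmap is a readability choice rather than a memory-efficient encoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cell:
    bits: List[int]            # SC_ij, bitmap with all bits initially 0
    cd: float = 0.0            # CD_ij, ratio of bits set to 1
    et: int = 0                # ET_ij, number of expansions
    rid: Optional[int] = None  # ID_ij, recorded abnormal range address
    ut: int = 0                # UT_ij, update count


@dataclass
class RangeSketch:
    d: int  # number of rows
    w: int  # number of columns
    c: int  # initial bitmap length is m = 2**c
    cells: List[List[Cell]] = field(init=False)

    def __post_init__(self) -> None:
        # Initialization stage: every cell starts with an all-zero bitmap.
        self.cells = [
            [Cell([0] * 2 ** self.c) for _ in range(self.w)] for _ in range(self.d)
        ]

    def cell(self, i: int, j: int) -> Cell:
        """Return L(i, j) using the 1-based indices of the claims."""
        if not (1 <= i <= self.d and 1 <= j <= self.w):
            raise IndexError("expected 1 <= i <= d and 1 <= j <= w")
        return self.cells[i - 1][j - 1]

    def bits_in_use(self) -> int:
        """Total bitmap bits currently allocated across all cells."""
        return sum(len(cell.bits) for row in self.cells for cell in row)
```

Because each cell expands independently, `bits_in_use` grows only where traffic concentrates, which is the memory-adaptivity the claims describe.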
5. The method according to claim 1, wherein the specific process of detecting the abnormal range in step 3) is as follows:
the memory-efficient range cardinality measuring method screens for abnormal ranges according to the values in the d × w cells: if a cell's value exceeds a certain threshold, the range address in that cell is used to calculate the range cardinality; and if the calculated range cardinality still exceeds a preset threshold, the address is identified as an abnormal range.
CN202210631403.0A 2022-06-06 2022-06-06 Memory high-efficiency range base number measuring method for network security monitoring Active CN115085985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210631403.0A CN115085985B (en) 2022-06-06 2022-06-06 Memory high-efficiency range base number measuring method for network security monitoring

Publications (2)

Publication Number Publication Date
CN115085985A CN115085985A (en) 2022-09-20
CN115085985B true CN115085985B (en) 2023-03-31

Family

ID=83249991

Country Status (1)

Country Link
CN (1) CN115085985B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant