CN115473836A - Network flow measurement method and device based on flow graph model - Google Patents

Network flow measurement method and device based on flow graph model

Info

Publication number
CN115473836A
Authority
CN
China
Prior art keywords
network
data packet
flow
queue
network flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210976811.XA
Other languages
Chinese (zh)
Other versions
CN115473836B (en)
Inventor
贾焰
任思远
逄博
王晔
廖清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202210976811.XA
Publication of CN115473836A
Application granted
Publication of CN115473836B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/13 Flow control; Congestion control in a LAN segment, e.g. ring or bus
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/30 Flow control; Congestion control in combination with information about buffer occupancy at either end or at transit nodes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Mining & Analysis (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network flow measurement method and device based on a flow graph model. The method comprises the following steps: inserting each data packet stream received from a network card or a network traffic file into a cache queue, and extracting data packet information from the data packets; constructing, according to a flow graph model and the extracted data packet information, a Cuckoo matrix for updating and storing network flow characteristics, wherein the nodes, the edges and the weight vectors on the edges of the flow graph model correspond respectively to IP addresses, network flows between IPs and statistical feature vectors of the network flows; and querying the Cuckoo matrix through a basic query interface to obtain network flow characteristic data. The invention reduces the time and space overhead of network flow measurement and improves its efficiency.

Description

Network flow measuring method and device based on flow graph model
Technical Field
The invention relates to the technical field of network flow measurement, in particular to a network flow measurement method and device based on a flow graph model.
Background
Network flow measurement is important for network security, network management and traffic engineering, and provides rich information for network health diagnosis, network anomaly detection, troubleshooting, traffic engineering and traffic billing. Traditional network traffic monitoring techniques mainly comprise: 1) methods based on Deep Packet Inspection (DPI); 2) flow summarization methods based on NetFlow and sFlow; 3) flow counter methods based on Sketch.
The prior art has the following disadvantages:
1) Deep packet inspection: fine-grained analysis of the packet payload has a large time and space overhead, and the spread of traffic encryption reduces the effectiveness of analyzing packet payload content;
2) Flow summarization: traffic obfuscation techniques allow malicious traffic to exhibit flow statistics similar to those of benign traffic, which limits the accuracy of summary-based monitoring methods in network analysis and traffic identification applications;
3) Flow counters: a Sketch serves a fixed query scenario, its query types are difficult to customize, and it cannot support structural queries related to the topology of network flows.
Disclosure of Invention
The invention provides a network flow measurement method and device based on a flow graph model, which reduce the time and space overhead of network flow measurement and improve its efficiency.
An embodiment of the present invention provides a network flow measurement method based on a flow graph model, including the following steps:
inserting each data packet stream received from a network card or a network traffic file into a cache queue, and extracting data packet information from the data packets; the data packet information comprises a source IP, a destination IP, a source port, a destination port and a feature vector;
constructing, according to a flow graph model and the extracted data packet information, a Cuckoo matrix for updating and storing network flow characteristics, wherein the nodes, the edges and the weight vectors on the edges of the flow graph model correspond respectively to IP addresses, network flows between the IPs and statistical feature vectors of the network flows;
and querying the Cuckoo matrix through a basic query interface to obtain network flow characteristic data.
Further, the queries on the Cuckoo matrix include edge queries, node queries and composite queries, where a composite query is a user-defined query built on the edge query and the node query.
Further, after the Cuckoo matrix is queried, a csv file is generated from the query result. Each row of the csv file records the statistical feature vector of one network flow, and the statistical feature vector is identified by the quadruple consisting of the source IP, destination IP, source port and destination port of the network flow.
Further, the Cuckoo matrix constructed for updating and storing network flow characteristics according to the flow graph model and the extracted data packet information is a two-dimensional pointer matrix, and each bucket of the Cuckoo matrix points to a nested Cuckoo hash table.
Further, constructing the Cuckoo matrix for updating and storing network flow characteristics according to the flow graph model and the extracted data packet information comprises the following steps:
calculating, through an xxhash function, the row index and the column index in the Cuckoo matrix of the network flow corresponding to each data packet;
and updating an external Cuckoo hash table and an internal Cuckoo hash table according to the extracted data packet information, the row index of the data packet and the column index of the data packet.
Further, the cache queue is a lock-free ring queue.
Further, the lock-free ring queue inserts and dequeues queue elements according to the following steps:
when inserting a queue element, determining a unique position in the queue by atomically incrementing the atomic variable head, which points to the head of the queue, taking that position modulo the queue length to obtain the actual insertion address, and inserting the new queue element at that address;
when taking out queue elements, determining the batch processing interval from the atomic variables tail and front and calculating the interval length; when the interval length is greater than or equal to a preset threshold, processing the queue elements in the interval as a parallel batch, and after the batch is finished setting the atomic variable tail to the value of the atomic variable front; after a new queue element is inserted, the atomic variable tail is left unchanged and the atomic variable front is atomically incremented.
Further, when the buffer of the lock-free ring queue is full, the queue is blocked so that it cannot accept further insertions.
Further, the network flow file is a pcap file.
The invention also provides a network flow measurement device based on a flow graph model, which comprises a data packet information extraction module, a Cuckoo matrix construction module and a network flow characteristic data query module;
the data packet information extraction module is used for inserting each data packet stream received from a network card or a network traffic file into a cache queue and extracting data packet information from the data packets; the data packet information comprises a source IP, a destination IP, a source port, a destination port and a feature vector;
the Cuckoo matrix construction module is used for constructing, according to a flow graph model and the extracted data packet information, a Cuckoo matrix for updating and storing network flow characteristics, wherein the nodes, the edges and the weight vectors on the edges of the flow graph model correspond respectively to IP addresses, network flows between the IPs and statistical feature vectors of the network flows;
and the network flow characteristic data query module is used for querying the Cuckoo matrix through a basic query interface to obtain network flow characteristic data.
The embodiment of the invention has the following beneficial effects:
The invention provides a network flow measurement method and device based on a flow graph model. The method constructs, according to the flow graph model and the extracted data packet information, a Cuckoo matrix for updating and storing network flow characteristics, in which the nodes, the edges and the weight vectors on the edges of the flow graph model correspond respectively to IP addresses, network flows between IPs and statistical feature vectors of the network flows. The topology of the network flows can therefore be updated and their statistical features calculated at the same time within constant time overhead, and a throughput on the order of millions of data packets per second was reached in an evaluation experiment. The invention thus reduces the time and space overhead of network flow measurement and improves its efficiency.
Drawings
Fig. 1 is a schematic flowchart of a network flow measurement method based on a flow graph model according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a network flow measurement device based on a flow graph model according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the lock-free ring queue of the network flow measurement method based on a flow graph model according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the Cuckoo matrix of the network flow measurement method based on a flow graph model according to an embodiment of the present invention.
Detailed Description
The technical solutions in the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in Fig. 1, a network flow measurement method based on a flow graph model according to an embodiment of the present invention includes the following steps:
Step S101: each data packet stream received from the network card or a network traffic file is inserted into a cache queue, and data packet information is extracted from the data packets. Preferably, the network traffic file is a pcap file and the cache queue is a lock-free ring queue. Inserting the packet stream into a lock-free ring queue allows packets to be processed in arrival order and allows high-speed network traffic to be handled. The high-speed packet buffer is a single-producer, multi-consumer lock-free queue that supports extracting elements from the cache queue in batches and processing them with multiple threads at the same time.
The structure of the lock-free ring queue is shown in Fig. 3. It inserts and takes out queue elements according to the following steps:
When inserting a queue element, a unique position in the queue is determined by atomically incrementing the atomic variable head, which points to the head of the queue; that position is taken modulo the queue length to obtain the actual insertion address, and the new queue element is inserted at that address. Specifically, a queue element is a data packet.
When taking out queue elements, the batch processing interval is determined from the atomic variables tail and front, and the interval length |front - tail| is calculated. When the interval length is greater than or equal to a preset threshold, the queue elements in the interval are processed as a parallel batch, and after the batch is finished the atomic variable tail is set to the value of the atomic variable front. After a new queue element is inserted, the atomic variable tail is left unchanged and the atomic variable front is atomically incremented.
When the buffer of the lock-free ring queue is full, the queue is blocked so that it cannot accept further insertions, which prevents unprocessed data packets from being overwritten by newly arriving data. The ring structure of the queue ensures safe use of the buffer memory and allows it to be reused automatically.
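The index arithmetic described above can be illustrated with a minimal single-threaded sketch. It is not the patented implementation: `head`, `front` and `tail` are plain integers here, whereas the description uses atomic variables updated by atomic increments, and the class name `RingQueue`, the `batch_threshold` parameter and the choice to return `False` on a full buffer (instead of blocking) are assumptions made for illustration.

```python
class RingQueue:
    """Single-threaded sketch of the ring queue's index arithmetic.

    Assumption: head, front and tail are plain ints; a real lock-free
    queue would use atomic variables and atomic increments instead.
    """

    def __init__(self, capacity, batch_threshold):
        self.capacity = capacity
        self.batch_threshold = batch_threshold
        self.slots = [None] * capacity
        self.head = 0   # next insertion ticket
        self.front = 0  # number of elements fully inserted so far
        self.tail = 0   # number of elements already consumed

    def push(self, pkt):
        # A full buffer rejects the insert so that unprocessed packets are
        # never overwritten (the description blocks the queue instead).
        if self.head - self.tail >= self.capacity:
            return False
        ticket = self.head
        self.head += 1
        # The ticket modulo the queue length is the actual slot address.
        self.slots[ticket % self.capacity] = pkt
        self.front += 1  # tail stays unchanged on insert
        return True

    def pop_batch(self):
        # Hand out a batch only once |front - tail| reaches the threshold.
        start, end = self.tail, self.front
        if end - start < self.batch_threshold:
            return []
        batch = [self.slots[i % self.capacity] for i in range(start, end)]
        self.tail = end  # tail catches up with front after the batch
        return batch
```

In a multi-threaded version the returned batch would be split across worker threads before `tail` is advanced.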
As one of the embodiments, the data packet information includes a source IP, a destination IP, a source port, a destination port and a feature vector. For a packet pkt = (sip, dip, sport, dport, w) in the cache, the first four entries are the source IP, destination IP, source port and destination port of the packet, and the last entry w is the feature vector of the packet. The feature vector includes the protocol number, the packet payload length, a timestamp, flag bits such as SYN and ACK, the window length and the header length. For a UDP packet, the missing features are initialized to 0.
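As an illustration of the record pkt = (sip, dip, sport, dport, w), the following sketch models it with two dataclasses. The class names, field names and the helper `ip_to_int` are hypothetical; the description lists the features but does not fix a layout. TCP-only fields default to 0, mirroring the zero-initialization for UDP packets.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class PacketFeatures:
    """Per-packet feature vector w (field names are assumptions)."""
    protocol: int        # IP protocol number (6 = TCP, 17 = UDP)
    payload_len: int
    timestamp: float
    header_len: int = 0
    window_len: int = 0  # TCP only; stays 0 for UDP
    syn: int = 0         # TCP flag bits; stay 0 for UDP
    ack: int = 0


@dataclass(frozen=True)
class Packet:
    sip: int    # source IP as an unsigned 32-bit integer
    dip: int    # destination IP as an unsigned 32-bit integer
    sport: int
    dport: int
    w: PacketFeatures


def ip_to_int(text):
    """Convert dotted-quad IPv4 text to an integer."""
    return int(IPv4Address(text))


# A UDP packet: the TCP-specific features keep their default of 0.
udp_pkt = Packet(
    ip_to_int("10.0.0.1"), ip_to_int("10.0.0.2"), 5353, 53,
    PacketFeatures(protocol=17, payload_len=48, timestamp=0.0, header_len=8),
)
```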
Step S102: a Cuckoo matrix for updating and storing network flow characteristics is constructed according to the flow graph model and the extracted data packet information, wherein the nodes, the edges and the weight vectors on the edges of the flow graph model correspond respectively to IP addresses, network flows between the IPs and statistical features of the network flows.
Linear network traffic data is modeled as a flow graph by mapping the nodes, the edges and the weight vectors on the edges of the flow graph model to IP addresses, network flows between the IPs and statistical feature vectors of the network flows, respectively. The Cuckoo matrix is a pointer matrix of size m × n in which each bucket points to a nested Cuckoo hash table; it allows the topology of the flow graph and the weight vectors on its edges to be updated quickly. Its structure is shown in Fig. 4. The feature vector of a data packet holds feature information of the packet itself, whereas the statistical feature vector of a network flow holds statistics calculated from the features carried by its packets. For example, each packet has a header length among its features; the network flow takes the minimum over the header lengths of all packets in the flow, and the result is the minimum header length feature of the network flow.
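The relationship between per-packet features and per-flow statistics can be sketched as a running aggregate that is updated once per packet in constant time. The class name `RunningStat` is hypothetical, and Welford's online method for the variance is a swapped-in technique; the description does not say how the statistics are computed.

```python
class RunningStat:
    """Constant-time running statistics over one packet feature.

    Assumption: variance is the population variance, computed with
    Welford's online method; the patent does not specify the method.
    """

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        self.total += x
        self.minimum = x if self.minimum is None else min(self.minimum, x)
        self.maximum = x if self.maximum is None else max(self.maximum, x)
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.count if self.count else 0.0


# Header lengths of four packets belonging to the same flow.
header_len = RunningStat()
for value in (20, 32, 20, 40):
    header_len.update(value)
```

After these four updates, `header_len.minimum` is the flow's minimum header length feature from the example in the text.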
As an embodiment, constructing the Cuckoo matrix for updating and storing network flow characteristics according to the flow graph model and the extracted data packet information includes the following steps:
Step S11: the row index and the column index in the Cuckoo matrix of the network flow corresponding to each data packet are calculated through an xxhash function, as follows: row index rindex = xxhash(sip) % m, and column index cindex = xxhash(dip) % n.
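A minimal sketch of the index computation follows. To stay within the standard library it uses `zlib.crc32` as a stand-in for xxhash, so the concrete index values differ from what an xxhash-based implementation would produce; the function names are hypothetical.

```python
import zlib


def stand_in_hash(ip):
    """Hash a 32-bit IP address.

    Assumption: CRC-32 replaces xxhash here only to keep the sketch
    dependency-free; it is not the hash function named in the patent.
    """
    return zlib.crc32(ip.to_bytes(4, "big"))


def bucket_index(sip, dip, m, n):
    """Return (rindex, cindex) for a flow in an m x n matrix."""
    rindex = stand_in_hash(sip) % m  # the row depends only on the source IP
    cindex = stand_in_hash(dip) % n  # the column depends only on the destination IP
    return rindex, cindex
```

Because the row depends only on sip and the column only on dip, all flows leaving one IP share a row and all flows arriving at one IP share a column, which is what the node queries later rely on.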
Step S12: an external Cuckoo hash table (the external hash table in Fig. 4) and an internal Cuckoo hash table (the internal hash table in Fig. 4) are updated according to the extracted data packet information, the row index of the data packet and the column index of the data packet. That is, the Cuckoo hash table stored in the bucket at row rindex and column cindex of the matrix is updated according to the extracted data packet information. Specifically:
Each IP address is represented by a 32-bit unsigned integer, and the external Cuckoo hash table stores key-value pairs [(sip, dip), p], where (sip, dip) is the key and the value p is a pointer to an internal Cuckoo hash table. The external table is searched with the key (sip, dip); when the result is empty, a new record [(sip, dip), p] pointing to a new internal Cuckoo hash table is created in the external table, and a network flow record is then created in that internal table according to the data packet information.
Each port number is represented by a 16-bit unsigned integer, and the internal Cuckoo hash table pointed to by p stores key-value pairs [(sport, dport), w], where w is the statistical feature vector of the network flow. It comprises 68 features, such as statistics (for example sum, maximum, mean and variance) of the byte count of the network flow, statistics of the packet inter-arrival time, counts of the eight flags SYN, FIN, ACK, PSH, RST, URG, ECE and CWR, and the maximum header length. The internal table is searched with the key (sport, dport); when the result is empty, a network flow record is created in the internal Cuckoo hash table according to the data packet information.
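The two-level update of step S12 can be sketched as follows. This is an illustration under stated assumptions, not the patented code: Python dicts stand in for both the external and the internal Cuckoo hash tables, `zlib.crc32` stands in for xxhash, and the three statistics kept per flow are a small placeholder for the 68-dimensional vector w.

```python
import zlib


def h(ip):
    # Assumption: CRC-32 stands in for the xxhash function.
    return zlib.crc32(ip.to_bytes(4, "big"))


def new_matrix(m, n):
    # Each bucket holds an external table; a dict stands in for the
    # external Cuckoo hash table.
    return [[{} for _ in range(n)] for _ in range(m)]


def update(matrix, sip, dip, sport, dport, payload_len):
    """Apply one packet to the matrix and return the flow's statistics."""
    m, n = len(matrix), len(matrix[0])
    external = matrix[h(sip) % m][h(dip) % n]
    # Key (sip, dip) -> internal table; created when the lookup is empty.
    internal = external.setdefault((sip, dip), {})
    # Key (sport, dport) -> statistics; created when the lookup is empty.
    stats = internal.setdefault(
        (sport, dport), {"packets": 0, "bytes": 0, "max_len": 0}
    )
    stats["packets"] += 1
    stats["bytes"] += payload_len
    stats["max_len"] = max(stats["max_len"], payload_len)
    return stats
```

Creating the edge (sip, dip) and updating the flow statistics happen in the same call, which is the simultaneous topology and feature update the description refers to.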
Step S103: the Cuckoo matrix is queried through a basic query interface to obtain network flow characteristic data. The queries on the Cuckoo matrix include edge queries, node queries and composite queries, where a composite query is a user-defined query built on the edge query and the node query, such as a BFS query or a Heavy Hitter query. The node queries include first-order successor node queries and first-order predecessor node queries. By implementing three graph query primitives (edge query, first-order successor node query and first-order predecessor node query), the invention supports query tasks on the topology of the network flow graph and on the network flow feature vectors.
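As an example of a composite query, a BFS can be written purely in terms of the first-order successor primitive. In the sketch below, `successors` is any callable that implements that primitive; the toy adjacency dict used in the example is a stand-in for the matrix, and the function name `bfs` is hypothetical.

```python
from collections import deque


def bfs(start, successors):
    """Return the nodes reachable from start in breadth-first order.

    successors(v) must return the first-order successor set of node v.
    """
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        v = queue.popleft()
        # Sort only to make the visiting order deterministic.
        for d in sorted(successors(v)):
            if d not in seen:
                seen.add(d)
                order.append(d)
                queue.append(d)
    return order


# Toy flow graph standing in for the matrix: node -> successor set.
toy_edges = {1: {2, 3}, 2: {4}, 3: {4}, 4: set()}
reachable = bfs(1, lambda v: toy_edges.get(v, set()))
```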
As one embodiment, after the Cuckoo matrix is queried, a csv file is generated from the query result. Each row of the csv file records the statistical feature vector of one network flow, identified by the quadruple consisting of the source IP, destination IP, source port and destination port of the network flow. All network flow characteristics calculated during operation are exported from the Cuckoo matrix as csv files, which provides persistent storage of traffic logs and supports subsequent tasks such as traffic analysis and identification.
As an embodiment, every external Cuckoo hash table and every internal Cuckoo hash table of the Cuckoo matrix is traversed. The combination of the source IP and destination IP of a network flow together with the combination of its source port and destination port is used as the key of the network flow, and the statistical feature vector of the network flow is used as the value. The key-value pairs obtained by the traversal are written line by line to a local file in comma-separated values (CSV) format.
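The traversal and export can be sketched with the standard `csv` module. The nested-dict layout of `matrix`, the two placeholder statistics and the header row are assumptions for illustration; the description only requires one row per network flow, keyed by the quadruple.

```python
import csv
import io


def export_flows(matrix, out):
    """Write one CSV row per network flow stored in the matrix.

    Assumption: matrix[r][c] is a dict {(sip, dip): {(sport, dport): stats}}
    standing in for the nested Cuckoo hash tables.
    """
    writer = csv.writer(out)
    writer.writerow(["sip", "dip", "sport", "dport", "packets", "bytes"])
    for row in matrix:
        for external in row:
            for (sip, dip), internal in external.items():
                for (sport, dport), stats in internal.items():
                    writer.writerow(
                        [sip, dip, sport, dport,
                         stats["packets"], stats["bytes"]]
                    )


# A 1 x 1 matrix holding a single flow, written to an in-memory buffer.
demo_matrix = [[{(1, 2): {(80, 443): {"packets": 3, "bytes": 120}}}]]
buffer = io.StringIO()
export_flows(demo_matrix, buffer)
```

Writing to a real file works the same way with `open(path, "w", newline="")` in place of the `StringIO` buffer.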
As an embodiment, the edge query is as follows: given an edge e = (s, d, sp, dp), return the weight vector w(e) if e exists in the graph, and otherwise return null. First, the bucket at row H(s) and column H(d) of the matrix is queried. If the key (s, d) cannot be found in the hash table pointed to by the bucket (the external Cuckoo hash table), null is returned directly. Otherwise, the nested hash table pointed to by the pointer value of the key (s, d) (the internal Cuckoo hash table) is queried. If the key (sp, dp) is found in the internal Cuckoo hash table, the corresponding weight w(e) is returned; otherwise, null is returned.
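A minimal sketch of the edge query, under the same stand-in assumptions as before: dicts replace the external and internal Cuckoo hash tables, `zlib.crc32` replaces the hash H, and `None` plays the role of null. The function names are hypothetical.

```python
import zlib


def h(ip):
    # Assumption: CRC-32 stands in for the hash function H.
    return zlib.crc32(ip.to_bytes(4, "big"))


def edge_query(matrix, s, d, sp, dp):
    """Return w(e) for e = (s, d, sp, dp), or None if e is absent."""
    m, n = len(matrix), len(matrix[0])
    external = matrix[h(s) % m][h(d) % n]
    internal = external.get((s, d))
    if internal is None:
        return None              # no edge between s and d at all
    return internal.get((sp, dp))  # None when this port pair is unknown


# Build a 3 x 3 matrix holding one flow 1 -> 2 on ports 1000 -> 80.
flow_matrix = [[{} for _ in range(3)] for _ in range(3)]
flow_matrix[h(1) % 3][h(2) % 3][(1, 2)] = {(1000, 80): [5, 300]}
```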
As one of the embodiments, the first-order successor node query is as follows: to find the first-order successor set S of an IP node v, all buckets in row H(v) of the matrix are searched. For the bucket at row H(v) and column c, where c is any column, if the hash table pointed to by the bucket contains a record whose key is (v, d), then d is added to the set S, where v and d are both IP addresses.
As one of the embodiments, the first-order predecessor node query is as follows: to find the first-order predecessor set P of an IP node v, all buckets in column H(v) of the matrix are searched. For the bucket at row c and column H(v), where c is any row, if the hash table pointed to by the bucket contains a record whose key is (s, v), then s is added to the set P, where s and v are both IP addresses.
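The two node queries can be sketched as one scan over a row and one scan over a column. As in the other sketches, dicts stand in for the Cuckoo hash tables and `zlib.crc32` for the hash H; `add_edge` is a hypothetical helper that only records the (sip, dip) key.

```python
import zlib


def h(ip):
    # Assumption: CRC-32 stands in for the hash function H.
    return zlib.crc32(ip.to_bytes(4, "big"))


def add_edge(matrix, s, d):
    """Record an edge s -> d in the bucket at row H(s), column H(d)."""
    m, n = len(matrix), len(matrix[0])
    matrix[h(s) % m][h(d) % n].setdefault((s, d), {})


def successors(matrix, v):
    """First-order successors: scan every bucket of row H(v)."""
    row = matrix[h(v) % len(matrix)]
    return {d for external in row for (s, d) in external if s == v}


def predecessors(matrix, v):
    """First-order predecessors: scan every bucket of column H(v)."""
    col = h(v) % len(matrix[0])
    return {s for row in matrix for (s, d) in row[col] if d == v}


graph = [[{} for _ in range(4)] for _ in range(4)]
for src, dst in [(1, 2), (1, 3), (4, 2)]:
    add_edge(graph, src, dst)
```

A successor query touches n buckets and a predecessor query touches m buckets, regardless of how many other rows or columns are populated.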
The invention implements a lock-free ring buffer queue that can process network traffic arriving at high speed, and improves the processing efficiency of network data packets through multi-threaded batch processing. By designing a Cuckoo matrix based on a flow graph model, the invention can update the topology of the network flows and calculate their statistical features at the same time within constant time overhead, and reached a throughput on the order of millions of data packets per second in an evaluation experiment. The storage overhead of the Cuckoo matrix is linear in the number of network flows. Edge queries and node queries on the network flow graph are supported, and more complex user-defined queries can be built on them. Finally, the measured statistical feature vectors of the network flows and the topology of the network flow graph can be stored persistently as files to support further analysis of traffic characteristics and network topology.
The invention addresses the large time and space overhead, the limited flow features and the fixed query scenarios of existing network flow measurement techniques. It provides a network flow measurement method based on a flow graph model that updates network topology and flow statistics simultaneously with linear storage overhead and in constant time, and that supports user-defined query types through a graph query interface based on nodes and edges. Linear storage overhead guarantees that the storage needed for network flows grows linearly with the number of flows, so the required storage space can be estimated from the number of network flows in the actual network environment and the measurement time window. Constant-time updates guarantee that the time overhead of processing each packet is O(1) and does not grow with the size of the stored network topology or the number of network flows. By contrast, the update time of an adjacency list is O(d), where d is the maximum length of an edge list, so its update cost grows with the length of the edge list. Because traditional flow summarization methods only yield the statistical features of the traffic, obtaining the related network topology requires rebuilding a graph index from massive traffic data. By updating topology and statistical features simultaneously, the invention saves the time and coding overhead of reconstructing such a graph.
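To illustrate why a Cuckoo hash table gives constant-time lookups, the following is a minimal textbook-style sketch with two tables and two hash functions. It is not the patent's data structure: the class name `CuckooTable`, the `max_kicks` bound, the doubling strategy and the hash functions derived from Python's built-in `hash` are all assumptions. A lookup inspects at most two slots; an insert may evict and relocate existing entries.

```python
class CuckooTable:
    """Minimal two-table Cuckoo hash map (illustrative sketch only)."""

    def __init__(self, capacity=8, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Two hash functions derived from the built-in hash (assumption).
        return hash((which, key)) % self.capacity

    def get(self, key):
        # At most two slots are inspected, one per table.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def put(self, key, value):
        # Overwrite in place when the key is already stored.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            # Place the entry and pick up whatever occupied the slot.
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return
            which = 1 - which  # the evicted entry moves to the other table
        self._grow(entry)

    def _grow(self, pending):
        # Too many evictions: double the capacity and reinsert everything.
        old = [e for t in self.tables for e in t if e is not None]
        old.append(pending)
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in old:
            self.put(k, v)
```

Keys such as (sip, dip) or (sport, dport) tuples can be used directly, for example `t = CuckooTable(); t.put((80, 443), "stats")`.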
On the basis of the above method embodiment, the present invention correspondingly provides a device embodiment, as shown in Fig. 2.
Another embodiment of the present invention provides a network flow measurement device based on a flow graph model, which includes a data packet information extraction module 11, a Cuckoo matrix construction module 12 and a network flow characteristic data query module 13;
the data packet information extraction module is used for inserting each data packet stream received from a network card or a network traffic file into a cache queue and extracting data packet information from the data packets; the data packet information comprises a source IP, a destination IP, a source port, a destination port and a feature vector;
the Cuckoo matrix construction module is used for constructing, according to a flow graph model and the extracted data packet information, a Cuckoo matrix for updating and storing network flow characteristics, wherein the nodes, the edges and the weight vectors on the edges of the flow graph model correspond respectively to IP addresses, network flows between the IPs and statistical feature vectors of the network flows;
and the network flow characteristic data query module is used for querying the Cuckoo matrix through a basic query interface to obtain network flow characteristic data.
For convenience and brevity of description, the device embodiments of the present invention include all the embodiments of the above network flow measurement method based on a flow graph model, which are not described again here.
It should be noted that the above-described embodiments of the apparatus are merely illustrative, where the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that all or part of the processes of the above embodiments may be implemented by hardware related to instructions of a computer program, and the computer program may be stored in a computer readable storage medium, and when executed, may include the processes of the above embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (10)

1. A network flow measurement method based on a flow graph model, characterized by comprising the following steps:
inserting the data packet stream received from a network card or a network traffic file into a cache queue, and extracting data packet information from each data packet, wherein the data packet information comprises a source IP, a destination IP, a source port, a destination port and a feature vector;
constructing, according to a flow graph model and the extracted data packet information, a Cuckoo matrix for updating and storing network flow features, wherein the nodes, the edges and the weight vectors on the edges of the flow graph model correspond respectively to IP addresses, network flows between the IPs and statistical feature vectors of the network flows;
and querying the Cuckoo matrix through a basic query interface to obtain network flow feature data.
2. The flow graph model-based network flow measurement method according to claim 1, wherein the queries on the Cuckoo matrix include an edge query, a node query and a composite query, and the composite query is a custom query built on the edge query and the node query.
3. The flow graph model-based network flow measurement method according to claim 2, wherein after the Cuckoo matrix is queried, a csv file is generated from the query result, each row of the csv file records the statistical feature vector of one network flow, and the statistical feature vector of a network flow is identified by a quadruple consisting of the source IP, the destination IP, the source port and the destination port of that network flow.
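As a non-limiting illustration of the csv output described in claim 3, the following Python sketch writes one row per network flow. The function name `flows_to_csv`, the column names and the two-element feature vector (packet count, byte count) are assumptions made for the example and are not taken from the claims.

```python
import csv
import io


def flows_to_csv(flows):
    """Serialize query results to csv text, one network flow per row.

    `flows` maps a (src_ip, dst_ip, src_port, dst_port) quadruple to its
    statistical feature vector. The layout of that vector is hypothetical.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header: the identifying quadruple followed by the assumed features.
    writer.writerow(["src_ip", "dst_ip", "src_port", "dst_port", "packets", "bytes"])
    for (src_ip, dst_ip, src_port, dst_port), features in sorted(flows.items()):
        writer.writerow([src_ip, dst_ip, src_port, dst_port, *features])
    return out.getvalue()
```

In this sketch the quadruple serves as the row key, so a downstream consumer can join rows back to individual flows without any additional identifier.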
4. The flow graph model-based network flow measurement method according to claim 3, wherein, in constructing the Cuckoo matrix for updating and storing network flow features according to the flow graph model and the extracted data packet information, the Cuckoo matrix is a two-dimensional pointer matrix, and each bucket of the Cuckoo matrix points to a nested Cuckoo hash table.
5. The flow graph model-based network flow measurement method according to claim 4, wherein constructing the Cuckoo matrix for updating and storing network flow features according to the flow graph model and the extracted data packet information comprises the following steps:
calculating, through an xxhash function, the row index and the column index in the Cuckoo matrix of the network flow corresponding to each data packet;
and updating an outer Cuckoo hash table and an inner Cuckoo hash table according to the extracted data packet information, the row index of the data packet and the column index of the data packet.
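As a non-limiting illustration of the two steps of claim 5, the following Python sketch locates a bucket in a two-dimensional matrix and updates nested tables inside it. Several points are assumptions made for the example and are not stated in the claims: the row index is derived from the source IP and the column index from the destination IP; a seeded `hashlib.blake2b` digest is swapped in for xxhash so that the sketch needs only the standard library; plain dictionaries stand in for the outer and inner Cuckoo hash tables; the outer table is keyed by the IP pair and the inner table by the port pair; and the feature vector is a packet count and a byte count.

```python
import hashlib


def _hash64(text, seed):
    # Stand-in for xxhash (assumption): a seeded 64-bit standard-library digest.
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(text.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


class FlowMatrix:
    """Two-dimensional pointer matrix whose buckets point to nested tables."""

    def __init__(self, rows=64, cols=64):  # hypothetical dimensions
        self.rows, self.cols = rows, cols
        # A bucket stays None until the first flow is hashed into it.
        self.buckets = [[None] * cols for _ in range(rows)]

    def index(self, src_ip, dst_ip):
        # Step 1: row and column index of the flow (seeds 1 and 2 are arbitrary).
        return _hash64(src_ip, 1) % self.rows, _hash64(dst_ip, 2) % self.cols

    def update(self, src_ip, dst_ip, src_port, dst_port, length):
        # Step 2: update the outer table, then the inner table.
        r, c = self.index(src_ip, dst_ip)
        if self.buckets[r][c] is None:
            self.buckets[r][c] = {}  # dict stands in for the outer Cuckoo hash table
        outer = self.buckets[r][c]
        inner = outer.setdefault((src_ip, dst_ip), {})  # stands in for the inner table
        stats = inner.setdefault((src_port, dst_port), [0, 0])
        stats[0] += 1        # packet count
        stats[1] += length   # byte count

    def edge_query(self, src_ip, dst_ip):
        # Returns all flows on the edge src_ip -> dst_ip, or an empty dict.
        r, c = self.index(src_ip, dst_ip)
        outer = self.buckets[r][c] or {}
        return outer.get((src_ip, dst_ip), {})
```

Because colliding IP pairs share a bucket, the sketch keeps the full key inside the bucket's table so that an edge query returns only the flows of the requested pair.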
6. The flow graph model-based network traffic measurement method of claim 5, wherein the cache queue is a lock-free ring queue.
7. The flow graph model-based network flow measurement method according to claim 6, wherein the lock-free ring queue updates and takes out queue elements according to the following steps:
when updating a queue element, determining a unique position in the queue through an atomic increment of the pointer to the head of the queue, taking that position modulo the length of the queue to obtain the address at which the element is actually inserted into the queue, and inserting the newly added queue element at that address;
when taking out queue elements, determining a batch processing interval from the atomic variable tail and the atomic variable front and calculating the length of the interval; when the length of the interval is greater than or equal to a preset threshold, processing the queue elements in the interval as a parallel batch, and after the parallel batch processing is finished, updating the value of the atomic variable tail to the value of the atomic variable front; and after a newly added queue element is inserted, keeping the atomic variable tail unchanged and atomically incrementing the atomic variable front.
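As a non-limiting illustration of the index arithmetic in claim 7, the following Python sketch models the queue in a single thread. It is not a lock-free implementation: the plain integers `front` and `tail` only model the two atomic variables, and a concurrent implementation would replace the marked line with a hardware atomic fetch-and-add. The class name, the method names and the choice to reject an insert when the buffer is full are assumptions made for the example.

```python
class RingQueueModel:
    """Single-threaded model of the front/tail arithmetic of the ring queue."""

    def __init__(self, capacity, batch_threshold):
        self.capacity = capacity
        self.batch_threshold = batch_threshold
        self.slots = [None] * capacity
        self.front = 0  # models the atomic variable front (next position to hand out)
        self.tail = 0   # models the atomic variable tail (start of unprocessed interval)

    def push(self, item):
        # Full buffer: refuse the insert instead of overwriting unprocessed
        # elements (an assumed policy for this sketch).
        if self.front - self.tail >= self.capacity:
            return False
        position = self.front   # would be an atomic fetch-and-add in real code
        self.front += 1         # tail is left unchanged on insert
        self.slots[position % self.capacity] = item  # modulo gives the real slot
        return True

    def pop_batch(self):
        front = self.front              # snapshot of front
        length = front - self.tail      # length of the batch interval
        if length < self.batch_threshold:
            return []
        batch = [self.slots[i % self.capacity] for i in range(self.tail, front)]
        # The batch would be processed in parallel here; afterwards tail
        # takes the value of the front snapshot.
        self.tail = front
        return batch
```

Keeping `front` and `tail` as ever-growing counters and applying the modulo only when a slot is addressed makes the interval length a simple subtraction, even after the indices wrap around the buffer.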
8. The flow graph model-based network flow measurement method according to claim 7, wherein, when the buffer of the lock-free ring queue is full, the lock-free ring queue is prevented from continuing to update data by a blocking queue.
9. The flow graph model-based network traffic measurement method of any one of claims 1-8, wherein the network traffic file is a pcap file.
10. A network flow measurement device based on a flow graph model, characterized by comprising a data packet information extraction module, a Cuckoo matrix construction module and a network flow feature data query module;
the data packet information extraction module is used for inserting the data packet stream received from a network card or a network traffic file into a cache queue and extracting data packet information from each data packet, wherein the data packet information comprises a source IP, a destination IP, a source port, a destination port and a feature vector;
the Cuckoo matrix construction module is used for constructing, according to a flow graph model and the extracted data packet information, a Cuckoo matrix for updating and storing network flow features, wherein the nodes, the edges and the weight vectors on the edges of the flow graph model correspond respectively to IP addresses, network flows between the IPs and statistical feature vectors of the network flows;
and the network flow feature data query module is used for querying the Cuckoo matrix through a basic query interface to obtain network flow feature data.
CN202210976811.XA 2022-08-15 2022-08-15 Network flow measurement method and device based on flow graph model Active CN115473836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210976811.XA CN115473836B (en) 2022-08-15 2022-08-15 Network flow measurement method and device based on flow graph model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210976811.XA CN115473836B (en) 2022-08-15 2022-08-15 Network flow measurement method and device based on flow graph model

Publications (2)

Publication Number Publication Date
CN115473836A (en) 2022-12-13
CN115473836B (en) 2023-06-06

Family

ID=84367490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210976811.XA Active CN115473836B (en) 2022-08-15 2022-08-15 Network flow measurement method and device based on flow graph model

Country Status (1)

Country Link
CN (1) CN115473836B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210211364A1 (en) * 2018-06-05 2021-07-08 Max-Planck-Gesellschaft Zur Förderung D. Wissenschaften E.V. Distributed and timely network flow summarization at scale
CN112437016A (en) * 2020-11-11 2021-03-02 中国科学技术大学先进技术研究院 Network flow identification method, device, equipment and computer storage medium
CN113821793A (en) * 2021-08-27 2021-12-21 北京工业大学 Multi-stage attack scene construction method and system based on graph convolution neural network
CN113590910A (en) * 2021-09-26 2021-11-02 北京金睛云华科技有限公司 Network traffic retrieval method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄璇丽 等: "基于深度学习的网络流时空特征自动提取方法", vol. 9, no. 2, pages 60 - 69 *

Also Published As

Publication number Publication date
CN115473836B (en) 2023-06-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant