CN113746700B - Elephant flow rapid detection method and system based on probability sampling - Google Patents

Info

Publication number
CN113746700B
Authority
CN
China
Prior art keywords
data packet
flow
information
index
pkt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111028109.2A
Other languages
Chinese (zh)
Other versions
CN113746700A (en)
Inventor
彭伟
段晨
王宝生
赵宝康
郦苏丹
唐竹
原玉磊
陶静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202111028109.2A priority Critical patent/CN113746700B/en
Publication of CN113746700A publication Critical patent/CN113746700A/en
Application granted granted Critical
Publication of CN113746700B publication Critical patent/CN113746700B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876: Network utilisation, e.g. volume of load or congestion level
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/16: Threshold monitoring
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and a system for rapidly detecting elephant flows based on probability sampling. The method comprises the following steps: 1) the packet forwarding module sends the header five-tuple information of each passing data packet together with the interface queue occupancy ratio; 2) the header five-tuple information and the interface queue occupancy ratio sent by the packet forwarding module are received, the data packets are counted by a probability sampling method, and elephant flows are detected based on the packet counting result; 3) the detected elephant flow information is stored in an elephant flow storage queue. The invention achieves low time overhead, fast local decisions at the switch, and real-time elephant flow detection without large memory overhead, and can be deployed on programmable switches, smart network cards, commercial switch chips, and any other forwarding hardware.

Description

Elephant flow rapid detection method and system based on probability sampling
Technical Field
The invention relates to a computer network communication technology, in particular to a method and a system for rapidly detecting elephant flow based on probability sampling.
Background
With the rapid development of cloud computing and big data, the performance requirements of operators on a data center network are continuously increased. Traffic scheduling has been a persistent and difficult problem in data center networks for many years. Existing research has shown that in a data center network, flows accounting for 1% of the total number of flows in the data center network produce 90% of the total traffic, and these flows accounting for 1% of the total number of flows are called elephant flows. Scheduling of elephant flows is therefore an important factor affecting data center network performance. In order to allocate a reasonable transmission path for the elephant flow, thereby reducing network congestion and improving the network load balancing situation, a data center operator needs an accurate and fast elephant flow detection method.
Elephant flow detection means identifying flows that occupy a large share of bandwidth and last a long time, when the start time, end time, and transmission rate of the flows in the network are unknown. Current elephant flow detection methods fall into two main categories. The first identifies elephant flows from periodic per-flow statistics such as byte counts and packet counts. This method is often applied in software-defined networking (SDN). The SDN switch counts the bytes and packets transmitted by each flow, and the SDN controller periodically queries the switch for these counts and screens them to obtain the elephant flows in the network. The main problem of this method is that periodic statistics are not timely enough: communication between the SDN controller and the SDN switch incurs a delay of at least one RTT, and the statistical period is typically on the order of seconds. The second counts the number of data packets that each flow occupies in a switch interface queue, using either a single real-time snapshot or several consecutive snapshots of the queue. This method is applied on programmable switches and is highly timely. However, the single-snapshot method is unstable, because the result of one queue snapshot cannot objectively reflect the size of a flow, while the multiple-snapshot method introduces large memory overhead to store the snapshot results and places high demands on the programmable switch hardware.
Programmable switches now make packet processing programmable. A schematic diagram of how a programmable switch processes packets is shown in FIG. 1. The packet forwarding module extracts the header five-tuple information (source IP address, destination IP address, source port number, destination port number, and protocol number) of a data packet from the ingress queue of a switch interface. The packet forwarding module then determines the forwarding output interface of the data packet according to the forwarding rules, and copies the data packet from the input queue of the receiving interface to the output queue of that forwarding output interface.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in an SDN scenario, elephant flow detection by a centralized controller is not timely enough, and snapshot-based methods incur large memory overhead. To address these problems, the invention provides a method and a system for rapidly detecting elephant flows based on probability sampling, which achieve low time overhead, fast local decisions at the switch, and real-time elephant flow detection without large memory overhead, and which can be deployed on any forwarding hardware, including but not limited to programmable switches, smart network cards, and commercial switch chips.
In order to solve the technical problems, the invention adopts the technical scheme that:
A method for rapidly detecting elephant flows based on probability sampling comprises the following steps:
1) The packet forwarding module sends the header five-tuple information of each passing data packet and the interface queue occupancy ratio;
2) The header five-tuple information and the interface queue occupancy ratio sent by the packet forwarding module are received, the data packets are counted by a probability sampling method, and elephant flows are detected based on the packet counting result;
3) The detected elephant flow information is stored in the elephant flow storage queue.
Optionally, the processing performed by the packet forwarding module in step 1) for any passing data packet pkt comprises:
1.1) Reading the header five-tuple information Quintuple of the passing data packet pkt from the queue corresponding to the input interface InInt, where Quintuple is a binary string formed by concatenating the source IP address srcIP, the destination IP address dstIP, the source port number srcPort, the destination port number dstPort, and the protocol number protocol;
1.2) Querying the hardware forwarding table with the header five-tuple information Quintuple of pkt to obtain the forwarding output interface OutInt of pkt;
1.3) Copying pkt from the input queue of interface InInt to the output queue OutQueue of the forwarding output interface OutInt, and obtaining the number CurrentNum of data packets currently in OutQueue;
1.4) Dividing the number CurrentNum of data packets currently in OutQueue by the total length Length of OutQueue to obtain the interface queue occupancy ratio ORatio;
1.5) Outputting the header five-tuple information Quintuple of pkt and the interface queue occupancy ratio ORatio.
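Steps 1.1) and 1.4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: IPv4 addresses, a 13-byte field layout chosen here for concreteness, and the occupancy ratio taken as CurrentNum divided by Length so that it lies in [0, 1]. The function names `make_quintuple` and `occupancy_ratio` are hypothetical and do not come from the patent.

```python
import ipaddress
import struct


def make_quintuple(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                   protocol: int) -> bytes:
    """Concatenate the five header fields into one binary string (IPv4 assumed)."""
    return (
        ipaddress.IPv4Address(src_ip).packed        # srcIP, 4 bytes
        + ipaddress.IPv4Address(dst_ip).packed      # dstIP, 4 bytes
        + struct.pack("!HHB", src_port, dst_port, protocol)  # ports and protocol
    )


def occupancy_ratio(current_num: int, length: int) -> float:
    """ORatio: packets currently in the output queue over the queue's total length."""
    if length <= 0:
        raise ValueError("queue length must be positive")
    return current_num / length


quintuple = make_quintuple("10.0.0.1", "10.0.0.2", 1234, 80, 6)
oratio = occupancy_ratio(32, 128)
```

On real forwarding hardware these values would be read from the parsed header and the queue registers; the sketch only shows what is passed on to step 2).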
Optionally, step 2) comprises:
2.1) Judging whether the header five-tuple information Quintuple of a data packet pkt and the interface queue occupancy ratio ORatio have been received; if so, proceeding to step 2.2); otherwise, repeating step 2.1);
2.2) If the interface queue occupancy ratio ORatio is less than or equal to the preset minimum threshold Min_th, the data packet pkt is not counted; if ORatio is greater than Min_th and less than or equal to the preset maximum threshold Max_th, pkt is counted with probability p; if ORatio is greater than Max_th, pkt is counted directly; elephant flows are then detected based on the packet counting result.
Optionally, step 2.1) is preceded by an initialization step: setting the number count of uncounted data packets to 0, where count takes values in [0, Length], and setting the preset minimum threshold Min_th and maximum threshold Max_th of the interface queue occupancy ratio ORatio, the timeout Tmax of flow count information in the hash table, and the maximum value Max_q of the real-time counting probability q. Step 2.2) comprises the following steps:
2.2.1) Initializing the position index Index of the flow to which the data packet pkt belongs in the hash table to 0, and initializing the flow count information CountUpdate of that flow to 0;
2.2.2) Comparing the interface queue occupancy ratio ORatio with the thresholds: if ORatio is greater than the preset minimum threshold Min_th and less than or equal to the preset maximum threshold Max_th, jumping to step 2.2.3); if ORatio is greater than Max_th, jumping to step 2.2.4); if ORatio is less than or equal to Min_th, jumping to step 2.2.5);
2.2.3) Calculating the real-time counting probability q according to $q = \mathit{Max}_q \cdot (\mathit{ORatio} - \mathit{Min}_{th}) / (\mathit{Max}_{th} - \mathit{Min}_{th})$, where Max_q is the maximum value of q and q takes values in [0, Max_q]; calculating the counting probability p according to $p = q / (1 - \mathit{count} \cdot q)$, where count is the number of uncounted data packets and p takes values in [0, 1]; multiplying p by the preset upper bound of the random number range to obtain a random number rand; if rand is smaller than the preset threshold m, setting count to 0, counting pkt, storing the information of the flow to which pkt belongs in the hash table, and jumping to step 2.2.6); otherwise, adding 1 to count and, if count after the increment is greater than the total length Length of the output queue, setting count to 0, counting pkt, storing the information of the flow to which pkt belongs in the hash table, and jumping to step 2.2.6); otherwise, jumping to step 2.2.5);
2.2.4) Setting count to 0, counting pkt, and storing the information of the flow to which pkt belongs in the hash table; jumping to step 2.2.6);
2.2.5) Not counting pkt; setting Index to 0 and CountUpdate to 0; jumping to step 2.2.6);
2.2.6) Detecting elephant flows based on the counting result of pkt.
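The three-zone decision of steps 2.2.2) to 2.2.5) can be sketched as follows. The class name `FlowSampler` and its parameters are hypothetical. One substitution is made deliberately: the random test is implemented as an ordinary uniform draw compared against p, which is an assumption standing in for the rand/m comparison described in step 2.2.3), not a transcription of it. The sketch also assumes Max_q is chosen small enough that count · q stays below 1.

```python
import random


class FlowSampler:
    """Decides per packet whether it is counted; a sketch, not the patented code."""

    def __init__(self, min_th, max_th, length, max_q, rng=None):
        self.min_th = min_th    # Min_th: below or at this ratio, never count
        self.max_th = max_th    # Max_th: above this ratio, always count
        self.length = length    # Length: total length of the output queue
        self.max_q = max_q      # Max_q: upper bound of the real-time probability q
        self.count = 0          # consecutive packets that were not counted
        self.rng = rng if rng is not None else random.Random()

    def should_count(self, oratio):
        if oratio <= self.min_th:                  # light load (step 2.2.5)
            return False
        if oratio > self.max_th:                   # heavy load (step 2.2.4)
            self.count = 0
            return True
        # Normal load (step 2.2.3): q grows linearly with the occupancy ratio.
        q = self.max_q * (oratio - self.min_th) / (self.max_th - self.min_th)
        p = q / (1.0 - self.count * q)
        if self.rng.random() < p:                  # assumed Bernoulli draw with probability p
            self.count = 0
            return True
        self.count += 1
        if self.count > self.length:               # forced count after Length misses
            self.count = 0
            return True
        return False
```

A caller would invoke `should_count(oratio)` for every received (Quintuple, ORatio) pair and update the flow hash table only when it returns `True`.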
Optionally, the step of storing the flow information to which the data packet pkt belongs in the hash table includes:
S1) Obtaining the current time currenttime; calculating the position index Index1 of the flow to which the data packet pkt belongs in the hash table using a preset first hash function; if the position at Index1 in the hash table is empty, inserting the header five-tuple information Quintuple of pkt, flow count information 1, and the current time currenttime at Index1, setting the position index Index of the flow to Index1, setting the flow count information CountUpdate of the flow to 1, and jumping to step 2.2.6); if the position at Index1 is not empty, proceeding to the next step;
S2) Extracting the flow five-tuple information QuintupleA, the flow count information CountA, and the filling time timeA stored at Index1; if QuintupleA equals the header five-tuple information Quintuple of pkt, adding 1 to the flow count information CountA at Index1, setting Index to Index1, setting CountUpdate to the new value of CountA, and jumping to step S3); otherwise, jumping to step S4);
S3) Judging whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if so, clearing the data at Index1 in the hash table, inserting Quintuple, flow count information 1, and currenttime at Index1, setting Index to Index1, setting CountUpdate to 1, and jumping to step 2.2.6); otherwise, jumping directly to step 2.2.6): because the difference between currenttime and timeA does not exceed Tmax, the increment of CountA in step S2) is valid, that is, the counting of pkt is already complete;
S4) Calculating the position index Index2 of the flow to which the data packet pkt belongs in the hash table using a preset second hash function; if the position at Index2 in the hash table is empty, inserting the header five-tuple information Quintuple of pkt, flow count information 1, and the current time currenttime at Index2, setting Index to Index2, setting CountUpdate to 1, and jumping to step 2.2.6); if the position at Index2 is not empty, proceeding to the next step;
S5) Extracting the flow five-tuple information QuintupleB, the flow count information CountB, and the filling time timeB stored at Index2; if QuintupleB equals the header five-tuple information Quintuple of pkt, adding 1 to the flow count information CountB at Index2, setting Index to Index2, setting CountUpdate to the new value of CountB, and jumping to step S6); otherwise, jumping to step S7);
S6) Judging whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if so, clearing the data at Index2 in the hash table, inserting Quintuple, flow count information 1, and currenttime at Index2, setting Index to Index2, setting CountUpdate to 1, and jumping to step 2.2.6); otherwise, jumping directly to step 2.2.6);
S7) Judging whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if so, clearing the data at Index1 in the hash table, inserting Quintuple, flow count information 1, and currenttime at Index1, setting Index to Index1, setting CountUpdate to 1, and jumping to step 2.2.6); otherwise, jumping to step S8);
S8) Judging whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if so, clearing the data at Index2 in the hash table, inserting Quintuple, flow count information 1, and currenttime at Index2, setting Index to Index2, setting CountUpdate to 1, and jumping to step 2.2.6); otherwise, jumping to step S9);
S9) Calculating the minimum minCount = Min{CountA, CountB} of the flow count information CountA and CountB, determining the corresponding position index minIndex in the hash table, and obtaining the five-tuple information minQuintuple, the flow count information minCount, and the filling time minTime stored at minIndex;
S10) Calculating the position index minIndex1 of the five-tuple information minQuintuple in the hash table using the preset first hash function; if the position at minIndex1 in the hash table is empty, inserting minQuintuple, minCount, and minTime at minIndex1 and jumping to step S12); otherwise, jumping to step S11);
S11) Calculating the position index minIndex2 of the five-tuple information minQuintuple in the hash table using the preset second hash function; if the position at minIndex2 in the hash table is empty, inserting minQuintuple, minCount, and minTime at minIndex2; in either case, jumping to step S12);
S12) Clearing the data at position index minIndex in the hash table, and jumping to step S13);
S13) Inserting the five-tuple information Quintuple of the flow to which the data packet pkt belongs, flow count information 1, and the filling time currenttime at position index minIndex in the hash table, setting Index to minIndex, and setting CountUpdate to 1; jumping to step 2.2.6).
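Steps S1) to S13) describe a table in which each flow has two candidate positions. A condensed sketch is given below. The class name `FlowTable`, the use of `zlib.crc32` and `zlib.adler32` as the two hash functions, and the tie-break in favour of the first position when both counts are equal are all assumptions made for illustration; the text above only requires two preset hash functions.

```python
import zlib


class FlowTable:
    """Two-choice hash table of [quintuple, count, fill_time] records (a sketch)."""

    def __init__(self, size, tmax, h1=None, h2=None):
        self.size = size
        self.tmax = tmax                      # Tmax: timeout of a record
        self.slots = [None] * size
        # Assumed hash functions; any two preset functions could be used instead.
        self.h1 = h1 or (lambda key: zlib.crc32(key) % size)
        self.h2 = h2 or (lambda key: zlib.adler32(key) % size)

    def _fresh(self, idx, key, now):
        """Write a new record with count 1; returns (Index, CountUpdate)."""
        self.slots[idx] = [key, 1, now]
        return idx, 1

    def record(self, key, now):
        i1, i2 = self.h1(key), self.h2(key)
        # S1-S6: use an empty position or the position already holding this flow.
        for idx in (i1, i2):
            slot = self.slots[idx]
            if slot is None:
                return self._fresh(idx, key, now)
            if slot[0] == key:
                if now - slot[2] > self.tmax:      # stale record: restart at 1
                    return self._fresh(idx, key, now)
                slot[1] += 1
                return idx, slot[1]
        # S7-S8: both positions hold other flows; reuse one that has timed out.
        for idx in (i1, i2):
            if now - self.slots[idx][2] > self.tmax:
                return self._fresh(idx, key, now)
        # S9-S13: displace the record with the smaller count (first position on a tie).
        victim = min((i1, i2), key=lambda i: self.slots[i][1])
        old = self.slots[victim]
        for alt in (self.h1(old[0]), self.h2(old[0])):
            if self.slots[alt] is None:            # S10-S11: relocate if possible
                self.slots[alt] = old
                break
        return self._fresh(victim, key, now)       # S12-S13
```

`record` returns the pair (Index, CountUpdate) that step 2.2.6) consumes. Passing `h1` and `h2` explicitly makes collisions easy to reproduce when experimenting.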
Optionally, step 2.2.6) comprises: if the counting result of the data packet pkt is greater than the preset elephant flow counting threshold, determining that the flow to which pkt belongs is an elephant flow and clearing the data at position index Index in the hash table; otherwise, determining that the flow to which pkt belongs is not an elephant flow.
Optionally, the elephant flow storage queue in step 3) is a circular queue of length LenQueue, consisting of a plurality of queue units connected end to end. The elephant flow storage queue has a head pointer HeadP and a tail pointer TailP: HeadP points to the position of the element that joined the queue earliest, and TailP points to the position of the element that joined the queue most recently. The elephant flow storage queue is initialized to be empty, with the initial head pointer HeadP and tail pointer TailP equal.
Optionally, the storing based on the elephant flow storage queue in step 3) comprises:
3.1) Judging whether the five-tuple information EleQuintuple of an elephant flow has been received; if so, proceeding to the next step; otherwise, repeating step 3.1);
3.2) Judging whether the queue unit pointed to by the tail pointer TailP is empty; if it is not empty, proceeding to the next step; otherwise, jumping to step 3.5);
3.3) Clearing the queue unit pointed to by the tail pointer TailP;
3.4) Updating the head pointer HeadP according to HeadP = (HeadP + 1) % LenQueue, where % is the modulo operation;
3.5) Storing the five-tuple information EleQuintuple of the elephant flow in the queue unit pointed to by the tail pointer TailP;
3.6) Updating the tail pointer TailP according to TailP = (TailP + 1) % LenQueue; end.
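Steps 3.2) to 3.6) amount to a fixed-size circular buffer that overwrites its oldest entry when full. A minimal sketch follows; the class name `ElephantQueue` is hypothetical, and an empty queue unit is represented by `None`, which is an assumption of this illustration.

```python
class ElephantQueue:
    """Circular queue of elephant-flow five-tuples that overwrites the oldest entry."""

    def __init__(self, len_queue):
        self.len_queue = len_queue            # LenQueue
        self.units = [None] * len_queue       # None marks an empty queue unit
        self.head = 0                         # HeadP
        self.tail = 0                         # TailP

    def store(self, ele_quintuple):
        if self.units[self.tail] is not None:               # 3.2: unit occupied
            self.units[self.tail] = None                    # 3.3: clear it
            self.head = (self.head + 1) % self.len_queue    # 3.4: advance HeadP
        self.units[self.tail] = ele_quintuple               # 3.5: store the flow
        self.tail = (self.tail + 1) % self.len_queue        # 3.6: advance TailP


queue = ElephantQueue(3)
for flow in (b"a", b"b", b"c", b"d"):
    queue.store(flow)
# The fourth store overwrote b"a": units == [b"d", b"b", b"c"], head == 1, tail == 1
```

Because the queue never grows, its memory footprint is fixed at LenQueue units regardless of how many elephant flows are detected.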
In addition, the invention also provides a system for rapidly detecting the elephant flow based on the probability sampling, which comprises an input module with at least one input port and a corresponding input queue, an output module with at least one output port and a corresponding output queue, and a data forwarding controller, wherein the data forwarding controller is respectively connected with the input module and the output module, and is programmed or configured to execute the steps of the method for rapidly detecting the elephant flow based on the probability sampling.
In addition, the invention also provides a computer-readable storage medium which stores a computer program implementing the above method for rapidly detecting elephant flows based on probability sampling.
Compared with the prior art, the invention has the following advantages:
1. In the method, the packet forwarding module sends the header five-tuple information of each passing data packet and the interface queue occupancy ratio; this information is received, the data packets are counted by a probability sampling method, and elephant flows are detected based on the packet counting result; the detected elephant flow information is stored in the elephant flow storage queue. The invention thereby detects elephant flows while solving the problems of poor timeliness and high communication cost of elephant flow detection in SDN scenarios, as well as the high memory cost of detection methods based on queue snapshots; it greatly improves the speed of elephant flow detection and reduces the resource overhead of the data switching plane.
2. The method can be deployed on programmable switches, smart network cards, commercial switch chips, and various other network data forwarding hardware, and therefore has good generality.
Drawings
FIG. 1 is a schematic diagram of a prior art programmable switch architecture.
FIG. 2 is a schematic diagram of the basic process flow of a method according to an embodiment of the present invention.
FIG. 3 is a schematic structural diagram of a programmable switch according to an embodiment of the present invention.
Detailed Description
Embodiment one:
To illustrate the details of the method and system for rapidly detecting elephant flows based on probability sampling more clearly, the following description takes deployment on the data plane of a programmable switch as an example.
As shown in FIG. 2, the method for rapidly detecting elephant flows based on probability sampling in this embodiment includes:
1) The packet forwarding module sends the header five-tuple information of each passing data packet and the interface queue occupancy ratio;
2) The header five-tuple information and the interface queue occupancy ratio sent by the packet forwarding module are received, the data packets are counted by a probability sampling method, and elephant flows are detected based on the packet counting result;
3) The detected elephant flow information is stored in the elephant flow storage queue.
In this embodiment, the processing step of the packet forwarding module for any passing packet pkt in step 1) includes:
1.1) Reading the header five-tuple information Quintuple of the passing data packet pkt from the queue corresponding to the input interface InInt, where Quintuple is a binary string formed by concatenating the source IP address srcIP, the destination IP address dstIP, the source port number srcPort, the destination port number dstPort, and the protocol number protocol;
1.2) Querying the hardware forwarding table with the header five-tuple information Quintuple of pkt to obtain the forwarding output interface OutInt of pkt;
1.3) Copying pkt from the input queue of interface InInt to the output queue OutQueue of the forwarding output interface OutInt, and obtaining the number CurrentNum of data packets currently in OutQueue;
1.4) Dividing the number CurrentNum of data packets currently in OutQueue by the total length Length of OutQueue to obtain the interface queue occupancy ratio ORatio;
1.5) Outputting the header five-tuple information Quintuple of pkt and the interface queue occupancy ratio ORatio.
In this way, the interface queue occupancy ratio ORatio can be extracted quickly, providing the basic data for counting data packets by probability sampling in step 2) and for detecting elephant flows based on the packet counting result.
It should be noted that the packet forwarding module is a core module of programmable switches, smart network cards, commercial switch chips, and various other network data forwarding hardware. In this embodiment, step 1) extends the existing packet forwarding module: on top of its existing functions, it adds the extraction and transmission of the packet header five-tuple information and the interface queue occupancy ratio.
In this embodiment, step 2) includes:
2.1) Judging whether the header five-tuple information Quintuple of a data packet pkt and the interface queue occupancy ratio ORatio have been received; if so, proceeding to step 2.2); otherwise, repeating step 2.1);
2.2) If the interface queue occupancy ratio ORatio is less than or equal to the preset minimum threshold Min_th, the data packet pkt is not counted; if ORatio is greater than Min_th and less than or equal to the preset maximum threshold Max_th, pkt is counted with probability p; if ORatio is greater than Max_th, pkt is counted directly; elephant flows are then detected based on the packet counting result. Step 2.2) thus realizes a packet counting method based on probability sampling that adapts to the traffic volume. When the interface queue occupancy ratio exceeds the maximum threshold, that is, when the link load is heavy, the flow counting module counts every data packet. When the interface queue occupancy ratio is between the minimum threshold and the maximum threshold, that is, when the link load is normal, the flow counting module counts data packets with probability p. The magnitude of p depends on the number of consecutive uncounted data packets while the occupancy ratio is between the two thresholds: the more consecutive data packets go uncounted, the larger p becomes, so that even though the occupancy ratio remains between the thresholds, the probability that the flow counting module counts a data packet increases. When the interface queue occupancy ratio does not exceed the minimum threshold, that is, when the link load is light, the flow counting module does not count data packets.
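The dependence of p on the number of consecutive uncounted data packets can be made explicit with a short calculation. It uses the counting rule p = q/(1 − count · q) defined in step 2.2.3) of this embodiment and assumes, for simplicity, that q stays constant while the occupancy ratio remains between the two thresholds. The symbols p_k (the value of p when count = k) and G (the number of data packets up to and including the next counted one) are introduced here for illustration only, and the forced count after Length consecutive misses is ignored.

```latex
% Counting probability after k consecutive uncounted packets
p_k = \frac{q}{1 - kq}, \qquad 0 \le (k+1)\,q < 1

% p_k grows with k
p_{k+1} - p_k = \frac{q^2}{(1 - kq)\,\bigl(1 - (k+1)q\bigr)} > 0

% Probability that exactly the n-th packet is the next one counted;
% the product telescopes because 1 - p_k = \frac{1 - (k+1)q}{1 - kq}
\Pr[G = n] = p_{n-1} \prod_{k=0}^{n-2} (1 - p_k)
           = \frac{q}{1 - (n-1)q} \cdot \bigl(1 - (n-1)q\bigr) = q

% Upper bound with Max_q = 1/(1 + Length) and k <= Length
q \le \frac{1}{1 + \mathit{Length}}
\;\Rightarrow\;
1 - kq \ge \frac{1}{1 + \mathit{Length}} \ge q
\;\Rightarrow\;
p_k \le 1
```

Under these assumptions every gap length is equally likely, so counted packets are spread evenly over the traffic instead of arriving in bursts, and the value Max_q = 1/(1 + Length) used in this embodiment keeps p within [0, 1].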
In this embodiment, step 2.1) is preceded by the following initialization steps: the number of uncounted data packets count is initialized to 0, its value range being [0, Length]; the preset minimum threshold Min_th and maximum threshold Max_th of the interface queue occupation ratio ORatio are initialized; and the maximum value Max_q of the real-time counting probability q is initialized. In this embodiment, to ensure that the final counting probability p lies in the range [0,1], the maximum value Max_q of the real-time counting probability q is set to 1/(1 + Length): since count <= Length and q <= Max_q, the denominator 1 - count × q is at least 1/(1 + Length), which is no smaller than q, so p = q/(1 - count × q) cannot exceed 1. Step 2.2) comprises:
2.2.1) Initialize the position Index of the flow to which the data packet pkt belongs in the hash table to 0, and initialize the flow count information CountUpdate of that flow to 0. The hash table is used to count the number of data packets of each flow; each element in the hash table comprises three fields, namely a flow quintuple information field, a count field and a filling time field, and the flow quintuple information field comprises the source IP address, destination IP address, source port number, destination port number and protocol number of the flow;
2.2.2) If the interface queue occupation ratio ORatio is greater than the preset minimum threshold Min_th and less than or equal to the preset maximum threshold Max_th, go to step 2.2.3); if ORatio is greater than Max_th, go to step 2.2.4); if ORatio is less than or equal to Min_th, go to step 2.2.5);
2.2.3) Calculate the real-time counting probability q according to q = Max_q × (ORatio - Min_th)/(Max_th - Min_th), where Max_q is the maximum value of the real-time counting probability q, so that q lies in the range [0, Max_q]; calculate the counting probability p according to p = q/(1 - count × q), where count is the number of uncounted data packets, so that p lies in the range [0,1]; multiply the counting probability p by the preset upper bound of the random number range to obtain the threshold m, and generate a random number rand; if the random number rand is smaller than the threshold m, set the number of uncounted data packets count to 0, count the data packet pkt, store the information of the flow to which the data packet pkt belongs in the hash table, and go to step 2.2.6); otherwise, add 1 to count; if count after adding 1 is greater than the total length Length of the out-queue, set count to 0, count the data packet pkt, store the information of the flow to which the data packet pkt belongs in the hash table, and go to step 2.2.6); otherwise, that is, if count after adding 1 is less than or equal to Length, go to step 2.2.5) and do not count the data packet;
2.2.4) Set the number of uncounted data packets count to 0, count the data packet pkt, and store the information of the flow to which the data packet pkt belongs in the hash table; go to step 2.2.6);
2.2.5) Do not count the data packet pkt; set the position Index of the flow to which the data packet pkt belongs in the hash table to 0, and set the flow count information CountUpdate of that flow to 0; go to step 2.2.6);
2.2.6) Detect the elephant flow based on the counting result of the data packet pkt.
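The counting decision of steps 2.2.2) to 2.2.5) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the class name SamplingCounter, the method names and the injectable rng parameter are hypothetical, the random number is assumed to be an integer drawn from 1 to 100, and the comparison rand <= m follows step 3.2.4.1.4 of the embodiment described later.

```python
import random


class SamplingCounter:
    """Per-packet counting decision of steps 2.2.2) to 2.2.5) (illustrative)."""

    def __init__(self, length, min_th, max_th, rng=None):
        self.length = length                 # total out-queue length (Length)
        self.min_th = min_th                 # Min_th
        self.max_th = max_th                 # Max_th
        self.max_q = 1.0 / (1 + length)      # Max_q = 1/(1 + Length)
        self.count = 0                       # consecutive uncounted packets
        self.rng = rng or random.Random()    # hypothetical: injectable for tests

    def probability(self, oratio):
        """Counting probability p for an ORatio between the two thresholds."""
        q = self.max_q * (oratio - self.min_th) / (self.max_th - self.min_th)
        return q / (1 - self.count * q)

    def should_count(self, oratio):
        """Return True if the packet is counted, False otherwise."""
        if oratio <= self.min_th:            # light load: not counted
            return False
        if oratio > self.max_th:             # heavy load: always counted
            self.count = 0
            return True
        m = self.probability(oratio) * 100   # threshold m = p * 100
        rand = self.rng.randint(1, 100)      # assumed integer in 1..100
        if rand <= m:                        # comparison as in step 3.2.4.1.4
            self.count = 0
            return True
        self.count += 1
        if self.count > self.length:         # more than Length consecutive misses
            self.count = 0
            return True
        return False
```

Because count is reset as soon as it exceeds Length, the sketch never leaves more than Length consecutive packets uncounted while the occupation ratio stays between the two thresholds.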
In this embodiment, the step of storing the flow information to which the data packet pkt belongs in the hash table includes:
S1) Acquire the current time currenttime; using the preset first hash function, calculate the position index Index1 of the flow to which the data packet pkt belongs in the hash table, which can be expressed as:
Index1=Hash(Quintuple,hashA),
where Hash is the hash function, Quintuple is the header quintuple information of the data packet pkt, and hashA is the parameter of the first hash function;
if the position corresponding to the position index Index1 in the hash table is empty, insert the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at that position; set the position Index of the flow to which the data packet pkt belongs in the hash table to Index1, and set the flow count information CountUpdate of that flow to 1; go to step 2.2.6); if the position corresponding to the position index Index1 in the hash table is not empty, go to the next step;
S2) Extract the flow quintuple information QuintupleA, the flow count information CountA and the filling time timeA stored at the position index Index1; if QuintupleA is equal to the header quintuple information Quintuple of the data packet pkt, add 1 to the flow count information CountA stored at the position index Index1, set the position Index of the flow to which the data packet pkt belongs in the hash table to Index1, set the flow count information CountUpdate of that flow to the new value of CountA, and go to step S3); otherwise, go to step S4);
S3) Determine whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if so, clear the data at the position index Index1 in the hash table, then insert the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at the position index Index1, set the position Index of the flow to which the data packet pkt belongs in the hash table to Index1, set the flow count information CountUpdate of that flow to 1, and go to step 2.2.6); otherwise, go directly to step 2.2.6);
S4) Using the preset second hash function, calculate the position index Index2 of the flow to which the data packet pkt belongs in the hash table, which can be expressed as:
Index2=Hash(Quintuple,hashB),
where Hash is the hash function, Quintuple is the header quintuple information of the data packet pkt, and hashB is the parameter of the second hash function;
if the position corresponding to the position index Index2 in the hash table is empty, insert the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at that position; set the position Index of the flow to which the data packet pkt belongs in the hash table to Index2, and set the flow count information CountUpdate of that flow to 1; go to step 2.2.6); if the position corresponding to the position index Index2 in the hash table is not empty, go to the next step;
S5) Extract the flow quintuple information QuintupleB, the flow count information CountB and the filling time timeB stored at the position index Index2; if QuintupleB is equal to the header quintuple information Quintuple of the data packet pkt, add 1 to the flow count information CountB stored at the position index Index2, set the position Index of the flow to which the data packet pkt belongs in the hash table to Index2, set the flow count information CountUpdate of that flow to the new value of CountB, and go to step S6); otherwise, go to step S7);
S6) Determine whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if so, clear the data at the position index Index2 in the hash table, then insert the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at the position index Index2, set the position Index of the flow to which the data packet pkt belongs in the hash table to Index2, set the flow count information CountUpdate of that flow to 1, and go to step 2.2.6); otherwise, go directly to step 2.2.6);
S7) Determine whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if so, clear the data at the position index Index1 in the hash table, then insert the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at the position index Index1, set the position Index of the flow to which the data packet pkt belongs in the hash table to Index1, set the flow count information CountUpdate of that flow to 1, and go to step 2.2.6); otherwise, go to step S8);
S8) Determine whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if so, clear the data at the position index Index2 in the hash table, then insert the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at the position index Index2, set the position Index of the flow to which the data packet pkt belongs in the hash table to Index2, set the flow count information CountUpdate of that flow to 1, and go to step 2.2.6); otherwise, go to step S9);
S9) Calculate the minimum value minCount = Min{CountA, CountB} of the flow count information CountA and CountB, determine the corresponding position index minIndex in the hash table according to minCount, and obtain the quintuple information minQuintuple, the flow count information minCount and the filling time minTime stored at the position index minIndex;
S10) Using the preset first hash function, calculate the position index minIndex1 of the quintuple information minQuintuple in the hash table, which can be expressed as:
minIndex1=Hash(minQuintuple,hashA),
where Hash is the hash function, minQuintuple is the quintuple information stored at the position index minIndex, and hashA is the parameter of the first hash function;
if the position corresponding to the position index minIndex1 in the hash table is empty, insert the quintuple information minQuintuple, the flow count information minCount and the filling time minTime at that position, and go to step S12); otherwise, go to step S11);
S11) Using the preset second hash function, calculate the position index minIndex2 of the quintuple information minQuintuple in the hash table; if the position corresponding to the position index minIndex2 in the hash table is empty, insert the quintuple information minQuintuple, the flow count information minCount and the filling time minTime at that position, and go to step S12); otherwise, go to step S12) directly;
When the first hash function and the second hash function are applied to minQuintuple, one of the two resulting index positions must be equal to the current storage position minIndex, so steps S9) to S11) essentially check whether the index position calculated for minQuintuple by the other hash function is empty. If that index position is empty, the minQuintuple information in the hash table is moved to the new empty position. Otherwise, if neither of the index positions calculated for minQuintuple by the first and second hash functions is empty, the minQuintuple information stored in the hash table is deleted.
S12) Clear the data at the position index minIndex in the hash table (at this point, minQuintuple has either been inserted at a new position or cannot be relocated because it has no empty position; in both cases its original position minIndex must be cleared so that the counting information of the data packet pkt can be inserted), and go to step S13);
S13) Insert the quintuple information Quintuple of the flow to which the data packet pkt belongs, the flow count information 1 and the filling time currenttime at the position index minIndex in the hash table, set the position Index of the flow to which the data packet pkt belongs in the hash table to minIndex, and set the flow count information CountUpdate of that flow to 1; go to step 2.2.6).
In this embodiment, step 2.2.6) includes: if the counting result of the data packet pkt is greater than the preset elephant flow counting threshold, the flow to which the data packet pkt belongs is determined to be an elephant flow, and the flow counting module clears the data at the position Index in the hash table; otherwise, the flow to which the data packet pkt belongs is determined not to be an elephant flow.
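Steps S1) to S13) and step 2.2.6) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the class FlowTable and its method names are hypothetical, SHA-256 with the byte strings b"hashA" and b"hashB" stands in for the unspecified Hash(Quintuple, hashA) and Hash(Quintuple, hashB), slot indices are 0-based, and the quintuple is assumed to be a bytes object.

```python
import hashlib


class FlowTable:
    """Two-hash flow count table; slots hold [quintuple, count, fill_time]."""

    def __init__(self, len_hash, tmax, ele_max):
        self.slots = [None] * len_hash
        self.tmax = tmax          # Tmax: timeout of a stored entry
        self.ele_max = ele_max    # elephant flow counting threshold

    def _hash(self, quintuple, param):
        # Stand-in for Hash(Quintuple, hashA/hashB); SHA-256 is an assumption.
        digest = hashlib.sha256(param + quintuple).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def _fill(self, index, quintuple, now):
        self.slots[index] = [quintuple, 1, now]
        return index, 1

    def _hit(self, index, now):
        # S2)/S5): increment; S3)/S6): restart the entry if it has timed out.
        slot = self.slots[index]
        slot[1] += 1
        if now - slot[2] > self.tmax:
            return self._fill(index, slot[0], now)
        return index, slot[1]

    def store(self, quintuple, now):
        """Count one packet; return (Index, CountUpdate)."""
        i1 = self._hash(quintuple, b"hashA")
        if self.slots[i1] is None:                       # S1)
            return self._fill(i1, quintuple, now)
        if self.slots[i1][0] == quintuple:               # S2)-S3)
            return self._hit(i1, now)
        i2 = self._hash(quintuple, b"hashB")
        if self.slots[i2] is None:                       # S4)
            return self._fill(i2, quintuple, now)
        if self.slots[i2][0] == quintuple:               # S5)-S6)
            return self._hit(i2, now)
        for i in (i1, i2):                               # S7)-S8)
            if now - self.slots[i][2] > self.tmax:
                return self._fill(i, quintuple, now)
        # S9): the entry with the smaller count is displaced.
        victim = i1 if self.slots[i1][1] <= self.slots[i2][1] else i2
        entry = self.slots[victim]
        for param in (b"hashA", b"hashB"):               # S10)-S11)
            alt = self._hash(entry[0], param)
            if self.slots[alt] is None:
                self.slots[alt] = entry                  # relocate the entry
                break
        return self._fill(victim, quintuple, now)        # S12)-S13)

    def detect(self, index, count_update):
        """Step 2.2.6): report an elephant flow and clear its slot."""
        if count_update > self.ele_max:
            self.slots[index] = None
            return True
        return False
```

A caller would pass the pair returned by store to detect; when detect returns True, the quintuple would be handed to the elephant flow storage module.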
In this embodiment, the elephant flow storage queue in step 3) is a circular queue of length LenQueue. The circular queue comprises a plurality of queue units connected end to end, each used to store the quintuple information of one flow. The elephant flow storage queue has a head pointer HeadP and a tail pointer TailP: the head pointer HeadP points to the position of the element that joined the queue earliest, and the tail pointer TailP points to the position of the element that joined the queue most recently. The elephant flow storage queue is initialized to empty, with the initial head pointer HeadP and tail pointer TailP equal. The length limit of the circular queue also limits the validity period of elephant flow detection results, because an elephant flow detected earlier may already have ended and then has no reference value for network traffic scheduling; limiting the length of the circular queue also saves memory space.
In this embodiment, the step of storing based on the elephant flow storage queue in step 3) includes:
3.1) Determine whether the quintuple information EleQuintuple of an elephant flow has been received; if so, execute the next step; otherwise, repeat step 3.1);
3.2) Determine whether the queue unit pointed to by the tail pointer TailP is empty; if it is not empty, execute the next step; otherwise, go to step 3.5);
3.3) Clear the queue unit pointed to by the tail pointer TailP;
3.4) Update the head pointer HeadP according to HeadP = (HeadP + 1) % LenQueue, where % is the modulo operation;
3.5) Store the quintuple information EleQuintuple of the elephant flow in the queue unit pointed to by the tail pointer TailP;
3.6) Update the tail pointer TailP according to TailP = (TailP + 1) % LenQueue; end.
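Steps 3.2) to 3.6) can be sketched as follows. The class name ElephantQueue and its attribute names are hypothetical, and an empty queue unit is represented by None. Following steps 3.5) and 3.6), the tail index in this sketch designates the next unit to be written.

```python
class ElephantQueue:
    """Circular elephant flow storage queue of length LenQueue (illustrative)."""

    def __init__(self, len_queue):
        self.units = [None] * len_queue   # None marks an empty queue unit
        self.head = 0                     # HeadP
        self.tail = 0                     # TailP

    def store(self, ele_quintuple):
        """Store one detected elephant flow, overwriting the oldest if full."""
        n = len(self.units)
        if self.units[self.tail] is not None:   # 3.2) unit occupied: queue is full
            self.units[self.tail] = None        # 3.3) clear the oldest entry
            self.head = (self.head + 1) % n     # 3.4) advance HeadP
        self.units[self.tail] = ele_quintuple   # 3.5) store EleQuintuple
        self.tail = (self.tail + 1) % n         # 3.6) advance TailP
```

With a queue of three units, storing a fourth flow overwrites the first one, so only the three most recently detected elephant flows remain available.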
The embodiment also provides a system for rapidly detecting an elephant flow based on probability sampling, which comprises an input module with at least one input port and a corresponding input queue, an output module with at least one output port and a corresponding output queue, and a data forwarding controller, wherein the data forwarding controller is respectively connected with the input module and the output module, and is programmed or configured to execute the steps of the elephant flow rapid detection method based on probability sampling.
As shown in fig. 3, the data forwarding controller in this embodiment includes: the data packet forwarding module is used for sending the five-tuple information of the data packet head of the passing data packet and the queue occupation ratio; the flow counting module is used for receiving the quintuple information of the head part of the data packet and the occupation proportion of the interface queue sent by the data packet forwarding module, counting the data packet by adopting a probability sampling method and detecting the elephant flow based on the counting result of the data packet; the elephant flow storage module is used for storing the detected elephant flow information based on the elephant flow storage queue; the input end of the flow counting module is connected with the data packet forwarding module, the output end of the flow counting module is connected with the elephant flow storage module, and the elephant flow storage module is connected with the elephant flow storage queue.
The implementation method of the elephant flow rapid detection system based on probability sampling in the embodiment comprises the following steps:
In the first step, compared with the data forwarding controller in a conventional switch data plane, a flow counting module, an elephant flow storage module, a hash table and an elephant flow storage queue are added to the data forwarding controller, and the data packet forwarding module is modified. The hash table is used to count the number of data packets of each flow, and each element in the hash table comprises three fields, namely a flow quintuple information field, a count field and a filling time field. The flow quintuple information field is formed from the source IP address, destination IP address, source port number, destination port number and protocol number of the flow. The elephant flow storage queue is a circular queue that stores elephant flow information; each element of the queue stores the quintuple information of one flow. The flow counting module is connected with the flow count table, the data packet forwarding module and the elephant flow storage module. The input of the flow counting module is the packet header quintuple information and the queue occupation ratio sent by the data packet forwarding module. The flow counting module can read and write the hash table. The output of the flow counting module is the quintuple information of the detected elephant flow. The elephant flow storage module is connected with the flow counting module and the elephant flow storage queue. The elephant flow storage module receives the elephant flow quintuple information sent by the flow counting module, and can read and write the elephant flow storage queue. The data packet forwarding module is connected with the switch interface queues and is responsible for forwarding data packets. When the data packet forwarding module copies a data packet to the out-queue of a switch interface, it records the occupancy of that out-queue and sends the packet header quintuple information and the queue occupancy to the flow counting module.
In the second step, the programmable switch begins operation and the flow count table is initialized to null.
In the third step, the data packet forwarding module, the flow counting module and the elephant flow storage module work in parallel and cooperate to complete elephant flow detection and storage.
The data packet forwarding module forwards data packets according to the following procedure and sends the header quintuple information of each data packet to the flow counting module: the data packet forwarding module reads the header quintuple information of a data packet from the in-queue of an interface and queries the hardware forwarding table to obtain the forwarding out-interface of the data packet. It copies the data packet to the out-queue of the forwarding out-interface and obtains the number of data packets in that out-queue. It then calculates the occupation ratio of the out-queue and sends the header quintuple information of the data packet and the out-queue occupation ratio to the flow counting module. The specific method comprises the following steps:
3.1.1 The data packet forwarding module reads the header quintuple information of the data packet pkt from the in-queue of an interface InInt and records it as a binary string Quintuple, where Quintuple is the concatenation of srcIP, dstIP, srcPort, dstPort and protocol, that is, the source IP address, destination IP address, source port number, destination port number and protocol number. Execute 3.1.2;
3.1.2 The data packet forwarding module queries the hardware forwarding table according to the header quintuple information Quintuple of the data packet pkt to obtain the forwarding out-interface OutInt of the data packet pkt. Execute 3.1.3;
3.1.3 The data packet forwarding module copies the data packet pkt from the in-queue of the interface InInt to the out-queue of the interface OutInt and obtains the number of data packets in the out-queue of the interface OutInt; the out-queue of the interface OutInt is denoted OutQueue, and the number of data packets it currently holds is denoted CurrentNum. Execute 3.1.4;
3.1.4 The data packet forwarding module calculates the out-queue occupation ratio of the interface OutInt, denoted ORatio. ORatio is equal to the number of data packets CurrentNum present in the out-queue OutQueue of the interface OutInt divided by the total length Length of OutQueue. Execute 3.1.5;
3.1.5 The data packet forwarding module sends the header quintuple information Quintuple of the data packet pkt and the occupation ratio ORatio of the out-queue OutQueue to the flow counting module. Go to 3.1.1.
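The two quantities that the data packet forwarding module sends in step 3.1.5 can be sketched as follows. The function names make_quintuple and occupancy_ratio are hypothetical, and the byte layout is an assumption: the text only states that Quintuple is the concatenation of the five fields, so the sketch assumes IPv4 addresses, 16-bit ports and an 8-bit protocol number in network byte order.

```python
import ipaddress
import struct


def make_quintuple(src_ip, dst_ip, src_port, dst_port, protocol):
    """Concatenate srcIP, dstIP, srcPort, dstPort and protocol into bytes."""
    return (
        ipaddress.IPv4Address(src_ip).packed        # 4 bytes (IPv4 assumed)
        + ipaddress.IPv4Address(dst_ip).packed      # 4 bytes
        + struct.pack("!HHB", src_port, dst_port, protocol)  # 2 + 2 + 1 bytes
    )


def occupancy_ratio(current_num, length):
    """ORatio = CurrentNum / Length for the out-queue OutQueue."""
    return current_num / length


# Example: a TCP flow (protocol 6) and an out-queue holding 30 of 120 packets.
quintuple = make_quintuple("10.0.0.1", "10.0.0.2", 1234, 80, 6)
oratio = occupancy_ratio(30, 120)
```

A fixed-width encoding keeps the quintuple of every flow the same size, which makes it convenient to use as a hash table key.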
The flow counting module counts the data packets by adopting a probability sampling method in the following processes, writes counting results into a hash table, and sends elephant flow detection results to the elephant flow storage module in real time:
3.2.1 The flow counting module receives the packet header quintuple information and the queue occupation ratio sent by the data packet forwarding module, and decides whether to count the data packet according to the queue occupation ratio. The specific method comprises the following steps:
3.2.1.1 If the queue occupation ratio is less than or equal to the preset minimum threshold, the data packet is not counted;
3.2.1.2 If the queue occupation ratio is greater than the preset minimum threshold and less than or equal to the preset maximum threshold, the data packet is counted with probability p;
3.2.1.3 If the queue occupation ratio is greater than the preset maximum threshold, the data packet is counted directly.
Further, the process of counting the data packets by the flow counting module is as follows:
the flow counting module initializes the hash table to null. Each element in the hash table contains three fields, namely a flow five tuple information field, a count field and a fill-in time field. The flow counting module first enters a first round of hashing.
The first round of hashing:
First, the flow counting module uses hash function 1 (the first hash function) to calculate a hash value for the packet header quintuple information, namely the index index1 in the hash table. If the position index1 in the hash table is empty, the flow quintuple information, the count information '1' and the filling time are written to position index1, and the stored position index and count value are recorded. If position index1 is not empty and the packet header quintuple information is equal to the flow quintuple information stored at position index1, the flow counting module updates the flow count information at position index1 and records the stored position index and count value. Corresponding to the foregoing step S3), whenever the flow count information at position index1 is updated in this way, the module also checks whether position index1 has timed out; if it has, the flow count information is reset to 1. If the first round of hashing fails, the second round of hashing is entered.
The second round of hashing:
The flow counting module uses hash function 2 (the second hash function) to calculate a hash value for the packet header quintuple information, namely the index index2 in the hash table. If the position index2 in the hash table is empty, the flow quintuple information, the count information '1' and the filling time are written to position index2, and the stored position index and count value are recorded. If position index2 is not empty and the packet header quintuple information is equal to the flow quintuple information stored at position index2, the flow count information at position index2 is updated and the stored position index and count value are recorded. As in the first round of hashing, a timeout check is also required. If the quintuple information stored at neither index1 nor index2, the two indices obtained from the two hash functions, is equal to the packet header quintuple information, the module then checks whether either of the two positions has timed out.
If the second round of hashing fails, the flow counting module checks the information stored at the two positions one by one for timeout, that is, whether the difference between the current time and the filling time exceeds the preset time threshold: if the information stored at one position has timed out, it is deleted and replaced with the new flow quintuple information, count and filling time, and the stored position index and count value are recorded. If neither position has timed out, the flow counting module proceeds to the third round of hashing.
The third round of hashing:
In the third round of hashing, the flow counting module first applies the two different hash functions to the information stored at position index1; if one of the two resulting index positions is empty, the information stored at position index1 is copied to the empty position, the new flow quintuple information, the count information '1' and the filling time are written to position index1, and the stored position index and count value are recorded. Otherwise, the flow counting module applies the two different hash functions to the information stored at position index2; if one of the two resulting index positions is empty, the information stored at position index2 is likewise copied to the empty position, the new flow quintuple information, the count information '1' and the filling time are written to position index2, and the stored position index and count value are recorded. If none of the positions obtained by applying the two hash functions to the information stored at index1 and index2 is empty in the hash table, the flow counting module enters the occupation process. In the occupation process, the flow counting module first compares the count information stored at positions index1 and index2 in the hash table, selects the position with the smaller count, and clears the information stored there. It then inserts the new flow quintuple information, the flow count information '1' and the filling time at the cleared position, and records the stored position index and count value. After counting a data packet, the flow counting module determines whether the flow to which the data packet belongs is an elephant flow from the position index and count value recorded during counting.
Specifically, if the count value is greater than the elephant flow counting threshold, the flow counting module determines that the flow to which the data packet belongs is an elephant flow, sends the header quintuple information of the data packet to the elephant flow storage module, and clears the entry at the recorded position index in the hash table. Otherwise, the flow counting module determines that the flow to which the data packet belongs is not an elephant flow. The specific method comprises the following steps:
3.2.1 The flow counting module initializes the uncounted data packet number variable count = 0, whose value range is [0, Length]. The flow counting module initializes the minimum threshold Min_th and the maximum threshold Max_th of the interface queue occupation ratio. The flow counting module initializes the timeout Tmax of the flow count information in the hash table. The flow counting module initializes the maximum value of the real-time counting probability q as Max_q = 1/(1 + Length). The flow counting module initializes the hash table to empty; the length of the hash table is LenHash, and each element of the hash table contains flow quintuple information, flow count information and a filling time. The flow counting module sets the first hash function to Hash(data, hashA) and the second hash function to Hash(data, hashB), where data denotes the packet header quintuple information, hashA and hashB are the parameters of the first and second hash functions respectively, and the results of Hash(data, hashA) and Hash(data, hashB) are positive integers in the range [1, LenHash]. Execute 3.2.2.
3.2.2 The flow counting module determines whether the header quintuple information Quintuple of the data packet pkt and the interface queue occupation ratio ORatio sent by the data packet forwarding module have been received. If so, execute 3.2.3; otherwise, continue to execute 3.2.2.
3.2.3 The flow counting module lets Index denote the position index in the hash table of the flow to which the data packet pkt belongs and initializes Index to 0; it lets CountUpdate denote the flow count information of the flow to which the data packet pkt belongs and initializes CountUpdate to 0. Execute 3.2.4.
3.2.4 If Min_th < ORatio < Max_th, execute 3.2.4.1; otherwise go to 3.2.5.
3.2.4.1 The flow counting module counts the data packet pkt with probability p. The specific method comprises the following steps:
3.2.4.1.1 The flow counting module calculates the real-time counting probability q as q = Max_q × (ORatio - Min_th)/(Max_th - Min_th), so that q lies in the range [0, Max_q]. Execute 3.2.4.1.2.
3.2.4.1.2 The flow counting module calculates the counting probability p as p = q/(1 - count × q). The larger the uncounted data packet number variable count, the higher the counting probability p, so the probability of counting pkt rises after several consecutive data packets have gone uncounted. Meanwhile, the value range [0, Max_q] of q ensures that p lies in the range [0,1]. Execute 3.2.4.1.3.
3.2.4.1.3 The flow counting module calculates p × 100 and denotes the result as m. The flow counting module generates a random number rand between 1 and 100. Execute 3.2.4.1.4.
3.2.4.1.4, if rand < = m, the flow count module sets the countless packet number variable count =0. The flow counting module counts the data packets pkt, and executes 3.2.4.1.5. Otherwise, turn to 3.2.4.1.8.
And 3.2.4.1.5, the flow counting module stores the flow information of the data packet pkt in the hash table.
3.2.4.1.6 now rand > m, the flow count module sets the countless packet number variable count = count +1. Turn 3.2.4.1.7.
3.2.4.1.7 if the number of uncounted packets measures count > Length, then the flow counting module has uncounted consecutive Length packets. The flow counting module needs to count the data packets pkt. Run 3.2.4.1.5.
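The sampling decision in 3.2.4.1.1-3.2.4.1.7 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name should_count and the rng parameter are hypothetical, and resetting count to 0 after a forced count is taken from step 2.2.3 of claim 1 rather than from the step text above.

```python
import random


def should_count(oratio, count, length, min_th, max_th, rng=random):
    """Decide whether to count one packet when Min_th < ORatio < Max_th.

    Returns (counted, new_count), where new_count is the updated number
    of consecutive uncounted packets.
    """
    max_q = 1.0 / (1 + length)                          # Max_q from step 3.2.1
    q = max_q * (oratio - min_th) / (max_th - min_th)   # step 3.2.4.1.1
    p = q / (1.0 - count * q)                           # step 3.2.4.1.2
    m = p * 100                                         # step 3.2.4.1.3
    rand = rng.randint(1, 100)                          # integer in [1, 100]
    if rand <= m:                                       # step 3.2.4.1.4
        return True, 0
    count += 1                                          # step 3.2.4.1.6
    if count > length:                                  # step 3.2.4.1.7
        # Assumption: reset count here, as claim 1 step 2.2.3 does.
        return True, 0
    return False, count
```

Passing a seeded random.Random instance as rng makes a run reproducible, which is convenient when comparing sampling behaviour for different values of Length.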
3.2.5, if ORatio ≥ Max_th at this time, the flow counting module sets the uncounted-packet variable count = 0, directly counts the data packet pkt, and executes 3.2.4.1.5. Otherwise, go to 3.2.6.
3.2.6, at this time ORatio ≤ Min_th, and the flow counting module does not count the data packet pkt. The flow counting module sets the uncounted-packet variable count = 0, sets CountUpdate = 0 and Index = 0, and goes to 3.2.7.
3.2.7, the flow counting module judges whether the counting information CountUpdate of the flow to which the data packet pkt belongs is greater than Elemax. If yes, execute 3.2.7.1; otherwise, go to 3.2.8.
3.2.7.1, the flow counting module determines that the flow to which the data packet pkt belongs is an elephant flow and sends the quintuple information Quintuple of the data packet pkt to the elephant flow storage module. Execute 3.2.7.2.
3.2.7.2, the flow counting module clears the Index position of the hash table and executes 3.2.8.
3.2.8, the flow counting module completes the processing of the data packet pkt and the elephant flow detection process. Through 3.2.4.1.1-3.2.4.1.7, the flow counting module counts data packets by a method based on probability sampling: when the occupation ratio of the interface queue stays in the interval (Min_th, Max_th) for several consecutive packets, the probability that the flow counting module counts a data packet keeps increasing as the number of uncounted packets grows, which avoids the extreme case in which many consecutive data packets go uncounted. The flow counting module completes the detection of the elephant flow through 3.2.7.1-3.2.7.2. Go to 3.2.2.
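The two properties of the counting probability relied on above (p stays within [0, 1] and grows with count) can be checked with a short derivation. It assumes only the definitions given in steps 3.2.1, 3.2.4.1.1 and 3.2.4.1.2, together with 0 ≤ count ≤ Length.

```latex
% Assumes 0 \le count \le Length, as stated for the variable count in step 3.2.1.
\begin{aligned}
0 \le q \le Max_q = \frac{1}{1+Length}
  &\;\Longrightarrow\; count \cdot q \le \frac{Length}{1+Length} \\
  &\;\Longrightarrow\; 1 - count \cdot q \ge \frac{1}{1+Length} > 0 \\
  &\;\Longrightarrow\; 0 \le p = \frac{q}{1 - count \cdot q}
      \le \frac{1}{1+Length}\,(1+Length) = 1, \\
\frac{\partial p}{\partial\, count}
  &= \frac{q^{2}}{(1 - count \cdot q)^{2}} \ge 0 .
\end{aligned}
```

For q > 0 the same formula can be written as p = 1/(1/q − count), so each additional uncounted packet lowers the denominator by one. This is why p rises steadily while packets go uncounted.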
Wherein, the step 3.2.4.1.5 of the flow counting module storing the flow information to which the data packet pkt belongs in the hash table includes:
3.2.4.1.5.1, the flow counting module acquires the current time and records it as currenttime. Execute 3.2.4.1.5.2.
3.2.4.1.5.2, the flow counting module calculates the position of the flow to which the data packet pkt belongs in the hash table using the first hash function: Index1 = Hash(Quintuple, HashA). Execute 3.2.4.1.5.3.
3.2.4.1.5.3, if the Index1 position of the hash table is empty, the flow counting module inserts the header quintuple information Quintuple of the data packet pkt, the flow counting information 1, and the current time currenttime into the Index1 position of the hash table. The flow counting module sets Index = Index1 and CountUpdate = 1, and goes to 3.2.7. Otherwise, go to 3.2.4.1.5.4.
3.2.4.1.5.4, at this time the Index1 position of the hash table is not empty. The flow counting module extracts the information at the Index1 position of the hash table, recording the flow quintuple information as QuintupleA, the flow counting information as CountA, and the filling time as timeA. Execute 3.2.4.1.5.5.
3.2.4.1.5.5, if QuintupleA = Quintuple, the flow counting module adds one to the flow counting information at the Index1 position of the hash table, so that CountA = CountA + 1, CountUpdate = CountA, and Index = Index1, and then goes to 3.2.4.1.5.6. Otherwise, execute 3.2.4.1.5.7.
3.2.4.1.5.6, if currenttime − timeA > Tmax, the flow counting module clears the Index1 position of the hash table, inserts Quintuple, 1, and currenttime at the Index1 position, sets Index = Index1 and CountUpdate = 1, and goes to 3.2.7. Otherwise, go to 3.2.7.
3.2.4.1.5.7, the flow counting module calculates the position of the flow to which the data packet pkt belongs in the hash table using the second hash function: Index2 = Hash(Quintuple, HashB). Execute 3.2.4.1.5.8.
3.2.4.1.5.8, if the Index2 position of the hash table is empty, the flow counting module inserts the header quintuple information Quintuple of the data packet pkt, the flow counting information 1, and the current time currenttime at the Index2 position of the hash table. The flow counting module sets Index = Index2 and CountUpdate = 1, and goes to 3.2.7. Otherwise, go to 3.2.4.1.5.9.
3.2.4.1.5.9, at this time the Index2 position of the hash table is not empty. The flow counting module extracts the information at the Index2 position of the hash table, recording the flow quintuple information as QuintupleB, the flow counting information as CountB, and the filling time as timeB. Execute 3.2.4.1.5.10.
3.2.4.1.5.10, if QuintupleB = Quintuple, the flow counting module adds one to the flow counting information at the Index2 position of the hash table, so that CountB = CountB + 1, CountUpdate = CountB, and Index = Index2, and then goes to 3.2.4.1.5.11. Otherwise, execute 3.2.4.1.5.12.
3.2.4.1.5.11, if currenttime − timeB > Tmax, the flow counting module clears the Index2 position of the hash table, inserts Quintuple, 1, and currenttime at the Index2 position, sets Index = Index2 and CountUpdate = 1, and goes to 3.2.7. Otherwise, go to 3.2.7.
3.2.4.1.5.12, at this time the positions in the hash table obtained from the flow quintuple information Quintuple through the first hash function and the second hash function both hold counting information of other flows. The flow counting module checks whether the other flows at these two positions have timed out, as follows. Execute 3.2.4.1.5.12.1.
3.2.4.1.5.12.1, if currenttime − timeA > Tmax, the flow counting module clears the Index1 position of the hash table, inserts Quintuple, 1, and currenttime at the Index1 position, sets Index = Index1 and CountUpdate = 1, and goes to 3.2.7. Otherwise, go to 3.2.4.1.5.12.2.
3.2.4.1.5.12.2, if currenttime − timeB > Tmax, the flow counting module clears the Index2 position of the hash table, inserts Quintuple, 1, and currenttime at the Index2 position, sets Index = Index2 and CountUpdate = 1, and goes to 3.2.7. Otherwise, execute 3.2.4.1.5.13.
3.2.4.1.5.13, at this time the positions in the hash table obtained from the flow quintuple information Quintuple through the first hash function and the second hash function both hold counting information of other flows, and the storage time of those flows in the hash table does not exceed Tmax. The flow counting module calculates the minimum of CountA and CountB as minCount = Min{CountA, CountB}; the position index of the hash table corresponding to minCount is minIndex. The quintuple information at the minIndex position of the hash table is denoted as minQuintuple, and the filling time is denoted as minTime. The flow counting module reselects a storage location for the element at the minIndex position of the hash table, as follows. Execute 3.2.4.1.5.13.1.
3.2.4.1.5.13.1, the flow counting module calculates the position of the flow quintuple information minQuintuple in the hash table using the first hash function: minIndex1 = Hash(minQuintuple, HashA). If the minIndex1 position in the hash table is empty, the flow counting module inserts minQuintuple, minCount, and minTime into the minIndex1 position of the hash table, clears the minIndex position of the hash table, and goes to 3.2.4.1.5.14. Otherwise, continue with 3.2.4.1.5.13.2.
3.2.4.1.5.13.2, the flow counting module calculates the position of the flow quintuple information minQuintuple in the hash table using the second hash function: minIndex2 = Hash(minQuintuple, HashB). If the minIndex2 position in the hash table is empty, the flow counting module inserts minQuintuple, minCount, and minTime into the minIndex2 position of the hash table, clears the minIndex position of the hash table, and goes to 3.2.4.1.5.14. Otherwise, continue with 3.2.4.1.5.13.3.
3.2.4.1.5.13.3, at this time both storage positions calculated for the flow quintuple information minQuintuple by the two hash functions hold other flows. The flow counting module clears the minIndex position of the hash table, that is, the count of the flow quintuple information minQuintuple is deleted from the hash table. Go to 3.2.4.1.5.14.
3.2.4.1.5.14, the flow counting module inserts Quintuple, 1, and currenttime at the minIndex position of the hash table. The flow counting module sets CountUpdate = 1 and Index = minIndex, and goes to 3.2.7.
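The storage procedure 3.2.4.1.5.1-3.2.4.1.5.14 can be sketched as a two-choice hash table in Python. This is an illustrative sketch under stated assumptions, not the patented implementation: the class and method names (Entry, FlowTable, store) are hypothetical, a keyed BLAKE2b digest stands in for Hash(data, HashA) and Hash(data, HashB), and ties between CountA and CountB are resolved in favour of Index1.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    quintuple: bytes   # flow quintuple information
    count: int         # flow counting information
    time: float        # filling time


class FlowTable:
    def __init__(self, len_hash, tmax):
        self.len_hash = len_hash
        self.tmax = tmax
        self.slots = [None] * (len_hash + 1)   # indices 1..LenHash; slot 0 unused

    def _hash(self, data, param):
        # Assumption: keyed BLAKE2b plays the role of Hash(data, param).
        digest = hashlib.blake2b(data, key=param, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.len_hash + 1

    def _fresh(self, idx, quintuple, now):
        # Insert (Quintuple, 1, currenttime) and report CountUpdate = 1.
        self.slots[idx] = Entry(quintuple, 1, now)
        return idx, 1

    def store(self, quintuple, now):
        """Count one packet of a flow; returns (Index, CountUpdate)."""
        i1 = self._hash(quintuple, b"A")
        i2 = self._hash(quintuple, b"B")
        # Steps .3-.11: try Index1, then Index2.
        for idx in (i1, i2):
            entry = self.slots[idx]
            if entry is None:
                return self._fresh(idx, quintuple, now)
            if entry.quintuple == quintuple:
                entry.count += 1
                if now - entry.time > self.tmax:
                    return self._fresh(idx, quintuple, now)
                return idx, entry.count
        # Steps .12.1-.12.2: both slots hold other flows; reuse a timed-out one.
        for idx in (i1, i2):
            if now - self.slots[idx].time > self.tmax:
                return self._fresh(idx, quintuple, now)
        # Step .13: evict the flow with the smaller count (tie: Index1).
        min_idx = i1 if self.slots[i1].count <= self.slots[i2].count else i2
        victim = self.slots[min_idx]
        # Steps .13.1-.13.3: move the victim to an empty alternative slot, if any.
        for param in (b"A", b"B"):
            alt = self._hash(victim.quintuple, param)
            if self.slots[alt] is None:
                self.slots[alt] = victim
                break
        # Step .14: the new flow takes over the minIndex position.
        return self._fresh(min_idx, quintuple, now)
```

The caller would compare the returned CountUpdate with Elemax, as in step 3.2.7, and clear slots[Index] once an elephant flow has been reported.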
The elephant flow storage module stores the elephant flow information sent by the flow counting module according to the following flow: the elephant flow storage module receives flow quintuple information sent by the flow counting module and stores the flow quintuple information into an elephant flow storage queue, and the method specifically comprises the following steps:
3.3.1, the elephant flow storage module initializes the elephant flow storage queue to be empty. The elephant flow storage queue is a circular queue of length LenQueue. The elephant flow storage module initializes a head pointer HeadP and a tail pointer TailP, with HeadP = TailP. The head pointer HeadP points to the position of the element that was enqueued first. The tail pointer TailP points to the position at which the next element will be added to the circular queue.
3.3.2, the elephant flow storage module judges whether it has received the quintuple information EleQuintuple of an elephant flow sent by the flow counting module. If yes, execute 3.3.3; otherwise, continue to execute 3.3.2.
3.3.3, the elephant flow storage module judges whether the queue unit pointed to by TailP is empty. If it is not empty, execute 3.3.4; otherwise, execute 3.3.6.
3.3.4, the elephant flow storage module empties the queue unit pointed to by TailP. Execute 3.3.5.
3.3.5, the elephant flow storage module sets HeadP = (HeadP + 1) % LenQueue. Execute 3.3.6.
3.3.6, the elephant flow storage module stores the quintuple information EleQuintuple of the elephant flow into the queue unit pointed to by TailP. Execute 3.3.7.
3.3.7, the elephant flow storage module sets TailP = (TailP + 1) % LenQueue. Execute 3.3.8.
3.3.8, through 3.3.2-3.3.7, the elephant flow storage module completes the process of storing the quintuple information of the elephant flow into the circular queue, and then returns to 3.3.2 to continue receiving the quintuple information EleQuintuple of elephant flows sent by the flow counting module.
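Steps 3.3.1-3.3.7 describe a fixed-size circular queue that overwrites its oldest entry when full. A minimal Python sketch follows; the class name ElephantQueue and its attribute names are hypothetical, and None is used here to represent an empty queue unit.

```python
class ElephantQueue:
    """Circular queue of LenQueue units holding elephant-flow quintuples."""

    def __init__(self, len_queue):
        self.len_queue = len_queue
        self.units = [None] * len_queue   # None marks an empty unit
        self.head = 0                     # HeadP: oldest stored element
        self.tail = 0                     # TailP: unit written next

    def store(self, ele_quintuple):
        if self.units[self.tail] is not None:              # 3.3.3: unit occupied
            self.units[self.tail] = None                   # 3.3.4: drop oldest entry
            self.head = (self.head + 1) % self.len_queue   # 3.3.5
        self.units[self.tail] = ele_quintuple              # 3.3.6
        self.tail = (self.tail + 1) % self.len_queue       # 3.3.7
```

With LenQueue = 3, storing four quintuples in a row leaves the three most recent ones in the queue, and HeadP has advanced past the overwritten unit.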
In addition, the embodiment also provides a computer readable storage medium, in which a computer program of the above-mentioned method for rapidly detecting an elephant flow based on probability sampling is stored.
Example two:
The present embodiment is substantially the same as the first embodiment; the main difference is that the first hash function and the second hash function are implemented differently. The first embodiment uses the same hash function with different parameters to calculate the hash values, whereas the present embodiment directly uses two different hash functions to calculate the hash values.
Calculating a position Index1 of the flow to which the data packet pkt belongs in the hash table by using a preset first hash function, which can be expressed as: index1= HashA (Quintuple), where HashA is the first hash function and Quintuple is the header Quintuple information of the packet pkt.
Calculating a position Index2 of the flow to which the data packet pkt belongs in the hash table by using a preset second hash function, which can be expressed as: index2= HashB (Quintuple), where HashB is the second hash function and Quintuple is the header Quintuple information of the packet pkt.
Calculating the position index of the quintuple information minQuintuple in the hash table by using a preset first hash function, wherein the position index is equal to minIndex1 and can be represented as follows: minIndex1= HashA (minQuintuple), where HashA is the first hash function and minQuintuple is the five-tuple information of the location index minIndex.
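The difference between the two embodiments can be illustrated in Python. The sketch below is only an example under assumptions: the patent does not name concrete hash algorithms, so keyed BLAKE2b (for the parameterised form of the first embodiment) and CRC-32 / Adler-32 (for the two distinct functions of this embodiment) are stand-ins, and LEN_HASH, hash_param, hash_a and hash_b are hypothetical names.

```python
import hashlib
import zlib

LEN_HASH = 1024  # hypothetical hash table length LenHash


def hash_param(data: bytes, param: bytes) -> int:
    """First-embodiment style: one function, selected by a parameter."""
    digest = hashlib.blake2b(data, key=param, digest_size=8).digest()
    return int.from_bytes(digest, "big") % LEN_HASH + 1


def hash_a(data: bytes) -> int:
    """Second-embodiment style: HashA is its own function (CRC-32 here)."""
    return zlib.crc32(data) % LEN_HASH + 1


def hash_b(data: bytes) -> int:
    """Second-embodiment style: HashB is a different function (Adler-32 here)."""
    return zlib.adler32(data) % LEN_HASH + 1
```

In both variants the result is mapped into [1, LenHash], so the rest of the storage procedure is unchanged; only the way Index1, Index2, minIndex1 and minIndex2 are computed differs.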
Example three:
The present embodiment is basically the same as the first embodiment. The main difference is as follows: the system for rapidly detecting the elephant flow based on probability sampling in the first embodiment is a programmable switch, whereas the system provided in the present embodiment is an intelligent network card. The system likewise includes an input module with at least one input port and a corresponding input queue, an output module with at least one output port and a corresponding output queue, and a data forwarding controller. The data forwarding controller is connected to the input module and the output module respectively, and is programmed or configured to execute the steps of the method for rapidly detecting an elephant flow based on probability sampling. The data forwarding controller includes: a data packet forwarding module, used for sending the header quintuple information of each passing data packet and the queue occupation ratio; a flow counting module, used for receiving the header quintuple information of the data packet and the occupation ratio of the interface queue sent by the data packet forwarding module, counting data packets by a probability sampling method, and detecting the elephant flow based on the data packet counting result; and an elephant flow storage module, used for storing the detected elephant flow information based on the elephant flow storage queue. The input end of the flow counting module is connected with the data packet forwarding module, the output end of the flow counting module is connected with the elephant flow storage module, and the elephant flow storage module is connected with the elephant flow storage queue.
Example four:
The present embodiment is basically the same as the first embodiment. The main difference is as follows: the system for rapidly detecting the elephant flow based on probability sampling in the first embodiment is a programmable switch, whereas the system provided in the present embodiment is a commercial switch chip. The system likewise includes an input module with at least one input port and a corresponding input queue, an output module with at least one output port and a corresponding output queue, and a data forwarding controller. The data forwarding controller is connected to the input module and the output module respectively, and is programmed or configured to execute the steps of the method for rapidly detecting an elephant flow based on probability sampling. The data forwarding controller includes: a data packet forwarding module, used for sending the header quintuple information of each passing data packet and the queue occupation ratio; a flow counting module, used for receiving the header quintuple information of the data packet and the occupation ratio of the interface queue sent by the data packet forwarding module, counting data packets by a probability sampling method, and detecting the elephant flow based on the data packet counting result; and an elephant flow storage module, used for storing the detected elephant flow information based on the elephant flow storage queue. The input end of the flow counting module is connected with the data packet forwarding module, the output end of the flow counting module is connected with the elephant flow storage module, and the elephant flow storage module is connected with the elephant flow storage queue.
It should be noted that the programmable switch, the intelligent network card, and the commercial switch chip in the foregoing embodiments are merely examples of physical forms of the elephant flow rapid detection system based on probability sampling and are not exhaustive. The elephant flow rapid detection method based on probability sampling of the present invention may also be applied to other types of network data forwarding hardware, and the elephant flow rapid detection system based on probability sampling may likewise be any such hardware, or an integrated component product or complete machine product containing it; these are not described one by one here.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is directed to methods, apparatus (systems), and computer program products according to embodiments of the application, wherein the instructions that execute via the flowcharts and/or processor of the computer program product create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to those skilled in the art without departing from the principles of the present invention should also be considered as within the scope of the present invention.

Claims (7)

1. A method for rapidly detecting elephant flow based on probability sampling is characterized by comprising the following steps:
1) The data packet forwarding module sends the five-tuple information of the data packet head of the passing data packet and the occupation proportion of the interface queue;
2) Receiving quintuple information of the head of the data packet and the occupation proportion of an interface queue sent by a data packet forwarding module, counting the data packet by adopting a probability sampling method, and detecting the elephant flow based on the counting result of the data packet;
3) Storing the detected elephant flow information based on the elephant flow storage queue;
the step 2) comprises the following steps:
2.1) judging whether the header quintuple information Quintuple of a data packet pkt and the occupation ratio ORatio of the interface queue have been received; if yes, jumping to step 2.2); otherwise, returning to step 2.1);
2.2) if the occupation ratio ORatio of the interface queue is less than or equal to a preset minimum threshold Min_th, not counting the data packet; if the occupation ratio ORatio of the interface queue is greater than the preset minimum threshold Min_th and less than or equal to a preset maximum threshold Max_th, counting the data packet pkt with probability p; if the occupation ratio ORatio of the interface queue is greater than the preset maximum threshold Max_th, counting the data packet directly; and detecting the elephant flow based on the data packet counting result;
before step 2.1), the following initialization steps are included: initializing the number of uncounted data packets count to 0, the value range of the number of uncounted data packets count being [0, Length]; and initializing the preset minimum threshold Min_th and maximum threshold Max_th of the occupation ratio ORatio of the interface queue, the timeout Tmax of the flow count information in the hash table, and the maximum value Max_q of the real-time count probability q; step 2.2) comprises:
2.2.1) initializing the position index Index, in the hash table, of the flow to which the data packet pkt belongs to 0, and initializing the flow count information CountUpdate of the flow to which the data packet pkt belongs to 0;
2.2.2) judging the occupation ratio ORatio of the interface queue: if it is greater than the preset minimum threshold Min_th and less than or equal to the preset maximum threshold Max_th, jumping to step 2.2.3); if it is greater than the preset maximum threshold Max_th, jumping to step 2.2.4); if it is less than or equal to the preset minimum threshold Min_th, jumping to step 2.2.5);
2.2.3) calculating the real-time count probability q according to q = Max_q × (ORatio − Min_th) / (Max_th − Min_th), where Max_q is the maximum value of the real-time count probability q and the value range of q is [0, Max_q]; calculating the count probability p according to p = q / (1 − count × q), where count is the number of uncounted data packets and the value range of p is [0, 1]; multiplying the count probability p by a preset random number to obtain a random number rand; if the random number rand is less than a preset threshold m, setting the number of uncounted data packets count to 0, counting the data packet pkt, and storing the flow information to which the data packet pkt belongs in the hash table; otherwise, adding 1 to the number of uncounted data packets count, and if, after adding 1, the number of uncounted data packets count is greater than the total length Length of the output queue OutQueue, setting the number of uncounted data packets count to 0, counting the data packet pkt, and storing the flow information to which the data packet pkt belongs in the hash table; jumping to step 2.2.6); otherwise, jumping to step 2.2.5);
2.2.4) setting the number of uncounted data packets count to 0, counting the data packet pkt, and storing the flow information to which the data packet pkt belongs in the hash table; jumping to step 2.2.6);
2.2.5) not counting the data packet pkt, setting the position index Index, in the hash table, of the flow to which the data packet pkt belongs to 0, and setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to 0; jumping to step 2.2.6);
2.2.6) detecting the elephant flow based on the counting result of the data packet pkt;
the step of storing, in the hash table, the flow information to which the data packet pkt belongs comprises:
S1) obtaining the current time currenttime; calculating, using a preset first hash function, the position index Index1 in the hash table of the flow to which the data packet pkt belongs; if the position corresponding to position index Index1 in the hash table is empty, inserting at that position the header quintuple information Quintuple of the data packet pkt, flow count information 1, and the current time currenttime; setting the position index Index of the flow to which the data packet pkt belongs to Index1, and setting the flow count information CountUpdate of that flow to 1; jumping to step 2.2.6); if the position corresponding to position index Index1 in the hash table is not empty, jumping to the next step;
S2) extracting the flow quintuple information QuintupleA, the flow count information CountA, and the filling time timeA at position index Index1; if the extracted flow quintuple information QuintupleA is equal to the header quintuple information Quintuple of the data packet pkt, adding 1 to the flow count information CountA at position index Index1, setting the position index Index of the flow to which the data packet pkt belongs to Index1, setting the flow count information CountUpdate of that flow to the new flow count information CountA, and jumping to step S3); otherwise, jumping to step S4);
S3) judging whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if yes, clearing the data at position index Index1 in the hash table; then inserting at position index Index1 in the hash table the header quintuple information Quintuple of the data packet pkt, flow count information 1, and the current time currenttime; setting the position index Index of the flow to which the data packet pkt belongs to Index1, and setting the flow count information CountUpdate of that flow to 1; jumping to step 2.2.6); otherwise, jumping directly to step 2.2.6);
S4) calculating, using a preset second hash function, the position index Index2 in the hash table of the flow to which the data packet pkt belongs; if the position corresponding to position index Index2 in the hash table is empty, inserting at that position the header quintuple information Quintuple of the data packet pkt, flow count information 1, and the current time currenttime; setting the position index Index of the flow to which the data packet pkt belongs to Index2, and setting the flow count information CountUpdate of that flow to 1; jumping to step 2.2.6); if the position corresponding to position index Index2 in the hash table is not empty, jumping to the next step;
S5) extracting the flow quintuple information QuintupleB, the flow count information CountB, and the filling time timeB at position index Index2; if the extracted flow quintuple information QuintupleB is equal to the header quintuple information Quintuple of the data packet pkt, adding 1 to the flow count information CountB at position index Index2, setting the position index Index of the flow to which the data packet pkt belongs to Index2, setting the flow count information CountUpdate of that flow to the new flow count information CountB, and jumping to step S6); otherwise, jumping to step S7);
S6) judging whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if yes, clearing the data at position index Index2 in the hash table; then inserting at position index Index2 in the hash table the header quintuple information Quintuple of the data packet pkt, flow count information 1, and the current time currenttime; setting the position index Index of the flow to which the data packet pkt belongs to Index2, and setting the flow count information CountUpdate of that flow to 1; jumping to step 2.2.6); otherwise, jumping directly to step 2.2.6);
S7) judging whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if yes, clearing the data at position index Index1 in the hash table; then inserting at position index Index1 in the hash table the header quintuple information Quintuple of the data packet pkt, flow count information 1, and the current time currenttime; setting the position index Index of the flow to which the data packet pkt belongs to Index1, and setting the flow count information CountUpdate of that flow to 1; jumping to step 2.2.6); otherwise, jumping to step S8);
S8) judging whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if yes, clearing the data at position index Index2 in the hash table; then inserting at position index Index2 in the hash table the header quintuple information Quintuple of the data packet pkt, flow count information 1, and the current time currenttime; setting the position index Index of the flow to which the data packet pkt belongs to Index2, and setting the flow count information CountUpdate of that flow to 1; jumping to step 2.2.6); otherwise, jumping to step S9);
s9) calculating flow counting informationCountA、Flow counting informationCountBMinimum value therebetweenminCount= Min{CountA, CountB},According to a minimum valueminCountDetermining corresponding location index in hash tableminIndexObtaining a position indexminIndexQuintuple information ofminQuintuple、Flow counterInformationminCount、Filling timeminTime
S10) calculating quintuple information by using a preset first hash functionminQuintupleIndex of position in hash table equalsminIndex1, if the position index in the hash tableminIndexIf the corresponding position of 1 is null, the position index is in the hash tableminIndex1 inserting quintuple information into corresponding positionminQuintuple、Stream count informationminCount、Filling timeminTimeSkipping to execute step S12); otherwise, jumping to execute the step S11);
s11) calculating quintuple information by using a preset second hash functionminQuintupleIndex of position in hash table equalsminIndex2, if the position index in the hash tableminIndex2 if the corresponding position is null, the position index is in the hash tableminIndex2 inserting quintuple information into corresponding positionminQuintuple、Stream count informationminCount、Time of fillingminTimeSkipping to execute step S12); otherwise, jumping to execute the step S12);
s12) emptying data of a position index minIndex in the hash table, and skipping to execute the step S13);
s13) indexing the positions in the hash tableminIndexInserting data packetspktQuintuple information of the associated streamQuintuple、1Filling timecurrenttimeTo be data packetpktIndexing of the location of the associated stream in a hash tableIndexSet as a position indexminIndexData packetpktFlow count information of associated flowsCountUpdateIs set to 1; jump execution step 2.2.6).
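Steps S7) to S13) can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes these steps are reached only when both candidate positions Index1 and Index2 already hold other flows, it models the hash table as a plain list whose empty positions are `None`, and it breaks a CountA = CountB tie in favor of Index1, which the claim does not specify. The names `Entry`, `place_on_collision`, `hash1`, and `hash2` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Entry:
    quintuple: bytes   # header quintuple information
    count: int         # flow count information
    fill_time: float   # filling time


def place_on_collision(
    table: List[Optional[Entry]],
    quintuple: bytes,
    index1: int,
    index2: int,
    now: float,
    t_max: float,
    hash1: Callable[[bytes], int],
    hash2: Callable[[bytes], int],
) -> Tuple[int, int]:
    """Return (Index, CountUpdate) for the flow of the arriving packet.

    Assumption: table[index1] and table[index2] are both occupied
    by flows other than the arriving one.
    """
    # S7 / S8: reuse the first candidate position whose entry has aged out.
    for idx in (index1, index2):
        if now - table[idx].fill_time > t_max:
            table[idx] = Entry(quintuple, 1, now)
            return idx, 1

    # S9: choose the position holding the smaller count (tie -> index1, assumed).
    a, b = table[index1], table[index2]
    min_index = index1 if a.count <= b.count else index2
    victim = table[min_index]

    # S10 / S11: try to move the evicted flow to one of its own candidate
    # positions, keeping its count and filling time, if one is empty.
    for h in (hash1, hash2):
        alt = h(victim.quintuple)
        if table[alt] is None:
            table[alt] = victim
            break

    # S12 / S13: overwrite minIndex with the new flow, count 1.
    table[min_index] = Entry(quintuple, 1, now)
    return min_index, 1
```

If neither alternative position of the evicted flow is empty, the sketch simply drops that flow's record, which is how S11) followed by S12) reads.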
2. The method for rapidly detecting elephant flow based on probability sampling according to claim 1, wherein the processing steps performed by the data packet forwarding module in step 1) on any passing data packet pkt comprise:
1.1) Read the header quintuple information Quintuple of the passing data packet pkt from the queue of the corresponding input interface InInt, the header quintuple information Quintuple being a binary string formed by concatenating the source IP address srcIP, destination IP address dstIP, source port number srcPort, destination port number dstPort, and protocol number protocol;
1.2) Query the hardware forwarding table according to the header quintuple information Quintuple of the data packet pkt to obtain the forwarding output interface OutInt of the data packet pkt;
1.3) Copy the data packet pkt from the input interface InInt into the output queue OutQueue of the forwarding output interface OutInt, and obtain the number CurrentNum of data packets present in the output queue OutQueue of the forwarding output interface OutInt;
1.4) Divide the number CurrentNum of data packets present in the output queue OutQueue of the forwarding output interface OutInt by the total length Length of the output queue OutQueue to obtain the interface queue occupancy ratio ORatio;
1.5) Output the header quintuple information Quintuple of the data packet pkt and the interface queue occupancy ratio ORatio.
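As a rough illustration of steps 1.1) to 1.5), the sketch below packs the five header fields into one byte string and computes ORatio = CurrentNum / Length. It rests on several assumptions that the claim does not state: addresses are IPv4, the packet is a Python dict, and the hardware forwarding table is stood in for by a dict keyed by the quintuple. `pack_quintuple`, `OutInterface`, and `forward` are hypothetical names.

```python
import ipaddress
import struct
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Tuple


def pack_quintuple(src_ip: str, dst_ip: str, src_port: int,
                   dst_port: int, protocol: int) -> bytes:
    """Step 1.1: concatenate the five fields (IPv4 assumed, 13 bytes)."""
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HHB", src_port, dst_port, protocol)
    )


@dataclass
class OutInterface:
    length: int                                   # total queue length, Length
    queue: Deque[dict] = field(default_factory=deque)


def forward(pkt: dict,
            forwarding_table: Dict[bytes, str],
            out_interfaces: Dict[str, OutInterface]) -> Tuple[bytes, float]:
    quintuple = pack_quintuple(pkt["srcIP"], pkt["dstIP"], pkt["srcPort"],
                               pkt["dstPort"], pkt["protocol"])
    out_int = forwarding_table[quintuple]         # step 1.2 (dict stand-in)
    out = out_interfaces[out_int]
    out.queue.append(pkt)                         # step 1.3: enqueue a copy
    current_num = len(out.queue)                  # CurrentNum
    o_ratio = current_num / out.length            # step 1.4: ORatio
    return quintuple, o_ratio                     # step 1.5
```

The sketch does not model a full output queue; a real device would drop or back-pressure once CurrentNum reaches Length.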
3. The method for rapidly detecting elephant flow based on probability sampling according to claim 1, wherein step 2.2.6) comprises: if the counting result for the data packet pkt is greater than the preset elephant flow count threshold, determining that the flow to which the data packet pkt belongs is an elephant flow and clearing the data at position index Index in the hash table; otherwise, determining that the flow to which the data packet pkt belongs is not an elephant flow.
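The threshold test of step 2.2.6) could be sketched as follows, assuming the hash table is a Python list whose empty positions are `None` and that the counting result is the value CountUpdate; `check_elephant` and its parameter names are hypothetical.

```python
from typing import List, Optional


def check_elephant(table: List[Optional[object]], index: int,
                   count_update: int, threshold: int) -> bool:
    """Return True and free the slot if the flow at `index` is an elephant flow."""
    if count_update > threshold:
        table[index] = None   # clear the data at position index Index
        return True
    return False


# Usage: a flow counted 11 times against a threshold of 10 is reported
# and its hash-table slot is released for other flows.
slots: List[Optional[object]] = ["flow-a", "flow-b"]
is_elephant = check_elephant(slots, 1, count_update=11, threshold=10)
```

Note that the comparison is strict: a count exactly equal to the threshold is not yet treated as an elephant flow.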
4. The method for rapidly detecting elephant flow based on probability sampling according to claim 1, wherein the elephant flow storage queue in step 3) is a circular queue of length LenQueue comprising a plurality of queue units connected end to end; the elephant flow storage queue comprises a head pointer HeadP and a tail pointer TailP, the head pointer HeadP pointing to the position of the element that joined the queue earliest and the tail pointer TailP pointing to the position of the element that joined the circular queue latest; and the elephant flow storage queue is initialized to empty, with the initial head pointer HeadP and tail pointer TailP equal to each other.
5. The method for rapidly detecting elephant flow based on probability sampling according to claim 4, wherein the step of storing based on the elephant flow storage queue in step 3) comprises:
3.1) Determine whether elephant flow quintuple information EleQuintuple has been received; if so, execute the next step; otherwise, return to step 3.1) and continue detecting;
3.2) Determine whether the queue unit pointed to by the tail pointer TailP is empty; if it is not empty, execute the next step; otherwise, jump to step 3.5);
3.3) Clear the queue unit pointed to by the tail pointer TailP;
3.4) Update the head pointer HeadP according to HeadP=(HeadP+1) % LenQueue, where % is the modulo operation;
3.5) Store the elephant flow quintuple information EleQuintuple in the queue unit pointed to by the tail pointer TailP;
3.6) Update the tail pointer TailP according to TailP=(TailP+1) % LenQueue, and end.
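Steps 3.2) to 3.6) amount to a fixed-size ring buffer that overwrites its oldest entry when full. The following is a minimal sketch under the assumption that an empty queue unit is represented by `None`; the class name `ElephantFlowQueue` and the helper `oldest_first` are hypothetical and not part of the claims.

```python
from typing import List, Optional


class ElephantFlowQueue:
    def __init__(self, len_queue: int) -> None:
        self.len_queue = len_queue
        self.units: List[Optional[bytes]] = [None] * len_queue
        self.head_p = 0   # HeadP: oldest element
        self.tail_p = 0   # TailP: unit written by the next store

    def store(self, ele_quintuple: bytes) -> None:
        # 3.2: a non-empty unit under TailP means the queue has wrapped.
        if self.units[self.tail_p] is not None:
            self.units[self.tail_p] = None                      # 3.3
            self.head_p = (self.head_p + 1) % self.len_queue    # 3.4
        self.units[self.tail_p] = ele_quintuple                 # 3.5
        self.tail_p = (self.tail_p + 1) % self.len_queue        # 3.6

    def oldest_first(self) -> List[bytes]:
        """Hypothetical helper: stored flows from oldest to newest."""
        order = [(self.head_p + i) % self.len_queue
                 for i in range(self.len_queue)]
        return [self.units[i] for i in order if self.units[i] is not None]


# Usage: with LenQueue = 3, a fourth flow overwrites the oldest one.
q = ElephantFlowQueue(3)
for flow in (b"a", b"b", b"c", b"d"):
    q.store(flow)
```

Overwriting the oldest unit keeps memory bounded, at the cost of forgetting the earliest detected elephant flows once LenQueue flows have been stored.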
6. A system for rapidly detecting elephant flow based on probability sampling, comprising an input module with at least one input port and an input queue corresponding to the input port, an output module with at least one output port and an output queue corresponding to the output port, and a data forwarding controller connected to the input module and the output module respectively, characterized in that the data forwarding controller is programmed or configured to execute the steps of the method for rapidly detecting elephant flow based on probability sampling according to any one of claims 1 to 5, and the data forwarding controller comprises: a data packet forwarding module, configured to send the header quintuple information of each passing data packet and the interface queue occupancy ratio; a flow counting module, configured to receive the header quintuple information and the interface queue occupancy ratio sent by the data packet forwarding module, count data packets by a probability sampling method, and detect elephant flows based on the data packet counting result; and an elephant flow storage module, configured to store the detected elephant flow information based on the elephant flow storage queue; wherein the input end of the flow counting module is connected to the data packet forwarding module, the output end of the flow counting module is connected to the elephant flow storage module, and the elephant flow storage module is connected to the elephant flow storage queue.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program to be executed by a programmable switch, an intelligent network card, or a commercial switch chip to perform the method for rapidly detecting elephant flow based on probability sampling according to any one of claims 1 to 5.
CN202111028109.2A 2021-09-02 2021-09-02 Elephant flow rapid detection method and system based on probability sampling Active CN113746700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111028109.2A CN113746700B (en) 2021-09-02 2021-09-02 Elephant flow rapid detection method and system based on probability sampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111028109.2A CN113746700B (en) 2021-09-02 2021-09-02 Elephant flow rapid detection method and system based on probability sampling

Publications (2)

Publication Number Publication Date
CN113746700A CN113746700A (en) 2021-12-03
CN113746700B true CN113746700B (en) 2023-04-07

Family

ID=78735146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111028109.2A Active CN113746700B (en) 2021-09-02 2021-09-02 Elephant flow rapid detection method and system based on probability sampling

Country Status (1)

Country Link
CN (1) CN113746700B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114240730B (en) * 2021-12-20 2024-01-02 苏州凌云光工业智能技术有限公司 Processing method of detection data in AOI detection equipment
CN115396373A (en) * 2022-10-27 2022-11-25 阿里云计算有限公司 Information processing method and system based on cloud server and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106453130A (en) * 2016-09-30 2017-02-22 杭州电子科技大学 Flow scheduling system and method based on accurate elephant flow identification
CN112788038A (en) * 2021-01-15 2021-05-11 昆明理工大学 Method for distinguishing DDoS attack and elephant flow based on PCA and random forest

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9124515B2 (en) * 2010-11-22 2015-09-01 Hewlett-Packard Development Company, L.P. Elephant flow detection in a computing device
CN106453129A (en) * 2016-09-30 2017-02-22 杭州电子科技大学 Elephant flow two-level identification system and method
US10924418B1 (en) * 2018-02-07 2021-02-16 Reservoir Labs, Inc. Systems and methods for fast detection of elephant flows in network traffic
CN109861881B (en) * 2019-01-24 2021-11-19 大连理工大学 Elephant flow detection method based on three-layer Sketch framework
CN110677324B (en) * 2019-09-30 2023-02-14 华南理工大学 Elephant flow two-stage detection method based on sFlow sampling and controller active update list
CN111262756B (en) * 2020-01-20 2022-05-06 长沙理工大学 High-speed network elephant flow accurate measurement method and device
CN112416950B (en) * 2021-01-25 2021-03-26 中国人民解放军国防科技大学 Design method and device of three-dimensional sketch structure

Also Published As

Publication number Publication date
CN113746700A (en) 2021-12-03

Similar Documents

Publication Publication Date Title
US10735325B1 (en) Congestion avoidance in multipath routed flows
CN113746700B (en) Elephant flow rapid detection method and system based on probability sampling
US10778588B1 (en) Load balancing for multipath groups routed flows by re-associating routes to multipath groups
US7787442B2 (en) Communication statistic information collection apparatus
US7643486B2 (en) Pipelined packet switching and queuing architecture
US8005012B1 (en) Traffic analysis of data flows
CN113785536A (en) System and method for facilitating tracer grouping in data-driven intelligent networks
US10693790B1 (en) Load balancing for multipath group routed flows by re-routing the congested route
US9674102B2 (en) Methods and network device for oversubscription handling
US7664112B2 (en) Packet processing apparatus and method
CN107005485A (en) A kind of method, corresponding intrument and system for determining route
CN109547341B (en) Load sharing method and system for link aggregation
CN101984608A (en) Method and system for preventing message congestion
US11606448B2 (en) Efficient capture and streaming of data packets
US10924374B2 (en) Telemetry event aggregation
US10764209B2 (en) Providing a snapshot of buffer content in a network element using egress mirroring
RU2628477C2 (en) Package processing device, method of configuring stream entry and program
CN111970211A (en) Elephant flow processing method and device based on IPFIX
US8194545B2 (en) Packet processing apparatus
KR101688635B1 (en) Apparatus for storing traffic based on flow and method
CN1638362A (en) Parallel data link layer controllers in a network switching device
EP2690821A1 (en) Method And Apparatus For Packet Buffering Measurement
CN105393597B (en) Method for controlling network congestion and controller
CN1638385A (en) Parallel data link layer controllers in a network switching device
CN110324255B (en) Data center network coding oriented switch/router cache queue management method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant