CN115174961A - Multi-platform video flow early identification method facing high-speed network - Google Patents

Multi-platform video flow early identification method facing high-speed network

Info

Publication number
CN115174961A
Authority
CN
China
Prior art keywords
video
traffic
flow
stream
speed network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210796253.9A
Other languages
Chinese (zh)
Inventor
吴桦
乐鑫
程光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210796253.9A priority Critical patent/CN115174961A/en
Publication of CN115174961A publication Critical patent/CN115174961A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/858 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • H04N21/8586 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot by using a URL

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an early identification method for multi-platform video traffic in high-speed networks. The method first collects video traffic from different platforms and labels video and non-video flows according to the handshake or request information of each flow. A feature space for classifying video and non-video traffic is then constructed on a protocol-independent basis, and a data set is built by extracting feature vectors from the labeled traffic. Finally, a classification model is trained offline on this data set, which contains both video and non-video traffic, using supervised machine learning. Combined with the proposed feature space, the model accurately identifies video traffic in a high-speed network even when only sampled packets are available. Because the feature space yields stable feature vectors from a small number of packets per flow, video traffic can be identified early in a flow's transmission. The invention identifies video traffic in massive high-speed traffic in real time, with limited memory and in reasonable time, and can be used for network traffic analysis and network management.

Description

Multi-platform video flow early identification method facing high-speed network
Technical Field
The invention relates to an early identification method for multi-platform video traffic in high-speed networks, and belongs to the technical field of network security.
Background
With the development of the Internet, video traffic increasingly dominates global network traffic. By 2022, IP video traffic will account for 82% of all IP traffic (business and consumer combined), up from 75% in 2017, with a compound annual growth rate of 33%. Identifying video traffic in high-speed networks promptly helps in managing and allocating network resources, so traffic identification methods have long been a major concern for Internet Service Providers (ISPs).
However, as user demand for video streaming services grows, a large number of video platforms using different transport protocols have appeared, which makes video traffic harder to identify. In addition, because of the high bandwidth of the network, an ISP with limited resources can obtain only sampled video traffic at the traffic collection node, which places new requirements on video traffic identification methods.
Researchers have proposed a series of video traffic identification methods, of which threshold-based and machine-learning-based methods are the most widely used, but these methods still have limitations.
(1) Threshold-based identification methods
A threshold-based method records certain statistics of a flow, compares them with preset thresholds, and decides whether the flow is a video flow according to whether the statistics exceed the thresholds. Although such a method can identify video traffic quickly and accurately, the choice of thresholds depends strongly on the protocol, so only the video traffic of certain specific applications can be identified. Given the diversity of video transport protocols, the method cannot identify video traffic from all platforms on a high-speed network.
(2) Machine-learning-based identification methods
Machine-learning-based video traffic identification builds a traffic classification model by extracting effective features from the content and patterns of traffic, and the model's identification performance depends on how the feature space is constructed. Existing feature space construction methods fall into two main types. The first characterizes the transmission pattern of a video flow (for example, its timing characteristics) from the full traffic; it needs a long time to extract features from a complete long flow and therefore cannot identify video traffic in a high-speed network within a reasonable time. The second extracts features from key packets of a flow (for example, packets in the handshake phase), which shortens feature extraction and speeds up identification. Its effectiveness depends on whether the key packets can be captured early in flow setup. In a high-speed network, however, limited resources mean that the ISP cannot obtain all key packets from sampled traffic, so such methods perform poorly under high-speed network sampling. In summary, none of the existing machine-learning-based methods is suitable for identifying video traffic in a high-speed network.
Disclosure of Invention
The invention discloses an early identification method for multi-platform video traffic in high-speed networks, which aims to identify video traffic in a high-speed network with limited memory and in reasonable time. Specifically, the method first collects video traffic from different platforms and labels video and non-video flows according to the handshake or request information of the unknown flows. A feature space for classifying video and non-video flows is then constructed on a protocol-independent basis, and a data set is built by extracting feature vectors from the labeled traffic. Finally, a supervised machine learning method is used to train the classification model offline on this data set. Combined with the proposed feature space, the model accurately identifies video traffic in a high-speed network even when only sampled packets are available. Because the proposed feature space yields stable feature vectors from a small number of packets per flow, video traffic can be identified early in a flow's transmission.
To achieve the purpose of the invention, the specific technical steps of the scheme are as follows:
Step (1): collecting video playback traffic from different platforms with data collection equipment;
Step (2): preprocessing the collected traffic and labeling video and non-video flows;
Step (3): extracting features from the traffic labeled in step (2), constructing a feature space based on rules, and obtaining a labeled sample set;
Step (4): taking the sample set obtained in step (3) as a training set and training with a supervised machine learning method to obtain a classification model that distinguishes video flows from non-video flows;
Step (5): setting a sampling ratio, performing systematic packet sampling on traffic in the high-speed network, then grouping the sampled packets into flows and extracting features;
Step (6): applying the classification model obtained in step (4) to predict unknown flows and identify video traffic.
Further, in step (1), collecting the video traffic specifically includes the following sub-steps:
(1.1) Traffic is captured on a laboratory host and on an Android device, respectively. On the host, Wireshark captures traffic directly; the Android device connects to a hotspot on the host, and Wireshark captures the traffic of the device's video playback process. While video traffic is being captured, the network access permission of other applications is disabled;
(1.2) Popular domestic and foreign video websites are selected, and videos are played and traffic is captured under the following policy: the maximum capture time per video is 5 minutes, after which capture ends and the traffic is saved as a pcap file;
(1.3) An automation script is written to carry out step (1.2) and capture video traffic in batches.
Further, in step (2), preprocessing and labeling the traffic specifically includes the following sub-steps:
(2.1) For the video traffic of different platforms obtained in step (1), packets are reassembled into bidirectional flows by five-tuple (source IP, source port, destination IP, destination port, and transport-layer protocol), and flows with fewer than N packets are discarded;
(2.2) The transport protocol used by the bidirectional flow is determined; if the flow is unencrypted, go to (2.3); otherwise, go to (2.4);
(2.3) The URL request information containing the type of the transferred file is extracted from the bidirectional flow, and the flow is labeled as video or non-video according to the file type keyword;
(2.4) The SNI field containing the domain name is extracted from the handshake messages of the bidirectional flow, and the flow is labeled as video or non-video according to the keywords contained in the SNI.
Further, in step (3), constructing the labeled sample set specifically includes the following sub-steps:
(3.1) The statistics shown in Table 1 are collected for the labeled flows obtained in step (2);
TABLE 1 Statistics and their descriptions
Statistic   Description
f_pck       Number of packets transmitted in the uplink direction
b_pck       Number of packets transmitted in the downlink direction
f_len       Number of bytes transmitted in the uplink direction
b_len       Number of bytes transmitted in the downlink direction
f_d_p       Number of packets carrying a payload transmitted in the uplink direction
b_d_p       Number of packets carrying a payload transmitted in the downlink direction
f_d_l       Number of payload bytes transmitted in the uplink direction
b_d_l       Number of payload bytes transmitted in the downlink direction
p_len       Number of payload bytes carried by each packet in the bidirectional flow
tmGap       Effective transmission time of the bidirectional flow
(3.2) The collected statistics are processed further, and the effect of packet sampling on feature stability is removed through statistical calculation;
(3.3) When selecting features, the influence of the protocol on the features is avoided as far as possible, and the feature space shown in Table 2 is constructed from three characteristics of video traffic transmission itself (asymmetry, high transmission rate, and a distinctive payload length distribution);
TABLE 2 Feature space and its description
[Table 2: provided as images in the original publication]
(3.4) Based on the constructed feature space, feature vectors are extracted from the collected traffic to build the sample set.
Further, in step (4), training the classification model specifically includes the following steps:
(4.1) The sample set is split into a training set and a test set at a ratio of 3:1;
(4.2) A random forest algorithm is trained on the training set, the test set is used to guide dimensionality reduction of the feature vectors, and the parameters of the algorithm are determined;
(4.3) The classification model for video traffic identification is obtained.
Further, in step (5), collecting high-speed network traffic and extracting feature vectors specifically includes the following steps:
(5.1) Traffic collection equipment is deployed in the high-speed network, and traffic is captured continuously with tcpdump;
(5.2) A sampling ratio is set, systematic sampling is applied to the captured data, and flows are reassembled by five-tuple;
(5.3) The number M of packets required for feature extraction is set, and feature vectors are extracted from the first M packets of each sampled flow.
Further, in step (6), the feature vectors of the high-speed network traffic extracted in step (5) are input into the classification model obtained in step (4), video traffic is identified from them, and the result is output.
Compared with the prior art, the technical scheme of the invention has the following advantages:
(1) The invention proposes a new feature space that uses protocol-independent features. A model trained on these features can identify multi-platform video traffic carried over different protocols, which makes the method more practical in high-speed networks.
(2) The proposed feature space yields stable feature vectors from the first 500 packets of each flow, so video can be identified quickly, early in flow transmission; test results show that the method can be used for real-time video traffic identification.
(3) The invention combines sampling with video flow identification, which reduces the resources consumed by traffic processing in high-speed networks. Experiments show that the invention identifies more than 98% of video traffic in a 10 Gbps high-speed network at a sampling rate of 1/32.
Drawings
FIG. 1 is an overall architecture diagram of the present invention;
FIG. 2 shows the packet payload length probability distributions of video flows and other types of flows;
FIG. 3 shows the identification performance of the present invention at different sampling rates in high-speed network traffic.
Detailed Description
The technical solutions provided by the present invention are described in detail below with reference to specific examples. It should be understood that the following embodiments only illustrate the present invention and are not intended to limit its scope.
Embodiment: the invention provides an early identification method for multi-platform video traffic in high-speed networks, whose overall architecture is shown in FIG. 1, comprising the following steps:
Step (1): collecting video playback traffic from different platforms with data collection equipment;
Step (2): preprocessing the collected traffic and labeling video and non-video flows;
Step (3): extracting features from the traffic labeled in step (2), constructing a feature space based on rules, and obtaining a labeled sample set;
Step (4): taking the sample set obtained in step (3) as a training set and training with a supervised machine learning method to obtain a classification model that distinguishes video flows from non-video flows;
Step (5): setting a sampling ratio, performing systematic packet sampling on traffic in the high-speed network, then grouping the sampled packets into flows and extracting features;
Step (6): applying the classification model obtained in step (4) to predict unknown flows and identify video traffic.
In an embodiment of the present invention, in step (1), the specific steps for collecting video traffic from different platforms are as follows:
(1.1) Traffic is captured on a laboratory host and on an Android device, respectively. On the host, Wireshark captures traffic directly; the Android device connects to a hotspot on the host, and Wireshark captures the traffic of a specific process on the device. While video traffic is being captured, the network access permission of other applications is disabled;
(1.2) Popular domestic and foreign video websites are selected, and videos are played and traffic is captured under the following policy: the maximum capture time per video is 5 minutes, after which capture ends and the traffic is saved as a pcap file;
(1.3) An automation script is written to capture video traffic according to the policy in step (1.2); other networked devices are disabled while video traffic is captured;
(1.4) Some of the video platforms with the largest user bases at home and abroad are selected, their video playback traffic is collected, and the transport protocols used by the different platforms are analyzed; the traffic is described in Table 1.
Platform               Data collected   Transport protocols
Facebook               378 MB           HTTP+TLS1.3
YouTube                13.85 GB         HTTP+TLS1.3; GQUIC
Twitter                70 MB            HTTP+TLS1.3
Bilibili               2.87 GB          HTTP+TCP; UDT
iQIYI                  5.3 GB           HTTP+TCP; HTTP+TLS1.2
Youku                  1.29 GB          HTTP+TCP; HTTP+TLS1.2
Kuaishou               3.07 GB          HTTP+TLS1.2; HTTP+TLS1.3
Renren Yingshi         1.18 GB          HTTP+TLS1.2
Sohu Video             1.01 GB          HTTP+TCP; HTTP+TLS1.3; GQUIC
Douyin                 112 MB           HTTP+TCP; HTTP+TLS1.2
Huoshan Short Video    334 MB           HTTP+TCP
Other platforms        0.99 GB          HTTP+TCP; HTTP+TLS1.2
In one embodiment of the present invention, in step (2), the specific steps for preprocessing and labeling the traffic are as follows:
(2.1) For the captured video traffic, packets are reassembled into bidirectional flows by five-tuple (source IP, source port, destination IP, destination port, and transport-layer protocol); N is set to 100, and flows with fewer than N packets are discarded;
(2.2) The flows are parsed with the dpkt tool, and the key information indicating the traffic type is extracted according to the transport protocol used by each flow. Specifically: if the flow is encrypted with TLS or QUIC, the packet containing the ClientHello message is located, the SNI field carrying the server domain name is extracted from it, and the flow is judged to be video or non-video according to the keywords contained in the SNI field; if the flow is transmitted over unencrypted HTTP, the URL is obtained from the packet containing the GET request, and the flow is judged to be video or non-video according to the requested data type keyword contained in the URL;
(2.3) A program is written to extract SNIs and URLs in batches and match them against regular expressions, so that video and non-video flows are labeled quickly.
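The grouping and labeling logic of steps (2.1) to (2.3) can be sketched as follows. This is a minimal illustration only: packets are assumed to be already parsed into dictionaries (the dpkt parsing itself is omitted), the function names are hypothetical, and the keyword patterns `VIDEO_SNI` and `VIDEO_URL` are placeholders, since the patent does not list the keywords it actually uses.

```python
import re
from collections import defaultdict

# Hypothetical keyword patterns; the patent does not disclose its real keywords.
VIDEO_SNI = re.compile(r"(googlevideo|video|vod)", re.IGNORECASE)
VIDEO_URL = re.compile(r"\.(mp4|flv|m4s|ts)(\?|$)", re.IGNORECASE)

def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Direction-independent five-tuple key, so both directions map to one flow."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)

def group_flows(packets, min_packets=100):
    """Group packet dicts into bidirectional flows; drop flows with fewer than N packets."""
    flows = defaultdict(list)
    for p in packets:
        key = flow_key(p["src_ip"], p["src_port"], p["dst_ip"], p["dst_port"], p["proto"])
        flows[key].append(p)
    return {k: v for k, v in flows.items() if len(v) >= min_packets}

def label_flow(sni=None, url=None):
    """Label a flow from its SNI (TLS/QUIC) or its request URL (plain HTTP)."""
    if sni is not None:
        return "video" if VIDEO_SNI.search(sni) else "non-video"
    if url is not None:
        return "video" if VIDEO_URL.search(url) else "non-video"
    return "unknown"  # neither field was found in the flow
```

Sorting the two endpoints inside `flow_key` is one simple way to make uplink and downlink packets share a key; any other canonical ordering would serve equally well.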
In one embodiment of the present invention, in step (3), the statistics shown in Table 2 are collected for the labeled bidirectional flows obtained in step (2). The collected values are then scaled up by the reciprocal of the configured sampling rate to remove the effect of sampling on the stability of the statistics, and statistical features are constructed along the following three directions according to the characteristics of video traffic transmission:
TABLE 2 Statistics and their descriptions
Statistic   Description
f_pck       Number of packets transmitted in the uplink direction
b_pck       Number of packets transmitted in the downlink direction
f_len       Number of bytes transmitted in the uplink direction
b_len       Number of bytes transmitted in the downlink direction
f_d_p       Number of packets carrying a payload transmitted in the uplink direction
b_d_p       Number of packets carrying a payload transmitted in the downlink direction
f_d_l       Number of payload bytes transmitted in the uplink direction
b_d_l       Number of payload bytes transmitted in the downlink direction
p_len       Number of payload bytes carried by each packet in the bidirectional flow
tmGap       Effective transmission time of the bidirectional flow
(3.1) Four statistical features RAT = {r_b_pck, r_b_len, r_b_dp, r_b_dl} are constructed based on the asymmetry between the uplink and downlink transmission of a video flow. Here r_b_pck is the ratio of the number of packets sent in the downlink direction to that of the bidirectional flow, r_b_len is the ratio of the number of bytes sent in the downlink direction to that of the bidirectional flow, r_b_dp is the ratio of the number of payload-carrying packets sent in the downlink direction to that of the bidirectional flow, and r_b_dl is the ratio of the number of payload bytes sent in the downlink direction to that of the bidirectional flow. These four features are calculated using equation (1):
[Equation (1): provided as an image in the original publication]
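Read literally, the definitions in (3.1) suggest the following form for equation (1). This is a reconstruction, not the published formula: it assumes that the "bidirectional flow" quantity in each ratio is the sum of the uplink and downlink values from Table 2, and the notation of the original image may differ.

```latex
\begin{aligned}
\mathit{r\_b\_pck} &= \frac{\mathit{b\_pck}}{\mathit{f\_pck} + \mathit{b\_pck}}, &
\mathit{r\_b\_len} &= \frac{\mathit{b\_len}}{\mathit{f\_len} + \mathit{b\_len}}, \\
\mathit{r\_b\_dp}  &= \frac{\mathit{b\_d\_p}}{\mathit{f\_d\_p} + \mathit{b\_d\_p}}, &
\mathit{r\_b\_dl}  &= \frac{\mathit{b\_d\_l}}{\mathit{f\_d\_l} + \mathit{b\_d\_l}}.
\end{aligned}
```

Under this reading each ratio lies between 0 and 1, and a download-heavy flow such as video playback pushes all four values toward 1.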
(3.2) Four statistical features SPD = {b_spd_pck, f_spd_pck, b_spd_len, f_spd_len} are constructed based on the high transmission rate of a video flow. Here b_spd_pck and f_spd_pck are the packet transmission rates in the downlink and uplink directions, respectively, and b_spd_len and f_spd_len are the byte transmission rates in the downlink and uplink directions, respectively. These four features are calculated using equation (2):
[Equation (2): provided as an image in the original publication]
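A plausible form of equation (2), reconstructed from the definitions in (3.2), is given below. It assumes that each rate is the corresponding directional count from Table 2 divided by the effective transmission time tmGap; the published equation image may use different notation or a different time base.

```latex
\begin{aligned}
\mathit{b\_spd\_pck} &= \frac{\mathit{b\_pck}}{\mathit{tmGap}}, &
\mathit{f\_spd\_pck} &= \frac{\mathit{f\_pck}}{\mathit{tmGap}}, \\
\mathit{b\_spd\_len} &= \frac{\mathit{b\_len}}{\mathit{tmGap}}, &
\mathit{f\_spd\_len} &= \frac{\mathit{f\_len}}{\mathit{tmGap}}.
\end{aligned}
```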
(3.3) FIG. 2 shows that the packet payload length probability distribution of a video flow differs from that of other types of flows, so the payload length is divided into intervals to capture this distinctive distribution. Taking 1300 bytes as the common MTU of a network link, the packet payload length is divided into 13 intervals of 100 bytes each; adding a left boundary and a right boundary gives 15 intervals, and since a bidirectional flow has two directions, there are 30 intervals in total. These features are named PLD and are calculated using equation (3):
[Equation (3): provided as an image in the original publication]
where Interval_i is the number of packets that fall in the i-th interval.
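The interval scheme of (3.3) can be sketched in a few lines. The sketch assumes, following the feature names in Table 3 such as per_b_(0), per_b_(1-100) and per_f_(>1300), that the left boundary is a payload length of 0 bytes, the right boundary is any length above 1300 bytes, and each feature is Interval_i divided by the number of packets in that direction. The function names are illustrative.

```python
def payload_bin(length):
    """Map a payload length in bytes to one of 15 interval indices.

    0 -> length 0; 1..13 -> 1-100, 101-200, ..., 1201-1300; 14 -> above 1300.
    """
    if length <= 0:
        return 0
    if length > 1300:
        return 14
    return (length - 1) // 100 + 1

def pld_features(payload_lengths):
    """Fraction of one direction's packets that fall in each of the 15 intervals."""
    counts = [0] * 15
    for n in payload_lengths:
        counts[payload_bin(n)] += 1
    total = len(payload_lengths)
    return [c / total if total else 0.0 for c in counts]
```

Calling `pld_features` once on the uplink payload lengths and once on the downlink payload lengths yields the 30 PLD values described above.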
(3.4) The three feature types RAT, SPD and PLD are combined into a feature space containing 38 features in total, and feature vectors are extracted from the labeled traffic obtained in step (2) to build the data set.
In an embodiment of the present invention, in step (4), training the classification model specifically includes the following steps:
(4.1) The data set obtained in step (3) is split into a training set and a test set at a ratio of 3:1; the training set contains 7899 samples and the test set contains 2633 samples;
(4.2) This example trains a random forest on the training set and evaluates it on the test set. First, the features are ranked by importance using mean decrease in impurity (MDI), and the 8 most important features are kept to reduce the dimensionality of the feature vector; the selected features are shown in Table 3. Next, the optimal parameters of the random forest algorithm are determined by grid search with ten-fold cross-validation. Finally, the classification model for video traffic identification is obtained.
TABLE 3 Traffic features and their meanings
Feature name     Meaning
per_b_(0)        Ratio of the number of packets with a payload length of 0 bytes to the total number of packets in the downlink direction
per_b_(1-100)    Ratio of the number of packets with a payload length of 1 to 100 bytes to the total number of packets in the downlink direction
per_f_(>1300)    Ratio of the number of packets with a payload length greater than 1300 bytes to the total number of packets in the uplink direction
r_f_dp           Ratio of the number of payload-carrying packets transmitted in the uplink direction to that of the bidirectional flow
r_f_dl           Ratio of the number of payload bytes transmitted in the uplink direction to that of the bidirectional flow
r_f_pck          Ratio of the number of packets transmitted in the uplink direction to that of the bidirectional flow
r_f_len          Ratio of the number of bytes transmitted in the uplink direction to that of the bidirectional flow
b_spd_len        Data transmission rate in the downlink direction
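The training procedure of step (4) (3:1 split, MDI-based selection of 8 features, grid search with ten-fold cross-validation) could look roughly like this with scikit-learn. The library choice, the synthetic data, and the parameter grid are all assumptions made for illustration; the patent names neither an implementation nor the parameter values it searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 38-dimensional data set; real feature vectors would replace it.
rng = np.random.default_rng(0)
X = rng.random((400, 38))
y = (X[:, 0] + X[:, 5] > 1.0).astype(int)  # toy labels: 1 = video, 0 = non-video

# 3:1 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Rank features by mean decrease in impurity (MDI) and keep the 8 most important.
ranker = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
top8 = np.argsort(ranker.feature_importances_)[::-1][:8]

# Grid search with ten-fold cross-validation; the grid values are illustrative only.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [5, None]},
    cv=10,
)
search.fit(X_train[:, top8], y_train)
model = search.best_estimator_
test_accuracy = model.score(X_test[:, top8], y_test)
```

In scikit-learn, `feature_importances_` of a random forest is the impurity-based (MDI) importance, which is why it is used here for the ranking step.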
In one embodiment of the present invention, in step (5), the specific steps for collecting high-speed network traffic and extracting feature vectors are as follows:
(5.1) In this example, traffic is collected at a campus network port in November 2021, in the morning (8 am). Collection lasts 400 s, the bandwidth of the monitored port is 10 Gbps, and the resulting capture is 117 GB in size and contains 171485 flows. The collected traffic includes video traffic from different platforms;
(5.2) The sampling rate is set to 1/32, systematic packet sampling is applied to the collected data, and packets with the same five-tuple are reassembled into the same bidirectional flow;
(5.3) Based on test results, the number M of packets needed to extract a flow's feature vector is set to 500; features are extracted from the sampled flows, finally yielding 30766 feature samples.
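Steps (5.2) and (5.3) can be illustrated with the short sketch below. It assumes that "systematic sampling" means keeping every 32nd packet from a fixed starting offset, and it shows the reciprocal scale-up of sampled counts mentioned in step (3). The helper names are hypothetical.

```python
from itertools import islice

def systematic_sample(packets, rate_denominator=32, offset=0):
    """Keep every rate_denominator-th packet, starting at index `offset`."""
    return [p for i, p in enumerate(packets) if i % rate_denominator == offset]

def first_m(flow_packets, m=500):
    """Return the first m sampled packets of a flow for feature extraction."""
    return list(islice(flow_packets, m))

def scale_up(count, rate_denominator=32):
    """Scale a count observed in sampled traffic by the reciprocal of the sampling rate."""
    return count * rate_denominator
```

Scaling every count by the same factor leaves ratio-type features unchanged, so the scale-up mainly affects quantities that depend on absolute counts.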
In an embodiment of the present invention, in step (6), identifying high-speed network video traffic with the video traffic classifier includes the following specific steps:
(6.1) This example uses precision and recall as evaluation metrics. Because none of the high-speed traffic carries labels, precision and recall are estimated by the following two methods, respectively:
Sampling verification method: the classification model is applied to obtain labeled classification results, and part of the results is checked manually to estimate the precision of the classification model;
Mark-recapture method: M pre-labeled video flow samples are mixed into the samples to be predicted, the classification model is applied to obtain labeled classification results, the number of these M samples that the model predicts as video flows is recorded as m, and the recall of the classification model is estimated as recall = m/M.
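The two estimators in (6.1) reduce to simple ratios. The sketch below uses hypothetical function names and made-up example numbers; it is not data from the patent.

```python
def estimate_precision(checked_predictions):
    """Precision estimate from a manually checked subset of 'video' predictions.

    checked_predictions: list of booleans, True where the prediction was correct.
    """
    return sum(checked_predictions) / len(checked_predictions)

def estimate_recall(predicted_video_among_marked, marked_total):
    """Recall estimate m / M for M pre-labeled video flows mixed into unknown traffic."""
    return predicted_video_among_marked / marked_total
```

For instance, if 98 of 100 planted video flows were predicted as video, `estimate_recall(98, 100)` gives 0.98.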
(6.2) The classification model is applied to the high-speed traffic data set, which contains video traffic from multiple platforms, to identify video traffic. The precision and recall of the model at different sampling rates are shown in FIG. 3, which shows that the method identifies more than 98% of multi-platform video traffic in a high-speed network;
(6.3) This example analyzes experimentally the shortest time the invention needs to identify video traffic in a high-speed network, to show that the invention is practical. The time required to identify video flows in a high-speed network consists of feature extraction time and model prediction time. The time required to extract features from a flow depends mainly on the bandwidth and the sampling rate: for high-speed network traffic with 10 Gbps bandwidth, at a sampling rate of 1/32 and ignoring other processing overhead, the invention needs only 2.24 milliseconds to extract features from the first 500 packets of a bidirectional flow. For the 30766 samples used in this example, the invention completes feature extraction in as little as 68915 milliseconds and model prediction in 322 milliseconds. In summary, for 400 seconds of data from a real 10 Gbps high-speed network, the method completes video traffic identification in as little as 69.237 seconds, which shows that it can be used for real-time identification of video traffic from different platforms in a high-speed network.
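The timing figures in (6.3) can be cross-checked with a short calculation. It assumes that the total extraction time is the per-flow extraction time multiplied by the number of samples, which the source does not state explicitly but which matches the reported numbers.

```latex
\begin{aligned}
t_{\text{extract}} &\approx 30766 \times 2.24\ \text{ms} = 68915.84\ \text{ms} \approx 68915\ \text{ms},\\
t_{\text{total}} &= 68915\ \text{ms} + 322\ \text{ms} = 69237\ \text{ms} = 69.237\ \text{s},\\
\frac{t_{\text{total}}}{400\ \text{s}} &\approx 0.173 .
\end{aligned}
```

Under that assumption, processing takes roughly 17% of the 400-second capture window, which is consistent with the real-time claim.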
The technical means disclosed in the invention scheme are not limited to the technical means disclosed in the above embodiments, but also include the technical scheme formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims (7)

1. An early identification method for multi-platform video traffic in high-speed networks, characterized by comprising the following steps:
Step (1): collecting video playback traffic from different platforms with data collection equipment;
Step (2): preprocessing the collected traffic and labeling video and non-video flows;
Step (3): extracting features from the traffic labeled in step (2), constructing a feature space based on rules, and obtaining a labeled sample set;
Step (4): taking the sample set obtained in step (3) as a training set and training with a supervised machine learning method to obtain a classification model that distinguishes video flows from non-video flows;
Step (5): setting a sampling ratio, performing systematic packet sampling on traffic in the high-speed network, then grouping the sampled packets into flows and extracting features;
Step (6): applying the classification model obtained in step (4) to predict unknown flows and identify video traffic.
2. The multi-platform video traffic early identification method for high-speed networks according to claim 1, wherein in step (1), the video traffic is captured as follows:
(1.1) capturing traffic on a laboratory host and on an Android device respectively: on the host, traffic is captured directly with Wireshark; the Android device connects to a hotspot on the host, and the traffic of its video playback is captured with Wireshark, with the network access permission of other applications disabled while video traffic is being captured;
(1.2) selecting popular domestic and international video websites, playing videos and capturing traffic according to the following strategy: the maximum capture time for each video is set to 5 minutes, after which the capture ends and is saved as a pcap file;
(1.3) writing an automation script to implement step (1.2) and capture video traffic in batches.
3. The multi-platform video traffic early identification method for high-speed networks according to claim 1, wherein the preprocessing and labeling of traffic in step (2) specifically comprises the following steps:
(2.1) for the video traffic of different platforms obtained in step (1), grouping data packets that share the same five-tuple, namely source IP, source port, destination IP, destination port and transport-layer protocol, into the same bidirectional flow, and discarding flows with fewer than N packets;
(2.2) determining the transport protocol used by the bidirectional flow; if it is an unencrypted flow, performing (2.3); otherwise, performing (2.4);
(2.3) extracting from the bidirectional flow the URL request information containing the type of the transmitted file, determining from the file-type keyword whether the flow is a video flow, and labeling it;
(2.4) extracting the SNI field containing domain name information from the handshake messages in the bidirectional flow, determining from the keywords contained in the SNI whether the flow is a video flow, and labeling it.
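The flow assembly and labeling steps of claim 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Packet` type, the function names, and above all the keyword list are hypothetical, since the claim does not say which keywords are used. The text to be matched (a URL for unencrypted flows, an SNI value for encrypted ones) is assumed to have been extracted already.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str    # transport-layer protocol, e.g. "TCP" or "UDP"
    length: int   # packet size in bytes


def flow_key(pkt):
    """Direction-independent five-tuple, so both directions share one key."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    low, high = sorted((a, b))
    return (low, high, pkt.proto)


def group_flows(packets, min_packets):
    """Group packets into bidirectional flows; drop flows with < N packets."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return {k: v for k, v in flows.items() if len(v) >= min_packets}


# Hypothetical keywords; the source does not list the actual ones.
VIDEO_KEYWORDS = ("video", ".mp4", ".m3u8", ".flv")


def label_flow(url_or_sni, keywords=VIDEO_KEYWORDS):
    """Label a flow from its URL (unencrypted) or SNI (encrypted) string."""
    text = url_or_sni.lower()
    return "video" if any(k in text for k in keywords) else "non-video"
```

Sorting the two endpoints before building the key is one simple way to make uplink and downlink packets of the same connection land in the same bidirectional flow.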
4. The multi-platform video traffic early identification method for high-speed networks according to claim 1, wherein in step (3), the labeled sample set is constructed by the following specific steps:
(3.1) recording the statistics shown in Table 1 for each bidirectional flow that has been labeled;
TABLE 1 Statistics and descriptions
Statistic  Description
f_pck      Number of data packets transmitted in the uplink direction
b_pck      Number of data packets transmitted in the downlink direction
f_len      Number of bytes transmitted in the uplink direction
b_len      Number of bytes transmitted in the downlink direction
f_d_p      Number of data packets with payload transmitted in the uplink direction
b_d_p      Number of data packets with payload transmitted in the downlink direction
f_d_l      Number of payload data bytes transmitted in the uplink direction
b_d_l      Number of payload data bytes transmitted in the downlink direction
p_len      Number of payload bytes carried by each data packet in the bidirectional flow
tmGap      Effective transmission time of the bidirectional flow
(3.2) further processing the collected information, and eliminating the influence of data packet sampling on feature stability through statistical calculation;
(3.3) when selecting features, avoiding the influence of the protocol on the features as far as possible, and constructing the feature space shown in Table 2 for the bidirectional flow based on three characteristics of video traffic transmission, namely the asymmetry of uplink and downlink transmission, the high transmission rate, and the distinctive payload length distribution;
TABLE 2 description of feature spaces and features contained therein
[Table 2 is reproduced as images in the original publication: Figure FDA0003735964840000021 and Figure FDA0003735964840000031]
(3.4) based on the constructed feature space, extracting feature vectors from the collected traffic to construct the sample set.
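The statistics of Table 1 can be accumulated in a single pass over a flow's packets, as in the sketch below. The `Pkt` record and the function name are hypothetical. Two points are assumptions rather than statements from the source: `tmGap` is taken as the time between the first and last packet, and `p_len` is kept as the list of per-packet payload sizes. The derived features of Table 2 are not reproduced here because the table is only available as an image.

```python
from typing import NamedTuple


class Pkt(NamedTuple):
    ts: float      # capture timestamp in seconds
    uplink: bool   # True for uplink, False for downlink
    length: int    # total packet bytes
    payload: int   # payload bytes carried by the packet


def flow_statistics(packets):
    """Accumulate the Table 1 statistics for one bidirectional flow."""
    stats = {name: 0 for name in (
        "f_pck", "b_pck", "f_len", "b_len",
        "f_d_p", "b_d_p", "f_d_l", "b_d_l",
    )}
    stats["p_len"] = []  # assumption: one payload size per packet
    for p in packets:
        d = "f" if p.uplink else "b"
        stats[d + "_pck"] += 1
        stats[d + "_len"] += p.length
        if p.payload > 0:              # only packets that carry payload
            stats[d + "_d_p"] += 1
            stats[d + "_d_l"] += p.payload
        stats["p_len"].append(p.payload)
    times = [p.ts for p in packets]
    # Assumption: effective transmission time = last minus first timestamp.
    stats["tmGap"] = max(times) - min(times) if times else 0.0
    return stats
```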
5. The multi-platform video traffic early identification method for high-speed networks according to claim 1, wherein in step (4), training the classification model specifically comprises the following steps:
(4.1) dividing the sample set into a training set and a test set at a ratio of 3:1;
(4.2) training on the training set using a random forest algorithm, and using the test set to perform dimensionality reduction on the feature vectors and determine the parameters of the algorithm;
(4.3) obtaining a classification model for video traffic identification.
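The 3:1 division in step (4.1) can be sketched with the standard library alone. The function name and the fixed seed are illustrative assumptions, and the random shuffle is a choice made for this sketch (the claim does not say how samples are assigned to each set). Training the random forest itself would be done with a machine learning library and is not shown.

```python
import random


def split_3_to_1(samples, labels, seed=0):
    """Shuffle, then split into 3 parts training and 1 part test."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    cut = (len(indices) * 3) // 4          # 3:1 ratio -> 75% for training
    train_idx, test_idx = indices[:cut], indices[cut:]

    def pick(rows, ids):
        return [rows[i] for i in ids]

    return (pick(samples, train_idx), pick(labels, train_idx),
            pick(samples, test_idx), pick(labels, test_idx))
```

Samples and labels are indexed with the same shuffled positions, so each feature vector stays paired with its label after the split.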
6. The multi-platform video traffic early identification method for high-speed networks according to claim 1, wherein acquiring high-speed network traffic and extracting feature vectors in step (5) comprises the following steps:
(5.1) deploying traffic collection equipment in the high-speed network and continuously capturing traffic with tcpdump;
(5.2) setting a sampling ratio, performing systematic sampling on the captured data, and reassembling flows according to the five-tuple;
(5.3) setting the number M of data packets required for feature extraction, and extracting feature vectors from the first M data packets of each sampled flow.
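Steps (5.2) and (5.3) can be illustrated with a small sketch: systematic sampling keeps one packet out of every k (for example k = 32 for the 1/32 rate used in the description), and each flow then retains only its first M sampled packets. The function names, the `offset` parameter, and the `key_fn` callback are hypothetical; any function that maps a packet to its five-tuple key could be passed in.

```python
from collections import defaultdict


def systematic_sample(packets, k, offset=0):
    """Yield every k-th packet, starting at position `offset` (0 <= offset < k)."""
    for i, pkt in enumerate(packets):
        if i % k == offset:
            yield pkt


def first_m_per_flow(sampled_packets, key_fn, m):
    """Reassemble sampled packets into flows, keeping the first m per flow."""
    flows = defaultdict(list)
    for pkt in sampled_packets:
        bucket = flows[key_fn(pkt)]
        if len(bucket) < m:    # later packets of a full flow are ignored
            bucket.append(pkt)
    return dict(flows)
```

Because the sampler is a generator, it can be chained directly onto a packet stream without holding the full capture in memory; only the first M sampled packets per flow are stored.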
7. The multi-platform video traffic early identification method for high-speed networks according to claim 1, wherein in step (6), the feature vectors of the high-speed network traffic extracted in step (5) are input into the classification model obtained in step (4), the video traffic is identified from them, and the result is output.
CN202210796253.9A 2022-07-07 2022-07-07 Multi-platform video flow early identification method facing high-speed network Pending CN115174961A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210796253.9A CN115174961A (en) 2022-07-07 2022-07-07 Multi-platform video flow early identification method facing high-speed network

Publications (1)

Publication Number Publication Date
CN115174961A true CN115174961A (en) 2022-10-11

Family

ID=83490736

Country Status (1)

Country Link
CN (1) CN115174961A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685016A (en) * 2012-06-06 2012-09-19 济南大学 Internet flow distinguishing method
CN106998322A (en) * 2017-02-20 2017-08-01 南京邮电大学 A kind of stream sorting technique of the Mean Opinion Score characteristics of mean of use video traffic
WO2019060949A1 (en) * 2017-09-27 2019-04-04 Newsouth Innovations Pty Limited Process and apparatus for identifying and classifying video-data
CN113591950A (en) * 2021-07-19 2021-11-02 中国海洋大学 Random forest network traffic classification method, system and storage medium
CN114513685A (en) * 2022-01-28 2022-05-17 武汉绿色网络信息服务有限责任公司 Method and device for identifying HTTPS encrypted video stream based on stream characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
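Yuan Mengjiao (袁梦娇), Dong Yuning (董育宁): "Network video stream classification based on feature fusion and machine learning", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 28 February 2021 (2021-02-28), pages 1 - 4 *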

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117077030A (en) * 2023-10-16 2023-11-17 易停车物联网科技(成都)有限公司 Few-sample video stream classification method and system for generating model
CN117077030B (en) * 2023-10-16 2024-01-26 易停车物联网科技(成都)有限公司 Few-sample video stream classification method and system for generating model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination