CN113037628A - Method, system and medium for automatically discovering service path - Google Patents

Method, system and medium for automatically discovering service path

Info

Publication number
CN113037628A
Authority
CN
China
Prior art keywords
service
communication
data packet
network
services
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110236560.7A
Other languages
Chinese (zh)
Other versions
CN113037628B (en)
Inventor
陶飞
蔡晓华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Netis Technologies Co ltd
Original Assignee
Shanghai Netis Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-03-03
Filing date
2021-03-03
Publication date
2021-06-25
Application filed by Shanghai Netis Technologies Co ltd
Priority to CN202110236560.7A
Publication of CN113037628A
Application granted
Publication of CN113037628B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/30 Routing of multiclass traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/26 Route discovery packet
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/51 Discovery or management thereof, e.g. service location protocol [SLP] or web services

Abstract

The invention provides a method, a system and a medium for automatically discovering a service path, relating to the technical field of network traffic analysis. The method comprises the following steps: step A: acquiring network data packets and finding service endpoints in the network data packets; step B: after the service endpoints are found, clustering the service endpoints to discover services; and step C: analyzing the association relationships among the services and finding the corresponding service paths. The invention can discover the relationships between services and the overall service path while classifying traffic.

Description

Method, system and medium for automatically discovering service path
Technical Field
The present invention relates to the field of network traffic analysis technologies, and in particular, to a method, a system, and a medium for automatically discovering a service path.
Background
Network traffic analysis monitors, in real time, the traffic distribution at each layer of the seven-layer model of a user's network and comprehensively analyzes protocols and traffic, so that network and application bottlenecks can be found and prevented effectively and a basis is provided for optimizing network performance.
In the field of network traffic analysis, a common task is to analyze a large amount of heterogeneous traffic and to discover from it the services and the relationship graph between services (the service path graph). This task is the basis for downstream tasks such as network performance management and service performance management.
The prior art has the following technical problem: some automated methods in the related art classify or cluster traffic, but they stop at traffic classification and do not consider discovering the relationships between services. Service discovery alone cannot reveal the relationships between services or the overall service path graph.
Disclosure of Invention
In view of the defects in the prior art, an object of the present invention is to provide a method, a system, and a medium for automatically discovering a service path, which can discover a relationship between services and an overall service path while classifying traffic.
The method and system for automatically discovering a service path provided by the present invention adopt the following scheme:
In a first aspect, a method for automatically discovering a service path is provided, the method comprising:
acquiring network data packets, and finding service endpoints in the network data packets;
after the service endpoints are found, clustering the service endpoints to discover services;
and analyzing the association relationships among the services, and finding the corresponding service paths.
Preferably, finding the service endpoints comprises:
parsing the network data packets to find all communication endpoints, wherein the communication endpoints comprise an IP and a communication peer;
counting the number of communication peers of each communication endpoint;
for the two communication endpoints of each network communication pair, marking the port with more communication peers as the service port;
if the numbers of communication peers are the same, marking the communication endpoint with the smaller port number as the service endpoint.
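The endpoint-marking rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each communication endpoint is modeled as an `(ip, port)` tuple, the input is a list of observed communication pairs, and the function name `find_service_endpoints` is hypothetical.

```python
from collections import defaultdict


def find_service_endpoints(flows):
    """Return the endpoints marked as service endpoints.

    flows: iterable of (endpoint_a, endpoint_b), where each endpoint is an
    (ip, port) tuple. This input format is an assumption for illustration.
    """
    peers = defaultdict(set)  # endpoint -> set of distinct communication peers
    pairs = set()             # distinct network communication pairs
    for a, b in flows:
        peers[a].add(b)
        peers[b].add(a)
        pairs.add(frozenset((a, b)))

    services = set()
    for pair in pairs:
        if len(pair) != 2:
            continue  # skip an endpoint talking to itself
        a, b = tuple(pair)
        if len(peers[a]) != len(peers[b]):
            # The endpoint with more communication peers is the service side.
            services.add(a if len(peers[a]) > len(peers[b]) else b)
        else:
            # Tie: the smaller port number wins. Falling back to the IP when
            # the ports are also equal is an assumption for determinism.
            services.add(min(a, b, key=lambda e: (e[1], e[0])))
    return services
```

For example, an endpoint on port 80 that is contacted from several client endpoints is marked as the service endpoint, while a pair with one peer each is decided by the smaller port number.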
Preferably, clustering the service endpoints to discover services comprises:
for each service endpoint, collecting the network traffic characteristics related to the service endpoint, wherein the network traffic characteristics comprise the port number, the set of communication peers, the received data packets and the sent data packets;
extracting and converting the network traffic characteristics, wherein the extracted features comprise the port number, the IP addresses of the several clients with the largest traffic, and character statistics of the inbound and outbound data packets;
combining the above features to generate a feature vector;
and applying a clustering algorithm to the feature vectors of the service endpoints, wherein each cluster output by the clustering algorithm is taken as a service.
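The feature-vector and clustering steps above might look like the sketch below. The source does not name a clustering algorithm or a vector layout, so everything specific here is an assumption: the hashed encoding of the top client IPs, the use of the printable-byte fraction as the character statistic, the single-linkage threshold clustering, and the names `feature_vector`, `cluster`, `top_k`, `buckets` and `max_dist`.

```python
import math
import zlib
from collections import Counter


def feature_vector(port, client_bytes, inbound, outbound, top_k=3, buckets=8):
    """Combine port, top-k client IPs and character statistics into one vector.

    client_bytes: dict mapping client IP -> traffic volume in bytes.
    inbound / outbound: sample payload bytes for each direction.
    """
    vec = [port / 65535.0]  # port number scaled to [0, 1]
    # Hash the top-k clients by traffic into a fixed number of buckets.
    hashed = [0.0] * buckets
    for ip, _ in Counter(client_bytes).most_common(top_k):
        hashed[zlib.crc32(ip.encode()) % buckets] = 1.0
    vec.extend(hashed)
    # Character statistic (assumed): fraction of printable ASCII bytes.
    for payload in (inbound, outbound):
        printable = sum(32 <= b < 127 for b in payload)
        vec.append(printable / len(payload) if payload else 0.0)
    return vec


def cluster(vectors, max_dist=0.5):
    """Single-linkage clustering: vectors closer than max_dist share a cluster.

    Returns a sorted list of clusters, each a list of indices into `vectors`.
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if math.dist(vectors[i], vectors[j]) < max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned cluster would then be treated as one service. Any other clustering algorithm could be substituted without changing the surrounding steps.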
Preferably, analyzing the association relationships among the services and finding the corresponding service paths comprises:
for each service, counting the number of received packets per time window W, thereby forming a traffic time series;
for any two services, calculating the correlation of the two corresponding traffic time series, wherein each service pair whose correlation exceeds a threshold T is added to a path candidate set;
and taking the services as nodes and each element of the path candidate set as an edge to form a service path graph, which is the final output of the system.
In a second aspect, a system for automatically discovering a service path is provided, the system comprising:
a searching module, configured to acquire network data packets and find service endpoints in the network data packets;
a clustering module, configured to cluster the service endpoints to discover services after the service endpoints are found;
and an analysis module, configured to analyze the association relationships among the services and find the corresponding service paths.
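How the three modules chain together can be sketched as a small pipeline. This is only an architectural illustration: the class name `ServicePathDiscovery`, the callable signatures and the choice to pass the raw packets to every stage are assumptions, not details given in the source.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ServicePathDiscovery:
    """Chains the searching, clustering and analysis modules (hypothetical API)."""

    find_endpoints: Callable[[Any], Any]          # searching module
    cluster_endpoints: Callable[[Any, Any], Any]  # clustering module
    analyze: Callable[[Any, Any], Any]            # analysis module

    def run(self, packets):
        # Stage 1: find the service endpoints in the captured packets.
        endpoints = self.find_endpoints(packets)
        # Stage 2: cluster the endpoints; each cluster is one service.
        services = self.cluster_endpoints(endpoints, packets)
        # Stage 3: relate the services and return the service path graph.
        return self.analyze(services, packets)
```

Injecting each module as a callable keeps the stages independently replaceable, for instance to try a different clustering algorithm.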
Preferably, the searching module is specifically configured for:
parsing the network data packets to find all communication endpoints, wherein the communication endpoints comprise an IP and a communication peer;
counting the number of communication peers of each communication endpoint;
and for the two communication endpoints of each network communication pair, marking the port with more communication peers as the service port, and if the numbers of communication peers are the same, marking the communication endpoint with the smaller port number as the service endpoint.
Preferably, the clustering module is specifically configured for:
for each service endpoint, collecting the network traffic characteristics related to the service endpoint, wherein the network traffic characteristics comprise the port number, the set of communication peers, the received data packets and the sent data packets;
extracting and converting the network traffic characteristics, wherein the extracted features comprise the port number, the IP addresses of the several clients with the largest traffic, and character statistics of the inbound and outbound data packets;
combining the above features to generate a feature vector;
and applying a clustering algorithm to the feature vectors of the service endpoints, wherein each cluster output by the clustering algorithm is taken as a service.
Preferably, the analysis module is specifically configured for:
for each service, counting the number of received packets per time window W, thereby forming a traffic time series;
for any two services, calculating the correlation of the two corresponding traffic time series, wherein each service pair whose correlation exceeds a threshold T is added to a path candidate set;
and taking the services as nodes and each element of the path candidate set as an edge to form a service path graph, which is the final output of the system.
Compared with the prior art, the invention has the following beneficial effects:
1. through the above steps, services can be discovered automatically from a large amount of traffic, reducing the cost of manual sorting;
2. the relationships among the services are discovered and the service paths are obtained, so that the actual network topology can be better understood.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a schematic structural diagram of the method for automatically discovering a service path according to the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the present invention.
Referring to fig. 1, network data packets are first acquired and parsed, and all communication endpoints are found during parsing. The communication endpoints comprise an IP and a communication peer, and once these have been found, the number of communication peers of each endpoint is counted. For the two communication endpoints of each network communication pair, the port with more communication peers is taken as the service port and marked accordingly; if the numbers of communication peers are the same, the port numbers are compared, and the communication endpoint with the smaller port number is taken as the service endpoint and marked accordingly.
After the service endpoints have been found in the network data packets, they need to be clustered. First, the network traffic characteristics related to each service endpoint are collected; these comprise the port number, the set of communication peers, the received data packets and the sent data packets. The collected characteristics are then extracted and converted; the extracted features comprise the port number, the IP addresses of the several clients with the largest traffic, and character statistics of the inbound and outbound data packets, such as character-level TF-IDF values. The extracted features are then combined to generate a feature vector. Finally, a clustering algorithm is applied to the feature vectors of the service endpoints, and each cluster output by the clustering algorithm is taken as a service, whereby the services are discovered.
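The character-level TF-IDF statistic mentioned above can be computed as in the sketch below. The source does not specify a weighting formula, so the common form tf × log(N / df) is assumed here, with each payload treated as one document; the function name `char_tfidf` is hypothetical.

```python
import math
from collections import Counter


def char_tfidf(documents):
    """Return one {symbol: tf-idf weight} dict per document.

    documents: list of payloads (str or bytes), one per sample.
    tf  = occurrences of the symbol / length of the document
    idf = log(number of documents / documents containing the symbol)
    """
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # count each symbol once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {ch: (c / total) * math.log(n / df[ch]) for ch, c in counts.items()}
        )
    return weights
```

Symbols that occur in every payload get a weight of zero, so the resulting features emphasize the characters that distinguish one protocol's payloads from another's.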
After the services are found, the number of received packets is counted for each service per time window W, so that the per-window packet counts form a traffic time series. In this embodiment W is the time-window parameter and may be set to, for example, 1 second, 10 seconds or 1 minute.
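A minimal sketch of forming the traffic time series, assuming packet arrival times are available as numeric timestamps in seconds. The function name `traffic_time_series` and the explicit `start` and `end` bounds are assumptions added for illustration.

```python
import math


def traffic_time_series(timestamps, window, start, end):
    """Count received packets in consecutive windows of `window` seconds.

    Only packets with start <= t < end are counted. The result has one
    entry per window, in time order.
    """
    n_bins = math.ceil((end - start) / window)
    series = [0] * n_bins
    for t in timestamps:
        if start <= t < end:
            # Clamp guards against floating-point rounding at the last edge.
            idx = min(int((t - start) // window), n_bins - 1)
            series[idx] += 1
    return series
```

Using the same `start`, `end` and W for every service yields series of equal length, which the subsequent correlation step requires.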
Next, the association relationships between the services are analyzed. For any two services, the corresponding traffic time series are obtained and their correlation is calculated. If the correlation exceeds the threshold T, the service pair is added to the path candidate set; otherwise it is not added. Here T is a preset threshold parameter that serves as a reference value for further screening the services.
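The pairwise screening can be sketched as below. The source only says "correlation", so the Pearson coefficient is an assumption, as are the names `pearson` and `candidate_pairs` and the convention that a constant series has correlation 0.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def candidate_pairs(series_by_service, threshold):
    """Return the set of service pairs whose correlation exceeds the threshold T."""
    candidates = set()
    # Sorting makes each pair appear once, in a stable (a, b) order.
    for a, b in combinations(sorted(series_by_service), 2):
        if pearson(series_by_service[a], series_by_service[b]) > threshold:
            candidates.add((a, b))
    return candidates
```

Two services whose packet counts rise and fall together, such as a web front end and the application server it calls, would pass the threshold, while an unrelated batch job would not.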
Finally, each service is taken as a node and each element of the path candidate set as an edge, forming a service path graph. This graph is the final output of the system, from which the required service paths are obtained.
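Assembling the service path graph from the candidate set can be sketched as follows. The edges are treated as undirected because a correlation score is symmetric and the source does not say how direction would be determined; the names `build_service_graph` and `connected_groups` are hypothetical, and grouping connected services is an illustrative extra, not a step stated in the source.

```python
def build_service_graph(services, candidate_pairs):
    """Return an adjacency map: service -> set of directly related services."""
    graph = {s: set() for s in services}
    for a, b in candidate_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_groups(graph):
    """Group services that are linked, directly or indirectly, by edges."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            group.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups
```

Services that end up with no edges remain as isolated nodes in the graph, which is itself useful information about the topology.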
The embodiment of the invention provides a method for automatically discovering a service path, which first finds service endpoints, then clusters all the service endpoints to discover the corresponding services, and finally analyzes the association relationships among the services to find the service paths. Traditionally, sorting traffic manually to produce a path graph requires considerable manpower and time. The automatic approach provided by the invention discovers services from a large amount of traffic, which reduces the cost of manual sorting, and at the same time discovers the relationships among the services to find the service paths, which helps in understanding the actual network topology.
The embodiment of the invention also provides a system for automatically discovering a service path, comprising a searching module, a clustering module and an analysis module. The searching module acquires the network data packets and finds the service endpoints in them. To find the service endpoints, the acquired network data packets are parsed and all communication endpoints are found during parsing; the communication endpoints comprise an IP and a communication peer, and after the communication endpoints are found, the number of communication peers of each is counted.
Each network communication pair has two communication endpoints. Of these two endpoints, the port with more communication peers, according to the counts just obtained, is taken as the service port and marked accordingly; when the numbers of communication peers are the same, the communication endpoint with the smaller port number is taken as the service endpoint and marked accordingly.
The clustering module clusters the found service endpoints to discover the services. For each found service endpoint, the related network traffic characteristics are collected; in this embodiment they comprise the port number, the set of communication peers, the received data packets and the sent data packets. The collected characteristics are then extracted and converted; the extracted features comprise the port number, the IP addresses of the several clients with the largest traffic, and character statistics of the inbound and outbound data packets. The extracted features are combined into a feature vector, a clustering algorithm is applied to the feature vectors, and each resulting cluster is output as a service.
After the services are found, the analysis module analyzes the relationships between them to obtain their association relationships and then finds the corresponding service paths. In the analysis module, the number of received packets of each service is counted per time window, with W as the time-window parameter; W may be set to, for example, 1 second, 10 seconds or 1 minute. The packet counts in successive windows of length W form a traffic time series.
For any two services, the correlation of the two corresponding traffic time series is calculated and compared with the threshold T, which is set as a reference value; each service pair whose correlation exceeds T is added to the path candidate set. Finally, with each service as a node and each element of the path candidate set as an edge, a graph is formed; this is the required service path graph and is the output of the system.
The embodiment of the invention provides a system for automatically discovering a service path which, through the searching module, the clustering module and the analysis module, in turn finds the service endpoints, discovers the services, and analyzes the relationships among the services to obtain the service paths. The relationships between services and the overall path graph can therefore be discovered at the same time as the services themselves, which simplifies service discovery and reduces the cost of manual effort.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (9)

1. A method for automatically discovering a service path, the method comprising:
step A: acquiring network data packets, and finding service endpoints in the network data packets;
step B: after the service endpoints are found, clustering the service endpoints to discover services;
step C: analyzing the association relationships among the services, and finding the corresponding service paths.
2. The method of claim 1, wherein step A comprises:
step A-a: parsing the network data packets to find all communication endpoints, wherein the communication endpoints comprise an IP and a communication peer;
step A-b: counting the number of communication peers of each communication endpoint;
step A-c: for the two communication endpoints of each network communication pair, marking the port with more communication peers as the service port;
if the numbers of communication peers are the same, marking the communication endpoint with the smaller port number as the service endpoint.
3. The method of claim 1, wherein step B comprises:
step B-a: for each service endpoint, collecting the network traffic characteristics related to the service endpoint, wherein the network traffic characteristics comprise the port number, the set of communication peers, the received data packets and the sent data packets;
step B-b: extracting and converting the network traffic characteristics, wherein the extracted features comprise the port number, the IP addresses of the several clients with the largest traffic, and character statistics of the inbound and outbound data packets;
step B-c: combining the above features to generate a feature vector;
step B-d: applying a clustering algorithm to the feature vectors of the service endpoints, wherein each cluster output by the clustering algorithm is taken as a service.
4. The method of claim 1, wherein step C comprises:
step C-a: for each service, counting the number of received packets per time window W, thereby forming a traffic time series;
step C-b: for any two services, calculating the correlation of the two corresponding traffic time series, wherein each service pair whose correlation exceeds a threshold T is added to a path candidate set;
step C-d: taking the services as nodes and each element of the path candidate set as an edge to form a service path graph, which is the final output of the system.
5. A system for automatically discovering a service path, the system comprising:
a searching module, configured to acquire network data packets and find service endpoints in the network data packets;
a clustering module, configured to cluster the service endpoints to discover services after the service endpoints are found;
and an analysis module, configured to analyze the association relationships among the services and find the corresponding service paths.
6. The system of claim 5, wherein the searching module is configured for:
parsing the network data packets to find all communication endpoints, wherein the communication endpoints comprise an IP and a communication peer;
counting the number of communication peers of each communication endpoint;
for the two communication endpoints of each network communication pair, marking the port with more communication peers as the service port;
if the numbers of communication peers are the same, marking the communication endpoint with the smaller port number as the service endpoint.
7. The system of claim 5, wherein the clustering module is configured for:
for each service endpoint, collecting the network traffic characteristics related to the service endpoint, wherein the network traffic characteristics comprise the port number, the set of communication peers, the received data packets and the sent data packets;
extracting and converting the network traffic characteristics, wherein the extracted features comprise the port number, the IP addresses of the several clients with the largest traffic, and character statistics of the inbound and outbound data packets;
combining the above features to generate a feature vector;
and applying a clustering algorithm to the feature vectors of the service endpoints, wherein each cluster output by the clustering algorithm is taken as a service.
8. The system of claim 5, wherein the analysis module is configured for:
for each service, counting the number of received packets per time window W, thereby forming a traffic time series;
for any two services, calculating the correlation of the two corresponding traffic time series, wherein each service pair whose correlation exceeds a threshold T is added to a path candidate set;
and taking the services as nodes and each element of the path candidate set as an edge to form a service path graph, which is the final output of the system.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN202110236560.7A 2021-03-03 2021-03-03 Method, system and medium for automatically discovering service path Active CN113037628B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110236560.7A CN113037628B (en) 2021-03-03 2021-03-03 Method, system and medium for automatically discovering service path

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110236560.7A CN113037628B (en) 2021-03-03 2021-03-03 Method, system and medium for automatically discovering service path

Publications (2)

Publication Number Publication Date
CN113037628A (en) 2021-06-25
CN113037628B (en) 2022-11-22

Family

ID=76466498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110236560.7A Active CN113037628B (en) 2021-03-03 2021-03-03 Method, system and medium for automatically discovering service path

Country Status (1)

Country Link
CN (1) CN113037628B (en)

Also Published As

Publication number Publication date
CN113037628B (en) 2022-11-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant