CN110784465A - Data stream detection method and device and electronic equipment - Google Patents

Info

Publication number
CN110784465A
Authority
CN
China
Prior art keywords
data stream
model
classification
flow
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911024915.5A
Other languages
Chinese (zh)
Other versions
CN110784465B (en)
Inventor
王春磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Security Technologies Co Ltd
Original Assignee
New H3C Security Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Security Technologies Co Ltd filed Critical New H3C Security Technologies Co Ltd
Priority to CN201911024915.5A priority Critical patent/CN110784465B/en
Publication of CN110784465A publication Critical patent/CN110784465A/en
Application granted granted Critical
Publication of CN110784465B publication Critical patent/CN110784465B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiment of the invention provides a data stream detection method, a data stream detection device and electronic equipment. The method comprises the following steps: acquiring the traffic model characteristic of the encrypted data message to be transmitted subsequently according to the information carried in the handshake message transmitted in the handshake stage, wherein the traffic model characteristic is used for expressing the transmission characteristic of the encrypted data message to be transmitted; and inputting the flow model characteristics of the encrypted data message into a pre-established data flow classification model for classification to obtain a classification result of the encrypted data message, wherein the classification result is used for indicating whether the encrypted data message contains malicious data. On the premise of not decrypting the encrypted data message, the detection of whether the encrypted data message contains malicious data or not can be realized.

Description

Data stream detection method and device and electronic equipment
Technical Field
The present invention relates to the field of data transmission technologies, and in particular, to a data stream detection method, an apparatus, and an electronic device.
Background
In some application scenarios, for network security, a monitoring mechanism may detect a data flow to determine whether there is data (hereinafter referred to as malicious data) that may affect data security, such as viruses, malicious scripts, and malicious attack messages.
In the related art, the data stream may be analyzed to obtain data transmitted by the data stream, and the data characteristics of the data may be matched with the data characteristics of the malicious data to determine whether the transmitted data includes the malicious data.
However, with the spread of encrypted transmission technologies such as the Transport Layer Security (TLS) protocol, encrypted data messages are now widely used for data transmission. The data in an encrypted data message is transmitted as ciphertext, so a third party that does not know the key can hardly decrypt the message and may be unable to obtain the data it carries. As a result, the related technology cannot effectively detect encrypted data messages.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a data stream detection method, an apparatus, and an electronic device, so as to implement detection on whether malicious data is included in a data stream on the premise that the data stream is not decrypted. The specific technical scheme is as follows:
in a first aspect of the present invention, a data stream detection method is provided, the method comprising:
acquiring the traffic model characteristic of the encrypted data message to be transmitted subsequently according to the information carried in the handshake message transmitted in the handshake stage, wherein the traffic model characteristic is used for expressing the transmission characteristic of the encrypted data message to be transmitted;
and inputting the flow model characteristics of the encrypted data message into a pre-established data flow classification model for classification to obtain a classification result of the encrypted data message, wherein the classification result is used for indicating whether the encrypted data message contains malicious data.
With reference to the first aspect, in one possible embodiment, the flow model features include one or more of the following features:
an SPLT feature, which represents the payload length and the arrival interval of the encrypted data messages to be transmitted, where the arrival interval is the time between the arrival of an encrypted data message and the arrival of the data message preceding it;
a BD feature, which is the distribution of byte values during negotiation of the encryption parameters;
a TLS feature, which represents the configuration parameters of the Transport Layer Security protocol used by the encrypted data messages to be transmitted subsequently;
and an IDP feature, which represents, in chronological order, the data elements in the first handshake message of the handshake messages.
With reference to the first aspect, in a possible embodiment, the data flow classification model is obtained by training in advance in the following manner:
acquiring the flow model characteristics of the first sample data stream;
inputting the flow model characteristic of the first sample data stream into an initial classification model to obtain a classification result of the first sample data stream;
constructing a loss function according to the classification result and the data stream type labeled by the first sample data stream;
and adjusting model parameters in the initial classification model based on the loss function by using a gradient descent algorithm to obtain the data stream classification model.
With reference to the first aspect, in a possible embodiment, the data flow classification model is tested after training is completed by:
acquiring traffic model characteristics of a second sample data stream, wherein the second sample data stream comprises a normal sample data stream and/or an abnormal sample data stream;
inputting the flow model characteristics of the second sample data stream into the data stream classification model to obtain a classification result of the second sample data stream;
determining the score of the data stream classification model according to the similarity degree of the classification result and the data stream classification labeled by the second sample data stream, wherein the score is positively correlated with the similarity degree;
if the score is higher than a preset score threshold value, determining that the data flow classification model passes the test;
and if the score is not higher than a preset score threshold value, adjusting the model parameters of the data flow classification model to obtain a new data flow classification model.
In a second aspect of the present invention, there is provided a data stream detection apparatus, the apparatus comprising:
the characteristic extraction module is used for acquiring the flow model characteristic of the subsequent encrypted data message to be transmitted according to the information carried in the handshake message transmitted in the handshake stage, wherein the flow model characteristic is used for expressing the transmission characteristic of the encrypted data message to be transmitted;
and the classification module is used for inputting the flow model characteristics of the encrypted data message into a pre-established data flow classification model for classification to obtain a classification result of the encrypted data message, wherein the classification result is used for indicating whether the encrypted data message contains malicious data.
With reference to the second aspect, in one possible embodiment, the flow model features include one or more of the following features:
an SPLT feature, which represents the payload length and the arrival interval of the encrypted data messages to be transmitted, where the arrival interval is the time between the arrival of an encrypted data message and the arrival of the encrypted data message preceding it;
a BD feature, which is the distribution of byte values during negotiation of the encryption parameters;
a TLS feature, which represents the configuration parameters of the Transport Layer Security protocol used by the encrypted data messages to be transmitted;
and an IDP feature, which represents, in chronological order, the data elements in the first handshake message of the handshake messages.
In combination with the second aspect, in a possible embodiment, the data flow classification model is trained in advance by:
acquiring the flow model characteristics of the first sample data stream;
inputting the flow model characteristic of the first sample data stream into an initial classification model to obtain a classification result of the first sample data stream;
constructing a loss function according to the classification result and the data stream type labeled by the first sample data stream;
and adjusting model parameters in the initial classification model based on the loss function by using a gradient descent algorithm to obtain the data stream classification model.
With reference to the second aspect, in a possible embodiment, the data flow classification model is tested after training is completed by:
acquiring traffic model characteristics of a second sample data stream, wherein the second sample data stream comprises a normal sample data stream and/or an abnormal sample data stream;
inputting the flow model characteristics of the second sample data stream into the data stream classification model to obtain a classification result of the second sample data stream;
determining the score of the data stream classification model according to the similarity degree of the classification result and the data stream classification labeled by the second sample data stream, wherein the score is positively correlated with the similarity degree;
if the score is higher than a preset score threshold value, determining that the data flow classification model passes the test;
and if the score is not higher than a preset score threshold value, adjusting the model parameters of the data flow classification model to obtain a new data flow classification model.
In a third aspect of the present invention, there is provided an electronic device comprising:
a memory for storing a computer program;
a processor adapted to perform the method steps of any of the above first aspects when executing a program stored in the memory.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, having stored therein a computer program which, when executed by a processor, performs the method steps of any of the above-mentioned first aspects.
According to the data stream detection method, the device and the electronic equipment provided by the embodiment of the invention, the subsequent encrypted data message to be transmitted in the encrypted data stream to be detected can be classified through the flow model characteristics extracted from the handshake message in the handshake phase, so as to determine whether the encrypted data message to be transmitted contains malicious data, and the data characteristics of the data transmitted by the encrypted data message do not need to be acquired, so that the encrypted data message does not need to be decrypted, and the detection of whether the encrypted data message contains the malicious data can be realized on the premise of not decrypting the encrypted data message. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a data stream detection method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of a method for training a data flow classification model according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of a method for testing a data stream classification model according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data stream detection apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a flowchart illustrating a data stream detection method according to an embodiment of the present invention. The method may be applied to any electronic device having a data stream detection function, for example a server device that performs data stream detection on received data streams sent by clients. The method may include:
S101, acquiring, according to the plaintext carried in the handshake message transmitted in the handshake stage, the traffic model feature of the encrypted data message to be transmitted subsequently.
The traffic model feature is used for representing the data transmission mode in the encrypted data message. The plaintext carried in the handshake message may be different according to different application scenarios. Illustratively, taking a handshake message according to the TLS protocol as an example, information contained in the handshake message is expressed in a clear text form. The plaintext comprises the relevant characteristics of the encrypted data message to be transmitted subsequently, so that the traffic model characteristics of the encrypted data message can be acquired according to the plaintext carried in the handshake message transmitted in the handshake stage. In a possible embodiment, the handshake message may include a Hello message, a ClientKeyExchange message, and a ChangeCipherSpec message.
In this example, the traffic model features may include one or more of a Sequence of Packet Lengths and Times (SPLT) feature, a Byte Distribution (BD) feature, a Transport Layer Security (TLS) feature, and an Initial Data Packet (IDP) feature. The traffic model features may be read directly from the handshake messages or calculated from the relevant information in the handshake messages.
The SPLT feature may be used to indicate the payload length and the arrival interval of the encrypted data packet to be subsequently transmitted.
The arrival interval is the time between the arrival of an encrypted data message and the arrival of the data message preceding it. For example, if the arrival interval of the subsequent encrypted data messages, as obtained from the handshake message, is 0.5 ms, then when any encrypted data messages A and B are received, whether they are abnormal messages can be determined by checking whether the arrival interval between A and B is 0.5 ms, without decrypting the encrypted data messages.
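As a minimal sketch of how payload lengths and arrival intervals could be derived from observed messages, the following example assumes each message is recorded as a hypothetical `(arrival_time_ms, payload_length)` pair. The function name `splt_features` and the choice to give the first message an interval of 0.0 are illustrative assumptions, not part of the described method.

```python
from typing import List, Tuple


def splt_features(packets: List[Tuple[float, int]]) -> Tuple[List[int], List[float]]:
    """Return (payload lengths, arrival intervals) for packets listed in arrival order.

    Each packet is a hypothetical (arrival_time_ms, payload_length) pair.
    """
    if not packets:
        return [], []
    lengths = [length for _, length in packets]
    # Assumption: the first packet has no predecessor, so its interval is set to 0.0.
    intervals = [0.0] + [
        later[0] - earlier[0] for earlier, later in zip(packets, packets[1:])
    ]
    return lengths, intervals


packets = [(0.0, 120), (0.5, 1460), (1.25, 80)]
lengths, intervals = splt_features(packets)
```

The two lists can be concatenated or truncated to a fixed length before being used as one part of the traffic model feature vector.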
The BD feature is the distribution of byte values during negotiation of the encryption parameters. In the handshake phase, the local end and the peer end negotiate encryption parameter information such as the encryption protocol used to transmit the encrypted data messages, the number of encrypted data messages to be transmitted, and the size of the encrypted data messages. The distribution of byte values can therefore be determined from each byte of the handshake messages exchanged during this negotiation. To determine the distribution, a counter may be set for each byte value. In one example, if the encrypted transmission protocol negotiated in the handshake is HTTPS, the first byte of the protocol name has the value "H", so the counter for "H" is incremented by one; the second byte has the value "T", so the counter for "T" is incremented by one; and so on. After every byte of the payload has been traversed, the reading of each counter gives the number of occurrences of the corresponding byte value in the payload. These occurrence counts may be used as the BD feature.
Alternatively, the occurrence frequency of each byte value may be calculated from its occurrence count and used as the BD feature, or a further statistic (e.g., entropy, mean, or deviation) may be computed from the occurrence counts or occurrence frequencies and the result used as the BD feature.
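The per-byte-value counters described above can be sketched with Python's standard `collections.Counter`. The function names `byte_distribution` and `byte_frequencies` are hypothetical, and the input `b"HTTPS"` simply mirrors the example in the text.

```python
from collections import Counter
from typing import Dict


def byte_distribution(data: bytes) -> Dict[int, int]:
    """Count how many times each byte value (0-255) occurs in data."""
    return dict(Counter(data))


def byte_frequencies(data: bytes) -> Dict[int, float]:
    """Convert occurrence counts into relative occurrence frequencies."""
    total = len(data)
    if total == 0:
        return {}
    return {value: count / total for value, count in Counter(data).items()}


counts = byte_distribution(b"HTTPS")
freqs = byte_frequencies(b"HTTPS")
```

Here the counter for "T" reads 2 and every other counter reads 1, matching the counting procedure in the example.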
For example, suppose the payload contains $n$ distinct byte values, denoted $v_1$ to $v_n$, and the occurrence frequency of the $i$-th byte value $v_i$ is $P_i$. The entropy of the byte values in the payload can then be calculated from the occurrence frequencies according to the following formula:
[Formula image BDA0002248351120000071 not reproduced in this text.]
The mean and the deviation of the byte values in the payload can each also be calculated according to a formula; those formula images are likewise not reproduced in this text.
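The formulas for these statistics did not survive in this text, so the following is only a sketch under stated assumptions: Shannon entropy with a base-2 logarithm, a frequency-weighted mean, and the standard deviation about that mean. These are common definitions and may differ from the exact formulas in the original figures. The function name `bd_statistics` is hypothetical.

```python
import math
from typing import Dict, Tuple


def bd_statistics(freqs: Dict[int, float]) -> Tuple[float, float, float]:
    """Return (entropy, mean, deviation) for a map of byte value v_i -> frequency P_i.

    Assumed definitions (not taken from the source figures):
      entropy   = -sum(P_i * log2(P_i))
      mean      = sum(P_i * v_i)
      deviation = sqrt(sum(P_i * (v_i - mean) ** 2))
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    mean = sum(v * p for v, p in freqs.items())
    deviation = math.sqrt(sum(p * (v - mean) ** 2 for v, p in freqs.items()))
    return entropy, mean, deviation


# Two byte values, 0 and 2, each occurring half of the time.
entropy, mean, deviation = bd_statistics({0: 0.5, 2: 0.5})
```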
The TLS feature may be used to represent the configuration parameters of the Transport Layer Security protocol used by the encrypted data messages to be transmitted subsequently. These configuration parameters may be agreed in the handshake stage through the handshake messages, so the TLS feature can be obtained from the plaintext carried by the handshake messages transmitted in the handshake stage. The TLS feature is described in detail in the following embodiments and is not elaborated here.
The IDP feature may be used to represent, in chronological order, the data elements in the first data message of the handshake messages. Taking handshake messages transmitted according to the TLS protocol as an example, the Hypertext Transfer Protocol (HTTP) Uniform Resource Locator (URL) and the Domain Name System (DNS) host name or address carried in the first handshake message are obtained; this feature represents information such as the URL and the DNS host name or address used by the data messages to be transmitted subsequently. Depending on the actual application scenario, other data elements may also be included, which is not limited in this embodiment.
S102, inputting the flow model characteristics of the encrypted data message into a pre-established data flow classification model for classification, and obtaining the classification result of the encrypted data message.
The data flow classification model is trained on the traffic model features of a first sample data stream. These are the traffic model features of the encrypted data messages subsequently transmitted in the first sample data stream, obtained from the information carried in the handshake messages transmitted in the handshake stage of that stream.
The first sample data stream includes normal sample data streams, whose labeled data stream type is a normal data stream, and abnormal sample data streams, whose labeled data stream type is an abnormal data stream. The data stream type of the first sample data stream may be labeled manually or by using other data stream detection methods, which is not limited in this embodiment.
The classification result indicates whether the encrypted data message contains malicious data. In a possible embodiment, the classification result of an encrypted data message can only be one of two outcomes: a data stream that includes malicious data (hereinafter referred to as an abnormal data stream) or a data stream that does not include malicious data (hereinafter referred to as a normal data stream). The mapping relationship from traffic model features to classification results that the data flow classification model implements may therefore be a binary mapping, which can be expressed by a logistic function.
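A binary mapping expressed by a logistic function can be sketched as follows. The weights, bias, and the 0.5 decision threshold are hypothetical values chosen for illustration; the text does not specify them.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(features: Sequence[float], weights: Sequence[float],
             bias: float, threshold: float = 0.5) -> bool:
    """Return True (abnormal data stream) if the logistic output reaches the threshold."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z) >= threshold


is_abnormal = classify([1.0, 2.0], weights=[1.0, 1.0], bias=-1.0)
```

The logistic output can be read as a score between 0 and 1, and the threshold turns that score into the two-way normal/abnormal decision.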
The mapping relationship may be obtained based on machine learning, and according to different application scenarios, the mapping relationship may be a neural network model obtained based on deep learning, or an algorithm model obtained based on conventional machine learning, which is not limited in this embodiment.
It can be understood that the transmission mode of the encrypted data packet containing malicious data often has certain regularity, and the traffic model feature may represent the feature of the transmission mode of the encrypted data packet on one or more indexes, so that it may be determined whether the encrypted data packet contains malicious data according to the traffic model feature.
By adopting the embodiment, the subsequent encrypted data message to be transmitted can be classified through the flow model characteristics extracted from the handshake message in the handshake phase so as to determine whether the encrypted data message to be transmitted contains malicious data, and the data characteristics of the data transmitted by the encrypted data message do not need to be acquired, so that the encrypted data message does not need to be decrypted, and the detection of whether the encrypted data message contains the malicious data can be realized on the premise of not decrypting the encrypted data message.
In a possible embodiment, the TLS characteristics obtained from the handshake message may include one or more characteristics shown in the following table, and in other possible embodiments, other characteristics may also be included in the TLS characteristics, which is not limited in this embodiment:
In a possible embodiment, the data flow classification model is described below. Referring to Fig. 2, Fig. 2 is a schematic flow chart of a data flow classification model training method provided by an embodiment of the present invention. It is understood that the method may be applied to any device having a classification model training function, and the method may include:
S201, obtaining the flow model characteristics of the first sample data flow.
And the traffic model characteristic of the first sample data flow is used for expressing the transmission characteristic of the encrypted data message in the first sample data flow. The traffic model feature of the first sample data flow may be a traffic model feature of an encrypted data packet to be subsequently transmitted, which is obtained according to information carried in a handshake packet transmitted by the first sample data flow in the handshake stage. The traffic model feature of the first sample data stream includes the same feature type as the traffic model feature of the encrypted data packet that needs to be detected.
In order to make the trained data stream classification model more accurate, the first sample data stream includes a normal sample data stream with the labeled data stream type being a normal data stream, and also includes an abnormal sample data stream with the labeled data stream type being an abnormal data stream. So that the data flow classification model can learn the flow model characteristics of normal sample data flow and the flow model characteristics of abnormal sample data flow.
For example, assuming that the traffic model features of the encrypted data packet that needs to be subjected to data flow detection include and only include the TLS feature and the BD feature, the traffic model features of the first sample data flow include and only include the TLS feature and the BD feature.
S202, inputting the flow model characteristics of the first sample data stream into the initial classification model to obtain the classification result of the first sample data stream.
The model parameters of the initial classification model may be pre-configured.
It can be understood that the initial classification model can be regarded as a mapping from flow model features to classification results. For convenience of description, assume that the mapping is expressed as a function $y = h(x)$, where $y$ denotes the classification result and $x$ denotes the flow model features. Assuming further that the flow model feature of the $i$-th first sample data stream is expressed as a vector $x_i$, the classification result of the $i$-th first sample data stream can be expressed as $h(x_i)$.
S203, constructing a loss function according to the classification result and the data stream type labeled by the first sample data stream.
Let $y_i$ denote the labeled data stream type of the $i$-th first sample data stream. The loss function can then be calculated in the following form:
[Formula image BDA0002248351120000101 not reproduced in this text.]
where $J$ is the loss function and $m$ is the number of first sample data streams.
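The loss formula itself is not available in this text. As a sketch only, the following assumes the halved mean squared error that is commonly paired with linear regression, J = (1 / 2m) * sum((h(x_i) - y_i)^2); the original figure may define J differently. The function name `squared_error_loss` is hypothetical.

```python
from typing import Callable, Sequence


def squared_error_loss(h: Callable[[Sequence[float]], float],
                       xs: Sequence[Sequence[float]],
                       ys: Sequence[float]) -> float:
    """Assumed loss J = (1 / (2 * m)) * sum over i of (h(x_i) - y_i) ** 2."""
    m = len(xs)
    if m == 0:
        raise ValueError("at least one sample data stream is required")
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Toy mapping h(x): the sum of the feature values (illustrative only).
loss = squared_error_loss(lambda x: sum(x), [[1.0], [2.0]], [1.0, 0.0])
```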
And S204, adjusting model parameters in the initial classification model by using a gradient descent algorithm based on a loss function to obtain a data stream classification model.
Assuming that the flow model features of a sample data stream form an $n$-dimensional vector, the data stream classification model may include $n+1$ parameters, denoted $\theta_0, \theta_1, \ldots, \theta_n$. The mapping relationship may then be a function of the form $h_\theta(X) = \theta^T X$, where $T$ denotes transposition, $\theta$ is the parameter vector composed of $\theta_0, \theta_1, \ldots, \theta_n$, and $X$ is the feature matrix composed of $x_0, x_1, x_2, \ldots, x_n$. $x_0$ is a preset value (for example, $x_0$ may be 1), and the $i$-th row of the feature matrix is $x_{i-1}$.
A batch gradient descent algorithm for linear regression may be used to compute a set of $\theta_0, \theta_1, \ldots, \theta_n$ that minimizes the loss function $J$; this set of $\theta_0, \theta_1, \ldots, \theta_n$ constitutes the model parameters of the data flow classification model.
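A minimal sketch of batch gradient descent for a linear mapping, written in plain Python, is shown below. It assumes the halved mean squared error as the loss, a preset x_0 = 1 prepended to every feature vector, and hypothetical values for the learning rate and the iteration count; none of these specifics are stated in the text.

```python
from typing import List, Sequence


def batch_gradient_descent(xs: Sequence[Sequence[float]], ys: Sequence[float],
                           learning_rate: float = 0.1,
                           iterations: int = 1000) -> List[float]:
    """Fit theta_0..theta_n for h(X) = theta^T X, using every sample in each step."""
    m = len(xs)
    rows = [[1.0] + list(x) for x in xs]  # prepend the preset value x_0 = 1
    theta = [0.0] * len(rows[0])
    for _ in range(iterations):
        # Prediction error h(x_i) - y_i for every sample under the current theta.
        errors = [sum(t * v for t, v in zip(theta, row)) - y
                  for row, y in zip(rows, ys)]
        # Gradient of the assumed loss J = (1 / 2m) * sum(error ** 2).
        gradient = [sum(e * row[j] for e, row in zip(errors, rows)) / m
                    for j in range(len(theta))]
        theta = [t - learning_rate * g for t, g in zip(theta, gradient)]
    return theta


# Toy data generated from y = 1 + 2 * x.
theta = batch_gradient_descent([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0],
                               learning_rate=0.1, iterations=5000)
```

On this toy data the parameters approach theta_0 = 1 and theta_1 = 2; real traffic model features would usually need scaling before a fixed learning rate behaves well.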
In a possible embodiment, in order to improve the accuracy of the trained data stream classification model, the data stream classification model may be tested after training is completed. The test process may be as shown in Fig. 3 and includes:
S301, obtaining the flow model characteristic of the second sample data flow.
The traffic model feature of the second sample data stream represents the transmission characteristics of the encrypted data messages in the second sample data stream. It may be the traffic model feature of the encrypted data messages to be transmitted subsequently, obtained from the information carried in the handshake messages transmitted by the second sample data stream in the handshake stage. The traffic model feature of the second sample data stream includes the same feature types as the traffic model feature of the encrypted data message to be detected. The second sample data stream may include only normal sample data streams, only abnormal sample data streams, or both normal and abnormal sample data streams.
S302, inputting the flow model characteristics of the second sample data flow into the data flow classification model to obtain the classification result of the second sample data flow.
For the data flow classification model, reference may be made to the related descriptions in the foregoing S102 and S202, which are not described herein again.
S303, determining the score of the data stream classification model according to the degree of similarity between the classification result and the data stream category labeled for the second sample data stream.
The score is positively correlated with the degree of similarity. The score may be calculated in different ways depending on the application scenario. For example, in one possible embodiment, the number of second sample data streams whose classification result is the same as the labeled data stream category may be counted, and the score of the data stream classification model is determined from this number or from the ratio of this number to the total number of second sample data streams. In another possible embodiment, the number of abnormal sample data streams whose classification result is the same as the labeled data stream category may be counted, and the ratio of this number to the number of abnormal sample data streams in the second sample data stream is used as the score of the data stream classification model.
S304, determining whether the score is higher than a preset score threshold, if the score is higher than the preset score threshold, executing S305, and if the score is not higher than the preset score threshold, executing S306.
S305, determining that the data flow classification model passes the test.
It can be understood that, if the score is higher than the preset score threshold, the accuracy of the classification result of the data flow classification model may be considered to be higher, and the data flow to be detected may be classified by using the data flow classification model.
S306, adjusting the model parameters of the data flow classification model to obtain a new data flow classification model.
The parameter adjustment process may refer to the foregoing training process of the data stream classification model and is not described again here. It can be understood that if the score is not higher than the preset score threshold, the classification results of the data stream classification model may be considered insufficiently accurate, so the model parameters in the data stream classification model still need further adjustment. After the new data stream classification model is obtained, the new model may in turn be tested.
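The two scoring options described for S303 and the threshold decision of S304 can be sketched as follows. This is a minimal Python sketch under stated assumptions: the label encoding (`NORMAL`, `ABNORMAL`) and all function names are hypothetical, and the embodiment does not prescribe either scoring formula as the only choice.

```python
from typing import Sequence

NORMAL, ABNORMAL = 0, 1  # hypothetical label encoding, not from the embodiment


def accuracy_score(predicted: Sequence[int], labeled: Sequence[int]) -> float:
    """Ratio of second sample data streams whose result matches the label."""
    matches = sum(1 for p, l in zip(predicted, labeled) if p == l)
    return matches / len(labeled)


def abnormal_hit_ratio(predicted: Sequence[int], labeled: Sequence[int]) -> float:
    """Ratio of abnormal sample data streams that were classified as abnormal."""
    abnormal_total = sum(1 for l in labeled if l == ABNORMAL)
    if abnormal_total == 0:
        raise ValueError("the second sample set contains no abnormal streams")
    hits = sum(1 for p, l in zip(predicted, labeled) if l == ABNORMAL and p == l)
    return hits / abnormal_total


def passes_test(score: float, threshold: float) -> bool:
    """S304: the model passes (S305) only if the score is strictly higher
    than the threshold; otherwise its parameters are adjusted (S306)."""
    return score > threshold
```

Both scores grow as more classification results agree with the labels, so either one satisfies the requirement that the score be positively correlated with the degree of similarity.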
Referring to fig. 4, fig. 4 is a schematic structural diagram of a data stream detection apparatus according to an embodiment of the present invention, which may include:
a feature extraction module 401, configured to obtain the traffic model features of an encrypted data packet to be subsequently transmitted according to information carried in a handshake message transmitted in a handshake phase, where the traffic model features are used to represent the transmission characteristics of the encrypted data packet to be transmitted;
a classification module 402, configured to input the traffic model features of the encrypted data packet into a pre-established data stream classification model for classification, so as to obtain a classification result of the encrypted data packet, where the classification result is used to indicate whether the encrypted data packet contains malicious data.
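The relationship between the two modules can be sketched as a small pipeline. This Python sketch is illustrative only: the class names, the callable-based extractor and model, and the 0.5 decision threshold are assumptions and do not describe how modules 401 and 402 are actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

FeatureVector = Sequence[float]


@dataclass
class FeatureExtractionModule:
    """Stands in for module 401; `extract` is a hypothetical extractor."""
    extract: Callable[[bytes], FeatureVector]

    def get_features(self, handshake_message: bytes) -> FeatureVector:
        return self.extract(handshake_message)


@dataclass
class ClassificationModule:
    """Stands in for module 402; `model` is any pre-established classifier."""
    model: Callable[[FeatureVector], float]
    threshold: float = 0.5  # assumed decision threshold

    def classify(self, features: FeatureVector) -> bool:
        # True means the stream is judged to contain malicious data.
        return self.model(features) >= self.threshold


@dataclass
class DataStreamDetector:
    extractor: FeatureExtractionModule
    classifier: ClassificationModule

    def detect(self, handshake_message: bytes) -> bool:
        features = self.extractor.get_features(handshake_message)
        return self.classifier.classify(features)
```

Keeping extraction and classification behind separate interfaces means that a retrained classification model can replace the old one without touching the feature extraction side.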
In one possible embodiment, the flow model features include one or more of the following features:
an SPLT feature, used to represent the payload length and the arrival interval of the encrypted data message to be transmitted, where the arrival interval represents the interval between the arrival time of the encrypted data message and that of the previous encrypted data message;
a BD feature, which is the distribution of byte values during the negotiation of the encryption parameters;
a TLS feature, used to represent the configuration parameters of the Transport Layer Security protocol used by the encrypted data message to be transmitted;
and an IDP feature, used to represent the data elements in the chronologically first handshake message among the handshake messages.
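To make two of these features concrete, the following Python sketch computes an SPLT-style sequence and a BD-style byte distribution from hypothetical inputs. How the embodiment derives these values from handshake information is not detailed in this passage, so the input formats (a list of `(arrival_time, payload_length)` tuples and a raw byte string) and the function names are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def byte_distribution(data: bytes) -> List[float]:
    """Relative frequency of each of the 256 possible byte values."""
    counts = [0] * 256
    for value in data:
        counts[value] += 1
    total = len(data)
    if total == 0:
        return [0.0] * 256
    return [count / total for count in counts]


def splt_sequence(
    packets: Sequence[Tuple[float, int]],
) -> List[Tuple[int, float]]:
    """Return (payload length, arrival interval) for each packet.

    Each input item is (arrival_time_in_seconds, payload_length). The first
    packet has no predecessor, so its interval is taken as 0.0 (an assumption).
    """
    result: List[Tuple[int, float]] = []
    previous_time: Optional[float] = None
    for arrival_time, payload_length in packets:
        interval = 0.0 if previous_time is None else arrival_time - previous_time
        result.append((payload_length, interval))
        previous_time = arrival_time
    return result
```

The outputs of such functions can be flattened and concatenated into the n-dimensional feature vector that is fed to the data stream classification model.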
In one possible embodiment, the data flow classification model is trained in advance by:
acquiring the flow model characteristics of the first sample data stream;
inputting the flow model characteristic of the first sample data stream into the initial classification model to obtain a classification result of the first sample data stream;
constructing a loss function according to the classification result and the data stream type labeled by the first sample data stream;
and adjusting model parameters in the initial classification model by using a gradient descent algorithm based on a loss function to obtain a data stream classification model.
In one possible embodiment, the data flow classification model is trained in advance by:
acquiring the flow model characteristics of the first sample data stream;
inputting the flow model characteristic of the first sample data flow into an initial classification model to obtain a classification result of the first sample data flow, wherein the model structure of the initial classification model is the same as that of the data flow classification model;
constructing a loss function according to the classification result and the data stream type labeled by the first sample data stream;
and adjusting model parameters in the initial classification model by using a gradient descent algorithm based on a loss function to obtain a data stream classification model.
After the training of the data stream classification model is completed, the data stream classification model is tested in the following manner:
acquiring the traffic model characteristics of a second sample data stream, wherein the second sample data stream comprises a normal sample data stream and/or an abnormal sample data stream;
inputting the traffic model features of the second sample data stream into the data stream classification model to obtain a classification result of the second sample data stream;
determining the score of the data stream classification model according to the similarity degree of the classification result and the data stream classification labeled by the second sample data stream, wherein the score is positively correlated with the similarity degree;
if the score is higher than a preset score threshold value, determining that the data flow classification model passes the test;
and if the score is not higher than the preset score threshold value, adjusting the model parameters of the data flow classification model to obtain a new data flow classification model.
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, including:
a memory 501 for storing a computer program;
the processor 502 is configured to implement the following steps when executing the program stored in the memory 501:
acquiring the flow model characteristics of the encrypted data message to be transmitted subsequently according to the information carried in the handshake message transmitted in the handshake stage, wherein the flow model characteristics are used for expressing the transmission characteristics of the encrypted data message to be transmitted;
and inputting the flow model characteristics of the encrypted data message into a pre-established data flow classification model for classification to obtain a classification result of the encrypted data message, wherein the classification result is used for expressing whether the encrypted data message contains malicious data.
In one possible embodiment, the flow model features include one or more of the following features:
an SPLT feature, used to represent the payload length and the arrival interval of the encrypted data message to be transmitted, where the arrival interval represents the interval between the arrival time of the encrypted data message and that of the previous data message;
a BD feature, which is the distribution of byte values during the negotiation of the encryption parameters;
a TLS feature, used to represent the configuration parameters of the Transport Layer Security protocol used by the encrypted data message to be subsequently transmitted;
and an IDP feature, used to represent the data elements in the chronologically first handshake message among the handshake messages.
In one possible embodiment, the data flow classification model is trained in advance by:
acquiring the flow model characteristics of the first sample data stream;
inputting the flow model characteristic of the first sample data stream into the initial classification model to obtain a classification result of the first sample data stream;
constructing a loss function according to the classification result and the data stream type labeled by the first sample data stream;
and adjusting model parameters in the initial classification model by using a gradient descent algorithm based on a loss function to obtain a data stream classification model.
In one possible embodiment, the data flow classification model is tested after training is completed by:
acquiring the traffic model characteristics of a second sample data stream, wherein the second sample data stream comprises a normal sample data stream and/or an abnormal sample data stream;
inputting the traffic model features of the second sample data stream into the data stream classification model to obtain a classification result of the second sample data stream;
determining the score of the data stream classification model according to the similarity degree of the classification result and the data stream classification labeled by the second sample data stream, wherein the score is positively correlated with the similarity degree;
if the score is higher than a preset score threshold value, determining that the data flow classification model passes the test;
and if the score is not higher than the preset score threshold value, adjusting the model parameters of the data flow classification model to obtain a new data flow classification model.
The memory mentioned for the electronic device may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment of the present invention, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to execute any one of the data stream detection methods in the above embodiments.
In a further embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the data stream detection methods of the above embodiments.
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the apparatus, the electronic device, the computer-readable storage medium, and the computer program product, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for data stream detection, the method comprising:
acquiring the traffic model characteristic of the encrypted data message to be transmitted subsequently according to the information carried in the handshake message transmitted in the handshake stage, wherein the traffic model characteristic is used for expressing the transmission characteristic of the encrypted data message to be transmitted;
and inputting the flow model characteristics of the encrypted data message into a pre-established data flow classification model for classification to obtain a classification result of the encrypted data message, wherein the classification result is used for indicating whether the encrypted data message contains malicious data.
2. The method of claim 1, wherein the flow model features include one or more of the following features:
the SPLT characteristic is used for expressing the effective load length and the arrival interval of the encrypted data message to be transmitted, and the arrival interval is used for expressing the interval of the arrival time between the encrypted data message and the previous data message of the encrypted data message;
BD characteristics, wherein the BD characteristics are distribution conditions of byte values in the process of negotiating encryption parameters;
TLS characteristic, which is used to express the configuration parameter in the safety transmission layer protocol used by the following encrypted data message to be transmitted;
and the IDP characteristic is used for expressing data elements in the first handshake message of the handshake messages according to the time sequence.
3. The method of claim 1, wherein the data flow classification model is trained in advance by:
acquiring the flow model characteristics of the first sample data stream;
inputting the flow model characteristic of the first sample data stream into an initial classification model to obtain a classification result of the first sample data stream;
constructing a loss function according to the classification result and the data stream type labeled by the first sample data stream;
and adjusting model parameters in the initial classification model based on the loss function by using a gradient descent algorithm to obtain the data stream classification model.
4. The method of claim 3, wherein the data flow classification model is tested after training is completed by:
acquiring traffic model characteristics of a second sample data stream, wherein the second sample data stream comprises a normal sample data stream and/or an abnormal sample data stream;
inputting the flow model characteristics of the second data stream into the data stream classification model to obtain a classification result of the second sample data stream;
determining the score of the data stream classification model according to the similarity degree of the classification result and the data stream classification labeled by the second sample data stream, wherein the score is positively correlated with the similarity degree;
if the score is higher than a preset score threshold value, determining that the data flow classification model passes the test;
and if the score is not higher than a preset score threshold value, adjusting the model parameters of the data flow classification model to obtain a new data flow classification model.
5. A data flow detection apparatus, characterized in that the apparatus comprises:
the characteristic extraction module is used for acquiring the flow model characteristic of the subsequent encrypted data message to be transmitted according to the information carried in the handshake message transmitted in the handshake stage, wherein the flow model characteristic is used for expressing the transmission characteristic of the encrypted data message to be transmitted;
and the classification module is used for inputting the flow model characteristics of the encrypted data message into a pre-established data flow classification model for classification to obtain a classification result of the encrypted data message, wherein the classification result is used for indicating whether the encrypted data message contains malicious data.
6. The apparatus of claim 5, wherein the flow model features comprise one or more of the following features:
the SPLT characteristic is used for expressing the effective load length and the arrival interval of the encrypted data message to be transmitted, and the arrival interval is used for expressing the interval of the arrival time between the encrypted data message and the previous encrypted data message of the encrypted data message;
BD characteristics, wherein the BD characteristics are distribution conditions of byte values in the process of negotiating encryption parameters;
TLS characteristic, which is used to express the configuration parameter in the safety transmission layer protocol used by the encrypted data message to be transmitted;
and the IDP characteristic is used for expressing data elements in the first handshake message of the handshake messages according to the time sequence.
7. The apparatus of claim 5, wherein the data flow classification model is pre-trained by:
acquiring the flow model characteristics of the first sample data stream;
inputting the flow model characteristic of the first sample data stream into an initial classification model to obtain a classification result of the first sample data stream;
constructing a loss function according to the classification result and the data stream type labeled by the first sample data stream;
and adjusting model parameters in the initial classification model based on the loss function by using a gradient descent algorithm to obtain the data stream classification model.
8. The apparatus of claim 7, wherein the data flow classification model is tested after training is completed by:
acquiring traffic model characteristics of a second sample data stream, wherein the second sample data stream comprises a normal sample data stream and/or an abnormal sample data stream;
inputting the flow model characteristics of the second data stream into the data stream classification model to obtain a classification result of the second sample data stream;
determining the score of the data stream classification model according to the similarity degree of the classification result and the data stream classification labeled by the second sample data stream, wherein the score is positively correlated with the similarity degree;
if the score is higher than a preset score threshold value, determining that the data flow classification model passes the test;
and if the score is not higher than a preset score threshold value, adjusting the model parameters of the data flow classification model to obtain a new data flow classification model.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 4 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 4.
CN201911024915.5A 2019-10-25 2019-10-25 Data stream detection method and device and electronic equipment Active CN110784465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911024915.5A CN110784465B (en) 2019-10-25 2019-10-25 Data stream detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911024915.5A CN110784465B (en) 2019-10-25 2019-10-25 Data stream detection method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110784465A true CN110784465A (en) 2020-02-11
CN110784465B CN110784465B (en) 2023-04-07

Family

ID=69386538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911024915.5A Active CN110784465B (en) 2019-10-25 2019-10-25 Data stream detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110784465B (en)

Also Published As

Publication number Publication date
CN110784465B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant