CN113824644A - Method, device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content - Google Patents

Publication number
CN113824644A
Authority
CN
China
Prior art keywords
service
https
information
record
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010561791.0A
Other languages
Chinese (zh)
Inventor
王伟
程思霖
王磊
卢阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Design Institute Co Ltd
China Mobile Group Shanxi Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Design Institute Co Ltd
China Mobile Group Shanxi Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Design Institute Co Ltd, China Mobile Group Shanxi Co Ltd
Priority to CN202010561791.0A
Publication of CN113824644A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2408 Traffic characterised by specific attributes, e.g. priority or QoS for supporting different services, e.g. a differentiated services [DiffServ] type of service
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H04L47/29 Flow control; Congestion control using a combination of thresholds
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the invention relates to the technical field of mobile internet and discloses a method, a device and equipment for identifying HTTPS (Hypertext Transfer Protocol Secure) service content. The method comprises the following steps: acquiring a service access record of a user; extracting user plane control information of the user according to the service access record; extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information; determining the type of each HTTPS service record according to the HOST information, and calculating the traffic information corresponding to the HTTPS service record according to the page size information; and determining the service content corresponding to the HTTPS service record according to that traffic information and a preset traffic information threshold. With this method, HTTPS service content can be identified accurately.

Description

Method, device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content
Technical Field
The embodiment of the invention relates to the technical field of mobile internet, in particular to a method, a device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content.
Background
HTTPS is an HTTP channel designed for security: on top of HTTP, it protects the transmission process through transport encryption and identity authentication. At present, network operators often need to analyze the content accessed by users in order to obtain statistics on how traffic is distributed across the network.
In the prior art, HTTPS web pages are generally identified by fingerprint extraction and fingerprint matching. Fingerprint extraction proceeds as follows: the ciphertext lengths and encryption modes of a plurality of objects of the HTTPS web page to be processed are obtained from the page's data stream; a plaintext length interval is derived for each object from its ciphertext length and encryption mode, which determines the information of each object, namely its maximum, minimum and average length; and the fingerprint of the HTTPS web page is constructed from the information of its objects. Fingerprint identification then extracts the object information of the HTTPS web page to be identified and matches it against the entries in an HTTPS web page fingerprint library.
In the course of research, the inventors of the present application found that, because the prior art relies on fingerprint extraction and fingerprint identification, its identification accuracy is low, HTTPS services cannot be identified effectively, and leakage is easily caused.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present invention provide an HTTPS service content identification method, apparatus, device and computer storage medium, which are used to solve the problems in the prior art.
According to an aspect of an embodiment of the present invention, there is provided an HTTPS service content identification method, including:
acquiring a service access record of a user;
extracting user plane control information of the user according to the service access record of the user;
extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information;
determining the type of the HTTPS service record according to the HOST information, and calculating the traffic information corresponding to the HTTPS service record according to the page size information;
and determining the service content corresponding to the HTTPS service record according to the traffic information corresponding to the HTTPS service record and a preset traffic information threshold.
Further, the acquiring the service access record of the user includes:
deploying a DPI mirroring split point at a user service access node;
mirroring the service access record of the user through the DPI mirroring split point;
and storing the mirrored data in a shared layer server.
Further, before extracting the HTTPS service record from the user plane control information according to the pre-established service information feature library, the method further includes:
and establishing a service information feature library according to the user plane control information, wherein the service information feature library comprises a plurality of feature codes, and the feature codes at least comprise user numbers, HOST, URLs and page size information.
Further, the determining the type of the HTTPS service record according to the HOST information includes:
acquiring an HTTPS signaling message according to the HOST information;
and analyzing the HTTPS signaling message to obtain the type of the HTTPS service record.
Further, the determining the type of the HTTPS service record according to the HOST information further includes:
acquiring multi-level domain name information corresponding to the HTTPS service record according to the URL information;
and analyzing the multi-level domain name information to obtain the type of the HTTPS service record.
Further, the preset traffic information threshold is generated by training a preset neural network model, which includes:
inputting training data to the preset neural network model, wherein the training data comprises HTTPS service types and traffic information;
and outputting, by the neural network model according to the HTTPS service type and the traffic information, a traffic information threshold corresponding to the service content of each HTTPS service record.
Further, the determining, according to a preset traffic information threshold, the service content corresponding to the HTTPS service record includes:
when the traffic information is smaller than a first traffic information threshold, determining that the service content corresponding to the HTTPS service record is a text service;
when the traffic information is greater than or equal to the first traffic information threshold and smaller than a second traffic information threshold, determining that the service content corresponding to the HTTPS service record is a picture service;
when the traffic information is greater than or equal to the second traffic information threshold and smaller than a third traffic information threshold, determining that the service content corresponding to the HTTPS service record is a file download service;
and when the traffic information is greater than or equal to the third traffic information threshold, determining that the service content corresponding to the HTTPS service record is a video service.
To sum up, in the embodiment of the present application, the user plane control information is analyzed using the pre-established service information feature library, the HTTPS service records are extracted, and the service content corresponding to each HTTPS service record is determined by analyzing its page size. With this method, deep identification of HTTPS traffic requires no new hardware, which saves system resources; HTTPS services are identified more accurately, so the user perception indicators come closer to what users actually experience; and the identification accuracy can be improved continuously. The method focuses on analyzing and identifying the encrypted service itself and requires neither fingerprint identification nor decryption of the HTTPS service, so service security is not compromised.
The foregoing is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features and advantages of the embodiments are easier to understand, the detailed description of the present invention is provided below.
Drawings
The drawings are only for purposes of illustrating embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a schematic application environment diagram illustrating an HTTPS service content identification method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart illustrating an HTTPS service content identification method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an HTTPS service content identification apparatus according to another embodiment of the present invention;
fig. 4 is a schematic structural diagram illustrating an HTTPS service content identification device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein.
Fig. 1 is a schematic diagram of the application environment of the HTTPS service content identification method provided by the embodiment of the present invention. A user terminal accesses the mobile communication network through an access network, reaches the core network through the GGSN/MGW/XGW, passes through a firewall and a router to an internet service provider, and thereby accesses internet services. To analyze the HTTPS data accessed by users in the mobile communication network, the embodiment of the present application provides a convergence distribution server and an application server. When a user accesses an application service provided by an application service provider, the embodiment establishes shared layer data and mirrors the data accessed by the user to a shared layer database. The convergence distribution server and the application server then analyze the user access data, build an HTTPS table, and identify the service content in the HTTPS table using the HTTPS service content identification method provided in the present application, all without affecting the user's access.
Specifically, as shown in fig. 2, the method for identifying HTTPS service content provided by the embodiment of the present application includes:
step 110: acquiring a service access record of a user;
Under the network architecture shown in fig. 1, the service access records of the user are extracted quickly by collecting the server's mirrored DPI split data. A mirroring split point is set at a user access node, for example the GGSN/MGW/XGW node; the service access records of the user are mirrored through this split point, and the mirrored data is stored as shared layer data, usually in a distributed database. In the embodiment of the application, data may be collected through the S1-MME or S1-U interface. After collection is completed, the mirrored data is stored in the convergence distribution server shown in fig. 1, which then analyzes and distributes it.
Step 120: extracting user plane control information of the user according to the service access record of the user;
The convergence distribution server extracts the user plane control information of the user by filtering key fields. The user plane control information comprises the HOST information of the HTTPS request URL, the page size, the resource content contained in the page, the size of that resource content, dynamic resource information, embedded HTTPS, and the like.
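The key-field filtering in step 120 can be pictured with a minimal sketch. The record layout, the field names in `KEY_FIELDS` and the function `extract_user_plane_info` are hypothetical; the source lists the kinds of information involved but defines no schema.

```python
from typing import Any

# Hypothetical key-field names; the source names the kinds of information
# but does not define a record schema.
KEY_FIELDS = ("user_number", "host", "url", "page_size", "resources")


def extract_user_plane_info(access_record: dict[str, Any]) -> dict[str, Any]:
    """Keep only the key fields that are present in one mirrored access record."""
    return {k: access_record[k] for k in KEY_FIELDS if k in access_record}


record = {
    "user_number": "user-001",
    "host": "video.example.com",
    "url": "https://video.example.com/play",
    "page_size": 734_003,
    "tcp_flags": 0x18,  # not a key field, so it is filtered out
}
info = extract_user_plane_info(record)
```

Fields that are absent from a record are simply skipped, so partially populated records still pass through the filter.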
Step 130: extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information;
Because HTTPS is encrypted, a network node knows only the address being accessed and how much content is transferred; it cannot obtain the content of the service. The HTTPS information is therefore extracted by means of a service information feature library. The service information feature library comprises a plurality of feature codes, and the feature codes at least comprise a user number, HOST, URL and page size information. The library consists of feature fields extracted in advance by analyzing user plane control information; these feature fields can be updated dynamically and refreshed incrementally so that the library becomes more comprehensive.
The HTTPS information is extracted according to the service information feature library, and the extracted information is added to the HTTPS_XDR table.
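As a rough illustration of step 130, the sketch below builds rows for an HTTPS_XDR-style table. The matching rule (all four feature fields present and an `https` URL scheme) is an assumption made for this example, as are the lowercase key names and the function `extract_https_records`; the source does not say how records are matched against the feature library.

```python
from urllib.parse import urlsplit

# Feature fields named in the source; the lowercase keys are hypothetical.
FEATURE_FIELDS = ("user_number", "host", "url", "page_size")


def extract_https_records(user_plane_info: list[dict]) -> list[dict]:
    """Return rows for an HTTPS_XDR-style table (assumed matching rule)."""
    https_xdr = []
    for rec in user_plane_info:
        # A record qualifies only if every feature field carries a value.
        has_all_fields = all(rec.get(f) not in (None, "") for f in FEATURE_FIELDS)
        if has_all_fields and urlsplit(rec["url"]).scheme == "https":
            https_xdr.append({f: rec[f] for f in FEATURE_FIELDS})
    return https_xdr


rows = extract_https_records([
    {"user_number": "u1", "host": "a.example.com",
     "url": "https://a.example.com/x", "page_size": 1200},
    {"user_number": "u2", "host": "b.example.com",
     "url": "http://b.example.com/", "page_size": 300},   # plain HTTP, skipped
    {"user_number": "u3", "host": "",
     "url": "https://c.example.com/", "page_size": 5},    # empty HOST, skipped
])
```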
Step 140: determining the type of the HTTPS service record according to the HOST information, and calculating the traffic information corresponding to the HTTPS service record according to the page size information;
An HTTPS signaling message is acquired according to the HOST information, and the HTTPS signaling message is analyzed to obtain the type of the HTTPS service record. In this way the HOST information determines the type of HTTPS service, for example a browsing and downloading HTTPS service, an instant messaging HTTPS service, a video HTTPS service, a shopping HTTPS service, a payment HTTPS service, and the like. The traffic information corresponding to the HTTPS service record is obtained by calculation from the page size information.
By analyzing the HTTPS-related signaling messages, the application identifies and classifies the HTTPS services in the table, processes the screened HTTPS services separately, extracts the related domain name information, and backfills it into the XDR, which makes the domain names of HTTPS traffic visible. In practice, HOST backfill was achieved successfully: the HOST backfill rate of HTTPS records (by record count) is 90.05%. By traffic volume, 77.2% of HTTPS traffic has an identified HOST, 12.8% has an empty HOST, and 10% still needs to be analyzed and identified.
Further, to determine the type of the HTTPS service record more accurately, the present application also obtains the multi-level domain name information corresponding to the HTTPS service record according to the URL information and analyzes it to obtain the type of the HTTPS service record. For example, third-level and fourth-level domain names can reveal a large number of sub-columns and sub-actions, as well as the size of the data packets of a user's service behavior, which to a certain extent fills a gap in HTTPS traffic identification.
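One way to picture the multi-level domain name analysis is the sketch below. It splits a HOST into its domain levels and looks for a type hint in the labels below the second level. The hint table `LABEL_HINTS`, both function names and the default type are hypothetical, and the code assumes a single-label top-level domain (suffixes such as `.com.cn` would need a public-suffix list).

```python
# Hypothetical label-to-type hints; the source does not publish its rules.
LABEL_HINTS = {
    "video": "video",
    "pay": "payment",
    "shop": "shopping",
    "im": "instant messaging",
}


def domain_levels(host: str) -> list[str]:
    """Return the domain names from the top level down to the full host."""
    labels = host.lower().rstrip(".").split(".")
    return [".".join(labels[-n:]) for n in range(1, len(labels) + 1)]


def service_type_from_host(host: str, default: str = "browsing and downloading") -> str:
    """Pick a service type from the third-level and deeper labels, if a hint matches."""
    labels = host.lower().rstrip(".").split(".")
    for label in labels[:-2]:  # labels below the second-level domain
        if label in LABEL_HINTS:
            return LABEL_HINTS[label]
    return default


levels = domain_levels("pay.shop.example.com")
kind = service_type_from_host("pay.shop.example.com")
```

Here the leftmost matching label wins, so a fourth-level label such as `pay` takes precedence over the third-level `shop`; a real deployment might weigh the levels differently.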
Step 150: determining the service content corresponding to the HTTPS service record according to the traffic information corresponding to the HTTPS service record and a preset traffic information threshold;
After the traffic information corresponding to the HTTPS service record is obtained, the corresponding service content is determined by analyzing that traffic information. The HTTPS content does not need to be fingerprinted or decrypted, and the accuracy is high.
The preset traffic information threshold may be a fixed value set in advance. To make the threshold more reliable, however, it can instead be generated dynamically: training data comprising HTTPS service types and traffic information is input to a preset neural network model, and the neural network model outputs, according to the HTTPS service type and the traffic information, a traffic information threshold for each kind of service content of each HTTPS service record. The neural network model may be a convolutional neural network model or another type of neural network model, which is not described in detail herein.
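The source gives no architecture or training procedure for the neural network, so none is reproduced here. Purely to illustrate what deriving thresholds from labeled data means, the sketch below uses a much simpler stand-in that is not the method described above: each threshold is the midpoint between the median traffic of two adjacent content classes. The function name and all sample values are made up for illustration.

```python
from statistics import median


def thresholds_from_samples(samples: dict[str, list[float]], order: list[str]) -> list[float]:
    """Midpoint between the median traffic of each pair of adjacent classes."""
    medians = [median(samples[name]) for name in order]
    return [(lo + hi) / 2 for lo, hi in zip(medians, medians[1:])]


# Made-up traffic values in K, for illustration only.
samples = {
    "text": [5, 10, 15],
    "picture": [40, 50, 60],
    "file download": [200, 300, 400],
    "video": [800, 900, 1000],
}
order = ["text", "picture", "file download", "video"]
learned = thresholds_from_samples(samples, order)
```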
When the traffic information is smaller than a first traffic information threshold, the service content corresponding to the HTTPS service record is determined to be a text service; when the traffic information is greater than or equal to the first traffic information threshold and smaller than a second traffic information threshold, the service content is determined to be a picture service; when the traffic information is greater than or equal to the second traffic information threshold and smaller than a third traffic information threshold, the service content is determined to be a file download service; and when the traffic information is greater than or equal to the third traffic information threshold, the service content is determined to be a video service.
Preferably, the first traffic information threshold is 20K, the second traffic information threshold is 100K, and the third traffic information threshold is 500K.
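The three-threshold rule can be sketched as a lookup with Python's `bisect` module. The preferred values 20K, 100K and 500K come from the text; treating K as kilobytes is an assumption, since the source does not state the unit, and the names `THRESHOLDS_K`, `CONTENT` and `classify_content` are hypothetical.

```python
from bisect import bisect_right

# Preferred thresholds from the text; the unit "K" is assumed to mean kilobytes.
THRESHOLDS_K = (20, 100, 500)
CONTENT = ("text", "picture", "file download", "video")


def classify_content(traffic_k: float) -> str:
    """Map traffic to service content; each lower bound is inclusive."""
    # bisect_right counts the thresholds that are <= traffic_k, which is
    # exactly the index of the matching content class.
    return CONTENT[bisect_right(THRESHOLDS_K, traffic_k)]
```

Using `bisect_right` makes a value equal to a threshold fall into the higher class, matching the "greater than or equal to" wording of the rule.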
The calculation rules for each HTTPS service type are as follows, where Δ denotes the traffic information of the HTTPS service record:
browsing and downloading HTTPS service: Δ < 20K returns text; 20K ≤ Δ < 100K returns picture; 100K ≤ Δ < 500K returns file download; Δ ≥ 500K returns video;
instant messaging HTTPS service: Δ < 20K returns text transmission; 20K ≤ Δ < 100K returns picture; 100K ≤ Δ < 500K returns voice; Δ ≥ 500K returns video delivery;
video HTTPS service: Δ < 20K returns text; 20K ≤ Δ < 100K returns picture; 100K ≤ Δ < 500K returns file download; 500K ≤ Δ < 3000K returns advertisement; Δ ≥ 3000K returns video download;
shopping HTTPS service: Δ < 20K returns text and payment; 20K ≤ Δ < 100K returns picture; 100K ≤ Δ < 500K returns file download; Δ ≥ 500K returns video;
payment HTTPS service: Δ < 20K returns text; 20K ≤ Δ < 100K returns picture; 100K ≤ Δ < 500K returns file download.
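The per-type rules above lend themselves to a table-driven sketch. The threshold values and content labels follow the text; the dictionary `RULES`, the function `classify` and the choice to make each lower bound inclusive (following the "greater than or equal to" wording used elsewhere in the description) are assumptions. The payment rules define nothing at or above 500K, so that case is returned as `None` rather than guessed.

```python
from bisect import bisect_right
from typing import Optional

# (thresholds in K, content labels); None marks a range the text leaves undefined.
RULES = {
    "browsing and downloading": ((20, 100, 500),
                                 ("text", "picture", "file download", "video")),
    "instant messaging": ((20, 100, 500),
                          ("text", "picture", "voice", "video")),
    "video": ((20, 100, 500, 3000),
              ("text", "picture", "file download", "advertisement", "video download")),
    "shopping": ((20, 100, 500),
                 ("text and payment", "picture", "file download", "video")),
    "payment": ((20, 100, 500),
                ("text", "picture", "file download", None)),
}


def classify(service_type: str, traffic_k: float) -> Optional[str]:
    """Return the service content for one record, or None if undefined."""
    thresholds, labels = RULES[service_type]
    return labels[bisect_right(thresholds, traffic_k)]
```

Keeping the rules in data rather than in branching code means that a regenerated threshold set only replaces the tuples, not the logic.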
Therefore, the present application determines the corresponding HTTPS content type by analyzing the traffic of the service, which makes the analysis simple and the accuracy high.
To sum up, in the embodiment of the present application, the user plane control information is analyzed using the pre-established service information feature library, the HTTPS service records are extracted, and the service content corresponding to each HTTPS service record is determined by analyzing its page size. With this method, deep identification of HTTPS traffic requires no new hardware, which saves system resources; HTTPS services are identified more accurately, so the user perception indicators come closer to what users actually experience; and the identification accuracy can be improved continuously. The method focuses on analyzing and identifying the encrypted service itself and requires neither fingerprint identification nor decryption of the HTTPS service, so service security is not compromised.
Further, fig. 3 shows a schematic diagram of an HTTPS service content identification apparatus 300 according to an embodiment of the present invention, including: a service access record obtaining module 310, a user plane control information extracting module 320, an HTTPS service record extracting module 330, and a service content determining module 340;
the service access record obtaining module 310 is configured to obtain a service access record of a user;
the user plane control information extracting module 320 is configured to extract user plane control information of the user according to the service access record of the user;
the HTTPS service record extraction module 330 is configured to extract an HTTPS service record from the user plane control information according to a pre-established service information feature library, where the HTTPS service record includes HOST information and page size information; and to determine the type of the HTTPS service record according to the HOST information and calculate the traffic information corresponding to the HTTPS service record according to the page size information;
the service content determining module 340 is configured to determine the service content corresponding to the HTTPS service record according to the traffic information corresponding to the HTTPS service record and a preset traffic information threshold.
Further, the service access record obtaining module 310 deploys a DPI mirroring split point at the user service access node, mirrors the service access record of the user through the DPI mirroring split point, and stores the mirrored data in a shared layer server.
Further, the user plane control information extracting module 320 is further configured to establish a service information feature library according to the user plane control information, where the service information feature library includes a plurality of feature codes, and the feature codes at least include a user number, a HOST, a URL, and page size information; acquiring an HTTPS signaling message according to the HOST information; and analyzing the HTTPS signaling message to obtain the type of the HTTPS service record.
Further, the user plane control information extracting module 320 is further configured to obtain, according to the URL information, multi-level domain name information corresponding to the HTTPS service record, and to analyze the multi-level domain name information to obtain the type of the HTTPS service record.
Further, the HTTPS service content identification apparatus 300 further includes a training module, configured to input training data to a preset neural network model, where the training data includes an HTTPS service type and traffic information; the neural network model outputs, according to the HTTPS service type and the traffic information, a traffic information threshold corresponding to the service content of each HTTPS service record.
Further, the service content determining module 340 is configured to determine that the service content corresponding to the HTTPS service record is a text service when the traffic information is smaller than a first traffic information threshold; a picture service when the traffic information is greater than or equal to the first traffic information threshold and smaller than a second traffic information threshold; a file download service when the traffic information is greater than or equal to the second traffic information threshold and smaller than a third traffic information threshold; and a video service when the traffic information is greater than or equal to the third traffic information threshold.
The embodiment of the invention provides a computer-readable storage medium, where at least one executable instruction is stored in the storage medium, and when the executable instruction runs on an HTTPS service content identification device, the HTTPS service content identification device is enabled to execute an HTTPS service content identification method in any method embodiment described above.
The executable instructions may be specifically configured to cause the HTTPS service content identification apparatus to perform the following operations:
acquiring a service access record of a user;
extracting user plane control information of the user according to the service access record of the user;
extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information;
determining the type of the HTTPS service record according to the HOST information, and calculating the traffic information corresponding to the HTTPS service record according to the page size information;
and determining the service content corresponding to the HTTPS service record according to the traffic information corresponding to the HTTPS service record and a preset traffic information threshold.
Further, the acquiring the service access record of the user includes:
deploying a DPI mirroring split point at a user service access node;
mirroring the service access record of the user through the DPI mirroring split point;
and storing the mirrored data in a shared layer server.
Further, before extracting the HTTPS service record from the user plane control information according to the pre-established service information feature library, the method further includes:
and establishing a service information feature library according to the user plane control information, wherein the service information feature library comprises a plurality of feature codes, and the feature codes at least comprise user numbers, HOST, URLs and page size information.
Further, the determining the type of the HTTPS service record according to the HOST information includes:
acquiring an HTTPS signaling message according to the HOST information;
and analyzing the HTTPS signaling message to obtain the type of the HTTPS service record.
Further, the determining the type of the HTTPS service record according to the HOST information further includes:
acquiring multi-level domain name information corresponding to the HTTPS service record according to the URL information;
and analyzing the multi-level domain name information to obtain the type of the HTTPS service record.
Further, the preset traffic information threshold is generated after training according to a preset neural network model, and includes:
inputting training data to the preset neural network model, wherein the training data comprises HTTPS service types and flow information;
and the neural network model outputs, according to the HTTPS service type and the flow information, a flow information threshold corresponding to the service content of each HTTPS service record.
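The application does not describe the architecture of the neural network or its training procedure, so no attempt is made to reproduce it here. The sketch below deliberately substitutes a much simpler statistical stand-in (midpoints between per-class mean flows) solely to show the shape of the inputs and outputs: labelled flow samples go in, three ascending thresholds come out. The label names and the assumption that training samples are labelled by content class are hypothetical.

```python
from statistics import mean

# Content classes in ascending order of expected flow (assumed ordering).
CONTENT_ORDER = ("text", "picture", "file_download", "video")


def derive_thresholds(samples):
    """Derive three thresholds from (content_label, flow_bytes) samples.

    Stand-in for the trained model: each threshold is the midpoint between
    the mean flows of two adjacent content classes. Every class must have
    at least one sample, otherwise statistics.mean raises StatisticsError.
    """
    by_label = {label: [] for label in CONTENT_ORDER}
    for label, flow in samples:
        by_label[label].append(flow)
    centers = [mean(by_label[label]) for label in CONTENT_ORDER]
    return [(low + high) / 2 for low, high in zip(centers, centers[1:])]


samples = [
    ("text", 10), ("text", 30),
    ("picture", 100), ("picture", 300),
    ("file_download", 1_000), ("file_download", 3_000),
    ("video", 10_000), ("video", 30_000),
]
thresholds = derive_thresholds(samples)
```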
Further, the determining, according to a preset traffic information threshold, the service content corresponding to the HTTPS service record includes:
when the flow information is smaller than a first flow information threshold value, determining that the service content corresponding to the HTTPS service record is a text service;
when the flow information is greater than or equal to the first flow information threshold and smaller than a second flow information threshold, determining that the service content corresponding to the HTTPS service record is a picture service;
when the flow information is greater than or equal to the second flow information threshold and smaller than a third flow information threshold, determining that the service content corresponding to the HTTPS service record is a file download service;
and when the flow information is greater than or equal to the third flow information threshold value, determining that the service content corresponding to the HTTPS service record is a video service.
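The four threshold rules above amount to locating the flow value within three ascending boundaries. A minimal sketch using Python's standard `bisect` module follows; the function name and the example threshold values are illustrative only.

```python
from bisect import bisect_right

# Service content for each band, in ascending order of flow.
CONTENT_BY_BAND = ("text", "picture", "file download", "video")


def classify_content(flow, thresholds):
    """Map a flow value to a service content class.

    thresholds holds the first, second and third flow information
    thresholds in ascending order. A flow equal to a threshold falls into
    the higher band, matching the "greater than or equal to" wording.
    """
    if len(thresholds) != 3 or list(thresholds) != sorted(thresholds):
        raise ValueError("expected three thresholds in ascending order")
    return CONTENT_BY_BAND[bisect_right(thresholds, flow)]


example_thresholds = [100, 1_000, 10_000]  # illustrative values in bytes
```

`bisect_right` is used rather than `bisect_left` so that a flow exactly equal to a threshold is assigned to the upper band.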
In summary, in the embodiments of the present application, the user plane control information is analyzed against the pre-established service information feature library, HTTPS service records are extracted, and the service content corresponding to each HTTPS service record is determined from the page size of that record. With this method, deep identification of HTTPS traffic requires no new hardware equipment, which saves system resources. The identification accuracy for HTTPS services is higher and can be improved continuously, and the user perception index more closely reflects what users actually perceive. Because the method focuses on analyzing and identifying the encrypted service without fingerprinting or decrypting the HTTPS traffic, service security is not compromised.
Fig. 4 is a schematic structural diagram illustrating an embodiment of an HTTPS service identification device provided by the present invention, and a specific implementation of the HTTPS service identification device is not limited in the specific embodiment of the present invention.
As shown in fig. 4, the HTTPS service identification device may include: a processor 402, a communications interface 404, a memory 406, and a communications bus 408.
The processor 402, the communications interface 404, and the memory 406 communicate with one another via the communications bus 408. The communications interface 404 is used to communicate with network elements of other devices, such as clients or other servers. The processor 402 is configured to execute the program 410, and may specifically perform the relevant steps of the foregoing embodiments of the HTTPS service identification method.
In particular, program 410 may include program code comprising computer-executable instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The HTTPS service identification device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used to store the program 410. The memory 406 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
Specifically, the program 410 may be invoked by the processor 402 to cause the HTTPS service identification device to perform the following operations:
acquiring a service access record of a user;
extracting user plane control information of the user according to the service access record of the user;
extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information;
determining the type of the HTTPS service record according to the HOST information, and calculating the flow information corresponding to the HTTPS service record according to the page size information;
and determining the service content corresponding to the HTTPS service record according to the flow information corresponding to the HTTPS service record and a preset flow information threshold value.
Further, the acquiring the service access record of the user includes:
deploying a DPI mirroring splitting point at a user service access node;
mirroring the service access record of the user through the DPI mirroring splitting point;
and storing the mirrored data in a sharing layer server.
Further, before extracting the HTTPS service record from the user plane control information according to the pre-established service information feature library, the method further includes:
and establishing a service information feature library according to the user plane control information, wherein the service information feature library comprises a plurality of feature codes, and the feature codes at least comprise user numbers, HOST, URLs and page size information.
Further, the determining the type of the HTTPS service record according to the HOST information includes:
acquiring an HTTPS signaling message according to the HOST information;
and analyzing the HTTPS signaling message to obtain the type of the HTTPS service record.
Further, the determining the type of the HTTPS service record according to the HOST information further includes:
acquiring multi-level domain name information corresponding to the HTTPS service record according to the URL information;
and analyzing the multi-level domain name information to obtain the type of the HTTPS service record.
Further, the preset traffic information threshold is generated after training according to a preset neural network model, and includes:
inputting training data to the preset neural network model, wherein the training data comprises HTTPS service types and flow information;
and the neural network model outputs, according to the HTTPS service type and the flow information, a flow information threshold corresponding to the service content of each HTTPS service record.
Further, the determining, according to a preset traffic information threshold, the service content corresponding to the HTTPS service record includes:
when the flow information is smaller than a first flow information threshold value, determining that the service content corresponding to the HTTPS service record is a text service;
when the flow information is greater than or equal to the first flow information threshold and smaller than a second flow information threshold, determining that the service content corresponding to the HTTPS service record is a picture service;
when the flow information is greater than or equal to the second flow information threshold and smaller than a third flow information threshold, determining that the service content corresponding to the HTTPS service record is a file download service;
and when the flow information is greater than or equal to the third flow information threshold value, determining that the service content corresponding to the HTTPS service record is a video service.
In summary, in the embodiments of the present application, the user plane control information is analyzed against the pre-established service information feature library, HTTPS service records are extracted, and the service content corresponding to each HTTPS service record is determined from the page size of that record. With this method, deep identification of HTTPS traffic requires no new hardware equipment, which saves system resources. The identification accuracy for HTTPS services is higher and can be improved continuously, and the user perception index more closely reflects what users actually perceive. Because the method focuses on analyzing and identifying the encrypted service without fingerprinting or decrypting the HTTPS traffic, service security is not compromised.
The embodiment of the invention provides an HTTPS service identification device, which is used for executing the HTTPS service identification method.
The embodiment of the invention provides a computer program, which can be invoked by a processor to cause an HTTPS service identification device to execute the HTTPS service identification method in any of the above method embodiments.
An embodiment of the present invention provides a computer program product, where the computer program product includes a computer program stored on a computer-readable storage medium, and the computer program includes program instructions, when the program instructions are run on a computer, the computer is caused to execute the HTTPS service identification method in any of the above-described method embodiments.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (10)

1. An HTTPS service content identification method is characterized by comprising the following steps:
acquiring a service access record of a user;
extracting user plane control information of the user according to the service access record of the user;
extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information;
determining the type of the HTTPS service record according to the HOST information, and calculating the flow information corresponding to the HTTPS service record according to the page size information;
and determining the service content corresponding to the HTTPS service record according to the flow information corresponding to the HTTPS service record and a preset flow information threshold value.
2. The HTTPS service content identification method of claim 1, wherein the obtaining a service access record of a user comprises:
deploying a DPI mirroring splitting point at a user service access node;
mirroring the service access record of the user through the DPI mirroring splitting point;
and storing the mirrored data in a sharing layer server.
3. The method for identifying HTTPS service content according to claim 1, wherein before extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, further comprising:
and establishing a service information feature library according to the user plane control information, wherein the service information feature library comprises a plurality of feature codes, and the feature codes at least comprise user numbers, HOST, URLs and page size information.
4. The HTTPS service content identification method of claim 3, wherein the determining the type of the HTTPS service record according to the HOST information comprises:
acquiring an HTTPS signaling message according to the HOST information;
and analyzing the HTTPS signaling message to obtain the type of the HTTPS service record.
5. The HTTPS service content identification method of claim 4, wherein the determining the type of the HTTPS service record according to the HOST information further comprises:
acquiring multi-level domain name information corresponding to the HTTPS service record according to the URL information;
and analyzing the multi-level domain name information to obtain the type of the HTTPS service record.
6. The HTTPS service content recognition method of claim 1, wherein the preset traffic information threshold is generated after training according to a preset neural network model, and comprises:
inputting training data to the preset neural network model, wherein the training data comprises HTTPS service types and flow information;
and the neural network model outputs, according to the HTTPS service type and the flow information, a flow information threshold corresponding to the service content of each HTTPS service record.
7. The method for identifying HTTPS service content according to claim 6, wherein the determining the service content corresponding to the HTTPS service record according to a preset traffic information threshold includes:
when the flow information is smaller than a first flow information threshold value, determining that the service content corresponding to the HTTPS service record is a text service;
when the flow information is greater than or equal to the first flow information threshold and smaller than a second flow information threshold, determining that the service content corresponding to the HTTPS service record is a picture service;
when the flow information is greater than or equal to the second flow information threshold and smaller than a third flow information threshold, determining that the service content corresponding to the HTTPS service record is a file download service;
and when the flow information is greater than or equal to the third flow information threshold value, determining that the service content corresponding to the HTTPS service record is a video service.
8. An HTTPS service content recognition apparatus, comprising:
a service access record acquisition module, configured to acquire a service access record of a user;
a user plane control information extraction module, configured to extract user plane control information of the user according to the service access record of the user;
an HTTPS service record extraction module, configured to extract HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information, and further configured to determine the type of the HTTPS service record according to the HOST information and to calculate the flow information corresponding to the HTTPS service record according to the page size information;
and a service content determination module, configured to determine the service content corresponding to the HTTPS service record according to the flow information corresponding to the HTTPS service record and a preset flow information threshold value.
9. An HTTPS service content recognition device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operations of the HTTPS service content identification method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that at least one executable instruction is stored in the storage medium, which when run on an HTTPS service content identification device, causes the HTTPS service content identification device to perform the operations of the HTTPS service content identification method according to any one of claims 1 to 7.
CN202010561791.0A 2020-06-18 2020-06-18 Method, device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content Pending CN113824644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010561791.0A CN113824644A (en) 2020-06-18 2020-06-18 Method, device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content

Publications (1)

Publication Number Publication Date
CN113824644A true CN113824644A (en) 2021-12-21

Family

ID=78924372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010561791.0A Pending CN113824644A (en) 2020-06-18 2020-06-18 Method, device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content

Country Status (1)

Country Link
CN (1) CN113824644A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114513685A (en) * 2022-01-28 2022-05-17 武汉绿色网络信息服务有限责任公司 Method and device for identifying HTTPS encrypted video stream based on stream characteristics

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070266045A1 (en) * 2006-05-11 2007-11-15 Computer Associates Think, Inc. Hierarchy for characterizing interactions with an application
CN101977235A (en) * 2010-11-03 2011-02-16 北京北信源软件股份有限公司 URL (Uniform Resource Locator) filtering method aiming at HTTPS (Hypertext Transport Protocol Server) encrypted website access
CN103618726A (en) * 2013-12-04 2014-03-05 北京中创信测科技股份有限公司 Method for recognizing mobile data service based on HTTPS
IL230004A0 (en) * 2012-12-31 2014-03-31 Huawei Tech Co Ltd Application identification method, and data mining method, apparatus, and system
CN105787512A (en) * 2016-02-29 2016-07-20 南京邮电大学 Network browsing and video classification method based on novel characteristic selection method
CN107438254A (en) * 2016-05-25 2017-12-05 中兴通讯股份有限公司 Business recognition method, device and system based on user behavior
CN108173781A (en) * 2017-12-20 2018-06-15 广东宜通世纪科技股份有限公司 HTTPS method for recognizing flux, device, terminal device and storage medium
CN109905288A (en) * 2018-12-21 2019-06-18 中国科学院信息工程研究所 A kind of application service classification method and device
WO2019119837A1 (en) * 2017-12-21 2019-06-27 华为技术有限公司 Service identification method and device, and network device
CN110011931A (en) * 2019-01-25 2019-07-12 中国科学院信息工程研究所 A kind of encryption traffic classes detection method and system
CN111030941A (en) * 2019-10-29 2020-04-17 武汉瑞盈通网络技术有限公司 Decision tree-based HTTPS encrypted flow classification method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LAURENT BERNAILLE ET AL.: "Early recognition of encrypted applications", PROCEEDING OF THE 8TH INTERNATIONAL CONFERENCE ON PASSIVE AND ACTIVE NETWORK MEASUREMENT, 5 April 2007 (2007-04-05) *
WAZEN M. SHBAIR ET AL.: "A multi-level framework to identify HTTPS services", IEEE, 4 July 2016 (2016-07-04) *
YANJIE FU ET AL.: "Service Usage Classification with Encrypted Internet Traffic in Mobile Messaging Apps", IEEE, 8 January 2016 (2016-01-08) *
张磊;赵辉;: "一种HTTPS应用的层次分类方法", 网络新媒体技术, no. 03, 15 May 2020 (2020-05-15) *
陈贞贞: "基于DPI和机器学习的加密流量类型识别研究", 信息通信, 11 June 2018 (2018-06-11) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination