CN113824644A - Method, device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content - Google Patents
- Publication number
- CN113824644A (CN202010561791.0A)
- Authority
- CN
- China
- Prior art keywords
- service
- https
- information
- record
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04L47/2408: Traffic characterised by specific attributes, e.g. priority or QoS for supporting different services, e.g. a differentiated services [DiffServ] type of service
- H04L47/2483: Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
- H04L47/29: Flow control; Congestion control using a combination of thresholds
- H04L67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H04L67/06: Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
- H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The embodiment of the invention relates to the technical field of mobile internet, and discloses a method, a device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content, wherein the method comprises the following steps: acquiring a service access record of a user; extracting user plane control information of the user according to the service access record of the user; extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information; determining the type of the HTTPS service record according to the HOST information, and calculating the traffic information corresponding to the HTTPS service record according to the page size information; and determining the service content corresponding to the HTTPS service record according to the traffic information corresponding to the HTTPS service record and a preset traffic information threshold. With this method, HTTPS service content can be identified accurately.
Description
Technical Field
The embodiment of the invention relates to the technical field of mobile internet, in particular to a method, a device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content.
Background
HTTPS is an HTTP channel designed for security: building on HTTP, it secures the transmission process through transport encryption and identity authentication. At present, network operators often need to analyze the content that users access in order to measure how traffic is distributed across the network.
In the prior art, fingerprint extraction and fingerprint identification of HTTPS web pages are generally used. Fingerprint extraction proceeds as follows: the ciphertext lengths and encryption modes of a plurality of objects of the HTTPS web page to be processed are obtained from the data stream of that page; the plaintext length interval of each object is derived from its ciphertext length and encryption mode in order to determine the information of each object, which comprises the maximum length, minimum length and average length corresponding to the object; and the fingerprint of the HTTPS web page is constructed from the information of its objects. Fingerprint identification proceeds as follows: the object information of the HTTPS web page to be identified is extracted and matched against the information in an HTTPS web page fingerprint library.
In the course of research, the inventor of the present application found that, because the prior art depends on fingerprint extraction and fingerprint identification, its identification accuracy is low, HTTPS services cannot be identified effectively, and services are easily missed.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present invention provide an HTTPS service content identification method, apparatus, device and computer storage medium, which are used to solve the problems in the prior art.
According to an aspect of an embodiment of the present invention, there is provided an HTTPS service content identification method, including:
acquiring a service access record of a user;
extracting user plane control information of the user according to the service access record of the user;
extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information;
determining the type of the HTTPS service record according to the HOST information, and calculating the traffic information corresponding to the HTTPS service record according to the page size information;
and determining the service content corresponding to the HTTPS service record according to the flow information corresponding to the HTTPS service record and a preset flow information threshold value.
Further, the acquiring the service access record of the user includes:
deploying a DPI mirroring split point at a user service access node;
mirroring the service access record of the user through the DPI mirroring split point;
and storing the mirrored data in a shared layer server.
Further, before extracting the HTTPS service record from the user plane control information according to the pre-established service information feature library, the method further includes:
and establishing a service information feature library according to the user plane control information, wherein the service information feature library comprises a plurality of feature codes, and the feature codes at least comprise user numbers, HOST, URLs and page size information.
Further, the determining the type of the HTTPS service record according to the HOST information includes:
acquiring an HTTPS signaling message according to the HOST information;
and analyzing the HTTPS signaling message to obtain the type of the HTTPS service record.
Further, the determining the type of the HTTPS service record according to the HOST information further includes:
acquiring multi-level domain name information corresponding to the HTTPS service record according to the URL information;
and analyzing the multi-level domain name information to obtain the type of the HTTPS service record.
Further, the preset traffic information threshold is generated after training according to a preset neural network model, and includes:
inputting training data to the preset neural network model, wherein the training data comprises HTTPS service types and flow information;
and outputting, by the neural network model, a traffic information threshold corresponding to the service content of each HTTPS service record according to the HTTPS service type and the traffic information.
Further, the determining, according to a preset traffic information threshold, the service content corresponding to the HTTPS service record includes:
when the flow information is smaller than a first flow information threshold value, determining that the service content corresponding to the HTTPS service record is a text service;
when the flow information is greater than or equal to the first flow information threshold and smaller than a second flow information threshold, determining that the service content corresponding to the HTTPS service record is a picture service;
when the flow information is greater than or equal to the second flow information threshold and smaller than a third flow information threshold, determining that the service content corresponding to the HTTPS service record is a file download service;
and when the flow information is greater than or equal to the third flow information threshold value, determining that the service content corresponding to the HTTPS service record is a video service.
In summary, in the embodiments of the present application, the user plane control information is analyzed against the pre-established service information feature library, the HTTPS service records are extracted, and the service content corresponding to each HTTPS service record is determined from its page size. With this method, deep identification of HTTPS traffic requires no new hardware, which saves system resources; HTTPS services are identified more accurately, so the user perception indicators track actual user experience more closely, and the identification accuracy can be improved continuously. The method focuses on analyzing and identifying encrypted services without fingerprinting or decrypting HTTPS traffic, so service security is not compromised.
The foregoing is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means can be understood more clearly and implemented according to the description, and so that the foregoing and other objects, features, and advantages are more readily apparent, specific embodiments of the present invention are described below.
Drawings
The drawings are only for purposes of illustrating embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is a schematic application environment diagram illustrating an HTTPS service content identification method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart illustrating an HTTPS service content identification method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an HTTPS service content identification apparatus according to another embodiment of the present invention;
Fig. 4 is a schematic structural diagram illustrating an HTTPS service content identification device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein.
Fig. 1 is a schematic diagram of the application environment of the HTTPS service content identification method provided by an embodiment of the present invention. A user terminal accesses the mobile communication network through an access network, reaches the core network through the GGSN/MGW/XGW, passes through a firewall and a router to an internet service provider, and accesses internet services. To analyze the HTTPS data accessed by users in the mobile communication network, the embodiment of the present application provides a convergence distribution server and an application server. When a user accesses an application service provided by an application service provider, the user's access data is mirrored to a shared layer database. The convergence distribution server and the application server then analyze the mirrored data, build an HTTPS table, and identify the service content in that table using the HTTPS service content identification method provided in the present application, all without affecting the user's access.
Specifically, as shown in fig. 2, the method for identifying HTTPS service content provided by the embodiment of the present application includes:
Step 110: acquiring a service access record of a user;
Under the network architecture shown in Fig. 1, the user's service access records are extracted quickly by collecting mirrored DPI optical-split data from the servers. A mirroring split point is set at a user access node, for example the GGSN/MGW/XGW node; the user's service access records are mirrored through the split point, and the mirrored data is stored as shared layer data, usually in a distributed database. In the embodiment of the application, data may be collected through the S1-MME or S1-U interface. After collection, the mirrored data is stored in the convergence distribution server shown in Fig. 1, which then analyzes and distributes it.
Step 120: extracting user plane control information of the user according to the service access record of the user;
The convergence distribution server extracts the user plane control information of a user by filtering on key fields. The user plane control information includes the HOST information of the HTTPS request URL, the page size, the resources contained in the page, the size of those resources, dynamic resource information, embedded HTTPS, and the like.
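The key-field filtering in Step 120 can be illustrated with a minimal sketch. The field names in `KEY_FIELDS` and the sample record are hypothetical; the source lists the kinds of information extracted but does not define a record schema.

```python
from typing import Any

# Hypothetical key fields; the source names the kinds of information
# (HOST, URL, page size, page resources) but not a concrete schema.
KEY_FIELDS = ("user_number", "host", "url", "page_size", "resources")


def extract_control_info(access_record: dict[str, Any]) -> dict[str, Any]:
    """Keep only the key fields that are present in one mirrored access record."""
    return {k: access_record[k] for k in KEY_FIELDS if k in access_record}


record = {
    "user_number": "u-001",
    "host": "video.example.com",
    "url": "https://video.example.com/play",
    "page_size": 734003,
    "tcp_flags": "ACK",  # not a key field, so it is dropped
}
info = extract_control_info(record)
```

Fields absent from a record are simply omitted rather than filled with defaults, so later steps can tell an empty HOST from a missing one.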
Step 130: extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information;
Because HTTPS is encrypted, a network node knows only the address being accessed and how much content is transferred; it cannot obtain the service content itself. The HTTPS information is therefore extracted by means of a service information feature library. The service information feature library comprises a plurality of feature codes, and the feature codes at least comprise user numbers, HOST, URLs and page size information. The library consists of feature fields extracted in advance by analyzing user plane control information; these fields can be updated dynamically and refreshed incrementally so that the library becomes more comprehensive.
The HTTPS information is extracted according to the service information feature library, and the extracted information is added to an HTTPS_XDR table.
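As a rough sketch of Step 130, the following builds an in-memory stand-in for the HTTPS_XDR table. The row layout mirrors the four feature-code fields named above; the matching rule (an `https://` URL whose HOST appears in the library) is an assumption, since the source does not say how a record is matched against the feature library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpsXdrRow:
    """One row of a hypothetical in-memory HTTPS_XDR table."""
    user_number: str
    host: str
    url: str
    page_size: int


def build_https_xdr(control_infos, feature_hosts):
    """Keep HTTPS records whose HOST is known to the feature library.

    Assumption: matching is done on the HOST value only.
    """
    rows = []
    for info in control_infos:
        url = info.get("url", "")
        host = info.get("host", "")
        if url.startswith("https://") and host in feature_hosts:
            rows.append(
                HttpsXdrRow(
                    user_number=info.get("user_number", ""),
                    host=host,
                    url=url,
                    page_size=int(info.get("page_size", 0)),
                )
            )
    return rows


infos = [
    {"user_number": "u-001", "host": "video.example.com",
     "url": "https://video.example.com/play", "page_size": 734003},
    {"user_number": "u-002", "host": "plain.example.com",
     "url": "http://plain.example.com/", "page_size": 1200},
    {"user_number": "u-003", "host": "other.example.org",
     "url": "https://other.example.org/", "page_size": 50},
]
table = build_https_xdr(infos, feature_hosts={"video.example.com"})
```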
Step 140: determining the type of the HTTPS service record according to the HOST information, and calculating the traffic information corresponding to the HTTPS service record according to the page size information;
An HTTPS signaling message is acquired according to the HOST information and parsed to obtain the type of the HTTPS service record. The type determined from the HOST information is, for example, a browsing and downloading HTTPS service, an instant messaging HTTPS service, a video HTTPS service, a shopping HTTPS service, or a payment HTTPS service. The traffic information corresponding to the HTTPS service record is obtained by calculation from the page size information.
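One simple way to picture the type determination is a HOST-suffix lookup, sketched below. The rule table, the example hosts, and the fallback to the browsing and downloading type are all assumptions for illustration; the source obtains the type by parsing the HTTPS signaling message and does not publish its rules.

```python
# Hypothetical HOST-suffix rules; the five type names follow the service
# types listed in the text, the hosts are placeholders.
HOST_TYPE_RULES = {
    "video.example.com": "video",
    "shop.example.com": "shopping",
    "pay.example.com": "payment",
    "im.example.com": "instant_messaging",
}
DEFAULT_TYPE = "browsing_download"  # assumption: fallback for unmatched hosts


def service_type_for_host(host: str) -> str:
    """Return the service type for a HOST by exact or sub-domain match."""
    host = host.lower().rstrip(".")
    for suffix, service_type in HOST_TYPE_RULES.items():
        if host == suffix or host.endswith("." + suffix):
            return service_type
    return DEFAULT_TYPE
```

Matching on `"." + suffix` rather than a bare `endswith(suffix)` avoids treating a host such as `notpay.example.com` as a payment service.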
By analyzing the HTTPS-related signaling messages, the present application identifies and classifies the HTTPS services in the table, processes the screened HTTPS services separately, extracts the related domain name information, and backfills it into the XDR (x detail record) so that the domain names of HTTPS traffic become visible. In practice, HOST backfill has been achieved: the HOST backfill rate of HTTPS records, measured by record count, is 90.05%. Measured by traffic, 77.2% of HTTPS traffic has an identified HOST, 12.8% has an empty HOST, and 10% still needs to be analyzed and identified.
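The two ways of measuring backfill, by record count and by traffic share, can be sketched as follows. The record layout (`host`, `bytes`) and the sample values are hypothetical and are not the data behind the figures quoted above.

```python
def host_backfill_stats(records):
    """Return (share of records with a HOST, share of traffic with a HOST)."""
    total = len(records)
    total_bytes = sum(r["bytes"] for r in records)
    filled = [r for r in records if r.get("host")]  # empty or None counts as unfilled
    by_count = len(filled) / total if total else 0.0
    by_traffic = sum(r["bytes"] for r in filled) / total_bytes if total_bytes else 0.0
    return by_count, by_traffic


sample = [
    {"host": "a.example.com", "bytes": 300},
    {"host": "", "bytes": 100},
    {"host": "b.example.com", "bytes": 600},
    {"host": None, "bytes": 0},
]
by_count, by_traffic = host_backfill_stats(sample)
```

The two ratios generally differ, because a few large flows can dominate the traffic share while contributing little to the record count.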
Further, in order to determine the type of the HTTPS service record more accurately, the present application also obtains the multi-level domain name information corresponding to the HTTPS service record according to the URL information, and analyzes the multi-level domain name information to obtain the type of the HTTPS service record. For example, the level-3 or level-4 domain name can identify a large number of sub-columns and sub-actions, as well as the size of the data packets for a user service behavior, which fills the gap in HTTPS traffic identification to a certain extent.
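A minimal sketch of deriving the multi-level domain names from a URL is given below. It counts levels naively from the right-hand label and does not consult a public-suffix list, which is a simplifying assumption; the example URL is a placeholder.

```python
from urllib.parse import urlsplit


def domain_levels(url: str) -> list[str]:
    """Return the level-1, level-2, ... domain names of the URL's host."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".") if host else []
    return [".".join(labels[-n:]) for n in range(1, len(labels) + 1)]


levels = domain_levels("https://live.video.example.com/room/42")
# levels[2] is the level-3 domain and levels[3] the level-4 domain
```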
Step 150: determining the service content corresponding to the HTTPS service record according to the traffic information corresponding to the HTTPS service record and a preset traffic information threshold;
After the traffic information corresponding to the HTTPS service record is obtained, the corresponding service content is determined by analyzing that traffic information. The HTTPS content does not need to be fingerprinted or decrypted, and the accuracy is very high.
The preset traffic information threshold may be a fixed value set in advance. To make the threshold more reliable, however, it may instead be generated dynamically: training data comprising HTTPS service types and traffic information is input to a preset neural network model, and the neural network model outputs, according to the HTTPS service type and the traffic information, a traffic information threshold corresponding to each service content of each HTTPS service record. The neural network model may be a convolutional neural network model or another type of neural network model, which is not described in detail herein.
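The idea of deriving thresholds from labeled traffic samples can be illustrated with a deliberately simple stand-in. The sketch below is not the neural network model described above: it swaps in a midpoint-of-class-means heuristic, and the sample values and label names are hypothetical.

```python
from statistics import mean


def derive_thresholds(samples, ordered_labels):
    """Return one boundary between each pair of adjacent content labels.

    Stand-in heuristic (an assumption, not the source's model): each boundary
    is the midpoint of the mean traffic of two neighboring labels. Every label
    in ordered_labels must have at least one sample.
    """
    by_label = {label: [] for label in ordered_labels}
    for label, traffic in samples:
        by_label[label].append(traffic)
    means = [mean(by_label[label]) for label in ordered_labels]
    return [(lo + hi) / 2 for lo, hi in zip(means, means[1:])]


training = [
    ("text", 10.0), ("text", 14.0),
    ("picture", 50.0), ("picture", 70.0),
    ("file download", 200.0), ("file download", 400.0),
    ("video", 800.0), ("video", 1200.0),
]
thresholds = derive_thresholds(
    training, ["text", "picture", "file download", "video"]
)
```

Running this once per HTTPS service type would give each type its own set of boundaries, which matches the per-type thresholds the text describes.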
When the flow information is smaller than a first flow information threshold value, determining that the service content corresponding to the HTTPS service record is a text service; when the flow information is greater than or equal to the first flow information threshold and smaller than a second flow information threshold, determining that the service content corresponding to the HTTPS service record is a picture service; when the flow information is greater than or equal to the second flow information threshold and smaller than a third flow information threshold, determining that the service content corresponding to the HTTPS service record is a file download service; and when the flow information is greater than or equal to the third flow information threshold value, determining that the service content corresponding to the HTTPS service record is a video service.
Preferably, the first flow information threshold is 20K, the second flow information threshold is 100K, and the third flow information threshold is 500K.
The calculation formulas for each HTTPS service type, where Δ denotes the traffic information, are as follows:
browsing and downloading HTTPS service: Δ < 20K returns text; 20K ≤ Δ < 100K returns picture; 100K ≤ Δ < 500K returns file download; Δ ≥ 500K returns video;
instant messaging HTTPS service: Δ < 20K returns text transmission; 20K ≤ Δ < 100K returns picture; 100K ≤ Δ < 500K returns voice; Δ ≥ 500K returns video delivery;
video HTTPS service: Δ < 20K returns text; 20K ≤ Δ < 100K returns picture; 100K ≤ Δ < 500K returns file download; 500K ≤ Δ < 3000K returns advertisement; Δ ≥ 3000K returns video download;
shopping HTTPS service: Δ < 20K returns text and payment; 20K ≤ Δ < 100K returns picture; 100K ≤ Δ < 500K returns file download; Δ ≥ 500K returns video;
payment HTTPS service: Δ < 20K returns text; 20K ≤ Δ < 100K returns picture; 100K ≤ Δ < 500K returns file download.
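The per-type formulas can be expressed as a small lookup, sketched below under a few assumptions: traffic is passed in the same "K" units the text uses (the source does not state the underlying unit), boundaries are treated as inclusive on the lower side, the type keys are hypothetical names, and the payment type returns `None` at or above 500K because the source gives no label for that range.

```python
from bisect import bisect_right

# Thresholds in "K" units as written in the text, with one label per band.
BANDS = {
    "browsing_download": ([20, 100, 500],
                          ["text", "picture", "file download", "video"]),
    "instant_messaging": ([20, 100, 500],
                          ["text transmission", "picture", "voice", "video delivery"]),
    "video": ([20, 100, 500, 3000],
              ["text", "picture", "file download", "advertisement", "video download"]),
    "shopping": ([20, 100, 500],
                 ["text and payment", "picture", "file download", "video"]),
    # The source lists no label for payment traffic at or above 500K.
    "payment": ([20, 100, 500],
                ["text", "picture", "file download", None]),
}


def classify(service_type: str, traffic_k: float):
    """Map traffic (in K) to a service content label for the given type."""
    thresholds, labels = BANDS[service_type]
    # bisect_right counts the thresholds that are <= traffic_k, so a value
    # equal to a threshold falls into the upper band.
    return labels[bisect_right(thresholds, traffic_k)]
```

For example, `classify("video", 1000)` falls in the 500K to 3000K band and yields the advertisement label, while the same traffic under the browsing and downloading type yields video.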
Therefore, the method and the device realize the determination of the corresponding HTTPS content type by analyzing the traffic of the service, and have the advantages of simple analysis and high accuracy.
Further, fig. 3 shows a schematic diagram of an HTTPS service content identification apparatus 300 according to an embodiment of the present invention, including: a service access record obtaining module 310, a user plane control information extracting module 320, an HTTPS service record extracting module 330, and a service content determining module 340;
the service access record obtaining module 310 is configured to obtain a service access record of a user;
the user plane control information extracting module 320 is configured to extract user plane control information of the user according to the service access record of the user;
the HTTPS service record extraction module 330 is configured to extract an HTTPS service record from the user plane control information according to a pre-established service information feature library, where the HTTPS service record includes HOST information and page size information; to determine the type of the HTTPS service record according to the HOST information; and to calculate the traffic information corresponding to the HTTPS service record according to the page size information;
the service content determining module 340 is configured to determine the service content corresponding to the HTTPS service record according to the traffic information corresponding to the HTTPS service record and a preset traffic information threshold.
Further, the service access record obtaining module 310 deploys a DPI mirroring split point at the user service access node; mirrors the service access record of the user through the DPI mirroring split point; and stores the mirrored data in a shared layer server.
Further, the user plane control information extracting module 320 is further configured to establish a service information feature library according to the user plane control information, where the service information feature library includes a plurality of feature codes, and the feature codes at least include a user number, a HOST, a URL, and page size information; acquiring an HTTPS signaling message according to the HOST information; and analyzing the HTTPS signaling message to obtain the type of the HTTPS service record.
Further, the user plane control information extracting module 320 is further configured to obtain, according to the URL information, multi-level domain name information corresponding to the HTTPS service record; and to analyze the multi-level domain name information to obtain the type of the HTTPS service record.
Further, the HTTPS service content identification apparatus 300 further includes a training module, configured to input training data to a preset neural network model, where the training data includes an HTTPS service type and traffic information; the neural network model outputs a traffic information threshold corresponding to the service content of each HTTPS service record according to the HTTPS service type and the traffic information.
Further, the service content determining module 340 is configured to determine that the service content corresponding to the HTTPS service record is a text service when the traffic information is smaller than a first traffic information threshold; when the flow information is greater than or equal to the first flow information threshold and less than a second flow information threshold, determining that the service content corresponding to the HTTPS service record is a picture service; when the flow information is greater than or equal to the second flow information threshold and smaller than a third flow information threshold, determining that the service content corresponding to the HTTPS service record is a file download service; and when the flow information is greater than or equal to the third flow information threshold value, determining that the service content corresponding to the HTTPS service record is a video service.
The embodiment of the invention provides a computer-readable storage medium, where at least one executable instruction is stored in the storage medium, and when the executable instruction runs on an HTTPS service content identification device, the HTTPS service content identification device is enabled to execute an HTTPS service content identification method in any method embodiment described above.
The executable instructions may be specifically configured to cause the HTTPS service content identification apparatus to perform the following operations:
acquiring a service access record of a user;
extracting user plane control information of the user according to the service access record of the user;
extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information;
determining the type of the HTTPS service record according to the HOST information, and calculating the traffic information corresponding to the HTTPS service record according to the page size information;
and determining the service content corresponding to the HTTPS service record according to the flow information corresponding to the HTTPS service record and a preset flow information threshold value.
Further, the acquiring the service access record of the user includes:
deploying a DPI mirroring split point at a user service access node;
mirroring the service access record of the user through the DPI mirroring split point;
and storing the mirrored data in a shared layer server.
Further, before extracting the HTTPS service record from the user plane control information according to the pre-established service information feature library, the method further includes:
and establishing a service information feature library according to the user plane control information, wherein the service information feature library comprises a plurality of feature codes, and the feature codes at least comprise user numbers, HOST, URLs and page size information.
Further, the determining the type of the HTTPS service record according to the HOST information includes:
acquiring an HTTPS signaling message according to the HOST information;
and analyzing the HTTPS signaling message to obtain the type of the HTTPS service record.
Further, the determining the type of the HTTPS service record according to the HOST information further includes:
acquiring multi-level domain name information corresponding to the HTTPS service record according to the URL information;
and analyzing the multi-level domain name information to obtain the type of the HTTPS service record.
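One way to picture the multi-level domain name analysis is a lookup that tries the most specific domain level first and then falls back to shorter suffixes. The sketch below assumes this lookup order; the table `DOMAIN_TYPES`, its entries, and the function names are hypothetical and are not taken from the source.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit

# Hypothetical mapping from domain suffix to service type (illustrative only).
DOMAIN_TYPES: Dict[str, str] = {
    "video.example.com": "video portal",
    "example.com": "general portal",
}


def domain_levels(url: str) -> List[str]:
    """Return the domain suffixes of the URL's host, most specific first."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Stop before the bare top-level label (for example "com").
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def service_type_for(url: str, table: Dict[str, str] = DOMAIN_TYPES) -> Optional[str]:
    """Return the type of the first (most specific) suffix found in the table."""
    for suffix in domain_levels(url):
        if suffix in table:
            return table[suffix]
    return None
```

For example, `service_type_for("https://cdn.video.example.com/a.mp4")` matches `video.example.com` before it would reach `example.com`, so a sub-domain can carry a more specific type than its parent domain.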
Further, the preset traffic information threshold is generated after training according to a preset neural network model, and includes:
inputting training data to the preset neural network model, wherein the training data comprises HTTPS service types and flow information;
and the neural network model outputs, according to the HTTPS service type and the traffic information, a traffic information threshold corresponding to the service content of each HTTPS service record.
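The text does not specify the neural network model, so the following sketch deliberately swaps in a much simpler technique to show what "thresholds derived from labelled traffic samples" can look like: each threshold is the midpoint between the mean traffic of two adjacent content classes. The function name, the label strings, and the sample values are assumptions made for illustration; this is not the training procedure of the embodiment.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, List, Tuple

# Content classes in the order used by the threshold comparison in the text.
CONTENT_ORDER = ["text", "picture", "file download", "video"]


def derive_thresholds(samples: Iterable[Tuple[str, float]]) -> List[float]:
    """Midpoint heuristic (NOT a neural network): one threshold per adjacent pair.

    Each sample is (content_label, traffic). Every label in CONTENT_ORDER must
    have at least one sample, and class means are assumed to be ascending.
    """
    by_label = defaultdict(list)
    for label, traffic in samples:
        by_label[label].append(traffic)
    means = [mean(by_label[label]) for label in CONTENT_ORDER]
    return [(low + high) / 2 for low, high in zip(means, means[1:])]


training = [
    ("text", 10), ("text", 30),
    ("picture", 200),
    ("file download", 2000),
    ("video", 20000),
]
thresholds = derive_thresholds(training)
```

Since the training data in the text also includes the HTTPS service type, a heuristic like this could be run once per service type to obtain type-specific thresholds.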
Further, the determining, according to a preset traffic information threshold, the service content corresponding to the HTTPS service record includes:
when the flow information is smaller than a first flow information threshold value, determining that the service content corresponding to the HTTPS service record is a text service;
when the flow information is greater than or equal to the first flow information threshold and smaller than a second flow information threshold, determining that the service content corresponding to the HTTPS service record is a picture service;
when the flow information is greater than or equal to the second flow information threshold and smaller than a third flow information threshold, determining that the service content corresponding to the HTTPS service record is a file download service;
and when the flow information is greater than or equal to the third flow information threshold value, determining that the service content corresponding to the HTTPS service record is a video service.
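The four threshold comparisons above amount to locating the flow value among three ascending thresholds. A minimal sketch, assuming the thresholds are supplied as a sequence of three numbers (the function name and label strings are hypothetical):

```python
from bisect import bisect_right
from typing import Sequence

CONTENT_LABELS = ("text", "picture", "file download", "video")


def classify_content(flow: float, thresholds: Sequence[float]) -> str:
    """Map a flow value to a service content label.

    bisect_right counts how many thresholds are <= flow, which reproduces the
    "smaller than" / "greater than or equal to" boundaries of the four cases.
    """
    if len(thresholds) != 3 or list(thresholds) != sorted(thresholds):
        raise ValueError("expected three ascending thresholds")
    return CONTENT_LABELS[bisect_right(thresholds, flow)]
```

With illustrative thresholds `(100, 1000, 10000)`, a flow value exactly equal to the first threshold is classified as a picture service, matching the "greater than or equal to" wording of the second case.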
In summary, in this embodiment of the present application, the user plane control information is analyzed against the preset service information feature library to extract the HTTPS service records, and the service content corresponding to each HTTPS service record is determined by analyzing its page size. With this method, deep identification of HTTPS traffic requires no new hardware equipment, which saves system resources. The identification accuracy for HTTPS services is higher and can be improved continuously, and the user perception indicators come closer to actual user perception. Because the method focuses on analyzing and identifying the encrypted service without fingerprinting or decrypting the HTTPS traffic, service security is not compromised.
Fig. 4 is a schematic structural diagram of an embodiment of the HTTPS service identification device provided by the present invention; the specific embodiments of the present invention do not limit the specific implementation of the device.
As shown in fig. 4, the HTTPS service identification device may include: a processor 402, a communication interface 404, a memory 406, and a communication bus 408.
The processor 402, the communication interface 404, and the memory 406 communicate with each other via the communication bus 408. The communication interface 404 is used for communicating with network elements of other devices, such as clients or other servers. The processor 402 is configured to execute the program 410, and may specifically execute the relevant steps of the foregoing HTTPS service identification method embodiments.
In particular, the program 410 may include program code comprising computer-executable instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The HTTPS service identification device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used for storing the program 410. The memory 406 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
Specifically, the program 410 may be invoked by the processor 402 to cause the HTTPS service identification device to perform the following operations:
acquiring a service access record of a user;
extracting user plane control information of the user according to the service access record of the user;
extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information;
determining the type of the HTTPS service record according to the HOST information, and calculating the flow information corresponding to the HTTPS service record according to the page size information;
and determining the service content corresponding to the HTTPS service record according to the flow information corresponding to the HTTPS service record and a preset flow information threshold value.
Further, the acquiring the service access record of the user includes:
deploying a DPI mirroring splitting point at a user service access node;
mirroring the service access record of the user through the DPI mirroring splitting point;
and storing the mirror image data in a sharing layer server.
Further, before extracting the HTTPS service record from the user plane control information according to the pre-established service information feature library, the method further includes:
and establishing a service information feature library according to the user plane control information, wherein the service information feature library comprises a plurality of feature codes, and the feature codes at least comprise user numbers, HOST, URLs and page size information.
Further, the determining the type of the HTTPS service record according to the HOST information includes:
acquiring an HTTPS signaling message according to the HOST information;
and analyzing the HTTPS signaling message to obtain the type of the HTTPS service record.
Further, the determining the type of the HTTPS service record according to the HOST information further includes:
acquiring multi-level domain name information corresponding to the HTTPS service record according to the URL information;
and analyzing the multi-level domain name information to obtain the type of the HTTPS service record.
Further, the preset traffic information threshold is generated after training according to a preset neural network model, and includes:
inputting training data to the preset neural network model, wherein the training data comprises HTTPS service types and flow information;
and the neural network model outputs, according to the HTTPS service type and the traffic information, a traffic information threshold corresponding to the service content of each HTTPS service record.
Further, the determining, according to a preset traffic information threshold, the service content corresponding to the HTTPS service record includes:
when the flow information is smaller than a first flow information threshold value, determining that the service content corresponding to the HTTPS service record is a text service;
when the flow information is greater than or equal to the first flow information threshold and smaller than a second flow information threshold, determining that the service content corresponding to the HTTPS service record is a picture service;
when the flow information is greater than or equal to the second flow information threshold and smaller than a third flow information threshold, determining that the service content corresponding to the HTTPS service record is a file download service;
and when the flow information is greater than or equal to the third flow information threshold value, determining that the service content corresponding to the HTTPS service record is a video service.
In summary, in this embodiment of the present application, the user plane control information is analyzed against the preset service information feature library to extract the HTTPS service records, and the service content corresponding to each HTTPS service record is determined by analyzing its page size. With this method, deep identification of HTTPS traffic requires no new hardware equipment, which saves system resources. The identification accuracy for HTTPS services is higher and can be improved continuously, and the user perception indicators come closer to actual user perception. Because the method focuses on analyzing and identifying the encrypted service without fingerprinting or decrypting the HTTPS traffic, service security is not compromised.
The embodiment of the invention provides an HTTPS service identification device, which is used for executing the HTTPS service identification method.
The embodiment of the invention provides a computer program, which can be called by a processor to enable the HTTPS service identification device to execute the HTTPS service identification method in any of the above method embodiments.
An embodiment of the present invention provides a computer program product, where the computer program product includes a computer program stored on a computer-readable storage medium, and the computer program includes program instructions, when the program instructions are run on a computer, the computer is caused to execute the HTTPS service identification method in any of the above-described method embodiments.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera does not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.
Claims (10)
1. An HTTPS service content identification method is characterized by comprising the following steps:
acquiring a service access record of a user;
extracting user plane control information of the user according to the service access record of the user;
extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information;
determining the type of the HTTPS service record according to the HOST information, and calculating the flow information corresponding to the HTTPS service record according to the page size information;
and determining the service content corresponding to the HTTPS service record according to the flow information corresponding to the HTTPS service record and a preset flow information threshold value.
2. The HTTPS service content identification method of claim 1, wherein the obtaining a service access record of a user comprises:
deploying a DPI mirroring splitting point at a user service access node;
mirroring the service access record of the user through the DPI mirroring splitting point;
and storing the mirror image data in a sharing layer server.
3. The method for identifying HTTPS service content according to claim 1, wherein before extracting HTTPS service records from the user plane control information according to a pre-established service information feature library, further comprising:
and establishing a service information feature library according to the user plane control information, wherein the service information feature library comprises a plurality of feature codes, and the feature codes at least comprise user numbers, HOST, URLs and page size information.
4. The HTTPS service content identification method of claim 3, wherein the determining the type of the HTTPS service record according to the HOST information comprises:
acquiring an HTTPS signaling message according to the HOST information;
and analyzing the HTTPS signaling message to obtain the type of the HTTPS service record.
5. The HTTPS service content identification method of claim 4, wherein the determining the type of the HTTPS service record according to the HOST information further comprises:
acquiring multi-level domain name information corresponding to the HTTPS service record according to the URL information;
and analyzing the multi-level domain name information to obtain the type of the HTTPS service record.
6. The HTTPS service content recognition method of claim 1, wherein the preset traffic information threshold is generated after training according to a preset neural network model, and comprises:
inputting training data to the preset neural network model, wherein the training data comprises HTTPS service types and flow information;
and the neural network model outputs, according to the HTTPS service type and the traffic information, a traffic information threshold corresponding to the service content of each HTTPS service record.
7. The method for identifying HTTPS service content according to claim 6, wherein the determining the service content corresponding to the HTTPS service record according to a preset traffic information threshold includes:
when the flow information is smaller than a first flow information threshold value, determining that the service content corresponding to the HTTPS service record is a text service;
when the flow information is greater than or equal to the first flow information threshold and smaller than a second flow information threshold, determining that the service content corresponding to the HTTPS service record is a picture service;
when the flow information is greater than or equal to the second flow information threshold and smaller than a third flow information threshold, determining that the service content corresponding to the HTTPS service record is a file download service;
and when the flow information is greater than or equal to the third flow information threshold value, determining that the service content corresponding to the HTTPS service record is a video service.
8. An HTTPS service content recognition apparatus, comprising:
a service access record obtaining module, configured to obtain a service access record of a user;
a user plane control information extraction module, configured to extract user plane control information of the user according to the service access record of the user;
an HTTPS service record extraction module, configured to extract HTTPS service records from the user plane control information according to a pre-established service information feature library, wherein the HTTPS service records comprise HOST information and page size information, and further configured to determine the type of the HTTPS service record according to the HOST information and to calculate the flow information corresponding to the HTTPS service record according to the page size information;
a service content determination module, configured to determine the service content corresponding to the HTTPS service record according to the flow information corresponding to the HTTPS service record and a preset flow information threshold value.
9. An HTTPS service content recognition device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operations of the HTTPS service content identification method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that at least one executable instruction is stored in the storage medium, which when run on an HTTPS service content identification device, causes the HTTPS service content identification device to perform the operations of the HTTPS service content identification method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010561791.0A CN113824644A (en) | 2020-06-18 | 2020-06-18 | Method, device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010561791.0A CN113824644A (en) | 2020-06-18 | 2020-06-18 | Method, device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113824644A (en) | 2021-12-21 |
Family
ID=78924372
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010561791.0A Pending CN113824644A (en) | 2020-06-18 | 2020-06-18 | Method, device and equipment for identifying HTTPS (hypertext transfer protocol secure) service content |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113824644A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114513685A (en) * | 2022-01-28 | 2022-05-17 | 武汉绿色网络信息服务有限责任公司 | Method and device for identifying HTTPS encrypted video stream based on stream characteristics |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070266045A1 (en) * | 2006-05-11 | 2007-11-15 | Computer Associates Think, Inc. | Hierarchy for characterizing interactions with an application |
CN101977235A (en) * | 2010-11-03 | 2011-02-16 | 北京北信源软件股份有限公司 | URL (Uniform Resource Locator) filtering method aiming at HTTPS (Hypertext Transport Protocol Server) encrypted website access |
CN103618726A (en) * | 2013-12-04 | 2014-03-05 | 北京中创信测科技股份有限公司 | Method for recognizing mobile data service based on HTTPS |
IL230004A0 (en) * | 2012-12-31 | 2014-03-31 | Huawei Tech Co Ltd | Application identification method, and data mining method, apparatus, and system |
CN105787512A (en) * | 2016-02-29 | 2016-07-20 | 南京邮电大学 | Network browsing and video classification method based on novel characteristic selection method |
CN107438254A (en) * | 2016-05-25 | 2017-12-05 | 中兴通讯股份有限公司 | Business recognition method, device and system based on user behavior |
CN108173781A (en) * | 2017-12-20 | 2018-06-15 | 广东宜通世纪科技股份有限公司 | HTTPS method for recognizing flux, device, terminal device and storage medium |
CN109905288A (en) * | 2018-12-21 | 2019-06-18 | 中国科学院信息工程研究所 | A kind of application service classification method and device |
WO2019119837A1 (en) * | 2017-12-21 | 2019-06-27 | 华为技术有限公司 | Service identification method and device, and network device |
CN110011931A (en) * | 2019-01-25 | 2019-07-12 | 中国科学院信息工程研究所 | A kind of encryption traffic classes detection method and system |
CN111030941A (en) * | 2019-10-29 | 2020-04-17 | 武汉瑞盈通网络技术有限公司 | Decision tree-based HTTPS encrypted flow classification method |
Non-Patent Citations (5)
Title |
---|
LAURENT BERNAILLE ET AL.: "Early recognition of encrypted applications", PROCEEDING OF THE 8TH INTERNATIONAL CONFERENCE ON PASSIVE AND ACTIVE NETWORK MEASUREMENT, 5 April 2007 (2007-04-05) * |
WAZEN M. SHBAIR ET AL.: "A multi-level framework to identify HTTPS services", IEEE, 4 July 2016 (2016-07-04) * |
YANJIE FU ET AL.: "Service Usage Classification with Encrypted Internet Traffic in Mobile Messaging Apps", IEEE, 8 January 2016 (2016-01-08) * |
张磊;赵辉;: "一种HTTPS应用的层次分类方法", 网络新媒体技术, no. 03, 15 May 2020 (2020-05-15) * |
陈贞贞: "基于DPI和机器学习的加密流量类型识别研究", 信息通信, 11 June 2018 (2018-06-11) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||