CN111585851B - Method and device for identifying private line user - Google Patents

Info

Publication number
CN111585851B
Authority
CN
China
Prior art keywords
user
identified
private line
data
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010285794.6A
Other languages
Chinese (zh)
Other versions
CN111585851A (en)
Inventor
班瑞
李彤
马季春
白海龙
陈泉霖
郝宇飞
王鹏
邹雨佳
王佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
China Information Technology Designing and Consulting Institute Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
China Information Technology Designing and Consulting Institute Co Ltd
Application filed by China United Network Communications Group Co Ltd and China Information Technology Designing and Consulting Institute Co Ltd
Priority to CN202010285794.6A
Publication of CN111585851A
Application granted
Publication of CN111585851B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/2854 Wide area networks, e.g. public data networks
    • H04L12/2856 Access arrangements, e.g. Internet access
    • H04L12/2869 Operational details of access network equipments
    • H04L12/287 Remote access server, e.g. BRAS
    • H04L12/2876 Handling of subscriber policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/2854 Wide area networks, e.g. public data networks
    • H04L12/2856 Access arrangements, e.g. Internet access
    • H04L12/2869 Operational details of access network equipments
    • H04L12/287 Remote access server, e.g. BRAS
    • H04L12/2874 Processing of data for distribution to the subscribers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/2854 Wide area networks, e.g. public data networks
    • H04L12/2856 Access arrangements, e.g. Internet access
    • H04L12/2869 Operational details of access network equipments
    • H04L12/2878 Access multiplexer, e.g. DSLAM
    • H04L12/2887 Access multiplexer, e.g. DSLAM characterised by the offered subscriber services

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides a method and a device for identifying a private line user, relates to the technical field of communications, and is used for efficient automatic identification of private line users. The method comprises the following steps: a server acquires service characteristic data of a user to be identified through Deep Packet Inspection (DPI), wherein the service characteristic data comprises one or more of the following items: a service preference label, a device connection type, a device connection number, a service time attribute and a geographic position label of the user to be identified; the server inputs the service characteristic data of the user to be identified into a private line user identification model and determines whether the user to be identified is a private line user; and if the user to be identified is a private line user, the server sets a private line label for the user to be identified, wherein the private line label is used for indicating that the user to be identified is a private line user. The method and the device are applied to a server identifying private line users.

Description

Method and device for identifying private line user
Technical Field
The present application relates to the field of communications, and in particular, to a method and an apparatus for identifying a private line user.
Background
In modern operation, operators often configure customers who hold fixed network IP address resources over the long term as static private line users (IPHOST), or simply private line users. These private line users have many privileges in the operator's network; for example, once the broadband remote access server (BRAS) has learned their physical MAC address, they become online users and can occupy network resources for a long time. Even if the MAC address of a private line user has not been learned, the IP address already configured for that user cannot be allocated to other users. That is, a private line user, whether actually online or not, monopolizes the IP address and some network resources.
Based on the above situation, the operator needs to identify private line users in the process of providing communication services. At present, private line users are identified mainly by association with private line resource information. However, this approach has a number of disadvantages: when the private line resource information is incomplete or the related resource information cannot be provided, users are missed or cannot be identified at all; and the private line resource information has to be synchronized manually at regular intervals, which introduces a time lag. As a result, the identification accuracy of this approach does not meet the operator's needs.
Therefore, a suitable solution is needed at present to solve the problem of how to realize efficient automatic identification of private line users.
Disclosure of Invention
The application provides a method and a device for identifying a private line user, which are used for solving the problem of how to realize high-efficiency automatic identification of the private line user at the present stage.
In order to achieve the purpose, the following scheme is adopted in the application:
in a first aspect, the present application provides a method for identifying a private line user, including: the server acquires service characteristic data of a user to be identified through Deep Packet Inspection (DPI), wherein the service characteristic data comprises one or more of the following items: the service preference label, the equipment connection type, the equipment connection quantity, the service time attribute and the geographic position label of the user to be identified. The server inputs the service characteristic data of the user to be identified into a pre-trained private line user identification model, and determines whether the user to be identified is a private line user. And if the user to be identified is the private line user, the server sets a private line label for the user to be identified, wherein the private line label is used for indicating that the user to be identified is the private line user.
Based on this technical scheme, the server acquires the service characteristic data of the user to be identified through DPI, wherein the service characteristic data comprises one or more of the following items: the service preference label, the device connection type, the device connection quantity, the service time attribute and the geographic position label of the user to be identified. The server then constructs a private line user identification model by machine learning from the service characteristic data of a plurality of users over a historical period. During construction of the model, the service characteristic data supplies features in three respects, namely service attributes, device type and number, and geographic position, which improves the identification rate and accuracy of the private line user identification model. Finally, the server inputs the current service characteristic data of the user to be identified into the private line user identification model, thereby identifying private line users automatically and efficiently.
In one possible design, the server acquiring the service characteristic data of the user through DPI includes: the server collects raw traffic through DPI to generate a DPI ticket, wherein the DPI ticket comprises one or more of the following items: the usage period of the IP, the usage location, the device type, the protocol type distribution, and the HTTP request keyword set.
In one possible design, the server constructs the private line user identification model according to a Support Vector Machine (SVM) classifier.
In one possible design, the construction of the private line user identification model specifically includes: the method comprises the steps of obtaining a learning data set, wherein the learning data set comprises a plurality of sample data, and each sample data comprises service characteristic data of a sample user. And performing data preprocessing on the service characteristic data, wherein the data preprocessing comprises filtering data with invalid values and missing values. And constructing a special line user identification model according to the service characteristic data after data preprocessing.
In a second aspect, the present application provides a server comprising an acquisition module and a processing module. The acquisition module is used for acquiring service characteristic data of a user to be identified through Deep Packet Inspection (DPI), wherein the service characteristic data comprises one or more of the following items: the service preference label, the device connection type, the device connection quantity, the service time attribute and the geographic position label of the user to be identified. The processing module is used for inputting the service characteristic data of the user to be identified into the trained private line user identification model and determining whether the user to be identified is a private line user; it is further used for setting a private line label for the user to be identified when the user to be identified is a private line user, the private line label indicating that the user to be identified is a private line user.
In one possible design, the processing module is further configured to collect service traffic data of the user to be identified through DPI and generate a DPI ticket, wherein the DPI ticket comprises one or more of the following items: the usage period of the IP, the usage location, the device type, the protocol type distribution, and the HTTP request keyword set. The acquisition module is further configured to acquire the service characteristic data of the user to be identified from the DPI ticket.
In one possible design, the processing module is further configured to construct the private line user identification model according to a Support Vector Machine (SVM) classifier.
In one possible design, the acquisition module is further configured to obtain a learning data set, where the learning data set includes a plurality of sample data, and each sample data includes service characteristic data of a sample user. The processing module is further configured to perform data preprocessing on the service characteristic data, the data preprocessing comprising filtering out data with invalid values and missing values, and to construct the private line user identification model from the preprocessed service characteristic data.
In a third aspect, the present application provides a server, comprising: a processor and a communication interface; the communication interface is coupled to a processor for executing a computer program or instructions for implementing the method for identifying a private line user as described in the first aspect and any one of the possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium in which instructions are stored; when the instructions are executed on a computer, they cause the computer to perform the method for identifying a private line user described in the first aspect and any one of the possible implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product containing instructions for causing a computer to perform the method for identifying a private line user described in the first aspect and any one of the possible implementations of the first aspect when the computer program product runs on the computer.
In a sixth aspect, the present application provides a chip comprising a processor and a communication interface, the communication interface being coupled to the processor, the processor being configured to execute a computer program or instructions to implement the method for identifying a private line user as described in the first aspect and any one of the possible implementations of the first aspect.
Drawings
Fig. 1 is a schematic flowchart of a method for identifying a private line user according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another method for identifying a private line user according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a server according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of another server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship. For example, A/B may be understood as A or B.
The terms "first" and "second" in the description and claims of the present application are used for distinguishing between different objects and not for describing a particular order of the objects. For example, the first edge service node and the second edge service node are used for distinguishing different edge service nodes, and are not used for describing the characteristic sequence of the edge service nodes.
Furthermore, the terms "including" and "having," and any variations thereof, as referred to in the description of the present application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, article, or apparatus.
In addition, in the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as examples, illustrations or explanations. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "e.g.," is intended to present concepts in a concrete fashion.
In order to facilitate understanding of the technical solutions of the present application, some technical terms are described below.
1. Deep packet inspection
Deep Packet Inspection (DPI) is a packet-based deep inspection technology, which performs deep inspection on different network application layer loads (such as HTTP, DNS, etc.), and determines the validity of the packet by inspecting the payload of the packet.
In the present application, the server collects raw traffic through DPI to obtain DPI tickets.
2. Support vector machine classifier
A Support Vector Machine (SVM) classifier is a generalized linear classifier that performs binary classification on data by supervised learning; its decision boundary is the maximum-margin hyperplane solved for the learning samples.
The SVM calculates the empirical risk using the hinge loss function and adds a regularization term to the optimization problem to optimize the structural risk; it is a classifier with sparsity and robustness. SVMs can perform non-linear classification through the kernel method, and are one of the common kernel learning methods. The SVM classifier is briefly described below:
Given a training sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $y_i \in \{-1, +1\}$.
The SVM linear classifier finds a hyperplane in the sample space, based on the training set $D$, that separates the two classes of samples.
The dividing hyperplane in the sample space can be described by the following linear equation:
$$w^T x + b = 0$$
where $w = (w_1; w_2; \ldots; w_d)$ is the normal vector, which determines the direction of the hyperplane, and $b$ is the displacement term, which determines the distance between the hyperplane and the origin. This hyperplane is denoted $(w, b)$ below. The distance from an arbitrary point $x$ in the sample space to the hyperplane $(w, b)$ can be written as:
$$r = \frac{|w^T x + b|}{\|w\|}$$
If the hyperplane classifies the training samples correctly, then for every $(x_i, y_i) \in D$:
$$\begin{cases} w^T x_i + b \ge +1, & y_i = +1 \\ w^T x_i + b \le -1, & y_i = -1 \end{cases}$$
The training samples closest to the hyperplane satisfy the above with equality and are called support vectors. The sum of the distances from two support vectors of different classes to the hyperplane is:
$$\gamma = \frac{2}{\|w\|}$$
This quantity is called the margin. The goal is to find the dividing hyperplane with the largest margin, that is, the parameters $w$ and $b$ that satisfy the constraints and maximize $\gamma$. Maximizing the margin only requires maximizing $\|w\|^{-1}$, which is equivalent to minimizing $\|w\|^2$, so the basic form of the SVM is:
$$\min_{w, b} \ \frac{1}{2}\|w\|^2$$
$$\text{s.t.} \ y_i (w^T x_i + b) \ge 1, \quad i = 1, 2, \ldots, m$$
The purpose is to obtain the model corresponding to the maximum-margin hyperplane:
$$f(x) = w^T x + b$$
From the basic form above, the objective function is quadratic and the constraints are linear, so this is a convex quadratic programming problem. It can be solved directly with an off-the-shelf optimization package or through its dual problem, which is not described in detail here. In addition, if the training samples are not linearly separable, a kernel function can be introduced.
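The geometric quantities used in this description can be checked numerically on a small example. The following is a minimal sketch in plain Python; the four sample points and the hyperplane parameters are illustrative assumptions chosen by hand, not values from the application, and no optimization is performed. It evaluates the decision value, the point-to-hyperplane distance, the margin 2/||w||, and the constraint that each label times its decision value is at least 1.

```python
import math


def decision_value(w, b, x):
    """Evaluate f(x) = w^T x + b for one point."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def norm(w):
    """Euclidean norm ||w||."""
    return math.sqrt(sum(wi * wi for wi in w))


def distance_to_hyperplane(w, b, x):
    """Distance r = |w^T x + b| / ||w|| from x to the hyperplane (w, b)."""
    return abs(decision_value(w, b, x)) / norm(w)


def margin(w):
    """Margin gamma = 2 / ||w||."""
    return 2.0 / norm(w)


def satisfies_constraints(w, b, samples):
    """Check y_i * (w^T x_i + b) >= 1 for every training sample."""
    return all(y * decision_value(w, b, x) >= 1 for x, y in samples)


# Hand-picked toy data: label 1 for one class, -1 for the other.
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]
# Hand-picked hyperplane 0.5*x1 + 0.5*x2 - 1 = 0 (an assumption, not a solver output).
w, b = (0.5, 0.5), -1.0

feasible = satisfies_constraints(w, b, samples)
# Support vectors are the samples that meet the constraint with equality.
support_vectors = [
    x for x, y in samples if math.isclose(y * decision_value(w, b, x), 1.0)
]
gamma = margin(w)
```

Here the points (2, 2) and (0, 0) meet the constraint with equality, so they play the role of support vectors, and the margin equals the distance between them.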
The technical scheme provided by the embodiments of the present application can be applied to the identification of private line users, and mainly addresses the problem that private line users cannot be identified when the private line resource information is incomplete or the related resource table cannot be provided. In the prior art, identification relies mainly on association with private line resource information. This is quick and efficient, but when the private line resource information is incomplete or the related resources cannot be provided, users are missed or cannot be identified. For example, when the private line resource information table is incomplete or out of sync, users may go unidentified or be misidentified. To mitigate this, the private line resource information table has to be synchronized and updated manually at regular intervals, but this introduces a time lag, so private line users cannot be identified accurately and in time, and the consumption of human resources increases.
In the embodiments of the present application, private line users are identified by building a private line user identification model. A learning data set is obtained from the historical service characteristic data of a plurality of sample users, and data processing and data analysis are performed on the learning data set, which reduces the learning difficulty of the model. The learning data set is then combined with an SVM classifier to construct the private line user identification model, and iterative training and algorithm tuning finally yield a model that can be used in practice. The embodiments address the low efficiency of existing private line user identification and provide a good foundation for the operator's subsequent maintenance and management of private line users.
The technical solution provided by the present application is specifically explained below with reference to the drawings of the specification.
As shown in fig. 1, a method for identifying a private line user provided in an embodiment of the present application includes the following steps:
S101, the server acquires service characteristic data of a user to be identified through DPI.
The service characteristic data may comprise one or more of the following: the device model, the service account number, the service access type, the service access time, the operating system number, the IMEI and the GPS position information of the user to be identified.
Optionally, the server collects the service traffic data of the user to be identified through DPI and generates a DPI ticket. The DPI ticket comprises one or more of the following items: the usage period of the IP, the usage location, the device type, the protocol type distribution, and the HTTP request keyword set.
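As an illustration of how DPI tickets might be reduced to service characteristic data, the following minimal sketch aggregates the tickets of one IP address into a small feature dictionary. The record layout, the field names (`ip`, `hour`, `location`, `device_type`, `protocol`) and the derived features are hypothetical; the application does not specify a ticket format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DpiTicket:
    """One hypothetical DPI ticket row (field names are illustrative assumptions)."""
    ip: str           # IP address the traffic belongs to
    hour: int         # hour of day (0-23) in which the IP was used
    location: str     # usage location label
    device_type: str  # type of the connected device
    protocol: str     # application-layer protocol, e.g. "HTTP" or "DNS"


def summarize(tickets):
    """Aggregate the tickets of a single IP into a feature dictionary."""
    hours = Counter(t.hour for t in tickets)
    protocols = Counter(t.protocol for t in tickets)
    locations = Counter(t.location for t in tickets)
    return {
        "device_type_count": len({t.device_type for t in tickets}),
        "peak_hour": hours.most_common(1)[0][0],
        "top_protocol": protocols.most_common(1)[0][0],
        "location": locations.most_common(1)[0][0],
    }


tickets = [
    DpiTicket("10.0.0.8", 9, "office-park", "pc", "HTTP"),
    DpiTicket("10.0.0.8", 9, "office-park", "printer", "HTTP"),
    DpiTicket("10.0.0.8", 14, "office-park", "pc", "DNS"),
]
features = summarize(tickets)
```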
S102, the server inputs the service characteristic data of the user to be identified into a pre-trained private line user identification model.
Optionally, the server constructs the private line user identification model according to the SVM classifier.
It can be understood that, after the server inputs the service characteristic data of the user to be identified into the trained private line user identification model, the private line user identification model outputs the identification result. Wherein, the recognition result includes: the user to be identified is a private line user or the user to be identified is a common user.
S103, the server marks the user to be identified according to the identification result output by the private line user identification model.
Optionally, the server marks the user to be identified with either a private line label or a common label. The private line label indicates that the marked user is a private line user, and the common label indicates that the marked user is a common user. If the identification result output by the private line user identification model indicates that the user to be identified is a private line user, the server marks the user with the private line label; if the result indicates that the user is a common user, the server marks the user with the common label. For example, private line users are labeled "1" and common users are labeled "-1".
Based on this technical scheme, the server acquires the service characteristic data of the user to be identified through DPI, inputs it into the trained private line user identification model, determines whether the user to be identified is a private line user, and finally marks the identified private line user, which makes it convenient for the operator to maintain and manage private line users. The server can thus identify private line users automatically from the service characteristic data and the trained model, while improving the identification rate and accuracy.
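The marking step S103 can be sketched as follows. This is a minimal illustration only: `stand_in_model` is a hypothetical placeholder rule standing in for the trained private line user identification model, and the feature names `device_count` and `peak_hour` are assumptions. Only the label values "1" and "-1" come from the example in the description.

```python
PRIVATE_LINE_LABEL = "1"   # label value taken from the example in the description
COMMON_LABEL = "-1"


def stand_in_model(features):
    """Hypothetical placeholder for the trained model: returns True for a
    private line user. A real deployment would call the trained classifier."""
    return features["device_count"] >= 10 and 8 <= features["peak_hour"] <= 18


def label_user(features, model):
    """Map the identification result of the model to a label."""
    return PRIVATE_LINE_LABEL if model(features) else COMMON_LABEL


office = {"device_count": 25, "peak_hour": 10}
household = {"device_count": 3, "peak_hour": 21}
office_label = label_user(office, stand_in_model)
household_label = label_user(household, stand_in_model)
```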
As shown in Fig. 2, another method for identifying a private line user provided in this embodiment of the present application further includes constructing a private line user identification model; after step S101, the method further includes the following steps:
S201, the server acquires a learning data set.
The learning data set comprises service characteristic data of a plurality of sample users.
For example, the server selects the service characteristic data of at least five thousand sample users as a data source, and then performs data standardization on the data source to obtain the learning data set. The data source contains the service characteristic data of a certain proportion of private line users.
Optionally, the data standardization performed by the server includes extracting effective data features from the data source and constructing a sample set with a single IP address as the unit, marking private line users with the private line label and common users with the common label. The private line label indicates that the marked user is a private line user, and the common label indicates that the marked user is a common user.
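Building a sample set with a single IP address as the unit can be sketched as a group-by over raw records. The record fields (`ip`, `port`), the per-IP summary features and the set of known private line IPs are hypothetical and serve only to illustrate the grouping and labeling.

```python
from collections import defaultdict


def build_sample_set(records, private_line_ips):
    """Group raw records by IP address and attach a label to each IP:
    1 for a known private line user, -1 for a common user."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["ip"]].append(record)

    samples = []
    for ip, rows in sorted(grouped.items()):
        samples.append({
            "ip": ip,
            "session_count": len(rows),
            "ports": sorted({row["port"] for row in rows}),
            "label": 1 if ip in private_line_ips else -1,
        })
    return samples


records = [
    {"ip": "10.0.0.8", "port": 443},
    {"ip": "10.0.0.8", "port": 80},
    {"ip": "10.0.0.9", "port": 443},
]
samples = build_sample_set(records, private_line_ips={"10.0.0.8"})
```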
It can be understood that selecting the service characteristic data of many sample users as the data source ensures the richness of the data used to train the private line user identification model, reduces the influence of chance when the trained model identifies user types, and improves its accuracy.
Optionally, the learning data set further includes a private line user resource information table, which indicates that the service characteristic data in the table come from private line users.
Optionally, the server selects 70% of the learning data set as a training data set and the remaining 30% as a test data set. The training data set is used to initially construct and train the private line user identification model; the test data set is used to test the initially constructed and trained model and to judge whether its accuracy meets the accuracy threshold. The accuracy threshold may be set according to actual requirements, for example 95%.
It can be understood that dividing the learning data set into a training data set and a test data set in a fixed proportion ensures the reliability of the private line user identification model when identifying user types: if training and testing used the same service characteristic data, overfitting of the model to the learning data set could not be detected.
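The 70%/30% division and the accuracy check can be sketched in plain Python. The shuffling seed and the helper names are assumptions made for illustration; the integers stand in for labeled samples.

```python
import random


def split_learning_set(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def meets_accuracy_threshold(predictions, labels, threshold=0.95):
    """Return True if the share of correct predictions reaches the threshold."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels) >= threshold


learning_set = list(range(100))  # stand-in for 100 labeled samples
train_set, test_set = split_learning_set(learning_set)

all_correct = meets_accuracy_threshold([1] * 20, [1] * 20)
two_wrong = meets_accuracy_threshold([1] * 18 + [-1] * 2, [1] * 20)
```

Because no sample appears in both sets, accuracy on the test set measures how the model behaves on data it has not been fitted to.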
S202, the server carries out data preprocessing on the learning data set.
Optionally, the data preprocessing performed on the learning data set by the server includes: data containing invalid values and data containing missing values are filtered out.
It should be noted that the foregoing data standardization formats the data into a fixed format, but some invalid data may remain, for example records whose key field is empty or whose values fall outside the permitted range. The server therefore needs to filter out data containing invalid values and data containing missing values.
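The filtering described here can be sketched as a validity check applied to each record. The key fields and the permitted ranges (`port` in 0 to 65535, `hour` in 0 to 23) are hypothetical choices for illustration; the application does not list them.

```python
REQUIRED_FIELDS = ("ip", "port", "hour")  # hypothetical key fields


def is_valid(record):
    """Reject records with a missing or empty key field or an out-of-range value."""
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False
    return 0 <= record["port"] <= 65535 and 0 <= record["hour"] <= 23


def preprocess(records):
    """Keep only the records that pass the validity check."""
    return [record for record in records if is_valid(record)]


records = [
    {"ip": "10.0.0.8", "port": 443, "hour": 9},
    {"ip": "", "port": 443, "hour": 9},            # empty key field
    {"ip": "10.0.0.9", "port": 70000, "hour": 9},  # value out of range
    {"ip": "10.0.0.9", "port": 80},                # missing value
]
clean = preprocess(records)
```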
S203, the server analyzes the learning data set to obtain an effective characteristic value.
Optionally, the server takes private line users and common users as statistical units and performs statistical analysis on the learning data set to obtain feature values. The feature values may include one or more of the following, computed separately for the private line user group and the common user group: the distribution of the port numbers of all sessions, the distribution of session service types, and the distribution of Chinese and English keywords in session requests.
Illustratively, a feature point whose feature value satisfies the following formulas is a large-difference feature point, and the server takes the feature value of a large-difference feature point as an effective feature value:
Figure BDA0002448455360000081
Figure BDA0002448455360000082
Figure BDA0002448455360000083
where, for $n_i$ and $N_i$, the subscript h indicates that the statistic is that of common users and the subscript s indicates that the statistic is that of private line users; the heat value of private line users at feature point i is $S_i$ and that of common users is $H_i$; and the difference threshold for feature points is T.
It can be understood that identifying the feature points on which private line users and common users differ most, and taking their feature values as effective feature values, extracts the useful data and thereby reduces the learning difficulty of the private line user identification model. For example, suppose a government agency and a residential community are located in the same geographical area; the peak period of broadband use at the government agency is generally 8:30 to 17:30, while that of residents of the community is generally 19:30 to 23:30. In this case, the peak time of Internet use is selected as a large-difference feature point, and the time feature value is extracted as an effective feature value, which reduces the learning difficulty of the private line user identification model.
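The idea of keeping only feature points on which the two groups differ strongly can be sketched as follows. The formula images of the application are not recoverable from this text, so the definitions used here are assumptions and not the application's formulas: the heat value is taken to be the share of a group's users exhibiting a feature point, and the difference is taken to be |S - H| divided by the larger of S and H, compared against a threshold T.

```python
def heat(count, total):
    """Assumed heat value: share of a group's users exhibiting a feature point."""
    return count / total if total else 0.0


def relative_difference(s, h):
    """Assumed difference measure: |S - H| scaled by the larger heat value."""
    largest = max(s, h)
    return abs(s - h) / largest if largest else 0.0


def select_feature_points(private_counts, common_counts, n_private, n_common, threshold):
    """Return the feature points whose assumed difference exceeds the threshold."""
    selected = []
    for point in sorted(private_counts.keys() | common_counts.keys()):
        s = heat(private_counts.get(point, 0), n_private)
        h = heat(common_counts.get(point, 0), n_common)
        if relative_difference(s, h) > threshold:
            selected.append(point)
    return selected


# Hypothetical counts per feature point for 100 users in each group.
private_counts = {"daytime_peak": 80, "uses_dns": 50}
common_counts = {"daytime_peak": 20, "uses_dns": 48}
effective = select_feature_points(
    private_counts, common_counts, n_private=100, n_common=100, threshold=0.5
)
```

With these made-up counts, only the daytime peak is kept, which matches the government agency versus residential community example in spirit.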
S204, the server constructs a private line user identification model using an SVM classifier.
Optionally, the server sets an initial model parameter list. For example, the parameters of the SVM classifier may be selected and configured from those provided by the Sklearn (scikit-learn) library.
Optionally, the optimal parameters for model construction are searched from the initial model parameter list using a cross-validated grid search. Illustratively, the search process is as follows: select a group of parameters from the parameter list and construct a private line user identification model; input the training data and start training; after the set training target is reached, store the model in a model library for model quality comparison; repeat from the first step until the initial model parameter list has been fully traversed; finally, select the optimal group of parameters and train the private line user identification model with that group.
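The traversal described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the implementation of the application: the synthetic data set stands in for the training data, the parameter values are arbitrary, and scoring by mean cross-validated F1 is an assumption about what "model quality comparison" means.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the training data set (hypothetical, not real DPI data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Initial model parameter list; the values are chosen only for illustration.
param_grid = ParameterGrid([
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["auto", 0.1]},
])

# Traverse the list: build a model per parameter group, evaluate it with
# 5-fold cross validation, and record the result for later comparison.
model_library = []
for params in param_grid:
    model = SVC(**params)
    score = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    model_library.append((score, params))

# Select the best parameter group and train the final model with it.
best_score, best_params = max(model_library, key=lambda item: item[0])
final_model = SVC(**best_params).fit(X, y)
print(best_params, round(best_score, 3))
```

scikit-learn's `GridSearchCV` wraps the same loop in a single estimator; the explicit version is shown here only because it mirrors the steps listed in the text.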
Illustratively, the SVM classifier may select the following parameters from the Sklearn library:
(1) penalty coefficient C
The larger the penalty coefficient C of the error term, the more heavily misclassified samples are penalized, so the accuracy on the training samples is higher but the generalization ability decreases. Conversely, a smaller penalty coefficient C tolerates some misclassified samples in the training data and yields stronger generalization ability.
(2) Kernel parameter
This parameter selects the kernel function used by the model. The commonly used kernel functions are: linear (linear kernel); poly (polynomial kernel); rbf (radial basis function kernel, also called the Gaussian kernel); sigmoid (sigmoid kernel); and precomputed (precomputed kernel matrix).
(3) gamma parameter
This parameter is the kernel coefficient and is valid only for rbf, poly, and sigmoid. If gamma is set to auto, its value is the reciprocal of the number of sample features, i.e., 1/n_features. Other values may also be set.
(4) degree parameter
This parameter is only used when kernel='poly' (polynomial kernel) and refers to the degree n of the polynomial kernel; it is ignored if any other kernel is specified.
(5) coef0 parameter
This parameter is the independent term in the kernel function, i.e., the parameter c in the kernel function, and is only used by the poly and sigmoid kernels.
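For reference, the four built-in kernel functions named above are commonly written as shown below. This is the form given in the scikit-learn documentation rather than anything stated in the original text; γ corresponds to gamma, n to degree, and c to coef0 (the scikit-learn documentation writes the last two as d and r).

```latex
\begin{aligned}
K_{\text{linear}}(x, x')  &= \langle x, x' \rangle \\
K_{\text{poly}}(x, x')    &= \left(\gamma \langle x, x' \rangle + c\right)^{n} \\
K_{\text{rbf}}(x, x')     &= \exp\left(-\gamma \lVert x - x' \rVert^{2}\right) \\
K_{\text{sigmoid}}(x, x') &= \tanh\left(\gamma \langle x, x' \rangle + c\right)
\end{aligned}
```

These forms show why gamma applies only to poly, rbf, and sigmoid, why degree appears only in the polynomial kernel, and why coef0 appears only in the polynomial and sigmoid kernels.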
It can be understood that the parameters of the SVM classifier may be selected and configured according to the effective feature values in the training data set. After the parameters have been selected and configured and the initial model parameter list has been established, the SVM classifier initially builds the private line user identification model from the training data set. Because the model is built from the effective feature values in the training data set, the learning difficulty of the private line user identification model is reduced.
S205, the server confirms whether the private line user identification model can be used for identifying the actual private line user.
Optionally, the server presets a compliance threshold and, according to the compliance threshold and the F value, determines whether the identification capability of the private line user identification model reaches the standard and whether the model can be used for identifying actual private line users. The F value reflects the strength of the identification capability of the private line user identification model.
Optionally, two metrics from the field of statistical classification, accuracy and recall, are used as the parameters for calculating the F value. In the present application, the F value is calculated from the accuracy and the recall as follows:
Figure BDA0002448455360000101
Figure BDA0002448455360000102
Figure BDA0002448455360000103
It should be noted that the above accuracy and recall are calculated on the test data set. After the initially constructed and trained private line user identification model has been tested with the test data set, the accuracy and the recall are computed from the test results.
For example, the server may set the compliance threshold to 80%, and when the F value of the private line user identification model reaches 80%, the identification capability of the private line user identification model at this time is considered to be strong enough to be used for identification of the actual private line user.
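The compliance check can be sketched as follows. The original formulas are images that are not reproduced in this text, so the sketch assumes the F value is the standard F1 score, i.e., the harmonic mean of precision and recall, and that the "accuracy" in the translated text corresponds to precision. The function names and the confusion counts are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of users predicted as private line users that really are."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real private line users that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_value(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (standard F1, assumed here)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def meets_threshold(tp: int, fp: int, fn: int, threshold: float = 0.8) -> bool:
    """True when the model's F value on the test data set reaches the threshold."""
    return f_value(precision(tp, fp), recall(tp, fn)) >= threshold


# Hypothetical test-set counts: true positives, false positives, false negatives.
print(meets_threshold(tp=90, fp=10, fn=10))
print(meets_threshold(tp=50, fp=50, fn=50))
```

The first set of counts gives precision and recall of 0.9 each, so the model would pass an 80% threshold; the second gives 0.5 each and would not.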
Based on the above technical solution, the server obtains the learning data set from the historical service feature data of a plurality of sample users, and performs data processing and data analysis on the learning data set to reduce the learning difficulty of the private line user identification model. The learning data set is then combined with an SVM classifier to construct the private line user identification model, and iterative training and algorithm adjustment yield a model that can be used in practice. Finally, when the identification capability of the private line user identification model reaches the preset standard, the model is used to identify actual private line users. The embodiments of the present application construct the private line user identification model by machine learning, which improves its identification capability.
In the embodiment of the present application, the server may be divided into the functional modules or the functional units according to the above method examples, for example, each functional module or functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in a form of hardware, or may be implemented in a form of a software functional module or a functional unit. The division of the modules or units in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 3 is a schematic structural diagram of a server 30 provided in an embodiment of the present application. The server 30 is used for executing the above method for identifying a private line user and includes:
an obtaining module 301, configured to obtain service characteristic data of a user to be identified through deep packet inspection DPI, where the service characteristic data includes one or more of the following: the service preference label, the equipment connection type, the equipment connection quantity, the service time attribute and the geographic position label of the user to be identified.
The processing module 302 is configured to input the service feature data of the user to be identified into the trained private line user identification model and determine whether the user to be identified is a private line user; and is further configured to set a private line label for the user to be identified when the user to be identified is a private line user, where the private line label indicates that the user to be identified is a private line user.
Optionally, the processing module 302 is further configured to collect service traffic data of the user to be identified through a DPI, and generate a DPI ticket; wherein, the DPI call ticket comprises one or more of the following items: the usage period of the IP, the usage location, the device type, the protocol type distribution, the http request key set.
Optionally, the obtaining module 301 is further configured to obtain service feature data of the user to be identified according to the DPI ticket.
Optionally, the processing module 302 is further configured to construct a private line user identification model using a support vector machine (SVM) classifier.
Optionally, the obtaining module 301 is further configured to obtain a learning data set, where the learning data set includes a plurality of sample data, and each sample data includes service feature data of one sample user.
Optionally, the processing module 302 is further configured to perform data preprocessing on the service feature data, where the data preprocessing includes filtering out data with invalid values and missing values; and is further configured to construct the private line user identification model from the preprocessed service feature data.
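The preprocessing and labeling duties of the processing module can be sketched as follows. This is an illustration under stated assumptions: the field names, the validity rules, and the `is_private_line` callable (a stand-in for the trained private line user identification model) are hypothetical and do not come from the application.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[object]]

# Hypothetical feature fields derived from the DPI ticket.
REQUIRED_FIELDS = ("service_preference", "device_type", "device_count", "peak_hour")


def is_valid(record: Record) -> bool:
    """Reject records with missing values or values outside a plausible range."""
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return False  # missing value
    count, hour = record["device_count"], record["peak_hour"]
    if not isinstance(count, int) or count < 0:
        return False  # invalid device count
    if not isinstance(hour, int) or not 0 <= hour <= 23:
        return False  # invalid hour of day
    return True


def preprocess(records: List[Record]) -> List[Record]:
    """Filter out data with invalid values and missing values."""
    return [r for r in records if is_valid(r)]


def tag_private_line_users(
    records: List[Record], is_private_line: Callable[[Record], bool]
) -> List[Record]:
    """Set a private line label on every record the model classifies as private line."""
    tagged = []
    for record in records:
        labeled = dict(record)
        if is_private_line(record):
            labeled["private_line"] = True
        tagged.append(labeled)
    return tagged


records = [
    {"user_id": "u1", "service_preference": "office", "device_type": "router",
     "device_count": 35, "peak_hour": 10},
    {"user_id": "u2", "service_preference": "video", "device_type": "phone",
     "device_count": None, "peak_hour": 21},   # missing value
    {"user_id": "u3", "service_preference": "video", "device_type": "phone",
     "device_count": 3, "peak_hour": 21},
    {"user_id": "u4", "service_preference": "game", "device_type": "pc",
     "device_count": 5, "peak_hour": 99},      # invalid value
]

clean = preprocess(records)
# Stand-in rule used only for this demo; a trained SVM model would be called here.
tagged = tag_private_line_users(clean, lambda r: r["device_count"] >= 20)
print([(r["user_id"], r.get("private_line", False)) for r in tagged])
```

Passing the classifier in as a callable keeps the filtering and labeling logic independent of how the model is trained.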
Fig. 4 shows a schematic diagram of another possible structure of the server involved in the above embodiment. The server includes: a processor 402 and a communication interface 403.
In the case where the server shown in fig. 3 is implemented as the server shown in fig. 4, the processor 402 is configured to control and manage the actions of the server, for example, to perform the steps performed by the processing module 302 described above and/or other processes of the techniques described herein. The communication interface 403 is used to support communication between the server and other network entities, for example, to perform the steps performed by the obtaining module 301. The server may also include a memory 401 and a bus 404, the memory 401 being used to store program code and data of the server.
The processor 402 may implement or execute the various illustrative logical blocks, units, and circuits described in connection with the disclosure herein. The processor may be a central processing unit, a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Memory 401 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory may also comprise a combination of memories of the kind described above.
The bus 404 may be an Extended Industry Standard Architecture (EISA) bus or the like. The bus 404 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. For the specific working processes of the system, the apparatus, and the module described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
Embodiments of the present application provide a computer program product including instructions which, when run on a computer, cause the computer to execute the method for identifying a private line user according to the foregoing method embodiments.
An embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the network device executes the instructions, the network device executes each step executed by the network device in the method flow shown in the foregoing method embodiment.
The computer-readable storage medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a register, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, any suitable combination of the foregoing, or any other form of computer-readable storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). In embodiments of the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A method for identifying a private line user, the method comprising:
acquiring service characteristic data of a user to be identified through Deep Packet Inspection (DPI), wherein the service characteristic data comprises one or more of the following items: a service preference label, a device connection type, a number of device connections, a service time attribute, or a geographic location label of the user to be identified;
acquiring a learning data set, wherein the learning data set comprises a plurality of sample data, and each sample data comprises the service characteristic data of a sample user;
analyzing the learning data set to obtain an effective characteristic value; the valid eigenvalues satisfy the following formula:
Figure FDA0003185450490000011
Figure FDA0003185450490000012
Figure FDA0003185450490000013
wherein, for the statistics n_i and N_i, the subscript h indicates that the statistic is for ordinary users and the subscript S indicates that it is for private line users; the heat value of private line users at each feature point i is S_i, and that of ordinary users is H_i; the difference threshold for feature points is T;
establishing a private line user identification model according to the effective characteristic values in the training data set by using a Support Vector Machine (SVM) classifier; inputting the service characteristic data of the user to be identified into the pre-trained private line user identification model, and determining whether the user to be identified is a private line user;
and if the user to be identified is the private line user, setting a private line label for the user to be identified, wherein the private line label is used for indicating that the user to be identified is the private line user.
2. The method for identifying a private line user according to claim 1, wherein acquiring the service characteristic data of the user to be identified through DPI specifically comprises:
collecting the service flow data of the user to be identified through DPI to generate a DPI ticket; wherein the DPI ticket comprises one or more of the following items: the usage period, usage location, device type, protocol type distribution, or http request keyword set of the IP;
and acquiring the service characteristic data of the user to be identified according to the DPI ticket.
3. The method of identifying a private line user according to claim 1 or 2, wherein after said obtaining a learning data set, the method further comprises:
and performing data preprocessing on the service characteristic data, wherein the data preprocessing comprises filtering data with invalid values and missing values.
4. A server, characterized in that the server comprises:
an obtaining module and a processing module, wherein the obtaining module is configured to acquire service characteristic data of a user to be identified through Deep Packet Inspection (DPI), and the service characteristic data comprises one or more of the following items: a service preference label, a device connection type, a number of device connections, a service time attribute, and a geographic location label of the user to be identified;
the obtaining module is further configured to obtain a learning data set, where the learning data set includes a plurality of sample data, and each sample data includes the service feature data of one sample user;
the processing module is used for analyzing the learning data set to obtain an effective characteristic value; the valid eigenvalues satisfy the following formula:
Figure FDA0003185450490000021
Figure FDA0003185450490000022
Figure FDA0003185450490000023
wherein, for the statistics n_i and N_i, the subscript h indicates that the statistic is for ordinary users and the subscript S indicates that it is for private line users; the heat value of private line users at each feature point i is S_i, and that of ordinary users is H_i; the difference threshold for feature points is T;
the processing module is further configured to establish a private line user identification model according to the effective characteristic values in the training data set by using a support vector machine (SVM) classifier;
the processing module is configured to input the service characteristic data of the user to be identified into the trained private line user identification model and determine whether the user to be identified is a private line user; and is further configured to set a private line label for the user to be identified when the user to be identified is a private line user, wherein the private line label is used for indicating that the user to be identified is a private line user.
5. The server according to claim 4,
the processing module is further configured to collect the service traffic data of the user to be identified through DPI and generate a DPI ticket; wherein the DPI ticket comprises one or more of the following items: the usage period, usage location, device type, protocol type distribution, and http request keyword set of the IP;
the obtaining module is further configured to obtain the service characteristic data of the user to be identified according to the DPI ticket.
6. The server according to claim 4 or 5,
the processing module is further configured to perform data preprocessing on the service feature data, where the data preprocessing includes filtering data with invalid values and missing values.
7. A server, comprising: a processor and a communication interface; the communication interface is coupled to the processor, and the processor is configured to run a computer program or instructions to implement the method for identifying a private line user as claimed in any one of claims 1 to 3.
8. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a computer, cause the computer to perform the method for identifying a private line user according to any one of claims 1 to 3.
CN202010285794.6A 2020-04-13 2020-04-13 Method and device for identifying private line user Active CN111585851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010285794.6A CN111585851B (en) 2020-04-13 2020-04-13 Method and device for identifying private line user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010285794.6A CN111585851B (en) 2020-04-13 2020-04-13 Method and device for identifying private line user

Publications (2)

Publication Number Publication Date
CN111585851A CN111585851A (en) 2020-08-25
CN111585851B true CN111585851B (en) 2021-11-19

Family

ID=72126418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010285794.6A Active CN111585851B (en) 2020-04-13 2020-04-13 Method and device for identifying private line user

Country Status (1)

Country Link
CN (1) CN111585851B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743103A (en) * 2021-08-20 2021-12-03 南京星云数字技术有限公司 Comment user identity identification method and device, computer equipment and storage medium
CN114039924B (en) * 2021-10-19 2024-06-21 浪潮通信信息***有限公司 Quality guarantee method and system for network resource inclination of passenger collecting private line
CN114091695B (en) * 2021-11-09 2023-01-24 中国联合网络通信集团有限公司 User identification method and device for vehicle and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140143179A (en) * 2012-03-30 2014-12-15 톰슨 라이센싱 Laser projector system with graphical pointer
CN102902974B (en) * 2012-08-23 2015-07-08 西南交通大学 Image based method for identifying railway overhead-contact system bolt support identifying information
WO2015196377A1 (en) * 2014-06-25 2015-12-30 华为技术有限公司 Method and device for determining user identity category
CN107395635B (en) * 2017-08-25 2020-04-21 中国联合网络通信集团有限公司 Method and device for positioning user position of wired end

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
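An Experimental Evaluation of the Computational Cost of a DPI Traffic Classifier; N. Cascarano, A. Este, F. Gringoli, F. Risso, L. Salgarelli; GLOBECOM 2009 - 2009 IEEE Global Telecommunications Conference; 20100304; full text *
Research on Key Technologies of IP Network Service Identification (IP网络业务识别关键技术研究); Wang Pan (王攀); China Doctoral Dissertations Full-text Database; 20130615; full text *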

Also Published As

Publication number Publication date
CN111585851A (en) 2020-08-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant