CN111585851B

CN111585851B - Method and device for identifying private line user

Info

Publication number: CN111585851B
Application number: CN202010285794.6A
Authority: CN
Inventors: 班瑞; 李彤; 马季春; 白海龙; 陈泉霖; 郝宇飞; 王鹏; 邹雨佳; 王佳
Original assignee: China United Network Communications Group Co Ltd; China Information Technology Designing and Consulting Institute Co Ltd
Current assignee: China United Network Communications Group Co Ltd; China Information Technology Designing and Consulting Institute Co Ltd
Priority date: 2020-04-13
Filing date: 2020-04-13
Publication date: 2021-11-19
Anticipated expiration: 2040-04-13
Also published as: CN111585851A

Abstract

The application provides a method and a device for identifying a private line user, relates to the technical field of communications, and is used for efficient automatic identification of private line users. The method comprises the following steps: a server acquires service characteristic data of a user to be identified through deep packet inspection (DPI), wherein the service characteristic data comprises one or more of the following items of the user to be identified: a service preference label, a device connection type, a device connection number, a service time attribute and a geographic position label; the server inputs the service characteristic data of the user to be identified into a private line user identification model and determines whether the user to be identified is a private line user; and if the user to be identified is a private line user, the server sets a private line label for the user to be identified, wherein the private line label is used for indicating that the user to be identified is a private line user. The method and the device are applied to a server to identify private line users.

Description

Method and device for identifying private line user

Technical Field

The present application relates to the field of communications, and in particular, to a method and an apparatus for identifying a private line user.

Background

In current network operation, operators often configure customers that hold fixed network IP address resources over the long term as static private line users (IPHOST), or simply private line users. These private line users have many privileges in the operator's network. For example, once the broadband remote access server (BRAS) has learned a user's physical MAC address, the user becomes an online user and can occupy network resources for a long time. Even if the MAC address of a private line user has not been learned, the IP address already configured for that user cannot be provided to other users. That is, a private line user monopolizes the IP address and some network resources whether or not the user is actually online.

Based on the above situation, the operator needs to identify private line users in the process of providing communication services. At present, private line users are identified mainly by association with private line resource information. However, this approach has several disadvantages: when the private line resource information is incomplete or the related resource information cannot be provided, users are missed or cannot be identified at all; and the private line resource information must be synchronized manually at regular intervals, which introduces a time lag. As a result, the identification accuracy of this approach does not meet the operator's needs.

Therefore, a suitable solution is needed at present to solve the problem of how to realize efficient automatic identification of private line users.

Disclosure of Invention

The application provides a method and a device for identifying a private line user, which are used for solving the problem of how to realize high-efficiency automatic identification of the private line user at the present stage.

In order to achieve the purpose, the following scheme is adopted in the application:

In a first aspect, the present application provides a method for identifying a private line user, including: a server acquires service characteristic data of a user to be identified through deep packet inspection (DPI), wherein the service characteristic data comprises one or more of the following items of the user to be identified: a service preference label, a device connection type, a device connection number, a service time attribute and a geographic position label. The server inputs the service characteristic data of the user to be identified into a pre-trained private line user identification model and determines whether the user to be identified is a private line user. If the user to be identified is a private line user, the server sets a private line label for the user to be identified, wherein the private line label is used for indicating that the user to be identified is a private line user.

Based on the above technical scheme, the server acquires the service characteristic data of the user to be identified through DPI, wherein the service characteristic data comprises one or more of the following items of the user to be identified: a service preference label, a device connection type, a device connection number, a service time attribute and a geographic position label. The server then constructs a private line user identification model by machine learning from the historical service characteristic data of a plurality of users. In constructing the model, the service characteristic data supplies features covering three aspects (service attributes, device type and number, and geographic position), which improves the identification rate and accuracy of the model for private line users. Finally, the server inputs the current service characteristic data of the user to be identified into the private line user identification model, thereby identifying private line users automatically and efficiently.

In one possible design, the server acquiring the service characteristic data of the user through DPI includes: the server collects raw traffic through DPI to generate a DPI ticket, wherein the DPI ticket comprises one or more of the following items: the usage period of the IP address, the usage location, the device type, the protocol type distribution, and the HTTP request keyword set.

In one possible design, the server constructs the private line user identification model according to a support vector machine (SVM) classifier.

In one possible design, constructing the private line user identification model specifically includes: acquiring a learning data set, wherein the learning data set comprises a plurality of sample data, and each sample data comprises service characteristic data of a sample user; performing data preprocessing on the service characteristic data, wherein the data preprocessing comprises filtering out data with invalid values and missing values; and constructing the private line user identification model according to the preprocessed service characteristic data.

In a second aspect, the present application provides a server comprising an acquisition module and a processing module. The acquisition module is used for acquiring service characteristic data of a user to be identified through deep packet inspection (DPI), wherein the service characteristic data comprises one or more of the following items of the user to be identified: a service preference label, a device connection type, a device connection number, a service time attribute and a geographic position label. The processing module is used for inputting the service characteristic data of the user to be identified into the trained private line user identification model and determining whether the user to be identified is a private line user, and is further used for setting a private line label for the user to be identified when the user to be identified is a private line user, wherein the private line label is used for indicating that the user to be identified is a private line user.

In one possible design, the processing module is further configured to collect service traffic data of the user to be identified through DPI and generate a DPI ticket, wherein the DPI ticket comprises one or more of the following items: the usage period of the IP address, the usage location, the device type, the protocol type distribution, and the HTTP request keyword set. The acquisition module is further configured to acquire the service characteristic data of the user to be identified according to the DPI ticket.

In one possible design, the processing module is further configured to construct the private line user identification model according to a support vector machine (SVM) classifier.

In one possible design, the acquisition module is further configured to acquire a learning data set, wherein the learning data set comprises a plurality of sample data, and each sample data comprises service characteristic data of a sample user. The processing module is further configured to perform data preprocessing on the service characteristic data, the data preprocessing comprising filtering out data with invalid values and missing values, and to construct the private line user identification model according to the preprocessed service characteristic data.

In a third aspect, the present application provides a server, comprising: a processor and a communication interface; the communication interface is coupled to the processor, and the processor is configured to execute a computer program or instructions to implement the method for identifying a private line user as described in the first aspect and any one of the possible implementations of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium in which instructions are stored; when the instructions are executed on a computer, they cause the computer to perform the method for identifying a private line user described in the first aspect and any one of the possible implementations of the first aspect.

In a fifth aspect, the present application provides a computer program product containing instructions which, when the computer program product runs on a computer, cause the computer to perform the method for identifying a private line user described in the first aspect and any one of the possible implementations of the first aspect.

In a sixth aspect, the present application provides a chip comprising a processor and a communication interface, the communication interface being coupled to the processor, the processor being configured to execute a computer program or instructions to implement the method for identifying a private line user as described in the first aspect and any one of the possible implementations of the first aspect.

Drawings

Fig. 1 is a schematic flowchart of a method for identifying a private line user according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of another method for identifying a private line user according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a server according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of another server according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship. For example, A/B may be understood as A or B.

The terms "first" and "second" in the description and claims of the present application are used for distinguishing between different objects and not for describing a particular order of the objects. For example, the first edge service node and the second edge service node are used for distinguishing different edge service nodes, and are not used for describing the characteristic sequence of the edge service nodes.

Furthermore, the terms "including" and "having," and any variations thereof, as referred to in the description of the present application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, article, or apparatus.

In addition, in the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as examples, illustrations or explanations. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "e.g.," is intended to present concepts in a concrete fashion.

In order to facilitate understanding of the technical solutions of the present application, some technical terms are described below.

1. Deep packet inspection

Deep Packet Inspection (DPI) is a packet-based deep inspection technology, which performs deep inspection on different network application layer loads (such as HTTP, DNS, etc.), and determines the validity of the packet by inspecting the payload of the packet.

In the present application, the server collects raw traffic through DPI to obtain a DPI ticket.

2. Support vector machine classifier

A support vector machine (SVM) classifier is a generalized linear classifier that performs binary classification on data by supervised learning; its decision boundary is the maximum-margin hyperplane solved for the learning samples.

The SVM computes the empirical risk using the hinge loss function and adds a regularization term to the optimization problem to optimize the structural risk; it is a sparse and robust classifier. An SVM can perform nonlinear classification by means of the kernel method, and is one of the common kernel learning methods. The steps of the SVM classifier are briefly described below:

Given a training sample set:

$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}, \quad y_i \in \{-1, +1\}$$

The SVM linear classifier finds, based on the training set D, a hyperplane in the sample space that separates the two classes of samples.

A separating hyperplane in the sample space can be described by the following linear equation:

$$w^T x + b = 0$$

where $w = (w_1; w_2; \ldots; w_d)$ is the normal vector, which determines the direction of the hyperplane, and b is the displacement term, which determines the distance between the hyperplane and the origin. This hyperplane is hereinafter denoted (w, b). The distance from an arbitrary point x in the sample space to the hyperplane can be written as:

$$r = \frac{|w^T x + b|}{\|w\|}$$

If the hyperplane classifies the training samples correctly, then:

$$\begin{cases} w^T x_i + b \ge +1, & y_i = +1 \\ w^T x_i + b \le -1, & y_i = -1 \end{cases}$$

The training samples closest to the hyperplane make the equalities above hold and are called support vectors. The sum of the distances from two support vectors of different classes to the hyperplane is:

$$\gamma = \frac{2}{\|w\|}$$

This quantity is called the margin. The goal is to find the separating hyperplane with the maximum margin, that is, the parameters w and b that satisfy the constraints and maximize $\gamma$. Maximizing the margin only requires maximizing $\|w\|^{-1}$, which is equivalent to minimizing $\|w\|^2$. This gives the basic form of the SVM:

$$\min_{w, b} \ \frac{1}{2}\|w\|^2$$

$$\text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \quad i = 1, 2, \ldots, m$$

Solving this problem yields the model corresponding to the maximum-margin hyperplane:

$$f(x) = w^T x + b$$

From the above basic model, it can be seen that the objective function is quadratic and the constraints are linear, so this is a convex quadratic programming problem. It can be solved directly with an off-the-shelf optimization package, or through its dual problem, which is not described in detail here. In addition, if the training samples turn out not to be linearly separable, a kernel function can be introduced for further optimization.
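As a small numerical illustration of the quantities used in this derivation, the following sketch evaluates the standard point-to-hyperplane distance |w^T x + b| / ||w||, the margin 2 / ||w||, and the sign of the decision function f(x) = w^T x + b. The function names and the numeric values of w, b and x are illustrative assumptions and do not come from the application.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Distance |w^T x + b| / ||w|| from point x to the hyperplane (w, b)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w):
    """Margin 2 / ||w|| between support vectors of the two classes."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def decide(w, b, x):
    """Class label in {-1, 1} given by the sign of f(x) = w^T x + b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Illustrative values only: w = (3, 4), so ||w|| = 5, and b = -5.
w, b = (3.0, 4.0), -5.0
print(distance_to_hyperplane(w, b, (3.0, 4.0)))  # (9 + 16 - 5) / 5 = 4.0
print(margin(w))                                 # 2 / 5 = 0.4
print(decide(w, b, (0.0, 0.0)))                  # f = -5, so label -1
```

A smaller ||w|| gives a wider margin, which is why the optimization problem minimizes the norm of w subject to the classification constraints.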

The technical scheme provided by the embodiments of the present application can be applied to the identification of private line users, and mainly addresses the problem that private line users cannot be identified when the private line resource information is incomplete or a related resource table cannot be provided. In the prior art, identification relies mainly on association with private line resource information. This method is fast and efficient, but when the private line resource information is incomplete or the related resources cannot be provided, users are missed or cannot be identified. For example, when the private line resource information table is incomplete or out of sync, users may go unidentified or be misidentified. To mitigate this, the table must be synchronized and updated manually at regular intervals, but this introduces a time lag, so private line users cannot be identified accurately and promptly, and the consumption of human resources increases.

In the embodiments of the present application, private line users are identified by building a private line user identification model. A learning data set is obtained from the historical service characteristic data of a plurality of sample users, and data processing and data analysis are performed on the learning data set to reduce the learning difficulty of the model. The learning data set is then combined with an SVM classifier to construct the private line user identification model, which is iteratively trained and tuned until a model usable in practice is obtained. The embodiments of the present application address the low efficiency of existing private line user identification and provide a good foundation for the operator's subsequent maintenance and management of private line users.

The technical solution provided by the present application is specifically explained below with reference to the drawings of the specification.

As shown in fig. 1, a method for identifying a private line user provided in an embodiment of the present application includes the following steps:

S101, the server acquires service characteristic data of a user to be identified through DPI.

Wherein the service characteristic data may comprise one or more of: the equipment model, the service account number, the service access type, the service access time, the operating system number, the IMEI and the GPS position information of the user to be identified.

Optionally, the server collects the service traffic data of the user to be identified through DPI and generates a DPI ticket, wherein the DPI ticket comprises one or more of the following items: the usage period of the IP address, the usage location, the device type, the protocol type distribution, and the HTTP request keyword set.
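To make the relationship between a DPI ticket and the service characteristic data more concrete, the following is a minimal sketch of one possible representation. The class name `DpiTicket`, its field names, the helper `to_feature_data` and the derived features are all hypothetical; the application lists the ticket items but does not specify a data format or a mapping.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DpiTicket:
    """One DPI ticket; field names are hypothetical and mirror the listed items."""
    ip: str
    usage_period: str                  # e.g. "08:30-17:30"
    usage_location: str
    device_type: str
    protocol_distribution: Dict[str, float] = field(default_factory=dict)
    http_keywords: List[str] = field(default_factory=list)

def to_feature_data(ticket: DpiTicket) -> dict:
    """Flatten a ticket into a feature dictionary (one possible mapping)."""
    # Protocol with the largest share of traffic, or None if no data.
    dominant = max(ticket.protocol_distribution,
                   key=ticket.protocol_distribution.get, default=None)
    return {
        "ip": ticket.ip,
        "usage_period": ticket.usage_period,
        "usage_location": ticket.usage_location,
        "device_type": ticket.device_type,
        "dominant_protocol": dominant,
        "keyword_count": len(ticket.http_keywords),
    }

ticket = DpiTicket("192.0.2.10", "08:30-17:30", "office", "router",
                   {"HTTP": 0.7, "DNS": 0.3}, ["mail", "erp"])
print(to_feature_data(ticket))
```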

S102, the server inputs the service characteristic data of the user to be identified into a pre-trained private line user identification model.

Optionally, the server constructs the private line user identification model according to the SVM classifier.

It can be understood that, after the server inputs the service characteristic data of the user to be identified into the trained private line user identification model, the private line user identification model outputs the identification result. Wherein, the recognition result includes: the user to be identified is a private line user or the user to be identified is a common user.

S103, the server marks the user to be identified according to the identification result output by the private line user identification model.

Optionally, the server marks the user to be identified with a private line label or a common label. The private line label indicates that the marked user is a private line user, and the common label indicates that the marked user is a common user. If the identification result output by the private line user identification model indicates that the user to be identified is a private line user, the server marks the user with the private line label; if the identification result indicates that the user to be identified is a common user, the server marks the user with the common label. For example, private line users are labeled "1" and common users are labeled "-1".
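The marking step can be sketched as follows. This is only an illustration under the label values given in the example ("1" and "-1"); the function name `mark_user`, the dictionary used as a label store and the user identifiers are assumptions.

```python
PRIVATE_LINE_LABEL = 1   # label value taken from the example in the text
COMMON_LABEL = -1

def mark_user(user_labels: dict, user_id: str, is_private_line: bool) -> int:
    """Store and return the label that corresponds to the identification result."""
    label = PRIVATE_LINE_LABEL if is_private_line else COMMON_LABEL
    user_labels[user_id] = label
    return label

labels = {}
mark_user(labels, "192.0.2.10", True)    # identified as a private line user
mark_user(labels, "192.0.2.20", False)   # identified as a common user
print(labels)  # {'192.0.2.10': 1, '192.0.2.20': -1}
```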

Based on the technical scheme, the server acquires the service characteristic data of the user to be identified through the DPI, then inputs the service characteristic data into the trained private line user identification model, determines whether the user to be identified is the private line user, and finally marks the identified private line user, so that an operator can conveniently maintain and manage the private line user. Therefore, the server can automatically identify the private line user by using the service characteristic data of the user to be identified and the trained private line user identification model, and meanwhile, the identification rate and the accuracy in the private line user identification process are improved.

As shown in fig. 2, another method for identifying a private line user provided in this embodiment of the present application includes constructing a private line user identification model, and after step S101, further includes the following steps:

S201, the server acquires a learning data set.

Wherein the learning data set comprises business feature data of a plurality of sample users.

For example, the server selects the service characteristic data of at least five thousand sample users as a data source, and then performs data standardization construction on the data source to obtain a learning data set. The data source contains service characteristic data of a certain proportion of private line users.

Optionally, the data standardization performed by the server on the data source includes: extracting effective data features from the data source and constructing a sample set with a single IP address as the unit; and marking private line users with the private line label and common users with the common label. The private line label indicates that the marked user is a private line user, and the common label indicates that the marked user is a common user.

It can be understood that the service characteristic data of a plurality of sample users is selected as the data source to ensure that the data used for training the private line user identification model is sufficiently rich, which reduces the influence of chance on the trained model when it identifies the user type and improves the accuracy of that identification.

Optionally, the learning data set further includes a private line user resource information table. The private line user resource information table is used for indicating that the service characteristic data in the table come from the private line user.
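One way to picture the construction of a sample set keyed by IP address, with labels taken from a private line user resource information table, is sketched below. The record layout, the function name `build_learning_set` and the representation of the resource table as a set of IP addresses are assumptions made for illustration.

```python
def build_learning_set(records, private_line_ips):
    """Group feature records by IP address and attach a label of 1 or -1.

    records: iterable of dicts with hypothetical keys "ip" and "features".
    private_line_ips: set of IPs listed in the private line resource table.
    """
    samples = {}
    for rec in records:
        sample = samples.setdefault(rec["ip"], {"features": [], "label": -1})
        sample["features"].append(rec["features"])
    for ip, sample in samples.items():
        if ip in private_line_ips:
            sample["label"] = 1   # private line label
    return samples

records = [
    {"ip": "192.0.2.10", "features": {"device_count": 40}},
    {"ip": "192.0.2.10", "features": {"device_count": 42}},
    {"ip": "192.0.2.20", "features": {"device_count": 3}},
]
learning_set = build_learning_set(records, {"192.0.2.10"})
print(learning_set["192.0.2.10"]["label"], learning_set["192.0.2.20"]["label"])  # 1 -1
```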

Optionally, the server selects 70% of the learning data set as a training data set and the remaining 30% as a test data set. The training data set is used to initially construct and train the private line user identification model; the test data set is used to test the initially constructed and trained model and to judge whether its accuracy meets the accuracy threshold. The accuracy threshold may be set according to actual requirements, for example 95%.

It can be understood that the learning data set is divided into a training data set and a test data set in a fixed proportion to ensure the reliability of the private line user identification model in identifying the user type. If training and testing used the same service characteristic data, overfitting of the model to the learning data set could not be detected; the split avoids this situation.
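The 70%/30% division can be sketched with the standard library alone. The function name `split_learning_set`, the shuffling step and the fixed seed are assumptions; the application states only the proportions.

```python
import random

def split_learning_set(samples, train_ratio=0.7, seed=0):
    """Shuffle the samples reproducibly and split them into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]

# Five thousand placeholder sample identifiers, matching the example size above.
train, test = split_learning_set(range(5000))
print(len(train), len(test))  # 3500 1500
```

Because the two parts are disjoint, a model that merely memorizes the training data will score poorly on the test part, which is what makes overfitting visible.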

S202, the server carries out data preprocessing on the learning data set.

Optionally, the data preprocessing performed on the learning data set by the server includes: data containing invalid values and data containing missing values are filtered out.

It should be noted that, after the data standardization described above, the data is in a fixed format, but some of it may be invalid, for example records whose key field is empty or whose data value is outside the permitted range. The server therefore needs to filter out data containing invalid values and data containing missing values.
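A minimal sketch of such filtering is given below. The choice of key fields, the permitted range for the connection count and the function names are hypothetical; the application does not specify which fields are key fields or what ranges apply.

```python
def is_valid(record, key_fields, valid_ranges):
    """Return True if no key field is missing or empty and all values are in range."""
    for name in key_fields:
        if record.get(name) in (None, ""):
            return False                      # missing value
    for name, (low, high) in valid_ranges.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            return False                      # invalid value
    return True

def preprocess(records, key_fields, valid_ranges):
    """Keep only the records that pass the validity check."""
    return [r for r in records if is_valid(r, key_fields, valid_ranges)]

records = [
    {"ip": "192.0.2.10", "device_count": 40},
    {"ip": "", "device_count": 5},              # empty key field
    {"ip": "192.0.2.30", "device_count": -1},   # value outside the range
]
clean = preprocess(records, key_fields=["ip"], valid_ranges={"device_count": (0, 10000)})
print(len(clean))  # 1
```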

S203, the server analyzes the learning data set to obtain an effective characteristic value.

Optionally, the server takes the private line user and the ordinary user as statistical units, and performs statistical analysis on the learning data set to obtain the characteristic value. Wherein the characteristic value may include one or more of: the distribution of the port numbers of all the sessions, the distribution of the types of the session services, the distribution of Chinese and English vocabulary keywords of the session requests and the like under the private line user group and the common user group.

Illustratively, a feature point whose feature value satisfies the following formula is a large difference feature point, and the server sets the feature value of the large difference feature point as a valid feature value:

where n_i and N_i are statistical values for feature point i; the subscript h indicates that a statistical value is that of common users, and the subscript s indicates that it is that of private line users; the heat value of private line users at each feature point i is S_i, and that of common users is H_i; and the difference threshold for feature points is T.

It can be understood that identifying the feature points on which private line users and common users differ greatly, and taking the feature values of those feature points as valid feature values, extracts the effective data and thereby reduces the learning difficulty of the private line user identification model. For example, suppose a government agency and a residential community are located in the same geographical area. The peak period of broadband use by the government agency is generally 8:30 to 17:30, while that of residents of the community is generally 19:30 to 23:30. In this case, the peak Internet usage period is selected as a large-difference feature point and the time feature value is extracted as a valid feature value, which reduces the learning difficulty of the model.

S204, the server constructs a private line user identification model according to the SVM classifier.

Optionally, the server sets an initial model parameter list. For example, the parameters of the SVM classifier in the Sklearn library may be selected and configured.

Optionally, the optimal parameters for model construction are searched for in the initial model parameter list by cross-validated grid search. Illustratively, the search process is as follows: select a group of parameters from the parameter list and construct a private line user identification model; input the training data and start training; after the set training target is reached, store the model in a model library for model quality comparison; return to the first step and repeat until the initial model parameter list has been completely traversed; finally, select the optimal group of parameters and train the private line user identification model with it.

Illustratively, the SVM classifier may select the following parameters from the Sklearn library:

(1) penalty coefficient C

A larger penalty coefficient C on the error term penalizes misclassified samples more heavily, so accuracy on the training samples is higher but generalization ability is reduced. Conversely, a smaller C tolerates some misclassified training samples and gives stronger generalization ability.

(2) Kernel parameter

This parameter selects the kernel function used by the model. The kernel functions commonly used are: linear, the linear kernel; poly, the polynomial kernel; rbf, the radial basis function (Gaussian) kernel; sigmoid, the sigmoid kernel; and precomputed, a precomputed kernel matrix.

(3) gamma parameter

This parameter is the kernel coefficient and applies only to the rbf, poly and sigmoid kernels. If gamma is set to auto, the value is the reciprocal of the number of sample features, i.e., 1/n_features. Other values may also be set.

(4) degree parameter

This parameter applies only when kernel='poly' (polynomial kernel) and is the degree n of the polynomial kernel; it is ignored for other kernels.

(5) coef0 parameter

This parameter is the independent term in the kernel function. It applies only to the poly and sigmoid kernels and corresponds to the constant c in the kernel function.

It can be understood that the parameters of the SVM classifier may be selected and configured according to the valid feature values in the training data set. After the parameters have been selected and configured and the initial model parameter list has been established, the private line user identification model is initially built from the training data set. Because the valid feature values in the training data set are used when the model is built, the learning difficulty of the private line user identification model is reduced.
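The cross-validated grid search over these parameters can be sketched with scikit-learn's `SVC` and `GridSearchCV`. This is an illustration under stated assumptions, not the configuration used in the application: the data is synthetic (generated by `make_classification` rather than derived from DPI tickets), and the parameter values in `param_grid`, the number of folds and the scoring metric are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the learning data set (not real DPI features).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
y = 2 * y - 1  # relabel classes to {-1, 1}: 1 = private line, -1 = common

# 70% training data, 30% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Hypothetical initial model parameter list; the values are illustrative.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["auto", 0.01]},
    {"kernel": ["poly"], "C": [1], "degree": [2, 3], "coef0": [0.0, 1.0]},
]

# Each parameter group is trained and scored with 5-fold cross-validation;
# the best group is then refit on the whole training set.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

best_model = search.best_estimator_
print(search.best_params_)
print(best_model.score(X_test, y_test))  # accuracy on the held-out test set
```

`GridSearchCV` performs the traversal described above internally, so the explicit model library for quality comparison is replaced here by its `cv_results_` and `best_params_` attributes.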

S205, the server confirms whether the private line user identification model can be used for identifying the actual private line user.

Optionally, the server presets a standard-reaching threshold, and determines whether the identification capability of the private line user identification model reaches the standard according to the standard-reaching threshold and the F value, and whether the identification capability can be used for identifying the actual private line user. And the F value is used for reflecting the strength of the identification capability of the special line user identification model.

Optionally, two metrics from the field of statistical classification, accuracy and recall, are used as the parameters for calculating the F value. In the present application, the F value is calculated from the accuracy and the recall as follows:

It should be noted that the above accuracy and recall are calculated on the test data set. After the preliminarily constructed and trained private line user identification model has been tested with the test data set, the accuracy and the recall are computed from the test results.

For example, the server may set the compliance threshold to 80%, and when the F value of the private line user identification model reaches 80%, the identification capability of the private line user identification model at this time is considered to be strong enough to be used for identification of the actual private line user.
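The compliance check can be sketched as follows. This is a minimal sketch that assumes the F value is the standard balanced F-measure, 2PR/(P+R), with the "accuracy" of the text read as the precision P on the private line class and R as the recall. The function names, the label encoding (1 for private line user, 0 for ordinary user), and the sample data are hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Count hits on the test set and return (precision, recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_value(precision, recall):
    """Balanced F-measure (assumed form): 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def model_is_compliant(y_true, y_pred, threshold=0.80):
    """True when the F value on the test set reaches the compliance threshold."""
    precision, recall = precision_recall(y_true, y_pred)
    return f_value(precision, recall) >= threshold


if __name__ == "__main__":
    # Hypothetical test-set labels: 1 = private line user, 0 = ordinary user.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    p, r = precision_recall(y_true, y_pred)
    print(p, r, f_value(p, r), model_is_compliant(y_true, y_pred))
```

With these labels the model has three true positives, one false positive, and one false negative, so it falls short of the 80% threshold and would go through further training.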

Based on the above technical solution, the server obtains the learning data set from the historical service characteristic data of a plurality of sample users, and performs data processing and data analysis on the learning data set to reduce the learning difficulty of the private line user identification model. The server then combines the learning data set with an SVM classifier to construct the private line user identification model, and performs iterative training and algorithm adjustment to obtain a private line user identification model that can be used in practice. Finally, when the identification capability of the private line user identification model reaches the preset standard, the model is used to identify actual private line users. In the embodiments of the present application, the private line user identification model is constructed by means of machine learning, which improves the identification capability of the model.

In the embodiments of the present application, the server may be divided into functional modules or functional units according to the above method examples. For example, one functional module or functional unit may be provided for each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware, or in the form of a software functional module or functional unit. The division into modules or units in the embodiments of the present application is schematic and is only a division by logical function; other divisions are possible in an actual implementation.

Fig. 3 is a schematic structural diagram of a server 30 provided in an embodiment of the present application. The server 30 is configured to execute the above method for identifying a private line user, and includes:

An obtaining module 301, configured to obtain service characteristic data of a user to be identified through deep packet inspection (DPI), where the service characteristic data includes one or more of the following: the service preference label, the device connection type, the number of device connections, the service time attribute, and the geographic location label of the user to be identified.

A processing module 302, configured to input the service characteristic data of the user to be identified into the trained private line user identification model and determine whether the user to be identified is a private line user; and further configured to set a private line label for the user to be identified when the user to be identified is a private line user, where the private line label indicates that the user to be identified is a private line user.

Optionally, the processing module 302 is further configured to collect service traffic data of the user to be identified through DPI and generate a DPI ticket, where the DPI ticket includes one or more of the following: the usage period of the IP, the usage location, the device type, the protocol type distribution, and the HTTP request keyword set.

Optionally, the obtaining module 301 is further configured to obtain service feature data of the user to be identified according to the DPI ticket.

Optionally, the processing module 302 is further configured to construct the private line user identification model by using a support vector machine (SVM) classifier.

Optionally, the obtaining module 301 is further configured to obtain a learning data set, where the learning data set includes a plurality of sample data, and each sample data includes service feature data of one sample user.

Optionally, the processing module 302 is further configured to perform data preprocessing on the service characteristic data, where the data preprocessing includes filtering out data with invalid values and missing values; and is further configured to construct the private line user identification model from the preprocessed service characteristic data.
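The preprocessing step can be illustrated with a short filter over sample records. This is only a sketch under stated assumptions: the field names, the rule that every listed feature must be present, and the definition of an invalid value (here, a negative or non-integer connection count) are hypothetical and are not specified by this application.

```python
# Hypothetical field names for the five kinds of service characteristic data.
REQUIRED_FIELDS = (
    "service_preference_label",
    "device_connection_type",
    "device_connection_count",
    "service_time_attribute",
    "geo_location_label",
)


def has_missing_value(sample):
    """A field counts as missing when it is absent, None, or an empty string."""
    return any(sample.get(field) in (None, "") for field in REQUIRED_FIELDS)


def has_invalid_value(sample):
    """Assumed validity rule: the connection count is a non-negative integer."""
    count = sample.get("device_connection_count")
    if isinstance(count, bool) or not isinstance(count, int):
        return True
    return count < 0


def preprocess(samples):
    """Keep only the samples with no missing and no invalid values."""
    return [
        s for s in samples
        if not has_missing_value(s) and not has_invalid_value(s)
    ]


if __name__ == "__main__":
    good = {
        "service_preference_label": "office",
        "device_connection_type": "pc",
        "device_connection_count": 12,
        "service_time_attribute": "workday",
        "geo_location_label": "business_district",
    }
    missing = dict(good, geo_location_label=None)
    invalid = dict(good, device_connection_count=-3)
    print(len(preprocess([good, missing, invalid])))
```

Dropping incomplete or implausible samples before training keeps such records out of the learning data set used to build the model.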

Fig. 4 shows a schematic diagram of another possible structure of the server involved in the above embodiments. The server includes a processor 402 and a communication interface 403.

In the case where the server shown in fig. 3 is implemented as the server shown in fig. 4, the processor 402 is configured to control and manage the actions of the server, for example, to perform the steps performed by the processing module 302 described above, and/or to perform other processes of the techniques described herein. The communication interface 403 is configured to support communication between the server and other network entities, for example, to perform the steps performed by the obtaining module 301 described above. The server may also include a memory 401 and a bus 404, where the memory 401 is configured to store program code and data of the server.

The processor 402 may implement or execute the various illustrative logical blocks, units, and circuits described in connection with the present disclosure. The processor may be a central processing unit, a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

Memory 401 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory may also comprise a combination of memories of the kind described above.

The bus 404 may be an Extended Industry Standard Architecture (EISA) bus or the like. The bus 404 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.

Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. For the specific working processes of the system, the apparatus, and the module described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.

Embodiments of the present application provide a computer program product including instructions which, when run on a computer, cause the computer to execute the method for identifying a private line user according to the foregoing method embodiments.

An embodiment of the present application further provides a computer-readable storage medium storing instructions which, when executed by a computer, cause the computer to perform each step performed by the server in the method flow shown in the foregoing method embodiments.

The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a register, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, any other form of computer-readable storage medium known in the art, or any suitable combination of the foregoing. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (ASIC). In the embodiments of the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for identifying a private line user, the method comprising:

acquiring service characteristic data of a user to be identified through deep packet inspection (DPI), wherein the service characteristic data comprises one or more of the following items: a service preference label, a device connection type, a number of device connections, a service time attribute, or a geographic location label of the user to be identified;

acquiring a learning data set, wherein the learning data set comprises a plurality of sample data, and each sample data comprises the service characteristic data of a sample user;

analyzing the learning data set to obtain effective feature values, wherein the effective feature values satisfy the following formula:

wherein, for n_i and N_i, the subscript h indicates that the statistic is that of ordinary users, and the subscript s indicates that the statistic is that of private line users; the heat value of private line users at each feature point i is S_i, and that of ordinary users is H_i; and the difference threshold of the feature points is T;

establishing a private line user identification model according to the effective feature values in the training data set by using a support vector machine (SVM) classifier; inputting the service characteristic data of the user to be identified into the trained private line user identification model, and determining whether the user to be identified is a private line user;

and if the user to be identified is the private line user, setting a private line label for the user to be identified, wherein the private line label is used for indicating that the user to be identified is the private line user.

2. The method for identifying a private line user according to claim 1, wherein the acquiring of the service characteristic data of the user to be identified through DPI specifically comprises:

collecting the service flow data of the user to be identified through DPI to generate a DPI ticket; wherein the DPI ticket comprises one or more of the following items: the usage period, usage location, device type, protocol type distribution, or http request keyword set of the IP;

and acquiring the service characteristic data of the user to be identified according to the DPI ticket.

3. The method of identifying a private line user according to claim 1 or 2, wherein after said obtaining a learning data set, the method further comprises:

and performing data preprocessing on the service characteristic data, wherein the data preprocessing comprises filtering data with invalid values and missing values.

4. A server, characterized in that the server comprises:

an obtaining module, configured to obtain service characteristic data of a user to be identified through deep packet inspection (DPI), wherein the service characteristic data comprises one or more of the following items: a service preference label, a device connection type, a number of device connections, a service time attribute, and a geographic location label of the user to be identified;

the obtaining module is further configured to obtain a learning data set, where the learning data set includes a plurality of sample data, and each sample data includes the service feature data of one sample user;

a processing module, configured to analyze the learning data set to obtain effective feature values, wherein the effective feature values satisfy the following formula:

the processing module is further configured to establish a private line user identification model according to the effective feature values in the training data set by using a support vector machine (SVM) classifier;

the processing module is further configured to input the service characteristic data of the user to be identified into the trained private line user identification model and determine whether the user to be identified is a private line user; and is further configured to set a private line label for the user to be identified when the user to be identified is a private line user, wherein the private line label is used for indicating that the user to be identified is a private line user.

5. The server according to claim 4,

the processing module is further configured to collect the service traffic data of the user to be identified through DPI and generate a DPI ticket, wherein the DPI ticket comprises one or more of the following items: the usage period, usage location, device type, protocol type distribution, and HTTP request keyword set of the IP;

the obtaining module is further configured to obtain the service characteristic data of the user to be identified according to the DPI ticket.

6. The server according to claim 4 or 5,

the processing module is further configured to perform data preprocessing on the service feature data, where the data preprocessing includes filtering data with invalid values and missing values.

7. A server, comprising: a processor and a communication interface; the communication interface is coupled to the processor, which is configured to run a computer program or instructions to implement the method for identifying a private line subscriber as claimed in any one of claims 1 to 3.

8. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a computer, cause the computer to perform the method for identifying a private line subscriber according to any one of claims 1 to 3.