CN111275453A - Industry identification method and system of Internet of things equipment - Google Patents

Industry identification method and system of Internet of things equipment

Info

Publication number
CN111275453A
Authority
CN
China
Prior art keywords
equipment
identified
xdr
industry
xdr record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811466893.3A
Other languages
Chinese (zh)
Inventor
吴瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Shanghai Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Shanghai Co Ltd
Priority to CN201811466893.3A
Publication of CN111275453A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention provides an industry identification method and system for Internet of Things devices. The method comprises the following steps: inputting the feature data of the XDR record stream of the device to be identified into a trained neural network model, and outputting an industry class label corresponding to the XDR record stream of the device to be identified; and acquiring the industry category of the device to be identified according to the industry class label output by the neural network model. In the method and system provided by the embodiment of the invention, the industry category of the device to be identified is thus obtained from the feature data of its XDR record stream. This solves two problems of the prior art: in some cases the operator cannot identify the industry to which a device belongs from the APN of the device or the card number of its Internet of Things card; and even when the industry can be identified in that way, doing so easily creates information security risks.

Description

Industry identification method and system of Internet of things equipment
Technical Field
The embodiment of the invention relates to the technical field of Internet of things, in particular to an industry identification method and system of Internet of things equipment.
Background
With the vigorous development of the Internet of Things, many cities have reached the state of "things outnumbering people", that is, the number of devices connected to the Internet of Things in the city exceeds the number of mobile phone users, and on this trend the number of connected devices will continue to grow. A communication operator, in order to better manage the Internet of Things, needs to understand the service quality and performance of each connected device and its influence on the Internet of Things; to do so, the core and key step is to identify the industry to which each connected device belongs.
In the prior art, the industry to which a device connected to the Internet of Things belongs is identified through the card number of the device's Internet of Things card or through the Access Point Name (APN) it activates. The reason is as follows: if an enterprise has a batch of devices to be connected to the Internet of Things, a worker of the enterprise first purchases a batch of Internet of Things cards from an operator, and the operator associates the batch of cards with one APN at the time of purchase; the worker then allocates each Internet of Things card to a corresponding device, so that the device can activate the associated APN through its card and connect to the Internet of Things. It should be noted that if, at the time of purchase, the worker informs the operator of the industry to which the devices using the batch of cards belong, the operator associates the APN with that industry, so that the operator can later identify the industry to which a device belongs according to the APN of the device or the card number of its Internet of Things card.
However, the above method also has certain disadvantages:
(1) For a batch of Internet of Things cards purchased by a worker, when the worker does not inform the operator of the industry to which the devices using the batch of cards belong, the operator cannot identify the industry to which a device belongs according to the APN of the device or the card number of its Internet of Things card.
(2) Even if the industry to which a device belongs can be identified according to the APN of the device or the card number of its Internet of Things card, the APN and the card number involve private information of the device, so identifying the industry through them easily creates information security risks.
Disclosure of Invention
Aiming at the technical problems in the prior art, the embodiment of the invention provides an industry identification method and system of Internet of things equipment.
In a first aspect, an embodiment of the present invention provides an industry identification method for internet of things devices, including:
inputting the characteristic data of the XDR record flow of the equipment to be identified into a trained neural network model, and outputting an industry class label corresponding to the XDR record flow of the equipment to be identified, wherein the neural network model is obtained by training the characteristic data of the sample XDR record flow based on the sample equipment and the industry class label of the predetermined sample XDR record flow;
and acquiring the industry category of the equipment to be identified according to the industry category label output by the neural network model.
In a second aspect, an embodiment of the present invention provides an industry identification system for internet of things devices, including:
an input module, configured to input the feature data of the XDR record stream of the device to be identified into a trained neural network model and to output an industry class label corresponding to the XDR record stream of the device to be identified, wherein the neural network model is obtained by training on the feature data of the sample XDR record stream of a sample device and a predetermined industry class label of the sample XDR record stream;
and the output module is used for acquiring the industry category of the equipment to be identified according to the industry category label output by the neural network model.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method provided in the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.
According to the industry identification method and system for Internet of Things devices provided by the embodiment of the invention, the industry category of the device to be identified is obtained by inputting the feature data of the XDR record stream of the device to be identified into the trained neural network model. This solves two problems of the prior art: in some cases the operator cannot identify the industry to which a device belongs from the APN of the device or the card number of its Internet of Things card; and even when the industry can be identified in that way, doing so easily creates information security risks.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of an industry identification method for internet of things equipment according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of industry identification of an internet of things device according to an embodiment of the present invention;
fig. 3 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The Internet of Things is the internet in which things are connected to things. In the embodiment of the invention, "things" refers to Internet of Things devices, namely devices connected to the Internet of Things, and "connected" means that communication and data transmission exist among the devices. Generally, communication between devices requires data transmission through a management platform in the cloud, and the transmission channel is generally provided by an operator. While providing the transmission channel, the operator also needs to manage the Internet of Things, and for that purpose the core and key step is to identify the industry to which each connected device belongs. In the prior art, the industry to which each device belongs is identified through the card number of the device's Internet of Things card or the activated APN. The defects are that in some cases the operator cannot identify the industry in this way, and that even when it can, the method easily creates information security risks.
In order to solve the above problems, embodiments of the present invention provide an industry identification method for Internet of Things devices, which can be applied to industry identification scenarios for such devices. The execution subject of the method may be a switch, a router, a computer terminal, a server, or another device with corresponding functions, or may be an independently provided device or module, which is not specifically limited in this embodiment of the present invention. For convenience of description, the embodiment of the present invention takes a switch as an example of the execution subject to explain the industry identification method for Internet of Things devices.
Fig. 1 is a flowchart of an industry identification method for internet of things equipment according to an embodiment of the present invention, and as shown in fig. 1, the method includes:
step 101, inputting the characteristic data of the XDR record stream of the device to be identified into a trained neural network model, and outputting an industry class label corresponding to the XDR record stream of the device to be identified, wherein the neural network model is obtained by training the characteristic data of the sample XDR record stream based on the sample device and the industry class label of the predetermined sample XDR record stream.
It should be noted that the industry class label corresponding to the External Data Representation (XDR) record stream of the device to be identified is also the industry class label of the device to be identified, and the industry category of the device to be identified can be known through the label. Likewise, the industry class label of the sample XDR record stream is also the industry class label of the sample device, and the industry category of the sample device can be known through the label.
Further, the device to be identified and the sample device are explained, the device to be identified is the device to be subjected to industry identification, and the sample device is the device of which the industry category is known. The XDR record flow of the device to be identified includes multiple XDR records, and each XDR record refers to a detailed record of signaling and service of the device to be identified at a time, so the XDR record flow of the device to be identified is also a detailed record flow of signaling and service of the device to be identified in a time period. It should be noted that the sample XDR record flow of the sample device is similar to the XDR record flow of the device to be identified, and the description thereof is omitted here.
The neural network model can be a long-short term memory network model, a deep belief network and the like, and the neural network model is not particularly limited in the embodiment of the invention. It can be understood that the neural network model can be trained through the feature data of the sample XDR record flow of the sample device and the predetermined industry class label of the sample XDR record flow, so as to perform industry identification on the device to be identified through the trained neural network model.
Step 102, acquiring the industry category of the device to be identified according to the industry class label output by the neural network model.
The output of the neural network model is the industry class label corresponding to the XDR record stream of the device to be identified. Because the association between the industry categories of the sample devices and the industry class labels is known, the industry category of the device to be identified can be obtained from the industry class label.
According to the method provided by the embodiment of the invention, the industry category of the device to be identified is obtained by inputting the feature data of the XDR record stream of the device to be identified into the trained neural network model. This solves two problems of the prior art: in some cases the operator cannot identify the industry to which a device belongs from the APN of the device or the card number of its Internet of Things card; and even when the industry can be identified in that way, doing so easily creates information security risks.
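As an illustration of step 102, the lookup from label to industry category can be sketched as follows. This is only a sketch under assumptions: the label table, the industry names and the idea that the model exposes one score per label are hypothetical and are not taken from the embodiment.

```python
# Hypothetical label table: in practice it is fixed when the sample XDR
# record streams are labelled for training.
LABEL_TO_INDUSTRY = {
    0: "vehicle networking",
    1: "smart metering",
    2: "shared bicycles",
}


def industry_from_scores(scores):
    """Pick the label with the highest score and map it to its industry.

    `scores` is assumed to hold one model output score per industry class
    label, indexed by label.
    """
    label = max(range(len(scores)), key=lambda i: scores[i])
    return LABEL_TO_INDUSTRY[label]


print(industry_from_scores([0.1, 0.7, 0.2]))
```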
On the basis of the above embodiments, as an alternative embodiment, the embodiments of the present invention will be described with reference to preparation work required before industry identification of a device to be identified. Before inputting the characteristic data of the XDR record flow of the device to be identified into the trained neural network model and outputting the industry class label corresponding to the XDR record flow, the method further comprises the following steps:
and acquiring the industry category of the sample equipment, and setting a corresponding label for the XDR record flow of the sample equipment.
It should be noted that, in order to train the neural network model so that it can identify the industry class of the device to be identified, it is necessary to accurately obtain the industry class of the sample device, and set a corresponding label for the sample XDR record stream of the sample device.
Further, industry categories of the sample equipment can be obtained through Deep Packet Inspection (DPI) technology, specifically, the operating principle of the DPI technology is to perform load matching in an application layer, and identify corresponding industry categories according to characteristics corresponding to flows of different industry categories.
According to the method provided by the embodiment of the invention, the industry category of the sample equipment is obtained through the DPI technology, so that the corresponding label is set for the XDR record flow of the sample equipment, the accuracy and precision of the trained neural network model are greatly improved, and the accuracy and precision of industry identification of the equipment are greatly improved.
On the basis of the foregoing embodiments, as an alternative embodiment, the embodiment of the present invention explains acquisition of an XDR record stream of a device to be identified and a sample XDR record stream of a sample device. And the XDR record flow of the equipment to be identified and the sample XDR record flow of the sample equipment are acquired from each interface of the network based on a deep packet inspection technology.
Specifically, not only the industry class of the sample device may be acquired by the DPI technology, but also the XDR record flow of the device to be identified and the sample XDR record flow of the sample device may be acquired by the DPI technology.
Further, the XDR record stream of the device to be identified and the sample XDR record stream of the sample device are obtained in a similar manner, so only the former is explained here. Specifically, at any time, data of the device to be identified are acquired from the S1-MME interface and the S1-U interface of the Long Term Evolution (LTE) core network and from the Gb interface and the Gn interface of the Global System for Mobile Communications (GSM) network, and the data are then processed based on the DPI technology to generate one XDR record. Repeating these steps at a plurality of times yields a plurality of XDR records, which form the XDR record stream.
On the basis of the above embodiments, as an alternative embodiment, the embodiments of the present invention describe each interface of the network. Each interface of the network comprises any one or more of an S1-MME interface, an S1-U interface, a Gb interface and a Gn interface.
Specifically, the S1-MME interface and the S1-U interface are both interfaces of the LTE core network. In addition, in existing service scenarios a large number of Internet of Things service applications are still carried on the GSM network, so data of the Gb interface in the GSM network are collected as well. Considering that some Internet of Things service applications use a provincial gateway, data of the Gn interface are also collected.
On the basis of the above embodiments, as an alternative embodiment, the embodiment of the present invention explains the selection of the feature data. The characteristic data includes: any one or more of different rate ratios, overall rate mean, partial rate mean, different distance division ratio, overall distance mean, partial distance mean, number of passed cells, overall number of cells mean, partial number of cells mean.
It should be noted that the feature data used for neural network model training needs to be consistent with the selection of the feature data for industry identification.
In the prior art, common service indexes of the communication industry, such as location update frequency and service initiation frequency, are generally used to distinguish users, but in testing this approach proved unsuitable for industry identification of Internet of Things devices. The main reason is that, although the service flows of devices in different Internet of Things industries differ, the devices use similar or identical communication modules; and even when different communication modules are used, the modules all follow the communication service specifications of the 3rd Generation Partnership Project (3GPP), so Internet of Things devices cannot be distinguished from personal network terminals among the mass of terminals. This phenomenon is especially common for Internet of Things services based on the GSM network.
Based on the above, the embodiment of the invention selects three characteristics of the cell number, the moving distance and the speed in a specific time period from three physical dimensions of time dimension, space dimension and speed, wherein the characteristics of the cell number in the specific time period comprise the ratio of the passed cell numbers, the average value of the total cell number and the average value of the partial cell number; the moving distance class characteristics comprise different distance dividing ratios, an overall distance mean value and a partial distance mean value; the rate class characteristics include different rate ratios, overall rate means, and partial rate means. The characteristics are obviously different from the traditional communication service indexes, and the daily behavior mode of the Internet of things equipment can be more closely described. Meanwhile, the above characteristics can be obtained based on XDR record flow calculation, and feasibility is achieved.
The following exemplifies the calculation of three features of different distance division ratio, overall distance mean and partial distance mean in the moving distance class features:
In this example, XDR record streams of a vehicle over 5 days are acquired, and the farthest distance moved by the vehicle in each day is calculated from that day's XDR record stream. Assuming that the farthest distance on the first day is 10 km and the farthest distances on the next four days are 20 km, 50 km, 60 km and 40 km in order, the mean of the five farthest distances, (10 + 20 + 50 + 60 + 40) / 5 = 36 km, is taken as the overall distance mean, and the mean of the farthest distances of the last three days, (50 + 60 + 40) / 3 = 50 km, is taken as the partial distance mean. The different distance division ratio is calculated as follows:
The farthest distance moved by the vehicle each day is graded according to preset distance grades. For example, a daily farthest distance of not more than 5 km is graded as short distance, more than 5 km and not more than 20 km as medium distance, and more than 20 km as long distance. For this vehicle, the farthest distances of the first and second days are graded as medium distance and those of the other three days as long distance, so the different distance division ratio of the vehicle is (0, 2/5, 3/5).
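The worked example above can be reproduced with a short sketch. The function names, the default thresholds (taken from the example grades) and the choice of the last three days for the partial mean are illustrative assumptions, not a prescribed implementation.

```python
def grade(value, bounds=(5, 20)):
    """Return 0 (short), 1 (medium) or 2 (long) for one daily value."""
    low, high = bounds
    if value <= low:
        return 0
    if value <= high:
        return 1
    return 2


def distance_features(daily_km, partial_days=3, bounds=(5, 20)):
    """Compute (division ratio, overall mean, partial mean) from daily farthest distances."""
    overall_mean = sum(daily_km) / len(daily_km)
    tail = daily_km[-partial_days:]          # assumed: the most recent days
    partial_mean = sum(tail) / len(tail)
    counts = [0, 0, 0]
    for d in daily_km:
        counts[grade(d, bounds)] += 1
    ratio = tuple(c / len(daily_km) for c in counts)
    return ratio, overall_mean, partial_mean


ratio, overall, partial = distance_features([10, 20, 50, 60, 40])
print(ratio, overall, partial)  # (0.0, 0.4, 0.6) 36.0 50.0
```

The same grading scheme applies unchanged to the daily cell counts and daily maximum rates if their own thresholds are passed as `bounds`.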
The calculation of the farthest distance that the vehicle moves during a day is explained here. First, the XDR record stream of the vehicle for the day, i.e. a plurality of XDR records, is obtained. Since each XDR record contains at least time information and information on the cell where the vehicle is located, a location point of the vehicle can be determined from the cell information, and a plurality of location points can be obtained from the plurality of XDR records. Then, all cells through which the vehicle passes are determined, and the centroid of all these cells is determined. Subsequently, the location point A farthest from the centroid is determined, and then the location point B farthest from A is determined; the distance between A and B is the farthest distance that the vehicle moved in the day.
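The centroid-based procedure described above can be sketched as follows, assuming that each cell has already been mapped to planar coordinates in kilometres; that mapping, and the function name, are assumptions made for illustration.

```python
import math


def farthest_distance(points):
    """Approximate the farthest distance moved in one day.

    `points` is a list of (x_km, y_km) cell positions taken from the day's
    XDR records (hypothetical planar coordinates).
    """
    cells = list(dict.fromkeys(points))            # distinct cells passed
    cx = sum(x for x, _ in cells) / len(cells)     # centroid of all cells
    cy = sum(y for _, y in cells) / len(cells)
    a = max(cells, key=lambda p: math.dist(p, (cx, cy)))  # point A: farthest from centroid
    b = max(cells, key=lambda p: math.dist(p, a))          # point B: farthest from A
    return math.dist(a, b)


print(farthest_distance([(0, 0), (3, 0), (3, 4), (0, 0)]))
```

Note that this two-step search is a heuristic: it avoids comparing every pair of points, but it is not guaranteed to find the true largest pairwise distance in every configuration.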
The calculation of three characteristics, i.e., the ratio of the number of passed cells, the average of the total number of cells, and the average of the number of partial cells in the specific time period cell number class characteristic, is exemplified as follows:
In this example, XDR record streams of a vehicle over 5 days are acquired, and the number of cells through which the vehicle passes in each day is calculated from that day's XDR record stream. Assuming that the number of cells passed on the first day is 10 and the numbers on the next four days are 20, 50, 60 and 40 in order, the mean of the five cell numbers, (10 + 20 + 50 + 60 + 40) / 5 = 36, is taken as the overall cell number mean, and the mean of the cell numbers of the last three days, (50 + 60 + 40) / 3 = 50, is taken as the partial cell number mean. The ratio of the passed cell numbers is calculated as follows:
The number of cells passed by the vehicle each day is graded according to preset cell number grades. For example, a daily cell number of not more than 5 is graded as low, more than 5 and not more than 20 as medium, and more than 20 as high. For this vehicle, the cell numbers of the first and second days are graded as medium and those of the other three days as high, so the ratio of the passed cell numbers of the vehicle is (0, 2/5, 3/5).
The following illustrates the calculation of three characteristics, i.e., different rate ratios, overall rate mean, and partial rate mean, in the rate class characteristics:
In this example, XDR record streams of a vehicle over 5 days are acquired, and the maximum rate of the vehicle in each day is calculated from that day's XDR record stream. Assuming that the maximum rate on the first day is 10 km/h and the maximum rates on the next four days are 20 km/h, 50 km/h, 60 km/h and 40 km/h in order, the mean of the five maximum rates, (10 + 20 + 50 + 60 + 40) / 5 = 36 km/h, is taken as the overall rate mean, and the mean of the maximum rates of the last three days, (50 + 60 + 40) / 3 = 50 km/h, is taken as the partial rate mean. The different rate ratio is calculated as follows:
The maximum rate of the vehicle each day is graded as low rate, medium rate or high rate according to preset rate grades. For this vehicle, the rates of the first and second days are graded as medium rate and those of the other three days as high rate, so the different rate ratio of the vehicle is (0, 2/5, 3/5).
On the basis of the above embodiments, inputting the feature data of the XDR record stream of the device to be identified into the trained neural network model includes:
filtering and cleaning an XDR record stream of equipment to be identified to generate a target XDR record stream;
inputting the characteristic data of the target XDR record flow into the trained neural network model.
According to the method provided by the embodiment of the invention, the XDR record flow of the equipment to be identified is filtered and cleaned to remove the useless records, so that the computing resources are saved, and the processing efficiency is improved.
On the basis of the above embodiments, filtering and cleaning the XDR record stream of the device to be identified to generate a target XDR record stream, including:
for each XDR record in the XDR record stream of the equipment to be identified, if the APN field in the XDR record is judged to be known not to meet the first preset condition and/or not to contain the IMSI field, deleting the XDR record from the XDR record stream of the equipment to be identified, and taking the rest XDR records as target XDR record streams.
It should be noted that an available XDR record includes at least an APN field and an International Mobile Subscriber Identity (IMSI) field. The APN field identifies whether the device corresponding to the XDR record is an Internet of Things device or a personal network terminal. An APN field that does not satisfy the first preset condition therefore identifies a personal network terminal, and in that case the XDR record is determined to be unavailable and needs to be deleted.
Likewise, an available XDR record must include an IMSI field; if the XDR record does not include an IMSI field, it is determined to be unavailable and needs to be deleted.
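The filtering rule above can be sketched as follows. The record layout (dictionaries with `apn` and `imsi` keys), the set `PERSONAL_APNS` and the helper `meets_first_preset_condition` are assumptions made for illustration; the text does not define the concrete form of the first preset condition.

```python
# Assumed APNs of personal network terminals; illustrative values only.
PERSONAL_APNS = {"cmnet", "cmwap"}


def meets_first_preset_condition(apn):
    """Hypothetical stand-in: the APN is present and is not a personal APN."""
    return bool(apn) and apn.lower() not in PERSONAL_APNS


def filter_xdr_stream(records):
    """Keep only records that carry an IMSI and an IoT-type APN."""
    return [
        r for r in records
        if r.get("imsi") and meets_first_preset_condition(r.get("apn"))
    ]


# Made-up records; the IMSI values are placeholders.
stream = [
    {"imsi": "460000000000001", "apn": "onstar"},  # kept
    {"imsi": "460000000000002", "apn": "cmnet"},   # personal APN, dropped
    {"apn": "onstar"},                             # no IMSI, dropped
    {"imsi": "460000000000004", "apn": None},      # no APN, dropped
]
target_stream = filter_xdr_stream(stream)
```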
As an alternative embodiment, the embodiment of the present invention explains the construction and training of the neural network model:
(1) XDR record stream filtering and cleaning
The XDR record stream is obtained, and the XDR record stream is filtered and cleaned, and the filtering and cleaning processes are described in detail in the above embodiments, and are not described herein again.
(2) Anomaly detection and normalization
The characteristic data of the filtered and cleaned XDR record stream is extracted, anomaly detection is performed on it, abnormal characteristic data is deleted, and normal characteristic data is standardized, wherein:
Anomaly detection: the characteristic data is checked to verify that it matches the original intent of the index design and its business meaning. For example, if the moving distance in one day exceeds 400 km, the characteristic data is judged to be abnormal and is deleted.
Data normalization: part of the data is standardized according to modeling requirements, so that different indexes can be operated on mathematically in the same dimension. For example, the value v of characteristic data A is normalized based on the mean and standard deviation of A, and the normalized value v' is calculated by the following formula:
$$v' = \frac{v - \bar{A}}{\sigma_A}$$
where $\bar{A}$ and $\sigma_A$ are the mean and standard deviation of A, respectively.
z-score normalization (zero-mean normalization) standardizes data based on the mean and standard deviation of the raw data. The original value x of A is normalized to x' using the z-score. When x is a matrix, the z-score result is still a matrix, and the mean and standard deviation used in the calculation are those of each column.
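A minimal sketch of column-wise z-score normalization is given below. It assumes the population standard deviation (`statistics.pstdev`), which the text does not specify, and it maps a constant column to zeros to avoid division by zero; both choices are assumptions.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Normalize each column of a row-major matrix to zero mean, unit std."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]  # assumption: population std
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two feature columns on very different scales end up on the same scale.
scaled = zscore_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

After normalization, both columns have mean 0 and standard deviation 1, so indexes measured in different units can be compared directly.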
In a specific implementation, when a scatter plot of the original data is drawn, a few outliers appear. These outliers may be caused by the abnormal usage behavior of a very small number of terminal users. After confirmation, such data is found to come from terminals not used for Internet of Things services, and continuous observation over several days shows that the same IMSIs keep producing the outliers. Although regularization can reduce the influence of such outliers on the classification model, they still reduce the accuracy of the model. To solve this problem, records in which the one-day moving distance exceeds 400 km are deleted.
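The outlier rule can be sketched as a simple filter. The 400 km limit comes from the text; the field name `daily_distance_km` and the record layout are assumptions for illustration.

```python
MAX_DAILY_DISTANCE_KM = 400.0  # limit stated in the text


def drop_distance_outliers(samples, limit_km=MAX_DAILY_DISTANCE_KM):
    """Delete samples whose one-day moving distance exceeds the limit."""
    return [s for s in samples if s["daily_distance_km"] <= limit_km]


# Made-up samples; only the second one exceeds 400 km and is removed.
samples = [
    {"imsi": "A", "daily_distance_km": 35.2},
    {"imsi": "B", "daily_distance_km": 912.0},
    {"imsi": "C", "daily_distance_km": 400.0},
]
cleaned = drop_distance_outliers(samples)
```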
(3) Modeling sample set construction
After anomaly detection and standardization have been performed on the characteristic data of the XDR records, a sample set is constructed for the neural network model. For example, the constructed data samples include positive samples (60%) and negative samples (40%).
Positive samples are typical application connection samples that clearly belong to the target industry. Negative samples are application connection samples whose industry attribution is determined to differ from the target industry. For example, for a data set whose target is the Internet of Vehicles, connections of Anji star (APN = onstar) are positive samples, and connections of a spread sheet class (APN = sjrqyd) are taken as negative samples.
(4) Training set and validation set construction
The whole modeling sample set is randomly divided into a training set and a testing set according to the proportion of 70% and 30%.
Existing Internet of Vehicles services still have only a small number of active cards, which is almost negligible relative to the tens of millions of personal network users. Because the traffic is small, Internet of Vehicles samples are scarce in the original data and cannot meet the requirements of modeling. In data modeling, an excessive imbalance between positive and negative samples can cause the model to overfit: the accuracy measured on the sample data is high but does not represent the true accuracy of the model, and an overfitted model has a low detection rate on new data.
Therefore, embodiments of the present invention accumulate sample data over a number of consecutive days as model data, and keep the amounts of positive and negative sample data as balanced as possible. The data samples include positive samples (60%) and negative samples (40%).
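One way to hold the sample set near the stated 60%/40% split is to downsample whichever class is over-represented. The sketch below is an assumption about how this could be done; the text does not say which balancing technique was used, and the function name and seed are hypothetical.

```python
import random


def balance_samples(positives, negatives, positive_share=0.6, seed=0):
    """Downsample one class so positives make up about positive_share."""
    rng = random.Random(seed)  # fixed seed for a repeatable selection
    want_neg = round(len(positives) * (1 - positive_share) / positive_share)
    if want_neg <= len(negatives):
        # Enough negatives: keep all positives, sample the negatives.
        return list(positives), rng.sample(negatives, want_neg)
    # Too few negatives: keep all negatives, sample the positives instead.
    want_pos = round(len(negatives) * positive_share / (1 - positive_share))
    want_pos = min(want_pos, len(positives))
    return rng.sample(positives, want_pos), list(negatives)


# 60 positives and 100 negatives become 60 positives and 40 negatives.
kept_pos, kept_neg = balance_samples(list(range(60)), list(range(100, 200)))
```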
After the model is built, its classification capability is verified using known samples of a similar Internet of Vehicles connection (GL automobile). The recognition accuracy of the model for GL automobile samples reaches 78.6%, and its misjudgment rate for samples of known non-Internet-of-Vehicles applications (such as shdky) is only 1.4%. The model therefore provides a good recognition rate and recognition accuracy.
(5) Model building
The neural network classification model is constructed based on the TensorFlow deep learning system.
After the model is constructed, model parameter optimization is carried out.
a) In deep learning, the conduction of neurons must be expressed mathematically, and the activation function passes neuron signals from one layer to the next. The activation function is usually a nonlinear function (commonly the Sigmoid function or the ReLU function), which allows the neural network to approximate arbitrary nonlinear functions and therefore to be applied to many nonlinear models.
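The role of the activation function can be illustrated with a small pure-Python sketch. The layer sizes, weights and the helper `dense` are made up for illustration and are not taken from the described model.

```python
import math


def sigmoid(x):
    """Squash a real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Pass positive values through and clamp negative values to zero."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) per output neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Toy forward pass: a ReLU hidden layer feeding a sigmoid output neuron.
hidden = dense([1.0, -2.0], [[0.5, 0.5], [1.0, 0.0]], [0.0, 0.0], relu)
output = dense(hidden, [[1.0, 1.0]], [0.0], sigmoid)
```

Without the nonlinear `activation` step, stacked layers would collapse into a single linear map, which is why a nonlinear function is used between layers.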
The modeling process is as follows:
1. Setting the target variable and defining training and testing data
The target variable of the modeling is Internet of Vehicles (Anji star), which is set to 1; the remaining APNs are set to 0;
Dividing a training set and a testing set: the data is randomly divided into a training set and a testing set in a 70%:30% ratio. To ensure that the training set and the testing set are the same on every run, a random_state may be set.
2. Defining a network (or model) of layer composition, mapping inputs to targets
Establishing an input layer, a hidden layer and an output layer: this time, two hidden layers are set.
3. Model training: selecting a loss function, an optimizer and an index to be monitored
Defining the loss function (loss): cross_entropy (cross entropy), which is commonly used in deep learning;
Optimizer: the adam optimizer is used for training;
Monitoring metrics (metrics): the accuracy of the model (accuracy);
4. Iterating over the training data by calling the fit method of the model
Training and iteration are performed using the model's fit method.
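Step 1 of the modeling process above (labeling the target variable and making a repeatable 70%:30% split) can be sketched with the standard library alone. The sample layout, the function names and the default seed are assumptions; the network definition, loss, optimizer and fit call of steps 2 to 4 depend on the deep learning framework and are not shown.

```python
import random


def label_samples(samples, target_apn="onstar"):
    """Pair each feature vector with 1 for the target APN, else 0."""
    return [(s["features"], 1 if s["apn"] == target_apn else 0)
            for s in samples]


def train_test_split(items, train_share=0.7, seed=42):
    """Shuffle with a fixed seed and cut into training and testing sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed plays the role of random_state
    cut = int(round(len(shuffled) * train_share))
    return shuffled[:cut], shuffled[cut:]


# Made-up samples: even indexes use the target APN, odd ones do not.
raw = [{"features": [float(i)], "apn": "onstar" if i % 2 == 0 else "sjrqyd"}
       for i in range(10)]
labeled = label_samples(raw)
train_set, test_set = train_test_split(labeled)
```

Because the seed is fixed, repeated runs produce the same training and testing sets, which keeps results comparable between tuning experiments.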
b) Model tuning
1. Adjusting optimization characteristics and dimensions
When the model precision does not meet the requirement, the cause is often that the feature (index) selection is not good enough. Features can be optimized by adding, deleting and converting them.
2. Adjustment of algorithm parameters
Model parameters need to be adjusted one at a time so that the effect of each adjustment on the model can be observed; the model can be evaluated by its Recall, Precision and F1 Score.
The parameter adjustment of the algorithm is mainly tested on three parameters: the number of neurons, the number of epochs (training periods) and the initialization method. Two parameter combinations were selected:
Parameter combination one: 500 neurons, 20 epochs, uniform initialization;
Parameter combination two: 500 neurons, 30 epochs, uniform initialization;
By evaluating on the data of the verification set, the evaluation indexes of the model are obtained and the optimal parameters of the model are determined. The optimal parameters determined by experiment are: 500 neurons, 30 epochs, and uniform initialization.
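The evaluation and selection step can be sketched as follows. The two combinations are those listed above; the callable `train_and_predict`, which would train a model with a given combination and return predictions for the verification set, is hypothetical, and choosing by F1 Score alone is an assumption.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# The two parameter combinations named in the text.
COMBINATIONS = [
    {"neurons": 500, "epochs": 20, "init": "uniform"},
    {"neurons": 500, "epochs": 30, "init": "uniform"},
]


def pick_best(combinations, train_and_predict, y_true):
    """Return the combination whose predictions give the highest F1."""
    def f1_of(combo):
        return precision_recall_f1(y_true, train_and_predict(combo))[2]
    return max(combinations, key=f1_of)
```

In use, `train_and_predict` would wrap model construction and training, so `pick_best(COMBINATIONS, train_and_predict, y_val)` returns the better of the two combinations on the verification labels `y_val`.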
Fig. 2 is a schematic structural diagram of an industry identification system of an Internet of Things device according to an embodiment of the present invention. As shown in fig. 2, the system includes:
an input module 201, configured to input feature data of an XDR record flow of a device to be identified to a trained neural network model, and output an industry class label corresponding to the XDR record flow of the device to be identified, where the neural network model is obtained by training feature data of a sample XDR record flow based on a sample device and a predetermined industry class label of the sample XDR record flow; and the output module 202 is configured to obtain the industry category of the device to be identified according to the industry category label output by the neural network model.
Specifically, the system includes an input module 201 and an output module 202. The input module 201 inputs the characteristic data of the XDR recording stream of the device to be identified to the trained neural network model, outputs an industry category label corresponding to the XDR recording stream, and the output module 202 acquires the industry category of the device to be identified according to the industry category label output by the neural network model.
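The two-module structure can be sketched as follows. The class and function names, the label-to-industry mapping and the stand-in model are all assumptions for illustration; any trained classifier that maps a feature vector to a label could take the place of `stub_model`.

```python
class InputModule:
    """Feeds feature data to a trained model and returns its class label."""

    def __init__(self, model):
        self.model = model  # any callable: feature vector -> label

    def run(self, features):
        return self.model(features)


class OutputModule:
    """Translates a class label into an industry category."""

    def __init__(self, label_to_industry):
        self.label_to_industry = label_to_industry

    def run(self, label):
        return self.label_to_industry.get(label, "unknown")


def identify_industry(features, input_module, output_module):
    """Chain the two modules: features -> label -> industry category."""
    return output_module.run(input_module.run(features))


# Stand-in for a trained model; a real system would load the trained network.
stub_model = lambda features: 1 if features[0] > 0.5 else 0
industry = identify_industry(
    [0.9],
    InputModule(stub_model),
    OutputModule({1: "Internet of Vehicles", 0: "other"}),
)
```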
The system provided in the embodiment of the present invention specifically executes the flows of the above-mentioned methods; for details, refer to the contents of the above-mentioned methods, which are not described herein again. According to the system provided by the embodiment of the invention, the industry category of the equipment to be identified is obtained by inputting the characteristic data of the XDR record stream of the equipment to be identified into the trained neural network model. This solves the problem in the prior art that, in some cases, an operator cannot identify the industry to which a device belongs from the APN of the device or the card number of the Internet of Things card, and also the problem that, even when the industry can be identified in that way, information security risks are easily caused.
Fig. 3 is a schematic entity structure diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 3, the electronic device may include: a processor (processor)301, a communication Interface (communication Interface)302, a memory (memory)303 and a communication bus 304, wherein the processor 301, the communication Interface 302 and the memory 303 complete communication with each other through the communication bus 304. The processor 301 may invoke a computer program stored on the memory 303 and executable on the processor 301 to perform the methods provided by the various embodiments described above, including, for example: inputting the characteristic data of the XDR record flow of the equipment to be identified into a trained neural network model, and outputting an industry class label corresponding to the XDR record flow of the equipment to be identified, wherein the neural network model is obtained by training the characteristic data of the sample XDR record flow based on the sample equipment and the industry class label of the predetermined sample XDR record flow; and acquiring the industry category of the equipment to be identified according to the industry category label output by the neural network model.
In addition, the logic instructions in the memory 303 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or make a contribution to the prior art, or may be implemented in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the method provided in the foregoing embodiments, and the method includes: inputting the characteristic data of the XDR record flow of the equipment to be identified into a trained neural network model, and outputting an industry class label corresponding to the XDR record flow of the equipment to be identified, wherein the neural network model is obtained by training the characteristic data of the sample XDR record flow based on the sample equipment and the industry class label of the predetermined sample XDR record flow; and acquiring the industry category of the equipment to be identified according to the industry category label output by the neural network model.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An industry identification method of Internet of things equipment is characterized by comprising the following steps:
inputting the characteristic data of the XDR record flow of the equipment to be identified into a trained neural network model, and outputting an industry class label corresponding to the XDR record flow of the equipment to be identified, wherein the neural network model is obtained by training the characteristic data of the sample XDR record flow based on the sample equipment and the industry class label of the predetermined sample XDR record flow;
and acquiring the industry category of the equipment to be identified according to the industry category label output by the neural network model.
2. The method according to claim 1, wherein before inputting the characteristic data of the XDR record flow of the device to be identified into the trained neural network model and outputting the industry class label corresponding to the XDR record flow, the method further comprises:
and acquiring the industry category of the sample equipment, and setting a corresponding label for the XDR record flow of the sample equipment.
3. The method according to claim 1, wherein the XDR record flow of the device to be identified and the sample XDR record flow of the sample device are both acquired from each interface of the network based on a deep packet inspection technology.
4. The method of claim 3, wherein each interface of the network comprises any one or more of an S1-MME interface, an S1U interface, a Gb interface, and a Gn interface.
5. The method of claim 1, wherein the characterization data comprises: any one or more of different rate ratios, overall rate mean, partial rate mean, different distance division ratio, overall distance mean, partial distance mean, number of passed cells, overall number of cells mean, partial number of cells mean.
6. The method of claim 1, wherein inputting feature data of an XDR record stream of a device to be identified to the trained neural network model comprises:
filtering and cleaning an XDR record stream of equipment to be identified to generate a target XDR record stream;
inputting the characteristic data of the target XDR record flow into the trained neural network model.
7. The method of claim 6, wherein filtering and cleaning the XDR record stream of the device to be identified to generate a target XDR record stream, comprises:
for each XDR record in the XDR record stream of the equipment to be identified, if it is determined that the APN field in the XDR record does not meet the first preset condition and/or that the XDR record does not contain an IMSI field, deleting the XDR record from the XDR record stream of the equipment to be identified, and taking the remaining XDR records as the target XDR record stream.
8. An industry identification system of internet of things equipment, comprising:
an input module, configured to input the characteristic data of the XDR record flow of the equipment to be identified into a trained neural network model and to output an industry class label corresponding to the XDR record flow of the equipment to be identified, wherein the neural network model is obtained by training the characteristic data of the sample XDR record flow based on sample equipment and a predetermined industry class label of the sample XDR record flow;
and the output module is used for acquiring the industry category of the equipment to be identified according to the industry category label output by the neural network model.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the processor executes the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN201811466893.3A 2018-12-03 2018-12-03 Industry identification method and system of Internet of things equipment Pending CN111275453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811466893.3A CN111275453A (en) 2018-12-03 2018-12-03 Industry identification method and system of Internet of things equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811466893.3A CN111275453A (en) 2018-12-03 2018-12-03 Industry identification method and system of Internet of things equipment

Publications (1)

Publication Number Publication Date
CN111275453A true CN111275453A (en) 2020-06-12

Family

ID=70999924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811466893.3A Pending CN111275453A (en) 2018-12-03 2018-12-03 Industry identification method and system of Internet of things equipment

Country Status (1)

Country Link
CN (1) CN111275453A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101645806A (en) * 2009-09-04 2010-02-10 东南大学 Network flow classifying system and network flow classifying method combining DPI and DFI
CN105657001A (en) * 2015-12-28 2016-06-08 中国联合网络通信集团有限公司 Method and device for analyzing communication big data
CN108076475A (en) * 2016-11-17 2018-05-25 ***通信有限公司研究院 A kind of data processing method and server
CN108322354A (en) * 2017-01-18 2018-07-24 ***通信集团河南有限公司 One kind is escaped the recognition methods of flow account and device
CN107862468A (en) * 2017-11-23 2018-03-30 深圳市智物联网络有限公司 The method and device that equipment Risk identification model is established
CN108173781A (en) * 2017-12-20 2018-06-15 广东宜通世纪科技股份有限公司 HTTPS method for recognizing flux, device, terminal device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SAKTHI VIGNESH RADHAKRISHNAN et al.: "GTID: A Technique for Physical Device and Device Type Fingerprinting", Transactions on Dependable and Secure Computing *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114422619A (en) * 2020-10-12 2022-04-29 ***通信集团广东有限公司 Service identification method, device, equipment and storage medium
CN114422619B (en) * 2020-10-12 2023-11-10 ***通信集团广东有限公司 Service identification method, device, equipment and storage medium
CN112235326A (en) * 2020-12-15 2021-01-15 长沙树根互联技术有限公司 Internet of things equipment data analysis method and device and electronic equipment
CN112235326B (en) * 2020-12-15 2021-03-16 长沙树根互联技术有限公司 Internet of things equipment data analysis method and device and electronic equipment
CN113079052A (en) * 2021-04-29 2021-07-06 恒安嘉新(北京)科技股份公司 Model training method, device, equipment and storage medium, and method and device for identifying data of Internet of things
CN113079052B (en) * 2021-04-29 2023-04-07 恒安嘉新(北京)科技股份公司 Model training method, device, equipment and storage medium, and method and device for identifying data of Internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200612)