CN114301629A - IP detection method, device, terminal equipment and storage medium - Google Patents


Info

Publication number
CN114301629A
Authority
CN
China
Prior art keywords
data
feature vector
original
vector
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111437475.3A
Other languages
Chinese (zh)
Inventor
丰竹勃
安韬
王智民
王高杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing 6Cloud Technology Co Ltd
Beijing 6Cloud Information Technology Co Ltd
Original Assignee
Beijing 6Cloud Technology Co Ltd
Beijing 6Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing 6Cloud Technology Co Ltd and Beijing 6Cloud Information Technology Co Ltd
Priority to CN202111437475.3A
Publication of CN114301629A
Legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an IP detection method, an IP detection device, terminal equipment and a storage medium. The method comprises: acquiring IP data; performing feature extraction on the IP data to obtain an original feature vector of the IP data; inputting the original feature vector into a preset model, and encoding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector; and obtaining a rarity value of the IP data based on the reconstructed feature vector and the original feature vector. The invention removes the dependence on label data when detecting IP data.

Description

IP detection method, device, terminal equipment and storage medium
Technical Field
The present invention relates to the field of network security, and in particular, to an IP detection method, apparatus, terminal device, and storage medium.
Background
With the rapid popularization and application of the internet, a plurality of network security problems are exposed. The main problem faced by network security detection is how to better analyze different customer groups and the different types of attacks they are subjected to. Network traffic is an important carrier for recording network activities, and network activities and user use conditions can be mastered by analyzing the traffic. Since normal traffic is usually repeated in a large amount, analysis of IP (Internet Protocol) addresses can be an important feature for analyzing network user behavior, and rare IP addresses are very likely to be risky.
In the process of detecting the IP address, characteristics need to be extracted from a large amount of IP data which change in real time, a model needs to be trained rapidly in real time, and the characteristics of the IP frequency are analyzed.
Therefore, there is a need for a solution to the problem of relying on label data when detecting IP data.
Disclosure of Invention
The invention mainly aims to provide an IP detection method, an IP detection device, terminal equipment and a storage medium, and aims to solve the problem of dependence on label data when IP data is detected.
In order to achieve the above object, the present invention provides an IP detection method, including:
acquiring IP data;
extracting the characteristics of the IP data to obtain an original characteristic vector of the IP data;
inputting the original feature vector into a preset model, and coding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector;
and obtaining a rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector.
Optionally, the step of performing feature extraction on the IP data to obtain an original feature vector of the IP data includes:
detecting the number of digits in the IP data and finding the IP data with insufficient digits;
zero-padding the IP data with insufficient digits until all IP data are character strings of a preset length composed of preset characters;
dividing the character strings into words and constructing a vocabulary;
and performing embedded vectorization training on the vocabulary to obtain the original feature vector of the IP data.
Optionally, the step of inputting the original feature vector into a preset model, and encoding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector further includes:
constructing a learning model based on a learning framework, wherein the learning model comprises an encoder and a decoder;
and training the learning model to obtain the preset model.
Optionally, the step of training the learning model to obtain the preset model includes:
randomly selecting a plurality of sample data from the IP data;
extracting the features of the sample data to obtain a feature vector of the sample data;
and inputting the characteristic vector of the sample data into the learning model to perform model training to obtain the preset model.
Optionally, the step of inputting the original feature vector into a preset model, and encoding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector includes:
inputting the original feature vector into the preset model;
performing dimensionality reduction on the original feature vector through the encoder to obtain a low-dimensional feature vector;
and reconstructing the low-dimensional feature vector through the decoder to obtain the reconstructed feature vector.
Optionally, the step of deriving the rareness value of the IP data based on the reconstructed feature vector and the original feature vector includes:
calculating the vector distance between the reconstructed feature vector and the original feature vector;
calculating the probability distribution of the vector distances corresponding to the IP data within a preset time;
and obtaining the rarity value of the IP data based on the probability distribution of the vector distances.
Optionally, the step of obtaining the rareness value of the IP data based on the reconstructed feature vector and the original feature vector further includes:
and storing the rarity value of the IP data in a database, and writing a corresponding interface for querying the rarity value.
In addition, to achieve the above object, the present invention further provides an IP detection apparatus, including:
the acquisition module is used for acquiring IP data;
the characteristic extraction module is used for extracting the characteristics of the IP data to obtain an original characteristic vector of the IP data;
the reconstruction module is used for inputting the original characteristic vector into a preset model, and coding and decoding the original characteristic vector through the preset model to obtain a reconstructed characteristic vector;
and the calculation module is used for obtaining the rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector.
In addition, in order to achieve the above object, the present invention further provides a terminal device, where the terminal device includes a memory, a processor, and an IP detection program stored in the memory and executable on the processor, and the IP detection program implements the steps of the IP detection method as described above when executed by the processor.
Further, to achieve the above object, the present invention also provides a computer readable storage medium having an IP detection program stored thereon, which when executed by a processor, implements the steps of the IP detection method as described above.
The embodiments of the invention provide an IP detection method, an IP detection device, terminal equipment and a storage medium, in which IP data are acquired; feature extraction is performed on the IP data to obtain an original feature vector of the IP data; the original feature vector is input into a preset model and is encoded and decoded by the preset model to obtain a reconstructed feature vector; and a rarity value of the IP data is obtained based on the reconstructed feature vector and the original feature vector. Feature extraction effectively preserves the distinctive features of the IP address, the preset model yields the reconstructed feature vector, and the rarity value is obtained from the two vectors without relying on labels in the IP data. This removes the dependence on label data when detecting IP data and provides a basis for judging network security.
Drawings
Fig. 1 is a functional block diagram of a terminal device to which an IP detection apparatus of the present invention belongs;
Fig. 2 is a flowchart of an exemplary embodiment of an IP detection method according to the present invention;
Fig. 3 is a flowchart of another exemplary embodiment of an IP detection method according to the present invention;
Fig. 4 is a detailed flowchart of inputting the original feature vector into a preset model and encoding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector according to an embodiment of the present invention;
Fig. 5 is a flowchart of the preset model encoding and decoding an original feature vector of IP data to obtain a reconstructed feature vector according to an embodiment of the present invention;
Fig. 6 is a flowchart of obtaining an IP rarity value by detecting IP data according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: IP data are acquired; feature extraction is performed on the IP data to obtain an original feature vector of the IP data; the original feature vector is input into a preset model and is encoded and decoded by the preset model to obtain a reconstructed feature vector; and a rarity value of the IP data is obtained based on the reconstructed feature vector and the original feature vector. Feature extraction effectively preserves the distinctive features of the IP address, the preset model yields the reconstructed feature vector, and the rarity value is obtained from the two vectors without relying on labels in the IP data. This removes the dependence on label data when detecting IP data and provides a basis for judging network security.
The technical terms related to the embodiment of the invention are as follows:
IP (Internet Protocol): provides information of various protocols to the transport layer; the network where a device is located can be identified from its IP address;
k-means (k-means clustering algorithm): an iteratively solved cluster-analysis algorithm;
IPv4 (Internet Protocol version 4): the fourth revision in the development of the Internet Protocol and the first widely deployed version; it is the core of the internet and the most widely used version of the protocol, and its successor is IPv6;
IPv6 (Internet Protocol version 6): the next-generation IP protocol designed by the Internet Engineering Task Force to replace IPv4;
N-Gram: a language model commonly used in large-vocabulary continuous speech recognition;
TensorFlow: a symbolic math system based on dataflow programming, widely used to implement machine learning algorithms;
Dropout layer (anti-overfitting layer): a structure in neural networks that prevents model overfitting;
MSE (mean-square error): a measure of the difference between an estimator and the quantity being estimated;
Embedding (embedded vectorization training): maps the original high-dimensional data to a low-dimensional space to ease learning by subsequent models; a feature-processing method commonly used in deep learning.
In traditional analysis methods, the k-means clustering algorithm requires preset parameters, and density-based clustering algorithms apply only to countable data: they treat a data object as a point in an n-dimensional space with density correlation and ignore the associations among the numbers. An IP address, however, cannot be treated as an ordinary unrelated number; its data features and frequency of occurrence can reflect the risk it carries. It is therefore necessary to extract features from IPs and to train a model quickly in real time in order to analyze the features of IP frequency.
In the prior art, abnormal IP addresses are detected mainly by training and identification with a random forest model. Such supervised training requires labeled sample data, usually obtained by reading information from logs as labels, so the method depends heavily on the log results and needs a large number of samples.
The IP detection method provided by the invention trains the model without supervision and needs no label data. The processed IPs are input into the model for training, and each training sample changes the model slightly. Because normal traffic is large in volume and repetitive, the model learns mostly normal traffic, while the small amount of abnormal traffic acts as noise. The key idea of the model is to learn the difference between input and output: through training on that difference and the perceptual loss in the autoencoder's training process, the autoencoder reconstructs the heavily repeated IP content information, which is used to detect whether subsequent data conform to those features and thereby to calculate how rare an IP is.
Referring to fig. 1, fig. 1 is a functional module schematic diagram of a terminal device to which an IP detection apparatus of the present invention belongs. The IP detection apparatus may be an apparatus capable of performing IP detection independent of the terminal device, and may be carried on the terminal device in the form of hardware or software. The terminal device can be an intelligent mobile terminal with a data processing function, such as a mobile phone and a tablet personal computer, and can also be a fixed terminal device or a server with a data processing function.
In this embodiment, the terminal device to which the IP detection apparatus belongs at least includes an output module 110, a processor 120, a memory 130, and a communication module 140.
The memory 130 stores an operating system and an IP detection program, and the IP detection apparatus may perform feature extraction on the acquired IP data to obtain an original feature vector of the IP data, input the original feature vector into a preset model, encode and decode the original feature vector through the preset model, and store information such as an obtained reconstructed feature vector, the preset model, and a rare degree value of the IP data obtained based on the reconstructed feature vector and the original feature vector in the memory 130; the output module 110 may be a display screen or the like. The communication module 140 may include a WIFI module, a mobile communication module, a bluetooth module, and the like, and communicates with an external device or a server through the communication module 140.
Wherein the IP detection program in the memory 130 when executed by the processor implements the steps of:
acquiring IP data;
extracting the characteristics of the IP data to obtain an original characteristic vector of the IP data;
inputting the original feature vector into a preset model, and coding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector;
and obtaining a rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector.
Further, the IP detection program in the memory 130 when executed by the processor further implements the steps of:
detecting the number of digits in the IP data and finding the IP data with insufficient digits;
zero-padding the IP data with insufficient digits until all IP data are character strings of a preset length composed of preset characters;
dividing the character strings into words and constructing a vocabulary;
and performing embedded vectorization training on the vocabulary to obtain the original feature vector of the IP data.
Further, the IP detection program in the memory 130 when executed by the processor further implements the steps of:
constructing a learning model based on a learning framework, wherein the learning model comprises an encoder and a decoder;
and training the learning model to obtain the preset model.
Further, the IP detection program in the memory 130 when executed by the processor further implements the steps of:
randomly selecting a plurality of sample data from the IP data;
extracting the features of the sample data to obtain a feature vector of the sample data;
and inputting the characteristic vector of the sample data into the learning model to perform model training to obtain the preset model.
Further, the IP detection program in the memory 130 when executed by the processor further implements the steps of:
inputting the original feature vector into the preset model;
performing dimensionality reduction on the original feature vector through the encoder to obtain a low-dimensional feature vector;
and reconstructing the low-dimensional feature vector through the decoder to obtain the reconstructed feature vector.
Further, the IP detection program in the memory 130 when executed by the processor further implements the steps of:
calculating the vector distance between the reconstructed feature vector and the original feature vector;
calculating the probability distribution of the vector distances corresponding to the IP data within a preset time;
and obtaining the rarity value of the IP data based on the probability distribution of the vector distances.
Further, the IP detection program in the memory 130 when executed by the processor further implements the steps of:
and storing the rarity value of the IP data in a database, and writing a corresponding interface for querying the rarity value.
In the above scheme, IP data are acquired; feature extraction is performed on the IP data to obtain an original feature vector of the IP data; the original feature vector is input into a preset model and is encoded and decoded by the preset model to obtain a reconstructed feature vector; and a rarity value of the IP data is obtained based on the reconstructed feature vector and the original feature vector. Feature extraction effectively preserves the distinctive features of the IP address, the preset model yields the reconstructed feature vector, and the rarity value is obtained from the two vectors without relying on labels in the IP data. This removes the dependence on label data when detecting IP data and provides a basis for judging network security.
Based on the above terminal device architecture but not limited to the above architecture, embodiments of the method of the present application are provided.
The execution subject of the method of this embodiment may be an IP detection device or a terminal device, and the IP detection device is used for example in this embodiment.
Referring to fig. 2, fig. 2 is a flowchart illustrating an IP detection method according to an exemplary embodiment of the present invention. The IP detection method comprises the following steps:
step S10, IP data is obtained;
The main problem currently faced by network security detection is how to better analyze different client groups and the different types of attacks they are subjected to. Network traffic is an important carrier for recording network activities; analyzing it reveals network activities and user usage, and detecting the IP data in network traffic provides a basis for judging network attacks. Network traffic can be captured through port mirroring on network equipment such as a switch, or through additional equipment such as an optical splitter or a network probe, which allows lossless copying and mirrored capture of the traffic. The IP data are then extracted from the captured traffic. Because the volume of IP data is huge and changes over time, the IP data can be obtained by capturing network traffic at different times.
Step S20, extracting the characteristics of the IP data to obtain the original characteristic vector of the IP data;
After the IP data are acquired, because their volume is large and their features are not obvious, feature extraction is performed to sort out effective features for subsequent use. The specific method of feature extraction is as follows:
detecting the number of digits in the IP data and finding the IP data with insufficient digits;
zero-padding the IP data with insufficient digits until all IP data are character strings of a preset length composed of preset characters;
an IP address is the basis of data transmission and has unique attributes: IPv4 uniformly uses 32-bit addresses written in dotted decimal notation, and IPv6 uses hexadecimal notation. The number of digits in each dot-separated or colon-separated group is therefore checked, and groups with too few digits are padded with leading zeros, so that each IP is a character string formed from the 18 characters 0123456789ABCDEF;
dividing the character strings into words and constructing a vocabulary;
and performing embedded vectorization training on the vocabulary to obtain the original feature vector of the IP data.
Further, after the character string of each IP is obtained, N-Gram word segmentation is performed on it: a window of length N slides over each character string at the character level to extract words, and the results are stored in an ordered list to build a vocabulary. Embedding (embedded vectorization training) is then computed on the vocabulary to obtain the original feature vector of the IP data.
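The zero-padding and N-Gram steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, the separators are assumed to stay in the string, IPv4 groups are assumed to be padded to three digits and IPv6 groups to four, and the embedding itself is left out (the integer ids are what an embedding layer would look up).

```python
import ipaddress


def normalize_ip(ip):
    """Zero-pad every group so addresses of one family share a fixed length."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # Dotted decimal: pad each octet to three digits (assumed width).
        return ".".join(f"{int(part):03d}" for part in str(addr).split("."))
    # IPv6: `exploded` expands "::" and pads each group to four hex digits.
    return addr.exploded.upper()


def char_ngrams(text, n):
    """Slide a window of length n over the string, one character at a time."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocab(strings, n):
    """Assign each distinct n-gram an integer id in order of first appearance."""
    vocab = {}
    for s in strings:
        for gram in char_ngrams(s, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def to_ids(text, vocab, n):
    """Map a normalized IP string to the ids an embedding layer would look up."""
    return [vocab[g] for g in char_ngrams(text, n) if g in vocab]
```

For example, `normalize_ip("10.0.0.1")` yields `010.000.000.001`, and a window of length 2 over `010.` yields the words `01`, `10` and `0.`.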
Step S30, inputting the original feature vector into a preset model, and coding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector;
After the original feature vector of the IP data is obtained, it is input into the preset model to obtain a reconstructed feature vector; the rarity value of the IP data can then be obtained from the vector distance between the reconstructed feature vector and the original feature vector.
In the embodiment of the invention, the TensorFlow framework is used to construct a deep autoencoder neural network composed of convolution layers, pooling layers, Dropout layers, deconvolution layers, upsampling layers, fully connected layers and the like; after initial training, the network serves as the preset model. The preset model can be roughly divided into an encoder and a decoder. The extracted original feature vector of the IP data is input into the encoder for encoding; during encoding the data dimension decreases layer by layer, and a Dropout layer is added on the fully connected layer to prevent overfitting. After encoding, the data enter the decoder. During decoding the data dimension increases layer by layer, and a Dropout layer is again added on the fully connected layer to prevent overfitting. Finally, the data are restored to the dimensionality of the original data, giving the reconstructed feature vector.
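The encoder and decoder data flow can be illustrated with a small dependency-free sketch. It is not the TensorFlow network described above: the convolution, pooling, deconvolution and upsampling layers are replaced by plain fully connected layers, and the class name, layer sizes and dropout rate are all assumptions chosen only to show the dimension going down, then back up, with dropout active during training only.

```python
import random


def dense(x, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, a) for a in values]


def dropout(values, rate, rng):
    """Inverted dropout: zero units at random, rescale the survivors."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in values]


def init_layer(n_in, n_out, rng):
    scale = (1.0 / n_in) ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class TinyAutoencoder:
    """Hypothetical dense stand-in: dim -> hidden -> code -> hidden -> dim."""

    def __init__(self, dim, hidden, code, seed=0):
        self.rng = random.Random(seed)
        sizes = [dim, hidden, code, hidden, dim]
        self.layers = [init_layer(a, b, self.rng)
                       for a, b in zip(sizes, sizes[1:])]

    def encode(self, x, training=False, rate=0.2):
        h = relu(dense(x, *self.layers[0]))
        if training:  # dropout is applied only while training
            h = dropout(h, rate, self.rng)
        return dense(h, *self.layers[1])

    def decode(self, z, training=False, rate=0.2):
        h = relu(dense(z, *self.layers[2]))
        if training:
            h = dropout(h, rate, self.rng)
        return dense(h, *self.layers[3])

    def reconstruct(self, x):
        """Encode to the low-dimensional code, then decode to full size."""
        return self.decode(self.encode(x))
```

With `TinyAutoencoder(dim=8, hidden=4, code=2)`, `encode` returns a 2-element low-dimensional vector and `reconstruct` returns a vector with the original 8 dimensions.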
And step S40, obtaining the rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector, and providing basis for judging network security according to the rarity degree value of the IP data.
The step of obtaining the rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector specifically includes:
calculating the vector distance between the reconstructed feature vector and the original feature vector;
calculating the probability distribution of the vector distances corresponding to the IP data within a preset time;
and obtaining the rarity value of the IP data based on the probability distribution of the vector distances.
The vector distance measures the similarity between feature vectors and can be computed in several ways, chiefly Euclidean distance, cosine distance and Hamming distance.
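The three distances named above can be written out as a short sketch. The function names are illustrative, and cosine distance is taken here as one minus cosine similarity, which is a common convention and an assumption about what the text intends.

```python
import math


def _check(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def euclidean_distance(a, b):
    """Square root of the summed squared differences."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors."""
    _check(a, b)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def hamming_distance(a, b):
    """Number of positions at which the two vectors differ."""
    _check(a, b)
    return sum(1 for x, y in zip(a, b) if x != y)
```

Euclidean and cosine distance suit real-valued reconstructed vectors; Hamming distance only makes sense once the vectors have been discretized.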
Further, the step of obtaining the rareness degree value of the IP data based on the reconstructed feature vector and the original feature vector further includes:
and storing the rarity value of the IP data in a database, and writing a corresponding interface for querying the rarity value.
Since normal traffic is usually large in volume and repetitive, each training sample causes a slight change in the learning model when it is input for training. After the model has learned a large amount of common IP data and a small amount of rare IP data, common IPs are fitted well, while rare IPs yield larger vector distances. The final result is obtained from the probability distribution; the IPs and their corresponding results are stored in a database, and a dedicated interface is written so that the rarity value of an IP can be queried.
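One way to turn the distances into a rarity value and expose it for queries is sketched below. The text does not specify how the probability distribution is used, so the empirical cumulative distribution (the fraction of distances in the time window that are no larger than the IP's own) is an assumption, as are the SQLite backend, the table name `ip_rarity` and the function names.

```python
import bisect
import sqlite3


def rarity_scores(distances):
    """Map each IP to the empirical CDF value of its reconstruction distance.

    `distances` is a dict of ip -> distance collected within one time window.
    A score near 1.0 means almost every other IP was reconstructed better.
    """
    ordered = sorted(distances.values())
    n = len(ordered)
    return {ip: bisect.bisect_right(ordered, d) / n
            for ip, d in distances.items()}


def store_scores(conn, scores):
    """Persist the scores, overwriting older values for the same IP."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ip_rarity (ip TEXT PRIMARY KEY, rarity REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO ip_rarity (ip, rarity) VALUES (?, ?)",
        list(scores.items()),
    )
    conn.commit()


def query_rarity(conn, ip):
    """Query interface: return the stored rarity value, or None if unknown."""
    row = conn.execute(
        "SELECT rarity FROM ip_rarity WHERE ip = ?", (ip,)
    ).fetchone()
    return None if row is None else row[0]
```

A caller would open a connection with `sqlite3.connect(...)`, call `store_scores` after each window, and serve lookups through `query_rarity`.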
In this embodiment, IP data are acquired; feature extraction is performed on the IP data to obtain an original feature vector of the IP data; the original feature vector is input into a preset model and is encoded and decoded by the preset model to obtain a reconstructed feature vector; and a rarity value of the IP data is obtained based on the reconstructed feature vector and the original feature vector. Feature extraction effectively preserves the distinctive features of the IP address, the preset model yields the reconstructed feature vector, and the rarity value is obtained from the two vectors without relying on labels in the IP data, so the problem of detecting unlabeled IP data can be solved.
Further, referring to fig. 3, fig. 3 is a flowchart illustrating an IP detection method according to another exemplary embodiment of the present invention. Based on the embodiment shown in fig. 2, in this embodiment, before the step S30 inputs the original feature vector into a preset model, and encodes and decodes the original feature vector through the preset model to obtain a reconstructed feature vector, the IP detection method further includes:
step S21: and constructing a learning model based on a learning framework, and training the learning model to obtain the preset model. In the present embodiment, step S21 is implemented between steps S20 and S30, and in other embodiments, step S21 may be implemented between step S10 and step S20.
Compared with the embodiment shown in fig. 2, the embodiment further includes a scheme of constructing a learning model based on a learning framework, and training the learning model to obtain the preset model.
Specifically, the step of constructing a learning model based on a learning framework and training the learning model to obtain the preset model may include:
constructing a learning model based on a learning framework, wherein the learning model comprises an encoder and a decoder;
training the learning model to obtain the preset model, and specifically comprising:
randomly selecting a plurality of sample data from the IP data;
extracting the features of the sample data to obtain a feature vector of the sample data;
and inputting the characteristic vector of the sample data into the learning model to perform model training to obtain the preset model.
Mainstream learning frameworks currently include TensorFlow, Keras, MXNet and PyTorch. In the embodiment of the invention, TensorFlow is used as the learning framework, and a deep autoencoder neural network m layers deep is constructed as the learning model, where m is a positive integer greater than or equal to 1. The learning model can be roughly divided into an encoder part and a decoder part. The encoder part mainly comprises convolution layers, pooling layers, fully connected layers and other structures and reduces the dimension of the data; the decoder part mainly comprises deconvolution layers, upsampling layers, fully connected layers and other structures and raises the dimension of the data. A Dropout layer is added between all fully connected layers to prevent model overfitting.
A plurality of sample data are randomly selected from the IP data acquired in step S10; 500 sample data are selected in the embodiment of the invention. Feature extraction is performed on the selected sample data: the number of digits in each dot-separated or colon-separated group is checked, and groups with too few digits are padded with leading zeros, so that each IP is a character string formed from the 18 characters [ 0123456789ABCDEF ]; N-Gram word segmentation is performed on the character strings, and a vocabulary is constructed on which the Embedding calculation is executed to obtain the feature vectors of the sample data. The feature vectors of the sample data are then input into the constructed learning model for training: the weights of the model are randomly initialized, and the whole model is trained with the Adam optimization algorithm so that the vector distance between the reconstructed feature vector output by the model and the feature vector input into the model is made as small as possible. After two rounds of training, the preset model is obtained and its parameters are saved.
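The training procedure (random sampling, random initialization, Adam, two rounds) can be sketched without any framework. To stay short and dependency-free, this sketch swaps the deep TensorFlow autoencoder for a single-layer linear autoencoder with hand-written gradients; the function name, the mean-squared reconstruction loss, the learning rate and the Adam constants are assumptions, while the 500 samples and two rounds follow the text.

```python
import math
import random


def train_linear_autoencoder(data, code_dim, sample_size=500, epochs=2,
                             lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Train x -> enc -> code -> dec -> x_hat with Adam on squared error."""
    rng = random.Random(seed)
    samples = rng.sample(data, min(sample_size, len(data)))  # random selection
    dim = len(samples[0])
    # Random weight initialization.
    enc = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(code_dim)]
    dec = [[rng.gauss(0.0, 0.1) for _ in range(code_dim)] for _ in range(dim)]
    params = enc + dec  # shares the row objects, so updates reach enc and dec
    m = [[0.0] * len(row) for row in params]  # Adam first-moment estimates
    v = [[0.0] * len(row) for row in params]  # Adam second-moment estimates
    step, history = 0, []
    for _ in range(epochs):
        rng.shuffle(samples)
        total = 0.0
        for x in samples:
            z = [sum(w * xi for w, xi in zip(row, x)) for row in enc]
            out = [sum(w * zj for w, zj in zip(row, z)) for row in dec]
            err = [o - xi for o, xi in zip(out, x)]
            total += sum(e * e for e in err) / dim
            # Backpropagate the mean-squared error through both layers.
            g_out = [2.0 * e / dim for e in err]
            g_z = [sum(g_out[i] * dec[i][j] for i in range(dim))
                   for j in range(code_dim)]
            grads = ([[gz * xi for xi in x] for gz in g_z]
                     + [[go * zj for zj in z] for go in g_out])
            step += 1
            for row, g_row, m_row, v_row in zip(params, grads, m, v):
                for i, g in enumerate(g_row):
                    m_row[i] = beta1 * m_row[i] + (1 - beta1) * g
                    v_row[i] = beta2 * v_row[i] + (1 - beta2) * g * g
                    m_hat = m_row[i] / (1 - beta1 ** step)  # bias correction
                    v_hat = v_row[i] / (1 - beta2 ** step)
                    row[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(total / len(samples))  # mean loss over this round
    return enc, dec, history
```

The returned `history` holds the mean reconstruction loss of each round, which gives a quick check that the second round reconstructs the samples better than the first.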
In the subsequent process, random sampling and feature extraction can be performed on each batch of IP data, the extracted feature vectors are input into the model for training, and the parameters of the model are updated, so that a new model can be trained quickly and in real time.
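The per-batch random sampling might look like the sketch below. The sample size of 500 comes from the embodiment; the helper name `sample_batch`, the optional seed, and the fallback to the whole batch when it holds fewer than 500 items are assumptions made for illustration.

```python
import random


def sample_batch(batch, k=500, seed=None):
    """Randomly pick up to k items from one batch of IP data."""
    rng = random.Random(seed)
    items = list(batch)
    # Assumption: if the batch is smaller than k, use all of it.
    return rng.sample(items, min(k, len(items)))
```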
In this embodiment, a learning model is built on the TensorFlow learning framework, and sample data are randomly selected from the IP data and subjected to feature extraction, which effectively preserves the unique features of the IP address. The learning model is then trained: the weights of the neural network are randomly initialized, and the whole network is trained with the Adam optimization algorithm to obtain the preset model. A new model can therefore be trained quickly and in real time from IP data acquired in real time and used to calculate the rarity degree value of the IP data, which solves the problem of detecting unlabeled IP data.
Further, referring to fig. 4, fig. 4 is a detailed flow diagram, in the embodiment of the present invention, of the step of inputting the original feature vector into a preset model and encoding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector.
Based on the embodiment shown in fig. 2, in this embodiment, step S30 of inputting the original feature vector into a preset model, and encoding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector, includes:
Step S301, inputting the original feature vector into the preset model;
The original feature vector of the IP data obtained through feature extraction is input into the trained preset model. Referring to fig. 5, fig. 5 is a schematic flow diagram, in the embodiment of the present invention, of the preset model encoding and decoding the original feature vector of the IP data to obtain a reconstructed feature vector. The preset model mainly comprises an encoder part and a decoder part. The encoder part mainly comprises convolution layers, pooling layers, fully connected layers and other structures, and is used to reduce the dimension of the data. The decoder part mainly comprises deconvolution layers, up-sampling layers, fully connected layers and other structures, and is used to raise the dimension of the data. A Dropout layer is additionally provided between the fully connected layers.
Step S302, performing dimensionality reduction processing on the original feature vector through the encoder to obtain a low-dimensional feature vector;
Specifically, after the original feature vector of the IP data enters the encoder, the convolution layers, pooling layers and fully connected layers in the encoder reduce the dimension of the high-dimensional original feature vector, and the low-dimensional feature vector is obtained through convolution, pooling and full connection. A Dropout layer is added between the fully connected layers to prevent the model from overfitting.
Overfitting means that the neural network model, for example because of too many learning iterations, fits the noise and the unrepresentative features in the training data, which degrades the accuracy of the output result. Adopting Dropout can prevent the model from overfitting.
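To make the role of the Dropout layer concrete, the sketch below shows "inverted" dropout on a plain list of activations, which is the behaviour of the Dropout layer in common frameworks such as TensorFlow: during training each value is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate); at inference the input passes through unchanged. The function name and the rate used in the example are assumptions; the text does not state a dropout rate.

```python
import random


def dropout(values, rate, training=True, rng=random):
    """Inverted dropout over a list of activations."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time the layer is the identity.
        return list(values)
    keep = 1.0 - rate
    # Zero each value with probability `rate`; rescale the kept ones so the
    # expected sum of the layer output is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

Because a different random subset of units is silenced on every training step, no single unit can be relied on to memorize noise in the training data.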
Step S303, reconstructing the low-dimensional feature vector through the decoder to obtain the reconstructed feature vector.
After the low-dimensional feature vector is obtained by the encoder, it is input into the decoder of the preset model. The deconvolution layers, up-sampling layers and fully connected layers in the decoder raise the dimension of the low-dimensional feature vector, reconstructing the feature vector through deconvolution, up-sampling, full connection and the like. As in the encoder, a Dropout layer is added between the fully connected layers of the decoder to prevent the model from overfitting. The dimension of the low-dimensional feature vector is raised through decoding until the dimension of the original IP data is fully restored, yielding the reconstructed feature vector.
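The dimension change performed by pooling in the encoder and by up-sampling in the decoder can be illustrated with two small functions. This is only a sketch: the window size of 2, the choice of max pooling, and nearest-neighbour up-sampling are assumptions, since the text does not specify the layer parameters.

```python
def max_pool1d(values, size=2):
    """Keep the maximum of each non-overlapping window (dimension reduction)."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


def upsample1d(values, factor=2):
    """Repeat each value `factor` times (dimension raising)."""
    return [v for v in values for _ in range(factor)]


x = [1, 3, 2, 8, 5, 4, 7, 6]
pooled = max_pool1d(x)          # half the length of x
restored = upsample1d(pooled)   # same length as x again
```

Up-sampling restores only the dimension, not the values; in the model it is the trained deconvolution and fully connected layers that bring the reconstructed values close to the input.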
The rarity degree value of the IP data is obtained by calculating the vector distance between the reconstructed feature vector and the original feature vector and then evaluating the probability distribution of that vector distance over a period of time, which provides a basis for judging network attacks.
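One possible reading of this calculation is sketched below. The text says only "vector distance" and "probability distribution", so both concrete choices here are assumptions: the Euclidean distance, and an empirical cumulative distribution in which the rarity degree value is the fraction of distances recorded in the time window that are smaller than the distance being scored.

```python
import math
from bisect import bisect_left


def vector_distance(original, reconstructed):
    """Euclidean distance between the input and its reconstruction (assumed metric)."""
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(original, reconstructed)))


def rarity_value(distance, history):
    """Fraction of historical distances strictly smaller than `distance`."""
    ordered = sorted(history)
    if not ordered:
        return 0.0
    return bisect_left(ordered, distance) / len(ordered)
```

Under this reading, an IP whose reconstruction error exceeds that of most recently seen IPs receives a value close to 1.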
In this embodiment, the original feature vector of the IP data is input into a preset model, its dimension is reduced through convolution, pooling, full connection and the like to obtain a low-dimensional feature vector, and the dimension of the low-dimensional feature vector is raised through deconvolution, up-sampling, full connection and the like to obtain a reconstructed feature vector. The model can perceive the loss from the difference between its input and output, which amplifies the difference between normal IP data and rare IP data, makes the rarity degree value of the IP data easy to obtain, and provides a basis for judging network attacks.
In addition, an embodiment of the present invention further provides an IP detection apparatus, where the IP detection apparatus includes:
the acquisition module is used for acquiring IP data;
the characteristic extraction module is used for extracting the characteristics of the IP data to obtain an original characteristic vector of the IP data;
the reconstruction module is used for inputting the original characteristic vector into a preset model, and coding and decoding the original characteristic vector through the preset model to obtain a reconstructed characteristic vector;
and the calculation module is used for obtaining the rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector.
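The four modules can be read as a simple pipeline. The sketch below is a hypothetical arrangement, not the apparatus as claimed: the class name, the use of plain callables for each module, and the `run` method are assumptions chosen to show how the output of each module feeds the next.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence

Vector = Sequence[float]


@dataclass
class IPDetectionApparatus:
    acquire: Callable[[], Iterable[str]]        # acquisition module
    extract: Callable[[str], Vector]            # feature extraction module
    reconstruct: Callable[[Vector], Vector]     # reconstruction module (preset model)
    score: Callable[[Vector, Vector], float]    # calculation module

    def run(self):
        """Return (ip, score) pairs for every acquired IP."""
        results = []
        for ip in self.acquire():
            original = self.extract(ip)
            rebuilt = self.reconstruct(original)
            results.append((ip, self.score(rebuilt, original)))
        return results
```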
Referring to fig. 6, fig. 6 is a schematic flow chart of detecting IP data to obtain an IP rarity degree value according to an embodiment of the present invention. As shown in fig. 6, the parameters of the rare-IP detection model are adaptively updated by learning from the IPs in real-time traffic, and the rarity degree of each IP is calculated.
Specifically, feature extraction is performed on each batch of IP data acquired in real time. As the basis of data transmission, an IP address has its own specific attributes: IPv4 uniformly uses a 32-bit address written in dotted decimal notation, and IPv6 is written in colon-separated hexadecimal notation. In the embodiment of the invention, the number of digits in each dot-separated or colon-separated field is detected, fields with too few digits are padded with leading 0s, each IP is thus a character string composed of the 18 characters [ 0123456789ABCDEF ], N-Gram word segmentation is performed on the character string, and a vocabulary is constructed for the Embedding calculation.
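The N-Gram segmentation and vocabulary construction can be sketched as follows. The function names are hypothetical, and n = 2 is an assumption, because the text does not state the value of N. The integer indices produced by `encode` are what an Embedding layer would look up to obtain the feature vector.

```python
def char_ngrams(text, n=2):
    """Split a string into overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocab(strings, n=2):
    """Assign a consecutive integer index to every distinct n-gram."""
    vocab = {}
    for s in strings:
        for gram in char_ngrams(s, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def encode(text, vocab, n=2):
    """Map a string to the indices of its n-grams (unknown n-grams are skipped)."""
    return [vocab[g] for g in char_ngrams(text, n) if g in vocab]
```

Because the n-grams overlap and may span a separator, each character is represented together with its neighbours rather than in isolation.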
Furthermore, a deep autoencoder is built. In the embodiment of the invention, the TensorFlow framework is used to build a deep autoencoder neural network with a depth of m layers, composed of convolution layers, pooling layers, Dropout layers, deconvolution layers, up-sampling layers, fully connected layers and the like. The network mainly comprises an encoder and a decoder. After feature extraction, the IP enters the encoder, where the dimension of the data is reduced layer by layer, and a Dropout layer is added at the fully connected layers to prevent the model from overfitting. After encoding is finished, the data enters the decoder part. In the decoding process, the dimension of the data is raised layer by layer, and a Dropout layer is likewise added at the fully connected layers to prevent the model from overfitting, until the dimension of the original data is fully restored.
Furthermore, after feature extraction is performed on each batch of acquired IP data, a vector X is obtained and input into the constructed deep autoencoder model, the vector distance between the model output X' and the input X is calculated, and the IP and its corresponding vector distance are stored. The IPs and their corresponding values within a period of time are read at regular intervals, and a probability value is obtained from the data distribution. The results are stored in a database, and a dedicated interface is written to provide queries.
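A storage and query layer of this kind might be sketched with Python's built-in `sqlite3` module. The text names no database, so SQLite, the table name `ip_rarity`, its columns, and the three helper functions are all assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the assumed result table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ip_rarity ("
        "ip TEXT PRIMARY KEY, distance REAL NOT NULL, rarity REAL)"
    )
    return conn


def save_result(conn, ip, distance, rarity=None):
    """Store or overwrite the latest distance and rarity value for an IP."""
    conn.execute(
        "INSERT OR REPLACE INTO ip_rarity (ip, distance, rarity) VALUES (?, ?, ?)",
        (ip, distance, rarity),
    )
    conn.commit()


def query_rarity(conn, ip):
    """Query interface: return the stored rarity value, or None if unknown."""
    row = conn.execute(
        "SELECT rarity FROM ip_rarity WHERE ip = ?", (ip,)
    ).fetchone()
    return None if row is None else row[0]
```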
In addition, the model in the embodiment of the invention can be trained quickly and in real time. The model training process is as follows: 500 samples are randomly selected from each batch of acquired IP data, and a vector X is obtained after feature extraction; the vector X is input into the constructed deep autoencoder for model training, the weights of the neural network are randomly initialized, and the whole network is trained with the Adam optimization algorithm so that the vector distance between the model output X' and the input X is made as small as possible; after two rounds of training are completed, the model parameters are saved.
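The training objective, reducing the distance between X' and X over two rounds, can be demonstrated on a toy scale. The sketch below deliberately simplifies the embodiment and should not be read as its implementation: it uses a single linear hidden layer instead of the convolutional encoder and decoder, plain stochastic gradient descent instead of Adam, and two repeated toy patterns in place of real feature vectors. All names and hyperparameters are assumptions.

```python
import random


def train_linear_autoencoder(samples, hidden=2, epochs=2, lr=0.05, seed=0):
    """Train a tiny linear autoencoder with SGD; return weights and per-epoch loss."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Random weight initialization, as the text describes.
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(dim)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in samples:
            # Encoder: project X down to the low-dimensional code.
            code = [sum(w * v for w, v in zip(row, x)) for row in w_enc]
            # Decoder: reconstruct X' from the code.
            x_hat = [sum(w * c for w, c in zip(row, code)) for row in w_dec]
            err = [r - v for r, v in zip(x_hat, x)]
            total += sum(e * e for e in err)
            # Gradients of 0.5 * ||X' - X||^2, computed before any update.
            grad_code = [sum(err[i] * w_dec[i][j] for i in range(dim))
                         for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    w_dec[i][j] -= lr * err[i] * code[j]
            for j in range(hidden):
                for k in range(dim):
                    w_enc[j][k] -= lr * grad_code[j] * x[k]
        # Mean squared reconstruction distance for this round.
        history.append(total / len(samples))
    return w_enc, w_dec, history


# Toy data: two repeated patterns standing in for 500 sampled feature vectors.
samples = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]] * 250
_, _, history = train_linear_autoencoder(samples)
```

`history` holds one mean reconstruction error per round, so comparing its two entries shows whether the second round fits the frequently repeated inputs better than the first.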
Since normal traffic usually makes up the overwhelming majority of the data and is highly repetitive, each training datum input into the model changes the model only slightly. After the model has learned a large number of common IPs and a small number of rare IPs, it fits the common IPs well, while the vector distance obtained for rare IPs is larger. The final result is obtained from the probability distribution, and the IPs and their corresponding results are stored in a database.
In this embodiment, the data are processed according to the specific attributes of the IP address and then embedded, which effectively preserves the unique features of the IP address: each character does not exist in isolation but is treated as part of a dot-separated or colon-separated field. In addition, compared with traditional clustering methods, the deep learning autoencoder does not require parameters to be set in advance and can train the model quickly and in real time.
In addition, the present invention also provides a terminal device, where the terminal device includes a memory, a processor, and an IP detection program stored in the memory and capable of running on the processor, and the IP detection program implements the steps of the IP detection method when executed by the processor.
Since the IP detection program is executed by the processor, all technical solutions of all the foregoing embodiments are adopted, so that at least all the beneficial effects brought by all the technical solutions of all the foregoing embodiments are achieved, and details are not repeated herein.
Furthermore, the present invention also provides a computer-readable storage medium having an IP detection program stored thereon, which when executed by a processor implements the steps of the IP detection method as described above.
Since the IP detection program is executed by the processor, all technical solutions of all the foregoing embodiments are adopted, so that at least all the beneficial effects brought by all the technical solutions of all the foregoing embodiments are achieved, and details are not repeated herein.
Compared with the prior art, the IP detection method, the device, the terminal equipment and the storage medium provided by the embodiment of the invention acquire IP data; extract the features of the IP data to obtain an original feature vector of the IP data; input the original feature vector into a preset model, and encode and decode the original feature vector through the preset model to obtain a reconstructed feature vector; and obtain a rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector. Feature extraction on the IP data effectively preserves the unique features of the IP address, and the reconstructed feature vector of the IP data is obtained through the preset model without depending on labels in the IP data. The model can perceive the loss from the difference between its input and output, which amplifies the difference between normal IP data and rare IP data, makes the rarity degree value of the IP data easy to obtain, and provides a basis for judging network attacks.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controlled terminal, or a network device) to execute the method of each embodiment of the present application.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An IP detection method, characterized in that the IP detection method comprises the following steps:
acquiring IP data;
extracting the characteristics of the IP data to obtain an original characteristic vector of the IP data;
inputting the original feature vector into a preset model, and coding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector;
and obtaining a rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector.
2. The IP detection method according to claim 1, wherein the step of extracting the features of the IP data to obtain the original feature vector of the IP data comprises:
detecting the digit of the IP data, and finding out the IP data with insufficient digit in the IP data;
zero padding is carried out on the IP data with insufficient digits until all the IP data are character strings with preset lengths consisting of preset characters;
dividing the character string into words and constructing a vocabulary table;
and carrying out embedded vectorization training on the vocabulary to obtain the original feature vector of the IP data.
3. The IP detection method according to claim 1, wherein the step of inputting the original feature vector into a preset model, and encoding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector further comprises:
constructing a learning model based on a learning framework, wherein the learning model comprises an encoder and a decoder;
and training the learning model to obtain the preset model.
4. The IP detection method of claim 3, wherein the step of training the learning model to obtain the preset model comprises:
randomly selecting a plurality of sample data from the IP data;
extracting the features of the sample data to obtain a feature vector of the sample data;
and inputting the characteristic vector of the sample data into the learning model to perform model training to obtain the preset model.
5. The IP detection method according to claim 3, wherein the step of inputting the original feature vector into a preset model, and encoding and decoding the original feature vector through the preset model to obtain a reconstructed feature vector comprises:
inputting the original feature vector into the preset model;
performing dimensionality reduction on the original feature vector through the encoder to obtain a low-dimensional feature vector;
and reconstructing the low-dimensional feature vector through the decoder to obtain the reconstructed feature vector.
6. The IP detection method of claim 1, wherein the step of obtaining the rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector comprises:
calculating the vector distance between the reconstructed feature vector and the original feature vector;
calculating the probability distribution of the vector distances corresponding to the IP data within a preset time;
and obtaining the rarity degree value of the IP data based on the probability distribution of the vector distances.
7. The IP detection method of claim 6, wherein the step of obtaining the rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector further comprises:
and storing the rarity degree value of the IP data in a database, and writing a corresponding interface for querying the rarity degree value.
8. An IP detection apparatus, comprising:
the acquisition module is used for acquiring IP data;
the characteristic extraction module is used for extracting the characteristics of the IP data to obtain an original characteristic vector of the IP data;
the reconstruction module is used for inputting the original characteristic vector into a preset model, and coding and decoding the original characteristic vector through the preset model to obtain a reconstructed characteristic vector;
and the calculation module is used for obtaining the rarity degree value of the IP data based on the reconstructed feature vector and the original feature vector.
9. A terminal device, characterized in that the terminal device comprises a memory, a processor and an IP detection program stored on the memory and executable on the processor, the IP detection program, when executed by the processor, implementing the steps of the IP detection method according to any one of claims 1-7.
10. A computer-readable storage medium, having an IP detection program stored thereon, which when executed by a processor, performs the steps of the IP detection method of any one of claims 1-7.
CN202111437475.3A 2021-11-26 2021-11-26 IP detection method, device, terminal equipment and storage medium Pending CN114301629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111437475.3A CN114301629A (en) 2021-11-26 2021-11-26 IP detection method, device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111437475.3A CN114301629A (en) 2021-11-26 2021-11-26 IP detection method, device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114301629A true CN114301629A (en) 2022-04-08

Family

ID=80966343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111437475.3A Pending CN114301629A (en) 2021-11-26 2021-11-26 IP detection method, device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114301629A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110365636A (en) * 2019-05-23 2019-10-22 中国科学院信息工程研究所 The method of discrimination and device of industry control honey jar attack data source
CN110569322A (en) * 2019-07-26 2019-12-13 苏宁云计算有限公司 Address information analysis method, device and system and data acquisition method
US20200076842A1 (en) * 2018-09-05 2020-03-05 Oracle International Corporation Malicious activity detection by cross-trace analysis and deep learning
WO2020126994A1 (en) * 2018-12-17 2020-06-25 Commissariat A L'energie Atomique Et Aux Energies Alternatives Method and system for detecting anomalies in a telecommunications network
CN111343147A (en) * 2020-02-05 2020-06-26 北京中科研究院 Network attack detection device and method based on deep learning
CN111832647A (en) * 2020-07-10 2020-10-27 上海交通大学 Abnormal flow detection system and method
CN112398779A (en) * 2019-08-12 2021-02-23 中国科学院国家空间科学中心 Network traffic data analysis method and system
CN112671768A (en) * 2020-12-24 2021-04-16 四川虹微技术有限公司 Abnormal flow detection method and device, electronic equipment and storage medium
CN113395276A (en) * 2021-06-10 2021-09-14 广东为辰信息科技有限公司 Network intrusion detection method based on self-encoder energy detection
CN113468537A (en) * 2021-06-15 2021-10-01 江苏大学 Feature extraction and vulnerability exploitation attack detection method based on improved self-encoder

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
张云泉,方娟,贾海鹏等: "《人工智能三驾马车》", 31 July 2021, 北京:科学技术文献出版社, pages: 200 - 201 *
王兰成: "《网络舆情分析技术》", 31 October 2014, 国防工业出版社, pages: 95 - 96 *
美国微软公司著,希望图书创作室译: "《Microsoft Windows NT 4.0环境下的TCP/IP网络互连》", 31 July 1998, 宇航出版社, pages: 32 - 33 *
高仕斌: "《中国智能铁路核心技术 高速铁路 智能牵引供电***》", 31 December 2020, 成都:西南交通大学出版社, pages: 118 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination