CN117194903A - Network traffic data completion method and system based on a generative adversarial network - Google Patents

Info

Publication number
CN117194903A
CN117194903A CN202311119704.6A CN202311119704A CN117194903A CN 117194903 A CN117194903 A CN 117194903A CN 202311119704 A CN202311119704 A CN 202311119704A CN 117194903 A CN117194903 A CN 117194903A
Authority
CN
China
Prior art keywords
training
generator
network
vector
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311119704.6A
Other languages
Chinese (zh)
Inventor
朱晨露
胡博涵
邓贤君
张立杰
熊杰
阮梦雄
杨泽灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Chutian High Speed Digital Technology Co ltd
Original Assignee
Hubei Chutian High Speed Digital Technology Co ltd
Application filed by Hubei Chutian High Speed Digital Technology Co ltd

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a network traffic data completion method and system based on a generative adversarial network, belonging to the technical field of data processing. The method comprises: pre-training the generator and the discriminator of a generative adversarial network on a preset subset of low-missing-rate samples in the training data set to obtain a completed partial training data set; clustering the completed partial training data set with a clustering algorithm to generate pseudo-labels, and training a preset classifier to obtain a trained classifier; and training the generator and the discriminator on the entire training data set while constraining the generator with the trained classifier, to obtain a network traffic data completion model. By using the Wasserstein generative adversarial network variant with a weight-clipping penalty in combination with a conditional generative adversarial network, the invention addresses the stability and diversity problems of the standard generative adversarial network structure, makes full use of the latent category information in network traffic data, and improves data completion accuracy.

Description

Network traffic data completion method and system based on a generative adversarial network
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a network traffic data completion method and system based on a generative adversarial network.
Background
With the rapid growth of network scale and the continuous advancement of information technology, networks have become more complex and network faults more frequent, making network management a pressing problem. To control and regulate network operation reasonably, improve network performance, and guarantee quality of service, network traffic data must be monitored, measured, and analyzed. Although the devices that collect and store network traffic data are steadily being upgraded and can now handle massive volumes of data, factors such as transmission protocols mean that network traffic data still inevitably suffers from missing or abnormal values. On the one hand, measuring and storing all network traffic data directly is prohibitively expensive, so usually only part of the data can be collected, which leaves a large number of missing values. On the other hand, network attacks such as cross-site scripting and distributed denial-of-service attacks can introduce abnormal or even missing values into the collected data. These problems severely degrade the quality of the network traffic data available for subsequent analysis and can significantly affect the performance of downstream applications. Accurately completing the missing data from the measured network traffic data is therefore important for better network management.
Current data completion methods fall mainly into statistics-based, machine-learning-based, and deep-learning-based methods. With the development of artificial intelligence, deep-learning-based methods have shown clear advantages in performance and accuracy over the others. Although deep learning has made great progress on discriminative models, progress on generative models has been slower; the difficulty lies in the complicated approximate computation required when maximizing likelihood functions. A generative adversarial network approximates the data distribution by learning a model rather than learning the complex data distribution directly, which resolves the computational complexity problem of generative models. However, existing network traffic data completion methods based on generative adversarial networks suffer from unstable optimization, a lack of diversity in the generated samples, and a tendency toward mode collapse and vanishing gradients, and they do not make full use of the latent correlations in network traffic data.
Disclosure of Invention
The invention provides a network traffic data completion method and system based on a generative adversarial network, which are used to overcome the defects in the prior art.
In a first aspect, the present invention provides a network traffic data completion method based on a generative adversarial network, including:
acquiring a training data set from a network traffic data set, determining a preset subset of low-missing-rate samples in the training data set, and pre-training the generator and the discriminator of a generative adversarial network based on that subset to obtain a completed partial training data set;
clustering the completed partial training data set with a clustering algorithm to generate pseudo-labels, and training a preset classifier based on the completed partial training data set and the pseudo-labels to obtain a trained classifier;
training the generator and the discriminator on the entire training data set, and constraining the generator with the trained classifier, to obtain a network traffic data completion model;
and inputting the network traffic data set into the network traffic data completion model, and outputting the completed network traffic data.
According to the network traffic data completion method based on a generative adversarial network provided by the invention, acquiring a training data set from a network traffic data set, determining a preset subset of low-missing-rate samples in the training data set, and pre-training the generator and the discriminator of a generative adversarial network based on that subset to obtain a completed partial training data set includes:
determining that the training data set is a d-dimensional data vector X, arranging all samples in the training data set in ascending order of missing rate, and determining that the first λN samples, which form the preset subset of low-missing-rate samples, are a d-dimensional data vector X_L, wherein N is the number of samples in the training data set and 0 < λ < 1;
sampling n independent samples from each of the equally sized d-dimensional data vector X_L, the mask vector M, the randomly sampled noise vector Z, and the noise vector B that takes random values at hint rate k;
randomly initializing a generator G, and computing a generated vector and an imputed vector from the randomly initialized generator G and the n independent samples of the data vector X_L, the mask vector M and the randomly sampled noise vector Z:
[equations not reproduced in the source text]
wherein ⊙ denotes a NOR operation;
the hint vector H is calculated using n independent samples in the noise vector B and the mask vector M:
H=B⊙M
pre-training the discriminator D using the generator G and an adaptive moment estimation optimizer to obtain a pre-trained discriminator D;
after each training of the discriminator D finishes, clipping the weights of the discriminator D so that the updated weights lie within a preset interval;
pre-training the generator G using the pre-trained discriminator D and the adaptive moment estimation optimizer to obtain a pre-trained generator G;
completing the preset subset of low-missing-rate samples with the pre-trained generator G to obtain the completed partial training data set.
According to the network traffic data completion method based on a generative adversarial network provided by the invention, the missing rate is:
[equation not reproduced in the source text]
wherein the mask m_i indicates whether the data at the i-th position in x is missing, and r(x) is the missing rate.
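According to the network traffic data completion method based on a generative adversarial network provided by the invention, the pre-training optimization objective of the discriminator D is:
[equation not reproduced in the source text]
wherein the loss function of the discriminator D is:
[equation not reproduced in the source text]
m(j), [·](j) and h(j) are the corresponding elements of M, [·] and H, respectively, the symbol [·] not being reproduced in the source text;
correspondingly, the pre-training optimization objective of the generator G is:
[equation not reproduced in the source text]
wherein the loss function of the generator G, L_G: {0,1}^d × [0,1]^d × {0,1}^d → R, is:
[equation not reproduced in the source text]
the loss function L_M: R^d × R^d → R is:
[equation not reproduced in the source text]
wherein α is a first hyper-parameter.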
According to the network traffic data completion method based on a generative adversarial network provided by the invention, clustering the completed partial training data set with a clustering algorithm to generate pseudo-labels, and training a preset classifier based on the completed partial training data set and the pseudo-labels to obtain a trained classifier, includes:
clustering the completed partial training data set with the k-means clustering algorithm to generate pseudo-labels;
training a classifier C using the completed partial training data set and the pseudo-labels, wherein the classifier C is a support vector machine.
According to the network traffic data completion method based on a generative adversarial network provided by the invention, training the generator and the discriminator on the entire training data set, and constraining the generator with the trained classifier, to obtain a network traffic data completion model includes:
sampling n independent samples from each of the equally sized data vector X, mask vector M, randomly sampled noise vector Z, and noise vector B that takes random values at hint rate k;
randomly initializing a generator G, and computing a generated vector and an imputed vector from the randomly initialized generator G and the n independent samples of the data vector X, the mask vector M and the randomly sampled noise vector Z:
[equations not reproduced in the source text]
wherein ⊙ denotes a NOR operation;
the hint vector H is calculated using n independent samples in the noise vector B and the mask vector M:
H=B⊙M
training the discriminator D using the generator G and an adaptive moment estimation optimizer to obtain a trained discriminator D;
after each training of the discriminator D finishes, clipping the weights of the discriminator D so that the updated weights lie within a preset interval;
training the generator G using the trained discriminator D and the adaptive moment estimation optimizer to obtain a trained generator G, thereby obtaining the network traffic data completion model.
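According to the network traffic data completion method based on a generative adversarial network provided by the invention, the training optimization objective of the discriminator D is:
[equation not reproduced in the source text]
wherein the loss function of the discriminator D is:
[equation not reproduced in the source text]
m(j), [·](j) and h(j) are the corresponding elements of M, [·] and H, respectively, the symbol [·] not being reproduced in the source text;
correspondingly, the training optimization objective of the generator G is:
[equation not reproduced in the source text]
wherein the loss function of the generator G, L_G: {0,1}^d × [0,1]^d × {0,1}^d → R, is:
[equation not reproduced in the source text]
the loss function L_M: R^d × R^d → R is:
[equation not reproduced in the source text]
the loss function L_C arising from the constraint imposed by the classifier C is:
[equation not reproduced in the source text]
wherein α is a first hyper-parameter and β is a second hyper-parameter.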
According to the network traffic data completion method based on a generative adversarial network provided by the invention, inputting the network traffic data set into the network traffic data completion model and outputting the completed network traffic data includes:
inputting the training data set into the network traffic data completion model to obtain the completed network traffic data.
In a second aspect, the present invention also provides a network traffic data completion system based on a generative adversarial network, including:
a pre-training module, configured to acquire a training data set from a network traffic data set, determine a preset subset of low-missing-rate samples in the training data set, and pre-train the generator and the discriminator of a generative adversarial network based on that subset to obtain a completed partial training data set;
a classifier training module, configured to cluster the completed partial training data set with a clustering algorithm to generate pseudo-labels, and to train a preset classifier based on the completed partial training data set and the pseudo-labels to obtain a trained classifier;
a training module, configured to train the generator and the discriminator on the entire training data set, and to constrain the generator with the trained classifier, to obtain a network traffic data completion model;
and a completion processing module, configured to input the network traffic data set into the network traffic data completion model and output the completed network traffic data.
In a third aspect, the present invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the network traffic data completion method based on a generative adversarial network described in any of the above.
In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the network traffic data completion method based on a generative adversarial network described in any of the above.
According to the network traffic data completion method and system based on a generative adversarial network provided by the invention, the Wasserstein generative adversarial network variant with a weight-clipping penalty is combined with a conditional generative adversarial network, which addresses the stability and diversity problems of the standard generative adversarial network structure, makes full use of the latent category information in network traffic data, and improves data completion accuracy.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is the first schematic flow chart of the network traffic data completion method based on a generative adversarial network provided by the present invention;
FIG. 2 is the second schematic flow chart of the network traffic data completion method based on a generative adversarial network provided by the present invention;
FIG. 3 is a schematic diagram of the network traffic data completion result provided by the present invention;
FIG. 4 is a schematic structural diagram of the network traffic data completion system based on a generative adversarial network provided by the present invention;
FIG. 5 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In view of the limitations of the prior art, the invention provides a network traffic data completion method based on a generative adversarial network. It replaces the standard generative adversarial network structure with the Wasserstein generative adversarial network variant with a weight-clipping penalty, to reduce the likelihood of mode collapse and vanishing gradients, and it combines a conditional generative adversarial network to make full use of the latent category information that is closely related to the latent feature distribution of network traffic data, thereby solving the technical problem of network traffic data completion based on generative adversarial networks.
Fig. 1 is the first schematic flow chart of the network traffic data completion method based on a generative adversarial network according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step 100: acquiring a training data set from a network traffic data set, determining a preset subset of low-missing-rate samples in the training data set, and pre-training the generator and the discriminator of a generative adversarial network based on that subset to obtain a completed partial training data set;
step 200: clustering the completed partial training data set with a clustering algorithm to generate pseudo-labels, and training a preset classifier based on the completed partial training data set and the pseudo-labels to obtain a trained classifier;
step 300: training the generator and the discriminator on the entire training data set, and constraining the generator with the trained classifier, to obtain a network traffic data completion model;
step 400: inputting the network traffic data set into the network traffic data completion model, and outputting the completed network traffic data.
It should be noted that, technical terms related in the embodiments of the present invention include:
Generative adversarial network: a generative adversarial network is a network structure that can be trained in an unsupervised manner on complex distributions; it trains two opposing networks simultaneously, pitting them against each other to fit the data distribution. A generative adversarial network includes a generative model, which generates samples, and a discriminative model, which estimates the probability that a sample comes from the real data rather than from the generative model.
Wasserstein generative adversarial network: the Wasserstein generative adversarial network is a variant of the generative adversarial network that uses the Wasserstein distance in place of the cross entropy and relative entropy used in the classical generative adversarial network model.
Adaptive moment estimation optimizer: the adaptive moment estimation optimizer is a first-order gradient-based optimization algorithm for stochastic objective functions, based on adaptive estimates of lower-order moments. It combines the ability of the adaptive gradient method to handle sparse gradients with the ability of the root mean square propagation method to handle non-stationary objectives.
k-means clustering algorithm: the k-means clustering algorithm is an iterative cluster analysis algorithm that divides a data set into k different clusters, with the center of each cluster computed as the mean of the values the cluster contains.
Support vector machine: the support vector machine is a generalized linear classifier that performs binary classification of data in a supervised manner; its decision boundary is the maximum-margin hyperplane solved for the training samples.
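The adaptive moment estimation optimizer listed among the terms above is commonly known as Adam. The following is a minimal pure-Python sketch of one update step. The function names `adam_step` and `init_state`, the default hyper-parameters, and the toy quadratic objective are illustrative assumptions and do not come from the patent.

```python
import math

def init_state(n):
    """Create zeroed first- and second-moment estimates for n parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}

def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to `params` in place and return the list."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        # Exponential moving averages of the gradient and of its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialisation.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Toy usage (hypothetical objective): minimise f(w) = (w - 3)^2.
w = [0.0]
state = init_state(1)
for _ in range(2000):
    grad = [2.0 * (w[0] - 3.0)]
    adam_step(w, grad, state, lr=0.05)
```

In practice a deep-learning framework's built-in Adam implementation would normally be used for the generator and the discriminator; the sketch only shows the moment estimates and bias correction that the definition refers to.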
Specifically, as shown in fig. 2, the network traffic data completion method based on a generative adversarial network in the embodiment of the invention includes:
First, a subset of low-missing-rate samples in the data set is selected to pre-train the generator and the discriminator of the generative adversarial network, and a completed partial training data set is obtained:
The training data set can be expressed as a d-dimensional data vector X. All samples in the training data set are arranged in ascending order of missing rate, and the first λN (0<λ<1) samples form the pre-training data set, which can be expressed as a d-dimensional data vector X_L, where N is the number of samples in the training data set. The missing rate r(x) of any d-dimensional data x is as follows:
[equation not reproduced in the source text]
wherein the mask m_i indicates whether the data at the i-th position in x is missing.
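The selection step above can be sketched in plain Python. Because the missing-rate formula is not reproduced in the text, this sketch assumes that a mask value of 1 marks an observed entry and 0 marks a missing one, and that r(x) is the fraction of missing entries. The function names and the toy data are hypothetical.

```python
def missing_rate(mask):
    """Fraction of missing entries (assumption: 1 = observed, 0 = missing)."""
    return sum(1 for m in mask if m == 0) / len(mask)

def select_low_missing(masks, lam):
    """Indices of the first floor(lam * N) samples in ascending missing-rate order."""
    order = sorted(range(len(masks)), key=lambda i: missing_rate(masks[i]))
    return order[: int(lam * len(masks))]

# Toy masks for N = 4 samples of dimension d = 4.
masks = [
    [1, 0, 0, 1],  # missing rate 0.50
    [1, 1, 1, 1],  # missing rate 0.00
    [0, 0, 0, 1],  # missing rate 0.75
    [1, 1, 0, 1],  # missing rate 0.25
]
chosen = select_low_missing(masks, lam=0.5)  # indices of the pre-training subset
```

With λ = 0.5 the two samples with the fewest gaps are kept, which is the subset the pre-training stage would then work on.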
n independent samples are drawn at corresponding positions from each of the equally sized data vector X_L, mask vector M, noise vector Z and noise vector B. Here Z is a noise variable sampled at random from the uniform distribution on [-0.01, 0.01), and B is a noise variable in {0,1}^d that takes random values at hint rate k, i.e., each B_i takes the value 0 with probability k and the value 1 with probability 1-k.
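A minimal sketch of how the two noise variables could be sampled, using only the Python standard library. The function name `sample_noise` and the list-of-lists layout are assumptions made for illustration.

```python
import random

def sample_noise(n, d, k, rng):
    """Draw n samples of Z (uniform on [-0.01, 0.01)) and B in {0,1}^d.

    Each B_i is 0 with probability k and 1 with probability 1 - k,
    matching the hint-rate description in the text.
    """
    Z = [[-0.01 + 0.02 * rng.random() for _ in range(d)] for _ in range(n)]
    B = [[0 if rng.random() < k else 1 for _ in range(d)] for _ in range(n)]
    return Z, B

rng = random.Random(0)          # fixed seed so the sketch is reproducible
Z, B = sample_noise(n=3, d=4, k=0.3, rng=rng)
```

Passing the random generator in explicitly keeps pre-training and full training reproducible when the same seed is reused.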
Using the randomly initialized generator G and the n samples of X_L, M and Z, a generated vector and an imputed vector are computed, defined as follows:
[equations not reproduced in the source text]
The hint vector H is calculated using the n samples of B and M, defined as follows:
H=B⊙M
wherein the symbol "⊙" denotes a NOR operation.
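The discriminator D is pre-trained using a fixed generator G and an adaptive moment estimation optimizer, with the following optimization objective:
[equation not reproduced in the source text]
wherein the loss function of the discriminator D is defined as follows:
[equation not reproduced in the source text]
with its terms defined as follows:
[equation not reproduced in the source text]
wherein m(j), [·](j) and h(j) are the corresponding elements of M, [·] and H, respectively, the symbol [·] not being reproduced in the source text.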
After each round of discriminator training finishes, the weights are clipped so that the updated weights are confined to the interval [-0.01, +0.01].
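The weight clipping described above amounts to clamping every discriminator parameter to the stated interval. A minimal sketch, assuming the weights are held as a list of per-layer lists (the layout and the name `clip_weights` are hypothetical):

```python
def clip_weights(weights, c=0.01):
    """Clamp every weight to [-c, c]; `weights` is a list of per-layer lists."""
    return [[max(-c, min(c, w)) for w in layer] for layer in weights]

layers = [[0.5, -0.003, 0.01], [-0.2, 0.0]]
clipped = clip_weights(layers)
```

Values already inside the interval are left unchanged; only out-of-range weights are moved to the nearest bound.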
The generator G is pre-trained using the pre-trained discriminator D and the adaptive moment estimation optimizer, with the following optimization objective:
[equation not reproduced in the source text]
wherein the loss function L_G: {0,1}^d × [0,1]^d × {0,1}^d → R is defined as follows:
[equation not reproduced in the source text]
the loss function L_M: R^d × R^d → R is defined as follows:
[equation not reproduced in the source text]
where α is a hyper-parameter.
The pre-trained generator G is used to complete the missing data in the pre-training data set, yielding the corresponding completed data set.
Then, a clustering algorithm is applied to the completed partial training data set to create pseudo-labels, and a classifier is trained using the completed partial training data set and the created pseudo-labels:
The k-means clustering algorithm is used to cluster the imputed data set and generate pseudo-labels. The imputed data set and its corresponding pseudo-labels are then used to train the auxiliary classifier C, which is a support vector machine.
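The pseudo-labelling step can be illustrated with a small pure-Python k-means. This is only a sketch: the function name, the seeding strategy, the iteration cap and the toy points are assumptions, and the subsequent support vector machine training is not shown.

```python
import random

def kmeans_pseudo_labels(points, k, iters=50, seed=0):
    """Cluster `points` (equal-length lists) and return one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct points as seeds
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Toy "completed" samples forming two well-separated groups.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = kmeans_pseudo_labels(points, k=2)
```

The returned indices play the role of pseudo-labels: each completed sample is paired with its cluster index, and those pairs form the training set for the auxiliary classifier.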
Further, the generator and the discriminator of the generative adversarial network are trained using the entire training data set, with the generator constrained by the trained classifier. This specifically includes:
n independent samples are drawn at corresponding positions from each of the equally sized data vector X, mask vector M, noise vector Z and noise vector B. Here Z is a noise variable sampled at random from the uniform distribution on [-0.01, 0.01), and B is a noise variable in {0,1}^d that takes random values at hint rate k, i.e., each B_i takes the value 0 with probability k and the value 1 with probability 1-k.
Using the randomly initialized generator G and the n samples of X, M and Z, a generated vector and an imputed vector are computed, defined as follows:
[equations not reproduced in the source text]
The hint vector H is calculated using the n samples of B and M, defined as follows:
H=B⊙M
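The discriminator D is trained using a fixed generator G and an adaptive moment estimation optimizer, with the following optimization objective:
[equation not reproduced in the source text]
wherein the loss function of the discriminator D is defined as follows:
[equation not reproduced in the source text]
with its terms defined as follows:
[equation not reproduced in the source text]
wherein m(j), [·](j) and h(j) are the corresponding elements of M, [·] and H, respectively, the symbol [·] not being reproduced in the source text.
After each round of discriminator training finishes, the weights are clipped so that the updated weights are confined to the interval [-0.01, +0.01].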
The generator G is trained using the trained discriminator D and the adaptive moment estimation optimizer, with the following optimization objective:
[equation not reproduced in the source text]
wherein the loss function L_G: {0,1}^d × [0,1]^d × {0,1}^d → R is defined as follows:
[equation not reproduced in the source text]
the loss function L_M: R^d × R^d → R is defined as follows:
[equation not reproduced in the source text]
the loss function L_C arising from the classifier constraint is defined as follows:
[equation not reproduced in the source text]
where α and β are hyper-parameters.
Finally, the trained generator is used to complete the missing data in the network traffic data set: the trained generator G completes the missing data in the training data set, yielding the corresponding completed data set.
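One common way to assemble a completed sample from a generator's output is to keep every observed value and take the generated value only where data is missing. The sketch below assumes that convention, borrowed from imputation GANs in general and not confirmed by the patent text, with mask value 1 meaning observed. The function name and the numbers are hypothetical.

```python
def merge_completion(x, mask, generated):
    """Keep observed entries of x; use generated values where mask == 0 (missing)."""
    return [xi if mi == 1 else gi for xi, mi, gi in zip(x, mask, generated)]

x = [0.2, 0.0, 0.7]             # 0.0 is a placeholder at the missing position
mask = [1, 0, 1]
generated = [0.25, 0.4, 0.65]   # stand-in for the trained generator's output
completed = merge_completion(x, mask, generated)
```

Under this convention the observed traffic measurements are never overwritten, so any completion error is confined to the positions that were actually missing.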
Fig. 3 shows an example of the data completion results of the embodiment of the present invention, giving the relationship between the data missing rate and the root mean square error (RMSE). The experiments show that the method of the embodiment achieves a good data completion effect.
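For reference, the RMSE of a completion result can be computed as sketched below. The text does not say which entries enter the error, so this sketch assumes the error is measured only at the positions that were missing (mask value 0). The function name and the toy values are illustrative.

```python
import math

def rmse_on_missing(truth, completed, mask):
    """RMSE over entries where mask == 0 (assumed to mean 'was missing')."""
    squared = [
        (t - p) ** 2
        for t_row, p_row, m_row in zip(truth, completed, mask)
        for t, p, m in zip(t_row, p_row, m_row)
        if m == 0
    ]
    return math.sqrt(sum(squared) / len(squared))

truth = [[1.0, 2.0], [3.0, 4.0]]
completed = [[1.0, 2.5], [2.0, 4.0]]
mask = [[1, 0], [0, 1]]
error = rmse_on_missing(truth, completed, mask)  # sqrt((0.25 + 1.0) / 2)
```

Evaluating this at several artificially induced missing rates gives the kind of missing-rate versus RMSE relationship that Fig. 3 is described as showing.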
It can be seen that, compared with the prior art, the technical solution of the embodiment of the present invention has a low probability of mode collapse and vanishing gradients: the Wasserstein generative adversarial network variant with a weight-clipping penalty replaces the standard generative adversarial network structure, which improves the way the difference between two probability distributions is measured, makes that measure more reliable and reasonable, and reduces the likelihood of mode collapse and vanishing gradients. In addition, by combining a conditional generative adversarial network, clustering and classification make full use of the regularity of network traffic data with respect to attributes such as date and time, which improves the data completion effect.
The network traffic data completion system based on a generative adversarial network provided by the present invention is described below; the system described below and the method described above may be referred to in correspondence with each other.
Fig. 4 is a schematic structural diagram of a network traffic data completion system based on a generative adversarial network according to an embodiment of the present invention. As shown in fig. 4, the system includes a pre-training module 41, a classifier training module 42, a training module 43, and a completion processing module 44, wherein:
the pre-training module 41 is configured to acquire a training data set from a network traffic data set, determine a preset subset of low-missing-rate samples in the training data set, and pre-train the generator and the discriminator of a generative adversarial network based on that subset to obtain a completed partial training data set; the classifier training module 42 is configured to cluster the completed partial training data set with a clustering algorithm to generate pseudo-labels, and to train a preset classifier based on the completed partial training data set and the pseudo-labels to obtain a trained classifier; the training module 43 is configured to train the generator and the discriminator on the entire training data set, and to constrain the generator with the trained classifier, to obtain a network traffic data completion model; the completion processing module 44 is configured to input the network traffic data set into the network traffic data completion model and output the completed network traffic data.
Fig. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 5, the electronic device may include a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with each other through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a network traffic data completion method based on a generative adversarial network, the method comprising: acquiring a training data set from a network traffic data set, determining a preset subset of low-missing-rate samples in the training data set, and pre-training the generator and the discriminator of a generative adversarial network based on that subset to obtain a completed partial training data set; clustering the completed partial training data set with a clustering algorithm to generate pseudo-labels, and training a preset classifier based on the completed partial training data set and the pseudo-labels to obtain a trained classifier; training the generator and the discriminator on the entire training data set, and constraining the generator with the trained classifier, to obtain a network traffic data completion model; and inputting the network traffic data set into the network traffic data completion model, and outputting the completed network traffic data.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the network traffic data completion method based on a generative adversarial network provided by the methods described above, the method comprising: acquiring a training data set from a network traffic data set, determining a preset subset of low-missing-rate samples in the training data set, and pre-training the generator and the discriminator of a generative adversarial network based on that subset to obtain a completed partial training data set; clustering the completed partial training data set with a clustering algorithm to generate pseudo-labels, and training a preset classifier based on the completed partial training data set and the pseudo-labels to obtain a trained classifier; training the generator and the discriminator on the entire training data set, and constraining the generator with the trained classifier, to obtain a network traffic data completion model; and inputting the network traffic data set into the network traffic data completion model, and outputting the completed network traffic data.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A network traffic data completion method based on a generative adversarial network, comprising:
acquiring a training data set from a network traffic data set, determining partial preset low-missing-rate samples in the training data set, and pre-training a generator and a discriminator of the generative adversarial network based on the partial preset low-missing-rate samples to obtain a completed partial training data set;
clustering the completed partial training data set with a clustering algorithm to generate pseudo-labels, and training a preset classifier based on the completed partial training data set and the pseudo-labels to obtain a trained classifier;
training the generator and the discriminator on the full training data set while constraining the generator with the trained classifier, to obtain a network traffic data completion model;
and inputting the network traffic data set into the network traffic data completion model and outputting the completed network traffic data.
2. The network traffic data completion method based on a generative adversarial network of claim 1, wherein acquiring a training data set from the network traffic data set, determining partial preset low-missing-rate samples in the training data set, and pre-training a generator and a discriminator of the generative adversarial network based on the partial preset low-missing-rate samples to obtain a completed partial training data set comprises:
determining the training data set, represented as a d-dimensional data vector X; arranging all samples of the training data set in ascending order of missing rate and taking the leading portion as the partial preset low-missing-rate samples, represented as a d-dimensional data vector $X_L$, wherein N is the number of all samples in the training data set and the proportion of samples taken lies strictly between 0 and 1;
drawing n independent samples from each of the d-dimensional data vector $X_L$, the mask vector M, the randomly sampled noise vector Z, and the noise vector B whose values are drawn at random according to the hint rate k, all of the same size;
randomly initializing a generator G, and using the randomly initialized generator G together with the n independent samples of the data vector $X_L$, the mask vector M, and the randomly sampled noise vector Z to compute a generated vector and an interpolation (imputed) vector, wherein $\odot$ denotes element-wise multiplication;
computing the hint vector H from the n independent samples of the noise vector B and the mask vector M:
$H = B \odot M$
pre-training the discriminator D using the generator G and an adaptive moment estimation (Adam) optimizer to obtain a pre-trained discriminator D;
after the pre-training of the discriminator D is finished, applying weight clipping to the parameters of the discriminator D so that the updated weights lie within a preset interval;
pre-training the generator G using the pre-trained discriminator D and the adaptive moment estimation optimizer to obtain a pre-trained generator G;
completing the partial preset low-missing-rate samples with the pre-trained generator G to obtain the completed partial training data set.
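The vector operations named in claim 2 can be illustrated with a minimal sketch in plain Python. It assumes the convention common to GAIN-style imputers, namely that a mask entry of 1 marks an observed value and 0 a missing one, and that the imputed vector keeps observed entries and takes the generator output elsewhere; the claim text itself only fixes H = B ⊙ M. The function names `hadamard`, `sample_b`, `hint_vector`, and `impute`, and the Bernoulli sampling of B, are hypothetical choices made for this example.

```python
import random


def hadamard(a, b):
    """Element-wise product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same size")
    return [x * y for x, y in zip(a, b)]


def sample_b(d, k, rng):
    """Binary vector B whose entries are 1 with probability k (assumed sampling scheme)."""
    return [1 if rng.random() < k else 0 for _ in range(d)]


def hint_vector(b, m):
    """H = B ⊙ M, as stated in the claim."""
    return hadamard(b, m)


def impute(x, m, generated):
    """Assumed combination: keep observed entries, use generated values where data is missing."""
    return [xi if mi == 1 else gi for xi, mi, gi in zip(x, m, generated)]


rng = random.Random(0)
m = [1, 0, 1, 1]                     # assumed convention: 1 = observed, 0 = missing
x = [0.2, 0.0, 0.7, 0.5]             # missing position holds a dummy value
generated = [0.25, 0.4, 0.6, 0.55]   # stand-in for the generator output
b = sample_b(len(m), 0.9, rng)
h = hint_vector(b, m)
x_hat = impute(x, m, generated)
```

Under these assumptions a hint entry can be 1 only where the mask is 1, so the hint never reveals more than the mask itself.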
3. The network traffic data completion method based on a generative adversarial network of claim 2, wherein the missing rate r(x) of a sample x is determined from its mask, where the mask $m_i$ indicates whether the data at the i-th position of x is missing.
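The ordering step in claim 2 and the missing rate in claim 3 can be sketched as follows. This is a minimal illustration rather than the claimed formula: it assumes the missing rate is the fraction of missing positions, that a mask value of 0 marks a missing position, and that the number of samples kept is the floor of the chosen fraction times N. The names `missing_rate`, `select_low_missing`, and `fraction` are hypothetical.

```python
import math


def missing_rate(mask):
    """Fraction of missing positions in one sample (assumes 1 = observed, 0 = missing)."""
    if not mask:
        raise ValueError("mask must be non-empty")
    return sum(1 for m in mask if m == 0) / len(mask)


def select_low_missing(samples, masks, fraction):
    """Sort samples by ascending missing rate and keep the leading fraction of them."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if len(samples) != len(masks):
        raise ValueError("samples and masks must have the same length")
    # sorted() is stable, so samples with equal missing rates keep their original order.
    order = sorted(range(len(samples)), key=lambda i: missing_rate(masks[i]))
    keep = math.floor(fraction * len(samples))  # assumed rounding rule
    chosen = order[:keep]
    return [samples[i] for i in chosen], [masks[i] for i in chosen]


samples = ["a", "b", "c", "d"]
masks = [[1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1], [1, 1, 0, 1]]
low_samples, low_masks = select_low_missing(samples, masks, 0.5)
```

With the four example masks the missing rates are 0.5, 0, 0.75, and 0.25, so keeping half of the samples selects the second and fourth.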
4. The network traffic data completion method based on a generative adversarial network of claim 2, wherein the pre-training optimization objective of the discriminator D is defined in terms of a loss function of the discriminator D, $L_D: \{0,1\}^d \times [0,1]^d \times \{0,1\}^d \to \mathbb{R}$, in which m(j) and h(j) are the j-th components of M and H, respectively, and the remaining argument is indexed in the same way;
correspondingly, the pre-training optimization objective of the generator G is defined in terms of a loss function of the generator G, $L_G: \{0,1\}^d \times [0,1]^d \times \{0,1\}^d \to \mathbb{R}$, and a loss function $L_M: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$,
wherein α is a first hyper-parameter.
5. The network traffic data completion method based on a generative adversarial network of claim 1, wherein clustering the completed partial training data set with a clustering algorithm to generate pseudo-labels, and training a preset classifier based on the completed partial training data set and the pseudo-labels to obtain a trained classifier, comprises:
clustering the completed partial training data set with the k-means clustering algorithm to generate the pseudo-labels;
training a classifier C using the completed partial training data set and the pseudo-labels, wherein the classifier C is a support vector machine.
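The pseudo-label step in claim 5 can be sketched with a small k-means (Lloyd's algorithm) written in plain Python. This is an illustration under stated assumptions: centroids are initialized to the first k points and the loop runs for a fixed number of iterations, neither of which is specified in the claim, and the name `kmeans_pseudo_labels` is hypothetical. The support vector machine is not shown; the returned labels would simply serve as its training targets, for example with an off-the-shelf implementation such as scikit-learn's `SVC`.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pseudo_labels(points, k, iterations=20):
    """Cluster points with Lloyd's algorithm and return (labels, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # illustrative initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


completed = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [0.0, 0.2]]
pseudo_labels, centers = kmeans_pseudo_labels(completed, k=2)
```

The cluster indices carry no meaning beyond group membership, which is why they are treated as pseudo-labels rather than ground-truth classes.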
6. The network traffic data completion method based on a generative adversarial network of claim 1, wherein training the generator and the discriminator on the full training data set while constraining the generator with the trained classifier, to obtain a network traffic data completion model, comprises:
drawing n independent samples from each of the data vector X, the mask vector M, the randomly sampled noise vector Z, and the noise vector B whose values are drawn at random according to the hint rate k, all of the same size;
randomly initializing a generator G, and using the randomly initialized generator G together with the n independent samples of the data vector X, the mask vector M, and the randomly sampled noise vector Z to compute a generated vector and an interpolation (imputed) vector, wherein $\odot$ denotes element-wise multiplication;
computing the hint vector H from the n independent samples of the noise vector B and the mask vector M:
$H = B \odot M$
training the discriminator D using the generator G and an adaptive moment estimation (Adam) optimizer to obtain a trained discriminator D;
after the training of the discriminator D is finished, applying weight clipping to the parameters of the discriminator D so that the updated weights lie within a preset interval;
training the generator G using the trained discriminator D and the adaptive moment estimation optimizer to obtain a trained generator G, thereby obtaining the network traffic data completion model.
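The weight clipping step that follows discriminator training in claims 2 and 6 amounts to clamping each parameter into a fixed interval. The sketch below shows that operation on a flat list of weights; the function name `clip_weights` and the interval bounds are hypothetical, since the claims only require a preset interval. The value 0.01 used in the example is the clipping constant from the original Wasserstein GAN paper and is chosen here purely for illustration.

```python
def clip_weights(weights, low, high):
    """Clamp every weight into the closed interval [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return [max(low, min(high, w)) for w in weights]


discriminator_weights = [-0.5, 0.003, 0.02, -0.004]
clipped = clip_weights(discriminator_weights, -0.01, 0.01)
```

Weights already inside the interval are left unchanged; only values outside it are moved to the nearest bound.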
7. The network traffic data completion method based on a generative adversarial network of claim 6, wherein the training optimization objective of the discriminator D is defined in terms of a loss function of the discriminator D, $L_D: \{0,1\}^d \times [0,1]^d \times \{0,1\}^d \to \mathbb{R}$, in which m(j) and h(j) are the j-th components of M and H, respectively, and the remaining argument is indexed in the same way;
correspondingly, the training optimization objective of the generator G is defined in terms of a loss function of the generator G, $L_G: \{0,1\}^d \times [0,1]^d \times \{0,1\}^d \to \mathbb{R}$, a loss function $L_M: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, and a loss function $L_C$ produced by the constraint of the classifier C,
wherein α is a first hyper-parameter and β is a second hyper-parameter.
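Claim 7 names three loss terms and two hyper-parameters. One common way to combine such terms, assumed here purely for illustration and not taken from the claim text, is a weighted sum in which α scales the term $L_M$ and β scales the classifier term $L_C$. The symbol $\mathcal{J}_G$ is a hypothetical name for the resulting generator objective.

```latex
% Assumed form of the generator objective (illustrative only):
% adversarial term plus two weighted auxiliary terms.
\mathcal{J}_G \;=\; L_G \;+\; \alpha\, L_M \;+\; \beta\, L_C ,
\qquad \alpha \ge 0,\ \beta \ge 0 .
```

Under this assumption, setting β = 0 removes the classifier constraint and leaves an objective with only the first hyper-parameter, which matches the shape of the pre-training stage in claim 4.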
8. The network traffic data completion method based on a generative adversarial network of claim 1, wherein inputting the network traffic data set into the network traffic data completion model and outputting the completed network traffic data comprises:
inputting the training data set into the network traffic data completion model to obtain the completed network traffic data.
9. A network traffic data completion system based on a generative adversarial network, comprising:
a pre-training module, configured to acquire a training data set from a network traffic data set, determine partial preset low-missing-rate samples in the training data set, and pre-train a generator and a discriminator of the generative adversarial network based on the partial preset low-missing-rate samples to obtain a completed partial training data set;
a classifier training module, configured to cluster the completed partial training data set with a clustering algorithm to generate pseudo-labels, and to train a preset classifier based on the completed partial training data set and the pseudo-labels to obtain a trained classifier;
a training module, configured to train the generator and the discriminator on the full training data set while constraining the generator with the trained classifier, to obtain a network traffic data completion model;
and a completion processing module, configured to input the network traffic data set into the network traffic data completion model and output the completed network traffic data.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the network traffic data completion method based on a generative adversarial network of any one of claims 1 to 8.
CN202311119704.6A 2023-08-30 2023-08-30 Network traffic data complement method and system based on generation of countermeasure network Pending CN117194903A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311119704.6A CN117194903A (en) 2023-08-30 2023-08-30 Network traffic data complement method and system based on generation of countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311119704.6A CN117194903A (en) 2023-08-30 2023-08-30 Network traffic data complement method and system based on generation of countermeasure network

Publications (1)

Publication Number Publication Date
CN117194903A (en) 2023-12-08

Family

ID=88995382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311119704.6A Pending CN117194903A (en) 2023-08-30 2023-08-30 Network traffic data complement method and system based on generation of countermeasure network

Country Status (1)

Country Link
CN (1) CN117194903A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874639A (en) * 2024-03-12 2024-04-12 山东能源数智云科技有限公司 Mechanical equipment service life prediction method and device based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN111177792B (en) Method and device for determining target business model based on privacy protection
CN104331635B (en) The method of power optical fiber Communication ray power prediction
CN107103332A (en) A kind of Method Using Relevance Vector Machine sorting technique towards large-scale dataset
CN111144548B (en) Method and device for identifying working condition of oil pumping well
CN113884290A (en) Voltage regulator fault diagnosis method based on self-training semi-supervised generation countermeasure network
CN114169110B (en) Motor bearing fault diagnosis method based on feature optimization and GWAA-XGboost
CN117194903A (en) Network traffic data complement method and system based on generation of countermeasure network
CN108062302A (en) A kind of recognition methods of particular text information and device
CN115510042A (en) Power system load data filling method and device based on generation countermeasure network
CN114091661B (en) Oversampling method for improving intrusion detection performance based on generation countermeasure network and k-nearest neighbor algorithm
CN115049024B (en) Training method and device of wind speed prediction model, electronic equipment and storage medium
CN115051864B (en) PCA-MF-WNN-based network security situation element extraction method and system
CN113988177A (en) Water quality sensor abnormal data detection and fault diagnosis method
CN117874639B (en) Mechanical equipment service life prediction method and device based on artificial intelligence
CN115659244A (en) Fault prediction method, device and storage medium
CN108665001B (en) Cross-tested idle state detection method based on deep belief network
CN111400964B (en) Fault occurrence time prediction method and device
CN109934352B (en) Automatic evolution method of intelligent model
CN117216713A (en) Fault delimiting method, device, electronic equipment and storage medium
Khushaba et al. Feature subset selection using differential evolution
CN116400168A (en) Power grid fault diagnosis method and system based on depth feature clustering
CN116595465A (en) High-dimensional sparse data outlier detection method and system based on self-encoder and data enhancement
CN116232699A (en) Training method of fine-grained network intrusion detection model and network intrusion detection method
CN117291314B (en) Construction method of energy risk identification model, energy risk identification method and device
Trivedi et al. Fine Tuning (Diagnosis) of Machine Learning Algorithm (Model) for optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination