CN110177112B - Network intrusion detection method based on double subspace sampling and confidence offset - Google Patents

Network intrusion detection method based on double subspace sampling and confidence offset

Info

Publication number
CN110177112B
CN110177112B CN201910490598.XA CN110177112A
Authority
CN
China
Prior art keywords
layer
sample
confidence
model
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910490598.XA
Other languages
Chinese (zh)
Other versions
CN110177112A (en)
Inventor
王喆
陈立龙
曹晨杰
李冬冬
杜文莉
杨海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN201910490598.XA priority Critical patent/CN110177112B/en
Publication of CN110177112A publication Critical patent/CN110177112A/en
Application granted granted Critical
Publication of CN110177112B publication Critical patent/CN110177112B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 - Network analysis or design
    • H04L41/145 - Network analysis or design involving simulating, designing, planning or modelling of a network
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 - Network architectures or network communication protocols for network security
    • H04L63/14 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 - Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a network intrusion detection method based on double subspace sampling and confidence offset. First, the base classifiers of each layer are preprocessed with down-sampling at both the sample level and the feature level; second, the confidence output of each layer is mixed with the original features by interpolation and fed into the next layer of the model as new features; the cascade model then perturbs the interpolated confidence layer by layer. In the testing step, the confidence perturbation is not applied. Compared with traditional ensemble methods for imbalanced classification, the method extends the deep forest to the imbalance problem and further addresses the threshold problem of imbalanced classification through the cascade structure; the system trains the samples with a model whose perturbation amplitude is selectable, which effectively improves the model's detection performance on imbalanced network intrusion; meanwhile, the layer-by-layer stacked ensemble model achieves better generalization performance during detection.

Description

Network intrusion detection method based on double subspace sampling and confidence offset
Technical Field
The invention relates to a detection and identification method for imbalanced network intrusion, and belongs to the field of network information security.
Background
With the rapid development of network technology and the gradual expansion of the Internet, network security problems have gradually come into public view, and research on network intrusion identification methods has become a popular field in recent years. The basic network attack types include Denial of Service (DoS), unauthorized remote host access (Remote-to-Local, R2L), unauthorized superuser access (User-to-Root, U2R), and probing/surveillance (Probing), and each of these attack types can further derive numerous sub-attack methods. It is therefore imperative to construct targeted detection schemes for these network attacks.
Existing common network attack detection methods include: 1) rule-based detection, which depends heavily on an existing rule database, cannot be updated in time for new attack techniques, and can therefore easily lead to heavy losses; 2) detection based on the statistical distribution of network traffic features, which relies too much on randomness and can be skillfully evaded by some attack techniques; 3) machine-learning-based intrusion detection, for example using support vector machines, random forests, or neural networks. Machine-learning-based methods can respond effectively and in a timely way to unknown network attacks. However, limited by different physical conditions and environmental constraints, the numbers of network intrusions of different types tend to be highly imbalanced, and traditional machine learning methods have difficulty handling this class-imbalanced type of network intrusion.
The imbalanced network intrusion detection problem can be effectively addressed by combining ensemble learning with data sampling. According to the integration strategy, these sampling-based ensemble methods can be further divided into bagging, boosting and hybrid strategies, and each of these areas already has a number of representative algorithms. On the other hand, random feature subspace algorithms have been proposed to avoid underestimating implicit important features and to filter out possible noisy features. Combining such algorithms with a bagging strategy and base classifiers such as the SVM has produced representative algorithms such as ABRS-SVM.
Professor Zhou Zhihua proposed the deep forest ensemble algorithm in 2017; it can compete with deep learning in classification performance while using fewer hyper-parameters and a lighter model, and it can also achieve good classification results on small data sets. The cascade forest follows a model-stacking design idea, which can effectively improve the generalization performance of the algorithm.
However, the above ensemble learning methods do not solve the threshold problem of imbalanced classification well, so their classification performance cannot reach an ideal level on data sets with high imbalance ratios such as network intrusion data. Moreover, the cascade forest, as a new ensemble model, has no optimization strategy for the imbalance problem. Therefore, a new cascade ensemble model is needed to extend the cascade forest to the imbalance problem and to effectively solve the threshold problem in existing imbalanced network intrusion detection.
Disclosure of Invention
Aiming at the problems that existing ensemble algorithms cannot effectively handle imbalanced network intrusion detection, that the ensemble scale cannot be well determined, and that modeling can only be done by experience, the invention generalizes the characteristics of the cascade forest and provides a network intrusion detection method based on double subspace sampling and confidence offset. The ensemble model effectively uses the model-stacking structure of the cascade forest to adjust the classification threshold of the imbalance problem layer by layer; a double down-sampling data preprocessing step is introduced so that the model can effectively handle imbalanced network intrusion detection; and a validation mechanism keeps the scale of the model well under control.
The technical solution adopted by the invention to solve the technical problem is as follows: first, the collected samples are converted, according to the specific problem description, into a vector representation that the system can process, and discrete features are one-hot encoded; second, the ensemble model optimizes the classification performance on the imbalance problem through a double down-sampling strategy at the sample level and the feature level; according to the output confidence of the previous layer's base classifiers, feature perturbation is applied to that confidence, which is then mixed with the original features as the input of the next layer's model; and a validation mechanism is added to the cascade model so that the number of layers can stop growing adaptively. In the testing process, the data are fed into the previously trained cascade model, no confidence offset is applied, and the output of the last layer is taken as the final result.
The technical solution adopted by the invention can be further refined. In the first stage of the training step, the base classifiers trained in each layer are the classical random forest and naive Bayes algorithms. More base classifiers could be added; considering the interpretability of the problem and the implementation difficulty of the method, only these two types are selected as base classifiers in the experiments. Meanwhile, in the testing and validation process, the average accuracy over the majority and minority classes is used as the evaluation index to objectively express the performance of the algorithm.
The beneficial effects of the invention are: (1) a cascade ensemble model is designed to extend the cascade forest to the imbalanced field; (2) the perturbation amplitude is controlled by adjusting the hyper-parameter η, so that the model can effectively solve the classification problem in imbalanced classification.
Drawings
FIG. 1 is an overall flow chart of the present invention.
Detailed Description
The invention will be further described with reference to the following figures and examples. The system designed by the invention is divided into the following modules.
Part 1: Data acquisition
In the data acquisition process, the real sample data are transformed to generate a data set represented by vectors, which facilitates the processing of the subsequent modules. In this step, the collected samples are divided into training samples and test samples, and the training samples are processed first. Each training sample is converted into a vector
x_i^c \in \mathbb{R}^d,
where i indicates that the sample is the i-th of all training samples and c indicates that the sample belongs to the c-th class. Each element of the vector corresponds to one attribute of the sample, and the dimension d of the vector is the number of attributes of the sample. To facilitate subsequent computation, all training samples are stacked into a training matrix X_0, in which each row is one sample; the subscript 0 denotes that X_0 is the initial input.
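For illustration only, a minimal Python sketch of this preprocessing step, assuming pandas and scikit-learn are available; the column names (protocol_type, service, flag, label) are hypothetical placeholders for the discrete features and the class label, not names fixed by the patent text:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def build_feature_matrix(df: pd.DataFrame, discrete_cols, label_col):
    """Convert raw records into a numeric matrix X0 and a label vector y.

    Discrete columns are one-hot encoded; continuous columns are kept as-is.
    """
    X = pd.get_dummies(df.drop(columns=[label_col]), columns=discrete_cols, dtype=float)
    y = df[label_col].to_numpy()
    return X.to_numpy(dtype=float), y

# Hypothetical usage: split the collected samples into equal training and test halves.
# df = pd.read_csv("kddcup_subset.csv")
# X0, y = build_feature_matrix(df, ["protocol_type", "service", "flag"], "label")
# X_train, X_test, y_train, y_test = train_test_split(
#     X0, y, test_size=0.5, stratify=y, random_state=0)
```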
Part 2: Training the classification model
In this module, the training sample matrix X_0 generated by the previous module is fed into the core algorithm of the invention for training. The method mainly comprises the following steps:
1) The base classifiers used in the ensemble model are random forest and naive Bayes. In the random forest, CART trees serve as sub-classifiers; every time a leaf node of a CART tree is split, k features are randomly selected from the d features to take part in the Gini-index evaluation, where k is generally taken as
k = \lfloor \sqrt{d} \rfloor.
The Gini index is computed as
\mathrm{Gini}(D) = 1 - \sum_{y} p_y^2,
\mathrm{Gini\_index}(D, f_i^k) = \sum_{v} \frac{|D^v|}{|D|} \, \mathrm{Gini}(D^v),
where f_i^k denotes the i-th feature of the k-dimensional feature subspace F_k, v denotes a value taken by feature f_i^k, D^v is the subset of samples for which f_i^k takes the value v, and p_y denotes the proportion of class-y samples. The lower the Gini index, the better the splitting quality of the feature. Naive Bayes can be viewed as the simplest Bayesian network classifier; it relies on a conditional independence assumption, and its decision function is
h_{nb}(x) = \arg\max_{y} P(y) \prod_{i=1}^{d} P(x_i \mid y),
where P(y) is the prior probability of class y and P(x_i \mid y) is the conditional probability of feature i given class y. Neither random forest nor naive Bayes can reasonably handle the imbalance problem, because both are optimized on a global basis.
2) Each layer's random forest or naive Bayes base classifiers are trained with a random down-sampling strategy at both the sample level and the feature level: suppose the training set X_F contains N samples in total, of which N_p belong to the minority class and N_n to the majority class. In the double random down-sampling strategy, a majority-class subset of size N'_n = N_p, equal to the minority class, is first drawn at random without replacement from the sample set, while all minority-class samples take part in training; then, for the feature space F, a different feature subspace F' (F' ⊆ F) is selected for training. This not only reduces the effect of the imbalanced sample ratio but also effectively filters out the negative influence of some undesirable features. The sampling is repeated S times at the sample level and E times at the feature level, where S and E are the numbers of sample-level and feature-level integrations respectively, δ is the feature sampling rate (|F'| = |F| × δ), and RUS denotes random under-sampling of the majority class to the size of the minority class.
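A minimal sketch of the double sample/feature subspace sampling described above, assuming binary labels with 0 for the majority class and 1 for the minority class, and reusing the hypothetical make_base_classifiers helper from the previous sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_undersample(X, y, majority=0, minority=1):
    """RUS: keep every minority sample, draw an equal-sized majority subset without replacement."""
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)
    maj_sub = rng.choice(maj_idx, size=min_idx.size, replace=False)
    idx = np.concatenate([min_idx, maj_sub])
    return X[idx], y[idx]

def dual_subspace_ensemble(X, y, S=5, E=5, delta=0.7):
    """Train S x E groups of base classifiers, each on a balanced sample subset and a random feature subspace."""
    d = X.shape[1]
    members = []
    for _ in range(S):                     # sample-level integrations
        Xs, ys = random_undersample(X, y)
        for _ in range(E):                 # feature-level integrations
            feats = rng.choice(d, size=max(1, int(d * delta)), replace=False)
            for clf in make_base_classifiers():
                clf.fit(Xs[:, feats], ys)
                members.append((clf, feats))
    return members
```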
3) According to the output confidence of the previous layer's base classifiers, feature perturbation is applied to that confidence, which is then mixed with the original features as the input of the next layer's model: the base classifiers used by the cascade interpolation ensemble model are random forest and naive Bayes, where the confidence of the random forest (RF) is computed as
V_{RF}(i, y') = \frac{1}{T} \sum_{t=1}^{T} p_t(y' \mid x_i),
which can be intuitively understood as the average, over the T trees, of the proportion of class-y' samples in the leaf nodes reached by sample x_i. The confidence of naive Bayes (NB) is computed as
V_{NB}(i, y') = P(y' \mid x_i),
the posterior probability of class y'. To prevent overfitting, the base classifiers inside each layer generate their confidences through 3-fold cross-validation. The resulting confidence vector V then undergoes the following confidence offset:
V′_l(i, y_majority) = V_l(i, y_majority) × η
V′_l(i, y_minority) = V_l(i, y_minority) / η,
where η is a hyper-parameter whose value generally lies in a neighborhood of 1; in the experiments its range is {0.85, 0.9, 0.95, 1, 1.05, 1.1, 1.15}. Clearly, the confidence offset has no effect when η = 1. From the above equations, the confidence of the majority class is multiplied by η and the confidence of the minority class is divided by η, so the bias between the majority and minority classes is dynamically adjusted layer by layer through the perturbation of the confidence. Finally, the perturbed feature V′_l is mixed with the original features and interpolated as the input of the next-layer model:
X_{l+1} = [X_0, V'_l] \in \mathbb{R}^{m \times (d + N_{class})},
where X_0 is the original feature matrix, l is the current layer number, m is the number of samples, d is the feature dimension, and the dimension of the interpolated confidence is N_class, i.e., the number of classes.
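A minimal sketch of the confidence offset and the feature augmentation, under two assumptions stated explicitly here: the "interpolation" is taken to be the concatenation form reconstructed above, and the confidences of a layer's base classifiers are averaged into one class-probability vector before concatenation:

```python
import numpy as np

def confidence_offset(V, eta, majority_col=0, minority_col=1):
    """Multiply the majority-class confidence by eta, divide the minority-class confidence by eta."""
    V_off = V.copy()
    V_off[:, majority_col] *= eta
    V_off[:, minority_col] /= eta
    return V_off

def augment_features(X0, confidences, eta):
    """Next-layer input: original features concatenated with the offset, averaged confidences."""
    offset = [confidence_offset(V, eta) for V in confidences]
    mean_conf = np.mean(offset, axis=0)    # average over the layer's base classifiers (assumption)
    return np.hstack([X0, mean_conf])      # shape: (m, d + n_classes)
```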
4) A validation mechanism is added to the cascade model so that the number of layers can stop growing adaptively, implemented as follows: the cascade interpolation model has at least 2 layers, and in the experiments the maximum number of layers does not exceed 5; after each layer finishes training, a validation pass is performed together with all preceding layers. Since the earlier training is done through cross-validation, this validation process is more convincing. The average accuracy (M-ACC) is used as the validation criterion:
\text{M-ACC} = \frac{TPR + TNR}{2},
where TPR is the accuracy on the minority class and TNR is the accuracy on the majority class. If the validated M-ACC drops, the number of layers stops growing.
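Putting the training steps together, a rough sketch of the cascade growth with the M-ACC stopping rule, reusing the hypothetical helpers from the previous sketches; scoring each layer on its out-of-fold training confidences is an assumed simplification of the validation pass:

```python
import numpy as np

def m_acc(y_true, y_pred, majority=0, minority=1):
    """Average of minority-class accuracy (TPR) and majority-class accuracy (TNR)."""
    tpr = np.mean(y_pred[y_true == minority] == minority)
    tnr = np.mean(y_pred[y_true == majority] == majority)
    return (tpr + tnr) / 2

def train_cascade(X0, y, eta, min_layers=2, max_layers=5):
    """Grow cascade layers while the validated M-ACC keeps improving."""
    X, layers, best = X0, [], -np.inf
    for l in range(max_layers):
        members = dual_subspace_ensemble(X, y)                           # training step 2
        confs = [cv_confidence(clf, X[:, f], y) for clf, f in members]   # 3-fold out-of-fold confidences
        layers.append(members)
        score = m_acc(y, np.argmax(np.mean(confs, axis=0), axis=1))      # validation (step 4)
        if l + 1 > min_layers and score <= best:
            layers.pop()                                                 # stop growing: M-ACC no longer improves
            break
        best = max(best, score)
        X = augment_features(X0, confs, eta)                             # offset + mix (step 3)
    return layers
```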
Part 3: Testing unknown data
This module first takes the other half of the samples, randomly split off in the first module, as the test samples and forms the test sample matrix; the training set and the test set are required to follow the same probability distribution. The model trained with the optimal hyper-parameter η and cascade depth l is then used in the testing process. It is important to note that no confidence offset is applied during testing: it is precisely the difference between the perturbed training set and the unperturbed test set that makes the model sensitive to different classification thresholds, so that the imbalanced classification problem can be handled better.
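A matching prediction sketch using the same hypothetical helpers; in line with the description above, no confidence offset is applied at test time, and the last layer's output gives the final decision:

```python
import numpy as np

def predict_cascade(layers, X0_test):
    """Run the trained cascade on test data; the confidences are used raw (no eta offset)."""
    X, conf = X0_test, None
    for members in layers:
        confs = [clf.predict_proba(X[:, feats]) for clf, feats in members]
        conf = np.mean(confs, axis=0)
        X = np.hstack([X0_test, conf])     # augment with the raw confidences
    return np.argmax(conf, axis=1)
```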
Design of experiments
1) Selection and introduction of the experimental data sets: KDD is short for Data Mining and Knowledge Discovery, and the KDD CUP is an annual competition organized by SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining) of the ACM (Association for Computing Machinery). The KDD CUP 99 data set is a standard benchmark in the field of network intrusion detection and laid the foundation for network intrusion detection research based on computational intelligence. Different kinds of network attack data show an obvious imbalance in quantity, and this imbalance is a main factor affecting classification performance. The experiments select 5 imbalanced KDD Cup 99 data sets from the KEEL repository: 'land_vs_satan', 'guess_passwd_vs_satan', 'land_vs_portsweep', 'buffer_overflow_vs_back' and 'rootkit-imap_vs_back'. The discrete features in the data are all represented with one-hot encoding.
All data sets are evaluated with 5-fold cross-validation: each data set is shuffled and split into 5 equal parts; in each round 4 parts are used for training and 1 for testing, for 5 rounds in total, so that every sample is used once as test data.
2) Compared models: the system provided by the invention is named CILDC, and the variant built on random forests is denoted CILDC-RF. In addition, random forest (RF), the dual-subspace SVM (ABRS-SVM) and a cost-sensitive SVM (CS-SVM) are chosen for comparison.
3) Parameter selection: the perturbation coefficient η of CILDC ranges over {0.85, 0.9, 0.95, 1, 1.05, 1.1, 1.15}; the numbers of integrations of the two subspaces are both 5; the number of trees in the random forest is 50; the SVM uses an RBF kernel, with the slack coefficient C and kernel radius σ both taken from {0.01, 0.1, 1, 10, 100}; and the feature sampling rate is selected from {0.5, 0.7, 0.9}.
4) Performance measure: the experiments uniformly use the average accuracy of the majority and minority classes (M-ACC) as the evaluation criterion.
Results of the experiment
The M-ACC of every model was measured on each KEEL data set, together with each model's average M-ACC over all data sets.
From the results it can be found that the CILDC-RF of the invention obtains the best results on most data sets and outperforms the other comparison algorithms; the advantage is especially clear on the 'rootkit-imap_vs_back' data set. In addition, the variance of CILDC-RF is lower than that of the other algorithms, which shows that the classification effect of the algorithm on KDD network attack data is more stable.

Claims (4)

1. A network intrusion detection method based on double subspace sampling and confidence offset, characterized in that the method comprises the following specific steps:
1) preprocessing, first step: construct network attack features with a network data collection tool, and convert the collected sample set features into a data matrix suitable for subsequent processing;
2) preprocessing, second step: distinguish the continuous and discrete features in the original data, and one-hot encode all discrete features;
3) training, first step: train each layer's random forest or naive Bayes base classifiers with a random down-sampling strategy at both the sample level and the feature level, as follows: suppose the total number of training samples is N, of which N_p are minority-class samples and N_n are majority-class samples; for the i-th sample-level integration (S integrations are performed in total), the double random down-sampling strategy randomly selects, without replacement, a majority-class subset of size N'_n = N_p equal to the minority class, while all minority-class samples take part in training, yielding the integrated sample set X_i^F in the feature space F after the i-th sample sampling; E feature-sampling integrations are then performed: for the j-th feature-sampling integration, a different feature subspace F' is randomly selected from the feature space F, where F' ⊆ F and |F'| = |F| × δ, and a base classifier h_{i,j}(x) is trained on it; here S and E are the numbers of sample-level and feature-level integrations respectively, δ is the feature sampling rate, X_i^F is the sample set after the i-th sample sampling integration in the feature space F, h_{i,j}(x) is the base classifier obtained after the i-th sample sampling and the j-th feature sampling, and RUS denotes random under-sampling of the majority class to the size of the minority class;
4) training, second step: apply feature perturbation to the output confidence of the previous layer's base classifiers, and mix the result with the original features as the input of the next layer's model;
5) training, third step: add a validation mechanism to the cascade model so that the number of layers can stop growing adaptively;
6) testing: input the test data set into the obtained cascade model to finally obtain the detection and classification result of the network intrusion.
2. The network intrusion detection method based on double subspace sampling and confidence offset of claim 1, characterized in that: in the second training step, feature perturbation is applied according to the output confidence of the previous layer's base classifiers, and the result is mixed with the original features as the input of the next layer's model, as follows: the base classifiers used by the cascade interpolation ensemble model are random forest and naive Bayes, where the confidence of the random forest (RF) is computed as
V_{RF}(i, y') = \frac{1}{T} \sum_{t=1}^{T} p_t(y' \mid x_i),
which can be intuitively understood as the average, over the T trees, of the proportion of class-y' samples in the leaf node reached by sample i; the confidence of naive Bayes (NB) is computed as
V_{NB}(i, y') = P(y' \mid x_i),
the posterior probability of class y'; to prevent overfitting, the base classifiers inside each layer generate their confidences through 3-fold cross-validation, and the resulting confidence vector V undergoes the following confidence offset
V′_l(i, y_majority) = V_l(i, y_majority) × η
V′_l(i, y_minority) = V_l(i, y_minority) / η,
where l is the current layer number, V_l(i, y_majority) is the probability that sample i in layer l belongs to the majority class, V_l(i, y_minority) is the probability that sample i in layer l belongs to the minority class, and η is a hyper-parameter whose value generally lies in a neighborhood of 1; the confidence of the majority class is multiplied by η and the confidence of the minority class is divided by η, so the bias between the majority and minority classes is dynamically adjusted layer by layer through the perturbation of the confidence; finally, the perturbed feature V′_l is mixed with the original features and interpolated as the input of the next-layer model
X_{l+1} = [X_0, V'_l] \in \mathbb{R}^{m \times (d + N_{class})},
where X_0 is the original feature matrix, ℝ is the set of real numbers, l is the current layer number, m is the number of samples, d is the feature dimension, and the dimension of the interpolated confidence is N_class, i.e., the number of classes.
3. The network intrusion detection method based on double subspace sampling and confidence offset of claim 1, characterized in that: in the third training step, a validation mechanism is added to the cascade model so that the number of layers can stop growing adaptively, implemented as follows: the cascade interpolation model has at least 2 layers, and in the experiments the maximum number of layers does not exceed 5; after each layer finishes training, a validation pass is performed together with all preceding layers, and the average accuracy (M-ACC) is used as the validation criterion
\text{M-ACC} = \frac{TPR + TNR}{2},
where TPR is the accuracy on the minority class and TNR is the accuracy on the majority class; if the validated M-ACC drops, the number of layers stops growing.
4. The network intrusion detection method based on double subspace sampling and confidence offset of claim 1, characterized in that: in the testing stage, the test data set is input into the obtained cascade model, and no perturbation of the features is needed during the layer-by-layer interpolation; specifically, the training set and the test set must follow the same probability distribution, and the model trained with the optimal hyper-parameter η and cascade depth l is used in the testing process.
CN201910490598.XA 2019-06-05 2019-06-05 Network intrusion detection method based on double subspace sampling and confidence offset Active CN110177112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910490598.XA CN110177112B (en) 2019-06-05 2019-06-05 Network intrusion detection method based on double subspace sampling and confidence offset


Publications (2)

Publication Number Publication Date
CN110177112A CN110177112A (en) 2019-08-27
CN110177112B true CN110177112B (en) 2021-11-30

Family

ID=67697332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910490598.XA Active CN110177112B (en) 2019-06-05 2019-06-05 Network intrusion detection method based on double subspace sampling and confidence offset

Country Status (1)

Country Link
CN (1) CN110177112B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016597B (en) * 2020-08-12 2023-07-18 河海大学常州校区 Depth sampling method based on Bayesian unbalance measurement in machine learning
CN116226629B (en) * 2022-11-01 2024-03-22 内蒙古卫数数据科技有限公司 Multi-model feature selection method and system based on feature contribution
CN117240602B (en) * 2023-11-09 2024-01-19 北京中海通科技有限公司 Identity authentication platform safety protection method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108023876A (en) * 2017-11-20 2018-05-11 西安电子科技大学 Intrusion detection method and intruding detection system based on sustainability integrated study
CN108304884A (en) * 2018-02-23 2018-07-20 华东理工大学 A kind of cost-sensitive stacking integrated study frame of feature based inverse mapping
CN109347872A (en) * 2018-11-29 2019-02-15 电子科技大学 A kind of network inbreak detection method based on fuzziness and integrated study
CN109460872A (en) * 2018-11-14 2019-03-12 重庆邮电大学 One kind being lost unbalanced data prediction technique towards mobile communication subscriber
EP3336739B1 (en) * 2016-12-18 2020-02-26 Deutsche Telekom AG A method for classifying attack sources in cyber-attack sensor systems


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on classification algorithms for imbalanced anomalous data based on active learning; Wang Bo, Wang Huaibin; Netinfo Security (《信息网络安全》); 2017-10-30; p. 46 *

Also Published As

Publication number Publication date
CN110177112A (en) 2019-08-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant