CN114254691A - Multi-channel operation wind control method based on active identification and intelligent monitoring - Google Patents

Multi-channel operation wind control method based on active identification and intelligent monitoring

Info

Publication number
CN114254691A
Authority
CN
China
Prior art keywords
data
wind control
control method
method based
channel operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111303151.0A
Other languages
Chinese (zh)
Inventor
曹世龙
蔡颖凯
王一哲
付瀚臣
刘鑫
穆蓉
许晶晶
韩昕檀
赵千乔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Marketing Service Center Of State Grid Liaoning Electric Power Co ltd
State Grid Corp of China SGCC
Original Assignee
Marketing Service Center Of State Grid Liaoning Electric Power Co ltd
State Grid Corp of China SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Marketing Service Center Of State Grid Liaoning Electric Power Co ltd, State Grid Corp of China SGCC
Priority to CN202111303151.0A
Publication of CN114254691A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a multi-channel operation risk control method based on active identification and intelligent monitoring, in which a computer constructs two modules: an under-sampling module and an intelligent analysis module. The under-sampling module samples and checks the input data and judges whether a data anomaly exists. When an anomaly is found, the data are passed to the intelligent analysis module for verification, and the intelligent analysis module rechecks the input data. The method establishes a balanced and efficient division of work between manual review and automated processing. By monitoring both network traffic and its main flow characteristics, the method performs multiple rounds of search without greatly increasing hardware resources or consumption, and then flags suspected abnormal behaviors and risks, achieving a better risk control effect.

Description

Multi-channel operation risk control method based on active identification and intelligent monitoring
Technical Field
The invention relates to the field of network security, and in particular to a multi-channel operation risk control method based on active identification and intelligent monitoring. The method targets falsified network data, including but not limited to malicious traffic promotion, black-market gang activity, campaign cheating, and malicious ranking manipulation.
Background
With the rapid development of the network, malicious modification of, attacks on, and tampering with browsed information by internet users have become increasingly diverse. Typical examples include malicious traffic promotion, black-market gang activity, campaign cheating, and malicious ranking manipulation.
The perpetrators hide behind the network and exploit its vulnerabilities through cheating, rule-breaking, or even illegal means: they inflate data within a short time, steal users' network information, install malware, or send false shopping, prize-winning, and recruitment messages.
As network equipment grows more complex, bandwidth increases, and network tools multiply, the risks that network risk control must handle grow by the day. Because these risks are concealed, deceptive, and technically sophisticated, risk management is difficult, and manual, automated, and combined prevention and control all face great pressure.
Network security and risk control are nevertheless an unavoidable practical problem. Maintaining the security order of the internet, improving the quality of network information, and detecting false information among massive network resources are of great practical significance.
Disclosure of Invention
To address these problems, the invention provides a multi-channel operation risk control method based on active identification and intelligent monitoring.
The method establishes a balanced and efficient division of work between manual review and automated processing.
By monitoring both network traffic and its main flow characteristics, the method performs multiple rounds of search without greatly increasing hardware resources or consumption, and then flags suspected abnormal behaviors and risks, achieving a better risk control effect.
The invention specifically comprises the following steps:
In the multi-channel operation risk control method based on active identification and intelligent monitoring, a computer constructs two modules: an under-sampling module and an intelligent analysis module. The under-sampling module samples and checks the input data and judges whether a data anomaly exists. When an anomaly is found, the data are passed to the intelligent analysis module for verification. The intelligent analysis module rechecks the input data.
The under-sampling module makes its decision based on the distances to the specified nearest-neighbor sample points, while also addressing the degraded results caused by imbalanced samples and reducing the influence of noise points. The under-sampling module specifically comprises: a network traffic data acquisition and preprocessing submodule, a supervised-learning classification model building submodule, a KNN-based semi-supervised learning label correction submodule, and a model updating submodule.
After the intelligent analysis module is activated by the under-sampling module, it verifies the data according to the following steps:
Step 1: Data set balancing: let the data set contain k1 samples of category C1 and k2 samples of category C2. Each sample in the data set is a d-dimensional vector. The k-means algorithm is applied to balance the data set, yielding k clusters.
The closer a sample is to the cluster center, the better it represents the characteristics of that cluster. Assume that cluster Ci contains ni samples, with i = 1, 2, …, k.
From each cluster Ci, the ni × k2/k1 samples nearest to the cluster center wi are selected in proportion. This finally yields N samples of category C1, so that the number of C1 samples is balanced against the number of C2 samples.
Step 2: Feature selection: first, a feature subset is generated from the original feature set; next, the subset is evaluated with an evaluation function; finally, the evaluation result is compared with a stopping criterion. If the result meets the stopping criterion, the feature subset is output and verified; otherwise, the next feature subset is generated and evaluated. The evaluation formula used in this step is:
Figure BDA0003339113050000021
where A is the feature, C is the category, H(C) is the information entropy of the whole classification system, n is the number of categories in the classification system, P(Ci) is the proportion of samples whose category is Ci, m is the number of values of feature A, and P(Ci | A = Aj) is the probability of belonging to category Ci given that feature A takes the value Aj.
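The evaluation formula itself survives only as an image reference in this text. Given the quantities defined for it (H(C), P(Ci), and P(Ci | A = Aj)), one form consistent with those definitions is the standard information gain of feature A. The following is a sketch under that assumption; the marginal probability P(A = A_j) is not defined in the text and is introduced here for illustration.

```latex
\begin{aligned}
IG(A) &= H(C) - H(C \mid A) \\
      &= -\sum_{i=1}^{n} P(C_i)\,\log_2 P(C_i)
         + \sum_{j=1}^{m} P(A = A_j) \sum_{i=1}^{n} P(C_i \mid A = A_j)\,\log_2 P(C_i \mid A = A_j)
\end{aligned}
```

Under this reading, a feature whose values separate the categories well has a low conditional entropy and therefore a high gain.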
Step 3: Discretization: given a set S, a feature A, and a breakpoint T, the breakpoint T divides the value set S of feature A into two sets S1 and S2, where A ≤ T in S1 and A > T in S2. The weighted information entropy of feature A, Entropy(A, T; S), is used to calculate the information entropy of the set S, with the following formula:
Figure BDA0003339113050000031
This gives the information entropy of the set S as divided by the breakpoint T.
Discretization of feature A is then repeated recursively within the subsets S1 and S2, giving:
IG(A,T;S)=Entropy(S)-Entropy(A,T;S)
Figure BDA0003339113050000032
Figure BDA0003339113050000033
The recursion continues until the information gain IG(A, T; S) is less than the threshold δ. In the formulas above, N is the number of samples in the set S, k is the number of categories contained in S, and ki is the number of categories contained in the subset Si.
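The three formula images in this step are not reproduced in the text. The symbols defined for them (N, k, ki, and the threshold δ) match the minimum-description-length stopping rule commonly used for entropy-based discretization, so one plausible reconstruction is given below. That the images show exactly these expressions is an assumption; the symbol Δ(A, T; S) is introduced here and does not appear in the text.

```latex
\begin{aligned}
\mathrm{Entropy}(A,T;S) &= \frac{|S_1|}{|S|}\,\mathrm{Entropy}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Entropy}(S_2) \\
IG(A,T;S) &= \mathrm{Entropy}(S) - \mathrm{Entropy}(A,T;S) \\
\delta &= \frac{\log_2 (N-1)}{N} + \frac{\Delta(A,T;S)}{N} \\
\Delta(A,T;S) &= \log_2\!\left(3^{k} - 2\right) - \bigl[\,k\,\mathrm{Entropy}(S) - k_1\,\mathrm{Entropy}(S_1) - k_2\,\mathrm{Entropy}(S_2)\,\bigr]
\end{aligned}
```

Under this reading, a breakpoint is accepted only while its gain is at least δ, which keeps the recursion from splitting on noise.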
Step 4: Ant colony optimization is used to judge automatically whether the behavior causing the data anomaly is cheating. Cheating behaviors include malicious traffic promotion, black-market gang activity, and campaign cheating.
While the ant algorithm forms a classification rule, an ant adds condition items to the rule antecedent; which condition item is added is determined by the probability selection function Pij.
Figure BDA0003339113050000034
where ηij is the heuristic function value of the condition item termij. The larger ηij is, the more termij contributes to the classification system, and the more likely it is to be selected and added to the classification rule antecedent. τij(t) is the pheromone of condition item termij during the t-th iteration. a is the number of features. If feature Ai has not yet been used by the current ant, xi is set to 1; otherwise, xi is set to 0. bi is the number of values of the i-th feature. termij denotes the condition item Ai = Vij, where Ai is the i-th feature and Vij is the j-th value of the i-th feature. The probability selection function Pij is thus the probability that termij is selected by an ant and added to the classification rule antecedent.
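The selection formula is present only as an image reference. The symbols defined for it (the heuristic value, the pheromone, a, xi, and bi) match the usual ant-colony rule-construction probability, so a form consistent with them is sketched below. The summation indices r and s are introduced here only to avoid clashing with i and j; the exact layout of the original image is an assumption.

```latex
P_{ij} = \frac{\eta_{ij}\,\tau_{ij}(t)}
              {\displaystyle\sum_{r=1}^{a} x_r \sum_{s=1}^{b_r} \eta_{rs}\,\tau_{rs}(t)}
```

Because x_r is 0 for features the ant has already used, only condition items on unused features contribute to the normalizing sum.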
The proportion of a condition item's occurrences that are associated with cheating websites in the training set serves as heuristic information, guiding the ants toward the best combination of condition items for constructing a classification rule. The heuristic function ηij of condition item termij is:
Figure BDA0003339113050000041
where |Tij| is the frequency with which condition item termij occurs in the entire training set, and |spam_class Tij| is the frequency with which termij occurs among the cheating websites in the training set. ηij is directly proportional to the latter, so ants preferentially select condition items more strongly associated with cheating websites to add to the classification rule antecedent. As the number of iterations increases and the pheromone is updated, the ants gradually discover the classification detection rules associated with normal websites.
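The heuristic formula is also available only as an image reference. Read from the description as "the proportion of the condition item's occurrences that fall in the cheating class", it would take the following form; this is a reconstruction from the surrounding definitions rather than a copy of the image.

```latex
\eta_{ij} = \frac{\lvert \mathrm{spam\_class}\; T_{ij} \rvert}{\lvert T_{ij} \rvert}
```

As a hypothetical illustration, a condition item that occurs 40 times in the training set, 30 of them on cheating websites, would have a heuristic value of 30/40 = 0.75.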
Step 5: The result of step 4 is output and then judged comprehensively by a human reviewer to determine whether the behavior causing the data anomaly is a violation.
Advantageous technical effects
The method is efficient. It combines a KNN semi-supervised learning algorithm with an ant colony learning algorithm: fluctuations and anomalies in the data are monitored using a small amount of data, and the ant colony optimization algorithm then verifies them and locates where anomalies may occur. The method suits link-based detection techniques and also takes content-based detection techniques into account, so it applies to a variety of network risk scenarios.
The invention considers the efficiency of artificial intelligence while also valuing human experience. With this method, the accuracy and timeliness of the artificial intelligence can be improved step by step through software training.
The method can overcome the learning bias that artificial intelligence suffers when sample points are insufficient, and later training can further improve the generality of the equipment.
Drawings
FIG. 1 is a block flow diagram of the present invention.
FIG. 2 is a flow chart of the under-sampling module of FIG. 1.
FIG. 3 is a flow chart of the intelligent analysis module of FIG. 1.
Detailed Description
Technical features of the present invention will now be described in detail with reference to the accompanying drawings.
Referring to FIG. 1, in the multi-channel operation risk control method based on active identification and intelligent monitoring, a computer constructs two modules: an under-sampling module and an intelligent analysis module. The under-sampling module samples and checks the input data and judges whether a data anomaly exists. When an anomaly is found, the data are passed to the intelligent analysis module for verification. The intelligent analysis module rechecks the input data.
Referring to FIG. 2, the under-sampling module makes its decision based on the distances to the specified nearest-neighbor sample points, while also addressing the degraded results caused by imbalanced samples and reducing the influence of noise points.
The under-sampling module specifically comprises: a network traffic data acquisition and preprocessing submodule, a supervised-learning classification model building submodule, a KNN-based semi-supervised learning label correction submodule, and a model updating submodule.
Further, the network traffic data acquisition and preprocessing submodule acquires the network traffic data. The main flow characteristics of the network traffic data should include the source IP address, the protocol type, the number of bytes, and so on.
The submodule also normalizes the network traffic data:
x_scale = (x - x_min) / (x_max - x_min)
where x is the sample attribute value, x_min is the minimum value of the attribute, x_max is the maximum value of the attribute, and x_scale is the normalized value. The normalized data are referred to as "label data".
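The definitions of x_min, x_max, and x_scale describe min-max scaling into the range [0, 1]. A minimal sketch follows; the function name, the sample byte counts, and the handling of a constant attribute are assumptions made for illustration.

```python
def min_max_scale(values):
    """Scale a list of numeric attribute values into [0, 1]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant attribute carries no information, so map it to 0.0.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical byte counts for three traffic records.
byte_counts = [200, 1500, 850]
scaled = min_max_scale(byte_counts)
print(scaled)
```

Scaling each attribute separately keeps features with large raw ranges, such as byte counts, from dominating the distance calculations used later by KNN.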
Further, in the supervised-learning classification model building submodule, the label data are selected as training data, and a suitable classification model is then chosen for training. During model training the data set is partitioned, with part of the data randomly held out for validation, and cross-validation is used during model selection. The model obtained in this way, the one that performs best on the network traffic data, is the "initial classification model". The data set as classified by the initial classification model is the "initial classification data set". These steps improve the accuracy of the classification model.
Further, the KNN-based semi-supervised learning label correction submodule checks, at low cost and without increasing the amount of human involvement, whether the detected abnormal traffic data has reached a specified threshold. If the threshold has not been reached, monitoring continues; otherwise, the data label correction operation is executed, as follows:
1) A proportion of the "label data" is drawn for manual labelling and then placed back into the data set.
2) First, data are extracted from the data set, the importance of each traffic feature is incorporated into the KNN decision, and the weighted Euclidean distance between samples is computed. Next, the data set is classified using the self-training method. Finally, ten-fold cross-validation is used to select the optimal number of neighbors K. This yields the "corrected classification data", and the collection of these data is the "corrected classification data set".
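The label correction steps above can be sketched as a weighted-distance KNN vote wrapped in a simple self-training loop. This is an illustration under stated assumptions, not the patented implementation: the feature weights, the confidence cut-off `min_share`, and the sample data are hypothetical, and the ten-fold cross-validation used to choose K is omitted (K is fixed at 3 here).

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; weights encode feature importance."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_vote(sample, labelled, weights, k):
    """Return the majority label among the k nearest labelled samples and its vote share."""
    nearest = sorted(labelled, key=lambda item: weighted_distance(sample, item[0], weights))[:k]
    label, count = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label, count / len(nearest)


def self_train(labelled, unlabelled, weights, k, min_share=0.8):
    """Repeatedly add confidently predicted samples to the labelled pool."""
    pool, remaining = list(labelled), list(unlabelled)
    while remaining:
        scored = [(vec,) + knn_vote(vec, pool, weights, k) for vec in remaining]
        confident = [(vec, lbl) for vec, lbl, share in scored if share >= min_share]
        if not confident:
            break  # nothing left that the current pool can label confidently
        pool.extend(confident)
        remaining = [vec for vec, _, share in scored if share < min_share]
    return pool, remaining


# Hypothetical normalized traffic features: (byte count, request rate).
labelled = [
    ((0.0, 0.0), "normal"), ((0.1, 0.2), "normal"), ((0.2, 0.1), "normal"),
    ((0.9, 1.0), "abnormal"), ((1.0, 0.9), "abnormal"), ((0.8, 0.9), "abnormal"),
]
unlabelled = [(0.05, 0.1), (0.95, 0.95)]
pool, leftover = self_train(labelled, unlabelled, weights=(0.7, 0.3), k=3)
print(pool[-2:], leftover)
```

Only predictions whose vote share reaches the cut-off are added to the pool, which limits how far an early labelling mistake can propagate.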
Further, the model updating submodule operates according to the following steps:
1) The "initial classification data set" and the "corrected classification data set" are compared. Entries whose classification labels differ are identified, verified manually, and entered into the initial classification data set; the result is called the "manually verified and updated classification data set".
2) A model is trained on the manually verified and updated classification data set, yielding the "manually verified and updated classification model".
3) The classification accuracy of the "initial classification model" and the "manually verified and updated classification model" is compared:
If the accuracy of the former is higher than that of the latter, the proportion of manually labelled data is increased.
Otherwise, the "initial classification model" is discarded and the "manually verified and updated classification model" is adopted.
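The comparison in step 3) amounts to a small decision rule, sketched below. The function name, the step size by which the manual labelling proportion grows, and the choice to keep the initial model in service while more data are labelled are all assumptions; the text states only that the proportion is increased.

```python
def update_decision(acc_initial, acc_updated, manual_fraction, step=0.05):
    """Return which model to use and the manual labelling proportion for the next round."""
    if acc_initial > acc_updated:
        # Assumption: keep the initial model for now and label more data manually.
        return "initial", min(1.0, manual_fraction + step)
    # Otherwise the manually verified and updated model replaces the initial one.
    return "updated", manual_fraction


model, fraction = update_decision(acc_initial=0.92, acc_updated=0.90, manual_fraction=0.10)
print(model, round(fraction, 2))
```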
Referring to FIG. 3, after the intelligent analysis module is activated by the under-sampling module, it verifies the data according to the following steps:
Step 1: Data set balancing: let the data set contain k1 samples of category C1 and k2 samples of category C2. Each sample in the data set is a d-dimensional vector. The k-means algorithm is applied to balance the data set, yielding k clusters.
The closer a sample is to the cluster center, the better it represents the characteristics of that cluster. Assume that cluster Ci contains ni samples, with i = 1, 2, …, k.
From each cluster Ci, the ni × k2/k1 samples nearest to the cluster center wi are selected in proportion. This finally yields N samples of category C1, so that the number of C1 samples is balanced against the number of C2 samples.
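The proportional selection in step 1 can be sketched as follows, assuming that k-means has already been run on the C1 samples and that the clusters and their centers are available. The function name, the rounding of each quota, and the sample data are assumptions made for illustration.

```python
import math


def undersample_by_cluster(clusters, centers, k1, k2):
    """Keep, from each cluster, the round(ni * k2 / k1) samples closest to its center."""
    selected = []
    for members, center in zip(clusters, centers):
        quota = round(len(members) * k2 / k1)  # assumption: quotas are rounded to integers
        ranked = sorted(members, key=lambda v: math.dist(v, center))
        selected.extend(ranked[:quota])
    return selected


# Hypothetical C1 samples already grouped into two clusters; k1 = 6, k2 = 3.
clusters = [[(0, 0), (0, 1), (1, 0), (3, 3)], [(10, 10), (10, 11)]]
centers = [(1.0, 1.0), (10.0, 10.5)]
kept = undersample_by_cluster(clusters, centers, k1=6, k2=3)
print(kept)
```

Because the cluster sizes ni sum to k1, the quotas sum to roughly k2, which is what brings the two categories into balance. Samples far from every center, such as (3, 3) here, are the first to be dropped.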
Step 2: Feature selection: first, a feature subset is generated from the original feature set; next, the subset is evaluated with an evaluation function; finally, the evaluation result is compared with a stopping criterion. If the result meets the stopping criterion, the feature subset is output and verified; otherwise, the next feature subset is generated and evaluated. The evaluation formula used in this step is:
Figure BDA0003339113050000071
where A is the feature, C is the category, H(C) is the information entropy of the whole classification system, n is the number of categories in the classification system, P(Ci) is the proportion of samples whose category is Ci, m is the number of values of feature A, and P(Ci | A = Aj) is the probability of belonging to category Ci given that feature A takes the value Aj.
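The quantities defined for the evaluation function (class entropy and class probabilities conditioned on each feature value) are those of information gain. The sketch below computes it for one discrete feature; treating the evaluation function as information gain is an assumption, and the protocol and label values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(C) minus the entropy of C conditioned on the feature's value."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [lbl for v, lbl in zip(feature_values, labels) if v == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


labels = ["normal", "normal", "cheat", "cheat"]
gain_informative = information_gain(["tcp", "tcp", "udp", "udp"], labels)
gain_uninformative = information_gain(["tcp", "udp", "tcp", "udp"], labels)
print(gain_informative, gain_uninformative)
```

A feature that splits the categories cleanly scores the full class entropy, while one that is independent of the category scores zero, so ranking by this value favors discriminative features.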
Step 3: Discretization: given a set S, a feature A, and a breakpoint T, the breakpoint T divides the value set S of feature A into two sets S1 and S2, where A ≤ T in S1 and A > T in S2. The weighted information entropy of feature A, Entropy(A, T; S), is used to calculate the information entropy of the set S, with the following formula:
Figure BDA0003339113050000072
This gives the information entropy of the set S as divided by the breakpoint T.
Discretization of feature A is then repeated recursively within the subsets S1 and S2, giving:
IG(A,T;S)=Entropy(S)-Entropy(A,T;S)
Figure BDA0003339113050000073
Figure BDA0003339113050000074
The recursion continues until the information gain IG(A, T; S) is less than the threshold δ. In the formulas above, N is the number of samples in the set S, k is the number of categories contained in S, and ki is the number of categories contained in the subset Si.
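A breakpoint evaluation for step 3 can be sketched as below. The weighted split entropy and the gain follow the text. The threshold uses the minimum-description-length rule common in entropy-based discretization, which is an assumption here because the corresponding formula images are not reproduced. Function names and sample values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2); an empty list has entropy 0."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def evaluate_breakpoint(values, labels, threshold):
    """Return (information gain, stopping threshold) for splitting at `threshold`."""
    left = [lbl for v, lbl in zip(values, labels) if v <= threshold]   # S1: A <= T
    right = [lbl for v, lbl in zip(values, labels) if v > threshold]   # S2: A > T
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    gain = entropy(labels) - weighted
    # Assumed MDL-style threshold built from N, k, k1 and k2.
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta_term = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    stop_threshold = (math.log2(n - 1) + delta_term) / n
    return gain, stop_threshold


values = [1, 2, 3, 10, 11, 12]
labels = ["normal", "normal", "normal", "cheat", "cheat", "cheat"]
gain, stop_threshold = evaluate_breakpoint(values, labels, threshold=3)
print(gain, stop_threshold)
```

In this example the split at T = 3 separates the two categories completely, so the gain exceeds the threshold and the breakpoint would be kept; recursion into each half would then stop because neither half can be split further with any gain.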
Step 4: Ant colony optimization is used to judge automatically whether the behavior causing the data anomaly is cheating. Cheating behaviors include malicious traffic promotion, black-market gang activity, and campaign cheating.
While the ant algorithm forms a classification rule, an ant adds condition items to the rule antecedent; which condition item is added is determined by the probability selection function Pij.
Figure BDA0003339113050000075
where ηij is the heuristic function value of the condition item termij. The larger ηij is, the more termij contributes to the classification system, and the more likely it is to be selected and added to the classification rule antecedent. τij(t) is the pheromone of condition item termij during the t-th iteration. a is the number of features. If feature Ai has not yet been used by the current ant, xi is set to 1; otherwise, xi is set to 0. bi is the number of values of the i-th feature. termij denotes the condition item Ai = Vij, where Ai is the i-th feature and Vij is the j-th value of the i-th feature. The probability selection function Pij is thus the probability that termij is selected by an ant and added to the classification rule antecedent.
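The selection probability described above can be sketched as the product of heuristic value and pheromone, normalized over all condition items whose feature the ant has not yet used. The exact formula is an image that is not reproduced here, so this normalization is an assumption based on the symbol definitions; the dictionaries and values are hypothetical.

```python
def selection_probabilities(eta, tau, used_features):
    """Return Pij for every condition item whose feature is still unused.

    eta, tau: dicts mapping (i, j) to the heuristic value and the pheromone.
    used_features: set of feature indices i already in the rule (xi = 0).
    """
    candidates = [term for term in eta if term[0] not in used_features]
    denominator = sum(eta[term] * tau[term] for term in candidates)
    if denominator == 0:
        return {}  # no usable condition item remains
    return {term: eta[term] * tau[term] / denominator for term in candidates}


# Two features with two values each; feature 1 is already in the rule.
eta = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.5, (1, 1): 0.5}
tau = {term: 0.25 for term in eta}
probabilities = selection_probabilities(eta, tau, used_features={1})
print(probabilities)
```

An ant would then draw one condition item at random according to these probabilities, for example with `random.choices`, and repeat until the rule antecedent is complete.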
The proportion of a condition item's occurrences that are associated with cheating websites in the training set serves as heuristic information, guiding the ants toward the best combination of condition items for constructing a classification rule. The heuristic function ηij of condition item termij is:
Figure BDA0003339113050000081
where |Tij| is the frequency with which condition item termij occurs in the entire training set, and |spam_class Tij| is the frequency with which termij occurs among the cheating websites in the training set. ηij is directly proportional to the latter, so ants preferentially select condition items more strongly associated with cheating websites to add to the classification rule antecedent. As the number of iterations increases and the pheromone is updated, the ants gradually discover the classification detection rules associated with normal websites.
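The heuristic value can be sketched as a simple ratio of counts: occurrences of the condition item among cheating records divided by its occurrences in the whole training set. The record layout (a feature dictionary paired with a label), the label name "spam", and the sample data are assumptions made for illustration.

```python
def heuristic_value(training_set, feature, value, spam_label="spam"):
    """Share of records matching `feature == value` that belong to the cheating class."""
    labels = [label for features, label in training_set if features.get(feature) == value]
    if not labels:
        return 0.0  # assumption: a condition item that never occurs gets no preference
    return labels.count(spam_label) / len(labels)


# Hypothetical training records: (discretized features, class label).
training_set = [
    ({"protocol": "udp", "bytes": "high"}, "spam"),
    ({"protocol": "udp", "bytes": "low"}, "spam"),
    ({"protocol": "udp", "bytes": "low"}, "normal"),
    ({"protocol": "tcp", "bytes": "low"}, "normal"),
]
eta_udp = heuristic_value(training_set, "protocol", "udp")
eta_tcp = heuristic_value(training_set, "protocol", "tcp")
print(eta_udp, eta_tcp)
```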
Step 5: The result of step 4 is output and then judged comprehensively by a human reviewer to determine whether the behavior causing the data anomaly is a violation.

Claims (10)

1. A multi-channel operation risk control method based on active identification and intelligent monitoring, characterized in that: a computer constructs an under-sampling module and an intelligent analysis module; the under-sampling module samples and checks input data and judges whether a data anomaly exists; when a data anomaly is found, the data are passed to the intelligent analysis module for verification; and the intelligent analysis module rechecks the input data.
2. The multi-channel operation risk control method based on active identification and intelligent monitoring as claimed in claim 1, wherein: the under-sampling module makes its decision based on the distances to the specified nearest-neighbor sample points, while also addressing the degraded results caused by imbalanced samples and reducing the influence of noise points.
3. The multi-channel operation risk control method based on active identification and intelligent monitoring as claimed in claim 2, wherein: the network traffic data acquisition and preprocessing submodule acquires network traffic data; the main flow characteristics of the network traffic data should include the source IP address, the protocol type, the number of bytes, and other characteristics;
the submodule also normalizes the network traffic data, with x_scale denoting the normalized data; the normalized data are referred to as "label data".
4. The multi-channel operation risk control method based on active identification and intelligent monitoring as claimed in claim 2, wherein: in the supervised-learning classification model building submodule, the label data are selected as training data, and a suitable classification model is then chosen for training; during model training the data set is partitioned, with part of the data randomly held out for validation, and cross-validation is used during model selection to obtain the initial classification model.
5. The multi-channel operation risk control method based on active identification and intelligent monitoring as claimed in claim 2, wherein: the KNN-based semi-supervised learning label correction submodule checks, at low cost and without increasing the amount of human involvement, whether the detected abnormal traffic data has reached a specified threshold: if the threshold has not been reached, monitoring continues; otherwise, the data label correction operation is executed.
6. The multi-channel operation risk control method based on active identification and intelligent monitoring as claimed in claim 1, wherein: after the intelligent analysis module is activated by the under-sampling module, the following steps are carried out in sequence: data set balancing; feature selection; discretization; intelligent identification by ant colony optimization; and output of the result.
7. The multi-channel operation risk control method based on active identification and intelligent monitoring as claimed in claim 6, wherein: step 1 specifically comprises: data set balancing: let the data set contain k1 samples of category C1 and k2 samples of category C2; each sample in the data set is a d-dimensional vector, and the k-means algorithm is applied to balance the data set, yielding k clusters;
the closer a sample is to the cluster center, the better it represents the characteristics of that cluster; assume that cluster Ci contains ni samples, with i = 1, 2, …, k;
from each cluster Ci, the ni × k2/k1 samples nearest to the cluster center wi are selected in proportion; this finally yields N samples of category C1, so that the number of C1 samples is balanced against the number of C2 samples.
8. The multi-channel operation risk control method based on active identification and intelligent monitoring as claimed in claim 6, wherein: the feature selection of step 2 specifically comprises: first, a feature subset is generated from the original feature set; next, the subset is evaluated with an evaluation function; finally, the evaluation result is compared with a stopping criterion; if the result meets the stopping criterion, the feature subset is output and verified; otherwise, the next feature subset is generated and evaluated; the evaluation formula used in this step is:
Figure FDA0003339113040000021
where A is the feature, C is the category, H(C) is the information entropy of the whole classification system, n is the number of categories in the classification system, P(Ci) is the proportion of samples whose category is Ci, m is the number of values of feature A, and P(Ci | A = Aj) is the probability of belonging to category Ci given that feature A takes the value Aj.
9. The multi-channel operation wind control method based on active identification and intelligent monitoring as claimed in claim 6, wherein the specific steps of step 3 are as follows: given a set S, a feature A and a breakpoint T, the breakpoint T divides the set S of values of feature A into two subsets S1 and S2, wherein A ≤ T in S1 and A > T in S2; the weighted information entropy of feature A, Entropy(A, T; S), is used to calculate the information entropy of the set S, with the following formula:
Figure FDA0003339113040000022
that is, the information entropy of the set S as divided by the breakpoint T is obtained;
discretization is then carried out recursively on feature A within the subsets S1 and S2, to obtain:
IG(A,T;S)=Entropy(S)-Entropy(A,T;S)
Figure FDA0003339113040000031
Figure FDA0003339113040000032
the discretization stops when the value of the information gain IG(A, T; S) is smaller than the threshold δ; in the above formulas, N represents the number of samples in the set S, k represents the number of categories contained in the set S, and ki represents the number of categories contained in the subset Si.
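The recursive discretization of claim 9 can be sketched as follows. The formulas in the figures are not reproduced here, so this sketch makes two assumptions: the weighted entropy is the usual size-weighted sum of the entropies of S1 and S2, and the threshold δ is passed in as a fixed number `delta` rather than computed from N, k and ki. Candidate breakpoints are taken as midpoints between adjacent distinct values, which is also an assumption.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(pairs):
    """pairs: (value, label) tuples sorted by value.

    Returns (gain, T, index) for the breakpoint with the largest
    IG(A, T; S) = Entropy(S) - Entropy(A, T; S), or None if no split exists.
    """
    n = len(pairs)
    base = entropy([c for _, c in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a breakpoint must separate two distinct values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [c for _, c in pairs[:i]]    # S1: A <= T
        right = [c for _, c in pairs[i:]]   # S2: A > T
        weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = base - weighted
        if best is None or gain > best[0]:
            best = (gain, t, i)
    return best

def discretize(values, labels, delta=0.1):
    """Return the sorted breakpoints found by recursive binary splitting."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []

    def recurse(segment):
        found = best_split(segment)
        if found is None or found[0] < delta:
            return  # stop: information gain is below the threshold
        _, t, i = found
        cuts.append(t)
        recurse(segment[:i])
        recurse(segment[i:])

    recurse(pairs)
    return sorted(cuts)
```

For values 1, 2, 3, 10, 11, 12 with labels 0, 0, 0, 1, 1, 1, the only breakpoint kept is 6.5: both resulting subsets are pure, so any further split has zero gain and the recursion ends.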
10. The multi-channel operation wind control method based on active identification and intelligent monitoring as claimed in claim 6, wherein the specific steps of step 4 are as follows: ant colony optimization is adopted to intelligently judge whether the behavior causing the data abnormality is a cheating behavior; the cheating behaviors comprise malicious promotion-traffic behaviors, black-industry gang behaviors and activity cheating behaviors;
while the ant algorithm forms a classification rule, the ants add condition items to the antecedent of the classification rule, and which condition item is added to the rule antecedent is determined by the probability selection function Pij;
Figure FDA0003339113040000033
the probability selection function Pij is the probability that the condition item term_ij is selected by an ant and added to the classification rule antecedent;
the proportion of condition items associated with cheating websites in the training set is used as heuristic information to guide the ants in searching for the optimal combination of condition items from which to construct a classification rule; the heuristic function ηij of the condition item term_ij is:
Figure FDA0003339113040000034
wherein |Tij| represents the frequency with which the condition item term_ij occurs in the entire training set, and |spam_class, Tij| represents the frequency with which the condition item term_ij occurs among the cheating websites in the training set; the heuristic value is directly proportional to the latter frequency, so that the ants preferentially select the condition items more strongly associated with cheating websites to add to the classification rule antecedent; as the number of iterations increases and the pheromone is updated, the ants will gradually find the classification detection rules associated with normal websites.
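The term-selection step of claim 10 can be sketched as follows. The formulas for Pij and ηij are only available as figures, so this sketch rests on assumptions: ηij is taken as the ratio |spam_class, Tij| / |Tij|, matching the "proportion" described in the text, and Pij is taken as the pheromone-weighted heuristic τij·ηij normalized over all candidate condition items, as in the common Ant-Miner formulation. The pheromone table, the function names and the example condition items are hypothetical.

```python
import random

def heuristic(term_counts, spam_term_counts):
    """eta_ij = |spam_class, T_ij| / |T_ij| for every condition item."""
    return {term: spam_term_counts.get(term, 0) / total
            for term, total in term_counts.items() if total > 0}

def selection_probabilities(pheromone, eta):
    """P_ij proportional to tau_ij * eta_ij, normalized over candidates."""
    weights = {term: pheromone[term] * eta[term] for term in eta}
    norm = sum(weights.values())
    if norm == 0:
        # No term carries any weight: fall back to a uniform choice.
        return {term: 1 / len(weights) for term in weights}
    return {term: w / norm for term, w in weights.items()}

def pick_term(probabilities, rng=random):
    """Draw one condition item according to P_ij."""
    terms = list(probabilities)
    return rng.choices(terms, weights=[probabilities[t] for t in terms], k=1)[0]

# Hypothetical condition items (attribute, value) with made-up counts.
term_counts = {("clicks_per_minute", "high"): 10, ("clicks_per_minute", "low"): 10}
spam_counts = {("clicks_per_minute", "high"): 8, ("clicks_per_minute", "low"): 2}
eta = heuristic(term_counts, spam_counts)
tau = {term: 1.0 for term in eta}  # uniform initial pheromone
probs = selection_probabilities(tau, eta)
```

With uniform pheromone the selection probabilities simply follow the heuristic, so a condition item seen mostly in the cheating class is chosen more often; pheromone updates between iterations would then shift these probabilities toward items that appeared in good rules.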
CN202111303151.0A 2021-11-05 2021-11-05 Multi-channel operation wind control method based on active identification and intelligent monitoring Pending CN114254691A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111303151.0A CN114254691A (en) 2021-11-05 2021-11-05 Multi-channel operation wind control method based on active identification and intelligent monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111303151.0A CN114254691A (en) 2021-11-05 2021-11-05 Multi-channel operation wind control method based on active identification and intelligent monitoring

Publications (1)

Publication Number Publication Date
CN114254691A true CN114254691A (en) 2022-03-29

Family

ID=80790482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111303151.0A Pending CN114254691A (en) 2021-11-05 2021-11-05 Multi-channel operation wind control method based on active identification and intelligent monitoring

Country Status (1)

Country Link
CN (1) CN114254691A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114757599A (en) * 2022-06-15 2022-07-15 武汉极意网络科技有限公司 Method for measuring flow quality based on extra cost
CN116629709A (en) * 2023-07-21 2023-08-22 国网山东省电力公司青岛市即墨区供电公司 Intelligent analysis alarm system of power supply index
CN116629709B (en) * 2023-07-21 2023-10-20 国网山东省电力公司青岛市即墨区供电公司 Intelligent analysis alarm system of power supply index

Legal Events

Date Code Title Description
PB01 Publication