CN110336768B - Situation prediction method based on combined hidden Markov model and genetic algorithm

Info

Publication number
CN110336768B
Authority
CN
China
Legal status
Active
Application number
CN201910060212.1A
Other languages
Chinese (zh)
Other versions
CN110336768A (en)
Inventor
高岭
毛勇
郑杰
杨旭东
冯通
张晓�
Current Assignee
Northwestern University
Original Assignee
Northwestern University
Application filed by Northwestern University
Priority to CN201910060212.1A
Publication of CN110336768A
Application granted
Publication of CN110336768B
Legal status: Active

Classifications

    • G06F18/24 Classification techniques
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Abstract

A situation prediction method based on a combined hidden Markov model and genetic algorithm, characterized in that redundant alarms and false alarms are processed with a fuzzy clustering method optimized by the artificial fish swarm algorithm; the fish swarm optimization overcomes the sensitivity of fuzzy c-means clustering to the initial cluster centers and thereby improves alarm clustering precision. In addition, to address the problem that improperly set initial parameters of the hidden Markov model tend to trap training in a local optimum, the clustered alarms are used as input, the initial parameters of the hidden Markov model are optimized with a genetic algorithm, and the optimized parameters are further trained with the Baum-Welch algorithm to obtain the maximum likelihood estimate of the model parameters. The security situation is then predicted by applying the Viterbi algorithm to the observation values. The method can improve the accuracy of network security situation prediction.

Description

Situation prediction method based on combined hidden Markov model and genetic algorithm
Technical Field
The invention belongs to the technical field of information security, and particularly relates to a situation prediction method based on a combined hidden Markov model and a genetic algorithm.
Background
With the development of Internet technology, more and more services are carried over the Internet. Electric power, water conservancy, communications, banking, transportation, education, the military and other sectors all depend on the Internet. The services carried on the Internet and the information stored on it have real physical and practical value, and the emergence of Bitcoin further blurs the boundary between the virtual network world and the real world. Cyberspace holds a huge amount of complex information. Because the Internet can be accessed freely, conveniently and quickly, people all over the world use it regardless of time and place, and network security is attracting more and more attention. In recent years, attack tools and methods have become increasingly complex, and traditional security measures alone can no longer meet the requirements of highly security-sensitive departments. Traditional network security protection is fragmented and single-purpose and cannot assess the key network factors comprehensively from a macroscopic view. It is against this background that research on network security situation awareness has emerged.
Network security situation awareness acquires, understands and evaluates key element data in a network and finally predicts the security situation of the whole network from the evaluation results; a specific network security situation awareness framework is shown in Fig. 2. Situation prediction works by continuously monitoring the network state: when the state becomes abnormal, a known prediction model is used to predict the next state of the network. Existing situation prediction methods based on the hidden Markov model train the model with the EM algorithm on actual network observations and, when the network becomes abnormal, use the trained model to predict the network situation value. These methods have the following defects:
existing clustering methods, when applied to intrusion detection alarm processing, are sensitive to the initial cluster centers, so the analysis of the alarm results is not accurate enough; this affects the training of the final model, and an accurate model cannot be obtained;
because of an inherent weakness of the hidden Markov model, there is no sound criterion for choosing the initial parameter values when training with the EM algorithm, and poorly chosen initial values lead to a locally optimal training result.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a situation prediction method based on a combined hidden Markov model and genetic algorithm. During alarm initialization, the method uses a fuzzy clustering method optimized by the fish swarm algorithm, which effectively prevents alarm cluster analysis from falling into a local extremum and improves the precision of the alarm clustering results. At the same time, it uses a swarm intelligence optimization algorithm to optimize the hidden Markov situation prediction model, so that the model is trained well and local optima are avoided, making network security situation prediction more accurate.
In order to achieve the purpose, the invention adopts the technical scheme that:
a situation prediction method based on a combined hidden Markov model and a genetic algorithm is characterized by comprising the following steps:
the method comprises the following steps:
step one: preprocess the collected intrusion detection alarms with an intrusion detection alarm clustering method based on artificial-fish-swarm-optimized fuzzy c-means clustering, so as to simplify and accurately classify the alarms, and use the processed result as the external observation values of the network. The preprocessing comprises the following steps:
1) initialize the intrusion detection system alarms: remove unnecessary attributes and perform preliminary aggregation of the multi-source heterogeneous data;
2) assign weights to the alarm attributes with the consistent matrix method;
3) build the fuzzy similarity matrix of the alarms from the self-defined alarm attribute similarity functions and the attribute weights;
4) build the fuzzy equivalence matrix with the transitive closure method, and create one artificial fish individual for each alarm;
5) construct the food concentration function and map the high-dimensional samples into three-dimensional space;
6) perform FCM clustering based on the artificial fish swarm algorithm, as follows:
1) defining an error function of the artificial fish swarm algorithm:
Figure RE-GDA0002184223150000031
where $r'_{ij}$ is the Euclidean distance between samples i and j after the high-dimensional samples are mapped into three-dimensional space; if the coordinates of i and j are $(a_i, b_i, c_i)$ and $(a_j, b_j, c_j)$, then
$$r'_{ij} = \sqrt{(a_i - a_j)^2 + (b_i - b_j)^2 + (c_i - c_j)^2}$$
and $r^*_{ij}$ is the value at the corresponding position of the fuzzy equivalence matrix established in sub-step 4);
2) defining a food concentration function for an individual:
Figure RE-GDA0002184223150000033
3) randomly place the samples to be clustered, mapped from high dimension to three dimensions, in three-dimensional space, assigning each sample a random three-dimensional coordinate;
4) compute the food concentration of each artificial fish;
5) perform the swarming, preying and following behaviors based on the current food concentration of the fish swarm;
6) if all artificial fish in the swarm have finished moving, continue; otherwise, go to 4);
7) if the difference between the maximum individual food concentration after the update and the maximum before the update is smaller than a specified value, or the number of updates reaches the specified maximum, stop; otherwise, go to 4);
8) apply the FCM algorithm to cluster the resulting three-dimensional coordinates, and map the final result back to the original high-dimensional samples;
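Sub-step 4) above relies on the transitive closure method to turn the fuzzy similarity matrix into a fuzzy equivalence matrix. The following minimal Python sketch shows the usual max-min squaring procedure (R, R∘R, R⁴, ... until R∘R = R). It assumes a reflexive, symmetric similarity matrix stored as nested lists; the function names are hypothetical, and the fish swarm mapping itself is not covered.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(r: Matrix, s: Matrix) -> Matrix:
    """Fuzzy max-min composition: (R o S)[i][j] = max_k min(R[i][k], S[k][j])."""
    n = len(r)
    return [
        [max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r: Matrix, max_iter: int = 64) -> Matrix:
    """Square R repeatedly until R o R == R (hypothetical helper).

    For a reflexive, symmetric fuzzy similarity matrix the fixed point is a
    fuzzy equivalence matrix; it is reached after about log2(n) squarings.
    """
    current = [row[:] for row in r]
    for _ in range(max_iter):
        squared = max_min_compose(current, current)
        if squared == current:  # max/min only copy values, so exact comparison is safe
            break
        current = squared
    return current


if __name__ == "__main__":
    similarity = [
        [1.0, 0.8, 0.2],
        [0.8, 1.0, 0.5],
        [0.2, 0.5, 1.0],
    ]
    print(transitive_closure(similarity))  # [[1.0, 0.8, 0.5], [0.8, 1.0, 0.5], [0.5, 0.5, 1.0]]
```

In this toy matrix the weakly similar pair (alarm 1, alarm 3) is raised from 0.2 to 0.5 through alarm 2, since min(0.8, 0.5) = 0.5.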
step two: determine the number N of hidden network states according to the network risk levels; based on expert experience, divide the initial probability of each hidden state, the transition probabilities between hidden states, and the output probabilities from hidden states to observable states into intervals;
step three: draw random numbers inside the intervals of the initial probability, transition probability and output probability interval matrices divided in step two, and normalize them, generating P sets of hidden Markov parameters, each consisting of an initial probability matrix π, a transition probability matrix A and an output probability matrix B. The generated probability matrices satisfy the normalization constraints (m is the number of observable values):
$$\sum_{i=1}^{N} \pi_i = 1, \qquad \sum_{j=1}^{N} a_{ij} = 1, \qquad \sum_{k=1}^{m} b_{ik} = 1, \qquad i = 1, 2, \ldots, N$$
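Step three can be illustrated with a short Python sketch: draw one value uniformly inside each expert-given interval, then divide each row by its sum so that π and every row of A and B sum to 1. The interval layout (nested lists of (low, high) pairs), the uniform draw and the function names are assumptions for illustration; note that normalization may move a value slightly outside its original interval.

```python
import random
from typing import List, Tuple

Interval = Tuple[float, float]


def sample_row(intervals: List[Interval], rng: random.Random) -> List[float]:
    """Draw one value in each [low, high] interval, then normalize the row to sum to 1."""
    raw = [rng.uniform(low, high) for low, high in intervals]
    total = sum(raw)  # assumes at least one interval has high > 0
    return [x / total for x in raw]


def sample_hmm(pi_iv, a_iv, b_iv, rng: random.Random):
    """One candidate lambda = (pi, A, B) from the interval matrices (hypothetical helper)."""
    pi = sample_row(pi_iv, rng)
    a = [sample_row(row, rng) for row in a_iv]
    b = [sample_row(row, rng) for row in b_iv]
    return pi, a, b


def initial_population(p: int, pi_iv, a_iv, b_iv, seed: int = 0):
    """Generate P parameter sets for the genetic algorithm."""
    rng = random.Random(seed)
    return [sample_hmm(pi_iv, a_iv, b_iv, rng) for _ in range(p)]


if __name__ == "__main__":
    # Illustrative intervals for N = 2 hidden states and m = 3 observable values.
    pi_iv = [(0.6, 0.9), (0.1, 0.4)]
    a_iv = [[(0.7, 0.9), (0.1, 0.3)], [(0.2, 0.4), (0.6, 0.8)]]
    b_iv = [[(0.5, 0.7), (0.2, 0.3), (0.05, 0.1)], [(0.1, 0.2), (0.3, 0.4), (0.4, 0.6)]]
    population = initial_population(4, pi_iv, a_iv, b_iv)
    print(population[0])
```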
step four: encode the P generated parameter sets with floating-point encoding. Each chromosome consists of three parts corresponding to the three parameter matrices of the hidden Markov model: the initial hidden-state probability matrix corresponds to the initial chromosome segment Geπ, the hidden-state transition probability matrix to the transition segment GeA, and the output matrix from hidden states to observable states to the output segment GeB;
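A minimal sketch of the floating-point encoding in step four, assuming the genes are laid out as Geπ (N genes), then GeA (N×N genes, row by row), then GeB (N×m genes, row by row), which gives the Q = m×N + N×N + N genes of Definition 1 below. The gene ordering and the helper names are assumptions, not taken from the source.

```python
from typing import List, Tuple

Params = Tuple[List[float], List[List[float]], List[List[float]]]


def encode(pi: List[float], a: List[List[float]], b: List[List[float]]) -> List[float]:
    """Concatenate GePi, GeA and GeB into one real-valued chromosome."""
    return list(pi) + [x for row in a for x in row] + [x for row in b for x in row]


def decode(genes: List[float], n: int, m: int) -> Params:
    """Split a chromosome back into (pi, A, B) for N hidden states and m observable values."""
    if len(genes) != m * n + n * n + n:
        raise ValueError("chromosome must contain Q = m*N + N*N + N genes")
    pi = genes[:n]
    a = [genes[n + i * n: n + (i + 1) * n] for i in range(n)]
    offset = n + n * n
    b = [genes[offset + i * m: offset + (i + 1) * m] for i in range(n)]
    return pi, a, b
```

Keeping the three segments contiguous makes the segment-wise crossover of step seven (Geπ with Geπ, GeA with GeA, GeB with GeB) a matter of slicing.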
step five: compute the fitness values of all P chromosomes and copy the individual with the highest fitness directly into the next population, so that the randomness of the genetic algorithm cannot destroy the best individual of the current population (the elitist preservation strategy);
step six: for the remaining P-1 chromosomes, compute the weighted sum of each chromosome's support for population dispersion and its fitness value, and use the roulette-wheel rule to bring the population size back to P;
the support of an individual for population dispersion is based on the following definitions:
Definition 1: let the population size be S; each chromosome contains Q = m × N + N × N + N genes, and chromosome k is written $G_k = (G_{k1}, G_{k2}, \ldots, G_{kQ})$, k = 1, 2, ..., S;
Definition 2: chromosome fitness function f. Since the optimal chromosome found by the genetic algorithm is the set of initial HMM parameter matrices, the probability of the observation sequence under each chromosome, computed with the forward algorithm, is used as the fitness function, i.e.
$$f = P(O \mid \lambda)$$
Definition 3: the individual phenotype $\eta_k$ is the ratio of the fitness value of chromosome k to the sum of the population's fitness values:
$$\eta_k = \frac{f_k}{\sum_{i=1}^{S} f_i}$$
Definition 4: the population dispersion d is defined as
Figure RE-GDA0002184223150000062
Definition 5: the support of the kth chromosome for population dispersion is defined as
Figure RE-GDA0002184223150000063
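Definitions 2 and 3 can be sketched as follows: the fitness of a chromosome is P(O|λ) computed with the forward algorithm on its decoded (π, A, B), and the phenotype η_k is its share of the population's total fitness. Observations are assumed to be integer symbol indices, and the unscaled recursion shown here underflows on long sequences, where a scaled or log-space variant would be needed. Function names are hypothetical.

```python
from typing import List, Sequence


def forward_probability(pi: List[float], a: List[List[float]],
                        b: List[List[float]], obs: Sequence[int]) -> float:
    """P(O | lambda) by the (unscaled) forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o] for j in range(n)]
    return sum(alpha)


def phenotypes(fitness: List[float]) -> List[float]:
    """eta_k = f_k / sum_i f_i (Definition 3)."""
    total = sum(fitness)
    return [f / total for f in fitness]


if __name__ == "__main__":
    pi = [0.6, 0.4]
    a = [[0.7, 0.3], [0.4, 0.6]]
    b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    print(forward_probability(pi, a, b, [0, 1]))  # about 0.1246
```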
step seven: determine the crossover probability from the support, and perform genetic crossover between individuals by arithmetic crossover as follows:
1) randomly select a chromosome k and compute:
Figure RE-GDA0002184223150000064
where $Spt_{max}$ is the maximum support, $Spt_{min}$ the minimum support, and $Spt_k$ the support of the randomly selected chromosome;
2) generate a random number r; if r < $Spt_r$, chromosome k is taken as a chromosome to be crossed. Repeat these two steps until two chromosomes to be crossed have been selected;
3) cross the two selected chromosomes segment by segment: Geπ1 with Geπ2, GeA1 with GeA2, and GeB1 with GeB2;
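The segment-wise arithmetic crossover of step seven can be sketched as below. The source does not state how the crossover coefficient is chosen, so this sketch assumes an independent coefficient α drawn uniformly from [0, 1) for each segment and a gene layout of Geπ, then GeA, then GeB; it leaves out the support-based selection of parents, whose formula is not reproduced in this text. Because each child gene is a convex combination of the parents' genes, children of properly normalized parents remain normalized.

```python
import random
from typing import List, Tuple


def arithmetic_crossover(p1: List[float], p2: List[float],
                         alpha: float) -> Tuple[List[float], List[float]]:
    """c1 = alpha*p1 + (1-alpha)*p2 and c2 = (1-alpha)*p1 + alpha*p2, gene by gene."""
    c1 = [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]
    c2 = [(1 - alpha) * x + alpha * y for x, y in zip(p1, p2)]
    return c1, c2


def crossover_chromosomes(g1: List[float], g2: List[float], n: int, m: int,
                          rng: random.Random) -> Tuple[List[float], List[float]]:
    """Cross GePi1 with GePi2, GeA1 with GeA2 and GeB1 with GeB2 (assumed gene layout)."""
    segments = [(0, n), (n, n + n * n), (n + n * n, n + n * n + n * m)]
    child1: List[float] = []
    child2: List[float] = []
    for lo, hi in segments:
        alpha = rng.random()  # assumption: independent coefficient per segment
        c1, c2 = arithmetic_crossover(g1[lo:hi], g2[lo:hi], alpha)
        child1.extend(c1)
        child2.extend(c2)
    return child1, child2
```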
step eight: determine the mutation probability from the support, and perform individual mutation by non-uniform mutation as follows:
$$G_k' = \begin{cases} G_k + t\,(G_{max} - G_k), & \text{if rand() is even} \\ G_k + t\,(G_k - G_{min}), & \text{if rand() is odd} \end{cases}$$
where $G_k$ is the randomly selected chromosome k before mutation, $G_k'$ is the chromosome after mutation, $G_{max}$ and $G_{min}$ are the individuals with the highest and lowest current fitness, t ∈ (0, 1] is a mutation constant, rand() is a random integer, and r is a random number; as in step seven, the mutation probability is determined from the support, and r decides whether chromosome k mutates;
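A sketch of the mutation rule of step eight. The mutation probability is passed in as a parameter because its support-based formula is not reproduced in this text, and Python's `random.randint` stands in for rand(); both are assumptions, and the helper names are hypothetical. Mutated genes can leave [0, 1], which is why step nine renormalizes the population.

```python
import random
from typing import List


def mutate(g: List[float], g_max: List[float], g_min: List[float],
           t: float, rand_int: int) -> List[float]:
    """Gk' = Gk + t(Gmax - Gk) if rand_int is even, else Gk' = Gk + t(Gk - Gmin)."""
    if not 0 < t <= 1:
        raise ValueError("t must lie in (0, 1]")
    if rand_int % 2 == 0:
        return [x + t * (hi - x) for x, hi in zip(g, g_max)]
    return [x + t * (x - lo) for x, lo in zip(g, g_min)]


def maybe_mutate(g: List[float], g_max: List[float], g_min: List[float],
                 t: float, p_mutation: float, rng: random.Random) -> List[float]:
    """Mutate chromosome g only if a uniform random r falls below p_mutation."""
    r = rng.random()
    if r >= p_mutation:
        return list(g)
    return mutate(g, g_max, g_min, t, rng.randint(0, 2**31 - 1))
```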
step nine: normalize each individual of the new population produced by genetic replication, crossover and mutation so that it satisfies the hidden Markov parameter constraints;
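Step nine only says that each individual is normalized to satisfy the HMM constraints. One simple way to do this, shown below as an assumption rather than the source's procedure, is to clip every gene to a small positive floor (mutation can push genes negative) and then rescale π and each row of A and B to sum to 1, assuming genes are ordered as Geπ, then GeA row by row, then GeB row by row. The floor value and names are hypothetical.

```python
from typing import List


def normalize_row(row: List[float], floor: float = 1e-6) -> List[float]:
    """Clip genes to a positive floor, then rescale so the row sums to 1."""
    clipped = [max(x, floor) for x in row]
    total = sum(clipped)
    return [x / total for x in clipped]


def normalize_chromosome(genes: List[float], n: int, m: int,
                         floor: float = 1e-6) -> List[float]:
    """Renormalize GePi, every row of GeA and every row of GeB (assumed layout)."""
    out = normalize_row(genes[:n], floor)
    offset = n
    for width in [n] * n + [m] * n:  # N rows of A, then N rows of B
        out.extend(normalize_row(genes[offset:offset + width], floor))
        offset += width
    return out
```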
step ten: check whether the preset iteration termination condition is met. If it is, stop, select the chromosome with the highest fitness as the global optimum, and map it to the three initial matrices of the hidden Markov model; otherwise, return to step five for a new round of evolution;
step eleven: iteratively train the model parameters λ = (π, A, B) obtained in step ten with the Baum-Welch algorithm to obtain the maximum likelihood estimate of the hidden Markov model parameters, as follows:
1) obtain D alarm sequence samples $\{O_1, O_2, \ldots, O_D\}$ with the intrusion detection alarm clustering method based on artificial-fish-swarm-optimized fuzzy c-means clustering, where each alarm sequence is $O_d = \{o_1^{(d)}, o_2^{(d)}, o_3^{(d)}, \ldots, o_T^{(d)}\}$;
2) take the optimal initial value λ = (π, A, B) obtained by the genetic algorithm optimization;
3) for each sample d = 1, 2, ..., D, compute $\gamma_t^{(d)}(i)$ and $\xi_t^{(d)}(i, j)$, t = 1, 2, ..., T, with the forward-backward algorithm;
4) Updating the model parameter matrix;
5) check whether each matrix meets the convergence condition; if so, stop; otherwise, return to 3) and iterate;
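The multi-sequence Baum-Welch re-estimation of step eleven can be sketched in plain Python as below. This is the textbook update (γ and ξ from the forward-backward algorithm, accumulated over the D sequences), not code from the patent. It is unscaled and so only suits short sequences, it assumes integer observation symbols and sequences of length at least 2, and it uses the change in total log-likelihood as the convergence test, which is an assumption since the source only mentions "the convergence condition". Function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def forward(pi, a, b, obs: Sequence[int]) -> Matrix:
    """alpha[t][i] = P(o_1..o_t, q_t = i | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o] for j in range(n)])
    return alpha


def backward(a, b, obs: Sequence[int]) -> Matrix:
    """beta[t][i] = P(o_{t+1}..o_T | q_t = i, lambda)."""
    n = len(a)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[j][o] * nxt[j] for j in range(n)) for i in range(n)])
    return beta


def baum_welch_step(pi, a, b, sequences: List[Sequence[int]]):
    """One re-estimation of (pi, A, B) from D observation sequences."""
    n, m = len(pi), len(b[0])
    pi_acc = [0.0] * n
    a_num = [[0.0] * n for _ in range(n)]
    a_den = [0.0] * n
    b_num = [[0.0] * m for _ in range(n)]
    b_den = [0.0] * n
    for obs in sequences:
        alpha, beta = forward(pi, a, b, obs), backward(a, b, obs)
        p_obs = sum(alpha[-1])
        for t in range(len(obs)):
            for i in range(n):
                gamma = alpha[t][i] * beta[t][i] / p_obs  # gamma_t(i)
                if t == 0:
                    pi_acc[i] += gamma
                b_num[i][obs[t]] += gamma
                b_den[i] += gamma
                if t < len(obs) - 1:
                    a_den[i] += gamma
                    for j in range(n):  # accumulate xi_t(i, j)
                        a_num[i][j] += (alpha[t][i] * a[i][j] * b[j][obs[t + 1]]
                                        * beta[t + 1][j] / p_obs)
    new_pi = [x / len(sequences) for x in pi_acc]
    new_a = [[a_num[i][j] / a_den[i] for j in range(n)] for i in range(n)]
    new_b = [[b_num[i][k] / b_den[i] for k in range(m)] for i in range(n)]
    return new_pi, new_a, new_b


def log_likelihood(pi, a, b, sequences) -> float:
    """Total log P(O_d | lambda) over all sequences."""
    return sum(math.log(sum(forward(pi, a, b, obs)[-1])) for obs in sequences)


def train(pi, a, b, sequences, tol: float = 1e-6, max_iter: int = 200):
    """Iterate until the total log-likelihood improves by less than tol."""
    prev = log_likelihood(pi, a, b, sequences)
    for _ in range(max_iter):
        pi, a, b = baum_welch_step(pi, a, b, sequences)
        cur = log_likelihood(pi, a, b, sequences)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return pi, a, b
```

In the method described here, the starting (π, A, B) passed to `train` would be the decoded best chromosome from step ten.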
step twelve: when the network state is abnormal, collect external observation values and predict the network security situation from them with the trained hidden Markov model and the Viterbi algorithm.
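The decoding in step twelve can be illustrated with a standard Viterbi sketch that returns the most probable hidden-state (risk-level) sequence for an observation sequence. How the patent turns that state sequence into a numerical situation value is not given in this text, so the example stops at decoding; the function name and toy parameters are illustrative.

```python
from typing import List, Sequence


def viterbi(pi: List[float], a: List[List[float]], b: List[List[float]],
            obs: Sequence[int]) -> List[int]:
    """Most probable hidden-state path for obs under lambda = (pi, A, B)."""
    n = len(pi)
    delta = [pi[i] * b[i][obs[0]] for i in range(n)]
    back: List[List[int]] = []
    for o in obs[1:]:
        pointers, new_delta = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * a[i][j])
            pointers.append(best)
            new_delta.append(delta[best] * a[best][j] * b[j][o])
        back.append(pointers)
        delta = new_delta
    # Backtrack from the most probable final state.
    path = [max(range(n), key=lambda i: delta[i])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    pi = [0.6, 0.4]
    a = [[0.7, 0.3], [0.4, 0.6]]
    b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    print(viterbi(pi, a, b, [0, 1, 2]))  # [0, 0, 1]
```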
The invention has the following advantages:
1. Classifying the collected alarm data by combining the artificial fish swarm algorithm with fuzzy clustering effectively overcomes the low clustering accuracy caused by the sensitivity of traditional clustering methods to the initial cluster centers when processing redundant alarms, thereby improving situation prediction accuracy.
2. Situation prediction combines a genetic algorithm with a hidden Markov model: the optimized parameters produced by the genetic algorithm are used as the initial values of the Baum-Welch algorithm, which is then trained iteratively on the detected and processed network alarm data as observations to obtain the parameter values. This effectively overcomes the locally optimal training results caused by poorly chosen initial values in traditional hidden Markov situation prediction.
Drawings
Fig. 1 is a working principle diagram of the present invention.
Fig. 2 is a network security situation awareness framework diagram.
FIG. 3 is a flow chart of the fish swarm algorithm-fuzzy clustering alarm processing steps of the present invention.
FIG. 4 is a diagram of the genetic algorithm optimization process of the present invention.
Detailed Description
The present invention will be further described with reference to the following examples and drawings, but the present invention is not limited to the following examples.
Hidden Markov situation prediction in existing network security situation awareness methods has a theoretical weakness that easily leads to locally optimal training results. To address this, the invention provides a situation prediction method based on a combined hidden Markov model and genetic algorithm that optimizes the initial parameters with swarm intelligence, so that the Baum-Welch algorithm starts training from parameter values with higher fitness. During initialization of the training data, the artificial fish swarm algorithm is combined with c-means clustering to remove false and redundant alarms. Using the two methods together substantially improves the accuracy of the situation prediction result, so that network security administrators obtain a more accurate picture of the real network security situation.
Fig. 1 shows the working principle of the invention. The processed alarm data of the intrusion detection system are used as input; after initialization, the data are processed with the improved clustering method and used as situation observation values. After the initial parameters of the hidden Markov prediction model are optimized, the model is trained with the Baum-Welch algorithm on the situation observation values, finally yielding the maximum likelihood model parameters for the observation sequence. The situation value of the network is then predicted from the observation sequence with the Viterbi algorithm. The method specifically comprises the following steps:
1) preprocess the intrusion detection alarms with the intrusion detection alarm clustering method based on artificial-fish-swarm-optimized fuzzy clustering, so as to simplify and accurately classify the alarms, and use the processed result as the external observation values of the network;
2) determine the number N of hidden network states according to the network risk levels; based on expert experience, divide the initial probability of each hidden state, the transition probabilities between hidden states, and the output probabilities from hidden states to observable states into intervals;
3) draw random numbers inside the intervals of the initial probability, transition probability and output probability interval matrices divided in step 2), and normalize them, generating P sets of hidden Markov model parameters: initial probability matrices π, transition probability matrices A and output probability matrices B;
4) encode the P generated parameter sets with floating-point encoding;
5) compute the fitness values of all P chromosomes and copy the individual with the highest fitness directly into the next population, so that the randomness of the genetic algorithm cannot destroy the best individual of the current population (the elitist preservation strategy);
6) for the remaining P-1 chromosomes, compute the weighted sum of each chromosome's support for population dispersion and its fitness value, and use the roulette-wheel rule to bring the population size back to P;
7) determine the crossover probability from the support, and perform genetic crossover between individuals by arithmetic crossover;
8) determine the mutation probability from the support, and perform individual mutation by non-uniform mutation;
9) normalize each individual of the new population produced by genetic replication, crossover and mutation so that it satisfies the hidden Markov parameter constraints;
10) check whether the preset iteration termination condition is met; if so, stop, select the chromosome with the highest fitness as the global optimum, and map it to the three initial matrices of the hidden Markov model; otherwise, return to step 5) to begin a new round of evolution;
11) iteratively train the model parameters λ = (π, A, B) obtained in step 10) with the Baum-Welch algorithm to obtain the maximum likelihood estimate of the hidden Markov model parameters;
12) when the network state is abnormal, collect external observation values and predict the network security situation with the trained hidden Markov model and the Viterbi algorithm.
Fig. 3 shows the fish swarm algorithm-fuzzy clustering alarm processing steps. The collected intrusion detection alarms are preprocessed with the intrusion detection alarm clustering method based on artificial-fish-swarm-optimized fuzzy clustering, which specifically comprises the following steps:
(1) initialize the intrusion detection system alarms: remove unnecessary attributes and perform preliminary aggregation of the multi-source heterogeneous data, as follows:
1) input an alarm $x_i$; if i = 1, record its alarm type type(1) and set the type counter t = 1;
2) for i ≥ 2, compare the type of the current alarm, type($x_i$), with the types identified so far, type(1) to type(t), and handle the result as follows:
Figure RE-GDA0002184223150000111
3) when i = n, classify the alarm data of each of the t classes by a predefined time length;
(2) assign weights to the alarm attributes with the consistent matrix method, as follows:
1) based on expert experience, score the pairwise importance ratios of the m attributes of the intrusion detection alarm to obtain the judgment matrix
$$X = (x_{ij})_{m \times m}$$
where $x_{ij}$ is the ratio of the importance of the ith attribute to that of the jth attribute;
2) compute the weight of each factor, $(\beta_1, \beta_2, \ldots, \beta_i, \ldots, \beta_m)$, as
Figure RE-GDA0002184223150000122
(3) build the fuzzy similarity matrix of the alarms from the self-defined alarm attribute similarity functions and the weights; the attribute similarity functions are:
1) time similarity function:
Figure RE-GDA0002184223150000123
2) port similarity function:
Figure RE-GDA0002184223150000124
3) source/destination IP address similarity function:
Figure RE-GDA0002184223150000125
(η is the number of leading bits, counted from left to right, that the two source/destination IP addresses have in common);
4) protocol similarity function:
Figure RE-GDA0002184223150000131
The similarity $x_{ij}$ of the ith and jth alarms is calculated as
Figure RE-GDA0002184223150000132
where m is the number of attributes and the remaining symbols denote the similarity values of the ith and jth alarms on the individual attributes;
(4) build the fuzzy equivalence matrix with the transitive closure method, and create one artificial fish individual for each alarm;
(5) construct the food concentration function and map the high-dimensional samples into three-dimensional space;
(6) perform FCM clustering based on the artificial fish swarm algorithm, as follows:
1) defining an error function of the artificial fish swarm algorithm:
Figure RE-GDA0002184223150000133
where $r'_{ij}$ is the Euclidean distance between samples i and j after the high-dimensional samples are mapped into three-dimensional space; if the coordinates of i and j are $(a_i, b_i, c_i)$ and $(a_j, b_j, c_j)$, then
$$r'_{ij} = \sqrt{(a_i - a_j)^2 + (b_i - b_j)^2 + (c_i - c_j)^2}$$
and $r^*_{ij}$ is the value at the corresponding position of the fuzzy equivalence matrix established in step (4);
2) defining a food concentration function for an individual:
Figure RE-GDA0002184223150000135
3) randomly place the samples to be clustered, mapped from high dimension to three dimensions, in three-dimensional space, assigning each sample a random three-dimensional coordinate;
4) compute the food concentration of each artificial fish;
5) perform the swarming, preying and following behaviors based on the current food concentration of the fish swarm;
6) if all artificial fish in the swarm have finished moving, continue; otherwise, go to 4);
7) if the difference between the maximum individual food concentration after the update and the maximum before the update is smaller than a specified value, or the number of updates reaches the specified maximum, stop; otherwise, go to 4);
8) apply the FCM algorithm to cluster the resulting three-dimensional coordinates, and map the final result back to the original high-dimensional samples.
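Sub-step 8) applies FCM to the three-dimensional coordinates produced by the fish swarm stage. Below is a plain-Python sketch of standard fuzzy c-means (fuzzifier m = 2 by default) on such coordinates. The fish swarm error function and food concentration are not reproduced, so the initial centers are simply passed in or sampled at random, which is exactly the sensitivity the fish swarm stage is meant to reduce. All names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def fcm(points: List[Point], c: int, m: float = 2.0,
        centers: Optional[List[Point]] = None, max_iter: int = 100,
        tol: float = 1e-6, seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Fuzzy c-means: returns (cluster centers, memberships u[k][i] of sample k in cluster i)."""
    if centers is None:
        centers = random.Random(seed).sample(points, c)
    centers = [list(v) for v in centers]
    dim = len(points[0])
    u: List[List[float]] = []
    for _ in range(max_iter):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)).
        u = []
        for x in points:
            d = [max(math.dist(x, v), 1e-12) for v in centers]
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                      for i in range(c)])
        # Center update: v_i = sum_k u_ki^m x_k / sum_k u_ki^m.
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wk * x[k] for wk, x in zip(w, points)) / total
                                for k in range(dim)])
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


def hard_labels(u: List[List[float]]) -> List[int]:
    """Assign each sample to the cluster with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]
```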
Fig. 4 shows the genetic algorithm optimization process. P sets of hidden Markov parameters, each consisting of an initial probability matrix π, a transition probability matrix A and an output probability matrix B, are generated at random, and the generated probability matrices satisfy the normalization constraints:
$$\sum_{i=1}^{N} \pi_i = 1, \qquad \sum_{j=1}^{N} a_{ij} = 1, \qquad \sum_{k=1}^{m} b_{ik} = 1, \qquad i = 1, 2, \ldots, N$$
The chromosome generated by floating-point encoding consists of three parts corresponding to the three parameter matrices of the hidden Markov model: the hidden-state initial probability matrix corresponds to the initial chromosome Geπ, the hidden-state transition probability matrix to the transition chromosome GeA, and the output matrix from hidden states to observable states to the output chromosome GeB, as shown in FIG. 1.
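A minimal sketch of this population initialization and floating-point encoding, assuming uniform random draws on (0, 1) (claim 1 restricts the draws to per-entry intervals whose bounds are not given here) and assuming M, the number of observable states, is the row length of B. The names `initial_population`, `encode` and `decode` are hypothetical.

```python
import random


def random_stochastic_row(length, rng):
    """Draw positive random numbers and normalize them so the row sums to 1."""
    row = [rng.random() + 1e-12 for _ in range(length)]
    total = sum(row)
    return [x / total for x in row]


def encode(pi, A, B):
    """Concatenate Ge_pi, Ge_A (row by row) and Ge_B (row by row) into one chromosome."""
    return list(pi) + [a for row in A for a in row] + [b for row in B for b in row]


def decode(chrom, n_states, n_symbols):
    """Split a chromosome of Q = M*N + N*N + N genes back into (pi, A, B)."""
    n, m = n_states, n_symbols
    pi = chrom[:n]
    A = [chrom[n + i * n: n + (i + 1) * n] for i in range(n)]
    off = n + n * n
    B = [chrom[off + i * m: off + (i + 1) * m] for i in range(n)]
    return pi, A, B


def initial_population(P, n_states, n_symbols, seed=0):
    """P random, normalized HMMs encoded as floating-point chromosomes."""
    rng = random.Random(seed)
    population = []
    for _ in range(P):
        pi = random_stochastic_row(n_states, rng)
        A = [random_stochastic_row(n_states, rng) for _ in range(n_states)]
        B = [random_stochastic_row(n_symbols, rng) for _ in range(n_states)]
        population.append(encode(pi, A, B))
    return population
```

Storing Geπ, GeA and GeB back to back in one flat list keeps crossover and mutation simple, at the cost of having to renormalize each block after the genetic operators.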
The support of each individual for the population dispersion is calculated using the following definitions:
Definition 1: the population size is S, and each chromosome contains $Q = M \times N + N \times N + N$ genes, where N is the number of hidden states and M the number of observable states; chromosome k is represented by $G_k = (G_{k1}, G_{k2}, \dots, G_{kQ})$, $k = 1, 2, \dots, S$;
Definition 2: chromosome fitness function f: since the optimal chromosome found by the genetic algorithm provides the initial parameter matrices of the HMM, the forward probability of the observation sequence under each chromosome's model is used as the fitness function, i.e.
$$f = P(O \mid \lambda)$$
Definition 3: the individual phenotype $\eta_k$ is the ratio of the fitness value of chromosome k to the sum of the population fitness values:
$$\eta_k = \frac{f_k}{\sum_{i=1}^{S} f_i}, \qquad k = 1, 2, \dots, S$$
Definition 4: the population dispersion d is defined as:
[Formula image not reproduced: population dispersion d]
Definition 5: the support of the k-th chromosome for the population dispersion is defined as:
[Formula image not reproduced: support $Spt_k$ of chromosome k]
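Definitions 2 and 3 can be sketched as follows, assuming a single discrete observation sequence O given as a list of symbol indices. The patent does not state how several alarm sequences are combined into one fitness value, and this unscaled forward recursion will underflow on long sequences, where a scaled or log-space version would be needed. The function names are hypothetical.

```python
def forward_probability(obs, pi, A, B):
    """Fitness f = P(O | lambda) by the unscaled forward algorithm.

    obs: list of observation-symbol indices; pi, A, B: HMM parameters as lists.
    """
    n = len(pi)
    # alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def phenotypes(fitness):
    """eta_k = f_k / sum_i f_i (Definition 3)."""
    total = sum(fitness)
    return [f / total for f in fitness]
```

Summing P(O | λ) over every possible sequence of a fixed length gives 1, which is a quick sanity check on any implementation.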
The weighted sum of fitness and support is combined with the roulette-wheel rule to restore the population size to P, as follows:
(1): calculating $t_i = u f_i + v\,Spt_i$, where u and v are the weights of the fitness value and the support value, respectively;
(2): calculating $T_n = \sum_i (u f_i + v\,Spt_i)$;
(3): calculating $W_i = t_i / T_n$;
(4): calculating the cumulative probability
$$g_i = \sum_{j=1}^{i} W_j, \qquad g_0 = 0$$
(5): generating a random number r uniformly distributed on [0, 1] and comparing it with $g_i$; if $g_{i-1} < r < g_i$, individual i is selected into the next generation; repeating (4) and (5) until the size of the new population equals that of the parent population;
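Steps (1) to (5) amount to roulette-wheel selection on a weighted score. A minimal sketch, assuming non-negative scores and taking the support values $Spt_i$ as inputs (their formula is not reproduced above); `weighted_roulette` is a hypothetical name.

```python
import bisect
import random


def weighted_roulette(fitness, support, u, v, n_select, rng):
    """Roulette-wheel selection on t_i = u*f_i + v*Spt_i; returns selected indices."""
    t = [u * f + v * s for f, s in zip(fitness, support)]  # step (1)
    T_n = sum(t)                                            # step (2)
    W = [ti / T_n for ti in t]                              # step (3)
    g, acc = [], 0.0
    for w in W:                                             # step (4): g_i = W_1 + ... + W_i
        acc += w
        g.append(acc)
    g[-1] = 1.0  # absorb floating-point round-off in the last cumulative value
    chosen = []
    for _ in range(n_select):                               # step (5)
        r = rng.random()
        # smallest i with r <= g_i, i.e. g_{i-1} < r <= g_i
        chosen.append(bisect.bisect_left(g, r))
    return chosen
```

In the claimed method this would be called with n_select = P - 1 on the chromosomes left after the elite individual is copied.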
The crossover probability is determined according to the support, and genetic crossover between individuals is performed by arithmetic crossover:
(1): randomly selecting a chromosome k and calculating:
[Formula image not reproduced: crossover probability of chromosome k in terms of $Spt_k$, $Spt_{\max}$ and $Spt_{\min}$]
where $Spt_{\max}$ is the maximum support, $Spt_{\min}$ the minimum support, and $Spt_k$ the support of the randomly selected chromosome;
(2): generating a random number r; if $r < Spt_r$, taking chromosome k as a chromosome to be crossed; repeating these two steps until two chromosomes to be crossed are obtained;
(3): performing genetic crossover on the two chromosomes to be crossed according to the rule that Geπ1 is crossed with Geπ2, GeA1 with GeA2, and GeB1 with GeB2. The specific crossover operation is:
1) parents: $Ge\pi_1 = \{\pi_{11}, \pi_{12}, \dots, \pi_{1n}\}$, $Ge\pi_2 = \{\pi_{21}, \pi_{22}, \dots, \pi_{2n}\}$;
2) randomly selecting a gene position j;
3) offspring:
$$Ge\pi_1' = \{\pi_{11}, \pi_{12}, \dots, a\pi_{1j} + (1-a)\pi_{2j},\ a\pi_{1(j+1)} + (1-a)\pi_{2(j+1)}, \dots, a\pi_{1n} + (1-a)\pi_{2n}\}$$
where a is a random number between 0 and 1. Crossover of the transition and output chromosomes is performed in the same way and is not repeated here.
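The arithmetic crossover of one pair of parent segments can be sketched as below. It assumes the weight a is drawn once per crossover and shared by all blended genes, and it adds a mirrored second child, which the text does not describe. The function name is hypothetical.

```python
import random


def arithmetic_crossover(p1, p2, rng):
    """Blend genes j..n-1 of two parent segments (e.g. Ge_pi1 and Ge_pi2).

    Genes before the random position j are copied unchanged; from j onward each
    child gene is a*x + (1 - a)*y with a random weight a in [0, 1).
    """
    n = len(p1)
    j = rng.randrange(n)   # randomly selected gene position j
    a = rng.random()       # random weight a
    child1 = p1[:j] + [a * x + (1 - a) * y for x, y in zip(p1[j:], p2[j:])]
    # Second child (an assumption; the text gives only one): mirrored weights
    child2 = p2[:j] + [a * y + (1 - a) * x for x, y in zip(p1[j:], p2[j:])]
    return child1, child2
```

Because only the genes from position j onward are blended, a child segment generally no longer sums to 1, which is why step nine of claim 1 normalizes the new population.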
The mutation probability is determined according to the support, and genetic mutation of individuals is performed by non-uniform mutation, as follows:
$$G_k' = \begin{cases} G_k + t\,(G_{\max} - G_k)\,r, & \text{if rand() is even} \\ G_k + t\,(G_k - G_{\min})\,r, & \text{if rand() is odd} \end{cases}$$
where $G_k$ is the randomly selected chromosome k before mutation and $G_k'$ is the chromosome obtained by mutating $G_k$; $G_{\max}$ and $G_{\min}$ are the individuals with the current maximum and minimum fitness, respectively; $t \in (0, 1]$ is the mutation constant and r is a random number. That is, when the random integer rand() is even, $G_k$ is mutated as $G_k' = G_k + t(G_{\max} - G_k)r$, and when it is odd, as $G_k' = G_k + t(G_k - G_{\min})r$.
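A minimal sketch of the mutation operator itself, reading r as a uniform random factor in [0, 1) and using the parity of a random integer for the rand() test. How the support-dependent mutation probability decides which chromosomes mutate is not specified here and is left out. The function name is hypothetical.

```python
import random


def nonuniform_mutation(g_k, g_max, g_min, t, rng):
    """Mutate chromosome g_k toward the best or away from the worst individual.

    g_max / g_min: current highest- / lowest-fitness chromosomes; t in (0, 1].
    """
    r = rng.random()                          # random factor r in [0, 1)
    if rng.randint(0, 2**31 - 1) % 2 == 0:    # "rand() is even"
        # Move toward the fittest individual
        return [x + t * (hi - x) * r for x, hi in zip(g_k, g_max)]
    # Otherwise move away from the least fit individual
    return [x + t * (x - lo) * r for x, lo in zip(g_k, g_min)]
```

Either branch can push genes outside [0, 1] or break the row sums, so the result is meant to be renormalized afterwards.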
The λ = (π, A, B) obtained above is then trained iteratively with the Baum-Welch algorithm to obtain the maximum likelihood estimates of the HMM parameters, as follows:
(1): obtaining D alarm sequence samples $\{O_1, O_2, \dots, O_D\}$ with the intrusion detection alarm clustering method based on artificial-fish-swarm-optimized fuzzy C-means clustering, according to claim 2, where each alarm sequence is $O_d = \{o_1^{(d)}, o_2^{(d)}, \dots, o_T^{(d)}\}$;
(2): obtaining the optimal initial value $\lambda = (\pi, A, B)$ through genetic-algorithm optimization;
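(3): for each sample $d = 1, 2, \dots, D$, computing $\gamma_t^{(d)}(i)$ and $\xi_t^{(d)}(i,j)$, $t = 1, 2, \dots, T$, with the forward-backward algorithm, where
$$\gamma_t^{(d)}(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$$
$$\xi_t^{(d)}(i,j) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1}^{(d)})\,\beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\,a_{ij}\,b_j(o_{t+1}^{(d)})\,\beta_{t+1}(j)}$$
and $\alpha_t(i)$ is the forward probability, $\beta_t(i)$ the backward probability, $a_{ij}$ the transition probability and $b_j(o_{t+1})$ the output probability;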
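(4): updating the model parameters as follows:
$$\bar{\pi}_i = \frac{1}{D}\sum_{d=1}^{D} \gamma_1^{(d)}(i)$$
$$\bar{a}_{ij} = \frac{\sum_{d=1}^{D}\sum_{t=1}^{T-1} \xi_t^{(d)}(i,j)}{\sum_{d=1}^{D}\sum_{t=1}^{T-1} \gamma_t^{(d)}(i)}$$
$$\bar{b}_j(k) = \frac{\sum_{d=1}^{D}\sum_{t=1,\ o_t^{(d)} = k}^{T} \gamma_t^{(d)}(j)}{\sum_{d=1}^{D}\sum_{t=1}^{T} \gamma_t^{(d)}(j)}$$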
(5): checking whether each matrix satisfies the convergence condition; if so, the algorithm ends; otherwise, returning to step (3) for the next iteration.
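The training loop can be sketched with textbook multi-sequence Baum-Welch re-estimation. This is a minimal, unscaled Python version for short sequences; it assumes observations are symbol indices and uses the change in total log-likelihood as the convergence test of step (5), which the text leaves unspecified. All names are hypothetical.

```python
import math


def forward(obs, pi, A, B):
    """alpha[t][i] for t = 0..T-1 (unscaled)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i] for t = 0..T-1 (unscaled), with beta[T-1][i] = 1."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n)) for i in range(n)])
    return beta


def baum_welch(sequences, pi, A, B, max_iter=100, tol=1e-9):
    """Multi-sequence Baum-Welch re-estimation starting from the GA-optimized lambda."""
    n, m, D = len(pi), len(B[0]), len(sequences)
    prev_ll = None
    for _ in range(max_iter):
        pi_num = [0.0] * n
        a_num = [[0.0] * n for _ in range(n)]
        a_den = [0.0] * n
        b_num = [[0.0] * m for _ in range(n)]
        b_den = [0.0] * n
        ll = 0.0
        for obs in sequences:                      # step (3): each sample d
            alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
            p_obs = sum(alpha[-1])                 # P(O_d | lambda)
            ll += math.log(p_obs)
            T = len(obs)
            for t in range(T):
                gamma = [alpha[t][i] * beta[t][i] / p_obs for i in range(n)]
                for i in range(n):
                    if t == 0:
                        pi_num[i] += gamma[i]
                    b_num[i][obs[t]] += gamma[i]
                    b_den[i] += gamma[i]
                    if t < T - 1:
                        a_den[i] += gamma[i]
                        for j in range(n):         # accumulate xi_t(i, j)
                            a_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                            * beta[t + 1][j] / p_obs)
        # step (4): re-estimate pi, A, B from the accumulated statistics
        pi = [x / D for x in pi_num]
        A = [[a_num[i][j] / a_den[i] for j in range(n)] for i in range(n)]
        B = [[b_num[i][k] / b_den[i] for k in range(m)] for i in range(n)]
        # step (5): assumed convergence test on the log-likelihood change
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, A, B
```

Each iteration cannot decrease the total log-likelihood, a standard property of EM; long alarm sequences would need scaled α and β to avoid underflow, and every sequence needs at least two observations so that the A denominators are non-zero.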
The trained hidden Markov model parameters are obtained through the above steps. When the network is operating abnormally, the prediction proceeds as follows:
(1) acquiring the sequence of network situation observations;
(2) acquiring the trained hidden Markov model parameters;
(3) computing the most probable hidden state sequence with the Viterbi algorithm;
(4) determining the network security situation value at the next moment from the state transition matrix.
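Steps (3) and (4) can be sketched as Viterbi decoding followed by a one-step lookahead through A. Taking the most probable next state, $\arg\max_j a_{s_T j}$ for the last decoded state $s_T$, is an assumption: the text says only that the next value is determined from the state transition matrix, and the mapping from states to situation values is not shown. Names are hypothetical.

```python
def viterbi(obs, pi, A, B):
    """Most probable hidden-state sequence for obs (plain probabilities, no logs)."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []
    for o in obs[1:]:
        ptr, nxt = [], []
        for j in range(n):
            # Best predecessor state i for landing in state j
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            ptr.append(best_i)
            nxt.append(delta[best_i] * A[best_i][j] * B[j][o])
        backptr.append(ptr)
        delta = nxt
    # Backtrack from the best final state
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def predict_next_state(path, A):
    """Assumed rule: next state = argmax_j a_{s_T, j} for the last decoded state s_T."""
    last = path[-1]
    return max(range(len(A)), key=lambda j: A[last][j])
```

For long observation sequences the products in `delta` underflow; working with log-probabilities and sums is the usual remedy.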

Claims (1)

1. A situation prediction method based on a combined hidden Markov model and a genetic algorithm is characterized by comprising the following steps:
step one: preprocessing the collected intrusion detection alarms with the intrusion detection alarm clustering method based on artificial-fish-swarm-optimized fuzzy C-means clustering, so as to simplify the alarms and classify them accurately, and using the processed result as the external observation values of the network;
the preprocessing of the collected intrusion detection alarms with this clustering method comprises the following steps:
1) initializing the intrusion detection system alarms: removing unnecessary attributes and performing preliminary aggregation of the multi-source heterogeneous data;
2) assigning weights to the alarm attributes with the consistent matrix method;
3) establishing the fuzzy similarity matrix of the alarms from a custom alarm-attribute similarity function and the attribute weights;
4) establishing the fuzzy equivalence matrix with the transitive closure method, and creating one artificial fish individual for each alarm;
5) constructing the food concentration function, and mapping the high-dimensional samples into three-dimensional space;
6) performing FCM clustering based on the artificial fish swarm algorithm, which comprises the following steps:
1) defining an error function of the artificial fish swarm algorithm:
[Formula image not reproduced: error function of the artificial fish swarm algorithm]
where $r'_{ij}$ is the Euclidean distance between samples i and j after the high-dimensional samples are mapped into three-dimensional space. If the coordinates of i and j are $(a_i, b_i, c_i)$ and $(a_j, b_j, c_j)$ respectively, then
$$r'_{ij} = \sqrt{(a_i - a_j)^2 + (b_i - b_j)^2 + (c_i - c_j)^2}$$
and $r^*_{ij}$ is the value at the corresponding position in the fuzzy equivalence matrix established in step 4);
2) defining a food concentration function for an individual:
[Formula image not reproduced: food concentration function]
3) randomly distributing samples to be clustered, which are mapped from a high dimension to three dimensions, in a three-dimensional space, and randomly assigning a three-dimensional coordinate value to each sample;
4) calculating the food concentration of the artificial fish;
5) performing optimization behaviors such as swarming, foraging and following on the basis of the current food concentration of the fish school;
6) if all the artificial fishes in the group finish moving, continuing to execute downwards, otherwise, turning to the step 4);
7) if the difference between the updated individual maximum food concentration value of the artificial fish and the maximum food concentration function value before updating is smaller than a certain specified value, or the updating times reach the specified maximum times, ending, otherwise, turning to the step 4);
8) clustering the resulting three-dimensional coordinates with the FCM algorithm, and mapping the final result back to the original high-dimensional samples;
step two: determining the number N of hidden network states according to the network risk levels, dividing the initial probability of each hidden state into intervals, and likewise dividing the transition probabilities between hidden states and the output probabilities from hidden states to observable states into intervals;
step three: taking random numbers within the initial probability interval matrix, the transition probability interval matrix and the output probability interval matrix obtained in step two and normalizing them, so as to generate P initial probability matrices π, P transition probability matrices A and P output probability matrices B of the hidden Markov model, the generated matrices satisfying the following normalization conditions:
$$\sum_{i=1}^{N} \pi_i = 1, \qquad \sum_{j=1}^{N} a_{ij} = 1, \qquad \sum_{k=1}^{M} b_i(k) = 1, \qquad i = 1, 2, \dots, N;$$
step four: encoding the P generated sets of probability matrices with the floating-point encoding method, each chromosome consisting of three parts corresponding to the three parameter matrices of the hidden Markov model: the hidden-state initial probability matrix corresponds to the initial chromosome Geπ, the hidden-state transition probability matrix to the transition chromosome GeA, and the output matrix from hidden states to observable states to the output chromosome GeB;
step five: calculating the fitness values of all P chromosomes and copying the individual with the maximum fitness value directly into the next population, so that the randomness of the genetic algorithm cannot destroy the best individual of the current population (the elitist preservation strategy);
step six: for the remaining P-1 chromosomes, calculating the weighted sum of each chromosome's fitness value and its support for the population dispersion, and using the roulette-wheel rule to restore the population size to P, as follows:
(1): calculating $t_i = u f_i + v\,Spt_i$, where u and v are the weights of the fitness value and the support value, respectively;
(2): calculating $T_n = \sum_i (u f_i + v\,Spt_i)$;
(3): calculating $W_i = t_i / T_n$;
(4): calculating the cumulative probability
$$g_i = \sum_{j=1}^{i} W_j, \qquad g_0 = 0$$
(5): generating a random number r uniformly distributed on [0, 1] and comparing it with $g_i$; if $g_{i-1} < r < g_i$, individual i is selected into the next generation; repeating (4) and (5) until the size of the new population equals that of the parent population;
the support of each individual for the population dispersion is calculated using the following definitions:
Definition 1: the population size is S, and each chromosome contains $Q = M \times N + N \times N + N$ genes, where N is the number of hidden states and M the number of observable states; chromosome k is represented by $G_k = (G_{k1}, G_{k2}, \dots, G_{kQ})$, $k = 1, 2, \dots, S$;
Definition 2: chromosome fitness function f: since the optimal chromosome found by the genetic algorithm provides the initial parameter matrices of the HMM, the forward probability of the observation sequence under each chromosome's model is used as the fitness function, i.e.
$$f = P(O \mid \lambda)$$
Definition 3: the individual phenotype $\eta_k$ is the ratio of the fitness value of chromosome k to the sum of the population fitness values:
$$\eta_k = \frac{f_k}{\sum_{i=1}^{S} f_i}, \qquad k = 1, 2, \dots, S$$
Definition 4: the population dispersion d is defined as:
[Formula image not reproduced: population dispersion d]
Definition 5: the support of the k-th chromosome for the population dispersion is defined as:
[Formula image not reproduced: support $Spt_k$ of chromosome k]
step seven: determining the crossover probability according to the support, and performing genetic crossover between individuals by arithmetic crossover, as follows:
1): randomly selecting a chromosome k and calculating:
[Formula image not reproduced: crossover probability of chromosome k in terms of $Spt_k$, $Spt_{\max}$ and $Spt_{\min}$]
where $Spt_{\max}$ is the maximum support, $Spt_{\min}$ the minimum support, and $Spt_k$ the support of the randomly selected chromosome;
2): generating a random number r; if $r < Spt_r$, taking chromosome k as a chromosome to be crossed; repeating these two steps until two chromosomes to be crossed are obtained;
3): performing genetic crossover on the two chromosomes to be crossed according to the rule that Geπ1 is crossed with Geπ2, GeA1 with GeA2, and GeB1 with GeB2;
step eight: determining the mutation probability according to the support, and performing genetic mutation of individuals by non-uniform mutation, as follows:
$$G_k' = \begin{cases} G_k + t\,(G_{\max} - G_k)\,r, & \text{if rand() is even} \\ G_k + t\,(G_k - G_{\min})\,r, & \text{if rand() is odd} \end{cases}$$
where $G_k$ is the randomly selected chromosome k before mutation and $G_k'$ is the chromosome obtained by mutating $G_k$; $G_{\max}$ and $G_{\min}$ are the individuals with the current maximum and minimum fitness, respectively; $t \in (0, 1]$ is the mutation constant and r is a random number; if the random integer rand() is even, $G_k$ is mutated as $G_k' = G_k + t(G_{\max} - G_k)r$, and if it is odd, as $G_k' = G_k + t(G_k - G_{\min})r$;
step nine: normalizing each individual of the new population produced by replication, crossover and mutation so that it satisfies the hidden Markov parameter constraints;
step ten: checking whether the preset iteration termination condition is met; if so, terminating, taking the chromosome with the maximum fitness value as the global optimum and mapping it to the three initial matrices of the hidden Markov model; otherwise, returning to step five for a new round of evolution;
step eleven: iteratively training the model parameters λ = (π, A, B) obtained in step ten with the Baum-Welch algorithm to obtain the maximum likelihood estimates of the hidden Markov model parameters, as follows:
(1): obtaining D alarm sequence samples $\{O_1, O_2, \dots, O_D\}$ with the intrusion detection alarm clustering method based on artificial-fish-swarm-optimized fuzzy C-means clustering, where each alarm sequence is $O_d = \{o_1^{(d)}, o_2^{(d)}, \dots, o_T^{(d)}\}$;
(2): obtaining the optimal initial value $\lambda = (\pi, A, B)$ through genetic-algorithm optimization;
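(3): for each sample $d = 1, 2, \dots, D$, computing $\gamma_t^{(d)}(i)$ and $\xi_t^{(d)}(i,j)$, $t = 1, 2, \dots, T$, with the forward-backward algorithm, where
$$\gamma_t^{(d)}(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$$
$$\xi_t^{(d)}(i,j) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1}^{(d)})\,\beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\,a_{ij}\,b_j(o_{t+1}^{(d)})\,\beta_{t+1}(j)}$$
and $\alpha_t(i)$ is the forward probability, $\beta_t(i)$ the backward probability, $a_{ij}$ the transition probability and $b_j(o_{t+1})$ the output probability;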
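(4): updating the model parameters as follows:
$$\bar{\pi}_i = \frac{1}{D}\sum_{d=1}^{D} \gamma_1^{(d)}(i)$$
$$\bar{a}_{ij} = \frac{\sum_{d=1}^{D}\sum_{t=1}^{T-1} \xi_t^{(d)}(i,j)}{\sum_{d=1}^{D}\sum_{t=1}^{T-1} \gamma_t^{(d)}(i)}$$
$$\bar{b}_j(k) = \frac{\sum_{d=1}^{D}\sum_{t=1,\ o_t^{(d)} = k}^{T} \gamma_t^{(d)}(j)}{\sum_{d=1}^{D}\sum_{t=1}^{T} \gamma_t^{(d)}(j)}$$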
(5): checking whether each matrix satisfies the convergence condition; if so, the algorithm ends; otherwise, returning to step (3) for the next iteration;
step twelve: if the network state is abnormal, collecting external observation values and predicting the network security situation with the Viterbi algorithm and the trained hidden Markov model.
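Steps five and nine of the claim (elitist preservation and normalization of the new population) can be sketched on the flat chromosome layout Geπ | GeA | GeB. Clipping genes to a small positive floor before rescaling is an assumed rule for values that mutation may push below zero; the claim states only that individuals are normalized. Names are hypothetical.

```python
def elitist_split(population, fitness):
    """Step five: keep the fittest chromosome unchanged; return it and the other P-1."""
    best = max(range(len(population)), key=lambda k: fitness[k])
    rest = [c for k, c in enumerate(population) if k != best]
    return list(population[best]), rest


def renormalize(chrom, n_states, n_symbols, floor=1e-12):
    """Step nine: make every stochastic block of the chromosome sum to 1.

    Genes are first clipped to a small positive floor (an assumed rule for
    negative values), then each block (pi, each row of A, each row of B) is rescaled.
    """
    n, m = n_states, n_symbols
    blocks = [(0, n)]                                  # Ge_pi
    blocks += [(n + i * n, n) for i in range(n)]       # rows of Ge_A
    off = n + n * n
    blocks += [(off + i * m, m) for i in range(n)]     # rows of Ge_B
    out = [max(g, floor) for g in chrom]
    for start, length in blocks:
        total = sum(out[start:start + length])
        for idx in range(start, start + length):
            out[idx] /= total
    return out
```

In one generation, the elite chromosome would bypass selection, crossover and mutation, while the remaining P-1 pass through them and then through `renormalize` before fitness is recomputed.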
CN201910060212.1A 2019-01-22 2019-01-22 Situation prediction method based on combined hidden Markov model and genetic algorithm Active CN110336768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910060212.1A CN110336768B (en) 2019-01-22 2019-01-22 Situation prediction method based on combined hidden Markov model and genetic algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910060212.1A CN110336768B (en) 2019-01-22 2019-01-22 Situation prediction method based on combined hidden Markov model and genetic algorithm

Publications (2)

Publication Number Publication Date
CN110336768A CN110336768A (en) 2019-10-15
CN110336768B true CN110336768B (en) 2021-07-20

Family

ID=68138888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910060212.1A Active CN110336768B (en) 2019-01-22 2019-01-22 Situation prediction method based on combined hidden Markov model and genetic algorithm

Country Status (1)

Country Link
CN (1) CN110336768B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826617A (en) * 2019-10-31 2020-02-21 中国人民公安大学 Situation element classification method and training method and device of model thereof, and server
CN111598335A (en) * 2020-05-15 2020-08-28 长春理工大学 Traffic area division method based on improved spectral clustering algorithm
CN112101673B (en) * 2020-09-22 2024-01-16 华北电力大学 Power grid development trend prediction method and system based on hidden Markov model
CN112260870B (en) * 2020-10-21 2022-04-05 重庆邮电大学 Network security prediction method based on dynamic fuzzy clustering and grey neural network
CN112784896A (en) * 2021-01-20 2021-05-11 齐鲁工业大学 Time series flow data anomaly detection method based on Markov process
CN112994944B (en) * 2021-03-03 2023-07-25 上海海洋大学 Network state prediction method
CN114490619B (en) * 2022-02-15 2022-09-09 北京大数据先进技术研究院 Data filling method, device, equipment and storage medium based on genetic algorithm
CN116055182B (en) * 2023-01-28 2023-06-06 北京特立信电子技术股份有限公司 Network node anomaly identification method based on access request path analysis

Citations (1)

Publication number Priority date Publication date Assignee Title
CN106453294A (en) * 2016-09-30 2017-02-22 重庆邮电大学 Security situation prediction method based on niche technology with fuzzy elimination mechanism

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20120016641A1 (en) * 2010-07-13 2012-01-19 Giuseppe Raffa Efficient gesture processing

Non-Patent Citations (5)

Title
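Multiscale Entropy-Based Weighted Hidden Markov Network Security Situation Prediction Model; Wei Liang et al.; 2017 IEEE International Congress on Internet of Things (ICIOT); 20170630; full text *
An improved quantitative assessment method for network security situation; Xi Rongrong et al.; 《计算机学报》 (Chinese Journal of Computers); 20141030; full text *
Research on network security situation awareness based on genetic algorithm; Wang Guohua; 《计算机测量与控制》 (Computer Measurement & Control); 20161225; full text *
Adaptive DRX optimization mechanism based on Markov chain; Gao Ling et al.; 《东南大学学报(自然科学版)》 (Journal of Southeast University, Natural Science Edition); 20170930; Vol. 47, No. 5; full text *
A practical program for the transitive closure method in fuzzy cluster analysis; Guo Fengming et al.; 《电脑学习》 (Computer Study); 19991231; full text *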

Also Published As

Publication number Publication date
CN110336768A (en) 2019-10-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant