CN112257917A - Time series abnormal mode detection method based on entropy characteristics and neural network

Publication number
CN112257917A
Authority
CN
China
Prior art keywords
sequence
score
abnormal
sample
entropy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011116876.4A
Other languages
Chinese (zh)
Other versions
CN112257917B (en)
Inventor
苏维均
牛雨晴
于重重
赵霞
韩璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University
Priority to CN202011116876.4A
Publication of CN112257917A
Application granted
Publication of CN112257917B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Forestry; Mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Agronomy & Crop Science (AREA)
  • Animal Husbandry (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Mining & Mineral Resources (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a time-series abnormal pattern detection method based on entropy features and a neural network, comprising the following steps: 1) extracting a second-order difference-rate sample entropy feature sequence from the time series in the training data set; 2) training a generative adversarial network (GAN) model to obtain a generator and a corresponding discriminator; 3) computing the anomaly score of the feature sequence and constructing a threshold; 4) judging whether the input data to be detected is abnormal according to the threshold. The advantages of the method are that extracting features from the time-series data with the difference-rate sample entropy makes abnormal patterns more distinct, and that the new anomaly score calculation improves the accuracy and generalization of model identification, giving the method greater practical and application value.

Description

Time series abnormal mode detection method based on entropy characteristics and neural network
Technical Field
The invention relates to the prediction of compound thermodynamic disasters in coal mines, and in particular to a time-series abnormal pattern detection method based on entropy features and a neural network. It belongs to the field of emergency safety.
Background
Coal, as a primary energy source, holds an irreplaceable position in China's energy structure. The area left behind after coal is mined is called a goaf. Ventilation in a goaf is poor and a considerable amount of residual coal remains; its continuous oxidation produces combustible gases, which can easily lead to coal mine thermodynamic disasters such as spontaneous coal combustion and gas explosions. The concentration of the released combustible gas follows a certain regularity over time. If the inflection points of the monitoring data at different stages are detected effectively, a large change in gas concentration can be taken to mean that the series has entered an abnormal pattern, which indicates that a disaster such as spontaneous coal combustion may occur. The amount of gas produced differs from mine to mine, so using the gas content value alone as the criterion for a disaster causes large errors when the criterion is applied to other mines. Detecting abnormal patterns instead can improve the generalization of disaster judgment and offers a new approach to detecting compound coal mine disasters.
As research on artificial intelligence theory has deepened, applying time-series prediction methods to coal and gas prediction has become a new trend. Researchers have introduced such prediction into the quantitative evaluation and analysis of coal and gas disasters, drawing on computer technology, support vector machines, artificial neural networks, and related theory. However, these prediction methods are difficult to apply to complex data: they easily fall into local minima, are prone to overfitting, have low accuracy, and are subject to considerable limitations.
With advances in information technology, anomaly detection in time series has become a research focus in recent years. A time-series anomaly generally refers to a run of data that differs significantly from the other data; such an anomaly is not a random deviation but a difference produced by a different underlying mechanism. Detecting abnormal patterns in gas time-series data can provide a theoretical basis for judging coal mine thermodynamic disasters. If an abnormal pattern is present in the time-series data, the trend of the data changes substantially, and the abnormal pattern can serve as a basis for judging that a disaster is occurring.
An existing method (CN201910809956.9) uses a GAN for time-series anomaly detection. It builds the anomaly detection model mainly from an optimized GAN generator and discriminator, and uses the generation residual and the discrimination loss output by the model as the basis for identifying abnormal data. However, most time series do not change markedly, so when they are fed directly to the GAN as input the features are not distinctive enough. Moreover, how to derive a more effective criterion from the generation residual and discrimination loss output by the model, and how to improve the accuracy and generality of anomaly judgment, remain to be studied.
Disclosure of Invention
The invention aims to provide a time-series abnormal pattern detection method based on entropy features and a neural network. The method is divided into four stages: extracting a second-order difference-rate sample entropy feature sequence from the time series in the training data set; training a generative adversarial network (GAN) model to obtain a generator and a corresponding discriminator; computing the anomaly score of the feature sequence and constructing a threshold; and judging whether the input data to be detected is abnormal according to the threshold. Specifically, the method comprises the following steps:
A. Extract a second-order difference-rate sample entropy feature sequence from the time series in the training data set, implemented as follows:
A1. Divide the training data set into two sets, denoted training data set 1 and training data set 2;
training data set 1 consists entirely of normal data, and training data set 2 contains both normal and abnormal data;
A2. The time series of training data set 1,
Figure BDA0002730598460000021
is segmented by sliding a window of size w with step d according to Formula 1, giving a sequence segment set W of length L, where the i-th time-series segment is denoted $s_i$:
$s_i = [x_{1+(i-1)d}, x_{2+(i-1)d}, \ldots, x_{1+(i-1)d+w}]$ (Formula 1)
where $T_{train}$ denotes the number of time-series points in the training data set, and $1 \times T_{train}$ denotes the dimension of the training data set time series;
A3. Perform the difference-rate operation on each sequence segment in the set W to obtain the second-order difference-rate sequences of all sequence segments, implemented as follows:
A3.1. For sequence segment $s_i$, compute the second-order difference-rate sequence $G = \{g_1, g_2, \ldots, g_{w'}\}$ using Formula 2, and compute its standard deviation std;
Figure BDA0002730598460000022
where
Figure BDA0002730598460000023
is the e-th order difference value at time point u, and
Figure BDA0002730598460000024
is the e-th order difference value at time point u-1;
A3.2. Divide the second-order difference-rate sequence of w' data points into subsequences of m time-series data points each, for a total of w' - m + 1 subsequence segments, denoted $K_{2i} = \{q_1, q_2, \ldots, q_{w'-m+1}\}$;
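The windowing in steps A2 and A3.2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `sliding_segments` and `subsequences` are hypothetical, each segment is assumed to hold w consecutive points, and the second-order difference-rate computation of Formula 2 is left out because the formula is not recoverable from this text.

```python
import numpy as np

def sliding_segments(x, w, d):
    """Cut a 1-D series into windows of length w, one window every d points (step A2)."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - w + 1, d)
    return np.stack([x[s:s + w] for s in starts])

def subsequences(g, m):
    """Split a sequence of w' points into its w' - m + 1 overlapping length-m pieces (step A3.2)."""
    g = np.asarray(g, dtype=float)
    return np.stack([g[a:a + m] for a in range(len(g) - m + 1)])

# Toy series; 345 points with m = 6 mirrors the counts used in the worked example.
series = np.arange(20.0)
segments = sliding_segments(series, w=10, d=1)
pieces = subsequences(np.arange(345.0), m=6)
print(segments.shape, pieces.shape)  # (11, 10) (340, 6)
```

With 345 points and m = 6 the sketch yields 345 - 6 + 1 = 340 subsequences, which matches the count w' - m + 1 stated in step A3.2.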
A4. Extract sample entropy features from the second-order difference-rate sequences of all sequence segments to obtain their second-order difference-rate sample entropy feature sequences, implemented as follows:
A4.1. Compute the distance $D[q_a, q_b]$ between any two subsequence segments $q_a$ and $q_b$; the distance is the maximum difference between elements at corresponding positions in the two segments;
A4.2. For subsequence segment $q_a$, obtain by Formula 3 the similarity probability of the subsequences whose distance to it is smaller than the threshold, and obtain by Formula 4 the average similarity probability of the second-order difference-rate sequence;
Figure BDA0002730598460000031
Figure BDA0002730598460000032
where r is the similarity threshold;
A4.3. Following steps A4.1-A4.2, recompute the average similarity probability $B^{m+1}(r)$ with m + 1 as the subsequence length, and obtain the second-order difference-rate sample entropy feature SE by Formula 5;
Figure BDA0002730598460000033
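Formulas 3 to 5 are not legible in this text, so the sketch below falls back on the widely used textbook definition of sample entropy, which follows the same outline as steps A4.1-A4.3 (maximum-difference distance, similarity probability for lengths m and m + 1, negative log ratio). It is an assumption that the patent's formulas match this form exactly; in particular, the sketch uses the same n - m templates for both lengths, whereas step A3.2 counts w' - m + 1 subsequences. The function name `sample_entropy` and the choice of r are hypothetical.

```python
import numpy as np

def sample_entropy(g, m, r):
    """Textbook sample entropy of a 1-D sequence g (assumed form; requires r > 0)."""
    g = np.asarray(g, dtype=float)
    n = len(g)

    def similar_fraction(length):
        # Use the same n - m templates for both lengths, as in the usual definition.
        t = np.stack([g[a:a + length] for a in range(n - m)])
        # Distance = maximum difference of elements at corresponding positions (step A4.1).
        dist = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)
        hits = np.sum(dist < r) - len(t)  # remove the self-matches on the diagonal
        return hits / (len(t) * (len(t) - 1))

    # If no length-(m + 1) templates match, the result is inf.
    return -np.log(similar_fraction(m + 1) / similar_fraction(m))

regular = np.tile([0.0, 1.0], 25)                     # perfectly repetitive sequence
noise = np.random.default_rng(0).normal(size=200)     # irregular sequence
se_regular = sample_entropy(regular, m=2, r=0.5)
se_noise = sample_entropy(noise, m=2, r=0.2 * noise.std())
print(se_regular, se_noise)
```

The repetitive sequence gives an entropy of zero, while the noisy one gives a positive value, which is the property the method relies on to make irregular (abnormal) stretches stand out. Setting r to a fraction of the standard deviation is a common convention, not something the source states.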
A5. Apply segment-average preprocessing to the difference-rate sample entropy sequence to obtain a new difference-rate sample entropy sequence, implemented as follows:
A5.1. Starting from $X_t$ ($t = 1, 2, \ldots, t-w$), extract a sequence segment $S_t = \{X_t, X_{t+1}, \ldots, X_{w+t-1}\}_{1 \times t}$ of length w, sum it according to Formula 6, and then average it according to Formula 7;
$sum_t = X_t + X_{t+1} + \cdots + X_{w+t-1}$ (Formula 6)
$sum_t = sum_t / w$ (Formula 7)
A5.2. Repeat step A5.1 to extract t-w sequence segments in total, and assemble the values $sum_t$ into a new difference-rate sample entropy sequence $S_t' = \{sum_1, sum_2, \ldots, sum_{t-w}\}_{1 \times t}$;
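Step A5 amounts to a moving average over the entropy sequence. A minimal sketch follows, assuming a window of length w moved one point at a time and t - w output values as the text states; the function name `segment_average` is hypothetical.

```python
import numpy as np

def segment_average(x, w):
    """Average each length-w window of x, producing t - w values as in step A5.2."""
    x = np.asarray(x, dtype=float)
    t = len(x)
    # Formula 6 (sum over the window) followed by Formula 7 (divide by w).
    return np.array([x[s:s + w].sum() / w for s in range(t - w)])

entropy_seq = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
smoothed = segment_average(entropy_seq, w=3)
print(smoothed)  # [2. 3. 4.]
```

A full sliding window would give t - w + 1 values; the sketch stops one short only to match the count given in the text.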
B. Train a generative adversarial network (GAN) model to obtain a generator and a corresponding discriminator, implemented as follows:
B1. Randomly sample noise data $Z = \{z_i, i = 1, 2, \ldots, n\}$, where n is the number of samples. The generator model G consists of multiple LSTM memory cells; set the number of memory cells, input Z into the generator model G, and generate reconstructed sample sequence data G(Z);
B2. Input the new difference-rate sample entropy sequence $S_t'$ and the generated reconstructed sample sequence data G(Z) into the constructed discriminator model D;
B3. According to the value of the loss function, update the discriminator parameters using the stochastic gradient descent algorithm, and then update the generator parameters from the noise data using the Adam optimization algorithm;
B4. Save the model parameters and repeat steps B1-B3 in a loop, finally obtaining a trained generator model G that can generate normal time series and a corresponding discriminator model D;
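The text does not state which loss function is used in step B3. For orientation only, the standard GAN minimax objective, which alternating discriminator and generator updates of this kind usually optimize, is shown below; whether the patent uses exactly this form is an assumption.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

Here x would correspond to the entropy sequence $S_t'$ and z to the sampled noise Z.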
C. Compute the anomaly score of the feature sequence and construct a threshold, implemented as follows:
C1. For the time series in training data set 2,
Figure BDA0002730598460000041
repeat steps A2-A5 to extract features and obtain a new feature sequence
Figure BDA0002730598460000042
C2. Randomly sample noise data $Z_{val}$, input it into the trained generator G to generate a reconstructed sample $G(Z_{val})$, and compute the generation anomaly score $R_{score}$ of the input sample from the generation error, implemented as follows:
C2.1. Compute the absolute error E between the reconstructed sample $G(Z_{val})$ of length n and the new feature sequence of training data set 2,
Figure BDA0002730598460000043
sort the elements of E from small to large to obtain the sorted absolute error $E_i' = \{e'_1, e'_2, \ldots, e'_n\}$, and compute the mean M of $E_i'$;
C2.2. Compare the elements of $E_i'$ with the mean M and take out the elements $\{e'_k, e'_{k+1}, \ldots, e'_n\}$ of $E_i'$ that are greater than M, n-k+1 in number; initialize the weight sequence $W_i' = \{w'_1, w'_2, \ldots, w'_n\}^T$, $w'_{1 \sim n-2}$; let the weight $w'_n$ corresponding to $x'_n$ be $\lambda$ and the weight $w'_{n-1}$ corresponding to $x'_{n-1}$ be $1-\lambda$; then update the magnitudes of the elements in $W_i'$ by Formula 8;
Figure BDA0002730598460000044
C2.3. Using the updated weight sequence $W_i'$ and the sorted error $E_i'$, compute the generation anomaly score $R_{score}$ of training data set 2 by Formula 9:
$R_{score} = E_i' \cdot W_i'$ (Formula 9)
C3. Using the discriminator D trained in step B, output the similarity probability P between the generated sample and the new feature sequence
Figure BDA0002730598460000046
and compute the discriminant anomaly score $D_{score}$ as 1 - P;
C4. Using the discriminant anomaly score $D_{score}$ and the generation anomaly score $R_{score}$, compute the anomaly score O by Formula 10, and establish a threshold from training data set 2, implemented as follows:
$O = W_D \times D_{score} + W_G \times R_{score}$ (Formula 10)
where $W_D$ and $W_G$ are the weights of the discriminant anomaly score and the sample generation anomaly score, respectively;
C4.1. Take the maximum and minimum anomaly scores in the results for training data set
Figure BDA0002730598460000045
as the upper and lower boundaries, divide the range between them evenly, and compute the anomaly score of the q-th segment for training data set 2 by Formula 11;
Figure BDA0002730598460000051
C4.2. Take the anomaly score corresponding to the maximum F1 score as the threshold; F1 is computed as in Formula 12;
Figure BDA0002730598460000052
Figure BDA0002730598460000053
where Pre (precision) is the proportion of samples predicted positive that are actually positive, and Rec (recall) is the proportion of actual positive samples that are predicted positive; TP is the number of positive samples predicted positive by the model; FP is the number of negative samples predicted positive by the model; FN is the number of positive samples predicted negative by the model;
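Steps C4 to C4.2 can be sketched as below: the two scores are combined per Formula 10, and the candidate thresholds between the minimum and maximum score are scanned for the best F1. Several details are assumptions rather than the patent's own choices: equal weights of 0.5 for $W_D$ and $W_G$, 100 evenly spaced sections in place of Formula 11 (which is not legible here), the standard precision/recall/F1 definitions, and the function names `combined_score`, `f1_score` and `best_threshold`.

```python
import numpy as np

def combined_score(d_score, r_score, w_d=0.5, w_g=0.5):
    """Formula 10: O = W_D * D_score + W_G * R_score (the weight values are assumed)."""
    return w_d * np.asarray(d_score, dtype=float) + w_g * np.asarray(r_score, dtype=float)

def f1_score(labels, predicted):
    """Standard F1 from boolean arrays; True marks an abnormal (positive) sample."""
    tp = np.sum(predicted & labels)
    fp = np.sum(predicted & ~labels)
    fn = np.sum(~predicted & labels)
    if tp == 0:
        return 0.0
    pre, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * pre * rec / (pre + rec)

def best_threshold(scores, labels, sections=100):
    """Scan evenly spaced candidates between min and max score; keep the best F1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    candidates = np.linspace(scores.min(), scores.max(), sections + 1)
    f1s = [f1_score(labels, scores > c) for c in candidates]
    best = int(np.argmax(f1s))
    return candidates[best], f1s[best]

# Toy scores for four samples, the last two labelled abnormal.
d_score = [0.1, 0.2, 0.8, 0.9]
r_score = [0.1, 0.3, 0.7, 0.9]
labels = [False, False, True, True]
o = combined_score(d_score, r_score)
threshold, f1 = best_threshold(o, labels)
print(threshold, f1)
```

On this toy data the two classes are separable, so the scan finds a threshold between the normal and abnormal scores with an F1 of 1.0; real data would generally give a lower maximum.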
D. Judge whether the input data to be detected is abnormal according to the threshold, implemented as follows:
D1. Input the time series of the data set to be detected,
Figure BDA0002730598460000054
repeat steps A1-A5 to extract the difference-rate sample entropy features, and obtain a new time series
Figure BDA0002730598460000055
D2. Repeat steps C1-C4: input
Figure BDA0002730598460000056
into the trained generative adversarial network and compute the anomaly score $O_{real}$ of the data to be detected using Formula 10;
D3. Compare the computed anomaly score $O_{real}$ with the threshold obtained in step C. If the anomaly score is greater than the threshold, the data to be detected is judged to contain an abnormal pattern; otherwise it is judged not to contain one.
The advantages of the method are as follows: extracting features from the time-series data with the difference-rate sample entropy makes abnormal patterns more distinct, and the new anomaly score calculation improves the accuracy and generalization of time-series abnormal pattern detection, giving the method greater practical and application value.
Drawings
FIG. 1: general flow chart of abnormal pattern detection
Detailed Description
The invention is further described below by way of example, with reference to the accompanying drawing. CO time-series experimental data are used, and the time-series abnormal pattern detection method based on difference-rate entropy features and a generative adversarial network is described in terms of the amount of time-series data, the input and output dimensions, and so on.
The overall flow of the method is shown in FIG. 1. The method comprises the following steps: 1) extracting a second-order difference-rate sample entropy feature sequence from the time series in the training data set; 2) training a generative adversarial network model to obtain a generator and a corresponding discriminator; 3) computing the anomaly score of the feature sequence and constructing a threshold; 4) judging whether the input data to be detected is abnormal according to the threshold. The invention is further described below by way of example, following these steps:
A. Extract a second-order difference-rate sample entropy feature sequence from the time series in the training data set, implemented as follows:
A1. Select the experimental data. The object of study is a one-dimensional time series of CO gas concentration. Select a training data set and divide it into two sets, denoted training data set 1 and training data set 2;
training data set 1 consists entirely of normal data, and training data set 2 contains both normal and abnormal data;
A2. Segment training data set 1, which consists entirely of normal data, by sliding a window of size 10 with a step of 1;
A3. Perform the difference-rate operation on each sequence segment in the sequence segment set to obtain the second-order difference-rate sequences of all sequence segments, implemented as follows:
A3.1. The CO gas concentration sequence contains 348 data points in total. Using Formula 2,
Figure BDA0002730598460000061
the second-order difference-rate sequence $G = \{g_1, g_2, \ldots, g_{w'}\}$, containing 345 data points, is obtained, and its standard deviation std is found to be 0.11. Part of the data is as follows:
Figure BDA0002730598460000062
Figure BDA0002730598460000071
A3.2. Divide the second-order difference-rate sequence of 345 data points into subsequences of 6 time-series data points each, giving 340 subsequences in total, denoted $K_{2i} = \{q_1, q_2, \ldots, q_{w'-m+1}\}$. Part of the data is as follows:
Figure BDA0002730598460000072
A4. Extract sample entropy features from the second-order difference-rate sequences of all sequence segments to obtain their second-order difference-rate sample entropy feature sequences, implemented as follows:
A4.1. Compute the second-order difference-rate sample entropy feature of each sequence segment to finally obtain the complete second-order difference-rate sample entropy sequence. Part of the data is as follows:
Figure BDA0002730598460000073
A5. Apply segment-average preprocessing to the difference-rate sample entropy sequence to obtain a new difference-rate sample entropy sequence, implemented as follows:
A5.1. Starting from $X_t$ ($t = 1, 2, \ldots, t-w$), extract a sequence segment $S_t = \{X_t, X_{t+1}, \ldots, X_{w+t-1}\}_{1 \times t}$ of length w, sum it, and then take the average;
A5.2. Repeat step A5.1 to extract t-w sequence segments in total, and assemble the values $sum_t$ into a new sequence $S_t' = \{sum_1, sum_2, \ldots, sum_{t-w}\}_{1 \times t}$. Part of the data is as follows:
Figure BDA0002730598460000081
B. Train a generative adversarial network (GAN) model to obtain a generator and a corresponding discriminator, implemented as follows:
B1. Randomly sample noise data $Z = \{z_i, i = 1, 2, \ldots, n\}$, where n is 330. The generator model consists of multiple LSTM memory cells; set the number of memory cells, input Z into the constructed generator model, and generate reconstructed sample sequence data G(Z);
B2. Input the new difference-rate sample entropy sequence $S_t'$ and the generated reconstructed sample sequence data G(Z) into the constructed discriminator model D. Part of the parameter data is as follows:
Figure BDA0002730598460000082
B3. According to the value of the loss function, update the discriminator parameters using the stochastic gradient descent algorithm, and then update the generator parameters from the noise data using the Adam optimization algorithm;
B4. Save the model parameters and return to B2, iterating the loop 1000 times with the learning rate set to 0.1, to finally obtain the trained generator model G and discriminator model D;
C. Compute the anomaly score of the feature sequence and construct a threshold, implemented as follows:
C1. First repeat steps A2-A5 on the time series of training data set 2, which contains both normal and abnormal data,
Figure BDA0002730598460000091
and extract features to obtain a new feature sequence
Figure BDA0002730598460000092
Part of the data is as follows:
Figure BDA0002730598460000093
C2. Compute the anomaly score O from the discriminant anomaly score $D_{score}$ and the sample generation anomaly score $R_{score}$;
C2.1. Take the maximum and minimum anomaly scores in the results for training data set
Figure BDA0002730598460000094
as the upper and lower boundaries, and divide the range between them evenly to obtain the anomaly score of the q-th segment for training data set 2:
Figure BDA0002730598460000095
C2.2. The maximum F1 score is 0.8916; the corresponding anomaly score O is taken as the threshold, giving a threshold of 0.375;
D. Judge whether the input data to be detected is abnormal according to the threshold, implemented as follows:
D1. Input the time-series samples of the data set to be detected,
Figure BDA0002730598460000096
repeat steps A2-A5 to extract the difference-rate sample entropy features, and obtain a new time series
Figure BDA0002730598460000097
Part of the data is as follows:
Figure BDA0002730598460000098
D2. Repeat steps C1-C4: input
Figure BDA0002730598460000099
into the trained generative adversarial network and compute the anomaly score $O_{real}$ of the actual data sample, which is 0.572;
D3. Compare the computed anomaly score $O_{real}$ with the threshold computed in step C. If the anomaly score is greater than the threshold, the sample is judged to be abnormal. The results for the full set of samples are as follows:
Figure BDA0002730598460000101
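The decision in step D3 is a single comparison. As a small worked check using the two numbers reported in this example (threshold 0.375, anomaly score 0.572), with the hypothetical helper name `contains_abnormal_pattern`:

```python
def contains_abnormal_pattern(score, threshold):
    """Step D3: a sample is abnormal when its anomaly score exceeds the threshold."""
    return score > threshold

threshold = 0.375   # threshold reported in step C2.2 of the example
o_real = 0.572      # anomaly score reported in step D2 of the example
is_abnormal = contains_abnormal_pattern(o_real, threshold)
print(is_abnormal)  # True
```

Since 0.572 is greater than 0.375, the sample in the example is judged to contain an abnormal pattern.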
The method implements time-series abnormal pattern detection based on difference-rate entropy features and a generative adversarial network. It can detect whether a sequence segment contains an abnormal pattern, and thereby provides a basis for judging the occurrence of coal mine thermodynamic disasters. The new anomaly score calculation improves the accuracy and generalization of model identification, which gives the method greater application value.
Finally, it is noted that the disclosed embodiments are intended to aid further understanding of the invention, but those skilled in the art will appreciate that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to the disclosed embodiments; its scope is defined by the appended claims.

Claims (6)

1. A time series abnormal pattern detection method based on entropy characteristics and a neural network comprises the following steps:
A. the method specifically comprises the following steps of extracting a second-order difference ratio sample entropy characteristic sequence from a time sequence in a training data set:
A1. dividing a training data set into two sets which are respectively marked as a training data set 1 and a training data set 2;
all the training data set 1 is normal data, and the training data set 2 comprises normal data and abnormal data;
A2. for training data set 1 time series
Figure FDA0002730598450000011
The window size W and the step length d are used for sliding segmentation to obtain a sequence segment set W with the length of L, wherein the ith time sequence segment is marked as siThe calculation formula is as follows:
si=[x1+(i-1)d,x2+(i-1)d,…,x1+(i-1)d+w]
the T istrain1 × T representing the number of time series of training data setstrainRepresenting a training data set time series dimension;
A3. performing difference rate operation on each sequence segment in the sequence segment set W to obtain a second-order difference rate sequence of all the sequence segments;
A4. carrying out sample entropy feature extraction on the second-order difference rate sequences of all the sequence segments to obtain second-order difference rate sample entropy feature sequences of all the sequence segments;
A5. carrying out sectional average pretreatment on the difference rate sample entropy sequence to obtain a difference rate sample entropy sequence;
B. training a generated confrontation network model to obtain a generator and a corresponding discriminator, and the specific implementation is as follows:
B1. randomly sampled noisy data Z ═ ZiI is 1,2, …, n, where n corresponds to the number of samples. The generator model G is a plurality of LSTM memory units, the number of the memory units is set, Z is input into the generator model G, and reconstructed sample sequence data G (Z) is generated;
B2. entropy sequencing S of new difference ratio samplest' inputting the generated reconstructed sample sequence data G (Z) into a built discriminator model D;
B3. updating the model parameters by using a random gradient descent algorithm according to the value of the loss function, updating the parameters of the discriminator, and then updating the parameters of the generator according to the noise data by using an Adam optimization algorithm;
B4. saving the model parameters, repeating the steps B1-B3 to carry out loop iteration, and finally obtaining a trained generator model G capable of generating a normal time sequence and a corresponding discriminator model D;
C. calculating the abnormal score of the characteristic sequence and constructing a threshold, wherein the method is specifically realized as follows:
C1. using time series in training data set 2
Figure FDA0002730598450000012
Repeating the steps A2-A5, and extracting the features to obtain a new feature sequence
Figure FDA0002730598450000013
C2. Randomly sampling noise data ZvalInputting the data into a generator G which is completed in training, and generating a reconstructed sample G (Z)val) Calculating the abnormal score R of the input sample by using the generated errorscore
C3. Outputting a generated sample and a new characteristic sequence by using the discriminator D trained in the step B
Figure FDA0002730598450000021
The similarity probability P of (2), calculating the discriminant anomaly score DscoreIs 1-P;
C4. Use the discriminant anomaly score D_score and the generation anomaly score R_score to calculate the anomaly score O, and establish a threshold according to training data set 2. The calculation formula is:
O = W_D × D_score + W_G × R_score
where W_D and W_G are the weights of the discriminant anomaly score and the sample generation anomaly score, respectively;
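Steps C3 and C4 can be illustrated with a minimal Python sketch. The function names and the default weights of 0.5 are assumptions made for illustration only; the claims do not state concrete values for W_D and W_G.

```python
def discriminant_score(p):
    """D_score = 1 - P, where P is the discriminator's similarity probability (step C3)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must be a probability in [0, 1]")
    return 1.0 - p


def combined_anomaly_score(d_score, r_score, w_d=0.5, w_g=0.5):
    """O = W_D * D_score + W_G * R_score (step C4).

    The default weights are hypothetical; the source does not give their values.
    """
    return w_d * d_score + w_g * r_score


# Hypothetical inputs: similarity probability 0.9 and generation score 0.2.
o = combined_anomaly_score(discriminant_score(0.9), r_score=0.2)
```

A sample that the discriminator considers very similar to normal data (P close to 1) contributes little through D_score, so in that case O is driven mainly by the generation error term.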
D. Judge whether the input data to be detected is abnormal according to the threshold, specifically comprising the following steps:
D1. For the input time series of the data set to be detected
Figure FDA0002730598450000022
repeat steps A1-A5 to extract the difference ratio sample entropy features and obtain a new time series
Figure FDA0002730598450000023
D2. Repeat steps C1-C4, inputting
Figure FDA0002730598450000024
into the trained generative adversarial network, and calculate the anomaly score O_real of the data to be detected using formula 10;
D3. Compare the calculated anomaly score O_real with the threshold obtained in step C. If the anomaly score is greater than the threshold, the data to be detected is judged to contain an abnormal pattern; otherwise, it is judged not to contain an abnormal pattern.
2. The method for detecting the abnormal pattern of the time series based on the entropy features and the neural network as claimed in claim 1, wherein the difference ratio operation is performed on each sequence segment in the sequence segment set W to obtain the second-order difference ratio sequences of all the sequence segments, specifically implemented as follows:
A3.1. For sequence segment s_i, calculate the second-order difference ratio sequence G = {g_1, g_2, …, g_w'} and compute its standard deviation std. The calculation formula is:
Figure FDA0002730598450000025
where
Figure FDA0002730598450000026
is the e-th order difference value at time point u, and
Figure FDA0002730598450000027
is the e-th order difference value at time point u-1;
A3.2. Divide the second-order difference ratio sequence of w' data points into subsequence segments of m time-series data points each, giving w' - m + 1 subsequence segments in total, denoted K2i = {q_1, q_2, …, q_(w'-m+1)}.
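The segmentation in step A3.2 can be sketched in Python as overlapping windows with a stride of one, which is the reading consistent with the stated count of w' - m + 1 segments. The function name and the sample values are hypothetical.

```python
def split_into_subsequences(g, m):
    """Return all length-m windows of g; a sequence of w' points yields w' - m + 1 windows."""
    if m < 1 or m > len(g):
        raise ValueError("m must satisfy 1 <= m <= len(g)")
    return [g[a:a + m] for a in range(len(g) - m + 1)]


# Hypothetical second-order difference ratio values (w' = 5) split with m = 2.
g = [0.1, 0.4, 0.2, 0.5, 0.3]
k2 = split_into_subsequences(g, m=2)
```

With w' = 5 and m = 2 this produces 4 segments, matching w' - m + 1.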
3. The method for detecting the abnormal pattern of the time series based on the entropy characteristics and the neural network as claimed in claim 1, wherein sample entropy features are extracted from the second-order difference ratio sequences of all the sequence segments to obtain the second-order difference ratio sample entropy feature sequences of all the sequence segments, specifically implemented as follows:
A4.1. Calculate the distance D[q_a, q_b] between any two subsequence segments q_a and q_b; the distance is determined by the maximum difference between the elements at corresponding positions in the two subsequence segments;
A4.2. Calculate the similarity probability between subsequence segment q_a and the remaining subsequence segments. The proportion of subsequence segments whose distance is smaller than a threshold, and the average similarity probability of the second-order difference ratio sequence, are used to compute the second-order difference ratio sample entropy. The calculation formulas are:
Figure FDA0002730598450000031
Figure FDA0002730598450000032
where r is the similarity threshold;
A4.3. Following steps A4.1-A4.2, recalculate the average similarity probability B^(m+1)(r) with m + 1 as the subsequence length, and calculate the second-order difference ratio sample entropy feature SE as follows:
Figure FDA0002730598450000033
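Steps A4.1-A4.3 can be sketched in Python as follows. Because the formulas in the figures are not reproduced in the text, the sketch assumes the conventional sample entropy form SE = -ln(B^(m+1)(r) / B^m(r)); that form, the function names, and the choice to use every window of each length are assumptions rather than the patent's exact definition.

```python
import math


def max_distance(qa, qb):
    """Largest absolute difference between elements at the same position (step A4.1)."""
    return max(abs(x - y) for x, y in zip(qa, qb))


def average_similarity(g, m, r):
    """Average over windows q_a of the share of other windows closer than r (step A4.2)."""
    windows = [g[a:a + m] for a in range(len(g) - m + 1)]
    n = len(windows)
    if n < 2:
        raise ValueError("need at least two windows of length m")
    total = 0.0
    for a in range(n):
        close = sum(1 for b in range(n)
                    if b != a and max_distance(windows[a], windows[b]) < r)
        total += close / (n - 1)
    return total / n


def sample_entropy(g, m, r):
    """Assumed form: SE = -ln(B^(m+1)(r) / B^m(r)) (step A4.3)."""
    b_m = average_similarity(g, m, r)
    b_m1 = average_similarity(g, m + 1, r)
    if b_m == 0 or b_m1 == 0:
        raise ValueError("sample entropy is undefined when no windows match")
    return -math.log(b_m1 / b_m)


# Hypothetical alternating sequence with m = 2 and r = 0.5.
se = sample_entropy([0, 1, 0, 1, 0, 1, 0, 1], m=2, r=0.5)
```

A perfectly constant sequence gives an entropy of zero under this definition, while less regular sequences give larger values, which is why the feature is useful for separating normal from abnormal segments.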
4. The method for detecting the abnormal pattern of the time series based on the entropy characteristics and the neural network as claimed in claim 1, wherein the difference ratio sample entropy sequence is subjected to segmented average preprocessing to obtain the new difference ratio sample entropy sequence, specifically implemented as follows:
A5.1. Starting from X_t (t = 1, 2, ..., t-w), extract a sequence segment S_t = {X_t, X_(t+1), ..., X_(w+t-1)}_(1×t) of length w, sum its elements, and then average them. The calculation formulas are:
sum_t = X_t + X_(t+1) + ... + X_(w+t-1)
sum_t = sum_t / w;
A5.2. Repeat step A4.1 to take out t-w sequence segments in total, and combine the sum_t values to form the new difference ratio sample entropy sequence S_t' = {sum_1, sum_2, …, sum_(t-w)}_(1×t).
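The segmented averaging of steps A5.1-A5.2 amounts to a sliding mean, sketched below in Python. The function name is hypothetical, and the sketch follows the text literally in producing t - w averaged values for an input of length t.

```python
def segment_means(x, w):
    """Mean of each length-w window, for the first len(x) - w start positions (steps A5.1-A5.2)."""
    t = len(x)
    if not 0 < w < t:
        raise ValueError("w must satisfy 0 < w < len(x)")
    return [sum(x[s:s + w]) / w for s in range(t - w)]


# Hypothetical entropy sequence of length t = 5 averaged with window w = 2.
smoothed = segment_means([1, 2, 3, 4, 5], w=2)
```

Averaging over windows smooths short fluctuations in the entropy sequence before it is passed to the discriminator.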
5. The entropy feature and neural network-based time series abnormal pattern detection method of claim 1, wherein noise data Z_val is randomly sampled and input into the trained generator G to generate a reconstructed sample G(Z_val), and the generation anomaly score R_score of the input sample is calculated from the generation error, specifically implemented as follows:
C2.1. For the reconstructed sample G(Z_val) of length n and the new feature sequence of training data set 2
Figure FDA0002730598450000041
sort the elements of the absolute error E in ascending order to obtain the sorted absolute error E_i' = {e'_1, e'_2, …, e'_n}, and compute the mean M of the sorted absolute error E_i';
C2.2. Compare the elements of E_i' with the mean M and take out the data elements {e'_k, e'_(k+1), …, e'_n} of E_i' that are larger than M, n-k+1 in number. Initialize the weight sequence W_i' = {w'_1, w'_2, …, w'_n}^T, w'_(1~n-2); set the weight w'_n corresponding to x'_n to λ and the weight w'_(n-1) corresponding to x'_(n-1) to 1-λ, then update the elements of the weight sequence W_i'. The calculation formula is:
Figure FDA0002730598450000042
C2.3. Use the updated weight sequence W_i' and the sorted sample E_i' to calculate the generation anomaly score R_score of training sample set 2. The calculation formula is:
R_score = E_i' · W_i'.
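The sorting, mean comparison, and dot product of steps C2.1-C2.3 can be sketched in Python. The weight update formula appears only in a figure, so the sketch takes the weights as an input. The example weights (zero for all but the two largest errors, with λ = 0.75) and all function names are hypothetical.

```python
def sorted_abs_errors(reconstructed, features):
    """Absolute errors between two equal-length sequences, sorted ascending (step C2.1)."""
    if len(reconstructed) != len(features):
        raise ValueError("sequences must have the same length")
    return sorted(abs(g - s) for g, s in zip(reconstructed, features))


def elements_above_mean(sorted_errors):
    """Return the mean M and the elements larger than M (step C2.2)."""
    mean = sum(sorted_errors) / len(sorted_errors)
    return mean, [e for e in sorted_errors if e > mean]


def generation_score(sorted_errors, weights):
    """R_score as the dot product of sorted errors and weights (step C2.3)."""
    if len(sorted_errors) != len(weights):
        raise ValueError("errors and weights must have the same length")
    return sum(e * w for e, w in zip(sorted_errors, weights))


errors = sorted_abs_errors([1.0, 2.0, 4.0], [1.5, 1.0, 1.0])
mean, above = elements_above_mean(errors)
# Hypothetical weights: lambda = 0.75 on the largest error, 1 - lambda on the next.
r_score = generation_score(errors, [0.0, 0.25, 0.75])
```

Placing most of the weight on the largest errors makes R_score sensitive to the points that the generator reconstructs worst, rather than to the average error.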
6. The entropy feature and neural network-based time series abnormal pattern detection method of claim 1, wherein the discriminant anomaly score D_score and the generation anomaly score R_score are used to calculate the anomaly score O by equation 10, and a threshold is established according to training data set 2, specifically implemented as follows:
C4.1. Take the maximum and minimum anomaly scores in the results for the training data set
Figure FDA0002730598450000043
as the upper and lower boundaries, divide the range between them evenly, and calculate the q-th anomaly score for training data set 2. The calculation formula is:
Figure FDA0002730598450000044
C4.2. Use the anomaly score corresponding to the maximum F1 score as the threshold. F1 is calculated as follows:
Figure FDA0002730598450000045
Figure FDA0002730598450000046
where Pre is the proportion of true positive samples among all samples predicted to be positive, and Rec is the proportion of positive samples predicted to be positive among all positive samples. TP is the number of positive samples predicted to be positive by the model; FP is the number of negative samples predicted to be positive by the model; FN is the number of positive samples predicted to be negative by the model.
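The threshold search of steps C4.1-C4.2 can be sketched in Python as a scan over evenly spaced candidates. The sketch assumes the usual definitions Pre = TP / (TP + FP), Rec = TP / (TP + FN), and F1 = 2 × Pre × Rec / (Pre + Rec), with linear spacing of candidates between the minimum and maximum scores. The number of divisions, the function names, and the sample data are hypothetical.

```python
def f1_score(scores, labels, threshold):
    """F1 when samples with score > threshold are predicted abnormal (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    if tp == 0:
        return 0.0
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * pre * rec / (pre + rec)


def select_threshold(scores, labels, steps=10):
    """Return the evenly spaced candidate with the highest F1 (first one on ties)."""
    lo, hi = min(scores), max(scores)
    candidates = [lo + q * (hi - lo) / steps for q in range(steps + 1)]
    return max(candidates, key=lambda c: f1_score(scores, labels, c))


# Hypothetical anomaly scores with ground-truth labels (1 = abnormal).
scores = [1.0, 2.0, 3.0, 8.0, 9.0]
labels = [0, 0, 0, 1, 1]
threshold = select_threshold(scores, labels, steps=8)
```

When several candidates reach the same F1, this sketch keeps the lowest one; a different tie-breaking rule would also be consistent with the text.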
CN202011116876.4A 2020-10-19 2020-10-19 Time sequence abnormal mode detection method based on entropy characteristics and neural network Active CN112257917B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011116876.4A CN112257917B (en) 2020-10-19 2020-10-19 Time sequence abnormal mode detection method based on entropy characteristics and neural network

Publications (2)

Publication Number Publication Date
CN112257917A (en) 2021-01-22
CN112257917B (en) 2023-05-12

Family

ID=74244702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011116876.4A Active CN112257917B (en) 2020-10-19 2020-10-19 Time sequence abnormal mode detection method based on entropy characteristics and neural network

Country Status (1)

Country Link
CN (1) CN112257917B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001092990A2 (en) * 2000-06-01 2001-12-06 Variagenics, Inc. Structure-based methods for assessing amino acid variances
CN103886405A (en) * 2014-02-20 2014-06-25 东南大学 Boiler combustion condition identification method based on information entropy characteristics and probability nerve network
CN109035488A (en) * 2018-08-07 2018-12-18 哈尔滨工业大学(威海) Aero-engine time series method for detecting abnormality based on CNN feature extraction
CN110071913A (en) * 2019-03-26 2019-07-30 同济大学 A kind of time series method for detecting abnormality based on unsupervised learning
CN110211114A (en) * 2019-06-03 2019-09-06 浙江大学 A kind of scarce visible detection method of the vanning based on deep learning
CN110598851A (en) * 2019-08-29 2019-12-20 北京航空航天大学合肥创新研究院 Time series data abnormity detection method fusing LSTM and GAN

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127705A (en) * 2021-04-02 2021-07-16 西华大学 Heterogeneous bidirectional generation countermeasure network model and time sequence anomaly detection method
CN113127705B (en) * 2021-04-02 2022-08-05 西华大学 Heterogeneous bidirectional generation countermeasure network model and time sequence anomaly detection method
CN114386454A (en) * 2021-12-09 2022-04-22 首都医科大学附属北京友谊医院 Medical time sequence signal data processing method based on signal mixing strategy
CN114386454B (en) * 2021-12-09 2023-02-03 首都医科大学附属北京友谊医院 Medical time sequence signal data processing method based on signal mixing strategy
CN114844796A (en) * 2022-04-29 2022-08-02 济南浪潮数据技术有限公司 Method, device and medium for detecting abnormity of time-series KPI
CN115600116A (en) * 2022-12-15 2023-01-13 西南石油大学(Cn) Dynamic detection method, system, storage medium and terminal for time series abnormity

Also Published As

Publication number Publication date
CN112257917B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN112257917B (en) Time sequence abnormal mode detection method based on entropy characteristics and neural network
CN113434357B (en) Log anomaly detection method and device based on sequence prediction
CN110213222B (en) Network intrusion detection method based on machine learning
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN107194524B (en) RBF neural network-based coal and gas outburst prediction method
KR102361423B1 (en) Artificial intelligence system and method for predicting maintenance demand
CN112199670B (en) Log monitoring method for improving IFOREST (entry face detection sequence) to conduct abnormity detection based on deep learning
CN108446714B (en) Method for predicting residual life of non-Markov degradation system under multiple working conditions
CN113505826B (en) Network flow anomaly detection method based on joint feature selection
CN112761628A (en) Shale gas yield determination method and device based on long-term and short-term memory neural network
CN112329974B (en) LSTM-RNN-based civil aviation security event behavior subject identification and prediction method and system
CN111881299B (en) Outlier event detection and identification method based on replicated neural network
CN114281864A (en) Correlation analysis method for power network alarm information
CN113806889A (en) Processing method, device and equipment of TBM cutter head torque real-time prediction model
CN115018512A (en) Electricity stealing detection method and device based on Transformer neural network
Li et al. A rockburst prediction model based on extreme learning machine with improved Harris Hawks optimization and its application
CN114742165A (en) Aero-engine gas circuit performance abnormity detection system based on depth self-encoder
CN110991363B (en) Method for extracting CO emission characteristics of coal mine safety monitoring system in different coal mining processes
Yu et al. Anomaly detection in unstructured logs using attention-based Bi-LSTM network
US20230401454A1 (en) Method using weighted aggregated ensemble model for energy demand management of buildings
CN115017015B (en) Method and system for detecting abnormal behavior of program in edge computing environment
CN115048873B (en) Residual service life prediction system for aircraft engine
CN114826718A (en) Multi-dimensional information-based internal network anomaly detection method and system
CN113326371B (en) Event extraction method integrating pre-training language model and anti-noise interference remote supervision information
CN111967494B (en) Multi-source heterogeneous data analysis method for guard security of large movable public security system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant