CN116340384A

CN116340384A - Kernel recursive maximum correntropy online time series prediction method based on rule evolution

Info

Publication number: CN116340384A
Application number: CN202310105269.5A
Authority: CN
Inventors: 韩敏; 夏慧娟; 梁漪; 胡磊
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2023-02-13
Filing date: 2023-02-13
Publication date: 2023-06-27

Abstract

The invention belongs to the field of time series prediction and provides an online time series prediction method using kernel recursive maximum correntropy based on rule evolution. First, the acquired data are preprocessed by normalization and phase space reconstruction so that the useful information in the data is fully mined. Next, the rule base learns and evolves autonomously under two criteria, the compatibility measure and the wake-up index, which weakens the adverse effects of outliers and complex noise. The model parameters are then updated by combining the kernel recursive maximum correntropy method with a sparsification strategy, which forms a compact dictionary, reduces computational complexity, strengthens the model's ability to track the dynamics of the time series, and improves prediction accuracy. Finally, the trained model is used to predict the test data, verifying its effectiveness. The invention can evolve its structure in an unknown, complex environment, has strong autonomy and robustness, and balances high prediction accuracy against low computational complexity.

Description

Kernel recursive maximum correntropy online time series prediction method based on rule evolution

Technical Field

The invention belongs to the field of time series prediction and relates to an online time series prediction method using kernel recursive maximum correntropy based on rule evolution.

Background

A time series is a set of data arranged in chronological order; such series are widespread in nature, industrial production, financial technology and other fields. With the rapid development of sensors and storage devices, both the scale and the update rate of time series data keep increasing. In environments with bursty data and complex noise, an online prediction model must mine the information hidden in the data while coping well with the nonlinearity, non-stationarity and non-Gaussianity of the time series. Compared with other online prediction models, the kernel adaptive filter offers strong generalization ability, simple iterative updates and low computational complexity, and can effectively solve complex nonlinear prediction problems.

Although kernel adaptive filters are widely used in time series prediction, they still have the following disadvantages. (1) Poor ability to capture the time-varying characteristics of a dynamic system. In the paper "Wu Z, Shi J, Zhang X, et al. Kernel recursive maximum correntropy [J]. Signal Processing, 2015, 117: 11-16", Wu et al. replace the traditional mean square error criterion with the correntropy criterion and propose the kernel recursive maximum correntropy method, which improves model performance under outliers or non-Gaussian noise. However, that model only adjusts its parameters and has no adaptive structural evolution, so it tracks the time-varying characteristics of a time series poorly and its prediction accuracy suffers. (2) High computational complexity from loading the complete dictionary. At each iteration the model must add a corresponding kernel space to store the new data, so the size of the complete dictionary depends on the number of data samples. This strains both computation time and memory and makes the model hard to apply to large-scale time series.

Therefore, the invention takes large-scale time series with time-varying characteristics as its research object and provides an online time series prediction method using kernel recursive maximum correntropy based on rule evolution, which achieves adaptive evolution of the model structure and adaptive updating of the parameters, thereby reducing computational complexity and improving the prediction accuracy for time series. The invention was funded by the National Natural Science Foundation of China (62173063).

Disclosure of Invention

To address the poor online prediction performance and excessive computational cost of the prior art, the invention provides an online time series prediction method using kernel recursive maximum correntropy based on rule evolution.

To achieve the above purpose, the technical scheme adopted by the invention is as follows:

The online time series prediction method using kernel recursive maximum correntropy based on rule evolution comprises the following specific steps:

Step 1: Time series data are collected from the real world and normalized.

First, samples {(x(n), d(n)), n = 1, 2, …} are built for the prediction problem, where x(n) is the model input vector composed of t feature inputs, i.e. x(n) = [x_1(n), x_2(n), …, x_t(n)], d(n) is the prediction target, and n is the time index. Second, because the features in the input vector differ in scale, the data are normalized to reduce the adverse effect of these differences on prediction accuracy. The calculation formula is as follows:

$$x'(n) = \frac{x(n) - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

wherein x(n) and x'(n) are the values of the input data before and after normalization, respectively, and x_min and x_max are the minimum and maximum values of the input data. The prediction target is normalized in the same way to obtain its normalized value d'(n).
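The normalization of step 1, together with the inverse normalization applied to predictions in step 5, can be sketched as below. This is a minimal illustration assuming plain min-max scaling per feature, as suggested by the roles of x_min and x_max; the function names and the sample values are hypothetical and not part of the patent.

```python
def fit_min_max(values):
    """Return (x_min, x_max) for one feature sequence."""
    return min(values), max(values)


def normalize(values, x_min, x_max):
    """Scale each value to [0, 1] via (x - x_min) / (x_max - x_min)."""
    span = x_max - x_min
    if span == 0:
        # Constant feature: the ratio is undefined. Returning zeros is an
        # assumption made for this sketch only.
        return [0.0 for _ in values]
    return [(v - x_min) / span for v in values]


def denormalize(values, x_min, x_max):
    """Inverse of normalize(); used to map predictions back to real units."""
    return [v * (x_max - x_min) + x_min for v in values]


# Made-up PM2.5-like readings, for illustration only.
pm25 = [12.0, 35.0, 80.0, 150.0]
x_min, x_max = fit_min_max(pm25)
scaled = normalize(pm25, x_min, x_max)
restored = denormalize(scaled, x_min, x_max)
```

In an online setting x_min and x_max would normally be taken from the training portion only, so that the same constants can be reused when the test predictions are mapped back.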

Step 2: Phase space reconstruction is performed on the normalized time series data in order to mine the useful information in the time series (correlations, dynamic characteristics and the like) in depth. The reconstructed input vector is expressed as:

wherein τ_t(n) and m_t(n) are the delay time and embedding dimension of the t-th feature sequence in the normalized input vector x'(n), and T denotes transposition; a corresponding prediction target is defined for each reconstructed input vector.
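A common way to realize phase space reconstruction is delay-coordinate embedding, in which each feature contributes its current value and m - 1 delayed values spaced τ steps apart. The sketch below assumes that standard construction with a fixed τ and m per feature and 0-based time indices; the patent's exact formula is not reproduced in this text, so the layout of the concatenated vector and all names here are assumptions.

```python
def delay_embed(series, tau, m, n):
    """Return [s(n), s(n - tau), ..., s(n - (m - 1) * tau)] for one feature."""
    if n - (m - 1) * tau < 0:
        raise ValueError("not enough history to embed at this time index")
    return [series[n - k * tau] for k in range(m)]


def reconstruct(features, taus, dims, n):
    """Concatenate the delay vectors of all features at time index n."""
    vector = []
    for series, tau, m in zip(features, taus, dims):
        vector.extend(delay_embed(series, tau, m, n))
    return vector


# Two short made-up feature sequences, already normalized.
f1 = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
f2 = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
x_tilde = reconstruct([f1, f2], taus=[1, 2], dims=[3, 2], n=5)
# x_tilde == [0.5, 0.4, 0.3, 0.5, 0.7]
```

With five features and τ = 1, m = 20 for each, as in the embodiment described later, this construction would give a 100-dimensional input vector.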

Step 3: The data set reconstructed in step 2 is divided, and the parameters are set and initialized.

80% of the reconstructed time series data are selected as the training set, whose size is denoted N, and the remainder is used as the test set.
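Because the data are a time series, the 80/20 division is naturally done in chronological order rather than by random sampling. The following sketch assumes such a chronological split; the helper name and the integer-percentage argument are illustrative choices.

```python
def chronological_split(samples, train_percent=80):
    """Split samples in time order: first part for training, rest for testing."""
    n_train = len(samples) * train_percent // 100  # integer arithmetic, no rounding issues
    return samples[:n_train], samples[n_train:]


# 5000 reconstructed samples, represented here by their indices.
samples = list(range(5000))
train, test = chronological_split(samples)
N = len(train)  # 4000 training samples, 1000 test samples
```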

The model parameters are set, including the kernel width σ' in the correntropy, the Gaussian kernel size σ, the cluster-center learning rate η, the wake-up index learning rate β, the regularization parameter γ, the wake-up threshold, and the distance threshold δ.

At time 1, the first training sample is fed into the model, a first rule is created to form the rule base, and the key parameters of the model are initialized, including the wake-up index a_1(1) = 0, the cluster center R_1(1), the dictionary C_1(1) and the expansion coefficient θ_1(1). κ⟨·,·⟩ denotes the Gaussian kernel function, and the variables b and b* are defined as:

where σ is the Gaussian kernel size.
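For orientation, the Gaussian kernel is usually written as κ(x, y) = exp(-‖x - y‖² / (2σ²)). The sketch below assumes that standard form with σ as the kernel size; the patent's own expression for the kernel and for b and b* is not reproduced in this text, so this is an illustration rather than the exact definition used.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Standard Gaussian kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


k_same = gaussian_kernel([0.2, 0.4], [0.2, 0.4], sigma=1.0)  # identical inputs
k_far = gaussian_kernel([0.0, 0.0], [3.0, 4.0], sigma=1.0)   # inputs 5 units apart
```

Identical inputs give a kernel value of 1, and the value decays toward 0 as the inputs move apart, at a rate controlled by σ.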

Step 4: Starting from time 2, the model is trained iteratively according to the training data set and the parameter settings, specifically as follows:

Step 4.1: One training sample is loaded at each time step, and the compatibility measure and wake-up index of that sample are calculated under each rule in the rule base;

The model defines its rules using the compatibility measure and the wake-up index.

The compatibility measure ρ_i measures the correlation between the input vector and the cluster center R_i of the i-th rule. When the compatibility measure reaches its maximum, the current input sample is most similar to that rule, which is then the most compatible rule. The compatibility measure between the input vector and the cluster center R_i(n) of the i-th rule is as follows:

wherein ρ_i(n) ∈ [0,1], T is the total number of features contained in the input vector, and r is the correlation dependence between two observed variables, calculated as:

wherein the two mean terms are, respectively, the average of the n-th feature input and the average of the cluster center R.
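Since the correlation dependence r is described in terms of two observed variables and their averages, one plausible reading is the Pearson correlation coefficient. The sketch below assumes that reading; formulas (4) and (5) themselves are not reproduced in this text, so both the choice of Pearson correlation and the handling of constant vectors are assumptions, and the mapping from r to ρ_i ∈ [0,1] is deliberately left out.

```python
import math


def pearson_r(x, center):
    """Pearson correlation between an input vector and a cluster center."""
    n = len(x)
    mean_x = sum(x) / n
    mean_c = sum(center) / n
    cov = sum((a - mean_x) * (b - mean_c) for a, b in zip(x, center))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_c = sum((b - mean_c) ** 2 for b in center)
    if var_x == 0 or var_c == 0:
        # Correlation is undefined for a constant vector; 0.0 is an
        # assumption made for this sketch only.
        return 0.0
    return cov / math.sqrt(var_x * var_c)


r_aligned = pearson_r([0.1, 0.2, 0.3], [0.2, 0.4, 0.6])  # same trend
r_opposed = pearson_r([0.1, 0.2, 0.3], [0.6, 0.4, 0.2])  # opposite trend
```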

The wake-up index a_i is used together with the compatibility measure ρ_i to decide when to create new rules, which reduces the negative impact of outliers. The wake-up index under the i-th rule at time n is calculated as follows:

$$a_i(n) = (1-\beta)\,a_i(n-1) + \beta\,\bigl(1-\rho_i(n)\bigr) \tag{6}$$

where β is the learning rate of the wake-up index.
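Formula (6) is an exponentially weighted average of the incompatibility 1 - ρ_i(n), so a single poorly matching sample raises the index only slightly, while a sustained run of poor matches raises it steadily. The sketch below traces this behaviour; β = 0.2 and the compatibility values are made-up illustration values.

```python
def update_wake_index(a_prev, rho, beta):
    """Formula (6): a(n) = (1 - beta) * a(n - 1) + beta * (1 - rho(n))."""
    return (1.0 - beta) * a_prev + beta * (1.0 - rho)


a = 0.0  # initial value a_1(1) = 0, as in step 3
history = []
# Two well-matched samples followed by three poorly matched ones.
for rho in [0.9, 0.9, 0.1, 0.1, 0.1]:
    a = update_wake_index(a, rho, beta=0.2)
    history.append(a)
# history is approximately [0.02, 0.036, 0.2088, 0.34704, 0.457632]
```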

Step 4.2: will wake up index a _i (n) minimum and wake-up threshold

Comparing, judging whether to create new rule, wherein the wake-up threshold value +.>

The range of (2) is 0-1, and the following two cases are specifically included:

case 1: when the minimum wake index is greater than the wake threshold, i.e

When a new rule is created, the number of rules in the rule base is increased, the new rule is updated to r=r+1, and parameters in the new rule including a cluster center are initialized>

Dictionary->

Expansion coefficient->

Case 2: when the minimum wake index is less than or equal to the wake threshold

I.e. < ->

When the clustering center is in the same time, the input vector (namely the current time sequence) is brought into the most compatible rule in the rule base, and the clustering center is recursively updated, wherein the calculation formula is as follows:

wherein eta epsilon [0,1] represents the learning rate of the cluster center.
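The two cases of step 4.2 amount to a single branch on the minimum wake-up index. The sketch below is a hedged illustration of that control flow only: the center update R ← R + η(x - R), the initialization of a new rule's center to the current input, and the new rule's wake-up index of 0 are assumptions made for the example, since the patent's own update formula is not reproduced in this text. The dictionary and expansion coefficient of each rule are omitted.

```python
def evolve_rule_base(centers, wake_indices, compat, x, threshold, eta):
    """One structure-update step; returns (rule index used, whether a rule was created)."""
    # Most compatible rule = rule with the largest compatibility measure.
    best = max(range(len(centers)), key=lambda i: compat[i])
    if min(wake_indices) > threshold:
        # Case 1: no rule explains the input well enough, so create a new rule.
        centers.append(list(x))    # assumed initialization of the new center
        wake_indices.append(0.0)   # assumed initial wake-up index
        return len(centers) - 1, True
    # Case 2: move the most compatible center toward the input (assumed form).
    centers[best] = [c + eta * (xi - c) for c, xi in zip(centers[best], x)]
    return best, False


centers = [[0.0, 0.0]]
wake = [0.3]
first = evolve_rule_base(centers, wake, compat=[0.2], x=[1.0, 1.0],
                         threshold=0.1, eta=0.5)
second = evolve_rule_base(centers, wake, compat=[0.1, 0.9], x=[0.5, 1.0],
                          threshold=0.1, eta=0.5)
```

In this run the first call creates a second rule because the only wake-up index (0.3) exceeds the threshold, and the second call updates that new rule's center instead of creating another one.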

Then, in the subsequent parameter update of the model, the correntropy criterion replaces the mean square error criterion of the traditional kernel recursive least squares method as the cost function, and the kernel recursive maximum correntropy method is used to improve the prediction performance of the model under non-Gaussian noise or outliers. The correntropy-based optimization objective is defined as:

wherein ω is the filter weight, the mapped term denotes the input vector obtained by nonlinearly mapping the current input at time j, γ is the regularization parameter, and ‖·‖ denotes the L2 norm. Under the i-th rule at time n, the intermediate variables h_i(n), z_i(n) and λ_i(n) are obtained by the gradient descent method and calculated according to formula (9):

wherein c_ik(n) is the dictionary set of the i-th rule at time n, k is the number of input vectors contained in the dictionary set, and Q_i(n-1) is the matrix variable of the i-th rule at time n-1.
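The robustness of a correntropy criterion comes from passing the prediction error through a Gaussian function, so that large errors contribute almost nothing instead of dominating as they do under a squared-error cost. The sketch below illustrates only this effect, assuming the usual Gaussian form exp(-e² / (2σ'²)) with σ' as the correntropy kernel width; it is not a reconstruction of formulas (8) or (9), and the error values are made up.

```python
import math


def correntropy_gain(error, kernel_width):
    """Gaussian function of the error: close to 1 for small errors, close to 0 for outliers."""
    return math.exp(-error ** 2 / (2.0 * kernel_width ** 2))


def squared_loss(error):
    """Mean-square-error style cost for a single sample."""
    return error ** 2


errors = [0.1, 0.5, 5.0]  # the last value plays the role of an outlier
gains = [correntropy_gain(e, kernel_width=1.0) for e in errors]
losses = [squared_loss(e) for e in errors]
```

Under the squared loss the outlier's cost is 2500 times that of the first sample, whereas its correntropy gain is nearly zero, which is why an update driven by that gain largely ignores it.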

The kernel recursive maximum correntropy method calculates its parameters from sets of historical data, and each set forms a dictionary, i.e. C_i(n) = [c_i1(n), …, c_in'(n)], where n' is the number of entries stored in the dictionary set. If the dictionary contained all of the input data, the computational burden would increase significantly. Therefore, to reduce computational complexity, a novelty criterion is used for sparsification, so that only relevant input vectors are retained and a compact dictionary is formed. The distance is defined as

where c_ii* = [c_1, …, c_m] denotes the i*-th input vector in the dictionary set under the i-th rule. The distance is then compared with the distance threshold δ, which ranges from 0 to 1, to decide whether the data are incorporated into the dictionary. Two cases are distinguished:

(1) When the first condition on the distance holds, the current input sample is included in the dictionary, and the matrix variable Q_i(n) and expansion coefficient θ_i(n) are updated as follows:

wherein e_i(n) is the prediction error.

(2) When the second condition on the distance holds, the current data are excluded from the dictionary, the dictionary remains unchanged, and the matrix variable Q_i(n) and expansion coefficient θ_i(n) are calculated as follows:
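The dictionary test described above can be illustrated with a novelty-style criterion: a sample is stored only if it is farther than δ from every vector already in the dictionary. The sketch below assumes Euclidean distance and assumes that exceeding δ is the condition for inclusion; the patent's exact distance formula and conditions are not reproduced in this text, and the updates of Q_i(n) and θ_i(n) are omitted.

```python
import math


def min_distance(x, dictionary):
    """Smallest Euclidean distance from x to any stored dictionary vector."""
    return min(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))
        for c in dictionary
    )


def maybe_add(x, dictionary, delta):
    """Append x only if it is novel, i.e. farther than delta from all entries."""
    if min_distance(x, dictionary) > delta:
        dictionary.append(list(x))
        return True
    return False


dictionary = [[0.0, 0.0]]
added_near = maybe_add([0.05, 0.0], dictionary, delta=0.1)  # too close, rejected
added_far = maybe_add([1.0, 0.0], dictionary, delta=0.1)    # novel, stored
```

A larger δ gives a smaller dictionary and cheaper updates at the cost of a coarser representation, which is the trade-off between accuracy and complexity that the sparsification step controls.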

Step 4.3: loading training data, and judging whether training is finished;

if the current moment is smaller than the total quantity of the training set, namely N is smaller than N, returning to the step 4.1, and entering the next moment for iterative updating; otherwise, the model training is completed, a rule base is exported, and the next step is entered.

Step 5: Using the model trained in step 4, the most compatible rule in the rule base is selected to produce the output prediction for the test data set, and the sample data are then inverse-normalized. Finally, the prediction accuracy of the model is measured with evaluation indices, including the root mean square error (RMSE) and the symmetric mean absolute percentage error (SMAPE), defined as follows:

wherein d(j) and its estimate are the true value and the predicted value, respectively, and n is the number of samples.
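The two evaluation indices can be computed as in the sketch below. RMSE follows its standard definition; SMAPE has several variants in the literature, and the version here, which averages |d - d̂| / ((|d| + |d̂|) / 2) and reports a fraction rather than a percentage, is an assumption because the patent's formula (12) is not reproduced in this text. The sample values are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean square error over paired true and predicted values."""
    n = len(actual)
    return math.sqrt(sum((d - p) ** 2 for d, p in zip(actual, predicted)) / n)


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, as a fraction (assumed variant)."""
    n = len(actual)
    total = 0.0
    for d, p in zip(actual, predicted):
        denom = (abs(d) + abs(p)) / 2.0
        if denom != 0:
            # When both values are zero the term is treated as 0 (assumption).
            total += abs(d - p) / denom
    return total / n


actual = [100.0, 200.0]
predicted = [110.0, 180.0]
score_rmse = rmse(actual, predicted)    # sqrt(250), about 15.81
score_smape = smape(actual, predicted)  # about 0.1003
```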

Compared with the prior art, the invention has the following advantages:

(1) The invention can evolve its structure in an unknown, complex environment and has strong autonomy and robustness. Using the compatibility measure and the wake-up index, it builds the rule base from scratch according to the input data, updates the structure adaptively, and fully extracts the useful information contained in the data, which ensures the quality of the rules;

(2) When processing data containing outliers or non-Gaussian noise, the kernel recursive maximum correntropy method effectively suppresses their adverse effects, captures the dynamic changes of the time series in real time, and produces accurate predictions;

(3) In addition, the adaptive rule base and the sparsification method keep the dictionary structure compact and balance high prediction accuracy against low computational complexity.

Drawings

FIG. 1 is a flow chart of the online time series prediction model of the invention.

FIG. 2(a) shows the prediction curve of the invention for the Beijing PM2.5 time series.

FIG. 2(b) shows the error curve of the invention for the Beijing PM2.5 time series.

Detailed Description

The invention will be further described with reference to the accompanying drawings and specific examples.

The hardware used by the invention comprises a PC.

As shown in FIG. 1, the online time series prediction method using kernel recursive maximum correntropy based on rule evolution is implemented as follows:

Step 1: 8760 groups of hourly data were obtained for the period from January 1, 2019 to December 31, 2019, comprising the PM2.5 pollutant and 4 meteorological variables from the Capital International Airport: air temperature, air pressure, dew point and wind speed. The input vector of the model is therefore x(n) = [x_1(n), x_2(n), …, x_5(n)]. Single-step prediction of the Beijing PM2.5 pollutant is performed, with the prediction target d(n) = x_1(n+1). The data set {(x(n), d(n)), n = 1, 2, …} is normalized according to formula (1) to construct the input-output sample set (x'(n), d'(n)), which prevents large scale differences among the features from degrading the prediction.

Step 2: To fully extract the hidden information in the time series, the delay time and embedding dimension are determined to be τ = 1 and m = 20, respectively, according to phase space reconstruction theory, and 5000 groups of reconstructed data are selected as input vectors and prediction targets.

Step 3: 80% of the reconstructed Beijing time series is used as the training set, which then contains 4000 samples, and the remaining data are used as the test set. At time 1, the first training sample is fed into the model, a first rule is created, the rule base is formed, and the model parameters are set as shown in Table 1. The key parameters are initialized, including the wake-up index a_1(1) = 0, the cluster center, the dictionary and the expansion coefficient.

TABLE 1 model parameter settings

Step 4: Starting from time 2, the rule base is evolved and the model is trained iteratively using the training samples as they arrive in sequence, specifically as follows:

Step 4.1: The compatibility measure and wake-up index between the training data at the current time and each rule in the rule base are calculated according to formula (4) and formula (6), respectively;

For the 5-dimensional Beijing time series data, the compatibility measure between the input vector at time n and the cluster center R_i(n) of the i-th rule is as follows:

wherein the correlation dependence r between the current input and the cluster center R_i(n) of the i-th rule is calculated according to formula (5).

Step 4.2: The minimum wake-up index a_imin(n) is compared with the wake-up threshold to decide whether to add a new rule. Two cases are distinguished:

Case 1: When a_imin(n) > 1e-5, a new rule is generated. The number of rules in the rule base is increased, i.e. r = r + 1, and the key parameters of the new rule are initialized, including the cluster center, the dictionary and the expansion coefficient.

Case 2: When a_imin(n) ≤ 1e-5, the current time series sample is stored in the most compatible rule of the rule base, and the cluster center R_i(n) is updated.

Because of the complexity of external environmental noise, the acquired Beijing time series data have non-Gaussian characteristics, which would degrade the final prediction of the model. To solve this problem, an optimization objective is established according to formula (8): the correntropy criterion replaces the traditional mean square error criterion, and the model parameters are updated by the kernel recursive maximum correntropy method, which weakens the adverse effects of non-Gaussian noise or outliers. The intermediate variables are then calculated according to formula (9). In addition, the dimension of the kernel matrix in the kernel recursive maximum correntropy method depends on the amount of data, so processing large-scale data inevitably creates a heavy computational and storage burden. A sparsification method is therefore adopted so that the model stores only relevant input vectors and forms a compact dictionary. Whether the current data are added to the dictionary is decided by comparing the distance with the distance threshold δ. Two cases are distinguished:

(1) When the condition for inclusion holds, the current input data are added to the dictionary, and the matrix variable Q_i(n) and expansion coefficient θ_i(n) are updated according to formula (10);

(2) Otherwise, the current input data are rejected, the dictionary remains unchanged, and the matrix variable Q_i(n) and expansion coefficient θ_i(n) are updated according to formula (11);

Step 4.3: Whether the whole training data set has been loaded is determined, i.e. whether n < 4000 still holds.

If n < 4000, training has not finished; the iterative update continues at the next time step and the method returns to step 4.1. Otherwise, model training is complete and the method proceeds to the next step.

Step 5: The trained model is tested on the test set, the predicted values of the Beijing PM2.5 pollutant are calculated and inverse-normalized, and the prediction curve and error curve of the Beijing PM2.5 pollutant are plotted, as shown in FIG. 2. The figure shows that the method effectively tracks the trend of the time series with a small prediction error, which demonstrates the accuracy of the model. Finally, RMSE and SMAPE are calculated according to formula (12), as shown in Table 2. Compared with the quantized kernel recursive least squares method based on adaptive normalized sparsity (ANS-QKRRS) and the quantized kernel recursive generalized maximum correntropy method (QKRGMC), the method of the invention achieves the best values on both evaluation indices, which further verifies its effectiveness for Beijing PM2.5 prediction.

TABLE 2 prediction results of various methods

Finally, it should be noted that the above examples only illustrate embodiments of the invention and do not limit its scope. After reading this disclosure, those skilled in the art may make variations and modifications without departing from the spirit of the invention, and such equivalent modifications fall within the scope defined by the appended claims.

Claims

1. An online time series prediction method using kernel recursive maximum correntropy based on rule evolution, characterized by comprising the following steps:

Step 1: collecting time series data and normalizing them;

Step 2: performing phase space reconstruction on the normalized time series data to mine the useful information in the time series in depth;

Step 3: dividing the data set reconstructed in step 2, and setting and initializing parameters;

Step 4: performing iterative training of the model from time 2 according to the training data set and the parameter settings;

Step 5: using the model trained in step 4, selecting the most compatible rule in the rule base to produce the output prediction for the test data set, then performing inverse normalization on the sample data, and finally measuring the prediction accuracy of the model with evaluation indices.

2. The online time series prediction method using kernel recursive maximum correntropy based on rule evolution according to claim 1, wherein step 1 is specifically as follows:

building samples {(x(n), d(n)), n = 1, 2, …} for the prediction problem, where x(n) is the model input vector composed of t feature inputs, i.e. x(n) = [x_1(n), x_2(n), …, x_t(n)], d(n) is the prediction target, and n is the time index; then, in view of the scale differences among the features in the input vector, processing the data by a normalization method;

and normalizing the prediction target to obtain its normalized value d'(n).

3. The online time series prediction method using kernel recursive maximum correntropy based on rule evolution according to claim 2, wherein in step 1 the normalization is calculated as follows:

$$x'(n) = \frac{x(n) - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

wherein x(n) and x'(n) are the values of the input data before and after normalization, respectively, and x_min and x_max are the minimum and maximum values of the input data.

4. The online time series prediction method using kernel recursive maximum correntropy based on rule evolution according to claim 1, wherein step 2 is specifically as follows:

performing phase space reconstruction on the normalized time series data, the reconstructed input vector being expressed as:

5. The online time series prediction method using kernel recursive maximum correntropy based on rule evolution according to claim 1, wherein step 3 is specifically as follows:

selecting 80% of the reconstructed time series data as the training set, whose size is denoted N, and using the remainder as the test set;

setting model parameters, including the kernel width σ' in the correntropy, the Gaussian kernel size σ, the cluster-center learning rate η, the wake-up index learning rate β, the regularization parameter γ, the wake-up threshold, and the distance threshold δ;

at time 1, feeding the first training sample into the model, creating a first rule, forming the rule base, and initializing the key parameters of the model, including the wake-up index a_1(1) = 0, the cluster center, the dictionary and the expansion coefficient;

κ⟨·,·⟩ denotes the Gaussian kernel function, and the variables b and b* are defined as:

where σ is the Gaussian kernel size.

6. The online time series prediction method using kernel recursive maximum correntropy based on rule evolution according to claim 1, wherein step 4 is specifically as follows:

Step 4.1: loading one training sample at each time step, and calculating the compatibility measure and wake-up index of that sample under each rule in the rule base;

the model defines its rules using the compatibility measure and the wake-up index;

the compatibility measure ρ_i measures the correlation between the input vector and the cluster center R_i of the i-th rule; when the compatibility measure reaches its maximum, the current input sample is most similar to that rule, which is then the most compatible rule; the compatibility measure between the input vector and the cluster center R_i(n) of the i-th rule is calculated from the correlation dependence r between them,

wherein the two mean terms are, respectively, the average of the n-th feature input and the average of the cluster center R;

the wake-up index a_i is used together with the compatibility measure ρ_i to decide when to create new rules, which reduces the negative impact of outliers; the wake-up index under the i-th rule at time n is calculated as follows:

$$a_i(n) = (1-\beta)\,a_i(n-1) + \beta\,\bigl(1-\rho_i(n)\bigr) \tag{6}$$

wherein β is the learning rate of the wake-up index;

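Step 4.2: comparing the minimum of the wake-up indices a_i(n) with the wake-up threshold to decide whether to create a new rule, wherein the wake-up threshold lies in the range 0 to 1, and two cases are distinguished:

Case 1: when the minimum wake-up index is greater than the wake-up threshold, a new rule is created, the number of rules in the rule base is increased, and the parameters of the new rule are initialized, including the cluster center, the dictionary and the expansion coefficient;

Case 2: when the minimum wake-up index is less than or equal to the wake-up threshold, the input vector is assigned to the most compatible rule in the rule base and the cluster center of that rule is updated recursively,

wherein η ∈ [0,1] is the learning rate of the cluster center;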

then, in the subsequent parameter update of the model, the correntropy criterion replaces the mean square error criterion of the traditional kernel recursive least squares method as the cost function, and the kernel recursive maximum correntropy method is used to improve the prediction performance of the model under non-Gaussian noise or outliers; the correntropy-based optimization objective is defined as:

wherein ω is the filter weight, the mapped term denotes the input vector obtained by nonlinearly mapping the current input at time j, γ is the regularization parameter, and ‖·‖ denotes the L2 norm; under the i-th rule at time n, the intermediate variables h_i(n), z_i(n) and λ_i(n) are obtained by the gradient descent method and calculated according to formula (9):

wherein c_ik(n) is the dictionary set of the i-th rule at time n, k is the number of input vectors contained in the dictionary set, and Q_i(n-1) is the matrix variable of the i-th rule at time n-1;

the kernel recursive maximum correntropy method calculates its parameters from sets of historical data, and each set forms a dictionary, i.e. C_i(n) = [c_i1(n), …, c_in'(n)], wherein n' is the number of entries stored in the dictionary set; if the dictionary contained all of the input data, the computational burden would increase greatly; therefore, to reduce computational complexity, a novelty criterion is used for sparsification, so that only relevant input vectors are retained and a compact dictionary is formed; the distance is defined as

wherein the dictionary term denotes the i*-th input vector in the dictionary set under the i-th rule; the distance is then compared with the distance threshold, which ranges from 0 to 1, to decide whether the data are incorporated into the dictionary; two cases are distinguished:

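(1) when the first condition on the distance holds, the current input sample is included in the dictionary and the matrix variable Q_i(n) and expansion coefficient θ_i(n) are updated,

wherein e_i(n) is the prediction error;

(2) when the second condition on the distance holds, the current data are excluded from the dictionary, the dictionary remains unchanged, and the matrix variable Q_i(n) and expansion coefficient θ_i(n) are recalculated;

Step 4.3: loading training data, and determining whether training has finished.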

7. The online time series prediction method using kernel recursive maximum correntropy based on rule evolution according to claim 1, wherein in step 5 the prediction accuracy of the model is measured with evaluation indices comprising the root mean square error (RMSE) and the symmetric mean absolute percentage error (SMAPE).