CN108062561B - Short-time data flow prediction method based on long-time and short-time memory network model - Google Patents


Info

Publication number
CN108062561B
CN108062561B (application CN201711264618.9A)
Authority
CN
China
Prior art keywords
data flow
training
short
sample
test sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711264618.9A
Other languages
Chinese (zh)
Other versions
CN108062561A (en)
Inventor
Xue Yang (薛洋)
Xue Zelong (薛泽龙)
Li Lei (李磊)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201711264618.9A priority Critical patent/CN108062561B/en
Publication of CN108062561A publication Critical patent/CN108062561A/en
Application granted granted Critical
Publication of CN108062561B publication Critical patent/CN108062561B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06F 18/00: Pattern recognition
    • G06F 18/23213: Non-hierarchical clustering using statistics or function optimisation, with a fixed number of clusters, e.g. K-means clustering
    • G06F 18/24147: Classification based on distances to closest patterns, e.g. nearest-neighbour classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a short-time data flow prediction method based on a long-time and short-time memory network model, comprising the following steps: first, a plurality of training samples are obtained at an observation point; features are then extracted from the training samples, and the samples are classified by these features into two classes, either sharp and gentle data flow value change trends, or ascending and descending change trends. An LSTM model is trained with all training samples to obtain a trained main model, and the main model is then trained further with each of the two classes of training samples to obtain a first-class submodel and a second-class submodel respectively. A test sample of the observation point is obtained and classified by a classifier; according to the classification result it is input into the first-class or second-class submodel, which predicts the data flow value at the next time point of the observation point. The method improves the accuracy of short-time data stream prediction.

Description

Short-time data flow prediction method based on long-time and short-time memory network model
Technical Field
The invention belongs to the technical field of pattern recognition and artificial intelligence, and particularly relates to a short-time data flow prediction method based on a long-time and short-time memory network model.
Background
With the continuous and steady development of the world economy, many services in many countries are characterized by data streams, such as network load flow and traffic flow, and prediction is an effective way to optimize these services: predicting Internet load data supports more suitable resource scheduling for the next moment, and predicting traffic flow allows traffic resources to be configured optimally.
At present, many short-time data flow prediction methods use a single model. A data flow, however, is a nonlinear and random signal in which multiple mixed components are difficult to distinguish and separate. The prediction performance of a single model therefore hits a bottleneck: a model that predicts the data stream well under congestion often predicts it poorly when traffic is unblocked, so existing short-time data stream prediction methods suffer from low prediction precision.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a short-time data stream prediction method based on a Long Short Term Memory (LSTM) network model, and the method can more accurately predict the short-time data stream.
The purpose of the invention is realized by the following technical scheme: a short-time data flow prediction method based on a long-time and short-time memory network model comprises the following steps:
step S1, aiming at an observation point of the data stream needing to be predicted in a short time, firstly collecting data flow values counted by a plurality of historical time points at the observation point, and then splicing and aggregating the collected data flow values counted by each historical time point into a one-dimensional array according to a time sequence and carrying out normalization processing; wherein the time intervals of every two adjacent time points are the same and are both T minutes; the data flow value counted at each time point refers to the data flow generated in the time interval from the last time point to the time point;
step S2, performing sliding-window processing on the normalized one-dimensional array obtained in step S1 to obtain a plurality of training samples, and taking, for each training sample, the data flow value counted at the time point immediately following the sample's last time point as the label of the training sample; each training sample comprises the data flow values counted at a plurality of time points;
step S3, extracting the characteristics of the training sample: performing first-order difference processing on the training sample to obtain the characteristics of the training sample; clustering the training samples according to the characteristics of the training samples, and separating two types of training samples with violent and gentle data flow value change trends or separating two types of training samples with ascending and descending data flow value change trends;
step S4, obtaining an LSTM model after model parameter initialization; then training the LSTM model, specifically: firstly, training an LSTM model by taking each training sample in all training samples as input and taking a label corresponding to each training sample as output to obtain a trained main model; inputting a class of training samples with violent or ascending data flow value variation trend acquired in the step S3 into the trained main model for training to obtain a first class of sub-model; meanwhile, inputting a class of training samples with a gentle or reduced data flow value change trend acquired in the step S3 into the trained main model for training to obtain a second class of sub models;
step S5, when the data flow value counted at the next time point of the observation point is to be predicted at the current time point, a sample composed of the data flow values counted at the current time point of the observation point and at a plurality of time points before it is first obtained through a sliding window, and this sample is used as the test sample; in the test sample, the time interval between every two adjacent time points is the same, T1 minutes, where T1 = T; the time interval between the current time point and the next time point whose data flow value is to be predicted is also T minutes;
step S6, extracting the features of the test sample: performing first-order difference processing on the test sample to obtain its features; the classifier then judges, according to these features, whether the test sample belongs to the class whose data flow value change trend is sharp or ascending, or to the class whose trend is gentle or descending; the classifier is trained with the class-separated training samples from step S3 as input and the class each sample belongs to as output;
s7, when the test sample belongs to a sample with a violent or ascending data flow value change trend, inputting the test sample into the first type submodel obtained in the S4, and predicting the data flow value counted at the next time point of the observation point through the first type submodel;
and when the test sample belongs to a sample with a gradual or reduced data flow value change trend, inputting the test sample into the second type submodel acquired in the step S4, and predicting the data flow value counted at the next time point of the observation point through the second type submodel.
Preferably, in step S1, the normalization process is performed on the one-dimensional array obtained after the splicing and aggregation in the following manner:
x_f' = (x_f - x_min) / (x_max - x_min)

where x_f is the f-th element of the one-dimensional array, and x_max and x_min are the maximum and minimum values in the spliced and aggregated one-dimensional array, respectively.
Preferably, the length of the sliding window is N, wherein when the data stream is a traffic data stream, the product of the length of the sliding window and the time interval T of each two adjacent time points satisfies the following relationship: NxT is less than or equal to 60.
Furthermore, in step S1, when the data stream is a traffic data stream, T is less than or equal to 30;
and when the observation point historically counts the data flow once every E minutes, T is E, 2E, ..., (n-1)E or nE, where n and E are fixed values and nE is less than or equal to 30.
Further, when T is Y minutes, the length N of the sliding window is an integer value of 2 to 60/Y.
Preferably, in step S3, if the data stream is a traffic data stream, the feature extraction process of the training sample is as follows: performing first-order difference processing on the training samples, and taking an absolute value of a result of each first-order difference as a characteristic of the training samples; wherein
When the training sample is [x_t, x_{t-1}, ..., x_{t-N+1}], first-order difference processing gives:
[x_t - x_{t-1}, x_{t-1} - x_{t-2}, ..., x_{t-N+2} - x_{t-N+1}];
where x_t, x_{t-1}, ..., x_{t-N+1} are the data flow values counted at time points t, t-1, ..., t-N+1 respectively, and N is the length of the sliding window;
the features of the training sample obtained in step S3 are:
[|x_t - x_{t-1}|, |x_{t-1} - x_{t-2}|, ..., |x_{t-N+2} - x_{t-N+1}|];
in step S3, if the data flow is a network load data flow, the feature extraction process of the training sample is as follows: performing first-order difference processing on the training sample, and taking a first-order difference result of the training sample as the characteristic of the training sample; wherein
When the training sample is [x_t, x_{t-1}, ..., x_{t-N+1}], first-order difference processing gives the features of the training sample as:
[x_t - x_{t-1}, x_{t-1} - x_{t-2}, ..., x_{t-N+2} - x_{t-N+1}].
preferably, in step S3, according to the characteristics of the training samples, all the training samples are separated by K-means clustering, so as to separate two types of training samples with a severe data flow rate value variation trend and a moderate data flow rate value variation trend, or two types of training samples with an increasing and decreasing data flow rate value variation trend.
Preferably, in step S4, when the LSTM model is initialized, a matrix of the specified dimensions is first generated, and singular value decomposition is performed on it to produce three matrices U, Σ and V; the matrix U is used as the initial value of the weight matrices of the input gate, forgetting gate, output gate and candidate state values in the LSTM hidden layer, and all bias vectors in the LSTM model are set to 0.
Preferably, in step S6, the classifier is a K-nearest neighbor classifier.
Preferably, in step S6, if the data stream is a traffic data stream, the feature extraction process of the test sample is as follows: performing first-order difference processing on the test sample, and taking the absolute value of each first-order difference result as the feature of the test sample; wherein
When the obtained test sample is [x_{t'}, x_{t'-1}, ..., x_{t'-N+1}], first-order difference processing gives:
[x_{t'} - x_{t'-1}, x_{t'-1} - x_{t'-2}, ..., x_{t'-N+2} - x_{t'-N+1}];
where x_{t'} is the data flow value counted at the current time point t', and x_{t'-1}, ..., x_{t'-N+1} are the data flow values counted at time points t'-1, ..., t'-N+1 respectively; N is the length of the sliding window;
the features of the test sample obtained in step S6 are:
[|x_{t'} - x_{t'-1}|, |x_{t'-1} - x_{t'-2}|, ..., |x_{t'-N+2} - x_{t'-N+1}|];
in step S6, if the data flow is a network load data flow, the feature extraction process of the test sample is as follows: performing first-order difference processing on the test sample, and taking a first-order difference result of the test sample as the characteristic of the test sample; wherein
When the obtained test sample is [x_{t'}, x_{t'-1}, ..., x_{t'-N+1}], first-order difference processing gives the features of the test sample as:
[x_{t'} - x_{t'-1}, x_{t'-1} - x_{t'-2}, ..., x_{t'-N+2} - x_{t'-N+1}].
compared with the prior art, the invention has the following advantages and effects:
(1) In the short-time data flow prediction method, a plurality of training samples are first obtained at an observation point, each comprising the data flow values counted at a plurality of consecutive time points; a first-order difference is then taken of each training sample and its absolute values are used as the sample's features, by which the training samples are classified into two classes, one whose data flow value change trend is sharp or rising and one whose trend is gentle or falling; the LSTM model is then trained with all training samples to obtain a trained main model, and the main model is trained further with each of the two classes of training samples to obtain a first-class submodel and a second-class submodel respectively. When the data flow value at the next time point of the observation point needs to be predicted, a test sample of the observation point is first obtained and classified by a classifier, which yields the data flow change trend of the test sample; according to the classification result the test sample is input into the first-class or second-class submodel, which predicts the data flow value at the next time point of the observation point.
In this method, a submodel suited to sharp or rising data flow value change trends and a submodel suited to gentle or falling trends are obtained from the two classes of training samples; once a test sample is obtained, its data flow change trend can be identified and the data flow at the next time point predicted by the corresponding submodel. The method can therefore model flows that are nonlinear and random, predicting the data flow value at the next time point from the values counted at the current and preceding time points of the observation point, with the advantage of high short-time prediction accuracy.
(2) In the short-time data flow prediction method, aiming at an observation point of a short-time data flow needing to be predicted, historical data flow values counted by a plurality of continuous time points are collected at the observation point, and then the collected data flow values counted by each time point in the history are spliced and aggregated into a one-dimensional array according to a time sequence and are subjected to normalization processing; performing windowing processing on a sliding window aiming at the one-dimensional array obtained by aggregation so as to obtain a plurality of training samples; in the method, the training sample is composed of data flow values counted by a plurality of continuous time points; the time points can be continuous time points when the original data actually counts the data flow value or time points separated by a certain time, so that the training samples are very easy to obtain.
(3) According to the short-time data stream prediction method, the training sample is subjected to first-order difference, and then the absolute value is taken as the characteristic of the training sample, wherein the first-order difference characteristic of the training sample can represent the change rate of the data stream, and the change trend of the data stream can be effectively represented after the absolute value is taken. Therefore, the invention can accurately classify the types of the training samples by taking the first-order difference absolute value as the characteristic of the training samples, and provides further guarantee for accurate prediction of the short-time data stream.
(4) According to the short-time data flow prediction method, all training samples are separated through K-means clustering according to the characteristics of the training samples, and two types of training samples with violent and gentle data flow value change trends or two types of training samples with ascending and descending data flow value change trends can be effectively separated.
Drawings
FIG. 1 is a flow chart of a short-term data stream prediction method according to the present invention.
Fig. 2 is a graph showing the effect of K-means clustering on the data flow values randomly acquired at each time point in one day when the time interval between every two adjacent time points is 5 minutes in example 1 of the present invention.
Fig. 3 is a graph showing the effect of K-means clustering on the data flow rate values counted at each time point in the year when the time interval between every two adjacent time points is 5 minutes in example 1 of the present invention.
Fig. 4 is a comparison graph of the predicted value of the flow rate value at each time point and the actually counted data flow rate value when the time interval between every two adjacent time points is 5 minutes in example 1 of the present invention.
FIG. 5 is a MAPE comparison graph of the prediction data streams of the present invention and other models in example 1 with different length sliding windows at 5 minutes intervals between two adjacent time points.
Fig. 6 is a graph showing the effect of K-means clustering on the data flow rate values counted at each time point in 10 days taken at random when the time interval between every two adjacent time points is 60 minutes in example 2 of the present invention.
Fig. 7 is a graph showing the effect of K-means clustering on the data flow rate values counted at each time point in 100 days randomly taken when the time interval between every two adjacent time points is 60 minutes in example 2 of the present invention.
Fig. 8 is a comparison graph of the predicted value of the flow rate value at each time point and the actually counted data flow rate value when the time interval between two adjacent time points is 60 minutes in example 2 of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
The embodiment discloses a short-term data flow prediction method based on a long-term and short-term memory network model (LSTM), as shown in fig. 1, the steps are as follows:
step S1, aiming at an observation point of the data stream needing to be predicted in a short time, firstly collecting data flow values counted by a plurality of historical time points at the observation point, and then splicing and aggregating the collected data flow values counted by each historical time point into a one-dimensional array according to a time sequence and carrying out normalization processing; wherein the time intervals of every two adjacent time points are the same and are both T minutes; the data flow value counted at each time point refers to the data flow generated in the time interval from the last time point to the time point;
in this step, normalization processing is performed on the one-dimensional array obtained after splicing and polymerization in the following manner:
x_f' = (x_f - x_min) / (x_max - x_min)

where x_f is the f-th element of the one-dimensional array, and x_max and x_min are the maximum and minimum values in the spliced and aggregated one-dimensional array, respectively.
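The normalization above is plain min-max scaling and can be sketched in a few lines of Python (the function name and sample values are illustrative, not from the patent):

```python
def min_max_normalize(values):
    """Min-max normalization as in step S1: x' = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    return [(x - x_min) / span for x in values], x_min, x_max

normalized, x_min, x_max = min_max_normalize([120.0, 80.0, 100.0, 160.0])
# normalized -> [0.5, 0.0, 0.25, 1.0]
```

The returned x_min and x_max are kept so that the test sample in step S5 can be normalized with exactly the same constants.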
In the step, T is a fixed value, and when the data stream is a traffic data stream, T is less than or equal to 30;
when the observation point history is data flow counted once every E minutes, T can be E, 2E, …, (n-1) E or nE, wherein n and E are both a certain value, and nE is less than or equal to 30 when the data flow is a traffic data flow. For example, when the observation point is to count data traffic every 5 minutes, then T may be 5, 10, 15, 20, 25, or 30.
Step S2, performing sliding-window processing on the normalized one-dimensional array obtained in step S1 to obtain a plurality of training samples, and taking, for each training sample, the data flow value counted at the time point immediately following the sample's last time point as the label of the training sample; each training sample comprises the data flow values counted at a plurality of time points. For example, for a training sample [x_t, x_{t-1}, ..., x_{t-N+1}], x_t, the data flow value counted at time point t, is the value counted at the last time point in the sample; in this embodiment, the data flow value x_{t+1} counted at the next time point t+1 is used as the label of the training sample.
In this step, the length of the sliding window is N, wherein the product of the length of the sliding window and the time interval T of each two adjacent time points satisfies the following relationship: NxT is less than or equal to 60.
In this step, when T is Y minutes, the length N of the sliding window is an integer value of 2 to 60/Y.
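The sliding-window construction of steps S2 and S5 can be sketched as follows (a hypothetical helper; the series values are illustrative). Each window of N consecutive normalized flow values becomes one sample, and the value counted at the time point just after the window becomes its label:

```python
def make_windows(series, n):
    """Slide a window of length n over the normalized series (step S2).

    Each sample holds n consecutive flow values; its label is the value
    counted at the time point right after the window.
    """
    samples, labels = [], []
    for i in range(len(series) - n):
        samples.append(series[i:i + n])
        labels.append(series[i + n])
    return samples, labels

samples, labels = make_windows([0.1, 0.2, 0.4, 0.3, 0.5], 3)
# samples -> [[0.1, 0.2, 0.4], [0.2, 0.4, 0.3]], labels -> [0.3, 0.5]
```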
Step S3, extracting the characteristics of the training sample: performing first-order difference processing on the training samples, and taking an absolute value of a result of each first-order difference as a characteristic of the training samples; clustering the training samples according to the characteristics of the training samples, and separating two types of training samples with violent data flow value change trend and gentle data flow value change trend, or separating two types of training samples with ascending data flow value change trend and descending data flow value change trend;
in this step, if the data stream is a traffic data stream, the feature extraction process of the training sample is as follows: performing first-order difference processing on the training samples, and taking an absolute value of a result of each first-order difference as a characteristic of the training samples; wherein
When the training sample is [x_t, x_{t-1}, ..., x_{t-N+1}], first-order difference processing gives:
[x_t - x_{t-1}, x_{t-1} - x_{t-2}, ..., x_{t-N+2} - x_{t-N+1}];
where x_t, x_{t-1}, ..., x_{t-N+1} are the data flow values counted at time points t, t-1, ..., t-N+1 respectively, and N is the length of the sliding window; x_t, the data flow value at time point t, refers to the data flow generated in the time interval from time point t-1 to time point t, and the values counted at the other time points are defined in the same way.
The training samples obtained in this step are characterized as follows:
[|x_t - x_{t-1}|, |x_{t-1} - x_{t-2}|, ..., |x_{t-N+2} - x_{t-N+1}|];
in this step, if the data stream is a network load data stream, the feature extraction process of the training sample is as follows: performing first-order difference processing on the training sample, and taking a first-order difference result of the training sample as the characteristic of the training sample; wherein
When the training sample is [x_t, x_{t-1}, ..., x_{t-N+1}], first-order difference processing gives the features of the training sample as:
[x_t - x_{t-1}, x_{t-1} - x_{t-2}, ..., x_{t-N+2} - x_{t-N+1}].
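The two feature-extraction variants above, absolute first-order differences for traffic data and signed differences for network load data, can be sketched together (helper name and sample values are illustrative):

```python
def diff_features(sample, absolute=True):
    """First-order difference of a sample [x_t, x_{t-1}, ..., x_{t-N+1}].

    absolute=True  -> |x_t - x_{t-1}|, ... (traffic data streams, step S3)
    absolute=False -> signed differences  (network load data streams)
    """
    diffs = [a - b for a, b in zip(sample, sample[1:])]
    return [abs(d) for d in diffs] if absolute else diffs

# Samples are ordered newest-first, as in the description.
feats = diff_features([0.875, 0.375, 0.5, 0.5])          # -> [0.5, 0.125, 0.0]
signed = diff_features([0.875, 0.375, 0.5, 0.5], False)  # -> [0.5, -0.125, 0.0]
```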
in the step, according to the characteristics of the training samples, all the training samples are separated through K-means clustering to separate two types of training samples with sharp and gentle data flow value variation trends, or two types of training samples with rising and falling data flow value variation trends. In this embodiment, when the data stream is a traffic data stream, two types of training samples with a sharp and gentle data traffic value variation trend are separated, and when the data stream is a network load data stream, two types of training samples with a data traffic value variation trend that increases and decreases are separated.
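As a rough illustration of the K-means separation, here is a minimal pure-Python 2-means on first-order-difference feature vectors (a simplified sketch, not the patent's implementation; the seeding strategy and data are illustrative):

```python
def kmeans_two(points, iters=20):
    """Minimal 2-means clustering (step S3): split samples whose flow
    changes sharply from those whose flow changes gently, using their
    first-order-difference features as points (Euclidean distance)."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    # Seed with the points of smallest and largest feature sum.
    pts = sorted(points, key=lambda p: sum(p))
    centers = [pts[0], pts[-1]]
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            clusters[0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1].append(p)
        new = [[sum(x) / len(cl) for x in zip(*cl)] if cl else c
               for c, cl in zip(centers, clusters)]
        if new == centers:
            break
        centers = new
    labels = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1 for p in points]
    return labels, centers

# Two gentle-change samples and two sharp-change samples (illustrative).
labels, centers = kmeans_two([[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]])
# labels -> [0, 0, 1, 1]
```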
Step S4, obtaining an LSTM model after model parameter initialization; then training the LSTM model, specifically: firstly, training an LSTM model by taking each training sample in all training samples as input and taking a label corresponding to each training sample as output to obtain a trained main model; inputting a class of training samples with violent or ascending data flow value variation trend acquired in the step S3 into the trained main model for training to obtain a first class of sub-model; in addition, inputting a class of training samples with a gentle or reduced data traffic value change trend acquired in the step S3 into the trained main model for training to obtain a second class of sub models;
in the step, when the LSTM model is initialized, a matrix of the specified dimension division is generated firstly, then singular value decomposition is carried out to generate three matrixes of a matrix U, a matrix sigma and a matrix V, and the U matrix is used as a weight matrix W of an input gate, a forgetting gate, an output gate and candidate state values in the hidden layer of the LSTM modeli、Wf、Wo、Wc、Ui、Uf、Uo、Uc、VoIs used to bias the bias vector b in the LSTM modeli、bf、bo、bcAre all taken as 0.
In this step, the training process of the main model is as follows: first, the initial weight matrices W_i, W_f, W_o, W_c, U_i, U_f, U_o, U_c, V_o and bias vectors b_i, b_f, b_o, b_c of the input gate, forgetting gate, output gate and candidate state values in the LSTM hidden layer are given; after the training samples are input, the loss is computed by forward propagation and the model parameters are updated by backpropagating its gradient, and training ends when the parameters converge or the maximum number of iterations is reached.
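For reference, one forward step of a single LSTM cell with the gates named above can be written out explicitly. This is a deliberately scalar, 1-unit sketch (parameter names mirror W_i, U_i, b_i, etc.; the real model uses weight matrices and is trained by backpropagation, which is omitted here):

```python
import math

def lstm_step(x, h_prev, c_prev, p):
    """One forward step of a 1-unit LSTM cell (gates as named in step S4).

    p maps, for each gate g in {i, f, o, c}, the input weight W_g, the
    recurrent weight U_g, and the bias b_g (all scalars for illustration).
    """
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    i = sig(p['Wi'] * x + p['Ui'] * h_prev + p['bi'])              # input gate
    f = sig(p['Wf'] * x + p['Uf'] * h_prev + p['bf'])              # forget gate
    o = sig(p['Wo'] * x + p['Uo'] * h_prev + p['bo'])              # output gate
    c_tilde = math.tanh(p['Wc'] * x + p['Uc'] * h_prev + p['bc'])  # candidate
    c = f * c_prev + i * c_tilde                                   # new cell state
    h = o * math.tanh(c)                                           # new hidden state
    return h, c

params = {k: 0.5 for k in ('Wi', 'Wf', 'Wo', 'Wc', 'Ui', 'Uf', 'Uo', 'Uc')}
params.update(bi=0.0, bf=0.0, bo=0.0, bc=0.0)
h, c = lstm_step(0.3, 0.0, 0.0, params)
```

With zero initial states, c reduces to sigmoid(0.15) * tanh(0.15), which makes the gate arithmetic easy to check by hand.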
In step S5, when the data flow value counted at the next time point of the observation point is to be predicted at the current time point, a sample consisting of the normalized data flow values counted at the current time point of the observation point and at a plurality of time points before it is first obtained through a sliding window, and this sample is used as the test sample; in the test sample, the time interval between every two adjacent time points is the same, T1 minutes, where T1 = T; the time interval between the current time point and the next time point whose data flow value is to be predicted is also T. The data flow values in the test sample are normalized with the x_min and x_max acquired in step S1, using the same normalization formula as in step S1.
Step S6, extracting the features of the test sample: perform first-order difference processing on the test sample to obtain its features; then use the classifier to judge whether the test sample belongs to the class whose data flow values show a sharp or rising trend or to the class whose data flow values show a gentle or falling trend. In this embodiment, when the data stream is a traffic data stream, the classifier judges whether the test sample shows a sharp or a gentle data flow value trend; when the data stream is a network load data stream, the classifier judges whether the test sample shows a rising or a falling data flow value trend.
The classifier is a K-nearest-neighbour classifier, trained with the class-separated training samples from step S3 as input and the class each training sample belongs to as output. Training a K-nearest-neighbour classifier is lazy: only the training samples need to be stored, and the K training samples closest to a test sample are computed when the test sample arrives; K = 10 in this embodiment. The classifier adopts a weighted voting strategy, i.e. the reciprocal of the distance serves as the voting weight, so the closer a neighbour is, the greater its influence on the final class decision.
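The inverse-distance weighted vote described above can be sketched as follows (a minimal illustration, not the patent's implementation; function and variable names are assumptions):

```python
import numpy as np

def knn_predict(x, train_feats, train_labels, k=10, eps=1e-12):
    """Weighted K-nearest-neighbour vote: each of the k closest stored
    samples votes for its class with weight 1/distance (K = 10 in the
    embodiment); eps guards against division by zero."""
    d = np.linalg.norm(train_feats - x, axis=1)  # distance to every stored sample
    nearest = np.argsort(d)[:k]                  # indices of the k closest samples
    votes = {}
    for i in nearest:
        votes[train_labels[i]] = votes.get(train_labels[i], 0.0) + 1.0 / (d[i] + eps)
    return max(votes, key=votes.get)             # class with the largest weighted vote
```

With this weighting, two very close neighbours can outvote many distant ones, which matches the statement that closer samples have greater influence.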
In the above step S6, if the data stream is a traffic data stream, the features of the test sample are extracted as follows: perform first-order difference processing on the test sample and take the absolute value of each first-order difference as the feature of the test sample; wherein
when the obtained test sample is [x_t′, x_t′−1, ..., x_t′−N+1], first-order difference processing gives:
[x_t′ − x_t′−1, x_t′−1 − x_t′−2, ..., x_t′−N+2 − x_t′−N+1];
where x_t′ is the data flow value counted at the current time point t′, x_t′−1, ..., x_t′−N+1 respectively correspond to the data flow values counted at time points t′−1, ..., t′−N+1, and N is the length of the sliding window;
the features of the test sample obtained in step S6 are then:
[|x_t′ − x_t′−1|, |x_t′−1 − x_t′−2|, ..., |x_t′−N+2 − x_t′−N+1|];
in step S6, if the data flow is a network load data flow, the features of the test sample are extracted as follows: perform first-order difference processing on the test sample and take the first-order differences themselves as the feature of the test sample; wherein
when the obtained test sample is [x_t′, x_t′−1, ..., x_t′−N+1], first-order difference processing gives the features of the test sample:
[x_t′ − x_t′−1, x_t′−1 − x_t′−2, ..., x_t′−N+2 − x_t′−N+1].
Step S7, when the test sample belongs to the class whose data flow values show a sharp or rising trend, it is input into the first sub-model obtained in step S4, and the first sub-model predicts the data flow value counted at the next time point of the observation point;
when the test sample belongs to the class whose data flow values show a gentle or falling trend, it is input into the second sub-model obtained in step S4, and the second sub-model predicts the data flow value counted at the next time point of the observation point.
Example 1
The method of the embodiment is applied to the prediction of traffic data flow, and specifically comprises the following steps:
In this embodiment, traffic data flow values from a certain traffic observation point are taken over the 52 weeks from January 1, 2015 to December 30, 2015, the values being collected by sensors every 30 seconds. After weekends and holidays are removed, traffic data flow values for 247 days remain. In the experiment, the training sample set is built from the traffic data flow values counted at each time point in the first 200 days, and the test sample set from those counted at each time point in the last 47 days.
As shown in fig. 2, when T = 5 in this embodiment, i.e. when the time interval between every two adjacent time points is 5 minutes, K-means clustering of the training samples collected on a randomly chosen day between January 1 and December 30, 2015 separates two classes of training samples, one with a sharp and one with a gentle data flow value trend. As shown in fig. 3, with the same T = 5, K-means clustering of the training samples collected from the data flow of the whole year from January 1 to December 30, 2015 likewise separates the two classes of training samples.
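The clustering used here can be sketched with a minimal two-cluster K-means over the per-sample difference features (a sketch only; `sklearn.cluster.KMeans` would serve equally, and all names are illustrative):

```python
import numpy as np

def kmeans_two_classes(feats, iters=50, seed=0):
    """Minimal two-cluster K-means: pick two samples as initial
    centres, then alternate nearest-centre assignment and centre
    re-estimation. Returns a 0/1 label per training sample."""
    feats = np.asarray(feats, dtype=float)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=2, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)            # nearest centre per sample
        for j in range(2):
            if np.any(labels == j):              # avoid emptying a cluster
                centers[j] = feats[labels == j].mean(axis=0)
    return labels
```

On well-separated difference features (e.g. large vs small absolute variations), the two returned labels correspond to the sharp and gentle classes.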
For the obtained training and test sample sets, when T = 5, fig. 4 compares the data flow value predicted at each time point by the method of this embodiment with the actually counted data flow value at each time point; as can be seen from fig. 4, the predictions closely fit the actually observed data flow values, showing that the data flow value prediction accuracy of the method of this embodiment is very high.
Fig. 5 compares, for sliding windows of different lengths, the mean absolute percentage error (MAPE) of the data flow predicted by the method of this embodiment (labelled Multi-Model LSTM in fig. 5) with that of four models commonly used in short-time data flow prediction: the historical average model (HA), K-nearest neighbours (KNN), support vector regression (SVR) and a single long short-term memory model (single LSTM). As can be seen from fig. 5, for every window length the method of this embodiment yields a lower MAPE than the other four models, in particular than the single LSTM model.
When T takes different values (T = 10, 15, 20 and 25), the prediction accuracy of the method of this embodiment and of the other four models is shown in table 1:
TABLE 1
(Table 1 is reproduced as an image in the original patent document.)
MAPE (mean absolute percentage error) and RMSE (root mean square error) are the two evaluation criteria for prediction accuracy, defined respectively as:
MAPE = (100% / M) × Σ_{i=1..M} |x_{i,real} − x_{i,pre}| / x_{i,real}
RMSE = √( (1 / M) × Σ_{i=1..M} (x_{i,real} − x_{i,pre})² )
where M is the total number of test samples, x_{i,real} is the true label corresponding to the i-th test sample, and x_{i,pre} is the data flow value predicted for the i-th test sample by the method of this embodiment.
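The two error measures above can be computed directly from their definitions:

```python
import math

def mape(real, pred):
    """Mean absolute percentage error over the M test samples, in %."""
    m = len(real)
    return 100.0 * sum(abs(r - p) / r for r, p in zip(real, pred)) / m

def rmse(real, pred):
    """Root mean square error over the M test samples."""
    m = len(real)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(real, pred)) / m)
```

Note that MAPE is undefined when a true value is zero, so it suits strictly positive flow counts like the ones used here.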
As can be seen from table 1, for every value of T both prediction error indicators of the method of this embodiment are lower than those of the other 4 methods.
Example 2
The method of the embodiment is applied to the prediction of the network load data traffic, and specifically comprises the following steps:
In this embodiment, Wikipedia network load data logs from 2014 to 2016 are obtained, in which the number of visitors to the platform is recorded once per hour. Data from the network observation point covering June 1, 2014 to December 30, 2015, 45510 hours in total, are collected; in the experiment the first 36408 hours are taken as the training sample set and the last 9102 hours as the test sample set.
As shown in fig. 6, when T = 60 in this embodiment, i.e. when the time interval between every two adjacent time points is 60 minutes, K-means clustering of the training samples collected on 10 randomly chosen days between June 1, 2014 and December 30, 2015 separates two classes of training samples, one with a rising and one with a falling data flow value trend. As shown in fig. 7, with the same T = 60, K-means clustering of the training samples collected on 100 randomly chosen days in the same period likewise separates the two classes of training samples.
For the obtained training and test sample sets, when T = 60, fig. 8 compares the data flow value predicted at each time point by the method of this embodiment with the actually counted data flow value at each time point; as can be seen from fig. 8, the predictions closely fit the actual data flow values, showing that the data flow prediction accuracy of the method of this embodiment is very high.
When T = 60, the prediction accuracy of the method of this embodiment and of the other four models is shown in table 2, where the other four models are the historical average model (HA), K-nearest neighbours (KNN), support vector regression (SVR) and the single long short-term memory model (single LSTM) commonly used in short-time data flow prediction, and Multi-Model LSTM denotes the mean absolute percentage error (MAPE) and root mean square error of the data flow predicted by the method of this embodiment.
TABLE 2
(Table 2 is reproduced as images in the original patent document.)
As can be seen from table 2, the prediction error indicators of the method of this embodiment are lower than those of the other 4 models.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the scope of the present invention.

Claims (10)

1. A short-time data flow prediction method based on a long-time and short-time memory network model is characterized by being applied to the prediction of traffic data flow or the prediction of network load data flow, and comprising the following steps of:
step S1, for an observation point whose data stream requires short-time prediction, firstly collecting the data flow values counted at a plurality of historical time points at the observation point, then splicing and aggregating the collected data flow values counted at each historical time point into a one-dimensional array in time order and performing normalization processing; wherein the time interval between every two adjacent time points is the same, namely T minutes; the data flow value counted at each time point refers to the data flow generated in the time interval from the previous time point to that time point;
step S2, performing sliding-window processing on the normalized one-dimensional array obtained in step S1 to obtain a plurality of training samples, and taking the data flow value counted at the time point immediately following the last time point in each training sample as the label of that training sample; each training sample comprises the data flow values counted at a plurality of time points;
step S3, extracting the features of the training samples: performing first-order difference processing on the training samples to obtain their features; clustering the training samples according to these features, and separating two classes of training samples whose data flow value trends are sharp and gentle, or two classes whose data flow value trends are rising and falling;
step S4, obtaining an LSTM model after model parameter initialization, and then training the LSTM model, specifically: firstly, training the LSTM model with each of the training samples as input and the label corresponding to each training sample as output to obtain a trained main model; then inputting the class of training samples with a sharp or rising data flow value trend obtained in step S3 into the trained main model for training to obtain a first sub-model; meanwhile, inputting the class of training samples with a gentle or falling data flow value trend obtained in step S3 into the trained main model for training to obtain a second sub-model;
step S5, when the data flow value counted at the next time point of the observation point is to be predicted at the current time point, firstly obtaining, through a sliding window, a sample composed of the data flow values counted at the current time point of the observation point and at a plurality of time points before it, and using this sample as a test sample; in the test sample, the time interval between every two adjacent time points is the same, namely T1 minutes, where T1 = T; the time interval between the current time point and the next time point whose data flow value is to be predicted is also T minutes;
step S6, extracting the features of the test sample: performing first-order difference processing on the test sample to obtain the features of the test sample; then judging, by a classifier and according to the features of the test sample, whether the test sample belongs to the class with a sharp or rising data flow value trend or to the class with a gentle or falling data flow value trend; the classifier being trained with the class-separated training samples from step S3 as input and the class each training sample belongs to as output;
step S7, when the test sample belongs to the class with a sharp or rising data flow value trend, inputting the test sample into the first sub-model obtained in step S4 and predicting, by the first sub-model, the data flow value counted at the next time point of the observation point;
and when the test sample belongs to the class with a gentle or falling data flow value trend, inputting the test sample into the second sub-model obtained in step S4 and predicting, by the second sub-model, the data flow value counted at the next time point of the observation point.
2. The method for predicting short-term data stream based on long-term and short-term memory network model according to claim 1, wherein in step S1, the normalization process is performed on the one-dimensional array obtained after the concatenation and aggregation by:
x_f′ = (x_f − x_min) / (x_max − x_min)
where x_f is the f-th dimension of the one-dimensional array, and x_min and x_max are respectively the minimum and maximum values in the spliced and aggregated one-dimensional array.
3. The short-term data flow prediction method based on the long-term memory network model as claimed in claim 1, wherein the length of the sliding window is N, and when the data flow is a traffic data flow, the product of the length of the sliding window and the time interval T of every two adjacent time points satisfies the following relationship: NxT is less than or equal to 60.
4. The method for predicting short-term data flow based on short-term memory network model as claimed in claim 3, wherein in step S1, when the data flow is a traffic data flow, T ≤ 30;
and when the observation point has historically counted the data flow every E minutes, T = E, 2E, …, (n−1)E or nE, where n and E are both fixed values and nE ≤ 30.
5. The short-term data flow prediction method based on a long-term and short-term memory network model as claimed in claim 4, wherein when T is Y minutes, the length N of the sliding window is an integer from 2 to 60/Y.
6. The method for predicting short-term data flow based on long-term and short-term memory network model of claim 1, wherein in step S3, if the data flow is traffic data flow, the feature extraction process of the training samples is as follows: performing first-order difference processing on the training samples, and taking an absolute value of a result of each first-order difference as a characteristic of the training samples; wherein
when the training sample is [x_t, x_t−1, ..., x_t−N+1], first-order difference processing gives:
[x_t − x_t−1, x_t−1 − x_t−2, ..., x_t−N+2 − x_t−N+1];
wherein x_t, x_t−1, ..., x_t−N+1 respectively correspond to the data flow values counted at time points t, t−1, …, t−N+1, and N is the length of the sliding window;
the features of the training sample obtained in step S3 are:
[|x_t − x_t−1|, |x_t−1 − x_t−2|, ..., |x_t−N+2 − x_t−N+1|];
in step S3, if the data flow is a network load data flow, the feature extraction process of the training sample is as follows: performing first-order difference processing on the training sample, and taking the first-order differences of the training sample as its features; wherein
when the training sample is [x_t, x_t−1, ..., x_t−N+1], first-order difference processing gives the features of the training sample:
[x_t − x_t−1, x_t−1 − x_t−2, ..., x_t−N+2 − x_t−N+1].
7. The method for predicting short-term data flow based on a long-term and short-term memory network model according to claim 1, wherein in step S3, according to the features of the training samples, all the training samples are separated by K-means clustering into two classes of training samples with sharp and gentle data flow value trends, or into two classes with rising and falling data flow value trends.
8. The method for predicting a short-term data stream based on a long-term and short-term memory network model as claimed in claim 1, wherein in step S4, when the LSTM model is initialized, a matrix of the specified dimensions is first generated, then singular value decomposition is performed to produce the three matrices U, Σ and V; the matrix U is used as the initial value of the weight matrices of the input gate, forget gate, output gate and candidate state values in the LSTM model hidden layer, and all bias vectors in the LSTM model are set to 0.
9. The short-term data flow prediction method based on the long-term and short-term memory network model according to claim 1, wherein in step S6, the classifier is a K-nearest neighbor classifier.
10. The method for predicting short-term data flow based on long-term and short-term memory network model of claim 1, wherein in step S6, if the data flow is a traffic data flow, the feature extraction process of the test sample is as follows: performing first-order difference processing on the test sample, and taking the absolute value of each first-order difference as the feature of the test sample; wherein
when the obtained test sample is [x_t′, x_t′−1, ..., x_t′−N+1], first-order difference processing gives:
[x_t′ − x_t′−1, x_t′−1 − x_t′−2, ..., x_t′−N+2 − x_t′−N+1];
wherein x_t′ is the data flow value counted at the current time point t′, x_t′−1, ..., x_t′−N+1 respectively correspond to the data flow values counted at time points t′−1, …, t′−N+1, and N is the length of the sliding window;
the features of the test sample obtained in step S6 are:
[|x_t′ − x_t′−1|, |x_t′−1 − x_t′−2|, ..., |x_t′−N+2 − x_t′−N+1|];
in step S6, if the data flow is a network load data flow, the feature extraction process of the test sample is as follows: performing first-order difference processing on the test sample, and taking the first-order differences of the test sample as its features; wherein
when the obtained test sample is [x_t′, x_t′−1, ..., x_t′−N+1], first-order difference processing gives the features of the test sample:
[x_t′ − x_t′−1, x_t′−1 − x_t′−2, ..., x_t′−N+2 − x_t′−N+1].
CN201711264618.9A 2017-12-05 2017-12-05 Short-time data flow prediction method based on long-time and short-time memory network model Active CN108062561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711264618.9A CN108062561B (en) 2017-12-05 2017-12-05 Short-time data flow prediction method based on long-time and short-time memory network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711264618.9A CN108062561B (en) 2017-12-05 2017-12-05 Short-time data flow prediction method based on long-time and short-time memory network model

Publications (2)

Publication Number Publication Date
CN108062561A CN108062561A (en) 2018-05-22
CN108062561B true CN108062561B (en) 2020-01-14

Family

ID=62136080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711264618.9A Active CN108062561B (en) 2017-12-05 2017-12-05 Short-time data flow prediction method based on long-time and short-time memory network model

Country Status (1)

Country Link
CN (1) CN108062561B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033140B (en) * 2018-06-08 2020-05-29 北京百度网讯科技有限公司 Method, device, equipment and computer storage medium for determining search result
CN108900346B (en) * 2018-07-06 2021-04-06 西安电子科技大学 Wireless network flow prediction method based on LSTM network
CN109194498B (en) * 2018-07-27 2021-10-08 南京理工大学 Network traffic prediction method based on LSTM
CN109143105A (en) * 2018-09-05 2019-01-04 上海海事大学 A kind of state-of-charge calculation method of lithium ion battery of electric automobile
CN109120463B (en) * 2018-10-15 2022-01-07 新华三大数据技术有限公司 Flow prediction method and device
JP2021508096A (en) 2018-11-02 2021-02-25 アドバンスド ニュー テクノロジーズ カンパニー リミテッド Monitoring multiple system indicators
CN109462520B (en) * 2018-11-19 2021-12-10 电子科技大学 Network traffic resource situation prediction method based on LSTM model
CN109815785A (en) * 2018-12-05 2019-05-28 四川大学 A kind of face Emotion identification method based on double-current convolutional neural networks
CN110231976B (en) * 2019-05-20 2021-04-20 西安交通大学 Load prediction-based edge computing platform container deployment method and system
CN110390386B (en) * 2019-06-28 2022-07-29 南京信息工程大学 Sensitive long-short term memory method based on input change differential
CN110474808B (en) * 2019-08-20 2022-02-18 中国联合网络通信集团有限公司 Flow prediction method and device
CN110516041A (en) * 2019-08-28 2019-11-29 深圳勇艺达机器人有限公司 A kind of file classification method of interactive system
CN110855474B (en) * 2019-10-21 2022-06-17 广州杰赛科技股份有限公司 Network feature extraction method, device, equipment and storage medium of KQI data
CN111583628B (en) * 2020-03-27 2021-05-11 北京交通大学 Road network heavy truck traffic flow prediction method based on data quality control
CN111508230B (en) * 2020-04-16 2021-08-20 中国科学院自动化研究所 Time-interval traffic flow trend prediction method, system and device based on deep learning
CN111709549B (en) * 2020-04-30 2022-10-21 东华大学 SVD-PSO-LSTM-based short-term traffic flow prediction navigation reminding method
US11870863B2 (en) 2020-05-25 2024-01-09 Nec Corporation Method for operating a network
CN111815046B (en) * 2020-07-06 2024-03-22 北京交通大学 Traffic flow prediction method based on deep learning
CN112182954B (en) * 2020-09-08 2023-05-23 上海大学 LSTM-based fluid simulation data prediction model
CN112580260A (en) * 2020-12-22 2021-03-30 广州杰赛科技股份有限公司 Method and device for predicting water flow of pipe network and computer readable storage medium
CN112905958B (en) * 2021-01-27 2024-04-19 南京国电南自电网自动化有限公司 Short-time data window telemetry data state identification method and system based on measurement and control device
CN113944888B (en) * 2021-11-03 2023-12-08 北京软通智慧科技有限公司 Gas pipeline leakage detection method, device, equipment and storage medium
CN115410386B (en) * 2022-09-05 2024-02-06 同盾科技有限公司 Short-time speed prediction method and device, computer storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389980A (en) * 2015-11-09 2016-03-09 上海交通大学 Short-time traffic flow prediction method based on long-time and short-time memory recurrent neural network
WO2016156236A1 (en) * 2015-03-31 2016-10-06 Sony Corporation Method and electronic device
KR101742042B1 (en) * 2016-11-15 2017-05-31 한국과학기술정보연구원 Apparatus and method for traffic flow prediction
CN106960252A (en) * 2017-03-08 2017-07-18 深圳市景程信息科技有限公司 Methods of electric load forecasting based on long Memory Neural Networks in short-term
WO2017150032A1 (en) * 2016-03-02 2017-09-08 Mitsubishi Electric Corporation Method and system for detecting actions of object in scene

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016156236A1 (en) * 2015-03-31 2016-10-06 Sony Corporation Method and electronic device
CN105389980A (en) * 2015-11-09 2016-03-09 上海交通大学 Short-time traffic flow prediction method based on long-time and short-time memory recurrent neural network
WO2017150032A1 (en) * 2016-03-02 2017-09-08 Mitsubishi Electric Corporation Method and system for detecting actions of object in scene
KR101742042B1 (en) * 2016-11-15 2017-05-31 한국과학기술정보연구원 Apparatus and method for traffic flow prediction
CN106960252A (en) * 2017-03-08 2017-07-18 深圳市景程信息科技有限公司 Methods of electric load forecasting based on long Memory Neural Networks in short-term

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chen Liang et al., "Application of LSTM Networks in Short-Term Power Load Forecasting under a Deep Learning Framework", Electric Power Information and Communication Technology, vol. 15, no. 5, pp. 8-11, 2017-07-30; *
Han-Kai Hsu et al., "Learning to Tell Brake and Turn Signals in Videos Using CNN-LSTM Structure", 2017 IEEE 20th International Conference on Intelligent Transportation Systems, 2017. *

Also Published As

Publication number Publication date
CN108062561A (en) 2018-05-22

Similar Documents

Publication Publication Date Title
CN108062561B (en) Short-time data flow prediction method based on long-time and short-time memory network model
CN111314331B (en) Unknown network attack detection method based on conditional variation self-encoder
CN113515770A (en) Method and device for determining target business model based on privacy protection
CN106709588B (en) Prediction model construction method and device and real-time prediction method and device
CN108229724B (en) Short-term traffic data flow prediction method based on temporal-spatial information fusion
CN114006826B (en) Network traffic prediction method fusing traffic characteristics
JP6965206B2 (en) Clustering device, clustering method and program
US10902311B2 (en) Regularization of neural networks
CN109657600B (en) Video area removal tampering detection method and device
CN110633859A (en) Hydrological sequence prediction method for two-stage decomposition integration
JP2011059500A (en) Speaker clustering device and speaker clustering method
CN115801463B (en) Industrial Internet platform intrusion detection method and device and electronic equipment
CN115580445A (en) Unknown attack intrusion detection method, device and computer readable storage medium
CN108154186B (en) Pattern recognition method and device
CN113449905A (en) Traffic jam early warning method based on gated cyclic unit neural network
CN117041017A (en) Intelligent operation and maintenance management method and system for data center
US20210397956A1 (en) Activity level measurement using deep learning and machine learning
US11580362B2 (en) Learning apparatus, generation apparatus, classification apparatus, learning method, and non-transitory computer readable storage medium
CN113962160A (en) Internet card user loss prediction method and system based on user portrait
CN111160419B (en) Deep learning-based electronic transformer data classification prediction method and device
CN114584230B (en) Predictive channel modeling method based on countermeasure network and long-term and short-term memory network
US20220269988A1 (en) Abnormality degree calculation system and abnormality degree calculation method
CN113177078B (en) Approximate query processing algorithm based on condition generation model
CN114328921A (en) Small sample entity relation extraction method based on distribution calibration
Petrlik et al. Multiobjective selection of input sensors for svr applied to road traffic prediction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant