CN111327441B

CN111327441B - Traffic data prediction method, device, equipment and storage medium

Info

Publication number: CN111327441B
Application number: CN201811534956.4A
Authority: CN
Inventors: 唐春; 叶德忠; 吕海兵; 蒋勇; 周亮; 魏昕; 段齐; 高赟
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2018-12-14
Filing date: 2018-12-14
Publication date: 2022-07-08
Anticipated expiration: 2038-12-14
Also published as: CN111327441A

Abstract

The application relates to a traffic data prediction method, apparatus, device and storage medium. The method comprises: acquiring traffic data sequences of a plurality of data traffic devices; cleaning abnormal traffic data in the traffic data sequence of each data traffic device; performing cluster analysis on the cleaned traffic data sequences to obtain a plurality of traffic data matrices that contain missing values; filling the missing values in each traffic data matrix in a preset filling manner to obtain a missing value filling matrix; and determining a predicted traffic value for each data traffic device by using a preset neural network model, the missing value filling matrices and the position parameter of each data traffic device. The method addresses the low accuracy of traffic data prediction in the prior art and improves prediction accuracy.

Description

Traffic data prediction method, device, equipment and storage medium

Technical Field

The present application relates to the field of data mining and data analysis of traffic data, and in particular, to a traffic data prediction method, apparatus, device, and storage medium.

Background

With the introduction of energy-saving and emission-reduction concepts, the entire ICT (Information and Communication Technology) industry is shifting toward green, energy-saving operation. Base stations are an important part of the ICT industrial chain and account for a large share of the industry's energy consumption, so reducing base station power consumption is imperative.

Existing base stations consume much power mainly because conventional base stations are designed to meet user capacity demand at peak periods, so their capacity is usually large. In practice, however, a base station spends little time at peak load, mainly because of day-night tidal effects and regional differences in user behavior. As a result, base station load traffic shows clear temporal and spatial differences.

Meanwhile, with the rise of smart mobile terminals, the mobile Internet is growing rapidly, and the number and density of base stations have increased greatly. If each base station could dynamically adjust its working state according to changes in network traffic, user demand could still be met while base station energy consumption would be greatly reduced. It is therefore important to analyze and predict the load traffic of base stations.

Disclosure of Invention

To solve, or at least partially solve, the above technical problems, the present application provides a traffic data prediction method, apparatus, device and storage medium that alleviate the low accuracy of traffic data prediction in the prior art.

In a first aspect, an embodiment of the present application provides a traffic data prediction method, including:

acquiring flow data sequences of a plurality of data flow devices;

performing data cleaning on abnormal flow data in the flow data sequence of each data flow device;

performing cluster analysis on the cleaned flow data sequences to obtain a plurality of flow data matrices, where the flow data matrices contain missing values;

filling missing values in each flow data matrix by using a preset filling mode to obtain a missing value filling matrix;

and determining the flow predicted value of each data flow device by utilizing a preset neural network model, the missing value filling matrix and the position parameter of each data flow device.

Optionally, the obtaining a flow data sequence of a plurality of data flow devices includes:

collecting flow data of each data flow device on a preset time node;

and summarizing the flow data of each data flow device according to the sequence of the preset time nodes to obtain the flow data sequence of each data flow device.

Optionally, the data cleaning of the abnormal flow data in the flow data sequence of each data flow device includes:

acquiring a flow knowledge base obtained in advance;

determining, in each flow data sequence, the flow data that match the flow knowledge base as the abnormal flow data;

and deleting the abnormal flow data.

Optionally, the performing cluster analysis on the flow data sequence after data cleaning to obtain a plurality of flow data matrices includes:

performing data conversion on the flow data in each flow data sequence subjected to data cleaning according to a preset data format to respectively obtain a converted flow data sequence of each data flow device;

calculating a correlation coefficient between every two converted flow data sequences to obtain a correlation matrix $R = (r_{ij})_{M \times M}$ and a similarity distance matrix $S = (s_{ij})_{M \times M}$ with $s_{ij} = 1 - r_{ij}$, where $r_{ij}$ is the correlation coefficient between the i-th and the j-th converted flow data sequences, M is the number of data flow devices, and $i, j = 1, 2, \dots, M$;

determining a clustering number C by using the position parameter of each data flow device, and clustering the M converted flow data sequences by using the clustering number C and a preset clustering manner to obtain C converted flow data sets $V_c$, where the converted flow data set $V_c$ includes $L_c$ converted flow data sequences, $L_1 + L_2 + \dots + L_C = M$, and $c = 1, 2, \dots, C$;

sorting the converted traffic data sequences in each converted traffic data set by using the similarity distance matrix S;

and respectively converting each sequenced converted flow data set into the flow data matrix.

Optionally, the formula corresponding to the preset data format is as follows:

Here the formula maps $a_{mn}$, the n-th flow datum in the m-th cleaned flow data sequence, to the n-th value of the converted flow data sequence of the m-th data flow device, where $m = 1, 2, \dots, M$ and $n = 1, 2, \dots, N$.

Optionally, a correlation coefficient formula is used to calculate a correlation coefficient between any two converted flow data sequences, where the correlation coefficient formula is:

where $i, j = 1, 2, \dots, M$.

Optionally, the filling missing values in each traffic data matrix by using a preset filling manner to obtain a missing value filling matrix includes:

setting an initialization iteration matrix and an initialization iteration step length;

and for each flow data matrix, inputting the initialization iteration matrix, the initialization iteration step length and the flow data matrix into a preset iteration model to obtain a missing value filling matrix corresponding to the flow data matrix.

Optionally, the determining a predicted flow value of each data flow device by using a preset neural network model, the missing value filling matrix, and a location parameter of each data flow device includes:

decomposing each missing value filling matrix into the missing-value filled flow data sequences of the corresponding data flow devices;

constructing a training data set and a prediction data set from the position parameter and the missing-value filled flow data sequence of each data flow device;

inputting the training data set into the preset neural network model to obtain a trained model;

and inputting the prediction data set into the trained model to obtain the flow prediction value of each data flow device.

In a second aspect, an embodiment of the present application provides a traffic data prediction apparatus, which comprises an acquisition module, a data cleaning module, a clustering module, a missing value filling module and a prediction module;

the acquisition module is used for acquiring flow data sequences of a plurality of data flow devices;

the data cleaning module is used for cleaning the abnormal flow data in the flow data sequence of each data flow device;

the clustering module is used for performing cluster analysis on the cleaned flow data sequences to obtain a plurality of flow data matrices, where the flow data matrices contain missing values;

the missing value filling module is used for filling missing values in each flow data matrix by using a preset filling mode to obtain a missing value filling matrix;

and the prediction module is used for determining the flow prediction value of each data flow device by utilizing a preset neural network model, the missing value filling matrix and the position parameter of each data flow device.

In a third aspect, an embodiment of the present application provides a flow data prediction device, including a processor, a memory, a communication interface and a bus;

the processor, the memory and the communication interface communicate with one another through the bus;

the communication interface is used for information transmission with external devices;

the processor is configured to invoke program instructions in the memory to perform the steps of the method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions for causing a computer to perform the steps of the method according to the first aspect.

Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:

(1) In the application, abnormal flow data in the flow data sequence of each data flow device are first cleaned, which removes their interference with normal flow data. The cleaned flow data sequences are then clustered into a plurality of flow data matrices, and the missing values in each flow data matrix are filled to obtain a missing value filling matrix. Because the cleaned sequences grouped in each flow data matrix are highly correlated, the corresponding missing value filling matrix restores the real data more accurately. Finally, the flow predicted value of each data flow device is determined by using the preset neural network model, the missing value filling matrices and the position parameter of each data flow device. Since the missing value filling matrices restore the real data more accurately, the accuracy of the flow predicted values is improved, which addresses the low accuracy of flow data prediction in the prior art;

(2) In this application, obtaining the flow data sequences of a plurality of data flow devices includes: collecting the flow data of each data flow device at preset time nodes, and summarizing the flow data of each data flow device in the order of the preset time nodes to obtain its flow data sequence. Staff can therefore set the preset time nodes according to actual requirements, which makes the acquisition process more flexible, diversifies the flow data sequences and better meets actual requirements;

(3) In this application, determining the flow predicted value of each data flow device by using the preset neural network model, the missing value filling matrices and the position parameter of each data flow device includes: decomposing each missing value filling matrix into the missing-value filled flow data sequences of the data flow devices; constructing a training data set and a prediction data set from the position parameter and the missing-value filled flow data sequence of each data flow device; inputting the training data set into the preset neural network model to obtain a trained model; and inputting the prediction data set into the trained model to obtain the flow predicted value of each data flow device. Because the missing value filling matrices restore the real data more accurately and the position parameters of the devices are used to construct the training and prediction data sets, the data sets closely reflect the real data and the trained model is more accurate. The trained model therefore predicts changes in flow data better and more efficiently, improving the accuracy of the flow predicted values;

(4) In this application, cleaning the abnormal flow data in the flow data sequence of each data flow device includes: acquiring a flow knowledge base obtained in advance; determining, in each flow data sequence, the flow data that match the flow knowledge base as abnormal flow data; and deleting the abnormal flow data. Using the flow knowledge base for data cleaning improves the efficiency and accuracy of cleaning and provides strong support for network fault analysis of the data flow devices;

(5) In the application, determining the flow predicted value of each data flow device with the preset neural network model takes full account of the spatio-temporal characteristics of the flow data sequences and gives the method a degree of generality.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.

Fig. 1 is a flowchart of a traffic data prediction method according to an embodiment of the present application;

Fig. 2 is a schematic diagram comparing the mean absolute percentage errors of the filling results obtained with the cluster-analysis-based flow data matrix filling method, a Gaussian filling method and a K-nearest-neighbor (KNN) filling method according to an embodiment of the present application;

Fig. 3 is a schematic diagram comparing the mean absolute percentage errors of the prediction results obtained from the flow data matrix filling result, the Gaussian filling result and the KNN filling result, respectively, according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a flow data prediction apparatus according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a flow data prediction device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The application provides a flow data prediction method, a flow data prediction device, flow data prediction equipment and a storage medium, solves the problem of low accuracy of flow data prediction in the prior art, and achieves the technical effect of improving the accuracy of flow data prediction.

First, the flow data prediction method in the embodiment of the present application is described in detail. As shown in Fig. 1, the method may include steps S101 to S105:

S101, acquiring flow data sequences of a plurality of data flow devices.

For example, the data traffic device may be a base station.

S102, cleaning the abnormal flow data in the flow data sequence of each data flow device.

S103, performing cluster analysis on the cleaned flow data sequences to obtain a plurality of flow data matrices, where the flow data matrices contain missing values.

S104, filling the missing values in each flow data matrix in a preset filling manner to obtain a missing value filling matrix.

S105, determining the flow predicted value of each data flow device by using a preset neural network model, the missing value filling matrices and the position parameter of each data flow device.

In the embodiment of the invention, abnormal flow data in the flow data sequence of each data flow device are first cleaned, which removes their interference with normal flow data. The cleaned flow data sequences are then clustered into a plurality of flow data matrices, and the missing values in each flow data matrix are filled to obtain a missing value filling matrix. Because the cleaned sequences grouped in each flow data matrix are highly correlated, the corresponding missing value filling matrix restores the real data more accurately. Finally, the flow predicted value of each data flow device is determined by using the preset neural network model, the missing value filling matrices and the position parameter of each data flow device. This improves the accuracy of the predicted values and addresses the low accuracy of flow data prediction in the prior art.

In practical applications, traffic data sequences of various kinds often need to be obtained. To meet this requirement, in another embodiment of the present invention, step S101 may include the following steps:

S1011, collecting the traffic data of each data traffic device at preset time nodes.

For example, the data traffic device may be a base station. The traffic data may be total traffic data, average network flow rate data or maximum network flow rate data. Staff can set the preset time nodes according to actual application requirements; for example, the time nodes may be spaced 15 minutes, 1 hour or 1 day apart.

The clocks of all data traffic devices are synchronized to ensure the accuracy of the traffic data.

S1012, summarizing the traffic data of each data traffic device in the order of the preset time nodes to obtain the traffic data sequence of each data traffic device.

The following description takes M data traffic devices, each with N traffic data, as an example. The traffic data sequence of the 1st data traffic device may be $(a_{11}, a_{12}, \dots, a_{1N})$, that of the 2nd device may be $(a_{21}, a_{22}, \dots, a_{2N})$, …, and that of the M-th device may be $(a_{M1}, a_{M2}, \dots, a_{MN})$. Taking $(a_{M1}, a_{M2}, \dots, a_{MN})$ as an example, $a_{M1}$ may be the 1st collected traffic datum of the M-th data traffic device, $a_{M2}$ the 2nd, …, and $a_{MN}$ the N-th.
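As a minimal sketch of S1011 and S1012, the Python function below (its name and the `(device_id, time_node, value)` record layout are hypothetical) arranges raw samples into an M × N array whose m-th row is the sequence $(a_{m1}, \dots, a_{mN})$ ordered by time node. It assumes time nodes are comparable values such as seconds (900 for a 15-minute interval) and leaves nodes without a sample as NaN; neither choice comes from the text.

```python
import numpy as np

def build_sequences(samples, device_ids, time_nodes):
    """Arrange raw (device_id, time_node, value) records into an M x N array.

    Row m is the traffic data sequence (a_m1, ..., a_mN) of the m-th device,
    ordered by preset time node. Nodes without a record stay NaN.
    """
    nodes = sorted(set(time_nodes))
    row = {d: m for m, d in enumerate(device_ids)}   # device -> row index m
    col = {t: n for n, t in enumerate(nodes)}        # time node -> column index n
    A = np.full((len(device_ids), len(nodes)), np.nan)
    for device, node, value in samples:
        if device in row and node in col:            # ignore unknown devices/nodes
            A[row[device], col[node]] = value
    return A

# Two hypothetical base stations sampled every 15 minutes (900 s).
samples = [("bs1", 0, 5.0), ("bs1", 900, 7.0), ("bs1", 1800, 6.0),
           ("bs2", 0, 3.0), ("bs2", 900, 4.0)]
A = build_sequences(samples, ["bs1", "bs2"], [0, 900, 1800])
print(A)
```

In this example, device `bs2` has no sample at 1800 s, so that position stays NaN.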

In the embodiment of the present invention, a traffic data sequence of each data traffic device may also be saved.

In the embodiment of the present invention, acquiring a traffic data sequence of a plurality of data traffic devices includes: collecting flow data of each data flow device on a preset time node; according to the sequence of the preset time nodes, the flow data of each data flow device are collected to obtain the flow data sequence of each data flow device, so that in the process of obtaining the flow data sequence, a worker can set the preset time nodes according to actual requirements, the flexibility of the process of obtaining the flow data sequence and the diversity of the flow data sequence are improved, and the actual requirements are better met.

In still another embodiment of the present invention, step S102 may include the steps of:

S1021, acquiring a flow knowledge base obtained in advance.

The flow knowledge base may include abnormal flow data and the network equipment fault information corresponding to the abnormal flow data. Specifically, the current network equipment fault information is recorded when abnormal flow data are generated, or the current flow data are automatically recorded as abnormal flow data when network equipment fails; the flow knowledge base is built up in this way.

S1022, determining, in each flow data sequence, the flow data that match the flow knowledge base as abnormal flow data.

For example, the abnormal traffic data may include outliers and/or duplicate values.

S1023, deleting the abnormal flow data.

In the embodiment of the invention, after the abnormal flow data are deleted, their positions can be filled with zero so that missing data do not affect subsequent calculation. Taking $(a_{21}, a_{22}, \dots, a_{2N})$ as an example, if $a_{22}$ is determined to be abnormal flow data, $a_{22}$ is deleted and a zero is placed at its position, giving $(a_{21}, 0, \dots, a_{2N})$.
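The cleaning in S1021 to S1023 and the zero filling above can be sketched as follows. The sketch makes the hypothetical assumption that the flow knowledge base is a dictionary keyed by `(device_id, time_node)` whose values are the recorded fault information. The function name is illustrative. The sketch also returns a mask of deleted positions. The text does not mention this mask, but it is convenient for treating those zeros as missing values later.

```python
import numpy as np

def clean_with_knowledge_base(A, device_ids, time_nodes, knowledge_base):
    """Delete abnormal values listed in a traffic knowledge base and zero-fill them.

    knowledge_base: hypothetical dict {(device_id, time_node): fault_info}.
    Returns (cleaned copy of A, boolean mask of deleted positions).
    """
    row = {d: m for m, d in enumerate(device_ids)}
    col = {t: n for n, t in enumerate(time_nodes)}
    cleaned = np.array(A, dtype=float)              # copy; the input is left untouched
    deleted = np.zeros(cleaned.shape, dtype=bool)
    for (device, node), _fault_info in knowledge_base.items():
        if device in row and node in col:           # records for other devices are ignored
            m, n = row[device], col[node]
            cleaned[m, n] = 0.0                     # delete a_mn and put 0 in its place
            deleted[m, n] = True
    return cleaned, deleted

A = np.array([[5.0, 7.0, 6.0],
              [3.0, 99.0, 4.0]])
kb = {("bs2", 900): "transmission link fault"}      # hypothetical knowledge base record
cleaned, deleted = clean_with_knowledge_base(A, ["bs1", "bs2"], [0, 900, 1800], kb)
print(cleaned[1])
```

As in the example above, the second device's sequence becomes $(a_{21}, 0, \dots, a_{2N})$.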

Before step S102, the method may further include: acquiring the amount of missing data and the total amount of data in the flow data sequences of all data flow devices, calculating the ratio of the missing data amount to the total data amount, and judging whether the ratio is greater than a preset threshold. If it is, the flow data sequences of all data flow devices are discarded; if not, step S102 is performed. Preferably, the preset threshold may be 0.4.
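The following is a minimal sketch of this pre-check. It makes the hypothetical assumption that missing data are encoded as NaN in an M × N array. The 0.4 default is the preferred threshold given above, and the function name is illustrative.

```python
import numpy as np

def passes_missing_ratio(A, threshold=0.4):
    """Return True if missing/total <= threshold, i.e. the batch may proceed to S102.

    Missing data are assumed to be encoded as NaN in the M x N array A.
    """
    A = np.asarray(A, dtype=float)
    ratio = np.count_nonzero(np.isnan(A)) / A.size
    return bool(ratio <= threshold)

batch = np.array([[1.0, np.nan, 3.0],
                  [4.0, 5.0, np.nan]])
print(passes_missing_ratio(batch))   # 2 of 6 values missing -> ratio 0.33
```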

In the embodiment of the present invention, the data cleaning of the abnormal flow data in the flow data sequence of each data flow device includes: acquiring a flow knowledge base obtained in advance; in each flow data sequence, determining the flow data corresponding to the flow knowledge base as abnormal flow data; and deleting the abnormal flow data, and performing data cleaning on the abnormal flow data in the flow data sequence of each data flow device by using the flow knowledge base, so that the efficiency and the accuracy of data cleaning can be improved, and powerful support can be provided for network fault analysis of the data flow devices.

In still another embodiment of the present invention, step S103 may include the steps of:

S1031, performing data conversion on the flow data in each cleaned flow data sequence according to a preset data format to obtain a converted flow data sequence for each data flow device.

The formula corresponding to the preset data format may be:

Here the formula maps $a_{mn}$, the n-th flow datum in the m-th cleaned flow data sequence, to the n-th value of the converted flow data sequence of the m-th data flow device, where $m = 1, 2, \dots, M$ and $n = 1, 2, \dots, N$.

The mean term in the formula may be the sample mean of the m-th cleaned flow data sequence, and $s_m$ may be the standard deviation of the m-th cleaned flow data sequence.
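The text does not reproduce the conversion formula. The sample mean and standard deviation $s_m$ named above suggest the usual z-score $(a_{mn} - \bar{a}_m)/s_m$, and the sketch below assumes exactly that. It uses the sample standard deviation (`ddof=1`) and maps constant rows to zeros. The function name and the constant-row rule are illustrative choices, not part of the method as described.

```python
import numpy as np

def standardize_rows(A):
    """Z-score each cleaned sequence: (a_mn - mean_m) / s_m, row by row.

    s_m is the sample standard deviation (ddof=1); rows with s_m = 0 become all zeros.
    """
    A = np.asarray(A, dtype=float)
    mean = A.mean(axis=1, keepdims=True)
    std = A.std(axis=1, ddof=1, keepdims=True)
    safe_std = np.where(std > 0, std, 1.0)    # avoid division by zero for constant rows
    return (A - mean) / safe_std

Z = standardize_rows([[1.0, 2.0, 3.0],
                      [4.0, 4.0, 4.0]])
print(Z)
```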

Specifically, the converted traffic data sequence corresponding to $(a_{11}, a_{12}, \dots, a_{1N})$ may be

the converted traffic data sequence corresponding to $(a_{21}, a_{22}, \dots, a_{2N})$ may be

and the converted traffic data sequence corresponding to $(a_{M1}, a_{M2}, \dots, a_{MN})$ may be

S1032, calculating a correlation coefficient between any two converted flow data sequences to obtain a correlation matrix R and a similarity distance matrix S.

$R = (r_{ij})_{M \times M}$, $S = (s_{ij})_{M \times M}$, $s_{ij} = 1 - r_{ij}$, where $r_{ij}$ is the correlation coefficient between the i-th and the j-th converted flow data sequences, and $i, j = 1, 2, \dots, M$.

The correlation coefficient between any two converted flow data sequences can be calculated with a correlation coefficient formula, which may be:

Specifically, the correlation matrix R may be:

Specifically, the similarity distance matrix S may be:
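The formula images for $r_{ij}$, R and S are missing here, so the sketch below assumes that $r_{ij}$ is the Pearson correlation coefficient. This is a common choice, but the text does not confirm it. `np.corrcoef` treats each row as one variable, so for an M × N array of converted sequences it returns the M × M matrix R. S then follows element-wise as $1 - r_{ij}$. The function name is illustrative.

```python
import numpy as np

def correlation_and_distance(Z):
    """Return R = (r_ij) and S = (s_ij), s_ij = 1 - r_ij, for the M rows of Z.

    Assumes r_ij is the Pearson correlation coefficient between rows i and j.
    """
    R = np.corrcoef(np.asarray(Z, dtype=float))   # each row is one converted sequence
    S = 1.0 - R
    return R, S

Z = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.1],
              [4.0, 3.0, 2.0, 1.0]])
R, S = correlation_and_distance(Z)
print(np.round(S, 3))
```

Strongly positively correlated sequences get a distance near 0, and anti-correlated sequences a distance near 2.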

S1033, determining a clustering number C by using the position parameter of each data flow device, and clustering the M converted flow data sequences with the clustering number C and a preset clustering method to obtain C converted flow data sets $V_c$, where the converted flow data set $V_c$ includes $L_c$ converted flow data sequences, $L_1 + L_2 + \dots + L_C = M$, and $c = 1, 2, \dots, C$.

Preferably, the preset clustering method may be an R-type Ward clustering method.

For example, let M = 10 and C = 3. Clustering the 10 converted flow data sequences yields 3 converted flow data sets $V_1$, $V_2$ and $V_3$. The converted flow data set $V_1$ includes 3 converted flow data sequences: the 1st, the 3rd and the 4th. The converted flow data set $V_2$ includes 4 converted flow data sequences: the 2nd, the 5th, the 6th and the 8th. The converted flow data set $V_3$ includes 3 converted flow data sequences: the 7th, the 9th and the 10th.
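The R-type Ward clustering of S1033 can be sketched with a small agglomerative loop in NumPy. This is a generic Ward implementation, not the patent's own code. It takes C as given, because the text does not specify how C is derived from the position parameters. It returns 0-based row indices, so the 1st, 3rd and 4th sequences of the example above would appear as 0, 2 and 3. The function name is illustrative.

```python
import numpy as np

def ward_clusters(Z, C):
    """Agglomerative Ward clustering of the M rows of Z into C groups.

    Each step merges the pair of clusters (a, b) whose merge least increases the
    within-cluster sum of squares: n_a * n_b / (n_a + n_b) * ||mean_a - mean_b||^2.
    Returns C sorted lists of 0-based row indices.
    """
    Z = np.asarray(Z, dtype=float)
    if not 1 <= C <= len(Z):
        raise ValueError("C must be between 1 and the number of rows")
    clusters = [[m] for m in range(len(Z))]       # start with one cluster per sequence
    while len(clusters) > C:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = Z[clusters[a]].mean(axis=0) - Z[clusters[b]].mean(axis=0)
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])  # merge cluster b into cluster a
        del clusters[b]                                  # b > a, so index a stays valid
    return clusters

Z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]])
print(ward_clusters(Z, 3))
```

Suppose the rows are z-scores computed with the sample standard deviation. Then $\lVert z_i - z_j \rVert^2 = 2(N-1)(1 - r_{ij}) = 2(N-1)s_{ij}$, so the Euclidean criterion used by Ward's method is consistent with the similarity distance matrix S.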

S1034, sorting the converted traffic data sequences in each converted traffic data set by using the similarity distance matrix S.

For the converted traffic data set $V_1$, the similarity distance matrix S is used to determine three sorting priorities: between the 1st and the 3rd converted flow data sequences, between the 1st and the 4th, and between the 3rd and the 4th. These 3 sorting priorities are then used to order the 1st, 3rd and 4th converted flow data sequences so that more highly correlated sequences are placed closer together.
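The text does not state how the pairwise sorting priorities are combined. The sketch below therefore substitutes a simple greedy nearest-neighbour chain. Starting from the first member, it repeatedly appends the remaining sequence with the smallest similarity distance $s_{ij}$ to the last one placed. It then stacks the ordered rows into the traffic data matrix $H_c$ of S1035. The function names and the distance values in the usage example are hypothetical.

```python
import numpy as np

def order_cluster(members, S):
    """Greedy chain: start at the first member, then repeatedly append the
    remaining member with the smallest similarity distance to the last one."""
    remaining = list(members)
    order = [remaining.pop(0)]
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: S[last, j])
        remaining.remove(nxt)
        order.append(nxt)
    return order

def cluster_matrix(Z, members, S):
    """Stack the ordered converted sequences of one cluster into an L_c x N matrix H_c."""
    return np.asarray(Z, dtype=float)[order_cluster(members, S)]

# Hypothetical distances for a cluster holding rows 0, 2 and 3 (1st, 3rd, 4th sequences).
S = np.full((4, 4), 0.9)
np.fill_diagonal(S, 0.0)
S[0, 3] = S[3, 0] = 0.1
S[2, 3] = S[3, 2] = 0.3
S[0, 2] = S[2, 0] = 0.6
print(order_cluster([0, 2, 3], S))
```

Here row 3 is placed next to row 0 because their distance is smallest.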

S1035, respectively converting each sorted converted traffic data set into the traffic data matrix.

For example, the traffic data matrix corresponding to the sorted converted traffic data set $V_1$ may be:

The expression of $H_1$ indicates that the correlation between the 1st and the 3rd converted traffic data sequences is higher than the correlation between the 3rd and the 4th converted traffic data sequences. The other traffic data matrices follow the same principle.

Specifically, the traffic data matrix corresponding to the sorted converted traffic data set $V_1$ may be $H_1$, that corresponding to $V_2$ may be $H_2$, …, and that corresponding to $V_C$ may be $H_C$.

In still another embodiment of the present invention, the step S104 may include the steps of:

S1041, setting an initialization iteration matrix and an initialization iteration step size.

For example, the initialization iteration matrix may be

where $c = 1, 2, \dots, C$, and the initialization iteration step size may be $\delta_1 = 1$.

S1042, for each flow data matrix, inputting the initialization iteration matrix, the initialization iteration step size and the flow data matrix into a preset iteration model to obtain the missing value filling matrix corresponding to the flow data matrix.

Preferably, the preset iteration model may be the soft-thresholding iterative shrinkage model based on linearized Bregman iteration in the singular value thresholding (SVT) algorithm.

Specifically, the iterative model may be:

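where $\tau > 0$ and $Y^k$ is the iteration matrix. The shrinkage operator $\mathcal{D}_\tau$ is defined as $\mathcal{D}_\tau(Y) = U \mathcal{D}_\tau(\Sigma) V^T$ with $\mathcal{D}_\tau(\Sigma) = \mathrm{diag}(\{\sigma_i - \tau\}_+)$,

where $Y = U \Sigma V^T$ is the singular value decomposition of Y, $\Sigma = \mathrm{diag}(\{\sigma_i\}_{1 \le i \le r})$, and r is the rank of Y.

$\Omega$ is a uniform random sampling set of $[L] \times [N]$, and $P_\Omega$ denotes the projection onto $\Omega$, which keeps the entries in $\Omega$ and sets all other entries to zero. $t_+$ denotes the non-negative part of t, i.e. $t_+ = \max(0, t)$.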

Grid-search optimization is used to update the shrinkage operator $\mathcal{D}_\tau$ and the iteration step size $\delta_k$ in each iteration, as follows:

where $f_\tau(x)$ is the Lagrangian function, and

denotes the value of $\delta_k$ at which $f(x)$ attains its minimum.

Here,

denotes the Frobenius norm of the corresponding matrix. $\tau_k$ is the threshold value of $\tau$, and this threshold further accelerates the convergence of the shrinkage operator $\mathcal{D}_\tau$.

For each flow data matrix, the corresponding missing value filling matrix is obtained when the iteration process completes. Specifically, $Q_1$ is obtained for $H_1$, $Q_2$ for $H_2$, …, and $Q_C$ for $H_C$.
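The following is a minimal NumPy sketch of the singular value thresholding iteration described above. It implements the shrinkage operator $\mathcal{D}_\tau$ and the update $Y^k = Y^{k-1} + \delta P_\Omega(H - X^k)$ with $X^k = \mathcal{D}_\tau(Y^{k-1})$, as in the standard SVT algorithm. It assumes a zero initial iteration matrix and a fixed τ and δ, so the per-iteration grid search for $\tau_k$ and $\delta_k$ described in the text is not reproduced. The boolean mask `observed` plays the role of Ω. Restoring the observed entries in the returned matrix is an illustrative choice, and the function names are hypothetical.

```python
import numpy as np

def shrink(Y, tau):
    """Singular value shrinkage D_tau(Y) = U diag((sigma_i - tau)_+) V^T."""
    U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(sigma - tau, 0.0)) @ Vt

def svt_complete(H, observed, tau, delta=1.0, max_iter=500, tol=1e-4):
    """Fill the unobserved entries of H by singular value thresholding (SVT).

    observed: boolean mask of the sampling set Omega (True = known entry).
    """
    H = np.asarray(H, dtype=float)
    P_H = np.where(observed, H, 0.0)              # P_Omega(H)
    norm = np.linalg.norm(P_H) or 1.0             # Frobenius norm for the stopping test
    Y = np.zeros_like(H)                          # initial iteration matrix Y^0
    X = np.zeros_like(H)
    for _ in range(max_iter):
        X = shrink(Y, tau)                        # X^k = D_tau(Y^{k-1})
        residual = np.where(observed, H - X, 0.0) # P_Omega(H - X^k)
        if np.linalg.norm(residual) / norm < tol:
            break
        Y = Y + delta * residual                  # Y^k = Y^{k-1} + delta * P_Omega(H - X^k)
    return np.where(observed, H, X)               # keep known entries, fill the rest

H = np.ones((5, 5))                   # rank-1 example matrix
observed = np.ones((5, 5), dtype=bool)
observed[0, 0] = False                # treat this entry as deleted during cleaning
Q = svt_complete(H, observed, tau=5.0, delta=1.0, max_iter=2000)
print(round(float(Q[0, 0]), 3))
```

The parameters τ and δ must be tuned to the scale of the data. The text tunes them by grid search in each iteration.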

Fig. 2 compares the mean absolute percentage errors (MAPE) of the filling results obtained with the cluster-analysis-based flow data matrix filling method, the Gaussian filling method and the K-nearest-neighbor filling method according to an embodiment of the present application. As Fig. 2 shows, the cluster-analysis-based matrix filling method markedly reduces the mean absolute percentage error when filling missing values. This indicates that the method fills missing values well in data with time-series characteristics.
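The text does not define the mean absolute percentage error used in Fig. 2 and Fig. 3. The sketch below uses the standard definition $\frac{100\%}{n}\sum_t \left|\frac{y_t - \hat{y}_t}{y_t}\right|$. Skipping entries whose actual value is zero, where the ratio is undefined, is an assumption of this sketch, and the function name is illustrative.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent, over entries with nonzero actual value."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    nonzero = actual != 0                     # the ratio is undefined where actual == 0
    errors = np.abs((actual[nonzero] - predicted[nonzero]) / actual[nonzero])
    return 100.0 * float(errors.mean())

print(mape([100.0, 200.0, 50.0], [110.0, 180.0, 50.0]))
```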

In still another embodiment of the present invention, the step S105 may include the steps of:

S1051, decomposing each missing value filling matrix into the missing-value filled flow data sequences of the data flow devices.

Specifically, the missing value filling matrix $Q_1$ can be decomposed into $L_1$ missing-value filled flow data sequences, $Q_2$ into $L_2$ sequences, …, and $Q_C$ into $L_C$ sequences, where $L_1 + L_2 + \dots + L_C = M$.

S1052, constructing a training data set and a prediction data set from the position parameter and the missing-value filled flow data sequence of each data flow device.

The area where the M data traffic devices are located is divided into M grid cells, each representing the coverage area of one data traffic device. The missing-value filled traffic data sequence of each data traffic device thus forms a three-dimensional tensor of size $I \times J \times N$. Here $I \times J$ represents the position information of the coverage area of the data traffic device, and N corresponds to the missing-value filled traffic data sequence of the device.
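The sketch below shows one way to read this step, under stated assumptions. All M devices share a single I × J grid, and each device's position parameter has already been mapped to a hypothetical `(row, col)` cell. The function stores each filled sequence along the third axis of an I × J × N array and leaves cells without a device at zero. The function name is illustrative.

```python
import numpy as np

def to_grid_tensor(sequences, positions, I, J):
    """Place each device's filled sequence at its grid cell (row, col).

    sequences: M x N array, one missing-value filled sequence per device.
    positions: M hypothetical (row, col) cells derived from position parameters.
    Returns an I x J x N tensor; cells without a device stay zero.
    """
    sequences = np.asarray(sequences, dtype=float)
    if len(positions) != len(sequences):
        raise ValueError("need one position per sequence")
    tensor = np.zeros((I, J, sequences.shape[1]))
    for seq, (r, c) in zip(sequences, positions):
        tensor[r, c, :] = seq                 # the time axis runs along the third dimension
    return tensor

T = to_grid_tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [(0, 0), (0, 1), (1, 0)], 2, 2)
print(T.shape)
```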

A training data set and a prediction data set are constructed from the tensor data of the M data traffic devices. In particular, each sample of the training data set consists of input data of dimension K (K < N) and target data of dimension 1, where t denotes the starting point of l.

For example, the first A (K < A < N) values of each missing-value-filled traffic data sequence may be collected together as the training data set, and the last N − A values of each sequence may be collected together as the prediction data set.
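The split can be sketched on a single filled sequence as a sliding window: each sample pairs K consecutive values (the input of dimension K) with the value that follows (the target of dimension 1). The names `sliding_window` and `build_datasets` are hypothetical, and letting prediction inputs reach back into the first A values is an assumption; the application only states which values belong to each set. For the grid tensor, the same windowing would be applied along the time axis of the I × J frames.

```python
def sliding_window(series, k):
    """Return (input, target) pairs: K consecutive values and the value after them."""
    return [(series[t:t + k], series[t + k]) for t in range(len(series) - k)]


def build_datasets(series, a, k):
    """Split one filled sequence of length N into training and prediction samples.

    Training samples are drawn only from the first A values. Prediction samples
    target each of the last N - A values; their K-value inputs may reach back
    into the first A values (an assumption of this sketch).
    """
    n = len(series)
    if not k < a < n:
        raise ValueError("expected K < A < N")
    train_set = sliding_window(series[:a], k)
    predict_set = [(series[t - k:t], series[t]) for t in range(a, n)]
    return train_set, predict_set


series = list(range(10))  # hypothetical filled sequence, N = 10
train_set, predict_set = build_datasets(series, a=6, k=3)
print(len(train_set), len(predict_set))  # 3 4
```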

S1053, inputting the training data set into the preset neural network model to obtain a trained model.

For example, the preset neural network model may be a three-layer neural network model whose first layer is an input layer, second layer is a hidden layer, and third layer is an output layer. The input layer may include 20 neurons, the hidden layer 50 neurons, and the output layer 1 neuron. The three-layer neural network model may be trained with the backpropagation through time (BPTT) algorithm.

When the preset neural network model is trained, its weights and biases are first randomly initialized, as follows:

$b_g = b_i = b_o = b \approx 0$ (14)

$b_f = 1$ (15)
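The initialization in equations (14) and (15) can be sketched as below. The matrix names W_τ (input-to-gate) and U_τ (output-to-gate), the uniform range [−0.1, 0.1], and reading the 20 input neurons as the input dimension are all assumptions; the text only says the weights are initialized randomly. Setting b_f = 1 starts the forget gate mostly open, so the module initially retains its state.

```python
import random


def init_lstm_params(n_inputs, n_hidden, seed=None, scale=0.1):
    """Initialize weights randomly and biases per equations (14) and (15).

    For each gate tau in {g, i, f, o}: W_tau (n_hidden x n_inputs) acts on the
    input, U_tau (n_hidden x n_hidden) acts on the previous output. The uniform
    range [-scale, scale] is an assumption; the source only says "randomly".
    """
    rng = random.Random(seed)
    params = {}
    for tau in ("g", "i", "f", "o"):
        params["W_" + tau] = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
                              for _ in range(n_hidden)]
        params["U_" + tau] = [[rng.uniform(-scale, scale) for _ in range(n_hidden)]
                              for _ in range(n_hidden)]
        # b_g = b_i = b_o = b ≈ 0 (this sketch uses exactly 0), and b_f = 1.
        params["b_" + tau] = [1.0 if tau == "f" else 0.0] * n_hidden
    return params


# Layer sizes from the example above: 20 input neurons, 50 hidden neurons.
params = init_lstm_params(n_inputs=20, n_hidden=50, seed=0)
```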

Then, the output value of each gate in each memory module of the preset neural network model is calculated, as follows:

where g denotes the input squashing unit and i denotes the input gate unit; both units prepare the state update. f denotes the forget gate unit, which determines the degree to which the preset neural network model forgets the input data. The state term updates the state of the module, and the output gate term together with the output term represents the updated output of the module.
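The gate formulas themselves are not reproduced in the text. The sketch below assumes the standard LSTM memory-module equations with a forget gate, which match the unit descriptions above: g = tanh(·), i = σ(·), f = σ(·), c = f·c_prev + i·g, o = σ(·), h = o·tanh(c). The scalar weights `w_τ`, `u_τ`, `b_τ` and the state/output symbols c and h are assumptions of this illustration. In a Conv-LSTM, each product w·x and u·h becomes a convolution over the I × J grid and the values become I × J frames.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM memory module.

    p maps "w_<tau>", "u_<tau>" and "b_<tau>" to scalars for tau in {g, i, f, o}.
    """
    def pre(tau):
        # Weighted input plus weighted previous output plus bias.
        return p["w_" + tau] * x + p["u_" + tau] * h_prev + p["b_" + tau]

    g = math.tanh(pre("g"))  # input squashing unit
    i = sigmoid(pre("i"))    # input gate unit
    f = sigmoid(pre("f"))    # forget gate unit
    c = f * c_prev + i * g   # updated module state
    o = sigmoid(pre("o"))    # output gate unit
    h = o * math.tanh(c)     # updated module output
    return h, c


# With zero weights and the biases of equations (14) and (15).
p = {f"{kind}_{tau}": 0.0 for kind in ("w", "u", "b") for tau in "gifo"}
p["b_f"] = 1.0
h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=2.0, p=p)
```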

Specifically, the weights are updated as follows:

In the above equation, τ ∈ {g, i, f, o}, and L_K is the overall loss function:

The number of iterations is recorded during training. If it is smaller than a preset iteration threshold, the next iteration is performed; otherwise, the iteration terminates and the weights and the trained model are output.
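Because the update formula and the exact loss L_K are not reproduced, the sketch below assumes plain gradient descent, W_τ ← W_τ − η·∂L_K/∂W_τ for τ ∈ {g, i, f, o}, combined with the iteration-count stopping rule just described. The function `train_weights`, the learning rate η, and the toy quadratic loss standing in for L_K are hypothetical; in the actual model the gradients would come from BPTT.

```python
def train_weights(weights, grad_fn, learning_rate, max_iterations):
    """Gradient-descent loop using the iteration-count stopping rule.

    weights: dict mapping tau in {"g", "i", "f", "o"} to a scalar weight
    grad_fn: function returning {tau: dL_K/dW_tau} at the current weights;
             in the actual model these gradients would come from BPTT
    Iterates while the iteration count is below max_iterations, then returns
    the final weights and the number of iterations performed.
    """
    weights = dict(weights)  # do not modify the caller's dict
    iteration = 0
    while iteration < max_iterations:
        grads = grad_fn(weights)
        for tau in weights:
            weights[tau] -= learning_rate * grads[tau]
        iteration += 1
    return weights, iteration


# Toy stand-in for L_K: sum over tau of (W_tau - target_tau)^2, whose
# gradient is 2 * (W_tau - target_tau). Purely illustrative.
targets = {"g": 0.5, "i": -0.2, "f": 1.0, "o": 0.3}


def toy_grad(w):
    return {tau: 2.0 * (w[tau] - targets[tau]) for tau in w}


final, n_iter = train_weights({tau: 0.0 for tau in "gifo"}, toy_grad, 0.1, 100)
```

With η = 0.1 each step shrinks the toy error by a factor of 0.8, so after the 100-iteration threshold the weights sit very close to their targets.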

Preferably, the trained model may be a Conv-LSTM model.

S1054, inputting the prediction data set into the trained model to obtain the flow prediction value of each data flow device.

Fig. 3 compares the mean absolute percentage error of the prediction results obtained from the cluster-analysis-based traffic data matrix filling result, the Gaussian filling result, and the K-nearest neighbor filling result provided in the embodiment of the present application. As Fig. 3 shows, filling missing values with the cluster-analysis-based traffic data matrix allows the Conv-LSTM model to mine both the time series characteristics of each data traffic device's traffic data and the geographic location characteristics between adjacent data traffic devices, which effectively improves prediction accuracy.

In this embodiment of the present invention, determining the traffic prediction value of each data traffic device by using the preset neural network model, the missing value filling matrices, and the position parameter of each data traffic device includes: decomposing each missing value filling matrix into the missing-value-filled traffic data sequences of the data traffic devices; constructing a training data set and a prediction data set according to the position parameter and the missing-value-filled traffic data sequence of each data traffic device; inputting the training data set into the preset neural network model to obtain a trained model; and inputting the prediction data set into the trained model to obtain the traffic prediction value of each data traffic device. Because the missing value filling matrices restore the real data more accurately, and because the training data set and the prediction data set are constructed using the position parameter of each data traffic device, the prediction data set reflects the real data accurately and the trained model is more accurate. The trained model can therefore predict changes in the traffic data better and more efficiently, which improves the accuracy of the traffic prediction value of each data traffic device.

Furthermore, because the embodiment of the invention uses the preset neural network model to determine the traffic prediction value of each data traffic device, the spatio-temporal characteristics of the traffic data sequences are fully taken into account, and the method has a degree of general applicability.

In another embodiment of the present invention, the traffic data prediction apparatus of the embodiments of the present application is described in detail. As shown in Fig. 4, the traffic data prediction apparatus includes an acquisition module 31, a data cleaning module 32, a clustering module 33, a missing value filling module 34, and a prediction module 35.

The acquisition module 31 is configured to acquire traffic data sequences of a plurality of data traffic devices.

The data cleaning module 32 is configured to perform data cleaning on the abnormal flow data in the flow data sequence of each data flow device.

The clustering module 33 is configured to perform cluster analysis on the traffic data sequences after data cleaning to obtain a plurality of traffic data matrices, where each traffic data matrix includes missing values.

The missing value filling module 34 is configured to perform missing value filling on each traffic data matrix by using a preset filling method, so as to obtain a missing value filling matrix.

The prediction module 35 is configured to determine a traffic prediction value of each data traffic device by using a preset neural network model, the missing value filling matrices, and the position parameter of each data traffic device.
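The module structure of the apparatus can be sketched as a simple pipeline in which each module is a pluggable function and the output of one feeds the next. The class `TrafficDataPredictionApparatus` and its field names are hypothetical; the placeholder lambdas only show the order in which modules 31 to 35 are applied, not their internal logic.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TrafficDataPredictionApparatus:
    """Hypothetical wiring of modules 31 to 35; each field is one module."""

    acquire: Callable[[], Any]     # acquisition module 31
    clean: Callable[[Any], Any]    # data cleaning module 32
    cluster: Callable[[Any], Any]  # clustering module 33
    fill: Callable[[Any], Any]     # missing value filling module 34
    predict: Callable[[Any], Any]  # prediction module 35 (may close over
                                   # the device position parameters)

    def run(self) -> Any:
        sequences = self.acquire()
        cleaned = self.clean(sequences)
        matrices = self.cluster(cleaned)
        filled = self.fill(matrices)
        return self.predict(filled)


# Placeholder stages that only record the order in which they are applied.
apparatus = TrafficDataPredictionApparatus(
    acquire=lambda: "raw",
    clean=lambda x: x + "|cleaned",
    cluster=lambda x: x + "|clustered",
    fill=lambda x: x + "|filled",
    predict=lambda x: x + "|predicted",
)
print(apparatus.run())  # raw|cleaned|clustered|filled|predicted
```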

In another embodiment of the present invention, the traffic data prediction device of the embodiments of the present application is described in detail. As shown in Fig. 5, the traffic data prediction device includes a processor 501, a memory 502, a communication interface 503, and a bus 504.

The processor 501, the memory 502, and the communication interface 503 communicate with one another through the bus 504.

The communication interface 503 is used to transmit information to and from external devices.

For example, the external device may be a user equipment (UE).

The processor 501 is configured to call program instructions in the memory 502 to perform the methods provided by the method embodiments, for example including:

S101, acquiring traffic data sequences of a plurality of data traffic devices.

In yet another embodiment of the present invention, the computer-readable storage medium of the embodiments of the present application is described in detail. The computer-readable storage medium stores computer instructions that cause a computer to execute the methods provided by the method embodiments, for example including:

S101, acquiring traffic data sequences of a plurality of data traffic devices.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be implemented in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for predicting traffic data, comprising:

acquiring flow data sequences of a plurality of data flow devices;

performing cluster analysis on the flow data sequences after data cleaning to obtain a plurality of flow data matrices, wherein the flow data matrices comprise missing values;

2. The traffic data prediction method of claim 1, wherein the obtaining the traffic data sequence of the plurality of data traffic devices comprises:

collecting flow data of each data flow device at preset time nodes;

3. The method for predicting flow data according to claim 1, wherein the performing data cleaning on abnormal flow data in the flow data sequence of each data flow device comprises:

acquiring a flow knowledge base obtained in advance;

and deleting the abnormal flow data.

4. The method for predicting flow data according to claim 1, wherein the performing cluster analysis on the flow data sequence after data cleaning to obtain a plurality of flow data matrices comprises:

calculating a correlation coefficient between any two converted traffic data sequences to obtain a correlation matrix R and a similarity distance matrix S, where $R = (r_{ij})_{M \times M}$, $S = (s_{ij})_{M \times M}$, and $s_{ij} = 1 - r_{ij}$; $r_{ij}$ is the correlation coefficient between the i-th converted traffic data sequence and the j-th converted traffic data sequence, M is the number of the data traffic devices, and $i, j = 1, 2, \dots, M$;

determining a number of clusters C by using the position parameter of each data traffic device, and clustering the M converted traffic data sequences by using the number of clusters C and a preset clustering method to obtain C converted traffic data sets $V_c$, where the converted traffic data set $V_c$ includes $L_c$ converted traffic data sequences, $L_1 + L_2 + \dots + L_C = M$, and $c = 1, 2, \dots, C$;
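The claim leaves the clustering method ("a preset clustering method") and the way C is derived from the position parameters unspecified. As one possible instance, the sketch below swaps in average-linkage agglomerative clustering on the distance matrix S and takes C as a given input. The function name `agglomerative_clusters` is hypothetical.

```python
def agglomerative_clusters(s, c):
    """Average-linkage agglomerative clustering on a distance matrix S.

    s: M x M list of lists with s[i][j] = 1 - r_ij
    c: desired number of clusters C (1 <= C <= M)
    Returns a list of C clusters, each a sorted list of sequence indices;
    cluster sizes play the role of L_1, ..., L_C and sum to M.
    """
    m = len(s)
    if not 1 <= c <= m:
        raise ValueError("expected 1 <= C <= M")
    clusters = [[i] for i in range(m)]

    def avg_distance(a, b):
        # Mean pairwise distance between members of clusters a and b.
        return sum(s[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > c:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Hypothetical distances: sequences 0 and 1 are similar, as are 2 and 3.
s = [[0.0, 0.1, 0.9, 0.8],
     [0.1, 0.0, 0.85, 0.9],
     [0.9, 0.85, 0.0, 0.05],
     [0.8, 0.9, 0.05, 0.0]]
print(agglomerative_clusters(s, 2))  # [[0, 1], [2, 3]]
```

Since $s_{ij} = 1 - r_{ij}$, strongly correlated sequences sit close together and end up in the same converted traffic data set.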

5. The method for predicting flow data according to claim 4, wherein the preset data format corresponds to a formula:


6. The flow data prediction method of claim 5, wherein a correlation coefficient between any two of the converted flow data sequences is calculated using a correlation coefficient formula, wherein the correlation coefficient formula is:

where $i, j = 1, 2, \dots, M$.
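The correlation coefficient formula is not reproduced in the text. Assuming it is the Pearson product-moment correlation, the common choice for comparing time series, the matrices R and S of claim 4 can be computed as sketched below. The function names are hypothetical, and the input is assumed to be the already converted traffic data sequences of equal length.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant sequence: correlation undefined")
    return cov / math.sqrt(vx * vy)


def correlation_and_distance(seqs):
    """Return R = (r_ij) and S = (s_ij) with s_ij = 1 - r_ij, both M x M."""
    m = len(seqs)
    r = [[pearson(seqs[i], seqs[j]) for j in range(m)] for i in range(m)]
    s = [[1.0 - r[i][j] for j in range(m)] for i in range(m)]
    return r, s


seqs = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
r, s = correlation_and_distance(seqs)
print(r[0][1], r[0][2])  # 1.0 -1.0
```

Because $r_{ij} \in [-1, 1]$, each $s_{ij}$ lies in $[0, 2]$: perfectly correlated sequences are at distance 0 and perfectly anti-correlated ones at distance 2.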

7. The method for predicting traffic data according to claim 1, wherein performing missing value filling on each traffic data matrix by using a preset filling method to obtain a missing value filling matrix comprises:

8. The method for predicting flow data according to claim 1, wherein determining the predicted flow value of each data flow device by using the preset neural network model, the missing value filling matrix, and the position parameter of each data flow device comprises:

9. A traffic data prediction apparatus, comprising: an acquisition module, a data cleaning module, a clustering module, a missing value filling module, and a prediction module;

10. A flow data prediction device, comprising: a processor, a memory, a communication interface, and a bus;

the processor is configured to invoke program instructions in the memory to perform the steps of the method of any of claims 1 to 8.

11. A computer-readable storage medium storing computer instructions for causing a computer to perform the steps of the method of any one of claims 1 to 8.