CN116432850A - PM2.5 full-coverage prediction method based on deep neural network - Google Patents


Info

Publication number
CN116432850A
CN116432850A
Authority
CN
China
Prior art keywords: data, layer, sta, hidden state, prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310423066.0A
Other languages
Chinese (zh)
Inventor
任珂
陈康旭
俞扬信
高尚兵
王媛媛
李翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology filed Critical Huaiyin Institute of Technology
Priority to CN202310423066.0A priority Critical patent/CN116432850A/en
Publication of CN116432850A publication Critical patent/CN116432850A/en
Pending legal-status Critical Current

Classifications

    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/0442: Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Learning methods
    • G06Q 50/26: Government or public services
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a PM2.5 full-coverage prediction method based on a deep neural network. Pre-acquired air-pollution concentration, meteorological data and land-use data are preprocessed and divided into training data and test data; a deep neural network prediction model, STA-ConvLSTM, is constructed and trained. STA-ConvLSTM uses a CNN as the bottom layer and extracts the spatial correlation of the grid data through convolution; stacked STA-LSTM layers with spatiotemporal memory units and spatial memory units serve as the intermediate layers of the prediction model, extracting temporal-correlation and spatial-correlation features; the last layer uses a CNN layer combined with the features extracted by the STA-LSTM units for decoding. The method integrates multi-source heterogeneous data, takes more influencing factors and the spatiotemporal-correlation problem into account, reduces overfitting, avoids gradient vanishing and gradient explosion, addresses full-coverage prediction over a region, and improves prediction precision with a temporal attention mechanism and a spatial attention mechanism.

Description

PM2.5 full-coverage prediction method based on deep neural network
Technical Field
The invention belongs to the field of urban air pollutant concentration monitoring and early warning, and particularly relates to a full coverage prediction method of PM2.5 based on a deep neural network.
Background
In recent years, against the background of China's rapid industrial and technological development, the air-pollution problem has become very prominent and poses many hidden dangers to people's health, so it has become an important concern for society and researchers. Among pollutants, fine particulate matter (PM2.5) is the most dangerous, as it can enter the lungs directly and cause serious harm to the body. It is therefore necessary to develop an efficient and accurate PM2.5 prediction method.
Many prediction methods for air pollution have been proposed, such as conventional statistical methods, machine learning and artificial neural networks. Machine learning techniques are widely applied, including algorithms such as the Support Vector Machine (SVM), Decision Tree, Random Forest and Neural Network. However, these methods have certain drawbacks in accuracy and real-time performance. In recent years, with the development of deep learning, great breakthroughs have been made in various research fields, and air-pollution prediction methods based on deep learning models have gradually become a research hotspot. Effective training of deep learning models on large amounts of data can extract the spatiotemporal correlations in the data well.
Although existing deep learning methods perform well in air-pollution prediction, most models use air-pollution data alone and ignore the influence of factors such as meteorology and terrain on pollutant transport. Most current models only predict air pollutants at a single site, ignore full-coverage prediction of air pollutants, and underestimate high pollutant values. Most current deep learning models have difficulty capturing long-range temporal dependencies, whose spatiotemporal correlation decays over time, causing a large loss of prediction accuracy. Most current models also cannot fuse spatiotemporal features simultaneously, and therefore cannot effectively predict pollutant conditions over a future period.
Disclosure of Invention
The invention aims to: the invention provides a full coverage prediction method of PM2.5 based on a deep neural network, which aims to solve the defects and the shortcomings of the prior art, realize full coverage prediction of PM2.5 and obtain future time-space evolution data.
The technical scheme is as follows: the invention provides a PM2.5 full-coverage prediction method based on a deep neural network, which specifically comprises the following steps:
(1) Preprocessing the pre-acquired air-pollution concentration, meteorological data and land-use data and dividing them into training data and test data;
(2) Constructing a deep neural network prediction model, STA-ConvLSTM, to predict the pollutant concentration of a target area; STA-ConvLSTM takes a CNN as the bottom layer, processing the input data to extract spatial features and extracting the spatial correlation of the grid data through convolution; stacked STA-LSTM layers with spatiotemporal memory units and spatial memory units serve as the intermediate layers of the prediction model, extracting temporal-correlation and spatial-correlation features; the last layer uses a CNN layer combined with the features extracted by the STA-LSTM units for decoding;
(3) Training the deep neural network prediction model STA-ConvLSTM with the training data;
(4) Predicting the PM2.5 concentration of the target area for the next N hours using the trained model.
Further, the implementation process of the step (1) is as follows:
The collected multi-source heterogeneous data are interpolation-filled using the kriging spatial interpolation method according to the longitude and latitude of the data sites, the data are interpolated onto a 100 x 100 grid, and the original feature matrix is generated; the semivariogram is fitted with an exponential model. The interpolation result can be expressed as:
$$Z(x)=\sum_{i=1}^{n}\lambda_i Z(x_i)+e$$
where $Z(x)$ denotes the spatial interpolation at the unknown point $x$, $Z(x_i)$ denotes the data value of known point $x_i$, and $e$ represents the error term; the weights $\lambda_i$ are derived from the exponential semivariogram, in which $h_{ij}$ is the Euclidean distance between known point $x_i$ and the unknown point $x$, $\alpha$ is the correlation-length parameter controlling the rate at which spatial autocorrelation decays with increasing distance, and $n$ is the number of known points;
the original feature matrix is up-dimensioned with a 1 x 1 convolution kernel by increasing the number of output channels with filters, so that information from different channels can interact and fuse; a 1 x 1 convolution kernel then aggregates the input data into a two-dimensional feature vector compatible with the input of the subsequent model, and the data are normalized.
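The gridding step above can be sketched as follows. This is a simplified illustration that uses exponential-decay distance weights inspired by the exponential semivariogram with correlation length alpha; a full kriging implementation would solve a linear system of weights per grid cell. The grid size, station coordinates and alpha value are hypothetical.

```python
import numpy as np

def interpolate_to_grid(stations, values, grid_h=100, grid_w=100, alpha=0.1):
    """Fill a grid_h x grid_w grid from scattered station values.

    Weights decay exponentially with Euclidean distance (a stand-in for
    kriging weights from an exponential semivariogram; true kriging solves
    a linear system per cell).
    stations: (n, 2) array of (lon, lat) in [0, 1] normalized coordinates.
    values:   (n,) array of measured PM2.5 concentrations.
    """
    ys, xs = np.meshgrid(np.linspace(0, 1, grid_h),
                         np.linspace(0, 1, grid_w), indexing="ij")
    cells = np.stack([xs.ravel(), ys.ravel()], axis=1)            # (H*W, 2)
    # h_ij: Euclidean distance from every grid cell to every station
    dists = np.linalg.norm(cells[:, None, :] - stations[None, :, :], axis=2)
    w = np.exp(-dists / alpha)                                    # exponential decay
    w /= w.sum(axis=1, keepdims=True)                             # normalized weights
    return (w @ values).reshape(grid_h, grid_w)

# Hypothetical example: three stations mapped onto a 100 x 100 grid
stations = np.array([[0.2, 0.3], [0.8, 0.5], [0.5, 0.9]])
values = np.array([35.0, 80.0, 55.0])
grid = interpolate_to_grid(stations, values)
```

Because the weights are convex, every interpolated cell stays within the range of the observed station values.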
Further, the temporal-correlation and spatial-correlation features are extracted as follows:
S1: the spatial features extracted by the CNN are input into a prediction unit built by stacking multiple STA-LSTM layers. The input to the first-layer prediction unit is the feature map $X_t \in R^{B\times C\times H\times W}$ extracted by the CNN (B and C denote the batch size and channels of the feature map), together with the time memory $C_{t-1}^{l}$, the spatial memory $M_{t}^{l-1}$ and the hidden state $H_{t-1}^{l}$, where $l$ is the index of the current prediction layer and $t$ is the current time step. The first-layer prediction unit extracts the temporal features of the pollutant-concentration, meteorological and land-use data of the predicted area, encodes the data, and generates a new time memory $C_t^{l}$, spatial memory $M_t^{l}$ and hidden state $H_t^{l}$; $H_t^{l}$ and $C_t^{l}$ are appended to the hidden-state and time-memory lists respectively, and the current hidden state and spatial memory, together with the historical hidden states and time memories, are taken as input to the next layer;
S2: a SAM is added at the top of the prediction unit; the hidden states output by every STA-LSTM layer are merged into the attention keys and values, the current output is used as the query, and the corresponding SAM output is formed, so that more lower-layer information is integrated at the top layer to improve the prediction;
S3: when the time t reaches the prediction stage k, a TAM is embedded in the prediction unit of each layer; the historical feature-grid data are concatenated along the channel dimension as the attention keys and values, the current output is used as the query via the channel attention function softmax, and the current TAM output together with the historical hidden states serves as the query and input of the SAM.
Further, in step (2), the CNN layer is used in combination with the features extracted by the STA-LSTM units for decoding:
$$\hat{H}_t,\{H_t^{l}\}=f_{SL}\big(X_t,H_{t-1}^{l},C_{t-1}^{l},M_{t}^{l-1}\big)$$
$$Y_{t+1}=f_{CNN}\big(\hat{H}_t\big)$$
where $f_{SL}$ is the STA-LSTM function, $\hat{H}_t$ and $\{H_t^{l}\}$ are respectively the output of the STA-LSTM at time t and the hidden state of each encoder layer, $C_{t-1}^{l}$ and $M_{t}^{l-1}$ are the history memory units, and $Y_{t+1}$ is the result of finally decoding the features extracted by the STA-LSTM units with the CNN layer, completing the pollutant-concentration prediction at time t+1.
Further, the implementation process of the step S1 is as follows:
The spatial feature $X_t \in R^{B\times C\times H\times W}$ extracted by the CNN layer, together with the initialized time memory, spatial memory and hidden state, forms the original input of the STA-LSTM. Let $C_{t-1}^{l}$ be the time memory of the same layer at the previous time step, $M_{t}^{l-1}$ the spatial memory of the previous layer at the current time step, and $H_{t-1}^{l}$ the hidden state of the same layer at the previous time step; $\circ$ is the Hadamard product, $\sigma$ represents the activation function sigmoid, and $*$ represents the convolution operator. The feature-extraction process in the prediction unit STA-LSTM is:
$$f_t=\sigma\big(W_{xf}*X_t+W_{hf}*H_{t-1}^{l}+b_f\big)$$
$$i_t=\sigma\big(W_{xi}*X_t+W_{hi}*H_{t-1}^{l}+b_i\big)$$
$$g_t=\tanh\big(W_{xg}*X_t+W_{hg}*H_{t-1}^{l}+b_g\big)$$
$$C_t^{l}=f_t\circ C_{t-1}^{l}+i_t\circ g_t$$
$$o_t=\sigma\big(W_{xo}*X_t+W_{ho}*H_{t-1}^{l}+b_o\big)$$
$$H_t^{l}=o_t\circ\tanh\big(W_{1\times 1}*[C_t^{l},M_t^{l}]\big)$$
where $W_{hf}$, $W_{hi}$, $W_{ho}$ respectively denote the weight coefficients of $H_{t-1}^{l}$ in the forget gate, input gate and output gate of the feature-extraction process, $W_{xf}$, $W_{xi}$, $W_{xo}$ the corresponding weight coefficients of $X_t$, and $b_f$, $b_i$, $b_o$ the bias values of the forget gate, input gate and output gate; $C_t^{l}$, $M_t^{l}$, $H_t^{l}$ respectively denote the time memory, spatial memory and hidden state of layer $l$ at time t.
As in a standard LSTM, the original gates act on $C_{t-1}^{l}$, and another set of gate structures $f'_t$, $i'_t$, $g'_t$ is constructed in the same way to accommodate $M_{t}^{l-1}$, giving $M_t^{l}=f'_t\circ M_t^{l-1}+i'_t\circ g'_t$; the STA-LSTM unit thus contains both a time memory unit and a space memory unit. The final hidden state $H_t^{l}$ is a fusion based on the spatiotemporal memories; to link memories from different directions together, the STA-LSTM unit uses a shared output gate to handle both types of memory, achieving seamless memory fusion. In addition, a 1 x 1 convolution layer performs dimension reduction so that the hidden state $H_t^{l}$ has the same dimensions as the memory units.
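The gating scheme above can be sketched with a dense (non-convolutional) stand-in for the STA-LSTM unit. All weight shapes, the initialization, and the use of matrix products in place of convolutions are illustrative assumptions; the spatial branch repeats the f/i/g gate structure for the memory M, and a shared output gate plus a 1x1-style fusion matrix produces the hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def make_params(d):
    # One (W_x, W_h, b) triple per gate; "fm"/"im"/"gm" are the analogous
    # gates for the spatial memory M, "o" is the shared output gate.
    names = ["f", "i", "g", "fm", "im", "gm", "o"]
    return {n: (rng.normal(0, 0.1, (d, d)),
                rng.normal(0, 0.1, (d, d)),
                np.zeros(d)) for n in names}

def sta_lstm_step(params, x, h_prev, c_prev, m_below, W_fuse):
    """Dense stand-in for the convolutional STA-LSTM unit (illustrative).
    The temporal branch updates C, an analogous gate set updates M, and a
    shared output gate with a 1x1-conv-style fusion yields the hidden state."""
    gate = lambda n, s: params[n][0] @ x + params[n][1] @ s + params[n][2]
    c = sigmoid(gate("f", h_prev)) * c_prev + \
        sigmoid(gate("i", h_prev)) * np.tanh(gate("g", h_prev))
    m = sigmoid(gate("fm", m_below)) * m_below + \
        sigmoid(gate("im", m_below)) * np.tanh(gate("gm", m_below))
    o = sigmoid(gate("o", h_prev))                      # shared output gate
    h = o * np.tanh(W_fuse @ np.concatenate([c, m]))    # fuse [C, M] back to d dims
    return h, c, m

d = 8
params = make_params(d)
W_fuse = rng.normal(0, 0.1, (d, 2 * d))                 # 1x1-conv-style reduction
h, c, m = sta_lstm_step(params, rng.normal(size=d),
                        np.zeros(d), np.zeros(d), np.zeros(d), W_fuse)
```

Since the output gate lies in (0, 1) and tanh in (-1, 1), the hidden state is bounded in magnitude by 1, as in a standard LSTM.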
Further, the implementation process of the step S2 is as follows:
The original hidden state of the top layer, $H_t^{L}$, is convolved and turned into the query $Q_S \in R^{B\times C\times(H*W)}$ by a reshaping operation; B and C represent the batch size and channels of the feature map, respectively. The hidden states of the other layers at the same time step, $\{H_t^{l}\}_{l=1}^{L-1}$, likewise generate the key $K_S \in R^{B\times((L-1)*C)\times(H*W)}$ and the value $V_S \in R^{B\times((L-1)*C)\times(H*W)}$. The new hidden state $\hat{H}_t^{L}$ is obtained by:
$$\hat{H}_t^{L}=\mathrm{softmax}\big(Q_S K_S^{\top}\big)\,V_S$$
After reshaping the dimensions back, $\hat{H}_t^{L}$ is added to the original hidden state $H_t^{L}$ and then normalized by a Layer Normalization layer to be output from the SAM module.
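The SAM flow (reshape, attention over lower-layer hidden states, residual addition, Layer Normalization) can be sketched for a single sample as follows; the convolutions that produce Q, K and V are omitted, and the tensor sizes are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def sam(h_top, h_lower):
    """Spatial attention module (single sample, convolutions omitted).
    h_top:   (C, H, W) top-layer hidden state -> query Q_S
    h_lower: list of L-1 arrays (C, H, W)     -> key K_S and value V_S
    """
    C, H, W = h_top.shape
    q = h_top.reshape(C, H * W)                                          # Q_S
    kv = np.concatenate([h.reshape(C, H * W) for h in h_lower], axis=0)  # ((L-1)*C, H*W)
    attn = softmax(q @ kv.T, axis=-1)         # (C, (L-1)*C) attention weights
    out = (attn @ kv).reshape(C, H, W)        # attended lower-layer information
    return layer_norm(out + h_top)            # residual + LayerNorm

h_top = np.random.default_rng(1).normal(size=(4, 5, 5))
h_lower = [np.random.default_rng(i).normal(size=(4, 5, 5)) for i in (2, 3)]
out = sam(h_top, h_lower)
```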
Further, the implementation process of the step S3 is as follows:
The current hidden state $H_t^{l}$ is passed through a convolution layer and a reshaping operation to generate the query $Q_T \in R^{B\times C\times(H*W)}$. Likewise, the key $K_T \in R^{B\times(t*C)\times(H*W)}$ and the value $V_T \in R^{B\times(t*C)\times(H*W)}$ are obtained from the history input $\{H_1^{l},\dots,H_{t-1}^{l}\}$ by two independent convolutions. Then the new hidden state $\hat{H}_t^{l}$ is computed by temporal attention:
$$\hat{H}_t^{l}=\mathrm{softmax}\big(Q_T K_T^{\top}\big)\,V_T$$
Finally, $\hat{H}_t^{l}$ is reshaped to the same size as the original hidden state, and a Layer Normalization layer normalizes the hidden state $\hat{H}_t^{l}$, which is taken as the output of the TAM module.
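The TAM differs from the SAM mainly in where the keys and values come from: they are built from the history of hidden states concatenated along the channel dimension rather than from lower layers. A single-sample sketch, again omitting the two independent convolutions and using hypothetical sizes:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tam(h_now, h_history):
    """Temporal attention module (single sample, convolutions omitted).
    h_now:     (C, H, W) current hidden state -> query Q_T
    h_history: list of t arrays (C, H, W)     -> K_T and V_T, concatenated
               along the channel dimension as in step S3."""
    C, H, W = h_now.shape
    q = h_now.reshape(C, H * W)                                            # Q_T
    kv = np.concatenate([h.reshape(C, H * W) for h in h_history], axis=0)  # (t*C, H*W)
    attn = softmax(q @ kv.T, axis=-1)          # channel-attention softmax
    out = (attn @ kv).reshape(C, H, W) + h_now # residual connection
    return (out - out.mean()) / np.sqrt(out.var() + 1e-5)  # LayerNorm

rng = np.random.default_rng(0)
h_now = rng.normal(size=(4, 6, 6))
history = [rng.normal(size=(4, 6, 6)) for _ in range(3)]
out = tam(h_now, history)
```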
Further, the implementation process of the step (3) is as follows:
The mean square error function is used as the loss function of the model:
$$L=\frac{1}{(T-A)\,H\,W}\sum_{t=A+1}^{T}\sum_{i=1}^{H}\sum_{j=1}^{W}\big(\hat{X}_t(i,j)-X_t(i,j)\big)^2$$
where $\hat{X}_{A+1:T}$ and $X_{A+1:T}$ are respectively the predicted result data and the real data, H and W are the height and width of the grid data, and (i, j) indicates a position in the grid region.
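The loss is an ordinary mean square error averaged over the predicted frames and all grid cells, which can be computed directly:

```python
import numpy as np

def grid_mse(pred, true):
    """Mean square error over predicted grid frames.
    pred, true: (T, H, W) arrays of predicted / observed concentrations."""
    return float(np.mean((pred - true) ** 2))

# Toy check: every cell errs by 2, so the MSE is 4
true = np.zeros((2, 3, 3))
pred = np.full((2, 3, 3), 2.0)
loss = grid_mse(pred, true)
```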
Further, the implementation process of the step (4) is as follows:
The prediction outputs grid data; to obtain the pollutant-concentration value of a certain target-site area from the target-area grid, the pollutant concentration is located precisely at the row and column coordinates of the target grid cell according to the longitude and latitude coordinates of the interpolated site. The grid coordinates are obtained from longitude and latitude as follows:
$$i=\Big\lfloor\frac{lat\_query-lat\_min}{lat\_max-lat\_min}\times height\Big\rfloor$$
$$j=\Big\lfloor\frac{lon\_query-lon\_min}{lon\_max-lon\_min}\times width\Big\rfloor$$
$$y=Y_{i,j}$$
where $lon\_query$ and $lat\_query$ are respectively the longitude and latitude of the query site, $lon\_max$ and $lat\_max$ are respectively the maximum longitude and latitude of the interpolation, $lon\_min$ and $lat\_min$ are respectively the minimum longitude and latitude of the interpolation, and $height$ and $width$ represent the height and width of the grid; $Y_{i,j}$ is the predicted concentration at grid cell (i, j).
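The coordinate lookup amounts to linear scaling between the interpolation extremes; a minimal sketch follows. The rounding convention (floor, clamped to the grid) and the bounding box used in the example are assumptions.

```python
def lonlat_to_cell(lon_query, lat_query,
                   lon_min, lon_max, lat_min, lat_max,
                   height=100, width=100):
    """Map a query site's longitude/latitude to (row, col) in the grid
    by linear scaling between the interpolation extremes, as in step (4).
    Floor-and-clamp rounding is an assumption."""
    i = min(int((lat_query - lat_min) / (lat_max - lat_min) * height), height - 1)
    j = min(int((lon_query - lon_min) / (lon_max - lon_min) * width), width - 1)
    return i, j

# Hypothetical 1-degree bounding box; a site at the box center
i, j = lonlat_to_cell(119.0, 33.5, 118.5, 119.5, 33.0, 34.0)
```

The predicted concentration for the site is then read off as `Y[i, j]` from the output grid.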
The beneficial effects are that: compared with the prior art, the invention has the following beneficial effects:
1. The method combines pollutant-concentration, meteorological and land-use data from multiple sites and effectively fuses the joint spatiotemporal features of the multi-source heterogeneous data; a multi-dimensional convolutional CNN combines the multi-source features of multiple areas, extracting deep spatial correlations and realizing full-coverage prediction. The model is fully convolutional, which effectively eliminates the large feature loss caused by pooling layers and better extracts the spatial features of the pollutant, meteorological and land-use data;
2. The model builds a stacked STA-LSTM architecture based on deep learning: a convolutional neural network captures spatial interactions, a long short-term memory network captures temporal correlations, and an integrated attention mechanism captures global information, improving the modeling of spatiotemporal information. Regularization techniques reduce overfitting, and problems such as gradient vanishing and gradient explosion are avoided. A temporal attention module is embedded to counter the loss of historical information over time, and a spatial attention module to counter the loss of stacked features from the bottom layer to the top layer of the multi-layer prediction units, improving local prediction ability; from the perspective of time-series data, this further improves the accuracy of the prediction model;
3. The method uses multi-source heterogeneous data and fully considers the spatiotemporal problem, overcoming insufficient feature-extraction strength and weak data correlation in prediction; successive PM2.5 concentration predictions connect end to end, greatly improving the accuracy of continuous PM2.5 concentration prediction over a future period;
4. Existing deep-learning-based PM2.5 prediction methods rarely consider full-coverage prediction of the data; the deep neural network model constructed to address this problem shows good predictive performance for cities with dense monitoring sites.
Drawings
FIG. 1 is a schematic diagram of a STA-ConvLSTM model constructed in the present invention;
FIG. 2 is a schematic diagram of an STA-LSTM prediction unit;
FIG. 3 is a schematic diagram showing a structure of a SAM module;
FIG. 4 is a schematic view of a TAM module structure;
fig. 5 is a diagram of the temporal and spatial evolution.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The invention provides a PM2.5 full coverage prediction method based on a deep neural network, which specifically comprises the following steps:
step 1: air pollution concentration, meteorological data and land utilization data are collected from environment monitoring data, and the collected multi-source heterogeneous data are preprocessed and divided into training data and testing data.
The null values of the collected air-quality, meteorological and land-use data are filled using the kriging spatial interpolation method according to the longitude and latitude of the data stations. The data are first preprocessed; a suitable interpolation method is then selected according to the data type, the kriging spatial interpolation method is used to fit the data, and the semivariogram is fitted with an exponential model. The interpolation result can be expressed as:
$$Z(x)=\sum_{i=1}^{n}\lambda_i Z(x_i)+e$$
where $Z(x)$ denotes the spatial interpolation at the unknown point $x$, $Z(x_i)$ denotes the data value of known point $x_i$, and $e$ represents the error term; the weights $\lambda_i$ are derived from the exponential semivariogram, in which $h_{ij}$ is the Euclidean distance between known point $x_i$ and the unknown point $x$, $\alpha$ is the correlation-length parameter controlling the rate at which spatial autocorrelation decays with increasing distance, and $n$ is the number of known points.
The data are interpolated onto a 100 x 100 grid to generate the original feature matrix. The original feature matrix is up-dimensioned with a 1 x 1 convolution kernel by increasing the number of output channels with filters, so that information from different channels can interact and fuse and the model's ability to extract nonlinear features is improved; a 1 x 1 convolution kernel then aggregates the input data into a two-dimensional feature vector compatible with the input of the subsequent model, and the data are normalized. The first two years of the dataset are used as the training set and the following year as the test set, completing the initialization of the deep neural network prediction model.
Step 2: as shown in fig. 1, a deep neural network prediction model STA-ConvLSTM is constructed that predicts contaminant concentrations in a target region.
(2.1) STA-ConvLSTM uses a CNN as the bottom layer to process the input data and extract spatial features, and the spatial correlation of the grid data is extracted by convolution. The multi-source heterogeneous data are converted into a two-dimensional matrix with a time sequence that the CNN can receive, and are input into the CNN to extract spatial features as the input of the stacked STA-LSTM.
(2.2) as shown in fig. 2, a plurality of layers of STA-LSTM having a space-time memory cell and a space memory cell are stacked as intermediate layers of a prediction model for extracting features of time correlation and space correlation.
The features produced by the trained CNN are input into the prediction unit built by stacking multiple STA-LSTM layers. The input to the first-layer prediction unit consists of the feature $X_t \in R^{B\times C\times H\times W}$ extracted by the CNN (B and C represent the batch size and channels of the feature map), the time memory $C_{t-1}^{l}$, the spatial memory $M_{t}^{l-1}$ and the hidden state $H_{t-1}^{l}$ (all except the input X are zero-initialized at the start; t is the current time step). The first-layer prediction unit extracts the temporal features of the pollutant-concentration, meteorological and land-use data of the predicted area, encodes the data, and generates a new time memory $C_t^{l}$, spatial memory $M_t^{l}$ and hidden state $H_t^{l}$; $H_t^{l}$ and $C_t^{l}$ are appended to the hidden-state and time-memory lists respectively, the current hidden state and spatial memory, together with the historical hidden states and time memories, are taken as input to the next layer, and so on.
The CNN generates $X_t \in R^{B\times C\times H\times W}$; the generated $X_t$, together with the initialized time memory, spatial memory and hidden state, forms the original input of the STA-LSTM. Let $C_{t-1}^{l}$ be the time memory of the same layer at the previous time step, $M_{t}^{l-1}$ the spatial memory of the previous layer, and $H_{t-1}^{l}$ the hidden state of the same layer at the previous time step; $\circ$ is the Hadamard product, $\sigma$ represents the activation function sigmoid, and $*$ represents the convolution operator. The feature-extraction process in the prediction unit STA-LSTM can be expressed by the following formulas:
$$f_t=\sigma\big(W_{xf}*X_t+W_{hf}*H_{t-1}^{l}+b_f\big)$$
$$i_t=\sigma\big(W_{xi}*X_t+W_{hi}*H_{t-1}^{l}+b_i\big)$$
$$g_t=\tanh\big(W_{xg}*X_t+W_{hg}*H_{t-1}^{l}+b_g\big)$$
$$C_t^{l}=f_t\circ C_{t-1}^{l}+i_t\circ g_t$$
$$M_t^{l}=f'_t\circ M_t^{l-1}+i'_t\circ g'_t$$
$$o_t=\sigma\big(W_{xo}*X_t+W_{ho}*H_{t-1}^{l}+b_o\big)$$
$$H_t^{l}=o_t\circ\tanh\big(W_{1\times 1}*[C_t^{l},M_t^{l}]\big)$$
where $W_{hf}$, $W_{hi}$, $W_{ho}$ respectively denote the weight coefficients of $H_{t-1}^{l}$ in the forget gate, input gate and output gate of the feature-extraction process, $W_{xf}$, $W_{xi}$, $W_{xo}$ the corresponding weight coefficients of $X_t$, and $b_f$, $b_i$, $b_o$ the bias values of the forget gate, input gate and output gate; $f'_t$, $i'_t$, $g'_t$ are gates computed from $X_t$ and $M_t^{l-1}$ in the same way; $C_t^{l}$, $M_t^{l}$, $H_t^{l}$ respectively denote the time memory, spatial memory and hidden state of layer $l$ at time t.
As in a standard LSTM, the original gates act on $C_{t-1}^{l}$ while another set of gate structures is constructed in the same way to accommodate $M_t^{l-1}$; the STA-LSTM unit includes both a time memory unit and a space memory unit. The final hidden state $H_t^{l}$ is a fusion based on the spatiotemporal memories. To link memories from different directions together, the STA-LSTM unit uses a shared output gate to handle both types of memory, achieving seamless memory fusion. Furthermore, a 1 x 1 convolution layer is used for dimension reduction so that the hidden state $H_t^{l}$ has the same dimensions as the memory units. Unlike a simple concatenation of memories, this can effectively model the spatial variations and trajectories in the spatiotemporal sequence.
A SAM is added at the top of the prediction unit; the hidden states output by every STA-LSTM layer are combined into the attention keys and values to form the corresponding SAM output, and more lower-layer information is integrated at the top layer to improve the prediction.
Specifically, the SAM process is shown in fig. 3. The original hidden state of the top layer, H_t^L, is convolved and reshaped into a query Q_S ∈ R^{B×C×(H*W)}. Here B and C represent the batch size and the number of channels of the feature map, respectively. The hidden states of the different lower layers at the same time step, H_t^{1:L-1}, likewise generate a key K_S ∈ R^{B×((L-1)*C)×(H*W)} and a value V_S ∈ R^{B×((L-1)*C)×(H*W)}. Finally, the new hidden state Ĥ_t^L is obtained by spatial attention:

Ĥ_t^L = softmax(Q_S · K_S^T) · V_S
After the dimensions of Ĥ_t^L are reshaped back, Ĥ_t^L is added to the original hidden state H_t^L and then normalized by a Layer Normalization layer to form the output of the SAM module.
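The SAM computation just described (attention, residual add, layer normalisation) can be sketched for a single batch element with plain Python lists. All function names are illustrative, and the unscaled softmax(QKᵀ)V form follows the formula as reconstructed above:

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def layer_norm(row, eps=1e-5):
    mu = sum(row) / len(row)
    var = sum((v - mu) ** 2 for v in row) / len(row)
    return [(v - mu) / math.sqrt(var + eps) for v in row]

def sam_step(q, k, v, h_orig):
    """Toy SAM update for one batch element: attn = softmax(Q K^T) V,
    then a residual add with the original top-layer hidden state and a
    per-row layer normalisation.  q is C x (H*W); k and v are
    ((L-1)*C) x (H*W), matching the shapes in the text."""
    scores = matmul(q, transpose(k))          # C x ((L-1)*C)
    attn = [softmax(row) for row in scores]   # row-wise attention weights
    new_h = matmul(attn, v)                   # back to C x (H*W)
    fused = [[a + b for a, b in zip(r1, r2)]  # residual add
             for r1, r2 in zip(new_h, h_orig)]
    return [layer_norm(row) for row in fused]

out = sam_step(q=[[1.0, 0.0]],
               k=[[1.0, 0.0], [0.0, 1.0]],
               v=[[2.0, 0.0], [0.0, 2.0]],
               h_orig=[[0.0, 0.0]])
```

The query aligns more strongly with the first key, so the output is pulled toward the first value row before normalisation.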
When time t reaches the prediction stage k, a temporal attention module (TAM) is embedded in the prediction unit of each layer: the historical feature grid data are concatenated along the channel dimension as the attention keys and values, the current output serves as the query with the channel-wise attention function softmax, and the output of the TAM is used as the input of the SAM (spatial attention module). The TAM process is shown in fig. 4.
The current hidden state H_t^l is passed through a convolution layer and reshaped into a query Q_t ∈ R^{B×C×(H*W)}. Likewise, the historical inputs H_{1:t-1}^l yield a key K_T ∈ R^{B×(t*C)×(H*W)} and a value V_T ∈ R^{B×(t*C)×(H*W)} through two independent convolutions. The new hidden state Ĥ_t^l is then computed by temporal attention:

Ĥ_t^l = softmax(Q_t · K_T^T) · V_T
Finally, Ĥ_t^l is reshaped to the same size as the original hidden state, and the hidden state normalized by a Layer Normalization layer is taken as the output of the TAM module.
(2.3) the last layer uses CNN layer in combination with features extracted by STA-LSTM units for decoding.
(Ĥ_t^l, Ĉ_t^l, M̂_t^l) = f_SL(X_t, H_{t-1}^l, C_{t-1}^l, M_t^{l-1})

Y_{t+1} = f_CNN(Ĥ_t^L)

wherein f_SL is the STA-LSTM function, Ĥ_t^l and H_{t-1}^l are respectively the output of the STA_LSTM at time t and the hidden state of each encoder layer, Ĉ_t^l is the historical memory cell, and Y_{t+1} is the prediction result: finally the CNN layer decodes the features extracted by the STA_LSTM unit to complete the pollutant concentration prediction at time t+1.
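The encode-and-decode flow just described can be sketched per timestep as follows. Here `sta_lstm_stub` merely averages its inputs so the control flow is runnable, and the CNN decoding is stubbed as the identity; both are illustrative assumptions, not the patent's actual gated convolutions.

```python
def sta_lstm_stub(x, h, c, m):
    """Stand-in for the STA-LSTM transition f_SL: returns updated
    hidden state, temporal memory and spatial memory.  A real unit
    applies the gated convolutions described above; this stub just
    averages so the data flow is visible."""
    h_new = [(xi + hi) / 2 for xi, hi in zip(x, h)]
    c_new = [(ci + hi) / 2 for ci, hi in zip(c, h_new)]
    m_new = [(mi + hi) / 2 for mi, hi in zip(m, h_new)]
    return h_new, c_new, m_new

def predict_next(x_t, layers):
    """One decoding step: feed X_t through the stacked units bottom-up,
    then decode the top hidden state (here: identity in place of the
    CNN layer) to produce Y_{t+1}.  `layers` holds (h, c, m) tuples."""
    inp = x_t
    for l, (h, c, m) in enumerate(layers):
        h, c, m = sta_lstm_stub(inp, h, c, m)
        layers[l] = (h, c, m)
        inp = h                      # hidden state feeds the next layer up
    return inp                       # CNN decoding stubbed as identity

layers = [([0.0, 0.0], [0.0, 0.0], [0.0, 0.0]) for _ in range(2)]
y_next = predict_next([1.0, 2.0], layers)
```

Each layer's updated hidden state and memories are kept in `layers` so the next timestep can reuse them, mirroring the hidden-state and memory lists in the text.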
Step 3: and training the deep neural network prediction model by using training data.
In the training process of the STA-ConvLSTM prediction model, a mean square error function is adopted as a loss function of the model, and a calculation formula is as follows:
L = (1 / ((T − A) · H · W)) · Σ_{t=A+1}^{T} Σ_{i=1}^{H} Σ_{j=1}^{W} (Ŷ_t(i, j) − X_t(i, j))²

wherein Y_{A+1:T}, which can also be written Ŷ_{A+1:T}, and X_{A+1:T} are respectively the prediction result data and the real data, H and W are the height and width of the grid data, and (i, j) denotes the position in the grid area.
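A minimal sketch of this grid-wise mean squared error, assuming each frame is an H × W nested list (the function name is illustrative):

```python
def grid_mse(pred_seq, true_seq):
    """Mean squared error over the predicted frames A+1..T, averaged
    over every grid cell (i, j); each frame is an H x W nested list."""
    total, count = 0.0, 0
    for y_hat, y in zip(pred_seq, true_seq):
        for row_hat, row in zip(y_hat, y):
            for p, t in zip(row_hat, row):
                total += (p - t) ** 2
                count += 1
    return total / count

# one 2x2 frame: only the last cell differs, by 2 -> MSE = 4/4
loss = grid_mse([[[1.0, 2.0], [3.0, 4.0]]],
                [[[1.0, 2.0], [3.0, 6.0]]])
```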
Step 4: the concentration of PM2.5 for the target area for the next N hours is predicted by using the trained model.
As shown in fig. 5, the final prediction output is grid data. To obtain the pollutant concentration value of a target grid cell or monitoring station, the row-column coordinates of the target cell must be accurately located from the interpolated station longitude and latitude; the value of that grid cell is then the predicted pollutant concentration for the target station area. The coordinates are obtained from the longitude and latitude as follows:

X_i = ⌊(lon_query − lon_min) / (lon_max − lon_min) × width⌋

Y_j = ⌊(lat_query − lat_min) / (lat_max − lat_min) × height⌋

y = Y_{i,j}

wherein lon_query and lat_query are respectively the longitude and latitude of the query site, lon_max and lat_max are respectively the maximum longitude and latitude of the interpolation, lon_min and lat_min are respectively the minimum longitude and latitude of the interpolation, and height and width represent the height and width of the grid.
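A sketch of this longitude/latitude-to-grid-index mapping. The clamping and the exact rounding convention (truncation over a width−1 span) are assumptions, since the text gives only the linear scaling:

```python
def latlon_to_grid(lon_query, lat_query,
                   lon_min, lon_max, lat_min, lat_max,
                   width, height):
    """Map a station's longitude/latitude into the (row, column) of the
    interpolated grid by linear scaling between the interpolation
    extremes, as the formulas above do."""
    col = int((lon_query - lon_min) / (lon_max - lon_min) * (width - 1))
    row = int((lat_query - lat_min) / (lat_max - lat_min) * (height - 1))
    # clamp in case the query site sits exactly on the grid boundary
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    return row, col

# a site halfway across a 100x100 grid spanning lon 118-120, lat 33-34
r, c = latlon_to_grid(119.0, 33.5, 118.0, 120.0, 33.0, 34.0, 100, 100)
```

The returned pair indexes the predicted concentration grid directly, e.g. `grid[r][c]`.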
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explanation of the principles of the present invention and are in no way limiting of the invention. Accordingly, any modification, equivalent replacement, improvement, etc. made without departing from the spirit and scope of the present invention should be included in the scope of the present invention. Furthermore, the appended claims are intended to cover all such changes and modifications that fall within the scope and boundary of the appended claims, or equivalents of such scope and boundary.

Claims (9)

1. The full coverage prediction method of PM2.5 based on the deep neural network is characterized by comprising the following steps of:
(1) Preprocessing the pre-acquired air pollution concentration, meteorological data and land utilization data, and dividing the pre-acquired air pollution concentration, meteorological data and land utilization data into training data and test data;
(2) Constructing a deep neural network prediction model STA-ConvLSTM for predicting the pollutant concentration of a target area; the STA-ConvLSTM takes a CNN network as a bottom layer and is used for processing input data to extract spatial features and extracting spatial correlation of grid data through convolution; taking stacked layers of STA-LSTM with space-time memory units and space memory units as intermediate layers of a prediction model for extracting features of time correlation and space correlation; the last layer uses CNN layer to combine with the feature extracted by STA-LSTM unit to decode;
(3) Training a deep neural network prediction model STA-ConvLSTM by using training data;
(4) The concentration of PM2.5 for the target area for the next N hours is predicted by using the trained model.
2. The full coverage prediction method of PM2.5 based on deep neural network according to claim 1, wherein the implementation process of step (1) is as follows:
interpolation filling is carried out on the acquired multi-source heterogeneous data by the kriging spatial interpolation method according to the longitude and latitude of the data sites; the data are interpolated onto a 100 × 100 grid to generate the original feature matrix; the semi-variogram is fitted with an exponential model; the interpolation result can be expressed as:

Z(x) = Σ_{i=1}^{n} λ_i(α, h_i) · Z(x_i) + e

wherein Z(x) represents the spatial interpolation at the unknown point, Z(x_i) represents the data value of the known point, λ_i are the kriging weights, h_i represents the Euclidean distance between the known point x_i and the unknown point x, α is the correlation-length parameter of the semi-variogram controlling the rate at which the spatial autocorrelation decays with increasing distance, n is the number of known points, and e represents the error term;
the original feature matrix is up-dimensioned by increasing the number of output channels with a 1 × 1 convolution kernel and filters, so that information from different channels can interact and fuse; the input data are then aggregated into a two-dimensional feature map with a 1 × 1 convolution kernel so as to be compatible with the input of the subsequent model, and the data are normalized.
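As a non-limiting illustration of the interpolation step above, the sketch below replaces the full ordinary-kriging weight solve with a simple exponential distance weighting in the spirit of the exponential semi-variogram; the function name and the weight normalisation are assumptions, not the claimed method.

```python
import math

def exp_weight_interpolate(x, known_points, alpha):
    """Estimate the value at an unknown grid point x as a weighted
    average of known station values, with weights decaying as
    exp(-h/alpha) where h is the Euclidean distance h_i and alpha the
    correlation-length parameter.  A full kriging system solve is
    deliberately omitted from this sketch."""
    weights, values = [], []
    for (xi, yi, zi) in known_points:
        h = math.hypot(x[0] - xi, x[1] - yi)   # Euclidean distance h_i
        weights.append(math.exp(-h / alpha))
        values.append(zi)
    s = sum(weights)
    return sum(w * z for w, z in zip(weights, values)) / s

# two stations; the nearer one (distance 1) dominates the estimate
z = exp_weight_interpolate((0.0, 0.0),
                           [(1.0, 0.0, 10.0), (3.0, 0.0, 30.0)],
                           alpha=1.0)
```

Smaller `alpha` makes the autocorrelation decay faster with distance, concentrating weight on the nearest stations.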
3. The deep neural network-based full coverage prediction method for PM2.5 according to claim 1, wherein the feature implementation process of extracting the temporal correlation and the spatial correlation is as follows:
s1: the spatial features extracted by the CNN network are input into a prediction unit based on stacked multi-layer STA_LSTM; the input of the first-layer prediction unit is the feature X_t ∈ R^{B×C×H×W} extracted by the CNN network, where B and C represent the batch size and the number of channels of the feature map; the temporal memory C_{t-1}^l, the spatial memory M_t^{l-1} and the hidden state H_{t-1}^l are initialized, l being the layer index of the current prediction unit and t the current moment; the first-layer prediction unit extracts the time-series features of the pollutant concentration, weather and land-use data of the predicted area to encode the data and generate a new temporal memory C_t^l, spatial memory M_t^l and hidden state H_t^l; H_t^l and C_t^l are respectively added to the hidden-state and temporal-memory lists, and the current hidden state and spatial memory, together with the historical hidden states and temporal memories, are taken as the input of the next layer;
s2: adding SAM at the top layer of the prediction unit, merging the hidden state of each layer output of STA_LSTM into the key and the value of attention, using the current output as the query, forming the corresponding SAM output, and realizing that more information of the lower layer is integrated at the top layer to improve the prediction;
s3: when the time t reaches the prediction stage k, a TAM is embedded in the prediction unit of each layer, the historical feature grid data is connected in the channel size as a key and a value of attention, the current output is used as a query by using a channel attention function softmax, and the current output of the TAM and the historical hidden state are used as the query and input of the SAM.
4. The method for full coverage prediction of PM2.5 based on a neural network according to claim 1, wherein in step (2) the feature extracted by using the CNN layer in combination with the sta_lstm unit is decoded, which comprises:
(Ĥ_t^l, Ĉ_t^l, M̂_t^l) = f_SL(X_t, H_{t-1}^l, C_{t-1}^l, M_t^{l-1})

Y_{t+1} = f_CNN(Ĥ_t^L)

wherein f_SL is the STA-LSTM function, Ĥ_t^l and H_{t-1}^l are respectively the output of the STA-LSTM at time t and the hidden state of each encoder layer, Ĉ_t^l is the historical memory cell, and Y_{t+1} is the prediction result; finally the CNN layer decodes the features extracted by the STA-LSTM unit to complete the pollutant concentration prediction at time t+1.
5. The method for predicting full coverage of PM2.5 based on deep neural network according to claim 3, wherein the step S1 is implemented as follows:
the spatial feature X_t ∈ R^{B×C×H×W} extracted by the CNN network layer and the initialized temporal memory, spatial memory and hidden state are taken as the original input of the STA-LSTM; let C_{t-1}^l be the temporal memory of the same layer at the previous time, M_t^{l-1} the spatial memory of the previous layer at the current time, and H_{t-1}^l the hidden state of the same layer at the previous time; ∘ denotes the Hadamard product, σ the sigmoid activation function and * the convolution operator; the feature extraction process in the prediction unit STA-LSTM is as follows:

f_t = σ(W_xf * X_t + W_hf * H_{t-1}^l + b_f)

i_t = σ(W_xi * X_t + W_hi * H_{t-1}^l + b_i)

g_t = tanh(W_xg * X_t + W_hg * H_{t-1}^l + b_g)

C_t^l = f_t ∘ C_{t-1}^l + i_t ∘ g_t

M_t^l = f_t′ ∘ M_t^{l-1} + i_t′ ∘ g_t′

o_t = σ(W_xo * X_t + W_ho * H_{t-1}^l + W_co ∘ C_t^l + W_mo ∘ M_t^l + b_o)

H_t^l = o_t ∘ tanh(W_{1×1} * [C_t^l, M_t^l])

where the spatial gates f_t′, i_t′ and g_t′ are computed from X_t and M_t^{l-1} in the same way as f_t, i_t and g_t are computed from X_t and H_{t-1}^l;
wherein W_hf, W_hi and W_ho respectively represent the weight coefficients of the forget gate, the input gate and the output gate with respect to H_{t-1}^l in the feature extraction process; W_xf, W_xi and W_xo respectively represent the corresponding weight coefficients with respect to X_t; b_f, b_i and b_o respectively represent the bias terms of the forget gate, the input gate and the output gate in the feature extraction process; and C_t^l, M_t^l and H_t^l respectively represent the temporal memory, the spatial memory and the hidden state of the l-th layer at time t;
in the standard LSTM, only the original gates act on the temporal memory C_t^l; another set of gate structures is constructed in the same way for the spatial memory M_t^l, so that the STA-LSTM unit contains both a temporal memory cell and a spatial memory cell; the final hidden state H_t^l is a fusion of the two; to link memories from different directions, the STA-LSTM unit handles both memory types with a shared output gate, achieving seamless memory fusion; in addition, a 1 × 1 convolution layer performs dimensionality reduction so that the hidden state H_t^l has the same dimensions as the memory cells.
6. The method for predicting full coverage of PM2.5 based on deep neural network according to claim 3, wherein said step S2 is implemented as follows:
the original hidden state of the top layer, H_t^L, is convolved and reshaped into a query Q_S ∈ R^{B×C×(H*W)}; B and C represent the batch size and the number of channels of the feature map, respectively; the hidden states of the different lower layers at the same time step, H_t^{1:L-1}, likewise generate a key K_S ∈ R^{B×((L-1)*C)×(H*W)} and a value V_S ∈ R^{B×((L-1)*C)×(H*W)}; the new hidden state Ĥ_t^L is obtained by:

Ĥ_t^L = softmax(Q_S · K_S^T) · V_S

after the dimensions of Ĥ_t^L are reshaped back, Ĥ_t^L is added to the original hidden state H_t^L and then normalized by a Layer Normalization layer to form the output of the SAM module.
7. The method for predicting full coverage of PM2.5 based on deep neural network according to claim 3, wherein said step S3 is implemented as follows:
the current hidden state H_t^l is passed through a convolution layer and reshaped into a query Q_t ∈ R^{B×C×(H*W)}; likewise, the historical inputs H_{1:t-1}^l yield a key K_T ∈ R^{B×(t*C)×(H*W)} and a value V_T ∈ R^{B×(t*C)×(H*W)} through two independent convolutions; the new hidden state Ĥ_t^l is then computed by temporal attention:

Ĥ_t^l = softmax(Q_t · K_T^T) · V_T

finally, Ĥ_t^l is reshaped to the same size as the original hidden state, and the hidden state normalized by a Layer Normalization layer is taken as the output of the TAM module.
8. The full coverage prediction method of PM2.5 based on deep neural network according to claim 1, wherein the implementation procedure of step (3) is as follows:
the mean square error function is used as a loss function of the model:
L = (1 / ((T − A) · H · W)) · Σ_{t=A+1}^{T} Σ_{i=1}^{H} Σ_{j=1}^{W} (Ŷ_t(i, j) − X_t(i, j))²

wherein Ŷ_{A+1:T} and X_{A+1:T} are respectively the prediction result data and the real data, H and W are the height and width of the grid data, and (i, j) denotes the position in the grid area.
9. The full coverage prediction method of PM2.5 based on deep neural network according to claim 1, wherein the implementation procedure of step (4) is as follows:
the prediction output is grid data; to obtain the pollutant concentration value of a target area grid cell, i.e. the predicted pollutant concentration of a certain target site area, the row-column coordinates of the target grid cell are accurately located according to the interpolated site longitude and latitude coordinates; the coordinates are obtained from the longitude and latitude as follows:

X_i = ⌊(lon_query − lon_min) / (lon_max − lon_min) × width⌋

Y_j = ⌊(lat_query − lat_min) / (lat_max − lat_min) × height⌋

y = Y_{i,j}

wherein lon_query and lat_query are respectively the longitude and latitude of the query site, lon_max and lat_max are respectively the maximum longitude and latitude of the interpolation, lon_min and lat_min are respectively the minimum longitude and latitude of the interpolation, and height and width represent the height and width of the grid.
CN202310423066.0A 2023-04-19 2023-04-19 PM2.5 full-coverage prediction method based on deep neural network Pending CN116432850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310423066.0A CN116432850A (en) 2023-04-19 2023-04-19 PM2.5 full-coverage prediction method based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310423066.0A CN116432850A (en) 2023-04-19 2023-04-19 PM2.5 full-coverage prediction method based on deep neural network

Publications (1)

Publication Number Publication Date
CN116432850A true CN116432850A (en) 2023-07-14

Family

ID=87083036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310423066.0A Pending CN116432850A (en) 2023-04-19 2023-04-19 PM2.5 full-coverage prediction method based on deep neural network

Country Status (1)

Country Link
CN (1) CN116432850A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117420052A (en) * 2023-10-09 2024-01-19 江苏海洋大学 PM2.5 prediction method integrating multi-scale space-time information


Similar Documents

Publication Publication Date Title
CN111612066B (en) Remote sensing image classification method based on depth fusion convolutional neural network
CN109492822B (en) Air pollutant concentration time-space domain correlation prediction method
CN109887282B (en) Road network traffic flow prediction method based on hierarchical timing diagram convolutional network
CN111223301B (en) Traffic flow prediction method based on graph attention convolution network
CN109508360B (en) Geographical multivariate stream data space-time autocorrelation analysis method based on cellular automaton
CN109117987B (en) Personalized traffic accident risk prediction recommendation method based on deep learning
CN110766942B (en) Traffic network congestion prediction method based on convolution long-term and short-term memory network
CN111127888A (en) Urban traffic flow prediction method based on multi-source data fusion
CN109299401A (en) Metropolitan area space-time stream Predicting Technique based on deep learning model LSTM-ResNet
CN111523706B (en) Section lane-level short-term traffic flow prediction method based on deep learning combination model
CN110322453A (en) 3D point cloud semantic segmentation method based on position attention and auxiliary network
CN109829495A (en) Timing image prediction method based on LSTM and DCGAN
CN110570035B (en) People flow prediction system for simultaneously modeling space-time dependency and daily flow dependency
CN115827335B (en) Time sequence data missing interpolation system and time sequence data missing interpolation method based on modal crossing method
CN110097028A (en) Crowd's accident detection method of network is generated based on three-dimensional pyramid diagram picture
CN116432850A (en) PM2.5 full-coverage prediction method based on deep neural network
CN115392554A (en) Track passenger flow prediction method based on depth map neural network and environment fusion
CN112700104A (en) Earthquake region landslide susceptibility evaluation method based on multi-modal classification
CN116844041A (en) Cultivated land extraction method based on bidirectional convolution time self-attention mechanism
CN114819386A (en) Conv-Transformer-based flood forecasting method
CN117494034A (en) Air quality prediction method based on traffic congestion index and multi-source data fusion
CN116596151A (en) Traffic flow prediction method and computing device based on time-space diagram attention
CN114742206A (en) Rainfall intensity estimation method for comprehensive multi-space-time scale Doppler radar data
CN113935458A (en) Air pollution multi-site combined prediction method based on convolution self-coding deep learning
CN118038021A (en) Transformer substation operation site foreign matter intrusion detection method based on improvement yolov4

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination