CN116432850A - PM2.5 full-coverage prediction method based on deep neural network - Google Patents


Info

Publication number
CN116432850A
CN116432850A
Authority
CN
China
Prior art keywords: data, layer, sta, hidden state, prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310423066.0A
Other languages
Chinese (zh)
Inventor
任珂
陈康旭
俞扬信
高尚兵
王媛媛
李翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology filed Critical Huaiyin Institute of Technology
Priority to CN202310423066.0A priority Critical patent/CN116432850A/en
Publication of CN116432850A publication Critical patent/CN116432850A/en
Pending legal-status Critical Current

Classifications

    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/0442: Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Learning methods
    • G06Q 50/26: Government or public services
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a PM2.5 full-coverage prediction method based on a deep neural network. Pre-acquired air-pollution concentration, meteorological data and land-use data are preprocessed and divided into training data and test data; a deep neural network prediction model, STA-ConvLSTM, is constructed and trained. STA-ConvLSTM uses a CNN as the bottom layer and extracts the spatial correlation of the grid data through convolution; stacked STA-LSTM layers with spatiotemporal memory units and spatial memory units serve as the intermediate layers of the prediction model, extracting temporal-correlation and spatial-correlation features; the last layer uses a CNN layer combined with the features extracted by the STA-LSTM units for decoding. The method integrates multi-source heterogeneous data, takes more influencing factors and the spatiotemporal-correlation problem into account, reduces overfitting, avoids gradient vanishing and gradient explosion, addresses full-coverage prediction over a region, and improves prediction precision with a temporal attention mechanism and a spatial attention mechanism.

Description

PM2.5 full-coverage prediction method based on deep neural network
Technical Field
The invention belongs to the field of urban air pollutant concentration monitoring and early warning, and particularly relates to a full coverage prediction method of PM2.5 based on a deep neural network.
Background
In recent years, against the background of China's rapid industrial and technological development, the air-pollution problem has become very prominent and poses many hidden dangers to people's health, so it has become an important concern for society and researchers. Among pollutants, fine particulate matter (PM2.5) is the most dangerous, as it can enter the lungs directly and cause serious harm to the body. It is therefore necessary to develop an efficient and accurate PM2.5 prediction method.
Many prediction methods for air pollution have been proposed, such as conventional statistical methods, machine learning and artificial neural networks. Machine learning techniques are widely applied, including algorithms such as the Support Vector Machine (SVM), Decision Tree, Random Forest and Neural Network. However, these methods have certain drawbacks in accuracy and real-time performance. In recent years, with the development of deep learning, great breakthroughs have been made in various research fields, and air-pollution prediction methods based on deep learning models have gradually become a research hotspot. Effective training of deep learning models on large amounts of data can extract the spatiotemporal correlations in the data well.
Although existing deep learning methods perform well in air-pollution prediction, most models use air-pollution data alone and ignore the influence of factors such as meteorology and terrain on pollutant transport. Most current models only predict air pollutants at a single site, ignore full-coverage prediction of air pollutants, and underestimate high pollutant values. Most current deep learning models have difficulty capturing long-range temporal dependencies, whose spatiotemporal correlation decays over time, causing a large loss of prediction accuracy. Most current models also cannot fuse spatiotemporal features simultaneously, and therefore cannot effectively predict pollutant conditions over a future period.
Disclosure of Invention
The invention aims to: the invention provides a full coverage prediction method of PM2.5 based on a deep neural network, which aims to solve the defects and the shortcomings of the prior art, realize full coverage prediction of PM2.5 and obtain future time-space evolution data.
The technical scheme is as follows: the invention provides a PM2.5 full-coverage prediction method based on a deep neural network, which specifically comprises the following steps:
(1) Preprocessing the pre-acquired air-pollution concentration, meteorological data and land-use data and dividing them into training data and test data;
(2) Constructing a deep neural network prediction model, STA-ConvLSTM, to predict the pollutant concentration of a target area; STA-ConvLSTM takes a CNN as the bottom layer, processing the input data to extract spatial features and extracting the spatial correlation of the grid data through convolution; stacked STA-LSTM layers with spatiotemporal memory units and spatial memory units serve as the intermediate layers of the prediction model, extracting temporal-correlation and spatial-correlation features; the last layer uses a CNN layer combined with the features extracted by the STA-LSTM units for decoding;
(3) Training the deep neural network prediction model STA-ConvLSTM with the training data;
(4) Predicting the PM2.5 concentration of the target area for the next N hours using the trained model.
Further, the implementation process of the step (1) is as follows:
The collected multi-source heterogeneous data are interpolation-filled using the kriging spatial interpolation method according to the longitude and latitude of the data sites, the data are interpolated onto a 100 x 100 grid, and the original feature matrix is generated; the semivariogram is fitted with an exponential model. The interpolation result can be expressed as:
$$Z(x)=\sum_{i=1}^{n}\lambda_i Z(x_i)+e$$
where $Z(x)$ denotes the spatial interpolation at the unknown point $x$, $Z(x_i)$ denotes the data value of known point $x_i$, and $e$ represents the error term; the weights $\lambda_i$ are derived from the exponential semivariogram, in which $h_{ij}$ is the Euclidean distance between known point $x_i$ and the unknown point $x$, $\alpha$ is the correlation-length parameter controlling the rate at which spatial autocorrelation decays with increasing distance, and $n$ is the number of known points;
the original feature matrix is up-dimensioned with a 1 x 1 convolution kernel by increasing the number of output channels with filters, so that information from different channels can interact and fuse; a 1 x 1 convolution kernel then aggregates the input data into a two-dimensional feature vector compatible with the input of the subsequent model, and the data are normalized.
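The gridding step above can be sketched as follows. This is a simplified illustration that uses exponential-decay distance weights inspired by the exponential semivariogram with correlation length alpha; a full kriging implementation would solve a linear system of weights per grid cell. The grid size, station coordinates and alpha value are hypothetical.

```python
import numpy as np

def interpolate_to_grid(stations, values, grid_h=100, grid_w=100, alpha=0.1):
    """Fill a grid_h x grid_w grid from scattered station values.

    Weights decay exponentially with Euclidean distance (a stand-in for
    kriging weights from an exponential semivariogram; true kriging solves
    a linear system per cell).
    stations: (n, 2) array of (lon, lat) in [0, 1] normalized coordinates.
    values:   (n,) array of measured PM2.5 concentrations.
    """
    ys, xs = np.meshgrid(np.linspace(0, 1, grid_h),
                         np.linspace(0, 1, grid_w), indexing="ij")
    cells = np.stack([xs.ravel(), ys.ravel()], axis=1)            # (H*W, 2)
    # h_ij: Euclidean distance from every grid cell to every station
    dists = np.linalg.norm(cells[:, None, :] - stations[None, :, :], axis=2)
    w = np.exp(-dists / alpha)                                    # exponential decay
    w /= w.sum(axis=1, keepdims=True)                             # normalized weights
    return (w @ values).reshape(grid_h, grid_w)

# Hypothetical example: three stations mapped onto a 100 x 100 grid
stations = np.array([[0.2, 0.3], [0.8, 0.5], [0.5, 0.9]])
values = np.array([35.0, 80.0, 55.0])
grid = interpolate_to_grid(stations, values)
```

Because the weights are convex, every interpolated cell stays within the range of the observed station values.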
Further, the temporal-correlation and spatial-correlation features are extracted as follows:
S1: the spatial features extracted by the CNN are input into a prediction unit built by stacking multiple STA-LSTM layers. The input to the first-layer prediction unit is the feature map $X_t \in R^{B\times C\times H\times W}$ extracted by the CNN (B and C denote the batch size and channels of the feature map), together with the time memory $C_{t-1}^{l}$, the spatial memory $M_{t}^{l-1}$ and the hidden state $H_{t-1}^{l}$, where $l$ is the index of the current prediction layer and $t$ is the current time step. The first-layer prediction unit extracts the temporal features of the pollutant-concentration, meteorological and land-use data of the predicted area, encodes the data, and generates a new time memory $C_t^{l}$, spatial memory $M_t^{l}$ and hidden state $H_t^{l}$; $H_t^{l}$ and $C_t^{l}$ are appended to the hidden-state and time-memory lists respectively, and the current hidden state and spatial memory, together with the historical hidden states and time memories, are taken as input to the next layer;
S2: a SAM is added at the top of the prediction unit; the hidden states output by every STA-LSTM layer are merged into the attention keys and values, the current output is used as the query, and the corresponding SAM output is formed, so that more lower-layer information is integrated at the top layer to improve the prediction;
S3: when the time t reaches the prediction stage k, a TAM is embedded in the prediction unit of each layer; the historical feature-grid data are concatenated along the channel dimension as the attention keys and values, the current output is used as the query via the channel attention function softmax, and the current TAM output together with the historical hidden states serves as the query and input of the SAM.
Further, in step (2), the CNN layer is used in combination with the features extracted by the STA-LSTM units for decoding:
$$\hat{H}_t,\{H_t^{l}\}=f_{SL}\big(X_t,H_{t-1}^{l},C_{t-1}^{l},M_{t}^{l-1}\big)$$
$$Y_{t+1}=f_{CNN}\big(\hat{H}_t\big)$$
where $f_{SL}$ is the STA-LSTM function, $\hat{H}_t$ and $\{H_t^{l}\}$ are respectively the output of the STA-LSTM at time t and the hidden state of each encoder layer, $C_{t-1}^{l}$ and $M_{t}^{l-1}$ are the history memory units, and $Y_{t+1}$ is the result of finally decoding the features extracted by the STA-LSTM units with the CNN layer, completing the pollutant-concentration prediction at time t+1.
Further, the implementation process of the step S1 is as follows:
The spatial feature $X_t \in R^{B\times C\times H\times W}$ extracted by the CNN layer, together with the initialized time memory, spatial memory and hidden state, forms the original input of the STA-LSTM. Let $C_{t-1}^{l}$ be the time memory of the same layer at the previous time step, $M_{t}^{l-1}$ the spatial memory of the previous layer at the current time step, and $H_{t-1}^{l}$ the hidden state of the same layer at the previous time step; $\circ$ is the Hadamard product, $\sigma$ represents the activation function sigmoid, and $*$ represents the convolution operator. The feature-extraction process in the prediction unit STA-LSTM is:
$$f_t=\sigma\big(W_{xf}*X_t+W_{hf}*H_{t-1}^{l}+b_f\big)$$
$$i_t=\sigma\big(W_{xi}*X_t+W_{hi}*H_{t-1}^{l}+b_i\big)$$
$$g_t=\tanh\big(W_{xg}*X_t+W_{hg}*H_{t-1}^{l}+b_g\big)$$
$$C_t^{l}=f_t\circ C_{t-1}^{l}+i_t\circ g_t$$
$$o_t=\sigma\big(W_{xo}*X_t+W_{ho}*H_{t-1}^{l}+b_o\big)$$
$$H_t^{l}=o_t\circ\tanh\big(W_{1\times 1}*[C_t^{l},M_t^{l}]\big)$$
where $W_{hf}$, $W_{hi}$, $W_{ho}$ respectively denote the weight coefficients of $H_{t-1}^{l}$ in the forget gate, input gate and output gate of the feature-extraction process, $W_{xf}$, $W_{xi}$, $W_{xo}$ the corresponding weight coefficients of $X_t$, and $b_f$, $b_i$, $b_o$ the bias values of the forget gate, input gate and output gate; $C_t^{l}$, $M_t^{l}$, $H_t^{l}$ respectively denote the time memory, spatial memory and hidden state of layer $l$ at time t.
As in a standard LSTM, the original gates act on $C_{t-1}^{l}$, and another set of gate structures $f'_t$, $i'_t$, $g'_t$ is constructed in the same way to accommodate $M_{t}^{l-1}$, giving $M_t^{l}=f'_t\circ M_t^{l-1}+i'_t\circ g'_t$; the STA-LSTM unit thus contains both a time memory unit and a space memory unit. The final hidden state $H_t^{l}$ is a fusion based on the spatiotemporal memories; to link memories from different directions together, the STA-LSTM unit uses a shared output gate to handle both types of memory, achieving seamless memory fusion. In addition, a 1 x 1 convolution layer performs dimension reduction so that the hidden state $H_t^{l}$ has the same dimensions as the memory units.
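The gating scheme above can be sketched with a dense (non-convolutional) stand-in for the STA-LSTM unit. All weight shapes, the initialization, and the use of matrix products in place of convolutions are illustrative assumptions; the spatial branch repeats the f/i/g gate structure for the memory M, and a shared output gate plus a 1x1-style fusion matrix produces the hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def make_params(d):
    # One (W_x, W_h, b) triple per gate; "fm"/"im"/"gm" are the analogous
    # gates for the spatial memory M, "o" is the shared output gate.
    names = ["f", "i", "g", "fm", "im", "gm", "o"]
    return {n: (rng.normal(0, 0.1, (d, d)),
                rng.normal(0, 0.1, (d, d)),
                np.zeros(d)) for n in names}

def sta_lstm_step(params, x, h_prev, c_prev, m_below, W_fuse):
    """Dense stand-in for the convolutional STA-LSTM unit (illustrative).
    The temporal branch updates C, an analogous gate set updates M, and a
    shared output gate with a 1x1-conv-style fusion yields the hidden state."""
    gate = lambda n, s: params[n][0] @ x + params[n][1] @ s + params[n][2]
    c = sigmoid(gate("f", h_prev)) * c_prev + \
        sigmoid(gate("i", h_prev)) * np.tanh(gate("g", h_prev))
    m = sigmoid(gate("fm", m_below)) * m_below + \
        sigmoid(gate("im", m_below)) * np.tanh(gate("gm", m_below))
    o = sigmoid(gate("o", h_prev))                      # shared output gate
    h = o * np.tanh(W_fuse @ np.concatenate([c, m]))    # fuse [C, M] back to d dims
    return h, c, m

d = 8
params = make_params(d)
W_fuse = rng.normal(0, 0.1, (d, 2 * d))                 # 1x1-conv-style reduction
h, c, m = sta_lstm_step(params, rng.normal(size=d),
                        np.zeros(d), np.zeros(d), np.zeros(d), W_fuse)
```

Since the output gate lies in (0, 1) and tanh in (-1, 1), the hidden state is bounded in magnitude by 1, as in a standard LSTM.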
Further, the implementation process of the step S2 is as follows:
The original hidden state of the top layer, $H_t^{L}$, is convolved and turned into the query $Q_S \in R^{B\times C\times(H*W)}$ by a reshaping operation; B and C represent the batch size and channels of the feature map, respectively. The hidden states of the other layers at the same time step, $\{H_t^{l}\}_{l=1}^{L-1}$, likewise generate the key $K_S \in R^{B\times((L-1)*C)\times(H*W)}$ and the value $V_S \in R^{B\times((L-1)*C)\times(H*W)}$. The new hidden state $\hat{H}_t^{L}$ is obtained by:
$$\hat{H}_t^{L}=\mathrm{softmax}\big(Q_S K_S^{\top}\big)\,V_S$$
After reshaping the dimensions back, $\hat{H}_t^{L}$ is added to the original hidden state $H_t^{L}$ and then normalized by a Layer Normalization layer to be output from the SAM module.
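The SAM flow (reshape, attention over lower-layer hidden states, residual addition, Layer Normalization) can be sketched for a single sample as follows; the convolutions that produce Q, K and V are omitted, and the tensor sizes are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def sam(h_top, h_lower):
    """Spatial attention module (single sample, convolutions omitted).
    h_top:   (C, H, W) top-layer hidden state -> query Q_S
    h_lower: list of L-1 arrays (C, H, W)     -> key K_S and value V_S
    """
    C, H, W = h_top.shape
    q = h_top.reshape(C, H * W)                                          # Q_S
    kv = np.concatenate([h.reshape(C, H * W) for h in h_lower], axis=0)  # ((L-1)*C, H*W)
    attn = softmax(q @ kv.T, axis=-1)         # (C, (L-1)*C) attention weights
    out = (attn @ kv).reshape(C, H, W)        # attended lower-layer information
    return layer_norm(out + h_top)            # residual + LayerNorm

h_top = np.random.default_rng(1).normal(size=(4, 5, 5))
h_lower = [np.random.default_rng(i).normal(size=(4, 5, 5)) for i in (2, 3)]
out = sam(h_top, h_lower)
```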
Further, the implementation process of the step S3 is as follows:
The current hidden state $H_t^{l}$ is passed through a convolution layer and a reshaping operation to generate the query $Q_T \in R^{B\times C\times(H*W)}$. Likewise, the key $K_T \in R^{B\times(t*C)\times(H*W)}$ and the value $V_T \in R^{B\times(t*C)\times(H*W)}$ are obtained from the history input $\{H_1^{l},\dots,H_{t-1}^{l}\}$ by two independent convolutions. Then the new hidden state $\hat{H}_t^{l}$ is computed by temporal attention:
$$\hat{H}_t^{l}=\mathrm{softmax}\big(Q_T K_T^{\top}\big)\,V_T$$
Finally, $\hat{H}_t^{l}$ is reshaped to the same size as the original hidden state, and a Layer Normalization layer normalizes the hidden state $\hat{H}_t^{l}$, which is taken as the output of the TAM module.
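The TAM differs from the SAM mainly in where the keys and values come from: they are built from the history of hidden states concatenated along the channel dimension rather than from lower layers. A single-sample sketch, again omitting the two independent convolutions and using hypothetical sizes:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tam(h_now, h_history):
    """Temporal attention module (single sample, convolutions omitted).
    h_now:     (C, H, W) current hidden state -> query Q_T
    h_history: list of t arrays (C, H, W)     -> K_T and V_T, concatenated
               along the channel dimension as in step S3."""
    C, H, W = h_now.shape
    q = h_now.reshape(C, H * W)                                            # Q_T
    kv = np.concatenate([h.reshape(C, H * W) for h in h_history], axis=0)  # (t*C, H*W)
    attn = softmax(q @ kv.T, axis=-1)          # channel-attention softmax
    out = (attn @ kv).reshape(C, H, W) + h_now # residual connection
    return (out - out.mean()) / np.sqrt(out.var() + 1e-5)  # LayerNorm

rng = np.random.default_rng(0)
h_now = rng.normal(size=(4, 6, 6))
history = [rng.normal(size=(4, 6, 6)) for _ in range(3)]
out = tam(h_now, history)
```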
Further, the implementation process of the step (3) is as follows:
The mean square error function is used as the loss function of the model:
$$L=\frac{1}{(T-A)\,H\,W}\sum_{t=A+1}^{T}\sum_{i=1}^{H}\sum_{j=1}^{W}\big(\hat{X}_t(i,j)-X_t(i,j)\big)^2$$
where $\hat{X}_{A+1:T}$ and $X_{A+1:T}$ are respectively the predicted result data and the real data, H and W are the height and width of the grid data, and (i, j) indicates a position in the grid region.
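The loss is an ordinary mean square error averaged over the predicted frames and all grid cells, which can be computed directly:

```python
import numpy as np

def grid_mse(pred, true):
    """Mean square error over predicted grid frames.
    pred, true: (T, H, W) arrays of predicted / observed concentrations."""
    return float(np.mean((pred - true) ** 2))

# Toy check: every cell errs by 2, so the MSE is 4
true = np.zeros((2, 3, 3))
pred = np.full((2, 3, 3), 2.0)
loss = grid_mse(pred, true)
```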
Further, the implementation process of the step (4) is as follows:
The prediction outputs grid data; to obtain the pollutant-concentration value of a certain target-site area from the target-area grid, the pollutant concentration is located precisely at the row and column coordinates of the target grid cell according to the longitude and latitude coordinates of the interpolated site. The grid coordinates are obtained from longitude and latitude as follows:
$$i=\Big\lfloor\frac{lat\_query-lat\_min}{lat\_max-lat\_min}\times height\Big\rfloor$$
$$j=\Big\lfloor\frac{lon\_query-lon\_min}{lon\_max-lon\_min}\times width\Big\rfloor$$
$$y=Y_{i,j}$$
where $lon\_query$ and $lat\_query$ are respectively the longitude and latitude of the query site, $lon\_max$ and $lat\_max$ are respectively the maximum longitude and latitude of the interpolation, $lon\_min$ and $lat\_min$ are respectively the minimum longitude and latitude of the interpolation, and $height$ and $width$ represent the height and width of the grid; $Y_{i,j}$ is the predicted concentration at grid cell (i, j).
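The coordinate lookup amounts to linear scaling between the interpolation extremes; a minimal sketch follows. The rounding convention (floor, clamped to the grid) and the bounding box used in the example are assumptions.

```python
def lonlat_to_cell(lon_query, lat_query,
                   lon_min, lon_max, lat_min, lat_max,
                   height=100, width=100):
    """Map a query site's longitude/latitude to (row, col) in the grid
    by linear scaling between the interpolation extremes, as in step (4).
    Floor-and-clamp rounding is an assumption."""
    i = min(int((lat_query - lat_min) / (lat_max - lat_min) * height), height - 1)
    j = min(int((lon_query - lon_min) / (lon_max - lon_min) * width), width - 1)
    return i, j

# Hypothetical 1-degree bounding box; a site at the box center
i, j = lonlat_to_cell(119.0, 33.5, 118.5, 119.5, 33.0, 34.0)
```

The predicted concentration for the site is then read off as `Y[i, j]` from the output grid.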
The beneficial effects are that: compared with the prior art, the invention has the following beneficial effects:
1. The method combines pollutant-concentration, meteorological and land-use data from multiple sites and effectively fuses the joint spatiotemporal features of the multi-source heterogeneous data; a multi-dimensional convolutional CNN combines the multi-source features of multiple areas, extracting deep spatial correlations and realizing full-coverage prediction. The model is fully convolutional, which effectively eliminates the large feature loss caused by pooling layers and better extracts the spatial features of the pollutant, meteorological and land-use data;
2. The model builds a stacked STA-LSTM architecture based on deep learning: a convolutional neural network captures spatial interactions, a long short-term memory network captures temporal correlations, and an integrated attention mechanism captures global information, improving the modeling of spatiotemporal information. Regularization techniques reduce overfitting, and problems such as gradient vanishing and gradient explosion are avoided. A temporal attention module is embedded to counter the loss of historical information over time, and a spatial attention module to counter the loss of stacked features from the bottom layer to the top layer of the multi-layer prediction units, improving local prediction ability; from the perspective of time-series data, this further improves the accuracy of the prediction model;
3. The method uses multi-source heterogeneous data and fully considers the spatiotemporal problem, overcoming insufficient feature-extraction strength and weak data correlation in prediction; successive PM2.5 concentration predictions connect end to end, greatly improving the accuracy of continuous PM2.5 concentration prediction over a future period;
4. Existing deep-learning-based PM2.5 prediction methods rarely consider full-coverage prediction of the data; the deep neural network model constructed to address this problem shows good predictive performance for cities with dense monitoring sites.
Drawings
FIG. 1 is a schematic diagram of a STA-ConvLSTM model constructed in the present invention;
FIG. 2 is a schematic diagram of an STA-LSTM prediction unit;
FIG. 3 is a schematic diagram showing a structure of a SAM module;
FIG. 4 is a schematic view of a TAM module structure;
fig. 5 is a diagram of the temporal and spatial evolution.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The invention provides a PM2.5 full coverage prediction method based on a deep neural network, which specifically comprises the following steps:
step 1: air pollution concentration, meteorological data and land utilization data are collected from environment monitoring data, and the collected multi-source heterogeneous data are preprocessed and divided into training data and testing data.
The null values of the collected air-quality, meteorological and land-use data are filled using the kriging spatial interpolation method according to the longitude and latitude of the data stations. The data are first preprocessed; a suitable interpolation method is then selected according to the data type, the kriging spatial interpolation method is used to fit the data, and the semivariogram is fitted with an exponential model. The interpolation result can be expressed as:
$$Z(x)=\sum_{i=1}^{n}\lambda_i Z(x_i)+e$$
where $Z(x)$ denotes the spatial interpolation at the unknown point $x$, $Z(x_i)$ denotes the data value of known point $x_i$, and $e$ represents the error term; the weights $\lambda_i$ are derived from the exponential semivariogram, in which $h_{ij}$ is the Euclidean distance between known point $x_i$ and the unknown point $x$, $\alpha$ is the correlation-length parameter controlling the rate at which spatial autocorrelation decays with increasing distance, and $n$ is the number of known points.
The data are interpolated onto a 100 x 100 grid to generate the original feature matrix. The original feature matrix is up-dimensioned with a 1 x 1 convolution kernel by increasing the number of output channels with filters, so that information from different channels can interact and fuse and the model's ability to extract nonlinear features is improved; a 1 x 1 convolution kernel then aggregates the input data into a two-dimensional feature vector compatible with the input of the subsequent model, and the data are normalized. The first two years of the dataset are used as the training set and the following year as the test set, completing the initialization of the deep neural network prediction model.
Step 2: as shown in fig. 1, a deep neural network prediction model STA-ConvLSTM is constructed that predicts contaminant concentrations in a target region.
(2.1) STA-ConvLSTM uses a CNN as the bottom layer to process the input data and extract spatial features, and the spatial correlation of the grid data is extracted by convolution. The multi-source heterogeneous data are converted into a two-dimensional matrix with a time sequence that the CNN can receive, and are input into the CNN to extract spatial features as the input of the stacked STA-LSTM.
(2.2) as shown in fig. 2, a plurality of layers of STA-LSTM having a space-time memory cell and a space memory cell are stacked as intermediate layers of a prediction model for extracting features of time correlation and space correlation.
The features produced by the trained CNN are input into the prediction unit built by stacking multiple STA-LSTM layers. The input to the first-layer prediction unit consists of the feature $X_t \in R^{B\times C\times H\times W}$ extracted by the CNN (B and C represent the batch size and channels of the feature map), the time memory $C_{t-1}^{l}$, the spatial memory $M_{t}^{l-1}$ and the hidden state $H_{t-1}^{l}$ (all except the input X are zero-initialized at the start; t is the current time step). The first-layer prediction unit extracts the temporal features of the pollutant-concentration, meteorological and land-use data of the predicted area, encodes the data, and generates a new time memory $C_t^{l}$, spatial memory $M_t^{l}$ and hidden state $H_t^{l}$; $H_t^{l}$ and $C_t^{l}$ are appended to the hidden-state and time-memory lists respectively, the current hidden state and spatial memory, together with the historical hidden states and time memories, are taken as input to the next layer, and so on.
The CNN generates $X_t \in R^{B\times C\times H\times W}$; the generated $X_t$, together with the initialized time memory, spatial memory and hidden state, forms the original input of the STA-LSTM. Let $C_{t-1}^{l}$ be the time memory of the same layer at the previous time step, $M_{t}^{l-1}$ the spatial memory of the previous layer, and $H_{t-1}^{l}$ the hidden state of the same layer at the previous time step; $\circ$ is the Hadamard product, $\sigma$ represents the activation function sigmoid, and $*$ represents the convolution operator. The feature-extraction process in the prediction unit STA-LSTM can be expressed by the following formulas:
$$f_t=\sigma\big(W_{xf}*X_t+W_{hf}*H_{t-1}^{l}+b_f\big)$$
$$i_t=\sigma\big(W_{xi}*X_t+W_{hi}*H_{t-1}^{l}+b_i\big)$$
$$g_t=\tanh\big(W_{xg}*X_t+W_{hg}*H_{t-1}^{l}+b_g\big)$$
$$C_t^{l}=f_t\circ C_{t-1}^{l}+i_t\circ g_t$$
$$M_t^{l}=f'_t\circ M_t^{l-1}+i'_t\circ g'_t$$
$$o_t=\sigma\big(W_{xo}*X_t+W_{ho}*H_{t-1}^{l}+b_o\big)$$
$$H_t^{l}=o_t\circ\tanh\big(W_{1\times 1}*[C_t^{l},M_t^{l}]\big)$$
where $W_{hf}$, $W_{hi}$, $W_{ho}$ respectively denote the weight coefficients of $H_{t-1}^{l}$ in the forget gate, input gate and output gate of the feature-extraction process, $W_{xf}$, $W_{xi}$, $W_{xo}$ the corresponding weight coefficients of $X_t$, and $b_f$, $b_i$, $b_o$ the bias values of the forget gate, input gate and output gate; $f'_t$, $i'_t$, $g'_t$ are gates computed from $X_t$ and $M_t^{l-1}$ in the same way; $C_t^{l}$, $M_t^{l}$, $H_t^{l}$ respectively denote the time memory, spatial memory and hidden state of layer $l$ at time t.
As in a standard LSTM, the original gates act on $C_{t-1}^{l}$ while another set of gate structures is constructed in the same way to accommodate $M_t^{l-1}$; the STA-LSTM unit includes both a time memory unit and a space memory unit. The final hidden state $H_t^{l}$ is a fusion based on the spatiotemporal memories. To link memories from different directions together, the STA-LSTM unit uses a shared output gate to handle both types of memory, achieving seamless memory fusion. Furthermore, a 1 x 1 convolution layer is used for dimension reduction so that the hidden state $H_t^{l}$ has the same dimensions as the memory units. Unlike a simple concatenation of memories, this can effectively model the spatial variations and trajectories in the spatiotemporal sequence.
A SAM is added at the top of the prediction unit; the hidden states output by every STA-LSTM layer are combined into the attention keys and values to form the corresponding SAM output, and more lower-layer information is integrated at the top layer to improve the prediction.
Specifically, the SAM process is shown in fig. 3. The original hidden state of the top layer, H_t^L, is convolved and reshaped into a query Q_S ∈ R^{B×C×(H*W)}. Here B and C represent the batch size and the number of channels of the feature map, respectively. The hidden states of the different lower layers at the same time step, H_t^{1:L-1}, likewise generate a key K_S ∈ R^{B×((L-1)*C)×(H*W)} and a value V_S ∈ R^{B×((L-1)*C)×(H*W)}. Finally, the new hidden state Ĥ_t^L is obtained by spatial attention:

Ĥ_t^L = softmax(Q_S · K_S^T) · V_S
After the dimensions of Ĥ_t^L are reshaped back, Ĥ_t^L is added to the original hidden state H_t^L and then normalized by a Layer Normalization layer to form the output of the SAM module.
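The SAM computation just described (attention, residual add, layer normalisation) can be sketched for a single batch element with plain Python lists. All function names are illustrative, and the unscaled softmax(QKᵀ)V form follows the formula as reconstructed above:

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def layer_norm(row, eps=1e-5):
    mu = sum(row) / len(row)
    var = sum((v - mu) ** 2 for v in row) / len(row)
    return [(v - mu) / math.sqrt(var + eps) for v in row]

def sam_step(q, k, v, h_orig):
    """Toy SAM update for one batch element: attn = softmax(Q K^T) V,
    then a residual add with the original top-layer hidden state and a
    per-row layer normalisation.  q is C x (H*W); k and v are
    ((L-1)*C) x (H*W), matching the shapes in the text."""
    scores = matmul(q, transpose(k))          # C x ((L-1)*C)
    attn = [softmax(row) for row in scores]   # row-wise attention weights
    new_h = matmul(attn, v)                   # back to C x (H*W)
    fused = [[a + b for a, b in zip(r1, r2)]  # residual add
             for r1, r2 in zip(new_h, h_orig)]
    return [layer_norm(row) for row in fused]

out = sam_step(q=[[1.0, 0.0]],
               k=[[1.0, 0.0], [0.0, 1.0]],
               v=[[2.0, 0.0], [0.0, 2.0]],
               h_orig=[[0.0, 0.0]])
```

The query aligns more strongly with the first key, so the output is pulled toward the first value row before normalisation.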
When time t reaches the prediction stage k, a temporal attention module (TAM) is embedded in the prediction unit of each layer: the historical feature grid data are concatenated along the channel dimension as the attention keys and values, the current output serves as the query with the channel-wise attention function softmax, and the output of the TAM is used as the input of the SAM (spatial attention module). The TAM process is shown in fig. 4.
The current hidden state H_t^l is passed through a convolution layer and reshaped into a query Q_t ∈ R^{B×C×(H*W)}. Likewise, the historical inputs H_{1:t-1}^l yield a key K_T ∈ R^{B×(t*C)×(H*W)} and a value V_T ∈ R^{B×(t*C)×(H*W)} through two independent convolutions. The new hidden state Ĥ_t^l is then computed by temporal attention:

Ĥ_t^l = softmax(Q_t · K_T^T) · V_T
Finally, Ĥ_t^l is reshaped to the same size as the original hidden state, and the hidden state normalized by a Layer Normalization layer is taken as the output of the TAM module.
(2.3) the last layer uses CNN layer in combination with features extracted by STA-LSTM units for decoding.
(Ĥ_t^l, Ĉ_t^l, M̂_t^l) = f_SL(X_t, H_{t-1}^l, C_{t-1}^l, M_t^{l-1})

Y_{t+1} = f_CNN(Ĥ_t^L)

wherein f_SL is the STA-LSTM function, Ĥ_t^l and H_{t-1}^l are respectively the output of the STA_LSTM at time t and the hidden state of each encoder layer, Ĉ_t^l is the historical memory cell, and Y_{t+1} is the prediction result: finally the CNN layer decodes the features extracted by the STA_LSTM unit to complete the pollutant concentration prediction at time t+1.
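The encode-and-decode flow just described can be sketched per timestep as follows. Here `sta_lstm_stub` merely averages its inputs so the control flow is runnable, and the CNN decoding is stubbed as the identity; both are illustrative assumptions, not the patent's actual gated convolutions.

```python
def sta_lstm_stub(x, h, c, m):
    """Stand-in for the STA-LSTM transition f_SL: returns updated
    hidden state, temporal memory and spatial memory.  A real unit
    applies the gated convolutions described above; this stub just
    averages so the data flow is visible."""
    h_new = [(xi + hi) / 2 for xi, hi in zip(x, h)]
    c_new = [(ci + hi) / 2 for ci, hi in zip(c, h_new)]
    m_new = [(mi + hi) / 2 for mi, hi in zip(m, h_new)]
    return h_new, c_new, m_new

def predict_next(x_t, layers):
    """One decoding step: feed X_t through the stacked units bottom-up,
    then decode the top hidden state (here: identity in place of the
    CNN layer) to produce Y_{t+1}.  `layers` holds (h, c, m) tuples."""
    inp = x_t
    for l, (h, c, m) in enumerate(layers):
        h, c, m = sta_lstm_stub(inp, h, c, m)
        layers[l] = (h, c, m)
        inp = h                      # hidden state feeds the next layer up
    return inp                       # CNN decoding stubbed as identity

layers = [([0.0, 0.0], [0.0, 0.0], [0.0, 0.0]) for _ in range(2)]
y_next = predict_next([1.0, 2.0], layers)
```

Each layer's updated hidden state and memories are kept in `layers` so the next timestep can reuse them, mirroring the hidden-state and memory lists in the text.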
Step 3: and training the deep neural network prediction model by using training data.
In the training process of the STA-ConvLSTM prediction model, a mean square error function is adopted as a loss function of the model, and a calculation formula is as follows:
L = (1 / ((T − A) · H · W)) · Σ_{t=A+1}^{T} Σ_{i=1}^{H} Σ_{j=1}^{W} (Ŷ_t(i, j) − X_t(i, j))²

wherein Y_{A+1:T}, which can also be written Ŷ_{A+1:T}, and X_{A+1:T} are respectively the prediction result data and the real data, H and W are the height and width of the grid data, and (i, j) denotes the position in the grid area.
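A minimal sketch of this grid-wise mean squared error, assuming each frame is an H × W nested list (the function name is illustrative):

```python
def grid_mse(pred_seq, true_seq):
    """Mean squared error over the predicted frames A+1..T, averaged
    over every grid cell (i, j); each frame is an H x W nested list."""
    total, count = 0.0, 0
    for y_hat, y in zip(pred_seq, true_seq):
        for row_hat, row in zip(y_hat, y):
            for p, t in zip(row_hat, row):
                total += (p - t) ** 2
                count += 1
    return total / count

# one 2x2 frame: only the last cell differs, by 2 -> MSE = 4/4
loss = grid_mse([[[1.0, 2.0], [3.0, 4.0]]],
                [[[1.0, 2.0], [3.0, 6.0]]])
```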
Step 4: the concentration of PM2.5 for the target area for the next N hours is predicted by using the trained model.
As shown in fig. 5, the final prediction output is grid data. To obtain the pollutant concentration value of a target grid cell or monitoring station, the row-column coordinates of the target cell must be accurately located from the interpolated station longitude and latitude; the value of that grid cell is then the predicted pollutant concentration for the target station area. The coordinates are obtained from the longitude and latitude as follows:

X_i = ⌊(lon_query − lon_min) / (lon_max − lon_min) × width⌋

Y_j = ⌊(lat_query − lat_min) / (lat_max − lat_min) × height⌋

y = Y_{i,j}

wherein lon_query and lat_query are respectively the longitude and latitude of the query site, lon_max and lat_max are respectively the maximum longitude and latitude of the interpolation, lon_min and lat_min are respectively the minimum longitude and latitude of the interpolation, and height and width represent the height and width of the grid.
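A sketch of this longitude/latitude-to-grid-index mapping. The clamping and the exact rounding convention (truncation over a width−1 span) are assumptions, since the text gives only the linear scaling:

```python
def latlon_to_grid(lon_query, lat_query,
                   lon_min, lon_max, lat_min, lat_max,
                   width, height):
    """Map a station's longitude/latitude into the (row, column) of the
    interpolated grid by linear scaling between the interpolation
    extremes, as the formulas above do."""
    col = int((lon_query - lon_min) / (lon_max - lon_min) * (width - 1))
    row = int((lat_query - lat_min) / (lat_max - lat_min) * (height - 1))
    # clamp in case the query site sits exactly on the grid boundary
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    return row, col

# a site halfway across a 100x100 grid spanning lon 118-120, lat 33-34
r, c = latlon_to_grid(119.0, 33.5, 118.0, 120.0, 33.0, 34.0, 100, 100)
```

The returned pair indexes the predicted concentration grid directly, e.g. `grid[r][c]`.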
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explanation of the principles of the present invention and are in no way limiting of the invention. Accordingly, any modification, equivalent replacement, improvement, etc. made without departing from the spirit and scope of the present invention should be included in the scope of the present invention. Furthermore, the appended claims are intended to cover all such changes and modifications that fall within the scope and boundary of the appended claims, or equivalents of such scope and boundary.

Claims (9)

1. The full coverage prediction method of PM2.5 based on the deep neural network is characterized by comprising the following steps of:
(1) Preprocessing the pre-acquired air pollution concentration, meteorological data and land utilization data, and dividing the pre-acquired air pollution concentration, meteorological data and land utilization data into training data and test data;
(2) Constructing a deep neural network prediction model STA-ConvLSTM for predicting the pollutant concentration of a target area; the STA-ConvLSTM takes a CNN network as a bottom layer and is used for processing input data to extract spatial features and extracting spatial correlation of grid data through convolution; taking stacked layers of STA-LSTM with space-time memory units and space memory units as intermediate layers of a prediction model for extracting features of time correlation and space correlation; the last layer uses CNN layer to combine with the feature extracted by STA-LSTM unit to decode;
(3) Training a deep neural network prediction model STA-ConvLSTM by using training data;
(4) The concentration of PM2.5 for the target area for the next N hours is predicted by using the trained model.
2. The full coverage prediction method of PM2.5 based on deep neural network according to claim 1, wherein the implementation process of step (1) is as follows:
interpolation filling is carried out on the acquired multi-source heterogeneous data by the kriging spatial interpolation method according to the longitude and latitude of the data sites; the data are interpolated onto a 100 × 100 grid to generate the original feature matrix; the semi-variogram is fitted with an exponential model; the interpolation result can be expressed as:

Z(x) = Σ_{i=1}^{n} λ_i(α, h_i) · Z(x_i) + e

wherein Z(x) represents the spatial interpolation at the unknown point, Z(x_i) represents the data value of the known point, λ_i are the kriging weights, h_i represents the Euclidean distance between the known point x_i and the unknown point x, α is the correlation-length parameter of the semi-variogram controlling the rate at which the spatial autocorrelation decays with increasing distance, n is the number of known points, and e represents the error term;
the original feature matrix is up-dimensioned by increasing the number of output channels with a 1 × 1 convolution kernel and filters, so that information from different channels can interact and fuse; the input data are then aggregated into a two-dimensional feature map with a 1 × 1 convolution kernel so as to be compatible with the input of the subsequent model, and the data are normalized.
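As a non-limiting illustration of the interpolation step above, the sketch below replaces the full ordinary-kriging weight solve with a simple exponential distance weighting in the spirit of the exponential semi-variogram; the function name and the weight normalisation are assumptions, not the claimed method.

```python
import math

def exp_weight_interpolate(x, known_points, alpha):
    """Estimate the value at an unknown grid point x as a weighted
    average of known station values, with weights decaying as
    exp(-h/alpha) where h is the Euclidean distance h_i and alpha the
    correlation-length parameter.  A full kriging system solve is
    deliberately omitted from this sketch."""
    weights, values = [], []
    for (xi, yi, zi) in known_points:
        h = math.hypot(x[0] - xi, x[1] - yi)   # Euclidean distance h_i
        weights.append(math.exp(-h / alpha))
        values.append(zi)
    s = sum(weights)
    return sum(w * z for w, z in zip(weights, values)) / s

# two stations; the nearer one (distance 1) dominates the estimate
z = exp_weight_interpolate((0.0, 0.0),
                           [(1.0, 0.0, 10.0), (3.0, 0.0, 30.0)],
                           alpha=1.0)
```

Smaller `alpha` makes the autocorrelation decay faster with distance, concentrating weight on the nearest stations.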
3. The deep neural network-based full coverage prediction method for PM2.5 according to claim 1, wherein the feature implementation process of extracting the temporal correlation and the spatial correlation is as follows:
s1: the spatial features extracted by the CNN network are input into a prediction unit based on stacked multi-layer STA_LSTM; the input of the first-layer prediction unit is the feature X_t ∈ R^{B×C×H×W} extracted by the CNN network, where B and C represent the batch size and the number of channels of the feature map; the temporal memory C_{t-1}^l, the spatial memory M_t^{l-1} and the hidden state H_{t-1}^l are initialized, l being the layer index of the current prediction unit and t the current moment; the first-layer prediction unit extracts the time-series features of the pollutant concentration, weather and land-use data of the predicted area to encode the data and generate a new temporal memory C_t^l, spatial memory M_t^l and hidden state H_t^l; H_t^l and C_t^l are respectively added to the hidden-state and temporal-memory lists, and the current hidden state and spatial memory, together with the historical hidden states and temporal memories, are taken as the input of the next layer;
s2: adding SAM at the top layer of the prediction unit, merging the hidden state of each layer output of STA_LSTM into the key and the value of attention, using the current output as the query, forming the corresponding SAM output, and realizing that more information of the lower layer is integrated at the top layer to improve the prediction;
s3: when the time t reaches the prediction stage k, a TAM is embedded in the prediction unit of each layer, the historical feature grid data is connected in the channel size as a key and a value of attention, the current output is used as a query by using a channel attention function softmax, and the current output of the TAM and the historical hidden state are used as the query and input of the SAM.
4. The method for full coverage prediction of PM2.5 based on a neural network according to claim 1, wherein in step (2) the feature extracted by using the CNN layer in combination with the sta_lstm unit is decoded, which comprises:
(Ĥ_t^l, Ĉ_t^l, M̂_t^l) = f_SL(X_t, H_{t-1}^l, C_{t-1}^l, M_t^{l-1})

Y_{t+1} = f_CNN(Ĥ_t^L)

wherein f_SL is the STA-LSTM function, Ĥ_t^l and H_{t-1}^l are respectively the output of the STA-LSTM at time t and the hidden state of each encoder layer, Ĉ_t^l is the historical memory cell, and Y_{t+1} is the prediction result; finally the CNN layer decodes the features extracted by the STA-LSTM unit to complete the pollutant concentration prediction at time t+1.
5. The method for predicting full coverage of PM2.5 based on deep neural network according to claim 3, wherein the step S1 is implemented as follows:
the spatial feature X_t ∈ R^{B×C×H×W} extracted by the CNN network layer and the initialized temporal memory, spatial memory and hidden state are taken as the original input of the STA-LSTM; let C_{t-1}^l be the temporal memory of the same layer at the previous time, M_t^{l-1} the spatial memory of the previous layer at the current time, and H_{t-1}^l the hidden state of the same layer at the previous time; ∘ denotes the Hadamard product, σ the sigmoid activation function and * the convolution operator; the feature extraction process in the prediction unit STA-LSTM is as follows:

f_t = σ(W_xf * X_t + W_hf * H_{t-1}^l + b_f)

i_t = σ(W_xi * X_t + W_hi * H_{t-1}^l + b_i)

g_t = tanh(W_xg * X_t + W_hg * H_{t-1}^l + b_g)

C_t^l = f_t ∘ C_{t-1}^l + i_t ∘ g_t

M_t^l = f_t′ ∘ M_t^{l-1} + i_t′ ∘ g_t′

o_t = σ(W_xo * X_t + W_ho * H_{t-1}^l + W_co ∘ C_t^l + W_mo ∘ M_t^l + b_o)

H_t^l = o_t ∘ tanh(W_{1×1} * [C_t^l, M_t^l])

where the spatial gates f_t′, i_t′ and g_t′ are computed from X_t and M_t^{l-1} in the same way as f_t, i_t and g_t are computed from X_t and H_{t-1}^l;
wherein W_hf, W_hi and W_ho respectively represent the weight coefficients of the forget gate, the input gate and the output gate with respect to H_{t-1}^l in the feature extraction process; W_xf, W_xi and W_xo respectively represent the corresponding weight coefficients with respect to X_t; b_f, b_i and b_o respectively represent the bias terms of the forget gate, the input gate and the output gate in the feature extraction process; and C_t^l, M_t^l and H_t^l respectively represent the temporal memory, the spatial memory and the hidden state of the l-th layer at time t;
in the standard LSTM, only the original gates act on the temporal memory C_t^l; another set of gate structures is constructed in the same way for the spatial memory M_t^l, so that the STA-LSTM unit contains both a temporal memory cell and a spatial memory cell; the final hidden state H_t^l is a fusion of the two; to link memories from different directions, the STA-LSTM unit handles both memory types with a shared output gate, achieving seamless memory fusion; in addition, a 1 × 1 convolution layer performs dimensionality reduction so that the hidden state H_t^l has the same dimensions as the memory cells.
6. The method for predicting full coverage of PM2.5 based on deep neural network according to claim 3, wherein said step S2 is implemented as follows:
the original hidden state of the top layer, H_t^L, is convolved and reshaped into a query Q_S ∈ R^{B×C×(H*W)}; B and C represent the batch size and the number of channels of the feature map, respectively; the hidden states of the different lower layers at the same time step, H_t^{1:L-1}, likewise generate a key K_S ∈ R^{B×((L-1)*C)×(H*W)} and a value V_S ∈ R^{B×((L-1)*C)×(H*W)}; the new hidden state Ĥ_t^L is obtained by:

Ĥ_t^L = softmax(Q_S · K_S^T) · V_S

after the dimensions of Ĥ_t^L are reshaped back, Ĥ_t^L is added to the original hidden state H_t^L and then normalized by a Layer Normalization layer to form the output of the SAM module.
7. The method for predicting full coverage of PM2.5 based on deep neural network according to claim 3, wherein said step S3 is implemented as follows:
the current hidden state H_t^l is passed through a convolution layer and reshaped into a query Q_t ∈ R^{B×C×(H*W)}; likewise, the historical inputs H_{1:t-1}^l yield a key K_T ∈ R^{B×(t*C)×(H*W)} and a value V_T ∈ R^{B×(t*C)×(H*W)} through two independent convolutions; the new hidden state Ĥ_t^l is then computed by temporal attention:

Ĥ_t^l = softmax(Q_t · K_T^T) · V_T

finally, Ĥ_t^l is reshaped to the same size as the original hidden state, and the hidden state normalized by a Layer Normalization layer is taken as the output of the TAM module.
8. The full coverage prediction method of PM2.5 based on deep neural network according to claim 1, wherein the implementation procedure of step (3) is as follows:
the mean square error function is used as a loss function of the model:
L = (1 / ((T − A) · H · W)) · Σ_{t=A+1}^{T} Σ_{i=1}^{H} Σ_{j=1}^{W} (Ŷ_t(i, j) − X_t(i, j))²

wherein Ŷ_{A+1:T} and X_{A+1:T} are respectively the prediction result data and the real data, H and W are the height and width of the grid data, and (i, j) denotes the position in the grid area.
9. The full coverage prediction method of PM2.5 based on deep neural network according to claim 1, wherein the implementation procedure of step (4) is as follows:
the prediction output is grid data; to obtain the pollutant concentration value of a target area grid cell, i.e. the predicted pollutant concentration of a certain target site area, the row-column coordinates of the target grid cell are accurately located according to the interpolated site longitude and latitude coordinates; the coordinates are obtained from the longitude and latitude as follows:

X_i = ⌊(lon_query − lon_min) / (lon_max − lon_min) × width⌋

Y_j = ⌊(lat_query − lat_min) / (lat_max − lat_min) × height⌋

y = Y_{i,j}

wherein lon_query and lat_query are respectively the longitude and latitude of the query site, lon_max and lat_max are respectively the maximum longitude and latitude of the interpolation, lon_min and lat_min are respectively the minimum longitude and latitude of the interpolation, and height and width represent the height and width of the grid.
CN202310423066.0A 2023-04-19 2023-04-19 PM2.5 full-coverage prediction method based on deep neural network Pending CN116432850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310423066.0A CN116432850A (en) 2023-04-19 2023-04-19 PM2.5 full-coverage prediction method based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310423066.0A CN116432850A (en) 2023-04-19 2023-04-19 PM2.5 full-coverage prediction method based on deep neural network

Publications (1)

Publication Number Publication Date
CN116432850A true CN116432850A (en) 2023-07-14

Family

ID=87083036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310423066.0A Pending CN116432850A (en) 2023-04-19 2023-04-19 PM2.5 full-coverage prediction method based on deep neural network

Country Status (1)

Country Link
CN (1) CN116432850A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117420052A (en) * 2023-10-09 2024-01-19 江苏海洋大学 PM2.5 prediction method integrating multi-scale space-time information


Similar Documents

Publication Publication Date Title
CN111612066B (en) Remote sensing image classification method based on depth fusion convolutional neural network
CN109492822B (en) Air pollutant concentration time-space domain correlation prediction method
CN109887282B (en) Road network traffic flow prediction method based on hierarchical timing diagram convolutional network
CN111223301B (en) Traffic flow prediction method based on graph attention convolution network
CN109508360B (en) Geographical multivariate stream data space-time autocorrelation analysis method based on cellular automaton
CN109117987B (en) Personalized traffic accident risk prediction recommendation method based on deep learning
CN110766942B (en) Traffic network congestion prediction method based on convolution long-term and short-term memory network
CN111127888A (en) Urban traffic flow prediction method based on multi-source data fusion
CN109299401A (en) Metropolitan area space-time stream Predicting Technique based on deep learning model LSTM-ResNet
CN111523706B (en) Section lane-level short-term traffic flow prediction method based on deep learning combination model
CN110322453A (en) 3D point cloud semantic segmentation method based on position attention and auxiliary network
CN109829495A (en) Timing image prediction method based on LSTM and DCGAN
CN110570035B (en) People flow prediction system for simultaneously modeling space-time dependency and daily flow dependency
CN115827335B (en) Time sequence data missing interpolation system and time sequence data missing interpolation method based on modal crossing method
CN110097028A (en) Crowd's accident detection method of network is generated based on three-dimensional pyramid diagram picture
CN116432850A (en) PM2.5 full-coverage prediction method based on deep neural network
CN115392554A (en) Track passenger flow prediction method based on depth map neural network and environment fusion
CN112700104A (en) Earthquake region landslide susceptibility evaluation method based on multi-modal classification
CN116844041A (en) Cultivated land extraction method based on bidirectional convolution time self-attention mechanism
CN114819386A (en) Conv-Transformer-based flood forecasting method
CN117494034A (en) Air quality prediction method based on traffic congestion index and multi-source data fusion
CN116596151A (en) Traffic flow prediction method and computing device based on time-space diagram attention
CN114742206A (en) Rainfall intensity estimation method for comprehensive multi-space-time scale Doppler radar data
CN113935458A (en) Air pollution multi-site combined prediction method based on convolution self-coding deep learning
CN118038021A (en) Transformer substation operation site foreign matter intrusion detection method based on improvement yolov4

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination