CN117371571A - Regional air quality prediction model based on multi-scale dynamic synchronous diagram mechanism - Google Patents

Regional air quality prediction model based on multi-scale dynamic synchronous diagram mechanism

Info

Publication number
CN117371571A
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN202311126538.2A
Other languages
Chinese (zh)
Inventor
陈晓霞
夏汉忠
刘承硕
王振
胡悦
Current Assignee
Ningbo University
Original Assignee
Ningbo University
Priority date
Filing date
Publication date
Application filed by Ningbo University
Priority to CN202311126538.2A
Publication of CN117371571A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 15/00: Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N 15/06: Investigating concentration of particle suspensions
    • G01N 33/00: Investigating or analysing materials by specific methods not covered by groups G01N 1/00 - G01N 31/00
    • G01N 33/0004: Gaseous mixtures, e.g. polluted air
    • G01N 33/0009: General constructional details of gas analysers, e.g. portable test equipment
    • G01N 33/0062: General constructional details of gas analysers concerning the measuring method or the display, e.g. intermittent measurement or digital display
    • G01N 33/0068: General constructional details of gas analysers concerning the measuring method or the display, using a computer specifically programmed
    • G01W: METEOROLOGY
    • G01W 1/00: Meteorology
    • G01W 1/02: Instruments for indicating weather conditions by measuring two or more variables, e.g. humidity, pressure, temperature, cloud cover or wind speed
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/10: Pre-processing; Data cleansing
    • G06F 18/15: Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/25: Fusion techniques
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442: Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/0499: Feedforward networks
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"


Abstract

The invention effectively encodes the complex spatio-temporal correlations between monitoring stations through a unique graph construction scheme that combines a dynamic graph construction with a multi-scale synchronous graph. Multi-scale spatio-temporal correlations are extracted in parallel by a multi-scale spatio-temporal synchronous graph convolution component composed of GCNs. Finally, an encoder-decoder structure attends to the long-term effects of the auxiliary features and the short-term spatio-temporal effects. Compared with existing popular data-driven air quality prediction methods, the method of the present invention achieves superior performance in both long-term and short-term regional air quality prediction.

Description

Regional air quality prediction model based on multi-scale dynamic synchronous diagram mechanism
Technical Field
The invention relates to a data-driven regional air quality prediction model, in particular to a regional air quality prediction method based on a multi-scale dynamic synchronous diagram mechanism.
Background
The smart city is a specific application of the Internet of Things: sensors deployed across a city make urban services such as city management, traffic and public safety more efficient, and the regional air quality prediction system is a management system for regional air quality under the smart-city framework. Research on regional air quality prediction technology enables better prevention and control of the air pollution problem, and under its guidance the relevant authorities can implement more effective measures and promote related legislation. A regional air quality prediction system helps the authorities manage regional air quality by extracting features from the historical measurements and auxiliary features (e.g., meteorological conditions, wind direction) of urban monitoring sites to predict future air quality conditions.
Some recent studies treat regional air quality prediction as a prediction problem on a spatio-temporal graph, where the monitoring sites within a region are regarded as graph nodes. Each node in the spatio-temporal graph can affect its neighbouring nodes within the same time step (spatial correlation), and can still affect them in subsequent time steps (temporal correlation). In addition, a node may affect its neighbouring nodes at the next time step (short cross correlation) and may also affect the neighbours of its neighbours (long cross correlation). These different types of correlations are collectively called multi-scale spatio-temporal correlations (multiscale spatio-temporal correlations, abbreviated MSTCs), and MSTCs also fluctuate dynamically with changes in auxiliary features such as meteorological conditions and wind speed.
Some studies have achieved good predictive performance by modelling auxiliary features with deep learning methods: convolutional neural networks and recurrent neural networks capture the influence of auxiliary features on air quality, and recurrent neural networks with multi-layer attention consider the readings and spatial data of multiple sensors simultaneously, obtaining good regional air quality prediction performance. However, these methods focus on the influence of auxiliary features and cannot accurately capture the spatial dependence among sites. Other research studies the air quality prediction problem with graph neural networks, extracting the spatio-temporal correlation within time steps through graph convolution and then producing the regional air quality prediction with a recurrent neural network. Good predictions are also obtained, but these approaches model the temporal and spatial correlations separately with different components, making it significantly difficult to capture inter-site cross correlations.
Accurate prediction of regional air quality relies heavily on modelling MSTCs. Previous work can capture spatial and temporal correlations independently, but has difficulty capturing cross correlations at the same time; even when spatio-temporal correlations are captured, the dynamic correlations are not. Capturing MSTCs while taking dynamic factors into account would therefore significantly improve regional air quality prediction performance.
Disclosure of Invention
The main technical problem to be solved by the invention is as follows: a construction method for a multi-scale dynamic synchronous graph is designed, deriving the graph structure from the dynamic and the synchronous perspective respectively. Specifically, the method of the invention provides a multi-scale spatio-temporal synchronous graph convolution structure based on stacked residual graph convolution layers, which captures MSTCs from the multi-scale dynamic synchronous graph; the long-term influence of the auxiliary features and the short-term influence of the multi-scale spatio-temporal representation are then captured effectively and dynamically through a synchronous graph attention mechanism; finally, regional air quality is predicted using the extracted features and the model.
The technical scheme adopted for solving the technical problems is as follows: a regional air quality prediction method based on a multiscale dynamic synchronous diagram mechanism comprises the following steps:
Step (1): collecting data, collecting the data to be testedUrban N regional site historical air pollution data, and sites in a target urban region are expressed as a group of V= { V 1 ,v 2 ,...,v N }. Continuously collecting observation results and meteorological factor data of air pollutant particles at fixed intervals, wherein the air pollutant data comprises sulfur dioxide, nitrogen dioxide, ozone, carbon monoxide and fine particulate matter PM 2.5 And PM 10 The method comprises the steps of carrying out a first treatment on the surface of the The meteorological factor data comprise wind speed, precipitation, temperature, dew point and air pressure. The number of points of interest of the city sites is collected, and the number of different points of interest in 5KM of each site is collected.
Step (2): the original data is preprocessed. Collecting data containing missing or unclean data, detecting abnormal values by using Laida criteria and removing the abnormal values; filling missing values and outliers through linear interpolation, then adopting Z-score to normalize to promote convergence of a model, and finally carrying out time step division and input-output definition. The specific implementation process is as described in the steps (2.1) to (2.5)
Step (2.1): abnormal value detection and elimination are carried out on the data by using a Laida criterion method, and the independently obtained measurement data are subjected toCalculate the arithmetic mean mu and the residual error +. >And calculate the standard deviation sigma, if a certain measured value x i Is the residual error v of (2) i (1.ltoreq.i.ltoreq.n) satisfying the following formula:
then consider asShould be rejected.
Step (2.2): and filling the acquired air pollution data and the missing values of the meteorological factor data in the time dimension by adopting a linear interpolation method.
Step (2.3): normalization is carried out by using a Z-score mode, and air pollution data and meteorological factor data are normalized in a time dimension, and the formula is as follows:
wherein the method comprises the steps ofRepresenting the measured data, σ is the standard deviation of the measured data, μ is the mean value of the measured data, ++>Is normalized data.
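A minimal preprocessing sketch covering steps (2.1) to (2.3) is given below. The pandas-based pipeline, the per-column treatment and the recomputation of statistics after cleaning are illustrative assumptions rather than the exact implementation of the invention.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative sketch of steps (2.1)-(2.3): 3-sigma outlier removal,
    linear interpolation of gaps, and Z-score normalization per column."""
    out = df.copy()
    for col in out.columns:
        mu, sigma = out[col].mean(), out[col].std()
        # Step (2.1): Laida (3-sigma) criterion -- mark outliers as missing.
        out.loc[(out[col] - mu).abs() > 3 * sigma, col] = np.nan
        # Step (2.2): fill missing values along the time dimension.
        out[col] = out[col].interpolate(method="linear", limit_direction="both")
        # Step (2.3): Z-score normalization (statistics recomputed after cleaning).
        out[col] = (out[col] - out[col].mean()) / out[col].std()
    return out
```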
Step (2.4): the data set is partitioned. The obtained air quality and meteorological data are divided into a training set, a testing set and a verification set.
Step (2.5): dividing the time steps, defining an initial input and a target output. Taking representative pollutant PM in air pollutants 2.5 Concentration as target pollutant, setPM representing sites within a target city area at time step t 2.5 Concentration observation history, where F is the air contaminant particle characteristic number. Taking as input the observation history value with a time window length of T time steps, denoted as x= { X 1 ,x 2 ,...,x T }. The air contaminant particle concentration of the zone site V for the future τ time step is defined as the set of sequences y= { Y T+1 ,y T+2 ,...y T+τ Output as target, wherein ∈ ->Meteorological characteristics may be defined as +.>Wherein F is M Is the characteristic quantity of the meteorological characteristics. The point of interest feature may be defined as +.>Wherein F is P Is the number of features of the point of interest feature.
Step (3): and generating a multi-scale dynamic synchronous diagram. A predefined graph with geographic prior knowledge is first generated from latitude and longitude information between sites. The accuracy of predicting the target air contaminant particle concentration is then further improved by fluctuations in the node signals and introducing spatial attention mechanisms and mask matrices. The spatial attention mechanism assigns weights in the node dimensions that enable accurate capture of dynamic spatial information to generate a node matrix. And constructing a dynamic graph adjacent matrix by utilizing the node matrix, multiplying the dynamic graph adjacent matrix by the mask matrix to enable the dynamic graph adjacent matrix to follow the structure of the predefined graph, thereby capturing the dynamic time-space correlation, and finally adding the generated dynamic graph adjacent matrix and the predefined graph adjacent matrix to obtain the prior knowledge of geography. And then constructing a multi-scale dynamic synchronous graph, capturing different MSTC by connecting space-time neighbor nodes of adjacent time steps, and generating the dynamic synchronous graph with different scales. And finally, taking the adjacency matrix of the generated synchronous graphs with different scales as a part of model output. The specific implementation process is as described in the steps (3.1) to (3.3).
Step (3.1): following the principle of "the closer two sites are, the more geographically relevant they are". Constructing a predefined graph G by using the geographical longitude and latitude of a site Pre =(V Pre ,E Pre ,A Pre ) Wherein V is Pre Representing the total number of vertices in the graph, E Pre The total number of edges in the graph is represented,representing an adjacency matrix of the graph, wherein the generation formula of the weight values in the adjacency matrix is as follows:
wherein dist (v) i ,v j ) Representing v i And v j Euclidean distance between, ψ 2 The parameters of the distribution are controlled for a gaussian kernel function,is the set Euclidean distance threshold.
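The thresholded Gaussian-kernel construction of the predefined adjacency matrix in step (3.1) can be sketched as follows; treating the coordinate pairs directly as Euclidean coordinates follows the description above, and the parameter names are illustrative.

```python
import numpy as np

def predefined_adjacency(coords: np.ndarray, psi: float, eps: float) -> np.ndarray:
    """coords: (N, 2) array of site coordinates (e.g. longitude, latitude).
    Returns A_Pre with Gaussian-kernel weights, zeroed beyond the distance threshold eps."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # pairwise Euclidean distances
    a_pre = np.exp(-(dist ** 2) / (psi ** 2))   # Gaussian kernel: closer sites get larger weights
    a_pre[dist > eps] = 0.0                     # apply the Euclidean distance threshold
    np.fill_diagonal(a_pre, 0.0)                # no self-loops in the predefined graph
    return a_pre
```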
Step (3.2): stacked one-dimensional convolution, spatial attention mechanisms, and metric learning are introduced to construct a dynamic graph. The specific implementation process is as described in the steps (3.2.1) to (3.2.5).
Step (3.2.1): the input node signal X is first projected into a latent space through a fully connected network:

H_node = FC(X) ∈ R^(T×N×d_model),

where H_node denotes the projected node signal and d_model denotes the dimension of the latent space, which is also used as the common dimension of the proposed model.
Step (3.2.2): h was determined using the aggregation function AGG (. Cndot.) node The time dimension of (2) is subjected to aggregation dimension reduction, and the process formula is as follows:
where d' represents the aggregated dimension, AGG (·) represents the aggregate function, consisting of stacked one-dimensional convolution operations, the time dimension can be reduced to 1, which is specific to the result M after each convolution operation i,f' The specific polymerization formula is as follows:
wherein, represents the cross operation relation in convolution, H :,i,f' Time information representing the f' th entry in the input i-th node,M i,f' Represents the f' th channel of the output, W f',f Is a trainable parameter for the model from the f' th channel to the f th channel.
Step (3.2.3): each node is dynamically allocated with different weights by adopting a spatial attention mechanism, and the allocation weight formula is as follows:
wherein LeakyReLU (·) is an activation function, the data is non-linearly transformed, FC () represents a fully connected layer with the activation function,is->Dimension, ζ i Representing node v i Is>Is an inner product operation. Notably, the operation of the attention mechanism is operated through the node dimension. />Representing the calculated node v i And v j Spatial correlation between->Representing spatial attention scores after softmax manipulation.
Step (3.2.4): further, a metric learning method is adopted to generate the correlation between nodes by learning a metric function phi () expressed by paired nodes, and the specific formula is as follows:
wherein the method comprises the steps ofRepresenting learned node v i And node v j Dynamic spatial correlation between. Next, the generated Δa is multiplied by the mask matrix mask so that its generated graph adjacency matrix can be similar to the adjacency matrix of the predefined graph. Then generating an adjacency matrix of the dynamic graph in a normalization mode, wherein the adjacency matrix comprises the following specific procedures:
ΔA=Norm(ΔA⊙mask)
where mask is a mask matrix whose entries are 1 at the positions corresponding to the non-zero entries of the predefined graph adjacency matrix and 0 elsewhere, ⊙ denotes element-wise multiplication of matrix entries at the same positions, and Norm(·) denotes the normalization operation.
Step (3.2.5): the predefined graph adjacency matrix A to be generated Pre And adding the adjacent matrixes of the dynamic diagrams, performing nonlinear change through an activation function, and finally normalizing to obtain an adjacent matrix A of the new structural diagram, wherein the specific process formula is as follows:
A=Norm(LeakyReLU(A Pre +ΔA))
wherein, leakyReLU (·) is the activation function.
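A condensed PyTorch-style sketch of the dynamic graph construction of step (3.2) follows. The module layout is an assumption, and the softmax attention scores are used directly in place of the learned metric function φ(·) for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGraph(nn.Module):
    """Sketch of step (3.2): project node signals, aggregate the time axis,
    score node pairs along the node dimension, mask with the predefined
    structure, and fuse with the predefined adjacency matrix."""

    def __init__(self, in_feats: int, d_model: int, T: int, d_agg: int):
        super().__init__()
        self.proj = nn.Linear(in_feats, d_model)   # step (3.2.1): latent projection
        self.agg = nn.Conv1d(T, 1, kernel_size=1)  # step (3.2.2): collapse the time axis
        self.q = nn.Linear(d_model, d_agg)
        self.k = nn.Linear(d_model, d_agg)
        self.d_agg = d_agg

    def forward(self, x: torch.Tensor, a_pre: torch.Tensor) -> torch.Tensor:
        # x: (T, N, F) node signals; a_pre: (N, N) predefined adjacency.
        h = self.proj(x)                              # (T, N, d_model)
        xi = self.agg(h.permute(1, 0, 2)).squeeze(1)  # (N, d_model) aggregated node signal
        # Step (3.2.3): scaled dot-product attention along the node dimension.
        e = F.leaky_relu(self.q(xi) @ self.k(xi).t() / self.d_agg ** 0.5)
        delta_a = torch.softmax(e, dim=-1)            # stands in for the learned metric phi(.)
        # Step (3.2.4): keep only edges present in the predefined graph, then normalize.
        mask = (a_pre > 0).float()
        delta_a = F.normalize(delta_a * mask, p=1, dim=-1)
        # Step (3.2.5): fuse with the geographic prior knowledge.
        return F.normalize(F.leaky_relu(a_pre + delta_a), p=1, dim=-1)
```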
Step (3.3): the generated graph adjacency matrix A is used for generating dynamic synchronous graph adjacency matrices with different scales. To capture the time correlation, A 'is constructed by connecting all nodes to itself at adjacent time steps' t . Creating a 'by connecting all nodes with their respective neighbors in successive time steps' st To capture short cross-correlations. By connecting all nodes with their 1-hop neighbors in successive time stepsJian A' lst To capture long cross-correlations. The constructed graph may implicitly carry spatial correlation due to its inherent nature. The new weights of the edges in the adjacent matrix of the dynamic synchronous diagram with three different scales are aggregated based on the weights of the edges of the diagram on the original time step, and the weight calculation formula is as follows:
Wherein v is i ,v j Is a node in the original graph, v k Representing nodes of the graph at other time steps,representing v in the original graph j Is>Weights representing the original graph, +.>Representation and v j Number of adjacent nodes. After the operation, the adjacency matrix { A 'of the multiscale dynamic synchronous diagram can be obtained' t ,A′ st ,A′ lst As part of the subsequent model input.
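Below is a small sketch, under simplifying assumptions, of how the three synchronous adjacency matrices spanning three consecutive time steps could be assembled from A. The 3N×3N block layout, the use of A inside each time step, and the neighbour-of-neighbour construction for the long scale are illustrative choices, not the authoritative definition.

```python
import numpy as np

def synchronous_graphs(a: np.ndarray):
    """a: (N, N) dynamic adjacency at the original time step.
    Returns (A'_t, A'_st, A'_lst), each of shape (3N, 3N) spanning 3 time steps."""
    n = a.shape[0]
    deg = np.maximum((a > 0).sum(axis=0), 1)                  # |N(v_j)| per node
    eye = np.eye(n)                                           # node -> itself at adjacent steps
    cross_short = a / deg[None, :]                            # node -> its neighbours, degree-normalized
    cross_long = ((a > 0).astype(float) @ a) / deg[None, :]   # node -> neighbours of neighbours

    def tile(cross: np.ndarray) -> np.ndarray:
        # Keep A inside every time step and `cross` between adjacent time steps.
        g = np.zeros((3 * n, 3 * n))
        for s in range(3):
            g[s * n:(s + 1) * n, s * n:(s + 1) * n] = a
        for s in range(2):
            g[s * n:(s + 1) * n, (s + 1) * n:(s + 2) * n] = cross
            g[(s + 1) * n:(s + 2) * n, s * n:(s + 1) * n] = cross.T
        return g

    return tile(eye), tile(cross_short), tile(cross_long)
```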
Step (4): model building and input and output. The model includes a multi-scale spatio-temporal synchronization map convolution component module (Multiscale dynamic synchronous graph convolution component, abbreviated MSTS-GCC), an auxiliary function embedding module (Auxiliary feature embedding, abbreviated AFE), and a codec-Decoder module (Encoder-Decoder, abbreviated EDs).
The MSTS-GCC is composed of a series of spatio-temporal synchronous blocks (STS blocks), each consisting of three spatio-temporal blocks (ST blocks) corresponding to the different scales of the dynamic synchronous graph; an ST block is composed of stacked graph convolutional networks (Graph Convolutional Network, GCN) and a pooling operation, thereby extracting the multi-scale spatio-temporal representation and helping the model capture deeper spatial information.
The AFE converts the auxiliary features (meteorological features, temporal features and points of interest) that have direct or indirect influence on the region into an embedded matrix for model training, so that the prediction result of the model is more accurate.
The EDs module is made up of five subcomponents: synchronous graph attention (Synchronous graph attention, abbreviated SGA), temporal attention (Temporal attention, abbreviated TA), encoder-decoder attention (Encoder-Decoder attention, abbreviated EDA), a fusion layer, and a feedforward neural network (Feedforward Neural Network, abbreviated FFN). The SGA is mainly responsible for dynamically assigning site weights between different time steps based on the extracted multi-scale spatio-temporal representation. The TA weights the different time steps of the same station based on the auxiliary feature embedding. The EDA is responsible for fusing and re-weighting the auxiliary features of the future time steps with the encoder output. In other words, SGA relates to the spatial dimension, TA relates to the temporal dimension, and EDA is responsible for the interface between encoder and decoder. The specific implementation process is shown in steps (4.1) to (4.3).
Step (4.1): the multi-scale spatio-temporal representation is extracted using the MSTS-GCC. The MSTS-GCC module consists of a number of parallel STS blocks, and each STS block consists of three parallel ST blocks corresponding to the generated dynamic synchronous graphs of the three scales. An ST block consists of stacked GCN layers whose outputs are finally aggregated by a pooling layer. An STS block can capture the MSTCs of a single time step through its ST blocks of different scales, and the MSTS-GCC captures the MSTCs of the entire time window through multiple parallel STS blocks. The specific implementation process is shown in steps (4.1.1) to (4.1.3).
Step (4.1.1): a single scale spatio-temporal representation is obtained by the ST block. The ST block consists of stacked GCN layers, which are finally subjected to an aggregation operation by a pooling layer. The GCN layer may cause each node to aggregate its characteristics with neighboring nodes at neighboring time steps. The formula of the GCN is shown below:
H^(k) = GCN(A′, H^(k−1)) = GLU((A′H^(k−1)W_a + b_1) ⊙ sigmoid(A′H^(k−1)W_b + b_2)),

where H^(k) denotes the output of the k-th GCN layer, H^(0) is the input signal sequence of the first GCN layer, A′ denotes the adjacency matrix of the constructed single-scale dynamic synchronous graph, W_a, W_b, b_1 and b_2 are trainable parameters, and GLU(·) and sigmoid(·) are activation functions.
Thus, deep spatio-temporal information {H^(0), H^(1), ..., H^(K)} can be obtained by stacking K GCN layers. Since the GCN uses the dynamic synchronous graph as part of its input, it aggregates features from the previous and the next time step, so the output of the GCN layers is noisy. An average pooling operation AvgPooling(·) is used to filter this noise; it applies an element-wise averaging over the outputs of all GCN layers in the ST block, compressing {H^(0), H^(1), ..., H^(K)} into H_agg:

H_agg = AvgPooling(H^(0), H^(1), ..., H^(K)) = (1/(K+1)) Σ_{k=0..K} H^(k).

Further, a cropping operation is applied to H_agg to keep the information of the intermediate time step, generating the output H′ of the ST block, i.e. the single-scale spatio-temporal representation:

H′ = Crop(H_agg),

where Crop(·) retains the slice of H_agg corresponding to the middle time step of the window.
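A PyTorch-style sketch of the ST block of step (4.1.1) follows, assuming a window of three time steps stacked into 3N synchronous-graph nodes; the layer sizes and the gating arrangement are illustrative.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One gated graph convolution on the synchronous graph (GLU-style gating of two branches)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.lin_a = nn.Linear(d_model, d_model)
        self.lin_b = nn.Linear(d_model, d_model)

    def forward(self, a_sync: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # a_sync: (3N, 3N) single-scale synchronous adjacency; h: (3N, d_model).
        return self.lin_a(a_sync @ h) * torch.sigmoid(self.lin_b(a_sync @ h))

class STBlock(nn.Module):
    """Step (4.1.1): stacked GCN layers, average pooling over depth, crop to the middle step."""
    def __init__(self, d_model: int, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(GCNLayer(d_model) for _ in range(num_layers))

    def forward(self, a_sync: torch.Tensor, h0: torch.Tensor) -> torch.Tensor:
        outs, h = [h0], h0
        for layer in self.layers:
            h = layer(a_sync, h)
            outs.append(h)
        h_agg = torch.stack(outs).mean(dim=0)   # AvgPooling over all GCN outputs
        n = h_agg.shape[0] // 3
        return h_agg[n:2 * n]                   # crop: keep the middle time step -> (N, d_model)
```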
Step (4.1.2): the STS block. Following step (4.1.1), the dynamic synchronous graphs {A′_t, A′_st, A′_lst} and the corresponding sequence signals are fed into three ST blocks, extracting the spatio-temporal hidden dependencies {H′_t, H′_st, H′_lst} of the three scales. A linear layer and a fusion layer then generate the multi-scale spatio-temporal representation H^SP_t of the corresponding time step:

H′_s = STblock(A′_s, X′_t),  s ∈ {t, st, lst},
H^SP_t = Relu(Linear(tanh(W_t H′_t + W_st H′_st + W_lst H′_lst + b))),

where W_t, W_st, W_lst and b are trainable parameters of the fusion layer, tanh(·) and Relu(·) are activation functions, Linear(·) denotes a linear layer, and STblock(·) denotes the ST block operation of step (4.1.1).
Step (4.1.3): the MSTS-GCC. Following step (4.1.2), MSTCs are captured for different time steps by a plurality of STS blocks, and finally the multi-scale spatio-temporal representation of the whole time window is obtained. The input to the MSTS-GCC is the projected node signal X_temp:

X_temp = FC(X) ∈ R^(T×N×d_model).

Further, a padding operation is applied to X_temp to pad the time-step information, yielding the padded sequence X_pad ∈ R^((T+2)×N×d_model). The padded sequence is then divided by a sliding window of length 3, giving the division sequence X′_t ∈ R^(3×N×d_model) for time step t. The division sequence X′_t and the three-scale dynamic graphs {A′_t, A′_st, A′_lst} are input to the STS module, yielding the multi-scale spatio-temporal representation H^SP_t of the corresponding t-th time step:

H^SP_t = STSblock(X′_t, {A′_t, A′_st, A′_lst}),
Wherein STSblock (·) represents the operation of the STS block of step (4.1.2).
Further, the multi-scale spatio-temporal representations {H^SP_1, H^SP_2, ..., H^SP_T} of a time window of length T can be obtained according to the above steps. Finally, they are concatenated to obtain the multi-scale spatio-temporal representation of the whole time window:

E^SP = concat(H^SP_1, H^SP_2, ..., H^SP_T) ∈ R^(T×N×d_model),

where concat(·) denotes the concatenation operation.
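The sketch below, building on the STBlock sketch above, strings the three scales into an STS block and slides it over the padded window as in steps (4.1.2) and (4.1.3); the fusion weights and zero padding are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STSBlock(nn.Module):
    """Steps (4.1.2)-(4.1.3): three ST blocks (one per scale) plus a fusion layer,
    applied to every length-3 slice of the padded time window."""
    def __init__(self, d_model: int):
        super().__init__()
        self.st = nn.ModuleList(STBlock(d_model) for _ in range(3))  # STBlock from the sketch above
        self.fuse = nn.ModuleList(nn.Linear(d_model, d_model, bias=False) for _ in range(3))
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x_win: torch.Tensor, graphs) -> torch.Tensor:
        # x_win: (3, N, d_model); graphs: three (3N, 3N) synchronous adjacency matrices.
        h0 = x_win.reshape(-1, x_win.shape[-1])                       # stack 3 steps -> (3N, d_model)
        parts = [w(blk(a, h0)) for blk, w, a in zip(self.st, self.fuse, graphs)]
        return F.relu(self.out(torch.tanh(sum(parts))))               # (N, d_model) for the middle step

def mstc_representation(x_temp: torch.Tensor, graphs, sts: STSBlock) -> torch.Tensor:
    # x_temp: (T, N, d_model) projected node signal; pad one step at each end.
    x_pad = F.pad(x_temp.permute(1, 2, 0), (1, 1)).permute(2, 0, 1)   # (T+2, N, d_model)
    reps = [sts(x_pad[t:t + 3], graphs) for t in range(x_temp.shape[0])]
    return torch.stack(reps)                                           # E_SP: (T, N, d_model)
```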
And (4.2) converting the auxiliary features into an embedded matrix by using an AFE module for model training. The specific implementation process is shown in the steps (4.2.1) to (4.2.4).
Step (4.2.1): embedding of the temporal features. First, one-hot encoding is used to represent the hour of the day and the day of the week for each time step, creating two tensors of shapes R^(T×24) and R^(T×7), respectively. Subsequently, the two tensors are transformed into the shape R^(T×d_model). The two transformed tensors are then added element by element to obtain the embedding E^Time_past ∈ R^(T×d_model), which captures the information of the past time steps. Similarly, the temporal features of the future time steps are embedded, giving the embedding E^Time_future ∈ R^(τ×d_model). In summary, the temporal features of the past T and future τ time steps are embedded and expressed as the matrix E^Time ∈ R^((T+τ)×d_model).
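A short sketch of the time-feature embedding of step (4.2.1); the use of learnable linear projections over the one-hot codes is assumed from the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeEmbedding(nn.Module):
    """Step (4.2.1): one-hot hour-of-day and day-of-week, projected to d_model and summed."""
    def __init__(self, d_model: int):
        super().__init__()
        self.hour_proj = nn.Linear(24, d_model)
        self.dow_proj = nn.Linear(7, d_model)

    def forward(self, hours: torch.Tensor, dows: torch.Tensor) -> torch.Tensor:
        # hours, dows: integer (long) tensors of shape (T + tau,) covering past and future steps.
        hour_oh = F.one_hot(hours, num_classes=24).float()       # (T+tau, 24)
        dow_oh = F.one_hot(dows, num_classes=7).float()          # (T+tau, 7)
        return self.hour_proj(hour_oh) + self.dow_proj(dow_oh)   # E_Time: (T+tau, d_model)
```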
Step (4.2.2): embedding of the meteorological features. The meteorological features X^M are embedded using a two-layer fully connected neural network:

E^M = FC(Relu(FC(X^M))).
step (4.2.3): for point of interest feature (Point of Interests, abbreviation: POIs) X P Embedding is performed. POIs data represent, to some extent, a potential geographic feature: air pollution in industrial areas tends to be more severe than in park dense areas. Constructing a new graph G using POIs POI =(V SC ,E SC ,A SC ) WhereinIndicating observation points +.>Representing different POIs categories, wherein the weights A sc(i,j) Representing monitoring site->Within 5km of the vicinity->Number of kinds of buildings. The generated graph is processed by Node2Vec method for G POI Performing graph embedding to obtain final POI embedding +.>
Step (4.2.4): obtaining the auxiliary embedded representation. The POI embedding obtained in step (4.2.3) and the time-feature embedding obtained in step (4.2.1) are broadcast and added to obtain the auxiliary embedded representation E^AU:

E^AU = E^POI + E^Time
Step (4.3): the representation E^SP extracted in step (4.1) and the embedding E^AU extracted in step (4.2) are input into the EDs module to obtain the final output of the model. The specific implementation process is shown in steps (4.3.1) to (4.3.8).
Step (4.3.1): embedding an input node signal X and an aerial image into E before entering an encoder M Performing projective transformation, and adding the post and position codes to obtain the initial input of the encoder The process is as follows:
Z (0) =FC(X)+FC(E M )+PE
where FC (·) represents the fully connected layer for projective transformation, PE represents position coding,representing the coding layer output of the first layer and the input of the l+1 encoder layer.
Step (4.3.2): output Z of the first layer (l-1) And E generated in the step (4.1) SP The first-1 layer SGA is input, and the output of the first encoder layer SGA is generated. Specifically, the SGA module will Z (l-1) Medium representative node v i Time step t h Is hidden state of (a)And slave E SP Sub-multi-scale spatio-temporal representation extracted from tensors of +.>Connected together, and then node v is calculated using a scaled dot product method i And the spatial correlation between the nodes v, the specific formula is as follows:
wherein, the connection operation is represented by the I,is->Wherein SGA employs a multi-headed attentiveness mechanism,/->For node v i And node v, FC (·) represents the fully connected layer followed by the LeakyReLU () activation function.
Further, the computed scores are normalized by a softmax layer to give the attention scores:

α^(n_h)_{i,v} = exp(s^(n_h)_{i,v}) / Σ_{v′=1..N} exp(s^(n_h)_{i,v′}),

where the SGA introduces a multi-head attention mechanism to stabilize the learning process and α^(n_h)_{i,v} denotes the score of the n_h-th SGA attention head. The outputs of the N_h parallel attention heads are concatenated and projected through a fully connected layer to form the output of the SGA.
further, the SGA output of the first layer can be obtained through the steps
Step (4.3.3): output Z of layer 1 (l-1) And E generated in the step (4.2) AU Input to the first layer TA, and generate the output of the first encoder layer TA. In particular the TA Module will hide the stateAnd E is connected with ex Sub-auxiliary feature embedding +.>Connected together, the attention score is then calculated using a multi-head attention mechanism. After obtaining the attention score, a time step t can be generated j Time node v i The specific formula of the attention calculation is as follows:
wherein,representing a time step t j And t, +.>Is TA block n h A score of the individual attentiveness, representing the time step t j Importance to t->Is->Dimension of->Representing time step t j The previous subset.
Further, the TA output H^(l)_TA of the l-th encoder layer can be obtained through the above steps.
Step (4.3.4): the output of SGA and TA of the first encoder layer is fused by using a gating fusion layer and a feedforward neural network, and the conversion formula is as follows:
wherein,and->Are trainable parameters. g is the gate control tensor that is generated,is the output of the fusion layer in the first encoder layer, which can adaptively control the spatial and temporal dependencies of each node and time step, FFN (·) represents the feedforward neural network.
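A minimal PyTorch sketch of the gated fusion of step (4.3.4), under the assumption that the gate is a sigmoid over linear projections of the two attention outputs.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Step (4.3.4): gate between the SGA (spatial) and TA (temporal) branches, then apply an FFN."""
    def __init__(self, d_model: int, d_ff: int = 256):
        super().__init__()
        self.w_sga = nn.Linear(d_model, d_model, bias=False)
        self.w_ta = nn.Linear(d_model, d_model, bias=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, h_sga: torch.Tensor, h_ta: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.w_sga(h_sga) + self.w_ta(h_ta))   # gating tensor g
        fused = g * h_sga + (1.0 - g) * h_ta                     # adaptive spatial/temporal mix
        return self.ffn(fused)                                   # Z^(l)
```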
Step (4.3.5): and an encoder input/output. Initial input Z to encoder (0) Feeding into a decoder, according to steps (4.3.2) to (4.3.4), the 1 st to L th can be generated in turn en An output of the encoder layer, where L en The output of the encoder layers is taken as the final output of the encoder
Step (4.3.6): the final output of the encoder is taken as input to the EDA module, generating the initial input to the decoder. EDA Module slave E ex Extract node v i Time step t of (2) f Future assist feature embedding of (a)As a query vector, where t f Representing a certain time step (t) f =t T+1 ,t T+2 ,...,t T+τ ) T denotes a certain time step in the set of historical time steps (t=t 1 ,t 2 ,...,t T ). EDA using encoder output +.>The hidden state of the output node serves as a key and a value in the attention mechanism, and a specific formula is as follows:
wherein the method comprises the steps ofRepresenting a time step t j And t, correlation between->Is the nth h EDA multi-head attention score of individual, +.>Is->Is a dimension of (c).
Further, the initial input D^(0) of the decoder is obtained through the above procedure.
Step (4.3.7): decoder input and output. The initial input D^(0) is fed into the decoder; following steps (4.3.2) to (4.3.4), the outputs of the 1st to L_de-th decoder layers are generated in turn, where L_de is the number of decoder layers, and the output of the L_de-th decoder layer is taken as the final decoder output D^(L_de).
Step (4.3.8): model output. The final output D^(L_de) of the decoder is fed into a convolutional neural network and a fully connected neural network to generate the final output:

Ŷ = FC(Relu(Conv(D^(L_de)))),

where Relu(·) denotes the activation function, FC(·) and Conv(·) denote the fully connected and convolutional neural networks, respectively, and Ŷ denotes the final output of the model.
Step (5): and (3) calculating a loss function by the model, performing parameter training, and adding the data set obtained in the step (2) and the multi-scale dynamic synchronous diagram obtained in the step (3) into the model constructed in the step (4).
In step (5), the loss function adopts the mean squared error (Mean Squared Error, abbreviated MSE):

L(θ) = (1/τ) Σ_{t=T+1..T+τ} ‖y_t − ŷ_t‖² + λ·L_reg,

where θ denotes the trainable parameters of the model, L_reg is a regularization term, and λ is the regularization hyper-parameter.
In step (5), the Adam optimizer is used to optimize the network during training, with an initial learning rate of 0.0001 and 100 iterations. An early-stopping strategy is also adopted: when the error of the model on the validation set fails to improve over the previous best result for 10 consecutive evaluations, training is stopped. Meanwhile, a dynamic learning rate is used: every 10 iterations the learning rate is updated to 0.5 times its previous value.
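The training schedule of step (5) can be sketched as follows: Adam with an initial learning rate of 0.0001, an MSE loss with weight decay standing in for the regularization term, halving of the learning rate every 10 iterations, and early stopping after 10 evaluations without improvement. The model and data-loader objects are placeholders.

```python
import torch

def train(model, train_loader, val_loader, epochs: int = 100, patience: int = 10):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)  # L2 regularization
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
    criterion = torch.nn.MSELoss()
    best_val, bad_epochs = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()                       # dynamic learning rate: x0.5 every 10 epochs
        model.eval()
        with torch.no_grad():
            val = sum(criterion(model(x), y).item() for x, y in val_loader) / max(len(val_loader), 1)
        if val < best_val:
            best_val, bad_epochs = val, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:         # early stopping
                break
    return model
```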
Step (6): evaluating model performance, and comparing the trained model with PM on the test set 2.5 Concentration was predicted in single and multiple steps, and model predictive performance was assessed by visual and quantitative evaluation. Furthermore, to verify the effectiveness of the various components, an ablation experiment was performed: and respectively removing one part for training, and comparing the true model prediction result and the complete model prediction result to illustrate the effectiveness of each part.
In step (6), two evaluation indices, the root mean squared error (Root Mean Squared Error, abbreviated RMSE) and the mean absolute error (Mean Absolute Error, abbreviated MAE), are adopted to verify the effectiveness of the model. The smaller the RMSE and MAE values, the more accurate the prediction model. The indices are defined as

RMSE = sqrt( (1/τ) Σ_{t=1..τ} (y_t − ŷ_t)² ),
MAE = (1/τ) Σ_{t=1..τ} |y_t − ŷ_t|,

where y_t is the observed value of the target pollutant particle concentration, ŷ_t is the predicted value, τ is the number of time steps to be predicted, and t denotes the t-th time step.
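A small sketch of these two indices, assuming NumPy arrays of aligned observations and predictions.

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error over all predicted time steps and sites."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error over all predicted time steps and sites."""
    return float(np.mean(np.abs(y_true - y_pred)))
```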
Through the steps described above, the method of the invention has the following advantages:
(1) For the problem of how to encode MSTCs, the method of the invention provides a new graph construction method that builds a multi-scale dynamic synchronous graph. This construction fuses the synchronous graph with the dynamic graph construction: the dynamic graph construction effectively captures the dynamic correlations between sites, and the multi-scale synchronous graph construction further encodes the MSTCs effectively, improving the effectiveness of the spatio-temporal graph composition.
(2) For the problem of how to extract MSTCs, the method of the invention provides a new graph network component, the MSTS-GCC, which mainly uses parallel stacked GCNs in cooperation with the generated multi-scale dynamic synchronous graph to effectively capture the MSTCs in the spatio-temporal graph and improve the prediction accuracy of air quality.
(3) Previous air quality prediction work considered the influence of auxiliary factors on air quality in isolation; for this problem the method provides an attention module that fuses the captured multi-scale spatio-temporal representation with the auxiliary factors, improving the prediction accuracy of air quality.
(4) The synchronous graph attention and the temporal attention are integrated in the encoder-decoder structure of the model, so that the long-term influence of the auxiliary features and the short-term influence of the multi-scale spatio-temporal representation can be captured effectively and dynamically, improving the prediction accuracy of air quality.
In summary, the MSTS-GCC proposed by the method of the present invention captures multi-scale spatio-temporal representations from dynamic synchronous graphs of different scales. The encoder-decoder architecture focuses on the influence of the multi-scale spatio-temporal representation and the auxiliary features using the synchronous graph attention mechanism and the temporal attention mechanism, respectively. By fusing the two, the decoder layers can iteratively generate multi-step predictions for the regional sites. The method outperforms existing methods and can be used to assist decision making and management in urban air quality prediction systems, ultimately helping to control air pollution.
Drawings
For a clearer description of the technical solutions of the present invention, the figures used in the description of the embodiments are briefly introduced below. It is evident that the following figures are only some examples of the present invention, and that other figures can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of regional air quality prediction in accordance with the present invention
FIG. 2 is a diagram of the construction of a multi-scale dynamic synchronization map of the present invention, wherein (a) represents a new adjacency matrix constructed by the synchronization map and (b) represents the construction of a synchronization map of different scales
FIG. 3 is a block diagram showing the MSTS-GCC module according to the present invention
FIG. 4 is a block diagram showing the specific structure of the EDs module of the present invention
FIG. 5 is a map of the present invention for regional prediction on a Beijing dataset, where (a) represents actual values and (b) represents predicted values
FIG. 6 shows the results of an ablation test of the present invention, wherein (a) is 12h and (b) is 24h
Detailed description of the preferred embodiments
In order to make the technical scheme and advantages of the present invention more clear, the technical scheme in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings and specific examples of the present invention.
As shown in figure 1, the invention provides a regional air quality prediction method based on a multi-scale dynamic synchronous diagram mechanism, which comprises the following specific implementation steps:
Step (1): data collection. Historical air pollution data are collected from N regional monitoring sites of the city to be studied, and the sites in the target urban region are expressed as a set V = {v_1, v_2, ..., v_N}. Observations of air pollutant particles and meteorological factor data are collected continuously at fixed intervals; the air pollutant data comprise sulfur dioxide, nitrogen dioxide, ozone, carbon monoxide, and the fine particulate matter PM2.5 and PM10; the meteorological factor data comprise wind speed, precipitation, temperature, dew point and air pressure. Point-of-interest data are also collected, namely the number of points of interest of each category within 5 km of each site.
Step (2): the original data is preprocessed. Collecting data containing missing or unclean data, detecting abnormal values by using Laida criteria and removing the abnormal values; filling missing values and outliers through linear interpolation, then adopting Z-score to normalize to promote convergence of a model, and finally carrying out time step division and input-output definition. The specific implementation process is as described in the steps (2.1) to (2.5)
Step (2.1): abnormal value detection and elimination are carried out on the data by using a Laida criterion method, and the independently obtained measurement data are subjected toCalculate the arithmetic mean mu and the residual error +.>And calculate the standard deviation sigma, if a certain measured value x i Is the residual error v of (2) i (1.ltoreq.i.ltoreq.n) satisfying the following formula:
then consider asShould be rejected.
Step (2.2): and filling the acquired air pollution data and the missing values of the meteorological factor data in the time dimension by adopting a linear interpolation method.
Step (2.3): normalization is carried out by using a Z-score mode, and air pollution data and meteorological factor data are normalized in a time dimension, and the formula is as follows:
wherein the method comprises the steps ofRepresenting the measured data, σ is the standard deviation of the measured data, μ is the mean value of the measured data, ++>Is normalized data.
Step (2.4): the data set is partitioned. The obtained air quality and meteorological data are divided into a training set, a testing set and a verification set.
Step (2.5): dividing the time steps, defining an initial input and a target output. Taking representative pollutant PM in air pollutants 2.5 Concentration as target pollutant, setPM representing sites within a target city area at time step t 2.5 Concentration observation history, where F is the air contaminant particle characteristic number. Taking as input the observation history value with a time window length of T time steps, denoted as x= { X 1 ,x 2 ,...,x T }. The air contaminant particle concentration of the zone site V for the future τ time step is defined as the set of sequences y= { Y T+1 ,y T+2 ,...y T+τ Output as target, wherein ∈ ->Meteorological characteristics can be definedIs->Wherein F is M Is the characteristic quantity of the meteorological characteristics. The point of interest feature may be defined as +.>Wherein F is P Is the number of features of the point of interest feature.
Step (3): and generating a multi-scale dynamic synchronous diagram. A predefined graph with geographic prior knowledge is first generated from latitude and longitude information between sites. The accuracy of predicting the target air contaminant particle concentration is then further improved by fluctuations in the node signals and introducing spatial attention mechanisms and mask matrices. The spatial attention mechanism assigns weights in the node dimensions that enable accurate capture of dynamic spatial information to generate a node matrix. And constructing a dynamic graph adjacent matrix by utilizing the node matrix, multiplying the dynamic graph adjacent matrix by the mask matrix to enable the dynamic graph adjacent matrix to follow the structure of the predefined graph, thereby capturing the dynamic time-space correlation, and finally adding the generated dynamic graph adjacent matrix and the predefined graph adjacent matrix to obtain the prior knowledge of geography. And then constructing a multi-scale dynamic synchronous graph, capturing different MSTC by connecting space-time neighbor nodes of adjacent time steps, and generating the dynamic synchronous graph with different scales. And finally, taking the adjacency matrix of the generated synchronous graphs with different scales as a part of model output. The specific implementation process is as described in the steps (3.1) to (3.3).
Step (3.1): specifically, each site has some geospatial relevance to neighbouring sites: the closer two sites are, the stronger the correlation of their measured data. Following this principle, a predefined graph G_Pre = (V_Pre, E_Pre, A_Pre) is constructed using the geographical longitude and latitude of the sites, where V_Pre denotes the vertices of the graph, E_Pre the edges, and A_Pre the adjacency matrix of the graph. The weights in the adjacency matrix are generated as

A_Pre(i, j) = exp(−dist(v_i, v_j)² / ψ²)  if dist(v_i, v_j) ≤ ε, and 0 otherwise,

where dist(v_i, v_j) denotes the Euclidean distance between v_i and v_j, ψ² is the Gaussian kernel parameter controlling the distribution, and ε is the preset Euclidean distance threshold.
Step (3.2): stacked one-dimensional convolution, spatial attention mechanisms, and metric learning are introduced to construct a dynamic graph. The specific implementation process is as described in the steps (3.2.1) to (3.2.5).
Step (3.2.1): the input node signal X is first projected into a latent space through a fully connected network:

H_node = FC(X) ∈ R^(T×N×d_model),

where H_node denotes the projected node signal and d_model denotes the dimension of the latent space, which is also used as the common dimension of the proposed model.
Step (3.2.2): h was determined using the aggregation function AGG (. Cndot.) node The time dimension of (2) is subjected to aggregation dimension reduction, and the process formula is as follows:
Where d' represents the aggregated dimension, AGG (·) represents the aggregate function, consisting of stacked one-dimensional convolution operations, the time dimension can be reduced to 1, which is specific to the result M after each convolution operation i,f' The specific polymerization formula is as follows:
wherein, represents a cross operation relational character, H :,i,f' Time information representing the f' th entry in the input i-th node, M i,f' Represents the f' th channel of the output, W f',f Is a trainable parameter for the model from the f' th channel to the f th channel.
Step (3.2.3): each node is dynamically allocated with different weights by adopting a spatial attention mechanism, and the allocation weight formula is as follows:
wherein LeakyReLU () is an activation function, nonlinear transformation is performed on data, FC () represents a fully connected layer with the activation function,is->Dimension, ζ i Representing node v i Is>Is an inner product operation. Notably, the operation of the attention mechanism is operated through the node dimension. />Representing the calculated node v i And v j Spatial correlation between->Representing spatial attention scores after softmax manipulation.
Step (3.2.4): the initial weight in the dynamic graph adjacency matrix is generated by learning the metric function phi () expressed by the paired nodes by adopting a metric learning method, and the specific formula is as follows:
Wherein the method comprises the steps ofRepresenting learned node v i And node v j Dynamic spatial correlation between as node v in a dynamic graph matrix i And node v j Is used to determine the initial edge weight of the block.
Further, the generated ΔA is multiplied element-wise by the mask matrix so that the generated graph adjacency matrix resembles the adjacency matrix of the predefined graph. The adjacency matrix of the dynamic graph is then generated by normalization, as follows:
ΔA=Norm(ΔA⊙mask)
where mask is a mask matrix whose entries are 1 at the positions corresponding to the non-zero entries of the predefined graph adjacency matrix and 0 elsewhere, and ⊙ denotes element-wise multiplication of matrix entries at the same positions.
Step (3.2.5): the predefined graph adjacency matrix A to be generated Pre And adding the adjacent matrixes of the dynamic diagrams, performing nonlinear change through an activation function, and finally normalizing to obtain an adjacent matrix A of the new structural diagram, wherein the specific process formula is as follows:
A=Norm(LeakyReLU(A Pre +ΔA))
step (3.3): as shown in fig. 2 (a), the generated graph adjacency matrix a is used to generate dynamic synchronous graph adjacency matrices with different scales, and the graph matrices with three time steps are constructed according to different connection modes. As shown in figure 2 (b) of the drawings,to capture the time correlation, A 'is constructed by connecting all nodes to itself at adjacent time steps' t . Creating a 'by connecting all nodes with their respective neighbors in successive time steps' st To capture short cross-correlations. Constructing a 'by connecting all nodes with their 1-hop neighbors in successive time steps' lst To capture long cross-correlations. The constructed graph may implicitly carry spatial correlation due to its inherent nature. The new weights of the edges in the adjacent matrix of the dynamic synchronous diagram with three different scales are aggregated based on the weights of the edges of the diagram on the original time step, and the weight calculation formula is as follows:
wherein v is i ,v j Is a node in the original graph, v k Representing nodes of the graph at other time steps,representing v in the original graph j Is>Weights representing the original graph, +.>Representation and v j Number of adjacent nodes. After the operation, the adjacency matrix { A 'of the multiscale dynamic synchronous diagram can be obtained' t ,A′ st ,A′ lst As part of the subsequent model input.
Step (4): model building and input and output. As shown in FIG. 1, the model includes a multi-scale spatio-temporal synchronization map convolution component module (Multiscale dynamic synchronous graph convolution component, abbreviation: MSTS-GCC), an auxiliary function embedding module (Auxiliary feature embedding, abbreviation: AFE), and a codec-Decoder module (Encoder-Decoder, abbreviation: EDs). Wherein the MSTS-GCC is composed of a series of Space-Time attention blocks (STS) which are composed of three Space-Time attention blocks (ST) corresponding to different scales of the dynamic synchronous diagram respectively, and the ST blocks are composed of stacked diagram convolution neural networks (Graph Convolutional Network, GCN) and pooling operation, thereby extracting multi-scale Space-Time representation and helping the model capture deeper Space information. The AFE converts the auxiliary features (meteorological features, temporal features and points of interest) that have direct or indirect influence on the region into an embedded matrix for model training, so that the prediction result of the model is more accurate. EDs module is made up of five subassemblies: synchronization map attention (Synchronous graph attention, abbreviation: SGA), time attention (Temporal attention, abbreviation: TA), encoder-decoder attention (Encoder-Decoder attention, abbreviation: EDA), fusion layer and feed forward neural network (Feedforward Neural Network, abbreviation: FFN). The SGA is primarily responsible for dynamically assigning site weights between different time steps based on extracting a multi-scale spatio-temporal representation. The TA weights different time steps of the same station based on the assist feature embedding. The EDA is responsible for fusion reassignment of the auxiliary features of the future step sizes and the encoder output. In other words, SGA is related to the spatial dimension, TA is related to the temporal dimension, and EDA is responsible for the interfacing between encoder and decoder. The specific implementation process is shown in the steps (4.1) to (4.3).
Step (4.1): the multi-scale spatio-temporal representation is extracted using the MSTS-GCC, as shown in FIG. 3. The MSTS-GCC module consists of a number of parallel STS blocks, and each STS block consists of three parallel ST blocks corresponding to the generated dynamic synchronous graphs of the three scales. An ST block consists of stacked GCN layers whose outputs are finally aggregated by a pooling layer; the ST block stacks GCNs in depth to aggregate the single-scale spatio-temporal dependencies derived from the synchronous graph. An STS block can capture the MSTCs of a single time step through its ST blocks of different scales, and the MSTS-GCC captures the MSTCs of the entire time window through multiple parallel STS blocks. The specific implementation process is shown in steps (4.1.1) to (4.1.3).
Step (4.1.1): as shown in fig. 3, a single scale spatio-temporal representation is obtained by ST blocks. The ST block consists of stacked GCN layers, which are finally subjected to an aggregation operation by a pooling layer. The GCN layer may cause each node to aggregate its characteristics with neighboring nodes at neighboring time steps. The formula of the GCN is shown below:
H (k) =GCN(A′,H (k-1) )=GLU((A′H (k-1) W a +b 1 )⊙sigmoid(A′H (k-1) W b +b 2 ))
wherein H is (k ) Expressed as the output of the kth layer GCN, H (0) Then the signal sequence of the first layer GCN,adjacency matrix, W, representing a constructed single-scale dynamic synchronization map a 、W b 、b 1 B 2 For trainable parameters, both GLU () and sigmoid () are activation functions.
Thus, deep spatio-temporal information {H^(0), H^(1), ..., H^(K)} can be obtained by stacking K GCN layers. Since the GCN takes the dynamic synchronous graph as part of its input, it aggregates features from the previous and next time steps, i.e., it preserves the features of neighbours, so the output of the GCN layers contains noise. The average pooling operation AvgPooling(·) is used to filter this noise: an element-wise averaging is applied to the outputs of all GCN layers in the ST block, compressing {H^(0), H^(1), ..., H^(K)} into H_agg:
H_agg = AvgPooling(H^(0), H^(1), ..., H^(K))
Further, a cropping operation is applied to H_agg to retain the information of the intermediate time step, generating the output H' of the ST block, i.e., the single-scale spatio-temporal representation.
Step (4.1.2): according to step (4.1.1), the STS block feeds the dynamic synchronous graphs {A'_t, A'_st, A'_lst} and the corresponding sequence signals into the ST blocks, extracting the spatio-temporal hidden dependencies of the three scales {H'_t, H'_st, H'_lst}, and then generates the multi-scale spatio-temporal representation of the corresponding time step using a linear layer and a fusion layer. The specific process is as follows:
where W_t, W_st, W_lst and b are trainable parameters of the fusion layer, tanh(·) and relu(·) are activation functions, Linear(·) denotes a linear layer, and STblock(·) denotes the ST block operation of step (4.1.1).
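To make steps (4.1.1) and (4.1.2) concrete, the following is a minimal PyTorch sketch of an ST block (stacked GCN layers with GLU-style gating, average pooling and a middle-time-step crop) and of the three-scale fusion. The class names, the reading of the GLU gating, and the exact fusion form (weighted sum, tanh, linear layer, ReLU) are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn


class SyncGraphConv(nn.Module):
    """One GCN layer; one common reading of GLU((A'HW_a+b_1) ⊙ sigmoid(A'HW_b+b_2))."""
    def __init__(self, d_model: int):
        super().__init__()
        self.lin_a = nn.Linear(d_model, 2 * d_model)  # GLU halves the channels again
        self.lin_b = nn.Linear(d_model, d_model)

    def forward(self, a_prime: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, 3N, d_model) node signals of 3 consecutive time steps; a_prime: (3N, 3N)
        agg = torch.einsum("ij,bjd->bid", a_prime, h)          # A'H
        return nn.functional.glu(self.lin_a(agg), dim=-1) * torch.sigmoid(self.lin_b(agg))


class STBlock(nn.Module):
    """Stack K GCN layers, average-pool their outputs, crop the middle time step."""
    def __init__(self, d_model: int, num_layers: int = 3, num_nodes: int = 34):
        super().__init__()
        self.layers = nn.ModuleList(SyncGraphConv(d_model) for _ in range(num_layers))
        self.num_nodes = num_nodes

    def forward(self, a_prime: torch.Tensor, h0: torch.Tensor) -> torch.Tensor:
        outputs, h = [h0], h0
        for layer in self.layers:
            h = layer(a_prime, h)
            outputs.append(h)
        h_agg = torch.stack(outputs).mean(dim=0)               # AvgPooling over {H^(0..K)}
        n = self.num_nodes
        return h_agg[:, n:2 * n, :]                            # keep the middle time step only


class ScaleFusion(nn.Module):
    """Fuse H'_t, H'_st, H'_lst into one multi-scale representation (assumed form)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.w_t = nn.Linear(d_model, d_model, bias=False)
        self.w_st = nn.Linear(d_model, d_model, bias=False)
        self.w_lst = nn.Linear(d_model, d_model, bias=False)
        self.bias = nn.Parameter(torch.zeros(d_model))
        self.out = nn.Linear(d_model, d_model)

    def forward(self, h_t, h_st, h_lst):
        fused = torch.tanh(self.w_t(h_t) + self.w_st(h_st) + self.w_lst(h_lst) + self.bias)
        return torch.relu(self.out(fused))                     # representation of one time step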
Step (4.1.3): according to step (4.1.2), the MSTS-GCC captures the MSTCs of the different time steps through a plurality of STS blocks and finally obtains the multi-scale spatio-temporal representation of the whole time window. The input to the MSTS-GCC is the projected node signal X_temp. The process is as follows:
Further, a padding operation is applied to X_temp, padding the information of the initial and final time steps to obtain a padded sequence. The padded sequence is then divided with a sliding window of length 3, yielding the partition sequence X'_t for time step t. Feeding the partition sequence X'_t together with the dynamic graphs of the three scales {A'_t, A'_st, A'_lst} into the STS module gives the multi-scale spatio-temporal representation of the corresponding t-th time step. The process can be expressed as follows:
Further, the multi-scale spatio-temporal representations of a time window of length T can be obtained according to the above steps. Finally, these representations are concatenated to obtain the multi-scale spatio-temporal representation E_SP of the whole time window:
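A minimal sketch of the MSTS-GCC sweep over the time window follows, reusing the STBlock and ScaleFusion sketches above. The one-step padding on each side and the window length of 3 follow the description; the tensor layout and the function signature are assumptions.

```python
import torch


def msts_gcc(x_temp, graphs, st_blocks, fusion):
    """x_temp: (batch, T, N, d_model); graphs/st_blocks: dicts keyed by scale 't', 'st', 'lst'."""
    batch, t_len, n, d = x_temp.shape
    pad = torch.zeros(batch, 1, n, d, device=x_temp.device)
    x_pad = torch.cat([pad, x_temp, pad], dim=1)              # pad initial and final time steps
    reps = []
    for t in range(t_len):
        window = x_pad[:, t:t + 3].reshape(batch, 3 * n, d)   # partition sequence X'_t
        h_scales = [st_blocks[k](graphs[k], window) for k in ("t", "st", "lst")]
        reps.append(fusion(*h_scales))                        # multi-scale repr. of step t
    return torch.stack(reps, dim=1)                           # E_SP: (batch, T, N, d_model)
```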
and (4.2) converting the auxiliary features into an embedded matrix by using an AFE module for model training. Embedding for temporal and meteorological features: extracting two time features of a day and a week from the time stamps of the sequence data points to mark time information of each time step; weather feature X of each station M And time feature X T Two full connection layers are input for transformation. The specific implementation process is shown in the steps (4.2.1) to (4.2.3).
Step (4.2.1): embedding the time features. First, one-hot encoding is used to represent the hour of the day and the day of the week of each time step, creating two tensors. Subsequently, the two tensors are transformed to a common embedding shape. Then, the two transformed tensors are added element by element to obtain an embedding that contains the information of the past time steps. Similarly, the time features of the future time steps are embedded, resulting in a second embedding. Taken together, the time-feature embeddings of the past T and future τ time steps are expressed as the matrix E_Time.
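As an illustration of step (4.2.1), the sketch below builds the hour-of-day and day-of-week embeddings from one-hot encodings and two learned projections, assuming PyTorch; the module name and the projection-based transformation are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TimeEmbedding(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.hour_proj = nn.Linear(24, d_model)
        self.day_proj = nn.Linear(7, d_model)

    def forward(self, hour_idx: torch.Tensor, day_idx: torch.Tensor) -> torch.Tensor:
        # hour_idx, day_idx: (num_steps,) integer indices for each time step
        hour_oh = F.one_hot(hour_idx, num_classes=24).float()
        day_oh = F.one_hot(day_idx, num_classes=7).float()
        return self.hour_proj(hour_oh) + self.day_proj(day_oh)   # (num_steps, d_model)


# Usage: embed past T and future tau steps, then collect them as E_Time, e.g.
# emb = TimeEmbedding(64); e_time = torch.cat([emb(h_past, d_past), emb(h_fut, d_fut)], dim=0)
```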
Step (4.2.2): embedding the meteorological features. The meteorological features X_M are embedded using a two-layer fully connected neural network, as follows:
step (4.2.3) embedding the point of interest features (Pointof Interests, abbreviations: POIs). POIs data represent, to some extent, a potential geographic feature: air pollution in industrial areas tends to be more severe than in park dense areas. Constructing a new graph G using POIs POI =(V SC ,E SC ,A SC ) Which is provided withIn (a)Indicating observation points +.>Representing different POIs categories, wherein the weights A sc(i,j) Representing monitoring site->Within 5km of the vicinity->Number of kinds of buildings. The generated graph is processed by Node2Vec method for G POI Performing graph embedding to obtain final POI embedding +.>
Step (4.2.4): obtaining the auxiliary embedding representation. The POI embedding obtained in step (4.2.3) and the time-feature embedding obtained in step (4.2.1) are added with broadcasting to obtain the auxiliary embedding representation E_AU:
E_AU = E_POI + E_Time
Step (4.3): as shown in FIG. 4, the E_SP extracted in step (4.1) and the E_AU extracted in step (4.2) are input to the EDs module, which produces the prediction output. The specific implementation process is shown in steps (4.3.1) to (4.3.8).
Step (4.3.1): as shown in FIG. 4, before entering the encoder, the input node signal X and the meteorological embedding E_M are projected, and the results are added to the position encoding to obtain the initial input Z^(0) of the encoder. The process is as follows:
Z^(0) = FC(X) + FC(E_M) + PE
where FC(·) denotes the fully connected layer used for projection, PE denotes the position encoding, and Z^(l) denotes the output of the l-th encoder layer and the input of the (l+1)-th encoder layer.
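A minimal sketch of the encoder's initial input Z^(0) = FC(X) + FC(E_M) + PE is given below, assuming PyTorch and a standard sinusoidal position encoding with an even d_model, since the patent does not specify the PE form; dimensions and names are illustrative.

```python
import math
import torch
import torch.nn as nn


def sinusoidal_pe(num_steps: int, d_model: int) -> torch.Tensor:
    # d_model assumed even
    pos = torch.arange(num_steps).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(num_steps, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe


class EncoderInput(nn.Module):
    def __init__(self, f_in: int, f_met: int, d_model: int):
        super().__init__()
        self.fc_x = nn.Linear(f_in, d_model)
        self.fc_m = nn.Linear(f_met, d_model)

    def forward(self, x: torch.Tensor, e_m: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, N, f_in); e_m: (batch, T, N, f_met)
        z0 = self.fc_x(x) + self.fc_m(e_m)                        # FC(X) + FC(E_M)
        pe = sinusoidal_pe(x.size(1), z0.size(-1)).to(x.device)   # (T, d_model)
        return z0 + pe.unsqueeze(0).unsqueeze(2)                  # broadcast over batch and nodes
```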
Step (4.3.2): as shown in FIG. 4, the output Z^(l-1) of the (l-1)-th layer and the E_SP generated in step (4.1) are input to the SGA of the l-th layer, generating the output of the SGA of the l-th encoder layer. Specifically, the SGA module concatenates the hidden state in Z^(l-1) that represents node v_i at time step t_h with the corresponding sub multi-scale spatio-temporal representation extracted from the E_SP tensor, and then computes the spatial correlation between node v_i and node v using scaled dot-product attention. The specific formula is as follows:
where ‖ denotes the concatenation operation; the SGA employs a multi-head attention mechanism; the computed term denotes the correlation between node v_i and node v; and FC(·) denotes a fully connected layer followed by the LeakyReLU(·) activation function.
Further, the computed correlations are normalized by a softmax layer to obtain the attention scores, converted by the following formula:
where the SGA introduces a multi-head attention mechanism to stabilize the learning process, and the normalized term denotes the score of the n_h-th SGA attention head. The N_h parallel attention heads operate as follows:
Further, the SGA output of the l-th encoder layer can be obtained through the above steps.
Step (4.3.3): as shown in FIG. 4, the output Z^(l-1) of the (l-1)-th layer and the E_AU generated in step (4.2) are input to the TA of the l-th layer, generating the output of the TA of the l-th encoder layer. Specifically, the TA module concatenates the hidden state with the corresponding sub auxiliary feature embedding extracted from the auxiliary embedding, and then computes the attention scores using a multi-head attention mechanism. After the attention scores are obtained, the attention of node v_i at time step t_j can be generated; the specific formula is as follows:
where the correlation term denotes the relevance between time steps t_j and t; the score of the n_h-th TA attention head indicates the importance of time step t_j to t; the scaling factor is the dimension of the corresponding representation; and the last term denotes the subset of time steps preceding t_j.
Further, the TA output of the l-th encoder layer can be obtained through the above steps.
Step (4.3.4): as shown in FIG. 4, the outputs of the SGA and the TA of the l-th encoder layer are fused using a gated fusion layer and a feed-forward neural network. The conversion formula is as follows:
where the weight matrices are trainable parameters, g is the generated gating tensor, the fused result is the output of the fusion layer in the l-th encoder layer, which adaptively controls the spatial and temporal dependency of each node and time step, and FFN(·) denotes the feed-forward neural network.
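The following is a hypothetical PyTorch sketch of one encoder layer covering steps (4.3.2) to (4.3.4): the SGA attends over the node axis using Z^(l-1) concatenated with E_SP, the TA attends over the time axis using Z^(l-1) concatenated with E_AU, and a sigmoid gate fuses the two before a feed-forward network. Head counts, the shared attention helper, the gate parameterisation and the tensor layout (e_sp and e_au expanded to (batch, T, N, d_model), d_model divisible by the head count) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class AxisAttention(nn.Module):
    """Multi-head scaled dot-product attention over a chosen axis (nodes for SGA, time for TA)."""
    def __init__(self, d_model: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.proj = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.LeakyReLU())

    def forward(self, z: torch.Tensor, e: torch.Tensor, axis: str) -> torch.Tensor:
        # z, e: (batch, T, N, d_model); axis "node" -> attention across sites, "time" -> across steps
        b, t, n, d = z.shape
        x = self.proj(torch.cat([z, e], dim=-1))                        # concatenation ‖ then FC
        seq = x.reshape(b * t, n, d) if axis == "node" else x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        out, _ = self.attn(seq, seq, seq)
        if axis == "node":
            return out.reshape(b, t, n, d)
        return out.reshape(b, n, t, d).permute(0, 2, 1, 3)


class EncoderLayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int = 256):
        super().__init__()
        self.sga = AxisAttention(d_model)
        self.ta = AxisAttention(d_model)
        self.w_s = nn.Linear(d_model, d_model, bias=False)
        self.w_t = nn.Linear(d_model, d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, z: torch.Tensor, e_sp: torch.Tensor, e_au: torch.Tensor) -> torch.Tensor:
        h_s = self.sga(z, e_sp, axis="node")                 # synchronous graph attention
        h_t = self.ta(z, e_au, axis="time")                  # temporal attention
        g = torch.sigmoid(self.w_s(h_s) + self.w_t(h_t))     # gating tensor g
        fused = g * h_s + (1.0 - g) * h_t                    # gated fusion of space and time
        return self.ffn(fused)                               # feed-forward network
```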
Step (4.3.5): each encoder layer generates its output through steps (4.3.2) to (4.3.4), which serves as the input of the next encoder layer; finally the output Z^(L_en) of the L_en-th encoder layer is taken as the output of the encoder.
Step (4.3.6): the encoder output is taken as the input of the EDA module, which generates the initial input of the decoder. The EDA module extracts from the auxiliary embedding the future auxiliary feature embedding of node v_i at time step t_f as the query vector, where t_f denotes a future time step (t_f = t_{T+1}, t_{T+2}, ..., t_{T+τ}) and t denotes a time step in the set of historical time steps (t = t_1, t_2, ..., t_T). The EDA uses the hidden states of the nodes output by the encoder as the keys and values in the attention mechanism; the specific conversion formula is as follows:
where the correlation term denotes the relevance between time steps t_j and t, the attention score is that of the n_h-th EDA head, and the scaling factor is the dimension of the corresponding representation.
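The cross-attention just described can be sketched as follows (a hedged PyTorch illustration, not the patent's implementation): queries come from the future auxiliary embeddings, while keys and values come from the encoder output, with attention taken over the historical time axis separately for each node.

```python
import torch
import torch.nn as nn


class EDA(nn.Module):
    def __init__(self, d_model: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, e_future: torch.Tensor, z_enc: torch.Tensor) -> torch.Tensor:
        # e_future: (batch, tau, N, d_model) future auxiliary embeddings (queries)
        # z_enc:    (batch, T,   N, d_model) encoder output (keys and values)
        b, tau, n, d = e_future.shape
        q = e_future.permute(0, 2, 1, 3).reshape(b * n, tau, d)   # one sequence per node
        kv = z_enc.permute(0, 2, 1, 3).reshape(b * n, -1, d)
        d0, _ = self.attn(q, kv, kv)                               # scaled dot-product attention
        return d0.reshape(b, n, tau, d).permute(0, 2, 1, 3)        # initial decoder input D^(0)
```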
Further, an initial input D to the decoder is made possible by the above procedure (0)
Step (4.3.7): decoder input and output. As shown in FIG. 4, the initial input D^(0) of the decoder is fed into the decoder; following steps (4.3.2) to (4.3.4), the outputs of the 1st to L_de-th decoder layers are generated in turn, and the output of the L_de-th decoder layer is taken as the final output of the decoder.
Step (4.3.8): model output. As shown in FIG. 4, the final output of the decoder is fed into a convolutional neural network and a fully connected neural network to generate the final output:
where Relu(·) denotes the activation function, FC(·) and Conv(·) denote the fully connected and convolutional neural networks, respectively, and the result denotes the final output of the model.
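A minimal sketch of the output head is given below, assuming PyTorch; the 1×1 convolution over the feature channels and the single-value fully connected projection are assumptions consistent with the Conv(·), FC(·) and Relu(·) operations named above.

```python
import torch
import torch.nn as nn


class OutputHead(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv2d(d_model, d_model, kernel_size=1)   # Conv(.) over the channel dim
        self.fc = nn.Linear(d_model, 1)                          # FC(.) to one value per step

    def forward(self, d_out: torch.Tensor) -> torch.Tensor:
        # d_out: (batch, tau, N, d_model) final decoder output D^(L_de)
        h = d_out.permute(0, 3, 1, 2)                            # (batch, d_model, tau, N)
        h = torch.relu(self.conv(h)).permute(0, 2, 3, 1)         # back to (batch, tau, N, d_model)
        return self.fc(h).squeeze(-1)                            # (batch, tau, N) predictions
```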
Step (5): the model computes the loss function and performs parameter training; the data set obtained in step (2) and the multi-scale dynamic synchronous graphs obtained in step (3) are fed into the model constructed in step (4).
In step (5), the loss function adopts the mean squared error (Mean Squared Error, abbreviation: MSE). The specific formula is as follows:
where θ denotes the trainable parameters of the model, the additional term is the regularization term, and λ is the regularization hyper-parameter.
In step (5), the training process uses the Adam optimizer to optimize the network, with an initial learning rate of 0.0001 and 100 iterations. An early-stopping strategy is adopted during training: when the error of the model on the validation set fails to improve on the previous best result for 10 consecutive rounds, training is stopped. A dynamic learning rate is also adopted: every 10 iterations the learning rate is updated to 0.5 times its previous value.
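The training procedure of step (5) can be sketched as follows, assuming PyTorch; here weight_decay stands in for the λ-weighted regularization term, model(x) abbreviates the full forward pass with the synchronous graphs and auxiliary features, the 100 iterations and 10-round patience/decay are treated as epoch counts, and the checkpoint file name is illustrative.

```python
import torch


def train(model, train_loader, val_loader, lam=1e-5, epochs=100, patience=10):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=lam)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)   # halve LR every 10
    mse = torch.nn.MSELoss()
    best_val, bad_rounds = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss = mse(model(x), y)
            loss.backward()
            opt.step()
        sched.step()
        model.eval()
        with torch.no_grad():
            val = sum(mse(model(x), y).item() for x, y in val_loader) / max(len(val_loader), 1)
        if val < best_val:
            best_val, bad_rounds = val, 0
            torch.save(model.state_dict(), "best_model.pt")
        else:
            bad_rounds += 1
            if bad_rounds >= patience:   # stop after 10 consecutive non-improving rounds
                break
```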
Step (6): evaluating model performance. The trained model performs single-step and multi-step prediction of the PM2.5 concentration on the test set, and the prediction performance is evaluated by visualization and quantitative metrics. Further, to verify the effectiveness of the individual components, ablation experiments are performed: each component is removed in turn before training, and the prediction results are compared with those of the complete model to illustrate the effectiveness of each component.
In step (6), two evaluation indices, the root mean squared error (Root Mean Squared Error, abbreviation: RMSE) and the mean absolute error (Mean Absolute Error, abbreviation: MAE), are adopted to verify the effectiveness of the model. The smaller the RMSE and MAE values, the more accurate the prediction model. The respective equations can be expressed as follows:
where y_t is the observed concentration of the target pollutant particles, the corresponding prediction is the model output value, τ is the number of time steps to be predicted, and t is the t-th time step.
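For reference, the two metrics can be computed with a few lines of NumPy; the array shapes are illustrative.

```python
import numpy as np


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))


# Usage: y_true and y_pred have shape (num_samples, tau, N) for multi-step prediction.
```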
Further, to verify the superiority of the proposed model, eight baseline models were chosen for comparison in short-term and long-term prediction, including statistical and classical deep learning methods as well as several well-known GCN-based methods capable of capturing spatial correlations. The prediction performance is compared on the Beijing data set and the Tianjin data set. The Beijing data set collects air pollution data and meteorological data of 34 air monitoring sites in the Beijing area from 2018/01/01 to 2021/01/01, with a time interval of 1 hour and a total of 26280 time steps; the meteorological data include five features. The Tianjin data set records air pollutant data and meteorological data of 27 monitoring sites in the Tianjin area from 2014/05/01 to 2015/05/01, with a time interval of 1 hour and a total of 8760 time steps; the meteorological data likewise include five features. The Beijing data set is divided into training, validation and test sets in the ratio 7:1:2, and the Tianjin data set in the ratio 8:1:1.
The standard baseline models include: the autoregressive integrated moving average model (ARIMA), the long short-term memory network (LSTM), the temporal convolutional network (TCN), the convolutional-neural-network-based LSTM (CNN-LSTM), the spatio-temporal graph convolutional network (STGCN), the attention-based spatio-temporal graph convolutional network (ASTGCN), the graph convolutional network for spatio-temporal graph modeling (Graph WaveNet), and the spatio-temporal synchronous graph convolutional network (STSGCN).
Further, to explore the advantages of the method in short-term prediction, single-step prediction experiments are performed on the two data sets. The comparison results are shown in Table 1: the MAE and RMSE of the proposed method are the smallest, indicating better prediction performance.
Table 1 comparison of the performance of different models in a single step prediction experiment
Further, to explore the advantages of the method in long-term prediction, multi-step prediction experiments are performed on the two data sets. The comparison results are shown in Table 2: the MAE and RMSE of the proposed method are the smallest, indicating better prediction performance.
Table 2 comparison of the performance of different models in multi-step prediction experiments
To better compare the spatial distribution of the prediction results, a set of geographical heat maps describing four specific, highly polluted time steps of observations and predictions on the Beijing data set is visualized, as shown in FIG. 5; the first four maps show the heat maps of the observed air pollution in the Beijing data set, and the last four maps show the heat maps of the air pollution predicted by the method of the present invention. These results demonstrate the effectiveness of the method of the present invention for region-level prediction.
Further, to fully evaluate the effectiveness of the proposed components of the method of the present invention, ablation experiments were performed on the Beijing data set. The variants include: the w/o MSTS-GCC variant, which replaces the MSTS-GCC blocks with an ordinary GCN; the w/o SGA variant, with the SGA blocks removed; the w/o TA variant, with the TA blocks removed; and the w/o EDs variant, with the EDA layers replaced by fully connected layers. As shown in FIG. 6, removing the MSTS-GCC greatly degrades the performance of the model, highlighting the importance of the multi-scale spatio-temporal representation. Furthermore, removing the SGA or TA blocks results in a slight decrease in prediction accuracy, which illustrates the effectiveness of incorporating the auxiliary features into the model. In addition, it is found that the EDA layer outperforms the fully connected layer in air quality prediction, demonstrating the effectiveness of tensor-based fusion in improving prediction accuracy. The ablation results show that each sub-component of the model improves the prediction performance.
In summary, the graph construction method provided by the method of the present invention combines the construction modes of the dynamic graph and the synchronous graph and effectively encodes the MSTCs. The MSTS-GCC captures multi-scale spatio-temporal representations from dynamic synchronous graphs of different scales. The AFE embeds a variety of auxiliary features to assist model training. The encoder-decoder architecture attends to the influence of the multi-scale spatio-temporal representation and the auxiliary features using the synchronous graph attention mechanism and the temporal attention mechanism, respectively. By fusing both, the decoder layers can iteratively generate multi-step predictions for the regional sites. Experiments on two real data sets verify the effectiveness of the method of the present invention, which has superior prediction performance compared with existing methods. The method can be used to assist the decision-making and management of urban air quality prediction systems, ultimately helping to control air pollution.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art according to the technical solution of the present invention and its inventive concept, within the scope disclosed by the present invention, shall be covered by the scope of protection of the present invention.

Claims (9)

1. A regional air quality prediction model based on a multi-scale dynamic synchronous graph mechanism, characterized by comprising the following steps:
Step (1): data collection. Historical air pollution data of N regional sites of the city to be studied are collected, and the sites in the target city region are represented as a set. Observations of air pollutant particles and meteorological factor data are collected continuously at fixed intervals; the air pollutant data comprise sulfur dioxide, nitrogen dioxide, ozone, carbon monoxide, fine particulate matter PM2.5 and PM10, and the meteorological factor data comprise wind speed, precipitation, temperature, dew point and air pressure. The points of interest around the city sites are also collected, i.e., the number of different points of interest within 5 km of each site.
Step (2): preprocessing the raw data. The collected data contain missing or dirty values; outliers are detected using the Laida criterion and removed; missing values and outliers are filled by linear interpolation; Z-score normalization is then applied to promote the convergence of the model; finally, time-step division and input-output definition are performed.
Step (3): generating the multi-scale dynamic synchronous graphs. A predefined graph carrying geographical prior knowledge is first generated from the longitude and latitude information of the sites. The accuracy of predicting the target air pollutant particle concentration is then further improved through the fluctuations of the node signals and the introduction of a spatial attention mechanism and a mask matrix. The spatial attention mechanism assigns weights along the node dimension, accurately capturing the dynamic spatial information to generate a node matrix. A dynamic graph adjacency matrix is constructed from the node matrix and multiplied by the mask matrix so that it follows the structure of the predefined graph, thereby capturing the dynamic spatio-temporal correlation; finally, the generated dynamic graph adjacency matrix is added to the predefined graph adjacency matrix to incorporate the geographical prior knowledge. The multi-scale dynamic synchronous graphs are then constructed: by connecting spatio-temporal neighbour nodes of adjacent time steps, different MSTCs are captured and dynamic synchronous graphs of different scales are generated. Finally, the adjacency matrices of the generated synchronous graphs of different scales are taken as part of the model input.
Step (4): model construction and input/output. The model includes a multi-scale spatio-temporal synchronous graph convolution component module (Multiscale dynamic synchronous graph convolution component, abbreviation: MSTS-GCC), an auxiliary feature embedding module (Auxiliary feature embedding, abbreviation: AFE), and an Encoder-Decoder module (abbreviation: EDs). The MSTS-GCC is composed of a series of spatio-temporal synchronous (STS) blocks, each of which consists of three spatio-temporal (ST) blocks corresponding to dynamic synchronous graphs of different scales; an ST block is built from stacked graph convolutional networks (Graph Convolutional Network, abbreviation: GCN) and a pooling operation, thereby extracting a multi-scale spatio-temporal representation and helping the model capture deeper spatial information. The AFE converts the auxiliary features that have a direct or indirect influence on the region (meteorological features, temporal features and points of interest) into embedding matrices for model training, making the prediction results of the model more accurate. The EDs module is made up of five sub-components: synchronous graph attention (Synchronous graph attention, abbreviation: SGA), temporal attention (Temporal attention, abbreviation: TA), encoder-decoder attention (Encoder-Decoder attention, abbreviation: EDA), a fusion layer and a feed-forward neural network (Feedforward Neural Network, abbreviation: FFN). The SGA is mainly responsible for dynamically assigning site weights between different time steps on the basis of the extracted multi-scale spatio-temporal representation. The TA weights the different time steps of the same station on the basis of the auxiliary feature embedding. The EDA is responsible for fusing and re-weighting the auxiliary features of the future time steps and the encoder output. In other words, the SGA concerns the spatial dimension, the TA concerns the temporal dimension, and the EDA handles the interface between the encoder and the decoder.
Step (5): calculating the loss function and training the model parameters. The loss function is the mean squared error with a regularization term.
2. The regional air quality prediction model based on the multi-scale dynamic synchronous graph mechanism according to claim 1, wherein the step (2) is specifically:
Step (2.1): outlier detection and elimination are performed on the data using the Laida criterion. For the independently obtained measurement data, the arithmetic mean μ and the residual errors v_i are calculated, together with the standard deviation σ; if the residual error v_i (1 ≤ i ≤ n) of a certain measured value x_i satisfies the following formula:
|v_i| > 3σ
then x_i is considered an abnormal value and should be rejected.
Step (2.2): the missing values of the collected air pollution data and meteorological factor data are filled in the time dimension by linear interpolation.
Step (2.3): normalization is performed using the Z-score method, normalizing the air pollution data and meteorological factor data along the time dimension; the formula is as follows:
x' = (x − μ) / σ
where x denotes the measured data, σ is the standard deviation of the measured data, μ is the mean value of the measured data, and x' is the normalized data.
Step (2.4): dividing the data set. The obtained air quality and meteorological data are divided into a training set, a test set and a validation set according to a specific ratio.
Step (2.5): dividing the time steps and defining the initial input and target output. The concentration of PM2.5, a representative pollutant among the air pollutants, is taken as the target pollutant. Let x_t denote the PM2.5 concentration observations of the sites within the target city region at time step t, where F is the number of air pollutant particle features. The observation history over a time window of T time steps is taken as input, denoted X = {x_1, x_2, ..., x_T}. The air pollutant particle concentrations of the regional sites V for the future τ time steps are defined as the sequence set Y = {y_{T+1}, y_{T+2}, ..., y_{T+τ}} and taken as the target output. The meteorological features may be defined as X_M, where F_M is the number of meteorological features, and the point-of-interest features may be defined as X_P, where F_P is the number of point-of-interest features.
3. The regional air quality prediction model based on the multi-scale dynamic synchronous graph mechanism according to claim 1, wherein the step (3) is specifically:
Step (3.1): following the principle that "the closer two sites are, the more geographically relevant they are", a predefined graph G_Pre = (V_Pre, E_Pre, A_Pre) is constructed using the geographical longitude and latitude of the sites, where V_Pre denotes the vertices of the graph, E_Pre denotes the edges of the graph, and A_Pre denotes the adjacency matrix of the graph. The weights in the adjacency matrix are generated by the following formula:
where dist(v_i, v_j) denotes the Euclidean distance between v_i and v_j, ψ² is the Gaussian kernel parameter controlling the distribution, and θ_s is the preset Euclidean distance threshold.
Step (3.2): stacked one-dimensional convolution, a spatial attention mechanism and metric learning are introduced to construct the dynamic graph.
Step (3.3): the generated graph adjacency matrix A is used to generate dynamic synchronous graph adjacency matrices of different scales. To capture the temporal correlation, A'_t is constructed by connecting every node to itself at adjacent time steps. A'_st is created by connecting all nodes with their respective neighbours in successive time steps to capture short cross-correlations. A'_lst is constructed by connecting all nodes with their 1-hop neighbours in successive time steps to capture long cross-correlations. The constructed graphs can implicitly carry spatial correlation due to their inherent nature. The new edge weights in the adjacency matrices of the dynamic synchronous graphs of the three scales are aggregated from the edge weights of the graph at the original time step; the weight calculation formula is as follows:
where v_i and v_j are nodes in the original graph, v_k denotes a node of the graph at another time step, the neighbourhood term denotes the neighbours of v_j in the original graph, the weight term denotes the edge weights of the original graph, and the count term denotes the number of nodes adjacent to v_j. After this operation, the adjacency matrices {A'_t, A'_st, A'_lst} of the multi-scale dynamic synchronous graphs are obtained as part of the subsequent model input.
4. The regional air quality prediction model based on the multi-scale dynamic synchronous graph mechanism according to claim 1, wherein the step (3.2) is specifically:
Step (3.2.1): firstly, the input node signal X is projected into a latent space through a fully connected network; the conversion formula is as follows:
where H_node denotes the projected node signal, and d_model denotes the dimension of the latent space, which is also set as the common dimension of the proposed model.
Step (3.2.2): the aggregation function AGG(·) is used to aggregate and reduce the time dimension of H_node; the process formula is as follows:
where d' denotes the dimension after aggregation and AGG(·) denotes the aggregation function; the reduction of the time dimension to 1 can be implemented with stacked one-dimensional convolution operations. For the result M_{i,f'} after each convolution operation, the specific aggregation formula is as follows:
where * denotes the convolution operation, H_{:,i,f'} denotes the time information of the f'-th feature of the i-th input node, M_{i,f'} denotes the f'-th channel of the output, and W_{f',f} is the trainable parameter of the model from the f'-th channel to the f-th channel.
Step (3.2.3): a spatial attention mechanism is adopted to dynamically assign different weights to each node; the weight assignment formula is as follows:
where LeakyReLU(·) is the activation function applying a nonlinear transformation to the data, FC(·) denotes a fully connected layer with an activation function, the scaling factor is the corresponding dimension, ζ_i denotes the representation of node v_i, and ⟨·,·⟩ is the inner product operation. Notably, the attention mechanism operates along the node dimension; the resulting term denotes the computed spatial correlation between nodes v_i and v_j, and the normalized term denotes the spatial attention score after the softmax operation.
Step (3.2.4): further, a metric learning method is adopted: the correlation between nodes is generated by learning a metric function φ(·) over the paired node representations. The specific formula is as follows:
where ΔA_{i,j} denotes the learned dynamic spatial correlation between node v_i and node v_j. Next, the generated ΔA is multiplied by the mask matrix so that the generated graph adjacency matrix resembles the adjacency matrix of the predefined graph, and the adjacency matrix of the dynamic graph is then generated by normalization; the specific procedure is as follows:
ΔA = Norm(ΔA ⊙ mask)
where mask is a mask matrix whose entries are 1 at the positions corresponding to the edges of the predefined graph adjacency matrix and 0 elsewhere, ⊙ denotes the element-wise multiplication of entries at the same positions of the matrices, and Norm(·) denotes the normalization operation.
Step (3.2.5): the generated predefined graph adjacency matrix A_Pre and the dynamic graph adjacency matrix are added, a nonlinear transformation is applied through an activation function, and finally normalization is performed to obtain the adjacency matrix A of the newly constructed graph; the specific process formula is as follows:
A = Norm(LeakyReLU(A_Pre + ΔA))
where LeakyReLU(·) is the activation function.
5. The regional air quality prediction model based on the multi-scale dynamic synchronous graph mechanism according to claim 1, wherein the step (4) is specifically:
Step (4.1): the multi-scale spatio-temporal representation is extracted using the MSTS-GCC. The MSTS-GCC module consists of a plurality of parallel STS blocks, and each STS block consists of three parallel ST blocks corresponding to the generated dynamic spatio-temporal graphs of three scales. An ST block consists of stacked GCN layers whose outputs are finally aggregated by an aggregation layer. An STS block captures the MSTCs of a single time step through its ST blocks of different scales, and the MSTS-GCC captures the MSTCs of the entire time window through multiple parallel STS blocks.
Step (4.2): the auxiliary features are converted into embedding matrices by the AFE module for model training.
Step (4.3): the E_SP extracted in step (4.1) and the E_AU extracted in step (4.2) are input to the EDs module to obtain the final output of the model.
6. The regional air quality prediction model based on the multi-scale dynamic synchronous graph mechanism according to claim 1, wherein the step (4.1) is specifically:
Step (4.1.1): a single-scale spatio-temporal representation is obtained by the ST block. The ST block consists of stacked GCN layers whose outputs are finally aggregated by a pooling layer. The inputs of the GCN layer are the attribute signal sequence of the synchronous graph and the adjacency matrix A' of the single-scale synchronous graph constructed in step (3); the GCN layer allows each node to aggregate its features with those of neighbouring nodes at adjacent time steps. The formula of the GCN is shown below:
H^(k) = GCN(A', H^(k-1)) = GLU((A'H^(k-1)W_a + b_1) ⊙ sigmoid(A'H^(k-1)W_b + b_2))
where H^(k) denotes the output of the k-th GCN layer, H^(0) is the signal sequence fed to the first GCN layer, A' denotes the adjacency matrix of the constructed single-scale dynamic synchronous graph, W_a, W_b, b_1 and b_2 are trainable parameters, and GLU(·) and sigmoid(·) are activation functions.
Thus, deep spatio-temporal information {H^(0), H^(1), ..., H^(K)} can be obtained by stacking K GCN layers. Since the GCN takes the dynamic synchronous graph as part of its input, it aggregates features from the previous and next time steps, so the output of the GCN layers contains noise. The average pooling operation AvgPooling(·) is used to filter this noise: an element-wise averaging is applied to the outputs of all GCN layers in the ST block, compressing {H^(0), H^(1), ..., H^(K)} into H_agg:
H_agg = AvgPooling(H^(0), H^(1), ..., H^(K))
Further, a cropping operation is applied to H_agg to retain the information of the intermediate time step, generating the output H' of the ST block, i.e., the single-scale spatio-temporal representation.
Step (4.1.2): according to step (4.1.1), the STS block feeds the dynamic synchronous graphs {A'_t, A'_st, A'_lst} and the corresponding sequence signals into the ST blocks, extracting the spatio-temporal hidden dependencies of the three scales {H'_t, H'_st, H'_lst}, and then generates the multi-scale spatio-temporal representation of the corresponding time step using a linear layer and a fusion layer. The specific process is as follows:
where W_t, W_st, W_lst and b are trainable parameters of the fusion layer, tanh(·) and relu(·) are activation functions, Linear(·) denotes a linear layer, and STblock(·) denotes the ST block operation of step (4.1.1).
Step (4.1.3): according to step (4.1.2), the MSTS-GCC captures the MSTCs of the different time steps through a plurality of STS blocks and finally obtains the multi-scale spatio-temporal representation of the whole time window. The input to the MSTS-GCC is the projected node signal X_temp. The process is as follows:
Further, a padding operation is applied to X_temp to pad the time-step information, obtaining a padded sequence. The padded sequence is then divided with a sliding window of length 3, yielding the partition sequence X'_t for time step t. Feeding the partition sequence X'_t together with the dynamic graphs of the three scales {A'_t, A'_st, A'_lst} into the STS module gives the multi-scale spatio-temporal representation of the corresponding t-th time step. The process can be expressed as follows:
where STSblock(·) denotes the operation of the STS block of step (4.1.2).
Further, the multi-scale spatio-temporal representations of a time window of length T can be obtained according to the above steps. Finally, these representations are concatenated to obtain the multi-scale spatio-temporal representation E_SP of the whole time window:
where concat(·) denotes the concatenation operation.
7. The regional air quality prediction model based on the multi-scale dynamic synchronous graph mechanism according to claim 1, wherein the step (4.2) is specifically:
Step (4.2.1): embedding the time features. First, one-hot encoding is used to represent the hour of the day and the day of the week of each time step, creating two tensors. Subsequently, the two tensors are transformed to a common embedding shape. Then, the two transformed tensors are added element by element to obtain an embedding that captures the information of the past time steps. Similarly, the time features of the future time steps are embedded, resulting in a second embedding. Taken together, the time-feature embeddings of the past T and future τ time steps are expressed as the matrix E_Time.
Step (4.2.2): embedding the meteorological features. The meteorological features X_M are embedded using a two-layer fully connected neural network, as follows:
Step (4.2.3): embedding the point-of-interest features (Points of Interest, abbreviation: POIs) X_P. POI data represent, to some extent, a latent geographic characteristic: air pollution in industrial areas tends to be more severe than in areas dense with parks. A new graph G_POI = (V_SC, E_SC, A_SC) is constructed using the POIs, where V_SC denotes the observation sites and E_SC denotes the different POI categories, and the weight A_SC(i,j) denotes the number of buildings of POI category j within 5 km of observation site i. The Node2Vec method is then applied to the generated graph G_POI to perform graph embedding, yielding the final POI embedding E_POI.
Step (4.2.4): obtaining the auxiliary embedding representation. The POI embedding obtained in step (4.2.3) and the time-feature embedding obtained in step (4.2.1) are added with broadcasting to obtain the auxiliary embedding representation E_AU.
8. The regional air quality prediction model based on the multi-scale dynamic synchronous graph mechanism according to claim 1, wherein the step (4.3) is specifically:
Step (4.3.1): before entering the encoder, the input node signal X and the meteorological embedding E_M are projected, and the results are added to the position encoding to obtain the initial input Z^(0) of the encoder. The process is as follows:
Z^(0) = FC(X) + FC(E_M) + PE
where FC(·) denotes the fully connected layer used for the projective transformation, PE denotes the position-encoding tensor, and Z^(l) denotes the output of the l-th encoder layer and the input of the (l+1)-th encoder layer.
Step (4.3.2): the output Z^(l-1) of the (l-1)-th layer and the E_SP generated in step (4.1) are input to the SGA of the l-th layer, generating the output of the SGA of the l-th encoder layer. Specifically, the SGA module concatenates the hidden state in Z^(l-1) that represents node v_i at time step t_h with the corresponding sub multi-scale spatio-temporal representation extracted from the E_SP tensor, and then computes the spatial correlation between node v_i and node v using scaled dot-product attention. The specific formula is as follows:
where ‖ denotes the concatenation operation; the SGA employs a multi-head attention mechanism; the computed term denotes the correlation between node v_i and node v; and FC(·) denotes a fully connected layer followed by the LeakyReLU(·) activation function.
Further, the computed correlations are normalized by a softmax layer to obtain the attention scores, converted by the following formula:
where the SGA introduces a multi-head attention mechanism to stabilize the learning process, and the normalized term denotes the score of the n_h-th SGA attention head. The N_h parallel attention heads operate as follows:
Further, the SGA output of the l-th encoder layer can be obtained through the above steps.
Step (4.3.3): the output Z^(l-1) of the (l-1)-th layer and the E_AU generated in step (4.2) are input to the TA of the l-th layer, generating the output of the TA of the l-th encoder layer. Specifically, the TA module concatenates the hidden state with the corresponding sub auxiliary feature embedding extracted from the auxiliary embedding, and then computes the attention scores using a multi-head attention mechanism. After the attention scores are obtained, the attention of node v_i at time step t_j can be generated; the specific formula is as follows:
where the correlation term denotes the relevance between time steps t_j and t; the score of the n_h-th TA attention head indicates the importance of time step t_j to t; the scaling factor is the dimension of the corresponding representation; and the last term denotes the subset of time steps preceding t_j.
Further, the TA output of the l-th encoder layer can be obtained through the above steps.
Step (4.3.4): the gated fusion layer is used to fuse the outputs of the SGA and the TA of the l-th encoder layer. The conversion formula is shown below:
where the weight matrices and biases are trainable parameters, g is the generated gating tensor, and the fused result is the output of the fusion layer in the l-th encoder layer, which adaptively controls the spatial and temporal dependency of each node and time step.
Step (4.3.5): encoder input and output. The initial input Z^(0) of the encoder is fed into the encoder; following steps (4.3.2) to (4.3.4), the outputs of the 1st to L_en-th encoder layers are generated in turn, and the output of the L_en-th encoder layer is taken as the final output of the encoder.
Step (4.3.6): the final output of the encoder is taken as the input of the EDA module, which generates the initial input of the decoder. The EDA module extracts from the auxiliary embedding the future auxiliary feature embedding of node v_i at time step t_f as the query vector, where t_f denotes a future time step (t_f = t_{T+1}, t_{T+2}, ..., t_{T+τ}) and t denotes a time step in the set of historical time steps (t = t_1, t_2, ..., t_T). The EDA uses the hidden states of the nodes output by the encoder as the keys and values in the attention mechanism; the specific formula is as follows:
where the correlation term denotes the relevance between time steps t_j and t, the attention score is that of the n_h-th EDA head, and the scaling factor is the dimension of the corresponding representation.
Further, the initial input D^(0) of the decoder is obtained through the above procedure.
Step (4.3.7): decoder input and output. The initial input D^(0) of the decoder is fed into the decoder; following steps (4.3.2) to (4.3.4), the outputs of the 1st to L_de-th decoder layers are generated in turn, and the output of the L_de-th decoder layer is taken as the final output of the decoder.
Step (4.3.8): model output. The final output of the decoder is fed into a convolutional neural network and a fully connected neural network to generate the final output:
where Relu(·) denotes the activation function, FC(·) and Conv(·) denote the fully connected and convolutional neural networks, respectively, and the result denotes the final output of the model.
9. The regional air quality prediction model based on the multi-scale dynamic synchronous graph mechanism according to claim 1, wherein the step (5) is specifically:
In step (5), the loss function adopts the mean squared error (Mean Squared Error, abbreviation: MSE); the specific formula is as follows:
where θ denotes the trainable parameters of the model, the additional term is the regularization term, and λ is the regularization hyper-parameter.
In step (5), the training process uses the Adam optimizer to optimize the network, with an initial learning rate of 0.0001 and 100 iterations. An early-stopping strategy is adopted during training: when the error of the model on the validation set fails to improve on the previous best result for 10 consecutive rounds, training is stopped. A dynamic learning rate is also adopted: every 10 iterations the learning rate is updated to 0.5 times its previous value.
CN202311126538.2A 2023-09-04 2023-09-04 Regional air quality prediction model based on multi-scale dynamic synchronous diagram mechanism Pending CN117371571A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311126538.2A CN117371571A (en) 2023-09-04 2023-09-04 Regional air quality prediction model based on multi-scale dynamic synchronous diagram mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311126538.2A CN117371571A (en) 2023-09-04 2023-09-04 Regional air quality prediction model based on multi-scale dynamic synchronous diagram mechanism

Publications (1)

Publication Number Publication Date
CN117371571A true CN117371571A (en) 2024-01-09

Family

ID=89393634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311126538.2A Pending CN117371571A (en) 2023-09-04 2023-09-04 Regional air quality prediction model based on multi-scale dynamic synchronous diagram mechanism

Country Status (1)

Country Link
CN (1) CN117371571A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117633661A (en) * 2024-01-26 2024-03-01 西南交通大学 Slag car high-risk pollution source classification method based on evolution diagram self-supervised learning
CN117633661B (en) * 2024-01-26 2024-04-02 西南交通大学 Slag car high-risk pollution source classification method based on evolution diagram self-supervised learning


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination