CN114997506A - Atmospheric pollution propagation path prediction method based on link prediction - Google Patents

Atmospheric pollution propagation path prediction method based on link prediction

Info

Publication number
CN114997506A
Authority
CN
China
Prior art keywords
network
propagation path
atmospheric pollution
prediction
atmospheric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210690966.7A
Other languages
Chinese (zh)
Other versions
CN114997506B (en)
Inventor
李勇
吴京鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhou Qidu Digital Polymer Technology Co ltd
Northwest Normal University
Original Assignee
Lanzhou Qidu Digital Polymer Technology Co ltd
Northwest Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanzhou Qidu Digital Polymer Technology Co ltd, Northwest Normal University
Priority to CN202210690966.7A
Publication of CN114997506A
Application granted
Publication of CN114997506B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/20 Air quality improvement or preservation, e.g. vehicle emission control or emission reduction by using catalytic converters

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Operations Research (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an atmospheric pollution propagation path prediction method based on link prediction. The method calculates the amount of the atmospheric pollutant PM2.5 propagated between different monitoring stations based on transfer entropy; converts complex time series data into network data that is easy to model and analyze, focusing on the pollutant propagation relationships between monitoring stations; represents network nodes as vectors through network representation learning, and obtains low-noise node vector representations based on an attention mechanism and network neighbor aggregation; and obtains the vector representation of each edge in the network through a Hadamard product, converting the atmospheric pollutant propagation path prediction problem into a binary classification problem. The method effectively addresses the difficulty of collecting atmospheric pollution propagation path data, the lag in prediction results, and the difficulty of transferring the model, and shows high adaptability and stability in predicting atmospheric pollution propagation paths.

Description

Atmospheric pollution propagation path prediction method based on link prediction
Technical Field
The invention relates to a method for constructing a network from time series data based on transfer entropy and to a link prediction method based on network representation learning. The method has important application and popularization value in predicting the propagation path of the atmospheric pollutant PM2.5.
Background
With the progress of industry and of science and technology, atmospheric pollution has become an important factor affecting human health and daily life, and accurately predicting the propagation path of pollutants is an important means of preventing it. Existing atmospheric pollution propagation path prediction models are mainly mathematical models based on probability theory and image models based on deep learning. Building a mathematical model often relies on a large amount of prior data, such as building density, population density, pedestrian volume and road width. Collecting these data consumes a large amount of manpower and material resources, the model is difficult to solve, and the accuracy of the model is easily degraded when urban infrastructure is rebuilt. Building an image model usually relies on real-time photographs of a certain area of a city or on high-altitude images taken by satellite. When the atmosphere changes sharply, the perception range of the image model is limited, and different images lack correlation with one another, so the prediction results of the image model lag noticeably. For example, serious pollution 3 km from the current position cannot be perceived in time from an image taken at the current position.
Link prediction is a common data analysis method in network science that aims to infer the probability that an edge exists between any two nodes. Establishing the PM2.5 propagation path prediction model from the perspective of network science better describes how the pollutant is transferred between different positions in a city, reducing the dependence on prior data while improving the universality of the model.
Disclosure of Invention
To overcome the defects of the prior art, the technical scheme of the invention assumes that the collected PM2.5 time series data matrix is X, with N rows, one per monitoring station, and M columns of PM2.5 concentration values per station. The atmospheric pollutant PM2.5 propagation network constructed from the transfer entropy and the matrix X is G = (V, E), where V represents the node set of the network, whose nodes are monitoring stations, and E represents the edge set of the network, whose edges represent the propagation relationship of PM2.5 between monitoring stations. Random walks from given initial nodes in the network yield a vector representation matrix H that captures the local topology of each node. H is then optimized with the node neighbor aggregation technique of graph neural networks to obtain the final node vector representation matrix H'. Existing edges and non-existent edges are randomly selected from the network to construct a sample space, in which the vector of an edge is the Hadamard product of the representation vectors of its two end nodes. A logistic regression classifier then produces the overall prediction output of the model, namely the probability matrix Y of PM2.5 propagation between monitoring stations.
The invention mainly comprises five parts: (1) For the PM2.5 time series data matrix X, K time windows are selected along the time axis, and N(N-1) groups of data are calculated row against row, each group consisting of K transfer entropy values. (2) With the monitoring stations as nodes, the mean and standard deviation of each group of K transfer entropy values are computed, edges are determined from the mean and standard deviation, and the atmospheric pollutant propagation network G = (V, E) is constructed. (3) Initial nodes are given in turn and random walks are performed in the network to obtain the matrix H, a vectorized representation of the local topology of each node. (4) m important nodes are selected from the node set V, the similarity matrix Sim between the important nodes and all nodes is calculated, the attention coefficient matrix A is obtained from Sim, and the final node vector representation matrix H' is obtained through the attention coefficients and the neighbor aggregation process of the graph neural network. (5) A training set and a test set of edges are constructed, the Hadamard product of the two node vectors is calculated to obtain the corresponding edge vector, and the binary classification problem is solved with a logistic regression classifier. The five parts are described in turn below:
1. For the PM2.5 time series data matrix X, K time windows are selected along the time axis, and N(N-1) groups of data are calculated row against row, each group consisting of K transfer entropy values. The magnitude of each transfer entropy represents the amount of PM2.5 propagated from one monitoring station to another within the specified time window.
2. With the monitoring stations as nodes, the mean and standard deviation of each group of K transfer entropy values are computed. The mean measures the amount of pollutant propagation: the larger the mean, the larger the amount propagated between the two monitoring stations. The standard deviation measures the stability of the propagation relationship: the smaller the standard deviation, the more stable the propagation relationship between the two monitoring stations. Node pairs with a large mean and a small standard deviation are connected by an edge, constructing the atmospheric pollutant PM2.5 propagation network G = (V, E).
3. Initial nodes are given in turn and random walks are performed in the PM2.5 propagation network G to obtain N random walk paths. The paths are vectorized with a word vectorization method to obtain the feature vector matrix H representing local network structure.
4. m important nodes are selected from the node set V of the network G and their similarity to every node is calculated, giving the similarity matrix Sim. The attention coefficient matrix A is obtained as the product of Sim and an amplification factor matrix L, and row normalization maps the elements of A to the interval [0,1]. Combined with the node neighbor aggregation technique of graph neural networks, the elements of H are optimized and adjusted, finally giving the node vector representation matrix H' for link prediction.
5. All elements of the edge set of the network G are added to the sample space as positive samples, and an equal number of negative samples are selected by negative sampling and added to the sample space. A training set and a test set are randomly drawn from the sample space, the vector of each edge is calculated as the Hadamard product of the vectors of its two nodes, and the binary classification problem is solved with a logistic regression classifier.
The method for predicting the atmospheric pollution propagation path based on the link prediction (xx) comprises the following steps:
Step 1: Calculate the transfer entropy through steps 1.1, 1.2 and 1.3, describe the amount of PM2.5 transferred between different monitoring stations by the transfer entropy, and then go to step 2. The matrix X represents the PM2.5 time series data matrix. It has N rows, each representing a monitoring station, and M columns, each representing an acquisition time; each element represents the PM2.5 concentration value collected by a monitoring station at an acquisition time. Let $X_i$ denote the row vector formed by the PM2.5 concentration values collected by monitoring station i at the M acquisition times. Steps 1.1, 1.2 and 1.3 are described in detail below:
Step 1.1: Data acquisition and cleaning. Air quality monitoring stations densely distributed across the city record the PM2.5 concentration of their area at 1-hour intervals. Missing values that could not be recorded because of equipment faults are filled with the mean of the PM2.5 concentrations of the previous hour and the next hour, giving the PM2.5 time series data matrix X. Then go to step 1.2.
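The gap-filling rule of step 1.1 can be sketched in a few lines. This is a minimal illustration, assuming missing readings are encoded as NaN and that only isolated gaps with both neighbours present are filled; the function name `fill_missing_hourly` is hypothetical and the text does not say how longer gaps are handled.

```python
import numpy as np

def fill_missing_hourly(row):
    """Fill NaN gaps with the mean of the previous and next hourly readings."""
    filled = np.asarray(row, dtype=float).copy()
    for t in np.flatnonzero(np.isnan(filled)):
        # Assumption: only interior gaps with both neighbours recorded are filled.
        if 0 < t < len(filled) - 1:
            prev_val, next_val = filled[t - 1], filled[t + 1]
            if not np.isnan(prev_val) and not np.isnan(next_val):
                filled[t] = (prev_val + next_val) / 2.0
    return filled

readings = [35.0, np.nan, 41.0, 44.0]   # one station, four hourly PM2.5 values
cleaned = fill_missing_hourly(readings)
```

Applying the function to every row of the raw data would give a matrix X in the sense of step 1.1; runs of consecutive missing hours are left untouched in this sketch.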
Step 1.2: For the PM2.5 time series data matrix X, take the row vector $X_1$ of the first row and the row vectors $X_2, X_3, \dots, X_i$ of the remaining rows. Denote $X_1$ as Y and any $X_i$ as X. By the formula

$$T_{Y \to X} = \sum p\left(x_{n+1}, x_n^{(k)}, y_n^{(l)}\right) \log \frac{p\left(x_{n+1} \mid x_n^{(k)}, y_n^{(l)}\right)}{p\left(x_{n+1} \mid x_n^{(k)}\right)}$$

the transfer entropy from $X_1$ to $X_i$ can be calculated. The subscript n of x and y indexes the dimension of the row vector, and the superscripts k and l of x and y represent the time window size specified for calculating the transfer entropy. Analyzing the PM2.5 propagation amount within 4 hours meets practical needs, so in the four cases k = l = 1, 2, 3 and 4 any two monitoring stations yield 4 transfer entropies, which are treated as one group. Then go to step 1.3.
Step 1.3: Repeat step 1.2 to calculate the transfer entropy between every row vector of the PM2.5 time series data matrix X and every other row vector, giving N(N-1) groups of transfer entropies. Each group represents the amount of PM2.5 transferred between two monitoring stations within 1, 2, 3 and 4 hours respectively. Then go to step 2.
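To make steps 1.2 and 1.3 concrete, the following sketch estimates transfer entropy with a simple plug-in (histogram) estimator. It is an illustration under assumptions, not the patented procedure: the text does not say how the probabilities are estimated, so the equal-width binning, the `bins` parameter and the function name `transfer_entropy` are all hypothetical choices.

```python
import numpy as np
from collections import Counter
from math import log2

def transfer_entropy(source, target, k=1, bins=4):
    """Plug-in estimate of transfer entropy source -> target, history length k = l."""
    def discretise(x):
        # Assumption: equal-width bins; the source does not specify the estimator.
        x = np.asarray(x, dtype=float)
        edges = np.linspace(x.min(), x.max(), bins + 1)[1:-1]
        return np.digitize(x, edges)

    s, t = discretise(source), discretise(target)
    joint, both_pasts, next_and_past, past = Counter(), Counter(), Counter(), Counter()
    n_samples = len(t) - k
    for i in range(k, len(t)):
        t_past, s_past, t_next = tuple(t[i - k:i]), tuple(s[i - k:i]), t[i]
        joint[(t_next, t_past, s_past)] += 1
        both_pasts[(t_past, s_past)] += 1
        next_and_past[(t_next, t_past)] += 1
        past[t_past] += 1

    te = 0.0
    for (t_next, t_past, s_past), count in joint.items():
        p_joint = count / n_samples
        p_given_both = count / both_pasts[(t_past, s_past)]
        p_given_own = next_and_past[(t_next, t_past)] / past[t_past]
        te += p_joint * log2(p_given_both / p_given_own)
    return te

# Toy check: the target copies the source with a one-step delay.
rng = np.random.default_rng(0)
source = rng.integers(0, 2, size=500).astype(float)
target = np.roll(source, 1)
te_forward = transfer_entropy(source, target, k=1, bins=2)
te_backward = transfer_entropy(target, source, k=1, bins=2)
# One group in the sense of step 1.2: window sizes 1 to 4.
group = [transfer_entropy(source, target, k=k, bins=2) for k in (1, 2, 3, 4)]
```

In the toy data the source drives the target, so the forward estimate is large and the backward estimate is close to zero. Looping over all ordered station pairs would give the N(N-1) groups of step 1.3.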
Step 2: Construct the atmospheric pollutant propagation network through steps 2.1, 2.2 and 2.3, and then go to step 3. Let G = (V, E) denote the atmospheric pollutant propagation network, where V represents the node set of the network and E represents the edge set of the network. Let $T^{(1)}_{i \to j}$ denote the transfer entropy from monitoring station i to monitoring station j calculated with a time window of 1 hour. Steps 2.1, 2.2 and 2.3 are described in detail below:
Step 2.1: For any two monitoring stations i and j, 4 transfer entropies $T^{(1)}_{i \to j}, T^{(2)}_{i \to j}, T^{(3)}_{i \to j}, T^{(4)}_{i \to j}$ can be obtained, one for each time window of 1, 2, 3 and 4 hours. Using the formula

$$\mu_{ij} = \frac{1}{4} \sum_{k=1}^{4} T^{(k)}_{i \to j}$$

the average value of the group can be calculated, and the standard deviation $\sigma_{ij}$ of the group is calculated from the deviations of the same four values from $\mu_{ij}$. Applying this step to the N(N-1) groups of transfer entropies gives N(N-1) means and standard deviations. Then go to step 2.2.
Step 2.2: With the mean as the x axis and the standard deviation as the y axis, construct a planar rectangular coordinate system xOy and plot the N(N-1) means and standard deviations calculated in step 2.1 in xOy. Calculate the average of all the means,

$$AVG = \frac{1}{N(N-1)} \sum_{i \neq j} \mu_{ij}$$

and the average of all the standard deviations,

$$STDEV = \frac{1}{N(N-1)} \sum_{i \neq j} \sigma_{ij}$$

where $\mu_{ij}$ and $\sigma_{ij}$ are the mean and standard deviation of the group of transfer entropies from station i to station j. In xOy, draw the line y = STDEV parallel to the x axis and the line x = AVG parallel to the y axis, dividing xOy into four regions. Then go to step 2.3.
Step 2.3: For the points in the lower right region of xOy, the mean is relatively large and the standard deviation is relatively small within the sample space. The larger the mean of the transfer entropy, the larger the amount of PM2.5 propagated; the smaller the standard deviation of the transfer entropy, the more stable the propagation relationship between the two stations. Therefore, all monitoring stations associated with points in the lower right region of xOy are added to the node set V of the network G, and an edge is constructed between the two monitoring stations forming each such point and added to the edge set E of the network G. Then go to step 3.
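Steps 2.1 to 2.3 amount to a quadrant filter on (mean, standard deviation) pairs, which can be sketched as follows. The function name `build_propagation_edges` and the toy values are hypothetical, edges are kept as ordered station pairs, and the population standard deviation (NumPy's default) is an assumption because the text does not state the divisor.

```python
import numpy as np

def build_propagation_edges(te_groups):
    """te_groups maps an ordered station pair (i, j) to its group of transfer entropies."""
    pairs = list(te_groups)
    means = np.array([np.mean(te_groups[pair]) for pair in pairs])
    stds = np.array([np.std(te_groups[pair]) for pair in pairs])   # population std (assumption)
    avg, stdev = means.mean(), stds.mean()                         # AVG and STDEV of step 2.2
    # Lower right region of xOy: mean above AVG and standard deviation below STDEV.
    keep = (means > avg) & (stds < stdev)
    edges = [pair for pair, selected in zip(pairs, keep) if selected]
    nodes = sorted({station for edge in edges for station in edge})
    return nodes, edges

# Toy groups for three stations (made-up numbers, windows of 1 to 4 hours).
te_groups = {
    (0, 1): [0.9, 1.0, 1.1, 1.0],   # large and stable: becomes an edge
    (1, 0): [0.1, 0.1, 0.2, 0.1],   # stable but small
    (0, 2): [0.2, 1.8, 0.1, 1.9],   # large on average but unstable
    (2, 0): [0.1, 0.2, 0.1, 0.2],   # stable but small
}
nodes, edges = build_propagation_edges(te_groups)
```

Only the pair whose transfer entropy is both large and stable survives the filter, which matches the selection rule of step 2.3.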
Step 3: For the atmospheric pollutant PM2.5 propagation network G = (V, E) obtained in step 2, a feature vector matrix H representing the local network structure of each node can be obtained by random walk and vectorization. H has N rows, where N is the number of monitoring stations, i.e. the number of elements in V, and dim columns, where dim is the output vector dimension of the vectorization technique. Step 3 is described in detail below:
Step 3.1: For the atmospheric pollutant propagation network G = (V, E), given a node $v_i \in V$, randomly select a node $v_j$ from the first-order neighbor nodes of $v_i$ and continue the random walk from it. Given a random walk step length k, this process yields the node access sequence of the walk that starts at $v_i$. Then go to step 3.2.
Step 3.2: For the process described in step 3.1, the Node2vec technique introduces a depth random walk parameter d and a breadth random walk parameter b, calculates the probability of each node that the random walk may visit next, and walks according to these probabilities, so that the resulting node access sequence is controllable to a certain degree. The node access sequence can then be expressed as a vector of floating-point numbers with values in the range [0,1]. This vector expresses the local topology of node $v_i$ in the network G; its dimension can be specified by the dim parameter, which is usually 128. The matrix formed by all node vectors is denoted H. Then go to step 4.
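The walk generation of step 3.1 can be sketched as below. This is a simplified illustration: it uses a uniform random walk on an undirected view of the network, whereas Node2vec biases each step with its depth and breadth parameters and then feeds the walks to a word-embedding model; neither of those parts is reproduced here. The names `random_walk` and `generate_walks` are hypothetical.

```python
import random

def random_walk(adjacency, start, walk_length, rng):
    """Uniform random walk; stops early only if a node has no neighbours."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = adjacency.get(walk[-1], [])
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk

def generate_walks(edges, walk_length, seed=0):
    """Start one walk from every node, giving one node access sequence per node."""
    rng = random.Random(seed)
    adjacency = {}
    for u, v in edges:
        # Assumption: edges are treated as undirected for walking.
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return {node: random_walk(adjacency, node, walk_length, rng)
            for node in sorted(adjacency)}

toy_edges = [(0, 1), (1, 2), (2, 3)]   # a small path graph
walks = generate_walks(toy_edges, walk_length=5)
```

Each walk plays the role of a sentence whose words are node identifiers, which is why a word vectorization method can turn the walks into the rows of H.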
Step 4: Considering the adverse effects of the data noise generated in the random walk process of the Node2vec technique and of its unsupervised training mode, the node vector matrix H obtained in step 3 is optimized and adjusted by introducing a graph attention mechanism and a node neighbor aggregation technique. The node vector matrix H' finally used for link prediction is obtained through steps 4.1, 4.2 and 4.3, which are described in detail below:
Step 4.1: Select the m important nodes with the largest degree from the atmospheric pollution propagation network graph G, and calculate the similarity matrix $Sim_{n \times m} = \mathrm{Similarity}(W H_{n \times dim}, W H_{m \times dim})$, where Similarity is the cosine similarity function, W is a parameter matrix to be learned by deep learning, $H_{n \times dim}$ is the feature vector matrix of all nodes, and $H_{m \times dim}$ is the feature vector matrix of the m important nodes. Then go to step 4.2.
Step 4.2: Calculate the attention coefficient matrix $A_{n \times n} = Sim_{n \times m} L_{m \times n}$, where $Sim_{n \times m}$ is the similarity matrix, $L_{m \times n}$ is a parameter matrix to be learned by deep learning, n is the number of nodes in the network graph G, and m is the number of selected important nodes. Each element $a_{ij} \in A_{n \times n}$ is then normalized by row so that its value falls in the interval [0,1]. Then go to step 4.3.
Step 4.3: Compute the neighbor aggregation over the stacked hidden layers (in matrix form, H' = σ(AWH)), where K represents the number of stacked hidden layers, $N_i$ represents the set of neighbor nodes of node i, the attention coefficient represents the attention strength of node i to node j in the k-th hidden layer, $W^{k}$ is the parameter matrix to be learned for the k-th hidden layer, and the rows of H are the feature vectors of the nodes j. This gives the node vector representation matrix H' for link prediction. Then go to step 5.
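A single attention-and-aggregation layer in the spirit of steps 4.1 to 4.3 might look like the sketch below. Several choices are assumptions rather than details from the text: the parameter matrices W and L are random stand-ins for learned values, nodes are stored as rows so the product is written `H @ W`, the Softmax is taken over each node's neighbours plus the node itself, and LeakyReLU with slope 0.2 is used as the activation. The function names are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

def attention_aggregate(H, adjacency, W, L, m):
    """Return (H_prime, A) for one aggregation layer."""
    n = H.shape[0]
    degree = adjacency.sum(axis=1)
    important = np.argsort(-degree)[:m]                 # the m highest-degree nodes
    projected = H @ W                                   # plays the role of WH, nodes as rows
    sim = cosine_similarity(projected, projected[important])   # Sim, shape n x m
    scores = sim @ L                                    # A before normalization, n x n
    # Assumption: normalize over each node's neighbours plus the node itself.
    mask = (adjacency > 0) | np.eye(n, dtype=bool)
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    A = np.exp(scores)
    A = A / A.sum(axis=1, keepdims=True)                # row-wise Softmax, values in [0, 1]
    aggregated = A @ projected
    H_prime = np.where(aggregated > 0, aggregated, 0.2 * aggregated)   # LeakyReLU(0.2)
    return H_prime, A

rng = np.random.default_rng(0)
n, dim, out_dim, m = 5, 8, 4, 2
adjacency = np.zeros((n, n))
for i in range(n):
    adjacency[i, (i + 1) % n] = adjacency[(i + 1) % n, i] = 1.0   # toy ring graph
H = rng.normal(size=(n, dim))
W = rng.normal(size=(dim, out_dim))   # random stand-in for a learned matrix
L = rng.normal(size=(m, n))           # random stand-in for a learned matrix
H_prime, A = attention_aggregate(H, adjacency, W, L, m)
```

In a trained model W and L would be updated by backpropagation together with the classifier, and several such layers would be stacked.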
Step 5: After the last hidden layer that produces the node vector representation matrix H' for link prediction calculated in step 4, add an output layer containing two neuron nodes, i.e. a logistic regression classifier, which outputs the probability Y that an edge exists. Steps 5.1, 5.2 and 5.3 are described in detail below:
Step 5.1: For the atmospheric pollutant PM2.5 propagation network G = (V, E) obtained in step 2, add all elements of its edge set E to the sample space as positive samples. Using negative sampling, randomly select node pairs that are not in the edge set E, equal in number to the positive samples, and add them to the sample space as negative samples. Shuffle the sample space. For the edge $E_{ij}$ formed by any nodes i and j, the corresponding feature vector can be represented as

$$h'_i \odot h'_j$$

where $h'_i$ and $h'_j$ are the rows of H' for nodes i and j, and $\odot$ denotes the Hadamard product of vectors. Then go to step 5.2.
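The sample construction of step 5.1 can be sketched as follows. The helper name `build_edge_samples` is hypothetical, node pairs are treated as unordered (an assumption, since the Hadamard product is symmetric), and the sketch assumes the network is sparse enough that as many non-edges as edges exist.

```python
import random
import numpy as np

def build_edge_samples(edges, num_nodes, node_vectors, seed=0):
    """Return shuffled (pair, label) samples with Hadamard-product features."""
    rng = random.Random(seed)
    positives = sorted({tuple(sorted(edge)) for edge in edges})
    positive_set = set(positives)
    negatives = set()
    # Negative sampling: draw random non-edges until the classes are balanced.
    while len(negatives) < len(positives):
        pair = tuple(sorted((rng.randrange(num_nodes), rng.randrange(num_nodes))))
        if pair[0] != pair[1] and pair not in positive_set:
            negatives.add(pair)
    samples = [(pair, 1) for pair in positives] + [(pair, 0) for pair in sorted(negatives)]
    rng.shuffle(samples)
    # Edge vector = element-wise (Hadamard) product of the two node vectors.
    features = np.array([node_vectors[i] * node_vectors[j] for (i, j), _ in samples])
    labels = np.array([label for _, label in samples])
    return samples, features, labels

node_vectors = np.arange(10, dtype=float).reshape(5, 2)   # toy stand-in for H'
samples, features, labels = build_edge_samples([(0, 1), (1, 2), (2, 3)], 5, node_vectors)
```

The resulting `features` and `labels` arrays can then be split at random into a training set and a test set.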
Step 5.2: After the hidden layers described in step 4.3, add an output layer containing two neuron nodes, which outputs the probability that an edge exists. Softmax is adopted as the output activation function, the loss function is the binary cross-entropy loss, the optimizer is Adam, and LeakyReLU with a parameter of 0.2 is selected as the second activation function. The input layer has 128 neuron nodes, the neighbor aggregation layer has 64 neuron nodes, the neighbor aggregation layer is stacked into 32 layers, the learning rate is 0.001, the dropout parameter is 0.4, and the number of training epochs is 100. Then go to step 5.3.
Step 5.3: After the training in step 5.2 is completed, select any two air quality monitoring stations i and j and use the PM2.5 time series data $X_i$ and $X_j$ they collected as model input to obtain the probability $Y_{ij}$ that a PM2.5 propagation relationship exists between air quality monitoring stations i and j.
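The classification stage can be illustrated with a small logistic regression trained on edge vectors. This sketch swaps in simpler parts than the text describes: a single sigmoid output (which for two classes gives the same probabilities as a two-neuron Softmax), plain gradient descent instead of Adam, and no joint training with the aggregation layers. Function names and toy data are hypothetical, and the prediction helper starts from node vectors rather than from raw time series.

```python
import numpy as np

def train_logistic_regression(features, labels, learning_rate=0.1, epochs=100):
    """Minimize binary cross-entropy with batch gradient descent."""
    n_samples, n_features = features.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
        error = probs - labels                     # gradient of the loss w.r.t. the logits
        weights -= learning_rate * features.T @ error / n_samples
        bias -= learning_rate * error.mean()
    return weights, bias

def predict_edge_probability(h_i, h_j, weights, bias):
    """Probability that an edge exists between two nodes, from their vectors."""
    edge_vector = h_i * h_j                        # Hadamard product
    return 1.0 / (1.0 + np.exp(-(edge_vector @ weights + bias)))

# Toy, linearly separable edge vectors (made-up numbers).
features = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -2.0], [-1.5, -2.5]])
labels = np.array([1.0, 1.0, 0.0, 0.0])
weights, bias = train_logistic_regression(features, labels)
p_edge = predict_edge_probability(np.array([1.0, 1.0]), np.array([2.0, 2.0]), weights, bias)
p_no_edge = predict_edge_probability(np.array([1.0, 1.0]), np.array([-2.0, -2.0]), weights, bias)
```

Evaluating `predict_edge_probability` for every station pair would fill a probability matrix analogous to Y.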
Compared with the prior art, the invention has the following advantages:
The invention provides an atmospheric pollution propagation path prediction method based on link prediction with the following characteristics. First, the method quantifies the amount of pollutant propagated between different monitoring stations through transfer entropy, which supports more precise prevention and control of pollutant diffusion and propagation. Second, modeling from the perspective of network science focuses the method on the pollutant propagation relationships between monitoring stations. Third, the method does not depend on manual data collection: with existing air quality monitoring stations, data for model input can be collected automatically. Fourth, the method does not depend on image data, so it is more sensitive to atmospheric changes and has a wider field of perception. Fifth, the method does not depend on parameters such as the geographic structure or building density of a specific city, so the model can be conveniently migrated to and rebuilt in other cities and has higher universality.
Drawings
FIG. 1 is an example of a data fragment used in the present invention.
FIG. 2 is a flow chart of the construction of the atmospheric pollutant propagation relationships among air quality monitoring stations according to the present invention.
FIG. 3 is a two-dimensional spatial representation of the mean and standard deviation of the group of 4 transfer entropies of any two stations, constructed from Lanzhou air quality monitoring station data.
FIG. 4 is a flow chart of the atmospheric pollution propagation path prediction model based on link prediction according to the present invention.
FIG. 5 is a graph comparing the AUC index (unit: %) during training of the present invention and existing link prediction methods.
FIG. 6 is a graph comparing the AUC values and average relative errors (unit: %) of the present invention and existing link prediction methods over 10 repeated random experiments.
FIG. 7 is a graph comparing the Precision index of the present invention and existing link prediction methods under different L values.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The raw data relied on by the present invention is collected by 111 air quality monitoring stations densely distributed in Lanzhou; a data segment from one monitoring station is shown in FIG. 1. Each acquisition time corresponds to a PM2.5 concentration value, and the acquisition interval is 1 hour. Missing values that were not collected because of mechanical faults are replaced by the mean of the PM2.5 concentration values of the previous hour and the next hour.
The transfer entropy calculation process of the present invention is shown in FIG. 2, where $S_1, S_2, \dots, S_n$ represent n air quality monitoring micro-stations, $t_1, t_2, \dots, t_m$ represent m acquisition times, and each element of the raw data matrix represents a PM2.5 concentration value. Under the 4 conditions with time windows k = l = 1, 2, 3, 4, any two monitoring stations yield 4 transfer entropies, which form one group. Through step 1, this operation yields 111 × (111-1) groups of transfer entropies.
The network construction process of the invention is shown in FIG. 2. After the 111 × (111-1) groups of transfer entropies are obtained, one standard deviation and one mean are calculated for each group, and a planar rectangular coordinate system xOy is established with the mean as the horizontal axis and the standard deviation as the vertical axis; each point in xOy is associated with 2 monitoring stations. The average of the transfer entropy standard deviations of all groups is calculated as STDEV and the average of the transfer entropy means of all groups as AVG. The line y = STDEV parallel to the x axis and the line x = AVG parallel to the y axis divide xOy into 4 regions. A point in the lower right region indicates that the transfer entropy between the two monitoring stations forming the point has a larger mean and a smaller standard deviation within the whole sample space: the larger the mean of the transfer entropy, the larger the amount of pollutant propagated, and the smaller the standard deviation, the more stable the propagation relationship. The points in the lower right region are selected, the monitoring stations are taken as nodes, and edges are constructed between the corresponding monitoring stations to obtain the atmospheric pollutant PM2.5 propagation network. The two-dimensional spatial representation of the mean and standard deviation of the transfer entropy constructed from the data of the 111 monitoring stations in Lanzhou is shown in FIG. 3. Through step 2, this operation yields the atmospheric pollutant PM2.5 propagation network G = (V, E); step 3 then yields the vector matrix H representing the local topology of the nodes.
The flow chart of the atmospheric pollution propagation path prediction model based on link prediction is shown in FIG. 4. After the node feature vector matrix H is obtained in step 3, the vector representations of the m nodes with the largest degree are selected, the attention coefficient matrix A is obtained by combining the deep learning parameters W and L to be solved, Softmax normalization is applied to A over each node's neighbors, and the updated node feature vector representation matrix H' is finally obtained through H' = σ(AWH). Through step 4, this operation optimizes and adjusts the node feature vector matrix H to obtain H'. Step 5 then constructs the sample set and sample features and trains the classification structure jointly with the neighbor aggregation structure of step 4, giving the probability that an edge forms between any two nodes, i.e. the probability that a PM2.5 propagation relationship exists between any two monitoring stations.
FIGS. 5, 6 and 7 compare the present invention with other similar models under different evaluation indexes. The closer the AUC value is to 1, the better the prediction performance of the model, and the more slowly the Precision curve declines, the better the performance of the model. FIG. 5 shows that the model (FALP) not only has higher prediction performance but also converges faster. The comparison of AUC values and relative errors over 10 repeated random experiments in FIG. 6 shows that the model has the highest average AUC value and the lowest average relative error, indicating that it combines high prediction performance with better stability. FIG. 7 shows that the Precision curve of the model declines most slowly, indicating that the model has higher fault tolerance.
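For readers unfamiliar with the two evaluation indexes, the sketch below shows one common way to compute them in link prediction: AUC as the probability that a randomly chosen true edge scores higher than a randomly chosen non-edge, and Precision as the fraction of true edges among the L highest-scored candidates. The text does not give its exact definitions, so these formulas, the function names and the toy scores are assumptions.

```python
def auc_score(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def precision_at_l(scores, labels, top_l):
    """Fraction of true edges among the top_l highest-scored candidate pairs."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:top_l]) / top_l

# Toy predicted probabilities and ground-truth labels (made-up numbers).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = auc_score(scores, labels)
precision_top2 = precision_at_l(scores, labels, top_l=2)
```

Under this definition Precision usually falls as L grows, because lower-ranked candidates are more likely to be wrong; a curve that falls slowly means the model keeps true edges near the top of its ranking.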

Claims (7)

1. An atmospheric pollution propagation path prediction method based on link prediction, characterized in that: when the atmospheric pollution propagation path prediction model is constructed, a processing method from the network science perspective is adopted.
2. The atmospheric pollution propagation path prediction method based on link prediction as claimed in claim 1, characterized in that: based on the PM2.5 time series data vectors acquired by different monitoring stations, a time window is slid to calculate transfer entropy values under different window conditions, and the transfer entropy values are used to quantify the amount of PM2.5 transmitted between different monitoring stations.
3. The atmospheric pollution propagation path prediction method based on link prediction as claimed in claim 1, characterized in that: the transfer entropy mean and standard deviation are calculated from the transfer entropy values obtained under different time window conditions; a two-dimensional space representation of the PM2.5 propagation relations among monitoring stations is constructed with the transfer entropy mean as the horizontal axis and the transfer entropy standard deviation as the vertical axis; sample points with a larger transfer entropy mean and a smaller transfer entropy standard deviation within the whole sample space are determined; and the monitoring stations associated with these sample points are selected to construct connecting edges, thereby constructing an atmospheric pollutant propagation network.
4. The atmospheric pollution propagation path prediction method based on link prediction as claimed in claim 1, characterized in that: in the atmospheric pollutant propagation network, a random walk is started from a given initial node using a network representation learning technique to obtain a node access sequence, and a vector representation matrix of the nodes in the network is finally obtained through a vectorization technique.
5. The atmospheric pollution propagation path prediction method based on link prediction as claimed in claim 1, characterized in that: important nodes in the atmospheric pollutant propagation network are selected, their similarity to all other nodes is calculated, and an attention coefficient matrix is obtained in combination with the parameters to be learned in deep learning.
6. The atmospheric pollution propagation path prediction method based on link prediction as claimed in claim 1, characterized in that: the attention coefficients are normalized over the neighbors of each network node, and an optimized node vector representation matrix is obtained from the attention coefficient matrix in combination with the network node neighbor aggregation structure of graph deep learning.
7. The atmospheric pollution propagation path prediction method based on link prediction as claimed in claim 1, characterized in that: all connecting edges are used as positive samples, and negative samples are constructed through a negative sampling technique to obtain a sample space; the connecting-edge vectors in the sample space are represented through the node vector representation matrix and the Hadamard product; and a logistic regression structure is added after the last deep learning hidden layer, so that the atmospheric pollution propagation path prediction problem is converted into a link prediction problem from the network science perspective and solved through binary classification, effectively improving the performance of the atmospheric pollution propagation path prediction model.
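The sample construction and classification named in claim 7 can be sketched as follows. This is an illustration under stated assumptions, not the claimed model: node pairs are treated as undirected, the node vectors H are fixed inputs rather than the output of a deep network, and a stand-alone logistic regression trained by stochastic gradient descent replaces the logistic regression layer placed after the last hidden layer. All function names and hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def hadamard(u, v):
    """Element-wise (Hadamard) product of two node vectors."""
    return [a * b for a, b in zip(u, v)]

def sample_negative_edges(n_nodes, pos_edges, k, rng=None):
    """Draw k distinct node pairs that are not observed edges.

    Assumption: pairs are undirected, so (i, j) and (j, i) are the same pair.
    """
    rng = rng or random.Random()
    observed = {frozenset(e) for e in pos_edges}
    available = n_nodes * (n_nodes - 1) // 2 - len(observed)
    if k > available:
        raise ValueError("not enough unobserved node pairs")
    negatives = set()
    while len(negatives) < k:
        pair = frozenset(rng.sample(range(n_nodes), 2))
        if pair not in observed:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)

def build_samples(H, pos_edges, neg_edges):
    """Edge features are Hadamard products; labels are 1 (edge) or 0 (non-edge)."""
    edges = list(pos_edges) + list(neg_edges)
    X = [hadamard(H[i], H[j]) for i, j in edges]
    y = [1] * len(pos_edges) + [0] * len(neg_edges)
    return X, y

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            g = p - t  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def edge_probability(H, i, j, w, b):
    """Predicted probability that a connecting edge exists between i and j."""
    x = hadamard(H[i], H[j])
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
```

Because the Hadamard product is symmetric in its two arguments, this feature cannot distinguish the direction of propagation between two stations; a directed variant would need a different edge representation.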
CN202210690966.7A 2022-06-17 2022-06-17 Atmospheric pollution propagation path prediction method based on link prediction Active CN114997506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210690966.7A CN114997506B (en) 2022-06-17 2022-06-17 Atmospheric pollution propagation path prediction method based on link prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210690966.7A CN114997506B (en) 2022-06-17 2022-06-17 Atmospheric pollution propagation path prediction method based on link prediction

Publications (2)

Publication Number Publication Date
CN114997506A true CN114997506A (en) 2022-09-02
CN114997506B CN114997506B (en) 2024-05-14

Family

ID=83034854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210690966.7A Active CN114997506B (en) 2022-06-17 2022-06-17 Atmospheric pollution propagation path prediction method based on link prediction

Country Status (1)

Country Link
CN (1) CN114997506B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898838A (en) * 2018-08-03 2018-11-27 首都经济贸易大学 A kind of aerodrome traffic congestion prediction technique and device based on LSTM model
WO2019131486A1 (en) * 2017-12-25 2019-07-04 ローム株式会社 Signal processing device, wireless sensor network system, and signal processing method
CN110363350A (en) * 2019-07-15 2019-10-22 西华大学 A kind of regional air pollutant analysis method based on complex network
CN111275951A (en) * 2019-12-25 2020-06-12 ***通信集团江苏有限公司 Information processing method, device and equipment and computer storage medium
CN112066355A (en) * 2020-09-10 2020-12-11 河北工业大学 Self-adaptive adjusting method of waste heat boiler valve based on data driving
CN113222328A (en) * 2021-03-25 2021-08-06 中国科学技术大学先进技术研究院 Air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity
CN113516304A (en) * 2021-06-29 2021-10-19 上海师范大学 Space-time joint prediction method and device for regional pollutants based on space-time graph network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
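LI YONG et al.: Link prediction of attention flow network based on maximum entropy model, COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE, 1 April 2021 (2021-04-01), pages 123 - 136 *
WU JINGPENG (吴京鹏): Research on link prediction for node-featureless networks based on graph embedding representation, China Master's Theses Full-text Database, Basic Sciences, no. 2, 15 February 2023 (2023-02-15), pages 002 - 362 *
LI YONG (李勇) et al.: Link prediction algorithm for node-featureless networks incorporating a fast attention mechanism, Computer Science (计算机科学), vol. 49, no. 4, 2 April 2022 (2022-04-02), pages 43 - 48 *
LIANG TAO (梁涛); XIE GAOFENG (谢高锋); MI DABIN (米大斌); JIANG WEN (姜文): PM10 concentration prediction based on CEEMDAN-SE and LSTM neural network, Environmental Engineering (环境工程), vol. 38, no. 02, 15 February 2020 (2020-02-15), pages 107 - 113 *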

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114993336A (en) * 2022-07-18 2022-09-02 山东建筑大学 Commuting path optimization method and system based on PM2.5 pollutant exposure risk
CN114993336B (en) * 2022-07-18 2022-10-25 山东建筑大学 Commuting path optimization method and system based on PM2.5 pollutant exposure risk
BE1029906B1 (en) * 2023-02-08 2024-03-05 Nanchang Inst Tech A classification method for labeling sample sets of PM2.5 pollutants

Also Published As

Publication number Publication date
CN114997506B (en) 2024-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant