CN111581792A - Atmospheric PM2.5 concentration prediction method and system based on a two-stage non-negative Lasso model - Google Patents
- Publication number
- CN111581792A (application number CN202010325992.0A)
- Authority
- CN
- China
- Prior art keywords
- carbon dioxide
- data
- dioxide emission
- training
- stage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N15/00—Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
- G01N15/06—Investigating concentration of particle suspensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/10—Numerical modelling
Abstract
The invention belongs to the technical field of atmospheric pollutant concentration prediction, and particularly relates to an atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model. The method comprises: dividing an area into a plurality of grid areas on a spatial level and, for each grid area, calculating the annual carbon dioxide emission data within it using a bottom-up spatialization method to obtain the carbon dioxide emission inventory data of that grid area; inputting the carbon dioxide emission data of a grid area of the area, collected in real time, into a pre-trained two-stage non-negative Lasso model, which outputs a first prediction result and a second prediction result; and adding the first prediction result and the second prediction result to obtain the predicted PM2.5 concentration data of the area, thereby predicting the atmospheric PM2.5 concentration of the area.
Description
Technical Field
The invention belongs to the technical field of atmospheric pollutant concentration prediction, and particularly relates to an atmospheric PM2.5 concentration prediction method and system based on a two-stage non-negative Lasso model.
Background
PM2.5 refers to particles in ambient air with an aerodynamic equivalent diameter of 2.5 microns or less; the higher their concentration in the air, the more serious the air pollution. With the rapid advance of industrialization, atmospheric haze has become increasingly severe, and PM2.5 is one of its main culprits. Because the particles are small, they can stay suspended in the air and spread for a long time, carrying toxic and harmful substances into the respiratory tract and lungs. Frequent large-scale haze affects people's daily travel and directly threatens human health. PM2.5 is the main component of haze, so the primary task in treating haze and improving air quality is to control PM2.5, and PM2.5 concentration prediction is the main content of air quality prediction. Recent studies have shown that compound air pollution typified by PM2.5 has become a significant environmental problem affecting people's quality of life.
A simulation technique is a modeling technique that reflects system behavior by means such as numerical calculation. Unlike a general prediction model, which pursues only high prediction accuracy, a simulation model places more emphasis on model interpretability and on the simulated process itself, and constraints must be added to the model according to actual business conditions.
Research shows that in China, where the energy structure is dominated by fossil fuels, CO2 emissions and atmospheric pollutant emissions share the same root (fossil fuels), occur at the same time (during combustion) and come from the same source (the same equipment or the same emission outlet), so the two are very closely related.
Atmospheric PM2.5 concentration is typically predicted from pollutant emission data and meteorological condition data using multiple regression models and random forest models. However, these conventional methods have the following problems:
1) the signs of the model coefficients cannot be guaranteed to be consistent with actual business conditions;
2) the model coefficients cannot be guaranteed to all be non-zero, that is, it cannot be guaranteed that every carbon dioxide index influences the PM2.5 concentration at the environment monitoring station, which is inconsistent with actual business conditions; as a result the prediction has a larger error and lower accuracy.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides an atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso (Least Absolute Shrinkage and Selection Operator) model. The method builds a model from the PM2.5 concentrations at different environment monitoring stations and the carbon dioxide emission inventory data of the areas around those stations, and analyzes the specific association between the two.
The invention provides an atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model, the method comprising:
dividing an area into a plurality of grid areas on a spatial level and, for each grid area, calculating the annual carbon dioxide emission data in the grid area using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory data of the grid area;
inputting the carbon dioxide emission inventory data of a grid area of the area into a pre-trained two-stage non-negative Lasso model, and outputting a first prediction result and a second prediction result;
adding the first prediction result and the second prediction result to obtain the predicted PM2.5 concentration data of the area, thereby predicting the atmospheric PM2.5 concentration of the area.
As an improvement of the above technical solution, dividing an area into a plurality of grid areas on a spatial level and, for each grid area, calculating the annual carbon dioxide emission data in the grid area using a bottom-up spatialization method as the carbon dioxide emission inventory data of the grid area specifically comprises:
dividing the area on a spatial level into a plurality of square grid areas of 10 km × 10 km and, for each square grid area, calculating the annual carbon dioxide emission data in it using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory data of the grid area;
wherein the carbon dioxide emission inventory data comprises: carbon dioxide total emission data, energy carbon dioxide emission data, industrial carbon dioxide emission data, agricultural carbon dioxide emission data, service industry carbon dioxide emission data, urban life carbon dioxide emission data, rural life carbon dioxide emission data, traffic carbon dioxide emission data, aviation carbon dioxide emission data, highway carbon dioxide emission data, railway carbon dioxide emission data, water transport carbon dioxide emission data, and industrial process carbon dioxide emission data.
As an improvement of the above technical solution, inputting the carbon dioxide emission inventory data of a grid area of the area into the pre-trained two-stage non-negative Lasso model and outputting a first prediction result and a second prediction result specifically comprises:
the two-stage non-negative Lasso model includes: a first stage non-negative Lasso model and a second stage non-negative Lasso model;
wherein the first-stage non-negative Lasso model is:
$$\hat{y}_{pm2.5} = X_{tt}\,\hat{\beta}_{tt} \tag{1}$$
wherein $\hat{y}_{pm2.5}$ is the first prediction result; $X_{tt}$ is the vector formed by the carbon dioxide total emission data in the carbon dioxide emission inventory data of a grid area of the area; and $\hat{\beta}_{tt}$ is the first-stage model coefficient estimate;
wherein a first objective function is constructed:
$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt} \ge 0} \left\{ \lVert y_{pm2.5} - X_{tt}\beta_{tt} \rVert_2^2 + \lambda_n \lVert \beta_{tt} \rVert_1 \right\} \tag{2}$$
wherein $\lVert y_{pm2.5} - X_{tt}\beta_{tt} \rVert_2^2$ is the first-stage squared error; $\lambda_n \lVert \beta_{tt} \rVert_1$ is the regularization term, i.e. the Lasso part of the model; $\lambda_n$ is the weight coefficient of the first-stage regularization term; and $y_{pm2.5}$ is the vector of PM2.5 concentration data of all environment monitoring stations;
converting the first objective function into matrix form:
$$\min_{\beta_{tt} \ge 0} \; \beta_{tt}' X_{tt}' X_{tt}\,\beta_{tt} - 2\,y_{pm2.5}' X_{tt}\,\beta_{tt} + y_{pm2.5}'\,y_{pm2.5} + \lambda_n \mathbf{1}'\beta_{tt} \tag{3}$$
wherein $\beta_{tt}'$ is the transpose of the first-stage model coefficient vector $\beta_{tt}$; $X_{tt}'$ is the transpose of $X_{tt}$; and $\mathbf{1}$ is a column vector of dimension $p_1 \times 1$ whose entries are all 1, $p_1$ being the dimension of the first-stage model coefficients;
and solving by quadratic programming for the first-stage model coefficient estimate $\hat{\beta}_{tt}$.
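The second-stage non-negative Lasso model is:
$$\widehat{res}_{pm2.5} = X_{-tt}\,\hat{\beta}_{-tt} \tag{4}$$
wherein $X_{-tt}$ is the vector formed by the remaining carbon dioxide emission data of a grid area of the area, excluding the carbon dioxide total emission data, and is used as the independent variable; $\hat{\beta}_{-tt}$ is the second-stage model coefficient estimate; and $\widehat{res}_{pm2.5}$ is the second prediction result;
wherein a second objective function is constructed:
$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt} \ge 0} \left\{ \lVert res_{pm2.5} - X_{-tt}\beta_{-tt} \rVert_2^2 + \lambda_m \lVert \beta_{-tt} \rVert_1 \right\} \tag{5}$$
wherein $\hat{\beta}_{-tt}$ is the second-stage model coefficient estimate; $res_{pm2.5}$ is the fitting error of the first-stage model; $\lVert res_{pm2.5} - X_{-tt}\beta_{-tt} \rVert_2^2$ is the second-stage squared error; $\lambda_m \lVert \beta_{-tt} \rVert_1$ is the regularization term; and $\lambda_m$ is the weight coefficient of the second-stage regularization term;
converting the second objective function into matrix form:
$$\min_{\beta_{-tt} \ge 0} \; \beta_{-tt}' X_{-tt}' X_{-tt}\,\beta_{-tt} - 2\,res_{pm2.5}' X_{-tt}\,\beta_{-tt} + res_{pm2.5}'\,res_{pm2.5} + \lambda_m \mathbf{1}'\beta_{-tt} \tag{6}$$
wherein $\beta_{-tt}'$ is the transpose of the second-stage model coefficient vector $\beta_{-tt}$; $X_{-tt}'$ is the transpose of $X_{-tt}$; and $\mathbf{1}$ is a column vector of dimension $p_2 \times 1$ whose entries are all 1, $p_2$ being the dimension of the second-stage model coefficients;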
inputting the carbon dioxide total emission data in the carbon dioxide emission inventory data of a grid area of the area into the first-stage non-negative Lasso model, and outputting the first prediction result;
and inputting the remaining carbon dioxide emission data of that grid area, excluding the carbon dioxide total emission data, into the second-stage non-negative Lasso model, and outputting the second prediction result.
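The prediction step described above reduces to two weighted sums and an addition. The following is a minimal Python sketch under stated assumptions: the function name `predict_pm25`, the coefficient values and the emission figures are hypothetical illustrations, not values taken from the patent.

```python
def predict_pm25(total_emission, other_emissions, beta_tt, beta_rest):
    """Return (first result, second result, their sum) for one grid area.

    total_emission  -- carbon dioxide total emission of the grid area (X_tt)
    other_emissions -- remaining emission indexes of the grid area (X_-tt)
    beta_tt         -- fitted first-stage coefficient (hypothetical value)
    beta_rest       -- fitted second-stage coefficients (hypothetical values)
    """
    if len(other_emissions) != len(beta_rest):
        raise ValueError("one coefficient is needed per remaining index")
    first = total_emission * beta_tt
    second = sum(x * b for x, b in zip(other_emissions, beta_rest))
    return first, second, first + second


# Hypothetical inputs: one total-emission value and three remaining indexes.
first, second, combined = predict_pm25(100.0, [60.0, 30.0, 10.0], 0.25, [0.1, 0.0, 0.5])
```

A zero entry in `beta_rest` corresponds to an index that the Lasso penalty has shrunk out of the second-stage model.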
As an improvement of the above technical solution, the two-stage non-negative Lasso model training step specifically includes:
dividing the area on a spatial level into a plurality of square grid areas of 10 km × 10 km and, for each square grid area, calculating the annual carbon dioxide emission training data in it using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory training data of the grid area;
wherein the carbon dioxide emission inventory training data comprises: carbon dioxide total emission training data, energy carbon dioxide emission training data, industrial carbon dioxide emission training data, agricultural carbon dioxide emission training data, service industry carbon dioxide emission training data, urban life carbon dioxide emission training data, rural life carbon dioxide emission training data, traffic carbon dioxide emission training data, aviation carbon dioxide emission training data, highway carbon dioxide emission training data, railway carbon dioxide emission training data, water transport carbon dioxide emission training data, and industrial process carbon dioxide emission training data;
calculating the grid area to which each environment monitoring station belongs according to the station position of each environment monitoring station, namely longitude data and latitude data of each environment monitoring station, and longitude data and latitude data of four vertexes of the corresponding grid area;
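The station-to-grid lookup in this step can be sketched as a bounding-box test. This is a minimal Python illustration, assuming each grid area is axis-aligned in longitude and latitude and is described by its four vertices; the names `GridCell` and `find_cell` and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridCell:
    cell_id: str
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float

    @classmethod
    def from_vertices(cls, cell_id, vertices):
        """Build a cell from the (lon, lat) pairs of its four vertices."""
        lons = [lon for lon, _ in vertices]
        lats = [lat for _, lat in vertices]
        return cls(cell_id, min(lons), max(lons), min(lats), max(lats))

    def contains(self, lon, lat):
        # Half-open bounds so a station on a shared edge maps to one cell only.
        return self.lon_min <= lon < self.lon_max and self.lat_min <= lat < self.lat_max


def find_cell(lon, lat, cells):
    """Return the first cell containing the station, or None if it is outside the grid."""
    for cell in cells:
        if cell.contains(lon, lat):
            return cell
    return None


# Two hypothetical neighbouring cells and one station position.
cells = [
    GridCell.from_vertices("A", [(116.0, 39.9), (116.1, 39.9), (116.1, 40.0), (116.0, 40.0)]),
    GridCell.from_vertices("B", [(116.1, 39.9), (116.2, 39.9), (116.2, 40.0), (116.1, 40.0)]),
]
station_cell = find_cell(116.15, 39.95, cells)
```

The half-open comparison is a design choice of this sketch; the text does not say how stations lying exactly on a grid boundary are assigned.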
selecting N circles of grid areas around each environment monitoring station according to the grid area to which the station belongs, and taking the atmospheric PM2.5 pollution concentration data of the grid area where the station is located as the atmospheric PM2.5 pollution concentration training data for each circle of grid areas;
selecting the carbon dioxide emission inventory training data in each circle of grid areas around the environment monitoring station; for each circle, computing the mean of the training data for each carbon dioxide index category, to obtain the averaged carbon dioxide emission inventory training data for that category;
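The ring selection and per-index averaging can be sketched as follows. This is a minimal Python illustration, assuming that grid areas are addressed by integer (row, column) indexes and that the k-th circle consists of the cells at Chebyshev distance exactly k from the station's cell; both conventions, and the names `ring_cells` and `ring_means`, are assumptions of this sketch rather than definitions from the patent.

```python
def ring_cells(row, col, k):
    """Cells at Chebyshev distance exactly k from (row, col); k = 0 is the cell itself."""
    return [
        (r, c)
        for r in range(row - k, row + k + 1)
        for c in range(col - k, col + k + 1)
        if max(abs(r - row), abs(c - col)) == k
    ]


def ring_means(inventory, row, col, k):
    """Average each emission index over the cells of ring k that have data.

    inventory maps (row, col) to a dict of index name -> emission value.
    """
    cells = [inventory[rc] for rc in ring_cells(row, col, k) if rc in inventory]
    if not cells:
        return {}
    return {name: sum(cell[name] for cell in cells) / len(cells) for name in cells[0]}


# Hypothetical 3 x 3 inventory with a single index per cell.
inventory = {(r, c): {"total": float(r * 3 + c)} for r in range(3) for c in range(3)}
first_ring_mean = ring_means(inventory, 1, 1, 1)
```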
dividing the averaged carbon dioxide emission inventory training data in each circle of grid areas around the environment monitoring stations, together with the atmospheric PM2.5 pollution concentration training data for that circle, into training set data and test set data at a ratio of 7:3; that is, the averaged carbon dioxide emission inventory training data and the atmospheric PM2.5 pollution concentration training data for 70% of the environment monitoring stations are used as training set data, and those for the remaining 30% of the environment monitoring stations are used as test set data;
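The 7:3 division can be sketched as a reproducible shuffle followed by a cut. This is a minimal Python illustration; the function name `split_train_test`, the fixed seed and the random shuffling are assumptions of this sketch, since the text does not state how stations are assigned to the two sets.

```python
import random


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples reproducibly and split them into train and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical station identifiers.
train_stations, test_stations = split_train_test(range(10))
```

Splitting by station rather than by individual record keeps all circles of one station in the same set, which matches the "70% of stations" wording above.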
establishing a first-stage non-negative Lasso model, with the atmospheric PM2.5 pollution concentration training data in the Nth circle of grid areas around the environment monitoring station as the dependent variable and the carbon dioxide total emission training data in the averaged carbon dioxide emission inventory training data of that circle as the independent variable:
$$\hat{y}_{pm2.5} = X_{tt1}\,\hat{\beta}_{tt1} \tag{7}$$
wherein $\hat{y}_{pm2.5}$ is the first prediction result; $X_{tt1}$ is the vector formed by the carbon dioxide total emission training data in the averaged carbon dioxide emission inventory training data in the Nth circle of grid areas around the environment monitoring station; and $\hat{\beta}_{tt1}$ is the first-stage model coefficient training estimate;
when solving for the first-stage model coefficient training estimate, the following objective function is constructed:
$$\hat{\beta}_{tt1} = \arg\min_{\beta_{tt1} \ge 0} \left\{ \lVert y_{pm2.5} - X_{tt1}\beta_{tt1} \rVert_2^2 + \lambda_{n1} \lVert \beta_{tt1} \rVert_1 \right\} \tag{8}$$
wherein $\lVert y_{pm2.5} - X_{tt1}\beta_{tt1} \rVert_2^2$ is the first-stage training squared error; $\lambda_{n1} \lVert \beta_{tt1} \rVert_1$ is the training regularization term; and $\lambda_{n1}$ is the weight coefficient of the first-stage training regularization term;
converting the objective function in (8) into matrix form:
$$Q(\beta_{tt1}) = \beta_{tt1}' X_{tt1}' X_{tt1}\,\beta_{tt1} - 2\,y_{pm2.5}' X_{tt1}\,\beta_{tt1} + y_{pm2.5}'\,y_{pm2.5} + \lambda_{n1} \mathbf{1}'\beta_{tt1}, \quad \beta_{tt1} \ge 0 \tag{9}$$
wherein $Q(\beta_{tt1})$ represents the objective function; $\beta_{tt1}'$ is the transpose of $\beta_{tt1}$; $X_{tt1}'$ is the transpose of $X_{tt1}$; and $\mathbf{1}$ is a column vector of dimension $p_1 \times 1$ whose entries are all 1, $p_1$ being the dimension of the first-stage model coefficients;
calculating the fitting error $res_{pm2.5}$ of the first-stage non-negative Lasso model:
$$res_{pm2.5} = y_{pm2.5} - X_{tt1}\,\hat{\beta}_{tt1} \tag{10}$$
taking the fitting error $res_{pm2.5}$ calculated by formula (10) as the dependent variable and the remaining carbon dioxide emission inventory training data, excluding the carbon dioxide total emission training data, as the independent variable ($X_{-tt1}$), establishing a second-stage non-negative Lasso model:
$$\widehat{res}_{pm2.5} = X_{-tt1}\,\hat{\beta}_{-tt1} \tag{11}$$
wherein $X_{-tt1}$ is the vector formed by the remaining carbon dioxide emission training data of a grid area of the area, excluding the carbon dioxide total emission training data, and is used as the independent variable; $\hat{\beta}_{-tt1}$ is the second-stage model coefficient training estimate; and $\widehat{res}_{pm2.5}$ is the second prediction result;
when solving for the second-stage model coefficient training estimate, the following objective function is constructed:
$$\hat{\beta}_{-tt1} = \arg\min_{\beta_{-tt1} \ge 0} \left\{ \lVert res_{pm2.5} - X_{-tt1}\beta_{-tt1} \rVert_2^2 + \lambda_{m1} \lVert \beta_{-tt1} \rVert_1 \right\} \tag{12}$$
wherein $\hat{\beta}_{-tt1}$ is the second-stage model coefficient training estimate; $\lVert res_{pm2.5} - X_{-tt1}\beta_{-tt1} \rVert_2^2$ is the second-stage training squared error; $\lambda_{m1} \lVert \beta_{-tt1} \rVert_1$ is the regularization term; and $\lambda_{m1}$ is the weight coefficient of the second-stage training regularization term;
converting the objective function in (12) into matrix form:
$$\min_{\beta_{-tt1} \ge 0} \; \beta_{-tt1}' X_{-tt1}' X_{-tt1}\,\beta_{-tt1} - 2\,res_{pm2.5}' X_{-tt1}\,\beta_{-tt1} + res_{pm2.5}'\,res_{pm2.5} + \lambda_{m1} \mathbf{1}'\beta_{-tt1} \tag{13}$$
wherein $\beta_{-tt1}'$ is the transpose of $\beta_{-tt1}$; $X_{-tt1}'$ is the transpose of $X_{-tt1}$; and $\mathbf{1}$ is a column vector of dimension $p_2 \times 1$ whose entries are all 1, $p_2$ being the dimension of the second-stage model coefficients;
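The two training stages can be sketched end to end in plain Python. Note the swap: the text solves each stage by quadratic programming, whereas this sketch uses cyclic coordinate descent, which minimizes the same objective (squared error plus a weighted sum of non-negative coefficients). The function names, the toy data and the penalty weights are hypothetical.

```python
def nonneg_lasso(X, y, lam, n_iter=500):
    """Minimise ||y - X b||^2 + lam * sum(b) subject to b >= 0.

    Cyclic coordinate descent, swapped in for the quadratic-programming
    solver named in the text; X is a list of rows, y a list of targets.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X beta, kept up to date as beta changes
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n))
            new = max(0.0, (rho - lam / 2.0) / col_sq[j])
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def predict(X, beta):
    """Row-wise weighted sum X beta."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def fit_two_stage(X_total, X_rest, y, lam_n, lam_m):
    """Stage 1 regresses y on total emissions; stage 2 regresses the stage-1 residual."""
    beta_tt = nonneg_lasso(X_total, y, lam_n)
    res = [yi - fi for yi, fi in zip(y, predict(X_total, beta_tt))]
    beta_rest = nonneg_lasso(X_rest, res, lam_m)
    return beta_tt, beta_rest


# Hypothetical toy data: four samples, one total index, two remaining indexes.
X_total = [[1.0], [2.0], [3.0], [4.0]]
X_rest = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y = [2.5, 4.0, 6.5, 8.0]
beta_tt, beta_rest = fit_two_stage(X_total, X_rest, y, lam_n=0.0, lam_m=0.1)
```

Because every coordinate update is clipped at zero, all fitted coefficients are non-negative by construction, which is the sign constraint the two-stage model relies on.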
For the reserved 30% test set, the trained two-stage model is used for prediction, giving the corresponding first prediction result and second prediction result; the two are added to obtain the predicted atmospheric PM2.5 concentration data of the environment monitoring station:
$$predicted = \hat{y}_{pm2.5} + \widehat{res}_{pm2.5} = X_{tt1}\,\hat{\beta}_{tt1} + X_{-tt1}\,\hat{\beta}_{-tt1} \tag{14}$$
evaluating the model prediction effect using the mean absolute percentage error MAPE:
$$MAPE = \frac{1}{n_1} \sum_{t=1}^{n_1} \frac{\lvert observed_t - predicted_t \rvert}{observed_t} \times 100\% \tag{15}$$
wherein $observed_t$ is the actual value of the atmospheric PM2.5 pollution concentration data of the environment monitoring station; $predicted_t$ is the predicted value of the atmospheric PM2.5 concentration data of the environment monitoring station, i.e. the prediction result output by the two-stage non-negative Lasso model; $n_1$ is the number of prediction samples; and the subscript $t$ identifies the t-th sample;
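The evaluation index can be computed directly from paired observed and predicted values. This is a minimal Python sketch; the function name `mape` and the sample values are hypothetical.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed values must be non-zero."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical PM2.5 concentrations at two test stations.
error_percent = mape([50.0, 100.0], [45.0, 110.0])
```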
for each environment monitoring station, 70% of the carbon dioxide emission inventory data in each circle of the N circles of grid areas around the selected station, together with the atmospheric PM2.5 pollution concentration data of the grid area to which the station belongs, is used as training set data, and the above modeling process is repeated until the model evaluation index MAPE converges on the test set data, giving the final two-stage non-negative Lasso model.
The invention also provides an atmospheric PM2.5 concentration prediction system based on a two-stage non-negative Lasso model, the system comprising:
a grid division module for dividing an area into a plurality of grid areas on a spatial level and, for each grid area, calculating the annual carbon dioxide emission data in the grid area using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory data of the grid area; and
a prediction module for inputting the carbon dioxide emission inventory data of a grid area of the area into a pre-trained two-stage non-negative Lasso model and outputting a first prediction result and a second prediction result,
and for adding the first prediction result and the second prediction result to obtain the predicted PM2.5 concentration data of the area.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the computer program.
The invention also provides a computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the above-mentioned method.
Compared with the prior art, the invention has the beneficial effects that:
a two-stage non-negative Lasso model is used to simulate the relationship between the carbon dioxide emission inventory data in a grid area and the atmospheric PM2.5 concentration data of the grid area to which a nearby environment monitoring station belongs;
by adopting the two-stage non-negative Lasso model, it can be guaranteed that the carbon dioxide emission inventory data of all grids surrounding the environment monitoring station whose atmospheric PM2.5 concentration is being predicted have a positive influence on the predicted PM2.5; in the model this means that the coefficient corresponding to each carbon dioxide index is not 0, which better matches business practice;
the Lasso part of the model can shrink unimportant variables, remove collinearity among the indexes and ensure the generalization ability of the model;
at the spatial grid level, atmospheric PM2.5 concentration data can be predicted for any grid area, so that the influence on air quality can be evaluated quantitatively, and coordinated management of regional carbon dioxide emissions and atmospheric pollutants can be achieved quickly and effectively.
Drawings
FIG. 1 is a schematic diagram of the relative spatial position of the grid area to which an environment monitoring station belongs during training of the two-stage non-negative Lasso model in the atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model according to the present invention;
FIG. 2 is a schematic diagram of the training process of the two-stage non-negative Lasso model in the atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model according to the present invention;
FIG. 3 is a histogram of the distribution of the prediction errors of atmospheric PM2.5 concentration at environment monitoring stations, obtained when the trained two-stage non-negative Lasso model in the atmospheric PM2.5 concentration prediction method according to the present invention is tested and verified on the test set.
Detailed Description
The invention will now be further described with reference to the accompanying drawings.
The invention provides an atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model. When the carbon dioxide emissions of an area (in essence, its energy use) change, the method can quickly simulate and predict the resulting change in atmospheric PM2.5 concentration at environment monitoring stations near the area; it can also predict and analyze the regional air quality (PM2.5 concentration) that may result from the changes in energy use and energy structure set out in regional planning.
The Lasso model was first proposed by Robert Tibshirani in 1996; its full name is Least Absolute Shrinkage and Selection Operator. It is a shrinkage estimator: by constructing a penalty function, it shrinks the coefficients and sets some of them exactly to zero, yielding a more parsimonious model. It thus retains the advantage of subset shrinkage and is a biased estimation method for handling data with multicollinearity.
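How an L1 penalty both shrinks coefficients and sets some of them exactly to zero can be seen in the one-dimensional case, where the penalized minimizer is the soft-thresholding operator. This is a minimal Python sketch for illustration only; the function name, the coefficient values and the threshold are hypothetical, and the sketch is not the solver used in the method.

```python
def soft_threshold(z, t):
    """Minimiser of 0.5 * (z - b)**2 + t * abs(b) over b."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


# Hypothetical unpenalised coefficients and a penalty threshold of 0.5.
unpenalised = [3.0, -0.4, 0.9, 0.05]
shrunk = [soft_threshold(z, 0.5) for z in unpenalised]
```

Coefficients whose magnitude is below the threshold are removed entirely, while the larger ones are pulled towards zero by the threshold amount.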
The method comprises the following steps:
dividing an area into a plurality of grid areas on a spatial level and, for each grid area, calculating the annual carbon dioxide emission data in the grid area using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory data of the grid area;
specifically, the area is divided on a spatial level into a plurality of square grid areas of 10 km × 10 km and, for each square grid area, the annual carbon dioxide emission data in it is calculated using a bottom-up spatialization method and used as the carbon dioxide emission inventory data of the grid area;
wherein the carbon dioxide emission inventory data comprises: carbon dioxide total emission data, energy carbon dioxide emission data, industrial carbon dioxide emission data, agricultural carbon dioxide emission data, service industry carbon dioxide emission data, urban life carbon dioxide emission data, rural life carbon dioxide emission data, traffic carbon dioxide emission data, aviation carbon dioxide emission data, highway carbon dioxide emission data, railway carbon dioxide emission data, water transport carbon dioxide emission data, and industrial process carbon dioxide emission data.
Wherein the total carbon dioxide emission data is the sum of energy carbon dioxide emission data and industrial process carbon dioxide emission data;
the energy carbon dioxide emission data is the sum of industrial carbon dioxide emission data, agricultural carbon dioxide emission data, service industry carbon dioxide emission data, urban life carbon dioxide emission data, rural life carbon dioxide emission data and traffic carbon dioxide emission data; the traffic carbon dioxide emission data is a sum of aviation carbon dioxide emission data, highway carbon dioxide emission data, railroad carbon dioxide emission data, and water transport carbon dioxide emission data.
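The three sum relationships above lend themselves to a simple consistency check on each grid area's inventory record. This is a minimal Python sketch; the dictionary keys, the function name `check_inventory`, the tolerance and the sample figures are hypothetical.

```python
ENERGY_PARTS = ["industry", "agriculture", "service", "urban_life", "rural_life", "traffic"]
TRAFFIC_PARTS = ["aviation", "highway", "railway", "water_transport"]


def check_inventory(cell, tol=1e-6):
    """Return the names of the aggregate indexes that do not equal the sum of their parts."""
    expected = {
        "traffic": sum(cell[k] for k in TRAFFIC_PARTS),
        "energy": sum(cell[k] for k in ENERGY_PARTS),
        "total": cell["energy"] + cell["industrial_process"],
    }
    return [name for name, value in expected.items() if abs(cell[name] - value) > tol]


# Hypothetical inventory record for one grid area (all 13 indexes).
cell = {
    "aviation": 1.0, "highway": 5.0, "railway": 2.0, "water_transport": 2.0,
    "traffic": 10.0,
    "industry": 40.0, "agriculture": 5.0, "service": 10.0,
    "urban_life": 20.0, "rural_life": 15.0,
    "energy": 100.0,
    "industrial_process": 25.0,
    "total": 125.0,
}
problems = check_inventory(cell)
```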
The data come from a high-spatial-resolution emission grid database.
Each emission data item in the carbon dioxide emission inventory data corresponds to one carbon dioxide emission index: the total, energy, industrial, agricultural, service industry, urban life, rural life, traffic, aviation, highway, railway, water transport and industrial process carbon dioxide emission indexes, giving 13 carbon dioxide emission indexes in total.
Inputting carbon dioxide emission list data of a certain grid area of the area into a pre-trained two-stage non-negative Lasso model, and outputting a first prediction result and a second prediction result;
adding the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result of the area, thereby predicting the atmospheric PM2.5 concentration of the area.
Wherein the two-stage non-negative Lasso model comprises: a first stage non-negative Lasso model and a second stage non-negative Lasso model;
wherein the first-stage non-negative Lasso model is:
$$\hat{y}_{pm2.5} = X_{tt}\hat{\beta}_{tt}$$
wherein $\hat{y}_{pm2.5}$ is the first prediction result; $X_{tt}$, the vector composed of the total carbon dioxide emission data in the carbon dioxide emission list data of a certain grid area of the area, is the independent variable; $\hat{\beta}_{tt}$ is the first-stage model coefficient estimate, i.e. the estimate of the true first-stage model coefficient $\beta_{tt}$;
wherein a first objective function is constructed:
$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt}\ge 0}\left\{\|y_{pm2.5} - X_{tt}\beta_{tt}\|_2^2 + \lambda_n\|\beta_{tt}\|_1\right\}$$
wherein $\|y_{pm2.5} - X_{tt}\beta_{tt}\|_2^2$ is the first-stage squared error; $\|\beta_{tt}\|_1$ is the regularization term, i.e. the Lasso part of the model; $\lambda_n$ is the weight coefficient of the first-stage regularization term; $y_{pm2.5}$ is the vector of PM2.5 concentration data of all environment monitoring sites;
converting the first objective function into matrix form:
$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt}\ge 0}\left\{\beta_{tt}'X_{tt}'X_{tt}\beta_{tt} - 2\,y_{pm2.5}'X_{tt}\beta_{tt} + \lambda_n\mathbf{1}'\beta_{tt}\right\}$$
wherein $\beta_{tt}'$ is the transpose of the first-stage model coefficient vector $\beta_{tt}$; $X_{tt}'$ is the transpose of $X_{tt}$; $\mathbf{1}$ denotes a $p_1 \times 1$ column vector with every entry equal to 1, $p_1$ being the dimension of the first-stage model coefficients;
solving for the first-stage model coefficient estimate $\hat{\beta}_{tt}$ by quadratic programming.
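For reference, a non-negative Lasso problem of this kind maps onto the standard quadratic programming template as sketched below. The sketch assumes the objective is the unnormalized squared error plus $\lambda_n$ times the $\ell_1$ norm; the names $H$ and $f$ are introduced here for illustration only. Because $\beta_{tt} \ge 0$, the norm $\|\beta_{tt}\|_1$ equals $\mathbf{1}'\beta_{tt}$, which is what makes the problem a quadratic program with simple bound constraints.

```latex
\begin{aligned}
\|y_{pm2.5} - X_{tt}\beta_{tt}\|_2^2 + \lambda_n\|\beta_{tt}\|_1
  &= \beta_{tt}'X_{tt}'X_{tt}\beta_{tt} - 2\,y_{pm2.5}'X_{tt}\beta_{tt}
     + y_{pm2.5}'y_{pm2.5} + \lambda_n\mathbf{1}'\beta_{tt}
     \qquad (\beta_{tt} \ge 0) \\
  &= \tfrac{1}{2}\,\beta_{tt}'H\beta_{tt} + f'\beta_{tt} + \text{const}, \\
H &= 2\,X_{tt}'X_{tt}, \qquad f = \lambda_n\mathbf{1} - 2\,X_{tt}'y_{pm2.5}.
\end{aligned}
```

The constant $y_{pm2.5}'y_{pm2.5}$ does not depend on $\beta_{tt}$ and can be dropped before passing $H$, $f$ and the bound $\beta_{tt} \ge 0$ to a solver.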
The second-stage non-negative Lasso model is:
$$res_{pm2.5} = X_{-tt}\hat{\beta}_{-tt}$$
wherein $X_{-tt}$, the vector formed by the carbon dioxide emission data of a certain grid area of the area other than the total carbon dioxide emission data, is the independent variable; $\hat{\beta}_{-tt}$ is the second-stage model coefficient estimate; $res_{pm2.5}$ is the second prediction result;
wherein a second objective function is constructed:
$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt}\ge 0}\left\{\|y_{pm2.5} - \hat{y}_{pm2.5} - X_{-tt}\beta_{-tt}\|_2^2 + \lambda_m\|\beta_{-tt}\|_1\right\}$$
wherein $\hat{\beta}_{-tt}$ is the second-stage model coefficient estimate, i.e. the estimate of the true second-stage model coefficient $\beta_{-tt}$; $\|y_{pm2.5} - \hat{y}_{pm2.5} - X_{-tt}\beta_{-tt}\|_2^2$ is the second-stage squared error; $\|\beta_{-tt}\|_1$ is the regularization term, i.e. the Lasso part of the model; $\lambda_m$ is the weight coefficient of the second-stage regularization term; $y_{pm2.5}$ is the vector of PM2.5 data of all environment monitoring sites; $\hat{y}_{pm2.5}$ is the vector of first prediction results, i.e. the PM2.5 values predicted for each environment monitoring site by the first-stage model;
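converting the second objective function into matrix form:
$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt}\ge 0}\left\{\beta_{-tt}'X_{-tt}'X_{-tt}\beta_{-tt} - 2\,(y_{pm2.5} - \hat{y}_{pm2.5})'X_{-tt}\beta_{-tt} + \lambda_m\mathbf{1}'\beta_{-tt}\right\}$$
wherein $\beta_{-tt}'$ is the transpose of the second-stage model coefficient vector $\beta_{-tt}$; $X_{-tt}'$ is the transpose of $X_{-tt}$; $\mathbf{1}$ denotes a $p_2 \times 1$ column vector with every entry equal to 1, $p_2$ being the dimension of the second-stage model coefficients;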
inputting the total carbon dioxide emission data in the carbon dioxide emission list data of a certain grid area of the area into the first-stage non-negative Lasso model, and outputting a first prediction result;
and inputting the rest carbon dioxide emission data except the total carbon dioxide emission data in the carbon dioxide emission data of a certain grid area of the region into the second-stage non-negative Lasso model, and outputting a second prediction result.
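Once both coefficient vectors are available, prediction reduces to two inner products and a sum. The following is a minimal sketch under stated assumptions: the function names, the coefficient values and the inputs are hypothetical, and only three of the twelve remaining indexes are shown for brevity.

```python
def dot(row, coef):
    """Inner product of one feature row with a coefficient vector."""
    return sum(x * b for x, b in zip(row, coef))

def predict_pm25(x_total, x_rest, beta_total, beta_rest):
    """Return (first prediction, second prediction, PM2.5 prediction)."""
    first = dot(x_total, beta_total)   # stage one: total emission only
    second = dot(x_rest, beta_rest)    # stage two: remaining emission indexes
    return first, second, first + second

# Hypothetical inputs for one grid area.
first, second, pm25 = predict_pm25(
    x_total=[2.0],
    x_rest=[1.0, 0.5, 4.0],
    beta_total=[10.0],
    beta_rest=[0.0, 2.0, 0.5],
)
```

A zero entry in `beta_rest` corresponds to an index that the Lasso penalty has removed from the second-stage model.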
As shown in FIG. 2, the training of the two-stage non-negative Lasso model specifically includes:
dividing the area into a plurality of square grid areas of 10 km × 10 km at the spatial level and, for each square grid area, accounting for the annual carbon dioxide emission training data in that grid area using a bottom-up spatialization method, as the carbon dioxide emission list training data of the grid area;
wherein the carbon dioxide emission list training data comprises: carbon dioxide total emission training data, energy carbon dioxide emission training data, industrial carbon dioxide emission training data, agricultural carbon dioxide emission training data, service industry carbon dioxide emission training data, urban life carbon dioxide emission training data, rural life carbon dioxide emission training data, traffic carbon dioxide emission training data, aviation carbon dioxide emission training data, highway carbon dioxide emission training data, railway carbon dioxide emission training data, water transport carbon dioxide emission training data, and industrial process carbon dioxide emission training data;
wherein the carbon dioxide total emission training data is the sum of energy carbon dioxide emission training data and industrial process carbon dioxide emission training data;
the energy carbon dioxide emission training data is the sum of industrial carbon dioxide emission training data, agricultural carbon dioxide emission training data, service industry carbon dioxide emission training data, urban life carbon dioxide emission training data, rural life carbon dioxide emission training data and traffic carbon dioxide emission training data;
the traffic carbon dioxide emission training data is the sum of aviation carbon dioxide emission training data, highway carbon dioxide emission training data, railway carbon dioxide emission training data and water transport carbon dioxide emission training data.
The data come from a high-spatial-resolution emission grid database.
Each emission training data item in the carbon dioxide emission list training data corresponds to one carbon dioxide emission training index: the total, energy, industrial, agricultural, service industry, urban life, rural life, traffic, aviation, highway, railway, water transport and industrial process carbon dioxide emission training data correspond to the total, energy, industrial, agricultural, service industry, urban life, rural life, traffic, aviation, highway, railway, water transport and industrial process carbon dioxide emission training indexes, respectively, giving 13 training indexes in total.
Calculating the grid area to which each environment monitoring station belongs according to the station position of each environment monitoring station, namely longitude data and latitude data of each station, and the longitude data and the latitude data of four vertexes of the corresponding grid area;
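The station-to-grid assignment can be sketched as a point-in-rectangle test. This is an illustration only: it assumes each grid area is axis-aligned in longitude and latitude (a real 10 km grid may be defined in a projected coordinate system), and the cell identifiers and coordinates are hypothetical.

```python
def cell_contains(vertices, lon, lat):
    """True if (lon, lat) lies inside the bounding box of the four vertices.

    Half-open intervals are used so that a point on a shared edge
    belongs to exactly one cell.
    """
    lons = [v[0] for v in vertices]
    lats = [v[1] for v in vertices]
    return min(lons) <= lon < max(lons) and min(lats) <= lat < max(lats)

def find_cell(cells, lon, lat):
    """Return the id of the grid area containing the station, or None."""
    for cell_id, vertices in cells.items():
        if cell_contains(vertices, lon, lat):
            return cell_id
    return None

# Two hypothetical neighbouring cells, each given by its four (lon, lat) vertices.
cells = {
    (0, 0): [(116.0, 39.0), (116.1, 39.0), (116.1, 39.1), (116.0, 39.1)],
    (1, 0): [(116.1, 39.0), (116.2, 39.0), (116.2, 39.1), (116.1, 39.1)],
}
station_cell = find_cell(cells, 116.15, 39.05)
```

For a large number of grid areas, a linear scan would normally be replaced by direct index arithmetic on the grid origin and cell size.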
selecting N circles of grid areas around each environment monitoring site according to the grid area to which that site belongs, and acquiring atmospheric PM2.5 pollution concentration data from the grid area where the environment monitoring site is located, as the atmospheric PM2.5 pollution concentration training data of each circle of grid areas;
selecting the carbon dioxide emission list training data in each circle of grid areas around the environment monitoring site; for the carbon dioxide emission list training data in each circle of grid areas, computing the mean for each carbon dioxide index category, to obtain the averaged carbon dioxide emission list training data of that category. For example, suppose a certain circle among the N circles of grid areas around a site contains 8 grids, and each grid holds carbon dioxide emission list training data for the 13 carbon dioxide emission training indexes; that circle then holds 8 × 13 training values. Averaging adds the 8 training values of the same carbon dioxide emission training index across the 8 grids and divides the sum by 8, giving the averaged training data for that index; repeating this operation 13 times gives the averaged training data for each of the 13 carbon dioxide emission training indexes.
The averaged carbon dioxide emission list training data in each circle of grid areas around the environment monitoring sites, together with the atmospheric PM2.5 pollution concentration training data of that circle of grid areas, are divided into training set data and test set data at a ratio of 7:3; that is, the averaged carbon dioxide emission list training data in each circle of grid areas around 70% of the environment monitoring sites and the atmospheric PM2.5 pollution concentration training data of that circle are used as training set data, and the averaged carbon dioxide emission list training data in each circle of grid areas around the remaining 30% of the environment monitoring sites and the atmospheric PM2.5 pollution concentration training data of that circle are used as test set data;
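The per-circle averaging can be sketched as follows. The sketch assumes that the k-th "circle" is the square ring of grid areas at Chebyshev distance k from the station's grid area, so the first circle has 8 grids; the function names and the data layout (a dict keyed by `(row, col)`) are hypothetical.

```python
def ring_cells(center, k):
    """Grid indices at Chebyshev distance k from the center cell (row, col)."""
    r0, c0 = center
    return [(r0 + dr, c0 + dc)
            for dr in range(-k, k + 1)
            for dc in range(-k, k + 1)
            if max(abs(dr), abs(dc)) == k]

def ring_mean(inventories, center, k):
    """Average every emission index over the cells of circle k that have data."""
    cells = [c for c in ring_cells(center, k) if c in inventories]
    if not cells:
        return {}
    names = inventories[cells[0]].keys()
    return {name: sum(inventories[c][name] for c in cells) / len(cells)
            for name in names}

# Hypothetical 3 x 3 block of grid areas with a single index per cell.
inventories = {(r, c): {"total": float(r + c)}
               for r in range(3) for c in range(3)}
first_circle_mean = ring_mean(inventories, center=(1, 1), k=1)
```

With the full list data, each cell's dict would carry all 13 indexes and `ring_mean` would return 13 averaged values per circle.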
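A 7:3 split at the level of environment monitoring sites can be sketched as below. The function name, the fixed seed and the use of integer site identifiers are assumptions made for the illustration.

```python
import random

def split_stations(station_ids, train_fraction=0.7, seed=0):
    """Shuffle the site ids reproducibly and split them into train and test."""
    ids = list(station_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Ten hypothetical monitoring sites: seven for training, three for testing.
train_ids, test_ids = split_stations(range(10))
```

Splitting by site rather than by individual row keeps all circles of one site on the same side of the split.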
using the atmospheric PM2.5 pollution concentration training data of the Nth circle of grid areas around the environment monitoring sites as the dependent variable, and the total carbon dioxide emission training data in the averaged carbon dioxide emission list training data of the Nth circle of grid areas around the environment monitoring sites as the independent variable, to establish the first-stage non-negative Lasso model:
$$\hat{y}_{pm2.5} = X_{tt1}\hat{\beta}_{tt1}$$
wherein $\hat{y}_{pm2.5}$ is the first prediction result; $X_{tt1}$ is the vector formed by the total carbon dioxide emission training data in the averaged carbon dioxide emission list training data of the Nth circle of grid areas around the environment monitoring sites; $\hat{\beta}_{tt1}$ is the first-stage model coefficient training estimate;
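when solving for the first-stage model coefficient training estimate, the following objective function is constructed:
$$\hat{\beta}_{tt1} = \arg\min_{\beta_{tt1}\ge 0}\left\{\|y_{pm2.5} - X_{tt1}\beta_{tt1}\|_2^2 + \lambda_{n1}\|\beta_{tt1}\|_1\right\} \qquad (8)$$
wherein $\|y_{pm2.5} - X_{tt1}\beta_{tt1}\|_2^2$ is the first-stage training squared error, $y_{pm2.5}$ being the vector of atmospheric PM2.5 pollution concentration training data; $\|\beta_{tt1}\|_1$ is the training regularization term; $\lambda_{n1}$ is the weight coefficient of the first-stage training regularization term;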
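converting the objective function in (8) into matrix form:
$$\hat{\beta}_{tt1} = \arg\min_{\beta_{tt1}\ge 0}\left\{\beta_{tt1}'X_{tt1}'X_{tt1}\beta_{tt1} - 2\,y_{pm2.5}'X_{tt1}\beta_{tt1} + \lambda_{n1}\mathbf{1}'\beta_{tt1}\right\} \qquad (9)$$
wherein the expression in braces is the objective function; $\beta_{tt1}'$ is the transpose of $\beta_{tt1}$; $X_{tt1}'$ is the transpose of $X_{tt1}$; $\mathbf{1}$ denotes a $p_1 \times 1$ column vector with every entry equal to 1, $p_1$ being the dimension of the first-stage model coefficients;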
Calculating the fitting error $res_{pm2.5}$ of the first-stage non-negative Lasso model:
$$res_{pm2.5} = y_{pm2.5} - \hat{y}_{pm2.5} \qquad (10)$$
Taking the fitting error $res_{pm2.5}$ calculated in formula (10) as the dependent variable, and the carbon dioxide emission list training data other than the total carbon dioxide emission training data as the independent variable ($X_{-tt1}$), the second-stage non-negative Lasso model is established:
$$res_{pm2.5} = X_{-tt1}\hat{\beta}_{-tt1} \qquad (11)$$
wherein $X_{-tt1}$, the vector formed by the carbon dioxide emission training data of a certain grid area of the area other than the total carbon dioxide emission training data, is the independent variable; $\hat{\beta}_{-tt1}$ is the second-stage model coefficient training estimate; $res_{pm2.5}$ is the second prediction result;
when solving for the second-stage model coefficient training estimate, the following objective function is constructed:
$$\hat{\beta}_{-tt1} = \arg\min_{\beta_{-tt1}\ge 0}\left\{\|res_{pm2.5} - X_{-tt1}\beta_{-tt1}\|_2^2 + \lambda_{m1}\|\beta_{-tt1}\|_1\right\} \qquad (12)$$
wherein $\hat{\beta}_{-tt1}$ is the training estimate of the second-stage model coefficient $\beta_{-tt1}$; $\|res_{pm2.5} - X_{-tt1}\beta_{-tt1}\|_2^2$ is the second-stage training squared error; $\|\beta_{-tt1}\|_1$ is the regularization term, i.e. the Lasso part of the model; $\lambda_{m1}$ is the weight coefficient of the second-stage regularization term;
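converting the second objective function in (12) into matrix form:
$$\hat{\beta}_{-tt1} = \arg\min_{\beta_{-tt1}\ge 0}\left\{\beta_{-tt1}'X_{-tt1}'X_{-tt1}\beta_{-tt1} - 2\,res_{pm2.5}'X_{-tt1}\beta_{-tt1} + \lambda_{m1}\mathbf{1}'\beta_{-tt1}\right\} \qquad (13)$$
wherein $\beta_{-tt1}'$ is the transpose of the second-stage model coefficient vector $\beta_{-tt1}$; $X_{-tt1}'$ is the transpose of $X_{-tt1}$; $\mathbf{1}$ denotes a $p_2 \times 1$ column vector with every entry equal to 1, $p_2$ being the dimension of the second-stage model coefficients;
for (13), the second-stage model coefficient training estimate $\hat{\beta}_{-tt1}$ is solved by quadratic programming.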
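The two training stages can be sketched end to end in plain Python. Two assumptions are made loudly here: the objective is taken to be the unnormalized squared error plus the penalty weight times the sum of the non-negative coefficients, and cyclic coordinate descent stands in for the quadratic programming solver named in the text (both minimize the same convex objective). The function names are hypothetical.

```python
def nn_lasso(X, y, lam, n_iter=500):
    """Minimize ||y - X b||^2 + lam * sum(b) subject to b >= 0.

    Solved by cyclic coordinate descent (a stand-in for a QP solver).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = max(0.0, (rho - lam / 2.0) / col_sq[j])
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def fit_two_stage(X_total, X_rest, y, lam_n1, lam_m1):
    """Stage one on total emission, stage two on the stage-one fitting error."""
    beta_total = nn_lasso(X_total, y, lam_n1)
    fitted = [sum(x * b for x, b in zip(row, beta_total)) for row in X_total]
    res = [yi - fi for yi, fi in zip(y, fitted)]  # fitting error of stage one
    beta_rest = nn_lasso(X_rest, res, lam_m1)
    return beta_total, beta_rest
```

The non-negativity constraint means an index can only add to the predicted concentration, never subtract from it; negative residual directions are simply left unexplained by the second stage.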
For the reserved 30% test set, the trained two-stage model is used for prediction to obtain a first prediction result and a second prediction result, and the two prediction results are added to obtain the predicted value of the atmospheric PM2.5 concentration data of the environment monitoring site:
$$predicted = \hat{y}_{pm2.5} + res_{pm2.5}$$
evaluating the model prediction effect using the mean absolute percentage error (MAPE):
$$\mathrm{MAPE} = \frac{1}{n1}\sum_{t=1}^{n1}\left|\frac{observed_t - predicted_t}{observed_t}\right|$$
wherein $observed_t$ is the actual value of the atmospheric PM2.5 pollution concentration data of the environment monitoring site; $predicted_t$ is the predicted value of the atmospheric PM2.5 concentration data of the environment monitoring site, i.e. the prediction result output by the two-stage non-negative Lasso model; $n1$ is the number of prediction samples; the subscript $t$ identifies the t-th sample (environment monitoring site);
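The evaluation index can be sketched as a short function. It assumes MAPE is expressed as a fraction rather than multiplied by 100, and that no observed value is zero; the sample values are hypothetical.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, returned as a fraction."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    return sum(abs(o - p) / abs(o)
               for o, p in zip(observed, predicted)) / len(observed)

# Two hypothetical monitoring sites, each predicted with a 10% relative error.
score = mape([50.0, 100.0], [45.0, 110.0])
```

Because each term is divided by the observed value, sites with low observed PM2.5 weigh more heavily in the index than the same absolute error at a heavily polluted site.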
for each environment monitoring site, 70% of the carbon dioxide emission list data in each circle of grid areas among the N circles of grid areas around the selected environment monitoring sites, together with the atmospheric PM2.5 pollution concentration data of the grid area to which the environment monitoring site belongs, are used as training set data, and the modeling process is repeated until the model evaluation index MAPE converges on the test set data (the model is considered to have converged when the MAPE value decreases by no more than 0.01), giving the final two-stage non-negative Lasso model.
In this embodiment, as shown in FIG. 1, 3 circles of grid areas around each environment monitoring site are selected, each circle of grid areas containing a plurality of grids; the dots represent environment monitoring sites and the squares represent 10 km × 10 km grids. 70% of the carbon dioxide emission list data in each circle of grid areas among the 3 circles of grid areas around the selected environment monitoring sites, together with the atmospheric PM2.5 pollution concentration data of the grid area to which the environment monitoring site belongs, are used as training set data, and the modeling process is repeated until the model evaluation index MAPE converges on the test set data (the model is considered to have converged when the MAPE value decreases by no more than 0.01), giving the final two-stage non-negative Lasso model. If the model evaluation index MAPE does not meet the standard, the number of circles is increased and training is repeated until it does.
FIG. 3 shows the distribution of the prediction errors obtained when the converged model predicts the PM2.5 of each sample in the test set; the prediction errors are concentrated between 0 and 15, indicating that the model has good prediction accuracy.
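One possible reading of the stopping rule is sketched below: the number of circles is increased until the test-set MAPE improves by no more than 0.01. The callable `evaluate`, which is assumed to train the two-stage model with a given number of circles and return its test-set MAPE, is hypothetical, as are `max_rings` and the MAPE values in the usage example.

```python
def choose_ring_count(evaluate, max_rings=10, tol=0.01):
    """Increase the circle count until MAPE drops by no more than tol.

    Returns (number of circles, MAPE) for the better of the last two models.
    """
    best_n, prev = 1, evaluate(1)
    for n in range(2, max_rings + 1):
        cur = evaluate(n)
        if prev - cur <= tol:
            # Converged: keep whichever of the two models has the lower MAPE.
            return (n, cur) if cur < prev else (best_n, prev)
        best_n, prev = n, cur
    return best_n, prev

# Hypothetical test-set MAPE values for 1 to 4 circles.
fake_mape = {1: 0.30, 2: 0.22, 3: 0.215, 4: 0.20}
result = choose_ring_count(lambda n: fake_mape[n], max_rings=4)
```

In this illustration the search stops at 3 circles, because going from 2 to 3 circles lowers MAPE by only 0.005.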
The prediction error is calculated as follows:
$$error_t = |observed_t - predicted_t|$$
wherein $error_t$ is the prediction error of the model on the t-th sample; $observed_t$ is the true PM2.5 value of the t-th sample; $predicted_t$ is the PM2.5 value predicted by the model for the t-th sample.
In this embodiment, the two-stage non-negative Lasso model is used to simulate how the atmospheric PM2.5 concentration at the environment monitoring sites around a grid changes when the carbon dioxide emission list data of that grid area is changed.
Example 1.
The invention also provides an atmospheric PM2.5 concentration prediction system based on the two-stage non-negative Lasso model, characterized in that the system comprises:
a grid division module, used for dividing a certain area into a plurality of grid areas at the spatial level and, for each grid area, accounting for the annual carbon dioxide emission data in that grid area using a bottom-up spatialization method, as the carbon dioxide emission list data of the grid area; and
a prediction module, used for inputting the carbon dioxide emission list data of a certain grid area of the area into a pre-trained two-stage non-negative Lasso model and outputting a first prediction result and a second prediction result;
and for adding the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result of the area.
Example 2.
Embodiment 2 of the present invention may also provide a computer device including: at least one processor, memory, at least one network interface, and a user interface. The various components in the device are coupled together by a bus system. It will be appreciated that a bus system is used to enable communications among the components. The bus system includes a power bus, a control bus, and a status signal bus in addition to a data bus.
The user interface may include a display, a keyboard, or a pointing device (e.g., a mouse, trackball, touch pad, or touch screen).
It will be appreciated that the memory in the embodiments disclosed herein can be either volatile memory or non-volatile memory, or can include both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory can be Random Access Memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, the memory stores elements, executable modules or data structures, or a subset thereof, or an expanded set thereof as follows: an operating system and an application program.
The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application programs, including various application programs such as a Media Player (Media Player), a Browser (Browser), etc., are used to implement various application services. The program for implementing the method of the embodiment of the present disclosure may be included in an application program.
In the above embodiments, the processor may further call a program or instruction stored in the memory, specifically a program or instruction stored in the application program, so as to perform the steps of the method of the invention.
The method of the present invention may be applied in or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The Processor may be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, or discrete hardware components. The methods, steps, and logic blocks disclosed in embodiment 1 may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with embodiment 1 may be directly implemented by a hardware decoding processor, or may be implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques of the present invention may be implemented by executing the functional blocks (e.g., procedures, functions, and so on) of the present invention. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Example 3
Embodiment 3 of the present invention may also provide a nonvolatile storage medium for storing a computer program. The computer program may realize the steps of the above-described method embodiments when executed by a processor.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (7)
1. A method for predicting atmospheric PM2.5 concentration based on a two-stage non-negative Lasso model, the method comprising:
dividing a certain area into a plurality of grid areas at the spatial level and, for each grid area, accounting for the annual carbon dioxide emission data in that grid area using a bottom-up spatialization method, as the carbon dioxide emission list data of the grid area;
inputting the carbon dioxide emission list data of a certain grid area of the area into a pre-trained two-stage non-negative Lasso model, and outputting a first prediction result and a second prediction result;
adding the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result of the area, thereby predicting the atmospheric PM2.5 concentration of the area.
2. The atmospheric PM2.5 concentration prediction method based on the two-stage non-negative Lasso model of claim 1, characterized in that dividing a certain area into a plurality of grid areas at the spatial level and, for each grid area, accounting for the annual carbon dioxide emission data in that grid area using a bottom-up spatialization method, as the carbon dioxide emission list data of the grid area, specifically comprises:
dividing the area into a plurality of square grid areas of 10 km × 10 km at the spatial level and, for each square grid area, accounting for the annual carbon dioxide emission data in that grid area using a bottom-up spatialization method, as the carbon dioxide emission list data of the grid area;
wherein the carbon dioxide emission list data comprises: carbon dioxide total emission data, energy carbon dioxide emission data, industrial carbon dioxide emission data, agricultural carbon dioxide emission data, service industry carbon dioxide emission data, urban life carbon dioxide emission data, rural life carbon dioxide emission data, traffic carbon dioxide emission data, aviation carbon dioxide emission data, highway carbon dioxide emission data, railway carbon dioxide emission data, water transport carbon dioxide emission data, and industrial process carbon dioxide emission data.
3. The atmospheric PM2.5 concentration prediction method based on the two-stage non-negative Lasso model of claim 1, characterized in that inputting the carbon dioxide emission list data of a certain grid area of the area into the pre-trained two-stage non-negative Lasso model and outputting a first prediction result and a second prediction result specifically comprises:
the two-stage non-negative Lasso model includes: a first stage non-negative Lasso model and a second stage non-negative Lasso model;
wherein the first-stage non-negative Lasso model is:
$$\hat{y}_{pm2.5} = X_{tt}\hat{\beta}_{tt}$$
wherein $\hat{y}_{pm2.5}$ is the first prediction result; $X_{tt}$ is the vector composed of the total carbon dioxide emission data in the carbon dioxide emission list data of a certain grid area of the area; $\hat{\beta}_{tt}$ is the first-stage model coefficient estimate;
wherein a first objective function is constructed:
$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt}\ge 0}\left\{\|y_{pm2.5} - X_{tt}\beta_{tt}\|_2^2 + \lambda_n\|\beta_{tt}\|_1\right\}$$
wherein $\|y_{pm2.5} - X_{tt}\beta_{tt}\|_2^2$ is the first-stage squared error; $\|\beta_{tt}\|_1$ is the regularization term, i.e. the Lasso part of the model; $\lambda_n$ is the weight coefficient of the first-stage regularization term; $y_{pm2.5}$ is the vector of PM2.5 concentration data of all environment monitoring sites;
converting the first objective function into matrix form:
$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt}\ge 0}\left\{\beta_{tt}'X_{tt}'X_{tt}\beta_{tt} - 2\,y_{pm2.5}'X_{tt}\beta_{tt} + \lambda_n\mathbf{1}'\beta_{tt}\right\}$$
wherein $\beta_{tt}'$ is the transpose of the first-stage model coefficient vector $\beta_{tt}$; $X_{tt}'$ is the transpose of $X_{tt}$; $\mathbf{1}$ denotes a $p_1 \times 1$ column vector with every entry equal to 1, $p_1$ being the dimension of the first-stage model coefficients;
solving for the first-stage model coefficient estimate $\hat{\beta}_{tt}$ by quadratic programming.
The second-stage non-negative Lasso model is:
$$res_{pm2.5} = X_{-tt}\hat{\beta}_{-tt}$$
wherein $X_{-tt}$, the vector formed by the carbon dioxide emission data of a certain grid area of the area other than the total carbon dioxide emission data, is the independent variable; $\hat{\beta}_{-tt}$ is the second-stage model coefficient estimate; $res_{pm2.5}$ is the second prediction result;
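wherein a second objective function is constructed:
$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt}\ge 0}\left\{\|y_{pm2.5} - \hat{y}_{pm2.5} - X_{-tt}\beta_{-tt}\|_2^2 + \lambda_m\|\beta_{-tt}\|_1\right\}$$
wherein $\hat{\beta}_{-tt}$ is the second-stage model coefficient estimate; $\|y_{pm2.5} - \hat{y}_{pm2.5} - X_{-tt}\beta_{-tt}\|_2^2$ is the second-stage squared error; $\|\beta_{-tt}\|_1$ is the regularization term; $\lambda_m$ is the weight coefficient of the second-stage regularization term;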
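converting the second objective function into matrix form:
$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt}\ge 0}\left\{\beta_{-tt}'X_{-tt}'X_{-tt}\beta_{-tt} - 2\,(y_{pm2.5} - \hat{y}_{pm2.5})'X_{-tt}\beta_{-tt} + \lambda_m\mathbf{1}'\beta_{-tt}\right\}$$
wherein $\beta_{-tt}'$ is the transpose of the second-stage model coefficient vector $\beta_{-tt}$; $X_{-tt}'$ is the transpose of $X_{-tt}$; $\mathbf{1}$ denotes a $p_2 \times 1$ column vector with every entry equal to 1, $p_2$ being the dimension of the second-stage model coefficients;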
Inputting the total carbon dioxide emission data in the carbon dioxide emission list data of a certain grid area of the area into the first-stage non-negative Lasso model, and outputting a first prediction result;
inputting the remaining carbon dioxide emission data except the total carbon dioxide emission data in the carbon dioxide emission list data of a certain grid area of the region into the second-stage non-negative Lasso model, and outputting a second prediction result.
4. The atmospheric PM2.5 concentration prediction method based on the two-stage non-negative Lasso model of claim 3, characterized in that the training step of the two-stage non-negative Lasso model specifically comprises:
dividing the area into a plurality of square grid areas of 10 km × 10 km at the spatial level and, for each square grid area, accounting for the annual carbon dioxide emission training data in that grid area using a bottom-up spatialization method, as the carbon dioxide emission list training data of the grid area;
wherein the carbon dioxide emission list training data comprises: carbon dioxide total emission training data, energy carbon dioxide emission training data, industrial carbon dioxide emission training data, agricultural carbon dioxide emission training data, service industry carbon dioxide emission training data, urban life carbon dioxide emission training data, rural life carbon dioxide emission training data, traffic carbon dioxide emission training data, aviation carbon dioxide emission training data, highway carbon dioxide emission training data, railway carbon dioxide emission training data, water transport carbon dioxide emission training data, and industrial process carbon dioxide emission training data;
calculating the grid area to which each environment monitoring station belongs according to the station position of each environment monitoring station, namely longitude data and latitude data of each environment monitoring station, and longitude data and latitude data of four vertexes of the corresponding grid area;
selecting N circles of grid areas around each environment monitoring site according to the grid area to which that site belongs, and acquiring atmospheric PM2.5 pollution concentration data from the grid area where the environment monitoring site is located, as the atmospheric PM2.5 pollution concentration training data of each circle of grid areas;
selecting carbon dioxide emission list training data in each circle of grid area around the environment monitoring station; for the carbon dioxide emission list training data in each circle of grid area, solving the corresponding carbon dioxide class mean value according to different carbon dioxide index classes to obtain the carbon dioxide emission list training data of the corresponding carbon dioxide index class subjected to averaging treatment;
dividing the averaged carbon dioxide emission list training data in each circle of grid areas around the environment monitoring sites, together with the atmospheric PM2.5 pollution concentration training data of that circle of grid areas, into training set data and test set data at a ratio of 7:3; that is, the averaged carbon dioxide emission list training data in each circle of grid areas around 70% of the environment monitoring sites and the atmospheric PM2.5 pollution concentration training data of that circle are used as training set data, and the averaged carbon dioxide emission list training data in each circle of grid areas around the remaining 30% of the environment monitoring sites and the atmospheric PM2.5 pollution concentration training data of that circle are used as test set data;
using the atmospheric PM2.5 pollution concentration training data in the N-th circle of grid areas around the environmental monitoring station as the dependent variable, and the total carbon dioxide emission training data in the averaged carbon dioxide emission list training data of the N-th circle of grid areas around the environmental monitoring station as the independent variable, to establish a first-stage non-negative Lasso model;
wherein the model output is the first prediction result; X_tt1 denotes the vector formed by the total carbon dioxide emission training data in the averaged carbon dioxide emission list training data of the N-th circle of grid areas around the environmental monitoring station; and the model coefficient vector is the first model coefficient training estimate;
to solve for the first model coefficient training estimate, the following objective function is constructed:
wherein the objective comprises the first-stage training squared error and a training regularization term, and λ_n1 is the weight coefficient of the first-stage training regularization term;
converting the objective function in formula (8) into matrix form:
wherein the left-hand side denotes the objective function; a prime denotes transposition, so that the primed coefficient vector is the transpose of the first model coefficient vector and X_tt1' is the transpose of X_tt1; 1 denotes a p_1 × 1 column vector whose entries are all 1; and p_1 equals the dimension of the first-stage model coefficients;
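One reading of the first-stage model that is consistent with the definitions above can be written as follows. The symbols y (the vector of PM2.5 training concentrations) and β (the coefficient vector) are introduced here for illustration, and the scaling of the squared-error term is an assumption.

```latex
\begin{aligned}
\widehat{y}^{(1)} &= X_{tt1}\,\widehat{\beta}_{1}, \\
\widehat{\beta}_{1} &= \arg\min_{\beta \ge 0}\; \lVert y - X_{tt1}\beta \rVert_2^{2}
  + \lambda_{n1} \sum_{j=1}^{p_1} \beta_j, \\
f(\beta) &= \beta' X_{tt1}' X_{tt1}\,\beta - 2\,y' X_{tt1}\,\beta + y'y
  + \lambda_{n1}\,\mathbf{1}'\beta, \qquad \beta \ge 0 .
\end{aligned}
```

Because every coefficient is constrained to be non-negative, the L1 penalty reduces to the plain sum of the coefficients, which is why the all-ones vector 1 appears in the matrix form.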
calculating the fitting error res_pm2.5 of the first-stage non-negative Lasso model:
taking the fitting error res_pm2.5 calculated in formula (10) as the dependent variable, and the remaining carbon dioxide emission list training data other than the total carbon dioxide emission training data as the independent variable (X_-tt1), establishing a second-stage non-negative Lasso model:
wherein X_-tt1, the independent variable, is the vector formed by the carbon dioxide emission training data other than the total carbon dioxide emission training data in a given grid area of the region; the model coefficient vector is the second-stage model coefficient training estimate; and the model output, the fitted value of res_pm2.5, is the second prediction result;
to solve for the second model coefficient training estimate, the following objective function is constructed:
wherein the minimizer is the second-stage model coefficient training estimate; the objective comprises the second-stage training squared error and a regularization term; and λ_m1 is the weight coefficient of the second-stage training regularization term;
converting the objective function in formula (12) into matrix form:
wherein a prime denotes transposition, so that the primed coefficient vector is the transpose of the second-stage model coefficient vector and X_-tt1' is the transpose of X_-tt1; 1 denotes a p_2 × 1 column vector whose entries are all 1; and p_2 equals the dimension of the second-stage model coefficients;
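The two stages described above can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the coordinate-descent solver, the function names (`nonneg_lasso`, `fit_two_stage`, `predict_two_stage`), the objective scaling ||y - Xb||^2 + lam * sum(b) with b >= 0, the absence of an intercept, and the toy data are all assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def predict(X: Matrix, beta: Vector) -> Vector:
    """Linear prediction X @ beta without an intercept."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def nonneg_lasso(X: Matrix, y: Vector, lam: float, n_iter: int = 500) -> Vector:
    """Minimise ||y - X b||^2 + lam * sum(b) subject to b >= 0 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new_b = max(0.0, (rho - lam / 2.0) / col_sq[j])  # clip at zero
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


def fit_two_stage(X_total: Matrix, X_rest: Matrix, y: Vector,
                  lam1: float, lam2: float) -> Tuple[Vector, Vector]:
    """Stage 1 regresses y on total emissions; stage 2 regresses the stage-1 residual."""
    beta1 = nonneg_lasso(X_total, y, lam1)
    res = [yi - fi for yi, fi in zip(y, predict(X_total, beta1))]
    beta2 = nonneg_lasso(X_rest, res, lam2)
    return beta1, beta2


def predict_two_stage(X_total: Matrix, X_rest: Matrix,
                      beta1: Vector, beta2: Vector) -> Vector:
    """Final prediction is the sum of the two stage predictions."""
    return [a + b for a, b in zip(predict(X_total, beta1), predict(X_rest, beta2))]


# Toy data: one total-emission column and two remaining emission classes.
X_total = [[1.0], [2.0], [3.0], [4.0]]
X_rest = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.5, 4.0, 6.5, 8.0]
beta1, beta2 = fit_two_stage(X_total, X_rest, y, lam1=0.1, lam2=0.1)
y_hat = predict_two_stage(X_total, X_rest, beta1, beta2)
```

In the toy data the second remaining class is negatively related to the stage-1 residual, so the non-negativity constraint sets its coefficient to zero, which illustrates how the constraint also performs variable selection.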
for the reserved 30% test set, predicting with each stage of the trained two-stage model to obtain the corresponding first prediction result and second prediction result, and adding the two prediction results to obtain the predicted value of the atmospheric PM2.5 concentration data of the environmental monitoring station:
evaluating the prediction performance of the model using the mean absolute percentage error (MAPE):
wherein observed_t denotes the actual value of the atmospheric PM2.5 pollution concentration data of the environmental monitoring station; predicted_t denotes the predicted value of the atmospheric PM2.5 concentration data of the environmental monitoring station, that is, the prediction output by the two-stage non-negative Lasso model; n1 denotes the number of prediction samples; and the subscript t identifies the t-th sample;
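Using the symbols just defined, the standard definition of MAPE is (100 / n1) times the sum over t of |observed_t - predicted_t| / |observed_t|. The sketch below assumes this standard form and that no observed value is zero; the sample values are hypothetical.

```python
from typing import Sequence


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical observed and predicted PM2.5 concentrations for three samples.
score = mape([50.0, 80.0, 100.0], [45.0, 88.0, 100.0])
```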
for each environmental monitoring station, using 70% of the carbon dioxide emission list data in each of the N circles of grid areas around the selected station, together with the atmospheric PM2.5 pollution concentration data of the grid area to which the station belongs, as training set data, and repeating the modeling process until the model evaluation index MAPE shows that the model performance has converged on the test set data, thereby obtaining the final two-stage non-negative Lasso model.
5. An atmospheric PM2.5 concentration prediction system based on a two-stage non-negative Lasso model, characterized in that the system comprises:
a grid division module, configured to divide a given region into a plurality of grid areas at the spatial level and, for each grid area, to account for the annual carbon dioxide emission data in that grid area using a bottom-up spatialization method, as the carbon dioxide emission list data of that grid area; and
a prediction module, configured to input the carbon dioxide emission list data of a given grid area of the region into a pre-trained two-stage non-negative Lasso model and to output a first prediction result and a second prediction result;
and to add the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result for the region.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 4 when executing the computer program.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010325992.0A CN111581792B (en) | 2020-04-23 | 2020-04-23 | Atmospheric PM2.5 concentration prediction method and system based on two-stage non-negative Lasso model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010325992.0A CN111581792B (en) | 2020-04-23 | 2020-04-23 | Atmospheric PM2.5 concentration prediction method and system based on two-stage non-negative Lasso model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111581792A (en) | 2020-08-25 |
CN111581792B (en) | 2021-01-08 |
Family
ID=72120308
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010325992.0A Active CN111581792B (en) | 2020-04-23 | 2020-04-23 | Atmospheric PM2.5 concentration prediction method and system based on two-stage non-negative Lasso model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111581792B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120290520A1 (en) * | 2011-05-11 | 2012-11-15 | Affectivon Ltd. | Affective response predictor for a stream of stimuli |
US20130326625A1 (en) * | 2012-06-05 | 2013-12-05 | Los Alamos National Security, Llc | Integrating multiple data sources for malware classification |
CN105550766A (en) * | 2015-12-04 | 2016-05-04 | 山东大学 | Micro-grid robustness multi-target operation optimization method containing renewable energy resources |
CN106094786A (en) * | 2016-05-30 | 2016-11-09 | 宁波大学 | Industrial process flexible measurement method based on integrated-type independent entry regression model |
CN106124700A (en) * | 2016-06-20 | 2016-11-16 | 重庆大学 | A kind of band is from the Electronic Nose non-targeted interference Gas Distinguishing Method expressed |
CN106529081A (en) * | 2016-12-03 | 2017-03-22 | 安徽新华学院 | PM2.5 real-time level prediction method and system based on neural net |
CN107451545A (en) * | 2017-07-15 | 2017-12-08 | 西安电子科技大学 | The face identification method of Non-negative Matrix Factorization is differentiated based on multichannel under soft label |
CN107766296A (en) * | 2017-09-30 | 2018-03-06 | 东南大学 | The method that evaluation path traffic characteristic influences on Inhaled Particulate Matters Emission concentration |
CN108009674A (en) * | 2017-11-27 | 2018-05-08 | 上海师范大学 | Air PM2.5 concentration prediction methods based on CNN and LSTM fused neural networks |
CN109344963A (en) * | 2018-10-17 | 2019-02-15 | 西安邮电大学 | Ultra-large hidden layer node fast selecting method in extreme learning machine |
CN110580386A (en) * | 2019-08-23 | 2019-12-17 | 生态环境部环境规划院 | Traffic department carbon dioxide emission space gridding method |
Non-Patent Citations (5)
Title |
---|
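CAI, YAPING et al.: "Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches", 《AGRICULTURAL AND FOREST METEOROLOGY》 * |
SHAN, YULI et al.: "Methodology and applications of city level CO2 emission accounts in China", 《JOURNAL OF CLEANER PRODUCTION》 * |
王健颖: "Numerical experiment study on the influence of different emission source inventories on PM2.5 in the Beijing-Tianjin-Hebei region", 《China Master's Theses Full-text Database, Engineering Science and Technology I》 * |
翁克瑞 et al.: "Research on a decomposition-ensemble prediction model for PM2.5 concentration combining TPE-XGBOOST and LassoLars", 《*** Engineering Theory and Practice》 * |
蔡博峰 et al.: "Study on carbon dioxide emissions in Tianjin based on a 1 km grid", 《Acta Scientiae Circumstantiae》 * |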
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112884274A (en) * | 2021-01-11 | 2021-06-01 | 生态环境部环境规划院 | Carbon dioxide source-sink matching method and device based on emission grid |
CN113144844A (en) * | 2021-03-22 | 2021-07-23 | 国家能源集团国源电力有限公司 | Desulfurizer flow control method and device and coal combustion system |
CN116108998A (en) * | 2023-02-22 | 2023-05-12 | 葛洲坝集团交通投资有限公司 | Expressway construction project carbon emission prediction method and system |
CN116108998B (en) * | 2023-02-22 | 2023-12-15 | 葛洲坝集团交通投资有限公司 | Expressway construction project carbon emission prediction method and system |
CN117540346A (en) * | 2024-01-09 | 2024-02-09 | 四川国蓝中天环境科技集团有限公司 | Order class variable redundancy removing method for high-dimensional regression modeling of atmospheric pollution data |
CN117540346B (en) * | 2024-01-09 | 2024-03-19 | 四川国蓝中天环境科技集团有限公司 | Order class variable redundancy removing method for high-dimensional regression modeling of atmospheric pollution data |
Also Published As
Publication number | Publication date |
---|---|
CN111581792B (en) | 2021-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111581792B (en) | Atmospheric PM2.5 concentration prediction method and system based on two-stage non-negative Lasso model | |
Kanaroglou et al. | Estimation of sulfur dioxide air pollution concentrations with a spatial autoregressive model | |
Kim | An assessment of deforestation models for reducing emissions from deforestation and forest degradation (REDD) | |
CN110348746B (en) | Air quality influence assessment method and device based on single pollution source | |
CN110046382A (en) | Source Apportionment, device, electronic equipment and the storage medium of atmosphere pollution | |
CN111753426B (en) | Method and device for analyzing source of particulate pollution | |
Rusiawan et al. | System dynamics modeling for urban economic growth and CO2 emission: a case study of Jakarta, Indonesia | |
KR20210086326A (en) | Prediction Method and System of Regional PM2.5 Concentration | |
Torkayesh et al. | A comparative assessment of air quality across European countries using an integrated decision support model | |
CN112711893B (en) | Method and device for calculating contribution of pollution source to PM2.5 and electronic equipment | |
Li et al. | Source contribution analysis of PM2.5 using response surface model and particulate source apportionment technology over the PRD region, China | |
Kaginalkar et al. | Stakeholder analysis for designing an urban air quality data governance ecosystem in smart cities | |
Peng et al. | Unit and regression tests of scientific software: A study on SWMM | |
Qiao et al. | Prediction of PM 2.5 concentration based on weighted bagging and image contrast-sensitive features | |
Chen et al. | Modelling traffic noise in a wide gradient interval using artificial neural networks | |
Chen et al. | Global sensitivity analysis of VISSIM parameters for project-level traffic emissions: a case study at a signalized intersection | |
Al‐Adwani et al. | A surrogate‐based optimization methodology for the optimal design of an air quality monitoring network | |
Fu et al. | Physio-chemical modeling of the NOx-O3 photochemical cycle and the air pollutants’ reactive dispersion around an isolated building | |
Jia et al. | Embodied GHG emissions of high speed rail stations: Quantification, data-driven prediction and cost-benefit analysis | |
Kang et al. | Fine dust forecast based on recurrent neural networks | |
Rumaling et al. | Forecasting particulate matter concentration using nonlinear autoregression with exogenous input model | |
Rahi et al. | Smart platforms of air quality monitoring: A logical literature exploration | |
Hogrefe et al. | Demonstrating attainment of the air quality standards: Integration of observations and model predictions into the probabilistic framework | |
Wikle et al. | A mechanistic model of annual sulfate concentrations in the United States | |
Ren et al. | Predicting indoor particle concentration in mechanically ventilated classrooms using neural networks: Model development and generalization ability analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||