CN111581792A - Atmospheric PM2.5 concentration prediction method and system based on a two-stage non-negative Lasso model - Google Patents


Info

Publication number
CN111581792A
Authority
CN
China
Prior art keywords
carbon dioxide
data
dioxide emission
training
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010325992.0A
Other languages
Chinese (zh)
Other versions
CN111581792B (en
Inventor
蔡博峰
刘译璟
鲁瑞
魏太云
曹丽斌
伍鹏程
庞凌云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Environmental Planning Institute Of Ministry Of Ecology And Environment
Original Assignee
Environmental Planning Institute Of Ministry Of Ecology And Environment
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Environmental Planning Institute Of Ministry Of Ecology And Environment filed Critical Environmental Planning Institute Of Ministry Of Ecology And Environment
Priority to CN202010325992.0A priority Critical patent/CN111581792B/en
Publication of CN111581792A publication Critical patent/CN111581792A/en
Application granted granted Critical
Publication of CN111581792B publication Critical patent/CN111581792B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/06 Investigating concentration of particle suspensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/10 Numerical modelling

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Dispersion Chemistry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of atmospheric pollutant concentration prediction, and particularly relates to an atmospheric PM2.5 (fine particulate matter) concentration prediction method based on a two-stage non-negative Lasso model, the method comprising: dividing a region into a plurality of grid areas at the spatial level and, for each grid area, calculating the annual carbon dioxide emission data in that grid area by a bottom-up spatialization method to serve as the carbon dioxide emission inventory data of the grid area; inputting the carbon dioxide emission data of a grid area of the region, collected in real time, into a pre-trained two-stage non-negative Lasso model, and outputting a first prediction result and a second prediction result; and adding the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result for the region, thereby predicting the atmospheric PM2.5 concentration of the region.

Description

Atmospheric PM2.5 concentration prediction method and system based on a two-stage non-negative Lasso model
Technical Field
The invention belongs to the technical field of atmospheric pollutant concentration prediction, and particularly relates to an atmospheric PM2.5 (fine particulate matter) concentration prediction method and system based on a two-stage non-negative Lasso model.
Background
PM2.5 refers to particles in ambient air with an aerodynamic equivalent diameter of 2.5 microns or less; the higher their concentration in the air, the more serious the air pollution. With the rapid advance of industrialization, atmospheric haze has become increasingly severe, and PM2.5 is one of its main culprits. Because the particles are small, they can remain suspended in the air for a long time and spread, carrying toxic and harmful substances into the respiratory tract and lungs. Frequent large-scale haze disrupts people's daily travel and poses a direct threat to human health. PM2.5 is the main component of haze, so the primary task in treating haze and improving air quality is to control PM2.5, and PM2.5 concentration prediction is the core of air quality prediction. Recent studies have shown that composite air pollution typified by PM2.5 has become a significant environmental problem affecting people's quality of life.
Simulation is a modeling technique that reflects system behavior by means of numerical calculation and similar methods. Unlike a general prediction model, which pursues only high prediction accuracy, a simulation model pays more attention to interpretability and to the simulated process itself, and constraints must be added to the model according to actual practice.
Research shows that in China, where the energy structure is dominated by fossil fuels, CO2 emissions and atmospheric pollutant emissions share the same root (fossil fuels), the same origin (the combustion process) and the same pathway (the same equipment or the same emission outlet), so the two are very closely related.
Atmospheric PM2.5 concentration is typically predicted from pollutant emission data and meteorological condition data using multivariate regression models or random forest models. However, these conventional methods have the following problems:
1) the signs of the model coefficients cannot be guaranteed to be consistent with actual practice;
2) the model coefficients cannot be guaranteed to all be non-zero, i.e., it cannot be guaranteed that every carbon dioxide index influences the PM2.5 concentration at the environmental monitoring station, which is inconsistent with actual practice, so the prediction error is large and the prediction accuracy is reduced.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art, and provides an atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso (Least Absolute Shrinkage and Selection Operator) model. The method builds a model between the PM2.5 concentration at different environmental monitoring stations and the carbon dioxide emission inventory data of the area around each station, and analyzes the specific relationship between the two.
The invention provides an atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model, the method comprising:
dividing a region into a plurality of grid areas at the spatial level and, for each grid area, calculating the annual carbon dioxide emission data in that grid area by a bottom-up spatialization method to serve as the carbon dioxide emission inventory data of the grid area;
inputting the carbon dioxide emission inventory data of a grid area of the region into a pre-trained two-stage non-negative Lasso model, and outputting a first prediction result and a second prediction result;
adding the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result for the region, thereby predicting the atmospheric PM2.5 concentration of the region.
As an improvement of the above technical solution, dividing a region into a plurality of grid areas at the spatial level and, for each grid area, calculating the annual carbon dioxide emission data in that grid area by a bottom-up spatialization method to serve as the carbon dioxide emission inventory data of the grid area specifically comprises:
dividing the region into a plurality of 10 km × 10 km square grid areas at the spatial level and, for each square grid area, calculating the annual carbon dioxide emission data in that grid area by a bottom-up spatialization method to serve as the carbon dioxide emission inventory data of the grid area;
wherein the carbon dioxide emission inventory data comprises: total carbon dioxide emission data, energy carbon dioxide emission data, industrial carbon dioxide emission data, agricultural carbon dioxide emission data, service industry carbon dioxide emission data, urban life carbon dioxide emission data, rural life carbon dioxide emission data, traffic carbon dioxide emission data, aviation carbon dioxide emission data, highway carbon dioxide emission data, railway carbon dioxide emission data, water transport carbon dioxide emission data, and industrial process carbon dioxide emission data.
As an improvement of the above technical solution, inputting the carbon dioxide emission inventory data of a grid area of the region into the pre-trained two-stage non-negative Lasso model and outputting a first prediction result and a second prediction result specifically comprises:
the two-stage non-negative Lasso model includes a first-stage non-negative Lasso model and a second-stage non-negative Lasso model;
wherein the first-stage non-negative Lasso model is:
$$\hat{y}_{pm2.5} = X_{tt}\hat{\beta}_{tt} \tag{1}$$
wherein $\hat{y}_{pm2.5}$ is the first prediction result; $X_{tt}$ denotes the vector formed by the total carbon dioxide emission data in the carbon dioxide emission inventory data of a grid area of the region; $\hat{\beta}_{tt}$ denotes the estimate of the first-stage model coefficients;
wherein a first objective function is constructed:
$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt} \ge 0} \left\{ \|y_{pm2.5} - X_{tt}\beta_{tt}\|_2^2 + \lambda_n \|\beta_{tt}\|_1 \right\} \tag{2}$$
wherein $\|y_{pm2.5} - X_{tt}\beta_{tt}\|_2^2$ is the first-stage squared error; $\lambda_n \|\beta_{tt}\|_1$ is the regularization term, i.e., the Lasso part of the model; $\lambda_n$ is the weight coefficient of the first-stage regularization term; $y_{pm2.5}$ is the vector formed by the PM2.5 concentration data of all environmental monitoring stations;
converting the first objective function into matrix form:
$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt} \ge 0} \left\{ \beta_{tt}' X_{tt}' X_{tt} \beta_{tt} - 2\, y_{pm2.5}' X_{tt} \beta_{tt} + \lambda_n \mathbf{1}' \beta_{tt} \right\} \tag{3}$$
wherein $\beta_{tt}'$ is the transpose of the first-stage model coefficient vector $\beta_{tt}$; $X_{tt}'$ is the transpose of $X_{tt}$; $\mathbf{1}$ denotes a $p_1 \times 1$ column vector whose entries are all 1, and $p_1$ equals the dimension of the first-stage model coefficients;
solving for the first-stage model coefficient estimate $\hat{\beta}_{tt}$ by quadratic programming.
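The step from the first objective function to a quadratic-programming form can be spelled out as follows. This is a sketch that assumes an unscaled squared error (no 1/2 or 1/n factor), which the text does not specify; $Q$ and $c$ are names introduced here only for illustration.

```latex
\begin{aligned}
\|y_{pm2.5} - X_{tt}\beta_{tt}\|_2^2 + \lambda_n \sum_{j=1}^{p_1} |\beta_{tt,j}|
  &= y_{pm2.5}'y_{pm2.5} - 2\,y_{pm2.5}'X_{tt}\beta_{tt}
     + \beta_{tt}'X_{tt}'X_{tt}\beta_{tt} + \lambda_n \mathbf{1}'\beta_{tt}
  && \text{since } \beta_{tt} \ge 0 \Rightarrow |\beta_{tt,j}| = \beta_{tt,j} \\
\Rightarrow\quad \hat{\beta}_{tt}
  &= \arg\min_{\beta_{tt} \ge 0}\; \tfrac{1}{2}\,\beta_{tt}'Q\,\beta_{tt} + c'\beta_{tt},
  \qquad Q = 2X_{tt}'X_{tt},\quad c = \lambda_n\mathbf{1} - 2X_{tt}'y_{pm2.5}
\end{aligned}
```

The constant $y_{pm2.5}'y_{pm2.5}$ is dropped because it does not change the minimizer. The non-negativity constraint is what turns the absolute-value penalty into a linear term, so the problem becomes a standard quadratic program with simple bound constraints.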
The second-stage non-negative Lasso model is:
$$\widehat{res}_{pm2.5} = X_{-tt}\hat{\beta}_{-tt} \tag{4}$$
wherein $X_{-tt}$, the vector formed by the remaining carbon dioxide emission data of a grid area of the region, excluding the total carbon dioxide emission data, is the independent variable; $\hat{\beta}_{-tt}$ denotes the estimate of the second-stage model coefficients; $\widehat{res}_{pm2.5}$ is the second prediction result;
wherein a second objective function is constructed:
$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt} \ge 0} \left\{ \|(y_{pm2.5} - \hat{y}_{pm2.5}) - X_{-tt}\beta_{-tt}\|_2^2 + \lambda_m \|\beta_{-tt}\|_1 \right\} \tag{5}$$
wherein $\hat{\beta}_{-tt}$ is the estimate of the second-stage model coefficients; $\|(y_{pm2.5} - \hat{y}_{pm2.5}) - X_{-tt}\beta_{-tt}\|_2^2$ is the second-stage squared error; $\lambda_m \|\beta_{-tt}\|_1$ is the regularization term; $\lambda_m$ is the weight coefficient of the second-stage regularization term;
converting the second objective function into matrix form:
$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt} \ge 0} \left\{ \beta_{-tt}' X_{-tt}' X_{-tt} \beta_{-tt} - 2\,(y_{pm2.5} - \hat{y}_{pm2.5})' X_{-tt} \beta_{-tt} + \lambda_m \mathbf{1}' \beta_{-tt} \right\} \tag{6}$$
wherein $\beta_{-tt}'$ is the transpose of the second-stage model coefficient vector $\beta_{-tt}$; $X_{-tt}'$ is the transpose of $X_{-tt}$; $\mathbf{1}$ denotes a $p_2 \times 1$ column vector whose entries are all 1, and $p_2$ equals the dimension of the second-stage model coefficients;
solving for the second-stage model coefficient estimate $\hat{\beta}_{-tt}$ by quadratic programming;
inputting the total carbon dioxide emission data in the carbon dioxide emission data of a grid area of the region into the first-stage non-negative Lasso model, and outputting the first prediction result;
and inputting the remaining carbon dioxide emission data of that grid area, excluding the total carbon dioxide emission data, into the second-stage non-negative Lasso model, and outputting the second prediction result.
As an improvement of the above technical solution, the training of the two-stage non-negative Lasso model specifically comprises:
dividing the region into a plurality of 10 km × 10 km square grid areas at the spatial level and, for each square grid area, calculating the annual carbon dioxide emission training data in that grid area by a bottom-up spatialization method to serve as the carbon dioxide emission inventory training data of the grid area;
wherein the carbon dioxide emission inventory training data comprises: total carbon dioxide emission training data, energy carbon dioxide emission training data, industrial carbon dioxide emission training data, agricultural carbon dioxide emission training data, service industry carbon dioxide emission training data, urban life carbon dioxide emission training data, rural life carbon dioxide emission training data, traffic carbon dioxide emission training data, aviation carbon dioxide emission training data, highway carbon dioxide emission training data, railway carbon dioxide emission training data, water transport carbon dioxide emission training data, and industrial process carbon dioxide emission training data;
determining the grid area to which each environmental monitoring station belongs from the station's position, i.e., its longitude and latitude, and the longitude and latitude of the four vertices of each grid area;
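The station-to-grid assignment described above can be sketched as a bounding-box lookup. This is a minimal illustration, assuming each grid cell is axis-aligned in longitude and latitude so that its four vertices define a simple bounding box; `GridCell`, `find_cell` and the coordinates are hypothetical names and values, not taken from the patent.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class GridCell(NamedTuple):
    cell_id: int
    vertices: Sequence[Tuple[float, float]]  # four (longitude, latitude) corners


def find_cell(lon: float, lat: float, cells: Sequence[GridCell]) -> Optional[int]:
    """Return the id of the first cell whose lon/lat bounding box contains the point."""
    for cell in cells:
        lons = [v[0] for v in cell.vertices]
        lats = [v[1] for v in cell.vertices]
        # Half-open intervals so a point on a shared edge maps to exactly one cell.
        if min(lons) <= lon < max(lons) and min(lats) <= lat < max(lats):
            return cell.cell_id
    return None  # the station lies outside every cell


# Two hypothetical neighbouring cells (coordinates are illustrative only).
cells = [
    GridCell(0, [(116.0, 39.9), (116.1, 39.9), (116.1, 40.0), (116.0, 40.0)]),
    GridCell(1, [(116.1, 39.9), (116.2, 39.9), (116.2, 40.0), (116.1, 40.0)]),
]
station_cell = find_cell(116.15, 39.95, cells)
```

A linear scan is enough for a sketch; with a regular 10 km grid, the cell index could instead be computed directly from the grid origin and cell size.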
selecting N rings of grid areas around each environmental monitoring station according to the grid area to which the station belongs, and taking the atmospheric PM2.5 pollution concentration data of the grid area where the station is located as the atmospheric PM2.5 pollution concentration training data for each ring of grid areas;
selecting the carbon dioxide emission inventory training data in each ring of grid areas around the environmental monitoring station; for the carbon dioxide emission inventory training data in each ring of grid areas, computing the mean of each carbon dioxide index category to obtain the averaged carbon dioxide emission inventory training data for that index category;
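The ring selection and per-index averaging can be sketched as below. The text does not define a ring precisely, so this sketch assumes that ring k consists of the cells whose Chebyshev distance from the station's cell is exactly k on a regular (row, column) grid; `ring_cells`, `ring_mean` and the toy inventory are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) index of a grid cell


def ring_cells(row: int, col: int, k: int) -> List[Cell]:
    """Cells at Chebyshev distance exactly k from (row, col); k = 0 is the cell itself."""
    if k == 0:
        return [(row, col)]
    return [
        (row + dr, col + dc)
        for dr in range(-k, k + 1)
        for dc in range(-k, k + 1)
        if max(abs(dr), abs(dc)) == k
    ]


def ring_mean(inventory: Dict[Cell, Dict[str, float]], row: int, col: int, k: int) -> Dict[str, float]:
    """Average each emission index over the ring-k cells that exist in the inventory."""
    present = [inventory[c] for c in ring_cells(row, col, k) if c in inventory]
    if not present:
        return {}
    return {name: sum(cell[name] for cell in present) / len(present) for name in present[0]}


# Toy 3 x 3 inventory with a single index; the values are illustrative only.
inventory = {(r, c): {"total": float(3 * r + c)} for r in range(3) for c in range(3)}
first_ring = ring_mean(inventory, 1, 1, 1)  # mean over the 8 cells surrounding (1, 1)
```

Cells that fall outside the inventory (for example at the edge of the region) are skipped here rather than treated as zero emission, which is also an assumption.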
dividing the averaged carbon dioxide emission inventory training data in each ring of grid areas around the environmental monitoring stations, together with the atmospheric PM2.5 pollution concentration training data of that ring, into training set data and test set data at a ratio of 7:3; that is, the averaged carbon dioxide emission inventory training data and the atmospheric PM2.5 pollution concentration training data for each ring of grid areas around 70% of the environmental monitoring stations are taken as training set data, and the corresponding data around the remaining 30% of the environmental monitoring stations are taken as test set data;
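A station-level 7:3 split can be sketched as follows. The text does not say how stations are assigned to the two sets, so a seeded random shuffle is assumed here; `split_stations` and the station identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_stations(
    station_ids: Sequence[str], train_frac: float = 0.7, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle station ids reproducibly and split them into train and test lists."""
    ids = list(station_ids)
    random.Random(seed).shuffle(ids)  # local RNG, so global random state is untouched
    n_train = round(train_frac * len(ids))
    return ids[:n_train], ids[n_train:]


stations = [f"station_{i:02d}" for i in range(10)]
train_ids, test_ids = split_stations(stations)
```

Splitting by station, rather than by individual row, keeps all rings of one station on the same side of the split.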
taking the atmospheric PM2.5 pollution concentration training data in the Nth ring of grid areas around the environmental monitoring station as the dependent variable, and the total carbon dioxide emission training data in the averaged carbon dioxide emission inventory training data of the Nth ring of grid areas around the station as the independent variable, to establish the first-stage non-negative Lasso model:
$$\hat{y}_{pm2.5} = X_{tt1}\hat{\beta}_{tt1} \tag{7}$$
wherein $\hat{y}_{pm2.5}$ is the first prediction result; $X_{tt1}$ denotes the vector formed by the total carbon dioxide emission training data in the averaged carbon dioxide emission inventory training data of the Nth ring of grid areas around the environmental monitoring station; $\hat{\beta}_{tt1}$ denotes the training estimate of the first-stage model coefficients;
when solving for the training estimate of the first-stage model coefficients, the following objective function is constructed:
$$\hat{\beta}_{tt1} = \arg\min_{\beta_{tt1} \ge 0} \left\{ \|y_{pm2.5} - X_{tt1}\beta_{tt1}\|_2^2 + \lambda_{n1} \|\beta_{tt1}\|_1 \right\} \tag{8}$$
wherein $\|y_{pm2.5} - X_{tt1}\beta_{tt1}\|_2^2$ is the first-stage training squared error; $\lambda_{n1} \|\beta_{tt1}\|_1$ is the training regularization term, and $\lambda_{n1}$ is the weight coefficient of the first-stage training regularization term;
converting the objective function in (8) into matrix form:
$$\hat{\beta}_{tt1} = \arg\min_{\beta_{tt1} \ge 0} \left\{ \beta_{tt1}' X_{tt1}' X_{tt1} \beta_{tt1} - 2\, y_{pm2.5}' X_{tt1} \beta_{tt1} + \lambda_{n1} \mathbf{1}' \beta_{tt1} \right\} \tag{9}$$
wherein the expression in braces is the objective function; $\beta_{tt1}'$ is the transpose of $\beta_{tt1}$; $X_{tt1}'$ is the transpose of $X_{tt1}$; $\mathbf{1}$ denotes a $p_1 \times 1$ column vector whose entries are all 1, and $p_1$ equals the dimension of the first-stage model coefficients;
solving for the training estimate $\hat{\beta}_{tt1}$ of the first-stage model coefficients by quadratic programming.
Calculating the fitting error $res_{pm2.5}$ of the first-stage non-negative Lasso model:
$$res_{pm2.5} = y_{pm2.5} - \hat{y}_{pm2.5} \tag{10}$$
Taking the fitting error $res_{pm2.5}$ calculated by formula (10) as the dependent variable, and the remaining carbon dioxide emission inventory training data, excluding the total carbon dioxide emission training data, as the independent variable ($X_{-tt1}$), the second-stage non-negative Lasso model is established:
$$\widehat{res}_{pm2.5} = X_{-tt1}\hat{\beta}_{-tt1} \tag{11}$$
wherein $X_{-tt1}$, the vector formed by the remaining carbon dioxide emission training data of a grid area of the region, excluding the total carbon dioxide emission training data, is the independent variable; $\hat{\beta}_{-tt1}$ denotes the training estimate of the second-stage model coefficients; $\widehat{res}_{pm2.5}$ is the second prediction result;
when solving for the training estimate of the second-stage model coefficients, the following objective function is constructed:
$$\hat{\beta}_{-tt1} = \arg\min_{\beta_{-tt1} \ge 0} \left\{ \|res_{pm2.5} - X_{-tt1}\beta_{-tt1}\|_2^2 + \lambda_{m1} \|\beta_{-tt1}\|_1 \right\} \tag{12}$$
wherein $\hat{\beta}_{-tt1}$ is the training estimate of the second-stage model coefficients; $\|res_{pm2.5} - X_{-tt1}\beta_{-tt1}\|_2^2$ is the second-stage training squared error; $\lambda_{m1} \|\beta_{-tt1}\|_1$ is the regularization term; $\lambda_{m1}$ is the weight coefficient of the second-stage training regularization term;
converting the objective function in (12) into matrix form:
$$\hat{\beta}_{-tt1} = \arg\min_{\beta_{-tt1} \ge 0} \left\{ \beta_{-tt1}' X_{-tt1}' X_{-tt1} \beta_{-tt1} - 2\, res_{pm2.5}' X_{-tt1} \beta_{-tt1} + \lambda_{m1} \mathbf{1}' \beta_{-tt1} \right\} \tag{13}$$
wherein $\beta_{-tt1}'$ is the transpose of $\beta_{-tt1}$; $X_{-tt1}'$ is the transpose of $X_{-tt1}$; $\mathbf{1}$ denotes a $p_2 \times 1$ column vector whose entries are all 1, and $p_2$ equals the dimension of the second-stage model coefficients;
solving for the training estimate $\hat{\beta}_{-tt1}$ of the second-stage model coefficients by quadratic programming.
For the held-out 30% test set, the trained two-stage model is used to predict, giving the corresponding first prediction result and second prediction result; the two are added to obtain the predicted atmospheric PM2.5 concentration at the environmental monitoring station:
$$predicted = \hat{y}_{pm2.5} + \widehat{res}_{pm2.5} \tag{14}$$
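The two-stage fit and the summed prediction can be sketched in NumPy as below. Several assumptions are made: the solver is cyclic coordinate descent with projection onto non-negative values, swapped in for the quadratic-programming solver the text names (both target the same convex problem); the squared error is unscaled; there is no intercept; and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def nonneg_lasso(X, y, lam, n_iter=500):
    """Minimise ||y - X b||^2 + lam * sum(b) subject to b >= 0 by coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    resid = y.copy()  # equals y - X b for the current b (b starts at zero)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Inner product of column j with the residual that excludes column j.
            rho = X[:, j] @ resid + col_sq[j] * b[j]
            new_bj = max(0.0, (rho - lam / 2.0) / col_sq[j])
            resid += X[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    return b


def fit_two_stage(X_tt, X_rest, y, lam_n, lam_m):
    """Stage 1: y on total emissions. Stage 2: stage-1 residual on the other indexes."""
    b_tt = nonneg_lasso(X_tt, y, lam_n)
    res = y - X_tt @ b_tt
    b_rest = nonneg_lasso(X_rest, res, lam_m)
    return b_tt, b_rest


def predict_two_stage(X_tt, X_rest, b_tt, b_rest):
    """Sum of the first and second prediction results."""
    return X_tt @ b_tt + X_rest @ b_rest


# Synthetic data for illustration only; it does not come from the patent.
rng = np.random.default_rng(0)
X_tt = rng.uniform(1.0, 10.0, size=(50, 1))
X_rest = rng.uniform(0.0, 5.0, size=(50, 3))
y = 2.0 * X_tt[:, 0] + X_rest @ np.array([0.5, 0.0, 0.2]) + rng.normal(0.0, 0.1, size=50)
b_tt, b_rest = fit_two_stage(X_tt, X_rest, y, lam_n=0.1, lam_m=0.1)
y_hat = predict_two_stage(X_tt, X_rest, b_tt, b_rest)
```

Each coordinate update is the exact one-dimensional minimizer clipped at zero, so every coefficient stays non-negative. Because a zero coefficient vector is always feasible in the second stage, the two-stage training error cannot exceed the first-stage error.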
evaluating the prediction performance of the model with the mean absolute percentage error (MAPE):
$$MAPE = \frac{1}{n_1} \sum_{t=1}^{n_1} \left| \frac{observed_t - predicted_t}{observed_t} \right| \times 100\% \tag{15}$$
wherein $observed_t$ denotes the actual value of the atmospheric PM2.5 pollution concentration data at the environmental monitoring station; $predicted_t$ is the predicted value of the atmospheric PM2.5 concentration data at the environmental monitoring station, i.e., the prediction result output by the two-stage non-negative Lasso model; $n_1$ denotes the number of prediction samples; the subscript $t$ identifies the t-th sample;
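The evaluation metric can be sketched as a short function. It assumes MAPE is expressed as a percentage and that every observed value is non-zero; the function name `mape` is hypothetical.

```python
from typing import Sequence


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent, over paired samples."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equally long")
    # A zero observed value would raise ZeroDivisionError; such samples need
    # separate handling before calling this function.
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Two samples, each off by 10 percent of the observed value.
error = mape([100.0, 200.0], [110.0, 180.0])
```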
for the environmental monitoring stations, 70% of the carbon dioxide emission inventory data in each ring of the N rings of grid areas around the selected stations, together with the atmospheric PM2.5 pollution concentration data of the grid area to which each station belongs, are used as training set data, and the above modeling process is repeated until the model evaluation index MAPE converges on the test set data, giving the final two-stage non-negative Lasso model.
The invention also provides an atmospheric PM2.5 concentration prediction system based on a two-stage non-negative Lasso model, the system comprising:
a grid division module, configured to divide a region into a plurality of grid areas at the spatial level and, for each grid area, calculate the annual carbon dioxide emission data in that grid area by a bottom-up spatialization method to serve as the carbon dioxide emission inventory data of the grid area; and
a prediction module, configured to input the carbon dioxide emission inventory data of a grid area of the region into a pre-trained two-stage non-negative Lasso model and output a first prediction result and a second prediction result,
and to add the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result for the region.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the computer program.
The invention also provides a computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the above-mentioned method.
Compared with the prior art, the invention has the beneficial effects that:
the two-stage non-negative Lasso model is used to simulate the relationship between the carbon dioxide emission inventory data in grid areas and the atmospheric PM2.5 concentration data of the grid area to which the nearby environmental monitoring station belongs;
by adopting the two-stage non-negative Lasso model, it can be guaranteed that, when predicting the atmospheric PM2.5 concentration at an environmental monitoring station, the carbon dioxide emission inventory data of all surrounding grids have a positive influence on the predicted PM2.5; expressed in the model, the coefficient corresponding to each carbon dioxide index is not 0, which is more consistent with actual practice;
the Lasso part of the model can shrink unimportant variables, remove collinearity among the indexes, and safeguard the generalization ability of the model;
at the spatial grid level, atmospheric PM2.5 concentration data can be predicted for any grid area, completing a quantitative evaluation of the influence on air quality; coordinated management of regional carbon dioxide emissions and atmospheric pollutants is achieved quickly and effectively.
Drawings
FIG. 1 is a schematic diagram of the relative spatial position of the grid area to which an environmental monitoring station belongs during training of the two-stage non-negative Lasso model in the atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model according to the present invention;
FIG. 2 is a schematic diagram of the training process of the two-stage non-negative Lasso model in the atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model according to the present invention;
FIG. 3 is a histogram of the prediction error of the atmospheric PM2.5 concentration at environmental monitoring stations, obtained when the trained two-stage non-negative Lasso model in the atmospheric PM2.5 concentration prediction method according to the present invention is tested and verified on the test set.
Detailed Description
The invention will now be further described with reference to the accompanying drawings.
The invention provides an atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model. When carbon dioxide emissions in a region (in essence, its energy use) change, the method can quickly simulate and predict the change in atmospheric PM2.5 concentration at environmental monitoring stations near the region; it can also predict and analyze the possible change in regional air quality (PM2.5 concentration) from the energy use and structural changes set out in regional planning.
The Lasso model was first proposed by Robert Tibshirani in 1996; its full name is Least Absolute Shrinkage and Selection Operator. It is a shrinkage estimator: by constructing a penalty function, it shrinks the coefficients and sets some of them to zero, which yields a more parsimonious model. It therefore retains the advantage of subset selection, and is a biased estimation method suited to data with multicollinearity.
The method comprises the following steps:
dividing a region into a plurality of grid areas at the spatial level and, for each grid area, calculating the annual carbon dioxide emission data in that grid area by a bottom-up spatialization method to serve as the carbon dioxide emission inventory data of the grid area;
specifically, the region is divided into a plurality of 10 km × 10 km square grid areas at the spatial level and, for each square grid area, the annual carbon dioxide emission data in that grid area is calculated by a bottom-up spatialization method and used as the carbon dioxide emission inventory data of the grid area;
wherein the carbon dioxide emission inventory data comprises: total carbon dioxide emission data, energy carbon dioxide emission data, industrial carbon dioxide emission data, agricultural carbon dioxide emission data, service industry carbon dioxide emission data, urban life carbon dioxide emission data, rural life carbon dioxide emission data, traffic carbon dioxide emission data, aviation carbon dioxide emission data, highway carbon dioxide emission data, railway carbon dioxide emission data, water transport carbon dioxide emission data, and industrial process carbon dioxide emission data.
Wherein the total carbon dioxide emission data is the sum of energy carbon dioxide emission data and industrial process carbon dioxide emission data;
the energy carbon dioxide emission data is the sum of industrial carbon dioxide emission data, agricultural carbon dioxide emission data, service industry carbon dioxide emission data, urban life carbon dioxide emission data, rural life carbon dioxide emission data and traffic carbon dioxide emission data; the traffic carbon dioxide emission data is the sum of aviation carbon dioxide emission data, highway carbon dioxide emission data, railway carbon dioxide emission data and water transport carbon dioxide emission data.
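The aggregation rules just stated (total = energy + industrial process; energy = six sector indexes; traffic = four transport modes) can be checked on an inventory record with a short sketch like the one below. The dictionary keys, the relative tolerance and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence

ENERGY_PARTS = ("industrial", "agricultural", "service_industry",
                "urban_life", "rural_life", "traffic")
TRAFFIC_PARTS = ("aviation", "highway", "railway", "water_transport")
TOTAL_PARTS = ("energy", "industrial_process")


def check_inventory(record: Dict[str, float], tol: float = 1e-6) -> List[str]:
    """Return the names of aggregate indexes that do not equal the sum of their parts."""
    problems: List[str] = []

    def expect(name: str, parts: Sequence[str]) -> None:
        expected = sum(record[p] for p in parts)
        if abs(record[name] - expected) > tol * max(1.0, abs(expected)):
            problems.append(name)

    expect("traffic", TRAFFIC_PARTS)
    expect("energy", ENERGY_PARTS)
    expect("total", TOTAL_PARTS)
    return problems


# Illustrative record for one grid cell; the numbers are made up.
record = {
    "aviation": 1.0, "highway": 5.0, "railway": 2.0, "water_transport": 2.0,
    "traffic": 10.0,
    "industrial": 40.0, "agricultural": 3.0, "service_industry": 7.0,
    "urban_life": 12.0, "rural_life": 8.0,
    "energy": 80.0,
    "industrial_process": 20.0,
    "total": 100.0,
}
issues = check_inventory(record)  # empty list when all three sums hold
```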
Wherein the data comes from a high-spatial-resolution emission grid database.
Each emission data item in the carbon dioxide emission inventory data corresponds to a carbon dioxide emission index of the same name: total, energy, industrial, agricultural, service industry, urban life, rural life, traffic, aviation, highway, railway, water transport, and industrial process, giving 13 carbon dioxide emission indexes in total.
Inputting the carbon dioxide emission inventory data of a grid area of the region into a pre-trained two-stage non-negative Lasso model, and outputting a first prediction result and a second prediction result;
adding the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result for the region, thereby predicting the atmospheric PM2.5 concentration of the region.
Wherein the two-stage non-negative Lasso model comprises: a first stage non-negative Lasso model and a second stage non-negative Lasso model;
wherein, the non-negative Lasso model in the first stage is as follows:
Figure BDA0002463240570000091
wherein the content of the first and second substances,
Figure BDA0002463240570000101
is a first prediction result; xttA vector composed of carbon dioxide total emission data in carbon dioxide emission list data of a certain grid area of the region is used as an independent variable;
Figure BDA0002463240570000102
representing first stage model coefficient estimates, i.e. first stage model coefficient true values βttAn estimated value of (d);
wherein a first objective function is constructed:
Figure BDA0002463240570000103
wherein Figure BDA0002463240570000104 is the first-stage squared error; Figure BDA0002463240570000105 represents the regularization term, i.e. the Lasso part of the model; λ_n is the weight coefficient of the first-stage regularization term; y_pm2.5 is the vector of PM2.5 concentration data from all environmental monitoring sites;
converting the first objective function into a matrix form:
Figure BDA0002463240570000106
wherein Figure BDA0002463240570000107 is the transpose of the first-stage model coefficient estimate Figure BDA0002463240570000108; X_tt' is the transpose of X_tt; 1 denotes a p1 × 1 column vector whose entries are all 1, where p1 equals the dimension of the first-stage model coefficients;
solving for the first-stage model coefficient estimate Figure BDA0002463240570000109 by quadratic programming.
The second stage non-negative Lasso model is:
Figure BDA00024632405700001010
wherein X_-tt is a vector formed by the remaining carbon dioxide emission data, other than the total carbon dioxide emission data, in a certain grid area of the area, used as the independent variable; Figure BDA00024632405700001011 represents the second-stage model coefficient estimate; res_pm2.5 is the second prediction result;
wherein a second objective function is constructed:
Figure BDA00024632405700001012
wherein Figure BDA00024632405700001013 is the second-stage model coefficient estimate, i.e. the estimate of the true second-stage model coefficient β_-tt; Figure BDA00024632405700001014 is the second-stage squared error; Figure BDA00024632405700001015 represents the regularization term, i.e. the Lasso part of the model; λ_m is the weight coefficient of the second-stage regularization term; y_pm2.5 is the vector of PM2.5 data from all environmental monitoring sites; Figure BDA0002463240570000111 is the vector formed by the first prediction results, i.e. the vector of PM2.5 values predicted for each environmental monitoring site by the first-stage model;
converting the second objective function into a matrix form:
Figure BDA0002463240570000112
wherein Figure BDA0002463240570000113 is the transpose of the second-stage model coefficient estimate Figure BDA0002463240570000114; X_-tt' is the transpose of X_-tt; 1 denotes a p2 × 1 column vector whose entries are all 1, where p2 equals the dimension of the second-stage model coefficients;
solving for the second-stage model coefficient estimate Figure BDA0002463240570000115 by quadratic programming;
inputting the total carbon dioxide emission data in the carbon dioxide emission list data of a certain grid area of the area into the first-stage non-negative Lasso model, and outputting a first prediction result;
and inputting the rest carbon dioxide emission data except the total carbon dioxide emission data in the carbon dioxide emission data of a certain grid area of the region into the second-stage non-negative Lasso model, and outputting a second prediction result.
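The formulas of this passage survive only as image references. One reading that is consistent with the surrounding definitions is written out below. It is an assumption, not the patent's exact notation: any constant scaling of the squared-error term (such as 1/2 or 1/n) cannot be recovered from the text, and the hat notation is introduced here for illustration.

```latex
% Stage 1: total CO2 emission as the only regressor
\hat{y}_{pm2.5} = X_{tt}\,\hat{\beta}_{tt}, \qquad
\hat{\beta}_{tt} = \arg\min_{\beta_{tt} \ge 0}
  \; \lVert y_{pm2.5} - X_{tt}\beta_{tt} \rVert_2^2 + \lambda_n \sum_{j} \beta_{tt,j}

% Matrix form (the constant y'y is dropped); because \beta \ge 0,
% the L1 norm equals the linear term 1'\beta
\min_{\beta_{tt} \ge 0} \;
  \beta_{tt}' X_{tt}' X_{tt} \beta_{tt}
  - 2\, y_{pm2.5}' X_{tt} \beta_{tt}
  + \lambda_n \mathbf{1}' \beta_{tt}

% Stage 2: remaining emission indexes regressed on the stage-1 fitting error
\widehat{res}_{pm2.5} = X_{-tt}\,\hat{\beta}_{-tt}, \qquad
\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt} \ge 0}
  \; \lVert (y_{pm2.5} - \hat{y}_{pm2.5}) - X_{-tt}\beta_{-tt} \rVert_2^2
  + \lambda_m \mathbf{1}' \beta_{-tt}

% Final prediction
\widehat{PM}_{2.5} = \hat{y}_{pm2.5} + \widehat{res}_{pm2.5}
```

Under the non-negativity constraint the penalty is linear in the coefficients, so each stage is a quadratic objective with simple bound constraints, which is why quadratic programming applies.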
As shown in fig. 2, the two-stage non-negative Lasso model training step specifically includes:
dividing the area, at the spatial level, into a plurality of 10 km × 10 km square grid areas and, for each square grid area, accounting for the annual carbon dioxide emission training data within it by a bottom-up spatialization method, to serve as the carbon dioxide emission list training data of that grid area;
wherein the carbon dioxide emissions manifest training data comprises: carbon dioxide total emission training data, energy carbon dioxide emission training data, industrial carbon dioxide emission training data, agricultural carbon dioxide emission training data, service industry carbon dioxide emission training data, urban living carbon dioxide emission training data, rural living carbon dioxide emission training data, traffic carbon dioxide emission training data, aviation carbon dioxide emission training data, highway carbon dioxide emission training data, railway carbon dioxide emission training data, water transportation carbon dioxide emission training data, and industrial process carbon dioxide emission training data;
wherein the carbon dioxide total emission training data is the sum of energy carbon dioxide emission training data and industrial process carbon dioxide emission training data;
the energy carbon dioxide emission training data is the sum of industrial carbon dioxide emission training data, agricultural carbon dioxide emission training data, service industry carbon dioxide emission training data, urban living carbon dioxide emission training data, rural living carbon dioxide emission training data, traffic carbon dioxide emission training data, aviation carbon dioxide emission training data, highway carbon dioxide emission training data, railway carbon dioxide emission training data and water transport carbon dioxide emission training data;
the traffic carbon dioxide emission training data is the sum of aviation carbon dioxide emission training data, highway carbon dioxide emission training data, railway carbon dioxide emission training data and water transport carbon dioxide emission training data.
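The aggregation relations above can be illustrated with a short Python sketch. The dictionary keys are hypothetical names chosen for this example. The sketch also assumes that the four transport modes enter the energy figure once, through the traffic item, although the text lists them alongside traffic; that reading is an assumption.

```python
TRANSPORT_MODES = ("aviation", "highway", "railway", "water_transport")
ENERGY_SECTORS = ("industrial", "agricultural", "service",
                  "urban_life", "rural_life", "traffic")


def complete_inventory(cell):
    """Fill in the three aggregate items from the ten leaf items of one grid cell."""
    out = dict(cell)
    # traffic = aviation + highway + railway + water transport
    out["traffic"] = sum(cell[m] for m in TRANSPORT_MODES)
    # energy = sum of the sector items (transport counted once, via traffic: assumption)
    out["energy"] = sum(out[s] for s in ENERGY_SECTORS)
    # total = energy + industrial process
    out["total"] = out["energy"] + cell["industrial_process"]
    return out


# Made-up leaf values for a single grid cell.
example = complete_inventory({
    "industrial": 5.0, "agricultural": 1.0, "service": 2.0,
    "urban_life": 3.0, "rural_life": 1.0,
    "aviation": 1.0, "highway": 4.0, "railway": 2.0, "water_transport": 1.0,
    "industrial_process": 2.0,
})
```

The completed dictionary carries all 13 items: ten leaves plus traffic, energy and total.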
The data come from a high-spatial-resolution emission grid database.
Each item of the carbon dioxide emission list training data corresponds to a carbon dioxide emission training index of the same name (for example, the total carbon dioxide emission training data corresponds to the total carbon dioxide emission training index). The same holds for the energy, industrial, agricultural, service industry, urban living, rural living, traffic, aviation, highway, railway, water transport and industrial process carbon dioxide emission training data, giving 13 training indexes in total.
Calculating the grid area to which each environment monitoring station belongs according to the station position of each environment monitoring station, namely longitude data and latitude data of each station, and the longitude data and the latitude data of four vertexes of the corresponding grid area;
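A minimal sketch of this lookup follows. It assumes that each grid area is axis-aligned in longitude and latitude, so that the bounding box of its four vertices is the grid area itself; the function names and coordinates are illustrative only.

```python
def cell_bounds(vertices):
    """Bounding box (lon_min, lon_max, lat_min, lat_max) of four (lon, lat) vertices."""
    lons = [lon for lon, _ in vertices]
    lats = [lat for _, lat in vertices]
    return min(lons), max(lons), min(lats), max(lats)


def find_cell(site_lon, site_lat, cells):
    """Return the id of the grid area containing the site, or None.

    Intervals are half-open so that a site lying on a shared edge
    is assigned to exactly one grid area.
    """
    for cell_id, vertices in cells.items():
        lon_min, lon_max, lat_min, lat_max = cell_bounds(vertices)
        if lon_min <= site_lon < lon_max and lat_min <= site_lat < lat_max:
            return cell_id
    return None


# Two made-up neighbouring grid areas.
cells = {
    "A": [(116.0, 39.0), (116.1, 39.0), (116.1, 39.1), (116.0, 39.1)],
    "B": [(116.1, 39.0), (116.2, 39.0), (116.2, 39.1), (116.1, 39.1)],
}
```

For a large region a direct index computation from the grid origin would be faster than this linear scan; the scan is kept here because the text describes the comparison against the four vertices.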
selecting N circles of grid areas around each environmental monitoring site according to the grid area to which the site belongs, and taking the atmospheric PM2.5 pollution concentration data of the grid area in which the site is located as the atmospheric PM2.5 pollution concentration training data for each circle of grid areas;
selecting the carbon dioxide emission list training data in each circle of grid areas around the environmental monitoring site; for the carbon dioxide emission list training data in each circle of grid areas, computing the mean for each carbon dioxide index class to obtain averaged carbon dioxide emission list training data for that class; for example, suppose one of the N circles of grid areas around a site contains 8 grids, each carrying carbon dioxide emission list training data for the 13 carbon dioxide emission training indexes, so that the circle holds 8 × 13 values in total; averaging then means adding the 8 values of the same carbon dioxide emission training index across the 8 grids and dividing by 8 to obtain the averaged training data for that index, and repeating the operation 13 times to obtain averaged training data for each of the 13 carbon dioxide emission training indexes.
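The ring averaging can be sketched as follows. The sketch assumes that grids are addressed by integer (row, column) indexes and that the k-th circle consists of the grids at Chebyshev distance exactly k from the site's grid, which gives 8 grids for the first circle as in the example above. The names are hypothetical.

```python
def ring_cells(row, col, k):
    """Grid indexes on the k-th circle around (row, col): Chebyshev distance exactly k."""
    return [(row + dr, col + dc)
            for dr in range(-k, k + 1)
            for dc in range(-k, k + 1)
            if max(abs(dr), abs(dc)) == k]


def ring_mean(inventory, row, col, k):
    """Average each emission index over the grids of circle k that have data."""
    cells = [inventory[rc] for rc in ring_cells(row, col, k) if rc in inventory]
    if not cells:
        raise ValueError("no inventory data on this circle")
    return {name: sum(c[name] for c in cells) / len(cells) for name in cells[0]}


# Made-up inventory: the 8 grids around (5, 5) carry totals 0.0 .. 7.0.
inventory = {rc: {"total": float(i)} for i, rc in enumerate(ring_cells(5, 5, 1))}
averaged = ring_mean(inventory, 5, 5, 1)
```

Grids missing from the inventory (for example at the edge of the region) are skipped rather than treated as zero, which is a design choice of this sketch.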
The averaged carbon dioxide emission list training data in each circle of grid areas around the environmental monitoring sites, together with the atmospheric PM2.5 pollution concentration training data in that circle of grid areas, are divided into training set data and test set data in a ratio of 7:3; that is, the averaged carbon dioxide emission list training data and the atmospheric PM2.5 pollution concentration training data for 70% of the environmental monitoring sites are taken as training set data, and those for the remaining 30% of the environmental monitoring sites are taken as test set data;
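A 7:3 split by monitoring site might look like the sketch below. The text does not say how sites are assigned to the two sets; random assignment with a fixed seed is an assumption, as are the function and parameter names.

```python
import random


def split_sites(site_ids, train_fraction=0.7, seed=0):
    """Shuffle site ids reproducibly and split them into (train, test) lists."""
    ids = sorted(site_ids)              # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)    # local generator, global state untouched
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


train_sites, test_sites = split_sites(range(10))
```

Splitting by site rather than by individual record keeps all circles belonging to one site in the same set.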
using the atmospheric PM2.5 pollution concentration training data in the N-th circle of grid areas around the environmental monitoring site as the dependent variable, and the total carbon dioxide emission training data in the averaged carbon dioxide emission list training data of the N-th circle of grid areas around the site as the independent variable, establishing a first-stage non-negative Lasso model:
Figure BDA0002463240570000131
wherein Figure BDA0002463240570000132 is the first prediction result; X_tt1 represents the vector formed by the total carbon dioxide emission training data in the averaged carbon dioxide emission list training data of the N-th circle of grid areas around the environmental monitoring site; Figure BDA0002463240570000133 represents the first-stage model coefficient training estimate;
when solving the first-stage model coefficient training estimation value, constructing the following objective function:
Figure BDA0002463240570000134
wherein Figure BDA0002463240570000135 is the first-stage training squared error; Figure BDA0002463240570000136 represents the training regularization term; λ_n1 is the weight coefficient of the first-stage training regularization term;
converting the objective function in (8) into a matrix form:
Figure BDA0002463240570000137
wherein Figure BDA0002463240570000138 represents the objective function; Figure BDA0002463240570000139 is the transpose of Figure BDA00024632405700001310; X_tt1' is the transpose of X_tt1; 1 denotes a p1 × 1 column vector whose entries are all 1, where p1 equals the dimension of the first-stage model coefficients;
solving for the first-stage model coefficient training estimate Figure BDA00024632405700001311 by quadratic programming.
Calculating the fitting error res_pm2.5 of the first-stage non-negative Lasso model:
Figure BDA0002463240570000141
Taking the fitting error res_pm2.5 calculated in formula (10) as the dependent variable, and the remaining carbon dioxide emission list training data, excluding the total carbon dioxide emission training data, as the independent variable (X_-tt1), establishing a second-stage non-negative Lasso model:
Figure BDA0002463240570000142
wherein X_-tt1 is a vector formed by the remaining carbon dioxide emission training data, other than the total carbon dioxide emission training data, in a certain grid area of the area, used as the independent variable; Figure BDA0002463240570000143 represents the second-stage model coefficient training estimate; res_pm2.5 is the second prediction result;
when solving the second stage model coefficient training estimation value, constructing the following objective function:
Figure BDA0002463240570000144
wherein Figure BDA0002463240570000145 is the training estimate of the second-stage model coefficient β_-tt1, i.e. the second-stage model coefficient training estimate; Figure BDA0002463240570000146 is the second-stage training squared error; Figure BDA0002463240570000147 represents the regularization term, i.e. the Lasso part of the model; λ_m1 is the weight coefficient of the second-stage regularization term;
converting the second objective function in (12) into a matrix form:
Figure BDA0002463240570000148
wherein Figure BDA0002463240570000149 is the transpose of the second-stage model coefficient estimate Figure BDA00024632405700001410; X_-tt1' is the transpose of X_-tt1; 1 denotes a p2 × 1 column vector whose entries are all 1, where p2 equals the dimension of the second-stage model coefficients;
for (13), solving for the second-stage model coefficient training estimate Figure BDA00024632405700001411 by quadratic programming.
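The two training stages can be sketched in pure Python. Note the substitution: the text solves each stage by quadratic programming, whereas this sketch uses cyclic coordinate descent with clipping at zero, which targets the same kind of objective (squared error plus a linear penalty under non-negativity) without a QP library. The unscaled squared-error term, the absence of an intercept and all names are assumptions.

```python
def nn_lasso(X, y, lam, n_iter=500):
    """Minimise ||y - X b||^2 + lam * sum(b) subject to b >= 0 (coordinate descent)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # equals y - X beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col)
            if norm == 0.0:
                continue  # an all-zero column cannot explain anything
            # correlation of column j with the residual excluding its own contribution
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid))
            new = max(0.0, (rho - lam / 2.0) / norm)  # 1-D minimiser, clipped at 0
            delta = new - beta[j]
            if delta:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta


def fit_two_stage(x_total, X_rest, y, lam1, lam2):
    """Stage 1 on total emissions, stage 2 on the stage-1 fitting error."""
    beta_tt = nn_lasso([[v] for v in x_total], y, lam1)[0]
    res = [yi - beta_tt * v for yi, v in zip(y, x_total)]  # fitting error of stage 1
    beta_rest = nn_lasso(X_rest, res, lam2)
    return beta_tt, beta_rest


# Made-up data: PM2.5 exactly twice the total emission, so stage 2 has nothing left to fit.
beta_tt, beta_rest = fit_two_stage([1.0, 2.0, 3.0],
                                   [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                                   [2.0, 4.0, 6.0], lam1=0.0, lam2=0.0)
```

Raising the penalty shrinks the coefficients: with a penalty weight of 28 on the same single-column data the fitted coefficient drops from 2 to 1, and a regressor that is negatively related to the target is clipped to zero.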
For the reserved 30% test set, the trained two-stage model is used to make predictions, yielding a first prediction result and a second prediction result, and the two are added to obtain the predicted atmospheric PM2.5 concentration at the environmental monitoring site:
Figure BDA00024632405700001412
evaluating the model's prediction performance using the mean absolute percentage error (MAPE):
Figure BDA00024632405700001413
wherein observed_t represents the actual atmospheric PM2.5 pollution concentration at the environmental monitoring site; predicted_t is the predicted atmospheric PM2.5 concentration at the site, i.e. the prediction output by the two-stage non-negative Lasso model; n1 denotes the number of prediction samples; the subscript t identifies the t-th sample (environmental monitoring site);
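A minimal MAPE helper, under the assumption that the value is reported as a fraction (0.1 meaning 10%) rather than multiplied by 100; the formula image does not survive, so the scaling is not known.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, returned as a fraction."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    # An observed value of 0 raises ZeroDivisionError; MAPE is undefined there.
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


# Made-up values: each prediction is off by 10% of the observation.
score = mape([10.0, 20.0], [9.0, 22.0])
```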
for each environmental monitoring site, using as training set data 70% of the carbon dioxide emission list data in each circle of the N circles of grid areas selected around the site, together with the atmospheric PM2.5 pollution concentration data of the grid area to which the site belongs, and repeating the modeling process until the model evaluation index MAPE converges on the test set data (the model is considered to have converged when the MAPE value decreases by no more than 0.01), thereby obtaining the final two-stage non-negative Lasso model.
In this embodiment, as shown in fig. 1, 3 circles of grid areas around each environmental monitoring site are selected, each circle containing a plurality of grids; the dots represent environmental monitoring sites and the squares represent 10 km × 10 km grids. 70% of the carbon dioxide emission list data in each of the 3 circles of grid areas around the selected sites, together with the atmospheric PM2.5 pollution concentration data of the grid area to which each site belongs, are used as training set data, and the modeling process is repeated until the model evaluation index MAPE converges on the test set data (the model is considered to have converged when the MAPE value decreases by no more than 0.01), giving the final two-stage non-negative Lasso model. If MAPE does not meet this standard, the number of circles is increased and training is repeated until it does.
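One possible reading of this stopping rule is sketched below: train with an increasing number of circles and stop as soon as the test-set MAPE improves by no more than 0.01. The callable `evaluate`, the cap `max_rings` and the choice to return the smaller circle count on convergence are all assumptions of this sketch.

```python
def choose_ring_count(evaluate, max_rings=10, tol=0.01):
    """Increase the circle count until test-set MAPE improves by no more than tol.

    evaluate(n) is a caller-supplied function that trains the two-stage
    model with n circles and returns its MAPE on the test set.
    """
    best_n, best_mape = 1, evaluate(1)
    for n in range(2, max_rings + 1):
        current = evaluate(n)
        if best_mape - current <= tol:
            return best_n, best_mape  # further circles no longer help
        best_n, best_mape = n, current
    return best_n, best_mape


# Made-up MAPE values per circle count, standing in for real training runs.
scores = {1: 0.40, 2: 0.30, 3: 0.295, 4: 0.29}
n_rings, final_mape = choose_ring_count(scores.get, max_rings=4)
```

With these invented scores the gain from 2 to 3 circles is only 0.005, so the search stops and keeps 2 circles.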
FIG. 3 shows the distribution of prediction errors obtained when the converged model predicts PM2.5 for each sample of the test set; the prediction errors are concentrated between 0 and 15, indicating that the model has good prediction accuracy.
The prediction error is calculated as follows:
error_t = |observed_t - predicted_t|
wherein error_t represents the prediction error of the model on the t-th sample; observed_t denotes the true PM2.5 value of the t-th sample; predicted_t denotes the model's predicted PM2.5 value for the t-th sample.
In this embodiment, the two-stage non-negative Lasso model can be used to simulate how the atmospheric PM2.5 concentration at the environmental monitoring sites around a grid changes when the carbon dioxide emission list data of that grid area is changed.
Example 1.
The invention also provides an atmospheric PM2.5 concentration prediction system based on the two-stage non-negative Lasso model, the system comprising:
the grid division module is used for dividing a certain area into a plurality of grid areas on a spatial level, and checking annual carbon dioxide emission data in each grid area by using a bottom-up spatialization method for each grid area to serve as carbon dioxide emission list data of the grid area; and
the prediction module is used for inputting carbon dioxide emission list data of a certain grid area of the area into a pre-trained two-stage non-negative Lasso model and outputting a first prediction result and a second prediction result;
adding the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result of the area.
Example 2.
Embodiment 2 of the present invention may also provide a computer device including: at least one processor, memory, at least one network interface, and a user interface. The various components in the device are coupled together by a bus system. It will be appreciated that a bus system is used to enable communications among the components. The bus system includes a power bus, a control bus, and a status signal bus in addition to a data bus.
The user interface may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, trackball, touch pad, or touch screen).
It will be appreciated that the memory in the embodiments disclosed herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory can be Random Access Memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, the memory stores elements, executable modules or data structures, or a subset thereof, or an expanded set thereof as follows: an operating system and an application program.
The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application programs, including various application programs such as a Media Player (Media Player), a Browser (Browser), etc., are used to implement various application services. The program for implementing the method of the embodiment of the present disclosure may be included in an application program.
In the above embodiments, the processor may further be configured to call a program or an instruction stored in the memory, specifically, a program or an instruction stored in the application program, and the processor is configured to:
the steps of the method of the invention are performed.
The method of the present invention may be applied in or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The Processor may be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, or discrete hardware components. The methods, steps, and logic blocks disclosed in embodiment 1 may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with embodiment 1 may be directly implemented by a hardware decoding processor, or may be implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques of the present invention may be implemented by executing the functional blocks (e.g., procedures, functions, and so on) of the present invention. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Example 3
Embodiment 3 of the present invention may also provide a nonvolatile storage medium for storing a computer program. The computer program may realize the steps of the above-described method embodiments when executed by a processor.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. A method for predicting atmospheric PM2.5 concentration based on a two-stage non-negative Lasso model, the method comprising:
dividing a certain area into a plurality of grid areas on a spatial level, and checking annual carbon dioxide emission data in each grid area by using a bottom-up spatialization method for each grid area to serve as carbon dioxide emission list data of the grid area;
inputting carbon dioxide emission list data of a certain grid area of the area into a pre-trained two-stage non-negative Lasso model, and outputting a first prediction result and a second prediction result;
adding the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result of the area, thereby predicting the atmospheric PM2.5 concentration of the area.
2. The atmospheric PM2.5 concentration prediction method based on the two-stage non-negative Lasso model of claim 1, characterized in that dividing a certain area into a plurality of grid areas on a spatial level, and accounting for the annual carbon dioxide emission data in each grid area by a bottom-up spatialization method to serve as the carbon dioxide emission list data of the grid area, specifically comprises:
dividing the area, at the spatial level, into a plurality of 10 km × 10 km square grid areas and, for each square grid area, accounting for the annual carbon dioxide emission data within it by a bottom-up spatialization method, to serve as the carbon dioxide emission list data of that grid area;
wherein the carbon dioxide emission list data comprises: total carbon dioxide emission data, energy carbon dioxide emission data, industrial carbon dioxide emission data, agricultural carbon dioxide emission data, service industry carbon dioxide emission data, urban life carbon dioxide emission data, rural life carbon dioxide emission data, traffic carbon dioxide emission data, aviation carbon dioxide emission data, highway carbon dioxide emission data, railway carbon dioxide emission data, water transport carbon dioxide emission data, and industrial process carbon dioxide emission data.
3. The atmospheric PM2.5 concentration prediction method based on the two-stage non-negative Lasso model of claim 1, characterized in that inputting the carbon dioxide emission list data of a certain grid area of the area into a pre-trained two-stage non-negative Lasso model and outputting a first prediction result and a second prediction result specifically comprises:
the two-stage non-negative Lasso model includes: a first stage non-negative Lasso model and a second stage non-negative Lasso model;
wherein, the non-negative Lasso model in the first stage is as follows:
Figure FDA0002463240560000011
wherein Figure FDA0002463240560000021 is the first prediction result; X_tt is a vector composed of the total carbon dioxide emission data in the carbon dioxide emission list data of a certain grid area of the region; Figure FDA0002463240560000022 represents the first-stage model coefficient estimate;
wherein a first objective function is constructed:
Figure FDA0002463240560000023
wherein Figure FDA0002463240560000024 is the first-stage squared error; Figure FDA0002463240560000025 represents the regularization term, i.e. the Lasso part of the model; λ_n is the weight coefficient of the first-stage regularization term; y_pm2.5 is the vector of PM2.5 concentration data from all environmental monitoring sites;
converting the first objective function into a matrix form:
wherein Figure FDA0002463240560000027 is the transpose of the first-stage model coefficient estimate Figure FDA0002463240560000028; X_tt' is the transpose of X_tt; 1 denotes a p1 × 1 column vector whose entries are all 1, where p1 equals the dimension of the first-stage model coefficients;
solving for the first-stage model coefficient estimate Figure FDA0002463240560000029 by quadratic programming.
The second stage non-negative Lasso model is:
Figure FDA00024632405600000210
wherein X_-tt is a vector formed by the remaining carbon dioxide emission data, other than the total carbon dioxide emission data, in a certain grid area of the region, used as the independent variable; Figure FDA00024632405600000211 represents the second-stage model coefficient estimate; res_pm2.5 is the second prediction result;
wherein a second objective function is constructed:
Figure FDA00024632405600000212
wherein Figure FDA00024632405600000213 is the second-stage model coefficient estimate; Figure FDA00024632405600000214 is the second-stage squared error; Figure FDA00024632405600000215 represents the regularization term; λ_m is the weight coefficient of the second-stage regularization term;
converting the second objective function into a matrix form:
Figure FDA00024632405600000216
wherein Figure FDA0002463240560000031 is the transpose of the second-stage model coefficient estimate Figure FDA0002463240560000032; X_-tt' is the transpose of X_-tt; 1 denotes a p2 × 1 column vector whose entries are all 1, where p2 equals the dimension of the second-stage model coefficients;
solving for the second-stage model coefficient estimate Figure FDA0002463240560000033 by quadratic programming.
Inputting the total carbon dioxide emission data in the carbon dioxide emission list data of a certain grid area of the area into the first-stage non-negative Lasso model, and outputting a first prediction result;
inputting the remaining carbon dioxide emission data except the total carbon dioxide emission data in the carbon dioxide emission list data of a certain grid area of the region into the second-stage non-negative Lasso model, and outputting a second prediction result.
4. The atmospheric PM2.5 concentration prediction method based on the two-stage non-negative Lasso model of claim 3, characterized in that the training step of the two-stage non-negative Lasso model specifically comprises:
dividing the area, at the spatial level, into a plurality of 10 km × 10 km square grid areas and, for each square grid area, accounting for the annual carbon dioxide emission training data within it by a bottom-up spatialization method, to serve as the carbon dioxide emission list training data of that grid area;
wherein the carbon dioxide emissions manifest training data comprises: carbon dioxide total emission training data, energy carbon dioxide emission training data, industrial carbon dioxide emission training data, agricultural carbon dioxide emission training data, service industry carbon dioxide emission training data, urban living carbon dioxide emission training data, rural living carbon dioxide emission training data, traffic carbon dioxide emission training data, aviation carbon dioxide emission training data, highway carbon dioxide emission training data, railway carbon dioxide emission training data, water transportation carbon dioxide emission training data, and industrial process carbon dioxide emission training data;
calculating the grid area to which each environment monitoring station belongs according to the station position of each environment monitoring station, namely longitude data and latitude data of each environment monitoring station, and longitude data and latitude data of four vertexes of the corresponding grid area;
selecting N circles of grid areas around each environmental monitoring site according to the grid area to which the site belongs, and taking the atmospheric PM2.5 pollution concentration data of the grid area in which the site is located as the atmospheric PM2.5 pollution concentration training data for each circle of grid areas;
selecting carbon dioxide emission list training data in each circle of grid area around the environment monitoring station; for the carbon dioxide emission list training data in each circle of grid area, solving the corresponding carbon dioxide class mean value according to different carbon dioxide index classes to obtain the carbon dioxide emission list training data of the corresponding carbon dioxide index class subjected to averaging treatment;
training carbon dioxide emission lists for averaging processing in each circle of grid area around environment monitoring siteData, and atmospheric PM within the circle of grid regions2.5The pollution concentration training data is divided into training set data and test set data according to the ratio of 7: 3; namely, training data of carbon dioxide emission lists which are subjected to equalization processing in each circle of grid area around 70 percent of environment monitoring sites and atmosphere PM in the circle of grid area2.5Taking the pollution concentration training data as training set data; training data of carbon dioxide emission lists which are equalized in each circle of grid area around 30% of environment monitoring sites and atmospheric PM in the circle of grid area2.5Taking pollution concentration training data as test set data;
using the atmospheric PM2.5 pollution concentration training data in the Nth circle of grid areas around the environmental monitoring station as the dependent variable, and the total carbon dioxide emission training data in the averaged carbon dioxide emission list training data in the Nth circle of grid areas around the environmental monitoring station as the independent variable, to establish a first-stage non-negative Lasso model:
Figure FDA0002463240560000041
wherein
Figure FDA0002463240560000042
is the first prediction result; X_tt1 represents the vector formed by the total carbon dioxide emission training data in the averaged carbon dioxide emission list training data in the Nth circle of grid areas around the environmental monitoring station;
Figure FDA0002463240560000043
represents the training estimate of the first-stage model coefficients;
to solve for the training estimate of the first-stage model coefficients, the following objective function is constructed:
Figure FDA0002463240560000044
wherein
Figure FDA0002463240560000045
is the first-stage training squared error;
Figure FDA0002463240560000046
represents the training regularization term, and λ_n1 is the weight coefficient of the first-stage training regularization term;
converting the objective function in formula (8) into matrix form:
Figure FDA0002463240560000047
wherein
Figure FDA0002463240560000048
represents the objective function;
Figure FDA0002463240560000049
is the transpose of
Figure FDA00024632405600000410
; X_tt1' is the transpose of X_tt1; 1 denotes a column vector of dimension p_1 × 1 whose entries are all 1, where p_1 equals the dimension of the first-stage model coefficients;
solving, by quadratic programming, the training estimate of the first-stage model coefficients
Figure FDA00024632405600000411
;
calculating the fitting error res_pm2.5 of the first-stage non-negative Lasso model:
Figure FDA0002463240560000051
taking the fitting error res_pm2.5 calculated in formula (10) as the dependent variable, and the remaining carbon dioxide emission list training data other than the total carbon dioxide emission training data as the independent variables (X_-tt1), to establish a second-stage non-negative Lasso model:
Figure FDA0002463240560000052
wherein X_-tt1 is the vector formed by the remaining carbon dioxide emission training data, other than the total carbon dioxide emission training data, in a given grid area of the region, used as the independent variables;
Figure FDA0002463240560000053
represents the training estimate of the second-stage model coefficients; the model's estimate of res_pm2.5 is the second prediction result;
to solve for the training estimate of the second-stage model coefficients, the following objective function is constructed:
Figure FDA0002463240560000054
wherein
Figure FDA0002463240560000055
is the training estimate of the second-stage model coefficients;
Figure FDA0002463240560000056
is the second-stage training squared error;
Figure FDA0002463240560000057
represents the regularization term; λ_m1 is the weight coefficient of the second-stage training regularization term;
converting the objective function in formula (12) into matrix form:
Figure FDA0002463240560000058
wherein
Figure FDA0002463240560000059
is the transpose of
Figure FDA00024632405600000510
; X_-tt1' is the transpose of X_-tt1; 1 denotes a column vector of dimension p_2 × 1 whose entries are all 1, where p_2 equals the dimension of the second-stage model coefficients;
solving, by quadratic programming, the training estimate of the second-stage model coefficients
Figure FDA00024632405600000511
;
for the reserved 30% test set, predicting with each stage of the trained two-stage model to obtain the corresponding first prediction result and second prediction result, and adding the two prediction results to obtain the predicted value of the atmospheric PM2.5 concentration data of the environmental monitoring station:
Figure FDA00024632405600000512
evaluating the prediction performance of the model by the mean absolute percentage error (MAPE):
Figure FDA00024632405600000513
wherein observed_t represents the actual value of the atmospheric PM2.5 pollution concentration data of the environmental monitoring station; predicted_t is the predicted value of the atmospheric PM2.5 concentration data of the environmental monitoring station, namely the prediction result output by the two-stage non-negative Lasso model; N1 denotes the number of prediction samples; the subscript t identifies the t-th sample;
for each environmental monitoring station, taking 70% of the carbon dioxide emission list data in each circle of the N circles of grid areas around the selected station, together with the atmospheric PM2.5 pollution concentration data of the grid area to which the station belongs, as training set data, and repeating the above modeling process until the model evaluation index MAPE converges on the test set data, thereby obtaining the final two-stage non-negative Lasso model.
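The two-stage training and evaluation recited in claim 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim solves each stage by quadratic programming, whereas this sketch swaps in cyclic coordinate descent, which minimises the same kind of objective (squared error plus a weighted sum of non-negative coefficients). The exact form of the objective and of MAPE is assumed from the surrounding text, since the formulas survive here only as figure references. All function names (`nonneg_lasso`, `fit_two_stage`, `predict_two_stage`, `mape`) and the toy numbers in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def nonneg_lasso(X: Matrix, y: Sequence[float], lam: float,
                 n_iter: int = 500, tol: float = 1e-12) -> List[float]:
    """Minimise ||y - X b||^2 + lam * sum(b) subject to b >= 0.

    Assumed objective; solved by cyclic coordinate descent instead of
    the quadratic programming named in the claim.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b for the starting point b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the residual that excludes j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = max(0.0, (rho - lam / 2.0) / col_sq[j])
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
                max_step = max(max_step, abs(delta))
        if max_step < tol:
            break
    return beta


def predict(X: Matrix, beta: Sequence[float]) -> List[float]:
    """Linear prediction X b, one value per row."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def fit_two_stage(X_total: Matrix, X_rest: Matrix, pm25: Sequence[float],
                  lam1: float, lam2: float) -> Tuple[List[float], List[float]]:
    """Stage 1 regresses PM2.5 on total emissions; stage 2 regresses the
    stage-1 fitting error on the remaining emission categories."""
    beta1 = nonneg_lasso(X_total, pm25, lam1)
    res = [obs - fit for obs, fit in zip(pm25, predict(X_total, beta1))]
    beta2 = nonneg_lasso(X_rest, res, lam2)
    return beta1, beta2


def predict_two_stage(X_total: Matrix, X_rest: Matrix,
                      beta1: Sequence[float], beta2: Sequence[float]) -> List[float]:
    """Final prediction is the sum of the two stage predictions."""
    return [a + b for a, b in zip(predict(X_total, beta1), predict(X_rest, beta2))]


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (scaling by 100 is assumed)."""
    return 100.0 / len(observed) * sum(
        abs((o - p) / o) for o, p in zip(observed, predicted))
```

A possible usage, with made-up numbers: `beta1, beta2 = fit_two_stage([[1.0], [2.0]], [[1.0, 0.0], [0.0, 1.0]], [2.5, 5.0], 0.0, 0.0)` fits both stages on the training stations, after which `mape(y_test, predict_two_stage(Xt_total, Xt_rest, beta1, beta2))` scores the held-out 30% of stations.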
5. An atmospheric PM2.5 concentration prediction system based on a two-stage non-negative Lasso model, characterized in that the system comprises:
a grid division module, configured to divide a region spatially into a plurality of grid areas and, for each grid area, to account for the annual carbon dioxide emission data within that grid area by a bottom-up spatialization method, to serve as the carbon dioxide emission list data of the grid area; and
a prediction module, configured to input the carbon dioxide emission list data of a given grid area of the region into a pre-trained two-stage non-negative Lasso model and to output a first prediction result and a second prediction result;
and to add the first prediction result and the second prediction result to obtain the PM2.5 concentration prediction result for the region.
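The grid handling behind the grid division module, together with the station matching and per-circle averaging recited in claim 4, might be sketched as follows. This is a simplified illustration under stated assumptions: coordinates are taken to be already projected to planar kilometres (the claims instead compare station longitude and latitude with the four vertices of each cell), a "circle" k is interpreted as the cells at Chebyshev distance exactly k from the station's cell, and the names `cell_of`, `ring_cells`, and `ring_means` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) index of a grid cell


def cell_of(x_km: float, y_km: float, x0_km: float = 0.0,
            y0_km: float = 0.0, size_km: float = 10.0) -> Cell:
    """Grid cell containing a point, for a grid whose lower-left corner is
    (x0_km, y0_km) and whose cells are size_km on a side (assumed planar)."""
    row = math.floor((y_km - y0_km) / size_km)
    col = math.floor((x_km - x0_km) / size_km)
    return (row, col)


def ring_cells(center: Cell, k: int) -> List[Cell]:
    """Cells forming the k-th circle around `center`; k = 0 is the cell itself."""
    r0, c0 = center
    return [(r0 + dr, c0 + dc)
            for dr in range(-k, k + 1)
            for dc in range(-k, k + 1)
            if max(abs(dr), abs(dc)) == k]


def ring_means(inventory: Dict[Cell, Dict[str, float]],
               center: Cell, k: int) -> Dict[str, float]:
    """Per-category mean emission over the cells of circle k that have data."""
    cells = [c for c in ring_cells(center, k) if c in inventory]
    if not cells:
        return {}
    categories = inventory[cells[0]].keys()
    return {cat: sum(inventory[c][cat] for c in cells) / len(cells)
            for cat in categories}
```

For a station at planar position (25 km, 5 km), `cell_of(25.0, 5.0)` gives the station's cell, and `ring_means(inventory, cell_of(25.0, 5.0), k)` for k = 1..N yields one averaged emission record per circle, which would then be paired with the station's PM2.5 reading as a training sample.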
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 4 when executing the computer program.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the method according to any one of claims 1 to 4.
CN202010325992.0A 2020-04-23 2020-04-23 Atmospheric PM2.5 concentration prediction method and system based on two-stage non-negative Lasso model Active CN111581792B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010325992.0A CN111581792B (en) 2020-04-23 2020-04-23 Atmospheric PM2.5 concentration prediction method and system based on two-stage non-negative Lasso model

Publications (2)

Publication Number Publication Date
CN111581792A true CN111581792A (en) 2020-08-25
CN111581792B CN111581792B (en) 2021-01-08

Family

ID=72120308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010325992.0A Active CN111581792B (en) 2020-04-23 2020-04-23 Atmospheric PM2.5 concentration prediction method and system based on two-stage non-negative Lasso model

Country Status (1)

Country Link
CN (1) CN111581792B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120290520A1 (en) * 2011-05-11 2012-11-15 Affectivon Ltd. Affective response predictor for a stream of stimuli
US20130326625A1 (en) * 2012-06-05 2013-12-05 Los Alamos National Security, Llc Integrating multiple data sources for malware classification
CN105550766A (en) * 2015-12-04 2016-05-04 山东大学 Micro-grid robustness multi-target operation optimization method containing renewable energy resources
CN106094786A (en) * 2016-05-30 2016-11-09 宁波大学 Industrial process flexible measurement method based on integrated-type independent entry regression model
CN106124700A (en) * 2016-06-20 2016-11-16 重庆大学 A kind of band is from the Electronic Nose non-targeted interference Gas Distinguishing Method expressed
CN106529081A (en) * 2016-12-03 2017-03-22 安徽新华学院 PM2.5 real-time level prediction method and system based on neural net
CN107451545A (en) * 2017-07-15 2017-12-08 西安电子科技大学 The face identification method of Non-negative Matrix Factorization is differentiated based on multichannel under soft label
CN107766296A (en) * 2017-09-30 2018-03-06 东南大学 The method that evaluation path traffic characteristic influences on Inhaled Particulate Matters Emission concentration
CN108009674A (en) * 2017-11-27 2018-05-08 上海师范大学 Air PM2.5 concentration prediction methods based on CNN and LSTM fused neural networks
CN109344963A (en) * 2018-10-17 2019-02-15 西安邮电大学 Ultra-large hidden layer node fast selecting method in extreme learning machine
CN110580386A (en) * 2019-08-23 2019-12-17 生态环境部环境规划院 Traffic department carbon dioxide emission space gridding method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
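CAI, YAPING et al.: "Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches", 《AGRICULTURAL AND FOREST METEOROLOGY》 *
SHAN, YULI et al.: "Methodology and applications of city level CO2 emission accounts in China", 《JOURNAL OF CLEANER PRODUCTION》 *
王健颖: "Numerical experimental study of the influence of different emission source inventories on PM2.5 in the Beijing-Tianjin-Hebei region", 《China Master's Theses Full-text Database, Engineering Science and Technology I》 *
翁克瑞 et al.: "Decomposition-ensemble prediction model for PM2.5 concentration combining TPE-XGBOOST and LassoLars", 《*** Engineering Theory and Practice》 *
蔡博峰 et al.: "Study of carbon dioxide emissions in Tianjin based on a 1 km grid", 《Acta Scientiae Circumstantiae》 *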

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884274A (en) * 2021-01-11 2021-06-01 生态环境部环境规划院 Carbon dioxide source-sink matching method and device based on emission grid
CN113144844A (en) * 2021-03-22 2021-07-23 国家能源集团国源电力有限公司 Desulfurizer flow control method and device and coal combustion system
CN116108998A (en) * 2023-02-22 2023-05-12 葛洲坝集团交通投资有限公司 Expressway construction project carbon emission prediction method and system
CN116108998B (en) * 2023-02-22 2023-12-15 葛洲坝集团交通投资有限公司 Expressway construction project carbon emission prediction method and system
CN117540346A (en) * 2024-01-09 2024-02-09 四川国蓝中天环境科技集团有限公司 Order class variable redundancy removing method for high-dimensional regression modeling of atmospheric pollution data
CN117540346B (en) * 2024-01-09 2024-03-19 四川国蓝中天环境科技集团有限公司 Order class variable redundancy removing method for high-dimensional regression modeling of atmospheric pollution data

Also Published As

Publication number Publication date
CN111581792B (en) 2021-01-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant