CN111581792A

CN111581792A - Atmospheric PM2.5 concentration prediction method and system based on a two-stage non-negative Lasso model

Info

Publication number: CN111581792A
Application number: CN202010325992.0A
Authority: CN
Inventors: 蔡博峰; 刘译璟; 鲁瑞; 魏太云; 曹丽斌; 伍鹏程; 庞凌云
Original assignee: Environmental Planning Institute Of Ministry Of Ecology And Environment
Current assignee: Environmental Planning Institute Of Ministry Of Ecology And Environment
Priority date: 2020-04-23
Filing date: 2020-04-23
Publication date: 2020-08-25
Anticipated expiration: 2040-04-23
Also published as: CN111581792B

Abstract

The invention belongs to the technical field of atmospheric pollutant concentration prediction, and in particular relates to a method for predicting atmospheric PM2.5 (fine particulate matter) concentration based on a two-stage non-negative Lasso model. The method comprises: dividing an area into a plurality of grid areas at the spatial level and, for each grid area, calculating the annual carbon dioxide emission data within it using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory data of that grid area; inputting the carbon dioxide emission data of a grid area of the area, collected in real time, into a pre-trained two-stage non-negative Lasso model, which outputs a first prediction result and a second prediction result; and adding the first prediction result and the second prediction result to obtain the predicted PM2.5 concentration of the area, thereby predicting the atmospheric PM2.5 concentration of the area.

Description

Atmospheric PM2.5 concentration prediction method and system based on a two-stage non-negative Lasso model

Technical Field

The invention belongs to the technical field of atmospheric pollutant concentration prediction, and in particular relates to a method and system for predicting atmospheric PM2.5 (fine particulate matter) concentration based on a two-stage non-negative Lasso model.

Background

PM2.5 refers to particles in ambient air with an aerodynamic equivalent diameter of 2.5 microns or less; the higher their concentration in the air, the more serious the air pollution. With the rapid advance of industrialization, atmospheric haze has become increasingly severe, and PM2.5 is one of its main culprits. Because the particles are small, they can remain suspended in the air for a long time and spread, carrying toxic and harmful substances into the respiratory tract and lungs. Frequent large-scale haze disrupts people's daily travel and poses a direct threat to human health. PM2.5 is the main component of haze, so the primary task in controlling haze and improving air quality is to control PM2.5, and PM2.5 concentration prediction is the main content of air quality prediction. Recent studies have shown that composite air pollution typified by PM2.5 has become a significant environmental problem affecting people's quality of life.

Simulation is a modeling technique that reproduces the behavior of a system by means such as numerical calculation. Unlike a general prediction model, which pursues only high prediction accuracy, a simulation model places more emphasis on interpretability and on the simulated process itself, and constraints must be added to the model according to actual business conditions.

Research shows that in China, whose energy structure is dominated by fossil fuels, CO2 emissions and atmospheric pollutant emissions share the same root (fossil fuels), the same process (combustion) and the same source (the same equipment or the same emission outlet), so the two are very closely related.

Atmospheric PM2.5 concentration is typically predicted from pollutant emission data and meteorological condition data, using multivariate regression models or random forest models. However, these conventional methods have the following problems:

1) the signs of the model coefficients cannot be guaranteed to be consistent with actual business conditions;

2) the model coefficients cannot be guaranteed to be all non-zero; that is, it cannot be guaranteed that every carbon dioxide index influences the PM2.5 concentration at the environmental monitoring station. This is inconsistent with actual business conditions, so the prediction error is large and the prediction accuracy is reduced.

Disclosure of Invention

The invention aims to overcome the above defects of the prior art and provides a method for predicting atmospheric PM2.5 concentration based on a two-stage non-negative Lasso (Least Absolute Shrinkage and Selection Operator) model. The method builds a model of the PM2.5 concentration at different environmental monitoring stations and the carbon dioxide emission inventory data of the area around each station, and analyzes the specific relationship between the two.

The invention provides a method for predicting atmospheric PM2.5 concentration based on a two-stage non-negative Lasso model, the method comprising:

dividing an area into a plurality of grid areas at the spatial level and, for each grid area, calculating the annual carbon dioxide emission data within it using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory data of that grid area;

inputting the carbon dioxide emission inventory data of a grid area of the area into a pre-trained two-stage non-negative Lasso model, and outputting a first prediction result and a second prediction result;

adding the first prediction result and the second prediction result to obtain the predicted PM2.5 concentration of the area, thereby predicting the atmospheric PM2.5 concentration of the area.

As an improvement of the above technical solution, dividing the area into grid areas and calculating the carbon dioxide emission inventory data of each grid area specifically comprises:

dividing the area at the spatial level into a plurality of square grid areas of 10 km × 10 km and, for each square grid area, calculating the annual carbon dioxide emission data within it using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory data of that grid area;

wherein the carbon dioxide emission inventory data comprises: total carbon dioxide emission data, energy carbon dioxide emission data, industrial carbon dioxide emission data, agricultural carbon dioxide emission data, service industry carbon dioxide emission data, urban living carbon dioxide emission data, rural living carbon dioxide emission data, traffic carbon dioxide emission data, aviation carbon dioxide emission data, highway carbon dioxide emission data, railway carbon dioxide emission data, water transport carbon dioxide emission data, and industrial process carbon dioxide emission data.

As an improvement of the above technical solution, inputting the carbon dioxide emission inventory data of a grid area of the area into the pre-trained two-stage non-negative Lasso model and outputting the first prediction result and the second prediction result specifically comprises the following.

The two-stage non-negative Lasso model includes a first-stage non-negative Lasso model and a second-stage non-negative Lasso model.

The first-stage non-negative Lasso model is:

$$\hat{y}_{pm2.5} = X_{tt}\,\hat{\beta}_{tt} \tag{1}$$

where $\hat{y}_{pm2.5}$ is the first prediction result; $X_{tt}$ is the vector formed by the total carbon dioxide emission data in the carbon dioxide emission inventory data of a grid area of the area; and $\hat{\beta}_{tt}$ is the estimated value of the first-stage model coefficient.

A first objective function is constructed:

$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt} \ge 0} \left\{ \lVert y_{pm2.5} - X_{tt}\beta_{tt} \rVert_2^2 + \lambda_n \lVert \beta_{tt} \rVert_1 \right\} \tag{2}$$

where $\lVert y_{pm2.5} - X_{tt}\beta_{tt} \rVert_2^2$ is the first-stage squared error; $\lambda_n \lVert \beta_{tt} \rVert_1$ is the regularization term, the Lasso portion of the model; $\lambda_n$ is the weight coefficient of the first-stage regularization term; and $y_{pm2.5}$ is the vector of PM2.5 concentration data of all environmental monitoring stations.

The first objective function is converted into matrix form (the constant term $y_{pm2.5}'\,y_{pm2.5}$ is omitted because it does not affect the minimizer):

$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt} \ge 0} \left\{ \beta_{tt}' X_{tt}' X_{tt}\,\beta_{tt} - 2\,y_{pm2.5}' X_{tt}\,\beta_{tt} + \lambda_n \mathbf{1}'\beta_{tt} \right\} \tag{3}$$

where $\beta_{tt}'$ is the transpose of the first-stage model coefficient vector; $X_{tt}'$ is the transpose of $X_{tt}$; $\mathbf{1}$ denotes a column vector of dimension $p_1 \times 1$ whose entries are all 1; and $p_1$ equals the dimension of the first-stage model coefficients.

The estimated value $\hat{\beta}_{tt}$ of the first-stage model coefficient is solved by quadratic programming.
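As an illustration of what such a solver computes, the following is a minimal sketch of a non-negative Lasso fit in pure Python. It assumes the objective is the squared error plus $\lambda$ times the sum of the coefficients, subject to non-negativity, and it uses cyclic coordinate descent in place of a general quadratic-programming routine. The function name `fit_nonneg_lasso` and its parameters are hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def fit_nonneg_lasso(X: Sequence[Sequence[float]], y: Sequence[float],
                     lam: float, n_iter: int = 500,
                     tol: float = 1e-12) -> List[float]:
    """Minimise ||y - X b||^2 + lam * sum(b) subject to b >= 0.

    Hypothetical helper: cyclic coordinate descent, used here instead of
    the quadratic-programming solver named in the text.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta (beta starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column keeps a zero coefficient
            # Inner product of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            # Stationary point of the one-dimensional problem, clipped at zero.
            new_bj = max(0.0, (rho - lam / 2.0) / col_sq[j])
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
                max_step = max(max_step, abs(delta))
        if max_step < tol:
            break
    return beta
```

Because the non-negativity constraint turns the L1 norm into a plain sum, each coordinate update has a closed form, which is why no soft-thresholding on the negative side is needed.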

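The second-stage non-negative Lasso model is:

$$\widehat{res}_{pm2.5} = X_{-tt}\,\hat{\beta}_{-tt} \tag{4}$$

where $X_{-tt}$, used as the independent variable, is the vector formed by the remaining carbon dioxide emission data of a grid area of the area, excluding the total carbon dioxide emission data; $\hat{\beta}_{-tt}$ is the estimated value of the second-stage model coefficients; and $\widehat{res}_{pm2.5}$ is the second prediction result.

A second objective function is constructed:

$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt} \ge 0} \left\{ \lVert (y_{pm2.5} - \hat{y}_{pm2.5}) - X_{-tt}\beta_{-tt} \rVert_2^2 + \lambda_m \lVert \beta_{-tt} \rVert_1 \right\} \tag{5}$$

where $\hat{\beta}_{-tt}$ is the estimated value of the second-stage model coefficients; $\lVert (y_{pm2.5} - \hat{y}_{pm2.5}) - X_{-tt}\beta_{-tt} \rVert_2^2$ is the second-stage squared error, in which $y_{pm2.5}$ is the vector of PM2.5 concentration data of all environmental monitoring stations and $\hat{y}_{pm2.5}$ is the first prediction result; $\lambda_m \lVert \beta_{-tt} \rVert_1$ is the regularization term; and $\lambda_m$ is the weight coefficient of the second-stage regularization term.

The second objective function is converted into matrix form (the constant term is omitted):

$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt} \ge 0} \left\{ \beta_{-tt}' X_{-tt}' X_{-tt}\,\beta_{-tt} - 2\,(y_{pm2.5} - \hat{y}_{pm2.5})' X_{-tt}\,\beta_{-tt} + \lambda_m \mathbf{1}'\beta_{-tt} \right\} \tag{6}$$

where $\beta_{-tt}'$ is the transpose of the second-stage model coefficient vector; $X_{-tt}'$ is the transpose of $X_{-tt}$; $\mathbf{1}$ denotes a column vector of dimension $p_2 \times 1$ whose entries are all 1; and $p_2$ equals the dimension of the second-stage model coefficients.

The estimated value $\hat{\beta}_{-tt}$ of the second-stage model coefficients is solved by quadratic programming.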

The total carbon dioxide emission data in the carbon dioxide emission inventory data of a grid area of the area is input into the first-stage non-negative Lasso model, which outputs the first prediction result.

The remaining carbon dioxide emission data of that grid area, excluding the total carbon dioxide emission data, is input into the second-stage non-negative Lasso model, which outputs the second prediction result.
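The prediction step for a single grid area can be sketched as follows. This is only an illustration under stated assumptions: the first stage is taken to have one coefficient (for the total emission), the second stage one coefficient per remaining index, and the function and parameter names (`predict_pm25`, `beta_tt`, `beta_rest`) are hypothetical.

```python
from typing import Sequence


def predict_pm25(total_emission: float, beta_tt: float,
                 other_emissions: Sequence[float],
                 beta_rest: Sequence[float]) -> float:
    """Return first prediction + second prediction for one grid area."""
    if len(other_emissions) != len(beta_rest):
        raise ValueError("one coefficient is needed per remaining emission index")
    # First stage: total CO2 emission times the first-stage coefficient.
    first = total_emission * beta_tt
    # Second stage: remaining CO2 indices times the second-stage coefficients.
    second = sum(x * b for x, b in zip(other_emissions, beta_rest))
    return first + second
```

For instance, `predict_pm25(10.0, 2.0, [1.0, 3.0], [0.5, 0.0])` adds a first-stage contribution of 20.0 and a second-stage contribution of 0.5.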

As an improvement of the above technical solution, the training of the two-stage non-negative Lasso model specifically comprises the following steps.

The area is divided at the spatial level into a plurality of square grid areas of 10 km × 10 km and, for each square grid area, the annual carbon dioxide emission training data within it is calculated using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory training data of that grid area.

The carbon dioxide emission inventory training data comprises: total carbon dioxide emission training data, energy carbon dioxide emission training data, industrial carbon dioxide emission training data, agricultural carbon dioxide emission training data, service industry carbon dioxide emission training data, urban living carbon dioxide emission training data, rural living carbon dioxide emission training data, traffic carbon dioxide emission training data, aviation carbon dioxide emission training data, highway carbon dioxide emission training data, railway carbon dioxide emission training data, water transport carbon dioxide emission training data, and industrial process carbon dioxide emission training data.

The grid area to which each environmental monitoring station belongs is determined from the station's position, namely its longitude and latitude, and from the longitude and latitude of the four vertices of each grid area.
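The station-to-grid assignment described above can be sketched as a bounding-box test. This sketch assumes that each grid area is axis-aligned in longitude and latitude, so that its four vertices define a rectangle; a station lying exactly on a shared edge is assigned to the first matching grid. The names `find_grid` and `Vertex` are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Vertex = Tuple[float, float]  # (longitude, latitude)


def find_grid(station: Vertex,
              grids: Dict[str, Sequence[Vertex]]) -> Optional[str]:
    """Return the id of the first grid whose vertex bounding box contains the station."""
    lon, lat = station
    for grid_id, vertices in grids.items():
        lons = [v[0] for v in vertices]
        lats = [v[1] for v in vertices]
        if min(lons) <= lon <= max(lons) and min(lats) <= lat <= max(lats):
            return grid_id
    return None  # the station lies outside every grid
```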

According to the grid area to which each environmental monitoring station belongs, N rings (circles) of grid areas around the station are selected, and the atmospheric PM2.5 pollution concentration data acquired from the grid area in which the station is located is used as the atmospheric PM2.5 pollution concentration training data for each ring of grid areas.
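One possible reading of "rings of grid areas" is sketched below. It assumes that grid areas are indexed by integer (row, column) pairs and that the k-th ring consists of the cells at Chebyshev distance exactly k from the station's cell, so the first ring has 8 cells. Both the indexing scheme and the function name `ring_cells` are assumptions made for illustration.

```python
from typing import List, Tuple


def ring_cells(row: int, col: int, k: int) -> List[Tuple[int, int]]:
    """Cells whose Chebyshev distance from (row, col) is exactly k."""
    if k < 0:
        raise ValueError("ring index must be non-negative")
    return [
        (row + dr, col + dc)
        for dr in range(-k, k + 1)
        for dc in range(-k, k + 1)
        if max(abs(dr), abs(dc)) == k
    ]
```

Under this reading, ring 0 is the station's own cell, ring 1 contains 8 cells and ring 2 contains 16.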

The carbon dioxide emission inventory training data in each ring of grid areas around the environmental monitoring station is selected. For the data in each ring, the mean value of each carbon dioxide index category is calculated, giving the averaged carbon dioxide emission inventory training data for that category.

The averaged carbon dioxide emission inventory training data in each ring of grid areas around the environmental monitoring stations, together with the atmospheric PM2.5 pollution concentration training data for that ring, is divided into training set data and test set data at a ratio of 7:3. That is, the data for 70% of the environmental monitoring stations is used as training set data, and the data for the remaining 30% of the stations is used as test set data.
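A 7:3 split by station can be sketched as below. The text does not say how stations are assigned to the two sets, so a seeded random shuffle is assumed here; `split_stations` and its parameters are hypothetical names.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_stations(station_ids: Sequence[Hashable], train_ratio: float = 0.7,
                   seed: int = 0) -> Tuple[List[Hashable], List[Hashable]]:
    """Shuffle station ids reproducibly and cut them into train and test parts."""
    ids = list(station_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split can be repeated
    cut = round(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]
```

Splitting by station, rather than by individual record, keeps all rings of one station in the same set.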

The first-stage non-negative Lasso model is established with the atmospheric PM2.5 pollution concentration training data for the Nth ring of grid areas around the environmental monitoring station as the dependent variable, and the total carbon dioxide emission training data in the averaged carbon dioxide emission inventory training data of the Nth ring as the independent variable:

$$\hat{y}_{pm2.5} = X_{tt1}\,\hat{\beta}_{tt1} \tag{7}$$

where $\hat{y}_{pm2.5}$ is the first prediction result; $X_{tt1}$ is the vector formed by the total carbon dioxide emission training data in the averaged carbon dioxide emission inventory training data of the Nth ring of grid areas around the environmental monitoring station; and $\hat{\beta}_{tt1}$ is the training estimate of the first-stage model coefficient.

To solve for the training estimate of the first-stage model coefficient, the following objective function is constructed:

$$\hat{\beta}_{tt1} = \arg\min_{\beta_{tt1} \ge 0} \left\{ \lVert y_{pm2.5} - X_{tt1}\beta_{tt1} \rVert_2^2 + \lambda_{n1} \lVert \beta_{tt1} \rVert_1 \right\} \tag{8}$$

where $\lVert y_{pm2.5} - X_{tt1}\beta_{tt1} \rVert_2^2$ is the first-stage training squared error, with $y_{pm2.5}$ the vector of atmospheric PM2.5 pollution concentration training data; $\lambda_{n1} \lVert \beta_{tt1} \rVert_1$ is the training regularization term; and $\lambda_{n1}$ is the weight coefficient of the first-stage training regularization term.

The objective function in (8) is converted into matrix form (the constant term is omitted):

$$\hat{\beta}_{tt1} = \arg\min_{\beta_{tt1} \ge 0} \left\{ \beta_{tt1}' X_{tt1}' X_{tt1}\,\beta_{tt1} - 2\,y_{pm2.5}' X_{tt1}\,\beta_{tt1} + \lambda_{n1} \mathbf{1}'\beta_{tt1} \right\} \tag{9}$$

where the expression in braces is the objective function; $\beta_{tt1}'$ is the transpose of $\beta_{tt1}$; $X_{tt1}'$ is the transpose of $X_{tt1}$; $\mathbf{1}$ denotes a column vector of dimension $p_1 \times 1$ whose entries are all 1; and $p_1$ equals the dimension of the first-stage model coefficients.

The training estimate $\hat{\beta}_{tt1}$ of the first-stage model coefficient is solved by quadratic programming.

The fitting error $res_{pm2.5}$ of the first-stage non-negative Lasso model is calculated:

$$res_{pm2.5} = y_{pm2.5} - \hat{y}_{pm2.5} \tag{10}$$

The fitting error $res_{pm2.5}$ calculated in formula (10) is taken as the dependent variable, and the remaining carbon dioxide emission inventory training data, excluding the total carbon dioxide emission training data, is taken as the independent variable ($X_{-tt1}$), to establish the second-stage non-negative Lasso model:

$$\widehat{res}_{pm2.5} = X_{-tt1}\,\hat{\beta}_{-tt1} \tag{11}$$

where $X_{-tt1}$, used as the independent variable, is the vector formed by the remaining carbon dioxide emission training data of a grid area of the area, excluding the total carbon dioxide emission training data; $\hat{\beta}_{-tt1}$ is the training estimate of the second-stage model coefficients; and $\widehat{res}_{pm2.5}$ is the second prediction result.

To solve for the training estimate of the second-stage model coefficients, the following objective function is constructed:

$$\hat{\beta}_{-tt1} = \arg\min_{\beta_{-tt1} \ge 0} \left\{ \lVert res_{pm2.5} - X_{-tt1}\beta_{-tt1} \rVert_2^2 + \lambda_{m1} \lVert \beta_{-tt1} \rVert_1 \right\} \tag{12}$$

where $\hat{\beta}_{-tt1}$ is the training estimate of the second-stage model coefficients; $\lVert res_{pm2.5} - X_{-tt1}\beta_{-tt1} \rVert_2^2$ is the second-stage training squared error; $\lambda_{m1} \lVert \beta_{-tt1} \rVert_1$ is the regularization term; and $\lambda_{m1}$ is the weight coefficient of the second-stage training regularization term.

The objective function in (12) is converted into matrix form (the constant term is omitted):

$$\hat{\beta}_{-tt1} = \arg\min_{\beta_{-tt1} \ge 0} \left\{ \beta_{-tt1}' X_{-tt1}' X_{-tt1}\,\beta_{-tt1} - 2\,res_{pm2.5}' X_{-tt1}\,\beta_{-tt1} + \lambda_{m1} \mathbf{1}'\beta_{-tt1} \right\} \tag{13}$$

where $\beta_{-tt1}'$ is the transpose of $\beta_{-tt1}$; $X_{-tt1}'$ is the transpose of $X_{-tt1}$; $\mathbf{1}$ denotes a column vector of dimension $p_2 \times 1$ whose entries are all 1; and $p_2$ equals the dimension of the second-stage model coefficients.
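The two training stages chain together as sketched below: fit the first stage on the total emission, take the residual, then fit the second stage on the remaining indices. This is a minimal sketch, not the patent's implementation. It assumes the penalized objective "squared error plus $\lambda$ times the sum of coefficients, coefficients non-negative" and solves it by coordinate descent rather than quadratic programming; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def nonneg_lasso(X: Matrix, y: Sequence[float], lam: float,
                 n_iter: int = 1000) -> List[float]:
    """Coordinate descent for min ||y - X b||^2 + lam * sum(b), b >= 0."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    sq = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if sq[j] == 0.0:
                continue  # all-zero column: coefficient stays at zero
            rho = sum(X[i][j] * resid[i] for i in range(n)) + sq[j] * beta[j]
            new = max(0.0, (rho - lam / 2.0) / sq[j])
            step = new - beta[j]
            if step:
                resid = [resid[i] - X[i][j] * step for i in range(n)]
                beta[j] = new
    return beta


def predict(X: Matrix, beta: Sequence[float]) -> List[float]:
    """Row-wise inner product of X with the coefficient vector."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def fit_two_stage(X_tt: Matrix, X_rest: Matrix, y: Sequence[float],
                  lam_n: float, lam_m: float) -> Tuple[List[float], List[float]]:
    """Stage 1 on total emission, stage 2 on the stage-1 fitting error."""
    beta_tt = nonneg_lasso(X_tt, y, lam_n)
    y_hat = predict(X_tt, beta_tt)
    res = [yi - yh for yi, yh in zip(y, y_hat)]  # fitting error of stage 1
    beta_rest = nonneg_lasso(X_rest, res, lam_m)
    return beta_tt, beta_rest
```

The sum of `predict(X_tt, beta_tt)` and `predict(X_rest, beta_rest)` then gives the combined prediction for each sample.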

For the reserved 30% test set, the trained two-stage model is used to make predictions, giving the corresponding first prediction result and second prediction result; the two are added to obtain the predicted atmospheric PM2.5 concentration at the environmental monitoring station:

$$predicted = \hat{y}_{pm2.5} + \widehat{res}_{pm2.5} \tag{14}$$

The prediction performance of the model is evaluated using the mean absolute percentage error (MAPE):

$$MAPE = \frac{1}{n1} \sum_{t=1}^{n1} \frac{\lvert observed_t - predicted_t \rvert}{observed_t} \times 100\% \tag{15}$$

where $observed_t$ is the actual value of the atmospheric PM2.5 pollution concentration data at the environmental monitoring station; $predicted_t$ is the predicted value of the atmospheric PM2.5 concentration data at the environmental monitoring station, that is, the prediction output by the two-stage non-negative Lasso model; $n1$ is the number of prediction samples; and the subscript $t$ identifies the t-th sample.
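The evaluation metric can be computed as in the sketch below, which assumes the standard definition of MAPE (mean of absolute errors relative to the observed values, expressed in percent). The function name `mape` is hypothetical, and observed values are assumed to be non-zero, as concentration measurements normally are.

```python
from typing import Sequence


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the observed value.
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)
```

For example, observations of 100 and 50 with predictions of 90 and 55 each carry a 10% relative error, so the MAPE is 10%.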

For each environmental monitoring station, 70% of the carbon dioxide emission inventory data in each ring of the N rings of grid areas around the selected station, together with the atmospheric PM2.5 pollution concentration data of the grid area to which the station belongs, is used as training set data, and the above modeling process is repeated until the model evaluation index MAPE converges on the test set data, yielding the final two-stage non-negative Lasso model.

The invention also provides a system for predicting atmospheric PM2.5 concentration based on a two-stage non-negative Lasso model, the system comprising:

a grid division module, used for dividing an area into a plurality of grid areas at the spatial level and, for each grid area, calculating the annual carbon dioxide emission data within it using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory data of that grid area; and

a prediction module, used for inputting the carbon dioxide emission inventory data of a grid area of the area into a pre-trained two-stage non-negative Lasso model and outputting a first prediction result and a second prediction result,

and for adding the first prediction result and the second prediction result to obtain the predicted PM2.5 concentration of the area.

The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the computer program.

The invention also provides a computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the above-mentioned method.

Compared with the prior art, the invention has the beneficial effects that:

the two-stage non-negative Lasso model is used to simulate the relationship between the carbon dioxide emission inventory data in grid areas and the atmospheric PM2.5 concentration data in the grid area to which a nearby environmental monitoring station belongs;

by adopting the two-stage non-negative Lasso model, it can be ensured that, when the atmospheric PM2.5 concentration at an environmental monitoring station is predicted, the carbon dioxide emission inventory data of all surrounding grids has a positive influence on the predicted PM2.5; expressed in the model, the coefficient corresponding to each carbon dioxide index is not 0, which better matches business practice;

the Lasso part of the model can shrink unimportant variables, remove collinearity among the indices, and ensure the generalization ability of the model;

at the spatial grid level, atmospheric PM2.5 concentration data can be predicted for any grid area, so that the impact on air quality can be evaluated quantitatively, and coordinated management of regional carbon dioxide emissions and atmospheric pollutants can be achieved quickly and effectively.

Drawings

FIG. 1 is a schematic diagram of the relative spatial positions of the grid areas to which an environmental monitoring station belongs when the two-stage non-negative Lasso model is trained in the atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model according to the present invention;

FIG. 2 is a schematic diagram of the training process of the two-stage non-negative Lasso model in the atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model according to the present invention;

FIG. 3 is a histogram of the distribution of the prediction errors of atmospheric PM2.5 concentration at environmental monitoring stations, obtained by testing and verifying the trained two-stage non-negative Lasso model on the test set, in the atmospheric PM2.5 concentration prediction method based on a two-stage non-negative Lasso model according to the present invention.

Detailed Description

The invention will now be further described with reference to the accompanying drawings.

The invention provides a method for predicting atmospheric PM2.5 concentration based on a two-stage non-negative Lasso model. When carbon dioxide emissions in an area (in essence, its energy use) change, the method can quickly simulate and predict the change in atmospheric PM2.5 concentration at environmental monitoring stations near the area; alternatively, it can predict and analyze possible changes in regional air quality (PM2.5 concentration) from the energy use and structural changes set out in regional planning.

The Lasso model, first proposed by Robert Tibshirani in 1996, stands for Least Absolute Shrinkage and Selection Operator. It is a shrinkage estimator: by constructing a penalty function, it shrinks the coefficients and sets some of them to zero, yielding a more parsimonious model. It thus retains the advantage of subset selection and is a biased estimation method for handling data with multicollinearity.
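For reference, the standard Lasso estimator in penalized form, and the non-negative variant obtained by adding a sign constraint, can be written as follows. The symbols $y$, $X$, $\beta$ and $\lambda$ here are generic and are not tied to any particular stage of the model.

```latex
% Standard Lasso (penalized form), with tuning parameter \lambda \ge 0
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta}\;
    \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1

% Non-negative Lasso: with \beta \ge 0 the L1 norm is linear,
% \lVert \beta \rVert_1 = \mathbf{1}'\beta, so the problem is a quadratic program
\hat{\beta}^{\text{nn}}
  = \arg\min_{\beta \ge 0}\;
    \beta' X' X \beta - 2\, y' X \beta + \lambda\, \mathbf{1}'\beta
```

The second form drops the constant $y'y$, which does not change the minimizer.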

The method comprises the following steps:

Specifically, the area is divided at the spatial level into a plurality of square grid areas of 10 km × 10 km and, for each square grid area, the annual carbon dioxide emission data within it is calculated using a bottom-up spatialization method, to serve as the carbon dioxide emission inventory data of that grid area.

The total carbon dioxide emission data is the sum of the energy carbon dioxide emission data and the industrial process carbon dioxide emission data.

The energy carbon dioxide emission data is the sum of the industrial, agricultural, service industry, urban living, rural living and traffic carbon dioxide emission data. The traffic carbon dioxide emission data is the sum of the aviation, highway, railway and water transport carbon dioxide emission data.

The data comes from a high-spatial-resolution emission grid database.

Each item of emission data in the carbon dioxide emission inventory data corresponds to one carbon dioxide emission index of the same name: total, energy, industrial, agricultural, service industry, urban living, rural living, traffic, aviation, highway, railway, water transport and industrial process. There are 13 carbon dioxide emission indices in total.
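The additive structure among the 13 indices can be sketched as follows: traffic, energy and total are derived from the ten leaf indices. The dictionary keys and the function name `derive_totals` are hypothetical labels chosen for this illustration, and the numbers in any usage are made up.

```python
from typing import Dict

ENERGY_PARTS = ("industrial", "agricultural", "service",
                "urban_living", "rural_living", "traffic")
TRAFFIC_PARTS = ("aviation", "highway", "railway", "water_transport")


def derive_totals(leaf: Dict[str, float]) -> Dict[str, float]:
    """Add the traffic, energy and total indices to the ten leaf indices."""
    out = dict(leaf)
    # Traffic is the sum of its four transport modes.
    out["traffic"] = sum(leaf[k] for k in TRAFFIC_PARTS)
    # Energy is the sum of five sector indices plus traffic.
    out["energy"] = sum(out[k] for k in ENERGY_PARTS)
    # Total is energy plus industrial-process emissions.
    out["total"] = out["energy"] + leaf["industrial_process"]
    return out
```

Because the total is a sum of the other indices, the 13 columns are linearly dependent, which is one reason a regularized model is a natural fit for this data.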

The two-stage non-negative Lasso model comprises a first-stage non-negative Lasso model and a second-stage non-negative Lasso model.

The first-stage non-negative Lasso model is:

$$\hat{y}_{pm2.5} = X_{tt}\,\hat{\beta}_{tt} \tag{1}$$

where $\hat{y}_{pm2.5}$ is the first prediction result; $X_{tt}$, used as the independent variable, is the vector formed by the total carbon dioxide emission data in the carbon dioxide emission inventory data of a grid area of the area; and $\hat{\beta}_{tt}$ is the estimated value of the first-stage model coefficient, that is, the estimate of the true first-stage model coefficient $\beta_{tt}$.

A first objective function is constructed:

$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt} \ge 0} \left\{ \lVert y_{pm2.5} - X_{tt}\beta_{tt} \rVert_2^2 + \lambda_n \lVert \beta_{tt} \rVert_1 \right\} \tag{2}$$

where $\lVert y_{pm2.5} - X_{tt}\beta_{tt} \rVert_2^2$ is the first-stage squared error; $\lambda_n \lVert \beta_{tt} \rVert_1$ is the regularization term, the Lasso portion of the model; $\lambda_n$ is the weight coefficient of the first-stage regularization term; and $y_{pm2.5}$ is the vector of PM2.5 concentration data of all environmental monitoring stations.

The first objective function is converted into matrix form (the constant term is omitted):

$$\hat{\beta}_{tt} = \arg\min_{\beta_{tt} \ge 0} \left\{ \beta_{tt}' X_{tt}' X_{tt}\,\beta_{tt} - 2\,y_{pm2.5}' X_{tt}\,\beta_{tt} + \lambda_n \mathbf{1}'\beta_{tt} \right\} \tag{3}$$

where $\beta_{tt}'$ is the transpose of the first-stage model coefficient vector.

The second-stage non-negative Lasso model is:

$$\widehat{res}_{pm2.5} = X_{-tt}\,\hat{\beta}_{-tt} \tag{4}$$

where $X_{-tt}$, used as the independent variable, is the vector formed by the remaining carbon dioxide emission data of a grid area of the area, excluding the total carbon dioxide emission data.

A second objective function is constructed:

$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt} \ge 0} \left\{ \lVert (y_{pm2.5} - \hat{y}_{pm2.5}) - X_{-tt}\beta_{-tt} \rVert_2^2 + \lambda_m \lVert \beta_{-tt} \rVert_1 \right\} \tag{5}$$

where $\hat{\beta}_{-tt}$ is the estimated value of the second-stage model coefficients, that is, the estimate of the true second-stage model coefficients $\beta_{-tt}$; $\lVert (y_{pm2.5} - \hat{y}_{pm2.5}) - X_{-tt}\beta_{-tt} \rVert_2^2$ is the second-stage squared error; $\lambda_m \lVert \beta_{-tt} \rVert_1$ is the regularization term, the Lasso portion of the model; $\lambda_m$ is the weight coefficient of the second-stage regularization term; $y_{pm2.5}$ is the vector of PM2.5 data of all environmental monitoring stations; and $\hat{y}_{pm2.5}$ is the vector formed by the first prediction results, that is, the vector of PM2.5 values predicted for each environmental monitoring station by the first-stage model.

The second objective function is converted into matrix form (the constant term is omitted):

$$\hat{\beta}_{-tt} = \arg\min_{\beta_{-tt} \ge 0} \left\{ \beta_{-tt}' X_{-tt}' X_{-tt}\,\beta_{-tt} - 2\,(y_{pm2.5} - \hat{y}_{pm2.5})' X_{-tt}\,\beta_{-tt} + \lambda_m \mathbf{1}'\beta_{-tt} \right\} \tag{6}$$

where $\beta_{-tt}'$ is the transpose of the second-stage model coefficient vector.

The estimated value $\hat{\beta}_{-tt}$ of the second-stage model coefficients is solved by quadratic programming.

The total carbon dioxide emission data in the carbon dioxide emission inventory data of a grid area of the area is input into the first-stage non-negative Lasso model, which outputs the first prediction result.

As shown in FIG. 2, the training of the two-stage non-negative Lasso model specifically comprises the following steps.

The total carbon dioxide emission training data is the sum of the energy carbon dioxide emission training data and the industrial process carbon dioxide emission training data.

The energy carbon dioxide emission training data is the sum of the industrial, agricultural, service industry, urban living, rural living and traffic carbon dioxide emission training data.

The traffic carbon dioxide emission training data is the sum of the aviation, highway, railway and water transport carbon dioxide emission training data.

The data comes from a high-spatial-resolution emission grid database.

Each item of emission training data in the carbon dioxide emission inventory training data corresponds to one carbon dioxide emission training index of the same name: total, energy, industrial, agricultural, service industry, urban living, rural living, traffic, aviation, highway, railway, water transport and industrial process. There are 13 carbon dioxide emission training indices in total.

The grid area to which each environmental monitoring station belongs is determined from the station's position, namely its longitude and latitude, and from the longitude and latitude of the four vertices of each grid area.

According to the grid area to which each environmental monitoring station belongs, N rings (circles) of grid areas around each station are selected, and the atmospheric PM2.5 pollution concentration data acquired from the grid area in which the station is located is used as the atmospheric PM2.5 pollution concentration training data for each ring of grid areas.

The carbon dioxide emission inventory training data in each ring of grid areas around the environmental monitoring station is selected. For the data in each ring, the mean value of each carbon dioxide index category is calculated, giving the averaged carbon dioxide emission inventory training data for that category. For example, suppose one of the N rings around a station contains 8 grids, each holding training data for the 13 carbon dioxide emission training indices; the ring then contains 8 × 13 training values. Averaging means that, for a given index, the 8 values of that index from the 8 grids are added together and averaged to give the averaged training data for that index; repeating this operation 13 times gives the averaged training data for each of the 13 carbon dioxide emission training indices.
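The per-ring averaging in the example above can be sketched as follows, assuming each grid is represented as a dictionary mapping index names to emission values. The function name `ring_mean` and the key names in the usage are hypothetical.

```python
from typing import Dict, Sequence


def ring_mean(ring_grids: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each emission index over all grids in one ring."""
    if not ring_grids:
        raise ValueError("a ring must contain at least one grid")
    n = len(ring_grids)
    # One mean per index name; every grid is assumed to carry the same indices.
    return {name: sum(g[name] for g in ring_grids) / n
            for name in ring_grids[0]}
```

With 8 grids and 13 indices per grid, the result holds 13 averaged values, one per index.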

wherein,

is the first prediction result; X_tt1 represents a vector formed by the carbon dioxide total emission training data in the averaged carbon dioxide emission list training data in the Nth circle of grid areas around the environment monitoring site;

represents the first-stage model coefficient training estimation value;

when solving the first-stage model coefficient training estimation value, constructing the following objective function:

wherein,

is the first-stage training squared error;

converting the objective function in (8) into a matrix form:

wherein,

represents the objective function;

is composed of

when solving the second stage model coefficient training estimation value, constructing the following objective function:

wherein,

is the training estimate of the second-stage model coefficients β_-tt1, namely the second-stage model coefficient training estimation value;

is the second-stage training squared error;

represents the regularization term, that is, the Lasso part of the model; λ_m1 is the weight coefficient of the second-stage regularization term;

converting the second objective function in (12) into a matrix form:

wherein,

is the transpose of the second-stage model coefficient estimate; X_-tt1' is the transpose of X_-tt1; 1 denotes a p₂ × 1 column vector in which every entry is 1, and p₂ equals the dimension of the second-stage model coefficients;

for (13), the second-stage model coefficient training estimation value is solved by utilizing quadratic programming
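For illustration only, the two-stage fit may be sketched as follows. Because the coefficients are constrained to be non-negative, the Lasso penalty reduces to a weighted sum of the coefficients and the problem is a bound-constrained quadratic program. The sketch does not call a quadratic programming solver; it substitutes cyclic coordinate descent, which minimizes the same assumed objective, the sum of squared errors plus lam times the sum of the coefficients, subject to non-negativity. The function names, the parameters `lam1` and `lam2`, and the exact scaling of the objective are assumptions and are not taken from the original formulas.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def nn_lasso(X: Matrix, y: Sequence[float], lam: float, n_iter: int = 200) -> List[float]:
    """Minimise sum_i (y_i - x_i . b)^2 + lam * sum_j b_j subject to b_j >= 0.

    Solved by cyclic coordinate descent (a stand-in for a QP solver).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot contribute
            # Inner product of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = max(0.0, (rho - lam / 2.0) / col_sq[j])
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def predict(X: Matrix, beta: Sequence[float]) -> List[float]:
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def fit_two_stage(X_total: Matrix, X_rest: Matrix, y: Sequence[float],
                  lam1: float = 0.0, lam2: float = 0.0) -> Tuple[List[float], List[float]]:
    """Stage 1 regresses y on total emissions; stage 2 regresses the stage-1
    fitting error (res) on the remaining emission indexes."""
    beta1 = nn_lasso(X_total, y, lam1)
    res = [yi - pi for yi, pi in zip(y, predict(X_total, beta1))]
    beta2 = nn_lasso(X_rest, res, lam2)
    return beta1, beta2


def predict_two_stage(X_total: Matrix, X_rest: Matrix,
                      beta1: Sequence[float], beta2: Sequence[float]) -> List[float]:
    """Final prediction = first prediction result + second prediction result."""
    return [a + b for a, b in zip(predict(X_total, beta1), predict(X_rest, beta2))]
```

A fixed iteration count keeps the sketch short; a practical implementation would normally stop once the coefficients change by less than a tolerance, or pass the matrix form to a dedicated quadratic programming routine.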

For the reserved 30% test set, the two-stage model obtained by training is used to perform prediction, yielding a first prediction result and a second prediction result; the two prediction results are added to obtain the predicted value of the atmospheric PM_2.5 concentration data of the environment monitoring site:

wherein observed_t represents the actual value of the atmospheric PM_2.5 pollution concentration data of the environment monitoring site; predicted_t is the predicted value of the atmospheric PM_2.5 concentration data of the environment monitoring site, namely the prediction result output by the two-stage non-negative Lasso model; n1 denotes the number of prediction samples; and the subscript t identifies the t-th sample (environment monitoring site);
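For illustration only, the model effect evaluation index MAPE may be computed from these quantities as sketched below. The sketch assumes the standard definition of the mean absolute percentage error and returns it as a fraction rather than a percentage; the function name `mape` is hypothetical.

```python
from typing import Sequence


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error over n1 samples, as a fraction.

    Assumption: standard definition, (1 / n1) * sum(|observed_t - predicted_t| / |observed_t|).
    An observed value of zero raises ZeroDivisionError.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    n1 = len(observed)
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n1
```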

for each environment monitoring site, 70% of the carbon dioxide emission list data in each circle of the N circles of grid areas around the selected environment monitoring site, together with the atmospheric PM_2.5 pollution concentration data in the grid area to which the environment monitoring site belongs, is used as training set data, and the modeling process is repeated until the model effect evaluation index MAPE converges on the test set data (the model is considered to have converged when the MAPE value decreases by no more than 0.01), thereby obtaining the final two-stage non-negative Lasso model.

In this embodiment, as shown in FIG. 1, 3 circles of grid areas around an environment monitoring site are selected, each circle of grid areas including a plurality of grids; the dots represent environment monitoring sites and the squares represent 10 × 10 km grids. 70% of the carbon dioxide emission list data in each circle of the 3 circles of grid areas around the selected environment monitoring site, together with the atmospheric PM_2.5 pollution concentration data in the grid area to which the environment monitoring site belongs, is used as training set data, and the modeling process is repeated until the model effect evaluation index MAPE converges on the test set data (the model is considered to have converged when the MAPE value decreases by no more than 0.01), thereby obtaining the final two-stage non-negative Lasso model. If the model effect evaluation index MAPE does not meet the standard, the number of circles is increased and training is carried out again until the standard is met.
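One possible reading of this stopping rule, given for illustration only, is sketched below: the number of circles is increased one at a time, and the search stops once an additional circle lowers the test-set MAPE by no more than 0.01. The callback `evaluate`, which is assumed to train the two-stage model for a given number of circles and return its test-set MAPE, and the function name `select_ring_count` are hypothetical.

```python
from typing import Callable, Tuple


def select_ring_count(evaluate: Callable[[int], float], start: int = 1,
                      max_rings: int = 10, tol: float = 0.01) -> Tuple[int, float]:
    """Increase the number of circles until the MAPE drop is no more than tol.

    evaluate(n) is a hypothetical callback: train with n circles, return test MAPE.
    Returns the last circle count that still gave a meaningful improvement,
    together with its MAPE.
    """
    prev = evaluate(start)
    for n in range(start + 1, max_rings + 1):
        cur = evaluate(n)
        if prev - cur <= tol:
            # Adding circle n no longer helps: treat the model as converged.
            return n - 1, prev
        prev = cur
    return max_rings, prev
```

The upper bound `max_rings` is a safeguard added in the sketch so that the loop always terminates; the original text does not state such a bound.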

FIG. 3 shows the distribution of the prediction errors obtained when the converged model predicts the PM_2.5 of each sample in the test set; the prediction errors are concentrated between 0 and 15, which indicates that the model has good prediction accuracy.

The prediction error is calculated as follows:

error_t = |observed_t - predicted_t|

wherein error_t represents the prediction error of the model on the t-th sample; observed_t represents the true PM_2.5 value corresponding to the t-th sample; and predicted_t represents the PM_2.5 value predicted by the model for the t-th sample.
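For illustration only, the per-sample prediction error and the share of samples whose error falls within a given interval (for example, 0 to 15) may be computed as sketched below; the function names `abs_errors` and `share_within` are hypothetical.

```python
from typing import List, Sequence


def abs_errors(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """error_t = |observed_t - predicted_t| for every sample t."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    return [abs(o - p) for o, p in zip(observed, predicted)]


def share_within(errors: Sequence[float], low: float, high: float) -> float:
    """Fraction of samples whose error lies in the closed interval [low, high]."""
    if not errors:
        raise ValueError("errors must be non-empty")
    return sum(1 for e in errors if low <= e <= high) / len(errors)
```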

In this embodiment, the two-stage non-negative Lasso model is used to simulate the change in atmospheric PM_2.5 concentration at the environment monitoring sites around a grid when the carbon dioxide emission list data of that grid area is changed.

Example 1.

The invention also provides an atmospheric PM_2.5 concentration prediction system based on the two-stage non-negative Lasso model, characterized in that the system comprises:

adding the first prediction result and the second prediction result to obtain the PM_2.5 concentration data prediction result of the area.

Example 2.

Embodiment 2 of the present invention may also provide a computer device including: at least one processor, memory, at least one network interface, and a user interface. The various components in the device are coupled together by a bus system. It will be appreciated that a bus system is used to enable communications among the components. The bus system includes a power bus, a control bus, and a status signal bus in addition to a data bus.

The user interface may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, trackball, touch pad, or touch screen).

It will be appreciated that the memory in the embodiments disclosed herein can be either volatile memory or non-volatile memory, or can include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory can be Random Access Memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

In some embodiments, the memory stores the following elements, namely executable modules or data structures, or a subset or an extended set thereof: an operating system and application programs.

The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application programs, including various application programs such as a Media Player (Media Player), a Browser (Browser), etc., are used to implement various application services. The program for implementing the method of the embodiment of the present disclosure may be included in an application program.

In the above embodiments, the processor may further be configured to call a program or an instruction stored in the memory, specifically a program or an instruction stored in the application program, so as to perform the steps of the method of the invention.

The method of the present invention may be applied in or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor may implement or perform the methods, steps, and logic blocks disclosed in embodiment 1. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with embodiment 1 may be directly implemented by a hardware decoding processor, or may be implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in a memory, and the processor reads information in the memory and completes the steps of the method in combination with its hardware.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques of the present invention may be implemented by executing the functional blocks (e.g., procedures, functions, and so on) of the present invention. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

Example 3.

Embodiment 3 of the present invention may also provide a nonvolatile storage medium for storing a computer program. The computer program may realize the steps of the above-described method embodiments when executed by a processor.

Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. An atmospheric PM_2.5 concentration prediction method based on a two-stage non-negative Lasso model, the method comprising:

2. The atmospheric PM_2.5 concentration prediction method based on the two-stage non-negative Lasso model of claim 1, characterized in that a certain area is divided into a plurality of grid areas on a spatial level, and for each grid area the annual carbon dioxide emission data in the grid area is calculated by a bottom-up spatialization method and used as the carbon dioxide emission list data of the grid area; the method specifically comprises the following steps:

3. The atmospheric PM_2.5 concentration prediction method based on the two-stage non-negative Lasso model of claim 1, characterized in that the carbon dioxide emission list data of a certain grid area of the area is input into a pre-trained two-stage non-negative Lasso model, which outputs a first prediction result and a second prediction result; the method specifically comprises the following steps:

wherein, the non-negative Lasso model in the first stage is as follows:

wherein,

is the first prediction result; X_tt is a vector consisting of the carbon dioxide total emission data in the carbon dioxide emission list data of a certain grid area of the region;

represents the first-stage model coefficient estimation value;

wherein a first objective function is constructed:

wherein,

is the first stage squared error;

converting the first objective function into a matrix form:

wherein,

for the first stage model coefficient estimation

The second stage non-negative Lasso model is:

wherein a second objective function is constructed:

wherein,

is the second-stage model coefficient estimation value;

is the second-stage squared error;

converting the second objective function into a matrix form:

wherein,

for second stage model coefficient estimation

inputting the remaining carbon dioxide emission data except the total carbon dioxide emission data in the carbon dioxide emission list data of a certain grid area of the region into the second-stage non-negative Lasso model, and outputting a second prediction result.

4. The atmospheric PM_2.5 concentration prediction method based on the two-stage non-negative Lasso model of claim 3, characterized in that the training step of the two-stage non-negative Lasso model specifically comprises:

dividing the averaged carbon dioxide emission list training data in each circle of grid areas around the environment monitoring sites, and the atmospheric PM_2.5 pollution concentration training data in that circle of grid areas, into training set data and test set data at a ratio of 7:3; namely, taking the averaged carbon dioxide emission list training data in each circle of grid areas around 70% of the environment monitoring sites, and the atmospheric PM_2.5 pollution concentration training data in that circle of grid areas, as training set data; and taking the averaged carbon dioxide emission list training data in each circle of grid areas around 30% of the environment monitoring sites, and the atmospheric PM_2.5 pollution concentration training data in that circle of grid areas, as test set data;

wherein,

represents the first-stage model coefficient training estimation value;

wherein,

is the first-stage training squared error;

converting the objective function in (8) into a matrix form:

wherein,

represents the objective function;

is composed of

taking the fitting error res calculated in formula (10) as the dependent variable, and taking the carbon dioxide emission list training data remaining after the carbon dioxide total emission training data is excluded as the independent variable (X_-tt1), establishing the second-stage non-negative Lasso model:

wherein X_-tt1 is a vector formed by the carbon dioxide emission training data, other than the carbon dioxide total emission training data, in a certain grid area of the region, and is used as the independent variable;

wherein,

is the second-stage model coefficient training estimation value;

is the second-stage training squared error;

converting the objective function in (12) into a matrix form:

wherein,

is composed of

solving the second stage model coefficient training estimated value through quadratic programming

for each environment monitoring site, 70% of the carbon dioxide emission list data in each circle of the N circles of grid areas around the selected environment monitoring site, together with the atmospheric PM_2.5 pollution concentration data in the grid area to which the environment monitoring site belongs, is used as training set data, and the modeling process is repeated until the model effect evaluation index MAPE converges on the test set data, thereby obtaining the final two-stage non-negative Lasso model.

5. An atmospheric PM_2.5 concentration prediction system based on a two-stage non-negative Lasso model, characterized in that the system comprises:

6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 4 when executing the computer program.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the method according to any one of claims 1 to 4.