CN112949680A - Pollution source identification method based on corresponding analysis and multiple linear regression - Google Patents
- Publication number
- CN112949680A (application CN202110102148.6A)
- Authority
- CN
- China
- Prior art keywords
- pollution source
- linear regression
- multiple linear
- source
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Processing Of Solid Wastes (AREA)
Abstract
The invention discloses a method for source apportionment of environmental pollutants based on correspondence analysis and multiple linear regression, comprising two steps: first, based on pollution source sample data, identifying the pollution sources by correspondence analysis and determining the number of main factors; second, calculating the pollution source contribution rate of each factor loading by multiple linear regression, thereby achieving source apportionment of the characteristic pollutants. The method treats factor-loading identification as a nonlinear, multi-factor classification problem, that is, a pattern recognition process. It is practical and widely applicable, and it provides reliable technical support for environmental management departments in dealing with pollution accidents and controlling pollution risks.
Description
Technical Field
The invention relates to the technical field of pollution source identification, and in particular to a pollution source identification method based on correspondence analysis and multiple linear regression.
Background
Pollution source identification technology distinguishes, analyzes and evaluates the sources of pollutants. Current techniques fall broadly into three categories: emission inventory analysis, diffusion models and receptor models. Emission inventory analysis builds an inventory model by observing and simulating the emission amounts, emission characteristics and geographical distribution of pollutant sources. Diffusion models are predictive: given the emission data and related parameters of each pollution source, they predict how pollutants vary in time and space. Receptor models determine the contribution rate of each pollution source through chemical and microscopic analysis of receptor samples; their ultimate goal is to identify the pollution sources that contribute to the receptor and to quantify the contribution rate of each. Among the chemical receptor-model methods, multivariate statistical methods are simple to apply: they require neither the fingerprint spectrum of each pollution source in advance nor prior monitoring of the pollution sources in the study area, only receptor sample monitoring data. The positive matrix factorization model is one such multivariate statistical method; it is a factor analysis method that constrains the elements of the factorized matrices to be non-negative and uses the data standard deviations for optimization.
The technique grew out of principal component analysis. Traditional least-squares principal component factor analysis (PCA) normalizes the receptor sample data D by row or by column, which distorts the data during factor analysis. Least-squares PCA has also been criticized for implicitly assuming unrealistic standard deviations for the sample data, so that it does not yield a minimum-variance optimal solution. Source apportionment with the positive matrix factorization model has two core steps: factorization under non-negativity constraints, and pollution source identification using the factor loading matrix.
At present there is little research on pollution source identification. The main identification methods either compare the source spectrum and the factor loadings qualitatively by inspecting plots, or compare them semi-quantitatively by calculating the deviation between them. Most of these methods ignore the nonlinear characteristics of the pollution source spectrum, so the identification result does not faithfully reflect the correspondence between the factor loadings and the source spectrum.
Disclosure of Invention
The invention aims to provide a pollution source identification method based on correspondence analysis and multiple linear regression.
A pollution source identification method based on correspondence analysis and multiple linear regression comprises the following steps:
Step one: based on sample data of the pollution source, identifying the pollution source by correspondence analysis, which specifically comprises:
standardizing the sample data of the pollution source;
calculating the correlation coefficient matrix of the pollution source samples;
calculating the eigenvalues and corresponding eigenvectors of the correlation coefficient matrix;
taking all factors with an eigenvalue greater than 1 as main factors, thereby determining the number of main factors;
Step two: calculating the pollution source contribution rate of each factor loading by multiple linear regression to achieve source apportionment of the characteristic pollutants, which specifically comprises:
establishing a multiple linear regression model:
$$y = \sum_{i=1}^{P} m_i X_i + b$$
where $y$ is the total concentration of the pollution source, $P$ is the number of extracted main factors, $X_i$ is the factor score, $m_i$ is the coefficient associated with pollutant class (main factor) $i$, and $b$ is the residual information not explained by the factors;
requiring that the $X_i$ not be linearly related to one another, $y$ and $X_i$ above are normal-standardized to yield:
$$z = \sum_{i=1}^{P} B_i X_i$$
where $z$ is the normal standardized deviate of the pollution source total concentration, $X_i$ is the factor score, and $B_i$ is the multiple linear regression coefficient;
according to the formula $t_i(\%) = 100\,(B_i / \sum B_i)$, calculating the average percentage contribution rate of pollution source $i$.
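Step two can be illustrated with a minimal Python sketch. It assumes the factor scores are available as columns of an array and that the standardized regression is fitted by ordinary least squares; the function name `contribution_rates` and the use of NumPy are illustrative choices, not part of the patent text.

```python
import numpy as np

def contribution_rates(factor_scores, total_concentration):
    """Percentage contribution t_i = 100 * B_i / sum(B_i) (hypothetical helper)."""
    X = np.asarray(factor_scores, dtype=float)        # shape (n_samples, P)
    y = np.asarray(total_concentration, dtype=float)  # shape (n_samples,)
    # Standardize y and every factor-score column (zero mean, unit sample std).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    z = (y - y.mean()) / y.std(ddof=1)
    # Least-squares fit of z = sum_i B_i X_i; no intercept is needed after centering.
    B, *_ = np.linalg.lstsq(Xs, z, rcond=None)
    return 100.0 * B / B.sum()

# Toy data: two uncorrelated factors, the first weighted three times the second.
scores = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
total = 3.0 * scores[:, 0] + 1.0 * scores[:, 1] + 10.0
print(contribution_rates(scores, total))
```

The formula divides by the sum of the coefficients, so the percentages are only easy to interpret when all $B_i$ share the same sign; the sketch does not check for that.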
Optionally, the sample data of the pollution source are standardized according to the formula:
where $i$ indexes the monitoring points, $j$ indexes the pollutant classes, $x_{ij}$ is the concentration of the class-$j$ pollutant at the $i$-th monitoring point, $\bar{x}_j$ is the mean concentration of the class-$j$ pollutant over all monitoring points, and $x^{*}_{ij}$ is the pollutant concentration after standardization;
the correlation coefficient matrix of the pollution source samples is calculated according to the formula:
Optionally, the pollution source samples comprise polycyclic aromatic hydrocarbons or heavy metals.
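The standardization and correlation formulas themselves are not reproduced in this text. The sketch below therefore assumes the common z-score form (subtract the column mean, divide by the sample standard deviation) and the usual Pearson correlation matrix; both the formulas and the function names are assumptions made for illustration.

```python
import numpy as np

def standardize(X):
    """Z-score each pollutant column (assumed form of the standardization)."""
    X = np.asarray(X, dtype=float)  # rows: monitoring points, columns: pollutants
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def correlation_matrix(X):
    """Pearson correlation matrix computed from the standardized data."""
    Xs = standardize(X)
    n = Xs.shape[0]
    return Xs.T @ Xs / (n - 1)

# Made-up concentrations for 4 monitoring points and 3 pollutant classes.
data = np.array([[1.0, 2.0, 3.0],
                 [2.0, 1.0, 5.0],
                 [3.0, 5.0, 4.0],
                 [4.0, 3.0, 8.0]])
print(correlation_matrix(data))
```

Under these assumptions the result agrees with `np.corrcoef(data, rowvar=False)`, which is a convenient cross-check.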
Compared with the prior art, the pollution source identification method based on correspondence analysis and multiple linear regression has the following beneficial effects:
first, the method traces the source of pollutants quickly and accurately, is practical and widely applicable, and provides reliable technical support for environmental management departments in dealing with pollution accidents and controlling pollution risks;
second, traditional pollutant source apportionment techniques can only roughly indicate the class of pollution source that contributes most to an environmental receptor; they cannot give the contribution of a specific emission source to the receptor and so offer little practical guidance for pollution prevention and control, whereas the present method calculates the contribution rate of each pollution source;
third, the invention provides technical support for formulating regional pollution control measures and improving regional environmental quality, so that environmental management departments facing future pollution problems can rapidly identify pollution sources through a systematic and complete source apportionment method and a corresponding data information system, and then carry out pollution prevention and control.
Detailed Description
All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a pollution source identification method based on correspondence analysis and multiple linear regression.
The present invention will be described in further detail with reference to specific embodiments in order to make the above objects, features and advantages more apparent and understandable.
A heavy metal pollution source identification method based on correspondence analysis and multiple linear regression comprises the following steps:
Step one: applying correspondence analysis to the heavy metal pollution source data to determine the number of main factors, comprising:
standardizing the data according to the formula:
calculating the correlation coefficient matrix of the samples according to the formula:
calculating the eigenvalues and corresponding eigenvectors of the correlation coefficient matrix:
Eigenvalues: $\lambda_1, \lambda_2, \ldots, \lambda_p$
Eigenvectors: $a_i = (a_{i1}, a_{i2}, \ldots, a_{ip})$, $i = 1, 2, \ldots, p$;
taking all factors with an eigenvalue greater than 1 as main factors, thereby determining the number of main factors;
Step two: calculating the pollution source contribution rate of each factor loading by multiple linear regression to achieve source apportionment of the characteristic pollutants, comprising:
establishing a multiple linear regression model:
$$y = \sum_{i=1}^{P} m_i X_i + b$$
where $y$ is the total concentration of the pollution source, $P$ is the number of extracted main factors, $X_i$ is the factor score, $m_i$ is the coefficient associated with pollutant class (main factor) $i$, and $b$ is the residual information not explained by the factors, which is treated as random error;
requiring that the $X_i$ not be linearly related to one another, $y$ and $X_i$ above are normal-standardized to yield:
$$z = \sum_{i=1}^{P} B_i X_i$$
where $z$ is the normal standardized deviate of the pollution source total concentration, $X_i$ is the factor score, and $B_i$ is the multiple linear regression coefficient;
according to the formula $t_i(\%) = 100\,(B_i / \sum B_i)$, calculating the average percentage contribution rate of pollution source $i$.
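The eigenvalue step of step one can be sketched as follows. The example assumes a symmetric correlation matrix is already available and applies the eigenvalue-greater-than-1 rule stated above; the function name `main_factors` and the sample matrix are hypothetical.

```python
import numpy as np

def main_factors(R):
    """Return the eigenvalues > 1 and their eigenvectors, largest first."""
    R = np.asarray(R, dtype=float)
    # eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0  # rule from the text: eigenvalue greater than 1
    return eigvals[keep], eigvecs[:, keep]

# Made-up correlation matrix for three pollutants.
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
values, vectors = main_factors(R)
print(len(values), values)
```

For this matrix only one eigenvalue exceeds 1, so a single main factor would be retained.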
In step one, the main factors are determined by correspondence analysis, which specifically comprises:
The pollutant concentrations are standardized, and T denotes the sum of the pollutant concentrations over all monitoring points;
(2) Compute the transition matrix
$Z = (z_{ij})$,
where $i = 1, \ldots, n$; $j = 1, \ldots, p$,
and the row and column sums of the standardized pollutant concentrations are
$x_{i\cdot} = x_{i1} + x_{i2} + \cdots + x_{ip}$,
$x_{\cdot j} = x_{1j} + x_{2j} + \cdots + x_{nj}$;
(3) Perform factor analysis
R-type factor analysis:
Compute the eigenvalues of the covariance matrix $A = Z'Z$, retain the first m eigenvalues whose cumulative percentage is at least 85%, and compute the corresponding unit eigenvectors to obtain the R-type factor loading matrix;
Q-type factor analysis:
For the same m eigenvalues, compute the corresponding unit eigenvectors of the matrix $B = ZZ'$ to obtain the Q-type factor loading matrix;
(4) Determine the main factors by combined R-Q analysis.
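The correspondence analysis steps above can be sketched in Python. The formula for $z_{ij}$ is not reproduced in this text, so the sketch assumes the standard textbook form $z_{ij} = (x_{ij} - x_{i\cdot}x_{\cdot j}/T)/\sqrt{x_{i\cdot}x_{\cdot j}}$ and strictly positive input data; scaling the unit eigenvectors by the square roots of the eigenvalues to obtain loadings is likewise an assumption, and all function names are hypothetical.

```python
import numpy as np

def transition_matrix(X):
    """Transition matrix Z = (z_ij), assuming the textbook correspondence-analysis form."""
    X = np.asarray(X, dtype=float)       # strictly positive entries assumed
    T = X.sum()                          # grand total
    row = X.sum(axis=1, keepdims=True)   # x_i. (row sums)
    col = X.sum(axis=0, keepdims=True)   # x_.j (column sums)
    return (X - row * col / T) / np.sqrt(row * col)

def rq_loadings(Z, threshold=0.85):
    """R-type and Q-type loadings for the first m eigenvalues of A = Z'Z."""
    A = Z.T @ Z
    eigvals, U = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    eigvals, U = np.clip(eigvals[order], 0.0, None), U[:, order]
    # Smallest m whose cumulative share of the eigenvalue sum reaches the threshold.
    cum = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(cum, threshold)) + 1
    lam, U = eigvals[:m], U[:, :m]
    V = Z @ U / np.sqrt(lam)   # unit eigenvectors of B = ZZ' for the same eigenvalues
    F = U * np.sqrt(lam)       # R-type factor loadings (pollutants)
    G = V * np.sqrt(lam)       # Q-type factor loadings (monitoring points)
    return lam, F, G

# Made-up positive concentrations: 4 monitoring points, 3 pollutants.
X = np.array([[10.0, 2.0, 3.0],
              [4.0, 8.0, 1.0],
              [2.0, 3.0, 9.0],
              [5.0, 5.0, 5.0]])
lam, F, G = rq_loadings(transition_matrix(X))
print(lam.shape, F.shape, G.shape)
```

Because $A = Z'Z$ and $B = ZZ'$ share their non-zero eigenvalues, the Q-type eigenvectors are obtained from the R-type ones without a second decomposition, which is what allows pollutants and monitoring points to be read on the same factor axes.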
The method was applied to identify heavy metal pollution sources in sediments of the Yangjiang river basin. Surface sediment samples were collected at 10 stations with a grab sampler. The data analysis focused on As, Hg, Cd, Cr and Pb; the results are shown in Table 1.
TABLE 1
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.
Claims (3)
1. An environmental pollutant source analysis method based on correspondence analysis and multiple linear regression, characterized by comprising the following steps:
Step one: based on sample data of the pollution source, identifying the pollution source by correspondence analysis, which specifically comprises:
standardizing the sample data of the pollution source;
calculating the correlation coefficient matrix of the pollution source samples;
calculating the eigenvalues and corresponding eigenvectors of the correlation coefficient matrix;
taking all factors with an eigenvalue greater than 1 as main factors, thereby determining the number of main factors;
Step two: calculating the pollution source contribution rate of each factor loading by multiple linear regression to achieve source apportionment of the characteristic pollutants, which specifically comprises:
establishing a multiple linear regression model:
$$y = \sum_{i=1}^{P} m_i X_i + b$$
where $y$ is the total concentration of the pollution source, $P$ is the number of extracted main factors, $X_i$ is the factor score, $m_i$ is the coefficient associated with pollutant class (main factor) $i$, and $b$ is the residual information not explained by the factors;
requiring that the $X_i$ not be linearly related to one another, $y$ and $X_i$ above are normal-standardized to yield:
$$z = \sum_{i=1}^{P} B_i X_i$$
where $z$ is the normal standardized deviate of the pollution source total concentration, $X_i$ is the factor score, and $B_i$ is the multiple linear regression coefficient;
according to the formula $t_i(\%) = 100\,(B_i / \sum B_i)$, calculating the average percentage contribution rate of pollution source $i$.
2. The pollution source identification method based on correspondence analysis and multiple linear regression as claimed in claim 1, wherein the pollution source sample data are standardized
according to the formula:
where $i$ indexes the monitoring points, $j$ indexes the pollutant classes, $x_{ij}$ is the concentration of the class-$j$ pollutant at the $i$-th monitoring point, $\bar{x}_j$ is the mean concentration of the class-$j$ pollutant over all monitoring points, and $x^{*}_{ij}$ is the pollutant concentration after standardization;
and the correlation coefficient matrix of the pollution source samples is calculated
according to the formula:
3. The pollution source identification method based on correspondence analysis and multiple linear regression according to claim 1, wherein the pollution source samples comprise polycyclic aromatic hydrocarbons or heavy metals.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110102148.6A CN112949680A (en) | 2021-01-26 | 2021-01-26 | Pollution source identification method based on corresponding analysis and multiple linear regression |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112949680A true CN112949680A (en) | 2021-06-11 |
Family
ID=76236809
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110102148.6A Pending CN112949680A (en) | 2021-01-26 | 2021-01-26 | Pollution source identification method based on corresponding analysis and multiple linear regression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112949680A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175647A (en) * | 2019-05-28 | 2019-08-27 | 北华航天工业学院 | A kind of pollution source discrimination clustered based on principal component analysis and K-means |
Non-Patent Citations (1)
Title |
---|
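Liu Ying: "Determination methods, distribution characteristics and source apportionment of polycyclic aromatic hydrocarbons in soils and water sediments of Shanghai", China Doctoral Dissertations Full-text Database, Engineering Science and Technology I, no. 8, 15 August 2008 (2008-08-15), pages 5 - 2 * |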
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113706127A (en) * | 2021-10-22 | 2021-11-26 | 长视科技股份有限公司 | Water area analysis report generation method and electronic equipment |
CN113706127B (en) * | 2021-10-22 | 2022-02-22 | 长视科技股份有限公司 | Water area analysis report generation method and electronic equipment |
CN116148400A (en) * | 2023-04-20 | 2023-05-23 | 北京大学 | Quantitative source analysis method based on pollution source and pollution receptor high-resolution mass spectrum data |
CN116148400B (en) * | 2023-04-20 | 2023-06-27 | 北京大学 | Quantitative source analysis method based on pollution source and pollution receptor high-resolution mass spectrum data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105868479A (en) | Polycyclic aromatic hydrocarbon source apportionment method | |
CN105631203A (en) | Method for recognizing heavy metal pollution source in soil | |
CN112949680A (en) | Pollution source identification method based on corresponding analysis and multiple linear regression | |
CN105184000A (en) | Nonnegative-constrain-factor pollution source apportionment method based on naive Bayesian source identification | |
CN113281229B (en) | Multi-model self-adaptive atmospheric PM2.5 concentration prediction method based on small samples | |
CN112904810B (en) | Process industry nonlinear process monitoring method based on effective feature selection | |
CN112198144B (en) | Method and system for quickly tracing sewage | |
CN110619691B (en) | Prediction method and device for slab surface cracks | |
CN108052486B (en) | Fine source analysis method based on inorganic components and organic markers of particulate matters | |
CN103712939A (en) | Pollutant concentration fitting method based on ultraviolet-visible spectrum | |
CN115453064B (en) | Fine particulate matter air pollution cause analysis method and system | |
CN113516228A (en) | Network anomaly detection method based on deep neural network | |
CN116187861A (en) | Isotope-based water quality traceability monitoring method and related device | |
CN109669017B (en) | Refinery distillation tower top cut water ion concentration prediction method based on deep learning | |
CN110175647A (en) | A kind of pollution source discrimination clustered based on principal component analysis and K-means | |
CN113918707A (en) | Policy convergence and enterprise image matching recommendation method | |
CN117538492B (en) | On-line detection method and system for pollutants in building space | |
CN114217025A (en) | Analysis method for evaluating influence of meteorological data on air quality concentration prediction | |
CN117408519A (en) | Chemical reaction process risk early warning method and device based on deep learning algorithm | |
CN115879379B (en) | Intelligent corrosion monitoring and early warning method and system for equipment | |
CN112633528A (en) | Power grid primary equipment operation and maintenance cost determination method based on support vector machine | |
CN117493759A (en) | Gas methane distinguishing method and device based on principal component analysis and vector machine | |
CN111737993A (en) | Method for extracting health state of equipment from fault defect text of power distribution network equipment | |
Zhang et al. | Determining statistical process control baseline periods in long historical data streams | |
CN112711911B (en) | Rapid pollution tracing method applied to boundary observation based on pollution source spectrum library |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||