CN117009767A - Soil benchmark formulation and risk assessment method based on bioavailability - Google Patents

Info

Publication number
CN117009767A
Authority
CN
China
Prior art keywords
soil
bioavailability
pollutant
data
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311002800.2A
Other languages
Chinese (zh)
Other versions
CN117009767B (en)
Inventor
王晓南
张加文
王旭升
吴凡
艾舜豪
王佳琪
刘征涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Research Academy of Environmental Sciences
Original Assignee
Chinese Research Academy of Environmental Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese Research Academy of Environmental Sciences
Priority to CN202311002800.2A
Publication of CN117009767A
Application granted
Publication of CN117009767B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The invention discloses a bioavailability-based method for soil benchmark formulation and risk assessment, comprising the following steps: acquiring and preprocessing soil pollutant bioavailability data; dividing the preprocessed data into data sets by pollutant type; establishing soil pollutant bioavailable-content prediction models and training them on the data sets to obtain a best-fit model; outputting bioavailability data from the best-fit model; and calculating a bioavailability-based soil benchmark value from the bioavailability data and performing a bioavailability-based ecological risk assessment of the soil pollutants. The method takes into account the importance of quantifying pollutant bioavailability in soil when handling geographical variability, predicts the bioavailable content of pollutants in the target soil well, and avoids overestimating ecological risk.

Description

Soil benchmark formulation and risk assessment method based on bioavailability
Technical Field
The invention relates to the technical field of ecological environment protection, and in particular to a bioavailability-based method for soil benchmark formulation and risk assessment.
Background
Ecological risk assessment of soil pollutants has long depended heavily on pollutant exposure data and has therefore mostly been based on the total content of pollutants. However, total content cannot accurately reflect geographical variability or biotoxicity. Research shows that the actual toxic effect on organisms is produced by the bioavailable content of a pollutant, and that bioavailability reflects differences in soil properties well. Because the question of how to apply bioavailability to ecological risk assessment in a scientific and reasonable way remains unresolved, few bioavailability-based methods have been used to derive soil quality benchmarks and assess the ecological risk of pollutants.
Therefore, how to provide a bioavailability-based method for soil benchmark formulation and risk assessment is a problem that those skilled in the art need to solve.
Disclosure of Invention
In view of the above, the invention provides a bioavailability-based method for soil benchmark formulation and risk assessment, which derives soil pollutant benchmarks at multiple regional scales based on bioavailability and assesses ecological risk, thereby obtaining a more accurate ecological risk assessment result and avoiding overestimation of ecological risk.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A bioavailability-based method for soil benchmark formulation and risk assessment, comprising:
acquiring and preprocessing soil pollutant bioavailability data;
dividing the preprocessed soil pollutant bioavailability data into data sets by pollutant type;
for each type of pollutant, establishing a soil pollutant bioavailable-content prediction model and training it on the data set for that pollutant to obtain a best-fit model;
outputting bioavailability data based on the best-fit model;
and calculating a bioavailability-based soil benchmark value from the bioavailability data and performing a bioavailability-based ecological risk assessment of soil pollutants.
Preferably, acquiring and preprocessing the soil pollutant bioavailability data specifically comprises:
acquiring soil pollutant bioavailability data from domestic and foreign databases and literature, combined with experimental measurements;
to ensure the integrity and reliability of the pollutant bioavailability data, screening the soil pollutant bioavailability data according to the following conditions:
(i) The data source must describe the basic sampling method, processing method, or experimental conditions of the study;
(ii) The data must include the bioavailable content of the pollutant in the soil, the method used to extract the bioavailable content, the soil physicochemical properties (including pH, cation exchange capacity CEC, organic matter content OM, and clay content), and the total content of the soil pollutant;
(iii) Bioavailability data extracted by in vitro gastrointestinal simulation methods, which relate to human health, are deleted.
Preferably, dividing the preprocessed soil pollutant bioavailability data into data sets by pollutant type specifically comprises:
testing the soil pollutant bioavailability data for normality and, if the data do not follow a normal distribution, log-transforming the data and testing again. The data set is then divided into subsets by heavy metal, and for each subset 80% of the samples are randomly selected as the training data set and the remaining 20% as the test data set.
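The log transformation and the per-pollutant 80/20 split described above could be sketched as follows. This is a minimal illustration in Python using only the standard library; the record layout (`pollutant`, `bio` keys), the function names, and the fixed random seed are assumptions made for the example, and the normality test itself is omitted because it would need a statistics library.

```python
import math
import random

def log10_transform(values):
    """Log10-transform strictly positive values (used when raw data are not normal)."""
    return [math.log10(v) for v in values]

def split_by_pollutant(records, train_fraction=0.8, seed=0):
    """Group records by pollutant, then randomly split each group into train/test."""
    rng = random.Random(seed)
    groups = {}
    for rec in records:
        groups.setdefault(rec["pollutant"], []).append(rec)
    splits = {}
    for pollutant, rows in groups.items():
        shuffled = rows[:]  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        n_train = round(train_fraction * len(shuffled))
        splits[pollutant] = (shuffled[:n_train], shuffled[n_train:])
    return splits

# Hypothetical records: 10 samples each for Cd and Pb.
records = [{"pollutant": p, "bio": 0.1 * (i + 1)}
           for p in ("Cd", "Pb") for i in range(10)]
splits = split_by_pollutant(records)
train_cd, test_cd = splits["Cd"]
print(len(train_cd), len(test_cd))
```

Splitting inside each pollutant group, rather than over the pooled data, keeps the 80/20 proportion for every pollutant even when the subsets differ greatly in size.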
Preferably, for each type of pollutant, establishing a soil pollutant bioavailability content prediction model and training based on a data set of the type of pollutant, and obtaining a best fit model specifically comprises:
taking the soil physicochemical property data, the total content of the soil pollutants and the extraction method of the pollutant bioavailability content as independent variables, and respectively establishing a soil pollutant bioavailability content prediction model through multiple linear regression, random forests and a multilayer perceptron;
and selecting a model with optimal performance among multiple linear regression, random forests and multi-layer perceptrons as a best fit model.
Preferably, the soil pollutant bioavailable-content prediction model is established by multiple linear regression:
$$\lg(\mathrm{Bio}) = k_1\lg(\mathrm{Total}) + k_2\lg(\mathrm{OM}) + k_3\lg(\mathrm{CEC}) + k_4\lg(\mathrm{Sand}) + k_5\lg(\mathrm{Silt}) + k_6\lg(\mathrm{Clay}) + k_7\lg(\mathrm{Fe}) + k_8\,\mathrm{BM} + k_9\,\mathrm{pH} + k_n$$
where $k_1$, $k_2$, $k_3$, $k_4$, $k_5$, $k_6$, $k_7$, $k_8$, $k_9$ and $k_n$ are model coefficients, Bio is the bioavailable content, Total is the total content of the pollutant in the soil, OM is the soil organic matter content, CEC is the soil cation exchange capacity, Clay is the clay content, Sand is the sand content, Silt is the silt content, Fe is the iron content of the soil, pH is the soil pH, and BM is the method used to extract the bioavailable content of the pollutant.
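To make the regression equation concrete, the following Python sketch evaluates lg(Bio) for one soil sample. All coefficient values and sample values are hypothetical placeholders, not fitted results from the invention, and BM is assumed to be encoded as a numeric code for the extraction method.

```python
import math

# Variables that enter the equation as log10 terms (coefficients k_1..k_7).
LOG_TERMS = ("Total", "OM", "CEC", "Sand", "Silt", "Clay", "Fe")

def predict_lg_bio(sample, coef):
    """Evaluate lg(Bio) for one sample from the multiple linear regression."""
    value = coef["intercept"]                    # k_n
    for name in LOG_TERMS:
        value += coef[name] * math.log10(sample[name])
    value += coef["BM"] * sample["BM"]           # k_8, extraction method as a numeric code
    value += coef["pH"] * sample["pH"]           # k_9
    return value

# Hypothetical coefficients and soil sample, for illustration only.
coef = {"Total": 0.8, "OM": -0.2, "CEC": -0.1, "Sand": 0.05, "Silt": 0.02,
        "Clay": -0.15, "Fe": -0.1, "BM": 0.3, "pH": -0.12, "intercept": 0.4}
sample = {"Total": 100.0, "OM": 2.5, "CEC": 15.0, "Sand": 40.0, "Silt": 35.0,
          "Clay": 25.0, "Fe": 3.0, "BM": 1, "pH": 6.5}

lg_bio = predict_lg_bio(sample, coef)
bio = 10 ** lg_bio  # back-transform to the bioavailable content
print(round(lg_bio, 3), round(bio, 3))
```

Because the model is fitted on the log scale, the prediction is back-transformed with a power of 10 before it is used as a concentration.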
Preferably, the soil pollutant bioavailable-content prediction model is established with a random forest. A random forest is an ensemble learning method that fits multiple decision trees on different subsets of the data set. Each decision tree is trained on a randomly selected portion of the samples and features and is built according to a node-splitting criterion; the results of the trees are averaged to improve prediction accuracy and control overfitting. The specific steps are as follows:
Step 101: input the training data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i$ is the feature vector of an input sample (the features of the input samples form the feature set) and $y_i$ is the corresponding target variable;
Step 102: for each decision tree $T_j$ ($j = 1, 2, \ldots, m$, where $j$ is the index of the decision tree), perform the following:
sample with replacement from the training data set $D$ to obtain a new training set $D_j$;
randomly select a portion of the features from the feature set to obtain a new feature subset $F_j$;
construct the decision tree $T_j$ using the training set $D_j$ and the feature subset $F_j$;
Step 103: for a new input sample $x$, predict with each decision tree $T_j$ to obtain a prediction $\hat{y}_j$;
average (or vote over) the predictions $\hat{y}_j$ of all decision trees to obtain the final prediction $\hat{y}$;
Step 104: based on the model goodness-of-fit parameter $R^2$, adjust the number of decision trees and the number of random features until the best fit is achieved.
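The bootstrap-and-average procedure in the steps above could be sketched in miniature as follows. This is an illustrative Python implementation with small, depth-limited regression trees, not the R package used in the embodiment; the function names, tree depth, and toy data are assumptions. The feature subset is drawn once per tree, as the steps describe.

```python
import random
from statistics import mean

def sum_sq(values):
    """Sum of squared deviations from the mean (split quality measure)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, features, depth=0, max_depth=3, min_leaf=2):
    """Grow a regression tree using only the given feature indices."""
    if depth >= max_depth or len(set(y)) == 1:
        return mean(y)  # leaf: average target value
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X})[1:]:
            left = [i for i, row in enumerate(X) if row[f] < threshold]
            right = [i for i, row in enumerate(X) if row[f] >= threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = sum_sq([y[i] for i in left]) + sum_sq([y[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left, right)
    if best is None:
        return mean(y)
    _, f, threshold, left, right = best
    branches = [build_tree([X[i] for i in side], [y[i] for i in side],
                           features, depth + 1, max_depth, min_leaf)
                for side in (left, right)]
    return (f, threshold, branches[0], branches[1])

def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if x[f] < threshold else right
    return tree

def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample D_j
        feats = rng.sample(range(len(X[0])), n_features)      # feature subset F_j
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, x) for tree in forest)

# Toy data: the target depends only on the first feature.
X = [[i, (7 * i) % 10, (3 * i) % 10] for i in range(20)]
y = [10.0 if row[0] < 10 else 20.0 for row in X]
forest = fit_forest(X, y)
print(predict_forest(forest, [2, 4, 6]))
```

Tuning in Step 104 would correspond to varying `n_trees` and `n_features` and keeping the setting with the best validation $R^2$.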
Preferably, establishing the soil heavy metal bioavailable-content prediction model with a multilayer perceptron comprises the following steps:
Step a: input the training data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i$ is the feature vector of an input sample and $y_i$ is the corresponding target variable;
Step b: initialize the weight and bias parameters of the multilayer perceptron;
Step c: for each training sample $(x_i, y_i)$, perform the following:
forward propagation: compute the output of each neuron until the final predicted value $\hat{y}_i$ is obtained;
loss calculation: compare the predicted value $\hat{y}_i$ with the true value $y_i$ to obtain the loss function value $L_i$;
back propagation: update the weight and bias parameters of the network according to the gradient of the loss function;
Step d: repeat step c until a predetermined stopping condition is reached (for example, the maximum number of iterations is reached or the loss function converges);
Step e: based on the model goodness-of-fit parameter $R^2$, adjust the number of hidden layers, the number of neurons in each hidden layer, the maximum number of iterations, the type of learning algorithm, and the type of activation function until the best fit is achieved.
Preferably, the best-fit model is obtained as follows:
after the prediction models are established, ten-fold cross-validation is performed on the training set, and the coefficient of determination $R^2$ and the root mean square error (RMSE) are used to compare model performance and prediction accuracy. The procedure is as follows:
1) Dividing the data set:
$D = D_1 \cup D_2 \cup \cdots \cup D_k$
2) Model training:
for each fold $D_i$, use the remaining folds as the training set to obtain the model parameters $\theta_i$: $\theta_i = \mathrm{train}(D \setminus D_i)$
3) Model evaluation:
for each fold $D_i$, use the model parameters $\theta_i$ to predict on the held-out fold $D_i$, obtaining the predictions $\hat{y}_i$
4) Performance evaluation:
use evaluation indices (for example, the coefficient of determination $R^2$ and the root mean square error RMSE) to compare the predictions $\hat{y}_i$ with the true values $y_i$, obtaining the evaluation index value $E_i$
5) Summarizing the evaluation results:
collect and summarize the evaluation indices $E_1, E_2, \ldots, E_k$ of all folds to obtain the final performance assessment of the model.
Finally, the best-fitting model among the candidates is selected for the subsequent steps.
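The cross-validation procedure above could be sketched as follows. For brevity the example uses a one-variable least-squares line as a stand-in for the prediction model; the stand-in model, the function names, and the toy data are assumptions, while the fold logic and the $R^2$ and RMSE formulas follow the steps in the text.

```python
import math
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root mean square error."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))

def k_fold_cv(xs, ys, k=10, seed=0):
    """Return the mean R^2 and mean RMSE over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]            # D = D_1 ∪ ... ∪ D_k
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]    # D \ D_i
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in held_out]
        scores.append(r2_rmse([ys[i] for i in held_out], preds))  # E_i
    return mean(s[0] for s in scores), mean(s[1] for s in scores)

# Toy data with a small deterministic wiggle around a straight line.
xs = [float(x) for x in range(50)]
ys = [2 * x + 1 + 0.5 * ((int(x) % 3) - 1) for x in xs]
mean_r2, mean_rmse = k_fold_cv(xs, ys)
print(round(mean_r2, 4), round(mean_rmse, 4))
```

Running the same function for each candidate model and comparing the returned pair is one way to carry out the final selection step.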
Preferably, the bioavailability data comprise the converted value of the soil pollutant toxicity value, the converted value of the soil pollutant background value, and the converted value of the soil pollutant exposure value.
The soil pollutant toxicity values are all exogenously added concentrations, and the bioavailability-based toxicity values can be predicted with the established soil pollutant bioavailable-content prediction model. Note that an extraction method for the bioavailable content must be specified when predicting with the model, and the predicted bioavailable content corresponds to that extraction method.
Similarly, the converted values of the soil pollutant background value and the soil pollutant exposure value are predicted in the same way as the converted value of the soil pollutant toxicity value.
Preferably, calculating the bioavailability-based soil benchmark value from the bioavailability data and performing the bioavailability-based ecological risk assessment of soil pollutants specifically comprises:
calculating the bioavailability-based soil benchmark value. Specifically: a. Using the species sensitivity distribution (SSD), determine the bioavailability-based hazardous concentration for 5% of species (Bio-based $HC_5$).
The SSD curve of the pollutant is fitted with a log-normal function to calculate Bio-based $HC_5$. The fitting formula of the curve is:
$$Y = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{X - P_1}{\sqrt{2 P_2^2}}\right)\right)$$
where $Y$ is the cumulative probability of a species, defined as (rank of the data point)/(1 + total number of data points), and the value of $X$ on the SSD curve at $Y = 5\%$ corresponds to Bio-based $HC_5$; $X$ is the mean of the converted soil pollutant toxicity values after $\log_{10}$ transformation; $P_1$ and $P_2$ are parameters.
b. Calculating the bioavailability-based soil benchmark value:
$$\text{Bio-based SQC} = \text{Bio-based SBV} + \text{Bio-based } HC_5$$
where Bio-based SQC is the bioavailability-based soil benchmark value; Bio-based $HC_5$ is the bioavailability-based hazardous concentration for 5% of species, derived by fitting a species sensitivity distribution curve to the converted soil pollutant toxicity values; and Bio-based SBV is the converted value of the soil pollutant background value.
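The derivation of Bio-based HC5 and Bio-based SQC could be sketched as follows. As a simplifying assumption, $P_1$ and $P_2$ are estimated here as the sample mean and standard deviation of the log10 toxicity values rather than by least-squares fitting of the curve to the rank-based cumulative probabilities; the toxicity and background values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def plotting_positions(n):
    """Empirical cumulative probabilities: rank / (n + 1) for ranks 1..n."""
    return [rank / (n + 1) for rank in range(1, n + 1)]

def ssd_cdf(x, p1, p2):
    """Log-normal SSD curve from the text, with x = log10(concentration)."""
    return 0.5 * (1 + math.erf((x - p1) / math.sqrt(2 * p2 ** 2)))

def bio_based_hc5(tox_values):
    """Return (HC5, P1, P2); P1 and P2 are moment estimates (an assumption)."""
    logs = [math.log10(v) for v in tox_values]
    p1, p2 = mean(logs), stdev(logs)
    x5 = NormalDist(p1, p2).inv_cdf(0.05)  # X where the curve reaches Y = 0.05
    return 10 ** x5, p1, p2

def bio_based_sqc(sbv, hc5):
    """Bio-based SQC = Bio-based SBV + Bio-based HC5."""
    return sbv + hc5

# Hypothetical bioavailability-based toxicity values for six species, mg/kg.
tox = [12.0, 30.0, 45.0, 80.0, 150.0, 400.0]
hc5, p1, p2 = bio_based_hc5(tox)
sqc = bio_based_sqc(sbv=5.0, hc5=hc5)
print(round(hc5, 2), round(sqc, 2))
```

A curve fit against `plotting_positions(len(tox))` would replace the moment estimates when the fitted parameters, rather than the sample moments, are required.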
The bioavailability-based ecological risk of soil pollution is assessed against the bioavailability-based soil benchmark value. The bioavailability-based ecological risk assessment indices for soil pollutants comprise the single factor pollution index and the Nemerow integrated pollution index;
the single factor pollution index is a dimensionless relative index used to evaluate the pollution level of soil or crops or the soil environmental quality grade, and reflects the pollution degree of each element. It is calculated as:
$$P_i = \frac{\text{Bio-based } EEV_i}{\text{Bio-based } SQC_i}$$
where $P_i$ is the environmental pollution index of the $i$-th pollutant in the soil: $P_i \le 1$ indicates no pollution, $1 < P_i \le 2$ slight pollution, $2 < P_i \le 3$ moderate pollution, and $P_i > 3$ severe pollution; Bio-based $EEV_i$ is the converted value of the exposure value of the $i$-th pollutant in the soil; and Bio-based $SQC_i$ is the bioavailability-based soil benchmark value of the $i$-th pollutant;
the Nemerow integrated pollution index ($P_N$) is obtained by first calculating the index of each factor and then combining the maximum and the average of these indices. It is calculated as:
$$P_N = \sqrt{\frac{\left(\max_i P_i\right)^2 + \left(\frac{1}{n}\sum_{i=1}^{n} P_i\right)^2}{2}}, \qquad P_i = \frac{\text{Bio-based } EEV_i}{\text{Bio-based } SQC_i}$$
where $P_N$ is the Nemerow integrated pollution index, Bio-based $EEV_i$ is the environmental exposure bioavailability value of the $i$-th pollutant in the soil, Bio-based $SQC_i$ is the bioavailability-based soil benchmark value of the $i$-th pollutant, and $n$ is the number of pollutants evaluated. According to the value of $P_N$, the pollution degree is classified as safe (0-0.7), warning limit (0.7-1), slight pollution (1-2), moderate pollution (2-3), or severe pollution (greater than 3).
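The two indices and their grading thresholds could be computed as in the sketch below. The single factor index is taken as the ratio of Bio-based EEV to Bio-based SQC, and the integrated index uses the standard Nemerow form (root mean square of the maximum and the average single factor index); the exposure and benchmark values are hypothetical.

```python
import math

def single_factor_index(bio_eev, bio_sqc):
    """P_i = Bio-based EEV_i / Bio-based SQC_i."""
    return bio_eev / bio_sqc

def nemerow_index(indices):
    """P_N = sqrt((max(P_i)^2 + mean(P_i)^2) / 2)."""
    p_max = max(indices)
    p_avg = sum(indices) / len(indices)
    return math.sqrt((p_max ** 2 + p_avg ** 2) / 2)

def classify_single(p):
    if p <= 1:
        return "no pollution"
    if p <= 2:
        return "slight pollution"
    if p <= 3:
        return "moderate pollution"
    return "severe pollution"

def classify_nemerow(pn):
    if pn <= 0.7:
        return "safe"
    if pn <= 1:
        return "warning limit"
    if pn <= 2:
        return "slight pollution"
    if pn <= 3:
        return "moderate pollution"
    return "severe pollution"

# Hypothetical Bio-based EEV and SQC values in mg/kg.
eev = {"As": 15.0, "Cd": 0.2, "Pb": 12.0}
sqc = {"As": 10.0, "Cd": 0.5, "Pb": 40.0}
indices = {m: single_factor_index(eev[m], sqc[m]) for m in eev}
for metal, p in indices.items():
    print(metal, round(p, 2), classify_single(p))
pn = nemerow_index(list(indices.values()))
print(round(pn, 3), classify_nemerow(pn))
```

Because the maximum index enters the formula directly, a single heavily polluted element raises $P_N$ even when the average is low.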
Compared with the prior art, the method takes into account the importance of quantifying the bioavailability of pollutants in soil when handling geographical variability and develops a new bioavailability-based ecological risk assessment method. It predicts the bioavailable content of pollutants in the target soil well and applies the prediction in the proposed bioavailability-based soil ecological risk assessment. This avoids bias in risk assessment results caused by differences in soil properties, avoids overestimation of ecological risk, yields more accurate assessment results, and provides technical support for accurately assessing the ecological risk of contaminated soil and for environmental risk management.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for soil benchmark formulation and risk assessment based on biological effectiveness provided by the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a method for establishing a soil benchmark and evaluating risk based on biological effectiveness by taking heavy metals as an example, which is shown in figure 1 and comprises the following steps:
acquiring soil heavy metal bioavailability data and preprocessing;
dividing the pretreated soil heavy metal bioavailability data into data sets according to heavy metal types;
establishing a soil heavy metal bioavailability content prediction model, and training based on a data set to obtain a best fit model;
outputting bioavailability data based on the best fit model;
and calculating a soil benchmark value based on the bioavailability data and performing ecological risk assessment of the soil heavy metal based on the bioavailability.
The present invention is further specifically described below:
1. acquiring soil heavy metal bioavailability data and preprocessing:
The soil heavy metal bioavailability data are subject to strict quality control and come from two main sources:
a. About 90% of the heavy metal bioavailability data come from articles published in the Web of Science, Elsevier ScienceDirect, and China National Knowledge Infrastructure (CNKI) databases. The main search keywords are: "element", "heavy metal", "soil", "bioavailabilities" and "bioaccessibility". To ensure the integrity and reliability of the heavy metal bioavailability data, the retrieved publications were then screened under the following conditions: (i) the publication describes the basic sampling method, processing method, or experimental conditions of the study; (ii) the collected information includes the specific soil characteristics (pH, CEC, OM, and clay content), the total content of soil heavy metals, the bioavailable content of soil heavy metals, and the method used to extract the bioavailable content; (iii) bioavailability data extracted by in vitro gastrointestinal simulation methods, which relate to human health, are deleted.
b. Twelve natural soil samples were collected at a depth of 0-20 cm in 2020-2021, and the bioavailable contents of the heavy metals As, Cd, Cr, Cu, Ni, Pb, and Zn in the soil were extracted by three methods: CaCl₂ extraction, EDTA extraction, and HNO₃ extraction. This portion accounts for approximately 10% of the data set.
Bioavailability data measured by the same test method for the same heavy metal under the same soil properties were combined using the geometric mean. The final numbers of bioavailability data points for As, Cd, Cr, Cu, Ni, Pb, and Zn were 260, 681, 163, 794, 193, 610, and 823, respectively.
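Combining repeated measurements with the geometric mean could look like the sketch below. The record fields (`metal`, `method`, `soil_id`, `bio`) are hypothetical; `soil_id` stands in for "the same soil properties", which the text does not define more precisely.

```python
from statistics import geometric_mean

def aggregate_duplicates(records):
    """Combine records sharing metal, extraction method, and soil via geometric mean."""
    groups = {}
    for rec in records:
        key = (rec["metal"], rec["method"], rec["soil_id"])
        groups.setdefault(key, []).append(rec["bio"])
    return {key: geometric_mean(values) for key, values in groups.items()}

# Hypothetical measurements in mg/kg; the first two share the same key.
records = [
    {"metal": "Cd", "method": "EDTA", "soil_id": "S1", "bio": 1.0},
    {"metal": "Cd", "method": "EDTA", "soil_id": "S1", "bio": 100.0},
    {"metal": "Cd", "method": "CaCl2", "soil_id": "S1", "bio": 0.3},
]
combined = aggregate_duplicates(records)
for key, value in combined.items():
    print(key, round(value, 3))
```

The geometric mean is less sensitive to a single very large value than the arithmetic mean, which suits concentration data that span orders of magnitude.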
2. Dividing the pretreated soil heavy metal bioavailability data into data sets according to heavy metal types:
This study uses three models in parallel to predict the bioavailable content of heavy metals in soil. The models are established as follows:
First, the data set is divided by heavy metal into 7 subsets (As, Cd, Cr, Cu, Ni, Pb, and Zn), corresponding to 7 regression models. For each subset, 80% of the samples are randomly selected in R as the training set and the remaining 20% serve as the test set. Four soil properties (pH, CEC, OM, and clay content), together with the total content and the bioavailability data of the soil heavy metals, are used to construct the prediction models for the bioavailable content of soil heavy metals. All prediction models are built with R 4.2.2 and RStudio 2023.03.0.
(1) Multiple linear regression (MLR) analysis is used to build the prediction model. The training set is used to build the model with the built-in lm() function in R, and the test set is used to validate it. The data set is then cross-validated with the "caret" package, and the coefficient of determination ($R^2$) and root mean square error (RMSE) are used to compare model performance and prediction accuracy.
(2) A multilayer perceptron (MLP) is used to build the prediction model. The MLP model is implemented with the "RSNNS" package. When building a regression model with this package, the data must be normalized to the range 0 to 1, so the heavy metal bioavailability extraction method is converted into a numerical label. Ten-fold cross-validation with the "caret" package is used to determine the optimal model parameters, with $R^2$ and RMSE used to compare model performance and prediction accuracy. The optimal parameters of the MLP model in this study are: 3 hidden layers with 10 neurons each; a maximum of 200 iterations; the learning algorithm "Rprop"; and the activation function "act_tanh".
(3) A random forest (RF) model is used to build the prediction model, using the "randomForest" package in R. The optimal RF parameters are determined by grid search: 1500 decision trees, with 5 random features selected at each node. Ten-fold cross-validation is performed on the training set, and $R^2$ and RMSE are used to compare model performance and prediction accuracy.
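The preprocessing mentioned for the MLP model (scaling inputs to the range 0 to 1 and turning the extraction method into a numerical label) could be sketched as follows. This is an illustrative Python version, not the R code used in the embodiment; the function names and the alphabetical label assignment are assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def encode_methods(methods):
    """Map each extraction method name to an integer label (alphabetical order)."""
    labels = {name: i for i, name in enumerate(sorted(set(methods)))}
    return [labels[m] for m in methods], labels

ph = [4.5, 6.0, 7.5]
methods = ["EDTA", "CaCl2", "EDTA", "HNO3"]

print(min_max_scale(ph))
codes, labels = encode_methods(methods)
print(codes, labels)
# The integer labels can then be scaled to 0..1 together with the other inputs.
print(min_max_scale([float(c) for c in codes]))
```

The scaling parameters should be taken from the training set and reused for the test set so that both are mapped consistently.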
3. Outputting bioavailability data based on the best fit model:
the bioavailability data comprises a conversion value of a soil heavy metal toxicity value, a conversion value of a soil heavy metal background value and a conversion value of a soil heavy metal exposure value.
4. Calculating a soil benchmark value based on the bioavailability data and performing a soil heavy metal ecological risk assessment based on the bioavailability:
The SSD curve of each heavy metal is fitted with a log-normal function to calculate the hazardous concentration for 5% of species. The fitting formula of the curve is:
$$Y = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{X - P_1}{\sqrt{2 P_2^2}}\right)\right)$$
where $Y$ is the cumulative probability of the species, defined as (rank of the data point)/(1 + total number of data points); $X$ is the mean of the $\log_{10}$-transformed bioavailability-based toxicity values; $P_1$ and $P_2$ are parameters.
Calculating the bioavailability-based soil benchmark value:
$$\text{Bio-based SQC} = \text{Bio-based SBV} + \text{Bio-based } HC_5$$
where Bio-based SQC is the bioavailability-based soil benchmark value, Bio-based $HC_5$ is the bioavailability-based hazardous concentration for 5% of species, and Bio-based SBV is the converted value of the soil heavy metal background value.
The ratio between the heavy metal environmental exposure value (EEV) and the SQC is used to evaluate the ecological risk level. The invention uses the established prediction model to convert the EEV into the environmental exposure bioavailability value (Bio-based EEV). The single factor pollution index (PI) is a dimensionless relative index used to evaluate the pollution level of soil or crops or the soil environmental quality grade, and reflects the pollution degree of each element. It is calculated as:
$$P_i = \frac{\text{Bio-based } EEV_i}{\text{Bio-based } SQC_i}$$
where $P_i$ is the environmental pollution index ($P_i \le 1$ indicates no pollution, $1 < P_i \le 2$ slight pollution, $2 < P_i \le 3$ moderate pollution, and $P_i > 3$ severe pollution); Bio-based $EEV_i$ is the environmental exposure bioavailability value of the $i$-th heavy metal in the soil, in mg/kg; Bio-based $SQC_i$ is the bioavailability-based soil benchmark value of the $i$-th heavy metal, in mg/kg.
The Nemerow integrated pollution index ($P_N$) method first calculates the index of each factor and then combines the maximum and the average of these indices. It is calculated as:
$$P_N = \sqrt{\frac{\left(\max_i P_i\right)^2 + \left(\frac{1}{n}\sum_{i=1}^{n} P_i\right)^2}{2}}, \qquad P_i = \frac{\text{Bio-based } EEV_i}{\text{Bio-based } SQC_i}$$
where $P_N$ is the Nemerow integrated pollution index; Bio-based $EEV_i$ is the environmental exposure bioavailability value of the $i$-th heavy metal in the soil, in mg/kg; Bio-based $SQC_i$ is the bioavailability-based soil benchmark value of the $i$-th heavy metal, in mg/kg; and $n$ is the number of heavy metals evaluated. According to the value of $P_N$, the pollution degree is classified as safe (0-0.7), warning limit (0.7-1), slight pollution (1-2), moderate pollution (2-3), or severe pollution (greater than 3).
The specific implementation results of this embodiment are as follows:
A. Bioavailability prediction results
Three methods were used to establish prediction models for the bioavailability of 7 heavy metals in soil. The performance parameters of the three models are shown in Table 1. The MLP had the smallest prediction error (RMSE of 0.13-0.24 across the test sets of the 7 heavy metals), generally lower than that of the RF and MLR models, but its prediction accuracy was poor, especially for Cd, Ni, and Pb, whose $R^2$ values were 0.09, 0.01, and 0.05, respectively; it failed to fit the underlying relationship between the variables and heavy metal bioavailability. The MLR had $R^2$ values of 0.55 to 0.83 and the largest prediction error of the three models. The RF model predicted all heavy metals most accurately, with $R^2 \ge 0.95$ on the training sets and $R^2 \ge 0.76$ on the test sets, showing accurate and stable prediction capability (Table 1).
TABLE 1 model predictive performance
B. Bioavailability-based ecological risk assessment results
Of the three models, the RF model gave the best predictions and was therefore applied in the subsequent risk assessment. The toxicity values of all heavy metals were converted to bioavailability-based toxicity values with the established RF model, and the hazardous concentration for 5% of species in the ecosystem ($HC_5$) was then calculated from the SSD curves. The total environmental exposure content of soil heavy metals and the soil element background values were then converted to bioavailable contents with the RF model, the Bio-based SQC was calculated, and the risk assessment was performed. In the invention, CaCl₂ extraction, EDTA extraction, and HNO₃ extraction were selected as the target methods, giving three data sets, and their average was taken as the final prediction used in the risk assessment. The resulting Bio-based SQC values for some Chinese provinces are shown in Table 2.
Table 2 soil quality reference value based on bioavailability
The results of the bioavailability-based ecological risk assessment (Bio-based ERA) are shown in Table 3. The risk assessment based on the RF model showed that, overall, heavy metal pollution was at or below the slight pollution level ($P_i < 2$). As was slightly polluted in the largest number of provinces (4), while Cr, Ni, and Pb showed no pollution in any province ($P_i < 1$).
TABLE 3 biological availability based heavy metal risk assessment results for China provincial soil
The invention emphasizes the necessity of quantifying the bioavailability of heavy metals in soil on the aspect of processing geographic variability through the provided risk assessment method based on bioavailability. The bioavailability of seven heavy metals (As, cd, cr, cu, ni, pb and Zn) in China is accurately predicted by using 4 types of data (soil property, soil heavy metal bioavailability state content, soil heavy metal total content and heavy metal bioavailability extraction method). The result shows that the RF model has accurate and stable prediction capability. The Bio-based SQC of 217 heavy metals saved in China was deduced through the RF model and SSD model, and the China Bio-based ERA was comprehensively evaluated using a risk assessment framework based on bioavailability. Compared with the ERA traditional method based on total content, the novel method can effectively avoid overestimation of ecological risks after reducing uncertainty of soil difference.
In this specification, the embodiments are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be understood by cross-reference. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant details can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for soil benchmark formulation and risk assessment based on bioavailability, comprising:
acquiring soil pollutant bioavailability data and preprocessing the data;
dividing the preprocessed soil pollutant bioavailability data into data sets by pollutant type;
for each type of pollutant, establishing a soil pollutant bioavailability content prediction model and training it on the data set of that pollutant type to obtain a best fit model;
outputting bioavailability data based on the best fit model;
and calculating a bioavailability-based soil benchmark value from the bioavailability data and performing a bioavailability-based ecological risk assessment of soil pollutants.
2. The method for soil benchmark formulation and risk assessment based on bioavailability of claim 1, wherein acquiring the soil pollutant bioavailability data and preprocessing comprises:
acquiring soil pollutant bioavailability data;
screening the soil pollutant bioavailability data according to the following conditions:
(i) the data source provides the basic sampling method, processing method or experimental conditions of the study;
(ii) the data comprise the bioavailability content of the pollutant in the soil, the extraction method for the pollutant bioavailability content, the soil physicochemical property data and the total content of the pollutant in the soil;
(iii) pollutant bioavailability data obtained by in vitro gastrointestinal simulation methods, which concern human health, are deleted.
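The screening conditions (i) to (iii) and the subsequent division into per-pollutant data sets can be sketched as a simple filter. The record layout and every field name below are hypothetical; the claims do not prescribe a data format.

```python
# Hypothetical field names for condition (ii); None marks a missing value.
REQUIRED_FIELDS = ("bioavailable_content", "extraction_method",
                   "soil_properties", "total_content")

def passes_screening(record):
    """Return True if a record satisfies screening conditions (i)-(iii)."""
    # (i) sampling method, processing method or experimental conditions given
    if not record.get("methods_documented", False):
        return False
    # (ii) all four kinds of data are present
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return False
    # (iii) drop data from in vitro gastrointestinal simulation methods
    if record.get("in_vitro_gastrointestinal", False):
        return False
    return True

def split_by_pollutant(records):
    """Group the screened records into one data set per pollutant type."""
    datasets = {}
    for record in records:
        if passes_screening(record):
            datasets.setdefault(record["pollutant"], []).append(record)
    return datasets

# Illustrative records only.
records = [
    {"pollutant": "Cd", "methods_documented": True, "bioavailable_content": 0.2,
     "extraction_method": "EDTA", "soil_properties": {"pH": 6.5},
     "total_content": 1.1},
    {"pollutant": "Cd", "methods_documented": True, "bioavailable_content": 0.3,
     "extraction_method": "PBET", "soil_properties": {"pH": 6.0},
     "total_content": 1.4, "in_vitro_gastrointestinal": True},
    {"pollutant": "Pb", "methods_documented": True, "bioavailable_content": 4.0,
     "extraction_method": "CaCl2", "soil_properties": {"pH": 7.1},
     "total_content": None},
]
print({name: len(rows) for name, rows in split_by_pollutant(records).items()})
```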
3. The method for soil benchmark formulation and risk assessment based on bioavailability of claim 2, wherein, for each type of pollutant, establishing the soil pollutant bioavailability content prediction model and training it on the data set of that pollutant type to obtain the best fit model specifically comprises:
taking the soil physicochemical property data, the total content of the soil pollutant and the extraction method for the pollutant bioavailability content as independent variables, and establishing soil pollutant bioavailability content prediction models by multiple linear regression, random forest and multilayer perceptron, respectively;
and selecting, among the multiple linear regression, random forest and multilayer perceptron models, the model with the best performance as the best fit model.
4. The method for soil benchmark formulation and risk assessment based on bioavailability of claim 3, wherein the soil pollutant bioavailability content prediction model established by multiple linear regression is:
$$\lg(\mathrm{Bio}) = k_1\lg(\mathrm{Total}) + k_2\lg(\mathrm{OM}) + k_3\lg(\mathrm{CEC}) + k_4\lg(\mathrm{Sand}) + k_5\lg(\mathrm{Silt}) + k_6\lg(\mathrm{Clay}) + k_7\lg(\mathrm{Fe}) + k_8\,\mathrm{BM} + k_9\,\mathrm{pH} + k_n$$
where $k_1$, $k_2$, $k_3$, $k_4$, $k_5$, $k_6$, $k_7$, $k_8$, $k_9$ and $k_n$ are model coefficients; Bio is the bioavailability content; Total is the total content of the pollutant in the soil; OM is the organic matter content of the soil; CEC is the cation exchange capacity of the soil; Clay is the clay content of the soil; Sand is the sand content of the soil; Silt is the silt content of the soil; Fe is the iron content of the soil; pH is the pH value of the soil; and BM is the extraction method for the pollutant bioavailability content.
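Evaluating the regression equation of claim 4 for one soil sample can be sketched as below. The coefficient values, the sample values and the numeric coding of the extraction method BM are all hypothetical; the claim does not state how BM is encoded.

```python
import math

# Variables that enter the equation as lg(.), in the order k1..k7.
LOG_TERMS = ("Total", "OM", "CEC", "Sand", "Silt", "Clay", "Fe")

def predict_lg_bio(sample, k, bm_code):
    """Evaluate lg(Bio) = k1*lg(Total) + ... + k7*lg(Fe) + k8*BM + k9*pH + kn.

    k holds the ten coefficients (k1..k9, kn); bm_code is an assumed numeric
    code for the extraction method.
    """
    value = sum(k[i] * math.log10(sample[name]) for i, name in enumerate(LOG_TERMS))
    return value + k[7] * bm_code + k[8] * sample["pH"] + k[9]

# Hypothetical coefficients and soil sample (illustration only).
coefficients = [0.9, 0.1, -0.2, 0.05, 0.02, -0.1, -0.15, 0.3, -0.12, 0.4]
soil = {"Total": 35.0, "OM": 2.1, "CEC": 14.0, "Sand": 40.0,
        "Silt": 35.0, "Clay": 25.0, "Fe": 3.2, "pH": 6.8}
lg_bio = predict_lg_bio(soil, coefficients, bm_code=1)
print(lg_bio, 10 ** lg_bio)  # lg(Bio) and the back-transformed content
```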
5. The method for soil benchmark formulation and risk assessment based on bioavailability of claim 3, wherein the soil pollutant bioavailability content prediction model is established by random forest as follows:
step 101: inputting a training data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i$ is the feature vector of an input sample, the feature vectors of the input samples form a feature set, and $y_i$ is the corresponding target variable;
step 102: for each decision tree $T_j$ ($j = 1, 2, \ldots, m$, where $m$ is the number of decision trees), performing the following steps:
sampling with replacement from the training data set $D$ to obtain a new training set $D_j$;
randomly selecting a part of the features from the feature set to obtain a new feature subset $F_j$;
constructing the decision tree $T_j$ using the training set $D_j$ and the feature subset $F_j$;
step 103: for a new input sample $x$, predicting with each decision tree $T_j$ to obtain a prediction result $\hat{y}_j$;
averaging or voting over the prediction results $\hat{y}_j$ of all decision trees to obtain the final prediction result $\hat{y}$;
step 104: according to the model fitting parameter $R^2$, continuously adjusting the number of decision trees and the number of randomly selected features until the best fitting result is achieved.
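Steps 101 to 103 can be sketched in plain Python as below. This is a minimal illustration that follows the claim in drawing one feature subset per tree; common library implementations instead draw a subset at each split. The tree depth limit, the minimum leaf size and the toy data are assumptions.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def _sse(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, features, depth=0, max_depth=3, min_size=2):
    """Grow a regression tree that may split only on the given feature indices."""
    if depth >= max_depth or len(rows) < 2 * min_size or len(set(targets)) == 1:
        return _mean(targets)                      # leaf: mean of the targets
    best = None                                    # (sse, feature, threshold)
    for f in features:
        for thr in sorted({r[f] for r in rows})[1:]:
            left = [t for r, t in zip(rows, targets) if r[f] < thr]
            right = [t for r, t in zip(rows, targets) if r[f] >= thr]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return _mean(targets)
    _, f, thr = best
    lo = [i for i, r in enumerate(rows) if r[f] < thr]
    hi = [i for i, r in enumerate(rows) if r[f] >= thr]
    return (f, thr,
            build_tree([rows[i] for i in lo], [targets[i] for i in lo],
                       features, depth + 1, max_depth, min_size),
            build_tree([rows[i] for i in hi], [targets[i] for i in hi],
                       features, depth + 1, max_depth, min_size))

def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] < thr else right
    return tree

def train_forest(rows, targets, n_trees=25, n_features=2, seed=0):
    """Step 102: one bootstrap sample D_j and one feature subset F_j per tree."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # sample with replacement
        feats = rng.sample(range(len(rows[0])), n_features)  # feature subset F_j
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], feats))
    return forest

def predict_forest(forest, x):
    """Step 103: average the per-tree predictions."""
    predictions = [predict_tree(tree, x) for tree in forest]
    return sum(predictions) / len(predictions)

# Toy data (illustration only): the target follows the first feature.
rows = [[i, 0] for i in range(20)]
targets = [float(i) for i in range(20)]
forest = train_forest(rows, targets, n_trees=10, n_features=2, seed=1)
print(predict_forest(forest, [2, 0]), predict_forest(forest, [17, 0]))
```

Step 104 would wrap `train_forest` in a loop over candidate values of `n_trees` and `n_features` and keep the combination with the highest $R^2$ on held-out data.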
6. The method for soil benchmark formulation and risk assessment based on bioavailability of claim 3, wherein the soil pollutant bioavailability content prediction model is established by multilayer perceptron as follows:
step a: inputting a training data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i$ is the feature vector of an input sample and $y_i$ is the corresponding target variable;
step b: initializing the weight and bias parameters of the multilayer perceptron;
step c: for each training sample $(x_i, y_i)$, performing the following steps:
forward propagation: calculating the output of each neuron until the final predicted value $\hat{y}_i$ is obtained;
calculating the loss function: comparing the predicted value $\hat{y}_i$ with the true value $y_i$ to obtain the loss function value $L_i$;
back propagation: updating the weight and bias parameters in the network according to the gradient of the loss function;
step d: repeating step c until a predetermined stop condition is reached;
step e: according to the model fitting parameter $R^2$, continuously adjusting the number of hidden layers, the number of neurons in each hidden layer, the maximum number of iterations, the type of learning algorithm and the type of activation function until the best fitting result is achieved.
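Steps a to d can be sketched with a single-hidden-layer perceptron trained by per-sample gradient descent. The network size, tanh activation, squared-error loss, learning rate and fixed epoch count are all assumptions chosen for illustration; the claim leaves these choices to the tuning in step e.

```python
import math
import random

def _forward(params, x):
    """Forward propagation: hidden tanh activations and the linear output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
              for j in range(len(w1))]
    return hidden, sum(w2[j] * hidden[j] for j in range(len(w2))) + b2

def train_mlp(samples, targets, hidden=4, lr=0.05, epochs=500, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Step b: initialise the weights randomly and the biases to zero.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):                       # Step d: stop after `epochs` passes
        for x, y in zip(samples, targets):        # Step c: one update per sample
            h, y_hat = _forward((w1, b1, w2, b2), x)
            err = y_hat - y                       # dL_i/dy_hat for L_i = 0.5 * err**2
            for j in range(hidden):
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)  # uses w2[j] before its update
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
    return w1, b1, w2, b2

def predict_mlp(params, x):
    return _forward(params, x)[1]

# Toy data (illustration only): y = 2x on [0, 1].
xs = [[0.0], [0.25], [0.5], [0.75], [1.0]]
ys = [0.0, 0.5, 1.0, 1.5, 2.0]
model = train_mlp(xs, ys, seed=1)
print([round(predict_mlp(model, x), 2) for x in xs])
```

Step e would repeat the training with different values of `hidden`, `epochs`, the activation function and the learning algorithm, keeping the configuration with the highest $R^2$.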
7. The method of claim 1, wherein the bioavailability data comprises a conversion value of the soil pollutant toxicity value, a conversion value of the soil pollutant background value, and a conversion value of the soil pollutant exposure value.
8. The method for soil benchmark formulation and risk assessment based on bioavailability of claim 7, wherein calculating the bioavailability-based soil benchmark value from the bioavailability data and performing the bioavailability-based ecological risk assessment of soil pollutants comprises:
calculating the bioavailability-based soil benchmark value;
assessing the bioavailability-based ecological risk of soil pollutants using the bioavailability-based soil benchmark value.
9. The method for soil benchmark formulation and risk assessment based on bioavailability of claim 8, wherein the bioavailability-based soil benchmark value is calculated by the following formula:
$$\text{Bio-based SQC} = \text{Bio-based SBV} + \text{Bio-based } HC_5$$
where Bio-based SQC is the bioavailability-based soil benchmark value; Bio-based $HC_5$ is the bioavailability-based hazardous concentration for 5% of species, derived by fitting the conversion values of the soil pollutant toxicity values with the species sensitivity distribution curve method; and Bio-based SBV is the conversion value of the soil pollutant background value.
10. The method for soil benchmark formulation and risk assessment based on bioavailability of claim 9, wherein the bioavailability-based soil pollutant ecological risk assessment indicator comprises: a single factor pollution index or a Nemerow comprehensive pollution index;
the single factor pollution index is a relative dimensionless index used to evaluate the pollution level of soil or crops or the soil environmental quality grade, and reflects the pollution degree of each element; it is calculated as follows:
$$P_i = \frac{\text{Bio-based EEV}_i}{\text{Bio-based SQC}_i}$$
where $P_i$ is the environmental pollution index of the $i$-th pollutant in the soil; $P_i \le 1$ indicates no pollution, $1 < P_i \le 2$ indicates slight pollution, $2 < P_i \le 3$ indicates moderate pollution, and $P_i > 3$ indicates serious pollution; Bio-based EEV$_i$ is the conversion value of the exposure value of the $i$-th pollutant in the soil; and Bio-based SQC$_i$ is the bioavailability-based soil benchmark value of the $i$-th pollutant;
the Nemerow comprehensive pollution index is calculated as follows:
$$P_N = \sqrt{\frac{\left(\dfrac{\text{Bio-based EEV}_i}{\text{Bio-based SQC}_i}\right)_{\max}^{2} + \left(\dfrac{\text{Bio-based EEV}_i}{\text{Bio-based SQC}_i}\right)_{\mathrm{ave}}^{2}}{2}}$$
where $P_N$ is the Nemerow comprehensive pollution index; Bio-based EEV$_i$ is the environmental exposure bioavailability value of the $i$-th pollutant in the soil; Bio-based SQC$_i$ is the bioavailability-based soil benchmark value of the $i$-th heavy metal; and $P_N$ values of 0 to 0.7, 0.7 to 1, 1 to 2, 2 to 3 and greater than 3 correspond to safe, warning limit, slight pollution, moderate pollution and severe pollution, respectively.
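The benchmark value of claim 9 and the two indices of claim 10 can be computed as in the sketch below. It assumes that the single factor index is the ratio of the Bio-based EEV to the Bio-based SQC, as the symbol definitions imply, and that the comprehensive index is the standard Nemerow form built from the mean and the maximum of the single factor indices. The treatment of values that fall exactly on a class boundary and all numeric inputs are also assumptions.

```python
import math

def bio_based_sqc(sbv, hc5):
    """Claim 9: Bio-based SQC = Bio-based SBV + Bio-based HC5."""
    return sbv + hc5

def single_factor_index(eev, sqc):
    """Assumed form P_i = Bio-based EEV_i / Bio-based SQC_i."""
    return eev / sqc

def nemerow_index(indices):
    """Standard Nemerow form: sqrt((mean(P_i)^2 + max(P_i)^2) / 2)."""
    p_avg = sum(indices) / len(indices)
    return math.sqrt((p_avg ** 2 + max(indices) ** 2) / 2)

def classify_single(p):
    if p <= 1:
        return "no pollution"
    if p <= 2:
        return "slight pollution"
    if p <= 3:
        return "moderate pollution"
    return "serious pollution"

def classify_nemerow(p_n):
    # Upper bounds are treated as inclusive (assumption).
    if p_n <= 0.7:
        return "safe"
    if p_n <= 1:
        return "warning limit"
    if p_n <= 2:
        return "slight pollution"
    if p_n <= 3:
        return "moderate pollution"
    return "severe pollution"

# Hypothetical (SBV, HC5, EEV) values in mg/kg for three metals (illustration only).
data = {"As": (2.0, 3.0, 6.0), "Cd": (0.05, 0.15, 0.1), "Pb": (4.0, 16.0, 8.0)}
indices = {}
for metal, (sbv, hc5, eev) in data.items():
    indices[metal] = single_factor_index(eev, bio_based_sqc(sbv, hc5))
    print(metal, round(indices[metal], 2), classify_single(indices[metal]))
p_n = nemerow_index(list(indices.values()))
print("P_N", round(p_n, 2), classify_nemerow(p_n))
```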
CN202311002800.2A 2023-08-10 2023-08-10 Soil benchmark formulation and risk assessment method based on bioavailability Active CN117009767B (en)

Publications (2)

Publication Number Publication Date
CN117009767A true CN117009767A (en) 2023-11-07
CN117009767B CN117009767B (en) 2024-04-26

Family

ID=88570808

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596409A (en) * 2018-07-16 2018-09-28 江苏智通交通科技有限公司 The method for promoting traffic hazard personnel's accident risk prediction precision
CN113406287A (en) * 2021-05-27 2021-09-17 中国科学院水生生物研究所 Regional protection aquatic organism water quality benchmark derivation method for optimally controlling heavy metal pollutant chromium
CN113657799A (en) * 2021-08-26 2021-11-16 北京市环境保护科学研究院 Method for evaluating environmental health risk of benzo [ a ] pyrene in soil and animal model
CN114167031A (en) * 2021-11-22 2022-03-11 中国环境科学研究院 Method for predicting bioavailability content of heavy metals in soil

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Zhibo (李志博) et al.: Critical values of cadmium in paddy soil based on rice intake risk: a case study, Acta Pedologica Sinica (土壤学报), no. 01, 15 January 2008 (2008-01-15) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant