CN111564187B - Method and system for predicting reaction rate constant of organic matter and singlet oxygen - Google Patents
Method and system for predicting reaction rate constant of organic matter and singlet oxygen
- Publication number
- CN111564187B (CN202010380633.5A)
- Authority
- CN
- China
- Prior art keywords
- reaction rate
- rate constant
- quantitative structure
- compounds
- molecular
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/10—Analysis or design of chemical reactions, syntheses or processes
Abstract
The invention relates to a method and a system for predicting a reaction rate constant of an organic matter and singlet oxygen. The method comprises the following steps: collecting data related to reaction rate constants of organic matters under different pH values of an aqueous solution; dividing the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data; carrying out statistical regression modeling according to the data related to the reaction rate constant of the training set to obtain a quantitative structure-activity relation model of the singlet oxygen reaction rate constant; verifying the quantitative structure-activity relation model according to the relevant data of the reaction rate constant of the test set; obtaining a molecular structure of an organic matter to be predicted; and inputting the molecular structure of the organic matter to be predicted into a corresponding quantitative structure-activity relation model as input to obtain a reaction rate constant of the organic matter to be predicted and the singlet oxygen. The method can simply, quickly and efficiently predict the reaction rate constant of the organic compound in water and the singlet oxygen under any pH condition.
Description
Technical Field
The invention relates to the field of prediction and evaluation of reaction rate constants, in particular to a method and a system for predicting a reaction rate constant of an organic matter and singlet oxygen.
Background
Singlet oxygen (¹O₂) is an important reactive oxygen species (ROS). It is generated through many pathways and takes part in the transformation of organic pollutants in water and in reactions in biological and chemical systems. In addition, ¹O₂-based photosensitization is an emerging advanced oxidation technology for removing organic pollutants and has attracted wide attention in the field of water treatment. The reactivity of organic pollutants toward ¹O₂ (expressed by the reaction rate constant k_1O2) is therefore important for understanding how ¹O₂ transforms organic pollutants in water, and it also provides a scientific basis for evaluating the practical feasibility of photosensitization technology.
According to statistics from the Chemical Abstracts Service (CAS) in the United States, more than 111 million chemicals are registered worldwide, and about 15,000 new chemicals are added each day; however, aqueous-phase k_1O2 data are currently available for only a few hundred chemicals. Moreover, measuring the k_1O2 values of organic compounds one by one experimentally consumes large amounts of manpower and material resources and lags behind the needs of pollution prevention. Predicting the parameters that describe the environmental migration and transformation behavior of chemicals (such as k_1O2 data) is therefore of great research significance.
A quantitative structure-activity relationship (QSAR) expresses, in the form of a mathematical model, the relationship between the molecular structure of organic chemicals and their physicochemical properties, environmental behavior parameters and ecotoxicological effects. QSAR models can provide strong data support for the environmental risk assessment of organic chemicals: they fill gaps in experimental data, reduce experimental costs and help evaluate the uncertainty of data. Predicting the aqueous-phase k_1O2 values of organic chemicals with QSAR models is therefore feasible.
The European Union regulation on the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) also explicitly states that QSAR methods can provide information in support of chemical registration. In 2007 the Organisation for Economic Co-operation and Development (OECD) issued guidance on building and using QSAR models and proposed five criteria that a QSAR model should meet: (1) a defined endpoint; (2) an unambiguous algorithm; (3) a defined domain of applicability; (4) appropriate measures of goodness of fit, robustness and predictivity; (5) a mechanistic interpretation, if possible.
Researchers have already used QSAR methods to build models that predict the k_1O2 of reactions between organic compounds and ¹O₂. The literature "Environ. Sci. Technol. 1991, 25(9), 1596-1604" built six models with the linear regression (LR) method, based on Hammett substituent constants and the oxidation potential (E_1/2), to predict the k_1O2 of 22 phenols and phenolates; the correlation coefficients of the models are all above 0.700 (0.706, 0.723, 0.828, 0.914, 0.918 and 0.920). The literature "J. Chemom. 1996, 10(2), 79-93" used the LR method with two quantitative descriptors (E_1/2 and the energy of the highest occupied molecular orbital, E_HOMO, respectively) to build two models for predicting the k_1O2 of 21 phenolic compounds, with correlation coefficients of 0.800 and 0.810, and used the partial least squares (PLS) method with 15 descriptors to build a model covering the k_1O2 of 21 phenolic compounds, with a correlation coefficient of 0.871. The literature "J. Mol. Graphics Modell. 2009, 28(1), 12-19" used the multiple linear regression (MLR) method and 6 molecular descriptors to build a QSAR model for predicting the overall quenching rate constant (k_t) of the reaction between heterocyclic compounds and ¹O₂, and obtained a good linear relationship (r² = 0.940). The literature "Environ. Sci.: Processes Impacts 2017, 19(3), 324-338" used the E_1/2 descriptor and the LR method to build three QSAR models for phenols, for phenolates, and for phenols and phenolates combined, with correlation coefficients of 0.390, 0.590 and 0.710, respectively. The literature "Environ. Chem. 2018, 14(7), 442-450" used quantitative descriptors to build five linear and non-linear class-specific QSAR models for predicting the aqueous-phase k_1O2 of non-aromatic olefins, naphthalenes and anthracenes, thiols and thioethers, aromatic olefins, and aromatic amines, with correlation coefficients of 0.860, 0.730, 0.720, 0.740 and 0.880, respectively. Although most of these models fit well, they cover few compounds and apply only to specific classes of compounds (mainly phenols and phenolates), i.e. their applicability domains are small, and they were not externally validated as required by the OECD guidance on QSAR modeling. In addition, many organic compounds have ionizable groups and exist in different dissociation forms in the aqueous environment, and the reactivity of these forms toward ¹O₂ may differ greatly.
Disclosure of Invention
The invention aims to provide a method and a system for predicting the reaction rate constant of organic compounds and singlet oxygen, which can simply, quickly and efficiently predict the reaction rate constant of organic compounds and singlet oxygen in water under any pH condition.
In order to achieve the purpose, the invention provides the following scheme:
a method for predicting a reaction rate constant of an organic matter and singlet oxygen comprises the following steps:
collecting reaction rate constant related data of an organic matter under different pH values of an aqueous solution, wherein the reaction rate constant related data comprises molecular state compound reaction rate constant related data and ionic state compound reaction rate constant related data;
dividing the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data;
performing statistical regression modeling according to the relevant data of the training set reaction rate constants to obtain a singlet oxygen reaction rate constant quantitative structure-activity relation model, wherein the singlet oxygen reaction rate constant quantitative structure-activity relation model comprises a comprehensive quantitative structure-activity relation model covering molecular compounds and ionic compounds, a quantitative structure-activity relation model covering molecular compound reaction rate constants and a quantitative structure-activity relation model covering ionic compound reaction rate constants;
verifying the quantitative structure-activity relation model according to the data related to the reaction rate constant of the test set;
obtaining a molecular structure of an organic matter to be predicted;
and inputting the molecular structure of the organic matter to be predicted into the corresponding quantitative structure-activity relation model as input to obtain a reaction rate constant of the organic matter to be predicted and the singlet oxygen.
Optionally, the acquiring of the data related to the reaction rate constant of the organic matter under different pH values of the aqueous solution, where the data related to the reaction rate constant includes data related to the reaction rate constant of the molecular state compound and data related to the reaction rate constant of the ionic state compound, specifically includes:
collecting 180 reaction rate constant related data of the organic matter under the condition that the pH = 3-12 of the aqueous solution, wherein the molecular state compound reaction rate constant related data is 109, and the ionic state compound reaction rate constant related data is 71.
Optionally, the dividing the data related to the reaction rate constant into data related to the reaction rate constant of the training set and data related to the reaction rate constant of the test set specifically includes:
splitting the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data according to a ratio of 4:1; the number of compounds in the training set of the quantitative structure-activity relation model covering molecular state compound reaction rate constants is 89, and the number of compounds in its verification set is 20; the number of compounds in the training set of the quantitative structure-activity relation model covering ionic compound reaction rate constants is 58, and the number of compounds in its verification set is 13.
Optionally, the performing statistical regression modeling according to the data related to the reaction rate constant of the training set to obtain a quantitative structure-activity relationship model of the reaction rate constant of singlet oxygen specifically includes:
obtaining a comprehensive quantitative structure-activity relation model covering the molecular state and the ionic state according to a reaction rate constant training set covering the molecular state and the ionic state:
obtaining a quantitative structure-activity relation model covering the reaction rate constants of the molecular compounds according to a training set covering the reaction rate constants of the molecular compounds:
obtaining an ionic compound reaction rate constant quantitative structure-activity relation model according to a training set covering the ionic compound reaction rate constants:
$$\log k_{^1\mathrm{O}_2,\,\mathrm{ion}} = 13.327 - 1.016\,E_{1/2} - 0.789\,\text{C-017} + 0.018\,\mathrm{SAdon} + 0.421\,n_{\mathrm{ArOR}} - 0.121\,\text{H-047} - 0.058\,\mathrm{N\%};$$
Wherein log k_1O2 represents the logarithm of the rate constant of the reaction of an organic compound with singlet oxygen; log k_1O2,comprehensive represents the value calculated with the comprehensive quantitative structure-activity relationship model covering molecular and ionic compounds, which can be used for compounds in the molecular and/or ionic state; log k_1O2,molecule represents the value for a molecular state compound calculated with the quantitative structure-activity relationship model covering molecular state compounds; log k_1O2,ion represents the value for an ionic state compound calculated with the quantitative structure-activity relationship model covering ionic state compounds; E_1/2 represents the half-wave potential; C-008 represents the CHR2X group; H-047 represents a hydrogen atom attached to an sp2 or sp3 hybridized carbon atom; H-054 represents a hydrogen atom attached to an sp3 hybridized carbon atom whose neighbouring carbon atom bears three electronegative atoms; n_Crq represents the number of ring quaternary sp3 hybridized carbon atoms; H-046 represents a hydrogen atom attached to an sp3 hybridized carbon atom whose neighbouring carbon atom bears no electronegative atom; n_C(=N)N2 represents the number of guanidine derivatives; C-034 represents the R--CR..X molecular fragment; f_min,C^+ represents the minimum Fukui index on a carbon atom in the molecule; N-077 represents the Al-NO2 molecular fragment; n_TB represents the number of triple bonds; N_aasN represents the number of aasN type atoms; n_R05 represents the number of five-membered rings; n_dssC represents the number of dssC type atoms; n_sssCH represents the number of sssCH type atoms; n_H represents the number of hydrogen atoms; n_CONN represents the number of urea derivatives; n_Ct represents the number of tertiary carbons; n_Pyridines represents the number of pyridine functions; f_min^+ represents the minimum Fukui index on an atom in the molecule; C-017 represents the =CR2 group; SAdon represents the surface area of donor atoms from the P_VSA-like descriptors; n_ArOR represents the number of aromatic ethers; and N% represents the percentage of nitrogen atoms in the molecule.
Optionally, the organic compounds include alcohols, ketones, ethers, aldehydes, acids, esters, halogenated compounds, aromatic compounds, heterocyclic compounds, nitrogen-containing compounds, sulfur-containing compounds, and emerging pollutants.
A system for predicting a rate constant for an organic material to react with singlet oxygen, comprising:
the data acquisition module is used for acquiring reaction rate constant related data of the organic matters under different pH values of the aqueous solution, wherein the reaction rate constant related data comprises molecular state compound reaction rate constant related data and ionic state compound reaction rate constant related data;
the data dividing module is used for dividing the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data;
the model construction module is used for carrying out statistical regression modeling according to the relevant data of the reaction rate constants of the training set to obtain a singlet oxygen reaction rate constant quantitative structure-activity relation model, and the singlet oxygen reaction rate constant quantitative structure-activity relation model comprises a comprehensive quantitative structure-activity relation model covering molecular compounds and ionic compounds, a quantitative structure-activity relation model covering the reaction rate constants of the molecular compounds and a quantitative structure-activity relation model covering the reaction rate constants of the ionic compounds;
the verification module is used for verifying the quantitative structure-activity relation model according to the data related to the reaction rate constant of the test set;
the acquisition module is used for acquiring the molecular structure of the organic matter to be predicted;
and the prediction module is used for inputting the molecular structure of the organic matter to be predicted into the corresponding quantitative structure-activity relation model as input to obtain a reaction rate constant of the organic matter to be predicted and the singlet oxygen.
Optionally, the data acquisition module specifically includes:
the data acquisition unit is used for acquiring 180 pieces of reaction rate constant related data of the organic matter under the condition that the pH = 3-12 of the aqueous solution, wherein the molecular state compound reaction rate constant related data is 109 pieces, and the ionic state compound reaction rate constant related data is 71 pieces.
Optionally, the data dividing module specifically includes:
the data dividing unit is used for dividing the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data according to a ratio of 4:1; the number of compounds in the training set of the quantitative structure-activity relation model covering molecular state compound reaction rate constants is 89, and the number of compounds in its verification set is 20; the number of compounds in the training set of the quantitative structure-activity relation model covering ionic compound reaction rate constants is 58, and the number of compounds in its verification set is 13.
Optionally, the model building module specifically includes:
the first model building unit is used for obtaining a comprehensive quantitative structure-activity relation model covering the molecular state and the ionic state compound according to a reaction rate constant training set covering the molecular state and the ionic state:
the second model building unit is used for obtaining a covering molecular state compound reaction rate constant quantitative structure-activity relation model according to the covering molecular state compound reaction rate constant training set:
the third model building unit is used for obtaining the quantitative structure-activity relation model of the reaction rate constant of the ionic compound according to the training set covering the reaction rate constant of the ionic compound:
$$\log k_{^1\mathrm{O}_2,\,\mathrm{ion}} = 13.327 - 1.016\,E_{1/2} - 0.789\,\text{C-017} + 0.018\,\mathrm{SAdon} + 0.421\,n_{\mathrm{ArOR}} - 0.121\,\text{H-047} - 0.058\,\mathrm{N\%};$$
Wherein log k_1O2 represents the logarithm of the rate constant of the reaction of an organic compound with singlet oxygen; log k_1O2,comprehensive represents the value calculated with the comprehensive quantitative structure-activity relationship model covering molecular and ionic compounds, which can be used for compounds in the molecular and/or ionic state; log k_1O2,molecule represents the value for a molecular state compound calculated with the quantitative structure-activity relationship model covering molecular state compounds; log k_1O2,ion represents the value for an ionic state compound calculated with the quantitative structure-activity relationship model covering ionic state compounds; E_1/2 represents the half-wave potential; C-008 represents the CHR2X group; H-047 represents a hydrogen atom attached to an sp2 or sp3 hybridized carbon atom; H-054 represents a hydrogen atom attached to an sp3 hybridized carbon atom whose neighbouring carbon atom bears three electronegative atoms; n_Crq represents the number of ring quaternary sp3 hybridized carbon atoms; H-046 represents a hydrogen atom attached to an sp3 hybridized carbon atom whose neighbouring carbon atom bears no electronegative atom; n_C(=N)N2 represents the number of guanidine derivatives; C-034 represents the R--CR..X molecular fragment; f_min,C^+ represents the minimum Fukui index on a carbon atom in the molecule; N-077 represents the Al-NO2 molecular fragment; n_TB represents the number of triple bonds; N_aasN represents the number of aasN type atoms; n_R05 represents the number of five-membered rings; n_dssC represents the number of dssC type atoms; n_sssCH represents the number of sssCH type atoms; n_H represents the number of hydrogen atoms; n_CONN represents the number of urea derivatives; n_Ct represents the number of tertiary carbons; n_Pyridines represents the number of pyridine functions; f_min^+ represents the minimum Fukui index on an atom in the molecule; C-017 represents the =CR2 group; SAdon represents the surface area of donor atoms from the P_VSA-like descriptors; n_ArOR represents the number of aromatic ethers; and N% represents the percentage of nitrogen atoms in the molecule.
Optionally, the organic compounds include alcohols, ketones, ethers, aldehydes, acids, esters, halogenated compounds, aromatic compounds, heterocyclic compounds, nitrogen-containing compounds, sulfur-containing compounds, and emerging pollutants.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
Based on the acid-base dissociation constant (pKa) of a compound and the pH of the aqueous solution, the invention can calculate the proportion of each dissociation form of the compound at any pH and thereby predict the k_1O2 value for the reaction of the organic compound with singlet oxygen at any pH. The invention is low in cost, simple and fast, and saves considerable manpower, material and financial resources. The k_1O2 prediction method of the invention was established and validated in strict accordance with the OECD guidance on the development and use of QSAR models; the k_1O2 values predicted with the invention therefore help in understanding the transformation of organic pollutants by ¹O₂ in water, provide a scientific basis for evaluating the practical feasibility of photosensitization technology, and provide important data support for the ecological risk assessment and management of chemicals.
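The pH dependence described above can be sketched in a few lines. This is a minimal illustration, assuming a monoprotic acid whose ionized fraction follows the Henderson-Hasselbalch relation and assuming that the apparent rate constant is the fraction-weighted sum of the molecular-form and ionic-form rate constants; the weighting rule, the function names and the numeric inputs are illustrative assumptions, not values or procedures taken from the patent.

```python
import math


def ionized_fraction(ph: float, pka: float) -> float:
    """Fraction of a monoprotic acid present in its ionized (deprotonated) form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def apparent_log_k(ph: float, pka: float,
                   log_k_molecule: float, log_k_ion: float) -> float:
    """Fraction-weighted apparent log k at a given pH (assumed weighting rule)."""
    alpha_ion = ionized_fraction(ph, pka)
    k_app = ((1.0 - alpha_ion) * 10.0 ** log_k_molecule
             + alpha_ion * 10.0 ** log_k_ion)
    return math.log10(k_app)


if __name__ == "__main__":
    # Hypothetical compound: pKa = 10, log k = 6 (molecular form), 8 (ionic form).
    for ph in (3.0, 7.0, 10.0, 12.0):
        value = apparent_log_k(ph, pka=10.0, log_k_molecule=6.0, log_k_ion=8.0)
        print(f"pH {ph:4.1f}: apparent log k = {value:.3f}")
```

In such a scheme the two inputs `log_k_molecule` and `log_k_ion` would come from the molecular-state and ionic-state QSAR models, respectively; compounds with several ionizable groups would need one fraction per dissociation form.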
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for predicting the reaction rate constant of an organic substance and singlet oxygen according to the present invention;
FIG. 2 is a plot of measured versus predicted log k_1O2 values for the training set and validation set compounds of the comprehensive quantitative structure-activity relationship model covering molecular and ionic compounds;
FIG. 3 is a plot of measured versus predicted log k_1O2 values for the training set and validation set compounds of the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds;
FIG. 4 is a plot of measured versus predicted log k_1O2 values for the training set and validation set compounds of the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds;
FIG. 5 is an application domain characterization plot of the comprehensive quantitative structure-activity relationship model covering molecular and ionic compounds;
FIG. 6 is an application domain characterization plot of the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds;
FIG. 7 is an application domain characterization plot of the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds;
FIG. 8 is a diagram of a system for predicting the reaction rate constant of organic substances with singlet oxygen according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a method and a system for predicting the reaction rate constant of organic compounds and singlet oxygen, which can simply, quickly and efficiently predict the reaction rate constant of organic compounds and singlet oxygen in water under any pH condition.
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.
FIG. 1 is a flow chart of a method for predicting a reaction rate constant of an organic substance and singlet oxygen according to the present invention. The invention relates to a method for predicting the reaction rate constant of organic matters and singlet oxygen, which mainly predicts the reaction rate constant of the organic matters and the singlet oxygen in different dissociation forms in water through a quantitative structure-activity relationship model. As shown in fig. 1, a method for predicting a reaction rate constant of an organic substance and singlet oxygen includes:
step 101: collecting reaction rate constant related data of organic matters under different pH values of an aqueous solution, wherein the reaction rate constant related data comprise molecular state compound reaction rate constant related data and ionic state compound reaction rate constant related data, and the method specifically comprises the following steps:
collecting 180 reaction rate constant related data of an organic matter under the condition that the pH = 3-12 of an aqueous solution, wherein the molecular state compound reaction rate constant related data is 109, and the ionic state compound reaction rate constant related data is 71.
The organic compounds include alcohols, ketones, ethers, aldehydes, acids, esters, halogenated compounds, aromatic compounds, heterocyclic compounds, nitrogen-containing compounds, sulfur-containing compounds, and emerging pollutants (such as pharmaceuticals and personal care products, endocrine disrupting chemicals, pesticides, herbicides, brominated flame retardants, and the like).
Step 102: dividing the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data, and specifically comprising:
splitting the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data according to a ratio of 4:1; the number of compounds in the training set of the quantitative structure-activity relation model covering molecular state compound reaction rate constants is 89, and the number of compounds in its verification set is 20; the number of compounds in the training set of the quantitative structure-activity relation model covering ionic compound reaction rate constants is 58, and the number of compounds in its verification set is 13.
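The split in step 102 can be sketched as follows. The text gives the resulting set sizes but not how the compounds were assigned, so the seeded random shuffle below is an assumption, and the record identifiers are hypothetical placeholders for the 109 molecular and 71 ionic data points.

```python
import random


def split_by_count(records, n_train, seed=0):
    """Shuffle records reproducibly and return (training, test) lists."""
    if not 0 < n_train < len(records):
        raise ValueError("n_train must leave at least one record on each side")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers; real records would carry structures and measured values.
molecular = [f"mol_{i}" for i in range(109)]
ionic = [f"ion_{i}" for i in range(71)]

# Training-set sizes (89 and 58) are the counts stated in the text.
mol_train, mol_test = split_by_count(molecular, n_train=89)
ion_train, ion_test = split_by_count(ionic, n_train=58)

print(len(mol_train), len(mol_test), len(ion_train), len(ion_test))
```

Fixing the training-set size explicitly, rather than rounding a 0.8 fraction, reproduces the stated 89/20 and 58/13 counts; a structure-aware or activity-stratified split would be a reasonable alternative.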
Step 103: performing statistical regression modeling according to the data related to the reaction rate constants of the training set to obtain a singlet oxygen reaction rate constant quantitative structure-activity relationship model, wherein the singlet oxygen reaction rate constant quantitative structure-activity relationship model comprises a comprehensive quantitative structure-activity relationship model covering molecular compounds and ionic compounds, a quantitative structure-activity relationship model covering the reaction rate constants of the molecular compounds and a quantitative structure-activity relationship model covering the reaction rate constants of the ionic compounds, and specifically comprises the following steps:
obtaining a comprehensive quantitative structure-activity relation model covering the molecular state and the ionic state according to a reaction rate constant training set covering the molecular state and the ionic state:
obtaining a quantitative structure-activity relation model covering the reaction rate constants of the molecular compounds according to a training set covering the reaction rate constants of the molecular compounds:
obtaining a quantitative structure-activity relation model of the reaction rate constant of the ionic compound according to a training set covering the reaction rate constant of the ionic compound:
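$$\log k_{^1\mathrm{O}_2,\,\mathrm{ion}} = 13.327 - 1.016\,E_{1/2} - 0.789\,\text{C-017} + 0.018\,\mathrm{SAdon} + 0.421\,n_{\mathrm{ArOR}} - 0.121\,\text{H-047} - 0.058\,\mathrm{N\%} \tag{3}$$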
where $\log k_{^1\mathrm{O}_2}$ represents the rate constant of the reaction of organic matter with singlet oxygen; $\log k_{^1\mathrm{O}_2,\,\mathrm{synthesis}}$ represents the rate constant, usable for predicting the reaction of molecular and/or ionic species with singlet oxygen, calculated from the comprehensive quantitative structure-activity relationship model covering both molecular and ionic compounds; $\log k_{^1\mathrm{O}_2,\,\mathrm{molecule}}$ represents the reaction rate constant of a molecular state compound calculated from the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds; $\log k_{^1\mathrm{O}_2,\,\mathrm{ion}}$ represents the reaction rate constant of an ionic compound calculated from the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds; $E_{1/2}$ represents the half-wave potential; C-008 represents the CHR2X group; H-047 represents a hydrogen atom attached to an sp2 or sp3 hybridized carbon atom; H-054 represents a hydrogen atom attached to an sp3 hybridized carbon atom whose neighbouring carbon atom bears three electronegative atoms (such as O, N, S, P, Se and halogen atoms); $n_{\mathrm{Crq}}$ denotes the number of sp3 hybridized carbon atoms of the four-membered ring; H-046 represents a hydrogen atom attached to an sp3 hybridized carbon atom whose neighbouring carbon atom bears no electronegative atom; $n_{\mathrm{C(=N)N2}}$ represents the number of guanidine derivatives; C-034 represents the R--CR..X molecular fragment (where R represents any group attached through a carbon atom, X represents any electronegative atom (O, N, S, P, Se and halogen atoms), "--" represents an aromatic bond as in the benzene ring or a delocalized bond such as the N--O bond in the nitro group, and ".." represents an aromatic single bond such as the C--N bond in pyrrole); $f^{+}_{\min,\mathrm{C}}$ represents the minimum Fukui index on a carbon atom in the molecule; N-077 represents the Al-NO2 molecular fragment (where Al is an aliphatic group); $n_{\mathrm{TB}}$ denotes the number of triple bonds; $n_{\mathrm{aasN}}$ denotes the number of aasN-type atoms (where a denotes an aromatic bond and s denotes a single bond); $n_{\mathrm{R05}}$ denotes the number of five-membered rings; $n_{\mathrm{dssC}}$ denotes the number of dssC-type atoms (where d denotes a double bond and s denotes a single bond); $n_{\mathrm{sssCH}}$ denotes the number of sssCH-type atoms (where s denotes a single bond); $n_{\mathrm{H}}$ represents the number of hydrogen atoms; $n_{\mathrm{CONN}}$ denotes the number of urea (thiourea) derivatives; $n_{\mathrm{Ct}}$ denotes the number of tertiary carbons (sp3 hybridized); $n_{\mathrm{Pyridines}}$ denotes the number of pyridine functions; $f^{+}_{\min}$ represents the minimum Fukui index on an atom in the molecule; C-017 represents the =CR2 group; SAdon represents the surface area of donor atoms in the P_VSA-like descriptors (i.e. the oxygen of -OH and the nitrogen of -NH/-NH2); $n_{\mathrm{ArOR}}$ represents the number of aromatic ethers; and N% represents the percentage of nitrogen atoms in the molecule.
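As a minimal sketch of how equation (3) could be evaluated once the six descriptors are available, the following Python function uses the coefficients printed above. The dictionary keys (`E_half`, `C_017`, and so on) and the function name are hypothetical names chosen for illustration; the descriptor values themselves would have to come from the quantum chemical and DRAGON calculations described in this document.

```python
# Coefficients of equation (3), copied from the text above.
# The key names are illustrative assumptions, not part of the patent.
IONIC_MODEL = {
    "intercept": 13.327,
    "E_half": -1.016,   # half-wave potential E1/2
    "C_017": -0.789,    # =CR2 fragment count
    "SAdon": 0.018,     # donor-atom surface area (P_VSA-like)
    "n_ArOR": 0.421,    # number of aromatic ethers
    "H_047": -0.121,    # H-047 atom-centred fragment count
    "N_pct": -0.058,    # percentage of N atoms in the molecule
}


def log_k_ionic(descriptors):
    """Evaluate equation (3); descriptors not supplied default to 0."""
    total = IONIC_MODEL["intercept"]
    for name, coef in IONIC_MODEL.items():
        if name != "intercept":
            total += coef * descriptors.get(name, 0.0)
    return total


# Hypothetical input, for illustration only.
example = {"E_half": 4.91, "H_047": 7}
print(round(log_k_ionic(example), 3))
```

Storing the coefficients in one table keeps the function independent of any particular descriptor software; the comprehensive and molecular models could be handled the same way with their own coefficient tables.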
Step 104: verifying the quantitative structure-activity relationship model according to the test set reaction rate constant related data. This step mainly evaluates the goodness of fit, robustness and predictive ability of the model.
For the comprehensive quantitative structure-activity relationship model covering molecular state and ionic state compounds, the number of training set compounds is $n_{\mathrm{tr}}$ = 145. The variance inflation factor (VIF) of each descriptor is less than 10, and the K correlation indices of the matrix $M_{YX}$ (independent and dependent variables) and of the independent-variable matrix $M_X$ satisfy $K_{XX}$ (0.222) < $K_{XY}$ (0.256); both results indicate that the model is free of multicollinearity. Goodness of fit is characterized by $R^2$ and the root mean square error (RMSE): $R^2_{\mathrm{tr}}$ = 0.767 and RMSE = 0.579, indicating good fitting ability. Robustness is evaluated by the internal leave-one-out cross-validation coefficient ($Q^2_{\mathrm{LOO}}$) and by $Q^2_{\mathrm{BOOT}}$ obtained by bootstrapping: $Q^2_{\mathrm{LOO}}$ = 0.604 and $Q^2_{\mathrm{BOOT}}$ = 0.769. The difference between $R^2$ and $Q^2$ is less than 0.3, so the model is considered free of overfitting and robust. In external validation, the number of validation set data is $n_{\mathrm{ext}}$ = 35, with $R^2_{\mathrm{ext}}$ = 0.749, $Q^2_{\mathrm{ext}}$ = 0.785 and $\mathrm{RMSE}_{\mathrm{ext}}$ = 0.378, showing good external predictive ability. The model has a wider applicability domain than previous studies; the functional groups covered include >C=C<, –C(=O)–, –O–, –COOH/–COOR, –COO⁻, –CHO, Ph–, –CN, –OH, Ph–O⁻, –NH–/–NH2, –N<, –NO2, –NH–C(O)–, >N–N<, –SH, –S–, –(O=)S(=O)–, –(O=)(NH–)S(=O)– and –X (Br, Cl and F).
For the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds, the number of training set compounds is $n_{\mathrm{tr}}$ = 89. The variance inflation factor (VIF) of each descriptor is less than 10, and the K correlation indices of the matrix $M_{YX}$ (independent and dependent variables) and of the independent-variable matrix $M_X$ satisfy $K_{XX}$ (0.302) < $K_{XY}$ (0.328); both results indicate that the model is free of multicollinearity. Goodness of fit is characterized by $R^2$ and the root mean square error (RMSE): $R^2_{\mathrm{tr}}$ = 0.742 and RMSE = 0.592, indicating good fitting ability. Robustness is evaluated by the internal leave-one-out cross-validation coefficient ($Q^2_{\mathrm{LOO}}$) and by $Q^2_{\mathrm{BOOT}}$ obtained by bootstrapping: $Q^2_{\mathrm{LOO}}$ = 0.617 and $Q^2_{\mathrm{BOOT}}$ = 0.762. The difference between $R^2$ and $Q^2$ is less than 0.3, so the model is considered free of overfitting and robust. In external validation, the number of validation set data is $n_{\mathrm{ext}}$ = 20, with $R^2_{\mathrm{ext}}$ = 0.774, $Q^2_{\mathrm{ext}}$ = 0.774 and $\mathrm{RMSE}_{\mathrm{ext}}$ = 0.516, showing good external predictive ability.
For the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds, the number of training set compounds is $n_{\mathrm{tr}}$ = 58. The variance inflation factor (VIF) of each descriptor is less than 10, and the K correlation indices of the matrix $M_{YX}$ (independent and dependent variables) and of the independent-variable matrix $M_X$ satisfy $K_{XX}$ (0.330) < $K_{XY}$ (0.400); both results indicate that the model is free of multicollinearity. Goodness of fit is characterized by $R^2$ and the root mean square error (RMSE): $R^2_{\mathrm{tr}}$ = 0.790 and RMSE = 0.381, indicating good fitting ability. Robustness is evaluated by the internal leave-one-out cross-validation coefficient ($Q^2_{\mathrm{LOO}}$) and by $Q^2_{\mathrm{BOOT}}$ obtained by bootstrapping: $Q^2_{\mathrm{LOO}}$ = 0.696 and $Q^2_{\mathrm{BOOT}}$ = 0.768. The difference between $R^2$ and $Q^2$ is less than 0.3, so the model is considered free of overfitting and robust. In external validation, the number of validation set data is $n_{\mathrm{ext}}$ = 13, with $R^2_{\mathrm{ext}}$ = 0.799, $Q^2_{\mathrm{ext}}$ = 0.797 and $\mathrm{RMSE}_{\mathrm{ext}}$ = 0.333, showing good external predictive ability.
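The validation statistics reported above can be illustrated with a short NumPy sketch. It is not the software used for the invention: the function names are hypothetical, RMSE is taken here as the square root of the mean squared residual, and $Q^2_{\mathrm{LOO}}$ is taken as 1 - PRESS/SS_tot, which are common conventions but are assumptions with respect to this document. The bootstrap $Q^2$ and the K correlation indices are not covered.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to one row or a matrix of descriptors."""
    return coef[0] + np.asarray(X, float) @ coef[1:]


def r2_rmse(y, y_hat):
    """Coefficient of determination and root mean square error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.sqrt(ss_res / len(y))


def q2_loo(X, y):
    """Leave-one-out Q2: refit without each compound, then predict it."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i])) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def vif(X):
    """Variance inflation factor of each descriptor column."""
    X = np.asarray(X, float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        coef = fit_mlr(others, X[:, j])
        r2, _ = r2_rmse(X[:, j], predict(coef, others))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

With a descriptor matrix `X` and measured values `y` for a training set, `r2_rmse(y, predict(fit_mlr(X, y), X))`, `q2_loo(X, y)` and `vif(X)` give quantities that can be compared with the criteria above (VIF below 10, difference between $R^2$ and $Q^2$ below 0.3).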
The descriptors of the three models above were selected from two sources:
(1) Quantum chemical descriptors: 25 quantum chemical descriptors were selected.
(2) DRAGON descriptors: 526 DRAGON descriptors covering constitutional indices, ring descriptors, functional group counts, atom-centred fragments, atom-type E-state indices and molecular properties were calculated from the optimized geometry. MLR regression analysis was then performed separately on the data of each training set using all of the above descriptors.
Step 105: obtaining the molecular structure of the organic matter to be predicted.
Step 106: inputting the molecular structure of the organic matter to be predicted into the corresponding quantitative structure-activity relationship model to obtain the rate constant of the reaction of the organic matter to be predicted with singlet oxygen.
FIG. 2 is a plot of measured versus predicted $\log k_{^1\mathrm{O}_2}$ values for the training set and validation set compounds of the comprehensive quantitative structure-activity relationship model covering molecular and ionic compounds. FIG. 3 is a plot of measured versus predicted $\log k_{^1\mathrm{O}_2}$ values for the training set and validation set compounds of the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds. FIG. 4 is a plot of measured versus predicted $\log k_{^1\mathrm{O}_2}$ values for the training set and validation set compounds of the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds. FIG. 5 is an applicability domain characterization plot of the established comprehensive quantitative structure-activity relationship model covering molecular and ionic compounds. FIG. 6 is an applicability domain characterization plot of the established quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds. FIG. 7 is an applicability domain characterization plot of the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds. In each figure, circles represent training set compounds and Δ represents validation set compounds.
The results show that the three models can effectively predict the $\log k_{^1\mathrm{O}_2}$ values of alkanes, alkenes, alcohols, ketones, ethers, aldehydes, acids, esters, halogenated compounds, aromatic compounds, heterocyclic compounds, nitrogen-containing compounds, sulfur-containing compounds and some emerging pollutants (such as pharmaceuticals and personal care products, endocrine disrupting chemicals, pesticides, herbicides and brominated flame retardants). The applicability domains of the three models are characterized by the Williams method: a plot of the standardized residuals (δ) of the compounds against their distances from the descriptor centre ($h_i$, where i indexes the compounds) characterizes the applicability domain of the model and diagnoses outliers, and compounds with an absolute value of δ greater than 3 are considered outliers. $h_i$, the warning value ($h^*$) and δ are defined as:
$$h_i = x_i^{T}\,(X^{T}X)^{-1}\,x_i \quad (4)$$
$$h^* = 3(A+1)/n \quad (5)$$
where $x_i$ is the molecular structure descriptor vector of the i-th compound; $X$ is the matrix formed by the molecular structure descriptors; $A$ is the number of descriptors contained in the model; $n$ is the number of training set compounds of the model; $y_i$ and $\hat{y}_i$ are the experimental and predicted values of the i-th compound, respectively.
For the three models constructed, |δ| < 3 for all compounds, so there are no outliers; the $h^*$ values of the three models are 0.3, 0.4 and 0.4, respectively. When a training set compound has $h_i > h^*$, its molecular substructures are sparsely represented in the data set and the compound has a certain influence on the model. When a validation set compound has $h_i > h^*$ and |δ| < 3, the prediction is an extrapolation of the model; the model is still applicable to the compound, but the uncertainty is larger.
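The leverage calculation of formulas (4) and (5) can be sketched in a few lines of NumPy. The function names are hypothetical, and the descriptor matrix is used exactly as written in formula (4), without an added intercept column, which is an assumption about how the matrix was assembled. The standardized residual δ is taken as an input because formula (6) is not reproduced here.

```python
import numpy as np


def leverages(X_train, X_query):
    """h_i = x_i^T (X^T X)^-1 x_i for each row of X_query (formula 4)."""
    X_train = np.asarray(X_train, float)
    X_query = np.atleast_2d(np.asarray(X_query, float))
    xtx_inv = np.linalg.inv(X_train.T @ X_train)
    # Row-wise quadratic form x_i^T M x_i
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def warning_leverage(n_descriptors, n_compounds):
    """h* = 3(A + 1)/n (formula 5)."""
    return 3.0 * (n_descriptors + 1) / n_compounds


def strictly_in_domain(h, delta, h_star):
    """True when a compound is neither high-leverage nor an outlier."""
    return h <= h_star and abs(delta) < 3.0


# Descriptor counts and training set sizes stated in this document
for A, n in [(13, 145), (10, 89), (6, 58)]:
    print(A, n, round(warning_leverage(A, n), 1))
```

Using the descriptor counts (13, 10 and 6) and training set sizes (145, 89 and 58) given in this document, formula (5) rounds to the reported warning values of 0.3, 0.4 and 0.4. Note that `strictly_in_domain` is deliberately conservative: as stated above, a validation compound with $h_i > h^*$ and |δ| < 3 can still be predicted, only with larger uncertainty.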
The comprehensive quantitative structure-activity relationship model covering molecular state and ionic state compounds can be used to predict the $\log k_{^1\mathrm{O}_2}$ values for the reaction of organic compounds in different dissociation states with $^1\mathrm{O}_2$ in the aquatic environment; the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds can effectively predict the $\log k_{^1\mathrm{O}_2}$ values for the reaction of molecular state organic compounds with $^1\mathrm{O}_2$; and the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds can effectively predict the $\log k_{^1\mathrm{O}_2}$ values for the reaction of ionic state organic compounds with $^1\mathrm{O}_2$. Notably, from the acid dissociation constant ($\mathrm{p}K_a$) of a compound and the pH of the aqueous solution, the proportion of each form of the compound at any pH can be calculated, and the $\log k_{^1\mathrm{O}_2}$ value of the organic compound at any pH can then be predicted based on formulas (1), (2) and (3) together with formulas (7), (8) and (9). The method is low in cost, simple and fast, and saves a large amount of manpower, material and financial resources. The establishment and verification of the prediction method of the invention strictly follow the OECD guidelines for the development and use of QSAR models; the prediction results of the invention therefore help in understanding the transformation of organic pollutants by $^1\mathrm{O}_2$ in water, provide a scientific basis for evaluating the practical feasibility of photosensitization technology, and provide important data support for the ecological risk assessment and management of chemicals.
$$k_i = k_{i,n}\,\alpha + k_{i,a}\,(1-\alpha) \quad (7)$$
where $k_{i,n}$ and $k_{i,a}$ are the rate constants of the reaction of the molecular state compound and the ionic state compound with $^1\mathrm{O}_2$, respectively; they can be obtained with the comprehensive quantitative structure-activity relationship model covering molecular state and ionic state compounds, or calculated separately with the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds and the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds; α and (1-α) are the fractions of the molecular and dissociated forms of the compound in solution, respectively.
Depending on the number of dissociable groups of the compound, the specific calculation formula of $k_{^1\mathrm{O}_2}$ is as follows:
where $k_1$ and $k_2$ represent the rate constants of the reaction with $^1\mathrm{O}_2$ of a compound having one or two dissociable groups, respectively. The two rate constant terms in formula (8) are those of the molecular and ionic forms of the compound reacting with $^1\mathrm{O}_2$; they can be obtained with the comprehensive quantitative structure-activity relationship model covering molecular state and ionic state compounds, or calculated separately with the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds and the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds. The additional rate constant terms in formula (9) are those of the first-order and second-order dissociation forms of the compound reacting with $^1\mathrm{O}_2$; they can be obtained with the comprehensive quantitative structure-activity relationship model covering molecular state and ionic state compounds or calculated with the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds.
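The speciation-weighted combination described by formula (7) can be sketched as follows. This is an illustration under stated assumptions: the species fractions are computed from the standard acid-base equilibrium expressions for an acid with one or more dissociable groups, which may differ in form from formulas (8) and (9); the weighting is applied to $k$ rather than to $\log k$, following formula (7); and the function names are hypothetical.

```python
import math


def species_fractions(pH, pKas):
    """Fractions of the successive deprotonation states of an acid.

    Returns [alpha_0, alpha_1, ...], where alpha_0 is the fully
    protonated (molecular) form. pKas must be in ascending order.
    """
    h = 10.0 ** (-pH)
    n = len(pKas)
    terms = [h ** n]
    ka_product = 1.0
    for j, pKa in enumerate(pKas, start=1):
        ka_product *= 10.0 ** (-pKa)
        terms.append(ka_product * h ** (n - j))
    total = sum(terms)
    return [t / total for t in terms]


def overall_log_k(pH, pKas, log_k_by_species):
    """Fraction-weighted sum of species rate constants, returned as log k."""
    fractions = species_fractions(pH, pKas)
    k = sum(a * 10.0 ** lk for a, lk in zip(fractions, log_k_by_species))
    return math.log10(k)


# Speciation inputs taken from Example 3 (pKa 6.6, pH 8)
alpha_mol, alpha_ion = species_fractions(8.0, [6.6])
print(round(alpha_mol, 3), round(alpha_ion, 3))
```

For a compound with one dissociable group of $\mathrm{p}K_a$ 6.6 at pH 8, this gives about 3.8% molecular form and 96.2% anion, matching the speciation stated in Example 3 of this document.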
The method for predicting the reaction rate constant of the organic matter and the singlet oxygen has the following characteristics:
1. The applicability domain of the models covers organic compounds of diverse structures, such as alcohols, ketones, ethers, aldehydes, acids, esters, halogenated compounds, aromatic compounds, heterocyclic compounds, nitrogen-containing compounds, sulfur-containing compounds and some emerging pollutants (such as pharmaceuticals and personal care products, endocrine disrupting chemicals, pesticides, herbicides and brominated flame retardants). Based on the $\mathrm{p}K_a$ of the compound, the pH of the aqueous solution and the developed models, the $\log k_{^1\mathrm{O}_2}$ values of the different dissociation forms of organic compounds can be predicted at any pH. This is of great significance for evaluating the transformation of organic pollutants by $^1\mathrm{O}_2$ in natural waters, and can also provide a scientific basis for evaluating the practical feasibility of photosensitization technology.
2. In the modeling process, a transparent algorithm (MLR) recommended by OECD is adopted, 13 descriptors are screened out for a comprehensive quantitative structure-activity relationship model covering molecular state compounds and ionic state compounds, 10 descriptors are screened out for a quantitative structure-activity relationship model covering molecular state compound reaction rate constants, and 6 descriptors are screened out for a quantitative structure-activity relationship model covering ionic state compound reaction rate constants, so that the model is simple and convenient to analyze, understand and apply.
3. According to the guiding rule of OECD about QSAR model construction and use, the established model has good fitting capability, robustness and prediction capability.
FIG. 8 is a diagram of a system for predicting the reaction rate constant of organic substances with singlet oxygen according to the present invention. As shown in fig. 8, a system for predicting a reaction rate constant of an organic substance with singlet oxygen includes:
the data acquisition module 201 is configured to acquire data related to reaction rate constants of the organic matter under different pH values of the aqueous solution, where the data related to the reaction rate constants include data related to reaction rate constants of molecular state compounds and data related to reaction rate constants of ionic state compounds. The organic compounds include alcohols, ketones, ethers, aldehydes, acids, esters, halogenated compounds, aromatic compounds, heterocyclic compounds, nitrogen-containing compounds, sulfur-containing compounds, and emerging pollutants.
A data dividing module 202, configured to divide the data related to the reaction rate constant into data related to a training set reaction rate constant and data related to a test set reaction rate constant.
The model construction module 203 is configured to perform statistical regression modeling according to the data related to the training set reaction rate constants to obtain a singlet oxygen reaction rate constant quantitative structure-activity relationship model, where the singlet oxygen reaction rate constant quantitative structure-activity relationship model includes a comprehensive quantitative structure-activity relationship model covering molecular compounds and ionic compounds, a quantitative structure-activity relationship model covering molecular compound reaction rate constants, and a quantitative structure-activity relationship model covering ionic compound reaction rate constants.
A verification module 204, configured to verify the quantitative structure-activity relationship model according to the data related to the test set reaction rate constant.
An obtaining module 205, configured to obtain a molecular structure of an organic matter to be predicted.
And the prediction module 206 is configured to input the molecular structure of the organic matter to be predicted as an input into the corresponding quantitative structure-activity relationship model, so as to obtain a reaction rate constant of the organic matter to be predicted and singlet oxygen.
The data acquisition module 201 specifically includes:
the data acquisition unit is used for acquiring 180 pieces of reaction rate constant related data of the organic matter under the condition that the pH = 3-12 of the aqueous solution, wherein the molecular state compound reaction rate constant related data is 109 pieces, and the ionic state compound reaction rate constant related data is 71 pieces.
The data dividing module 202 specifically includes:
the data dividing unit is used for dividing the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data according to a ratio of 4; the number of compounds in a training set covering a molecular state compound reaction rate constant quantitative structure-activity relation model is 89, and the number of compounds in a verification set covering the molecular state compound reaction rate constant quantitative structure-activity relation model is 20; the number of compounds in the training set covering the ionic compound reaction rate constant quantitative structure-activity relation model is 58, and the number of compounds in the verification set covering the ionic compound reaction rate constant quantitative structure-activity relation model is 13.
The model building module 203 specifically includes:
the first model building unit is used for obtaining a comprehensive quantitative structure-activity relation model covering the molecular state and the ionic state compound according to a reaction rate constant training set covering the molecular state and the ionic state:
the second model building unit is used for obtaining a covering molecular state compound reaction rate constant quantitative structure-activity relation model according to the covering molecular state compound reaction rate constant training set:
the third model building unit is used for obtaining the quantitative structure-activity relation model of the reaction rate constant of the ionic compound according to the training set covering the reaction rate constant of the ionic compound:
$$\log k_{^1\mathrm{O}_2,\,\mathrm{ion}} = 13.327 - 1.016\,E_{1/2} - 0.789\,\text{C-017} + 0.018\,\text{SAdon} + 0.421\,n_{\mathrm{ArOR}} - 0.121\,\text{H-047} - 0.058\,\mathrm{N\%}$$
where $\log k_{^1\mathrm{O}_2}$ represents the rate constant of the reaction of organic matter with singlet oxygen; $\log k_{^1\mathrm{O}_2,\,\mathrm{synthesis}}$ represents the rate constant, usable for predicting the reaction of molecular and/or ionic species with singlet oxygen, calculated from the comprehensive quantitative structure-activity relationship model covering both molecular and ionic compounds; $\log k_{^1\mathrm{O}_2,\,\mathrm{molecule}}$ represents the reaction rate constant of a molecular state compound calculated from the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds; $\log k_{^1\mathrm{O}_2,\,\mathrm{ion}}$ represents the reaction rate constant of an ionic compound calculated from the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds; $E_{1/2}$ represents the half-wave potential; C-008 represents the CHR2X group; H-047 represents a hydrogen atom attached to an sp2 or sp3 hybridized carbon atom; H-054 represents a hydrogen atom attached to an sp3 hybridized carbon atom whose neighbouring carbon atom bears three electronegative atoms; $n_{\mathrm{Crq}}$ denotes the number of sp3 hybridized carbon atoms of the four-membered ring; H-046 represents a hydrogen atom attached to an sp3 hybridized carbon atom whose neighbouring carbon atom bears no electronegative atom; $n_{\mathrm{C(=N)N2}}$ represents the number of guanidine derivatives; C-034 represents the R--CR..X molecular fragment; $f^{+}_{\min,\mathrm{C}}$ represents the minimum Fukui index on a carbon atom in the molecule; N-077 represents the Al-NO2 molecular fragment; $n_{\mathrm{TB}}$ denotes the number of triple bonds; $n_{\mathrm{aasN}}$ denotes the number of aasN-type atoms; $n_{\mathrm{R05}}$ denotes the number of five-membered rings; $n_{\mathrm{dssC}}$ denotes the number of dssC-type atoms; $n_{\mathrm{sssCH}}$ denotes the number of sssCH-type atoms; $n_{\mathrm{H}}$ denotes the number of hydrogen atoms; $n_{\mathrm{CONN}}$ denotes the number of urea derivatives; $n_{\mathrm{Ct}}$ denotes the number of tertiary carbons; $n_{\mathrm{Pyridines}}$ denotes the number of pyridine functions; $f^{+}_{\min}$ represents the minimum Fukui index on an atom in the molecule; C-017 represents the =CR2 group; SAdon represents the surface area of donor atoms in the P_VSA-like descriptors; $n_{\mathrm{ArOR}}$ represents the number of aromatic ethers; and N% represents the percentage of nitrogen atoms in the molecule.
Example 1:
Given the compound p-hydroxybiphenyl, its $\log k_{^1\mathrm{O}_2}$ value in aqueous solution at pH 11 is to be predicted. First, the form of p-hydroxybiphenyl in the solution is determined: it is known to have one dissociable group with a $\mathrm{p}K_a$ of 9.2, so its form in aqueous solution at pH 11 can be calculated by formula (8) to be 100% monovalent anion. Based on the structural information of the monovalent anion of p-hydroxybiphenyl, its $E_{1/2}$ is calculated with Gaussian 16 software as follows: the molecular structure is optimized at the B3LYP/6-311G(d,p) level, from which $\mathrm{IP}_{\mathrm{gas},298\mathrm{K}}$ is obtained (equation 10); the single-point energy of the molecule is calculated at the SMD/M06-2X/6-311+G(2df,p) level on the optimized geometry; $\Delta G_{\mathrm{sol},A^{+}}$ and $\Delta G_{\mathrm{sol},A}$ are then each calculated from equation (11); and the $E_{1/2}$ value of the monovalent anion of p-hydroxybiphenyl is finally calculated by equations (12) and (13) to be 4.91:
$$\Delta G_{\mathrm{sol}} \approx E_{\mathrm{sol}} - E_{\mathrm{gas}} \quad (11)$$
where $E_{\mathrm{gas},A^{+}}$ and $E_{\mathrm{gas},A}$ are the enthalpies of the cationic and neutral states of the compound molecule in the gas phase, respectively; $\Delta G_{\mathrm{sol},A^{+}}$ and $\Delta G_{\mathrm{sol},A}$ are the solvation free energies of the cationic and neutral states of the molecule, respectively; $\Delta G_{\mathrm{sol}}$ represents the solvation free energy; $E_{\mathrm{sol}}$ and $E_{\mathrm{gas}}$ represent the enthalpies of the compound molecule in the liquid phase and the gas phase, respectively, so that $E_{\mathrm{sol}} - E_{\mathrm{gas}}$ is the difference between the total electronic energies of the molecule in the solvated state and in the gas phase; $n$ represents the number of transferred electrons; and $F$ denotes the Faraday constant.
Next, $f^{+}_{\min,\mathrm{C}}$ is calculated with Materials Studio 6.0 to be 0.006. Then C-034, H-047, H-054, C-008, N-077, $n_{\mathrm{C(=N)N2}}$, $n_{\mathrm{Crq}}$, H-046, $n_{\mathrm{TB}}$, $n_{\mathrm{aasN}}$, $n_{\mathrm{R05}}$, C-017, SAdon, $n_{\mathrm{ArOR}}$, H-047 and N% are calculated with Dragon 7.0 software to be 0,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, respectively. According to formulas (4), (5) and (6), the h value of the monovalent anion of p-hydroxybiphenyl is 0.0170 (< 0.3) in the comprehensive quantitative structure-activity relationship model covering molecular state and ionic state compounds and 0.0471 (< 0.4) in the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds. Both values lie within the applicability domains of the respective models, so both models can be used for prediction. The descriptor values are then substituted into formulas (1) and (3) to obtain the $\log k_{^1\mathrm{O}_2}$ values of the monovalent anion of p-hydroxybiphenyl at pH = 11 predicted by the comprehensive model (the comprehensive quantitative structure-activity relationship model covering molecular state and ionic state compounds) and by the ionic compound model (the quantitative structure-activity relationship model covering the reaction rate constants of ionic compounds). Finally, according to formula (8), the $\log k_{^1\mathrm{O}_2}$ values calculated with the comprehensive model and the ionic compound model are 7.62 and 8.47, respectively.
Compared with the experimental $\log k_{^1\mathrm{O}_2}$ value of 8.59 for the monovalent anion of p-hydroxybiphenyl at pH = 11, the differences are 0.97 and 0.12. The predicted values are close to the experimental value, and the ionic compound model predicts better than the comprehensive model.
Example 2:
Given the compound 4-methylcatechol, its $\log k_{^1\mathrm{O}_2}$ value in aqueous solution at pH = 6 is to be predicted. First, the form of 4-methylcatechol in the solution is determined: it is known to have two dissociable groups with $\mathrm{p}K_{a1}$ and $\mathrm{p}K_{a2}$ of 9.91 and 12.84, respectively, so its form in aqueous solution at pH = 6 can be calculated by formula (8) to be 100% molecular state. Then, based on the structural information of 4-methylcatechol, its structure is optimized with Gaussian 16 software, and the $E_{1/2}$ value of molecular state 4-methylcatechol is calculated by the method of Example 1 to be 5.67. With Materials Studio 6.0, the $f^{+}_{\min,\mathrm{C}}$ and $f^{+}_{\min}$ values of molecular state 4-methylcatechol are calculated to be -0.007 and -0.005, respectively. Next, C-034, H-047, H-054, C-008, N-077, $n_{\mathrm{C(=N)N2}}$, $n_{\mathrm{Crq}}$, H-046, $n_{\mathrm{TB}}$, $n_{\mathrm{aasN}}$, $n_{\mathrm{R05}}$, $n_{\mathrm{H}}$, $n_{\mathrm{Pyridines}}$, $n_{\mathrm{CONN}}$, $n_{\mathrm{dssC}}$, $n_{\mathrm{sssCH}}$ and $n_{\mathrm{Ct}}$ of molecular state 4-methylcatechol are calculated with Dragon 7.0 software to be 0,3,0,0,0,0,0,3,0,0,0,8,0,0,0,0,0, respectively. According to formulas (4), (5) and (6), the h value of molecular state 4-methylcatechol is 0.0142 (< 0.3) in the comprehensive quantitative structure-activity relationship model covering molecular state and ionic state compounds and 0.0172 (< 0.4) in the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds. Both values lie within the applicability domains of the respective models, so both models can be used for prediction.
The descriptor values are then substituted into formulas (1) and (2) to obtain the $\log k_{^1\mathrm{O}_2}$ values of molecular state 4-methylcatechol at pH = 6 predicted by the comprehensive model (the comprehensive quantitative structure-activity relationship model covering molecular state and ionic state compounds) and by the molecular state compound model (the quantitative structure-activity relationship model covering the reaction rate constants of molecular state compounds). Finally, according to formula (8), the $\log k_{^1\mathrm{O}_2}$ values calculated with the comprehensive model and the molecular state compound model are 7.16 and 6.94, respectively. Compared with the experimental $\log k_{^1\mathrm{O}_2}$ value of 6.38 for 4-methylcatechol at pH = 6, the differences are -0.78 and -0.56. The predicted values are close to the experimental value, and the molecular state compound model predicts better than the comprehensive model.
Example 3:
Given the compound 2'-HO-BDE-68, its $\log k_{^{1}\mathrm{O}_2}$ value in aqueous solution at pH 8 is to be predicted. First, the speciation of 2'-HO-BDE-68 in this solution is determined. 2'-HO-BDE-68 has one dissociable group with a $pK_{a1}$ of 6.6, so formula (8) gives 3.8% in the molecular state and 96.2% as the monovalent anion in an aqueous solution at pH 8. Next, starting from the structures of molecular-state 2'-HO-BDE-68 and monovalent-anion 2'-HO-BDE-68, the geometries are optimized with Gaussian 16, and the method of Example 1 gives $E_{1/2}$ values of 6.09 and 4.87, respectively. Materials Studio 6.0 gives $f^{+}_{\min,\mathrm{C}}$ values of -0.005 and -0.019 for the molecular state and the monovalent anion, respectively, and an $f^{+}_{\min}$ value of -0.005 for the molecular state.

The remaining descriptor values of molecular-state 2'-HO-BDE-68 are: C-034 = 0, H-047 = 5, H-054 = 0, C-008 = 0, N-077 = 0, $n_{\mathrm{C(=N)N2}}$ = 0, $n_{\mathrm{Crq}}$ = 0, H-046 = 0, $n_{\mathrm{TB}}$ = 0, $n_{\mathrm{aasN}}$ = 0, $n_{\mathrm{R05}}$ = 0, $n_{\mathrm{H}}$ = 6, $n_{\mathrm{Pyridines}}$ = 0, $n_{\mathrm{CONN}}$ = 0, $n_{\mathrm{dssC}}$ = 0, $n_{\mathrm{sssCH}}$ = 0, $n_{\mathrm{Ct}}$ = 0. Those of monovalent-anion 2'-HO-BDE-68 are: C-034 = 0, H-047 = 5, H-054 = 0, C-008 = 0, N-077 = 0, $n_{\mathrm{C(=N)N2}}$ = 0, $n_{\mathrm{Crq}}$ = 0, H-046 = 0, $n_{\mathrm{TB}}$ = 0, $n_{\mathrm{aasN}}$ = 0, $n_{\mathrm{R05}}$ = 0, C-017 = 0, SAdon = 0, $n_{\mathrm{ArOR}}$ = 1, N% = 0.

Then, according to formulas (4), (5) and (6), the h value of molecular-state 2'-HO-BDE-68 in the comprehensive quantitative structure-activity relation model covering molecular state and ionic state compounds is 0.0157 (< 0.3), and that of monovalent-anion 2'-HO-BDE-68 is 0.0140 (< 0.3). The h value of molecular-state 2'-HO-BDE-68 in the quantitative structure-activity relation model covering molecular state compound reaction rate constants is 0.1150 (< 0.4), and the h value of monovalent-anion 2'-HO-BDE-68 in the quantitative structure-activity relation model covering ionic state compound reaction rate constants is 0.0672 (< 0.4). Both species therefore fall within the application domains of all three models, so all three models can be used for prediction. The descriptor values are then substituted into formulas (1), (2) and (3) to obtain the per-species predictions for 2'-HO-BDE-68 at pH = 8 from the total model (the comprehensive quantitative structure-activity relation model covering molecular state and ionic state compounds) and from the classification models (the quantitative structure-activity relation model covering molecular state compound reaction rate constants and the quantitative structure-activity relation model covering ionic state compound reaction rate constants). Finally, formula (8) gives $\log k_{^{1}\mathrm{O}_2}$ values of 8.15 from the total model and 8.17 from the classification models.

Compared with the experimental $\log k_{^{1}\mathrm{O}_2}$ value of 8.25 for 2'-HO-BDE-68 at pH = 9, the differences are 0.10 and 0.08. The predicted values are very close to the experimental value, and the classification models predict slightly better than the comprehensive model.
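The arithmetic of this example can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation. The regression coefficients and the descriptor values are copied from this document; the function names, the dictionary keys, and the rule that unlisted descriptors count as zero are assumptions made for illustration. The speciation function assumes the usual dissociation equilibrium of a monoprotic acid, which reproduces the 3.8% and 96.2% fractions stated in the example.

```python
# Sketch only: coefficients and descriptor values are copied from the text.
# Function names, dictionary keys and the default-to-zero rule are assumptions.

def neutral_fraction(ph, pka):
    """Fraction of a monoprotic acid that stays undissociated at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

def predict(model, descriptors):
    """Evaluate intercept + sum(coefficient * descriptor value)."""
    intercept, coefficients = model
    # Descriptors missing from the dictionary are treated as 0 (illustrative choice).
    return intercept + sum(
        c * descriptors.get(name, 0.0) for name, c in coefficients.items()
    )

# Comprehensive model covering molecular state and ionic state compounds.
COMPREHENSIVE = (14.241, {
    "E1/2": -1.153, "C-008": -0.791, "H-047": -0.130, "H-054": 4.187,
    "nCrq": 1.101, "H-046": -0.072, "nC(=N)N2": 1.281, "C-034": 0.827,
    "fmin,C+": -8.828, "N-077": -0.543, "nTB": -1.216, "naasN": 1.809,
    "nR05": 0.381,
})

# Model covering molecular state compounds only.
MOLECULAR = (15.086, {
    "E1/2": -1.310, "nR05": 1.189, "ndssC": 0.382, "nsssCH": -0.828,
    "nCrq": 1.487, "nH": -0.100, "nCONN": -0.856, "nCt": 0.735,
    "nPyridines": 0.991, "fmin+": -10.812,
})

# Model covering ionic state compounds only.
IONIC = (13.327, {
    "E1/2": -1.016, "C-017": -0.789, "SAdon": 0.018, "nArOR": 0.421,
    "H-047": -0.121, "N%": -0.058,
})

# Non-zero descriptor values of 2'-HO-BDE-68 given in the example.
molecular_state = {"E1/2": 6.09, "fmin,C+": -0.005, "fmin+": -0.005,
                   "H-047": 5, "nH": 6}
anion = {"E1/2": 4.87, "fmin,C+": -0.019, "H-047": 5, "nArOR": 1}

fraction_molecular = neutral_fraction(8.0, 6.6)
print(f"molecular-state fraction at pH 8: {fraction_molecular:.3f}")
print(f"comprehensive, molecular state:   {predict(COMPREHENSIVE, molecular_state):.2f}")
print(f"comprehensive, anion:             {predict(COMPREHENSIVE, anion):.2f}")
print(f"molecular-state model:            {predict(MOLECULAR, molecular_state):.2f}")
print(f"ionic-state model:                {predict(IONIC, anion):.2f}")
```

With the values listed in the example, the sketch gives a molecular-state fraction of about 3.8% and per-species log k values near 6.61 and 8.14 for the comprehensive model, and near 6.56 and 8.20 for the molecular-state and ionic-state models. Combining the per-species values into a single value at pH 8 relies on formula (8), which is not restated in this section, so that step is left out of the sketch.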
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the parts that the embodiments have in common can be cross-referenced. Because the disclosed system corresponds to the disclosed method, its description is brief; refer to the description of the method for the relevant details.
Specific examples are used herein to explain the principles and embodiments of the present invention, and they are provided only to aid understanding of the method and its core concept. A person skilled in the art may, following the idea of the present invention, modify the specific embodiments and the scope of application. Accordingly, this description should not be construed as limiting the present invention.
Claims (4)
1. A method for predicting a reaction rate constant of an organic matter and singlet oxygen is characterized by comprising the following steps:
collecting reaction rate constant related data of the organic matter under different pH values of an aqueous solution, wherein the reaction rate constant related data comprise molecular state compound reaction rate constant related data and ionic state compound reaction rate constant related data;
dividing the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data;
performing statistical regression modeling according to the relevant data of the training set reaction rate constants to obtain a singlet oxygen reaction rate constant quantitative structure-activity relation model, wherein the singlet oxygen reaction rate constant quantitative structure-activity relation model comprises a comprehensive quantitative structure-activity relation model covering molecular compounds and ionic compounds, a quantitative structure-activity relation model covering molecular compound reaction rate constants and a quantitative structure-activity relation model covering ionic compound reaction rate constants;
verifying the quantitative structure-activity relation model according to the data related to the reaction rate constant of the test set;
obtaining a molecular structure of an organic matter to be predicted;
inputting the molecular structure of the organic matter to be predicted into the corresponding quantitative structure-activity relation model to obtain the reaction rate constant of the organic matter to be predicted with singlet oxygen;
wherein collecting the reaction rate constant related data of the organic matter under different pH values of an aqueous solution, the reaction rate constant related data comprising molecular state compound reaction rate constant related data and ionic state compound reaction rate constant related data, specifically comprises:
collecting 180 pieces of reaction rate constant related data of the organic matter in aqueous solution at pH = 3-12, of which 109 pieces are molecular state compound reaction rate constant related data and 71 pieces are ionic state compound reaction rate constant related data;
wherein dividing the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data specifically comprises:
splitting the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data at a ratio of 4:1, wherein the comprehensive quantitative structure-activity relation model covering molecular state and ionic state compounds has 145 training set compounds and 35 test set compounds; the quantitative structure-activity relation model covering molecular state compound reaction rate constants has 89 training set compounds and 20 test set compounds; and the quantitative structure-activity relation model covering ionic state compound reaction rate constants has 58 training set compounds and 13 test set compounds;
wherein performing statistical regression modeling according to the training set reaction rate constant related data to obtain the singlet oxygen reaction rate constant quantitative structure-activity relation model specifically comprises:
obtaining the comprehensive quantitative structure-activity relation model covering molecular state and ionic state compounds from the training set of reaction rate constants covering the molecular state and the ionic state:
$$\log k_{^{1}\mathrm{O}_2,\,\text{comprehensive}} = 14.241 - 1.153\,E_{1/2} - 0.791\,\text{C-008} - 0.130\,\text{H-047} + 4.187\,\text{H-054} + 1.101\,n_{\mathrm{Crq}} - 0.072\,\text{H-046} + 1.281\,n_{\mathrm{C(=N)N2}} + 0.827\,\text{C-034} - 8.828\,f^{+}_{\min,\mathrm{C}} - 0.543\,\text{N-077} - 1.216\,n_{\mathrm{TB}} + 1.809\,n_{\mathrm{aasN}} + 0.381\,n_{\mathrm{R05}};$$
obtaining the quantitative structure-activity relation model covering molecular state compound reaction rate constants from the training set of molecular state compound reaction rate constants:
$$\log k_{^{1}\mathrm{O}_2,\,\text{molecule}} = 15.086 - 1.310\,E_{1/2} + 1.189\,n_{\mathrm{R05}} + 0.382\,n_{\mathrm{dssC}} - 0.828\,n_{\mathrm{sssCH}} + 1.487\,n_{\mathrm{Crq}} - 0.100\,n_{\mathrm{H}} - 0.856\,n_{\mathrm{CONN}} + 0.735\,n_{\mathrm{Ct}} + 0.991\,n_{\mathrm{Pyridines}} - 10.812\,f^{+}_{\min};$$
obtaining the quantitative structure-activity relation model covering ionic state compound reaction rate constants from the training set of ionic state compound reaction rate constants:
$$\log k_{^{1}\mathrm{O}_2,\,\text{ion}} = 13.327 - 1.016\,E_{1/2} - 0.789\,\text{C-017} + 0.018\,\mathrm{SAdon} + 0.421\,n_{\mathrm{ArOR}} - 0.121\,\text{H-047} - 0.058\,\mathrm{N\%};$$
wherein $\log k_{^{1}\mathrm{O}_2}$ denotes the logarithm of the rate constant of the reaction of an organic substance with singlet oxygen; $\log k_{^{1}\mathrm{O}_2,\,\text{comprehensive}}$ denotes the value calculated with the comprehensive quantitative structure-activity relation model covering molecular state and ionic state compounds, which can be used to predict the rate constant of the reaction of the molecular state and/or the ionic state with singlet oxygen; $\log k_{^{1}\mathrm{O}_2,\,\text{molecule}}$ denotes the reaction rate constant of a molecular state compound calculated with the quantitative structure-activity relation model covering molecular state compound reaction rate constants; $\log k_{^{1}\mathrm{O}_2,\,\text{ion}}$ denotes the reaction rate constant of an ionic state compound calculated with the quantitative structure-activity relation model covering ionic state compound reaction rate constants; $E_{1/2}$ denotes the half-wave potential; C-008 denotes the CHR2X group; H-047 denotes a hydrogen atom attached to an sp2 or sp3 hybridized carbon atom; H-054 denotes a hydrogen atom attached to an sp3 hybridized carbon atom whose neighboring carbon atom bears three electronegative atoms; $n_{\mathrm{Crq}}$ denotes the number of sp3 hybridized carbon atoms of the four-membered ring; H-046 denotes a hydrogen atom attached to an sp3 hybridized carbon atom whose neighboring carbon atom bears no electronegative atom; $n_{\mathrm{C(=N)N2}}$ denotes the number of guanidine derivatives; C-034 denotes an R--CR fragment; $f^{+}_{\min,\mathrm{C}}$ denotes the minimum Fukui index on a carbon atom in the molecule; N-077 denotes the Al-NO2 molecular fragment; $n_{\mathrm{TB}}$ denotes the number of triple bonds; $n_{\mathrm{aasN}}$ denotes the number of aasN type atoms; $n_{\mathrm{R05}}$ denotes the number of five-membered rings; $n_{\mathrm{dssC}}$ denotes the number of dssC type atoms; $n_{\mathrm{sssCH}}$ denotes the number of sssCH type atoms; $n_{\mathrm{H}}$ denotes the number of hydrogen atoms; $n_{\mathrm{CONN}}$ denotes the number of urea derivatives; $n_{\mathrm{Ct}}$ denotes the number of tertiary carbons; $n_{\mathrm{Pyridines}}$ denotes the number of pyridine functions; $f^{+}_{\min}$ denotes the minimum Fukui index over the atoms in the molecule; C-017 denotes the =CR2 group; SAdon denotes the surface area of the donor atoms in the P_VSA-like descriptors; $n_{\mathrm{ArOR}}$ denotes the number of aromatic ethers; and N% denotes the percentage of nitrogen atoms in the molecule.
2. The method of predicting the rate constant of reaction of an organic substance with singlet oxygen according to claim 1, wherein the organic substance comprises alcohols, ketones, ethers, aldehydes, acids, esters, halogenated compounds, aromatic compounds, heterocyclic compounds, nitrogen-containing compounds, sulfur-containing compounds, and emerging pollutants.
3. A system for predicting a rate constant for a reaction of an organic substance with singlet oxygen, comprising:
the data acquisition module is used for acquiring reaction rate constant related data of the organic matters under different pH values of the aqueous solution, wherein the reaction rate constant related data comprises molecular state compound reaction rate constant related data and ionic state compound reaction rate constant related data;
the data dividing module is used for dividing the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data;
the model construction module is used for carrying out statistical regression modeling according to the relevant data of the reaction rate constants of the training set to obtain a singlet oxygen reaction rate constant quantitative structure-activity relation model, and the singlet oxygen reaction rate constant quantitative structure-activity relation model comprises a comprehensive quantitative structure-activity relation model covering molecular compounds and ionic compounds, a quantitative structure-activity relation model covering the reaction rate constants of the molecular compounds and a quantitative structure-activity relation model covering the reaction rate constants of the ionic compounds;
the verification module is used for verifying the quantitative structure-activity relation model according to the data related to the reaction rate constant of the test set;
the acquisition module is used for acquiring the molecular structure of the organic matter to be predicted;
the prediction module is used for inputting the molecular structure of the organic matter to be predicted into the corresponding quantitative structure-activity relation model to obtain the reaction rate constant of the organic matter to be predicted with singlet oxygen;
the data acquisition module specifically comprises:
the data acquisition unit is used for acquiring 180 pieces of reaction rate constant related data of the organic matter in aqueous solution at pH = 3-12, of which 109 pieces are molecular state compound reaction rate constant related data and 71 pieces are ionic state compound reaction rate constant related data;
the data dividing module specifically comprises:
the data dividing unit is used for dividing the reaction rate constant related data into training set reaction rate constant related data and test set reaction rate constant related data at a ratio of 4:1; the quantitative structure-activity relation model covering molecular state compound reaction rate constants has 89 training set compounds and 20 test set compounds; and the quantitative structure-activity relation model covering ionic state compound reaction rate constants has 58 training set compounds and 13 test set compounds;
the model building module specifically comprises:
the first model building unit is used for obtaining the comprehensive quantitative structure-activity relation model covering molecular state and ionic state compounds from the training set of reaction rate constants covering the molecular state and the ionic state:
$$\log k_{^{1}\mathrm{O}_2,\,\text{comprehensive}} = 14.241 - 1.153\,E_{1/2} - 0.791\,\text{C-008} - 0.130\,\text{H-047} + 4.187\,\text{H-054} + 1.101\,n_{\mathrm{Crq}} - 0.072\,\text{H-046} + 1.281\,n_{\mathrm{C(=N)N2}} + 0.827\,\text{C-034} - 8.828\,f^{+}_{\min,\mathrm{C}} - 0.543\,\text{N-077} - 1.216\,n_{\mathrm{TB}} + 1.809\,n_{\mathrm{aasN}} + 0.381\,n_{\mathrm{R05}};$$
the second model building unit is used for obtaining the quantitative structure-activity relation model covering molecular state compound reaction rate constants from the training set of molecular state compound reaction rate constants:
$$\log k_{^{1}\mathrm{O}_2,\,\text{molecule}} = 15.086 - 1.310\,E_{1/2} + 1.189\,n_{\mathrm{R05}} + 0.382\,n_{\mathrm{dssC}} - 0.828\,n_{\mathrm{sssCH}} + 1.487\,n_{\mathrm{Crq}} - 0.100\,n_{\mathrm{H}} - 0.856\,n_{\mathrm{CONN}} + 0.735\,n_{\mathrm{Ct}} + 0.991\,n_{\mathrm{Pyridines}} - 10.812\,f^{+}_{\min};$$
the third model building unit is used for obtaining the quantitative structure-activity relation model covering ionic state compound reaction rate constants from the training set of ionic state compound reaction rate constants:
$$\log k_{^{1}\mathrm{O}_2,\,\text{ion}} = 13.327 - 1.016\,E_{1/2} - 0.789\,\text{C-017} + 0.018\,\mathrm{SAdon} + 0.421\,n_{\mathrm{ArOR}} - 0.121\,\text{H-047} - 0.058\,\mathrm{N\%};$$
wherein $\log k_{^{1}\mathrm{O}_2}$ denotes the logarithm of the rate constant of the reaction of an organic substance with singlet oxygen; $\log k_{^{1}\mathrm{O}_2,\,\text{comprehensive}}$ denotes the value calculated with the comprehensive quantitative structure-activity relation model covering molecular state and ionic state compounds, which can be used to predict the rate constant of the reaction of the molecular state and/or the ionic state with singlet oxygen; $\log k_{^{1}\mathrm{O}_2,\,\text{molecule}}$ denotes the reaction rate constant of a molecular state compound calculated with the quantitative structure-activity relation model covering molecular state compound reaction rate constants; $\log k_{^{1}\mathrm{O}_2,\,\text{ion}}$ denotes the reaction rate constant of an ionic state compound calculated with the quantitative structure-activity relation model covering ionic state compound reaction rate constants; $E_{1/2}$ denotes the half-wave potential; C-008 denotes the CHR2X group; H-047 denotes a hydrogen atom attached to an sp2 or sp3 hybridized carbon atom; H-054 denotes a hydrogen atom attached to an sp3 hybridized carbon atom whose neighboring carbon atom bears three electronegative atoms; $n_{\mathrm{Crq}}$ denotes the number of sp3 hybridized carbon atoms of the four-membered ring; H-046 denotes a hydrogen atom attached to an sp3 hybridized carbon atom whose neighboring carbon atom bears no electronegative atom; $n_{\mathrm{C(=N)N2}}$ denotes the number of guanidine derivatives; C-034 denotes an R--CR fragment; $f^{+}_{\min,\mathrm{C}}$ denotes the minimum Fukui index on a carbon atom in the molecule; N-077 denotes the Al-NO2 molecular fragment; $n_{\mathrm{TB}}$ denotes the number of triple bonds; $n_{\mathrm{aasN}}$ denotes the number of aasN type atoms; $n_{\mathrm{R05}}$ denotes the number of five-membered rings; $n_{\mathrm{dssC}}$ denotes the number of dssC type atoms; $n_{\mathrm{sssCH}}$ denotes the number of sssCH type atoms; $n_{\mathrm{H}}$ denotes the number of hydrogen atoms; $n_{\mathrm{CONN}}$ denotes the number of urea derivatives; $n_{\mathrm{Ct}}$ denotes the number of tertiary carbons; $n_{\mathrm{Pyridines}}$ denotes the number of pyridine functions; $f^{+}_{\min}$ denotes the minimum Fukui index over the atoms in the molecule; C-017 denotes the =CR2 group; SAdon denotes the surface area of the donor atoms in the P_VSA-like descriptors; $n_{\mathrm{ArOR}}$ denotes the number of aromatic ethers; and N% denotes the percentage of nitrogen atoms in the molecule.
4. The system for predicting the rate constant of reaction of organic matter with singlet oxygen according to claim 3, wherein the organic matter comprises alcohols, ketones, ethers, aldehydes, acids, esters, halogenated compounds, aromatic compounds, heterocyclic compounds, nitrogen-containing compounds, sulfur-containing compounds, and emerging pollutants.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010380633.5A CN111564187B (en) | 2020-05-08 | 2020-05-08 | Method and system for predicting reaction rate constant of organic matter and singlet oxygen |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010380633.5A CN111564187B (en) | 2020-05-08 | 2020-05-08 | Method and system for predicting reaction rate constant of organic matter and singlet oxygen |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111564187A (en) | 2020-08-21 |
CN111564187B (en) | 2023-03-14 |
Family
ID=72071809
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010380633.5A Active CN111564187B (en) | 2020-05-08 | 2020-05-08 | Method and system for predicting reaction rate constant of organic matter and singlet oxygen |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111564187B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114755285B (en) * | 2022-05-06 | 2023-05-16 | 中国科学院生态环境研究中心 | Method for predicting oxidizing capacity of sulfate radical to organic matters |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101115995A (en) * | 2004-07-30 | 2008-01-30 | 普罗美加公司 | Covalent tethering of functional groups to proteins and substrates therefor |
CN102507630A (en) * | 2011-11-30 | 2012-06-20 | 大连理工大学 | Method for forecasting oxidation reaction rate constant of chemical substance and ozone based on molecular structure and environmental temperature |
CN104877127A (en) * | 2015-06-23 | 2015-09-02 | 厦门赛诺邦格生物科技有限公司 | Eight-armed polyethylene glycol derivative, preparation method and related biological substance modified by derivative |
CN108021784A (en) * | 2017-12-01 | 2018-05-11 | 大连理工大学 | The photic Forecasting Methodology for producing active oxygen species of nano-metal-oxide |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040082074A1 (en) * | 2002-10-11 | 2004-04-29 | Mcgrath Terrence S. | Axial atomic model for determination of elemental particle field structure and energy levels |
US9593138B2 (en) * | 2012-10-05 | 2017-03-14 | Wayne State University | Nitrile-containing enzyme inhibitors and ruthenium complexes thereof |
Non-Patent Citations (2)
Title |
---|
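Quantitative structure-activity relationships for oxidation reactions of organic chemicals in water; Silvio Canonica; Environmental Toxicology and Chemistry; 2009-11-06; Vol. 22; full text * |
Study on structure-activity relationships for the reaction rate constants of organic pollutants with singlet oxygen, a typical reactive oxygen species; Li Tiantian (李田田); Abstracts of the Chinese Chemical Society High-End Forum on Environmental Computational Chemistry and Predictive Toxicology; 2018-10-31; abstract * |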
Also Published As
Publication number | Publication date |
---|---|
CN111564187A (en) | 2020-08-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |