CN113808683A - Method and system for virtual screening of drugs based on receptors and ligands - Google Patents


Info

Publication number
CN113808683A
Authority
CN
China
Prior art keywords
molecular
ligand
ligands
activity data
virtual
Prior art date
Legal status
Granted
Application number
CN202111029529.2A
Other languages
Chinese (zh)
Other versions
CN113808683B (en)
Inventor
高敏
熊江辉
陈颖
辛冰牧
许楫
Current Assignee
Spacenter Space Science And Technology Institute
Original Assignee
Spacenter Space Science And Technology Institute
Priority date
Filing date
Publication date
Application filed by Spacenter Space Science And Technology Institute
Priority to CN202111029529.2A
Publication of CN113808683A
Application granted
Publication of CN113808683B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C10/00 Computational theoretical chemistry, i.e. ICT specially adapted for theoretical aspects of quantum chemistry, molecular mechanics, molecular dynamics or the like

Landscapes

  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to a method and a system for virtual screening of drugs based on receptors and ligands. The method comprises the following steps: (1) acquiring activity data of a target receptor, the activity data of the receptor being used for molecular docking; (2) obtaining ligands of the target receptor and their activity data to construct a ligand structure library; (3) processing the activity data of the ligands in the ligand structure library to obtain molecular fingerprints of the ligands; (4) performing molecular docking of the target receptor with the ligands in the ligand structure library, followed by molecular dynamics simulation and energy decomposition, to obtain energy decomposition values; (5) selecting the molecular fingerprints and the energy decomposition values in proportion, performing feature fusion, and building a model with a machine learning algorithm; and (6) performing virtual drug screening with the model. The drug screening method and system are low in cost and high in efficiency, and have broad application prospects in drug activity prediction, structure optimization, and drug design.

Description

Method and system for virtual screening of drugs based on receptors and ligands
Technical Field
The invention belongs to the technical field of computer-aided drug design, and particularly relates to a method and a system for virtual drug screening based on receptors and ligands.
Background
Drug development is a complex systems-engineering undertaking characterized by a long development cycle, large capital consumption, high investment, and low output. It has been reported that bringing a new drug to market, from conceptualization through lead structure determination, lead optimization, and preclinical and clinical testing, takes 10-15 years and an investment of about 800 million dollars, and this investment keeps rising as drug development becomes more difficult: in 2014 the Tufts Center for the Study of Drug Development (CSDD) reported a figure as high as 2.558 billion dollars. Despite this growing investment, the number of new drugs produced worldwide each year shows a downward trend: the FDA approved 53 new drugs in 1996 but as few as 15 in 2007. As computer-aided drug design has come to play an increasing role in drug discovery, some large pharmaceutical companies and research institutes have begun related theoretical and applied research, and there are now many examples of computer-aided methods contributing to the successful development of new drugs. However, computer-aided drug design methods still have many shortcomings; for example, the crystal structure of the target protein is often difficult to obtain, and the prediction accuracy of models is low. Therefore, there remains a need in the art for new virtual drug screening methods that improve screening efficiency and reduce screening costs.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing drug screening technology by providing a method and a system for virtual drug screening based on receptors and ligands.
In one aspect, the present invention provides a method for virtual drug screening based on receptors and ligands, comprising the steps of: (1) acquiring activity data of a target receptor, the activity data of the receptor being used for molecular docking; (2) obtaining ligands of the target receptor and their activity data, and constructing a ligand structure library; (3) processing the activity data of the ligands in the ligand structure library to obtain molecular fingerprints of the ligands; (4) performing molecular docking of the target receptor with the ligands in the structure library using molecular docking software, followed by molecular dynamics simulation and energy decomposition, to obtain energy decomposition values; (5) selecting the molecular fingerprints and the energy decomposition values in proportion, performing feature fusion, and building a model with a machine learning algorithm; and (6) performing virtual drug screening with the model. Building on conventional molecular docking, the method decomposes the Gibbs free energy of binding so that it is converted into learnable feature values. It also combines receptor-based drug screening with ligand-based 2D and 3D molecular fingerprints, so that the two techniques complement each other, the characteristics of different sets of active ligand molecules are represented efficiently, and a highly accurate drug screening model is trained.
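Step (5) amounts to concatenating two feature families per ligand. The sketch below is a minimal illustration, assuming each ligand already has a 0/1 fingerprint list and a list of energy decomposition values in a fixed residue order; the helper `fuse_features` and this data layout are hypothetical, since the patent does not specify how the fused vector is stored.

```python
from typing import List, Sequence


def fuse_features(fingerprint: Sequence[int],
                  energy_terms: Sequence[float]) -> List[float]:
    """Concatenate ligand-based fingerprint bits with receptor-based
    energy decomposition values into one feature vector.

    Hypothetical helper: the patent states that the two feature types are
    fused but does not describe the layout used here.
    """
    if any(bit not in (0, 1) for bit in fingerprint):
        raise ValueError("fingerprint must contain only 0/1 bits")
    # Fingerprint bits first, then energy terms in a fixed residue order.
    return [float(bit) for bit in fingerprint] + [float(e) for e in energy_terms]


# Example: a 4-bit fingerprint fused with three energy terms (kcal/mol).
vector = fuse_features([1, 0, 0, 1], [-2.5, -0.8, -4.1])
print(vector)  # [1.0, 0.0, 0.0, 1.0, -2.5, -0.8, -4.1]
```

Because fingerprint bits are 0/1 while energy terms are in kcal/mol, features are usually scaled before an RBF-kernel SVM; the text does not say whether scaling is applied.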
In some embodiments, the activity data of the target receptor comprise a crystal structure of the target receptor with a resolution better than 2 angstroms (i.e., a resolution value below 2 Å). At this resolution, the probability of incorrectly placed side chains and incorrect loop structures is kept low.
In some embodiments, the ligands and their activity data are obtained from the ChEMBL database. In some embodiments, the activity data of the target receptor are obtained from the Protein Data Bank (PDB) database.
In some embodiments, the activity data of the ligands comprise IC50, EC50, and inhibition rate (Inhibition). In some embodiments, the screening criteria for the ligand activity data are: for IC50, compounds with a standard value < 10 μM are selected; for EC50, compounds with a standard value < 10 μM are selected; and for Inhibition, compounds with a standard value > 30% are selected. In some embodiments, to ensure that the small-molecule ligand structures are reasonable, they are preprocessed: for molecules containing multiple fragments, only the largest fragment is retained; strong acids are deprotonated and strong bases protonated to correct structural errors; and the 2D molecular structures are converted into 3D structures. All molecules are then energy-minimized (bonded, van der Waals, electrostatic, and restraint terms) under the MMFF99X force field, with partial charges assigned, hydrogens added, a nonbonded interaction cutoff of 8-10 Å, and a gradient threshold of 0.001; that is, optimization ends when the root-mean-square (RMS) gradient falls below 0.001.
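The activity thresholds above can be applied as a simple record filter. This is a sketch under stated assumptions: the `ActivityRecord` fields loosely mirror ChEMBL's `standard_type`, `standard_value` and `standard_units` columns, but the class, the helper names, and the unit handling (only nM and μM) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ActivityRecord:
    """One ligand activity measurement (hypothetical layout, loosely
    modeled on ChEMBL's standard_type/standard_value/standard_units)."""
    molecule_id: str
    standard_type: str    # "IC50", "EC50" or "Inhibition"
    standard_value: float
    standard_units: str   # "nM", "uM" or "%"


def to_micromolar(value: float, units: str) -> float:
    """Convert a concentration to micromolar (only nM and uM handled)."""
    if units == "uM":
        return value
    if units == "nM":
        return value / 1000.0
    raise ValueError(f"unsupported concentration unit: {units}")


def passes_activity_filter(rec: ActivityRecord) -> bool:
    """Thresholds from the text: IC50 < 10 uM, EC50 < 10 uM, Inhibition > 30%."""
    if rec.standard_type in ("IC50", "EC50"):
        return to_micromolar(rec.standard_value, rec.standard_units) < 10.0
    if rec.standard_type == "Inhibition":
        return rec.standard_units == "%" and rec.standard_value > 30.0
    return False  # other activity types are not used


records = [
    ActivityRecord("L1", "IC50", 850.0, "nM"),      # 0.85 uM -> kept
    ActivityRecord("L2", "EC50", 25.0, "uM"),       # 25 uM -> dropped
    ActivityRecord("L3", "Inhibition", 45.0, "%"),  # 45% -> kept
]
kept = [r.molecule_id for r in records if passes_activity_filter(r)]
print(kept)  # ['L1', 'L3']
```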
In some embodiments, the molecular fingerprint is a 2D molecular fingerprint and/or a 3D molecular fingerprint. Preferably, the 2D molecular fingerprint is MACCS, RDKit, and/or ECFP, and the 3D molecular fingerprint is ESshape3D.
In some embodiments, in step (4), the molecular docking software is DOCK software, and the highest-scoring conformation is selected for molecular dynamics simulation. In some embodiments, molecular docking is performed with DOCK software as follows: 3D protonation is carried out under the MMFF99X force field to find the lowest-potential-energy configuration of the whole system over the alternative states of terminal amides, hydroxyls, thiols, histidines, and other titratable groups; the protein binding cavity is defined using the original ligand in the protein crystal structure as a reference; docking treats the protein as rigid and the ligand as flexible; ligand conformations are generated by bond rotation and placed into the docking site with the Triangle Matcher method; 10 conformations are retained; and the highest-scoring conformation (i.e., the optimal conformation) is selected for molecular dynamics simulation and calculation.
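Retaining 10 poses and keeping the best one can be sketched as below. Whether a "higher" score means a larger or a more negative number depends on the docking program; the sketch assumes lower (more negative) is better and exposes a flag to flip this. `select_poses` and the (name, score) tuple layout are hypothetical.

```python
from typing import List, Tuple

Pose = Tuple[str, float]  # (pose identifier, docking score)


def select_poses(poses: List[Pose], keep: int = 10,
                 lower_is_better: bool = True) -> Tuple[List[Pose], Pose]:
    """Keep the `keep` best-scoring poses and return them with the best one.

    Assumption: a more negative score means tighter predicted binding;
    pass lower_is_better=False for programs where larger is better.
    """
    if not poses:
        raise ValueError("no poses to select from")
    ranked = sorted(poses, key=lambda p: p[1], reverse=not lower_is_better)
    retained = ranked[:keep]
    return retained, retained[0]


poses = [("a", -6.2), ("b", -8.9), ("c", -7.4)]
retained, best = select_poses(poses, keep=2)
print(best)  # ('b', -8.9)
```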
In some embodiments, in step (4), a two-step energy minimization is performed in the molecular dynamics simulation, and the binding free energy of the system is calculated with the MM/GBSA method. In some embodiments, in step (4), during the molecular dynamics simulation, a truncated octahedral water box is added around the complex using a 10 Å cutoff distance, and a two-step energy minimization is performed. In the first step, the protein amino acid residues are restrained with 500 kcal/mol and the solvent is optimized over 5000 steps of energy minimization, using the steepest descent method for the first 2500 steps and switching to the conjugate gradient method for the last 2500 steps. In the second step, the whole system is optimized: the cutoff distance for nonbonded interactions is 10 Å; constant-volume periodic boundaries are used; to achieve better energy convergence, system shaking is not performed; the minimizer switches from steepest descent to conjugate gradient after 2500 cycles; and the full interaction forces are calculated. The binding free energy of the system is calculated with the MM/GBSA method as follows:
$$\Delta G_{b} = \Delta E_{MM} + \Delta G_{sol} - T\Delta S$$
where $\Delta G_{b}$ is the binding free energy in solvent; $\Delta E_{MM}$ is the molecular mechanics energy, consisting of the electrostatic and van der Waals interaction energies between the ligand and the protein ($\Delta E_{ele}^{int}$ and $\Delta E_{vdW}^{int}$); $\Delta G_{sol}$ is the solvation free energy, which can be divided into electrostatic and hydrophobic contributions ($\Delta G_{ele,sol}$ and $\Delta G_{nonpol,sol}$); and $T\Delta S$ is the entropic contribution at temperature $T$. This scheme uses two-step energy minimization to optimize the system so that the whole complex is in a favorable energy state, and decomposes the Gibbs free energy into learnable feature values.
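The two-step minimization described above maps naturally onto a molecular dynamics engine's minimization input. The patent does not name the MD package; this sketch assumes AMBER's sander, whose `&cntrl` variables `imin`, `maxcyc`, `ncyc`, `ntb`, `cut`, `ntr`, `restraint_wt` and `restraintmask` control minimization, periodic boundaries, the nonbonded cutoff, and positional restraints. The restraint mask is a hypothetical placeholder, and reusing 5000 total cycles for the second step is an assumption (the text gives only the 2500-cycle switch point). The truncated octahedral water box would be built beforehand, for example with tleap's `solvateOct` command.

```python
from typing import Dict


def minimization_inputs(restraint_mask: str,
                        restraint_wt: float = 500.0,
                        maxcyc: int = 5000,
                        ncyc: int = 2500,
                        cut: float = 10.0) -> Dict[str, str]:
    """Build sander-style &cntrl inputs for the two-step minimization.

    Assumes AMBER's sander (not named in the patent):
      imin=1  -> energy minimization
      maxcyc  -> total minimization cycles
      ncyc    -> steepest-descent cycles before switching to conjugate gradient
      ntb=1   -> periodic boundaries at constant volume
      cut     -> nonbonded cutoff in angstroms
      ntr=1   -> positional restraints (kcal/mol/A^2) on atoms in restraintmask
    """
    shared = (f"  imin=1, maxcyc={maxcyc}, ncyc={ncyc},\n"
              f"  ntb=1, cut={cut},\n")
    step1 = ("Step 1: minimize solvent, protein restrained\n"
             " &cntrl\n" + shared +
             f"  ntr=1, restraint_wt={restraint_wt}, restraintmask='{restraint_mask}',\n"
             " /\n")
    step2 = ("Step 2: minimize the whole system, no restraints\n"
             " &cntrl\n" + shared +
             "  ntr=0,\n"
             " /\n")
    return {"min1.in": step1, "min2.in": step2}


# ':1-250' is a hypothetical protein residue range; adjust to the real topology.
for name, text in minimization_inputs(":1-250").items():
    print(name)
    print(text)
```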
In some embodiments, the ligand is an agonist or an antagonist of the target receptor. In some embodiments, the selection proportion in step (5) is 5% to 100%, preferably 40% to 100%.
In some embodiments, a random forest method is used to select the features, the model is an SVM model, the agonist and antagonist samples are randomly divided into a training set and a test set at a ratio of 4:1 or 7:3, and an RBF kernel is used:
$$K_{RBF}(x_1, x_2) = \exp\left(-\gamma \lVert x_1 - x_2 \rVert^2\right)$$
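where $x_1$ and $x_2$ are the feature vectors of two samples and $\gamma$ is the kernel parameter. The penalty parameter cost and the kernel parameter gamma are optimized over the ranges cost = 2^seq(-2, 10, by = 2) and gamma = 2^seq(-10, 2, by = 2), and epsilon is set to 0.1.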
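The RBF kernel and the parameter grids can be written out directly. The `2^seq(..., by = 2)` notation reads as R syntax, so the helper `r_seq` below reproduces R's `seq` for positive integer steps; that reading, the helper names, and the unstratified `random_split` are assumptions for illustration, not the patent's code.

```python
import math
import random
from typing import List, Sequence, Tuple


def rbf_kernel(x1: Sequence[float], x2: Sequence[float], gamma: float) -> float:
    """K_RBF(x1, x2) = exp(-gamma * ||x1 - x2||^2)."""
    if len(x1) != len(x2):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


def r_seq(start: int, stop: int, by: int) -> List[int]:
    """Integer equivalent of R's seq(start, stop, by = by) for by > 0."""
    if by <= 0:
        raise ValueError("by must be positive")
    return list(range(start, stop + 1, by))


# Grids written in the text as 2^seq(-2, 10, by = 2) and 2^seq(-10, 2, by = 2).
COST_GRID = [2.0 ** k for k in r_seq(-2, 10, 2)]
GAMMA_GRID = [2.0 ** k for k in r_seq(-10, 2, 2)]


def random_split(samples: Sequence, train_fraction: float = 0.8,
                 seed: int = 0) -> Tuple[list, list]:
    """Shuffle and split samples; 0.8 gives the 4:1 split, 0.7 the 7:3 split."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


print(COST_GRID)  # [0.25, 1.0, 4.0, 16.0, 64.0, 256.0, 1024.0]
print(len(COST_GRID) * len(GAMMA_GRID))  # 49 (cost, gamma) pairs
train, test = random_split(range(100), train_fraction=0.8)
print(len(train), len(test))  # 80 20
```

With scikit-learn, for example, the same grids could be passed as `{"C": COST_GRID, "gamma": GAMMA_GRID}` to `GridSearchCV(SVC(kernel="rbf"), ...)`. In R's e1071 package, `epsilon` (default 0.1) is the parameter of the insensitive-loss function used in epsilon-regression. A stratified split that preserves the agonist:antagonist ratio is a common alternative to a purely random split; the text only says the samples are divided randomly.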
In another aspect, the present invention provides a system for virtual drug screening based on receptors and ligands, comprising: A. a target receptor data acquisition module for acquiring activity data of a target receptor; B. a ligand data acquisition module for acquiring ligands of the target receptor and their activity data, wherein the activity data of the ligands comprise molecular fingerprints of the ligands; C. a molecular fingerprint acquisition module for processing the activity data of the ligands in the ligand structure library to obtain molecular fingerprints of the ligands; D. a molecular docking and dynamics simulation module for performing molecular docking of the target receptor with the ligands, followed by molecular dynamics simulation and energy decomposition, to obtain energy decomposition values; E. a training module for fusing the molecular fingerprints and the energy decomposition values in proportion and building a model with a machine learning algorithm; and F. a virtual drug screening module for virtually screening drugs with the model. Like the method, the system converts the decomposed Gibbs free energy into learnable feature values and combines receptor-based screening with ligand-based 2D and 3D molecular fingerprints so that the two techniques complement each other.
In some embodiments of the system, a random forest method is used to select the features, the model is an SVM model, the agonist and antagonist samples in the ligand structure library are randomly divided into a training set and a test set at a ratio of 4:1 or 7:3, and an RBF kernel is used:
$$K_{RBF}(x_1, x_2) = \exp\left(-\gamma \lVert x_1 - x_2 \rVert^2\right)$$
where $x_1$ and $x_2$ are the feature vectors of two samples and $\gamma$ is the kernel parameter. The penalty parameter cost and gamma are optimized over the ranges cost = 2^seq(-2, 10, by = 2) and gamma = 2^seq(-10, 2, by = 2), and epsilon is set to 0.1.
Compared with conventional methods, the invention has the following advantages: (1) building on conventional molecular docking, it uses two-step energy minimization to optimize the system so that the whole complex is in a favorable energy state, and decomposes the Gibbs free energy into learnable feature values; and/or (2) it combines receptor-based drug screening with ligand-based 2D and 3D molecular fingerprints so that the two techniques complement each other, efficiently characterizing different sets of active ligand molecules and training a more accurate drug screening model.
Drawings
FIG. 1 is a flow chart of the virtual drug screening method according to the present invention.
FIG. 2 is a statistical chart of the modeling results of the selection discrimination model built with the IED method.
FIG. 3 is a statistical chart of the modeling results of the selection discrimination models built with the four molecular fingerprint methods.
Detailed Description
To better illustrate the objects, aspects and advantages of the present invention, the present invention will be further described with reference to specific examples.
For a better understanding of the present invention, the following explanations and illustrations are provided.
The term "ChEMBL" refers to a database of active ligands.
The term "Protein Data Bank (PDB)" refers to a database of Protein crystal structures.
The term "Random Forest (RF)" refers to the random forest method, an ensemble learning method based on decision trees.
The term "molecular fingerprint" refers to an abstract representation of a molecule that encodes it as a bit string (i.e., a bit vector) that can then be easily compared between molecules. A typical procedure extracts the structural features of the molecule and then hashes them to generate the bit vector.
The term "active drug" refers to a drug capable of treating a disease, i.e., a drug that is active against the disease.
The term "Support Vector Machine (SVM)" refers to the support vector machine method.
The term "energy decomposition value" refers to the energy value attributed to each amino acid residue of the target receptor and to the ligand molecule when the binding energy is decomposed after molecular docking and molecular dynamics simulation of the target receptor and the ligand.
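The hashing step in the molecular fingerprint definition can be illustrated with a toy example. This is a minimal sketch, assuming structural features have already been extracted as string labels (the labels here are hypothetical); real MACCS, RDKit or ECFP fingerprints are generated by cheminformatics toolkits from predefined substructure keys or atom environments. The bit vector is stored as a set of on-bit positions, and molecules are compared with the Tanimoto coefficient.

```python
import hashlib
from typing import Iterable, Set


def hashed_fingerprint(features: Iterable[str], n_bits: int = 1024) -> Set[int]:
    """Return the set of 'on' bit positions of an n_bits fingerprint.

    Each structural feature label is hashed with SHA-1 (stable across runs)
    and folded into the range [0, n_bits); different features may collide.
    """
    on_bits: Set[int] = set()
    for feature in features:
        digest = hashlib.sha1(feature.encode("utf-8")).digest()
        on_bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return on_bits


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two on-bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical feature labels standing in for extracted substructures.
ligand_a = hashed_fingerprint(["aromatic_ring", "hydroxyl", "ester"])
ligand_b = hashed_fingerprint(["aromatic_ring", "hydroxyl", "amide"])
print(round(tanimoto(ligand_a, ligand_b), 2))
```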
Example 1: Establishment of a drug screening model for the vitamin D receptor (VDR) using the method of the present invention
A. Structure-based selection discriminant model construction (SBDD)
1) Obtaining the target protein structure: the VDR protein crystal structure is obtained from the PDB database with a resolution better than 2 angstroms, which keeps the probability of incorrectly placed side chains and incorrect loop structures low;
2) Data sets of VDR agonists and inhibitors are obtained from the ChEMBL database, and a structure library of VDR agonists and inhibitors is constructed from ligand molecules with IC50, EC50, or Inhibition activity data. The activity screening criteria are: for IC50, compounds with a standard value < 10 μM are selected; for EC50, compounds with a standard value < 10 μM are selected; and for Inhibition, compounds with a standard value > 30% are selected. To ensure that the small-molecule structures are reasonable, they are preprocessed: for molecules containing multiple fragments, only the largest fragment is retained; strong acids are deprotonated and strong bases protonated to correct structural errors; and the 2D structures are converted into 3D structures. All molecules are then energy-minimized (bonded, van der Waals, electrostatic, and restraint terms) under the MMFF99X force field, with partial charges assigned, hydrogens added, a nonbonded interaction cutoff of 8-10 Å, and a gradient threshold of 0.001; that is, optimization ends when the RMS gradient falls below 0.001.
3) The VDR protein structure is docked with the ligand molecules in the structure library using DOCK software. 3D protonation is carried out under the MMFF99X force field to obtain the lowest-potential-energy configuration of the whole system over the alternative states of terminal amides, hydroxyls, thiols, histidines, and other titratable groups. The protein cavity is defined using the original ligand in the protein crystal structure as a reference; docking treats the protein as rigid and the ligand as flexible; ligand conformations are generated by bond rotation and placed into the docking site with the Triangle Matcher method; and 10 conformations are retained. The highest-scoring conformation (i.e., the optimal conformation of the ligand bound to the target, which is then combined with the target protein) is selected for molecular dynamics simulation. A truncated octahedral water box is added around the complex using a 10 Å cutoff distance, and a two-step energy minimization is performed. In the first step, the protein amino acid residues are restrained with 500 kcal/mol and the solvent is optimized over 5000 steps, using the steepest descent method for the first 2500 steps and switching to the conjugate gradient method for the last 2500 steps. In the second step, the whole system is optimized: the cutoff distance for nonbonded interactions is 10 Å; constant-volume periodic boundaries are used; to achieve better energy convergence, system shaking is not performed; the minimizer switches from steepest descent to conjugate gradient after 2500 cycles; and the full interaction forces are calculated. The binding free energy of the system is then calculated with the MM/GBSA method:
$$\Delta G_{b} = \Delta E_{MM} + \Delta G_{sol} - T\Delta S$$
where $\Delta G_{b}$ is the binding free energy in solvent; $\Delta E_{MM}$ is the molecular mechanics energy, consisting of the electrostatic and van der Waals interaction energies between the ligand and the protein ($\Delta E_{ele}^{int}$ and $\Delta E_{vdW}^{int}$); $\Delta G_{sol}$ is the solvation free energy, which can be divided into electrostatic and hydrophobic contributions ($\Delta G_{ele,sol}$ and $\Delta G_{nonpol,sol}$); and $T\Delta S$ is the entropic contribution at temperature $T$. After the protein-ligand binding free energy (MM/GBSA) is calculated, energy decomposition is performed on the amino acid residues within 5 Å of the target binding cavity. The decomposed energy values of all these residues and of the ligand molecule are retained, so that all informative features are available for subsequent model building;
4) Using the machine learning method RF, features are selected in proportion (5%, 10%, 20%, 40%, 60%, 80%, or 100%) from the energy decomposition values obtained by molecular docking and molecular dynamics of the target-ligand complex, and an SVM selection discrimination model distinguishing VDR agonists from antagonists is constructed.
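The MM/GBSA sum and the per-residue feature vector of steps 3) and 4) can be sketched as follows. The function names, the residue labels, and all energy values are hypothetical; the entropy term defaults to zero because the text does not say how, or whether, TΔS is computed (it is often omitted in MM/GBSA screening because normal-mode entropy estimates are costly).

```python
from typing import Dict, List


def binding_free_energy(e_ele: float, e_vdw: float,
                        g_ele_sol: float, g_nonpol_sol: float,
                        t_delta_s: float = 0.0) -> float:
    """dG_b = dE_MM + dG_sol - T*dS (all terms in kcal/mol), with
    dE_MM = dE_ele + dE_vdW and dG_sol = dG_ele,sol + dG_nonpol,sol."""
    delta_e_mm = e_ele + e_vdw
    delta_g_sol = g_ele_sol + g_nonpol_sol
    return delta_e_mm + delta_g_sol - t_delta_s


def decomposition_features(per_residue: Dict[str, float],
                           pocket_residues: List[str],
                           ligand_energy: float) -> List[float]:
    """Fixed-order feature vector: one decomposed energy per pocket residue
    (those within 5 angstroms of the binding cavity), then the ligand term.
    A residue absent from the decomposition output is assigned 0.0."""
    return [per_residue.get(res, 0.0) for res in pocket_residues] + [ligand_energy]


# Made-up component energies, for illustration only.
print(binding_free_energy(e_ele=-20.0, e_vdw=-35.0,
                          g_ele_sol=30.0, g_nonpol_sol=-4.0))  # -29.0

pocket = ["RES_A", "RES_B", "RES_C"]  # hypothetical residue labels
print(decomposition_features({"RES_A": -1.2, "RES_C": -3.1}, pocket, -12.4))
# [-1.2, 0.0, -3.1, -12.4]
```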
As a result:
In the structure-based study, the optimal binding conformation of the vitamin D receptor with each active ligand was obtained by molecular docking, the binding system was optimized by molecular dynamics simulation, the binding free energy between ligand and receptor was calculated and decomposed, and an SVM selection discrimination model was constructed using the energy values decomposed onto each amino acid residue of the receptor and onto the ligand molecule as features. This method is referred to herein as the Interaction Energy Decomposition method (IED).
After similarity screening and random forest screening of the features, 5 important features were retained. Models were then built with 5%, 10%, 20%, 40%, 60%, 80%, and 100% of the features following the modeling procedure. Because only one feature is selected at 5% to 20%, for which the model could not yield an MCC value, only the results for 40% or more of the features are shown. The modeling results of the IED method are shown in FIG. 2, which gives the statistics of training-set cross-validation and the test set for different numbers of IED features.
When the model uses 40% of the features (2 features), the cross-validated AUC and total prediction accuracy Qtotal are below 0.7, and the sensitivity SE and MCC are low, at about 0.2. As the number of features increases to 100% (5 features), the AUC rises above 0.8, Qtotal approaches 0.8, and both SE and MCC exceed 0.5. The specificity SP changes little with the number of features and stays at about 0.9. These modeling results show that the features calculated and selected by the IED method reflect well the differences in binding to the VDR receptor between EC50-active molecules and IC50-active molecules.
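The feature counts reported above (one feature at 5% to 20%, two at 40%, five at 100%, out of 5 retained features) are consistent with rounding the proportion up to a whole number of features, with at least one feature kept. The patent does not state the rounding rule, so the `ceil` below is an inference, and ranking by importance scores (for example, random-forest importances) is a sketch rather than the authors' exact procedure.

```python
import math
from typing import List, Sequence

PROPORTIONS = (0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 1.00)


def n_selected(n_features: int, proportion: float) -> int:
    """Number of features kept for a proportion: rounded up, at least one.
    The small epsilon absorbs floating-point error (e.g. 10 * 0.7)."""
    return max(1, math.ceil(n_features * proportion - 1e-9))


def select_top_features(importances: Sequence[float], proportion: float) -> List[int]:
    """Indices of the most important features, highest first."""
    k = n_selected(len(importances), proportion)
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return order[:k]


print([n_selected(5, p) for p in PROPORTIONS])  # [1, 1, 1, 2, 3, 4, 5]
```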
B. Ligand-based selection discriminant model construction (LBDD)
Ligand-based approaches also typically use various molecular descriptors. These are generated either from the two-dimensional (2D) structure of the compound (2D descriptors) or from its spatial arrangement (three-dimensional (3D) descriptors), with the 3D structure obtained simply by energy minimization of the ligand or by docking.
1) The chemical structures in the structure library from step 2) of "A. Structure-based selection discriminant model construction (SBDD)" above are converted into a form that can be processed computationally. The most common ways of capturing chemical information are numerical descriptors and fingerprints. Numerical descriptors characterize the physicochemical properties of compounds; examples include molecular weight, octanol-water partition coefficient (logP), pKa, number of hydrogen bond donors, number of hydrogen bond acceptors, number of atoms of a specific type, number of bonds of a specific type, atomic charges, polarity, and molecular volume. Fingerprints result from converting a compound into a bit string; this study used bond-based fingerprinting, which records the presence (1) or absence (0) of particular chemical moieties in a molecule, to obtain molecular fingerprints.
2) Using the machine learning method RF, features are selected in proportion (5%, 10%, 20%, 40%, 60%, 80%, or 100%) from the molecular fingerprints obtained from the ligand molecular structures, and a model discriminating VDR agonists from antagonists is constructed.
As a result:
In constructing the ligand-based selection discrimination model for the vitamin D receptor, the features of the active ligands were calculated and models were constructed with three typical 2D molecular fingerprints, namely a substructure-based fingerprint (MACCS fingerprint), a topological or path-based fingerprint (RDKit fingerprint), and a circular hashed topological fingerprint (ECFP fingerprint), and with a 3D molecular fingerprint (ESshape3D fingerprint).
As shown in FIG. 3A, the MACCS fingerprint modeling results show that as the number of features increases, the AUC value and Qtotal rise from about 0.8 to above 0.95, the SE and MCC values rise from about 0.6 to above 0.9, and the SP rises from about 0.9 to 1. The RDKit fingerprint modeling results are shown in FIG. 3B: except when 20% and 40% of the features are selected, all indices remain between 0.97 and 1. The ECFP fingerprint modeling results are shown in FIG. 3C: as the number of features increases, the indices are stable; the AUC stays above 0.97, the SE fluctuates within a small range of about 0.86 to 0.91, the SP also fluctuates somewhat between 0.95 and 1, the Qtotal rises from about 0.93 to about 0.95, and the MCC lies between 0.87 and 0.91. The ESshape3D fingerprint modeling results are shown in FIG. 3D: as the number of features increases, the AUC rises from about 0.8 to about 0.87, the SE rises from about 0.5 to about 0.77, the SP fluctuates between about 0.8 and 0.9, the Qtotal rises from about 0.72 to about 0.8, and the MCC rises from about 0.36 to about 0.59. These results show that the RDKit fingerprint model performs best, the ECFP and MACCS fingerprint models come next, and the ESshape3D fingerprint model performs somewhat worse than the other three.
C. Modeling of the invention
The features (descriptors) obtained by the structure-based IED method were fused separately with each of the three descriptor sets obtained by the ligand-based ECFP, MACCS, and ESshape3D methods, and the models were then rebuilt. The resulting models fuse the feature information from structure-based energy decomposition with the feature information from the ligand structures; this approach is referred to in this application as the E-QSAR method. Features were selected in proportion with a random forest algorithm, and several feature-fusion SVM models were built. For each SVM model, the agonist and antagonist samples were randomly divided into a training set and a test set at a ratio of 4:1 or 7:3, and an RBF kernel function was used:
$$K_{RBF}(x_1, x_2) = \exp\left(-\gamma \lVert x_1 - x_2 \rVert^2\right)$$
where $x_1$ and $x_2$ are the feature vectors of two samples and $\gamma$ is the kernel parameter; the penalty parameter cost and gamma were optimized over cost = 2^seq(-2, 10, by = 2) and gamma = 2^seq(-10, 2, by = 2), and epsilon was set to 0.1. After the models were built, they were evaluated with the overall accuracy (Qtotal), the area under the receiver operating characteristic curve (AUC), the Matthews correlation coefficient (MCC), sensitivity (SE), and specificity (SP), and drug activity prediction results were output.
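The evaluation indices can be computed from a confusion matrix and prediction scores as sketched below, using the standard definitions: Qtotal = (TP+TN)/N, SE = TP/(TP+FN), SP = TN/(TN+FP), MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and AUC as the Mann-Whitney rank statistic. Treating agonists as the positive class and the toy data are assumptions. Returning 0.0 when the MCC denominator vanishes is one common convention; that situation arises, for instance, when a model predicts only one class.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Qtotal, SE, SP and MCC for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Qtotal": (tp + tn) / len(pairs),
        "SE": tp / (tp + fn) if tp + fn else 0.0,
        "SP": tn / (tn + fp) if tn + fp else 0.0,
        # MCC is undefined when a denominator factor is zero; 0.0 by convention.
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count 0.5), i.e. the Mann-Whitney formulation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels: 1 = agonist, 0 = antagonist (hypothetical example data).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
scores = [0.9, 0.4, 0.2, 0.3, 0.8, 0.6]
metrics = classification_metrics(y_true, y_pred)
print(round(metrics["MCC"], 3))           # 0.333
print(round(roc_auc(y_true, scores), 3))  # 0.889
```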
As a result:
As can be seen from Table 1, the E-QSAR models after feature fusion perform much better than the single-method models. For example, Qtotal improves from 0.8945 for the ECFP model to 0.9550 for the ECFP + IED model, from 0.9181 for the MACCS model to 0.9769 for the MACCS + IED model, and from 0.5941 for the ESshape3D model to 0.7339 for the ESshape3D + IED model. The MCC values also increase in each comparison.
Table 1. Modeling results of the E-QSAR method and of the single methods.
Note: for each method, the modeling result with the largest number of features is shown.
Taking the vitamin D receptor of the nuclear receptor family as the research object, receptor structure information was obtained from the PDB database and active ligand information from the ChEMBL database. Using two approaches, structure-based and ligand-based, through techniques such as molecular docking, molecular dynamics, free energy decomposition, and molecular fingerprint calculation, the differences in interaction between different small-molecule ligands and the nuclear receptor, as well as the structural features of the ligands, were calculated, and SVM selection discrimination models for agonists and antagonists of the nuclear receptor were established with each approach. By comparing the accuracy of the models built with the two approaches and combining their respective advantages, selection discrimination models that fuse receptor energy information with ligand structural features were then established. The features in the model of the invention are used not only to characterize chemical structure but also to describe the chemical structure and energy characteristics of the ligand-receptor complex obtained by docking. They provide information on the interactions between the ligand and specific amino acids of the protein, forming a structural interaction fingerprint from which a more accurate drug target ligand selection discrimination model is constructed. The features obtained by the structure-based IED method were fused separately with those obtained by the ligand-based ECFP, MACCS, and ESshape3D methods to rebuild the model (E-QSAR model), and the fused models performed much better than the single-method models.
Therefore, the E-QSAR combines a structure-based method and a ligand-based method to construct a virtual screening model with better performance. The application provides a reliable and efficient drug screening model, and provides an effective high-throughput drug virtual screening tool for the development of drugs (such as endocrine regulation drugs and antitumor drugs).
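The feature fusion described above can be pictured as placing each ligand's ligand-based fingerprint bits and its structure-based energy decomposition values side by side in one feature vector. The following Python sketch assumes binary fingerprints (for example ECFP or MACCS bits) and per-residue energy values that have already been computed; the function names and the min-max scaling of the energies to [0, 1] are illustrative assumptions, not the procedure stated in the patent.

```python
from typing import List, Sequence


def min_max_scale_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every column to [0, 1]; a constant column is mapped to 0.0."""
    n_cols = len(rows[0])
    lows = [min(row[j] for row in rows) for j in range(n_cols)]
    highs = [max(row[j] for row in rows) for j in range(n_cols)]
    return [
        [
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_cols)
        ]
        for row in rows
    ]


def fuse_features(
    fingerprints: Sequence[Sequence[int]],
    residue_energies: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate each ligand's fingerprint bits with its scaled energy terms.

    fingerprints[i] and residue_energies[i] must describe the same ligand.
    """
    if len(fingerprints) != len(residue_energies):
        raise ValueError("need exactly one energy profile per fingerprint")
    scaled = min_max_scale_columns(residue_energies)
    return [[float(bit) for bit in fp] + energy for fp, energy in zip(fingerprints, scaled)]


if __name__ == "__main__":
    fps = [[1, 0, 1], [0, 1, 1]]             # toy 3-bit fingerprints
    energies = [[-2.0, -0.5], [-1.0, -0.5]]  # toy per-residue energies, kcal/mol
    print(fuse_features(fps, energies))
```

Run as a script, the toy example prints `[[1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0, 0.0]]`: the two energy columns follow the three fingerprint bits, and the constant second energy column scales to 0.0. The scaling here uses all rows for brevity; in a real training/test protocol the minimum and maximum would be taken from the training set only, so that no test information leaks into the features.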
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention and not for limiting the protection scope of the present invention, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A receptor- and ligand-based method for virtual drug screening, comprising the steps of:
(1) acquiring activity data of a target receptor, wherein the activity data of the receptor is used for molecular docking;
(2) obtaining ligands of the target receptor and their activity data to construct a ligand structure library;
(3) processing the activity data of the ligands in the ligand structure library to obtain molecular fingerprints of the ligands;
(4) performing molecular docking of the target receptor with the ligands in the ligand structure library, followed by molecular dynamics simulation and energy decomposition, to obtain energy decomposition values;
(5) selecting the molecular fingerprints and the energy decomposition values in proportion for feature fusion, and establishing a model with a machine learning algorithm; and
(6) carrying out virtual drug screening using the model.
2. The method for virtual drug screening of claim 1, wherein the activity data of the target receptor includes a crystal structure of the target receptor with a resolution of less than 2 angstroms.
3. The method for virtual screening of drugs according to claim 1, wherein the molecular fingerprint is a 2D molecular fingerprint or a 3D molecular fingerprint.
4. The method for virtual drug screening of claim 1, wherein the ligand activity data comprise IC50, EC50 and inhibition rate.
5. The method for virtual drug screening according to claim 1, wherein in step (4) the molecular docking is performed with the DOCK software, and the highest-scoring conformation is selected for the molecular dynamics simulation.
6. The method for virtual drug screening according to claim 5, wherein in step (4) a two-step energy minimization is performed in the molecular dynamics simulation, and the binding free energy of the system is calculated with the MM/GBSA method.
7. The method for virtual screening of drugs according to any one of claims 1-6, wherein the ligands in the ligand structure library are agonists or antagonists of the target receptor.
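8. The method for virtual drug screening of claim 7, wherein the machine learning algorithm is a random forest method, the model is an SVM model, the agonists and antagonists are randomly divided into a training set and a test set in a ratio of 4:1 or 7:3, and an RBF kernel is used:
$$K_{\mathrm{RBF}}(x_1, x_2) = \exp\left(-\gamma \lVert x_1 - x_2 \rVert^2\right)$$
where $x_1$ and $x_2$ are feature vectors and $\gamma$ is the kernel parameter; the penalty parameter cost and the kernel parameter gamma are optimized over cost = 2^seq(-2, 10, by = 2) and gamma = 2^seq(-10, 2, by = 2), that is, $\mathrm{cost} \in \{2^{-2}, 2^{0}, \dots, 2^{10}\}$ and $\gamma \in \{2^{-10}, 2^{-8}, \dots, 2^{2}\}$, and epsilon is set to 0.1.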
9. A receptor- and ligand-based system for virtual drug screening, the system comprising:
A. a target receptor data acquisition module for acquiring activity data of a target receptor;
B. a ligand structure library module for obtaining ligands of the target receptor and their activity data to construct a ligand structure library;
C. a molecular fingerprint acquisition module for processing the activity data of the ligands in the ligand structure library to obtain molecular fingerprints of the ligands;
D. a molecular docking and dynamics simulation module for performing molecular docking of the target receptor with the ligands, followed by molecular dynamics simulation and energy decomposition, to obtain energy decomposition values;
E. a training module for selecting the molecular fingerprints and the energy decomposition values in proportion for feature fusion, and establishing a model with a machine learning algorithm; and
F. a virtual drug screening module for virtually screening drugs using the model.
10. The system for virtual drug screening of claim 9, wherein the machine learning algorithm is a random forest method, the model is an SVM model, the agonists and antagonists in the ligand structure library are randomly divided into a training set and a test set in a ratio of 4:1 or 7:3, and an RBF kernel function is used:
$$K_{\mathrm{RBF}}(x_1, x_2) = \exp\left(-\gamma \lVert x_1 - x_2 \rVert^2\right)$$
where $x_1$ and $x_2$ are feature vectors and $\gamma$ is the kernel parameter; the penalty parameter cost and the kernel parameter gamma are optimized over cost = 2^seq(-2, 10, by = 2) and gamma = 2^seq(-10, 2, by = 2), that is, $\mathrm{cost} \in \{2^{-2}, 2^{0}, \dots, 2^{10}\}$ and $\gamma \in \{2^{-10}, 2^{-8}, \dots, 2^{2}\}$, and epsilon is set to 0.1.
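Step (5) of claim 1 does not say how the molecular fingerprints and energy decomposition values are "selected in proportion". One possible reading, sketched below purely as an assumption, is to divide a fixed feature budget between the two blocks by an integer ratio and keep the highest-ranked columns from each. Variance is used as the ranking criterion only to keep the example dependency-free; the claims mention a random forest, whose feature importances could serve instead. All names are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def top_columns_by_variance(rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices (in ascending order) of the k columns with the largest variance."""
    n_cols = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def select_in_proportion(
    fingerprints: Sequence[Sequence[float]],
    energies: Sequence[Sequence[float]],
    n_total: int,
    ratio: Tuple[int, int],
) -> Tuple[List[int], List[int]]:
    """Share a budget of n_total features between the two feature blocks.

    ratio = (fingerprint_parts, energy_parts); e.g. (2, 1) gives two thirds of
    the budget to fingerprint bits and the rest to energy decomposition terms.
    """
    fp_parts, en_parts = ratio
    n_fp = n_total * fp_parts // (fp_parts + en_parts)
    n_en = n_total - n_fp
    return (top_columns_by_variance(fingerprints, n_fp),
            top_columns_by_variance(energies, n_en))
```

With a budget of three features and a ratio of (2, 1), two fingerprint columns and one energy column are kept; the returned index lists can then be used to slice each block before the two are concatenated.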
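Claim 6 calculates the binding free energy with MM/GBSA. In the common single-trajectory form, each MD snapshot contributes ΔG = G_complex − G_receptor − G_ligand, where each G is the sum of the gas-phase molecular mechanics energy, the generalized Born polar solvation energy and the nonpolar surface-area term, and the estimate is the average over snapshots, often without the entropy term. The sketch below only aggregates energies assumed to have been produced already by an MD package; the class and function names are hypothetical, and averaging per-residue terms into one feature per residue is an assumed way of turning the energy decomposition into model features.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass
class SnapshotEnergy:
    """Energy terms (kcal/mol) for one species in one MD snapshot."""
    e_mm: float  # gas-phase molecular mechanics energy
    g_gb: float  # polar solvation energy from the generalized Born model
    g_sa: float  # nonpolar solvation energy from the surface-area term

    @property
    def total(self) -> float:
        return self.e_mm + self.g_gb + self.g_sa


def mmgbsa_binding_energy(
    complex_frames: Sequence[SnapshotEnergy],
    receptor_frames: Sequence[SnapshotEnergy],
    ligand_frames: Sequence[SnapshotEnergy],
) -> float:
    """Mean of G_complex - G_receptor - G_ligand over snapshots (entropy omitted)."""
    if not len(complex_frames) == len(receptor_frames) == len(ligand_frames):
        raise ValueError("need one receptor and one ligand entry per complex snapshot")
    return mean(
        c.total - r.total - lig.total
        for c, r, lig in zip(complex_frames, receptor_frames, ligand_frames)
    )


def residue_contributions(frames: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each residue's decomposed energy over snapshots (one feature per residue)."""
    return {res: mean(frame[res] for frame in frames) for res in frames[0]}
```

For example, two snapshots with totals of −125 and −127 kcal/mol for the complex, −78 for the receptor and −31 for the ligand give per-snapshot values of −16 and −18 and an average of −17 kcal/mol.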
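Claims 8 and 10 combine an RBF kernel, a random 4:1 or 7:3 training/test split and a cost/gamma search grid written in R notation. The Python sketch below reproduces those three pieces under stated assumptions: the kernel formula, the expansion of 2^seq(start, stop, by = 2) into explicit values, and a random split by an integer ratio. Fitting an SVM at each (cost, gamma) pair and choosing the best pair, for example by cross-validation, would be left to an SVM library; `sklearn.svm.SVC` with `kernel="rbf"` and parameters `C` and `gamma` uses the same kernel form. The function names are hypothetical, and the random forest step named in the claims is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rbf_kernel(x1: Sequence[float], x2: Sequence[float], gamma: float) -> float:
    """K_RBF(x1, x2) = exp(-gamma * ||x1 - x2||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


def pow2_seq(start: int, stop: int, by: int = 2) -> List[float]:
    """Python equivalent of R's 2^seq(start, stop, by = by) for integer steps."""
    return [2.0 ** k for k in range(start, stop + 1, by)]


def random_split(
    items: Sequence[T], ratio: Tuple[int, int], seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle and split into (train, test) by an integer ratio such as (4, 1) or (7, 3)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * ratio[0] // (ratio[0] + ratio[1])
    return shuffled[:n_train], shuffled[n_train:]


COST_GRID = pow2_seq(-2, 10)    # 2^-2, 2^0, ..., 2^10 (7 values)
GAMMA_GRID = pow2_seq(-10, 2)   # 2^-10, 2^-8, ..., 2^2 (7 values)
EPSILON = 0.1                   # fixed value stated in claims 8 and 10
PARAM_GRID = [(cost, gamma) for cost in COST_GRID for gamma in GAMMA_GRID]  # 49 pairs
```

A fixed seed makes the split reproducible, which helps when models built with different feature sets are compared on the same training and test partition.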
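The modules A to F of claim 9 can be read as a pipeline in which each module is a pluggable component. The following sketch, with hypothetical class, field and dictionary-key names, wires injected callables together so that the data flow (receptor, ligand library, fingerprints, energy values, fused features, trained model, screening result) is explicit; the actual docking, MD simulation and learning steps would be supplied by external software.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Ligand = Dict[str, Any]
Vector = List[float]
Model = Callable[[Vector], float]


@dataclass
class VirtualScreeningSystem:
    """Hypothetical wiring of modules A-F; each module is injected as a callable."""
    acquire_receptor: Callable[[str], Any]               # module A
    build_ligand_library: Callable[[str], List[Ligand]]  # module B
    fingerprint: Callable[[Ligand], Vector]              # module C
    dock_md_decompose: Callable[[Any, Ligand], Vector]   # module D
    train: Callable[[List[Vector], List[int]], Model]    # module E (learning step)

    def featurize(self, receptor: Any, ligand: Ligand) -> Vector:
        # Feature fusion: fingerprint bits followed by energy decomposition values.
        return self.fingerprint(ligand) + self.dock_md_decompose(receptor, ligand)

    def fit(self, target: str) -> Model:
        receptor = self.acquire_receptor(target)
        library = self.build_ligand_library(target)
        features = [self.featurize(receptor, lig) for lig in library]
        labels = [lig["label"] for lig in library]  # e.g. 1 = agonist, 0 = antagonist
        return self.train(features, labels)

    def screen(self, target: str, model: Model, candidates: List[Ligand],
               threshold: float = 0.5) -> List[str]:
        """Module F: keep candidates whose model score reaches the threshold."""
        receptor = self.acquire_receptor(target)
        return [c["id"] for c in candidates
                if model(self.featurize(receptor, c)) >= threshold]
```

For a quick check, every module can be replaced by a toy lambda, for example a `train` callable that returns a simple threshold rule on the first feature.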
CN202111029529.2A 2021-09-02 Method and system for virtually screening medicines based on receptor and ligand Active CN113808683B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111029529.2A CN113808683B 2021-09-02 Method and system for virtually screening medicines based on receptor and ligand

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111029529.2A CN113808683B 2021-09-02 Method and system for virtually screening medicines based on receptor and ligand

Publications (2)

Publication Number Publication Date
CN113808683A 2021-12-17
CN113808683B 2024-07-02

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446607A * 2016-09-26 2017-02-22 East China Normal University Drug target virtual screening method based on interactive fingerprints and machine learning
CN107862173A * 2017-11-15 2018-03-30 Nanjing University of Posts and Telecommunications Lead compound virtual screening method and device
US20190272887A1 * 2018-03-05 2019-09-05 The Board Of Trustees Of The Leland Stanford Junior University Machine Learning and Molecular Simulation Based Methods for Enhancing Binding and Activity Prediction
CN112086146A * 2020-08-24 2020-12-15 Nanjing University of Posts and Telecommunications Small molecule drug virtual screening method and device based on deep parameter transfer learning
CN112086139A * 2020-08-24 2020-12-15 Nanjing University of Posts and Telecommunications Multi-source transfer learning method and device for virtual screening of small molecule drugs

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Sabina Podlewska et al.: "Mutual Support of Ligand- and Structure-Based Approaches—To What Extent We Can Optimize the Power of Predictive Model? Case Study of Opioid Receptors", Molecules, pages 1-16 *
Huang Qi et al.: "Comparison of Virtual Screening Methods Based on Ligand, Receptor and Complex Fingerprints" (in Chinese), Acta Chimica Sinica, vol. 69, no. 05, pages 515-521 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024026725A1 * 2022-08-03 2024-02-08 Shenzhen Alpha Molecular Technology Co., Ltd. MM/PB(GB)SA-based protein-drug binding free energy prediction method and prediction system
CN116453587A * 2023-06-15 2023-07-18 Zhejiang Lab Task execution method and device, storage medium and electronic equipment
CN116453587B * 2023-06-15 2023-08-29 Zhejiang Lab Task execution method for predicting ligand affinity based on molecular dynamics model

Similar Documents

Publication Publication Date Title
Aggarwal et al. DeepPocket: ligand binding site detection and segmentation using 3D convolutional neural networks
EP3455236A1 (en) Computational method for classifying and predicting protein side chain conformations
Becker Geometric versus topological clustering: an insight into conformation mapping
WO1998007107A1 (en) Molecular hologram qsar
Pijeau et al. Improved complete active space configuration interaction energies with a simple correction from density functional theory
Gómez et al. Multiple facets of modeling electronic absorption spectra of systems in solution
CA3226172A1 (en) Systems and methods for artificial intelligence-guided biomolecule design and assessment
Fukunishi Structure-based drug screening and ligand-based drug screening with machine learning
Wang et al. MDC-Kace: A model for predicting lysine acetylation sites based on modular densely connected convolutional networks
US8886505B2 (en) Method of predicting protein-ligand docking structure based on quantum mechanical scoring
Flower DISSIM: a program for the analysis of chemical diversity
CN114446383A (en) Quantum computation-based ligand-protein interaction prediction method
US8374837B2 (en) Descriptors of three-dimensional objects, uses thereof and a method to generate the same
CN113808683A Method and system for virtual screening of drugs based on receptors and ligands
CN113808683B Method and system for virtually screening medicines based on receptor and ligand
Sciabola et al. Critical Assessment of State‐of‐the‐Art Ligand‐Based Virtual Screening Methods
EP1862927B1 (en) Descriptors of three-dimensional objects, uses thereof and a method to generate the same
JP2003524831A (en) System and method for exploring combinatorial space
Habgood Bioactive focus in conformational ensembles: a pluralistic approach
US20030060982A1 (en) Method for searching heterogeneous compound databases using topomeric shape descriptors and pharmacophoric features
Zeng et al. Neural network based in silico simulation of combustion reactions
Sadowski et al. 3D structure generation and conformational searching
H Lushington et al. Chemical informatics and the drug discovery knowledge pyramid
JP2021192199A (en) Structure search method, structure search device, program for structure search, and interaction potential specification method
Wang et al. Reconstruction of Protein Backbone with the alpha-Carbon Coordinates.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant