CN110689929B - Protein ATP docking method based on contact probability assistance - Google Patents

Publication number
CN110689929B
Authority
CN
China
Prior art keywords
atp
residue
binding
atom
protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910805001.6A
Other languages
Chinese (zh)
Other versions
CN110689929A (en)
Inventor
张贵军
饶亮
刘俊
赵凯龙
胡俊
周晓根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201910805001.6A
Publication of CN110689929A
Application granted
Publication of CN110689929B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20: Sequence assembly
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/12: Computing arrangements based on biological models using genetic models
    • G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Physiology (AREA)
  • Chemical & Material Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A protein ATP docking method based on contact probability assistance. First, five protein binding-residue prediction servers, including ATPbind, predict the protein-ATP binding residues, and a voting scheme selects the residues predicted most often as binding residues, which improves the accuracy of the binding residues. Second, a contact probability matrix between each type of binding residue and each ATP atom is extracted from the PDB database and used as an energy function to score the generated conformations, which improves docking accuracy. Finally, an improved differential evolution algorithm searches for the optimal individual, which improves computational efficiency. The invention provides a contact-probability-assisted protein ATP docking method with low computational cost and high prediction accuracy.

Description

Protein ATP docking method based on contact probability assistance
Technical Field
The invention relates to the fields of bioinformatics, intelligent optimization and computer application, in particular to a protein ATP docking method based on contact probability assistance.
Background
As proteomics research advances, it is increasingly found that proteins act in organisms in complex with small-molecule ligands. Throughout life, protein-ligand recognition processes, including substrate-enzyme, antigen-antibody and hormone-receptor recognition, underlie the molecular mechanisms and regulation of many biological functions. Mutual recognition and interaction between proteins and ligands is an important way for proteins to exert their biological functions, and life activities such as gene regulation, signal transduction and immune response all depend on it. ATP is one such small-molecule ligand: it is an energy-carrying molecule widely distributed in the human body. ATP hydrolase converts ATP to ADP with release of energy, and ATP synthetase converts ADP back to ATP; both processes require binding to the enzyme protein. Studying the molecular recognition mechanism between proteins and their ligands, building recognition models, and studying the relationship between molecular recognition and molecular selectivity are therefore important not only for revealing the underlying biology but also for guiding the design and synthesis of compounds with specific recognition functions and bioactivity.
At present, the structures of protein-ligand complexes are mainly determined by wet-lab methods such as X-ray crystal diffraction and nuclear magnetic resonance, but these methods are difficult, costly and time-consuming. In recent years, with the steady improvement of computer technology and the rapid development and wide application of molecular simulation theory, methods such as homology modeling, molecular docking, molecular dynamics simulation, binding free energy calculation and quantum mechanical calculation have become important means of studying the interaction mechanisms and dynamic processes of proteins and ligands. Molecular simulation makes it possible to study life phenomena and their underlying principles at the molecular or even atomic level, and can provide strong theoretical guidance for experiments. As the theory matures and the technology advances, molecular simulation is increasingly used in research on protein structure and function, protein-ligand recognition and drug design.
Computational molecular simulation relies mainly on intelligent algorithms and energy functions to search for the complex structure with the lowest energy. However, at present no energy function can perfectly evaluate the energy of a complex. In addition, inaccurate prediction of the protein binding residues introduces errors into the energy function, so that the predicted complex structure is inaccurate, and some intelligent algorithms suffer from long search times or inaccurate search results.
Therefore, the existing protein and ligand molecule docking methods have defects in prediction accuracy and computational cost, and need to be improved.
Disclosure of Invention
In order to overcome the defects of the conventional protein and ligand ATP docking method in the aspects of prediction accuracy and calculation cost, the invention provides a contact probability-assisted protein ATP docking method which is low in calculation cost and high in prediction accuracy.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a contact probability-assisted protein ATP docking method, the method comprising the steps of:
1) inputting the structures of the target protein and ATP, which are respectively marked as R and A;
2) predicting all ATP binding residues of the target protein R using the ATPbind server (http://zhanglab.ccmb.med.umich.edu/ATPbind/), the TargetS server (http://www.csbio.sjtu.edu.cn:8080/TargetS/), the TargetSOS server (http://www.csbio.sjtu.edu.cn:8080/TargetSOS/), the TargetNUCs server (http://202.119.84.36:3079/TargetNUCs/) and the TargetTPsite server (http://www.csbio.sjtu.edu.cn:8080/TargetTPsite/), respectively;
3) for each candidate residue, if three or more servers predict it to be a binding residue, accepting it as a binding residue, finally obtaining h protein binding residues, denoted $r_1, r_2, \ldots, r_h$;
4) calculating the mean of the coordinates of the central carbon atoms $C_\alpha$ of all binding residues $r_1, r_2, \ldots, r_h$ to obtain the binding-residue center coordinate $C_R$; calculating the mean of all atomic coordinates in A to obtain the center coordinate $C_A$ of A; and moving A so that $C_A$ coincides with $C_R$;
5) extracting from the PDB database the probability that each type of binding residue is in contact with each ATP atom, as follows:
5.1) for each complex in the PDB database, calculating the average distance $d_{g,j}$ between the $C_\alpha$ atoms of the binding residues of each residue type g and the jth atom in ATP; if
Figure GDA0003267695010000021
then set
Figure GDA0003267695010000022
otherwise set
Figure GDA0003267695010000023
where $g \in \{1, 2, \ldots, 21\}$ indexes the 21 residue types, $j \in \{1, 2, \ldots, 31\}$ indexes the 31 ATP atoms, and
Figure GDA0003267695010000031
indicates whether there is contact between a binding residue of residue type g in the kth complex and the jth atom in ATP;
5.2) calculating, over all complexes, the average value of
Figure GDA0003267695010000032
denoted $c_{g,j}$, to obtain a 21 × 31 contact probability matrix:
Figure GDA0003267695010000033
6) setting parameters: setting the population size NP, the scaling factor $F_0$, the crossover probability CR and the maximum number of iterations $G_{max}$, and initializing the iteration count G = 0;
7) population initialization: randomly generating an initial population $P = \{S_1, S_2, \ldots, S_i, \ldots, S_{NP}\}$, where $S_i = (s_{i,1}, s_{i,2}, s_{i,3}, s_{i,4}, s_{i,5}, s_{i,6})$ is the ith individual of the population P and $s_{i,1}$, $s_{i,2}$, $s_{i,3}$, $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ are the 6 elements of $S_i$; the value range of $s_{i,1}$, $s_{i,2}$ and $s_{i,3}$ is
Figure GDA0003267695010000038
and the value range of $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ is 0 to 2π;
8) for each individual $S_i$ in the population, docking the protein with ATP as follows and calculating the score $E_i$ of that individual:
8.1) calculating a spatial rotation matrix R from the last three elements $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ of $S_i$:
Figure GDA0003267695010000034
8.2) rotating all atomic coordinates in A by the rotation matrix R to obtain a new ATP structure $A_R$;
8.3) translating all coordinates in $A_R$ according to the first three elements $s_{i,1}$, $s_{i,2}$ and $s_{i,3}$ of $S_i$, as follows, to obtain a new ATP structure $A_T$:
Figure GDA0003267695010000035
where
Figure GDA0003267695010000036
are the coordinates of the jth atom of $A_T$, and
Figure GDA0003267695010000037
are respectively the X, Y and Z coordinates of the jth atom of $A_R$, j = 1, 2, ..., 31;
8.4) calculating the distances between the $C_\alpha$ atoms of the h binding residues and all atoms of ATP, and calculating the score $E_i$ as follows:
Figure GDA0003267695010000041
Figure GDA0003267695010000042
where g denotes the type of the current binding residue; $c_{g,j}$ is the probability of contact between a binding residue of type g and the jth atom in ATP, i.e. the value in row g, column j of the contact matrix C; $d_{h,j}$ is the distance between the $C_\alpha$ atom of the current binding residue and the jth atom in ATP; $d_{min} = 0.75 \times (r_h + r_j)$, where $r_h$ and $r_j$ are the van der Waals radii of the $C_\alpha$ atom of the current binding residue and of the jth atom in ATP, respectively;
Figure GDA0003267695010000047
9) according to the differential evolution algorithm, performing the following operations for each individual $S_i$, $i \in \{1, 2, \ldots, NP\}$, in the population P:
9.1) randomly selecting three different individuals $S_a$, $S_b$ and $S_c$ from the current population P, where $a, b, c \in \{1, 2, \ldots, NP\}$ and $a \neq b \neq c \neq i$, and generating the mutant individual $S_{mutant}$ according to the following formulas:
Figure GDA0003267695010000043
$S_{mutant} = S_a + F \cdot (S_b - S_c)$
9.2) generating the crossed individuals $S_{cross1}$ and $S_{cross2}$ according to the following procedure:
Figure GDA0003267695010000044
where $s_{cross1,t}$, $s_{mutant,t}$, $s_{cross2,t}$ and $s_{i,t}$ are the tth elements of $S_{cross1}$, $S_{mutant}$, $S_{cross2}$ and $S_i$, respectively, t = 1, 2, ..., 6; $t_{rand}$ is a random integer between 1 and 6; and rand(0,1) is a random decimal between 0 and 1;
9.3) calculating the scores $E_{cross1}$, $E_{cross2}$ and $E_i$ corresponding to $S_{cross1}$, $S_{cross2}$ and $S_i$ according to the score calculation of step 8);
9.4) selecting the lowest-scoring individual among $S_{cross1}$, $S_{cross2}$ and $S_i$ to replace $S_i$ in the population P;
10) setting G = G + 1; if $G \geq G_{max}$, recording the lowest score $E_{min}$ in the current population P and the corresponding ATP structure information
Figure GDA0003267695010000045
and outputting
Figure GDA0003267695010000046
as the final ATP position information; otherwise returning to step 9).
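Steps 2) to 4) can be illustrated with a minimal Python sketch. It assumes the five server predictions have already been collected as sets of residue indices; the function names and the toy data are hypothetical and are not taken from the patent.

```python
from collections import Counter


def vote_binding_residues(server_predictions, min_votes=3):
    """Keep residues predicted as binding by at least `min_votes` servers (step 3)."""
    counts = Counter()
    for predicted in server_predictions:
        counts.update(set(predicted))  # each server votes at most once per residue
    return sorted(r for r, n in counts.items() if n >= min_votes)


def centroid(points):
    """Mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def center_ligand_on_site(ligand_atoms, site_ca_coords):
    """Translate the ligand so that its centroid C_A coincides with C_R (step 4)."""
    c_a = centroid(ligand_atoms)
    c_r = centroid(site_ca_coords)
    shift = tuple(c_r[k] - c_a[k] for k in range(3))
    return [tuple(a[k] + shift[k] for k in range(3)) for a in ligand_atoms]


# Toy input: residue indices returned by five hypothetical server runs.
predictions = [{10, 12, 45}, {10, 45, 80}, {10, 12}, {45, 99}, {10, 12, 45}]
binding = vote_binding_residues(predictions)

# Toy C-alpha coordinates of the voted residues and a two-atom stand-in for ATP.
site_ca = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
ligand = [(5.0, 5.0, 5.0), (7.0, 5.0, 5.0)]
centered_ligand = center_ligand_on_site(ligand, site_ca)
```

Using a set per server means a residue reported twice by one server still counts as a single vote, which matches the "three or more servers" wording of step 3).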
The technical concept of the invention is as follows: first, five protein binding-residue prediction servers, including ATPbind, predict the protein-ATP (adenosine triphosphate) binding residues, and a voting scheme selects the residues predicted most often as binding residues, improving their accuracy; second, a contact probability matrix between each type of binding residue and each ATP atom is extracted from the PDB database and used as an energy function to score the generated conformations, improving docking accuracy; finally, an improved differential evolution algorithm searches for the optimal individual, improving computational efficiency. The invention thus provides a contact-probability-assisted protein ATP docking method with low computational cost and high prediction accuracy.
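The construction of the contact probability matrix in step 5) can be sketched in Python as follows. The contact cutoff of step 5.1) appears only as an equation image in this record, so `cutoff` is a hypothetical parameter; the function names, the input layout and the choice to treat a residue type that is absent from a complex as non-contacting are likewise assumptions.

```python
import math


def contact_indicators(residues_by_type, atp_atoms, n_types, cutoff):
    """Step 5.1 for one complex: 1 if the mean C-alpha-to-atom distance is below cutoff."""
    indicators = [[0] * len(atp_atoms) for _ in range(n_types)]
    for g, ca_coords in residues_by_type.items():
        if not ca_coords:
            continue  # assumption: no binding residue of this type means no contact
        for j, atom in enumerate(atp_atoms):
            mean_d = sum(math.dist(ca, atom) for ca in ca_coords) / len(ca_coords)
            if mean_d < cutoff:
                indicators[g][j] = 1
    return indicators


def contact_probability_matrix(complexes, n_types, n_atoms, cutoff):
    """Step 5.2: average the per-complex indicators over all complexes."""
    totals = [[0.0] * n_atoms for _ in range(n_types)]
    for residues_by_type, atp_atoms in complexes:
        ind = contact_indicators(residues_by_type, atp_atoms, n_types, cutoff)
        for g in range(n_types):
            for j in range(n_atoms):
                totals[g][j] += ind[g][j]
    n_complexes = len(complexes)
    return [[v / n_complexes for v in row] for row in totals]


# Toy data: 2 residue types and 2 ligand atoms instead of the 21 x 31 of the method.
toy_complexes = [
    ({0: [(0.0, 0.0, 0.0)]}, [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]),
    ({0: [(0.0, 0.0, 0.0)], 1: [(10.0, 0.0, 0.0)]}, [(8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]),
]
toy_matrix = contact_probability_matrix(toy_complexes, n_types=2, n_atoms=2, cutoff=5.0)
```

In the method itself the matrix has 21 rows and 31 columns; the toy dimensions only keep the example readable.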
The beneficial effects of the invention are as follows: first, several protein binding-residue prediction servers are used to predict the protein-ATP binding residues, which improves the reliability of the binding residues; second, the extracted contact probability matrix between binding residues and ATP atoms assists the docking, which improves the accuracy of protein ATP docking; third, an improved differential evolution algorithm searches the spatial position of ATP, which improves the search efficiency of the algorithm.
Drawings
FIG. 1 is a schematic diagram of a protein ATP docking method based on contact probability assistance.
FIG. 2 is a diagram of the structure of the complex obtained by docking protein 1e2q with ATP using a protein ATP docking method based on contact probability assistance.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, a protein ATP docking method based on contact probability assistance includes the following steps:
1) inputting the structures of the target protein and ATP, which are respectively marked as R and A;
2) predicting all ATP binding residues of the target protein R using the ATPbind server (http://zhanglab.ccmb.med.umich.edu/ATPbind/), the TargetS server (http://www.csbio.sjtu.edu.cn:8080/TargetS/), the TargetSOS server (http://www.csbio.sjtu.edu.cn:8080/TargetSOS/), the TargetNUCs server (http://202.119.84.36:3079/TargetNUCs/) and the TargetTPsite server (http://www.csbio.sjtu.edu.cn:8080/TargetTPsite/), respectively;
3) for each candidate residue, if three or more servers predict it to be a binding residue, accepting it as a binding residue, finally obtaining h protein binding residues, denoted $r_1, r_2, \ldots, r_h$;
4) calculating the mean of the coordinates of the central carbon atoms $C_\alpha$ of all binding residues $r_1, r_2, \ldots, r_h$ to obtain the binding-residue center coordinate $C_R$; calculating the mean of all atomic coordinates in A to obtain the center coordinate $C_A$ of A; and moving A so that $C_A$ coincides with $C_R$;
5) extracting from the PDB database the probability that each type of binding residue is in contact with each ATP atom, as follows:
5.1) for each complex in the PDB database, calculating the average distance $d_{g,j}$ between the $C_\alpha$ atoms of the binding residues of each residue type g and the jth atom in ATP; if
Figure GDA0003267695010000067
then set
Figure GDA0003267695010000061
otherwise set
Figure GDA0003267695010000062
where $g \in \{1, 2, \ldots, 21\}$ indexes the 21 residue types, $j \in \{1, 2, \ldots, 31\}$ indexes the 31 ATP atoms, and
Figure GDA0003267695010000063
indicates whether there is contact between a binding residue of residue type g in the kth complex and the jth atom in ATP;
5.2) calculating, over all complexes, the average value of
Figure GDA0003267695010000064
denoted $c_{g,j}$, to obtain a 21 × 31 contact probability matrix:
Figure GDA0003267695010000065
6) setting parameters: setting the population size NP, the scaling factor $F_0$, the crossover probability CR and the maximum number of iterations $G_{max}$, and initializing the iteration count G = 0;
7) population initialization: randomly generating an initial population $P = \{S_1, S_2, \ldots, S_i, \ldots, S_{NP}\}$, where $S_i = (s_{i,1}, s_{i,2}, s_{i,3}, s_{i,4}, s_{i,5}, s_{i,6})$ is the ith individual of the population P and $s_{i,1}$, $s_{i,2}$, $s_{i,3}$, $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ are the 6 elements of $S_i$; the value range of $s_{i,1}$, $s_{i,2}$ and $s_{i,3}$ is
Figure GDA0003267695010000068
and the value range of $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ is 0 to 2π;
8) for each individual $S_i$ in the population, docking the protein with ATP as follows and calculating the score $E_i$ of that individual:
8.1) calculating a spatial rotation matrix R from the last three elements $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ of $S_i$:
Figure GDA0003267695010000066
8.2) rotating all atomic coordinates in A by the rotation matrix R to obtain a new ATP structure $A_R$;
8.3) translating all coordinates in $A_R$ according to the first three elements $s_{i,1}$, $s_{i,2}$ and $s_{i,3}$ of $S_i$, as follows, to obtain a new ATP structure $A_T$:
Figure GDA0003267695010000071
where
Figure GDA0003267695010000072
are the coordinates of the jth atom of $A_T$, and
Figure GDA0003267695010000073
are respectively the X, Y and Z coordinates of the jth atom of $A_R$, j = 1, 2, ..., 31;
8.4) calculating the distances between the $C_\alpha$ atoms of the h binding residues and all atoms of ATP, and calculating the score $E_i$ as follows:
Figure GDA0003267695010000074
Figure GDA0003267695010000075
where g denotes the type of the current binding residue; $c_{g,j}$ is the probability of contact between a binding residue of type g and the jth atom in ATP, i.e. the value in row g, column j of the contact matrix C; $d_{h,j}$ is the distance between the $C_\alpha$ atom of the current binding residue and the jth atom in ATP; $d_{min} = 0.75 \times (r_h + r_j)$, where $r_h$ and $r_j$ are the van der Waals radii of the $C_\alpha$ atom of the current binding residue and of the jth atom in ATP, respectively;
Figure GDA0003267695010000076
9) according to the differential evolution algorithm, performing the following operations for each individual $S_i$, $i \in \{1, 2, \ldots, NP\}$, in the population P:
9.1) randomly selecting three different individuals $S_a$, $S_b$ and $S_c$ from the current population P, where $a, b, c \in \{1, 2, \ldots, NP\}$ and $a \neq b \neq c \neq i$, and generating the mutant individual $S_{mutant}$ according to the following formulas:
Figure GDA0003267695010000077
$S_{mutant} = S_a + F \cdot (S_b - S_c)$
9.2) generating the crossed individuals $S_{cross1}$ and $S_{cross2}$ according to the following procedure:
Figure GDA0003267695010000078
where $s_{cross1,t}$, $s_{mutant,t}$, $s_{cross2,t}$ and $s_{i,t}$ are the tth elements of $S_{cross1}$, $S_{mutant}$, $S_{cross2}$ and $S_i$, respectively, t = 1, 2, ..., 6; $t_{rand}$ is a random integer between 1 and 6; and rand(0,1) is a random decimal between 0 and 1;
9.3) calculating the scores $E_{cross1}$, $E_{cross2}$ and $E_i$ corresponding to $S_{cross1}$, $S_{cross2}$ and $S_i$ according to the score calculation of step 8);
9.4) selecting the lowest-scoring individual among $S_{cross1}$, $S_{cross2}$ and $S_i$ to replace $S_i$ in the population P;
10) setting G = G + 1; if $G \geq G_{max}$, recording the lowest score $E_{min}$ in the current population P and the corresponding ATP structure information
Figure GDA0003267695010000081
and outputting
Figure GDA0003267695010000082
as the final ATP position information; otherwise returning to step 9).
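The rigid-body placement of steps 8.1) to 8.3) can be sketched in Python as follows. The rotation matrix of step 8.1) is available here only as an equation image, so the composition R = Rz(s6)·Ry(s5)·Rx(s4) used below is an assumed convention rather than the patent's formula; the function names are hypothetical, and the rotation is taken about the coordinate origin.

```python
import math


def rotation_matrix(ax, ay, az):
    """Rotation R = Rz(az) @ Ry(ay) @ Rx(ax); this axis order is an assumption."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(atoms, individual):
    """Rotate every atom by R (step 8.2), then translate it (step 8.3)."""
    tx, ty, tz, ax, ay, az = individual
    rot = rotation_matrix(ax, ay, az)
    posed = []
    for x, y, z in atoms:
        xr = rot[0][0] * x + rot[0][1] * y + rot[0][2] * z
        yr = rot[1][0] * x + rot[1][1] * y + rot[1][2] * z
        zr = rot[2][0] * x + rot[2][1] * y + rot[2][2] * z
        posed.append((xr + tx, yr + ty, zr + tz))
    return posed


# Toy example: a quarter turn about Z followed by a shift of (1, 2, 3).
toy_atoms = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
toy_pose = apply_pose(toy_atoms, (1.0, 2.0, 3.0, 0.0, 0.0, math.pi / 2))
```

Because the transform is rigid, interatomic distances within ATP are unchanged; only the six elements of the individual determine the placement.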
In this embodiment, taking the prediction of the three-dimensional structure of the complex formed by docking protein 1e2q with ATP as an example, a protein ATP docking method based on contact probability assistance comprises the following steps:
1) inputting the structures of the target protein and ATP, which are respectively marked as R and A;
2) predicting all ATP binding residues of the target protein R using the ATPbind server (http://zhanglab.ccmb.med.umich.edu/ATPbind/), the TargetS server (http://www.csbio.sjtu.edu.cn:8080/TargetS/), the TargetSOS server (http://www.csbio.sjtu.edu.cn:8080/TargetSOS/), the TargetNUCs server (http://202.119.84.36:3079/TargetNUCs/) and the TargetTPsite server (http://www.csbio.sjtu.edu.cn:8080/TargetTPsite/), respectively;
3) for each candidate residue, if three or more servers predict it to be a binding residue, accepting it as a binding residue, finally obtaining h protein binding residues, denoted $r_1, r_2, \ldots, r_h$;
4) calculating the mean of the coordinates of the central carbon atoms $C_\alpha$ of all binding residues $r_1, r_2, \ldots, r_h$ to obtain the binding-residue center coordinate $C_R$; calculating the mean of all atomic coordinates in A to obtain the center coordinate $C_A$ of A; and moving A so that $C_A$ coincides with $C_R$;
5) extracting from the PDB database the probability that each type of binding residue is in contact with each ATP atom, as follows:
5.1) for each complex in the PDB database, calculating the average distance $d_{g,j}$ between the $C_\alpha$ atoms of the binding residues of each residue type g and the jth atom in ATP; if
Figure GDA0003267695010000086
then set
Figure GDA0003267695010000083
otherwise set
Figure GDA0003267695010000084
where $g \in \{1, 2, \ldots, 21\}$ indexes the 21 residue types, $j \in \{1, 2, \ldots, 31\}$ indexes the 31 ATP atoms, and
Figure GDA0003267695010000085
indicates whether there is contact between a binding residue of residue type g in the kth complex and the jth atom in ATP;
5.2) calculating, over all complexes, the average value of
Figure GDA0003267695010000091
denoted $c_{g,j}$, to obtain a 21 × 31 contact probability matrix:
Figure GDA0003267695010000092
6) setting parameters: setting the population size NP = 50, the scaling factor $F_0$ = 0.5, the crossover probability CR = 0.5 and the maximum number of iterations $G_{max}$ = 500, and initializing the iteration count G = 1;
7) population initialization: randomly generating an initial population $P = \{S_1, S_2, \ldots, S_i, \ldots, S_{NP}\}$, where $S_i = (s_{i,1}, s_{i,2}, s_{i,3}, s_{i,4}, s_{i,5}, s_{i,6})$ is the ith individual of the population P and $s_{i,1}$, $s_{i,2}$, $s_{i,3}$, $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ are the 6 elements of $S_i$; the value range of $s_{i,1}$, $s_{i,2}$ and $s_{i,3}$ is
Figure GDA0003267695010000093
and the value range of $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ is 0 to 2π;
8) for each individual $S_i$ in the population, docking the protein with ATP as follows and calculating the score $E_i$ of that individual:
8.1) calculating a spatial rotation matrix R from the last three elements $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ of $S_i$:
Figure GDA0003267695010000094
8.2) rotating all atomic coordinates in A by the rotation matrix R to obtain a new ATP structure $A_R$;
8.3) translating all coordinates in $A_R$ according to the first three elements $s_{i,1}$, $s_{i,2}$ and $s_{i,3}$ of $S_i$, as follows, to obtain a new ATP structure $A_T$:
Figure GDA0003267695010000095
where
Figure GDA0003267695010000096
are the coordinates of the jth atom of $A_T$, and
Figure GDA0003267695010000097
are respectively the X, Y and Z coordinates of the jth atom of $A_R$, j = 1, 2, ..., 31;
8.4) calculating the distances between the $C_\alpha$ atoms of the h binding residues and all atoms of ATP, and calculating the score $E_i$ as follows:
Figure GDA0003267695010000098
Figure GDA0003267695010000101
where g denotes the type of the current binding residue; $c_{g,j}$ is the probability of contact between a binding residue of type g and the jth atom in ATP, i.e. the value in row g, column j of the contact matrix C; $d_{h,j}$ is the distance between the $C_\alpha$ atom of the current binding residue and the jth atom in ATP; $d_{min} = 0.75 \times (r_h + r_j)$, where $r_h$ and $r_j$ are the van der Waals radii of the $C_\alpha$ atom of the current binding residue and of the jth atom in ATP, respectively;
Figure GDA0003267695010000102
9) according to the differential evolution algorithm, performing the following operations for each individual $S_i$, $i \in \{1, 2, \ldots, NP\}$, in the population P:
9.1) randomly selecting three different individuals $S_a$, $S_b$ and $S_c$ from the current population P, where $a, b, c \in \{1, 2, \ldots, NP\}$ and $a \neq b \neq c \neq i$, and generating the mutant individual $S_{mutant}$ according to the following formulas:
Figure GDA0003267695010000103
$S_{mutant} = S_a + F \cdot (S_b - S_c)$
9.2) generating the crossed individuals $S_{cross1}$ and $S_{cross2}$ according to the following procedure:
Figure GDA0003267695010000104
where $s_{cross1,t}$, $s_{mutant,t}$, $s_{cross2,t}$ and $s_{i,t}$ are the tth elements of $S_{cross1}$, $S_{mutant}$, $S_{cross2}$ and $S_i$, respectively, t = 1, 2, ..., 6; $t_{rand}$ is a random integer between 1 and 6; and rand(0,1) is a random decimal between 0 and 1;
9.3) calculating the scores $E_{cross1}$, $E_{cross2}$ and $E_i$ corresponding to $S_{cross1}$, $S_{cross2}$ and $S_i$ according to the score calculation of step 8);
9.4) selecting the lowest-scoring individual among $S_{cross1}$, $S_{cross2}$ and $S_i$ to replace $S_i$ in the population P;
10) setting G = G + 1; if $G \geq G_{max}$, recording the lowest score $E_{min}$ in the current population P and the corresponding ATP structure information
Figure GDA0003267695010000105
and outputting
Figure GDA0003267695010000106
as the final ATP position information; otherwise returning to step 9).
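The search loop of steps 6), 7), 9) and 10) can be sketched in Python with the embodiment's parameter values (NP = 50, F_0 = 0.5, CR = 0.5, G_max = 500). This is not the patent's improved algorithm: the formula for F and the two-offspring crossover are given only as equation images, so the sketch substitutes standard DE/rand/1/bin with a constant scaling factor F = F_0 and a single trial vector. The translation bounds, the clamping of trial vectors to the bounds and the toy quadratic score that stands in for the score of step 8) are all assumptions.

```python
import math
import random


def differential_evolution(score, bounds, pop_size=50, f0=0.5, cr=0.5, g_max=500, seed=0):
    """Minimise `score` over box `bounds` with standard DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [score(ind) for ind in pop]
    for _ in range(g_max):
        for i in range(pop_size):
            # Three mutually different individuals, all different from i (step 9.1).
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            mutant = [pop[a][t] + f0 * (pop[b][t] - pop[c][t]) for t in range(dim)]
            # Binomial crossover; index t_rand always comes from the mutant.
            t_rand = rng.randrange(dim)
            trial = [
                mutant[t] if (rng.random() <= cr or t == t_rand) else pop[i][t]
                for t in range(dim)
            ]
            # Assumption: keep the trial vector inside the search box.
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            trial_score = score(trial)
            if trial_score < scores[i]:  # greedy selection, lower score is better
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy stand-in for the docking score: squared distance to a known 6-element pose.
target = (1.0, -2.0, 3.0, 1.0, 2.0, 3.0)


def toy_score(individual):
    return sum((v - t) ** 2 for v, t in zip(individual, target))


# Hypothetical translation range of +/-10; angles range over 0 to 2*pi as in step 7.
search_bounds = [(-10.0, 10.0)] * 3 + [(0.0, 2.0 * math.pi)] * 3
best_pose, best_score = differential_evolution(toy_score, search_bounds)
```

In an actual docking run, `toy_score` would be replaced by a function that places ATP according to the individual and evaluates the contact-probability score of step 8.4).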
Taking protein 1e2q and ATP as an example, the root mean square deviation between the three-dimensional structure of the 1e2q-ATP complex obtained by the above method and the complex structure determined by wet-lab experiment is
Figure GDA0003267695010000111
The predicted protein ATP complex structure is shown in FIG. 2.
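For reference, a root mean square deviation between a predicted and an experimental ATP placement can be computed as in the minimal Python sketch below. The record does not state which atoms enter the calculation or whether any superposition is applied, so the sketch assumes both structures share the protein's coordinate frame and list the same atoms in the same order; the function name is hypothetical.

```python
import math


def rmsd(predicted, reference):
    """Root mean square deviation between two equally ordered coordinate lists."""
    if len(predicted) != len(reference):
        raise ValueError("both structures must contain the same number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Toy example: every atom of the prediction is displaced by 2 along Z.
toy_predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
toy_rmsd = rmsd(toy_predicted, toy_reference)
```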
The above description is the prediction result of the protein 1e2q and ATP as examples in the present invention, and is not intended to limit the scope of the present invention, and various modifications and improvements can be made without departing from the scope of the present invention.

Claims (1)

1. A protein ATP docking method based on contact probability assistance, characterized in that the docking method comprises the following steps:
1) inputting the structures of the target protein and ATP, which are respectively marked as R and A;
2) predicting all ATP binding residues of the target protein R by using an ATPbind server, a TargetS server, a TargetSOS server, a TargetNUCs server and a TargetTPsite server respectively;
3) for each candidate residue, if three or more servers predict it to be a binding residue, accepting it as a binding residue, finally obtaining h protein binding residues, denoted $r_1, r_2, \ldots, r_h$;
4) calculating the mean of the coordinates of the central carbon atoms $C_\alpha$ of all binding residues $r_1, r_2, \ldots, r_h$ to obtain the binding-residue center coordinate $C_R$; calculating the mean of all atomic coordinates in A to obtain the center coordinate $C_A$ of A; and moving A so that $C_A$ coincides with $C_R$;
5) extracting from the PDB database the probability that each type of binding residue is in contact with each ATP atom, as follows:
5.1) for each complex in the PDB database, calculating the average distance $d_{g,j}$ between the $C_\alpha$ atoms of the binding residues of each residue type g and the jth atom in ATP; if
Figure FDA0003267690000000011
then set
Figure FDA0003267690000000012
otherwise set
Figure FDA0003267690000000013
where $g \in \{1, 2, \ldots, 21\}$ indexes the 21 residue types, $j \in \{1, 2, \ldots, 31\}$ indexes the 31 ATP atoms, and
Figure FDA0003267690000000014
indicates whether there is contact between a binding residue of residue type g in the kth complex and the jth atom in ATP;
5.2) calculating, over all complexes, the average value of
Figure FDA0003267690000000015
denoted $c_{g,j}$, to obtain a 21 × 31 contact probability matrix:
Figure FDA0003267690000000016
6) setting parameters: setting the population size NP, the scaling factor $F_0$, the crossover probability CR and the maximum number of iterations $G_{max}$, and initializing the iteration count G = 0;
7) population initialization: randomly generating an initial population $P = \{S_1, S_2, \ldots, S_i, \ldots, S_{NP}\}$, where $S_i = (s_{i,1}, s_{i,2}, s_{i,3}, s_{i,4}, s_{i,5}, s_{i,6})$ is the ith individual of the population P and $s_{i,1}$, $s_{i,2}$, $s_{i,3}$, $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ are the 6 elements of $S_i$; the value range of $s_{i,1}$, $s_{i,2}$ and $s_{i,3}$ is
Figure FDA0003267690000000021
and the value range of $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ is 0 to 2π;
8) for each individual in the population SiThe protein was docked with ATP according to the following manner and the score E for that individual was calculatedi
8.1) according to SiThe last three elements s ini,4、si,5And si,6Calculating a spatial rotation matrix R:
Figure FDA0003267690000000022
8.2) rotating all the atomic coordinates in A according to a rotation matrix R to obtain a new ATP structure AR
8.3) according to SiThe first three elements s ini,1、si,2、si,3A isRAll coordinates in (a) perform a translation process as follows, calculating a new ATP structure AT
Figure FDA0003267690000000023
Wherein
Figure FDA0003267690000000024
Is ATThe coordinates of the jth atom of (c),
Figure FDA0003267690000000025
are respectively ARX, Y, Z coordinates of the jth atom in (j) 1, 2.·, 31;
8.4) calculate the distances between the C_α atoms of the h binding residues and all atoms of ATP, and calculate the score E_i as follows:
Figure FDA0003267690000000026
Figure FDA0003267690000000027
where g represents the type of the current binding residue; c_{g,j} is the probability that a contact exists between a binding residue of type g and the j-th atom in ATP, corresponding to the value in row g, column j of the contact matrix C; d_{h,j} is the distance between the C_α atom of the current binding residue and the j-th atom in ATP; d_min = 0.75 × (r_h + r_j), where r_h and r_j are the van der Waals radii of the C_α atom of the current binding residue and of the j-th atom in ATP, respectively;
Figure FDA0003267690000000028
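The geometric part of step 8) can be sketched as follows. Several points are assumptions because the corresponding formulas are images that are not reproduced in this text: the rotation convention (here R = Rz·Ry·Rx, applied about the origin), the sign of the translation (here addition), and all function names. The score E_i itself is deliberately not implemented, since its formula is not recoverable from the text; only the inputs it is described as using, d_{h,j} and d_min, are computed.

```python
import math

def rotation_matrix(ax, ay, az):
    """3x3 rotation matrix. Assumed convention: R = Rz(az) * Ry(ay) * Rx(ax)."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def transform_pose(atoms, individual):
    """Apply steps 8.2 and 8.3 to a list of (x, y, z) ATP atom coordinates."""
    tx, ty, tz, ax, ay, az = individual
    r = rotation_matrix(ax, ay, az)
    posed = []
    for x, y, z in atoms:
        # Step 8.2: rotate the atom about the origin.
        rx = r[0][0] * x + r[0][1] * y + r[0][2] * z
        ry = r[1][0] * x + r[1][1] * y + r[1][2] * z
        rz = r[2][0] * x + r[2][1] * y + r[2][2] * z
        # Step 8.3: translate by (s1, s2, s3); addition is an assumption.
        posed.append((rx + tx, ry + ty, rz + tz))
    return posed

def contact_distances(ca_atoms, atp_atoms):
    """d[h][j]: distance from the C-alpha atom of binding residue h to ATP atom j."""
    return [[math.dist(ca, atom) for atom in atp_atoms] for ca in ca_atoms]

def d_min(r_h, r_j):
    """Minimum allowed distance from step 8.4: 0.75 * (r_h + r_j)."""
    return 0.75 * (r_h + r_j)
```

For example, `d_min(1.7, 1.55)` gives the threshold for a carbon and a nitrogen atom if radii of 1.7 and 1.55 are used; those radius values are illustrative and are not taken from the source.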
9) according to the differential evolution algorithm, perform the following operations for each individual S_i, i ∈ {1, 2, ..., NP}, in the population P:
9.1) randomly select three different individuals S_a, S_b and S_c from the current population P, where a, b, c ∈ {1, 2, ..., NP} and a ≠ b ≠ c ≠ i, and generate the mutant individual S_mutant according to the following formulas:
Figure FDA0003267690000000031
S_mutant = S_a + F·(S_b − S_c)
9.2) generate the crossed individuals S_cross1 and S_cross2 according to the following procedure:
Figure FDA0003267690000000032
where s_{cross1,t}, s_{mutant,t}, s_{cross2,t} and s_{i,t} are the t-th elements of S_cross1, S_mutant, S_cross2 and S_i respectively, t = 1, 2, ..., 6; t_rand is a random integer between 1 and 6, and rand(0,1) is a random decimal between 0 and 1;
9.3) calculate the scores E_cross1, E_cross2 and E_i corresponding to S_cross1, S_cross2 and S_i according to the score calculation of step 8);
9.4) select the lowest-scoring individual among S_cross1, S_cross2 and S_i to replace S_i in the population P;
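The operations of step 9) can be sketched as follows, with two caveats. The rule that derives F from F_0 and the rule that produces the two crossed individuals are both formula images not reproduced in this text, so here F is simply passed in as `f`, and the crossover shown is the standard binomial crossover of differential evolution producing a single trial, not the two-trial rule of the source. All function names and the toy score are hypothetical.

```python
import random

def mutate(population, i, f, rng):
    """Step 9.1: S_mutant = S_a + F * (S_b - S_c), with a, b, c distinct and != i."""
    candidates = [k for k in range(len(population)) if k != i]
    a, b, c = rng.sample(candidates, 3)
    sa, sb, sc = population[a], population[b], population[c]
    return [xa + f * (xb - xc) for xa, xb, xc in zip(sa, sb, sc)]

def binomial_crossover(target, mutant, cr, rng):
    """Standard DE binomial crossover (a stand-in for the source's rule)."""
    t_rand = rng.randrange(len(target))  # this position always comes from the mutant
    return [
        m if (rng.random() < cr or t == t_rand) else s
        for t, (s, m) in enumerate(zip(target, mutant))
    ]

def select(candidates, score):
    """Step 9.4: return the candidate with the lowest score."""
    return min(candidates, key=score)

# Toy usage with a placeholder score (sum of squares), not the docking score.
rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(10)]
toy_score = lambda s: sum(x * x for x in s)
mutant = mutate(population, 0, 0.5, rng)
trial = binomial_crossover(population[0], mutant, 0.9, rng)
population[0] = select([trial, population[0]], toy_score)
```

Because `select` accepts any list of candidates, a second crossed individual could be added to the list once its generation rule is known.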
10) G = G + 1; if G ≥ G_max, record the lowest score E_min in the current population P and the corresponding ATP structure information
Figure FDA0003267690000000033
and output
Figure FDA0003267690000000034
as the final ATP position information; otherwise return to step 9).
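The iteration control of steps 9) and 10) can be sketched as a loop around two caller-supplied functions. The names `run_de`, `evolve_one` and `score` are hypothetical: `evolve_one(population, i)` stands for the whole of step 9) applied to individual i and returns its replacement, and `score` stands for the score calculation of step 8).

```python
def run_de(population, evolve_one, score, g_max):
    """Repeat step 9) until G >= G_max, then return (best individual, E_min)."""
    population = [list(ind) for ind in population]  # work on a copy
    g = 0
    while True:
        # Step 9): update every individual in place, in index order.
        for i in range(len(population)):
            population[i] = evolve_one(population, i)
        # Step 10): increment G and test the stopping condition.
        g += 1
        if g >= g_max:
            break
    best = min(population, key=score)
    return best, score(best)
```

As in the claimed flow, the stopping test comes after step 9), so at least one generation is always executed.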
CN201910805001.6A 2019-08-29 2019-08-29 Protein ATP docking method based on contact probability assistance Active CN110689929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910805001.6A CN110689929B (en) 2019-08-29 2019-08-29 Protein ATP docking method based on contact probability assistance

Publications (2)

Publication Number Publication Date
CN110689929A (en) 2020-01-14
CN110689929B (en) 2021-12-17

Family

ID=69108516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910805001.6A Active CN110689929B (en) 2019-08-29 2019-08-29 Protein ATP docking method based on contact probability assistance

Country Status (1)

Country Link
CN (1) CN110689929B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109360596A (en) * 2018-08-30 2019-02-19 浙江工业大学 A kind of protein conformation space optimization method based on differential evolution local dip
CN109461470A (en) * 2018-08-29 2019-03-12 浙江工业大学 A kind of protein structure prediction energy function weight optimization method
CN109524058A (en) * 2018-11-07 2019-03-26 浙江工业大学 A kind of protein dimer Structure Prediction Methods based on differential evolution
WO2019080829A1 (en) * 2017-10-23 2019-05-02 Shanghaitech University Compositions and methods for detecting molecule-molecule interactions

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3022863A1 (en) * 2016-05-02 2017-11-09 Encodia, Inc. Macromolecule analysis employing nucleic acid encoding of molecular recognition events

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
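Title
"Protein ligand-specific binding residue predictions by an ensemble classifier"; Hu X; BMC Bioinformatics; 2016-12-31; pp. 1-12 *
"A survey of computational methods for identifying protein-ligand binding residues" (识别蛋白质配体绑定残基的生物计算方法综述); 於东军; Journal of Data Acquisition and Processing (《数据采集与处理》); 2018-03-31; pp. 195-206 *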

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant