CN110689929B - Protein ATP docking method based on contact probability assistance

Info

Publication number: CN110689929B
Application number: CN201910805001.6A
Authority: CN
Inventors: 张贵军; 饶亮; 刘俊; 赵凯龙; 胡俊; 周晓根
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2021-12-17
Anticipated expiration: 2039-08-29
Also published as: CN110689929A

Abstract

A protein ATP docking method based on contact probability assistance. First, the protein-ATP binding residues are predicted with five binding-residue prediction servers, including ATPbind, and a voting scheme keeps the residues predicted most often as binding residues, which improves the accuracy of the binding-residue set. Second, a contact probability matrix between each type of binding residue and each ATP atom is extracted from the PDB database and used as an energy function to score the generated conformations, which improves docking accuracy. Finally, an improved differential evolution algorithm searches for the optimal individual, which improves computational efficiency. The invention thus provides a contact-probability-assisted protein ATP docking method with low computational cost and high prediction accuracy.

Description

Protein ATP docking method based on contact probability assistance

Technical Field

The invention relates to the fields of bioinformatics, intelligent optimization and computer application, in particular to a protein ATP docking method based on contact probability assistance.

Background

As proteomics research advances, it is increasingly found that proteins act in organisms by binding small-molecule ligands. Protein-ligand recognition processes, including substrate-enzyme, antigen-antibody and hormone-receptor recognition, underlie the molecular mechanisms and regulation of many biological functions. Recognition of and interaction with ligands is an important way for proteins to exert their biological functions and plays a key role in life activities such as gene regulation, signal transduction and immune response. ATP is one such small-molecule ligand: it is an energy-carrying molecule widely distributed in the human body. ATP is hydrolyzed to ADP by ATP hydrolase with the release of energy, and ADP can be converted back to ATP by ATP synthase; both processes require binding to an enzyme protein. Studying the molecular recognition mechanism between proteins and their ligands, building recognition models, and studying the relationship between molecular recognition and molecular selectivity are therefore significant for revealing the nature of biological processes and can also guide the design and synthesis of compounds with specific recognition functions and bioactivity.

At present, the structure of a protein-ligand complex is mainly determined by wet-lab methods such as X-ray crystallography and nuclear magnetic resonance, but these methods are difficult, costly and time-consuming. In recent years, with the growth of computing power and the rapid development and wide application of molecular simulation theory, methods such as homology modeling, molecular docking, molecular dynamics simulation, binding free energy calculation and quantum mechanical calculation have become important means of studying the interaction mechanism and dynamic process of proteins and ligands. Molecular simulation makes it possible to study life phenomena and their underlying rules at the molecular or even atomic level and can provide theoretical guidance for experiments. As its theory and technology mature, molecular simulation is increasingly used in research on protein structure and function, protein-ligand recognition, and drug design.

Computer molecular simulation relies primarily on searching for the lowest-energy complex structure using intelligent algorithms and energy functions. However, no energy function can currently judge the energy of a complex perfectly. In addition, inaccurate prediction of protein binding residues introduces errors into the energy function, so that the predicted complex structure is inaccurate, and some intelligent algorithms suffer from long search times or inaccurate search results.

Therefore, the existing protein and ligand molecule docking methods have defects in prediction accuracy and computational cost, and need to be improved.

Disclosure of Invention

In order to overcome the defects of the conventional protein and ligand ATP docking method in the aspects of prediction accuracy and calculation cost, the invention provides a contact probability-assisted protein ATP docking method which is low in calculation cost and high in prediction accuracy.

The technical scheme adopted by the invention for solving the technical problems is as follows:

a contact probability-assisted protein ATP docking method, the method comprising the steps of:

1) inputting the structures of the target protein and ATP, which are respectively marked as R and A;

2) predicting all ATP binding residues of the target protein R using an ATPbind server (http://zhanglab.ccmb.med.umich.edu/ATPbind/), a TargetS server (http://www.csbio.sjtu.edu.cn:8080/TargetS/), a TargetSOS server (http://www.csbio.sjtu.edu.cn:8080/TargetSOS/), a TargetNUCs server (http://202.119.84.36:3079/TargetNUCs/), and a TargetTPsite server (http://www.csbio.sjtu.edu.cn:8080/TargetTPsite/), respectively;

3) for each candidate residue, if three or more of the five servers predict it to be a binding residue, it is taken as a binding residue; this finally yields $h$ protein binding residues, denoted $r_1, r_2, \ldots, r_h$;
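
The voting rule of step 3) can be sketched in a few lines of Python. This is a minimal illustration rather than the inventors' code: the function name `vote_binding_residues`, the representation of each server's output as a set of residue numbers, and the example predictions are all hypothetical.

```python
from collections import Counter

def vote_binding_residues(predictions, min_votes=3):
    """Keep residues predicted as binding by at least `min_votes` servers.

    predictions: one collection of residue numbers per server (hypothetical format).
    """
    votes = Counter()
    for residues in predictions:
        votes.update(set(residues))  # each server votes at most once per residue
    return sorted(r for r, n in votes.items() if n >= min_votes)

# Made-up output of five servers for one target protein
server_predictions = [
    {12, 13, 14, 40},
    {12, 13, 41},
    {12, 14, 40},
    {13, 14, 90},
    {12, 40, 91},
]
binding_residues = vote_binding_residues(server_predictions)
print(binding_residues)
```

With five servers and `min_votes=3`, this is a simple majority vote; residues reported by only one or two servers are discarded.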

4) calculating the mean of the coordinates of the central carbon atoms $C_\alpha$ of all binding residues $r_1, r_2, \ldots, r_h$ to obtain the binding-residue center coordinate $C_R$; calculating the mean of all atomic coordinates in A to obtain the center coordinate $C_A$ of A; and moving A so that $C_A$ coincides with $C_R$;
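
As a rough sketch of step 4), the initial placement amounts to computing two centroids and shifting the ligand by their difference. The helper names `centroid` and `center_ligand_on_site` and the toy coordinates are assumptions made for illustration only.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def center_ligand_on_site(ca_coords, atp_coords):
    """Translate the ATP atoms so that their centroid C_A lands on C_R."""
    c_r = centroid(ca_coords)   # center of the binding-residue C-alpha atoms
    c_a = centroid(atp_coords)  # center of the ATP atoms
    shift = tuple(c_r[k] - c_a[k] for k in range(3))
    return [tuple(p[k] + shift[k] for k in range(3)) for p in atp_coords]

# Toy coordinates (not taken from any real structure)
ca_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 6.0, 0.0)]
atp_coords = [(10.0, 10.0, 10.0), (12.0, 10.0, 10.0)]
placed = center_ligand_on_site(ca_coords, atp_coords)
print(centroid(placed))
```

Only the position of ATP changes here; its internal geometry and orientation are left untouched.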

5) the probability of each type of binding residue coming into contact with each ATP atom is extracted from the PDB database as follows:

5.1) for each complex $k$ in the PDB database, calculate the average distance $d_{g,j}$ between the $C_\alpha$ atoms of the binding residues of residue type $g$ and the $j$-th atom in ATP; if $d_{g,j}$ is smaller than the contact distance threshold, set $c^k_{g,j} = 1$, otherwise set $c^k_{g,j} = 0$, where $g \in \{1, 2, \ldots, 21\}$ indexes the 21 residue types, $j \in \{1, 2, \ldots, 31\}$ indexes the 31 ATP atoms, and $c^k_{g,j}$ indicates whether there is contact between the binding residues of residue type $g$ in the $k$-th complex and the $j$-th atom in ATP;

5.2) calculate the average value of $c^k_{g,j}$ over all complexes, denoted $c_{g,j}$, to obtain the $21 \times 31$ contact probability matrix:

$$C = \left(c_{g,j}\right)_{21 \times 31}$$
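
The following sketch illustrates how steps 5.1) and 5.2) could be organized. Several points are assumptions, not statements of the patented procedure: the cutoff value `CONTACT_CUTOFF` is a placeholder because the threshold is not available in this text, residue types are 0-based in the code, a complex that lacks residue type g is counted as "no contact", and the input format (a dict of C-alpha coordinates per residue type plus a list of ATP atom coordinates) is hypothetical.

```python
import math

N_TYPES = 21          # residue types, as in step 5.1)
CONTACT_CUTOFF = 5.0  # ASSUMED contact threshold in angstroms (placeholder value)

def contact_indicator(ca_by_type, atp_coords, cutoff=CONTACT_CUTOFF):
    """Return the 0/1 contact table of a single complex.

    ca_by_type maps a residue-type index g (0-based here) to the C-alpha
    coordinates of the binding residues of that type.
    """
    table = [[0] * len(atp_coords) for _ in range(N_TYPES)]
    for g, ca_list in ca_by_type.items():
        if not ca_list:
            continue
        for j, atom in enumerate(atp_coords):
            # average distance between type-g C-alpha atoms and ATP atom j
            d_gj = sum(math.dist(ca, atom) for ca in ca_list) / len(ca_list)
            if d_gj < cutoff:
                table[g][j] = 1
    return table

def contact_probability_matrix(complexes, cutoff=CONTACT_CUTOFF):
    """Average the per-complex 0/1 tables over all complexes."""
    n_atoms = len(complexes[0][1])
    totals = [[0] * n_atoms for _ in range(N_TYPES)]
    for ca_by_type, atp_coords in complexes:
        table = contact_indicator(ca_by_type, atp_coords, cutoff)
        for g in range(N_TYPES):
            for j in range(n_atoms):
                totals[g][j] += table[g][j]
    return [[v / len(complexes) for v in row] for row in totals]

# Toy data: two complexes, each with a 2-atom "ligand" instead of 31 ATP atoms
toy_complexes = [
    ({0: [(0.0, 0.0, 0.0)]}, [(1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]),
    ({0: [(0.0, 0.0, 0.0)]}, [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]),
]
C = contact_probability_matrix(toy_complexes)
print(C[0])
```

In the toy data, the first ligand atom is within the cutoff in both complexes and the second in only one of them, so the first row of the matrix holds the frequencies 1.0 and 0.5.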

6) setting parameters: set the population size $NP$, the scaling factor $F$, the crossover probability $CR$ and the maximum number of iterations $G_{max}$, and initialize the iteration counter $G$ to 0;

7) population initialization: randomly generate an initial population $P = \{S_1, S_2, \ldots, S_i, \ldots, S_{NP}\}$, where $S_i = (s_{i,1}, s_{i,2}, s_{i,3}, s_{i,4}, s_{i,5}, s_{i,6})$ is the $i$-th individual of the population $P$ and $s_{i,1}, \ldots, s_{i,6}$ are its 6 elements; $s_{i,1}$, $s_{i,2}$ and $s_{i,3}$ are the translation components, each with a prescribed value range, and $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ are the rotation angles, whose value range is 0 to $2\pi$;
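
A minimal sketch of the initialization in step 7) is given below. The translation bound `t_max` is a hypothetical parameter introduced only for illustration, since the value range of the first three elements is not available in this text; the 0 to 2π range of the last three elements follows the step as written.

```python
import math
import random

def init_population(np_size, t_max, rng):
    """Create NP individuals (s1, s2, s3, s4, s5, s6).

    s1..s3: translation components, drawn from [-t_max, t_max] (ASSUMED range).
    s4..s6: rotation angles, drawn from [0, 2*pi].
    """
    population = []
    for _ in range(np_size):
        translation = [rng.uniform(-t_max, t_max) for _ in range(3)]
        rotation = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(3)]
        population.append(translation + rotation)
    return population

rng = random.Random(42)  # fixed seed so the run is repeatable
population = init_population(np_size=50, t_max=5.0, rng=rng)
print(len(population), len(population[0]))
```

Each individual encodes one rigid-body pose of ATP: three numbers for where it sits and three for how it is oriented.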

8) for each individual $S_i$ in the population, dock the protein with ATP in the following manner and calculate the score $E_i$ of that individual:

8.1) calculate the spatial rotation matrix $R$ from the last three elements $s_{i,4}$, $s_{i,5}$ and $s_{i,6}$ of $S_i$;

8.2) rotate all atomic coordinates in A according to the rotation matrix $R$ to obtain a new ATP structure $A^R$;

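8.3) translate all coordinates in $A^R$ according to the first three elements $s_{i,1}$, $s_{i,2}$ and $s_{i,3}$ of $S_i$ to obtain a new ATP structure $A^T$:

$$\left(x^T_j,\; y^T_j,\; z^T_j\right) = \left(x^R_j + s_{i,1},\; y^R_j + s_{i,2},\; z^R_j + s_{i,3}\right)$$

where $(x^T_j, y^T_j, z^T_j)$ are the coordinates of the $j$-th atom of $A^T$, and $x^R_j$, $y^R_j$ and $z^R_j$ are respectively the X, Y and Z coordinates of the $j$-th atom in $A^R$, $j = 1, 2, \ldots, 31$;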
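
Steps 8.1) to 8.3) describe a rigid-body transform: rotate, then translate. The sketch below is one possible reading. The Euler-angle convention (R = Rz(γ)·Ry(β)·Rx(α)), the rotation about the coordinate origin, and the function names are assumptions, because the rotation matrix formula is not available in this text.

```python
import math

def rotation_matrix(alpha, beta, gamma):
    """Rotation matrix R = Rz(gamma) * Ry(beta) * Rx(alpha) (ASSUMED convention)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cb * cg, sa * sb * cg - ca * sg, ca * sb * cg + sa * sg],
        [cb * sg, sa * sb * sg + ca * cg, ca * sb * sg - sa * cg],
        [-sb, sa * cb, ca * cb],
    ]

def apply_individual(individual, atp_coords):
    """Rotate every ATP atom by R, then add the translation (s1, s2, s3)."""
    tx, ty, tz, alpha, beta, gamma = individual
    rot = rotation_matrix(alpha, beta, gamma)
    moved = []
    for x, y, z in atp_coords:
        xr = rot[0][0] * x + rot[0][1] * y + rot[0][2] * z
        yr = rot[1][0] * x + rot[1][1] * y + rot[1][2] * z
        zr = rot[2][0] * x + rot[2][1] * y + rot[2][2] * z
        moved.append((xr + tx, yr + ty, zr + tz))
    return moved

# A 90-degree turn about the z axis followed by a shift of (1, 2, 3)
moved = apply_individual([1.0, 2.0, 3.0, 0.0, 0.0, math.pi / 2],
                         [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
print(moved)
```

Because the transform is rigid, distances between ATP atoms are preserved; only the pose of the molecule changes. Rotating about the ligand centroid instead of the origin would be an equally plausible variant.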

8.4) calculate the distances between the $C_\alpha$ atoms of the $h$ binding residues and all atoms of ATP, and calculate the score $E_i$ from them, where $g$ denotes the residue type of the current binding residue; $c_{g,j}$ is the probability that there is contact between a binding residue of type $g$ and the $j$-th atom in ATP, corresponding to the value in the $g$-th row and $j$-th column of the contact matrix $C$; $d_{h,j}$ is the distance between the $C_\alpha$ atom of the current binding residue and the $j$-th atom in ATP; and $d_{min} = 0.75 \times (r^h + r^j)$, where $r^h$ and $r^j$ are respectively the van der Waals radii of the $C_\alpha$ atom of the current binding residue and of the $j$-th atom in ATP;
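
The quantities that enter the score in step 8.4) can be computed as sketched below. The score formula itself is not available in this text, so the sketch stops at the distance table $d_{h,j}$ and the minimum distance $d_{min}$. The radius table uses the commonly quoted Bondi van der Waals radii as an assumption; the patent may use different values.

```python
import math

# ASSUMED van der Waals radii in angstroms (Bondi values)
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "P": 1.80}

def d_min(element_h, element_j):
    """d_min = 0.75 * (r_h + r_j) for a C-alpha atom and an ATP atom."""
    return 0.75 * (VDW_RADIUS[element_h] + VDW_RADIUS[element_j])

def distance_table(ca_coords, atp_coords):
    """d[h][j]: distance between C-alpha atom h and ATP atom j."""
    return [[math.dist(ca, atom) for atom in atp_coords] for ca in ca_coords]

# Toy pose: one C-alpha atom at the origin, an O atom and a P atom of the ligand
ca_coords = [(0.0, 0.0, 0.0)]
atp_atoms = [("O", (2.0, 0.0, 0.0)), ("P", (4.0, 0.0, 0.0))]
d = distance_table(ca_coords, [xyz for _, xyz in atp_atoms])
too_close = [j for j, (elem, _) in enumerate(atp_atoms) if d[0][j] < d_min("C", elem)]
print(d, too_close)
```

Here the oxygen atom sits at 2.0 Å from the C-alpha atom, below $d_{min}$ = 2.415 Å for a C/O pair, so it is the only atom listed in `too_close`.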

9) according to the differential evolution algorithm, perform the following for each individual $S_i$, $i \in \{1, 2, \ldots, NP\}$, in the population $P$:

9.1) randomly select three different individuals $S_a$, $S_b$ and $S_c$ from the current population $P$, where $a, b, c \in \{1, 2, \ldots, NP\}$ and $a \neq b \neq c \neq i$, and generate the mutant individual $S_{mutant}$ according to the following formula:

$$S_{mutant} = S_a + F \cdot (S_b - S_c)$$

9.2) generate the crossed individuals $S_{cross1}$ and $S_{cross2}$ from $S_{mutant}$ and $S_i$, where $s_{cross1,t}$, $s_{mutant,t}$, $s_{cross2,t}$ and $s_{i,t}$ are the $t$-th elements of $S_{cross1}$, $S_{mutant}$, $S_{cross2}$ and $S_i$ respectively, $t = 1, 2, \ldots, 6$; $t_{rand}$ is a random integer between 1 and 6, and $rand(0,1)$ is a random decimal between 0 and 1;

9.3) calculate the scores $E_{cross1}$, $E_{cross2}$ and $E_i$ corresponding to $S_{cross1}$, $S_{cross2}$ and $S_i$ according to the score calculation of step 8);

9.4) select the lowest-scoring individual among $S_{cross1}$, $S_{cross2}$ and $S_i$ to replace $S_i$ in the population $P$;

10) set $G = G + 1$; if $G \geq G_{max}$, record the lowest score $E_{min}$ in the current population $P$ and the corresponding ATP structure information, and output that structure as the final ATP position information; otherwise return to step 9).
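
Steps 9) and 10) can be sketched as the loop below. The mutation follows the formula $S_{mutant} = S_a + F \cdot (S_b - S_c)$ of step 9.1) and the selection keeps the lowest-scoring of the three candidates as in step 9.4). The crossover rule is an assumption, since its formula is not available in this text: `cross1` uses standard binomial crossover and `cross2` takes the complementary elements. The score function `toy_score` is a stand-in for the contact-probability score of step 8), and all function names are hypothetical.

```python
import random

def de_generation(pop, scores, score_fn, F, CR, rng):
    """One pass of step 9) over the whole population (updates in place)."""
    n, dim = len(pop), len(pop[0])
    for i in range(n):
        # 9.1) three distinct individuals, all different from i
        a, b, c = rng.sample([k for k in range(n) if k != i], 3)
        mutant = [pop[a][t] + F * (pop[b][t] - pop[c][t]) for t in range(dim)]
        # 9.2) ASSUMED crossover: binomial for cross1, complement for cross2
        t_rand = rng.randrange(dim)
        cross1, cross2 = [], []
        for t in range(dim):
            use_mutant = rng.random() <= CR or t == t_rand
            cross1.append(mutant[t] if use_mutant else pop[i][t])
            cross2.append(pop[i][t] if use_mutant else mutant[t])
        # 9.3) and 9.4) keep the lowest-scoring of cross1, cross2 and S_i
        for candidate in (cross1, cross2):
            e = score_fn(candidate)
            if e < scores[i]:
                pop[i], scores[i] = candidate, e

def run_de(pop, score_fn, F=0.5, CR=0.5, g_max=500, seed=0):
    """Iterate until G >= G_max and return the best individual and its score."""
    rng = random.Random(seed)
    scores = [score_fn(s) for s in pop]
    g = 0
    while g < g_max:
        de_generation(pop, scores, score_fn, F, CR, rng)
        g += 1  # step 10)
    best = min(range(len(pop)), key=lambda i: scores[i])
    return pop[best], scores[best]

def toy_score(individual):
    """Placeholder objective (sum of squares), NOT the docking score."""
    return sum(v * v for v in individual)

init_rng = random.Random(1)
population = [[init_rng.uniform(-5.0, 5.0) for _ in range(6)] for _ in range(20)]
initial_best = min(toy_score(s) for s in population)
best_individual, best_score = run_de(population, toy_score, g_max=50)
print(initial_best, best_score)
```

Because an individual is only replaced by a candidate with a strictly lower score, the best score in the population can never get worse from one generation to the next. This sketch does not clip mutated values back into the ranges of step 7); a practical implementation would probably need some form of bound handling.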

The technical conception of the invention is as follows. First, the protein-ATP binding residues are predicted with five binding-residue prediction servers, including ATPbind, and a voting scheme keeps the residues predicted most often as binding residues, which improves the accuracy of the binding-residue set. Second, a contact probability matrix between each type of binding residue and each ATP atom is extracted from the PDB database and used as an energy function to score the generated conformations, which improves docking accuracy. Finally, an improved differential evolution algorithm searches for the optimal individual, which improves computational efficiency. The invention thus provides a contact-probability-assisted protein ATP docking method with low computational cost and high prediction accuracy.

The beneficial effects of the invention are as follows: first, several binding-residue prediction servers are used to predict the protein-ATP binding residues, which improves the reliability of the binding residues; second, the extracted contact probability matrix between binding residues and ATP atoms is used to assist docking, which improves the accuracy of protein ATP docking; third, an improved differential evolution algorithm is used to search the spatial position of ATP, which improves the search efficiency of the algorithm.

Drawings

FIG. 1 is a schematic diagram of a protein ATP docking method based on contact probability assistance.

FIG. 2 is a diagram of the structure of the complex obtained by docking protein 1e2q with ATP using a protein ATP docking method based on contact probability assistance.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIG. 1 and FIG. 2, a protein ATP docking method based on contact probability assistance comprises steps 1) to 10) as set out in the Disclosure of Invention above.

In this embodiment, the prediction of the three-dimensional structure of the complex formed by docking protein 1e2q with ATP is taken as an example. The method comprises steps 1) to 10) as set out above, with the parameters of step 6) set as follows: population size $NP = 50$, scaling factor $F = 0.5$, crossover probability $CR = 0.5$, maximum number of iterations $G_{max} = 500$, and the iteration counter $G$ initialized to 1.

Taking protein 1e2q and ATP as an example, the three-dimensional structure of the protein 1e2q-ATP complex obtained by the above method was compared, in terms of root mean square deviation, with the complex structure determined by wet-lab experiment. The predicted protein ATP complex structure is shown in FIG. 2.
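
For reference, the root mean square deviation between a predicted and an experimental ligand pose can be computed as sketched below. The sketch assumes that both poses are expressed in the same protein coordinate frame and list their atoms in the same order, so no superposition is performed; the function name and the toy coordinates are hypothetical and do not reproduce any result of this embodiment.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must contain the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy example: the "predicted" pose is the reference shifted by 1 angstrom along x
reference = [(0.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
predicted = [(1.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
print(rmsd(predicted, reference))
```

A uniform shift of 1 Å moves every atom by the same amount, so the RMSD of this toy example equals the shift itself.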

The above description presents the prediction result for protein 1e2q and ATP as an example of the invention and is not intended to limit its scope; various modifications and improvements can be made without departing from the scope of the invention.

Claims

1. A protein ATP docking method based on contact probability assistance, characterized in that the docking method comprises steps 1) to 10) as set out in the Disclosure of Invention above, wherein step 2) predicts all ATP binding residues of the target protein R by using an ATPbind server, a TargetS server, a TargetSOS server, a TargetNUCs server and a TargetTPsite server respectively.