CN108614957B - Multi-stage protein structure prediction method based on Shannon entropy - Google Patents
Multi-stage protein structure prediction method based on Shannon entropy
- Publication number
- CN108614957B (application CN201810238703.6A)
- Authority
- CN
- China
- Prior art keywords
- state
- stage
- population
- current
- shannon entropy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
A multi-stage protein structure prediction method based on Shannon entropy. First, the Rosetta Abinitio protocol is used to explore the search space, and potential native-state regions are located by clustering the resulting background points. Next, prediction is carried out stage by stage within the framework of a population-based evolutionary algorithm: the relation between each generation of the population and the potential native-state regions is analyzed, and individuals are classified to indicate the evolutionary state of the current population. The state transition matrix between two successive generations is then calculated, and Shannon entropy is used to measure how the state of the population changes. Finally, the stage is switched according to the accumulated number of times the Shannon entropy falls below a given threshold, and the last generation of the population is taken as the final prediction result. Because the method switches stages dynamically according to the Shannon entropy, the prediction accuracy and robustness of the algorithm are significantly improved.
Description
Technical Field
The invention relates to the fields of bioinformatics, intelligent optimization and computer application, in particular to a multi-stage protein structure prediction method based on Shannon entropy.
Background
Proteins are the material basis of life. They are organic macromolecules, the basic organic constituents of cells, and the main carriers of life activities. A protein is a substance with a defined spatial structure, formed when polypeptide chains, built from amino acids by dehydration condensation, coil and fold. By folding or coiling into a spatial structure, proteins can perform specific functions, and multiple proteins often bind together to form stable protein complexes. The three-dimensional structure of proteins is of decisive importance in drug design, protein engineering, and biotechnology, so protein structure prediction is an important research problem.
Experimental measurement methods for protein structure include X-ray crystallography, nuclear magnetic resonance spectroscopy, electron microscopy, and the like, and these methods are widely used for protein structure measurement. X-ray crystallography is considered one of the relatively feasible and accurate determination methods among these methods. However, X-ray crystallography requires a complex crystallization process and for some proteins that do not crystallize readily (e.g., membrane proteins), this method cannot be used for structural determination. In addition, these experimental assays are extremely time consuming, expensive, and prone to error.
According to the Anfinsen principle, the three-dimensional structure of a protein can be predicted directly from its amino acid sequence by using a computer and a suitable algorithm; this is currently a main research topic in bioinformatics. De novo prediction, which is based on the Anfinsen hypothesis, builds a physics-based or knowledge-based protein energy model and then designs a suitable optimization algorithm to find the minimum-energy conformation. On the one hand, the method helps to reveal the mechanism of protein folding in a biological sense and may ultimately clarify the second genetic code, a theoretical part of the central dogma of biology; on the other hand, it is general in a practical sense: de novo prediction is the only option for targets with sequence similarity < 20% or for oligopeptides (small proteins of < 10 residues). Rosetta, QUARK, and similar methods build knowledge-based energy models and have performed prominently in past CASP events. However, when such methods predict a target protein with a long sequence, the search space grows exponentially and the prediction accuracy drops sharply, which leads to insufficient sampling capability, improper stage switching, failure to retain good intermediate results, and wasted computing resources.
Therefore, the existing multi-stage protein structure prediction method based on the energy function has defects in stage switching and prediction accuracy, and needs to be improved.
Disclosure of Invention
In order to overcome the defects of the conventional multi-stage protein structure prediction method based on an energy function in the aspects of stage switching and prediction precision, the invention provides a Shannon entropy guided multi-stage switching protein structure prediction method with reasonable stage switching and high prediction precision.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a multi-stage protein structure prediction method based on shannon entropy, the method comprising the steps of:
1) Given the input sequence information, obtain a fragment library for the sequence using the Robetta server;
2) Construct a Markov state model as follows:
2.1) Acquire nstruct background points: run the Rosetta Abinitio protocol nstruct times and record the conformation produced by each run as a background point;
2.2) Calculate the root-mean-square deviation (RMSD) between the nstruct background points to form a distance matrix D;
2.3) Classify the nstruct background points by k-means clustering according to the distance matrix D to obtain m cluster centers, which serve as the m Markov states;
3) Initialization: set the population size NP, the current stage $stage = 1$, the Shannon entropy threshold $\alpha$, and the maximum Shannon entropy accumulation count count_max; execute the current stage of Rosetta Abinitio NP times on the input sequence to generate the initial conformation population $P = \{C_1, C_2, \ldots, C_{NP}\}$, where $C_{NP}$ denotes the NP-th individual;
4) Calculate the current population state: classify each individual $C_i$, $i \in \{1, \ldots, NP\}$, in the population: calculate the RMSD between $C_i$ and each of the m cluster centers; if $C_i$ is nearest to the p-th cluster center, the current state of the individual is $state_i = p$, $p \in \{1, 2, \ldots, m\}$; the state of the whole population is denoted $state_{last} = \{state_1, state_2, \ldots, state_{NP}\}$, where $state_{last}$ denotes the population state of the previous generation; set $stage = stage + 1$;
5) Set the Shannon entropy accumulation count $count = 0$ and enter the next stage, as follows:
5.1) Perform the prediction process of the corresponding stage on the population, as follows:
5.1.1) Perform fragment assembly on individual $C_i$ to obtain $C'_i$, and use the energy function of the current stage to evaluate the energies $E_{stage}(C_i)$ and $E_{stage}(C'_i)$ of the conformations before and after fragment assembly;
5.1.2) If $E_{stage}(C_i) > E_{stage}(C'_i)$, accept the fragment assembly, i.e. $C_i = C'_i$; otherwise apply the Metropolis criterion: calculate $p = \exp(-(E_{stage}(C_i) - E_{stage}(C'_i)))$, and if $p > rand(0,1)$, accept the fragment assembly, i.e. $C_i = C'_i$; otherwise reject the fragment assembly;
5.1.3) Perform steps 5.1.1) to 5.1.2) on all individuals to obtain the next-generation population;
5.2) Calculate the current population state: classify each individual $C_i$, $i \in \{1, 2, \ldots, NP\}$, in the population: calculate the RMSD between $C_i$ and each of the m cluster centers; if $C_i$ is nearest to the q-th cluster center, $q \in \{1, 2, \ldots, m\}$, the current state of the individual is $state'_i = q$; the state of the whole population is denoted $state_{now} = \{state'_1, state'_2, \ldots, state'_{NP}\}$, where $state_{now}$ denotes the current population state;
5.3) Obtain the Markov state transition matrix T from the states of the previous and current generations: for conformation $C_i$, $i \in \{1, \ldots, NP\}$, the two successive states $state_i = p$ and $state'_i = q$ indicate a transition from state p to state q, so $t_{pq} = t_{pq} + 1/m$, where $t_{pq}$, the entry in row p and column q of T, represents the state transition frequency and has initial value 0;
5.4) Calculate the Shannon entropy from the state transition matrix T: $\mathrm{Entropy} = \sum_{p,q} -t_{pq} \ln t_{pq}$;
5.5) Update the previous-generation state: $state_{last} = state_{now}$;
5.6) If $\mathrm{Entropy} < \alpha$, the population state transition is considered relatively definite, and $count = count + 1$;
5.7) If count < count_max, continue the current stage and return to step 5.1); otherwise switch stages, i.e. $stage = stage + 1$; if $stage < 5$, return to step 5); otherwise the fourth-stage prediction process ends and the prediction result is output.
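Steps 5.3) and 5.4) can be illustrated with a minimal Python sketch. The function names `transition_matrix` and `shannon_entropy` and the optional `weight` argument are illustrative assumptions, not part of the source. The default increment of 1/m follows step 5.3), and zero entries are skipped on the assumption that 0·ln 0 is treated as 0.

```python
import math


def transition_matrix(state_last, state_now, m, weight=None):
    """Accumulate t_pq over all individuals (states are 1-based, as in the text).

    weight defaults to 1/m, the increment stated in step 5.3); passing another
    value (for example 1/NP) is an illustrative option only.
    """
    if weight is None:
        weight = 1.0 / m
    t = [[0.0] * m for _ in range(m)]
    for p, q in zip(state_last, state_now):
        t[p - 1][q - 1] += weight  # individual moved from state p to state q
    return t


def shannon_entropy(t):
    """Sum of -t_pq * ln(t_pq) over the non-zero entries of the matrix."""
    return sum(-x * math.log(x) for row in t for x in row if x > 0.0)


# Toy example: four individuals, two Markov states.
t = transition_matrix([1, 1, 2, 2], [1, 2, 2, 2], m=2)
entropy = shannon_entropy(t)
```

In this toy case the matrix is [[0.5, 0.5], [0.0, 1.0]] and the entropy equals ln 2; when every individual stays in a single state and that entry accumulates to 1, the entropy is 0, which is the situation the threshold α in step 5.6) is meant to detect.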
The technical concept of the invention is as follows. First, the Rosetta Abinitio protocol is used to explore the search space, and potential native-state regions are located by clustering the resulting background points. Next, prediction is carried out stage by stage within the framework of a population-based evolutionary algorithm: the relation between each generation of the population and the potential native-state regions is analyzed, and individuals are classified to indicate the evolutionary state of the current population. The state transition matrix between two successive generations is then calculated, and Shannon entropy is used to measure how the state of the population changes. Finally, the stage is switched according to the accumulated number of times the Shannon entropy falls below a given threshold, and the last generation of the population is taken as the final prediction result.
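The stage-switching rule in this outline can be sketched in Python as follows. The sketch assumes that the per-generation entropy values are already available as a sequence; the function name `generations_per_stage` and its parameters are hypothetical. The count is cumulative within a stage rather than consecutive, which follows the wording "accumulated number of times".

```python
def generations_per_stage(entropies, alpha, count_max, stage=2, final_stage=4):
    """Return {stage: generations spent} for a stream of entropy values.

    stage starts at 2 on the assumption that stage 1 was used to build the
    initial population; final_stage=4 mirrors the four-stage process.
    """
    used = {}
    count = 0  # reset on entering each stage
    for entropy in entropies:
        if stage > final_stage:
            break  # all stages finished; remaining values are ignored
        used[stage] = used.get(stage, 0) + 1
        if entropy < alpha:
            count += 1  # one more generation with a definite transition
        if count >= count_max:
            stage += 1  # switch to the next stage
            count = 0
    return used


# Toy run with a small count_max so that the switches are easy to follow.
history = generations_per_stage(
    [0.5, 0.001, 0.001, 0.3, 0.002, 0.004, 0.001, 0.001],
    alpha=0.01,
    count_max=2,
)
```

With these toy values the result is {2: 3, 3: 3, 4: 2}: each stage runs until two generations have had entropy below α, so a stage in which the population keeps changing state is given more generations.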
The beneficial effects of the invention are as follows. On the one hand, clustering the background points locates the potential native-state regions, which narrows the search space and reduces the computational cost. On the other hand, the evolution of the population is measured by Shannon entropy in order to switch stages, so the number of iterations in each stage is adjusted dynamically according to the size of the search space, which improves prediction accuracy and robustness.
Drawings
FIG. 1 shows the distribution of conformational energy versus RMSD from the native state obtained when the multi-stage protein structure prediction method based on Shannon entropy is applied to protein 1ACF.
FIG. 2 shows the three-dimensional structure obtained when the multi-stage protein structure prediction method based on Shannon entropy is applied to protein 1ACF.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, a multi-stage protein structure prediction method based on shannon entropy includes the following steps:
1) Given the input sequence information, obtain a fragment library for the sequence using the Robetta server;
2) Construct a Markov state model as follows:
2.1) Acquire nstruct background points: run the Rosetta Abinitio protocol nstruct times and record the conformation produced by each run as a background point;
2.2) Calculate the root-mean-square deviation (RMSD) between the nstruct background points to form a distance matrix D;
2.3) Classify the nstruct background points by k-means clustering according to the distance matrix D to obtain m cluster centers, which serve as the m Markov states;
3) Initialization: set the population size NP, the current stage $stage = 1$, the Shannon entropy threshold $\alpha$, and the maximum Shannon entropy accumulation count count_max; execute the current stage of Rosetta Abinitio NP times on the input sequence to generate the initial conformation population $P = \{C_1, C_2, \ldots, C_{NP}\}$, where $C_{NP}$ denotes the NP-th individual;
4) Calculate the current population state: classify each individual $C_i$, $i \in \{1, \ldots, NP\}$, in the population: calculate the RMSD between $C_i$ and each of the m cluster centers; if $C_i$ is nearest to the p-th cluster center, the current state of the individual is $state_i = p$, $p \in \{1, 2, \ldots, m\}$; the state of the whole population is denoted $state_{last} = \{state_1, state_2, \ldots, state_{NP}\}$, where $state_{last}$ denotes the population state of the previous generation; set $stage = stage + 1$;
5) Set the Shannon entropy accumulation count $count = 0$ and enter the next stage, as follows:
5.1) Perform the prediction process of the corresponding stage on the population, as follows:
5.1.1) Perform fragment assembly on individual $C_i$ to obtain $C'_i$, and use the energy function of the current stage to evaluate the energies $E_{stage}(C_i)$ and $E_{stage}(C'_i)$ of the conformations before and after fragment assembly;
5.1.2) If $E_{stage}(C_i) > E_{stage}(C'_i)$, accept the fragment assembly, i.e. $C_i = C'_i$; otherwise apply the Metropolis criterion: calculate $p = \exp(-(E_{stage}(C_i) - E_{stage}(C'_i)))$, and if $p > rand(0,1)$, accept the fragment assembly, i.e. $C_i = C'_i$; otherwise reject the fragment assembly;
5.1.3) Perform steps 5.1.1) to 5.1.2) on all individuals to obtain the next-generation population;
5.2) Calculate the current population state: classify each individual $C_i$, $i \in \{1, \ldots, NP\}$, in the population: calculate the RMSD between $C_i$ and each of the m cluster centers; if $C_i$ is nearest to the q-th cluster center, $q \in \{1, 2, \ldots, m\}$, the current state of the individual is $state'_i = q$; the state of the whole population is denoted $state_{now} = \{state'_1, state'_2, \ldots, state'_{NP}\}$, where $state_{now}$ denotes the current population state;
5.3) Obtain the Markov state transition matrix T from the states of the previous and current generations: for conformation $C_i$, $i \in \{1, \ldots, NP\}$, the two successive states $state_i = p$ and $state'_i = q$ indicate a transition from state p to state q, so $t_{pq} = t_{pq} + 1/m$, where $t_{pq}$, the entry in row p and column q of T, represents the state transition frequency and has initial value 0;
5.4) Calculate the Shannon entropy from the state transition matrix T: $\mathrm{Entropy} = \sum_{p,q} -t_{pq} \ln t_{pq}$;
5.5) Update the previous-generation state: $state_{last} = state_{now}$;
5.6) If $\mathrm{Entropy} < \alpha$, the population state transition is considered relatively definite, and $count = count + 1$;
5.7) If count < count_max, continue the current stage and return to step 5.1); otherwise switch stages, i.e. $stage = stage + 1$; if $stage < 5$, return to step 5); otherwise the fourth-stage prediction process ends and the prediction result is output.
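The per-generation update of steps 5.1.1) to 5.1.3) could be sketched in Python as below. This is a sketch under stated assumptions, not the patented implementation: it uses the conventional Metropolis form p = exp(-(E_new - E_old)/kT), in which an uphill move has p ≤ 1; the temperature factor `kt`, the callables `assemble` and `energy`, and both function names are hypothetical placeholders for the fragment-assembly move and the stage energy function.

```python
import math
import random


def accept_move(e_old, e_new, kt=1.0, rng=random):
    """Decide whether a fragment-assembly move is accepted (step 5.1.2)."""
    if e_new < e_old:
        return True  # energy decreased: always accept
    # Energy did not decrease: accept with the conventional Metropolis
    # probability (assumed sign convention; kt is an assumed parameter).
    p = math.exp(-(e_new - e_old) / kt)
    return p > rng.random()


def evolve_generation(population, assemble, energy, rng=random):
    """Apply one trial move to every individual (steps 5.1.1 to 5.1.3)."""
    next_generation = []
    for conformation in population:
        trial = assemble(conformation)  # hypothetical fragment-assembly move
        if accept_move(energy(conformation), energy(trial), rng=rng):
            next_generation.append(trial)
        else:
            next_generation.append(conformation)
    return next_generation


# Toy usage: "conformations" are plain floats and the energy is x squared.
toy_next = evolve_generation([4.0, -2.0], lambda x: 0.5 * x, lambda x: x * x)
```

In the toy usage every trial move lowers the energy, so both moves are accepted; a real run would replace the two lambdas with a Rosetta fragment insertion and the score function of the current stage.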
In this embodiment, the α/β sheet protein 1ACF, with a sequence length of 125, is taken as an example. The multi-stage protein structure prediction method based on Shannon entropy comprises the following steps:
1) Given the input sequence information, obtain a fragment library for the sequence using the Robetta server;
2) Construct a Markov state model as follows:
2.1) Acquire nstruct = 1000 background points: run the Rosetta Abinitio protocol nstruct times and record the conformation produced by each run as a background point;
2.2) Calculate the root-mean-square deviation (RMSD) between the nstruct background points to form a distance matrix D;
2.3) Classify the nstruct background points by k-means clustering according to the distance matrix D to obtain m = 8 cluster centers, which serve as the m Markov states;
3) Initialization: set the population size NP = 300, the current stage $stage = 1$, the Shannon entropy threshold $\alpha = 0.01$, and the maximum Shannon entropy accumulation count count_max = 50; execute the current stage of Rosetta Abinitio NP times on the input sequence to generate the initial conformation population $P = \{C_1, C_2, \ldots, C_{NP}\}$, where $C_{NP}$ denotes the NP-th individual;
4) Calculate the current population state: classify each individual $C_i$, $i \in \{1, \ldots, NP\}$, in the population: calculate the RMSD between $C_i$ and each of the m cluster centers; if $C_i$ is nearest to the p-th cluster center, the current state of the individual is $state_i = p$, $p \in \{1, \ldots, m\}$; the state of the whole population is denoted $state_{last} = \{state_1, state_2, \ldots, state_{NP}\}$, where $state_{last}$ denotes the population state of the previous generation; set $stage = stage + 1$;
5) Set the Shannon entropy accumulation count $count = 0$ and enter the next stage, as follows:
5.1) Perform the prediction process of the corresponding stage on the population, as follows:
5.1.1) Perform fragment assembly on individual $C_i$ to obtain $C'_i$, and use the energy function of the current stage to evaluate the energies $E_{stage}(C_i)$ and $E_{stage}(C'_i)$ of the conformations before and after fragment assembly;
5.1.2) If $E_{stage}(C_i) > E_{stage}(C'_i)$, accept the fragment assembly, i.e. $C_i = C'_i$; otherwise apply the Metropolis criterion: calculate $p = \exp(-(E_{stage}(C_i) - E_{stage}(C'_i)))$, and if $p > rand(0,1)$, accept the fragment assembly, i.e. $C_i = C'_i$; otherwise reject the fragment assembly;
5.1.3) Perform steps 5.1.1) to 5.1.2) on all individuals to obtain the next-generation population;
5.2) Calculate the current population state: classify each individual $C_i$, $i \in \{1, \ldots, NP\}$, in the population: calculate the RMSD between $C_i$ and each of the m cluster centers; if $C_i$ is nearest to the q-th cluster center, $q \in \{1, 2, \ldots, m\}$, the current state of the individual is $state'_i = q$; the state of the whole population is denoted $state_{now} = \{state'_1, state'_2, \ldots, state'_{NP}\}$, where $state_{now}$ denotes the current population state;
5.3) Obtain the Markov state transition matrix T from the states of the previous and current generations: for conformation $C_i$, $i \in \{1, \ldots, NP\}$, the two successive states $state_i = p$ and $state'_i = q$ indicate a transition from state p to state q, so $t_{pq} = t_{pq} + 1/m$, where $t_{pq}$, the entry in row p and column q of T, represents the state transition frequency and has initial value 0;
5.4) Calculate the Shannon entropy from the state transition matrix T: $\mathrm{Entropy} = \sum_{p,q} -t_{pq} \ln t_{pq}$;
5.5) Update the previous-generation state: $state_{last} = state_{now}$;
5.6) If $\mathrm{Entropy} < \alpha$, the population state transition is considered relatively definite, and $count = count + 1$;
5.7) If count < count_max, continue the current stage and return to step 5.1); otherwise switch stages, i.e. $stage = stage + 1$; if $stage < 5$, return to step 5); otherwise the fourth-stage prediction process ends and the prediction result is output.
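The parameter values of this embodiment and the state classification of steps 4) and 5.2) can be collected in a short Python sketch. The class name `Settings`, the function names, and the representation of a conformation as a list of (x, y, z) tuples are illustrative assumptions; in particular, the RMSD below is computed without structural superposition, which is a simplification and not something the source specifies.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Values stated for the 1ACF embodiment."""
    nstruct: int = 1000   # number of background points
    m: int = 8            # number of cluster centers (Markov states)
    np_size: int = 300    # population size NP
    alpha: float = 0.01   # Shannon entropy threshold
    count_max: int = 50   # maximum accumulation count


def rmsd_no_superposition(a, b):
    """Coordinate RMSD of two equal-length point lists (no alignment step)."""
    squared = sum((x - y) ** 2 for pa, pb in zip(a, b) for x, y in zip(pa, pb))
    return math.sqrt(squared / len(a))


def population_state(population, centers, distance=rmsd_no_superposition):
    """Label each individual with the 1-based index of its nearest center."""
    states = []
    for conformation in population:
        dists = [distance(conformation, center) for center in centers]
        states.append(dists.index(min(dists)) + 1)
    return states


# Toy usage with two-point "conformations" and two cluster centers.
centers = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]]
population = [[(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)], [(5.0, 5.0, 4.8), (6.0, 5.0, 5.2)]]
states = population_state(population, centers)
```

Here `states` is [1, 2]: the first toy conformation lies nearest the first center and the second nearest the second. Calling the same function before and after a generation gives the two state lists from which the transition matrix of step 5.3) is built.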
Taking the α/β sheet protein 1ACF, with a sequence length of 125, as an example, the above method yields a near-native conformation of the protein, with a minimum root-mean-square deviation of (value missing in the source text). The predicted structure is shown in FIG. 2, and the distribution of conformational energy versus RMSD from the native state during the prediction process is shown in FIG. 1.
The above describes the optimization results of the invention with the protein 1ACF as an example. It is not intended to limit the scope of the invention, and various modifications and improvements may be made without departing from that scope.
Claims (1)
1. A multi-stage protein structure prediction method based on Shannon entropy is characterized in that: the protein structure prediction method comprises the following steps:
1) Given the input sequence information, obtain a fragment library for the sequence using the Robetta server;
2) Construct a Markov state model as follows:
2.1) Acquire nstruct background points: run the Rosetta Abinitio protocol nstruct times and record the conformation produced by each run as a background point;
2.2) Calculate the root-mean-square deviation (RMSD) between the nstruct background points to form a distance matrix D;
2.3) Classify the nstruct background points by k-means clustering according to the distance matrix D to obtain m cluster centers, which serve as the m Markov states;
3) Initialization: set the population size NP, the current stage $stage = 1$, the Shannon entropy threshold $\alpha$, and the maximum Shannon entropy accumulation count count_max; execute the current stage of Rosetta Abinitio NP times on the input sequence to generate the initial conformation population $P = \{C_1, C_2, \ldots, C_{NP}\}$, where $C_{NP}$ denotes the NP-th individual;
4) Calculate the current population state: classify each individual $C_i$, $i \in \{1, \ldots, NP\}$, in the population: calculate the RMSD between $C_i$ and each of the m cluster centers; if $C_i$ is nearest to the p-th cluster center, the current state of the individual is $state_i = p$, $p \in \{1, 2, \ldots, m\}$; the state of the whole population is denoted $state_{last} = \{state_1, state_2, \ldots, state_{NP}\}$, where $state_{last}$ denotes the population state of the previous generation; set $stage = stage + 1$;
5) Set the Shannon entropy accumulation count $count = 0$ and enter the next stage, as follows:
5.1) Perform the prediction process of the corresponding stage on the population, as follows:
5.1.1) Perform fragment assembly on individual $C_i$ to obtain $C'_i$, and use the energy function of the current stage to evaluate the energies $E_{stage}(C_i)$ and $E_{stage}(C'_i)$ of the conformations before and after fragment assembly;
5.1.2) If $E_{stage}(C_i) > E_{stage}(C'_i)$, accept the fragment assembly, i.e. $C_i = C'_i$; otherwise apply the Metropolis criterion: calculate $p = \exp(-(E_{stage}(C_i) - E_{stage}(C'_i)))$, and if $p > rand(0,1)$, accept the fragment assembly, i.e. $C_i = C'_i$; otherwise reject the fragment assembly;
5.1.3) Perform steps 5.1.1) to 5.1.2) on all individuals to obtain the next-generation population;
5.2) Calculate the current population state: classify each individual $C_i$, $i \in \{1, 2, \ldots, NP\}$, in the population: calculate the RMSD between $C_i$ and each of the m cluster centers; if $C_i$ is nearest to the q-th cluster center, $q \in \{1, 2, \ldots, m\}$, the current state of the individual is $state'_i = q$; the state of the whole population is denoted $state_{now} = \{state'_1, state'_2, \ldots, state'_{NP}\}$, where $state_{now}$ denotes the current population state;
5.3) Obtain the Markov state transition matrix T from the previous-generation population state and the current population state: for conformation $C_i$, $i \in \{1, \ldots, NP\}$, the two successive states $state_i = p$ and $state'_i = q$ indicate a transition from state p to state q, so $t_{pq} = t_{pq} + 1/m$, where $t_{pq}$, the entry in row p and column q of T, represents the state transition frequency and has initial value 0;
5.4) Calculate the Shannon entropy from the state transition matrix T: $\mathrm{Entropy} = \sum_{p,q} -t_{pq} \ln t_{pq}$;
5.5) Update the previous-generation state: $state_{last} = state_{now}$;
5.6) If $\mathrm{Entropy} < \alpha$, the population state transition is considered relatively definite, and $count = count + 1$;
5.7) If count < count_max, continue the current stage and return to step 5.1); otherwise switch stages, i.e. $stage = stage + 1$; if $stage < 5$, return to step 5); otherwise the fourth-stage prediction process ends and the prediction result is output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810238703.6A CN108614957B (en) | 2018-03-22 | 2018-03-22 | Multi-stage protein structure prediction method based on Shannon entropy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810238703.6A CN108614957B (en) | 2018-03-22 | 2018-03-22 | Multi-stage protein structure prediction method based on Shannon entropy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108614957A CN108614957A (en) | 2018-10-02 |
CN108614957B true CN108614957B (en) | 2021-06-18 |
Family
ID=63659305
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810238703.6A Active CN108614957B (en) | 2018-03-22 | 2018-03-22 | Multi-stage protein structure prediction method based on Shannon entropy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108614957B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106575320A (en) * | 2014-05-05 | 2017-04-19 | 艾腾怀斯股份有限公司 | Binding affinity prediction system and method |
CN107491664A (en) * | 2017-08-29 | 2017-12-19 | 浙江工业大学 | Protein structure ab initio prediction method based on information entropy |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170002319A1 (en) * | 2015-05-13 | 2017-01-05 | Whitehead Institute For Biomedical Research | Master Transcription Factors Identification and Use Thereof |
US20180068053A1 (en) * | 2016-08-05 | 2018-03-08 | The Governors Of The University Of Alberta | Systems and methods of selecting compounds with reduced risk of cardiotoxicity using cardiac sodium ion channel models |
Non-Patent Citations (2)
Title |
---|
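"Prediction of Protein Structural Features from Sequence Data Based on Shannon Entropy and Kolmogorov Complexity"; Bywater R P; PLoS ONE; 2015-04-09; pp. 1-15 * |
"Eight-class protein secondary structure prediction algorithm based on deep learning"; Zhang Lei (张蕾); Journal of Computer Applications (《计算机应用》); 2017-05-10; pp. 1512-1515 * |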
Also Published As
Publication number | Publication date |
---|---|
CN108614957A (en) | 2018-10-02 |
Similar Documents
Publication | Title |
---|---|
Hanson et al. | Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks |
Pomyen et al. | Deep metabolome: Applications of deep learning in metabolomics |
CN107609342B (en) | Protein conformation search method based on secondary structure space distance constraint |
Xia | Position weight matrix, Gibbs sampler, and the associated significance tests in motif characterization and prediction |
Quang et al. | EXTREME: an online EM algorithm for motif discovery |
CN109215732B (en) | Protein structure prediction method based on residue contact information self-learning |
CN109360599B (en) | Protein structure prediction method based on residue contact information cross strategy |
EP2759952B1 (en) | Efficient genomic read alignment in an in-memory database |
CN108846256B (en) | Group protein structure prediction method based on residue contact information |
CN109033744B (en) | Protein structure prediction method based on residue distance and contact information |
CN107491664B (en) | Protein structure de novo prediction method based on information entropy |
CN112365921B (en) | Protein secondary structure prediction method based on long short-term memory network |
CN109215733B (en) | Protein structure prediction method based on residue contact information auxiliary evaluation |
CN109101785B (en) | Protein structure prediction method based on secondary structure similarity selection strategy |
CN108614957B (en) | Multi-stage protein structure prediction method based on Shannon entropy |
Hou et al. | Predicting protein functions from PPI networks using functional aggregation |
CN115938490B (en) | Metabolite identification method, system and equipment based on graph representation learning algorithm |
CN109300506B (en) | Protein structure prediction method based on specific distance constraint |
Phan et al. | A comprehensive revisit of the machine‐learning tools developed for the identification of enhancers in the human genome |
CN109360597B (en) | Group protein structure prediction method based on global and local strategy cooperation |
CN109390035B (en) | Protein conformation space optimization method based on local structure comparison |
CN109243526B (en) | Protein structure prediction method based on specific fragment crossing |
CN110729023B (en) | Protein structure prediction method based on contact assistance of secondary structure elements |
CN108563921B (en) | Protein structure prediction algorithm evaluation index construction method |
Lakshmi et al. | An Improved Genetic with Particle Swarm Optimization Algorithm Based on Ensemble Classification to Predict Protein–Protein Interaction |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 2018-10-02; Assignee: ZHEJIANG ORIENT GENE BIOTECH CO.,LTD.; Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY; Contract record no.: X2023980053610; Denomination of invention: A multi-stage protein structure prediction method based on Shannon entropy; Granted publication date: 2021-06-18; License type: Common License; Record date: 2023-12-22 |