CN104021316B - Method for predicting new indications for old drugs based on matrix factorization fused with gene space
- Publication number: CN104021316B
- Application number: CN201410302140.4A
- Authority: CN (China)
- Prior art keywords: disease, medicine, gene, association, collection
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a method for predicting new indications for old drugs based on matrix factorization fused with gene space. The method comprises: collecting drug-disease association data; preparing gene-space data; extracting a gene closeness matrix from the gene space; constructing a joint space from the gene closeness matrix; computing the feature spaces of drugs and diseases from the joint space; initializing a matrix factorization model with the drug and disease feature spaces; preparing training samples; training the matrix factorization model; predicting the degree of association for drug-disease combinations outside the training samples; repeating sample preparation, model training, and prediction multiple times and, for every possible drug-disease combination, predicting its association score by averaging; and ranking the predicted association scores and setting a score threshold to screen potential drug-disease associations, thereby predicting new indications for drugs. The method can screen out potential drug-disease associations more accurately.
Description
Technical field
The present invention relates to the application of computer technology in medical research, and in particular to a method for predicting new indications for old drugs (drug repositioning) based on matrix factorization fused with gene space.
Background
New drug development demands large investment, long cycles, and high risk, and the number of new drugs reaching the market has declined significantly. Developing a new chemical entity drug often takes more than 10 years and costs more than 1 billion US dollars, and only about one in ten drug candidates is eventually approved for production. In recent years, drug developers have paid increasing attention to drug repositioning ("new uses for old drugs"), which means finding new clinical indications for existing drugs. Repositioning an old drug can skip the toxicology and pharmacokinetic evaluations that have already been done, which greatly shortens development time and reduces cost. Because the clinical safety of the drug has already been established, the development risk is also lower. In addition, finding new indications for an old drug extends its service life and broadens its range of indications. Typical examples include thalidomide, Viagra, and finasteride. Thalidomide, for example, was once widely used to treat morning sickness in pregnancy; it was later withdrawn after it caused a large number of fetuses to be born with phocomelia. Scientists subsequently found that thalidomide modulates the human immune system, and in 1998 the U.S. Food and Drug Administration (FDA) approved it for marketing as a treatment for leprosy. However, such new uses have usually been discovered by accident. Effective computational methods that support the systematic prediction of new uses for old drugs are therefore of great importance.
Researchers have already proposed several computational models for discovering new indications for drugs. These models mainly match drugs and diseases by their attribute data in order to find new drug-disease associations. The attribute data used include transcriptomic data, side effects, pathways, and gene profiles. These studies have achieved some success, but they require explicit attribute data to be computed for drugs and diseases before learning, which costs time and effort. By contrast, association data relating drugs to diseases are easier to obtain from public databases. On the Internet, matrix factorization models based on association data have proved highly effective in user behavior analysis, and there is reason to believe that such models can also be used to analyze drug-disease associations. Moreover, the drug-disease association data accumulated in public databases will become increasingly complete over time. In addition, the association between a drug and a disease may already be reflected in gene space, so association information from gene space can also be incorporated into the analysis of drug-disease associations.
It is therefore necessary to establish a drug repositioning research framework that uses association data, and to design a matrix factorization method fused with gene space that predicts new drug-disease associations and discovers new indications for drugs, thereby realizing greater value from existing drugs.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is to provide a method for predicting new indications for old drugs, which predicts new drug-disease associations and discovers new indications for drugs, thereby achieving drug repositioning.
(2) Technical solution
To solve the above technical problem, the present invention proposes a method for predicting new indications for old drugs based on matrix factorization fused with gene space. The method comprises the following steps:
Step S1: Collect association data for a known drug set and disease set;
Step S2: Obtain gene-space data, which include the drug-gene association set related to the known drugs, the disease-gene association set related to the diseases in step S1, and the gene association set;
Step S3: Extract a gene closeness matrix from the gene-space data;
Step S4: Perform eigenvalue decomposition on the gene closeness matrix to generate a low-dimensional Euclidean space in which each gene in the gene set of interest is represented by a low-dimensional feature vector, thereby constructing the joint space;
Step S5: Compute the feature spaces of drugs and diseases from the joint space according to the drug-gene association set and the disease-gene association set;
Step S6: Establish a matrix factorization model from the drug feature matrix and the disease feature matrix, and initialize the model with the feature vectors of drugs and diseases in the joint space;
Step S7: Obtain training samples from the drug set, the disease set, and the drug-disease association set;
Step S8: Train the matrix factorization model with the training samples;
Step S9: Use the trained matrix factorization model to predict drug-disease combinations outside the training samples;
Step S10: Return to step S7 and repeat the processing of steps S7, S8, and S9 multiple times, predicting the degree of association for every possible drug-disease combination outside the positive samples;
Step S11: Screen drug-disease associations according to the degrees of association predicted in step S10, and predict new indications for drugs.
According to a specific embodiment of the present invention, step S1 further comprises:
Step S11: Collect the known drug-disease association set;
Step S12: Construct the drug-disease association matrix from the drug-disease association set.
According to a specific embodiment of the present invention, in step S2 the gene-space data include the drug-gene association set, the disease-gene association set, and the gene association set, all of which are obtained by querying databases.
According to a specific embodiment of the present invention, the drug-gene association set is obtained by querying the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases, the disease-gene association set is obtained by querying the OMIM database, and the gene association set is obtained by querying the HPRD database.
According to a specific embodiment of the present invention, genes are represented by Entrez Gene IDs.
According to a specific embodiment of the present invention, step S3 further comprises:
Step S31: Build a gene interaction network from the gene association set;
Step S32: Take all genes involved in the drug-gene association set and the disease-gene association set as the gene set of interest, and compute the closeness between these genes.
According to a specific embodiment of the present invention, step S4 comprises:
Step S41: Compute the eigenvalues of the gene closeness matrix, sort them in descending order, and obtain the corresponding eigenvectors;
Step S42: Take the first $k$ eigenvalues to form a $k \times k$ diagonal matrix $\Lambda$, and take the corresponding $k$ eigenvectors to form an $N_g \times k$ matrix $\Gamma$, in which each column is one eigenvector and $k$ is a natural number;
Step S43: Compute the $N_g \times k$ matrix $P = \Gamma \Lambda^{1/2}$, by which the gene closeness matrix can be decomposed as $C = P P^{\mathsf T} = \Gamma \Lambda^{1/2} \Lambda^{1/2} \Gamma^{\mathsf T}$;
Step S44: Represent each gene by the corresponding row vector of matrix $P$; the resulting set of $k$-dimensional vectors constitutes the joint space.
According to a specific embodiment of the present invention, in step S5 the feature vector of a disease is calculated according to the following expression:
$$b_i = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$$
where $p_g$ denotes the feature vector of gene $g$ in the joint space, $G_{s_i}$ is the set of genes associated with disease $s_i$ in the disease-gene association set, and $|G_{s_i}|$ is the number of genes associated with disease $s_i$.
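According to a specific embodiment of the present invention, in step S6 the matrix factorization model is $F_{ij} = \dfrac{1}{1 + e^{-a_i b_j^{\mathsf T}}}$, where $F_{ij}$ is the predicted degree of association between drug $r_i$ and disease $s_j$.
According to a specific embodiment of the present invention, the $N_r \times k$ matrix $A$ with rows $a_i$ is the drug feature matrix and the $N_s \times k$ matrix $B$ with rows $b_j$ is the disease feature matrix. The model is initialized by setting each row of $A$ and $B$ to the feature vector of the corresponding drug or disease computed from the joint space in step S5.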
According to a specific embodiment of the present invention, in step S7 the samples in the drug-disease association set are used as positive samples. Drugs and diseases are picked at random from the drug set and the disease set and combined into drug-disease pairs that do not appear among the positive samples; as many such pairs are generated as there are positive samples, and they are used as negative samples. The entries of the drug-disease association matrix $Y$ corresponding to negative samples are set to $Y_{ij} = 0$.
According to a specific embodiment of the present invention, step S8 further comprises:
Step S81: Using each sample of the training set in turn, update the feature vectors of the drug and the disease according to the following expressions:
$$a_i \leftarrow a_i + \eta \left( e_{ij}\, b_j\, F_{ij} (1 - F_{ij}) - \lambda a_i \right)$$
$$b_j \leftarrow b_j + \eta \left( e_{ij}\, a_i\, F_{ij} (1 - F_{ij}) - \lambda b_j \right)$$
where $F_{ij} = \dfrac{1}{1 + e^{-a_i b_j^{\mathsf T}}}$ is the current predicted degree of association between drug $r_i$ and disease $s_j$, $e_{ij} = Y_{ij} - F_{ij}$ is the error between the actual value and the predicted value, $\eta$ is the learning rate, and $\lambda$ is the penalty (regularization) coefficient. One pass over all samples counts as one iteration;
Step S82: Repeat step S81 until the maximum number of iterations is reached.
According to a specific embodiment of the present invention, in step S9 the trained matrix factorization model is given by the following expression, and the degree of association of a combination of drug $r_i$ and disease $s_j$ outside the training samples is predicted from it:
$$F_{ij} = \frac{1}{1 + e^{-a_i b_j^{\mathsf T}}}$$
where $a_i$ and $b_j$ are the feature vectors of drug $r_i$ and disease $s_j$ after training.
According to a specific embodiment of the present invention, in step S10 the following process is repeated 100 times: negative samples are randomly regenerated in step S7, the matrix factorization model is retrained in step S8, and the unknown entries of the drug-disease association matrix outside the positive and negative samples are predicted in step S9.
According to a specific embodiment of the present invention, in step S10, for a combination of drug $r_i$ and disease $s_j$ outside the positive samples, let $N$ be the number of times the combination was selected as a negative sample and let $S$ be the sum of its predicted degrees of association over the $100 - N$ models in which it was not selected as a negative sample. The association prediction score of the combination is then calculated as:
$$\mathrm{score} = \frac{S}{100 - N}$$
(3) Beneficial effects
The method provided by the present invention focuses on analyzing and modeling association data. Compared with methods based on attribute data, it eliminates the computation of attribute data and is simpler to carry out. The association data used come from public biological databases and are reliable. As association data continue to accumulate in public databases, the method will become increasingly effective. The method can screen out potential drug-disease associations more accurately and thereby predict new indications for drugs.
Description of the drawings
Fig. 1 is a flow chart of the method provided by the present invention;
Fig. 2 is a schematic diagram of the operating procedure of a specific embodiment of the present invention;
Fig. 3A and Fig. 3B show how the feature vectors of etoposide and myasthenia gravis, respectively, change during model training in the specific embodiment;
Fig. 4 is a distribution plot of the association prediction scores in the specific embodiment;
Fig. 5 shows the relationship between the number of predicted new indications and the number of drugs in the specific embodiment.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The present invention is built on the analysis of drug-disease associations, drug-gene associations, disease-gene associations, and gene associations. The joint space is constructed from the gene interaction network of the gene space, and the feature vectors of drugs and diseases are then obtained from the joint space. In this way the topological features of the gene space are incorporated into the feature spaces of drugs and diseases. Next, under the supervision of the drug-disease association data, a matrix factorization model updates and adjusts the feature vectors of drugs and diseases. Finally, the adjusted feature spaces of drugs and diseases are used to screen potential drug-disease associations and thus predict new indications for drugs.
Fig. 1 is a flow chart of the method provided by the present invention. As shown in Fig. 1, the present invention provides a matrix factorization method fused with gene space for predicting new uses of old drugs. The method comprises the following steps:
Step S1: Collect association data for a known drug set and disease set.
Preferably, step S1 includes:
Step S11: Collect the known drug-disease association set. The drug set involved is denoted $\{r_1, \dots, r_{N_r}\}$ and the disease set involved is denoted $\{s_1, \dots, s_{N_s}\}$, where $N_r$ is the number of drugs and $N_s$ is the number of diseases. Drugs are represented by DrugBank IDs and diseases by OMIM IDs.
Step S12: Construct the drug-disease association matrix from the drug-disease association set. Specifically, construct the $N_r \times N_s$ drug-disease association matrix $Y$: if drug $r_i$ and disease $s_j$ are associated in the drug-disease association set, the corresponding entry is $Y_{ij} = 1$; all other entries of the matrix are unknown.
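The construction of $Y$ in step S12 can be sketched in a few lines of Python. This is only an illustration: the identifiers `drug_A`, `disease_X`, and so on are hypothetical placeholders rather than real DrugBank or OMIM IDs, and representing an unknown entry by `None` is an assumption of this sketch.

```python
def build_association_matrix(drugs, diseases, known_pairs):
    """Return an N_r x N_s matrix: 1 for a known association, None for unknown."""
    drug_index = {drug: i for i, drug in enumerate(drugs)}
    disease_index = {disease: j for j, disease in enumerate(diseases)}
    # Every entry starts as unknown (None is this sketch's choice of marker).
    Y = [[None] * len(diseases) for _ in drugs]
    for drug, disease in known_pairs:
        Y[drug_index[drug]][disease_index[disease]] = 1
    return Y

# Hypothetical identifiers, for illustration only.
drugs = ["drug_A", "drug_B", "drug_C"]
diseases = ["disease_X", "disease_Y"]
known_pairs = [("drug_A", "disease_X"), ("drug_C", "disease_Y")]
Y = build_association_matrix(drugs, diseases, known_pairs)
```

Keeping "unknown" distinct from 0 matters later, because 0 is reserved for the negative samples drawn in step S7.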
Step S2: Obtain gene-space data, which include the drug-gene association set related to the known drugs, the disease-gene association set related to the diseases in step S1, and the gene association set.
The gene association set is the set of gene-gene associations; the gene space is the metric space of genes.
Preferably, step S2 includes:
Step S21: Collect the drug-gene association set related to the drugs of step S1 from the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases. Genes are represented by Entrez Gene IDs.
Step S22: Collect the disease-gene association set related to the diseases of step S1 from the OMIM database. Genes are represented by Entrez Gene IDs.
Step S23: Extract the gene association set from the HPRD database. Genes are represented by Entrez Gene IDs.
Step S3: Extract the gene closeness matrix from the gene-space data.
The gene closeness matrix describes the degree of association between genes.
Preferably, step S3 includes:
Step S31: Build the gene interaction network from the gene association set.
The gene interaction network is the network formed by gene-gene associations. Specifically, nodes represent the genes contained in the gene association set and edges represent the gene-gene associations in that set; connecting the nodes by these edges yields the gene interaction network.
Step S32: Take all genes involved in the drug-gene association set and the disease-gene association set as the gene set of interest, and compute the closeness between these genes.
Specifically, all genes involved in the drug-gene association set and the disease-gene association set form the gene set of interest $\{g_1, \dots, g_{N_g}\}$, where $N_g$ is the number of genes. The closeness between the $i$-th gene $g_i$ and the $j$-th gene $g_j$ is calculated by an expression in $d_{ij}$, $a'$, and $b'$, where $d_{ij}$ is the shortest distance between $g_i$ and $g_j$ in the gene interaction network and $a'$ and $b'$ are tuning parameters. When two genes are unreachable from each other in the network, $d_{ij}$ is defined as infinity. Computing the closeness between every pair of genes in the gene set of interest yields the $N_g \times N_g$ gene closeness matrix $C$.
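A minimal sketch of step S32 follows. The shortest distances $d_{ij}$ are obtained by breadth-first search over an unweighted network. The closeness expression itself is not reproduced in this text, so the exponential decay $a' e^{-b' d_{ij}}$ used below is purely an assumption chosen for illustration (it maps an infinite distance to a closeness of 0); the gene names are hypothetical.

```python
import math
from collections import deque

def shortest_distances(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def closeness_matrix(edges, genes_of_interest, a=10.0, b=0.25):
    """Build the N_g x N_g closeness matrix for the genes of interest."""
    adjacency = {}
    for u, v in edges:  # undirected gene-gene associations
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    C = []
    for gi in genes_of_interest:
        dist = shortest_distances(adjacency, gi)
        row = []
        for gj in genes_of_interest:
            d = dist.get(gj, math.inf)  # unreachable genes: d = infinity
            # ASSUMED functional form, not taken from the source text.
            row.append(a * math.exp(-b * d))
        C.append(row)
    return C

# Hypothetical toy network: g1 - g2 - g3, with g4 isolated.
edges = [("g1", "g2"), ("g2", "g3")]
genes = ["g1", "g3", "g4"]
C = closeness_matrix(edges, genes)
```

Note that the search runs over the whole network, while only rows and columns for the genes of interest are kept, so paths may pass through genes outside that set.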
Step S4: Perform eigenvalue decomposition on the gene closeness matrix to generate a low-dimensional Euclidean space in which each gene in the gene set of interest is represented by a low-dimensional feature vector, thereby constructing the joint space.
The joint space is a unified space in which drugs, diseases, and genes can all be measured.
Preferably, step S4 includes:
Step S41: Compute the eigenvalues of the gene closeness matrix $C$, sort them in descending order, and obtain the corresponding eigenvectors.
Step S42: Take the first $k$ eigenvalues to form a $k \times k$ diagonal matrix $\Lambda$, and take the corresponding $k$ eigenvectors to form an $N_g \times k$ matrix $\Gamma$, in which each column is one eigenvector. $k$ is a natural number, for example 8.
Step S43: Compute the $N_g \times k$ matrix $P = \Gamma \Lambda^{1/2}$, by which the gene closeness matrix can be decomposed as $C = P P^{\mathsf T} = \Gamma \Lambda^{1/2} \Lambda^{1/2} \Gamma^{\mathsf T}$.
Step S44: Represent each gene by the corresponding row vector of matrix $P$. The resulting set of $k$-dimensional vectors constitutes the joint space.
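Steps S41 to S44 can be sketched with NumPy as follows. The sketch assumes $C$ is symmetric, so `numpy.linalg.eigh` applies. Clipping negative eigenvalues to zero before taking the square root is an assumption added here to keep $P$ real; the source does not say how such eigenvalues are handled. The small matrix is a made-up example.

```python
import numpy as np

def joint_space(C, k):
    """Return the N_g x k matrix P = Gamma * Lambda^(1/2) from the top-k eigenpairs."""
    eigenvalues, eigenvectors = np.linalg.eigh(C)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1][:k]      # indices of the k largest
    top_values = eigenvalues[order]
    Gamma = eigenvectors[:, order]                 # one eigenvector per column
    # Assumption: negative eigenvalues are clipped to 0 so the root is real.
    return Gamma * np.sqrt(np.clip(top_values, 0.0, None))

# Made-up symmetric closeness matrix for three genes.
C = np.array([[10.0, 6.0, 0.0],
              [6.0, 10.0, 0.0],
              [0.0, 0.0, 10.0]])
P = joint_space(C, k=3)  # row i of P is the feature vector of gene i
```

With $k$ equal to the full dimension and no negative eigenvalues, $P P^{\mathsf T}$ reproduces $C$ exactly; with a smaller $k$ it is a low-rank approximation.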
Step S5: Compute the feature spaces of drugs and diseases from the joint space according to the drug-gene association set and the disease-gene association set.
In this way, the drug space, the disease space, and the gene space are unified in the joint space.
Preferably, step S5 includes:
Step S51: For each drug $r_i$, traverse the drug-gene association set and collect the set of associated genes, denoted $G_{r_i}$. The feature vector of the drug is calculated according to the following expression:
$$a_i = \frac{1}{|G_{r_i}|} \sum_{g \in G_{r_i}} p_g$$
where $p_g$ is the feature vector of gene $g$ in the joint space and $|G_{r_i}|$ is the number of genes associated with drug $r_i$.
Step S52: For each disease $s_i$, traverse the disease-gene association set and collect the set of associated genes, denoted $G_{s_i}$. The feature vector of the disease is calculated according to the following expression:
$$b_i = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$$
where $|G_{s_i}|$ is the number of genes associated with disease $s_i$.
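A short sketch of step S5, assuming that the feature vector of a drug or disease is the mean of the joint-space vectors of its associated genes, which is what the definitions of the gene set and its size suggest. All names and numbers below are hypothetical.

```python
def mean_feature_vector(associated_genes, gene_vectors):
    """Average the joint-space vectors of the given genes, component by component."""
    vectors = [gene_vectors[gene] for gene in associated_genes]
    k = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(k)]

# Hypothetical 2-dimensional gene vectors (rows of P) and association sets.
gene_vectors = {"g1": [1.0, 0.0], "g2": [3.0, 2.0], "g3": [0.0, 4.0]}
drug_genes = {"drug_A": ["g1", "g2"]}
disease_genes = {"disease_X": ["g2", "g3"]}

drug_features = {r: mean_feature_vector(g, gene_vectors) for r, g in drug_genes.items()}
disease_features = {s: mean_feature_vector(g, gene_vectors) for s, g in disease_genes.items()}
```

Because drugs and diseases are both expressed as averages of gene vectors, they end up in the same $k$-dimensional space and can be compared directly.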
Step S6: Establish a matrix factorization model from the drug feature matrix and the disease feature matrix, and initialize the model with the feature vectors of drugs and diseases in the joint space.
A matrix factorization model is a computational model that describes the associations between objects by latent feature vectors. The model can be set up as $F_{ij} = \dfrac{1}{1 + e^{-a_i b_j^{\mathsf T}}}$, where the $N_r \times k$ matrix $A$ with rows $a_i$ is the drug feature matrix and the $N_s \times k$ matrix $B$ with rows $b_j$ is the disease feature matrix.
Matrices $A$ and $B$ are initialized with the feature vectors of drugs and diseases in the joint space: row $i$ of $A$ is set to the feature vector of drug $r_i$, and row $j$ of $B$ is set to the feature vector of disease $s_j$, as computed in step S5.
Step S7: Obtain training samples from the drug set, the disease set, and the drug-disease association set.
The samples in the drug-disease association set are used as positive samples. Drugs and diseases are picked at random from the drug set and the disease set and combined into drug-disease pairs that do not appear among the positive samples. As many such pairs are generated as there are positive samples; they are assumed to have no association and are used as negative samples. The entries of the drug-disease association matrix $Y$ corresponding to negative samples are set to $Y_{ij} = 0$, and the remaining entries stay unknown. The positive and negative samples together form the training sample set.
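The negative sampling of step S7 might be sketched as below. Drawing the negatives without replacement, so that no pair is repeated, is an assumption of this sketch, as are the identifiers; `random.sample` raises `ValueError` if fewer candidate pairs exist than positives.

```python
import random

def sample_negatives(drugs, diseases, positives, rng):
    """Draw as many non-positive drug-disease pairs as there are positives."""
    positive_set = set(positives)
    candidates = [(r, s) for r in drugs for s in diseases
                  if (r, s) not in positive_set]
    # Assumption: negatives are distinct (sampling without replacement).
    return rng.sample(candidates, len(positives))

# Hypothetical identifiers, for illustration only.
drugs = ["drug_A", "drug_B", "drug_C"]
diseases = ["disease_X", "disease_Y"]
positives = [("drug_A", "disease_X"), ("drug_C", "disease_Y")]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
negatives = sample_negatives(drugs, diseases, positives, rng)
training_set = [(r, s, 1) for r, s in positives] + [(r, s, 0) for r, s in negatives]
```

Passing a fresh or differently seeded generator on each repetition gives a different negative set for each run.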
Step S8: Train the matrix factorization model with the training samples.
The model training algorithm can be summarized as follows.
Preferably, step S8 includes:
Step S81: Using each sample of the training sample set in turn, update the feature vectors of the drug and the disease according to the following expressions:
$$a_i \leftarrow a_i + \eta \left( e_{ij}\, b_j\, F_{ij} (1 - F_{ij}) - \lambda a_i \right)$$
$$b_j \leftarrow b_j + \eta \left( e_{ij}\, a_i\, F_{ij} (1 - F_{ij}) - \lambda b_j \right)$$
where $F_{ij} = \dfrac{1}{1 + e^{-a_i b_j^{\mathsf T}}}$ is the current predicted degree of association between drug $r_i$ and disease $s_j$, $e_{ij} = Y_{ij} - F_{ij}$ is the error between the actual value and the predicted value, $\eta$ is the learning rate, and $\lambda$ is the penalty (regularization) coefficient. One pass over all samples counts as one iteration.
Step S82: Repeat step S81 until the maximum number of iterations is reached.
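Steps S81 and S82 amount to stochastic gradient updates, which can be sketched as below. Several points are assumptions: the predicted value is taken to be the logistic sigmoid of $a_i \cdot b_j$ (inferred from the $F_{ij}(1 - F_{ij})$ factor in the update rule), both vectors are updated from their values before the step, and the learning rate, penalty coefficient, iteration count, and starting vectors are arbitrary illustration values.

```python
import math

def predict(A, B, i, j):
    """Assumed model: sigmoid of the inner product of a_i and b_j."""
    dot = sum(x * z for x, z in zip(A[i], B[j]))
    return 1.0 / (1.0 + math.exp(-dot))

def train(A, B, samples, eta=0.1, lam=0.01, max_iter=100):
    """Update A and B in place; samples is a list of (i, j, y) with y in {0, 1}."""
    for _ in range(max_iter):            # step S82: repeat up to max_iter passes
        for i, j, y in samples:          # step S81: one update per sample
            a_i, b_j = A[i], B[j]
            F = predict(A, B, i, j)
            g = (y - F) * F * (1.0 - F)  # e_ij * F_ij * (1 - F_ij)
            A[i] = [x + eta * (g * z - lam * x) for x, z in zip(a_i, b_j)]
            B[j] = [z + eta * (g * x - lam * z) for x, z in zip(a_i, b_j)]
    return A, B

# One drug, two diseases; the first pair is positive, the second negative.
A = [[0.1, 0.2]]
B = [[0.3, -0.1], [-0.2, 0.1]]
samples = [(0, 0, 1), (0, 1, 0)]
before = [predict(A, B, i, j) for i, j, _ in samples]
train(A, B, samples, eta=0.5, lam=0.01, max_iter=200)
after = [predict(A, B, i, j) for i, j, _ in samples]
```

In the method itself the starting values of `A` and `B` would come from the joint-space feature vectors rather than from hand-picked numbers.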
Step S9: Use the trained matrix factorization model to predict the unknown entries of the drug-disease association matrix outside the positive and negative samples.
For a drug $r_i$ and a disease $s_j$ whose association is unknown, the degree of association between them is predicted by the following model expression:
$$F_{ij} = \frac{1}{1 + e^{-a_i b_j^{\mathsf T}}}$$
where $a_i$ and $b_j$ are the feature vectors of drug $r_i$ and disease $s_j$ after training.
Step S10: Return to step S7 and repeat the processing of steps S7, S8, and S9 multiple times, predicting the degree of association for every possible drug-disease combination outside the positive samples.
For example, the processing of steps S7, S8, and S9 is repeated 100 times, and the degree of association of every possible drug-disease combination outside the positive samples is predicted by averaging.
Preferably, negative samples are randomly regenerated in step S7, the matrix factorization model is retrained in step S8, and the unknown entries of the drug-disease association matrix outside the positive and negative samples are predicted in step S9. This process can be repeated, for example, 100 times.
For a combination of drug $r_i$ and disease $s_j$ outside the positive samples, let $N$ be the number of times the combination was selected as a negative sample and let $S$ be the sum of its predicted degrees of association over the $100 - N$ models in which it was not selected as a negative sample. The association prediction score of the combination is then calculated as:
$$\mathrm{score} = \frac{S}{100 - N}$$
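The averaging in step S10 can be sketched as follows. The sketch assumes each run reports predictions only for the pairs it did not use as negative samples, so the number of runs contributing to a pair equals the total number of runs minus $N$. The pair names and values are made up, and three runs stand in for the 100 of the example.

```python
def average_scores(run_predictions):
    """Average each pair's predictions over the runs in which it was predicted.

    run_predictions: one dict per run, mapping (drug, disease) to the predicted
    degree of association; pairs used as negatives in a run are absent from it.
    """
    totals, counts = {}, {}
    for predictions in run_predictions:
        for pair, value in predictions.items():
            totals[pair] = totals.get(pair, 0.0) + value   # running sum S
            counts[pair] = counts.get(pair, 0) + 1         # runs minus N
    return {pair: totals[pair] / counts[pair] for pair in totals}

# Made-up predictions from three runs.
pair_1 = ("drug_A", "disease_Y")
pair_2 = ("drug_B", "disease_X")
runs = [
    {pair_1: 0.8, pair_2: 0.2},
    {pair_2: 0.4},               # pair_1 was a negative sample in this run
    {pair_1: 0.6, pair_2: 0.3},
]
scores = average_scores(runs)
```

A pair that happens to be drawn as a negative sample in every run never receives a score under this scheme.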
Step S11: Screen drug-disease associations according to the degrees of association predicted in step S10, and predict new indications for drugs.
The drug-disease combinations outside the positive samples are ranked from the highest to the lowest association prediction score obtained in step S10. A score threshold is set to screen potential drug-disease associations and thus predict new indications for drugs.
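The ranking and thresholding of step S11 is simple enough to show in a few lines. The threshold value and the choice to keep pairs whose score equals the threshold are assumptions of this sketch; the scores are made up.

```python
def screen_candidates(scores, threshold):
    """Rank pairs by score, highest first, and keep those at or above threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Assumption: a score exactly equal to the threshold is kept.
    return [(pair, score) for pair, score in ranked if score >= threshold]

# Made-up association prediction scores for pairs outside the positive samples.
scores = {
    ("drug_A", "disease_Y"): 0.91,
    ("drug_B", "disease_X"): 0.35,
    ("drug_C", "disease_X"): 0.78,
}
candidates = screen_candidates(scores, threshold=0.5)
```

Each pair that survives the threshold is a candidate new indication for the drug in that pair.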
The features and advantages of the present invention are illustrated below by a specific embodiment, which uses drug-disease association data extracted from the literature and public databases to predict new uses for old drugs. Fig. 2 is a schematic diagram of the operating procedure of the embodiment. As shown in Fig. 2, the operation comprises the following steps:
Step S1: Collect association data for a known drug set and disease set.
The drug-disease association set is collected from the literature and public databases, giving 213 drug-disease associations between 130 drugs and 50 diseases. A 130 × 50 drug-disease association matrix Y is constructed, in which the entries corresponding to the association set are set to 1.
Step S2: Obtain gene-space data, which include the drug-gene association set related to the known drugs, the disease-gene association set related to the diseases in step S1, and the gene association set.
The drug-gene association set for the 130 drugs is collected from the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases, giving 776 drug-gene associations. The disease-gene association set for the 50 diseases is collected from the OMIM database, giving 74 disease-gene associations. The gene association set is extracted from the HPRD database and covers 36882 gene interactions among 9415 genes.
Step S3: Extract the gene closeness matrix from the gene space data.
The drug-gene and disease-gene association sets involve 850 genes in total. A gene interaction network is built from the gene association set. The closeness between gene $g_i$ and gene $g_j$ is calculated from their distance $d_{ij}$ with the parameters A′ = 10 and b′ = 0.25, where $d_{ij}$ denotes the shortest distance between $g_i$ and $g_j$ in the gene interaction network; when two genes are unreachable from each other in the network, $d_{ij}$ is defined as infinity. The closeness between every pair of the 850 genes is calculated in this way, forming the 850 × 850 gene closeness matrix C.
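The computation in step S3 can be sketched as a breadth-first search for shortest-path distances followed by a distance-to-closeness mapping. The closeness expression itself is not reproduced in this text, so the exponential decay form a * exp(-b * d) used below is purely an assumption, chosen because it yields 0 for unreachable pairs (infinite distance); the function names and the toy network are likewise hypothetical.

```python
import math
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_matrix(adjacency, genes, a=10.0, b=0.25):
    """Closeness under the ASSUMED form a * exp(-b * d_ij); unreachable pairs get 0."""
    C = []
    for gi in genes:
        dist = shortest_path_lengths(adjacency, gi)
        C.append([a * math.exp(-b * dist[gj]) if gj in dist else 0.0 for gj in genes])
    return C

# Toy network: g1 - g2 - g3 form a chain, g4 is isolated.
toy_net = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2"], "g4": []}
# Only the genes of interest enter the matrix; g2 still serves as a path intermediate.
C = closeness_matrix(toy_net, ["g1", "g3", "g4"])
```

As in the embodiment, distances are measured in the full network (9415 genes there) while the matrix is restricted to the genes of interest (850 there).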
Step S4: Perform eigenvalue decomposition on the gene closeness matrix to generate a low-dimensional Euclidean space, so that each gene in the gene set of interest can be represented by a low-dimensional feature vector, thereby constructing the joint space.
In this embodiment, eigenvalue decomposition is performed on the gene closeness matrix C. Taking k = 8, a set of 8-dimensional vectors $\{p_g\}$ is obtained, one for each of the 850 genes; these vectors are the feature vectors of the genes and constitute the joint space.
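A minimal sketch of step S4, following the decomposition $P = \Gamma \Lambda^{1/2}$ given in claim 6, is shown below with NumPy. The function name and the toy matrix are hypothetical, and clipping negative eigenvalues to zero before taking the square root is an assumption of this sketch that the source does not discuss.

```python
import numpy as np

def joint_space(C, k):
    """Embed genes as rows of P = Gamma * Lambda^(1/2) using the top-k eigenpairs of C."""
    eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues of a symmetric matrix, ascending
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)   # assumption: clip negatives before the square root
    return eigvecs[:, order] * np.sqrt(lam)    # N_g x k matrix; each row is one gene vector

# Toy symmetric closeness matrix for 3 genes (eigenvalues 16, 10 and 4).
C = np.array([[10.0, 6.0, 0.0],
              [6.0, 10.0, 0.0],
              [0.0, 0.0, 10.0]])
P = joint_space(C, k=2)
```

With k smaller than the number of genes, the product of P and its transpose is the best rank-k approximation of C rather than C itself; the embodiment uses k = 8 for 850 genes.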
Step S5: Based on the drug-gene association set and the disease-gene association set, calculate the feature spaces of the drugs and diseases from the joint space.
For each drug $r_i$, the drug-gene association set is traversed and the set of genes associated with the drug is collected, denoted $G_{r_i}$. The feature vector of the drug is calculated according to the following expression:
$$a_i = \frac{1}{|G_{r_i}|} \sum_{g \in G_{r_i}} p_g$$
where $p_g$ is the feature vector of gene $g$ in the joint space and $|G_{r_i}|$ is the number of genes associated with drug $r_i$.
For each disease $s_i$, the disease-gene association set is traversed and the set of genes associated with the disease is collected, denoted $G_{s_i}$. The feature vector of the disease is calculated according to the following expression:
$$b_i = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$$
where $|G_{s_i}|$ is the number of genes associated with disease $s_i$.
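Step S5 can be illustrated as follows. The sketch assumes that a drug or disease vector is the plain average of the joint-space vectors of its associated genes, which is the reading suggested by the gene-count term in the text; the function name, the gene-to-row index and the toy values are hypothetical.

```python
import numpy as np

def entity_features(P, gene_index, associations):
    """Average the joint-space vectors of the genes linked to each drug or disease."""
    features = {}
    for entity, genes in associations.items():
        rows = [gene_index[g] for g in genes]     # rows of P for the associated genes
        features[entity] = P[rows].mean(axis=0)   # assumed aggregation: arithmetic mean
    return features

# Toy joint space with 3 genes in 2 dimensions.
P = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
gene_index = {"g1": 0, "g2": 1, "g3": 2}
drug_genes = {"drugA": ["g1", "g3"], "drugB": ["g2"]}
A = entity_features(P, gene_index, drug_genes)
```

The same function would serve for diseases by passing the disease-gene associations instead.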
Step S6: Build a matrix factorization model from the feature matrix of the drugs and the feature matrix of the diseases, and initialize the model with the feature vectors of the drugs and diseases in the joint space.
In this embodiment, the matrix factorization model $F_{ij} = 1/(1 + e^{-a_i b_j^{T}})$ is built, and the matrices A and B, whose rows are the vectors $a_i$ and $b_j$, are initialized with the feature vectors of the drugs and diseases in the joint space.
Step S7: Obtain training samples from the drug set, the disease set and the drug-disease association set.
In this embodiment, the 213 drug-disease associations in the drug-disease association set are used as positive samples. Drugs and diseases are drawn at random from the 130 drugs and 50 diseases and paired, constructing drug-disease pairs that do not occur among the positive samples. As many such pairs are generated as there are positive samples; they are assumed to have no association and are used as negative samples. The entries of the drug-disease association matrix Y corresponding to the negative samples are set to 0, and the remaining entries are unknown.
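The negative sampling of step S7 might look like the sketch below, which draws random pairs and rejects any that are positive samples or already drawn. The function name and the toy sizes are hypothetical, and rejection sampling is only one possible way to realize the random selection described in the text.

```python
import random

def sample_negatives(n_drugs, n_diseases, positives, rng):
    """Draw as many random drug-disease pairs as there are positives, avoiding the positives.

    Assumes there are at least as many non-positive pairs as positives;
    otherwise the loop could not terminate.
    """
    positives = set(positives)
    negatives = set()
    while len(negatives) < len(positives):
        pair = (rng.randrange(n_drugs), rng.randrange(n_diseases))
        if pair not in positives:
            negatives.add(pair)
    return sorted(negatives)

positives = [(0, 1), (2, 0), (2, 3)]
negatives = sample_negatives(3, 4, positives, random.Random(0))
```

In the embodiment this yields 213 negative samples, giving 426 training samples together with the positives.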
Step S8: Train the matrix factorization model with the training samples.
Each of the 426 training samples is used in turn to update the feature vectors of the drug and the disease according to the following expressions:
$$a_i \leftarrow a_i + \eta \left( e_{ij}\, b_j\, F_{ij} (1 - F_{ij}) - \lambda a_i \right)$$
$$b_j \leftarrow b_j + \eta \left( e_{ij}\, a_i\, F_{ij} (1 - F_{ij}) - \lambda b_j \right)$$
where $F_{ij} = 1/(1 + e^{-a_i b_j^{T}})$ is the current predicted degree of association between drug $r_i$ and disease $s_j$, $e_{ij} = Y_{ij} - F_{ij}$ is the error between the actual value and the predicted value, the learning rate is η = 0.035 and the penalty coefficient is λ = 0.005. The maximum number of iterations is set to 1500. Fig. 3A and Fig. 3B take etoposide and myasthenia gravis as examples, respectively, and show how the feature vectors of a drug and a disease gradually stabilize during model training.
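The update rules of step S8 can be sketched as plain stochastic gradient descent on a sigmoid model. The sigmoid form of $F_{ij}$ is inferred from the $F_{ij}(1 - F_{ij})$ factor in the updates, and using the old value of $a_i$ when updating $b_j$ is an assumption; the toy data and the random initialization are hypothetical (the patent initializes from the joint space instead).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(A, B, samples, eta=0.035, lam=0.005, max_iter=1500):
    """Update drug rows A[i] and disease rows B[j] in place, one sample at a time."""
    for _ in range(max_iter):                 # one pass over all samples = one iteration
        for i, j, y in samples:
            f = sigmoid(A[i] @ B[j])          # predicted degree of association F_ij
            g = (y - f) * f * (1.0 - f)       # e_ij * F_ij * (1 - F_ij)
            a_old = A[i].copy()               # assumption: both updates use the old vectors
            A[i] += eta * (g * B[j] - lam * A[i])
            B[j] += eta * (g * a_old - lam * B[j])
    return A, B

rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(3, 2))        # toy initial drug vectors
B = rng.normal(scale=0.1, size=(4, 2))        # toy initial disease vectors
samples = [(0, 1, 1.0), (2, 0, 1.0), (1, 2, 0.0), (0, 3, 0.0)]  # (drug, disease, label)
before = sum((y - sigmoid(A[i] @ B[j])) ** 2 for i, j, y in samples)
train(A, B, samples)
after = sum((y - sigmoid(A[i] @ B[j])) ** 2 for i, j, y in samples)
```

Comparing the squared error before and after training gives a quick check that the updates move the predictions toward the labels.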
Step S9: Use the trained matrix factorization model to predict the unknown entries of the drug-disease association matrix Y, that is, the entries that are neither positive nor negative samples.
For a drug $r_i$ and a disease $s_j$ whose association is unknown, the degree of association between them is predicted according to the following model expression:
$$F_{ij} = \frac{1}{1 + e^{-a_i b_j^{T}}}$$
where $a_i$ and $b_j$ are the feature vectors of drug $r_i$ and disease $s_j$ after training.
Step S10: Return to step S7 and repeat the processing of steps S7, S8 and S9 several times, predicting the degree of association of every possible drug-disease combination other than the positive samples.
In this embodiment, the processing of steps S7, S8 and S9 is repeated 100 times, and the degree of association of every possible drug-disease combination other than the positive samples is predicted by averaging.
For a combination of drug $r_i$ (1 ≤ i ≤ 130) and disease $s_j$ (1 ≤ j ≤ 50) other than the positive samples, let N be the number of times the combination was selected as a negative sample, and let S be the sum of its predicted degrees of association over the 100 − N models in which it was not selected as a negative sample. The degree-of-association prediction score of the combination is then calculated according to the following expression:
$$\mathrm{score} = \frac{S}{100 - N}$$
In this way, prediction scores are calculated for 6287 drug-disease combinations in total (130 × 50 − 213 = 6287). The distribution of the prediction scores is shown in Fig. 4.
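The averaging in step S10 can be written out as a small helper. The function name, the toy run data and the choice to return `None` when a pair was a negative sample in every run are assumptions of this sketch; the count of candidate combinations is taken directly from the figures in the text.

```python
def association_score(predictions, was_negative):
    """Average the predictions over the runs in which the pair was not a negative sample."""
    kept = [p for p, neg in zip(predictions, was_negative) if not neg]
    if not kept:
        return None   # assumption: no score if the pair was a negative sample in every run
    return sum(kept) / len(kept)   # S / (number of runs - N)

# Toy example with 4 runs instead of 100; the pair was a negative sample in the last two.
preds = [0.5, 0.25, 0.75, 0.0]
flags = [False, False, True, True]
score = association_score(preds, flags)

# Number of candidate combinations in the embodiment: all pairs minus the positives.
n_candidates = 130 * 50 - 213
```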
Step S11: Screen drug-disease associations according to the degrees of association predicted in step S10, and predict new indications for the drugs.
The drug-disease combinations other than the positive samples are ranked by the 6287 prediction scores, from high to low. Setting the score threshold to 0.6459 screens out 500 potential drug-disease associations, which are used to predict new indications for the drugs and thus achieve new uses for old drugs. These 500 potential drug-disease associations cover 125 drugs and 40 diseases; Fig. 5 shows, for each number of new indications, how many drugs obtain that number.
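The ranking and thresholding of step S11 reduce to a sort and a filter, as in the sketch below. The function name and the toy scores are hypothetical, and whether a score exactly equal to the threshold is kept is an assumption, since the source does not say.

```python
def screen_candidates(scores, threshold):
    """Sort candidate pairs by score, highest first, and keep those at or above the threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(pair, s) for pair, s in ranked if s >= threshold]

toy_scores = {
    ("drugA", "disease1"): 0.91,
    ("drugA", "disease2"): 0.40,
    ("drugB", "disease1"): 0.70,
}
hits = screen_candidates(toy_scores, threshold=0.6459)
```

In the embodiment the threshold 0.6459 retains 500 of the 6287 scored combinations.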
For the predicted potential drug-disease associations, clinical studies and the medical literature verify that they are comparatively accurate and valid. Through these potential drug-disease associations, new indications can be predicted for drugs, thereby achieving new uses for old drugs.
The specific embodiment described above explains the purpose, technical solution and beneficial effects of the present invention in further detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (12)
1. A method for predicting new indications for old drugs by matrix factorization based on gene space fusion, characterized in that the method comprises the following steps:
Step S1: Collect the known drug-disease association set, where the drugs involved are denoted $\{r_1, r_2, \ldots, r_{N_r}\}$, the diseases involved are denoted $\{s_1, s_2, \ldots, s_{N_s}\}$, $N_r$ is the number of drugs and $N_s$ is the number of diseases; construct the drug-disease association matrix from the known drug-disease association set;
Step S2: Obtain the gene space data, which comprise the drug-gene association set related to the known drugs, the disease-gene association set related to the diseases in step S1, and the gene association set;
Step S3: Extract the gene closeness matrix from the gene space data;
Step S4: Perform eigenvalue decomposition on the gene closeness matrix to generate a low-dimensional Euclidean space, so that each gene in the gene set of interest can be represented by a low-dimensional feature vector, thereby constructing the joint space, the gene set of interest comprising all genes involved in the drug-gene association set and the disease-gene association set;
Step S5: Based on the drug-gene association set and the disease-gene association set, calculate the feature matrices of the drugs and diseases from the joint space;
Step S6: Build a matrix factorization model from the feature matrix of the drugs and the feature matrix of the diseases, and initialize the model with the feature vectors of the drugs and diseases in the joint space;
Step S7: Obtain training samples from the drug set, the disease set and the drug-disease association set;
Step S8: Train the matrix factorization model with the training samples;
Step S9: Use the trained matrix factorization model to predict the drug-disease combinations other than the training samples;
Step S10: Return to step S7 and repeat the processing of steps S7, S8 and S9 several times, predicting the degree of association of every possible drug-disease combination other than the positive samples, the positive samples being the samples in the known drug-disease association set;
Step S11: Screen drug-disease associations according to the degrees of association predicted in step S10, and predict new indications for the drugs.
2. The method of claim 1, characterized in that, in step S2, the gene space data comprise the drug-gene association set, the disease-gene association set and the gene association set, and these data are obtained by querying databases.
3. The method of claim 2, characterized in that the drug-gene association set is obtained by querying the KEGG BRITE, BRENDA, SuperTarget and DrugBank databases, the disease-gene association set is obtained by querying the OMIM database, and the gene association set is obtained by querying the HPRD database.
4. The method of claim 2, characterized in that genes are represented by their Entrez Gene IDs.
5. The method of claim 1, characterized in that step S3 further comprises:
Step S31: Build a gene interaction network from the gene association set;
Step S32: Calculate the closeness between the genes in the gene set of interest.
6. The method of claim 1, characterized in that step S4 comprises:
Step S41: Calculate the eigenvalues of the gene closeness matrix and sort them in descending order, obtaining the corresponding eigenvectors at the same time;
Step S42: Take the first k eigenvalues to form the k × k diagonal matrix $\Lambda$, and take the corresponding k eigenvectors to form the $N_g \times k$ matrix $\Gamma$, in which each column corresponds to one eigenvector, k is a natural number and $N_g$ is the number of genes in the gene set of interest;
Step S43: Calculate the $N_g \times k$ matrix $P = \Gamma \Lambda^{1/2}$; with this matrix, the gene closeness matrix can be decomposed as $C = P P^{T} = \Gamma \Lambda^{1/2} \Lambda^{1/2} \Gamma^{T}$;
Step S44: Represent each gene $g$ by the corresponding row vector $p_g$ of matrix P; the resulting set of k-dimensional vectors $\{p_g\}$ constitutes the joint space.
7. The method of claim 1, characterized in that, in step S5, the feature space of the diseases is calculated according to the following expression:
$$b_i = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$$
where $p_g$ is the feature vector of gene $g$ in the joint space, $G_{s_i}$ is the set of genes associated with disease $s_i$ in the disease-gene association set, and $|G_{s_i}|$ is the number of genes associated with disease $s_i$.
8. The method of claim 1, characterized in that, in step S7, drugs and diseases are drawn at random from the drug set and the disease set and paired, constructing drug-disease pairs that do not occur among the positive samples; as many drug-disease pairs are generated as there are positive samples, and they are used as negative samples; the entries of the drug-disease association matrix Y corresponding to the negative samples are set to $Y_{ij} = 0$.
9. The method of claim 1, characterized in that step S8 further comprises:
Step S81: Use each training sample in turn to update the feature vectors of the drug and the disease according to the following expressions:
$$a_i \leftarrow a_i + \eta \left( e_{ij}\, b_j\, F_{ij} (1 - F_{ij}) - \lambda a_i \right)$$
$$b_j \leftarrow b_j + \eta \left( e_{ij}\, a_i\, F_{ij} (1 - F_{ij}) - \lambda b_j \right)$$
where $a_i$ is the feature vector of the drug, $b_j$ is the feature vector of the disease, $F_{ij} = 1/(1 + e^{-a_i b_j^{T}})$ is the current predicted degree of association between drug $r_i$ and disease $s_j$, $Y_{ij}$ is the actual degree of association between drug $r_i$ and disease $s_j$, $e_{ij} = Y_{ij} - F_{ij}$ is the error between the actual and predicted degrees of association between drug $r_i$ and disease $s_j$, η is the learning rate and λ is the penalty coefficient; one pass over all samples is counted as one iteration;
Step S82: Repeat step S81 until the maximum number of iterations is reached.
10. The method of claim 1, characterized in that, in step S9, the trained matrix factorization model is given by the following expression, and the degree of association of a combination of drug $r_i$ and disease $s_j$ other than the training samples is predicted according to this model:
$$F_{ij} = \frac{1}{1 + e^{-a_i b_j^{T}}}$$
where $a_i$ and $b_j$ are the feature vectors of drug $r_i$ and disease $s_j$ after training.
11. The method of claim 1, characterized in that,
in step S10, the following process is repeated 100 times: step S7 randomly regenerates the negative samples, step S8 retrains the matrix factorization model, and step S9 predicts the unknown entries of the drug-disease association matrix that are neither positive nor negative samples.
12. The method of claim 1, characterized in that, in step S10, for a combination of drug $r_i$ and disease $s_j$ other than the positive samples, where N is the number of times the combination was selected as a negative sample and S is the sum of its predicted degrees of association over the 100 − N models in which it was not selected as a negative sample, the degree-of-association prediction score of the combination is calculated according to the following expression:
$$\mathrm{score} = \frac{S}{100 - N}$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410302140.4A CN104021316B (en) | 2014-06-27 | 2014-06-27 | Based on the method that the matrix decomposition that gene space merges predicts new indication to old medicine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410302140.4A CN104021316B (en) | 2014-06-27 | 2014-06-27 | Based on the method that the matrix decomposition that gene space merges predicts new indication to old medicine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104021316A (en) | 2014-09-03 |
CN104021316B (en) | 2017-04-05 |
Family
ID=51438068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410302140.4A Active CN104021316B (en) | 2014-06-27 | 2014-06-27 | Based on the method that the matrix decomposition that gene space merges predicts new indication to old medicine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104021316B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017100794A1 (en) * | 2015-12-12 | 2017-06-15 | Cipherome, Inc. | Computer-implemented evaluaton of drug safety for a population |
CN105653846B (en) * | 2015-12-25 | 2018-08-31 | 中南大学 | Drug method for relocating based on integrated similarity measurement and random two-way migration |
CN107666403B (en) * | 2016-07-29 | 2022-01-28 | 中兴通讯股份有限公司 | Index data acquisition method and device |
CN107403069B (en) | 2017-07-31 | 2020-05-12 | 京东方科技集团股份有限公司 | System and method for analyzing drug-disease association relationship |
CN108062556B (en) * | 2017-11-10 | 2021-09-14 | 广东药科大学 | Drug-disease relationship identification method, system and device |
CN109935341B (en) * | 2019-04-09 | 2021-04-13 | 北京深度制耀科技有限公司 | Method and device for predicting new drug indication |
CN110957002B (en) * | 2019-12-17 | 2023-04-28 | 电子科技大学 | Drug target interaction relation prediction method based on synergistic matrix decomposition |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1625602A (en) * | 2002-03-13 | 2005-06-08 | 霍夫曼-拉罗奇有限公司 | Method for selecting drug sensitivity decision factor and method for predetermining drug sensitivity using the selected factor |
WO2005114578A1 (en) * | 2004-05-21 | 2005-12-01 | Bioimagene, Inc. | Method and system for automated quantitation of tissue micro-array (tma) digital image analysis |
WO2008054768A2 (en) * | 2006-10-31 | 2008-05-08 | The Board Of Trustees Of The Leland Stanford Junior University | Methods for constructing association maps of imaging data and biological data |
CN102855398A (en) * | 2012-08-28 | 2013-01-02 | 中国科学院自动化研究所 | Method for obtaining disease potentially-associated gene based on multi-source information fusion |
Non-Patent Citations (3)
Title |
---|
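Extracting potentially related genes from biomedical literature using matrix factorization; 张浩 et al.; Journal of Medical Informatics (《医学信息学杂志》); 20131231; Vol. 34, No. 5; pp. 55-60, 70 * |
An improved method for network drug target prediction based on a bipartite graph evaluation model; 刘西 et al.; China Journal of Chinese Materia Medica (《中国中药杂志》); 20120131; Vol. 37, No. 2; pp. 125-129 * |
Prediction of potential cardio-cerebrovascular disease-causing genes based on target identification; 左晓晗 et al.; China Journal of Chinese Materia Medica (《中国中药杂志》); 20120131; Vol. 37, No. 2; pp. 130-133 * |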
Also Published As
Publication number | Publication date |
---|---|
CN104021316A (en) | 2014-09-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||