CN104021316B - Method for predicting new indications for existing drugs based on matrix factorization with gene-space fusion - Google Patents

Publication number
CN104021316B
Authority
CN
China
Prior art keywords
disease
drug
gene
association
set
Legal status
Active
Application number
CN201410302140.4A
Other languages
Chinese (zh)
Other versions
CN104021316A (en)
Inventor
刘西
代文
高波
高一波
卢朋
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201410302140.4A
Publication of CN104021316A
Application granted
Publication of CN104021316B

Abstract

The invention discloses a method for predicting new indications for existing drugs based on matrix factorization with gene-space fusion. The method comprises: collecting drug-disease association data; preparing gene-space data; extracting a gene closeness matrix from the gene space; constructing a joint space from the gene closeness matrix; computing the feature spaces of drugs and diseases from the joint space; initializing a matrix factorization model with the drug and disease feature spaces; preparing training samples; training the matrix factorization model; predicting the degree of association of drug-disease combinations outside the training samples; repeating sample preparation, model training, and prediction multiple times, and predicting the association score of every possible drug-disease combination by averaging; and ranking the association prediction scores and applying a score threshold to screen potential drug-disease associations and predict new indications for drugs. The method can screen potential drug-disease associations more accurately.

Description

Method for predicting new indications for existing drugs based on matrix factorization with gene-space fusion
Technical field
The present invention relates to the application of computer technology in medical research, and in particular to a method for predicting new indications for existing drugs (drug repositioning) based on matrix factorization with gene-space fusion.
Background
Because new drug development requires large investment, long cycles, and high risk, the number of new drugs reaching the market has declined significantly. Developing a new chemical-entity drug often takes more than 10 years and costs more than 1 billion US dollars, and only about one in ten drug candidates is eventually approved for production. In recent years, drug developers have paid increasing attention to drug repositioning, that is, finding new clinical indications for existing drugs. Repositioning can skip toxicology and pharmacokinetic evaluations that have already been performed, which greatly reduces development time and cost. The clinical safety of the drug has already been established, which lowers development risk. In addition, finding new indications extends a drug's service life and broadens its range of indications. Typical examples include thalidomide, Viagra, and finasteride. Thalidomide, for instance, was once widely used to relieve pregnancy reactions and was withdrawn after it caused a large number of fetuses with phocomelia. Scientists later found that thalidomide modulates the human immune system, and in 1998 the U.S. Food and Drug Administration (FDA) approved it for marketing as a treatment for leprosy. However, such new uses have usually come from accidental discoveries. Effective computational methods that support systematic prediction for drug repositioning are therefore essential.
Researchers have proposed several computational models for finding new indications for drugs. These models mainly match attribute data of drugs and diseases to discover new drug-disease associations. The attribute data used include transcriptomic, side-effect, pathway, and gene-profile data. These studies have achieved some results, but they require explicit attribute data to be computed for drugs and diseases before learning, which costs time and effort. In contrast, association data linking drugs and diseases are easier to obtain from public databases. On the Internet, matrix factorization models based on association data have played a major role in user-behavior analysis, and there is reason to believe that such models can be applied to drug-disease association analysis. Moreover, the drug and disease association data accumulated in public databases will become more complete over time. The relationship between a drug and a disease may also be reflected in the gene space, so association information in the gene space can be incorporated into drug-disease association analysis as well.
It is therefore necessary to establish a drug-repositioning research framework that uses association data, and to design a matrix factorization method with gene-space fusion that predicts new drug-disease associations and finds new indications for drugs, thereby increasing the value of existing drugs.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is to provide a method for predicting new indications for existing drugs, which predicts new drug-disease associations, finds new indications for drugs, and thereby achieves drug repositioning.
(2) Technical solution
To solve the above technical problem, the present invention proposes a method for predicting new indications for existing drugs based on matrix factorization with gene-space fusion. The method comprises the following steps:
Step S1: Collect association data for a known drug set and disease set;
Step S2: Obtain gene-space data, which include the drug-gene association set related to the known drugs, the disease-gene association set related to the diseases in step S1, and the gene association set;
Step S3: Extract a gene closeness matrix from the gene-space data;
Step S4: Perform eigenvalue decomposition on the gene closeness matrix to generate a low-dimensional Euclidean space in which each gene in the gene set of interest is represented by a low-dimensional feature vector, thereby constructing the joint space;
Step S5: Compute the feature spaces of drugs and diseases from the joint space according to the drug-gene association set and the disease-gene association set;
Step S6: Establish a matrix factorization model from the drug feature matrix and the disease feature matrix, and initialize the model with the feature vectors of drugs and diseases in the joint space;
Step S7: Obtain training samples from the drug set, the disease set, and the drug-disease association set;
Step S8: Train the matrix factorization model with the training samples;
Step S9: Use the trained matrix factorization model to predict drug-disease combinations outside the training samples;
Step S10: Return to step S7 and repeat the processing of steps S7, S8, and S9 multiple times, predicting the degree of association of every possible drug-disease combination other than the positive samples;
Step S11: Screen drug-disease associations according to the degrees of association predicted in step S10, and predict new indications for the drugs.
According to a specific embodiment of the invention, step S1 further comprises:
Step S11: Collect the set of known drug-disease associations;
Step S12: Construct the drug-disease association matrix from the drug-disease association set.
According to a specific embodiment of the invention, in step S2 the gene-space data include the drug-gene association set, the disease-gene association set, and the gene association set, all of which are obtained by querying databases.
According to a specific embodiment of the invention, the drug-gene association set is obtained by querying the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases; the disease-gene association set is obtained by querying the OMIM database; and the gene association set is obtained by querying the HPRD database.
According to a specific embodiment of the invention, genes are represented by Entrez Gene IDs.
According to a specific embodiment of the invention, step S3 further comprises:
Step S31: Build the gene interaction network from the gene association set;
Step S32: Take all genes involved in the drug-gene association set and the disease-gene association set as the gene set of interest, and compute the closeness between these genes.
According to a specific embodiment of the invention, step S4 comprises:
Step S41: Compute the eigenvalues of the gene closeness matrix, sort them in descending order, and obtain the corresponding eigenvectors;
Step S42: Take the first k eigenvalues to form the k × k diagonal matrix $\Lambda$, and take the corresponding k eigenvectors to form the $N_g \times k$ matrix $\Gamma$, in which each column is one eigenvector and k is a natural number;
Step S43: Compute the $N_g \times k$ matrix $P = \Gamma \Lambda^{1/2}$, with which the gene closeness matrix can be decomposed as $C = PP^{T} = \Gamma \Lambda^{1/2} \Lambda^{1/2} \Gamma^{T}$;
Step S44: Represent each gene $g$ by the corresponding row vector $p_g$ of $P$; the set of k-dimensional vectors $\{p_g\}$ constitutes the joint space.
According to a specific embodiment of the invention, in step S5 the feature vector of disease $s_i$ is computed as the mean of the feature vectors of its associated genes:
$b_i = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$
where $p_g$ is the feature vector of gene $g$ in the joint space, $G_{s_i}$ is the set of genes associated with disease $s_i$ in the disease-gene association set, and $|G_{s_i}|$ is the number of genes associated with disease $s_i$.
According to a specific embodiment of the invention, in step S6 the matrix factorization model is $F_{ij} = 1 / (1 + e^{-a_i b_j^{T}})$, where $F_{ij}$ approximates the entry $Y_{ij}$ of the drug-disease association matrix.
According to a specific embodiment of the invention, $A$ denotes the drug feature matrix, whose i-th row is $a_i$, and $B$ denotes the disease feature matrix, whose j-th row is $b_j$; the model is initialized by setting each $a_i$ to the joint-space feature vector of drug $r_i$ and each $b_j$ to the joint-space feature vector of disease $s_j$.
According to a specific embodiment of the invention, in step S7 the associations in the drug-disease association set are used as positive samples; drugs and diseases are drawn at random from the drug set and the disease set and combined into drug-disease pairs that do not appear among the positive samples; as many such pairs are generated as there are positive samples, and they are used as negative samples; in the drug-disease association matrix $Y$, the entries for negative samples are set to $Y_{ij} = 0$.
According to a specific embodiment of the invention, step S8 further comprises:
Step S81: Using each training sample in turn, update the feature vectors of the drug and the disease according to the following expressions:
$a_i \leftarrow a_i + \eta \, (e_{ij} \, b_j \, F_{ij} (1 - F_{ij}) - \lambda a_i)$
$b_j \leftarrow b_j + \eta \, (e_{ij} \, a_i \, F_{ij} (1 - F_{ij}) - \lambda b_j)$
where $F_{ij} = 1 / (1 + e^{-a_i b_j^{T}})$ is the current predicted degree of association between drug $r_i$ and disease $s_j$, $e_{ij} = Y_{ij} - F_{ij}$ is the error between the true value and the predicted value, $\eta$ is the learning rate, and $\lambda$ is the penalty coefficient. One pass over all samples counts as one iteration;
Step S82: Repeat step S81 until the maximum number of iterations is reached.
According to a specific embodiment of the invention, in step S9 the trained matrix factorization model predicts the degree of association of a drug $r_i$ and a disease $s_j$ outside the training samples as:
$F_{ij} = 1 / (1 + e^{-a_i b_j^{T}})$
where $a_i$ and $b_j$ are the feature vectors of drug $r_i$ and disease $s_j$ after training.
According to a specific embodiment of the invention, in step S10 the following process is repeated 100 times: negative samples are randomly regenerated in step S7, the matrix factorization model is retrained in step S8, and the unknown entries of the drug-disease association matrix, i.e. those that are neither positive nor negative samples, are predicted in step S9.
According to a specific embodiment of the invention, in step S10, for a combination of drug $r_i$ and disease $s_j$ outside the positive samples, let $N$ be the number of times the combination is selected as a negative sample, and let $S$ be the sum of its predicted degrees of association over the $100 - N$ models in which it is not selected as a negative sample; the association prediction score of the combination is then $\mathrm{score}_{ij} = S / (100 - N)$.
(3) Beneficial effects
The method provided by the present invention focuses on the analysis and modeling of association data. Compared with methods based on attribute data, it avoids computing attribute data and is therefore simpler to apply. The association data used come from public biological databases and are reliable. As association data continue to accumulate in public databases, the method can become increasingly effective. The method can screen potential drug-disease associations more accurately and thus predict new indications for drugs.
Description of the drawings
Fig. 1 is a flowchart of the method provided by the present invention;
Fig. 2 is a schematic diagram of the operating procedure of the specific embodiment of the invention;
Fig. 3A and Fig. 3B show how the feature vectors of etoposide and myasthenia gravis, respectively, change during model training in the specific embodiment of the invention;
Fig. 4 shows the distribution of the association prediction scores in the specific embodiment of the invention;
Fig. 5 shows the relationship between the number of predicted new indications and the number of drugs in the specific embodiment of the invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The present invention builds on the analysis of drug-disease associations, drug-gene associations, disease-gene associations, and gene-gene associations. A joint space is constructed from the gene interaction network of the gene space, and the feature vectors of drugs and diseases are then obtained from the joint space, so that the topological features of the gene space are incorporated into the feature spaces of drugs and diseases. Next, under the supervision of drug-disease association data, a matrix factorization model updates and adjusts the drug and disease feature vectors. Finally, the adjusted feature spaces are used to screen potential drug-disease associations and thereby predict new indications for drugs.
Fig. 1 is a flowchart of the method provided by the present invention. As shown in Fig. 1, the invention provides a drug-repositioning prediction method based on matrix factorization with gene-space fusion, comprising the following steps:
Step S1: Collect association data for a known drug set and disease set.
Preferably, step S1 includes:
Step S11: Collect the set of known drug-disease associations. The drugs involved form the drug set $R = \{r_1, \dots, r_{N_r}\}$ and the diseases involved form the disease set $S = \{s_1, \dots, s_{N_s}\}$, where $N_r$ is the number of drugs and $N_s$ is the number of diseases. Drugs are identified by DrugBank IDs and diseases by OMIM IDs.
Step S12: Construct the drug-disease association matrix from the drug-disease association set. Specifically, construct the $N_r \times N_s$ drug-disease association matrix $Y$: if drug $r_i$ and disease $s_j$ are associated in the drug-disease association set, the corresponding entry is $Y_{ij} = 1$; all other entries are unknown.
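Step S12 can be sketched in Python as follows. This is only an illustration under stated assumptions: the association set is taken to be a list of (drug ID, disease ID) pairs, the identifiers are hypothetical placeholders rather than real DrugBank or OMIM entries, and NaN is an illustrative way to mark unknown entries.

```python
import numpy as np

def build_association_matrix(drugs, diseases, associations):
    """Return an N_r x N_s matrix: 1 for known associations, NaN for unknown entries."""
    drug_index = {d: i for i, d in enumerate(drugs)}
    disease_index = {s: j for j, s in enumerate(diseases)}
    Y = np.full((len(drugs), len(diseases)), np.nan)  # NaN marks "unknown" (assumption)
    for drug, disease in associations:
        Y[drug_index[drug], disease_index[disease]] = 1.0
    return Y

# Hypothetical identifiers, for illustration only.
drugs = ["drug_a", "drug_b", "drug_c"]
diseases = ["disease_x", "disease_y"]
associations = [("drug_a", "disease_x"), ("drug_c", "disease_y")]
Y = build_association_matrix(drugs, diseases, associations)
```

Keeping unknown entries distinct from 0 matters here, because 0 is reserved for the negative samples generated later in step S7.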
Step S2: Obtain gene-space data, which include the drug-gene association set related to the known drugs, the disease-gene association set related to the diseases in step S1, and the gene association set.
The gene association set is the set of gene-gene associations; the gene space is the metric space of genes.
Preferably, step S2 includes:
Step S21: Collect the drug-gene association set for the drugs in step S1 from the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases. Genes are represented by Entrez Gene IDs.
Step S22: Collect the disease-gene association set for the diseases in step S1 from the OMIM database. Genes are represented by Entrez Gene IDs.
Step S23: Extract the gene association set from the HPRD database.
Genes are represented by Entrez Gene IDs.
Step S3: Extract the gene closeness matrix from the gene-space data.
The gene closeness matrix describes the degree of association between genes.
Preferably, step S3 includes:
Step S31: Build the gene interaction network from the gene association set.
The gene interaction network is the network formed by gene-gene associations.
Specifically, each gene in the gene association set is represented by a node, and each gene-gene association in the set is represented by an edge connecting the corresponding nodes; this forms the gene interaction network.
Step S32: Take all genes involved in the drug-gene association set and the disease-gene association set as the gene set of interest, and compute the closeness between these genes.
Specifically, all genes involved in the drug-gene association set and the disease-gene association set are taken as the gene set of interest $G = \{g_1, \dots, g_{N_g}\}$, where $N_g$ is the number of genes. The closeness between the i-th gene $g_i$ and the j-th gene $g_j$ is computed from $d_{ij}$, the shortest distance between $g_i$ and $g_j$ in the gene interaction network, using the tuning parameters $a'$ and $b'$.
When two genes are unreachable from each other in the network, $d_{ij}$ is defined as infinity. Computing the closeness between every pair of genes in the gene set of interest yields the $N_g \times N_g$ gene closeness matrix $C$.
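The shortest distances $d_{ij}$ used in step S32 can be obtained with a breadth-first search, sketched below. The sketch assumes an undirected, unweighted network in which distance means hop count; the gene names are hypothetical. It stops at the distances, since the mapping from $d_{ij}$ to closeness through $a'$ and $b'$ is not spelled out in this text.

```python
from collections import deque
import math

def shortest_distances(edges, genes_of_interest):
    """BFS hop counts between genes of interest; unreachable pairs get math.inf."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {}
    for src in genes_of_interest:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nb in adj.get(node, ()):
                if nb not in seen:
                    seen[nb] = seen[node] + 1
                    queue.append(nb)
        for dst in genes_of_interest:
            dist[(src, dst)] = seen.get(dst, math.inf)
    return dist

# Hypothetical toy network: g1 - g2 - g3 and a separate component g4 - g5.
edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
d = shortest_distances(edges, ["g1", "g3", "g4"])
```

Note that the search runs over the whole network, so two genes of interest can be connected through genes that are not themselves in the gene set of interest.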
Step S4: Perform eigenvalue decomposition on the gene closeness matrix to generate a low-dimensional Euclidean space in which each gene in the gene set of interest is represented by a low-dimensional feature vector, thereby constructing the joint space.
The joint space is a unified space in which drugs, diseases, and genes can all be measured.
Preferably, step S4 includes:
Step S41: Compute the eigenvalues of the gene closeness matrix, sort them in descending order, and obtain the corresponding eigenvectors.
Here the gene closeness matrix is the matrix $C$ described above.
Step S42: Take the first k eigenvalues to form the k × k diagonal matrix $\Lambda$, and take the corresponding k eigenvectors to form the $N_g \times k$ matrix $\Gamma$, in which each column is one eigenvector. k is a natural number, for example 8.
Step S43: Compute the $N_g \times k$ matrix $P = \Gamma \Lambda^{1/2}$, with which the gene closeness matrix can be decomposed as $C = PP^{T} = \Gamma \Lambda^{1/2} \Lambda^{1/2} \Gamma^{T}$.
Step S44: Represent each gene $g$ by the corresponding row vector $p_g$ of $P$. The set of k-dimensional vectors $\{p_g\}$ then constitutes the joint space.
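Steps S41 to S44 can be sketched with NumPy as follows. The sketch assumes that $C$ is symmetric, and it clips negative eigenvalues to zero so that $\Lambda^{1/2}$ stays real; that clipping is an assumption of this example, not something the text specifies. The small matrix is made-up data.

```python
import numpy as np

def joint_space(C, k):
    """Return P (N_g x k) with P = Gamma * Lambda^(1/2) from the top-k eigenpairs of C."""
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)   # assumption: negative eigenvalues clipped to 0
    gamma = eigvecs[:, order]                  # columns are the selected eigenvectors
    return gamma * np.sqrt(lam)                # scales column j by sqrt(lambda_j)

# Made-up symmetric positive definite closeness matrix for three genes.
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])
P = joint_space(C, k=3)   # row i of P is the feature vector of gene i
```

With all eigenvalues kept and none negative, $PP^{T}$ reproduces $C$; with $k < N_g$ the product is a rank-k approximation of $C$.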
Step S5: Compute the feature spaces of drugs and diseases from the joint space according to the drug-gene association set and the disease-gene association set.
In this way, the drug space, the disease space, and the gene space are unified in the joint space.
Preferably, step S5 includes:
Step S51: For each drug $r_i$, traverse the drug-gene association set and collect the genes associated with it, denoted $G_{r_i}$.
The feature vector of the drug is computed as: $a_i = \frac{1}{|G_{r_i}|} \sum_{g \in G_{r_i}} p_g$
where $|G_{r_i}|$ is the number of genes associated with drug $r_i$.
Step S52: For each disease $s_i$, traverse the disease-gene association set and collect the genes associated with it, denoted $G_{s_i}$.
The feature vector of the disease is computed as: $b_i = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$
where $|G_{s_i}|$ is the number of genes associated with disease $s_i$.
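A short sketch of steps S51 and S52, reading the feature vector of a drug or disease as the mean of the joint-space vectors of its associated genes. The names and vectors are hypothetical, and the sketch assumes every drug or disease has at least one associated gene.

```python
import numpy as np

def entity_features(gene_vectors, entity_genes):
    """Average the joint-space vectors of the genes linked to each drug or disease."""
    return {
        entity: np.mean([gene_vectors[g] for g in genes], axis=0)
        for entity, genes in entity_genes.items()
    }

# Hypothetical 2-dimensional gene vectors (rows of P) and drug-gene associations.
gene_vectors = {
    "g1": np.array([1.0, 0.0]),
    "g2": np.array([0.0, 1.0]),
    "g3": np.array([2.0, 2.0]),
}
drug_genes = {"drug_a": ["g1", "g2"], "drug_b": ["g3"]}
drug_features = entity_features(gene_vectors, drug_genes)
```

The same function applies unchanged to a disease-gene mapping, which is what places drugs and diseases in one shared space.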
Step S6: Establish a matrix factorization model from the drug feature matrix and the disease feature matrix, and initialize the model with the feature vectors of drugs and diseases in the joint space.
A matrix factorization model is a computational model that describes associations between objects using latent feature vectors. The model can be established as $F_{ij} = 1 / (1 + e^{-a_i b_j^{T}})$, where $A$ is the $N_r \times k$ drug feature matrix whose i-th row is $a_i$, and $B$ is the $N_s \times k$ disease feature matrix whose j-th row is $b_j$.
The matrices $A$ and $B$ can be initialized with the joint-space feature vectors of the drugs and diseases: the i-th row of $A$ is set to the feature vector of drug $r_i$ from step S51, and the j-th row of $B$ is set to the feature vector of disease $s_j$ from step S52.
Step S7: Obtain training samples from the drug set, the disease set, and the drug-disease association set.
The associations in the drug-disease association set are used as positive samples. Drugs and diseases are drawn at random from the drug set and the disease set and combined into drug-disease pairs that do not appear among the positive samples. As many such pairs are generated as there are positive samples; they are assumed to have no association and are used as negative samples. In the drug-disease association matrix $Y$, the entries for negative samples are set to $Y_{ij} = 0$, and the remaining entries stay unknown. The positive and negative samples together form the training sample set.
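The negative sampling in step S7 might look like the sketch below. Drawing distinct pairs without replacement is an assumption of this example, as is representing drugs and diseases by integer indices; the `seed` parameter is added only to make the illustration reproducible.

```python
import random

def sample_negatives(n_drugs, n_diseases, positives, seed=None):
    """Draw as many random (drug, disease) index pairs as positives, none of them positive."""
    rng = random.Random(seed)
    positive_set = set(positives)
    candidates = [
        (i, j)
        for i in range(n_drugs)
        for j in range(n_diseases)
        if (i, j) not in positive_set
    ]
    # Sampling without replacement keeps the negative pairs distinct (assumption).
    return rng.sample(candidates, len(positive_set))

positives = [(0, 0), (1, 2), (2, 1)]   # hypothetical known associations
negatives = sample_negatives(3, 3, positives, seed=0)
```

Because the negatives are only presumed non-associations, a different random draw in each repetition keeps any single wrong guess from dominating the final score.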
Step S8: Train the matrix factorization model with the training samples.
The model training algorithm is described below.
Preferably, step S8 includes:
Step S81: Using each sample in the training sample set in turn, update the feature vectors of the drug and the disease according to the following expressions:
$a_i \leftarrow a_i + \eta \, (e_{ij} \, b_j \, F_{ij} (1 - F_{ij}) - \lambda a_i)$
$b_j \leftarrow b_j + \eta \, (e_{ij} \, a_i \, F_{ij} (1 - F_{ij}) - \lambda b_j)$
where $F_{ij} = 1 / (1 + e^{-a_i b_j^{T}})$ is the current predicted degree of association between drug $r_i$ and disease $s_j$, $e_{ij} = Y_{ij} - F_{ij}$ is the error between the true value and the predicted value, $\eta$ is the learning rate, and $\lambda$ is the penalty (regularization) coefficient. One pass over all samples counts as one iteration.
Step S82: Repeat step S81 until the maximum number of iterations is reached.
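The update rules of step S81 can be sketched as a plain stochastic gradient loop. Two points are assumptions of this sketch: the prediction is taken to be the logistic function of $a_i b_j^{T}$, which is the form consistent with the factor $F_{ij}(1 - F_{ij})$ in the updates, and both updates use the values from before the step. The default hyperparameters are the ones quoted in the embodiment; the toy data and random initialization are made up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(A, B, samples, eta=0.035, lam=0.005, max_iter=1500):
    """samples: list of (i, j, y) with y = 1 for positive and y = 0 for negative pairs."""
    A, B = A.copy(), B.copy()               # leave the caller's matrices untouched
    for _ in range(max_iter):               # step S82: repeat until max_iter
        for i, j, y in samples:             # step S81: one pass over all samples
            f = sigmoid(A[i] @ B[j])        # current prediction F_ij (assumed logistic)
            e = y - f                       # error e_ij = Y_ij - F_ij
            grad_a = e * B[j] * f * (1.0 - f) - lam * A[i]
            grad_b = e * A[i] * f * (1.0 - f) - lam * B[j]
            A[i] += eta * grad_a
            B[j] += eta * grad_b
    return A, B

# Toy problem: two drugs, two diseases, 3-dimensional features, random initialization.
rng = np.random.default_rng(0)
A0 = rng.normal(scale=0.1, size=(2, 3))
B0 = rng.normal(scale=0.1, size=(2, 3))
samples = [(0, 0, 1.0), (1, 1, 1.0), (0, 1, 0.0), (1, 0, 0.0)]
A, B = train(A0, B0, samples)
F = sigmoid(A @ B.T)
```

In the method itself the starting matrices would come from the joint space (step S6) rather than from random numbers; the random start here only keeps the example self-contained.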
Step S9: Use the trained matrix factorization model to predict the unknown entries of the drug-disease association matrix, i.e. those that are neither positive nor negative samples.
For a drug $r_i$ and a disease $s_j$ whose association is unknown, the degree of association between them is predicted as: $F_{ij} = 1 / (1 + e^{-a_i b_j^{T}})$
where $a_i$ and $b_j$ are the feature vectors of drug $r_i$ and disease $s_j$ after training.
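Prediction for the unknown entries in step S9 can be sketched as below, again assuming the logistic form of the model. `known_pairs` stands for the index pairs used as positive or negative training samples; the matrices are made-up values.

```python
import numpy as np

def predict_unknown(A, B, known_pairs):
    """Predict F_ij for every (i, j) that was not a training sample."""
    F = 1.0 / (1.0 + np.exp(-(A @ B.T)))   # assumed logistic model, all pairs at once
    return {
        (i, j): float(F[i, j])
        for i in range(F.shape[0])
        for j in range(F.shape[1])
        if (i, j) not in known_pairs
    }

# Made-up trained feature matrices for two drugs and two diseases.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[2.0, 0.0], [0.0, 0.0]])
preds = predict_unknown(A, B, known_pairs={(0, 0)})
```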
Step S10: Return to step S7 and repeat the processing of steps S7, S8, and S9 multiple times, predicting the degree of association of every possible drug-disease combination other than the positive samples.
For example, the processing of steps S7, S8, and S9 is repeated 100 times, and the degree of association of every possible drug-disease combination other than the positive samples is predicted by averaging.
Preferably, in each repetition, negative samples are randomly regenerated in step S7, the matrix factorization model is retrained in step S8, and the unknown entries of the drug-disease association matrix, i.e. those that are neither positive nor negative samples, are predicted in step S9. This process can be repeated, for example, 100 times.
For a combination of drug $r_i$ and disease $s_j$ outside the positive samples, let $N$ be the number of times the combination is selected as a negative sample, and let $S$ be the sum of its predicted degrees of association over the $100 - N$ models in which it is not selected as a negative sample. The association prediction score of the combination is then: $\mathrm{score}_{ij} = S / (100 - N)$
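The averaging in step S10 amounts to dividing each pair's summed predictions by the number of runs in which the pair was actually predicted. A minimal sketch, assuming each repetition yields a dictionary of predictions for the pairs that were not training samples in that run (the values are made up):

```python
def average_scores(runs):
    """runs: one dict per repetition, mapping (i, j) -> prediction for pairs that
    were not training samples in that repetition. Returns S / (runs - N) per pair."""
    total, count = {}, {}
    for predictions in runs:
        for pair, value in predictions.items():
            total[pair] = total.get(pair, 0.0) + value
            count[pair] = count.get(pair, 0) + 1
    return {pair: total[pair] / count[pair] for pair in total}

# Three made-up repetitions; pair (1, 0) was a negative sample in the second run.
runs = [
    {(0, 1): 0.2, (1, 0): 0.8},
    {(0, 1): 0.4},
    {(0, 1): 0.6, (1, 0): 0.6},
]
scores = average_scores(runs)
```

Counting appearances directly gives the denominator $100 - N$ without tracking $N$ separately.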
Step S11: Screen drug-disease associations according to the degrees of association predicted in step S10, and predict new indications for the drugs.
The drug-disease combinations outside the positive samples are ranked from high to low by the association prediction scores from step S10. A score threshold is set to screen potential drug-disease associations and predict new indications for drugs.
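Ranking and thresholding in step S11 can be sketched in a few lines. Whether a score exactly equal to the threshold passes is not stated, so the inclusive comparison below is an assumption; the names and scores are hypothetical, and the threshold value is the one quoted in the embodiment.

```python
def screen(scores, threshold):
    """Rank pairs by score, highest first, and keep those at or above the threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(pair, s) for pair, s in ranked if s >= threshold]  # inclusive: assumption

# Hypothetical scores for three drug-disease combinations.
scores = {
    ("drug_a", "disease_x"): 0.91,
    ("drug_b", "disease_y"): 0.30,
    ("drug_c", "disease_x"): 0.70,
}
hits = screen(scores, threshold=0.6459)
```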
The features and advantages of the present invention are illustrated below through a specific embodiment, which uses drug and disease association data extracted from the literature and public databases to predict drug repositioning. Fig. 2 is a schematic diagram of the operating procedure of the specific embodiment. As shown in Fig. 2, the procedure comprises the following steps:
Step S1: Collect association data for a known drug set and disease set.
The drug-disease association set is collected from the literature and public databases, giving 213 drug-disease associations between 130 drugs and 50 diseases. The 130 × 50 drug-disease association matrix $Y$ is constructed, with the entries corresponding to the association set set to 1.
Step S2: Obtain gene-space data, which include the drug-gene association set related to the known drugs, the disease-gene association set related to the diseases in step S1, and the gene association set.
The drug-gene association set for the 130 drugs is collected from the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases and contains 776 drug-gene associations. The disease-gene association set for the 50 diseases is collected from the OMIM database and contains 74 disease-gene associations. The gene association set is extracted from the HPRD database and contains 36882 gene interactions among 9415 genes.
Step S3: Extract the gene closeness matrix from the gene-space data.
The drug-gene association set and the disease-gene association set involve 850 genes in total. The gene interaction network is built from the gene association set. With $a' = 10$ and $b' = 0.25$, the closeness between gene $g_i$ and gene $g_j$ is computed from $d_{ij}$, the shortest distance between $g_i$ and $g_j$ in the gene interaction network.
When two genes are unreachable from each other in the network, $d_{ij}$ is defined as infinity. Computing the closeness between every pair of the 850 genes yields the 850 × 850 gene closeness matrix $C$.
Step S4: Perform eigenvalue decomposition on the gene closeness matrix to generate a low-dimensional Euclidean space in which each gene in the gene set of interest is represented by a low-dimensional feature vector, thereby constructing the joint space.
In this embodiment, eigenvalue decomposition is performed on the gene closeness matrix $C$. With $k = 8$, a set of 8-dimensional vectors $\{p_g\}$ is obtained to represent the gene feature vectors and construct the joint space.
Step S5: Compute the feature spaces of drugs and diseases from the joint space according to the drug-gene association set and the disease-gene association set.
For each drug $r_i$, traverse the drug-gene association set and collect the genes associated with it, denoted $G_{r_i}$. The feature vector of the drug is computed as: $a_i = \frac{1}{|G_{r_i}|} \sum_{g \in G_{r_i}} p_g$
where $|G_{r_i}|$ is the number of genes associated with drug $r_i$.
For each disease $s_i$, traverse the disease-gene association set and collect the genes associated with it, denoted $G_{s_i}$. The feature vector of the disease is computed as: $b_i = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$
where $|G_{s_i}|$ is the number of genes associated with disease $s_i$.
Step S6: Establish a matrix factorization model from the drug feature matrix and the disease feature matrix, and initialize the model with the feature vectors of drugs and diseases in the joint space.
In this embodiment, the matrix factorization model $F_{ij} = 1 / (1 + e^{-a_i b_j^{T}})$ is established, and the matrices $A$ and $B$ are initialized with the joint-space feature vectors of the drugs and diseases: the i-th row of $A$ is set to the feature vector of drug $r_i$, and the j-th row of $B$ is set to the feature vector of disease $s_j$.
Step S7: Obtain training samples from the drug set, the disease set, and the drug-disease association set.
In this embodiment, the 213 drug-disease associations in the drug-disease association set are used as positive samples. Drugs and diseases are drawn at random from the 130 drugs and 50 diseases and combined into drug-disease pairs that do not appear among the positive samples. As many such pairs are generated as there are positive samples; they are assumed to have no association and are used as negative samples. In the drug-disease association matrix $Y$, the entries for negative samples are set to 0, and the remaining entries stay unknown.
Step S8: Train the matrix factorization model with the training samples.
Using each of the 426 training samples in turn, update the feature vectors of the drug and the disease according to the following expressions:
$a_i \leftarrow a_i + \eta \, (e_{ij} \, b_j \, F_{ij} (1 - F_{ij}) - \lambda a_i)$
$b_j \leftarrow b_j + \eta \, (e_{ij} \, a_i \, F_{ij} (1 - F_{ij}) - \lambda b_j)$
where $F_{ij} = 1 / (1 + e^{-a_i b_j^{T}})$ is the current predicted degree of association between drug $r_i$ and disease $s_j$, $e_{ij} = Y_{ij} - F_{ij}$ is the error between the true value and the predicted value, the learning rate is $\eta = 0.035$, and the penalty coefficient is $\lambda = 0.005$. The maximum number of iterations is set to 1500. Taking etoposide and myasthenia gravis as examples, Fig. 3A and Fig. 3B show how the feature vectors of a drug and a disease gradually stabilize during model training.
Step S9: Use the trained matrix decomposition model to predict the unknown entries of the medicine-disease association matrix Y, that is, the entries outside the positive and negative samples.
For a medicine $r_i$ and a disease $s_j$ whose association is unknown, the degree of association between them is predicted according to the following model expression:
Wherein $a_i$ and $b_j$ are the feature vectors of medicine $r_i$ and disease $s_j$ after training.
Step S10: Return to step S7 and repeat the processing of steps S7, S8, and S9 multiple times, predicting the degree of association of all possible medicine-disease combinations outside the positive samples.
In this embodiment, the processing of steps S7, S8, and S9 is repeated 100 times, and the degree of association of every possible medicine-disease combination outside the positive samples is predicted by averaging.
For a combination of medicine $r_i$ ($1 \le i \le 130$) and disease $s_j$ ($1 \le j \le 50$) outside the positive samples, let N be the number of times the combination is selected as a negative sample, and let S be the sum of its predicted degrees of association over the $100 - N$ models in which it is not selected as a negative sample. The degree-of-association prediction score of the combination is then the average over those models:
$$\mathrm{score} = \frac{S}{100 - N}$$
In this way, degree-of-association prediction scores are calculated for a total of 6287 medicine-disease combinations (130 × 50 = 6500 possible combinations minus the 213 positive samples). The distribution of the prediction scores is shown in Fig. 4.
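The averaging over repeated runs can be sketched as below. It reads "averaging" as dividing the sum S by the number of runs in which the combination was not a negative sample, which is an interpretation of the description rather than a quoted formula. The function name `aggregate_scores` and the per-run data layout are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, int]            # (medicine index, disease index)
Run = Tuple[Set[Pair], Dict[Pair, float]]  # (negatives, predictions) of one run


def aggregate_scores(runs: List[Run], candidates: Iterable[Pair]) -> Dict[Pair, float]:
    """Average each candidate's prediction over the runs in which it was
    not drawn as a negative sample."""
    scores: Dict[Pair, float] = {}
    for pair in candidates:
        total = 0.0   # S: sum of predictions where the pair was not a negative
        used = 0      # number of such runs, i.e. len(runs) - N
        for negatives, predictions in runs:
            if pair not in negatives:
                total += predictions[pair]
                used += 1
        if used:      # a pair that was a negative in every run gets no score
            scores[pair] = total / used
    return scores


# Toy example with 2 runs instead of 100.
runs = [
    ({(0, 1)}, {(0, 0): 0.2}),
    (set(), {(0, 0): 0.4, (0, 1): 0.8}),
]
scores = aggregate_scores(runs, [(0, 0), (0, 1)])
```

Excluding the runs in which a pair served as a negative sample keeps its score from being pulled toward the label 0 it was trained on.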
Step S11: Based on the degrees of association predicted in step S10, screen the medicine-disease associations and predict new indications for medicines.
The medicine-disease combinations outside the positive samples are sorted in descending order of their 6287 degree-of-association prediction scores. With the score threshold set to 0.6459, 500 potential medicine-disease associations are selected; these are used to predict new indications for medicines, so that old medicines can be put to new use. The 500 potential medicine-disease associations cover 125 medicines and 40 diseases; the relationship between the number of new indications and the number of medicines receiving that many new indications is shown in Fig. 5.
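The ranking and screening step can be sketched as follows. Whether a score exactly equal to the threshold is kept is not stated, so the inclusive comparison is an assumption, and the function names `screen_associations` and `indication_histogram` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (medicine index, disease index)


def screen_associations(scores: Dict[Pair, float],
                        threshold: float) -> List[Tuple[Pair, float]]:
    """Sort combinations by score, highest first, and keep those at or
    above the threshold (inclusive comparison is an assumption)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(pair, score) for pair, score in ranked if score >= threshold]


def indication_histogram(selected: List[Tuple[Pair, float]]) -> Counter:
    """Map 'number of new indications' to 'number of medicines with that many'."""
    per_drug = Counter(drug for (drug, _disease), _score in selected)
    return Counter(per_drug.values())


# Toy scores; 0.6459 is the threshold used in the embodiment.
toy_scores = {(0, 0): 0.9, (0, 1): 0.7, (1, 0): 0.65, (2, 3): 0.2}
selected = screen_associations(toy_scores, 0.6459)
histogram = indication_histogram(selected)
```

The histogram corresponds to the kind of summary described for Fig. 5: how many medicines receive a given number of new indications.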
Clinical research and the medical literature indicate that the predicted potential medicine-disease associations are comparatively accurate and valid. These potential medicine-disease associations can be used to predict new indications for medicines, so that old medicines can be put to new use.
The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the present invention in detail. It should be understood that the foregoing are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A method for predicting new indications for old medicines by matrix decomposition based on gene space fusion, characterized in that the method comprises the following steps:
Step S1: Collect a known medicine-disease association set, where the medicine set involved is denoted by $\{r_1, r_2, \ldots, r_{N_r}\}$, the disease set involved is denoted by $\{s_1, s_2, \ldots, s_{N_s}\}$, $N_r$ is the number of medicines, and $N_s$ is the number of diseases; construct a medicine-disease association matrix from the known medicine-disease association set;
Step S2: Obtain gene space data, the data comprising a medicine-gene association set related to the known medicines in step S1, a disease-gene association set related to the diseases in step S1, and a gene association set;
Step S3: Extract a gene compactness matrix from the gene space data;
Step S4: Perform eigenvalue decomposition on the gene compactness matrix to generate a low-dimensional Euclidean space, so that each gene in the concern gene set can be represented by a low-dimensional feature vector, thereby constructing a joint space; the concern gene set comprises all genes involved in the medicine-gene association set and the disease-gene association set;
Step S5: Calculate the feature matrices of the medicines and the diseases from the joint space according to the medicine-gene association set and the disease-gene association set;
Step S6: Build a matrix decomposition model from the feature matrix of the medicines and the feature matrix of the diseases, and initialize the model with the feature vectors of the medicines and diseases in the joint space;
Step S7: Obtain training samples from the medicine set, the disease set, and the medicine-disease association set;
Step S8: Train the matrix decomposition model using the training samples;
Step S9: Use the trained matrix decomposition model to predict the medicine-disease combinations outside the training samples;
Step S10: Return to step S7 and repeat the processing of steps S7, S8, and S9 multiple times, predicting the degree of association of all possible medicine-disease combinations outside the positive samples, the positive samples being the samples in the known medicine-disease association set;
Step S11: Based on the degrees of association predicted in step S10, screen the medicine-disease associations and predict new indications for medicines.
2. The method of claim 1, characterized in that, in step S2, the gene space data comprise the medicine-gene association set, the disease-gene association set, and the gene association set, and these data are obtained by querying databases.
3. The method of claim 2, characterized in that the medicine-gene association set is obtained by querying the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases, the disease-gene association set is obtained by querying the OMIM database, and the gene association set is obtained by querying the HPRD database.
4. The method of claim 2, characterized in that genes are represented by Entrez Gene IDs.
5. The method of claim 1, characterized in that step S3 further comprises:
Step S31: Build a gene network from the gene association set;
Step S32: Calculate the compactness between genes for the concern gene set.
6. The method of claim 1, characterized in that step S4 comprises:
Step S41: Calculate the eigenvalues of the gene compactness matrix, sort them in descending order, and obtain the corresponding eigenvectors;
Step S42: Extract the first k eigenvalues to form a $k \times k$ diagonal matrix $\Lambda$, and extract the corresponding k eigenvectors to form an $N_g \times k$ matrix $\Gamma$, in which each column corresponds to one eigenvector, k is a natural number, and $N_g$ is the number of genes in the concern gene set;
Step S43: Calculate the $N_g \times k$ matrix $P = \Gamma \Lambda^{1/2}$; with this matrix, the gene compactness matrix can be decomposed as $C = P P^T = \Gamma \Lambda^{1/2} \Lambda^{1/2} \Gamma^T$;
Step S44: Represent each gene by the corresponding row vector of matrix P; this set of k-dimensional vectors constitutes the joint space.
7. The method of claim 1, characterized in that, in step S5, the feature vector of a disease is calculated according to the following expression:
Wherein $p_g$ represents the feature vector of a gene in the joint space, the set term represents the genes associated with disease $s_i$ in the disease-gene association set, and the count term represents the number of genes associated with disease $s_i$.
8. The method of claim 1, characterized in that, in step S7, medicines and diseases are randomly selected from the medicine set and the disease set and combined to construct medicine-disease pairs that do not appear among the positive samples; as many medicine-disease pairs are generated as there are positive samples, and they are used as negative samples; the entries corresponding to the negative samples are set to $Y_{ij} = 0$ in the medicine-disease association matrix Y.
9. The method of claim 1, characterized in that step S8 further comprises:
Step S81: Use each of the training samples in turn to update the feature vectors of the medicine and the disease according to the following expressions:
$$a_i \mathrel{+}= \eta\,\bigl(e_{i,j}\, b_j\, F_{ij}(1 - F_{ij}) - \lambda a_i\bigr)$$
$$b_j \mathrel{+}= \eta\,\bigl(e_{i,j}\, a_i\, F_{ij}(1 - F_{ij}) - \lambda b_j\bigr)$$
Wherein $a_i$ is the feature vector of the medicine, $b_j$ is the feature vector of the disease, $F_{ij}$ is the predicted value of the degree of association between the current medicine $r_i$ and disease $s_j$, $Y_{ij}$ is the actual value of the degree of association between the current medicine $r_i$ and disease $s_j$, $e_{i,j} = Y_{ij} - F_{ij}$ is the error between the actual value and the predicted value of that degree of association, $\eta$ is the learning rate, and $\lambda$ is the penalty coefficient; one pass over all samples is counted as one iteration;
Step S82: Repeat step S81 until the maximum number of iterations is reached.
10. The method of claim 1, characterized in that, in step S9, the trained matrix decomposition model is given by the following expression, and the degree of association of a combination of medicine $r_i$ and disease $s_j$ outside the training samples is predicted according to this model:
Wherein $a_i$ and $b_j$ are the feature vectors of medicine $r_i$ and disease $s_j$ after training.
11. The method of claim 1, characterized in that
in step S10, the following process is repeated 100 times: step S7 randomly regenerates the negative samples, step S8 retrains the matrix decomposition model, and step S9 predicts the unknown entries of the medicine-disease association matrix outside the positive and negative samples.
12. The method of claim 1, characterized in that, in step S10, for a combination of medicine $r_i$ and disease $s_j$ outside the positive samples, the number of times the combination is selected as a negative sample is N, and the sum of its predicted degrees of association over the $100 - N$ models in which it is not selected as a negative sample is S; the degree-of-association prediction score of the combination is then the average over those models: $\mathrm{score} = S / (100 - N)$.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410302140.4A CN104021316B (en) 2014-06-27 2014-06-27 Based on the method that the matrix decomposition that gene space merges predicts new indication to old medicine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410302140.4A CN104021316B (en) 2014-06-27 2014-06-27 Based on the method that the matrix decomposition that gene space merges predicts new indication to old medicine

Publications (2)

Publication Number Publication Date
CN104021316A CN104021316A (en) 2014-09-03
CN104021316B true CN104021316B (en) 2017-04-05

Family

ID=51438068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410302140.4A Active CN104021316B (en) 2014-06-27 2014-06-27 Based on the method that the matrix decomposition that gene space merges predicts new indication to old medicine

Country Status (1)

Country Link
CN (1) CN104021316B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017100794A1 (en) * 2015-12-12 2017-06-15 Cipherome, Inc. Computer-implemented evaluaton of drug safety for a population
CN105653846B (en) * 2015-12-25 2018-08-31 中南大学 Drug method for relocating based on integrated similarity measurement and random two-way migration
CN107666403B (en) * 2016-07-29 2022-01-28 中兴通讯股份有限公司 Index data acquisition method and device
CN107403069B (en) 2017-07-31 2020-05-12 京东方科技集团股份有限公司 System and method for analyzing drug-disease association relationship
CN108062556B (en) * 2017-11-10 2021-09-14 广东药科大学 Drug-disease relationship identification method, system and device
CN109935341B (en) * 2019-04-09 2021-04-13 北京深度制耀科技有限公司 Method and device for predicting new drug indication
CN110957002B (en) * 2019-12-17 2023-04-28 电子科技大学 Drug target interaction relation prediction method based on synergistic matrix decomposition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1625602A (en) * 2002-03-13 2005-06-08 霍夫曼-拉罗奇有限公司 Method for selecting drug sensitivity decision factor and method for predetermining drug sensitivity using the selected factor
WO2005114578A1 (en) * 2004-05-21 2005-12-01 Bioimagene, Inc. Method and system for automated quantitation of tissue micro-array (tma) digital image analysis
WO2008054768A2 (en) * 2006-10-31 2008-05-08 The Board Of Trustees Of The Leland Stanford Junior University Methods for constructing association maps of imaging data and biological data
CN102855398A (en) * 2012-08-28 2013-01-02 中国科学院自动化研究所 Method for obtaining disease potentially-associated gene based on multi-source information fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
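Extracting potentially related genes from biomedical literature using matrix decomposition; Zhang Hao et al.; Journal of Medical Informatics; 20131231; Vol. 34 (No. 5); pp. 55-60, 70 *
An improved method for network drug target prediction based on a bipartite graph evaluation model; Liu Xi et al.; China Journal of Chinese Materia Medica; 20120131; Vol. 37 (No. 2); pp. 125-129 *
Prediction of potential cardio-cerebrovascular pathogenic genes based on target identification; Zuo Xiaohan et al.; China Journal of Chinese Materia Medica; 20120131; Vol. 37 (No. 2); pp. 130-133 *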

Also Published As

Publication number Publication date
CN104021316A (en) 2014-09-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant