CN104021316B

CN104021316B - Method for predicting new indications for existing drugs by matrix factorization with gene-space fusion

Info

Publication number: CN104021316B
Application number: CN201410302140.4A
Authority: CN
Inventors: 刘西; 代文; 高波; 高一波; 卢朋
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2014-06-27
Filing date: 2014-06-27
Publication date: 2017-04-05
Anticipated expiration: 2034-06-27
Also published as: CN104021316A

Abstract

The invention discloses a method for predicting new indications for existing drugs based on matrix factorization with gene-space fusion. The method includes: collecting drug-disease association data; preparing gene-space data; extracting a gene closeness matrix from the gene space; constructing a joint space from the gene closeness matrix; computing the feature spaces of drugs and diseases from the joint space; initializing a matrix factorization model with the drug and disease feature spaces; preparing training samples; training the matrix factorization model; predicting the association score of drug-disease combinations outside the training samples; repeating sample preparation, model training, and prediction many times and, for every possible drug-disease combination, obtaining its association score by averaging; and ranking the predicted association scores and setting a score threshold to screen potential drug-disease associations, thereby predicting new indications for drugs. The method can screen potential drug-disease associations more accurately.

Description

Method for predicting new indications for existing drugs by matrix factorization with gene-space fusion

Technical field

The present invention relates to the application of computer technology in medical research, and in particular to a method for predicting new indications for existing drugs (drug repositioning) based on matrix factorization with gene-space fusion.

Background

Because new drug development requires large investment, long cycles, and high risk, the number of new drugs reaching the market has fallen significantly. Developing a new chemical entity drug often takes more than 10 years and costs more than 1 billion US dollars, and only about one in ten drug candidates goes on to be approved for production. In recent years, drug developers have paid increasing attention to drug repositioning, that is, finding new clinical indications for existing drugs. Repositioning can reuse existing toxicology and pharmacokinetic evaluations, which greatly shortens development time and reduces cost. Because the clinical safety of the drug has already been established, the development risk is also lower. In addition, finding new indications extends the service life of a drug and broadens its range of indications. Typical examples include thalidomide, Viagra, and finasteride. Thalidomide, for example, was once widely used to treat morning sickness in pregnancy and was withdrawn after it caused a large number of phocomelia birth defects. Scientists later found that thalidomide modulates the human immune system, and in 1998 the U.S. Food and Drug Administration (FDA) approved it for marketing as a treatment for leprosy. However, such new uses have usually been discovered by accident. It is therefore essential to provide effective computational methods that support systematic prediction for drug repositioning.

Researchers have already proposed several computational models for finding new drug indications. These models mainly match attribute data of drugs and diseases in order to find new drug-disease associations and so achieve drug repositioning. The attribute data used include transcriptomic, side-effect, pathway, and gene-profile data. These studies have achieved some success, but they require explicit attribute data to be computed for drugs and diseases before learning, which costs time and effort. By contrast, association data relating drugs to diseases are easier to obtain from public databases. In the Internet field, matrix factorization models based on association data have played a major role in user behavior analysis, and there is good reason to believe that such models can also be applied to drug-disease association analysis. Moreover, the drug-disease association data accumulated in public databases will become more complete over time. In addition, the relatedness of drugs and diseases may already be reflected in the gene space, so association information from the gene space can also be incorporated into drug-disease association analysis.

It is therefore necessary to establish a drug repositioning research framework based on association data and to design a matrix factorization method with gene-space fusion that predicts new drug-disease associations and finds new indications for drugs, so that greater value can be obtained from existing drugs.

Summary of the invention

(1) Technical problem to be solved

The technical problem to be solved by the invention is to provide a method for predicting new indications for existing drugs, which predicts new drug-disease associations and finds new drug indications, thereby achieving drug repositioning.

(2) Technical solution

To solve the above technical problem, the invention proposes a method for predicting new indications for existing drugs based on matrix factorization with gene-space fusion, the method comprising the following steps:

Step S1: Collect association data for a known drug set and disease set;

Step S2: Obtain gene-space data, the data including a drug-gene association set related to the known drugs, a disease-gene association set related to the diseases of step S1, and a gene association set;

Step S3: Extract a gene closeness matrix from the gene-space data;

Step S4: Perform eigenvalue decomposition on the gene closeness matrix to generate a low-dimensional Euclidean space, so that each gene in the gene set of interest can be represented by a low-dimensional feature vector, thereby constructing the joint space;

Step S5: Compute the feature spaces of drugs and diseases from the joint space according to the drug-gene association set and the disease-gene association set;

Step S6: Build a matrix factorization model from the drug feature matrix and the disease feature matrix, and initialize the model with the feature vectors of drugs and diseases in the joint space;

Step S7: Obtain training samples from the drug set, the disease set, and the drug-disease association set;

Step S8: Train the matrix factorization model with the training samples;

Step S9: Use the trained matrix factorization model to predict drug-disease combinations outside the training samples;

Step S10: Return to step S7 and repeat the processing of steps S7, S8, and S9 multiple times, predicting the association score of every possible drug-disease combination other than the positive samples;

Step S11: Screen drug-disease associations according to the association scores predicted in step S10, and predict new indications for drugs.

According to a specific embodiment of the invention, step S1 further includes:

Step S11: Collect the known drug-disease association set;

Step S12: Construct the drug-disease association matrix from the drug-disease association set.

According to a specific embodiment of the invention, in step S2 the gene-space data include the drug-gene association set, the disease-gene association set, and the gene association set, and these data are obtained by querying databases.

According to a specific embodiment of the invention, the drug-gene association set is obtained by querying the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases, the disease-gene association set is obtained by querying the OMIM database, and the gene association set is obtained by querying the HPRD database.

According to a specific embodiment of the invention, genes are identified by Entrez Gene ID.

According to a specific embodiment of the invention, step S3 further includes:

Step S31: Build a gene interaction network from the gene association set;

Step S32: Take all genes involved in the drug-gene association set and the disease-gene association set as the gene set of interest, and compute the closeness between genes.

According to a specific embodiment of the invention, step S4 includes:

Step S41: Compute the eigenvalues of the gene closeness matrix, sort them in descending order, and obtain the corresponding eigenvectors;

Step S42: Take the first $k$ eigenvalues to form the $k \times k$ diagonal matrix $\Lambda$, and take the corresponding $k$ eigenvectors to form the $N_g \times k$ matrix $\Gamma$, in which each column is one eigenvector and $k$ is a natural number;

Step S43: Compute the $N_g \times k$ matrix $P = \Gamma \Lambda^{1/2}$; with this matrix the gene closeness matrix can be decomposed as $C = P P^T = \Gamma \Lambda^{1/2} \Lambda^{1/2} \Gamma^T$;

Step S44: Represent each gene $g_i$ by the corresponding row vector $p_{g_i}$ of $P$; the resulting set of $k$-dimensional vectors $\{p_{g_1}, \dots, p_{g_{N_g}}\}$ constitutes the joint space.

According to a specific embodiment of the invention, in step S5 the feature vector of a disease is computed according to the following expression:

$$p_{s_i} = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$$

where $p_g$ is the feature vector of gene $g$ in the joint space, $G_{s_i}$ is the set of genes associated with disease $s_i$ in the disease-gene association set, and $|G_{s_i}|$ is the number of genes associated with disease $s_i$.

According to a specific embodiment of the invention, in step S6 the matrix factorization model is

$$Y \approx F, \qquad F_{ij} = \frac{1}{1 + e^{-a_i b_j^T}}$$

According to a specific embodiment of the invention, $A$, the $N_r \times k$ matrix with rows $a_i$, is the feature matrix of drugs, and $B$, the $N_s \times k$ matrix with rows $b_j$, is the feature matrix of diseases; the model is initialized according to the following expression:

$$a_i = p_{r_i}, \qquad b_j = p_{s_j}$$

According to a specific embodiment of the invention, in step S7 the samples in the drug-disease association set are used as positive samples. Drugs and diseases are picked at random from the drug set and the disease set and combined into drug-disease pairs that do not appear among the positive samples; as many such pairs as there are positive samples are generated and used as negative samples. In the drug-disease association matrix $Y$, the entries corresponding to negative samples are set to $Y_{ij} = 0$.

According to a specific embodiment of the invention, step S8 further includes:

Step S81: Using each training sample in turn, update the feature vectors of the drug and the disease according to the following expressions:

$$a_i \leftarrow a_i + \eta \bigl(e_{ij} F_{ij} (1 - F_{ij})\, b_j - \lambda a_i\bigr)$$

$$b_j \leftarrow b_j + \eta \bigl(e_{ij} F_{ij} (1 - F_{ij})\, a_i - \lambda b_j\bigr)$$

where $F_{ij} = 1 / (1 + e^{-a_i b_j^T})$ is the current predicted association score between drug $r_i$ and disease $s_j$, $e_{ij} = Y_{ij} - F_{ij}$ is the error between the actual and predicted values, $\eta$ is the learning rate, and $\lambda$ is the penalty (regularization) coefficient. One pass over all samples counts as one iteration;

Step S82: Repeat step S81 until the maximum number of iterations is reached.

According to a specific embodiment of the invention, in step S9 the trained matrix factorization model is given by the following expression, and the association score of a combination of drug $r_i$ and disease $s_j$ outside the training samples is predicted with it:

$$F_{ij} = \frac{1}{1 + e^{-a_i b_j^T}}$$

where $a_i$ and $b_j$ are the feature vectors of drug $r_i$ and disease $s_j$ after training.

According to a specific embodiment of the invention, in step S10 the following process is repeated 100 times: negative samples are randomly regenerated in step S7, the matrix factorization model is retrained in step S8, and in step S9 the unknown entries of the drug-disease association matrix outside the positive and negative samples are predicted.

According to a specific embodiment of the invention, in step S10, for a combination of drug $r_i$ and disease $s_j$ outside the positive samples, let $N$ be the number of times the combination is selected as a negative sample, and let $S$ be the sum of its predicted association scores over the $100 - N$ models in which it was not selected as a negative sample; the association prediction score of the combination is then computed as:

$$\mathrm{score} = \frac{S}{100 - N}$$

(3) beneficial effect

The method provided by the invention focuses on analyzing and modeling association data. Compared with methods based on attribute data, it dispenses with computing attribute data and is therefore simpler to carry out. The association data used come from public biological databases and are reliable. As association data continue to accumulate in public databases, the method will become increasingly effective. The method can screen potential drug-disease associations more accurately and so predict new indications for drugs.

Description of the drawings

Fig. 1 is a flow chart of the method provided by the invention;

Fig. 2 is a schematic diagram of the operating procedure of the specific embodiment of the invention;

Fig. 3A and Fig. 3B show how the feature vectors of etoposide (Etoposide) and myasthenia gravis (Myasthenia Gravis), respectively, change during model training in the specific embodiment of the invention;

Fig. 4 is the distribution of the association prediction scores in the specific embodiment of the invention;

Fig. 5 shows the relationship between the number of predicted new indications and the number of drugs in the specific embodiment of the invention.

Specific embodiment

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

The invention is built on the analysis of drug-disease associations, drug-gene associations, disease-gene associations, and gene associations. A joint space is constructed from the gene interaction network of the gene space, and the feature vectors of drugs and diseases are then obtained from the joint space. In this way the topological features of the gene space are fused into the feature spaces of drugs and diseases. Then, under the supervision of the drug-disease association data, the feature vectors of drugs and diseases are updated and adjusted by the matrix factorization model. Finally, the adjusted feature spaces of drugs and diseases are used to screen potential drug-disease associations, thereby predicting new indications for drugs.

Fig. 1 is a flow chart of the method provided by the invention. As shown in Fig. 1, the invention provides a matrix factorization method with gene-space fusion for predicting drug repositioning, the method comprising the following steps:

Step S1: Collect association data for a known drug set and disease set.

Preferably, step S1 includes:

Step S11: Collect the known drug-disease association set. The drugs involved are denoted $r_1, \dots, r_{N_r}$ and the diseases involved are denoted $s_1, \dots, s_{N_s}$, where $N_r$ is the number of drugs and $N_s$ is the number of diseases. Drugs are identified by DrugBank ID and diseases by OMIM ID.

Step S12: Construct the drug-disease association matrix from the drug-disease association set. Specifically, construct the $N_r \times N_s$ drug-disease association matrix $Y$: if drug $r_i$ and disease $s_j$ are associated in the drug-disease association set, the corresponding entry is $Y_{ij} = 1$; all other entries of the matrix are unknown.
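The construction of $Y$ in step S12 can be sketched as follows. This is only an illustration: the function name, the identifiers, and the use of NaN to mark unknown entries are assumptions, not part of the described method.

```python
import numpy as np

def build_association_matrix(drugs, diseases, known_pairs):
    """Return the N_r x N_s matrix Y: 1 for known associations, NaN for unknown."""
    drug_index = {d: i for i, d in enumerate(drugs)}
    disease_index = {s: j for j, s in enumerate(diseases)}
    # ASSUMPTION: NaN is used here to encode "unknown".
    Y = np.full((len(drugs), len(diseases)), np.nan)
    for drug, disease in known_pairs:
        Y[drug_index[drug], disease_index[disease]] = 1.0
    return Y

# Hypothetical identifiers, for illustration only.
drugs = ["drug_a", "drug_b", "drug_c"]
diseases = ["disease_x", "disease_y"]
known_pairs = [("drug_a", "disease_x"), ("drug_c", "disease_y")]
Y = build_association_matrix(drugs, diseases, known_pairs)
```

Keeping unknown entries distinct from zero matters later, because step S7 reserves the value 0 for sampled negative pairs.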

Step S2: Obtain gene-space data, the data including the drug-gene association set related to the known drugs, the disease-gene association set related to the diseases of step S1, and the gene association set.

The gene association set is the set of gene-gene associations; the gene space is the metric space of genes.

Preferably, step S2 includes:

Step S21: Collect the drug-gene association set related to the drugs of step S1 from the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases. Genes are identified by Entrez Gene ID.

Step S22: Collect the disease-gene association set related to the diseases of step S1 from the OMIM database. Genes are identified by Entrez Gene ID.

Step S23: Extract the gene association set from the HPRD database. Genes are identified by Entrez Gene ID.

Step S3: Extract the gene closeness matrix from the gene-space data.

The gene closeness matrix is a matrix that describes the degree of association between genes.

Preferably, step S3 includes:

Step S31: Build the gene interaction network from the gene association set.

The gene interaction network is the network formed by gene-gene associations.

Specifically, the genes in the gene association set are represented by nodes, and the gene-gene associations in the set are represented by edges connecting the nodes; this builds the gene interaction network.

Step S32: Compute the closeness between genes. Specifically, all genes involved in the drug-gene association set and the disease-gene association set are taken as the gene set of interest $\{g_1, \dots, g_{N_g}\}$, where $N_g$ is the number of genes, and the closeness between the $i$-th gene $g_i$ and the $j$-th gene $g_j$ is computed according to the following expression:

where $d_{ij}$ is the shortest distance between $g_i$ and $g_j$ in the gene interaction network, and $a'$ and $b'$ are tuning parameters. When two genes are unreachable from each other in the network, $d_{ij}$ is defined as infinity. The closeness between every pair of genes in the gene set of interest is computed in this way, giving the $N_g \times N_g$ gene closeness (compactness) matrix $C$.
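The closeness expression itself does not survive in this text, so the sketch below only illustrates the surrounding computation: a breadth-first search gives the shortest distance $d_{ij}$ in the gene network, and a transform of that distance gives the closeness. The exponential form `a * exp(-b * d)` is purely an assumed stand-in (chosen because it maps an infinite distance to zero), not the patent's formula. The function names and gene labels are hypothetical; the defaults 10 and 0.25 are the $a'$ and $b'$ values quoted in the embodiment.

```python
from collections import deque
import math
import numpy as np

def shortest_distances(adjacency, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def closeness_matrix(edges, genes, a=10.0, b=0.25):
    """Closeness between the genes of interest.

    ASSUMPTION: closeness = a * exp(-b * d) is a stand-in transform.
    Unreachable pairs have d = infinity and therefore closeness 0.
    """
    adjacency = {}
    for u, v in edges:                       # undirected network
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    C = np.zeros((len(genes), len(genes)))
    for i, gi in enumerate(genes):
        dist = shortest_distances(adjacency, gi)
        for j, gj in enumerate(genes):
            d = dist.get(gj, math.inf)
            C[i, j] = a * math.exp(-b * d)
    return C

edges = [("g1", "g2"), ("g2", "g3")]   # hypothetical interactions
genes = ["g1", "g3", "g4"]             # g4 has no interactions
C = closeness_matrix(edges, genes)
```

Note that distances are measured over the whole network (here through `g2`), while the matrix is restricted to the genes of interest.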

Step S4: Perform eigenvalue decomposition on the gene closeness matrix to generate a low-dimensional Euclidean space, so that each gene in the gene set of interest can be represented by a low-dimensional feature vector, thereby constructing the joint space.

The joint space is a unified space in which drugs, diseases, and genes can all be measured.

Preferably, step S4 includes:

Step S41: Compute the eigenvalues of the gene closeness matrix, sort them in descending order, and obtain the corresponding eigenvectors.

The gene closeness matrix here is the matrix $C$ obtained in step S3.

Step S42: Take the first $k$ eigenvalues to form the $k \times k$ diagonal matrix $\Lambda$, and take the corresponding $k$ eigenvectors to form the $N_g \times k$ matrix $\Gamma$, in which each column is one eigenvector. $k$ is a natural number, for example 8.

Step S43: Compute the $N_g \times k$ matrix $P = \Gamma \Lambda^{1/2}$; with this matrix the gene closeness matrix can be decomposed as $C = P P^T = \Gamma \Lambda^{1/2} \Lambda^{1/2} \Gamma^T$.

Step S44: Represent each gene $g_i$ by the corresponding row vector $p_{g_i}$ of $P$. The set of $k$-dimensional vectors $\{p_{g_1}, \dots, p_{g_{N_g}}\}$ then constitutes the joint space.
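Steps S41 to S44 can be sketched with NumPy as below. The function name and the example matrix are hypothetical, and clipping negative eigenvalues before taking the square root is an added assumption that the text does not discuss. When $k$ is smaller than $N_g$, the product $P P^T$ is a rank-$k$ approximation of $C$ rather than an exact reconstruction.

```python
import numpy as np

def joint_space(C, k):
    """Return P = Gamma * Lambda^(1/2) from the k largest eigenpairs of symmetric C."""
    eigenvalues, eigenvectors = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1][:k]       # indices of the k largest eigenvalues
    top_values = eigenvalues[order]
    gamma = eigenvectors[:, order]                  # N_g x k, one eigenvector per column
    # ASSUMPTION: negative eigenvalues are clipped to zero so the square root is real.
    return gamma * np.sqrt(np.clip(top_values, 0.0, None))

# Small symmetric example with hypothetical values; row i of P represents gene g_i.
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
P_full = joint_space(C, k=3)   # all eigenpairs kept
P_low = joint_space(C, k=2)    # low-dimensional joint space
```

Multiplying `gamma` by the vector of square roots scales each column, which is the same as the matrix product $\Gamma \Lambda^{1/2}$ for a diagonal $\Lambda$.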

Step S5: Compute the feature spaces of drugs and diseases from the joint space according to the drug-gene association set and the disease-gene association set.

In this way the drug space, the disease space, and the gene space are unified in the joint space.

Preferably, step S5 includes:

Step S51: For each drug $r_i$, traverse the drug-gene association set and collect the set of genes associated with it, denoted $G_{r_i}$.

The feature vector of the drug is computed according to the following expression:

$$p_{r_i} = \frac{1}{|G_{r_i}|} \sum_{g \in G_{r_i}} p_g$$

where $|G_{r_i}|$ is the number of genes associated with drug $r_i$.

Step S52: For each disease $s_i$, traverse the disease-gene association set and collect the set of genes associated with it, denoted $G_{s_i}$.

The feature vector of the disease is computed according to the following expression:

$$p_{s_i} = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$$

where $|G_{s_i}|$ is the number of genes associated with disease $s_i$.
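A minimal sketch of step S5, assuming the feature vector of a drug or disease is the mean of the joint-space vectors of its associated genes (which is what dividing by the number of associated genes suggests). The function name, the data, and the zero vector for an entity without associated genes are hypothetical choices.

```python
import numpy as np

def entity_features(gene_vectors, associations, entities):
    """Average the joint-space vectors of the genes associated with each entity."""
    k = len(next(iter(gene_vectors.values())))
    features = np.zeros((len(entities), k))
    for row, entity in enumerate(entities):
        genes = associations.get(entity, [])
        if genes:  # ASSUMPTION: an entity with no associated genes keeps a zero vector
            features[row] = np.mean([gene_vectors[g] for g in genes], axis=0)
    return features

# Hypothetical 2-dimensional gene vectors and drug-gene associations.
gene_vectors = {
    "g1": np.array([1.0, 0.0]),
    "g2": np.array([0.0, 1.0]),
    "g3": np.array([2.0, 2.0]),
}
drug_genes = {"drug_a": ["g1", "g2"], "drug_b": ["g3"]}
A0 = entity_features(gene_vectors, drug_genes, ["drug_a", "drug_b"])
```

The same function applies unchanged to diseases, given a disease-gene mapping.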

Step S6: Build a matrix factorization model from the drug feature matrix and the disease feature matrix, and initialize the model with the feature vectors of drugs and diseases in the joint space.

A matrix factorization model is a computational model that describes the associations between objects by means of latent feature vectors. The matrix factorization model can be set up as

$$Y \approx F, \qquad F_{ij} = \frac{1}{1 + e^{-a_i b_j^T}}$$

where $A$, the $N_r \times k$ matrix with rows $a_i$, is the feature matrix of drugs, and $B$, the $N_s \times k$ matrix with rows $b_j$, is the feature matrix of diseases.

The matrices $A$ and $B$ can be initialized with the feature vectors of drugs and diseases in the joint space:

$$a_i = p_{r_i} \ (1 \le i \le N_r), \qquad b_j = p_{s_j} \ (1 \le j \le N_s)$$
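The prediction side of the model can be sketched as below, assuming the logistic form implied by the $F_{ij}(1 - F_{ij})$ factor in the update rule of step S81. The function name and the initial values are hypothetical.

```python
import numpy as np

def predict(A, B):
    """Association scores F = sigmoid(A B^T): one row per drug, one column per disease."""
    return 1.0 / (1.0 + np.exp(-(A @ B.T)))

# Hypothetical initial feature matrices (rows taken from the joint space).
A = np.array([[0.5, 0.5],
              [2.0, 2.0]])            # 2 drugs, k = 2
B = np.array([[0.0, 0.0],
              [1.0, -1.0],
              [1.0, 1.0]])            # 3 diseases, k = 2
F = predict(A, B)
```

Because of the logistic function, every score lies strictly between 0 and 1, which matches the 0/1 labels used for training.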

Step S7: Obtain training samples from the drug set, the disease set, and the drug-disease association set.

The samples in the drug-disease association set are used as positive samples. Drugs and diseases are picked at random from the drug set and the disease set and combined into drug-disease pairs that do not appear among the positive samples. As many such pairs as there are positive samples are generated; they are assumed to have no association and are used as negative samples. In the drug-disease association matrix $Y$, the entries corresponding to negative samples are set to $Y_{ij} = 0$, and the remaining entries stay unknown. The positive and negative samples together form the training sample set.
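The negative sampling of step S7 can be sketched as follows. The function name, the index-pair representation, and the requirement that negative pairs be distinct are assumptions made for illustration.

```python
import random

def sample_negatives(n_drugs, n_diseases, positives, seed=None):
    """Draw as many random (drug, disease) index pairs as there are positives,
    none of which is itself a positive pair."""
    rng = random.Random(seed)
    positives = set(positives)
    if 2 * len(positives) > n_drugs * n_diseases:
        raise ValueError("not enough non-positive pairs to sample from")
    negatives = set()   # ASSUMPTION: negative pairs are kept distinct
    while len(negatives) < len(positives):
        pair = (rng.randrange(n_drugs), rng.randrange(n_diseases))
        if pair not in positives:
            negatives.add(pair)
    return sorted(negatives)

positives = [(0, 0), (1, 2), (3, 1)]   # hypothetical known associations
negatives = sample_negatives(n_drugs=4, n_diseases=3, positives=positives, seed=0)
```

Rejection sampling like this is cheap when known associations are sparse, as they are in the embodiment (213 of 6500 pairs).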

Step S8: Train the matrix factorization model with the training samples.

The model training algorithm is described below.

Preferably, step S8 includes:

Step S81: Using each sample of the training sample set in turn, update the feature vectors of the drug and the disease according to the following expressions:

$$a_i \leftarrow a_i + \eta \bigl(e_{ij} F_{ij} (1 - F_{ij})\, b_j - \lambda a_i\bigr)$$

$$b_j \leftarrow b_j + \eta \bigl(e_{ij} F_{ij} (1 - F_{ij})\, a_i - \lambda b_j\bigr)$$

where $F_{ij} = 1 / (1 + e^{-a_i b_j^T})$ is the current predicted association score between drug $r_i$ and disease $s_j$, $e_{ij} = Y_{ij} - F_{ij}$ is the error between the actual and predicted values, $\eta$ is the learning rate, and $\lambda$ is the penalty (regularization) coefficient. One pass over all samples counts as one iteration.

Step S82: Repeat step S81 until the maximum number of iterations is reached.
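Steps S81 and S82 can be sketched as a plain stochastic gradient loop. The logistic prediction is inferred from the $F_{ij}(1 - F_{ij})$ factor in the update rule, and using the pre-update $a_i$ when updating $b_j$ is an assumption; the function name and sample data are hypothetical. The default hyperparameters are the values quoted in the embodiment.

```python
import numpy as np

def train(A, B, samples, eta=0.035, lam=0.005, max_iter=1500):
    """Apply the per-sample updates of step S81 for max_iter passes (step S82).

    samples: iterable of (i, j, y) with drug index i, disease index j, label y in {0, 1}.
    """
    A, B = A.copy(), B.copy()
    for _ in range(max_iter):
        for i, j, y in samples:
            f = 1.0 / (1.0 + np.exp(-(A[i] @ B[j])))   # current prediction F_ij
            e = y - f                                   # error e_ij = Y_ij - F_ij
            a_old = A[i].copy()   # ASSUMPTION: b_j is updated with the pre-update a_i
            A[i] += eta * (e * f * (1.0 - f) * B[j] - lam * A[i])
            B[j] += eta * (e * f * (1.0 - f) * a_old - lam * B[j])
    return A, B

# Hypothetical toy problem: 3 drugs, 2 diseases, k = 2.
rng = np.random.default_rng(0)
A0 = rng.normal(scale=0.1, size=(3, 2))
B0 = rng.normal(scale=0.1, size=(2, 2))
samples = [(0, 0, 1), (1, 1, 1), (2, 0, 0), (0, 1, 0)]
A, B = train(A0, B0, samples)
F = 1.0 / (1.0 + np.exp(-(A @ B.T)))
```

In the described method the starting matrices would come from the joint space rather than from random values; random initialization is used here only to keep the example self-contained.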

Step S9: Use the trained matrix factorization model to predict the unknown entries of the drug-disease association matrix outside the positive and negative samples.

For a drug $r_i$ and a disease $s_j$ whose association is unknown, the association score between them is predicted according to the following model expression:

$$F_{ij} = \frac{1}{1 + e^{-a_i b_j^T}}$$

Step S10: Return to step S7 and repeat the processing of steps S7, S8, and S9 multiple times, predicting the association score of every possible drug-disease combination other than the positive samples.

For example, the processing of steps S7, S8, and S9 is repeated 100 times, and the association score of every possible drug-disease combination other than the positive samples is predicted by averaging.

Preferably, negative samples are randomly regenerated in step S7, the matrix factorization model is retrained in step S8, and in step S9 the unknown entries of the drug-disease association matrix outside the positive and negative samples are predicted. This process can be repeated, for example, 100 times.

For a combination of drug $r_i$ and disease $s_j$ outside the positive samples, let $N$ be the number of times the combination is selected as a negative sample, and let $S$ be the sum of its predicted association scores over the $100 - N$ models in which it was not selected as a negative sample; the association prediction score of the combination is then computed according to the following expression:

$$\mathrm{score} = \frac{S}{100 - N}$$
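The averaging over repetitions can be sketched as below, with the number of runs $R$ standing in for the 100 repetitions of the example. The function name, the array layout, and returning NaN for a pair that was a negative sample in every run are assumptions.

```python
import numpy as np

def average_scores(predictions, negative_masks):
    """score = S / (R - N): mean prediction over the runs in which a pair was
    not used as a negative sample.

    predictions:    array (R, N_r, N_s), one predicted matrix per run
    negative_masks: boolean array (R, N_r, N_s), True where the pair was a
                    negative sample in that run
    """
    predictions = np.asarray(predictions, dtype=float)
    used = ~np.asarray(negative_masks, dtype=bool)
    S = np.where(used, predictions, 0.0).sum(axis=0)   # sum over the R - N usable runs
    count = used.sum(axis=0)                           # R - N for each pair
    # ASSUMPTION: a pair that was a negative sample in every run gets NaN.
    return np.divide(S, count, out=np.full(S.shape, np.nan), where=count > 0)

# Hypothetical data: R = 3 runs, 1 drug, 2 diseases.
predictions = np.array([[[0.2, 0.8]], [[0.4, 0.6]], [[0.9, 0.1]]])
negative_masks = np.array([[[False, False]], [[False, True]], [[True, False]]])
scores = average_scores(predictions, negative_masks)
```

Excluding the runs in which a pair served as a negative sample avoids scoring a pair with a model that was trained to output 0 for it.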

Step S11: The drug-disease combinations outside the positive samples are sorted by the association prediction scores of step S10, from high to low. A score threshold is set to screen potential drug-disease associations and so predict new indications for drugs.
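The ranking and thresholding can be sketched as follows. The function name and the example scores are hypothetical, treating the threshold as inclusive is an assumption, and 0.6459 is the threshold value quoted in the embodiment.

```python
import numpy as np

def screen_candidates(scores, positive_mask, threshold):
    """Return (drug index, disease index, score) for non-positive pairs whose
    score is at or above the threshold, best first."""
    candidates = [
        (int(i), int(j), float(scores[i, j]))
        for i, j in zip(*np.where(~positive_mask))
        if scores[i, j] >= threshold   # ASSUMPTION: the threshold is inclusive
    ]
    return sorted(candidates, key=lambda item: item[2], reverse=True)

# Hypothetical averaged scores for 2 drugs and 3 diseases.
scores = np.array([[0.9, 0.7, 0.2],
                   [0.4, 0.95, 0.65]])
positive_mask = np.array([[True, False, False],
                          [False, False, False]])   # (0, 0) is a known association
hits = screen_candidates(scores, positive_mask, threshold=0.6459)
```

Known associations are masked out before ranking, so every returned pair is a candidate new indication.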

The features and advantages of the invention are illustrated below through a specific embodiment. The embodiment uses drug-disease association data extracted from the literature and public databases to predict drug repositioning. Fig. 2 is a schematic diagram of the operating procedure of the specific embodiment; as shown in Fig. 2, the procedure includes the following steps:

A drug-disease association set is collected from the literature and public databases, giving 213 drug-disease associations between 130 drugs and 50 diseases. The $130 \times 50$ drug-disease association matrix $Y$ is constructed, with the entries corresponding to the association set set to 1.

The drug-gene association set related to the 130 drugs is collected from the KEGG BRITE, BRENDA, SuperTarget, and DrugBank databases, 776 drug-gene associations in total. The disease-gene association set related to the 50 diseases is collected from the OMIM database, 74 disease-gene associations in total. The gene association set is extracted from the HPRD database and covers 36882 gene interactions among 9415 genes.

Step S3: Extract the gene closeness matrix from the gene-space data.

The drug-gene association set and the disease-gene association set together involve 850 genes. The gene interaction network is built from the gene association set. With $a' = 10$ and $b' = 0.25$, the closeness between gene $g_i$ and gene $g_j$ is computed according to the following expression:

where $d_{ij}$ is the shortest distance between $g_i$ and $g_j$ in the gene interaction network; when two genes are unreachable from each other in the network, $d_{ij}$ is defined as infinity. The closeness between every pair of the 850 genes is computed in this way, giving the $850 \times 850$ gene closeness matrix $C$.

In this embodiment, eigenvalue decomposition is performed on the gene closeness matrix $C$. With $k = 8$, a set of 8-dimensional vectors $\{p_{g_1}, \dots, p_{g_{850}}\}$ is obtained as the feature vectors of the genes, and the joint space is constructed.

For each drug $r_i$, the drug-gene association set is traversed and the set of genes associated with the drug is collected, denoted $G_{r_i}$. The feature vector of the drug is computed according to the following expression:

$$p_{r_i} = \frac{1}{|G_{r_i}|} \sum_{g \in G_{r_i}} p_g$$

where $|G_{r_i}|$ is the number of genes associated with drug $r_i$.

For each disease $s_i$, the disease-gene association set is traversed and the set of genes associated with the disease is collected, denoted $G_{s_i}$. The feature vector of the disease is computed according to the following expression:

$$p_{s_i} = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$$

where $|G_{s_i}|$ is the number of genes associated with disease $s_i$.

In this embodiment, the matrix factorization model $Y \approx F$ with $F_{ij} = 1 / (1 + e^{-a_i b_j^T})$ is set up, and the matrices $A$ and $B$ are initialized with the feature vectors of drugs and diseases in the joint space:

$$a_i = p_{r_i} \ (1 \le i \le 130), \qquad b_j = p_{s_j} \ (1 \le j \le 50)$$

In this embodiment, the 213 drug-disease associations in the drug-disease association set are used as positive samples. Drugs and diseases are picked at random from the 130 drugs and 50 diseases and combined into drug-disease pairs that do not appear among the positive samples. As many such pairs as there are positive samples are generated; they are assumed to have no association and are used as negative samples. In the drug-disease association matrix $Y$, the entries corresponding to negative samples are set to 0, and the remaining entries stay unknown.

Step S8: Train the matrix factorization model with the training samples.

Using each of the 426 training samples in turn, the feature vectors of the drug and the disease are updated according to the following expressions:

$$a_i \leftarrow a_i + \eta \bigl(e_{ij} F_{ij} (1 - F_{ij})\, b_j - \lambda a_i\bigr)$$

$$b_j \leftarrow b_j + \eta \bigl(e_{ij} F_{ij} (1 - F_{ij})\, a_i - \lambda b_j\bigr)$$

where $F_{ij} = 1 / (1 + e^{-a_i b_j^T})$ is the current predicted association score between drug $r_i$ and disease $s_j$, $e_{ij} = Y_{ij} - F_{ij}$ is the error between the actual and predicted values, the learning rate is $\eta = 0.035$, and the penalty coefficient is $\lambda = 0.005$. The maximum number of iterations is set to 1500. Taking etoposide (Etoposide) and myasthenia gravis (Myasthenia Gravis) as examples, Fig. 3A and Fig. 3B show how the feature vectors of a drug and a disease gradually stabilize during model training.

Step S9: Use the trained matrix factorization model to predict the unknown entries of the drug-disease association matrix $Y$ outside the positive and negative samples.

In this embodiment, the processing of steps S7, S8, and S9 is repeated 100 times, and the association score of every possible drug-disease combination other than the positive samples is predicted by averaging.

For a combination of drug $r_i$ ($1 \le i \le 130$) and disease $s_j$ ($1 \le j \le 50$) outside the positive samples, let $N$ be the number of times the combination is selected as a negative sample, and let $S$ be the sum of its predicted association scores over the $100 - N$ models in which it was not selected as a negative sample; the association prediction score of the combination is then computed according to the following expression:

$$\mathrm{score} = \frac{S}{100 - N}$$

In this way the association prediction scores of 6287 drug-disease combinations are computed in total. The distribution of the association prediction scores is shown in Fig. 4.
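The counts quoted in this embodiment are mutually consistent. The number of scored combinations is the number of all drug-disease pairs minus the known associations, and the number of training samples is the 213 positive samples plus an equal number of negative samples:

$$130 \times 50 - 213 = 6500 - 213 = 6287, \qquad 213 + 213 = 426$$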

The drug-disease combinations outside the positive samples are sorted by the 6287 association prediction scores, from high to low. With the score threshold set to 0.6459, 500 potential drug-disease associations are screened out; these predict new indications for the drugs and so achieve drug repositioning. The 500 potential drug-disease associations cover 125 drugs and 40 diseases. The relationship between the number of new indications and the number of drugs that obtain that many new indications is shown in Fig. 5.

Checked against clinical studies and the medical literature, the predicted potential drug-disease associations prove to be fairly accurate and valid. These potential associations can be used to predict new indications for drugs and thus to achieve drug repositioning.

The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the invention in detail. It should be understood that the foregoing are only specific embodiments of the invention and do not limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A method for predicting new indications for old medicines by matrix decomposition based on gene space fusion, characterized in that the method comprises the following steps:

Step S1: Collect the set of known medicine-disease associations; the medicines involved are denoted $\{r_1, r_2, \ldots, r_{N_r}\}$ and the diseases involved are denoted $\{s_1, s_2, \ldots, s_{N_s}\}$, where $N_r$ is the number of medicines and $N_s$ is the number of diseases; construct the medicine-disease association matrix from the known medicine-disease association set;

Step S2: Obtain gene space data, the data comprising the medicine-gene association set related to the known medicines in step S1, the disease-gene association set related to the diseases in step S1, and the gene association set;

Step S4: Perform eigenvalue decomposition on the gene compactness matrix to generate a low-dimensional Euclidean space, so that each gene in the gene set of interest can be represented by a low-dimensional feature vector, thereby constructing the joint space; the gene set of interest comprises all genes involved in the medicine-gene association set and the disease-gene association set;

Step S5: Compute the feature matrices of the medicines and of the diseases from the joint space according to the medicine-gene association set and the disease-gene association set;

Step S6: Build a matrix decomposition model from the feature matrix of the medicines and the feature matrix of the diseases, and initialize the model with the feature vectors of the medicines and diseases in the joint space;

Step S7: Obtain training samples from the medicine set, the disease set and the medicine-disease association set;

Step S10: Return to step S7 and repeat the processing of steps S7, S8 and S9 several times, predicting the degree of association of every possible medicine-disease combination outside the positive samples, the positive samples being the samples in the known medicine-disease association set;

Step S11: According to the degrees of association predicted in step S10, screen medicine-disease associations and predict new indications of medicines.

2. The method as claimed in claim 1, characterized in that in step S2 the gene space data comprise the medicine-gene association set, the disease-gene association set and the gene association set, and these data are obtained by querying databases.

3. The method as claimed in claim 2, characterized in that the medicine-gene association set is obtained by querying the KEGG BRITE, BRENDA, SuperTarget and DrugBank databases, the disease-gene association set is obtained by querying the OMIM database, and the gene association set is obtained by querying the HPRD database.

4. The method as claimed in claim 2, characterized in that genes are represented by Entrez Gene IDs.

5. The method as claimed in claim 1, characterized in that step S3 further comprises:

Step S32: Calculate the compactness between genes over the gene set of interest.

6. The method as claimed in claim 1, characterized in that step S4 comprises:

Step S41: Compute the eigenvalues of the gene compactness matrix, sort them in descending order, and obtain the corresponding eigenvectors;

Step S42: Extract the first $k$ eigenvalues to form a $k \times k$ diagonal matrix $\Lambda$, and extract the corresponding $k$ eigenvectors to form an $N_g \times k$ matrix $\Gamma$, each column of which corresponds to one eigenvector, where $k$ is a natural number and $N_g$ is the number of genes in the gene set of interest;

Step S43: Compute the $N_g \times k$ matrix $P = \Gamma \Lambda^{1/2}$; with this matrix, the gene compactness matrix can be decomposed as $C = P P^{T} = \Gamma \Lambda^{1/2} \Lambda^{1/2} \Gamma^{T}$;

Step S44: Represent each gene $g$ by the corresponding row vector $p_g$ of matrix $P$; the resulting set of $k$-dimensional vectors constitutes the joint space.
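Steps S41 to S44 can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the function name `build_joint_space` and the toy matrix are hypothetical, the compactness matrix is assumed to be symmetric (so that `numpy.linalg.eigh` applies), and clipping small negative eigenvalues to zero before taking the square root is an added safeguard that the claims do not mention.

```python
import numpy as np


def build_joint_space(C, k):
    """Return P = Gamma @ Lambda^(1/2) built from the k largest eigenpairs
    of the symmetric matrix C; row g of P is the vector p_g of gene g."""
    eigenvalues, eigenvectors = np.linalg.eigh(C)   # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1][:k]       # indices of the k largest
    top_values = eigenvalues[order]
    gamma = eigenvectors[:, order]                  # N_g x k, one eigenvector per column
    # Assumption: clip tiny negative eigenvalues so the square root is defined.
    lambda_sqrt = np.diag(np.sqrt(np.clip(top_values, 0.0, None)))
    return gamma @ lambda_sqrt


# Toy symmetric positive-definite matrix standing in for a compactness matrix.
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
P_full = build_joint_space(C, k=3)   # keeps every eigenpair
P_low = build_joint_space(C, k=2)    # two-dimensional joint space
```

When all eigenpairs are kept, `P_full @ P_full.T` reproduces `C`; with `k` smaller than $N_g$ the product is only a low-rank approximation of `C`.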

7. The method as claimed in claim 1, characterized in that in step S5 the feature space of the diseases is computed according to the following expression:

$$b_i = \frac{1}{|G_{s_i}|} \sum_{g \in G_{s_i}} p_g$$

where $p_g$ denotes the feature vector of gene $g$ in the joint space, $G_{s_i}$ is the set of genes associated with disease $s_i$ in the disease-gene association set, and $|G_{s_i}|$ denotes the number of genes associated with disease $s_i$.
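The computation in this claim can be illustrated with a short sketch. It assumes that the expression is the arithmetic mean of the joint-space vectors of the genes associated with the disease, which is what the symbol definitions (a gene vector, a set of associated genes, and the size of that set) suggest; the helper name `mean_feature` and the toy vectors are hypothetical.

```python
def mean_feature(gene_vectors, associated_genes):
    """Average the joint-space vectors p_g of the genes associated with
    one disease (arithmetic mean, assumed form of the expression)."""
    vectors = [gene_vectors[g] for g in associated_genes]
    if not vectors:
        raise ValueError("the disease has no associated genes")
    k = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(k)]


# Toy two-dimensional joint space; the values are illustrative only.
gene_vectors = {"g1": [1.0, 0.0], "g2": [3.0, 2.0], "g3": [0.0, 4.0]}
disease_genes = {"s1": ["g1", "g2"]}
b_s1 = mean_feature(gene_vectors, disease_genes["s1"])
```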

8. The method as claimed in claim 1, characterized in that in step S7 medicines and diseases are randomly selected from the medicine set and the disease set and combined to construct medicine-disease pairs that do not appear among the positive samples; as many such pairs are generated as there are positive samples, and they are used as negative samples; the entries of the medicine-disease association matrix $Y$ corresponding to the negative samples are set to $Y_{ij} = 0$.
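The negative sampling of step S7 can be sketched as below. The function name `sample_negatives` and the toy sets are hypothetical, and two points are assumptions rather than statements of the claim: the negative pairs are kept distinct from one another, and every positive pair is assumed to be drawn from the given medicine and disease lists.

```python
import random


def sample_negatives(medicines, diseases, positives, rng):
    """Draw as many random (medicine, disease) pairs outside the positive
    set as there are positive samples. Pairs are distinct (assumption)."""
    positives = set(positives)
    available = len(medicines) * len(diseases) - len(positives)
    if available < len(positives):
        raise ValueError("not enough unobserved pairs to sample from")
    negatives = set()
    while len(negatives) < len(positives):
        pair = (rng.choice(medicines), rng.choice(diseases))
        if pair not in positives:
            negatives.add(pair)
    return negatives


medicines = ["r1", "r2", "r3"]
diseases = ["s1", "s2"]
positives = {("r1", "s1"), ("r2", "s2")}
negatives = sample_negatives(medicines, diseases, positives, random.Random(0))

# Training labels: Y_ij = 1 for positive samples, Y_ij = 0 for negative samples.
labels = {**{pair: 1 for pair in positives}, **{pair: 0 for pair in negatives}}
```

Passing a seeded `random.Random` instance keeps one run reproducible while still letting each repetition of step S7 draw a fresh negative set.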

9. The method as claimed in claim 1, characterized in that step S8 further comprises:

Step S81: Using each sample of the training samples in turn, update the feature vectors of the medicine and the disease according to the following expressions:

$$a_i \mathrel{+}= \eta \left( e_{i,j}\, b_j\, F_{ij} (1 - F_{ij}) - \lambda a_i \right)$$

$$b_j \mathrel{+}= \eta \left( e_{i,j}\, a_i\, F_{ij} (1 - F_{ij}) - \lambda b_j \right)$$

where $a_i$ is the feature vector of the medicine, $b_j$ is the feature vector of the disease, $F_{ij} = \frac{1}{1 + e^{-a_i b_j^{T}}}$ is the predicted value of the degree of association between the current medicine $r_i$ and disease $s_j$, $Y_{ij}$ is the actual value of the degree of association between the current medicine $r_i$ and disease $s_j$, $e_{i,j} = Y_{ij} - F_{ij}$ is the error between the actual value and the predicted value of that degree of association, $\eta$ is the learning rate, and $\lambda$ is the penalty coefficient; one pass over all samples is counted as one iteration;

Step S82: Repeat step S81 until the maximum number of iterations is reached.
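The training loop of steps S81 and S82 can be sketched in plain Python as follows. Several points are assumptions: the predicted value $F_{ij}$ is taken to be the logistic sigmoid of the inner product of $a_i$ and $b_j$, the form consistent with the factor $F_{ij}(1 - F_{ij})$ in the update rules; both updates within one step use the vectors as they were before that step; and the starting vectors, learning rate and penalty coefficient below are made-up values, whereas the method itself initializes the vectors from the joint space.

```python
import math


def predict(a_i, b_j):
    """Predicted degree of association: sigmoid of the inner product (assumed form)."""
    return 1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(a_i, b_j))))


def train(A, B, samples, eta, lam, max_iter):
    """A and B map medicine and disease ids to feature vectors.
    samples is a list of (medicine id, disease id, Y_ij) triples."""
    for _ in range(max_iter):                 # step S82: fixed number of iterations
        for i, j, y in samples:               # step S81: one pass over all samples
            a, b = A[i], B[j]
            f = predict(a, b)
            g = (y - f) * f * (1.0 - f)       # e_ij * F_ij * (1 - F_ij)
            # Both updates read the pre-update vectors a and b (assumption).
            A[i] = [x + eta * (g * bj - lam * x) for x, bj in zip(a, b)]
            B[j] = [x + eta * (g * ai - lam * x) for x, ai in zip(b, a)]
    return A, B


# Illustrative starting vectors and hyperparameters, not values from the patent.
A = {"r1": [0.1, 0.2], "r2": [0.3, -0.1]}
B = {"s1": [0.2, 0.1], "s2": [-0.1, 0.3]}
samples = [("r1", "s1", 1), ("r2", "s2", 0)]
A, B = train(A, B, samples, eta=0.5, lam=0.01, max_iter=200)
positive_score = predict(A["r1"], B["s1"])
negative_score = predict(A["r2"], B["s2"])
```

After training, `predict` can be applied to any medicine-disease pair outside the training samples to obtain its predicted degree of association.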

10. The method as claimed in claim 1, characterized in that in step S9 the trained matrix decomposition model is given by the following expression, and the degree of association of a combination of medicine $r_i$ and disease $s_j$ outside the training samples is predicted according to this model:

$$F_{ij} = \frac{1}{1 + e^{-a_i b_j^{T}}}$$

11. The method as claimed in claim 1, characterized in that

in step S10 the following process is repeated 100 times: step S7 randomly generates the negative samples anew, step S8 retrains the matrix decomposition model, and step S9 predicts the unknown entries of the medicine-disease association matrix outside the positive and negative samples.

12. The method as claimed in claim 1, characterized in that in step S10, for a combination of medicine $r_i$ and disease $s_j$ outside the positive samples, let $N$ be the number of times the combination is selected as a negative sample and let $S$ be the sum of the predicted degrees of association of the combination over the $100 - N$ models in which it is not selected as a negative sample; the degree-of-association prediction score of the combination is then computed according to the following expression:

$$\mathrm{score} = \frac{S}{100 - N}$$
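The averaging over repeated runs can be sketched as follows, assuming the score is the sum $S$ divided by the number of models in which the combination was not a negative sample. The function name `average_scores` and the three toy runs are hypothetical; the claim uses 100 repetitions, and the sketch simply generalizes to however many runs are supplied.

```python
def average_scores(runs):
    """runs is a list of (negative_pairs, predictions), one entry per repetition.
    predictions maps each candidate pair to its predicted degree of association.
    A pair is skipped in every run where it served as a negative sample."""
    totals, counts = {}, {}
    for negatives, predictions in runs:
        for pair, value in predictions.items():
            if pair in negatives:
                continue
            totals[pair] = totals.get(pair, 0.0) + value   # accumulates S
            counts[pair] = counts.get(pair, 0) + 1         # number of runs minus N
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Three toy repetitions with made-up predictions.
runs = [
    ({("r1", "s2")}, {("r1", "s2"): 0.1, ("r2", "s1"): 0.8}),
    (set(), {("r1", "s2"): 0.6, ("r2", "s1"): 0.6}),
    (set(), {("r1", "s2"): 0.4, ("r2", "s1"): 0.7}),
]
scores = average_scores(runs)
```

A combination that happened to be drawn as a negative sample in every repetition receives no score in this sketch, because the denominator would be zero.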