CN113345593A - Method for predicting disease association relation in biological association network - Google Patents
- Publication number: CN113345593A
- Application number: CN202110287525.8A
- Authority: CN (China)
- Prior art keywords: matrix, disease, aθb, hypergraph, kernel
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
The invention discloses a method for predicting disease association relationships in a biological association network, comprising the following steps: S1, constructing a multi-kernel representation of the non-coding RNAs and a multi-kernel representation of the diseases; S2, fusing the non-coding RNA kernels and the disease kernels separately by centered kernel alignment to obtain an optimal kernel for each; S3, decomposing each fused optimal kernel by singular value decomposition to obtain two matrices, $A$ and $B$; S4, applying tri-matrix factorization with hypergraph regularization terms to $A$ and $B$ and computing the two hypergraph Laplacian matrices; S5, computing, under cross validation and using the hypergraph Laplacian matrices, a new association matrix $Y^* = A\Theta B^{T}$. The invention addresses the prediction of associations between non-coding RNAs and diseases: it adds hypergraph Laplacian regularization terms to the tri-matrix factorization and uses multi-kernel fusion by centered kernel alignment, which significantly improves prediction accuracy.
Description
Technical Field
The invention belongs to the field of biological association network prediction algorithms in bioinformatics, and particularly relates to a method for predicting disease association relationship in a biological association network.
Background
Accurately identifying associations between non-coding RNAs and diseases is of great help to biomedical research and disease treatment. However, conventional experimental techniques study only one non-coding RNA or one specific disease at a time, and they are time-consuming and expensive. Many computational tools have therefore been proposed to detect new associations from known non-coding RNA and disease information. Since non-coding RNAs (ncRNAs), including circular RNAs (circRNAs), microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), are closely related to the progression of various human diseases, it is important to develop an effective computational method for predicting ncRNA-disease associations.
Disclosure of Invention
In view of the problems in the prior art, the present invention aims to provide a method for predicting disease association relationships in a biological association network. The method uses a multiple kernel learning algorithm based on centered kernel alignment to fuse several kernels, and then trains a tri-matrix factorization model with hypergraph regularization terms to predict new associations between non-coding RNAs and diseases.
In order to solve the problems in the prior art, the invention adopts the following technical scheme:
a method for disease association prediction in a biological association network, comprising the steps of:
S1: constructing a multi-kernel representation of the non-coding RNAs and a multi-kernel representation of the diseases, where u and v are the numbers of kernels in the feature spaces of the non-coding RNAs and the diseases, respectively;
S2: fusing the non-coding RNA kernels and the disease kernels separately by the centered kernel alignment method to obtain an optimal kernel for each;
S3: decomposing each fused optimal kernel by singular value decomposition to obtain two matrices, $A$ and $B$,
where $A$ and $B$ are low-rank approximation matrices, and $r_{nc}$ and $r_d$ are the latent feature space dimensions of the non-coding RNAs and the diseases, respectively;
S4: applying tri-matrix factorization with hypergraph regularization terms to the matrices $A$ and $B$ and computing the two hypergraph Laplacian matrices;
S5: computing, under cross validation and using the hypergraph Laplacian matrices, a new association matrix $Y^* = A\Theta B^{T}$.
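Step S3 can be illustrated with a minimal sketch. The text does not reproduce the decomposition formula, so the form used here is an assumption: a symmetric positive semi-definite kernel $K$ is approximated as $FF^{T}$, with $F$ built from the top $r$ singular vectors scaled by the square roots of the singular values. The function name `low_rank_factor` and the toy data are hypothetical.

```python
import numpy as np

def low_rank_factor(K, r):
    """Return an n x r matrix F such that F @ F.T approximates the symmetric PSD kernel K."""
    U, s, _ = np.linalg.svd(K)           # singular values come back in descending order
    return U[:, :r] * np.sqrt(s[:r])     # scale each kept left singular vector by sqrt(sigma)

# Toy fused kernel (hypothetical data): a rank-3 PSD matrix over 6 ncRNAs.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K_nc = X @ X.T
A = low_rank_factor(K_nc, r=3)           # r plays the role of r_nc
print(A.shape, np.allclose(A @ A.T, K_nc))
```

Applying the same function to the fused disease kernel with rank $r_d$ would give $B$ under the same assumption.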
Further, the hypergraph Laplacian matrices in step S4 are calculated as:
$L_h = I - \Theta$
where $I$ is the identity matrix.
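As an illustration of $L_h = I - \Theta$, the sketch below builds a normalized hypergraph Laplacian from an incidence matrix. The text does not state how $\Theta$ is formed in this formula, so the sketch assumes the commonly used normalized form $\Theta = D_v^{-1/2} H W D_e^{-1} H^{T} D_v^{-1/2}$, where $H$ is the vertex-by-hyperedge incidence matrix, $W$ holds the hyperedge weights, and $D_v$ and $D_e$ are the vertex and hyperedge degree matrices. All names and the toy incidence matrix are hypothetical.

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """L_h = I - Theta, with Theta = Dv^-1/2 H W De^-1 H^T Dv^-1/2 (assumed form)."""
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    dv = H @ w                           # vertex degrees: weighted count of incident hyperedges
    de = H.sum(axis=0)                   # hyperedge degrees: number of vertices in each hyperedge
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / de) @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(n) - theta

# Toy incidence matrix (hypothetical): 4 vertices, 2 hyperedges of 3 vertices each.
H = np.array([[1, 0], [1, 1], [1, 1], [0, 1]], dtype=float)
L_h = hypergraph_laplacian(H)
```

Under this assumed form the result is symmetric and positive semi-definite, which is what makes it usable as a regularization term.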
Further, in step S5, the computation with the hypergraph Laplacian matrices under cross validation proceeds as follows:
$A^{T}A\Theta B^{T}B + \lambda_1 A^{T}L_1A\Theta B^{T}B + \lambda_2 A^{T}A\Theta B^{T}L_2B = A^{T}Y_{train}B$
$A\Theta B^{T} + \lambda_1 L_1A\Theta B^{T} + \lambda_2 A\Theta B^{T}L_2 = Y_{train}$
$(I + \lambda_1 L_1)A\Theta B^{T} + \lambda_2 A\Theta B^{T}L_2 = Y_{train}$
$A^{-1}(I + \lambda_1 L_1)A\Theta + \lambda_2 \Theta B^{T}L_2(B^{T})^{-1} = A^{-1}Y_{train}(B^{T})^{-1}$
where $Y_{train}$ is the matrix of known ncRNA-disease associations; $\Theta$ is the bi-projection matrix; $\lambda_1$ and $\lambda_2$ are the regularization coefficients of the two graphs, each set to 1; and $L_1$ and $L_2$ are the normalized graph Laplacian matrices.
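The first equation of the derivation is linear in $\Theta$, so for small latent dimensions it can be solved directly. The sketch below is one possible way to do this, not necessarily the solver used by the invention: it vectorizes the equation with the identity $\mathrm{vec}(MXN) = (N^{T} \otimes M)\,\mathrm{vec}(X)$ and solves the resulting linear system. The toy sizes, the random data, and the use of ordinary normalized graph Laplacians in place of hypergraph Laplacians are assumptions made only for the example.

```python
import numpy as np

def normalized_laplacian(W):
    """I - D^-1/2 W D^-1/2 for a symmetric non-negative similarity matrix W."""
    d = W.sum(axis=1)
    return np.eye(len(W)) - W / np.sqrt(np.outer(d, d))

def solve_theta(A, B, Y, L1, L2, lam1=1.0, lam2=1.0):
    """Solve the first equation of the derivation for Theta by vectorization."""
    AtA, BtB = A.T @ A, B.T @ B
    M1 = AtA + lam1 * (A.T @ L1 @ A)     # left factor of the two terms ending in B^T B
    N2 = B.T @ L2 @ B                    # right factor of the lambda_2 term
    # Identity used: vec(M X N) = (N^T kron M) vec(X), with column-major vec.
    S = np.kron(BtB.T, M1) + lam2 * np.kron(N2.T, AtA)
    rhs = (A.T @ Y @ B).flatten(order="F")
    theta = np.linalg.solve(S, rhs)
    return theta.reshape((A.shape[1], B.shape[1]), order="F")

# Toy problem (all sizes and data hypothetical): 8 ncRNAs, 6 diseases, r_nc = 3, r_d = 2.
rng = np.random.default_rng(1)
A, B = rng.normal(size=(8, 3)), rng.normal(size=(6, 2))
Y_train = (rng.random((8, 6)) < 0.3).astype(float)
W1, W2 = rng.random((8, 8)), rng.random((6, 6))
L1, L2 = normalized_laplacian(W1 + W1.T), normalized_laplacian(W2 + W2.T)

Theta = solve_theta(A, B, Y_train, L1, L2)
Y_star = A @ Theta @ B.T                 # predicted association scores
```

The linear system has $(r_{nc} r_d)^2$ entries, so this direct approach only suits small latent dimensions; for larger ones, a Sylvester-equation solver applied to the last line of the derivation would be the natural alternative.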
Advantageous Effects
The invention uses a multi-kernel fusion method to identify associations between non-coding RNAs and diseases. An efficient multiple kernel learning algorithm is used to find the important features that affect the associations, to evaluate the importance of each kernel matrix, and to reduce the bias introduced by kernel fusion. A sound kernel matrix evaluation method is constructed that computes a weight coefficient for each kernel matrix; this effectively filters out noisy kernel matrices while retaining useful ones as far as possible, and provides a basis for improving the prediction accuracy of the model. Each kernel matrix represents information from a different heterogeneous data source, and its weight coefficient reflects how much that information contributes to the prediction model, which helps reveal the key information affecting the accuracy of ncRNA-disease association prediction. The prediction accuracy of the method is better than that of other prior-art methods. The method is accurate, simple and efficient, addresses the low accuracy of existing methods for identifying ncRNA-disease associations, and is of significance for promoting non-coding RNA research.
Drawings
FIG. 1 is a flow chart of the computational process of the present invention;
FIG. 2 shows the weight of each kernel in the five data sets;
FIG. 3 compares the AUC and AUPR of different kernel functions under 5-fold cross validation on the 5 data sets;
FIG. 4 compares the AUC and AUPR of different matrix factorization methods under 5-fold cross validation on the 5 data sets;
FIG. 5 shows the AUC under 5-fold cross validation for different values of the parameters $r_d$ and $r_{nc}$;
FIG. 6 shows the AUPR under 5-fold cross validation for different values of the parameters $r_d$ and $r_{nc}$;
FIG. 7 shows the optimal parameters $r_{nc}$ and $r_d$ found on the five data sets;
FIG. 8 compares AUC results with existing state-of-the-art methods under 5-fold cross validation and leave-one-out cross validation;
FIG. 9 shows ten new associations for lung cancer, liver cancer and pancreatic cancer.
Detailed Description
The invention is described in detail below with reference to the accompanying drawings:
As shown in FIG. 1, the invention accurately identifies associations between non-coding RNAs and diseases, which is of great help to biomedical research and disease treatment. Conventional experimental techniques study only one non-coding RNA or one specific disease at a time, and they are time-consuming and expensive. Many computational tools have therefore been proposed to detect new associations from known non-coding RNA and disease information. Since ncRNAs (circRNAs, miRNAs and lncRNAs) are closely related to the progression of various human diseases, developing an efficient computational method for ncRNA-disease association prediction is crucial.
The basic idea of the invention is to fuse the multiple kernels of the non-coding RNAs and the multiple kernels of the diseases by the centered kernel alignment method, and to predict new associations by tri-matrix factorization with hypergraph regularization terms.
The invention mainly comprises the following steps: first, as many non-coding RNA kernels and disease kernels as possible are obtained; the non-coding RNA kernels and the disease kernels are then fused by the centered kernel alignment method; each fused kernel is decomposed by singular value decomposition to yield two matrices; finally, a new association matrix is obtained by tri-matrix factorization with hypergraph regularization terms under cross validation. The specific steps are as follows:
a method for disease association prediction in a biological association network, comprising the steps of:
S1: constructing the non-coding RNA kernels and the disease kernels, where u and v are the numbers of kernels in the non-coding RNA space and the disease space, respectively;
S2: fusing the non-coding RNA kernels and the disease kernels separately by the centered kernel alignment (CKA) method to obtain an optimal kernel for each;
The alignment value describes the similarity between two kernels. CKA-MKL calculates the relationship between the ideal kernel matrix and the ncRNA kernels (or disease kernels) under the constraint
$\beta_p \geq 0,\quad p = 1, 2, \ldots, N$
where $K_{ideal}$ is the ideal kernel; the ideal kernels of the ncRNAs and of the diseases are constructed from the known associations;
S3: decomposing each fused optimal kernel by singular value decomposition to obtain two matrices, $A$ and $B$,
where $A$ and $B$ are low-rank approximation matrices, and $r_{nc}$ and $r_d$ are the latent feature space dimensions of the ncRNAs and the diseases, respectively;
S4: applying tri-matrix factorization with hypergraph regularization terms to the matrices $A$ and $B$ and computing the two hypergraph Laplacian matrices as
$L_h = I - \Theta$
where $I$ is the identity matrix.
S5: computing, under cross validation and using the hypergraph Laplacian matrices, a new association matrix $Y^* = A\Theta B^{T}$.
In step S5, the computation with the hypergraph Laplacian matrices under cross validation proceeds as follows:
$A^{T}A\Theta B^{T}B + \lambda_1 A^{T}L_1A\Theta B^{T}B + \lambda_2 A^{T}A\Theta B^{T}L_2B = A^{T}Y_{train}B$
$A\Theta B^{T} + \lambda_1 L_1A\Theta B^{T} + \lambda_2 A\Theta B^{T}L_2 = Y_{train}$
$(I + \lambda_1 L_1)A\Theta B^{T} + \lambda_2 A\Theta B^{T}L_2 = Y_{train}$
$A^{-1}(I + \lambda_1 L_1)A\Theta + \lambda_2 \Theta B^{T}L_2(B^{T})^{-1} = A^{-1}Y_{train}(B^{T})^{-1}$
where $Y_{train}$ is the matrix of known ncRNA-disease associations; $\Theta$ is the bi-projection matrix; $\lambda_1$ and $\lambda_2$ are the regularization coefficients of the two graphs, each set to 1; and $L_1$ and $L_2$ are the normalized graph Laplacian matrices.
The invention computes the new association matrix through steps S1-S5; the accuracy of the newly identified associations can be verified by querying other databases.
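The kernel fusion of step S2 can be sketched as follows. The alignment between two kernels is computed here in the standard way, as the normalized Frobenius inner product of the double-centered kernel matrices. Two parts are assumptions that the text does not confirm: the ideal ncRNA kernel is taken to be $YY^{T}$ for a known association matrix $Y$, and the weights $\beta_p$ are set proportional to each kernel's alignment with the ideal kernel, a simple heuristic that satisfies $\beta_p \geq 0$ and stands in for the constrained optimization. Function names and data are hypothetical.

```python
import numpy as np

def center(K):
    """Double-center a kernel matrix: J K J with J = I - (1/n) 1 1^T."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

def alignment(K1, K2):
    """Centered kernel alignment: <K1c, K2c>_F / (||K1c||_F ||K2c||_F)."""
    K1c, K2c = center(K1), center(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def fuse(kernels, K_ideal):
    """Heuristic fusion: non-negative weights proportional to each kernel's alignment."""
    scores = np.array([max(alignment(K, K_ideal), 0.0) for K in kernels])
    beta = scores / scores.sum()
    return beta, sum(b * K for b, K in zip(beta, kernels))

# Toy data (hypothetical): 4 ncRNAs, 3 diseases.
Y = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
K_ideal = Y @ Y.T                        # assumed form of the ideal ncRNA kernel
rng = np.random.default_rng(2)
F = rng.normal(size=(4, 5))
kernels = [K_ideal + 0.1 * np.eye(4), F @ F.T]
beta, K_fused = fuse(kernels, K_ideal)
```

A kernel that resembles the ideal kernel receives a larger share of the fused kernel, which matches the role the weights play in FIG. 2.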
The implementation process of the invention comprises the following steps:
Following the above method, the optimal $r_{nc}$ and $r_d$ for the D1 data set are obtained by grid search, testing values from 100 up to the maximum in steps of 100. The same grid search is used for the remaining data sets, and the optimal parameters $r_{nc}$ and $r_d$ for each data set are shown in FIG. 7. FIGS. 5 and 6 show the AUC and AUPR of the model under 5-fold cross validation for different values of $r_d$ and $r_{nc}$. Here $r_{nc}$ (horizontal axis) and $r_d$ (vertical axis) range from 100 to 1500 with a step size of 100.
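The grid search described above can be sketched as a plain exhaustive loop over the two parameters. The scoring function here is a hypothetical stand-in; in practice it would train the model with the given $r_{nc}$ and $r_d$ and return the 5-fold cross-validation AUC.

```python
from itertools import product

def grid_search(score_fn, low=100, high=1500, step=100):
    """Try every (r_nc, r_d) pair on the grid and return the best pair and its score."""
    grid = range(low, high + 1, step)
    best = max(product(grid, grid), key=lambda p: score_fn(*p))
    return best, score_fn(*best)

def toy_score(r_nc, r_d):
    """Hypothetical stand-in for the cross-validated AUC; peaks at (700, 400)."""
    return -((r_nc - 700) ** 2 + (r_d - 400) ** 2)

(best_r_nc, best_r_d), best_score = grid_search(toy_score)
print(best_r_nc, best_r_d)
```

With 15 values per axis the grid has 225 points, so each data set requires 225 cross-validated training runs.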
The performance of multi-kernel and single-kernel models was evaluated on the 5 data sets. FIG. 2 shows the weight of each kernel in the five data sets. The disease semantic similarity kernel receives almost the largest weight. The functional similarity kernel of the non-coding RNAs also receives a large weight, which indicates that it carries more useful information.
FIG. 3 shows the results of 5-fold cross validation (AUC and AUPR) on the 5 data sets for different kernel functions. The method of the invention (CKA-HGRTMF) achieves the best AUC on D2 (0.9775), D3 (0.9023), D4 (0.8809) and D5 (0.9185), and the best AUPR on all 5 data sets. For the single-kernel comparison, different combinations of kernels from the two feature spaces are selected and tested with the HGRTMF model. The experiments show that the kernels with the largest weights in the ncRNA and disease feature sets give better results than the other kernels.
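For reference, the two metrics reported above can be computed from predicted scores and true labels as in the following sketch. AUC is computed as the fraction of positive-negative pairs ranked correctly, with ties counted as half. AUPR is approximated here by average precision, which is an assumption, since the text does not say how the precision-recall curve is integrated. The toy labels and scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def aupr(labels, scores):
    """Average precision: mean of the precision values at each true positive in the ranking."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / k
    return total / tp

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auc(labels, scores), aupr(labels, scores))
```

AUPR is the more demanding of the two when known associations are sparse, which is consistent with the lower AUPR values reported on the larger data sets.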
HGRTMF is compared with other MF-based computational models, including collaborative matrix factorization (CMF), graph-regularized matrix factorization (GRMF), tri-matrix factorization (TMF), NRLMF, and graph-regularized tri-matrix factorization (GRTMF); the results are shown in FIG. 4. The method of the invention (CKA-HGRTMF) achieved the best AUPR on D1 (0.9173), D2 (0.7712), D3 (0.6224) and D5 (0.5017), and the best AUC on D2 (0.9775), D3 (0.9023), D4 (0.8809) and D5 (0.9185), outperforming the other MF-based computational models. The AUPRs of CKA-HGRTMF on the 5 data sets were 0.8957, 0.7456, 0.6014, 0.3992 and 0.4250, respectively, and the AUCs were 0.9857, 0.9746, 0.8991, 0.8774 and 0.8991, respectively. The AUCs and AUPRs of CKA-GRTMF on D1, D2 and D5 are all higher than those of CKA-TMF. These results show that adding graph regularization terms to the computational model helps improve prediction performance.
To evaluate the performance of the CKA-HGRTMF model, the method is compared with other existing methods. FIG. 8 shows the AUC results of 5-fold cross validation and leave-one-out cross validation. The method of the invention (CKA-HGRTMF) achieved the best results both in 5-fold cross validation on D2, D3, D4 and D5 and in leave-one-out cross validation on all 5 data sets. The method also finds ten new associations for lung cancer, liver cancer and pancreatic cancer; the results are shown in FIG. 9.
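The 5-fold cross validation used in these comparisons can be sketched as follows. The text does not describe how the folds are formed, so this sketch assumes a common protocol: the known associations (entries equal to 1) are shuffled and split into five groups, and each group in turn is hidden by setting its entries to 0 in the training matrix. The function name and the toy matrix are hypothetical.

```python
import random

def k_fold_masks(Y, k=5, seed=0):
    """Yield (Y_train, held_out) pairs; held_out lists the (i, j) associations hidden in that fold."""
    known = [(i, j) for i, row in enumerate(Y) for j, v in enumerate(row) if v == 1]
    random.Random(seed).shuffle(known)
    for f in range(k):
        held_out = known[f::k]                   # every k-th known association goes to fold f
        Y_train = [row[:] for row in Y]          # copy so the original matrix is untouched
        for i, j in held_out:
            Y_train[i][j] = 0
        yield Y_train, held_out

# Toy association matrix (hypothetical): 4 ncRNAs x 3 diseases, 7 known associations.
Y = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 0]]
folds = list(k_fold_masks(Y))
```

Leave-one-out cross validation corresponds to the limiting case in which each fold hides a single known association.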
In conclusion, the invention addresses the prediction of associations between non-coding RNAs and diseases. Adding hypergraph Laplacian regularization terms to the tri-matrix factorization and using the centered kernel alignment method significantly improves prediction accuracy. The computation is simple and easy to implement, requires modest hardware and computing resources, and is therefore widely applicable. The method is implemented in C++ and MATLAB and, on an ordinary computer with a 2.5 GHz 8-core CPU and 24 GB of memory, completes prediction tasks on thousands of samples in a short time.
Claims (3)
1. A method for disease association prediction in a biological association network, comprising the steps of:
S1: constructing a multi-kernel representation of the non-coding RNAs and a multi-kernel representation of the diseases, wherein u and v are the numbers of kernels in the feature spaces of the non-coding RNAs and the diseases, respectively;
S2: fusing the non-coding RNA kernels and the disease kernels separately by the centered kernel alignment method to obtain an optimal kernel for each;
S3: decomposing each fused optimal kernel by singular value decomposition to obtain two matrices, $A$ and $B$,
wherein $A$ and $B$ are low-rank approximation matrices, and $r_{nc}$ and $r_d$ are the latent feature space dimensions of the non-coding RNAs and the diseases, respectively;
S4: applying tri-matrix factorization with hypergraph regularization terms to the matrices $A$ and $B$ and computing the two hypergraph Laplacian matrices.
3. The method for disease association prediction in a biological association network as claimed in claim 1, wherein in step S5 the computation with the hypergraph Laplacian matrices under cross validation proceeds as follows:
$A^{T}A\Theta B^{T}B + \lambda_1 A^{T}L_1A\Theta B^{T}B + \lambda_2 A^{T}A\Theta B^{T}L_2B = A^{T}Y_{train}B$
$A\Theta B^{T} + \lambda_1 L_1A\Theta B^{T} + \lambda_2 A\Theta B^{T}L_2 = Y_{train}$
$(I + \lambda_1 L_1)A\Theta B^{T} + \lambda_2 A\Theta B^{T}L_2 = Y_{train}$
$A^{-1}(I + \lambda_1 L_1)A\Theta + \lambda_2 \Theta B^{T}L_2(B^{T})^{-1} = A^{-1}Y_{train}(B^{T})^{-1}$
wherein $Y_{train}$ is the matrix of known associations between non-coding RNAs and diseases; $\Theta$ is the bi-projection matrix; $\lambda_1$ and $\lambda_2$ are the regularization coefficients of the two graphs, each set to 1; and $L_1$ and $L_2$ are the normalized graph Laplacian matrices.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110287525.8A CN113345593A (en) | 2021-03-17 | 2021-03-17 | Method for predicting disease association relation in biological association network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113345593A true CN113345593A (en) | 2021-09-03 |
Family
ID=77467707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110287525.8A Pending CN113345593A (en) | 2021-03-17 | 2021-03-17 | Method for predicting disease association relation in biological association network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113345593A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114913916A (en) * | 2022-04-19 | 2022-08-16 | 广东工业大学 | Drug relocation method for predicting new coronavirus adaptive drugs |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160140312A1 (en) * | 2014-11-14 | 2016-05-19 | International Business Machines Corporation | Generating drug repositioning hypotheses based on integrating multiple aspects of drug similarity and disease similarity |
CN107578104A (en) * | 2017-08-31 | 2018-01-12 | 江苏康缘药业股份有限公司 | A kind of Chinese Traditional Medicine knowledge system |
CN109308935A (en) * | 2018-09-10 | 2019-02-05 | 天津大学 | A kind of method and application platform based on SVM prediction noncoding DNA |
US20190286086A1 (en) * | 2016-11-10 | 2019-09-19 | Rowanalytics Ltd | Control apparatus and method for processing data inputs in computing devices therefore |
Non-Patent Citations (1)
Title |
---|
HAO WANG ET AL: "Exploring associations of non-coding RNAs in human diseases via three-matrix factorization with hypergraph-regular terms on center kernel alignment", 《BRIEFINGS IN BIOINFORMATICS》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210903 |