CN111009285A - Biological data network processing method based on similarity network fusion algorithm


Info

Publication number
CN111009285A
Authority
CN
China
Prior art keywords
sample
data
similarity
type
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910451766.4A
Other languages
Chinese (zh)
Inventor
刘伟
郑明霞
赵溶
丁彦蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN201910451766.4A
Publication of CN111009285A
Pending legal status


Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a biological data network processing method based on a similarity network fusion algorithm and belongs to the technical field of bioinformatic analysis. The method has five steps:

- Construct a similarity network for each type of biological genetic information, such as mRNA, miRNA and lncRNA.
- Fuse the similarity matrices with the SNF algorithm.
- Create a usable sample network from the fused matrix.
- Cluster the network by spectral clustering.
- Analyze the relationships between the networks.

The method can be applied to discovering disease pathogenesis, early diagnosis, subsequent treatment and related fields. By exploiting the complementarity of different data types, it obtains more comprehensive results than analysis of a single data type and lays a foundation for subsequent integrated analysis.

Description

Biological data network processing method based on similarity network fusion algorithm
Technical Field
The invention relates to a biological data network processing method based on a similarity network fusion algorithm, and belongs to the technical field of biological information analysis.
Background
With the progress of the Human Genome Project, bioinformatics has developed rapidly. High-throughput sequencing technologies have enabled more comprehensive and deeper genome analysis. As sequencing costs continue to fall, multiple kinds of biological data, including genomics and transcriptomics data, keep accumulating. This wealth of data helps to mine the biological knowledge it contains comprehensively and effectively and provides abundant resources for bioinformatic analysis, but it also brings new challenges. Biological data with big-data characteristics continue to accumulate, and precision medicine initiatives have been launched. As a result, bioinformatic analysis is becoming increasingly important and is of great significance for advancing related fields. However, using systematic methods to uncover latent changes in biological networks from experimental data has long been both a focus and a difficulty in the study of life phenomena. Conventional methods can analyze only one type of biological data at a time. They cannot analyze multiple data types jointly and therefore cannot exploit the distinct characteristics contained in different types of data.
And Bin et al. (Analysis of differences in circRNA expression profiles between Luminal subtype breast cancer cells and normal breast cells, Journal of Southern Medical University, 2018, 38(8), 1014-1019) performed a single-factor bioinformatic analysis. After extracting data from the circRNA expression profiles of the two cell types, they applied quantile normalization and further processing to the acquired array images and performed volcano-plot and clustered heat-map analyses. They concluded that circRNA expression differs substantially between Luminal subtype breast cancer cells and normal breast cells, and that the up-regulated or down-regulated circRNAs may become new targets for diagnosing Luminal subtype breast cancer. In practice, however, diseases arise from the combined effects of multiple types of genes on cells, so analysis of a single data type has inherent limitations.
Liuyu intelligence (Integrated analysis of Liuyu chip and DNA methylation chip data to explore molecular targets for the occurrence and development of nasopharyngeal carcinoma, Journal of Clinical Examination, 2018, (8), 574-. Although this work used multiple types of data to obtain therapeutic targets related to nasopharyngeal carcinoma, it essentially processes each type of data separately. It does not analyze the data by fusing the characteristics of multiple data types at the same time.
Disclosure of Invention
Existing biological data network processing methods analyze only a single type of data. They do not fuse the characteristics of multiple data types at the same time to determine disease subtypes. To solve this problem, the invention provides a biological data network processing method based on a similarity network fusion algorithm.
A biological data network processing method, comprising:
S1: constructing, from the sample data sets of different biological data types, a sample similarity matrix for each type;
S2: constructing, from the per-type sample similarity matrices obtained in S1, a fusion similarity matrix of the multiple types of sample data by means of the SNF (similarity network fusion) algorithm;
S3: clustering the fusion similarity matrix obtained in S2 by spectral clustering to determine the subclasses of the sample data.
Optionally, S1 includes:
normalizing each type of data in the sample data set containing different biological data types;
calculating the Euclidean distances between samples of the same type after normalization to construct a distance matrix;
constructing the sample similarity matrix of each type of sample data with a Gaussian heat kernel function.
Optionally, the Euclidean distance $d_{ij}$ is calculated as:
$$d_{ij} = \sqrt{\sum_{k=1}^{m_v} \left(x_{ik} - x_{jk}\right)^2}$$
The symbols are defined as follows:
- The sample data set contains M types of sample data and n samples.
- $m_v$ is the number of genes in the v-th type of sample data (v = 1, …, M).
- $x_{ik}$ and $x_{jk}$ denote the k-th gene of sample i and of sample j, respectively.
- i, j ∈ [1, n] and k ∈ [1, $m_v$].
Optionally, constructing the sample similarity matrix of each type of sample data with the Gaussian heat kernel function includes:
denoting the sample similarity matrix of each type of sample data as $w_v$, whose entries are
$$w_{ij} = \exp\left(-\frac{d_{ij}^{2}}{\mu\,\varepsilon_{ij}}\right)$$
where μ is a hyperparameter in the range [0.3, 0.8] and $\varepsilon_{ij}$ is a parameter used to eliminate the scaling problem.
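Optionally, $\varepsilon_{ij}$ is defined as:
$$\varepsilon_{ij} = \frac{\operatorname{mean}\left(d(i, N_i)\right) + \operatorname{mean}\left(d(j, N_j)\right) + d_{ij}}{3}$$
where $N_i$ denotes the samples other than sample i, and $\operatorname{mean}(d(i, N_i))$ is the mean distance from sample $x_i$ to the other samples $N_i$.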
Optionally, S2 includes:
after obtaining the per-type sample similarity matrices $w_v$ constructed in S1, obtaining the normalized weight matrix $P^{(v)}$ of each type of sample data as:
$$P^{(v)}(i,j) = \begin{cases} \dfrac{w_{ij}}{2\sum_{f \neq i} w_{if}}, & j \neq i \\[6pt] \dfrac{1}{2}, & j = i \end{cases}$$
where $\sum_{f \neq i} w_{if}$ is the sum of the similarities between sample i and all other samples of the same data type, f ∈ [1, n];
defining a kernel matrix S that measures local affinity, the kernel matrix of each type of sample data being denoted $S^{(v)}$:
$$S^{(v)}(i,j) = \begin{cases} \dfrac{w_{ij}}{\sum_{k \in \mathcal{N}_i} w_{ik}}, & j \in \mathcal{N}_i \\[6pt] 0, & \text{otherwise} \end{cases}$$
where:
- $\mathcal{N}_i$ is the set of the g samples most similar to sample i;
- $\sum_{k \in \mathcal{N}_i} w_{ik}$ is the sum of their similarities to sample i;
- g is in the range [20, 30];
iteratively updating the similarity matrix of each data type with the SNF algorithm for a preset number of iterations to obtain the updated $P^{(v)\prime}$, each iteration t computing
$$P^{(v)}_{t+1} = S^{(v)} \times \frac{\sum_{k \neq v} P^{(k)}_{t}}{M - 1} \times \left(S^{(v)}\right)^{T}$$
where $\sum_{k \neq v} P^{(k)}$ is the sum of the normalized matrices $P^{(k)}$ of all data types other than the current type v;
and fusing the similarity matrices of all data types to obtain the fusion similarity matrix P:
$$P = \frac{1}{M} \sum_{v=1}^{M} P^{(v)\prime}$$
Optionally, the preset number of iterations is 10 to 20.
Optionally, the method further includes obtaining a sample similarity network from the sample similarity matrix.
A second object of the invention is to provide use of the above method in disease subtype identification analysis.
A third object of the invention is to provide use of the above method in the technical field of bioinformatic analysis.
The beneficial effects of the invention are as follows:
Using the SNF algorithm, the method proceeds in five steps:
- It first constructs a similarity network for each type of biological genetic information, such as mRNA, miRNA and lncRNA.
- It then fuses the similarity matrices with the SNF algorithm.
- It creates a usable sample network from the fused matrix.
- It clusters that network by spectral clustering.
- It analyzes the relationships between the networks.

The method can therefore be applied to discovering disease pathogenesis, early diagnosis, subsequent treatment and related fields. By exploiting the complementarity of different data types, it obtains more comprehensive results than analysis of a single data type and lays a foundation for subsequent integrated analysis.
Drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a fusion similarity network corresponding to a fusion similarity matrix obtained by fusing through the SNF algorithm.
FIG. 2 is a diagram of a sample similarity network constructed in accordance with the present invention.
FIG. 3 is a graph of the results of a clustering analysis of the fusion similarity matrix using spectral clustering in accordance with the present invention.
Fig. 4 is a schematic diagram highlighting the distinct blocks in the clustering result shown in FIG. 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The first embodiment is as follows:
In this embodiment, three types of data (mRNA, miRNA and lncRNA) are used as an example data set. A similarity matrix is constructed from each type of data, the matrices are fused, and a sample network is then constructed for analysis. The input consists of mRNA, miRNA and lncRNA data for 177 pancreatic cancer patient samples obtained from the TCGA database (https://www.cancer.gov/TCGA).
The biological data network processing method based on the similarity network fusion algorithm provided by the embodiment comprises the following steps:
(1) Construct the sample similarity matrix and the sample similarity network for each type of sample data.
Assume the sample data set is $\{x_1, x_2, \ldots, x_n\}$. It contains M types of sample data and n samples, and the v-th type of sample data contains $m_v$ genes (v = 1, …, M). In this example, n = 177 and M = 3, and the mRNA, miRNA and lncRNA data contain $m_1 = 8073$, $m_2 = 557$ and $m_3 = 17914$ genes, respectively.
First, each type of data in the mRNA, miRNA and lncRNA data sets of the 177 samples is normalized. The Euclidean distances between samples are then calculated on the normalized data to construct a distance matrix. Finally, the sample similarity matrix $w_v$ (v = 1, …, M) of each type of sample data is constructed with a Gaussian heat kernel function. The sample similarity matrices of the three types of sample data are $w_1$, $w_2$ and $w_3$, respectively.
For simplicity, the following describes how the sample similarity matrix and the sample similarity network are constructed for one type of sample data; with multiple types, the same procedure is applied to each type.
The normalization formula is:
$$x' = \frac{x - u}{\sigma}$$
where u is the mean, σ is the standard deviation, and x is the sample data.
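The normalization step can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The text does not say along which axis u and σ are taken, so the sketch assumes that each gene (a column of an n × m_v matrix whose rows are samples) is standardized across samples. The function name `zscore_columns` is hypothetical.

```python
import numpy as np


def zscore_columns(x) -> np.ndarray:
    """Standardize each column (gene) of an n x m matrix to mean 0 and std 1.

    Assumption: rows are samples and columns are genes. Columns with zero
    standard deviation are left at 0 instead of dividing by zero.
    """
    x = np.asarray(x, dtype=float)
    u = x.mean(axis=0)            # per-gene mean u
    sigma = x.std(axis=0)         # per-gene (population) standard deviation sigma
    sigma[sigma == 0] = 1.0       # constant genes: x - u is 0, so the result stays 0
    return (x - u) / sigma


# Example: 3 samples x 2 genes; the second gene is constant
z = zscore_columns([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
```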
The Euclidean distance is calculated as:
$$d_{ij} = \sqrt{\sum_{k=1}^{m_v} \left(x_{ik} - x_{jk}\right)^2}$$
where i, j ∈ [1, n] (in this embodiment, i, j ∈ [1, 177]), $x_{ik}$ denotes the k-th gene of sample i, and k ∈ [1, $m_v$].
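Below is a vectorized sketch of the distance matrix. It assumes the same samples-by-genes layout, and `euclidean_distance_matrix` is a hypothetical name. The sketch expands $\|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2\,x_i \cdot x_j$ instead of forming an n × n × m_v array of differences. This matters for wide data such as the lncRNA set here ($m_3 = 17914$).

```python
import numpy as np


def euclidean_distance_matrix(x) -> np.ndarray:
    """Return the n x n matrix with d[i, j] = sqrt(sum_k (x[i, k] - x[j, k]) ** 2).

    Rows of x are samples; columns are the m_v genes of one data type.
    """
    x = np.asarray(x, dtype=float)
    sq = (x ** 2).sum(axis=1)                         # ||x_i||^2 for every sample
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)  # squared distances via the expansion
    np.maximum(d2, 0.0, out=d2)                       # clip small negatives caused by round-off
    np.fill_diagonal(d2, 0.0)                         # a sample's distance to itself is exactly 0
    return np.sqrt(d2)


d = euclidean_distance_matrix([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
```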
The sample similarity matrix $w_v$ of each type of sample data is constructed with the Gaussian heat kernel function as follows:
$$w_{ij} = \exp\left(-\frac{d_{ij}^{2}}{\mu\,\varepsilon_{ij}}\right)$$
The symbols are defined as follows:
- $w_{ij}$ is the similarity between sample i and sample j.
- μ is a hyperparameter in the range [0.3, 0.8].
- $d_{ij}$ is the Euclidean distance between sample i and sample j.
- $\varepsilon_{ij}$ is a parameter used to eliminate the scaling problem, defined as
$$\varepsilon_{ij} = \frac{\operatorname{mean}\left(d(i, N_i)\right) + \operatorname{mean}\left(d(j, N_j)\right) + d_{ij}}{3}$$
where $N_i$ denotes the samples other than sample i, and $\operatorname{mean}(d(i, N_i))$ is the mean distance from sample $x_i$ to the other samples $N_i$.
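A minimal NumPy sketch of the Gaussian heat kernel follows. It takes ε_ij in the standard SNF form (mean(d(i, N_i)) + mean(d(j, N_j)) + d_ij) / 3. That exact form is an assumption here, because the source renders the formula only as a figure. By default N_i is every sample other than i, as the text defines it. The optional `k` argument switches to the k nearest neighbours, an assumed SNF-style variant. `affinity_matrix` and `k` are hypothetical names.

```python
from typing import Optional

import numpy as np


def affinity_matrix(d, mu: float = 0.5, k: Optional[int] = None) -> np.ndarray:
    """Gaussian heat kernel w_ij = exp(-d_ij**2 / (mu * eps_ij)).

    eps_ij = (mean(d(i, N_i)) + mean(d(j, N_j)) + d_ij) / 3 (assumed SNF form).
    k=None: N_i is every sample other than i (as the text defines N_i).
    k=int:  N_i is the k nearest neighbours of i (assumed variant).
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    off = d.copy()
    np.fill_diagonal(off, np.inf)                  # exclude sample i from N_i
    others = np.sort(off, axis=1)[:, : n - 1]      # distances to the other samples, ascending
    if k is not None:
        others = others[:, :k]                     # keep only the k nearest neighbours
    mean_d = others.mean(axis=1)                   # mean(d(i, N_i)) for every sample i
    eps = (mean_d[:, None] + mean_d[None, :] + d) / 3.0
    eps = np.maximum(eps, np.finfo(float).eps)     # guard against division by zero
    return np.exp(-(d ** 2) / (mu * eps))


d = np.array([[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]])
w = affinity_matrix(d, mu=0.5)
```

Because ε_ij grows with the typical distances around i and j, the kernel adapts its width locally rather than using one global bandwidth.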
After the sample similarity matrix $w_v$ of each type of sample data is obtained, it is represented as a graph, giving the sample similarity network for that type.
(2) Similarity network fusion
After the sample similarity matrices w of the different biological data types are obtained, the Similarity Network Fusion (SNF) algorithm iteratively updates a state matrix, that is, the sample similarity matrix input to each iteration. It finally yields the fusion similarity matrix of the multiple types of sample data. A fused sample network is then constructed from this matrix for further analysis.
The SNF algorithm uses samples as the basis of integration. It constructs a sample similarity network for each data type and integrates these networks into a single similarity network with a nonlinear combination method. SNF goes beyond existing subtyping strategies in capturing continuous phenotypes and substantially outperforms analyses based on a single data type. It is also effective at identifying tumor subtypes and predicting survival.
Iterative similarity network fusion with SNF integrates the different data types effectively, so that biological information can be mined from a comprehensive perspective.
After the sample similarity matrix $w_v$ of each data type is obtained, the normalized weight matrix $P^{(v)}$ of each type of sample data is obtained as:
$$P^{(v)}(i,j) = \begin{cases} \dfrac{w_{ij}}{2\sum_{f \neq i} w_{if}}, & j \neq i \\[6pt] \dfrac{1}{2}, & j = i \end{cases}$$
where $\sum_{f \neq i} w_{if}$ is the sum of the similarities between sample i and all other samples of the same data type, f ∈ [1, n];
This normalization of $P^{(v)}$ is unaffected by the self-similarity on the diagonal. Because each row of $P^{(v)}$ sums to 1, it also avoids numerical instability.
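A short NumPy sketch of this normalization follows, assuming the standard SNF form: P(i, j) = w_ij / (2 Σ_{f≠i} w_if) off the diagonal and 1/2 on the diagonal. It also assumes that every sample has non-zero similarity to at least one other sample, since otherwise the row sum is 0. `normalize_p` is a hypothetical name.

```python
import numpy as np


def normalize_p(w) -> np.ndarray:
    """P[i, j] = w[i, j] / (2 * sum_{f != i} w[i, f]) for j != i, and P[i, i] = 1/2."""
    w = np.asarray(w, dtype=float)
    off = w.copy()
    np.fill_diagonal(off, 0.0)                    # ignore the self-similarity w_ii
    row_sum = off.sum(axis=1, keepdims=True)      # sum_{f != i} w_if
    p = off / (2.0 * row_sum)                     # off-diagonal part of each row sums to 1/2
    np.fill_diagonal(p, 0.5)                      # diagonal fixed at 1/2, so rows sum to 1
    return p


w = np.array([[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]])
p = normalize_p(w)
```

Fixing the diagonal at 1/2 is what makes the result independent of the scale of w_ii.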
A kernel matrix S is defined to measure local affinity; the kernel matrix of each type of sample data is denoted $S^{(v)}$:
$$S^{(v)}(i,j) = \begin{cases} \dfrac{w_{ij}}{\sum_{k \in \mathcal{N}_i} w_{ik}}, & j \in \mathcal{N}_i \\[6pt] 0, & \text{otherwise} \end{cases}$$
The symbols are defined as follows:
- $\mathcal{N}_i$ is the set of the g samples most similar to sample i.
- $\sum_{k \in \mathcal{N}_i} w_{ik}$ is the sum of their similarities to sample i.
- g is in the range [20, 30].

This is a k-nearest-neighbor (k-NN) step: edges with low similarity are filtered out, and only the nearest neighbors of each sample are retained.
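A NumPy sketch of the local kernel S^(v) follows. It assumes that sample i is excluded from its own neighbour set; the original SNF formulation includes x_i itself, which corresponds to not masking the diagonal. It also assumes g ≤ n − 1. `knn_kernel` is a hypothetical name.

```python
import numpy as np


def knn_kernel(w, g: int) -> np.ndarray:
    """S[i, j] = w[i, j] / sum_{k in N_i} w[i, k] if j is one of the g samples most
    similar to i (N_i), else 0. Sample i itself is excluded from N_i (assumption)."""
    w = np.asarray(w, dtype=float)
    n = w.shape[0]
    off = w.copy()
    np.fill_diagonal(off, -np.inf)                # never pick i as its own neighbour
    nn = np.argsort(-off, axis=1)[:, :g]          # indices of the g largest similarities per row
    s = np.zeros_like(w)
    rows = np.arange(n)[:, None]
    s[rows, nn] = w[rows, nn]                     # keep only the g nearest neighbours
    return s / s.sum(axis=1, keepdims=True)       # normalize so each row sums to 1


w = np.array([[1.0, 0.9, 0.1, 0.5],
              [0.9, 1.0, 0.2, 0.3],
              [0.1, 0.2, 1.0, 0.8],
              [0.5, 0.3, 0.8, 1.0]])
s = knn_kernel(w, g=2)
```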
The SNF algorithm then iteratively updates the similarity matrix of each data type a preset number of times (10 to 20) to obtain the updated $P^{(v)\prime}$, each iteration t computing
$$P^{(v)}_{t+1} = S^{(v)} \times \frac{\sum_{k \neq v} P^{(k)}_{t}}{M - 1} \times \left(S^{(v)}\right)^{T}$$
where $\sum_{k \neq v} P^{(k)}$ is the sum of the normalized matrices $P^{(k)}$ of all data types other than the current type v;
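The update formula can be illustrated by a single iteration over all M data types. This is a sketch that assumes M ≥ 2 and a simultaneous update, meaning every new P^(v) is computed from the previous iteration's matrices. `snf_update` is a hypothetical name.

```python
from typing import List

import numpy as np


def snf_update(p_list: List[np.ndarray], s_list: List[np.ndarray]) -> List[np.ndarray]:
    """One SNF iteration for all M data types (requires M >= 2).

    P^(v) <- S^(v) @ (sum_{k != v} P^(k) / (M - 1)) @ S^(v).T, where every new matrix
    is computed from the previous iteration's P^(k), not from already-updated ones.
    """
    m = len(p_list)
    total = sum(p_list)                            # sum over all data types
    new_p = []
    for p_v, s_v in zip(p_list, s_list):
        others = (total - p_v) / (m - 1)           # average of the other M - 1 matrices
        new_p.append(s_v @ others @ s_v.T)         # diffuse it through the local kernel S^(v)
    return new_p
```

Each data type's local kernel spreads information from the other types' current similarities. This cross-diffusion is what lets agreement across data types reinforce an edge.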
In this embodiment, the update is performed on the sample similarity matrices corresponding to the 3 data types (M = 3).
During fusion, information is propagated across the M sample similarity networks. If two samples i and j are similar in all data types, the fusion process reinforces their similarity; otherwise it weakens it. Finally, the similarity matrices of all data types are fused to obtain the fusion similarity matrix P:
$$P = \frac{1}{M} \sum_{v=1}^{M} P^{(v)\prime}$$
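Step (2) as a whole can be sketched end to end in NumPy, from the per-type similarity matrices w_v to the fused P, under several assumptions:
- The w_v are already computed.
- Sample i is excluded from its own neighbour set (the original SNF formulation includes x_i).
- The update formula is applied exactly as written. Reference SNF implementations may additionally renormalize and symmetrize P^(v) after each iteration, and this sketch does not.

All function names are hypothetical.

```python
from typing import List

import numpy as np


def _normalize_p(w: np.ndarray) -> np.ndarray:
    # P(i, j) = w_ij / (2 * sum_{f != i} w_if) off the diagonal, 1/2 on the diagonal
    off = w.copy()
    np.fill_diagonal(off, 0.0)
    p = off / (2.0 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(p, 0.5)
    return p


def _knn_kernel(w: np.ndarray, g: int) -> np.ndarray:
    # S(i, j) = w_ij / sum_{k in N_i} w_ik for the g most similar j != i, else 0
    off = w.copy()
    np.fill_diagonal(off, -np.inf)
    nn = np.argsort(-off, axis=1)[:, :g]
    rows = np.arange(w.shape[0])[:, None]
    s = np.zeros_like(w)
    s[rows, nn] = w[rows, nn]
    return s / s.sum(axis=1, keepdims=True)


def snf(w_list: List[np.ndarray], g: int = 20, t: int = 20) -> np.ndarray:
    """Fuse M >= 2 similarity matrices: t cross-diffusion updates, then average."""
    w_list = [np.asarray(w, dtype=float) for w in w_list]
    m = len(w_list)
    p = [_normalize_p(w) for w in w_list]          # state matrices P^(v)
    s = [_knn_kernel(w, g) for w in w_list]        # fixed local kernels S^(v)
    for _ in range(t):
        total = sum(p)
        # every P^(v) is updated from the previous iteration's matrices
        p = [s_v @ ((total - p_v) / (m - 1)) @ s_v.T for p_v, s_v in zip(p, s)]
    return sum(p) / m                              # P = (1/M) * sum_v P^(v)'
```

For the embodiment above this would be called as `snf([w1, w2, w3], g=20, t=20)`, with g in [20, 30] and t in 10 to 20.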
The fused similarity network corresponding to the fusion similarity matrix P is shown in FIG. 1, and the sample similarity network constructed from it is shown in FIG. 2.
(3) Spectral clustering
The fusion similarity matrix P is clustered by spectral clustering to obtain the subclasses.
Assume the total number of clusters is C. Each sample $x_i$ has a label indicator vector $y_i \in \{0,1\}^C$, with $y_i(c) = 1$ if $x_i$ belongs to the c-th cluster (c ∈ [1, C]) and $y_i(c) = 0$ otherwise.
The clustering scheme is represented by the partition matrix
$$Y = \left[y_1, y_2, \ldots, y_n\right]^{T} \in \{0,1\}^{n \times C}.$$
The network is partitioned with the spectral clustering algorithm by solving
$$\min_{Q} \operatorname{tr}\left(Q^{T} L^{+} Q\right) \quad \text{s.t.} \quad Q^{T} Q = I$$
The symbols are defined as follows:
- $Q = Y\left(Y^{T}Y\right)^{-1/2}$ is the scaled partition matrix.
- $L^{+} = I - D^{-1/2} P D^{-1/2}$ is the normalized Laplacian matrix of the fusion similarity matrix P.
- D is the degree matrix of the similarity network corresponding to P. Its diagonal elements are the degrees of the corresponding nodes, and its off-diagonal elements are 0.

When Q is relaxed to real values, the objective becomes an eigenvector decomposition problem. The samples are clustered by computing the eigenvectors corresponding to the C smallest eigenvalues of $L^{+}$ and applying the k-means algorithm to this reduced representation.

The analysis result is shown in FIG. 3: the samples are clustered into three subclasses. Compared with FIG. 1, FIG. 3 shows three distinct blocks of different sizes, each representing one subclass. FIG. 4 outlines these three blocks schematically.
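A compact NumPy sketch of step (3) follows. It relies on several assumptions that the text does not state:
- The fused matrix is symmetrized as (P + Pᵀ)/2 before the eigendecomposition, because `numpy.linalg.eigh` expects a symmetric input.
- The rows of the eigenvector embedding are scaled to unit length. This is common practice in spectral clustering but is not specified by the patent.
- k-means uses a deterministic farthest-point initialization.

`spectral_clustering` is a hypothetical name.

```python
import numpy as np


def spectral_clustering(p, c: int, n_iter: int = 100) -> np.ndarray:
    """Cluster samples from a fused similarity matrix P into c groups."""
    p = np.asarray(p, dtype=float)
    p = (p + p.T) / 2.0                                   # assumption: symmetrize P
    deg = p.sum(axis=1)                                   # node degrees, the diagonal of D
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # L+ = I - D^{-1/2} P D^{-1/2}
    lap = np.eye(len(p)) - d_inv_sqrt[:, None] * p * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)                         # eigenvalues in ascending order
    emb = vecs[:, :c]                                     # eigenvectors of the c smallest eigenvalues
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # assumption: unit-length rows

    # k-means on the reduced data, farthest-point initialization (assumption)
    centers = [emb[0]]
    for _ in range(1, c):
        dist = np.min([((emb - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(emb), dtype=int)
    for _ in range(n_iter):
        dist = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)                      # assign each sample to its nearest center
        new_centers = np.array([
            emb[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(c)
        ])
        if np.allclose(new_centers, centers):             # converged
            break
        centers = new_centers
    return labels
```

In the embodiment above this would be applied to the fused matrix with c = 3. Any k-means implementation could replace the hand-written loop.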
The invention uses the SNF algorithm to build a computational model that gives an integrated view of biological information. It does so by computing sample similarities and fusing the resulting similarity networks. SNF maintains a high signal-to-noise ratio, so the individual data types can be integrated effectively, and spectral clustering can then analyze the relationships between network nodes. The method thus combines the characteristics of multiple data types and overcomes the limitations of single-data analysis. It lays a foundation for subsequent integrated analyses such as disease subtype identification.
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A biological data network processing method, comprising:
S1: constructing, from the sample data sets of different biological data types, a sample similarity matrix for each type;
S2: constructing, from the per-type sample similarity matrices obtained in S1, a fusion similarity matrix of the multiple types of sample data by means of the SNF algorithm;
S3: clustering the fusion similarity matrix obtained in S2 by spectral clustering to determine the subclasses of the sample data.
2. The method according to claim 1, wherein S1 includes:
normalizing each type of data in the sample data set containing different biological data types;
calculating the Euclidean distances between samples of the same type after normalization to construct a distance matrix;
constructing the sample similarity matrix of each type of sample data with a Gaussian heat kernel function.
3. The method according to claim 2, wherein the Euclidean distance $d_{ij}$ is calculated as:
$$d_{ij} = \sqrt{\sum_{k=1}^{m_v} \left(x_{ik} - x_{jk}\right)^2}$$
The symbols are defined as follows:
- The sample data set contains M types of sample data and n samples.
- $m_v$ is the number of genes in the v-th type of sample data (v = 1, …, M).
- $x_{ik}$ and $x_{jk}$ denote the k-th gene of sample i and of sample j, respectively.
- i, j ∈ [1, n] and k ∈ [1, $m_v$].
4. The method according to claim 3, wherein constructing the sample similarity matrix of each type of sample data with the Gaussian heat kernel function comprises:
denoting the sample similarity matrix of each type of sample data as $w_v$, whose entries are
$$w_{ij} = \exp\left(-\frac{d_{ij}^{2}}{\mu\,\varepsilon_{ij}}\right)$$
where μ is a hyperparameter in the range [0.3, 0.8] and $\varepsilon_{ij}$ is a parameter used to eliminate the scaling problem.
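5. The method according to claim 4, wherein $\varepsilon_{ij}$ is defined as:
$$\varepsilon_{ij} = \frac{\operatorname{mean}\left(d(i, N_i)\right) + \operatorname{mean}\left(d(j, N_j)\right) + d_{ij}}{3}$$
where $N_i$ denotes the samples other than sample i, and $\operatorname{mean}(d(i, N_i))$ is the mean distance from sample $x_i$ to the other samples $N_i$.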
6. The method according to claim 5, wherein S2 includes:
after obtaining the per-type sample similarity matrices $w_v$ constructed in S1, obtaining the normalized weight matrix $P^{(v)}$ of each type of sample data as:
$$P^{(v)}(i,j) = \begin{cases} \dfrac{w_{ij}}{2\sum_{f \neq i} w_{if}}, & j \neq i \\[6pt] \dfrac{1}{2}, & j = i \end{cases}$$
where $\sum_{f \neq i} w_{if}$ is the sum of the similarities between sample i and all other samples of the same data type, f ∈ [1, n];
defining a kernel matrix S that measures local affinity, the kernel matrix of each type of sample data being denoted $S^{(v)}$:
$$S^{(v)}(i,j) = \begin{cases} \dfrac{w_{ij}}{\sum_{k \in \mathcal{N}_i} w_{ik}}, & j \in \mathcal{N}_i \\[6pt] 0, & \text{otherwise} \end{cases}$$
where:
- $\mathcal{N}_i$ is the set of the g samples most similar to sample i;
- $\sum_{k \in \mathcal{N}_i} w_{ik}$ is the sum of their similarities to sample i;
- g is in the range [20, 30];
iteratively updating the similarity matrix of each data type with the SNF algorithm for a preset number of iterations to obtain the updated $P^{(v)\prime}$, each iteration t computing
$$P^{(v)}_{t+1} = S^{(v)} \times \frac{\sum_{k \neq v} P^{(k)}_{t}}{M - 1} \times \left(S^{(v)}\right)^{T}$$
where $\sum_{k \neq v} P^{(k)}$ is the sum of the normalized matrices $P^{(k)}$ of all data types other than the current type v;
and fusing the similarity matrices of all data types to obtain the fusion similarity matrix P:
$$P = \frac{1}{M} \sum_{v=1}^{M} P^{(v)\prime}$$
7. The method according to claim 6, wherein the preset number of iterations is 10 to 20.
8. The method according to claim 7, further comprising obtaining a sample similarity network from the sample similarity matrix.
9. Use of the method according to any one of claims 1 to 8 in disease subtype identification analysis.
10. Use of the method according to any one of claims 1 to 8 in the technical field of bioinformatic analysis.
CN201910451766.4A 2019-05-28 2019-05-28 Biological data network processing method based on similarity network fusion algorithm Pending CN111009285A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910451766.4A CN111009285A (en) 2019-05-28 2019-05-28 Biological data network processing method based on similarity network fusion algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910451766.4A CN111009285A (en) 2019-05-28 2019-05-28 Biological data network processing method based on similarity network fusion algorithm

Publications (1)

Publication Number Publication Date
CN111009285A true CN111009285A (en) 2020-04-14

Family

ID=70111524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910451766.4A Pending CN111009285A (en) 2019-05-28 2019-05-28 Biological data network processing method based on similarity network fusion algorithm

Country Status (1)

Country Link
CN (1) CN111009285A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104392247A (en) * 2014-11-07 2015-03-04 上海交通大学 Similarity network fast fusion method used for data clustering
CN106203471A (en) * 2016-06-22 2016-12-07 南京航空航天大学 A kind of based on the Spectral Clustering merging Kendall Tau distance metric

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112071369A (en) * 2020-09-10 2020-12-11 暨南大学附属第一医院(广州华侨医院) Module marker mining method and device, computer equipment and storage medium
CN112071369B (en) * 2020-09-10 2021-08-03 暨南大学附属第一医院(广州华侨医院) Module marker mining method and device, computer equipment and storage medium
CN113723537A (en) * 2021-09-02 2021-11-30 安阳师范学院 Robust-based symmetric nonnegative matrix factorization microbial data clustering method
CN115631799A (en) * 2022-12-20 2023-01-20 深圳先进技术研究院 Sample phenotype prediction method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
JP7487163B2 (en) Detection and diagnosis of cancer evolution
CN111009285A (en) Biological data network processing method based on similarity network fusion algorithm
Song et al. scLM: automatic detection of consensus gene clusters across multiple single-cell datasets
CN111913999B (en) Statistical analysis method, system and storage medium based on multiple groups of study and clinical data
US20230395196A1 (en) Method and system for quantifying cellular activity from high throughput sequencing data
CN112086199B (en) Liver cancer data processing system based on multiple groups of study data
Suo et al. Application of clustering analysis in brain gene data based on deep learning
Wen et al. Multi-dimensional data integration algorithm based on random walk with restart
Shi et al. Multi-view subspace clustering analysis for aggregating multiple heterogeneous omics data
Tran et al. Omics-based deep learning approaches for lung cancer decision-making and therapeutics development
Maind et al. Identifying condition specific key genes from basal-like breast cancer gene expression data
Feng et al. Multi-omics data fusion via a joint kernel learning model for cancer subtype discovery and essential gene identification
Zou et al. DEMOC: a deep embedded multi-omics learning approach for clustering single-cell CITE-seq data
Liu et al. A Network Hierarchy-Based method for functional module detection in protein–protein interaction networks
Maddouri et al. Deep graph representations embed network information for robust disease marker identification
CN117457065A (en) Method and system for identifying phenotype-associated cell types based on single-cell multi-set chemical data
Yang et al. Characterization of essential genes by topological properties in the perturbation sensitivity network
CN112086187B (en) Disease progress path mining method based on complex network
Bi et al. SSLpheno: a self-supervised learning approach for gene–phenotype association prediction using protein–protein interactions and gene ontology data
CN113421614A (en) Tensor decomposition-based lncRNA-disease association prediction method
CN112768001A (en) Single cell trajectory inference method based on manifold learning and main curve
Soleimani et al. Classification of cancer types based on microRNA expression using a hybrid radial basis function and particle swarm optimization algorithm
Zhu et al. Dimensionality Reduction of Single-Cell RNA Sequencing Data by Combining Entropy and Denoising AutoEncoder
Zhou et al. A Unified Bayesian Framework for Bi-overlapping-Clustering Multi-omics Data via Sparse Matrix Factorization
Nagi et al. Cluster analysis of cancer data using semantic similarity, sequence similarity and biological measures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination