WO2020198942A1 - Single-cell chromatin accessibility sequencing data analysis method and system based on peak clustering
- Publication number
- WO2020198942A1 (application PCT/CN2019/080443)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- peak
- cell
- clustering
- accesson
- matrix
- Prior art date
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Definitions
- The invention belongs to the technical field of biological sequencing data analysis, and specifically relates to a single-cell chromatin accessibility sequencing (scATAC-seq) data analysis method and system based on peak clustering.
- scATAC-seq data analysis: The main purpose of scATAC-seq data analysis is to recover, from the sequencing results, the main cell populations or the developmental differentiation pathways in a mixed biological sample.
- scATAC-seq is a relatively new technology, and the signal-to-noise ratio of its data is low. scATAC-seq data analysis therefore requires an easy-to-use set of analysis methods that recovers as much cell heterogeneity information as possible.
- No currently published scATAC-seq data analysis method offers a complete and easy-to-use workflow from fastq files to clustering, visualization, and developmental path reconstruction.
- ChromVAR: The input data of this method are the cell*peak reading matrix and the sequence of each peak. The method uses known transcription factor motif information to calculate each transcription factor's degree of preference for each peak.
- LSI: The input data of this method is the cell*peak reading matrix. The method applies the TF-IDF algorithm (TF means term frequency, IDF means inverse document frequency) to transform the matrix, and then uses the new matrix for information recovery.
- Cicero: The input data of this method are the cell*peak reading matrix and the position of each peak on the chromosome. The method merges the readings of peaks that lie within a certain genomic distance of each other (for example, peaks within 250 kb), and then uses the merged matrix for downstream information recovery.
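The TF-IDF weighting used by the LSI approach can be illustrated with a minimal sketch. The exact weighting variant is not given in the text, so the formula below (per-cell frequency times a log-scaled inverse frequency) is an assumption, and the function name `tfidf_transform` and the toy matrix are hypothetical.

```python
import numpy as np

def tfidf_transform(counts):
    """Apply a TF-IDF weighting to a cell x peak count matrix (rows = cells)."""
    counts = np.asarray(counts, dtype=float)
    # Term frequency: each cell's counts divided by that cell's total count.
    cell_totals = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, cell_totals,
                   out=np.zeros_like(counts), where=cell_totals > 0)
    # Inverse document frequency: peaks open in few cells get a larger weight.
    # The "1 +" smoothing is an assumed variant, not taken from the source.
    n_cells = counts.shape[0]
    cells_with_peak = (counts > 0).sum(axis=0)
    idf = np.log(1.0 + n_cells / np.maximum(cells_with_peak, 1))
    return tf * idf

# Toy cell x peak matrix: 3 cells, 3 peaks (illustrative values only).
toy = np.array([[2, 0, 1],
                [0, 3, 1],
                [1, 0, 1]])
weighted = tfidf_transform(toy)
```

In this toy example the third peak is open in every cell, so it receives the smallest IDF weight, while the second peak, open in a single cell, receives the largest.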
- The present invention proposes a complete, easy-to-use, and efficient scATAC-seq data analysis method and system for biological samples that recovers cell heterogeneity information efficiently.
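- The present invention proposes a single-cell chromatin accessibility sequencing data analysis method based on peak clustering, including: aligning the single-cell chromatin accessibility sequencing data to the genome data of the corresponding biological sample to obtain alignment results, calling peaks on the basis of the alignment results, and calculating the readings within each peak to obtain a cell*peak reading matrix; and calculating the mathematical distance between peaks in the cell*peak reading matrix, clustering the peaks, and merging the cell*peak reading matrix into a cell*accesson reading matrix, where an accesson is a cluster of peaks.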
- The method further includes reducing the dimensionality of the cell*accesson reading matrix to a two-dimensional visualization matrix.
- The dimensionality reduction method includes PCA, t-SNE, or UMAP.
- The method further includes clustering the cells according to the cell*accesson reading matrix.
- The clustering algorithm includes KNN clustering, kernel clustering, or Louvain clustering.
- The method further includes using the cell*accesson reading matrix to construct the pseudotime of the cell development path.
- The algorithm used to construct the pseudotime of the cell development path includes SPRING or Monocle.
- The present invention provides a single-cell chromatin accessibility sequencing data analysis system based on peak clustering, including a preprocessing module and an accesson building module;
- The preprocessing module includes: a) an alignment unit, used to align single-cell chromatin accessibility sequencing data to the genome data of the corresponding biological sample to obtain alignment results; b) a peak finding unit, used to merge the alignment results of all single cells and then call peaks; c) a reading calculation unit, which calculates the readings within each peak to obtain the cell*peak reading matrix;
- The accesson building module includes: a) a peak distance calculation unit, used to calculate the mathematical distance between peaks in the cell*peak reading matrix; b) a peak clustering unit, used to cluster peaks based on the mathematical distance between them; c) a matrix conversion unit, used to merge the cell*peak reading matrix into the cell*accesson reading matrix, where an accesson is a cluster of peaks.
- The system further includes a visualization module for reducing the dimensionality of the cell*accesson reading matrix to a two-dimensional visualization matrix.
- The dimensionality reduction method includes PCA, t-SNE, or UMAP.
- The system further includes a cell clustering module, which is used to cluster the cells according to the cell*accesson reading matrix.
- The clustering algorithm includes KNN clustering, kernel clustering, or Louvain clustering.
- The system further includes a cell development path remodeling module, which is used to construct the pseudotime of the cell development path using the cell*accesson reading matrix; preferably, the algorithm used to construct the pseudotime of the cell development path includes SPRING or Monocle.
- The mathematical distance includes Euclidean distance, the Pearson correlation coefficient, or cityblock distance.
- The peak clustering method includes KNN, DBSCAN, or K-means.
- The method of merging the cell*peak reading matrix into the cell*accesson reading matrix includes taking the sum, the average, the median, or the variance of the peak readings in each accesson.
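The merging options listed above can be sketched as follows, assuming the peak-to-accesson assignment is already available as one integer label per peak. The function name `merge_peaks`, the `AGGREGATORS` table, and the toy values are hypothetical and only illustrate the idea.

```python
import numpy as np

# The four merging options named in the text, mapped to NumPy reductions.
AGGREGATORS = {
    "sum": np.sum,
    "mean": np.mean,
    "median": np.median,
    "variance": np.var,
}

def merge_peaks(cell_by_peak, peak_labels, how="sum"):
    """Collapse a cell x peak matrix into a cell x accesson matrix.

    peak_labels[j] is the accesson that peak j was assigned to.
    """
    cell_by_peak = np.asarray(cell_by_peak, dtype=float)
    peak_labels = np.asarray(peak_labels)
    reduce_fn = AGGREGATORS[how]
    accesson_ids = np.unique(peak_labels)
    # One output column per accesson, reducing over that accesson's peaks.
    columns = [reduce_fn(cell_by_peak[:, peak_labels == a], axis=1)
               for a in accesson_ids]
    return np.column_stack(columns), accesson_ids

toy_matrix = [[1, 2, 3, 4],
              [0, 0, 5, 1]]
toy_labels = [0, 0, 1, 1]
summed, ids = merge_peaks(toy_matrix, toy_labels, how="sum")
averaged, _ = merge_peaks(toy_matrix, toy_labels, how="mean")
```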
- The present invention also provides a single-cell chromatin accessibility sequencing data analysis device based on peak clustering, including:
- a processor; and a memory having instructions stored thereon that, when executed by the processor, cause the processor to execute the analysis method.
- the present invention also provides a computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to execute the analysis method.
- the present invention has the following beneficial effects:
- The present invention provides the first scATAC-seq data analysis method and system covering the complete workflow from fastq files to clustering, visualization, and developmental path remodeling;
- The present invention proposes an accesson construction method based on peak clustering as a key module of scATAC-seq data analysis.
- The transformed cell*accesson reading matrix is used for subsequent clustering, visualization, and cell development path remodeling.
- The clustering performance, measured by the adjusted Rand index (ARI), is statistically significantly higher than that of existing methods.
- Figure 1 is a schematic diagram of accesson construction and downstream analysis based on peak clustering in an embodiment of the present invention;
- Figure 2 shows the relationship between the number of accessons and the clustering performance (ARI) in an embodiment of the present invention (gold standard test data set 1);
- Figure 3 shows the scATAC-seq data of human leukemia cells and related lineage cells in an embodiment of the present invention: A. data clustering (hierarchical clustering) and B. visualization (tSNE);
- Figure 4 shows the scATAC-seq data related to the development and differentiation lineage of human hematopoietic stem cells in an embodiment of the present invention: developmental path remodeling (Monocle);
- Figure 5 shows the scATAC-seq data of mouse forebrain nerve cells in an embodiment of the present invention: data clustering (KNN) and visualization (tSNE);
- Figures 6A-6D show the mouse thymic T cell scATAC-seq data in an embodiment of the present invention: data clustering (Louvain, hierarchical clustering), visualization (tSNE), and developmental path remodeling (Monocle);
- Figure 7 compares the clustering performance and running time of an embodiment of the present invention with existing methods (gold standard test data set 1);
- Figure 8 compares the clustering performance and running time of an embodiment of the present invention with existing methods (gold standard test data set 2).
- Cell: The basic component of life activities in mammals and often the place where diseases originate; examples include nerve cells, epithelial cells, and tumor cells.
- Cell heterogeneity: Biological tissue samples (such as tumor tissue or brain tissue) are composed of a large number of cells whose physiological functions differ. Cell heterogeneity commonly takes one of two forms: 1) the constituent cells form several distinct cell populations (discrete); 2) the constituent cells lie along a continuous cell differentiation path (continuous).
- Genome: The complete DNA sequence of an organism, composed of the four bases A, T, C, and G arranged in order. The genomes of major mammals such as humans and mice have all been sequenced.
- Gene: The entire DNA sequence required to produce a polypeptide chain or a functional RNA. A gene generally consists of one or more segments of DNA in the genome.
- Transcription factor: A protein that binds to DNA to initiate or regulate gene expression. It often binds to DNA by recognizing specific DNA sequence patterns (motifs).
- Chromatin: A linear complex in the nucleus composed of DNA, histones, non-histone proteins, and a small amount of RNA. Its basic unit is the nucleosome, formed by DNA wound around histones.
- Chromatin accessibility: A measure of whether a given stretch of DNA is wound around histones. In general there are two cases: 1) DNA tightly wound around nucleosomes, called closed DNA; 2) DNA not wound around nucleosomes and therefore exposed, called open DNA.
- TCGA: The Cancer Genome Atlas. It contains multi-omics sequencing data of cancer tissues and normal tissues covering 33 different cancers and 11,000 patients.
- Single-cell chromatin accessibility sequencing (scATAC-seq): A collective term for several existing sequencing methods used to detect the chromatin accessibility of single cells, including single-nucleus chromatin accessibility sequencing (snATAC-seq), single-cell combinatorial indexing chromatin accessibility sequencing (sciATAC-seq), and FACS-based single-cell chromatin accessibility sequencing (FACS scATAC-seq).
- Sequence reads: The DNA fragment sequences obtained by omics sequencing.
- Mapping: Aligning the short sequences to the known genome to find the position of each short sequence on the genome.
- Peak calling: Finding the open positions of the DNA by analyzing the alignment results. Each such position is called a peak and is assigned a number.
- Readings: The number of short sequences from each sample that fall within each peak.
- ARI: Adjusted Rand index.
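Since the ARI is the metric used later to compare clustering results against gold standard labels, a small self-contained sketch of its standard pair-counting formula may help. The function name `adjusted_rand_index` is hypothetical; the handling of the degenerate cases (fewer than two items, or a zero denominator) is an assumed convention.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    if n < 2:
        return 1.0  # assumed convention: nothing to disagree about
    # Pairs of items that share a cluster in both labelings.
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    # Pairs of items that share a cluster in each labeling separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # assumed convention for the degenerate case
    return (sum_joint - expected) / (max_index - expected)

same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index depends only on which items are grouped together, so renaming the clusters (the first call) still gives a perfect score, while a labeling that splits every true cluster (the second call) scores below zero.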
- An embodiment of the present invention proposes a single-cell chromatin accessibility sequencing (scATAC-seq) data analysis system based on peak clustering (hereinafter referred to as APEC), which includes the following modules:
- Preprocessing module, including: a) an alignment unit, used to align fastq files (single-cell chromatin accessibility sequencing data) to the genome sequence to produce bam files; b) a peak finding unit, used to merge the bam files of all single cells into a merge_bam file and call peaks on it; c) a reading calculation unit, which counts the reads in each peak and finally outputs the cell*peak reading matrix.
- Accesson building module, including: a) a peak distance calculation unit, which calculates the mathematical distance between peaks (including but not limited to Euclidean distance, the Pearson correlation coefficient, and cityblock distance) from the cell*peak reading matrix; b) a peak clustering unit, which uses the mathematical distance between peaks to cluster them. The resulting clusters of peaks are called accessons.
- The clustering methods include but are not limited to KNN and DBSCAN.
- c) A matrix conversion unit, which merges the cell*peak reading matrix into the cell*accesson matrix according to the accesson information.
- The merging method includes but is not limited to taking the sum, average, median, or variance of the peak readings in each accesson.
- Visualization module: reduces the dimensionality of the cell*accesson reading matrix to a two-dimensional visualization matrix.
- The dimensionality reduction and visualization methods used include but are not limited to PCA, t-SNE, and UMAP.
- Cell clustering module: uses the cell*accesson reading matrix to cluster cells.
- Clustering algorithms include but are not limited to KNN clustering, kernel clustering, and Louvain clustering.
- Cell development path remodeling module: uses the cell*accesson reading matrix to construct the pseudotime of the cell development path. Algorithms used include but are not limited to SPRING and Monocle.
- The data sets include: 1) scATAC-seq data of human leukemia cells and related lineage cells; 2) scATAC-seq data related to the development and differentiation lineage of human hematopoietic stem cells; 3) scATAC-seq data of mouse forebrain nerve cells; 4) scATAC-seq data of mouse thymic T cells.
- The analysis process of the scATAC-seq analysis system (APEC) based on peak clustering of the present invention includes the following steps:
- The input data are fastq files, in one of two formats: a) a separate fastq file for each cell; or b) a mixed fastq file that can be split into per-cell data using the splitting rule given by the data provider.
- For example, the index sequence in the first 5-10 bases of each read in the fastq file can be used to split the data by cell.
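Splitting a mixed fastq file by a leading index sequence can be sketched as below. The real splitting rule comes from the data provider, so the fixed barcode length, the function name `split_fastq_by_barcode`, and the toy reads are all assumptions made for illustration.

```python
from collections import defaultdict

def split_fastq_by_barcode(fastq_lines, barcode_length=8):
    """Group fastq records by the first `barcode_length` bases of each read.

    Returns {barcode: [(header, sequence, plus, quality), ...]} with the
    barcode trimmed from both the sequence and the quality string.
    """
    per_cell = defaultdict(list)
    lines = [line.rstrip("\n") for line in fastq_lines]
    # A fastq record is four lines: header, sequence, separator, quality.
    for i in range(0, len(lines) - 3, 4):
        header, sequence, plus, quality = lines[i:i + 4]
        barcode = sequence[:barcode_length]
        per_cell[barcode].append(
            (header, sequence[barcode_length:], plus, quality[barcode_length:])
        )
    return dict(per_cell)

# Three toy reads carrying hypothetical 5-base cell barcodes.
mixed = [
    "@read1", "AAAAA" + "CGTACG", "+", "I" * 11,
    "@read2", "CCCCC" + "TTGACA", "+", "I" * 11,
    "@read3", "AAAAA" + "GGGTTT", "+", "I" * 11,
]
cells = split_fastq_by_barcode(mixed, barcode_length=5)
```

A production workflow would typically also tolerate sequencing errors in the barcode, which this sketch does not attempt.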
- The alignment unit can align the input data to the genome of the relevant biological sample; for example, data sets 1 and 2 are aligned to the human genome, and data sets 3 and 4 are aligned to the mouse genome. The data can also be aligned to a genome designated by the data provider.
- The alignment produces a bam file, which records the position on the genome of each read in each fastq file.
- The peak finding unit processes the bam files to define the open chromatin sites in the biological sample; combined with the reading calculation unit, this yields the reading matrix (m × n) of each cell (m) and each peak (n).
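How the m × n reading matrix arises from aligned reads and called peaks can be shown with a toy sketch. It assumes reads are already reduced to (cell, chromosome, position) tuples and peaks to half-open (chromosome, start, end) intervals; the function name `count_reads_in_peaks` and all coordinates are hypothetical.

```python
import numpy as np

def count_reads_in_peaks(reads, peaks, cells):
    """Build the cell x peak reading matrix.

    reads: iterable of (cell_id, chrom, position)
    peaks: list of (chrom, start, end), half-open intervals [start, end)
    cells: list of cell ids, fixing the row order
    """
    cell_index = {cell: i for i, cell in enumerate(cells)}
    matrix = np.zeros((len(cells), len(peaks)), dtype=int)
    for cell_id, chrom, position in reads:
        # Linear scan over peaks: fine for a toy example, too slow for real data.
        for j, (peak_chrom, start, end) in enumerate(peaks):
            if chrom == peak_chrom and start <= position < end:
                matrix[cell_index[cell_id], j] += 1
    return matrix

peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
reads = [
    ("cell1", "chr1", 150), ("cell1", "chr1", 160), ("cell1", "chr2", 150),
    ("cell2", "chr1", 550), ("cell2", "chr1", 900),  # last read hits no peak
]
matrix = count_reads_in_peaks(reads, peaks, cells=["cell1", "cell2"])
```

With real bam files the peak lookup would normally use a sorted or indexed interval structure instead of the nested loop used here.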
- Fig. 1 is a schematic diagram of accesson construction and downstream analysis based on peak clustering in an embodiment of the present invention.
- The m × n reading matrix is first passed to the accesson building module.
- Euclidean distance can be used to calculate the relative distance between peaks (data sets 1, 2, 3, and 4); other commonly used vector distance measures can also be used, such as the Pearson correlation coefficient or cityblock distance.
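The three distance options can be sketched with plain NumPy, treating each peak as a vector of its readings across cells. Turning the Pearson correlation coefficient into a distance as 1 minus the correlation is an assumption (the text names only the coefficient), and the function name `peak_distances` is hypothetical.

```python
import numpy as np

def peak_distances(cell_by_peak, metric="euclidean"):
    """Pairwise distances between peaks of a cell x peak reading matrix."""
    # Transpose so that each row is one peak's profile across all cells.
    profiles = np.asarray(cell_by_peak, dtype=float).T
    if metric in ("euclidean", "cityblock"):
        # Broadcasting builds all pairwise differences (toy-scale only).
        diff = profiles[:, None, :] - profiles[None, :, :]
        if metric == "euclidean":
            return np.sqrt((diff ** 2).sum(axis=2))
        return np.abs(diff).sum(axis=2)
    if metric == "pearson":
        # Assumed conversion: distance = 1 - correlation coefficient.
        # A peak with constant readings has undefined correlation (NaN).
        return 1.0 - np.corrcoef(profiles)
    raise ValueError(f"unknown metric: {metric}")

toy = np.array([[1, 2, 0],
                [2, 4, 0],
                [3, 6, 1]])  # 3 cells x 3 peaks
euclidean = peak_distances(toy, "euclidean")
cityblock = peak_distances(toy, "cityblock")
pearson = peak_distances(toy, "pearson")
```

The first two peaks differ only by a factor of two, so their Euclidean and cityblock distances are non-zero while their correlation-based distance is essentially zero, which shows why the choice of measure matters.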
- The KNN algorithm can be used to cluster the peaks into a specified number of accessons (data sets 1, 2, 3, and 4).
- The clustering algorithm can also be another common vector clustering algorithm, such as DBSCAN or K-means.
- The specified number of accessons does not affect the result over a wide range (Figure 2), so the default is 2000, which can be adjusted for specific data.
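As an illustration of clustering peaks into a specified number of accessons, the sketch below uses a plain K-means loop, one of the alternative algorithms named above; it is not the KNN-based procedure used for the data sets, whose details are not given here. The function name `kmeans_peaks`, the iteration cap, and the seed are hypothetical.

```python
import numpy as np

def kmeans_peaks(cell_by_peak, n_accessons, n_iter=50, seed=0):
    """Assign each peak to one of `n_accessons` clusters with basic K-means."""
    profiles = np.asarray(cell_by_peak, dtype=float).T  # one row per peak
    rng = np.random.default_rng(seed)
    start = rng.choice(len(profiles), size=n_accessons, replace=False)
    centers = profiles[start]
    labels = np.zeros(len(profiles), dtype=int)
    for _ in range(n_iter):
        # Euclidean distance from every peak to every cluster center.
        dists = np.linalg.norm(profiles[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            profiles[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_accessons)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# 4 cells x 6 peaks: peaks 0-2 are open in cells 0-1, peaks 3-5 in cells 2-3.
toy = np.array([[5, 6, 5, 0, 0, 0],
                [5, 5, 6, 0, 0, 0],
                [0, 0, 0, 5, 6, 5],
                [0, 0, 0, 5, 5, 6]])
labels = kmeans_peaks(toy, n_accessons=2)
```

On real data the number of accessons would be on the order of the default mentioned above rather than 2, and a library implementation would be preferable to this loop.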
- Accessons are first filtered according to their basic properties, for example by removing accessons whose number of peaks is less than a specified value, or accessons whose internal Gini coefficient is less than a specified value.
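The filtering step can be sketched as follows. The text does not say over which values the "internal Gini coefficient" is computed, so this sketch assumes it is taken over each accesson's summed signal across cells; the thresholds and the names `gini` and `filter_accessons` are likewise hypothetical.

```python
import numpy as np

def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * (ranks * x).sum() / (n * x.sum()) - (n + 1) / n)

def filter_accessons(cell_by_peak, peak_labels, min_peaks=2, min_gini=0.1):
    """Return the accesson ids that pass both filters."""
    cell_by_peak = np.asarray(cell_by_peak, dtype=float)
    peak_labels = np.asarray(peak_labels)
    kept = []
    for accesson in np.unique(peak_labels):
        members = peak_labels == accesson
        # Assumption: the Gini coefficient is computed on the per-cell summed signal.
        signal = cell_by_peak[:, members].sum(axis=1)
        if members.sum() >= min_peaks and gini(signal) >= min_gini:
            kept.append(int(accesson))
    return kept

toy = np.array([[1, 1, 4, 3, 9],
                [1, 1, 0, 0, 0],
                [1, 1, 0, 0, 0],
                [1, 1, 0, 0, 0]])
toy_labels = np.array([0, 0, 1, 1, 2])
kept = filter_accessons(toy, toy_labels)
```

Here accesson 0 is dropped because its signal is identical in every cell, accesson 2 because it contains a single peak, and accesson 1 is kept.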
- The cell*peak reading matrix is merged into the cell*accesson matrix.
- The merging method is to take the sum of the peak readings in each accesson (data sets 1, 2, 3, and 4).
- The visualization module can be used to reduce the dimensionality of the cell*accesson reading matrix to a two-dimensional visualization matrix, and/or the cell clustering module can be used to cluster cells, and/or the cell development path remodeling module can be used to construct the pseudotime of the cell development path.
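Of the dimensionality reduction options, PCA is the simplest to sketch directly. The version below projects a cell*accesson matrix onto its first two principal components using a singular value decomposition; the function name `pca_2d` and the toy matrix are hypothetical, and no normalization beyond mean-centering is applied, which is an assumption.

```python
import numpy as np

def pca_2d(cell_by_accesson):
    """Project cells onto the first two principal components."""
    x = np.asarray(cell_by_accesson, dtype=float)
    centered = x - x.mean(axis=0)  # center each accesson column
    # Rows of u scaled by the singular values are the component scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * s[:2]

# 4 cells x 3 accessons: two cells dominated by accesson 0, two by accesson 1.
toy = np.array([[10, 0, 1],
                [9, 1, 0],
                [0, 10, 1],
                [1, 9, 0]])
coords = pca_2d(toy)
```

Each row of `coords` gives the two plotting coordinates of one cell; in this toy case the first coordinate separates the two groups of cells.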
- Figure 3 shows scATAC-seq data of human leukemia cells and related lineage cells: A. Data clustering (hierarchical clustering) and B. Visualization effect (tSNE);
- Figure 4 shows scATAC-seq data related to the development and differentiation lineage of human hematopoietic stem cells: developmental path remodeling (Monocle);
- Figure 5 shows scATAC-seq data of mouse forebrain nerve cells: data clustering (KNN) and visualization (tSNE);
- Figures 6A-6D are mouse thymic T cell scATAC-seq data: Figure 6A is Louvain clustering, Figure 6B is hierarchical clustering, Figure 6C is visualization (tSNE), and Figure 6D is developmental path remodeling (monocle).
- The present invention provides a complete workflow from fastq files to clustering, visualization, and developmental path remodeling.
- The clustering performance (ARI) is statistically significantly higher than that of existing methods, as shown in Figures 7 and 8.
- The method recovers cell heterogeneity information efficiently because the accesson construction it proposes acts as a filtering process that reduces noise and amplifies the signal.
- 1) The present invention transforms the sparse cell*peak matrix into a denser cell*accesson matrix, which reduces the noise in subsequent analysis; 2) compared with the Cicero method, which merges peaks based on their position on the chromatin, the present invention uses mathematical distance and a clustering algorithm to cluster the peaks and merge them.
- The peaks clustered together by this method have similar accessibility patterns across cells, so the resulting accessons are more biologically meaningful. For example, the peaks within an accesson may be regulated by the same transcription factor, or may lie closer together in the three-dimensional structure of chromatin. The transformed cell*accesson matrix therefore further amplifies the heterogeneity between cells.
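The claim that merging makes the matrix denser can be checked on a toy example. The sketch below measures density as the fraction of non-zero entries before and after summing peaks within accessons; the function name `density`, the matrix, and the labels are hypothetical.

```python
import numpy as np

def density(matrix):
    """Fraction of entries in the matrix that are non-zero."""
    matrix = np.asarray(matrix)
    return np.count_nonzero(matrix) / matrix.size

# Sparse toy cell x peak matrix: 4 cells, 6 peaks.
cell_by_peak = np.array([[1, 0, 0, 0, 0, 1],
                         [0, 1, 0, 0, 0, 0],
                         [0, 0, 0, 1, 0, 0],
                         [0, 0, 1, 0, 1, 0]])
peak_labels = np.array([0, 0, 0, 1, 1, 1])  # two accessons of three peaks each

# Merge by summing the readings of the peaks in each accesson.
cell_by_accesson = np.column_stack([
    cell_by_peak[:, peak_labels == a].sum(axis=1)
    for a in np.unique(peak_labels)
])
before = density(cell_by_peak)
after = density(cell_by_accesson)
```

In this example the density rises from 0.25 to 0.75, because isolated counts in different peaks of the same accesson end up in the same column.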
- The present invention also provides a single-cell chromatin accessibility sequencing data analysis device based on peak clustering, including:
- a processor; and a memory having instructions stored thereon that, when executed by the processor, cause the processor to execute the analysis method.
- the present invention also provides a computer-readable storage medium storing instructions, which when executed by a processor cause the processor to execute the analysis method.
- each functional module/unit in the present invention can be hardware, for example, the hardware can be a circuit, including a digital circuit, an analog circuit, and so on.
- The physical realization of the hardware structure includes but is not limited to physical devices, which include but are not limited to transistors, memristors, and so on.
- the data processing module can be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
- the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, HMC, etc.
Claims (16)
- 1. A single-cell chromatin accessibility sequencing data analysis method based on peak clustering, comprising: aligning single-cell chromatin accessibility sequencing data to the genome data of the corresponding biological sample to obtain alignment results, calling peaks on the basis of the alignment results, and calculating the readings within each peak to obtain a cell*peak reading matrix; and calculating the mathematical distance between peaks in the cell*peak reading matrix, clustering the peaks, and merging the cell*peak reading matrix into a cell*accesson reading matrix, wherein an accesson is a cluster of peaks.
- 2. The analysis method according to claim 1, wherein the method further comprises reducing the dimensionality of the cell*accesson reading matrix to a two-dimensional visualization matrix; preferably, the dimensionality reduction method includes PCA, t-SNE, or UMAP.
- 3. The analysis method according to claim 1 or 2, wherein the method further comprises clustering the cells according to the cell*accesson reading matrix; preferably, the clustering algorithm includes KNN clustering, kernel clustering, or Louvain clustering.
- 4. The analysis method according to any one of claims 1-3, wherein the method further comprises using the cell*accesson reading matrix to construct the pseudotime of the cell development path; preferably, the algorithm used to construct the pseudotime of the cell development path includes SPRING or Monocle.
- 5. The analysis method according to any one of claims 1-4, wherein the mathematical distance includes Euclidean distance, the Pearson correlation coefficient, or cityblock distance.
- 6. The analysis method according to any one of claims 1-5, wherein the peak clustering method includes KNN, DBSCAN, or K-means.
- 7. The analysis method according to any one of claims 1-6, wherein the method of merging the cell*peak reading matrix into the cell*accesson reading matrix includes taking the sum, the average, the median, or the variance of the peak readings in each accesson.
- 8. A single-cell chromatin accessibility sequencing data analysis system based on peak clustering, comprising a preprocessing module and an accesson building module; wherein the preprocessing module includes a) an alignment unit, used to align single-cell chromatin accessibility sequencing data to the genome data of the corresponding biological sample to obtain alignment results; b) a peak finding unit, used to merge the alignment results of all single cells and then call peaks; c) a reading calculation unit, which calculates the readings within each peak to obtain a cell*peak reading matrix; and the accesson building module includes a) a peak distance calculation unit, used to calculate the mathematical distance between peaks in the cell*peak reading matrix; b) a peak clustering unit, used to cluster peaks based on the mathematical distance between them; c) a matrix conversion unit, used to merge the cell*peak reading matrix into a cell*accesson reading matrix, wherein an accesson is a cluster of peaks.
- 9. The analysis system according to claim 8, wherein the system further comprises a visualization module for reducing the dimensionality of the cell*accesson reading matrix to a two-dimensional visualization matrix; preferably, the dimensionality reduction method includes PCA, t-SNE, or UMAP.
- 10. The analysis system according to claim 8 or 9, wherein the system further comprises a cell clustering module for clustering the cells according to the cell*accesson reading matrix; preferably, the clustering algorithm includes KNN clustering, kernel clustering, or Louvain clustering.
- 11. The analysis system according to any one of claims 8-10, wherein the system further comprises a cell development path remodeling module for constructing the pseudotime of the cell development path using the cell*accesson reading matrix; preferably, the algorithm used to construct the pseudotime of the cell development path includes SPRING or Monocle.
- 12. The analysis system according to any one of claims 8-11, wherein the mathematical distance includes Euclidean distance, the Pearson correlation coefficient, or cityblock distance.
- 13. The analysis system according to any one of claims 8-12, wherein the peak clustering method includes KNN, DBSCAN, or K-means.
- 14. The analysis system according to any one of claims 8-13, wherein the method of merging the cell*peak reading matrix into the cell*accesson reading matrix includes taking the sum, the average, the median, or the variance of the peak readings in each accesson.
- 15. A single-cell chromatin accessibility sequencing data analysis device based on peak clustering, comprising: a processor; and a memory having instructions stored thereon that, when executed by the processor, cause the processor to execute the analysis method according to any one of claims 1-7.
- 16. A computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to execute the analysis method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/080443 WO2020198942A1 (en) | 2019-03-29 | 2019-03-29 | Single-cell chromatin accessibility sequencing data analysis method and system based on peak clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/080443 WO2020198942A1 (en) | 2019-03-29 | 2019-03-29 | Single-cell chromatin accessibility sequencing data analysis method and system based on peak clustering |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020198942A1 (en) | 2020-10-08 |
Family
ID=72664796
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/080443 WO2020198942A1 (en) | 2019-03-29 | 2019-03-29 | Single-cell chromatin accessibility sequencing data analysis method and system based on peak clustering |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2020198942A1 (en) |
- 2019-03-29: WO application PCT/CN2019/080443 (WO2020198942A1), status: active, Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107002122A (en) * | 2014-07-25 | 2017-08-01 | 华盛顿大学 | It is determined that causing the tissue of the generation of Cell-free DNA and/or the method for cell type and the method for identifying disease or disorder using it |
CN105930862A (en) * | 2016-04-13 | 2016-09-07 | 江南大学 | Density peak clustering algorithm based on density adaptive distance |
WO2018132518A1 (en) * | 2017-01-10 | 2018-07-19 | Juno Therapeutics, Inc. | Epigenetic analysis of cell therapy and related methods |
CN107368701A (en) * | 2017-07-31 | 2017-11-21 | 浙江绍兴千寻生物科技有限公司 | In high volume unicellular ATAC seq data quality controls and analysis method |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116153404A (en) * | 2023-02-28 | 2023-05-23 | 成都信息工程大学 | Single-cell ATAC-seq data analysis method |
CN116153404B (en) * | 2023-02-28 | 2023-08-15 | 成都信息工程大学 | Single-cell ATAC-seq data analysis method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19923338; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19923338; Country of ref document: EP; Kind code of ref document: A1 |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19923338; Country of ref document: EP; Kind code of ref document: A1 |
 | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22/04/2022) |