CN116949132A

CN116949132A - Method for constructing single-cell sequencing library

Info

Publication number: CN116949132A
Application number: CN202310657205.6A
Authority: CN
Inventors: 张强锋; 唐磊
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2023-06-05
Filing date: 2023-06-05
Publication date: 2023-10-27

Abstract

The application provides a method for constructing a single-cell sequencing library and for sequencing the constructed library to achieve paired analysis of RNA and chromatin accessibility in the same single cell. Sequencing verification shows that libraries obtained by the method have good data specificity, high quality and a large number of captured genes. Moreover, doublets are few and the collision rate is extremely low.

Description

Method for constructing single-cell sequencing library

Technical Field

The application relates to the technical field of gene sequencing, in particular to a method for paired analysis of RNA and chromatin accessibility in the same single cell by sequencing.

Background

Single-cell sequencing technology has evolved from single-cell RNA-seq to ultra-high-throughput, multimodal single-cell sequencing. G&T-seq detects the genome and the transcriptome of the same single cell. scTrio-seq analyzes the relationship between the genome, DNA methylation and transcriptome of individual mammalian cells, while CITE-seq measures both epitopes and the transcriptome in single cells.

Among multimodal single-cell sequencing techniques, sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq and the Chromium Single Cell Multiome ATAC + Gene Expression kit can profile chromatin accessibility and RNA in the same single cell. These methods dissect tissue heterogeneity and reveal the associated epigenomic regulatory elements. However, sci-CAR barcodes bind poorly and have high collision rates, and the efficiency of both the labeling and the reverse transcription reactions is not ideal when the number of cells per tube is excessive. SHARE-Seq requires custom sequencing to read both fragments of the ATAC-Seq library, which increases the sequencing cost. SNARE-Seq encapsulates labeled cells together with DNA-barcoded microbeads in nanoliter droplets using the Drop-Seq system, which gives a low cell yield (10k per experiment) and a very high proportion (11.3%) of cases in which multiple cells are labeled with the same barcode. The Chromium Single Cell Multiome ATAC + Gene Expression kit yields the best joint analysis data for each single cell, but at high cost and with a throughput similar to that of SNARE-Seq.

In droplet-based single-cell sequencing (dsc-seq) methods, two cells loaded into the same droplet acquire the same barcode; this is known as a doublet and can distort single-cell data analysis. To avoid doublets, dsc-seq loads far fewer cells than the number of droplets it produces. For example, the 10x Genomics Chromium platform can produce about 100k droplets containing barcoded microbeads and barcoding reagents, but can recover only about 10k single cells at a collision rate of about 10%. More than 80% of functional droplets never receive a cell, which wastes most of the reagents and makes large-scale studies costly.
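The trade-off between cell loading and doublets described above can be illustrated with a simple Poisson loading model. This is a minimal sketch that is not part of the application: it assumes cells distribute independently and uniformly across droplets, and the function name `droplet_occupancy` is hypothetical. The figures of 10k cells and 100k droplets are taken from the example above; an idealized model of this kind is not expected to reproduce the quoted platform collision rate exactly.

```python
import math


def droplet_occupancy(n_cells: int, n_droplets: int) -> dict:
    """Poisson estimate of droplet occupancy (assumes random, independent loading)."""
    lam = n_cells / n_droplets            # mean number of cells per droplet
    p_empty = math.exp(-lam)              # P(droplet holds 0 cells)
    p_single = lam * math.exp(-lam)       # P(droplet holds exactly 1 cell)
    p_multi = 1.0 - p_empty - p_single    # P(droplet holds 2 or more cells)
    occupied = 1.0 - p_empty
    return {
        "mean_cells_per_droplet": lam,
        "empty_fraction": p_empty,
        "single_fraction": p_single,
        # share of cell-containing droplets that hold more than one cell
        "multi_among_occupied": p_multi / occupied if occupied > 0 else 0.0,
    }


if __name__ == "__main__":
    # 100k droplets as in the example above; the larger loads are illustrative.
    for cells in (10_000, 100_000, 300_000):
        stats = droplet_occupancy(cells, 100_000)
        print(cells, {k: round(v, 4) for k, v in stats.items()})
```

In this model, loading more cells leaves fewer droplets empty but raises the share of multi-cell droplets, which is why overloading only becomes useful when additional index rounds can separate cells that share a droplet.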

Thus, the present application proposes an ultra-high-throughput, multimodal single-cell technique, called Parallel-seq, that measures gene expression and chromatin accessibility in the same cell in parallel.

Disclosure of Invention

The application provides a single-cell ultra-high-throughput dual-omics technology (single-cell combinatorial fluidic indexing (scifi)) which can simultaneously measure gene expression and chromatin accessibility in the same cell. Compared with prior multimodal single-cell analysis methods, Parallel-Seq performs four rounds of barcode indexing using only one round of ligation and two rounds of amplification, and Parallel-Split-Seq performs four rounds of barcode indexing using only two rounds of ligation and one round of amplification. Both achieve joint analysis of open chromatin and gene expression in the same single cell and allow deconvolution of the cis-regulatory elements that regulate gene expression. Parallel-Seq and Parallel-Split-Seq were benchmarked on several human and mouse cell lines and applied to primary cells from human lung cancer samples. The results show that the libraries have good data specificity, high quality and a large number of captured genes. Moreover, doublets are few and the collision rate is extremely low. Owing to its very large combinatorial barcode space, the method for constructing the single-cell sequencing library allows large-scale cell atlas projects to be carried out at lower cost.

In a first aspect of the invention, there is provided a method of constructing a single-cell sequencing library, said method comprising cleaving open chromatin using a transposon to obtain a DNA fragment carrying a first adaptor, and adding a reverse transcription primer to reverse transcribe mRNA to obtain a cDNA first strand carrying a second linker, thereby constructing a chromatin DNA library and a transcriptome library in the same cell.

Preferably, the method further comprises placing the cell on a carrier, and ligating the obtained DNA fragment carrying the first adaptor and the obtained cDNA first strand, respectively, to the carrier using a first carrier-specific adaptor.

Preferably, the method further comprises synthesizing a second strand of cDNA.

Preferably, the method further comprises forming droplets, lysing the cells and performing an amplification reaction in the droplets, preferably overloading the cells in the formed droplets.

Preferably, the method further comprises purifying the DNA and amplifying the cDNA of the transcriptome library and the chromatin DNA separately with primers.

Preferably, the method further comprises adding an RNase.

Preferably, the method further comprises obtaining cells, and immobilizing and permeabilizing the cells.

In one embodiment of the invention, a method of constructing a single cell sequencing library comprises:

a) Cleaving the open chromatin using a transposon to obtain a DNA fragment carrying the first adaptor;

b) Adding a reverse transcription primer to reverse transcribe the mRNA to obtain a cDNA first strand carrying the second linker;

c) Placing the cell on a carrier, and respectively connecting the DNA fragment carrying the first linker obtained in the step a) and the cDNA first strand obtained in the step b) to the carrier by using a first carrier specific linker;

d) Synthesizing a second strand of the cDNA;

e) Amplifying the cDNA of the transcriptome library and the chromatin DNA separately with primers.

The step a) and the step b) can be performed simultaneously or sequentially. For example, step a) may be performed before step b), or step b) may be performed before step a).

Preferably, step a) is performed before step b).

Preferably, more than 10 transposons, more than 100 transposons, more than 1000 transposons, more than 10000 transposons, etc. are included.

The transposon comprises a barcode sequence and a transposase.

The transposases include, but are not limited to, Tn5 transposase, Mu transposase, Tn7 transposase, or IS5 transposase. In one embodiment of the invention, the transposase is a Tn5 transposase. The Tn5 transposase carries a sequence as shown in SEQ ID NO:1 or 12.

The barcode sequence comprises a first linker. Further preferably, the barcode sequence comprises a first index. The first linker comprises a first index and a transposase binding site.

The first linker comprises at least one linker that is the same or different. Further preferably, said first linker comprises at least 4 identical or different linkers. In one embodiment of the invention, at least 4-96 identical or different linkers are included.

From 5' to 3', the barcode sequence consists of an overhang, a first index and a transposase binding site. The overhang is a sequence complementary to the subsequent primer.

Preferably, the second linker comprises at least one identical or different linker. Further preferably, said second linker comprises at least 4 identical or different linkers.

The reverse transcription primer comprises a second adaptor comprising poly(T) and a first index; preferably, the reverse transcription primer also comprises random hexamer primers.

In one embodiment of the invention, the reverse transcription primer comprises poly(T), the first index and a sequence complementary to a subsequent primer.

In one embodiment of the invention, the first and second adaptors may comprise sequences complementary to the same subsequent primer.

In a specific embodiment of the present invention, the first index includes at least one, two or more than three of AACAAC, ACCGCA, AGTTGG, CCACGT, CGTGTT, GTTCTC, TGACTA, TCAAGG, AACGGT, AAGCCT, ACATGA, ACTCTA, AGAAGT, AGTACC, ATGCGA, CAATAG, CATCCA, CCTGGA, CGAGAC, CGCTCA, GCGTAA, GGATCG, GTGAGG, TCCTTA, TCTGCC, TTAACC or TTAGTG.
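A listed index set can be sanity-checked before use. The sketch below is an illustration and not a step described in the application: it takes the first-index sequences exactly as listed above and reports how many there are, whether they are unique, their lengths and their minimum pairwise Hamming distance. The helper names (`hamming`, `min_pairwise_hamming`, `check_index_set`) are hypothetical.

```python
from itertools import combinations

# First-index sequences as listed in the text above.
FIRST_INDEXES = [
    "AACAAC", "ACCGCA", "AGTTGG", "CCACGT", "CGTGTT", "GTTCTC", "TGACTA",
    "TCAAGG", "AACGGT", "AAGCCT", "ACATGA", "ACTCTA", "AGAAGT", "AGTACC",
    "ATGCGA", "CAATAG", "CATCCA", "CCTGGA", "CGAGAC", "CGCTCA", "GCGTAA",
    "GGATCG", "GTGAGG", "TCCTTA", "TCTGCC", "TTAACC", "TTAGTG",
]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(indexes) -> int:
    """Smallest Hamming distance over all pairs of indexes."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


def check_index_set(indexes) -> dict:
    """Summarize basic properties of an index set."""
    return {
        "count": len(indexes),
        "all_unique": len(set(indexes)) == len(indexes),
        "lengths": sorted({len(s) for s in indexes}),
        "min_hamming": min_pairwise_hamming(indexes),
    }


if __name__ == "__main__":
    print(check_index_set(FIRST_INDEXES))
```

A larger minimum Hamming distance means that more sequencing errors are needed to turn one index into another, so this value is a common, if rough, measure of how robust an index set is.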

In one embodiment of the invention, the barcode sequence comprises SEQ ID NO:2 and SEQ ID NO:1 or 12, or a combination of at least one, two or more than three after hybridization.

In one embodiment of the invention, the reverse transcription primer comprises the sequence of SEQ ID NO: 3 or 4, or a combination of at least one, two, or three or more.

The first vector-specific linker comprises a second index.

The first vector-specific linker comprises a UMI.

Preferably, the second index comprises a combination of at least one, two or more than three of AAGACCAA, AAGCTACG, AAGGTCAT, AATAGTGG, AATGCCTT, ACAATAGC, ACAGGATT, ACCGACCT, ACCTAGAT, ACGAGTCC, ACGGACGA, ACGTTCAA, ACTATCTG, ACTCCGAA, AGAACAGA, AGACGCTT, AGATGCGA, AGCCACTC, AGCGAAGC, AGGTAACG, AGTACATC, AGTGATTC, ATAAGAGG, ATATCACG, ATCGCCGT, ATGACGGA, ATGGAATG, ATTCCTAC, CAACGCCA, CAAGTCTG, CACACATC, CACCTTAT, CAGAACCT, CAGCCGAT, CATACTGT, CATCCACC, CATTGAGC, CCAAGCGT, CCACGACT, CCATTGTC, CCGCATGT, CCTACTCC, CCTCCTTG, CCTTAATG, CGAATATC, CGAGAGCA, CGCCTCAA, CGCGTTAC, CGGACTCT, CGGTTGTT, CGTAGCTT, CGTGCCAA, CTACCGGA, CTAGCAGT, CTCAGCCT, CTCTTCTA, CTGCTGGT, CTGTATTC, CTTCGCTC, GAAGAGTA, GACACCTA, GACGTGAG, GACTTACT, GAGGACAA, GAGTTAAG, GATCCTCG, GCAATCCG, GCAGTGTG, GCCGCTAA, GCGACCAT, GCTAAGAC, GCTGTAGG, GGAACTGG, GGACAGTT, GGATTGCT, GGTCCTAA, GTACCTGT, GTCAAGGA, GTCTGCTT, GTGCTCCA, GTGTGACC, GTTATTGG, TAATTCGG, TACCAATC, TAGACTCC, TAGTCAAC, TCACGTTG, TCAGAATG, TCCAGCTT, TCCTGCGA, TCGGTTCC, TCTTACCT, TGACATGG, TGCCTATA, TGGTGTGG, TGTACTAG.

Preferably, the first vector-specific adaptor comprises a second index, UMI and a sequence complementary to a reverse transcription primer or transposon sequence.

In one embodiment of the invention, the first vector-specific adaptor comprises, in order from 5' to 3', a sequence complementary to a reverse transcription primer or transposon sequence, a UMI, a second index, and a sequence complementary to a sequence contained on the vector.

In a specific embodiment of the invention, the first vector-specific linker comprises SEQ ID NO:6.

In a specific embodiment of the invention, the vector comprises SEQ ID NO:5.

In another embodiment of the invention, the first vector-specific linker comprises SEQ ID NO:15.

In another embodiment of the invention, the vector comprises SEQ ID NO:13.

The method further comprises the steps of forming droplets, lysing the cells and performing an amplification reaction in the droplets; preferably, the formed droplets are overloaded with cells. Because the droplets are overloaded, all functional droplets are used and the throughput of the microfluidic device is greatly improved. Linear amplification in droplets avoids purification of the unamplified product and can easily be combined with CRISPR screening, DNA methylation analysis or protein expression analysis, which may lead to single-cell multi-omics sequencing or even whole-omics sequencing of single cells.

Preferably, the primer used to perform the amplification reaction in the droplet comprises a third index.

In one embodiment of the invention, the primers used in performing the amplification reaction in the droplets comprise SEQ ID NO:8.

Preferably, the linear amplification further comprises the step of breaking the droplets. In one embodiment of the present invention, the droplets are broken using a demulsifier.

Preferably, the method comprises ligating the above obtained DNA fragment carrying the first adaptor and the above obtained cDNA first strand, respectively, to the vector using a second vector-specific adaptor. Preferably, the second vector-specific linker comprises a third index.

In a specific embodiment of the invention, the second vector-specific linker comprises SEQ ID NO:16.

In a specific embodiment of the invention, the vector comprises SEQ ID NO:14.

in one embodiment of the present invention, the third index comprises a combination of at least one, two or more than three of AACCTCTT, AACGTCGC, AAGAATCG, AAGCGGTG, AAGGAGCT, AATACCGC, AATCTCCA, ACAACTTC, ACACGCAA, ACCACAGT, ACCGTGTA, ACCTTGCC, ACGCATAA, ACGTATGG, ACTAACCA, ACTCAGGT, ACTTGTTG, AGAAGTAC, AGAGATGA, AGATTAGG, AGCCTGGT, AGCTCTAA, AGGTGTCT, AGTCCGTT, AGTTCGCA, ATAAGCTC, ATCCATGA, ATCTAGCG, ATGCAACC, ATGTGCAG, ATTGGTAG, CAAGAAGA, CAATGGAC, CACATGCT, CACGGTAG, CAGAGGTT, CAGTATAG, CATCAAGT, CATGTTCC, CCAACAAT, CCAATTAC, CCAGTGAA, CCGATCAG, CCGGTCTT, CGACAACG, CGCCAGTA, CGCGGAAT, CGGAAGGA, CGGTGAGA, CGTAACAC, CGTCTATG, CGTTCTCG, CTACTAAG, CTAGTGCG, CTCTGACA, CTGATGAA, CTGGTACA, CTTACGAG, GAACTCAA, GAATGTTG, GACGAATT, GACTGCCA, GAGCTATT, GAGTCGGA, GATAGAAC, GATGGTCT, GCAGCACT, GCATTCAT, GCCTCTGT, GCGCAGAT, GCTCACAA, GCTTGCGT, GTAATGCA, GTATCGAG, GTCGATCT, GTGAGCGT, GTGGATAG, GTTAGCCA, TAAGGTGG, TACACCGG, TACTCGTC, TAGCTGAG, TCAACAGG, TCACTCAC, TCATAGAC, TCCGTACA, TCGGAGTA, TCGTCGGT, TGAACGCG, TGAGTCTT, TGCGACTG, TGGTTATC, TGTGTAAG, TTAGGAAC, TTCAGTGG, TTCTATCC.

Preferably, the method further comprises the step of purifying the DNA.

The primers in the amplification reaction performed after purification of the DNA comprise a fourth index.

Preferably, the fourth index comprises a combination of at least one, two or more than three of the P3xx indexes.

Preferably, the fourth index comprises a combination of at least one, two or more than three of the N7xx indexes.

Preferably, the fourth index comprises a combination of at least one, two or more than three of the P5xx indexes.

Preferably, the fourth index comprises a combination of at least one, two or more than three of the N5xx indexes.

To add a fourth index (e.g., a P3xx index), the primers used to amplify the transcriptome are SEQ ID NO: 9 and 10.

To add a fourth index (e.g., a P5xx index), the primers used to amplify the transcriptome are SEQ ID NO: 20 and 18.

To add a fourth index (e.g., an N7xx index), the primers used to amplify the open chromatin fragments are SEQ ID NO: 9 and 11. To add a fourth index (e.g., an N5xx index), the primers used to amplify the open chromatin fragments are SEQ ID NO: 20 and 19.

In one embodiment of the invention, the carrier comprises a well, tube or plate.

Preferably, the carrier is a microplate, such as a 96-well plate.

Preferably, the method further comprises adding an RNase. RNA is removed from the first-strand cDNA by RNase cleavage, followed by second-strand synthesis with random primers; this avoids both the disruption of open chromatin fragments that 0.1 N NaOH would cause and contamination of the RNA-seq library.

In a second aspect of the invention, a method of constructing a multi-mode single-cell sequencing library is provided, the method of constructing comprising constructing a single-cell sequencing library according to the above method.

In a third aspect of the invention, there is provided a method of constructing a transcriptome library, said method comprising adding a reverse transcription primer to reverse transcribe mRNA to obtain a cDNA first strand carrying a second adaptor; placing the cell on a vector, and ligating the obtained cDNA first strand to the vector using a first vector-specific adaptor; synthesizing a second strand of the cDNA; and purifying the transcriptome cDNA and amplifying it with primers.

Preferably, the reverse transcription primer comprises a second adaptor comprising poly(T) and a first index; preferably, the reverse transcription primer also comprises random hexamer primers.

Preferably, the first vector-specific linker comprises a second index.

Preferably, the method further comprises the steps of forming droplets, lysing the cells and performing an amplification reaction in the droplets; preferably, the droplets formed are overloaded with cells.

Preferably, the method comprises ligating the first strand of the obtained cDNA to a vector using a second vector-specific adaptor, preferably comprising a third index.

Preferably, the primers in the amplification reaction performed after purification of the DNA comprise a fourth index.

Preferably, the method further comprises adding an RNase.

In a fourth aspect of the invention, there is provided a method of constructing a chromatin DNA library, the method comprising cleaving open chromatin using a transposon to obtain a DNA fragment carrying a first adaptor; placing the cells on a carrier, and ligating the obtained DNA fragment carrying the first linker to the carrier using a first carrier-specific linker; and purifying the DNA and amplifying the chromatin DNA with primers.

Preferably, the transposon comprises a barcode sequence and a transposase; preferably, the barcode sequence comprises a first linker; further preferably, the barcode sequence further comprises a first index.

Preferably, the first vector-specific linker comprises a second index.

Preferably, the droplets formed are overloaded with cells.

Preferably, the method comprises ligating the obtained DNA fragment carrying the first adaptor to the vector using a second vector specific adaptor, preferably said second vector specific adaptor comprising a third index.

Preferably, the primer used to amplify the chromatin DNA comprises a fourth index.

In a fifth aspect of the invention, there is provided a nucleic acid library obtained by the method described above.

In a sixth aspect of the invention, there is provided a nucleic acid library comprising at least one fragment DNA, at least one index, and at least one unique molecular identifier.

Preferably, the number of indexes is one, two, three, four, five, six, seven, eight, nine, or ten or more.

Preferably, the index includes a first index, a second index, a third index and/or a fourth index.

In one embodiment of the present invention, the nucleic acid library comprises, in order from 5' to 3', at least one fourth index, fragment DNA, first index, second index and third index.

Preferably, the unique molecular identifier is located between the fourth index and the fragment DNA, between the fragment DNA and the first index, between the first index and the second index, or between the second index and the third index.
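One way to read the combination of indexes and unique molecular identifier described above is that reads are first grouped by their index combination (the cell), and identical UMIs within a group are then collapsed into one molecule. The sketch below is a hedged illustration only: the function name `count_molecules`, the tuple layout of a read, the UMI sequences and the feature name `GENE1` are all assumptions made for the example, while the index sequences are taken from the lists above.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique molecules per (cell, feature).

    `reads` is an iterable of (cell_barcode, umi, feature) tuples, where
    cell_barcode is the tuple of index sequences assigned to the read.
    Reads that share cell barcode, feature and UMI count as one molecule.
    """
    seen = set()
    counts = defaultdict(int)
    for cell_barcode, umi, feature in reads:
        key = (cell_barcode, feature, umi)
        if key in seen:
            continue  # duplicate of a molecule that was already counted
        seen.add(key)
        counts[(cell_barcode, feature)] += 1
    return dict(counts)


# Hypothetical example: index sequences come from the lists in the text,
# UMIs and the feature name are invented for illustration.
CELL_A = ("AACAAC", "AAGACCAA", "AACCTCTT")
CELL_B = ("ACCGCA", "AAGACCAA", "AACCTCTT")
READS = [
    (CELL_A, "ACGTACGT", "GENE1"),
    (CELL_A, "ACGTACGT", "GENE1"),  # amplification duplicate, collapsed
    (CELL_A, "TTGCAAGC", "GENE1"),  # second molecule in cell A
    (CELL_B, "ACGTACGT", "GENE1"),  # same UMI in a different cell: kept
]

if __name__ == "__main__":
    print(count_molecules(READS))
```

The last read shows why the UMI is interpreted together with the indexes: the same UMI sequence in two different cells identifies two different molecules.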

In a seventh aspect of the invention, there is provided a sequencing method comprising constructing a nucleic acid library as described above.

In an eighth aspect of the invention, there is provided a use of the nucleic acid library described above, including tumor target screening, disease monitoring or preimplantation embryo diagnosis.

In a ninth aspect of the invention, there is provided a method of analysing chromatin accessibility and transcription in the same cell, said method comprising the steps of constructing a single cell sequencing library as described above, constructing a transcriptome library as described above, and constructing a chromatin DNA library as described above.

In a tenth aspect of the present invention, there is provided a method of single-cell multi-omics analysis, the method comprising constructing a single-cell sequencing library as described above, constructing a transcriptome library as described above, constructing a chromatin DNA library as described above, and sequencing to obtain chromatin accessibility and/or transcriptome sequence information, followed by bioinformatic analysis.

In an eleventh aspect of the invention, there is provided a kit comprising reagents for constructing a nucleic acid library as described above.

"Chromatin accessibility" as used herein refers to the degree to which eukaryotic chromatin DNA, after being bound by nucleosomes, transcription factors or other proteins, remains available for binding by further proteins. Regions that other proteins can still bind are open chromatin.

The "carrier" according to the present invention may be any object having a solid support surface, the surface of which may be modified so that it can be coupled to a cell or nucleic acid molecule. It may be controlled pore glass (CPG), oxalyl-controlled pore glass, TentaGel support (an aminopolyethylene glycol derivatized support), polystyrene, Poros (a copolymer of polystyrene/divinylbenzene) or a reversibly cross-linked acrylamide. Many other solid supports are commercially available and suitable for use in the present invention. In some embodiments, it may be polystyrene resin or poly(methyl methacrylate) (PMMA). It may also be metal.

The "droplets" described herein are oil-in-water or water-in-oil structures. Different droplets may have different identifiers. Preferably, the droplets are formed by combining an aqueous mixture with an oil phase. Preferably, the oil phase contains a surfactant.

"Permeabilization" as used herein refers to techniques that change the permeability of the cell wall and cell membrane so that small molecules and some larger molecules can freely enter and leave the cell without lysing the cell or destroying its internal structures. After permeabilization, the permeability of the cell is increased while its overall structure remains intact; the cell provides considerable protection to intracellular enzymes, allowing them to exert their catalytic activity fully and prolonging their service life.

The term "overload" refers to loading beyond the original carrying capacity, where the original carrying capacity is the conventional carrying capacity in the prior art. For example, "overloaded cells in a droplet" means that the number of cells loaded exceeds the number conventionally loaded in a droplet. In the prior art, droplets may be empty, carry a single cell, or be overloaded with cells, where overloaded means that more than one cell is carried in one droplet. Preferably, two, three, four, five, six, seven, eight, or nine or more cells are carried.

The "linker" as described herein is used interchangeably with the adapter of the prior art and can be used to link fragmented DNA to an index, or to link an index to an index, or to link fragmented DNA to fragmented DNA. It is preferably a nucleotide sequence of 3 to 1000 bases in length.

The "index" described in the present invention may be used interchangeably with index, barcode, etc. in the prior art. The index may be a sequence or a combination of sequences. It is preferably a nucleotide sequence of 3 to 1000 bases in length.

The "unique molecular identifier" (UMI) is a randomly designed nucleotide sequence that can specifically identify the molecule to which it is coupled. Not every coupled molecule necessarily has a unique UMI; in one embodiment, the UMI is combined with other indexes to form a unique molecular identifier.

"Complementary" as used herein refers to nucleotide sequences that are related by base pairing rules. For example, the sequence 5'-AGT-3' is complementary to the sequence 5'-ACT-3'. Complementarity may be partial or complete. Partial complementarity occurs when one or more nucleobases do not match according to the base pairing rules. Complete or total complementarity between nucleic acids occurs when each nucleic acid base matches another base under the base pairing rules. The degree of complementarity between nucleic acid strands has a significant effect on the efficiency and strength of hybridization between nucleic acid strands.

As used herein, "single cell" refers to an individual cell, which may be derived from a blood sample, a cell culture, or a specific tissue, organ, tumor, or the like, and which is separated into individual cells by a conventional separation method of the prior art.

"Doublet" or "doublets" as used herein refers to the case where two or more cells share a single identifier, such as an index, linker, unique molecular identifier, or the like, or a combination thereof.

As used herein, "nucleic acid" refers to DNA, RNA, single-stranded, double-stranded, or more highly aggregated hybridization motifs, as well as any chemical modifications thereof. Modifications include, but are not limited to, those that provide chemical groups that incorporate additional charge, polarizability, hydrogen bonding, electrostatic interaction, points of attachment and points of action to the nucleic acid ligand bases or to the nucleic acid ligand as a whole. Such modifications include, but are not limited to, peptide nucleic acids (PNAs), phosphodiester group modifications (e.g., phosphorothioates, methylphosphonates), 2'-position sugar modifications, 5-position pyrimidine modifications, 8-position purine modifications, modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo- or 5-iodo-uracil, backbone modifications, methylation, and unusual base-pairing combinations such as the isobases isocytidine and isoguanidine. Nucleic acids may also comprise unnatural bases, such as nitroindole. Modifications may also include 3' and 5' modifications, including but not limited to capping with fluorophores (e.g., quantum dots) or other moieties.

All combinations of items joined by the term "and/or" in this description shall be considered as being individually listed in this document. For example, "A and/or B" includes "A", "A and B", and "B". As a further example, "A, B and/or C" includes "A", "B", "C", "A and B", "A and C", "B and C" and "A and B and C".
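The enumeration implied by "and/or" is simply the set of all non-empty combinations of the listed items. A short sketch, given only as an illustration with a hypothetical function name:

```python
from itertools import combinations


def and_or_combinations(items):
    """Return every non-empty combination of `items`, smallest first."""
    items = list(items)
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


if __name__ == "__main__":
    for combo in and_or_combinations(["A", "B", "C"]):
        print(" and ".join(combo))
```

For n items this gives 2^n - 1 combinations, which matches the three cases listed for "A and/or B" and the seven cases listed for "A, B and/or C".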

The term "comprising" or "including" as used herein is open-ended. When used to describe the sequence of a protein or nucleic acid, the protein or nucleic acid may consist of the sequence, or may have additional amino acids or nucleotides at one or both ends of the protein or nucleic acid, while still having the activity described herein.

An intracellular second-strand synthesis step is added to reduce inhibition by cross-linked proteins and to capture more transcripts. Droplet-based indexing is implemented by linear amplification, which improves cDNA capture efficiency. At the same time, the cDNA is given a PCR anchor linker different from that of the chromatin fragments, which avoids contamination of the RNA-seq library by the ATAC-seq library.

Parallel-seq overloads droplets with multiple cells to make full use of the generated droplets and uses the index rounds before and after droplet formation to distinguish cells within one droplet, which greatly expands the barcode space. Furthermore, the length of the barcode region is significantly reduced, so that a sequencing read of 150 nt can read through the barcode and fixed nucleotide regions into the open chromatin fragment. By design, Parallel-Seq first hashes cells with sample-specific barcodes during transposition and reverse transcription, which makes it possible to evaluate multiple samples in parallel in one experiment and gives the method scalability. Parallel-Seq is superior to existing methods in data quality and offers increased throughput (36 million cells per experiment), providing a powerful and affordable tool for constructing large-scale cell atlases. Furthermore, Parallel-seq was applied to lung cancer samples and demonstrated its ability to identify cis-regulatory elements in the accessible regions of specific genes. Joint analysis of gene expression and chromatin accessibility in tumor samples, together with newly developed analytical methods, is used to identify possible regulatory elements, including oncogene enhancers and mutations. In addition, Parallel-Seq can easily handle more samples per experiment and can be extended to other omics such as DNA methylation, protein expression and CRISPR screening.
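The effect of multiple index rounds on the barcode space can be illustrated with simple arithmetic. The sketch below is not taken from the application: it multiplies the number of options per index round and estimates, under the simplifying assumption that each cell draws a barcode combination uniformly at random, the fraction of cells expected to share a combination with another cell. The counts 27, 96 and 96 are the numbers of sequences listed above for the first, second and third indexes; the fourth-round count `n_fourth` is a hypothetical placeholder, and the function names are invented for the example.

```python
from math import prod


def barcode_space(index_counts) -> int:
    """Number of distinct barcode combinations across all index rounds."""
    return prod(index_counts)


def expected_sharing_fraction(n_cells: int, n_barcodes: int) -> float:
    """Expected fraction of cells sharing a barcode with at least one other cell.

    Assumes every cell draws one of `n_barcodes` combinations uniformly
    and independently, which real experiments only approximate.
    """
    if n_cells < 2:
        return 0.0
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


if __name__ == "__main__":
    n_fourth = 8  # hypothetical number of fourth-round indexes
    three_rounds = barcode_space([27, 96, 96])
    four_rounds = barcode_space([27, 96, 96, n_fourth])
    for label, space in (("3 rounds", three_rounds), ("4 rounds", four_rounds)):
        frac = expected_sharing_fraction(100_000, space)
        print(f"{label}: {space} combinations, sharing fraction {frac:.4f}")
```

Each added round multiplies the barcode space, so the expected sharing fraction for a fixed number of cells drops roughly in proportion, which is the sense in which combinatorial indexing supports higher throughput.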

Drawings

Embodiments of the present invention are described in detail below with reference to the attached drawing figures, wherein:

Fig. 1: Parallel-Seq experimental design. Indexing and droplet overloading are used to analyze scATAC and scRNA in the same cell; pool/split denotes mixing/splitting.

Fig. 2: UMI counts of scRNA-Seq (upper) and scATAC-Seq (lower) reads mapped to the human and mouse genomes. Parallel-Seq was performed using a mixture of NIH/3T3 (murine), HEK293T (human) and K562 (human) cells; mm10 denotes version mm10 of the mouse reference genome.

Fig. 3: Insert length distribution of fragments from the scATAC-Seq part of Parallel-Seq.

Fig. 4: Enrichment of scATAC-Seq reads around TSSs.

Fig. 5: Scatter plot showing the log₂(count) correlation between Parallel-Seq scATAC-Seq and ENCODE DNase-Seq in K562 cells.

Fig. 6: Scatter plot showing the log₂(TPM+1) correlation between aggregated Parallel-Seq scRNA-Seq and ENCODE nuclear RNA-Seq in K562 cells.

Fig. 7: Comparison in K562 cells of RNA and chromatin accessibility captured using Parallel-Seq with chromatin accessibility captured by ENCODE DNase-Seq.

Fig. 8: Uniform Manifold Approximation and Projection (UMAP) visualization of Parallel-Seq paired gene expression data from a mixture of 3T3, 293T and K562 cells.

Fig. 9: Uniform Manifold Approximation and Projection (UMAP) visualization of paired chromatin accessibility data from a mixture of 3T3, 293T and K562 cells.

Fig. 10: Box plots showing the number of uniquely mapped RNA reads and the number of uniquely mapped ATAC reads for sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq and Parallel-Seq. From left to right on the abscissa, the RNA library box plots are sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq and Parallel-Seq, and the ATAC library box plots are sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq and Parallel-Seq.

Fig. 11: Box plots showing the number of genes captured per cell in sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq and Parallel-Seq. From left to right on the abscissa, the RNA library box plots are sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq and Parallel-Seq.

Fig. 12: Parallel-Split-Seq workflow schematic.

Fig. 13: UMI counts of scRNA-Seq (left) and scATAC-Seq (right) reads mapped to the human and mouse genomes. The experiment was performed using a mixture of NIH/3T3 (murine), HEK293T (human), HeLa (human), K562 (human) and THP1 (human) cells.

Fig. 14: Insert length distribution of the scATAC-Seq fragments in Parallel-Split-Seq and Parallel-Seq.

Fig. 15: Enrichment of scATAC-Seq reads around TSSs in Parallel-Split-Seq and Parallel-Seq.

Fig. 16: Scatter plots showing the log₂(TPM+1) correlation between Parallel-Split-Seq scRNA-Seq and ENCODE nuclear RNA-Seq in K562 cells (Fig. A) and the log₂(count) correlation between scATAC-Seq and ENCODE DNase-Seq (Fig. B).

Fig. 17: Uniform Manifold Approximation and Projection (UMAP) visualization of Parallel-Split-Seq paired gene expression (left) and chromatin accessibility (right) data from NIH/3T3, HEK293T, HeLa, K562 and THP1 cells.

Fig. 18: Comparison in K562 cells of chromatin accessibility captured using Parallel-Seq, Parallel-Split-Seq and ENCODE DNase-Seq, respectively, and of RNA captured using Parallel-Seq, Parallel-Split-Seq and ENCODE RNA-Seq, respectively.

Fig. 19: Box plots showing the number of uniquely mapped RNA reads and the number of uniquely mapped ATAC reads for sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq, Parallel-Seq and Parallel-Split-Seq. From left to right on the abscissa, the RNA library box plots are sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq, Parallel-Seq and Parallel-Split-Seq, and the ATAC library box plots are sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq, Parallel-Seq and Parallel-Split-Seq.

Fig. 20: Box plots showing the number of genes captured per cell in sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq, Parallel-Seq and Parallel-Split-Seq. From left to right on the abscissa, the RNA library box plots are sci-CAR, SNARE-Seq, Paired-Seq, SHARE-Seq, Parallel-Seq and Parallel-Split-Seq.

Detailed Description

The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Cell culture method in examples:

HEK293T, HeLa-S3 and NIH/3T3 cells were cultured in DMEM (C11995500BT, ThermoFisher) supplemented with 10% fetal bovine serum (P30-3302, PAN BIOTECH) at 37 °C in 5% CO₂. Cells were washed with PBS (C10010500BT, ThermoFisher) and incubated with 1 mL of 0.25% trypsin-EDTA (25200114, ThermoFisher) at 37 °C for 3-5 minutes to detach them. K562 cells were cultured in RPMI 1640 medium (C11875500BT, ThermoFisher) supplemented with 10% fetal bovine serum at 37 °C in 5% CO₂. The detached HEK293T, HeLa-S3 and NIH/3T3 cells and the K562 cell suspension were collected by centrifugation, washed with PBS and counted using Countstar.

Lung cancer sample preparation in the examples:

Fresh non-small cell lung cancer solid tumor tissues were collected at the Chinese PLA General Hospital and placed in pre-cooled (2-8 °C) MACS tissue storage solution (130-100-008, Miltenyi Biotec). The samples were kept fully submerged in the storage solution while being transported from the hospital to the laboratory.

The dissociation mixture consisted of 4677 μL DMEM/F-12 (11320033, ThermoFisher), 250 μL of 2.5 mg/mL Liberase TL (05401020001, Sigma-Aldrich) to a final concentration of 250 μg/mL, 23 μL of 2 mg/mL elastase (NC9301601, Worthington) to a final concentration of 9.2 μg/mL, and 50 μL of 10 mg/mL DNase (11284932001, Sigma-Aldrich) to a final concentration of 100 μg/mL.

The tissue was minced with scissors into pieces of 0.4 mm or less in a 1.5 mL Eppendorf microcentrifuge tube. The tissue was incubated in the dissociation mixture at 37 °C for 60 minutes with horizontal rotation at 90 rpm. The single-cell suspension was filtered through a 70 μm cell strainer (15-1070, BIOLOGIX) and centrifuged at 500 g for 5 minutes at 4 °C. Cells were resuspended in 1 mL PBS and 3 mL red blood cell lysis buffer (4994957, TIANGEN), incubated at room temperature for 5 minutes and centrifuged at 500 g for 5 minutes at 4 °C. Cells were resuspended in 500 μL of fetal bovine serum. An aliquot of cells was mixed with 5 μL of trypan blue (15250061, ThermoFisher) and counted on a C-Chip disposable hemocytometer (DHC-N01N, As One). The single-cell suspension was diluted with fetal bovine serum supplemented with dimethyl sulfoxide (D2650, Sigma-Aldrich) to a final concentration of 10%. Cells were cryopreserved at 1×10⁶ single cells per tube. Prior to the experiment, the cells were gently thawed at 37 °C for 5 minutes and centrifuged at 500 g for 5 minutes at 4 °C. Cells were resuspended in 80 μL of cell staining buffer (420201, BioLegend) and 5 μL Human TruStain FcX (422302, BioLegend) and incubated at 4 °C for 5 minutes. Anti-CD45 PE (304039, BioLegend), 5 μL of anti-CD3 BV421 (317344, BioLegend) and 5 μL of anti-EpCAM PE/Cy7 (324222, BioLegend) antibodies were then added, and the cells were incubated at 4 °C for 15 minutes protected from light. The stained cells were washed with 1 mL of PBS and centrifuged at 500 g for 5 minutes at 4 °C. The supernatant was discarded, and the single cells were resuspended in 1 mL PBS containing 0.02 μM Calcein-AM (425201, BioLegend) and incubated at room temperature for 15 minutes protected from light. Cells were then resuspended in a mixture of 90 μL Annexin V Binding Buffer, 5 μL of APC Annexin V (640941, BioLegend) and 5 μL of 7-AAD viability staining solution (420404, BioLegend) and incubated at room temperature for 10 minutes. 400 μL of PBS was added, and the cells were filtered through a 35 μm BD cell strainer (352235, BD Falcon). Calcein-AM-positive, 7-AAD-negative, Annexin V-negative single cells were sorted using a MoFlo Astrios EQ Cell Sorter (Beckman Coulter). Because tumor cells and T cells could otherwise make up most of the sequencing data, we balanced the samples to contain less than 5% EpCAM-positive cells, 40% T cells and 55% other single cells.

The procedure for the preparation of transposons in the examples is as follows:

Tn5MErev (/5Phos/CTGTCTCTCTTTATACACATCT (SEQ ID NO: 21)), Tn5ME-A, Tn5ME-B and barcoded R1BxME (x represents 1-96) were prepared. Tn5MErev was annealed with 10 μM Tn5ME-A (for Parallel-Split-Seq) or 10 μM Tn5ME-B (for Parallel-Seq), and with 10 μM R1BxME, respectively, by heating at 95 °C for 2 minutes, cooling to 20 °C at 0.1 °C per second and holding at 4 °C. Annealed Tn5ME-B (Parallel-Seq), 2 μL annealed Tn5ME-R1Bx, 2 μL 10× TPS, 4 μL transposase (M0221, Robustnique) and 10 μL UltraPure DNase/RNase-Free Water (10977023, ThermoFisher) were combined and incubated at room temperature for 30 minutes. The assembled transposon was dispensed at 4 μL per tube and stored at -20 °C for no more than 1 month.

The method for fixing cells in the examples is as follows:

For cell lines, 50k single cells were counted out; for each lung cancer sample, 50k primary cells were sorted. Single cells were centrifuged at 500 g for 5 minutes at 4 °C and resuspended in 250 μL PBS. 750 μL of PBS containing 1.33% methanol-free formaldehyde (28906, ThermoFisher) was added, and the cells were incubated on ice for 10 minutes. 20% BSA (V0332-100G, VWR) was added, and the cells were centrifuged at 1000 g for 3 minutes at 4 °C in a swinging-bucket centrifuge. The cells were then collected on one side of a 1.5 mL microcentrifuge tube (MCT-150-C, Axygen) by centrifugation in a pre-chilled (4 °C) fixed-angle centrifuge, and the supernatant was removed in two pipetting steps, as in Omni-ATAC. The results showed that adding BSA and pelleting the single cells by swinging-bucket centrifugation allowed more primary cells to be recovered.

The method of permeabilizing the cells in the examples is as follows:

1 mL of 1 M Tris-HCl pH 7.4 (T2663-1L, Sigma-Aldrich), 200 μL of 5 M NaCl (AM9759, ThermoFisher), 300 μL of 1 M MgCl₂ (AM9530G, ThermoFisher) and 48.5 mL of UltraPure DNase/RNase-Free distilled water were mixed to prepare 2× RSB, as in Omni-ATAC. During fixation, the permeabilization buffer was prepared by combining, for each sample, 50 μL of 2× RSB, 1 μL of RiboLock (EO0384, ThermoFisher), 1 μL of SUPERase-In RNase inhibitor (AM2696, ThermoFisher), 1 μL of 10% Nonidet P40 Substitute (1133247301, Sigma-Aldrich), 1 μL of 10% Tween 20 (11332465001, Sigma-Aldrich), 1 μL of 1% digitonin (D141-100MG, Sigma-Aldrich), 5 μL of 20% BSA and 40 μL of UltraPure DNase/RNase-Free distilled water. Wash buffer was prepared by mixing 500 μL of 2× RSB, 10 μL of 10% Tween 20, 1 μL of RiboLock and 50 μL of 20% BSA. Immediately after removal of the fixative, 100 μL of permeabilization buffer was added, and the cells were pipetted 8 times and placed on ice for 5 minutes. After permeabilization, 1 mL of wash buffer was added to each sample. The cells were centrifuged and the supernatant was removed.
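The buffer recipes above can be checked with simple dilution arithmetic (final concentration = stock concentration × stock volume / total volume). The following is a minimal sketch, not part of the described method; the helper name `final_conc` and the dictionary layout are illustrative assumptions, and only the volumes and stock concentrations are taken from the text.

```python
def final_conc(stock_conc, stock_vol, total_vol):
    """Concentration after dilution; units of the result follow stock_conc."""
    return stock_conc * stock_vol / total_vol

# 2x RSB: (stock concentration in mM, volume in mL), as listed in the text.
rsb_components = {
    "Tris-HCl pH 7.4": (1000.0, 1.0),  # 1 M stock, 1 mL
    "NaCl": (5000.0, 0.2),             # 5 M stock, 200 uL
    "MgCl2": (1000.0, 0.3),            # 1 M stock, 300 uL
}
water_ml = 48.5
rsb_total_ml = water_ml + sum(vol for _, vol in rsb_components.values())

rsb_2x_mM = {
    name: final_conc(conc, vol, rsb_total_ml)
    for name, (conc, vol) in rsb_components.items()
}

# Permeabilization buffer: (stock in % , volume in uL) for the detergents only.
perm_volumes_ul = [50, 1, 1, 1, 1, 1, 5, 40]  # all components listed in the text
perm_total_ul = sum(perm_volumes_ul)
perm_detergents_pct = {
    "Nonidet P40 Substitute": final_conc(10.0, 1, perm_total_ul),
    "Tween 20": final_conc(10.0, 1, perm_total_ul),
    "digitonin": final_conc(1.0, 1, perm_total_ul),
}

if __name__ == "__main__":
    for name, conc in rsb_2x_mM.items():
        print(f"{name}: {conc:.1f} mM in 2x RSB ({conc / 2:.1f} mM at 1x)")
    for name, pct in perm_detergents_pct.items():
        print(f"{name}: {pct:.2f}% in permeabilization buffer")
```

Under these inputs the 2× RSB works out to a 50 mL batch, and the eight permeabilization buffer components sum to the 100 μL added per sample.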

The transposition method in the examples is as follows:

An ATAC-seq reaction solution was prepared by mixing 10 μL of 5× LM buffer (M0221, Robustnique), 16.5 μL of PBS, 0.5 μL RiboLock, 0.5 μL SUPERase-In RNase Inhibitor, 0.5 μL 10% Tween 20, 0.5 μL digitonin and 17.5 μL UltraPure DNase/RNase-Free distilled water. The permeabilized single cells were resuspended in 46 μL of ATAC-seq reaction solution, and 4 μL of barcode-specific transposon was added to each tube. The ATAC-seq reaction was carried out at 37 °C with shaking at 550 r.p.m. and a heated lid. After the ATAC-seq reaction, 949 μL PBS, 10 μL 10% Triton X-100 (93443, Sigma-Aldrich), 1 μL RiboLock and 50 μL 20% BSA were added to each tube, and the tubes were centrifuged to remove the supernatant.

The methods for intracellular reverse transcription in the examples are as follows:

PBS, 0.5 μL RiboLock, 0.5 μL SUPERase-In RNase Inhibitor and 7 μL nuclease-free water were mixed to prepare 16 μL of resuspension solution for each sample. The reverse transcription mixture was prepared by combining 8 μL of 5× RT buffer, 2 μL of 10 mM dNTP (N0447L, NEB), 0.5 μL RiboLock, 0.25 μL of Maxima H Minus reverse transcriptase (EP0753, ThermoFisher) and 5.25 μL of UltraPure DNase/RNase-Free distilled water. The reverse transcription mixture was aliquoted into each tube, and 2 μL of random primer and 2 μL of poly(T) reverse transcription primer with the matching barcode were added. The transposed cells were resuspended in 16 μL of resuspension solution and added to the barcode-matched PCR tube. After thorough mixing, reverse transcription was carried out at 50 °C for 10 minutes, followed by 3 thermal cycles (12 seconds at 8 °C, 45 seconds at 15 °C, 45 seconds at 20 °C, 30 seconds, 120 seconds at 42 °C and 180 seconds at 50 °C), a 5-minute incubation at 50 °C and a hold at 4 °C. The reverse transcription reactions were pooled into a 1.5 mL tube on ice. The cells were centrifuged and the supernatant was removed. Cells were washed again with 1 mL PBS supplemented with 10 μL of 10% Triton X-100 and 50 μL of 20% BSA. The cells were centrifuged and the supernatant was removed.

The ligation reaction procedure in the examples is as follows:

Parallel-Seq adds the second index by ligation. The ligation linkers contained a 7 nt sequence complementary to the transposon and to the reverse transcription primers, respectively, as well as a 10 nt linker sequence, an 8 nt well-specific barcode, a 10 nt UMI and a universal PCR anchor for linear amplification in droplets. Before in-cell barcode ligation, the ligation linkers were annealed by combining 11 μM linker strand and 12 μM barcode strand in a reaction volume of 100 μL. The plates were incubated at 95 °C for 2 minutes and cooled to 20 °C at a rate of -0.1 °C per second, after which they were aliquoted into 10 ligation plates, each well containing 10 μL of ligation linker.

For Parallel-Split-Seq, the second and third indexes are added by ligation. The ligation linker for the second index comprises a 10 nt sequence complementary to the linker strand, an 8 nt well-specific barcode and a 7 nt sequence used for the subsequent ligation. The ligation linker used in the ligation reaction that adds the third index contained a 10 nt linker sequence, an 8 nt well-specific barcode, a 10 nt UMI and the P3 short sequence of the universal PCR primer. The second- and third-round linkers were annealed according to the Parallel-Split-Seq protocol and each split into 10 plates.
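The linker geometry described above can be sanity-checked against the sequences listed in Examples 1 and 3: the reverse complement of each bridging strand (dscB', R2', R3') should equal the 10 nt linker at the 3' end of the barcoded strand followed by the 7 nt 5' end of the oligonucleotide it is ligated to. The sketch below is illustrative only; the assignment of "upstream" and "downstream" roles is an interpretation of the text, and the function names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))

# name: (bridging strand, 3' linker of barcoded strand, 5' end of ligation partner)
JUNCTIONS = {
    # SEQ ID NO: 5 bridges the 3' end of dscBx (SEQ ID NO: 6) and the
    # TGCAGTA 5' end shared by SEQ ID NO: 2-4.
    "dscB'": ("TACTGCACTCAGTGACT", "AGTCACTGAG", "TGCAGTA"),
    # SEQ ID NO: 13 bridges the 3' end of R2Bx (SEQ ID NO: 15) and TGCAGTA.
    "R2'": ("TACTGCAGCTGAACCTC", "GAGGTTCAGC", "TGCAGTA"),
    # SEQ ID NO: 14 bridges the 3' end of R3Bx (SEQ ID NO: 16) and the
    # TTGGAGA 5' end of R2Bx.
    "R3'": ("TCTCCAAAGCTGTGGAC", "GTCCACAGCT", "TTGGAGA"),
}

def bridges(bridge, upstream_3p, downstream_5p):
    """True if the bridge strand pairs with both sides of the ligation junction."""
    return revcomp(bridge) == upstream_3p + downstream_5p

junction_ok = {name: bridges(*parts) for name, parts in JUNCTIONS.items()}

if __name__ == "__main__":
    for name, ok in junction_ok.items():
        print(f"{name}: {'complementary' if ok else 'mismatch'}")
```

With the sequences as printed in this document, all three junctions come out complementary, which matches the 10 nt + 7 nt layout described for the linkers.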

Wherein the intracellular ligation steps are as follows:

The ligation was performed according to the SPLiT-seq protocol, without RNase inhibitor. 2 mL of 1× NEBuffer 3.1 (B7203S, NEB) and 2 mL of ligation solution (500 μL 10× T4 DNA ligase buffer, 100 μL T4 DNA ligase (M0082, Robustnique), 50 μL 10% Triton X-100 and 1350 μL UltraPure DNase/RNase-Free distilled water) were prepared. The pooled single cells were resuspended in 1× NEBuffer 3.1 and thoroughly mixed with the ligation solution. 40 μL of cells in the ligation mixture was added to each well of the ligation plate. The ligation reaction was rotated at 15 r.p.m. for 1 hour at room temperature. After ligation, 2 μL of 500 μM EDTA (AM9260G, ThermoFisher) was added to each well, and the wells were pooled. 50 μL of 10% Triton X-100 and 50 μL of 20% BSA were added to the pooled cells, and the supernatant was removed by centrifugation. Cells were washed again with 940 μL PBS, 10 μL 10% Triton X-100 and 50 μL 20% BSA. For Parallel-Split-Seq, after the second index has been added, the third index is added by a further ligation reaction.

The methods for RNase digestion in the examples are as follows:

Cells were resuspended in the RNase digestion reaction (40 μL of 5× RT buffer, 8 μL RNase Cocktail Enzyme Mix (AM2286, ThermoFisher), 8 μL RNase H (Y9220L, Enzymatics) and 144 μL UltraPure DNase/RNase-Free distilled water) and incubated at 37 °C for 30 minutes on a mixer, shaking at 300 rpm for 15 seconds and then resting for 45 seconds. After RNase digestion, the cells were washed by adding 790 μL PBS and 10 μL 10% Triton X-100 and centrifuged, and the supernatant was removed. BSA is not added in this step, because residual BSA would form debris with PEG 8000 in the next step.

The procedure for the second strand synthesis in the examples is as follows:

The cells were incubated in the second strand synthesis reaction mixture (40 μL 5× RT buffer, 48 μL 50% PEG 8000 (B1004SVIAL, NEB), 20 μL 10 mM dNTP, 2 μL 1 mM dN-P3 short primer (for Parallel-Seq) or dN-P5 short primer (for Parallel-Split-Seq), 5 μL Klenow Exo- (M0212L, NEB) and 85 μL UltraPure DNase/RNase-Free distilled water) at 37 °C for 1 hour on a mixer, shaking at 300 r.p.m. for 15 seconds and then resting for 45 seconds. After second strand synthesis, cells were washed twice with PBS containing 0.1% Triton X-100 and 1% BSA. Cells were then resuspended in 40 μL of 0.5× PBS and counted with trypan blue on a C-Chip disposable hemocytometer.

Overloading using the 10x Chromium ATAC-seq kit

For Parallel-Seq, cells were centrifuged, resuspended in 7 μL of ATAC-seq Buffer B and brought to 15 μL with 1× Nuclei Buffer. 56.5 μL Barcoding Reagent B, 1.5 μL Reducing Agent B and 2 μL Barcoding Enzyme (PN-1000176, 10x Genomics) were combined with the cells and loaded into one channel of a Chromium Next GEM Chip H (PN-1000162, 10x Genomics). After GEM generation, the droplets were divided into 16 tubes, each containing 6.25 μL of droplets. Linear amplification was performed as follows: 72 °C for 5 minutes and 98 °C for 30 seconds, then 12 cycles of 98 °C for 10 seconds, 59 °C for 30 seconds and 72 °C for 1 minute. The products were then held at 15 °C until use.

Parallel-Seq library construction

Post-GEM incubation cleanup was performed following a scaled-down version of the Chromium Next GEM Single Cell ATAC Reagent Kits v1.1 protocol. 7.8 μL Recovery Agent was added to each tube, and the tubes were gently inverted 10 times to mix. The tubes were centrifuged briefly, and 12.5 μL Dynabeads Cleanup Mix was added. The mixture was pipetted 5 times and incubated at room temperature for 10 minutes. The product was eluted with 81 μL Elution Solution I and split into two parts, 40 μL for ATAC-seq and 40 μL for RNA-seq. The ATAC-seq portion was cleaned up using 1.2× SPRI beads (B23218, Beckman Coulter), and the RNA-seq portion was cleaned up using 0.8× SPRI beads. The ATAC-seq library was amplified using SI-PCR primer B (PN-2000128, 10x Genomics) and N7xx primers. The RNA-seq library was amplified with SI-PCR primer B (PN-2000128) and P3xx primers. After amplification, the ATAC-seq portion was cleaned up with 1.2× SPRI beads and the RNA-seq portion with 0.8× SPRI beads.

Parallel-Split-Seq library construction

Cells after second strand synthesis were diluted to 800 cells/μL and aliquoted at 2,000 cells per tube. 2× lysis buffer (0.25 μL 1 M Tris-HCl pH 8.0, 0.25 μL 10% IGEPAL CA-630 (I8896, Sigma-Aldrich), 0.25 μL 10% Tween 20, 0.5 μL 291 mg/mL QIAGEN Protease (19155, QIAGEN) and 1.25 μL UltraPure DNase/RNase-Free distilled water) was added, and the tubes were incubated at 55 °C for 8 hours, then at 70 °C for 15 minutes to inactivate the QIAGEN Protease, and held at 4 °C. PCR amplification mix (25 μL NEBNext High-Fidelity 2X PCR Master Mix (M0541L, NEB), 2.5 μL N5xx primer, 1.25 μL P5xx primer, 1.25 μL P3xx primer and 15 μL UltraPure DNase/RNase-Free distilled water) was added to amplify the ATAC-seq and RNA-seq fragments. The cycling conditions were 72 °C for 5 minutes and 98 °C for 30 seconds, then 5 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds and 72 °C for 1 minute, followed by a hold at 4 °C. The PCR mixture was split into an ATAC-seq portion and an RNA-seq portion, which were cleaned up using 1.0× and 0.8× AMPure XP beads (A63881, Beckman Coulter), respectively. The PCR products were eluted with 22 μL UltraPure DNase/RNase-Free distilled water. A second round of PCR amplification was performed by adding 28 μL of PCR reaction mixture (for ATAC-seq: 25 μL NEBNext High-Fidelity 2X PCR Master Mix, 1.25 μL N5xx primer, 1.25 μL P3_end primer and 0.5 μL 25× SYBR Green I (S7563, ThermoFisher); for RNA-seq: 25 μL NEBNext High-Fidelity 2X PCR Master Mix, 1.25 μL Nxx primer, 1.25 μL P3_end primer and 0.5 μL 25× SYBR Green I). Each sub-pool of the ATAC-seq and RNA-seq libraries was amplified on a QuantStudio 3 real-time PCR system (ThermoFisher) while the amplification was monitored, and each sub-pool was stopped when the fluorescence value reached 100,000 units. Empirically, cell lines and tumor cells need about 6-8 cycles and other primary cells about 7-10 cycles; fewer than 11 cycles is acceptable. The ATAC-seq libraries were cleaned up using 1.0× AMPure XP beads and the RNA-seq libraries using 0.8× AMPure XP beads. The libraries were eluted with 20 μL of elution buffer (19086, QIAGEN), and the concentration was determined using the Qubit dsDNA HS assay kit (Q32851, ThermoFisher). Each sub-pool is expected to yield more than 20 ng of product. Quality control was performed using the Agilent High Sensitivity D1000 ScreenTape assay (5067-5584 and 5067-5585, Agilent).

The sequencing method in the examples is as follows:

The Parallel-Seq library was sequenced on an Illumina NovaSeq 6000 system using PE150 sequencing with a 16 nt i5 index and an 8 nt i7 index.

The Parallel-Split-Seq library was sequenced on either the Illumina HiSeq X Ten system or the NovaSeq 6000 system using standard PE150 sequencing with an 8 nt i5 index and an 8 nt i7 index.

Pretreatment of Parallel-Seq and Parallel-Split-Seq data

For Parallel-Seq, the cell barcodes and the ligation linker were sequenced using read 1. To balance the nucleotide composition during sequencing, we added phasing nucleotides between barcode2 and the linker: no nucleotide for barcode2 wells 1-24, T for wells 25-48, CA for wells 49-72 and ACA for wells 73-96. For barcode2 wells 1-24, barcode1, barcode2, barcode3 and barcode4 of Parallel-Seq are located at positions 36-41 of read 1, positions 11-18 of read 1, the i5 index and the i7 index, respectively. For barcode2 wells 25-48, 49-72 and 73-96, only the position of barcode1 changes, shifting by one nucleotide per step. The unique molecular identifier is located at positions 1-10 of read 1. For ATAC-seq, the sequence following Tn5ME in read 1 is one Tn5 cleavage site, while read 2 provides the other Tn5 cleavage site. Read 2 of the RNA-seq library starts from the second strand synthesis annealing site and is identical to the RNA sequence of the target gene.
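The read 1 layout described above can be expressed as a small parsing routine. This is a minimal sketch under stated assumptions, not the pipeline used in the examples: positions are converted from the 1-based coordinates in the text to 0-based Python slices, the mapping from barcode2 sequence to well number (`barcode2_to_well`) is a hypothetical lookup because Table 2 is not reproduced here, and barcode3 and barcode4 are left out because they come from the i5 and i7 index reads.

```python
# Phasing nucleotides by barcode2 well group: wells 1-24, 25-48, 49-72, 73-96.
PHASE_BY_GROUP = ["", "T", "CA", "ACA"]

def phase_length(well_number):
    """Number of phasing nucleotides for a barcode2 well (1-96)."""
    if not 1 <= well_number <= 96:
        raise ValueError("well number must be between 1 and 96")
    return len(PHASE_BY_GROUP[(well_number - 1) // 24])

def parse_parallel_seq_read1(read1, barcode2_to_well):
    """Extract UMI, barcode2 and barcode1 from a Parallel-Seq read 1.

    Returns None if barcode2 is unknown or the read is too short.
    """
    umi = read1[0:10]        # positions 1-10
    barcode2 = read1[10:18]  # positions 11-18
    well = barcode2_to_well.get(barcode2)
    if well is None:
        return None
    shift = phase_length(well)
    start, end = 35 + shift, 41 + shift  # positions 36-41 when shift is 0
    if len(read1) < end:
        return None
    return {
        "umi": umi,
        "barcode2": barcode2,
        "well": well,
        "barcode1": read1[start:end],
    }

if __name__ == "__main__":
    # Hypothetical barcode2 sequence assigned to well 50 (CA phasing group).
    lookup = {"AAAACCCC": 50}
    read = ("ACGTACGTAC" + "AAAACCCC" + "CA" + "AGTCACTGAG"
            + "TGCAGTA" + "AACAAC" + "GGG")
    print(parse_parallel_seq_read1(read, lookup))
```

Reading barcode2 first and deriving the barcode1 offset from its well group avoids having to search for the linker sequence in every read.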

For Parallel-Split-Seq, the cell barcodes and the linkers were sequenced using read 2. Barcode1, barcode2, barcode3 and barcode4 are located at positions 61-66, 36-43 and 11-18 of read 2 and in the i7 index, respectively. For ATAC-seq, the sequence following Tn5ME in read 2 is one Tn5 cleavage site, while read 1 provides the other end of the fragment. Read 1 of the RNA-seq library provides the RNA sequence of the gene of interest.
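For Parallel-Split-Seq, the fixed barcode positions in read 2 allow direct slicing, and the constant linker segments between the barcodes can be used to reject malformed reads. The sketch below assumes the read 2 layout implied by R3Bx (SEQ ID NO: 16), R2Bx (SEQ ID NO: 15) and the TGCAGTA 5' end of SEQ ID NO: 2-4; the UMI at positions 1-10 is inferred from the R3Bx sequence rather than stated in the paragraph above, and the mismatch tolerance is a hypothetical parameter.

```python
# 0-based, end-exclusive slices derived from the 1-based positions in the text.
FIELDS = {
    "umi": (0, 10),        # inferred from the 10 N bases in R3Bx
    "barcode3": (10, 18),  # positions 11-18
    "barcode2": (35, 43),  # positions 36-43
    "barcode1": (60, 66),  # positions 61-66
}

# Constant segments expected between the barcodes (start offset -> sequence).
FIXED_SEGMENTS = {
    18: "GTCCACAGCT",  # 3' linker of R3Bx
    28: "TTGGAGA",     # 5' end of R2Bx
    43: "GAGGTTCAGC",  # 3' linker of R2Bx
    53: "TGCAGTA",     # 5' end of the first-index oligonucleotides
}

def parse_parallel_split_seq_read2(read2, max_linker_mismatches=1):
    """Return the barcode fields of a read 2, or None if the layout does not fit."""
    if len(read2) < 66:
        return None
    for start, expected in FIXED_SEGMENTS.items():
        observed = read2[start:start + len(expected)]
        mismatches = sum(a != b for a, b in zip(observed, expected))
        if mismatches > max_linker_mismatches:
            return None
    return {name: read2[begin:end] for name, (begin, end) in FIELDS.items()}

if __name__ == "__main__":
    # Barcode sequences here are placeholders, not entries from Table 2.
    example = ("ACGTACGTAC" + "TTTTGGGG" + "GTCCACAGCT" + "TTGGAGA"
               + "AAAACCCC" + "GAGGTTCAGC" + "TGCAGTA" + "AACAAC")
    print(parse_parallel_split_seq_read2(example))
```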

The raw reads were trimmed with cutadapt. The barcodes were parsed with the FREE Difference software, allowing only one edit per round of barcode. Reads containing the mosaic end (ME) sequence were filtered out of the RNA library, and reads lacking the ME sequence were filtered out of the ATAC library. Reads were aligned to hg38, mm10 or a combined genome using STAR.
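The "one edit per round of barcode" rule can be illustrated with a simple whitelist lookup. The sketch below is not the barcode-parsing software named above; it assumes that an edit means a single substitution (Hamming distance 1) and rejects a barcode when the correction is ambiguous. The whitelist is the set of first indexes listed in Table 1.

```python
FIRST_INDEXES = [
    "AACAAC", "ACCGCA", "AGTTGG", "CCACGT", "CGTGTT", "GTTCTC", "TGACTA",
    "TCAAGG", "AACGGT", "AAGCCT", "ACATGA", "ACTCTA", "AGAAGT", "AGTACC",
    "ATGCGA", "CAATAG", "CATCCA", "CCTGGA", "CGAGAC", "CGCTCA", "GCGTAA",
    "GGATCG", "GTGAGG", "TCCTTA", "TCTGCC", "TTAACC", "TTAGTG",
]

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def correct_barcode(observed, whitelist, max_mismatches=1):
    """Return the whitelist barcode matching `observed`, or None.

    An exact match wins; otherwise the barcode is accepted only if exactly
    one whitelist entry lies within `max_mismatches` substitutions.
    """
    if observed in whitelist:
        return observed
    hits = [
        barcode for barcode in whitelist
        if len(barcode) == len(observed)
        and hamming(observed, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None

if __name__ == "__main__":
    for query in ("AACAAC", "AACAAG", "GGGGGG"):
        print(query, "->", correct_barcode(query, FIRST_INDEXES))
```

Handling insertions and deletions as well would require an edit-distance comparison instead of `hamming`; that extension is left out here.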

For single-cell RNA-seq, UMIs were collapsed and a digital gene expression matrix was generated using a modified Python script from the SPLiT-seq pipeline. For single-cell ATAC-seq, mitochondrial reads were removed. TSS enrichment was calculated as previously described to assess data quality. Cells with TSS enrichment <6 were discarded. Tn5 insertions were then counted in 2-kb bins across the whole genome.
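The two counting steps in this paragraph, UMI collapsing and binning of Tn5 insertions, can be sketched as follows. This is an illustrative simplification rather than the modified SPLiT-seq script: UMIs are collapsed by exact identity only, positions are assumed to be 0-based, and the record layouts are hypothetical.

```python
from collections import Counter

def collapse_umis(records):
    """Count unique molecules per (cell, gene).

    `records` is an iterable of (cell_barcode, gene, umi) tuples, one per
    aligned read; reads sharing all three values count as one molecule.
    """
    unique_molecules = set(records)
    return Counter((cell, gene) for cell, gene, _umi in unique_molecules)

def bin_insertions(insertions, bin_size=2000):
    """Count Tn5 insertions per (cell, chromosome, bin index).

    `insertions` is an iterable of (cell_barcode, chromosome, position)
    tuples with 0-based positions; the bin index is position // bin_size.
    """
    return Counter(
        (cell, chrom, position // bin_size)
        for cell, chrom, position in insertions
    )

if __name__ == "__main__":
    reads = [
        ("cell1", "GAPDH", "AAAACCCCGG"),
        ("cell1", "GAPDH", "AAAACCCCGG"),  # PCR duplicate of the read above
        ("cell1", "GAPDH", "TTTTGGGGAA"),
        ("cell2", "ACTB", "AAAACCCCGG"),
    ]
    print(collapse_umis(reads))
    cuts = [("cell1", "chr1", 1999), ("cell1", "chr1", 2000), ("cell1", "chr1", 3999)]
    print(bin_insertions(cuts))
```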

Cells with fewer than 200 expressed genes or fewer than 200 bins were discarded. We used Scrublet to predict the doublet probability and removed doublets using the default threshold.

For mixed cell line data, cells in which less than 90% of UMIs mapped to a single species were considered mixed cells.
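The species-mixing rule can be written as a short classifier. The sketch below assumes per-cell UMI counts for the human and mouse genomes are already available; the function names and the handling of cells without any UMIs are illustrative choices, not part of the described pipeline.

```python
def classify_species(human_umis, mouse_umis, purity=0.90):
    """Label a cell as 'human', 'mouse' or 'mixed' using the 90% UMI rule."""
    total = human_umis + mouse_umis
    if total == 0:
        return "empty"
    if human_umis / total >= purity:
        return "human"
    if mouse_umis / total >= purity:
        return "mouse"
    return "mixed"

def mixed_fraction(cells):
    """Fraction of non-empty cells labeled 'mixed'.

    `cells` is an iterable of (human_umis, mouse_umis) pairs.
    """
    labels = [classify_species(h, m) for h, m in cells]
    labels = [label for label in labels if label != "empty"]
    if not labels:
        return 0.0
    return labels.count("mixed") / len(labels)

if __name__ == "__main__":
    example_cells = [(950, 50), (40, 960), (500, 500), (900, 100)]
    print([classify_species(h, m) for h, m in example_cells])
    print(mixed_fraction(example_cells))
```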

The control procedure in the examples is as follows:

For the steps of sci-CAR, see: Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380-1385 (2018).

For the steps of Paired-Seq, see: Zhu, C. et al. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat Struct Mol Biol 26, 1063-1070 (2019).

For the steps of SNARE-Seq, see: Chen, S., Lake, B.B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat Biotechnol 37, 1452-1457 (2019).

For the steps of SHARE-Seq, see: Ma, S. et al. Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin. Cell 183, 1103-1116.e20 (2020).

The "X … X" and "N … N" representing bases in a sequence of the present application may each represent any natural or modified or known base type in the art, where "X" is used interchangeably with "N" and include, but are not limited to A, T, C, G or U. V represents A, C or G. B represents C, G, T or U. Of course, when "X" represents an amino acid, then it represents a natural or modified type of amino acid known in the art.

EXAMPLE 1 Parallel-Seq analysis of RNA and open chromatin in the same Single cell of multiple samples

The experimental design of Parallel-Seq is shown in FIG. 1. The method comprises the following specific steps:

(1) Parallel-Seq started with 27 different samples, 50,000 single cells per sample;

(2) Fixing and permeabilizing the cells of each sample, and labeling the open chromatin with a barcoded Tn5 transposon carrying a transposon-specific barcode; wherein the transposon sequence Tn5ME-B is shown in SEQ ID NO: 1, and the sequence of Tn5ME-x carrying the first index (x represents 1-27) is shown in SEQ ID NO: 2, where XXXXXX in the sequence represents the first index, see Table 1.

Tn5ME-B：GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG(SEQ ID NO：1)

Tn5ME-x：/5Phos/TGCAGTAXXXXXXAGATGTGTATAAGAGACAG(SEQ ID NO：2)

(3) mRNA from each sample was reverse transcribed using the barcode-matched poly(T) primer R1BxT15VN and random hexamer primer R1BxN6; the reverse transcription primers R1BxT15VN and R1BxN6 (x represents 1-27) are shown in SEQ ID NO: 3 and SEQ ID NO: 4, where XXXXXX in the sequence represents the first index, see Table 1;

R1BxT15VN：/5Phos/TGCAGTAXXXXXXTTTTTTTTTTTTTTTVN(SEQ ID NO：3)

R1BxN6：/5Phos/TGCAGTAXXXXXXNNNNNN(SEQ ID NO：4)

(4) Cells from different samples were combined and randomly distributed into 96-well plates, each well containing the dscB' sequence (SEQ ID NO: 5), and a well-specific linker sequence dscBx was ligated to the first strand of the transposed chromatin or cDNA; the well-specific linker sequence dscBx (x represents 1-96) is shown in SEQ ID NO: 6, where "NNNNNNNNNN" in the sequence is the UMI and XXXXXXXX represents the second index, see Table 2; phasing nucleotides are added between the second index and the UMI: none for dscB1-dscB24, T for dscB25-dscB48, CA for dscB49-dscB72 and ACA for dscB73-dscB96;

dscB' sequence: TACTGCACTCAGTGACT (SEQ ID NO: 5)

dscBx sequence: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNNNNNNNNNXXXXXXXXAGTCACTGAG(SEQ ID NO：6)

(5) Digesting the RNA with RNase;

(6) A second PCR anchor was attached to the cDNA by random-primed second strand synthesis, wherein the primer used for second strand synthesis was the P3 short primer.

p3 short primer: CAGACGTGTGCTCTCTTTCCGATCTNNNGGNNNB (SEQ ID NO: 7)

(7) All cells were pooled and overloaded into one channel of the Chromium scATAC-seq chip;

(8) The cells were lysed, and a droplet-specific P5 adapter carrying the third index was added by linear amplification within the droplet, see Table 2; the linear amplification primer is shown in SEQ ID NO: 8, where XXXXXXXXXXXXXXXX represents the third index, which is the bead-specific index in each droplet;

5’-AATGATACGGCGACCACCGAGATCTACAC-XXXXXXXXXXXXXXXX-TCGTCGGCAGCGTC-3’(SEQ ID NO：8)

(9) The droplets are further divided into 16 PCR tubes for PCR purification;

(10) Dividing the purified product in each PCR tube into two parts, and amplifying the transcriptome and the open chromatin fragments with the corresponding primers; the transcriptome is amplified with SI-PCR primer B (SEQ ID NO: 9) and the P3xx primer (SEQ ID NO: 10), where XXXXXXXX in the sequence represents the fourth index of the primer used for transcriptome amplification, see the P3xx index in Table 2; the open chromatin fragments are amplified with SI-PCR primer B (SEQ ID NO: 9) and the N7xx primer (SEQ ID NO: 11), where XXXXXXXX in the sequence represents the fourth index of the primer used for amplifying open chromatin fragments, see the N7xx index in Table 1;

SI-PCR primer B: AATGATACGGCGACCACCGAGA (SEQ ID NO: 9)

P3xx primer: CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT(SEQ ID NO：10)

N7xx primer: CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTCTCGTGGGCTCGG(SEQ ID NO：11)

(11) After sequencing and barcode parsing, gene expression and chromatin accessibility profiles that share the same combination of the 4 rounds of indexes represent the paired profiles of a single cell. In principle, after 4 rounds of indexing, the barcode space is greatly expanded (96×96×100000×16 ≈ 1.47×10¹⁰), which enables Parallel-Seq to profile more than 1 million cells in one experiment with a very low collision rate.
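The barcode space quoted in step (11) is the product of the four index sizes. The sketch below reproduces that product and, as a rough illustration only, estimates how often two cells would share a full barcode combination if cells were assigned combinations uniformly at random. The uniform model is an assumption introduced here: real collisions depend on how cells are distributed across wells and droplets, so the estimate should not be read as the measured collision rate.

```python
import math

# Index sizes as given in the text: first index, second index,
# droplet bead index, and the 16 PCR tubes of the fourth index.
PARALLEL_SEQ_ROUNDS = (96, 96, 100_000, 16)

def barcode_space(round_sizes):
    """Total number of distinct barcode combinations."""
    return math.prod(round_sizes)

def uniform_collision_fraction(n_cells, n_barcodes):
    """Probability that a given cell shares its barcode with another cell.

    Assumes each cell draws a barcode combination uniformly at random:
    1 - (1 - 1/B)^(N - 1), computed in a numerically stable way.
    """
    return -math.expm1((n_cells - 1) * math.log1p(-1.0 / n_barcodes))

if __name__ == "__main__":
    space = barcode_space(PARALLEL_SEQ_ROUNDS)
    print(f"barcode space: {space:,} (about {space:.3g})")
    rate = uniform_collision_fraction(1_000_000, space)
    print(f"uniform-model collision fraction for 1e6 cells: {rate:.2e}")
```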

TABLE 1

First index	N7xx index
AACAAC	TCGCCTTA
ACCGCA	CTAGTACG
AGTTGG	TTCTGCCT
CCACGT	GCTCAGGA
CGTGTT	AGGAGTCC
GTTCTC	CATGCCTA
TGACTA	GTAGAGAG
TCAAGG	CCTCTCTG
AACGGT	AGCGTAGC
AAGCCT	CAGCCTCG
ACATGA	TGCCTCTT
ACTCTA	TCCTCTAC
AGAAGT
AGTACC
ATGCGA
CAATAG
CATCCA
CCTGGA
CGAGAC
CGCTCA
GCGTAA
GGATCG
GTGAGG
TCCTTA
TCTGCC
TTAACC
TTAGTG
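The first-index oligonucleotides can be generated by substituting each Table 1 index for the XXXXXX placeholder in SEQ ID NO: 2-4. The sketch below is illustrative: the 5' phosphate modification is recorded only as a comment, the degenerate bases V and N are kept as written, and numbering the indexes 1-27 in table order is an assumption, since the text does not state which index corresponds to which x.

```python
FIRST_INDEXES = [
    "AACAAC", "ACCGCA", "AGTTGG", "CCACGT", "CGTGTT", "GTTCTC", "TGACTA",
    "TCAAGG", "AACGGT", "AAGCCT", "ACATGA", "ACTCTA", "AGAAGT", "AGTACC",
    "ATGCGA", "CAATAG", "CATCCA", "CCTGGA", "CGAGAC", "CGCTCA", "GCGTAA",
    "GGATCG", "GTGAGG", "TCCTTA", "TCTGCC", "TTAACC", "TTAGTG",
]

PLACEHOLDER = "XXXXXX"

# All three templates carry a 5' phosphate (/5Phos/) in the text.
TEMPLATES = {
    "Tn5ME-x": "TGCAGTA" + PLACEHOLDER + "AGATGTGTATAAGAGACAG",  # SEQ ID NO: 2
    "R1BxT15VN": "TGCAGTA" + PLACEHOLDER + "T" * 15 + "VN",      # SEQ ID NO: 3
    "R1BxN6": "TGCAGTA" + PLACEHOLDER + "N" * 6,                 # SEQ ID NO: 4
}

def fill_index(template, index):
    """Replace the single XXXXXX placeholder in a template with a 6 nt index."""
    if len(index) != len(PLACEHOLDER):
        raise ValueError("index must be 6 nt long")
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one placeholder")
    return template.replace(PLACEHOLDER, index)

def build_oligo_set(indexes):
    """Return {x: {oligo name: sequence}} with x numbered from 1 in list order."""
    return {
        x: {name: fill_index(template, index) for name, template in TEMPLATES.items()}
        for x, index in enumerate(indexes, start=1)
    }

if __name__ == "__main__":
    oligos = build_oligo_set(FIRST_INDEXES)
    for name, sequence in oligos[1].items():
        print(f"{name} (x=1): /5Phos/{sequence}")
```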

TABLE 2


Example 2 Parallel-Seq Performance verification

Parallel-Seq (same procedure as in Example 1) was performed with a mixture of NIH/3T3 (mouse), HEK293T (human) and K562 (human) cells. After quality filtering, transcriptome and chromatin accessibility profiles were obtained for 2200 cells, with an average of 7014 UMIs for the scRNA-seq portion and 10103 UMIs for the scATAC-seq portion. Reads from human and mouse cells were well separated in both the transcriptome and the chromatin profiles: the transcriptome profiles were assigned to 802 mouse cells and 1398 human cells, and the chromatin profiles to 805 mouse cells and 1398 human cells, with few doublets and collision rates of 0.2% and 0.1% for the two profiles, respectively (FIG. 2).

The insert size distribution of the aggregated scATAC-seq data showed a clear nucleosome banding pattern (FIG. 3), and the TSS enrichment score of the sequencing reads was as high as 14 (FIG. 4), indicating that the scATAC-seq data were of acceptable quality.

The aggregated single-cell chromatin accessibility and transcriptome profiles generated by Parallel-Seq are highly correlated with the bulk DNase-seq (ENCFF156LGK, R=0.79) (FIG. 5) and nuclear RNA-seq (ENCFF631TDY, R=0.81) (FIG. 6) data of K562 cells in ENCODE, respectively (FIG. 7). Furthermore, the expression and chromatin accessibility profiles of individual cells clustered together by cell type and were separated from other cell types (FIGS. 8-9). Taken together, these data demonstrate the high specificity and high quality of Parallel-Seq.

The data quality of Parallel-Seq was further compared with sci-CAR, Paired-Seq, SNARE-Seq and SHARE-Seq. Parallel-Seq showed better data quality in both libraries than the most advanced method, SHARE-Seq (FIGS. 10-11), with more UMIs for ATAC fragments and RNA, more genes captured, and a higher throughput than the other methods.

Example 3 Parallel-Split-Seq and Performance verification

To make the method easier to use and to further reduce costs, Parallel-Split-Seq was developed, in which the way the third index is added in Parallel-Seq was changed: instead of adding the third index by linear amplification in droplets, an additional round of ligation on a plate adds the third index. The step of linear amplification in droplets is still included, but the third index is not added in that step, and the barcode space is 24×96×96×96 ≈ 2.12×10⁷ (FIG. 12).

In this example, a mixture of NIH/3T3 (mouse), HEK293T (human), HeLa (human), K562 (human) and THP1 (human) cells was used to perform Parallel-Split-Seq (procedure similar to Example 1) as follows:

(1) Fixing and permeabilizing the cells of each sample, and labeling the open chromatin with a barcoded Tn5 transposon carrying a transposon-specific barcode; wherein the transposon sequence Tn5ME-A is shown in SEQ ID NO: 12, and the sequence of Tn5ME-x carrying the first index (x represents 1-27) is shown in SEQ ID NO: 2, where XXXXXX in the sequence represents the first index, see Table 1;

Tn5ME-A：TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG(SEQ ID NO：12)

(2) mRNA from each sample was reverse transcribed using the barcode-matched poly(T) primer R1BxT15VN and random hexamer primer R1BxN6; the reverse transcription primers R1BxT15VN and R1BxN6 (x represents 1-27) are shown in SEQ ID NO: 3 and SEQ ID NO: 4, where XXXXXX in the sequence represents the first index, see Table 1;

(3) Cells from different samples were combined and randomly distributed into 96-well plates, and 2 rounds of ligation were performed to add the second index and the third index; each well contains the R2' sequence (SEQ ID NO: 13) when the second index is added and the R3' sequence (SEQ ID NO: 14) when the third index is added, and a well-specific linker sequence is ligated to the first strand of the transposed chromatin or cDNA; the well-specific linker sequence carrying the second index is shown as R2Bx (SEQ ID NO: 15), where XXXXXXXX in the sequence represents the second index, see Table 2; the well-specific linker sequence carrying the third index is shown as R3Bx (SEQ ID NO: 16), where XXXXXXXX in the sequence represents the third index, see Table 2;

R2' sequence: TACTGCAGCTGAACCTC (SEQ ID NO: 13)

R3' sequence: TCTCCAAAGCTGTGGAC (SEQ ID NO: 14)

R2Bx sequence: /5Phos/TTGGAGAXXXXXXXXGAGGTTCAGC(SEQ ID NO：15)

R3Bx sequence: CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNXXXXXXXXGTCCACAGCT(SEQ ID NO：16)

(4) Digesting the RNA with RNase;

(5) A second PCR anchor was attached to the cDNA by random-primed second strand synthesis, wherein the primer used for second strand synthesis was the P5 short primer.

P5 short primer: ACACCGACGCTCTTCCGATCTNNNGGNNNB (SEQ ID NO: 17)

(6) All cells were pooled together, counted, diluted to 800 cells/μL and split into PCR tubes at 2.5 μL of cells per tube;

(7) Cells were lysed in PCR tubes and amplified directly with the addition of a PCR amplification system comprising P5xx (SEQ ID NO: 18), N5xx (SEQ ID NO: 19) and P3xx (SEQ ID NO: 10), and a fourth index was added, see Table 2;

(8) Each PCR product was purified and divided into two parts, and the transcriptome and the accessible chromatin fragments were amplified with the corresponding primers; the transcriptome was amplified with the P3 end primer (SEQ ID NO: 20) and P5xx (SEQ ID NO: 18), where XXXXXXXX in the P5xx sequence is the fourth index for the amplified transcriptome, see Table 3; the open chromatin fragments were amplified with the P3 end primer (SEQ ID NO: 20) and N5xx (SEQ ID NO: 19), where XXXXXXXX in the N5xx sequence is the fourth index for the amplified open chromatin fragments, see Table 3.

P3 end sequence: CAAGCAGAAGACGGCATACGAGAT (SEQ ID NO: 20)

P5xx sequence: AATGATACGGCGACCACCGAGATCTACACXXXXXXXXACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 18)

N5xx sequence: AATGATACGGCGACCACCGAGATCTACACXXXXXXXXTCGTCGGCAGCGTC (SEQ ID NO: 19)
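As an illustration of how the XXXXXXXX slot in the linker sequences of step (3) is filled, the following minimal Python sketch substitutes an 8-nt index into R2Bx and R3Bx and checks, on the sequences as printed, that the 3' end of each splint (R2', R3') is the reverse complement of the 3' end of the corresponding linker. The function names and the 10-nt overlap length are assumptions made for this illustration and are not part of the application; the example indexes AAGACCAA and AACCTCTT are taken from the second- and third-index lists in claims 5 and 9, and the 5' phosphate modification is omitted from the strings.

```python
# Sequences copied from step (3); XXXXXXXX marks the 8-nt index slot.
R2_SPLINT = "TACTGCAGCTGAACCTC"    # R2' (SEQ ID NO: 13)
R3_SPLINT = "TCTCCAAAGCTGTGGAC"    # R3' (SEQ ID NO: 14)
R2BX = "TTGGAGAXXXXXXXXGAGGTTCAGC"  # SEQ ID NO: 15, 5' phosphate omitted
R3BX = "CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNXXXXXXXXGTCCACAGCT"  # SEQ ID NO: 16

PLACEHOLDER = "X" * 8
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def fill_index(template: str, index: str) -> str:
    """Replace the single XXXXXXXX slot in a template with an 8-nt index."""
    if len(index) != 8 or set(index) - set("ACGT"):
        raise ValueError("index must be 8 nt of A/C/G/T")
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one XXXXXXXX slot")
    return template.replace(PLACEHOLDER, index)


def splint_matches_linker(splint: str, linker: str, overlap: int = 10) -> bool:
    """True if the splint's 3' end is the reverse complement of the linker's
    3' end over `overlap` nt (overlap length is an assumption of this sketch)."""
    return splint.endswith(reverse_complement(linker[-overlap:]))


if __name__ == "__main__":
    print(fill_index(R2BX, "AAGACCAA"))
    print(fill_index(R3BX, "AACCTCTT"))
    print(splint_matches_linker(R2_SPLINT, R2BX))
    print(splint_matches_linker(R3_SPLINT, R3BX))
```

The same `fill_index` helper would apply to any template with a single XXXXXXXX slot, such as the P5xx and N5xx primers of step (8).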

Table 3

P5xx index	N5xx index
TATAGCCT	TAGATCGC
ATAGAGGC	CTCTCTAT
CCTATCCT	TATCCTCT
GGCTCTGA	AGAGTAGA
AGGCGAAG	GTAAGGAG
TAATCTTA	ACTGCATA
CAGGACGT	AAGGAGTA
GTACTGAC	CTAAGCCT
	CGTCTAAT
	TCTCTCCG
	TCGACTAG
	TTCTAGCT
	CCTAGAGT
	GCGTAAGA
	CTATTAAG
	AAGGCTAT
	GAGCCTTA
	TTATGCGA
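Because step (8) uses P5xx indexes for the transcriptome and N5xx indexes for the open chromatin fragments, the fourth index alone indicates which library a read belongs to. The following minimal Python sketch shows one way to perform that lookup with the index values read from Table 3. The function name and the returned labels are hypothetical, and the sketch assumes the index is supplied in the orientation printed in Table 3 (some sequencing workflows report this index as its reverse complement).

```python
from typing import Optional

# Fourth-index values as read from Table 3.
P5XX_INDEXES = {
    "TATAGCCT", "ATAGAGGC", "CCTATCCT", "GGCTCTGA",
    "AGGCGAAG", "TAATCTTA", "CAGGACGT", "GTACTGAC",
}
N5XX_INDEXES = {
    "TAGATCGC", "CTCTCTAT", "TATCCTCT", "AGAGTAGA", "GTAAGGAG", "ACTGCATA",
    "AAGGAGTA", "CTAAGCCT", "CGTCTAAT", "TCTCTCCG", "TCGACTAG", "TTCTAGCT",
    "CCTAGAGT", "GCGTAAGA", "CTATTAAG", "AAGGCTAT", "GAGCCTTA", "TTATGCGA",
}


def library_type(fourth_index: str) -> Optional[str]:
    """Map a fourth index to the library it labels, or None if unknown."""
    if fourth_index in P5XX_INDEXES:
        return "transcriptome"    # amplified with P5xx in step (8)
    if fourth_index in N5XX_INDEXES:
        return "open chromatin"   # amplified with N5xx in step (8)
    return None


if __name__ == "__main__":
    for idx in ("TATAGCCT", "TTATGCGA", "AAAAAAAA"):
        print(idx, library_type(idx))
```

The lookup is unambiguous only because the two index sets do not overlap; exact matching is used here for brevity, and a practical pipeline would typically also tolerate a limited number of sequencing errors.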

The results showed that Parallel-Split-Seq had good specificity, a low collision rate, and a high correlation with bulk data (see FIGS. 13-18). Moreover, Parallel-Split-Seq performed comparably to Parallel-Seq and outperformed the prior-art methods (see FIGS. 19-20).
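To give a feel for why combinatorial indexing keeps the collision rate low, the following Python sketch works through the arithmetic of step (6) (800 cells/µL × 2.5 µL per tube) and a simple theoretical estimate of barcode collisions within one PCR tube. It is not the measurement behind the figures cited above. It assumes that every cell receives one of the available barcode combinations uniformly at random, that all 96 wells are used in each ligation round, and that cells in different tubes never collide because each tube receives its own fourth index. The number of first indexes (`n_first=8`) is a hypothetical value chosen only for illustration.

```python
def cells_per_tube(cells_per_ul: float, volume_ul: float) -> int:
    """Number of cells loaded into one PCR tube."""
    return round(cells_per_ul * volume_ul)


def barcode_combinations(n_first: int, n_second: int = 96, n_third: int = 96) -> int:
    """Barcode combinations available within a single tube."""
    return n_first * n_second * n_third


def expected_collision_fraction(n_cells: int, n_barcodes: int) -> float:
    """Expected fraction of cells sharing a barcode with at least one other
    cell in the same tube, under uniform random barcode assignment."""
    if n_cells < 1 or n_barcodes < 1:
        raise ValueError("n_cells and n_barcodes must be positive")
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


if __name__ == "__main__":
    n_cells = cells_per_tube(800, 2.5)           # values from step (6)
    n_barcodes = barcode_combinations(n_first=8)  # n_first is hypothetical
    rate = expected_collision_fraction(n_cells, n_barcodes)
    print(f"{n_cells} cells per tube, {n_barcodes} barcode combinations")
    print(f"expected collision fraction: {rate:.4f}")
```

Under this model, the estimate falls as more first indexes are used or as fewer cells are loaded per tube, which is the trade-off that the dilution in step (6) controls.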

The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present invention within the scope of the technical concept of the present invention, and all the simple modifications belong to the protection scope of the present invention. In addition, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described further.

Claims

1. A method of constructing a single cell sequencing library, comprising:

d) Synthesizing a second strand of the cDNA;

e) amplifying the cDNA of the transcriptome library and the chromatin DNA with primers, respectively.

2. The method of claim 1, wherein the transposon comprises a barcode sequence and a transposase;

preferably, the barcode sequence comprises a first linker;

preferably, the barcode sequence comprises a first index.

3. The method of claim 1 or 2, wherein the reverse transcription primer comprises a second adaptor comprising poly(T) and a first index; preferably, the reverse transcription primer further comprises random hexamer primers.

4. A method according to any one of claims 1-3, wherein the first index comprises a combination of at least one, two or more of AACAAC, ACCGCA, AGTTGG, CCACGT, CGTGTT, GTTCTC, TGACTA, TCAAGG, AACGGT, AAGCCT, ACATGA, ACTCTA, AGAAGT, AGTACC, ATGCGA, CAATAG, CATCCA, CCTGGA, CGAGAC, CGCTCA, GCGTAA, GGATCG, GTGAGG, TCCTTA, TCTGCC, TTAACC or TTAGTG.

5. The method of any one of claims 1-4, wherein the first vector-specific linker comprises a second index, preferably wherein the second index comprises a combination of at least one, two, or more than three of AAGACCAA, AAGCTACG, AAGGTCAT, AATAGTGG, AATGCCTT, ACAATAGC, ACAGGATT, ACCGACCT, ACCTAGAT, ACGAGTCC, ACGGACGA, ACGTTCAA, ACTATCTG, ACTCCGAA, AGAACAGA, AGACGCTT, AGATGCGA, AGCCACTC, AGCGAAGC, AGGTAACG, AGTACATC, AGTGATTC, ATAAGAGG, ATATCACG, ATCGCCGT, ATGACGGA, ATGGAATG, ATTCCTAC, CAACGCCA, CAAGTCTG, CACACATC, CACCTTAT, CAGAACCT, CAGCCGAT, CATACTGT, CATCCACC, CATTGAGC, CCAAGCGT, CCACGACT, CCATTGTC, CCGCATGT, CCTACTCC, CCTCCTTG, CCTTAATG, CGAATATC, CGAGAGCA, CGCCTCAA, CGCGTTAC, CGGACTCT, CGGTTGTT, CGTAGCTT, CGTGCCAA, CTACCGGA, CTAGCAGT, CTCAGCCT, CTCTTCTA, CTGCTGGT, CTGTATTC, CTTCGCTC, GAAGAGTA, GACACCTA, GACGTGAG, GACTTACT, GAGGACAA, GAGTTAAG, GATCCTCG, GCAATCCG, GCAGTGTG, GCCGCTAA, GCGACCAT, GCTAAGAC, GCTGTAGG, GGAACTGG, GGACAGTT, GGATTGCT, GGTCCTAA, GTACCTGT, GTCAAGGA, GTCTGCTT, GTGCTCCA, GTGTGACC, GTTATTGG, TAATTCGG, TACCAATC, TAGACTCC, TAGTCAAC, TCACGTTG, TCAGAATG, TCCAGCTT, TCCTGCGA, TCGGTTCC, TCTTACCT, TGACATGG, TGCCTATA, TGGTGTGG, TGTACTAG.

6. The method according to any one of claims 1-5, further comprising the steps of forming droplets, lysing the cells and performing an amplification reaction in the droplets, preferably overloading the cells in the formed droplets.

7. The method of claim 6, wherein the primer used to perform the amplification reaction in the droplet comprises a third index.

8. The method according to any one of claims 1 to 5, characterized in that the method comprises ligating the DNA fragment carrying the first adaptor obtained in claim 1 and the first strand of the cDNA obtained in claim 1, respectively, to the carrier using a second carrier-specific adaptor; preferably, the second carrier-specific adaptor comprises a third index.

9. The method of claim 7 or 8, wherein the third index comprises a combination of at least one, two or more than three of AACCTCTT, AACGTCGC, AAGAATCG, AAGCGGTG, AAGGAGCT, AATACCGC, AATCTCCA, ACAACTTC, ACACGCAA, ACCACAGT, ACCGTGTA, ACCTTGCC, ACGCATAA, ACGTATGG, ACTAACCA, ACTCAGGT, ACTTGTTG, AGAAGTAC, AGAGATGA, AGATTAGG, AGCCTGGT, AGCTCTAA, AGGTGTCT, AGTCCGTT, AGTTCGCA, ATAAGCTC, ATCCATGA, ATCTAGCG, ATGCAACC, ATGTGCAG, ATTGGTAG, CAAGAAGA, CAATGGAC, CACATGCT, CACGGTAG, CAGAGGTT, CAGTATAG, CATCAAGT, CATGTTCC, CCAACAAT, CCAATTAC, CCAGTGAA, CCGATCAG, CCGGTCTT, CGACAACG, CGCCAGTA, CGCGGAAT, CGGAAGGA, CGGTGAGA, CGTAACAC, CGTCTATG, CGTTCTCG, CTACTAAG, CTAGTGCG, CTCTGACA, CTGATGAA, CTGGTACA, CTTACGAG, GAACTCAA, GAATGTTG, GACGAATT, GACTGCCA, GAGCTATT, GAGTCGGA, GATAGAAC, GATGGTCT, GCAGCACT, GCATTCAT, GCCTCTGT, GCGCAGAT, GCTCACAA, GCTTGCGT, GTAATGCA, GTATCGAG, GTCGATCT, GTGAGCGT, GTGGATAG, GTTAGCCA, TAAGGTGG, TACACCGG, TACTCGTC, TAGCTGAG, TCAACAGG, TCACTCAC, TCATAGAC, TCCGTACA, TCGGAGTA, TCGTCGGT, TGAACGCG, TGAGTCTT, TGCGACTG, TGGTTATC, TGTGTAAG, TTAGGAAC, TTCAGTGG, TTCTATCC.

10. The method of any one of claims 1-9, wherein the primers in the amplification reaction performed in step e) comprise a fourth index;

preferably, the fourth index comprises a combination of at least one, two or more than three of the N7xx indexes;

preferably, the fourth index comprises a combination of at least one, two or more than three of the P5xx indexes;

preferably, the fourth index comprises a combination of at least one, two or more than three of the N5xx indexes;

wherein, the P3xx index, the N7xx index, the P5xx index or the N5xx index are as follows:

11. The method of any one of claims 2-10, wherein the carrier comprises a well, tube or plate.

12. The method of any one of claims 1-11, further comprising adding an RNase.

13. The method of any one of claims 1-12, further comprising obtaining cells, and fixing and permeabilizing the cells.

14. A method of constructing a transcriptome library, said method comprising: reverse transcribing mRNA with the addition of a reverse transcription primer to obtain a first strand of cDNA carrying a second adaptor; placing the cells on a carrier, and ligating the obtained first strand of the cDNA to the carrier using a first carrier-specific adaptor; synthesizing a second strand of the cDNA; and purifying the transcriptome cDNA and amplifying it with primers.

15. A method of constructing a chromatin DNA library, said method comprising: cleaving open chromatin using a transposon to obtain a DNA fragment carrying a first adaptor; placing the cells on a carrier, and ligating the obtained DNA fragment carrying the first adaptor to the carrier using a first carrier-specific adaptor; and purifying the DNA and amplifying the chromatin DNA with primers.

16. A nucleic acid library obtained by the method of any one of claims 1-15.

17. A sequencing method comprising constructing the nucleic acid library of claim 16.

18. Use of a nucleic acid library according to claim 16, wherein said use comprises tumor target screening, disease monitoring or preimplantation embryo diagnosis.