WO2024121354A1 - Duplex sequencing with covalently closed DNA ends - Google Patents

Duplex sequencing with covalently closed DNA ends

Info

Publication number
WO2024121354A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
acid molecule
sequence
double
adapter
Prior art date
Application number
PCT/EP2023/084817
Other languages
English (en)
Inventor
René Cornelis Josephus Hogers
Theodorus Frank Maria ROELOFS
Original Assignee
Keygene N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Keygene N.V.
Publication of WO2024121354A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing

Definitions

  • the present invention is in the field of molecular biology, more in particular in the field of genomics and genetic research. Particularly, the invention is in the field of (deep-)sequencing. Disclosed are new methods and means for improving the accuracy of deep-sequencing, for library preparation, and for complexity reduction of nucleic acid samples.
  • a significant component of genetic research is the sequence analysis of known and unknown nucleotide sequences.
  • Current sequencing methods are however hampered by their ability to accurately determine the sequence of a nucleic acid molecule. For example, for genotyping and SNP calling it is essential to be able to distinguish between a genuine nucleotide (variant) and a sequencing artifact, especially in case of a low read coverage.
  • Nanopore sequencing technology platform is able to produce reads of hundreds of kilobases in size with the potential to go even beyond megabase-sized read lengths.
  • the platform relies on passing single strands of nucleic acid molecules through a small protein channel (nanopore) that is embedded in an electrically resistant membrane. Single molecules entering the nanopore cause characteristic disruptions in the current. By measuring this disruption, DNA or RNA molecules can be characterized.
  • Embodiment 1 A method for determining a sequence of interest in a double-stranded nucleic acid molecule, wherein the method comprises the steps of: a) providing a sample comprising the double-stranded nucleic acid molecule; b) adding a protelomerase recognition site to an end of the double-stranded nucleic acid molecule; c) generating a double-stranded nucleic acid molecule having one open end and one closed end comprising contacting the nucleic acid molecule obtained in step b) with a protelomerase that cleaves and covalently closes the protelomerase recognition site; d) sequencing both strands of at least part of the double-stranded nucleic acid molecule in a single sequencing reaction to generate a duplex read; and e) generating a consensus sequence from the duplex read to determine the sequence in the double-stranded nucleic acid molecule.
  • Embodiment 2 The method according to embodiment 1, wherein the protelomerase is TelN and wherein the protelomerase recognition site is a TelN protelomerase recognition site.
  • Embodiment 3 The method according to embodiment 1 or 2, wherein the nucleic acid molecule is provided by fragmentation of a longer nucleic acid molecule, wherein the longer nucleic acid molecule is preferably a genomic nucleic acid molecule.
  • Embodiment 4 The method according to embodiment 3, wherein the fragmentation is performed by restriction endonuclease and/or site-directed endonuclease digestion.
  • Embodiment 5 The method according to embodiment 4, wherein the restriction enzyme and/or site-directed endonuclease creates a single-stranded overhang to the double-stranded nucleic acid molecule.
  • Embodiment 6 The method according to any one of embodiments 1-5, wherein in step b) the protelomerase recognition site is added by ligating an adapter or amplification with a primer.
  • Embodiment 7 The method according to embodiment 6, wherein the adapter or primer comprises an identifier sequence and wherein the identifier sequence remains covalently attached to the nucleic acid molecule after step c).
  • Embodiment 8 The method according to embodiment 6 or 7, wherein the nucleic acid molecule provided in step a) comprises a single-stranded overhang and wherein at least one of: i) the adapter comprises an overhang that can be ligated to the overhang of the nucleic acid molecule; and ii) the primer is capable of annealing to at least the overhang of the nucleic acid molecule.
  • Embodiment 9 The method according to any one of embodiments 6-8, wherein the adapter comprising the protelomerase recognition site is ligated by tagmentation.
  • Embodiment 10 The method according to any one of embodiments 1-9, wherein step c) comprises the (sub-)steps of: c1) closing both ends of the double-stranded nucleic acid molecule; c2) optionally exposing the sample to an exonuclease; and c3) cleaving the closed double-stranded nucleic acid molecule, thereby generating a double-stranded nucleic acid molecule having one open end and one closed end.
  • Embodiment 11 The method according to embodiment 10, wherein the closed double-stranded nucleic acid molecule in step c3) is cleaved by a site-directed endonuclease or a restriction endonuclease, wherein preferably the site-directed endonuclease is an RNA-guided CRISPR nuclease or a TALENs.
  • Embodiment 12 A method according to any one of embodiments 1-11, wherein the method is performed for a plurality of samples, and wherein preferably the plurality of samples are pooled prior to step d), optionally prior to step c).
  • Embodiment 13 A method according to any one of embodiments 1-12, wherein a sequencing adapter is ligated to the open end of the double-stranded nucleic acid after step c) and prior to step d).
  • Embodiment 14 A method according to any one of embodiments 1-13, wherein the sequencing of step d) is nanopore sequencing.
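  • The sub-steps c1) to c3) of Embodiment 10 can be pictured with a toy model. The sketch below is only an illustration under strong simplifying assumptions: the `Molecule` class, its boolean flags and the all-or-nothing exonuclease behaviour are hypothetical and are not taken from the application. It tracks which ends of a molecule are closed, discards anything with an open end, and then cuts the survivors once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Toy double-stranded molecule; all fields are illustrative assumptions."""
    name: str
    left_closed: bool = False
    right_closed: bool = False
    left_site: bool = False    # protelomerase recognition site at the left end
    right_site: bool = False   # protelomerase recognition site at the right end
    cut_site: bool = False     # hypothetical internal site for the step c3) endonuclease


def close_ends(m: Molecule) -> Molecule:
    # c1) the protelomerase closes only ends that carry its recognition site
    return Molecule(m.name,
                    m.left_closed or m.left_site,
                    m.right_closed or m.right_site,
                    False, False, m.cut_site)


def exonuclease(molecules: list) -> list:
    # c2) simplification: any molecule with an open end is fully degraded
    return [m for m in molecules if m.left_closed and m.right_closed]


def cleave(m: Molecule) -> list:
    # c3) one cut in a fully closed molecule gives two fragments,
    # each with one closed end and one newly opened end
    if not m.cut_site:
        return [m]
    return [Molecule(m.name + ":left", left_closed=m.left_closed),
            Molecule(m.name + ":right", right_closed=m.right_closed)]


def step_c(molecules: list) -> list:
    closed = [close_ends(m) for m in molecules]
    protected = exonuclease(closed)
    return [fragment for m in protected for fragment in cleave(m)]


sample = [
    Molecule("target", left_site=True, right_site=True, cut_site=True),
    Molecule("background", left_site=True),  # only one end can be closed
]
for fragment in step_c(sample):
    print(fragment.name, fragment.left_closed, fragment.right_closed)
```

  • In this model the background molecule is lost at c2) because one of its ends stays open, which is the enrichment effect the optional exonuclease step is meant to provide.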
  • the term “about” is used to describe and account for small variations.
  • the term can refer to less than or equal to ±10%, such as less than or equal to ±5%, less than or equal to ±4%, less than or equal to ±3%, less than or equal to ±2%, less than or equal to ±1%, less than or equal to ±0.5%, less than or equal to ±0.1%, or less than or equal to ±0.05%.
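  • As a small arithmetic illustration of this tolerance reading of "about", the helper below checks whether a value lies within a relative tolerance of a nominal value. The function name `within_about` and the default of 10% are assumptions chosen for the example; the text lists several possible tolerances.

```python
def within_about(value: float, nominal: float, tolerance: float = 0.10) -> bool:
    """Return True if `value` deviates from `nominal` by at most `tolerance`
    (a fraction, e.g. 0.10 for 10%) of the nominal value."""
    return abs(value - nominal) <= tolerance * abs(nominal)


print(within_about(105, 100))        # checked against a 10% tolerance
print(within_about(105, 100, 0.01))  # checked against a 1% tolerance
```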
  • amounts, ratios, and other numerical values are sometimes presented herein in a range format.
  • range format is used for convenience and brevity and should be understood flexibly to include numerical values explicitly specified as limits of a range, but also to include all individual numerical values or sub-ranges encompassed within that range as if each numerical value and sub-range is explicitly specified.
  • a ratio in the range of about 1 to about 200 should be understood to include the explicitly recited limits of about 1 and about 200, but also to include individual ratios such as about 2, about 3, and about 4, and subranges such as about 10 to about 50, about 20 to about 100, and so forth.
  • the term “adapter” is a single-stranded, double-stranded, partly double-stranded, Y-shaped or hairpin nucleic acid molecule that can be attached, preferably ligated, to the end of other nucleic acids, e.g., to one or both strands of a double-stranded DNA molecule, and preferably has a limited length, e.g., about 10 to about 200, or about 10 to about 100 bases, or about 10 to about 80, or about 10 to about 50, or about 10 to about 30 base pairs in length, and is preferably chemically synthesized.
  • the double-stranded structure of the adapter may be formed by two distinct oligonucleotide molecules that are at least partly base paired with one another, or by a hairpin structure of a single oligonucleotide strand.
  • the adapter that will be attached to the nucleic acid has two open ends, i.e. is formed by two distinct oligonucleotides.
  • At least one end of the adapter is designed to be compatible with the end of a nucleic acid.
  • the compatible end can be a blunt end or a single-stranded overhang.
  • the attachable end of an adapter may be designed to be compatible with, and optionally ligatable to, overhangs made by cleavage by a restriction enzyme and/or site-directed nuclease, or may be designed to be compatible with an overhang created by a non-template elongation reaction (e.g., 3’-A addition).
  • Amplification used in reference to a nucleic acid or nucleic acid reactions, refers to in vitro methods of making copies of a particular nucleic acid, such as a target nucleic acid, or a tagged nucleic acid. Numerous methods of amplifying nucleic acids are known in the art, and amplification reactions include polymerase chain reactions, ligase chain reactions, strand displacement amplification reactions, rolling circle amplification reactions, transcription-mediated amplification methods such as NASBA (e.g., U.S. Pat. No. 5,409,818), loop mediated amplification methods (e.g., “LAMP” amplification using loop-forming sequences, e.g., as described in U.S. Pat. No.
  • the nucleic acid that is amplified can be DNA comprising, consisting of, or derived from DNA or RNA or a mixture of DNA and RNA, including modified DNA and/or RNA.
  • the products resulting from amplification of a nucleic acid molecule or molecules (i.e., “amplification products”), whether the starting nucleic acid is DNA, RNA or both, can be either DNA or RNA, or a mixture of both DNA and RNA nucleosides or nucleotides, or they can comprise modified DNA or RNA nucleosides or nucleotides.
  • a “copy” can be, but is not limited to, a sequence having full sequence complementarity or full sequence identity to a particular sequence. Alternatively, a copy does not necessarily have perfect sequence complementarity or identity to this particular sequence, e.g. a certain degree of sequence variation is allowed. For example, copies can include nucleotide analogs such as deoxyinosine or deoxyuridine, intentional sequence alterations (such as sequence alterations introduced through a primer comprising a sequence that is hybridizable, but not complementary, to a particular sequence), and/or sequence errors that occur during amplification.
  • complementarity is herein defined as the sequence identity of a sequence to a fully complementary strand (e.g. the second, or reverse, strand).
  • a sequence that is 100% complementary (or fully complementary) is herein understood as having 100% sequence identity with the complementary strand and e.g. a sequence that is 80% complementary is herein understood as having 80% sequence identity to the (fully) complementary strand.
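  • The definition above can be made concrete with a short sketch. It assumes two DNA sequences written 5'→3', equal in length and compared without gaps; the function names are illustrative. Complementarity of `seq` to `other` is computed as the percentage identity between `seq` and the fully complementary (reverse-complement) strand of `other`.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Fully complementary strand of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def percent_complementarity(seq: str, other: str) -> float:
    """Percentage identity of `seq` to the fully complementary strand of `other`.

    Assumption for this sketch: equal lengths and an ungapped,
    position-by-position comparison.
    """
    target = reverse_complement(other)
    if len(seq) != len(target):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(seq.upper(), target))
    return 100.0 * matches / len(seq)


print(percent_complementarity("ACGTA", "TACGT"))
print(percent_complementarity("ACGTT", "TACGT"))
```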
  • construct refers to a man-made nucleic acid molecule resulting from the use of recombinant DNA technology and which can be used to deliver exogenous DNA into a host cell, often with the purpose of expression in the host cell of a DNA region comprised on the construct.
  • the vector backbone of a construct may for example be a plasmid into which a (chimeric) gene is integrated or, if a suitable transcription regulatory sequence is already present (for example a (inducible) promoter), only a desired nucleotide sequence (e.g., a coding sequence) is integrated downstream of the transcription regulatory sequence.
  • Vectors may comprise further genetic elements to facilitate their use in molecular cloning, such as e.g., selectable markers, multiple cloning sites and the like.
  • “double-stranded” and “duplex” as used herein describe two complementary polynucleotides that are base-paired, i.e., hybridized together.
  • Complementary nucleotide strands are also known in the art as reverse-complement.
  • the two complementary strands may also be described herein as a forward and reverse strand, or a first and a second strand.
  • an effective amount of an exonuclease refers to an amount of a biologically active agent that is sufficient to elicit a desired biological effect.
  • an effective amount of an exonuclease may refer to the amount of the exonuclease that is sufficient to induce cleavage of an unprotected or “open-ended” nucleic acid.
  • the effective amount of an agent may vary depending on various factors such as the agent being used, the conditions wherein the agent is used, and the desired biological effect, e.g. degree of nuclease cleavage to be detected.
  • “Expression” refers to the process wherein a DNA region, which is operably linked to appropriate regulatory regions, particularly a promoter, is transcribed into an RNA, which in turn can be translated into a protein or peptide.
  • a “guide sequence” is to be understood herein as a sequence that directs an RNA or DNA guided endonuclease to a specific site in an RNA or DNA molecule.
  • guide sequence is further to be understood herein as the section of the sgRNA or crRNA, which is required for targeting a gRNA-CAS complex to a specific site in a duplex DNA.
  • a gRNA-CAS complex is to be understood herein as a CAS protein, also named a CRISPR-endonuclease or CRISPR-nuclease, which is complexed or hybridized to a guide RNA, wherein the guide RNA may be a crRNA, a combination of a crRNA and a tracrRNA, or a sgRNA.
  • sequence identity and “sequence similarity” can be determined by alignment of two peptide or two nucleotide sequences using global or local alignment algorithms, depending on the length of the two sequences. Sequences of similar lengths are preferably aligned using a global alignment algorithm (e.g. Needleman Wunsch) which aligns the sequences optimally over the entire length, while sequences of substantially different lengths are preferably aligned using a local alignment algorithm (e.g. Smith Waterman).
  • Sequences may then be referred to as “substantially identical” or “essentially similar” when they (when optimally aligned by for example the programs GAP or BESTFIT using default parameters) share at least a certain minimal percentage of sequence identity (as defined below).
  • GAP uses the Needleman and Wunsch global alignment algorithm to align two sequences over their entire length (full length), maximizing the number of matches and minimizing the number of gaps. A global alignment is suitably used to determine sequence identity when the two sequences have similar lengths.
  • For nucleotides the default scoring matrix used is nwsgapdna and for proteins the default scoring matrix is Blosum62 (Henikoff & Henikoff, 1992, PNAS 89, 915-919). Sequence alignments and scores for percentage sequence identity may be determined using computer programs, such as the GCG Wisconsin Package, Version 10.3, available from Accelrys Inc., 9685 Scranton Road, San Diego, CA 92121-3752 USA, or using open source software, such as the program “needle” (using the global Needleman Wunsch algorithm) or “water” (using the local Smith Waterman algorithm) in EmbossWIN version 2.10.0, using the same parameters as for GAP above, or using the default settings (both for ‘needle’ and for ‘water’ and both for protein and for DNA alignments, the default gap opening penalty is 10.0 and the default gap extension penalty is 0.5; default scoring matrices are Blosum62 for proteins and DNAFull for DNA). When sequences have substantially different overall lengths, local alignments, such as
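  • To illustrate how a global alignment yields a percentage sequence identity, the following is a minimal Needleman-Wunsch sketch. It is not GAP or EMBOSS `needle`: it assumes a simple match/mismatch score and a linear gap penalty (the programs named above use scoring matrices and separate gap opening and extension penalties), and the parameter values are arbitrary choices for the example. Identity is reported here as matches divided by alignment length.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns the two aligned strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    x, y = needleman_wunsch(a, b)
    matches = sum(p == q and p != "-" for p, q in zip(x, y))
    return 100.0 * matches / len(x)


print(needleman_wunsch("ACGT", "AGT"))
print(percent_identity("ACGT", "AGT"))
```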
  • nucleic acid and protein sequences of the present invention can further be used as a “query sequence” to perform a search against public databases to, for example, identify other family members or related sequences.
  • search can be performed using the BLASTn and BLASTx programs (version 2.0) of Altschul, et al. (1990) J. Mol. Biol. 215:403-10.
  • Gapped BLAST can be utilized as described in Altschul et al., (1997) Nucleic Acids Res. 25(17): 3389-3402.
  • the default parameters of the respective programs e.g., BLASTx and BLASTn
  • nucleotide includes, but is not limited to, naturally-occurring nucleotides, including guanine, cytosine, adenine and thymine (G, C, A and T, respectively).
  • nucleotide is further intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles.
  • nucleotide includes those moieties that contain hapten or fluorescent labels and may contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
  • nucleic acid refers to any length, e.g., greater than about 2 bases, greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be produced enzymatically or synthetically (e.g., PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein).
  • the nucleic acid may hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions.
  • nucleic acids and polynucleotides may be isolated (and optionally subsequently fragmented) from cells, tissues and/or bodily fluids.
  • the nucleic acid can be e.g. genomic DNA (gDNA), mitochondrial, cell free DNA (cfDNA), DNA from a library and/or RNA from a library.
  • nucleic acid sample or “sample comprising a nucleic acid” as used herein denotes any sample containing a nucleic acid, wherein a sample relates to a material or mixture of materials, typically, although not necessarily, in liquid form, containing one or more double-stranded nucleic acid molecules. At least one of the nucleic acid molecules may comprise a sequence of interest.
  • the nucleic acid sample used as starting material in the method of the invention can be from any source, e.g., a whole genome, a collection of chromosomes, a single chromosome, one or more regions from one or more chromosomes, a transcriptome or a selection of transcribed genes, and may be purified directly from the biological source or from a laboratory source, e.g., a processed or chemically modified nucleic acid.
  • the nucleic acid samples can be obtained from the same individual, which can be a human or other species (e.g., plant, bacteria, fungi, algae, archaea, etc.), or from different individuals of the same species, or different individuals of different species.
  • the nucleic acid samples may be from a cell, tissue, biopsy, bodily fluid, genome DNA library, cDNA library and/or a RNA library.
  • sequence of interest includes, but is not limited to, any genetic sequence preferably present within a cell, such as, for example (part of) a chromosome, (part of) a gene, or a non-coding sequence within or adjacent to a gene.
  • the sequence of interest may be present in a chromosome, an episome, a transcript, an organellar genome such as mitochondrial or chloroplast genome or genetic material that can exist independently to the main body of genetic material such as an infecting viral genome, plasmids, episomes, transposons for example.
  • a sequence of interest may be within the coding sequence of a gene or within a transcribed non-coding sequence such as, for example, within the 5’ untranslated region, the 3’ untranslated region or intron.
  • Said nucleic acid sequence of interest may be present in a double or a single strand nucleic acid.
  • the sequence of interest can be, but is not limited to, a sequence having or suspected of having, a polymorphism, e.g. a SNP.
  • a sequence of interest can be a sequence unknown in the art, e.g. de novo sequencing.
  • oligonucleotide denotes a single-stranded multimer of nucleotides, preferably of about 2 to 200 nucleotides, or up to 500 nucleotides in length. Oligonucleotides may be synthetic or may be made enzymatically, and, in some embodiments, are about 10 to 50 nucleotides in length. Oligonucleotides may contain ribonucleotide monomers (i.e., may be oligoribonucleotides) or deoxyribonucleotide monomers.
  • An oligonucleotide may be about 10 to 20, 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70, 70 to 80, 80 to 100, 100 to 150, 150 to 200, or about 200 to 250 nucleotides in length, for example.
  • Plant refers to either the whole plant or to parts of a plant, such as tissue or organs (e.g. pollen, seeds, gametes, roots, leaves, flowers, flower buds, anthers, fruit, etc.) obtainable from the plant, as well as derivatives of any of these and progeny derived from such a plant by selfing or crossing.
  • Non-limiting examples of plants include crop plants and cultivated plants, such as African eggplant, alliums, artichoke, asparagus, barley, beet, bell pepper, bitter gourd, bladder cherry, bottle gourd, cabbage, canola, carrot, cassava, cauliflower, celery, chicory, common bean, corn salad, cotton, cucumber, eggplant, endive, fennel, gherkin, grape, hot pepper, lettuce, maize, melon, oilseed rape, okra, parsley, parsnip, pepino, pepper, potato, pumpkin, radish, rice, ridge gourd, rocket, rye, snake gourd, sorghum, spinach, sponge gourd, squash, sugar beet, sugar cane, sunflower, tomatillo, tomato, tomato rootstock, vegetable Brassica, watermelon, wax gourd, wheat and zucchini.
  • Plant cell(s) include protoplasts, gametes, suspension cultures, microspores, pollen grains, etc., either in isolation or within a tissue, organ or organism.
  • the plant cell can e.g. be part of a multicellular structure, such as a callus, meristem, plant organ or an explant.
  • Primer refers to a single stranded synthetic nucleotide molecule which can prime the synthesis of DNA.
  • a DNA polymerase cannot synthesize DNA de novo without a primer: it can only extend an existing DNA strand in a reaction wherein the complementary strand is used as a template to direct the order of nucleotides to be assembled. From the 3’-end of a primer hybridized to the complementary DNA strand, nucleotides are incorporated using the complementary strand as a template.
  • Primers can be amplification primers used in an amplification reaction including, but not limited to, a polymerase chain reaction (PCR), or sequence primers used to sequence DNA.
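  • Template-directed extension from the 3'-end of a primer can be sketched in a few lines. This is a string-level illustration only, assuming perfect annealing, a DNA template written 5'→3' and extension to the end of the template; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def extend_primer(template: str, primer: str) -> str:
    """Return the extension product (5'->3') of `primer` annealed to `template`.

    The strand being synthesized is the reverse complement of the template,
    so the primer must occur in that reverse complement; synthesis then
    continues from the primer's 3'-end until the template runs out.
    """
    new_strand = reverse_complement(template)
    start = new_strand.find(primer.upper())
    if start < 0:
        raise ValueError("primer does not anneal to this template")
    return new_strand[start:]


print(extend_primer("AAACCCGGGT", "CCGG"))
```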
  • the “protospacer sequence” is the sequence that is recognized or hybridizable to a guide sequence within a guide RNA, more specifically the crRNA or, in case of a sgRNA, the crRNA part of the sgRNA.
  • an “endonuclease” is an enzyme that hydrolyses at least one strand of a duplex DNA or a strand of an RNA molecule, upon binding to its target or recognition site.
  • An endonuclease is to be understood herein as a site-directed endonuclease and the terms “endonuclease” and “nuclease” are used interchangeably herein.
  • a restriction endonuclease is to be understood herein as an endonuclease that hydrolyses both strands of the duplex at the same time to introduce a double strand break in the DNA.
  • a “nicking” endonuclease is an endonuclease that hydrolyses only one strand of the duplex to produce DNA molecules that are “nicked” rather than cleaved.
  • exonuclease is defined herein as any enzyme that cleaves one or more nucleotides from the (open) end (exo) of a polynucleotide.
  • Reducing complexity or “complexity reduction” is to be understood herein as the reduction of a complex nucleic acid sample, such as samples derived from genomic DNA, cfDNA derived from liquid biopsies, isolated RNA samples and the like. Reduction of complexity results in the enrichment of one or more specific nucleic acids, preferably comprising a sequence of interest, comprised within the complex starting material and/or the generation of a subset of the sample, wherein the subset comprises or consists of one or more specific nucleic acids, preferably comprising a sequence of interest, comprised within the complex starting material, while nonspecific nucleic acids, preferably not comprising a sequence of interest, are reduced in amount by at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91 %, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% as compared to the amount of non-specific nucleic acids in the starting material, i.e. before complexity reduction.
  • complexity reduction is in general performed prior to further analysis or method steps, such as amplification, barcoding, sequencing, determining epigenetic variation etc.
  • complexity reduction is reproducible complexity reduction, which means that when the same sample is reduced in complexity using the same method, the same, or at least comparable, subset is obtained, as opposed to random complexity reduction.
  • complexity reduction methods include for example AFLP® (Keygene N.V., the Netherlands; see e.g., EP 0 534 858), Arbitrarily Primed PCR amplification, capture-probe hybridization, the methods described by Dong (see e.g., WO 03/012118, WO 00/24939) and indexed linking (Unrau P. and Deugau K.V. (1994) Gene 145:163-169), the methods described in WO2006/137733; WO2007/037678; WO2007/073165; WO2007/073171, US 2005/260628, WO 03/010328, US 2004/10153, genome portioning (see e.g.
  • Massively Parallel Signature Sequencing (MPSS; see e.g. Brenner et al., 2000, Nature Biotechnology, vol. 18:630-634 and Brenner et al., 2000, PNAS, vol. 97(4):1665-1670), self-subtracted cDNA libraries (Laveder et al., 2002, Nucleic Acids Research, vol. 30(9):e38), Real-Time Multiplex Ligation-dependent Probe Amplification (RT-MLPA; see e.g. Eldering et al., 2003, vol. 31(23):e153), High Coverage Expression Profiling (HiCEP; see e.g. Fukumura et al.
  • Sequence or “Nucleotide sequence”: This refers to the order of nucleotides of, or within a nucleic acid. In other words, any order of nucleotides in a nucleic acid may be referred to as a sequence or nucleic acid sequence.
  • the target sequence is an order of nucleotides comprised in a single strand of a DNA duplex.
  • next-generation sequencing refers to a method by which the identity of at least 10 consecutive nucleotides (e.g., the identity of at least 20, at least 50, at least 100 or at least 200 or more consecutive nucleotides) of a polynucleotide are obtained.
  • next-generation sequencing refers to the so-called parallelized sequencing-by-synthesis or sequencing-by-ligation platforms, e.g., such as currently employed by Illumina, Life Technologies, PacBio and Roche etc.
  • Next-generation sequencing methods may also include nanopore sequencing methods, such as those commercialized by Oxford Nanopore Technologies (ONT), or electronic-detection based methods such as Ion Torrent technology commercialized by Life Technologies.
  • the next-generation sequencing method is a nanopore sequencing method, preferably a nanopore selective sequencing method.
  • Nanopore selective sequencing is to be understood herein as selective sequencing of single molecules in real time using nanopore sequencing technology such as from Oxford Nanopore or Ontera, and mapping streaming nanopore current signals or base calls to a reference sequence in order to reject non-target sequences.
  • the sequencer is steered to either pursue sequencing of a nucleic acid, or to quit and remove the nucleic acid from the sequencing pore by reversing the polarity of the voltage across the specific pore for a certain short period of time sufficient to eject the non-target molecule and make the nanopore available for a new sequencing read.
  • Nanopore selective sequencing methods are described in Payne et al., 2020 (Nanopore adaptive sequencing for mixed samples, whole exome capture and targeted panels, February 3, 2020; DOI: 10.1101/2020.02.03.926956) and Kovaka et al. 2020 (Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED, February 3, 2020; doi: 10.1101/2020.02.03.931923), which are incorporated herein by reference.
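  • The accept-or-eject decision described above can be sketched as follows. This is a deliberately simplified stand-in and not the approach of the cited tools: it assumes base calls are already available and uses exact k-mer lookup against the target sequences instead of real-time mapping of signal or reads. The names `build_kmer_index` and `decide`, and the parameters `k` and `min_hits`, are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_kmer_index(targets: list, k: int) -> set:
    """Collect every k-mer of the target sequences, in both orientations."""
    index = set()
    for seq in targets:
        for strand in (seq.upper(), reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index


def decide(prefix: str, index: set, k: int, min_hits: int = 3) -> str:
    """Decide what to do with a read from the first bases called so far.

    Returns "wait" when too little sequence is available, "sequence" when
    enough k-mers match a target, and "eject" otherwise (the point at which
    the pore voltage would be reversed to free the nanopore).
    """
    prefix = prefix.upper()
    if len(prefix) < k:
        return "wait"
    hits = sum(prefix[i:i + k] in index for i in range(len(prefix) - k + 1))
    return "sequence" if hits >= min_hits else "eject"


target = "ACGTTGCAAGGCTTAACCGGTATCGATCG"  # made-up target region
index = build_kmer_index([target], k=8)
print(decide(target[5:20], index, k=8))
print(decide("TTTTTTTTTTTTTTTT", index, k=8))
```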
  • a “double-stranded nucleic acid molecule” in the context of the invention may be a small or longer stretch, or selected portion of a nucleic acid.
  • the double-stranded nucleic acid molecule may be comprised within a larger nucleic acid molecule, e.g. within a larger nucleic acid molecule present in a sample to be analysed.
  • the double-stranded nucleic acid molecule comprises a sequence of interest.
  • the sequence of interest may be any sequence within a nucleic acid sample, e.g., a gene, gene complex, locus, pseudogene, regulatory region, highly repetitive region, polymorphic region, or portion thereof.
  • the sequence of interest may also be a region comprising genetic or epigenetic variations indicative for a phenotype or disease.
  • a sequence of interest is preferably the object of a further analysis or action, such as, but not limited to copying, amplification, sequencing and/or other procedure for nucleic acid interrogation.
  • the sequence of interest is a small or longer contiguous stretch of nucleotides (i.e.
  • duplex DNA is genomic DNA (gDNA) and/or cell free DNA (cfDNA).
  • a “target sequence” is defined herein as a sequence present in the double-stranded nucleic acid molecule as defined herein, which sequence is recognized by at least one of a nuclease and nickase as defined herein.
  • a plurality or “set” of nucleic acid molecules used in the method of the invention comprise one or more sequences of interest that are selected to be enriched.
  • such set consists of structurally or functionally related nucleic acid molecules.
  • a nucleic acid molecule in the context of the invention can comprise both natural and non-natural, artificial, or non-canonical nucleotides including, but not limited to, DNA, RNA, BNA (bridged nucleic acid), LNA (locked nucleic acid), PNA (peptide nucleic acid), morpholino nucleic acid, glycol nucleic acid, threose nucleic acid, epigenetically modified nucleotide such as methylated DNA, and mimetics and combinations thereof.
  • the inventors discovered a method to increase the sequencing read accuracy, in particular to increase the sequencing read accuracy of long read sequencing.
  • a method for determining a sequence in a double-stranded nucleic acid molecule preferably of determining a sequence of interest in a double-stranded nucleic acid molecule.
  • the method comprises the steps of: a) providing a sample comprising the double-stranded nucleic acid molecule; b) adding a protelomerase recognition site to an end of the double-stranded nucleic acid molecule; c) generating a double-stranded nucleic acid molecule having one open end and one closed end comprising contacting the nucleic acid molecule obtained in step b) with a protelomerase that cleaves and covalently closes the protelomerase recognition site; d) sequencing both strands of the double-stranded nucleic acid molecule in a single sequencing reaction to generate a duplex read; and e) generating a consensus sequence from the duplex read to determine the sequence in the double-stranded nucleic acid molecule.
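Steps d) and e) can be illustrated with a minimal sketch. It assumes that a molecule with one closed end is read as the top strand, followed by the sequence of the closed end, followed by the bottom strand (which appears as the reverse complement of the top strand). The linker sequence, the function names and the rule of calling `N` at disagreeing positions are illustrative assumptions, not taken from the application; real duplex callers would also handle insertions, deletions and base qualities.

```python
# Hypothetical sketch: build a consensus from a single duplex read.
# Assumes substitution-only errors and a known closed-end (linker) sequence.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_duplex_read(read: str, linker: str) -> tuple[str, str]:
    """Split a duplex read at the linker and orient both passes as the top strand."""
    idx = read.find(linker)
    if idx < 0:
        raise ValueError("linker sequence not found in read")
    first_pass = read[:idx]
    second_pass = read[idx + len(linker):]
    return first_pass, reverse_complement(second_pass)


def duplex_consensus(read: str, linker: str) -> str:
    """Keep bases on which both strands agree; mark disagreements as N."""
    top, bottom_as_top = split_duplex_read(read, linker)
    if len(top) != len(bottom_as_top):
        raise ValueError("this sketch only handles passes of equal length")
    return "".join(a if a == b else "N" for a, b in zip(top, bottom_as_top))
```

Calling `duplex_consensus(top + linker + reverse_complement(top), linker)` returns `top` unchanged, while a substitution in either pass shows up as an `N` at the affected position.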
  • the method as defined herein may also be considered as e.g. at least one of: a method for genotyping; a method for haplotyping; and a method for analyzing a (genomic) sequence.
  • a sample comprising the double-stranded nucleic acid molecule is provided.
  • the double-stranded nucleic acid molecule preferably comprises a sequence of interest.
  • the sample comprising the double-stranded nucleic acid molecule may be from any source, e.g. of an individual such as a human, animal, plant or microorganism, and the double stranded nucleic acid may be of any kind, e.g.
  • the nucleic acid molecule present in the sample that is used as starting material for the method of the invention is any one of DNA, such as genomic DNA, chromosomal DNA, organellar DNA, nuclear DNA, mitochondrial DNA, artificial chromosomes, plasmid DNA, episomal DNA, cDNA and RNA.
  • the DNA is chromosomal DNA, preferably endogenous to the cell.
  • the DNA may also be cDNA reverse transcribed from RNA, wherein said RNA may be from any source, e.g. of an individual such as a human, animal, plant or microorganism.
  • the sample is a plant sample.
  • the nucleic acid molecule is preferably a long nucleic acid molecule, provided e.g. by cell lysis and optionally lysis of an organelle.
  • the nucleic acid molecule for use in the method of the invention may have a size of at least about 50 kb, 100 kb, 150 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 800 kb, 900 kb or at least about 1000 kb (1 Mb).
  • the nucleic acid for use in the invention may be a high molecular weight (HMW) nucleic acid or ultra-high molecular weight (uHMW) nucleic acid.
  • HMW nucleic acids may have a length of at least 10 kb.
  • uHMW nucleic acids may have a length of at least 1 Mb.
  • the nucleic acid molecules used in the method of the invention may have a size of at least 1.1 Mb, 1.3 Mb, 1.5 Mb, 1.7 Mb, 2 Mb, 2.5 Mb, 3 Mb, 4 Mb, 5 Mb, 6 Mb, 7 Mb, 8 Mb, 9 Mb or at least about 10 Mb.
  • a nucleic acid molecule preferably a long nucleic acid molecule, may first be fragmented, resulting in the double-stranded nucleic acid provided in step a). Therefore in an embodiment, step a) of the method provided herein may be preceded by a step of fragmenting a longer nucleic acid molecule.
  • the fragmentation is preferably the fragmentation of a genomic nucleic acid molecule.
  • the skilled person is familiar with means to fragment nucleic acid molecules and the invention is not limited to any specific means for fragmenting the nucleic acid molecule.
  • the fragmented nucleic acids are preferably fragmented genomic DNA.
  • DNA, and in particular genomic DNA can be fragmented using any suitable method known in the art. Methods for DNA fragmentation include, but are not limited to, enzymatic digestion and mechanical force.
  • Non-limiting examples of fragmenting the nucleic acid molecule using mechanical force include the use of acoustic shearing, nebulization, sonication, point-sink shearing, needle shearing and French pressure cells.
  • the fragmentation is performed by restriction enzyme and/or site-directed endonuclease digestion.
  • Enzymatic digestion for fragmenting a nucleic acid molecule includes, but is not limited to, endonuclease restriction. Enzymatic digestion, such as e.g. used in AFLP® and/or Sequence Based Genotyping technology, may further result in a complexity reduction of the nucleic acid sample.
  • the skilled person knows which enzyme(s) to select for the DNA fragmentation.
  • at least one frequent cutter and at least one rare cutter can be used for the fragmentation of the nucleic acid sample.
  • a frequent cutter preferably has a recognition site of about 3-5 bp, such as, but not limited to, MseI.
  • a rare cutter preferably has a recognition site of >5bp, such as but not limited to EcoRI.
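The combination of a frequent cutter and a rare cutter can be sketched as an in silico digest. The example below uses the well-known recognition sites of MseI (T^TAA) and EcoRI (G^AATTC) and reports top-strand fragments of a linear molecule. The function names and the toy input sequence are illustrative assumptions and are not part of the described method.

```python
# Hypothetical sketch: in silico digest with a frequent and a rare cutter.
import re

# Recognition site and top-strand cut offset within the site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # rare cutter, 6 bp site, cuts G^AATTC
    "MseI": ("TTAA", 1),     # frequent cutter, 4 bp site, cuts T^TAA
}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Return top-strand cut positions for every (possibly overlapping) site."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def digest(seq: str, enzyme_names: list[str]) -> list[str]:
    """Cut a linear sequence with all named enzymes and return the fragments."""
    cuts = sorted(
        {pos for name in enzyme_names for pos in cut_positions(seq, *ENZYMES[name])}
    )
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

For the toy sequence `CCGAATTCGGTTAACC`, `digest(seq, ["EcoRI", "MseI"])` yields three fragments, the middle one flanked by an EcoRI end on one side and an MseI end on the other.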
  • the sample contains or is derived from a relatively large genome
  • the step of enzymatic digestion is not limited to any specific restriction endonuclease.
  • the endonuclease may be a type II endonuclease, such as EcoRI, MseI, PstI etc.
  • a type IIS or type III endonuclease may be used, i.e.
  • an endonuclease of which the recognition sequence is located distant from the restriction site such as, but not limited to, AceIII, AlwI, AlwXI, Alw26I, BbvI, BbvII, BbsI, BccI, Bce83I, BcefI, BcgI, BinI, BsaI, BsgI, BsmAI, BsmFI, BspMI, EarI, EciI, Eco31I, Eco57I, Esp3I, FauI, FokI, GsuI, HgaI, HinGUII, HphI, Ksp632I, MboII, MmeI, MnlI, NgoVIII, PleI, RleAI, SapI, SfaNI, TaqJI and Zthll III. Restriction fragments can be blunt-ended or have protruding ends, depending on the endonuclease used.
  • the recognition site of at least one of the frequent cutter and the rare cutter is within or in close proximity of the sequence of interest, e.g. the recognition site of the frequent cutter or the rare cutter is located about 0-10000, 10-5000, 50-1000 or about 100-500 bases from the sequence of interest.
  • the nucleic acid molecule may be digested using a site-directed nuclease, preferably using at least one of a CRISPR nuclease, a zinc finger nuclease, TALENs and meganucleases.
  • the restriction enzyme and/or site-directed endonuclease creates a single-stranded overhang to the double-stranded nucleic acid molecule.
  • the generated single-stranded overhang may provide for adapter ligation or may be part of a primer binding site.
  • the ends of the nucleic acid molecule provided in step a) may be repaired, preferably by polishing the ends of the provided nucleic acid molecule.
  • the method of the invention may comprise a step of polishing the (optionally fragmented) nucleic acid prior thereto.
  • Polishing reactions are well-known in the art and the skilled person straightforwardly understands how to perform a polishing reaction, e.g. using a Klenow fragment of DNA polymerase, a T4 DNA polymerase and/or a Mung Bean Nuclease.
  • the nucleic acid molecule provided in step a) may be polished to create blunt ends followed by the addition of a 3’-A staggered overhang, preferably to facilitate ligation to a partly, or fully, double-stranded adapter in step b), wherein said adapter comprises a protelomerase recognition sequence and a T-overhang.
  • a 3’-A overhang i.e. the addition of a deoxyadenosine nucleotide to the 3’-end of the (polished) nucleic acid provided in a
  • the nucleic acid molecule comprising a 3’-A-overhang may subsequently be ligated to a compatible adapter comprising a 3’-T-overhang, having a deoxythymidine nucleotide overhang at its 3’ end.
  • the method of the invention may optionally comprise a step of A-tailing the, optionally fragmented and/or optionally polished, nucleic acid molecule.
  • A-tailing reactions are well-known in the art and the skilled person straightforwardly understands how to perform an A-tailing reaction, such as e.g. using a Klenow fragment (exo-).
  • the nucleic acid sample may comprise a plurality of double-stranded nucleic acid molecules.
  • the plurality of nucleic acid molecules may be derived from at least one of the same organism, the same tissue, the same cell, the same organelle and/or the same (fragmented) molecule.
  • the sequence of all double-stranded nucleic acid molecules is determined in the method provided herein.
  • the sequence of part of the double-stranded nucleic acid molecules is determined.
  • (part of) the sequence of a complexity-reduced sample of double-stranded nucleic acid molecules is determined.
  • the method may comprise a step of enriching the provided sample for the double-stranded nucleic acid molecule of interest, preferably the double-stranded nucleic acid molecule comprising a sequence of interest.
  • step a) of the method of the invention may be preceded by a step of reducing the complexity of the sample.
  • the complexity of the sample may be reduced between step b) and c) of the method of the invention.
  • This may be achieved by specifically adding a protelomerase recognition site to both ends of a double-stranded nucleic acid comprising a sequence of interest, and subsequently covalently closing both ends by contacting said nucleic acid with a protelomerase.
  • the closed fragments can subsequently be enriched by degrading the remainder of the nucleic acid molecules or fragments using an exonuclease, as the remainder of the nucleic acid molecules or fragments have at least one open end.
  • the closed fragments can subsequently be opened for instance using a programmable endonuclease, rendering two nucleic acids with each having one covalently closed and one open end.
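The enrichment logic described above can be modelled with a small sketch: only molecules closed at both ends survive exonuclease treatment, and opening a survivor yields two molecules that each have one closed and one open end. The `Molecule` class, its field names and the `/L` and `/R` suffixes are hypothetical and serve only to illustrate the reasoning.

```python
# Hypothetical sketch: exonuclease-based enrichment of doubly closed molecules.
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    left_closed: bool
    right_closed: bool


def exonuclease_survivors(molecules: list[Molecule]) -> list[Molecule]:
    """Molecules with at least one open end are degraded; fully closed ones remain."""
    return [m for m in molecules if m.left_closed and m.right_closed]


def open_with_endonuclease(molecule: Molecule) -> list[Molecule]:
    """Cut a doubly closed molecule once, giving two halves with one open end each."""
    return [
        Molecule(molecule.name + "/L", molecule.left_closed, False),
        Molecule(molecule.name + "/R", False, molecule.right_closed),
    ]
```

Each half returned by `open_with_endonuclease` has the one-open, one-closed geometry that step c) of the method produces, so an adapter for sequencing can be attached to the open end.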
  • a site for amplification and/or sequencing may be added to the open end.
  • an adapter for sequencing may be attached to said open end.
  • said adapter introduces a leader sequence at the 5’ end of the nucleic acid molecule of the invention, wherein said leader sequence comprises a motor protein for nanopore sequencing.
  • Addition of a protelomerase recognition site specifically to both ends of a double-stranded nucleic acid comprising a sequence of interest can be achieved using a pair of protelomerase recognition site-comprising primers designed to specifically amplify the sequence of interest, by introducing protelomerase recognition sites specifically flanking a sequence of interest using prime editing, and/or by excising the sequence of interest using one or more (programmed) endonucleases capable of creating an overhang that is compatible to the ligation side of a protelomerase recognition site-comprising adapter.
  • Another way of selectively sequencing a subset of the nucleic acid molecules and/or fragments in the sample is by adding in step b) a protelomerase recognition site to one end of a double-stranded nucleic acid comprising a sequence of interest, while adding a sequencing primer binding site to the other side of said nucleic acid molecule.
  • Such sites can be introduced specifically as indicated above, e.g. by primer amplification, prime-editing and/or by specific adapter ligation to compatible overhangs.
  • the current method as disclosed herein can also be used in Sequence Based Genotyping (SBG) technology, e.g. for polyploid cells.
  • the SBG technology is e.g. described in more detail in WO2007/114693, WO2006/137733 and WO2007/073165, which are incorporated herein by reference.
  • the SBG technology as described in the art can be modified by attaching an adapter comprising a protelomerase recognition sequence as described herein, to the fragmented nucleic acid sample.
  • the protelomerase recognition site may be added to at least one end of the provided double-stranded nucleic acid molecule using any conventional means known to the person skilled in the art, such as at least one of adapter ligation, tagmentation, primer amplification or prime editing.
  • the protelomerase recognition site may be added as described in at least one of WO2021/123062 and WO2022/074058, which are incorporated herein by reference.
  • Adding, tagging or attaching in relation to a particular sequence or site in the nucleic acid molecule of the method of the invention may be via a method step such as adapter ligation, adapter tagmentation, primer extension, primer amplification and/or prime editing.
  • a single or double stranded nucleic acid molecule comprising the protelomerase recognition site may be added to the nucleic acid molecule by ligation, hybridization, annealing, tagmentation and/or prime editing.
  • the method will comprise a step of converting at least the protelomerase recognition sequence into a double stranded structure, preferably as further detailed herein, in order for the protelomerase enzyme to be able to cleave and simultaneously covalently close the double stranded structure in step c) of the method provided herein.
  • nucleic acid molecules that are present in a particular nucleic acid sample are tagged at least on one side with a protelomerase recognition site and are thus cleaved and simultaneously covalently closed upon protelomerase treatment, rendering double-stranded nucleic acid molecules that have at least one closed end.
  • the protelomerase recognition site added to the nucleic acid molecule of step a) is preferably a TelN protelomerase recognition sequence (also indicated herein as TelN protelomerase recognition site).
  • the protelomerase recognition site is added by adapter ligation.
  • adapters containing a recognition site for the protelomerase enzyme can be ligated to the double-stranded molecule. These adapters are subsequently cut by the protelomerase enzyme and simultaneously the end of the nucleic acid molecule is covalently closed.
  • step b) comprises a step of attaching an adapter to the provided nucleic acid molecule, wherein the adapter comprises a protelomerase recognition site.
  • the adapter attached to the nucleic acid molecule comprises a TelN protelomerase recognition sequence.
  • the adapter may be single-stranded.
  • a single-stranded adapter preferably comprises a section, preferably at its 3’ end, that is capable of hybridizing to a nucleic acid molecule provided in step a) of the method described herein.
  • the single-stranded adapter preferably can hybridize to a single-stranded overhang of the nucleic acid molecule, preferably a 3’ overhang of the nucleic acid molecule.
  • the remaining single-stranded part of the annealed single-stranded adapter may subsequently be filled in, i.e. is made double-stranded, using a polymerase, such as, but not limited to, Klenow (known by the skilled person to have 5'→3' polymerase activity and 3'→5' exonuclease activity but lacking 5'→3' exonuclease activity) or a Bst-polymerase (known by the skilled person to be a DNA polymerase from Bacillus stearothermophilus having 5'→3' polymerase activity and strand displacement activity, but lacking 3'→5' exonuclease activity).
  • the filling-in step optionally results in the generation of a double-stranded protelomerase recognition sequence.
  • the hybridized adapter may be ligated to the 5’-end of the nucleic acid molecule of the opposite strand to which the adapter is hybridized, prior to or after being made double-stranded.
  • the adapter is at least partly double-stranded.
  • the at least partly double-stranded adapter may be ligated to a nucleic acid molecule provided in step a) as defined herein.
  • at least 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or 100% of the nucleotides in the adapter are double-stranded.
  • the protelomerase recognition sequence is double-stranded.
  • the protelomerase recognition sequence may be comprised in the single-stranded part of a partly double stranded adapter, which may be made double stranded by filling in the opposite strand after attachment to the nucleic acid.
  • the adapter may be 100% or “fully” double-stranded.
  • the adapter may become fully double-stranded after ligation of the adapter to the nucleic acid molecule, e.g. by filling in the single-stranded part of the adapter using a DNA polymerase.
  • the at least partly double-stranded adapter comprises two single-stranded molecules that may at least partly anneal to each other, i.e. the double-stranded adapter preferably comprises two open ends prior to ligating the adapter to the nucleic acid molecules as defined herein.
  • the adapter is not a hairpin adapter.
  • One end of the at least partly double-stranded adapter can be ligated to the nucleic acid molecule provided in step a).
  • at least the one end that is ligated to the nucleic acid molecule is double-stranded.
  • the at least one double-stranded end of the adapter can be a blunt or a staggered or “sticky” end.
  • the adapter comprises at least one staggered end.
  • the end of the adapter that is ligated to the nucleic acid molecule has an end that is compatible with an end of said nucleic acid molecule.
  • the nucleic acid molecule comprises an end having an A-overhang
  • the adapter preferably comprises an end having a T-overhang.
  • the adapter preferably comprises an overhang of respectively 1, 2, 3, 4, 5 or more nucleotides that are complementary to the overhang of the nucleic acid molecule.
  • the other end of the adapter preferably cannot be ligated to a nucleic acid molecule or an adapter. Any means to block ligation of an adapter end is suitable for use in the method as provided herein.
  • the other end of the adapter may be single-stranded or comprise an incompatible overhang.
  • an adapter for use in the method provided herein are well known to the skilled person and the method provided herein is not limited to any particular adapter design and/or construction.
  • two oligonucleotides comprising or consisting of complementary sequences can be constructed and annealed to one another under controlled conditions, resulting in an at least partly double-stranded adapter for use in the invention.
  • the adapter may be at least partially double stranded.
  • the adapter preferably has a double stranded structure at or near the side for ligation to the nucleic acid molecule of step (a) of the invention.
  • the ligatable end of the adapter may be blunt or staggered for ligation to a respective compatible blunt or staggered ended nucleic acid molecule.
  • Compatible is to be understood herein as that the overhangs of the adapter and the nucleic acid molecule can hybridize to one another.
  • a nucleic acid fragment produced by restriction enzyme digestion may comprise a 3’ or 5’ overhang, which can hybridize to a complementary (compatible) 3’ or 5’ adapter overhang, respectively.
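The notion of compatible ends can be sketched as a simple check. Both overhangs are written 5'→3', and the sketch assumes that two staggered ends are compatible when they have the same polarity (both 5' or both 3' overhangs) and one is the reverse complement of the other, while two blunt ends (empty overhangs) are also treated as compatible. The function and parameter names are illustrative assumptions.

```python
# Hypothetical sketch: decide whether a fragment end and an adapter end are compatible.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhangs_compatible(
    fragment_overhang: str,
    adapter_overhang: str,
    fragment_end: str = "3'",
    adapter_end: str = "3'",
) -> bool:
    """Overhangs are given 5'->3'; an empty string denotes a blunt end."""
    for end in (fragment_end, adapter_end):
        if end not in ("5'", "3'"):
            raise ValueError("end must be \"5'\" or \"3'\"")
    if not fragment_overhang and not adapter_overhang:
        return True  # blunt end meets blunt end
    if not fragment_overhang or not adapter_overhang:
        return False  # blunt end meets staggered end
    return (
        fragment_end == adapter_end
        and adapter_overhang.upper() == reverse_complement(fragment_overhang)
    )
```

Under these assumptions a 3'-A overhang pairs with a 3'-T overhang, and the self-complementary 5'-AATT overhang left by EcoRI pairs with an identical 5'-AATT adapter overhang.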
  • a single- or double-stranded adapter can be ligated at its 3’ end to a phosphorylated 5’-end of the nucleic acid molecule.
  • DNA fragments produced by restriction enzyme fragmentation in general comprise phosphorylated 5’ ends and can therefore readily be ligated.
  • a single- or double-stranded adapter can be ligated at its 5’ end to a 3’-end of the nucleic acid molecule, in case the 5’ end of said adapter is phosphorylated.
  • the opposite strand of the adapter may be filled in (optionally after denaturing the non-ligated strand of the at least partly double-stranded adapter), thus producing a double-stranded adapter.
  • Filling in the single-stranded sequence, i.e. to generate a double-stranded sequence, can be done using any conventional polymerase, such as, but not limited to, Klenow or Bst-polymerase.
  • a preferred polymerase is a Bst-polymerase.
  • an adapter is ligated to both sides of the nucleic acid molecule, wherein at least one of the adapters comprises a protelomerase recognition sequence.
  • one of the adapters does not comprise a protelomerase sequence.
  • Said adapter may have at the non-ligatable end, a single-stranded overhang for e.g. amplification primer binding, sequencing primer binding and/or identification.
  • both strands of said adapter at the non-ligatable end are single-stranded or non-complementary, thereby forming a Y-shaped adapter, wherein one or both of the strands of the non-complementary end of the adapter may comprise sequences for amplification primer binding, sequencing primer binding, nanopore sequencing and/or identification.
  • the 5’-end overhang of the non-ligated side of an adapter comprises a leader sequence that comprises a motor protein for nanopore sequencing.
  • Ligation of an adapter can be performed using any conventional method known to the skilled person and the invention is not limited to any specific ligation method or ligation enzyme (ligase).
  • the adapter comprises an end that is compatible to at least one end of the nucleic acid molecule.
  • the nucleic acid molecule provided in step a) comprises a single-stranded overhang and the adapter provided in step b) comprises an overhang that can be ligated to the overhang of the nucleic acid molecule.
  • a step of fragmentation to provide the nucleic acid of step a) and subsequent adapter ligation of step b) may be combined in a single step, e.g.
  • the adapter in step b) is ligated by tagmentation, preferably using a Tn5 transposase.
  • Transposases randomly cut the long nucleic acid molecules in shorter nucleic acid molecules and adapters can be ligated on either side of the cleaved points.
  • Tagmentation or “transposase mediated fragmentation and tagging” is a process that is well-known for the person skilled in the art, for example as exemplified in the workflow for Nextera™.
  • the adapters may comprise sequences that make them compatible for use in a tagmentation reaction.
  • the adapters used in a tagmentation reaction further comprise a transposase sequence.
  • the transposase sequence is preferably compatible with the transposase used in the tagmentation reaction.
  • the tagmentation reaction may be followed by a repair step to ensure that all, or substantially all, generated nucleic acid molecules comprise an adapter, preferably on both sides.
  • the nucleic acid molecules comprising a ligated adapter, optionally obtained by tagmentation may be repaired to remove any single-stranded breaks.
  • the repair step takes place prior to contacting the molecules with a TelN protelomerase. Such repair step can be performed using any conventional means known in the art.
  • step b) may comprise a step of amplifying at least part of (and at least one strand of) the provided nucleic acid molecule, wherein the primer for amplification comprises a protelomerase recognition site.
  • said primer comprises i) a 3’-end for annealing to a primer binding site present in the provided nucleic acid molecule, or to an, optionally universal, primer binding site in an adapter that has been ligated to the nucleic acid molecule; and ii) a protelomerase recognition site in a 5’-tail of such primer.
  • the primer may optionally comprise an identifier sequence, preferably an identifier sequence as defined herein.
  • the identifier sequence is a sample identifier.
  • the identifier sequence is located in the primer such that it remains part of the nucleic acid molecule after cleaving and closing the cleaved ends. Put differently, the identifier sequence preferably remains covalently attached to the nucleic acid after step c).
  • a pair of primers is used for amplification, wherein at least one of the primers comprises a protelomerase recognition site.
  • at least one of the primers of the primer pair comprises an identifier sequence.
  • both primers of the primer pair comprise an identifier sequence, that preferably both remain covalently attached to the nucleic acid after step c).
  • Amplification can be achieved by PCR or by any other amplification method known in the art.
  • the primer binding site is a unique sequence, i.e. a sequence that is only present in the nucleic acid molecule comprising the sequence of interest.
  • the protelomerase sequence may be introduced in amplicons produced via PCR using the provided nucleic acid molecule as a template.
  • the nucleic acid molecule provided in step a) comprises a single-stranded overhang and the primer is capable of annealing to at least the overhang of the nucleic acid molecule.
  • the single-stranded overhang of the provided nucleic acid molecule is, or is part of, the primer binding site.
  • the nucleic acid molecule is amplified using a primer set wherein at least one primer comprises a protelomerase recognition site; the subsequent steps are then performed on the resulting amplicons which can be closed at least at one end upon protelomerase treatment.
  • the protelomerase sequence may be introduced via a single step of denaturation, annealing of the primer and filling in the single strand overhang.
  • a protelomerase recognition site may be added to only one end or to both ends of the provided nucleic acid molecule.
  • a protelomerase recognition site may be added to one end of the nucleic acid molecule using a first primer or adapter comprising a protelomerase recognition site and an optional second primer or adapter of the respective primer pair, adapter pair or combination of a primer and adapter, that does not comprise a protelomerase recognition site.
  • a first and a second adapter may be combined or “mixed” prior to ligating the combined adapters to the provided nucleic acid molecule, wherein the first adapter comprises a protelomerase recognition site and the second adapter does not comprise a protelomerase recognition site.
  • the first and second adapter may only differ by the respective presence and absence of a protelomerase recognition site.
  • using an adapter combination wherein about 50% of the adapters are a first adapter as defined herein and about 50% of the adapters are a second adapter as defined herein should result in about 50% of the nucleic acid molecules having an attached first adapter at one end and an attached second adapter at the other end of the same nucleic acid molecule.
  • Step b) may therefore comprise a step b i) of combining a first and a second adapter as defined herein and a step b ii) of attaching the combined adapters to the provided nucleic acid molecule, wherein preferably the first adapter is attached to one end of the nucleic acid molecule and the second adapter is attached to the other end of the same nucleic acid molecule.
  • Adapter combinations may comprise a first to second adapter (w/w) ratio of about 1:1, or about 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 11:1, or about 12:1.
  • the adapter combinations may comprise a first to second adapter (w/w) ratio of about 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, 1:10, 1:11 or about 1:12.
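The expectation that a 50:50 adapter mix gives about 50% of molecules carrying one adapter of each kind follows from a simple binomial argument, sketched below. The sketch assumes that both ends of a molecule are ligated independently, that both adapters ligate with equal efficiency, and that the ratio is expressed in molar terms; a w/w ratio would first need converting if the two adapters differ in length. The function name and dictionary keys are illustrative.

```python
# Hypothetical sketch: expected adapter configurations for a given adapter mix.


def adapter_end_fractions(first_parts: float, second_parts: float) -> dict[str, float]:
    """Return expected fractions of molecules for a first:second molar adapter ratio."""
    if first_parts < 0 or second_parts < 0 or first_parts + second_parts == 0:
        raise ValueError("ratio parts must be non-negative and not both zero")
    p = first_parts / (first_parts + second_parts)  # probability of a first adapter
    return {
        "first_both_ends": p * p,
        "one_of_each": 2 * p * (1 - p),
        "second_both_ends": (1 - p) ** 2,
    }
```

Under these assumptions the fraction of molecules with one adapter of each kind peaks at a 1:1 ratio. A 2:1 ratio lowers it to 4/9 and shifts the remainder towards molecules carrying the first adapter on both ends.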
  • the adapter comprising the protelomerase recognition site may be capable of ligation to only one end of the provided nucleic acid molecule.
  • the adapter may comprise an end, preferably a 3’-end, that is capable of hybridizing to only one end of the provided nucleic acid molecule.
  • the nucleic acid molecule comprises dissimilar ends, such as a blunt end and a sticky end
  • the ligatable end of the adapter may comprise, respectively, a blunt or a (compatible) sticky end, such that it can be ligated to only one end of the nucleic acid molecule.
  • the nucleic acid molecule may comprise two different overhangs and the adapter may be compatible with only one of these overhangs. Consequently, the adapter may be ligated to only one end of the provided nucleic acid molecule.
  • a protelomerase recognition site may be added to both ends of the provided nucleic acid molecule.
  • the adapters and/or primer for use in a method as defined herein preferably do not comprise a recognition site for the restriction endonuclease or the site-directed endonuclease that can be used in step c3) of the method provided herein. More preferably the part of the adapter and/or primer that is located in between the protelomerase recognition sequence and the end ligated and/or annealed to the nucleic acid molecule does not comprise a recognition site for a restriction endonuclease or a site-directed endonuclease that can be used in step c3) of the method provided herein.
  • the adapter and/or primer for use in step b) of the method provided herein comprises a protelomerase recognition sequence, preferably a TelN protelomerase recognition sequence.
  • a protelomerase recognition sequence is any DNA sequence whose presence in a DNA template allows for its conversion into a closed linear DNA by the enzymatic activity of protelomerase. In other words, the protelomerase recognition sequence is required for the cleavage and religation of double-stranded DNA by protelomerase to form covalently closed linear DNA.
  • a protelomerase recognition sequence comprises a perfect palindromic sequence, i.e. a double-stranded DNA sequence having two-fold rotational symmetry.
  • the length of the perfect inverted repeat differs depending on the specific organism. In Borrelia burgdorferi, the perfect inverted repeat is 14 base pairs in length. In various mesophilic bacteriophages, the perfect inverted repeat is 22 base pairs or greater in length. Also, in some cases, e.g. E. coli N15, the central perfect inverted palindrome is flanked by inverted repeat sequences, i.e. forming part of a larger imperfect inverted palindrome.
  • a protelomerase recognition sequence as used in the invention preferably comprises a double-stranded palindromic (perfect inverted repeat) sequence of at least 14 base pairs in length.
  • Preferred perfect inverted repeat sequences include the sequences of SEQ ID NOs: 1 - 9 and variants thereof.
  • SEQ ID NO: 1 (NCATNNTANNCGNNTANNATGN) is a 22 base consensus sequence. As e.g. disclosed in WO2010/086626, base pairs of the perfect inverted repeat are conserved at certain positions, while flexibility in sequence is possible at other positions.
  • SEQ ID NO: 1 is a minimum consensus sequence for a perfect inverted repeat sequence for use with a protelomerase in the process of the present invention.
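The two properties discussed above, a perfect inverted repeat and agreement with the 22-base consensus of SEQ ID NO: 1, can be checked with a short sketch. The consensus string is copied from the text; the helper names and the example sequences in the usage note are hypothetical and are not claimed to be functional protelomerase sites.

```python
# Hypothetical sketch: test a candidate sequence for palindromy and consensus match.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SEQ_ID_NO_1 = "NCATNNTANNCGNNTANNATGN"  # consensus as quoted in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_perfect_inverted_repeat(seq: str, min_length: int = 14) -> bool:
    """True if the sequence equals its own reverse complement and is long enough."""
    seq = seq.upper()
    return len(seq) >= min_length and seq == reverse_complement(seq)


def matches_consensus(seq: str, consensus: str = SEQ_ID_NO_1) -> bool:
    """True if every non-N consensus position is matched exactly."""
    seq = seq.upper()
    return len(seq) == len(consensus) and all(
        c == "N" or c == base for base, c in zip(seq, consensus)
    )
```

A made-up sequence such as `TCATGATACGCGCGTATCATGA` passes both checks. Changing its last base to `T` still matches the consensus, because that position is `N`, but breaks the perfect inverted repeat, which shows that the two conditions are independent.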
  • the protelomerase recognition sequence may have a sequence as described in WO2010/086626, which is incorporated herein by reference.
  • the protelomerase recognition sequence has at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity with SEQ ID NO: 10.
  • the sequence of SEQ ID NO: 10 is:
  • the protelomerase cleaves the protelomerase sequence between positions 28-29 in the recognition sequence and closes the cleaved ends.
  • An adapter for use in step b) of the method of the invention may consist of the protelomerase recognition sequence.
  • the adapter may comprise additional nucleotides.
  • a primer for use in step b) of the method of the invention will at least comprise a protelomerase recognition sequence and, at the 3’ end thereof, a sequence for annealing to the nucleic acid of step a) of the method of the invention.
  • the adapter and/or primer may comprise an identifier sequence or “barcode” or “tag” for tagging the double-stranded nucleic acid molecule.
  • tagging of the double-stranded nucleic acid molecule is preferably achieved by (primer) hybridization and/or (adapter) ligation.
  • the identifier is preferably at least one of a sample identifier and a UMI.
  • the identifier sequence remains part of the nucleic acid molecule after cleaving and closing the cleaved ends.
  • the identifier sequence preferably remains covalently attached to the nucleic acid after step c).
  • the nucleic acid that is sequenced in step d) comprises a combination of identifier sequences.
  • At least one identifier may be introduced in step b) together with the protelomerase recognition site at one side to the nucleic acid molecule, and at least one identifier may be added to the open end of the nucleic acid fragment obtained in step c), for instance together with introducing an amplification and/or sequencing primer binding site, and/or by attaching a sequencing adapter.
  • a Y-shaped adapter is ligated to the open end of the nucleic acid fragment obtained in step c)
  • two identifiers may be present, wherein each identifier is located in a different single-stranded arm of the Y shaped adapter.
  • the nucleic acid of the invention may be labeled with three different identifiers.
  • the function of the identifiers may be different.
  • the identifier introduced in step b) may be a sample identifier
  • the (combination of) identifier(s) introduced at the open end may be molecular identifier(s).
  • Such (combination of) identifier(s) may serve to trace back the template molecule after a process of amplification of the labelled nucleic acid molecules.
  • the UMI may be a separate sequence within the adapter and/or primer or, in case the protelomerase recognition sequence comprises degenerate nucleotides, these degenerate nucleotides may be used to introduce an identifier.
  • an adapter and/or primer may be used with one or more specific nucleotides within this recognition sequence, whereas for a second or further sample, other specific nucleotides are used at this position, thereby creating an identifier sequence within the protelomerase recognition sequence.
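As an illustration of how degenerate positions could double as an identifier, the sketch below expands a degenerate motif into the concrete sequences it encodes and assigns one to each sample. The motif `GRNT` and the sample names are hypothetical and are not taken from any protelomerase recognition sequence; only the IUPAC ambiguity codes are standard.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(seq):
    """List every concrete sequence encoded by an IUPAC-degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in seq))]

# Hypothetical motif with two degenerate positions (R and N)
variants = expand_degenerate("GRNT")

# One specific variant per sample acts as that sample's identifier
sample_identifier = {f"sample_{i + 1}": v for i, v in enumerate(variants)}
```

With two degenerate positions (one two-fold, one four-fold), this hypothetical motif distinguishes up to eight samples.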
  • the adapter and/or primer may comprise a sample identifier as well as a UMI.
  • a sample identifier may connect the sequence of a nucleic acid molecule to a specific sample.
  • the adapters and/or primers used in the method provided herein may comprise an identifier sequence that is specific for a certain sample.
  • Each additional sample can be processed using adapters and/or primers having an identifier sequence specific for said additional sample.
  • the processed samples can subsequently be pooled and the obtained sequences can be assigned to a specific sample using the sample identifier sequence.
  • multiple samples are treated in parallel by performing steps a), b) and c) in parallel and pooling the samples prior to step d).
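The assignment of pooled reads to samples can be sketched as follows. This is a minimal illustration that assumes the sample identifier sits at a fixed position at the start of each read and is read without errors; the barcodes, reads and function name are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, sample_barcodes, barcode_len):
    """Assign each read to a sample by its leading identifier sequence."""
    by_sample = defaultdict(list)
    for read in reads:
        tag = read[:barcode_len]
        # Reads with an unknown identifier are kept apart rather than guessed
        sample = sample_barcodes.get(tag, "unassigned")
        by_sample[sample].append(read[barcode_len:])
    return dict(by_sample)

# Hypothetical sample identifiers and pooled reads
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
pooled = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAACCC", "GGGGTTTTAA"]
assignments = demultiplex(pooled, barcodes, barcode_len=4)
```

In practice an identifier may lie elsewhere in the read and may contain sequencing errors, so a real workflow would typically tolerate mismatches.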
  • a UMI is a substantially unique sequence or barcode, preferably fully unique, that is specific for a nucleic acid molecule, i.e. unique for each nucleic acid molecule used in the method of the invention.
  • the UMI may have random, pseudo-random or partially random, or non-random nucleotide sequences.
  • a UMI can be used to uniquely identify the originating molecule from which a sequencing read is derived. For example, reads of amplified nucleic acid molecules can be collapsed into a single consensus sequence from each originating nucleic acid molecule. As indicated above, the UMI may be fully or substantially unique.
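Collapsing amplified reads into one consensus per originating molecule can be sketched as below. The example assumes that the UMI has already been extracted from each read and that reads sharing a UMI are aligned and free of indels; it uses a simple per-position majority vote, which is one possible approach and not necessarily the one used in practice.

```python
from collections import Counter, defaultdict

def collapse_by_umi(tagged_reads):
    """Group (umi, sequence) pairs by UMI and take a per-position majority vote."""
    groups = defaultdict(list)
    for umi, seq in tagged_reads:
        groups[umi].append(seq)
    consensus = {}
    for umi, seqs in groups.items():
        length = min(len(s) for s in seqs)  # compare only the shared length
        consensus[umi] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus

# Hypothetical reads: three copies of one molecule (one with an error), one of another
reads = [
    ("AAC", "ACGTAC"),
    ("AAC", "ACGTAC"),
    ("AAC", "ACCTAC"),
    ("GTT", "TTGACA"),
]
collapsed = collapse_by_umi(reads)
```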
  • each adapter-ligated and/or primer-annealed nucleic acid molecule provided in the method of the invention comprises a unique tag that differs from all the other tags comprised in further (adapter-ligated and/or primer-annealed) nucleic acid molecules used in the method of the invention.
  • Substantially unique is to be understood herein in that each adapter-ligated and/or primer-annealed nucleic acid molecule provided in the method of the invention comprises a random UMI, but a low percentage of these adapter-ligated and/or primer-annealed nucleic acid molecules may comprise the same UMI.
  • substantially unique molecular identifiers are used in case the chance of tagging the exact same molecule comprising the same sequence with the same UMI is negligible.
  • the UMI is fully unique in relation to a specific sequence of the nucleic acid molecule.
  • the UMI preferably has a sufficient length to ensure this uniqueness.
  • a less unique molecular identifier i.e. a substantially unique identifier, as indicated above
  • An identifier sequence may range in length from about 2 to 100 nucleotide bases or more, and preferably has a length between about 4-16 nucleotide bases.
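To get a feel for how identifier length relates to uniqueness, the sketch below estimates the probability that at least two molecules receive the same UMI. It assumes that UMIs are drawn uniformly at random from all 4^n sequences of length n, which is an idealization; the function name is hypothetical.

```python
def umi_collision_probability(umi_length, n_molecules):
    """Probability that at least two of n_molecules share a random UMI."""
    n_umis = 4 ** umi_length
    if n_molecules > n_umis:
        return 1.0  # more molecules than UMIs: a shared UMI is unavoidable
    p_all_distinct = 1.0
    for i in range(n_molecules):
        p_all_distinct *= (n_umis - i) / n_umis
    return 1.0 - p_all_distinct

# Compare a short and a longer UMI for the same number of tagged molecules
p_short = umi_collision_probability(8, 1000)
p_long = umi_collision_probability(12, 1000)
```

A shared UMI only matters when the molecules carrying it also have the same sequence, so this figure overstates the practical risk.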
  • the identifier sequence can be a consecutive sequence or may be split into several subunits. These subunits may be present in a single adapter and/or primer or may be present in separate adapters and/or primers. For instance, if the nucleic acid molecule is flanked by two adapters, each of these two adapters may comprise a subunit of the identifier sequence.
  • the sequence reads obtained in the method of the invention may be grouped based on the information in each of the two subunits.
  • the identifier sequence does not contain two or more consecutive identical bases.
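A candidate set of identifiers without consecutive identical bases could be enumerated as in the following sketch. The function names are hypothetical, and a real design would likely apply further criteria that are not modelled here.

```python
from itertools import product

def has_consecutive_identical_bases(seq):
    """True if any base is immediately followed by the same base."""
    return any(a == b for a, b in zip(seq, seq[1:]))

def identifiers_without_repeats(length):
    """All sequences of the given length with no two identical adjacent bases."""
    candidates = ("".join(p) for p in product("ACGT", repeat=length))
    return [c for c in candidates if not has_consecutive_identical_bases(c)]

four_mers = identifiers_without_repeats(4)
```

For a length of n bases this leaves 4 × 3^(n-1) sequences, because each base after the first can be any of the three bases that differ from its predecessor.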
  • a double-stranded nucleic acid molecule is generated having one open end and one closed end, comprising contacting the nucleic acid molecule obtained in step b) with a protelomerase that cleaves and covalently closes the protelomerase recognition site.
  • the nucleic acid molecule obtained in step b) is contacted with a protelomerase that cleaves and covalently closes the protelomerase recognition site for generating a double-stranded nucleic acid molecule having one open end and one closed end.
  • At least one end of the double-stranded nucleic acid molecule is closed to provide a double-stranded nucleic acid molecule having one open end and one closed end.
  • a terminus of a double-stranded nucleic acid, wherein the 3’-end terminal nucleotide of the respective upper strand is covalently linked to the 5’-end terminal nucleotide of the respective bottom strand, is annotated herein as a “closed end”.
  • a terminus of a double-stranded nucleic acid wherein the 5’-end terminal nucleotide of the respective upper strand is covalently linked to the 3’-end terminal nucleotide of the respective bottom strand, is also annotated herein as a “closed end”.
  • a “closed end” is thus understood herein as a terminus of a double-stranded nucleic acid wherein said terminal nucleic acids from opposite strands are covalently linked to each other, as opposed to an “open end” which is understood herein as a terminus of a double-stranded nucleic acid wherein said terminal nucleic acids from opposite strands are not covalently linked to each other.
  • closing of at least one end of the double-stranded nucleic acid is performed by contacting the nucleic acid molecule with a protelomerase, preferably a TelN protelomerase, which cleaves the molecule and covalently closes the cleaved end.
  • the protelomerase can covalently close the nucleic acid molecule, resulting in a covalently closed end.
  • a preferred protelomerase for use in the invention is a bacteriophage protelomerase.
  • a protelomerase can be selected from the group consisting of: phiHAP-1 from Halomonas aquamarina, PY54 from Yersinia enterocolitica, phiKO2 from Klebsiella oxytoca, VP882 from Vibrio sp. and N15 from Escherichia coli, or variants of any thereof.
  • the protelomerase may have an amino acid sequence as disclosed in WO2010/086626, which is incorporated herein by reference.
  • the use of bacteriophage N15 (TelN) protelomerase or a variant thereof is particularly preferred.
  • a preferred protelomerase has a sequence of at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity with SEQ ID NO: 11.
  • Variants include homologues or mutants thereof. Mutants include truncations, substitutions or deletions with respect to the native sequence.
  • a variant preferably produces closed linear DNA from a template comprising a protelomerase recognition sequence as described herein. In case the nucleic acid molecule comprises a protelomerase recognition site on only one end of the molecule, contacting the molecule with a protelomerase results in a double-stranded nucleic acid molecule having one open and one closed end.
  • the nucleic acid molecule comprises a protelomerase recognition site on both ends.
  • contacting the nucleic acid molecule with a protelomerase results in a double-stranded nucleic acid molecule having two closed ends.
  • Step c) may therefore comprise the following (sub-)steps of: a step c1) of contacting the nucleic acid molecule provided in step a) with protelomerase.
  • the provided nucleic acid molecule comprises a protelomerase recognition site on both ends and hence contacting the nucleic acid molecule with a protelomerase covalently closes said both ends.
  • An optional step c2) of contacting the nucleic acid sample with an exonuclease can digest any remaining nucleic acid molecule comprising at least one open end, thereby enriching the sample for the nucleic acid molecule comprising closed ends on both sides.
  • the method may comprise an optional step c2) of contacting the nucleic acid molecule with an exonuclease.
  • the exonuclease may digest any nucleic acid molecule not comprising two closed ends, i.e. comprising one or two open ends.
  • Such nucleic acid molecules are for example, but not limited to, nucleic acid molecules without adapters, nucleic acid molecules with one or two adapters having an open end, and/or cleaved nucleic acid molecules having one open end and one closed end.
  • the sample provided in step a) may thus comprise a plurality of nucleic acid molecules and in step c1) part of the nucleic acid molecules are covalently closed, i.e. part of the nucleic acid molecules comprise two closed ends.
  • Optional step c2) is thus a step of removing open-ended double-stranded nucleic acid molecules by exposing the sample to an exonuclease.
  • the nucleic acid molecules having two closed ends are protected from degradation, while the non-protected fragments are degraded, resulting in enrichment or complexity reduction of the nucleic acid molecules comprising the sequence of interest. Therefore in an embodiment, the method of the invention takes the approach of removal of an undesired (non-target) part of the nucleic acid sample.
  • the adapters in step b) may be ligated to nucleic acid molecules having a selective staggered overhang, for example created by enzymatic digestion.
  • the molecules comprising the adapters are subsequently closed in step c1), and the exonuclease treatment in step c2) may digest any nucleic acid molecule not having two closed ends.
  • the exonuclease treatment in step c2) may thus result in an enrichment of nucleic acid molecules comprising closed ends.
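The enrichment logic of steps c1) and c2) can be pictured with a small model. The sketch below is an idealization in which the exonuclease digests every molecule that has at least one open end and leaves every doubly closed molecule intact; the class, the function and the molecule names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Molecule:
    name: str
    left_closed: bool
    right_closed: bool

def exonuclease_treatment(molecules):
    """Idealized step c2): only molecules with two closed ends survive."""
    return [m for m in molecules if m.left_closed and m.right_closed]

# Hypothetical sample after protelomerase treatment (step c1)
sample = [
    Molecule("no_adapter", False, False),
    Molecule("one_adapter_closed", True, False),
    Molecule("both_ends_closed", True, True),
]
enriched = exonuclease_treatment(sample)
```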
  • the exonuclease may be an exonuclease I, III, V, VII, VIII, or related enzyme, or any combination thereof.
  • Exonuclease III recognizes nicks and extends the nick to a gap until a piece of ssDNA is formed. Exonuclease VII can degrade this ssDNA. Exonuclease I also degrades ssDNA. ExoIII and ExoVII are a preferred combination of exonucleases for use in step c2) of the method described herein.
  • Exonuclease V is capable of degrading ssDNA and dsDNA in both 3’ to 5’ and in 5’ to 3’ direction. Therefore in a preferred embodiment, the exonuclease in step c2) of the method described herein is an exonuclease that is capable of degrading ssDNA and dsDNA in both 3’ to 5’ and in 5’ to 3’ direction, preferably an exonuclease V.
  • Step c2) is preferably performed at conditions (e.g. time, temperature, reaction buffer, enzyme concentration, etc) sufficient for the exonuclease to degrade substantially all non-protected nucleic acid molecules.
  • step c2) is performed at conditions and time sufficient for the exonuclease to degrade all non-protected nucleic acid molecules.
  • Step c2) is preferably performed for about 1 minute to about 12 hours, preferably 30 min, at about 20-80°C, preferably about 37°C.
  • the exonuclease may be inactivated by, for example, but not limited to, at least one of a proteinase treatment, e.g. Proteinase K treatment, or heat inactivation. Such techniques are standard in the art and the skilled person straightforwardly understands how to inactivate an exonuclease.
  • a preferred inactivation step is heating the sample at a temperature of about 50 - 90°C, preferably about 75°C, for about 1 - 120 minutes, preferably about 10 minutes.
  • both ends of a double-stranded nucleic acid molecule are closed.
  • the molecules that are closed on both ends are insensitive to 5’ or 3’ modifying enzymes.
  • an optional step of exonuclease treatment of the protelomerase-treated sample can be added to remove any possible nucleic acid molecules that are not covalently closed on both ends.
  • the (covalently closed) nucleic acid molecules can be selectively opened by using for instance a restriction endonuclease or site-directed endonucleases.
  • although other nucleic acid molecules are still present in the reaction mixture, only those cleaved in the last opening reaction can be used in a subsequent (sequencing) process, for instance by ligating sequencing adapters to the opened ends, thereby selectively rendering these opened fragments ready for sequencing.
  • the opened fragments may be degraded using exonuclease treatment, thereby enriching for the non-opened nucleic acid molecules for further processing.
  • these non-opened molecules may be opened in a second round of selective opening using for instance site-directed endonucleases targeted to these non-opened molecules.
  • the method described herein may further comprise a step c3) of cleaving the closed double-stranded nucleic acid molecule, thereby generating a double-stranded nucleic acid molecule having one open end and one closed end.
  • “Cleaving” is understood herein as the generation of a double-stranded break.
  • the double-stranded break may be created by the use of a nuclease or by the use of two nickases that cleave opposite strands.
  • the double-stranded break may create a blunt open end of the nucleic acid molecule. After cleavage the cleaved nucleic acid molecule may thus have one open blunt end and one closed end.
  • the double-stranded break may create a staggered open end of the cleaved nucleic acid molecule.
  • the cleaved nucleic acid molecule may thus have one open staggered end and one closed end.
  • the double-stranded nucleic acid molecule comprising two closed ends may be cleaved at a target sequence located in the attached adapter or primer, and/or may be cleaved at a target sequence located in the double-stranded nucleic acid molecule.
  • the adapter and/or primer for use in the method provided herein further comprises a restriction enzyme recognition site or target sequence for a site-directed nuclease between the protelomerase recognition sequence and the part of the adapter for ligation to the nucleic acid molecule, or between protelomerase recognition sequence and the part of the primer for hybridization to the nucleic acid molecule, respectively.
  • such adapter or primer is located at only one side of the nucleic acid molecule of the invention, while an adapter or primer located at the opposite side of said molecule comprises a protelomerase recognition sequence but lacks said restriction enzyme recognition or target site.
  • the nucleic acid molecule may be contacted with a restriction enzyme or site-directed nuclease, resulting in a double-stranded nucleic acid molecule having one closed and one open end.
  • nucleic acid molecule comprising two closed ends may be opened by fragmentation and/or tagmentation.
  • the nucleic acid molecule in step c3) is cleaved by a site-directed endonuclease or a restriction endonuclease.
  • all nucleic acid molecules comprising two closed ends will be cleaved in step c3) by the endonuclease and subsequently sequenced in step d) of the method provided herein.
  • only a part or “subset” of the nucleic acid molecules comprise a target sequence that is recognized by the endonuclease.
  • the closed nucleic acid molecule comprises a single sequence that is targeted by the endonuclease.
  • the nucleic acid molecule may comprise the target sequence more than once, e.g. the nucleic acid molecule may comprise the target sequence 1, 2, 3, 4, 5, 6 or more times.
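Which molecules would be opened in step c3), and how often each contains the target sequence, can be checked in silico. The sketch below uses the EcoRI recognition sequence GAATTC purely as a familiar example; the molecule names and sequences are hypothetical. Because GAATTC is palindromic, searching one strand is sufficient here, whereas a non-palindromic site would also require a search of the reverse complement.

```python
def find_sites(sequence, site):
    """Return the 0-based start positions of every occurrence of site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

ECORI = "GAATTC"  # EcoRI recognition sequence, used only as an example

# Hypothetical closed molecules, written as one strand
molecules = {
    "mol_A": "AAGAATTCTTGAATTCAA",
    "mol_B": "CCCCGGGGTTTTAAAACC",
    "mol_C": "TTGAATTCGG",
}
site_positions = {name: find_sites(seq, ECORI) for name, seq in molecules.items()}

# Only the subset containing the target sequence would be opened
cleavable_subset = [name for name, pos in site_positions.items() if pos]
```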
  • the nucleic acid molecule comprising closed ends may be cleaved by a restriction endonuclease.
  • Any sequence-specific endonuclease may be suitable for use in the method provided herein.
  • the endonuclease may be a so-called “restriction endonuclease” or “restriction enzyme”, e.g. a Type I, Type II, Type III, Type IV or Type V restriction endonuclease.
  • a preferred restriction endonuclease is a Type II restriction endonuclease, preferably Type IIP or Type IIS.
  • the enzyme used in step c3) is preferably a different restriction endonuclease.
  • the nucleic acid molecule may be cleaved by a site-directed nuclease.
  • a site-directed nuclease may be selected from the group consisting of a TALEN, an RNA-guided CRISPR nuclease, a zinc finger nuclease and a meganuclease.
  • the site-directed nuclease is a TALEN or an RNA-guided CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) nuclease.
  • the CRISPR nuclease is a Type II CRISPR-nuclease, e.g., Cas9 (e.g., the protein of SEQ ID NO: 12, encoded by SEQ ID NO: 13, or the protein of SEQ ID NO: 14) or a Type V CRISPR-nuclease, e.g. Cpf1 (e.g., the protein of SEQ ID NO: 15, encoded by SEQ ID NO: 16) or Mad7 (e.g.
  • the site-directed nuclease is a Type II CRISPR-nuclease, preferably a Cas9 nuclease.
  • the skilled person knows how to obtain the site-directed nuclease for use in the method of the invention, such as a TALEN or a CRISPR-nuclease.
  • the site-directed nuclease is a CRISPR-nuclease being either a nickase or (endo)nuclease.
  • the site-directed nuclease for use in the method of the invention may comprise or consist of a whole type II or type V CRISPR-nuclease or variant or functional fragment thereof.
  • the site-directed nuclease is a Cas9 protein.
  • the Cas9 protein may be derived from the bacteria Streptococcus pyogenes (SpCas9; NCBI Reference Sequence NC_017053.1; UniProtKB - Q99ZW2), Geobacillus thermodenitrificans (UniProtKB - A0A178TEJ9), Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1);
  • Cas9 variants from these having an inactivated HNH or RuvC domain homologous to SpCas9, e.g. SpCas9_D10A or SpCas9_H840A, or a Cas9 having equivalent substitutions at positions corresponding to D10 or H840 in the SpCas9 protein, rendering a nickase.
  • the site-directed nuclease may be, or may be derived from, Cpf1, e.g. Cpf1 from Acidaminococcus sp; UniProtKB - U2UMQ6.
  • the variant may be a Cpf1-nickase having an inactivated RuvC or NUC domain, wherein the RuvC or NUC domain no longer has nuclease activity.
  • the skilled person is well aware of techniques available in the art such as site-directed mutagenesis, PCR-mediated mutagenesis, and total gene synthesis that allow for inactivated nucleases such as inactivated RuvC or NUC domains.
  • An example of a Cpf1 nickase with an inactive NUC domain is Cpf1 R1226A (see Gao et al. Cell Research (2016) 26:901-913; Yamano et al. Cell (2016) 165(4):949-962).
  • R1226A arginine to alanine
  • the site-directed nuclease may be, or may be derived from, CRISPR-CasΦ, a nuclease that is about half the size of Cas9.
  • CRISPR-CasΦ uses a single crRNA for targeting and cleaving the nucleic acid as is described e.g. in Pausch et al. (CRISPR-CasΦ from huge phages is a hypercompact genome editor, Science (2020); 369(6501):333-337).
  • an active, partly inactive or a dead site-directed nuclease, preferably an active, partly inactive or a dead CRISPR-nuclease complex, may serve to guide a fused functional domain to a specific site in the nucleic acid molecule as determined by the guide RNA.
  • the site-directed nuclease may be fused to a functional domain.
  • such functional domain is an endonuclease domain.
  • an inactive CAS protein for use in a method as defined herein, e.g. dCas9 or dCpf1, is fused to a restriction enzyme such as, but not limited to, FokI or Clo51, preferably as described in WO2014/144288, WO2016/205554, Tsai et al. Nat Biotechnol. 2014 Jun; 32(6): 569-576, or Cheng et al. Biotechnol J. 2022 Jul; 17(7): e2100571, all of which are incorporated herein by reference.
  • the fusion protein has a sequence of any one of SEQ ID NO: 19 - 21, or a sequence having at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity with any one of SEQ ID NO: 19 - 21.
  • the nucleic acid molecule comprising two closed ends may be cleaved with a guide RNA-CAS complex.
  • the guide RNA directs the complex to a defined target site in a double-stranded nucleic acid molecule, also named the protospacer sequence.
  • the guide RNA comprises a sequence for targeting the site-directed nuclease complex to a protospacer sequence that is preferably near, at or within a sequence of interest in the double-stranded nucleic acid molecule.
  • the guide RNA may be a single guide (sg)RNA molecule, or the combination of a crRNA and a tracrRNA (e.g. for Cas9) as separate molecules, or a crRNA molecule only (e.g. in case of Cpf1 and CasΦ).
  • the guide RNA is a single guide (sg)RNA (e.g. for Cas9) or a crRNA only (e.g. in case of Cpf1 and CasΦ).
  • the guide RNA for use in a method provided herein may comprise a sequence that can hybridize to a sequence in the double-stranded nucleic acid molecule, preferably to or near a sequence of interest, preferably a sequence of interest as defined herein.
  • the guide RNA preferably comprises a nucleotide sequence that is fully complementary to a sequence located in the double-stranded nucleic acid molecule, optionally fully complementary to a sequence located in the sequence of interest, i.e. the sequence of interest may comprise a protospacer sequence.
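Candidate target sites for a guide RNA can be located with a simple search. The sketch below assumes SpCas9, which requires an NGG PAM immediately 3’ of the protospacer, and writes the guide's targeting sequence as DNA; the 20-nt spacer and the test sequence are hypothetical. It reports exact matches only and ignores mismatch tolerance.

```python
import re

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_cas9_sites(dna, spacer):
    """Return (strand, start) for each exact spacer match followed by NGG.

    Positions on the "-" strand are given in reverse-complement coordinates.
    """
    hits = []
    pattern = f"(?={re.escape(spacer)}[ACGT]GG)"  # lookahead finds overlapping hits
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in re.finditer(pattern, seq):
            hits.append((strand, match.start()))
    return hits

# Hypothetical 20-nt spacer and target molecule
spacer = "GACGTTAGCCATGGATCCAA"
with_pam = "TT" + spacer + "TGG" + "AA"
without_pam = "TT" + spacer + "TAA" + "AA"

sites = find_cas9_sites(with_pam, spacer)
no_sites = find_cas9_sites(without_pam, spacer)
```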
  • the nucleic acid molecule comprising two closed ends can be cleaved with a site-directed endonuclease, wherein the site-directed endonuclease is a TALEN.
  • TALENs are well-known to the person skilled in the art and are constructed by fusing a TAL effector DNA binding domain (TALE) to an effector domain, preferably a (non-specific) DNA cleavage domain, such as a FokI cleavage domain.
  • TALE transcriptional activator like effector DNA binding domain
  • Transcription activator-like effector nucleases are fusions of a restriction endonuclease cleavage domain, preferably a FokI domain, with a DNA-binding transcription activator-like effector (TALE) repeat array.
  • Other useful endonuclease domains may include, for example, HhaI, HindIII, NotI, BbvCI, EcoRI, BglII and AlwI.
  • the cleavage domain is a FokI domain.
  • the method as detailed herein further comprises a step d) of sequencing both strands of at least part of the double-stranded nucleic acid molecule in a single sequencing reaction to generate a duplex read.
  • the double-stranded nucleic acid obtained after step c) of the method provided herein comprises one open end and one closed end. Using a single sequencing reaction, both strands are sequenced directly after each other, generating a duplex read.
  • Prior to sequencing, a further adapter can be attached to the nucleic acid molecule.
  • the further adapter is preferably attached to the open end of the double-stranded nucleic acid molecule.
  • the further adapter may be an adapter suitable for amplification and/or sequencing.
  • a sequencing adapter is ligated to the open end of the double-stranded nucleic acid after step c) and prior to step d).
  • the additional adapter may be a sequencing adapter, e.g.
  • the further adapter allows for Oxford Nanopore Technologies (ONT) sequencing.
  • the further adapter comprises at least one sequencing primer binding site and/or the further adapter comprises at least one amplification primer binding site.
  • the further adapter may comprise at least two sequencing primer binding sites and/or the further adapter may comprise at least two amplification primer binding sites.
  • the additional adapter may be a single-stranded, double-stranded, partly double-stranded, Y-shaped or a hairpin nucleic acid molecule.
  • the adapter is a (partly) double-stranded adapter comprising two open ends.
  • the further adapter comprises an identifier sequence, preferably an identifier sequence as defined herein.
  • the double-stranded nucleic acid molecule comprising one open and one closed end may be repaired, preferably by polishing the ends of the nucleic acid molecule.
  • the method as provided herein may optionally comprise a step of polishing the open end of the nucleic acid molecule.
  • the nucleic acid molecule comprising one open and one closed end may be modified to comprise an A-overhang, preferably to facilitate ligation to the further adapter, wherein the further adapter preferably comprises a T-overhang.
  • the method of the invention may optionally comprise a step of A-tailing the, optionally polished, nucleic acid molecule.
  • in step d) of the method of the invention, the double-stranded nucleic acid molecule comprising one open end and one closed end is sequenced to generate a duplex read.
  • Part of the nucleic acid molecule may be sequenced, wherein the sequenced part comprises part of the forward strand and part of the complementary reverse strand.
  • a duplex read is a sequencing read comprising a sequence of the forward strand of a double-stranded nucleic acid molecule, followed by the sequence of the reverse strand of the same double-stranded nucleic acid molecule.
  • the double-stranded nucleic acid molecule comprising one open and one closed end is denatured, such that the forward and reverse strand form a single-stranded template, connected by the closed end of the nucleic acid molecule.
  • the nucleotides forming the closed end are thus located in between the sequence of the forward strand and the sequence of the reverse strand.
  • the sequences forming the closed end are in the middle of the single-stranded template.
  • the single-stranded nucleic acid molecule or “opened double-stranded nucleic acid molecule” is sequenced in a single sequencing reaction.
  • a generated duplex read thus preferably comprises the following three successive sequences: a first sequence comprising (part of) the forward strand, a second sequence comprising the two nucleotides forming the closed end of the double-stranded nucleic acid molecule, and a third sequence comprising (part of) the sequence of the reverse strand.
  • the second sequence preferably further comprises the protelomerase recognition site, or a part thereof.
  • a generated duplex read preferably comprises a first sequence comprising (part of) the forward strand, a second sequence comprising the two nucleotides forming the closed end of the double-stranded nucleic acid molecule, and a third sequence comprising the reverse complement sequence of the first sequence.
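The three-part layout of a duplex read can be illustrated by splitting a read at the closed-end sequence. In the sketch below the closed-end sequence `TACGCGTA` is a hypothetical placeholder and not the actual protelomerase-derived sequence; the read is assumed to be error-free and to contain the closed-end sequence exactly once.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def split_duplex_read(read, closed_end_seq):
    """Split a duplex read into (forward, closed end, reverse) parts.

    Returns None when the closed-end sequence is not found.
    """
    idx = read.find(closed_end_seq)
    if idx == -1:
        return None
    forward = read[:idx]
    reverse = read[idx + len(closed_end_seq):]
    return forward, closed_end_seq, reverse

# Hypothetical duplex read: forward strand, closed end, reverse strand
CLOSED_END = "TACGCGTA"  # placeholder for the sequence spanning the closed end
forward_strand = "ACGGTCA"
duplex_read = forward_strand + CLOSED_END + reverse_complement(forward_strand)

first, second, third = split_duplex_read(duplex_read, CLOSED_END)
```

For an error-free read, the third part is the reverse complement of the first, which is the relationship exploited when a consensus is generated.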
  • the complete nucleic acid molecule may be sequenced.
  • the prepared nucleic acid molecule is deep-sequenced.
  • Sequencing may include at least one of ILLUMINA™, SOLEXA™ sequencing, Ion Torrent sequencing, the Pacific Biosciences' SMRT™ sequencing, Pacific Biosciences Sequencing By Binding (SBB), Pacific Biosciences Onso system, Sanger sequencing, Genapsys, Polonator polony sequencing, Oxford Nanopore Technologies (ONT), Ontera sequencing, Singular Genomics, Element Biosciences and Complete Genomics sequencing.
  • the sequencing of step d) of the method provided herein is nanopore sequencing.
  • Nanopore sequencing technologies include but are not limited to Oxford Nanopore sequencing technologies (e.g., GridION, MinION) and Genia sequencing technologies (see for example Logsdon et al., Nat. Rev. Genet. 2020; 21(10):597-614).
  • the prepared nucleic acid molecule can be sequenced by nanopore selective (“Read Until”) sequencing.
  • the generated data is compared to one or more reference sequence(s).
  • sequencing will proceed, if not, the current is reversed thereby removing the nucleic acid from the pore and making the pore available for sequencing of a new nucleic acid.
  • the set number of nucleotides may be at least the first 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, or 500 nucleotides of the nucleic acid read.
  • the one or more reference sequences may be a multitude of different sequences.
  • each of these reference sequences is at least 50, 60, 70, 80, 90, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identical to the sequence of the nucleic acid molecule obtained by the method of the invention.
  • each of the reference sequences is at least 50, 60, 70, 80, 90, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identical to the sequence of a particular subset of nucleic acid molecules obtained by the method of the invention.
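The accept-or-eject decision of selective sequencing can be sketched as a comparison of the first nucleotides of a read against the reference sequences. This is a simplified illustration on already base-called text with an ungapped comparison; it does not use any sequencer API, and the function name, the identity threshold and the sequences are hypothetical.

```python
def read_until_decision(read_prefix, references, min_identity=0.9):
    """Return "continue" if the prefix matches any reference window, else "eject"."""
    n = len(read_prefix)
    if n == 0:
        return "eject"
    for ref in references:
        # Slide the prefix along the reference and count identical positions
        for start in range(len(ref) - n + 1):
            window = ref[start:start + n]
            matches = sum(a == b for a, b in zip(read_prefix, window))
            if matches / n >= min_identity:
                return "continue"
    return "eject"

# Hypothetical reference and read prefixes
references = ["AAACCCGGGTTTACGTACGT"]
on_target = read_until_decision("CCCGGGTTTA", references)
off_target = read_until_decision("TTTTTTTTTT", references)
```

Real reads contain insertions and deletions, so a practical implementation would rely on an alignment method rather than position-by-position comparison.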
  • One of the benefits of selectively sequencing a particular subset by nanopore selective sequencing is that in different sequencing runs, different subsets may be sequenced using the prepared library of nucleic acid molecules having one open end and one closed end.
  • the sequence read lengths obtained by long-read sequencing in the method of the invention can be at least about 100 bp, 200 bp, 300 bp, 400 bp, 500 bp, 600 bp, 700 bp, 800 bp, 900 bp, 1 kb, 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, 100 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 800 kb, 900 kb, 1 Mb, 2 Mb, 3 Mb, 4 Mb, 5 Mb, 6 Mb, 7 Mb, 8 Mb, 9 Mb, or 10 Mb.
  • the method provided herein further comprises a step e) of generating a consensus sequence from the duplex read.
  • the duplex read comprises the sequence of the forward strand as well as the sequence of the reverse strand of the same double-stranded nucleic acid molecule.
  • the obtained sequence of the forward strand and the reverse strand can be combined or “collapsed” into a consensus sequence.
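One simple way to collapse the two strands of a duplex read is sketched below. It assumes that the forward and reverse parts have already been separated, have equal length and contain no indels, and that a per-base quality value is available; on disagreement it keeps the base with the higher quality. This is an illustrative rule, not necessarily the consensus procedure used in practice, and all names and values are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def duplex_consensus(forward, reverse, fwd_qual, rev_qual):
    """Collapse the forward and reverse strand reads into one consensus."""
    rev_as_fwd = reverse_complement(reverse)
    rev_qual_as_fwd = rev_qual[::-1]  # qualities follow the reversed orientation
    consensus = []
    for f, r, qf, qr in zip(forward, rev_as_fwd, fwd_qual, rev_qual_as_fwd):
        if f == r:
            consensus.append(f)                    # strands agree
        elif qf != qr:
            consensus.append(f if qf > qr else r)  # trust the better base call
        else:
            consensus.append("N")                  # unresolved disagreement
    return "".join(consensus)

# Hypothetical duplex read with one low-quality error in the forward strand
forward = "ACGTA"
reverse = "TAGGT"
result = duplex_consensus(forward, reverse, [30, 30, 10, 30, 30], [30] * 5)
```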
  • the method of the invention improves the sequencing accuracy, preferably to a modal accuracy of about Q30.
  • An accuracy of Q30 corresponds to a probability of an incorrect base call of 1 in 1000, i.e. the base call accuracy is 99.9%.
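The figure follows from the standard Phred definition of a quality score, where P is the probability of an incorrect base call:

```latex
\[
Q = -10 \log_{10} P
\quad\Longleftrightarrow\quad
P = 10^{-Q/10},
\qquad
Q = 30 \;\Rightarrow\; P = 10^{-3} = 0.001,
\quad 1 - P = 99.9\%.
\]
```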
  • the consensus sequence can optionally be aligned or compared to a known nucleotide sequence, such as, but not limited to, a known genomic sequence.
  • Generating a consensus sequence from the duplex read thus results in the determination of the sequence in the double-stranded nucleic acid molecule, preferably with a high accuracy, preferably with a modal accuracy of about Q30.
  • the method is performed for a plurality of samples.
  • the method of the invention is multiplexed, i.e. applied simultaneously for multiple nucleic acid samples, such as for at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500, 1000 or more nucleic acid samples.
  • the method may thus be performed in parallel on a plurality of samples, wherein “in parallel” is to be understood herein as substantially simultaneously but each sample being processed in a separate reaction tube or vessel.
  • one or more steps of the method provided herein may be performed on a plurality of pooled samples.
  • the plurality of samples are preferably pooled prior to step d).
  • the plurality of samples are pooled prior to step c).
  • the nucleic acid molecules may be tagged with an identifier prior to pooling the samples.
  • an identifier can be any detectable entity, such as, but not limited to, a radioactive or fluorescent label, but preferably is a particular nucleotide sequence or combination of nucleotide sequences, preferably of defined length.
  • the samples can be pooled using a clever pooling strategy, such as, but not limited to, a 2D or 3D pooling strategy, such that after pooling each sample is encompassed in at least two or three pools, respectively.
  • a particular nucleic acid molecule can be traced back to the originating sample by using the coordinates of the respective pools comprising the double-stranded nucleic acid molecules.
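The trace-back by pool coordinates can be illustrated with a 2D layout in which every sample sits in exactly one row pool and one column pool. The sketch below is only an illustration under that assumption; the grid layout, the sample names and the functions `build_2d_pools` and `trace_back` are hypothetical and not taken from the description.

```python
from itertools import product


def build_2d_pools(samples, n_cols):
    """Place samples on a grid and return (row_pools, col_pools).

    Each sample ends up in exactly one row pool and one column pool.
    """
    row_pools, col_pools = {}, {}
    for index, sample in enumerate(samples):
        row, col = divmod(index, n_cols)
        row_pools.setdefault(row, set()).add(sample)
        col_pools.setdefault(col, set()).add(sample)
    return row_pools, col_pools


def trace_back(row_pools, col_pools, positive_rows, positive_cols):
    """Return the samples at the intersections of positive row and column pools."""
    candidates = set()
    for row, col in product(positive_rows, positive_cols):
        candidates |= row_pools[row] & col_pools[col]
    return candidates


if __name__ == "__main__":
    samples = [f"S{i}" for i in range(12)]  # hypothetical sample names
    rows, cols = build_2d_pools(samples, n_cols=4)
    # A molecule observed in row pool 1 and column pool 2 points to one sample.
    print(trace_back(rows, cols, positive_rows={1}, positive_cols={2}))
```

With 12 samples, this layout needs 7 pools (3 rows and 4 columns) instead of 12 separate reactions. If several samples carry the same molecule, the intersections become ambiguous, which is one reason a third pooling dimension may be added.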
  • the plurality of samples may be pooled prior to step d) and/or prior to step c).
  • the nucleic acid sample may be purified and/or the reaction enzyme may be inactivated.
  • a purification step, e.g. an AMPure bead-based purification process, may be included to remove complexes, enzymes, free nucleotides, possible free adapters, and possible small, non-relevant nucleic acid molecules.
  • the nucleic acid molecule may be recovered after purification and subjected to further processing and/or analysis, such as single-molecule sequencing.
  • An optional purification step is a proteinase K treatment.
  • said purification may comprise the following steps: I. exposing the nucleic acid sample to one or more solid supports that specifically and effectively bind the nucleic acid molecule; and optionally,
  • the one or more solid supports may be, but are not limited to, AMPure beads. After purification, at least one purified nucleic acid molecule is obtained.
  • the method of the invention may further comprise a size-selection step.
  • As the remainder of the protelomerase recognition site, cut loose from the nucleic acid molecule in step c), is also closed upon contact with the protelomerase in step c), these (small) fragments are preferably removed prior to sequencing.
  • the size-selection step is performed prior to step d) of the method of the invention. Alternatively, there is no further purification, inactivation and/or size selection step.
  • the method provided herein is a sequencing method that is free of amplification and/or cloning steps. Reduction of amplification steps is beneficial, as epigenetic information (e.g., 5-mC, 6-mA, etc.) is lost in amplicons. Furthermore, amplification can introduce variations in the amplicons (e.g., via errors during amplification) such that their nucleotide sequence is not reflective of the original sample.
  • kits of parts for performing the method described herein.
  • the kit of parts is for use in a method as defined herein.
  • the kit of parts comprises at least one or more adapters comprising a protelomerase recognition site as defined herein.
  • a plurality of adapters may be combined in one vial or may be present in separate vials, e.g. wherein the adapters of one vial comprise the same identifier sequence, preferably the same sample identifier sequence.
  • the kit of parts comprises at least one or more primers comprising a protelomerase recognition sequence as defined herein.
  • a plurality of primers may be combined in one vial or may be present in separate vials, e.g. wherein the primers of one vial comprise the same identifier sequence, preferably the same sample identifier sequence.
  • the kit of parts may further comprise a vial comprising a protelomerase as defined herein.
  • the kit of parts may comprise one or more reagents for performing an ONT sequencing reaction.
  • the kit of parts may comprise at least one of: one or more vials comprising adapters comprising a protelomerase recognition site as defined herein; one or more vials comprising primers comprising a protelomerase recognition site as defined herein; one or more vials comprising a further adapter as defined herein for sequencing, preferably for an ONT sequencing reaction; and one or more vials comprising a protelomerase as defined herein.
  • the kit of parts may further comprise at least one of: - one or more vials comprising a restriction endonuclease;
  • the kit comprises at least 2, 4, 10, 20, 30, or 50 vials comprising one or more gRNAs as defined herein.
  • the volume of any of the vials within the kit does not exceed 100 mL, 50 mL, 20 mL, 10 mL, 5 mL, 4 mL, 3 mL, 2 mL or 1 mL.
  • the reagents may be present in lyophilized form, or in an appropriate buffer.
  • the kit may also contain any other component necessary for carrying out the present invention, such as buffers, pipettes, microtiter plates and written instructions. Such other components for the kits of the invention are known to the skilled person.
  • FIG. 1 The figure shows the alignment of an exemplary duplex read, obtained after ONT sequencing of a double-stranded nucleic acid having an open end and a closed end.
  • the top strand (SEQ ID NO: 22) is the duplex (sequence) read comprising the forward strand (dashed forward arrow), the protelomerase recognition sequence (SEQ ID NO: 23, underlined) and the reverse strand (dashed reversed arrow).
  • the bottom strand shows part of the reference (lambda) sequence (SEQ ID NO: 24 and 25).
  • the nucleotides below the alignment indicate the nucleotides present in the reference sequence, but not present in the duplex read (see also e.g. SEQ ID NO: 28).
  • the symbol indicates nucleotides present in the duplex read, but not present in the reference sequence.
  • the lambda sequence is shown in SEQ ID NO: 29 and 30.
  • a sequencing library was prepared from a lambda DNA-HindIII digest.
  • adapters comprising a protelomerase recognition site were added to both ends of the nucleic acid molecules and subsequent TelN treatment resulted in covalently closing the ends of the nucleic acid molecules (as confirmed by the resistance to Exo V treatment).
  • Subsequent PvuI digestion resulted in opening of one end of the nucleic acid molecule, providing nucleic acid molecules having one open and one closed end.
  • the nucleic acid molecules were sequenced on the nanopore deep-sequencing platform.
  • the double-stranded TelN adapter was generated as follows:
  • a double-stranded TelN adapter (25 µM) was formed by mixing 5 µl of the top strand (100 µM) and 5 µl of the bottom strand (100 µM) in the presence of 10 µl of 50 mM Tris/HCl (pH 7.5). The mixture was heated in a thermal cycler to 90 °C for three minutes, followed by slow cooling at 0.1 °C per 5 seconds to 37 °C.
  • the lambda DNA (previously digested with HindIII) was polished and a 3’ A-overhang was added to the digested sample:
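The stated adapter concentration and the length of the cooling ramp can be checked with a short calculation. The sketch below takes the concentrations in µM and the volumes in µl; the dilution ratio itself does not depend on the unit prefix. The function names are illustrative only.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Concentration of one component after dilution into total_volume."""
    return stock_conc * stock_volume / total_volume


def ramp_duration_s(start_c, end_c, step_c, seconds_per_step):
    """Total time of a stepwise temperature ramp, in seconds."""
    steps = round(abs(start_c - end_c) / step_c)
    return steps * seconds_per_step


if __name__ == "__main__":
    # 5 (top strand) + 5 (bottom strand) + 10 (Tris/HCl buffer) = 20 in total.
    total_volume = 5 + 5 + 10
    strand_conc = final_concentration(100, 5, total_volume)
    cooling_s = ramp_duration_s(90, 37, step_c=0.1, seconds_per_step=5)
    print(f"each strand: {strand_conc} (same unit as the stock)")
    print(f"cooling ramp: {cooling_s} s, about {cooling_s / 60:.0f} min")
```

Each strand is diluted fourfold, from 100 to 25, which matches the stated duplex concentration if the strands anneal one-to-one. The ramp from 90 °C to 37 °C covers 530 steps of 0.1 °C, so the cooling takes 2650 seconds, or roughly 44 minutes.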
  • the adapter comprising a protelomerase recognition site was ligated to the purified sample.
  • the product of 12 reactions was pooled and purified by adding 2 volumes of AMPure PB beads according to the manufacturer’s instructions (Pacific Biosciences; 100-265-900), and eluted in 37 µl nuclease-free water. The concentration was measured using the Qubit fluorometer.
  • the ends of the nucleic acids were closed using the protelomerase TelN.
  • the TelN treated products were exposed to the exonuclease V.
  • the product of 3 reactions was pooled, purified by adding 2 volumes of AMPure PB beads according to the manufacturer’s instructions (Pacific Biosciences; 100-265-900), and eluted in 11 µl nuclease-free water. The concentration was measured using the Qubit fluorometer.
  • the nucleic acid sample was incubated with the PvuI restriction enzyme.
  • the product was purified by adding 2 volumes of AMPure PB beads according to the manufacturer’s instructions (Pacific Biosciences; 100-265-900), eluted in 50 µl nuclease-free water and pooled.
  • the product was purified by adding 2 volumes of AMPure PB beads according to the manufacturer’s instructions (Pacific Biosciences; 100-265-900), eluted in 60 µl nuclease-free water and pooled.
  • the processed samples were sequenced on a MinION Mk1B portable sequencing device using flow cell R9.4.1 with real-time basecalling (MinKNOW v 22.05.5) for 72 hours.
  • ONT sequencing of a single DNA strand results in sequencing errors.
  • the base call accuracy of ONT sequencing is in general around Q20 (i.e. an accuracy of about 99%, thus an incorrect base call probability of 1 in 100).
  • the forward and reverse strand do not have the same sequencing errors and these errors are thus removed upon generating a consensus sequence of the forward and reverse strand. Using the method described herein, the sequencing accuracy is significantly improved.
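How a consensus can remove strand-specific errors is sketched below. This is a deliberately simplified illustration and not the consensus procedure of the described method. It assumes that the two strand reads have already been separated at the closed end, that they have equal length, and that they differ only by substitutions. The quality arithmetic (adding scores on agreement, subtracting on disagreement) is a rough heuristic chosen for the example.

```python
# Complement table for reverse-complementing a DNA read.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(forward, forward_q, reverse, reverse_q):
    """Collapse forward and reverse strand reads into one consensus.

    forward, reverse: base calls as read from each strand (5' to 3').
    forward_q, reverse_q: per-base Phred scores in read order.
    Assumes equal lengths and no insertions or deletions (simplification).
    """
    rc = reverse_complement(reverse)
    rc_q = reverse_q[::-1]  # qualities must follow the flipped orientation
    if len(forward) != len(rc):
        raise ValueError("strand reads must be pre-aligned to equal length")
    bases, quals = [], []
    for f_base, f_q, r_base, r_q in zip(forward, forward_q, rc, rc_q):
        if f_base == r_base:
            # Both strands agree: keep the base with increased confidence.
            bases.append(f_base)
            quals.append(f_q + r_q)
        elif f_q >= r_q:
            bases.append(f_base)
            quals.append(f_q - r_q)
        else:
            bases.append(r_base)
            quals.append(r_q - f_q)
    return "".join(bases), quals


if __name__ == "__main__":
    # Hypothetical molecule ACGTAGGC; each strand read carries one
    # low-quality miscall at a different position.
    fwd, fwd_q = "ACTTAGGC", [30, 30, 8, 30, 30, 30, 30, 30]
    rev, rev_q = "GACTACGT", [30, 7, 30, 30, 30, 30, 30, 30]
    consensus, _ = duplex_consensus(fwd, fwd_q, rev, rev_q)
    print(consensus)
```

In this example the miscalls fall on different positions of the two strand reads, so each is overruled by the higher-quality call from the opposite strand. Real duplex reads also contain insertions and deletions, which require an alignment step before any such comparison.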

Abstract

The invention relates to a method for increasing sequencing accuracy, in particular of ONT sequencing. In one aspect, the invention relates to a method for determining a sequence of interest in a double-stranded nucleic acid molecule, the method comprising providing a sample comprising the double-stranded nucleic acid molecule, covalently closing at least one end of the double-stranded nucleic acid molecule to obtain a double-stranded nucleic acid molecule having one open end and one closed end, sequencing both strands of at least part of the double-stranded nucleic acid molecule in a single sequencing reaction to generate a duplex read; and generating a consensus sequence from the duplex read to determine the sequence in the double-stranded nucleic acid molecule. Preferably, the sequencing is nanopore sequencing.
PCT/EP2023/084817 2022-12-08 2023-12-08 Duplex sequencing with covalently closed DNA ends WO2024121354A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP22212141.0 2022-12-08
EP22212141 2022-12-08

Publications (1)

Publication Number Publication Date
WO2024121354A1 true WO2024121354A1 (fr) 2024-06-13

Family

ID=84463290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/084817 WO2024121354A1 (fr) 2022-12-08 2023-12-08 Duplex sequencing with covalently closed DNA ends

Country Status (1)

Country Link
WO (1) WO2024121354A1 (fr)
