WO2013040060A2 - Nucleic acids for multiplex detection of hepatitis c virus - Google Patents

Nucleic acids for multiplex detection of hepatitis c virus

Info

Publication number
WO2013040060A2
Authority
WO
WIPO (PCT)
Prior art keywords
probes
sequence
probe
hcv
collection
Prior art date
Application number
PCT/US2012/054901
Other languages
French (fr)
Other versions
WO2013040060A3 (en
Inventor
Philip Alexander Rolfe
Original Assignee
Pathogenica, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pathogenica, Inc.
Publication of WO2013040060A2
Publication of WO2013040060A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/70: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
    • C12Q1/701: Specific hybridization probes
    • C12Q1/706: Specific hybridization probes for hepatitis
    • C12Q1/707: Specific hybridization probes for hepatitis non-A, non-B Hepatitis, excluding hepatitis D
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2565/00: Nucleic acid analysis characterised by mode or means of detection
    • C12Q2565/50: Detection characterised by immobilisation to a surface
    • C12Q2565/519: Detection characterised by immobilisation to a surface characterised by the capture moiety being a single stranded oligonucleotide
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/16: Primer sets for multiplex assays

Definitions

  • the invention is directed to sets of nucleic acid probes for multiplex detection of hepatitis viruses and methods of using the probes.
  • HCV Hepatitis C Virus
  • Embodiments of the present invention include optimized nucleic acid probes, and methods of using them, that enable the skilled artisan to simultaneously detect HCV from a clinical sample, determine the genotype and sub-genotype (or genotypes) present, and detect the presence of a wide variety of mutations known to confer resistance to various direct acting antiviral (DAA) drugs.
  • the invention is based, at least in part, on the discovery of sequences, from sets of large query hepatitis C virus (HCV) sequences such as whole genomes, which can be used in multiplex diagnostic assays that dramatically reduce assay time and cost, compared to conventional diagnostics.
  • nucleic acids and methods of the invention enable the skilled artisan to identify hepatitis C virus and differentiate between closely related strains thereof based on the sequence of regions containing, for example, single nucleotide polymorphisms (SNPs), insertions, deletions, or indels (sites where a colocalized insertion and deletion has occurred, resulting in a net gain or loss in nucleotides).
  • the nucleic acid probes and methods of the invention may be further multiplexed and used in automated systems, such as microplates, for high throughput processing of large numbers of samples by centralized laboratory, hospital, and/or diagnostic facilities.
  • aspects of the invention provide nucleic acid probes and mixtures comprising a plurality of nucleic acid probes capable of circularizing capture of a region of interest.
  • aspects of the invention include a single-stranded nucleic acid probe comprising a nucleic acid sequence of the formula:
  • A and C are probe arms designed to hybridize, potentially with some number of mismatches, to a target nucleic acid sequence.
  • a list of probe arms used in this invention is included in tables 1-4.
  • B is a backbone sequence.
  • the backbone sequence B comprises a cleavage site.
  • the cleavage site is a restriction endonuclease recognition site.
  • the backbone sequence B comprises one or more detectable moieties.
  • the one or more detectable moieties are each independently selected from a barcode sequence and a primer-binding sequence.
  • the backbone contains non-Watson-Crick nucleotides, including, for example, abasic furan moieties, and the like.
  • the backbone sequence B includes one or more primer binding sites that can be used to amplify the circularized probe sequence using rolling circle amplification, PCR, or similar techniques.
  • the primer sequences also contain adapters such that the amplification product can be sequenced on a particular next-generation sequencing platform such as the Ion Torrent PGM, Ion Torrent Proton, Illumina MiSeq, Illumina HiSeq, Solid XL, 454 GS, or a nanopore platform.
  • one or more of the primer sequences contains a barcode sequence such that several samples can be amplified each with a unique barcode and the sequencing reads de-multiplexed during analysis.
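  • As a minimal sketch of the de-multiplexing step described above, the following Python assigns sequencing reads to samples by a leading barcode. It assumes, purely for illustration, that the barcode occupies the first bases of each read, that all barcodes share one length, and that matching is exact. The function name, the barcode sequences, and the sample names are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by an exact-match barcode found at the start of each read."""
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    n = lengths.pop()
    bins = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:n])
        if sample is None:
            # Keep the full read so unmatched barcodes can be inspected later.
            bins["unassigned"].append(read)
        else:
            # Strip the barcode before downstream analysis.
            bins[sample].append(read[n:])
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = ["ACGTACGGATTC", "TGCATGCCAGTA", "GGGGGGTTTTTT"]
binned = demultiplex(reads, barcodes)
```

  • A production pipeline would typically tolerate a bounded number of barcode mismatches; exact matching is used here only to keep the sketch short.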
  • each probe arm sequence in the group consisting of columns 2 and 3 from tables 1-4 is contained in at least one probe in the plurality of probes.
  • the plurality of probes includes a subset of the probes in tables 1-4, where the probes in the subset have been chosen to detect a specific set of drug resistance mutations or to determine a subset of viral genotypes.
  • the composition comprising a plurality of probes comprises extracted nucleic acids from a test sample.
  • the extracted nucleic acids may be from a biological sample.
  • the biological sample may be from a human patient.
  • the composition comprising a plurality of probes comprises at least one sample internal calibration standard nucleic acid. In some embodiments, the composition comprising a plurality of probes comprises at least one probe that specifically hybridizes with the sample internal calibration standard nucleic acid.
  • kits comprising the composition comprising a plurality of probes as described herein and instructions for use.
  • the kit may also comprise reagents for obtaining a sample (e.g., swabs), and/or reagents for extracting RNA, and/or enzymes, such as polymerase and/or ligase to capture a region of interest.
  • aspects of the invention include a method of detecting the presence of one or more strains of hepatitis C virus in a test sample, comprising:
  • aspects of the invention also include a method of detecting the genotype of one or more strains of hepatitis C virus (HCV) in a test sample, comprising:
  • a region of interest is captured by polymerase-dependent extension from the 3' terminus of sequence C of a probe in the plurality of probes. In further embodiments, the region of interest is captured by sequence-specific ligation of a linking oligonucleotide.
  • the method of detecting the presence of one or more strains of hepatitis C virus in a test sample or of detecting the genotype of one or more strains of HCV in a test sample includes the step of amplifying the circularized probe to form a plurality of amplicons containing the captured region or regions of interest.
  • the method of detecting the presence of one or more strains of hepatitis virus in a test sample or of detecting the genotype of one or more strains of HCV in a test sample includes the step of treating the mixture with a nuclease to remove linear nucleic acids between the steps of capturing and detecting a region of interest (steps (b) and (c) of each of the methods described above).
  • the method includes the step of linearizing the circularized probe by cleavage with a site-specific endonuclease.
  • the method of detecting the presence of one or more strains of hepatitis C virus in a test sample includes the step of sequencing the region of interest.
  • the method of detecting the presence of one or more strains of hepatitis C virus in a test sample further includes the step of comparing the sequence of the captured region of interest to the sequence of known HCV genomes.
  • the method of detecting the presence of one or more strains of HCV or of determining the genotype of the one or more strains includes the step of comparing the sequence of the captured region to the predicted capture regions of previously sequenced HCV genomes or partial genomes.
  • the method of detecting the presence of one or more strains of hepatitis C virus in a test sample includes the step of analyzing the sequence of the captured region of interest with respect to the sequence of known hepatitis C virus genomes and a model of sequencing errors to estimate the proportions or abundances of the hepatitis C strains in the test sample.
  • the method of detecting the genotype of one or more strains of HCV in a test sample includes the step of comparing the sequence of the captured region of interest to a database of known HCV mutations.
  • the database of known HCV mutations is a database of known HCV drug resistance mutations.
  • the method of detecting the genotype of one or more strains of HCV in a test sample includes the step of analyzing the sequence of the captured region of interest with respect to the sequence of known HCV genomes and a model of sequencing errors to estimate the proportions or abundances of one or more strains of HCV in a test sample. In some embodiments, the method of detecting the genotype of one or more strains of HCV in a test sample includes the step of analyzing the sequence of the captured region of interest with respect to the sequence of known HCV genomes and mutations and a model of sequencing errors to estimate the proportions or abundances of one or more mutations in a test sample.
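  • The disclosure does not spell out its model of sequencing errors. As a hedged illustration only, the sketch below computes an alternate-allele frequency at one locus and the probability that at least the observed number of alternate reads would arise from sequencing error alone, under a simple binomial model with an assumed per-base error rate. The function names, the error rate, and the read counts are assumptions introduced here, and the binomial model is a stand-in rather than the method of the invention.

```python
from math import comb


def alt_allele_frequency(alt_reads, depth):
    """Fraction of reads at a locus that carry the alternate allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def error_only_tail(alt_reads, depth, error_rate):
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate)."""
    below = sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(alt_reads)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - below)


# Hypothetical locus: 30 alternate reads out of 1000, assumed 1% error rate.
frequency = alt_allele_frequency(30, 1000)
p_error = error_only_tail(30, 1000, 0.01)
```

  • A small tail probability suggests that the alternate allele is unlikely to be explained by sequencing error alone under these assumptions.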
  • the test sample is obtained from a human subject. In some embodiments, the test sample is blood obtained from a human subject.
  • the method of detecting the presence of one or more strains of hepatitis C virus, of determining the genotype of one or more strains of HCV in a test sample, or detecting mutations in the HCV genome in the test sample includes the step of adding a sample internal calibration standard to the test sample. In some embodiments, the method further comprises the steps of adding a probe that specifically hybridizes with the sample internal calibration standard and detecting the sample internal calibration standard.
  • the method of detecting the presence of one or more strains of hepatitis C virus or of determining the genotype of one or more strains of HCV in a test sample includes the step of formatting the results to inform physician decision making.
  • the formatting includes providing an estimated quantity of one or more HCV genotypes of interest.
  • the formatted results comprise a therapeutic recommendation based on the one or more HCV genotypes detected.
  • the methods of using this invention achieve high specificity, detecting or sequencing only HCV nucleic acid rather than any human nucleic acids that may be present in the sample.
  • when the nucleic acids processed by this method are sequenced on a next-generation sequencing machine, less than 50%, 25%, 10%, or 1% of the resulting sequencing reads, or of the sequencing reads that pass a quality filter, represent human nucleic acids. This specificity is advantageous as it reduces the number of sequencing reads required to achieve a desired depth of sequencing for the HCV genome.
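  • To illustrate why a lower fraction of human reads reduces the sequencing effort, the sketch below estimates the number of reads needed for a desired mean depth. It assumes uniform coverage, that every non-human read maps to the target, and example values (a target of about 9,600 bases and 200-base reads) chosen for illustration; the function and parameter names are hypothetical.

```python
from math import ceil


def reads_required(depth, target_length, read_length, human_fraction):
    """Reads needed for a mean depth, given the fraction of off-target human reads."""
    if not 0 <= human_fraction < 1:
        raise ValueError("human_fraction must be in [0, 1)")
    on_target_fraction = 1.0 - human_fraction
    return ceil(depth * target_length / (read_length * on_target_fraction))


# Example values only: 100x depth over a 9,600-base target with 200-base reads.
half_human = reads_required(100, 9600, 200, 0.5)
quarter_human = reads_required(100, 9600, 200, 0.25)
no_human = reads_required(100, 9600, 200, 0.0)
```

  • Under these assumptions the required read count scales as 1 / (1 - human fraction), so halving the on-target fraction doubles the reads needed.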
  • Figure 1 A schematic diagram of a probe hybridized to the target nucleic acid.
  • the two homologous probe arms are shown, as are the two universal primer binding sites in the backbone that are present in certain embodiments.
  • Figure 2 A schematic of the protocol of an embodiment in which (1) probes are hybridized to target nucleic acids in a sample, (2) a polymerase copies the reverse-complement of the target into the probe molecule and a ligase closes the circle, (3) exonuclease enzymes digest away target and unused probe molecules and (4) a pair of adapter-primers amplify circularized probe molecules containing the copied target region, adding a barcode and next-generation sequencing machine adapters in the process.
  • Figure 3 Specific or degenerate primers initiate the reverse-transcription reaction. Each primer contains a homology region to the RNA and a non-binding tail.
  • Panel B shows the content of the tail: a molecule-specific barcode or dogtag followed by a probe binding site. With such a primer and suitably long dogtag, each cDNA molecule will contain a unique dogtag.
  • Panel C shows the capture of the resulting cDNA molecule (or its complement if the first strand cDNA was later amplified in a PCR reaction) by a probe. The probe binds to a target in the reverse-transcribed RNA and to the probe arm binding site such that the molecule-specific barcode will be captured and sequenced.
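  • The phrase "suitably long dogtag" can be made concrete with a birthday-problem calculation. The sketch below computes the probability that a given number of cDNA molecules all receive distinct dogtags, assuming (an assumption made here, not stated in the disclosure) that each dogtag is drawn independently and uniformly from all 4^L sequences of length L. The function name is hypothetical.

```python
from math import exp, log1p


def prob_all_unique(n_molecules, dogtag_length):
    """Probability that n random dogtags of the given length are all distinct."""
    n_tags = 4**dogtag_length
    if n_molecules > n_tags:
        return 0.0  # more molecules than possible dogtags: a repeat is certain
    # Sum logs of (1 - i / n_tags) to avoid underflow in a long product.
    log_p = sum(log1p(-i / n_tags) for i in range(n_molecules))
    return exp(log_p)


# Example: 1000 molecules tagged with 8-mer versus 12-mer dogtags.
p_8mer = prob_all_unique(1000, 8)
p_12mer = prob_all_unique(1000, 12)
```

  • Longer dogtags raise the probability of uniqueness, which is the design consideration the figure description points to.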
  • Figure 4 The protocol can be performed in a single tube and the resulting material used with any sequencing platform.
  • Figure 5 The distribution of the 436 probes along the HCV genome.
  • the NS3 gene is between roughly 3kb and 4kb, NS5A between roughly 6kb and 7kb, and NS5B from roughly 7kb to 9.5kb.
  • the large number of probes overlapping certain coordinates reflects the enormous genetic diversity between the HCV strains seen in patients around the world. While certain technologies perform poorly on certain strains, the invention described here uses a large set of probes to ensure efficient capture of any sample.
  • Figure 6 Agreement between Sanger Sequencing, another next-generation sequencing approach, and the invention described here (Pathogenica DxSeq) on a subset of clinical samples.
  • Figure 7 Graphical summary of results for the 19 genotype-1a and 11 genotype-1b samples showing the coverage and results of a set of known drug resistance mutations in the NS3, NS5a, and NS5b genes.
  • Figure 8 A list of selected HCV drug resistance mutations and the drugs to which they indicate resistance. Presence of even a small fraction of resistant molecules in a patient sample indicates that the drug should not be prescribed.
  • Figure 9 Agreement in HCV genotype detection between the Versant Genotyping assay and the Pathogenica DxSeq assay disclosed here.
  • Figure 10 The three panes show scatterplots comparing the frequency of alternate alleles detected in an HCV sample between two replicates. Each point represents the frequency of the alternate allele at a single locus in the gene. Because the presence of small numbers of drug resistant viral particles in a patient can predict therapy failure, detection of alternate alleles at low frequency is critical for clinical HCV assays. The invention described here demonstrates a strong correlation between replicates across the entire range of allele frequencies.
  • Figures 11-14 show genotyping results for clinical sample 10 in genes NS3,
  • each column represents a single codon.
  • the vertical coordinate indicates the frequency of the alternate allele at that codon in the clinical sample.
  • the color of the circle indicates whether the alternate allele is a known drug resistance mutation.
  • One aspect of the invention provides mixtures of circularizing “capture” probes suitable for sensitive, rapid, and highly specific detection of one or more hepatitis C viruses in complex samples.
  • “Probe” refers to a linear, unbranched polynucleic acid comprising two homologous probe sequences separated by a backbone sequence, where the first homologous probe sequence is at a first terminus of the nucleic acid and the second homologous probe sequence is at the second terminus of the nucleic acid, and where the probe is capable of circularizing capture of a region of interest of at least 2 nucleotides.
  • “Circularizing capture” refers to a probe becoming circularized by incorporating the sequence complementary to a region of interest.
  • A is a probe arm sequence listed in column 2 of tables 1-4;
  • B is a backbone sequence.
  • Figure 1 shows a schematic of a probe hybridized to a target nucleic acid (either RNA or DNA).
  • embodiments encompass a probe, which includes two homologous probe sequences A and C, each of which may specifically hybridize to a different target sequence in a hepatitis viral genome adjacent to a region of interest.
  • a probe may comprise any one of the pairs of homologous probe sequences in columns 2 and 3 of tables 1-4.
  • the probes may further comprise a backbone sequence, which contains a primer binding site between the homologous probe sequences.
  • the homologous probe sequence at the 3' end of the probe is termed the extension arm and the homologous probe sequence at the 5' end of the probe (probe segment A) is termed the ligation or anchor arm.
  • the probe/target duplexes are suitable substrates for polymerase-dependent incorporation of at least two nucleotides on the probe (on the extension arm), and/or ligase-dependent circularization of the probes (either by circularizing a polymerase-extended probe or by sequence-dependent ligation of a linking
  • Capture reaction refers to a process where one or more probes contacted with a test sample has possibly undergone circularizing capture of a region of interest, wherein the first and second homologous probe sequences in the probe have
  • Capture reaction products refers to the mixture of nucleic acids produced by completing a capture reaction with a test sample.
  • Amplification reaction refers to the process of amplifying capture reaction products.
  • An amplification reaction product refers to the mixture of nucleic acids produced by completing an amplification reaction with a capture reaction product.
  • Figure 2 shows a schematic of a protocol wherein the probe circularizes to capture a region of the target nucleic acid and is then amplified by universal primers in preparation for sequencing.
  • a “homologous probe sequence” is a portion of a probe provided by the invention that specifically hybridizes to a target sequence present in the genome of a hepatitis C virus.
  • the terms “homologous probe sequence,” “probe arm,” “homologous probe arm,” “homer,” and “probe homology region” each refer to homologous probe sequences that may specifically hybridize to target genomic sequences, and are used interchangeably herein.
  • “Target sequence” refers to a nucleic acid sequence on a single strand of nucleic acid in the genome of an organism of interest. In some embodiments, the homologous probe sequences in the probes are the probe pairs listed in tables 1-4.
  • “Hybridizes” refers to sequence-specific interactions between nucleic acids by Watson-Crick base-pairing (A with T or U and G with C). “Specifically hybridizes” means a nucleic acid hybridizes to a target sequence with a Tm of not more than 14 °C below that of a perfect complement to the target sequence.
  • a bridge nucleic acid may be employed, wherein at least a first portion of the bridge nucleic acid is capable of hybridizing to the capture probe, and at least a second portion of the bridge nucleic acid (which may overlap with the first portion) is capable of simultaneously or sequentially hybridizing to the target nucleic acid, thereby enhancing the efficiency of ligation of the capture probe to the target.
  • a probe specifically hybridizes when: a) both homologous probe sequences A and C in the probe hybridize to their respective target sequence with at least 60, 65, 70, 75, 80, 85, 90, 95, or 100% correct pairing across the entire length of the homologous probe sequence; b) the 3'-most homologous probe sequence (also referred to herein as "C") hybridizes with 100% correct pairing in the 8, 7, 6, 5, 4, 3, or 2 bases at the 3' end of probe sequence C; and c) the 5'-most homologous probe sequence (also referred to herein as "A") hybridizes with 100% correct pairing in the first 8, 7, 6, 5, 4, 3, or 2 bases of the 5' end of probe sequence A.
  • a probe specifically hybridizes when: a) both homologous probe sequences A and C in the probe hybridize to their respective target sequence with at least 80% correct pairing across the entire length of the homologous probe sequence, b) homologous probe sequence C hybridizes with 100% correct pairing of the first 6 bases of the 3' end of C; and c) homologous probe sequence A hybridizes with 100% correct pairing of the first 6 bases of the 5' end of A.
  • a probe specifically hybridizes when: a) both homologous probe sequences A and C in the probe hybridize to their respective target sequences with a melting temperature that is within some range (e.g., 10 degrees Celsius given the hybridization buffer) of the perfect match Tm, and b) both homologous probe sequences hybridize with 100% correct pairing over the 6 bases at the external ends.
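  • The pairing-based criteria above (at least 80% correct pairing over each arm, plus 100% pairing over the 6 bases at each external end) can be sketched as a simple check. This sketch assumes an ungapped comparison between each arm and the arm that would pair perfectly with the actual target site; the function names and the toy sequences are hypothetical, and real designs would also consider melting temperature.

```python
def pairing_fraction(arm, perfect_arm):
    """Fraction of positions at which the arm matches the perfectly pairing arm."""
    if len(arm) != len(perfect_arm):
        raise ValueError("ungapped comparison needs equal lengths")
    return sum(a == b for a, b in zip(arm, perfect_arm)) / len(arm)


def arm_passes(arm, perfect_arm, external_end, min_fraction=0.80, end_bases=6):
    """Check overall pairing and exact pairing at the arm's external end."""
    if external_end == "5prime":
        ends_match = arm[:end_bases] == perfect_arm[:end_bases]
    elif external_end == "3prime":
        ends_match = arm[-end_bases:] == perfect_arm[-end_bases:]
    else:
        raise ValueError("external_end must be '5prime' or '3prime'")
    return ends_match and pairing_fraction(arm, perfect_arm) >= min_fraction


def probe_specifically_hybridizes(arm_a, perfect_a, arm_c, perfect_c):
    """Arm A is external at its 5' end; arm C is external at its 3' end."""
    return arm_passes(arm_a, perfect_a, "5prime") and arm_passes(
        arm_c, perfect_c, "3prime"
    )


# Toy 20-base arm, for illustration only.
arm = "ACGTACGTACGTACGTACGT"
internal_mismatch = arm[:10] + "T" + arm[11:]  # one mismatch away from both ends
terminal_mismatch = arm[:-1] + "A"             # mismatch at the 3'-most base
```

  • With these toy inputs, a single internal mismatch is tolerated, while a mismatch in the 3'-terminal bases of arm C fails the check.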
  • Homology between two sequences may be determined by any means known in the art, including pairwise alignment, dot-matrix, and dynamic programming, and in particular embodiments by FASTA (Lipman and Pearson, Science, 227: 1435-41 (1985) and Lipman and Pearson, PNAS, 85: 2444-48 (1988)), BLAST (McGinnis & Madden, Nucleic Acids Res., 32:W20-W25 (2004) (current BLAST reference, describing, inter alia, MegaBlast); Zhang et al., J. Comput.
  • the methods provided by the invention comprise screening candidate sets of sequences by MegaBLAST against one or more annotated genomes.
  • a sequence “specifically hybridizes” when it hybridizes to a target sequence under stringent hybridization conditions.
  • Stringent hybridization conditions refers to hybridizing nucleic acids in 6xSSC and 1% SDS at 65 °C, with a first wash for 10 minutes at about 42 °C with about 20% (v/v) formamide in 0.1xSSC, and a subsequent wash with 0.2xSSC and 0.1% SDS at 65 °C.
  • alternate hybridization conditions can include different hybridization and/or wash temperatures of about 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 66, 67, 68, 69, or 70 °C or other hybridization conditions as disclosed in Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 3rd edition (2001), which is incorporated herein by reference.
  • the hybridization temperature is greater than 60 °C, e.g., 60-65 °C.
  • homologous probe arms must be chosen to match the hybridization conditions, and vice-versa. If homologous probe arms are to be used at a less stringent hybridization condition (e.g. , a lower hybridization temperature or higher salt
  • nucleotides can be removed from the backbone-adjacent ends of the probe arms until the melting temperature of the probe arms is suitable for the
  • the melting temperature of the probe arm can be computed using any software known to those skilled in the art, such as Melting 5.0
  • nucleotides may be added to the backbone-adjacent ends of the probe arms such that the nucleotides match as many desired targets of the probe as possible.
  • the backbone-distal ends of the probe arms cannot be easily modified to change the melting temperatures of the arms, as the nucleotide composition of those ends (typically the terminal two nucleotides) plays a significant role in the efficiency of the polymerase initiation and the ligation reactions.
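  • The trimming strategy described above (remove nucleotides from the backbone-adjacent end only, leaving the backbone-distal end untouched) can be sketched as follows. The sketch uses the Wallace rule, Tm = 2(A+T) + 4(G+C), as a crude stand-in for a proper melting-temperature calculation such as the software named above; that substitution, the function names, the target temperature, and the minimum length are all assumptions made for illustration. For a probe laid out 5'-A-B-C-3', the backbone-adjacent end of arm A is its 3' end and that of arm C is its 5' end.

```python
def wallace_tm(seq):
    """Crude melting-temperature estimate: 2 degrees per A/T, 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def trim_arm(arm, max_tm, arm_role, min_length=12):
    """Trim the backbone-adjacent end of an arm until its estimated Tm <= max_tm."""
    if arm_role not in ("A", "C"):
        raise ValueError("arm_role must be 'A' or 'C'")
    trimmed = arm
    while len(trimmed) > min_length and wallace_tm(trimmed) > max_tm:
        # Arm A abuts the backbone at its 3' end; arm C at its 5' end.
        trimmed = trimmed[:-1] if arm_role == "A" else trimmed[1:]
    return trimmed


# An arm sequence from the probe table, treated here as arm A for illustration.
arm = "ACCAGATACAGGTCGACCGCT"
trimmed_a = trim_arm(arm, max_tm=60, arm_role="A")
```

  • Because only the backbone-adjacent end is shortened, the terminal nucleotides that influence polymerase initiation and ligation are preserved.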
  • An “organism” is any biologic with a genome, including viruses, bacteria, archaea, and eukaryotes including plantae, fungi, protists, and animals.
  • “Region of interest” refers to the sequence between the nearest termini of the two target sequences of the homologous probe sequences in a probe.
  • Homologous probe sequences A and C in a probe provided by the invention can readily be adapted for use as a conventional primer pair in a polymerase chain reaction (PCR) to specifically amplify a region of interest from a viral sequence.
  • "Conventional primer pairs” refers to a pair of linear nucleic acid primers each member of which comprises sequences corresponding to one of the two homologous probe sequences in a probe provided by the invention, which are capable of exponential amplification of a region of interest comprising at least two nucleotides. These conventional primer pairs are encompassed by and are a part of the present invention.
  • a conventional primer pair comprises a first primer comprising the sequence of an extension arm of a circularizing capture probe provided by the invention (i.e., a 3' primer or "C" sequence) and a second primer comprising the reverse complement of a ligation arm of a circularizing capture probe provided by the invention (i.e., a 5' primer or "A" sequence).
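  • The derivation of a conventional primer pair from a probe's two arms, as described above, amounts to keeping the extension arm (C) as is and taking the reverse complement of the ligation arm (A). A minimal sketch follows; the function names and the toy arm sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def conventional_primer_pair(arm_a, arm_c):
    """First primer: extension arm C. Second primer: reverse complement of arm A."""
    return arm_c, reverse_complement(arm_a)


# Toy arm sequences, for illustration only.
first_primer, second_primer = conventional_primer_pair("AAGC", "GGTTAC")
```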
  • the conventional primer pairs comprise a barcode sequence.
  • the conventional primer pairs comprise universal sequences, including, for example, sequences that hybridize to adaptamer primers.
  • the probes and conventional primer pairs provided by the invention may comprise the naturally occurring conventional nucleotides A, C, G, T, and U (in deoxyribose and/or ribose forms) as well as modified nucleotides such as 2'O-Methyl-modified nucleotides (Dunlap et al, Biochemistry. 10(13):2581-7 (1971)), artificial base pairs such as IsodC or IsodG, or abasic furans (such as dSpacer) (Chakravorty, et al. Methods Mol Biol.
  • the 5' or 3' homologous probe sequences of a probe provided by the invention comprise, at their respective termini, a photocleavable blocking group, such as PC-biotin.
  • a probe provided by the invention comprises a photocleavable blocking group at its 5' terminus to block ligation until photoactivation.
  • a probe provided by the invention comprises at its 3' terminus a photocleavable blocking group to block polymerase-dependent extension or n-mer oligonucleotide ligation until photoactivation.
  • the 5'-most nucleotide of a probe provided by the invention comprises an adenylated nucleotide to improve ligation and/or hybridization efficiency. See, e.g., Hogrefe et al., J Biol. Chem. 265 (10): 5561-5566, (1990).
  • the 5' end of the 5' homologous probe region (e.g., the ligation arm) comprises at least one LNA and in still more particular embodiments, the 5' terminal nucleotide is an LNA.
  • the probe molecule is capped with a phosphate group at the 5' end to improve the ligation efficiency.
  • Homologous probe arms may be chosen to capture regions that identify particular HCV genotypes or sub-genotypes.
  • the set of 27 pairs of probe arms listed in tables 1-4 was chosen to distinguish among genotypes 1a, 1b, 1g, 2a, 2b, 3, 6a, 6m, 6k, 6n, 6j, 6i, and 6f. These probe arms were designed against regions that are relatively conserved across 661 instances of the above genotypes but that surround capture regions that are relatively variable.
  • AF169004 3749 3919 ACCAGATACAGGTCGACCGCT GGATGAAGTCTATGGACTTAGC
  • EU155283 3754 3900 AGTAAGAGATAGGCCGGGGCG TGGAGAAGAGTTGTCTGTGAAC
  • EU155315 3607 3762 CCTGGTCTACATTGGTGTACATCT GAAGAGCCCTTCAAGTAGGAG
  • EU256069 3550 3712 GTACATCTGGATGACAGGACCCT CTTTCAAATAAGAGATAGGCCGG
  • FJ435090 3744 3911 GTCACTAGATAGAGATCTGATGCGC ACGAAATCAAGTGCTTTCGC
  • EU155331 6341 6517 ACTCCCTTATACCCACGTTGGC CCATAGCGCCCTAGAATAGTT
  • AF165052 8360 8535 GCTCTGTGAGCGACTTTATGGC AGATAACGACTAGGTCGTCTC
  • EU256064 8330 8494 ACATAAAGCCGCTCTGTGAGCG GATAACGACAAGGTCGTCTCC
  • BBBBBBBB represents a unique barcoding sequence and TxC and AxC indicate a phosphorothioate bond between the T and C or A and C.
  • Homologous probe arms may also be chosen to capture drug resistance mutations. These sets of probes were chosen to capture one or more drug resistance mutations across a set of 661 full or partial HCV genomes that are publicly available. See, e.g., hcv.lanl.gov; HCV sequence database: Kuiken C, Yusim K, Boykin L, Richardson R. The Los Alamos HCV Sequence Database. Bioinformatics (2005), 21(3):379-84. The probe selection process attempted to select three probes that would “work” against every drug resistance mutation listed in Table 5 in each of the 661 genomes. The software considers a probe to work in a strain if it is expected to capture with at least 10% of its maximum efficiency. Table 2 lists the 162 probes designed against drug resistance mutations in the NS3 protease gene. Table 3 lists the 119 probes designed against the NS5a gene. Table 4 lists the 129 probes designed against the NS5b polymerase gene.
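  • The coverage goal described above (three working probes per mutation per genome, where a probe "works" at 10% or more of its maximum efficiency) can be checked with a short routine. The sketch below is not the selection software of the disclosure; it only reports mutation and genome combinations that fall short of the goal. The data layout, the function names, and every probe, mutation, and genome identifier are hypothetical.

```python
def probe_works(efficiency, max_efficiency, threshold=0.10):
    """A probe works in a strain if it reaches the threshold share of its maximum."""
    return max_efficiency > 0 and efficiency >= threshold * max_efficiency


def under_covered(efficiency, max_efficiency, probes_for_mutation, genomes, required=3):
    """List (mutation, genome, working_count) entries with too few working probes."""
    gaps = []
    for mutation, probes in probes_for_mutation.items():
        for genome in genomes:
            working = sum(
                probe_works(efficiency.get((probe, genome), 0.0), max_efficiency[probe])
                for probe in probes
            )
            if working < required:
                gaps.append((mutation, genome, working))
    return gaps


# Hypothetical predicted efficiencies keyed by (probe, genome).
efficiency = {
    ("p1", "genome_1"): 0.9, ("p2", "genome_1"): 0.5, ("p3", "genome_1"): 0.2,
    ("p1", "genome_2"): 0.9, ("p2", "genome_2"): 0.05, ("p3", "genome_2"): 0.3,
}
max_efficiency = {"p1": 1.0, "p2": 1.0, "p3": 1.0}
probes_for_mutation = {"mutation_X": ["p1", "p2", "p3"]}
gaps = under_covered(
    efficiency, max_efficiency, probes_for_mutation, ["genome_1", "genome_2"]
)
```

  • Each reported gap identifies where an additional probe would be needed to meet the three-probe goal under these assumed efficiencies.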
  • the probes provided by the invention include a probe backbone sequence between the first and second homologous probe sequences.
  • the backbone sequence can be at least 15, 20, 25, 30, 35, 40, 45, 50, 70, 90, 100, 120, 140, 150, 160, 180, 200, 400 bases, or more.
  • the backbone sequence may include a detectable moiety.
  • the detectable moiety is a probe-specific sequence, such as a barcode for identification of a specific probe or set of probes.
  • the backbone sequence comprises one or more primer-binding sites.
  • the backbone includes two primer-binding sites.
  • Each backbone primer-binding site may comprise one or more universal sequences that, for example, can be used to amplify all circularized probes in a mixture.
  • the backbone sequence comprises one or more non-Watson-Crick nucleotides.
  • the backbone comprises one or more 2'OMethyl nucleotide residues, artificial base pairs such as IsodC or IsodG, or abasic furans (such as dSpacer), or 2'OMethyl, abasic furans, or LNA nucleotides, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more LNAs or 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% 2'OMethyl, abasic furans, or LNA nucleotides, to confer greater reactivity or inertness in the hybridization reaction, to provide resistance to enzymatic activities such as polymerase-mediated strand displacement or nuclease cleavage, to serve as inhibitors of spurious amplification events, or to act as target sites for trans-acting nucleic acid oligonucleotides such as PCR primers or biotinylated capture probes.
  • the backbone sequence B comprises a cleavage site.
  • the cleavage site is a restriction endonuclease recognition site.
  • the backbone sequence B comprises one or more detectable moieties.
  • the one or more detectable moieties are each independently selected from a barcode sequence and a primer-binding sequence.
  • the backbone contains non-Watson-Crick nucleotides, including, for example, abasic furan moieties, and the like.
  • the backbone comprises the sequence
  • the backbone comprises the sequence GTTGGAGGCTCATCGTTCCTATATTCCACACCACTTATTATTACAGATGTTATGCTCGCAGGTC.
  • barcode is used to refer to a nucleotide sequence that uniquely identifies a molecule or class of related molecules.
  • Suitable barcode sequences that may be used in the probes of the invention may include, for example, sequences corresponding to customized or prefabricated nucleic acid arrays, such as n-mer arrays as described in U.S. Patent No. 5,445,934 to Fodor et al. and U.S. Patent No.
  • the n-mer barcode may be at least 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400 or 500 nucleotides, e.g., from 18 to 20, 21, 22, 23, 24, or 25 nucleotides.
  • the n-mer barcode is from 6 to 8 nucleotides.
  • the n-mer barcode is from 10 to 12 nucleotides.
  • the barcodes include sequences that have been designed to require greater than 1, 2, 3, 4 or 5 sequencing errors for one barcode to be inadvertently read as another.
  • the probe does not contain a barcode, while a primer that is used to amplify a circularized probe contains a barcode.
  • to obtain barcode sequences, for each barcode size K, 4^K random barcodes may be generated from the four DNA nucleotides, A, T, G, C, using a Perl script.
  • This set of barcodes represents the total number of unique sequence combinations possible for a sequence of K length, using 4 nucleotide variations. Barcodes for which one nucleotide comprises 100% of the length, e.g., TTTTTT, are then optionally removed using a pattern-matching Perl script. Further filtering steps may include removal of barcodes which contain runs of nucleotides of >3, e.g., TGGGGT, or runs interrupted by only one nucleotide, for instance, GGGTGG.
  • Barcodes containing palindromes or inverted repeats with a propensity to form secondary structure through self-hybridization may be filtered using a Perl script designed to identify such self-complementarity.
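The enumeration and filtering steps described above can be sketched as follows. The text refers to Perl scripts; this Python version is a hedged illustration under stated assumptions, not the scripts themselves. The names `candidate_barcodes`, `longest_run`, and `reverse_complement` are hypothetical, the self-complementarity filter only removes barcodes that equal their own reverse complement (a simplification of a full secondary structure screen), and the filter for runs interrupted by a single nucleotide is omitted.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_run(seq):
    """Length of the longest stretch of one repeated nucleotide."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def candidate_barcodes(k, max_run=3):
    """Yield the K-mers (out of all 4^K) that pass the simple filters."""
    for bases in product("ACGT", repeat=k):
        bc = "".join(bases)
        if len(set(bc)) == 1:            # one nucleotide is 100% of the length
            continue
        if longest_run(bc) > max_run:    # runs of >3, e.g., TGGGGT
            continue
        if bc == reverse_complement(bc): # perfectly self-complementary
            continue
        yield bc
```

For example, `list(candidate_barcodes(6))` enumerates the 6-mers that survive these filters; stricter structure screens could be added as further `if` tests.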
  • a set of candidate barcodes may be further filtered such that every barcode contains at least some number of base differences compared to any other barcode.
  • barcodes may be selected to be an edit distance of two nucleotides apart (i.e., differing in sequence by two nucleotides) to ensure that a single sequencing error does not cause barcode mis-identification.
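A minimum-distance selection of this kind can be sketched with a greedy pass over the candidates. This is an illustrative assumption about how such a filter might be written, not the procedure actually used; `hamming` and `select_min_distance` are hypothetical names, and Hamming distance (substitutions only, equal-length barcodes) is used here as a simplification of edit distance.

```python
def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def select_min_distance(candidates, min_dist=2):
    """Greedily keep barcodes that differ from every kept barcode by >= min_dist."""
    chosen = []
    for bc in candidates:
        if all(hamming(bc, other) >= min_dist for other in chosen):
            chosen.append(bc)
    return chosen
```

With `min_dist=2`, a single substitution error in a read cannot turn one selected barcode into another. A greedy pass does not guarantee the largest possible set; it only guarantees that the returned set satisfies the distance constraint.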
  • Selection of barcodes that may be utilized in a mixture of probes used to test a sample from a patient may involve selecting a combination of barcodes that will provide >5% and not more than 50% representation of a particular nucleotide at each position in the barcode sequence within the pool. This is achieved by random addition and removal of barcodes to a pooled set until the conditions specified are met using a Perl script. Barcodes for which the reverse complement sequence is also present within the barcode pool may also be eliminated.
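The per-position composition condition described above (each nucleotide represented at more than 5% and not more than 50% at every barcode position within the pool) can be expressed as a simple test that a random add-and-remove search could call repeatedly. The sketch below is illustrative; `is_position_balanced` is a hypothetical name and the search loop itself is not shown.

```python
from collections import Counter


def is_position_balanced(pool, low=0.05, high=0.50):
    """True if every base is >low and <=high of the pool at every position.

    Assumes all barcodes in the pool have the same length.
    """
    if not pool:
        return False
    for pos in range(len(pool[0])):
        counts = Counter(bc[pos] for bc in pool)
        for base in "ACGT":
            frac = counts[base] / len(pool)
            if not (low < frac <= high):
                return False
    return True
```

A search routine would add or remove barcodes at random and stop once this function returns `True` for the current pool.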
  • Suitable barcode sequences include such barcode sequences as set forth in the table below, which illustrates exemplary 3-mer, 4-mer, 5-mer, 6-mer, 7-mer, 8-mer, 9-mer, and 10-mer barcode sequences.
  • Sequences indicated as “1 nucleotide distance” n-mers in Table 3 are illustrative sequences that have a sequence distance of at least 1 from each other, where “distance” refers to the minimum number of sequencing differences between each of the sequences of the same category.
  • “Two nucleotide distance” sequences have a "distance" from each other of at least 2 nucleotides.
  • barcodes used in the probes provided by the invention correspond to those on the Tag3 or Tag4 barcode arrays by AFFYMETRIX™. Further discussion of barcode systems can be found in Frank, BMC Bioinformatics, 10:362 (2009; 13 pages), Pierce et al., Nature Methods, 3:601-03 (2006) (including web supplements), and Pierce et al., Nature Protocols, 2:2958-74 (2007).
  • the barcode is sample-specific, e.g., comprises one or more patient specific barcodes.
  • more than one barcode will be assigned per patient sample, allowing replicate samples for each patient to be performed within the same sequencing reaction.
  • the barcode may be temporal, e.g., a barcode that specifies a particular period of time.
  • with a temporal barcode, it is possible to detect carry-over or contamination on an assay instrument, such as a sequencing instrument, between runs on different days.
  • sample and/or temporal barcodes may be used to automatically detect cross-contamination between samples and/or days and, for example, instruct an instrument operator to clean and/or decontaminate a sample handling system, such as a sequencing instrument.
  • a barcode sequence in the backbone is located between a universal primer binding site and a probe arm such that the barcode sequence is amplified by the universal primer.
  • universal primer binding sequences in a backbone sequence serve as a hybridizing template for longer "adaptamer" primers.
  • an adaptamer primer is a primer that hybridizes to universal primer sequences in a capture reaction product to facilitate amplification of the capture reaction product and further comprises a sample-specific barcode sequence, e.g., sequence 5' to the universal primer hybridizing region of the adaptamer primer.
  • Adaptamer primers can be used, for example, to incorporate sample-specific barcodes on amplification reaction products to allow further multiplexing of samples after completing a capture reaction and an amplification reaction.
  • the addition of sample-specific barcodes allows multiple capture and/or amplification reaction products to be pooled before detection by, for example, sequencing.
  • the adaptamer primers further include universal sequences that hybridize to a sequencing primer.
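The layout of an adaptamer primer described above can be illustrated by simple concatenation in the 5' to 3' direction. The sketch below is an assumption-laden illustration: `build_adaptamer` is a hypothetical helper, the example sequences in the usage note are placeholders rather than sequences from this disclosure, and placing the sequencing-primer region 5' of the barcode is one possible arrangement.

```python
def build_adaptamer(universal_primer, sample_barcode, sequencing_adapter=""):
    """Return an adaptamer primer written 5'->3'.

    The sample barcode sits 5' of the universal primer hybridizing region;
    an optional sequencing-primer region is placed at the 5' end (assumed).
    """
    return sequencing_adapter + sample_barcode + universal_primer


def adaptamer_panel(universal_primer, barcode_by_sample, sequencing_adapter=""):
    """One adaptamer primer per sample, keyed by sample name."""
    return {
        sample: build_adaptamer(universal_primer, barcode, sequencing_adapter)
        for sample, barcode in barcode_by_sample.items()
    }
```

For instance, `adaptamer_panel("GGGG", {"patient1": "AC", "patient2": "TG"})` would give each sample a primer sharing the same 3' hybridizing region but carrying a different 5' barcode.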
  • the detectable moiety may be associated with the backbone sequence. It may be bound to the polynucleotide sequence, as in the case of direct labels, such as fluorescent (e.g., quantum dots, small molecules, or fluorescent proteins), chemical or protein-based labels. Alternatively, the detectable moiety may be incorporated within the polynucleotide sequence, as in the case of nucleic acid labels, such as modified nucleotides or probe-specific sequences, such as barcodes. Quantum dots are known in the art and are described in, e.g., International Publication No. WO 03/003015.
  • the backbone may also contain a "random" sequence, typically written as a sequence of N's. Such random sequence indicates that any of the four nucleotides A, T, C, or G will be incorporated at that position in the synthesized probe molecule. Within a population of probe molecules, one expects to see all or many of the 4^N possible sequences for the string of Ns. For example, a probe with the backbone GTTGGAGGCTCATCGTTCCTATATTCCACACCACTTATTATTANNNNNNCAGATGTTATGCTCGCAGGTC could actually be one of 4^6 molecules. Including such random sequences, also known as "dogtags," in the probe molecule allows one skilled in the art to determine the most likely number of circularized probe molecules after amplification and sequencing by counting the number of unique dogtags seen.
2 Probe Mixtures
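The dogtag counting described for random backbone sequences can be sketched as follows. This is a minimal illustration, assuming that sequence reads contain the backbone in the orientation written and without sequencing errors in the flanking backbone; `dogtag_pattern` and `count_unique_dogtags` are hypothetical names.

```python
import re

# Backbone with a six-base random "dogtag", as given in the example above.
BACKBONE = ("GTTGGAGGCTCATCGTTCCTATATTCCACACCACTTATTATTA"
            "NNNNNN"
            "CAGATGTTATGCTCGCAGGTC")


def dogtag_pattern(backbone):
    """Compile a regex that captures the bases found at the N positions."""
    left, right = re.split(r"N+", backbone, maxsplit=1)
    n_len = backbone.count("N")
    return re.compile(f"{left}([ACGT]{{{n_len}}}){right}")


def count_unique_dogtags(reads, backbone=BACKBONE):
    """Count distinct dogtags seen across reads carrying the backbone."""
    pattern = dogtag_pattern(backbone)
    tags = set()
    for read in reads:
        match = pattern.search(read)
        if match:
            tags.add(match.group(1))
    return len(tags)
```

Because a six-base dogtag has 4**6 = 4096 possible values, the unique count is a reasonable estimate of the number of circularized molecules only while that number is well below 4096; at higher numbers, distinct molecules increasingly share a dogtag.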
  • aspects of the invention provide one or more probes for multiplex analysis of test samples, including hepatitis virus detection and hepatitis C viral genotyping in a biological sample from a patient.
  • aspects of the invention encompass a composition comprising a plurality of probes, each comprising a nucleic acid sequence of the formula:
  • A is a probe arm sequence taken from column 2 of tables 1-4;
  • C is a corresponding probe arm sequence from column 3 of tables 1-4;
  • B is a backbone sequence.
  • each probe arm sequence in the group consisting of all sequences from tables 1-4 is contained in at least one probe in the plurality of probes.
  • Probes in a mixture may be selected such that the mixture comprises a subset of the full group of probes encompassed by the probe arm sequence pairs provided in tables 1-4, so as to detect a particular subset of hepatitis C genotypes or a particular subset of mutations.
  • Probes in a mixture will typically have similar bulk properties (such as homologous probe sequence length, homologous probe sequence Tm, length of the captured region of interest, and lack of secondary structure).
  • the Tm of the homologous probe sequences in a mixture of probes will be within 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 °C of each other, or in particular embodiments have the same Tm.
  • the homologous probe sequences in a mixture of probes will all be within 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotide in length of each other.
  • the length of the region of interest between the target sequences of probes in a mixture may vary over a range of values, such as from 2 to 20, 20 to 100, 20 to 200, 40 to 300, 100 to 300, 100 to 500, 80 to 500, or 100-180 nucleotides. In some embodiments, the length of the region of interest between the target sequences of probes in a mixture is from 100 to 489 nucleotides. In particular embodiments, the regions of interest are within 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 nucleotides in length of each other. Barcode lengths may also vary, but are generally within 25, 20, 15, 10, or 5 nucleotides of each other. In particular embodiments, the barcodes are the same length.
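The bulk-property constraints above (Tm values and arm lengths within a stated window of each other) can be checked programmatically. The sketch below is illustrative only. It uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, which is a rough approximation intended for short oligonucleotides; this disclosure does not state which Tm model is used, so the formula is an assumption, as are the function names.

```python
def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule (assumed model)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def within_spread(values, max_spread):
    """True if the largest and smallest values differ by at most max_spread."""
    return max(values) - min(values) <= max_spread


def mixture_is_uniform(arms, max_tm_spread=10, max_len_spread=15):
    """Check Tm and length spreads across the homologous probe sequences."""
    tms = [wallace_tm(a) for a in arms]
    lengths = [len(a) for a in arms]
    return within_spread(tms, max_tm_spread) and within_spread(lengths, max_len_spread)
```

The default windows of 10 °C and 15 nucleotides correspond to the widest values listed above; tighter values can be passed to model the narrower embodiments.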
  • mixtures provided by the invention comprise capture reaction products and amplification reaction products from different test samples, as further described below.
  • different capture reaction products and/or amplification reaction products can be combined and multiplexed before detection, i.e., for concurrent detection. This is accomplished using barcode sequences that identify the test samples.
  • capture reaction products from test sample A will include a sample A-specific barcode
  • capture reaction products from sample B will include a sample B-specific barcode.
  • all sequences in the sample A capture reaction products are identified by the presence of the sample A-specific barcode sequence.
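Assigning pooled sequences back to their test samples by barcode can be sketched as below. The sketch assumes, purely for illustration, that the sample-specific barcode occupies the first bases of each read and that all barcodes have the same length; `demultiplex` is a hypothetical name and the barcodes in the usage note are placeholders.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using a leading barcode; strip the barcode.

    Reads whose leading bases match no known barcode are kept, unmodified,
    under the key "unassigned".
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len])
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[barcode_len:])
    return dict(by_sample)
```

For example, with `{"ACGT": "A", "TGCA": "B"}` as the barcode table, every read beginning with ACGT would be attributed to sample A.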
  • the mixtures of the invention contain sample internal calibration nucleic acids (SICs).
  • known quantities of one or more SICs are included in a mixture provided by the invention.
  • the SICs have a nucleotide composition characteristic of pathogenic DNA targets and are present in specific molar quantities that allow for reconstruction of a calibration curve for quality control, e.g., for the processing and sequencing steps for each individual test sample.
  • the SICs make up approximately 10% (molar quantity) of nucleic acids in a mixture, for example, 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20% (molar) of nucleic acids in the mixture.
  • different SICs are present in different concentrations, for example, in a dilution series, over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 200, 500, 1000, 5000, 10000, 50000, or 100000-fold concentration range from the most dilute to most concentrated SICs in 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, or 50 steps.
  • SICs are present in a sample (e.g., a mixture of probes and a test sample, a capture reaction, a capture reaction product, an amplification reaction, or an amplification reaction product) at concentrations of 5, 25, 100, and 250 copies/ml.
  • an organism count per unit volume (e.g., copies/mL for liquid samples such as blood or urine) can be estimated for each organism detected.
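One way to turn SIC read counts into an organism count per unit volume is to fit a calibration line through the SIC points and read the unknown off that line. The sketch below is an illustration under assumptions: it fits a least-squares line in log10-log10 space, which is a common choice but is not stated in this disclosure, and the function names and any read counts supplied to it are hypothetical. The SIC concentrations of 5, 25, 100, and 250 copies/ml are the example values given above.

```python
import math


def fit_loglog(copies_per_ml, read_counts):
    """Least-squares fit of log10(copies/ml) against log10(reads).

    Returns (slope, intercept) of the calibration line.
    """
    xs = [math.log10(r) for r in read_counts]
    ys = [math.log10(c) for c in copies_per_ml]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_copies(read_count, slope, intercept):
    """Estimate copies/ml for an organism from its read count."""
    return 10 ** (slope * math.log10(read_count) + intercept)
```

Usage would be to call `fit_loglog([5, 25, 100, 250], sic_reads)` with the read counts observed for the four SICs in a sample, then pass each organism's read count to `estimate_copies`. Estimates outside the range spanned by the SICs are extrapolations and should be treated with caution.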
  • the concentration of SICs and probes directed to the SICs are adjusted empirically so that sequences of SICs detected in a capture reaction product and/or amplification reaction product make up about 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 25, or 30% of sequences in the mixture.
  • SICs make up 10-20% of sequence reads.
  • the number of SICs sequence reads in a sequencing reaction is quantitatively evaluated to ensure that sample processing occurs within pre-defined parameters.
  • the pre-defined parameters include one or more of the following: reproducibility within two standard deviations relative to all samples sequenced during a particular run, empirically determined criteria for reliable sequencing data (e.g., base calling reliability, error scores, percentage composition of total sequencing reads for each probe per target organism), no greater than about 15% deviation of GC or AU-rich SICs within a sequencing run.
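The two-standard-deviation reproducibility criterion can be illustrated with a short check over the samples in one sequencing run. The sketch below is an assumption about how such a check might look: the metric (fraction of reads that are SIC reads per sample), the function name `flag_outliers`, and the sample labels are hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(sic_fraction_by_sample, n_sd=2.0):
    """Return sample names whose SIC read fraction deviates from the run
    mean by more than n_sd sample standard deviations (needs >= 2 samples)."""
    values = list(sic_fraction_by_sample.values())
    centre, spread = mean(values), stdev(values)
    return sorted(
        sample
        for sample, value in sic_fraction_by_sample.items()
        if abs(value - centre) > n_sd * spread
    )
```

Flagged samples could then be reviewed or re-run before their organism counts are reported. With very few samples per run, a single extreme sample inflates the standard deviation and may escape detection, so the check is more informative for larger runs.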
  • the SICs DNA in a sample will also comprise the same barcode(s) corresponding to unique samples, e.g., particular patient samples.
  • SICs may comprise a region of interest as defined above, where the region of interest is modified to further comprise a sequence heterologous to the region of interest.
  • the sequence heterologous to the region of interest in the SICs is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40 contiguous bases, or more.
  • the SIC contains the nucleotide sequence
  • the probe mixture includes the probe with sequence 5'-/5Phos/GTG GTA TGG CTG ATT ATG ATC TAG AGT GTT GGA GGC TCA TCG TTC CTA TAT TCC TGA CTC CTC ATT GAT GAT TAC AGA TGT TAT GCT CGC AGG TCG AGT TTG GAC AAA CCA CAA CTA GAA-3'.
  • the mixtures of the invention contain sample nucleic acids.
  • the nucleic acids may be obtained from any test sample, such as a biological sample.
  • the nucleic acids obtained from the test sample may be of varying degrees of purity, such as at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 85, 90, 95, 96, 97, 98, or 99% of organic matter by weight.
  • the sample nucleic acids are extracted from a test sample.
  • Test samples may be from any source and include swabs or extracts of any surface, or biological samples, such as patient samples.
  • Patients may be of any age, including adults, adolescents, and infants.
  • Biological samples from a subject or patient may include blood, whole cells, tissues, or organs, or biopsies comprising tissues originating from any of the three primordial germ layers— ectoderm, mesoderm or endoderm.
  • Exemplary cell or tissue sources include skin, heart, skeletal muscle, smooth muscle, kidney, liver, lungs, bone, pancreas, central nervous tissue, peripheral nervous tissue, circulatory tissue, lymphoid tissue, intestine, spleen, thyroid, connective tissue, or gonad.
  • Test samples may be obtained and immediately assayed or, alternatively, processed by mixing, chemical treatment, fixation/preservation, freezing, or culturing.
  • Biological samples from a subject include blood, pleural fluid, milk, colostrum, lymph, serum, plasma, urine, cerebrospinal fluid, synovial fluid, saliva, semen, tears, and feces.
  • the biological sample is blood.
  • Other samples include swabs, washes, lavages, discharges, or aspirates (such as nasal, oral, nasopharyngeal, oropharyngeal, esophageal, gastric, rectal, or vaginal swabs, washes, lavages, discharges, or aspirates), and combinations thereof, including combinations with any of the preceding biopsy materials.
  • the invention provides a method for detecting the presence of one or more hepatitis C viruses by contacting a sample suspected of containing at least one such virus with a mixture of probes of the invention, capturing a region of interest of the at least one virus (e.g., by polymerization and/or ligation) to form a circularized probe, and detecting the captured region of interest, thereby detecting the presence of the one or more hepatitis C viruses.
  • the captured region of interest may be amplified to form a plurality of amplicons (e.g., by PCR).
  • the sample is treated with nucleases to remove the linear nucleic acids after probe-circularizing capture of the region of interest.
  • the circularized probe is linearized, e.g., by nuclease treatment.
  • the circularized probe molecule is sequenced directly by any means known in the art, without amplification.
  • the circularized probe is contacted by an oligonucleotide that primes polymerase-mediated extension of the molecules to generate sequences complementary to that of the circularized probe, including from at least one to as many as 1 million or more concatemerized copies of the original circular probe.
  • the circularized probe molecule is enriched from the reaction solution by means of a secondary-capture oligonucleotide capture probe.
  • a secondary-capture oligonucleotide capture probe may comprise a moiety designed to be captured, such as a biotin molecule, and a nucleic acid sequence designed to hybridize to at least 6 nucleotides of the circularized probe.
  • the nucleic acid sequence designed to hybridize to at least 6 nucleotides of the circularized probe may include 1 ,
  • the probe and/or captured region of interest is sequenced by any means known in the art, such as polymerase-dependent sequencing (including, dideoxy sequencing, pyrosequencing, and sequencing by synthesis) or ligase based sequencing (e.g., polony sequencing).
  • the sample is a biological sample.
  • the biological sample is from a mammal, such as a human.
  • the methods of detecting the presence of one or more hepatitis viruses further comprise the step of formatting the results to facilitate physician decision making by, for example, providing one or more graphical displays.
  • the invention provides a method of treating a subject suspected of being infected with a hepatitis C virus, comprising detecting at least one hepatitis C virus by the methods of the invention and administering a suitable therapeutic treatment based on the at least one hepatitis virus detected.
  • the invention also provides a method of treating a subject suspected of being infected with an HCV strain carrying a drug resistance mutation, comprising detecting at least one HCV drug resistant genotype by the methods of the invention and
  • HCV RNA may be directly contacted by probes or may be converted to DNA to be used with the probes disclosed in this invention.
  • Many techniques for converting RNA to DNA are available in the scientific literature. For example, random hexamer or octamer primers can be used with a reverse-transcriptase to generate first strand cDNA from the viral RNA. While simple, random priming also amplifies host (e.g., human) RNA that may be present in a clinical sample, thus limiting the amount of viral cDNA produced by the reaction.
  • Figure 4 panels A and B depict cDNA generation approaches for embodiments of this invention.
  • Figure 5 shows the process from RNA through analysis.
  • a preferred embodiment of this invention uses a set of HCV-specific RT primers in an RT-PCR reaction to generate and amplify DNA from the viral RNA molecules. Similar to probes, good RT primers hybridize to relatively conserved portions of the HCV genome.
  • HCV genotypes 1a and 1b are the dominant strains.
  • Table 6 shows a set of RT primers that amplify the NS3, NS5A, and NS5B genes from genotypes 1a and 1b. These primers can be used with the Qiagen OneStep RT-PCR to produce cDNA from RNA that has been extracted from HCV blood samples.
  • a set of RT-PCR primers that target all known HCV genotypes may be used.
  • Table 7 lists a set of 75 15mers that achieve this goal for the NS3 gene.
  • This primer set works at a lower temperature (hybridization temperature of 30 °C).
  • the invention provides methods of detecting the presence of one or more HCV strains in a test sample.
  • the methods comprise the step of contacting a mixture comprising probes described above with any of the test samples described above in a capture reaction, as defined above.
  • a mixture comprising probes is contacted with nucleic acids extracted from a test sample such as blood, along with a polymerase enzyme and nucleotide triphosphates (NTPs), and capturing at least one region of interest by polymerase-dependent extension of at least one homologous probe sequence in the mixture.
  • the polymerase-dependent extension of a homologous probe sequence is followed by a ligation of the end of the extended (i.e., by the polymerase) homologous probe sequence to the end of the other homologous probe sequence to produce a circularized probe containing a region of interest from the genome of an HCV strain.
  • the ligation reaction occurs while the target arm is hybridized to the target.
  • the target arm is dissociated from the target and ligated in solution under reaction conditions favoring self-ligation over trans-ligation to other probe molecules, for example a dilute ligation solution.
  • Figure 2 illustrates one particular embodiment of a method provided by the invention. Briefly, hybridization of a probe to the target sequences in the organism of interest is followed by polymerase-mediated, target-sequence-directed addition of nucleotides to the 3' homologous probe sequence, terminating due to obstruction at the 5' homologous probe sequence of the probe. A ligation reaction joins the terminal 3' nucleotide to the 5' nucleotide of the 5' homologous probe sequence.
  • the sample may be treated with exonuclease to digest single stranded linear DNA.
  • Primers complementary to the probe backbone may amplify the MIP into dsDNA for sequencing.
  • amplification primers at this stage may contain sample-specific nucleotide barcode sequences, e.g., they may be adaptamer primers.
  • a unique primer:barcode molecule sequence therefore may identify each test sample. For example, a panel of 100 probes is contacted with 50 individual test samples. The homologous probe sequences detected in a sequence read identify a strain of hepatitis C or a drug resistance genotype of a strain of HCV. Each test sample amplification reaction is done with one unique probe set.
  • Each barcode within the amplification primer can be used to act as an identifier for a patient, e.g., contains a barcode. Therefore 50 pairs of amplification primers (one for each amplification reaction product) and one panel of probes (e.g., probes for hepatitis A, B, and C distinction, for HCV genotyping, or both) are required for a 50-sample multiplex assay.
  • Polymerases for use in the methods provided by the invention include Taq polymerase (Lawyer et al., J. Biol. Chem., 264:6427-6437 (1989); Genbank accession: P19821 including the 5'->3' nuclease deficient "Stoffel" fragment described in Lawyer et al., PCR Meth. Appl., 2:275-287 (1993)), PHUSION™ high fidelity recombinant polymerase (NEB), and Pyrococcus furiosus (Pfu) polymerase (see, e.g., U.S. Patent No. 5,545,552), as well as polymerases comprising a helix-hairpin-helix domain, such as TopoTaq and PfuC2 (Pavlov et al., PNAS, 99:13510-15 (2002)).
  • the polymerase is 5'->3' nuclease deficient, such as the Stoffel fragment of Taq polymerase, which further lacks 3'->5' proofreading activity.
  • Polymerases lacking 5'->3' exonuclease activity may be generated by means known in the art, for example, based on methods of screening or rational design.
  • polymerase variants can be designed based on sequence alignments of one or more polymerases to the Stoffel fragment of Taq and/or by "threading" a sequence through a solved polymerase structure (e.g., MMDB IDs 56530, 81884 and 81885).
  • a polymerase for use in the methods of the invention is a non-displacing polymerase, such as Pfu, T4 DNA polymerase, or T7 DNA polymerase.
  • a polymerase for use in the methods provided by the invention is a polymerase suitable for isothermal amplification and capture and/or amplification reactions are performed isothermally, e.g., by controlling metal ion concentration and/or using particular polymerases and/or additional enzymes, such as helicases or nicking enzymes (such as primer generation RCA and EXPAR). See, e.g., U.S. Patent No. 6,566,103, Murakami et al., Nucl. Acid.
  • Polymerases for use in isothermal amplification include, for example, Bst, Bsu and phi29 DNA polymerases, and E. coli DNA polymerase I.
  • a mixture of probes is contacted with nucleic acids extracted from a test sample, a ligase enzyme, and a pool of n-mer oligonucleotides in a capture reaction, as defined above.
  • the n-mer oligonucleotides are at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24 or 25 nucleotides long. In more particular embodiments, they are random hexamers. In other embodiments, they are polynucleotides of the length of the region of interest between the first and second target sequences that hybridize to the homologous probe sequence. In some embodiments, the n-mer oligonucleotide contains 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 locked nucleic acids (LNAs) or 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% LNAs.
  • the ligase enzyme ligates the n-mer oligonucleotides with the probes provided by the invention to produce a circularized probe containing a region of interest from HCV.
  • Primers complementary to the probe backbone amplify the probe into dsDNA for sequencing.
  • amplification primers are adaptamer primers and contain sample-identifying barcode sequences. A unique barcode sequence therefore identifies each sample in a multiplex.
  • Each strain of HCV is identified by the unique combination of homologous probe sequences and ligated n-mer in a sequence read.
  • Ligases for use in the methods of the invention include T4, T7, and thermostable ligases, such as Taq ligase (as disclosed in Takahashi et al., J. Biol. Chem., 259:10041-47 (1984), and international publication WO 91/17239), and AMPLIGASE™
  • mixtures comprising pairs of conventional PCR primers (conventional primer pairs) provided by the invention are contacted with sample nucleic acids to amplify a region of interest between two target regions in HCV.
  • a limited number of amplification steps are performed.
  • fewer than 25, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, or 2 cycles of amplification are performed.
  • the mixture of conventional primer pairs are contacted with nucleic acids extracted from a test sample, a
  • primers binding to universal probe recognition sequence e.g., a barcode
  • conventional primer pairs can be used in a variety of additional methods.
  • conventional primer pairs may be contacted with a sample nucleic acid suspected of containing at least one target nucleic acid.
  • PCR may be used to amplify the region of interest directly from a sample nucleic acid.
  • the conventional primer pairs may be used to amplify capture reaction products, e.g., one or more circularized probes.
  • a sample nucleic acid suspected of containing a region of interest is amplified using a conventional primer pair and then contacted with a probe provided by the invention for circularizing capture.
  • conventional primer pairs are contacted with a sample nucleic acid and modified nucleotides, such as biotinylated nucleotides.
  • the resulting capture or amplification reaction products can then be isolated by affinity capture, for example, with streptavidin substrates, for subsequent processing, e.g., circularizing capture with the probes provided by the invention.
  • a single conventional primer may be used for linear amplification of a region of interest in a sample nucleic acid, and then contacted with a probe provided by the invention for circularizing capture.
  • a single conventional primer containing a 5' biotin moiety may be used to amplify a target sequence and then be enriched from the sample using streptavidin capture for sequencing by, for example, direct sequencing using either specific conventional primer pairs provided by the invention, or by random hexamer priming, or may be used for circularizing capture using probes provided by the invention.
  • methods that comprise a capture reaction further comprise the step of contacting the capture reaction product with one or more
  • exonuclease includes at least one of exo I, exo III, exo VII, and exo V.
  • the exonuclease is up to a 100:1, 50:1, 25:1, 10:1, 5:1, 2:1, 1:1, 1:2, 1:5, 1:10, 1:25, 1:50, or 1:100 (unit to unit) mixture of exonuclease I and
  • the methods of the invention further comprise the step of amplifying capture reaction products in an amplification reaction.
  • methods for amplifying nucleic acids include the polymerase chain reaction (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202 and McPherson and Moller, PCR (The Basics), Taylor & Francis; 2nd edition (March 30, 2006)), OLA (oligonucleotide ligation amplification) (see, e.g., U.S. Patent Nos. 5,185,243, 5,679,524, and
  • amplification is linear amplification such as RCA.
  • capture reaction products e.g., circularized probes
  • the RCA reaction may comprise contacting a sample with modified nucleotides, such as biotinylated nucleotides, LNA nucleotides or artificial base pairs such as IsodC or IsodG, or abasic furans (such as dSpacer), to facilitate affinity enrichment and purification.
  • the amplification reaction products comprising linear repeating ssDNA can be contacted with a conventional primer provided by the invention to produce short extensions of double stranded DNA with a length of 2, 3, 4, 5, 6, 7, 10, 15, 20, 30, 40, 50, 75, 100, or 500 nucleotides.
  • the length of extension may be controlled by time of extension step at the optimum temperature of elongation for this polymerase, e.g., 5, 10, 15, 20, 40, 60 seconds, at temperatures including 37, 42, 45, 68, 72, 74 °C.
  • the length of extension is controlled by mixing into the reaction nucleotide analogues that prevent further elongation, such as dideoxyCytosine, or nucleotides with a 3' modification such as biotin, or a carbon spacer terminated with an amino group.
  • a primer is contacted with a linear repeating ssDNA RCA amplification reaction product and extended by a polymerase for a single cycle of PCR, to generate a short single stranded DNA containing the complementary sequence to the repeating unit of the RCA product.
  • the primer contacted with a linear repeating ssDNA RCA amplification reaction product produces a dsDNA region comprising a restriction enzyme cleavage site. Accordingly, in certain embodiments, when the primer hybridizes to the linear repeating ssDNA RCA amplification reaction product to form a double-stranded DNA region, the amplification reaction product is contacted with the restriction enzyme to produce shorter fragments.
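The effect of restriction digestion on a linear repeating RCA product can be illustrated in silico. The sketch below is a simplified model that treats the concatemer as a string and cuts at every occurrence of a recognition site; `digest` is a hypothetical helper, the repeating unit in the usage note is a placeholder, and EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example, since this disclosure does not name a particular enzyme.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut seq at every occurrence of site, cut_offset bases into the site.

    Returns the list of fragments in 5'->3' order; joining them
    reproduces the input sequence.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    return fragments
```

For a concatemer such as `"ACGTGAATTCTT" * 3`, the internal fragments returned by `digest` each have the length of one repeating unit, which mirrors how cleavage at a site present once per repeat yields unit-length fragments.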
  • the amplification reaction uses adaptamer primers.
  • the amplification reaction uses sample-specific primers, that is, primers that hybridize to sequences present in the probe that identify the sample.
  • a low number of amplification cycles are used to avoid amplification artifacts, e.g., fewer than 25, 20, 15, 10, 9, 8, 7, 6, or 5 cycles.
  • the methods provided by the invention may comprise the step of contacting sample nucleic acids, capture reaction products or amplification reaction products with a secondary-capture oligonucleotide capture probe which comprises a moiety designed to be captured, such as a biotin molecule, and a nucleic acid sequence, which is able to hybridize to the sample nucleic acids, capture reaction products, or amplification reaction products.
  • a biotinylated probe may be extended on sample nucleic acids, capture reaction products or amplification reaction products using thermophilic or mesophilic polymerases.
  • the method comprises contacting a capture reaction product with a biotinylated oligonucleotide for enrichment of specific capture reaction products using the biotin:streptavidin interaction.
  • Sequences captured by the methods of the invention can be detected by any means, including, for example, array hybridization or direct sequencing. In some embodiments, captured sequences may be detected by sequencing without
  • the sequencing methods rely on the specificity of either a DNA polymerase or DNA ligase and include, e.g., pyrosequencing, base extension sequencing (single base stepwise extensions), multi-base sequencing by synthesis (including, e.g., sequencing with terminally-labeled nucleotides) and wobble sequencing, which is ligation-based.
  • sequencing technologies used in the methods provided by the invention include Sanger sequencing, microelectrophoretic sequencing, nanopore sequencing, sequencing by hybridization (e.g., array-based sequencing), real-time observation of single molecules, and cyclic-array sequencing, including
  • pyrosequencing, e.g., 454 SEQUENCING®, see, e.g., Margulies et al., Nature, 437:376-380 (2005)
  • ILLUMINA® or SOLEXA® sequencing (see, e.g., Turcatti et al., Nucleic Acids Res., 36, e25 (2008), see also U.S. Patent Nos. 7,598,035, 7,282,370, 7,232,656, and 7,115,400), polony sequencing (e.g., SOLiD™, see Shendure et al. 2005), and sequencing by synthesis (e.g., HELICOS®, see, e.g., Harris et al., Science, 320:106-109 (2008)).
  • the capture probes contain sequences that facilitate processing for sequencing by a certain sequencing technology, such as sequences that can serve as anchor sites for sequencing by synthesis, primer sites for sequencing reaction initiation, or restriction enzyme sites that allow cleavage for improved ligation of oligonucleotide adaptors for sequencing of the particular amplicon.
  • circularized capture probes are contacted by oligonucleotides which prime polymerase-mediated extension of the capture probes to generate sequences complementary to that of the circularized probe, including from at least one to one million or more concatemerized copies of the original circular probe.
  • homologous probe sequences may be used in the probes provided by the invention, as well as conventional primer pairs.
  • the homologous probe sequences will be about 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 bases.
  • the region of interest between the target sequences of a probe or conventional primer pair is about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, or 50 bases.
  • the probes provided by the invention may be circularized by polymerase-dependent synthesis and ligation, or by ligation of n-mer oligonucleotides of about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, or 50 bases.
  • the region of interest is about 7 bases and homologous probe sequences are 10 or 12 bases.
  • a 7-mer oligonucleotide comprising a locked nucleic acid is ligated to a probe provided by the invention, and in still more particular embodiments, the 7-mer oligonucleotide comprises at least 1, 2, 3, 4, 5, 6, or 7 locked nucleic acids (LNAs).
  • capture or amplification reaction products may be sequenced by emulsion droplet sequencing by synthesis as disclosed in, for example, Binladen et al., PLoS One. 2(2):e197 (2007).
  • capture products may be amplified by RCA to generate higher copy numbers of capture product within a single DNA molecule in order to facilitate emulsion of captured DNA for emulsion PCR and sequencing by synthesis. See, e.g., Drmanac et al., Science 327(5961):78-81 (2010).
  • capture reaction products and/or amplification reaction products containing different samples are combined before detection.
  • capture and/or amplification reaction products are combinatorially pooled before detection, e.g., an MxN array of individual capture reaction products and/or amplification reaction products are pooled by row and column, and the pools are detected. Results from row and column pools can then be
  • capture reaction products and/or amplification reaction products contain identifying barcode sequences.
  • amplification primers contain sample-specific barcode sequences. Accordingly, the sample source of sequences contained in pools of capture reaction products and/or amplification reaction products are identified by their barcode sequences.
  • the methods provided by the invention may also include directly detecting a particular nucleic acid in a capture reaction product or amplification reaction product, such as a particular target amplicon or set of amplicons.
  • the mixtures of the invention comprise specialized probe sets including TAQMAN™, which uses a hydrolyzable probe containing detectable reporter and quencher moieties, which are released by a DNA polymerase with 5'-3' exonuclease activity (U.S. Pat. No. 5,538,848); molecular beacon, which uses a hairpin probe with reporter and quenching moieties at opposite termini (U.S. Patent No. 5,925,517);
  • FRET fluorescence resonance energy transfer
  • the methods of the invention comprise using sample internal calibration nucleic acids (SICs) to estimate the concentration of a hepatitis strain in a test sample. This is done by calibrating the frequency of a sequence from a hepatitis strain to the known concentration of the SICs to provide an estimated concentration of the viral strain in the test sample.
  • the estimated concentration of the viral strain is compared to a database of reference concentrations of hepatitis strains associated with a disease state and/or likely clinical diagnoses.
  • the methods of the invention further comprise steps of formatting results to inform physician decision making.
  • "Results" refers to the outcome of detecting a target organism and includes, e.g., binary (e.g., +/-) detection as well as estimates of concentration, and may be based on, inter alia, the result of sequencing a capture reaction product or amplification reaction product.
  • the formatting comprises presenting an estimate of the concentration of an organism in a test sample, optionally including statistical confidence intervals.
  • the formatting further comprises color-coding of the results.
  • the formatting includes recommendations for therapeutic intervention, including, for example, hospitalization, probiotic treatment, antibiotic treatments, and chemotherapy.
  • the formatting comprises one or more of the following: references to peer-reviewed medical literature and database statistics of empirically defined sample results.
  • Phusion polymerase is used to copy the reverse-complement of the target sequence into the probe and Ampligase ligase is used to circularize the resulting molecule as shown in Protocol 1.
  • the resulting circular molecules can be amplified using adapter-primers shown in Table 8 to prepare the material for sequencing on either the Ion Torrent or Illumina platforms.
  • the probes are applied directly to extracted RNA without generating a cDNA intermediate.
  • This embodiment requires a DNA polymerase capable of using an RNA template (e.g., a standard reverse-transcriptase such as Tth,
  • Figure 5 shows the distribution of probes from tables 1-4 across a reference HCV genome.
  • the probes target 661 HCV strains from many genotypes.
  • the number of probes in each region indicates the sequence diversity in that region.
  • Conversion of raw sequence data may occur in three stages, namely (1) the processing of raw instrument data and conversion into aligned sequencing reads, (2) statistical interpretation of read data and (3) providing output and storage in archives.
  • statistical analysis and interpretation determine the most likely strains or substrains present in the sample given the sequencing data.
  • each sequencing read is first compared to the set of probe arms used in the capture reaction using an algorithm similar to Needleman-Wunsch but with no terminal gap penalty in the probe arm.
  • the software retains the probe arm with the best score. Having identified the probe arm and therefore
  • the software then compares the sequencing read against all expected reads for that probe, where expected reads were generated by an in-silico application of the probe set to a set of full or partial HCV genomes. All matches of a probe to a genome that meet some minimum criteria are included in the set of expected reads. Having compared all reads to all expected reads, the software picks the most likely strain or strains present in the sample based on the alignment scores, a model of mutation probabilities, and a user-provided prior probability on the number of strains to expect.
  • the methods of analysis determine the relative distance
  • the hidden variables in the model are the proportions or abundances of the strains and the assignments of sequencing reads to expected reads (where each observed read is assigned to a single expected read).
  • a variety of methods, including Expectation-Maximization, Gibbs Sampling, and Metropolis-Hastings, may be used to find the values of these hidden variables that maximize the probability of the data given the hidden variables and the priors on the hidden variables.
  • the software compares each read against both probe arms for each probe.
  • the software performs two alignments for each read-probe pair, first aligning the first probe arm with no terminal gap penalty for the probe arm and then aligning the other probe arm with no initial gap penalty for that probe arm.
  • the section of the read between the two probe arm alignments is the copied part of the target region that the probe captured from the target nucleic acid.
  • the software analyzes subsets of the data, where each subset contains only the capture regions of sequencing reads that overlap a mutation of interest.
  • Tools well known in the art such as FreeBayes, SAMTools, and ShoRAH can be used to estimate the frequency of each allele based on the sequencing data.
  • Output of results can occur in parallel (1) to a company server, (2) to xml and HL7 formats, e.g., for deposit in a hospital system, in an electronic medical record (EMR) system, or in other HL7 or xml capable storage systems, for use in existing health record frameworks, and/or (3) to physician-friendly graphical and text formats, e.g., graphs, tables, summary text, and possibly annotated web formats linking to reference information.
  • Output formats are arbitrary, e.g., simple text, spreadsheet data, binary data objects, encrypted and/or compressed files.
  • a complete record may involve all or some of these linked to a diagnostic test via unique identifiers. They may be assembled into a coherent object or may be accessible via a search for the unique identifier.
  • Protocols The following protocols were used in the examples described below and can be used by the skilled artisan when practicing the methods provided by the invention or using the probes, mixtures, and compositions provided by the invention. Variations on these protocols will be readily apparent to the skilled artisan.
  • Protocol 1 MIP capture, HCV cDNA target capture
  • When the thermocycler reaches the 60° hold (approximately 26 minutes), add 2 µL of enzyme mix to each sample and then advance the thermocycler to the next step (60° for 10 min).
  • When the thermocycler reaches the 15° hold, advance the thermocycler to the next step (94° for 2 min) and prepare the exonuclease mix:
  • When the thermocycler reaches the 37° hold, add 1 µL of exonuclease mix to each sample and then advance the thermocycler to the next step (37° for 30 min).
  • HCV genotype was confirmed by Versant HCV Genotyping Assay 2.0.
  • the 436 probes from tables 1-4 were used with Protocol 1 to target desired gene regions. Captured gene regions were sequenced using an Ion Torrent PGM and compared to sequences determined by Sanger
  • HCV probes from this invention correctly identified HCV-1a and HCV-1b viral variants compared to the Versant HCV genotyping assay. Resistance locus capture size averaged 200 bases, and read depth ranged from 50 to >50,000 fold. The probes detected mutations generating both nucleotide and amino acid polymorphisms. Figure 8 illustrates that among detected amino acid polymorphisms in our DAA-naive clinical samples, we detected mutations reported to confer retroviral drug resistance in NS3, NS5a and NS5b proteins.
  • Selected observed mutations include: in NS3 - Q80L/K/R, D168G/E, I170T/V, 175L and E176G; in NS5a - M28T, Q30R, L31M, P58S, Y58S and Y93H/N; and in NS5b - 71V, 1831, M414L/V, L419S, Y452H, V494A and V499A.
  • the probes agreed in 28/28 samples with the Versant HCV genotyping assay in 1a/1b clinical samples, see Figure 9.
  • DxSeq detected mutations associated with resistance to antiviral drugs, such as TMC435, boceprevir, danoprevir, BI-201335, BMS-790052, GS-9190, BMS-650032, MK-3281, VCH-916, and JTK-109.
  • Figures 11-13 describe the fraction of viral quasispecies represented by specific nucleic acid variants sequenced from selected samples, and illustrate detection of viral variants at 2% of the total viral nucleic acid present.
  • Sequencing reads from the 28 clinical samples in example 1 were analyzed to determine whether probe arms could be identified in the sequencing read. There were 13,583,863 reads for which at least the first probe arm could be identified uniquely (reads for which no probe arm can be identified are generally of poor quality and thus rejected). The probe arms were trimmed from these sequencing reads to yield only the capture regions. As the input files were FASTQ files containing both base calls and quality scores, both the nucleotides and quality scores were trimmed to produce a new FASTQ file. The resulting sequences were aligned against the human reference genome (hg19 from http://genome.ucsc.edu) using the Bowtie2 alignment software version 2.0.0b6 and the following command line parameters: -q --sensitive --end-to-end -M 1 --no-unal --threads 8.
  • the resulting output will contain only sequences for which the probe capture can be mapped to one or more locations in the human genome using Bowtie2's sensitive alignment option. Only 71,951 reads were mapped. Thus at most 0.53% of the sequencing reads that could be assigned to a probe contained a sequence of plausibly human origin.
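The probe-arm identification and trimming steps described above can be illustrated with a short sketch. It assumes simple scores (+1 match, -1 mismatch, -1 gap) and hypothetical function names; the text does not specify the scoring scheme or the software used. The alignment is Needleman-Wunsch-like, except that read bases following the end of the probe arm carry no penalty, which corresponds to "no terminal gap penalty in the probe arm".

```python
def align_probe_arm(arm, read, match=1, mismatch=-1, gap=-1):
    """Align `arm` to the start of `read`; read bases after the arm are free.

    Returns (best_score, end), where read[:end] is the part aligned to the arm.
    Scores are illustrative assumptions, not values from the source.
    """
    m = len(read)
    prev = [j * gap for j in range(m + 1)]  # row 0: leading gaps are penalized
    for i in range(1, len(arm) + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            step = match if arm[i - 1] == read[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + step, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    # Free terminal gap: take the best score anywhere in the last row.
    end = max(range(m + 1), key=lambda j: prev[j])
    return prev[end], end


def trim_probe_arm(seq, qual, arms):
    """Keep the best-scoring arm and trim it from both bases and qualities."""
    best_arm = max(arms, key=lambda a: align_probe_arm(a, seq)[0])
    score, end = align_probe_arm(best_arm, seq)
    return best_arm, score, seq[end:], qual[end:]


if __name__ == "__main__":
    arms = ["ACGTACGT", "TTGGCCAA"]  # hypothetical probe arms
    print(trim_probe_arm("ACGTACGTGGGTTT", "I" * 14, arms))
```

Trimming the quality string with the same offset as the base calls keeps the FASTQ record consistent, as the example requires.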

Abstract

The present application provides, inter alia, nucleic acid probes, mixtures of probes, and compositions containing these probes, which are useful in the detection and characterization of hepatitis virus, such as hepatitis C (HCV) nucleic acid sequences in a test sample, such as a biological sample from a patient. The invention also provides methods of diagnosis and treatment of HCV using these probes, mixtures, and compositions; as well as systems for performing these methods.

Description

NUCLEIC ACIDS FOR MULTIPLEX DETECTION OF HEPATITIS C VIRUS
RELATED APPLICATION
This application claims the benefit of U.S. Provisional Application No. 61/533,520, filed on September 12, 2011.
The entire teachings of the above application are incorporated herein by reference.
INTRODUCTION
The invention is directed to sets of nucleic acid probes for multiplex detection of hepatitis viruses and methods of using the probes.
Advances in sequencing technology have continued to drive a precipitous decline in per base sequencing costs. The $1,000 personal genome benchmark proposed by the U.S. National Human Genome Research Institute (NHGRI), however, remains elusive. Moreover, even a patient's complete genome may provide little or no insight into a patient's current infectious disease state. Infectious diseases, in turn, can be caused by a wide variety of pathogens, including viruses, bacteria, archaea, fungi, and other eukaryotes (both single cellular and multicellular), many of which can be cultured only with great difficulty or not at all, hindering detection and selection of proper clinical intervention.
The Hepatitis C Virus (HCV) is a single-stranded RNA virus that infects humans and can lead to liver disease and eventually cirrhosis, liver cancer, or liver failure. Numerous genotypes (strains) and sub-genotypes exist worldwide that demonstrate substantial genetic diversity. Furthermore, the HCV's rapid mutation rate means that even the viral particles within a single patient may present multiple SNPs or drug resistance phenotypes.
HCV's enormous genetic diversity and rapid mutation rate make reliable detection and sequencing difficult. Primers or probes designed against one genotype or sub-genotype may work poorly, if at all, against other types. While particular genotypes represent the majority of HCV infections in particular regions of the globe, other genotypes typically represent 10-25% of HCV infections. Thus, a diagnostic assay should work effectively on all genotypes without prior knowledge of which genotype is present to ensure reliable results in a clinical or research setting.
SUMMARY OF THE INVENTION
Embodiments of the present invention include optimized nucleic acid probes, and methods of using them, that enable the skilled artisan to simultaneously detect HCV from a clinical sample, determine the genotype and sub-genotype (or genotypes) present, and detect the presence of a wide variety of mutations known to confer resistance to various direct acting antiviral (DAA) drugs. The invention is based, at least in part, on the discovery of sequences, from sets of large query hepatitis C virus (HCV) sequences such as whole genomes, which can be used in multiplex diagnostic assays that dramatically reduce assay time and cost, compared to conventional diagnostics. The nucleic acids and methods of the invention enable the skilled artisan to identify hepatitis C virus and differentiate between closely related strains thereof based on the sequence of regions containing, for example, single nucleotide polymorphisms (SNPs), insertions, deletions, or indels (sites where a colocalized insertion and deletion has occurred, resulting in a net gain or loss in nucleotides). Advantageously, the nucleic acid probes and methods of the invention may be further multiplexed and used in automated systems, such as microplates, for high throughput processing of large numbers of samples by centralized laboratory, hospital, and/or diagnostic facilities.
Accordingly, aspects of the invention provide nucleic acid probes and mixtures comprising a plurality of nucleic acid probes capable of circularizing capture of a region of interest. Aspects of the invention include a single-stranded nucleic acid probe comprising a nucleic acid sequence of the formula:
5'-A-B-C-3' wherein A and C are probe-arms designed to hybridize, potentially with some number of mismatches, to a target nucleic acid sequence. A list of probe arms used in this invention is included in tables 1-4. B is a backbone sequence.
In some embodiments, the backbone sequence B comprises a cleavage site. In some embodiments, the cleavage site is a restriction endonuclease recognition site. In further embodiments, the backbone sequence B comprises one or more detectable moieties. In some embodiments, the one or more detectable moieties are each independently selected from a barcode sequence and a primer-binding sequence. In certain embodiments, the backbone contains non-Watson-Crick nucleotides, including, for example, abasic furan moieties, and the like.
In some embodiments, the backbone sequence B includes one or more primer binding sites that can be used to amplify the circularized probe sequence using rolling circle amplification, PCR, or similar techniques. In some embodiments, the primer sequences also contain adapters such that the amplification product can be sequenced on a particular next-generation sequencing platform such as the Ion Torrent PGM, Ion Torrent Proton, Illumina MiSeq, Illumina HiSeq, Solid XL, 454 GS, or a nanopore platform. In some embodiments, one or more of the primer sequences contains a barcode sequence such that several samples can be amplified each with a unique barcode and the sequencing reads de-multiplexed during analysis.
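De-multiplexing by barcode can be sketched as follows. This is a minimal illustration under stated assumptions: barcodes have a fixed length, sit at the start of each read, and must match exactly. The function name and the example barcodes are hypothetical and do not come from the tables of this application.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode and strip the barcode.

    Assumes all barcodes share one length and match exactly; reads with an
    unknown barcode are kept unmodified under the key "unassigned".
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len])
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[barcode_len:])
    return dict(by_sample)


if __name__ == "__main__":
    barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}  # hypothetical barcodes
    reads = ["ACGTGGGAAA", "TGCACCCTTT", "ACGTTTTCCC", "NNNNGGGGGG"]
    print(demultiplex(reads, barcodes))
```

A production pipeline would typically tolerate a small number of barcode mismatches; exact matching is used here only to keep the sketch short.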
In some embodiments, each probe arm sequence in the group consisting of columns 2 and 3 from tables 1-4 is contained in at least one probe in the plurality of probes. In other embodiments, the plurality of probes includes a subset of the probes in tables 1-4, where the probes in the subset have been chosen to detect a specific set of drug resistance mutations or to determine a subset of viral genotypes. In some embodiments, the composition comprising a plurality of probes comprises extracted nucleic acids from a test sample. The extracted nucleic acids may be from a biological sample. The biological sample may be from a human patient.
In some embodiments, the composition comprising a plurality of probes comprises at least one sample internal calibration standard nucleic acid. In some embodiments, the composition comprising a plurality of probes comprises at least one probe that specifically hybridizes with the sample internal calibration standard nucleic acid.
Aspects of the invention further include a kit comprising the composition comprising a plurality of probes as described herein and instructions for use. In particular embodiments, the kit may also comprise reagents for obtaining a sample (e.g., swabs), and/or reagents for extracting RNA, and/or enzymes, such as polymerase and/or ligase to capture a region of interest.
Aspects of the invention include a method of detecting the presence of one or more strains of hepatitis C virus in a test sample, comprising:
a) contacting a test sample with the composition comprising a plurality of probes as described herein to form a mixture;
b) capturing a region or regions of interest in a hepatitis C virus genome by at least one single-stranded nucleic acid probe hybridized to a first and second target sequence in the hepatitis virus genome to form a circularized probe; and
c) detecting the captured region of interest, thereby detecting the presence of the one or more strains of hepatitis C virus.
Aspects of the invention also include a method of detecting the genotype of one or more strains of hepatitis C virus (HCV) in a test sample, comprising:
a) contacting a test sample with the composition comprising a plurality of probes as described herein to form a mixture;
b) capturing a region of interest in an HCV genome by at least one single-stranded nucleic acid probe hybridized to a first and second target sequence in the HCV genome to form a circularized probe; and
c) determining the sequence of the captured region of interest, thereby detecting the genotype of each of the one or more strains of HCV.
In some embodiments, a region of interest is captured by polymerase-dependent extension from the 3' terminus of sequence C of a probe in the plurality of probes. In further embodiments, the region of interest is captured by sequence-specific ligation of a linking oligonucleotide.
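The capture by polymerase-dependent extension can be modeled in silico, for example to predict the capture region a probe would produce on a known genome. The sketch below assumes exact arm matches (the probes themselves tolerate some mismatches), a hypothetical `max_gap` limit, and toy sequences. Because extension starts at the 3' terminus of C and stops at the 5' end of A, the site bound by A lies upstream of the site bound by C on the target strand, and the sequence copied into the probe is the reverse complement of the target between them.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_capture(arm_a, arm_c, target, max_gap=500):
    """Return the gap-fill sequence copied into a 5'-A-B-C-3' probe, or None.

    Exact matching and `max_gap` are simplifying assumptions of this sketch.
    """
    site_a = revcomp(arm_a)  # target site hybridized by arm A
    site_c = revcomp(arm_c)  # target site hybridized by arm C
    start = target.find(site_a)
    if start < 0:
        return None
    gap_start = start + len(site_a)
    gap_end = target.find(site_c, gap_start)
    if gap_end < 0 or gap_end - gap_start > max_gap:
        return None
    # Extension from the 3' end of C copies the complement of the gap.
    return revcomp(target[gap_start:gap_end])


if __name__ == "__main__":
    toy_target = "CC" + "GGGTTT" + "ACGTA" + "TTCCAA" + "GG"  # toy sequence
    print(in_silico_capture("AAACCC", "TTGGAA", toy_target))
```

Read 5' to 3', the circularized probe in this model is C, then the returned gap-fill, then A, with the backbone B joining A back to C.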
In some embodiments, the method of detecting the presence of one or more strains of hepatitis C virus in a test sample or of detecting the genotype of one or more strains of HCV in a test sample includes the step of amplifying the circularized probe to form a plurality of amplicons containing the captured region or regions of interest. In some embodiments, the method of detecting the presence of one or more strains of hepatitis virus in a test sample or of detecting the genotype of one or more strains of HCV in a test sample includes the step of treating the mixture with a nuclease to remove linear nucleic acids between the steps of capturing and detecting a region of interest (steps (b) and (c) of each of the methods described above). In some embodiments, the method includes the step of linearizing the circularized probe by cleavage with a site-specific endonuclease. In some embodiments, the method of detecting the presence of one or more strains of hepatitis C virus in a test sample includes the step of sequencing the region of interest. In some embodiments, the method of detecting the presence of one or more strains of hepatitis C virus in a test sample further includes the step of comparing the sequence of the captured region of interest to the sequence of known HCV genomes. In some embodiments, the method of detecting the presence of one or more strains of HCV or of determining the genotype of the one or more strains includes the step of comparing the sequence of the captured region to the predicted capture regions of previously sequenced HCV genomes or partial genomes.
In some embodiments, the method of detecting the presence of one or more strains of hepatitis C virus in a test sample includes the step of analyzing the sequence of the captured region of interest with respect to the sequence of known hepatitis C virus genomes and a model of sequencing errors to estimate the proportions or abundances of the hepatitis C strains in the test sample.
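Estimating strain proportions from reads can be sketched with Expectation-Maximization, one of the methods the description names for this purpose. The sketch assumes that a likelihood P(read | strain) has already been computed for every read and strain from the alignment scores and a sequencing-error model; how those likelihoods are obtained is outside this example, and the function name and toy numbers are hypothetical.

```python
def em_strain_proportions(likelihoods, n_iter=200):
    """Estimate strain proportions with a basic EM loop for a mixture model.

    likelihoods[r][k] is an assumed, precomputed P(read r | strain k); every
    read must have a positive likelihood under at least one strain.
    """
    n_reads = len(likelihoods)
    n_strains = len(likelihoods[0])
    props = [1.0 / n_strains] * n_strains  # start from a uniform guess
    for _ in range(n_iter):
        totals = [0.0] * n_strains
        for row in likelihoods:
            # E-step: posterior probability that this read came from each strain.
            weighted = [p * l for p, l in zip(props, row)]
            norm = sum(weighted)
            for k in range(n_strains):
                totals[k] += weighted[k] / norm
        # M-step: new proportions are the average responsibilities.
        props = [t / n_reads for t in totals]
    return props


if __name__ == "__main__":
    # Toy data: three reads fit only strain 0, one read fits only strain 1.
    toy = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(em_strain_proportions(toy))
```

This version uses no prior on the number of strains; the description also mentions a user-provided prior, which would enter the M-step.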
In some embodiments, the method of detecting the genotype of one or more strains of HCV in a test sample includes the step of comparing the sequence of the captured region of interest to a database of known HCV mutations. In some embodiments, the database of known HCV mutations is a database of known HCV drug resistance mutations.
In some embodiments, the method of detecting the genotype of one or more strains of HCV in a test sample includes the step of analyzing the sequence of the captured region of interest with respect to the sequence of known HCV genomes and a model of sequencing errors to estimate the proportions or abundances of one or more strains of HCV in a test sample. In some embodiments, the method of detecting the genotype of one or more strains of HCV in a test sample includes the step of analyzing the sequence of the captured region of interest with respect to the sequence of known HCV genomes and mutations and a model of sequencing errors to estimate the proportions or abundances of one or more mutations in a test sample.
In some embodiments, the test sample is obtained from a human subject. In some embodiments, the test sample is blood obtained from a human subject.
In some embodiments, the method of detecting the presence of one or more strains of hepatitis C virus, of determining the genotype of one or more strains of HCV in a test sample, or detecting mutations in the HCV genome in the test sample includes the step of adding a sample internal calibration standard to the test sample. In some embodiments, the method further comprises the steps of adding a probe that specifically hybridizes with the sample internal calibration standard and detecting the sample internal calibration standard.
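One simple way to use a sample internal calibration standard is to scale the viral read count by the ratio of the standard's known concentration to its observed read count. The sketch below assumes that read counts are proportional to input concentration and that the standard and the viral target are captured with equal efficiency. Neither assumption is stated in the text, and the function name and numbers are hypothetical.

```python
def estimate_concentration(target_reads, sic_reads, sic_concentration):
    """Estimate target concentration from read counts and a spiked-in standard.

    Assumes reads scale linearly with concentration and equal capture
    efficiency for target and standard (simplifications for illustration).
    The result is in the same units as `sic_concentration`.
    """
    if sic_reads <= 0:
        raise ValueError("no calibration standard reads detected")
    return target_reads / sic_reads * sic_concentration


if __name__ == "__main__":
    # Hypothetical counts: 5,000 viral reads, 1,000 standard reads,
    # standard spiked in at 200 copies per unit volume.
    print(estimate_concentration(5000, 1000, 200.0))
```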
In some embodiments, the method of detecting the presence of one or more strains of hepatitis C virus or of determining the genotype of one or more strains of HCV in a test sample includes the step of formatting the results to inform physician decision making. In some embodiments, the formatting includes providing an estimated quantity of one or more HCV genotypes of interest. In some embodiments, the formatted results comprise a therapeutic recommendation based on the one or more HCV genotypes detected.
In some embodiments, the methods of using this invention achieve high specificity in detecting or sequencing only HCV nucleic acid compared to any human nucleic acids that may be present in the sample. When the nucleic acids processed by this method are sequenced on a next-generation sequencing machine, less than 50%, 25%, 10%, or 1% of the resulting sequencing reads or sequencing reads that pass a quality filter represent human nucleic acids. This specificity is advantageous as it reduces the number of sequencing reads required to achieve a desired depth of sequencing for the HCV genome.
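As a worked check of these thresholds, the read counts reported for the 28 clinical samples (71,951 reads mapped to the human genome out of 13,583,863 reads assigned to a probe) can be converted into a fraction and compared with the listed limits. The helper names below are hypothetical.

```python
def human_read_fraction(human_reads, total_reads):
    """Fraction of probe-assigned reads that map to the human genome."""
    return human_reads / total_reads


def tightest_threshold_met(fraction, thresholds=(0.50, 0.25, 0.10, 0.01)):
    """Return the smallest listed threshold that the fraction stays below."""
    met = [t for t in thresholds if fraction < t]
    return min(met) if met else None


if __name__ == "__main__":
    fraction = human_read_fraction(71_951, 13_583_863)
    print(f"{fraction:.2%}")
    print(tightest_threshold_met(fraction))
```

With these counts the human fraction is about 0.53%, which is below the strictest listed limit of 1%.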
Additional objects and advantages of the invention will be set forth in part in the description, which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments of the invention and together with the description, serve to explain the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1: A schematic diagram of a probe hybridized to the target nucleic acid. The two homologous probe arms are shown, as are the two universal primer binding sites in the backbone that are present in certain embodiments.
Figure 2: A schematic of the protocol of an embodiment in which (1) probes are hybridized to target nucleic acids in a sample, (2) a polymerase copies the reverse-complement of the target into the probe molecule and a ligase closes the circle, (3) exonuclease enzymes digest away target and unused probe molecules and (4) a pair of adapter-primers amplify circularized probe molecules containing the copied target region, adding a barcode and next-generation sequencing machine adapters in the process.
Figure 3: Specific or degenerate primers initiate the reverse-transcription reaction. Each primer contains a homology region to the RNA and a non-binding tail. Panel B shows the content of the tail: a molecule-specific barcode or dogtag followed by a probe binding site. With such a primer and suitably long dogtag, each cDNA molecule will contain a unique dogtag. Panel C shows the capture of the resulting cDNA molecule (or its complement if the first strand cDNA was later amplified in a PCR reaction) by a probe. The probe binds to a target in the reverse-transcribed RNA and to the probe arm binding site such that the molecule-specific barcode will be captured and sequenced.
Figure 4: The protocol can be performed in a single tube and the resulting material used with any sequencing platform.
Figure 5: The distribution of the 436 probes along the HCV genome. The NS3 gene is between roughly 3kb and 4kb, NS5A between roughly 6kb and 7kb, and NS5B from roughly 7kb to 9.5kb. The large number of probes overlapping certain coordinates reflects the enormous genetic diversity between the HCV strains seen in patients around the world. While certain technologies perform poorly on certain strains, the invention described here uses a large set of probes to ensure efficient capture of any sample.
Figure 6: Agreement between Sanger Sequencing, another next-generation sequencing approach, and the invention described here (Pathogenica DxSeq) on a subset of clinical samples.
Figure 7: Graphical summary of results for the 19 genotype-1a and 11 genotype-1b samples showing the coverage and results of a set of known drug resistance mutations in the NS3, NS5a, and NS5b genes.
Figure 8: A list of selected HCV drug resistance mutations and the drugs to which they indicate resistance. Presence of even a small fraction of resistant molecules in a patient sample indicates that the drug should not be prescribed.
Figure 9: Agreement in HCV genotype detection between the Versant Genotyping assay and the Pathogenica DxSeq assay disclosed here.
Figure 10: The three panes show scatterplots comparing the frequency of alternate alleles detected in an HCV sample between two replicates. Each point represents the frequency of the alternate allele at a single locus in the gene. Because the presence of small numbers of drug resistant viral particles in a patient can predict therapy failure, detection of alternate alleles at low frequency is critical for clinical HCV assays. The invention described here demonstrates a strong correlation between replicates across the entire range of allele frequencies.
Figures 11-14 show genotyping results for clinical sample 10 in genes NS3, NS5a, and NS5b. Figure 11 indicates the interpretation of the plot: each column represents a single codon. The vertical coordinate indicates the frequency of the alternate allele at that codon in the clinical sample. The color of the circle indicates whether the alternate allele is a known drug resistance mutation.
DETAILED DESCRIPTION
1. Probes
One aspect of the invention provides mixtures of circularizing "capture" probes suitable for sensitive, rapid, and highly specific detection of one or more hepatitis C viruses in complex samples. "Probe" refers to a linear, unbranched polynucleic acid comprising two homologous probe sequences separated by a backbone sequence, where the first homologous probe sequence is at a first terminus of the nucleic acid and the second homologous probe sequence is at the second terminus of the nucleic acid, and where the probe is capable of circularizing capture of a region of interest of at least 2 nucleotides. "Circularizing capture" refers to a probe becoming circularized by incorporating the sequence complementary to a region of interest. Basic design principles for circularizing probes, such as simple molecular inversion probes (MIPs), as well as related capture probes, are known in the art and described in, for example, Nilsson et al., Science, 265:2085-88 (1994); Hardenbol et al., Genome Res., 15:269-75 (2005); Akhras et al., PLoS One, 9:e915 (2007); Porreca et al., Nature Methods, 4:931-36 (2007); Deng et al., Nat. Biotechnol., 27(4):353-60 (2009); U.S. Patent Nos. 7,700,323 and 6,858,412; and International Publications WO 2011/156795, WO/1999/049079 and WO/1995/022623.
Certain aspects of the invention encompass a single-stranded nucleic acid probe comprising a nucleic acid sequence of the formula:
5'-A-B-C-3'
wherein
A is a probe arm sequence listed in column 2 of tables 1-4;
C is the corresponding probe arm sequence listed in column 3 of tables 1-4; and
B is a backbone sequence.
Figure 1 shows a schematic of a probe hybridized to a target nucleic acid (either RNA or DNA).
Accordingly, embodiments encompass a probe, which includes two homologous probe sequences A and C, each of which may specifically hybridize to a different target sequence in a hepatitis viral genome adjacent to a region of interest. A probe may comprise any one of the pairs of homologous probe sequences in columns 2 and 3 of tables 1-4.
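As a minimal sketch of the 5'-A-B-C-3' layout described above, the following Python snippet joins arm A, a backbone B, and arm C into a single probe sequence. The two arm sequences are copied from the first row of Table 2. The backbone string, the `Probe` class, and the `make_probe` helper are hypothetical illustrations and are not taken from the disclosure.

```python
from dataclasses import dataclass

DNA_BASES = set("ACGT")


@dataclass(frozen=True)
class Probe:
    """A linear capture probe with the layout 5'-A-B-C-3'."""
    name: str
    arm_a: str      # homologous probe sequence at the 5' terminus
    backbone: str   # backbone sequence B between the two arms
    arm_c: str      # homologous probe sequence at the 3' terminus

    def sequence(self) -> str:
        """Return the full probe sequence written 5' to 3'."""
        return self.arm_a + self.backbone + self.arm_c


def make_probe(name: str, arm_a: str, backbone: str, arm_c: str) -> Probe:
    """Validate the three segments and build a Probe."""
    segments = {"A": arm_a.upper(), "B": backbone.upper(), "C": arm_c.upper()}
    for label, seq in segments.items():
        if not seq or set(seq) - DNA_BASES:
            raise ValueError(f"segment {label} must be a non-empty DNA string")
    return Probe(name, segments["A"], segments["B"], segments["C"])


# Arms from the first row of Table 2; the backbone is a made-up placeholder.
PLACEHOLDER_BACKBONE = "ACGTACGTACGTACGTACGT"
probe = make_probe(
    "AF054247_3796_3931",
    "TAGACTTCCCCTGCTGTCGCC",
    PLACEHOLDER_BACKBONE,
    "GGTAGTTTCCATAGACTCAACG",
)
print(probe.sequence())
```

Keeping the three segments as separate fields, rather than storing only the joined string, makes it straightforward to modify one arm later without touching the backbone.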
The probes may further comprise a backbone sequence, which contains a primer binding site between the homologous probe sequences. Typically, the homologous probe sequence at the 3' end of the probe (probe segment C) is termed the extension arm and the homologous probe sequence at the 5' end of the probe (probe segment A) is termed the ligation or anchor arm. Upon hybridization to the target sites in the genome of interest, the probe/target duplexes are suitable substrates for polymerase-dependent incorporation of at least two nucleotides on the probe (on the extension arm), and/or ligase-dependent circularization of the probes (either by circularizing a polymerase-extended probe or by sequence-dependent ligation of a linking polynucleotide that spans the region of interest).
"Capture reaction" refers to a process where one or more probes contacted with a test sample have possibly undergone circularizing capture of a region of interest, wherein the first and second homologous probe sequences in the probe have specifically hybridized to their respective target sequences in the test sample to capture the region of interest between the first and second target sequences of the probe. A capture reaction may produce no circularized products containing a region of interest if none of the organisms targeted by the probes were present in the sample. "Capture reaction products" refers to the mixture of nucleic acids produced by completing a capture reaction with a test sample. "Amplification reaction" refers to the process of amplifying capture reaction products. An "amplification reaction product" refers to the mixture of nucleic acids produced by completing an amplification reaction with a capture reaction product. Figure 2 shows a schematic of a protocol wherein the probe circularizes to capture a region of the target nucleic acid and is then amplified by universal primers in preparation for sequencing.
1.1 Homologous probe sequences
A "homologous probe sequence" is a portion of a probe provided by the invention that specifically hybridizes to a target sequence present in the genome of a hepatitis C virus. The terms "homologous probe sequence," "probe arm," "homologous probe arm," "homer," and "probe homology region" each refer to homologous probe sequences that may specifically hybridize to target genomic sequences, and are used interchangeably herein. "Target sequence" refers to a nucleic acid sequence on a single strand of nucleic acid in the genome of an organism of interest. In some embodiments, the homologous probe sequences in the probes are the probe pairs listed in tables 1-4. The term "hybridizes" refers to sequence-specific interactions between nucleic acids by Watson-Crick base-pairing (A with T or U and G with C). "Specifically hybridizes" means a nucleic acid hybridizes to a target sequence with a Tm of not more than 14 °C below that of a perfect complement to the target sequence.
In further particular embodiments, a bridge nucleic acid may be employed, wherein at least a first portion of the bridge nucleic acid is capable of hybridizing to the capture probe, and at least a second portion of the bridge nucleic acid (which may overlap with the first portion) is capable of simultaneously or sequentially hybridizing to the target nucleic acid, thereby enhancing the efficiency of ligation of the capture probe to the target.
In particular embodiments, a probe specifically hybridizes when: a) both homologous probe sequences A and C in the probe hybridize to their respective target sequences with at least 60, 65, 70, 75, 80, 85, 90, 95, or 100% correct pairing across the entire length of the homologous probe sequence; b) the 3'-most homologous probe sequence (also referred to herein as "C") hybridizes with 100% correct pairing in the 8, 7, 6, 5, 4, 3, or 2 bases at the 3' end of probe sequence C; and c) the 5'-most homologous probe sequence (also referred to herein as "A") hybridizes with 100% correct pairing in the first 8, 7, 6, 5, 4, 3, or 2 bases of the 5' end of probe sequence A. In still more particular embodiments, a probe specifically hybridizes when: a) both homologous probe sequences A and C in the probe hybridize to their respective target sequences with at least 80% correct pairing across the entire length of the homologous probe sequence; b) homologous probe sequence C hybridizes with 100% correct pairing of the first 6 bases of the 3' end of C; and c) homologous probe sequence A hybridizes with 100% correct pairing of the first 6 bases of the 5' end of A. In still more particular embodiments, a probe specifically hybridizes when: a) both homologous probe sequences A and C in the probe hybridize to their respective target sequences with a melting temperature that is within some range (e.g., 10 degrees Celsius given the hybridization buffer) of the perfect match Tm; and b) both homologous probe sequences hybridize with 100% correct pairing over the 6 bases at the external ends.
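The pairing criteria above (at least 80% correct pairing over each arm, plus perfect pairing over the 6 bases at each external end) can be sketched in Python as follows. This is an illustrative check only: it assumes an ungapped alignment in which each arm and its target site have the same length, and the function names and default thresholds are hypothetical choices rather than anything defined in the disclosure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def paired_positions(arm: str, target_site: str) -> list:
    """Booleans along the arm (5' to 3'): True where the base pairs correctly.

    The target site is given 5' to 3' on the target strand, so a perfectly
    matching arm equals the reverse complement of the site.
    """
    perfect_arm = revcomp(target_site)
    if len(perfect_arm) != len(arm):
        raise ValueError("this sketch assumes an ungapped, equal-length alignment")
    return [a == p for a, p in zip(arm.upper(), perfect_arm)]


def arm_hybridizes(arm, target_site, external_end, min_identity=0.80, end_bases=6):
    """Check overall pairing and perfect pairing at the arm's external end.

    external_end is "5p" for arm A (5' end of the probe) and "3p" for arm C.
    """
    pairs = paired_positions(arm, target_site)
    identity = sum(pairs) / len(pairs)
    terminal = pairs[:end_bases] if external_end == "5p" else pairs[-end_bases:]
    return identity >= min_identity and all(terminal)


def probe_specifically_hybridizes(arm_a, site_a, arm_c, site_c):
    """Both arms must satisfy the criteria for the probe to qualify."""
    return (arm_hybridizes(arm_a, site_a, "5p")
            and arm_hybridizes(arm_c, site_c, "3p"))
```

A mismatch in the middle of an arm is tolerated as long as overall pairing stays at or above the threshold, whereas a single mismatch within the terminal 6 bases fails the check. The Tm-based variant of the criteria would need a thermodynamic model instead of position counting.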
Homology between two sequences, e.g., a homologous probe sequence and the complement of a target sequence, may be determined by any means known in the art, including pairwise alignment, dot-matrix, and dynamic programming, and in particular embodiments by FASTA (Lipman and Pearson, Science, 227:1435-41 (1985) and Pearson and Lipman, PNAS, 85:2444-48 (1988)), BLAST (McGinnis & Madden, Nucleic Acids Res., 32:W20-W25 (2004) (current BLAST reference, describing, inter alia, MegaBlast); Zhang et al., J. Comput. Biol., 7(1-2):203-14 (2000) (describing the "greedy algorithm" implemented in MegaBlast); Altschul et al., J. Mol. Biol., 215:403-410 (1990) (original BLAST publication)), Needleman-Wunsch (Needleman and Wunsch, J. Molec. Bio., 48(3):443-53 (1970)), Sellers (Sellers, Bull. Math. Biol., 46:501-14 (1984)), and Smith-Waterman (Smith and Waterman, J. Molec. Bio., 147:195-197 (1981)), and other algorithms (including those described in Gerhard et al., Genome Res., 14(10b):2121-27 (2004)), which are incorporated herein by reference. In particular embodiments, the methods provided by the invention comprise screening candidate sets of sequences by MegaBLAST against one or more annotated genomes.
In some embodiments, a sequence "specifically hybridizes" when it hybridizes to a target sequence under stringent hybridization conditions. "Stringent hybridization conditions" refers to hybridizing nucleic acids in 6xSSC and 1% SDS at 65 °C, with a first wash for 10 minutes at about 42 °C with about 20% (v/v) formamide in 0.1xSSC, and a subsequent wash with 0.2xSSC and 0.1% SDS at 65 °C. In particular embodiments, alternate hybridization conditions can include different hybridization and/or wash temperatures of about 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 66, 67, 68, 69, or 70 °C or other hybridization conditions as disclosed in Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 3rd edition (2001), which is incorporated herein by reference. In particular embodiments, the hybridization temperature is greater than 60 °C, e.g., 60-65 °C.
The homologous probe arms must be chosen to match the hybridization conditions, and vice-versa. If homologous probe arms are to be used at a less stringent hybridization condition (e.g., a lower hybridization temperature or higher salt concentration), nucleotides can be removed from the backbone-adjacent ends of the probe arms until the melting temperature of the probe arms is suitable for the hybridization condition. The melting temperature of the probe arm can be computed using any software known to those skilled in the art, such as Melting 5.0 (www.ebi.ac.uk/compneur-srv/melting/). Likewise, to increase the melting temperatures of the probe arms to accommodate a higher hybridization temperature or lower salt concentration, nucleotides may be added to the backbone-adjacent ends of the probe arms such that the nucleotides match as many desired targets of the probe as possible. The backbone-distal ends of the probe arms cannot be easily modified to change the melting temperatures of the arms, as the nucleotide composition of those ends (typically the terminal two nucleotides) plays a significant role in the efficiency of the polymerase initiation and the ligation reactions.
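The trimming procedure described above can be illustrated with a short Python sketch. It is hedged in two ways. First, `estimate_tm` uses a crude GC-content approximation (Tm = 64.9 + 41 × (GC − 16.4) / N) that ignores salt and probe concentration; it merely stands in for a nearest-neighbor tool such as Melting and is not the method of the disclosure. Second, the `trim_arm` helper, its `min_len` floor, and the target temperatures are hypothetical.

```python
def estimate_tm(seq: str) -> float:
    """Crude GC-content Tm estimate in degrees Celsius (illustrative only)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def trim_arm(arm: str, which: str, target_tm: float, min_len: int = 15) -> str:
    """Remove bases from the backbone-adjacent end until Tm <= target_tm.

    which == "A": arm at the 5' end of the probe; its backbone-adjacent
                  end is its 3' end, so bases are removed from the right.
    which == "C": arm at the 3' end of the probe; its backbone-adjacent
                  end is its 5' end, so bases are removed from the left.
    The backbone-distal end is never modified.
    """
    if which not in ("A", "C"):
        raise ValueError("which must be 'A' or 'C'")
    arm = arm.upper()
    while len(arm) > min_len and estimate_tm(arm) > target_tm:
        arm = arm[:-1] if which == "A" else arm[1:]
    return arm


# Arms from the first row of Table 2; the target temperatures are arbitrary.
arm_a = "TAGACTTCCCCTGCTGTCGCC"
arm_c = "GGTAGTTTCCATAGACTCAACG"
print(trim_arm(arm_a, "A", 55.0))
print(trim_arm(arm_c, "C", 50.0))
```

Lengthening an arm to raise its melting temperature would work in the opposite direction, but the added bases would have to be chosen against the intended target sequences, which this sketch does not model.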
An "organism" is any biologic with a genome, including viruses, bacteria, archaea, and eukaryotes including plantae, fungi, protists, and animals.
"Region of interest" refers to the sequence between the nearest termini of the two target sequences of the homologous probe sequences in a probe.
Homologous probe sequences A and C in a probe provided by the invention can readily be adapted for use as a conventional primer pair in a polymerase chain reaction (PCR) to specifically amplify a region of interest from a viral sequence. "Conventional primer pairs" refers to a pair of linear nucleic acid primers each member of which comprises sequences corresponding to one of the two homologous probe sequences in a probe provided by the invention, which are capable of exponential amplification of a region of interest comprising at least two nucleotides. These conventional primer pairs are encompassed by and are a part of the present invention. In contrast to the probes provided by the invention, which are capable of circularizing capture of a sequence complementary to a region of interest, conventional primer pairs are oriented with their 3' ends facing each other to facilitate exponential amplification. For example, in some embodiments a conventional primer pair comprises a first primer comprising the sequence of an extension arm of a circularizing capture probe provided by the invention (i.e., a 3' primer or "C" sequence) and a second primer comprising the reverse complement of a ligation arm of a circularizing capture probe provided by the invention (i.e., a 5' primer or "A" sequence). In certain embodiments, the conventional primer pairs comprise a barcode sequence. In some embodiments, the conventional primer pairs comprise universal sequences, including, for example, sequences that hybridize to adaptamer primers.
The probes and conventional primer pairs provided by the invention may comprise the naturally occurring conventional nucleotides A, C, G, T, and U (in deoxyribose and/or ribose forms) as well as modified nucleotides such as 2'-O-methyl-modified nucleotides (Dunlap et al., Biochemistry, 10(13):2581-7 (1971)), artificial base pairs such as IsodC or IsodG, abasic furans such as dSpacer (Chakravorty et al., Methods Mol. Biol., 634:175-85 (2010)) that do not form canonical Watson-Crick hydrogen bonds, biotinylated nucleotides, adenylated nucleotides, nucleotides comprising blocking groups (including photocleavable blocking groups), and locked nucleic acids (LNAs; modified ribonucleotides, which provide enhanced base stacking interactions in a polynucleic acid; see, e.g., Levin et al., Nucleic Acids Res., 34(20):142 (2006)), as well as a peptide nucleic acid backbone. In particular embodiments, the 5' or 3' homologous probe sequences of a probe provided by the invention comprise, at their respective termini, a photocleavable blocking group, such as PC-biotin. In more particular embodiments, a probe provided by the invention comprises a photocleavable blocking group at its 5' terminus to block ligation until photoactivation. In other particular embodiments, a probe provided by the invention comprises at its 3' terminus a photocleavable blocking group to block polymerase-dependent extension or n-mer oligonucleotide ligation until photoactivation.
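The adaptation of probe arms into a conventional primer pair, as described above (a first primer with the sequence of the extension arm C and a second primer that is the reverse complement of the ligation arm A), can be sketched in Python as follows. The function names are hypothetical, and the sketch handles only unmodified DNA bases; barcodes, universal sequences, and modified nucleotides are not modeled.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def conventional_primer_pair(arm_a: str, arm_c: str) -> Tuple[str, str]:
    """Derive a primer pair from the two arms of a circularizing probe.

    first primer  = sequence of the extension arm (C)
    second primer = reverse complement of the ligation arm (A)
    """
    first_primer = arm_c.upper()
    second_primer = reverse_complement(arm_a)
    return first_primer, second_primer


# Arms from the first row of Table 2.
first, second = conventional_primer_pair(
    "TAGACTTCCCCTGCTGTCGCC", "GGTAGTTTCCATAGACTCAACG"
)
print(first)
print(second)
```

Taking the reverse complement of arm A is what turns the two arms, which flank the region of interest on the same probe strand, into primers whose 3' ends face each other.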
In other embodiments, the 5'-most nucleotide of a probe provided by the invention comprises an adenylated nucleotide to improve ligation and/or hybridization efficiency. See, e.g., Hogrefe et al., J Biol. Chem. 265 (10): 5561-5566, (1990). In more particular embodiments, the 5' end of the 5' homologous probe region (e.g., the ligation arm) comprises at least one LNA and in still more particular embodiments, the 5' terminal nucleotide is an LNA. In a particular embodiment, the probe molecule is capped with a phosphate group at the 5' end to improve the ligation efficiency.
Homologous probe arms may be chosen to capture regions that identify particular HCV genotypes or sub-genotypes. In the present invention, the set of 27 pairs of probe arms listed in tables 1-4 were chosen to distinguish among genotypes 1a, 1b, 1g, 2a, 2b, 3, 6a, 6m, 6k, 6n, 6j, 6i, and 6f. These probe arms were designed against regions that are relatively conserved across 661 instances of the above genotypes but that surround capture regions that are relatively variable.
Table 1: Genotype distinction probes
[Table 1 appears as an image (imgf000019_0001) in the original publication.]
Table 2: ns3 drug resistance mutations
probe 5' arm (A) 3' arm (C)
AF054247 3796 3931 TAGACTTCCCCTGCTGTCGCC GGTAGTTTCCATAGACTCAACG
AF054247 3797 3929 GTAGACTTCCCCTGCTGTCGC TAGTTTCCATAGACTCAACGGG
AF139594 3832 3998 GCCCTTCAAGTAGGAGACAGGC CACTTGGAATGTTTGCGGTAC
AF169004 3749 3919 ACCAGATACAGGTCGACCGCT GGATGAAGTCTATGGACTTAGC
AF176573 3885 4044 GCCCGGAAGATGCCTACAGCG CGGCACCTTAGTGCTTTTGC
AF207756 3443 3585 GGCCTCGCGTCTGTTGGGAG GCCGGCTAGAGTCTTTGAGC
AF207757 3816 3946 CCCTTCAAGTAGGAGATGGGCC TATCCGTGAAGACCGGAGAC
AF207764 3749 3904 GAATGACATCAGCATGCCTCGTG CTCAACAGGTATAAAGTCCACC
AF207772 3380 3520 GACTATCGGCTGGTCCCAGAAG CCAGGAAAGATTGCGTTGCG
AF238481 3774 3907 GTAACGCTCCCCGCTTGTCC GATGTCGAGTGTCTCAACGG
AF238484 3535 3667 GTTGTTCCGAGGAAGGACTGAGT GCACGGCTCCAAAGATTTGG
AF271632 3829 3974 GCCTTTCAAGTAGGAAATGGGCC CTACTGGTGGAGAGGAGTTATC
AY045702 3737 3916 AAATAGAGGTCTGAGCTGCCGC CGGGTATGAAATCCACCGC
AY878652 3433 3606 GCCCAATAGACCCCTGGTCTGC GGTCGACATTGGTGTACATCT
AY878652 3455 3603 CGGTAAGGCTGGTGACGATCG CGACATTGGTGTACATCTGAC
D11168 3464 3634 AACCAAGTAGGCCCCGTGTCT GGTCTACATTAGTGTACATTTGG
D14484 3592 3725 GCCATGGTAGACAGTCCAACACA TCTCGTGACCAAGTAGAGGTC
D50483 3817 3953 GCCCTTCAAGTAGGAGATGGGC GACGAGTTGTCCGTGAAGAC
DQ278891_3595_3727 CACCATGGTATACGGTCCAGAGG CCTCGTCACTAGGTAAAGGTC
DQ278891_3752_3915 GCATTCCTCGTCACTAGGTAAAGG CGGGAACAAAGTCAAGGGAC
DQ278893_3502_3637 GTTCTTGTCGCGGCCAGTAAGG TCGACGTTGGTGTACATTTGG
DQ480512_3710_3880 ACCAGATAGAGGTCACTAGAGCCA TGGGATAAAATCTAAGGATTTTGC
DQ480513_3736_3893 CGGGGATAACATCGGCCTCCC CATGTTTTCCACTGGGATAAAAT
DQ480514_3411_3549 CGCATACGCGGTGATGGGAG CGGCCCCATGATAAACAGTC
DQ480515_3761_3907 GCACGGTTGTCACCCCTGCG CATGGTCGTCTCCATGTTTTC
DQ480517_3615_3781 GTACATTTGACACACTGGCCCTT GCCTTTCAAGGTGCTTATGGG
DQ480519_3801_3973 GCCTTTCAAGGTGCTTATGGGC ATACCCCACCTGATAGGTCTG
DQ480520_3368_3539 CGCATATCATCCGCGGGTCC TGATAAACAGTCCACATGACAC
DQ480521_3742_3872 GCCTTGCAGGGATAACATCGGC TCTAAGGATTTTGCTACACCCC
DQ480521_3743_3905 CGCCTTGCAGGGATAACATCGG ATAGTCGTCTCCATGTTTTCCA
DQ480522_3529_3689 AGGTCGCCAGGAAGGATTGGG CCAGATAGAGGTCACTAGAGC
DQ480523_3435_3605 GCCGACTAGGCCCCTCGTCT TCCACATTGGTGTACATTTGAC
DQ480523_3690_3869 GCCACAAGTACATGGGGTGAGC AGGATTTTGCTACACCCCTTG
EU155216 3713 3887 GCTACCCCTGCTATCACCCCG GGAGAAGAGTTGTCCGTGAAC
EU155216 3719 3852 AAGCAGGCTACCCCTGCTATCA ATAGTTGTCTCTAGGTTCTCCAC
EU155224 3683 3862 AAGTAAAGGTCCGAACTGCCGC CGGGTATAAAGTCCACCGC
EU155224 3686 3865 ACCAAGTAAAGGTCCGAACTGCC ACTCAACGGGTATAAAGTCCAC
EU155227 3712 3880 CCGGAATAACATCAGCGTGTCTC TGGTAGTCTCCATAGACTCAAC
EU155227 3715 3882 GCACCGGAATAACATCAGCGTG CATGGTAGTCTCCATAGACTCA
EU155234 3836 3993 CAGCCCGGAAGATGCCTACAG CGGCACTTTAGTGCTTTTGC
EU155238 3398 3564 GACTATGCACCCTAGGAGGCCC TCTTTGTCCACATTGGTATACAT
EU155253 3721 3867 GGCGCACCGGAATGACATCG TCAACGGGTATAAAGTCCACC
EU155283 3754 3900 AGTAAGAGATAGGCCGGGGCG TGGAGAAGAGTTGTCTGTGAAC
EU155292 3857 4010 GGGATAAAGTCCACCGCCTTGG TGAGTACTAGCACCTTGTAGC
EU155314 3752 3914 AAATGCCTACGGCGTGTCCCG ACCTTGGTACTCTTACCGCT
EU155315 3607 3762 CCTGGTCTACATTGGTGTACATCT GAAGAGCCCTTCAAGTAGGAG
EU234061 3880 4032 CGGGTACGAAGTCCACCGCC TTCAGGACGAGTACCTTGTAC
EU239714 3427 3589 GCTAGTGATGATGCAACCAAGTAG TCTTGGTCCACATTGGTGTAC
EU255932 3725 3855 AAGCAGGCTGCCTCTGCTATCA TTGTCTCTAGGTTCTCTACAGG
EU255945 3564 3727 GTATACATCTGGATAACGGGACCC GCCCTTCAAGTAAGAGATAGGC
EU255945 3658 3835 GACCAGGTAAAGGTCCGAGGAG TCTACGGGGATAAAGTCCACC
EU255953 3516 3648 CCCCGTGGTAGACAGTCCAGC CCTCGTAACCAGGTAAAGGTC
EU255953 3584 3730 GTCTTTGTCTACATTGGTGTACATCT CCTTTCAAATAGGAAATGGGCC
EU255959 3557 3707 GAGGTCTTGGTCTACATTGGTGTA GAGGAGCCCTTCAAGTAAGAG
EU255975 3584 3734 CCTGGTCCACATTGGTATACATCT GAGCCTTTCAAGTAAGAAATGGG
EU256001 3715 3885 GCACCGGGATGACATCAGCG CCGCATAGTAGTTTCCATAGAC
EU256002 3849 4023 CCACTCCACGGGTACACACCG GAGCACTAACACCTTGTAGCC
EU256004 3791 3950 CCCTGAATATGCCTACGGCGTG CGGAACCTTAGTGCTCTTACC
EU256008 3629 3802 GGTAAAGGTCTGAGGAGCCGC ATAAAGTCCACCGCTTTAGCC
EU256029 3391 3553 GCTGGTAATTATGCATCCCAAGAG GGTCTTGGTCTACATTAGTGTAC
EU256029 3574 3721 GTCTTGGTCTACATTAGTGTACATCT GAGCCTTTCAAATAAGAGATAGGC
EU256031 3585 3733 GTCCTTGTCTACATTGGTATACATCT AGCCTTTCAGGTAGGAGATAGG
EU256049 3519 3651 CCCCATGGTAGACAGTCCAGCA CCTCGTGACCAGATAAAGGTC
EU256053 3757 3890 AGCCTTTCAAGTAAGAGATGGGCC AGGAGTTGTCTGTGAATACCG
EU256069 3550 3712 GTACATCTGGATGACAGGACCCT CTTTCAAATAAGAGATAGGCCGG
EU256102 3777 3913 GCCCTTCAAGTAGGAGACGGGC GACGAATTGTCCGTGAAGAC
EU256106 3579 3726 GTCTTGGTCTACATTGGTATACATTT GCCTTTCAAGTAAGAAATGGGC
EU482841 3594 3741 GTCTTTGTCTACATTGGTATACATCT CCTTTCAAGTAGGAGATGGGC
EU482842 3566 3715 GTCTTGGTCTACATTGGTATACATCT GAGCCTTTCAGGTAGGAGATAG
EU482847 3578 3723 GTCTTGGTCTACATTAGTATACATCT CTTCAAGTAGGAAATGGGCCG
EU482866 3419 3572 CCAGTCAGGCTGGTGACTATGC GTCTTTGTCCACATTGGTATACA
EU482881 3728 3903 CGGCGCACCGGAATAACATCA CGGGGACCGCATAGTAGTC
EU529682 3411 3586 CCTCGTGACCAAGTAAAGGTCCG ATAGACTCAACGGGTATAAAGTC
EU569723 3564 3726 GTATACATCTGGATGACAGGACCC CCTTTTAAGTAGGAAATGGGCC
EU569723 3580 3727 GTCCTGGTCTACATTGGTATACATCT GCCTTTTAAGTAGGAAATGGGC
EU660386 3508 3673 CACAGGTCGCCAGGAAAGATTG CCTCGTGACCAAATAGAGGTC
EU857431 3593 3725 CACCATGATAGACAGTCCAACACA TCTCGTGACCAAGTAAAGGTC
FJ435090 3744 3911 GTCACTAGATAGAGATCTGATGCGC ACGAAATCAAGTGCTTTCGC
GU133617 3643 3809 GTACATTTGGGTGATTGGGCCTT GCCTTTCAAGTAGGAGATGGG
NC_009823 3749 3914 ACCAGGTACAGGTCGACCGC AATCTATGGACTTAGCCACGC
S62220 3574 3724 AACATACGCCGTTGACGCAGG GTCTCGTAACCAAGTAAAGGTC
S62220 3578 3744 GTCCAACATACGCCGTTGACG ACCGGAATAACATCTGCATGT
S62220 3592 3741 CACCATGGTAGACAGTCCAACATA CGGAATAACATCTGCATGTCTC
S62220 3765 3942 CACCGGAATAACATCTGCATGTCT GGGACCGCATAGTAGTTTCC
S62220 3766 3940 GCACCGGAATAACATCTGCATGT GGACCGCATAGTAGTTTCCAT
S62220 3796 3931 GCAAGCTCCCTCTATTGTCGCC ATAGTAGTTTCCATGGACTCAAC
AJ238799_3557_3736 GTCCAACACACGCCATTGACGCA GAATGACATCGGCATGCCTCGTGACC
ns3_AF054247_3592_3725 GCCATGGTAGACAGTCCAGCAC TCTCGTGACCAAGTAAAGGTC
ns3_FJ024086_3501_3673 CGCCAGGAAAGATTGCGTTGC CCTCGTGACCAAATAAAGGTC
ns3_EU256031_3514_3649 CGTGGTAGACAGTCCAGCACAC CCTCGTAACCAGGTAAAGGTC
ns3_AF207764_3580_3713 GCCGTGGTAGACAGTCCAACAC CCTCGTGACCAGATAAAGGTC
ns3_EU482845_3270_3443 GCCATTGATGATGTCACCGCAC TTGACAGAATCTGGACCTCAC
ns3_AF139594_3450_3570 GTCTGTTGGGAGTAGGCCGTG GGTAGACAGTCCAACATACGC
ns3_EU482868_3254_3382 GCCATTGATGATGTCACCGCAT GCCAGTTAGGCTGGTAATTATAC
ns3_EU255947_3411_3573 GCTGGTAATTATGCATCCCAAGAG GTCCTGGTCTACATTGGTATAC
ns3_AF290978_3504_3628 ACCTCACCCTCCACTTGGTTCT GTCTTGGTCCACATTGGTATAC
ns3_EF621489_3292_3462 GATAAGCTTGGTCTCCATTCGGG GTCAGACTGGTGATTATACACC
ns3_EU256035_3414_3553 GGCCAGTTAGGCTGGTGATTATAC ATTGGTATACATCTGGATAACGG
ns3_AY651061_3512_3640 CACCCTCCACTTGGTTCTTGTCC GGTCTTGGTCTACATTGGTATAC
ns3_EU155353_3574_3726 GGTCTACATTGGTATACATCTGGATAA GCCTTTCAAGTAAGAGATGGGC
ns3_EU155269_3259_3384 CAAGCCGTCGATGATATCACCG CCGGTCAAGCTGGTAATTATAC
ns3_EU155234_3522_3674 CACACGCCATTGATGCAGGTC TCTCGTGACCAAATAGAGGTC
ns3_EU255993_3581_3734 GTCTACATTGGTATACATCTGGATAAC GCCTTTCAAGTAAGAAATGGGC
ns3_AF483269_3252_3376 CCTTGGTCTCCATGTCAGAGAAGA CGTAATAGGCGCAAGGAGTC
ns3_AF207767_3435_3612 GTCTGTTGGGAGTAAGCCGTGA GTGTACATTTGGGTAATTGGGC
ns3_EU256091_2939_3077 TCAGAGAAGACGACGGGCTCA CGTGATAGGTGCAAGGAGTC
ns3_AJ238799_3292_3419 GATAACCTTGGTCTCCATATCAGAGA GAGTAGGCCGTAATAGGCGC
ns3_AF207760_3415_3592 GATAGGTGCAAGGAGTCGCCAT CCTTAGGACCGGCTAAGGTC
ns3_EU862838_3402_3568 ATACACCCTAGGAGGCCTCTTGT GGTCTACATTGGTATACATTTGG
ns3_EU660386_3541_3673 CACCATGGTAGACAGTCCAACAC CCTCGTGACCAAATAGAGGTC
ns3_EU862832_3260_3435 GACAGGCAAGCCGTTAATGATGT TTGGGTAGCAGTTGATACGAT
ns3_EU155308_3426_3589 GCTAGTAATGATGCAGCCAAGTAG GTCCTGGTCTACATTGGTGTA
ns3_EU256003_3594_3741 GTCTTTGTCCACATTGGTATACATCT ACCTTTCAAGTAAGAGATGGGC
ns3_FJ024087_3300_3447 CAATATCTCCCGGCCCTTGCG GGGTAGCAGTTGACACGATC
ns3_EU482859_3319_3486 ATATCTCCCTCCCTCTTCGGGC AGGTCGCCAAGAAGGATTGT
ns3_FJ462438_3493_3645 GTCTCTGCCGGTAAGGCTGGT ACTAGGTCTTGGTCAACATTGG
ns3_EU155236_3425_3584 GCAGTTGACACAATCTGGACCTC AGGTGCAGGGTGTTAATGAAC
ns3_AF207759_3282_3407 GTGATGATCTTGGTCTCCATGTCA GAGTAGGCTGTAATGGGCGC
ns3_EU155250_3435_3563 CCCTCCGCTTGGTTCTTGTCC GTCTTTGTCTACATTGGTATACAT
ns3_EU255994_3250_3371 CAAGCCGTTGATGATGTCGCC GTTAAGCTGGTGATTATACATCC
ns3_DQ480514_3529_3695 AGGTCGCCAGGAAGGATTGGG CTCGTGACCAGATAGAGATCAC
ns3_EU256068_3383_3552 CCCTAGAAGGCCTCTTGTTTGCT TCTACATTGGTGTACATTTGGAT
ns3_EU155381_3508_3673 CGCAGGTCGCCAGGAAAGATT TGTCTTGTGACCAAATAAAGGTC
ns3_AF271632_3567_3725 CCATTGATGCACGTTGCCAGG CCTCGTTACCAGGTAAAGGTC
ns3_AJ238800_2951_3078 GATAACCTTGGTCTCCATGTCAGA GAGTAGGCCGTAATATGCGC
ns3_FJ462431_3595_3724 CCGCTCCATGGTATACAGTCCA TTCCTAGTAACCAGGTACAAGTC
ns3_EU155328_3406_3585 GTAAACCTCGCGTCTGCTGGG CTGGTCCACATTGGTATACATT
ns3_EU256077_3395_3573 GTCTGTTGGGAATAGGCCGTGA TTGGTATACATTTGAATGATTGGG
ns3_EU255946_3408_3564 CAAGCTGGTGATTATGCATCCCA TCTTGGTCTACATTGGTATACATC
ns3_EU155253_3243_3373 GATAACTTTGGTCTCCATGTCAGAG TTGGGAATAAGCCGTGATAGG
ns3_AF054248_3574_3740 ACACACGCCATTGATGCAGGT AATGACATCAGCATGTCTCGT
ns3_DQ480518_3545_3699 ATGACACCATTAATGGAGGTCACC TCCCTCGTGACCAGATATAGG
ns3_EU155290_3415_3542 CCCTCCACTTGGTTCTTGTCCC TCTTTGTCTACATTGGTATACATC
ns3_EU155240_3364_3542 TCTGCTGGGCATACGCTGTGA GGTATACATCTGGATAATGGGAC
ns3_EU482878_3581_3735 GTCCACGTTGGTATACATCTGGAT AGCCCTTCAAGTAGGAGATAGG
ns3_EU234062_3429_3591 GAGGCTAGTGATGATGCAGCCA GAGGTCTTGATCTACATTGGTG
ns3_AB442219_3612_3740 GCTAAGGTCTTTGAGCCGGCG ATGACGTCAGCATGTCTCGT
ns3_AY045702_3560_3728 CAGGTCGCCAGGAAGGATTGTG GTCTCGTGACCAAATAGAGGT
ns3_DQ314806_3569_3734 CACAGGTTGCCAGGAAAGTTTGA TCGCGTGATGAGGTAAAGATC
ns3_EU256038_3576_3741 GTGTACATCTGGATAACAGGGCC AGGAACCTTTCAAGTAAGAAATGG
ns3_DQ835760_3609_3732 GCTAAGGTCTTTGAACCAGCACC GACATCTGCATTCCTAGTAACTA
ns3_EU256074_3398_3550 GTGATTATACACCCTAGGAGGCCT TTGGTATACATTTGGATGACAGG
ns3_EU482835_3360_3537 GATAGGCGCCAGCAACCTCCA CCCTTAGATGTCGCTATGGTC
ns3_EU155257_3431_3590 GTGAGGCTAGTGATGATGCAGC AGGTCCTGGTCTACATTAGTGT
ns3_AF165049_3633_3797 GTGTACATCTGGGTGATTGGACC AGCCCTTCAAATAAGAGACAGG
ns3_AF165053_3283_3452 GGTAATGATCTTGGTCTCCATGTCA CTGTGAGGCTAGTGACTATACA
ns3_AF207772_3562_3714 ACATACGCCATTGATGCAGGTCG GTCTCGTGACCAAGTAAAGGT
ns3_EU155306_3526_3675 ACCAACACACGCCGTTGATACA TGTCTCGTGACCAGGTAAAGG
ns3_EU255966_3576_3739 GTATACATCTGGATAACAGGACCCT GCCTTTCAGGTAAGAGATAGGC
ns3_D84264_3648_3806 GTGTACATCTGGCATATGGGCCC TCAAGGTGCTTATTGGTCTTGG
ns3_EU255983_3391_3558 GATTATACACCCTAGGAGCCCTCT TCTTGGTCCACATTGGTATACA
ns3_AF207765_3474_3633 CCTGTAAGGCTAGTGATGATGCAA CGAGGTCTAGGTCTACATTGG
ns3_D63821_3518_3646 GACGATGTTCTTGTCCCTACCAG CTACGTTGGTGTACATTTGCA
ns3_FJ839870_3268_3438 GTGTAAATATGACTGGCTCGAGCG CTTAACAAACCTCGAGTCTGC
ns3_EU256034_3575_3738 GTATACATCTGAATAACGGGACCCT GCCTTTCAAGTAAGAGATAGGC
ns3_EU155227_3539_3662 CCATGATAGACAGTCCAGCACAC AATAAAGGTCCGAACTGCCAC
ns3_D63822_3483_3659 GTAACTATTGTACCCAGCAACCCG GCCAACCATGTCTTGATCTAC
ns3_U89019_3547_3717 GTCGCTAGGAAAGATTGCGTTGC GCCATAGGGACCAAATAAAGGT
ns3_EU155239_3476_3643 CATGTCGCCAGGAATGTTTGGG CCTCGTGACAAGGTAAAGGTC
ns3_EU155297_3474_3643 CATGTCGCCAGGAAAGTTTGGG TGCCTCGTAACCAAGTAAAGG
ns3_AF207759_3633_3767 GTATACATCTGGGTGATTGGACCC AGAGTAGGCTCCCTCTACTATC
ns3_EU482847_3391_3560 GTGATTATGCACCCTAGGAGGCC CAAGGTCTTGGTCTACATTAGTA
Table 3: ns5a drug resistance mutations
probe 5' arm (A) 3' arm (C)
AB080299 6303 6454 AATACCGTGCATATCCAGTCCCA CTCATGGAACCGTTCTTGACA
AF169004 6495 6642 CCCTGTGATTCTCATAGAGCCCA AGTCCTGTTATATAGGAGTATGAC
AF207762 6289 6434 CACTGTGCATATCCAGTCCCAAA CGTTCTTGACATGTCCGGC
AF207762 6290 6436 ACACTGTGCATATCCAGTCCCAA AACCGTTCTTGACATGTCCG
AF207771 6459 6635 ATGGAACCGTTCTTGACATGTCC TTGTCAGTAGTCATGCCCGT
AF238481 6432 6604 CATTACCAGAGATGTTGGCGCC TGTTATATAGGAGTATGACCCGT
AF238481 6438 6606 GGCGGACATTACCAGAGATGTTG CCTGTTATATAGGAGTATGACCC
AF238482 6210 6384 TAATCCAATTGTGGAGTCTCCTGAG ACATCGTGTGGTCATGATACC
AF238485 6268 6442 ACCCAGTCCCACACATCACGG GCCCTGTGATTCTCATAGAGC
AF290978 6404 6558 GCATAATGCCGTCTCCTCGC CCACAGCGCGAACTTATAGT
AF483269 6174 6342 GCAGCTGAGTGATGGTAAGGCT CCAGACTCCCTTATACCCACG
AF483269 6322 6473 ACTCCCGGTAACCGCGGCA GTTGATGGGAAATGTTCCATGC
AJ000009 6409 6588 CAGGTGGTTTGCATGATGCCG CGCGTAACCTCCACATACTC
D50483 6451 6591 GTTCTTGACATGTCCAGTGATCTG CGTAACCTCCACGTACTCCT
D50485 6282 6444 CATATCCAGTCCCAGACATCCCT TTCTCATGGAACCGTTCTTGA
D50485 6283 6445 GCATATCCAGTCCCAGACATCCC GATTCTCATGGAACCGTTCTTG
D50485 6308 6460 AAGTCTTGAAGTCAGTCAACACCG GTCTTAGGCCCAACGATTCTC
D50485 6481 6639 GGTCTTAGGCCCAACGATTCTCA TTAAGTTGTCAGTGGTCATGC
DQ278891_6292_6445 CCCAGTCCCAGATCTCCCGTAG ATTCTTAACGTGGCCGACTAT
DQ278891_6339_6515 GAGCTTTGTTCTGAGCCAGGTC TAGCATTGATAGGGAACGTGC
DQ278891_6342_6511 CACGAGCTTTGTTCTGAGCCAG ATTGATAGGGAACGTGCCATG
DQ278891_6345_6522 CGGCACGAGCTTTGTTCTGAG GTCGTCGTAGCATTGATAGGG
DQ278891_6433_6611 ATGGACATGTAGTGTGACAAATGC CATCCTCACTACCTCGACGT
DQ278891_6451_6609 CGACTATGTCAGCACCACATGG ATCCTCACTACCTCGACGTAC
DQ278891_6456_6604 GTGGCCGACTATGTCAGCACC ACTACCTCGACGTACTCCTC
DQ278891_6536_6698 GTAGCATTGATAGGGAACGTGCC ATCCACTTCAGTGAAGAATTCTG
DQ278892_6309_6468 TACACACCCAGTCCCAGATATCC ATCTTCATAGACCCGTTCTTTAC
DQ278894_6487_6648 CCTGAGATACGCATAGACCCGT CCTGGTTGGTTACACCTACTAT
DQ278894_6489_6644 AACCTGAGATACGCATAGACCCG GGTTGGTTACACCTACTATGAAA
DQ480520_6489_6666 AGTCCCGTGCCAAGTGTTACTG ACCTCGGTGAAGAATTCAGGA
DQ835763_6298_6452 ACAAACCCAGTCCCAGATATCCC TCCCGTTACGTACATGGCC
DQ835764_6292_6442 GCAAACCCAGTCCCAAATGTCTC CGTTCTTCACATGACCCGTG
DQ835765_6209_6380 GGCTGGTTATAGTAAGGGAACTGAG CATACGCCTCTAAACCCTCG
DQ835765_6299_6453 TACAAACCCAGTCCCAGATATCCC GTCCCGTTACGTACGTGGC
DQ835768_6487_6651 CGAACCTGAGATACGCATAGACC TTAAGGTCCTGGTTGGTTACAC
EU155226 6272 6417 AGCCAGGTCTTGAAGTCACTCAA TTAGGCCCAACGATTCTCATG
EU155230 6187 6329 CCTCGTTAATCCACTGGTGGAGC CTCGCCAGACTCCCTTATACC
EU155240 6373 6520 CATGTCCGGTGATCTCAGCTCC GCCTTATTTCCACGTATTCCTC
EU155265 6338 6487 GTGCATAATGCCATCTCCTCGC AGCGCGAACGTATAGTTCG
EU155265 6387 6566 TCCCGTTCTTGACATGTCCAGTA TTGTCAGTAGTCATACCCGTC
EU155284 6390 6534 ACATGTCCAGTAATTTCAGCTCCA CTTACTTCCACGTATTCCTCTG
EU155285 6379 6557 ATCGTCCCGTTTCTGACATGTCC TAAGGTTGTCAGTAGTCATACCC
EU155292 6398 6576 ATCGTCCCGTTCTTGACATGTCC TAAGATTGTCAGTAGTCATACCC
EU155314 6216 6395 GGCATGAGCTTGGCCTTTAGCC GTGGTGTAGGCGTTGATAGG
EU155314 6329 6479 CATGTCCGGCGATCTCAGCTC CCGCCTTATTTCCACGTATTC
EU155331 6341 6517 ACTCCCTTATACCCACGTTGGC CCATAGCGCCCTAGAATAGTT
EU155331 6345 6519 CCAGACTCCCTTATACCCACGTT GCCATAGCGCCCTAGAATAG
EU155332 6403 6553 CATGTCCGGTGATCTGTGCTCC CGCGTAATCTCCACGTACTC
EU155370 6245 6400 GTGCATATCCAGTCCCAGATGTC CATGGAACCGTTCTTGACATG
EU155371 6240 6402 GCATATCCAGTCCCAGACGTCC ATCCTCATGGAACCGTTCTTG
EU255931 6381 6526 CATGTCCAGTAATCTCGGCTCCA CTTATCTCCACGTATTCCTCTG
EU255945 6374 6524 CATGTCCGGTGATTTCAGCTCC CCTGCCTTATTTCCACGTATTC
EU255950 6204 6362 GTCCCAGATGTCTCTCAGCCAG GTTCTTGACATGTCCGGTGA
EU255952 6347 6523 CAGCGAGTGTGCATAATGCCAT TGCCTTATTTCCACATATTCCTC
EU255954 6176 6326 GAGTGGTGCAATCCGAGCTTATC GGTGTGCATAATGCCATCTCC
EU255965 6133 6302 GGAGTTGGGTTACAGTGAGGCT CCAGACTCCCTTATACCCGC
EU255978 6109 6274 GTGAGGCTGCTGAGTATGGTAGT GCGTTGGCAGGATACAAAGG
EU255979 6353 6526 CAACGAGTGTGCATGATGCCG TTATCTCCACGTATTCCTCTGC
EU255999 6108 6259 GTGAGGCTGCTTAGTATGGCAGT CATAAAGGGAATCCCAGGTAGT
EU256074 6344 6493 GTGTGCATGATGCCGTCTCCTCGCCAG CCACAACGCGAACTTATAGTT
EU482841 6449 6628 GTTCCATTCCACATGTTCCTGCA TCCAATTCTGTAAAGAATTCGGG
EU482842 6198 6338 ATATCCAGTCCCAGATGTCCCTTAG TGTCCAGTAATCTCAGCTCCG
EU482844 6235 6379 CACCTCGCATATCCAGTCCCAG CGTTCTTGACATGTCCGGTG
EU482848 6357 6501 CATGTCCAGTGATCTCAGCTCCA TTATTTCCACGTATTCCTCTGC
EU482853 6311 6484 CTCCCTTGTACCCGCGTTGGC CGCGAACGTATAGTTCGGC
EU482854 6310 6487 ACTCCCTTATACCCGCGTTGGC CACAGCGCGAACGTATAGT
EU482859 6243 6403 GCATATCCAGTCCCAAACGTCC TCTCATGGAACCGTTCTTGAC
EU482865 6200 6375 CATATCCAGTCCCAGATGTCCCT CTAGGACCGACGATTCTCATC
EU482872 6233 6375 CACCTCGCATACCCAGTCCCA CGTTCTTGACATGTCCAGTAAT
EU595699 6246 6397 GCATAATGCCATCTCCTCGCCA AGCGCGAACTCATAGTTCG
EU781823 6343 6522 GTGTGCATAATGCCGTCTCCTC ATCTCCACGTACTCCTCCGC
EU862832 6282 6430 ACACAAAGGGAATTCCGGGCAA GTGTAGGCGTTAATAGGGAAGG
EU862832 6284 6435 GGACACAAAGGGAATTCCGGGC CGTGGTGTAGGCGTTAATAGG
FJ390397 6452 6603 GTGTTGCTGCAGGTTCTAGGCC CACTTTACGTTGTCAGTGGTC
FJ478453 6149 6323 GTGATGGTAAGGCTGGAGAGGAT AGACTCCCTTATACCCACGTT
M62321 6231 6408 GAGCATGGAGTGGTACACTCCGAGCTTA GTGGCAGCGAGTGTGCATGATGC
EU781823 6155 6331 GAGCATGGAGTGGTACACTCCGAGCTT CACAGTGGCAGCGAGTGTGCATAATGC
EU781823 6156 6328 GAGCATGGAGTGGTACACTCCGAGCT GCGAGTGTGCATAATGCCGT
EU256033 6215 6360 GTCGCTCAGCACCTCGCATATCC ACATGTCCAGTGATCTCAGCTCCA
EU155297 6149 6318 GAGCATGGAGTGGTGCTCTCCGAGCT GCAGCGAGTGTGCATGATGCCGTCTC
EU155297 6152 6319 GAGCATGGAGTGGTGCTCTCCGA GCAGCGAGTGTGCATGATGCCGTCT
EU781823 6159 6325 GAGCATGGAGTGGTACACTCCGA GCGAGTGTGCATAATGCCGTCTC
EU239713 6342 6508 GTGTGCATGATGCCGTCTCCTCGCCAG AACGTGTAGTTCGGCGCAGG
AF290978 6385 6548 GCATAATGCCGTCTCCTCGC AACTTATAGTTCGGCGCAGG
ns5a EU155347 6205 6361 CATATCCAGTCCCAAATGTCCCTT CGACCCGTTCTTGACATGTC
ns5a EU255953 6146 6328 GTGTAGTCGCCTCAGGAGCTG GCGAGTGTGCATGATACCAT
ns5a EU155339 6189 6367 CATATCCAGTCCCAGATGTCCCT GTCTTAGGACCGACAATCCTC
ns5a EU482883 6248 6404 CATATCCAGTCCCAAACATCCCTT ATGGAACCATTCTTGACATGTC
ns5a D50485 6282 6460 CATATCCAGTCCCAGACATCCCT GTCTTAGGCCCAACGATTCTC
ns5a EU155229 6154 6320 GCTGAGTGATGGTAAGGCTGGA CTCCCTTATACCCTCGTTGAC
ns5a EU482872 6218 6375 GTCCCAGATGTCTCTTAACCAGGA CGTTCTTGACATGTCCAGTAAT
ns5a EU155303 6242 6386 CATATCCAGTCCCAAACATCCCTC GACATGTCCGGTGATCTGTC
ns5a AJ238800 5982 6154 GCCAGGTCTTGAAATCAGTCAACA TTCCATGCCACGTGTTACTAC
ns5a EU255966 6175 6333 GCAGTCCGAACTTATCCACTGGT GAGTGTGCATAATGCCATCTC
ns5a EU255954 6224 6365 CCTCACATATCCAGTCCCAGATGT TCTTGACATGTCCAGTGATCTC
ns5a U89019 6193 6367 GTGATGGTAAGGTTGGAGAGAATCT CAGACTCCCTTATATCCACGTT
ns5a D50485 6298 6473 GTCAGTCAACACCGTGCATATCC CGTATTGCTGCAGGTCTTAGG
ns5a_DQ278892_6316J5468 AGTACAGTACACACCCAGTCCCA ATCTTCATAGACCCGTTCTTTAC
ns5a EU155254 6261 6403 AAATCACTCAACACCGTGCATATC CTCATGGAACCGTTCTTGACA
ns5a AF169003 6228 6406 GAGTCTTCTGAGCAGGCTAGTTAT GTCATGATACCAGTACCGGC
ns5a_DQ835768_6312_6457 AAATCAGACAACACAGTACAAACCC ATACGCATAGACCCGTTCTTC
ns5a AF169005 6230 6419 AGTCTTCTGAGCAGGCTAGTTATAG GACATCGTGTGGTCATGATGC
ns5a AF169002 6246 6419 AGTAATCCAATTATGGAGTCTTCTGAG AACATCGTGTGGTCATGATGC
ns5a EU255951 6232 6372 ACCTCGCATATCCAGTCCCAGA CTTGACATGTCCGGTAATCTC
ns5a DQ835760 6319_6495 CCATGTCTTGAAGTCACTCAGGAC CGTACCATGCCAGGTATTGC
ns5a FJ462434 6292 6476 GCAAACCCAGTCCCAGATGTCC CTACATGTCTTTGGCCCGAC
ns5a DQ278892 6309 6466 TACACACCCAGTCCCAGATATCC TCATAGACCCGTTCTTTACGT
ns5a D14853 6294 6437 CATATCCAGTCCCAGACATCTTTGA GACATGACCAGTTATATCGGC
ns5a DQ314806 6227 6413 GTTTACGCAAGAGACTAGTGATGG GTAGTGTTGCAAATCCCGTC
ns5a EU255987 6201 6380 ATATCCAGTCCCAGATGTCCCTTAG GTCTTAGGACCGGTGATCTTC
ns5a EU155368 6192 6374 GAAGTCCTCATTAATCCACTGGTGG TAATTTGTGCTCCACATGGGC
ns5a AY651061 6294 6450 CATATCCAGTTCCAGACATCTTTAAGC CGAACCGTTCTTGACATGTC
ns5a EU256021 6212 6358 CATATCCAGTCCCAGATGTCTCTC TTGACATGTCCGGCGATTTC
ns5a D49374 6296 6475 ATGTAACCAGTCACCGTTGCAGG TCATGGACCCATTCCTTACGT
ns5a EU482888 6168 6329 CCTCTTTAACAGCTGAGTGATGGT GCCAGATTCCCTTGTATCCAC
ns5a EU255951 6226 6375 CATATCCAGTCCCAGATGTCCCG CATTCTTGACATGTCCGGTAAT
ns5a AY651061 6300 6454 ACCTCGCATATCCAGTTCCAGAC CTCATCGAACCGTTCTTGACA
ns5a FJ839870 6308 6468 AAGTCACTCAGTACGGTGCAGAC TCTTTGGCCCGGTAATTCGC
ns5a AJ132996 6291 6443 GTCCCAGACATCCCTTAGCCAC GACATGTCCGGTGATTTGTG
Table 4: ns5b drug resistance mutations
probe name 5' arm (A) 3' arm (C)
AB442222 8386 8553 AATATAAAGCCGCTCTGTGAGCG CACAGATAACGACAAGGTCGT
AF165050 8426 8598 CGCGGCACCGACGATAACC TAGTCATAGCCTCCGTGAAGA
AF165052 8360 8535 GCTCTGTGAGCGACTTTATGGC AGATAACGACTAGGTCGTCTC
AF165064 8318 8458 AGTCACAACATTGGTAAATTGACTCC CTTCAAGTAACATGTGATGGTG
AF169004 8457 8616 CCCTCCCACGTAAAGTCTCTCAG AGATGACAACCAAGTCGTCG
AF207760 8421 8585 CACCGGCGATAACCGCAGTT CGTGAAGACTCGTAGGTTTGC
AF207766 8279 8454 CATTCTCGGTGACCGTTGAGTC CAAGTAACATGTGAGGGTATTGC
AF207766 8280 8421 TCATTCTCGGTGACCGTTGAGT TCAGCACGCCGCTTGCGCGGC
AF207768 8288 8433 CACGGATGTCACTCTCAGTGACC CAGCTGGTCGTCAGTACGC
AF238481 8414 8583 GTAAAGTCTCTCAGTCAGCGAATG TGAGATGACAACCAAGTCGTC
AF238485 8410 8581 AGTCTCTCAGTCAGCGAGTGTAT AGATGACAACCAAGTCATCGC
AJ000009 8356 8529 CGCTCTGTGAGCGACTTTATGG TAACGACGAGGTCGTCTCC
AY859526 8356 8530 GCCGTTCTGTTAGGGATGATACTG CACAAATGACCACTAGGTCATC
AY859526 8359 8527 AGAGCCGTTCTGTTAGGGATGAT AATGACCACTAGGTCATCTCC
D50484 8317 8457 GTCACAACATTGGTATATTGACTCCT CTTCAAGTAACATGTGAGGGTAT
DQ278893 8377_8531 AGAGCGTATGGCTGTCTCGGC ACATGTCATATTTGGTAAGGCC
DQ278893_8383_8534 AGTGAGAGAGCGTATGGCTGTC ACCAACATGTCATATTTGGTAAG
DQ278893_8387_8532 GCTCAGTGAGAGAGCGTATGGC CAACATGTCATATTTGGTAAGGC
DQ480513_8310_8454 GCAAGACTGATAGATGTCGTTCTC CTTCAGATAGCATGTGATGGTG
DQ480518_8308_8455 AAGACTGATAGATGTCATTCTCAGTC CCTTCAGATAGCATGTGATGGT
DQ480521_8370_8531 GCCTACGTAGAGCCGTTCTGTC ACAAATGACCACTAAGTCATCTC
DQ480521 8371_8530 CGCCTACGTAGAGCCGTTCTG CAAATGACCACTAAGTCATCTCC
DQ480524 8356 8534 CTGTCAGGGATGATACTGCCTTC CAAATGACCACTAGGTCATCTC
DQ835763 8314 8476 GCGTATGTCGCGCTCGGTG CATGTGATGGTATTGCCCATG
DQ835763 8317_8481 GTTGCGTATGTCGCGCTCGG CAAGTAACATGTGATGGTATTGC
DQ835764 8393 8537 GTTCATTGAGTGATGTGATGGCCT AACATGTCATAGTCTTTGAGCTT
DQ835764_8394_8546 CGTTCATTGAGTGATGTGATGGC GCAGACCAACATGTCATAGTC
DQ835765_8313_8483 CGTATGTCGCGCTCAGTGACT TTTCAAGTAACATGTGATGGTATT
DQ835766_8330_8482 AGATGTCGTGCTCGTTGCGTA TTCAAGTAACATGTGATGGTATTG
EF621489 8293 8433 ATTCTCAGTGACTGTGGAGTCAAA GTACGCCGCTCGCGCGGCACC
EU155218 8240 8415 CACTCTCAGTGACTGTTGAGTCAA CAAGTAGCATGTGAGGGTATTAC
EU155226 8245 8412 GGATGTCACTCTCGGTGACCG TAGCATGTGAGGGTATTACCG
EU155241 8235 8376 GGTCCAGGTCACAACATTGGTAG TTGGGCCTTAATGTAGCAAGT
EU155254 8330 8496 CATAAAGCCGCTCTGTGAGCG AGATAACGACAAGGTCGTCTC
EU155258 8384 8559 CGGCATCGGCGATAACCGC CTAGTCATAGCCTCCGTGAAG
EU155273 8255 8421 AACGTAAAGCCTCTCAGTGAGGG ACAGATAACGACTAGGTCGTC
EU155305 8244 8413 GATATCACTCTCAGTGACTGTTGAG GTAACATGTGAGGGTGTTACC
EU155317 8248 8414 CACGGATATCACTCTCAGTGACTG CAAGTAACATGTAAGGGTATTGC
EU155330 8266 8408 GATGTCACTCTCGGTGACCGTT CGTCAGTACACCGCTCGC
EU155335 8239 8414 CACTCTCGGTGACCGTTGAGTC AAGTAGCATGTGAGGGTATTGC
EU155347 8205 8383 CGCTCTCAGTGACTGTAGAGTCA TGATGTAACAGGTGAGGGTGT
EU155360 8278 8421 GGTCACAACATTGGTAAATTGACTC GCTTTCAAGTAGCATGTGAGG
EU239716 8302 8478 ACATAAAGTCTCTCAGTGAGGGACT GCACTTTCGCAGATAACGAC
EU255960 8322 8473 CCGCTCTGTGAGCGACTTTATG CACACGAGCATAGTGCAGTC
EU255987 8292 8458 CATAAAGCCTCTCAGTGAGGGACT AGATAACGACTAAGTCGTCGC
EU256000 8244 8417 GATATCATTCTCAGTGACTGTTGAGT CTTTCAAGTAACATGTAAGGGTAT
EU256009 8253 8393 GGGTCCAGGTCACAACATTGGT CTTGGCCTTGATGTAGCAAGT
EU256039 8252 8393 GGTCACAACATTGGTAAATTGCCT CCTTGATGTAACAAGTGAGGGT
EU256058 8298 8467 AACATAAAGCCTCTCGGTGAGGG CTTTCACAGATAACGACTAAGTC
EU256062 8248 8416 GGATGTCACTCTCAGTGACTGTTG GGTAACATGTGAGGGTATTACC
EU256064 8330 8494 ACATAAAGCCGCTCTGTGAGCG GATAACGACAAGGTCGTCTCC
EU256091 7957 8134 CGCTCTCAGTGACTGTTGAGTCA TTCAAGTAGCATGTAAGGGTATT
EU482833 8385 8545 GCGGCACCGACGATAACCG CCGTGAAGACTCGTAGGCTCGC
EU482859 8275 8415 CACAACATTGGTAGATTGACTCCTC CAAGTAACATGTGAGGGTGTTA
EU482888 8251 8422 CACGGATGTCACTCTCAGTGACT CCTTCAAGTAACATGTGAGGGT
EU569722 8319 8486 ACGTAAAGCCTCTCAGTGAGGG GCAGATAACGACTAGGTCGTC
EU862841 8214 8389 GATATCACTCTCGGTGACTGTGG CCTTAATGTAGCAAGTGAGGGT
NC_009823 8432 8606 AGCGAGTGTATGGCAGTGTGG AAGTCATCGCCGCATACCA
NC_009823 8441 8620 CTCTCAGTTAGCGAGTGTATGGC TTTCTGAGATGACAACCAAGTC
U01214 8310 8469 ACTCCTCAGTACGGATGTCACTC TTCAAGTAACATGTGAGGGTATT
EU482885 8371 8548 GCTCGCGCGGCACCGGCGATAGCC CCGTGAAGACTCGTAGGCTCGC
FJ024274 7689 7857 GCTTCCTCTACGGATAGCAAGTTAGCCT GTTCTTAGCCATGATGGTAGTAT
M62321 7837 7999 GCTTCCTCTACGGATAGCAAGTTAGCC AGCCATGATGGTAGTGTCTATT
EU255938 8596 8748 GCTCCAAGTCGTATTCTGGTT CATGATTATGTTGCCTAGCCAGGAATT
EU255938 8594 8738 GCTCCAAGTCGTATTCTGGTTGT GCCTAGCCAGGAATTGACTGGAGTG
EU155297 7753 7902 GCCTCCTCTACGGATAGCAAGTTAGCC GTAGTATCTATTGGTGTTACACTGT
EU155230 8766 8929 ACATAATGATGTTGCCTAGCCAG GCTAAGACCATGGAGTCGTTGAATGAT
FJ024278 8691 8845 CCAGGAATTGACTGGAGTGTGTCTTGCT CCAGTGGTTCTATGGAGTAGCA
EU155230 8767 8928 ACATAATGATGTTGCCTAGCCA ACCATGGAGTCGTTGAATGATC
EU255991 8192 8358 GATGTCGCTCTCAGTGACTGTGG CTAGTTGTCAGTACGCCGCTC
EU482876 8213 8379 GATGTCGCTCTCAGTGACTGTG CTAGTTGTCAGTACGCCGCT
EU256053 7589 7768 CGAGTTGCTCAGTGCGTTGATAGGCA GCTTCCTCTACGGATAGCAAGTT
EU256085 8389 8550 GCAGCTGGTCGTCAGCACGCCGCT CCTCCGTGAAGGCTCGTAGG
EU255992 8197 8364 GATATCGCTCTCAGTGACTGTAGA CTAGTCGTCAGTACGCCGCTC
EU482876 8876 9049 GTTCTATGGAGTAGCAGGCTCCGTAG ACAGAAGCCTAGCGCGGACGCTCC
EU256099 8748 8912 GCCTAGCCAGGAGTTGACTGGAGTGTGT CAATGATCTGAGGTAGGTCAAGTG
EU256052 8369 8525 CTAGTCGTCAGTACGCCGCT CCTCCGTGAAGGCTCGCAGGCTCGC
FJ410172 7684 7858 CCTCCTTGAGCACGTCCTGGTA CTTCCAGAAGGTCCTTCCACAC
AF356827 8272 8440 GATGTCACTCTCAGTGACCGTTGAG GCAGCTAGTCGTCAGCACGCCGCTC
EU256085 8608 8751 CTCCAAGTCGTATTCTGGTTG GCCTAGCCAGGAGTTGACTGGAGTG
EU256048 8368 8526 CTAGTTGTCAGTACGCCGCTCG CCTCCGTGAAGGCTCTCAGGCTCGC
EU255934 7781 7932 TCAGGCTGCAAGCTTCCTCTACGGATAG GCCATGATGGTAGTGTCTATTG
AF207768 8652 8806 ATGTTATCAGCTCCAGGTCGTATTCTG ATGATGATGTTGCCTAGCCAG
DQ480523_7678 7829 GTGGAATACACCATGTTATGGTG CGCAGGCTTCCTCTATGGATA
AF207768 8654 8804 ATGTTATCAGCTCCAGGTCGTATTC ACATGATGATGTTGCCTAGCCAGGA
DQ480522_7676_7828 GTGGAATACACCATGTTATGGTGTC CGCAGGCTTCCTCTATGGATAG
EU155365 8750 8927 GCCTAGCCAGGAATTGACTGGAGTGT ACCATGGAGTCGTTGAATGATCT
DQ480517 7751 7922 GTAATGTTGGTCGAACACTTGCA GCGGATGTGGTTAACGGCCT
EU239713 7786 7944 GCTTCCTCTACGGATAGCAAGTTAGC GCCATGATGGTAGTGTCTATTGGT
DQ480522_7749_7897 GTAATGTTGGTCGAACACTTGCACT GCTAGCATGGCTTCTAACGTCCTG
DQ480523_8351_8498 GCCTACGTAGAGCCGTTCCG AACATGTCATAGTCCTTGATGTTGG
DQ480523 8397 8575 GCATCTACGGTAGCCACATGAC AATGCTCGCAGTGACGCAGTGTC
DQ480524 8729_8873 GTGCAGTCACGTGTGAGGTAGT AATTGTTCTTGCGACTGGAGGATGGAGA
DQ480523 8398_8563 GCATCTACGGTAGCCACATGA ACGCAGTGTCCTCCTGGACGCC
DQ480522 7748_7898 GTAATGTTGGTCGAACACTTGCACTC GCTAGCATGGCTTCTAACGTCCT
DQ480515 7679 7822 GTGGAATACACCATGTTATGGT GCTTCCTCTATGGATAGGAGCTT
DQ480513 8499_8664 AACATGTCATAGTCCTTGATGTTG ACATTGGATGAGCACGATGTTATGAGC
DQ480518 8559 8719 ACGCAGTGTCCTCCTGGACGCCAGCA GTCACGTGTGAGGTAGTAATATCT
DQ835765 8941 9103 ACTCCGTACATGTCGAAGTCA GCTCGAGCTCTGTGTCTCCAGGCTCTC
FJ024274 7692 7849 GCTTCCTCTACGGATAGCAAGTTAG AGCCATGATGGTAGTATCTATTGGT
EU256065 8752 8917 GCCTAGCCAGGAGTTGACTGGAGT GTCGCTGAATGATCTGAGGTAGGTC
DQ835765_7789_7949 GGACATCATAATAATGTTGGTCCAGC GAGTTGATGTGGTTAATGGCCTTACT
DQ480524 8503 8666 AACATGTCATAGTCCTTGATGTT ACATTGGATGAGCACGATGTTATGAGCT
DQ835763_7790_7948 GGACATCATAATAATGTTGGTCCAG GAGTTGATGTGGTTAATGGCCTTACTG
DQ835763_7788_7952 ACATCATAATAATGTTGGTCCAGCA GAGTTGATGTGGTTAATGGCCTT
EU155359 8748 8913 CCAGGAGTTGACTGGAGTGTGTC CGCTGAATGATCTGAGGTAGGTCAAGTG
DQ480516_7680_7823 GTGGAATACACCATGTTATGG GCTTCCTCTATGGATAGGAGCC
DQ480512 7677 7826 GTGGAATACACCATGTTATGGTGT CGCAGGCTTCCTCTATGGATAGGA
DQ480516_7673_7822 GTGGAATACACCATGTTATGGTGTCTTA GCTTCCTCTATGGATAGGAGCCT
DQ835763_7793_7947 AGGACATCATAATAATGTTGGTC GAGTTGATGTGGTTAATGGCCTTACTGG
AF207763 8783 8955 CCAGGAGTTAACTGGAGTGTGTCTAGC GTCGTTCAATGATCTGAGGTAGGTCAA
EU155302 8744 8918 CCAGGAGTTAACTGGAGTGTGTCTAG GTCGTTCAATGATCTGAGGTAGGT
FJ410172 7588 7743 GTGGAATACACCAGATTGTGATGACG GCTGCAAGCTTCCTCTACGGAT
AF207770 7792 7966 ACTGTGGACGCCTTCGCCTTCATCTCC GTGTCAATTGGTGTCTCAGTGTCTTCC
EU857431 7985 8129 GTGGTGTCAATTGGTGTCTCAGT GCCTGAGGAAGAGTGGAGACCAC
EU255979 7622 7769 GTGGAATACACCAGATTGTGATGAC GCCTCCTCTACGGATAGCAAGT
EU857431 7986 8130 GTGGTGTCAATTGGTGTCTCAG GCCTGAGGAAGAGTGGAGACCA
AF207770 7793 7965 ACTGTGGACGCCTTCGCCTTCATCTC GTGTCAATTGGTGTCTCAGTGTCTTCCA
EU155302 8233 8381 ACTCCTCAACACGGATGTCACTCTC TCAGCACGCCGCTTGCGCGGC
EU155359 8897 9060 CAAGTGGCTCAATGGAGTAACA GTAGCTTAGCGCGGACACTTC
EU255934 8220 8369 GCCTCCTCCGTACGGATGTCGCTCTCAG TCAGTACGCCGCTTGCGCGGCA
EU155359 7757 7910 CTGTGGACGCCTTCGCCTTCAT GTCTTCCAGCAAGTCCTTCCACAC
FJ410172 7591 7734 GTGGAATACACCAGATTGTGATG GCTTCCTCTACGGATAGCAAGTTA
AF290978 7686 7833 GTGGAATACACCAGATTGTGAT GCTGCAAGCTTCCTCTACGGATAGCA
DQ835763_7607_7786 GCTGTATGAGTATGAGCAGCA ACATCATAATAATGTTGGTCCAGCATC
FJ410172 7587 7742 GTGGAATACACCAGATTGTGATGACGT GCTGCAAGCTTCCTCTACGGATA
EU155330 8261 8411 GTAGATTGACTCCTCAACACGGATGTC GCAGCTGGTCGTCAGTACACCGCT
EU482876 7778 7937 GCTGCAAGCTTCCTCTACGGATAGCAAG GTTCTTAGCCATGATGGTAGTGTCT
EU482876 8598 8742 CTCCAAGTCGTACTCTGGTTG GCCTAGCCAGGAATTGACTGGAGT
Table 5: Drug resistance mutations
gene codon gene codon gene codon
ns3 16 ns3 95 ns3 155
ns3 36 ns3 96 ns3 156
ns3 37 ns3 97 ns3 157
ns3 38 ns3 98 ns3 158
ns3 39 ns3 99 ns3 159
ns3 40 ns3 100 ns3 160
ns3 41 ns3 101 ns3 161
ns3 42 ns3 102 ns3 162
ns3 43 ns3 103 ns3 163
ns3 44 ns3 104 ns3 164
ns3 45 ns3 105 ns3 165
ns3 46 ns3 106 ns3 166
ns3 47 ns3 107 ns3 167
ns3 48 ns3 108 ns3 168
ns3 49 ns3 109 ns3 169
ns3 50 ns3 110 ns3 170
ns3 51 ns3 111 ns3 175
ns3 52 ns3 112 ns3 176
ns3 53 ns3 113 ns3 489
ns3 54 ns3 114 ns5a 23
ns3 55 ns3 115 ns5a 28
ns3 56 ns3 116 ns5a 30
ns3 57 ns3 117 ns5a 31
ns3 58 ns3 118 ns5a 32
ns3 59 ns3 119 ns5a 54
ns3 60 ns3 120 ns5a 58
ns3 61 ns3 121 ns5a 92
ns3 62 ns3 122 ns5a 93
ns3 63 ns3 123 ns5a 262
ns3 64 ns3 124 ns5a 318
ns3 65 ns3 125 ns5a 320
ns3 66 ns3 126 ns5b 50
ns3 67 ns3 127 ns5b 55
ns3 68 ns3 128 ns5b 71
ns3 69 ns3 129 ns5b 95
ns3 70 ns3 130 ns5b 96
ns3 71 ns3 131 ns5b 138
ns3 72 ns3 132 ns5b 142
ns3 73 ns3 133 ns5b 282
ns3 74 ns3 134 ns5b 314
ns3 75 ns3 135 ns5b 316
ns3 76 ns3 136 ns5b 365
ns3 77 ns3 137 ns5b 368
ns3 78 ns3 138 ns5b 411
ns3 79 ns3 139 ns5b 414
ns3 80 ns3 140 ns5b 419
ns3 81 ns3 141 ns5b 422
ns3 82 ns3 142 ns5b 423
ns3 83 ns3 143 ns5b 445
ns3 84 ns3 144 ns5b 447
ns3 85 ns3 145 ns5b 448
ns3 86 ns3 146 ns5b 451
ns3 87 ns3 147 ns5b 452
ns3 88 ns3 148 ns5b 462
ns3 89 ns3 149 ns5b 482
ns3 90 ns3 150 ns5b 494
ns3 91 ns3 151 ns5b 495
ns3 92 ns3 152 ns5b 496
ns3 93 ns3 153 ns5b 499
ns3 94 ns3 154 ns5b 554
ns5b 555
ns5b 556
ns5b 558
ns5b 559
Table 6: 1b-RT primers, 5'
GTCTAGCCATGGCGTTAGTAT
GTAGAGGGGCCAAGGATACC
GCAGAAAGCGTCTAGCCATGGCGT
GTAGAGGGGCCAAGGATACC
TGCAGGAACACGTGGCATGG
TGAGCTTGGTCTTTACCGCCCAGT
Table 7: ns3-RT-primers
primer 5' -> 3' strand Tm
ATGAAGTTCCACATG - 31.2
GAAATTCCACATGTC - 30.1
ATTATGTCCAAATGG + 27.5
CTACATCTATGACCA + 28.7
ATTACTGGCACCTAT + 31.3
TTCCACATATGCTTT - 29.6
TAAGATCATCACCTG + 29.4
TATGGAGACCAAGCT + 34.4
TCCATCTCATCAAAC - 30.5
TACGTACGTTTATGA + 28.7
TAAGGTCATCACCTG + 32.8
ACATTTATGACCACC + 30.8
AAGTTCCACATGTGT - 33.1
ATCTATGACCATCTC + 28.7
ATACTTTGTGAGGGC + 34
TTCTGCTTGAATTGA - 30
CTATAACCATCTCAC + 27.5
TCATCTTACTCCACT + 30.4
TCCATCTCATCATAT - 27
ATAAGATACACCTGC + 29.9
TGTTTGAACTGCTCA - 33.6
CATCTATGATCACCT + 29.1
ACATTTATGACCATC + 27.4
TTACAACCATCTCAC + 30.4
TAAGGTTATCACCTG + 29.2
ATTTACAACCACCTT + 29.3
TTCTGCTTGAACTGT - 33.1
CTACCTATATGACCA + 27.8
GAAAGTCATCACTTG + 29.7
TAAAGTGCCTTACTT + 28.6
GGTTTGTGGAGATAT + 30
GTGGATTGGTATTTA + 26.8
GTATAACCATCTTAC + 24.4
TTGTGGTGATATTAT + 24.8
GACATACATATACGA + 26.1
TTATGTTCAAATGGC + 28.9
TATCATCTTCAGTCC + 29
TCATTACGTGCAAAT + 30.8
TAGGTATGTGTAGGT + 29.6
TGGATGGGTATTTAT + 27.4
ATAGTGTGTTGGTGA + 32
TTGAGTGATGGTAAG + 30
ATACCGTATTACGTG + 30.3
TATGACCACCTTACT + 30.3
TGAACCTGTAATATT + 24.7
GTACCATACTTCGTG + 32
TAACCACCTTACTCC + 32.2
TTGGATTGGTATATA + 23.5
TATGTGTATGACCAT + 27.8
TCCCTTACTATGTGA + 30
CACCTACATCTATGA + 28.7
GAAATATGTTCAGGC + 29.8
AACATACGTCTACAA + 28.8
TGGTAAGTACATGCA + 32.1
CTATGATCATCTCTC + 27
TTATGAGTTGGAGTA + 27.2
GGGCACCTATATTTA + 29.2
AAGTTATTGTTTGGG + 28.1
GATTGGGTTTATGTG + 29.6
GGAGGTTAAGGTTAT + 29
GAACTTATAGTTCGG + 28.4
TATAGTGAGTTGGTG + 29.3
TGAGGGATGATAGTA + 29.3
ATAGTGTGTAGGAGA + 29.7
TGGTATAAGTTTGGT + 28.2
TTATATGGGTGAGTG + 29.3
CGTGTAACAATTGAT + 28.8
AGGGAACTAAGTATG + 28.7
TATGGTAAGGGAATT + 27.3
TTTACAACCATCTGG + 30.7
ATGACAGGTACGTAC + 33.2
TAGAGGTGTCTGTCT + 33.2
TGTGTAATAATTGAT + 22.1
TAAGTATGTGCAGGC + 34.2
ATAGGTTAGTGATGG + 28.8
Ion Torrent adaptamer-primers:
5' CCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT CAGATGTTATGCTCGCAGGTC 3'
5' CCATCTCATCCCTGCGTGTCTCCGAC TCAG BBBBBBBB GGAACGATGAGCCTCCAAC 3'
Illumina adaptamer-primers:
5' CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT CAGATGTTATGCTCGCAGGTxC 3'
5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT BBBBBBBB GGAACGATGAGCCTCCAAxC 3'
where BBBBBBBB represents a unique barcoding sequence and TxC and AxC indicate a phosphorothioate bond between the T and C or A and C.
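As an illustration of how the BBBBBBBB placeholder may be filled, the following minimal Python sketch substitutes a sample barcode into the second Ion Torrent adaptamer-primer shown above. The function name `make_adaptamer` and the example barcode ACGTACGT are hypothetical and are not part of the disclosure.

```python
# Ion Torrent adaptamer template copied from the text above; the run of
# eight B's marks where the sample barcode goes.
ION_TEMPLATE = "CCATCTCATCCCTGCGTGTCTCCGAC" "TCAG" "BBBBBBBB" "GGAACGATGAGCCTCCAAC"
PLACEHOLDER = "BBBBBBBB"


def make_adaptamer(template: str, barcode: str) -> str:
    """Return the template with its barcode placeholder replaced by `barcode`."""
    if len(barcode) != len(PLACEHOLDER) or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be 8 bases drawn from A, C, G, T")
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one barcode placeholder")
    return template.replace(PLACEHOLDER, barcode)


# Hypothetical barcode, chosen only for illustration.
primer = make_adaptamer(ION_TEMPLATE, "ACGTACGT")
print(primer)
```

The phosphorothioate positions (TxC, AxC) of the Illumina primers are synthesis annotations and are deliberately not modelled in this sketch.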
Homologous probe arms may also be chosen to capture drug resistance mutations. These sets of probes were chosen to capture one or more drug resistance mutations across a set of 661 full or partial HCV genomes that are publicly available. See, e.g., hcv.lanl.gov; HCV sequence database: Kuiken C, Yusim K, Boykin L, Richardson R. The Los Alamos HCV Sequence Database. Bioinformatics (2005), 21(3):379-84. The probe selection process attempted to select three probes that would "work" against every drug resistance mutation listed in Table 5 in each of the 661 genomes. The software considers a probe to work in a strain if it is expected to capture with at least 10% of its maximum efficiency. Table 2 lists the 162 probes designed against drug resistance mutations in the NS3 protease gene. Table 3 lists the 119 probes designed against the NS5a gene. Table 4 lists the 129 probes designed against the NS5b polymerase gene.
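The selection criterion above (three probes per mutation per genome, each expected to capture with at least 10% of its maximum efficiency) can be checked with a short script. The Python sketch below is only one possible reading of that criterion: the efficiency predictions, probe names, and genome names are invented placeholders, and the actual selection software is not described in the text.

```python
WORK_THRESHOLD = 0.10   # "at least 10% of its maximum efficiency"
PROBES_WANTED = 3       # three working probes per mutation per genome


def coverage_gaps(relative_efficiency, probe_targets, genomes):
    """List (mutation, genome, n_working) where fewer than PROBES_WANTED probes work.

    relative_efficiency[(probe, genome)] is the predicted capture efficiency
    as a fraction of that probe's maximum; missing entries count as 0.
    probe_targets[probe] is the set of (gene, codon) mutations the probe spans.
    """
    mutations = sorted(set().union(*probe_targets.values()))
    gaps = []
    for mutation in mutations:
        for genome in genomes:
            working = sum(
                1
                for probe, targets in probe_targets.items()
                if mutation in targets
                and relative_efficiency.get((probe, genome), 0.0) >= WORK_THRESHOLD
            )
            if working < PROBES_WANTED:
                gaps.append((mutation, genome, working))
    return gaps


# Hypothetical inputs: three probes spanning ns3 codon 155, two genomes.
probe_targets = {"p1": {("ns3", 155)}, "p2": {("ns3", 155)}, "p3": {("ns3", 155)}}
relative_efficiency = {
    ("p1", "g1"): 0.90, ("p2", "g1"): 0.50, ("p3", "g1"): 0.20,
    ("p1", "g2"): 0.80, ("p2", "g2"): 0.30, ("p3", "g2"): 0.05,
}
gaps = coverage_gaps(relative_efficiency, probe_targets, ["g1", "g2"])
print(gaps)
```

In this toy input, probe p3 falls below the threshold in genome g2, so that genome is reported as having only two working probes for the mutation.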
1.2 Backbone sequences
The probes provided by the invention include a probe backbone sequence between the first and second homologous probe sequences. The backbone sequence can be at least 15, 20, 25, 30, 35, 40, 45, 50, 70, 90, 100, 120, 140, 150, 160, 180, 200, 400 bases, or more.
The backbone sequence may include a detectable moiety. In some embodiments, the detectable moiety is a probe-specific sequence, such as a barcode for identification of a specific probe or set of probes.
In some embodiments, the backbone sequence comprises one or more primer-binding sites. In more particular embodiments, the backbone includes two primer-binding sites. Each backbone primer-binding site may comprise one or more universal sequences that, for example, can be used to amplify all circularized probes in a mixture. In some embodiments, the backbone sequence comprises one or more non-Watson-Crick nucleotides. In further embodiments, the backbone comprises one or more 2'OMethyl nucleotide residues, artificial base pairs such as IsodC or IsodG, or abasic furans (such as dSpacer), or 2'OMethyl, abasic furans, or LNA nucleotides, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more LNAs or 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% 2'OMethyl, abasic furans, or LNA nucleotides, to confer greater reactivity or inertness in the hybridization reaction, provide resistance to enzymatic activities such as polymerase-mediated strand displacement or nuclease cleavage, to serve as inhibitors of spurious amplification events, or to act as target sites for trans-acting nucleic acid oligonucleotides such as PCR primers or biotinylated capture probes.
In some embodiments, the backbone sequence B comprises a cleavage site. In some embodiments, the cleavage site is a restriction endonuclease recognition site. In further embodiments, the backbone sequence B comprises one or more detectable moieties. In some embodiments, the one or more detectable moieties are each independently selected from a barcode sequence and a primer-binding sequence. In certain embodiments, the backbone contains non-Watson-Crick nucleotides, including, for example, abasic furan moieties, and the like.
In some embodiments, the backbone comprises the sequence GTTGGAGGCTCATCGTTCCTATATTCCACACCACTTATTGATGATTACAGATGTTATGCTCGCAGGTC. In other embodiments, the backbone comprises the sequence GTTGGAGGCTCATCGTTCCTATATTCCACACCACTTATTATTACAGATGTTATGCTCGCAGGTC.
The term "barcode" is used to refer to a nucleotide sequence that uniquely identifies a molecule or class of related molecules. Suitable barcode sequences that may be used in the probes of the invention may include, for example, sequences corresponding to customized or prefabricated nucleic acid arrays, such as n-mer arrays as described in U.S. Patent No. 5,445,934 to Fodor et al. and U.S. Patent No. 5,635,400 to Brenner. In certain embodiments, the n-mer barcode may be at least 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400 or 500 nucleotides, e.g., from 18 to 20, 21, 22, 23, 24, or 25 nucleotides. In particular embodiments, the n-mer barcode is from 6 to 8 nucleotides. In further embodiments, the n-mer barcode is from 10 to 12 nucleotides. In particular embodiments the barcodes include sequences that have been designed to require greater than 1, 2, 3, 4 or 5 sequencing errors to allow this barcode to be inadvertently read as another in error. In some embodiments, the probe does not contain a barcode, while a primer that is used to amplify a circularized probe contains a barcode.
To generate barcode sequences, for each barcode size K, 4^K random barcodes may be generated from the four DNA nucleotides, A, T, G, C, using a Perl script. This set of barcodes represents the total number of unique sequence combinations possible for a sequence of K length, using 4 nucleotide variations. Barcodes for which one nucleotide comprises 100% of the length, e.g., TTTTTT, are then optionally removed using a pattern-matching Perl script. Further filtering steps may include removal of barcodes which contain runs of nucleotides of >3, e.g., TGGGGT, or runs interrupted by only one nucleotide, for instance, GGGTGG. Barcodes containing palindromes or inverted repeats with a propensity to form secondary structure through self-hybridization may be filtered using a Perl script designed to identify such self-complementarity. A set of candidate barcodes may be further filtered such that every barcode contains at least some number of base differences compared to any other barcode. For example, barcodes may be selected to be an edit distance of two nucleotides apart (i.e., differing in sequence by two nucleotides) to ensure that a single sequencing error does not cause barcode mis-identification.
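The generation and filtering steps above may be sketched as follows. The text refers to Perl scripts that are not reproduced in the disclosure; this Python version is an illustrative stand-in under stated simplifications: it counts substitutions only (Hamming distance) as the distance measure, it rejects only exact reverse-complement palindromes rather than all inverted repeats, and it omits the "runs interrupted by only one nucleotide" filter. All function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_run(seq):
    """Length of the longest stretch of one repeated nucleotide."""
    best = run = 1
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_barcodes(k, max_run=3):
    """Enumerate all 4^k sequences, dropping the filtered classes."""
    for bases in product("ACGT", repeat=k):
        bc = "".join(bases)
        if len(set(bc)) == 1:          # one nucleotide is 100% of the length
            continue
        if longest_run(bc) > max_run:  # runs of >3, e.g. TGGGGT
            continue
        if bc == revcomp(bc):          # self-complementary palindrome
            continue
        yield bc


def select_barcodes(k, min_distance=2, max_run=3):
    """Greedily keep candidates at least min_distance from every kept barcode."""
    chosen = []
    for bc in candidate_barcodes(k, max_run):
        if all(hamming(bc, kept) >= min_distance for kept in chosen):
            chosen.append(bc)
    return chosen


barcodes = select_barcodes(6)
print(len(barcodes), barcodes[:5])
```

The greedy pass is order-dependent and does not guarantee the largest possible set; it is used here only because it is short and makes the distance constraint easy to verify.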
Selection of barcodes that may be utilized in a mixture of probes used to test a sample from a patient may involve selecting a combination of barcodes that will provide >5% and not more than 50% representation of a particular nucleotide at each position in the barcode sequence within the pool. This is achieved by random addition and removal of barcodes to a pooled set until the conditions specified are met using a Perl script. Barcodes for which the reverse complement sequence is also present within the barcode pool may also be eliminated.
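The pooling conditions above may be illustrated with the following Python sketch, which checks per-position nucleotide representation (more than 5% and not more than 50%), removes barcodes whose reverse complement is already in the pool, and performs a bounded random swap search. The text attributes this step to a Perl script that is not shown; the function names, the swap strategy, the iteration limit, and the fixed seed are assumptions made for illustration.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_balanced(pool, low=0.05, high=0.50):
    """True if every base is >low and <=high of the pool at every position."""
    if not pool:
        return False
    n = len(pool)
    for column in zip(*pool):
        counts = Counter(column)
        for base in "ACGT":
            fraction = counts[base] / n
            if not (low < fraction <= high):
                return False
    return True


def drop_reverse_complements(pool):
    """Keep the first member of any barcode / reverse-complement pair."""
    kept, seen = [], set()
    for bc in pool:
        if revcomp(bc) in seen:
            continue
        kept.append(bc)
        seen.add(bc)
    return kept


def balanced_pool(candidates, size, tries=10000, seed=0):
    """Randomly swap barcodes in and out until the pool is balanced, or give up."""
    rng = random.Random(seed)
    pool = rng.sample(candidates, size)
    for _ in range(tries):
        if is_balanced(pool):
            return pool
        outsiders = [c for c in candidates if c not in pool]
        if not outsiders:
            break
        pool[rng.randrange(size)] = rng.choice(outsiders)
    return pool if is_balanced(pool) else None


# Toy candidate list in which each base occurs once per position.
toy = ["ACGT", "CGTA", "GTAC", "TACG"]
print(balanced_pool(toy, 4))
```

The search returns None when no balanced pool is found within the iteration limit, so a caller can enlarge the candidate set or relax the pool size.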
Suitable barcode sequences include such barcode sequences as set forth in the table below, which illustrates exemplary 3-mer, 4-mer, 5-mer, 6-mer, 7-mer, 8-mer, 9-mer, and 10-mer barcode sequences. Sequences indicated as "1 nucleotide distance" n-mers in Table 3 are illustrative sequences that have a sequence distance of at least 1 from each other, where "distance" refers to the minimum number of sequencing differences between each of the sequences of the same category. "Two nucleotide distance" sequences have a "distance" from each other of at least 2 nucleotides.
Additional exemplary barcode sequences are described in Table 1 of PCT International Publication WO 2011/156795, which is incorporated by reference in its entirety.
In particular embodiments, barcodes used in the probes provided by the invention correspond to those on the Tag3 or Tag4 barcode arrays by AFFYMETRIX™. Further discussion of barcode systems can be found in Frank, BMC Bioinformatics, 10:362 (2009; 13 pages), Pierce et al., Nature Methods, 3:601-03 (2006) (including web supplements), and Pierce et al., Nature Protocols, 2:2958-74 (2007). In some embodiments, the barcode is sample-specific, e.g., comprises one or more patient specific barcodes. In particular embodiments, more than one barcode will be assigned per patient sample, allowing replicate samples for each patient to be performed within the same sequencing reaction. By using sample nucleic acid-specific barcodes it is possible to both multiplex reactions as described in the present application, as well as detect cross-contamination between test samples that did not use a defined repertoire of specific barcodes. In certain embodiments, the barcode may be temporal, e.g., a barcode that specifies a particular period of time. By using a temporal barcode, it is possible to detect carry-over or contamination on an assay instrument, such as a sequencing instrument, between runs on different days. In more specific embodiments, sample and/or temporal barcodes may be used to automatically detect cross-contamination between samples and/or days and, for example, instruct an instrument operator to clean and/or decontaminate a sample handling system, such as a sequencing instrument.
In certain embodiments, a barcode sequence in the backbone is located between a universal primer binding site and a probe arm such that the barcode sequence is amplified by the universal primer.
In certain embodiments, universal primer binding sequences in a backbone sequence serve as a hybridizing template for longer "adaptamer" primers. An "adaptamer primer" is a primer that hybridizes to universal primer sequences in a capture reaction product to facilitate amplification of the capture reaction product and that further comprises a sample-specific barcode sequence, e.g., sequence 5' to the universal primer hybridizing region of the adaptamer primer. Adaptamer primers can be used, for example, to incorporate sample-specific barcodes on amplification reaction products to allow further multiplexing of samples after completing a capture reaction and an amplification reaction. The addition of sample-specific barcodes allows multiple capture and/or amplification reaction products to be pooled before detection by, for example, sequencing. In more particular embodiments, the adaptamer primers further include universal sequences that hybridize to a sequencing primer.
The detectable moiety may be associated with the backbone sequence. It may be bound to the polynucleotide sequence, as in the case of direct labels, such as fluorescent (e.g., quantum dots, small molecules, or fluorescent proteins), chemical or protein-based labels. Alternatively, the detectable moiety may be incorporated within the polynucleotide sequence, as in the case of nucleic acid labels, such as modified nucleotides or probe-specific sequences, such as barcodes. Quantum dots are known in the art and are described in, e.g., International Publication No. WO 03/003015.
Means of coupling quantum dots to biomolecules are known in the art, as reviewed in, e.g., Medintz et al., Nature Materials 4:235-46 (2005) and U.S. Patent Publication Nos. 2006/0068506 and 2008/0087843, published Mar. 30, 2006 and Apr. 17, 2008, respectively.
The backbone may also contain a "random" sequence, typically written as a sequence of N's. Such random sequence indicates that any of the four nucleotides A, T, C, or G will be incorporated at that position in the synthesized probe molecule. Within a population of probe molecules, one expects to see all or many of the 4^n possible sequences for a string of n Ns. For example, a probe with the backbone GTTGGAGGCTCATCGTTCCTATATTCCACACCACTTATTATTANNNNNNCAGATGTTATGCTCGCAGGTC could actually be one of 4^6 molecules. Including such random sequences, also known as "dogtags," in the probe molecule allows one skilled in the art to determine the most likely number of circularized probe molecules after amplification and sequencing by counting the number of unique dogtags seen.
2 Probe Mixtures
2.1 Probes and calibration standards
Aspects of the invention provide one or more probes for multiplex analysis of test samples, including hepatitis virus detection and hepatitis C viral genotyping in a biological sample from a patient. Aspects of the invention encompass a composition comprising a plurality of probes, each comprising a nucleic acid sequence of the formula:
5'-A-B-C-3'
wherein
A is a probe arm sequence taken from column 2 of tables 1-4; and
C is a corresponding probe arm sequence from column 3 of tables 1-4; and
B is a backbone sequence.
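As a concrete illustration of the 5'-A-B-C-3' formula, the short Python sketch below concatenates one arm pair with one of the backbone sequences given in section 1.2. The arm pair is the row AB442222 8386 8553 of Table 4; the helper name `assemble_probe` is hypothetical, and the choice of this particular row and backbone is arbitrary.

```python
# Backbone sequence B, as given in section 1.2.
BACKBONE = (
    "GTTGGAGGCTCATCGTTCCTATATTCCACACCACTTATTGATGATTACAGATGTTATG"
    "CTCGCAGGTC"
)


def assemble_probe(arm_a, backbone, arm_c):
    """Return the probe 5'-A-B-C-3' as a single string, written 5' to 3'."""
    for label, seq in (("A", arm_a), ("B", backbone), ("C", arm_c)):
        if not seq or set(seq) - set("ACGTN"):
            raise ValueError(f"segment {label} is empty or has non-DNA characters")
    return arm_a + backbone + arm_c


# Arms from Table 4, row AB442222 8386 8553.
probe = assemble_probe(
    "AATATAAAGCCGCTCTGTGAGCG",   # 5' arm (A)
    BACKBONE,                    # backbone (B)
    "CACAGATAACGACAAGGTCGT",     # 3' arm (C)
)
print(len(probe), probe)
```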
In some embodiments, each probe arm sequence in the group consisting of all sequences from tables 1-4 is contained in at least one probe in the plurality of probes.
Probes in a mixture may be selected such that the mixture comprises a subset of the full group of probes encompassed by the probe arm sequence pairs provided in tables 1-4, so as to detect a particular subset of hepatitis C genotypes or a particular subset of mutations.
Probes in a mixture will typically have similar bulk properties (such as homologous probe sequence length, homologous probe sequence Tm, length of the captured region of interest, and the lack of secondary structure) or fall in ranges of similar values. In some embodiments, the Tm of the homologous probe sequences in a mixture of probes will be within 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 °C of each other, or in particular embodiments have the same Tm. In some embodiments, the homologous probe sequences in a mixture of probes will all be within 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotide in length of each other. The length of the region of interest between the target sequences of probes in a mixture may vary over a range of values, such as from 2 to 20, 20 to 100, 20 to 200, 40 to 300, 100 to 300, 100 to 500, 80 to 500, or 100-180 nucleotides. In some embodiments, the length of the region of interest between the target sequences of probes in a mixture is from 100 to 489 nucleotides. In particular embodiments, the regions of interest are within 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 nucleotides in length of each other. Barcode lengths may also vary, but are generally within 25, 20, 15, 10, or 5 nucleotides of each other. In particular embodiments, the barcodes are the same length.
In some embodiments, mixtures provided by the invention comprise capture reaction products and amplification reaction products from different test samples, as further described below. Briefly, different capture reaction products and/or amplification reaction products can be combined and multiplexed before detection, i.e., for concurrent detection. This is accomplished using barcode sequences that identify the test samples. For example, capture reaction products from test sample A will include a sample A-specific barcode and capture reaction products from sample B will include a sample B-specific barcode. When capture reaction products from sample A and sample B are combined for sequencing, all sequences in the sample A capture reaction products are identified by the presence of the sample A-specific barcode sequence.
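The pooling-and-identification scheme above may be sketched as a simple demultiplexing step. The Python example below assumes, purely for illustration, that the sample barcode occupies the first bases of each read and that barcodes must match exactly; neither the read layout nor the barcode values come from the disclosure.

```python
from collections import defaultdict


def demultiplex(reads, sample_barcodes):
    """Group reads by the sample whose barcode starts the read.

    sample_barcodes maps barcode -> sample name; all barcodes must share
    one length. Matching reads are stored with the barcode trimmed off;
    reads with an unknown barcode are kept whole under "unassigned".
    """
    lengths = {len(bc) for bc in sample_barcodes}
    if len(lengths) != 1:
        raise ValueError("all sample barcodes must have the same length")
    k = lengths.pop()
    by_sample = defaultdict(list)
    for read in reads:
        sample = sample_barcodes.get(read[:k])
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[k:])
    return dict(by_sample)


# Hypothetical barcodes and reads.
sample_barcodes = {"ACGTAC": "sample A", "TGCATG": "sample B"}
reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTACTTTT", "GGGGGGAAAA"]
grouped = demultiplex(reads, sample_barcodes)
print(grouped)
```

Exact matching is the simplest choice; with barcodes chosen to be at least two substitutions apart, a read carrying a single sequencing error in its barcode falls into "unassigned" rather than being credited to the wrong sample.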
In certain embodiments, the mixtures of the invention contain sample internal calibration nucleic acids (SICs). In particular embodiments, known quantities of one or more SICs are included in a mixture provided by the invention. In particular embodiments, at least 1, 2, 3, 4, 5, 6, 7, 8, 10, 15, 20, 25, or 30 different SICs are included in the mixture. In particular embodiments, there are about 4 different SICs in a mixture. In some embodiments, the SICs have a nucleotide composition characteristic of pathogenic DNA targets and are present in specific molar quantities that allow for reconstruction of a calibration curve for quality control, e.g., for the processing and sequencing steps for each individual test sample. In certain embodiments, the SICs make up approximately 10% (molar quantity) of nucleic acids in a mixture, for example, 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20% (molar) of nucleic acids in the mixture. In particular embodiments, different SICs are present in different concentrations, for example, in a dilution series, over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 200, 500, 1000, 5000, 10000, 50000, or 100000-fold concentration range from the most dilute to the most concentrated SICs in 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, or 50 steps. In particular embodiments, SICs are present in a sample (e.g., a mixture of probes and a test sample, a capture reaction, a capture reaction product, an amplification reaction, or an amplification reaction product) at concentrations of 5, 25, 100, and 250 copies/ml. By detecting the predetermined concentration of the SICs, for example, by using probes directed to the SICs, the skilled artisan can estimate the concentration of an organism of interest such as a virus in a test sample. In certain embodiments, this is accomplished by correlating the frequency with which a captured sequence is detected to the volume of the sample from which the nucleic acids were obtained. Thus, an organism count per unit volume (e.g., copies/mL for liquid samples such as blood or urine) can be estimated for each organism detected.
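As a non-limiting sketch of how a calibration curve might be reconstructed from the SICs, the following fits a straight line to log-transformed SIC concentrations and read counts and inverts it to estimate a viral concentration. The SIC concentrations (5, 25, 100, and 250 copies/ml) are those given above; the read counts, the log-log linear model, and the function names are assumptions made for illustration.

```python
import math


def fit_loglog(concentrations, read_counts):
    """Least-squares fit of log10(reads) = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(r) for r in read_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(reads, slope, intercept):
    """Invert the fitted line to turn a read count into copies/ml."""
    return 10 ** ((math.log10(reads) - intercept) / slope)


sic_conc = [5, 25, 100, 250]       # copies/ml, as stated in the text
sic_reads = [50, 250, 1000, 2500]  # hypothetical read counts for each SIC
slope, intercept = fit_loglog(sic_conc, sic_reads)
viral_estimate = estimate_concentration(600, slope, intercept)
```

With these idealized counts the fitted slope is 1, and a viral sequence observed 600 times would be estimated at about 60 copies/ml. Real data would scatter around the line, and the fit quality itself could serve as a quality-control measure.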
In particular embodiments, the concentration of SICs and probes directed to the SICs are adjusted empirically so that sequences of SICs detected in a capture reaction product and/or amplification reaction product make up about 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 25, or 30% of sequences in the mixture. In particular embodiments, SICs make up 10-20% of sequence reads. In certain embodiments, the number of SIC sequence reads in a sequencing reaction is quantitatively evaluated to ensure that sample processing occurs within pre-defined parameters. In particular embodiments, the pre-defined parameters include one or more of the following: reproducibility within two standard deviations relative to all samples sequenced during a particular run; empirically determined criteria for reliable sequencing data (e.g., base calling reliability, error scores, percentage composition of total sequencing reads for each probe per target organism); and no greater than about 15% deviation of GC- or AU-rich SICs within a sequencing run. In embodiments in which patient samples are barcoded to allow pooling for multiplex sequencing, the SIC DNA in a sample will also comprise the same barcode(s) corresponding to unique samples, e.g., particular patient samples.
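The quality-control evaluation described above might, for example, be sketched as follows. The 10-20% window and the two-standard-deviation criterion come from the text; the read counts, the sample names, and the choice to compute the run mean and standard deviation over the SIC read fraction of every sample (including the sample being tested) are assumptions made for illustration.

```python
from statistics import mean, stdev


def sic_qc(run, low=0.10, high=0.20, n_sd=2.0):
    """Flag each sample by its SIC read fraction.

    run maps a sample name to a (sic_reads, total_reads) tuple.
    """
    fractions = {name: sic / total for name, (sic, total) in run.items()}
    mu = mean(fractions.values())
    sd = stdev(fractions.values())
    return {
        name: {
            "fraction": frac,
            # SIC reads should make up 10-20% of all reads.
            "in_range": low <= frac <= high,
            # Reproducibility relative to all samples in the run.
            "within_sd": abs(frac - mu) <= n_sd * sd,
        }
        for name, frac in fractions.items()
    }


# Hypothetical run: nine well-behaved samples and one outlier.
run = {f"sample_{i}": (150, 1000) for i in range(1, 10)}
run["sample_10"] = (400, 1000)
flags = sic_qc(run)
```

Because the outlier is included in the run statistics, this simple check loses sensitivity for very small runs; a robust estimator could be substituted where that matters.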
In more particular embodiments, SICs may comprise a region of interest as defined above, where the region of interest is modified to further comprise a sequence heterologous to the region of interest. In more particular embodiments, the sequence heterologous to the region of interest in the SICs is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40 contiguous bases, or more. By using SICs comprising a modified region of interest, a single probe can be used to detect both an organism of interest within a sample and the SICs, which provides internal controls for quantification and validation. Thus, SIC sequences and a region of interest from an organism of interest detected in a test sample can be differentiated by detecting the sequence heterologous to the region of interest, e.g., by sequencing or sequence-specific quantitative PCR.
In a preferred embodiment, the SIC contains the nucleotide sequence ACTCTAGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGCTTTAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTC and the probe mixture includes the probe with sequence 5'- /5Phos/GTG GTA TGG CTG ATT ATG ATC TAG AGT GTT GGA GGC TCA TCG TTC CTA TAT TCC TGA CTC CTC ATT GAT GAT TAC AGA TGT TAT GCT CGC AGG TCG AGT TTG GAC AAA CCA CAA CTA GAA -3'.
2.2 Samples
In some embodiments, the mixtures of the invention contain sample nucleic acids. The nucleic acids may be obtained from any test sample, such as a biological sample. The nucleic acids obtained from the test sample may be of varying degrees of purity, such as at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 85, 90, 95, 96, 97, 98, 99% of organic matter by weight. In particular embodiments, the sample nucleic acids are extracted from a test sample.
Test samples may be from any source and include swabs or extracts of any surface, or biological samples, such as patient samples.
Patients may be of any age, including adults, adolescents, and infants.
Biological samples from a subject or patient may include blood, whole cells, tissues, or organs, or biopsies comprising tissues originating from any of the three primordial germ layers: ectoderm, mesoderm, or endoderm. Exemplary cell or tissue sources include skin, heart, skeletal muscle, smooth muscle, kidney, liver, lungs, bone, pancreas, central nervous tissue, peripheral nervous tissue, circulatory tissue, lymphoid tissue, intestine, spleen, thyroid, connective tissue, or gonad. Test samples may be obtained and immediately assayed or, alternatively, processed by mixing, chemical treatment, fixation/preservation, freezing, or culturing. Biological samples from a subject include blood, pleural fluid, milk, colostrum, lymph, serum, plasma, urine, cerebrospinal fluid, synovial fluid, saliva, semen, tears, and feces. In particular embodiments, the biological sample is blood. Other samples include swabs, washes, lavages, discharges, or aspirates (such as nasal, oral, nasopharyngeal, oropharyngeal, esophageal, gastric, rectal, or vaginal swabs, washes, lavages, discharges, or aspirates), and combinations thereof, including combinations with any of the preceding biopsy materials.
3 Exemplary Methods of the Invention
In another aspect, the invention provides a method for detecting the presence of one or more hepatitis C viruses by contacting a sample suspected of containing at least one such virus with a mixture of probes of the invention, capturing a region of interest of the at least one virus (e.g., by polymerization and/or ligation) to form a circularized probe, and detecting the captured region of interest, thereby detecting the presence of the one or more hepatitis C viruses. In certain embodiments, the captured region of interest may be amplified to form a plurality of amplicons (e.g., by PCR). In particular embodiments, the sample is treated with nucleases to remove the linear nucleic acids after probe-circularizing capture of the region of interest. In some embodiments, the circularized probe is linearized, e.g., by nuclease treatment. In other embodiments, the circularized probe molecule is sequenced directly by any means known in the art, without amplification. In certain embodiments, the circularized probe is contacted by an oligonucleotide that primes polymerase-mediated extension of the molecules to generate sequences complementary to that of the circularized probe, including from at least one to as many as 1 million or more concatemerized copies of the original circular probe. In particular embodiments, the circularized probe molecule is enriched from the reaction solution by means of a secondary-capture oligonucleotide capture probe. A secondary-capture oligonucleotide capture probe may comprise a moiety designed to be captured, such as a biotin molecule, and a nucleic acid sequence designed to hybridize to at least 6 nucleotides of the circularized probe. The nucleic acid sequence designed to hybridize to at least 6 nucleotides of the circularized probe may include 1, 2, 4, 8, 16, 32 or more nucleotides of the polymerase-extended capture product. In certain embodiments, the probe and/or captured region of interest is sequenced by any means known in the art, such as polymerase-dependent sequencing (including dideoxy sequencing, pyrosequencing, and sequencing by synthesis) or ligase-based sequencing (e.g., polony sequencing). In particular embodiments, the sample is a biological sample. In more particular embodiments, the biological sample is from a mammal, such as a human.
In some embodiments, the methods of detecting the presence of one or more hepatitis viruses further comprise the step of formatting the results to facilitate physician decision making by, for example, providing one or more graphical displays.
Accordingly, in another aspect, the invention provides a method of treating a subject suspected of being infected with a hepatitis C virus, comprising detecting at least one hepatitis C virus by the methods of the invention and administering a suitable therapeutic treatment based on the at least one hepatitis virus detected.
The invention also provides a method of treating a subject suspected of being infected with an HCV strain carrying a drug resistance mutation, comprising detecting at least one HCV drug resistant genotype by the methods of the invention and administering a suitable therapeutic treatment based on the at least one HCV drug resistant genotype detected.
3.1 Reverse-Transcription
HCV RNA may be directly contacted by probes or may be converted to DNA to be used with the probes disclosed in this invention. Many techniques for converting RNA to DNA are available in the scientific literature. For example, random hexamer or octamer primers can be used with a reverse transcriptase to generate first-strand cDNA from the viral RNA. While simple, random priming also amplifies host (e.g., human) RNA that may be present in a clinical sample, thus limiting the amount of viral cDNA produced by the reaction. Figure 4, panels A and B, depicts cDNA generation approaches for embodiments of this invention. Figure 5 shows the process from RNA through analysis.
A preferred embodiment of this invention uses a set of HCV-specific RT primers in an RT-PCR reaction to generate and amplify DNA from the viral RNA molecules. Similar to probes, good RT primers hybridize to relatively conserved portions of the HCV genome.
In many parts of the world, HCV genotypes 1a and 1b are the dominant strains. Table 6 shows a set of RT primers that amplify the NS3, NS5A, and NS5B genes from genotypes 1a and 1b. These primers can be used with the Qiagen OneStep RT-PCR kit to produce cDNA from RNA that has been extracted from HCV blood samples.
In another preferred embodiment, a set of RT-PCR primers that target all known HCV genotypes may be used. Table 7 lists a set of 75 15mers that achieve this goal for the NS3 gene. By using shorter primers, a given primer sequence is more likely to fit into a small island of high conservation, thus reducing the number of primers needed to target all known HCV subtypes. This primer set works at a lower temperature (hybridization temperature of 30 °C).
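The observation that shorter primers more readily fit into small islands of conservation can be illustrated with the following sketch, which lists the start positions at which every sequence in an alignment shares an identical window of a given length. The toy alignment and the function name are hypothetical; an actual design would operate on aligned HCV genomes and would typically tolerate some mismatches.

```python
def conserved_windows(aligned_seqs, k):
    """Return start positions where all aligned sequences share an identical k-mer."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    # A column is conserved when every sequence has the same base there.
    conserved = [len(set(column)) == 1 for column in zip(*aligned_seqs)]
    starts = []
    run = 0
    for i, ok in enumerate(conserved):
        run = run + 1 if ok else 0
        if run >= k:
            starts.append(i - k + 1)
    return starts


# Hypothetical toy alignment with variable positions at columns 7 and 14.
alignment = [
    "ACGTACGTTTGACCA",
    "ACGTACGATTGACCA",
    "ACGTACGTTTGACCT",
]
sites_5mer = conserved_windows(alignment, 5)
sites_7mer = conserved_windows(alignment, 7)
sites_8mer = conserved_windows(alignment, 8)
```

In this toy case five candidate sites exist for a 5-mer, one for a 7-mer, and none for an 8-mer, which mirrors the trade-off between primer length and the number of primers needed to cover divergent subtypes.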
3.2 Probe capture and detection
The invention provides methods of detecting the presence of one or more HCV strains in a test sample. In certain embodiments, the methods comprise the step of contacting a mixture comprising probes described above with any of the test samples described above in a capture reaction, as defined above. In particular embodiments, a mixture comprising probes is contacted with nucleic acids extracted from a test sample such as blood, along with a polymerase enzyme and nucleotide triphosphates (NTPs), and capturing at least one region of interest by polymerase-dependent extension of at least one homologous probe sequence in the mixture. In particular embodiments, the polymerase-dependent extension of a homologous probe sequence is followed by a ligation of the end of the extended (i.e., by the polymerase) homologous probe sequence to the end of the other homologous probe sequence to produce a circularized probe containing a region of interest from the genome of an HCV strain. In some embodiments, the ligation reaction occurs while the target arm is hybridized to the target. In other embodiments, the target arm is dissociated from the target and ligated in solution under reaction conditions favoring self-ligation over trans-ligation to other probe molecules, for example a dilute ligation solution.
Figure 2 illustrates one particular embodiment of a method provided by the invention. Briefly, hybridization of a probe to the target sequences in the organism of interest is followed by polymerase-mediated, target-sequence-directed addition of nucleotides to the 3' homologous probe sequence, terminating due to obstruction at the 5' homologous probe sequence of the probe. A ligation reaction joins the terminal 3' nucleotide to the 5' nucleotide of the 5' arm.
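The extension-and-ligation capture summarized above can be modeled in silico as in the following sketch, offered under stated assumptions. The target and arm sequences are hypothetical, exact matching stands in for hybridization, and the function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def capture_region(target, arm5, arm3):
    """Return the gap-fill sequence added to the probe's 3' arm, or None.

    target is written 5'->3'. The 3' arm binds the downstream target site,
    the 5' arm binds the upstream site, and the polymerase copies the
    reverse complement of the target between them.
    """
    site5 = revcomp(arm5)  # target site bound by the 5' arm
    site3 = revcomp(arm3)  # target site bound by the 3' arm
    start5 = target.find(site5)
    start3 = target.find(site3)
    if start5 < 0 or start3 < 0:
        return None  # one of the arms does not bind
    gap_start = start5 + len(site5)
    if start3 < gap_start:
        return None  # arms bind in the wrong order for circularization
    return revcomp(target[gap_start:start3])


# Hypothetical target: upstream site, region of interest, downstream site.
target = "GATTACA" + "CCGGTT" + "TGCATCC"
arm5 = revcomp("GATTACA")
arm3 = revcomp("TGCATCC")
fill = capture_region(target, arm5, arm3)
```

Here the six-base region of interest CCGGTT is captured as its reverse complement, AACCGG, between the two arms; ligation of the extended 3' end to the 5' arm would then close the circle.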
The sample may be treated with exonuclease to digest single-stranded linear DNA. Primers complementary to the probe backbone may amplify the MIP into dsDNA for sequencing. For multiplexing of sample reaction products or amplification reaction products, amplification primers at this stage may contain sample-specific nucleotide barcode sequences, e.g., they may be adaptamer primers. A unique primer:barcode molecule sequence therefore may identify each test sample. For example, a panel of 100 probes is contacted with 50 individual test samples. The homologous probe sequences detected in a sequence read identify a strain of hepatitis C or a drug resistance genotype of a strain of HCV. Each test sample amplification reaction is done with one unique probe set. Each barcode within an amplification primer can act as an identifier for a patient. Therefore, 50 pairs of amplification primers (one for each amplification reaction product) and one panel of probes (e.g., probes for hepatitis A, B, and C distinction, for HCV genotyping, or both) are required for a 50-sample multiplex assay.
Polymerases for use in the methods provided by the invention include Taq polymerase (Lawyer et al., J. Biol. Chem., 264:6427-6437 (1989); Genbank accession: P19821), including the 5'->3' nuclease deficient "Stoffel" fragment (described in Lawyer et al., PCR Meth. Appl., 2:275-287 (1993)), PHUSION™ high fidelity recombinant polymerase (NEB), and Pyrococcus furiosus (Pfu) polymerase (see, e.g., U.S. Patent No. 5,545,552), as well as polymerases comprising a helix-hairpin-helix domain, such as TopoTaq and PfuC2 (Pavlov et al., PNAS, 99:13510-15 (2002)). In more particular embodiments, the polymerase is 5'->3' nuclease deficient, such as the Stoffel fragment of Taq polymerase, which further lacks 3'->5' proofreading activity.
Polymerases lacking 5'->3' exonuclease activity may be generated by means known in the art, for example, based on methods of screening or rational design. For example, polymerase variants can be designed based on sequence alignments of one or more polymerases to the Stoffel fragment of Taq and/or by "threading" a sequence through a solved polymerase structure (e.g., MMDB IDs 56530, 81884 and 81885).
In certain embodiments, a polymerase for use in the methods of the invention is a non-displacing polymerase, such as Pfu, T4 DNA polymerase, or T7 DNA polymerase. In other embodiments, a polymerase for use in the methods provided by the invention is a polymerase suitable for isothermal amplification, and capture and/or amplification reactions are performed isothermally, e.g., by controlling metal ion concentration and/or using particular polymerases and/or additional enzymes, such as helicases or nicking enzymes (such as primer generation RCA and EXPAR). See, e.g., U.S. Patent No. 6,566,103, Murakami et al., Nucl. Acid. Res., 37(3):e19 (2009), Tan et al., Biochemistry, 47:9987-99 (2008), Vincent et al., EMBO Rep., 5(8):795-800 (2004). Polymerases for use in isothermal amplification include, for example, Bst, Bsu and phi29 DNA polymerases, and E. coli DNA polymerase I.
In other embodiments, a mixture of probes is contacted with nucleic acids extracted from a test sample, a ligase enzyme, and a pool of n-mer oligonucleotides in a capture reaction, as defined above. In particular embodiments, the n-mer oligonucleotides are at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24 or 25 nucleotides long. In more particular embodiments, they are random hexamers. In other embodiments, they are polynucleotides having the length of the region of interest between the first and second target sequences that hybridize to the homologous probe sequences. In some embodiments, the n-mer oligonucleotide contains 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 locked nucleic acids (LNAs) or 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% LNAs.
The ligase enzyme ligates the n-mer oligonucleotides with the probes provided by the invention to produce a circularized probe containing a region of interest from HCV. Primers complementary to the probe backbone amplify the probe into dsDNA for sequencing. In some embodiments, e.g., for multiplexing, amplification primers are adaptamer primers and contain sample-identifying barcode sequences. A unique barcode sequence therefore identifies each sample in a multiplex. Each strain of HCV is identified by the unique combination of homologous probe sequences and ligated n-mer in a sequence read.
Ligases for use in the methods of the invention include T4, T7, and thermostable ligases, such as Taq ligase (as disclosed in Takahashi et al., J. Biol. Chem., 259:10041-47 (1984), and international publication WO 91/17239), and AMPLIGASE™.
In certain other embodiments, mixtures comprising pairs of conventional PCR primers (conventional primer pairs) provided by the invention are contacted with sample nucleic acids to amplify a region of interest between two target regions in HCV. In certain embodiments, a limited number of amplification steps are performed. In particular embodiments, fewer than 25, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, or 2 cycles of amplification are performed. In particular embodiments, the mixture of conventional primer pairs is contacted with nucleic acids extracted from a test sample, a polymerase, and nucleotide triphosphates to amplify the region of interest. Multiple combinations of conventional primer pairs may be used to multiplex reactions within the same sample tube, or separately for pooling. In some embodiments, primers binding to a universal probe recognition sequence (e.g., a barcode) in the conventional primer pairs introduce nucleotide barcodes and recognition sites for next-generation DNA sequencing technology primers.
As part of the present invention, conventional primer pairs can be used in a variety of additional methods. For example, in some embodiments, conventional primer pairs may be contacted with a sample nucleic acid suspected of containing at least one target nucleic acid. In particular embodiments, PCR may be used to amplify the region of interest directly from a sample nucleic acid. In other embodiments, the conventional primer pairs may be used to amplify capture reaction products, e.g., one or more circularized probes. In other embodiments, a sample nucleic acid suspected of containing a region of interest is amplified using a conventional primer pair and then contacted with a probe provided by the invention for circularizing capture. In some embodiments, conventional primer pairs are contacted with a sample nucleic acid and modified nucleotides, such as biotinylated nucleotides. In some embodiments using modified nucleotides, such as biotinylated nucleotides, the resulting capture or amplification reaction products can then be isolated by affinity capture, for example, with streptavidin substrates, for subsequent processing, e.g., circularizing capture with the probes provided by the invention. In further embodiments, a single conventional primer may be used for linear amplification of a region of interest in a sample nucleic acid, and the product then contacted with a probe provided by the invention for circularizing capture. In other embodiments, a single conventional primer containing a 5' biotin moiety may be used to amplify a target sequence, which may then be enriched from the sample using streptavidin capture for sequencing by, for example, direct sequencing using either specific conventional primer pairs provided by the invention or random hexamer priming, or may be used for circularizing capture using probes provided by the invention.
In certain embodiments, methods that comprise a capture reaction further comprise the step of contacting the capture reaction product with one or more exonucleases to remove linear nucleic acids. In particular embodiments, the exonuclease includes at least one of exo I, exo III, exo VII, and exo V. In more particular embodiments, the exonuclease is up to a 100:1, 50:1, 25:1, 10:1, 5:1, 2:1, 1:1, 1:2, 1:5, 1:10, 1:25, 1:50, or 1:100 (unit to unit) mixture of exonuclease I and exonuclease III.
In certain embodiments, the methods of the invention further comprise the step of amplifying capture reaction products in an amplification reaction. Numerous methods of amplifying nucleic acids are known in the art and include the polymerase chain reaction (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202 and McPherson and Moller, PCR (The Basics), Taylor & Francis; 2nd edition (March 30, 2006)), OLA (oligonucleotide ligation amplification) (see, e.g., U.S. Patent Nos. 5,185,243, 5,679,524, and 5,573,907), rolling-circle amplification ("RCA," described in Baner et al., Nuc. Acids Res., 26:5073-78 (1998); Barany, PNAS, 88:189-93 (1991); and Lizardi et al., Nat. Genet. 19:225-32 (1998)), and strand displacement amplification (SDA; described in U.S. Patent Nos. 5,455,166 and 5,130,238). In particular embodiments, the amplification is linear amplification, such as RCA. In more particular embodiments, capture reaction products (e.g., circularized probes) are used as templates in an RCA to generate long, linear repeating ssDNA products. In some embodiments, the RCA reaction may comprise contacting a sample with modified nucleotides, such as biotinylated nucleotides, LNA nucleotides or artificial base pairs such as IsodC or IsodG, or abasic furans (such as dSpacer), to facilitate affinity enrichment and purification. In certain embodiments, the amplification reaction products comprising linear repeating ssDNA can be contacted with a conventional primer provided by the invention to produce short extensions of double-stranded DNA with a length of 2, 3, 4, 5, 6, 7, 10, 15, 20, 30, 40, 50, 75, 100, or 500 nucleotides. In certain embodiments, the length of extension may be controlled by the time of the extension step at the optimum temperature of elongation for the polymerase, e.g., 5, 10, 15, 20, 40, 60 seconds, at temperatures including 37, 42, 45, 68, 72, 74 °C. In other embodiments, the length of extension is controlled by mixing nucleotide analogues that prevent further elongation into the reaction, such as dideoxyCytosine, or nucleotides with a 3' modification such as biotin, or a carbon spacer terminated with an amino group. In additional particular embodiments, a primer is contacted with a linear repeating ssDNA RCA amplification reaction product and extended by a polymerase for a single cycle of PCR, to generate a short single-stranded DNA containing the complementary sequence to the repeating unit of the RCA product. In more particular embodiments, the primer contacted with a linear repeating ssDNA RCA amplification reaction product produces a dsDNA region comprising a restriction enzyme cleavage site. Accordingly, in certain embodiments, when the primer hybridizes to the linear repeating ssDNA RCA amplification reaction product to form a double-stranded DNA region, the amplification reaction product is contacted with the restriction enzyme to produce shorter fragments. In particular embodiments, the amplification reaction uses adaptamer primers. In some embodiments, the amplification reaction uses sample-specific primers, that is, primers that hybridize to sequences present in the probe that identify the sample. In particular embodiments, a low number of amplification cycles are used to avoid amplification artifacts, e.g., fewer than 25, 20, 15, 10, 9, 8, 7, 6, or 5 cycles.
In certain embodiments, the methods provided by the invention may comprise the step of contacting sample nucleic acids, capture reaction products or amplification reaction products with a secondary-capture oligonucleotide capture probe which comprises a moiety designed to be captured, such as a biotin molecule, and a nucleic acid sequence which is able to hybridize to the sample nucleic acids, capture reaction products, or amplification reaction products. Such an oligonucleotide, such as a biotinylated oligonucleotide, may be used to enrich its target nucleic acids using affinity purification. In some embodiments, a biotinylated oligonucleotide may specifically hybridize to a captured sequence (i.e., it is complementary to a region of interest), a homologous probe sequence, or a backbone sequence, such as a barcode sequence. In certain embodiments, a biotinylated probe may be extended on sample nucleic acids, capture reaction products or amplification reaction products using thermophilic or mesophilic polymerases. In more particular embodiments, the method comprises contacting a capture reaction product with a biotinylated oligonucleotide for enrichment of specific capture reaction products using the biotin:streptavidin interaction.
Sequences captured by the methods of the invention can be detected by any means, including, for example, array hybridization or direct sequencing. In some embodiments, captured sequences may be detected by sequencing without amplification. Numerous sequencing methods are known in the art, can be used in the method of the invention, and are reviewed in, e.g., U.S. Patent No. 6,946,249 and Metzker, Nat. Reviews, Genetics, 11:31-46 (2010); Ansorge, Nat. Biotechnol., 25(4):195-203 (2009), Shendure and Ji, Nat. Biotechnol., 26(10):1135-45 (2008), Shendure et al., Nat. Rev. Genet. 5:335-44 (2004). In some embodiments, the sequencing methods rely on the specificity of either a DNA polymerase or DNA ligase and include, e.g., pyrosequencing, base extension sequencing (single base stepwise extensions), multi-base sequencing by synthesis (including, e.g., sequencing with terminally-labeled nucleotides) and wobble sequencing, which is ligation-based. Extension sequencing is disclosed in, e.g., U.S. Patent No. 5,302,509. Exemplary embodiments of terminal-phosphate-labeled nucleotides and methods of using them are described in, e.g., U.S. Patent No. 7,361,466; U.S. Patent Publication No. 2007/0141598, published Jun. 21, 2007; and Eid et al., Science 323:133-138 (2009). Ligase-based sequencing methods are disclosed in, for example, U.S. Patent No. 5,750,341, PCT publication WO 06/073504, and Shendure et al., Science 309:1728-1732 (2005). In particular embodiments, sequencing technology used in the methods provided by the invention includes Sanger sequencing, microelectrophoretic sequencing, nanopore sequencing, sequencing by hybridization (e.g., array-based sequencing), real-time observation of single molecules, and cyclic-array sequencing, including pyrosequencing (e.g., 454 SEQUENCING®, see, e.g., Margulies et al., Nature, 437:376-380 (2005)), ILLUMINA® or SOLEXA® sequencing (see, e.g., Turcatti et al., Nucleic Acids Res., 36, e25 (2008), see also U.S. Patent Nos. 7,598,035, 7,282,370, 7,232,656, and 7,115,400), polony sequencing (e.g., SOLiD™, see Shendure et al. 2005), and sequencing by synthesis (e.g., HELICOS®, see, e.g., Harris et al., Science, 320:106-109 (2008)).
In certain embodiments, the capture probes contain sequences that facilitate processing for sequencing by a certain sequencing technology, such as sequences that can serve as anchor sites for sequencing by synthesis, primer sites for sequencing reaction initiation, or restriction enzyme sites that allow cleavage for improved ligation of oligonucleotide adaptors for sequencing of the particular amplicon. In some embodiments, circularized capture probes are contacted by oligonucleotides which prime polymerase-mediated extension of the capture probes to generate sequences complementary to that of the circularized probe, including from at least one to one million or more concatemerized copies of the original circular probe.
The mixtures and methods provided by the invention can be readily adapted to use with any suitable detection means, including, but not limited to, those listed above. In certain embodiments using ILLUMINA® or SOLEXA® sequencing, shorter homologous probe sequences may be used in the probes provided by the invention, as well as conventional primer pairs. In more particular embodiments, the homologous probe sequences will be about 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 bases. In more particular embodiments, the region of interest between the target sequences of a probe or conventional primer pair is about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, or 50 bases. In still more particular embodiments, the probes provided by the invention may be circularized by polymerase-dependent synthesis and ligation, or by ligation of n-mer oligonucleotides of about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, or 50 bases. In yet more particular embodiments, the region of interest is about 7 bases and homologous probe sequences are 10 or 12 bases. In further embodiments, a 7-mer oligonucleotide comprising a locked nucleic acid is ligated to a probe provided by the invention, and in still more particular embodiments, the 7-mer oligonucleotide comprises at least 1, 2, 3, 4, 5, 6, or 7 locked nucleic acids (LNAs). In other embodiments, capture or amplification reaction products may be sequenced by emulsion droplet sequencing by synthesis as disclosed in, for example, Binladen et al., PLoS One. 2(2):e197 (2007). In certain embodiments, capture products may be amplified by RCA to generate higher copy numbers of capture product within a single DNA molecule in order to facilitate emulsion of captured DNA for emulsion PCR and sequencing by synthesis. See, e.g., Drmanac et al., Science 327(5961):78-81 (2010).
In particular embodiments, capture reaction products and/or amplification reaction products containing different samples are combined before detection. In particular embodiments, capture and/or amplification reaction products are combinatorially pooled before detection, e.g., an MxN array of individual capture reaction products and/or amplification reaction products is pooled by row and column, and the pools are detected. Results from row and column pools can then be deconvolved to provide results for individual samples. Higher dimensional arrays and pools may be used analogously. In other embodiments, capture reaction products and/or amplification reaction products contain identifying barcode sequences. In particular embodiments, amplification primers contain sample-specific barcode sequences. Accordingly, the sample source of sequences contained in pools of capture reaction products and/or amplification reaction products is identified by their barcode sequences.
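The row-and-column pooling and deconvolution described above may be sketched as follows, purely for illustration. The grid contents, the treatment of each pool as a simple positive or negative result, and the function names are assumptions; they are not part of the disclosure.

```python
def pool_by_row_and_column(grid):
    """grid[r][c] is True if the sample at row r, column c contains the target."""
    row_pools = [any(row) for row in grid]
    col_pools = [any(col) for col in zip(*grid)]
    return row_pools, col_pools


def deconvolve(row_pools, col_pools):
    """Candidate positives: positions whose row pool and column pool are both positive."""
    return [
        (r, c)
        for r, row_hit in enumerate(row_pools) if row_hit
        for c, col_hit in enumerate(col_pools) if col_hit
    ]


# Hypothetical 3x4 array with one positive sample at row 1, column 2.
single = [[False] * 4 for _ in range(3)]
single[1][2] = True
single_calls = deconvolve(*pool_by_row_and_column(single))

# Two positives on a diagonal produce two additional ambiguous candidates.
double = [[False] * 4 for _ in range(3)]
double[0][0] = True
double[1][1] = True
double_calls = deconvolve(*pool_by_row_and_column(double))
```

An MxN array thus requires only M+N detections instead of MxN. A single positive is resolved exactly, whereas multiple positives can yield candidate positions that need retesting or a higher-dimensional pooling scheme to resolve.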
The methods provided by the invention may also include directly detecting a particular nucleic acid in a capture reaction product or amplification reaction product, such as a particular target amplicon or set of amplicons. Accordingly, in some embodiments, the mixtures of the invention comprise specialized probe sets including TAQMAN™, which uses a hydrolyzable probe containing detectable reporter and quencher moieties, which are released by a DNA polymerase with 5'->3' exonuclease activity (U.S. Pat. No. 5,538,848); molecular beacon, which uses a hairpin probe with reporter and quenching moieties at opposite termini (U.S. Patent No. 5,925,517); fluorescence resonance energy transfer (FRET) primers, which use a pair of adjacent primers with fluorescent donor and acceptor moieties, respectively (U.S. Patent No. 6,174,670); and LIGHTUP™, a single short probe which fluoresces only when bound to the target (U.S. Patent No. 6,329,144). Similarly, SCORPION™ (U.S. Patent No. 6,326,145) and SIMPLEPROBES™ (U.S. Patent No. 6,635,427) use single reporter/dye probes. Amplicon-detecting probes are designed according to the particular detection modality used, and as discussed in the above-referenced patents. In particular embodiments, a quantitative, real-time PCR assay to detect a particular capture reaction product or amplification reaction product may be performed on the ILLUMINA® ECO Real-time PCR system™.
In particular embodiments, the methods of the invention comprise using sample internal calibration nucleic acids (SICs) to estimate the concentration of a hepatitis strain in a test sample. This is done by calibrating the frequency of a sequence from a hepatitis strain to the known concentration of the SICs to provide an estimated concentration of the viral strain in the test sample. In more particular embodiments, the estimated concentration of the viral strain is compared to a database of reference concentrations of hepatitis strains associated with a disease state and/or likely clinical diagnoses.
In some embodiments, the methods of the invention further comprise steps of formatting results to inform physician decision making. "Results" refers to the outcome of detecting a target organism and includes, e.g., binary (e.g., +/-) detection as well as estimates of concentration, and may be based on, inter alia, the result of sequencing a capture reaction product or amplification reaction product. In particular embodiments, the formatting comprises presenting an estimate of the concentration of an organism in a test sample, optionally including statistical confidence intervals. In more particular embodiments, the formatting further comprises color-coding of the results. In certain embodiments, the formatting includes recommendations for therapeutic intervention, including, for example, hospitalization, probiotic treatment, antibiotic treatments, and chemotherapy. In some embodiments, the formatting comprises one or more of the following: references to peer-reviewed medical literature and database statistics of empirically defined sample results.
In a preferred embodiment, Phusion polymerase is used to copy the reverse-complement of the target sequence into the probe and Ampligase ligase is used to circularize the resulting molecule as shown in Protocol 1. The resulting circular molecules can be amplified using adapter-primers shown in Table 8 to prepare the material for sequencing on either the Ion Torrent or Illumina platforms.
In another embodiment, the probes are applied directly to extracted RNA without generating a cDNA intermediate. This embodiment requires a DNA polymerase capable of using an RNA template (e.g., a standard reverse-transcriptase such as Tth, Superscript, or MMLV reverse transcriptase) and a ligase capable of ligating single-stranded DNA bound to the RNA template. This embodiment reduces the cost, time-to-result, and complexity of the protocol but currently works with lower efficiency.
Figure 5 shows the distribution of probes from tables 1-4 across a reference HCV genome. In this embodiment, the probes target 661 HCV strains from many genotypes. The number of probes in each region indicates the sequence diversity in that region.
3.3 Sequence analysis
Conversion of raw sequence data may occur in three stages, namely (1) the processing of raw instrument data and conversion into aligned sequencing reads, (2) statistical interpretation of read data and (3) providing output and storage in archives.
In some embodiments, statistical analysis and interpretation determine the most likely strains or substrains present in the sample given the sequencing data.
In a particular embodiment, each sequencing read is first compared to the set of probe arms used in the capture reaction using an algorithm similar to Needleman-Wunsch but with no terminal gap penalty in the probe arm. The software retains the probe arm with the best score. Having identified the probe arm and therefore determined which probe generated the sequencing read, the software then compares the sequencing read against all expected reads for that probe, where expected reads were generated by an in-silico application of the probe set to a set of full or partial HCV genomes. All matches of a probe to a genome that meet some minimum criteria are included in the set of expected reads. Having compared all reads to all expected reads, the software picks the most likely strain or strains present in the sample based on the alignment scores, a model of mutation probabilities, and a user-provided prior probability on the number of strains to expect.
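The probe-arm identification step can be sketched as follows. This is an illustrative reading of "similar to Needleman-Wunsch but with no terminal gap penalty in the probe arm": the whole arm is aligned to the start of the read, and read bases beyond the end of the arm are not penalized. The scoring values and the names `arm_alignment_score` and `best_probe_arm` are assumptions, not values from the specification.

```python
def arm_alignment_score(arm, read, match=1, mismatch=-1, gap=-2):
    """Best score for aligning all of `arm` against a prefix of `read`.

    Rows follow the arm and columns follow the read. Leading gaps are
    penalized as in Needleman-Wunsch, but the final score is the maximum
    over the last row, so read bases after the arm cost nothing.
    """
    prev = [j * gap for j in range(len(read) + 1)]
    for i in range(1, len(arm) + 1):
        curr = [i * gap] + [0] * len(read)
        for j in range(1, len(read) + 1):
            step = match if arm[i - 1] == read[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + step,  # align arm base to read base
                          prev[j] + gap,       # arm base against a gap
                          curr[j - 1] + gap)   # read base against a gap
        prev = curr
    return max(prev)


def best_probe_arm(read, arms):
    """Return the name of the arm in `arms` (name -> sequence) with the best score."""
    return max(arms, key=lambda name: arm_alignment_score(arms[name], read))


# Hypothetical arms and read, for illustration only.
arms = {"probe_A": "ACGTAC", "probe_B": "TTGGCA"}
read = "ACGTACGGATTCCA"
print(best_probe_arm(read, arms))
```

A production implementation would also need a minimum-score threshold so that reads matching no arm are rejected rather than assigned to the least bad candidate.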
In some embodiments, the methods of analysis determine the relative proportions or abundances of strains via a "Mixture Model." In some embodiments, the hidden variables in the model are the proportions or abundances of the strains and the assignments of sequencing reads to expected reads (where each observed read is assigned to a single expected read). A variety of methods, including Expectation-Maximization, Gibbs Sampling, and Metropolis-Hastings, may be used to find the values of these hidden variables that maximize the probability of the data given the hidden variables and the priors on the hidden variables.
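As one illustration of the Expectation-Maximization option, the sketch below estimates strain proportions from a precomputed table of read likelihoods. It is deliberately simplified: reads are assigned to strains rather than to individual expected reads, no priors are used, and the likelihood table is assumed to be supplied by an upstream alignment step. The function name `em_strain_proportions` is hypothetical.

```python
def em_strain_proportions(likelihoods, n_iter=200):
    """Estimate mixture proportions by Expectation-Maximization.

    likelihoods[r][s] is the probability of read r given strain s. Every
    read is assumed to have a nonzero likelihood under at least one strain.
    """
    n_reads = len(likelihoods)
    n_strains = len(likelihoods[0])
    proportions = [1.0 / n_strains] * n_strains  # uniform starting point
    for _ in range(n_iter):
        totals = [0.0] * n_strains
        for row in likelihoods:
            # E-step: posterior probability that this read came from each strain
            weighted = [p * l for p, l in zip(proportions, row)]
            norm = sum(weighted)
            for s, w in enumerate(weighted):
                totals[s] += w / norm
        # M-step: new proportions are the average posterior assignments
        proportions = [t / n_reads for t in totals]
    return proportions


# Hypothetical likelihoods: three reads explained only by strain 0,
# one read explained only by strain 1.
example = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(em_strain_proportions(example))
```

Gibbs Sampling or Metropolis-Hastings would replace the deterministic E-step and M-step with sampling from the same posterior.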
In some embodiments, the software compares each read against both probe arms for each probe. The software performs two alignments for each read-probe pair, first aligning the first probe arm with no terminal gap penalty for the probe arm and then aligning the other probe arm with no initial gap penalty for that probe arm. The section of the read between the two probe arm alignments is the copied part of the target region that the probe captured from the target nucleic acid.
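The relationship between the two probe arms and the captured region can be sketched as below. For brevity this sketch substitutes exact string matching for the two gapped alignments described above, so it only handles reads whose arms contain no sequencing errors; the function name `extract_capture_region` and the sequences are hypothetical.

```python
def extract_capture_region(read, first_arm, second_arm):
    """Return the bases between the two probe arms, or None if not found.

    Simplification: the first arm must match the start of the read exactly
    and the second arm must match the end exactly, in place of alignments.
    """
    if len(read) < len(first_arm) + len(second_arm):
        return None
    if not read.startswith(first_arm) or not read.endswith(second_arm):
        return None
    return read[len(first_arm):len(read) - len(second_arm)]


# Hypothetical read: first arm + captured target + second arm.
read = "ACGTAC" + "GGATTCCA" + "TTGGCA"
print(extract_capture_region(read, "ACGTAC", "TTGGCA"))
```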
To determine the set of mutations present in a sample, the software analyzes subsets of the data, where each subset contains only the capture regions of sequencing reads that overlap a mutation of interest. Tools well known in the art such as FreeBayes, SAMTools, and ShoRAH can be used to estimate the frequency of each allele based on the sequencing data.
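For intuition only, a naive allele-frequency estimate can be obtained by counting bases at one position across capture regions. This sketch is not how FreeBayes, SAMTools, or ShoRAH work internally; those tools model base qualities and sequencing error. It assumes the capture regions are already in register, so a fixed offset refers to the same genomic position in every read. The function name `allele_frequencies` is hypothetical.

```python
from collections import Counter


def allele_frequencies(capture_regions, offset):
    """Fraction of reads carrying each base at `offset`.

    Reads too short to cover the position are skipped.
    """
    counts = Counter(r[offset] for r in capture_regions if len(r) > offset)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical capture regions covering one position of interest (offset 2).
regions = ["ACGT", "ACGT", "ACTT", "ACGT"]
print(allele_frequencies(regions, 2))
```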
Output of results can occur in parallel (1) to a company server, (2) to xml and HL7 formats, e.g., for deposit in a hospital system, in an electronic medical record (EMR) system, or in other HL7 or xml capable storage systems, for use in existing health record frameworks, and/or (3) to physician-friendly graphical and text formats, e.g., graphs, tables, summary text, and possibly annotated web formats linking to reference information. Output formats are arbitrary, e.g., simple text, spreadsheet data, binary data objects, encrypted and/or compressed files. A complete record may involve all or some of these linked to a diagnostic test via unique identifiers. They may be assembled into a coherent object or may be accessible via a search for the unique identifier.
EXAMPLES
Protocols
The following protocols were used in the examples described below and can be used by the skilled artisan when practicing the methods provided by the invention or using the probes, mixtures, and compositions provided by the invention. Variations on these protocols will be readily apparent to the skilled artisan.
Protocol 1: MIP capture, HCV cDNA target capture
• Prepare the reaction solution:
• <12.5 μL input DNA
• 1.5 μL 10x buffer
• 1 μL probe mix
• Nuclease-free water to 15 μL
• Begin the MIP program on the thermocycler
• 94°, 10 min
• Ramp to 60°, 0.17sec
• 60°, 10 min
• 60° hold
• 60°, 10 min
• 15° hold
• 94° for 2 minutes
• 37° hold
• 37° for 30 minutes
• 94° for 15 minutes
• 4° hold
• While the hybridization is running, prepare the enzyme mix on ice:
• 5 μL (polymerase)
• 5 μL (buffer)
• 1 μL (ligase)
• 1.25 μL dNTPs
• 37.75 μL HOH
• When the thermocycler reaches the 60° hold (approximately 26 minutes), add 2 μL of enzyme mix to each sample and then advance the thermocycler to the next step (60° for 10 min).
• When the thermocycler reaches the 15° hold, advance the thermocycler to the next step (94° for 2 min) and prepare the exonuclease mix:
• 7 μL of Exo I
• 7 μL of Exo III
• When the thermocycler reaches the 37° hold, add 1 μL of exonuclease mix to each sample and then advance the thermocycler to the next step (37° for 30 min).
• On ice, prepare the amplification mix.
• 10 μL of Phusion Master Mix
• 1 μL 10 μM barcoding primer
• 1 μL 10 μM IonAmpF
• 6 μL HOH
• When the thermocycler reaches the 4° hold, remove 2 μL from each reaction and add to the appropriate amplification mix.
• Begin the amplification program on the thermocycler
• 94° for 3 minutes
• 30 cycles of:
• 94° for 15 seconds
• 60° for 15 seconds
• 72° for 30 seconds
• 72° for 4 minutes
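The enzyme mix volumes in Protocol 1 can be scaled to a given number of samples with simple arithmetic, sketched below. The sketch assumes all volumes are in microliters, reads the fourth enzyme mix component as 1.25 μL dNTPs, and adds a hypothetical pipetting overage parameter that is not part of the protocol. The names `ENZYME_MIX_UL` and `scale_mix` are illustrative.

```python
# Enzyme mix from Protocol 1, in microliters (component names are labels only).
ENZYME_MIX_UL = {
    "polymerase": 5.0,
    "buffer": 5.0,
    "ligase": 1.0,
    "dNTPs": 1.25,
    "water": 37.75,
}
ENZYME_MIX_PER_SAMPLE_UL = 2.0  # volume added to each sample at the 60° hold


def scale_mix(recipe, n_samples, per_sample_ul, overage=0.1):
    """Scale every component so the batch covers n_samples plus overage.

    `overage` is a hypothetical allowance for pipetting loss (0.1 = 10%).
    """
    batch_total = sum(recipe.values())
    needed = n_samples * per_sample_ul * (1 + overage)
    factor = needed / batch_total
    return {name: round(vol * factor, 3) for name, vol in recipe.items()}


# One batch as written totals 50 uL, enough for 25 samples with no overage.
print(sum(ENZYME_MIX_UL.values()) / ENZYME_MIX_PER_SAMPLE_UL)
print(scale_mix(ENZYME_MIX_UL, 8, ENZYME_MIX_PER_SAMPLE_UL))
```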
After amplification, purify the products. Gel matrix purification or Ampure enrichment should enrich a product sized between 180 and 250 bases, excluding both primer dimers (~70-90 bases) and self-ligated probes (~160 bases).
Example 1: Capture and Analysis of 28 Clinical Samples
Blinded clinical samples from DAA-naïve HCV-1a (n=18) and HCV-1b (n=10) infected patients underwent HCV RNA extraction followed by RT-PCR using the primers in table 6. All HCV clinical samples had a viral load >200,000 copies/mL (Roche COBAS AmpliPrep TaqMan HCV RNA PCR Test). HCV genotype was confirmed by Versant HCV Genotyping Assay 2.0. The 436 probes from tables 1-4 were used with Protocol 1 to target desired gene regions. Captured gene regions were sequenced using an Ion Torrent PGM and compared to sequences determined by Sanger sequencing (Figure 6). The cumulative representation of nucleic acid variants detected between HCV variants in these patient samples is shown by the heatmap in figure 7.
HCV probes from this invention correctly identified HCV-1a and HCV-1b viral variants compared to the Versant HCV genotyping assay. Resistance locus capture size averaged 200 bases, and read depth ranged from 50 to >50,000 fold. The probes detected mutations generating both nucleotide and amino acid polymorphisms. Figure 8 illustrates that among detected amino acid polymorphisms in our DAA-naïve clinical samples, we detected mutations reported to confer antiviral drug resistance in NS3, NS5a and NS5b proteins. Selected observed mutations include: in NS3 - Q80L/K/R, D168G/E, I170T/V, 175L and E176G; in NS5a - M28T, Q30R, L31M, P58S, Y58S and Y93H/N; and in NS5b - 71V, 1831, M414L/V, L419S, Y452H, V494A and V499A. The probes agreed in 28/28 samples with the Versant HCV genotyping assay in 1a/1b clinical samples; see figure 9. Additionally, DxSeq detected mutations associated with resistance to antiviral drugs, such as TMC435, boceprevir, danoprevir, BI-201335, BMS-790052, GS-9190, BMS-650032, MK-3281, VCH-916, and JTK-109.
Eleven clinical samples were processed in duplicate with Pathogenica's HCV workflow and sequenced in 2 independent runs. Figure 10 demonstrates that probes and mutations detected in duplicate samples showed strong correlation between runs.
Figures 11-13 describe the fraction of viral quasispecies represented by specific nucleic acid variants sequenced from selected samples, and illustrate detection of viral variants at 2% of the total viral nucleic acid present.
Example 2: Specificity of Probes in Rejecting Human Nucleic Acids
Sequencing reads from the 28 clinical samples in example 1 were analyzed to determine whether probe arms could be identified in the sequencing read. There were 13,583,863 reads for which at least the first probe arm could be identified uniquely (reads for which no probe arm can be identified are generally of poor quality and thus rejected). The probe arms were trimmed from these sequencing reads to yield only the capture regions. As the input files were FASTQ files containing both base calls and quality scores, both the nucleotides and quality scores were trimmed to produce a new FASTQ file. The resulting sequences were aligned against the human reference genome (hg19 from genome.ucsc.edu) using the Bowtie2 alignment software version 2.0.0b6 and the following command line parameters: -q --sensitive --end-to-end -M 1 --no-unal --threads 8.
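The trimming step described above, in which base calls and quality scores are cut identically, can be sketched as follows. The sketch assumes the lengths of the probe arms at each end of the read are already known from the arm-identification step; the function names and the example record are hypothetical.

```python
def trim_fastq_record(record, left, right):
    """Trim `left` bases from the start and `right` bases from the end.

    `record` is a (header, sequence, quality) tuple. The sequence and
    quality strings are sliced identically so they stay in register.
    """
    header, seq, qual = record
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq) - right
    return header, seq[left:end], qual[left:end]


def format_fastq(record):
    """Render a record in four-line FASTQ format."""
    header, seq, qual = record
    return f"@{header}\n{seq}\n+\n{qual}\n"


# Hypothetical read: 6-base arm, 8-base capture region, 6-base arm.
record = ("read1", "ACGTACGGATTCCATTGGCA", "IIIIIIFFFFFFFFIIIIII")
trimmed = trim_fastq_record(record, 6, 6)
print(format_fastq(trimmed), end="")
```

The trimmed records, written to a new FASTQ file, would then serve as the input to the aligner.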
The resulting output contains only sequences for which the probe capture can be mapped to one or more locations in the human genome using Bowtie2's sensitive alignment option. Only 71,951 reads were mapped. Thus at most 0.53% of the sequencing reads that could be assigned to a probe contained a sequence of plausibly human origin.
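The percentage stated above follows directly from the two read counts reported in this example, as the short check below shows. The variable names are illustrative.

```python
# Read counts reported in Example 2.
reads_with_probe_arm = 13_583_863
reads_mapped_to_human = 71_951

# Fraction of probe-assigned reads that aligned to the human reference.
percent_human = 100 * reads_mapped_to_human / reads_with_probe_arm
print(f"{percent_human:.2f}%")  # prints "0.53%"
```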
The foregoing description has been presented for purposes of illustration. It is not exhaustive and does not limit the invention to the precise forms or embodiments disclosed. Modifications and adaptations of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed embodiments. For example, the described implementations may be implemented in software, hardware, or a combination of hardware and software. Examples of hardware include computing or processing systems, such as personal computers, servers, laptops, mainframes, and micro-processors. In addition, one of ordinary skill in the art will appreciate that the records and fields shown in the figures may have additional or fewer fields, and may arrange fields differently than the figures illustrate. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
Unless otherwise indicated, all numbers expressing quantities of ingredients, reaction conditions, and so forth used in the specification, including claims, are to be understood as being modified in all instances by the term "about." Accordingly, unless otherwise indicated to the contrary, the numerical parameters are approximations and may vary depending upon the desired properties sought to be obtained by the present invention and may deviate by about 0.01, 0.05, 0.1, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10% or more. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of the claims, each numerical parameter should be construed in light of the number of significant digits and ordinary rounding approaches. The recitation of series of numbers with differing amounts of significant digits in the specification is not to be construed as implying that numbers with fewer significant digits given have the same precision as numbers with more significant digits given.
The use of the word "a" or "an" when used in conjunction with the term "comprising" in the claims and/or the specification may mean "one," but it is also consistent with the meaning of "one or more," "at least one," and "one or more than one." The use of the term "or" in the claims is used to mean "and/or" unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and "and/or."
Unless otherwise indicated, the term "at least" preceding a series of elements is to be understood to refer to every element in the series. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments described herein. Such equivalents are intended to be encompassed by the following claims.
It should be understood that for all numerical bounds describing some parameter in this application, such as "about," "at least," "less than," and "more than," the description also necessarily encompasses any range bounded by the recited values. Accordingly, for example, the description at least 1, 2, 3, 4, or 5 also describes, inter alia, the ranges 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, and 4-5, et cetera.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention belongs. Any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the invention.
For all patents, applications, or other references cited herein, such as non-patent literature and reference sequence information, it should be understood that each is incorporated herein by reference in its entirety for all purposes as well as for the proposition that is recited. Where any conflict exists between a document incorporated herein by reference and the present application, this application will control. All information associated with reference gene sequences disclosed in this application, such as Gene IDs or accession numbers, including, for example, genomic loci, genomic sequences, functional annotations, allelic variants, and reference mRNA (including, e.g., exon boundaries) and protein sequences (such as conserved domain structures) are hereby incorporated herein by reference in their entirety.
The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates, which may need to be independently confirmed.
Headings used in this application are for convenience only and do not affect the interpretation of this application.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims

WHAT IS CLAIMED IS:
1. A collection of probes or probe pairs that hybridize to pairs of regions in ten or more HCV genomes to detect or amplify those regions from a clinical sample.
2. The collection of probes or probe pairs of claim 1 wherein the collection allows HCV nucleic acid from a clinical sample to be sequenced with less than 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.3, 0.2, or 0.1% contamination of the final sequencing output by non-HCV nucleic acids, preferably wherein the non-HCV nucleic acids are human nucleic acids.
3. The collection of probes of Claim 1 or Claim 2, wherein the collection of probes comprises probes with one or more of the nucleotide sequences in tables 1-4 or the reverse complement thereof; particularly where the collection of probes comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, or 850, or more of the nucleotide sequences in tables 1-4, or the reverse complement thereof; or 0.1, 0.2, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 100% more of the nucleotide sequences in any one of tables 1-4, or the reverse complement thereof.
4. The collection of probes of Claim 3, where the collection of probes comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 40, 50, or all 54 sequences in table 1, or the reverse complement thereof; or more particularly where the collection of probes comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, or all 27 pairs of probes in table 1 or the reverse complement thereof.
5. The collection of probes of Claim 3, where the collection of probes comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 100, 150, 200, 250, 300, or all 324 sequences in table 2, or the reverse complement thereof; or more particularly where the collection of probes comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, or all 162 pairs of sequences in table 2, or the reverse complement thereof.
6. The collection of probes of Claim 3, where the collection of probes comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 100, 150, 200, or all 238 sequences in table 3, or the reverse complement thereof; or more particularly where the collection of probes comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, or all 119 pairs of sequences in table 3, or the reverse complement thereof.
7. The collection of probes of Claim 3, where the collection of probes comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 100, 150, 200, 250, or all 258 sequences in table 4, or the reverse complement thereof; or more particularly where the collection of probes comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, or all 129 pairs of sequences in table 4, or the reverse complement thereof.
8. A collection of probes comprising the collection of two or more of the collections of Claims 1-8; more particularly wherein the collection comprises the collection of Claim 4 and the collection of Claim 5, Claim 6, or Claim 7; still more particularly wherein the collection comprises the collection of Claim 4 and the collection of Claim 5, Claim 6, and Claim 7; and further particularly wherein the collection comprises at least 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the probes in table 1, or the reverse complement thereof, with at least 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the probes in tables 2-4, or the reverse complement thereof.
9. A single-stranded nucleic acid probe comprising a nucleic acid sequence of the formula: 5'-A-B-C-3' wherein
i) A is a probe arm sequence listed in column 2 of any one of tables 1-4; and
ii) C is the corresponding probe arm sequence to "i" listed in column 3 of any one of tables 1-4; and
iii) B is a backbone sequence.
10. The single-stranded nucleic acid probe of claim 9, wherein the backbone sequence B comprises a detectable moiety.
11. The single-stranded nucleic acid probe of claim 9, wherein the one or more detectable moiety is a barcode sequence.
12. The single-stranded nucleic acid probe of claim 9, wherein the backbone sequence B comprises one or more primer-binding sites.
13. A composition comprising a plurality of probes according to any one of claims 1-12.
14. The composition of claim 13, wherein each probe arm sequence in the group consisting of all sequences from tables 1-4 is contained in at least one probe in the plurality of probes.
15. The composition of claim 13, wherein each probe arm sequence in the group consisting of all sequences from tables 1-4, or the complement thereof, is contained in at least one probe in the plurality of probes.
16. The composition of claim 13, further comprising extracted nucleic acids from a test sample.
17. The composition of claim 16, wherein the extracted nucleic acids are from a biological sample; preferably wherein the biological sample is isolated from a subject; more preferably wherein the subject is a human; still more preferably wherein the human subject is suspected of having a hepatitis virus.
18. The composition of claim 13, further comprising at least one sample internal calibration standard nucleic acid.
19. The composition of claim 18, further comprising at least one probe that specifically hybridizes with the sample internal calibration standard nucleic acid.
20. The composition of any one of Claims 13-19, further comprising extracted nucleic acids from a test sample.
21. A kit comprising the composition of any one of claims 1-20; optionally comprising instructions for use.
22. The kit of claim 21, further comprising reagents for nucleic acid extraction; preferably wherein the reagents are for RNA extraction.
23. A kit comprising one or more of the nucleic acid sequences in Table 6, or the reverse complement thereof; more particularly wherein the one or more nucleic acid sequences are 2, 3, 4, 5, or all of the nucleic acid sequences in Table 6.
24. A composition comprising one or more of the nucleic acid sequences in Table 6; more particularly wherein the one or more nucleic acid sequences are 2, 3, 4, 5, or all of the nucleic acid sequences in Table 6.
25. The composition of Claim 24, further comprising extracted nucleic acids from a test sample.
26. A method of detecting the presence of one or more strains of hepatitis C virus (HCV) in a test sample, comprising determining, in nucleic acids in an isolated test sample, the presence of one or more HCV nucleic acid sequences detectable with the collection of probes of any one of Claims 1-9, probes according to any one of Claims 9-12, or composition of any one of Claims 13-20, wherein determining the presence of one or more HCV nucleic acid sequences indicates the presence of one or more strains of hepatitis C virus in the test sample; preferably wherein the method comprises the following steps:
a) contacting an isolated test sample comprising nucleic acids with the collection of probes of any one of Claims 1-9, probes according to any one of Claims 9-12 to form a mixture;
b) capturing a region of interest in a hepatitis C virus genome by at least one single-stranded nucleic acid probe or probe pair hybridized to a first and second target sequence in the hepatitis virus genome to form a circularized probe; and
c) detecting the captured region of interest, thereby detecting the presence of the one or more strains of hepatitis C virus.
27. A method of detecting the genotype of one or more strains of hepatitis C virus (HCV) in a test sample, comprising determining, in nucleic acids in an isolated test sample, the presence of one or more HCV nucleic acid sequences detectable with the collection of probes of any one of Claims 1-9, probes according to any one of Claims 9-12, or composition of any one of Claims 13-20, wherein determining the presence of one or more HCV nucleic acid sequences indicates the genotype of one or more strains of hepatitis C virus in the test sample; preferably wherein the genotype of the strain identifies one or more HCV drug resistance mutations; more preferably wherein the method comprises the following steps:
a) contacting a test sample with the collection of probes of any one of Claims 1-9, probes according to any one of Claims 9-12, or composition of any one of Claims 13-20 to form a mixture;
b) capturing a region or set of regions of interest in a hepatitis C virus genome by at least one single-stranded nucleic acid probe hybridized to a first and second target sequence in the hepatitis virus genome to form a circularized probe; and
c) determining the sequence of at least a portion of the captured region or regions of interest, thereby detecting the genotype of each of the one or more strains of HCV.
28. The method of claim 26 or 27, wherein the region of interest is captured by polymerase-dependent extension from the 3' terminus of sequence C of a probe in the plurality of probes.
29. The method of claim 26 or 27, wherein the region of interest is captured by sequence-specific ligation of a linking oligonucleotide.
30. The method of claim 26 or 27, further comprising the step of amplifying the circularized probe to form a plurality of amplicons containing the captured region of interest.
31. The method of claim 26 or 27, further comprising the step of treating the mixture with a nuclease to remove linear nucleic acids between steps (b) and (c).
32. The method of claim 31, further comprising the step of sequencing the region of interest.
33. The method of claim 32, further comprising the step of comparing the sequence of the captured region of interest to the sequence of known hepatitis C virus genomes.
34. The method of claim 33, further comprising the step of comparing the sequence of the captured region of interest to a database of known HCV mutations; particularly wherein the HCV mutations are HCV drug resistance mutations that confer drug resistance to an HCV strain.
35. The method of any one of claims 32-34, wherein the sequence of interest contains one or more of any one of the group selected from a single nucleotide polymorphism (SNP), an insertion, a deletion, and an indel.
36. The method of claim 35, further comprising the step of analyzing the sequence of the captured region of interest with respect to the sequence of known hepatitis C virus genomes and a model of sequencing errors to estimate the proportions or abundances of the hepatitis strains present in the sample.
37. The method of claim 32, further comprising the step of analyzing the sequence of the captured region of interest with respect to the sequence of known HCV drug resistance mutations that confer drug resistance to an HCV strain and a model of sequencing errors to estimate the proportions or abundances of the HCV drug resistance mutations present in the sample.
38. The method of claim 26 or 27, wherein the test sample is obtained from a human subject.
39. The method of claim 38, wherein the test sample is blood.
40. The method of any one of claims 26-39, further comprising the step of adding a sample internal calibration standard to the test sample.
41. The method of claim 40, further comprising the steps of adding a probe that specifically hybridizes with the sample internal calibration standard and detecting the sample internal calibration standard.
42. The method of any one of claims 26-41 , further comprising the step of formatting the results to inform physician decision making.
43. The method of claim 42, wherein the formatting includes providing an estimated quantity of one or more hepatitis C strains of interest or HCV drug resistance mutations of interest.
44. The method of any one of claims 26-44 in which the nucleic acids in the test sample comprise an appended nucleic acid sequence; preferably wherein the appended nucleic acid sequence is a known sequence.
45. The method of any one of claims 26-44 in which the probes are interrogated for information content by DNA or RNA sequencing.
46. The method of any one of claims 26-44 in which the nucleic acids in the test sample are DNA and/or RNA.
47. The method of claim 42 or 43, wherein the formatted results comprise a therapeutic recommendation based on the one or more hepatitis strains or HCV drug resistance mutations detected; preferably wherein the therapeutic recommendation is given by Figure 8.
48. A method of treating a subject infected with hepatitis C virus, comprising performing the method of Claim 26 or 27 and administering a suitable therapy to the subject based on the at least one hepatitis C strain detected; preferably wherein the therapeutic recommendation is given by Figure 8.
49. A method of treating a subject infected with HCV, comprising performing the method of Claim 26 or 27 and administering a suitable therapy to the subject based on the at least one HCV genotype detected; preferably wherein the therapeutic recommendation is given by Figure 8.
50. A collection of probes or probe pairs that hybridize to pairs of regions in ten or more HCV genomes to select or amplify those regions from a clinical sample, and additionally select or amplify human or non-human derived nucleic acids from this sample, for example Hepatitis D virus.
51. The collection of probes of claim 47 in which the plurality of probes is designed to contact and select or amplify for subsequent detection regions of both HCV genome and non-HCV genomes such as HDV or Human genomes.
52. A method of identifying HCV genome derived sequence, the sequence from a computer-readable file, by means of comparison with a database of HCV sequence regions that do not represent an entire HCV genome, wherein the comparison is computer-assisted sequence comparison; preferably wherein the file is a FASTQ file or SFF.
53. A method of identifying HCV genome derived sequence, the sequence from an incomplete sequencing machine run file, by means of comparison with a database of HCV sequence regions that do not represent an entire HCV genome.
54. A non-transitory computer-readable storage medium that provides instructions that, if executed by a computer, will cause the computer to perform operations comprising the method of any one of Claims 26-46.
55. A system comprising the storage medium of Claim 54 and a processor for executing the instructions.
56. A method of detecting the presence of one or more strains of HCV in a test sample comprising performing the method of any one of Claims 26-46 on the system of Claim 55.
PCT/US2012/054901 2011-09-12 2012-09-12 Nucleic acids for multiplex detection of hepatitis c virus WO2013040060A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161533520P 2011-09-12 2011-09-12
US61/533,520 2011-09-12

Publications (2)

Publication Number Publication Date
WO2013040060A2 (en) 2013-03-21
WO2013040060A3 (en) 2013-05-16

Family

ID=47883945

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/054901 WO2013040060A2 (en) 2011-09-12 2012-09-12 Nucleic acids for multiplex detection of hepatitis c virus

Country Status (1)

Country Link
WO (1) WO2013040060A2 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080241832A1 (en) * 2002-07-26 2008-10-02 Abbott Laboratories Method of detecting and quantifying hepatitis c virus
WO2006043100A2 (en) * 2004-10-22 2006-04-27 Iqur Limited Methods for genotyping hcv
US20070141560A1 (en) * 2005-06-30 2007-06-21 Roche Molecular Systems Probes and methods for hepatitis C virus typing using multidimensional probe analysis
US20090098532A1 (en) * 2005-06-30 2009-04-16 Roche Molecular Systems, Inc. Probes and Methods for Hepatitis C Virus Typing Using Single Probe Analysis
EP1953242A1 (en) * 2007-02-05 2008-08-06 INSERM (Institut National de la Santé et de la Recherche Medicale) Methods and kits for determining drug sensitivity in patients infected with HCV

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210292352A1 (en) * 2014-08-04 2021-09-23 The Trustees Of The University Of Pennsylvania Transcriptome In Vivo Analysis (TIVA) and Transcriptome In Situ Analysis (TISA)
US11873312B2 (en) * 2014-08-04 2024-01-16 The Trustees Of The University Of Pennsylvania Transcriptome in vivo analysis (TIVA) and transcriptome in situ analysis (TISA)
WO2017021752A1 (en) * 2015-08-03 2017-02-09 Universite Joseph Fourier Methods for amplifying and sequencing the genome of a hepatitis c virus
WO2017021471A1 (en) * 2015-08-03 2017-02-09 Universite Grenoble Alpes Methods for amplifying and sequencing the genome of a hepatitis c virus
WO2018108328A1 (en) * 2016-12-16 2018-06-21 F. Hoffmann-La Roche Ag Method for increasing throughput of single molecule sequencing by concatenating short dna fragments
CN110036117A (en) * 2016-12-16 2019-07-19 豪夫迈·罗氏有限公司 Increase the method for the treating capacity of single-molecule sequencing by multi-joint short dna segment
JP2020501554A (en) * 2016-12-16 2020-01-23 エフ.ホフマン−ラ ロシュ アーゲーF. Hoffmann−La Roche Aktiengesellschaft Method for increasing the throughput of single molecule sequencing by linking short DNA fragments

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 12832670; Country of ref document: EP; Kind code of ref document: A2)
NENP Non-entry into the national phase in: DE (Ref country code)
122 Ep: pct application non-entry in european phase (Ref document number: 12832670; Country of ref document: EP; Kind code of ref document: A2)