US20070042388A1 - Method of probe design and/or of nucleic acids detection - Google Patents


Info

Publication number
US20070042388A1
Authority
US
United States
Prior art keywords
nucleic acid
probes
target nucleic
probe
biological sample
Legal status
Abandoned
Application number
US11/202,023
Inventor
Christopher Wong
Wing-Kin Sung
Charlie Lee
Lance Miller
Current Assignee
Agency for Science Technology and Research Singapore
Original Assignee
Agency for Science Technology and Research Singapore
Application filed by Agency for Science Technology and Research Singapore
Priority to US11/202,023 (US20070042388A1)
Assigned to AGENCY FOR SCIENCE, TECHNOLOGY AND RESEARCH; assignment of assignors interest (see document for details). Assignors: MILLER, LANCE D.; SUNG, WING-KIN; LEE, CHARLIE; WONG, CHRISTOPHER W.
Priority to US11/990,290 (US8234079B2)
Priority to PCT/SG2006/000224 (WO2007021250A2)
Priority to JP2008525967A (JP2009504153A)
Priority to KR1020087006089A (KR20080052585A)
Priority to EP06769707A (EP1922418A4)
Priority to CN2006800369768A (CN101292044B)
Priority to AU2006280489A (AU2006280489B2)
Publication of US20070042388A1
Priority to US13/549,032 (US20120309643A1)
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B99/00 Subject matter not provided for in other groups of this subclass
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20 Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6811 Selection methods for production or design of target specific oligonucleotides or binding molecules
    • C12Q1/6813 Hybridisation assays

Definitions

  • the present invention relates to the field of oligonucleotide probe design and nucleic acid detection.
  • the method according to the invention may be used for the detection of pathogens, for example viruses.
  • the present invention addresses the problems above, and in particular provides an alternative and/or improved method of probe design and/or for nucleic acid detection.
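  • the present invention provides a method of designing oligonucleotide probe(s) for nucleic acid detection comprising the following steps in any order: (i) identifying and selecting region(s) of a target nucleic acid to be amplified; and (ii) designing oligonucleotide probe(s) capable of hybridizing to the region(s) selected in step (i).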
  • an amplification efficiency (AE) score is determined for every position i along the length of the target nucleic acid, or of a region thereof, and an average AE score is then obtained.
  • Regions showing an AE score higher than the average may be selected as the region(s) of the target nucleic acid to be amplified.
  • the AE of the selected region(s) may be calculated as the Amplification Efficiency Score (AES), which is the probability that a forward primer $r_i$ can bind at a position i and a reverse primer $r_j$ can bind at a position j of the target nucleic acid, where the stretch between positions i and j is the region of the target nucleic acid desired to be amplified.
  • the length of this region, $|j - i|$, may be ≤10000 bp, more in particular ≤5000 bp, or ≤1000 bp, for example ≤500 bp.
  • the forward and reverse primers may be random primers.
  • the step (i) comprises determining the effect of geometrical amplification bias for every position of a target nucleic acid, and selecting the region(s) to be amplified as the region(s) having an efficiency of amplification (AE) higher than the average AE.
  • the geometrical amplification bias is the PCR bias.
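The above-average selection rule of step (i) can be sketched in a few lines. This is a minimal illustration, assuming the per-position AE scores are already available as a list of numbers; the function name `regions_above_average` and the half-open (start, end) region convention are hypothetical choices, not taken from the patent.

```python
from statistics import mean


def regions_above_average(ae_scores):
    """Return (regions, average), where regions are half-open (start, end)
    intervals of consecutive positions whose AE score exceeds the average.

    ae_scores: per-position amplification efficiency scores (hypothetical input).
    """
    avg = mean(ae_scores)
    regions = []
    start = None
    for i, score in enumerate(ae_scores):
        if score > avg and start is None:
            start = i                        # an above-average run begins
        elif score <= avg and start is not None:
            regions.append((start, i))       # run ended at position i - 1
            start = None
    if start is not None:                    # run extends to the last position
        regions.append((start, len(ae_scores)))
    return regions, avg


regions, avg = regions_above_average([1, 5, 6, 2, 1, 7, 8])
print(regions)  # [(1, 3), (5, 7)]
```

Contiguous runs are returned rather than single positions because the amplified "region(s)" in the text are stretches of sequence.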
  • step (ii) of designing oligonucleotide probe(s) capable of hybridizing to the region(s) selected in step (i) may be carried out according to any probe designing technique known in the art.
  • the oligonucleotide probe(s) capable of hybridizing to the selected region(s) may be selected and designed according to at least one of the following criteria:
  • the probe(s) may be designed by applying all criteria (a) to (e).
  • Other criteria not explicitly mentioned herein but which are within the knowledge of a skilled person in the art may also be used.
  • a probe $p_i$ at position i of a target nucleic acid $v_a$ is selected if $P(p_i \mid v_a)$ …
  • the method of designing the oligonucleotide probe(s) as described above further comprises a step of preparing the selected and designed probe(s).
  • the probe may be prepared according to any standard method known in the art. For example, by chemical synthesis.
  • the present invention provides a method of detecting at least one target nucleic acid comprising the steps of:
  • the amplification step (ii) may be carried out in the presence of random primers.
  • the amplification step (ii) may be carried out in the presence of more than two random primers. Any amplification method known in the art may be used.
  • the amplification is an RT-PCR.
  • the amplification step may comprise forward and reverse primers, and each of the forward and reverse primers may comprise, in a 5′-3′ orientation, a fixed primer header and a variable primer tail, and wherein at least the variable tail hybridizes to a portion of the target nucleic acid $v_a$.
  • the amplification step may comprise forward and/or reverse random primers having the nucleotide sequence of SEQ ID NO:1 or a variant or derivative thereof.
  • the biological sample may be any sample taken from a mammal, for example from a human being.
  • the biological sample may be tissue, sera, nasal pharyngeal washes, saliva, any other body fluid, blood, urine, stool, and the like.
  • the biological sample may be treated to free the nucleic acid comprised in the biological sample before carrying out the amplification step.
  • the target nucleic acid may be any nucleic acid which is intended to be detected.
  • the target nucleic acid to be detected may be at least a nucleic acid exogenous to the nucleic acid of the biological sample. Accordingly, if the biological sample is from a human, the exogenous target nucleic acid to be detected (if present in the biological sample) is a nucleic acid which is not of human origin.
  • the target nucleic acid to be detected is at least a pathogen genome or fragment thereof.
  • the pathogen nucleic acid may be at least a nucleic acid from a virus, a parasite, or bacterium, or a fragment thereof.
  • the invention provides a method of detection of at least a target nucleic acid, if present, in a biological sample.
  • the method may be a diagnostic method for detecting the presence of a pathogen in the biological sample. For example, if the biological sample is obtained from a human being, the target nucleic acid, if present in the biological sample, is not of human origin.
  • the probe(s) designed and prepared according to any method of the present invention may be used in solution or may be placed on an insoluble support.
  • the probe(s) may be applied, spotted or printed on an insoluble support according to any technique known in the art.
  • the support may be a solid support or a gel.
  • the support with the probes applied on it may be a microarray or a biochip.
  • the probes are then contacted with the nucleic acid(s) of the biological sample, and, if present, the target nucleic acid(s) and the probe(s) hybridize, and the presence of the target nucleic acid is detected.
  • the mean of the signal intensities of the probes which hybridize to $v_a$ is statistically higher than the mean of the signal intensities of the probes $\notin v_a$, thereby indicating the presence of $v_a$ in the biological sample.
  • the method further comprises the step of computing the relative difference of the proportion of probes $\in v_a$ having high signal intensities to the proportion of all probes used in the detection method having high signal intensities, the density distribution of the signal intensities of probes $\in v_a$ being more positively skewed than that of probes $\notin v_a$, thereby indicating the presence of $v_a$ in the biological sample.
  • the presence of a target nucleic acid in a biological sample is given by a t-test p-value of ≤0.1 and/or a Weighted Kullback-Leibler divergence of ≥1.0, preferably ≥5.0.
  • the t-test p-value is ≤0.05.
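The t-test criterion can be illustrated with a one-sided Welch t-test comparing the intensities of probes $\in v_a$ with those of probes $\notin v_a$. This is a sketch under stated assumptions: the text does not specify which t-test variant is used, and the p-value below relies on a standard normal approximation to the t distribution (reasonable only when each group contains many probes) so that the standard library suffices. The Weighted Kullback-Leibler divergence is not defined in this section and is not implemented; `welch_t_one_sided` is a hypothetical name.

```python
import math
from statistics import mean, variance


def welch_t_one_sided(target, others):
    """One-sided Welch t-test of H1: mean(target) > mean(others).

    target: signal intensities of probes in v_a; others: probes not in v_a.
    Returns (t, df, p). The p-value uses a standard normal approximation to
    the t distribution, which is only reasonable when df is large.
    Both groups need at least two values and non-zero combined variance.
    """
    m1, m2 = mean(target), mean(others)
    v1, v2 = variance(target), variance(others)   # sample variances (n - 1)
    n1, n2 = len(target), len(others)
    a, b = v1 / n1, v2 / n2
    se2 = a + b
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))  # Welch-Satterthwaite
    p = 0.5 * math.erfc(t / math.sqrt(2))                     # P(Z >= t)
    return t, df, p


target = [900, 1100, 1200, 950, 1300, 1050, 1150, 1250]   # hypothetical data
others = [200, 300, 250, 400, 350, 150, 300, 280]
t, df, p = welch_t_one_sided(target, others)
present = p <= 0.05   # cut-off taken from the text above
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.2e}, present = {present}")
```

With real chip data, an exact t-distribution p-value from a statistics library would be preferable to the normal approximation.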
  • the present invention provides a method of determining the presence of a target nucleic acid $v_a$ comprising detecting the hybridization of a probe (the probe being selected and designed according to any method known in the art and not necessarily limited to the methods according to the present invention) to a target nucleic acid $v_a$, wherein the mean of the signal intensities of the probes which hybridize to $v_a$ is statistically higher than the mean of the signal intensities of the probes $\notin v_a$, thereby indicating the presence of $v_a$.
  • the method further comprises the step of computing the relative difference of the proportion of probes $\in v_a$ having high signal intensities to the proportion of all probes used in the detection method having high signal intensities, the density distribution of the signal intensities of probes $\in v_a$ being more positively skewed than that of probes $\notin v_a$, thereby indicating the presence of $v_a$.
  • the presence of a target nucleic acid in a biological sample is given by a t-test p-value of ≤0.1 and/or a Weighted Kullback-Leibler divergence of ≥1.0, preferably ≥5.0.
  • the t-test p-value may be ≤0.05.
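The proportion and skewness criteria can be sketched as follows. Assumptions, all hypothetical: "high signal intensity" means above a user-chosen fixed threshold (the text does not say how high intensities are defined), the relative difference is computed as (p_target − p_all) / p_all, and skewness is the moment coefficient g1. Function names are illustrative only.

```python
from statistics import mean


def high_fraction(values, threshold):
    """Fraction of probes whose signal intensity exceeds threshold."""
    return sum(v > threshold for v in values) / len(values)


def relative_difference(target, all_probes, threshold):
    """Relative difference (p_target - p_all) / p_all of high-intensity fractions."""
    p_all = high_fraction(all_probes, threshold)
    if p_all == 0:
        raise ValueError("no probe exceeds the threshold")
    return (high_fraction(target, threshold) - p_all) / p_all


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


target = [100, 110, 120, 130, 900, 1000]   # probes in v_a (hypothetical data)
others = [100, 110, 120, 130, 140, 150]    # probes not in v_a
rd = relative_difference(target, target + others, threshold=500)
more_skewed = skewness(target) > skewness(others)
print(rd, more_skewed)
```

A positive relative difference together with a more positively skewed distribution for probes of $v_a$ mirrors the pattern described for FIG. 4A versus FIG. 4B.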
  • the present invention provides a method of detecting at least a target nucleic acid, comprising the steps of:
  • in step (iv), the mean of the signal intensities of the probes which hybridize to $v_a$ is statistically higher than the mean of the signal intensities of the probes $\notin v_a$.
  • the method further comprises the step of computing the relative difference of the proportion of probes $\in v_a$ having high signal intensities to the proportion of all probes used in the detection method having high signal intensities, the density distribution of the signal intensities of probes $\in v_a$ being more positively skewed than that of probes $\notin v_a$, thereby indicating the presence of $v_a$ in the biological sample.
  • the presence of a target nucleic acid in a biological sample is given by a t-test p-value of ≤0.1 and/or a Weighted Kullback-Leibler divergence of ≥1.0, preferably ≥5.0.
  • the t-test p-value may be ≤0.05.
  • the nucleic acid to be detected is nucleic acid exogenous to the nucleic acid of the biological sample.
  • the target nucleic acid to be detected may be at least a pathogen genome or fragment thereof.
  • the pathogen nucleic acid may be at least a nucleic acid from a virus, a parasite, or bacterium, or a fragment thereof.
  • the target nucleic acid if present in the biological sample, is not from the human genome.
  • the probes may be placed on an insoluble support.
  • the support may be a microarray or a biochip.
  • FIG. 1 shows an RT-PCR binding process of a pair of random primers on a virus sequence.
  • FIG. 2 shows an Amplification Efficiency Scoring (AES) Map for the RSV B genome.
  • FIG. 3 shows signal intensities for 1 experiment for RSV B.
  • FIGS. 4 (A, B).
  • FIG. 4A shows the density distribution of signal intensities of a virus that is present in the tested sample. An arrow indicates the positive skewness of the distribution, which shows that although there is noise, there is also a significant amount of real signal.
  • FIG. 4B shows the density distribution of signal intensities of a virus that is not in the sample. The distribution is dominated by noise.
  • FIG. 5 shows an analysis framework of pathogen detection chip data.
  • the present invention addresses the problems of the prior art, and in particular provides an alternative and/or improved method of probe design and/or of nucleic acids detection.
  • the inventors realized that to generate probes which would hybridize consistently well to patient material, it would be necessary to develop a new and/or improved method of probe design so as to determine the optimal design predictors.
  • the present inventors created a microarray comprising overlapping 40-mer probes, tiled across 35 viral genomes.
  • however, the invention is not limited to this particular application, probe length or type of target nucleic acid.
  • the present inventors describe how a support, in particular a microarray platform, is optimized so as to become a viable tool in target nucleic acid detection, in particular, in pathogen detection.
  • the inventors also identified probe design predictors, including melting temperature, GC content of the probe, secondary structure, Hamming distance, similarity to the human genome, the effect of the PCR primer tag on random PCR amplification efficiency, and/or the effect of sequence polymorphism. These results were considered and/or incorporated in the development of a method and criteria for probe design.
  • the inventors developed a data analysis algorithm which may accurately predict the presence of a target nucleic acid, which may or may not be a pathogen.
  • the pathogen may be, but is not limited to, a virus, bacteria and/or parasite(s).
  • the algorithm may be used even if probes are not ideally designed. This detection algorithm, coupled with a probe design methodology, significantly improves the confidence level of the prediction (see Tables 1 and 2).
  • the method of the invention may not require a prediction of the likely pathogen, but may be capable of detecting most known human viruses, bacteria and/or parasite(s), as well as some novel species, in an unbiased manner.
  • Genome or a fragment thereof is defined as all the genetic material in the chromosomes of an organism.
  • DNA derived from the genetic material in the chromosomes of a particular organism is genomic DNA.
  • a genomic library is a collection of clones made from a set of randomly generated overlapping DNA fragments representing the entire genome of an organism. The rationale behind the detection platform according to the invention is that each species of virus, bacteria and/or parasite(s) contains unique molecular signatures within the primary sequence of its genome.
  • these unique molecular signatures allow for rational oligonucleotide probe design for the specific detection of individual species, and in some cases, individual strains.
  • the concomitant design and/or preparation of oligonucleotide (oligo) probes that represent the most highly conserved regions among family and genus members will enable the detection and partial characterization of some novel pathogens.
  • the inclusion of all such probes in a single support may allow the detection of multiple viruses, bacteria and/or parasite(s) that simultaneously co-infect a clinical sample.
  • the support may be an insoluble support, in particular a solid support. For example, a microarray or a biochip assay.
  • the invention may be used as a diagnostic tool, depending on the way in which oligonucleotide probes are designed, and/or how the data generated by the microarray is interpreted and analyzed.
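  • the present invention provides a method of designing oligonucleotide probe(s) for nucleic acid detection comprising the following steps in any order: (i) identifying and selecting region(s) of a target nucleic acid to be amplified; and (ii) designing oligonucleotide probe(s) capable of hybridizing to the region(s) selected in step (i).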
  • a score of AE is determined for every position i on the length of the target nucleic acid or of a region thereof and an average AE is obtained.
  • Those regions showing an AE higher than the average may be selected as the region(s) of the target nucleic acid to be amplified.
  • the AE of the selected region(s) may be calculated as the Amplification Efficiency Score (AES), which is the probability that a forward primer $r_i$ can bind at a position i and a reverse primer $r_j$ can bind at a position j of the target nucleic acid, and the stretch between positions i and j, of length $|j - i|$, is the region of the target nucleic acid desired to be amplified.
  • $|j - i|$ may be ≤10000 bp, more in particular ≤5000 bp, or ≤1000 bp, for example ≤500 bp.
  • the forward and reverse primers may be random primers.
  • the step (i) of identifying and selecting region(s) of a target nucleic acid to be amplified comprises determining the effect of geometrical amplification bias for every position of a target nucleic acid, and selecting the region(s) to be amplified as the region(s) having an efficiency of amplification (AE) higher than the average AE.
  • the geometrical amplification bias may be defined as the capability of some regions of a nucleic acid to be amplified more efficiently than other regions.
  • the geometrical amplification bias is the PCR bias.
  • random primers may be used during the amplification step and/or the reverse-transcription (RT) process to ensure unbiased reverse-transcription of all RNA present into DNA.
  • Any random amplification method known in the art may be used for the purposes of the present invention.
  • in particular, the random amplification method may be RT-PCR.
  • the method of the present invention is not limited to RT-PCR.
  • the RT-PCR approach may be susceptible to signal inaccuracies caused by primer-dimer formation and poor amplification efficiency in the RT-PCR process (Bustin, S. A., et al., 2004). To overcome this hurdle, the inventors modeled the RT-PCR process as carried out with random primers.
  • the amplification step comprises forward and reverse primers, and each of the forward and reverse primers comprises, in a 5′-3′ orientation, a fixed primer header and a variable primer tail, and wherein at least the variable tail hybridizes to a portion of the target nucleic acid $v_a$.
  • the fixed primer header and the variable primer tail may each be of any length (in mer) suitable for the purposes of the method according to the present invention.
  • the fixed header may be 10-30 mer, in particular, 15-25 mer, for example 17 mer.
  • the variable tail may be 1-20 mer, in particular, 5-15 mer, for example 9 mer.
  • An example of these forward and reverse primers is shown in FIG. 1.
  • the amplification step may comprise forward and/or reverse random primers having the nucleotide sequence 5′-GTTTCCCAGTCACGATANNNNNNNNN-3′ (SEQ ID NO:1), wherein N is any one of A, T, C and G, or a derivative thereof.
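The primer layout described for SEQ ID NO:1 (a fixed 17-mer header followed by a variable 9-mer tail of N positions) can be illustrated by drawing concrete primers from the template. This sketch assumes N is restricted to the four standard bases A, C, G and T (derivatives are ignored); `random_primer` and `matches_template` are hypothetical helpers.

```python
import random

HEADER = "GTTTCCCAGTCACGATA"   # fixed 17-mer header of SEQ ID NO:1
TAIL_LENGTH = 9                # variable tail: nine N positions
BASES = "ACGT"                 # assumption: N is one of the four standard bases


def random_primer(rng):
    """Draw one concrete primer: the fixed header plus a random 9-mer tail."""
    tail = "".join(rng.choice(BASES) for _ in range(TAIL_LENGTH))
    return HEADER + tail


def matches_template(primer):
    """True if primer = HEADER followed by exactly TAIL_LENGTH standard bases."""
    return (len(primer) == len(HEADER) + TAIL_LENGTH
            and primer.startswith(HEADER)
            and set(primer[len(HEADER):]) <= set(BASES))


rng = random.Random(0)          # seeded so the draw is reproducible
primer = random_primer(rng)
n_tails = len(BASES) ** TAIL_LENGTH
print(primer, matches_template(primer))
print(n_tails)  # 262144
```

The count of 4^9 distinct tails shows why a random-primer pool can, in principle, prime almost anywhere on a viral genome.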
  • the present inventors have modeled the random RT-PCR process as follows. Let $v_a$ be the actual virus in the sample.
  • the random primer used in the RT-PCR process has a fixed 17-mer header and a variable 9-mer tail of the form 5′-GTTTCCCAGTCACGATANNNNNNNNN-3′ (SEQ ID NO:1).
  • the inventors required (1) a forward primer binding to position i, (2) a reverse primer binding to position j, where $|j - i|$, which is the region of the target nucleic acid desired to be amplified, is ≤1000 bp.
  • more generally, $|j - i|$ may be ≤10000 bp, more in particular ≤5000 bp, or also ≤500 bp.
  • the quality of the RT-PCR product depends on how well the forward primer and the reverse primer bind to v a . Some random primers can bind to v a better than others. The identification of such primers and where they bind to v a gives an indication of how likely a particular region of v a will be amplified.
  • an amplification efficiency model may be proposed which computes an Amplification Efficiency Score (AES) for every position of v a .
  • $P_f(i)$ and $P_r(i)$ are the probabilities that a random primer $r_i$ can bind to position i of $v_a$ as a forward primer and as a reverse primer, respectively.
  • a random primer can only bind to $v_a$ if the last 9 nucleotides of the random primer are a substring of the reverse complement of $v_a$ (forward primer) or a substring of $v_a$ (reverse primer). This is shown in FIG. 1.
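The binding rule above (the primer's last nine nucleotides must occur in the reverse complement of $v_a$ for forward binding, or in $v_a$ itself for reverse binding) can be sketched with plain string search. Assumptions: sequences are uppercase DNA strings over A, C, G and T, only exact matches count (primer-dimer, melting temperature and header effects on $P_f(i)$ and $P_r(i)$ are ignored), and forward hits are mapped back to coordinates on $v_a$. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """All start indices of pattern in text, overlapping hits included."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def binding_sites(primer, target, tail_length=9):
    """Exact-match binding sites of the primer's 3' tail on target (v_a).

    'forward': tail found in the reverse complement of target; each hit is
               mapped back to the start of the matching window on target.
    'reverse': tail found directly in target.
    """
    tail = primer[-tail_length:]
    n = len(target)
    forward = [n - p - tail_length for p in find_all(reverse_complement(target), tail)]
    return {"forward": sorted(forward), "reverse": find_all(target, tail)}


primer = "GTTTCCCAGTCACGATA" + "AAACCCGGG"     # header + one concrete 9-mer tail
print(binding_sites(primer, "TTAAACCCGGGTT"))  # {'forward': [], 'reverse': [2]}
print(binding_sites(primer, "CCCGGGTTT"))      # {'forward': [0], 'reverse': []}
```

Only the 9-mer tail is matched, following the text; a fuller model would weight each hit by the estimated binding probability.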
  • based on well-established primer design criteria (Wang, D. …), $P_f(i)$ was estimated to be low if $r_i$ forms a significant primer-dimer or has an extreme melting temperature. Conversely, if $r_i$ does not form any significant primer-dimer and has an optimal melting temperature, $P_f(i)$ will be high. Note that if the header of the random primer is similar to $v_a$, it may also aid binding and thus result in a higher $P_f(i)$. $P_r(i)$ was computed similarly.
  • the binding of the random primer r i at position i of v a as a forward primer affects the quality of the RT-PCR product for at least 1000 nucleotides upstream of position i.
  • the binding of the random primer r i at position i of v a as a reverse primer affects the quality of the RT-PCR product for at least 10000 nucleotides downstream of position i.
  • Z may be ≤5000 bp, ≤1000 bp or ≤500 bp.
  • Z is ≤10000 bp.
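The model above (forward-primer binding probabilities $P_f(i)$, reverse-primer binding probabilities $P_r(j)$ and a maximum amplified length Z) can be turned into a per-position score map. The exact AES formula is not reproduced in this section, so the aggregation below is an assumption for illustration only: position k receives the sum of $P_f(i) \cdot P_r(j)$ over all primer pairs with $i \le k \le j$ and $j - i \le Z$. The name `aes_map` is hypothetical; a difference array keeps the cost at O(n·Z) instead of O(n·Z²).

```python
def aes_map(p_f, p_r, z):
    """Hypothetical AES map: score[k] = sum of p_f[i] * p_r[j]
    over pairs with i <= k <= j and j - i <= z.

    p_f, p_r: per-position forward/reverse binding probabilities (same length).
    """
    n = len(p_f)
    if len(p_r) != n:
        raise ValueError("p_f and p_r must have the same length")
    diff = [0.0] * (n + 1)
    for i in range(n):
        if p_f[i] == 0.0:
            continue                       # no forward priming at i
        for j in range(i, min(n, i + z + 1)):
            w = p_f[i] * p_r[j]
            diff[i] += w                   # pair (i, j) covers positions i..j
            diff[j + 1] -= w
    scores, running = [], 0.0
    for k in range(n):                     # prefix sums turn diff into scores
        running += diff[k]
        scores.append(running)
    return scores


print(aes_map([1, 0, 0, 0, 0], [0, 0, 1, 0, 0], 2))  # [1.0, 1.0, 1.0, 0.0, 0.0]
```

With the pair spacing exceeding Z (for example z=1 in the same call), no region is covered and every score is zero, which is how the Z limit penalizes distant primer pairs.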
  • step (ii) of designing oligonucleotide probe(s) capable of hybridizing to the selected region(s) may be carried out according to any one of the probe design techniques known in the art.
  • given a set of target nucleic acids, for example viral genomes, $V = \{v_1, v_2, \ldots, v_n\}$, a set of length-m probes (each probe being a substring of some $v_i$) may be designed so as to satisfy, for example, at least one of the following conditions:
  • the CG content of the selected probes should be from 40% to 60%.
  • the present invention provides a method of designing oligonucleotide probe(s) for nucleic acid detection, comprising selecting the probes having a CG-content from 40% to 60%.
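The CG-content criterion is simple to apply to candidate probes. A minimal sketch, assuming probes are DNA strings and that both bounds (40% and 60%) are inclusive, which the text does not state explicitly; the function names are hypothetical.

```python
def gc_content(probe):
    """Fraction of G and C bases in a DNA probe sequence."""
    probe = probe.upper()
    return (probe.count("G") + probe.count("C")) / len(probe)


def passes_gc_filter(probe, low=0.40, high=0.60):
    """True if the probe's GC content lies within [low, high] (inclusive)."""
    return low <= gc_content(probe) <= high


candidates = ["ATGC" * 10, "AT" * 20, "GC" * 20]   # three 40-mers
print([passes_gc_filter(p) for p in candidates])  # [True, False, False]
```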
  • Hybridization refers to the process in which the oligo probes bind non-covalently to the target nucleic acid, or a portion thereof, to form a stable double-stranded hybrid. Triple-stranded hybridization is also theoretically possible.
  • Hybridization probes are oligonucleotides capable of binding in a base-specific manner to a complementary strand of target nucleic acid.
  • Hybridizing specifically refers to the binding, duplexing, or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) of DNA or RNA.
  • Hybridizations are generally performed under stringent conditions.
  • for example, stringent conditions may be a salt concentration of no more than about 1 Molar (M) and a temperature of at least 25° C., e.g., 750 mM NaCl, 50 mM sodium phosphate, 5 mM EDTA, pH 7.4 (5× SSPE) and a temperature of from about 25° C. to about 30° C.
  • Hybridization is usually performed under stringent conditions, for example, at a salt concentration of no more than 1 M and a temperature of at least 25° C.
  • for stringent conditions, see also, for example, Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York (2001), which is hereby incorporated by reference in its entirety for all purposes.
  • Sensitivity requires that probes which cannot form significant secondary structures be selected in order to detect low-abundance mRNAs. Thus, probes with the highest free energy computed with the nearest-neighbor model are selected (SantaLucia, J., Jr., et al., 1996).
  • the present invention provides a method of designing oligonucleotide probe(s) for nucleic acid detection, wherein the probe(s) are selected as those having the highest free energy computed with the nearest-neighbor model.
  • probes $s_a$ and $s_b$, substrings of target nucleic acids $v_a$ and $v_b$, are selected based on the Hamming distance between $s_a$ and any length-m substring $s_b$ of the target nucleic acid $v_b$ and/or on the longest common substring of $s_a$ and $s_b$.
  • let $s_a$ and $s_b$ be length-m substrings of viral genomes $v_a$ and $v_b$ respectively, where $v_a \neq v_b$.
  • the length of the probe(s) to be designed may be of any length useful for the purposes of the present invention.
  • the probes may be less than 100 mer, for example 20 to 80 mer, 25 to 60 mer, for example 40 mer.
  • the hamming distance and/or longest common substring may also vary.
  • $s_a$ is specific to $v_a$ if, for every other target nucleic acid $v_b$, the Hamming distance between $s_a$ and every length-m substring $s_b$ of $v_b$ is more than 0.25m and the longest common substring of $s_a$ and $s_b$ is less than 15.
  • the cutoff value(s) for the Hamming distance may be chosen according to the stringency desired; a skilled person will know how to select the cutoff for a particular stringency. In a particular example of the probe design described herein, the inventors used Hamming distance cutoffs of >10 for specific probes, and ≤10, preferably ≤5, for conserved probes. A specific probe is a probe which only hybridizes to a specific target nucleic acid, while a conserved probe is a probe which may hybridize to any member of the family of the target nucleic acid.
  • the present invention also provides a method of designing oligonucleotide probe(s) for nucleic acid detection, wherein, given probes $s_a$ and $s_b$ that are substrings of target nucleic acids $v_a$ and $v_b$ comprised in the biological sample, $s_a$ is selected if the Hamming distance between $s_a$ and any length-m substring $s_b$ of the target nucleic acid $v_b$ is more than 0.25m, and the longest common substring of $s_a$ and $s_b$ is less than 15.
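The specificity test for $s_a$ (Hamming distance to every length-m substring $s_b$ of each other target greater than 0.25m, and longest common substring shorter than 15) can be sketched as a brute-force check. This is an illustration, not the inventors' code; it scales poorly to large genome collections, where an index-based search would normally be used. Function and parameter names are hypothetical; the default cutoffs are the values stated above.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def longest_common_substring(a, b):
    """Length of the longest common substring of a and b (O(len(a) * len(b)) DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the run ending at a[i-1], b[j-1]
                best = max(best, cur[j])
        prev = cur
    return best


def is_specific(s_a, other_targets, min_hamming_fraction=0.25, max_lcs=15):
    """True if s_a differs from every length-m window of every other target
    by more than min_hamming_fraction * m positions and shares no common
    substring of max_lcs or more nucleotides with any of them."""
    m = len(s_a)
    for v_b in other_targets:
        for start in range(len(v_b) - m + 1):
            if hamming(s_a, v_b[start:start + m]) <= min_hamming_fraction * m:
                return False
        # When v_b is at least m long, any common substring of s_a and v_b lies
        # inside some length-m window, so comparing with all of v_b is equivalent.
        if longest_common_substring(s_a, v_b) >= max_lcs:
            return False
    return True


print(is_specific("ACGTACGTAC", ["ACGTACGTAA"]))  # False: only 1 mismatch
print(is_specific("ACGTACGTAC", ["T" * 20]))      # True
```

For a 40-mer, the 0.25m rule gives the cutoff of more than 10 mismatches quoted for specific probes.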
  • where the biological sample is of human origin (for example, human samples containing viral genomes), probes with high homology to the human genome should also be avoided. Accordingly, for any length-m probe $s_a$ specific for the target nucleic acid $v_a$, the probe $s_a$ is selected if it does not have any hits with any region of a nucleic acid different from the target nucleic acid, and if the length-m probes $s_a$ have hits with a nucleic acid different from the target nucleic acid, the probe $s_a$ with the smallest maximum alignment length and/or with the least number of hits is selected.
  • the present invention provides a method of designing oligonucleotide probe(s) for nucleic acid detection, wherein for any length-m probe $s_a$ specific for the target nucleic acid $v_a$, the probe $s_a$ is selected if it does not have any hits with any region of a nucleic acid different from the target nucleic acid, and if the length-m probes $s_a$ have hits with a nucleic acid different from the target nucleic acid, the probe $s_a$ with the smallest maximum alignment length and/or with the least number of hits is selected.
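The host-homology rule can be sketched as a selection over alignment hits. Assumptions: the hits of each candidate probe against the non-target (for example human) genome have already been obtained with an external alignment tool and are summarized as a list of alignment lengths; where the text says "and/or", this sketch ranks first by maximum alignment length and breaks ties by the number of hits. `select_by_host_homology` is a hypothetical name.

```python
def select_by_host_homology(hits_by_probe):
    """Pick probe(s) with the least homology to a non-target (e.g. host) genome.

    hits_by_probe maps each candidate probe to a list of alignment lengths,
    one per hit against the non-target genome (assumed precomputed).
    Returns every hit-free probe if any exist; otherwise the single probe with
    the smallest maximum alignment length, ties broken by fewest hits.
    """
    clean = [probe for probe, hits in hits_by_probe.items() if not hits]
    if clean:
        return clean
    best = min(
        hits_by_probe,
        key=lambda probe: (max(hits_by_probe[probe]), len(hits_by_probe[probe])),
    )
    return [best]


print(select_by_host_homology({"p1": [30, 22], "p2": [18, 18, 18], "p3": [18]}))  # ['p3']
print(select_by_host_homology({"p1": [], "p2": [20]}))                            # ['p1']
```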
  • the design of the oligonucleotide probe(s) may also be carried out using the AES according to the invention.
  • the invention provides a method of selecting and designing probes wherein a probe p i at position i of a target nucleic acid is selected if p i is predicted to hybridize to the position i of the amplified target nucleic acid.
  • the oligonucleotide probe(s) capable of hybridizing to the selected region(s) may be selected and designed according to at least one of the following criteria:
  • two or more of the criteria indicated above may be used for designing the oligonucleotide probe(s).
  • the probe(s) may be designed by applying all criteria (a) to (e).
  • Other criteria not explicitly mentioned herein but which are evident to a skilled person in the art may also be used.
  • a probe $p_i$ at position i of a target nucleic acid $v_a$ is selected if $P(p_i \mid v_a)$ …
  • the invention provides a method as described above, wherein $P(p_i \mid v_a) \approx P(X \le x_i) = c_i/k$, where X is the random variable representing the amplification efficiency score (AES) values of all probes of $v_a$, $x_i$ is the AES value of probe $p_i$, k is the number of probes in $v_a$, and $c_i$ is the number of probes whose AES values are ≤ $x_i$.
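Assuming the relation above reads $P(p_i \mid v_a) \approx P(X \le x_i) = c_i/k$, the value for every probe of $v_a$ is the empirical cumulative distribution of AES values evaluated at $x_i$, which one sort plus a binary search per probe computes. This is a sketch under that reading; `amplification_probabilities` is a hypothetical name.

```python
from bisect import bisect_right


def amplification_probabilities(aes_values):
    """Estimate P(p_i | v_a) as c_i / k for every probe of v_a.

    aes_values[i] is x_i, the AES value of probe p_i; k = len(aes_values);
    c_i = number of probes whose AES value is <= x_i (empirical CDF at x_i).
    """
    k = len(aes_values)
    ordered = sorted(aes_values)
    # bisect_right returns how many sorted values are <= x
    return [bisect_right(ordered, x) / k for x in aes_values]


print(amplification_probabilities([0.2, 0.8, 0.5, 0.5]))  # [0.25, 1.0, 0.75, 0.75]
```

Tied AES values receive the same probability, and the probe with the highest AES always scores 1.0.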
  • the method of selecting and designing the oligonucleotide probe(s) as described above further comprises a step of preparing the selected and designed probe(s).
  • Designing a probe comprises understanding its sequence and/or designing it by any suitable means, for example by using software.
  • the step of preparing the probe comprises the physical preparation of it.
  • the probe may be prepared according to any standard method known in the art.
  • the probes may be chemically synthesized or prepared by cloning, for example as described in Sambrook and Russell, 2001.
  • the probe(s) designed and prepared according to any method of the present invention may be used in solution or may be placed on an insoluble support.
  • for example, the probe(s) may be applied, spotted or printed on an insoluble support according to any technique known in the art.
  • the support may be a solid support or a gel.
  • the support with the probes applied on it may be a microarray or a biochip.
  • the present invention provides an oligo microarray hybridization-based approach for the rapid detection and identification of pathogens, for example viral and/or bacterial pathogens, from PCR-amplified cDNA prepared from primary tissue samples.
  • the PCR-amplified cDNA may, for example, be random PCR-amplified cDNA(s).
  • an “array” is an intentionally created collection of molecules which can be prepared either synthetically or biosynthetically.
  • the molecules in the array can be identical to or different from each other.
  • the array can assume a variety of formats, e.g., libraries of soluble molecules; libraries of compounds tethered to resin beads, silica chips, or other solid supports.
  • an Array Plate, or Plate, is a body having a plurality of arrays, each separated from the other arrays by a physical barrier that is resistant to the passage of liquids and forms an area or space referred to as a well.
  • the biological sample may be any sample taken from a mammal, for example from a human being.
  • the biological sample may be blood, a body fluid, saliva, urine, stool, and the like.
  • the biological sample may be treated to free the nucleic acid comprised in the biological sample before carrying out the amplification step.
  • the target nucleic acid may be any nucleic acid which is intended to be detected.
  • the target nucleic acid to be detected may be at least a nucleic acid exogenous to the nucleic acid of the biological sample. Accordingly, if the biological sample is from a human, the exogenous target nucleic acid to be detected (if present in the biological sample) is a nucleic acid which is not of human origin.
  • the target nucleic acid to be detected is at least a pathogen genome or fragment thereof.
  • the pathogen nucleic acid may be at least a nucleic acid from a virus, a parasite, or bacterium, or a fragment thereof.
  • the target nucleic acid(s) from a biological sample desired to be detected may be any target nucleic acid, RNA and/or DNA.
  • the target nucleic acid to be detected may be from a pathogen or a non-pathogen.
  • it may be the genome, or a fragment thereof, of at least one virus, at least one bacterium and/or at least one parasite.
  • the probes selected and/or prepared may be placed, applied and/or fixed on a support according to any standard technology known to a skilled person in the art.
  • the support may be an insoluble support, for example a solid support.
  • the microarray and/or biochip may be prepared using any standard technology known to a skilled person in the art.
  • RNA and DNA were extracted from patient samples (e.g. tissues, sera, nasal pharyngeal washes and stool) using established protocols and commercial kits.
  • for example, a Qiagen kit for nucleic acid extraction may be used.
  • Phenol/chloroform may also be used for the extraction of DNA and/or RNA. Any technique known in the art, for example as described in Sambrook and Russell, 2001, may be used.
  • RNA was reverse-transcribed to cDNA using tagged random primers, based on a protocol described by Bohlander et al., 1992 and Wang et al., 2003.
  • the cDNA was then amplified by random PCR. Fragmentation, labeling and hybridization of the sample to the microarray were carried out as described by Wong et al., 2004.
  • the present inventors selected 35 viral genomes representing the most common causes of viral disease in Singapore. Using the complete genome sequences downloaded from Genbank, 40-mer probes tiling across the entire genomes and overlapping at five-base resolution were generated. Seven replicates of each virus probe were synthesized directly onto the microarray using Nimblegen technology (Nuwaysir, E. F., et al., 2002). The probes were randomly distributed on the microarray to minimize the effects of hybridization artifacts. To control for non-specific hybridization of the sample to the probes, 10,000 oligonucleotide probes were designed and synthesized onto the microarray.
  • These 10,000 oligonucleotides had no sequence similarity to the human genome or to the pathogen genomes. They were random probes with 40-60% CG-content and were used to measure the background signal intensity.
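The tiling layout described above (40-mers stepping five bases along each genome) and the random background probes with 40-60% CG-content can be sketched as follows. This is an illustrative Python sketch with hypothetical function names; it does not reproduce the screening of the background probes against the human and pathogen genomes, which would require a separate similarity search.

```python
import random


def tile_probes(genome, probe_len=40, step=5):
    """Return (start, probe) pairs for every probe_len window, one every `step` bases."""
    return [
        (start, genome[start:start + probe_len])
        for start in range(0, len(genome) - probe_len + 1, step)
    ]


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def random_background_probes(count, probe_len=40, gc_min=0.4, gc_max=0.6, seed=0):
    """Rejection-sample random probes whose GC fraction lies in [gc_min, gc_max].

    NOTE: no check against the human or pathogen genomes is performed here.
    """
    rng = random.Random(seed)
    probes = []
    while len(probes) < count:
        candidate = "".join(rng.choice("ACGT") for _ in range(probe_len))
        if gc_min <= gc_fraction(candidate) <= gc_max:
            probes.append(candidate)
    return probes
```

With a step of 5, a genome of length L yields (L - 40) // 5 + 1 tiled probes, so overlapping neighbours share 35 bases.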
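  • the present invention provides a method of detecting at least one target nucleic acid comprising the steps of: (i) providing a biological sample; (ii) amplifying the nucleic acid(s) of the biological sample; (iii) providing at least one oligonucleotide probe capable of hybridizing to at least a target nucleic acid, if present in the biological sample; and (iv) contacting the probe(s) with the amplified nucleic acids and detecting the probe(s) hybridized to the target nucleic acid(s).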
  • the amplification step (ii) may be carried out in the presence of random primers.
  • the amplification step (ii) may be carried out in the presence of more than two random primers. Any amplification method known in the art may be used.
  • the amplification is an RT-PCR.
  • a forward random primer binding to position i and a reverse random primer binding to position j of a target nucleic acid va are selected among primers having, for every position i of a target nucleic acid va, an amplification efficiency score of $AES_i = \sum_{j=i-Z}^{i} \left\{ P_f(j) \times \sum_{k=i+1}^{j+Z} P_r(k) \right\}$.
  • the amplification step may comprise forward and reverse primers, and each of the forward and reverse primers may comprise, in a 5′-3′ orientation, a fixed primer header and a variable primer tail, and wherein at least the variable tail hybridizes to a portion of the target nucleic acid v a .
  • the amplification step may comprise forward and/or reverse random primers having the nucleotide sequence of SEQ ID NO:1 or a variant or derivative thereof.
  • the biological sample may be any sample taken from a mammal, for example from a human being.
  • the biological sample may be tissue, sera, nasal pharyngeal washes, saliva, any other body fluid, blood, urine, stool, and the like.
  • the biological sample may be treated to free the nucleic acid comprised in the biological sample before carrying out the amplification step.
  • the target nucleic acid may be any nucleic acid which is intended to be detected.
  • the target nucleic acid to be detected may be at least a nucleic acid exogenous to the nucleic acid of the biological sample. Accordingly, if the biological sample is from a human, the exogenous target nucleic acid to be detected (if present in the biological sample) is a nucleic acid which is not of human origin.
  • the target nucleic acid to be detected is at least a pathogen genome or fragment thereof.
  • the pathogen nucleic acid may be at least a nucleic acid from a virus, a parasite, or bacterium, or a fragment thereof.
  • the invention provides a method of detection of at least a target nucleic acid, if present, in a biological sample.
  • the method may be a diagnostic method for detecting the presence of a pathogen in the biological sample. For example, if the biological sample is obtained from a human being, the target nucleic acid, if present in the biological sample, is not of human origin.
  • the probe(s) designed and prepared according to any method of the present invention may be used in solution or may be placed on an insoluble support.
  • for example, the probe(s) may be applied, spotted or printed on an insoluble support according to any technique known in the art.
  • the support may be a solid support or a gel.
  • the support with the probes applied on it may be a microarray or a biochip.
  • the probes are then contacted with the nucleic acid of the biological sample, and if present the target nucleic acid(s) and the probe(s) hybridize, and the presence of the target nucleic acid is detected.
  • the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, thereby indicating the presence of va in the biological sample.
  • the method further comprises the step of computing the relative difference of the proportion of probes ∉ va having high signal intensities to the proportion of the probes used in the detection method having high signal intensities, the density distribution of the signal intensities of probes va being more positively skewed than that of probes ∉ va, thereby indicating the presence of va in the biological sample.
  • the presence of a target nucleic acid in a biological sample is indicated by a t-test p-value of ≦0.1 and/or a Weighted Kullback-Leibler divergence of ≧1.0, preferably ≧5.0.
  • the t-test p-value is ≦0.05.
  • the present invention provides a method of determining the presence of a target nucleic acid va comprising detecting the hybridization of a probe to a target nucleic acid va, wherein the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, thereby indicating the presence of va.
  • the method further comprises the step of computing the relative difference of the proportion of probes ∉ va having high signal intensities to the proportion of the probes used in the detection method having high signal intensities, the density distribution of the signal intensities of probes va being more positively skewed than that of probes ∉ va, thereby indicating the presence of va.
  • the presence of a target nucleic acid in a biological sample is indicated by a t-test p-value of ≦0.1 and/or a Weighted Kullback-Leibler divergence of ≧1.0, preferably ≧5.0.
  • the t-test p-value may be ≦0.05.
  • the present invention provides a method of detecting at least one target nucleic acid, comprising the steps of:
  • in step (iv), the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, thereby indicating the presence of va in the biological sample.
  • the method further comprises the step of computing the relative difference of the proportion of probes ∉ va having high signal intensities to the proportion of the probes used in the detection method having high signal intensities, the density distribution of the signal intensities of probes va being more positively skewed than that of probes ∉ va, thereby indicating the presence of va in the biological sample.
  • the presence of a target nucleic acid in a biological sample is indicated by a t-test p-value of ≦0.1 and/or a Weighted Kullback-Leibler divergence of ≧1.0, preferably ≧5.0.
  • the t-test p-value may be ≦0.05.
  • the nucleic acid to be detected is a nucleic acid exogenous to the nucleic acid of the biological sample.
  • the target nucleic acid to be detected may be at least a pathogen genome or fragment thereof.
  • the pathogen nucleic acid may be at least a nucleic acid from a virus, a parasite, or bacterium, or a fragment thereof.
  • the target nucleic acid if present in the biological sample, is not from the human genome.
  • the probes may be placed on an insoluble support.
  • the support may be a microarray or a biochip.
  • the signal intensities of the 1948 probes were ranked in decreasing order and were correlated with their corresponding AES value.
  • the p-value was found to be < 2.2e-16 on average. This indicates that the correlation between the signal intensity of the probe at position i of RSV B and AESi is very unlikely to be due to chance. Further investigation revealed that about 300 probes, which consistently produced high signal intensities in all five experiments, had amplification efficiency scores in the 90th percentile.
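The text does not say which correlation test was applied to the ranked signal intensities and AES values. Assuming a Spearman rank correlation (a natural choice for ranked data, but an assumption here), the coefficient can be computed with the Python standard library as sketched below; the function names are hypothetical, and the p-value reported above would additionally require a significance test that is not reproduced.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)
```

Applied to the probe signal intensities and their AES values, a rho close to +1 would indicate that probes in well-amplified regions tend to give the strongest signals.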
  • the amplification efficiency model according to the invention is able to predict the relative strength of signals produced by different regions of a viral genome in the described experiment set-up.
  • Probes from regions with low amplification efficiency scores tend to produce no or low signal intensities, which would result in false negatives on the microarray. Such probes complicate the analysis of the microarray data, because a low signal intensity may mean either that the target genome is absent or simply that the region was not amplified.
  • probes in regions with reasonably high amplification efficiency scores should be selected to minimize inaccuracies caused by the RT-PCR process using random primers.
  • the threshold on amplification efficiency scores for selecting probes for a virus va is determined by the cumulative distribution function of the AES values of va.
  • let X be the random variable representing the AES values of all probes of va.
  • let k be the number of probes in va.
  • for a probe pi at position i of va, let xi be its corresponding AES value, and let ci be the number of probes of va whose AES values are ≦xi.
  • probe pi is selected if $P(p_i \mid v_a) > \lambda$, where $P(p_i \mid v_a) \approx P(X \le x_i) = c_i/k$.
  • the present invention also provides a method of probe design and/or of target nucleic acid detection wherein a probe pi at position i of a target nucleic acid va is selected if $P(p_i \mid v_a) > \lambda$.
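This selection rule amounts to keeping the probes whose AES lies above the λ-quantile of the empirical distribution of AES values for the same virus. A minimal Python sketch, assuming the AES values of all probes of va have already been computed; the function name is hypothetical.

```python
from bisect import bisect_right


def select_probes_by_aes(aes_values, lam=0.5):
    """Return indices i with c_i / k > lam.

    aes_values: AES value x_i of each probe p_i of one virus v_a.
    c_i is the number of probes whose AES value is <= x_i, and k is the
    number of probes, so c_i / k estimates P(X <= x_i).
    """
    k = len(aes_values)
    if k == 0:
        return []
    ordered = sorted(aes_values)
    selected = []
    for i, x_i in enumerate(aes_values):
        c_i = bisect_right(ordered, x_i)  # count of AES values <= x_i
        if c_i / k > lam:
            selected.append(i)
    return selected
```

With λ = 0.75, roughly the top quarter of probes by AES is kept. Probes with equal AES values share the same ci and are therefore kept or dropped together.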
  • the invention will now be described in more detail with reference to the analysis of a pathogen detection chip (PDC).
  • the chip data here refers to the collective information provided by the probe signals on the PDC.
  • the invention provides a method wherein the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, which may indicate the presence of va in the biological sample.
  • the chip data D were analyzed for the presence of viruses as follows. For every virus va ∈ V, we used a one-tailed t-test (Goulden, C. H., 1956) to determine whether the mean of the signal intensities of the probes ∈ va was statistically higher than that of the signal intensities of the probes ∉ va.
  • $t_a = \dfrac{\mu_a - \mu_{a'}}{\sqrt{\dfrac{\sigma_a^2}{n_a} + \dfrac{\sigma_{a'}^2}{n_{a'}}}}$
  • where $\mu_a$, $\sigma_a^2$ and $n_a$ are the mean, variance and size of the signal intensities of the probes ∈ va, respectively, and
  • $\mu_{a'}$, $\sigma_{a'}^2$ and $n_{a'}$ are the mean, variance and size of the signal intensities of the probes ∉ va, respectively.
  • the level of significance was set to 0.05. This means that the hypothesis that the mean of the signal intensities of the probes ∈ va is higher than that of the signal intensities of the probes ∉ va is accepted only if the p-value of ta is ≦0.05. In this case, va is likely to be present in the sample.
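The statistic ta is the unequal-variance (Welch) form of the two-sample t statistic. The following Python sketch computes it from two lists of signal intensities and applies the 0.05 one-tailed test. To stay within the standard library it approximates the one-tailed p-value with the standard normal distribution instead of Student's t distribution; this swap is an assumption that is only reasonable when both probe sets are large (hundreds of probes or more), and a statistics package should be used otherwise. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def welch_t(target_signals, other_signals):
    """t_a = (mu_a - mu_a') / sqrt(var_a / n_a + var_a' / n_a')."""
    n_a, n_b = len(target_signals), len(other_signals)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two signal intensities")
    mean_diff = fmean(target_signals) - fmean(other_signals)
    # statistics.variance is the sample variance (n - 1 denominator).
    std_err = sqrt(variance(target_signals) / n_a + variance(other_signals) / n_b)
    if std_err == 0:
        raise ValueError("both groups have zero variance")
    return mean_diff / std_err


def one_tailed_p_value(t):
    """Upper-tail p-value, approximating the t distribution by N(0, 1)."""
    return 1.0 - NormalDist().cdf(t)


def virus_passes_t_test(target_signals, other_signals, alpha=0.05):
    """True if the probes of v_a have a significantly higher mean intensity."""
    return one_tailed_p_value(welch_t(target_signals, other_signals)) <= alpha
```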
  • the t-test alone, which shows whether the distribution of the signal intensities of a virus differs from that of the other viruses, may not be sufficient to determine whether a particular virus is in the sample. It is also essential to know how similar or different the two distributions are.
  • a measure that can be used to quantify the similarity between a true distribution and a model distribution is the Kullback-Leibler divergence (Kullback and Leibler, 1951), also known as the relative entropy.
  • the probability distribution of the signal intensities of the probes in v a is the true distribution while the probability distribution of the signal intensities of all the probes in P is the model distribution.
  • let Pa be the set of probes in va. The Kullback-Leibler divergence is then
  • $KL(P_a \| P) = \sum_{\mu \le x \le \max(D)} f_a(x) \log\left(\frac{f_a(x)}{f(x)}\right)$
  • where μ is the mean signal intensity of the probes in P,
  • fa(x) is the fraction of probes in Pa with signal intensity x, and
  • f(x) is the fraction of probes in P with signal intensity x.
  • the Kullback-Leibler divergence is the collective difference over all x of two probability distributions.
  • although the Kullback-Leibler divergence is good at detecting shifts in a probability distribution, it is less sensitive to changes in spread, which mainly affect the tails of the distribution.
  • the tails of the probability distribution provide the most information about whether a virus is present in the sample.
  • the Kullback-Leibler divergence statistic therefore needs to be modified to reflect this observation more accurately.
  • the Weighted Kullback-Leibler divergence (WKL) is $WKL(P_a \| P) = \sum_{\mu \le x \le \max(D)} \frac{f_a(x) \log\left(f_a(x)/f(x)\right)}{Q(x)\,[1 - Q(x)]}$
  • Q(x) is the cumulative distribution function of the signal intensities of the probes in P.
  • Empirical tests show that, in samples containing no virus, viruses that pass the t-test at significance level 0.05 have WKL < 5.0. In samples where a virus is indeed present, the actual viruses not only pass the t-test at significance level 0.05 but are also the only viruses with WKL > 5.0. Thus we set the Weighted Kullback-Leibler divergence threshold for a virus to be called present in the sample to 5.0.
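The KL and WKL statistics and the resulting presence call can be sketched in Python as follows. Assumptions not stated in the text: signal intensities are already discretized (for example rounded to integers) so that fa(x) and f(x) are fractions of probes at each value; Pa is a subset of P, so f(x) > 0 wherever fa(x) > 0; and the term at x = max(D), where Q(x) = 1 and the weight 1/(Q(x)[1 - Q(x)]) is undefined, is skipped. Function names are hypothetical; the 0.05 and 5.0 thresholds are the ones reported above.

```python
from collections import Counter
from math import log
from statistics import fmean


def kl_divergences(target_signals, all_signals):
    """Return (KL, WKL) of the target probes P_a against all probes P.

    Both sums run over intensity values x with mu <= x <= max(D), where mu
    is the mean intensity of all probes. Terms with f_a(x) = 0 contribute 0.
    """
    n_a, n = len(target_signals), len(all_signals)
    counts_a, counts = Counter(target_signals), Counter(all_signals)
    if any(x not in counts for x in counts_a):
        raise ValueError("target probes must be a subset of all probes")
    mu = fmean(all_signals)

    kl = wkl = 0.0
    cumulative = 0
    for x in sorted(counts):
        cumulative += counts[x]
        q = cumulative / n  # empirical CDF Q(x) of all probe intensities
        if x < mu or counts_a[x] == 0:
            continue
        f_a, f = counts_a[x] / n_a, counts[x] / n
        term = f_a * log(f_a / f)
        kl += term
        if q < 1.0:  # weight undefined where Q(x) = 1 (term skipped, assumption)
            wkl += term / (q * (1.0 - q))
    return kl, wkl


def virus_called_present(p_value, wkl, alpha=0.05, wkl_threshold=5.0):
    """Combine the one-tailed t-test p-value with the WKL threshold."""
    return p_value <= alpha and wkl > wkl_threshold
```

The weight 1/(Q(x)[1 - Q(x)]) grows as Q(x) approaches 1, so a virus whose probes are over-represented among the brightest probes on the chip receives a much larger WKL than plain KL would give it.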
  • This analysis framework is shown in FIG. 5 .
  • the present inventors present two sets of experiments to demonstrate the effect of probe design on experimental results and to show the robustness of the analysis algorithm according to the present invention.
  • the analysis algorithm correctly identified the actual virus in each of the three samples and correctly classified the negative sample.
  • the Weighted Kullback-Leibler divergence of the actual viruses in Experiments 1, 2 and 3 was greater than in the corresponding experiments without probe design. This means that the signal intensities from the actual virus were higher relative to the background noise on the PDC. This indicates that our probe design criteria removed some poorly performing probes from the PDC, resulting in a more accurate analysis.
  • probe design reduced the number of false-positive viruses detected by the t-test for samples 35259_324 and 35179_122.
  • the Weighted Kullback-Leibler divergence for the actual virus increased for all four samples. This means that the signals of the actual virus are more clearly distinguished from the background signals when the probe design criteria are applied to the PDC.

Abstract

A method is provided of designing oligonucleotide probe(s) for nucleic acid detection, comprising the following steps in any order: (i) identifying and selecting region(s) of a target nucleic acid to be amplified, the region(s) having an efficiency of amplification (AE) higher than the average AE; and (ii) designing oligonucleotide probe(s) capable of hybridizing to the selected region(s). A method is also provided of detecting at least one target nucleic acid, comprising the steps of: (i) providing a biological sample; (ii) amplifying the nucleic acid(s) of the biological sample; (iii) providing at least one oligonucleotide probe capable of hybridizing to at least a target nucleic acid, if present in the biological sample; and (iv) contacting the probe(s) with the amplified nucleic acids and detecting the probe(s) hybridized to the target nucleic acid(s). In particular, the method indicates the presence of at least one pathogen, for example a virus, in a human biological sample. The probes may be placed on a support, for example a microarray or a biochip.

Description

    FIELD OF THE INVENTION
  • The present invention relates to the field of oligonucleotide probe design and nucleic acid detection. The method according to the invention may be used for the detection of pathogens, for example viruses.
  • BACKGROUND OF THE INVENTION
  • The accurate and rapid detection of viral and bacterial pathogens in human patients and populations is of critical medical and epidemiologic importance. Historically, diagnostic techniques have relied on cell culture passaging and various immunological assays or staining procedures. More recently, PCR-based assays have been implemented, allowing more rapid diagnosis of suspected pathogens with a higher sensitivity of detection. In clinical practice, however, the etiologic agent often remains unidentified, eluding detection in myriad ways. For example, some viruses are not amenable to culturing. At other times, a patient's sample may be of too poor quality or of insufficient titre for pathogen detection by conventional techniques. Moreover, both PCR- and antibody-based approaches may fail to recognize suspected pathogens simply because natural genetic diversification alters PCR primer binding sites and causes antigenic drift.
  • While the concept of using oligonucleotide hybridization microarrays as a tool for determining the presence of pathogens has been proposed, significant hurdles remain that prevent the routine use of these microarrays (Striebel, H. M., 2003). These hurdles include probe design and data analysis (Striebel, H. M., 2003; Bodrossy, L. & Sessitsch, A., 2004; Vora, G. J., et al., 2004).
  • Accordingly, there is a need in this field of technology for alternative and improved methods of detection of nucleic acids. In particular, there is a need for alternative and/or improved diagnostic methods for the detection of pathogens.
  • SUMMARY OF THE INVENTION
  • The present invention addresses the problems above, and in particular provides an alternative and/or improved method of probe design and/or of nucleic acid detection.
  • According to a first aspect, the present invention provides a method of designing oligonucleotide probe(s) for nucleic acid detection comprising the following steps in any order:
      • (i) identifying and selecting region(s) of a target nucleic acid to be amplified, the region(s) having an efficiency of amplification (AE) higher than the average AE; and
      • (ii) designing oligonucleotide probe(s) capable of hybridizing to the selected region(s).
  • In particular, in step (i) a score of AE is determined for every position i on the length of the target nucleic acid or of a region thereof and subsequently, an average AE score is obtained. Those regions showing an AE score higher than the average may be selected as the region(s) of the target nucleic acid to be amplified. In particular, the AE of the selected region(s) may be calculated as the Amplification Efficiency Score (AES), which is the probability that a forward primer ri can bind to a position i and a reverse primer rj can bind at a position j of the target nucleic acid, and |i-j| is the region of the target nucleic acid desired to be amplified. In particular, the region |i-j| may be ≦10000 bp, more in particular ≦5000 bp, or ≦1000 bp, for example ≦500 bp. In particular, the forward and reverse primers may be random primers.
  • According to another aspect, the step (i) comprises determining the effect of geometrical amplification bias for every position of a target nucleic acid, and selecting the region(s) to be amplified as the region(s) having an efficiency of amplification (AE) higher than the average AE. For example, the geometrical amplification bias is the PCR bias.
  • The step (ii) of designing oligonucleotide probe(s) capable of hybridizing to the region(s) selected in step (i) may be carried out according to any probe designing technique known in the art. In particular, the oligonucleotide probe(s) capable of hybridizing to the selected region(s) may be selected and designed according to at least one of the following criteria:
      • (a) the selected probe(s) has a CG-content from 40% to 60%;
      • (b) the probe(s) is selected as having the highest free energy computed based on the Nearest-Neighbor model;
      • (c) given probes sa and sb that are substrings of target nucleic acids va and vb, sa is selected based on the Hamming distance between sa and any length-m substring sb and/or on the longest common substring of sa and sb;
      • (d) for any length-m probe sa specific for the target nucleic acid va, the probe sa is selected if it does not have any hits with any region of a nucleic acid different from the target nucleic acid, and, if the length-m probes sa have hits with a nucleic acid different from the target nucleic acid, the probe sa with the smallest maximum alignment length and/or with the least number of hits is selected; and
      • (e) a probe pi at position i of a target nucleic acid is selected if pi is predicted to hybridize to the position i of the amplified target nucleic acid.
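Criteria (a) and (c) rely on simple sequence measures. The Python sketch below computes the CG fraction, the minimum Hamming distance between a probe sa and any length-m substring of another sequence vb, and the length of the longest common substring of two sequences. How the values for (c) are thresholded is not specified in the text, so no cut-offs for (c) are hard-coded; the 40-60% range for (a) is taken from criterion (a). Function names are hypothetical.

```python
def cg_fraction(seq):
    """Fraction of C and G bases in a DNA sequence."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "CG" for base in seq.upper()) / len(seq)


def passes_cg_criterion(probe, low=0.40, high=0.60):
    """Criterion (a): CG-content from 40% to 60%."""
    return low <= cg_fraction(probe) <= high


def min_hamming_distance(probe, other):
    """Smallest Hamming distance between probe and any same-length substring of other."""
    m = len(probe)
    if len(other) < m:
        raise ValueError("other sequence is shorter than the probe")
    return min(
        sum(a != b for a, b in zip(probe, other[start:start + m]))
        for start in range(len(other) - m + 1)
    )


def longest_common_substring(a, b):
    """Length of the longest common substring (dynamic programming, O(len(a) * len(b)))."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common substring ending at a[i-1] and b[j-1].
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best
```

A large minimum Hamming distance and a short longest common substring both suggest that the probe is unlikely to cross-hybridize to vb.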
  • Accordingly, two or more of the criteria indicated above may be used for designing the oligonucleotide probe(s). For example, the probe(s) may be designed by applying all criteria (a) to (e). Other criteria not explicitly mentioned herein but which are within the knowledge of a skilled person in the art may also be used.
  • In particular, under criterion (e), a probe pi at position i of a target nucleic acid va is selected if P(pi|va)>λ, wherein λ is 0.5 and P(pi|va) is the probability that pi hybridizes to position i of the target nucleic acid va. More in particular, λ is 0.75.
  • In particular, $P(p_i \mid v_a) \approx P(X \le x_i) = \frac{c_i}{k}$,
    wherein X is the random variable representing the amplification efficiency score (AES) values of all probes of va, k is the number of probes in va, and ci is the number of probes whose AES values are ≦xi.
  • According to another aspect of the invention, the method of designing the oligonucleotide probe(s) as described above further comprises a step of preparing the selected and designed probe(s). The probe may be prepared according to any standard method known in the art, for example by chemical synthesis.
  • According to another aspect, the present invention provides a method of detecting at least one target nucleic acid comprising the steps of:
      • (i) providing a biological sample;
      • (ii) amplifying the nucleic acid(s) of the biological sample;
      • (iii) providing at least one oligonucleotide probe capable of hybridising to at least a target nucleic acid, if present in the biological sample, wherein the probe(s) is prepared by using a method according to any aspect of the invention described herein; and
      • (iv) contacting the probe(s) with the amplified nucleic acids and detecting the probe(s) hybridised to the target nucleic acid(s).
  • The amplification step (ii) may be carried out in the presence of random primers. For example, the amplification step (ii) may be carried out in the presence of more than two random primers. Any amplification method known in the art may be used. For example, the amplification is an RT-PCR.
  • In particular, a forward random primer binding to position i and a reverse random primer binding to position j of a target nucleic acid va are selected among primers having, for every position i of a target nucleic acid va, an amplification efficiency score (AESi) of $AES_i = \sum_{j=i-Z}^{i} \left\{ P_f(j) \times \sum_{k=i+1}^{j+Z} P_r(k) \right\}$,
    wherein $\sum_{k=i+1}^{j+Z} P_r(k) = P_r(i+1) + P_r(i+2) + \dots + P_r(j+Z)$,
    Pf(i) and Pr(i) are the probabilities that a random primer ri can bind to position i of va as forward primer and reverse primer, respectively, and Z ≦ 10000 bp is the region of va desired to be amplified. More in particular, Z may be ≦5000 bp, ≦1000 bp, or ≦500 bp.
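The AESi formula can be evaluated for every position of a target sequence once the per-position binding probabilities Pf and Pr are known. How those probabilities are estimated for a given random primer is not described in this passage, so they are taken as inputs here. In this Python sketch (hypothetical function name), positions are 0-based, the sums are clipped to the ends of the sequence (an assumption), and a prefix sum over Pr gives each inner sum in constant time.

```python
from itertools import accumulate


def aes_scores(p_f, p_r, z):
    """AES_i = sum_{j=i-Z}^{i} P_f(j) * sum_{k=i+1}^{j+Z} P_r(k) for every position i.

    p_f[j], p_r[k]: probability that a random primer binds at that position as
    forward / reverse primer. Indices outside the sequence are ignored.
    """
    n = len(p_f)
    if len(p_r) != n:
        raise ValueError("p_f and p_r must have the same length")
    # prefix[t] = P_r(0) + ... + P_r(t-1), so sum_{k=a}^{b} P_r(k) = prefix[b+1] - prefix[a]
    prefix = [0.0] + list(accumulate(p_r))
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(max(0, i - z), i + 1):  # forward primer at j, upstream of i
            lo, hi = i + 1, min(n - 1, j + z)  # reverse primer downstream of i, amplicon <= Z
            if lo <= hi:
                total += p_f[j] * (prefix[hi + 1] - prefix[lo])
        scores.append(total)
    return scores
```

For example, aes_scores([0.5] * 5, [0.5] * 5, 2) gives [0.5, 0.75, 0.75, 0.5, 0.0]: the last position scores zero because no reverse primer can bind downstream of it, which is exactly the kind of poorly amplified region the AES-based probe selection is meant to avoid.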
  • The amplification step may comprise forward and reverse primers, and each of the forward and reverse primers may comprise, in a 5′-3′ orientation, a fixed primer header and a variable primer tail, and wherein at least the variable tail hybridizes to a portion of the target nucleic acid va. In particular, the amplification step may comprise forward and/or reverse random primers having the nucleotide sequence of SEQ ID NO:1 or a variant or derivative thereof.
  • The biological sample may be any sample taken from a mammal, for example from a human being. The biological sample may be tissue, sera, nasal pharyngeal washes, saliva, any other body fluid, blood, urine, stool, and the like. The biological sample may be treated to free the nucleic acid comprised in the biological sample before carrying out the amplification step. The target nucleic acid may be any nucleic acid which is intended to be detected. The target nucleic acid to be detected may be at least a nucleic acid exogenous to the nucleic acid of the biological sample. Accordingly, if the biological sample is from a human, the exogenous target nucleic acid to be detected (if present in the biological sample) is a nucleic acid which is not from human origin. According to an aspect of the invention, the target nucleic acid to be detected is at least a pathogen genome or fragment thereof. The pathogen nucleic acid may be at least a nucleic acid from a virus, a parasite, or bacterium, or a fragment thereof.
  • Accordingly, the invention provides a method of detection of at least a target nucleic acid, if present, in a biological sample. The method may be a diagnostic method for the detection of the presence of a pathogen in the biological sample. For example, if the biological sample is obtained from a human being, the target nucleic acid, if present in the biological sample, is not from human.
  • The probe(s) designed and prepared according to any method of the present invention may be used in solution or may be placed on an insoluble support. For example, the probe(s) may be applied, spotted or printed on an insoluble support according to any technique known in the art. The support may be a solid support or a gel. The support with the probes applied on it may be a microarray or a biochip.
  • The probes are then contacted with the nucleic acid(s) of the biological sample, and, if present, the target nucleic acid(s) and the probe(s) hybridize, and the presence of the target nucleic acid is detected. In particular, in the detection step (iv), the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, thereby indicating the presence of va in the biological sample.
  • More in particular, in the detection step (iv), the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, and the method further comprises the step of computing the relative difference of the proportion of probes ∉ va having high signal intensities to the proportion of the probes used in the detection method having high signal intensities, the density distribution of the signal intensities of probes va being more positively skewed than that of probes ∉ va, thereby indicating the presence of va in the biological sample.
  • For example, in the detection step (iv), the presence of a target nucleic acid in a biological sample is given by a value of t-test ≦0.1 and/or a value of Weighted Kullback-Leibler divergence of ≧1.0, preferably ≧5.0. In particular, the t-test value is ≦0.05.
  • According to another aspect, the present invention provides a method of determining the presence of a target nucleic acid va, comprising detecting the hybridization of probe(s) to va (the probe(s) may be selected and designed according to any method known in the art and are not necessarily limited to the methods according to the present invention), wherein the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the signal intensities of the probes ∉ va, thereby indicating the presence of va. In particular, the density distribution of the signal intensities of the probes of va is more positively skewed than that of the probes ∉ va, and the method further comprises the step of computing the relative difference between the proportion of probes ∉ va having high signal intensities and the proportion of all the probes used in the detection method having high signal intensities, thereby indicating the presence of va. More in particular, the presence of a target nucleic acid in a biological sample is indicated by a t-test p-value ≦0.1 and/or a Weighted Kullback-Leibler divergence ≧1.0, preferably ≧5.0. For example, the t-test p-value may be ≦0.05.
  • According to another aspect, the present invention provides a method of detecting at least a target nucleic acid, comprising the steps of:
      • (i) providing a biological sample;
      • (ii) amplifying the nucleic acid(s) of the biological sample;
      • (iii) providing at least one oligonucleotide probe capable of hybridizing to at least one target nucleic acid, if present in the biological sample; and
      • (iv) contacting the probe(s) with the amplified nucleic acids and detecting the probe(s) hybridized to the target nucleic acid(s), wherein the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the signal intensities of the probes ∉ va, thereby indicating the presence of va in the biological sample.
  • In step (iv), the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the signal intensities of the probes ∉ va, the density distribution of the signal intensities of the probes of va is more positively skewed than that of the probes ∉ va, and the method further comprises the step of computing the relative difference between the proportion of probes ∉ va having high signal intensities and the proportion of all the probes used in the detection method having high signal intensities, thereby indicating the presence of va in the biological sample. In particular, in step (iv) the presence of a target nucleic acid in a biological sample is indicated by a t-test p-value ≦0.1 and/or a Weighted Kullback-Leibler divergence ≧1.0, preferably ≧5.0. The t-test p-value may be ≦0.05. The target nucleic acid to be detected is a nucleic acid exogenous to the nucleic acid of the biological sample. It may be at least one pathogen genome or a fragment thereof, for example a nucleic acid from a virus, a parasite or a bacterium, or a fragment thereof. In particular, when the sample is obtained from a human being, the target nucleic acid, if present in the biological sample, is not from the human genome. The probes may be placed on an insoluble support, for example a microarray or a biochip.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 shows the RT-PCR binding process of a pair of random primers on a virus sequence.
  • FIG. 2 shows an Amplification Efficiency Scoring (AES) Map for the RSV B genome.
  • FIG. 3 shows signal intensities from one experiment for RSV B.
  • FIGS. 4A and 4B. FIG. 4A shows the density distribution of signal intensities of a virus that is present in the tested sample. An arrow indicates the positive skewness of the distribution, which shows that, although there is noise, there is also a significant amount of real signal. FIG. 4B shows the density distribution of signal intensities of a virus not present in the sample; it is dominated by noise.
  • FIG. 5 shows an analysis framework of pathogen detection chip data.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Bibliographic references mentioned in the present specification are, for convenience, listed in the form of a list of references and added to the end of the examples. The whole content of such bibliographic references is herein incorporated by reference.
  • The present invention addresses the problems of the prior art, and in particular provides an alternative and/or improved method of probe design and/or of nucleic acids detection.
  • While the concept of using oligonucleotide hybridization microarrays as a tool for determining the presence of pathogens has been proposed, significant hurdles remain that prevent the routine use of these microarrays (Striebel, H. M., 2003). These hurdles include probe design and data analysis (Striebel, H. M., 2003; Bodrossy, L. & Sessitsch, A., 2004; Vora, G. J., et al., 2004). The present inventors observed in a pilot microarray that, despite meticulous probe selection, the best in silico designed probes do not necessarily hybridize well to patient samples, whereas purified samples from cell culture hybridized well to the probes on the microarray. The inventors realized that, to generate probes which would hybridize consistently well to patient material, it would be necessary to develop a new and/or improved method of probe design that determines the optimal design predictors. In particular, as described in the Example section, the present inventors created a microarray comprising overlapping 40-mer probes tiled across 35 viral genomes. However, the invention is not limited to this particular application, probe length or type of target nucleic acid.
  • According to a particular aspect of the invention, the present inventors describe how a support, in particular a microarray platform, can be optimized to become a viable tool for target nucleic acid detection, in particular for pathogen detection. The inventors also identified probe design predictors, including melting temperature, GC-content of the probe, secondary structure, hamming distance, similarity to the human genome, the effect of the PCR primer tag on random PCR amplification efficiency, and/or the effect of sequence polymorphism. These results were considered and/or incorporated into the development of a method and criteria for probe design. According to a more particular aspect, the inventors developed a data analysis algorithm which may accurately predict the presence of a target nucleic acid, which may or may not be a pathogen. For example, the pathogen may be, but is not limited to, a virus, a bacterium and/or a parasite. The algorithm may be used even if the probes are not ideally designed. This detection algorithm, coupled with a probe design methodology, significantly improves the confidence level of the prediction (see Tables 1 and 2).
  • According to a particular aspect, the method of the invention may not require a prior prediction of the likely pathogen, but may be capable of detecting, in an unbiased manner, most known human viruses, bacteria and/or parasites, as well as some novel species. A genome is defined as all the genetic material in the chromosomes of an organism. DNA derived from the genetic material in the chromosomes of a particular organism is genomic DNA. A genomic library is a collection of clones made from a set of randomly generated overlapping DNA fragments representing the entire genome of an organism. The rationale behind this detection platform according to the invention is that each species of virus, bacterium and/or parasite contains unique molecular signatures within the primary sequence of its genome. Identification of these distinguishing regions allows rational oligonucleotide probe design for the specific detection of individual species and, in some cases, individual strains. The concomitant design and/or preparation of oligonucleotide (oligo) probes that represent the most highly conserved regions among family and genus members will enable the detection and partial characterization of some novel pathogens. Furthermore, the inclusion of all such probes on a single support may allow the detection of multiple viruses, bacteria and/or parasites that simultaneously co-infect a clinical sample. The support may be an insoluble support, in particular a solid support, for example a microarray or a biochip.
  • According to a particular aspect, the invention may be used as a diagnostic tool, depending on the way in which oligonucleotide probes are designed, and/or how the data generated by the microarray is interpreted and analyzed.
  • Determination of Efficiency of Amplification
  • According to a first aspect, the present invention provides a method of designing oligonucleotide probe(s) for nucleic acid detection comprising the following steps in any order:
      • (i) identifying and selecting region(s) of a target nucleic acid to be amplified, the region(s) having an efficiency of amplification (AE) higher than the average AE; and
      • (ii) designing oligonucleotide probe(s) capable of hybridizing to the selected region(s).
  • In particular, in step (i) an AE score is determined for every position i along the length of the target nucleic acid, or of a region thereof, and an average AE is obtained. The regions showing an AE higher than the average may be selected as the region(s) of the target nucleic acid to be amplified. In particular, the AE of the selected region(s) may be calculated as the Amplification Efficiency Score (AES), which is based on the probability that a forward primer ri can bind to a position i and a reverse primer rj can bind to a position j of the target nucleic acid, where |i-j| is the length of the region of the target nucleic acid desired to be amplified. In particular, |i-j| may be ≦10000 bp, more in particular ≦5000 bp, or ≦1000 bp, for example ≦500 bp. In particular, the forward and reverse primers may be random primers.
  • According to another aspect, the step (i) of identifying and selecting region(s) of a target nucleic acid to be amplified comprises determining the effect of geometrical amplification bias for every position of a target nucleic acid, and selecting the region(s) to be amplified as the region(s) having an efficiency of amplification (AE) higher than the average AE. The geometrical amplification bias may be defined as the capability of some regions of a nucleic acid to be amplified more efficiently than other regions. For example, the geometrical amplification bias is the PCR bias.
  • Modeling of Amplification Efficiency
  • Since it is not known what target nucleic acid (for example a pathogen) exists within the patient sample, random primers may be used during the amplification step and/or the reverse-transcription (RT) process to ensure unbiased reverse-transcription of all RNA present into DNA. Any random amplification method known in the art may be used for the purposes of the present invention. In the present description, the random amplification method will be RT-PCR. However, it will be clear to a skilled person that the method of the present invention is not limited to RT-PCR. In particular, the RT-PCR approach may be susceptible to signal inaccuracies caused by primer-dimer bindings and poor amplification efficiencies in the RT-PCR process (Bustin, S. A., et al, 2004). To overcome this hurdle, the inventors have modeled the RT-PCR process by using random primers.
  • According to a particular aspect of the invention, the amplification step uses forward and reverse primers, each of which comprises, in the 5′-3′ orientation, a fixed primer header and a variable primer tail, wherein at least the variable tail hybridizes to a portion of the target nucleic acid va. The fixed primer header and the variable primer tail may be of any length (in mer) suitable for the purposes of the method according to the present invention. The fixed header may be 10-30 mer, in particular 15-25 mer, for example 17 mer. The variable tail may be 1-20 mer, in particular 5-15 mer, for example 9 mer. An example of these forward and reverse primers is shown in FIG. 1. More in particular, the amplification step may use forward and/or reverse random primers having the nucleotide sequence 5′-GTTTCCCAGTCACGATANNNNNNNNN-3′ (SEQ ID NO:1), wherein N is any one of A, T, C and G, or a derivative thereof.
  • In particular, the present inventors have modeled the random RT-PCR process as follows. Let va be the actual virus in the sample. The random primer used in the RT-PCR process has a fixed 17-mer header and a variable 9-mer tail of the form 5′-GTTTCCCAGTCACGATANNNNNNNNN-3′ (SEQ ID NO:1). To obtain an RT-PCR product in a region between positions i and j of va, the inventors required (1) a forward primer binding to position i, (2) |i-j|≦1000, and (3) a reverse primer binding to position j. Although |i-j|, the length of the region of the target nucleic acid desired to be amplified, is ≦1000 in this particular example, it may also be ≦10000 bp, more in particular ≦5000 bp, or ≦500 bp. The quality of the RT-PCR product depends on how well the forward primer and the reverse primer bind to va. Some random primers bind to va better than others; identifying such primers and where they bind to va gives an indication of how likely a particular region of va is to be amplified. Using this approach, an amplification efficiency model may be proposed which computes an Amplification Efficiency Score (AES) for every position of va.
  • For a particular position i of a target nucleic acid va, Pf(i) and Pr(i) are the probabilities that a random primer ri can bind to position i of va as a forward primer and as a reverse primer, respectively. For simplicity, it is assumed that a random primer can bind to va only if the last 9 nucleotides of the random primer are a substring of the reverse complement of va (forward primer) or a substring of va (reverse primer), as shown in FIG. 1. Based on well-established primer design criteria (Wu, D. Y., et al., 1991), Pf(i) was estimated to be low if ri forms a significant primer-dimer or has an extreme melting temperature. Conversely, if ri does not form any significant primer-dimer and has an optimal melting temperature, Pf(i) is high. Note that if the header of the random primer is similar to va, it may also aid binding and thus result in a higher Pf(i). Pr(i) was computed in the same way.
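  • The qualitative criteria above can be turned into per-position estimates of Pf(i) and Pr(i), for example as in the sketch below. Only the 17-mer header and the 9-mer tail come from the text; everything else is an assumption for illustration: the mapping of a tail to position i, the basic GC-content melting-temperature formula, the 3′ self-complementarity check used as a crude proxy for primer-dimer formation, and the penalty values and temperature window are all hypothetical, not the inventors' scoring.

```python
HEADER = "GTTTCCCAGTCACGATA"  # fixed 17-mer header of SEQ ID NO:1
TAIL = 9                      # length of the variable tail
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]


def basic_tm(seq):
    """Basic GC-content Tm estimate in deg C: 64.9 + 41 * (G + C - 16.4) / N."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def three_prime_self_comp(primer, max_len=8):
    """Longest 3'-terminal stretch whose reverse complement also occurs in the
    primer: a crude stand-in for primer-dimer propensity."""
    best = 0
    for n in range(1, max_len + 1):
        if revcomp(primer[-n:]) in primer:
            best = n
    return best


def binding_prob(primer, tm_window=(50.0, 70.0), dimer_len=5):
    """Hypothetical score in [0, 1]: penalise extreme Tm and likely primer-dimers."""
    p = 1.0
    lo, hi = tm_window
    if not lo <= basic_tm(primer) <= hi:
        p *= 0.5
    if three_prime_self_comp(primer) >= dimer_len:
        p *= 0.25
    return p


def primer_probs(va):
    """Return (Pf, Pr), one entry per position i with a full 9-mer va[i:i+9].

    Assumed mapping: the forward tail binding at i is revcomp(va[i:i+9]),
    a substring of the reverse complement of va; the reverse tail is va[i:i+9].
    """
    pf, pr = [], []
    for i in range(len(va) - TAIL + 1):
        site = va[i:i + TAIL]
        pf.append(binding_prob(HEADER + revcomp(site)))
        pr.append(binding_prob(HEADER + site))
    return pf, pr
```

In practice the hard-coded penalties would be replaced by whatever primer design criteria are adopted; the point of the sketch is only the shape of the computation, one primer per position scored on Tm and dimer risk.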
  • The binding of the random primer ri at position i of va as a forward primer affects the quality of the RT-PCR product over up to Z nucleotides following position i (in the coordinates of va; Z = 1000 in this example). Similarly, the binding of the random primer ri at position i of va as a reverse primer affects the quality of the RT-PCR product over up to Z nucleotides preceding position i. Thus, an amplification efficiency score, AESi, for every position i of va can be computed by considering the combined effect of all forward and reverse primer pairs that amplify it:
$$\mathrm{AES}_i = \sum_{j=i-Z}^{i} \left\{ P_f(j) \times \sum_{k=i+1}^{j+Z} P_r(k) \right\}$$
      • wherein $\sum_{k=i+1}^{j+Z} P_r(k) = P_r(i+1) + P_r(i+2) + \cdots + P_r(j+Z)$;
      • Pf(i) and Pr(i) are the probabilities that a random primer ri can bind to position i of va as a forward primer and as a reverse primer, respectively, and Z ≦10000 bp is the maximum length of the region of va desired to be amplified.
  • Z may also be ≦5000 bp, ≦1000 bp or ≦500 bp; in the example above, Z = 1000 bp.
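  • Once Pf and Pr are known, the AES formula can be evaluated directly. The sketch below is an illustration, not the inventors' code: it assumes 0-based positions, clips both sums at the ends of va (the formula itself does not say how boundaries are handled), and uses a prefix sum so that the inner sum over k costs constant time. The function names are hypothetical; `above_average` mirrors step (i), which selects positions whose AES exceeds the average.

```python
from itertools import accumulate


def aes_scores(pf, pr, z):
    """AES_i = sum_{j=i-Z}^{i} Pf(j) * sum_{k=i+1}^{j+Z} Pr(k), clipped to [0, n)."""
    n = len(pf)
    if len(pr) != n:
        raise ValueError("pf and pr must have one entry per position")
    prefix = [0.0] + list(accumulate(pr))  # prefix[k] = pr[0] + ... + pr[k-1]

    def pr_sum(lo, hi):
        """Sum of pr[lo..hi] inclusive, restricted to valid positions."""
        lo, hi = max(lo, 0), min(hi, n - 1)
        return prefix[hi + 1] - prefix[lo] if hi >= lo else 0.0

    scores = []
    for i in range(n):
        total = 0.0
        for j in range(max(0, i - z), i + 1):      # forward primer at j <= i
            total += pf[j] * pr_sum(i + 1, j + z)  # reverse primer at k in (i, j+Z]
        scores.append(total)
    return scores


def above_average(scores):
    """Positions whose AES exceeds the mean AES (candidate regions for step (i))."""
    mean = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s > mean]
```

For example, with uniform probabilities `aes_scores([1.0] * 5, [1.0] * 5, 2)` returns `[2.0, 3.0, 3.0, 2.0, 0.0]`: positions in the middle are covered by more primer pairs than positions near the ends, which is the geometrical amplification bias described above, and `above_average` on that result returns `[1, 2]`.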
  • To verify whether the variation in signal intensities displayed by different regions of a virus correlates directly with their corresponding amplification efficiency scores, a total of five microarray experiments were performed on a common human pathogen, human respiratory syncytial virus B (RSV B).
  • Probe Design
  • The step (ii) of designing oligonucleotide probe(s) capable of hybridizing to the selected region(s) may be carried out using any probe design technique known in the art.
  • For example, given a set of target nucleic acids (for example, viral genomes) V={v1, v2, . . . , vn}, for every vi ∈ V, a set of length-m probes (each a substring of vi) may be designed taking into consideration, for example, at least one of the following conditions:
      • (a) established probe design criteria of homogeneity, sensitivity and specificity (Sung, W. K. & Lee, W. H., 2003);
      • (b) no significant sequence similarity to human genome; and
      • (c) efficiently amplified, for example by RT-PCR, as herein described.
  • Homogeneity, Sensitivity and Specificity
  • Homogeneity requires the selection of probes which have similar melting temperatures. It was found that probes with low CG-content did not produce reliable hybridization signal intensities, and that probes with high CG-content tended to produce high signal intensities through non-specific binding. It was therefore established that the CG-content of the selected probes should be from 40% to 60%.
  • Accordingly, the present invention provides a method of designing oligonucleotide probe(s) for nucleic acid detection, comprising selecting the probes having a CG-content from 40% to 60%.
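  • The CG-content criterion is a simple filter. A minimal sketch, assuming probes are given as DNA strings; the function names are hypothetical:

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def homogeneous_probes(probes, lo=0.40, hi=0.60):
    """Keep probes whose CG-content lies in [lo, hi] (40% to 60% by default)."""
    return [p for p in probes if lo <= gc_content(p) <= hi]
```

For instance, `homogeneous_probes(["ATGC" * 10, "GCGC" * 10, "AT" * 20])` keeps only the first 40-mer (50% CG); the other two lie at 100% and 0%.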
  • The term “hybridization” refers to the process in which the oligo probes bind non-covalently to the target nucleic acid, or a portion thereof, to form a stable double-stranded hybrid; triple-stranded hybridization is also theoretically possible. Hybridization probes are oligonucleotides capable of binding in a base-specific manner to a complementary strand of the target nucleic acid. “Hybridizing specifically” refers to the binding, duplexing or hybridizing of a molecule substantially or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) of DNA or RNA. Hybridizations, e.g., allele-specific probe hybridizations, are generally performed under stringent conditions, for example at a salt concentration of no more than about 1 Molar (M) and a temperature of at least 25° C., e.g., 750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4 (5× SSPE) and a temperature of from about 25° C. to about 30° C. For stringent conditions, see also, for example, Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York (2001), which is hereby incorporated by reference in its entirety for all purposes.
  • Sensitivity requires the selection of probes that cannot form significant secondary structures, in order to detect low-abundance mRNAs. Thus, the probes whose secondary structures have the highest (least negative) free energy, computed based on the Nearest-Neighbor model, are selected (SantaLucia, J., Jr., et al., 1996).
  • Accordingly, the present invention provides a method of designing oligonucleotide probe(s) for nucleic acid detection, wherein the probe(s) are selected as having the highest free energy computed based on the Nearest-Neighbor model.
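  • A full secondary-structure prediction (a minimum-free-energy fold) is beyond a short example, but the selection rule can be illustrated with a crude proxy: find the most stable perfectly paired hairpin stem in each probe, score it with the SantaLucia (1998) unified nearest-neighbor ΔG°37 Watson-Crick stacking values, and keep the probe whose best stem has the highest (least negative) free energy. This is a swapped-in simplification, not the inventors' computation: loop, terminal-end and mismatch contributions are ignored, so the numbers only rank probes. The function names and the minimum loop size of 3 are assumptions.

```python
# SantaLucia (1998) unified Watson-Crick nearest-neighbour dG37 values, kcal/mol.
# Keys are 5'->3' dinucleotides on one strand; complementary stacks share a value.
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}  # Watson-Crick only


def best_hairpin_dg(seq, min_loop=3):
    """Most negative stacking dG over all perfectly paired hairpin stems
    (0.0 if no stem of two or more base pairs exists)."""
    seq = seq.upper()
    n = len(seq)
    best = 0.0
    for i in range(n):
        for j in range(n - 1, i, -1):
            # Zip the stem inward from the closing pair (i, j) while the
            # remaining loop stays at least min_loop nucleotides long.
            length = 0
            while ((j - length) - (i + length) - 1 >= min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            dg = sum(NN_DG37[seq[i + t:i + t + 2]] for t in range(length - 1))
            best = min(best, dg)
    return best


def most_accessible(probes):
    """Pick the probe whose strongest hairpin has the highest free energy."""
    return max(probes, key=best_hairpin_dg)
```

For example, `GGGGAAAACCCC` folds into a 4-bp G·C stem (three GG/CC stacks, about -5.52 kcal/mol), so `most_accessible` prefers a probe with no possible stem. A production pipeline would use a dedicated folding package instead of this brute-force scan.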
  • Specificity requires the selection of the probes that are most unique to a viral genome, in order to minimize cross-hybridization of the probes with other, non-target nucleic acids (for example, other viral genomes). Given probes sa and sb that are substrings of target nucleic acids va and vb respectively, sa is selected based on the hamming distance between sa and any length-m substring sb from the target nucleic acid vb and/or on the longest common substring of sa and sb. In particular, let sa and sb be length-m substrings from viral genomes va and vb respectively, where va≠vb.
  • The probe(s) to be designed may be of any length useful for the purposes of the present invention. The probes may be shorter than 100 mer, for example 20 to 80 mer, 25 to 60 mer, or 40 mer. The hamming distance and/or longest common substring cutoffs may also vary.
  • According to Kane's criteria (Kane, M. D., et al., 2000), sa is specific to va if:
      • (a) the hamming distance between sa and any length-m substring sb from viral genome vb, is more than 0.25 m;
      • (b) the longest common substring of sa and sb is less than 15.
  • The cutoff value(s) for the hamming distance may be chosen according to the desired stringency, as will be evident to a skilled person. According to a particular example of the probe design herein described, the inventors used hamming distance cutoffs of >10 for specific probes, and <10, preferably <5, for conserved probes. A specific probe is a probe which hybridizes only to a specific target nucleic acid, while a conserved probe is a probe which may hybridize to any member of the family of the target nucleic acid.
  • Accordingly, the present invention also provides a method of designing oligonucleotide probe(s) for nucleic acid detection, wherein, given probes sa and sb that are substrings of target nucleic acids va and vb comprised in the biological sample, sa is selected if the hamming distance between sa and any length-m substring sb from the target nucleic acid vb is more than 0.25m, and the longest common substring of sa and sb is less than 15.
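  • Kane's two conditions can be checked by brute force against every length-m window of each non-target genome. A minimal sketch, assuming genomes are plain strings; the function names are hypothetical and the defaults (0.25m, 15) follow the criteria above. The longest common substring is computed against the whole of vb, which gives the same answer as checking every length-m window because no common substring can be longer than sa itself.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def longest_common_substring(a, b):
    """Length of the longest string occurring contiguously in both a and b (DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def is_specific(sa, non_targets, hd_fraction=0.25, max_lcs=15):
    """Kane-style check of probe sa against every non-target genome vb:
    hamming distance > hd_fraction * m to every length-m window of vb,
    and longest common substring < max_lcs."""
    m = len(sa)
    for vb in non_targets:
        for start in range(len(vb) - m + 1):
            if hamming(sa, vb[start:start + m]) <= hd_fraction * m:
                return False
        if longest_common_substring(sa, vb) >= max_lcs:
            return False
    return True
```

Changing `hd_fraction` (or replacing it with an absolute cutoff such as the >10 mismatches used for specific probes) adjusts the stringency as described above. The scan is quadratic per genome, which is acceptable for a sketch but would be replaced by an index-based search for whole genome collections.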
  • Sequence Similarity to Human Genome
  • In case the biological sample is of human origin (for example, human samples containing viral genomes), probes with high homology to the human genome should also be avoided. Accordingly, a length-m probe sa specific for the target nucleic acid va is selected if it does not have any hits with any region of a nucleic acid different from the target nucleic acid; if the length-m probes have hits with such a nucleic acid, the length-m probe with the smallest maximum alignment length and/or with the least number of hits is selected. In particular, for any length-m probe sa, hits of sa with the human genome are found with the BLAST algorithm (Altschul, S. F., et al., 1997). A BLAST word size of 15 (W=15) and an expectation value of 100 were used to find all hits. sa is selected if it does not have any hits with the human genome, that is, if it is specific to va. However, if all length-m substrings of va have hits with the human genome, those with the smallest maximum alignment length and the least number of hits were selected.
  • Accordingly, the present invention provides a method of designing oligonucleotide probe(s) for nucleic acid detection, wherein a length-m probe sa specific for the target nucleic acid va is selected if it does not have any hits with any region of a nucleic acid different from the target nucleic acid, and, if the length-m probes have hits with the nucleic acid different from the target nucleic acid, the length-m probe sa with the smallest maximum alignment length and/or with the least number of hits is selected.
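  • Given the BLAST hits, the selection rule can be sketched as below. The sketch assumes the hits come from NCBI BLAST+ tabular output (`-outfmt 6`, in which column 1 is the query id and column 4 the alignment length), for example from a run such as `blastn -query probes.fa -db human_genome -word_size 15 -evalue 100 -outfmt 6`, where the file and database names are placeholders. It breaks ties by maximum alignment length first and by hit count second, which is one possible reading of "and/or" in the text; the function names are hypothetical.

```python
from collections import defaultdict


def parse_outfmt6(text):
    """Map query id -> list of alignment lengths from BLAST+ -outfmt 6 lines."""
    hits = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits[fields[0]].append(int(fields[3]))  # column 4 = alignment length
    return hits


def select_low_homology(candidates, hits):
    """Probes with no hits; otherwise the single probe minimising
    (maximum alignment length, number of hits)."""
    clean = [p for p in candidates if not hits.get(p)]
    if clean:
        return clean
    return [min(candidates, key=lambda p: (max(hits[p]), len(hits[p])))]
```

With made-up hits in which probe `p1` has two alignments of 20 and 18 nt and `p2` one alignment of 30 nt, a hit-free `p3` is preferred whenever it is a candidate; among `p1` and `p2` alone, `p1` is chosen because its longest alignment is shorter.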
  • Further, the design of the oligonucleotide probe(s) may also be carried out using the AES according to the invention. In particular, the invention provides a method of selecting and designing probes wherein a probe pi at position i of a target nucleic acid is selected if pi is predicted to hybridize to position i of the amplified target nucleic acid.
  • In particular, the oligonucleotide probe(s) capable of hybridizing to the selected region(s) may be selected and designed according to at least one of the following criteria:
      • (a) the selected probe(s) has a CG-content from 40% to 60%;
      • (b) the probe(s) is selected by having the highest free energy computed based on Nearest-Neighbor model;
      • (c) given probes sa and sb that are substrings of target nucleic acids va and vb, sa is selected based on the hamming distance between sa and any length-m substring sb from the target nucleic acid vb and/or on the longest common substring of sa and sb;
      • (d) a length-m probe sa specific for the target nucleic acid va is selected if it does not have any hits with any region of a nucleic acid different from the target nucleic acid, and, if the length-m probes have hits with the nucleic acid different from the target nucleic acid, the length-m probe sa with the smallest maximum alignment length and/or with the least number of hits is selected; and
      • (e) a probe pi at position i of a target nucleic acid is selected if pi is predicted to hybridize to the position i of the amplified target nucleic acid.
  • According to a particular aspect of the invention, two or more of the criteria indicated above may be used for designing the oligonucleotide probe(s). For example, the probe(s) may be designed by applying all criteria (a) to (e). Other criteria, not explicitly mentioned herein but which are evident to a skilled person in the art may also be used.
  • In particular, under criterion (e), a probe pi at position i of a target nucleic acid va is selected if P(pi|va)>λ, wherein λ is 0.5 and P(pi|va) is the probability that pi hybridizes to position i of the target nucleic acid va. More in particular, λ is 0.75.
  • According to another aspect, the invention provides a method as described above wherein P(pi|va)≈P(X≦xi)=ci/k, wherein X is the random variable representing the amplification efficiency score (AES) values of all probes of va, xi is the AES value of probe pi, k is the number of probes of va, and ci is the number of probes whose AES values are ≦xi.
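  • The estimate ci/k is the empirical cumulative distribution function of the probes' AES values, evaluated at each probe's own score. A minimal sketch, assuming the AES values of all probes of va are given as a list in probe order; the function names are hypothetical:

```python
from bisect import bisect_right


def hybridization_probs(aes_values):
    """P(p_i | v_a) ~ c_i / k, where c_i = number of probes with AES <= x_i."""
    ordered = sorted(aes_values)
    k = len(ordered)
    # bisect_right counts how many sorted values are <= x (ties included).
    return [bisect_right(ordered, x) / k for x in aes_values]


def select_by_probability(aes_values, lam=0.5):
    """Indices of probes with P(p_i | v_a) > lambda (criterion (e))."""
    return [i for i, p in enumerate(hybridization_probs(aes_values)) if p > lam]
```

For AES values `[4.0, 1.0, 3.0, 2.0]` the estimated probabilities are `[1.0, 0.25, 0.75, 0.5]`, so λ = 0.5 keeps probes 0 and 2, and the stricter λ = 0.75 keeps only probe 0.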
  • Synthesis of Oligonucleotide Probes on a Support
  • According to another aspect of the invention, the method of selecting and designing the oligonucleotide probe(s) as described above further comprises a step of preparing the selected and designed probe(s). Designing a probe comprises determining its sequence and/or designing it by any suitable means, for example by using software. The step of preparing the probe comprises its physical preparation. The probe may be prepared according to any standard method known in the art; for example, the probes may be chemically synthesized or prepared by cloning, as described in Sambrook and Russell, 2001.
  • The probe(s) designed and prepared according to any method of the present invention may be used in solution or may be placed on an insoluble support. For example, the probe(s) may be applied, spotted or printed on an insoluble support according to any technique known in the art. The support may be a solid support or a gel. The support with the probes applied on it may be a microarray or a biochip.
  • More in particular, the present invention provides an oligo microarray hybridization-based approach for the rapid detection and identification of pathogens, for example viral and/or bacterial pathogens, from PCR-amplified cDNA prepared from primary tissue samples, in particular from random PCR-amplified cDNA(s).
  • In the following description, the preparation of probes is described with particular reference to a microarray. However, the support, as well as the probes, may be prepared according to any description given throughout the present application. In particular, an “array” is an intentionally created collection of molecules which can be prepared either synthetically or biosynthetically. The molecules in the array can be identical to or different from each other. The array can assume a variety of formats, e.g., libraries of soluble molecules, or libraries of compounds tethered to resin beads, silica chips or other solid supports. An array plate, or plate, is a body having a plurality of arrays in which each array is separated from the other arrays by a physical barrier resistant to the passage of liquids, forming an area or space referred to as a well.
  • Sample Preparation and Hybridization onto the Microarray
  • The biological sample may be any sample taken from a mammal, for example from a human being. The biological sample may be blood, a body fluid, saliva, urine, stool and the like. The biological sample may be treated to release the nucleic acid comprised in it before the amplification step is carried out. The target nucleic acid may be any nucleic acid which is intended to be detected. The target nucleic acid to be detected may be at least one nucleic acid exogenous to the nucleic acid of the biological sample. Accordingly, if the biological sample is from a human, the exogenous target nucleic acid to be detected (if present in the biological sample) is a nucleic acid which is not of human origin. According to an aspect of the invention, the target nucleic acid to be detected is at least one pathogen genome or a fragment thereof. The pathogen nucleic acid may be at least one nucleic acid from a virus, a parasite or a bacterium, or a fragment thereof.
  • According to an aspect of the present invention, there is provided a method of target nucleic acid detection analysis. The target nucleic acid(s) to be detected in a biological sample may be any target nucleic acid, RNA and/or DNA, for example mRNA and/or cDNA. More in particular, the target nucleic acid to be detected may be from a pathogen or a non-pathogen, for example the genome, or a fragment thereof, of at least one virus, at least one bacterium and/or at least one parasite. The probes selected and/or prepared may be placed, applied and/or fixed on a support according to any standard technology known to a person skilled in the art. The support may be an insoluble support, for example a solid support, in particular a microarray and/or a biochip.
  • According to a particular example, RNA and DNA were extracted from patient samples (e.g. tissues, sera, nasopharyngeal washes, stool) using established protocols and commercial kits. For example, a Qiagen kit for nucleic acid extraction may be used. Alternatively, phenol/chloroform extraction may be used for the isolation of DNA and/or RNA. Any technique known in the art, for example as described in Sambrook and Russell, 2001, may be used. RNA was reverse-transcribed to cDNA using tagged random primers, based on a protocol described by Bohlander et al., 1992 and Wang et al., 2003. The cDNA was then amplified by random PCR. Fragmentation, labeling and hybridization of the sample to the microarray were carried out as described by Wong et al., 2004.
  • Microarray Synthesis
  • According to a particular experiment described in the Examples section, the present inventors selected 35 viral genomes representing the most common causes of viral disease in Singapore. Using the complete genome sequences downloaded from GenBank, 40-mer probes tiling the entire genomes and overlapping at five-base resolution were generated. Seven replicates of each virus probe were synthesized directly onto the microarray using NimbleGen technology (Nuwaysir, E. F., et al., 2002). The probes were randomly distributed on the microarray to minimize the effects of hybridization artifacts. Non-specific hybridization controls were also included on the microarray (see below). As a positive control, 400 oligonucleotide probes to human genes with known or inferred functions in the immune response were synthesized on the array. A plant virus, PMMV, was included as a negative control, for a total of approximately 380,000 probes. In the following description, the invention is described in more detail with reference to a pathogen detection chip analysis (also referred to as PDC). However, the analysis (method) is not limited to this particular embodiment, but encompasses the several aspects of the invention as described throughout the present application.
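  • The tiling scheme described above (40-mer probes every five bases, seven replicates per probe, random placement on the array) can be sketched as follows. The functions and the seeded shuffle are illustrative assumptions, not the actual NimbleGen design software.

```python
import random


def tile_probes(genome, length=40, step=5):
    """Overlapping (start, probe) pairs of `length` nt starting every `step` nt.

    The final (len(genome) - length) % step bases are left uncovered when
    the genome length does not fit the step exactly.
    """
    genome = genome.upper()
    return [(start, genome[start:start + length])
            for start in range(0, len(genome) - length + 1, step)]


def replicate_layout(probes, replicates=7, seed=0):
    """Hypothetical layout: `replicates` copies of each probe in random order,
    so that spatial hybridization artifacts do not hit all copies together."""
    spots = [p for p in probes for _ in range(replicates)]
    random.Random(seed).shuffle(spots)
    return spots
```

For a 50-nt genome this yields three probes starting at positions 0, 5 and 10, and 21 spots once each probe is replicated seven times.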
  • Method of Detecting Target Nucleic Acid(s)
  • According to another aspect, the present invention provides a method of detecting at least one target nucleic acid comprising the steps of:
      • (i) providing a biological sample;
      • (ii) amplifying the nucleic acid(s) of the biological sample;
      • (iii) providing at least one oligonucleotide probe capable of hybridizing to at least a target nucleic acid, if present in the biological sample, wherein the probe(s) is prepared by using a method according to any aspect of the invention herein described;
      • (iv) contacting the probe(s) with the amplified nucleic acids and detecting the probe(s) hybridized to the target nucleic acid(s).
  • The amplification step (ii) may be carried out in the presence of random primers, for example more than two random primers. Any amplification method known in the art may be used; for example, the amplification is an RT-PCR.
  • In particular, the present inventors developed a method of detecting the probe(s) hybridized to the target nucleic acid based on the amplification efficiency score (AES). This may herein also be referred to as the algorithm according to the present invention. In particular, a forward random primer binding to position i and a reverse random primer binding to position j of a target nucleic acid va are selected among primers having, for every position i of va, an amplification efficiency score (AES_i) of:
    $$\mathrm{AES}_i = \sum_{j=i-Z}^{i} \left\{ P_f(j) \times \sum_{k=i+1}^{j+Z} P_r(k) \right\},$$
    wherein $\sum_{k=i+1}^{j+Z} P_r(k) = P_r(i+1) + P_r(i+2) + \cdots + P_r(j+Z)$;
      • Pf(i) and Pr(i) are the probabilities that a random primer ri can bind to position i of va as a forward primer and as a reverse primer, respectively, and Z ≦ 10000 bp is the region of va desired to be amplified. More particularly, Z may be ≦5000 bp, ≦1000 bp, or ≦500 bp.
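As an illustration only, the AES formula can be evaluated for every position of a target sequence using a prefix sum over Pr, so that each inner sum costs constant time. The sketch below assumes the per-position binding probabilities Pf and Pr have already been computed (how they are derived is not shown), uses 0-based positions, and simply truncates the sums at the ends of the sequence; the function name is hypothetical.

```python
from typing import List, Sequence


def amplification_efficiency_scores(p_f: Sequence[float], p_r: Sequence[float],
                                    z: int) -> List[float]:
    """AES_i = sum_{j=i-Z}^{i} P_f(j) * sum_{k=i+1}^{j+Z} P_r(k), for each 0-based i.

    Sums are truncated at the ends of the sequence.
    """
    n = len(p_f)
    if len(p_r) != n:
        raise ValueError("p_f and p_r must have the same length")
    # prefix[t] = P_r(0) + ... + P_r(t-1), so sum_{k=a}^{b} P_r(k) = prefix[b+1] - prefix[a].
    prefix = [0.0]
    for value in p_r:
        prefix.append(prefix[-1] + value)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(max(0, i - z), i + 1):
            last = min(n - 1, j + z)  # furthest reverse-primer position for this j
            if last >= i + 1:
                total += p_f[j] * (prefix[last + 1] - prefix[i + 1])
        scores.append(total)
    return scores
```

With a single forward site at position 0 and a single reverse site at position 3, positions 0 to 2 score 1.0 when Z = 3 and 0.0 when Z = 2, because the amplicon would then be longer than Z.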
  • The amplification step may comprise forward and reverse primers, and each of the forward and reverse primers may comprise, in a 5′-3′ orientation, a fixed primer header and a variable primer tail, and wherein at least the variable tail hybridizes to a portion of the target nucleic acid va. In particular, the amplification step may comprise forward and/or reverse random primers having the nucleotide sequence of SEQ ID NO:1 or a variant or derivative thereof.
  • The biological sample may be any sample taken from a mammal, for example from a human being. The biological sample may be tissue, sera, nasal pharyngeal washes, saliva, blood, urine, stool, any other body fluid, and the like. The biological sample may be treated to release the nucleic acid it contains before the amplification step is carried out. The target nucleic acid may be any nucleic acid which is intended to be detected. The target nucleic acid to be detected may be at least a nucleic acid exogenous to the nucleic acid of the biological sample. Accordingly, if the biological sample is from a human, the exogenous target nucleic acid to be detected (if present in the biological sample) is a nucleic acid which is not of human origin. According to an aspect of the invention, the target nucleic acid to be detected is at least a pathogen genome or fragment thereof. The pathogen nucleic acid may be at least a nucleic acid from a virus, a parasite, or a bacterium, or a fragment thereof.
  • Accordingly, the invention provides a method of detecting at least a target nucleic acid, if present, in a biological sample. The method may be a diagnostic method for detecting the presence of a pathogen in the biological sample. For example, if the biological sample is obtained from a human being, the target nucleic acid, if present in the biological sample, is not of human origin.
  • The probe(s) designed and prepared according to any method of the present invention may be used in solution or may be placed on an insoluble support. For example, the probes may be applied, spotted or printed on an insoluble support according to any technique known in the art. The support may be a solid support or a gel. The support with the probes applied to it may be a microarray or a biochip.
  • The probes are then contacted with the nucleic acid of the biological sample, and if present the target nucleic acid(s) and the probe(s) hybridize, and the presence of the target nucleic acid is detected. In particular, in the detection step (iv), the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, thereby indicating the presence of va in the biological sample.
  • More particularly, in the detection step (iv), the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, and the method further comprises the step of computing the relative difference of the proportion of probes ∉ va having high signal intensities to the proportion of all the probes used in the detection method having high signal intensities, the density distribution of the signal intensities of probes ∈ va being more positively skewed than that of probes ∉ va, thereby indicating the presence of va in the biological sample.
  • For example, in the detection step (iv), the presence of a target nucleic acid in a biological sample is indicated by a t-test p-value of ≦0.1 and/or a Weighted Kullback-Leibler divergence of ≧1.0, preferably ≧5.0. In particular, the t-test p-value is ≦0.05.
  • According to another aspect, the present invention provides a method of determining the presence of a target nucleic acid va comprising detecting the hybridization of a probe to a target nucleic acid va, wherein the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, thereby indicating the presence of va. In particular, the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, and the method further comprises the step of computing the relative difference of the proportion of probes ∉ va having high signal intensities to the proportion of all the probes used in the detection method having high signal intensities, the density distribution of the signal intensities of probes ∈ va being more positively skewed than that of probes ∉ va, thereby indicating the presence of va. More particularly, the presence of a target nucleic acid in a biological sample is indicated by a t-test p-value of ≦0.1 and/or a Weighted Kullback-Leibler divergence of ≧1.0, preferably ≧5.0. For example, the t-test p-value may be ≦0.05.
  • According to another aspect, the present invention provides a method of detecting at least one target nucleic acid, comprising the steps of:
      • (i) providing a biological sample;
      • (ii) amplifying the nucleic acid(s) of the biological sample;
      • (iii) providing at least one oligonucleotide probe capable of hybridizing to at least a target nucleic acid, if present in the biological sample;
      • (iv) contacting the probe(s) with the amplified nucleic acids and detecting the probe(s) hybridized to the target nucleic acid(s), wherein the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, thereby indicating the presence of va in the biological sample.
  • In step (iv), the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, and the method further comprises the step of computing the relative difference of the proportion of probes ∉ va having high signal intensities to the proportion of all the probes used in the detection method having high signal intensities, the density distribution of the signal intensities of probes ∈ va being more positively skewed than that of probes ∉ va, thereby indicating the presence of va in the biological sample. In particular, in step (iv) the presence of a target nucleic acid in a biological sample is indicated by a t-test p-value of ≦0.1 and/or a Weighted Kullback-Leibler divergence of ≧1.0, preferably ≧5.0. The t-test p-value may be ≦0.05. The nucleic acid to be detected is nucleic acid exogenous to the nucleic acid of the biological sample. The target nucleic acid to be detected may be at least a pathogen genome or fragment thereof. The pathogen nucleic acid may be at least a nucleic acid from a virus, a parasite, or a bacterium, or a fragment thereof. In particular, when the sample is obtained from a human being, the target nucleic acid, if present in the biological sample, is not from the human genome. The probes may be placed on an insoluble support. The support may be a microarray or a biochip.
  • Test Using the Template Sequence of RSV B
  • To verify whether the variation in signal intensities displayed by different regions of a virus correlates directly with their corresponding amplification efficiency scores, a total of five microarray experiments were performed on a common human pathogen, the human respiratory syncytial virus B (RSV B).
  • Next, the probe design criteria described above were applied to the template sequence of RSV B obtained from NCBI (NC_001781). This resulted in 1948 probes spotted onto each microarray. The amplification efficiency map for RSV B was also computed before the actual experiments and is shown in FIG. 2, where peaks with an AES higher than the average AES indicate the regions of RSV B with a higher probability of amplification.
  • Independent microarray experiments were conducted using 5 samples containing RSV B. The resultant signal intensities for one such experiment are shown in FIG. 3.
  • For each experiment, the signal intensities of the 1948 probes were ranked in decreasing order and correlated with their corresponding AES values. The p-value was found to be <2.2e-16 on average. This indicates that the correlation between the signal intensity of the probe at position i of RSV B and AESi is highly unlikely to be due to chance. Further investigation revealed that about 300 probes which consistently produced high signal intensities in all five experiments have amplification efficiency scores in the 90th percentile.
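The text does not name the correlation test used. If a rank correlation such as Spearman's rho is assumed (consistent with ranking the probes by signal intensity), the coefficient can be computed as below. This pure-Python sketch returns only the coefficient, not the p-value reported above, and its function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Here `x` would hold the probe signal intensities and `y` the AES value at each probe's position.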
  • Having shown that the described amplification efficiency model works well on the RSV B genome, the inventors sought to show that the model according to the invention may be extended to other viral genomes as well. Another microarray experiment was performed on human metapneumovirus (HMPV), this time with 1705 probes on the microarray. Again, the amplification efficiency map for HMPV was computed. In this experiment, the correlation test between signal intensities and amplification efficiency scores gave a p-value of 1.335e−9.
  • Thus, there is a strong indication that the amplification efficiency model according to the invention is able to predict the relative strength of the signals produced by different regions of a viral genome in the described experimental set-up. Probes from regions with low amplification efficiency scores tend to produce low or no signal intensities, which would result in false negatives on the microarray. Such probes complicate the analysis of the microarray data, all the more so because a low signal intensity may mean either that the target genome is absent or simply that the region was not amplified. Probes in regions with reasonably high amplification efficiency scores should therefore be selected to minimize inaccuracies caused by the RT-PCR process using random primers.
  • The threshold on amplification efficiency scores for probe selection for a virus va is determined by the cumulative distribution function of the AES values of va. Let X be the random variable representing the AES values of all probes of va, and let k be the number of probes in va. We denote the probability that the AES value is less than or equal to x by P(X≦x)=c/k, where c is the number of probes whose AES values are less than or equal to x. For a probe pi at position i of va, let xi be its corresponding AES value. Since the signal intensity of a probe is highly correlated with its AES value, we estimate P(pi|va), the probability that pi has a high signal intensity in the presence of va, as P(X≦xi). Thus,
    $$P(p_i \mid v_a) \approx P(X \le x_i) = \frac{c_i}{k}$$
    where ci is the number of probes whose AES values are less than or equal to xi.
  • For probe selection, probe pi is selected if P(pi|va)>λ. In the present experiments, λ was set as λ=0.75.
  • Accordingly, the present invention also provides a method of probe design and/or of target nucleic acid detection wherein a probe pi at position i of a target nucleic acid va is selected if P(pi|va)>λ, wherein λ is 0.75 and P(pi|va) is the probability that pi has a high signal intensity in the presence of va. More particularly,
    $$P(p_i \mid v_a) \approx P(X \le x_i) = \frac{c_i}{k},$$
    wherein X is the random variable representing the amplification efficiency score (AES) values of all probes of va, k is the number of probes in va, and ci is the number of probes whose AES values are less than or equal to xi.
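The selection rule P(pi|va) > λ reduces to an empirical-CDF lookup: sort the AES values of va once, count with a binary search how many are ≦ xi, and compare ci/k with λ. A minimal sketch, assuming one AES value per candidate probe of va; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def select_probes(aes: Sequence[float], lam: float = 0.75) -> List[int]:
    """Return the indices i whose estimated P(p_i | v_a) = c_i / k exceeds lam."""
    k = len(aes)
    sorted_aes = sorted(aes)
    selected = []
    for i, x_i in enumerate(aes):
        c_i = bisect_right(sorted_aes, x_i)  # number of AES values <= x_i
        if c_i / k > lam:
            selected.append(i)
    return selected
```

Because the comparison is strict, with λ = 0.75 a probe must have an AES above that of more than three quarters of the probes of va (counting ties as "less than or equal").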
  • Target Nucleic Acid Detection Analysis
  • In the following description, the invention will be described in more detail with reference to a pathogen detection chip analysis (also referred to as PDC). However, the analysis (method) is not limited to this particular embodiment, but encompasses the several aspects of the invention as described across the whole content of the present application. In particular, given a PDC with a set of length-m probes P={p1, p2, . . . , pl} designed for a set of viral genomes V={v1, v2, . . . , vn}, the pathogen detection chip analysis problem is to detect the virus(es) present in the sample based on the chip data. The chip data here refers to the collective information provided by the probe signals on the PDC. Thus, the chip data D={d1, d2, . . . , dl} is the set of signals corresponding to the probe set P on the PDC.
  • Given a sample, it is not known which pathogens, if any, are present in the sample, or how many different pathogens there are. However, if a virus va is indeed in the sample, then the signal intensities of the probes of va should differ significantly from those of the probes of other viruses. Specifically, a higher proportion of the probes of va should have high signal intensities compared with other viruses. Hence, the mean of the signal intensities of the probes ∈ va would be expected to be statistically higher than that of the probes ∉ va.
  • Accordingly, the invention provides a method wherein the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the probes ∉ va, which may indicate the presence of va in the biological sample.
  • However, a statistically higher mean may still be insufficient to conclude that va is in the sample, and an additional step is preferably carried out: computing the relative difference of the proportion of probes ∉ va having high signal intensities to the proportion of probes on the PDC having high signal intensities. This is based on the observation that the distribution of the signal intensities of probes ∈ va is more positively skewed than that of probes ∉ va (see the arrow in FIG. 4A; for comparison, see FIG. 4B).
  • Based on the above observations, the chip data D were analyzed for the presence of viruses as follows. For every virus va ∈ V, we used a one-tailed t-test (Goulden, C. H., 1956) to determine whether the mean of the signal intensities of the probes ∈ va was statistically higher than that of the signal intensities of the probes ∉ va. The t-statistic was computed as:
    $$t_a = \frac{\mu_a - \mu_{\bar{a}}}{\sqrt{\dfrac{\sigma_a^2}{n_a} + \dfrac{\sigma_{\bar{a}}^2}{n_{\bar{a}}}}}$$
    where μa, σa², and na are the mean, variance, and size of the signal intensities of the probes ∈ va, respectively, and μā, σā², and nā are the mean, variance, and size of the signal intensities of the probes ∉ va, respectively.
  • To test the significance of the difference, the level of significance was set to 0.05. This means that the hypothesis that the mean of the signal intensities of the probes ∈ va is higher than that of the signal intensities of the probes ∉ va is accepted only if the p-value of ta is <0.05, in which case va is likely to be present in the sample.
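A minimal sketch of the per-virus test, assuming the signal intensities of the probes ∈ va and ∉ va are available as two lists. It computes ta as in the formula above (using sample variances) but, as a stated simplification, takes the one-tailed p-value from the standard normal distribution rather than the t distribution, which is a reasonable approximation when each group contains many probes. The function name is hypothetical; a full t-distribution p-value would need a statistics library.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def one_tailed_welch(in_va: Sequence[float], not_va: Sequence[float]) -> Tuple[float, float]:
    """Return (t_a, approximate one-tailed p-value) for H1: mean(in_va) > mean(not_va)."""
    # Standard error from the sample variances of the two probe groups.
    se = sqrt(variance(in_va) / len(in_va) + variance(not_va) / len(not_va))
    t_a = (mean(in_va) - mean(not_va)) / se
    # Assumption: normal approximation to the t distribution (many probes per group).
    p_value = NormalDist().cdf(-t_a)
    return t_a, p_value
```

Under the rule above, va would pass this stage when the returned p-value is below 0.05.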
  • The t-test alone, which tells the inventors whether the distribution of the signal intensities of a virus differs from that of other viruses, may not be sufficient to determine whether a particular virus is in the sample. It is also essential to know how similar or different the two distributions are. One measure of the similarity between a true distribution and a model distribution is the Kullback-Leibler divergence (Kullback and Leibler, 1951), also known as the relative entropy. In this application, the probability distribution of the signal intensities of the probes in va is the true distribution, while the probability distribution of the signal intensities of all the probes in P is the model distribution. Let Pa be the set of probes in va. The Kullback-Leibler (KL) divergence of the probability distributions of the signal intensities of Pa and P is:
    $$\mathrm{KL}(P_a \,\|\, P) = \sum_{x=\mu}^{\max(D)} f_a(x) \log\left(\frac{f_a(x)}{f(x)}\right)$$
    where μ is the mean signal intensity of the probes in P; fa(x) is the fraction of probes in Pa with signal intensity x; and f(x) is the fraction of probes in P with signal intensity x. It follows that if KL(Pa∥P)=0, the probability distribution of Pa is exactly the same as that of P; otherwise they are different.
  • Since a virus that is present in the sample would have signal intensities higher than those of the population, va has a chance of being present in the sample if KL(Pa∥P)>0. Thus, the larger the value of KL(Pa∥P), the more different the two probability distributions are and the more likely it is that va is indeed present in the sample.
  • It is important to note that the Kullback-Leibler divergence is the collective difference over all x between two probability distributions. Thus, while the Kullback-Leibler divergence is good at detecting shifts in a probability distribution, it is less sensitive to spreads, which affect the tails of the distribution more. As shown in FIGS. 4A and 4B, the tails of the probability distribution provide the most information about whether a virus is present in the sample. Hence, the Kullback-Leibler divergence statistic must be modified to reflect this observation more accurately.
  • To increase its sensitivity in the tails, we introduced into the Kullback-Leibler divergence the stabilizing weight used by the Anderson-Darling statistic (Stephens, 1974). The Weighted Kullback-Leibler divergence (WKL) is thus:
    $$\mathrm{WKL}(P_a \,\|\, P) = \sum_{x=\mu}^{\max(D)} \frac{f_a(x) \log\left(\frac{f_a(x)}{f(x)}\right)}{Q(x)\,[1 - Q(x)]}$$
    where Q(x) is the cumulative distribution function of the signal intensities of the probes in P.
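The two divergences can be sketched in Python under stated assumptions: intensities are already discretised (e.g. rounded or binned) so that fa(x) and f(x) are simple fractions, the natural logarithm is used, and the term at x = max(D), where Q(x) = 1 makes the Anderson-Darling weight zero, is skipped because the text does not say how that case is handled. Function names are hypothetical.

```python
from collections import Counter
from math import log
from typing import Dict, Sequence, Tuple


def _fractions(values: Sequence[int]) -> Dict[int, float]:
    """Map each intensity x to the fraction of values equal to x."""
    n = len(values)
    return {x: c / n for x, c in Counter(values).items()}


def kl_and_wkl(pa: Sequence[int], p_all: Sequence[int]) -> Tuple[float, float]:
    """Return (KL, WKL) summed over intensities x from mu up to max(D).

    Assumes Pa is a subset of P, so f(x) > 0 wherever f_a(x) > 0.
    """
    mu = sum(p_all) / len(p_all)      # mean signal intensity over P
    f_a = _fractions(pa)
    f = _fractions(p_all)
    counts = Counter(p_all)
    n = len(p_all)
    kl = wkl = 0.0
    cumulative = 0
    for x in sorted(counts):          # distinct intensities in D, ascending
        cumulative += counts[x]
        q = cumulative / n            # Q(x): fraction of P with intensity <= x
        if x < mu or x not in f_a:    # outside the sum, or 0 * log 0 = 0
            continue
        term = f_a[x] * log(f_a[x] / f[x])
        kl += term
        weight = q * (1.0 - q)
        if weight > 0.0:              # assumption: skip x = max(D), where Q(x) = 1
            wkl += term / weight
    return kl, wkl
```

Dividing by Q(x)[1 − Q(x)] enlarges contributions from intensities in the upper tail, where Q(x) approaches 1, which is the tail sensitivity the weighting is meant to add.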
  • Empirical tests show that in samples containing no virus, viruses that pass the t-test at significance level 0.05 have WKL<5.0. In samples where a virus is indeed present, the actual viruses not only pass the t-test at significance level 0.05 but are also the only viruses with WKL>5.0. We therefore set the Weighted Kullback-Leibler divergence threshold for a virus to be considered present in the sample to 5.0.
  • This analysis framework is shown in FIG. 5.
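The resulting two-stage decision rule (one-tailed t-test first, then the WKL threshold) amounts to a simple filter. This sketch assumes the p-value and WKL of each virus have already been obtained by the t-test and WKL steps described above; the function name and any virus labels passed to it are hypothetical.

```python
from typing import Dict, List, Tuple


def detect_viruses(stats: Dict[str, Tuple[float, float]], alpha: float = 0.05,
                   wkl_threshold: float = 5.0) -> List[str]:
    """Return viruses whose one-tailed t-test p-value < alpha and whose WKL > threshold.

    `stats` maps a virus name to its (p_value, wkl) pair.
    """
    return sorted(name for name, (p_value, wkl) in stats.items()
                  if p_value < alpha and wkl > wkl_threshold)
```

A virus that passes the t-test but has WKL ≦ 5.0 is treated as a false positive and dropped, which is the filtering step illustrated in Tables 2 and 4.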
  • Having now generally described the invention, the same will be more readily understood through reference to the following examples, which are provided by way of illustration, and are not intended to be limiting of the present invention.
  • EXAMPLES
  • Standard molecular biology techniques known in the art and not specifically described were generally followed as described in Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York (2001).
  • The present inventors describe 2 sets of experiments that demonstrate the effects of probe design on experimental results and show the robustness of the analysis algorithm according to the present invention.
  • Example 1
  • In the first set of experiments, a PDC containing 53555 40-mer probes from 35 viruses affecting humans was used for 4 independent microarray experiments. These 53555 probes were chosen based on a 5-bp tiling of each virus and were not subjected to any of our probe design criteria. We would therefore expect errors arising from CG-content, cross-hybridization and inefficient amplification to be significantly more frequent than on a PDC with well-designed probes. We tested our analysis algorithm in this adverse setting in 4 experiments.
  • In each experiment, a human sample with an unknown pathogen was amplified by the RT-PCR process using random primers and then hybridized onto the PDC. For each of the 35 viruses on our PDC, we subjected its probes to the one-tailed t-test with significance level 0.05 and computed the Weighted Kullback-Leibler (WKL) divergence of their signal intensities relative to the signal intensities of all the probes on the chip, to determine which virus was in the sample in each experiment. The accuracy of our program's analysis was confirmed by wet-lab PCR identifying the actual virus in the sample. The results of our analysis for the 4 experiments and their corresponding PCR verifications are presented in Table 1.
    TABLE 1
    Analysis results done on a PDC with no probe design criteria applied.
    The virus determined by our analysis algorithm to be the actual virus in the
    sample tested for each experiment is highlighted in light gray colour.
  • The present results show that the analysis algorithm accurately deduced the actual virus in the samples tested in the first 3 experiments. Furthermore, we were able to deduce that the sample in the last experiment contained no virus. Note that if we had used only the t-test with a level of significance of 0.05, the number of viruses detected as present in each sample would have been as shown in Table 2.
    TABLE 2
    False positive detection of viruses using t-test alone
    Sample name | 35259_324 | 35179_122 | 35253_841 | 35915_111
    Viruses detected using t-test | 9 | 14 | 9 | 10
    False positives | 8 | 13 | 8 | 10
    Max KL divergence (>5.0) | 16.391 | 5.76 | 10.85 | -
    Viruses detected using t-test followed by KL divergence | 1 | 1 | 1 | 0
  • By using the Weighted Kullback-Leibler divergence of the viruses that pass the t-test, we were able to remove all false positive viruses and identify the actual virus. Thus, our analysis algorithm can robustly determine the virus under a high level of noise.
  • Next, we investigated the effect on our analysis results of using a PDC with the probe design criteria applied. First, the amplification efficiency map for each of the 35 viruses was computed. Then, the exact 53555 probes on the original PDC were subjected to the probe design criteria. Probes with extreme levels of CG-content, high similarity to human and non-target viruses, and low amplification efficiency scores were removed from the chip. A total of 10955 probes were retained for the second set of experiments. Using the samples from the first set of experiments, we repeated the 4 experiments with the new chip. The experimental results are presented in Table 3.
    TABLE 3
    Analysis results done on a PDC with probe design criteria applied. The
    virus determined by our analysis algorithm to be the actual virus in the sample
    tested for each experiment is highlighted in light gray colour.
  • Example 2
  • In the second set of experiments, the analysis algorithm correctly detected the actual virus in the 3 positive samples and correctly found no virus in the negative sample. With well-designed probes on the chip, the Weighted Kullback-Leibler divergence of the actual viruses in Experiments 1, 2 and 3 was greater than in the corresponding experiments without probe design. This means that the signal intensities from the actual virus were higher relative to the background noise on the PDC, showing that our probe design criteria had removed some bad probes from the PDC and resulted in a more accurate analysis.
  • Again, we present results of the 4 experiments if we had just used the t-test with a level of significance 0.05. This time, the number of viruses detected to be present for each sample is shown in Table 4:
    TABLE 4
    False positive detection of viruses using t-test alone in a PDC with probe design
    Sample name | 35259_324 | 35179_122 | 35253_841 | 35915_111
    Viruses detected using t-test | 6 | 9 | 9 | 10
    False positives | 5 | 8 | 8 | 10
    Max KL divergence (>5.0) | 18.54859 | 9.324785 | 11.17914 | -
    Viruses detected using t-test followed by KL divergence | 1 | 1 | 1 | 0
  • From Table 4, it can be seen that probe design reduced the number of false positive viruses detected by the t-test for samples 35259_324 and 35179_122. A more important observation is that the Weighted Kullback-Leibler divergence for the actual virus increased for all 3 positive samples. This means that the signals of the actual virus are more clearly differentiated from the background signals when the probe design criteria are applied to the PDC.
  • In conclusion, we showed that using the one-tailed t-test with significance level 0.05, followed by computing the Weighted Kullback-Leibler divergence for the signal intensities of each virus, we were able to accurately analyze the data on the PDC and determine with high probability the actual pathogen in the sample. Although the analysis algorithm works well even under a high level of noise, we showed that the accuracy of the analysis is improved by using the above-described probe design criteria to select a good set of probes for the PDC.
  • REFERENCES
  • 1. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402 (1997).
  • 2. Bodrossy, L. & Sessitsch, A. Oligonucleotide microarrays in microbial diagnostics. Curr Opin Microbiol 7, 245-254 (2004).
  • 3. Bohlander, S. K., Espinosa, R., 3rd, Le Beau, M. M., Rowley, J. D. & Diaz, M. O. A method for the rapid sequence-independent amplification of microdissected chromosomal material. Genomics 13, 1322-1324 (1992).
  • 4. Bustin, S. A. & Nolan, T. Pitfalls of quantitative real-time reverse-transcription polymerase chain reaction. J Biomol Tech 15, 155-166 (2004).
  • 5. Goulden, C. H. Methods of Statistical Analysis, Edn. 2nd. (John Wiley & Sons, Inc., New York; 1956).
  • 6. Kane, M. D. et al. Assessment of the sensitivity and specificity of oligonucleotide (50 mer) microarrays. Nucleic Acids Res 28, 4552-4557 (2000).
  • 7. Kullback, S. & Leibler, R. A. On information and sufficiency. Ann. Math. Stat. 22, 79-86 (1951).
  • 8. Nuwaysir, E. F. et al. Gene expression analysis using oligonucleotide arrays produced by maskless photolithography. Genome Res 12, 1749-1755 (2002).
  • 9. Sambrook, J. & Russell, D. W. Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, New York; 2001).
  • 10. SantaLucia, J., Jr., Allawi, H. T. & Seneviratne, P. A. Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry 35, 3555-3562 (1996).
  • 11. Stephens, M. A. EDF statistics for goodness of fit and some comparisons. J Am Stat Assoc 69, 730-737 (1974).
  • 12. Striebel, H. M., Birch-Hirschfeld, E., Egerer, R. & Foldes-Papp, Z. Virus diagnostics on microarrays. Curr Pharm Biotechnol 4, 401-415 (2003).
  • 13. Sung, W. K. & Lee, W. H. Fast and Accurate Probe Selection Algorithm for Large Genomes. CSB (2003).
  • 14. Vora, G. J., Meador, C. E., Stenger, D. A. & Andreadis, J. D. Nucleic acid amplification strategies for DNA microarray-based pathogen detection. Appl Environ Microbiol 70, 3047-3054 (2004).
  • 15. Wang, D. et al. Viral discovery and sequence recovery using DNA microarrays. PLoS Biol 1, E2 (2003).
  • 16. Wong, C. W. et al. Tracking the Evolution of the SARS Coronavirus Using High-Throughput, High-Density Resequencing Arrays. Genome Res. 14, 398-405 (2004).
  • 17. Wu, D. Y., Ugozzoli, L., Pal, B. K., Qian, J. & Wallace, R. B. The effect of temperature and oligonucleotide primer length on the specificity and efficiency of amplification by the polymerase chain reaction. DNA Cell Biol 10, 233-238 (1991).

Claims (29)

1. A method of designing oligonucleotide probe(s) for nucleic acid detection comprising the following steps in any order:
(i) identifying and selecting region(s) of a target nucleic acid to be amplified, the region(s) having an efficiency of amplification (AE) higher than the average AE; and
(ii) designing oligonucleotide probe(s) capable of hybridizing to the selected region(s).
2. The method according to claim 1, wherein the AE of the selected region(s) is calculated as the Amplification Efficiency Score (AES), which is the probability that a forward primer ri can bind to a position i and a reverse primer rj can bind at a position j of the target nucleic acid, and |i-j| is the region of the target nucleic acid desired to be amplified.
3. The method according to claim 2, wherein |i-j| is ≦10000 bp.
4. The method according to claim 2, wherein |i-j| is ≦500 bp.
5. The method according to claim 1, wherein the oligonucleotide probe(s) capable of hybridizing to the selected region(s) is selected and designed according to at least one of the following criteria:
(a) the selected probe(s) has a CG-content from 40% to 60%;
(b) the probe(s) is selected by having the highest free energy computed based on the Nearest-Neighbor model;
(c) given probe sa and probe sb substrings of target nucleic acids va and vb, sa is selected based on the Hamming distance between sa and any length-m substring sb from the target nucleic acid vb and/or on the longest common substring of sa and probe sb;
(d) for any probe sa of length-m specific for the target nucleic acid va, the probe sa is selected if it does not have any hits with any region of a nucleic acid different from the target nucleic acid, and if the probe sa length-m has hits with the nucleic acid different from the target nucleic acid, the probe sa length-m with the smallest maximum alignment length and/or with the least number of hits is selected; and
(e) a probe pi at position i of a target nucleic acid is selected if pi is predicted to hybridize to the position i of the amplified target nucleic acid.
6. The method according to claim 1, wherein the method further comprises a step of preparing the selected and designed probe(s).
7. A method of detecting at least one target nucleic acid comprising the steps of:
(i) providing a biological sample;
(ii) amplifying the nucleic acid(s) of the biological sample;
(iii) providing at least one oligonucleotide probe capable of hybridizing to at least a target nucleic acid, if present in the biological sample, wherein the probe(s) is prepared according to the method of claim 13; and
(iv) contacting the probe(s) with the amplified nucleic acids and detecting the probe(s) hybridized to the target nucleic acid(s).
8. The method according to claim 7, wherein the amplification step (ii) is carried out in the presence of at least one random forward primer and at least one random reverse primer.
9. The method according to claim 7, wherein the amplification step is RT-PCR.
10. The method according to claim 7, wherein the forward random primer binding to position i and the reverse random primer binding to position j of a target nucleic acid va are selected among primers having, for every position i of the target nucleic acid va, an amplification efficiency score (AESi) of:
$$\mathrm{AES}_i = \sum_{j=i-Z}^{i} \left\{ P_f(j) \times \sum_{k=i+1}^{j+Z} P_r(k) \right\}, \quad \text{wherein} \quad \sum_{k=i+1}^{j+Z} P_r(k) = P_r(i+1) + P_r(i+2) + \dots + P_r(j+Z);$$
Pf(i) and Pr(i) are the probabilities that a random primer ri can bind to position i of va as a forward primer and as a reverse primer, respectively, and Z ≦ 10000 bp is the length of the region of va desired to be amplified.
11. The method according to claim 7, wherein the target nucleic acid to be detected is nucleic acid exogenous to the nucleic acid of the biological sample.
12. The method according to claim 7, wherein the target nucleic acid to be detected is at least a pathogen genome or fragment thereof.
13. The method according to claim 12, wherein the pathogen nucleic acid is at least a nucleic acid from a virus, a parasite or a bacterium, or a fragment thereof.
14. The method according to claim 7, wherein the biological sample is obtained from a human being and the target nucleic acid, if present in the biological sample, is not of human origin.
15. The method according to claim 7, wherein the probes are placed on an insoluble support.
16. The method according to claim 7, wherein in the detection step (iv), the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the signal intensities of the probes ∉ va, thereby indicating the presence of va in the biological sample.
17. The method according to claim 7, wherein in the detection step (iv), the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the signal intensities of the probes ∉ va, and the method further comprises the step of computing the relative difference between the proportion of probes ∉ va having high signal intensities and the proportion of all probes used in the detection method having high signal intensities, the density distribution of the signal intensities of the probes ∈ va being more positively skewed than that of the probes ∉ va, thereby indicating the presence of va in the biological sample.
18. The method according to claim 7, wherein in the detection step (iv), the presence of a target nucleic acid in a biological sample is indicated by a t-test value of ≦0.1 and/or a Kullback-Leibler divergence of ≧1.0.
19. A method of determining the presence of a target nucleic acid va, comprising detecting the hybridization of a probe to the target nucleic acid va, wherein the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the signal intensities of the probes ∉ va, thereby indicating the presence of va.
20. The method according to claim 19, wherein the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the signal intensities of the probes ∉ va, and the method further comprises the step of computing the relative difference between the proportion of probes ∉ va having high signal intensities and the proportion of all probes used in the detection method having high signal intensities, the density distribution of the signal intensities of the probes ∈ va being more positively skewed than that of the probes ∉ va, thereby indicating the presence of va.
21. The method according to claim 19, wherein the presence of a target nucleic acid in a biological sample is indicated by a t-test value of ≦0.1 and/or a Kullback-Leibler divergence of ≧1.0.
22. A method of detecting at least one target nucleic acid comprising the steps of:
(i) providing a biological sample;
(ii) amplifying the nucleic acid(s) of the biological sample;
(iii) providing at least one oligonucleotide probe capable of hybridizing to at least a target nucleic acid, if present in the biological sample; and
(iv) contacting the probe(s) with the amplified nucleic acids and detecting the probe(s) hybridized to the target nucleic acid(s), wherein the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the signal intensities of the probes ∉ va, thereby indicating the presence of va in the biological sample.
23. The method according to claim 22, wherein in step (iv) the mean of the signal intensities of the probes which hybridize to va is statistically higher than the mean of the signal intensities of the probes ∉ va, and the method further comprises the step of computing the relative difference between the proportion of probes ∉ va having high signal intensities and the proportion of all probes used in the detection method having high signal intensities, the density distribution of the signal intensities of the probes ∈ va being more positively skewed than that of the probes ∉ va, thereby indicating the presence of va in the biological sample.
24. The method according to claim 22, wherein in step (iv) the presence of a target nucleic acid in a biological sample is indicated by a t-test value of ≦0.1 and/or a Kullback-Leibler divergence of ≧1.0.
25. The method according to claim 22, wherein the target nucleic acid to be detected is nucleic acid exogenous to the nucleic acid of the biological sample.
26. The method according to claim 22, wherein the target nucleic acid to be detected is at least a pathogen genome or fragment thereof.
27. The method according to claim 26, wherein the pathogen nucleic acid is at least a nucleic acid from a virus, a parasite or a bacterium, or a fragment thereof.
28. The method according to claim 22, wherein the biological sample is obtained from a human being and the target nucleic acid, if present in the biological sample, is not from the human genome.
29. The method according to claim 22, wherein the probes are placed on an insoluble support.
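Criteria (a) and (c) of claim 5 can be made concrete with a short sketch. The Python below is illustrative only and is not the applicants' implementation: it assumes probes are uppercase DNA strings taken as length-m windows of the target, applies the 40% to 60% bound of criterion (a) as a GC-content filter, and then ranks the surviving probes by their smallest Hamming distance to, and longest common substring with, the non-target sequences. The ranking direction (larger distance and shorter shared substring first) and all function names are assumptions.

```python
"""Hypothetical sketch of claim 5, criteria (a) and (c)."""


def gc_content(probe: str) -> float:
    """Fraction of G or C bases in the probe (criterion (a))."""
    return sum(base in "GC" for base in probe) / len(probe)


def min_hamming_distance(probe: str, other: str) -> int:
    """Smallest Hamming distance between `probe` and any length-m
    substring of `other` (criterion (c))."""
    m = len(probe)
    if len(other) < m:
        return m  # no length-m window exists; treat as maximally different
    return min(
        sum(a != b for a, b in zip(probe, other[start:start + m]))
        for start in range(len(other) - m + 1)
    )


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest common substring (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def candidate_probes(target: str, m: int, non_targets, gc_range=(0.40, 0.60)):
    """Return (probe, min Hamming distance, max LCS) for every length-m
    window of `target` that passes the GC filter, most specific first."""
    scored = []
    for start in range(len(target) - m + 1):
        probe = target[start:start + m]
        if not gc_range[0] <= gc_content(probe) <= gc_range[1]:
            continue
        hd = min((min_hamming_distance(probe, v) for v in non_targets), default=m)
        lcs = max((longest_common_substring(probe, v) for v in non_targets), default=0)
        scored.append((probe, hd, lcs))
    # Assumed ranking: large Hamming distance first, then short shared substring.
    scored.sort(key=lambda t: (-t[1], t[2]))
    return scored
```

Criteria (b), (d) and (e) depend on thermodynamic parameters and alignment tools that the claims do not specify, so they are left out of this sketch.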
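The amplification efficiency score of claim 10 can be evaluated for every position of va in one pass. This Python sketch assumes the binding probabilities Pf and Pr are already available as per-position lists (the claims do not say how they are estimated), uses 0-based positions, and drops any j or k that falls outside the sequence, which is a boundary assumption. A prefix-sum array turns the inner sum over Pr(k) into a constant-time lookup. The function name is hypothetical.

```python
from itertools import accumulate


def amplification_efficiency_scores(pf, pr, z):
    """Return AES_i for every position i (0-based) of a target sequence.

    AES_i = sum_{j=i-Z}^{i} Pf(j) * sum_{k=i+1}^{j+Z} Pr(k)

    pf[i] and pr[i] are assumed inputs: the probabilities that a random
    primer binds position i as a forward or as a reverse primer.
    """
    n = len(pf)
    if len(pr) != n:
        raise ValueError("pf and pr must have the same length")
    # prefix[t] = pr[0] + ... + pr[t-1], so sum(pr[a:b]) = prefix[b] - prefix[a]
    prefix = [0.0, *accumulate(pr)]
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(max(0, i - z), i + 1):
            upper = min(n, j + z + 1)  # k runs from i + 1 up to j + Z inclusive
            total += pf[j] * (prefix[upper] - prefix[i + 1])
        scores.append(total)
    return scores
```

For example, `amplification_efficiency_scores([0.5] * 4, [0.5] * 4, 2)` gives `[0.5, 0.75, 0.5, 0.0]`: the score drops to zero at the last position because no reverse primer can bind downstream of it.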
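Claims 16 to 18 (and their counterparts in claims 19 to 24) decide whether va is present by comparing the signal intensities of probes ∈ va with those of probes ∉ va. The sketch below is one possible reading, not the applicants' procedure: a one-sided Welch t-test whose p-value is approximated with the normal distribution (adequate only for large probe sets), a histogram-based Kullback-Leibler divergence in nats with a small smoothing constant, and a moment-based skewness for the claim 17 check. The bin count, the smoothing constant, the normal approximation and the choice to combine the claim 18 thresholds with a logical "or" are all assumptions, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def welch_one_sided(target, background):
    """Welch t statistic for mean(target) > mean(background) and an
    approximate one-sided p-value (normal approximation, an assumption)."""
    se = math.sqrt(variance(target) / len(target)
                   + variance(background) / len(background))
    if se == 0:
        raise ValueError("both groups have zero variance")
    t = (fmean(target) - fmean(background)) / se
    return t, 1.0 - NormalDist().cdf(t)


def kl_divergence(p_values, q_values, bins=10, eps=1e-9):
    """D(P || Q) in nats between histograms of two intensity samples.

    Both histograms share the same bin edges; eps is added to every bin
    (then renormalized) so that empty bins do not produce log(0)."""
    lo = min(min(p_values), min(q_values))
    hi = max(max(p_values), max(q_values))
    width = (hi - lo) / bins or 1.0  # guard against all-equal intensities

    def histogram(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        probs = [c / len(values) + eps for c in counts]
        total = sum(probs)
        return [x / total for x in probs]

    p, q = histogram(p_values), histogram(q_values)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def skewness(values):
    """Moment-based sample skewness; positive means a long right tail."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def target_present(target, background, p_cut=0.1, kl_cut=1.0):
    """Claim 18 style call: present if p <= p_cut or KL >= kl_cut."""
    _, p = welch_one_sided(target, background)
    kl = kl_divergence(target, background)
    return (p <= p_cut or kl >= kl_cut), p, kl
```

Here `target` holds the intensities of probes ∈ va and `background` those of probes ∉ va; `skewness(target) > skewness(background)` expresses the more positively skewed distribution described in claim 17. With a real array, `scipy.stats.ttest_ind` would give an exact t-distribution p-value instead of the normal approximation.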
US11/202,023 2005-08-12 2005-08-12 Method of probe design and/or of nucleic acids detection Abandoned US20070042388A1 (en)

Priority Applications (9)

Application Number Priority Date Filing Date Title
US11/202,023 US20070042388A1 (en) 2005-08-12 2005-08-12 Method of probe design and/or of nucleic acids detection
AU2006280489A AU2006280489B2 (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
KR1020087006089A KR20080052585A (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
PCT/SG2006/000224 WO2007021250A2 (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
JP2008525967A JP2009504153A (en) 2005-08-12 2006-08-08 Method and / or apparatus for oligonucleotide design and / or nucleic acid detection
US11/990,290 US8234079B2 (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
EP06769707A EP1922418A4 (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
CN2006800369768A CN101292044B (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
US13/549,032 US20120309643A1 (en) 2005-08-12 2012-07-13 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/202,023 US20070042388A1 (en) 2005-08-12 2005-08-12 Method of probe design and/or of nucleic acids detection

Related Child Applications (4)

Application Number Title Priority Date Filing Date
PCT/SG2006/000224 Continuation-In-Part WO2007021250A2 (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
US11/990,290 Continuation-In-Part US8234079B2 (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
US11/990,290 Continuation US8234079B2 (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
US99029008A Continuation-In-Part 2005-08-12 2008-02-11

Publications (1)

Publication Number Publication Date
US20070042388A1 2007-02-22

Family

ID=37757981

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/202,023 Abandoned US20070042388A1 (en) 2005-08-12 2005-08-12 Method of probe design and/or of nucleic acids detection
US11/990,290 Expired - Fee Related US8234079B2 (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
US13/549,032 Abandoned US20120309643A1 (en) 2005-08-12 2012-07-13 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection

Family Applications After (2)

Application Number Title Priority Date Filing Date
US11/990,290 Expired - Fee Related US8234079B2 (en) 2005-08-12 2006-08-08 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection
US13/549,032 Abandoned US20120309643A1 (en) 2005-08-12 2012-07-13 Method and/or apparatus of oligonucleotide design and/or nucleic acid detection

Country Status (7)

Country Link
US (3) US20070042388A1 (en)
EP (1) EP1922418A4 (en)
JP (1) JP2009504153A (en)
KR (1) KR20080052585A (en)
CN (1) CN101292044B (en)
AU (1) AU2006280489B2 (en)
WO (1) WO2007021250A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100159533A1 (en) * 2008-11-24 2010-06-24 Helicos Biosciences Corporation Simplified sample preparation for rna analysis
WO2011046614A2 (en) * 2009-10-16 2011-04-21 The Regents Of The University Of California Methods and systems for phylogenetic analysis
WO2012173636A1 (en) * 2011-06-16 2012-12-20 University Of Rochester Hiv incidence assays with high sensitivity and specificity
CN110268473A (en) * 2017-02-08 2019-09-20 微软技术许可有限责任公司 The design of primers of polynucleotides for being stored fetched
CN115101128A (en) * 2022-06-29 2022-09-23 纳昂达(南京)生物科技有限公司 Method for evaluating off-target risk of hybridization capture probe

Families Citing this family (10)

Publication number Priority date Publication date Assignee Title
KR101138864B1 (en) * 2005-03-08 2012-05-14 삼성전자주식회사 Method for designing primer and probe set, primer and probe set designed by the method, kit comprising the set, computer readable medium recorded thereon a program to execute the method, and method for identifying target sequence using the set
US20070042388A1 (en) * 2005-08-12 2007-02-22 Wong Christopher W Method of probe design and/or of nucleic acids detection
US20110152109A1 (en) * 2009-12-21 2011-06-23 Gardner Shea N Biological sample target classification, detection and selection methods, and related arrays and oligonucleotide probes
JP5756247B1 (en) 2012-05-08 2015-07-29 アダプティブ バイオテクノロジーズ コーポレイション Composition and method for measuring and calibrating amplification bias in multiplex PCR reactions
CN105780129B (en) * 2014-12-15 2019-06-11 天津华大基因科技有限公司 Target area sequencing library construction method
US11319602B2 (en) 2017-02-07 2022-05-03 Tcm Biotech Internationl Corp. Probe combination for detection of cancer
JP6995604B2 (en) * 2017-12-15 2022-01-14 東洋鋼鈑株式会社 Design method and probe set for single nucleotide polymorphism detection probe
CN109097450B (en) * 2018-08-30 2022-05-13 江苏省疾病预防控制中心 Nucleic acid sequence independent full RNA amplification method
KR102020614B1 (en) * 2018-09-11 2019-09-11 한국과학기술정보연구원 Primer set for diagnosing tuberculosis and uses thereof
WO2020054906A1 (en) * 2018-09-11 2020-03-19 한국과학기술정보연구원 Method for designing primer for detecting target gene

Citations (2)

Publication number Priority date Publication date Assignee Title
US20030236633A1 (en) * 2000-11-21 2003-12-25 Affymetrix, Inc. Methods for oligonucleotide probe design
US20040259124A1 (en) * 2003-02-19 2004-12-23 Affymetrix, Inc. Methods for oligonucleotide probe design

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
JP2000194554A (en) * 1998-12-25 2000-07-14 Nec Corp Arithmetic processor
GB2377017A (en) 2001-06-28 2002-12-31 Animal Health Inst Detection of foot and mouth disease virus
CN100392097C (en) * 2002-08-12 2008-06-04 株式会社日立高新技术 Method of detecting nucleic acid by using DNA microarrays and nucleic acid detection apparatus
US20070042388A1 (en) * 2005-08-12 2007-02-22 Wong Christopher W Method of probe design and/or of nucleic acids detection

Cited By (6)

Publication number Priority date Publication date Assignee Title
US20100159533A1 (en) * 2008-11-24 2010-06-24 Helicos Biosciences Corporation Simplified sample preparation for rna analysis
WO2011046614A2 (en) * 2009-10-16 2011-04-21 The Regents Of The University Of California Methods and systems for phylogenetic analysis
WO2011046614A3 (en) * 2009-10-16 2011-07-21 The Regents Of The University Of California Methods and systems for phylogenetic analysis
WO2012173636A1 (en) * 2011-06-16 2012-12-20 University Of Rochester Hiv incidence assays with high sensitivity and specificity
CN110268473A (en) * 2017-02-08 2019-09-20 微软技术许可有限责任公司 The design of primers of polynucleotides for being stored fetched
CN115101128A (en) * 2022-06-29 2022-09-23 纳昂达(南京)生物科技有限公司 Method for evaluating off-target risk of hybridization capture probe

Also Published As

Publication number Publication date
US20120309643A1 (en) 2012-12-06
WO2007021250A3 (en) 2007-07-05
CN101292044B (en) 2012-11-07
US8234079B2 (en) 2012-07-31
WO2007021250A2 (en) 2007-02-22
AU2006280489B2 (en) 2012-05-24
KR20080052585A (en) 2008-06-11
EP1922418A4 (en) 2010-02-03
JP2009504153A (en) 2009-02-05
AU2006280489A1 (en) 2007-02-22
US20090053708A1 (en) 2009-02-26
EP1922418A2 (en) 2008-05-21
CN101292044A (en) 2008-10-22

Similar Documents

Publication Publication Date Title
US20070042388A1 (en) Method of probe design and/or of nucleic acids detection
US11377695B2 (en) Breast cancer associated circulating nucleic acid biomarkers
US20230087365A1 (en) Prostate cancer associated circulating nucleic acid biomarkers
US20200354788A1 (en) Digital counting of individual molecules by stochastic attachment of diverse labels
CN112037860B (en) Statistical analysis for non-invasive chromosome aneuploidy determination
US20140155283A1 (en) Microarray for detecting viable organisms
JP2004531271A (en) Methods for detecting diseases caused by chromosomal imbalance
JP7071341B2 (en) How to identify a sample
JP2011239708A (en) Design method for probe for nucleic acid standard substrate detection, probe for nucleic acid standard substrate detection and nucleic acid detecting system having the same
US20220403447A1 (en) Sample preparation and sequencing analysis for repeat expansion disorders and short read deficient targets
US8268562B2 (en) Biomarkers for predicting response of esophageal cancer patient to chemoradiotherapy

Legal Events

Date Code Title Description
AS Assignment

Owner name: AGENCY FOR SCIENCE, TECHNOLOGY AND RESEARCH, SINGA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WONG, CHRISTOPHER W.;SUNG, WING-KIN;LEE, CHARLIE;AND OTHERS;REEL/FRAME:017242/0560;SIGNING DATES FROM 20051026 TO 20051028

STCB Information on status: application discontinuation

Free format text: EXPRESSLY ABANDONED -- DURING EXAMINATION