WO2000060123A2 - Method for selecting primers for amplification of nucleic acids - Google Patents


Info

Publication number
WO2000060123A2
Authority
WO
WIPO (PCT)
Prior art keywords
primers
primer
sequence
degenerate
fixed
Prior art date
Application number
PCT/US2000/008962
Other languages
French (fr)
Other versions
WO2000060123A3 (en)
Inventor
Periannan Senapathy
Original Assignee
Genome Technologies, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genome Technologies, Llc
Priority to AU43299/00A (AU4329900A)
Publication of WO2000060123A2
Publication of WO2000060123A3


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/686: Polymerase chain reaction [PCR]
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • The invention relates to a method of optimizing the selection of a degenerate, second primer for use in nucleic acid-based reactions. More particularly, the invention relates to a method of optimizing the selection of a degenerate, second primer such that the Tm of the degenerate, second primer matches the Tm of a fixed, first primer.
  • PCR: polymerase chain reaction.
  • Oligonucleotides complementary to these known sequences at both ends serve as "primers" in the PCR procedure.
  • Double stranded target DNA is first melted to separate the DNA strands, and then oligonucleotide (oligo) primers complementary to the ends of the segment which is desired to be amplified are annealed to the template DNA.
  • Nucleic acid primers are typically short fragments of nucleic acid, most commonly DNA, that bind to target sequences of nucleic acid, such as RNA or DNA. Binding is based on the complementarity of the primer sequence to the target sequence. According to base pair rules, A binds with T, and C binds with G. Complementarity can be exact, wherein each nucleotide is bound to its complement, or it can be inexact, wherein not all nucleotides are bound to their complements. The oligos serve as primers for the synthesis of new complementary DNA strands, using a DNA polymerase enzyme and a process known as primer extension.
  • Each newly synthesized DNA strand becomes a template for synthesis of another DNA strand, beginning with the other oligo as primer.
  • Repeated cycles of melting, annealing of oligo primers, and primer extension lead to a (near) doubling, with each cycle, of DNA strands containing the sequence of the template beginning with the sequence of one oligo primer and ending with the sequence of the other oligo primer.
  • The key requirement for this exponential increase of template DNA is a pair of oligo primers that are complementary to the ends of the sequence to be amplified and are oriented so that their 3' extension products proceed toward each other. If the sequences at both ends of the segment to be amplified are not known, complementary oligos cannot be made and standard PCR cannot be performed.
  • U.S. Patent No. 5,994,058 to Senapathy, which is incorporated herein by reference, discloses PCR amplification of a target DNA using a first primer with a completely known sequence (i.e., a fixed, first primer) and a plurality of degenerate, second primers of varying, partly fixed sequence.
  • The fixed, first primer binds at a known sequence.
  • At least one of the degenerate, second primers binds at an unknown sequence downstream from the first primer.
  • The second primer-binding site, whose sequence is completely unknown, is located in the unknown sequence region of the target DNA.
  • The fixed portion of the degenerate, second primer statistically determines the distance at which the degenerate primer binds relative to the binding site of the fixed, first primer. Because the varying sequence region within the degenerate, second primer contains all of the possible sequences (i.e., each possible permutation is represented in the plurality), some species in the degenerate, second primer preparation will have a full-length sequence that is complementary to an a priori unknown binding site in the target DNA. Consequently, PCR amplification takes place between two primers that are complementary to their respective binding sites, even though the actual sequence at the second primer-binding site is unknown before the PCR amplification.
  • The method of the '058 patent works even with one or a few mismatches between the primers and the target nucleic acid, as long as there is sufficient complementarity to enable standard PCR amplification. Therefore, the method disclosed in the '058 patent permits PCR amplification of an unknown DNA region adjacent to a known DNA region. Once the unknown region has been amplified, it can be sequenced. Consequently, this method can be used to contiguously PCR-amplify a long genomic DNA and sequence it completely without resorting to shotgun cloning and sequencing.
  • The Tm of a DNA molecule is the midpoint of the temperature range over which the two strands of the DNA separate.
  • The Tm depends on the proportion of GC base pairs. Because GC base pairs have three hydrogen bonds, whereas AT base pairs have only two, more energy is required to melt GC base pairs than AT base pairs. Thus, the more GC base pairs a given DNA molecule contains, the more energy it takes to separate the strands.
  • The Tms of the primers in a primer set should be matched.
  • Usually, the Tm of the second primer should lie within plus or minus 3-5°C of the Tm of the first primer. Primers with closely matched Tms give more efficient amplification, because each primer can then bind efficiently and specifically to the template at the same temperature during the annealing step of the PCR.
  • The Tm of the fixed, first primer is known because its sequence is fully known.
  • The Tm of the degenerate, second primer is not known, because only part of its sequence is known and the rest is randomized.
  • The efficiency of PCR amplification is significantly diminished when the Tm of one primer does not match the Tm of the other.
  • The object of the present invention is to provide a method of specifically matching the Tm of a fixed, first primer having a completely known sequence to the Tm of a degenerate, second primer having only a partly fixed sequence.
  • A first embodiment of the invention is directed to a method of determining a Tm range for a plurality of degenerate oligonucleotide primers having a fixed-sequence portion and a degenerate-sequence portion.
  • The method comprises the steps of: (a) searching a known portion of a nucleic acid template for a sequence complementary to a desired fixed-sequence portion of a primer within the plurality; (b) identifying nucleotide base pairs flanking or interspersed between the sequence of step (a), these base pairs being complementary to at least one degenerate-sequence portion of one of the primers present in the plurality, whereby one or more sequences of potential binding sites for the plurality of primers are elucidated; and then (c) calculating the Tm of primers whose fixed-sequence portion and degenerate-sequence portion are complementary to the potential binding sites of step (b).
  • A second embodiment of the invention is directed to a method for selecting a plurality of degenerate primers, each primer within the plurality having a fixed-sequence portion and a degenerate-sequence portion, the fixed-sequence portion occurring in a nucleic acid template at a frequency that differs from the frequency expected from a random distribution of nucleotides in the template.
  • This method comprises the following steps: (a) determining the occurrences of the fixed-sequence portion of the degenerate primers in a given template sequence; (b) statistically determining the mean number of occurrences, within a hypothetical template having a random distribution of nucleotide base pairs, of a hypothetical primer equal in length to the degenerate primers; and then (c) selecting the degenerate primers based on whether the fixed-sequence portion of the primers has a different number of occurrences in the template than the hypothetical primer of the same length has in the hypothetical template.
  • A third embodiment of the invention is directed to a method of PCR amplification using a fixed oligonucleotide primer and a plurality of Tm-matched degenerate oligonucleotide primers, each degenerate primer having a fixed-sequence portion and a degenerate-sequence portion.
  • The method comprises the steps of: (a) searching a known portion of a nucleic acid template for a sequence complementary to a desired fixed-sequence portion of a primer within the plurality; (b) identifying nucleotide base pairs flanking or interspersed between the sequence of step (a), these base pairs being complementary to at least one degenerate-sequence portion of one of the primers present in the plurality, whereby one or more sequences of potential binding sites for the plurality of primers are elucidated; (c) calculating the Tm of primers whose fixed-sequence portion and degenerate-sequence portion are complementary to the potential binding sites of step (b); and (d) amplifying the template by PCR using the oligonucleotide primer of fixed sequence and the plurality of primers, the fixed-sequence primer having a Tm within the Tm range of the plurality of primers.
  • A preferred version of the present invention relates to a molecular biology method (e.g., PCR amplification) that uses two primers, a first primer and a second primer.
  • The first primer has a completely known sequence.
  • The second primer has a known sequence portion and an unknown sequence portion.
  • The second primer is actually a mixture of primers, each primer within the mixture sharing the same known sequence portion but having a different unknown sequence portion.
  • The present invention enables the Tm of the fixed, first primer to be matched to the Tm of the degenerate, second primer.
  • Close matching of the two Tms greatly improves the efficiency of a PCR reaction with two such primers.
  • The degenerate, second primer having a fixed portion is designed to bind to a DNA template, preferably a genomic DNA template, that has an at least partially known DNA sequence.
  • Several binding sites are determined for the degenerate, second primer: all of the occurrences of the degenerate, second primer in the known portion of a given DNA template are analyzed. Because the degenerate, second primers are variable, the sequence at each site where a degenerate, second primer binds can differ from the sequences at the other sites.
  • The particular sequences of the primer's degenerate portion that bind the template DNA next to the binding sites of the primer's fixed sequence can be determined by examining the template sequence.
  • Various features of the primer sequences, such as the G/C content and the Tm, can be analyzed.
  • The statistical spread of the Tm and the average Tm over all the occurrences of a degenerate, second primer can also be analyzed.
  • The Tms of the various oligonucleotides with a particular fixed sequence that occur in a template DNA sequence are generally distributed in a random manner.
  • For oligonucleotides with some particular fixed sequences, however, the Tm falls within a narrower range, and within an even narrower range for a smaller subset of the fixed sequences.
  • For such fixed sequences, the spread of the Tms is sharp (i.e., small) over a known portion of the template DNA sequence. It was discovered in the present invention that the information from one known portion of a genomic template DNA sequence also holds for another known portion of that genomic DNA, for instance in another chromosome. Therefore, for a given fixed sequence, once the spread of the Tms for a known portion has been determined, this information can be used to analyze the unknown portion. Many different fixed sequences can be analyzed, permitting the determination of a set of oligonucleotides that have a sharp Tm range in a given unknown template DNA sequence.
  • Certain oligonucleotides occur at a much higher frequency in a genomic DNA sequence than statistically expected. When the frequency is high in a known portion of a long DNA sequence, the same holds for the unknown sequence region of the template.
  • This uniform high frequency of certain oligonucleotides in a template sequence permits a longer fixed sequence to be incorporated into a degenerate, second primer. This is because more-frequently occurring, longer sequences will appear at intervals closer to the first primer binding site. Consequently, a unit length of unknown DNA sequence (say, 1 kb) can be amplified with a longer fixed-sequence primer. In general, the longer a primer is in its fixed portion, the better it performs in the PCR.
  • Such a longer fixed-sequence primer will bind at shorter distances from a given fixed, first primer than statistically expected in a template DNA. This finding allows the present invention to be used more efficiently in PCR amplification of an unknown DNA sequence with the CGS method. Conversely, some oligonucleotides occur at a much lower frequency in a genomic DNA sequence than statistically expected, and this feature permits amplification between two more distant primers when that is desired. Also, primers whose binding sites are unique or rare can be designed to include a corresponding, less frequently occurring sequence.
  • FIG. 1 is an illustration showing a method for determining the occurrences of the partly fixed, degenerate, second primer in the known template DNA sequence.
  • FIG. 2 is a drawing depicting the occurrences of a primer having a 6-base fixed sequence within a template, along with the neighboring, unknown sequence of
  • FIG. 3A shows the distribution of Tms of a degenerate, second primer, the distribution not having a sharp range.
  • FIG. 3B shows the distribution of Tms of another degenerate, second primer, the distribution having a sharp range.
  • FIG. 4 is a diagram denoting the frequency distribution of the occurrences of a degenerate, second primer with a particular fixed sequence.
  • The current invention is based on a recently developed method disclosed in U.S. Patent No. 5,994,058 (the '058 patent), which describes a contiguous genome amplifying and sequencing method that allows contiguous amplification of a very long DNA without the need to subclone it. It uses the basic PCR technique but circumvents PCR's usual need for enough knowledge of the target DNA sequence to fabricate two primers for contiguous sequencing.
  • The technology of the '058 patent makes it possible to PCR-amplify DNA of unknown sequence that lies adjacent to DNA of known sequence, using a fixed, first primer and a plurality of degenerate, second primers.
  • The sequence of the binding site of the degenerate, second primer does not need to be known.
  • The '058 patent method amplifies an unknown DNA using a primer complementary to a known sequence region of the target DNA and a partly degenerate, second primer that binds downstream from the first primer, in a region of unknown sequence.
  • The degenerate, second primers of the plurality each contain a randomized sequence portion and a fixed sequence portion, making the second primers partly degenerate.
  • The fixed sequence portion statistically determines the location of binding in relation to the fixed, first primer (in the 3' direction). In other words, the number of fixed nucleotides in the degenerate, second primer statistically determines the average distance downstream from the fixed, first primer at which the degenerate, second primer is expected to bind.
  • The actual sequence of the fixed nucleotides in the second primer determines exactly where the second primer binds.
  • The '058 patent does not contemplate determining the annealing temperature (i.e., the Tm) of the degenerate, second primer. Because the exact nucleotide sequence of any given primer within the plurality of degenerate, second primers (other than the fixed nucleotide sequence) is random, the method of the '058 patent cannot match the Tm of the degenerate, second primer with that of the fixed, first primer. Such matching is desirable because it produces very efficient PCR amplification of the DNA.
  • A method for determining the Tm of the degenerate, second primer is therefore highly beneficial for matching the Tm of the second primer with that of a fully known first primer.
  • The current method permits determination of the average Tm of a number of degenerate, second primers with different partly fixed sequences.
  • A degenerate, second primer can thus be chosen that (1) matches the first primer's Tm and (2) binds at an appropriate distance from the first primer.
  • The information for selecting a desired Tm of the degenerate, second primer is derived by analyzing the known DNA sequence regions of a template genomic DNA. This Tm information is applicable to the unknown sequence regions of the template, allowing judicious selection of the plurality of degenerate, second primers.
  • This method permits close matching of the Tm of a fixed, first primer having a known sequence with the Tm of the degenerate, second primer, thereby optimizing PCR amplification using degenerate primers. It was discovered that some primer species of a plurality of degenerate, second primers have a consistently sharp Tm range within the known portion of the DNA target.
  • This feature is also uniformly applicable to the unknown portion of the genomic DNA, which permits the definition of a sharp Tm range for a given plurality of degenerate, second primers.
  • A database of many different degenerate, second primers with defined Tms permits the Tm of the degenerate, second primers to be matched to that of a fixed, first primer.
  • A second aspect of the invention is based on the discovery that the frequencies of particular oligonucleotide sequences are far higher than statistically expected in a given template DNA, such as human genomic DNA. This feature also applies to the unknown sequence portion of the DNA. Different known sequences are analyzed to determine their frequency of occurrence within the template DNA. Analyzing the template DNA for the frequencies of many different sequences of varying length generates a list of potential binding sites that occur at high frequency and with a uniform distribution throughout the known sequence region. This information is used to generate complementary oligonucleotides of fixed sequence for use as first primers or second primers.
  • The average Tm over all occurrences of a plurality of degenerate, second primers in a known template DNA sequence is determined using the method described below.
  • Another preferred embodiment of this invention provides a method for the definition and development of degenerate, second primers specifically applicable to genomes of different species. Because the oligonucleotide distribution characteristics of different genomes vary, the present invention can be used to find distinct sets of degenerate, second primers, with different fixed sequences, that are specifically applicable to different genomic sequences. These sequence properties can also be used to design first primers that are species-specific.
  • An immediately apparent advantage of the present invention is that a database of different pluralities of highly frequent oligonucleotides having sharply defined average Tms in a given template DNA allows the selection of primers that produce more efficient PCR amplification.
  • The method also permits efficient amplification of an unknown DNA sequence downstream of a known sequence, as if the second primer sequence were fully fixed rather than only partially fixed. This method is also advantageous in PCR walking of large genomic DNA.
  • As FIGS. 1-2 illustrate, the binding sites of the degenerate, second primers are first determined for the known portion of the DNA template. This is done by searching the known portion of the DNA template for the fixed portion of the degenerate, second primer. Wherever the fixed portion occurs in the template DNA, bases are added to one side of the fixed sequence to obtain the full primer-binding site. From this information, the reverse complementary sequence of the degenerate, second primer that binds to that site, including both its fixed and variable portions, is known.
  • The variable portion of the degenerate, second primer may differ at the different locations where it binds to the DNA template. However, at all of these sites, the reverse complementary sequence of the variable portion is known from the known portion of the DNA template.
  • The Tms of the species within the plurality of degenerate, second primers are determined by a method such as the one detailed below. This permits the statistical features of the variable portions to be computed for a known DNA template.
  • Consider, for example, a degenerate, second oligonucleotide primer that is a 16-mer, of which 8 bases are fixed and 8 bases are variable.
  • The Tm of the known 8-base sequence is inherent in the sequence itself. It can therefore be calculated with a known formula, such as 2°C for every A or T and 4°C for every C or G. Other methods of calculating Tm that are known in the art can also be used.
  • The average Tm can be predicted quite accurately from the complementary sequences found in the known portions of the template DNA. This average Tm remains accurate even when unknown portions of the template DNA are amplified.
  • The particular primer species from the degenerate, second primer mixture that bind to a particular primer-binding site are determined by the exact sequence of the template DNA flanking the fixed-sequence binding site.
  • The Tm of the degenerate, second primer is therefore determined by the sequence of the template at this location.
  • All of the binding locations of a degenerate, second primer with a particular fixed sequence can be determined, because each is found by examining the sequences adjacent to the fixed sequence.
  • All of the full-length sequences of a degenerate, second primer with a particular fixed sequence that can bind to a given template DNA can be determined if at least part of the template DNA sequence is known.
  • This data can be analyzed for all of the full-length primers for a given fixed sequence of the degenerate, second primer.
  • The actual distribution of bases and sequences in a given genomic sequence depends on various factors inherent and specific to a given template DNA, such as its G+C content and other sequence biases and preferences.
  • The Tm and other parameters of the primer sequences can be determined with respect to a template DNA. For example, because of biological sequence biases within a given template DNA, oligonucleotides with varying statistical and other characteristics occur within it.
  • The systematic analysis of the present invention provides a method for choosing some of these oligonucleotides with particular sequence biases that can be very advantageous in molecular biology techniques such as PCR genome walking.
  • The sequences flanking a particular fixed base sequence in known portions of the template DNA can be determined.
  • The various sequence features of the flanking regions can then be subjected to statistical analysis.
  • The Tms of the corresponding full-length primers complementary to the fixed sequence and the variable flanking regions can be determined.
  • The frequency of primers having a given Tm can be determined, as can the frequency distribution of these primers over Tm. If most primers fall within a narrow range of Tm, this range can be used as the most effective Tm for second primers having that fixed base sequence portion.
  • If the Tms of all of the degenerate, second primers with a given fixed sequence portion fall within a narrow range of, for example, 5°C, then any one of these primers whose Tm falls roughly in the middle of that 5°C range will be an optimally effective second primer.
  • The characteristic features of the sequences flanking the fixed base sequence in the degenerate, second primer can be derived from the known sequence information of the long template DNA.
  • The acceptable Tm difference between a first and a second primer is normally 3-5°C. It is a great advantage if more than 60% of the primer species of a plurality of degenerate, second primers of the present invention fall within this range. At this percentage, by using two such primers with different fixed base regions whose Tms lie within the given range, the probability is close to 1 that one of the degenerate, second primers will match the Tm of the first primer within the desired range.
  • FIGS. 3A and 3B illustrate examples of two different pluralities of degenerate, second primers having different fixed sequences.
  • FIG. 3A has a fixed sequence of CGGGGCCG.
  • FIG. 3B has a fixed sequence of GGCGGGCGG.
  • Binding sites within a known portion of a template DNA were determined for each degenerate, second primer.
  • The position of each binding site within the DNA template, the sequence of the binding site, and the Tm of the binding site are listed, in that order.
  • An "F" to the left of the binding-site position indicates that the primer has the forward sequence in the template.
  • The absence of an "F" indicates that the primer has the reverse complement sequence of the template.
  • A degenerate primer (e.g., a 16-mer) having 8 fixed bases and 8 variable bases will have 4^8 (65,536) different sequences in the 8 bases of the variable region.
  • The regions flanking the fixed portion of the degenerate, second primers can be determined. This knowledge can then be used to compute an average Tm for the subset of second primers that will actually bind to the template.
  • Although the Tms of degenerate primers with many different fixed sequences are distributed in a random manner in a template DNA, some occur with a sharp bell-shaped curve and a sharp Tm range. This situation deviates from what would be expected for a random sequence.
  • In the example shown in FIG. 3A, the Tm range is quite broad.
  • In the example shown in FIG. 3B, the Tm range is very sharp.
  • In FIGS. 3A-3B, the distribution of the sequences' Tms is shown, with the mean, median, and mode given below the sequences.
  • The template DNA used in these examples was human genomic DNA. Several points are notable when comparing FIG. 3A with FIG. 3B.
  • The distributions of the Tms of the primers of FIGS. 3A and 3B are both bell-shaped curves.
  • The curve of FIG. 3A is not sharp, whereas the curve of FIG. 3B is.
  • Using the inventive method to analyze Tms empirically in known genomic DNA sequences, such as the human genomic DNA illustrated here, it was discovered that the Tm falls within a dramatically short range of 5-10°C for certain primer sets. From a larger list of such primers, a subset of many primers whose Tms fall within an even narrower range of 3-5°C can be chosen.
  • The range of the Tm is controlled by the length of the fixed sequence in the degenerate, second primer.
  • Second Primers with Certain Fixed Sequences in a Template DNA It was also discovered that in a given template DNA sequence, some of the oligonucleotides occur at a far higher frequency than is statistically expected. Sometimes these occur at 50-100 times more than their expected frequency. FIG.
  • FIG. 4 illustrates the frequency distribution of the occu ⁇ ence of an exemplary degenerate, second primer with a particular fixed sequence.
  • human genomic DNA was the template used to generate the data presented in FIG. 4.
  • a degenerate, second primer for example, a 16-mer with a particular fixed sequence of 8 bases and 8 variable bases, will have 4 8 (65,536) different sequences in the primers' variable portion.
  • a template sequence of, for example, one million bases only about 16 occu ⁇ ences of a particular 8-base sequence are expected to occur.
  • a given fixed sequence of 8 bases may occur at either the expected frequency, at a frequency much lower than expected, or at a frequency much higher than expected.
  • 500,000 bases of the human genome was analyzed for the occu ⁇ ence of various fixed sequence 9-mers.
  • the number of occu ⁇ ences is listed to the right of each sequence.
  • a map showing the approximate position of the occu ⁇ ence is shown to the right of the frequency.
  • the number of occu ⁇ ences for the various 9-mers ranges from 6 to 62. That is, for the 9-mers analyzed in this
  • one primer occu ⁇ ed only 9 times (for an average of one occu ⁇ ence per 55,555 bases).
  • one primer occu ⁇ ed 62 times for an average of one occu ⁇ ence per 8,065 bases).
  • Another advantage of using a longer fixed- sequence in the second primer is that a lesser quantity of primer can be used in the PCR reaction. This is because the more fixed bases there are in a primer, the fewer variable bases there will be, and hence fewer permutations of the second, degenerate primer. Thus, more primers from the plurality of second, degenerate primers will have a sequence that is complementary to a given template. This advantage makes it more statistically probable that a mole equivalent of the particular degenerate, second primer sequence to that of the first primer will be provided.
  • the statistical distribution features of degenerate, second primers with certain fixed sequences are uniform in a template DNA.
  • the features observed in a known sequence region can be applied to the unknown sequence region.
  • the method uses the information present in the known sequence region of a genomic DNA to identify the actual sequences of a degenerate, second primer that are capable of binding to the template DNA in the unknown region. It is known that the frequency of different oligonucleotides in a given genomic DNA can deviate from the frequency expected for a random DNA sequence because of the GC content of the genomic DNA and many other types of sequence bias inherent in a given genomic DNA.
  • a preferred procedure in accordance with the present invention is as follows. As illustrated in FIG. 2, the known sequence region of a template DNA is searched for the presence of a given fixed oligonucleotide (e.g., CGGCCC) of a given length (i.e., 6). Wherever the fixed sequence is found in the template, the neighboring bases making up the full length of the degenerate, second primer are also determined by adding bases to one side of the fixed sequence. In the example of FIG. 2, the following sequences, each containing the fixed sequence CGGCCC, are found in the template: CGGCCCTATCG, CGGCCCAGGCC, AATGACGGCCC, and CGGCCCTACCT.
  • the annealing temperature (Tm) is computed for the complete primer of a given length, including the fixed nucleotide sequence and the bases downstream of it (e.g., 11 nucleotides including 6 fixed nucleotides).
  • the Tm of all the occurrences of these sequences is determined.
  • the Tm of the variable portion and of the full binding site are as shown in Table 1 below (all temperatures are listed in °C).
  • the frequency of occurrence of the various Tm values is also determined.
  • the mean, median, and mode are determined.
  • the mean Tm is 38.5°C.
  • the median is 38°C.
  • the mode is 38°C.
  • the results of the frequency graphs showed that generally the distribution of the Tm is a bell-shaped curve (normal distribution).
  • the important information that can be derived from the curve is the most probable Tm of the degenerate, second primers with a particular fixed sequence that bind a given genomic DNA.
  • the longer the fixed sequence, the sharper the curve.
  • the sharpness of the curve can vary.
  • An advantage of the current method is that the Tm of the second, degenerate primer can be matched to the Tm of the fixed, first primer. This matching produces stronger amplification of DNA between the two primers.
  • Finding a degenerate, second primer with a longer fixed sequence that occurs at a higher frequency than expected in a template DNA is advantageous in molecular biology techniques such as PCR genome walking. If, for example, a primer with 6 fixed nucleotides occurs at 4 times the expected frequency, then the waiting intervals (i.e., the distances) between consecutive occurrences of this primer will be those expected for a 5-nucleotide fixed sequence. If the 6-mer occurs at 16 times the expected frequency, the waiting intervals will be those expected for a 4-mer. Likewise, an 8-mer fixed sequence occurring at 64 times the expected frequency in a template DNA will have the waiting interval normally expected for a 5-mer fixed sequence in that template DNA. Normally, the waiting interval from a fixed, first primer binding site to the binding site of a degenerate, second primer with a 5-mer fixed sequence is 1024 (4^5) bases.
  • the waiting interval from the first primer binding site to that of the degenerate, second primer with this enriched 8-mer fixed sequence is therefore also about 1024 bases.
  • the Tm of a primer with the 8-mer fixed sequence can be predicted more sharply than the Tm of a primer with a 5-mer fixed sequence. Therefore, to amplify a DNA of a length equivalent to the waiting interval for a 5-mer in a template DNA, it is now possible to use a degenerate, second primer containing an 8-mer fixed sequence that occurs at a 64-fold higher frequency. Furthermore, a 64-fold smaller amount of the 8-mer fixed degenerate, second primer preparation suffices in the PCR reaction compared with a second primer having a 5-mer fixed sequence. This is because, for primers of equal total length, there are 64-fold fewer possible primer sequences in a degenerate, second primer with an 8-mer fixed sequence than in one with a 5-mer fixed sequence.
  • This method is advantageous for amplifying both short and long pieces of DNA from a DNA template. For example, using a longer fixed sequence whose frequency is equivalent to that of a 5-mer fixed sequence, one can amplify approximately 1000 base pairs. By using a highly frequent longer sequence (e.g., an 11-mer) whose frequency is equivalent to that of a 7-mer fixed sequence, a DNA fragment of approximately 16 kb (4^7 = 16,384 bases) can be amplified using the conditions for long PCR amplification. Once DNA is amplified using a fixed, first primer and a degenerate, second primer, the amplified DNA can be sequenced. Thus, at least one version of the invention provides for sequencing DNA templates from unknown DNA regions.
  • the Tm range is determined based on the information present in the known portion of a template DNA.
  • the method enables the sharp matching of the Tm of a fixed, first primer of fully known sequence with that of a plurality of degenerate, second primers in which only a portion of the sequence of each second primer is fixed.
  • the basis of this method lies in finding the actual primer species of a degenerate, second primer with a particular fixed sequence that bind a given known sequence portion of a long template DNA, and then analyzing the Tm of these primer species.
  • FIGS. 3A and 3B show the frequency distribution of the Tm of primers with a particular fixed sequence in a template DNA; the same stretch of human genomic DNA sequence was analyzed for both FIG. 3A and FIG. 3B. They demonstrate that some degenerate, second primers have a sharp Tm distribution (see FIG. 3B).
  • FIG. 4 shows examples of the occurrences of given fixed sequences in a stretch of human genomic DNA. Many of the primers occur at a higher frequency than would be expected for a random sequence. It can be seen from FIG. 4 that if a degenerate, second primer with a particular fixed sequence portion occurs at a much higher frequency, then on average the second primer will bind at a shorter distance from the first primer binding site than expected for a primer with a normally-expected frequency. Therefore, analysis of a given genomic DNA sequence of sufficient length enables the determination of a set of many oligonucleotide sequences that occur at a high frequency.
  • if an oligonucleotide sequence occurs at a high frequency in one region of the genomic DNA, it also occurs with a similar frequency in other regions of the genomic DNA. Consequently, a given oligonucleotide will occur at a similarly high frequency in an unknown sequence region of a template DNA as it does in the known sequence region of the template DNA. Without being bound by any hypothetical explanation for this phenomenon, it is thought that the higher-than-expected frequency of certain oligonucleotide sequences may be due to biologically functional and structural aspects of a given genomic DNA.
  • Another aspect of the present invention is that, in a degenerate, second primer, the longer the fixed sequence portion, the more sharply its Tm can be defined with respect to a template DNA. Furthermore, it has been discovered that, due to the non-random sequence bias of a given genomic DNA, degenerate, second primers with empirically determined fixed oligonucleotide sequences that occur highly frequently have a much sharper Tm distribution over all of their occurrences in the template DNA than other oligonucleotide sequences.
  • a set of degenerate, second primers with particular oligonucleotide sequences that have a sharp Tm distribution in a known template DNA can be chosen for use in a molecular biology reaction, such as amplification of DNA of unknown sequence.
  • a combination of both the first and the second aspects of the invention permits the selection of a degenerate, second primer with a longer fixed sequence portion that binds at a shorter distance from the fixed, first primer binding site than would be expected. Furthermore, the plurality of degenerate, second primers has a sharp Tm distribution. Advantages of this capability include the following: 1) the Tms of the first and second primers can be matched; 2) a longer fixed sequence portion in the degenerate, second primers yields a sharper Tm range; 3) a smaller amount of the degenerate, second primer mixture can be used in a PCR reaction to provide a mole equivalent of the particular species of degenerate, second primer with full-length complementarity at the second primer binding site; and 4) more characteristics of the degenerate, second primer are amenable to analysis and selection for optimum PCR amplification between the fixed, first primer and the degenerate, second primer.
  • degenerate, second primers whose fixed sequences occur at a higher or lower frequency than expected also find use in processes that do not include PCR, such as reverse transcription. Because primers with a lower-than-expected frequency are useful in certain processes, the invention also provides a method to find such primers.
  • the Tm of either the binding site or the primer itself can be determined.
  • the method can be used with any nucleic acid, such as DNA, RNA, or cDNA.
  • the method can be used to determine the Tm of one or only a few degenerate, second primers.
  • the method can be used to analyze only part of a genome, as opposed to the entire genome. It can also be used to analyze a chromosome or any piece of nucleic acid regardless of size.
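The waiting-interval arithmetic described above (an 8-mer at 64 times its expected frequency spacing like a random 5-mer, a 7-mer-equivalent spacing of roughly 16 kb, and the 64-fold smaller primer pool) can be reproduced in a few lines. This is a minimal sketch assuming a random template with equal base frequencies and counting occurrences on one strand only; the function names are hypothetical and not taken from the patent.

```python
import math

def random_interval(fixed_len: int) -> int:
    """Mean spacing in bases between occurrences of one specific k-mer in a random
    sequence with equal base frequencies (one strand only)."""
    return 4 ** fixed_len

def enriched_interval(fixed_len: int, enrichment: float) -> float:
    """Mean spacing when the k-mer occurs `enrichment` times more often than expected."""
    return random_interval(fixed_len) / enrichment

def equivalent_fixed_len(fixed_len: int, enrichment: float) -> float:
    """Fixed length of a randomly distributed k-mer with the same mean spacing."""
    return fixed_len - math.log2(enrichment) / 2  # log base 4 of the enrichment

def pool_size(primer_len: int, fixed_len: int) -> int:
    """Distinct species in a degenerate primer whose variable positions are all N."""
    return 4 ** (primer_len - fixed_len)

print(enriched_interval(8, 64))              # 1024.0, the spacing of a random 5-mer
print(equivalent_fixed_len(8, 64))           # 5.0
print(pool_size(16, 5) // pool_size(16, 8))  # 64
print(random_interval(7))                    # 16384, roughly 16 kb
```

`pool_size` compares primers of equal total length (16 nt in the example), so the 64-fold difference is simply 4 raised to the three extra fixed positions.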

Abstract

A preferred version of the present invention relates to a method of matching the Tms of a fixed, first primer and a degenerate, second primer. The first primer has a completely known sequence. Thus, the Tm of the first primer can be readily calculated. The second primer has a known sequence portion and an unknown sequence portion. The second primer is actually a mixture of primers, each primer within the mixture sharing the same known sequence but having a different unknown sequence portion. The present invention enables matching the Tm of the first primer to the Tm of the degenerate, second primer. The close matching of the two Tms greatly improves the efficiency of a PCR reaction with two such primers. Another preferred version of the invention relates to finding degenerate, second primers that occur at frequencies that differ from the frequency that would be expected. These two preferred versions can be combined to provide a method that uses a fixed, first primer and a degenerate, second primer whose Tms are matched. This combination also supplies a degenerate, second primer that binds either closer to or farther away from the fixed, first primer than would be expected based on a random distribution of template nucleotides.

Description

METHOD FOR SELECTING PRIMERS FOR AMPLIFICATION OF NUCLEIC ACIDS
PERIANNAN SENAPATHY
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application 60/127,891, filed April 6, 1999, the entirety of which is incorporated by reference herein.
FIELD OF THE INVENTION
The invention relates to a method of optimizing the selection of a degenerate, second primer for use in nucleic acid-based reactions. More particularly, the invention relates to a method of optimizing the selection of a degenerate, second primer such that the Tm of the degenerate, second primer matches the Tm of a fixed, first primer.
DESCRIPTION OF THE RELATED ART
The polymerase chain reaction (PCR) technique enables the amplification of DNA which lies between two regions of known sequence (see U.S. Pat. Nos. 4,683,202 and 4,683,195 to K. B. Mullis et al.). Oligonucleotides complementary to these known sequences at both ends serve as "primers" in the PCR procedure.
Double stranded target DNA is first melted to separate the DNA strands, and then oligonucleotide (oligo) primers complementary to the ends of the segment which is desired to be amplified are annealed to the template DNA.
Nucleic acid primers are typically short fragments of nucleic acid, most commonly DNA, that bind to target sequences of nucleic acid, such as RNA or DNA. Binding is based on the complementarity of the primer sequence to the target sequence. According to base pair rules, A binds with T, and C binds with G. Complementarity can be exact, wherein each nucleotide is bound to its complement, or it can be inexact, wherein not all nucleotides are bound to their complements. The oligos serve as primers for the synthesis of new complementary DNA strands, using a DNA polymerase enzyme and a process known as primer extension. The orientation of the primers with respect to one another is such that the 5' to 3' extension product from each primer contains, when extended far enough, the sequence which is complementary to the other oligo primer. Thus, each newly synthesized DNA strand becomes a template for synthesis of another DNA strand beginning with the other oligo as primer. Repeated cycles of melting, annealing of oligo primers, and primer extension lead to a (near) doubling, with each cycle, of DNA strands containing the sequence of the template beginning with the sequence of one oligo primer and ending with the sequence of the other oligo primer.
The key requirement for this exponential increase of template DNA is a pair of oligo primers that are complementary to the ends of the sequence to be amplified and oriented such that their 3' extension products proceed toward each other. If the sequences at both ends of the segment to be amplified are not known, complementary oligos cannot be made and standard PCR cannot be performed.
This limitation severely restricts primer-based reactions from being used for experiments on unknown target DNA sequences.
U.S. Patent No. 5,994,058 ('058 patent) to Senapathy, which is incorporated herein by reference, discloses PCR amplification of a target DNA using a first primer with a completely known sequence (i.e., a fixed, first primer) and a plurality of degenerate, second primers of varying, partly fixed sequence. The fixed, first primer binds at a known sequence, whereas at least one of the degenerate, second primers binds at an unknown sequence downstream from the first primer. The second primer-binding site, whose sequence is completely unknown, is located in the unknown sequence region of the target DNA. The fixed portion of the degenerate, second primer statistically determines the distance at which the degenerate primer binds relative to the primer-binding site for the fixed, first primer. Because the varying sequence region within the degenerate, second primer contains all of the possible sequences (i.e., each possible permutation is represented in the plurality), the degenerate, second primer preparation will contain a species whose full-length sequence is complementary to an a priori unknown binding site in the target DNA. Consequently, PCR amplification takes place between two primers that are complementary to their respective binding sites, although the actual primer sequence at the second primer-binding site is unknown prior to the PCR amplification. The method of the '058 patent will work even with one or a few mismatches between the primers and the target nucleic acid, as long as there is sufficient complementarity to enable standard PCR amplification. Therefore, the method disclosed in the '058 patent permits PCR amplification of an unknown DNA region adjacent to a known DNA region. Once the unknown region has been amplified, it can be sequenced. Consequently, this method can be used to contiguously PCR-amplify a long genomic DNA and completely sequence it without having to resort to shotgun cloning and sequencing.
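Why a degenerate preparation always contains a species fully complementary to an a priori unknown site can be seen by enumerating the pool. This is a minimal sketch assuming every variable position is N (A, C, G or T) and that the fixed portion sits at the primer's 3' end; the function name and the example values are illustrative, not taken from the '058 patent. GGGCCG is the reverse complement of the FIG. 2 fixed sequence CGGCCC, and CGATAGGGCCG is the reverse complement of the FIG. 2 site CGGCCCTATCG.

```python
from itertools import product
from typing import Iterator

def degenerate_pool(fixed: str, n_var: int) -> Iterator[str]:
    """Yield every species of a degenerate primer written 5'->3': n_var fully
    random positions followed by the fixed portion. Placing the fixed portion
    at the 3' end is an assumption made for illustration."""
    for variable in product("ACGT", repeat=n_var):
        yield "".join(variable) + fixed

pool = list(degenerate_pool("GGGCCG", 5))
print(len(pool))                # 1024 species (4**5)
print("CGATAGGGCCG" in pool)    # True: one species matches this site exactly
```

Each additional variable position multiplies the pool by four, which is why the fraction of the pool that matches any single site shrinks as the fixed portion gets shorter.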
The Tm of a DNA molecule is the midpoint of the temperature range over which the two strands of DNA separate. The Tm depends on the proportion of GC base pairs. Because GC base pairs have three hydrogen bonds, whereas AT base pairs have only two, more energy is required to melt the GC base pairs than the AT base pairs. Thus, the more GC base pairs in a given DNA molecule, the more energy it will take to separate the strands.
When designing PCR primers, the Tms of the primers in a primer set should be matched. In conventional PCR amplification, the Tm of the second primer is usually kept within plus or minus 3-5 °C of the Tm of the first primer. Primers with closely matched Tms amplify more efficiently, because each primer is able to bind efficiently and specifically to the template at the same temperature during the annealing step of the PCR.
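For short oligonucleotides, the GC dependence described above is often approximated by the Wallace rule, Tm ≈ 2 °C per A/T plus 4 °C per G/C. The sketch below uses that rule and a simple tolerance check. Both the rule and the 4 °C default tolerance (picked from the 3-5 °C range above) are assumptions for illustration; the text here does not prescribe a Tm formula, and nearest-neighbor models are more accurate.

```python
def wallace_tm(seq: str) -> int:
    """Approximate Tm in deg C by the Wallace rule: 2 per A/T plus 4 per G/C.
    A rough guide for short oligonucleotides only; it ignores salt, primer
    concentration and nearest-neighbor stacking."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError(f"unexpected characters in {seq!r}")
    return 2 * at + 4 * gc

def tms_matched(tm_first: float, tm_second: float, tolerance: float = 4.0) -> bool:
    """True if the second primer's Tm is within +/- tolerance deg C of the first's."""
    return abs(tm_first - tm_second) <= tolerance

print(wallace_tm("CGGCCCTATCG"))  # 38
print(tms_matched(38, 42))        # True
print(tms_matched(38, 44))        # False
```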
In the method described in the '058 patent, the Tm of the fixed, first primer is known because its sequence is fully known. However, the Tm of the degenerate, second primer is not known, because only part of the degenerate, second primer sequence is known and the rest of its sequence is randomized. As noted above, the efficiency of the PCR amplification is significantly diminished when the Tm of one primer does not match the Tm of a second primer. Thus, the object of the present invention is to provide a method of specifically matching the Tm of a fixed, first primer with a completely known sequence to the Tm of a degenerate, second primer with only a partly fixed sequence.
SUMMARY OF THE INVENTION
A first embodiment of the invention is directed to a method of determining a Tm range for a plurality of degenerate oligonucleotide primers having a fixed-sequence portion and a degenerate-sequence portion. The method comprises the steps of: (a) searching a known portion of a nucleic acid template for a sequence complementary to a desired fixed-sequence portion of a primer within the plurality; (b) identifying nucleotide base pairs flanking or interspersed between the sequence of step (a), the base pairs flanking or interspersed between the sequence of step (a) being complementary to at least one degenerate-sequence portion of one of the primers present in the plurality, whereby one or more sequences of potential binding sites for the plurality of primers are elucidated; and then (c) calculating the Tm of primers whose fixed-sequence portion and degenerate-sequence portion are complementary to the potential binding sites of step (b).
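Steps (a) to (c) can be sketched as a string search over the known region. This minimal sketch assumes the fixed part of each binding site is followed by the variable bases on the same strand (only one strand is searched), and it uses the Wallace rule as a stand-in Tm formula; the function names and the assembled template are hypothetical, built from three of the FIG. 2 site sequences.

```python
from statistics import mean

def wallace_tm(seq: str) -> int:
    """Rough Tm in deg C: 2 per A/T plus 4 per G/C (Wallace rule, an assumption)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))

def find_binding_sites(template: str, fixed_site: str, n_var: int) -> list[str]:
    """Steps (a)-(b): find each occurrence of the fixed part of the binding site in
    the known template and extend it by the n_var bases that follow it.
    One strand only; overlapping matches are included."""
    template, fixed_site = template.upper(), fixed_site.upper()
    full_len = len(fixed_site) + n_var
    sites = []
    start = template.find(fixed_site)
    while start != -1:
        site = template[start:start + full_len]
        if len(site) == full_len:  # drop a match truncated at the end of the known region
            sites.append(site)
        start = template.find(fixed_site, start + 1)
    return sites

def tm_range(sites: list[str]) -> tuple[int, int, float]:
    """Step (c): minimum, maximum and mean Tm over all potential binding sites."""
    if not sites:
        raise ValueError("no binding sites found")
    tms = [wallace_tm(s) for s in sites]
    return min(tms), max(tms), mean(tms)

# Hypothetical known region assembled from three of the FIG. 2 sites.
known = "TT" + "CGGCCCTATCG" + "AA" + "CGGCCCAGGCC" + "TT" + "CGGCCCTACCT" + "GA"
sites = find_binding_sites(known, "CGGCCC", 5)
print(sites)  # ['CGGCCCTATCG', 'CGGCCCAGGCC', 'CGGCCCTACCT']
print(tm_range(sites))
```

A real search would also scan the opposite strand and cover "interspersed" variable positions as well as flanking ones; both are left out to keep the sketch short.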
A second embodiment of the invention is directed to a method for selecting a plurality of degenerate primers, each primer within the plurality having a fixed-sequence portion and a degenerate-sequence portion, the plurality of degenerate primers having a fixed-sequence portion that occurs in a nucleic acid template at a frequency different from that expected based on a random distribution of nucleotides in the template. This method comprises the following steps: (a) determining occurrences of the fixed-sequence portion of the degenerate primers in a given template sequence; (b) statistically determining a mean number of occurrences, within a hypothetical template having a random distribution of nucleotide base pairs, of a hypothetical primer equal in length to the degenerate primers; and then (c) selecting the degenerate primers based upon whether the fixed-sequence portion of the primers has a different number of occurrences in the template than the hypothetical primer of the same length in the hypothetical template having a random distribution of nucleotide base pairs.
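Steps (a) to (c) of this embodiment amount to comparing observed k-mer counts with the count expected in a random sequence. This is a sketch under stated assumptions: the expectation uses equal base frequencies on one strand, whereas the text notes that GC content and other biases shift genomic expectations, so a composition-aware model could be substituted. The 4-fold cut-off and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def kmer_enrichment(seq: str, k: int) -> dict[str, float]:
    """Observed/expected ratio for every possible k-mer (zero-count ones included).
    Expected = number of windows / 4**k, i.e. a random sequence with equal base
    frequencies, one strand, overlapping windows."""
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        raise ValueError("sequence shorter than k")
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    expected = n_windows / 4 ** k
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {m: counts[m] / expected for m in kmers}

def select_fixed_sequences(seq: str, k: int, fold: float = 4.0):
    """Return (over, under): k-mers at least `fold` times more, or less, frequent than expected."""
    ratios = kmer_enrichment(seq, k)
    over = sorted(m for m, r in ratios.items() if r >= fold)
    under = sorted(m for m, r in ratios.items() if r <= 1 / fold)
    return over, under

over, under = select_fixed_sequences("ACGT" * 8, 2)
print(over)        # ['AC', 'CG', 'GT']
print(len(under))  # 12 dinucleotides never occur
```

Over-represented k-mers are the candidates for longer fixed portions that bind closer to the first primer; under-represented ones serve the longer-distance or rare-site uses mentioned in the description.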
A third embodiment of the invention is directed to a method of PCR amplification using a fixed oligonucleotide primer and a plurality of Tm-matched degenerate oligonucleotide primers, each degenerate primer having a fixed-sequence portion and a degenerate-sequence portion. The method comprises the steps of: (a) searching a known portion of a nucleic acid template for a sequence complementary to a desired fixed-sequence portion of a primer within the plurality; (b) identifying nucleotide base pairs flanking or interspersed between the sequence of step (a), the base pairs flanking or interspersed between the sequence of step (a) being complementary to at least one degenerate-sequence portion of one of the primers present in the plurality, whereby one or more sequences of potential binding sites for the plurality of primers are elucidated; (c) calculating the Tm of primers whose fixed-sequence portion and degenerate-sequence portion are complementary to the potential binding sites of step (b); and (d) amplifying the template by PCR using an oligonucleotide primer of fixed sequence and the plurality of primers, the fixed-sequence primer having a Tm within the Tm range of the plurality of primers.
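The computational part of this embodiment, choosing fixed sequences whose Tm range brackets the fixed primer's Tm before step (d), can be sketched as a filter and sort. The ranking rule (narrowest Tm spread first, then closest mean Tm) is an illustrative assumption, as are the Wallace-rule Tm and the hypothetical 11-nt first primer; the candidate sites are the four FIG. 2 sequences.

```python
from statistics import mean, pstdev

def wallace_tm(seq: str) -> int:
    """Rough Tm in deg C: 2 per A/T plus 4 per G/C (Wallace rule, an assumption)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))

def rank_candidates(first_primer: str, sites_by_fixed: dict[str, list[str]]) -> list[str]:
    """Keep fixed sequences whose binding-site Tm range contains the first primer's Tm,
    ordered by Tm spread (sharpest first), then by distance of the mean Tm from it."""
    target = wallace_tm(first_primer)
    ranked = []
    for fixed, sites in sites_by_fixed.items():
        tms = [wallace_tm(s) for s in sites]
        if tms and min(tms) <= target <= max(tms):
            ranked.append((pstdev(tms), abs(mean(tms) - target), fixed))
    ranked.sort()
    return [fixed for _, _, fixed in ranked]

first = "GACTGCGCCAG"  # hypothetical fixed, first primer (Wallace Tm 38)
candidates = {
    "CGGCCC": ["CGGCCCTATCG", "CGGCCCAGGCC", "AATGACGGCCC", "CGGCCCTACCT"],  # FIG. 2 sites
}
print(rank_candidates(first, candidates))  # ['CGGCCC']
```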
A preferred version of the present invention relates to a molecular biology method that uses two primers, a first and a second primer (e.g., PCR amplification). The first primer has a completely known sequence. Thus, the Tm of the first primer can be readily calculated. The second primer has a known sequence portion and an unknown sequence portion. The second primer is actually a mixture of primers, each primer within the mixture sharing the same known sequence but having a different unknown sequence portion. The present invention enables the matching of the Tm of the fixed, first primer to the Tm of the degenerate, second primer. The close matching of the two Tms greatly improves the efficiency of a PCR reaction with two such primers.
The degenerate, second primer having a fixed portion is designed to bind to a DNA template, preferably a genomic DNA template, that has an at least partially known DNA sequence. In the known portion of the DNA template, several binding sites are determined for the degenerate, second primer: all of the occurrences of a degenerate, second primer are analyzed in the known portion of a given DNA template. Because the degenerate, second primers are variable, each site where a degenerate, second primer binds will differ from the other sites in its variable portion. However, because a portion of the template DNA sequence is known, the particular sequences of the primer's degenerate portion that bind the template DNA next to the binding sites of the primer's fixed sequence can be determined by examining the template sequence. Various features of the primer sequences, such as the G/C content and the Tm, can be analyzed. The statistical spread of the Tm and the average Tm for all the occurrences of a degenerate, second primer can also be analyzed.
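The per-site analysis described above (G/C content, average Tm and spread) can be summarized with Python's `statistics` module. This sketch again assumes the Wallace rule, which the text here does not name. Applied to the four FIG. 2 sites it gives Tms of 38, 42, 36 and 38 °C, whose mean, median and mode (38.5, 38 and 38 °C) agree with the values reported earlier for that example; treat the agreement as suggestive rather than as confirmation of the patent's formula. The function names are hypothetical.

```python
from statistics import mean, median, mode, pstdev

def wallace_tm(seq: str) -> int:
    """Rough Tm in deg C: 2 per A/T plus 4 per G/C (Wallace rule, an assumption)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

def tm_summary(sites: list[str]) -> dict[str, float]:
    """Average and spread of Tm (plus mean G/C content) over all occurrences
    of one degenerate, second primer. A smaller 'stdev' means a sharper range."""
    tms = [wallace_tm(s) for s in sites]
    return {
        "mean": mean(tms),
        "median": median(tms),
        "mode": mode(tms),
        "stdev": pstdev(tms),
        "mean_gc": mean(gc_fraction(s) for s in sites),
    }

fig2_sites = ["CGGCCCTATCG", "CGGCCCAGGCC", "AATGACGGCCC", "CGGCCCTACCT"]
summary = tm_summary(fig2_sites)
print(summary["mean"], summary["median"], summary["mode"])  # 38.5 38.0 38
```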
In this analysis, the Tms of the various oligonucleotides with a particular fixed sequence that occur in a template DNA sequence are generally distributed in a random manner. Surprisingly, however, the Tms of oligonucleotides with some particular fixed sequences fall within a sharper range, and within an even sharper range for a smaller subset of the fixed sequences.
For some degenerate, second primers, the spread of the Tms is sharp (i.e., the spread of the Tms is small) over a known portion of the template DNA sequence. It was discovered in the present invention that the information from one known portion of a genomic template DNA sequence is also true for another known portion of that genomic DNA, for instance in another chromosome. Therefore, for a given fixed sequence, once the spread of the Tms for a known portion is determined, this information can be used to analyze the unknown portion. Many different given fixed sequences can be analyzed, thereby permitting a determination of a set of oligonucleotides that have a sharp Tm range in a given unknown template DNA sequence.
For instance, if a stretch of one million base pairs of DNA is known from the human genome, the characteristics of the second primer binding sites within this one million-base pair sequence can be analyzed. These characteristics can then be generally applied to the unknown portion of the human genomic DNA. It was also discovered that particular oligonucleotides occur at a much higher frequency in a genomic DNA sequence than statistically expected. When the frequency is high in a known portion of a long DNA sequence, then this phenomenon is also applicable to the unknown sequence region of the template. This uniform high frequency of certain oligonucleotides in a template sequence permits a longer fixed sequence to be incorporated into a degenerate, second primer. This is because more-frequently occurring, longer sequences will appear at intervals closer to the first primer binding site. Consequently, a unit length of unknown DNA sequence (say, 1 kb) can be amplified with a longer fixed-sequence primer. In general, the longer a primer is in its fixed portion, the better it performs in the PCR.
The longer fixed-sequence primer will bind at shorter distances from a given fixed, first primer than statistically expected in a template DNA. This finding permits the present invention to be used in PCR amplification of an unknown DNA sequence with the CGS method more efficiently. Conversely, some oligonucleotides occur at a much lower frequency in a genomic DNA sequence than statistically expected. If amplification between two more distant primers is desired, this feature permits such amplification. Also, primers whose binding sites are unique or rare can be designed to include a corresponding, less frequently occurring sequence.
Further advantages, features, and objects of the invention will be apparent from the following detailed description of the invention in conjunction with the associated drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an illustration showing a method for determining the occurrence of the partly fixed degenerate, second primer in the known template DNA sequence.
FIG. 2 is a drawing depicting the occurrences of a primer having a 6-base fixed sequence within a template, along with the neighboring, unknown sequence of 5 bases.
FIG. 3A shows the distribution of Tms of a degenerate, second primer, the distribution not having a sharp range.
FIG. 3B shows the distribution of Tms of another degenerate, second primer, the distribution having a sharp range.
FIG. 4 is a diagram denoting the frequency distribution of the occurrences of a degenerate, second primer with a particular fixed sequence.
DETAILED DESCRIPTION OF THE INVENTION
The current invention is based on a recently developed method disclosed in U.S. Patent No. 5,994,058 ('058 patent), which describes a new contiguous genome amplifying and sequencing method that allows the contiguous amplification of a very long DNA without the need for it to be subcloned. It uses the basic PCR technique but circumvents the usual PCR requirement for enough knowledge of the target DNA sequence to fabricate two primers for contiguous sequencing. The technology of the '058 patent makes it possible to PCR-amplify DNA of unknown sequence which lies adjacent to a DNA of known sequence using a fixed, first primer and a plurality of degenerate, second primers. The sequence of the binding site of the degenerate, second primer does not need to be known.
The '058 patent method is for amplifying an unknown DNA using a primer complementary to a known sequence region of the target DNA, and a partly degenerate, second primer that will bind downstream from the first primer, in a region of unknown sequence. The plurality of degenerate, second primers each contain a randomized sequence portion and a fixed sequence portion, making the second primers partly degenerate. The fixed sequence portion statistically determines the location of its binding in relation to the fixed, first primer (in the 3' direction). In other words, the number of fixed nucleotides in the degenerate, second primer statistically determines the average distance at which the degenerate, second primer is expected to bind downstream from the fixed, first primer. With a standard stringent annealing temperature, the actual sequence of the fixed nucleotides in the second primer determines exactly where the second primer binds. The '058 patent does not contemplate determining the annealing temperature (i.e., the Tm) of the degenerate, second primer. Because the exact nucleotide sequence of any given primer within the plurality of degenerate, second primers (other than the fixed nucleotide sequence) is random, in the '058 patent the Tm of the degenerate, second primer cannot be matched with that of the fixed, first primer. Such matching is desirable because it generates a very efficient PCR amplification of the DNA. Therefore, a method for determining the Tm of the degenerate, second primer is greatly beneficial for matching the Tm of the second primer with that of a fully known first primer. The current method permits the determination of the average Tm of a number of degenerate, second primers with different partly fixed sequences. For a given fixed, first primer (and thus with a specific Tm), a degenerate, second primer can be chosen that: (1) matches the first primer's Tm; and (2) binds at an appropriate distance from the first primer.
The information for selecting a desired Tm of the degenerate, second primer is derived by analysis of the known DNA sequence regions in a template genomic DNA. This Tm information is applicable to the unknown sequence regions of the template, thereby allowing for judicious selection of the plurality of second, degenerate primers.
A method is presented for determining a sharp (i.e., narrow) range of Tms for a degenerate, second primer with a partly fixed sequence, based on the sequence information present in a template genomic DNA. This method permits the sharp matching of the Tm of a fixed, first primer having a known sequence with the Tm of the degenerate, second primer, thereby optimizing PCR amplification using degenerate primers. It was discovered that some primer species of a plurality of degenerate, second primers have a consistently sharp Tm range within the known portion of the DNA target. It has been determined in the present invention that this feature also applies uniformly to the unknown portion of the genomic DNA, which thereby permits the definition of a sharp Tm range for a given plurality of degenerate, second primers. A database of many different degenerate, second primers with a defined Tm permits the matching of the Tm of the degenerate, second primers to that of a fixed, first primer.
A second aspect of the invention is based on the discovery that the frequency of particular oligonucleotide sequences is far higher than statistically expected in a given template DNA, such as human genomic DNA. This feature also applies to the unknown sequence portion of the DNA. Different known sequences are analyzed to determine their frequency of occurrence within the template DNA. By analyzing the template DNA for the frequencies of many different sequences of varying length, a list of potential binding sites that occur at high frequency and are uniformly distributed throughout the known sequence region is generated. This information is used to generate complementary oligonucleotides of fixed sequence for use as first primers or second primers.
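Whether a high-frequency sequence is spread uniformly through the known region, rather than clustered, can be checked by counting it in consecutive windows. This is a sketch under stated assumptions: the window size and the 2-fold uniformity criterion are hypothetical choices for illustration, since no particular uniformity test is specified, and only one strand is scanned.

```python
def window_counts(seq: str, kmer: str, window: int) -> list[int]:
    """Count occurrences of `kmer` that start in each consecutive window of `seq`
    (one strand, overlapping matches included; the last window may be shorter)."""
    if not kmer or window <= 0:
        raise ValueError("kmer must be non-empty and window positive")
    seq, kmer = seq.upper(), kmer.upper()
    counts = [0] * max(1, -(-len(seq) // window))  # ceiling division
    start = seq.find(kmer)
    while start != -1:
        counts[start // window] += 1
        start = seq.find(kmer, start + 1)
    return counts

def roughly_uniform(counts: list[int], max_fold: float = 2.0) -> bool:
    """True if the k-mer occurs in every window and no window holds more than
    `max_fold` times the count of the sparsest one (an illustrative criterion)."""
    return min(counts) > 0 and max(counts) <= max_fold * min(counts)

counts = window_counts("ACGTTGCA" * 4, "ACGT", 16)
print(counts, roughly_uniform(counts))  # [2, 2] True
```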
Next, for each of these highly frequent, fixed oligonucleotide sequences, the Tm average for all occuπences of a plurality of degenerate, second primers in a known template DNA sequence is determined using the method described below.
This permits the selection of a plurality of degenerate, second primers with longer fixed sequences for binding at shorter distances from the first primer in a given genomic DNA; the plurality also will have a sharply defined Tm average.
Another preferred embodiment of this invention provides a method for the definition and development of degenerate, second primers specifically applicable to the genomes of different species. Because the oligonucleotide distribution characteristics of different genomes vary, the present invention can be used to find distinct sets of degenerate, second primers, with different fixed sequences, that are specifically applicable to different genomic sequences. These sequence properties can also be used to design first primers that are species-specific.
An immediately apparent advantage of the present invention is that using a database of different pluralities of highly frequent oligonucleotides with sharply defined average Tms in a given template DNA allows the selection of primers that produce more efficient PCR amplification. The method also permits the efficient amplification of an unknown DNA sequence downstream of a known sequence, as if the second primer sequence were fully fixed rather than only partially fixed. This method is also advantageous in PCR walking of large genomic DNA.
Determining the Sequence of the Variable Portion of the Degenerate, Second Primer from the Known Sequence of the Template DNA
As FIGS. 1-2 illustrate, the binding sites of the degenerate, second primers are first determined within the known portion of the DNA template. This is accomplished by searching the known portion of the DNA template for the fixed portion of the degenerate, second primer. Wherever the fixed portion occurs in the template DNA, bases are added to one side of the fixed sequence to obtain the full primer-binding site. From this information, the sequence of the degenerate, second primer that binds to that site (the reverse complement of the site) is known, including both its fixed and variable portions. The variable portion of the degenerate, second primer may differ at the different locations where it binds the DNA template. However, at all of these sites, the sequence complementary to the variable portion is known from the known portion of the DNA template.
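As a rough illustration of the search just described, the following Python sketch scans the known portion of a template for a fixed sequence and extends each hit to the full primer length. It is a sketch under stated assumptions, not the patented procedure itself: it searches only the strand it is given, adds the variable bases on the right-hand (downstream) side, and skips hits too close to the end of the template to be fully extended. The names `find_binding_sites` and `reverse_complement` are hypothetical.

```python
def find_binding_sites(template: str, fixed: str, total_len: int) -> list[tuple[int, str]]:
    """Return (position, site) for each occurrence of `fixed` in `template`,
    extended to the right so that every site is `total_len` bases long.

    Assumptions: one strand only, extension on the right-hand side, and
    occurrences that cannot be fully extended are skipped.
    """
    template, fixed = template.upper(), fixed.upper()
    if total_len < len(fixed):
        raise ValueError("total_len must be at least len(fixed)")
    sites = []
    start = template.find(fixed)
    while start != -1:
        end = start + total_len
        if end <= len(template):
            sites.append((start, template[start:end]))
        start = template.find(fixed, start + 1)  # also finds overlapping hits
    return sites


def reverse_complement(seq: str) -> str:
    """Sequence of the primer species that anneals to `seq`."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


# Hypothetical template containing two copies of the fixed 6-mer CGGCCC.
template = "GGCGGCCCTATCGAACGGCCCAGGCCTT"
sites = find_binding_sites(template, "CGGCCC", 11)
print(sites)                           # [(2, 'CGGCCCTATCG'), (15, 'CGGCCCAGGCC')]
print(reverse_complement(sites[0][1]))  # CGATAGGGCCG
```

Searching the opposite strand as well would amount to running the same function on `reverse_complement(template)`.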
Next, the Tm's of the species within the plurality of degenerate, second primers are determined by a method such as the one detailed below. This permits the statistical features of the variable portions to be computed for a known DNA template. For a given degenerate, second oligonucleotide primer, for example, a 16-mer, of which 8 bases are fixed and 8 bases are variable, the Tm of the known sequence of 8 bases is inherent in the sequence itself. Thus, it can be calculated based on a known formula, such as 2°C for every A or T, and 4°C for every C or G. Other methods to calculate Tm that are known in the art can be used.
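The 2°C/4°C rule quoted above (commonly called the Wallace rule) can be expressed directly. The minimal Python sketch below is only a rough approximation suited to short oligonucleotides; nearest-neighbor methods known in the art could be substituted. The function name `wallace_tm` is hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Approximate Tm in °C: 2 °C per A or T plus 4 °C per G or C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence may contain only A, C, G and T")
    return 2 * at + 4 * gc


print(wallace_tm("CGGCCC"))       # 24 (fixed portion: 6 x 4 °C)
print(wallace_tm("CGGCCCTATCG"))  # 38 (8 G/C x 4 °C + 3 A/T x 2 °C)
```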
For the 8 variable bases next to the fixed sequence, the average Tm can be predicted quite accurately from the complementary sequences found in the known portions of the template DNA, and this average remains accurate even when unknown portions of the template DNA are amplified. The particular primer species in the degenerate, second primer mixture that binds to a given primer-binding site is determined by the exact template sequence flanking the fixed-sequence binding site. The Tm of the degenerate, second primer at that site is therefore determined by the template sequence at that location.
Within the known sequence regions of the template, all of the binding locations of a degenerate, second primer with a particular fixed sequence can be identified by examining the sequences adjacent to each occurrence of the fixed sequence. Thus, if at least part of the template DNA sequence is known, all of the full-length sequences of a degenerate, second primer with a particular fixed sequence that can bind within it can be determined, and these data can be analyzed for all of the full-length primers sharing that fixed sequence. The actual distribution of bases and sequences in a given genomic sequence depends on factors inherent and specific to that template DNA, such as its G+C content and other sequence biases and preferences. It was determined that this is true for a subset of all the possible sequences of a given length. Several features of the Tm and other parameters of the primer sequences with respect to a template DNA can therefore be determined. For example, because of biological sequence biases within a given template DNA, oligonucleotides with varying statistical and other characteristics occur within it. The systematic analysis of the present invention provides a method for choosing oligonucleotides with particular sequence biases that can be very advantageous in molecular biology techniques such as PCR genome walking.
If the sequence of at least part of a genome is known, the sequences flanking a particular fixed base sequence in the known portions of the template DNA can be determined and their features subjected to statistical analysis. From this information, the Tms of the corresponding full-length primers complementary to the fixed sequence and the variable flanking regions can be determined, together with the number of primers having each Tm and the resulting frequency distribution. If most primers fall within a narrow range of Tm, this range can be used as the most effective Tm range for second primers having that fixed base sequence portion. That is, if the Tms of the degenerate, second primers with a given fixed sequence portion fall within a narrow range of, for example, 5°C, then any one of these primers whose Tm falls roughly in the middle of that 5°C range will be an optimally effective second primer. In this manner, the characteristic features of the sequences flanking the fixed base sequence in the degenerate, second primer can be derived from the known sequence information of the long template DNA.
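One way to locate the narrow range described above is to slide a window of fixed width (5°C, as in the example) over the sorted Tm values and keep the window that holds the most binding sites. The Python sketch below assumes the Tms have already been computed for every occurrence in the known template and that candidate windows start at an observed Tm; `densest_tm_window` is a hypothetical name, and the input values are illustrative.

```python
from bisect import bisect_right


def densest_tm_window(tms: list[float], width: float = 5.0) -> tuple[float, float, float]:
    """Return (low, high, fraction) for the `width`-°C window holding the most Tms.

    Candidate windows start at each observed Tm, so `low` is always one of
    the input values.
    """
    if not tms:
        raise ValueError("need at least one Tm value")
    values = sorted(tms)
    best_low, best_count = values[0], 0
    for i, low in enumerate(values):
        # Tms in [low, low + width]; everything from index i onward is >= low.
        count = bisect_right(values, low + width) - i
        if count > best_count:
            best_low, best_count = low, count
    return best_low, best_low + width, best_count / len(values)


# Illustrative values: the four Tms of Table 1 plus one hypothetical outlier.
low, high, fraction = densest_tm_window([36, 38, 38, 42, 50])
print(low, high, fraction)  # 36 41.0 0.6
print((low + high) / 2)     # 38.5 (mid-window candidate Tm)
```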
Sharper Tm Ranges of Degenerate, Second Primers for Certain Fixed Sequences with Respect to a Template DNA
In conventional PCR amplification, the Tms of a first and a second primer are normally within 3-5°C of each other. It is a great advantage if more than 60% of the primer species in a plurality of degenerate, second primers of the present invention fall within this range. At this percentage, by using two such pluralities with different fixed base regions whose Tms lie within the given range, the probability is close to 1 that one of the degenerate, second primers will match the Tm of the first primer within the desired range. It was found that the Tm distribution of many degenerate, second primers is generally random, i.e., it shows no particular statistical deviation from a purely random distribution. However, the Tm distribution of some degenerate, second primers is a sharp, bell-shaped curve (i.e., a normal distribution), and a few of these curves are very sharp, with a Tm spread of about 2°C. FIGS. 3A and 3B illustrate two different pluralities of degenerate, second primers having different fixed sequences: FIG. 3A has the fixed sequence CGGGGCCG, and FIG. 3B has the fixed sequence GGCGGGCGG. The binding sites within a known portion of a template DNA were determined for each degenerate, second primer, and the position of each binding site within the DNA template, the sequence of the binding site, and its Tm are listed, respectively. An "F" to the left of a binding-site position indicates that the primer has the forward sequence in the template; the absence of an "F" indicates that the primer has the reverse-complement sequence of the template.
A degenerate primer (e.g., a 16-mer) having 8 fixed bases and 8 variable bases will have 4^8 (65,536) different sequences in its 8-base variable region.
However, in a template DNA sequence of one million bases, only about 16 occurrences of a particular 8-base sequence are expected. Although the Tm of the 8 fixed bases is constant, the Tm of the variable bases in the individual species of the plurality of degenerate primers can vary between a minimum when all of the bases are As or Ts (2°C x 8 = 16°C) and a maximum when all are Cs or Gs (4°C x 8 = 32°C). As noted above, however, by analyzing the known sequence portions of the template DNA, the regions flanking the fixed portion of the degenerate, second primers can be determined, and this knowledge can be used to compute an average Tm for the subset of second primers that will actually bind to the template. While the Tms of degenerate primers with many different fixed sequences are distributed in a template DNA in a random manner, some show a sharp bell-shaped curve with a narrow range of Tm, which deviates from what would be expected for a random sequence. In the example shown in FIG. 3A, the Tm range is quite broad, whereas in the example shown in FIG. 3B, it is very sharp. In FIGS. 3A-B, the distribution of the sequences' Tms is shown, with the mean, median, and mode given below the sequences. The template DNA used in these examples was human genomic DNA. Comparing FIG. 3A with FIG. 3B, two points are notable. First, the Tm distributions of the primers of FIGS. 3A and 3B are both bell-shaped curves. Second, as noted above, the curve of FIG. 3A is not sharp, whereas the curve of FIG. 3B is. In using the inventive method to analyze empirically the Tms in known genomic DNA sequences, such as the human genomic DNA illustrated here, it was discovered that for certain primer sets the Tms fall within a dramatically short range of 5-10°C. From a larger list of such primers, a subset whose Tms fall within a much narrower range of 3-5°C can be chosen.
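The arithmetic in this paragraph can be checked with a few lines of Python. This sketch assumes a 16-mer and the 2°C/4°C rule; it shows how the number of variable-region species and the possible Tm spread of the variable bases shrink as the fixed portion grows. `variable_tm_bounds` is a hypothetical name.

```python
def variable_tm_bounds(primer_len: int, n_fixed: int) -> tuple[int, int, int]:
    """Return (min Tm, max Tm, number of species) for the variable bases.

    Uses the 2 °C (A/T) / 4 °C (G/C) rule: all-A/T gives the minimum and
    all-G/C the maximum contribution of the variable region.
    """
    n_var = primer_len - n_fixed
    if n_var < 0:
        raise ValueError("n_fixed cannot exceed primer_len")
    return 2 * n_var, 4 * n_var, 4 ** n_var


for n_fixed in (6, 8, 10, 12):
    low, high, species = variable_tm_bounds(16, n_fixed)
    print(f"{n_fixed} fixed: variable Tm {low}-{high} °C, {species:,} species")

# Expected hits of one 8-base sequence in 10**6 bases of one strand,
# assuming equal base frequencies: about 15.3.
print(10**6 / 4**8)
```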
Furthermore, the range of the Tm is controlled by the length of the fixed sequence in the degenerate, second primer. The longer the fixed sequence within a degenerate, second primer, the narrower the range in which the Tms of the different second-primer binding sites occur. From a large set of such primers, a smaller subset can be chosen with a very narrow distribution of Tms within a desired range.
Higher Frequency of Degenerate, Second Primers with Certain Fixed Sequences in a Template DNA
It was also discovered that in a given template DNA sequence, some oligonucleotides occur at a far higher frequency than is statistically expected, sometimes 50-100 times more often. FIG. 4 illustrates the frequency distribution of the occurrence of exemplary degenerate, second primers with particular fixed sequences. As in the previous example, human genomic DNA was the template used to generate the data presented in FIG. 4. A degenerate, second primer, for example, a 16-mer with a particular fixed sequence of 8 bases and 8 variable bases, will have 4^8 (65,536) different sequences in its variable portion. In a template sequence of, for example, one million bases, only about 16 occurrences of a particular 8-base sequence are expected. However, in a biological DNA such as the human genome, a given fixed sequence of 8 bases may occur at the expected frequency, at a frequency much lower than expected, or at a frequency much higher than expected. In the sample shown in FIG. 4, a sequence of about 500,000 bases of the human genome was analyzed for the occurrence of various fixed-sequence 9-mers. The number of occurrences is listed to the right of each sequence, and a map showing the approximate positions of the occurrences is shown to the right of the frequency. As FIG. 4 shows, the number of occurrences for the various 9-mers ranges from 6 to 62. That is, for the 9-mers analyzed in this 500,000-base portion of the human genome, one primer occurred only 9 times (an average of one occurrence per 55,555 bases), while at the other extreme, one primer occurred 62 times (an average of one occurrence per 8,065 bases).
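The comparison between observed and expected occurrences can be sketched as follows. This Python sketch assumes a single strand and equal base frequencies, so the expected count of a k-mer in a template of length L is L / 4^k; it also reports the mean spacing L / count, which is how the per-base averages above follow from the occurrence counts. `occurrence_report` is a hypothetical name, and the short template is a toy input.

```python
def occurrence_report(template: str, kmer: str) -> dict[str, float]:
    """Observed vs. expected occurrences of `kmer` on one strand of `template`.

    Assumes equal base frequencies, so a k-mer is expected len(template) / 4**k
    times; overlapping occurrences are counted.
    """
    template, kmer = template.upper(), kmer.upper()
    count = sum(
        1 for i in range(len(template) - len(kmer) + 1)
        if template.startswith(kmer, i)
    )
    expected = len(template) / 4 ** len(kmer)
    return {
        "observed": count,
        "expected": expected,
        "enrichment": count / expected,
        "mean_spacing": len(template) / count if count else float("inf"),
    }


print(occurrence_report("ACGTACGTACGT", "ACGT"))
# {'observed': 3, 'expected': 0.046875, 'enrichment': 64.0, 'mean_spacing': 4.0}

# The FIG. 4 spacings: 500,000 bases divided by 62 and by 9 occurrences.
print(round(500_000 / 62), int(500_000 / 9))  # 8065 55555
```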
By using a particular long fixed sequence that occurs highly frequently, one can obtain a narrower range of Tm across all of its occurrences. Another advantage of using a longer fixed sequence in the second primer is that a smaller quantity of primer can be used in the PCR reaction. This is because the more fixed bases a primer has, the fewer variable bases it has, and hence the fewer permutations of the degenerate, second primer there are. Thus, a larger fraction of the plurality of degenerate, second primers will have a sequence complementary to a given binding site in the template. This makes it statistically more probable that the particular degenerate, second primer sequence will be supplied in a molar amount equivalent to that of the first primer.
It was also discovered that the statistical distribution features of degenerate, second primers with certain fixed sequences are uniform across a template DNA, so the features observed in a known sequence region can be applied to the unknown sequence region. The method uses the information present in the known sequence region of a genomic DNA to identify the actual sequences of a degenerate, second primer that are capable of binding to the template DNA in the unknown region. It is known that the frequency of different oligonucleotides in a given genomic DNA can deviate from the frequency expected for a random DNA sequence because of the GC content of the genomic DNA and many other types of sequence bias inherent in it. However, it was found that when a given oligonucleotide sequence occurs at a higher frequency in a known portion of a template genomic DNA, it is uniformly highly frequent throughout the entire genomic DNA, including the unknown portion, and the frequency is not greatly biased in different regions of the genome. Thus, the frequency distribution of binding sites for a given primer in a known region of a genomic DNA can be expected to hold in the unknown sequence regions as well. Therefore, the findings of the current invention obtained from a known sequence portion of a given genome can be applied to the unknown sequence portion of the same genome. On this basis, the Tm range of the partly fixed degenerate, second primers determined from the known sequence regions can be applied to the unknown sequence regions of a genomic DNA.
Procedure
A preferred procedure in accordance with the present invention is as follows. As illustrated in FIG. 2, the known sequence region of a template DNA is searched for the presence of a given fixed oligonucleotide (e.g., CGGCCC) of a given length (here, 6).
When the fixed sequence is found in the template, the neighboring bases making up the full length of the degenerate, second primer are also determined by adding bases to one side of the fixed sequence. In the example of FIG. 2, the following sequences, each containing the fixed sequence CGGCCC, are found in the template: CGGCCCTATCG, CGGCCCAGGCC, AATGACGGCCC, and CGGCCCTACCT.
Wherever the fixed sequence occurs, the annealing temperature (Tm) is computed for the complete primer of the given length that includes the fixed nucleotide sequence and its adjacent bases (e.g., 11 nucleotides, including the 6 fixed nucleotides). The Tms of all occurrences of these sequences are determined. For the example in FIG. 2, the Tm of the fixed portion is 4°C x 6 = 24°C. The Tms of the variable portion and of the full binding site are shown in Table 1 below (all temperatures in °C).
Table 1
Binding site | Sequence of variable portion | Tm of variable portion | Tm of full binding site
First | TATCG | (4° x 2) + (2° x 3) = 14° | 24° + 14° = 38°
Second | AGGCC | (4° x 4) + (2° x 1) = 18° | 24° + 18° = 42°
Third | AATGA | (4° x 1) + (2° x 4) = 12° | 24° + 12° = 36°
Fourth | TACCT | (4° x 2) + (2° x 3) = 14° | 24° + 14° = 38°
The frequency of occurrence of each Tm is also determined. For the example of FIG. 2, the frequencies are 36°C (25%), 38°C (50%), and 42°C (25%). The mean, median, and mode are also determined: for the example of FIG. 2, the mean Tm is 38.5°C, the median is 38°C, and the mode is 38°C.
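The worked example of FIG. 2 and Table 1 can be reproduced with Python's standard `statistics` module. The four variable portions are taken from Table 1; the helper name `wallace_tm` is hypothetical and, as a simplification, assumes the sequences contain only A, C, G and T.

```python
from collections import Counter
from statistics import mean, median, mode


def wallace_tm(seq: str) -> int:
    """2 °C per A/T, 4 °C per G/C (assumes only A, C, G, T are present)."""
    return sum(4 if base in "GC" else 2 for base in seq.upper())


fixed = "CGGCCC"
variable_parts = ["TATCG", "AGGCC", "AATGA", "TACCT"]  # from Table 1

fixed_tm = wallace_tm(fixed)  # 24
full_tms = [fixed_tm + wallace_tm(v) for v in variable_parts]
counts = Counter(full_tms)
frequencies = {tm: counts[tm] / len(full_tms) for tm in sorted(counts)}

print(full_tms)     # [38, 42, 36, 38]
print(frequencies)  # {36: 0.25, 38: 0.5, 42: 0.25}
print(mean(full_tms), median(full_tms), mode(full_tms))  # 38.5 38.0 38
```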
For the examples given in FIGS. 3A-B, the frequency graphs showed that the Tm distribution is generally a bell-shaped curve (normal distribution). The important information that can be derived from the curve is the most probable Tm of the degenerate, second primers with a particular fixed sequence that bind a given genomic DNA. Usually, the longer the fixed sequence in the degenerate, second primer, the sharper the curve; for different fixed sequences of a given length, the sharpness of the curve can also vary. An advantage of the current method is that the Tm of the degenerate, second primer can be matched to the Tm of the fixed, first primer, and this matching produces stronger amplification of the DNA between the two primers.
Finding a degenerate, second primer with a longer fixed sequence that occurs at a higher frequency than expected in a template DNA is advantageous in molecular biology techniques such as PCR genome walking. If, for example, a primer with 6 fixed nucleotides occurs at 4 times the expected frequency, then the waiting intervals (i.e., the distances) between consecutive occurrences of this primer will be those expected for a 5-nucleotide fixed sequence. If the 6-mer occurs at 16 times the expected frequency, the waiting intervals will be those expected for a 4-mer. Likewise, an 8-mer fixed sequence occurring at 64 times the expected frequency in a template DNA will have the waiting interval normally expected for a 5-mer fixed sequence in that template DNA. In other words, the waiting interval from a fixed, first primer binding site to the binding site of a degenerate, second primer with a 5-mer fixed sequence is normally 1024 bases.
But if a given 8-mer occurs 64 times more often than expected, then the waiting interval from the first primer binding site to that of a degenerate, second primer with this 8-mer fixed sequence is also 1024 bases.
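The waiting-interval arithmetic above follows from assuming a random template with equal base frequencies, in which a fixed k-mer is expected once every 4^k bases on one strand; an enrichment factor e divides that interval, which is equivalent to removing log4(e) bases from the fixed sequence. A minimal Python sketch, with hypothetical function names; the last line applies the same arithmetic to an 11-mer enriched 256-fold (4^4), which behaves like a 7-mer.

```python
import math


def waiting_interval(n_fixed: int, enrichment: float = 1.0) -> float:
    """Expected bases between occurrences of a fixed n_fixed-mer on one strand,
    assuming equal base frequencies, divided by any observed enrichment."""
    return 4 ** n_fixed / enrichment


def equivalent_fixed_length(n_fixed: int, enrichment: float) -> float:
    """Length of a non-enriched fixed sequence with the same waiting interval."""
    return n_fixed - math.log2(enrichment) / 2  # log base 4 of the enrichment


print(waiting_interval(5))            # 1024.0
print(waiting_interval(8, 64))        # 1024.0
print(equivalent_fixed_length(6, 4))  # 5.0
print(equivalent_fixed_length(8, 64))  # 5.0
print(waiting_interval(11, 4 ** 4))   # 16384.0
```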
The importance of this discovery is that the Tm of a degenerate, second primer with an 8-mer fixed sequence can be predicted more sharply than that of one with a 5-mer fixed sequence. Therefore, to amplify a DNA of a length equivalent to the waiting interval for a 5-mer in a template DNA, it is now possible to use a degenerate, second primer containing an 8-mer fixed sequence that occurs at a 64-fold higher frequency. Furthermore, a 64-fold smaller amount of the degenerate, second primer preparation with the 8-mer fixed sequence suffices in the PCR reaction, compared with a second primer having a 5-mer fixed sequence. This is because a degenerate, second primer with an 8-mer fixed sequence has 64-fold fewer possible sequences than one with a 5-mer fixed sequence.
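The 64-fold figure can be checked by counting the species in each degenerate mixture. This sketch assumes both primers have the same total length (16 nucleotides, as in the earlier examples) and that every variable position is fully degenerate (A, C, G or T); `n_species` is a hypothetical name.

```python
def n_species(primer_len: int, n_fixed: int) -> int:
    """Number of distinct sequences in a fully degenerate primer mixture."""
    return 4 ** (primer_len - n_fixed)


five_mer_fixed = n_species(16, 5)   # 4**11 = 4,194,304 species
eight_mer_fixed = n_species(16, 8)  # 4**8 = 65,536 species
print(five_mer_fixed // eight_mer_fixed)  # 64
```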
This method is advantageous in amplifying both short and long pieces of DNA from a DNA template. For example, using a longer sequence equivalent in frequency to a 5-mer fixed sequence, one can amplify approximately 1000 base pairs. By using a highly frequent longer sequence (e.g., an 11-mer) equivalent in frequency to a 7-mer fixed sequence, a DNA fragment of approximately 16 kb can be amplified under long PCR conditions. Once DNA has been amplified using a fixed, first primer and a degenerate, second primer, the amplified DNA can be sequenced. Thus, at least one version of the invention provides for sequencing DNA templates from unknown DNA regions.
UTILITY OF THE INVENTION
A method is presented for specifying a sharp range of Tm for a plurality of degenerate, second primers with a partly fixed sequence. The Tm range is determined from the information present in the known portion of a template DNA. The method enables the sharp matching of the Tm of a fixed, first primer of fully known sequence with that of a plurality of degenerate, second primers in which only a portion of each second primer's sequence is fixed. The basis of this method lies in finding the actual primer species of a degenerate, second primer with a particular fixed sequence that bind a given known sequence portion of a long template DNA, and then analyzing the Tms of these species. It also lies in the discovery that when the primer species of degenerate, second primers with certain fixed sequences have a consistently sharp Tm range within the known portion of the template DNA, this feature applies uniformly to the unknown portion of the template DNA. This aspect of the invention allows a sharp Tm range to be defined for a given degenerate, second primer with a particular fixed sequence. A database of many such degenerate, second primers with different fixed sequences permits degenerate, second primers with particular average Tms to be matched to the Tm of a fixed, first primer whose sequence is fully known.
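The database lookup described above might be sketched as follows. The record layout, the 2.5°C tolerance, and the 5°C maximum spread are illustrative assumptions (the spread limit echoes the "no more than about 5°C" ranges in the claims), and `DegeneratePrimerSet` and `match_second_primers` are hypothetical names. The database entries are placeholders, not data from the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DegeneratePrimerSet:
    """Summary of one fixed sequence's Tm distribution in the known template."""
    fixed_sequence: str
    mean_tm: float    # mean Tm over all binding sites found, °C
    tm_spread: float  # max Tm minus min Tm over those sites, °C


def match_second_primers(
    first_tm: float,
    database: list[DegeneratePrimerSet],
    tolerance: float = 2.5,
    max_spread: float = 5.0,
) -> list[DegeneratePrimerSet]:
    """Return sets whose Tm range is sharp enough and whose mean Tm lies within
    `tolerance` °C of the first primer's Tm, closest match first."""
    candidates = [
        entry for entry in database
        if entry.tm_spread <= max_spread
        and abs(entry.mean_tm - first_tm) <= tolerance
    ]
    return sorted(candidates, key=lambda entry: abs(entry.mean_tm - first_tm))


# Hypothetical placeholder entries.
database = [
    DegeneratePrimerSet("seq-A", mean_tm=52.0, tm_spread=3.0),
    DegeneratePrimerSet("seq-B", mean_tm=55.5, tm_spread=2.0),
    DegeneratePrimerSet("seq-C", mean_tm=54.0, tm_spread=9.0),
]
matches = match_second_primers(54.5, database)
print([m.fixed_sequence for m in matches])  # ['seq-B', 'seq-A']
```

Sorting by the absolute Tm difference puts the closest match first; "seq-C" is excluded despite its close mean because its Tm range is too broad.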
First, using the human genome as an example, it was discovered that a few particular sequences among all the possible oligonucleotides of a given length, (e.g., 7-mer, 8-mer, or 12-mer) occur in a given genomic DNA at a much higher frequency than statistically expected in a random DNA sequence template.
Choosing one such oligonucleotide sequence as the fixed part of the degenerate, second primer means that, statistically, it will bind at a shorter distance from the fixed, first primer binding site on the template DNA than would normally be expected. FIGS. 3A and 3B show the frequency distribution of the Tms of primers with particular fixed sequences in a template DNA; the same stretch of human genomic DNA sequence was analyzed for both figures. They demonstrate that some degenerate, second primers have a sharp distribution of Tm (see FIG. 3B).
FIG. 4 shows examples of the occurrences of given fixed sequences in a stretch of human genomic DNA. Many of the primers occur at a higher frequency than would be expected for a random sequence. It can be seen from FIG. 4 that if a degenerate, second primer with a particular fixed sequence portion occurs at a much higher frequency, then on average it will bind at a shorter distance from the first primer binding site than a primer occurring at the normally expected frequency. Therefore, analysis of a given genomic DNA sequence of sufficient length enables the identification of a set of many oligonucleotide sequences that occur at high frequency.
In addition, it was found in practicing the present invention that if a particular oligonucleotide sequence occurs at a high frequency in one region of the genomic DNA, it also occurs with a similar frequency in other regions of the genomic DNA. Consequently, such an oligonucleotide will occur at a similarly high frequency in an unknown sequence region of a template DNA as in the known sequence region. Without being limited to a hypothetical explanation for this phenomenon, it is thought that the higher than expected frequency of certain oligonucleotide sequences may be due to biologically functional and structural aspects of a given genomic DNA.
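A simple way to check whether an oligonucleotide's frequency is similar across regions is to count its occurrences in consecutive, non-overlapping windows of the known sequence. The Python sketch below assumes a single strand, assigns each occurrence to the window containing its start position, and includes any final partial window; `windowed_counts` is a hypothetical name and the templates are toy inputs.

```python
def windowed_counts(template: str, kmer: str, window: int) -> list[int]:
    """Occurrences of `kmer` per non-overlapping window of `template`.

    Each occurrence is assigned to the window containing its start position;
    a final partial window is included.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    template, kmer = template.upper(), kmer.upper()
    n_windows = -(-len(template) // window)  # ceiling division
    counts = [0] * n_windows
    start = template.find(kmer)
    while start != -1:
        counts[start // window] += 1
        start = template.find(kmer, start + 1)
    return counts


print(windowed_counts("ACGT" * 6, "ACGT", 8))             # [2, 2, 2] (uniform)
print(windowed_counts("ACGT" * 2 + "T" * 8, "ACGT", 8))  # [2, 0] (clustered)
```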
Another aspect of the present invention is that the longer the fixed sequence portion of a degenerate, second primer, the more sharply its Tm can be defined with respect to a template DNA. Furthermore, it has been discovered that, owing to the non-random sequence bias of a given genomic DNA, degenerate, second primers with empirically determined, highly frequent fixed oligonucleotide sequences have a much sharper Tm distribution over all of their occurrences in the template DNA than other oligonucleotide sequences. By analyzing a large number of oligonucleotides for their Tm distributions, a set of degenerate, second primers with particular oligonucleotide sequences that have a sharp Tm distribution in a known template DNA can be chosen for use in a molecular biology reaction, such as amplification of DNA of unknown sequence.
A combination of the first and second aspects of the invention permits the selection of a degenerate, second primer with a longer fixed sequence portion that binds at a shorter distance from the fixed, first primer binding site than would be expected, while the plurality of degenerate, second primers still has a sharp Tm distribution. Advantages of this capability include the following: 1) the Tms of the first and second primers can be matched; 2) a longer fixed sequence portion in the degenerate, second primers yields a sharper Tm range; 3) a smaller amount of the degenerate, second primer mixture can be used in a PCR reaction to provide a molar equivalent of the particular species of degenerate, second primer with full-length complementarity at the second primer binding site; and 4) more characteristics of the degenerate, second primer are amenable to analysis and selection for optimum PCR amplification between the fixed, first primer and the degenerate, second primer.
In addition to the above uses, the degenerate, second primers with a higher or lower frequency than expected find use in processes that do not include PCR, such as reverse transcription. Because the lower frequency of such primers is applicable in certain processes, the invention provides a method to find such primers.
These methods find many applications, for example, in contiguous genome sequencing; in finding oligomers that may melt a given genomic DNA in vitro or in vivo for possible diagnostic or therapeutic applications; in finding important anti-sense RNAs or DNAs that could inhibit the functions of such oligonucleotides; and in many other biological or biochemical applications that may use such characteristics of the oligonucleotides. These other uses are within the purview of the present invention. It is understood that the various preferred embodiments are shown and described above to illustrate different possible features of the invention and the varying ways in which these features may be combined. Apart from combining the different features of the above embodiments in varying ways, other modifications are also considered to be within the scope of the invention.
First, the Tm of either the binding site or the primer itself can be determined.
Second, the method can be used with any nucleic acid, such as DNA, RNA, or cDNA.
Third, the method can be used to determine the Tm of one or only a few degenerate, second primers.
Fourth, the method can be used to analyze only part of a genome, as opposed to the entire genome. It can also be used to analyze a chromosome or any piece of nucleic acid regardless of size.
It is understood that the invention is not confined to the particular construction and arrangement of parts herein illustrated and described, but embraces such modified forms thereof as come within the scope of the following claims.

Claims

What is claimed is:
1. A method of determining a Tm range for a plurality of degenerate oligonucleotide primers having a fixed-sequence portion and a degenerate-sequence portion, comprising:
(a) searching a known portion of a nucleic acid template for a sequence complementary to a desired fixed-sequence portion of a primer within the plurality;
(b) identifying nucleotide base pairs flanking or interspersed between the sequence of step (a), the base pairs flanking or interspersed between the sequence of step (a) being complementary to at least one degenerate-sequence portion of one of the primers present in the plurality, whereby one or more sequences of potential binding sites for the plurality of primers are elucidated; and then
(c) calculating Tm of primers whose fixed-sequence portion and degenerate-sequence portion are complementary to the potential binding sites of step (b).
2. The method of claim 1, further comprising calculating a mean, median, mode, and range of the Tms for the primers of step (c).
3. The method of claim 1, further comprising calculating a frequency of occurrences of the Tms for the primers of step (c).
4. The method of claim 3, wherein the frequency has a normal distribution.
5. The method of Claim 1, wherein each primer within the plurality of degenerate primers has an identical fixed-sequence portion and further comprising repeating steps (a) through (c) for another and distinct plurality of degenerate oligonucleotide primers having a fixed-sequence portion and a degenerate-sequence portion, wherein in the other plurality of degenerate primers each primer within the plurality has an identical fixed-sequence portion distinct from the fixed-sequence portion of the primers of the first plurality, and then comparing the Tm ranges of the first plurality of primers and the other plurality of primers.
6. The method of Claim 5, wherein the desired Tm range within either plurality of primers is no more than about 5 °C.
7. The method of Claim 5, wherein the desired Tm range within either plurality of primers is no more than about 3°C.
8. The method of Claim 1, further comprising in step (d) constructing a fixed-sequence primer having a Tm within the Tm range for the plurality of degenerate primers.
9. The method of claim 1, further comprising repeating steps (a), (b), and (c), and in step (a) selecting another and distinct sequence complementary to another and distinct desired fixed-sequence portion of a primer.
10. The method of claim 9, further comprising calculating a mean, median, and mode of the Tms for the primers having the distinct fixed-sequence portion.
11. The method of claim 9, further comprising calculating a frequency of occuπences of the Tms for all of the binding sites.
12. The method of claim 11, wherein the frequency has a normal distribution.
13. The method of claim 9, further comprising selecting a plurality of degenerate primers having a desired Tm range.
14. The method of Claim 13, wherein the Tm range is no more than about 5 °C.
15. The method of Claim 13, wherein the Tm range is no more than about 3°C.
16. A method for selecting a plurality of degenerate primers, each primer within the plurality having a fixed-sequence portion and a degenerate-sequence portion, the plurality of degenerate primers having a fixed-sequence portion that occurs at a frequency in a nucleic acid template that is different than expected based on a random distribution of nucleotides in the template, comprising
(a) determining occurrences of the fixed-sequence portion of the degenerate primers in a given template sequence;
(b) statistically determining a mean number of occurrences within a hypothetical template having a random distribution of nucleotide base pairs of a hypothetical primer equal in length to the degenerate primers; and then
(c) selecting the degenerate primers based upon whether the fixed-sequence portion of the primers has a different number of occurrences in the template than the hypothetical primer of the same length in the hypothetical template having a random distribution of nucleotide base pairs.
17. The method of claim 16, further comprising amplifying the given template sequence with the plurality of degenerate primers and a fixed primer.
18. The method of claim 16, wherein the frequency of occurrences of the fixed-sequence portion of the degenerate primers is greater than statistically expected.
19. The method of claim 16, wherein the frequency of occurrences of the fixed-sequence portion of the degenerate primers is less than statistically expected.
20. A method of PCR amplification using a fixed oligonucleotide primer and a plurality of Tm-matched degenerate oligonucleotide primers, each degenerate primer having a fixed-sequence portion and a degenerate-sequence portion, comprising
(a) searching a known portion of a nucleic acid template for a sequence complementary to a desired fixed-sequence portion of a primer within the plurality;
(b) identifying nucleotide base pairs flanking or interspersed between the sequence of step (a), the base pairs flanking or interspersed between the sequence of step (a) being complementary to at least one degenerate-sequence portion of one of the primers present in the plurality, whereby one or more sequences of potential binding sites for the plurality of primers are elucidated;
(c) calculating Tm of primers whose fixed-sequence portion and degenerate-sequence portion are complementary to the potential binding sites of step (b); and then
(d) amplifying the template by PCR using an oligonucleotide primer of fixed sequence and the plurality of primers, the fixed-sequence primer having a Tm within the Tm range of the plurality of primers.
21. The method of Claim 20, wherein the Tm range in step (c) is no more than about 5 °C.
22. The method of Claim 20, wherein the Tm range in step (c) is no more than about 3 °C.
23. The method of Claim 20, wherein each primer within the plurality of degenerate primers has an identical fixed-sequence portion, and further comprising repeating steps (a) through (c) for another and distinct plurality of degenerate oligonucleotide primers having a fixed-sequence portion and a degenerate-sequence portion, wherein in the other plurality of degenerate primers each primer within the plurality has an identical fixed-sequence portion distinct from the fixed-sequence portion of the primers of the first plurality, and then comparing the Tm ranges of the first plurality of primers and the other plurality of primers and amplifying in step (d) using the plurality of primers whose Tm range most closely matches a desired Tm range.
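The Tm-based steps of claims 9-15 and 20-23 (locating binding sites for a fixed-sequence portion, computing their Tms, summarizing the Tm distribution, and checking that the degenerate primers and the fixed primer fall in a narrow Tm window) can be illustrated with a short Python sketch. It is not the claimed implementation. Several choices are assumptions made only for illustration: the Wallace rule (2 °C per A or T plus 4 °C per G or C) stands in for whatever Tm formula the full specification uses; the degenerate portion is taken to be a block of `n_degenerate` bases at the primer's 5' end; the template is written as the strand whose sequence the primer matches; and "most closely matches a desired Tm range" is approximated by the distance between the window midpoint and a target Tm. All function names are hypothetical.

```python
from collections import Counter
from statistics import mean, median, multimode


def wallace_tm(seq: str) -> int:
    """Approximate Tm in deg C with the Wallace rule, 2*(A+T) + 4*(G+C).
    ASSUMPTION: illustrative only; the claims do not fix a Tm formula."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def binding_sites(template: str, fixed: str, n_degenerate: int) -> list[str]:
    """Steps (a)-(b) of claim 20: find each exact occurrence of the fixed-sequence
    portion and prepend the n_degenerate template bases lying 5' of it.
    ASSUMPTIONS: degenerate block at the primer's 5' end; template written as the
    strand whose sequence the primer matches (the primer anneals to its complement)."""
    t, f = template.upper(), fixed.upper()
    sites = []
    pos = t.find(f, n_degenerate)  # start at n_degenerate so the 5' flank fits
    while pos != -1:
        sites.append(t[pos - n_degenerate : pos + len(f)])
        pos = t.find(f, pos + 1)  # pos + 1 also reports overlapping matches
    return sites


def tm_summary(tms: list[int]) -> dict:
    """Claims 10-11: mean, median, mode(s) and frequency table of the Tms."""
    return {
        "mean": mean(tms),
        "median": median(tms),
        "modes": multimode(tms),
        "frequency": dict(sorted(Counter(tms).items())),
    }


def tm_window(tms: list[int]) -> tuple[int, int]:
    """Lowest and highest Tm among the binding sites."""
    return min(tms), max(tms)


def choose_fixed_portion(template, candidates, n_degenerate, target_tm, max_span=5):
    """Claim 23 sketch: keep candidate fixed portions whose Tm window spans at most
    max_span deg C (claims 14-15 mention about 5 or 3 deg C), then pick the one
    whose window midpoint is nearest target_tm.
    ASSUMPTION: 'most closely matches' is measured by midpoint distance."""
    best = None
    for fixed in candidates:
        tms = [wallace_tm(s) for s in binding_sites(template, fixed, n_degenerate)]
        if not tms:
            continue  # fixed portion absent from the template
        lo, hi = tm_window(tms)
        if hi - lo > max_span:
            continue  # Tm window too wide
        distance = abs((lo + hi) / 2 - target_tm)
        if best is None or distance < best[0]:
            best = (distance, fixed, (lo, hi))
    return None if best is None else (best[1], best[2])


def fixed_primer_fits(fixed_primer: str, window: tuple[int, int]) -> bool:
    """Claim 20(d): the fixed primer's Tm should fall inside the Tm window."""
    lo, hi = window
    return lo <= wallace_tm(fixed_primer) <= hi


if __name__ == "__main__":
    template = "TTACGCCACGGGACG"  # toy template, invented for illustration
    sites = binding_sites(template, "ACG", n_degenerate=2)
    tms = [wallace_tm(s) for s in sites]
    print(sites, tms)  # ['TTACG', 'CCACG', 'GGACG'] [14, 18, 18]
    print(tm_summary(tms)["modes"], tm_window(tms))  # [18] (14, 18)
    print(choose_fixed_portion(template, ["ACG", "GGA"], 2, target_tm=16.5))  # ('ACG', (14, 18))
```

With a realistic template, the same calls would be run over many candidate fixed-sequence portions; the Wallace rule is reasonable only for short oligonucleotides, so a nearest-neighbor Tm model would be a natural substitute.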
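Claims 16-19 compare how often the fixed-sequence portion occurs in the template with the number of occurrences expected by chance. A minimal Python sketch follows; the statistical model is an assumption, since the claims do not specify one. It treats bases as equiprobable and independent on a single strand, so one specific word of length L is expected (N - L + 1) x (1/4)^L times in a random sequence of length N. Following claim 16(b), the length passed as `primer_len` is that of the whole degenerate primer, not just its fixed portion. All function names are hypothetical.

```python
def count_occurrences(template: str, word: str) -> int:
    """Step (a): count exact (possibly overlapping) occurrences of word."""
    t, w = template.upper(), word.upper()
    return sum(1 for i in range(len(t) - len(w) + 1) if t[i : i + len(w)] == w)


def expected_random_count(template_len: int, word_len: int) -> float:
    """Step (b): mean occurrences of one specific word of length word_len in a
    random sequence of length template_len.
    ASSUMPTIONS: equiprobable bases (p = 1/4), independent positions, one strand."""
    positions = max(template_len - word_len + 1, 0)
    return positions * 0.25 ** word_len


def representation(template: str, fixed: str, primer_len: int) -> tuple[int, float, str]:
    """Step (c): compare the observed count of the fixed-sequence portion with the
    random expectation for a hypothetical primer of length primer_len."""
    observed = count_occurrences(template, fixed)
    expected = expected_random_count(len(template), primer_len)
    if observed > expected:
        label = "over-represented"  # the case of claim 18
    elif observed < expected:
        label = "under-represented"  # the case of claim 19
    else:
        label = "as expected"
    return observed, expected, label


if __name__ == "__main__":
    toy = "ACGACGACGT"  # invented 10-base template
    # fixed portion of 3 bases, degenerate primer of 5 bases in total
    print(representation(toy, "ACG", primer_len=5))  # (3, 0.005859375, 'over-represented')
    print(representation(toy, "TTT", primer_len=5))  # (0, 0.005859375, 'under-represented')
```

A skewed base composition could be handled by replacing 1/4 with per-base frequencies estimated from the template, at the cost of the expectation then depending on the specific word rather than only on its length.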

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU43299/00A AU4329900A (en) 1999-04-06 2000-04-05 Method for selecting primers for amplification of nucleic acids

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12789199P 1999-04-06 1999-04-06
US60/127,891 1999-04-06

Publications (2)

Publication Number Publication Date
WO2000060123A2 WO2000060123A2 (en) 2000-10-12
WO2000060123A3 WO2000060123A3 (en) 2002-01-24

Family

ID=22432494

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/008962 WO2000060123A2 (en) 1999-04-06 2000-04-05 Method for selecting primers for amplification of nucleic acids

Country Status (2)

Country Link
AU (1) AU4329900A (en)
WO (1) WO2000060123A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020096679A1 (en) * 2018-11-06 2020-05-14 North Carolina State University Non-destructively storing, accessing, and editing information using nucleic acids

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0630973A2 (en) * 1993-05-14 1994-12-28 Johnson & Johnson Clinical Diagnostics, Inc. Diagnostic compositions, elements, methods & test kits for amplification & detection of two or more DNA's using primers having matched melting temperatures
WO1998002575A1 (en) * 1996-07-16 1998-01-22 Periannan Senapathy Method for contiguous genome sequencing
WO1998006872A1 (en) * 1996-08-14 1998-02-19 Academy Of Applied Science Inc. Method of computer-aided automated diagnostic dna test design
WO2000028087A1 (en) * 1998-11-10 2000-05-18 Sequetech Corporation A library of modified primers for nucleic acid sequencing, and method of use thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Computer-assisted selection of oligonucleotide primers and probes, and calculation of critical parameters using OLIGO, ver. 4.0 primer analysis software" FASEB JOURNAL, FED. OF AMERICAN SOC. FOR EXPERIMENTAL BIOLOGY, BETHESDA, MD, US, vol. 7, no. 7, SUPPL, 20 April 1993 (1993-04-20), page A1315 XP002132782 ISSN: 0892-6638 *

Also Published As

Publication number Publication date
AU4329900A (en) 2000-10-23
WO2000060123A3 (en) 2002-01-24

Similar Documents

Publication Publication Date Title
US6270966B1 (en) Restriction display (RD-PCR) of differentially expressed mRNAs
US7691614B2 (en) Method of genome-wide nucleic acid fingerprinting of functional regions
EP1183388B1 (en) Shot-gun sequencing and amplification without cloning
EP0832290B1 (en) Universal primer sequence for multiplex dna amplification
US5869242A (en) Mass spectrometry to assess DNA sequence polymorphisms
JP5140425B2 (en) Method for simultaneously amplifying specific nucleic acids
Rychlik et al. A computer program for choosing optimal oligonucleotides for filter hybridization, sequencing and in vitro amplification of DNA
JP2019533432A5 (en)
CA2639819A1 (en) Selective terminal tagging of nucleic acids
EP2620497A1 (en) Method for producing circular dna formed from single-molecule dna
WO2008118839A1 (en) Exon grouping analysis
JPH06153952A (en) Method for pretreatment for carrying out amplifying and labeling of unknown double-stranded dna molecule in trace amount
US5814489A (en) PCR amplification of mRNA
Yeatman et al. Identification of a differentially-expressed message associated with colon cancer liver metastasis using an improved method of differential display.
Sakuma et al. Computer prediction of general PCR products based on dynamical solution structures of DNA
WO2000060123A2 (en) Method for selecting primers for amplification of nucleic acids
EP1244815A2 (en) Method of analyzing a nucleic acid
WO2006004365A1 (en) The method selecting highly specific probes for hpv genotype analysis and the probes thereof
JPH08173164A (en) Preparation of dna
KR102237248B1 (en) SNP marker set for individual identification and population genetic analysis of Pinus densiflora and their use
EP4012029A1 (en) Method for capturing nucleic acid molecule, preparation method for nucleic acid library, and a sequencing method
Häner Oligonucleotide Technology–A Research Platform for Chemical Biology
WO2022125100A1 (en) Methods for sequencing polynucleotide fragments from both ends
KR20230080464A (en) Methods and Means for Generating Transcribed Nucleic Acids
KR100844010B1 (en) Method for Simultaneous Amplification of Multi-gene

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (PCT application filed from 20040101)