WO1990004651A1 - Mapping quantitative traits using genetic markers - Google Patents

Mapping quantitative traits using genetic markers

Info

Publication number
WO1990004651A1
Authority
WO
WIPO (PCT)
Prior art keywords
progeny
qtls
genetic
qtl
strains
Application number
PCT/US1989/004688
Other languages
French (fr)
Inventor
Eric S. Lander
Andrew H. Paterson
Steven D. Tanksley
Original Assignee
Whitehead Institute For Biomedical Research
Cornell Research Foundation, Inc.
Application filed by Whitehead Institute For Biomedical Research and Cornell Research Foundation, Inc.
Publication of WO1990004651A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/683: Hybridisation assays for detection of mutation or polymorphism involving restriction enzymes, e.g. restriction fragment length polymorphism [RFLP]
    • C12Q1/6841: In situ hybridisation

Definitions

  • RFLPs restriction fragment length polymorphisms
  • the present invention relates to a systematic and accurate method for mapping or locating precisely the genomic regions containing polygenic factors controlling a quantitatively inherited trait or traits of interest. Described herein is the first method by which accurate mapping of a QTL to an interval can be carried out. That is, unlike previously-available methods of determining QTLs, the present method makes it possible to determine with a high degree of accuracy that a QTL lies within an interval (e.g., within a region of DNA bounded by two selected markers or a subregion thereof). The accuracy with which mapping can be carried out by the method of the present invention is particularly valuable in that it provides precision in at least two applications: gene cloning and gene transfer.
  • the present method makes it possible to delimit the region in which it is highly likely that DNA encoding a quantitative trait of interest occurs, thus maximizing the likelihood that the DNA of interest is, in fact, isolated and minimizing the effort which must be expended in its isolation.
  • the present method is also very valuable in those instances in which transfer of a gene or gene portion from one plant or animal (e.g., an unagricultural product or wild type) to another plant or animal (e.g., an agricultural product) is to be carried out.
  • tight intervals increase the likelihood that the desired gene will be transferred and decrease the effort (and concomitant time and expense) necessary to transfer the gene or gene portion.
  • the method of the present invention makes use of genetic markers to map/locate QTL, to estimate or predict their phenotypic effects and to greatly reduce the number of progeny which must be scored with the DNA markers.
  • RFLPs, isozymes and two specific genes were the genetic markers used.
  • any genetic markers such as any DNA polymorphisms or DNA sequence differences that can be detected, codominant protein polymorphisms or a combination of codominant genetic markers can be used for the same purpose.
  • Any set of scorable genetic markers (codominant or recessive) which cover most of the genome of the plant or animal being assessed can be used for this purpose.
  • isogenic lines can be used, in conjunction with the fundamental tools of genetic and molecular biology for the study of a trait, including testing of complementation and epistasis; characterization of physiological and biochemical
  • the method of the present invention has broad application to breeding of plants and animals for agriculturally valuable traits, particularly because it allows for deterministic breeding.
  • diabetes, predispositions to cancer and teratomas, alcohol sensitivity, drug sensitivities and some
  • concentration of soluble solids and fruit pH are mapped to within about 20-30 centiMorgans (cM; 1 cM is about 1% recombination, or about 10^6 to 10^7 bp) by means of a complete RFLP linkage map.
  • Brief Description of the Drawings
  • Figure 1 is the frequency distribution for fruit mass, soluble-solids concentration (°Brix, a standard refractometric measure primarily detecting reducing sugars, but also affected by other soluble constituents;
  • 1 °Brix is approximately 1% w/w) and pH in the E parental strain and in the backcross (BC) progeny.
  • Figure 2 is the distribution of recurrent parent (E) genotype in the 237 backcross progeny, estimated on the basis of the marker genotypes and their relative
  • Figure 3 shows QTL likelihood maps indicating LOD scores for fruit weight (solid lines and bars), soluble solids concentration (dotted lines and bars) and pH
  • FIG. 4 is a schematic drawing of phenotypic distributions in the A and B parental, F1 hybrid and B1 backcross populations.
  • Figure 5 shows graphic representations of LOD scores for a hypothetical quantitative trait, based on simulated data for 250 backcross progeny in an organism with 12 chromosomes of 100 cM each.
  • Figure 6 is a graphic representation of LOD scores for a chromosome containing two QTLs.
  • Figure 7 shows the appropriate LOD threshold so that the chance of a false positive occurring anywhere in the genome is at most 5%, as a function of genome size and density of RFLPs scored.
  • Figure 8 shows that progeny having phenotypes exceeding the mean by ≥ L standard deviations make up a proportion Q(L) of the population, but account for a
  • Figure 9 shows that if only individuals having phenotypes exceeding the mean by ≥ L standard deviations are genotyped, the number of progeny genotyped may be decreased by a factor of g(L) if the number of progeny grown and phenotyped is increased by a factor of h(L).
  • Figure 10 shows the number of backcross progeny that must be genotyped to map a QTL, based on the fraction of the backcross variance explained by the segregation of the QTL.
  • all progeny are genotyped and single markers are analyzed.
  • only progeny with 5% most extreme phenotypes are genotyped and interval mapping is used to analyze the data.
  • Figure 11 shows the number of backcross progeny that must be genotyped to map a QTL, based on the difference D between the strains (measured in environmental standard deviations) and the number K of effective factors.
  • D the difference between the strains (measured in environmental standard deviations)
  • K the number of effective factors.
  • the present invention is based on a method of resolving quantitative traits into discrete Mendelian factors, using genetic markers.
  • a complete RFLP linkage map was available and RFLPs were the genetic markers used.
  • any genetic markers which will generally be codominant markers such as DNA polymorphisms, isozymes or other codominant protein polymorphisms or a combination of such markers, can be used. Described herein is the first use of such a complete RFLP linkage map to resolve
  • mapping QTLs in other plants and in animals, have made it possible, for the first time, to map QTLs to DNA intervals with a high degree of accuracy, thus maximizing the likelihood that a QTL of interest lies within an interval defined by two selected (defined) markers.
  • This approach is broadly applicable to the genetic dissection of quantitative inheritance of
  • interval mapping of the present invention and its application to a higher plant are described in detail herein.
  • the method of interval mapping of the present invention can also be used to locate precisely the genomic regions which contain polygenic factors controlling a quantitatively inherited trait or traits.
  • the subject method entails five steps or procedures: 1) choosing of a pair of interfertile strains (i.e., two types or varieties of a plant or animal which differ as to a trait of interest), which will serve as the parent strains in the initial cross (and one of which may serve as the recurrent parent in subsequent crosses if backcrosses are employed); 2) constructing a genetic linkage map (using RFLPs, isozymes and/or other codominant markers), if an adequate map is not already available; 3) arranging one or more
  • back-crosses or intercrosses using as the recurrent parent the strain or type of plant or animal in which the transferred gene (or genes) is to function; 4) scoring progeny of the back-crosses or intercrosses for the trait or traits of interest and for the genetic markers
  • interval mapping method of the present invention to locate genomic regions containing QTLs of interest in tomato plants is presented in the following section.
  • accession LA1028 (denoted CL).
  • concentrations of soluble solids (E approximately 5%; CL approximately 10%). These are traits of agricultural importance because they jointly determine the yield of tomato paste. Rick, C.M., Hilgardia, 42:493-510 (1974).
  • the strains are known to be polymorphic for genes affecting fruit pH, which is important for the optimal preservation of tomato products; the difference in pH between parental strains is, however, small.
  • the map is essentially complete: it has linkage groups covering all 12 tomato chromosomes, with an average spacing of 5 cM between markers (1 cM is the distance along the chromosome which gives a
  • E and CL strains differ in two easily-scored, simply-inherited morphological traits: determinacy (described below) and uniform ripening, controlled by the sp and u genes, respectively. Although a few distal regions did not contain appropriate markers, it is estimated that about 95% of the tomato genome was detectably linked to the markers used.
  • interval mapping allows inference about points throughout the entire genome and avoids confounding phenotypic effects with recombination, by using
  • interval mapping reduces to linear regression.
  • threshold of 2.4 gives a probability of less than 5% that even a single false positive will occur anywhere in the genome.
  • QTL likelihood maps showing how lod scores for fruit mass, soluble-solids concentration and pH change as one moves along the genome, reveal multiple QTLs for each trait and estimate their location to within 20-30 cM.
  • chromosome 1 CD41 on chromosome 5 and TG68 on chromosome 12 may affect soluble-solids concentration and merit further attention in larger populations.
  • the region near the ⁇ locus on chromosome 10 may contain an additional QTL affecting pH (See Example 1).
  • the region near sp on chromosome 6 has the largest effects on soluble solids and pH, as well as a
  • the sp gene affects plant-growth habit: the dominant CL allele causes continuous apical growth (indeterminate habit), whereas the recessive E allele causes termination in an
  • the QTLs identified here may well differ from those that would be fixed by repeated back-crossing with continuing selection for a trait, a classical method for introgressing quantitative traits.
  • Work on LA1563, a strain with increased soluble solids produced through back-crossing a different strain of E to CL has provided some suggestive evidence. Rick, C.M., Hilgardia,
  • By surveying RFLPs, Tanksley and Hewitt recently found that LA1563 has maintained three separate regions from CL: near CD56 on chromosome 10, near Got2 on chromosome 7 and near TG13 on chromosome 7. Here, above-threshold effects were detected in the last of these three regions only (which, interestingly, failed to show effects on soluble solids in a single-environment test by Tanksley and Hewitt).
  • crosses can be used to isolate them in near-isogenic lines. These lines can be used to
  • the method of interval mapping described above allows (i) efficient detection of QTLs while limiting the overall occurrence of false positives; (ii) accurate estimation of phenotypic effects of QTLs; and (iii) localization of QTLs to specific regions
  • phenotypic effect δ that the cross will be designed to detect.
  • a choice of δ in the range of between (1/2)(D/k) and (D/k) should ensure that QTLs accounting for much of the phenotypic difference will be detected.
  • the same choice of δ can be used, although the presence of QTLs with this effect is not guaranteed.
  • interval mapping and selective genotyping reduce the number of progeny to be genotyped by up to 7-fold.
  • isogenic lines can be rapidly constructed differing only in the region of the QTL by using the RFLPs to select for the desired region and against the remainder of the genome.
  • flanking markers may be used to retain the QTL and the study of the remaining markers may be used to speed progress by identifying individuals with a fortuitously high
  • the tomatoes were grown in the field at Davis, California, in a completely randomized design including 237 BC plants (with E as the recurrent pistillate
  • Figure 2 shows the distribution of percentage of recurrent parent (E) genotype in the 237 back-cross progeny, estimated on the basis of the marker genotypes and their relative distances. Determination of marker genotypes was as previously described. Tanksley, S. D. and J. Hewitt,
  • the height of the curve indicates the strength of the evidence (log10 of the odds ratio) for the presence of a QTL at each location and not the magnitude of the inferred allelic effect.
  • the horizontal line at a height of 2.4 indicates the stringent threshold that the lod score must cross to allow the presence of a QTL to be inferred, as described herein.
  • Information about the likely position of the QTL can also be inferred from the curve.
  • the maximum likelihood position of the QTL is the highest point on the curve.
  • Bars below each graph indicate a 10:1 likelihood support interval for the position of the QTL (the range outside which the likelihood falls by a lod score of 1.0), whereas the lines extending out from the bars indicate a
  • Phenotypic effects indicated beside the bars are the inferred effect of substituting a single CL allele for one of the two E alleles at the QTL.
  • regions show sub-threshold effects on one or more traits (chromosome one near TG19, chromosome five near TG34 and chromosome 12 near TG68) which may represent QTLs; this requires additional testing.
  • the region near TG68 may be particularly interesting, as it is the only instance found where the CL allele seems to decrease soluble-solids concentration (by about
  • the lod score and the maximum likelihood estimate (MLE) of the phenotypic effect at any point in the genome is computed assuming that the distribution of phenotypes in the BC progeny represents a mixture of two normal distributions (of equal variance) with means depending on the genotype at a putative QTL at the given position.
  • QTLs are considered individually and there is no assumption that different QTL effects can be added, except in studying the possibility of two QTLs on chromosome 10 affecting pH.
  • the likelihood function for individual i with quantitative phenotype ⁇ is given by
  • σ² is the phenotypic variance not attributable to the QTL and p1 and p2 are the probabilities that individual i has genotype E/E and E/CL, respectively, at the QTL (which can be computed on the basis of the genotypes at the flanking markers and the distance to the flanking markers).
  • ⁇ * and ⁇ * denote the MLEs allowing the possibility of a QTL at the
  • QTLs quantitative trait loci
  • Figures 10 and 11 are graphs that allow geneticists to estimate, in any particular case, the number of progeny required to map QTLs underlying a quantitative trait.
  • The basic methodology for mapping QTLs involves arranging a cross between two inbred strains differing substantially in a quantitative trait: segregating progeny are scored both for the trait and for a number of genetic markers. Typically, the segregating progeny are produced by a B1 backcross (F1 x Parent) or an F2
  • by the phenotypic effect δ of a QTL is meant the additive effect of substituting both A alleles by B alleles.
  • a single allele has effect (1/2)δ, since
  • the number of effective factors k may seriously underestimate the number of QTLs.
  • the number of QTLs is unlimited. In this case, must there exist any QTLs affecting the phenotype by (D/k)? More generally, for any 0 ⁇ ⁇ ⁇ 1, must there exist QTLs affecting the phenotype by ⁇ (D/k)? And, how much of the total
  • the traditional approach for detecting a QTL near a genetic marker involves comparing the phenotypic means for two classes of progeny: those with genotype marker AB, and those with marker genotype AA .
  • the difference between the means provides an estimate of the phenotypic effect of substituting a B allele for an A allele at the QTL.
  • Z is a standard normal variable (i.e., Z_α is the number of standard deviations beyond which the normal curve contains probability α).
  • the required progeny size thus essentially scales inversely with the square of the phenotypic effect of the QTL or, equivalently, inversely with the variance explained.
  • mapping the traditional approach has a number of
  • phenotypic effect of the QTL is biased downward by a factor of (1-2θ). (ii) If the QTL does not lie at the marker locus, substantially more progeny may be required.
  • the progeny size would need to be increased by 22%, 49%, 82% or 123%, respectively, to account for the possibility that the QTL might lie in the middle of an interval (i.e., at the maximum distance from the nearest RFLP).
  • linear regression solutions (a*, b*, σ²*) are, in fact, maximum likelihood estimates (MLEs) for the
  • parameters, that is, the values which maximize the probability L(a, b, σ²) that the observed data would have occurred.
  • LOD = log10(L(a*, b*, σ²*)/L(μ_A, 0, σ²_BC1)), essentially indicating how much more likely the data is to have arisen assuming the presence of a QTL than assuming its absence. (The choice of log10 accords with longstanding practice in human genetics, although log_e would be slightly more convenient below). Morton, N.E., Am. J. Hum. Genet., 7:277-318 (1955). If the LOD score exceeds a predetermined threshold T, a QTL is declared to be present. The important issues are: (i) What LOD threshold T should be used, in order to maintain an acceptably low rate of false positives? (ii) What is the expected contribution to the LOD score (called the ELOD) from each additional progeny? The number of progeny required is then T/ELOD, to provide even odds of
  • in typical cases, a reduction of up to 7-fold in the number of progeny genotyped can be achieved by combining two approaches: interval mapping and selective genotyping.
  • Selective genotyping involves growing a larger population, but genotyping only those individuals whose phenotypes deviate substantially from the mean. Additional methods for increasing the power of QTL mapping include reducing environmental noise by progeny testing and reducing genetic noise by studying several genetic regions simultaneously.
  • L(a, b, σ²) = Π_i [G_i(0)L_i(0) + G_i(1)L_i(1)], (7)
  • Finding the maximum likelihood solution (a*, b*, σ²*) to (7) can be regarded as a linear regression problem with missing data: none of the independent variables (genotypes) are known; only probability distributions for each are available. Although standard computer programs for linear regressions cannot be used, techniques for maximum likelihood estimation with missing data have been developed in recent years. Little, R.J.A. and D.B.
  • Figure 5 presents a QTL likelihood map, showing how the LOD score varies
  • a LOD score of 2.4 is required (see below) for declaring the presence of a QTL.
  • the four largest QTLs are detected while the fifth does not attain statistical significance.
  • the approximate position of the QTLs is indicated by one-lod confidence intervals, defined by the points on the genetic map at which the likelihood ratio has fallen by a factor of 10 from the maximum; such confidence intervals are frequently used in human genetics to indicate the probable position of genes. (Ott, J., Analysis of Human Genetic Linkage, Johns Hopkins University, Baltimore (1985)).
  • the probable position of the QTL is given by confidence intervals, indicating the range of points for which the likelihood ratio is within a factor of 10 (or 100, if desired) of the maximum.
  • Interval mapping thus decreases the required number of progeny by a factor of (1-θ), which is exactly the proportion of meioses in which the flanking markers do not recombine.
  • d = 10, 20, 30 and 40 cM
  • Interval mapping has recently been applied to an interspecific backcross in tomato: six QTLs affecting tomato fruit weight, four QTLs affecting the
  • interval mapping should prove valuable for analyzing and presenting evidence for QTLs, and for decreasing somewhat the number of progeny required to detect QTLs of a given magnitude.
  • a standard chi-square table may be used to calculate the LOD score threshold corresponding to a 5% chance that even a single false positive will occur.
  • LOD thresholds as a function of genome size and marker spacing ( Figure 7).
  • interval mapping increases the efficiency of QTL mapping somewhat, large numbers of progeny may still be required. Additional methods are available to increase the power of QTL mapping, the most important of which is selective genotyping.
  • the individuals that provide the most linkage information are those whose genotype can be most clearly inferred from their phenotype. For example,
  • the highest ELODs are provided by the progeny that deviate most from the phenotypic mean.
  • the cost of growing progeny is less than the cost of complete RFLP genotyping (as is frequently the case), it will thus be more efficient to increase the number of progeny grown but to genotype only those with the most extreme
  • Progeny with phenotypes more than 1 standard deviation from the mean comprise about 33% of the total population but contribute about 81% of the total linkage information. By growing a population that was only about 25% larger and genotyping only these extreme progeny, the same total linkage information would be obtained from genotyping only about 40% as many individuals.
  • Progeny with phenotypes more than 2 standard deviations from the mean comprise about 5% of the total population but contribute about 28% of the total linkage information.
  • progeny genotypes for the non-extreme progeny may simply be entered as missing. Using the MAPMAKER-QTL program, the method has been applied to both simulated and experimental data sets.
  • F2 intercrosses and recombinant inbred strains Although the discussion above concerns the backcross, it applies directly to F2 intercrosses and recombinant inbred strains, with the following modifications:
  • multi-generational breeding scheme that is used to construct recombinant inbred strains increases the effective genetic length of the genome. Compared to a backcross, the density of crossovers is doubled in a recombinant inbred produced through selfing and is quadrupled in a recombinant inbred produced by sib mating. Haldane, J.B.S. and C.H. Waddington, Genetics, 16:357-374 (1931). A genetic length of 2G or 4G must be used in place of G when computing the appropriate LOD
  • threshold which leads to an increase of about 0.3 or 0.6, respectively, in the threshold required.
  • the higher threshold will increase the number of progeny required, the effect is typically offset by the ability to decrease the number of progeny by reducing the
  • ⁇ *(d) denote the maximum likelihood estimate of the phenotypic effect of a putative QTL at this position
  • LOD(d) denote the corresponding LOD score.
  • LOD(d) is asymptotically proportional to the square of a random normal variable ⁇ (d) (which incidentally proves that LOD is proportional to χ²).
  • r(a,b) denote the correlation coefficient for any two random variables a and b.
  • 1 - 2θ = e^(-2d).
  • this is the definition of ORNSTEIN-UHLENBECK diffusion and Proposition 2 follows directly (see LEADBETTER, LINDGREN and ROOTZEN 1983, Theorem 12.2.9 and discussion following).
  • the LOD score for a marker at 0 cM can be redefined as the log10 of
  • This ratio measures how much more likely the data is to have been generated by a QTL with the hypothesized effect located at the marker locus than by a QTL with this same effect but unlinked to the marker.
  • the ELOD can be found by numerical integration over the distribution for ⁇ . In the limit of a QTL with large effect, the expression tends to the traditional LOD score for a qualitative trait used in human genetics.
  • the LOD score (comparing the true hypothesis H1: (0, b, σ²) to the alternative H0: (0, 0, σ²)) is
  • LOD ⁇ L ⁇ ⁇ I ⁇ I ⁇ L ⁇ (LOD ⁇ )p( ⁇ )d ⁇ .
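The progeny-size increases of 22%, 49%, 82% and 123% quoted above for marker spacings of 10, 20, 30 and 40 cM can be checked with a short calculation. The following Python sketch is illustrative only. It assumes Haldane's map function (no crossover interference), a QTL lying midway between two markers, and a required progeny size that scales inversely with the square of the apparent effect, where the apparent effect is reduced by the factor (1 - 2*theta) and theta is the recombination fraction between the QTL and the nearest marker. The function names are hypothetical and are not taken from the described method or from MAPMAKER-QTL.

```python
import math


def recombination_fraction(distance_morgans):
    """Haldane map function (assumed): theta = (1 - exp(-2d)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def progeny_inflation(marker_spacing_cm):
    """Factor by which progeny size grows when the QTL sits midway
    between two markers that are marker_spacing_cm apart."""
    # Distance from the QTL to the nearest marker, converted to Morgans.
    d_nearest = (marker_spacing_cm / 2.0) / 100.0
    theta = recombination_fraction(d_nearest)
    # The apparent effect shrinks by (1 - 2*theta); progeny size scales
    # inversely with the square of the effect.
    return 1.0 / (1.0 - 2.0 * theta) ** 2


for spacing in (10, 20, 30, 40):
    extra_percent = 100.0 * (progeny_inflation(spacing) - 1.0)
    print(f"{spacing} cM spacing: about {extra_percent:.0f}% more progeny")
```

Under these assumptions the inflation factor simplifies to exp(2d), with d the marker spacing in Morgans, which is why the increase grows quickly as the map becomes sparser.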

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Breeding Of Plants And Reproduction By Means Of Culturing (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A systematic and accurate method for mapping or locating the genomic regions or subregions containing polygenic factors which control a quantitatively inherited trait or traits of interest. The present method, which can be applied to higher plants and animals, makes it possible to determine with a high degree of accuracy that a quantitative trait locus (QTL) lies within a genomic region or subregion bounded by selected genetic markers.

Description

MAPPING QUANTITATIVE TRAITS USING GENETIC MARKERS
Description
Background
Although some important characteristics of agricultural crops and animals are determined by genetic loci which have major effects on phenotype, most traits which are economically valuable are quantitative in nature. In the case of a quantitatively inherited trait, phenotypic variation in that trait shows continuous quantitative differences among individual plants or animals and there is a normal distribution of phenotypic values for the trait in a given population. The continuous variation is seen because the phenotypic trait is determined by the collective effect of numerous genetic loci, each making a small quantitative contribution. Traits which exhibit such genetic variation are often referred to as polygenic traits because genetic variation at a large number of genetic loci affects phenotypic expression.
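As a rough illustration of why many loci of small effect give continuous, approximately normal variation, the following Python sketch simulates a purely additive trait. Every number in it is an assumption chosen for illustration (20 unlinked loci, allele frequency 0.5, equal effects, normally distributed environmental noise); none of these values comes from the present description, and the function name is hypothetical.

```python
import random
import statistics


def simulate_trait(n_individuals, n_loci, effect, env_sd, rng):
    """Simulate phenotypes under an assumed additive polygenic model."""
    values = []
    for _ in range(n_individuals):
        # Each diploid locus carries 0, 1 or 2 copies of a "plus" allele,
        # each copy present with probability 0.5 (an assumption).
        copies = sum(rng.randint(0, 1) + rng.randint(0, 1)
                     for _ in range(n_loci))
        # Phenotype = summed small genetic effects + environmental noise.
        values.append(copies * effect + rng.gauss(0.0, env_sd))
    return values


rng = random.Random(0)
phenotypes = simulate_trait(2000, 20, 1.0, 1.0, rng)
print("mean:", round(statistics.mean(phenotypes), 2))
print("standard deviation:", round(statistics.stdev(phenotypes), 2))
```

With these assumed settings the expected mean is 20 and the expected variance is 11 (10 from the loci, 1 from the environment), so no single locus stands out in the phenotype distribution.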
Loci which contribute to polygenic traits are often referred to as "minor genes"; "major genes" are those in which variation results in substantial effects on
phenotype and which exhibit a Mendelian pattern of inheritance. Beckmann, J.S. and M. Soller, Oxford
Surveys of Plant Molecular and Cellular Biology,
3:196-250 (1986). The conflict between the Mendelian theory of
particulate inheritance and the observation that most traits in nature exhibit continuous variation was
resolved in the early 1900s by the concept that
quantitative traits can result from segregation of multiple genes, modified by environmental effects.
Mendel, G., Verh. des Naturf. Vereines in Brunn, 4
(1866); Johannsen, W. Elemente der exakten
Erbilichkeitsllehre (Fischer, Jena, 1909); Nilsson-Ehle, H. Kreuzunguntersuchungen an Hafer und Weizen (Lund,
1909); East, E.M., Genetics, 1:164-176 (1915); Wrigth, S. Evolution and the Genetic of Populations (Univ. of
Chicago, Chicago, 1968). Although it has been shown that polygenes exhibit a Mendelian pattern of inheritance, it has not been possible to separate the effect of different loci acting in this way. Pioneering experiments have shown that linkage to such quantitative trait loc i (QTLs ) could occas ionally be detected . However , accurate and systematic mapping of QTLs has not been possible because the inheritance of an entire genome could not be studied with genetic markers. Sax, K., Genetics, 8:552-556
(1923); Thoday, J.M., Nature, 191:368-369 (1961);
Tanksley, S.D. et al., Heredity, 49:11-25 (1982);
Edwards, M.D. et al., Genetics, 116:113-125 (1987). The use of restriction fragment length polymorphisms (RFLPs) has made such investigations possible, at least in principle. Botstein, D. et al., American Journal of Human Genetics, 32:314-331 (1980).
DNA polymorphisms, which are differences in the nucleotide sequence of a region of DNA, are of several types. For example, variation in the nucleotide sequence of DNA which is the result of a point mutation can, in turn, result in gain or loss of a restriction site for a particular restriction endonuclease. Changes in DNA which involve larger regions (deletions, additions, inversions, translocations) change the relative
distribution of restriction sites for several restriction endonucleases. In both cases, endonuclease digestion of the region affected by the mutation produces DNA
restriction fragments which differ in size distribution from fragments similarly obtained from an unaffected individual (i.e., one in whom such a mutation or
alteration is not present). The result is an RFLP, which, in the case of a point mutation, is generated only by enzymes whose recognition sites include the mutation and, in the case of changes in larger regions, is
generated by a number of enzymes.
RFLPs have been shown to be stably inherited and, in the case of nuclear RFLPs, to express codominance and generally lack obvious phenotypic effects. In addition, RFLPs often exhibit multiple alleles. Assigning
locations of RFLPs on a chromosome map can be carried out, using art-recognized techniques. Beckmann, J. S. and
M. Soller, Oxford Surveys of Plant Molecular and Cellular Biology, 3:196-250 (1986).
Manipulation of QTL has historically been a major limitation of genetic engineering and classical breeding. Systematic and accurate mapping of QTLs has not been possible because of the difficulty in arranging crosses with genetic markers densely spaced throughout an entire genome. RFLP techniques make it possible to try to identify and manipulate QTLs, by, for example,
determining RFLP-QTL linkage, followed by mapping and evaluating QTL effects; directly identifying allelic variation at a QTL; and using insertional mutagenesis to identify and clone QTL. It would be very beneficial if a method were available by which QTL could be identified, accurately mapped and introduced, as needed, into plants and
animals, particularly for development of new strains which exhibit desirable features.
Summary of the Invention
The present invention relates to a systematic and accurate method for mapping or locating precisely the genomic regions containing polygenic factors controlling a quantitatively inherited trait or traits of interest. Described herein is the first method by which accurate mapping of a QTL to an interval can be carried out. That is, unlike previously-available methods of determining QTLs, the present method makes it possible to determine with a high degree of accuracy that a QTL lies within an interval (e.g., within a region of DNA bounded by two selected markers or a subregion thereof). The accuracy with which mapping can be carried out by the method of the present invention is particularly valuable in that it provides precision in at least two applications: gene cloning and gene transfer. For example, in gene cloning, the present method makes it possible to delimit the region in which it is highly likely that DNA encoding a quantitative trait of interest occurs, thus maximizing the likelihood that the DNA of interest is, in fact, isolated and minimizing the effort which must be expended in its isolation. Similarly, the present method is also very valuable in those instances in which transfer of a gene or gene portion from one plant or animal (e.g., a non-agricultural product or wild type) to another plant or animal (e.g., an agricultural product) is to be carried out. As in the case of gene cloning, in gene transfer, tight intervals increase the likelihood that the desired gene will be transferred and decrease the effort (and concomitant time and expense) necessary to transfer the gene or gene portion.
The method of the present invention makes use of genetic markers to map/locate QTL, to estimate or predict their phenotypic effects and to greatly reduce the number of progeny which must be scored with the DNA markers. In the case described herein, RFLPs, isozymes and two specific genes were the genetic markers used. However, any genetic markers, such as any DNA polymorphisms or DNA sequence differences that can be detected, codominant protein polymorphisms or a combination of codominant genetic markers can be used for the same purpose. Any set of scorable genetic markers (codominant or recessive) which cover most of the genome of the plant or animal being assessed can be used for this purpose.
As a result of following an entire genetic map of RFLPs in a cross of two strains which differ for one or more quantitatively inherited traits, it has been
possible to develop analytical methods useful for
systematic and accurate mapping of these genes and, then, by following the inheritance of nearby markers, for breeding them into new strains. The availability of complete RFLP linkage maps, such as described herein, has made it possible to dissect quantitative traits into discrete genetic factors. Subsequently, QTLs can be mapped and isogenic lines can be constructed, so as to differ only in the region of the QTL, by using the RFLPs to select for the desired region and against the
remainder of the genome. Such isogenic lines can be used, in conjunction with the fundamental tools of genetic and molecular biology for the study of a trait, including testing of complementation and epistasis; characterization of physiological and biochemical
differences between isogenic lines; isolation of
additional alleles via mutagenesis or further selective breeding (at least in favorable systems); and,
eventually, molecular cloning of the genes underlying quantitative inheritance. The method of the present invention has broad application to breeding of plants and animals for agriculturally valuable traits, particularly because it allows for deterministic breeding.
Systematic genetic dissection of quantitative traits using complete RFLP linkage maps is valuable in a broad range of biological endeavours. For example,
agricultural traits, such as resistance to diseases and pests, tolerance to drought, heat, cold, and other adverse conditions, and nutritional value can be mapped and introgressed into domestic strains from exotic relatives (Rick, C.M., Genes: Enzymes and Populations,
Plenum Press, pp. 255-268 (1973); Harlan, J.R., Crop Science, 16: 329-333 (1976)). Aspects of mammalian physiology such as hypertension, atherosclerosis,
diabetes, predispositions to cancer and teratomas, alcohol sensitivity, drug sensitivities and some
behaviours can be investigated in animal strains
differing widely for these traits (Tanase, H. et al.,
Japanese Circulation Journal, 34:1197-1212 (1970);
DeJong, W., Handbook of Hypertension, Vol. 4, Elsevier
(1984); Paigen, B.A. et al., Atherosclerosis, 57:65-73 (1985); Prochazka, M. et al., Science, 237:286-289
(1987); Heston, W. E., J. Natl. Cancer Inst., 3:79-82 (1942); Kalter, H., Genetics, 39: 185-196 (1954);
Malkinson, A.M. & D. S. Beer, J. Natl. Cancer Inst.,
70:931-936 (1983); Shire, J.G.M., In: Genetic and
Environmental Influences on Behavior, pp. 194-205 (1968); Stewart, J. & R.C. Elston, Genetics, 73:675-693 (1973); Festing, M.F.W., Inbred Strains in Biomedical Research, Oxford University Press, Oxford (1979)). Evolutionary questions about speciation can be elucidated by
determining the number and nature of the genes involved in reproductive barriers (Coyne, J.A. & B. Charlesworth, Heredity, 57: 243-246 (1986)). An example of such genetic dissection is described herein: In an interspecific cross in tomato, QTLs affecting fruit weight,
concentration of soluble solids and fruit pH are mapped to within about 20-30 centiMorgans (cM; 1 cM corresponds to 1% recombination, or about $10^6$ to $10^7$ bp) by means of a complete RFLP linkage map.
Brief Description of the Drawings
Figure 1 is the frequency distribution for fruit mass, soluble-solids concentration (ºBrix, a standard refractometric measure primarily detecting reducing sugars, but also affected by other soluble constituents;
1ºBrix is approximately 1% w/w) and pH in the E parental strain and in the backcross (BC) progeny.
Figure 2 is the distribution of recurrent parent (E) genotype in the 237 backcross progeny, estimated on the basis of the marker genotypes and their relative
distances.
Figure 3 shows QTL likelihood maps indicating LOD scores for fruit weight (solid lines and bars), soluble solids concentration (dotted lines and bars) and pH
(hatched lines and bars), throughout the 862 cM spanned by the 70 genetic markers. The RFLP linkage map used in the analysis is presented along the abscissa, in Kosambi cM.
Figure 4 is a schematic drawing of phenotypic distributions in the A and B parental, F1 hybrid and B1 backcross populations.
Figure 5 shows graphic representations of LOD scores for a hypothetical quantitative trait, based on simulated data for 250 backcross progeny in an organism with 12 chromosomes of 100 cM each.
Figure 6 is a graphic representation of LOD scores for a chromosome containing two QTLs.
Figure 7 shows the appropriate LOD threshold so that the chance of a false positive occurring anywhere in the genome is at most 5%, as a function of genome size and density of RFLPs scored.
Figure 8 shows that progeny having phenotypes exceeding the mean by ≥ L standard deviations make up a proportion Q(L) of the population, but account for a
proportion S(L) of the total LOD score for the progeny.
Figure 9 shows that if only individuals having phenotypes exceeding the mean by ≥ L standard deviations are genotyped, the number of progeny genotyped may be decreased by a factor of g(L) if the number of progeny grown and phenotyped is increased by a factor of h(L).
Figure 10 shows the number of backcross progeny that must be genotyped to map a QTL, based on the fraction of the backcross variance explained by the segregation of the QTL. In the traditional approach, all progeny are genotyped and single markers are analyzed. In the method of the present invention, only progeny with 5% most extreme phenotypes are genotyped and interval mapping is used to analyze the data.
Figure 11 shows the number of backcross progeny that must be genotyped to map a QTL, based on the difference D between the strains (measured in environmental standard deviations) and the number k of effective factors. In the traditional approach (Panel A), all progeny are genotyped and single markers are analyzed. In the method of the present invention (Panel B), only progeny with 5% most extreme phenotypes are genotyped and interval mapping is used to analyze the data.
Detailed Description of the Invention
The present invention is based on a method of resolving quantitative traits into discrete Mendelian factors, using genetic markers. In the work described herein, a complete RFLP linkage map was available and RFLPs were the genetic markers used. However, any genetic markers which will generally be codominant markers, such as DNA polymorphisms, isozymes or other codominant protein polymorphisms or a combination of such markers, can be used. Described herein is the first use of such a complete RFLP linkage map to resolve
quantitative traits into discrete Mendelian factors, in an interspecies backcross of a higher plant. In the instant case, a series of QTLs controlling selected traits in the higher plant were mapped, using the new analytical methods described herein. The methods
described herein, which can be applied in a similar manner to mapping QTLs in other plants and in animals, have made it possible, for the first time, to map QTLs to DNA intervals with a high degree of accuracy, thus maximizing the likelihood that a QTL of interest lies within an interval defined by two selected (defined) markers. This approach is broadly applicable to the genetic dissection of quantitative inheritance of
physiological, morphological and behavioral traits in any higher plant or animal. The method of interval mapping of the present invention and its application to a higher plant are described in detail herein. The method of interval mapping of the present invention can also be used to locate precisely the genomic regions which contain polygenic factors controlling a quantitatively inherited trait or traits. Briefly, the subject method entails five steps or procedures: 1) choosing of a pair of interfertile strains (i.e., two types or varieties of a plant or animal which differ as to a trait of interest), which will serve as the parent strains in the initial cross (and one of which may serve as the recurrent parent in subsequent crosses if backcrosses are employed); 2) constructing a genetic linkage map (using RFLPs, isozymes and/or other codominant markers), if an adequate map is not already available; 3) arranging one or more
back-crosses or intercrosses, using as the recurrent parent the strain or type of plant or animal in which the transferred gene (or genes) is to function; 4) scoring progeny of the back-crosses or intercrosses for the trait or traits of interest and for the genetic markers
comprising the linkage map; and 5) applying an algorithm designed to maximize the likelihood of a specific/selected function based on the data obtained in (4).
An example of application of the interval mapping method of the present invention to locate genomic regions containing QTLs of interest in tomato plants is presented in the following section.
The following is a description of: a) the parent plants used in the interspecies backcross; b) assessment of backcross progeny; c) construction of a genetic linkage map; d) interval mapping of QTLs; e) a summary or overview of key considerations; and f) possible
applications of QTL mapping based on the resulting data.
a. Parent plants
Described below is resolution of quantitative traits, using a complete RFLP linkage map, in an
interspecific back-cross of two types of tomato plants: the domestic tomato Lycopersicon esculentum (L.
esculentum) cv. UC82B (denoted E) and a wild South
American green-fruited tomato L. chmielewskii (L.
chmielewskii) accession LA1028 (denoted CL).
Chmielewski, T., Genet. Polon., 9: 97-124 (1968).
These strains have very different fruit masses (E approximately 65 g; CL approximately 5 g) and
concentrations of soluble solids (E approximately 5%; CL approximately 10%). These are traits of agricultural importance because they jointly determine the yield of tomato paste. Rick, C.M., Hilgardia, 42:493-510 (1974). In addition, the strains are known to be polymorphic for genes affecting fruit pH, which is important for the optimal preservation of tomato products; the difference in pH between parental strains is, however, small.
Tanksley, S.D. and J. Hewitt, J. Theor. Appl. Genet.,
75:811-823 (1988).
b. Back-cross progeny assessed
A total of 237 back-cross plants, with E as the recurrent parent, were grown in the field at Davis,
California. Between five and 20 fruit from each plant were assayed for fruit mass, soluble-solids concentration
(ºBrix) and pH, each of which showed continuous variation (Figure 1). Tanksley, S.D. and J. Hewitt, J. Theor. Appl. Genet., 75: 811-823 (1988). °Brix is a standard refractometric measure which primarily detects reducing sugars; it is also affected by other soluble
constituents. 1ºBrix is approximately 1% w/w. Example 1 is a detailed description of the back-cross and
assessment of backcross progeny.
c. Construction of a genetic linkage map
A genetic linkage map of tomato with more than 300 RFLPs and 20 isozyme markers had previously been
constructed, by analyzing 46 F2 individuals derived from
L. esculentum cv. VF36 x L. pennellii accession LA716 (E x P). Tanksley, S.D. et al., R. Proc. 18th Stadler
Genet. Symp., In Press. The map is essentially complete: it has linkage groups covering all 12 tomato chromosomes, with an average spacing of 5 cM between markers (1 cM is the distance along the chromosome which gives a
recombination frequency of one per cent). For QTL mapping, a selected subset of markers spaced at
approximately 20 cM intervals and displaying polymorphism between the E and CL strains was used. These included 63 RFLPs and five isozyme markers. In addition, the E and CL strains differ in two easily-scored, simply-inherited morphological traits: determinacy (described below) and uniform ripening, controlled by the sp and u genes, respectively. Although a few distal regions did not contain appropriate markers, it is estimated that about 95% of the tomato genome was detectably linked to the markers used.
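The relationship between map distance and recombination frequency noted above is only approximately linear once distances grow beyond a few cM. As a minimal sketch, assuming the Kosambi map function (the unit in which the Figure 3 map is stated to be drawn), the conversion can be written as follows. The function names are illustrative and are not taken from MAPMAKER.

```python
import math


def kosambi_recombination_fraction(distance_cm: float) -> float:
    """Convert a map distance in Kosambi cM to a recombination fraction."""
    d_morgans = distance_cm / 100.0
    return 0.5 * math.tanh(2.0 * d_morgans)


def kosambi_distance_cm(recombination_fraction: float) -> float:
    """Convert a recombination fraction (0 <= r < 0.5) to Kosambi cM."""
    r = recombination_fraction
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    for d in (1, 5, 14.3, 20):
        r = kosambi_recombination_fraction(d)
        print(f"{d:5.1f} cM -> recombination fraction {r:.4f}")
```

Under this assumption, markers about 20 cM apart recombine somewhat less often than 20% of the time, while at 1 cM the recombination fraction is very close to 1%.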
These 70 genetic markers were scored for each of the 237 E x CL back-cross progeny (as described by Tanksley and Hewitt) and a linkage map was constructed de novo using MAPMAKER. Tanksley, S.D. and J. Hewitt, J. Theor. Appl. Genet., 75:811-823 (1988); Lander, E.S. et al., Genomics, 1:174-181 (1987). The map covers all 12 chromosomes with an average spacing of 14.3 cM. Although the linear order of markers inferred from the E x CL cross essentially agreed with that inferred from the E x P cross, differences were noted (See Example 1). Genetic distances differed markedly in certain intervals (for example, 51 cM in E x P and 11 cM in E x CL, for the distance between the 45S ribosomal repeat and TG1B on chromosome 2). In total, the markers scored in both crosses span 852 cM in the E x CL map versus 1103 cM in the E x P map, a highly significant (P < 0.01)
difference. Skewed segregation (P < 0.05) was detected for 48 of the 70 markers, comprising 21 distinct regions distributed over all 12 chromosomes. The heterozygote (E/CL) was overabundant in 12 cases, whereas in nine cases the homozygote (E/E) was favoured. Overall, the effects of skewing approximately cancelled each other out: on average, the back-cross contained the expected 75% E genome (Figure 2).
d. Interval mapping of QTLs
Next the question of mapping the Mendelian factors that underlie continuous variation in fruit mass,
soluble-solids concentration and pH was addressed. The method of maximum likelihood and lod scores, commonly used in human linkage analysis, has recently been adapted to allow interval mapping of QTLs. Ott, J., Analysis of Human Genetic Linkage (Johns Hopkins, Baltimore, 1985); Lander, E.S. and D. Botstein, Genetics, 121:185-199
(1989). Both were used here. At each position in the genome one computes the 'most likely' phenotypic effect of a putative QTL affecting a trait (the effect which maximizes the likelihood of the observed data arising) and the odds ratio (the chance that the data would arise from a QTL with this effect divided by the chance that it would arise given no linked QTL). The lod score, defined as the log10 of the odds ratio, summarizes the strength of evidence in favour of the existence of a QTL with this effect at this position; if the lod score exceeds a pre-determined threshold, the presence of a QTL is inferred. The traditional approach to mapping QTLs, as described by Tanksley et al. and Edwards et al., involves standard linear regression, which accurately measures the effect of QTLs falling at marker loci only,
underestimating the effects of other loci in proportion to the amount of recombination between marker and QTL. Tanksley, S.D. et al., Heredity, 49:11-25 (1982);
Edwards, M.D. et al., Genetics, 116:113-125 (1987). In contrast, interval mapping allows inference about points throughout the entire genome and avoids confounding phenotypic effects with recombination, by using
information from flanking genetic markers. In the special case when a QTL falls exactly at a marker locus, interval mapping reduces to linear regression. A
computer program, MAPMAKER-QTL, was written to implement interval mapping.
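The lod-score calculation described above can be illustrated with a minimal sketch for a single test position in a backcross. This is not the MAPMAKER-QTL implementation. It assumes normally distributed residuals, independent crossovers on either side of the putative QTL, recombination fractions strictly between 0 and 0.5, and a simple EM iteration to maximize the likelihood. All function names and the simulated data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def qtl_prior(g_left, g_right, r1, r2):
    """P(QTL genotype = 1 | flanking marker genotypes) in a backcross.

    r1, r2: recombination fractions between the QTL and each flanking marker.
    """
    like1 = ((1 - r1) if g_left == 1 else r1) * ((1 - r2) if g_right == 1 else r2)
    like0 = (r1 if g_left == 1 else (1 - r1)) * (r2 if g_right == 1 else (1 - r2))
    return like1 / (like1 + like0)


def interval_lod(phenotypes, left, right, r1, r2, iterations=100):
    """Return (lod score, estimated effect) for a putative QTL at one position."""
    n = len(phenotypes)
    mean_all = sum(phenotypes) / n
    sd_all = math.sqrt(sum((y - mean_all) ** 2 for y in phenotypes) / n)
    # Null hypothesis: no linked QTL, one normal distribution for all progeny.
    loglik_null = sum(math.log(normal_pdf(y, mean_all, sd_all)) for y in phenotypes)

    priors = [qtl_prior(a, b, r1, r2) for a, b in zip(left, right)]
    mu0, mu1, sd = mean_all - 0.5 * sd_all, mean_all + 0.5 * sd_all, sd_all
    for _ in range(iterations):
        # E-step: posterior probability that each individual carries genotype 1.
        post = []
        for y, p in zip(phenotypes, priors):
            w1 = p * normal_pdf(y, mu1, sd)
            w0 = (1 - p) * normal_pdf(y, mu0, sd)
            post.append(w1 / (w1 + w0))
        # M-step: weighted means and a pooled variance.
        total1 = sum(post)
        mu1 = sum(w * y for w, y in zip(post, phenotypes)) / total1
        mu0 = sum((1 - w) * y for w, y in zip(post, phenotypes)) / (n - total1)
        sd = math.sqrt(sum(w * (y - mu1) ** 2 + (1 - w) * (y - mu0) ** 2
                           for w, y in zip(post, phenotypes)) / n)
    loglik_qtl = sum(
        math.log(p * normal_pdf(y, mu1, sd) + (1 - p) * normal_pdf(y, mu0, sd))
        for y, p in zip(phenotypes, priors))
    return (loglik_qtl - loglik_null) / math.log(10.0), mu1 - mu0


def simulate_backcross(n, effect, r1, r2, seed=0):
    """Simulate phenotypes and flanking marker genotypes (illustrative data only)."""
    rng = random.Random(seed)
    phenotypes, left, right = [], [], []
    for _ in range(n):
        q = rng.randint(0, 1)
        left.append(q if rng.random() >= r1 else 1 - q)
        right.append(q if rng.random() >= r2 else 1 - q)
        phenotypes.append(rng.gauss(10.0 + effect * q, 1.0))
    return phenotypes, left, right


if __name__ == "__main__":
    ys, gl, gr = simulate_backcross(250, effect=1.0, r1=0.1, r2=0.1)
    lod, effect = interval_lod(ys, gl, gr, 0.1, 0.1)
    print(f"lod = {lod:.2f}, estimated effect = {effect:.2f}")
```

Scanning a chromosome would amount to repeating this calculation for a series of positions (pairs of r1 and r2) within each marker interval and plotting the resulting lod scores, which is the kind of curve shown in a QTL likelihood map.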
Due to the large number of markers tested, an extremely high lod score threshold must be adopted to avoid false positives. Given the genetic length of the tomato genome and the density of markers used, a
threshold of 2.4 gives a probability of less than 5% that even a single false positive will occur anywhere in the genome. Lander, E.S. and D. Botstein, Genetics,
121:185-199 (1989). This is approximately equivalent to requiring the significance level for any single test to be 0.001.
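The stated equivalence between a lod threshold of 2.4 and a single-test significance level near 0.001 can be checked with a short sketch. It assumes the usual large-sample approximation in which 2 ln(10) times the lod score follows a chi-square distribution with one degree of freedom (one estimated QTL effect in a backcross). The function name is illustrative.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate single-test p-value for a lod score (1-df chi-square)."""
    chi_square = 2.0 * math.log(10.0) * lod
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2.0))


if __name__ == "__main__":
    for threshold in (2.4, 2.5, 2.7):
        print(f"lod {threshold}: pointwise p ~ {lod_to_pointwise_p(threshold):.5f}")
```

Under this approximation a lod of 2.4 corresponds to a pointwise p-value of roughly 0.001, in line with the figure quoted above. The genome-wide false-positive rate is a separate quantity that depends on genome length and marker density.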
QTL likelihood maps, showing how lod scores for fruit mass, soluble-solids concentration and pH change as one moves along the genome, reveal multiple QTLs for each trait and estimate their location to within 20-30 cM. (Figure 3).
Factors for fruit mass were found on six chromosomes (1, 4, 6, 7, 9 and 11). In each case, CL alleles
decrease fruit mass (by 3.5 to 6.0 g), adding to a total reduction of 28.1 g inferred for back-cross progeny carrying a CL allele at all six loci. This accounts for about half of the approximately 60 g difference between E and CL.
Factors for soluble-solids concentration were found on four chromosomes (3, 4, 6 and 7). In each case, CL alleles elevate soluble-solids concentration (by 0.83 to 1.89 Brix), adding to a total of 4.57 Brix (versus a difference of approximately 5º Brix between the parental strains). This large effect in the backcross is
consistent with previous reports that high soluble-solids concentration exhibits dominance and overdominance.
Rick, C.M., Hilgardia, 42:493-510 (1974); Tanksley, S.D. and J. Hewitt, J. Theor. Appl. Genet., 75:811-823 (1988). The QTL alleles for both fruit mass and soluble-solids concentration all produce effects in the direction predicted by the difference between the parental strains.
Factors for pH were found on five chromosomes (3, 6, 7, 8 and 10). In addition, the lod score for a
putative QTL on chromosome 9 fell just below the
threshold. Because the parental strains do not differ greatly in pH, it was suspected that CL alleles might not all produce effects in the same direction. In fact, pH was increased by four QTLs and decreased by two,
including the likely QTL on chromosome 9. This provides a genetic explanation for the observation that many back-cross progeny exhibited more extreme phenotypes than the parental strains (Figure 1), a phenomenon known as transgression. Simmonds, N.W. Principles of Crop
Improvement, 82-85 (Longman, NY, 1981).
Together, the QTLs identified for fruit mass, soluble-solids and pH account for 58%, 44% and 48%, respectively, of the phenotypic variance among the back-cross progeny, with another 13%, 9% and 11%
attributable to environment.
The numbers of QTLs reported for each trait must be considered a minimum estimate. Because an extremely stringent threshold was used to avoid any false
positives, some sub-threshold effects probably represent real QTLs. For example, the regions near TG19 on
chromosome 1, CD41 on chromosome 5 and TG68 on chromosome 12 may affect soluble-solids concentration and merit further attention in larger populations. Similarly, the region near the u locus on chromosome 10 may contain an additional QTL affecting pH (See Example 1). Moreover, one cannot rule out the presence of many additional QTLs with tiny phenotypic effects, which has been postulated in evolutionary theory and supported by some experimental evidence. Lande, R., Heredity, 50:47-65 (1983);
Shrimpton, A.E. and A. Robertson, Genetics, 11:445-459 (1988). Also, it is conceivable that some of the apparent QTLs actually represent several closely-linked QTLs, each with small phenotypic effects in the same direction. Such a phenomenon might arise particularly in regions of genetic map compression. Finally, the QTL mapping here applies strictly only to the specific environment tested and to heterozygosity for CL alleles. In principle, homozygosity for CL alleles could have been studied by using an F2 self between E and CL, but in practice too many of the progeny are sterile.
Some regions of the genome clearly exert effects on more than one trait (for example, chromosome 6; Figure 3), providing a genetic explanation for at least some of the correlation between the traits. Although the present data are insufficient to distinguish between pleiotropic effects of a single gene and independent effects of tightly-linked loci, the frequent coincidence of QTL locations for different traits makes it likely that at least some of the effects are due to pleiotropy.
The region near sp on chromosome 6 has the largest effects on soluble solids and pH, as well as a
substantial effect on fruit mass. The sp gene affects plant-growth habit: the dominant CL allele causes continuous apical growth (indeterminate habit), whereas the recessive E allele causes termination in an
inflorescence ('determinate' or 'self-pruning' habit). Yeager, A.F., J. Hered., 18:263-265 (1927). Although indeterminacy has been reported previously by Emery et al. to elevate both fruit mass and soluble-solids
concentration within L. esculentum, it is associated here with reduced fruit mass in both E x CL and another interspecific cross (E x L. cheesmanii). Emery, G.C. and
H. M. Munger, J. Am. Soc. Hort. Sci., 95:410-412 (1966). These differing results might be due to a second,
tightly-linked locus or to unlinked modifier genes.
Overall, pairwise epistatic interactions between intervals were not common (about 5% of two-way
analysis-of-variance tests were significant at 0.05). An interesting exception was the region near TG16 on chromosome 8, at which the CL allele significantly enhanced the effect of three of the four QTLs for
soluble-solids concentration. TG16 also showed the most extreme segregation distortion of any marker scored
(about 4:1 in favour of the E/E homozygote) and is in a region known to exhibit skewed segregation in
back-crosses to other green-fruited tomato species.
Zamir, D. and Y. Tadmor, Bot. Gaz., 147:355-358 (1986); Tanksley, S. D. In: Isozymes in Plant Genetics and
Breeding, (eds. Tanksley, S. D. and T. J. Orton) 331-338 (Elsevier, Amsterdam, 1983). The unusual properties of this region of CL clearly merit further study.
The QTLs identified here may well differ from those that would be fixed by repeated back-crossing with continuing selection for a trait, a classical method for introgressing quantitative traits. Work on LA1563, a strain with increased soluble solids produced through back-crossing a different strain of E to CL has provided some suggestive evidence. Rick, C.M., Hilgardia,
42:493-510 (1974). By surveying RFLPs, Tanksley and Hewitt recently found that LA1563 has maintained three separate regions from CL: near CD56 on chromosome 10, near Got2 on chromosome 7 and near TG13 on chromosome 7. Here, above-threshold effects were detected in the last of these three regions only (which, interestingly, failed to show effects on soluble solids in a single-environment test by Tanksley and Hewitt). Tanksley, S. D. and J.
Hewitt, J. Theor. Appl. Genet., 75:811-823 (1988).
Moreover, QTLs affecting soluble-solids concentration were detected in regions that did not seem to be
retained. Unfortunately, the results of the two
experiments are not directly comparable due to the use of a different strain by Rick, possible environmental differences between the experiments, the possibility that small CL fragments containing QTLs went undetected in LA1563, the possibility that the region near TG13
retained in LA1563 may not contain the QTL detected here and the possibility that some of the sub-threshold effects are real. Although more detailed studies are clearly needed, it is interesting to speculate about why repeated back-crossing may fix a narrower class of QTLs than found by QTL mapping. Because such breeding
programs demand horticultural acceptability, they are likely to select against otherwise-desirable QTLs which are closely linked to undesirable effects from the wild parent. If such QTLs can first be identified by mapping it may be feasible to remove linked deleterious effects by recombination.
Once several QTLs with relatively large effects have been mapped, crosses can be used to isolate them in near-isogenic lines. These lines can be used to
characterize the QTLs in various dosages, genetic
backgrounds, environments and combinations. By
re-assembling selected CL alleles in an otherwise E genotype, it should be possible to engineer an
agriculturally-useful tomato with a higher yield of soluble solids.
e. Overview of Key Considerations
Although it has long been recognized that
quantitative traits often arise from the combined action of multiple Mendelian factors, only recently has it become practical to undertake systematic mapping of such QTLs. While such investigations will by no means be easy, the methodology developed here should increase their accuracy and efficiency. Specifically, by
integrating information from genetic markers spaced throughout a genome, the method of interval mapping described above allows (i) efficient detection of QTLs while limiting the overall occurrence of false positives; (ii) accurate estimation of phenotypic effects of QTLs; and (iii) localization of QTLs to specific regions
(Figure 6). Beyond the increased efficiency due to interval mapping, the strategy of selective genotyping (when applicable) can further reduce the number of progeny that must be genotyped in order to detect a QTL. Together, the methods lead to a reduction of up to 7-fold in the number of progeny to be genotyped. Finally, additional savings may be achieved via progeny testing and simultaneous search. The main considerations in designing a cross for genetic dissection of a
quantitative trait are as follows:
1. Designing a cross: Strains can be chosen to
maximize the chance that they segregate for QTLs having relatively large phenotypic effects, thereby allowing mapping with a manageable number of progeny. The ideal situation occurs when (a) the phenotypic difference D between the strains is large compared to the
environmental or within-strain standard deviation σE; (b) breeding experiments indicate that the number k of effective factors given by Wright's formula is small; and (c) the strains are the result of selective breeding for the trait.
2. Specifying the minimum phenotypic effect the cross will be designed to detect: Once the strains have been chosen, the experimenter must specify the minimum
phenotypic effect δ that the cross will be designed to detect. When using strains resulting from selection, a choice of δ in the range of between ½(D/k) and (D/k) should ensure that QTLs accounting for much of the phenotypic difference will be detected. When using arbitrary strains, the same choice of δ can be used, although the presence of QTLs with this effect is not guaranteed.
3. Calculating the number of backcross progeny to be genotyped: The number N of backcross progeny that should be genotyped can then be calculated based on the spacing d between genetic markers in the map, the
appropriate threshold T for the LOD score, and the desired probability β of success, assuming either (i) the traditional method of analysis involving single markers and genotyping all progeny or (ii) interval mapping and selective genotyping of the 5% most extreme progeny.
Figure 10 shows N as a function of the fraction of variation v explained by the QTL (where $v = \delta^2 / (16\,\sigma_{B1}^2)$), while Figure 11 (Panels A and B) shows N as a function of the phenotypic difference D between the strains and the number k of effective factors. Together, interval mapping and selective genotyping reduce the number of progeny to be genotyped by up to 7-fold. Both Figures 10 and 11 assume that d = 20 cM, T = 2.5, and β = 0.50, and Figure 11 assumes that the QTLs have equal phenotypic effects. For instances in which different assumptions are made, the following modifications are made: multiply by 4 to allow for QTLs having half the average effect; multiply by approximately $(1.25)(1 - 2\theta)^2 / (1 - \psi)$ to allow for markers every d cM (θ and ψ are the recombination fractions corresponding to ½d and d cM, respectively); multiply by approximately 1.50 to allow for a 90% chance of success; multiply by T/2.5 to allow for a LOD threshold of T; and multiply by about 55% if an F2 intercross is used instead of a backcross. As a rule of thumb, it appears practical to map QTLs when the
phenotypic difference D measured in environmental
standard deviations is on the order of the number k of effective factors segregating.
The Spontaneous Hypertensive rat (SHR) strain (Tanase, H. et al., Japan. Circulation Journal, 34:1197-1212 (1970)) was derived from the Wistar-Kyoto rat (WKY) strain by selective breeding for high systolic blood pressure followed by inbreeding. Blood pressure in SHR is about 3 standard deviations higher than in WKY, while the number of effective factors was estimated at about k = 3. Assuming that the rat genome is about 1500 cM and that a 20 cM RFLP map is available, the appropriate LOD threshold would be about 2.7 (see Figure 7). Using the traditional approach, one would need about 325 backcross progeny or about 175 F2 intercross progeny. With interval mapping, these become about 275 and 145. If it were practical to grow a larger population but genotype only those progeny with the 5% most extreme blood pressures, the number of progeny to genotype could be reduced to about 55 and 30, respectively.
In addition to SHR, a number of other genetically hypertensive strains of rat and mouse have been derived, with estimated numbers of factors between 2 and 5 (DeJong 1984). Comparison of these strains would elucidate the number and location of the most important genes controlling naturally occurring variation in blood pressure, at least in rodent populations. Such information might shed light on hypertension in humans as well. The availability of complete RFLP linkage maps makes it possible to dissect quantitative traits into discrete genetic factors, thereby unifying two historically separated areas of genetics. Once QTLs have been mapped, isogenic lines can be rapidly constructed differing only in the region of the QTL by using the RFLPs to select for the desired region and against the remainder of the genome. Soller, M. & J.S. Beckmann, Theor. Appl. Genet., 47:179-190 (1983); Paterson, A.H. et al., Submitted, 1988.
f. Application of QTL Mapping
The general approach of QTL mapping is broadly applicable to a wide range of biological endeavors.
For example, in agriculture, it might be desirable to transfer to domestic strains many quantitative traits harbored in wild species, including resistance to diseases and pests, tolerance to drought, heat, cold and other adverse conditions, efficient use of resources and high nutritional quality. Rick, C.M., In: Genes, Enzymes, and Populations (Ed. A.M. Srb) 255-268 (Plenum, NY, 1973); Harlan, J.R., Crop Sci., 16:329-333 (1976). In mammalian physiology, selective breeding has generated rodent strains which differ greatly in quantitative traits, such as hypertension, atherosclerosis, diabetes, predispositions to cancer, drug sensitivities and various behavioural patterns. Information on the number, location and nature of these QTLs would be of value in medicine. Festing, M.F.W., Inbred Strains in Biomedical Research, (Oxford, 1979). In evolutionary biology, the process of speciation can be investigated by studying the number and nature of genes underlying reproductive isolation. Coyne, J.A. and B. Charlesworth, Heredity, 57:243-246 (1986).
The availability of detailed RFLP linkage maps makes it possible to dissect quantitative traits into discrete genetic factors (QTLs): all regions of a genome can be assayed and accurate estimates of phenotypic effects and genetic position derived from interval analysis. Tanksley, S.D. et al., R. Proc. 18th Stadler Genet. Symp., In Press; Helentjaris, T., Trends in Gen., 3(8):217-221 (1987); Landry, B.S. et al., Theor. Appl. Genet., 74:646-653 (1987); Burr, B. et al., Genetics, 118:519-526 (1988); Chang, C. et al., Proceedings of the National Academy of Sciences, U.S.A., In Press; McCouch, S.R. et al., Theor. Appl. Genet., In Press; Kosambi, D.D., Ann. Eugen., 12:172-175 (1944). Once QTLs are mapped, RFLP markers permit genetic manipulations such as rapid construction of near-isogenic lines: flanking markers may be used to retain the QTL, and the study of the remaining markers may be used to speed progress by identifying individuals with a fortuitously high proportion of the desired genetic background (see Example 1). Using isogenic lines, the fundamental tools of genetics and molecular biology may be brought to bear on the study of QTLs, including testing of complementation, dominance and epistasis; characterization of physiological and biochemical differences between isogenic lines; isolation of additional alleles by mutagenesis (at least in favourable systems); and physical mapping and molecular cloning of genetic factors underlying quantitative traits.
EXAMPLE 1 Interspecies Back-cross and Assessment of Progeny
The tomatoes were grown in the field at Davis, California, in a completely randomized design including 237 BC plants (with E as the recurrent pistillate parent), as well as E, CL and the F1 as controls. Neither CL nor the F1 progeny matured completely, as is typical in the central valley of California. Among the BC plants, six failed to mature and 12 produced too few fruit to assay reliably for quantitative traits. The absence of quantitative trait data for these few progeny should yield at most a slight bias in the analyses. The frequency distributions for fruit mass, soluble-solids concentration (ºBrix, a standard refractometric measure primarily detecting reducing sugars, but also affected by other soluble constituents; 1 ºBrix is approximately 1% w/w) and pH in the E parental strain and in the backcross (BC) progeny are shown in Figure 1. Means and standard deviations for the distributions of the E parental strain (E, filled bars) and the BC progeny (BC, open bars) appear in the upper right of each histogram. The distributions for soluble-solids concentration and pH are approximately normal. The distribution of the BC progeny for fruit weight is clearly skewed; log10(fruit mass) was studied throughout to achieve approximate normality (E = 1.81 ± 0.07; BC = 1.20 ± 0.19). Wright, S., Evolution and the Genetics of Populations, (Univ. of Chicago, Chicago, 1968). The proportion of variance due to environment was estimated to be the square of the ratio of the standard deviations (E/BC), for log-mass, solids and pH. Figure 2 shows the distribution of percentage of recurrent parent (E) genotype in the 237 back-cross progeny, estimated on the basis of the marker genotypes and their relative distances. Determination of marker genotypes was as previously described. Tanksley, S.D. and J. Hewitt, Theor. Appl. Genet., 75:811-823 (1988). Estimates of the percentage of recurrent parent genome were produced by the recently-developed computer program HyperGene™.
Although the average agreed closely with the Mendelian expectation of 75% for a back-cross, values for individual plants ranged from 59% to over 90%. The distribution of the proportion of recurrent-parent genome agrees with the mathematical expectation. Franklin, L.A., Theor. Populat. Biol., 11:60-80 (1977); Stam, P., Genet. Res., 25:131-155 (1980). The individual with >90% E appears to carry only five fragments from CL (ranging from 9 to 47 map units in length) and could be returned to essentially 100% E with two additional back-crosses of about 550 plants. This is far more rapid than the 6-8 back-crosses routinely used to eliminate donor genome in the absence of markers.
QTL likelihood maps indicating lod scores for fruit mass (solid lines and bars), soluble-solids concentration (dotted lines and bars) and pH (hatched lines and bars), throughout the 862 cM spanned by the 70 genetic markers are shown in Figure 3. The RFLP linkage map used in the analysis is presented along the abscissa, in Kosambi cM. Kosambi, D.D., Ann. Eugen., 12:172-175 (1944). The order of the markers agrees with the previously-published map of the E x P cross, except for three inversions of adjacent markers: (TG24-CD15), (TG63-CD32B) and (TG30-TG36). Tanksley, S.D. et al., R. Proc. 18th Stadler Genet. Symp., In Press. In the first case, re-analysis of the E x P data with MAPMAKER indicates that the order shown here is the more likely order, in both E x P and E x CL. Lander, E.S. et al., Genomics, 1:174-181 (1987); Proc. Natl. Acad. Sci., USA, 84:2363-2367 (1987). For the other two, the orders shown here are more likely in E x CL by odds of 10^4:1 and 10^7:1, but the inverse is more likely in E x P by 11:1 and 8:1 odds. These differences will be investigated in a larger E x P population. Soluble-solids concentration and pH were analyzed in ºBrix and pH units, respectively; allele effects on fruit mass are presented in g; log transformation of fruit mass was used in all analyses to achieve approximate normality. The maximum likelihood effect of a putative QTL, as well as the lod score in favour of the existence of such a QTL, have been determined at points spaced every 1 cM throughout the genome, according to the method described herein, and a smooth curve plotted through the points. The height of the curve indicates the strength of the evidence (log10 of the odds ratio) for the presence of a QTL at each location and not the magnitude of the inferred allelic effect. The horizontal line at a height of 2.4 indicates the stringent threshold that the lod score must cross to allow the presence of a QTL to be inferred, as described herein.
Information about the likely position of the QTL can also be inferred from the curve. The maximum likelihood position of the QTL is the highest point on the curve. Bars below each graph indicate a 10:1 likelihood support interval for the position of the QTL (the range outside which the likelihood falls by a lod score of 1.0), whereas the lines extending out from the bars indicate a 100:1 support interval. Phenotypic effects indicated beside the bars are the inferred effect of substituting a single CL allele for one of the two E alleles at the QTL.
Several regions show sub-threshold effects on one or more traits (chromosome one near TG19, chromosome five near TG34 and chromosome 12 near TG68) which may represent QTLs; this requires additional testing. The region near TG68 may be particularly interesting, as it is the only instance found where the CL allele seems to decrease soluble-solids concentration (by about 0.7 ºBrix). In the case of chromosome 10, the lod score for pH crosses the significance threshold in two places. Controlling for the presence of a QTL near CD34A, the presence of a second QTL near μ was tested (by comparing the maximum lod score assuming the presence of only the first QTL to the maximum lod score assuming the presence of two QTLs). Allowing for a QTL in the region of CD34A, the residual lod score near μ falls below the required threshold. Thus, the evidence is not yet sufficient to support the presence of a QTL near μ.
Methods. The lod score and the maximum likelihood estimate (MLE) of the phenotypic effect at any point in the genome are computed assuming that the distribution of phenotypes in the BC progeny represents a mixture of two normal distributions (of equal variance) with means depending on the genotype at a putative QTL at the given position. (Note that QTLs are considered individually and there is no assumption that different QTL effects can be added, except in studying the possibility of two QTLs on chromosome 10 affecting pH.) Specifically, at a given position in the genome, the likelihood function for individual i with quantitative phenotype $\phi_i$ is given by
$L_i(\alpha,\sigma) = (2\pi\sigma^2)^{-1/2}\,[\,p_1 \exp(-\phi_i^2/2\sigma^2) + p_2 \exp(-(\phi_i-\alpha)^2/2\sigma^2)\,]$,
where α is the effect of substituting a CL allele for an E allele at a putative QTL in the given position, $\sigma^2$ is the phenotypic variance not attributable to the QTL, and $p_1$ and $p_2$ are the probabilities that individual i has genotype E/E and E/CL, respectively, at the QTL (which can be computed on the basis of the genotypes at the flanking markers and the distance to the flanking markers). The likelihood function for the entire population is $L = \prod_i L_i$. Let α* and σ* denote the MLEs allowing the possibility of a QTL at the location (the values which maximize L) and let σ** denote the MLE of σ subject to the constraint that no QTL is linked (α = 0). The lod score is then given by $\log_{10}[L(\alpha^*,\sigma^*)/L(0,\sigma^{**})]$. This method for QTL mapping is developed more fully in Example 2.
EXAMPLE 2 Analytical Methods Used in Mapping Quantitative Traits Using RFLP Linkage Maps
The following is a description of a set of analytical methods that modify and extend the classical theory for mapping discrete Mendelian factors underlying quantitative traits, referred to as quantitative trait loci (QTLs). These include: (i) a method of identifying promising crosses for QTL mapping by exploiting a classical formula of Sewall Wright; (ii) a method (interval mapping) for exploiting the full power of RFLP linkage maps by adapting the approach of LOD score analysis used in human genetics, to obtain accurate estimates of the genetic location and phenotypic effects of QTLs; and (iii) a method (selective genotyping) that allows a reduction of up to 7-fold in the number of progeny that need to be scored with the DNA markers.
Figures 10 and 11 are graphs that allow geneticists to estimate, in any particular case, the number of progeny required to map QTLs underlying a quantitative trait.
(i) Identification of promising crosses for QTL mapping.
Genetic dissection of a quantitative trait will succeed only when some of the QTLs segregating in the cross have relatively large phenotypic effects. It has been shown, through use of a classical formula of Sewall Wright, that it is often possible to recognize such crosses in advance and thereby to ensure that QTLs will in fact be identified.
The basic methodology for mapping QTLs involves arranging a cross between two inbred strains differing substantially in a quantitative trait: segregating progeny are scored both for the trait and for a number of genetic markers. Typically, the segregating progeny are produced by a B1 backcross (F1 x Parent) or an F2 intercross (F1 x F1). For simplicity, only the backcross will be discussed in detail. As noted below, the F2 intercross is completely analogous and requires only about half as many progeny.
Definitions and assumptions: Let A and B be inbred strains differing for a quantitative trait of interest, and suppose that a B1 backcross is performed with A as the recurrent parent. Let $(\mu_A, \sigma^2_A)$, $(\mu_B, \sigma^2_B)$, $(\mu_{F1}, \sigma^2_{F1})$ and $(\mu_{B1}, \sigma^2_{B1})$ denote the mean and variance of the phenotype in the A, B, F1 and B1 populations, respectively (see Figure 4). Let $D = \mu_B - \mu_A > 0$ denote the phenotypic difference between the strains. The cross will be analyzed under the classical assumption that the phenotype results from summing the effects of individual QTL alleles, and then adding normally-distributed environmental (i.e., non-genetic) noise. (Mather, K. and J.L. Jinks, Biometrical Genetics, Cornell University Press, Ithaca, NY (1971); Falconer, D.S., Introduction to Quantitative Genetics, Longman, London (1981)). In particular, complete codominance and no epistasis are assumed. These assumptions imply that:
$\mu_{F1} = \tfrac{1}{2}(\mu_A + \mu_B)$, (1a)
$\mu_{B1} = \tfrac{1}{2}(\mu_A + \mu_{F1})$, and (1b)
$\sigma^2_A = \sigma^2_B = \sigma^2_{F1} < \sigma^2_{B1}$. (1c)
The variances within the A, B and F1 populations equal the environmental variance, $\sigma^2_E$, among genetically identical individuals, while the variance within the B1 progeny also includes genetic variance $\sigma^2_G = \sigma^2_{B1} - \sigma^2_E$.
Frequently, phenotypic measurements must be mathematically transformed so that parental phenotypes are approximately normally distributed and the relations (1a-c) are approximately satisfied. For example, Wright (1968) obtained an excellent fit to the theory by applying a log-transformation (appropriate when variances scale with the mean) to tomato fruit weight.
By the phenotypic effect δ of a QTL is meant the additive effect of substituting both A alleles by B alleles. A single allele has effect ½δ, since codominance is assumed. In a backcross, the segregation of a QTL with effect δ contributes an amount $\delta^2/16$ to the genetic variance $\sigma^2_G$. The variance explained by the QTL is written $\sigma^2_{exp} = \delta^2/16$, while the residual variance is $\sigma^2_{res} = \sigma^2_{B1} - \sigma^2_{exp}$.
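The figure $\delta^2/16$ follows directly from the definitions above. A short derivation, assuming only the 1:1 segregation of a backcross and the codominance already stated (the indicator $g$ is introduced here for illustration):

```latex
% g = 1 if a progeny carries one B allele at the QTL (genotype AB), g = 0 if AA.
% In a backcross P(g = 1) = P(g = 0) = 1/2, and one B allele shifts the phenotype by delta/2.
\begin{aligned}
\sigma^2_{exp}
  &= \operatorname{Var}\!\left(\tfrac{1}{2}\,\delta\, g\right)
   = \tfrac{1}{4}\,\delta^2 \operatorname{Var}(g) \\
  &= \tfrac{1}{4}\,\delta^2 \cdot \tfrac{1}{2}\left(1 - \tfrac{1}{2}\right)
   = \frac{\delta^2}{16}.
\end{aligned}
```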
Choosing Strains
The ability to map QTLs underlying a quantitative trait depends on the magnitude of their phenotypic effect: the smaller the effect that one wishes to detect, the more progeny will be required. Before attempting genetic dissection of a quantitative trait, it would thus be desirable to identify crosses segregating for QTLs with relatively large phenotypic effects and to estimate the magnitude of the effects. In fact, this can often be accomplished by exploiting a classical formula of Wright.
Wright (quoted by Castle 1921; Wright 1968) proved that the number of QTLs segregating in a backcross between two strains with phenotypic difference D can be estimated by the formula:
$k = D^2/16\sigma^2_G$ (2)
provided that the following assumptions hold: (i) the QTLs have effects of equal magnitude; (ii) the QTLs are unlinked; and (iii) the alleles in the high strain all increase the phenotype, while those in the low strain decrease the phenotype. The estimate k is called the number of effective factors in the cross. If the assumptions are satisfied, then each QTL affects the phenotype by (D/k) and explains (1/k) of the genetic variance in the backcross.
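As a minimal sketch of how formula (2) might be applied, the following Python function estimates k from strain means and variances, taking the genetic variance as the backcross variance minus the environmental variance. The function name and the numerical inputs are hypothetical and chosen only for illustration.

```python
def effective_factors(mean_a, mean_b, var_backcross, var_env):
    """Wright's estimate k = D^2 / (16 * sigma_G^2) for a backcross."""
    d = mean_b - mean_a                      # phenotypic difference D
    var_genetic = var_backcross - var_env    # sigma_G^2 = sigma_B1^2 - sigma_E^2
    if var_genetic <= 0:
        raise ValueError("backcross variance must exceed environmental variance")
    return d * d / (16.0 * var_genetic)


# Hypothetical inputs: strains 3 environmental standard deviations apart,
# environmental variance 1.0, backcross variance 1.1875.
k = effective_factors(mean_a=0.0, mean_b=3.0, var_backcross=1.1875, var_env=1.0)
print(k)            # 3.0 effective factors
print(3.0 / k)      # 1.0, the per-QTL effect D/k under the equal-effects assumption
```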
Unfortunately, if these assumptions are not satisfied (as will be likely in practice), the number of effective factors k may seriously underestimate the number of QTLs. In principle, the number of QTLs is unlimited. In this case, must there exist any QTLs affecting the phenotype by (D/k)? More generally, for any 0 ≤ ε ≤ 1, must there exist QTLs affecting the phenotype by ε(D/k)? And how much of the total phenotypic difference D and the genetic variance $\sigma^2_G$ can be attributed to such QTLs? Proposition 1 (proven in Appendix [A1]) supplies an answer.
Proposition 1. Consider a cross in which the phenotypic difference between the strains is D and the number of effective factors is k. Assume that the QTLs are unlinked and that the alleles in the 'high' strain all increase the phenotype. No matter how many QTLs are segregating and no matter what their individual phenotypic effects, the set of QTLs that alter the phenotype by at least ε(D/k) must together account for a fraction ≥ $D_\varepsilon$ of the total phenotypic difference D between the strains and must together explain a fraction ≥ $V_\varepsilon$ of the genetic variance in the second generation, where
$D_\varepsilon = [\tfrac{1}{2}\varepsilon + \sqrt{(1-\varepsilon)k + \tfrac{1}{4}\varepsilon^2}\,]/k$ and $V_\varepsilon = 1 - \varepsilon(1 - D_\varepsilon)$.
Considering the case ε = 1, the proposition states that the QTLs with phenotypic effect ≥ (D/k) must account for a phenotypic difference of at least (D/k). In other words, there must exist at least one QTL having phenotypic effect ≥ (D/k).
Consider the search for QTLs with somewhat smaller effects. How much of the phenotypic difference can be attributed to QTLs with effect ≥ ½(D/k)? Taking ε = ½ and considering various values of k results in the following:
k    Minimum proportion of phenotypic      Minimum proportion of genetic
     difference D accounted for by QTLs    variance $\sigma^2_G$ explained by QTLs
     with effect ≥ ½(D/k)                  with effect ≥ ½(D/k)
2    64%                                   82%
3    50%                                   75%
4    42%                                   71%
5    37%                                   69%
A small value of k thus implies that the cross must be segregating for QTLs with relatively large effects (≥ ½(D/k)), which together account for a substantial proportion of the phenotypic difference and explain a substantial proportion of the genetic variance in the backcross.
In other words, Wright's formula can be used to indicate the presence of some QTLs with large effects-- even though the number k of effective factors may not be a reliable estimate of the total number of QTLs. Note that Proposition 1 provides only worst-case lower bounds: in general, the QTLs with large effects will have an even greater effect.
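The worst-case bounds of Proposition 1 can be tabulated with a few lines of Python. This sketch assumes that D_ε takes the form [½ε + √((1−ε)k + ¼ε²)]/k, a reading adopted here because it reproduces the percentages tabulated above for ε = ½ and gives D_ε = 1/k at ε = 1; the function names are illustrative.

```python
import math


def min_fraction_of_difference(eps, k):
    """D_eps: least share of D attributable to QTLs with effect >= eps * (D/k).

    Assumed form: [eps/2 + sqrt((1 - eps) * k + eps^2 / 4)] / k.
    """
    return (0.5 * eps + math.sqrt((1.0 - eps) * k + 0.25 * eps * eps)) / k


def min_fraction_of_variance(eps, k):
    """V_eps = 1 - eps * (1 - D_eps)."""
    return 1.0 - eps * (1.0 - min_fraction_of_difference(eps, k))


# Rows for eps = 1/2: k, bound on phenotypic difference, bound on genetic variance.
for k in (2, 3, 4, 5):
    d = min_fraction_of_difference(0.5, k)
    v = min_fraction_of_variance(0.5, k)
    print(k, f"{d:.0%}", f"{v:.0%}")
# 2 64% 82%
# 3 50% 75%
# 4 42% 71%
# 5 37% 69%
```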
How serious a limitation is posed by the two assumptions remaining in Proposition 1?
(i) The first assumption is not essential: admitting the possibility of linked QTLs simply allows that some large QTL effects may eventually prove to be due to several nearby genes. Such questions may be safely neglected at first.
(ii) The second assumption is more important. Fortunately, it is possible to choose crosses in which it is likely to be satisfied. The ideal situation would be two strains arising from brief, intense artificial selection for and against the trait in an outbred population, followed by inbreeding: in such a case, classical selection theory shows that a "high" strain is unlikely to fix a "low" allele at QTLs with relatively large effect; moreover, the force of selection will be greatest on the QTLs with the largest effects. (Falconer, D.S., Introduction to Quantitative Genetics, Longman, London (1981)). Many such strains have been developed to study various physiological traits. As a reasonable alternative, one could use strains that appear to have resulted from natural selection for the trait.
Judicious choice of strains can essentially assure that some QTLs will be detected with a reasonable progeny size that can be calculated in advance. When studying strains resulting from selection, a reasonable approach to mapping QTLs would be to use enough progeny to map QTLs having effect δ between ½(D/k) and (D/k). Of course, one could choose to study more progeny and might well be rewarded with the detection of QTLs with smaller effects.
Unselected strains exhibiting extreme phenotypic difference may also merit attention: QTLs with large effects may well be segregating, despite the lack of a mathematical guarantee. When there is clear evidence of both high and low alleles within a strain (as when many segregating progeny exhibit phenotypes more extreme than either parent), the analysis above does not apply; the detection level for QTLs must be chosen somewhat arbitrarily. When there is no such evidence, one might proceed as above in choosing a progeny size. Although the existence of QTLs having a given effect is no longer assured, the methods described below for detecting QTLs in a cross are unchanged.
Assuming that the desired detection level δ has been chosen, next considered are the method for mapping QTLs and the number of progeny required.
(ii) Exploit the full power of complete linkage maps.
The traditional approach to mapping QTLs involves studying single genetic markers one-at-a-time. Sax, K., Genetics, 8:552-560 (1923); Soller, M. and T. Brody, Theor. Appl. Genet., 47:35-39 (1976). In general, the drawbacks of the method are that (a) the phenotypic effects of QTLs are systematically underestimated, (b) the genetic locations of QTLs are not well resolved because distant linkage cannot be distinguished from small phenotypic effect, and (c) the number of progeny required for detecting QTLs is larger than necessary. By adapting the method of LOD scores used in human genetic linkage analysis, it has been possible to remedy these problems through the approach of interval mapping of QTLs. In addition, the traditional approach neglects the problem that testing many genetic markers increases the risk that false positives will occur. As described below, the appropriate degree of statistical stringency to prevent such errors in mapping QTLs has been determined.
Traditional Approach to Mapping QTLs
The traditional approach for detecting a QTL near a genetic marker involves comparing the phenotypic means for two classes of progeny: those with marker genotype AB, and those with marker genotype AA. The difference between the means provides an estimate of the phenotypic effect of substituting a B allele for an A allele at the QTL. To test whether the inferred phenotypic effect is significantly different from 0, one applies a simple statistical test, amounting to linear regression or one-way analysis of variance, under the assumption of normally-distributed environmental variance.
Consider a QTL that contributes $\sigma^2_{exp}$ to the genetic variance. Supposing that such a QTL were located exactly at a marker locus, the number of progeny required for detection would be approximately
$(Z_\alpha)^2(\sigma^2_{res}/\sigma^2_{exp})$, (3)
where this progeny size affords a 50% probability of detection if such a QTL is actually present and a probability α of a false positive if no QTL is linked. Here, $Z_\alpha$ is defined by the equation Probability($z > Z_\alpha$) = α, where z is a standard normal variable (i.e., $Z_\alpha$ is the number of standard deviations beyond which the normal curve contains probability α). Soller and Brody suggest allowing a false positive rate of α = 0.05. Soller, M. and T. Brody, Theor. Appl. Genet., 47:35-39 (1976). For a given false positive rate, the required progeny size thus essentially scales inversely with the square of the phenotypic effect of the QTL or, equivalently, inversely with the variance explained.
Although it captures the key features of QTL mapping, the traditional approach has a number of shortcomings:
(i) If the QTL does not lie at the marker locus, its phenotypic effect may be seriously underestimated. If the recombination fraction is θ, the inferred phenotypic effect of the QTL is biased downward by a factor of (1-2θ).
(ii) If the QTL does not lie at the marker locus, substantially more progeny may be required. In particular, the variance explained by the marker decreases by a factor of $(1-2\theta)^2$ and the number of progeny consequently increases by a factor of $1/(1-2\theta)^2$. For an RFLP map with markers every 10, 20, 30 or 40 cM throughout the genome, the progeny size would need to be increased by 22%, 49%, 82% or 123%, respectively, to account for the possibility that the QTL might lie in the middle of an interval (i.e., at the maximum distance from the nearest RFLP).
(iii) The approach does not define the likely position of the QTL. In particular, it cannot distinguish between tight linkage to a QTL with small effect and loose linkage to a QTL with large effect.
(iv) The suggested false positive rate of α = 0.05 neglects the fact that many markers are being tested. While the chance of a false positive at any given marker is only 5%, the chance that a false positive will occur somewhere in the genome is much higher.
These difficulties stem from the fact that single markers are analyzed one-at-a-time. To remedy these problems, the approach is generalized, as described in the following section, to make it possible to exploit the full power of an RFLP linkage map to scan the intervals between markers as well.
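The progeny-size penalties quoted in shortcoming (ii) can be checked numerically. The sketch below assumes the Haldane map function (no interference) to convert map distance to recombination fraction, an assumption made here because it reproduces the quoted 22%, 49%, 82% and 123%; the function names are illustrative.

```python
import math


def haldane_recombination(cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))


def progeny_inflation(marker_spacing_cm):
    """Factor 1/(1 - 2*theta)^2 for a QTL midway between adjacent markers."""
    theta = haldane_recombination(marker_spacing_cm / 2.0)
    return 1.0 / (1.0 - 2.0 * theta) ** 2


for spacing in (10, 20, 30, 40):
    extra = progeny_inflation(spacing) - 1.0
    print(spacing, f"{extra:.0%}")
# 10 22%
# 20 49%
# 30 82%
# 40 123%
```

The same `theta` also gives the downward bias factor (1 - 2θ) on the inferred phenotypic effect described in shortcoming (i).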
QTL Mapping: Interval Mapping using LOD Scores
Method of maximum likelihood: The traditional approach, involving linear regression of phenotype on genotype, is a special case of the method of maximum likelihood. Formally, the phenotype $\phi_i$ and genotype $g_i$ are assumed to be related by the equation $\phi_i = a + b g_i + \varepsilon$, where $g_i$ is encoded as a (0, 1)-indicator variable, ε is a random normal variable with mean 0 and variance $\sigma^2$, and a, b, and $\sigma^2$ are unknown parameters. Here, b denotes the estimated phenotypic effect of a putative QTL.
The linear regression solutions $(a^*, b^*, \sigma^{2*})$ are, in fact, maximum likelihood estimates (MLEs) for the parameters; that is, they are the values which maximize the probability $L(a,b,\sigma^2)$ that the observed data would have occurred. Here,
$L(a,b,\sigma^2) = \prod_i z(\phi_i - (a + b g_i),\, \sigma^2)$, (4)
where $z(x,\sigma^2) = (2\pi\sigma^2)^{-1/2}\exp(-x^2/2\sigma^2)$ is the probability density for the normal distribution with mean 0 and variance $\sigma^2$. The MLEs are compared to the constrained MLEs obtained under the assumption that b = 0, corresponding to the assumption that no QTL is linked. These constrained MLEs are easily seen to be $(\mu_{B1}, 0, \sigma^2_{B1})$. The evidence for a QTL is then summarized by the LOD score:
$\mathrm{LOD} = \log_{10}\,[L(a^*, b^*, \sigma^{2*})/L(\mu_{B1}, 0, \sigma^2_{B1})]$,
essentially indicating how much more likely the data are to have arisen assuming the presence of a QTL than assuming its absence. (The choice of $\log_{10}$ accords with longstanding practice in human genetics, although $\log_e$ would be slightly more convenient below). Morton, N.E., Am. J. Hum. Genet., 7:211-318 (1955). If the LOD score exceeds a predetermined threshold T, a QTL is declared to be present. The important issues are: (i) What LOD threshold T should be used, in order to maintain an acceptably low rate of false positives? (ii) What is the expected contribution to the LOD score (called the ELOD) from each additional progeny? The number of progeny required is then T/ELOD, to provide even odds of detecting the QTL with the desired false positive rate.
When only a single genetic marker is being tested, these questions are easily answered. (i) By a general result about maximum likelihood estimation in large samples, LOD is asymptotically distributed as $\tfrac{1}{2}(\log_{10} e)\chi^2$, where $\chi^2$ denotes the chi-squared distribution with one degree of freedom (Kendall, M. and A. Stuart, The Advanced Theory of Statistics, Vol. 2, Griffin: London (1979)). A false positive rate of α will thus result if the LOD threshold is chosen so that $T = \tfrac{1}{2}(\log_{10} e)(Z_\alpha)^2$. For the 5% error rate suggested by Soller and Brody, the threshold is T = 0.83. The question of the appropriate threshold when many markers are being tested is postponed temporarily. (ii) For a QTL contributing $\sigma^2_{exp}$ to the backcross variance, the expected LOD score per progeny (ELOD) is
$\mathrm{ELOD} = \tfrac{1}{2}\log_{10}(1 + \sigma^2_{exp}/\sigma^2_{res})$ (5a)
$\simeq \tfrac{1}{2}(\log_{10} e)(\sigma^2_{exp}/\sigma^2_{res})$ (5b)
$\simeq 0.22\,(\sigma^2_{exp}/\sigma^2_{res})$ (5c)
where (5a) follows from well-known results about linear regression and (5b) follows from Taylor expansion for small values of $(\sigma^2_{exp}/\sigma^2_{res})$. Combining these two results, the number of progeny required so that the LOD score is expected to exceed T is
$T/\mathrm{ELOD} = (Z_\alpha)^2(\sigma^2_{res}/\sigma^2_{exp})$. (6)
This confirms that the maximum likelihood approach agrees with the result (3) from the traditional approach above, when examining effects at a single marker locus. The more general framework of maximum likelihood, however, allows the method to be generalized to several more complex situations described below.
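The single-marker quantities above are straightforward to evaluate. The following sketch computes the LOD threshold, the ELOD of (5a) and the progeny number T/ELOD. The function names and the example variance split are hypothetical. One observation about the arithmetic: the quoted T = 0.83 is reproduced with a normal deviate of about 1.96, the two-sided 5% point, whereas a one-sided 5% deviate of about 1.645 would give roughly 0.59.

```python
import math
from statistics import NormalDist

HALF_LOG10_E = 0.5 * math.log10(math.e)   # about 0.217, the "0.22" of (5c)


def lod_threshold(z):
    """T = 1/2 * log10(e) * z^2 for a single-marker test with normal deviate z."""
    return HALF_LOG10_E * z * z


def elod(var_exp, var_res):
    """Expected LOD per progeny, equation (5a)."""
    return 0.5 * math.log10(1.0 + var_exp / var_res)


def progeny_required(threshold, var_exp, var_res):
    """T / ELOD: progeny giving even odds that the LOD score reaches T."""
    return threshold / elod(var_exp, var_res)


z = NormalDist().inv_cdf(0.975)     # two-sided 5% point of the standard normal
t = lod_threshold(z)
print(round(t, 2))                  # 0.83

# Hypothetical QTL explaining 5% of the backcross variance.
print(progeny_required(t, var_exp=0.05, var_res=0.95))
```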
(iii) Decrease the number of progeny to be genotyped.
In typical cases, a reduction of up to 7-fold can be achieved by combining two approaches: interval mapping and selective genotyping. Selective genotyping involves growing a larger population, but genotyping only those individuals whose phenotypes deviate substantially from the mean. Additional methods for increasing the power of QTL mapping include reducing environmental noise by progeny testing and reducing genetic noise by studying several genetic regions simultaneously.
Interval mapping: If genetic markers have been scored throughout the genome, the method of maximum likelihood can be used as above to estimate the phenotypic effect and the LOD score for a putative QTL at any given genetic location. (Lander, E.S. and D. Botstein, Proceedings of the National Academy of Sciences, USA, 83:7353-7357 (1986); Lander, E.S. and D. Botstein, Cold Spring Harbor Symp. Quant. Biol., 51:49-62 (1986)). The main difference is that the QTL genotype $g_i$ for individual i is unknown: the appropriate likelihood function is therefore
$L(a,b,\sigma^2) = \prod_i [G_i(0)L_i(0) + G_i(1)L_i(1)]$, (7)
where $L_i(x) = z(\phi_i - (a + bx),\, \sigma^2)$ denotes the likelihood function for the individual i assuming that $g_i = x$, and $G_i(x)$ denotes the probability that $g_i = x$ conditional on the genotypes and positions of the flanking markers. (Given a map function, G is easily computed. For example, if the flanking markers both have genotype AA in an individual and they lie at recombination fractions θ and θ' from the putative QTL, then the probability of the QTL genotype being AB is θθ', assuming no interference.) Note that (7) reduces to (4) in the special case that the QTL lies at a marker locus and the genotype $g_i$ is thus known with certainty.
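A minimal sketch of how the conditional genotype probability might be computed for a backcross, assuming no interference so that recombination on either side of the QTL is independent. The function name and the 0/1 coding are illustrative. The exact conditional probability normalizes over both possible QTL genotypes; when both flanking markers are AA this gives θθ'/[(1-θ)(1-θ') + θθ'], which the simpler value θθ' approximates for small recombination fractions.

```python
def qtl_genotype_prob(left, right, theta_left, theta_right):
    """P(QTL genotype is AB | flanking marker genotypes) in a backcross.

    Genotypes are coded 0 for AA and 1 for AB; theta_left and theta_right are
    the recombination fractions between the QTL and each flanking marker.
    """
    def transmit(qtl, marker, theta):
        # A marker matches the QTL genotype unless a recombination occurred.
        return 1.0 - theta if qtl == marker else theta

    # The prior of 1/2 on each QTL genotype cancels in the ratio.
    weight = {
        g: transmit(g, left, theta_left) * transmit(g, right, theta_right)
        for g in (0, 1)
    }
    return weight[1] / (weight[0] + weight[1])


# Both flanking markers AA, each at recombination fraction 0.1 from the QTL.
print(qtl_genotype_prob(0, 0, 0.1, 0.1))   # about 0.0122, close to 0.1 * 0.1
```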
Finding the maximum likelihood solution $(a^*, b^*, \sigma^{2*})$ to (7) can be regarded as a linear regression problem with missing data: none of the independent variables (genotypes) are known; only probability distributions for each are available. Although standard computer programs for linear regressions cannot be used, techniques for maximum likelihood estimation with missing data have been developed in recent years. Little, R.J.A. and D.B. Rubin, Statistical Analysis with Missing Data, Wiley, NY (1987). By adapting the EM algorithm, a computer program MAPMAKER-QTL has been written to compute LOD scores for putative QTLs. Dempster, A.P. et al., J. Roy. Statist. Soc., 39:1-38 (1977); Lander, E.S. and P. Green, Proceedings of the National Academy of Sciences, USA, 84:2363-2367 (1987).
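The missing-data view of (7) can be sketched with a small EM iteration. This is only an illustration under stated assumptions, not the MAPMAKER-QTL implementation: the function names, the starting values, the fixed iteration count, and the simulated data generator are all hypothetical. The E-step computes each individual's posterior probability of carrying the AB genotype; the M-step refits the two genotype means and the common residual variance by weighted least squares. The LOD score compares the fitted model with the no-QTL model (b = 0).

```python
import math
import random


def normal_pdf(x, mean, var):
    """Normal density z(x - mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log10_likelihood(phen, prob_ab, a, b, var):
    """log10 of likelihood (7): product over i of [G_i(0)L_i(0) + G_i(1)L_i(1)]."""
    return sum(
        math.log10((1.0 - p) * normal_pdf(y, a, var) + p * normal_pdf(y, a + b, var))
        for y, p in zip(phen, prob_ab)
    )


def em_interval_lod(phen, prob_ab, iterations=100):
    """Estimate (a, b, var) for one putative QTL position and return the LOD score.

    phen    -- phenotypes, one per individual
    prob_ab -- G_i(1), the prior probability of QTL genotype AB for each
               individual, computed from the flanking markers
    """
    n = len(phen)
    mean = sum(phen) / n
    var0 = sum((y - mean) ** 2 for y in phen) / n
    a, b, var = mean, 0.0, var0  # start from the no-QTL fit
    for _ in range(iterations):
        # E-step: posterior probability that each individual is AB.
        w = []
        for y, p in zip(phen, prob_ab):
            l0 = (1.0 - p) * normal_pdf(y, a, var)
            l1 = p * normal_pdf(y, a + b, var)
            w.append(l1 / (l0 + l1))
        w_sum = sum(w)
        if w_sum < 1e-9 or n - w_sum < 1e-9:
            break  # one genotype class is effectively empty
        # M-step: weighted means of the two classes and pooled residual variance.
        mean_ab = sum(wi * y for wi, y in zip(w, phen)) / w_sum
        mean_aa = sum((1.0 - wi) * y for wi, y in zip(w, phen)) / (n - w_sum)
        a, b = mean_aa, mean_ab - mean_aa
        var = sum(
            (1.0 - wi) * (y - a) ** 2 + wi * (y - a - b) ** 2 for wi, y in zip(w, phen)
        ) / n
    lod = log10_likelihood(phen, prob_ab, a, b, var) - log10_likelihood(
        phen, prob_ab, mean, 0.0, var0
    )
    return a, b, var, lod


def simulate_backcross(n, a, b, seed=1):
    """Hypothetical test data: genotypes nearly known (G_i(1) = 0.01 or 0.99)."""
    rng = random.Random(seed)
    phen, prob_ab = [], []
    for _ in range(n):
        g = rng.randint(0, 1)
        prob_ab.append(0.99 if g else 0.01)
        phen.append(a + b * g + rng.gauss(0.0, 1.0))
    return phen, prob_ab
```

Evaluating `em_interval_lod` at successive positions along a chromosome, with `prob_ab` recomputed from the flanking markers at each position, would trace out a likelihood map of the kind described in the text.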
To illustrate the method, simulated data from many backcrosses has been analyzed. Figure 5 presents a QTL likelihood map, showing how the LOD score varies throughout a genome, for a simulated data set involving 250 backcross progeny segregating for five QTLs with various allelic effects. Based on the assumed genome size and density of markers, a LOD score of 2.4 is required (see below) for declaring the presence of a QTL. In the example, the four largest QTLs are detected while the fifth does not attain statistical significance. The approximate position of the QTLs is indicated by one-LOD confidence intervals, defined by the points on the genetic map at which the likelihood ratio has fallen by a factor of 10 from the maximum; such confidence intervals are frequently used in human genetics to indicate the probable position of genes. (Ott, J., Analysis of Human Genetic Linkage, Johns Hopkins University, Baltimore (1985)).
Among the advantages of the approach are:
(i) The QTL likelihood map represents clearly the strength of the evidence for QTLs throughout the entire genome.
(ii) In contrast to the traditional approach, the inferred phenotypic effects are asymptotically unbiased. This is an immediate consequence of the fact that they are MLEs for a correctly specified model. (Kendall, M. and A. Stuart, The Advanced Theory of Statistics, Vol. 2, Griffin: London (1979)).
(iii) The probable position of the QTL is given by confidence intervals, indicating the range of points for which the likelihood ratio is within a factor of 10 (or 100, if desired) of the maximum.
(iv) Interval mapping requires fewer progeny than the traditional approach for the detection of QTLs. In meioses in which the flanking markers do not recombine, the genotype of the QTL is known almost certainly, up to the chance of a double crossover (e.g., at most 1% in the case of a 20 cM RFLP map). In essence, the flanking markers can be thought of as a single tightly-linked virtual marker in such meioses. Supposing that genetic markers are available every d cM and considering the (worst) case of a QTL in the middle of an interval, one can show (Appendix [A2]) that
$$\mathrm{ELOD}_{\text{interval mapping}} \simeq \frac{(1-2\theta)^2\,\mathrm{ELOD}}{1-\psi}, \qquad (8a)$$
where $\psi$ is the recombination fraction corresponding to d cM, $\theta$ is the recombination fraction corresponding to ½d cM, and ELOD is the expected LOD score for a marker located exactly at the QTL. By contrast, recall that
$$\mathrm{ELOD}_{\text{single markers}} \simeq (1-2\theta)^2\,\mathrm{ELOD}. \qquad (8b)$$
Interval mapping thus decreases the required number of progeny by a factor of $(1-\psi)$, which is exactly the proportion of meioses in which the flanking markers do not recombine. For maps with d = 10, 20, 30 and 40 cM, the savings are 9%, 16%, 23% and 28%, respectively.
(v) QTL likelihood maps can also be used to recognize a pair of linked QTLs, provided that they are not so close that recombination between them is very rare. Holding fixed the position of one QTL, the increase in LOD score caused by a second putative QTL can be computed for each position along the chromosome. An example is shown in Figure 6.
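The savings quoted in item (iv) can be checked with a short calculation. The sketch below assumes Haldane's map function (no interference) to convert the marker spacing d into the recombination fraction ψ; the text does not say which map function was used for these figures, so that choice and the function names are assumptions.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Recombination fraction for a map distance in cM under Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def interval_mapping_saving(marker_spacing_cm):
    """Fractional reduction in progeny, psi, implied by the factor (1 - psi)."""
    return haldane_recombination_fraction(marker_spacing_cm)


for d in (10, 20, 30, 40):
    psi = interval_mapping_saving(d)
    print(f"d = {d} cM: psi = {psi:.3f}, progeny needed = {100 * (1 - psi):.0f}% of single-marker analysis")
```

Under this assumption the rounded savings for 10, 20, 30 and 40 cM agree with the percentages given in the text.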
In addition to being tested on numerous simulated data sets, interval mapping has recently been applied to an interspecific backcross in tomato: six QTLs affecting tomato fruit weight, four QTLs affecting the concentration of soluble solids, and five QTLs affecting fruit pH were mapped to about 20-30 cM. (Paterson, A.H. et al., Submitted (1988)). In general, interval mapping should prove valuable for analyzing and presenting evidence for QTLs, and for decreasing somewhat the number of progeny required to detect QTLs of a given magnitude.
Appropriate threshold for LOD scores: When an entire genome is tested for the presence of QTLs, the usual nominal significance level of 5%, corresponding to a LOD score of 0.83, is clearly inadequate. Indeed, applying this standard would have resulted in a spurious QTL being declared on chromosome 10 in Figure 5. The appropriate threshold depends on the size of the genome and the density of markers genotyped.
To determine the correct LOD threshold, it is useful to consider two limiting situations: (i) the sparse-map case, in which consecutive markers are well-separated and (ii) the dense-map case, in which the spacing between consecutive markers approaches zero. In each case, the issue is: If no QTLs are segregating, what is the chance that the LOD score will exceed the threshold T somewhere in the genome?
In the sparse-map case, occurrences of spuriously high LOD scores are essentially independent. To achieve an overall significance level of $\alpha$ when M intervals are tested, a nominal significance level of $\alpha/M$ should be required for each individual test, corresponding to a LOD threshold of $\tfrac{1}{2}(\log_{10}e)(Z_{\alpha/M})^2$.
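A minimal sketch of the sparse-map threshold follows. It assumes that Z denotes the two-sided standard normal critical value, an interpretation chosen because it reproduces the LOD of 0.83 quoted for a single test at the 5% level; the function name is illustrative.

```python
import math
from statistics import NormalDist


def sparse_map_lod_threshold(alpha, n_intervals):
    """LOD threshold (1/2)(log10 e) Z^2 for a per-test level of alpha / n_intervals.

    Z is taken as the two-sided normal critical value (an assumption).
    """
    per_test = alpha / n_intervals
    z = NormalDist().inv_cdf(1.0 - per_test / 2.0)
    return 0.5 * math.log10(math.e) * z ** 2


print(round(sparse_map_lod_threshold(0.05, 1), 2))   # single test at the nominal 5% level
print(round(sparse_map_lod_threshold(0.05, 60), 2))  # e.g. 60 independent intervals
```

The threshold grows only slowly with the number of intervals, because the normal tail falls off rapidly.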
In the dense-map case, occurrences of spuriously high LOD scores at nearby markers are no longer independent events. Even if markers were typed continuously throughout the genome, there would be a maximum statistical penalty to be paid. In fact, as shown in the Appendix [A3], in the limit of an infinitely dense map and a large progeny size, the LOD score varies according to the square of an Ornstein-Uhlenbeck diffusion process. Well-known in physics and engineering, the Ornstein-Uhlenbeck diffusion describes a particle executing Brownian motion while being coupled to the origin by a weak spring. The extreme value properties of this diffusion have been extensively studied and the results immediately translate into statements about how high a LOD score will be expected by chance, given the size of the genome. (Leadbetter, M.R. et al., Extremes and Related Properties of Random Sequences and Processes, Springer, NY (1983)).
Specifically, for a high threshold T, there is (see Appendix [A3]) the following result:
Proposition 2. Consider an organism with C chromosomes and genetic length G, measured in Morgans. When no QTLs are present, the probability that the LOD score exceeds a high level T is $\simeq (C + 2Gt)\,\chi^2(t)$, where $t = (4.6)T$ and $\chi^2(t)$ is the upper-tail probability of the chi-square distribution with one degree of freedom. In order to make the probability less than $\alpha$ that a false positive occurs somewhere in the genome, the appropriate LOD threshold is thus $\simeq T_\alpha = t_\alpha/4.6$, where $t_\alpha$ solves the equation $\alpha = (C + 2Gt_\alpha)\,\chi^2(t_\alpha)$.
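The equation in Proposition 2 has no closed-form solution, but it is easy to solve numerically. The sketch below is one way to do so, assuming that χ²(t) is the upper-tail probability of a chi-square variable with one degree of freedom and using the conversion T = t/(2 ln 10), with 2 ln 10 ≈ 4.6. The bisection bounds and function names are illustrative choices.

```python
import math


def chi2_tail_1df(t):
    """Upper-tail probability P(X > t) for chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(t / 2.0))


def genome_wide_false_positive(t, n_chromosomes, length_morgans):
    """Approximate probability (C + 2Gt) * chi2(t) from Proposition 2."""
    return (n_chromosomes + 2.0 * length_morgans * t) * chi2_tail_1df(t)


def dense_map_lod_threshold(alpha, n_chromosomes, length_morgans):
    """Solve alpha = (C + 2Gt) * chi2(t) by bisection and return T = t / (2 ln 10)."""
    lo, hi = 2.0, 200.0  # the approximation is decreasing in t on this range
    if not (genome_wide_false_positive(lo, n_chromosomes, length_morgans) > alpha
            > genome_wide_false_positive(hi, n_chromosomes, length_morgans)):
        raise ValueError("alpha is outside the range handled by this sketch")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if genome_wide_false_positive(mid, n_chromosomes, length_morgans) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / (2.0 * math.log(10.0))


# Tomato-sized genome (C = 12, G = 11 Morgans) in the dense-map limit:
print(round(dense_map_lod_threshold(0.05, 12, 11), 2))
```

Because this is the dense-map limit, the value returned is an upper bound on the threshold needed for a map with finite marker spacing.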
For both the sparse-map and dense-map cases, a standard chi-square table may be used to calculate the LOD score threshold corresponding to a 5% chance that even a single false positive will occur. For intermediate situations, extensive numerical simulation was used to determine the appropriate LOD thresholds as a function of genome size and marker spacing (Figure 7). Typically, a LOD score of between 2 and 3 is required to ensure an overall false positive rate of 5%. For instance, analyzing the domestic tomato (C = 12, G ≃ 11) with a 20 cM RFLP map requires a LOD threshold of 2.4, equivalent to applying a nominal significance level of about α' = 0.001 for each individual test performed. If the nominal 5% significance level (LOD > 0.83) were used instead, the probability would exceed 90% that a false positive would arise somewhere in the genome. Indeed, a LOD score of 1.5 occurred by chance on chromosome 10 in the simulated data shown in Figure 5.
Number of progeny required: Given the ELOD for a QTL as a function of its phenotypic effect (Equation (8)) and the LOD threshold T (Figure 7), a progeny size of T/ELOD will ensure a 50% chance of detecting linkage to such a QTL no matter where it lies in the genome. If it is desired to increase the chance of success to 100β%, standard arguments show that the progeny size should be further increased by a factor of $[1 + (Z_{1-\beta}/Z_{\alpha'})]^2$, where α' is the nominal significance level corresponding to a LOD score of T. (Kendall, M. and A. Stuart, The Advanced Theory of Statistics, Vol. 2, Griffin: London (1979)).
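The progeny-size rule can be written as a small helper. This sketch makes two assumptions that are not spelled out in the text: Z with subscript 1-β is read as the standard normal point exceeded with probability 1-β (so that it is zero when β = 0.5), and Z with subscript α' is recovered from the threshold through the relation T = ½(log10 e)Z² used for the sparse-map case. The function name and the example ELOD value are hypothetical.

```python
import math
from statistics import NormalDist


def progeny_required(elod_per_progeny, lod_threshold, power=0.5):
    """Progeny needed to reach the LOD threshold with the requested probability.

    elod_per_progeny -- expected LOD contributed by each progeny
    lod_threshold    -- genome-wide LOD threshold T
    power            -- desired probability of detection (beta in the text)
    """
    z_power = NormalDist().inv_cdf(power)                        # 0 when power = 0.5
    z_alpha = math.sqrt(2.0 * math.log(10.0) * lod_threshold)    # from T = (1/2)(log10 e) Z^2
    return (lod_threshold / elod_per_progeny) * (1.0 + z_power / z_alpha) ** 2


print(round(progeny_required(0.01, 2.4)))             # 50% chance of detection
print(round(progeny_required(0.01, 2.4, power=0.9)))  # 90% chance of detection
```

With a threshold of 2.4, raising the chance of detection from 50% to 90% roughly doubles the progeny required under these assumptions.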
A technical note: The approximate progeny sizes given above (Equations 3, 5ab, 6, 8ab) are exact in the case of QTLs with small effects. Slight modifications are required for QTLs with large effects; see Appendix [A4].
Increasing the Power of QTL Mapping
Although interval mapping increases the efficiency of QTL mapping somewhat, large numbers of progeny may still be required. Additional methods are available to increase the power of QTL mapping, the most important of which is selective genotyping.
Selective genotyping of the extreme progeny: Some progeny contribute more linkage information than others. As a general principle, the individuals that provide the most linkage information are those whose genotype can be most clearly inferred from their phenotype. For example, Lander and Botstein have pointed out that the vast majority of linkage information about human diseases with incomplete penetrance comes from the affected individuals; since the genotype of unaffected individuals is uncertain, they provide relatively little information. Lander, E.S. and D. Botstein, Cold Spring Harbor Symp. Quant. Biol., 51:49-62 (1986).
Applying this principle to quantitative genetics, the highest ELODs are provided by the progeny that deviate most from the phenotypic mean. When the cost of growing progeny is less than the cost of complete RFLP genotyping (as is frequently the case), it will thus be more efficient to increase the number of progeny grown but to genotype only those with the most extreme phenotypes. This increase in efficiency can be estimated as follows, with a more precise argument given in the Appendix [A5]. Since regression minimizes squared deviations from the mean, the ELOD conditional on an individual's phenotype Φ is proportional to $(\Phi - \mu_{B1})^2$. Thus, with L measured in phenotypic standard deviations and z denoting the standard normal density, the proportion of individuals with extreme phenotype Φ such that $|\Phi - \mu_{B1}| \ge L$ is
$$Q(L) = 2\int_L^\infty z(x)\,dx,$$
while the proportion of the total linkage information contributed by such individuals is
$$S(L) = 2\int_L^\infty x^2 z(x)\,dx = Q(L)\,[1 + 2Lz(L)/Q(L)] \simeq Q(L)\,[1 + L^2], \qquad (9)$$
using integration by parts and the approximation $z(L)/Q(L) \simeq \tfrac{1}{2}L$ for large L. Accordingly, the same total linkage information would be obtained by growing a population that was larger by a factor of $h(L) = 1/S(L)$, but only genotyping individuals with extreme phenotypes. The number of progeny to genotype would fall by a factor of $g(L) = S(L)/Q(L) \simeq [1 + L^2]$. Graphs of Q(L), S(L), h(L) and g(L) are shown in Figure 8. Results show that:
(i) Progeny with phenotypes more than 1 standard deviation from the mean comprise about 33% of the total population but contribute about 81% of the total linkage information. By growing a population that was only about 25% larger and genotyping only these extreme progeny, the same total linkage information would be obtained from genotyping only about 40% as many individuals.
(ii) Progeny with phenotypes more than 2 standard deviations from the mean comprise about 5% of the total population but contribute about 28% of the total linkage information. By growing a population that was about 3.6-fold larger and genotyping only these extreme progeny, the same total linkage information would be obtained from genotyping about 5.5-fold fewer individuals (since h(2) = 3.6 and g(2) = 5.5).
(iii) It is probably unwise to go beyond the 5% tails of the distribution. From a practical point of view, true phenotypic outliers may represent artifacts. Moreover, the increase in population size required for L > 2 outweighs the decreased number of individuals to genotype.
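The quantities Q(L), S(L), h(L) and g(L) from Equation (9) can be evaluated directly from the closed form S(L) = Q(L) + 2Lz(L). The sketch below assumes a standard normal phenotype distribution (the small-effect case); the function names are illustrative. Under that assumption, the values computed at the 5% tails (L = 1.96) are close to the figures quoted in (ii), while L = 1 gives values close to, but not identical with, those in (i).

```python
import math


def normal_density(x):
    """Standard normal density z(x)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def proportion_extreme(L):
    """Q(L): fraction of progeny more than L standard deviations from the mean."""
    return math.erfc(L / math.sqrt(2.0))


def information_share(L):
    """S(L): fraction of total linkage information carried by those progeny."""
    return proportion_extreme(L) + 2.0 * L * normal_density(L)


def population_growth_factor(L):
    """h(L) = 1 / S(L): how much larger the grown population must be."""
    return 1.0 / information_share(L)


def genotyping_reduction_factor(L):
    """g(L) = S(L) / Q(L): fold reduction in individuals genotyped."""
    return information_share(L) / proportion_extreme(L)


for L in (1.0, 1.96, 2.0):
    print(
        f"L = {L}: Q = {proportion_extreme(L):.3f}, S = {information_share(L):.3f}, "
        f"h = {population_growth_factor(L):.2f}, g = {genotyping_reduction_factor(L):.2f}"
    )
```

Plotting these four functions over a range of L would give curves of the kind described for Figure 8.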
The strategy of selective genotyping will substantially increase efficiency whenever growing and phenotyping additional progeny requires less effort than completely genotyping individuals at all RFLP markers, which is typically the case in many organisms.
It should be noted that standard computer programs for linear regression cannot be used (even for single marker analysis) when only the extreme progeny have been genotyped; phenotypic effects would be grossly overestimated because of the biased selection of progeny. As in the case of interval mapping, missing-data methods are required. (Little, R.J.A. and D.B. Rubin, Statistical Analysis with Missing Data, Wiley, NY (1987)). Conveniently, the maximum likelihood methods discussed above will produce the correct results, provided that the phenotypes are recorded for all progeny: genotypes for the non-extreme progeny may simply be entered as missing. Using the MAPMAKER-QTL program, the method has been applied to both simulated and experimental data sets.
Decreasing environmental variance via progeny testing: As shown above, the number of progeny needed to map a QTL is proportional to
$$(\sigma^2_{res}/\sigma^2_{exp}) = [(\sigma^2_G + \sigma^2_E)/\sigma^2_{exp}] - 1.$$
Typically, the environmental variance exceeds the genetic variance. If $\sigma^2_E$ could be reduced, QTL mapping would become considerably more efficient. If the environmental noise results from measurement error, one might either average replicate measurements or try to develop a better assay. More often, environmental noise results from actual physiological differences between genetically identical individuals. In this case, $\sigma^2_E$ can be reduced through progeny testing; an individual's phenotype would be taken to be the average phenotype of n of its B2 backcross offspring. The variance of this average is $\sigma^2_{E'} = (1/n)[\tfrac{1}{2}\sigma^2_G + \sigma^2_E]$, which will be less than $\sigma^2_E$, except for very small n.
Simultaneous search: Just as environmental noise can be decreased via progeny testing, genetic noise can be reduced by simultaneously studying several intervals containing QTLs. If the genetic variance is large, such an approach may further decrease the number of progeny required. The extension of interval mapping to such simultaneous search, the question of the appropriate LOD score when considering sets of intervals, and the approximate increase in the power of QTL mapping are discussed in the Appendix [A6].
F2 intercrosses and recombinant inbred strains: Although the discussion above concerns the backcross, it applies directly to F2 intercrosses and recombinant inbred strains, with the following modifications:
(i) In an F2 intercross, a QTL with phenotypic effect δ contributes variance δ2/8 and thus Wright's formula (2) becomes k = D2/8σ2 E. Since F2 intercrosses provide information about twice as many meioses as backcrosses of the same size, fewer progeny are required for detecting QTLs having purely additive effects: only 50-60% as many progeny are needed, depending on the density of the markers used. If a QTL is partly
dominant, one of the backcrosses will be more efficient and one less efficient for mapping it. The magnitude of dominance effects can be estimated by explicitly
incorporating them into the maximum likelihood analysis via an additional parameter.
(ii) Recombinant inbred strains are analyzed in the same manner as backcrosses, except that the multi-generational breeding scheme that is used to construct recombinant inbred strains increases the effective genetic length of the genome. Compared to a backcross, the density of crossovers is doubled in a recombinant inbred produced through selfing and is quadrupled in a recombinant inbred produced by sib mating. Haldane, J.B.S. and C.H. Waddington, Genetics, 16:357-374 (1931). A genetic length of 2G or 4G must be used in place of G when computing the appropriate LOD threshold, which leads to an increase of about 0.3 or 0.6, respectively, in the threshold required. Although the higher threshold will increase the number of progeny required, the effect is typically offset by the ability to decrease the number of progeny by reducing the environmental variance through replicate phenotypic measurements within each recombinant inbred strain (cf. progeny testing above). Recombinant inbred strains will thus typically be more efficient for QTL mapping than an equal number of backcross progeny. However, this advantage may often be negated by the considerable time and effort required to construct large numbers of such strains.
Although the Results section is mathematical in parts, the Discussion presents the methodology in terms of explicit graphs that allow a geneticist to design crosses to dissect a quantitative trait by using a complete RFLP linkage map.
Equivalents
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described specifically herein. Such equivalents are intended to be encompassed in the scope of the following claims.
APPENDIX
[A1] To prove Proposition 1, we use the following lemma.
Lemma. Let $x_1, \ldots, x_n \ge 0$. For $y \ge 0$, let $s_y = \sum' x_i$ and $t_y = \sum' x_i^2$, where the sum is taken over the terms $x_i \ge y$. If $t_0/s_0 \ge y$, then $s_y \ge \tfrac{1}{2}\left[y + \sqrt{y^2 - 4(ys_0 - t_0)}\right]$ and $t_y \ge t_0 - y(s_0 - s_y)$. Proof: From the definitions and the non-negativity of the $x_i$, it is clear that $s_y^2 \ge t_y \ge t_0 - y(s_0 - s_y)$. The constraint on $s_y$ then follows by considering the outer terms and applying the quadratic formula. ∎
In the context of Proposition 1, suppose that the QTLs in the high strain change the phenotype by x1,...,xn≥ 0, respectively. Using the notation above, we have D = s0 and σ2 G= t0/16 = D2/k (because of non-linkage among QTLs and WRIGHT'S formula). Taking y = ε(D/k), the result then follows from the lemma since Dε = sy/s0, and Vε = ty/t0.
[A2] Suppose that a QTL lies midway between two flanking markers. Let $\theta$ be the recombination fraction between the QTL and either marker and $\psi = 2\theta(1-\theta)$ the recombination fraction between the two markers (ignoring interference). In meioses in which they have not recombined (a proportion $1-\psi$ of the total), the flanking markers act as a single virtual marker linked at recombination fraction $\gamma$, where $\gamma$ is the chance that the QTL recombines with both markers given that the markers themselves have not recombined. By contrast, meioses in which the flanking markers have recombined provide zero information about linkage of the QTL. The ELOD for interval mapping is thus $(1-\psi)$ times the ELOD for a single marker linked at $\gamma$, which in turn is $(1-2\gamma)^2$ times the ELOD for a marker at 0% recombination. That is,
$$\mathrm{ELOD}_{\text{interval mapping}} = (1-\psi)(1-2\gamma)^2\,\mathrm{ELOD}.$$
Using the relation $\gamma = \theta^2/[(1-\theta)^2 + \theta^2]$ and simplifying terms, Equation 8a follows.
[A3] In the idealized dense-map case, suppose that markers are available at every point along a chromosome. Suppose that there are no QTLs in the genome. For individual i, the phenotype $\Phi_i \sim N(0,1)$; that is, $\Phi_i$ is a random normal variable with mean 0 and variance 1. For individual i, let $x_i(d)$ denote the genotype at a position d cM from the left end of the chromosome ($x_i$ = 0 or 1 according to the allele inherited), let $\beta^*(d)$ denote the maximum likelihood estimate of the phenotypic effect of a putative QTL at this position, and let LOD(d) denote the corresponding LOD score. By standard formulas for linear regression, $\beta^*(d) = \sum(\Phi_i - \bar{\Phi})(x_i - \bar{x})/\sum(x_i - \bar{x})^2$, where $\bar{\Phi}$ and $\bar{x}$ are the means of $\Phi_i$ and $x_i$, respectively. For a large population of size n, the central limit theorem implies that $\beta^*(d) \sim \sum 4\Phi_i(x_i - \tfrac{1}{2})$, $\nu(d) := \sqrt{n}\,\beta^*(d) \sim N(0,1)$, $\sigma^2_{res}(d) = \sum(\Phi_i - \beta^*(d)x_i(d))^2 \sim n(1 - \beta^*(d))^2$, $\sigma^2_{exp}(d) \sim n[\beta^*(d)]^2$ and
$$\mathrm{LOD}(d) \sim \tfrac{1}{2}(\log_{10}e)\,(\sigma^2_{exp}(d)/\sigma^2_{res}(d)) \sim \tfrac{1}{2}(\log_{10}e)\,[\sqrt{n}\,\beta^*(d)]^2.$$
Thus, LOD(d) is asymptotically proportional to the square of a random normal variable $\nu(d)$ (which incidentally proves that LOD is proportional to $\chi^2$).
Let r(a,b) denote the correlation coefficient for any two random variables a and b. Let $d_1$ and $d_2$ denote points on the chromosome and write $d = d_1 - d_2$. From the asymptotic expression for $\beta^*(d)$ above, it follows that $r(d) := r(\nu(d_1), \nu(d_2)) = \sum\sqrt{n}\,\Phi_i(1-2\theta) \sim 1 - 2\theta$, where $\theta$ is the recombination fraction corresponding to the genetic distance $d = |d_1 - d_2|$. Assuming Haldane's map function, $1 - 2\theta = e^{-2d}$.
To summarize, $\nu(d)$ is a stationary normal process with covariance function $r(d) = e^{-2d}$. Up to rescaling d by a factor of ½, this is the definition of Ornstein-Uhlenbeck diffusion and Proposition 2 follows directly (see Leadbetter, Lindgren and Rootzén 1983, Theorem 12.2.9 and discussion following).
While only Haldane's map function yields precisely an Ornstein-Uhlenbeck diffusion, the proof of Proposition 2 holds in general. The relevant results in Leadbetter, Lindgren and Rootzén (1983) require only that $r(d) \sim 1 - 2d + o(d^2)$ as $d \to 0$, which holds for all map functions.
[A4] If QTLs with very large effects are segregating, regression analysis is not strictly appropriate (whether in the traditional approach or in the generalization developed in the text) because the phenotypic distribution eventually becomes bimodal. In the extreme, the assumption that the phenotypes are generated by the segregation of a
QTL will always fit the data much better than the assumption that they follow a normal distribution--even for loci unlinked to any QTLs. Consequently, expressions (3) and (6) fail to give the correct number of progeny required in the case of QTLs with very large effects: indeed, they tend to zero (whereas a minimum positive number of progeny is obviously needed no matter how large the effect of the QTL).
To achieve the correct limiting behaviour, the LOD score for a marker at 0 cM can be redefined as the log10 of
L(a,b,σ2)/ ½[L(a,b,σ2)+L(a,-b,σ2)] with L(a,b,σ2) defined in (4). This ratio measures how much more likely the data is to have been generated by a QTL with the hypothesized effect located at the marker locus than by a QTL with this same effect but unlinked to the marker. The ELOD can be found by numerical integration over the distribution for Φ. In the limit of a QTL with large effect, the expression tends to the traditional LOD score for a qualitative trait used in human genetics.
For the QTLs likely to be encountered in practice, this correction is irrelevant. We have used it in computing the number of progeny required in Figures 5 and 6, however, in order that these graphs exhibit the correct limiting behaviour-- rather than tending to zero.
[A5] For notational convenience, rescale the phenotype so that its mean in the backcross is 0 and encode the two alternative genotypes by the indicator variable g = -1 or 1 (rather than 0 or 1, as in the text). Given a true QTL, let 2b be the amount by which substituting an allele increases the phenotype and let $\sigma^2$ be the residual variance unexplained by the QTL out of the total backcross variance $\Sigma^2 = \sigma^2 + b^2$. Suppose that a marker is located exactly at the QTL. Conditional on the phenotype Φ of an individual but unconditional on its genotype x at the marker, the LOD score (comparing the true hypothesis $H_1: (0, b, \sigma^2)$ to the alternative $H_0: (0, 0, \Sigma^2)$) is
$$\mathrm{LOD}_\Phi = \sum_{g=\pm 1} \pi(g\,|\,\Phi)\,\log_{10}\left[z(\Phi - bg, \sigma^2)/z(\Phi, \Sigma^2)\right],$$
where $\pi(g = x\,|\,\Phi)$ is the probability that the individual has marker genotype x given its phenotype Φ, given by $z(\Phi - bg, \sigma^2)/[z(\Phi - bg, \sigma^2) + z(\Phi + bg, \sigma^2)]$.
As claimed in the text, if b is small, $\mathrm{LOD}_\Phi$ is proportional to $\Phi^2$. Now, the probability distribution for Φ has density $p(\Phi) = \tfrac{1}{2}[z(\Phi - bg, \sigma^2) + z(\Phi + bg, \sigma^2)]$. Conditional on the phenotype of a backcross progeny deviating from the mean by $\ge L\Sigma$, the LOD score is
$$\mathrm{LOD}_{|\Phi| \ge L\Sigma} = \int_{|\Phi| \ge L\Sigma} (\mathrm{LOD}_\Phi)\,p(\Phi)\,d\Phi.$$
Letting $v = b^2/\Sigma^2$ denote the fraction of variance explained by the QTL, straightforward although tedious integration shows that
$$S(L) = \mathrm{LOD}_{|\Phi| \ge L\Sigma}\,/\,\mathrm{LOD}_{|\Phi| \ge 0} \approx Q(L)\,[1 + 2uLz(L)/Q(L)], \qquad (10)$$
where $u = -v/\log_e(1-v) \approx (1 - \tfrac{1}{2}v)$ and where the approximation in (10) is $O(v^2)$ for small v. For QTLs with small effects, this reduces to Equation (9).
[A6] Interval mapping can be straightforwardly extended to the case of multiple intervals explaining a quantitative phenotype: for m intervals, the bracketed term in Equation 7 becomes a sum with $2^m$ terms corresponding to the possible joint genotypes at the m putative QTLs. Since simultaneous consideration of multiple QTLs reduces the unexplained variance, it may be somewhat easier to detect linkage to the set of loci than to any one individually (cf. Lander and Botstein 1986ab). The subtle issue is the appropriate threshold for simultaneous search for m QTLs. In a genome with no QTLs, how high a LOD score might occur by chance? For any particular choice of putative QTLs, the LOD score is asymptotically distributed as $\chi^2$ with m degrees of freedom. When considering an entire genome, the LOD score follows a mathematical process known as a $\chi^2$ random field (Adler 1981), about which somewhat less is known than the Ornstein-Uhlenbeck diffusion. Approximate arguments show that the level of the highest excursion of such a $\chi^2$ random field on an entire genome is about m-fold higher than the corresponding level for an Ornstein-Uhlenbeck diffusion on the genome. If m QTLs have equal effects, then simultaneous search decreases the number of progeny required to achieve statistical significance by a factor of about $(1 - m\sigma^2)/(1 - \sigma^2)$, where $\sigma^2$ is the fraction of variance explained by each. If the QTLs have unequal effects, it may become possible to detect those with smaller effects by first controlling for those with larger effects. We will discuss simultaneous search for QTLs in more detail elsewhere.

Claims

1. A method of mapping genomic regions containing polygenic factors controlling quantitative trait loci comprising the steps of:
a. crossing two strains of interest, the strains differing as to at least one trait of interest, to produce progeny;
b. carrying out one or more crosses, which are either back-crosses or intercrosses, to produce progeny;
c. scoring progeny of the one or more crosses for at least one trait of interest and for selected genetic markers, the genetic markers comprising a genetic linkage map; and
d. applying an algorithm designed to maximize the likelihood of the trait of interest to the results of scoring progeny as in step (c).
2. A method of mapping quantitative trait loci in the genome of a higher plant, comprising the steps of:
a. crossing two interfertile strains of the higher plant, the two strains differing as to at least one trait of interest, to produce progeny;
b. carrying out one or more back-crosses of one of the two interfertile strains and progeny produced in step (a), to produce progeny;
c. scoring progeny of the back-crosses for at least one trait of interest and for selected genetic markers, the genetic markers comprising a genetic linkage map; and
d. applying an algorithm designed to maximize the likelihood of the trait of interest to the result of scoring progeny as in step (c).
3. A method of genetic dissection of a quantitative trait, comprising the steps of: making an appropriate interspecies back-cross which will detect a specific minimum phenotypic effect and determining the presence or absence of the minimum phenotypic effect.
4. A method of mapping genomic regions containing polygenic factors controlling at least one quantitatively inherited trait of interest, comprising the steps of:
a. making a back-cross of two species which differ in at least one quantitatively inherited trait of interest, to produce back-cross progeny;
b. scoring back-cross progeny as to the occurrence of selected restriction fragment length polymorphisms;
c. constructing a genetic linkage map using the information resulting from step (b); and
d. constructing an interval map of quantitative trait loci.
5. A method of designing a cross for genetic dissection of a quantitative trait, comprising the steps of:
a. selecting two strains differing for a quantitative trait of interest, the two strains having a phenotypic difference D which is large compared to an environmental variance $\sigma^2_E$ and a backcross between the two strains having a relatively small number K of QTLs segregating therein;
b. specifying a minimum phenotypic effect δ that the cross of the two strains will be designed to detect, the specified minimum phenotypic effect δ ensuring that QTLs accounting for a substantial amount of the phenotypic difference D will be detected; and
c. genotyping a number N of the backcross progeny of the two strains where N is calculated from a function of spacing between genetic markers in a map (of ?), threshold T for an LOD score (of ?), and a desired probability β that a false positive occurs; the number N decreasing for increasing fraction of genetic variances V in the backcross explained by the QTLs that account for a substantial amount of the phenotypic difference D, and the number N decreasing for smaller K with larger D.
6. A method of Claim 5 wherein, within progeny of a backcross of one of the strains, the variance $\sigma^2_{B1}$ of the phenotype is defined by $\sigma^2_{B1} = \sigma^2_G + \sigma^2_E$, where $\sigma^2_G$ is genetic variance.
7. A method of Claim 6 wherein K is defined by
Figure imgf000063_0001
8. A method of Claim 5 wherein the step of specifying a minimum phenotypic effect δ includes choosing a value of δ which is from approximately ½(D/K) to approximately (D/K).
9. A method of Claim 5 wherein the genetic variance V explained by the QTL is defined
V =
Figure imgf000064_0001
where σ2 B 1 is the variance of the phenotype in progeny of a backcross of one of the selected strains.
10. A method of Claim 5, further comprising the step of: mapping QTLs when the phenotypic difference D measured in environmental standard deviations is on the order of the number K.
PCT/US1989/004688 1988-10-19 1989-10-19 Mapping quantitative traits using genetic markers WO1990004651A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US259,998 1981-05-04
US25999888A 1988-10-19 1988-10-19

Publications (1)

Publication Number Publication Date
WO1990004651A1 true WO1990004651A1 (en) 1990-05-03

Family

ID=22987392

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1989/004688 WO1990004651A1 (en) 1988-10-19 1989-10-19 Mapping quantitative traits using genetic markers

Country Status (1)

Country Link
WO (1) WO1990004651A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1995017524A2 (en) * 1993-12-23 1995-06-29 Molecular Tool, Inc. Automatic genotype determination
US5492547A (en) * 1993-09-14 1996-02-20 Dekalb Genetics Corp. Process for predicting the phenotypic trait of yield in maize
WO1999008507A1 (en) * 1997-08-15 1999-02-25 Forbio Limited Method of directed breeding for reduced reproductive development
WO1999013107A1 (en) * 1997-09-08 1999-03-18 Warner-Lambert Co. A method for determining the in vivo function of dna coding sequences
WO2002044422A2 (en) * 2000-12-01 2002-06-06 University Of North Carolina - Chapel Hill Method for ultra-high resolution mapping of genes and determination of genetic networks among genes underlying phenotypic traits
WO2010071431A1 (en) 2008-12-19 2010-06-24 Monsanto Invest N.V. Method of breeding cysdv-resistant cucumber plants
WO2011050296A1 (en) 2009-10-22 2011-04-28 Seminis Vegetable Seeds, Inc. Methods and compositions for identifying downy mildew resistant cucumber plants
EP2511381A1 (en) 2007-06-08 2012-10-17 Monsanto Technology LLC Methods for sequence-directed molecular breeding
CN110093406A (en) * 2019-05-27 2019-08-06 新疆农业大学 A kind of argali and its filial generation gene research method
US10455783B2 (en) 2006-08-15 2019-10-29 Monsanto Technology Llc Compositions and methods of plant breeding using high density marker information

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1989007647A1 (en) * 1988-02-22 1989-08-24 Pioneer Hi-Bred International, Inc. Genetic linkages between agronomically important genes and restriction fragment length polymorphisms

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BIOMETRICS, Vol. 42, 1986, J.I. Weller: "Maximum Likelihood Techniques for the Mapping and Analysis of Quantitative Trait Loci with the Aid of Genetic Markers", see page 627 - page 640, see especially page 628 line 22 - page 633 line 19. *
Genetics, Vol. 116, 1987, M.D. Edwards et al: "Molecular-Marker-Facilitated Investigations of Quantitative-Trait Loci in Maize. I. Numbers, Genomic Distribution and Types of Gene Action.", see page 113 - page 125, see especially page 114 column 2 line 14 - page 117 column 2 line 4. *
NATURE, Vol. 335, 1988, Andrew H. Paterson et al: "Resolution of quantitative traits into Mendelian factors by using a complete linkage map of restriction fragment length polymorphisms", see page 721 - page 726. *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5981832A (en) * 1991-02-19 1999-11-09 Dekalb Genetics Corp. Process predicting the value of a phenotypic trait in a plant breeding program
US6455758B1 (en) 1991-02-19 2002-09-24 Dekalb Genetics Corporation Process predicting the value of a phenotypic trait in a plant breeding program
US5492547A (en) * 1993-09-14 1996-02-20 Dekalb Genetics Corp. Process for predicting the phenotypic trait of yield in maize
EP1352973A3 (en) * 1993-12-23 2004-01-02 Beckman Coulter, Inc. Automatic genotype determination
WO1995017524A3 (en) * 1993-12-23 1995-07-13 Molecular Tool Inc Automatic genotype determination
EP1352973A2 (en) * 1993-12-23 2003-10-15 Beckman Coulter, Inc. Automatic genotype determination
WO1995017524A2 (en) * 1993-12-23 1995-06-29 Molecular Tool, Inc. Automatic genotype determination
US7585466B1 (en) 1993-12-23 2009-09-08 Beckman Coulter, Inc. Automatic genotype determination
WO1999008507A1 (en) * 1997-08-15 1999-02-25 Forbio Limited Method of directed breeding for reduced reproductive development
WO1999013107A1 (en) * 1997-09-08 1999-03-18 Warner-Lambert Co. A method for determining the in vivo function of dna coding sequences
WO2002044422A2 (en) * 2000-12-01 2002-06-06 University Of North Carolina - Chapel Hill Method for ultra-high resolution mapping of genes and determination of genetic networks among genes underlying phenotypic traits
WO2002044422A3 (en) * 2000-12-01 2003-09-25 Univ North Carolina Chapel Hill Method for ultra-high resolution mapping of genes and determination of genetic networks among genes underlying phenotypic traits
US10455783B2 (en) 2006-08-15 2019-10-29 Monsanto Technology Llc Compositions and methods of plant breeding using high density marker information
EP2511381A1 (en) 2007-06-08 2012-10-17 Monsanto Technology LLC Methods for sequence-directed molecular breeding
US10544448B2 (en) 2007-06-08 2020-01-28 Monsanto Technology Llc Methods for sequence-directed molecular breeding
US10544471B2 (en) 2007-06-08 2020-01-28 Monsanto Technology Llc Methods for sequence-directed molecular breeding
US10550424B2 (en) 2007-06-08 2020-02-04 Monsanto Technology Llc Methods for sequence-directed molecular breeding
WO2010071431A1 (en) 2008-12-19 2010-06-24 Monsanto Invest N.V. Method of breeding cysdv-resistant cucumber plants
WO2011050296A1 (en) 2009-10-22 2011-04-28 Seminis Vegetable Seeds, Inc. Methods and compositions for identifying downy mildew resistant cucumber plants
EP3680352A1 (en) 2009-10-22 2020-07-15 Seminis Vegetable Seeds, Inc. Downy mildew resistant cucumber plants
CN110093406A (en) * 2019-05-27 2019-08-06 新疆农业大学 A kind of argali and its filial generation gene research method

Similar Documents

Publication Publication Date Title
Grattapaglia et al. Genetic mapping of quantitative trait loci controlling growth and wood quality traits in Eucalyptus grandis using a maternal half-sib family and RAPD markers
Paterson et al. Mendelian factors underlying quantitative traits in tomato: comparison across species, generations, and environments.
Brown et al. Isozymes and the Genetic Resources of Forest Trees
Ranc et al. A clarified position for Solanum lycopersicum var. cerasiforme in the evolutionary history of tomatoes (Solanaceae)
Tanksley Mapping polygenes
Melchinger Use of molecular markers in breeding for oligogenic disease resistance
Famoso et al. Genetic architecture of aluminum tolerance in rice (Oryza sativa) determined through genome-wide association analysis and QTL mapping
Stuber Biochemical and molecular markers in plant breeding
Marques et al. Genetic dissection of vegetative propagation traits in Eucalyptus tereticornis and E. globulus
EP2399214B1 (en) Method for selecting statistically validated candidate genes
Barnaud et al. Linkage disequilibrium in cultivated grapevine, Vitis vinifera L.
McCauley et al. The spatial distribution of chloroplast DNA and allozyme polymorphisms within a population of Silene alba (Caryophyllaceae)
Warwick Allozyme and life history variation in five northwardly colonizing North American weed species
Liu Computational tools for study of complex traits
Bucci et al. Assessing the genetic divergence of Pinus leucodermis Ant. endangered populations: use of molecular markers for conservation purposes
Latouche‐Hallé et al. Long‐distance pollen flow and tolerance to selfing in a neotropical tree species
Chung et al. Patterns of hybridization and population genetic structure in the terrestrial orchids Liparis kumokiri and Liparis makinoana (Orchidaceae) in sympatric populations
Andersen et al. Natural genetic variation as a tool for discovery in Caenorhabditis nematodes
Paterson Of blending, beans, and bristles: the foundations of QTL mapping
Carrasco et al. The Chilean strawberry [Fragaria chiloensis (L.) Duch.]: genetic diversity and structure
Paterson QTL mapping in DNA marker-assisted plant and animal improvement
Kearsey QTL analysis: problems and (possible) solutions.
WO1990004651A1 (en) Mapping quantitative traits using genetic markers
Pelosi et al. From genomes to populations: A meta-analysis and review of fern population genetics
Roy et al. Who is Dermanyssus gallinae? Genetic structure of populations and critical synthesis of the current knowledge

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LU NL SE

COP Corrected version of pamphlet

Free format text: PAGES 1/12-12/12, DRAWINGS, REPLACED BY NEW PAGES 1/15-15/15; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE