AU781961B2 - Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds - Google Patents

Info

Publication number
AU781961B2
Authority
AU
Australia
Prior art keywords
dna
soil
nucleic acids
vector
collection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU21791/01A
Other versions
AU2179101A (en
Inventor
Maria Ball
Carmela Cappellano
Sophie Courtois
Francois Francou
Asa Frostegard
Michel Guerineau
Pascale Jeannin
Jean-Luc Pernodet
Alain Raynal
Guennadi Sezonov
Pascal Simonet
Karine Tuphile
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aventis Pharma SA
Original Assignee
Aventis Pharma SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from FR9915032A external-priority patent/FR2801609B1/en
Application filed by Aventis Pharma SA filed Critical Aventis Pharma SA
Publication of AU2179101A publication Critical patent/AU2179101A/en
Application granted granted Critical
Publication of AU781961B2 publication Critical patent/AU781961B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P: FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P19/00: Preparation of compounds containing saccharide radicals
    • C12P19/26: Preparation of nitrogen-containing carbohydrates
    • C12P19/28: N-glycosides
    • C12P19/30: Nucleotides
    • C12P19/34: Polynucleotides, e.g. nucleic acids, oligoribonucleotides
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/93: Ligases (6)
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P31/00: Antiinfectives, i.e. antibiotics, antiseptics, chemotherapeutics
    • A61P31/04: Antibacterial agents
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P33/00: Antiparasitic agents
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P33/00: Antiparasitic agents
    • A61P33/02: Antiprotozoals, e.g. for leishmaniasis, trichomoniasis, toxoplasmosis
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00: Antineoplastic agents
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P37/00: Drugs for immunological or allergic disorders
    • A61P37/02: Immunomodulators
    • A61P37/06: Immunosuppressants, e.g. drugs for graft rejection
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1003: Extracting or separating nucleic acids from biological samples, e.g. pure separation or isolation methods; Conditions, buffers or apparatuses therefor

Landscapes

  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Veterinary Medicine (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • General Chemical & Material Sciences (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Genetics & Genomics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • Tropical Medicine & Parasitology (AREA)
  • Biophysics (AREA)
  • Transplantation (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Plant Pathology (AREA)
  • Communicable Diseases (AREA)
  • Oncology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)
  • Pharmaceuticals Containing Other Organic And Inorganic Compounds (AREA)
  • Medicines Containing Material From Animals Or Micro-Organisms (AREA)

Description

Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in the synthesis of novel compounds.
The present invention relates to a process for preparing nucleic acids from an environmental sample, more particularly a process for obtaining a collection of nucleic acids from a sample. The invention also relates to the nucleic acids or to the collections of nucleic acids obtained according to the process and to their use in the synthesis of novel compounds, in particular novel compounds of therapeutic interest.
The invention also relates to the novel means used in the above process for obtaining nucleic acids, such as novel vectors and novel processes for preparing such vectors or alternatively recombinant host cells comprising a nucleic acid of the invention.
The invention also relates to processes for detecting a nucleic acid of interest in a collection of nucleic acids obtained according to the above process, as well as to the nucleic acids detected by such a process and to the polypeptides encoded by such nucleic acids.
The invention also relates to nucleic acids obtained and detected according to the above processes, in particular nucleic acids encoding an enzyme which participates in the pathway for the biosynthesis of antibiotics such as β-lactams, aminoglycosides, heterocyclic nucleotides or polyketides, as well as the enzyme encoded by these nucleic acids, the polyketides produced by means of the expression of these nucleic acids and, finally, pharmaceutical compositions comprising a pharmacologically active amount of a polyketide produced by means of the expression of such nucleic acids.
Since the discovery of the production of streptomycin by actinomycetes, the search for novel compounds of therapeutic interest, and most particularly of novel antibiotics, has made increasing use of methods for screening the metabolites produced by soil microorganisms.
Such methods consist mainly in isolating the organisms of the telluric microflora, in culturing them on specially adapted nutrient media and then in detecting a pharmacological activity in the products found in the culture supernatants or in the cell lysates which have, where appropriate, undergone one or more prior separation and/or purification steps.
Thus, the methods for the in vitro isolation and culturing of the organisms constituting the telluric microflora have, to date, enabled the characterization of about 40,000 molecules, about half of which show biological activity.
Major products have been characterized according to such in vitro culture methods, such as antibiotics (penicillin, erythromycin, actinomycin, tetracycline, cephalosporin), anticancer agents, cholesterol-lowering agents or pesticides.
The products of therapeutic interest of microbial origin which are known to date originate in the majority (about 70%) from the actinomycetes and more particularly from the Streptomyces genus. However, other therapeutic compounds, such as teicoplanins, gentamycin and spinosins, have been isolated from microorganisms of genera that are more difficult to culture, such as Micromonospora, Actinomadura, Actinoplanes, Nocardia, Streptosporangium, Kitasatosporia or Saccharomonospora.
In practice, however, the characterization of novel natural products synthesized by the microorganisms of soil microflora remains limited, partly because the in vitro culturing step usually results in a selection of organisms that are already known.
The methods for in vitro separation and culturing of telluric organisms in order to identify novel compounds of interest thus have many limitations.
For example, in actinomycetes, the rate of rediscovery of antibiotics that are already known is about 99%. Specifically, fluorescence microscopy techniques have made it possible to count more than 10^10 bacterial cells in 1 g of soil, whereas only 0.1 to 1% of these bacteria can be isolated after inoculation on culture media.
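The gap between counted and culturable cells can be made concrete with a short calculation. The sketch below is illustrative only: it reads the cell count as 10^10 per gram (an assumption about the exponent), takes the 0.1 to 1% range quoted above, and uses hypothetical names throughout.

```python
from fractions import Fraction

# Figures quoted in the text; the exponent of the cell count is an assumption.
CELLS_PER_GRAM = 10**10
CULTURABLE_FRACTIONS = (Fraction(1, 1000), Fraction(1, 100))  # 0.1% and 1%


def culturable_cells(total_cells, fraction):
    """Number of cells expected to be recoverable on culture media."""
    return total_cells * fraction


# Lower and upper bounds on culturable cells per gram of soil.
low, high = (culturable_cells(CELLS_PER_GRAM, f) for f in CULTURABLE_FRACTIONS)

# Cells per gram that remain inaccessible even in the most favourable case.
uncultured_at_best = CELLS_PER_GRAM - high

print(f"culturable: {int(low):.0e} to {int(high):.0e} cells per gram")
print(f"not recovered (best case): {int(uncultured_at_best):.2e} cells per gram")
```

Under these assumptions, at least 99% of the cells counted by microscopy never reach the screening stage when a culturing step is required.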
With the aid of DNA reassociation kinetics techniques, it has been possible to show that between 12,000 and 18,000 bacterial species can be contained in 1 g of soil, whereas, to date, only 5000 non-eukaryotic microorganisms have been described, all habitats considered.
Molecular ecology studies have made it possible to amplify and clone many novel sequences of 16S rDNA from environmental DNA.
The results of these studies have led to a trebling of the number of bacterial divisions previously characterized.
At the present time, bacteria are subdivided into 40 divisions, some of which consist only of bacteria which cannot be cultured. These latest results bear witness to the breadth of microbial biodiversity which remains unexploited to date.
Recent studies have attempted to overcome the many obstacles to gaining access to the biodiversity of the soil microflora, in particular the step of in vitro culturing prior to the isolation and characterization of compounds of industrial interest, especially of therapeutic interest.
Methods have thus been developed which include a step of extracting the DNA from telluric organisms, where appropriate after a prior isolation of the organisms contained in the soil samples.
The DNA thus extracted, after lysis of the bacterial cells without prior in vitro culturing, is cloned into vectors used to transfect host organisms, in order to constitute libraries of DNA originating from soil bacteria.
These libraries of recombinant clones are used to detect the presence of genes encoding compounds of therapeutic interest or alternatively to detect the production of compounds of therapeutic interest by these recombinant clones.
However, the methods for gaining direct access to the DNA of soil microflora, described in the prior art, present drawbacks during the implementation of each of the steps described above, these drawbacks being of a nature to considerably affect the quantity and quality of the genetic material obtained and exploitable.
The prior art regarding each of the steps for constructing libraries of DNA originating from soil samples is detailed below, along with the technical drawbacks identified by the Applicant and which have been overcome according to the present invention.
1. Step of extracting DNA from a soil sample.
1.1. Direct extraction of environmental DNA.
This is essentially a process using DNA extraction techniques performed directly on the environmental sample, usually after a prior in situ lysis of the organisms in the sample.
Such techniques have been used on samples originating from aquatic media, both from freshwater and marine water. They comprise a first step of preconcentrating the cells present in free form or in the form of particles, which generally consists of a filtration of large volumes of water on different filtration devices, for example conventional membrane filtration, tangential or rotational filtration or alternatively ultrafiltration.
The pore size is between 0.22 and 0.45 µm, and a prefiltration is often required in order to avoid blockages due to the treatment of large volumes.
In a second stage, the cells harvested are lysed directly on the filters in small volumes of solutions, by enzymatic and/or chemical treatment.
This technique is illustrated, for example, by the studies by Stein et al., 1996, Journal of Bacteriology, Vol. 178, 591-599, which describe the cloning of genes encoding ribosomal DNA and encoding a transcription elongation factor (EF 2) from Archaebacteria of marine plankton.
Techniques of direct extraction of DNA from samples of soil or sediment have also been described, which are based on protocols of physical, chemical or enzymatic lysis performed in situ.
For example, US patent No. 5 824 485 (Chromaxome Corporation) describes a chemical lysis of bacteria directly on the sample taken by addition of a hot lysis buffer based on guanidinium isothiocyanate.
International patent application No. WO 99/20799 (Wisconsin Alumni Research Foundation) describes a step of in situ lysis of bacteria using an extraction buffer containing a protease and SDS.
Other techniques have also been used, such as carrying out several cycles of freezing-thawing on the sample followed by high-pressure pressing of the thawed sample. Techniques of bacterial lysis using a succession of steps of sonication, heating with microwaves and heat shocks have also been used (Picard et al. 1992).
However, the techniques of the prior art described above for the direct extraction of DNA have very variable efficacy in quantitative and qualitative terms.
Thus, in situ chemical or enzymatic treatments of the sample have the drawback of lysing only certain categories of microorganisms, because the various indigenous microorganisms differ in their resistance to the lysis step owing to their heterogeneous morphology.
Thus, Gram-positive bacteria withstand a treatment with hot SDS detergent whereas virtually all Gram-negative cells are lysed.
In addition, some of the direct extraction protocols described above promote the adsorption of the nucleic acids extracted onto the mineral particles of the sample, thus significantly reducing the amount of available DNA.
Moreover, although some of the protocols of the prior art disclose a mechanical treatment step to lyse the microorganisms in the sample taken, such a mechanical lysis step is systematically carried out in liquid medium in an extraction buffer, which does not allow good homogenization of the starting sample in the form of fine particles enabling maximum accessibility to the diversity of organisms present in the sample. Grinding tests have also been carried out on crude soil samples using glass beads, but the amount of DNA extracted was low.
It has been observed according to the invention that a first step of in situ mechanical lysis in liquid medium has negative effects on the amount of DNA which can be extracted.
The amount of DNA which can be used directly for cloning in recombinant vectors is also dependent on the purification steps subsequent to its extraction.
In the prior art, the DNA extracted is then purified, for example by using polyvinylpolypyrrolidone, by a precipitation in the presence of ammonium acetate or potassium acetate, by centrifugations on a caesium chloride gradient, or by chromatographic techniques, in particular on a hydroxyapatite support, on an ion-exchange column or molecular sieving, or by electrophoresis techniques on agarose gel.
The DNA purification techniques previously described, especially when combined with the abovementioned techniques for extracting environmental DNA, are liable to lead to a co-purification of the DNA with inhibitory compounds, originating from the initial sample, that are difficult to remove.
The co-extraction of inhibitory compounds with the DNA necessitates the multiplication of the number of purification steps, which leads to considerable losses of the DNA initially extracted and simultaneously reduces the diversity of the genetic material initially contained in the sample, as well as its quantity.
Another aim of the invention was to overcome the drawbacks of the prior purification protocols and to develop a DNA purification step which makes it possible to maintain an optimum level of diversity of the DNA in the initial sample, on the one hand, and to promote quantitatively its production, on the other hand.
Most particularly, the qualitative and quantitative improvements to the purification of DNA are at a maximum when they make use of a combination of a direct DNA extraction process according to the invention and a subsequent purification process, as will be described hereinbelow.
1.2. Indirect extraction of environmental DNA.
Such techniques involve a first step of separation of the various organisms in the telluric microflora from the other constituents of the starting sample, prior to the actual DNA extraction step.
In the state of the art, the prior separation of a microbial fraction from a soil sample usually comprises a physical dispersion of the sample by grinding it in liquid medium, for example using devices such as a Waring Blender or a mortar.
Chemical dispersions have also been described, for example dispersions on ion-exchange resins or dispersions using non-specific detergents such as sodium deoxycholate or polyethylene glycol. Whatever the mode of dispersion, the solid sample should be suspended in water, phosphate buffer or a saline solution.
The physical or chemical dispersion step can be followed by a centrifugation on a density gradient allowing the separation of the cells contained in the sample and of the particles of this sample, it being understood that bacteria have lower densities than those of most soil particles.
The physical dispersion step can also alternatively be followed by a step of low-speed centrifugation or a step of cell elutriation.
The DNA can then be extracted from the separated cells by any available method of lysis and can be purified by many methods, including the purification methods described in paragraph 1.1 above. In particular, the inclusion of the cells in low-melting agarose can be carried out in order to control the lysis.
However, the methods described in the prior art that are known to the Applicant are unsatisfactory on account of the presence, in the fractions containing the extracted DNA, of unwanted constituents of the starting sample which have a significant influence on the final quality and quantity of DNA.
The present invention proposes to solve the technical difficulties encountered in the processes of the prior art, as will be described hereinbelow.
2. Molecular characterization of the extracted DNA.
When it is desired to construct a DNA library from an environmental sample, in particular from a soil sample, it is advantageous to check the quality and diversity of the source of DNA extracted and purified before it is inserted into suitable vectors.
The object of such a molecular characterization of the DNA extracted and purified is to obtain profiles representing the proportions of the various bacterial taxons present in this DNA extract. The molecular characterization of the DNA extracted and purified makes it possible to determine whether or not artefacts have been introduced during the implementation of the various extraction and purification steps and, where appropriate, whether or not the original diversity of the DNA extracted and purified is representative of the microbial diversity initially present in the sample, in particular in the soil sample.
To the Applicant's knowledge, the prior art makes use of quantitative hybridization processes using oligonucleotide probes that are specific for different bacterial groups, applied directly to the DNA extracted from the environment.
Unfortunately, such an approach is relatively insensitive and does not make it possible to detect taxonomic groups or genera that are present in low abundance.
The prior art also describes quantitative PCR processes, such as MPN-PCR or competitive quantitative PCR. However, these techniques have major drawbacks.
Thus, MPN-PCR is complicated to carry out on account of the multiplication of the dilutions and repetitions, making it unsuitable for a large number of samples or of primer pairs.
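To illustrate why MPN-PCR multiplies dilutions and repetitions, the following sketch shows how a most-probable-number estimate is usually derived from presence/absence results. It is a generic maximum-likelihood MPN calculation under a Poisson assumption, not a procedure taken from the cited prior art; the function name, its parameters and the example figures are hypothetical.

```python
import math


def mpn_estimate(volumes, replicates, positives, iterations=200):
    """Maximum-likelihood MPN (targets per unit volume) from dilution results.

    volumes[i]    : amount of original sample in each reaction of dilution i
    replicates[i] : number of PCR reactions run at dilution i
    positives[i]  : number of those reactions that gave a product
    """
    if sum(positives) == 0:
        return 0.0
    if all(p == n for p, n in zip(positives, replicates)):
        raise ValueError("every reaction is positive; further dilutions are needed")

    def score(density):
        # Derivative of the log-likelihood; it decreases as density grows.
        total = 0.0
        for v, n, p in zip(volumes, replicates, positives):
            x = density * v
            if p and x < 700.0:  # beyond this the term is negligible
                total += p * v / math.expm1(x)
            total -= (n - p) * v
        return total

    # Bisection on a logarithmic scale between two wide, arbitrary bounds.
    low, high = 1e-9, 1e9
    for _ in range(iterations):
        mid = math.sqrt(low * high)
        if score(mid) > 0:
            low = mid
        else:
            high = mid
    return math.sqrt(low * high)


# Hypothetical example: three tenfold dilutions, three reactions each.
print(mpn_estimate([1.0, 0.1, 0.01], [3, 3, 3], [3, 2, 0]))
```

Each sample and each primer pair requires its own full grid of dilutions and replicate reactions before a single estimate can be computed, which is the practical burden described above.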
Moreover, competitive quantitative PCR is difficult to carry out on account of the need to construct a competitor which is specific to the target DNA and which, in addition, does not induce any bias or artefacts into the competition itself.
According to the invention, a process is thus proposed for prescreening a library of DNA originating from an environmental sample, which is quick, simple and reliable and which makes it possible to test the quality of the DNA extracted and purified beforehand and thus to determine the value of constructing a library of clones prepared from this purified starting DNA.
3. Vectors for cloning DNA extracted and purified from an environmental sample.
Many vectors have already been described in the prior art for cloning DNA preextracted from an environmental sample.
Thus, according to the description of international patent application No. WO 99/20799, viral vectors, phages, plasmids, phagemids, cosmids, fosmids, vectors of the BAC (bacterial artificial chromosome) type or bacteriophage P1, vectors of PAC type (artificial chromosome based on bacteriophage P1), vectors of the YAC (yeast artificial chromosome) type, yeast plasmids or any other vector capable of maintaining and expressing a genomic DNA in a stable manner can be used.
Example 1 of PCT patent application No. WO 99/20799 describes the construction of a genomic DNA library by cloning into a vector of the BAC type.
To the Applicant's knowledge, no DNA library originating from an environmental sample has yet been effectively produced with vectors of conjugative type, such a technique being made available to and reproducible by those skilled in the art for the first time by virtue of the teaching of the present invention.
4. Host cells.
In the prior art, many host cells have been described as being able to be used in order to accommodate vectors containing inserts of DNA originating from the DNA extracted and purified from an environmental sample.
Thus, PCT patent application No. WO 99/20799 cites many suitable host cells, such as Escherichia coli, in particular the strain DH or the strain 294 (ATCC 31446), the strain E. coli B, E. coli X 1776 (ATCC No. 31.537), E. coli DH5α and E. coli W3110 (ATCC No. 27.325).
This PCT patent application also cites other suitable host cells such as Enterobacter, Erwinia, Klebsiella, Proteus, Salmonella, Serratia, Shigella or strains of the Bacillus type such as B. subtilis and B. licheniformis as well as bacteria of the genus Pseudomonas, Streptomyces or Actinomyces.
US patent No. 5 824 485 in particular cites the Streptomyces lividans TK66 strain or yeast cells such as those of Schizosaccharomyces pombe.
5. Characterization of genes of interest in DNA libraries originating from an environmental sample.
PCT patent application No. WO 99/20799 describes an identification of the phenotype of different clones belonging to the DNA library of B. cereus, respectively a clone producing haemolysin, a clone hydrolysing esculin or a clone producing an orange pigment.
Mutagenesis techniques based on the use of a transposon encoding the phoA enzyme made it possible subsequently to isolate mutated clones and to characterize the sequences responsible for the phenotypes observed.
The abovementioned article by Stein et al. (1996) describes the use of specific primers for ribosomal DNA in order to amplify the DNA inserted into the vectors harboured by certain clones of a genomic DNA library of marine plankton Archaebacteria and the identification of several coding sequences in the DNA thus amplified.
The article by Borschert S. et al. (1992) describes the screening of a genomic DNA library of Bacillus subtilis using pairs of primers which hybridize with conserved regions of known peptide synthetases in order to identify one or more corresponding genes in the genome of Bacillus subtilis.
This technique made it possible to detect a chromosomal DNA fragment of about 26 kb carrying a portion of the surfactin biosynthesis operon.
The article by Kah-Tong S. et al. (1997) describes the screening of a library of DNA originating from the soil with the aid of primers which hybridize with conserved sequences of the operon responsible for the biosynthetic pathway of type II polyketides and shows the identification, in this DNA library, of sequences belonging to the PKS-P gene. This article also describes the construction of hybrid expression cassettes in which the sequence of the PKS-P subunit, found naturally in the operon responsible for polyketide biosynthesis, has been replaced with various similar sequences found in the DNA library.
Similarly, the article by Hong-Fu et al. (1995) describes the construction of expression cassettes containing the various open reading frames of the operon responsible for polyketide biosynthesis, the various expression cassettes having been constructed artificially by combining the open reading frames which are not found together naturally in the genome of Streptomyces coelicolor. This article shows that the combination, in the artificial expression cassettes, of open reading frames originating from different bacterial strains allows the production of polyketides that have different structural characteristics and relatively large antibiotic activities with respect to Bacillus subtilis and Bacillus cereus.
Polyketides form part of a large family of natural products of variable structure having great diversity of biological activity. Among the polyketides are, for example, tetracyclines and erythromycin (antibiotics), FK506 (immunosuppressant), doxorubicin (anticancer agent), monensin (a coccidiostatic agent) and avermectin (an antiparasitic agent).
These molecules are synthesized by means of multifunctional enzymes known as polyketide synthases, which catalyse repeated cycles of condensation between acyl thioesters (in general acetyl, propionyl, malonyl or methylmalonyl thioesters). Each condensation cycle results in the formation, on a growing carbon chain, of a β-keto group which can then undergo, where appropriate, one or more series of reductive steps.
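The repeated condensation cycles can be illustrated by simple carbon bookkeeping. The sketch below is a toy model, not part of the original description: it assumes, as in standard polyketide biochemistry, that each malonyl or methylmalonyl extender unit loses one carbon as CO2 on condensation, and it ignores the reductive steps entirely. All names are hypothetical.

```python
# Carbons contributed to the chain by each building block (assumed values:
# extender units are counted after loss of CO2 during condensation).
STARTER_CARBONS = {"acetyl": 2, "propionyl": 3}
EXTENDER_CARBONS = {"malonyl": 2, "methylmalonyl": 3}


def chain_carbons(starter, extenders):
    """Total carbons in the polyketide chain built from the given units."""
    return STARTER_CARBONS[starter] + sum(EXTENDER_CARBONS[e] for e in extenders)


def condensation_cycles(extenders):
    """One condensation cycle is needed per extender unit."""
    return len(extenders)


# A propionyl starter followed by six methylmalonyl extenders.
macrolide_units = ["methylmalonyl"] * 6
print(condensation_cycles(macrolide_units), chain_carbons("propionyl", macrolide_units))

# An acetyl starter followed by seven malonyl extenders.
aromatic_units = ["malonyl"] * 7
print(condensation_cycles(aromatic_units), chain_carbons("acetyl", aromatic_units))
```

Changing the starter, the extender units or the number of cycles changes the carbon skeleton, which is one reason why recombining polyketide synthase genes can yield structurally different products.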
Given the major clinical interest of polyketides, their common mechanism of biosynthesis and the high degree of conservation observed between the groups of genes encoding polyketide synthases, there has been growing interest in developing novel polyketides by genetic engineering.
Novel artificial polyketides have thus been produced by genetic engineering, such as mederrhodin A or dihydrogranatirhodin. The vast majority of the novel polyketide molecules obtained by genetic engineering are very different, in structural terms, from the corresponding natural polyketides.
From the prior art, it thus emerges that there is a need to obtain novel polyketides of interest and most particularly polyketides of therapeutic interest which have in particular, relative to their natural homologues, an increased level of antibiotic activity or a different spectrum of antibiotic activity, which is either broader than that of the known polyketides or, on the other hand, more selective.
As will be described below, this need is partly fulfilled according to the present invention.
DESCRIPTION OF THE INVENTION
According to an aspect of the present invention there is provided a process for preparing a collection of recombinant vectors, wherein a collection of nucleic acids is prepared from a soil sample containing organisms, by a process comprising the following sequence of steps: obtaining microparticles by grinding a pre-dried or pre-desiccated soil sample, followed by suspension of the microparticles in a liquid buffer medium; extracting the nucleic acids present in the microparticles; and passage of the solution containing the nucleic acids over a molecular sieve, followed by recovery of the elution fractions enriched in nucleic acids and passage of the elution fractions enriched in nucleic acids over an anion-exchange chromatography support, followed by recovery of the elution fractions containing the purified nucleic acids; and the obtained nucleic acids are inserted into a cloning and/or expression vector.
According to another aspect of the present invention there is provided a collection of recombinant vectors obtained by the process of the invention.
According to another aspect of the present invention there is provided a process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: placing the collection of recombinant host cells in contact with a pair of primers which hybridize with the given nucleotide sequence or which hybridize with the nucleotide sequence that is structurally similar to a given nucleotide sequence; carrying out at least three amplification cycles; and detecting any nucleic acid amplified.
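The logic of a primer-pair screen can be sketched in silico. The example below is a simplified illustration and not the claimed process: it requires exact matches on a single strand orientation, whereas real primers also hybridize with similar sequences, and every name and sequence in it is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicon(template, forward, reverse):
    """Return the region bounded by the two primers, or None if either is absent.

    The forward primer must match the given strand; the reverse primer must
    match the opposite strand downstream of the forward primer.
    """
    template = template.upper()
    forward = forward.upper()
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse.upper())
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def screen_collection(inserts, forward, reverse):
    """Map each clone name to its predicted amplicon, keeping only the hits."""
    hits = {}
    for name, sequence in inserts.items():
        amplicon = find_amplicon(sequence, forward, reverse)
        if amplicon is not None:
            hits[name] = amplicon
    return hits


# Hypothetical inserts carried by two clones of a collection.
collection = {
    "clone_1": "TTTACGTACGGGGGCCCAAATTT",
    "clone_2": "GGGGGGGGGGGGGGGG",
}
print(screen_collection(collection, forward="ACGTAC", reverse="TTTGGG"))
```

Only clones whose insert contains both primer sites in the correct relative orientation are reported, mirroring the detection of an amplified nucleic acid after the amplification cycles.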
According to yet another aspect of the present invention there is provided a process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: placing the collection of recombinant host cells in contact with a probe which hybridizes with the given nucleotide sequence or which hybridizes with a nucleotide sequence that is structurally similar to the given nucleotide sequence; and detecting the hybrid possibly formed between the probe and the nucleic acids included in the vectors of the collection.
According to a further aspect of the present invention there is provided a process for identifying the production of a compound of interest by one or more recombinant host cells in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: culturing the recombinant host cells of the collection in a suitable culture medium; and detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured.
According to a further aspect of the present invention there is provided a process for selecting a recombinant host cell which produces a compound of interest in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: culturing recombinant host cells of the collection in a suitable culture medium; detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured; and selecting recombinant host cells which produce the compound of interest.
According to yet a further aspect of the present invention there is provided a process for producing a compound of interest, characterized in that it comprises the following steps: culturing a recombinant host cell selected according to the process of the invention; and recovering and, where appropriate, purifying the compound produced by the said recombinant host cell.
The invention relates firstly to a process for constructing libraries of DNA originating from an environmental sample, such a sample possibly being, without discrimination, an aquatic medium (fresh water or marine water), a sample of soil (surface layer of soil, subsoil or sediments), or a sample of eukaryotic organisms having an associated microflora, such as, for example, a sample originating from plants, insects or marine organisms.
The development of a process for constructing a library of DNA from an environmental sample, and most particularly from a soil sample, comprises critical steps whose implementation must necessarily be optimized in order to obtain a library of DNA whose content of nucleic acids of interest satisfies the objectives initially set.
A first critical step consists in extracting and subsequently purifying the nucleic acids initially contained in the sample, i.e. mainly the nucleic acids contained in the various organisms of which the microflora of this sample is composed.
The quality of purification of the extracted DNA is a factor which determines the result obtained.
A second important step of a process for constructing a library of nucleic acids originating from an environmental sample is the evaluation of the genetic diversity of the nucleic acids extracted and purified. The development of a step for the simple and reliable pre-screening of the DNA extracted and purified in order to check that it takes account, at least partially, of the phylogenetic diversity of the organisms initially present in the starting sample effectively makes it possible to determine the value or otherwise of using the initial source of extracted and purified DNA for the construction of the nucleic acid library itself or, on the contrary, to not continue the construction of the nucleic acid library on account of excessive artefacts introduced at the time of the extraction and purification of the nucleic acids. It has also been identified, according to the invention, that the quality of the inserts introduced into the vectors to construct the library is a determining factor. It has thus been determined that the use of restriction enzymes to cleave the DNA extracted and purified from the environmental sample was of a nature to introduce artefacts or "bias" into the structure of the inserts obtained. Specifically, the DNA extracted from the soil or from other environments, originating in the vast majority of cases from unculturable organisms, is composed of molecules whose content of G and C bases is by definition unknown and furthermore variable as a function of the origin of these organisms.
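The bias attributed above to restriction enzymes can be illustrated with a short calculation. The following is only a sketch under a simplifying assumption that is not part of the specification: bases are treated as independent, with G and C equally frequent and A and T equally frequent. The function name and the example recognition sites (GAATTC, the EcoRI site, and GGCC) are chosen purely for illustration. Under this model the expected spacing between cleavage sites, and hence the insert size, shifts strongly with the G+C content of the source DNA, which is unknown for DNA from unculturable organisms.

```python
def expected_fragment_length(site: str, gc_content: float) -> float:
    """Expected mean spacing (in bp) between occurrences of a recognition site.

    Assumes independent bases: P(G) = P(C) = gc_content / 2 and
    P(A) = P(T) = (1 - gc_content) / 2. This is an illustrative model only.
    """
    if not 0.0 < gc_content < 1.0:
        raise ValueError("gc_content must be strictly between 0 and 1")
    p_gc = gc_content / 2.0          # probability of G (and of C)
    p_at = (1.0 - gc_content) / 2.0  # probability of A (and of T)
    p_site = 1.0
    for base in site.upper():
        if base in "GC":
            p_site *= p_gc
        elif base in "AT":
            p_site *= p_at
        else:
            raise ValueError(f"unsupported base: {base}")
    return 1.0 / p_site


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        at_rich = expected_fragment_length("GAATTC", gc)
        gc_rich = expected_fragment_length("GGCC", gc)
        print(f"G+C = {gc:.0%}: GAATTC every {at_rich:,.0f} bp, GGCC every {gc_rich:,.0f} bp")
```

Under these assumptions the same enzyme yields very different average fragment sizes on G+C-poor and G+C-rich genomes, so a digest of mixed environmental DNA would be expected to represent the various genomes unevenly.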
A third critical step is the insertion of the extracted and purified nucleic acids into vectors capable, on the one hand, of integrating nucleic acids of chosen length and, on the other hand, of allowing their transfection or integration into the genome of given host cells, as well as, where appropriate, their expression in such host cells.
Vectors capable of integrating large nucleic acids, i.e. larger than 100 kb in size, constitute vectors of interest when the objective pursued consists in cloning and identifying a complete operon capable of directing a complete biosynthetic pathway of a compound of industrial interest, in particular of a compound of pharmaceutical or agronomic interest.
DEFINITIONS
Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.
The reference to any prior art in this specification is not, and should not be taken as, an acknowledgment or any form of suggestion that that prior art forms part of the common general knowledge in Australia.
For the purposes of the present invention, the terms "nucleic acids", "polynucleotides" and "oligonucleotides" mean not only DNA and RNA sequences but also hybrid RNA/DNA sequences of more than 2 nucleotides, in either single-stranded or double-stranded form.
The term "library" or "collection" is used in the present description with reference either to a set of extracted, and where appropriate purified, nucleic acids originating from an environmental sample, to a set of recombinant vectors, each of the recombinant vectors of the set comprising a nucleic acid originating from the set of abovementioned extracted, and where appropriate purified, nucleic acids, or to a set of recombinant host cells comprising one or more nucleic acids originating from the set of abovementioned extracted, and where appropriate purified, nucleic acids, the said nucleic acids being either carried by one or more recombinant vectors or integrated into the genome of the said recombinant host cells.
The expression "environmental sample" denotes, without discrimination, a sample of aquatic origin, for example from fresh or salt water, or a telluric sample originating from the surface layer of a soil, from sediments or from lower layers of the soil (subsoil), as well as samples of eukaryotic organisms, which may be multicellular, of plant origin, originating from marine organisms or from insects and having an associated microflora, this associated microflora constituting organisms of interest.
According to the invention, the term "operon" means a set of open reading frames whose transcription and/or translation is co-regulated by a unique set of signals for regulating the transcription and/or translation.
According to the invention, an operon can also comprise the said signals for regulating the transcription and/or translation.
For the purposes of the invention, the expression "metabolic pathway" or "biosynthetic pathway" means a set of anabolic or catabolic biochemical reactions which results in the conversion of a first chemical species into a second chemical species.
For example, a biosynthetic pathway for an antibiotic consists of the set of biochemical reactions converting primary metabolites into intermediate products of the antibiotics, and then subsequently into antibiotics.
The expression "regulation sequence which is operably linked relative to a nucleotide sequence whose expression is desired" means that the transcription regulation sequence(s) is (are) located, relative to the nucleotide sequence of interest whose expression is desired, so as to allow the expression of the said sequence of interest, the regulation of the said expression being dependent on factors which interact with the regulatory nucleotide sequences.
According to another terminology, it may also be said that the nucleotide sequence of interest whose expression is desired is placed "under the control" of the transcription-regulating nucleotide sequences.
For the purposes of the present invention, the term "isolated" denotes a biological material which has been abstracted from its original environment (the environment in which it is naturally located).
For example, a polynucleotide or a polypeptide present in the natural state in an organism (virus, bacterium, fungus, yeast, plant or animal) is not isolated. The same polypeptide separated from its natural environment or the same polynucleotide separated from the adjacent nucleic acids within which it is naturally inserted in the genome of the organism, is isolated.
Such a polynucleotide can be included into a vector and/or such a polynucleotide can be included into a composition and nevertheless remain in isolated form, due to the fact that the vector or composition does not constitute its natural environment.
The term "purified" does not require the material to be present in a form of absolute purity, exclusive of the presence of other compounds.
Rather, this is a relative definition.
A polypeptide or polynucleotide is in purified form after purification of the starting material by at least one order of magnitude, preferably two or three and preferentially four or five orders of magnitude.
For the purposes of the present invention, the "percentage of identity" between two sequences of nucleotides or of amino acids can be determined by comparing two optimally aligned sequences across a comparison window.
The portion of the nucleotide or polypeptide sequence in the comparison window can thus comprise additions or deletions (for example "gaps") relative to the reference sequence (which does not comprise these additions or deletions) so as to obtain an optimum alignment of the two sequences.
The percentage is calculated by determining the number of positions at which an identical nucleic base or an identical amino acid residue is observed for the two compared sequences (nucleic acid or peptide), followed by dividing the number of positions at which there is identity between the two bases or amino acid residues by the total number of positions in the comparison window, followed by multiplying the result by 100 in order to obtain the percentage of sequence identity.
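By way of a minimal sketch, the calculation described above can be written as follows. It assumes the two sequences have already been optimally aligned and padded with a gap character to the same length; the function name and the use of "-" as the gap symbol are illustrative choices, not part of the specification. Gap positions are counted in the comparison window but never as identities.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identity between two pre-aligned sequences.

    identity = 100 * (positions with identical residues)
                   / (total positions in the comparison window)
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    a, b = aligned_a.upper(), aligned_b.upper()
    # A position counts as identical only if both residues match and neither is a gap.
    identical = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return 100.0 * identical / len(a)


if __name__ == "__main__":
    # 9 aligned positions, 7 identical (one gap, one mismatch).
    print(round(percent_identity("ACGT-ACGT", "ACGTTACCT"), 2))
```

The alignment itself (placement of gaps) is not performed here; it would be supplied by an alignment program such as those cited in this description.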
The optimum alignment of the sequences for the comparison can be achieved by computer with the aid of known algorithms contained in the Wisconsin Genetics Software Package from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wisconsin.
By way of illustration, the percentage of sequence identity may be determined using the BLAST software (BLAST version 1.4.9 of March 1996, BLAST 2.0.4 of February 1998 and BLAST 2.0.6 of September 1998), exclusively using the default parameters (Altschul et al., J. Mol. Biol. 1990 215: 403-410; S. F. Altschul et al., Nucleic Acids Res. 1997 3389-3402). BLAST searches for sequences similar/homologous to a reference "query" sequence, with the aid of the algorithm of Altschul et al. The query sequence and the databases used can be of peptide or nucleic nature, any combination being possible.
EXTRACTION AND PURIFICATION OF NUCLEIC ACIDS ORIGINATING FROM AN ENVIRONMENTAL SAMPLE.
1. Direct extraction of nucleic acids
It has been shown according to the present invention that, in order to obtain a library of nucleic acids originating from organisms contained in a sample of soil, it was important to create conditions under which, on the one hand, the various organisms in the sample are made accessible to the subsequent steps for extracting the nucleic acids, and, on the other hand, that the initial step of treatment of the sample of soil allows a maximum mechanical lysis of the organisms in the sample, which is of a nature to make the nucleic acids of these organisms directly accessible, mainly the genomic and plasmid DNA, to the buffers used for the subsequent extraction steps.
It has thus been demonstrated according to the invention that maximum accessibility of nucleic acids originating from microorganisms from a sample of soil was achieved by a thorough dry-grinding of the predried soil sample in order to obtain microparticles. The Applicant has thus determined that the drying of the soil sample prior to any subsequent treatment brings about a significant reduction in the cohesion of the crude soil sample and consequently promotes its subsequent disintegration in the form of microparticles, when a suitable grinding treatment is carried out.
Surprisingly, the Applicant has shown that microparticles of dry soil samples combined physicochemical properties that are favourable to the extraction of an optimum quantity of nucleic acids which, in their nature, could be representative of the genetic diversity of the organisms initially present in the starting soil sample. It has been shown in particular that the process of direct extraction of nucleic acids according to the invention allows the extraction of DNA originating from rare microorganisms, such as certain rare Streptomyces or sporulated microorganisms.
For the purposes of the present invention, the term "microparticles" of the soil sample means particles derived from the sample which have an average size of about 50 µm, i.e. on average between and 55 µm.
According to the invention, the microparticles are obtained from soil samples that are pre-dried or pre-desiccated and then ground until microparticles with an average size of between 2/ µm and 50 µm are obtained, before resuspension of the microparticles obtained in a liquid buffer medium.
Such a liquid buffer medium can consist of a nucleic acid extraction buffer, in particular a conventional DNA extraction buffer which is well known to those skilled in the art.
The grinding of the soil sample into microparticles has the twin function of mechanically lysing most of the organisms present in the initial soil sample and of making the organisms that are not lysed by this mechanical treatment accessible to optional subsequent steps of chemical and/or enzymatic lysis.
Thus, a first subject of the invention consists of a process for preparing a collection of nucleic acids from a soil sample containing organisms, the said process comprising a first step of obtaining microparticles by grinding the pre-dried or pre-desiccated soil sample, followed by suspending the microparticles in a liquid buffer medium.
In an entirely preferred manner, the grinding step is carried out using a device with agate or tungsten beads or alternatively using a device with tungsten rings. These devices are preferred since the hardness of materials such as agate or tungsten significantly facilitates the production of microparticles of the size specified above. For this reason, use of a grinding device with glass beads, which is found to be much less efficient, will preferably not be chosen, or will be avoided.
The drying or desiccation of the soil sample can be carried out by any method known to those skilled in the art. For example, the crude soil sample can be dried at room temperature for a period of 24 to 48 hours.
As indicated previously, the liquid buffer medium can consist of a medium for extracting the DNA present in the microparticles. An extraction buffer known as TENP containing, respectively, 50 mM Tris, 20 mM EDTA, 100 mM NaCl and 1% (weight/volume) of polyvinylpolypyrrolidone, at pH will most preferably be used.
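As a rough illustration, the stated TENP concentrations can be converted into weighed amounts for a given volume. This is a sketch only: the specification does not say which reagent forms are used, so the forms and molar masses below (Tris base, disodium EDTA dihydrate, sodium chloride) are assumptions, and pH adjustment is not covered.

```python
# Molar masses in g/mol. The reagent forms are ASSUMPTIONS for illustration;
# the specification gives only the final concentrations.
MOLAR_MASS_G_PER_MOL = {
    "Tris base": 121.14,
    "Na2EDTA dihydrate": 372.24,
    "NaCl": 58.44,
}

# Final concentrations stated for the TENP buffer, in mM.
CONCENTRATION_MM = {
    "Tris base": 50.0,
    "Na2EDTA dihydrate": 20.0,
    "NaCl": 100.0,
}

PVPP_PERCENT_WV = 1.0  # polyvinylpolypyrrolidone, % weight/volume (g per 100 ml)


def tenp_recipe(volume_ml: float) -> dict:
    """Return grams of each component needed for volume_ml of TENP buffer."""
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    litres = volume_ml / 1000.0
    grams = {
        name: (CONCENTRATION_MM[name] / 1000.0) * MOLAR_MASS_G_PER_MOL[name] * litres
        for name in CONCENTRATION_MM
    }
    grams["PVPP"] = PVPP_PERCENT_WV / 100.0 * volume_ml
    return grams


if __name__ == "__main__":
    for component, mass in tenp_recipe(500.0).items():
        print(f"{component}: {mass:.3f} g")
```

If a different salt form is used (for example Tris hydrochloride), only the corresponding molar mass needs to be changed.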
The process for preparing a collection of nucleic acids from a soil sample is also characterized in that the step for obtaining microparticles by grinding the pre-dried or pre-desiccated soil sample is followed by a step of extracting the nucleic acids present in the microparticles.
It is common ground that the extraction of the nucleic acids is accompanied by a co-extraction of unwanted soil constituents and/or compounds, thus necessitating the subsequent purification of the nucleic acids extracted, such a subsequent purification step needing to be both selective enough to allow the removal of the unwanted soil constituents and/or compounds, and of a yield which is sufficient to entail a small loss in terms of the amount of pre-extracted DNA.
It has been shown according to the invention that a step of purifying the DNA extracted from the microparticles of the soil sample which satisfies the selectivity and yield criteria defined above comprises a treatment of the extracted DNA with a combination of two successive chromatography steps, a chromatography on molecular sieves and an anion-exchange chromatography, respectively.
According to another characteristic of the above process, the step of extracting the nucleic acids is followed by a step of purifying the extracted nucleic acids with the aid of the following two chromatography steps: passing the solution containing the nucleic acids over a molecular sieve, followed by recovery of the elution fractions enriched in nucleic acids; and passing the elution fractions enriched in nucleic acids over an anion-exchange chromatography support, followed by recovery of the elution fractions containing the nucleic acids.
The nature and order of the above chromatography steps are essential for good selectivity and an excellent yield for the step of purifying the DNA pre-extracted from the microparticles of the pre-dried or pre-desiccated soil sample.
In a very advantageous manner, the chromatographic support of the "molecular sieve" type in the above nucleic acid purification step consists of a chromatographic support of Sephacryl® S-400 HR type or a chromatographic support of equivalent characteristics.
In an entirely preferred manner, the anion-exchange chromatographic support used in the second step for purifying the extracted DNA is a support of Elutip® d type, or a chromatographic support of equivalent characteristics.
By combining the steps of obtaining microparticles of the dry soil sample, of extracting the nucleic acids present in the microparticles and of purification by the chromatography steps described above, it is possible according to the invention to extract the DNA from the soil directly without prior purification of the cells of the organisms initially contained in the sample, while at the same time avoiding the co-extraction of soil contaminants, such as, for example, humic acids, which is observed with the processes of the prior art.
The contaminants, such as humic acids, severely impair the analyses and the subsequent uses of the nucleic acids whose purification is desired.
According to the above process, it is also possible to gain access to the nucleic acids contained in the organisms which have not been lysed mechanically during the step of obtaining microparticles of the soil sample, with the aim of obtaining a virtually exhaustive collection of the genetic diversity of nucleic acids initially present in the soil sample. Thus, the microparticles of the soil sample can undergo subsequent steps of chemical, enzymatic or physical lysis treatment, or alternatively a combination of chemical, enzymatic or physical treatments.
According to a first aspect, the process for preparing a collection of nucleic acids from a soil sample according to the invention can also be characterized in that the step of obtaining microparticles is followed by the following steps: treatment of the soil suspension in a liquid buffer medium by sonication; and extraction and recovery of the nucleic acids.
In a preferred manner, for a treatment by sonication, use will be made of a device of titanium micro-point type, such as the 600 W Vibracell Ultrasonicator device sold by the company Bioblock or a sonicator of Cup Horn type.
In an entirely preferred manner, the sonication step is carried out at a power of 15 W for a duration of 7 to 10 minutes and comprises successive cycles of sonication, the sonication itself being carried out for of the duration of each cycle.
According to a second aspect, the above process can also be characterized in that the step of obtaining microparticles is followed by the following steps: treatment of the soil suspension in a liquid buffer medium by sonication; incubation of the suspension at 37°C after sonication in the presence of lysozyme and achromopeptidase; addition of SDS before centrifugation and precipitation of the nucleic acids; and recovery of the precipitated nucleic acids.
Preferably, the step of incubation in the presence of lysozyme and achromopeptidase will be carried out at a final concentration of 0.3 mg/ml of each of the two enzymes, preferably for 30 minutes at 37°C.
Preferably, the SDS will be used at a final concentration of 1% and for an incubation time of 1 hour at a temperature of 60°C before centrifugation and precipitation.
According to a third aspect, the above process for preparing a collection of nucleic acids from a soil sample is also characterized in that the step of obtaining microparticles is followed by the following steps: homogenization of the soil suspension with a step of vigorous mixing (vortex) followed by a step of simple stirring; freezing of the homogeneous suspension followed by thawing; treatment of the suspension by sonication after thawing; incubation of the suspension at 37°C after sonication in the presence of lysozyme and achromopeptidase; addition of SDS before centrifugation and precipitation of the nucleic acids; and recovery of the nucleic acids.
Preferably, the suspensions of soil microparticles are mixed on the vortex machine and then homogenized by gentle stirring on a stirrer with circular rotation for a duration of two hours, after which they are frozen at -20°C.
Preferably, the suspensions are again vigorously stirred with a vortex machine for 10 minutes, after thawing and before the sonication step.
It goes without saying that the nucleic acids extracted by the embodiments of the process described above for the direct extraction of nucleic acids are preferably purified according to the purification step consisting of a first passage over molecular sieves and then a subsequent passage, of the elution fractions obtained after the chromatography on molecular sieves, over an anion-exchange chromatographic support.
2. Indirect extraction of nucleic acids
According to a second embodiment of the process for preparing a collection of nucleic acids from an environmental sample, according to the invention, the said environmental sample undergoes a first treatment which is of a nature to allow separation of the organisms, contained in this sample, from the other macro-constituents of the sample.
This second embodiment of the process for preparing a collection of nucleic acids according to the invention promotes the production of large nucleic acids, which are virtually impossible to obtain according to the first embodiment of the process according to the invention described above, the mechanical lysis step performed in order to obtain the microparticles also having the effect of physically breaking the nucleic acids in the soil sample or the nucleic acids contained in the organisms in the soil sample.
The production of large nucleic acids has been sought by the Applicant for the purpose of isolating and characterizing nucleic acids comprising, at least partially, all of the coding sequences belonging to the same operon capable of directing the biosynthesis of a compound of industrial interest.
Preferably, by carrying out the second embodiment of the process for preparing a collection of nucleic acids from a soil sample according to the invention, nucleic acids are obtained which are greater than 100 kb in size, preferably greater than 200, 250 or 300 kb, and most preferably nucleic acids greater than 400, 500 or even 600 kb in size.
This second embodiment of a process for preparing a collection of nucleic acids from an environmental sample according to the invention consists of a combination of four successive steps intended to obtain nucleic acids having the characteristics described above.
When the environmental sample is a soil sample, it has been shown according to the invention that a first step for obtaining a suspension by dispersing the soil sample in liquid medium promotes the accessibility of the organisms contained in the sample without bringing about any significant mechanical lysis of the cells.
The first step of obtaining a dispersion of the above soil sample makes the organisms in the sample accessible to the external medium and also allows a partial dissociation of the organisms in the sample and of the macro-constituents. It thus makes possible a subsequent separation of the organisms initially contained in the sample from the other constituents of this sample.
When the environmental sample originates, for example, from plants, from marine organisms or from insects, a pretreatment by grinding is necessary in order to make the organisms of the associated microflora accessible to the subsequent steps of the process.
Thus, the present process comprises a step of separating the organisms from the other inorganic and/or organic constituents obtained above by means of centrifugation on a density gradient. The organisms thus separated are then subjected to a step of lysis and then of extraction of the nucleic acids.
The step of centrifugation on a density gradient makes it possible, surprisingly, to separate the cells of organisms from the soil particles contained in the sample suspension. In point of fact, it might have been expected that a proportion of the cells would be entrained with the macroparticles in the gradient phase. In addition, it had never been demonstrated hitherto that a centrifugation of a soil sample on a density gradient made it possible to find, at the aqueous phase/gradient interface, a population of organisms representative of the diversity of the organisms present in the starting sample, due to the fact that these organisms are extremely variable in volume, density and shape. It could reasonably be assumed that they would be found either in the aqueous phase, at the aqueous phase/density gradient interface or in the density gradient itself.
Thus, a person skilled in the art could expect that organisms with densities less than or greater than the density of the density gradient used (density of the density gradient of between 1.2 and 1.5 g/ml, preferably 1.3 g/ml) could not be recovered, the effect of which would have been to introduce a bias into the representativeness of the organisms effectively separated and, consequently, also into the diversity of the nucleic acids extracted.
Also, in one specific embodiment of the process, a step of germination of spores, in particular of actinomycetes, is carried out, the effect of which is to significantly increase the amount of actinomycete DNA recovered.
The final step consists of a step of purifying the nucleic acids thus extracted on a caesium chloride gradient.
Surprisingly, the purification of the nucleic acids on the caesium chloride gradient allows a substantial or even complete removal of the substances of which the density gradient is composed. This characteristic is a determining factor as regards the subsequent use of the purified nucleic acids, since the density gradient is known as being a powerful enzymatic inhibitor, capable where appropriate of inhibiting the catalytic activity of the enzymes used to prepare the insertion of extracted nucleic acids into vectors.
According to this second embodiment, the process for preparing a collection of nucleic acids from an environmental sample containing organisms according to the invention comprises the succession of steps below:
(i) production of a suspension by dispersing the environmental sample in liquid medium and then homogenizing the suspension obtained by gentle stirring;
(ii) separation of the organisms from the other inorganic and/or organic constituents of the homogeneous suspension obtained in step (i) by centrifugation on a density gradient;
(iii) lysis of the microorganisms separated in step (ii) and extraction of the nucleic acids;
(iv) purification of the nucleic acids on a caesium chloride gradient.
Preferably, the suspension of the soil sample is obtained by dispersing this sample by grinding with the aid of a device such as a Waring Blender or a device of equivalent characteristics. In an entirely preferred manner, the sample suspension is obtained after three successive grinding operations each lasting one minute in a device such as a Waring Blender. Preferably, the ground sample will be cooled in ice between each of the grinding operations.
Preferably, the organisms are then separated from the soil particles by centrifugation on a density cushion of the "Nycodenz" type, sold by the company Nycomed Pharma AS. (Oslo, Norway). The preferred centrifugation conditions are 10,000xg for 40 minutes at advantageously in a rotor with swing-out buckets of the "rotor TST 28.38" type sold by the company Kontron.
The ring of organisms located, after centrifugation, at the interphase of the upper aqueous phase and the lower Nycodenz phase is then removed and washed by centrifugation before taking up the cell pellet in a suitable buffer.
Step (iii) of lysis of the organisms separated out in step (ii) described above can be carried out in any manner known to those skilled in the art.
Advantageously, the cells are lysed in a 10 mM Tris-100 mM EDTA solution at pH 8.0 in the presence of lysozyme and achromopeptidase, advantageously for one hour at 37°C.
The actual extraction of the DNA can advantageously be carried out by adding a solution of lauryl sarcosyl of the final weight of the solution) in the presence of proteinase K and incubation of the final solution at 37°C for 30 minutes.
The nucleic acids extracted in step (iii) are then purified on a caesium chloride gradient. Preferably, the step of purifying the nucleic acids on a caesium chloride gradient is carried out by centrifugation at 35,000 rpm for 36 hours, for example on a rotor of the Kontron 65.13 type.
According to one specific aspect of the process for preparing a collection of nucleic acids from a soil sample containing organisms according to the invention, the said nucleic acids consist predominantly, if not exclusively, of DNA molecules.
According to another aspect, the nucleic acids can be recovered after inclusion of the organisms, separated on a density gradient, in an agarose block and lysis, for example chemical and/or enzymatic lysis, of the organisms included in the agarose block.
Another subject of the invention consists of a collection of nucleic acids consisting of the nucleic acids obtained in step II-(iv) of the process for preparing a collection of nucleic acids according to the invention, or alternatively obtained in step or a subsequent step of the process for preparing a collection of nucleic acids according to the invention.
The invention also relates to a nucleic acid which is characterized in that it is contained in a collection of nucleic acids as defined above.
According to a first aspect, such a nucleic acid constituting a collection of nucleic acids according to the invention is characterized in that it comprises a nucleotide sequence encoding at least one operon, or part of an operon.
Most preferably, such an operon encodes all or part of a metabolic pathway.
Example 9 describes the construction of a genomic DNA library from a strain of Streptomyces alboniger and its cloning into the shuttle cosmids pOS700I and pOS700R, respectively. It has been shown according to the invention that, in the DNA library prepared in the integrative vector pOS700I, new clones contain nucleotide sequences belonging to the operon responsible for the puromycin biosynthetic pathway. Similarly, twelve clones containing nucleotide sequences of the operon responsible for the puromycin biosynthetic pathway have been identified in the DNA library prepared in the replicative vector pOS700R.
In particular, certain integrative and replicative cosmids of the libraries produced have, after digestion with the restriction endonucleases ClaI and EcoRV, a 12-kb fragment capable of containing all of the sequences of the operon responsible for the puromycin biosynthetic pathway.
Thus, according to another aspect, a nucleic acid according to the invention contains, at least partially, nucleotide sequences of the operon responsible for the puromycin biosynthetic pathway.
Example 2 below describes the construction of a DNA library according to a process in accordance with the present invention, in a pBluescript SK- vector starting with a soil contaminated with lindane.
The recombinant vectors were transfected into Escherichia coli DH10B cells and the transformed cells were then cultured in a suitable culture medium in the presence of lindane. Screening of the transformed cells of the library made it possible to show that, out of 10,000 screened clones, 35 had a lindane degradation phenotype.
The presence of the linA gene in these clones was confirmed by PCR amplification by means of primers specific for this gene.
Thus, according to another aspect, the invention also relates to a nucleic acid containing a nucleotide sequence for the metabolic pathway which brings about the biodegradation of lindane.
It is thus clearly demonstrated, as described above, that a process for preparing a collection of nucleic acids from a soil sample containing organisms according to the invention and a process for preparing a collection of recombinant vectors containing the constituent nucleic acids of the abovementioned collection of nucleic acids are entirely suitable for the isolation and characterization of nucleotide sequences included in an operon.
An additional demonstration of the ability of a process according to the invention to identify coding nucleotide sequences involved in a biosynthetic pathway regulated in the form of an operon is also described later: this concerns the cloning and characterization of sequences encoding polyketide synthases involved in the pathway for the biosynthesis of polyketides, which belong to a family of molecules certain representatives of which are of major therapeutic interest, in particular antibiotic interest.
A subject of the present invention is thus also a constituent nucleic acid of a collection of nucleic acids according to the invention, characterized in that it comprises all of a nucleotide sequence encoding a polypeptide.
According to a first aspect, a constituent nucleic acid of a collection of nucleic acids according to the invention is of prokaryotic origin.
According to a second aspect, a constituent nucleic acid of a collection of nucleic acids according to the invention originates from a bacterium or from a virus.
According to a third aspect, a constituent nucleic acid of a collection of nucleic acids according to the invention is of eukaryotic origin.
In particular, such a nucleic acid is characterized in that it originates from a fungus, a yeast, a plant or an animal.
MOLECULAR CHARACTERIZATION OF THE COLLECTION OF NUCLEIC ACIDS EXTRACTED FROM THE SOIL.
In order to overcome the various technical drawbacks of the methods for characterizing libraries of DNA extracted and purified from an environmental sample which have been described in the section of the description relating to the prior art, the Applicant has developed a simple and reliable process for qualitatively and semi-quantitatively characterizing the nucleic acids obtained from the process described above.
The process according to the invention thus consists in universally amplifying a 700 bp fragment located inside a sequence of ribosomal DNA of 16S type, and then in hybridizing the amplified DNA with an oligonucleotide probe of variable specificity and finally in comparing the hybridization intensity of the sample relative to an external calibration range of DNA of known sequence or origin.
The amplification prior to the hybridization with the oligonucleotide probe makes it possible to quantify relatively scarce microorganism genera or species. Furthermore, the amplification with universal primers makes it possible, during the hybridization, to use a broad series of oligonucleotide probes.
Thus, a subject of the invention is also a process for determining the diversity of nucleic acids contained in a collection of nucleic acids, and most particularly of a collection of nucleic acids originating from an environmental sample, preferably from a soil sample, the said process comprising the following steps: placing the nucleic acids of the collection of nucleic acids to be tested in contact with a pair of oligonucleotide primers hybridizing at any sequence of bacterial 16S ribosomal DNA; carrying out at least three amplification cycles; detection of the amplified nucleic acids using an oligonucleotide probe or a plurality of oligonucleotide probes, each probe hybridizing specifically with a 16S ribosomal DNA sequence common to a bacterial kingdom, order, subclass or genus; where appropriate, comparison of the results from the preceding detection step with the detection results, using the probe or the plurality of probes of nucleic acids of known sequence constituting a calibration range.
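The comparison of a sample signal with an external calibration range can be illustrated by a minimal sketch. It assumes, which the description does not state, that the hybridization signal varies linearly with the amount of target DNA between neighbouring calibration points; the function name, the units and the numerical values are hypothetical.

```python
from bisect import bisect_left


def estimate_amount(signal, calibration):
    """Interpolate a DNA amount from a hybridization signal.

    calibration: list of (amount, signal) pairs measured on DNA of known
    origin. Assumption: the signal increases strictly with the amount.
    """
    points = sorted(calibration, key=lambda p: p[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the calibration range")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    # Linear interpolation between the two bracketing calibration points.
    (a0, s0), (a1, s1) = points[i - 1], points[i]
    return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)


# Hypothetical calibration range: (ng of target DNA, measured intensity).
calibration_range = [(0.0, 0.0), (10.0, 200.0), (50.0, 1000.0)]
sample_ng = estimate_amount(600.0, calibration_range)
```

Signals falling outside the calibration range are rejected rather than extrapolated, which is in keeping with the semi-quantitative nature of the comparison.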
Preferably, a first pair of primers hybridizing with universally conserved regions of the gene for the 16S ribosomal RNA consists, respectively, of the primers FGPS 612 (SEQ ID No 12) and FGPS 669 (SEQ ID No 13).
A second embodiment of a preferred pair of primers according to the invention consists of the pair of universal primers 63f (SEQ ID No 22) and 1387r (SEQ ID No 23).
According to one specific embodiment of a process for determining the diversity of nucleic acids in a collection of nucleic acids, the amplification step using a pair of universal primers can be carried out on a collection of recombinant vectors into each of which has been inserted a nucleic acid from the collection of nucleic acids under consideration, prior to the step of hybridization with the oligonucleotide probes specific for a particular bacterial kingdom, order, subclass or genus.
Such a process for determining the diversity of the nucleic acids contained in a collection is most particularly applicable to the collections of nucleic acids obtained in accordance with the teaching of the present description.
Thus, Example 3 details a process for preparing a collection of nucleic acids from a soil sample containing organisms, comprising a step of indirect extraction of DNA by dispersion of a soil sample prior to the separation of the cells on a Nycodenz gradient, lysis of the cells and then purification of the DNA on a caesium chloride gradient.
The collection of nucleic acids thus obtained was used as obtained or in the form of inserts into vectors of cosmid type in an amplification process using the abovementioned universal primers for 16S rDNA, and the amplified DNA was then subjected to a step of detection using oligonucleotide probes of sequences SEQ ID No 14 to SEQ ID No 21 which are presented in Table 4.
The results show that a process for preparing a collection of nucleic acids starting with a soil sample containing organisms according to the invention makes it possible to gain access to the DNA of more than 14% of the total telluric microflora, i.e. 2 x 10^8 cells per gram of soil, whereas the total microflora which can be cultured represents barely 2% of the total microbial population.
In order to determine the phylogenetic diversity of a collection of nucleic acids prepared in accordance with the invention, 47 sequences of the 16S rRNA gene were isolated and sequenced. These sequences correspond, respectively, to the nucleotide sequences SEQ ID No 60 to SEQ ID No 106.
The nucleic acids comprising the sequences SEQ ID No 60 to SEQ ID No 106 also form part of the invention, as do nucleic acids possessing at least 99%, preferably 99.5% or 99.8%, nucleic acid identity with the nucleic acids comprising the sequences SEQ ID No 60 to SEQ ID No 106. Such sequences can be used in particular as probes for screening clones of a DNA library and for thus identifying those, among the clones of the library, which contain such sequences, these sequences being liable to be close to coding sequences of interest, such as sequences encoding enzymes involved in the biosynthetic pathway of antibiotic metabolites, for example polyketides.
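The identity thresholds mentioned above can be checked with a short sketch. It assumes that the two sequences have already been aligned to the same length by some alignment method (not specified in the description) and uses one common convention for percent identity, namely matching columns divided by aligned columns; the function names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap ('-') are ignored; a gap
    facing a nucleotide counts as a mismatch (one convention among several).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned positions")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(seq_a, seq_b, threshold=99.0):
    """True if the identity reaches the chosen threshold (99% by default)."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical 200-nucleotide reference and a variant with one substitution.
reference = "ACGT" * 50
variant = "T" + reference[1:]
identity = percent_identity(reference, variant)
```

With one substitution in 200 aligned positions, the variant passes the 99% and 99.5% thresholds but not the 99.8% threshold.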
Comparison of the sequences of 16S rRNA from a DNA library prepared in accordance with the invention, with the sequences listed in the RDP database (Maidak, Cole, Parker, Garrity G.M., Larsen, Li, Lilburn, McCaughey, Olsen, Overbeek R., Pramanik, Schmidt, Tiedje, Woese C.R. (1999) "A new project of the RDP (Ribosomal Database Project)" Nucleic Acids Research Vol. 27: 171-173) made it possible to determine that the nucleic acids contained in a collection of nucleic acids according to the invention originate from α-proteobacteria, from β-proteobacteria, from δ-proteobacteria, from γ-proteobacteria, from actinomycetes and from a genus related to Acidobacterium. These results, presented in Table 7 and in the phylogenetic tree in Figure 7, illustrate the very wide phylogenetic diversity of the nucleic acids contained in a DNA library prepared in accordance with the process according to the invention.
CLONING AND/OR EXPRESSION VECTORS
Each of the nucleic acids contained in a collection of nucleic acids prepared in accordance with the invention can be inserted into a cloning and/or expression vector.
For this purpose, any type of vector known in the prior art can be used, such as viral vectors, phages, plasmids, phagemids, cosmids, fosmids, vectors of BAC type, P1 bacteriophages, vectors of YAC type, yeast plasmids or any other vector known in the prior art to a person skilled in the art.
Use will advantageously be made according to the invention of vectors which allow a stable expression of the nucleic acids of a DNA library. To this end, such vectors preferentially include transcription-regulation sequences which are operably linked with the genomic insert so as to allow the initiation and/or regulation of the expression of at least a portion of the said DNA insert.
It results from the text hereinabove that the invention also relates to a process for preparing a collection of recombinant vectors, characterized in that the nucleic acids obtained in step II-(iv) or in step I-(c) or any other subsequent step of a process for preparing a collection of nucleic acids from a soil sample containing organisms according to the invention are inserted into a cloning and/or expression vector.
Prior to their insertion into a cloning and/or expression vector, the constituent nucleic acids of a collection of nucleic acids according to the invention can be separated as a function of their size, for example by electrophoresis on an agarose gel, where appropriate after digestion with a restriction endonuclease.
According to another aspect, the constituent nucleic acids of a collection of nucleic acids according to the invention can be brought to a substantially uniform size by carrying out a step of physical rupture prior to their insertion into the cloning and/or expression vector.
Such a step of physical or mechanical rupture of nucleic acids can consist of successive passages of these nucleic acids, in solution, in a metal channel about 0.4 mm in diameter, for example the channel of a syringe needle having such a diameter.
The average size of the nucleic acids can be, in this case, between 30 and 40 kb in length.
The construction of the vectors that are preferred according to the invention is represented schematically in Figures 25 (conjugative integrative cosmid) and 26 (integrative BAC).
Cloning and/or expression vectors which can be used advantageously for the purposes of inserting nucleic acids contained in a DNA library or collection according to the invention are, in particular, the vectors described in European patent No EP 0 350 341 and in US patent No 5 688 689, such vectors being especially suitable for the transformation of actinomycete strains. Such vectors contain, besides an insert DNA sequence, an attachment sequence att and a DNA sequence encoding an integrase (int sequence) which is functional in actinomycete strains.
However, it has been observed according to the invention that certain cloning and/or expression vectors had drawbacks and that their theoretical functional capacity was not achieved in practice.
Thus, it was seen that the integration system contained in vectors of the prior art, and in particular in the vectors described in European patent No EP 0 350 341, does not in reality allow good integration of the DNA insert from the library into the bacterial chromosome.
Starting from the hypothesis that the functional defects in the integration of such vectors into the bacterial chromosome were due to a defect in the expression of the integrase gene present in these vectors, the Applicant first attempted to increase the expression of the integrase gene by replacing the initial transcription promoter with a transcription promoter capable of significantly increasing the number of integrase transcripts.
The results were disappointing and the function of integration of these vectors into the chromosome was not improved.
Surprisingly, it has been shown according to the invention that the integrase expression difficulties encountered with this family of integrative vectors did not lie in the amount of transcript produced, but in the stability of the transcripts.
According to a second hypothesis, the Applicant was able to show that the stability defect of the integrase transcripts was caused by defects in termination of the transcription of the corresponding messenger RNA.
The Applicant thus inserted a stop site placed downstream of the sequence encoding the integrase of the vector so as to obtain a messenger RNA of given size. The insertion of an additional termination signal downstream of the nucleotide sequence encoding the integrase of the vector made it possible to obtain a family of integrative vectors of cosmid type and of BAC type.
Preferentially, the stop site is placed downstream of the attachment site att.
In addition, the Applicant has developed novel conjugative vectors and novel replicative vectors of cosmid type and novel conjugative vectors of BAC type which can be used advantageously to insert constituent nucleic acids of a collection of nucleic acids prepared according to the process of the invention.
When the insertion of DNA fragments of average size is desired, vectors of the cosmid type, capable of receiving inserts having a maximum size of about 50 kb, are preferably used.
Such cosmid vectors are most particularly suitable for inserting constituent nucleic acids of a collection of nucleic acids obtained according to the process of the invention comprising a first step of direct DNA extraction by mechanical lysis of the organisms contained in the initial soil sample.
When the insertion of large nucleic acids, in particular of nucleic acids greater than 100 kb in size, or even greater than 200, 300, 400, 500 or 600 kb, is desired, use will then preferentially be made of vectors of the BAC type which are capable of receiving DNA inserts of such a size.
Such vectors of BAC type are most particularly suitable for inserting constituent nucleic acids of a collection of nucleic acids obtained in accordance with the process according to the invention, in which the first step consists of an indirect extraction of the DNA by prior separation of the organisms contained in the initial soil sample and removal of the macroconstituents from the said soil sample.
In particular, vectors of the BAC type are advantageously used to insert large nucleic acids containing, at least partially, the nucleotide sequence of an operon.
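The choice between the two vector families as a function of insert size can be summarized by a small sketch. The 50 kb limit comes from the text above; assigning every insert too large for a cosmid to a BAC, including the 50 to 100 kb range that the description does not assign explicitly, is an assumption, and the function name is hypothetical.

```python
COSMID_MAX_KB = 50  # "maximum size of about 50 kb" per the description


def suggest_vector(insert_kb):
    """Suggest a vector family for an insert of the given size in kb."""
    if insert_kb <= 0:
        raise ValueError("insert size must be positive")
    if insert_kb <= COSMID_MAX_KB:
        return "cosmid"
    # Assumption: anything too large for a cosmid goes to a BAC, including
    # the 50-100 kb range that the description leaves unassigned.
    return "BAC"


# Hypothetical insert sizes: a sheared fragment and a large operon-bearing one.
choices = {size: suggest_vector(size) for size in (35, 150, 600)}
```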
Thus, the process for preparing a collection of recombinant cloning and/or expression vectors according to the invention is also characterized in that the cloning and/or expression vector is of the plasmid type.
According to another aspect, such a process is characterized in that the cloning and/or expression vector is of the cosmid type.
According to a first aspect, it can be a cosmid which is replicative in E. coli and integrative in Streptomyces. An entirely preferred cosmid corresponding to such a definition is the cosmid pOS7001 described in Example 3.
According to yet another aspect, the cosmid vector is conjugative and integrative in Streptomyces.
In general, conjugative vectors of cosmid type or of BAC type, which comprise in their nucleotide sequences a unit recognized by the cellular enzymatic machinery known as a "conjugation origin", are used whenever it is desired to avoid resorting to laborious transformation techniques that are difficult to automate.
For example, the transfection of vectors initially harboured by E. coli cells into Streptomyces cells conventionally requires a step of recovering the recombinant vector contained in the Escherichia coli cells, and purifying it prior to the step of transforming Streptomyces protoplasts.
It is commonly accepted that a transfection of an assembly of 1000 Escherichia coli clones into Streptomyces requires the production of about 8000 clones in order for each E. coli clone to have a chance of being represented.
Conversely, a step of transfection by conjugating a vector harboured by E. coli into Streptomyces cells requires the same number of clones of each of the microorganisms, the conjugation step taking place "clone to clone" and moreover not comprising the technical difficulties associated with the step for transferring genetic material by transformation of protoplasts, for example in the presence of polyethylene glycol.
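The figure of about 8000 transformants for 1000 starting clones is consistent with a simple sampling estimate. The sketch below assumes, hypothetically, that each transformant derives from one of the starting clones drawn uniformly at random and independently of the others, so that the chance of a given clone being absent after n transformants is (1 - 1/N)^n; the function names are illustrative.

```python
def probability_represented(n_source_clones, n_transformants):
    """Probability that one given source clone appears at least once."""
    miss = (1.0 - 1.0 / n_source_clones) ** n_transformants
    return 1.0 - miss


def expected_missing(n_source_clones, n_transformants):
    """Expected number of source clones that are absent from the new library."""
    p = probability_represented(n_source_clones, n_transformants)
    return n_source_clones * (1.0 - p)


# Transformation of protoplasts: 1000 source clones, about 8000 transformants.
p_eightfold = probability_represented(1000, 8000)
# Same model with only as many transformants as source clones.
p_onefold = probability_represented(1000, 1000)
```

Under this model an eightfold excess leaves each clone with a probability above 99.9% of being represented, whereas a onefold sampling would miss roughly a third of the clones. Conjugation, being carried out clone to clone, does not rely on this random sampling.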
In order to optimize the construction of a DNA library in Streptomyces, novel conjugative vectors of cosmid type and of BAC type which are of a nature to allow maximum efficacy of the conjugation step have been developed according to the invention.
In particular, the novel conjugative vectors according to the invention have been constructed by placing a selection marker gene at the end of the vector DNA that is transferred last into the recipient bacterium. This improvement to the conjugative vectors of the prior art makes it possible to positively select only the recipient bacteria which have received all of the vector DNA and, consequently, all of the insert DNA of interest.
Cosmids which are conjugative and integrative in Streptomyces and which are preferred according to the invention are the cosmids pOSV303, pOSV306 and pOSV307 described in the Examples.
According to another aspect, a process for preparing a collection of recombinant vectors according to the invention is carried out using a cosmid which is replicative both in E. coli and in Streptomyces. Such a cosmid is advantageously the cosmid pOS700R described in Example 6.
According to yet another aspect, the above process can be carried out with a cosmid which is replicative in E. coli and Streptomyces and conjugative in Streptomyces.
Such a replicative and conjugative cosmid can be obtained from a replicative cosmid in accordance with the invention, by inserting a suitable transfer origin, such as RK2, as described in Example 5 for the construction of the vector pOSV303.
According to another advantageous embodiment of the process for preparing a collection of recombinant vectors according to the invention, use is made of a cloning and/or expression vector of BAC type.
According to a first aspect, the vector of the BAC type is integrative and conjugative in Streptomyces.
In an entirely preferred manner, such a BAC vector which is integrative and conjugative in Streptomyces is the vector BAC pOSV403 described in Example 8 or else the vectors BAC pMBD-1, pMBD-2, pMBD-3, pMBD-4, pMBD-5 and pMBD-6 described in the Examples.
A subject of the invention is also a recombinant vector, characterized in that it is chosen from the following recombinant vectors: a) a vector comprising a constituent nucleic acid of a collection of nucleic acids according to the invention; b) a vector as obtained according to a process which avoids any involvement of the action of a restriction endonuclease on the DNA fragment to be inserted, as described previously.
In an entirely preferable manner, the invention also relates to a vector chosen from the following vectors: the cosmid pOS7001; the cosmid pOSV303; the cosmid pOSV306; the cosmid pOSV307; the cosmid pOS700R; the vector BAC pOSV403; the vector BAC pMBD-1; the vector BAC pMBD-2; the vector BAC pMBD-3; the vector BAC pMBD-4; the vector BAC pMBD-5; the vector BAC pMBD-6.
The invention also relates to a collection of recombinant vectors as obtained according to any one of the processes according to the invention.
Process for preparing a recombinant cloning and/or expression vector according to the invention.
The conventional techniques for inserting DNA into a vector in order to prepare a recombinant cloning and/or expression vector conventionally involve a first step in which a restriction endonuclease is incubated both with the DNA to be inserted and with the recipient vector, thus creating compatible ends between the DNA to be inserted and the vector DNA, allowing the assembly of the two DNAs before a final ligation step allowing the production of the recombinant vector.
However, such a conventional technique has notable drawbacks, most particularly when it is desired to insert large nucleic acids into a cloning and/or expression vector.
Specifically, the prior action of a restriction enzyme on the DNA fragments intended to be inserted into a vector is liable to appreciably reduce the size of this DNA prior to its insertion into the vector. It goes without saying that a significant reduction in the size of the DNA prior to its insertion into a vector is a situation that is particularly unfavourable when it is desired to clone large fragments of DNA liable to contain all of the coding sequences and, where appropriate, also the regulatory sequences, of an operon whose expression constitutes a complete biosynthetic pathway of a metabolite of industrial interest, and most particularly of a compound of therapeutic interest.
To overcome the drawbacks of the prior art, two processes have been developed according to the invention, for preparing a recombinant cloning and/or expression vector which do not use a restriction endonuclease on the DNA to be inserted prior to its introduction into the vector. Such processes are consequently entirely suitable for cloning long DNA fragments liable to contain, at least partially, all of the coding sequences and, where appropriate, also the regulatory sequences, of a complete operon responsible for a biosynthetic pathway.
According to a first aspect, one process for preparing a recombinant cloning and/or expression vector according to the invention is characterized in that the insertion of a nucleic acid into the cloning and/or expression vector comprises the following steps: opening the cloning and/or expression vector at a chosen cloning site, using a suitable restriction endonuclease; adding a first homopolymeric nucleic acid at the free 3' end of the open vector; adding a second homopolymeric nucleic acid, whose sequence is complementary to the first homopolymeric nucleic acid, at the free 3' end of the nucleic acid to be inserted into the vector; assembling the nucleic acid of the vector and the nucleic acid by hybridizing the first and second homopolymeric nucleic acids of mutually complementary sequence; closing the vector by ligation.
Such a process is described in Examples 10 and 13 below.
Advantageously, the above process can comprise the following characteristics, separately or in combination: the first homopolymeric nucleic acid is of poly(A) or poly(T) sequence; the second homopolymeric nucleic acid is of poly(T) or poly(A) sequence.
In an entirely preferred manner, the homopolymeric nucleic acids have a length of between 25 and 100 nucleotide bases, preferably between 25 and 70 nucleotide bases.
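The principle of joining vector and insert through complementary homopolymeric tails can be modelled on strings. This is only an illustrative sketch: the strand sequences and function names are hypothetical, strands are written 5' to 3', and the length check simply mirrors the 25 to 100 nucleotide range given above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def add_3prime_tail(strand, base, length):
    """Extend a strand (5'->3') with a homopolymeric tail at its 3' end."""
    if len(base) != 1 or base not in "ACGT":
        raise ValueError("base must be one of A, C, G, T")
    if not 25 <= length <= 100:
        raise ValueError("tail length outside the 25-100 nucleotide range")
    return strand + base * length


def tails_can_anneal(tail_a, tail_b):
    """True if the tails are complementary over the shorter of the two."""
    n = min(len(tail_a), len(tail_b))
    return reverse_complement(tail_a[:n]) == tail_b[:n]


# Hypothetical strands: poly(A) on the vector, poly(T) on the insert.
tailed_vector = add_3prime_tail("GATTACAGGC", "A", 40)
tailed_insert = add_3prime_tail("CCGGTTAACG", "T", 40)
compatible = tails_can_anneal(tailed_vector[-40:], tailed_insert[-40:])
```

Because each molecule carries only one kind of tail, vector ends cannot anneal to each other, nor can insert ends; only vector-insert junctions form before the ligation step.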
The process for preparing a recombinant cloning and/or expression vector described above is particularly suitable for the construction of DNA libraries in vectors of BAC type. Thus, according to one advantageous embodiment of the process for preparing a recombinant vector described above, the said process is also characterized in that the size of the nucleic acid to be inserted is at least 100 kb and preferably at least 200, 300, 400, 500 or 600 kb.
Such a preparation process is thus particularly suited to the insertion of nucleic acids contained in a collection of nucleic acids obtained according to the process of the invention.
In order to allow the insertion of large DNA fragments into cloning and/or expression vectors, a second process has been developed according to the invention, which makes it possible to dispense with any use of a restriction endonuclease on the DNA intended to be inserted into the vector.
Such a process for preparing a recombinant cloning and/or expression vector according to the invention is characterized in that the step of inserting a nucleic acid into the said cloning and/or expression vector comprises the following steps: creation of blunt ends on the ends of the nucleic acid of the collection by removing the protruding 3' sequences and filling in the protruding 5' sequences; opening the cloning and/or expression vector at a chosen cloning site using a suitable restriction endonuclease; adding complementary oligonucleotide adapters; creation of blunt ends at the ends of the vector nucleic acid by removing the protruding 3' sequences and filling in the protruding 5' sequences, then dephosphorylating the 5' ends in order to prevent a recircularization of the vector; inserting the nucleic acid of the collection into the vector by ligation.
Preferably, the removal of the protruding 3' sequences is carried out using an exonuclease, such as the Klenow enzyme.
Preferably, the filling in of the protruding 5' sequences is carried out using a polymerase, and most preferably T4 polymerase, in the presence of the four nucleotide triphosphates.
A process for preparing a recombinant cloning and/or expression vector by removing the protruding 3' sequences and filling in the protruding 5' sequences as described above is particularly suitable for the construction of DNA libraries from vectors of cosmid type.
Such a process for obtaining recombinant vectors is described in Example 12.
In one specific method for preparing a recombinant vector according to the invention, oligonucleotides comprising one or more rare restriction sites are added to the vector in the cloning site of the DNA to be inserted, in accordance with the teaching of Example 10. This addition of oligonucleotides facilitates the subsequent recovery of the inserts without cleavage thereof.
HOST CELLS
Although any type of host cell can be used for the transfection or transformation with a nucleic acid or a recombinant vector according to the invention, in particular a prokaryotic or eukaryotic host cell, host cells whose physiological, biochemical and genetic properties are well characterized, which can be cultured easily on a large scale and whose culturing conditions for the production of metabolites are well known will preferably be used.
Preferably, the host cell receiving a nucleic acid or a recombinant vector according to the invention is phylogenetically close to the donor organisms initially contained in the environmental sample from which the nucleic acids originate.
In a most preferred manner, a host cell according to the invention should have a codon usage similar, or at least close, to that of the donor organisms initially present in the environmental sample, most particularly in the soil sample.
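One way of making the notion of "close codon usage" concrete is to compare codon frequency profiles. The sketch below is an illustration only: the description does not prescribe any measure, cosine similarity is an arbitrary choice among several possible ones, and the sequences and function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def codon_frequencies(coding_sequence):
    """Relative frequency of each codon in an in-frame coding sequence."""
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two codon frequency profiles (0 to 1)."""
    codons = set(freq_a) | set(freq_b)
    dot = sum(freq_a.get(c, 0.0) * freq_b.get(c, 0.0) for c in codons)
    norm_a = sqrt(sum(v * v for v in freq_a.values()))
    norm_b = sqrt(sum(v * v for v in freq_b.values()))
    return dot / (norm_a * norm_b)


# Hypothetical fragments: GC-rich codons versus AT-ending synonymous codons.
donor = codon_frequencies("GCCGCCGGC")
host = codon_frequencies("GCAGCAGGA")
similarity = cosine_similarity(donor, host)
```

In practice the profiles would be computed over many genes of the donor and host organisms rather than over short fragments such as these.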
The size of the DNA fragments liable to carry the desired nucleotide sequences of interest can be variable. Thus, enzymes encoded by genes with an average size of 1 kb may be expressed using inserts of small size, while the expression of secondary metabolites will require the maintenance in the host organism of much larger fragments, for example from 40 kb to more than 100 kb, 200 kb, 300 kb, 400 kb or 600 kb.
Thus, the host cells of Escherichia coli constitute a preferred choice for cloning large DNA fragments.
In a most preferred manner, use will be made of the Escherichia coli strain known as DH10B and described by Shizuya et al. (1992), for which protocols for cloning into BAC vectors have been optimized.
However, other strains of Escherichia coli can be used advantageously to construct a DNA library according to the invention, such as the strains E. coli Sure, E. coli DH5α, or E. coli 294 (ATCC No. 31446).
In addition, the construction of a DNA library by transfecting E. coli cells with recombinant vectors according to the invention is also possible, the expression of genes of various prokaryotes such as Bacillus, Thermotoga, Corynebacterium, Lactobacillus or Clostridium having been described in PCT patent application No WO 99/20799.
In general, E. coli host cells can in all cases constitute transient hosts in which recombinant vectors according to the invention may be maintained highly effectively, it being possible for the genetic material to be handled easily and archived stably.
For the purposes of expressing the widest possible molecular diversity, other host cells may also advantageously be used, such as Bacillus, Pseudomonas, Streptomyces, Myxococcus, Aspergillus nidulans or Neurospora crassa cells.
It has also been shown according to the present invention that Streptomyces lividans cells can be used successfully and constitute expression systems complementary to Escherichia coli.
Streptomyces lividans constitutes a model for studying the genetics of Streptomyces and has also been used as a host for the heterologous expression of many secondary metabolites. Streptomyces lividans has, in common with other actinomycetes such as Streptomyces coelicolor, Streptomyces griseus, Streptomyces fradiae and Streptomyces griseochromogenes, the precursor molecules and the regulatory systems required for the expression of all or part of complex biosynthetic pathways, such as, for example, the polyketide biosynthetic pathway or the pathway for the biosynthesis of non-ribosomal polypeptides representing classes of molecules of very diverse structure.
Streptomyces lividans also has the advantage of accepting foreign DNA with high transformation efficacies.
Thus, the invention also relates to a recombinant host cell comprising a nucleic acid according to the invention, which is a constituent of a collection of nucleic acids prepared according to a process in accordance with the invention, or alternatively a recombinant host cell comprising a recombinant vector as defined above.
According to a first aspect, it may be a recombinant host cell of prokaryotic or eukaryotic origin.
Advantageously, a recombinant cell according to the invention is a bacterium, and most preferably a bacterium chosen from E. coli and Streptomyces.
According to another aspect, a recombinant host cell according to the invention is characterized in that it is a yeast or a filamentous fungus.
The invention also relates to a collection of recombinant host cells, each of the constituent host cells of the collection comprising a nucleic acid originating from a collection of nucleic acids prepared in accordance with a process for preparing a collection of nucleic acids from a soil sample containing organisms as described above.
The invention also relates to a collection of recombinant host cells, each of the constituent host cells of the collection comprising a recombinant vector according to the invention.
On account of the large size of the inserts, it is necessary to have maximum transformation efficacy. With this aim, a recipient strain of Streptomyces lividans constitutively expressing the pSAM2 integrase in order to promote the site-specific integration of the vector is preferred. For this, the int gene under the control of a strong promoter is integrated into the chromosome. The overproduction of integrase does not induce any excision phenomena (Raynal et al., 1998).
The production of a novel metabolite from the insert might be toxic for Streptomyces if the insert does not contain genes for resistance to the antibiotic produced or if this gene is not expressed or only expressed to a small extent. The capacity of the various genes for allowing Streptomyces ambofaciens to resist the antibiotic that it produces has been studied (Gourmelen et al., 1998; Pernodet et al., 1999). Some of these genes encode transporters of ABC type which are liable to impart a broad spectrum of resistance. These genes can be introduced into and overexpressed in the Streptomyces lividans host strain.
Conversely, a strain that is hypersensitive to antibiotics can be used (Pernodet et al., 1996) in order to detect the presence of resistance genes in the library. Specifically, in antibiotic-producing microorganisms, these resistance genes are often associated with the genes for the biosynthetic pathway of the antibiotic. The selection of resistant clones can make it possible to carry out a first sorting easily before the more complex tests for detecting a novel metabolite produced by the clone.
ISOLATION AND CHARACTERIZATION OF NOVEL NUCLEOTIDE SEQUENCES ENCODING POLYKETIDE SYNTHASES.
According to the invention, a collection of recombinant host cells was obtained after transfecting host cells with a collection of recombinant vectors each containing a nucleic acid insert originating from a collection of nucleic acids prepared in accordance with the process according to the invention.
More specifically, the DNA fragments obtained according to the process of the invention, in which a step of indirect extraction of DNA from the organisms contained in the soil sample is carried out, were first cloned into the integrative cosmid pOS7001.
The step of inserting DNA fragments into the integrative cosmid pOS7001 was carried out according to the process of the invention in which homopolymeric polynucleotide tails poly(A) and poly(T) were added to the 3' end of the vector nucleic acid and of the DNA fragments to be inserted, respectively.
The recombinant vectors thus constructed were encapsidated in lambda phage heads and the phages obtained were used to infect E. coli cells according to techniques that are well known to those skilled in the art.
A library of about 5000 Escherichia coli clones was obtained.
This library of clones was screened with pairs of primers specific for a nucleotide sequence encoding an enzyme involved in the polyketide biosynthetic pathway, the type I PKS enzyme, also known as β-ketoacyl synthase.
It is recalled here that polyketides constitute a chemical category of wide structural diversity comprising a large number of molecules of pharmaceutical interest such as tylosin, monensin, avermectin, erythromycin, doxorubicin or FK506.
Polyketides are synthesized by condensation of acetate molecules under the action of enzymes known as polyketide synthases (PKSs). Two types of polyketide synthase exist. The type II polyketide synthases are generally involved in the synthesis of polycyclic aromatic antibiotics and catalyze the iterative condensation of acetate units.
The type I polyketide synthases are involved in the synthesis of macrocyclic or macrolide polyketides and constitute modular multifunctional enzymes.
Given their therapeutic interest, there is a need in the state of the art to isolate and characterize novel polyketide synthases which can be used for the production of novel pharmaceutical compounds, in particular novel pharmaceutical compounds with antibiotic activity.
The screening of the library of recombinant clones described above using PCR primers which selectively amplify nucleotide sequences encoding type I polyketide synthases has made it possible to identify recombinant clones containing DNA inserts comprising a nucleotide sequence encoding novel polyketide synthases. The nucleotide sequences encoding these novel polyketide synthases are referenced as the sequences SEQ ID No 33 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.
Another subject of the invention consists of a nucleic acid encoding a novel polyketide synthase I, characterized in that it comprises one of the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.
Preferably, such a nucleic acid is in isolated and/or purified form.
The invention also relates to a recombinant vector comprising a polynucleotide comprising one of the sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.
The invention also relates to a recombinant host cell comprising a nucleic acid chosen from polynucleotides comprising one of the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120 as well as to a recombinant host cell comprising a recombinant vector into which is inserted a polynucleotide comprising one of the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.
Advantageously, the recombinant vectors containing a DNA insert encoding a novel type I polyketide synthase according to the invention are cloning and expression vectors.
Preferably, a recombinant host cell as described above is a bacterium, a yeast or a filamentous fungus.
The amino acid sequences of novel polyketide synthases originating from organisms contained in a soil sample were deduced from the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120 above. They are polypeptides comprising one of the amino acid sequences SEQ ID No 48 to SEQ ID No 59 and SEQ ID No. 121 to SEQ ID No. 126.
The invention also relates to novel polyketide synthases comprising an amino acid sequence chosen from the sequences SEQ ID No 48 to SEQ ID No 59 and SEQ ID No. 121 to SEQ ID No. 126.
The nucleotide sequence SEQ ID No. 114 which comprises six open reading frames respectively encoding the polypeptides of sequences SEQ ID No. 121 to SEQ ID No. 126 also forms part of the invention.
The nucleotide sequence SEQ ID No. 113 of the a26G1 cosmid, which contains the sequence complementary to the sequence SEQ ID No. 114 also forms part of the invention.
Genomic DNA originating from pure bacterial strains, such as Streptomyces coelicolor (ATCC No. 101.478), Streptomyces ambofaciens (NRRL No. 2.420), Streptomyces lactamandurans (ATCC No. 27.382), Streptomyces rimosus (ATCC No. 109.610), Bacillus subtilis (ATCC No. 6633) or Bacillus licheniformis and Saccharopolyspora erythraea, was also extracted and amplified according to the invention.
A PCR amplification of DNA from each of the bacterial strains described above was carried out using pairs of primers specific for the nucleic acid sequences of type I polyketide synthase.
Novel bacterial type I polyketide synthase genes were thus able to be isolated and characterized. These are the nucleic acid sequences SEQ ID No 30 to SEQ ID No 32.
A subject of the invention is also, therefore, nucleotide sequences encoding novel type I polyketide synthases chosen from the polynucleotides comprising one of the nucleotide sequences SEQ ID No 30 to SEQ ID No 32.
Recombinant vectors comprising the nucleotide sequences encoding novel type I polyketide synthases defined above also form part of the invention.
The invention also relates to recombinant host cells, characterized in that they contain a nucleic acid encoding a novel type I polyketide synthase comprising a nucleotide sequence chosen from the sequences SEQ ID No 30 to SEQ ID No 32 and recombinant host cells comprising a recombinant vector as defined above.
A subject of the invention is also polypeptides encoded by sequences comprising the nucleic acids SEQ ID No 30 to 32, and more specifically polypeptides comprising the amino acid sequences SEQ ID No 47 to SEQ ID No
A subject of the invention is also a process for producing a type I polyketide synthase according to the invention, the said production process comprising the following steps: production of a recombinant host cell comprising a nucleic acid encoding a type I polyketide synthase comprising a nucleotide sequence chosen from the sequences SEQ ID No 33 to SEQ ID No 44 and SEQ ID No 30 to SEQ ID No 32 and SEQ ID No. 115 to SEQ ID No. 120; culturing of the recombinant host cells in a suitable culture medium; recovery and, where appropriate, purification of the type I polyketide synthase from the culture supernatant or from the cell lysate.
The novel type I polyketide synthases obtained according to the process described above can be characterized by binding to an immunoaffinity chromatography column onto which antibodies recognizing these polyketide synthases have been pre-immobilized.
The type I polyketide synthases according to the invention, and more particularly the recombinant polyketide synthases described above, can also be purified by high performance liquid chromatography (HPLC) techniques such as, for example, reverse-phase chromatography techniques or anion-exchange or cation-exchange chromatography techniques, that are well known to those skilled in the art.
The recombinant or non-recombinant polyketide synthases according to the invention can be used for the preparation of antibodies.
According to another aspect, a subject of the invention is also an antibody which specifically recognizes a type I polyketide synthase according to the invention or a peptide fragment of such a polyketide synthase.
The antibodies according to the invention may be monoclonal or polyclonal. The monoclonal antibodies can be prepared from hybridoma cells according to the technique described by Kohler and Milstein C. (1975), Nature, Vol. 256:495.
The polyclonal antibodies can be prepared by immunizing a mammal, in particular mice, rats or rabbits, with a type I polyketide synthase according to the invention, where appropriate in the presence of an immunity-adjuvant compound, such as complete Freund's adjuvant, incomplete Freund's adjuvant, aluminium hydroxide or a compound from the muramyl peptide family.
For the purposes of the present invention, antibody fragments such as the Fab, Fab', F(ab')2 or single-chain antibody fragments containing the variable portion (ScFv) described by Martineau et al. (1998) J. Mol. Biol., Vol. 280 (1):117-127 or in US patent 4 946 778, and the humanized antibodies described by Reinmann KA et al. (1997), AIDS Res. Hum. Retroviruses, Vol. 13(11):933-943 or by Leger O.J et al. (1997), Hum. Antibodies, Vol. 8 3-16, also constitute "antibodies".
The antibody preparations according to the invention are useful in particular in qualitative or quantitative immunological tests intended either simply to detect the presence of a type I polyketide synthase according to the invention or to quantify the amount of this polyketide synthase, for example in the culture supernatant or the cell lysate of a bacterial strain capable of producing such an enzyme.
Another subject of the invention consists of a process for detecting a type I polyketide synthase according to the invention or a peptide fragment of this enzyme, in a sample, the said process comprising the steps of: a) placing an antibody according to the invention in contact with the sample to be tested; b) detecting the antigen/antibody complex possibly formed.
The invention also relates to a kit or equipment for detecting a type I polyketide synthase according to the invention in a sample, comprising: a) an antibody according to the invention; b) where appropriate, reagents required for detecting the antigen/antibody complex possibly formed.
An antibody directed against a type I polyketide synthase according to the invention can be labelled using an isotopic or non-isotopic detectable label, according to processes that are well known to those skilled in the art.
Screening of a DNA library according to the invention using a pair of primers which hybridize with target sequences whose presence is desired, such as sequences of the puromycin biosynthetic pathway, sequences of the linA gene involved in the biodegradation of lindane or sequences encoding type I polyketide synthases, has been detailed hereinabove.
A subject of the invention is thus a process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: placing the collection of recombinant host cells in contact with a pair of primers which hybridize with the given nucleotide sequence or which hybridize with the nucleotide sequence that is structurally similar to a given nucleotide sequence; carrying out at least three amplification cycles; detecting any nucleic acid amplified.
For the amplification conditions that are appropriate as a function of the desired target sequences, a person skilled in the art may advantageously refer to the examples below.
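The primer-based detection described above can be pictured with a small in-silico sketch. It is only an illustration under simplifying assumptions: the function names, the toy clone inserts and the primer sequences are hypothetical, and matching is exact, whereas real primers tolerate some mismatches and depend on the amplification conditions.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_amplicon(template: str, forward: str, reverse: str):
    """Return the product a primer pair would amplify from `template`, or None.

    Exact matching only (a simplification): the forward primer must occur on
    the given strand, and the reverse complement of the reverse primer must
    occur downstream of it.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def screen_library(clones: dict, forward: str, reverse: str) -> dict:
    """Map the name of each positive clone to its predicted amplicon."""
    hits = {}
    for name, insert in clones.items():
        product = find_amplicon(insert, forward, reverse)
        if product is not None:
            hits[name] = product
    return hits


if __name__ == "__main__":
    # Hypothetical inserts and primers, for illustration only.
    clones = {"c1": "TTATGCCGTAAAAACGATTGACGG", "c2": "TTTTTTTTTT"}
    print(screen_library(clones, "ATGCCGTA", "GTCAATCG"))
```

Only clones whose insert carries both primer sites in the right orientation are reported, which mirrors the "detecting any nucleic acid amplified" step.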
According to another aspect, the invention also relates to a process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: placing the collection of recombinant host cells in contact with a probe which hybridizes with the given nucleotide sequence or which hybridizes with a nucleotide sequence that is structurally similar to the given nucleotide sequence; detecting the hybrid possibly formed between the probe and the nucleic acids included in the vectors of the collection.
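As a rough computational analogy to probe hybridization, the sketch below scans both strands of an insert for windows that match a probe within a chosen number of mismatches, which is one simple way to model a "structurally similar" sequence. The mismatch threshold, the function names and the example sequences are assumptions made for illustration; actual hybridization depends on stringency conditions that this model ignores.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def probe_hits(target: str, probe: str, max_mismatches: int = 0):
    """List (strand, position, mismatches) for every window matching the probe.

    `probe` is written as the sequence to be found. The "+" strand is `target`
    as given; the "-" strand is its reverse complement, and positions on the
    "-" strand are counted along that reverse complement.
    """
    probe = probe.upper()
    hits = []
    for strand, seq in (("+", target.upper()), ("-", reverse_complement(target))):
        for i in range(len(seq) - len(probe) + 1):
            mismatches = count_mismatches(seq[i:i + len(probe)], probe)
            if mismatches <= max_mismatches:
                hits.append((strand, i, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    print(probe_hits("AAACGTTGCAAA", "CGTAGC", max_mismatches=1))
```

Raising `max_mismatches` loosely corresponds to lowering the stringency of hybridization.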
To carry out the screening of a DNA library according to the invention in order to detect the presence of a nucleotide sequence encoding a polypeptide capable of degrading lindane, the recombinant clones of interest were detected on the basis of their phenotype corresponding to their capacity to degrade lindane. With this aim, the clones isolated and/or sets of clones of the DNA library prepared were cultured in a culture medium in the presence of lindane and the lindane degradation was observed by the formation of a cloudy halo in the immediate environment of the cells.
The invention also relates to a process for identifying the production of a compound of interest by one or more recombinant host cells in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: culturing the recombinant host cells of the collection in a suitable culture medium; detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant cells cultured.
A subject of the invention is also a process for selecting a recombinant host cell which produces a compound of interest in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: culturing recombinant host cells of the collection in a suitable culture medium; detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured; selecting recombinant host cells which produce the compound of interest.
The invention also relates to a process for producing a compound of interest, characterized in that it comprises the following steps: culturing a recombinant host cell selected according to the process described above; recovering and, where appropriate, purifying the compound produced by the said recombinant host cell.
The invention also relates to a compound of interest, characterized in that it is obtained according to the process described above.
A compound of interest according to the invention can consist of a polyketide produced by means of expressing at least one nucleotide sequence comprising a sequence chosen from the sequences SEQ ID No 33 to 44 and SEQ ID No 30 to 32 and SEQ ID No. 115 to SEQ ID No. 120.
The invention also relates to a composition comprising a polyketide produced by means of expressing at least one nucleotide sequence comprising a sequence chosen from the sequences SEQ ID No 33 to SEQ ID No 44 and SEQ ID No 30 to SEQ ID No 32 and SEQ ID No. 115 to SEQ ID No. 120.
A polyketide produced by means of expressing at least one nucleotide sequence above is preferentially the product of the activity of several coding sequences included in a functional operon whose translation products are the various enzymes required for the synthesis of a polyketide, one of the above sequences being included and expressed in the said operon. Such an operon comprising a nucleic acid sequence according to the invention encoding a polyketide synthase can be constructed, for example, according to the teaching of Borchert et al. (1992).
The invention also relates to a pharmaceutical composition comprising a pharmacologically active amount of a polyketide according to the invention, where appropriate in combination with a pharmaceutically compatible vehicle.
Such pharmaceutical compositions will advantageously be adapted for the administration, for example parenteral administration, of an amount of a polyketide synthesized by a type I polyketide synthase according to the invention ranging from 1 µg/kg per day to 10 mg/kg per day, preferably at least 0.01 mg/kg per day and most preferably between 0.01 and 1 mg/kg per day.
The pharmaceutical compositions according to the invention can be administered either orally, rectally, parenterally, intravenously, subcutaneously or intradermally.
The invention also relates to the use of a polyketide obtained by means of expressing a type I polyketide synthase according to the invention, for the manufacture of a medicinal product, in particular a medicinal product with antibiotic activity.
The invention will also be illustrated, without however being limited, by the figures and examples below.
Figure 1 illustrates the scheme of the various lysis steps carried out according to protocols 1, 2, 3, 4a, 4b, 5a and 5b described in Example 1.
Figure 2 illustrates an electrophoresis on 0.8% agarose gel of the DNAs extracted from 300 mg of soil No 3 (St Andre coast) after various lysis treatments (protocols 1 to 5, cf. Fig. 1). M: lambda phage molecular weight marker.
Figure 3 illustrates the proportion of various genera of actinomycetes cultured after treatments 1 to 5 (cf. Fig. 1). The cfu (colony-forming unit) number was determined on a medium which is selective for this group of bacteria. A total number of about 400 colonies was analysed.
Figure 4 illustrates the recovery of lambda phage DNA digested with HindIII added to the soils at different concentrations before or after grinding. The treatments T (heat shocks) and S (sonication) are additional lysis treatments. The quantification was carried out by analysis with a phospho-imager after dot-blot hybridization. A sample of each soil was used for each concentration of lambda phage added. The characteristics of the soils are given in Table 1. The samples corresponding to 10 and 15 µg of DNA added were not treated.
Figure 5 illustrates the PCR amplification of the DNAs extracted from soil No 3 according to protocols 1, 2, 3, 5a and 5b. The primers FGPS 122 and FGPS 350 (Table 2) were used to target indigenous Streptosporangium spp. The DNAs extracted were used undiluted or at and 100-fold dilutions. M: 123 bp molecular weight marker (Gibco BRL), C: DNA-free amplification control.
Figure 6 illustrates the amounts of DNA extracted after inoculating the soils with spores or mycelium of S. lividans OS48.3 at different concentrations. The amounts of mycelium added to the soil correspond to the number of spores inoculated in the germination medium.
About 50% of the spores germinated and the number of cells or genomes contained in the germinated spore hyphae was not determined. The amounts of spores and of mycelium inoculated are thus not directly comparable. The extraction protocol was carried out according to protocol 6 (cf. materials and methods section). Symbol indicates that RNA was included in the extraction buffer. The target DNA was amplified by PCR with the primers FGPS 516 and FGPS 517, and the quantification was carried out with a phospho-imager after dot-blot hybridization using the probe FGPS 518. A sample of each soil was used for each concentration of hyphae or of spores. The characteristics of the soils are described in Table 1.
Figure 7 represents the phylogenetic tree obtained with the Neighbour Joining algorithm, positioning the 16S rDNA sequences contained in the soil DNA library, relative to cultured reference bacteria.
In grey: the sequences obtained from the pools of clones of the library.
The bootstrap values are indicated at the nodes, after re-sampling of 100 repetitions. The scale bar indicates the number of substitutions per site. The access number of the sequences in the Genbank database is indicated in parentheses.
Figure 8 represents a scheme of the vector pOSint 1.
Figure 9 represents a scheme of the vector pWED 1.
Figure 10 represents a scheme of the vector pWE15 (ATCC No 37503).
Figure 11 represents a scheme of the vector pOS7001.
Figure 12 represents a scheme of the vector
Figure 13 represents the fragment containing a "cos" site inserted into the plasmid pOSV010 during construction of the vector pOSV303.
Figure 14 represents a scheme of the vector pOSV303.
Figure 15 represents a scheme of the vector pE116.
Figure 16 represents a scheme of the vector pOS700R.
Figure 17 represents a scheme of the vector pOSV001.
Figure 18 represents a scheme of the vector pOSV002.
Figure 19 represents a scheme of the vector pOSV014.
Figure 20 represents a scheme of the vector pBAC11.
Figure 21 represents a scheme of the vector pOSV403.
Figure 22 represents the electrophoresis gels for DNA of the library after digestion with the enzymes BamHI and DraI of the positive clones of the library screened with the PKS-I oligonucleotides.
Figure 23 illustrates the production of puromycin by the S. lividans recombinants compared with the production of the S. alboniger wild-type strain.
Figure 24 illustrates the alignment of soil PKSs with the conserved active sites of other PKSs. The references for each peptide are indicated.
The beta-ketoacyl synthase domains were aligned using the GCG PILEUP program (Wisconsin Package Version 9.1, Genetics Computer Group, Madison, Wisc).
Figure 25 illustrates the construction of an integrative conjugative cosmid.
Figure 26 illustrates the construction of an integrative conjugative BAC.
Figure 27 illustrates the scheme for constructing the vector pOSV308.
Figure 28 illustrates the scheme for constructing the vector pOSV306.
Figure 29 illustrates the scheme for constructing the vector pOSV307.
Figure 30 illustrates the scheme for constructing the vector pMBD-1.
Figure 31 shows a detailed map of the plasmid pMBD-2 and also a scheme for constructing the vector pMBD-3.
Figure 32 illustrates a detailed map of the plasmid pMBD-4.
Figure 33 illustrates the scheme for constructing the plasmid from the plasmid pMBD-1.
Figure 34 illustrates the detailed map of the vector pBTP-3.
Figure 35 illustrates the scheme for constructing the vector pMBD-6 from the vector pMBD-1.
Figure 36 illustrates the map of the cosmid a26G1 whose DNA insertion contains open reading frames encoding several polyketide synthases.
Figure 37 is a scheme representing the DNA insertion strand) of the cosmid a26G1, on which are positioned the various reading frames encoding several polyketide synthases.
EXAMPLES:
EXAMPLE 1: Process for preparing a collection of nucleic acids from a soil sample containing organisms, comprising a step of direct extraction of DNA from the soil sample.
1. MATERIAL AND METHODS
1.1 SOILS: The characteristics of the six soils used in this study are listed in Table 1.
The clay content and organic matter content range, respectively, from 9 to 47% and from 1.7 to the pH ranging from 4.3 to 5.8.
Soil samples were collected from the surface layer of 5 to 10 cm in depth. All the visible roots were removed and the soils were stored at 4°C for a few days if necessary, after which they were dried for 24 hours at room temperature and screened (average mesh size: 2 mm) and then stored for up to several months at
1.2 BACTERIAL STRAIN AND CULTURE CONDITIONS: The extracellular DNA and the bacterial strains supplying vegetative cells, spores or hyphae, used to inoculate the soil samples, were chosen such that their presence could be specifically monitored.
In order to obtain large amounts of extracellular DNA, the lysogenic strain of E. coli 1192 Hfr P4X (metB), containing the lambda phage CI857 Sam7, was cultured on Luria-Bertani (LB) medium for two hours at 30°C, then for 30 minutes at 40°C, and then for 3 hours at 37°C.
The lambda phage DNA was extracted according to the technique described by Sambrook J. et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor N.Y.
The avirulent strain of Bacillus anthracis (STERNE 7700) was used as bacterial cell inoculum. Bacillus anthracis was multiplied in "trypticase soy broth" (TSB) (Biomerieux, Lyons, France) for about 6 hours, checking that the OD600 was maintained below 0.6. These conditions allow the growth of vegetative cells without formation of spores (Patra et al., (1996), FEMS Immunol. Medical Microbiology, vol. 15:223-231).
The spores of Streptomyces lividans OS48.3 (Clerc-Bardin et al., unpublished) were removed mechanically from cultures of the organism on R2YE medium (Hopwood et al., (1985), Genetic Manipulation of Streptomyces: A Laboratory Manual. The John Innes Foundation, Norwich, United Kingdom). The hyphae of S. lividans OS48.3 were obtained from pre-germinated spores, since it was expected that the use of short hyphae would minimize the rupture and subsequent loss of DNA. The spores were suspended in TES buffer (N-tris [hydroxymethyl]methyl-2-aminoethanesulphonic acid; Sigma-Aldrich Chimie, France) (0.05 M; pH 8) (Holben WE et al., (1988), Appl. Environ. Microbiol. vol. 54:703-711), and were then subjected to a heat shock (50°C for 10 minutes followed by cooling under cold running water) and then added to an equal volume of pre-germination medium (yeast extract, 1% casamino acids, 0.01 M CaCl2). The solution was incubated at 37°C on an agitator. The proportion of germinated spores was estimated at about 50%, in accordance with the results of Hopwood et al. (1985). After centrifugation, the pellets were resuspended in TES buffer, added to 3% TSB medium and incubated at 37°C until an OD450 of 0.15 was obtained (Hopwood et al., (1985)).
Streptomyces hygroscopicus SWN 736 and Streptosporangium fragile AC1296 (Institute Pushino, Moscow) were cultured according to techniques described by Hickey and Tresner (1952).
The DNA of the spores and hyphae of S. lividans was extracted from pure cultures according to the lysis protocol 6 described below (except that no grinding was carried out), while the spores of S. hygroscopicus and S. fragile were extracted by chemical/enzymatic lysis (Hintermann et al., 1981).
1.3 CHOICE OF THE EXTRACTION BUFFER: A TENP buffer (50 mM Tris, 20 mM EDTA, 100 mM NaCl, 1% wt/vol of polyvinylpolypyrrolidone) developed by Picard (1992) was used. Similar buffers were subsequently used by other authors (Clegg et al., 1997; Kuske et al., 1998; Zhou et al., 1996).
The Tris and the EDTA protect the DNA from the nuclease activity, the NaCl provides a dispersant effect and the PVPP absorbs the humic acids and the other phenolic compounds (Holben et al. (1988); Picard et al., (1992)).
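By way of illustration, the stated TENP composition can be converted into weighed amounts for a given volume. The sketch below is not taken from the source: the reagent forms (Tris base, disodium EDTA dihydrate) and their molar masses are assumptions, since the text does not say which salts were used, and pH adjustment is left out.

```python
# Molar masses (g/mol) of the reagent forms ASSUMED for this sketch; the
# source does not state which salt or hydrate was weighed out.
MOLAR_MASS = {
    "Tris base": 121.14,
    "EDTA disodium dihydrate": 372.24,
    "NaCl": 58.44,
}

# Target concentrations taken from the text (mol/l).
TENP_MOLAR = {
    "Tris base": 0.050,
    "EDTA disodium dihydrate": 0.020,
    "NaCl": 0.100,
}

PVPP_PERCENT_WV = 1.0  # 1% wt/vol, i.e. 1 g per 100 ml


def tenp_recipe(volume_ml: float) -> dict:
    """Return grams of each component needed for `volume_ml` of TENP buffer."""
    litres = volume_ml / 1000.0
    recipe = {
        name: round(conc * MOLAR_MASS[name] * litres, 3)
        for name, conc in TENP_MOLAR.items()
    }
    recipe["PVPP"] = round(PVPP_PERCENT_WV / 100.0 * volume_ml, 3)
    return recipe


if __name__ == "__main__":
    for component, grams in tenp_recipe(1000).items():
        print(f"{component}: {grams} g")
```

If a different EDTA or Tris form is used, only the `MOLAR_MASS` entries need to change.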
In this study, the extraction efficacy of this buffer was evaluated at different pH values (6.0-10.0) using 20 different soils having a pH range from 5.8 to 8.3 and an organic matter content of between 0.2 and 6.3%.
These twenty soils (the other characteristics are not indicated) were used only in this experiment. The amount of DNA was determined by colorimetric means as described by Richard (1974), and detailed below.
1.4 PROTOCOL OF IN SITU LYSIS AND OF DNA EXTRACTION: Several protocols using an increasing number of steps were tested in order to evaluate the efficacy of various techniques for lysing the soil microbes in situ. For these experiments, the indigenous soil microflora was targeted in six soils. Additional experiments were carried out in order to study the effects of the lysis treatments on the DNA released, by analysing the quantity and quality of DNA recovered originating from a lambda phage DNA added beforehand to the soils.
Once an optimized protocol (referred to as protocol 6) had been developed, this protocol was used to quantify the DNA originating from indigenous Actinomycetes and the DNA originating from gram-positive bacteria inoculated in the selected soils. In all cases, the soil samples were dried and screened as described above.
After grinding, 0.5 ml of TENP buffer was added to 200 mg dry weight of soil, except for protocol 1 in which the buffer was added to an unground soil.
For the various lysis treatments (see below), the soil suspensions were vortexed for 10 minutes and centrifuged (4000 x g for five minutes), after which an aliquot fraction (25 µl) of the supernatant was analysed by gel electrophoresis (agarose).
Another aliquot fraction of the supernatant representing a known volume, generally 350 µl, was precipitated with isopropanol.
Five aliquot fractions (representing the DNA derived from 1 g of soil) were combined and resuspended in 100 µl of a sterile TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0) before purification (protocol D, see below) and quantification, either by hybridization (Dot-Blot) of the total DNA, or by hybridization (Dot-Blot) of the PCR amplification products (see below).
The hybridization signals were quantified by phosphorescence imaging ("phospho-imaging" technique, see below).
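The bookkeeping behind "five aliquot fractions representing the DNA derived from 1 g of soil" can be sketched as follows, reading the aliquot volume as 350 µl out of the 0.5 ml of buffer added to each 200 mg of soil. The function name and the optional volume correction are assumptions introduced for illustration; the text itself treats the pooled aliquots as equivalent to 1 g of soil without such a correction.

```python
def dna_yield_per_gram(dna_in_pool_ug: float,
                       n_extractions: int = 5,
                       soil_per_extraction_g: float = 0.2,
                       buffer_ul: float = 500.0,
                       aliquot_ul: float = 350.0,
                       correct_for_volume: bool = False) -> float:
    """Convert DNA measured in the pooled extract into micrograms per gram of soil.

    By default the pooled aliquots are taken to represent all of the soil
    extracted (5 x 200 mg = 1 g), as in the text. With `correct_for_volume`
    set, the result is scaled by buffer_ul / aliquot_ul, under the ASSUMPTION
    that DNA is evenly distributed in the buffer and only part of it is
    precipitated.
    """
    if n_extractions <= 0 or soil_per_extraction_g <= 0 or aliquot_ul <= 0:
        raise ValueError("extraction count, soil mass and aliquot volume must be positive")
    soil_g = n_extractions * soil_per_extraction_g
    yield_ug_per_g = dna_in_pool_ug / soil_g
    if correct_for_volume:
        yield_ug_per_g *= buffer_ul / aliquot_ul
    return yield_ug_per_g


if __name__ == "__main__":
    # Hypothetical measurement: 10 micrograms of DNA in the pooled extract.
    print(dna_yield_per_gram(10.0))
    print(dna_yield_per_gram(10.0, correct_for_volume=True))
```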
EVALUATION OF THE METHODS OF IN SITU CELL LYSIS: The quality and quantity of DNA extracted after an increasing number of lysis treatment steps (protocol 2-5b) were compared with those of the extracellular DNA obtained after washing the soil with an extraction buffer (protocol 1; see also Figure 1).
Protocol 1: No lysis treatment.
The TENP buffer was added to an unground soil, and a DNA extraction step was carried out as described above.
Protocol 2: Grinding of the soil followed by a DNA extraction.
Two different types of device were used to grind the soil.
In order to compare their respective efficacy, 5 g of dry soil were ground for 30 seconds in a grinder containing tungsten rings, or for times varying up to 60 minutes in a soil grinder containing a mortar and agate beads (20 mm in diameter).
The TENP buffer was then added and the DNA was extracted as described above.
The gel electrophoresis results showed that grinding for minutes using agate beads was necessary in order to obtain amounts of extracted DNA equivalent to those obtained after grinding for 30 seconds using tungsten rings.
The size distribution of the DNA fragments is similar whatever the method used.
Thus, these treatments were considered as equivalent and the one which is used in the protocols described below will consequently not be specified.
In protocols 3 to 5, the efficacy of several other lysis treatments subsequent to the grinding of the soil was tested, either separately or in different combinations.
Protocol 3: This protocol is identical to protocol 2, except that it comprises a step of homogenization using an Ultra-turrax type mixer (Janker and Kunkel, IKA Labortechnik, Germany) set at half the maximum speed for minutes.
Protocols 4a and 4b: These protocols are identical to protocol 3, except for an additional sonication step.
Two types of sonicator device were compared: a titanium micropoint sonicator (600W Vibracell Ultrasonicator, Bioblock, Illkirch, France) (Protocol 4a) and a sonicator of Cup Horn type (protocol 4b).
The Vibracell micropoint producing ultrasound is in direct contact with the soil solution.
As regards the device of Cup Horn type, the soil solution is stored in tubes which are placed in a water bath through which the ultrasound passes.
Preliminary experiments were carried out in order to determine the optimum conditions for the two sonicators (results not presented).
The best compromise, in terms of amount of DNA extracted and fragment size, consists of a sonication with the titanium micropoint and the sonicator of Cup Horn type for 7 and 10 minutes respectively, adjusting the power to 15 W and with 50% active cycles.
Protocols 5a and 5b: After sonication with a titanium micropoint or a device of Cup Horn type (protocols 4a and 4b respectively), lysozyme and achromopeptidase were added, each enzyme at a final concentration of 0.3 mg/ml.
The soil suspensions were incubated for 30 minutes at 37°C, after which lauryl sulphate at a final concentration of 1% was added, and the suspensions were then incubated for 1 hour at 60°C before centrifugation and precipitation as described above.
In addition to the protocols described above, the effect of sonication (Cup Horn, see protocol 4b) and of heat shocks (30 seconds in liquid nitrogen followed by three minutes in boiling water, the treatments being repeated three times) on HindIII-digested lambda phage DNA added beforehand to the soil was examined (see below).
Heat shocks were suggested in the prior art as means for in situ cell lysis (Picard et al. (1992)). However, due to the fact that such a treatment has a harmful effect on the free DNA (see the results section) it was not included in the protocols described above.
OPTIMIZED PROTOCOL: After evaluation of the various lysis treatments, an optimized protocol was defined, which is referred to as protocol 6. Protocol 6 is identical to protocol 5b except that, before sonication, the soil suspensions are subjected to a vortexing treatment and then agitated by rotation on a wheel for two hours before being frozen at -20°C.
After thawing, the soil suspensions were vortexed for 10 minutes before sonication. Protocol 6 was used in the experiments in which the soils were inoculated with bacterial cells, as well as in the experiments in which the indigenous actinomycetes were quantified (see below).
1.6 COUNTING BY MICROSCOPE: The efficacy of grinding of the soil as a method for lysing bacterial cells was examined by microscope.
g of dried crude soil were mixed in a Waring Blender device with ml of ultrapure sterilized water for 1.5 minutes; simultaneously, 1 g (dry weight) of ground soil (protocol 2) was suspended in 10 ml by agitation for 10 minutes. The soil suspensions were serially diluted and acridine orange was added to a final concentration of 0.001%.
After 2 minutes, the suspensions were filtered through a black Nucleopore brand membrane (0.2 µm). Each filter was rinsed with lysed sterile water, treated with 1 ml of isopropanol for 1 minute in order to fix the bacterial cells, and then rinsed again.
The bacterial cells were counted using a Zeiss Universal epifluorescence microscope with a 100x objective lens. For each of the types of soil, three filters were counted, and at least 200 cells were counted on each of the filters.
1.7 COUNTING OF THE CULTURABLE ACTINOMYCETES AND TOTAL NUMBER OF COLONY-FORMING UNITS (CFU): The actinomycetes which survived the lysis treatments (protocols 1-5) were examined specifically with soil No. 3 (Saint Andre coast, see Table 1).
After a 10-fold dilution of a solution of yeast extract (6% weight/volume) and of SDS in order to induce germination (Hayakawa et al. (1988)), the soil suspensions were serially diluted in sterile water, incubated at 40°C for 20 minutes and inoculated on HV medium (Hayakawa et al., 1987).
The HV medium was supplemented with actidione (50 mg/l) and nystatin (50 mg/l).
The actinomycete colonies were counted after incubation for 15 days at 28°C.
In total, about 400 colonies were examined. The identification was carried out on the basis of the macro- and microscopic morphological characteristics as well as on the analysis of the diaminopimelic acid content of the isolates (Shirling et al., 1966; Staneck et al., 1974; Williams et al., 1993).
The total amount of culturable bacteria (total CFU) was also determined for each of the lysis protocols 1 to 5. The soil suspensions were serially diluted and inoculated in triplicate on a Bennett agar medium (Waksman et al., 1961) supplemented with nystatin and actidione (each at mg/l).
Each Petri dish was covered with a cellulose nitrate filter (Millipore) and incubated for three days at 28°C. After counting the colonies on the membranes, the filters were removed and the Petri dishes were reincubated for 7 days at 28°C and then counted again.
1.8 RECOVERY OF THE LAMBDA PHAGE DNA ADDED TO THE SOILS: The lambda phage DNA was digested with HindIII, extracted with a phenol-chloroform mixture, precipitated and then resuspended in ultrapure sterile water according to standard protocols (Sambrook et al., 1989).
Dilutions corresponding, respectively, to 0, 2.5, 5, 7.5, 10 and µg of DNA/g of dry weight of soil were prepared in 60 µl volumes. These DNA dilutions were added to 5 g batches of dry soil which were subsequently vortexed vigorously for 5 minutes before grinding.
The lambda phage DNA was also added to a soil before grinding at concentrations corresponding to 0, 10 and 15 µg of DNA/g of dry weight of soil.
After grinding, the extraction buffer was added and the DNA was extracted according to protocol 2 (see above).
1.9 SATURATION OF THE ADSORPTION SITES WITH RNA: In order to determine whether or not the saturation of the nucleic acid adsorption sites of the soil colloids could increase the level of recovery of the DNA, the sandy compost (soil No. 4) and the clayey soil (soil No. 5) were incubated with an RNA solution before any other treatment.
Commercial Saccharomyces cerevisiae RNA (Boehringer Mannheim, Meylan, France) was diluted in phosphate buffer (pH 7.1) and added to the dry, screened soil samples (2 ml/g of soil) at final concentrations of 20, 50 and 100 mg of RNA/g of dry weight of soil.
The tubes containing the soil suspensions were agitated by rotation for two hours at room temperature. After centrifugation, the soil pellets were dried in an oven (50°C) overnight. The lambda phage DNA was then added to the soils (20 or 50 µg/g of dry weight of soil) in order to simulate the fate of the DNA released after cell lysis.
The DNA was extracted according to protocol 2. It was determined thereafter that an identical effect of addition of RNA on the recovery of DNA could be achieved by adding the RNA directly to the extraction buffer.
This simplified procedure was used for the clayey soil No. 5 in the experiments in which the microorganisms were inoculated in the soils.
The RNA was then added at a concentration corresponding to 50 mg of RNA/g of dry weight of soil.
1.10 QUALITATIVE AND QUANTITATIVE DETERMINATION OF THE EFFICACY OF THE EXTRACTION PROTOCOLS: The quality of the DNA (absence of degradation) was estimated on the basis of the size of the DNA fragments or the relative position of the DNA migration bands after electrophoresis of an aliquot fraction of a DNA solution on a 0.8% agarose gel.
The fluorescence intensity allowed a semi-quantitative estimation of the extraction yields.
Another aliquot fraction was used for quantitative determinations of the DNA content by hybridization (Dot-Blot) and analysis with a phospho-imager. The Dot Blot hybridization protocol has been described by Simonet et al. (1990).
The hybridization membranes (GeneScreen plus, Life Science Products, Boston, USA) were prehybridized for at least 2 hours in 20 ml of a solution containing 6 ml of 20 x SSC, 1 ml of Denhardt's solution, 1 ml of SDS and 5 mg of salmon sperm DNA.
The hybridization was carried out overnight in the same solution in the presence of a labelled probe prior to two washes of the membranes in an SSC 2 x buffer for 5 minutes at room temperature, followed by a third wash in a SSC 2 x, 0.1% SDS buffer and a fourth wash in an SSC 1 x, 0.1% SDS buffer for 30 minutes at the hybridization temperature.
The hybridization signals were quantified with a Biorad radioanalytical imaging system (Molecular Analyst Software, BIORAD, Ivry-sur-Seine, France).
In order to quantify the total amount of DNA derived from the indigenous microflora, the various soils were extracted according to protocols 1 to 5. The non-amplified DNA was applied to the Dot-Blot membranes and hybridized using the universal probe FGPS431 (Table 2).
This probe, which hybridizes to positions 1392-1406 of the E. coli 16S rDNA gene (Amann et al. (1995)), was labelled at its ends with ³²P-labelled ATP using T4 polynucleotide kinase (Boehringer Mannheim, Meylan, France).
A calibration curve was prepared using E. coli DH5α DNA. The conversion of the calculations to the soil bacteria required a simplification, starting from the hypothesis that the average number of rrn copies is 7, as for E. coli.
The lambda phage DNA digested with HindIII was used to quantify the recovery of the extracellular DNA. Non-amplified extracts from soils, to which lambda phage DNA had been added, were hybridized with lambda phage DNA digested with HindIII and labelled at random using the Klenow fragment (Boehringer Mannheim, Meylan, France).
The amounts of DNA were calculated by interpolation using a calibration curve prepared with the purified DNA.
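The interpolation step can be illustrated with a minimal Python sketch. It assumes piecewise-linear interpolation between calibration points; the source does not state which interpolation scheme was used, and the function name and the standard values below are hypothetical.

```python
def interpolate_dna(signal, standards):
    """Estimate a DNA amount from a hybridization signal.

    standards: list of (dna_amount, signal) pairs measured on purified DNA.
    Piecewise-linear interpolation is an assumption made for illustration.
    """
    points = sorted(standards, key=lambda p: p[1])  # order by signal
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return d0
            return d0 + (d1 - d0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal lies outside the calibration range")


# Hypothetical standards: (amount of DNA, signal intensity)
example_standards = [(0.0, 0.0), (10.0, 100.0), (20.0, 300.0)]
estimate = interpolate_dna(200.0, example_standards)
```

Signals outside the range of the standards raise an error rather than being extrapolated, since a calibration curve is only reliable between its measured points.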
The total amount of DNA extracted from soils 1, 2, 3, 4 and 6 according to protocol 2 (grinding) was also quantified by colorimetric means according to the technique described by Richard (1974).
Briefly, the DNA was mixed with concentrated HClO4 (the final concentration of HClO4 was 1.5). 2.5 volumes of this solution were mixed with 1.5 volumes of DPA (diphenylamine, Sigma-Aldrich, France) and the mixture was left to incubate at room temperature for 18 hours, prior to determination of the OD at 600 nm. The soil DNA extracts were quantified relative to a standard curve prepared with the DNA extracted from E. coli according to the standard protocols (Sambrook et al. (1989)).
1.11 DEVELOPMENT OF A DNA QUANTIFICATION TECHNIQUE USING PCR AMPLIFICATION AND HYBRIDIZATION: For the PCR amplifications, DNA Taq polymerase (Appligene Oncor, France) was used according to the manufacturer's instructions.
The PCR programme used for all the amplifications is as follows: initial denaturing for 3 minutes at 95°C, followed by 35 cycles consisting of 1 minute at 95°C, 1 minute at 55°C and 1 minute at 72°C, and then a final extension at 72°C for 3 minutes.
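For reference, the programme can be written down as a small data structure, here sketched in Python. The dictionary layout and the function name are illustrative assumptions; only the temperatures, durations and cycle count come from the text, and the computed duration ignores the ramping time between temperature steps.

```python
# Thermal cycling programme: each step is (temperature in °C, minutes).
PCR_PROGRAMME = {
    "initial_denaturation": (95, 3),
    "cycles": 35,
    "cycle_steps": [(95, 1), (55, 1), (72, 1)],  # denature, anneal, extend
    "final_extension": (72, 3),
}


def total_hold_minutes(programme):
    """Sum of all hold times, excluding ramping between temperatures."""
    per_cycle = sum(minutes for _, minutes in programme["cycle_steps"])
    return (programme["initial_denaturation"][1]
            + programme["cycles"] * per_cycle
            + programme["final_extension"][1])


hold_time = total_hold_minutes(PCR_PROGRAMME)  # 3 + 35 * 3 + 3 minutes
```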
The DNA isolated and purified from Streptosporangium fragile was used as control at concentrations ranging from 100 fg to 100 ng.
In order to amplify specifically the DNA of this bacterial genus, the primers FGPS122 and FGPS350 (Table 2) were selected, which are complementary to a portion of the 16S rDNA, after alignment of the sequences of actinomycetes 16S rDNA. Their specificity was tested on a collection of actinomycetes strains (Streptomyces, Streptosporangium and other highly similar genera).
The PCR products were hybridized with the oligonucleotide probe FGPS643 (Table 2). In order to simulate the level of purity routinely obtained with DNA extracted from the soil, controls of pure DNA from S. fragile were mixed with the soil extracts obtained after treatments according to the lysis protocols 4b and 5b and then purified according to protocol D.
Before use, the soil extracts were treated with DNase (one unit of DNase/ml, Gibco BRL) for 30 minutes at room temperature. The DNase was then inactivated by heating at 65°C for 10 minutes. Verification of the inactivation was carried out by PCR. The humic acid concentrations were measured by spectrophotometry (OD 280 nm) against a standard curve of commercial humic acids (Sigma).
Soil solutions treated with undiluted, 10-fold diluted and 100-fold diluted DNase were mixed with from 100 fg to 100 ng of S. fragile DNA before the PCR amplification. In another series of experiments, increasing concentrations of Streptomyces hygroscopicus DNA (from 100 pg to 1 µg) were added to the S. fragile DNA in order to simulate the presence of non-target DNA and its influence on the PCR process.
1.12 PURIFICATION OF THE CRUDE DNA EXTRACTS: Four DNA purification methods were compared. The DNA was extracted from 1 g (dry weight of soil) according to protocol 4a and resuspended in 100 µl of TE8 buffer (50 mM Tris, 20 mM EDTA, pH 8.0).
Protocol A: Elution through two successive Elutip d columns (Schleicher and Schuell, Dassel, Germany) (Picard et al. (1992)).
Protocol B: Elution through a Sephacryl S200 column (Pharmacia Biotech, Uppsala, Sweden) followed by an elution through an Elutip d column (Nesme et al. (1995)).
Protocol C: Separation using a two-phase aqueous system with 17.9% (weight/weight) of PEG 8000 (Merck, Darmstadt, Germany) and 14.3% (weight/weight) of (NH4)2SO4 (Zaslavsky (1995)).
After vigorous vortex mixing, the two phases were left at room temperature to separate.
1 ml of each of the phases was transferred into another tube, mixed with 100 µl of the sample and left at 4°C overnight to allow separation.
The lower phase was dialysed for one hour through a Millipore membrane in the presence of an excess of a TE 7.5 buffer (10 mM Tris, 1 mM EDTA at pH 7.5) and 1 M MgCl2 in order to remove the excess salts.
Protocol D: Elution through a Microspin Sephacryl S400 HR column (Pharmacia Biotech, Uppsala, Sweden), followed by elution through an Elutip d column.
Each protocol is completed by a step of precipitation with ethanol and the DNA is resuspended in 10 µl of TE 7.5 buffer. The efficacy of the purification protocols was checked by PCR amplification of undiluted aliquot fractions of the DNA solutions and of 10-fold and 100-fold diluted aliquot fractions, using standard protocols (see below).
1.13 RECOVERY OF THE DNA FROM INOCULATED MICROORGANISMS: The cells, spores and hyphae were washed twice and counted by counting on a plate or by direct microscopic counting. 5 g batches of dry, screened soil (soils 2, 3 and 5) were inoculated with 100 µl of a suspension of S. lividans spores and hyphae at concentrations corresponding to 0, 10³, 10⁵, 10⁷ and 10⁹ spores/g of dry weight of soil, or with B. anthracis vegetative cells at concentrations corresponding to 0, 10⁷ and 10⁹ cells per gram of dry weight of soil.
The amounts of S. lividans hyphae were calculated on the basis of the number of spores from which they originate. After addition of the bacterial suspensions, the soil samples were vortexed vigorously for minutes before grinding. The DNA was extracted according to protocol 6 (see below).
PCR amplification followed by Dot-Blot hybridization and phosphorescence imaging (phospho-imaging) was used in order to quantify the amounts of DNA recovered from the cells and spores and from the bacterial mycelium inoculated in the soils.
The DNA extraction was carried out according to lysis protocol 6.
The PCR amplification and the hybridization were carried out as described above. The primers and probes are targeted on chromosome regions located outside the 16S region, and are highly specific for the respective organisms, so as to avoid background signals.
For the soils inoculated with B. anthracis, the primers R499 and R500 were used (Patra et al. (1996)) and the amplification products were hybridized with the oligonucleotide probe C501 (Table 2).
For the soils inoculated with S. lividans, the PCR reactions were carried out using the primers FGPS516 and FGPS517, and the amplification products were hybridized with the oligonucleotide probe FGPS518 (Table 2).
The amplified region is a portion of the cassette constructed specifically to obtain the strain OS48.3 (Clerc-Bardin et al., unpublished).
The calibration counts were obtained in all cases using the purified DNA from the target organism.
2. RESULTS
2.1 CHOICE OF THE EXTRACTION BUFFER: Twenty different soils were used in order to determine the optimum pH of the DNA extraction buffer. For all the soils, the DNA yield increases as the buffer pH increases. The yield for each pH (mean ± sd), calculated as a percentage of the highest value for each soil, is as follows: pH 6.0: 31 ± 13; pH 7.0: 43 ± 16; pH 8.0: 60 ± 14; pH 9.0: 82 ± 12; pH 10.0: 98 ± 3.
For 16 out of the 20 soils, the highest yield was obtained at pH 10.0, whereas for the other four soils, the highest yield was obtained at pH 9.0. However, at pH 10.0, larger amounts of humic material were released, compared with pH 9.0 (results not presented). Consequently, pH 9.0 was chosen for all the experiments presented below.
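The normalization used for these yields (each soil expressed as a percentage of its own highest yield, then averaged per pH) can be sketched in Python as follows. The function name and the input values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which form of standard deviation was reported.

```python
from statistics import mean, stdev


def normalised_yields(yields_by_soil):
    """Return {pH: (mean %, sd %)} across soils.

    yields_by_soil: {soil_name: {pH: raw DNA yield}}.
    Each soil is scaled so that its highest yield equals 100%.
    """
    per_ph = {}
    for yields in yields_by_soil.values():
        highest = max(yields.values())
        for ph, value in yields.items():
            per_ph.setdefault(ph, []).append(100.0 * value / highest)
    # stdev() is the sample standard deviation (assumption, see lead-in)
    return {ph: (mean(v), stdev(v)) for ph, v in sorted(per_ph.items())}


# Hypothetical raw yields for two soils (arbitrary units)
example = {
    "soil A": {9.0: 8.0, 10.0: 10.0},
    "soil B": {9.0: 6.0, 10.0: 10.0},
}
summary = normalised_yields(example)
```

Scaling each soil to its own maximum removes differences in absolute DNA content between soils, so that only the effect of pH is compared.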
2.2 EFFICACY OF THE DNA EXTRACTION PROTOCOLS: The total DNA from the indigenous soil organisms was extracted and quantified so as to evaluate the efficacy of several in situ cell lysis protocols. Soil samples 1-6 (Table 1) were treated according to protocols 1 to 5 described in the Materials and Methods section (Figure 1).
After the DNA extraction, the soil suspensions were precipitated with isopropanol, and aliquot fractions of the resuspended pellets were analysed by gel electrophoresis, in a first step, in order to estimate the quality and quantity of the DNA released.
However, the colour of the DNA extract turned darker and darker as the number of lysis steps increased, due to the co-extraction of compounds, such as humic acids, with the DNA.
Some of these dark-coloured crude extracts do not migrate in the expected manner in the agarose gels.
Consequently, the crude DNA solutions were purified (protocol B) before quantification. The gel electrophoreses of the purified solutions obtained after the various lysis treatments are given as examples on soil 3 (Figure 2).
A visual comparison by ultraviolet radiation of the intensities of the coloured DNA allowed a semi-quantitative estimation of the efficacy of the treatments. Furthermore, the presence of migration profiles of multiple sizes of DNA fragments (discrete bands) and the disappearance of the long fragments indicates that a degradation of the DNA has taken place.
No DNA could be extracted from the clayey soil No. 5. A more precise quantification of the DNA from all the soils, extracted according to protocols 1 to 5, was carried out by Dot-Blot hybridization without a prior PCR amplification step and using an oligonucleotide probe complementary to a highly conserved sequence of the 16S rDNA region (probe FGPS431, Table 2).
The DNA was detected in the extracts of all the soils after each of the various lysis steps, except for the clayey soil No. 5. The results agree with the estimations made after gel electrophoresis.
In order to compare with an independent quantification method, the DNA extracted according to protocol 2 (from all the soils except soil No. 5) was also quantified using a colorimetric DNA detection method (Richard, 1974).
Good correlation was found (r = 0.88) between the DNA quantified using this colorimetric technique and the results obtained by Dot-Blot hybridization/radio-imaging, confirming the hypothesis that the average number of copies of the soil bacteria (rrn) is 7.
The hybridization (Dot-Blot) showed that the amounts of extracellular DNA, as determined by extraction without a lysis treatment (protocol 1), ranged from 4 µg/g for the acidic soil (No. 6) to 36 µg/g for soil No. 3 (Table 3).
Grinding of the soil (protocol 2) increased the amounts of DNA extracted from all the soils (26 µg/g of soil for soil No. 6 and 59 µg/g of soil for soil No. 3) (Table 3; Figure 2).
For the two grinding treatments (see the Materials and Methods section), the discrete DNA migration was detected on the agarose gels, indicating that the DNA molecules were partially degraded (Figure 2).
The size of the DNA fragments is between 20 and 0.2 kb. The band intensity of the smallest fragments is very low, indicating that most of the fragments are much bigger than 1 kb.
Protocol 3 comprises a step of homogenization in an Ultra-turrax mixing device after addition of the extraction buffer to the soil samples.
This step leads to an increase in the amounts of DNA extracted, as determined by Dot-Blot hybridization, for two of the soils (the sandy soil No. 3 and the acidic soil No. 6), whereas the two soils rich in organic matter (soils No. 1 and No. 2) led to the production of smaller amounts of DNA.
Protocols 4a and 4b made it possible to evaluate the effect of two types of sonication on the yields of DNA from pre-ground and prehomogenized soils.
The sonication had no positive effect on the DNA yield, compared with protocol 3, except for soil No. 6. However, the lysis efficacy for the two types of sonicator differs. For soils 2, 3 and 4, the largest amounts of DNA extracted were obtained using the titanium micropoint (Table 3; Figure 2), whereas for soils Nos. 1 and 6, the DNA yield was higher using the Cup Horn device.
Contradictory results were also obtained when a step of enzymatic/chemical lysis was added (protocols 5a and 5b) after the sonication step; in certain cases, the amounts of DNA extracted were larger than those recovered according to protocols 4a and 4b, whereas in other cases the yields were lower (Table 3).
2.3 DIRECT COUNTING OF THE MICROORGANISMS: Counting by microscope of the total number of bacterial cells after staining with acridine orange was carried out for all the soils, before and after grinding.
Before grinding, the number of bacteria per gram of dry weight of soil ranged from 1.4 × 10⁹ (± 0.4) in the tropical soil No. 5 to 10 × 10⁹ (± 0.7) in the soil obtained from the Saint-André coast (soil No. 3) (Table 1).
After grinding, the number of cells was, respectively, 45, 74, 54, 34 and 75% of the initial values for soils Nos. 1 to 6.
2.4 COUNTING OF THE CULTURABLE ACTINOMYCETES BELONGING TO DIFFERENT GENERA: A modification in the populations of actinomycetes in soil No. 3 was noted after the various lysis treatments (Figure 3).
For example, the colonies of Streptomyces sp. dominated the viable actinomycetes flora when no lysis treatment was applied (protocol 1) and represented 65% of the total number of colonies identified. After grinding, the percentage of Streptomyces colonies fell to 51%, whereas the proportion of colonies belonging to the Micromonospora genus increased by 14% to 41%.
The chemical/enzymatic lysis (protocols 5a and 5b) appeared to be particularly effective for the lysis of Streptomycetes. When all the lysis treatments were applied, including a chemical/enzymatic lysis (protocols 5a and 5b), the actinomycetes microflora, which still comprised more than 10⁶ cfu/g of soil, was dominated by the species belonging to the Micromonospora genus, while few or no Streptomyces colonies were recovered.
The organisms belonging to genera such as Streptosporangium, Actinomadura, Microbispora, Dactilosporangium and Actinoplanes appeared in small number on the plates of the total number of colonies identified) after grinding, homogenization with the Ultra-turrax device and sonication, but were generally absent when these treatments were combined with a chemical/enzymatic lysis.
The total number of culturable bacteria remaining after each lysis treatment (protocols 2 to 5) was also investigated for soil No. 4. The results indicate that the number of culturable bacteria does not decrease with the intensity of the lysis treatments (about 2 x 10" cfu/g of soil in all cases, and also when a treatment is not applied, such as according to protocol 1).
The production of these low cfu values is probably due to the fact that dry soil was used and that only the most resistant bacteria multiplied on the plates. The number of actinomycetes forming colonies was generally greater than that of the total cfu (all the bacteria) due to the fact that a spore-germination step, included in the actinomycetes detection protocol, was missing during the control of the total bacteria.
2.5 RECOVERY OF THE LAMBDA PHAGE DNA ADDED: The aim of these experiments was to estimate the way in which successive lysis treatments might affect the recovery of naked DNA, and whether or not these successive lysis treatments contributed to its degradation.
The DNA could be either a fraction of extracellular DNA released from already-dead organisms, which can persist in the soil for months (Ward et al., 1990), or DNA released from organisms readily lysed during the first steps of the treatment. In order to simulate this situation, lambda phage DNA digested with HindIII was added, at various concentrations, to the soils before and after grinding. In addition to grinding, a combination of the other lysis treatments was tested, including sonication (Cup Horn device, see protocol 4b) and heat shocks (see the Materials and Methods section).
After extraction, aliquot fractions which theoretically needed to contain from 25 to 150 ng of lambda phage DNA were analysed by gel electrophoresis. No DNA fragment specific for the lambda phage could be observed when the DNA was inoculated into the soil samples prior to grinding, independently of the dose or of the type of soil.
When the DNA was added after grinding, and extracted without an additional lysis treatment step, the specific lambda phage DNA profiles were detected in the extracts of four out of the five soils tested.
In all these cases, a direct cause-and-effect relationship was obtained between the amount of DNA added and the intensity of the signals on the agarose gels. However, the signal intensities were less than the signal intensities expected when compared with those of the molecular standards.
Furthermore, the band at 23 kb was absent in several cases, indicating that the long fragments were preferentially adsorbed onto the soil particles, or were more sensitive to degradation, compared with the short fragments.
No band was detected in the samples of tropical soil No. 5 which is characterized by a very high clay content (Table 1).
For a more precise quantification, the recovery of DNA was determined on a phosphorescence imaging device (phospho-imager) after Dot-Blot hybridization. According to this technique, the DNA was detected in all the samples, including those which had been inoculated before grinding, except for soil No. 5 in which no DNA could be detected.
In all the other soils, the amount of DNA extracted increases as the size of the inoculum increases (Figures 4a-d).
However, the recoveries of lambda phage DNA were low. When grinding was the only lysis treatment applied, the recoveries were between 0.6 and 5.9% of the DNA added when this DNA was added before grinding, and from 3.6 to 24% of the DNA added when the latter was added after grinding. The highest levels of recovery were obtained from soil No. 2.
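The recovery figures quoted here are simple ratios of DNA extracted to DNA added. A minimal Python sketch of that arithmetic is given below; the function name and the input amounts are hypothetical and serve only to show how a value such as 5.9% arises.

```python
def recovery_percent(recovered_ug_per_g, added_ug_per_g):
    """Percentage of the added lambda phage DNA that was recovered.

    Both amounts are expressed in µg of DNA per g of dry weight of soil.
    """
    if added_ug_per_g <= 0:
        raise ValueError("the amount of DNA added must be positive")
    return 100.0 * recovered_ug_per_g / added_ug_per_g


# Hypothetical example: 0.59 µg/g recovered out of 10 µg/g added
example_recovery = recovery_percent(0.59, 10.0)
```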
Gel electrophoresis of aliquot fractions of samples treated by heat shock and sonication did not allow any DNA bands to be observed in any of the samples, including the tests in which the DNA had been added after grinding. The Dot-Blot hybridization experiments confirmed these results.
The hybridization signals obtained from soil suspensions which were treated with heat shocks and sonications were, at best, low.
The sample showing the largest amount of DNA (15 µg of DNA/g of dry weight of soil) was the only one for which the signal obtained was substantially different from the background level.
No difference (or only small differences) was observed between the samples treated with heat shock and those treated with heat shocks and sonication, indicating that the heat shocks have a harmful effect on the DNA. The best recoveries were observed for soil No. 2, which has the highest organic matter content (Table 1), whereas no DNA was recovered from the clayey soil No. 5. Additional experiments were carried out with non-ground samples of soils No. 4 and No. 5, which were inoculated with 20 and 50 µg of lambda phage DNA per gram of soil.
The samples were extracted immediately or after an incubation period of one hour at 28°C, and the DNA extracts were then purified and analysed by gel electrophoresis.
The incubation of soil No. 4 for one hour after the inoculation did not give profiles that were qualitatively or quantitatively different from those obtained without incubation or from those observed previously when the DNA was added after grinding.
These results indicate that the enzymatic degradation by the soil nucleases is not thought to be involved in the low level of DNA recovery.
Furthermore, the absence of a grinding step does not allow an increase in the recovery of the DNA from soil No. 5, indicating that the changes to the structure of the soil due to the grinding do not significantly increase the adsorption of the nucleic acids onto the colloids.
2.6 SATURATION OF THE ADSORPTION SITES WITH RNA: Most of the profiles obtained on the agarose gels do not differ significantly from the previous profiles in which the RNA treatment was not carried out.
For example, no band was detected from the clay-rich soil No. 5, independently of the RNA concentrations and of the lambda phage DNA concentrations used.
Furthermore, the specific bands of lambda phage DNA digested with HindIII remained undetectable in the sandy compost treated with RNA (soil No. 4) when the RNA is added before grinding.
The intensity of the bands obtained from samples inoculated with DNA after grinding increases as the RNA concentration increases, indicating that the treatment might have a positive effect.
However, the results after hybridization and analysis by phosphorescence imaging did not confirm the electrophoresis results. For example, the positive effect of the RNA treatment on the recovery of DNA from the sandy compost, when DNA was added after grinding, did not appear clearly.
On the other hand, a positive effect of the RNA was found for the clay-rich soil (No. 5) when the DNA was added after grinding.
Although the hybridization signals for the control samples do not differ from the background noise levels, significant amounts of DNA were released from the samples treated with RNA, and the signals increased as the amount of DNA added increased and as the RNA concentration increased.
However, even for the highest RNA concentration (100 mg/g of weight of dry soil), the recovery level never exceeded 3%.
2.7 PURIFICATION OF THE CRUDE DNA EXTRACTS: Of the four protocols tested, the best amplification of the undiluted DNA extracts (1 µl of extract in 50 µl of PCR mixture) was observed after elution through Microspin S400 columns followed by an elution through an Elutip d column, as shown by the gel electrophoresis of the PCR products.
The DNA purified by the two-phase aqueous system (protocol C) gave smaller amounts of PCR products after amplification starting with undiluted DNA extract.
No amplification product could be obtained from the undiluted extracts after amplification following the use of protocols A or B.
Consequently, protocol D (see Materials and Methods section) was used for all the experiments in which the PCR amplifications and/or the Dot-Blot hybridizations were performed.
2.8 QUANTIFICATION BY PCR AND HYBRIDIZATION: The first step was to determine whether or not the amounts of PCR product were proportional to the number of target DNA molecules initially present in the reaction tube. DNA from Streptosporangium fragile was used as target (see Materials and Methods section).
The primers used were the primers FGPS122 and FGPS350 (Table 2). Gel electrophoresis of the PCR products showed that the band intensity increases as the concentration of the targets increases. The PCR products were hybridized with the oligonucleotide probe FGPS643 (Table 2), and the signals were quantified by phosphorescence imaging (phospho-imaging).
A good correlation (r² = 0.98) was found between the log[number of targets] and the log[intensity of the hybridization signal].
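A correlation of this kind is typically obtained from a least-squares fit on log-transformed data. The Python sketch below shows one way to compute the slope, intercept and r² of such a fit; it assumes ordinary least squares on base-10 logarithms, which the text does not specify, and the function name and data are hypothetical.

```python
import math


def loglog_fit(targets, signals):
    """Least-squares fit of log10(signal) against log10(number of targets).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(t) for t in targets]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data following an exact power law: signal = 3 * targets**0.5
example_targets = [1e2, 1e3, 1e4, 1e5]
example_signals = [3.0 * t ** 0.5 for t in example_targets]
fit = loglog_fit(example_targets, example_signals)
```

Once fitted on the standards, the line can be inverted to estimate the number of targets in an unknown sample from its hybridization signal.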
An investigation was then carried out to see whether or not the efficacy of the PCR amplification was affected by the humic acids and the non-target DNA. When analysed by gel electrophoresis, the increased intensity of the bands for the PCR products, corresponding to the various amounts of target DNA, was conserved when the amplification was carried out with DNA solutions to which extracts of soil treated with DNase had been added, containing humic acids at concentrations ranging up to 8 ng in 50 µl of the PCR mixture.
With 20 ng of humic acid in the PCR mixture, the bands corresponding to the small levels of target DNA disappeared, and at humic acid concentrations of 80 ng and at higher concentrations, no band was visible.
The varied amounts of S. fragile target DNA still gave the expected amounts of PCR product when, before amplification, the S. fragile DNA was mixed with Streptomyces hygroscopicus DNA, added to 50 µl of the PCR mixture in amounts ranging from 100 pg to 1 µg in order to simulate the non-target DNA released from the soil microflora.
2.9 QUANTIFICATION OF THE INDIGENOUS SOIL ACTINOMYCETES AFTER DIFFERENT LYSIS TREATMENTS: Purification protocol D was applied, followed by PCR amplification as described above, in order to quantify the actinomycetes belonging to the genus Streptosporangium in soil No. 3 after extraction in accordance with protocols 1, 2, 3, 5a and 5b (Figure). After grinding (protocol), the amount of target DNA originating from this actinomycete was estimated by hybridization (Dot-Blot) and radioimaging as 2.5 ± 1.3 ng/g of dry soil.
If it is postulated that the DNA content is 10 fg per cell, as for Streptomyces (Gladek et al., 1984), this value corresponds to approximately 2.5 × 10⁵ genomes. Similar values were obtained after the other lysis treatments (2.6 ± 1.1 and 1.8 ± 1.3 ng of DNA/g of dry soil using protocols 3 and 4b, respectively).
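The conversion from a DNA mass to a number of genomes is a single unit conversion. The sketch below reproduces it using the 10 fg per cell assumption quoted from Gladek et al.; the function name `genome_equivalents` is illustrative and not part of the described method.

```python
FG_PER_NG = 1e6  # 1 ng = 10^6 fg

def genome_equivalents(dna_ng, fg_per_genome=10.0):
    """Number of genomes represented by dna_ng nanograms of target DNA."""
    return dna_ng * FG_PER_NG / fg_per_genome

# 2.5 ng of target DNA per gram of dry soil, 10 fg of DNA per cell
print(f"{genome_equivalents(2.5):.2e} genomes per gram of dry soil")
```

The same function applied to the other lysis treatments (2.6 and 1.8 ng/g) gives values of the same order of magnitude.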
2.10 EFFICACY OF THE RECOVERY OF DNA FROM SOILS PRE-INOCULATED WITH BACTERIA: Three soils (Nos. 2, 3 and 5) were inoculated with different concentrations of Streptomyces lividans spores or hyphae (see Materials and Methods section). The amounts of mycelium added to the soil (Figure 6b) correspond to the number of spores inoculated in the germination medium. Approximately 50% of these spores germinated. The exact number of cells in the hyphae of the germinated spores was not determined. Consequently, the amounts of spores and mycelium inoculated in the soils are not directly comparable.
For each soil sample, extraction protocol No. 6, purification method D and PCR amplification combined with Dot-Blot hybridization and phosphorescence imaging (phospho-imaging) were used to count the specific target DNAs which had been released. The DNA extracted can be clearly distinguished from the background noise only when the number of spores added exceeds 10⁵ for soils No. 3 and No. 5 and 10⁷ for soil No. 2 (Figure 6a).
When the mycelium is added, the DNA extracted can be detected at and above an amount corresponding to 10³ spores/g of soil for soils No. 2 and No. 3, and at and above 10⁷ spores/g of soil for soil No. 5 (Figure 6b).
Above the detection level, the hybridization signal increases as the amount of inoculated cells increases.
For the spore inoculum, a 100-fold increase in the number of cells inoculated leads to a close to 100-fold increase in the DNA yield. This increase is clearly less than when the hyphae are inoculated, particularly into soils No. 2 and No. 3 (Figure 6).
In contrast to the results obtained when lambda phage DNA was used as the inoculum, DNA was also recovered from the clay-rich soil (No. 5) when bacterial cells were used as the inoculum. However, for the latter inoculum also, the treatment with RNA increased the recovery of Streptomyces DNA from this soil both for the spores and the mycelium (Figure 6).
Inoculating the soils with vegetative Bacillus anthracis cells gave recovery levels similar to those obtained for Streptomyces.
Furthermore, the levels of DNA recovery from soil No. 5 increased after treatment with RNA for this inoculum also.
Example 2: Construction of a library of low molecular weight DNA kb) using a soil contaminated with lindane, and cloning and expression of the linA gene
This example describes the construction of a DNA library in E. coli. It demonstrates the cloning and expression of small genes obtained from a non-culturable microflora.
Lindane is an organochlorine pesticide, which is recalcitrant to degradation and persistent in the environment. Under aerobic conditions, biodegradation is catalyzed by a dehydrochlorinase, encoded by the linA gene, allowing lindane to be converted into 1,2,4-trichlorobenzene. The linA gene has been identified in only two strains isolated from soil: Sphingomonas paucimobilis, isolated in Japan (Senoo and Wada, 1989; Imai et al., 1991; Nagata et al., 1993), and Rhodanobacter lindaniclasticus, isolated in France (Thomas et al., 1996; Nalin et al., 1999).
However, the degradation potential of lindane, demonstrated by assaying the chloride ions released and PCR amplification of the linA gene from soils which have been in contact with lindane or otherwise, appears to be more widespread in the environment (Biesiekierska-Galguen, 1997).
1. Direct extraction of soil DNA
The dry soils are ground for 10 minutes in a Retsch centrifugal-force grinder equipped with 6 tungsten beads. 10 grams of ground soil are suspended in 50 ml of pH 9 TENP buffer (50 mM Tris, 20 mM EDTA, 100 mM NaCl, 1% w/v polyvinylpolypyrrolidone), and homogenized by vortexing for 10 min.
After centrifuging for 5 min at 4000 x g and 4°C, the supernatant is precipitated with sodium acetate (3M, pH 5.2) and with isopropanol, then taken up in sterile TE buffer (10 mM Tris, 1 mM EDTA, pH). The DNA extracted is then purified on an S400 molecular sieve column (Pharmacia) and on an Elutip d ion-exchange column (Schleicher and Schuell), according to the manufacturers' instructions, then stored in TE.
2. Construction of the library of DNA extracted from the soil in the vector pBluescript SK-
The vector pBluescript SK- and the DNA extracted from the soil are each digested with the enzymes HindIII and BamHI (Roche), at a rate of 10 units of enzyme per 1 µg of DNA (incubation for 2 hours at 37°C).
The DNAs are then ligated by the action of T4 DNA ligase (Roche) overnight at 15°C, at a rate of one enzyme unit per 300 ng of DNA (about 200 ng of DNA insert and 100 ng of digested vector). Electrocompetent Escherichia coli cells, ElectroMAX DH10B™ (Gibco BRL), are transformed with the ligation mixture (2 µl) by electroporation (25 µF, 200 and 500 Ω, kV) (Biorad Gene Pulser).
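The enzyme amounts in the digestion and ligation steps follow from simple proportional rules (10 units per µg of DNA for the restriction digest, one unit of ligase per 300 ng of total DNA). A minimal sketch of that arithmetic, with hypothetical helper names, might look like this:

```python
def digestion_units(dna_ug, units_per_ug=10.0):
    """Restriction enzyme units for a digest at 10 units per microgram of DNA."""
    return dna_ug * units_per_ug

def ligase_units(insert_ng, vector_ng, ng_per_unit=300.0):
    """T4 DNA ligase units at one unit per 300 ng of total DNA."""
    return (insert_ng + vector_ng) / ng_per_unit

# The ligation described above: about 200 ng of insert and 100 ng of vector
print(ligase_units(200.0, 100.0), "unit(s) of ligase")
# Example digest of 2 µg of DNA (the 2 µg figure is an arbitrary illustration)
print(digestion_units(2.0), "units of each restriction enzyme")
```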
After one hour of incubation in the LB medium, the transformed cells are diluted so as to obtain about 100 colonies per dish, and then plated out on LB medium (10 g/l tryptone, 5 g/l yeast extract, 5 g/l NaCl) supplemented with ampicillin (100 mg/l), γ-HCH (500 mg/l), X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside, 60 mg/l) and IPTG (isopropylthio-β-D-galactoside, 40 mg/l) and incubated overnight at 37°C.
Since γ-hexachlorocyclohexane (Merck-Schuchardt) is insoluble in water, a g/l solution is prepared in DMSO (dimethyl sulphoxide) (Sigma).
A library of 10,000 clones was thus obtained.
3. Cloning and expression of the linA gene
Screening of the library was carried out by visualization of a lindane degradation halo around the colony (the lindane precipitating in the culture media). Out of 10,000 clones screened, 35 exhibited lindane-degrading activity. The presence of the linA gene in these clones was confirmed by PCR with the aid of specific primers, described by Thomas et al. (1996). Digestions carried out on the inserts and on the amplification products showed identical profiles between all the clones screened and the reference control, R. lindaniclasticus. The clones carrying the linA gene also had an insert of the same size (about 4 kb).
It was thus demonstrated that the soil DNA could be cloned and expressed in a heterologous host: E. coli, and that genes derived from a microflora that is difficult to culture could be expressed. Libraries prepared by partial digestion of DNA extracted from soil, with restriction enzymes such as Sau3AI, can thus be envisaged also.
EXAMPLE 3: Process for preparing a collection of nucleic acids from a soil sample, comprising a step of indirect DNA extraction.
1. MATERIALS AND METHODS
1.1 Extraction of the bacterial fraction of the soil
of soil are dispersed in 50 ml of sterile 0.8% NaCl, by grinding in a Waring Blender for 3 x 1 minute, with cooling in ice between each grinding. The bacterial cells are then separated from the soil particles by centrifugation on a density cushion of Nycodenz (Nycomed Pharma AS, Oslo, Norway). In a centrifugation tube, 11.6 ml of a Nycodenz solution with a density of 1.3 g.ml⁻¹ (8 g of Nycodenz suspended in 10 ml of sterile water) are placed below 25 ml of the soil suspension previously obtained.
After centrifugation at 10,000 x g in a rotor with swing-out buckets (TST 28.38 rotor, Kontron) for 40 minutes at 4°C, the cellular ring, located at the interface between the aqueous phase and the Nycodenz phase, is taken, washed in 25 ml of sterile water and centrifuged at 10,000 x g for minutes. The cell pellet is then taken up in a 10 mM Tris, 100 mM EDTA, pH 8.0 solution.
Prior to dispersion of the soil in the Waring Blender, a step of enrichment of the soil in a solution of yeast extract can be included, in particular to allow the germination of the soil bacterial spores. 5 g of soil are thus incubated in 50 ml of a sterile solution of 0.8% NaCl-6% yeast extract, for 30 minutes at 40°C. The yeast extract is removed by centrifugation at 5000 rpm for 10 minutes in order to avoid the formation of a foam during the grinding.
1.2 Lysis of the soil bacterial cells
Lysis of the cells in liquid medium and purification on a caesium chloride gradient
The cells are lysed in a 10 mM Tris, 100 mM EDTA, pH solution containing 5 mg.ml⁻¹ of lysozyme and 0.5 mg.ml⁻¹ of achromopeptidase for 1 hour at 37°C. A solution of lauryl sarcosyl (1% final) and proteinase K (2 mg.ml⁻¹) is then added and incubated at 37°C for minutes. The DNA solution is then purified on a density gradient of caesium chloride by centrifugation at 35,000 rpm for 36 hours on a Kontron 65.13 rotor. The caesium chloride gradient used is a gradient at 1 g/ml of CsCl, with a refractive index of 1.3860 (Sambrook et al., 1989).
Lysis of the cells after inclusion in an agarose block
The cells are mixed with an equal volume of agarose containing (weight/volume) Seaplaque (Agarose Seaplaque FMC Products, TEBU, Le Perray en Yvelines, France) at low melting point and poured into a 100 µl block. The blocks are then incubated in a lysis solution: 250 mM EDTA, 10.3% sucrose, 5 mg.ml⁻¹ lysozyme and 0.5 mg.ml⁻¹ achromopeptidase at 37°C for 3 hours. The blocks are then washed in a mM Tris-500 mM EDTA solution and incubated overnight at 37°C in 500 mM EDTA containing 1 mg.ml⁻¹ of proteinase K and 1% lauryl sarcosyl. After washing several times in Tris-EDTA, the blocks are stored in 500 mM EDTA.
The quality of the DNAs thus extracted is checked by pulsed-field electrophoresis.
The amount of DNA extracted was evaluated on electrophoresis gel relative to a calibration range of calf thymus DNA.
1.3 Molecular characterization of the DNA extracted from soil
The DNAs extracted from the soil are characterized by PCR hybridization. This method consists, in a first stage, in amplifying the DNAs using primers located on universally conserved regions of the 16S rRNA gene, and then in hybridizing the amplified DNAs with different oligonucleotide probes of known specificity (Table), with the aim of quantifying the intensity of the hybridization signal relative to an external calibration range of genomic DNA.
The DNAs extracted from the soil and the genomic DNAs extracted from pure cultures are amplified with the primers FGPS 612-669 (Table 1) under the standard PCR amplification conditions. The amplification products are then denatured with an equal volume of 1N NaOH, deposited on a Nylon membrane (GeneScreen Plus, Life Science Products) and hybridized with an oligonucleotide probe labelled at its end with γ-32P ATP by the action of T4 polynucleotide kinase. After prehybridization of the membrane in a solution of 20 ml containing 6 ml of SSC 20X, 1 ml of Denhardt's solution, 1 ml of 10% SDS and 5 mg of heterologous salmon sperm DNA, the hybridizations are carried out overnight at the temperature defined by the probe. The membranes are washed twice in SSC 2X for 5 minutes at room temperature, then once in SSC 2X, 0.1% SDS and a second time in SSC 1X, 0.1% SDS for minutes at the hybridization temperature. The hybridization signals are quantified using the Molecular Analyst software (Biorad, Ivry sur Seine, France) and the amounts of DNA are estimated by interpolation of the calibration curves obtained from the genomic DNAs.
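The composition of the prehybridization solution is given as volumes of stock solutions in a 20 ml total. The sketch below works out the resulting final concentrations with a generic dilution formula; the helper name `final_concentration` is illustrative, and Denhardt's solution is left out because the concentration of its stock is not stated.

```python
def final_concentration(stock_conc, stock_volume_ml, total_volume_ml):
    """Dilution rule C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume_ml / total_volume_ml

TOTAL_ML = 20.0

ssc_x = final_concentration(20.0, 6.0, TOTAL_ML)        # SSC, in multiples of 1X
sds_percent = final_concentration(10.0, 1.0, TOTAL_ML)  # SDS, in percent
salmon_dna_mg_per_ml = 5.0 / TOTAL_ML                   # 5 mg of DNA in 20 ml

print(f"SSC {ssc_x:g}X, SDS {sds_percent:g}%, "
      f"salmon sperm DNA {salmon_dna_mg_per_ml:g} mg/ml")
```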
2. RESULTS AND DISCUSSION
2.1 Extraction and lysis of the bacterial fraction of the soil
Separation of the microbial cells from the soil particles, prior to extraction of the DNA, is an alternative which has many advantages over the methods of direct extraction of the DNA in the soil. Specifically, extraction of the microbial fraction limits the contamination of the DNA extract with extracellular DNA freely present in the soil or with DNA of eukaryotic origin. Above all, the DNA extracted from the microbial fraction of the soil has longer fragments and better integrity than the DNA extracted by direct lysis (Jacobsen and Rasmussen (1992)).
Furthermore, separation of the soil particles makes it possible to avoid contamination of the DNA extract with humic and phenolic compounds, which can otherwise seriously impair the cloning efficacies.
One of the steps which is a determining factor for the extraction of the cells from the soil is the dispersion of the soil sample in order to dissociate the cells which adhere to the surface or to the inside of aggregates of soil particles. Three successive cycles of grinding for one minute each make it possible to obtain better cell extraction efficacy and a larger amount of DNA recovered, compared with a single cycle of grinding for one minute 30 seconds.
Table 5 reports the extraction efficacies obtained after centrifugation on a Nycodenz gradient, on the total viable microflora (counted by microscopy after staining with acridine orange), on the total culturable microflora (counted on solid 10% Trypticase-Soy medium), and on the actinomycetes microflora culturable on HV agar medium (after incubation at 40°C in a solution of 6% yeast extract-0.05% SDS in order to bring about germination of the spores). Moreover, the extracted DNA was quantified either after lysing the cells in liquid medium (without purification on a caesium chloride gradient) or after lysing the cells included in an agarose block (after digesting the agarose with a β-agarase).
The results show that more than 14% of the total telluric microflora is recovered by this method (2 × 10⁸ cells per gram of soil) and that the total culturable microflora represents barely 2% of the total microbial population.
Moreover, the amount of DNA extracted from the cells is 330 ng per gram of dry soil. Estimating the DNA content per soil microbial cell to be between 1.6 and 2.4 fg, and given the amount of cells extracted (2 × 10⁸ cells per gram of soil), it can be estimated that virtually all of the cells are lysed and that this lysis does not place any major bias on this approach.
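The lysis estimate above rests on comparing the measured DNA yield with the yield expected from the cell count. A minimal sketch of that comparison follows, using only the figures quoted in the text; the function names are illustrative.

```python
FG_PER_NG = 1e6  # 1 ng = 10^6 fg

def dna_per_cell_fg(dna_ng_per_g, cells_per_g):
    """Average DNA recovered per extracted cell, in femtograms."""
    return dna_ng_per_g * FG_PER_NG / cells_per_g

def expected_dna_ng(cells_per_g, fg_per_cell):
    """DNA yield expected if every extracted cell were lysed, in ng per gram."""
    return cells_per_g * fg_per_cell / FG_PER_NG

observed = dna_per_cell_fg(330.0, 2e8)
low = expected_dna_ng(2e8, 1.6)
high = expected_dna_ng(2e8, 2.4)

print(f"recovered: {observed:.2f} fg of DNA per cell")
print(f"expected for complete lysis: {low:.0f} to {high:.0f} ng per gram of soil")
```

The recovered amount per cell falls inside the assumed 1.6 to 2.4 fg range, towards its lower bound, which is the basis for concluding that lysis is close to complete.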
The pulsed-field electrophoreses show that the DNA from the soil extracted after Nycodenz and CsCl gradients could be up to 150 kb in size and that the agarose block lysis allowed fragments of more than 600 kb to be extracted.
These results confirm the advantage of this approach independent of culture for the construction of environmental DNA libraries, as an alternative to the methods of direct DNA extraction.
2.2 Molecular Characterization of the DNA extracted from the soil
The aim of the molecular characterization of the DNA extracted from the soil is to obtain profiles representing the proportions of the various bacterial taxons present in the DNA extract. It also serves to identify the extraction biases induced by the prior separation of the cellular fraction of the soil, in comparison with a direct extraction method, in the absence of a direct visualization of the microbial diversity present in the soils. Specifically, little information has been collected on the extraction of cells on a Nycodenz gradient as a function of their morphological structure (cell diameter, filamentous or sporulated forms).
The methods in place hitherto were based on:
- quantitative hybridizations using oligonucleotide probes specific for different bacterial groups, applied directly to DNA extracted from the environment. Unfortunately, this approach is not very sensitive and does not allow taxonomic groups or genera present in low abundance to be detected (Amann (1995));
- quantitative PCR such as MPN-PCR (Most Probable Number) (Sykes et al. (1992)) or competitive quantitative PCR (Diviacco et al. (1993)). The respective drawbacks of these two approaches are (i) the laborious nature due to the multiplication of the dilutions and repetitions, thus making the technique unsuitable for a large number of samples or pairs of primers, and (ii) the need to construct a competitor which is specific for the target DNA and which does not induce any bias in the competition.
The method introduced according to the present invention consists in universally amplifying a 700 bp fragment inside the 16S rDNA sequence, in hybridizing this amplification product with an oligonucleotide probe of variable specificity (as regards the kingdom, order, subclass or genus) and in comparing the hybridization intensity of the sample relative to an external calibration range. The amplification prior to the hybridization makes it possible to quantify genera or species of microorganisms that are relatively sparse. Furthermore, the amplification with universal primers makes it possible, during the hybridization, to use a wide series of oligonucleotide probes. It allows a comparison between different modes of lysis (direct or indirect extraction) on well defined taxonomic groups.
The results are collated in Table 6.
They show similar profiles between the two extraction methods (direct and indirect). Thus, it appears that prior extraction of the telluric microbial fraction does not introduce any genuine bias among the taxons tested. The only significant difference between the two extraction approaches would appear to be the greater abundance of rDNA sequences belonging to γ-proteobacteria in the extract obtained by the indirect extraction method.
Furthermore, a significant effect of incubating the soil sample in a solution of yeast extract is observed on the sporulated soil populations (Gram+, low percentage of GC and actinomycetes). This step brings about germination of the spores and, firstly, definitely allows better recovery of cells of this type, and, secondly, allows greater lysis efficacy on germinating cells.
This approach allows a semi-quantitative analysis, targeted on the main taxons defined using microorganisms cultured and usually found in the soils. Only molecular tools make it possible to estimate the magnitude of the various taxons, since culture methods are too restrictive and are dependent on the specificity of the medium used.
The results show that a large proportion of the microbial population is not represented in the phylogenetic groups described, thus demonstrating the existence of novel groups made up of microorganisms which have not been cultured hitherto, or which are not culturable.
Thus, novel probes can be defined using given sequences starting with DNA extracted from the soil (novel phyla composed of noncultured microorganisms, Ludwig et al. (1997)) in order to obtain a more exact image of the composition of the DNA extract.
Example 4: CONSTRUCTION OF THE COSMID pOS7001
Characteristics of pOS7001:
- Replicative in E. coli
- Integrative in Streptomyces
- Selectable in E. coli (AmpR, HygroR) and Streptomyces (HygroR)
The properties of the cosmid make it possible to insert large DNA fragments of between 30 and 40 kb.
It comprises:
1 - The inducible promoter tipA of Streptomyces lividans
2 - The integration system specific for the element pSAM2
3 - The hygromycin-resistance gene
4 - The cosmid pWED1, derived from pWE15
1) The inducible promoter of the tipA gene of S. lividans
The tipA gene encodes a 19 kDa protein whose transcription is induced by the antibiotic thiostrepton or nosiheptide. The tipA promoter is well regulated: induction in exponential phase and in stationary phase (200X) (Murakami T, Holt TG, Thompson CJ, J. Bacteriol. 1989; 171:1459-66).
2) The hygromycin-resistance gene
Hygromycin is an antibiotic produced by S. hygroscopicus. The resistance gene encodes a phosphotransferase (hph). The gene used originates from a cassette constructed by Blondelet-Rouault et al., in which the hyg gene is under the control of its own promoter and of the IPTG-inducible plac promoter (Blondelet-Rouault et al., Gene 1997; 190:315-7).
3) The site-specific integration system
The element pSAM2 integrates into the chromosome by means of a site-specific integration mechanism. The recombination takes place between two identical 58 bp sequences present on the plasmid (attP) and on the chromosome (attB).
The int gene, located close to the attP site, is involved in the site-specific integration of pSAM2, and its product has similarities with the integrases of the temperate bacteriophages of enterobacteria. It has been demonstrated that a pSAM2 fragment containing only the attP attachment site as well as the int gene was capable of integrating in the same manner as the entire element (see French patent No. 88 06638 of 18/05/1988 and Raynal A et al., Mol. Microbiol. 1998; 28:333-42).
4) Construction of the cosmid pOS7001
Step 1/ The promoter TipA was isolated from the plasmid pPM927 (Smokvina et al., Gene 1990; 94:53-9) on a 700-base pair HindIII-BamHI fragment and cloned into the vector pUC18 (Yanisch-Perron et al., 1985) digested with HindIII/BamHI.
Step 2/ This HindIII-BamHI fragment was subsequently transferred from pUC18 to pUC19 (Yanisch-Perron et al., 1985).
Step 3/ A 1500-base pair BamHI-BamHI insert carrying the int gene and the attP site of pSAM2 was isolated from pOSintl, represented in Figure 8 (Raynal A et al., Mol. Microbiol. 1998; 28:333-42), and cloned into the BamHI site of the preceding vector (pUC19/TipA), in the orientation which allows the int gene to be placed under the control of the promoter TipA.
Step 4/ The BamHI site located on the 5' side of the int gene was deleted by partial digestion with BamHI followed by treatment with the Klenow enzyme. A HindIII-BamHI fragment carrying TipA-int-attP was thus isolated from pUC19 and transferred into pBR322 HindIII/BamHI.
Step 5/ The hygromycin cassette isolated from pHP45Ωhyg (Blondelet-Rouault et al., 1997) on a HindIII-HindIII fragment was cloned into the HindIII site located upstream of the promoter TipA.
Step 6/ The HindIII site located between the ΩHyg cassette and the promoter TipA was deleted by Klenow treatment after partial HindIII digestion.
Step 7/ The plasmid obtained after the preceding step makes it possible to isolate a single HindIII-BamHI fragment, carrying all the ΩHyg/TipA/int/attP elements, which was cloned after Klenow treatment into the EcoRV site of the cosmid pWED1. The cosmid pWED1, represented in Figure 9, is derived from the cosmid pWE15, represented in Figure 10 (Wahl GM et al., Proc. Natl. Acad. Sci. USA 1987; 84:2160-4), by deletion of an HpaI-HpaI fragment carrying the neomycin gene and the SV40 origin.
A map of the vector pOS7001 is represented in Figure 11.
Example 5: Construction of the cosmid which is conjugative and integrative in Streptomyces, the vectors pOSV303, pOSV306 and pOSV307
5.1 Construction of the vector pOSV303
Given that the packaging selects clones larger than 30 kb, only to 15% of the clones contain no insert, and it is thus not really necessary to have a system for selecting recombinants, thus allowing a smaller vector to be constructed.
Construction: Step 1 the vector pOSV001
Cloning of an 800 base pair PstI-PstI fragment carrying the transfer origin OriT of the replicon RK2 (Guiney et al., 1983), into the plasmid pUC19 opened with PstI. This cloning step makes it possible to obtain a vector which is transferable from E. coli to Streptomyces by conjugation.
The map of the vector pOSV001 is represented in Figure 17.
Step 2 the vector pOSV002
Insertion of the hygromycin marker (Ωhyg cassette), which is selectable in Streptomyces, such that the hygromycin-resistance gene is transferred last, thus making it possible to ensure complete transfer of the BAC with the soil DNA insert.
Cloning of the hygromycin cassette isolated from pHP45Ωhyg on a HindIII-HindIII fragment carrying the hygromycin-resistance gene. This fragment is cloned into the PstI site (position 201) of the vector pOSV001.
This PstI site was chosen, given the direction of the transfer, such that the Hygro marker is transferred last during the conjugation. The PstI and HindIII ends are made compatible after treatment with the Klenow fragment of DNA polymerase, allowing "blunt ends" to be generated. The orientation of the Ωhyg fragment is determined at the end of construction.
The map of the vector pOSV002 is represented in Figure 18.
Step 3 the vector pOSV010
The XbaI-HindIII fragment isolated from the plasmid pOSV002 and containing the hygromycin-resistance marker and the transfer origin is cloned into the plasmid pOSintl digested with XbaI and HindIII. The orientation of the sites is such that the hygromycin marker will always be transferred last.
The plasmid pOSintl, represented in Figure 8, was described in the article by Raynal et al. (Raynal A et al., Mol. Microbiol. 1998; 28:333-42).
This construct allows the expression of the integrase in E. coli and Streptomyces.
Step 4 insertion of the "cos" site
The principle is to insert a "cos" site, which allows packaging, into the plasmid pOSV010, represented in Figure 12.
The production of the "cos" fragment is represented in Figure 13.
This fragment is obtained by PCR. Starting with a fragment carrying the cohesive ends (cos) of λ (bacteriophage lambda or cosmid pHC79), a PCR amplification is carried out using oligonucleotides corresponding to the sequences -50/+130 relative to the cos site. These oligonucleotides also contain the NsiI cloning site, PstI compatible, the XhoI site, SalI compatible, and the EcoRV site for obtaining "blunt ends".
Addition of the rare SwaI and PacI sites makes it possible to isolate and/or map the insert cloned.
The PCR fragment is delimited by a PstI site at the 5' end and by a HincII site at the 3' end, allowing cloning into the vector pOSV010 (Figure 12) predigested with the enzymes NsiI and EcoRV, bringing about deletion of the lacIq repressor.
The map of the vector pOSV303 is represented in Figure 14. The vector pOSV303 contains cloning sites such as the NsiI site, PstI compatible, the XhoI site, SalI compatible, or the EcoRV site for obtaining "blunt ends".
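The compatibility of the cloning sites mentioned above (NsiI with PstI, XhoI with SalI) follows from the single-stranded overhangs the enzymes leave. The sketch below derives those overhangs from the standard recognition sequences and cut positions of these enzymes; the table layout and the function names are illustrative assumptions, not part of the described constructions.

```python
# Recognition site (top strand, 5'->3') and cut positions. The top strand is
# cut after `top_cut` bases of the site; the bottom strand is cut opposite
# the position after `bottom_cut` bases of the site.
ENZYMES = {
    "NsiI":  ("ATGCAT", 5, 1),   # ATGCA^T
    "PstI":  ("CTGCAG", 5, 1),   # CTGCA^G
    "XhoI":  ("CTCGAG", 1, 5),   # C^TCGAG
    "SalI":  ("GTCGAC", 1, 5),   # G^TCGAC
    "EcoRV": ("GATATC", 3, 3),   # GAT^ATC
}

def overhang(enzyme):
    """Return (kind, sequence): kind is "5'", "3'" or "blunt"."""
    site, top_cut, bottom_cut = ENZYMES[enzyme]
    if top_cut == bottom_cut:
        return ("blunt", "")
    kind = "5'" if top_cut < bottom_cut else "3'"
    start, end = sorted((top_cut, bottom_cut))
    return (kind, site[start:end])

def compatible(enzyme_a, enzyme_b):
    """Ends can be ligated when polarity and overhang sequence match.

    Comparing the top-strand overhangs is sufficient here because all of
    these sites are palindromic, so each overhang is self-complementary.
    """
    return overhang(enzyme_a) == overhang(enzyme_b)

for a, b in [("NsiI", "PstI"), ("XhoI", "SalI"), ("NsiI", "XhoI")]:
    print(a, overhang(a), b, overhang(b), "compatible:", compatible(a, b))
```

A fragment ligated between an NsiI end and a PstI end generally regenerates neither site, which is one reason such compatible pairs are convenient in multi-step constructions.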
5.2 Construction of the vector pOSV306
Step 1: Construction of the vector pOSV308
The vector pOSV308 was constructed according to the process illustrated in Figure 27. A 643-bp fragment containing the cos region was amplified using a pair of primers of sequences SEQ ID No. 107 and SEQ ID No. 108 from the cosmid vector pHC79 described by Hohn B and Collins (1980).
This amplified nucleotide fragment was cloned directly into the pGEM-T Easy vector sold by the company Promega, as illustrated in Figure 27, so as to produce the vector pOSV308.
Step 2: Construction of the vector pOSV306
The vector pOSV010 was constructed as described in step 3 of the construction of the vector pOSV303, in paragraph 5.1 of the present example.
The vector pOSV010 was digested with the enzymes EcoRV and NsiI in order to excise a 7874-bp fragment, which was subsequently purified, as illustrated in Figure 28.
Next, the vector pOSV308 obtained in step 1) above was digested with the enzymes EcoRV and PstI in order to excise a 617-bp fragment, which was subsequently purified.
Next, the 617-bp cos fragment obtained from the vector pOSV308 was integrated by ligation into the vector pOSV010, so as to obtain the vector pOSV306, as illustrated in Figure 28.
5.3 Construction of the vector pOSV307
The cosmid pOSV307 still contains the lacIq gene so as to improve the stability of the cosmid in Streptomyces, for example in the S17-1 strain of Streptomyces.
In order to construct the vector pOSV307, the vector pOSV010 was subjected to a digestion with the enzyme PvuII, to obtain an 8761-bp fragment which was purified and then dephosphorylated.
Next, the vector pOSV308, obtained as described in step 1) of paragraph 5.2 above, was digested with the enzyme EcoRI so as to obtain a 663-bp fragment, which was then purified and treated with the Klenow enzyme.
The nucleotide fragment thus treated was integrated into the vector pOSV010 after ligation so as to obtain the vector pOSV307, as illustrated in Figure 29.
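As a quick plausibility check on the constructions in paragraphs 5.2 and 5.3, the approximate sizes of the resulting vectors can be obtained by adding the sizes of the ligated fragments. This is a rough sketch under the assumption that ligation and end repair do not add or remove a meaningful number of bases; the resulting totals are estimates, not sizes stated in the text.

```python
def construct_size_bp(*fragment_sizes_bp):
    """Approximate size of a circular construct as the sum of its fragments."""
    return sum(fragment_sizes_bp)

# pOSV306: 7874-bp pOSV010 backbone + 617-bp cos fragment from pOSV308
posv306_bp = construct_size_bp(7874, 617)

# pOSV307: 8761-bp PvuII fragment of pOSV010 + 663-bp EcoRI cos fragment
# (Klenow treatment of the ends is ignored in this estimate)
posv307_bp = construct_size_bp(8761, 663)

print(f"pOSV306 is roughly {posv306_bp} bp")
print(f"pOSV307 is roughly {posv307_bp} bp")
```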
Example 6: Construction of the E. coli-Streptomyces replicative shuttle cosmid pOS700R
The fragments of the plasmid pEI16 (Volff et al., 1996) represented in Figure 15 were isolated and Klenow-treated. These fragments contain the sequences required for replication and stability originating from the plasmid SCP2.
These two fragments are inserted separately into the EcoRV site of the cosmid pWED1, leading to 2 different clones.
The hygromycin cassette isolated from pHP45Ωhyg on a HindIII-HindIII fragment was cloned into the HindIII site of the pWED1 cosmids containing the SCP2 insert in the form of PstI-EcoRI or XbaI fragments. It imparts hygromycin resistance which can be selected both in E. coli and in Streptomyces.
Transformation of S. lividans and determination of the transformation efficacy.
It was found that the cosmid containing the XbaI insert was less stable than that containing the PstI-EcoRI fragment. It is therefore the latter cosmid which was selected under the name pOS700R.
The map of the vector pOS700R is represented in Figure 16.
Example 7: Transformation efficacy of the integrative (pOS7001) and replicative vectors
Possibilities: To render the strain of S. lividans resistant to thiostrepton by integrating the plasmid pTO1 carrying the thiostrepton-resistance marker.
Preparation of protoplasts from S. lividans cultured in the presence of thiostrepton.
With the pOS7001 vector, the transformation efficacy is about 3000 transformants per µg of DNA.
With the vector pOS700R, the transformation efficacy is about 30,000 transformants per µg of DNA.
Example 8: Construction of a BAC vector which is integrative in Streptomyces and conjugative
Characteristics:
- Replicative in E. coli
- Transferable by conjugation of E. coli with Streptomyces
- Integrative in Streptomyces
- Selectable in E. coli and Streptomyces
- Capable of inserting large DNA fragments; it should be pointed out that it is necessary to have available soil DNA which is between 100 and 300 kb in size and which is not contaminated with small fragments. The reason for this is that the small fragments are preferentially integrated.
- Endowed with a screen for selecting plasmids carrying an insert. This screen makes it possible, by removing the vectors which are closed on themselves and which are not digested, to work with a higher ratio between the vector and the DNA to be inserted, thus making it possible to have better cloning efficacy for making libraries.
Construction: Step 1 the vector pOSV001
Cloning of an 800 base pair PstI-PstI fragment carrying the transfer origin OriT of the replicon RK2 (Guiney et al., 1983), into the plasmid pUC19 opened with PstI. This cloning step makes it possible to obtain a vector which is transferable from E. coli to Streptomyces by conjugation.
The map of the vector pOSV001 is represented in Figure 17.
Step 2 the vector pOSV002
Insertion of the hygromycin marker (Ωhyg cassette), which is selectable in Streptomyces, such that the hygromycin-resistance gene is transferred last, thus making it possible to ensure complete transfer of the BAC with the soil DNA insert.
Cloning of the hygromycin cassette isolated from pHP45Ωhyg on a HindIII-HindIII fragment carrying the hygromycin-resistance gene. This fragment is cloned into the PstI site (position 201) of the vector pOSV001.
This PstI site was chosen, given the direction of the transfer, such that the Hygro marker is transferred last during the conjugation. The PstI and HindIII ends are made compatible after treatment with the Klenow fragment of DNA polymerase for generating "blunt ends". The orientation of the Ωhyg fragment is determined at the end of construction.
The map of the vector pOSV002 is represented in Figure 18.
Step 3: the vector pOSV010
The XbaI-HindIII fragment isolated from the plasmid pOSV002 and containing the hygromycin-resistance marker and the transfer origin is cloned into the plasmid pOSint1 digested with XbaI and HindIII. The orientation of the sites is such that the hygromycin marker will always be transferred last.
The plasmid pOSint1, represented in Figure 8, was described in the article by Raynal et al. (Raynal A et al., Mol. Microbiol. 1998 28:333-42).
This construct allows the expression of the integrase in E. coli and Streptomyces.
Step 4: the vector pOSV014
Addition of a "cassette" making it possible to select, in the final construct, the plasmids which have foreign DNA inserted.
This "cassette" carries the gene encoding the λ phage cI repressor and the tetracycline-resistance gene. The tetracycline-resistance gene carries the target sequence of the repressor in its non-coding 5' region. The insertion of DNA into the HindIII site located in the coding sequence of cI leads to the non-production of the repressor and thus to the expression of tetracycline resistance.
It is carried by the plasmid pUN99 described in the article: Nilsson et al. (Nucleic Acids Res. 1983, 11:8019-30).
A PvuII-HindIII fragment isolated from pOSV010 and containing the sequences Int, attP, Hygro and oriT is cloned into the MscI site of pUN99.
The map of the vector pOSV014 is represented in Figure 19.
Step 5: the vector pOSV403, an integrative and conjugative BAC vector
This last step of cloning into pBAC11 (represented in Figure ) gives the final plasmid BAC (Bacterial Artificial Chromosome) characteristics, in particular the ability to accept very large DNA inserts.
The PstI-PstI fragment of the vector pOSV014 carrying the set of elements and functions described previously is cloned into the plasmid pBAC11 (pBeloBAC11) digested with NotI. The ends are made compatible by treatment with the Klenow enzyme.
The map of the vector pOSV403 is represented in Figure 21. The scheme of Figure 21 indicates the orientation selected.
Step 6: The vector pOSV403 contains the HindIII and NsiI sites. The NsiI site is quite rare in Streptomyces and has the advantage of being compatible with PstI. On the other hand, the PstI site is common in Streptomyces and can be used to carry out partial digestions.
The recombinant clones carrying an insert cloned into the cI repressor, and thus inactivating this repressor, become tetracycline-resistant. Given that the BACs are present only at a rate of one copy per cell, it is necessary to select the recombinant clones with a lower dose of tetracycline than the usual dose of 20 µg/ml, for example with a dose of µg/ml. Under these conditions, there is no background noise.
It is also possible to use the system developed and sold by the company InVitrogen, in which the insertion of DNA into the vector inactivates a gyrase inhibitor whose expression is toxic for E. coli. The fragment is preferentially isolated from the vector pZErO-2 (http://www.invitrogen.com/).
Example 9: Construction of an S. alboniger library in the integrative cosmid (pOS7001) and the replicative cosmid (pOS700R)
1) Construction of the library
To evaluate the efficacy of the cloning system, the puromycin biosynthetic pathway of Streptomyces alboniger was cloned into the two shuttle cosmids pOS7001 and pOS700R. The genes of the puromycin biosynthetic pathway are carried by a BamHI DNA fragment of about kb.
The genomic DNA of Streptomyces alboniger was isolated. of this DNA has a molecular weight of between 20 and 150 kb, determined by pulsed-field electrophoresis.
The two cosmids were digested with the enzyme BamHI (single cloning site).
The conditions of partial BamHI digestion of the genomic DNA were determined (50 µg of DNA and 12 units of enzyme, digestion for minutes). After checking the size by agarose gel electrophoresis, the partially digested DNA was introduced into the vectors. In the ligation, 15 µg of genomic DNA and 2 µg of the integrative vector or 5 µg of the replicative vector were used.
Each ligation mixture was used for the in vitro encapsidation of the DNA into the heads of bacteriophage lambda. The encapsidation mixtures (0.5 ml) were titrated (integrative vector pOS7001: 7.5 x 10^5 cosmids/ml; replicative vector: 5 x 10^4 cosmids/ml).
The cosmids were used to transfect E. coli and thus to generate libraries of about 25,000 ampicillin-resistant clones. The DNA from all of these clones was isolated and quantified.
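As a rough check on the titration figures above, the number of packaged cosmids available in each encapsidation mixture is simply the titre multiplied by the 0.5 ml volume. The short sketch below uses only the values quoted in the text; the function and variable names are illustrative.

```python
# Packaged cosmids per encapsidation mixture = titre (cosmids/ml) x volume (ml).
# Titres and volume are the values quoted in the text; all names are illustrative.

ENCAPSIDATION_VOLUME_ML = 0.5

TITRES_PER_ML = {
    "pOS7001 (integrative)": 7.5e5,
    "pOS700R (replicative)": 5e4,
}


def total_cosmids(titre_per_ml, volume_ml=ENCAPSIDATION_VOLUME_ML):
    """Return the total number of packaged cosmids in one mixture."""
    return titre_per_ml * volume_ml


totals = {name: total_cosmids(titre) for name, titre in TITRES_PER_ML.items()}

for name, count in totals.items():
    print(f"{name}: about {count:,.0f} packaged cosmids")
```

The figure obtained for the replicative vector is of the same order as the library size of about 25,000 clones quoted in the text.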
To test the libraries, several clones were chosen, the DNA purified and digested with BamHI, in order to check the presence and size of the inserts. The clones tested contain between 20 and 35 Kb of S. alboniger insert.
2) Identification of the clones containing the puromycin biosynthetic pathway The clones liable to contain the complete puromycin biosynthetic pathway were identified by hybridization with a probe corresponding to the puromycin-resistance gene, the 1.1 kb pac gene (Lacalle et al., Gene 1989; 79, 375-80).
Library made in the integrative vector pOS 7001: Among 2000 clones analysed, 9 clones were hybridized with the probe and they contain inserts of about 40 kb.
Library made in the replicative vector pOS 700R: Among 2000 clones analysed, 12 clones were hybridized with the probe; they contain inserts of about 40 kb.
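The hybridization counts above can be put in perspective with a short calculation. The sketch below computes the observed frequency of probe-positive clones in each library and, as a separate illustration, the classical Clarke-Carbon estimate of how many clones are needed to cover a genome with a given probability. The genome size (8,000 kb) and the mean insert size (30 kb) used for that estimate are assumptions chosen for illustration and are not figures from this document.

```python
import math


def hit_frequency(positives, screened):
    """Fraction of screened clones that hybridize with the probe."""
    return positives / screened


def clones_for_coverage(genome_kb, insert_kb, probability):
    """Clarke-Carbon estimate: N = ln(1 - P) / ln(1 - insert/genome)."""
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


# Counts quoted in the text: positives among 2000 clones analysed.
freq_integrative = hit_frequency(9, 2000)
freq_replicative = hit_frequency(12, 2000)

# ASSUMPTIONS (not from the source): genome size and mean insert size.
ASSUMED_GENOME_KB = 8000
ASSUMED_INSERT_KB = 30
n_99 = clones_for_coverage(ASSUMED_GENOME_KB, ASSUMED_INSERT_KB, 0.99)

print(f"integrative library: {freq_integrative:.2%} positive clones")
print(f"replicative library: {freq_replicative:.2%} positive clones")
print(f"clones for 99% coverage under the stated assumptions: {n_99}")
```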
Using the data published by Tercero et al. (J Biol. Chem. 1996; 271, 1579-90), the clones containing the entire biosynthetic pathway were identified, after hybridization with suitable probes. Certain integrative and replicative cosmids contain a 12,360-base pair fragment after ClaI-EcoRV digestion, which leads to the assumption of an insert containing the entire puromycin biosynthetic pathway.
4) Checking the production of puromycin by the resistant clones (Rhône-Poulenc).
a) Materials and Methods
Strains and culture conditions: Three resistant clones were selected to check the production of puromycin. They correspond to the S. lividans recombinants containing an insert in the integrative vector pOS7001 (G20) or an insert in the replicative vector (G21 and G22).
Reference strains were used to ensure that the culture media used allowed this production. They are the S. alboniger wild-type strain ATCC 12461, which produces puromycin, and the S. lividans recombinant strain containing the complete puromycin cluster cloned into the plasmid pRCP11 (Lacalle et al, 1992, the EMBO journal, 11, 785-792) (G23).
The strains were inoculated in a culture medium whose composition is as follows: Organotechnie bacteriological peptone 5 g/l of final medium Springer yeast extract Liebig meat extract Prolabo glucose Prolabo CaCO3 3 Prolabo NaCl Difco agar 1 The 3 g of carbonate are mixed with 200 ml of distilled water and then sterilized separately. The addition is carried out after sterilization.
The agar is melted beforehand in 100 ml of distilled water, after which it is added to the other ingredients of the medium.
pH adjusted to 7.2 before sterilization; sterilization for 25 minutes at 121°C. µg/l of hygromycin and 5 µg/l of thiostrepton are added to the medium after sterilization so as to maintain a selection pressure for the clones containing an insert by means of the marker gene present on the vector (the thiostrepton-resistance gene being carried by the plasmid pRCP11).
ml of liquid culture medium, distributed in 250 ml conical flasks, are inoculated with 2 ml of aqueous suspension of spores and mycelium of each of the strains. The cultures are incubated for 4 days at 28°C with stirring at 220 rpm. 50 ml of production medium, distributed in 250 ml conical flasks, are then inoculated with 2 ml of these precultures.
The production medium used is an industrial medium optimized for the production of pristinamycin (medium RPR 201). The cultures are incubated at 28°C, with stirring at 220 rpm. After different incubation times, a conical flask of each culture is brought to pH 11 and then extracted twice with 1 volume of dichloromethane. The organic phase is concentrated to dryness under reduced pressure and the extract is then taken up in 10 RI of methanol. 100 µl of the methanol solution are analysed by HPLC equipped with a diode-array detector, in a water-acetonitrile (0.05% TFA v/v) gradient system on a C18 column for the detection of puromycin.
b) Results The comparative HPLC analyses from the cultures of the various strains show the production of puromycin in the culture of the wild-type strain at and above 24 h of incubation. A production, although lower, is also clearly detected at and above 48 h in the culture of the clone containing the cosmid pOS7001 (Figure 23). Puromycin was also detected in trace amounts in the clone G23 containing the complete operon encoding the compound in the plasmid pRCP11. However, no production was observed in the cultures of the clones G21 and G22 containing the cosmid pOS700R. The results are given in Figure 23.
c) Conclusions The results obtained make it possible to demonstrate the efficacy of the cloning system developed in the cosmid pOS7001 for expressing, in a heterologous host such as S. lividans, a complete biosynthetic pathway under the control of its own regulatory sequences. Moreover, these data also validate the screening of the libraries obtained on the basis of the resistance of the clones to puromycin since it leads to the identification, among a small number of clones, of a recombinant capable of expressing the biosynthetic pathway associated with the resistance gene. The absence of puromycin production in the other clones can probably be explained by the cloning of only a portion of the operon containing the resistance gene but devoid of certain regulatory, transduction or transcription sequences necessary for the synthesis of the compound.
EXAMPLE 10 CLONING OF SOIL DNA INTO VECTORS
1) Preparation of the soil DNA to be cloned
The various DNA fragments need to be purified according to their destination:
Cosmids
The size of the molecules should be between 30 and 40 kb. However, the DNA extracted from the soil is heterogeneous in size and comprises molecules of up to 200 or 300 kb. In order to homogenize the sizes, the DNA is broken mechanically by passing the solution through a needle 0.4 mm in diameter. The fragments of a size in the region of 30 kb are not affected by these repeated passages through a needle and it is thus not necessary to carry out a separation on the basis of size, especially since the packaging in the particles automatically eliminates the short inserts.
BACs
Preparation of the DNA
The soil DNA is separated by pulsed-field electrophoresis (CHEF type) under conditions such that the fragments between 100 and 300 kb are concentrated in a band of about 5 mm. This is obtained by carrying out the migration in a gel containing 0.7% of normal agarose or 1% of agarose of low melting point with a pulsation time of 100 seconds, for 20 hours and at a temperature of 10°C.
Recovery of the DNA Two methods are used, their choice depending on the size of the molecules it is desired to isolate, either up to 150 kb or higher.
Up to 150 kb
The porosity of a 0.7% agarose gel allows the exit of the DNA by electroelution on the condition that there is total absence of ethidium bromide. This DNA is then handled with hydrophobic and enlarged-orifice pipetting instruments in order to avoid mechanical fragmentation of the molecules.
Between 100 and 300 kb
The band containing the fragments between 100 and 300 kb in size is cut out. For the migration, a gel containing 1% of agarose of low melting point is used. This property makes it possible to melt the gel at a temperature of 65°C, which can be tolerated by the DNA, and then to digest it with agarase (Agarase sold by the company Boehringer) at a temperature of 45°C according to the supplier's instructions.
2) Use of the integrative cosmid pOS7001 and the replicative cosmid pOS700R Construction with polyA polyT tails Principle A cosmid vector, opened at any cloning site, is modified at the 3' ends by adding a monotonous polynucleotide. Moreover, the DNA to be cloned is modified at the 3' ends by adding a monotonous polynucleotide which can pair up with the above polynucleotide.
The vector-fragment combination to be cloned is made with these polynucleotides and the cos sequence of the vector allows the in vitro packaging of the DNA into lambda phage capsids.
Preparation of the vector The vector used is a vector which is self-replicating in E. coli and integrative in Streptomyces.
For E. coli, the selection is made on the ampicillin resistance, and for Streptomyces, it is made on the hygromycin resistance.
The cosmid is opened at one of the 2 possible sites (BamHI or HindIII) and the 3' ends are extended with polyA with terminal transferase under the conditions in which the enzyme supplier envisages the addition of 50 to 100 nucleotides.
Preparation of the DNA to be inserted The 3' ends of the DNA are extended with polyT with terminal transferase under the conditions supplying an extension comparable to that of the vector. Under the experimental conditions described by the manufacturer, the polyA polyT tails are from 30 to 70 bases long.
Assembly of the molecules and in vitro encapsidation
For the assembly of the molecules, one vector molecule is mixed per molecule of DNA inserted. The concentration of the DNA by mass is 500 µg.ml-1. The mixture is encapsidated and the transfection efficacy depends on the strain used as recipient and the DNA inserted: zero with the test DNA and the strain DH5α, the efficacy is comparable for the SURE and strains; on extraction, the DNA yield is, however, higher with the strain
Construction by dephosphorylation
The soil DNA is given blunt ends by removal of the protruding 3' sequences and filling in of the protruding 5' sequences. This operation is carried out with: Klenow enzyme, T4 polymerase, the 4 nucleotide triphosphates. The cosmid vector is digested with BamHI and then treated with the Klenow enzyme to make the ends blunt, then dephosphorylated to prevent it from closing up on itself. After ligation, the mixture is encapsidated and transfected as described previously.
3) Use of pBACs
Principle
The conjugative and integrative plasmid pBAC contains the HindIII and NsiI sites as cloning sites. The insertion of a DNA sequence into these sites inactivates the lambda phage cI repressor which controls the expression of the tetracycline-resistance gene. Inactivation of the repressor thus makes the cell resistant to this antibiotic (5 µg.ml-1). The cloning at these sites is facilitated by modifying the vector and preparing the DNA to be cloned.
Preparation of the vector. HindIII example
In order for the vector not to close up on itself, the HindIII site is modified: the first base is reinserted to form a protruding 5' sequence, which cannot pair up with its partners. The operation is carried out with the Klenow enzyme in the presence of dATP.
The success of the operation is checked by carrying out a self-ligation of the vector before and after treatment with the Klenow enzyme. For an identical amount of test DNA, 3000 clones are obtained before treatment and 60 clones after treatment.
Preparation of the DNA (size between 100 and 300 kb) Giving the DNA blunt ends The DNA is given blunt ends by removing the protruding 3' sequences and filling in the protruding 5' sequences. This operation is carried out with: Klenow enzyme, T4 polymerase, the 4 nucleotide triphosphates.
Preparation of the ends. HindIII example
The addition of DNA to the vector is carried out by means of oligonucleotides which recognize the modified HindIII sequence of the vector. They contain rare restriction sites to allow the subsequent clonings (SwaI; NotI). This technique is derived from that of: Elledge SJ, Mulligan JT, Ramer SW, Spottswood M, Davis RW. Proc. Natl Acad. Sci. USA 1991 Mar 1;88(5):1731-5
Two complementary oligonucleotides are used:
Oligo 1: 5'-GCTTATTTAAATATTAATGCGGCCGCCCGGG-3' (SEQ ID No
Oligo 2: 5'-CCCGGGCGGCCGCATTAATATTTAAATA-3' (SEQ ID No 26)
They are phosphorylated at the 5' end with T4 polynucleotide kinase in the presence of ATP, after hybridization. This phosphorylation step can be eliminated by using already-phosphorylated oligonucleotides.
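The pairing of the two adapter oligonucleotides can be checked with a few lines of code. The sketch below, whose helper names are illustrative, verifies that Oligo 2 is the reverse complement of the 3' part of Oligo 1, derives the single-stranded 5' extension left on Oligo 1, and looks for the SwaI (ATTTAAAT) and NotI (GCGGCCGC) recognition sequences. It also compares the extension with the 5'-AGC end expected on a HindIII site (A/AGCTT) whose first base has been filled in with dATP; that expected vector end is an inference from the procedure described, not a sequence stated in the text.

```python
# Check of the adapter formed by the two oligonucleotides quoted in the text.
# Helper names are illustrative; recognition sequences are the standard ones.

OLIGO_1 = "GCTTATTTAAATATTAATGCGGCCGCCCGGG"
OLIGO_2 = "CCCGGGCGGCCGCATTAATATTTAAATA"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

RARE_SITES = {"SwaI": "ATTTAAAT", "NotI": "GCGGCCGC"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Oligo 2 should pair with the 3' part of Oligo 1.
paired_region = reverse_complement(OLIGO_2)
anneals = OLIGO_1.endswith(paired_region)

# Whatever is left unpaired at the 5' end of Oligo 1 is the cohesive end.
five_prime_extension = OLIGO_1[: len(OLIGO_1) - len(paired_region)]

# INFERRED vector end: HindIII leaves 5'-AGCT; adding one A leaves 5'-AGC.
INFERRED_VECTOR_END = "AGC"
compatible = reverse_complement(INFERRED_VECTOR_END) == five_prime_extension

sites_found = {name: site in OLIGO_1 for name, site in RARE_SITES.items()}

print("oligos anneal:", anneals)
print("5' extension on Oligo 1:", five_prime_extension)
print("pairs with inferred vector end:", compatible)
print("rare sites present:", sites_found)
```

Under this reading, the adapter has one blunt end, which can be ligated to the blunt-ended soil DNA, and one three-base cohesive end that pairs with the modified vector but not with another adapter.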
The ligation of this double-stranded adapter with the DNA to be inserted into a vector is carried out with T4 ligase in the presence of a very large excess of adapter (1000 adapter molecules per molecule of DNA to be inserted) over 15 hours at 14°C.
The excess adapter is removed by agarose gel electrophoresis and the molecules of interest are recovered from the gel by hydrolysing it with agarase or by electroelution.
Vector-DNA ligation The ligation is carried out at 14°C over 15 hours with 10 molecules of vector per insert molecule.
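The molar ratios given for the two ligations (1000 adapter molecules per insert molecule, then 10 vector molecules per insert molecule) translate into masses once lengths are fixed. The sketch below shows the conversion. The insert mass (100 ng), insert length (200 kb), adapter length (30 bp) and vector length (20 kb) are hypothetical values chosen for illustration, and the figure of 650 g/mol per base pair is the usual approximation for double-stranded DNA.

```python
# Converting the molar ratios quoted in the text into masses of DNA.
# All lengths and the insert mass below are HYPOTHETICAL illustration values.

AVERAGE_BP_MASS = 650.0  # g/mol per base pair, common approximation


def picomoles(mass_ng, length_bp):
    """Amount of double-stranded DNA in pmol for a given mass and length."""
    # ng divided by g/mol gives nmol; multiply by 1000 to obtain pmol.
    return mass_ng / (length_bp * AVERAGE_BP_MASS) * 1000.0


def partner_mass_ng(insert_mass_ng, insert_bp, partner_bp, molar_ratio):
    """Mass of partner DNA giving `molar_ratio` partner molecules per insert."""
    return insert_mass_ng * molar_ratio * partner_bp / insert_bp


INSERT_MASS_NG = 100.0   # hypothetical
INSERT_BP = 200_000      # hypothetical, within the 100-300 kb range of the text
ADAPTER_BP = 30          # hypothetical round figure for the adapter
VECTOR_BP = 20_000       # hypothetical vector size

adapter_ng = partner_mass_ng(INSERT_MASS_NG, INSERT_BP, ADAPTER_BP, 1000)
vector_ng = partner_mass_ng(INSERT_MASS_NG, INSERT_BP, VECTOR_BP, 10)

print(f"insert: {picomoles(INSERT_MASS_NG, INSERT_BP):.5f} pmol")
print(f"adapter for a 1000:1 ratio: {adapter_ng:.1f} ng")
print(f"vector for a 10:1 ratio: {vector_ng:.1f} ng")
```

Because the inserts are so long, even a 1000-fold molar excess of a short adapter represents a small mass relative to the insert DNA.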
Transformation
The recipient strain is the strain DH10B. The transformation is carried out by electroporation. To express the tetracycline resistance, the transformants are incubated at 37°C for 1 hour in antibiotic-free medium. The clones are selected by culturing overnight on gelled LB medium supplemented with 5 µg.ml-1 of tetracycline.
Example 11 CLONE-TO-CLONE CONJUGATION BETWEEN E. COLI AND STREPTOMYCES
CONJUGATION BETWEEN E. COLI STRAIN S17.1 CONTAINING PPM803 AND STREPTOMYCES LIVIDANS TK 21
Introduction
It is possible to carry out conjugations between E. coli and Streptomyces (Mazodier et al, 1989). The adaptation of this method, by developing a so-called drop technique in which 10 µl of an E. coli culture containing a recombinant vector are mixed with one drop of recipient S. lividans, consists in carrying out a clone-to-clone transformation while ensuring that, at the end of the operation, all of the library constructed in E. coli is introduced into S. lividans. A bulk transformation would necessarily lead to a multiplication of the Streptomyces transformant clones in order to be sure in practice that the library in E. coli is fully represented in S. lividans.
Furthermore, this method is easy to automate.
Preliminary tests
Conjugation between E. coli strain S17.1 containing the vector pOSV303 and S. lividans TK21.
Under these conditions, 6 x 10^6 E. coli cells are mixed with 2 x 10^6 pregerminated S. lividans spores in a final volume of 20 µl.
Development of the method
It is known that the DNA extracted from certain actinomycetes is modified and, as a result, cannot be introduced into certain strains of E. coli without it being restricted. The E. coli strain DH10B which accepts these DNAs is not capable of transferring to Streptomyces a plasmid containing only oriT, and it is thus necessary to construct such a plasmid. A derivative of RP4 should be introduced therein by integration into the chromosome, this derivative being capable of trans-supplying all the functions required to ensure the transfer of the recombinant clones containing the transfer origin oriT.
Example 12: Construction of a cosmid library in E. coli and Streptomyces lividans: Cloning of the soil DNA The object is to construct a library of large-sized environmental DNA, without a prior step of culturing the microorganisms, with the aim of gaining access to the metabolic genes of bacteria (or of any other organism) which it is not known how to culture under standard laboratory conditions.
The procedure described was used to generate a DNA library in Escherichia coli using the E. coli-S. lividans shuttle cosmid pOS7001 and DNA extracted and purified from the bacterial fraction of a soil. This last method makes it possible to obtain DNA of high purity and with an average size of 40 kb. Also, in order to avoid a partial digestion of the extracted DNA in the cloning, an alternative strategy was adopted based on the use of the terminal transferase enzyme for adding polynucleotide tails to the 3' ends of the DNA and of the vector.
µg of DNA were extracted from 60 mg of "Saint-Andre coast" soil according to the protocol described in Example 3, and were treated with terminal transferase (Pharmacia) to extend the 3' ends with a monotonous polynucleotide (poly T) (Example ). The integrative cosmid pOS7001 is prepared according to protocol B1, Orsay. After a standard step of purification in the presence of phenol/chloroform, the DNA and the vector are assembled by mixing one molecule of vector and one molecule of inserted DNA. The mixture is then encapsidated in the heads of lambda bacteriophages (Amersham kit) which serve to transfect E. coli DH10B. The cells transfected are then inoculated on LB agar medium in the presence of ampicillin for the selection of the recombinants resistant to this antibiotic.
A library of about 5000 ampicillin-resistant E. coli clones was obtained. Each clone was inoculated in LB or TB medium with ampicillin in a microplate well (96 wells) and stored at -80°C.
The sequence at the sites of insertion of the soil fragments into the vector, pOS7001, generated during the construction of the library was analysed.
For this, 17 cosmids of the libraries were purified and sequenced with a primer, seq.5' CCGCGAATTCTCATGTTTGACCG which hybridizes between the BamHI site and the Hindlll cloning site present in the vector.
The sequences obtained made it possible to estimate that the length of the homopolymeric tails at the junction points is very variable, between 13 and poly-dA/dT. Beyond the tails, the sequences of the soil fragments thus generated have a percentage of G+C of between 53 and 70%. Such high percentages were unexpected, but similar results have already been reported on crude preparations of soil DNA (Chatzinotas A. et al., 1998).
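The G+C percentage quoted above is a simple base count. The sketch below shows one way to compute it and to compare a value with the 53 to 70% range reported for the soil fragments. Since the soil sequences themselves are not reproduced here, the function is demonstrated on the sequencing primer quoted in the text, which is vector sequence and serves only as convenient input; the function names are illustrative.

```python
# G+C content of a DNA sequence, as used to characterise the soil inserts.
# Function names are illustrative; the range is the one reported in the text.

REPORTED_RANGE = (53.0, 70.0)  # percent G+C reported for the soil fragments


def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def within_reported_range(seq, bounds=REPORTED_RANGE):
    """True if the G+C content falls inside the reported range."""
    low, high = bounds
    return low <= gc_percent(seq) <= high


# Demonstration input: the sequencing primer quoted in the text (vector DNA).
SEQUENCING_PRIMER = "CCGCGAATTCTCATGTTTGACCG"

primer_gc = gc_percent(SEQUENCING_PRIMER)
print(f"primer G+C: {primer_gc:.1f}%")
print("inside the range reported for soil inserts:",
      within_reported_range(SEQUENCING_PRIMER))
```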
A strategy of "pooling" 48 or 96 clones was used to analyse the microbial and metabolic richness. The cosmid DNA extracted from these "pools" of clones was then used to carry out PCR or hybridization experiments.
Example 13: Diversity of the 16S ribosomal DNA in the cloned DNA
a) Materials and methods
The cosmids of the library are extracted from pools of clones by alkaline lysis and are then purified on a caesium chloride gradient, in order to take up the band of cosmid DNA in supercoiled form and for the purpose of eliminating any Escherichia coli chromosomal DNA which might interfere in the study.
After linearization of the cosmids by the action of S1 nuclease (50 units, 30 minutes at 37°C), the 16S rDNA sequences contained in the pools of clones are amplified under the standard amplification conditions, using the universal primers 63f (5'-CAGGCCTAACACATGCAAGTC-3') and 1387r (5'-GGGCGGWGTGTACAAGGC-3') defined by Marchesi et al. (1998). The amplification products of about 1.5 kilobases are purified using the Qiaquick gel extraction kit (Qiagen) and then cloned directly into the vector pCR II (Invitrogen) in Escherichia coli TOP10, according to the manufacturer's instructions. The insert is then amplified using the primers M13 forward and M13 reverse specific for the cloning site of the vector pCR II. The amplification products of expected size (about 1.7 kb) are analysed by RFLP (Restriction Fragment Length Polymorphism) using the enzymes CfoI, MspI and BstUI (0.1 units) in order to select the clones to be sequenced. The restriction profiles obtained are separated on MetaPhor agarose gel (FMC Products) containing 0.4 mg of ethidium bromide per ml.
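The RFLP comparison described above amounts to comparing lists of fragment lengths. The sketch below performs an in silico digest of a linear sequence with the three enzymes named in the text, using their standard recognition sequences and cut positions (CfoI GCG/C, MspI C/CGG, BstUI CG/CG). The test sequence is a made-up toy string, not a 16S rDNA sequence, and the function names are illustrative.

```python
# In silico digest of a linear DNA sequence, giving the fragment lengths
# that an RFLP gel would display. The toy sequence is HYPOTHETICAL.

# enzyme -> (recognition sequence, cut offset from the start of the site)
ENZYMES = {
    "CfoI": ("GCGC", 3),   # GCG/C
    "MspI": ("CCGG", 1),   # C/CGG
    "BstUI": ("CGCG", 2),  # CG/CG
}


def cut_positions(seq, site, offset):
    """Positions (0-based, between bases) where the enzyme cuts the top strand."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)  # allow overlapping sites
    return positions


def fragment_lengths(seq, enzyme_names):
    """Lengths of the fragments produced by digesting with the given enzymes."""
    cuts = sorted({pos
                   for name in enzyme_names
                   for pos in cut_positions(seq, *ENZYMES[name])})
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


TOY_SEQUENCE = "AAAACCGGTTTTGCGCAAAA"  # hypothetical 20-base example

print("MspI only:", fragment_lengths(TOY_SEQUENCE, ["MspI"]))
print("all three:", fragment_lengths(TOY_SEQUENCE, ["CfoI", "MspI", "BstUI"]))
```

Two clones whose amplified inserts give the same fragment lists with all enzymes would be grouped together, so that only one representative of each profile needs to be sequenced.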
The 16S rDNA sequences are then determined directly using the PCR products purified with the "Qiaquick gel extraction" kit with the aid of the sequencing primers defined by Normand (1995). The phylogenetic analyses are obtained by comparing the sequences with the prokaryotic 16S rDNA sequences collated in the Ribosomal Database Project (RDP) database, version 7.0 (Maidak et al. (1999)) by means of the SIMILARITY MATCH program, which makes it possible to obtain the similarity values relative to the database sequences.
b) Results To determine the phylogenetic diversity represented in the library, 47 sequences of the 16S rRNA gene were isolated from pools of 288 clones and were sequenced almost entirely. The results are given in Table 7.
Analysis of the sequences by interrogation of the databases reveals that most of the sequences have percentages of similarity of less than or equal to 95% with identified bacterial species (Table Out of the 47 sequences analysed, 28 sequences have non-cultured bacteria as closest neighbours, the sequences of which were obtained directly from DNA extracted from the environment. The majority of these sequences moreover have very low percentages of similarity 17 sequences out of 28 thus differing by more than 5% relative to their closest neighbours.
Among the sequences which can be classified in a phyletic group, a majority of sequences belong to the α subclass of the proteobacteria (18 sequences with a percentage of similarity of between 89 and ). A second group of sequences is represented by the γ subclass of the proteobacteria, comprising 9 sequences whose percentages of similarity range between 84 and 99%. The groups of β-proteobacteria, δ-proteobacteria, Firmicutes with a low G+C% and Firmicutes with a high G+C% comprise 1, 4, 3 and 5 sequences, respectively. Only one sequence could not be classified among the major bacterial taxonomic groups defined: the sequence a22.1(19), its closest neighbour Aerothermobacter marianas (with a similarity of 89%) itself being a strain isolated from the marine environment and not classified at the current time. Finally, 6 sequences can be classified in the group of Acidobacterium/Holophaga. This group has the particular feature of being represented by only two cultured bacteria, Acidobacterium capsulatum and Holophaga foetida, the rest of the group being composed of bacteria for which only the 16S rRNA gene has been detected by amplification and cloning using DNA extracted from an environmental sample (mainly from soil) (Ludwig et al., (1997)). The low values of similarity between the different sequences composing this group make it possible to predict great heterogeneity and diversity within this group.
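The group counts given in this paragraph can be tallied to confirm that they account for all 47 sequences analysed. The sketch below reads the counts 1, 4, 3 and 5 as applying, in that order, to the beta-proteobacteria, the delta-proteobacteria, the low G+C Firmicutes and the high G+C Firmicutes; that reading, like the dictionary keys, is an interpretation of the text rather than a table taken from it.

```python
# Tally of the 16S rDNA sequences per taxonomic group, as read from the text.
# The assignment of the counts 1, 4, 3, 5 to groups is an INTERPRETATION.

GROUP_COUNTS = {
    "alpha-proteobacteria": 18,
    "gamma-proteobacteria": 9,
    "beta-proteobacteria": 1,
    "delta-proteobacteria": 4,
    "Firmicutes, low G+C": 3,
    "Firmicutes, high G+C": 5,
    "Acidobacterium/Holophaga": 6,
    "unclassified (a22.1(19))": 1,
}

SEQUENCES_ANALYSED = 47  # number of sequences stated in the text

total = sum(GROUP_COUNTS.values())
shares = {group: 100.0 * count / total for group, count in GROUP_COUNTS.items()}

for group, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{group:28s} {GROUP_COUNTS[group]:2d}  {share:5.1f}%")
print("total:", total, "(expected", SEQUENCES_ANALYSED, ")")
```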
The set of results is represented in Table 7.
These results show that the sequences contained in the cosmid library are thought to be derived from microorganisms that are not only phylogenetically diversified but above all from microorganisms which have never been isolated to date.
The results of the sequencing of the DNAs amplified allowed the establishment of a phylogenetic tree of the organisms present in the soil sample whose characterized sequences are novel.
The phylogenetic tree represented in Figure 7 was produced from the alignment of the sequences by the MASE software (Faulkner and Jurka, 1988), with distances corrected by the Kimura 2-parameter method (1980), and with the aid of the Neighbour Joining algorithm (Saitou and Nei, 1987). The phylogenetic analysis allowed comparison of the 16S rDNA sequences cloned in the soil DNA library, with sequences of prokaryotic 16S rDNA collated in the Ribosomal Database Project (RDP) databases (version SIMILARITY-MATCH program, Maidak et al., 1999) and in the GenBank base by means of the BLAST 2.0 software (Altschul et al, 1997).
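The Kimura 2-parameter correction mentioned above converts the observed proportions of transitions (P) and transversions (Q) between two aligned sequences into an evolutionary distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The sketch below is a minimal illustration of that formula on short made-up sequences; it is not the software used by the authors, and the handling of gaps (positions with anything other than A, C, G or T are skipped) is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Positions where either sequence has a gap or an ambiguous base are
    skipped (an assumption of this sketch). math.log raises ValueError
    when the sequences are too divergent for the correction to apply.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable positions")

    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))


# Toy examples (hypothetical sequences, not 16S rDNA data).
d_identical = k2p_distance("ACGTACGT", "ACGTACGT")
d_transition = k2p_distance("AAAA", "AAAG")     # one transition in four sites
d_transversion = k2p_distance("AAAA", "AAAC")   # one transversion in four sites

print(f"identical: {d_identical:.4f}")
print(f"one transition in 4 sites: {d_transition:.4f}")
print(f"one transversion in 4 sites: {d_transversion:.4f}")
```

A matrix of such pairwise distances is the input expected by a neighbour-joining tree construction.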
Example 14: Genetic preselection of the library to evaluate the metabolic richness To characterize the library obtained in terms of metabolic diversity and to identify the clones containing inserts carrying genes which may be involved in biosynthetic pathways, genetic screening techniques based on PCR methods were developed according to the invention in order to detect and identify type I PKS genes.
1. Bacterial strains, plasmids and culture conditions
S. coelicolor ATCC101478, S. ambofaciens NRRL2420, S. lactamandurans ATCC27382, S. rimosus ATCC109610, B. subtilis ATCC6633 and B. licheniformis THE1856 (collection RPR) were used as DNA sources for the PCR experiments. S. lividans TK24 is the host strain used for the shuttle cosmid POSI700.
For the preparation of genomic DNA, suspensions of spores and protoplasts and for the transformation of S. lividans, the standard protocols described in Hopwood et al.(1986) were followed.
Escherichia coli TOP10 (INVITROGEN) was used as host for the cloning of the PCR products and E. coli Sure (STRATAGENE) was used as host for the shuttle cosmid pOS7001. The E. coli culture conditions, the preparation of plasmids, the digestion of the DNA and the agarose gel electrophoresis were carried out according to the standard procedures (Sambrook et al., 1996).
2. PCR primers
The primer pairs a1-a2 and b1-b2 were defined by the team of N. Bamas-Jacques and their use was optimized for the screening of the DNA from the pure strains and of the soil library for the investigation of genes encoding PKSI.
Table 8: PCR primers that are homologous to the PKSI genes used for screening the library.
a1 5' CCSCAGSAGCGCSTSTTSCTSGA 3'
a2 5' GTSCCSGTSCCGTGSGTSTCSA 3'
b1 5' CCSCAGSAGCGCSTSCTSCTSGA 3'
b2 5' GTSCCSGTSCCGTGSGCCTCSA 3'
Amplification conditions: For the investigation of PKS I from the DNA of pure strains, the amplification mixture contained, in a final volume of 50 µl: between 50 and 150 ng of genomic DNA, 200 µM of dNTP, 5 mM of MgCl2 final, 7% DMSO, 1x Appligene buffer, 0.4 µM of each primer and 2.5 U of Appligene Taq polymerase. The amplification conditions used are: denaturing at for 2 minutes, hybridization at 65°C for 1 minute, elongation at 72°C for 1 minute, for the first cycle, followed by 30 cycles in which the temperature is reduced to 58°C, as described in K. Seow et al., 1997. The final extension step is carried out at 72°C for 10 minutes.
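The primers of Table 8 are degenerate: the IUPAC symbol S stands for C or G, so each written primer represents a mixture of sequences. The sketch below, in which the function names are illustrative, counts how many distinct sequences each primer represents and enumerates them, which can be useful when checking a primer against a known ketosynthase sequence.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as listed in Table 8 (5' to 3').
PRIMERS = {
    "a1": "CCSCAGSAGCGCSTSTTSCTSGA",
    "a2": "GTSCCSGTSCCGTGSGTSTCSA",
    "b1": "CCSCAGSAGCGCSTSCTSCTSGA",
    "b2": "GTSCCSGTSCCGTGSGCCTCSA",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for symbol in primer:
        count *= len(IUPAC[symbol])
    return count


def expand(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = (IUPAC[symbol] for symbol in primer)
    return ["".join(bases) for bases in product(*choices)]


for name, sequence in PRIMERS.items():
    print(f"{name}: length {len(sequence)}, degeneracy {degeneracy(sequence)}")

a1_variants = expand(PRIMERS["a1"])
print("first a1 variant:", a1_variants[0])
```

Restricting the third codon position to C or G in this way reflects the GC-biased codon usage for which the primers were designed, while keeping the number of variants per primer small.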
For the investigation of PKS I from the DNA of the library, the PCR conditions are the same as above for the al-a2 pair using between 100 and 500 ng of cosmid extracted from pools of 48 clones.
For the b1-b2 primer pair, 500 ng of cosmids derived from pools of 96 clones were used. The amplification mixture contained 200 µM of dNTP, mM of MgCl2 final, 7% DMSO, 1x Qiagen buffer, 0.4 µM of each primer and 2.5 U of hot-start Taq polymerase (Qiagen). The amplification conditions used are: denaturing for 15 minutes at 95°C followed by cycles: 1 minute of denaturing at 95°C, 1 minute of hybridization at for the first cycle and 62°C for the other cycles, 1 minute of elongation at 72°C, final extension step of 10 minutes at 72°C.
The identification of the positive clones from the pools of 48 or 96 clones is carried out using replicas of the corresponding parent microplates on solid medium or any other standard replication method.
3. Subcloning and sequencing
The PCR products of the clones identified were sequenced according to the following protocol: The fragments are purified on agarose gel (gel extraction kit (Qiagen)) and cloned into E. coli TOP10 (Invitrogen) using the TOPO TA cloning kit (Invitrogen). The plasmid DNA of subclones is extracted by alkaline lysis on a Biorobot (Qiagen) and dialysed for 2 h on a 0.025 µm VS membrane (Millipore). The samples are sequenced with the "universal" and "reverse" M13 primers on the ABI 377 96 sequencer (Perkin Elmer).
4) Results
Definition and validation of the PCR primers
Two highly conserved regions of actinomycetes type I PKS, comprising the active site of the enzyme, were targeted for the amplification of homologous genes with degenerate primers. These two regions correspond to the sequences PQQR(L)(L)LE and VE(A)HGTGT, respectively.
Primers (Table 8) were tested with the DNA of strains producing or not producing macrolides: Streptomyces coelicolor, Streptomyces ambofaciens, producing spiramycin, and Saccharopolyspora erythraea, producing erythromycin. Irrespective of the primers used, bands representing fragments of about 700 bp and corresponding to the length of the expected fragment were obtained with all the strains.
These results demonstrate the specificity of the primers a and b for the PKS I genes of productive strains or of silent genes in S. coelicolor.
The sequencing of the PCR products obtained with the a1-a2 primer pair made it possible to identify, from the S. ambofaciens strain, the sequence of a KS gene already described (European patent application No. EP 0 791 656) as belonging to the pathway for the biosynthesis of platenolide, a macrolide precursor of spiramycin, and two sequences never described, Stramb9 and Stramb12 (see sequence listing).
As regards S. erythraea, the screening method allowed the identification of a KS sequence (sacery17) which is identical to that of the KS of module 1 already published in GenBank (accession number M63677), encoding synthetase 1 (DEBS1) of 6-deoxyerythronolide B.
Another sequence not correlated to the erythromycin biosynthetic pathway was identified and is the sequence SEQ ID No 32.
Conclusion
A method for analysing the presence of genes encoding type I PKSs by PCR from different microorganisms has been developed. The highly conserved structure of the type I keto-synthetase domain made it possible to produce a PCR method based on the use of degenerate primers that are GC-biased in the choice of the codons.
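As a rough illustration of what GC-biased codon choice means when a conserved peptide motif is back-translated into a degenerate primer, the following Python sketch keeps, for each amino acid, only the codons ending in G or C and collapses them into IUPAC ambiguity codes. It is a sketch under stated assumptions, not the design procedure used for the primers of Table 8: the function name and the third-position rule are hypothetical, and the motif VEAHGTGT is the second conserved region cited above with its optional residue included.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {frozenset(k): v for k, v in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}.items()}


def gc_biased_primer(peptide: str) -> str:
    """Back-translate a peptide, keeping only codons that end in G or C.

    Assumption for illustration: the bias is applied at the third codon
    position only. Collapsing per position can over-expand the codon set
    for six-codon amino acids (L, R, S).
    """
    out = []
    for aa in peptide.upper():
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        preferred = [c for c in codons if c[2] in "GC"] or codons
        for pos in range(3):
            out.append(IUPAC[frozenset(c[pos] for c in preferred)])
    return "".join(out)


if __name__ == "__main__":
    print(gc_biased_primer("VEAHGTGT"))
```

Under this rule the motif VEAHGTGT back-translates to a 24-mer whose only degenerate positions are S (G or C) at third codon positions, which is the kind of bias that favours templates from G+C-rich genomes.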
This approach shows the possibility of identifying genes or clusters involved in the biosynthetic pathway of type I polyketides. The cloning of these genes allows the creation of a collection which may then be used to construct polyketide hybrids. The same principle can be applied to other classes of antibiotics.
The results obtained here also show the presence of genes which may belong to silent clusters (SEQ ID No 30 to 32).
The presence of silent clusters has already been documented in S. lividans, and their expression is triggered by specific or pleiotropic regulators (Horinouchi et al.; Umeyama et al. 1996). These results suggest that the genes detected as belonging to so-called silent pathways in reality encode active enzymes capable of directing, in combination with the other specific enzymes of the pathway, the enzymatic steps required for the synthesis of the secondary metabolites.
Screening of the library
The screening was carried out under the conditions described in the Materials and Methods section using the primer pairs validated from productive strains.
In the presence of the a1-a2 primer pair, the size of the PCR products obtained from cosmid DNA extracted from pools of 48 or 96 clones was about 700 bp, which is thus in agreement with the expected results.
The intensity of the bands obtained was variable, but only one amplification band was present for each pool of target DNA.
Under these conditions, 8 groups of target DNA were detected, corresponding to 9 positive clones after dereplication.
The screening carried out with the second primer pair, b1-b2, gave less specific amplification results since many satellite bands were observed alongside the 700 bp band. Nevertheless, 9 groups of target DNA were detected, corresponding to 14 positive clones after dereplication. Starting with these positive clones, the DNA was extracted for the steps of sequencing and transformation of S. lividans.
Analysis of the cosmids
Digestion of the cosmids identified by PCR with the enzyme DraI, which recognizes an AT-rich site, frees a fragment greater than 23 kb (Figure 22). This suggests that the PCR method preferentially targets soil DNA containing a high percentage of G+C. This result is the consequence of the degeneracy of the primers used, which are GC-biased in the choice of the codons. The inserts, as expected in the case of cosmids, are larger than 23 kb in size, except in one case (clone a9B12), which might reflect a certain level of instability of the cosmids. Moreover, among all the clones selected, only two of them, GS.F1 and GS.G11, showed the same restriction profile, indicating a low level of redundancy in the library.
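A back-of-the-envelope calculation shows why an enzyme with an AT-only recognition site rarely cuts G+C-rich DNA. The sketch below assumes a random sequence with independent bases at a given G+C fraction and uses the DraI recognition site TTTAAA; the function name and the 70% G+C value are illustrative assumptions, not figures taken from this example.

```python
def expected_site_spacing(site: str, gc_fraction: float) -> float:
    """Mean distance (bp) between occurrences of `site` in random DNA.

    Assumes independent bases with P(G) = P(C) = gc_fraction / 2 and
    P(A) = P(T) = (1 - gc_fraction) / 2.
    """
    p_gc = gc_fraction / 2
    p_at = (1 - gc_fraction) / 2
    prob = 1.0
    for base in site.upper():
        prob *= p_gc if base in "GC" else p_at
    return 1 / prob


DRAI_SITE = "TTTAAA"

if __name__ == "__main__":
    for gc in (0.50, 0.70):
        spacing = expected_site_spacing(DRAI_SITE, gc)
        print(f"G+C = {gc:.0%}: about one DraI site every {spacing:,.0f} bp")
```

Under this simple model a DraI site is expected about every 4 kb at 50% G+C but only about every 88 kb at 70% G+C, so a cosmid insert of G+C-rich origin would often contain no DraI site at all, in line with the large fragment observed.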
The cosmids selected were transferred into Streptomyces lividans by transformation of protoplasts in the presence of PEG 1000. The transformation efficiency ranges between 30 and 1000 transformants per µg of cosmid DNA used.
Sequencing and phylogenetic analysis of the soil PKS I genes
The PCR method developed on the pure strains was used as described on the cosmids of the library and 24 clones were thus identified.
The PCR products of about 700 bp obtained from the DNA of two pools (48 clones) and of 8 unique clones were cloned, after purification on agarose gel, and sequenced. This allowed the identification of 11 sequences.
The alignment of the deduced protein sequences of soil PKSs I with other PKSs I present in different microorganisms (Figure 24) shows the presence of a highly conserved region which corresponds to the consensus region of the active site of β-ketoacyl synthetase.
Analysis of the sequences obtained with the "codon preference" method (Gribskov et al., 1984; Bibb et al., 1984) revealed the presence of a strong bias in the use of codons rich in G+C in a single reading frame.
The proteins deduced according to this reading frame show strong similarity with known type I KSs (Blast program). In particular, the similarity between the sequences of KSs from the soil and of KSs of the erythromycin cluster is about 53%.
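The idea of a G+C codon bias confined to one reading frame can be illustrated with a much simpler measure than the codon preference statistic cited above: the G+C fraction at the third codon position (GC3), computed separately for the three forward frames. The sketch below is that simplified proxy only, not the Gribskov or Bibb method; the function name and the toy sequence are hypothetical.

```python
def gc3_by_frame(seq: str) -> list[float]:
    """G+C fraction at third codon positions for forward frames 0, 1 and 2.

    Simplified proxy for a codon-usage analysis: in G+C-rich coding DNA
    the true reading frame tends to show the highest GC3 value.
    """
    seq = seq.upper()
    fractions = []
    for frame in range(3):
        # Third base of every complete codon in this frame.
        thirds = [seq[i + 2] for i in range(frame, len(seq) - 2, 3)]
        gc = sum(base in "GC" for base in thirds)
        fractions.append(gc / len(thirds) if thirds else 0.0)
    return fractions


if __name__ == "__main__":
    toy = "AAGAACAAG"  # toy sequence, G or C only at positions 3, 6 and 9
    for frame, value in enumerate(gc3_by_frame(toy)):
        print(f"frame {frame}: GC3 = {value:.2f}")
```

On real sequences the three values would be compared over a sliding window, and a frame that stays well above the other two would be the candidate coding frame.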
After dereplication of a pool and identification of the unique clone, the sequence of the PCR product obtained from this clone is identical to that of the pool, which confirms the reliability of the method used.
Analysis of the sequence of the PCR product of a clone allowed the probable identification of 3 different KSI genes. One of these sequences (SEQ ID No 34) has a similarity of 98.7% with the sequence of another pool, suggesting that they encode the same enzyme. The other two sequences are different but strongly homologous.
The cloning and identification in a soil DNA library of pathways for the biosynthesis of secondary metabolites containing genes encoding type I KSs is described here for the first time.
The high percentage of G+C in the soil sequences suggests that they may derive from genomes having a codon use similar to that of actinomycetes.
Although the data available in the literature is limited, it is known that the genes encoding type I PKSs are highly diversified on account of their physical organization in the genome, size and the number of modules contained in each gene.
The presence of several domains originating from a single clone is confirmation that they belong to asymmetric polyketide clusters. In a single case, two clones appear to form a contig since they share the same sequence for the KS domain.
The size of the genetic regions involved in PKS I synthesis ranges from a few kb for penicillin to about 120 kb for rapamycin. The size of the cosmid inserts may thus not be sufficient for the expression of the most complex clusters.
Genes encoding PKSs I, capable of working iteratively like the PKSs II and of controlling the synthesis of aromatic polyketides, have been described (Jae-Hyuk et al., 1995). The study of soil PKS I clusters may provide further novelties in this field.
Identification of 6 genes encoding polyketide synthases
On continuing the screening of the cosmid library according to the protocols described in the present example, the inventors identified a cosmid clone containing a 34071-bp insertion containing several open reading frames encoding polypeptides of the polyketide synthase type.
More specifically, the cosmid thus identified by screening the library contains six open reading frames encoding polyketide synthase polypeptides or very closely related polypeptides, the non-ribosomal peptide synthases. A detailed map of this cosmid is represented in Figure 36.
The complete nucleotide sequence of the cosmid constitutes the sequence SEQ ID No. 113 of the sequence listing. The DNA insertion contained in the sequence SEQ ID No. 113 constitutes the complementary nucleotide sequence (- strand) of the nucleotide sequence encoding the various polyketide synthases.
The nucleotide sequence of the DNA insertion contained in the cosmid in Figure 36 which comprises the open reading frames encoding the polyketide synthase polypeptides (+ strand) is represented schematically in Figure 37 and constitutes the sequence SEQ ID No. 114 of the sequence listing.
Furthermore, a detailed map of the various open reading frames contained in the DNA insertion of this cosmid is represented in Figure 37.
The characteristics of the nucleotide sequences comprising open reading frames contained in the DNA insertion of this cosmid are detailed below.
ORF1 sequence
The orf1 sequence comprises a partial open reading frame 4615 nucleotides long. This sequence constitutes the sequence SEQ ID No. 115, which starts at the nucleotide in position 1 and ends at the nucleotide in position 4615 of the sequence SEQ ID No. 114.
The sequence SEQ ID No. 115 encodes the 1537-amino acid ORF1 polypeptide, this polypeptide constituting the sequence SEQ ID No. 121.
The polypeptide of sequence SEQ ID No. 121 is related to the non-ribosomal peptide synthases. This polypeptide has a degree of amino acid identity of 37% with the peptide synthase of Anabaena sp. 90 referenced under the access number "emb CAC01604.1" in the Genbank database.
ORF2 sequence
The orf2 nucleotide sequence is 8301 nucleotides long and constitutes the sequence SEQ ID No. 116, which starts at the nucleotide in position 4633 and ends at the nucleotide in position 12933 of the sequence SEQ ID No. 114.
The ORF2 sequence encodes the 2766-amino acid ORF2 polypeptide, this polypeptide constituting the sequence SEQ ID No. 122.
The polypeptide of sequence SEQ ID No. 122 has an amino acid sequence identity of 41% with the MtaD sequence of Stigmatella aurantiaca referenced under the access number "gb AAF19812.1" from the Genbank database.
The ORF2 polypeptide constitutes a polyketide synthase.
ORF3 sequence
The orf3 nucleotide sequence is 5292 nucleotides long and constitutes the sequence SEQ ID No. 117. The sequence SEQ ID No. 117 corresponds to the sequence which starts at the nucleotide in position 12936 and which ends at the nucleotide in position 18227 of the sequence SEQ ID No. 114.
The nucleotide sequence SEQ ID No. 117 encodes the 1763-amino acid ORF3 polyketide synthase polypeptide, this polypeptide constituting the sequence SEQ ID No. 123 according to the invention.
The ORF3 polypeptide of sequence SEQ ID No. 123 has an amino acid identity of 42% with the MtaB sequence of Stigmatella aurantiaca referenced under the access number "gb AAF19810.1" from the Genbank database.
ORF4 sequence
The orf4 nucleotide sequence is 6462 nucleotides long and constitutes the sequence SEQ ID No. 118 according to the invention.
The nucleotide sequence SEQ ID No. 118 corresponds to the sequence starting at the nucleotide in position 18224 and ending at the nucleotide in position 24685 of the nucleotide sequence SEQ ID No. 114.
The nucleotide sequence SEQ ID No. 118 encodes the 2153-amino acid ORF4 polyketide synthase polypeptide, this polypeptide constituting the sequence SEQ ID No. 124 according to the invention.
The ORF4 polypeptide of sequence SEQ ID No. 124 has an amino acid sequence identity of 46% with the epoD sequence of Sorangium cellulosum referenced under the access number "gb AAF62883.1" of the Genbank database.
ORF5 sequence
The orf5 nucleotide sequence is 5088 nucleotides long and constitutes the sequence SEQ ID No. 119 according to the invention.
The sequence SEQ ID No. 119 corresponds to the sequence starting at the nucleotide in position 24682 and ending at the nucleotide in position 29769 of the nucleotide sequence SEQ ID No. 114.
The nucleotide sequence SEQ ID No. 119 encodes the 1695-amino acid ORF5 polyketide synthase polypeptide, this polypeptide constituting the sequence SEQ ID No. 125 according to the invention.
The ORF5 polyketide synthase polypeptide of sequence SEQ ID No. 125 has an amino acid identity of 43% with the epoD sequence of Sorangium cellulosum referenced under the access number "gb AAF62883.1" of the Genbank database.
ORF6 sequence
The orf6 nucleotide sequence is 4306 nucleotides long and constitutes the sequence SEQ ID No. 120 according to the invention. The nucleotide sequence SEQ ID No. 120 corresponds to the sequence starting at the nucleotide in position 29766 and ending at the nucleotide in position 34071 of the sequence SEQ ID No. 114.
The sequence SEQ ID No. 120 contains a partial open reading frame encoding the 1434-amino acid ORF6 polypeptide of the polyketide synthase type, this polypeptide constituting the sequence SEQ ID No. 126 according to the invention.
The polypeptide of sequence SEQ ID No. 126 has an amino acid identity of 43% with the epoD sequence of Sorangium cellulosum referenced under the access number "gb AAF62883.1" of the Genbank database.
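The coordinates given above for the six open reading frames can be cross-checked with a few lines of arithmetic. The following Python sketch takes the start and end positions, nucleotide lengths and polypeptide lengths exactly as stated in the text and verifies that they agree. The class and function names are hypothetical, and the rule that a complete frame has a length of 3 x (number of amino acids + 1), that is, including a stop codon, is an assumption made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int      # first nucleotide in SEQ ID No. 114 (1-based)
    end: int        # last nucleotide (inclusive)
    stated_nt: int  # nucleotide length stated in the text
    stated_aa: int  # polypeptide length stated in the text


ORFS = [
    Orf("orf1", 1, 4615, 4615, 1537),
    Orf("orf2", 4633, 12933, 8301, 2766),
    Orf("orf3", 12936, 18227, 5292, 1763),
    Orf("orf4", 18224, 24685, 6462, 2153),
    Orf("orf5", 24682, 29769, 5088, 1695),
    Orf("orf6", 29766, 34071, 4306, 1434),
]


def span(orf: Orf) -> int:
    """Length implied by the inclusive coordinates."""
    return orf.end - orf.start + 1


def is_full_frame_with_stop(orf: Orf) -> bool:
    """True when the length equals 3 x (amino acids + one stop codon)."""
    return span(orf) == 3 * (orf.stated_aa + 1)


def junctions(orfs: list[Orf]) -> list[int]:
    """Overlap (positive) or gap (negative) in nt between adjacent ORFs."""
    return [a.end - b.start + 1 for a, b in zip(orfs, orfs[1:])]


if __name__ == "__main__":
    for orf in ORFS:
        print(orf.name, span(orf), span(orf) == orf.stated_nt,
              is_full_frame_with_stop(orf))
    print(junctions(ORFS))
```

With the figures stated above, every coordinate pair matches its stated length; orf2 to orf5 fit a full frame plus stop codon, while orf1 and orf6, which the text describes as partial, do not. The orf3/orf4, orf4/orf5 and orf5/orf6 pairs overlap by 4 nucleotides each.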
EXAMPLE 15: Construction of shuttle vectors of integrative BAC type in Streptomyces
Construction of shuttle vectors of the integrative and conjugative BAC type in Streptomyces
15.1 Construction of the vector pMBD-1
The vector BAC pMBD-1 was obtained according to the following steps:
Step 1: The vector pOSV010 was subjected to a digestion with the enzymes PstI and BstZ17I in order to obtain a 6.3-kb nucleotide fragment.
Step 2: The vector pDNR-1 was digested with the enzymes PstI and PvuII in order to obtain a 4.145-kb nucleotide fragment.
Step 3: The 6.3-kb nucleotide fragment derived from the vector pOSV017 was fused by ligation with the 4.15-kb fragment derived from the vector pDNR-1, so as to produce the vector pMBD-1, as illustrated in Figure
15.2 Construction of the vector pMBD-2
The vector pMBD-2 is a vector of the BAC type containing a "φc31 int-Ωhyg" integrative box.
φc31 is a broad host spectrum temperate phage whose site of attachment (attP) is well localized. The φc31 int fragment is the minimum fragment of the actinophage φc31 capable of inducing the integration of a plasmid into the chromosome of Streptomyces lividans.
Ωhyg is a derivative of the Ω interposon capable of conferring hygromycin resistance in E. coli and S. lividans.
BAC vectors containing the φc31 integration system are described by Sosio et al. (2000) and in PCT patent application No. 99/6734 published on 29 December 1999.
The vector BAC pMBD-2 was constructed according to the following steps:
Step 1: Construction of a φc31 int-Ωhyg integrative box in an E. coli multicopy plasmid.
The φc31 int fragment was first amplified from the plasmid pOJ436 using the following pair of primers: the primer EVφc311 (SEQ ID No. 109), which allows the introduction of an EcoRV site at the 5' end of the φc31 sequence, and the primer BIφc31F (SEQ ID No. 110), which allows the introduction of a BglII site at the 3' end of the φc31 sequence.
The Ωhyg fragment was obtained by digestion, using the BamHI enzyme, of the plasmid pHP45 Ωhyg described by Blondelet-Rouault (1997).
Next, the φc31 int-Ωhyg integrative box was cloned into the vector pMCS5 digested with the enzymes BglII and EcoRV.
Step 2: Construction of the vector pMBD-2
The bacterial artificial chromosome pBACe3.6 described by Frengen et al. (1999) was digested with the enzyme NheI and then treated with the enzyme Eco polymerase.
Next, the vector pMCS5 φc31 int-Ωhyg was digested with the enzymes SnaBI and EcoRV so as to recover the integrative box.
The detailed map of the vector pMBD-2 is represented in Figure 31.
15.3 Construction of the vector pMBD-3
The vector pMBD-3 is an integrative (φc31 int) and conjugative (OriT) vector of the BAC type, which comprises the selection marker Ωhyg.
The map of the vector pMBD-3 and also the method for constructing it are illustrated in Figure 31.
The vector pMBD-3 was obtained by amplifying the OriT gene starting with the plasmid pOJ436 using the pair of primers of sequences SEQ ID No. 111 and SEQ ID No. 112, which contain PacI restriction sites.
The nucleotide fragment amplified using the primers SEQ ID No. 111 and SEQ ID No. 112 was cloned into the vector pMBD-2 predigested with the PacI enzyme. The scheme for constructing the vector pMBD-3 is illustrated in Figure 31.
15.4 Construction of the vector pMBD-4
The detailed map of the vector pMBD-4 is represented in Figure 32.
The vector pMBD-4 was obtained by cloning the φc31 int-Ωhyg integrative box into the vector pCYTAC2.
15.5 Construction of the vector pMBD-5
The scheme for constructing the vector pMBD-5 is illustrated in Figure 33.
The vector pMBD-5 was constructed by recombination of the nucleotide fragment included between the two loxP sites of the vector pMBD-1 illustrated in Figure 33 with the loxP site contained in the BAC vector designated pBTP3, a detailed map of the plasmid pBTP3 being represented in Figure 34.
15.6 Construction of the vector pMBD-6
The vector pMBD-6 was constructed by recombining the nucleotide fragment included between the two loxP sites of the vector pMBD-1 into the loxP site of the BAC pBeloBAC11 vector, as represented in Figure

TABLE 1
Location of the sampling sites and characteristics of the soils used in the various experiments. The direct microbial counts using staining with acridine orange were carried out before and after grinding the soil.
                                                      Amount of         Organic matter            Number of cells (x10^9/g dry weight of soil)
No.  Origin                     Texture               Sand  Loam  Clay  (g/kg of dry soil)   pH   Before grinding a    After grinding a
1    Australia                  Sandy clay            62    22    6     49.7                 5.8  6.5 (0.9)            2.9 (1.3)
2    Peyrat le Chateau, France  Sandy clay            61    26    13    48.2                 4.9  7.3 (0.6)            5.4 (0.8)
3    St-Andre coast, France     Sandy compost         50    41    9     40.6                 5.6  10.0 (0.7)           7.5 (1.4)
4    Chazay d'Azergue, France   Clayey sandy compost  34    47    19    13.9                 5.8  7.8 (1.1)            4.2 (0.6)
5    Guadeloupe, France         Clay                  27    26    47    17.0                 4.8  1.4 (0.4)            0.5 (0.1)
6    Dombes, France             Clayey sandy compost  20    67    13    30.3                 4.3  7.5 (0.5)            5.6 (0.9)

a n = 3; standard deviation in parentheses.

TABLE 2
Primers and probes used for the PCR amplification and the dot-blot hybridization

Primer or probe   Target a                       Sequence (5' to 3')          Reference
FGPS431 probe     Universal (1392-1406)          ACGGGCGGTGTGT(A/G)C          Amann et al., 1995
FGPS122 primer    Bacteria (6-27)                GGAGAGTTTGATCATGGCTCAG       Amann et al., 1995
FGPS350 primer    Streptosporangium (616-635)    CCTGGAGTTAAGCCCCCAAGC        This study
FGPS643 probe     Streptosporangium (122-142)    GTGAGTAACCTGCCCC(T/C)GACT    This study
R499 primer       Bacillus anthracis             TTAATTCACTTGCAACTGATGGG      Patra et al., 1996
R500 primer       Bacillus anthracis             AACGATAGCTCCTACATTTGGAG      Patra et al., 1996
0501 probe        Bacillus anthracis             TTGCTGATACGGTATAGAACCTGGC    Patra et al., 1996
FGPS516 primer    S. lividans OS48.3             TCCAGATCCTTGACCCGCAG         This study
FGPS517 primer    S. lividans OS48.3             CACGACATTGCACTCCACCG         This study
FGPS518 probe     S. lividans OS48.3             CCGTGAGCCGGATCAG             This study

a The positions on the E. coli 16S rRNA gene are given in parentheses. For B. anthracis and S. lividans, the primers and probes target chromosomal sequences specific for the respective organisms. These sequences are not located in the 16S rRNA gene. The cassette containing the target region of S. lividans is described by Clerc-Bardin et al. (unpublished).
TABLE 3
Amount of DNA extracted from different soils after lysis treatments according to protocols 1 to 5 (µg DNA/g of weight of dry soil +/- standard deviation) a. Soils 1, 2, 3 and 6: n = 3; soil 4: n = 1.
Soil Number and origin 1 2 1. Australia 17+/-2 52+/-2 2. Peyrat 29+/-2 58+/-1 3. St-Andre coast 36+/-7 60+/-6 4. Chazay 9 16 6. Dombes 26+/-3 3 32+/-I 40+/-2 148+/-1
ND
43+/-1 Lysis protocol numberb 4a S 16+/-3 29+/-2 0 94+/-7 32 61+/- 4b 33+/-2 18+/3 38+/-6 15 66+/-1 5a 59+/-1 56+/-1 73+/-5 15 160+/-7 27+/-0 15+/-1 47+/-6 102+/-5 1 60+/-7 1 02+/-S a Quantification by phosphorescence imaging after dot-blot hybridization with the universal probe FGPS431 (Table 2).
b 1: no treatment; 2: dry-grinding of the soil; 3: Cr Ultra-turrax homogenization; 4a: G H Microtip sonication; 4b: G H+ Cup Horn sonications; 5a: Cr H NT chemical/enzymatic lysis. See also Figure 1.
c ND: not determined.
Table 4: Primers and probes used in the molecular characterization of the DNAs extracted from the soil

Name       Target (primer or probe)       Sequence                          Position a
FGPS 612   Eubacteria (primer)            C(C/T)AACT(T/C/A)CGTGCCAGCAGCC    506-525
FGPS 669   Eubacteria (primer)            GACGTC(A/G)TCCCC(A/C)CCTTCCTC     1174-1194
FGPS 618   Eubacteria (probe)             ATGG(T/C)TGTCGTCAGCTCG            1056-1073
FGPS 614   α-Proteobacteria (probe)       GTGTAGAGGTGAAATTCGTAG             683-703
FGPS 615   β-Proteobacteria (probe)       CGGTGGATGATGTGGATT                939-956
FGPS 616   γ-Proteobacteria (probe)       AGGTTAAAACTCAAATGA                900-917
FGPS 621   Gram+ with low GC% (probe)     ATACGTAGGTGGCAAGCG                532-549
FGPS 617   Actinomycetes (probe)          GCCGGGGTCAACTCGGAGG               1159-1149
FGPS 680   Streptomycetes (probe)         TGAGTCCCCA(A/C/T)C(T/A)CCCCG      1132-1149
FGPS 619   Streptosporangium (probe)      GCTTGGGGCTTAACTCCAGG              609-628

a Position on the Escherichia coli 16S rRNA gene.

Table 5: Extraction efficacies of the bacterial cells on a Nycodenz gradient and amounts of DNA extracted. Effect of incubating the soil sample in a 6% yeast extract solution, prior to the dispersion and centrifugation on a density gradient.
Bacteria extracted DlNA aytrnrtt Total microflora a Culturable microflora Culturable Direct lysis Lysis on a arose bacterialg dry soil cf u/g dry soil actinomycetesc blockg.e cfu/g dry soil ng DNA/g ng DNA/g dry Without incubationdrsoloi Soil suspension 1.3 x 109( 0.1) 6.9 x 106( 0.2) 8.6 x 106 Cell extract 1.9 X10 8 4.1 x 2.5 x 106 333 35) 221 Extraction efficacy With incubation in_6%_yeast extract Soil suspension C e ll __ex tra c t_ 1 .2 x 1 0 0 1 7 .6 x 1 0 1 .1 6 .6 x 1 0 0 .4 Extraction__ ____efficacy__ 1.6 x 108(± 0.3) 5.3 x 10~6 3.7 x 1 06(± 344 30) 341 67)- _13% 17% 5% a :Counting by microscope after staining with acridine orange b Counting on 10% Trypcase-Soja solid medium c Counting on HV Agar solid medium, after enrichment for 20 minutes at 4000 in a solution of 6% yeast extract 0.05% SDS d The amount of DNA extracted was evaluated on electrophoresis gel relative to a calibration range of calf thymus DNA.
e The quantification was carried out after digesting the agarose by the action of a β-agarase.
Table 6: Characterization of the DNAs extracted as a function of proteobacteria subclasses α, β and γ, Gram+ with low GC% and actinomycetes; the hybridization signal with the prokaryotic probe serving as 100% reference.
a-Proteobacteria b-Proteobacteria g-Proteobacteria G ram+ Actinomycetes I Streptomycetes Direct extractior Indirect extracti( Lysis Block Block YE incubi a 7.7 1.4) 5.3 3.3 09) 3.1 0.8 SOl 10.9 6.4 14.3 7.9 8.5 1.4) 3.0 lysis 2.9 5.4 111.3%(±1.4) lysis ation 6.3 1.4) 7.5 1.4) 17.0 1.4) 18.1 1.4) 19.4 1.4) 4.6 1.4) a grinding in a centrifugal-force tungsten bead grinder (extraction protocol described in the article by Frostegard et al.) YE 6% yeast extract solution 153 Table 7: Diversity of the 16S rDNA sequences contained in the cosmid library Pool No.
(clone No.) at-Proteobacteria a24.1 (2) a4-a6-a7 (7) a4-a6-a7 (23) a52-a53-a5 (15) a49-a50-a51 (22) a49-a50-a51 (11) a4-a6-a7 (14) a30-a31 -a32 (7) al1 9-a20-a26 (5) a37-a38-a39 (6) a 19-a20-a26 (9) a46-a47-a48 (14) Closest neighbour identified -1 r of Closest neighbour of similarity (casfctorfrne Azospirillum brasilense Azospirillum brasilense Azospirillum brasilense Azospirillum lipoferum Agrobacterium tumnefaciens Rhizobium sp Rhizobium sp Bradyrhizobium japonicum Bradyrhizobium genosp Mesorhizobium sp.
Bradyrhizobium sp Phyllobacterium rubiacearum 97.7% 88.9% Str L-87 (α-proteobacteria) 1 89.8% 97.6% 95.0% Clone JN 15d (unpublished) 95.5% 99.7% 99.7% 99.4% 93.3% Clone DA122 (unpublished) 95.9% 98.9% 90.2% Clone S-26 (α-proteobacteria) 2 95.9% 97.6%

TABLE 7 (continued)
Diversity of the 16S rDNA sequences contained in the cosmid library

Pool No. (clone No.)   Closest neighbour identified     % of similarity   Closest neighbour (classification, reference)     % of similarity
a49-a50-a51            Caulobacter henricii             97.0%
a1-a2-a3 (13)          Caulobacter sp.                  96.3%
a52-a53-a5             Mesorhizobium mediterraneum      92.1%             Clone DA122 (unpublished)                         94.8%
a34-a35-a36            Rhodobium orientis               91.8%             Clone (unpublished)                               95.1%
a1-a2-a3               Sphingomonas sp.                 94.7%             Clone PAD23 (unpublished)                         95.1%
a8-a9-a10 (13)         Sphingomonas sp.                 94.0%
γ-Proteobacteria
a40-a41-a42 (13)       Pseudomonas sp                   98.9%             Clone G26 (γ-proteobacteria) 3                    99.7%
a15-a16-a17 (12)       Lysobacter antibioticus          94.4%             Clone vadinHA77 (γ-Proteo) 4                      93.6%
a15-a16-a17 (5)        Xanthomonas sp                   93.4%             Clone vadinHA77 (γ-Proteo) 4                      94.6%
a19-a20-a26 (13)       Luteimonas mephitis              92.9%             Strain rJl5 (unpublished)                         93.5%
a46-a47-a48            Methylobacter whittenburyi       88.3%             Soil clone S-43 (γ-Proteo) 2                      88.9%
a11-a12-a13 (11)       Methylobacter whittenburyi       88.3%             Soil clone S-43 (γ-Proteo) 2                      88.9%
a34-a35-a36            Methylococcus capsulatus         84.9%             Soil clone S-i 2 (δ-Proteo) 2                     85.6%
a43-a44-a45 (10)       Legionella birminghamensis       88.9%
a8-a9-a10              Lamprocystis roseopersicina      87.5%             Clone 2-100OC1 4 (unpublished)                    95.1%
β-Proteobacteria
a27-a28-a29            Rhodocyclus tenuis               90.2%             Clone OPB37 (β-proteo) 5                          91%
δ-Proteobacteria
a8-a9-a10 (18)         Nannocystis exedens              92.0%
a11-a12-a13            Geobacter sulfurreducens         91.5%
a27-a28-a29            Desulfoacinum infernum           88.4%             Clone S-31 (δ-Proteo) 2                           89.1%
a40-a41-a42            Desulfovibrio aminophilus        85.3%             Clone S-34 (δ-Proteo) 2                           86.2%
G+ with low GC%
a23.1                  Kurthia zopfii                   97.3%
a25.1                  Kurthia zopfii                   97.2%
a18.1 (22)             Kurthia gibsonii                 94.4%             G+ low GC% not identified RS19 (unpublished)      94.8%
Actinomycetes
a33.1                  Cellulomonas sp                  99.5%
a14.7                  Streptosporangium longisporum    99.8%
a21.7                  Arthrobacter polychromogenes     99.2%
a8-a9-a10              Arthrobacter oxydans             98.3%             Actinomycete not identified RSW1 (unpublished)    98.5%
a27-a28-a29            Arthrobacter oxydans             98.9%             Actinomycete not identified RSW1 (unpublished)    99.3%
Acidobacterium
a43-a44-a45            Holophaga foetida                87.3%             Clone 32-10 (Acidobacterium phylum) 6             95.0%
a27-a28-a29 (12)       Desulfuromonas acetexigens       88.8%             Clone Sva0515 (Acidobacterium phylum) 6           91.0%
a37-a38-a39 (12)       Desulfuromonas palmitatis        90.3%             Clone Sva0515 (Acidobacterium phylum) 6           91.5%
a37-a38-a39 (14)       Halothermothrix orenii           87.5%             Clone ii3-7 (Acidobacterium phylum) 6             93.3%
a8-a9-a10              Pelobacter carbinolicus          86.5%             Clone ii3-15 (Acidobacterium phylum) 6            92.6%
a34-a35-a36 (10)       Nitrococcus mobilis              90.6%             Clone RB43 (Acidobacterium phylum) 6              93.7%
Not classified
a22.1                  Aerothermobacter marianas        89.1%             Eubacteria not identified (unpublished)           93.4%

1 Gonzalez et al. (1996); 2 Zhou et al. (1997); 3 Pederson et al. (1996); 4 Godon et al. (1997); 5 Hugenholtz et al. (1998); 6 Ludwig (1997)

TABLE 9: Sequences

Name                                      SEQ ID No.
Probes and primers
FGPS431                                   1
FGPS122                                   2
FGPS350                                   3
FGPS643                                   4
FGPS643                                   5
R499                                      6
R500                                      7
0501                                      8
FGPS516                                   9
FGPS517                                   10
FGPS518                                   11
FGPS612                                   12
FGPS669                                   13
FGPS618                                   14
FGPS614                                   15
FGPS615                                   16
FGPS616                                   17
FGPS621                                   18
FGPS617                                   19
FGPS680                                   20
FGPS619                                   21
63f                                       22
1387r                                     23
Oligo-1 (Example 10)                      24
Oligo-2 (Example 10)                      25
A1                                        26
A2                                        27
B1                                        28
B2                                        29
PKS-I nucleic acids
Amb9                                      30
Amb12                                     31
Ery19                                     32
A9b12                                     33
A23G11-1                                  34
A26G11-2                                  35
A26G1-10                                  36
E4-16                                     37
A49F1-32                                  38
A17d2-3                                   39
A53F11-13                                 40
A53F11-14                                 41
A22A2-11                                  42
A36E8-1                                   43
A52E8-2                                   44
PKS-I amino acid sequences
Amb9                                      45
Amb12                                     46
Ery19                                     47
A9b12                                     48
A23G11-1                                  49
A26G11-2                                  50
A26G1-10                                  51
E4-16                                     52
A49F1-32                                  53
A17d2-3                                   54
A53F11-13                                 55
A53F11-14                                 56
A22A2-11                                  57
A36E8-1                                   58
A52E8-2                                   59
16S rDNA sequences
a24.1                                     60
a4.a6.a7                                  61
a52.a53.a5(15)                            62
a49.a50.a51(11)                           63
a4.a6.a7(14)                              64
a30.a31.a32(7)                            65
a37.a38.a39(6)                            66
a46.a47.a48(14)                           67
a49.a50.a51                               68
a52.a53.a5                                69
a8.a9.a10(13)                             70
a1.a2.a3(13)                              71
a43.a44.a45(10)                           72
a27.a28.a29(5)                            73
a23.1                                     74
a25.1                                     75
a18.1(22)                                 76
a33.1                                     77
a14.7                                     78
a21.7                                     79
a8.a9.a10(7)                              80
a8.a9.a10(18)                             81
a27.a28.a29                               82
a34.a35.a36(5)                            83
a22.1(19)                                 84
a11.a12.a13(5)                            85
a19.a20.a26(9)                            86
a40.a41.a42(6)                            87
a27.a28.a29(8)                            88
a27.a28.a29(12)                           89
a37.a38.a39(12)                           90
a46.a47.a48(6)                            91
a11.a12.a13(11)                           92
a15.a16.a17(12)                           93
a15.a16.a17(5)                            94
a19.a20.a26(13)                           95
a37.a38.a39(14)                           96
a8.a9.a10(9)                              97
a19.a20.a26(5)                            98
a43.a44.a45(4)                            99
a1.a2.a3(4)                               100
a4.a6.a7(23)                              101
a49.a50.a51(22)                           102
a8.a9.a10(2)                              103
a34.a35.a36(3)                            104
a34.a35.a36(10)                           105
a40.a41.a42(13)                           106
Primers
cos 1 n (Example 5)                       107
cos 2 n (Example 5)                       108
EVφc311 (Example 15)                      109
BIφc31F (Example 15)                      110
Primer 1 (Example 15)                     111
Primer 2 (Example 15)                     112
PKS-I nucleic acids
Cosmid a2641 (vector strand insertion)    113
Cosmid a2641 insertion strand             114
orf1                                      115
orf2                                      116
orf3                                      117
orf4                                      118
orf5                                      119
orf6                                      120
PKS-I amino acid sequences
ORF1                                      121
ORF2                                      122
ORF3                                      123
ORF4                                      124
ORF5                                      125
ORF6                                      126
REFERENCES
Amann, R. W. Ludwig, and Schleifer. 1995. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59:143-169.
Altschul, Madden, Schaffer, Zhang, Zhang, Miller, Lipman D.J. (1997) Gapped BLAST and PSI-BLAST: a next generation of protein database search programs. Nucleic Acids Research Vol 25: 3389-3404.
Altschul SF et al., 1990, J. Mol Biol, 215: 403-410.
Bakken, L. R. 1985. Separation and purification of bacteria from soil. Appl. Environ. Microbiol. 49:1482-1487.
Bibb MJ, Findlay PR, Johnson MW. The relationship between base composition and codon usage in bacterial genes and its use for the simple and reliable identification of protein-coding sequences. Gene 30: 1-3, 157-66, Oct, 1984.
Biesiekierska-Galguen M. (1997) "Attenuation biologique de contaminants xenobiotiques dans le sol modele lindane [Biological attenuation of xenobiotic contaminants in soil lindane model]" National DEP Diploma in Toxicology, Universite Claude Bernard Lyons I.
Blondelet-Rouault MH, Weiser J, Lebrihi A, Branny P, Pernodet JL. Institute of Genetics and Microbiology, URA CNRS 2225, Universite Paris XI, Orsay, France. Gene 1997 May 6;190(2):315-7.
Borchert S et al., 1992, Microbiology Letters, 92: 175-180.
Blondelet-Rouault, 1997, Gene, 315-317.
Boccard, Smokvina, Pernodet, Friedmann, A., Guerineau M. (1989). The integrated conjugative plasmid pSAM2 of Streptomyces ambofaciens is related to temperate bacteriophages. Embo J 8, 973-80.
Chatzinotas, Sandaa, Schönhuber, Amann, Daae, Torsvik V., Zeyer, Hahn D. (1998) Analysis of broad-scale differences in microbial community composition of two pristine forest soils. Systematic and Applied Microbiology Vol 21: 579-587.
Clegg, C. K. Ritz, and B. S. Griffiths. 1997. Direct extraction of microbial community DNA from humified upland soils. Lett. Appl. Microbiol. 25:30-33.
Clerc-Bardin, Pernodet, A. Frostegard, and P. Simonet. Development of a conditional suicide system for a Streptomyces lividans strain and its use to investigate conjugative transfer in soil. Submitted.
Elledge SJ, Mulligan JT, Ramer SW, Spottswood M, Davis RW. Department of Biochemistry, Baylor College of Medicine, Houston, TX 77030. Proc Natl Acad Sci U S A 1991 Mar 1;88(5):1731-5.
Engelen, K. Meinken, F. Von Wintzingerode, H. Heuer, Malkomes, and H. Backhaus. 1998. Monitoring impact of a pesticide treatment on bacterial soil communities by metabolic and genetic fingerprinting in addition to conventional testing procedures. Appl. Environ. Microbiol. 64:2814-2821.
Farrelly, F. A. Rainey, and E. Stackebrandt. 1995. Effect of genome size and rrn gene copy number on PCR amplification of 16S rRNA genes from a mixture of bacterial species. Appl. Environ. Microbiol. 61:2798-2801.
Faulkner, Jurka J. (1988) Multiple Aligned Sequence Editor (MASE). Trends in Biochemical Sciences Vol 13: 321-322.
Frengen et al., 1999, Genomics, 58: 250-258.
Frostegård, Tunlid, and Bååth, E. 1991. Microbial biomass measured as total lipid phosphate in soils of different organic content. J. Microbiol. Meth. 14:151-163.
Giddings, G. 1998. The release of genetically engineered micro-organisms and viruses into the environment. New Phytol. 140:173-184.
Gladek, and J. Zakrzewska. 1984. Genome size of Streptomyces. FEMS Microbiol. Lett. 24:73-76.
Gribskov M, Devereux J, Burgess RR. The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression. Nucleic Acids Res 12: 1 Pt 2, 539-49, Jan 11, 1984.
Guiney et al., 1983, Proc. Natl. Acad. Sci USA, (12) 3595-3598.
Gourmelen, Blondelet-Rouault, M.H., Pernodet, J.L. (1998). Characterization of a glycosyl transferase inactivating macrolide, encoded by gimA from Streptomyces ambofaciens. Antimicrob Agents Chemother 42, 2612-9.
Hayakawa, and H. Nonomura. 1987. Humic acid-vitamin agar, a new medium for the selective isolation of soil actinomycetes. J. Ferment. Technol. 65:501-509.
Hayakawa, Ishizawa and H. Nonomura. 1988. Distribution of rare actinomycetes in Japanese soils. J. Ferment. Technol. 66:367-373.
Hickey, R. and H. D. Tresner. 1952. A cobalt containing medium for sporulation of Streptomyces species. J. Bacteriol. 64:891-892.
Hintermann, Crameri, Kieser, and R. Hütter. 1981. Restriction analysis of the Streptomyces glaucescens genome by agarose gel electrophoresis. Arch. Microbiol. 130:218-222.
Holben, W. J. K. Jansson, B. K. Chelm, and J. M. Tiedje. 1988. DNA probe method for the detection of specific microorganisms in the soil bacterial community.
Appl. Environ. Microbiol. 54:703-711.
Hong Fu et al., 1995, Molecular diversity, 1 121-124.
Hopwood DA, Bibb M J, Chater K F, Kieser, Bruton, Kieser, Lydiate, Smith, Ward J.M. and Schrempf H. 1985. Genetic Manipulation of Streptomyces. A Laboratory manual. The John Innes Foundation, Norwich, U.K.
Hopwood, D. A., M. J. Bibb, K. F. Chater, T. Kieser, C. J. Bruton, H. M. Kieser, D. J. Lydiate, C. P. Smith, J. M. Ward, and H. Schrempf. 1985. Genetic manipulation of Streptomyces: a laboratory manual. The John Innes Foundation, Norwich, United Kingdom.
Hohn B. and Collins, 1980, Gene, 11 291-298.
Horinouchi, Malpartida, Hopwood D. and Beppu, Mol. Gen. Genet. (1989) 215:355-357.
Imai, Nagata, Fukuda, Takagi, Yano K. (1991) "Molecular cloning of a Pseudomonas paucimobilis gene encoding a 17-kilodalton polypeptide that eliminates HCl molecules from γ-hexachlorocyclohexane" Journal of Bacteriology Vol 173, No 21: 6811-6819.
Jacobsen, C. and O. F. Rasmussen. 1992. Development and application of a new method to extract bacterial DNA from soil based on separation of bacteria from soil with cation-exchange resin. Appl. Environ. Microbiol. 58:2458-2462.
Jae-Hyuk Yu and Leonard T.J., 1995. Sterigmatocystin biosynthesis in Aspergillus nidulans requires a type I polyketide synthase. J. Bacteriol. (August) 4792-4800.
Ka, J. W. E. Holben, and J. M. Tiedje. 1994. Analysis of competition in soil among 2,4-dichlorophenoxyacetic acid-degrading bacteria. Appl. Environ. Microbiol.
60:1121-1128.
Kah-Tong S et al., 1997, J Bacteriol, 179(23) 7360-7368.
Kimura M. (1980) "A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences" Journal of Molecular Evolution Vol 16 111-120.
Kuske, C. K. L. Banton, D. L. Adorada, P. C. Stark, K. K. Hill, and P. J. Jackson. 1998. Small-scale DNA sample preparation method for field PCR detection of microbial cells and spores in soil. Appl. Environ. Microbiol. 64:2463-2472.
Lacalle RA, Pulido D, Vara J, Zalacain M, Jimenez A. Centro de Biologia Molecular (CSIC-UAM), Universidad Autonoma, Canto Blanco, Madrid, Spain. Gene 1989 Jul 15;79(2):375-80.
Lee, J. Bollinger, D. Bezdicek, and A. Ogram. 1996. Estimation of the abundance of an uncultured soil bacterial strain by a competitive quantitative PCR method. Appl. Environ. Microbiol. 62:3787-3793.
Leff, L. J. R. Dana, J. V. McArthur, and L. J. Shimkets. 1995. Comparison of methods of DNA extraction from stream sediments. Appl. Environ. Microbiol. 61:1141-1143.
Liesack, and E. Stackebrandt. 1992. Occurrence of novel groups of the domain Bacteria as revealed by analysis of genetic material isolated from an Australian terrestrial environment. J. Bacteriol. 174:5072-5078.
Liesack, P. H. Janssen, F. A. Rainey, N. L. Ward-Rainey, and E. Stackebrandt. 1997. Microbial diversity in soil: the need for a combined approach using molecular and cultivation techniques. In J. D. Van Elsas, J. T. Trevors, and E. M. H. Wellington (ed.), Modern soil microbiology, Marcel Dekker, Inc., New York. (p 375-439)
Lorentz, M. and W. Wackernagel. 1994. Bacterial gene transfer by natural genetic transformation in the environment. Microbiol. Reviews 58:563-602.
Maidak, Cole, Parker, Garrity, Larsen, Li, Lilburn T.G., McCaughey, Olsen, Overbeek, Pramanik, Schmidt, Tiedje J.M., Woese C.R. (1999) A new project of the RDP (Ribosomal Database Project). Nucleic Acids Research Vol 27 171-173.
Mazodier P. et al., 1989, J. Bacteriol., 171(6) 3583-3585.
Moré, M. J. B. Herrick, M. C. Silva, W. C. Ghiorse, and E. L. Madsen. 1994. Quantitative cell lysis of indigenous microorganisms and rapid extraction of microbial DNA from sediment. Appl. Environ. Microbiol. 60:1572-1580.
Murakami T, Holt TG, Thompson CJ, Microbiological Engineering Unit, Institut Pasteur, Paris, France. J. Bacteriol 1989 Mar;171(3):1459-66.
Nagata, Hatta, Imai, Kimbara, Fukuda, Yano, Takagi M. (1993) "Purification and characterization of γ-hexachlorocyclohexane (γ-HCH) dehydrochlorinase (LinA) from Pseudomonas paucimobilis" Bioscience, Biotechnology and Biochemistry Vol 57 No 9 1582-1583.
Nalin, Simonet, Vogel, Normand P. (1999) "Rhodanobacter lindaniclasticus gen. nov., sp. nov., a lindane-degrading bacterium" International Journal of Systematic Bacteriology Vol 49 19-23.
Nesme, C. Picard, and P. Simonet. 1995. Specific DNA sequences for detection of soil bacteria. In J. T. Trevors, and J. D. van Elsas (ed.), Nucleic acids in the environment, methods and application. Springer Lab Manual. (p 111-139)
Nilsson B, Uhlen M, Josephson S, Gatenbeck S, Philipson L. Nucleic Acids Res 1983 Nov 25;11(22):8019-30.
Normand P. et al., 1995, Oceanis, 21(1): 31-56.
Ogram, A. M. L. Mathot, J. B. Harsh, J. Boyle, and C. A. Pettigrew, Jr. 1994. Effects of DNA polymer length on its adsorption to soils. Appl. Environ. Microbiol. 60:393-396.
Ogram, G. S. Sayler, and T. Barkay. 1987. The extraction and purification of microbial DNA from sediments. J. Microbiol. Methods 7:57-66.
Olsen, R. and Bakken, L. R. 1987. Viability of soil bacteria: optimization of the plate-counting technique. Microb. Ecol. 13:59-74.
Paget, L. Jocteur Monrozier, and P. Simonet. 1992. Adsorption of DNA on clay minerals: protection against DNase I and influence on gene transfer. FEMS Microbiol. Lett. 97:31-40.
Patra, P. Sylvestre, V. Ramisse, J. Therasse, and Guesdon. 1996. Isolation of a specific chromosomic DNA sequence of Bacillus anthracis and its possible use in diagnosis. FEMS Immunol. Medical Microbiology 15:223-231.
Pernodet J.L., Fish, S., Blondelet-Rouault, M.H., Cundliffe, E. (1996). The macrolide-lincosamide-streptogramin B resistance phenotypes characterized by using a specifically deleted, antibiotic-sensitive strain of Streptomyces lividans. Antimicrob Agents Chemother 40, 581.
Pernodet J.L., Gourmelen, Blondelet-Rouault, M.H., Cundliffe, E. (1999). Dispensable ribosomal resistance to spiramycin conferred by srmA in the spiramycin producer Streptomyces ambofaciens. Microbiology 145, 2355-64.
Picard, C. Ponsonnet, X. Nesme, and P. Simonet. 1992. Detection and enumeration of bacteria in soil by direct DNA extraction and polymerase chain reaction.
Appl. Environ. Microbiol. 58:2717-2722.
Preud'homme, Belloc, Charpentié, and Tarridec, P. 1965. Un antibiotique formé de deux groupes de composants à synergie d'action: la pristinamycine [An antibiotic formed from two groups of components with synergistic action: pristinamycin]. C. R. Acad. Sci. 260:1309-1312.
Priemé, J. I. B. Sitaula, A. K. Klemedtsson, and L. R. Bakken. 1996. Extraction of methane-oxidizing bacteria from soil particles. FEMS Microbiol. Ecol. 21:59-68.
Prosser, J. 1994. Molecular marker systems for detection of genetically engineered micro-organisms in the environment. Microbiol. 140:5-17.
Raynal A, Tuphile K, Gerbaud C, Luther T, Guerineau M, Pernodet JL; Laboratory of Biology and Molecular Genetics, Institute of Genetics and Microbiology, URA CNRS 2225, Universite Paris-Sud, Orsay, France. Mol Microbiol 1998 Apr;28(2):333-42.
Raynal A., Tuphile, K., Gerbaud, Luther, T., Guerineau, M., Pernodet, J.L. (1998). Structure of the chromosomal insertion site for pSAM2: functional analysis in Escherichia coli. Mol. Microbiol 28, 333-42.
Richard, G. M. 1974. Modifications of the diphenylamine reaction giving increased sensitivity and simplicity in the estimation of DNA. Analytical Biochem. 57:369-376.
Romanowski, M. G. Lorentz, and W. Wackernagel. 1993. Use of polymerase chain reaction and electroporation of Escherichia coli to monitor the persistence of extracellular plasmid DNA introduced into natural soils. Appl. Environ. Microbiol.
59:3438-3446.
Saitou, Nei M. (1987) "The Neighbour-Joining method: a new method for reconstructing phylogenetic trees" Molecular and Biological Evolution Vol 2 112-118.
Sambrook, Fritsch E. F. and Maniatis T. 1996. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
Sambrook, E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N. Y.
Senoo, Wada H. (1989) "Isolation and identification of an aerobic γ-HCH-decomposing bacterium from soil" Soil Science and Plant Nutrition Vol 35, No 1 79-87.
Sezonov, Blanc, Bamas-Jacques, Friedmann, A., Pernodet, J.L., Guerineau, M. (1997). Complete conversion of antibiotic precursor to pristinamycin IIA by overexpression of Streptomyces pristinaespiralis biosynthetic genes. Nat Biotechnol 15, 349-53.
Shirling, E. and D. Gottlieb. 1966. Methods for characterization of Streptomyces species. Int. J. Syst. Bacteriol. 16:313-340.
Shizuya et al., 1992, Proc. Natl. Acad. Sci USA, 89 8794-8797.
Siefert, J. and G. E. Fox. 1998. Phylogenetic mapping of bacterial morphology.
Microbiology 144:2803-2808.
Simonet, P. Normand, A. Moiroud, and R. Bardin. 1990. Identification of Frankia strains in nodules by hybridization of polymerase chain reaction products with strain-specific oligonucleotide probes. Arch. Microbiol. 153:235-240.
Smalla, N. Cresswell, L. Mendonca-Hagler, A. Wolters, and D. J. van Elsas.
1993. Rapid DNA extraction protocol from soil for polymerase chain reaction-mediated amplification. J. Appl. Bacteriol. 74:78-85.
Sosio M. et al., 2000, Nature Biotechnology, vol 18 343-345.
Smit, P. Leeflang, and K. Wernars. 1997. Detection of shifts in microbial community structure and diversity in soil caused by copper contamination using amplified ribosomal DNA restriction analysis. FEMS Microbiol. Ecol. 23:249-261.
Smokvina T, Mazodier P, Boccard F, Thompson CJ, Guerineau M. Laboratory of Biology and Molecular Genetics, Universite Paris-Sud, Orsay, France. Gene 1990 Sep 28;94(1):53-9.
Smokvina, Mazodier, P., Boccard, F., Thompson, C.J., Guerineau, M. (1990). Construction of a series of pSAM2-based integrative vectors for use in actinomycetes. Gene 94, 53-9.
Stackebrandt, E. 1988. Phylogenetic relationships vs. phenotypic diversity: how to achieve a phylogenetic classification system of the eubacteria. Can. J. Microbiol.
34:552-556.
Staneck, J. and G. D. Roberts. 1974. Simplified approach to identification of aerobic Actinomycetes by thin-layer chromatography. Appl. Microbiol. 28:226-231.
Stapleton, R. S. Ripp, L. Jimenez, S. Cheol-Koh, J. T. Fleming, I. R. Gregory, and G. S. Sayler. 1998. Nucleic acid analytical approaches in bioremediation: site assessment and characterization. J. Microbiol. Methods 32:165-178.
Steffan, R. J. Goksoyr, A. K. Bej, and R. Atlas. 1988. Recovery of DNA from soils and sediments. Appl. Environ. Microbiol. 54:2908-2915.
Tebbe, C. and W. Vahjen. 1993. Interference of humic acids and DNA extracted directly from soil in detection and transformation of recombinant DNA from bacteria and a yeast. Appl. Environ. Microbiol. 59:2657-2665.
Tercero JA, Espinosa JC, Lacalle RA, Jimenez A. Centro de Biologia Molecular Severo Ochoa, Consejo Superior de Investigaciones Cientificas, Madrid, Spain. J Biol Chem 1996 Jan 19;271(3):1579-90.
Thomas, Berger, Jacquier, Bernillon, Baud-Grasset, Truffaut N., Normand, Vogel, Simonet P. (1996) "Isolation and characterisation of a novel γ-hexachlorocyclohexane-degrading bacterium" Journal of Bacteriology Vol 178, 6049-6055.
Torsvik, V. L. 1980. Isolation of bacterial DNA from soil. Soil Biol. Biochem. 12:15-21.
Torsvik, R. Sorheim, and J. Goksoyr. 1996. Total bacterial diversity in soil and sediment communities: a review. J. Ind. Microbiol. 17:170-178.
Tsai, and B. Olson. 1991. Rapid method for direct extraction of DNA from soil and sediments. Appl. Environ. Microbiol. 57:1070-1074.
Umeyama, Tanabe, Aigle B.D. and Horinouchi, FEMS (1996) 144:177-184.
Van Elsas, J. G. F. Duarte, A. S. Rosado, and K. Smalla. 1998. Microbiological and molecular biological methods for monitoring microbial inoculants and their effects in the soil environment. J. Microbiol. Methods 32:133-154.
Van Elsas, J. V. Mantynen, and A. C. Wolters. 1997. Soil DNA extraction and assessment of the fate of Mycobacterium chlorophenolicum strain PCP-1 in different soils by 16S ribosomal RNA gene sequence based most-probable-number PCR and immunofluorescence. Biol. Fert. Soils 24:188-195.
Volff JN et al., 1996, Mol. Microbiol., 21(5): 1037-1047.
Volossiouk, E. J. Robb, and R. N. Nazar. 1995. Direct DNA extraction for PCR-mediated assays. Appl. Environ. Microbiol. 61:3972-3976.
Wahl GM, Lewis KA, Ruiz JC, Rothenberg B, Zhao J, Evans GA. Proc Natl Acad Sci U S A 1987 Apr;84(8):2160-4.
Waksman, S. A. 1961. The actinomycetes. Classification, identification and description of genera and species. Vol 2. Williams and Wilkins, Baltimore.
Ward, D. R. Weller, and M. M. Bateson. 1990. 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature 344:63-65.
Widmer, R. J. Seidler, and L. S. Watrud. 1996. Sensitive detection of transgenic plant marker gene persistence in soil microcosms. Mol. Ecol. 5:603-613.
Williams, R. Locci, A. Beswick, D. I. Kurtböke, V. D. Kuznetsov, F. J. Le Monnier, P. F. Long, K. A. Maycroft, R. A. Palma, B. Petrolini, S. Quaroni, J. I. Todd, and M. West. 1993. Detection and identification of novel actinomycetes. Res. Microbiol. 144:653-656.
Wilson, I. G. 1997. Inhibition and facilitation of nucleic acid amplification. Appl. Environ. Microbiol. 63:3741-3751.
Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221-271.
Yanisch-Perron et al., 1985, Gene, 33(1): 103-119.
Zaslavsky, B. Y. 1995. Separation of biomolecules, p. 503-667. In Aqueous two-phase partitioning. Boris Y. Zaslavsky Physical Chemistry and Bioanalytical Applications, Marcel Dekker, Inc., New York.
Zhou, M. A. Bruns, and J. M. Tiedje. 1996. DNA recovery from soils of diverse composition. Appl. Environ. Microbiol. 62:316-322.
EDITORIAL NOTE
APPLICATION NUMBER 21791/01
The following Sequence Listing pages 1-131 are part of the description. The claims pages follow on pages 171-175.
WO 01/40497 1 PCT/FR00/03311
SEQUENCE LISTING
<110> Aventis Pharma S.A.
<120> Process for obtaining nucleic acids from a soil sample, nucleic acids thus obtained and their application to the synthesis of compounds
<130> Soil DNA library RPR S.A.
<140> <141> <150> FR9915032 <151> 1999-11-29 <160> 126 <170> PatentIn Ver. 2.1 <210> 1 <211> <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:probe FGPS431 <220> <221> variation <222> (14) <223> Base A replaced with G <400> 1 acgggcggtg tgtac <210> 2 <211> 22 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer FGPS122 <400> 2 ggagagtttg atcatggctc ag 22 <210> 3 <211> <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer FGPS350 <400> 3 cctggagtta agccccaagc <210> 4 <211> 24 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:probe FGPS643 <220> <221> variation <222> <223> T replaced with C <400> 4 gtgagtnnna acctgcccct gact 24 <210> <211> 21 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:probe FGPS643-2 <400> gtgagtaacc tgcccccgac t 21 <210> 6 <211> 23 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer R499 <400> 6 ttaattcact tgcaactgat ggg 23 <210> 7 <211> 23 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer R500 <400> 7 aacgatagct cctacatttg gag 23 <210> 8 <211> <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:probe C501 <400> 8 ttgctgatac ggtatagaac ctggc <210> 9 <211> <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer FGPS516 <400> 9 tccagatcct tgacccgcag <210> <211> <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer FGPS517 <400> cacgacattg cactccaccg <210> 11 <211> 16 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:probe FGPS518 <400> 11 ccgtgagccg gatcag 16 <210> 12 <211> <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:FGPS612 <220> <221> variation <222> (2) <223> Base C replaced 
with T <220> <221> variation <222> (7) <223> Base T replaced with C <220> <221> variation <222> (7) <223> Base T replaced with A <400> 12 ccaacttcgt gccagcagcc <210> 13 <211> 21 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:FGPS669 <220> <221> variation <222> (7) <223> Base A replaced with G <220> <221> variation <222> (13) <223> Base A replaced with C <400> 13 gacgtcatcc ccaccttcct c 21 <210> 14 <211> 18 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:FGPS618 <220> <221> variation <222> <223> Base T replaced with C <400> 14 atggttgtcg tcagctcg 18 <210> <211> 21 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:FGPS614 <400> gtgtagaagt gaaattcgat t 21 <210> 16 <211> 18 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:FGPS615 <400> 16 cggtggatga tgtggatt 18 <210> 17 <211> 18 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:FGPS616 <400> 17 aggttaaaac tcaaatga 18 <210> 18 <211> 18 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:FGPS621 <400> 18 atacgtaggt ggcaagcg 18 <210> 19 <211> 19 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:FGPS617 <400> 19 gccggggtca actcggagg 19 <210> <211> 18 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:FGPS680 <220> <221> variation <222> (11) <223> Base A replaced with C <220> <221> variation <222> (11) <223> Base A replaced with T <220> <221> variation <222> (13) <223> Base T replaced with A <400> tgagtcccca actccccg 18 <210> 21 <211> <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:FGPS619 <400> 21 gcttggggct taactccagg 6 <210> 22 <211> 21 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer 63f <400> 22 caggcctaac acatgcaagt c 21 <210> 23 <211> 18 
<212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer 1387r <400> 23 gggcggngtg tacaaggc 18 <210> 24 <211> <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:oligo-1 <400> 24 gcttatttaa atattaagcg gccgcccggg <210> <211> 28 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:oligo-2 <400> cccgggcggc cgcattaata tttaaata 28 <210> 26 <211> 23 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer al <400> 26 ccncagnagc gcntnttnct nga 23 <210> 27 <211> 22 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer a2 <400> 27 gtnccngtnc cqtgngtntc na <210> 28 <211> 23 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer bi <400> 28 ccncagnaqc gcntnctnct nga <210> 29 <211> 22 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence:primer b2 <400> 29 gtnccngtnc cgtgngcctc na <210> <211> 672 <212> DNA <213> Streptomyces ambofaciens <400> ccccagcagc ccgcgcgcgg ccggtggtgt gcggcggcg accgtcgaca cggcgcggcq gcgttcaccg tcggccgacg ctggcqgacg aaccaggacg atccggcagg cacgqcaccg acgtgttcct tacgcgqtcg tqqccggatc tgctgtccgg cggcgtgttc agtgcgatct agttcgcccg ccgacggcac cccgccgcaa gcgcct ccaa cactqgcgga gc cgaqacggtg ttccgtcggg cgccgacgag ccggqt ctcg gtcgtcgctg ggcactcgcc gcagggcggc gqgctggggc cgggcaccgg cggtctgacg cgcccggctg tggagacct atgttcgtcg ggcctggacq tacgccttcg gtggcccttc ggcggtgtgt ctggccgacg gagggcgtcg gccct cgcqc gcacccaacg tcgccgtcgg tcgaatccgc gcaccaacgg cccacgcggc gcctggaagg acctggccgc cggagatgtc acggccgctg gcgtcctgct tggtacqgg gcccgtccca aggtcgacgc cgqaqtggac acaggactac caccggtaac gccggcggtc gcaggcgctg caccgaggcg caaggccttc ggtggagcgg cagcgcggtc gcagcgagtc ggtcgagacc 120 180 240 300 360 420 480 540 600 660 672 <210> 31 <211> 665 <212> DNA <213> Streptomyces ambofaciens <400> 31 ccccagcagc atgcgcaccc ccgtcggtgg gttctctccg 
acggcctgct.
gaqtgcgaac qgcttcgqcc qccgacggca gcccqgcgcc ggtgcctccg gccctggcqa gcgtgttcct tgcgcggtgg tcgaccccga gccgcatcqc cctcgtccct tcgccctggt tggaaqacgg ccggctgggg acgggcaccg gcgqcctcac gcgcggcact ggaagcgtcc acqcaccggc agcgctcgac ctacaccttc ggtggcgctg cggtggggtc ctctgccgcc cgagqgtgtc gqtgctqqcc cgcccccaac cgtaccggcc tgggaggcgg gtcttcgccg ggctacctgg gggcttcagg cacctcgccg acggtcatgt gacggccgct ggtqtgctgc qtggtgcgcg ggacctgccc gaggtcgacg tcgagcgggc gcgtgatgta qcacggccaa gaccggcggt cccaggcgct ccggcccgat gcaaggcgtt tggtggagcg gtagcgcggt agcaqcgcgt cggtcgaqac aggcatcgac ccacgactac cqccggcagc caccgtgqac gcccgccggc gatgttcgcg cgccgccgcc gctqtcqgac caaccaggac cat ccgt cag ccacggcacc 120 180 240 300 360 420 480 540 600 660 gggac 665 <210> 32 <211> 671 <212> DNA <213> Saccharopolyspora erythraea <400> 32 ccgcaggagc ccgcacagcc gcgcagttcg ctgagcatca ctgaacaccg ctgggcgaat atggtcgcca tctcgcgcga tcgcgcgctc aacgacggct cgcgacgcct gggaccggca gcgtgttcct tcagggacag cagccggagc tcccggcCag cgtgctcatc cctcggtcgc tgtcgcggtt acggctacgt tggccgatgg tcagcaatgg acgccaacgC c ggaactcgct ccggacgggc cgtcgaccgc qatcgcctac ggctttggtg cttggtcggc cqgagcgatg gcgcggcgaa caacccggtC ccttaccgcg cggggtcgat tgggaagCaC gtgttcttcg atcacccagc ttcctgggct gccatgcacc gggatcagct gccccggacg ggcggcggtg tactgcgtcc ccgagccCgg ccggcacagg ttgataacgc gagctatgtg acaccgcgaC t gcgcqgccc aggcacgcca tgttggtcgc gccggtqcaa tcgtggtgct tgcgcggcag cggcgcagga tcgactacgt gggcatcgca gcacggctac cgggcacgac qgacatgacC aagcatcctg gctggacagc ggcattcgac caaaccgctg cgcggtcaac gcaggtactg cgagacccac <210> 33 <211> 686 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soiI organism <400> 33 ccgcaggagc actgcacgct ctgaccaatc ctgagcaacg ccggcgatgg caggcgctgc cccct ccggg cgcccgt ttg ctcaagcggc acggcggtca cagagccggg tacgtcgaca gcgtgttcct accccggccg tcatgaacaa acaaggactt ccgtcggcac ggctgggcga agggctacct acgccgacgc tggacgaagc acaacgacgg tcgtgcggga cgcacggcac cgagtcgtgc catcgggctg ccgcgcctt t cat cgccacc cgcctgttcc gtgtgacatg ctaccaggaa 
cgatggcacg gctccgggac ctctgtcaag cgccctgcgg cggcac tgggaggcgc tgggccggCg ttagagagcg cgcacggctt acatcgctgg gcactggccg ggcatgatta gtgctgggca ggtgacacgg atcgggttca gcggccgcg tggagcatgc cgggcttcaa tgggcatgta acaagttaaa tggcggttca gtgctgcgtc tgagccgtga atggcgtggc tctacgccgt cggcgcccag tcccggcgga tggatacgat cagctacctc ccagatcttt cctgcgcggt cgaagcttgc tgtcagcacg cggcgtctgc ggtcgtggtg gattcgtggc cgccgagggg gagcgtgacc 120 180 240 300 360 420 480 540 600 660 686 <210> 34 <211> 689 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 34 ccccagcagc gcgcgaagct ctgaacaacc ctgacggcca gggcccagcc tc acaqct gttccgcagt tgccgcgcct gtcctcaagc ggagcggcta ggtcagacgc gcctgttcct ataagggttC tcgccaccgc acgacaagga tgacggttca tgcaqcgcg ccgtggggta tcgatgagtc gcttgagccg ttaataatga gattgattcg cgagtgcgezg gatcggcgtt ggagccgttc tttcctggcc gacggcgtgc cgcctcggac cctgcaccag cgctcaaggc cgctctggcc tggcgccgag gcgcactcaa tgggaagcga t1tcgcgggat gatttctcac acgcgtgtct tccacctcgc attqccttgg ccgggcatga acggtgccgg gatggcgaca cgcatggggt gagatggcgg tggagaacgZ gcggcgtcaa gcccctccgc cttacaagct tggtgtcggt ccgggggagt tcctgtcgcc gcaacggcgc cgatctacgc ttaccgctcc gcgtgaagcc gggatatgcg tacctacctg gtaccagctg gaacctccgc ggtgatggca tgccatcaat cgacgggcgc gggtgtggtC cgtcattcgc aggtgtggac ggagtccatc 120 180 2 300 360 420 480 540 600 660 ggctacatgg acacccacgg caccggcac <210> <211> 671 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> ccgcagcagc cccgacagt c agccggctga ttcagcactg gtcgacacgg t cgcgagagt acgatctact gcctccgccg tccgatgcga cacggcggcc cgggcggcgc ggcaccggca gcctcttcct tcgcgggcag aacctaccga ccgccggacg cgtgctcttc gcagcatggc tctgccgcct acggttacgg cgcgtgacgq gcagcaacgg tcaagaacgc cgaggtggca cgacaccgga tccggcgctc gatctcctat ctcactcgtg gctggccggc gcgggccatg ccgcggcgag cgatcgtatt cctcacggcg cggcatggcc tgggaagctt gtgttcatcg attgacgcct ctgctggggt gcggttcatc ggcgtgaacc gcggccgatq ggatgcggaa ctggcgctga ccgaacggtc cccgccgatg tggagcgtgc ggatcagcac ataccggtac 
tgcagggacc tggcgtgccg tgattctggc gccgttgcaa tgctggtgct ttcgcggatc cggcgcagga tcgattacgt gggtcggccg cgacgactac cggaaccgcg gaacttcccc cagcttgcag gccggaaagc aagtttcgct gaagcggctg gqccgtcaac agccgtgatt ggacacccac 120 180 240 300 360 420 480 540 600 660 671 <210> 36 <211> 758 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 36 ccgcaggagc tccccccgcg gaagcggcag tttgccggct ggagcttggt ctcgacctgc qtccatctgg gtgaacttga cccgacggac tgcggcatcg gcagtcatcc aatctgcagg tcccacgtat gcgtcttcct aagctctgaa aggacgccgg cctgcgccca cgggttccgg gcggtccgag cttgccaaag tcctgactcc gctgcaagac tgctgctgaa gcggctcggc cgcagaaggc cgttgatcga cgaacgcatt catggatccg catctctccc ggacttcgga cgtggcgcat catggcggtc cctgcgccgg cgagggcatg gttcgacgcc gcggctctcC aatcaatcag ggtcctgcaa cacgcacggc gacggtttcg cagcagcggc ggccctctgg ctgtttcagt agcatgttgg gatacggcct cgcgaatgcg atcgctttgt gcagccgacg gatgcgctgg gacggacgga gaggcggtgg accggcac atgcggaatt tgctgctgga cgggcagcgc acgccgaccc ccaatcgcat gctcctccgc atgcggcatt cgaaggctcg gttatgtgcq ccgatggcga gcaatggcat ccaacgcgca cttcggcatc aqtgtgctgg gaccggcgtc tgcccgcatc ctcctatctg gctcgtcgCC cgccggcgga catgttggcg cgqcgagqgc tqccatctgt cacggcgccg catcgatcca 120 180 240 300 360 420 480 540 600 660 720 758 <210> 37 <211> 704 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 37 ccgcagcagc cccgaaaaat ctt tat aacc ggcgagtacc aaattgaacc gccgtttgtc ggcatctcga gcgtgttcct atcccggcct tcgcgcacaa agacgatcct tgcgcggccc aggccattca tttcqtttcc cgagtgcgcc gatcggagtt ccgggaattc cggaaacgac cagcctggcc aaatctgcag gcaaaagcgc tgggaggcgg ttcgccgggg gtcgcccgca aaggactacc gtgcagtccg acttatcagt gactaccgct tggaaagcgc ccagcatcaa tggcggggga tccccactcg cctgctcgac gcgatatggc tcaccgacqa gggctacgat cagctatttc gtaccaagtg cgtctcctac cggcctcgtc cctcgcgggC aggaatggtc 120 180 240 300 360 420 tctcgcgacg gtcactgccg cccgttcgac gccagcgcgc aaggcacggt cttcggcaac ggggccggcg tcgtcctgat gaaaagattg gccgacgcag tgaccgatcg ggacacgatc ctcgccgtga ttaggggcgc 
tgccgtgaac aacgacggcg gcgtcaaaat gggttacacg gcgcccagtg ccgaaggtca ggcggaggcc atcaccctgg ccctcgcgct cgctggcgtc agcccggaga ccatcacttg catggacacc cacggcaccg gcac <210> 38 <211> 680 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 38 ccccaqcagc gggcgacacc cttctgaacc gtcgccaacg ccgqccatga caaagcctcc tcgcatggat ttcgatgcgg cgtctcgaag atcaacaatg ttggtcatca gacacccacg gcgtgttcct ttccacggtg tgcatgccaa acaaggattt cagtgcagac tagcgggcga atgtggcgcg atgccggcgg atgcgcttgc atggcgcgct qcgaggccca gcaccgggac cgaatgcgcc tccatcggcg tgccgcggtg tctggcgacg ggcctgctcc atgcgatatt cgaaqgtgga aaccgttcca agacggcgat gaaggcgagc tgcagctgcc tgggcggcgc gtctatgcct cqccaatcga cgcacggctt tcatcgttgg gcgctcgcgg a tattgtct C ggcagcggcg acgatcgacg tttaccgcac ggaatatcgg tggagcgccg caagcggctt tcagcccgtt acaagctcaa ttgccgttca gcqgcatcac ctgacgggca tcggcgttgt ccgtcatcat cgcaggtgga ccgattccat ccggatatca taacacctat tgaactgttc tctgcgcggc tgtcgccgcg ggtttcccgt ttgccgggcg cgtgctcaag cggttcggcc cagccaggcc cggttatatg 120 180 240 300 360 420 480 540 600 660 680 <210> 39 <211> 671 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 39 ccgcagcagc ccgtccacga ggccacaagt ctggcggtcg gtagacacgg t cggggcgga ttcatcgcct gccaaggccg gcgcatgcgc t ccgacgggc caacgcgtct gggaccggca gcctcttcct ttgccggcac tcttcagcga tcgccaatcg cgtgctcgtc tcgaaacagc tctcgcaggc atggctttgt atggcagccg gtaccaacgg attcacgcgc cgagctcacc gaatgtcggc ccacgccgtc catttcctac gtcgctcgtc cat tgtcggc ctcgatgctg ccgcggcgag caacccggtg cat ctcgctg atcgatcgat tgggaagcgc gttttcatgg gcggattccc atctacgacc gcgctgcatc ggcattaacg t cgccgacgg ggcggcacgg cgcgggctca ccatcggccg ccgaaccgcc tggaagatgc gcgcgtcgca atttcgccac tgcgcggCc aggcggt gga ttatcgccag ggttgtgcca ttttcgtcct ttctcgccac aagcgcagga tggctttcgt cggcatcccg ggctgactac cggcacctcg aagcctcact agcgctccgc cccggcgtCC ggctttctcc gcgcaaggcg cgacgtcaat agtcctcctg cgacacccac 120 180 240 300 360 420 480 540 600 660 671 <210> <211> 764 <212> DNA <213> 
Unknown organism <220> <223> Origin of the sequence: soil organism <400> ccgcagcagc gcgtgttcct cgacggcatc gaccggttcg atccgcgtca cttcgcgatc acgccgcgcg aggcgatcag catggacccg Cagcagcggc tcctgctcga ggtcacgtgg 120 gaagcgctgg agcgcgccgg cgtggcgccc gatcgcctga ccggatccga caccggcgtc 180 ttcatcggca tcagcaccaa cgactacggc cagatcctgc tgcgcgcctc ggaccagatc 240 gatccgggga tacgtcctcg gtggcgattc ggcggcgcca ctcgcgcctg qaaggggccg atcgtcgcgc gcgccgaacg gccgcgtccg tgtacttcgg gcctgcaggg atctcgcgtg acctggtgct acgggcgctg cggtgatcgt tgatccgcgg aactggcgca acatcggcta caccggcaac tccgagcatg tcagagcctg cgt cccggaa caagacgttc gctgaagcgg atccgcggtc gcaggcggtg cgtggacacg ctgttgaacg gcggtcgaca cqcaaccgcg gtgacggtca gacgccgcgg ctctccgacg aatcaggacg atccggaccg cacggcaccg cggcggcggg ccgcatgtcc agtgccgcat actgctgccg cggacggcta cgctggcgga gccgcagcgq cgctcgcggc ggac acgcctctcg gtcgtcgctg ggcgctcgcc cgccaagatg cgtccgcggc cggcgatccg cggcttcacc agcgggcgtc <210> 41 <211> 763 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 41 ccgcagcagc qcgccgcgcg gaagcgctgg ttcgtcggca agcatcgacg tacgtgctcg gtcgcgatcc ggcggcgtgc ctggcggcgg gagggctgcg atcctcgcgc gcgccgaacg cagccggcgg gcgtgttcct aagcggccgg aagacgccgg tcaacagcat cctattcgct gcctgcaggg acctggcgtg acgtcacgct acggcaagtg cggtcatcgt tggtgcgcgg gtccggcgca aggtcggcta cgacggcatc catcgatccg gacgtcgccg cgactacgcg ctccggcagc gccggcgatg ccagagcctg gacgccgatc caagacgttc cctcaagcgg ttcggcggtc ggaagcggt c cgtggacacc gaccgcttcg cagcagcggc gaaaagctgc acgctgcagc gcgcacagca gcggtcgaca cgcaacgacg aacatggtcg gacggccgcg ttgtcgcacg aaccaggacg at ccgcgcgg cacggcaccg atccgcagtt tgctgctcga agggaacccc tgcagaactg tcgcggccgg ccgcctgctc actgccgcgt tgttctcgaa gcgacggatt cgcttgccga gcgcgagcag cgttgaagcg gca tttcgggatc gacgacgtgg ggccggcgtg cgatctggcc gcggctcgcc gtcgtcgctg cgccgtggcc gctgcgcatg cgtcgaaggc caaggatcgg cggtctcacc ggccggcgtg 120 180 240 300 360 420 480 540 600 660 720 763 <210> 42 <211> 668 <212> DNA <213> Unknown organism <220> <223> Origin 
of the sequence: soil organism <400> 42 ccgcaggagc ggcgaaagca ggcgacctgc tcggtgctgt gacaccgcct ggcgagaccg atctcctcca ggcgccgacg gacgcgctcg gacggcgcca cacgtctacg accggcac gcgtgctgct tcgccggcgc tgtacggcca cggcgcgcat gttcgagctc atctggccct gccgcgccgg gcttcgtgcc acgccggcga gcaacggcat acagcttcgg ggaatcctcg gcgctgcggc gccgtcgctg cgcctattac gttggtcgcg ggccggcggc catgctctcg gtccgaaggc ccacatntac caccgcgccg cat cgacgcc tggcatgcgc gtgtacatgg ccgccgcacg ctggacctgc gtgcatctgg gtgtggatcc ccgaccggcc gtcggcgtgg ggcgtgatcc agcgccgccg tcgcgcctgc tggaagacgc gcttcaacgg cgatgtgggg aaggcccggc cctgccaggg agtgcacgcc agtgccgcgc tcgtgctcaa gcggcagcgc cccaggagcg agatgatcga cggctatgcc cggcgactac caacgccgcc gatcaccctc gctqtggacc cggattcctg gttcggcgcc gcgcctgcag gatcaaccag cttgcagcgc ggcccacggc 120 180 240 300 360 420 480 540 600 660 668 <211> 671 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 43 ccgcagqagc gtggaccgtc ggccaqcttc ctgagcatcg gtqgacacgg cgcggtgaag acggtgaatt gcgqccgcca gcccaggcta caggacggcc cgggccgcgt ggcaccggca gcgtgctgct tggccgggcg agaacggcga ccgccaaccg cgtgctcgtc cggaactcgc tcacccgcgc acggctacgt tcgccgacgg qttccaacgg atcgtgacgc ggagqtgact gcccgtcggc cccggccgac actcagctac ttcactcqtc cgtcgcggcc cggcatgatg gcgcggcgaa cgacccgatc cctcaccgcc gggcatcaqc tgggaggcac gtcttcgtcg gtggacgcct acgtttgact gcgatccatc gqcgtcaact gcgcctgacg ggcqccggcg tacgcgatcg ccqaaccqac ccggccgatg tcgaagacgc ggatctcgtc acgtcggcac ttcqcgqccc tcgcctgcca tgattctgac gccggtgcaa tcgtcgtgct tccgtgqcag aggcccaaga tcgacgccgt cggccaagac gaacgattac cggtaacgcg qagtctggcg gagcgttcgc ccccggcct g gacgttcqac caagccqctq cgccgtcaac ggtcgtgctg cgaggcccac <210> 44 <211> 707 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 44 ccccagcagc ggcgaagacg ggcgattcat cgcgatcaca tacgcqattc gtcccgtatg accggacgca gcgggcgaga gacactgtgc ttcggcgagg accgtcttca ttctccgaac gcgtgttcct tcgtgatcgc cgtgcgtgct cgttcaagct agcgcgacaa tctcgaaggc aactqcacga atttctacgt 
tcgggccgga cctt cgccaa tctcggtcaa tcggtttcgg cgaggacgcg cggcatcatg tccgccggtc cgcgcgcgcg ggtctacgtg gacaggcgtg gctgttgccg gaagtcgccg gatgaaatcg ggcacagatc cgaccgtgac cattgtcgac actgaggtcg cagcacatcg gacatcccgc ttgaaggtca attgaggtaa ccgctggcga gaagqggtcg gtcttcccgt accggcgaag gccgccggca aaagqcaacg acgcacggca acgtggatgc aggaggccgg cqaaggcgct tcgqcctgat accctagggc aggtcqcgtc agcgcgqctq ggqgtaaqtt tcatgggcgt cat acctgcc tcattcagct ccgqgac gctttcaqac catccactcg gcagacgat c gaacgtgcag ttctcgaact acgcttgatg gatcaccacc cccgggcgtt cgccgacaac gaccgaaggt gqcgcagcgt 120 180 240 300 360 420 480 540 600 660 707 <210> <211> 225 <212> PRT <213> Streptomyces ambofaciens <400> Pro Gin Gin His Val Phe Leu Glu Thr 1 5 Val1 10 Trp Glu Thr Phe Glu Ser Ala Gly Val Val Gly Thr Asp Pro Arq Ala Val Arg 25 Gly Arq Ser Val Gly Met Phe Gly Ser Ala Asn Gly Gin Asp Pro Val Val Leu Asp Glu Gly Leu Asp Ala His 55 Ala Ala Thr Gly Asn Ala Ala Ala Val Ser Gly Arg Val Tyr Ala Phe Gly Glu Gly Pro Ala Thr Val Asp Thr Ala Cys Ser Ser Ser Leu 90 Val Ala Leu His Leu Ala Ala Gin Ala Leu 100 Arg Arg Gly Glu Cys 105 Asp Leu Ala Leu Ala Gly Gly 110 Val Ser Glu Met Ser Thr Glu Ala Ala Phe Thr Glu Phe Ala Arg Gin 13 115 120 125 Gly Gly Leu Ala Asp Asp Gly Arg Cys Lys Ala Phe Ser Ala Asp Ala 130 135 140 Asp Gly Thr Gly Trp Gly Glu Gly Val Gly Val Leu Leu Val Glu Arg 145 150 155 160 Leu Ala Asp Ala Arg Arg Asn Gly His Arg Ala Leu Ala Leu Val Arg 165 170 175 Gly Ser Ala Val Asn Gin Asp Gly Ala Ser Asn Gly Leu Thr Ala Pro 180 185 190 Asn Gly Pro Ser Gin Gin Arg Val Ile Arg Gin Ala Leu Ala Asp Ala 195 200 205 Arg Leu Ser Pro Ser Glu Val Asp Ala Val Glu Thr His Gly Thr Gly 210 215 220 Thr 225 <210> 46 <211> 207 <212> PRT <213> Streptomyces ambofaciens <400> 46 Ala Ser Trp Glu Ala Val Glu Arg Ala Gly Ile Asp Met Arg Thr Leu 1 5 10 Arg Gly Gly Arg Thr Gly Val Phe Ala Gly Val Met Tyr His Asp Tyr 25 Pro Ser Val Val Asp Pro Glu Ala Leu Asp Gly Tyr Leu Gly Thr Ala 40 Asn Ala Gly Ser Val Leu Ser Gly Arg Ile Ala Tyr Thr Phe 
Gly Leu 55 Gln Gly Pro Ala Val Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val 70 75 Ala Leu His Leu Ala Ala Gln Ala Leu Pro Ala Gly Glu Cys Glu Leu 90 Ala Leu Val Gly Gly Val Thr Val Met Ser Gly Pro Met Met Phe Ala 100 105 110 Gly Phe Gly Leu Glu Asp Gly Ser Ala Ala Asp Gly Arg Cys Lys Ala 115 120 125 Phe Ala Ala Ala Ala Asp Gly Thr Gly Trp Gly Glu Gly Val Gly Val 130 135 140 Leu Leu Val Glu Arg Leu Ser Asp Ala Arg Arg His Gly His Arg Val 145 150 155 160 Leu Ala Val Val Arg Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Gly 165 170 175 Gly Leu Thr Ala Pro Asn Gly Pro Ala Gln Gln Arg Val Ile Arg Gln 180 185 190 Ala Leu Ala Ser Ala Ala Leu Val Pro Ala Glu Val Asp Ala Val 195 200 205 <210> 47 <211> 223 <212> PRT <213> Saccharopolyspora erythraea <400> 47 Pro Gln Glu Arg Val Phe 1 Ala Phe Asp Pro Leu Gln Ser Ala Gly 145 Ser Ser Pro Gly Gly Arg Ala Asn Ser Leu Met 130 Tyr Arg Ala Ala Ile Ala Ile Arg Thr Ile Leu 115 Ala Val Ala Val Ala 195 Ala Met Thr Ile Ala Leu 100 Val Pro Arg Leu Asn 180 Gln 5 Pro Trp Gln Ala Cys Leu Ala Asp Gly Ala 165 Asn Glu His His His Tyr 70 Ser Gly Leu Gly Glu 150 Asp Asp Gln Leu Ser Gly Thr 55 Phe Ser Glu Asp Arg 135 Gly Gly Gly Val Asp 215 Glu Leu Tyr 40 Ala Leu Ala Ser Ser 120 Cys Gly Asn Phe Leu 200 Leu Arg 25 Ala Thr Gly Leu Ser 105 Met Lys Gly Pro Ser 185 Arg Ala 10 Asp Gln Gly Leu Val 90 Val Val Ala Val Val 170 Asn Asp Trp Ser Phe His Arg 75 Ala Ala Ala Phe Val 155 Tyr Gly Ala Glu Arg Ala Asp Gly Met Leu Met Asp 140 Val Cys Leu Tyr Ala Thr Ala Leu Pro His Val Ser 125 Ser Leu Val Thr Ala 205 Leu Gly Gly Ser Asp Gln Gly 110 Arg Arg Lys Leu Ala 190 Asn Asp Val Ala Ile Met Ala Gly Phe Ala Pro Arg 175 Pro Ala Asn Phe Val Ile Thr Arg Ile Gly Asn Leu 160 Gly Ser Gly Val Asp 210 Pro Ala Gln Val Tyr Val Glu Thr His Gly Thr Gly 220 <210> 48 <211> 211 <212> PRT <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 48 Ser Cys Trp Glu Ala Leu Glu His Ala Gly Tyr Asp Thr Ala Arg Tyr Pro Gly Arg Ile Gly Leu Trp Ala Leu Tyr Ala Cys Leu 
Pro Asp Gly 145 Arg Asn Gin Glu Thr Gln Tyr Ser Gly Leu Gly 130 Asn Asp Asp Ser Ser 210 Asn Ile Lys Thr Glu Arg 115 Val Gly Gly Gly Arg 195 Val Leu Phe Leu Ser Cys 100 Glu Cys Val Asp Ser 180 Met Leu Asn Leu Asp Gly Arg Ala Thr 165 Val Asn Ser Leu 70 Val Met Tyr Pro Val 150 Val Lys Asn Asn 55 Arg Ala Ala Leu Phe 135 Val Tyr Ile Arg 40 Asp Gly Val Leu Tyr 120 Asp Val Ala Gly Gly 25 Ala Lys Pro His Ala 105 Gin Ala Leu Val Phe 185 Ala Phe Asp Ala Glu 90 Gly Glu Asp Lys Ile 170 Thr Gly Leu Phe Met 75 Ala Ala Gly Ala Arg 155 Arg Ala Phe Glu Ile Ala Cys Ala Met Asp 140 Leu Gly Pro Asn Ser Ala Val Gin Ser Ile 125 Gly Asp Thr Ser Ser Val Thr Gly Ala Val 110 Met Thr Glu Ala Ala 190 Tyr Leu Gly Met Arg Thr Thr Ala Leu Arg Ser Thr Ser Arg Val Leu Ala Leu 160 Val Asn 175 Glu Gly Val Val Arg Asp Ala Leu Arg Ala Ala Ala Val Pro Ala 200 205 <210> 49 <211> 229 <212> PRT <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 49 Pro Gin Gin Arg Leu Phe Leu Glu Cys Ala Trp 1 5 10 Ala Gly Tyr Ala Ala Arg Ser Tyr Lys Gly Ser 25 Gly Cys Gly Val Asn Thr Tyr Leu Leu Asn Asn 40 Pro Phe Ap Ph- Ser Arg Pro Ser Ala Tyr Gn 55 Asp Lys Asp Phe Leu Ala Thr Arg Val Ser Tyr 70 75 Glu Ala Met Glu Asn Ile Gly Val Phe Ala Leu Ala Thr Ala Glu Leu Leu Thr Ala Asn Lys Leu Asn Leu Arg Gly Val Leu His Asp 145 Val Ala Gly Thr Thr 225 Pro Val Ala Gin 130 Glu Leu Val Phe Gin 210 His Ser Met Gly 115 Pro Ser Lys Ile Thr 195 Glu Gly Leu Ala 100 Gly Gly Ala Arg Arg 180 Ala Met Thr Thr Cys Val Met Gin Leu 165 Gly Pro Ala Gly Val Glu Ala Ile Gly 150 Ser Ala Gly Gly Gin Ser Ile Leu 135 Thr Arg Ala Val Val 215 Thr Leu Asn 120 Ser Val Ala Ile Asp 200 Lys Ala Gin 105 Val Pro Pro Leu Asn 185 Gly Pro 16 Cys Ser 90 Arg Gly Pro Gin Asp Gly Gly Asn 155 Ala Asp 170 Asn Asp Gin Thr Glu Ser Thr Ala Ser Arg 140 Gly Gly Gly Arg Ile 220 Glu Asp Lys Ala Gly Val Leu Phe Ser Ser Val 125 Cys Ala Asp Ala Leu 205 Gly Ala Thr Pro Phe Pro His Ala Cys 125 Leu Asp 110 Gly Arg Gly Thr Glu 190 Ile Tyr Leu Gly Thr Ser 
Asn Leu Gly 110 Arg Val Ile Tyr Ala Val Ile 175 Arg Arg Met Glu Val Asp Thr Phe Ala Gly Leu Ser Ala Leu Phe Val 160 Tyr Met Arg Asp Arg Phe Pro Ala Pro Cys Val Arg <210> <211> 223 <212> PRT <213> Unknown organism <220> <223> Origin of the se <400> Pro Gin Gin Arg Leu Ph 1 5 Ala Gly Arg Pro Pro As Ile Gly Ile Ser Thr As Ala Leu Ile Asp Ala Ty: Ala Gly Arg Ile Ser Ty: 7 Val Asp Thr Ala Cys Se Arg Ser Leu Gin Ser Ar 100 Asn Leu Ile Leu Ala Pr 115 quence: soil organism e Leu Glu Val Ala Trp 10 p Ser Leu Ala Gly Ser 25 p Asp Tyr Ser Arg Leu 40 r Thr Gly Thr Gly Thr 55 r Leu Leu Gly Leu Gin 0 75 r Ser Ser Leu Val Ala 90 g Glu Cys Ser Met Ala 105 o Glu Ser Thr Ile Tyr 120 Ala Gly 145 Ser Ser Gly Met Met 130 Tyr Asp Ala Pro Ala 210 Ala Gly Ala Val Ala 195 Pro Ala Arg Thr Asn 180 Gin Ala Asp Gly Arg 165 His Glu Asp Gly Glu 150 Asp Gly Ala Val Arg 135 Gly Gly Gly Val Asp 215 Cys Cys Asp Arg Ile 200 Tyr Lys Gly Arg Ser 185 Arg Val 17 Ser Phe Met Leu 155 Ile Leu 170 Asn Gly Ala Ala Asp Thr Ala 140 Val Ala Leu Leu His 220 Ala Leu Leu Thr Lys 205 Gly Ala Asp Arg Leu 160 Arg Gly 175 Pro Asn Ala Gly Gly <210> 51 <211> 252 <212> PRT <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 51 Pro Gin Glu 1 Phe Phe Gly Arg Leu Leu Ser Pro Gly Cys Ala Gin Gly Ala Trp Ile Ser Tyr Ala Cys Ser 115 Arg Arg Arg 130 Leu Thr Pro 145 Pro Asp Gly Arg Gly Glu Leu Ala Asp Arg Ile Leu Pro Asp Ser Leu 100 Ser Glu Glu Arg Gly 180 Gly Val 5 Ser Glu Leu Phe Gly Leu Ala Cys Gly Cys 165 Cys Asp Phe Leu Glu Arg lie Asp Pro Val Ala Gly 70 Ser Asp Leu Asp Met 150 Lys Gly Ala Arg Cys Gly 55 Leu Gly Leu Val Ala 135 Ile Thr Ile Ile Glu Trp 40 Ser Phe Val Arg Ala 120 Ala Ala Phe Val Cys Ala 25 Glu Ala Gin Ala Gly 105 Val Phe Leu Asp Leu 185 Ala 10 Leu Ala Thr Tyr His 90 Pro His Ala Ser Ala 170 Leu Val Asn Ala Gly Ala 75 Ser Ser Leu Gly Lys 155 Ala Lys Ile Gly Met Glu Val Asp Met Met Ala Gly 140 Ala Ala Arg Arg Phe Asp Asp Phe Pro Leu Ala Cys 125 Val Arg Asp Leu Gly Asp Pro Ala Ala Ala 
Ala Val 110 Gin Asn Met Gly Ser 190 Ser Ala Gln Gly Gly Arg Asn Asp Ser Leu Leu Tyr 175 Asp Ala Glu Gin Ile Ser Ile Arg Thr Leu Ile Ala 160 Val Ala Ile 195 Asn Gin Asp Gly 210 Gin Lys Ala Val 225 Ser His Val Ser Arg Leu Leu 245 200 205 Ser Asn Gly Ile Thr Ala Pro Asn Leu Gin Ala 215 220 Gin Glu Ala Val Ala Asn Ala His Ile Asp Pro 230 235 240 Ile Asp Thr His Gly Thr Gly 250 <210> 52 <211> 234 <212> PRT <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 52 Pro Gin Gin 1 Ala Gly Tyr Gly Ala Ser Glu Phe Val Thr Ile Leu Lys Leu Asn Thr Gly Leu Gin Cys Asp 115 Lys Arg Asp 130 His Cys Arg 145 Gly Ala Gly Arg Asp Thr Giy Giy Val 195 Giu Ala Ile 210 Ile Thr Cys 225 Arg Asp Ile Ala Gly Leu Val 100 Met Tyr Pro Val Ile 180 Lys Thr Met Val 5 Pro Asn Arg Asn Arg Ala Ala Arg Phe Val 165 Leu Met Leu Asp Phe Leu Glu Cys Ala Trp Glu Lys Ser Tyr Met Ala 55 Asp Lys 70 Gly Pro Val Cys Leu Ala Phe Thr 135 Asp Ala 150 Leu Met Ala Val Gly Tyr Ala Leu 215 Thr His 230 Tyr Phe 40 Gly Asp Ser Gin Gly 120 Asp Ser Lys Ile Thr 200 Ala Gly Pro 25 Leu Glu Tyr Leu Ala 105 Gly Glu Ala Arg Arg 185 Ala Leu Thr 10 Gly Tyr Tyr Leu Ala 90 Ile Ile Gly Gin Leu 170 Gly Pro Ala Gly Leu Asn Gin Pro 75 Val Gin Ser Met Gly 155 Ala Ala Ser Gly Glu Ile Leu Val Thr Gin Asn Ile Val 140 Thr Asp Ala Ala Val 220 Ala Gly Ala Gly Arg Ser Leu Ser 125 Ser Val Ala Val Giu 205 Ser Val Val His Glu Val Ala Gin 110 Phe Arg Phe Val Asn 190 Giy Pro Glu Phe Asn Tyr Ser Cys Thr Pro Asp Gly Thr 175 Asn Gln Giu Ser Ala Arg Gin Tyr Ser Tyr Gln Gly Asn 160 Asp Asp Ala Thr <210> 53 <211> 226 <212> PRT <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 53 Pro Gin Gin 1 Arg Arg Ile Ala Ser Ser Ala Val Arg Lys Asp Phe Pro Ala Met His Val Ala Ala Gly Gly 115 Gly Gly Ile 130 Ala Gly Gly 145 Arg Leu Glu Ile Gly Ser Ala Pro Gin 195 Ala Ala Gly 210 Thr Gly 225 Arg Ser Gly Gin Leu Thr Ala 100 Ile Leu Thr Asp Ala 180 Val Ile Val 5 Gly Phe Ser Ala Val Gin Thr Ser Val Ala 165 Ile Asp Ser Phe 
Leu Glu Cys Ala Trp Arg Asn Ile Thr 70 Gin Ser Val Pro Pro 150 Leu Asn Ser Ala His Thr Ser 55 Arg Thr Leu Ser Asp 135 Gly Ala Asn Gin Asp 215 Leu Tyr 40 Pro Thr Ala Leu Arg 120 Gly Ser Asp Asp Ala 200 Ser Pro 25 Leu Phe Ala Cys Ala 105 Ser His Gly Gly Gly 185 Leu Ile 10 Arg Leu Glu Tyr Ser 90 Gly His Cys Val Asp 170 Ala Val Gly Cys Asn Leu Lys 75 Ser Glu Gly Arg Gly 155 Thr Leu Ile Tyr Ala Pro Leu Phe Leu Ser Cys Tyr Ala 140 Val Ile Lys Ser Met 220 Ala Ser His Val Asn Leu Asp Val 125 Phe Val Asp Ala Glu 205 Asp Leu Ala Ala Ala Leu Val Ile 110 Ala Asp Val Ala Ser 190 Ala Thr Glu Val Asn Asn Arg Ala Ala Arg Ala Leu Val 175 Phe His His Arg Tyr Ala Asp Gly Val Leu Glu Asp Lys 160 Ile Thr Ala Gly <210> 54 <211> 223 <212> PRT <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 54 Pro Gin Gin Arg Leu Phe Leu Glu Leu Thr 1 5 10 Ala Met Ala Ala Val Glu Asn Met Gly 145 Ala Thr Ala Ile Gly Gly Val Asn Asp Ala Val Leu 130 Phe His Asp Glu Asp 210 Ile Ala Ala Arg Thr Leu Ile 115 Ser Val Ala Val Ala 195 Pro Pro Ser Asp Ile Ala Arg 100 Ala Pro Arg His Asn 180 Gin Asn Pro Gln Ser Ser Cys Ser Ser Thr Gly Gly 165 Ser Glu Arg Ser Ala His Tyr 70 Ser Gly Pro Gly Glu 150 Ser Asp Val Leu Thr Asp Phe 55 Ile Ser Arg Ala Leu 135 Gly Arg Gly Leu Ala 215 Ile Tyr 40 Ala Tyr Ser Ile Ser 120 Cys Gly Asn Arg Leu 200 Phe Ala 25 Gly Thr Asp Leu Glu 105 Phe Gin Thr Pro Thr 185 Gin Val Gly His Gly Leu Val 90 Thr Ile Ala Val Val 170 Asn Arg Asp Trp Thr Lys Thr Arg 75 Ala Ala Ala Phe Phe 155 Arg Gly Val Thr Glu Asn Phe Ser Gly Leu Ile Phe Ser 140 Val Gly Ile Tyr His 220 Ala Val Phe Leu Pro His Val Ser 125 Ala Leu Leu Ser Ser 205 Gly Leu Gly Ser Ala Ser Gin Gly 110 Gin Lys Arg Ile Leu 190 Arg Thr Glu Val Asp Val Leu Ala Gly Ala Ala Lys Leu 175 Pro Ala Gly Asp Phe His Val Thr Val Ile Ser Asp Ala 160 Ala Ser Ser <210> <211> 254 <212> PRT <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> Pro Gin Gin Arg Val Phe Leu Asp Gly Ile Asp Arg Phe Asp Pro Arg 1 5 
10 His Phe Ala Ile Thr Pro Arg Glu Ala Ile Ser Met Asp Pro Gln Gln 25 Arg Leu Leu Leu Glu Val Thr Trp Glu Ala Leu Glu Arg Ala Gly Val 40 Ala Pro Asp Arg Leu Thr Gly Ser Asp Thr Gly Val Phe Ile Gly Ile 55 Ser Thr Asn Asp Tyr Gly Gln Ile Leu Leu Arg Ala Ser Asp Gln Ile 70 75 Asp Pro Gly Met Tyr Phe Gly Thr Gly Asn Leu Leu Asn Ala Ala Ala 90 Gly Arg Leu Ser Tyr Val Leu Gly Leu Gln Gly Pro Ser Met Ala Val 100 105 110 Asp Thr Ala Cys Pro Ser Ser Leu Val Ala Ile His Leu Ala Cys Gln 115 120 125 Ser Leu Arg Asn Arg Glu Cys Arg Met Ala Leu Ala Gly Gly Ala Asn 130 135 140 Leu Val Leu Val Pro Glu Val Thr Val Asn Cys Cys Arg Ala Lys Met 145 150 155 160 Leu Ala Pro Asp Gly Arg Cys Lys Thr Phe Asp Ala Ala Ala Asp Gly 165 170 175 Tyr Val Arg Gly Glu Gly Ala Ala Val Ile Val Leu Lys Arg Leu Ser 180 185 190 Asp Ala Leu Ala Asp Gly Asp Pro Ile Val Ala Leu Ile Arg Gly Ser 195 200 205 Ala Val Asn Gln Asp Gly Arg Ser Gly Gly Phe Thr Ala Pro Asn Glu 210 215 220 Leu Ala Gln Gln Ala Val Ile Arg Thr Ala Leu Ala Ala Ala Gly Val 225 230 235 240 Ala Ala Ser Asp Ile Gly Tyr Val Asp Thr His Gly Thr Gly 245 250 <210> 56 <211> 254 <212> PRT <213> Unknown organism <220> <223> Origin of the sequence: soil organism <400> 56 Pro Gln Gln Arg Val Phe Leu Asp Gly Ile Asp Arg Phe Asp Pro Gln 1 5 10 Phe Phe Gly Ile Ala Pro Arg Glu Ala Ala Gly Ile Asp Pro Gln Gln 25 Arg Leu Leu Leu Glu Thr Thr Trp Glu Ala Leu Glu Asp Ala Gly Thr 40 Ser Pro Glu Lys Leu Gln Gly Thr Pro Ala Gly Val Phe Val Gly Ile 55 Asn Ser Ile Asp Tyr Ala Thr Leu Gln Leu Gln Asn Cys Asp Leu Ala 70 75 Ser Ile Asp Ala Tyr Ser Leu Ser Gly Ser Ala His Ser Ile Ala Ala 90 Gly Arg Leu Ala Tyr Val Leu Gly Leu Gln Gly Pro Ala Met Ala Val 100 105 110 Asp Ser Val 145 Leu Phe His Ala Pro 225 Gln Thr Leu 130 Thr Ala Val Ala Val 210 Ala Pro Ala 115 Arg Leu Ala Glu Leu 195 Asn Gln Ala Cys Asn Thr Asp Gly 180 Ala Gln Glu Glu Ser Asp Pro Gly 165 Glu Asp Asp Ala Val 245 Ser Asp Ile 150 Lys Gly Lys Gly Val 230 Gly Ser Cys 135 Asn Cys Cys Asp Ala 215 Ile Tyr Leu 
120 Arg Met Lys Ala Arg 200 Ser Arg Val Val Val Val Thr Val 185 Ile Ser Ala Asp 22 Ala Ile Ala Val Val Phe 155 Phe Asp 170 Ile Val Leu Ala Gly Leu Ala Leu 235 Thr His 250 <210> 57 <211> 222 <212> PRT <213> Unknown organism <220> <223> Origin of the sec <400> 57 Pro Gin Glu Arg Val Le 1 5 Ala Gly Tyr Ala Gly Gil Met Gly Phe Asn Gly Gl\ Ser Leu Pro Pro His AlE Ala Arg Ile Ala Tyr Tyi 7( Asp Thr Ala Cys Ser Sei Gly Leu Trp Thr Gly GlL 100 Ile Gin Cys Thr Pro Gli 115 Leu Ser Pro Thr Gly Gir 130 His Ala 140 Ser Gly Leu Leu Thr 220 Lys Gly His Arg Leu Ala Pro His Ala Ser Ala 140 Leu 125 Gly Lys Arg Lys Val 205 Ala Arg Thr Ala Cys Tyr Ser Ala Leu Gly Arg 125 Gly Ala Gly Leu Gly Arg 190 Arg Pro Ala Gly Leu Gly Gly Val Ile Ala Gly 110 Ala Ala Cys Val Arg Asp 175 Leu Gly Asn Gly Glu Val Gin Leu Thr Cys Val Gly Asp Gin His Met 160 Gly Ser Ser Gly Val 240 Asp Tyr Pro Ser Leu Gin Trp Met Gly quence:soil organism u Leu Glu Ser Ser Trp 10 i Ser Ile Ala Gly Ala 25 y Asp Tyr Gly Asp Leu 40 a Met Trp Gly Asn Ala 55 r Leu Asp Leu Gin Gly 0 75 r Ser Leu Val Ala Val 90 u Thr Asp Leu Ala Leu 105 y Phe Leu Ile Ser Ser 120 Cys Arg Ala Phe Gly 135 23 Phe Val Pro Ser Glu Gly Val Gly Val Val Val Leu Lys Arg Leu Gin 145 150 155 160 Asp Ala Leu Asp Ala Gly Asp His Xaa Tyr Gly Val Ile Arg Gly Ser 165 170 175 Ala Ile Asn Gin Asp Gly Ala Ser Asn Gly Ile Thr Ala Pro Ser Ala 180 185 190 Ala Ala Gin Glu Arg Leu Gin Arg His Val Tyr Asp Ser Phe Gly Ile 195 200 205 Asp Ala Ser Arg Leu Gin Met Ile Glu Ala His Gly Thr Gly 210 215 220 <210> 58 <211> 223 <212> PRT <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 58 Pro Gin Glu 1 Ala Gly Gin Val Gly Ile Ala Asp Val Ala Asn Arg Val Asp Thr Gin Ser Val Asn Leu Ile 115 Met Met Ala 130 Gly Tyr Val 145 Ala Gin Ala Ser Ala Val Arg Gin Ala 195 Ile Ser Pro Arg Asp Ser Asp Leu Ala Arg 100 Leu Pro Arg Ile Asn 180 Gin Ala Val 5 Val Ser Ala Ser Cys Arg Thr Asp Gly Ala 165 Gin Glu Asp Leu Leu Glu Val Thr Trp Asp Asn Tyr Tyr 70 Ser Gly Pro 
Gly Glu 150 Asp Asp Val Val Arg Asp Val 55 Thr Ser Glu Gly Arg 135 Gly Gly Gly Val Asp Leu Tyr 40 Gly Phe Ser Ala Leu 120 Cys Ala Asp Arg Leu 200 Ala Ala 25 Gly Thr Asp Leu Glu 105 Thr Lys Gly Pro Ser 185 Arg Val 10 Gly Gin Gly Phe Val 90 Leu Val Thr Val Ile 170 Asn Ala Glu Arg Leu Asn Arg 75 Ala Ala Asn Phe Val 155 Tyr Gly Ala Ala Glu Pro Gin Ala Gly Ile Val Phe Asp 140 Val Ala Leu Tyr His Ala Val Asn Leu Pro His Ala Thr 125 Ala Leu Ile Thr Arg 205 Gly Leu Gly Gly Ser Ser Leu Ala 110 Arg Ala Lys Val Ala 190 Asp Thr Glu Val Asp Ile Leu Ala Gly Ala Ala Pro Arg 175 Pro Ala Gly Asp Phe Pro Ala Ala Cys Val Gly Asn Leu 160 Gly Asn Gly 210 215 <210> 59 <211> 235 <212> PRT <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 59 Pro Gin Gin 1 Ala Leu Ser Ile Glu Glu Pro Val Asp Phe Lys Leu Tyr Ala Ile Ala Ser Arg Ala Lys Val 115 Leu Pro Glu 130 Phe Tyr Val 145 Asp Thr Val Val Ala Asp Gly Thr Tyr 195 Arg Asp Lys 210 Gly Phe Gly 225 Arg Asp Ala Ile Ala Gin Thr 100 Ala Gly Lys Leu Asn 180 Leu Gly Val 5 Gly Gly Pro Arg Arg Val Ser Val Ser Gly 165 Phe Pro Asn Phe Leu Glu Asp Ala Thr Glu Ile Pro Ala 70 Asp Pro Arg Glu Pro 150 Pro Gly Thr Val Asp His Lys 55 Leu Lys Tyr Leu Arg 135 Val Glu Glu Glu Ile 215 Val Ser 40 Ala Lys Val Val Met 120 Gly Phe Met Ala Gly 200 Gin Val 25 Gly Leu Val Tyr Ser 105 Thr Trp Pro Lys Phe 185 Thr Leu 10 Ile Asp Gin Ile Val 90 Lys Gly Ile Trp Ser 170 Ala Val Ala Ala Ser Thr Gly 75 Ile Ala Arg Thr Gly 155 Thr Lys Phe Gin Glu Gly Ser Ile Leu Glu Thr Lys Thr 140 Lys Gly Ala Ile Arg 220 Val Ile Cys Arg Met Val Gly Leu 125 Ala Phe Glu Gin Ser 205 Phe Asp Met Val Asp Asn Asn Val 110 His Gly Pro Val Ile 190 Val Ser Val Gin Leu His Val Pro Pro Glu Glu Gly Met 175 Ala Asn Glu Asp His Pro Thr Gin Arg Leu Leu Asn Val 160 Gly Ala Asp Leu Ile Val Asp Thr His Gly Thr Gl <210> <211> 1269 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> taacaggaag acctgcctta gggggaaagt gcccaccaag gacacggccc ctgatccagc 
cgacgatgat taatacgaag cgatcagtca cttgagttcc gaagaacacc gtggggaqca cgctggggtg acggccgcaa tqgtttaatt cgagagattg ctcgtgtcgt catcattcag gacgtcaagt agtgggacgc actctgcaac gtgaatacg aagcttgctt tggttcggga ttacgccatg ccgacgatcg agactcctac aatgccgcgt gacggtagcq ggggcgagcg gatgtqaaag ggagaggatg ggtggcgaag aacaggatta catgcacttc ggttaaaact cqaagcaacg gaccttcagt gagatgttgg ttgggcactc cctcatggcc gaagtcgcaa tcgggtgcat ctttgctgac taacgtctgg agaggggccc gtagctggtc gggaggcagc gaqtgatgaa tgagaagaag ttgttcggaa ccccgggct c gtggaattcc gcggccatct gataccctgg ggtgtcgccg caaaggaatt cgcagaacct tcggctggat gttaagtccc tggtggaact cttatgggtt gatggagcaa gaagttggaa qagtggcgga aaacggacgc gcgtccgatt tgagaggatg agtggggaat gqccttaggg ccccgqctaa ttactgggcg aacctggqaa cagtgtagag gqacgqacac tagtccacgc ctaacgcatt gacgggqqcc taccaaccct ggaacacagg gcaacgagcg qccggtgaca gggctacaca atccccaaaa tcgctagtaa cgggtgaqta taacaccqga aggtagttgg atcagccaca attggacaat ttgtaaagct cttcgtqcca taaagggcgc ctgcatttga gtgaaattcg tqacgctgaq cgtaaacgat aagcattccg cgcacaagcg tgacatgtcc tgctgcatgg caacccctac agccggaqga cgtgctacaa gccgtctcag tcgcggatca acacgtggga tgtgCccttc tgggqtaatg ctgggactga gggggcaacc ctttcgcacg gcagccgcgg gtagqcqgcc tactgtcggg tagatattgq gcgcgaaagc gaatgctaga cctggggagt gtggagcatg attgccgqtc ctgtcgtcag cqccaqttgc aggcggggat taqcggtgac ttcggattqc gcacgccgcg 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1269 <210> 61 <211> 1500 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 61 ttttaaaacg agatgcatgc taacacatgc ggaacctgcc ttcgggggaa atggcccacc tgagacacgg accctgatcc acgcgacgat cggtaatacg gcccgatcag qggcttqaqt tgqgaagaac agcgtgqgga agacgctggq agtacggccg atgtggttta gtccgagaga cagctcgtgt tgccatcatt gatgacgtca gacagtggga tgcactctgc gcggtgaata c-,ggccgttac acggccagtg tcgagcggcc aagtcgaacg ttatggttcg agtttacgcc aagccgacga cccagactcc aqcaatgccq catgacggta aagggggcga tcagatgtga tccggagagg accggtggcg gcaaacagga qtgcatgcac caaggttaaa attcgaagca 
ttggaccttc cgtgagatgt cagttgqgca agtcctcatg cqcgaagtcg aactcgggtg cgttCccggg tagtggatcc aattgtaata qccagtgtga aqggcttcgg ggataacqtc atgagagggg tcggtagctg tacgggagqc cgtgagtgat gcgtgagaaq gcgttgttcg aagccccggg atggtggaat aaggcgqcca ttagataccc ttcggtgtcg actcaaagga acgcgcagaa aqttcqgctg tgqqttaagt ctctggtgga qcccttatgg caagatggag catgaagttg ccttgtacac gagctcggta cqactcacta tggatatctg ccctagtggc tggaaacgga cccgcgtccg gtctgaqagg agcagtgggq gaaqqcctta aaqccccggc gaattactgg ctcaacctqg tcccagtgta tctggacgga tggtagtcca ccgctaacgc attgacgggg ccttaccaac gatgqaacac cccgcaacga actgccqgtg gttgggctac caaatcccca gaatcgctag accgcccaag ccaagcttgg tagggcgaat cagaattcgc gcacgggtga cgctaacacc attaggtagt atgatcagcc aatattggac gggttgtaaa t aact t cg t gcgtaaaggg gaactgcatt gaggtgaaat cactgacgct cgccgtaaac attaagcatt gcccgcacaa ccttgacatg aggtgctgca gcgcaacccc acaagccgga acacgtgcta aaagccgtct taatcgcgga ggcgaattcc cgtaatcal g tggCcctct ccttcaggcc gtaacacgtg ggatgtgccc tggtggggta acactgggac aatgggggca gCtctttcgc ccagcagcca cgcgtaggcg tgatactgtc tcgtagatat gaggcqcgaa gatgaatgct ccgcctgggg gcggtggagc tccattgccg tggctgtcgt taccgccagt ggaaggcggg caatggcggt cagttcggat tcagcacgcc agcacactgg gtca act 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 <211> 1366 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 62 acgacggcca tgctcgagcg tqcaagtcga gcctttcggt gaaagttcac accaagcctt cggcccagac tccagcaatg gatgatgacg acgaaggggg tagtcagaag agttccggag aacaccggtg ggagcaaaca ggggtgcatg ccgcaaggtt ttaattcgaa agatgaggtc gtgtcgtgag cattcagttg gtcaagtcct gggaagcgaa ctgcaactcg gtgaattgta gccgccagtg acqaaggctt tcggaataac gccgagagag cgatccgtag tcctacggga ccgcgtgagt gtagcgtgag ctagcgttgt tgaaagcccc aggatggtgg gcgaaggcgg ggattagata cacttcggtg aaaactcaaa gcaacgcgca cttcagttcg atgttgggtt ggcactctgg catggccctt gtcgcgagat agtgcgtgaa atacgactca tgatggatat cggccttagt gtctggaaac gggCccgcgt ctggtctgag ggcagcagtg gatgaaggcc aagaagcccc 
tcggaattac gggctcaacc aattcccagt ccatctggac ccctggtagt tcgccgctaa ggaattgacg gaaccttacc gctgggtgga aagtcccgca tggaaccgcc atgggttggg ggagcaaatc gttggaatcg c tat agggcg ctgcagaatt ggcgcacggg ggacgctaac cggattaggt aggatgatca gggaatattg ttagggttgt ggctaacttc tgggcgtaaa tgggaatagc gtagaggtga qgacactgac ccacgccgta cgcattaagc ggggcccgca aacccttgac acacaggtgc acgagcgcaa ggt gacaagc ctacacacgt cccaaaagcc ctagtaatcg aattgggccc cgcccttcag tgagtaacac accggatacg agttggtgag gccacactgg gacaatgggc aaagctcttt gtgccagcag gggcgcgtag ttttgatact aattcgtaga gctgaggcgc aacgatgaat attccgcctg caagcggtgg atgtccatta tgcatggctg cccctaccgt cggaggaagg gctacaatgg gtctcagttc cggat c tctagatgca gcctaacaca gtgggaacct cccttcgggg gtaatggctc gactgagaca gcaagcctga cgcacgcgac ccgcggtaat gcggcctgct ggcaggcttg tattgggaag gaaagcgtgg gctagacgtc gggagtacgg agcatgtggt tgggcttcag tcgtcagctc cagttqccat cggggatgac cggtgacagt ggatcgcact 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1366 <210> 63 <211> 1360 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soii organism <400> 63 acagctatga gccagtgtgc ggagtggcag gaaactggaa cccgcgttgg gtctgagagg agcagtgggg gaaggcctta aagccccggc gaattactgg ctcaactctg tccgagtgta actggtccat tggtagtcca cagctaacgc attgacgggg ccttaccagc ggccccagaa tcccgcaacg gactgccggt gggctgggct agctaatctc tggaatcgct <210> 64 <211> 1288 <212> DNA ccatgattac tggaattcgc acgggtgagt ttaataccgc attagctagt atgatcagcc aatattggac gggttgtaaa taacttcgtg gcgtaaagcg gaactgcctt gaggtgaaat tactgacgct cgccgtaaac attaaacatt gcccgcacaa tcttgacatt caggtgctgc agcgcaaccc gataagccga acacacytgc caaaagccat agtaatcgca gccaagcttg ccttcaggcc aacgcgtgg atacgcccta tggtggggta acattgggac aatgggcgca gctctttcac ccagcagccg cacgtaggcg tgatactggg tcgtagatat gaqgtgcqaa gatgaatgtt ccgcctggqg gcggtggagc cggggtttgq atggctgtcg tcgccct tag gaggaaggtg tacaatyytg ctcagttcgg gatcagcatg gtaccgagct taacacatgc aacataccct cgggggaaag aaggcctacc tgagacacgg agcctgatcc cggagaagat cggtaatacg 
gatatttaag tat ct tgag t tcggaggaac agcgtgggga agccgtcggg agtacggtcg atgtggttta gcagtggaga tcagctcgcg ttgccagcat gggacgacgt ytgacag Lyg at tgcact ct tgcggtgaat cggatccact aagtcgaacg ttcctgcgga atttatcggg aaggcgacga cccaaactcc agccatgccg aatgacggta aagggggcta tcaggggtga atggaagagg accagtggcg gcaaacagga cagtatactg caagattaaa attcgaagca cattgtcctt tcgtgagatg ttagttgggc caagtcctca ycagcyayac gcaactcgag agtaacggcc ccccgcaagg atagctccgg gaaggattgg tccatagctg tacgggaggc cgtgagtgat tccggagaag gtgttgttcg aatcccagag taagtggaat aaggcggctt ttagataccc ttcggtggcg act caaagga acgcgcagaa cagttaggct ttgggttaag act ctaaggg tggcccttac agcgatgtcg tgcatgaagt 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1360 <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 64 tccaggaaac taacggccgc ccgcaagggg aactccggga tggatgagcc catagctggt cgggaggcag gtgagtgatg ccggagaaqa gcgttgttcg aatcccagag taagtggaat aaggcggctt ttagataccc gtcggtggcg actcaaagga acgcgcagaa ttcggggacc gttaagtccc taaggggact cttacgggct atcccgagct agctatgacc cagtgtgctg agcggcagac aactggagct cgcgttgqat ctgagaggat cagtggggaa aaggtcttag agccccggct gaattactgg ctcaactctg tgcgagtgta actggtccat tggtagtcca cagctaacgc attgacgggg ccttaccagc gggacacagg gcaacgagcg gccggtgata gggctacaca aatctccaaa atgattacgc gaattcgccc gggtgagtaa aataccgtat tagctagttg gatcagccac tattggacaa gattgtaaag aactttcgtg gcgtaaagcg gaactgcctt qaqgtqaaat tactgacgct cgccgtaaac attaaacatt gcccgcacaa ccttgacatg tgctgCatgg caaccctcgc agccgagagg cgtgctacaa agccat ct caagcttggt.
ttcaggccta cgcgtqggaa acgccctttg gtggggtaaa attgggactg tgqqcqcaag ctctttcacc ccagcagccg cacgtaggcg tgatactggg tcgtagatat gaggtgcqaa gatgaatgtt ccgcctgggg gcggtggagc cccggacagc ctgtcgtcag ccttagttgc aagtggggat tgggtggtga accgagctcg acacatgcaa tctacccatc ggggaaagat ggcctaccaa agacacggcc cctgatccag ggagaagata cggtaatacg gatatttaag tatcttqagt tcgcaggaac agcgtqggga agccqtcggc agtacggtcg atgtggttta tacagagatg ctcgtqtcgt cagcattcaq gacgtcaagt cagtgggcag gatccactag gtcgagcgcc cctacggaac ttatcgggga ggcgacgatc caaactccta ccatgcccgc atgacggtat aagggggcta tcagqggtqa atggaaqagg accagtggcq gcaaacagga aagtttactt caagattaaa attcgaagca tagtgtt ccc gagatgttgg ttggqcactc cctnatggcc cgaaggaacg 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1288 <210> <211> 1386 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> cgacggccag gctcgagcg qcaaqtcgag taccttttgg ggaaagattt cctaccaagg acacggccca tgatccagcc gaagataatg aatacgaagg t t taagt cag ttgagttcgg aagaacacca gt ggggagca cqttagtggq acggtcgcaa tggtttaatt qcagagatgt tcgtgtcgtg accatttagt gacgtcaagt aatgqgacgc ggct ct gcaa ggtgaa tgaattgtaa ccgccagtgt cgggcgtagc ttcggaacaa atcgccgaaa cgacgatcag aactcctacg atqccgcgtg acggtaccgc gggctagcgt gggtgaaatc gagaggtgag gtgggcgaag aacaggatta tttactcact qattaaaact cgacqcaacq gaccttctct agatgttggg ig-cactct~ cctcatggcc taaggggcaa tacgactcac gatggatatc aatacgtcag cacaqggaaa gattggcccg tagctggtct ggaggcagca agtgatgaag aagaataagc tgctcggaat ctgqagctca tggaactgcg gcggctcact gataccctgg agtggcgcag caaaggaatt cgcagaacct tcggaqcctg ttaagtcccg aaggagactg cttacgggct cccttcqcaa t a a g1 tatagggcga tgcagaattc cqgcagacgg cttgtgctaa cgtctqatta gagaggatga gtggggaata gccctagggt cccggctaac cactgggcgt actccagaac agtgtagagg ggcccgatac tagtccacgc ctaacgcttt gacgggggcc taccagccct gagcacaggt caacgagcgc ccggtgataa gggctacaca atctcaaaaa atcgctagta attgggccct gcccttcagg gtgagtaacg taccggataa gctagttggt tcagccacat ttggacaatg tgtaaagctc ttcgtgccag aaagggtgcg 
tgcctttgat tgaaattcgt tgacgctgag cgtaaacqat aagcattccg cgcacaagcg tgacatgtcc gctgcatggc aacccccgtc gccgcgagga cgtgctacaa qcccgtctca atcgtggaLu ctagatqcat cctaacacat cgtgqaaca gcccttacgg agggtaatgg tgggactgag qgcgcaaqcc ttttgtgcgq cagccgcggt taggcgggtc actgaagatc agatattcgc gcacgaaagc gaatgccagc cctggggagt gtgqagcatg aggaccggtc tgtcgtcagc cttagttgct aggtggggat tggcggtgac gttcggattg agcacgccac 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1386 <210> 66 <211> 1223 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 66 agcggcagag ggtgagtaac gcgtgggaat ctacccatct ctacggaaca actggagcta ataccgtata cgtccttcgg gagaaagatt tatcggagat gcgttggatt agctagttgg tggggtaatg gcctaccaag gcgacgatcc tgagaggatg atcagccaca ctgggactga gacacggccc agactcctac agtggggaat attggacaat gggcgaaagc ccgatccagc catgccgcgt ggccctaggg ttgtaaagct ctttcaacgg tgaggataat gacggtaacc ccccggctaa cttcgtgcca gcagccgcgg taatacgaag ggggctagcg ttactgggcg taaagcgcac gtaggcggac tattaagtca ggggtgaaat aaccccggaa ctgcctttga tactggtagt ctcgagtccg gaagaggtga gagtgtagag gtgaaattcg tagatattcg gaggaacacc agtggcgaag ggtccggtac tgacgctgag gtgcgaaagc gtggggagca aacaggatta tagtccacgc cgtaaacgat ggaagctagc cgttggcaag tttacttgtc ctaacgcatt aagcttcccg cctggggagt acggtcgcaa gattaaaact gacgggggcc cgcacaagcg gtggagcatg tggtttaatt cgaagcaacg taccagccct tgacatcccg gtcgcggtta ccagagatgg tatccttcag ccggtgacag gtgctgcatg gctgtcgtca gctcgtgtcg tgagatgttg cgcaacgagc gcaaccctcg cccttagttg ccagcattca gttgggcact tgccggtgat aagccgagag gaaggtgggg atgacgtcaa gtcctcatgg ctgggctaca cacgtgctac aatggtggtg acagtgggca gcgagaccgc taatctccaa aagccatctc agttcggatt gcactctgca actcgagtgc aatcgctagt aatcgcggat cag actccgggaa ggatgagccc 120 atagctggtc 180 gggaggcagc 240 gagtgatgaa 300 gtagaagaag 360 ttgttcggaa 420 cccggggctc 480 gtggaattcc 540 gcggctcact 600 gataccctgg 660 ggtgqcgcag 720 caaaggaatt 780 cgcagaacct 840 ttcggctgga 900 ggttaagtcc 960 ctaaggggac 1020 cccttacggg 1080 
gaqgtcgagc 1140 atgaagttgg 1200 1223 <210> 67 <211> 1237 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 67 cccgcagggg agtggcagag actgagggaa acttcagcta ggatgagccc gcgttggatt atagctggtc tgagaggatg gggaggcagc agtggggaat gagtgatgaa ggccctaggg ggagaagaag ccccggctaa ttgttcggat ttactgggcg cccggggctc aaccccggaa gtggaattcc gagtgtagag gcggctcact ggctcgatac gataccctgg tagtccacgc ggtggcgcag ctaacgcatt caaaggaatt gacgggggcc cgcagaacct taccagccct ttaggctgga tcggtgacaq ggttaagtcc cgcaacgagc ctaaggggac tgccggtgat cccttacggg ctgggctaca gaggtcgagc taatctccaa atgaagttgg aatcgctagt ggtgagtaac gcgtgggaat ctaccctttt ataccgtata cggccgagag gcgaaagatt agctagttgg tggggtaaag gcctaccaag atcagccaca ctgggactga gacacggccc attggacaat gggcgcaagc ctgatccagc ttgtaaagct ctttcaccgg tgaagataat cttcgtgcca gcagccgcgg taatacgaag taaagcgcac gtaggcggac tattaagtca ctgcctttga tactggtagt cttgagttcg gtgaaattcg tagatattcg gaggaacacc tgacgctgag gtgcgaaagc gtggggagca cgtaaactat gagagctagg cgtcgggcag aagctcttcg cctggggagt acggtcgcaa cgcacaagcg gtggagcatg tggtttaatt tgacatcccg atcgcggtta ccagagatgg gtgctgcatg gctgtcgtca gctcgtgtcg gcaaccctcg cccttagttg ccatcattca aagccgagag gaaggtgggg atgacgtcaa cacgtgctac aatggtggcg acagtgggca aagccatctc agttcggatt gcactctgca aatcgtggat cagaatg ctacggaaca tatcggaqaa 120 gcgacgatcc 180 agactcctac 240 catgccgcgt 300 gacggtaacc 360 ggggctagcg 420 ggggtgaaat 480 aaagaggtga 540 agtggcgaag 600 aacaggatta 660 tatactgttc 720 gattaaaact 780 cgaagcaacg 840 tatccttcag 900 tgagatgttg 960 gttgggcact 1020 gtcctcatgg 1080 gcgagaccgc 1140 actcgagtgc 1200 1237 <210> 68 <211> 1346 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 68 acgacgggcc atgctcgagc atgcaagtcg tgccctttgg ggaaagattt ctcaccaagg acacggccca tgacgcagcc gacgataatg aatacgaagg gtttagtcag ttgagtacgg aagaacacca tggggagcaa gtcggcatgc cggtcgcaag ggtttaattc gagatggagc tgtcgtgaga aggtttggct cgtcaaqtcc agggctgcaa tctgcaactc agtgaattgt ggccgccagt aacggatccc 
ttcggaacaa atcgccattg cgacgat oct aactcctacg atgccgcgtg acggtacccg gggctagcgt aggtgaaagc aagaggtatg gtgqcgaagg acaqgattag atgcatgtcg at taaaact c gaagcaacgc tttCccttcg tgttgggtta gggcactcta tcatggccct tcccgcgagg gagtgcatga aatacgactc gtgatggata ttcggattag ct cagggaaa gagcggcccg tagctggtct ggaggcagca aatgatgaag gagaagaagc t gotcgga at ccagggctca tggaactccg cgacatactg ataccctggt gtggcgcagc aaaggaattg gcagaacctt gggactggga agtcccgcaa ataggaccgc tacaaggtgg gggagccaat agttgg actatagggc tctgcagaat tggcggacgg cttgagctaa cgtaggatta gagaggatga gtggggaatc gtcttaggat cccggctaac tactgggcgt accttggaat agtgtagagg gtccgt tact agtccacgct taacgcatta acgggggccc accacctttt cacaggtgct cgagcgcaac cggtggtaag gotacacacg ccctaaaagt gaattgggcc tcgcccttca gtgagtaaca taccggataa gctagttggt tcagccacat ttgcgcaatg tgtaaaattc ttcgtgccag aaagggagcg tgcctttgat tgaaattcgt gacgctgagg gtaaacgatg agcact cogo gcacaagcgg gacatgcccg gcatggctgt cctcgctatt ccggaggaag tgctacaatg cgtctcagtt ctctagatgc ggcctaacac cgcgggaacg gcctttcgag gaggtaaaag tgggact gag ggcgaaagcc tttcaccggg cagccgoggt taggcggata actggctatc aga tat tcgg ctcgaaagcg agtgctagtt ctggggagta tggagcatgt gaccgctcca cgtcagctcg agttgccatc gtggggatga gcgactacag cggattgcac 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1346 <210> 69 <211> 1500 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 69 acagctatga gccagtgtgc acggagtggc gggaaacttg ggCccgcgtc tggtctgaga gcagcagtta atgaaggcct agaagccccg cggaatcact agctcaactc actccgagtg tcactggtcc cctggtagtc cgcagctaac gaattgacgg aaccttacca ctggcgggaa agtcccgcaa gggactgccg acgggctggg C gc.g aaa gtcggaatcg acacaccgc catctagagg <210> <211> 1113 ccatgattac tggaattcgc agacgggtga tgctaatacc tgattagcta ggatgatcag ggaatcttgg tagggttgta gctaact tcg gggcgtaaag cagaactgcc tagaggtgaa ggtactgacg cacgccgtaa gcattaagca gggcccgcac gcccttgaca cacaggtgct cgagcgcaac gtgataagcc ctacacacgt L.c.a a a a agc ctagtaatcg caagggcgaa gccuaa L Lg gccaagcttg 
ccttcaggoc gtaacacgtg gaataagccc gttggtgggg ccacactggg acaat gggcg aagctctttc tgccagcagc cgcacgtagg tttgatactg attcgtagat ctgaggtgcg acgatggatg tcccgcctgg aagcggt gga tgtcccgtat gcatggctgt cctcgccctt gcgaggaagg got acaatgg oagatcagca ttctgcagat ccctaLagtg gtaoogagct taacacatgc ggaacgtgcc ttacggggaa taacggoca actgagacac caagcctgat agoggggaag cgcggtaata cggatottta gggatctcga attcggaaga aaagcgtggg ctagocgttg ggagtacggt goatgtggtt ggacttcaga cgtcagctcg agttgccatc tggggatgac cggtgacagt ggattggggt cgct gcggtg atocatcaoa aqtcgtatta cggatccact aagtcgaacg ctttggttcg agatttatog ccaaggctac ggcccagact coagccatgc ataatgacgg cgaagggggc agtcaggggt gtccggaaga acaccagtgg gagcaaacag gcgggtttac ogcaagatta caattcgaag gatgaggtcc tgtcgtgaga atttagttgg gtcaagtcot gggacgcaat C. C aC. c. C aataogttcc otggcggccg caattcactg agtaacggcc ccgtagcaat gaacaacaca ocaaaggatc gatcagtagc cctacgggag ogcgtgagtg tacccgcaga tagcgttgct gaaatcctgg ggtgagtgga cgaaggcggc gattagatac tcgtcagtgg aaactcaaag oaacgcgcag ttcagttcgg tgttgggtta goactotaag catggccott ggagcaatcc acIC 'a 0 CI ogggccttgt ctcgagoatg googrogttt 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1380 1440 i 500 <212> DNA <213> Unknown organism <220> <223> origin of the sequence:soil organism <400> 70 gagctaatac cggattagct aggatgatca gggaatattg ttagggttgt ggctaactcc tgggcgtaaa ccggaattgc gtagaggtga acatattgac ccacgccgta acgcgttaag ggggcctgca agcgtttgac acaggtgctg gagcgcaacc tgataagccg acacacgtgc gaaaagccgt cgtataatga agttggtagg gccacactgg gacaatgggc aaagctcttt gtgccagcag gcgcacgt ag ctttaagact aattcgtaga gctgaggtgc aacgatgatg tcatccgcct caagcggtgg atgccaggac catggctgtc ctcgtcttta gaggaaggtg tacaatggcg ctcagttcgg cttcggtcca gtaaaagcct gactgagaca gcaagcctga tacccgggaa ccgcggtaat gcggctttgt gcatcgctcg tattcggaag gaaagcgtgg actagctgtc ggggagtacg agcatgtggt ggtttccaga gtcagctcgt gttgctacca gggatgacgt gtgacaacgg attgttctct aagatttatc accaaggcga cggcccagac tccagcaatg gataatgact acggaggggg aagttagagg aattgtggag 
aacaccagtg ggagcaaaca ggggcgctta gccgcaaggt ttaattcgaa gatggattcc gtcgtgagat tttagttgag caagtcctca gcagcaaact gca gcctgaggat cgatccgtag tcctacggga ccgcgtgagt gt accgggag ctagcgttgt tgaaagcccg aggtaagtgg gcgaaggcga ggattagata gcgtttcggt taaactcaaa gcaacgcgca ttcccttacg gttgggttaa cactctagag tggcccttac cgcgagagtg gagcccgcgt ctggtctgag ggcagcagtg gatgaaggcc aataagcccc tcggaattac gggctcaact aattccgagt cttactggac ccctggtagt ggcgcagcta gaaattgacg gaaccttacc ggacctggac gtcccgcaac aaactgccgg gcgctgggct agcaaatccc 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1113 <210> 71 <211> 1225 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soii organism <400> 71 ggagcggcgg gaaacttcag cccgcgttgg gtctgagagg agcagtaggg gaaggtctta aagccccggc gaattactgg ctcaaccctg gccgagtgta actggcccga tggtagtcca cagctaacgc attgacgggg ccttaccacc gggacacagg gcaacgagcg gccggtggta gggctacaca atccctaaaa Itcgctaataa acgggtgagt ctaataccgt attaggtagt atgatcagcc aatcttgcgc ggattgtaaa taacttcgtg gcgtaaaggg gaattgcctt gaggtgaaat tactgacgct cgctgtaaac attaagcact gcccgcacaa ttttgacatg tgctgcatgg caaccctcgc agccggagga cgtgctacaa gtcgtctcag tcacggatca aacgcgtggg atgtgccctt tggtggggta acactgggac aatgggcgaa atactttcac ccagcagccg cgcgtaggcg cgatactgga tcgtagatat gaggcgcgaa gatgagtgct ccgcctgggg gcggtggagc ccctgatcgc ctgtcgtcag cattagttgc aggtggggat tggcgactac ttcggattgc qcatg aacgtgccct cgggggaaag aaggcctacc tgagacacgg agcctgacgc cggggaagat cggtaatacg gatatttaag tatcttgagt tcggcggaac agcgtgggga agttgtcggc agtacggtcg atgtggttta tggagagatc ctcgtgtcgt catcattaag gacgtcaagt agagggttgc actctgcaac ttggtacgga atttatcgcc aagcctacga cccagactcc agccatgccg aatgacggta aagggggcta tcgggggtga tcgggagagg accagtggcg gcaaacagga atgcatgcat caagattaaa attcgaagca cagttttccc gagatgttgg ttgggcactc cctcatggcc aaacctgcga tcgagtgcat acaactgagg attggagcgg tccatagctg tacgggaggc cgtgtatgat cccggagaag gcgttgcg aagcccaggg tgagtggaat aaggcgactc ttagataccc gtcggtgacg act caaagga acgcgcagaa ttcggggaca 
gttaagtccc taatgggacc cttacggggt aggggagcta gaagtcggaa 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1225 <210n> '72 <211> 1286 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 72 atgattaqta ctgcccttaa agaaaagctg aggtaatagc gggactgaga gggcaaccct ttaggtcggg agcaccggca atttactggg tcaacctggg tccggtgtag ctggcataat ggtagtccac aqctaacgcg t tgacggggg cttacctacc agagacaggt taacgagcgc ccagtqatga ggccacacac tctcataaag tcgctagtaa gcaatactaa gcgggggata cagcaatgtg tcaccaaggc cacggcccag gatccagcga aagaaggtta aactctgtgc cgtaaagggc aagtgcatcg cgqtgaaatg actgacgctg gctgtaaact ataaqtattc cccgcacaag cttgacatcc gctgcatggc aacccttgcc actggaggaa gtgctacaat cgtctcgtag tcgcgaatca tcqatqacga actaagggaa gcacttgagg gatgatctgt actcctacgg tgccgcgtgg gtaqaggaaa cagcagccgc gcgtaggcgg caaactqtct cgtagagatc aggcgcgaaa atgagtacta cgcctgggaa cggtggagca tgagaatctq tgtcgtcagc cttagttgcc ggcggggacg ggggcgtacg tccggattgg gcattg gcggcggacq actttagcta aggggcctgc aactggtctg gaggcagcag gtqaagaagg tgctattaac ggtaatacag tgagatgtgt gactggagta ggaaggaacg gcgtggggat gatgttggta gtacggccgc tgtggtttaa gcttagtagc tcgtgttqtg atcatttagt acgtcaagtc gagggtcgca agtctgcaac ggtgagtaat ataccgcata gtcagattag agaggacqac tggggaatat ccttcgggtt ttgacggtac agggtgcgag gtgatgtgaa tatgagaggg tcgatggcga cgaacaggat ggggaaccta aaggttgaaa ttcgatgcaa tggaqtgccg aqatqttggg tggggactct atcatqgcct aacccgcgaq tcgactccat acgtaqgaac aactcgagag ctaqttggtg cagtcacact tgqacaatgg gtaaagccct cgacagaata cgttaatcgq agccccaggc tggcgqaatt aggcagccac tagataccct tcqgtatcga ctcaaatgaa cgcgaagaac aaagqagctc ttaagtcccg aaggggaccg ttatgqgtaq ggggagctaa qaagttggaa 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1286 <210> 73 <211> 1288 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 73 cggggcaacc gtgggggata gggaccgcaa aaaagcctac ctgagacacg aaccctgatc gacggaacga accggctaac tactgggcgt acttgggaat cgtgtagcgg 
ggccaacact agtccacgcc gctaacgcgt tgacgggggc ttacctaccc tagacacagg gcaacgagcg ccggtgacaa ggctacacac tcccagaaag atcgctagta ctggcggcga accagtcgaa ggccttgcgc caaggcgacg gcccagactc cagccatgcc aatcgcgcga tacgtgccag aaagtgtgcg tgcgctcgaa tgaaatgcgt gacgctcatg ctaaacgatg gaagtcatcc ccgcacaagc ttgacatgct tgctgcatgg caacccctgc accggaggaa gtcatacaat cgcgtcgtag atcgcggatc gcggcgaacg agactggcta gagaggagca atccgtagct ctacgggagg gcgtgtgtga gttaatagtt cagccgcggt caggcggctt actacggagc agagatgtgg cacgaaagcg atgactagtt gcctggggag ggt gga tga t aggaacgctg ctgtcgtcag cattagttgc ggtggggatg ggcgcgtaca tccggattgg agcatgtc ggtgagtaat ataccgcatg gccgatgccg ggtctgagag cagcagtggg agaaggcctt cgcgtuggatg aatacgtagg cgcaagtcga cggagtgtgg aggaacaccg tggggagcaa gttggaggag tacggtcgca gtggtttaat cagaaatgta ctcgtgtcgt tacattcagt acgtcaagtc gagggttgcc agtctgcaac gcatcggaac agatcgaaag gattagctag gacgaccagc gaattttgga cgggttgtaa acggtaccgt gtgcgagcgt gtgtgaaatc cagaggaagg atggcgaagg acaggat tag ttaaatcctt agattaaaac tcgatgcaac gcggtqCccg gagatgttgg tgagcactct ct cat ggccc aacccgcgag tcgactccca gtgtcCtctt atgaaagcag ttggtggggt cacactggga cagtgggggc agcactttcg aagaagaagc taatcggaat cccgagctta tggaattcca cggccttctg ataccctggt tagtaacgca tcaaaggaat gcgaaaaacc aaagggaacc gttaagtccc aatgggactg ttatgggtag ggggagccaa tgaagtcgga 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1288 <210> 74 <211> 600 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 74 cgtgccagca agcgcgcgca cattggaaac aaatgcgtag cgctgaggcg aaacgatgag agcactccgc gcacaagcgg gacatcccga gtgcatggtt gccgcggtaa ggtggtttct tgggagactt agatttggag cgaaagcatg tgctaagtgt ctggggagta tggagcatgt tgancgctct gtcgtcagct tacgtaggtg taagtctgat gagtgcagaa gaacaccagt gggagcaaac tagggggttt cgaccgcaag ggtttaattc agagatagag cgtgtcgtga gcaagcgttg gtgaaagccc gaggaaagtg ggcgaaggcg aggattagat ccgcccctta gttgaaactc gaagcaacgc ttttcccttc gatgttgggt tccggaatta acggcttaac gaattccaag actttctggt accctggtag 
gtgctgcagc aaaggaattg gaagaacctt ggggacattg taagtcccgc ttgggcgtaa cgtggagggt tgtagcggtg ctgcaactga tccatgccgt taacgcatta acgggggccc accaggtctt gtgacaggtg aacgagcgca 120 180 240 300 360 420 480 540 600 <210> <211> 601 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> cgtgccagca agcgcgcgca cattggaaac aaatgcgtag cgctgaggcg aaacgatgag gcactccgcc cacaagcggt acatcccgat gcatggttgt
C
gccgcggtaa ggtggtttct tgggagactt agatttggag cgaaagcatg tgctaagtgt tggggagtac ggagcatgtg gacgctctag cgtcagctcg tacgtaggtg taagtctgat gagtgcagaa gaacaccagt gggagcaaac tagggggttt gaccgcaagg gtttaattcg agatagagtt tgtcgtgaga gcaagcgttg gtgaaagccc gaggaaagtg ggcgaaggcg aggattagat ccgcccctta ttgaaactca aagcaacgcg ttCccttcgg tgttgggtta tccggaatta acggcttaac gaattccaag act ttctggt accctggtag gtgctgagct aaggaattga aagaacctta ggacattggt agtcccgcaa ttgggcgtaa cgtggagggt tgtagcggtg ctgcaactga tccatgccgt aacgcattaa cgggggcccg ccaggtcttg gacaggtggt cgagcgcacc <210> 76 <211> 1236 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 76 tgccctgtag cacatgggga agctagttgg atcggccaca cttccacaat tcgtaaaact taccttatta aagcgttgtc gaaagcccac ggaaagtgga cgaaggcgac gattagatac gccccttagt tgaaactcaa agcaacgcga -;Cccttcgg tgttgggtta gcactctaag atgcccctta tcgcgagagt acggggataa agagttgaaa tagggtaacg ttgggactga ggacgaaagt ctgttgtaag gaaagccacg cggaattatt ggcttaaccg attccaagtg tttctggtct cctggtagtc gctgcagctla aggaattgac agaaccttac ggacattggt agtcccgcaa gtgactgccg tgacctgggc atgctaatct cttcgggaaa ggcgctttcg gcctaccaag gacacggccc ctgatggagc ggaagaacca gctaactacg gggcgtaaag tggagggtca tagcggcgaa gcaactgacg catgctgtaa acgcattlaag gggggcccgc caggtcttga gacaggcggt cgagcgcaac gtgataaacc tacacacgtg catagaaccg ccggagctaa cgtcactaca gcgacgatgc aaactcctac aacgccgcgt gtacgtcagg tgccagcagc cgcgcgcagg ttggaaactg atgcgtagag ctgaggcgcg acgatgagtg cactccgcct acaagcggtg catcccgatg gcatggttgt ccttaatctt ggaggaaggt ctacaatgga ttctcagttc taccggataa ggatgggccc atagccgacc gggaggcagc gagtgatgaa caatggacgt cgcggtaata tggtttctta ggagacttga atttggagga aaagcatggg ctaagtgtta ggggagtacg gagcatgtgg atcgctctgg cgtcagctcg agttgccatc ggggatgacg cggtacaaag ggattgtagg tcctcttccc gcggtgcatt tgagagggtg agtagggaat ggttttcgga accttgacgg cgtaggtggc agtctgatgt gtgcagaaga acaccagtgg gagcaaacag gggggtttcc accgcaaggt tttaattcga agatagagtt tgtcgtgaga atttagttgg tcaaatcatc agtcgctaac ctgcaactcg 120 180 240 300 
360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 cctacatgaa gccggaatcg ctagtaatcg cggatc <210> 77 <211> 815 <212> DNA <213> Unknown organism <220> <223> origin of the sequence:soil organism 1236 <400> 77 caagcgttgt tgaaaactcg gggtgactgg gcgaaggcag ggattagata cacgagttcc gcttaaaact cgatgcaacg tgccccgcaa gttgggttaa gggactcata catgcccctt accgcaaggt gaccccatga ccggaattat aggctcaacc aattcctggt gtcactgggc ccctggtagt gtgccgcagc caaagaaatt cgaagaacct ggtcggtgta gtcccgcaac ggagactgcc atgtcttggg ggagcgaatc agtcggagtc tgggcgtaaa tcgggcttgc gtagcggtgg cgcaactgac ccatgccgta aaacgcatta gacgggggcc taccaaggct caggtggtgc gagcgcaacc ggggtcaact cttcacgcat ccaaaaagcc gctagtaatc gagctcgtag agtgggtacg aatgcgcaga gctgaggagc aacgttgggc agtgccccgc cgcacaagcg tgacatacac atggttgtcg ctcgtcctat cggaggaagg gctacaatgg ggtctcagtt gcaga gcggtttgtc ggcagactag tatcaggagg gaaagcatgg actaggtgtg ctggggagta gcggagcatg cggaaacttc tcagctcgtg gttgccagca tggggatgac ccggtacaaa cggattgggg gcgtctgctg agtgcggtag aacaccgatg ggagcgaaca gggctcattc cggccgcaag cggattaatt cagagatggt tcgtgaagat cgtgatggtg gtcaaat cat gggctgcgat tctgcaact c 120 180 240 300 360 420 480 540 600 660 720 780 815 <210> 78 <211> 826 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 78 tcgtaggtgg gatacgggct cgcagatatc aggagcgaaa ttgggcgcta cccgcctggg agcggcggag cgcccggaaa gtcgtcagct aatgttgcca ctcggaggaa atgctacaat ccggtctcag tcgcagatca cttgtcacgt ggctagaggt aggaggaaca gcgtggggag ggtgtgggga gagtacggcc catgttgCtt gcttcagaga cgtgtcgtga gcaacatcct ggtggggaCg ggccggtaca ttcggattgg gcaacgctgc cgggtgtgaa aggtagggga ccggtggcga cgaacaggat ccttccacgg gcaaggctaa aattcgacgc tggagccctc gatgttgggt tcggggtggt acgtcaagtc gagggttgcg ggtctgcaac ggtgaatacg agcttggggc gaacggaatt aggcggttct tagataccct tttccgcgcc aactcaaagg aacgcgaaga ttcggactgg taagtcccgc tggggactca atcatgcccC ataccgcaag tcgaccccat ttcccgggcc ttaactccag cctggtgtag ctgggcctta ggtagtccac gtagctaacg aattgacggg accttaccaa gtgacaggtg aacgagcgca 
ttggagactg ttatgtcttg gtggagcgaa gaagtcggag ttgtac gtctgcattc cggtgaaatg cctgacgctg gctgtaaacg cat taagcgc ggcccgcaca ggcttgacat gtgcatggct acccttgttc ccggggtcaa ggctgcaaac tccctaaaag tcgctagtaa 120 180 240 300 360 420 480 540 600 660 720 780 826 <210> 79 <211> 799 <212> DNA <213> Unknown organism <220> <223> origin of the sequence:soil organism <400> 79 cgtaggcggt ttgtcgcgtc tgccgtgaaa gtccggggct caactccgga tctgcggtgg gtacgggcag actagagtga tgtaggggag actggaattc ctggtgtagc ggtgaaatgc 120 gcagatatca ggaggaacac cgatggcgaa ggcaggtctc tgggcattaa ctgacgctga 180 ggagcgaaag catggggagc gaacaggatt agataccctg gtagtccatg ccgtaaacgt 240 tgggcactag cccgcctggg agcggcggag gaaccggaaa ttgtcgtcag tctatgttqc gaaggtgggg aatqgccggt cagttcggat tcaqcaacgc gtgtggggga gagtacggcc catgcggatt cacctggaaa ctcgtgtcgt cagcgcgtta acgacgtcaa acaaagggtt tggggtctgc tgcggtgaa cattccacgt gcaaggctaa aattcgatgc caggtgcccc gagatqttgg tggcggggac atcatcatgc gcgatactgt aactcgaccc tttccgcgcc aactcaaagg aacgcgaaga gcttgcggtc gttaagtccc tcataggaga cccttatgtc gaggtggagc catgaagtcg gtagctaacg aattgacggg accttaccaa ggt t tacagg gcaacgagcg ctgccggggt ttgggcttca taatcccaaa gagtcgctag cattaagtqc ggcccgcaca ggcttgacat tggtgcatgg caaccctcgt caactcggag cgcatgctac aagccggtct taatcgcaga 300 360 420 480 540 600 660 720 780 799 <210> <211> 1250 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> tgccagcttg aactctggga tggggggtgg aggtaatggc gggactgaga gcgaaagcct ttcagtaggg ccagcagccg ctcgtaggcg gggtacgggc gcgcagatat gaggagcgaa gttgggcact gcCccgcctg caagcggcgg atgaaccgga ggttgccgtc gttctatgtt aggaaggtgg acaatggccg ctcagttcgg ctggtggatt taagcctggg aaagcttttt tcaccaaggc cacggcccag gatgcagcga aagaagcgaa cggtaatacg gtttgtcgcg agactagagt caggaggaac agcatgggga aggtgtgggg gggagtacgg agcatgcgga aatacctgga agctcgtgtc gccagcgcgt ggacgacgtc gtacaaaggg attggggtct agtggcgaac aaactgggtc gtggttttgg gacgacgggt act tctacgg cgccgcgtga agtgacggta tagggcgcaa tctgccgtga gatgtagggg accgatggcg gcgaacagga 
gacattccac ccgcaaggct ttaattcgat aacaggtgcc gtgagatgtt tatggcgggg aaatcatcat ttgcgatact gcaactcgac gggtgagtaa taatgccgga atggactcgc agccggcctg gaggcagcag gggatgacgg cctgcagaag gcgttatccg aagtccgggg agactggaat aaggcaggtc ttagataccc gttttccgcg aaaactcaaa gcaacgcgag ccgcttgcgg gggttaagtc act catagga gccccttatg gtgaggtgga cccatgaagt cacgtgagta tatgactcct ggcctatcag agagggtgac tggggaatat ccttcgggtt aagcgccggc gaattattgg ctcaactccg tcctggtgta tctgggcatt tggtagtcca ccgtagctaa ggaattgacg gaaccttacc tcggtttaca ccgcaacgag gactgccggg tcttgggctt gctaatccca cggagt cgct acctgccctt catcgcatgg cttgttggtg cggccacact tgcacaatgg gtaaacctct taactacgtg gcgtaaagag gatctgcggt gcggtgaaat aactgacgct tgccgtaaac cgcattaagt ggggcccgca aaggcttgac ggtggtgcat cgcaaccctc gtcaactcgg cacgcatgct aaaagccggt 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1250 <210> 81 <211> 1210 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 81 cgctaatacc ggatacggcg cactgaggga cgagcctgcg aagacgggta gctggtctga ctcctacggg aggcagcagt gccgcgtgag cgatgagggc cggtgaagag tcggccttga agccgcggta atacggaggg aggcggcgtg ataagttggg ctgtcacgct tgaatctcgg gatatcggga ggaataccag gcgaaagcgc ggggagcaaa gtgctagacg ggggaggtat cctggggagt acggtcgcaa gtggagcatg tggtttaatt cgagagtctt cggactttcg gcccatcagc tagttggtga gaggatgatc agccacactg ggggaatatt gcgcaatggg cttcgggtcg taaagctctg cggtatctcc ttagcaagca tgcaaacgtt gctcggaatc tgtgaaagcc ctgggctcaa agggggtcag agaattcccg tggcgaaggc gctggcctgg caggattaga taccctggta tgaccccttc gctgccgaag gactaaaact caaaggaatt cgacgcaacg cgcaaaacct cgagaaagat ggtaagagct gaactgagac cgaaagcctg tggggagaga ccggctaact attgggcgta cccaggaagt gtgtagaggt acgaagattg gtccgcgctg ctaacgcgtt gacgggggcc tacctgggtt tcgcaaggat caccaaggct acggtccaga acgcagccac cgaataaggc ccgtgccagc aagcgcacgt gcattcaaaa gaaattcgta acgctgaggt taaacgatga aagcactccg cgcacaagcg aaatccgccg 120 180 240 300 360 4 480 540 600 660 '72 0 780 840 gaacctgqct gtcgtcagct tcagttgcca tggggacgac 
ctggtacaat cggatcggag acgctttcgg gaaaggctgq cgtgtcgtga acattaaggt gtcaagtcct gagccgcaaa tctgcaactc ggtgCcctcc gatgttgggt gggaactctg catggccctt accgcgaggt gactccgtga qgggaatcgg taagtcccgc gcgagactgc atgcccaggg caaqctaatc agctggaatc tgagaaggtg aacgagcgca cggtctaaac ctacacacgt tcaaaaaacc gctagtaatc ctgcatggct acccctatcg cggaggaagg gctacaatgg agtctcagtt gaagatcagc 900 960 1020 1080 1140 1200 1210 <210> 82 <211> 1272 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 82 gatgccagct ttaactctgg ggtggggggt tgaggtaatg ctgggactga gggcgaaagc ctttcagtag tgccagcagc agctcgtagg gtgggtacgg atgcgcagat ctgaggaacg acqttgggca gtgccccgcc cacaagcggc acatgaaccg atggttgtcg tcgttctatg ggaggaaggt ctacaatggc gtcccagttc cagatcagca tgctggtgga gataagcctg ggaaagcttt gctcaccaag gacacggccc ctgatgcagc ggaagaagcg cgcggtaata cggtttgtcg gcagactaga atcaggagga aaagcatggg ctaggtgtgg tggggagtac ggagcatgcg gaaatacctg tcagctcgtg ttgccagcgc ggggacgacg cggtacaaag ggattggggt ac ttagtggcga ggaaactggg t t gtggt t tt gcgacgacgg agactcctac gacgccgcgt aaagtgacgg cgtagggcgc cgtctgccgt gtgatgtagg acaccgatgg gagcgaacag gggacattcc ggccgcaagg gattaattcg gaaacaggtg tcgtgagatg gttatggcgg tcaaatcatc ggttgcgata ctgcaactcg acgggtgagt tctaataccg ggatggactc gtagccggcc gggaggcagc gagggatgac tacctgcaga aagcgttatc gaaagtccgg ggagactgga cgaaggcagg gattaqatac acgttttccg ctaaaactca atgcaacgcg ccccgcttgc ttgggttaag ggactcatag atgcccctta ctgtgaggtg accccatgaa aacacgtgag gatatgactc gcggcctatc tgagagggtg agtggggaat ggccttcggg agaagcgccg cggaattatt ggctcaactc attcctggtg tctctgggca cctggtagtc cgccgtagct aaggaattga aagaacctta ggtcggttta tcccgcaacg gagactgccg tgtcttgggc gagctgatcc gtcggagtcg taacctgccc ctcatcgcat agcttgttgg accggccaca attgcacaat ttgtaaacct gctaactacg gggcgtaaag cggatctgcg tagcggtgaa ttaactgacg catgccgtaa aacgcattaa cgggggcccg ccaaggcttg caggtggtgc agcgcaaccc gggtcaactc ttcacgcatg caaaaagccg ctagtaatcg 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 
1272 <210> 83 <211> 1247 <212> DNA <213>
Unknown organism <220> <223> Origin of the sequence:soil organism <400> 83 tgtttagtag tgcccaagag gaaaagttgc gaggtaatag tgggactgag ggggcaaccc tttaggcggg taagcaccgg ggatttactg gctcaacctg tttccggtgt acctggcata ctggtagtca gaagctaacg aattgacggg caatactaaa agggggacaa ccgtaagggt ctcaccaaga acacggccca tgatccagcg gaagaaggat caaac~cl.gt ggcgtaaagg ggaagtgcat agcggtgaaa atactggcgc cgcccgtaaa cgataagttc ggcccgcaca tgatgacgag ccaagggaaa ggcgcttttg ctgtgatcgg gactcctacq atgccgcgtg atgqgatgaa gccagcagcc gcgcgtaggc cgcaaacgac t-gcgtaag tgaggcgcga cgatgagaac tccgcctggg agcggtggag cggcggacg ctttggctaa gaggggcctg taactggtct ggaggcagca ggtgaagaag taagcctgta gcggtaatac ggttgtgtga acaactggag tcga aq~a a aagcgtgggg tagatgttgg aagtacagtc catgtggttt gtgaggaaca taccgcataa cgtccgatta gagaggacqa gtggggaatc gccttcgggt ttttgacggt agagg-tgcg gtgtgatgtg tatatgagag cgtlcqatggc agcgaacagg agggggaacc gcaagactga aattcgatgc cgtaggaacc tctctacgga gttagttggt ccagtcacac ttggacaatg tgtaaagccc acccgcagaa agcgttaatc aaagccccgg ggtggcggaa gaag goaocc attagatacc cttcagtatc aactcaaaag aacgcgaaga 120 180 240 300 360 420 480 540 600 720 780 840 900 accttacctg gcagagacag cgtaacgagc cgccggtgat agggctacac aatctcttaa cccttgacat gtgctgcatg gcaacccttg gaaccggagg acgtgctaca agcgtctcgt cctgcgaatc gctgtcgtca tccttaqttg aaggcgggga atggggcgta agtccggatt ttgccgagag gCtcgtgttq ccat cat tta cgacgtcaag cagagggtcg ggagtctgca gtgagagtgc cgcagggagc tgagatgttg ggttaagtcc gttggggact ctaaggagac tcatcatggC ctttatggqt ccaacccgcg agggggagcc actcgac 960 1020 1080 1140 1200 1247 <210> 84 <211> 1292 <212> DNA <213> Unknown organism <220> <223> origin of the sequence:soil organism <400> 84 ggctcgcaag gggatagccg agaggaaagc tggcccacca gagagacggc ggctgacgca cgggacgaac tccgtgccag aaagggcgcg ggccatgaat gtggaatgcg tgacactgag cgtaaacgat ttaagtgccc cccgcacaag cttgacatac tgctgcatgg caacccctgt caaaccggag cacgtgctac aaccccgcct taatcgtgga agcaaccggc ggctaacgcc tccggcqcac aggcgacgac ccagactcct gcgacgccgc agcctclttc cagccgcggt taggtggccc actgccgcgg 
tagagatccg gcgcgacagc gggcactagg cgcctgggga cggtggagca acgggaaacc ctgtcgtcag ctctagttgc gaaggtgggg aatggcgggg cagttcggat tcagctacgc gaacgggtgc cgggtaatac ggggaggggt gggtagctgg acgggaggca gtgtgggagg gagaggtctg aatacggagg ggtcagttcg ctggagcact gaagaacacc gtggggagca cgcttggggg gtacggccgc tgtggtttaa ggtcagaaac ctcgtgtcgt cagcgcgtca atgacgtcaa acagagggtc tgtcgtctgc cacggtgaat gtaacacgtg cgcatacgtt tcgcggccta tctgagagga gcagtgggga acgcctttcg acggtaccgg gtgcgagcgt tggtgaaagc gtagaggcag ggtggcgaag aacaggatta agcgaccccc aaggctgaaa ttcgacgcaa ggccggCcct gagatgttgg tggcggggac gtcatcatgg gcgagccggc aactcgacgg ac aacaacctgc ctctctgggg tcagctagtt tggccagcca atcttgcgca gggtgtaaac gtgaggaagc tgtccggaat gcggggctca gcggaattcc gcggcctgct gataccctgg cgagggccgg ctcaaaggaa cgcgaagaac cttcggagcc gttaagt ccc tctagagaga tccttacgtc aacggcaagc catgaagctg cctcgtgtgg agtcctgggg ggcggggtaa cattgggact atggccgaaa cactqttgcc accggctaac cattgggcgt accctgcgtc gggtgtagcg gggcagtagc tagtccacgc cgctaacgca ttgacggggg cttacctagg cgtgcacagg gcaacgagcg ctgccggtgc tagggctaca caatcccgta gaatcgctag 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1292 <210> <211> 1300 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> tcccttcggg agactgggat ccttggggca attagctagt atggccagcc aatattgcgc gggtcgtaaa acggtacctc gtgcgagcgt gtgtgaaagc gagaggaggg gtggcgaagg acaggattag ttgaccccct aggttaaaac tcgacgcaac agcaagtaca aacctgccga aaaggtggcc tggcggggta acacaqggac aatgggcgaa gccctgtcaa tgaaggaagc tgttcggaat ccggggctca tggaattcct cggccctctg ataccctggt gagtgccgca tcaaaggaat gcgaagaacc gcggcgaacg aaggcgggct tctacttgta acggcccacc tgagacacgg agcctgacgc gagggacgaa accggctaac Lactyyycgt accccggaag ggtgtagcgg gacggatact agtccacgct gctaacgcat tgacgggggc ttacctgggc ggtgagtaac aataccagat agctaccact aaggcagaga cccagactcc agcgacgccg accttgtcga tccgtgccag aaaycgcgtg tgcattggat tgaaatgcgt gacgctgaga gtaaacgatg taagtgcccc ccgcacaagc tagacaacat acgtaggtaa aagaccacga 
ccgggatggg tggctagctg tacgggaggc cgtgggtgat cctaacacgt cagccgcggt taggcggcct actgggaggc agatatcagg cgcgaaagcg ggcactaggt gcctggggaa ggtggagcat cggacagcct cctaccctgg gggctgcggC cctgcgcgcc gtctqagagg agcagtgggg gaaggccttc cggcaacctg aatacggagg tggagtaccg aggaacacct tggggagoaa gttcggggta tacggccgca gtggtttaat cagaaat gag gtctccccgc agatgttggg tgagcactct cctcatggcc gaacccgcga ctcgactcca aaqgggccgg ttaagtcccg aqagagactg cttatgtcca ggggaagcca tgaaggcgga tggttcaggt caacgagcgc cccngtgtta gggctacaca atcccaaaaa atcgctagta gctgcatggc aacccctgtc aacgggagga cgtgctacaa gtcgctctca atcgcggatc tgtcgtcagc tcgtgtcgtg tctagttgct accattcagt aggtggggac gacgtcaagt tgggcgatac aaagggctgc gttcggattg gagtctgcaa 1020 1080 1140 1200 1260 1300 <210> 86 <211> 1186 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 86 caatgggcag ctcaqggaaa gatcagcccg tagctggttt ggaggcagca agtgatgaag tagaagaagc tgttcggaat ccggagctca tggaataccc cggctcactg ataccctggt gtggcgcagc agaggaattg gcagaacctt gggacgcaga agt cccgcaa gggaccgccg caccctgggc agctaatccc cggcggacgg cttgagctaa cgttggatta gagagaacga gttgggaatc gccttcgggt tccggctaac tactgggcgt actccggaat agtgtagagg gctcgtaact agtccacgcc taacgcatta acgggggccc accagggttt gacaggtgct cgagcgcaac gcgacaagcc tacacacgtg aaaaaaccgt gtgagtaaca tgccgcatac gctagttggt ccagcctcac ttggacaatg tgtaaaactc ttcgtgccag aaagcgtgcg tgccattgaa tgaaattcgt gacgctcagg gtaaacgatg agcgttccgc gcacaagcgg gacatcctgt gcatggctgt cctcgccttt ggaggaaggt ctacaatggc cccagttcag cgtgggaatg gcccttacgg gaggtaatgg tgggactgag ggggaaaccc tttcgacggg cagccgcggt caggcggcta actgtttagc aga tat tggg cacgacagcg aacgctagcc ctggggagta tggagcatgt gCtcgccggt cgtcagctcg agttgccatc ggggatgacg ggtgacagtg attgcactct tacctttcgg ggaaagattt cccaccaagg acacggccca tgatccagcc gacgataatg aatacgaagg tccaagtcag ttgagtacga tagaacaccg tggggatcaa gttggatagc cggccgcaag ggtttaattc gaaagccggt tgtcgtgaga attcagttgg tcaagtcccc ggcacgagct gcaact tgcggaacaa atcgccgaaa cgacgatcca gactcctacg atgccgcgtg acggtacccg 
gggctagcgt tggtgaaagc gagaggtgag gtggcgaagg acaggattag ttgctattca gttgagactc gacgcaacgc tttcccgcaa tgttgggtta gcactctaga at ggccctt a cgcgagagtc 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1186 <210> 87 <211> 1454 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 87 cgacggccag gctcgagcgg gcaagtcgag acgtgggcaa aagactacag aggaggagcc ggtagctggt cgggaggcag tgggtgatga aaatagccgt gccgcggtaa ggcggtCtcg tgcgaggctg cgaaagcgtg cactaggtgt ctgggaagta tggagcatgt tgaattgtaa ccgccagtgt cgagaaaggg tctgtccttg gaggcaactc cgcggcctat ctgagaggat cagtggggaa aggccttcgg gagaggtgac gacggag ggt caagtctggc gagtgccgga gaat-,acoggt gggagcaaac cgggggtatc cggtcgcaag ggttcaattc tacgactcac gatggatatc cgcttcggcg agatggggat ccgtggttaa cagctagttg gaccagccac tattgggcaa gtcgtaaagc ggtaccgccg gcaagcgttg gtgaaagccc ggggagagtg ggcgaaagcg aggattagat cactccctcg attaaaactc gatgcaacgc tatagggcga tgcagaattc cctgagtaca aacccagcga agggtgctct gtagggtcac acggggactg tgggggaaac cctgtcgggc aaggaagcac ctcggaatca aaggctcagc gaattcccgg ac l c C.Cgg a accctggtag gtgccgccgc aaaggaattg gaagaacctt attgggccct gcccttcagg gcggcgcacg aagttgggct ctgcggggag ggcctaccaa agacacggcc cctgacccag ggaacgaagg cggccaactc ctgggcgt aa cttggaagtg tgtagcggtg cggcaacctga tccacgccgt taacgcagta acgggggccc acctgggttt ctagatgcat cctaacacat ggtgcgtaac aataccgaat catgcgcttg ggcgaagacg ccgactccta cgacgccgcg ttctcacggc cgtgccagca agggtgcgta cgctcgaaac aaatgcgtag cgctgaggca aaacgatgga agtgtcccgc gcacaagcgg gacatctggc 120 180 240 300 360 420 480 540 600 660C 720 780 8 4 0 900 960 1020 1080 gaatctctgg gaaaccaqag agtgcccgca ggggagcgcc aagacaggtg ctgcatqgct 1140 gtcgtcagct cgtgccgtga ggtgttgggt taagtcccgc aacgagcgca acccttaccc 1200 ttagttgccc ccgqgtcaag ccgtggcact ccaagggaac tgcccgtgtt aagcgggagg 1260 aaggtgggga cgacgtcaag tcatcatqgc ctttatatcc agqgctacac acgtgctaca 1320 atggctggga canagcgtgg ccaacgcgcg agcgggagct aatcgcaaaa ccccagcctc 1380 agttcggatc ggagtctgca actcgactcc gtgaagctgg 
aatcgctagt aatcgcggat 1440 cagcatgccg cggt 1454 <210> 88 <211> 1307 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 88 cccttcgggg tgactgggat ctttggggta attagctagt atggccagcc aatattgcgc gggtcgtaaa acggtacctc gtgcaagcgt gtgtgaaagc gagaggaggg gtggcgaagg acaggattag ttgaccccct aggctaaaac tcgacgcaac g t ct.t cc cgc agatgttggg tgagcactct ctcatggccc aacccgtgag tcgactccat agcgagtaca gcggcgaacg ggtgagtaac aacttgccga aaggcgggct aataccagat aaagatggcc tctgcttgca tgctatcacg tggtgaggta acggctcacc aaggcagaga acactgggac tgagacacgg cccagactcc aatgggcgaa agcctgacgc agcgacgccg gccctgtcaa gagggacgaa acctcgccga tgaaggaagc accggctaac tccgtgccag tgttcggaat cactgggcgt aaagcgcgtg ccggggctca accccggaag agcattggat tggaattcct ggtgtagcgg tgaaatgcgt cggccctctg gacggatact gacgctgaga ataccctggt agtccacgcc gtaaacgatg gagtgccgca gctaacgcat taagtacccc tcaaaggaat tgacgggggc ccgcacaagc gcgaagaacc ttacctgggc tagacaacac aagggactgg tggttcaggt gctgcatggc ttaagtcccg caacgagcgc aacccctgtc agagagactg cccgtgttaa acgggaggaa ttatgtccag ggctacacac gtgctacaat ggggagccaa tcccaaaaag ctgttctcag gaaggcggaa tcgctagtaa tcgcggatca acgtaggtaa aagaccacga ccgggatggg tggctagctg tacgggaggc cgtgggtgat cccaatacgt cagccgcggt taggcggcct act ggaaggc agatatcagg cgcgacagcg ggtactaggt gcctggggac ggtggagcat tggacagccc tgtcgtcagc tctagttgct ggtggggacg ggacagtaca ttcggattgg gcatqcc cctaccctgg gggctgcggc cctgcgcgcc gtctgagagg agcagtgggg gaaggccttc cggcgacctg aatacggagg tcttagtctg tggagtaccg aggaacaccg tggggagcaa gttcggggta tacggccgca gtggtttaat cagaaatggg tcgtgtcgtg accattaagt acgtcaagtc aagggctgcg agtctgcaac 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1307 <210> 89 <211> 1305 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 89 gggagcaatc cccaagtaga gcggcgaacg ggtgagtaac gcgtgggtaa agtggggaac aacatcggga aactggtgct aataccgcat aacatcgttg tctgacgatc aaagccgggg accgcaaggc ctggcgcttg gagaggagcc tagctagttg gtggggtaat 
ggcccaccaa ggcttcgatc ggtagccggc ggacggccac actgggactg agacacggcc cagactccta cgggaggcag tttttcgcaa tgggcgaaag cctgacgaag caacgccgcg tggaggatga gtcgtaaact cctgtcgacc gggacgaaag taggatggcc taatacgccg tgtaccggtg gaggaagcca cggctaactc tgtgccagca gccgcggtaa gcaagcgttg ttcggaatta ctgggcgtaa agggcgcgta ggcggcttgg gtgaaatccc tcggctcaac tgaggaactg cacgggaaac tgcctggctt gagggaagtg gaattccggg tgtagcggtg aaatgcgtag atatccggag ggcgaaggcg gcttcctgga ccgacactga cgctgaggcg cgaaagctag gggattagat accccggtag tcctagctgt aaacgatgag tgctgggtgt aaccccccct gtgccgaagc taacgcatta agcactccgc ctggggagta gctgaaactc aaaggaattg acgggggccc gcacaagcgg tggagcatgt gacgcaacgc gaagaacctt accggggttt gaactgtacg ggacagctct tctgcctccg ggtcttcgga cgcgtccgat ctgagagggc cagtggggaa gggccttcgg atctattgac tacagaggtg t ca gtc cg t gagttcggga gaacaccggt qggagcaaac agggggtatc cggtcgcaag ggttcaattc agagatagag t ct tcct tcg gttgggttaa ggcactctgg catggccttt accgcgaggt gactgcatga ggacccgtac gt cccgcaac agagactgcc atgccccggg ggagctaatc agttggaatc agaggtgctg gagcgcaacc ggtgataaac ctacacacgt ccaaaaagcc gctagtaatc catggctgtc cttgcctcct cggaggaagg gctacaatgg ggtcccagtt gcggatcagc gtcagctcgt gttgccatca tggggatgac ccggtacaaa cggattgcag atgcc gtcgtgagat ggtaaagctg gtcaagtcct gggtcgcaaa tctgcaactc 1020 1080 1140 1200 1260 1305 <210> <211> 1299 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> gggctttcgg cgagtgtgga gacttgagga cattagctag gctgtccggc gaatcttgcg cgggtcgtaa gacggtaccg ggggctagcg catgtgaaag ggagaggtga agtggcgaag aacgggatta atcgacccct aaggctgaaa ttcgacgcaa gggtcttcct gatgttgggt cgggaactct ctcatggcct aaggcgcgag ctcgactgca gtcctgagta ataacctggc ccaaaggtgg ttggtggggt cacactggaa caatggggga agccctgtcg ctagaggaag ttattcggaa ccctcggctc gtggaattcc gcggctcact gataccccgg gccgtgctga ctcaaaggaa cgcgaagaac tcgggacacc taagtcccgc agggggaccg ttatgtccag ccggagccaa tgaaggtgga aagtggcgaa gaaagccggg cgagctttga gatggcctac ccgagacacg aaccctgacg agcgggacga ccccggctaa ttattgggcg aaccggggaa cagtgtagcg 
ggaccggtac tagtcctggc agctaacgca ttgacggggg cttacctggg tgtagaggtg aacgagcgca ccggtgataa ggctacacac tcccaaaaag atcgctagta cgggtgagta ctaataccgc gcgctgtcgc caaggcgacg gtccagactc caacgacgcc accgtgcgag ctccgtgcca taaagggcgt ctgcatggga gtgaaatgcg tgacgctgag tgtaaacgat ttaagtgctc cccgcacaag tttgaactgc ccgcatggct acccctactc accggaggaa gtgctacaac ccgttctcca atcgcggat acgcgtaggt atgacgtctt tcgagaaggg atgggtagcc ctacgggagg gcgtgggcga ctctaacata gcagccgcgg gtaggcggct aactgcggag tagatattgg acgcgaaagc gagcacttgg cgcctgggga cggtggagca aggtgacagc gtcgtcagct ctagttgcca ggtggggatg ggacggtaca gtgcggattg aacctgacct cgggtcttcg gcctgcgtcc gggctgagag cagcagtggg tgaaggcctt gctcgtgcct taatacggag ctgtgtgtcc cttgagtccg gaggaacacc caggggagca tgtggcgggt gtacggccgc tgtggttcaa ccc tgaaagg cgtgtcgtga gcggCtcggc acgtcaagtc aagggctgcg cagtctgcaa 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1299 <210> 91 <211> 1296 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 91 atgtctggta ctgcccagaa gaagaaagct gtgaggtaaa act gg gact g tgggggcaac actttagttg aataagcacc tcggagttac gggcttaacc aatttccggt ctacctggac ccctggtagt tatcgaagct aaggaattga aggaacctta gcaataccag gagggggaca tggcgcaagc ggctcaccaa agacacggcc cctgatccag gggaagaagt ggctaactct tgggcgtaaa taggaaccgc gtagcggtga tgacactgac ccacgctgta aacgcgataa cgggggcccg cctacccttg atgatggcaa acccggggaa caggcgcttt ggcagagatc cagactccta cgatgccgcg aatgtttttt gtgccagcag gggcgcgtag attttagact aatgcgtaga qctQaaacgc aacgatgaga gttctccgcc cacaagcggt acatccacag gtggcggacg act cgggcta tggaggaacc ggtagctggt cgggaggcag tgtgtgaaga aatagagagc ccgcggtaat gcggtgttgc gcaatgctag gatcggaagg ga gagcgtgg9 actagatgtt tggggagtac ggagcatgtg aatttgatag ggtgagtaat at accgcat a tacgtccgat ctgagaggat cagtggggaa aggcct tcgg attgttgacg acagagggtg aagtgagatg agtacagtag aacaccagtg ggagcaa; r= ggtgcgcgcg ggccgcaagg gtttaattcg agatatcgaa acgtagggat ctattctgag tagctagttg gaccagccac tattggacaa gttgtaaagc gtacccaaag 
caagcgttaa tgaaatccct agggtagtgg gcgaaggcga agcgcacaag ttaaaactca atgcaacgcg gtgccgaaag 120 180 240 300 360 420 480 540 600 660 780 840 900 960 gaactgtgag gtcccgtaac ggagactgcc atgggtaggg ggagccaatc agtcggaatc acaggtgctg gagcgcaacc ggtgaagaac ctacacacgt ccggaaagcg gctagtaatc catggctgtc cttatcctta cggaggaagg gctacaatgg cctcgtagtc gcgaatcaga gtcagctcgt gttgccaaca tggggacgac ggcgtacaga cagattgaag acgtcc gttgtgagat cgtaatggtg gtcaagtcat gggttgccaa tctgcaactc gttgggttaa gggactctaa catggccttt cctgcgaagg gacttcatga 1020 1080 1140 1200 1260 1296 <210> 92 <211> 1250 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 92 gtctggtagc gcccagaaga aaaaagcttg gaggtaaagg tgggactgag ggggcaaccc tttagttggg taagcaccgg ggagttactg gcttaaccta tttccggtgt acctggactg ctggtagtcc tcgaagctaa ggaattgacg gaaccttacc actgtgagac cccgtaacgg agactgccgg gggtagggct agccaat ccc aataccagat gggggacaac gcgcaagcca ctcaccaagg acacggccca tgatccagcg gaagaagtaa ctaactctgt ggcgtaaagg ggaaccgcat agcggtgaaa acactgacgc acgctgtaaa cgcgataagt ggggcccgca tacccttgac aggtgctgca gcgcaaccct tgaagaaccg acacacgtgc ggaaagcgcc gatggcaagt ccggggaaac ggcgcttttg cagagatcgg gactcctacg atgccgcgtg tgttttttaa gccagcagcc gcgcgtaggc tttagactgc tgcgtagaga tgaggcgcga cgatgagaac tctccgcctg caagcggtgg atccacagaa tggctgtcgt tatccttagt gaggaaggtg tacaatgggg tcgtagtcca ggcggacggg tcgggctaat gaggaaccta tagctggtct ggaggcagca tgtgaagaag tagagagcat gcggtaatac ggtgttgcaa aatgctagag tcggaaggaa gagcgtgggg tagatgttgg gggagtacgg agcatgtggt tttgatagag cagctcgtgt tgccaacacg gggacgacgt cgtacagagg gattgaagtc tgagtaatac accgcatact cgtccgatta gagaggatga gtggggaata gccttcgggt tgttgacggt agagggtgca gtgagatgtg tacagtagag caccagtggc agcaaacagg tgcgcgcgag ccgcaaggtt ttaattcgat atatcgaagt tgtgagatgt taatggtggg caagt cat ca gttgccaacc tgcaactcga gtagggatct attctgagga gctagttggt ccagccacac ttggacaatg tgtaaagcac acccaaagaa agcgttaatc aaatccctgg ggtagtggaa gaaggcgact attagatacc cgcacaagta aaaactcaaa gcaacgcgaa gccgaaagga tgggttaagt 
gactctaagg tggcctttat tgcgaagggg 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1250 <210> 93 <211> 1545 <212> DNA <213> Unknown organism <220> <223> origin of the sequence:soil organism <400> 93 ccaggaaaca aacggccgcc gcacagggga ctgcccagtc gtgaaagcgg ttggcggggt cacactggaa caatgggcgc agctcttttg aagaataagc tactcggaat cctgggctca cggaattccc cggccatctg ataccctggt cagtatcgaa tcaaaggaat gctatgacca agtgtgctgg gct tgct ccc gtgggggata aggaccgcaa aacggcccac ctgagacacg aagcctgatc tccggaaaga accggctaac tactgggcgt acctgggaat aatgtaqcag gaccagcact agtccacgcc gctaacgcgt tgacgggggc tgattacgcc aattcgccct tgggtggcga acctcgggaa ggcttcgcgc caaggcgacg gtccagactc cagccatgcc aaagctttcg ttcgtgccag aaagcgtgcg tgcactggat taaaatacat gacactgagg ctaaacgatg taagttcgcc ccgcacaagc aagcttggta tcaggcctaa gtggcggacg accgggacta gattggatga atccgtagct ctacgggagg gcgtgagtga gttaataccc cagccgcggt taggtggttt actggcaggc cacgaaagcg cgaactggat gcctggggag ggtggagtat ccgagctcgg cacatgcaag ggtgaggaat ataccgcata gccgatgtcg ggtctgagag cagcagtggg agaaggcctt ggaagtcctg aatacqaagg gttaagtctg tagagtgcgg aggaaca-t tggggagcaa gttgggagca tacggtcgca gtggtttaat atccactagt tcgaacggca acatcggaat cgaccttagg gattagcttg gatgatcagc gaatattgga cgggttgtaa acggtaccgg gtgcaagcgt atgtgaaagc tagaggatgg acaggattag actaggctct agactgaaac tcgatgcaac 120 180 240 300 360 420 480 540 600 660 720 840 900 960 1020 gcgaagaacc cggnaaccgt taagtcccgc taaggagact cttacggcca ggtagagcca tgaagtcgga ttgtacacac catgcatcta ttacctggcc gagacaggtg aacgagcgca gccggtgaca gggctacaca atcccagaaa atcgctagta cgcccaaggg gagqgcccaa ttgacatcca ctgcatggct acccttgtcc aaccggagga cgtactacaa accgatccca atcgcggatc cgaattctgc ttcgccctat cggaacttac gtcgtcagct ttagttgcca aggtggggat tggtcggtac gtccggatcg agaatgccgc agatatccat agtgagtcqt cagagatggt cgtgtcgtga gcacgtaatg gacgtcaaqt agagggttgc aagtctgcaa ggtgaatacg cacactggcg attac ttggtgcctt gatgttgggt gtgggaactc catcatggcc aaagccgcga ctcgacttcg ttCccgggcc gccgctcgag 1080 1140 1200 1260 1320 1380 
1440 1500 1545 <210> 94 <211> 1549 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 94 ttttaaaccg agatgcatgc taacacatgc ggaatgcatc gcatacgacc tgccggatta gagaggatga gtggggaata gccttcgggt ggatgacggt gaagggtgca gtctgctgtg tgtgatagag caccagtggc agcaaacagg gagcaacttg tcgcaagact ttaattcgat atggtttggt gtgagatgtt aatggtggga aagtcatcat gttgcaaagc gtctgcaact tgaatacgtt cgttactagt acggccagtg tcgagcggcc aagtcgagcg ggaatctacc gagaggt gaa gctagttggt tcagccacat t tgga caat g tgtaaagcac accgaaagaa agcgttactc aaagccctgg gatggtggaa gaaggcggcc attagatacc gctctcagtg gaaactcaaa gcaacgcgaa gccttcggaa gggttaagtc actctaagga ggcccttacg ccgcgaggta cgacttcgtg cccgggcctt ggatccgagc aattgtaata gccagtgtga gcagcgcggg ctgtcgtggg agtgggggac gaggtaaagg tgggactgag ggcgcaagcc ttttgttcgg taagcaccgg ggaatcactg gctcaacctg ttCccggtgt atctggatca ctggtagtcc tcgaagctaa ggaattgacg gaaccttacc ccgtgagaca ccgcaacgag gactgccggt gccagggcta gagccaatcc aagtcggaat gtacacaccg tcggtaccaa cgactcacta tggatatctg gcaacctggc ggataacgta cgcaaggcct ctcaccaagg acacggccca tgatccagcc gaagaaatcg ctaacttcgt ggcgtaaagc ggaactgcag agcggtgaaa acactgacgc acgccctaaa cgcgctaagt ggggcccgca tggccttgac ggtgctgcat cgcaaccctt gacaaaccgg cacacgtact cagaaaaccc cgctagtaat cccaagggcg gcttggcgta tagggcgaat cagaattcgc ggcgagcggc gggaaactta cacgcgatag cgacgatccg aactcctacg atgccgcgtg tgcgggttaa gccagcagcc gtgcgtaggc tggatactgg tgcgtagaga tgaggcacga cgatgcgaac tcgccgcctg caagcggtgg atccacggaa ggctgtcgtc gtccttagtt aggaaggtgg acaatggtcg gatcccagtc cgcggatcag aattccagca atcatggtc tgggccctct ccttcaggcc ggacgggtga cgctaatacc gatgagccga tagctggtct ggaggcagca tgtgaagaag tacccagtac gcggtaatac ggttggttaa ccagctagag t cgggaggaa aagcgtgggg tggacgttgg gggagtacgg agtatgtggt cttaccagag agctcgtgtc gccagcacgt ggatgacgtc gtacaagagg ccggatcgaa aatgccgcgg cactggcggc 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1549 <210> <211> 1276 <212> DNA <213> Unknown organism 
<220> <223> Origin of the sequence:soil organism <400> ctggcggcga acgtaagaa gaccttgcgc caaggcgacg g tc ca ga c tc cagccatgcc aatcttccga ttcgtgccag aaagcgtgcg gcggcggacg aat t acg gattggatga atcggtagct r.itac gg ga gg gcgtgtgtga gttaatacct cagccgcggt taggtggttc ggtgaggaat ataccgcata gccgatgtcc ggtctgagag c arca qt g agaaggcctt cgggaggatg aatacgaagg gttaagtctg acatcggaat cgacct-agg gattagctag ggtgatcagc gaatattgga cgggttgtaa acggtaccgg gtgcaagcgt ccgtgaaagc ctacccagtc ttggtgaggt cacactggaa agcacttttg aagaataagc tactcggaat cccgggctca gtgggggata gggatcgcaa aaaggctcac ctgagacacg a a g c ga tc ttcgggaaga accggctaac tactgggcgt acctgggaat tgcggtggat tgaaatgcgt acactgaggc taaacgatgc aagttcgccg cgcacaagcg tgacatccac tgcatggctg cccttgtcct accggaggaa gtactacaat cccatcctag tcgcggtcag actggcggac agaqatcggg acgaaagcgt qaactggacg cctggggagt gtggagtgtg ggaatccttt tcgtcagctc tagttgccag ggtggggatg ggtggggaca tccggatcgg catgcc tagagtgcg aqgaacatct ggggagcaaa ttgggagcaa acggtcgcaa tggtttaatt agagatagag gtgtcgtgag cgcgtaatgg acgtcaagtc gagggtcgcg agtctgcaac tagagggtgg gtggcgaagc caggattaga ctaggctctc gactgaaact cgatgcaacg gagt gcc tt c atgttgggtt cggqaactct atcatggccc aagccgcgag tcgactccgt tqgaatt ccc ggccacctgg taccctggta agtgtcgaag caaaggaatt cgaagaacct gggaaccgtg aagtcccgca aaggagactg ttacggccag gtggagccaa gaagtcggaa ggtgtagcag accagcactg gt ccacgccc ctaacgcgtt gacgggggcc tacctggcct agacaggtgc acgagcgcaa ccggtgacaa qgctacacac tcccagaaac tcgctagtaa 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1276 <210> 96 <211> 1306 <212> DNA <213> Unknown organism <220> <223> origin of the sequence:soil organism <400> 96 cagggat cag ggataacctt atcaaagccg ttggtgaggt cacactggga caatgggcgc actcctt tcg agggaagaag ttgttcggaa ccctcagctc gcggaattcc gcggcgttct gataccccgg gccgtgccgc ctcaaaggaa cgcgaagaac aagaacatct aagtcccgca ggagagactg ttatgtctag ggtgagccaa gaagttggaa tagagtggca ccgaaaggag gggatcgcaa cacggctcac ctgagacacg aagcctgacg accgagatga ccccggctaa ttactgqqcg aactggggaa aggtgtagcg ggactgcaac 
tagtcctagc agttaacgcg t tgacggggg cttacctagg gtagaggtgc acgagcgcaa ccggtgacaa ggctacacac tcgcagaaag tcgctagtaa aacgggtgag ggctaatacc gacctggcgc caaggct cog goccagacto acgcaacgc agacccgc ctccgtgcca taaagggttc ctgcgtctga gtgaaatgcq tgacactgag cctaaacgat ataagcattc cccgcacaag ctcgaagtgc tgcatggctg cccttgtttc accggaggaa gtgctacaat ccqgtotcag tcgcggatca taacgcgtgg gcatgacatc ttggagaggg atcggtatcc ctacgggagg gcgtggagga gcctaatacg gcagccgcgg gtaggtggct gactggcaag tagatatctg gaacgaaagc gaatgcttgg cgcctgggga cggtggagca agatgaccat tcgtcaqctc ctgttgcoat ggtggggatg ggccggtaca ttcggatagc gcatgccqcg gcgacctaoc ccgtgtttg gcgcgtcc ggcctgagag cagcagtggg tgaagacctt ccggcggat t taatacggqg cgctaagtca ott ga gtgca gaggaacacc taggggagca tgtggcgggt gtacggtcgc tgtggttcaa cggtgaaagc gtgtcgtgag caggttaagc acgtcaaqtc aagcgctgca aggctgcaac gtgaat ttcgagtggq atacacggac gattagctag ggcggacgga gaattgttcg cgggtcgtaa gacagtatcg ggggcaagcg gacgtgaaat ggagaggaac ggtggcqaag aacqggatta atcgatccct aaggctgaaa ttcqacgcaa cgactttcgc atgttgggtt tgggcactct agcatggcct aacccgcgag tcgcctgctt 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1306 <210> 97 <211> 1300 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 97 cccgcagggt ggggataaca agatcaaagc agttggtgaq ao acactac cgcaatgggc aaactccttt cgagagaaga cgttgttcgg gagtagatgg accgaaagg cggggatcgc gtaacggctc ga ctgagaca gcaagcctga cgatcgggaa agccccggct aattactggg caaacgggtg gtcgctaata aagacctggc accaaggcaa cgacgcaacg gaaogcot ct aactctgtgc cgtaaagggc agtaacacgt cc gcataaca gcttagagag cgatcggtat ccgcgtggag ggtgtgaaca cagcagccgc tcgtaggcgg gggtgacctg tcCtglucL L gggcccgcgg ccggcctgag ggcagcag tg gatgaagatc coat cagagg ggtaatacag ccggctaagt cctcagagtg ggatacjacgg ccgattagct agggcggacg gggaattgt ttcgggtcgt gtgacggtac gggggqcaag ccgacgtgaa atccccaggc ttaacctggg aactgcgtcg atgcggaatt aggcggcatc tagatacccc ctgccgtgcc aactcaaagg aacgcgaaga gcaagggccg ttaagtcccg ctgcgqagac ctttatgtct agagtgagct 
atgaagttgg ccaggtgtag ctggaccggt ggtagtcctg gaagctaacg aattgacggg accttaccca atgtcgaggt caacgagcgc tgccggtgat ggggctacac aatcggagaa aatcgctagt cggtgaaatg attgacgctg gccctaaacg cattaaacat ggcccgcaca ggctcgaacg gctgcatggc aacccttgtc aaaccggagg acgtgctaca agccggtctc aatcgcggat gatactggcg cgtagatatc aatagcgaaa atgaatgttt tccgcctggg agcggtggag gcattggaca tgtcgtcagc cgctgttgcc aaggtgggga atggccggta agttcggatt cagcacgccg ggcttgaatc tggaggaaca gccaggqgag ggtgtggcgg gagtacggtc catgtggttc tccggcgaaa tcgtgtcgtg at cacgt tat tgacgtcaag caaaccgttg gcaggctgca cgggagaggg ccggtggcga caaacgggat gtatcgatcc gcaaggctga aattcgacqc gccggctccc agatgttggg ggtgggcact tcagcatggc cgatctcgca actcgcctgc 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1300 <210> 98 <211> 1233 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 98 acggagcggc gggaaacttg ggCccgcgtc tggtctgaga gcagcagtgg atgaaggccc agaagccccg cggaatcact gagctcaact aactgcgagt ctcactggcc ccctggtagt gcgcagctaa ggaattgacg gaaccttacc gctggcggga aagtcccgca ggggactgcc tacgggctg ccgagcaaat agttggaatc agacgggaga tgctaatacc tgattagcta ggatgatcag ggaatattgg tagggttgta gctaacttcg gggcngtaaa ccagaactgc gtagaggtga cggtactgac ccacgctgta cgcattaagc ggggcccgca agcccttgac acacaggtgc acgagcgcaa ggtgataagc gctacacacg ctcaaaaagc gctagtaatc gtaacacgtg ggataagccc gttggtgagg cctcactggg acaatgggcg aagtcctttc tgccagcagc gcgcacgtag ctttgatact aattcgtaga gctgaggtgc aacgatggat atcccgcctg caagcggtgg atgtcccgta tgcatggctg ccctcgccct cgcgaggaag tgctacaatg cgtctcagtt gcagatcagc ggaacgtqcc ttacggggaa taacggctca actgagacac caagcctgat ggcgqggaag cgcggtaata gcggcttttt gagaagcttg tattcgcaag gaaagcgtgg gctagccgtt gggagtacgg agcatgtggt tgagtaccag tcgtcagctc tagttgccat gtggggatga gcggtgacag cgqattgtgc acg ctttggttcg agatttatcg ccaaggcgac ggcccagact ccagccatgc ataatgacgg cgaagggggc aagtcagggg agtccgggag aacaccagtg ggagcaaaca gtcgggttta tcgcaagatt tcaattcgaa agatggaact gtgtcgtgag catttagttg cgtcaagtcc tgggatgcag tctgcaactc gaacaacaca 
ccaaaggatc gatcagtagc cctacgggag cgcgtggatg tacccgcaga tagcgttgct tgaaatcctg agqtgagtgg gcgaaggcgg ggattagata ctcgtcagtg aaaactcaaa gcaacgcgca cttcagttcg atgttgggtt ggcactctaa tcatggccct aggggtaacc gagcacatga 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1233 <210> 99 <211> 1304 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 99 cgaaatcccg ttcgagtggg acgagtggag gattagctag ggcggacgga gaattgttcg cgggtcgtaa gacggtaccg ggggcaagcg gacgtgaaat cagggatcag ggataacgtc atcaaagctg ttggtggggt cacactggga caatgggcac actcctttcg agagaagaag ttgttcggaa ccctcggctt tagagtggca ccgaaaggga gggatcgcaa aacggctcac ctgagacacg aagcctgaca atcgagacga ccccggctaa ttactgggcg aaccggggaa aacgggtgag cgctaatacc qacctagcgc caaggcgacg gcccagactc a caca a ccc acggcctccg ctccgtgcca taaagggctc ctgcgtctga taacacgtgg gcatgacatc tcaaagaggg atcagtatcc ctacgggagg gcg gggggtgaacaat gcagccgcgg gtaggcggcc tactggatgg gtgacctgcc ctgctcttga gcccgcgcct ggcctgagag cagcagtggg ccggaggagt taatacgggg aactaagtca ctagaggttg 120 180 240 300 420 480 540 600 ggagagggat ggtggcgaag aacgggatta atcgatccct aaggctgaaa ttcgacgcaa gagcttccgc atgttgggtt tgggcactct agcatggcct aaaccgtaag t cgccgqcgt gcgqaattcc gcggcatcct gataccccgg gccgtgccga ctcaaaggaa cgcgaagaac aaggacactc aagtcccgca gcaaagactg ttatgtctgg gtcgagctaa gaagttggaa aggtgtagcg ggaccaattc tagtcctggc agctaacgca ttgacggggg cttacccagg gtagaggtgc acgagcgcaa ccgqtgataa ggctacacac tcggagaaag tcgctagtaa gtgaaatgcg tgacgctgag cctaaacgat ttaagcattc cccgcacaag cttgaacagc tgcatggctg cccttgtttg accggaggaa gtgctacaat ccggtctcag tcgcggatca tagatatctg gagcgaaagc gaatgcttgg cgcctgggga cggtggagca gagtgaccac tcgtcagctc ctqttgccat ggtggggatg ggccggtaCa ttcggatcgt gcac gaggaacacc caggggagca tgtggcgggt gtacggtcgc tgtggttcaa tcctgaaaag gtgtcgtgag cacgttatgg acgtcaagtc aaccgt cgca cggctgcaac 660 720 780 840 900 960 1020 1080 1140 1200 1260 1304 <210> 100 <211> 1197 <212> DNA <213> Unknown organism <220> <223> Origin of the 
sequence:soil organism <400> 100 tctagtggcg ggaaacgact gCccgcgtcg ggtctgagag cagcagtggg tgaaggcctt taagccccgg ggaattactg gctcaactcc ttccgagtgt tactggacga ctggtagtcc cgcagctaac aaattgacgg aaccttacca gacctacaca tcccgcaacg aactgccggt cgctgggcta gcaaatcccc cacgggtgcg gctaataccg gattagctag gatgatcagc gaatattgga agggttgtaa ctaactccgt ggcgtaaagc ggaactgcct agaggtgaaa ctgttgacgc acgccgtaaa gcgttaagtc gggcct gcac gcgtttgaca caggtgctgc agcgcaaccc gataagccgg cacacgtgct aaaaaccgtc taacgcgtgg gatgatgtct ttggtgaggt cacactggga caatgggcga agctcttttg gccagcagcc gcacgtaggc ttaagactgc ttcgtagata tgaggtgcga cgatgatgac atccgcctgg aagcggtgga tggtaggacg atggctgtcg tcgtctt tag aggaaggtgg acaatggcgg tcagttcgga gaatctgccc tcggaccaaa aaaggctcac ctgagacacg aagcctgatc cccgggatga gcggtaatac ggctttgtaa atcgcttgaa ttcggaagaa aagcgtgggg tagctgtcgg ggagtacggc gcatgtggtt gtttccagag tcagctcgtg ttgctaccat ggatgacgtc tgacagtggg ttgttctctg ttgggttcgg gatttatcgc caaggcgacg gcccagactc cagcaatgcc taatgacagt ggagggggct gttagaggtg cgtcggagag caccagtggc agcaaacagg ggctcatgga cgcaaggtta taattcgaag atggattcct tcgtgagatg ttagttgggc aagtcct cat cagcaaactc caactcgaga gataacagtt ccagggatga atccgtagct ctacgggagg gcgtgagtga accgggagaa agcgttgttc aaagcccgga gtaagtggaa gaaqgcgact attagatacc gtttcggtgg aaact caaag caacgcgcag tcccttacgg ttgggttaag actctaaaga ggcccttacg gcgagagtga gcatgaa 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1197 <210> 101 <211> 1352 <212> DNA <213> Unknown organism <220> <223> origin of the sequence:soil organism <400> 101 cgacggccag gctcgagcgg gcaagtcgca aggcaatctc ccacgagctc atgagcctgc agctggtctg gagqcajcag gcgatgaagg acctgcatcg ggtaagacag ctgtgtaagt tgaattgtaa ccgccagtgt cgagaaaggg ccctcgagtg gcagagctltg ggcccatcag agaggacgga tggggaarct ccttcgggtt atgacggtat agggtgcaaa cgggcgtgaa tacgactcac gatggatatc cttcggccc gtggataacc tggccaaagc ctagttggta cagccacact tgcgcaatgg gtaaagctct ctccttagca cgttgttcgg atcccatggc tataggqcga tgcagaattc ggtacagtgg ttccgaaagg ggacctcttu 
gggtaatggc ggaactgaga acgaaagtct gtggggagag agcaccggct aattactggg tcaaccatgg attgggccct gcccttcagg cgcacgggtg agggctaata ttgaaagttc ctaccaaggc cacggtccag gacgcagcga acgaataagg aactctgtgc cgtaaagcgc aagtgcaccc ctagatgcat cctaacacat agtaacacgt cagcatgaga gcgcttgagg taagacgggt act cct acgg cgccgcgtga tgcagctaat cagcagccgc gtgtaggcgg gaaactqcgt 120 180 240 300 360 420 480 540 600 660 720 agctagagtc ctggagagqa aggtggaatg aagcggaaca cgtqgggagc acgctggggt tacggccgca gtqgtttaat cgagagattg ctcgtgtcgt catcattcag acgtcaagtc gtgggacqcg ccggtggcga aaacaggatt gcatgcactt aggttaaaac tcgaagcaac gaccttcagt gagatgttgg ttgggcactc ctcatggccc aagtccaaga agcggccttc agataccctg cggtgtcgcc tcaaaggaat gcgcaaacct tcggctggat gttaagtccc tggtggaact ttatggqttg tggacaaatc cttggtgtag tggacagtga gtagtccacg gctaacgcat tgacgggggc taccaaccct ggaacacagg gcaacgagcg gccqgtgaca ggctacacac cc aggtgaaatt ctgacgctga ccgtaaacga taagcattcc ccgcacaagc tgacatgtcc tgctgcatgg caacccctac agccggagga gtgctacaat cgtagatatc gacgcgaaag tqaatgctag qcctggggag ggtggagcat attgccggtc ctgtcgtcag cgccagttgc agcggggatg ggcggtgaca 780 840 900 960 1020 1080 1140 1200 1260 1320 1352 <210> 102 <211> 1361 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 102 aacagctatg cgccagtgtg ggattagtgg agggaaactt cggcccgcgt ctggtctgag ggcagcagtg gatgaaggcc aagaagcccc tcggaattac gagctcaact aattccgagt cttactggtc ccctggtagt gcgcagctaa ggaattgacg gaaccttacc gctggcccca aagtcccgca ggggactgcc tacgggctgg tcgagctaat gttggaatcg accatgatta ctggaattcg cggacgggtg gagctaatac aggattagct aggatgatca gggaatcttg ttagggttgt ggctaacttc tgggcgtaaa ctggaactgc gtagaggtga cat tactgac ccacgccgta cgcattaaac ggggcccgca agctcttqac gaacaggtgc acgagcgcaa ggtgataagc gctacacacg ctccaaaagc ctagtaatcg cgccaagct t cccttcaggc agtaacacgt cggataagcc agttggtgag gccacattgg cgcaatgggc aaagctcttt gtgccagcag gcgcacgtag ctttgatact aattcgtaga gctgaggtgc aacgatgaat attccgcctg caagcggtgg attcggggtt tgcatggctg ccctcgccct cgagaggaag tgctacaatg catctcagtt 
cagatcagca ggtaccgagc ctaacacatg gggaacgtgc tttcgaggga gtaaaagctc gactgagaca gcaagcctga caccggagaa ccgcggtaat gcggatattt gggtatcttg tattcggagg gaaagcgtgg gttagccgtc gggagtacgg agcatgtggt tgggcagtgg tcgtcagctc tagttgccag gtggggatga gtggtgacag cggattgcat tgctgcggtg tcggatccac caagtcgaac cctttggttc aagatLttatc accaaggcga cggcccaaac tccagccatg gataatgacg acgaaggggg aagtcagggg agtatggaag aacaccagtg ggagcaaaca gggcagtata tcgcaagatt ttaattcgaa agacattgtc gtgtcgtgag catttagttg cgtcaagtcc tgggcagcga ctgcaactcg a tagtaacggc ggatccttcg ggaacaactc gccattggag cgatcct tag tcctacggga ccgcgtgagt gtatccggag ctagcgttgt tgaaatccca aggtaagtgg gcgaaggcgg ggattagata ctgttcggtg aaaactcaaa gcaacgcgca ct tcagt tag atgttgggtt ggcactctaa tcatggccct gacagcgatg agtgcatgaa 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1361 <210> 103 <211> 1300 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 103 catgtttagt cctgcccaag gagaaaagtt gtgaggtaat actgggactg tgggggcaac cctttaggcg aataagcacc tcggatttac gggctcaacc agcaatacta agagggggac gcccgtaagg agctcaccaa agacacggcc cctgatccag gggaagaagg ggcaaactct tgggcgtaaa tgggaagtgc aatgatgacg aaccaaggga gtggcgcttt gactgtgatc cagactccta cgatgccaca atatgggatg gtgccagcag gggcgcgt ag atcgcaaacg agcggcggac aactttggct tggaggqgcc ggtaactggt cgggaggcag taoqtcraaaa aataagcctg ccgcggtaat gcggttgtgt acacaactgg gggtgaggaa aataccgcat tgcgtccgat ctgagaggac cagtggggaa aqqccttcgq tattttgacg acagagggtg gagtgtgatg agtatatgag cacgtaggaa aatctctacg taattaattg gaccagtcac tcttggacaa gt-tng gtacccgcag cgagcgttaa tgaaagcccc agggtggcgg 120 180 240 300 2 rfl 420 480 540 600 aatttccggt ccacctggca ccctggtagt tcgaagctaa agaattgacg gaaccttacc gcgcagagac cccgtaacga accgccggtg gtagggctac ccaatctctt gqaatcgcta gtagcggtga taatactgac ccacgccgta cgcgataagt ggggcccgca tacccttgac aggtgctgca gcgcaaccct atgaaccgga acacgtgcta aaagcgtctc gtaatcqcgg aatgcgtaga gctgaggcgc aacgatgaga tctccgcctg caagcggtgg atcctgcgaa tggctgtcgt 
tgtccttagt ggaaggcggg caatggggcg gtagtccgga atcagcagtg gatcggaagg gaaagcgtgg actagatgtt ggaagtacag agcatgtggt tcttgccgag cagctcgtgt tgccatcatt gacgacgtca tacagagggt ttggagtctg ccgcggtgaa aacgtcgatq ggagcgaaca ggagggggaa tcgcaagact ttaattcgat aggtgaqagt tgtgagatgt tagttgggga agtcatcatg cgccaacccg caactcgact gcgaaggcag ggattagata cccttcagta gaaactcaaa gcaacgcgaa qccgcaagga tgggttaagt ctctaaggag gcctttatgg cgaqggggag ccatgaagtc 660 720 780 840 900 960 1020 1080 1140 1200 1260 1300 <210> 104 <211> 1250 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 104 tgtagcaata ggacaactca caatgagacg gatccgtagc cctacgggag cgcgtgagtg tacctgagaa agcgttgttc aaatcccgga gttagtggaa gaaggcggct attagatacc cgttggtggc aacttaaagg aacgcgcaga tcagttcggc gttgggttaa cactctaaag tggcccttac cgcaaggaga tgcatgaagt catcagt ggc gggaaacttg ggCccgcgtc tgatctgaga gcagcagtgg aagaaggcct taagccccgg ggatttactg gctcaactcc ttcccagtgt aactggtcca ctggtagtcc gcagctaacg aattgacggg accttaccaa tggatcggag gt cccgcaac ggactgccgg gggt tgggct agctaatccc tggaatcgct agacgggtga agctaatacc ggattagcta ggatgatcag ggaatcttgg tagggttgta caaacttcgt ggcgtaaagc ggaactgcct agaggtgaaa gatctgacgc acgccgtaaa cattaagcac ggcccgcaca cccttgacat acaggtgctg gagcgcaacc tgataagccg acacacgtgc aaaaagccgt agtaatcgct gtaacacgtg gtatacgtcc gttggtaagg ccacactggq acaatgggcg aagctctttt gccagcagcc gcacgtaggc ttgatactgg ttcgtagata tgaggtgcga ctatgggtgc cccgcctggg agcggtggag cccgatcgcg catggctgtc ctcgccttta gaggaaggtg tacaatggcg ctcagttcag aatcagcagg ggaaccttcc gagaggagaa taacggctta actgagacac caagcctgat gccagggacg gcggtaatac gggtcgttaa cgaccttgag ttgggaagaa aagcgtgggg tagctgtcag gagtacggtc catgtggttt gacaccagag gtcagctcgt gttgccatca gggatgacgt gtgacaatgg attgcactct tagcggtgaa tcgttgtacg agatttatcg ccaaggcgac ggcccagact ccagccatgc ataatgacgg gaagggggct gtcaggggtg gctggaagag caccagtggc agcaaacagg cgggcttgct gcaagattaa aattcgaagc atggagtcct gtcgtgagat tttagttggg caagtcctca gcagctactt gcaactcggg 120 180 240 300 360 420 
480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1250 (210> 105 <211> 1302 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 105 ggcttcggct ggtggggaat tgacagttgt ttagctagtt cggacggcca attttgggca gggatataaa cggtacccaa caagcgttgt tgaaagcccg aggcatctgg ccccggtaga aacccttcga taaagtgggg gqtgcggtaa cactggcact atgggcgcaa cttcgaaaagt cgtaaqcccc tcggaattac gggctcaacc aattcccagt gtggcggacg aagaggggct gatcgcaaga taacgtacca gagagacggg gcctgaccca taggaagaa ggctaactac tgggcgtaaa ccggaatgtc gtagcggtga ggtgagtaac aataccgcat cctcacgcct aggcggcgat ccagactcct gcaacgccgc at ccgt-gtga gtgccagcag gggcgcgtag tttggaaact aatgcgtaga acgtgggtaa aacgcagcgg gaagaggagc cggtagccgg acgggaggca gtgaaggacg gya C aa'-c Ca ccgcgqtaat gcggtacgac gtcgaacttg tat tgggaag tctgcctttg caccgaatgg ccgcgcccga cctgagaggg gcagtgggga aaatccctct cacgggauga acgtaggggg aagtctggag agtgcggaag aacacctgag 120 180 300 360 480 540 600 660 gcgaaggcgg gga t taga ta gaagtccccg ggctgaaact cgacgcaacg t t tccct tcg gttgggttaa gcactctatt at ggcct t ta ccgcgagggg actccatgaa gatgctgggc ccccggtagt cgtgccggag caaaggaatt cgaagaacct ggggaggtag gtcccgcaac gggactgccg tgtccagggC gagccaatcg gttggaatcg cgacactgac cctggccta ctaacgcggt gacggggacc tacctgggtt gacggtgctg gagcgcaacc gtgacaaacc tacacacgtg caaaaatCCg ctagtaatcq gctgaggcgc aacgatggat aagtatcccg cgcacaagcg aaatcctacc catggctgtc cttaccacta ggaggaaggt ctacaatggc gtctcaqttc cggatcagca gaaagccagg acttggtgtg cctggggagt gtggagcatg tcgtcgcctc gtcagctcgt gttgccagcg ggggatgacg cggaacaaag ggattqgagt tg ggagcgaacg tggggttctc acggtcgcaa tggttcaatt agagatgagg gccgtgaggt gt tcggccgg tcaagtcatc cgcagcaaac ctgcaactcg 720 780 840 900 960 1020 1080 1140 1200 1260 1302 <210> 106 <211> 1281 <212> DNA <213> Unknown organism <220> <223> Origin of the sequence:soil organism <400> 106 tgcttctctt gataacgttc ttcgggcctt tcaccaaggc cacggtccag gatccagcca aggaagggca taactctgtg gcgtaaagcg gaactgcatt gcggtgaaat tactgacact cgccgtaaac cattaagttg ggcccgcaca gccttgacat 
gtgctgcatg gcaacccttg acaaaccgga acacgtgcta aaaaccgatc gtaatcgcga gagagcggcg agaaacggac gcgctatcag gacgatccgt act cctacgg tgccgcgtgt gtaaattaat ccagcagccg cgcgtaggtg caaaactgac gcgtagatat gaggtgcgaa gatgtcaact accgcctggg agcggtggag ccaatgaact gctgtcgtca tccttagtta ggaaggtggg ca at ggt cgg gtagtccgga atcagaaatg gacgggtgag gctaataccg atgagcctag aactggtctg gaggcagcag gtgaagaagg actttgctgt cggtaataca gtttgttaag tgactagagt aggaaggaac agcgtgggga agccgttgga gagtacggcc catgtggttt ttctagagat gctcgtgtcg ccagcacgac gatgacgtca tacagagggt tcgcagtctg t taatgcctag catacgtcct gtcggattag agaggatgat tggggaatat tcttcggatt tttgacgtta gagggtgcaa ttggatgtga atggtagagg accagtggcg gcaaacagga agccttgagc gcaaggttaa aattcgaagc aga t tggt gc tgagatgttg atggtgggca agtcatcatg tgccaagccg caactcgact gaatctgcct acgggagaaa ctagttggtg cagtcacact tggacaatgg gtaaagcact ccgacagaat gcgttaatcg aatccccggg gtggtggaat aaggcgacca ttagataccc ttttagtggc aactcaaatg aacgcgaaga cttcgggaac ggttaagtcc ctctaaggag gcccttacgg cgaggtggag gcgtgaagtc ggtagtgggg gcaggggacc aggtaatggc ggaactgaga gcgaaagcct ttaagttgga aagcaccggc gaattactgg ctcaacctgg ttcctgtgta cctggactaa tggtagtcca gcagctaacg aattgagggg accttaccag attgagacag cgtaacgagc actgccggtg cctgggctac ctaatcccac ggaatcgcta 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1281 <210> 107 <211> 43 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence: primer <400> 107 cgctgcagat ttaaatatgc aacgcgtaag tcgatggcgt tcg 43 <210> 108 <211> 51 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence: primer <400> 108 cggtcaactt aattaagata tctcgagaga tctattaata cgatacctgc g 51 <210> 109 <211> 29 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence: primer <400> 109 aaaaagatat ctgacgtccc gaaggcgtg 29 <210> 110 <211> 32 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence: primer <400> 110 aaaaaagatc tggctaacta 
actaaaccga ga 32 <210> 111 <211> 36 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence: primer <400> 111 gtgccgttaa ttaagctccg cgaagtcgct cttctt 36 <210> 112 <211> 36 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence: primer <400> 112 gtgccgttaa ttaaccgctg cataaccctg cttcgg 36 <210> 113 <211> 42717 <212> DNA <213> Artificial sequence <220> <223> Description of the artificial sequence: cosmid a26gl noncoding strand <400> 113 aaaaaaaaaa gt tcagaact tgtcgagatc gaccccacac ccgccgtaat gctgtatgcg cccccgcagc cgccccggta tgcccagccc cacccgacac ggaaccat tc cgtggccttc gcaccctaag cgcaaggacc cgagcaaggc tcagtcgcgc cgccgtaatg ccccaagtgt aaaaaaaaaa ccccgcgaga gatgcaggtg cggtgcccgc cagccagat t tcccactccg cgacggatcg aggctgatcc tcgcgccacc cgt tgccgt t ggattggcgc cgcatcacac atcgccggcg ggcgctgcc t agga tgaagc cactgcctcg catcccctgc gggcaggc to aaaaaaaaaa atc toctcggc caagcgaact gcgggcacca cgcggatgct aattcttgaa gtctcatcct ggtatctcgg gcggctgcga ggactctcgg tccagt toot agggtgacat ggtotgccga togocgccaa gtgtaaccgg ccgtcgcgcc gatgcgaagg accgtgggcg aaaaaaaaaa agagcgcctg cgggatgotc ocggcatctg gcgat tcggc oga tgcgoaa cggtctcatc oatcgogac tgoogccggc ccagcago tg gggcacgcag ogccctcgag aacagtgcaa gogtcgcgoo actctgcttc gccacactto ccgcatagaa cgacct tgtd aaaaaaaaco cacctcgaot ggo cgogatg cagatgoctcc gtoggacgct aatgtottcg caggotocog cgaaatcaca atcggoaatg oggctoccat gcct tggaco oogcccgto ccgttogatg cagoacotgc gcgaacggca ogcgatgccg gtcatcgcoc cogggga.
gggcoctgac toaccggcag gtacgtoooa goatgaaogc tgtttcgtca caggoggt to cagtgaatga aoogtgt ttg accagccatg tccatagcgt tcogagga tga agccgcgca C gcgacaggoa aaacaggcat C ccggoaogc cggaaggtgt togatgcgat gooa tggtgc 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 cgcgcgcgtc gtccctcgac caaaccggai ccatctcca( gaaacggcat caatctcgac tcgaggtcgc tgtcgggaga cgagcaac tc gtccgtttat gactgatctc aacggactg tcaactgtcc cgagttcctc ccacctgaat gagccacacc cagtagtgac gagccatcgc gggccgcgtc tgtgtccggc t ttgaacggc cgtcggcgag gttcgatcgc ac tgt ccgcc aaccggtggc actcgggcag tggcggcgga cgagatcgcg cttcggcatg ccccgccgcc tagcctccac ctacattcgt ggacgacgtg ggccgttgga gtacgtgcgg cgagaatggt ttttcaccga ccatcgcccg aatcgacatc gcgccggacc cgcgaatcag ccagcattcc aacggccat c gaatcaggtt acgccagatg cctgcaaccc cggtataggc tgatcccgat gctccaaagc 3ctcggtccai g tatgacctgc :ctcttccag :ataagcggc 3 cgcatccag cttgcgaccc gaaccagaac i cgaagcgagc gagcagctcz gccgagcgtc caggaactgc Ictgtcgcaga ,ttcaaccgtc Fgcgcaattcc cagccggcaq fgtctcccgag faccgcgttca tccccggccg ctccagggtg cacgccgtcc gaacagcgca caggtccgca cgat cggaag ctggccggaa gacgccgcga ggatgaggcc a cac agcgcg cagagcctgc gagttgatcc gcttcgcggg attccctttg cttcgcctct cgaat tggtg gggccatggc gt tgggcgcg cttgatcagg cccgacgatc cagttcgatg ggcgggggcc gt tcggcgcc cgccagaata gcatccctcg ggccgccatg cacgccgccg aaccgccacg cagcagatag gtcaatgagc gaacactccg ttcccatgcc a tcggaaccgi :actcgcgag4 I ctgcagccai i ccgggaagc4 i gagagcacg! 
;agcagcggal Icgctcgcgct ,gcgcgccagi LCgCCgCtCCC :cgcagactat ,gtgaactcat Lttggctaccc fgagaagaacc fttdtcgagcc faagacgccgc aacaccgtgc gcgagcacgS cgaagtccge agaatgcc tc gggcgaattc ggc tgcacgc agccgccat t acaggc tcgt aagatgaagg ttggtttcat acaaacgctg gagagcggag ggatggcgcg gggaccggca cggctgtccc gcctctacac acattcgtct ccactgattc gaacatgcgg ttgaaatgca ccggcgatac aacggagaat gga tccccca atgccggcgt gtgaggccgt cgatcgccgt ccgcggccgt gc ccgcaggc gccagcgcca agtgaggaag gagatccgtc gccggatcgg gtgtcgctgc acctcgagca cttcggccag actggagatg cggaatgcgt 1140 gcgcccgcggi :ccgcgaaca( a ccacaagctc a actcccaga( :gaccgccaac :gccagggatz :ccggacgctc 3 gttcgtcgcc :cctcgatcgi :caccagccat -agtacgacgc Igcacggcggz ;ggtccacctc ;gcctctcgaz :tgcgtggtcc Iccttcgcccc tgcaacagccc rcgacatgggc cccaggattg gatcgatctq ccacgaagct cggaa tacag cgagtttcgg tgcgccggaa cgcgatgttc cgtgaaagcg ccgaaagcgg cttcctcgcg cctcgaggag tcgccttgac tcgcttctac cgaacgagct tggc tat c tt gatggggcgg ctgccgccgc cgacggcacg gcggcgtccc tcttgagcgc tgctgcggcc cacgcgtcgc aaccgtcggc ggcagaagta tgctgcactc agcacgccgt cggcggcagt taggtttcag ccgcgagact gcaggcgctg cgtcgaagcc a aggaacaacc :ttccttggct :gttgagccgg :gtgtgtgtcc :tgccggcaaa i cgtcggcaga 4 cccattcacg Igcgcagtacg icggcgtcagc cgcctgcaac gtcgcacgcc Lacgttttgca Icgagctgtgt LgtCgtCCttC gttgctggcc Iatcgagcggc Igctgcggcga cgccgccact cagcagttcg gctcagccat gcggaaggcg gcgatacgcc acgaactccg ggcctcgagc gtagtgactg cccatcgcga taggaggtat gc tgggcggc ggggacacta ctcttcactc attcgtcttc caccc cggc a tagcggcagc aa tc tc tcgg ctccaggtgg cccctcgccc ggttccgtgg cgcccgaac gccgtggttg atcggacagc ggaggcagcg gatcgtgcttI tcgcgactgc gtcgacgggg gctgaacgcg ccggctgtag gtcgggcggc ctgcggatccz gtcoacagca agcatttgtt 1200 gcggccagcg 1260 tgatcggcga 1320 9gcgccagcq 1380 CttCgccgCg 1440 tccaggcgcg 1500 tagagcgcgg 1560 gggcgaacca 1620 acaggatgcg 1680 gactcccaga 1740 tcgcccgtgc 1800 ataacgcggc 1860 gaagcgacat 1920 aaatgctcga 1980 gcgacagaaa 2040 agttcgacca 2100 cagatgatgc 2160 tctcccatgc 2220 accagcgcga 2280 gctcccgact 2340 gcgtcgcaac 
2400 atgcgcgggt 2460 ggatcggcga 2520 aattgattga 2580 cgcgtcaggc 2640 taggcgccgg 2700 tcgcggccgt 2760 cggcccgacc 2820 gcactagcct 2880 gtcttcgcct 2940 gcttctgcga 3000 actcgggggc 3060 tcattccaga 3120 tgctgcaggg 3180 ccgaagttgg 3240 agcaccgctg 3300 gcttccacgt 3360 acggcttcct 3420 acggccgatc 3480 cgcttcagca 3540 aaacttttgc 3600 tccggcgcca 3660 aagctgcggc 3720 iagttcggtc 3780 3ttccggtac 3840 :cgtcggtgc 3900 :gacccgcac 3960 agaccggcga 4020 ~catcgagga 4080 cctcgcgcgg cgtgattccg aagaagccgg atccgccc cggca tccc gccaaaatc tgggttcgt ggaccgcct ttccgtgat atcgctggS cggatggtc: cgccagatc: gagcgtctt cagcagttc gctcgccg9 ct cgcccag cggcaggcc gatgctcag gttggccgc gagtacgaa cagcttcgg atcgagcaci cagcgccgci gagttgctcc gccttgctcc cagatagacc cagccggcgi acggat ttcz caggccgccc agcctggaaa gatcagccac gagagtgacc gccggcacac: tcgctcgcac ggcatgcat~t cgtcagatca agaacc tgcc gggctggcaa gccggggtgc cagcgcctcg gacgccgcgt cgccggcagc gagcttccac gggtgtgaca catcatgaca aaatgccgac gagcagaggc gaggcgcgct gaaggtgtca atagagaccg cagcacggac gggcagacaa cttgcactcg ag acgcgtgtac :a acggcccgca 3c ttccggcgag :t gtggacgctc :g ttgaagagga :g aacgccgcca Ig acgtcgagtc *g tatagcaatg !c agcgccatca g gacgcatcga 'C tgccgccgct g ctccgccttt c agatcgccga t tgccgcgcga gccggtagtc c gagtaattgc a aagtcgagcg gccagcacct 9 cctgcggcat :agctgttgct 3 agatcggcaa 4 atcaagcggc 3 ccgtcggctg i acgtagcggc i tccagcagca :cacagctcgg Lggatcgacgg Iagccgcgccg rccggacacga agcaaccagtc tcctgccact S acagccagtc c ccgacgaacqg ggcggacggt a cgctccgcgg g agccggtaag c ccatcgcgac; g gcgttcatcg c tcgaaatgaa c gaatcgccat c atcgcttgca c tcggagacgg c acgtgccagg c gtctccgtgg t cccaacatgc cc cgctgaaacg g cgccagtcgg g~ caatccgatc gt tatcgcccca gc cggtccgcca gc atctttcccg ggaatttcgc tccgcgccgg tccagt tcgt gtgagatcgc tctcctgctc cgagt tcgcg tcgcgggaag gcgaatcgag gcgccagcac ccggcgcagc t ttcggcgga agatgcgcgc tgcgcgcggc c ttgcgcgcg t ctgtccggc catggtggcg tctcgaaacg gcacgactcc cggaactcac ctgcctccgt gtgccaccct1 aaatggcagg 9tccgttgcg tggcggcggc gatgctcgcg 3agtcgcatc 3gcggccgacc 
:gcgcagctct :ctccggctt s ;cacatcctg c :ggaaaactc g 9gccgctcga g bgaagcgcac c rccaagtcgc t 'gttcgcgcc a 'ccagacttc t gccgtagaa c ggcgccggc a ctggctgaa g cgtgaccgg c gcagggaac c' gccgggcac a~ tagcgaatt ti gccgcgcgc cc atacgtcgg cc attcacgcc cc :cccttaga tz gtggtgag cc ~gttcgcat cc gcgcgttggg ggatggcatc gaaagcggca cgagacgcgc tcatcgttcc ggacagctgc gaggacatgg gctcttgcgc tccgtattcc gccgccggcc ttcggtgagt gacttggaac acccacttcc catgccttcg gcgca tgtgg ggaaccgagc agtgagc tgg agcccacgtc acgcagcggc atcgcaagca atgccggccg tcgtcctaat :ggccgcttc :aatgcgatc 3atgtcggca :gcgatcgcct ;tcatcactg z :aaagtctgg :tgcggcatc a ~ccagggccg t :agccacgac t rgcgatgacc g rctcgggcgc a gcttcgatc c ccgaatact t tcctcagcc a tccacccaa c gcatcgccg g gtcgccgcg c acctgcacc tA gtatccggc gc ggaccgaag ac atgaccgcg tc :cgaagatg ac jgctctctc gc igcgcgacg cc ~cggtaaac aS ~gggagtgc ag ;tgggatgc gg rcgctttca aa a tcgggatcg gatgccatcg agccatgccg tcgcgtgcgc tctcccagat cgcacttcat ccggccaat C accagctctc ttgagcggcc tgcttgcgga tgctggatga accgcgatc C ggcggcagca cccgcccatg gcaaggctgt agcgaagcgg tgaagattcc tgttctgtaa tgggtgcgcg gcaacca tga accagtacca gcgccgagac 3acgtttcct 3ctttgtcgt :tgtcgcaac :gcccgagtc tgcgatgca jtctgttcca aaCcggcga c ;cggacttca a gtgcggatt S rcgccggtct c gacgcgcat g cgaccggca c gaaaacaag a ccggcagac g tgaaggcgg g aaatgactt c gcagactgg c ttgcttcgc c ggatggcca g :ttcctgtg C :gtagatca cc :atcgccca a( :gggtacgc gi ~gctgggtg at rgccgccaa g~ iccagaccg cS rgccgattt cc .ctgcacgg tt taaaacgcat 4140 tgcaggagct 4200 acgatcgcga 4260 ttgagcgcca 4320 ggtgcaaggt 4380 cgacgaggga 4440 tctcgacggt 4500 cgatggcgcg 4560 ggcgcgggtc 4620 tgcgcatctg 4680 agccgggatc 4740 gagcggcagt 4800 gcggtacccc 4860 gtccccaatt 4920 cgagaaatgc 4980 cggaagagaa 5040 aggcaccctg 5100 ctaccccgtc 5160 ggtccgccag 5220 ctgcagcccc 5280 gacggcgcgc 5340 cgccggtgat 5400 tgtgccgcac 5460 cgccggcata 5520 cgaggtcgat 5580 :ccacagtgg 5640 :gccgcgcgt 5700 ;cgcctggcg 5760 ~atcgtctgc 5820 cgttgtcga 5880 rcagcgtacc 5940 :ttcaaccag 6000 racagcgcag 6060 :gtatgcgcc 
6120 atcgatcag 6180 cattcgccc 6240 gccaagatc 6300 ggaaggctg 6360 tgccgtgtg 6420 gtcctcgcc 6480 tgcctgccg 6540 cgcttcgag 6600 gtgctcatg 6660 cgcgctgtt 6720 :ctcaggct 6780 ttccccggc 6840 ictttccag 6900 ~ccgtcatc 6960 ~agaaacag 7020 :tcgcgcaa 7080 ctgtcgccgc gt tcgacacc ct tgtccaaa cgtttccacg ggagatcacg cgcgatgcgg cagccgcccc acacgccgca cgacagccag cgtctcgtgc gggacggagc aagctctgcg cggccgccgg tgccgcatct ggtgaagcaa gccggccagt tgtgaccggc cctccaaaa, cgacgattcc gcgaaaatgc cccggcgatt acaaggcgca gcccagcgac cgcgttggcc gccattgctc gccatcggcc z cacataaccgt agccttcgac e tgccgcatcg c ggaggagcag g gcgattggcc a gtcggcgtac t gctgcccgcc a cagcagcagc c ttccgcatcg a cggagcatcg g ggtgacaccg t gcccggaaaa c gcctgcgatg t: cgcaattttt t atgtcgtcca ai gcggcctgcg ci gccagcgtgg g~ agacggttgc g aggggatcga t: tcttcgagca at tttggcgcat cc gccgccttgg ge caagccyagc tg tccatgccgc gc cagtagcggg agcgggatcg atcggatcca.
ccggcgcga t gtgcggtccg ctcggattgg ctggcggtca acatactcgc aactgagcca aacggcgaaa caatcggcgc cccatgccgg cgcaacgata gcggcggtga acgtcggaca t cgccgagcg ttcggcaaag ccgaacgagc gtggcgatgc aggtgcggag :ccgccgcgc gagtccggcg 3tgcccgtgc 5Lccgcctcttc :gtccgtcct S ~gcgcatcgg a :cggctgcgg c Laagcgatca t: ~attcgcgcc 9 rccgtatcga c Lacatgctat g .gaaacagtc c gagggccgg g gctgctgcg g aaccgtcaa t gatcggagg a cgacaccgt t gacacgcca t: tttgccgca t: ccctcacga c( cccgctgag tc ttcgaccag cc 3tgacccca az iaactccag gc igtttgaga gt: :gtcctgcg gc ;gcttcgct cg Ltagaactg cc rttcagcag ct aaagccag cc cgtcgagtgt ccagcggc tg tcagcggcga gcaggtcatc gcgcat tcga cgtgaacgat ccagccgcag cgagac tgtg gagcccattc ccgactcgaa aacgatcgag cgtac tgcgc Cttcgcgccg t cgccaagcg gcaacgcatt cttcgtcggt ggagtgcagg tgacgccggc gaaaccggct gaatggtgcg .Ctccagatg :gtcgtagac .atgcgcctc ;caggaccgc ;attgattgc c ~gagccgctt c :gtcgaacgt c ;gccctcggg a rgcgcaggct t 'cgccatgct c 'cgccacgcc g gaagtcctg g agagatgcc g atccatgtt c gcgttcgag g gtagtactg g cttcaacag c gccgacgat g gttccgcgc c~ Cttgctcct cc 3cagttttc at ;gcagtccc at Itcagggtc gc ;ccatgagg gz :cgatgcgc ac ~tctgaggc tc rccgcgtcg tc gccactgg cg gttccaga at gcgatccg cg cgtgcctttc ggcaatacc atacgcgatc gcacctgc atggaacgca tgcgatacc ttgcgcttcc gcgatttct tgcggcgact gccaccttc gaccgctttg ccgcggggE gccgtcctcc acgctaaaS gcccagcacg tagtccggc cagggcaaac attgccggc caagagaacg gtcagcgga cgcgtcgcga aaaacaggc gccctggccg gtgaagaga cggcgccgcg gccaatgcc gtgactatat gcgtcgcgc cgggtgcgac tgcaggaac gcgcgccgac agagtgaga.
cgcctcttcg aggatgacg cagacgcggc cgtccttcc gccgtccagt gagatgttc atgctgcagg gcgagtacg cccgatgttg gtctttacg :gactgcagg gcctcgatcl latcaacgat acgtgggatc ~ttctgcgcc tgcagattcc :gagccgcgg atgactgcac :agcagcacg atgccgcagc :ttgcagcgt ccgtcgggcS igtcaggatc aagttcactc .tggcaagcc agatggacgS 'ggaccgcgc aggtcgagca gaacccgac caagctccga gcgcaggag ccggcaaaga gcgtcctct gccgcttccc agagct tcg cggggggaga aaggcggca. tatcgcgcat tccgagttc cagcggtctg gtccagaag gcgtccggat gcgatgggt tcggcgtgaa aatagcgcg agtttgaccg 3gagcgcag ccacggctgc :ggcgcggc tgtctccttc :ttgcgacg ccaggtgcgg zggggagcg tgagacccag ~atcgaagc cgagttcctt ~cacgcgcg ccagctgctg :ggccgatt gcagccgcgc :atgcaaaa gctcgaacag rgacattga tgggcatcgc agcgaggc cgtgttgcgg 'attgtcct gcgcggcagc 9g ctccgctgac aa gcgcttcgaa at tcagctctcg g cagccgtgcc 3g cggcgagcgc ~a gcgcattgac 3g cgccggccac -c ggacgccgag .t gggtatacgc La catcgagctg .t gcgttttata .a aagcgattgc g ctacagcctc c caacctgact t: ccgcgaagtg a gctgcgggcg t: gcgcgttgct g acgtccacgg g gattcagccg 9 ctttgatcag g aacccagcag t cgataggatc a gatcgatgtg 3 gcgccgtgat -ggatggcac -cctcgccgcg ccaacatgcg *cgccggcgaa cgacgagcgc gataggagat 8 tgcgggcagg 8 *cgccggtcgc 5 agcacacttc 9 tgccgaagaa 9 acgccttgcc 9 gcggcacctc 9 tcttcgcgcc 9 ccaggtcgaa 9 acgacatggg 9 ttcttccgac 9 cgcagcagcc 9.
ggcaagaccg 9' tgtgagttcg 9~ cagcgggcgc 9 ctgtagatgt 9 gcgcaacgcg 9~ tgcagactgc 95 ggcgacgtgg 9S cgtcaggttt 10 cagcccgacc 10 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 3580 3640 3700 3760 3820 1880 1940 1000 '060 120 180 240 300 360 420 480 540 600 660 720 780 340 300 360 )020 1080 tccgaccaci gccagcgcg agtcc tgca, tgcaggc tc, tggttcagt tcgatggaa( cggacatct.
ctcagcagci agcccgcca ggtgcga tg5 gcctgcgcc aggggccgc< aatgccgggc aggaacgggc aaccgtccat ttcaggacgz atcgagcgcc ggactcccgS aatcccacac gctcgcgaga gcgggaa tgt aaactgcaag cccatggcca ccgggcgcat gcggcgcaga aagcg cag cc gccttgaaca gcgccgccgc tcctcttcgc taatgcacgg tgtccatcga ggcgtgcgcc agcaaaaaca tgcgtcacgc acaatgagcc gcgacccagc cgctgcagcc agatcggcga agccagcgca gcggcgatga ctacgcagac gccggacca t actgtggggc gcaggcgtct gaaatccgaa ctgccgtcgc ct ttcaccga aagacgactt tccgtctccc ggattgccca gcaccccaacc t caagaaaggc g cggaggagaa :aggcaccgtc a gcagcgcatc g agatcacgcg 3 cgtccatgac i ccagccgccg I gtccgccggt a gcgcattggc xtggtgcgaaa :aggtttccga :gcttcaggga :gaagccccag :gatcgcgcag Lcgtctactcc faatccgaaac Icagtagcgaacgccggtggc gcgcgtaatc tggccggct t gcgcaagggc cgatacggcc cgtcgggcat cttcaatctc ggt cgaggac tggcgggttg gaaaggcgat1 cgccggcggg tccggcccaa ccggctgcgc ccagggtctg cgatttcctc ggttgccggt atgaccgcgcc gcagttcgtg c gcagaccctcc taggtccttc c tccggtcgat g tctgcagaca t gcccgatcgc c actcgacgcc g agcgggcctg c gacgaatgtc c acgacgccat c gcggcacaat g aggtctcgga a cgccctgcac g agaagaaagt g acgtgaccgc g gatgctcagc gttcgcggcc gagcacgaaa gattttcgcg gtcgagaacg ctcgagttca ggcgatggct ggcgccgcgt aatcaagtag gcaatctcgc cgcctccacc tt Cgaat tt t catgccgagc Ct tgcggccg cagatcgaag ttcttgattc atgcgcgatg gatctccgcg agcgtgaat c ggcgg tgaga caaggcaacg cacgacttca cgagcat tcg aacgccgagg cacttcaccg tccgggggca catatccgct ttgattctcg atcgagatcg accccagaaa 3ccgcgcgtg 3aggagatgc lagcggcggc igcgcgcagc< ~gctcgcgcc S :aaccagccg c ;atgcgggcg e :agccgcgca t jccggcgggc a ~gcatcgagg a :tcacctgga c rtgctccgcc a accgtctcc g ccggaagca t CCgggtcta t agcatctgt g gttccggcc a cgatggtct g ccgggttgc g tgtcgcgcg g gccggcaggc gcgtagtttc tgatcgagcg gccatcacgt gctgcggcat tcgcgctgag tgctggacct ccgatcatcc gttccctcgc atcgccatga gattcggtga gcgaccatct aaatcgacca gcatagtaat CtCgcctcca gtccaattgc cccagcgagc cccgcacgct aggactcgtt aacgcgatag cggaaggcgg tctccgatgc ccgcccaggc gcgagcagaa gcttgcgggg tcgatctcga tcgtgccgag ccgttgttcg1 atcagcgtgc ggcgcctgag 
iccagccata igaatgcggt :ggagttcat :atgcggtga ~ccggcggct :gcatgtcga t cCagttgtc c :gcgtccaca c Lgccaggttc c LacgccggCg c :aactccaga t tctgacgcc a rcagaatccg g gcaggaccc a cgaccgcga t tgaaagtca c gagccatat c ccagccaag g tcgaggctt c gggaagcgt
C
cttgggcctt cc tgggcggg ggcagtcgcg tgcggaaatg gaatcacgcc aaacatcggc cgggtgaagg agcgtgcgac tatcgaacgc cgattttgcc tggtcgtcac cc tgcagcag gcgtgtacga cgcgcttgcc gcagatcgcc ggatgtcgtc gcagatacgc gtgcgatctg cgccgggcgc gcagggcggc gcgtggtgac gaaagtcggt gcgggctgcc cgtcgaggaa 9gcggcgcga1 gccggaacgg ccacgcgcgc tcagcagttc agttcagttc cgataccggc 3gcgcggtgc :gcatgagggt :gagcttttc c ~cgcgctatc c ~tgcagcggc S :gcgctccga c :ctctccgtc c :cacggaacg t :accggcggg a :agtggaacg c ctgctcgag c .cagctccgc c rcgggacggt c tgtcgatgc c ggccagctg c atgctccag c gagatacgc a cacggaagc g.
gacgcgttt t gagccagaa a ccggtagtag 10140 cgcgcccagc 10200 ggtgagcaag 10260 Cgcttccgtc 10320 gcgcaatcga 10380 ctgCaccgtc 10440 cgcgcggcgg 10500 ggtaagaccg 10560 cgagcgtagg 10620 gatgtgccgc 10680 tcgcgtttcc 10740 ctcccgggtc 10800 gaggttcttc 10860 gatctcgatg 10920 ggaaagcgaa 10980 cacgaaagcc 11040 tcgtttttcc 11100 gattgccgcc 11160 cagccgcgcc 11220 qgcctgttcg 11280 gaagcgaccg 11340 gacgcctttc 11400 ggcaatcgcg 11460 gttcaggccc 11520 tgtggcccgc 11580 ccgatcgccg 11640 3acgtaacgc 11700 3tgcaggagt 11760 :ggatgttcg 11820 ttgcaggatc 11880 :tgacgccag 11940 :tcgtgctcg 12000 :ggcgaggtc 12060 :acagcgccg 12120 qgcggctga 12180 ~gcatccagg 12240 agcagcgac 12300 .gcgggatgc 12360 ccaaacgcc 12420 gtttccgag 12480 gcgcggaaa 12540 gccggcacc 12600 gatgcatcc 12660 tgccggctg 12720 aacgtcatg 12780 acgcacgga 12840 gtagccggc 12900 agactgagt 12960 cccaacagc 13020 cgacgacgc 13080 tgccagggat accggggcag gcgcacgcaa acagtgtgcc cagcctcata gagggcgccc tggtcgcggc gcagagacgg aaccagcgcc ggatgagggc tgatctcgag aaagacatcg tgccagaaca gaaccggcg *a tcgcagattg gtctccagcg tcgcgccggt caccgtggag atcccgtcga 9cgactgcag gagttcgtcg gcgaagtcca ctttcaccgg ccggcaagac tcggccaggg cttcgacttc accggagatg ggcgatagtc gttccgtgta agtcgacagc accatcgcca tcccgcccag gccgctgatc atccgcgccg catcctgcag cgtcagcgca atgctgtgcc cgatgacggc atccggctcg gcgacctgca gcgcgaagag cgcaggctga tcttctttca gcgaccagtc cacataaggc tcgcgaataa cgggttcgcg gtccatccag tgtcccgaga agacgaatac cgtcttccgt tgcgccgcca gttcttcagc cgttctgccg cggcggactg cggccgtgta gcaaagatca aattccccgt atgcccgcgc cacccgacgc agcacataca tcgcgtctgc aatgcccgta tcgggagtgt ctgcagtgtc gggagtgtct gtgcccccag tgtccccctc cgcgaggggg cgggaatgga acccgctcgc agcttcgcct acatgcgcgt tcgtgccgga ccaaccaaac cccgcgggcc acggccggac ttccttcaca ttcgggttca gctgtctcac gtgtaagctc accgctttaa tcaatcccgc tatgcctgcc agggacccga ccgcgcacac atcgccgaca gcctcgatct cgatgggatc gccgagcgga atctgctgcg ccgcgacgcc cgcattggcc acgacattgg gagcggtgag cccggccgag accacggccc acacccggtc tccggccgcg acgccgcagc cttctccgaa cacgatgccgt ccgctgggcg 
aggcggttcc catcttcgag agattcactc cgccggccac ggccagcgta c agatgaaccg ccgccagcga agacgagcag g aagttcagca aataggaaag tcggccggcg a tacggatcga tgcgcgcgcc atcggcggtc t tggatcccga cgaagacgcc cgtgcggctg c tcctccagtg cctcccacgc cacttccaac a gcctcgcgtg gcgaaatgcc gaaaaaatcg t ccggcttgaa tcttcaccgg cgtggcgggg t tcctcgtccc atcgtccagg cggtacctca c tgccagaact catcgggccc atcgccgccc g atgggttcgc gcgcgtcgcg ttcggccgca t aggtacgcct gctgcaacgg cgtaaggttg g aggctcctga aaaatgagcg aacttctgtt t acccggcgat ctggtttgcg acggcgtcga g~ cgcccgcggc ggtgccaacg gtagcaaggg t.
ttgccggaag agcgacgtga gcattgccgc
tgcccgctgt cgagcccagt taaaaaggta cacaatgggt acgcc tcgcc acggtggagt acggcctcac cggctcaaca cccgcgacat attccccagg atgacctcga cgcatggcgg ctgcgcccca cgctgggaag gaaacagcga cgcaggctcg agttcgtccg gttgccgcag gcagtgtcgg acagccgccc S ctaccagtcg 5 gcgctgacgc c atgtcgagcg a ggcggtatcg t gctccctcca g ggtcgcggga g 3tCcccgtgc c iatgccgacc g :ggccatcct g igtgcatcgg a :ccgccgccg c jtggcgtaca t :actcgccgc t ,ccgtgtcga g tcacgctat g .gcatccaga a cggagagcc c .gcagccgct gi tgtcgaaac c tcaacgatt tc gaatagcat cc gaaaccggc ac cgagacgtc gc ggaatcgct c caacaaagc tt atcgtctgc a agcaacggc aS ggtacacggt gcatggaacc cgatggcggg cggcaagatg acgtgctgtc tggtcgcggg ccact tgcgg gctccagcgt tgggtccgtt attccgcgag gccggctgcg gagcggcggc aacgccacaa cgcggtcgag cctcgcagcg ttccgatcca ggatcgtgat gccggcatcg Jtgcgtgcga caccatgcgc tgcctgcagt jagtgc ctgc ;cgcagcggc ~cgccgc ttc :cgcaaacct Lcgttccctc :ctcgtgact ~gtggccgat rgccgacggt :atgggcttc gatgacgac 'at tgaccgc caggcgctt gtcgaaggc S aaactccgg c gcgcaggct c cgcgatgct cgccgtgcc g atagtcgct g ttccatcgt c ctgcggatc a atcgatgga a caggatgcg c 3actccact g 3cccagacc c :tgcatgtg c ~gatatcga a :cgatttct t jcggcggga c ~caacggtc g ccgccatgcg 13140 gcgttcgtcc 13200 cagcaggatg 13260 gcggatgccc 13320 gaggctggtg 13380 ccgcggttga 13440 gctatgcgcg 13500 cgcgacgacc 13560 cgacaccgcg 13620 cggcagctcc 13680 gctgcaaatg 13740 gacctctccc 13800 tgcggcgatg 13860 cttcgccagt 13920 ttccaacgcc 13980 ttgcgatccc 14040 cccctggagc 14100 gtgatgcgtg 14160 Cgctgtcagc 14220 ggacagcgga 14280 ccctgcaatg 14340 aatgtcggga 14400 ggcggggggt 14460 ttcaagaacg 14520 tcgtctcgaa 14580 caaccggata 14640 caatgcgagc 14700 gttcgatttc 14760 ttccgccagc 14820 3atgtaaccg 14880 tgctgagac 14940 4gagccgcgc 15000 ~agcaccacc 15060 ;cggcagcga 15120 :gagaagcgc 15180 :tggcacgcc 15240 ~ggtccttgc 15300 rgtggcggta 15360 rctttggctg 15420 :tgccccgca 15480 atgctgacg 15540 tcgagaaat 15600 cggaccgac 15660 cgcaacatc 15720 acgatcgcg 15780 tccagcgtc 15840 ctcactcgg 15900 tgtccccca 15960 tccggtcct 16020 aaggttcag 16080 cat tgccgg( cggagaatgl 
ccagacggtt ctctgccgtc gcgtttcca< tgtcggccgc atgcggccg gcggccgaat ccgaga tgcc gtgtccacg ccagcgcgtc tcgcgctggc gcaggt tccz gattgagcac gcatgtcgcc catggatgat gcagaacgac cgccgagccc atgggatctt gatcctctgc catcgagatc t tccgtgcag atccgcgcgt gcaatacgga gcggacccga caatcgccgg cgcggcgaac ggccgc tatg cggtgacgag ggccag tggg tcaaatcgtc tttccgtcag cgatggt tcg catcggctgt tggaatcgcc agatgcggaa cagcatcgtt gcccggcgcc gagtcacgat cgaactgcat cc tgcggcgg ggcgagtttg gcagcgctcc cagcggcaat ggggac tcag acagcatcac taatcgctgc ctcccaccac tgtccacttt ccagagctgc :catgctttco :cgggtaacg( :gcgaaactcc :gagtttcgcl 3 cccccg -cggtttggtc I cggatagta :ctgcgtcaat -gcgcgcggcc j tccccacgcc ,gagaaatgaz ,ggaagagaac Lcgcgccggct Icatgccgtcc Iatcgatgatc 'cttgacgttc fatggcgcgct gcccgtgatc gtgaagactc gatattcgac gacgagcgtg Fccaggcttgt cagcagccac aagcaggtcg gcacgtatcg ctcgccgatg cgacgcggtc ggcgactctc acggacatcc cacttcagcg gtgatggacg cggaatgtcc gaaggtggga gagctccgcc cgcctgcagc ggaagctccg ttccggcagc gagacct tgg cacatcgtgc ctccggcagc acggatatcg gccgccgt tg cagcgtccgg cgctgcccga ttccagaaac gggt tggcga gccggtgacg gcccggcaac cacgcgacgg gctgtcgccc =agcggcaggc =cagatcaggg 3 agggccatca :ggatcgaagc 3 cgctgccgcc 3 tccgcggccg i cggagccact :gctttgtaga :agacggtctc Iatgctgacgg Ittgccggcgg j agaacaaaat :actttcggag jgccagcgtgc gcgagcgcat acaccttcca ccggcggcgg agataggttg faggcgcggcg :agcatcagcg rcagcgcagct cgaatatcaa aggcgcggcg tcgatgcgcg gccgtgcggc acgagccaag cat tgcaccg gactgcagcc accgtggaat agcgagaat c agggccgcgg acgcgacaaa ccgaagtga t gtgcagcggg atgccttcgc ccgccctctt gtc-agcgcgc cgcgcagcgg agccggtgat ggcgacggga atccaataac ggataaatac cgcagtgcga tctccgagcg acatcatgac agattacgag gtggagtaca gccgcgcaca cagaagacgt c gacaggacggt cgagct tgtc tggcagaaag acgaat cgaa gcagcacgtg cggctggaac cgggtgcgat gcgcgatatc gcgcgcgcaa cgcggt tcgc cgggaagacc cgtagttgcc gatccagcgg ccatggcggc ctgccagatg ccga tagctg gttgtggccg cgagccatcc cgtcggcacg cccataccgt tcgcgagatg cgggatgctc tatctttgtc gctcaggcca gcgagggaca aggtttggcc gtccgccagc tccaggtggg ccaccaattc 
ccggccgcaa ggtccagacc gcaactgcag gcacctcacc agccgcgatc cgcgaatcgc tgtgcagctt catgacggag ccgtcaatga cgagcgccat ccgcgaaatc tccgccggcc gctcacgctgc gagaccagtcc 3gtttccgtct tttcctggat c :acccgccgc zccagtacgc a :gccaagggc g :ggagtccat c :tttcgcctc c :gctgcgcgg g ggcgagatgc ct tgacgcgc tccgagatca tccggct tcg tgccaggagc gccggccagc gaagttcatg tccctgttgc C tgtgcggcc C tgggcgcgg C tggccggga ccggccggcg ttcgaagcgt gaacacgccc ctgccggtcc aggacgctcg cgccacctgc gaacgccaca gcct tgccgg cccgcagtcg ataggcaatc ggagt tgaga gcccgct tgc gtacacaat t gcgcttttgc at tgccggca aatctccgat cgccaccacg gaccgcgtat taccggcaca a.cagcagtcg gttgccgcgc ccacagtcgc atccagatcg ccaggaatcc :accagttga :acgtgttcg tgccaggtgc :ttcgcctcc ~ggcaaagcc :cagggatag :ggagtgact :tcatcgcgc :ggctggacc ~gctgcggcg ~gaaaccagc ~gccggccgc 'agatggctg agttcccgc rctgttgctg tgcgccaggg 16140 agcccggctt 16200 cccagcgtcg 16260 tgcatcagca 16320 tcgctgcgca 16380 agggacatcg 16440 acagcgacgt 16500 ggttgaataa 16560 aaaccaacct 16620 cgcagatgag 16680 gatcccactg 16740 gtgagttcgt 16800 tcggtcgtga 16860 cgcaacggcg 16920 gccacgtccg 16980 ctgcgtccca 17040 agtccgagtc 17100 tcgggtgcgg 17160 atcgcaactt 17220 ttgctgtgcg 17280 gtccgcccaa 17340 accgcggcag 17400 acgatgctgc 17460 tgacggcacg 17520 agagtctcgg 17580 ttggcatcgc 17640 tcgccgagct 17700 ctgccggtgc 17760 ccccagaccg 17820 tgcacatctt 17880 atcgtctgca 17940 cagatggggc 18000 gaatagaagg 18060 atggatgccg 18120 tcgcggctgt 18180 acctgcctgg 18240 acatggtgag 18300 cacgctcccg 18360 acagtggact 18420 tgagactcga 18480 ttgggcagcc 18540 ccgttagtca 18600 cgcaacgagg 18660 acaacgggt 18720 acggccgtcg 18780 :cttcaccgc 18840 4gctgaagcg 18900 :gcgaggcaa 18960 agcagttcgc 19020 3cggcaatcg 19080 agacccgat cagccatt t gagcggctt tgtggccga ccccgactg cttgcgtca cacggaaca gtcccggca tatcgcggt t cg tcc cgg aggtgtcgg, cctccggacl gt tgcggcgi tgccggcgai cgaggggaa, gaatggctt( cctcgaggtc gccccgctcc cggttccgtc cc tggcggat cgtcctggtt cgtcggagac ctgcggcatc agatcgtccc cacctgagcc tgtcgatcgc cgctggccgt gcaggttgta gCCggtcggg gctgctgtgg 
aacggtcaac gatccggatc tgcggccc tc agcccgcgcc tctgcacctt tcatgtttat ttccagttct tggaaatacc cagac tcgcg cat tagcgaa cgcgatgccg c tgccggggt acgcagcaac atgcaggacg catcgactgg ggt cgc cggc cgcggcgtag gaagaagtcg cggcgccatg cacacccgcg cgagcgcccg tccctggccg atcgagagtc t cacggcgtcc c gaacagggcc g cgaccagtcg c cggttcagaa a aacgaaaacg t cgggttgccc t gaatgccgcg c ggggttgtcg t gcgcgccgat :ctgctcgatg :gcgccgcccg 3 cgtgttccaa :gttctgcagc I gccgaagttg :cagcgcggcc I cgcctcgaca cacggcttcc gattgccgtg Itcgcttcagc ,gaatgcctta fatccggcgag Icaggctctga catgttcggg attgcccgtg ataatcgttg tggaagcccg atccaggctgz ctgatcgatg z gtagtaggcat gcgcagcaac gatgagggcg a gtcgagcgca a tgcgtctcca a tcgcgggcga g gcatggagca c ggcaaaggct g tcgagtccca g acaaagcgcg c gtggcttcgt t tcgccggtaa t gcctcgagaa t ctggtgcgcg c aaacccagtc c~ cctgcctgcc c agcggcagat c4 acgcttgcga t4 gcatggataa c gcgatggcag cgtactccgg agcgcacc tg ggctcgattc ggt tgaatca acgtaaggct gccat caggt acttttcgct gccagaaaac cggtgttcgt ttcaccacgt agcggaagca agcacgtgcg ccgggtgcaa tcgatgtgcg atcagcgcca gtcttcaccg gcaagggc tc tagctcacat tgtgcgggcc ccgcgaatca acaaccacac .agcgaccgt agaatcagat caggcaaggt ccctgcagcc :cggtgtacg 3aaaagatcc jcgttctcga ~tcgcctcgc ~agccgccgt :cgatgtccc ;accaatagg itgggctcgc itcacggcgc S iccactggtc S rgttctcaaa c :gtaactgac g 'ccccagtga t ttcacccaa a cagtgactc c caagcgggt g cgagcccgc c gtgtagacc c ggccattcc g gcgccggtg a ggcaggacc t gcgggtgtg a .cgcgtcca g gccgcgcag c cgatggccc cgagcatccg caatgtgcgc cccaa tggcg cgtcgatgcg gcatggcgcg cgcggcccat tctggccgcg tgtccagccc atgaagtgcg cgcggtatgc ggtacggtgc cattcgtacc ccggccaggg ggc tgggcgt ccttgatcag acccgagctt cggcttcgat ccagcgtctg cgt tcggcgc ccgccatcac cgcagcccc cggctgccat tgacgccgcc gcaccgcgac cgaggatgta cgtcgatatg ccatgaagac :cgcctccca ?cggagcgat iccgcgtgta igcggtcggg ~atcgagatt :gcgcgcgct c ~gcgaagctt S ~acctgcgcc a :tccggcgct t ratcgcatcg a tgggagagc 9 ggctgctgt g ctgatgtgc g cgcagttga 9 ggtagcgcg g tcgtccacg g gctcccgac c tgcgccagc g aagagcgag g tgatggagg t tcctgattc a ggtgacgtt t 
gtccagcgct gctgcgcagg tgccgcgact ccacagtccg gtcgagcggc gccgcactct gccgggccac cggcacaaaa ggccatcaag gcgagcgcac gcgcgccaga gggcgtactg gc tcaagccg ggtgagccgt ggtcagattc tgcggccacg cagcttgttg gggatcgccc caagcgcgcg cgtcagtcca cggatcgcga accgcggacg cgccttcagc cgccagcgcg cagcgatgac cgagagacgc cgcatccccg gccggt ccgg 3gtgacttcc accgaagaac :attcggccc :ggaacttca ~gatgcgccg :tccagctgg jctgagatcg1 Lgccgcgaat ~ccaccggtg Lgcgacggat ragttgcgaa rgatcgagcg 'caatgaggat gtgaaggcac caatctgaa t agaggctcg c agcgccccc a catcgagaa c atgccgatg a gtacagcgc c gcagtacac c cggtgtgga t aattccacga 19140 caaatcaccc 19200 tcgcccatgc 19260 gccaaggcga 19320 ccctgcaact 19380 tcgatggcgg 19440 tgcgatcctt 19500 cctgtggcgg 19560 tcctgcgcgt 19620 gccgtgtagc 19680 tcacgcagcg 19740 gacgcggcct 19800 aacgagttga 19860 gccgggattt 19920 agatggggcg 19980 cccgctgccg 20040 ccgttggtgc 20100 agcggcgtgc 20160 tctcccacag 20220 ttgctgcgtc 20280 tcgcgcagcg 20340 tagccgtctg 20400 ttgcagaagt 20460 aggtcgcttt 20520 gagcatgccg 20580 ccggcggcaa 20640 ccgcgcattt 20700 ctccccgcca 20760 agaag~cagcc 20820 cgggcgtcaa 20880 gtcgcgccgg 20940 cgtaccgcgc 21000 gggaagcggc 21060 :cgatgcgtt 21120 :ctgacccgc 21180 :gagcagcgc 21240 itgcttcggg 21300 igtcgaacag 21360 ;ttctatggc 21420 ;tgtggaagt 21480 :ggcttcgcg 21540 ~cgcggcggg 21600 .ggggcattc 21660 ~cacgccggc 21720 .gttaatgct 21780 ggcgttggc 21840 .aaagagcac 21900 ttccgcctt 21960 gtcgtccag 22020 gcggcgaat 22080 gagatccgcg acctcttct ttgcagcacg cgctgctgc ggcgccgtgl gatcaaatai cagtcgcggi acgcatctg( gagcccgccc ggcc tgcatc ccgcggaccc atcatcgag4 gtactcgccc t cgcgaaggc t tccgagcgc gagaacgcac ccggagaatt atcgataccc cgcggcgcta tgcggaat ca gtagctcggc caccgcaggc aatattgccq ggcatgccgg aaggatcaga ctccgccgac gtaatatcgg gtctgccacc cggt ctggaa gacacgcct t atatagcgtc cggcagccac cttaggaccg ggcgaaacgg ct cgc cggC C gatcgccgcc ggcgcgcgca gctctcaata aatggagacg catcgcacct atccggcagg catcaccagg agcaaacagc cgccggcaga caccggctgc gaaaagaaac tcgttggcga gcgccctgca atggccgcga aagcacatgt cagaatcacg :tccaccatc, 51 
gtcgcgccc4 :acgtagcgci 7gcggccgcgo cacagatccc, 3 ggattgtgci I gacttcaggc i ctgtacaggi 4gctggcggci gccgcgccg( -tggcggagcc ,tgcgcccatc Iccctccgagt -tgcgcggccc Iagcggagcga Lagcaattgce atcgcggcca cccagttcta cggcgttcat *gcagcatgta tcatcgctct gggtgcgcga gcct cgagcg tggtctgaga ggtagcgaca ctccgctcga cgcgaatacg gccacactgc tgcttggcgt atctcgagaa acagggtttc agtttgcccg gcccggcgt t acggccagca tcc tgcggcg cgcgaggtga tccggcaggt c tgacggcgc tcgggcgtca gcgggctgcg tcgagccgtc gactcgtaga gcaatcttgg gcctcgtcgc t tggccgtga tacgagcgca gcggatcgtt tgragcgttgg t cccggctga t cctcgtttg c atttcgcga g aaaccagac z ggttgcttc cggccgccg, g ggtgCtcgc a ccgcactgg' 3 acttcacca( tgacgatcci 3. cggtaacgci :cactgtcgg( I gttcgggcc( Tcgcgagtga( -ccagcaactc -cccatacggc Latatcggatc LggCCggCttc LgCgCttCCtC Ltctcgaggtc *cgaccagtcc agttccacgz gtgaaagcac agt tcacgtt *ccatggccac gtagaggcga gccggcaccc tccaatggcq gacggtcgaa tgagaatcgt cgtcgggcag acacgtcgca gagcgtgatc tgaggttcga gcaggtccgc tctgcgtttt ca cccgaa at tcgc tgcggc tctgcatgag cggcaataca caccccagga cgccggcggt cattcagcag actggcggcc ggcgcgtctg tcgaccgggc atgcgacatc ccag tt c ttc cgggcgcagc tgccgccaat g aggccggcgi c tgtaagtcc( g gactgccgcl t ccacgccgac a agcatgcgc g cgcgatcac( r- cgcctgcgc< Sggccagagtc :gcgccagtcz a catctcgccc :caggatcagc g CCactcgacc gcgccggaaa ;cgcctcgccc gccctccagS I ttcggcgcca :cgccaattca jttcaccgaaa Igcgcagcagc igggcacggga Lgccgtcggac gagctgaaca tfccagcgtg fgaagtacacg agccgtcgat gacgagacca *gcgtctctcg c tcctgccaa ctcccactca gcacttttgg gccttcgtcc tcgccagtac aaccaagccg cagaatcgga gatgccctcc cacgacctca acagcgctgc ccggccgcgt cgccgccgcg Ct tccacaac cgcg tcgagc agctgcgcat catgcccgcg ggcgatgcga cacaatgcag gccgaacgac gagggccgcg tgcggccggc cccgaacgaa :ccggccagc zagggctccg -cgcgcgctg, :tgatcttcg I tcgtcggga :cggccgaag, 3 ccggccgtti I ctgcggcagg icgctcaccgi "tgagctgtgi :catgcccca( :tcatacaatt tctcacgccct fgtaaagccgt Ictgccgtaae Ltccgcaagcc caacggggat tgaatgcgct gaatagtacc gcgacaccac aagctgagta gccgtgtcgc tgccggccgs gccggggcca agtttcgact ctcgcaaccq 
aaaggatagg tcgatgtcga tcgcggcct t cccatgccga ttgagcgttt agcggat tcg atcgaaggat tccatgctcg gcacgtagag gcgggtccgt tcgccggcgc tcggcaacta tat tcgccga tgcgc ca agg aacgcgtcat tcatccatgg tat tgcgcac acccgtcgtg atacggtgcg aaaccgggct tctgtattgg gtcaccggcg ctgactgccg gtcgaccgtc tctgccgtcg caccaatctg 22140 a cgacgcgagt 22200 a gcccgccggt 22260 g gtcggcgggt 22320 c cgtcgaaatc 22380 t ccagatcgat 22440 :cccagagcgc 22500 a ccagccatag 22560 cgagctcata 22620 actagggacat 22680 agcgcatctgc 227.40 4 gctgtagcgt 22800 :Cgggcttccg 22860 :gaagttcccc 229201 :ccggcgaatg 22980 iaacaaacgcg 23040 j cgggactcgc 23100 tgagcggcgt 23160 gtatgcggcg 23220 1tgtctccatc 23280 fcgtgggcggc 23340 Ltgcggaacga 23400 gCtccgctga 23460 cgccaaacac 23520 ccaccgaacc 23580 cgaagataac 23640 ctacaggttc 23700 ccggcagggc 23760 acccaccctg 23820 tacgcagcga 23880 gtagaaccgg 23940 ggataccgtc 24000 ccagctgtcc 24060 tgcgccacgc 24120 agctgtgaaa 24180 ttgccagcgc 24240 tgatggccgc 24300 tgaccgcagc 24360 agccgagcgc 24420 cgctgtgtcc 24480 cccactgcaa 24540 cggccaacag 24600 cggcgcgaaa 24660 cttgcccggt 24720 cctccgtcag 24780 ggaagtgcac 24840 ggttgtccat 24900 cggacaggca 24960 gcgcttgctc 25020 ctct cgg 25080 ggtctttcc, atcaattcg ggcaagcac ggtct~tgac4 ggcggcgat, gatggact c gtccacaccl gcgaatgacc gaccacaccc gcgcccgtcc attgatggc4 tgccatcacc gcggaggt tc cagctggtac caggtaggtz cgcatatccc gtccatgatc gccgtccagc gccgcccgcc attgcgccac tgcgat ccga tgccaggccg caaccccagg gacatcaaag ggacgcaatc gggcaaaccg caggctgaca caaatcgctc atcgtccttc aatctcggcc gcactcgact tcgatcacgc cgccagt ccc gcgcgaatcc atcggaggca tccgtaaaga cagagcttca caat tgccgc ggtggtgagc ggtgatcaac gaccagccca t agcgacgag gatgtaggcg ggcgaccgcc cgcgtgcggt gagaggcagg cgagcgctcc atgagccagg gacaacggcg cccgt tcgga g gcgggccag c ggattcgga c gtcttgatc g gatccgata ctccaccgga ggcttcacg4 :ggagcggta 3 gcgtagatcc.
:gcgccgttgc a ggcgacaggi i actcccccgc accgacacci agcttgtaac gcggaggggc Lttgacgccgc Igcgttctcca tccgcttcgc agacccttgc agcagatctt aactcctcca tcttctgcga gcaagcgact cgctcctcga aagt tctgcc gtctccacca ggaagcgcgt aacgccgagg tgcagatcgc acgaccgtaa ggttcgatgc gcgccgtcgg cac tcgcgga gctcctccga aggatgtata cccagtcttt ttccacagct ccgccgcaga caaccgctcg cgctcgccgt ggcagccaga tcggcggctg tggcggatct agatcgtcgg atcatcgcca tgcacttcat tacgcgctgc atcgccagag cggttggccc acggcgtgcg cc-gcatcgc c cgagcagccg c gctggaaatt a gcccggccac t acacatcgcc t cgccgagcgg :ccgccatctc a accccatgcg j tgtcgccatc ccggcaccgt i tcatgcccgg 4 ccaaggcaat igcgaggtgga j agacacgcgt ,gtgagaaatc :atcccgcgaa Ltcgcttccca -gcggactgta Iccggcacgta *ccggcgaaag cattgcgcgc *ccgcagccgc cgatcgtcgt gcagtccgcg gatagtcgac catcgccgcg tcgcgtcgat ggatcatgta gcacgtcgcc tcgcctgctt gaaacccgcg aacggtagcg atttctcacg tgtacagctct actgcgtgtt gtgtcttgtg c ctacgccact z aaacacggaa q gggtcgcctg c caaccacgat c tctccagtcc g tcagacccgg c ccacgccctt c gcttgctgcc g tcatcgccat c cgagaatccg g cggacttcag a cgatgatctt t gcgcattcag c gagtgcgttc c gccgtgtcgc a cgtactcaca caggc tggga accggccgcg gcttccgttt cgtggctgtt t tgagtgcgc ctcggcgcca ggccagagcg gccttgagcg ctggtgcagg gt ccgaggcg gcacgccgtc ggccaggaaa gaacggctcc aacgccgatc cgcgcactcg gccgaagaac gctcgggtcc cctggcgatg ccccgggaac aggttccgca atgccggaac cacccgtgtg gtgctccacg catcggctcg Lttgccgttg atcgggaagg cgttgccgga cacggcaatg cagcttgatc igccagatcg :gtgagctcg :cccggaact ;tcgatggga :acggccgac ttcgagaatg ~cctttaccc S :atgaccgta :tcgcgggtg s rgcaatatcg a :tcgcgctgc a 'ggtcgcccc g ctgacgaga t atctcagcc a gcgagacga t atcgcaagc a cccgggccc g tcggcgtag g gcctgagct t ttgtgctgc t cgaaacggag ggaatctggc acgtctagat ttcggaaagt ccgtgggcct cgaatcaatc tcattat Laa cggc tcaagc gactcatcga taccccacgg ccgcgctgca tgaaccgtca tccttgtcgt gcggtggcga gaacccttat agaaacacgc gcggcat cga tggaagacct gaatccacac cggccggcca gcggcgggt t agatcgacga agcattagcg cgcagaact t cgagcagcaa ggcgtcagcg cgcgttgcca acgagatagg tgcgtctcga tggcgatcga ccggtagagt 
4ggttgcgat ccggggggaa tggccgatcg :atatggtgg :ggcgcgcca ;gcttccagcc ~cgcccgactt rcgacgatga c Latgacacgg t Ltggagcgca g ~tcgatccgg a tcgcagctt c *ccgcctcct g CCttggggt g gcgcaatca c cgccggatg c tcaactgat g cgaccagtt c cgagcatcc 9 tgtttgcgaa 25140 cgcgatggac 25200 gaccgatgtt 25260 tggcagcgat 25320 cgatgtagcc 25380 gcgtctgacc 25440 tagccgctcc 25500 gcttgaggac 25560 aggcgcggca 25620 actgcggaac 25680 agctctcgca 25740 ggctgggccc 25800 tggccgtcag 25860 ggttgttcag 25920 agcttcgcgc 25980 gatgctgcgg 26040 aaaactcgat 26100 ccgggctgat 26160 cgtcgcgcag 26220 tcccgataac 26280 cggatttttc 26340 cggagagcgt 26400 agtgcccgcc 26460 cacgccagat 26520 ccggcgttgt 26580 gaagggagga 26640 gccacgaccg 26700 cgatcagccg 26760 tcgcggcctc 26820 ctcgtcccag 26880 aaatgcgtcc 26940 gatagccccg 27000 ccggctccat 27060 3cacgatgct 27120 tctccgtcgg 27180 Ittccggcgg 27240 :cgaatccag 27300 :atccagcag 27360 :gcgggcgcc 27420 :ggtgacggc 27480 ~cagattgac 27540 iggtatatat 27600 tggttcgac 27660 rcgtgaggac 27-720 rcgcgggatc 27780 'catctccag 27840 gctcagacg 27900 gtcttcgaa 27960 atgcgcaca 28020 gcttcggac 28080 cgcgggggac cagcagggt gttgtattcg gagtttggat gatggccggc gtcccgcgga ggcgctcaaa cagatcgccq ttgtgtacgg cagcagcacc aaacgcgtgg ccaggtctgc cgacagctgt ggcccagtcg cgcttcgtac gatgtggctc ccgcagaagc ccgcgcttct gaggctctgg ctgcaccacg cagcgctatg ccgctgctgg ttcggggctg tatggtattt ctcatcatac gctcgccttg gtgggggcgt gcgatcatct ccgatgccat gtttccggag gcgttacggt gtcgggatca gtgcgcgtat z ggacgggaagc ggcccatgact agatcggcgg 9 agtgtctctg g cgtgcgttcg g cgtatttcgg c gatccgatat 9 cgacgcggca g aacgcgggat c aactcattcc g tccagcagcg g cgcaccgcct c gagttcagaa c tgccggaacg t agcgccttct g agcagcagtc c cgatcgacgg a aacagcgcag cggtacgcat gcggacgcca ccgccgttgt gcatcgggca ggatgcagt t gcggtgtcgc cgcaggacga cccgcgactg tgaaacgccg agtccatcgg tgcgcgggcc ttcttccagt ccatactgaa gatcgcgtca aacgtcagca gggggttttc gcgattccgt cgcgcctctc gcctccagac ggaatgt tgt gccagcgaca ttttcctgca ggcgacgaat gctcgatttg cccatcctgg taaacacggt gcgtgaccag ccatgccgag jcgtgaatgc1 
:gatttttcc :gtagtcggg ~gacggctcg :cggcggacc :gctccagtc jatcgacgcc ~agcttcgtc a raatctcggt a cgcggtctg g rcaggatcgc c gatttcaat c gaccacgag t ggtcaacga c gagactgcg g ggccagcac g aaccgcatc g gcggacatt t cgaaaagtc c cgtaccaca g atcctgcac c cggt tgaaat cgagcaggga t cagtc ca tc gcatgtactc gcagcgcgaa cctcaaccag gggtgcgggc tcatgttggc gaaccccgat cgattgccgt tcaaatcacg gggggcga tc actgctgcgc tcggcagttc ggtcgcggac ggagaatctg gcaggtcgaa cagcctgaac ca tccacacc tccgcaggag aggcgggaga agggtgcggc gcagacggtc tctgcattac aacatzctgac gtctacgccg tcgaaagggc cagcgaatgt cacctcgccc ttcggcgtgg 4ttgggcgtc :agcttctcc agcggatcg ;gccgccgca ~attgccacg ;gaaggcgac ccgttcatc aatgcggcc z rcaacgcctg c :tggtagcgg a :cggccgatc t tcctcttcc t gcgggtgcg c acatcgccg a cgccgcaga t cacgagcga c gccaggccg g gtggcccag t ccaatctcg a cactcccgc a gcggacgtcg ggcgatggtt gccggatggc gcgcgagatg ggagacctga gcgttcgaaa gagaagactg gaaacaaccg aaggatgtct catgaacacc gccgagggct ggtaggaagg ggtttggttc catgagcggc gaacgtctcg ctgcttgtca cgggatctgg aaccggaagt gccaatgcag gacgcgaata atccgggtcg atcccggttt gagcaattgg aacccgctgt atttgggaaa tcctgaatag acttccacgt cctccagagt caaatgtggg gctcgccggc agcggcattt ttcagatgag :tcgtatacg :ggcagaaggt :ggtacggca S Itctggcgca S :aggtcacaa t ~actccggct g ~gatccgatt c ~agcgggtca g :ccggaatct g ~gcgacgcga g :gctgaactt c icaaatacga t .acgcttcgc c tgtcgatct c 'tgcggtccg c actccgaac a gcacgcgac g tctcggcag c ggatccgtca gccgcatcga tcgagggtca gtgagcccag aatacaggcg ggaaagtcct cgaaacgacg acgagacct t tcctgcgcgg gtcgctcct t gtggt ctc ca tcgaggaaag agcgacgtct gatggccgcc accgaczcacg tcgaggcaga gcatcacgca tccagtgtca c tgcgcaggc tccagcggac agctgatgga tcacgccgcg cggcgggcga gttcctagtc cagcgatcag cgacggcgaa ggagcatgtc cgaagaag tg :gagtacctg :gggctcggg gtggagcac :acgcaactg ;gccggccag :cgcgtcgaa ;gtcttcgtc lccggtcccg ggcgctttc ragcgtccgt cggctcctc Ctcgttatgc ttcgcggag 2 cgaacgcac taaagaagc gccgcccgg t ggggaagta c gcacgegtc g gtgagcggc g gtggggagc g cggccgcga g tggaatcgg c ccacgctcgc 28140 acaaatcggt~ 28200 cgccgaggtc 28260 
gCatgacggt 28320 accggctcag 28380 gatgagagag 28440 gatcgtcgcg 28500 ccgtttctcg 28560 tatagcgatg 28620 cacgcaaggc 28680 cggcgccccg 28740 gcaaggtgcc 28800 gctgatggac 28860 cctgcacgaa 28920 catccgcgat 28980 tcagcttggt 29040 aggccatttg 29100 ctcgcgccag 29160 tctcgtgccg 29220 ctcggatatg 29280 gaaaccaaag 29340 ggatgcgatg 29400 gcgagaggtc 29460 ttgggcggcg 29520 caaatcggcg 29580 gcctcgaacc 29640 gcgcacgcgg 29700 atcatggacg 29760 ttccaccgga 29820 atcgggcagg 29880 gacccacgcg 29940 cggcacaacc 30000 gcggcgtcgc 30060 gcgtccgtgt 30120 catacgccat 30180 caactccccg 30240 4gcggtcaac 30300 7agtactctg 30360 :gcttcccgc 30420 ~gaccggccg 30480 igcaaagaag 30540 ;cgttgccga 30600 ~taaaacgctc 30660 :ttgaccaca 30720 :tggataacg 30780 rtcggCCgcc 30840 iatgtagtcc 30900 racgcggaag 30960 rgccaggatg 31020 tctccggta 31080 acactgcttc tccagccgac tatgtggtgt cccagacgga aatgtgtcgg cggctgccgt agatacttac cggccgcgtc tgactgcgga ccgcggcctc tggtggtcga tccggccgag cccgtgcgat acagccgagt ctgagttccg gacgattcag cccgcaacac cgatgggtgc gcgatggggc ggccaatcga caaattgtcg tttcggtagg cggtcggcaa gatcgcgagg ccgggccagc cggatgcgag gctttgctg, gccaccacg, aaca tgaac.
atcgacgaai gtcgaaccg ttggtgtcgc gtctcttcci gccgcgccgc tggatgccaz t tggcgctcc gcgtcgggcc tccgcggcgs ggtagatcga tgcaaggaga gcggagaggc agcggcatcg cccaacatga cgcagtccgg gcgccgagaa agttctcctc tgcgtccgac agaagcact agcgccttca aaactctgca gaaagctgcg ttctgccagc gctgtcctcg t cggcgagga agacgcgagc agcaccgcc t tggaccgttt cgatccacca cgcgcgatcc agcgcctgct ggttccggcg agttccgcaa gtatcgatcc C gcgcgagttc a cgcgcgcgcc g tcgtgagggc a gaaaattgac 3 aggtgtacag gctgcgtcca i gcatcagccg 4 ccttcaggac iccatttcgcc cgttgagttc tcaactccac Itcgcattcca tcaactggaca Lgacgttcaat ctccatcggt ccagggactc acggcacgga gacccaaccg gcgtttgccg gcaggattac cgttggtgag gaaacgccgc gtttcgcagt ccggcggact aggaccagta ggacgaaagc cgaggtagat z tgtgatgcac c ggaaacagtt t catcgtccgg c gaacgggtgc g ggtcctgcag c acgcgatgtt g ggccggcgct 9 ccaccgtcgt g cgctggggcc g ggttgcgtaa t gatgttgaac ttgccactgc cgcgcga tgc attttctctg gatctcgccg atactcgagc gccatgaggg gt atccacgg gggctgcatc aaccggtccg tccgtaaaga aagggcttca cagcagacgc cctcgccagc ggcgctcaag gagcagcgta gacggcctgg gacataggcg tacttccgat aagacgt tyc gcccaacagc gggtccagcg gcgatatgtc ctgagcttcg ctcttcgagc ggcgggattc cgtggcagca ttcgaccatc cagtgtgagg ttcgggaatg ctccacgatc aatccgggcg gggattcacg ggtgccggtc z catcagcgtc cagcgcgggt t 3ggacgatcc S Ittccaaagc c ;gcgaagctc g ctgcggagt t :acgatggcc a :tcgccgaga t :gaacacggg a Iccgccactc t gaacggcgc a rtatgcggga c 'agagggtag g Cgttcgctg a tctagaaat c tgcaccgcc a tccgga tccg gtcacgtgct ccgtcagcaa gcggtgacca gtttcgatgc gcgccgtcgc tcgaacgaat gcgagcccgg cgatcgtcaa tcacctgtcg ttccatagcg ccgccgcaga caggtggcgg ctctcaccat ggcaaaaaga tcgcggtcgc tgtgtgattt agatcggcgg tccgtgacat gccggatatt cc tgcgacgg ccgcgcgagc agattctgct aacagctcgt agctggatgc tccgcgattc tcaaaaatgt atcagatcca :tgccccagg cgatgaggct :gggcatacg S agcaccgtat t aagtatccga c ~ggatctctt c jcatgcagcg t :cgattcgga a ~acgggagat t :gctcgccct c rcgaccggcg g catccacca t *gaacctgat c tgaagggcg c tcacttcga a.
ccgcaatcg t4 acgcctgcg g tttccggcg c agagggcag gc ggccgctta g~ ggaccatgg gc tgagcgagt cc cgt tcggcgc cggac tcgac gggggacaat cagcatgtcg ggaacccgcg gt tggcggcg tggcgacgaa cgccgccga t gcacatagag tcacccgt tg ccgcggttcg gcgccgtcag gagttgcctg cgacggccgt tctcgagcgc tga tgcccgg gcacgccttt gagtcgcgag tcagcacaac ccggt tccag tttcgagcga ggagatagtg :accgaagct 3cacgcattg 3t tcccgggt :ttccagcag :cgtgttgta Igtcaaaccg ~catgcgacc ;ctggagcac ~gtactcctg :gaaatccgg :gagatcggc :ttgtgaggt c cgcgttctg c cgagtgaga S cagaaccgg a ggttccggc c cgcgacagg c caccagcag t cttccccga c gtggaagac g atccaccgg g cgttcgcag c 3tcgaaagc g 3cttcggta a iaccgaggc c itcgcttag a :agcaagac gi :aatcccat a attctgttca 31140 tcggtcgtgg 31200 gtaggccgcc 31260 gaccgccggg 31320 tatcttcacc 31380 ggcgagatct 31440 cttgtccgcg 31500 gtacagttcg 31560 ctgagtgttt 31620 gatggcggac 31680 ttgcaggagc 31740 gcggcggtcg 31800 catcattgtc 31860 ctcctggttc 31920 ggaaatgtcg 31980 ctcatgccgc 32040 cggccggccg 32100 cgggttcgga 32160 gaccggcctg 32220 cggaacgtag 32280 ccgcgtgaca 32340 cgcgatccgg 32400 caacgcgatg 32460 ggacgggaat 32520 tgtcagcagc 32580 cacggcgaag 32640 ttgcagaaag 32700 gCtctgtcgc 32760 3ccggactga 32820 3aatagaacc 32880 3tgctcgatc 32940 Itcgcctgaa 33000 ;aattccggt 33060 ~caacgggag 33120 :cgccgcgcg 33180 Itttccccgg 33240 Lagctggccg 33300 :aacagttcg 33360 ggaccgcca 33420 gaccagaag 33480 tgcaccagg 33540 ccgtcgatc 33600 acgctgctg 33660 gccggatga 33720 cctctcgcg 33780 ataaaccaa 33840 tgcgccgca 33900 tCcctggcc 33960 cgcagatcc 34020 :gcaccagc 34080 ggctgctgta agtccaccgc cgattccggg tttttttttt acagt taaat catcctcggc gggcctcttg ccgcccgtgg ccgcgagcac ccacccaagt aaaca ccagg ctatctgcgt ggtctggatg gtgtccgacg gaagc tgtgc gagcgcctgg cgagaagacc gtcctcaaag tttttttttt tgctaacgca accgtcaccc cgggatgatc gaaggtgagt ccatggactg tgct tcggga tgaggcagac gactatcgcg acgggccgga agccgggaga ctcagctccc cggaaggcgc aggccgtgt t tgctgccaga agacttccgg gttttcgtgt aggctgcgg, ccgcagctc( cgtgccctc( ccggccagg( acgcgccatc cgtgctcaga acgcagcttc ctcgacggtc ggcgacggtc 
tttgagcttc: accgctcgga 9 tggaggccq aaatagagtq tcactcccgg aagaaatccc tccagagatc tttaccgacg aacgtggaga gcgtaactac cgggaacaga ccgcgcagt t atatctgtat atccgccgtc cgttccagat tgattggcaa aaggggacc t ggtcatccac tact tcgcca t tgtgtaggg ggctgtgagc ggcgtcggct tcacgtagtg t ggaagcccq =ggctcaacgt =tcggtgcgcg 3 gcagcattca :tggccctggg I gcccgcttga igtctgagcct :cgggcaccca Icgctggtcac itaccccgtgg Ittcttagcct tcgaagccga ccgtcacccc ttcctctgcc ggcgcggtgc ccagcatcac gcgtagcagc cgctattgcc taaaccgaca aaatggcctg taatttcctg cacgcaccgt ccgccaaatc caatcccccac ttgataaatcc gctagctttat ctagggtccc c cggatcagct t actattgcga t aattatgtgc t: tgaacgaatt g gacaaattct t gccgacccac c tttttttttt gtcaggcacc tggatgctgt cctgtcagtc gtctaactgc gcgccgcagg cgaagaggtc cgtaacgtta ccggaccccg tctgcttccg tctcctccag tcggtgtcca gccggacgtt cctcccactt ggacctccac ccccaccgcg ccggacttcc caccggtcag gcaccgcctc gcacagagca ccgcccggtt tggcggccag aacccggtat ggccagcgag gtgcgatcgc ccgtcttggt ggttcgccgt ggctacgtcg ctctttagcat gttgtcatcg z gcccgctttt t: 3ttaatgata t: Itaataaggt t Ltattgatgc t: cgcatcgac g 'ttcgtcggt g Ctttcgagc a agaacggtt t: acgtcatcc a accacccca c gcttgtaaa c aattaatta g agtaaagcc c~ aacaagaaa a~ cgcttaaaa at :agtgcatc t~ :tagacatt at ccaactgat ct ctggaggag at cactgcagtt tttttttttt gtgtatgaaa aggcataggc atgcgggcaa gtgacaacgc tgagaagcac gtgggttcaa cggtctgcct cttgaagatg gtagacctcc cgggacgccg ctcgtcggcg agtcgcgtcg gtcaccggcg gcaacgcgcc ccggaccgag cttgaggaag cagggcgacg ggcctgggcg cagctccacc caggcaggag ccagcggtcg gacgtgcttc accgtaggca 3agaccgtct ttgccactc jgcgatccag :ttggccatg :tcagtgtat c Lataaattga S ~ggtaacgat E ~ccctgctgt c cat tgagcg c Cgctgcgaa g 'gataatccg t: caaactctt c cgacattaa a tgatcccca t: Cggggagag c Ccgtctgcc g cgttttgtg a taatataat c tcgctagat t' agccagcct ti :aataaaag c aacgcttga gI :ttgccgac t: :gcgcggat c~ :gatcgagg at tttttttttt ttttaatgcg tc taacaa tg ttggttatgc cttagccgag cagcgcacag actggcccaa atcccgccac cacctgtttt ccgtccatga tcagtcacgg cggt eggaca ttgatcccgt agcggc ttgc gcgagccgt t ggcagggcga 
cgccagacgg acgtggtccc acgatgtagg aaggtgagcg acgttgcgct1 tggaccgcct acgtcctctg tggcttaggt agccagtcgtt :cgtggtcac S :Cccgcttga c :gctgtctct t ;atcgatccc c ~cattaccgt: c ~ctgcgcgac t: ~gcccgcttg c :ctgagtggg c :aaatggtga c tttttccgc c gatgcgttc a cagttgcaa a atgtttacgf g cgcgtccag ti agacattaa t: ctgccattt tt tattaaagg tc ttaatgcgg at tcatgatat at agacttgac ct ttaagccgc gc i~Ccttggtg at 3atccttgc cq :gccagggc ct tttttttttt 34140 gtagtttatc 34200 cgctcatcgt 34260 cggtactgcc 34320 ccctacgaca 34380 cggcggacaa 34440 ggtcgagcgc 34500 cccgacagag 34560 ctgtgcgtgt 34620 ccacagcgcc 34680 ccgtaccgga 34740 gcagggacac 34800 tggcatcctt 34860 caacggccga 34920 cccagccctg 34980 gcgttcgccg 35040 cgatgtgcgg 35100 aggtcagcgc 35160 cgtgcatcga 35220 ccttggacgg 35280 :caccttgec 35340 4aaggctgcg 35400 :gctgaggtc 35460 ;ggtgcagtt 35520 :caccgcgtc 35580 ;cagaacctc 35640 :gatccgctt 35700 :ctcgtccca 35760 :tgcccggca 35820 atcaattgc 35880 ccctgactg 35940 cagatggca 36000 gtcagcagt 36060 gtcttaata 36120 gcccgggca 36180 ccaaacgag 36240 cagggttcg 36300 ataaaccgg 36360 tttttgctg 36420 ttacgctgg 36480 :ttgccgca 36540 iaaataaaa 36600 ;attcaaaa 36660 :gttgcgat 36720 :ctcccaat 3G780 :gatagttt 36840 :cgcgaagc 36900 ctcgcctt 36960 ragctggga 37020 .ttcacgcc 37080 cgccgctgct gagcgtccgc cgccgggccc cgcagcagcg ggcttcggcg cgggcccggg ggcggccgcc gggggccggg ggcggcgccg gtgtccggcg gcccccagag gaactgcgcc tccagcgggg tctcctcgaa cacctcgaag cgggcgaagt nctcggtccg cttccactgc cggtcgcccc ggaaggcgtt gagatgcagt tagacgtcgg tgaagtcgac gatcccggtg ttggtcccgt gcaggtcgcc gtggacgaac acgtccggca gccagtcctc caggcggtcc cggtggtcct cgacggtcgc cgcgcggcgt tggggggtga gcacggtgtt cccggtcagc ccgagttcgc gggccagggc gagcagcgcg cgccaggtgg tgccggtcat ccggctcatc ccgggccgca gctcgccgcg gccgaggagg accgcgtacg cctccgactc cgacgcgagg agcttgatca ccgggccggg ctcgccgacc cgcagcaccg gcggcaccgg cagcccgagc cagaattcct ggtcgttccg caggctcgcg agtaacaggg attcttgtgt cacagcggac tccgcacggc cggtcgcgac acggcctgc ggctggtcgg ccacgtcggg gacgttctcg acgggccgac cgaggcgacg gtgtacgcca 
tcctgtgtga aattgttatc cgctcacaat gtgtaaagcc tggggtgcct aatgagtgag gcaactgcgg gtcaaggatc tggatttcga gggctccaag gatcgggcct tgatgttacc ggggaattga tccggtggat gaccttttga attggggacc ctagaggtcc ccttttttat agcataaagc tatcgtccat tccgacagca tatatgcgtt gatgcaattt ctatgcgcac gccgccgccc agtcctgctc gcttcgctac cgaccacacc cgtcctgtgg atctgcctcgc cgcagctttt cgttctcaat ttcagcatcc c gagtcataaa gcacctcatt acccttgcca c tgccagttct gaatggtacg gatactcgca c acttccattg ttcattccac ggacaaaaac a tcgctttcag cacctgtcgt ttcctttctt t gttatgacga agaagaacgg aaacgcctta a ccgcgaggtc gccgccccgt aacaaggcgg a attatcaatt gcatactatc gacggcactg c catcatgatg gccgtgcgga cataggaagc c catttgcttt gtgacatcca gcgccgcaca t caacgtttca atgttggtat caacaccagg t taccttgttc tgcgctggct catcacgctg g caccggctga ggtgtttcga ttgccgctgc g cttgatgaat gacactccat tgcgaataag t gtccagctcg tcgattgcct tttgtgcagc a cgaagcgcaa tattgctgct caccaaaacg c ctgcccttct gatgtcagaa aggtaaagtg a gcaccgccgt gctcccgggc ggcggcccgg agttcctccg tcgtgcagga gccccgtcga tgcaccaggc acctcggtcg cggggt tcgc agcagccggg tcccgcagca ggcaccctgt ttccggtcgg accaggtagg cggggcaccg Ctctccggac agtacggggt tcctccaggg taggaatcat ctctattcac cgcaccgcgg gtggtgctgc cagcttggcg tccacacaac ctaactcacaI tcacggcacg cgagagcttg atgaccttta tttaaaaattt :cgccagtca c :cgttctcgg z :tggagccac t ~tggcctgcc S ~tttcggcat a :cgcctcgca g :cgaaaatgt c igagaaagga a tcagagggt a .accggaaaa t tcgccggaa a tgccagata a agttcatcc a tcagcagcg t ttaactttg a~ ataccaagg c tggatagca ci tcgaaggag a gaggtatca ai gtattgacc ac ttttctttc tc cggccggccc cgccgggcgg ggcgtcaggc gatcggtgaa aggtgaaggc gcagcgcggc tgtagcggga cggccaggtc ggccggccag gcgagaggta gttccgggaa gcagccggcc tcgtgccgtc gccacggcca gcaccggggc cgc ac cag tg tgg tgcCc Cc Ctcggcgggc ccgaatcaat agggtacggg atcaggcgt C ~gcgggatc taatcatggt atacgagccg ttacggatca atcatcgtgc jcac ccagcc itagattata1 :tttcacaaa tCatggcgtg Lgcactgtcc :atcgactac ~cagttcttc LccatCCtat raacgggcatt 'agccagctg c acgacagag S ttttaaata a tttcataaa t ggacccgca a caccaccgg g tcgcttc t ttttcagcg 
c acttatcgg c tgatgCtgt a catttgcga t :ggtgtcac g :ctcaacgc c 3gtgttcaa c 3gtattcag t gctccgggcC 37140 ggctccgccc 37200 gccgggggcg 37260 gccggagaga 37320 gagcagttcg 37380 caggatctcg 37440 gtctcccgca 37500 cacgaagatg 37560 cagcgtgtcc 37620 gccccacccg 37680 gacctcggaa 37740 gagcacccgg 37800 catcgcggac 37860 ggctccggtg 37920 gtccgccagg 37980 ctcgccgaac 38040 gccgggcacc 38100 cagcggctcc 38160 acggtcgaga 38220 ccggcttaat 38280 gacgatgacg 38340 gccaatctct 38400 catagctgtt 38460 gaagcataaa 38520 gtgagggttt 38580 gggagggcaa 38640 tgcgcgagca 38700 ttactaatta 38760 acggtttaca 38820 ctgctagcgc 38880 3accgctttg 38940 3cgatcatgg 39000 lacctcccgg 39060 ~acggcggca 39120 :ccctgttcc 39180 :tttttgttg 39240 rccaaaaagc 39300 Iaaacattaa 39360 ~agcgaaaac 39420 atgataata 39480 gaaacattc 39540 gtctgctgc 39600 gttCtcgat 39660 actgacggt 39720 gatattggC 39780 agcggcgtc 39840 aatgcgctg 39900 aagcgtcat 39960 ggcaaattt 40020 tgctgtgtg 40080 tctggtttca accagacagc caacgtgata caaagccgcc gcatcacaaa aac tcatcaa cgttgaggac tacgcgagcg gaatggtctt gcttcctcgc cactcaaagg tgagcaaaag cataggctcc aacccgacag cctgttccga gcgctttctc ctgggctgtg cgtct tgagt aggat tagca t acggc taca ggaaaaagag tttgtttgca ttttctacgg agattatcaa atctaaagta cc tat ct cag ataactacga ccacgc tcac agaagtggtc agagtaagta gtggtgtcac cgagt tacat gt tgtcagaa tctcttactg tcattctgag aataccgcgc cgaaaactct cccaactgat aggcaaaatg ttcctttttc tttgaatgta ccacctgacg acgaggccct tatagtgagt gcaaaacca aacgcgccac attttatgac gaaagt taac tttcacaaat tgtatcttat ccggc taggc aacgtgaagc cggtttccgt tcactgactc cggtaatacc gccagcaaaa gcccccctga gactataaac ccctgccgct atagctcacq tgcacgaacc ccaacccggt gagcgaggta ctagaaggac t tggtagctc agcagcagat ggt ctgacgc aaaggatctt tatatgagta cgatctgtct C acgggaggg cggctccaga ctgcaacttt gttcgccagt gctcgtcgtt gatcccccat gtaagttggc tcatgccatc aatagtgtat cacatagcag caaggatct C cttcagcatc ccgcaaaaaa aatattattg tttagaaaaa tctaagaaac ttcgtcttca cgtattatgc igctcgcgcaa :ggaaaaacac :ccatgattta :aaagcatttt :catgtctgga :tggcggggtt :gactgctgct 
:gtttcgtaaa :gctgcgctcg Igttatccaca iggccaggaac Lcgagcatcac Iataccaggcg taccggatac fctgtaggtat *ccccgttcag aagacacgac tgtaggcggt agtatttggt ttgatccggc tacgcgcaga tcagtggaac cacctagatc aact tggtct atttcgttca cttaccatct tttatcagca atccgcctcc taatagtttg tggtatggct gttgtgcaaa cgcagtgt ta cgtaagatgc gcggcgaccg aactttaaaa accgctgttg ttttactttc gggaataagg aagcatttat taaacaaata cat tat tatc agaa ttcgcg ggccgcgaat LttCggCtgtg Icgcatacaga Ltttcctttta cagcttataa tttcactgca tctgacgggt gcct Cactgg gcaaaacgtc gtctggaaac gtcgttcggc gaatcagggg cgt aaaaagg aaaaatcgac tttccccctg ctgtccgcct ctcagttcgg cccgaccgct ttatcgccac gctacagagt atctgcgctc aaacaaacca aaaaaaggat gaaaactcac cttttaaatt gacagttacc tccatagttg ggccccagtg ataaaccagc atccagtcta cgcaacgttg tcat tcagct aaagcggtta tcactcatgg ttttctgtga agttgctctt gtgctcatca agatccagtt accagcgttt gcgacacgga cagggttatt ggggttccgc atgacattaa gccgcaatta ccagatttag acatccgtcg gacgtgagcc tggt tacaaa ttctagttgt gcgcatgatc t tagcagaat tgcgacctga gcggaagtca tgcggcgagc ataacgcagg ccgcgttgct gctcaagtca gaagctccct ttctcccttc tgtaggtcgt gcgccttatc tggcagcagc tcttgaagtg tgctgaagcc ccgctggtag ctcaagaaga gttaagggat aaaaatgaag aatgcttaat cctgac tccc ctgcaatgat cagccggaag ttaattgttg ttgccattgc ccggttCccca gctcct tcgg C tatggcagc ctggtgagta gcccggcgtc ttggaaaacg cgatgtaacc ctgggtgagc aatgttgaat gtctcatgag gcacatttcc cctataaaaa accctcacta aaggcagatc 40140 ccgcgccgga 40200 tgtcgcacag 40260 taaagcaata 40320 ggtttgtcca 40380 gtgctcctgt 40440 gaatcaccga 40500 gcaacaacat 40560 gcgctcttcc 40620 ggtatcagct 40680 aaagaacatg 40740 ggcgtttttc 40800 gaggtggcga 40860 cgtgcgctct 40920 gggaagcgtg 40980 tcgctccaag 41040 cggtaactat 41100 cactggtaac 41160 gtggcctaac 41220 agttaccttc 41280 cggtggtttt 41340 tcctttgatc 41400 tttggtcatg 41460 ttttaaatca 41520 cagtgaggca 41580 cgtcgtgtag 41640 accgcgagac 41700 ggccgagcgc 41760 ccgggaagct 41820 tgcaggcatc 41880 acgatcaagg 41940 tcctccgatc 42000 actgcataat 42060 ctcaaccaag 42120 aacacgggat 42180 ttcttcgggg 42240 cactcgtgca 42300 
aaaaacagga 42360 actcatactc 42420 Cggatacata 42480 ccgaaaagtg 42540 :aggcgtatc 42600 iagggatccc 42660 catcgat 42717 tctcatgttt gaccgcttat
<210> 114
<211> 34071
<212> DNA
<213> Artificial sequence
<220>
<223> Description of the artificial sequence: insert of DNA of the cosmid a26g1 coding strand
<400> 114
actgcagtgc ctcgctcatc catggtccgz aagcggcctc tgccctctcc gccggaaagt gcaggcgt tc gattgcggac cgaagtgatc gcccttcaat tcaggttctg_ ggtggatgaa gccggtcgcg gggcgagcgg tctcccgtcg ccgaatcgaa gctgcatgcg agagatcctg cggatacttc t acggtgc tc gtatgcccgg gcctcatcgc c tggggcagc ggatctgatg catttttgat aatcgcggag catccagctg cgagctgttc gcagaatctg ctcgcgcggc cgt cgcaggg atatccggcg tgt cacggaa cgccgatctc aatcacacac cgaccgcgat ctttttgccc tggtgagagg cgccacctgg ctgcggcggt gc tatggaa t gacaggtgac tgacgatcgg cgggctcgcc :ccggaatcgS jgcggtgcaat ttttctagacS -agcgaacgca taccctctca cccgcataca Icgccgttcgc ragtggcggcg ccgtgttcgc *ctcggcgaaa Igccatcgtgg *ctccgcagta agcttcgccg ctttggaact *gatcgtccca cccgcgctga acgctgatgg accggcaccc gtgaatcccg gcccggat tc atcgtggagc attcccgaat ctcacactgg a tggtcgaaa gc tgccacga aatcccgcct ctcgaagagt gaagctcagg acatatcgcg gctggacccg ctgttgggcg caacgtcttc tcggaagtat gcctatgtcc caggccgtcg acgctgctcg t tgagcgccg ctggcgaggg cgtctgctgc gaagcccttc ctttacggac ggaccggttt atgcagcccg cgtggatacc cggtggactt tacgcaaccg gccccagcgt cgacggtggc *gcgccggcca *acatcgcgtg tgcaggacct cacccgttca cggacgatga actgtttccg tgcatcacat tctacctcgc Ctttcgtccg actggtcctc gtccgccggt ctgcgaaact cggcgtttca tcaccaacgg taatcctgcg ggcaaacgct ggttgggtcc ccgtgccgtt agt ccc tggc ccgatggagg ttgaacgtct gtccagt tgt ggaatgcgac tggagt tgac aactcaacgg gcgaaatgg t tcctgaaggc ggc tgatgc t ggacgcagcc tgtacacctc tcaattttct ccctcacgac gcgcgcgcgt aactcgcgcg tcgcatccgg ctcgcgatct ctaccgaaac cgattggccg cacccatcg tgaatcgtcc acagcagccg gatcgatacg tgcggaactg gccggaacc t gcaggcgct t gatcgcgcgc gg tgga tcg t aacggtccac ggcggtgctg ctcgcgtctc cctcgccgac gaggacagc t ctggcagaac gcagctttcc gcagagtttc gaaggcgc tc agtgcttctc tcggacgcaa 
aggagaact t tctcggcgcg cggactgcgg catgt tgggt gatgccgc tg cctctccgcc ctccttgcac cgatctaccg cgccgcggaa gcccgacgcc gagcgccaac tggcatccat cggcgcggcc ggaagagacc cgacaccaat cggttcgacc ttcgtcgatg gttcatgttc cgtggtggcg cagcaaagcg ctggcccggc tgccgaccgg gacaatttggt ccccatcgca z tgttgcgggc ggaactcagc ctggtgcgta gatctgcgcg gccagggatc gcggcgcagg tgg tt tat tt gcgagaggcg catccggcgc agcagcgtcc at cgacggcg ctggtgcagt ttctggtcac ggcggtccgc gaactgttgg ggccagcttc cggggaaact gcgcggcggc tcccgttgga ccggaat tcg tcaggcgatc atcgagcacc gttctattcg cagtccggcg cgacagagcc tttctgcaat ttcgccgtgc ctgctgacaa ttcccgtccc atcgcgttga cggat cgcgc gtcacgcggt tacgttccgc aggccggtcg ccgaacccgc ggccggccga cggcatgagc 3acatttccg aaccaggaga icaatgatgc 3accgccgcc :tcctgcaac :ccgccatcc ~acactcagc ;aactgtaca ~cggacaagt tgggattgga tcttgctgcc 120 taagcgatct 180 cctcggttcc 240 accgaagcgc 300 ctttcgatcc 360 tgcgaacgac 420 cggtggattt 480 tcttccacgc 540 cggggaagga 600 tgctggtgat 660 ctgtcgcgcc 720 ccggaaccga 780 cggttctgaa 840 ctcactcgtt 900 agaacgcgac 960 cctcacaaga 1020 ccgatctcgt 1080 cggatttcaa 1140 aggagtaccc 1200 tgctccagca 1260 gtcgcatggc 1320 ggtttgacct 1380 acaacacgga 1440 tgctggaagg 1500 cccgggaacg 1560 aatgcgtgca 1620 gcttcggtga 1680 actatctccg 1740 cgctcgaaac 1800 tggaaccgga 1860 ttgtgctgaa 1920 tcgcgactcc 1980 aaggcgtgca 2040 cgggcatcag 2100 cgctcgagat 2160 cggccgtcga 2220 aggcaactcc 2280 tgacggcgct 2340 gaaccgcggc 2400 aacgggtgac 2460 tctatgtgct 2520 .cggcggcgc 2580 tcgtcgccaa 2640 ttcgttcgai cgacggcgci' catcgaaaci ggtc accgcc tgctgacggc gcacgtgacc ggatccgga( tgccgagat< tCgcgtgctc t tcggagta( gga ccgcac( gatcgacagt cgaagcgtat cgtatttgtc agttcagcgc cgcgtcgcac gattccggac gacccgcttc atcggatcgc gccggagttc tgtgacctgS gcgccagacg gccgtaccgc cttctgccgt gtatacgaac tcatctgaag aatgccgctg ccggcgagcc cca cat ttgg ctctggagga cgtggaagtg tattcaggac t tcccaaatg cagcgggttg caattgctcg aaccgggatg gacccggatt attcgcgtcc tgcattggcg cttccggttg cagatcccgt gacaagcagc gagacgt tcg ccgctcatgg aaccaaaccg cttcctaccg gccctcggcc 
gtgttcatga gacatcctta ggttgtttcg ccctcatggca g ctcgagtatc :ggcgagatcg :agagaaaatg g catcgcgcga 3 cagtggcaat g ttcaacatcg 3 cgggagtggg :gagattggct :tgggccacgg :ggcctggcaa :cgctcgtgcg -ctgcggcgcg ,ggcgatgtcc -gcacccgcgt Igaagaggaac atcggccgga cgctaccagg aggcgttgcc igccgcattta fatgaacggtg tcgccttccg gtggcaatcg gcggcggccg gatccgctgc gagaagctgc acgcccaacg cacgccgaag ggcgaggtgc cattcgctgc ccctttcgaa ggcgtagacc tcagatgttc taatgcagaa accgtctgct ccgcaccctt ctcccgccta tcctgcggag gtgtggatgg ttcaggctga c tcgacctgcg z agattctcct S tccgcgacct aactgccgat t cgcagcagta c atcgcccccg g gtgatttgac c cggcaatcgc g tcggggttcc a ccaacatgat c ctcggctgta tcggccggat aggccgcggt acgcggccgg cggcagccgc ccgtctggga tcggc tggag tgcaggattc gtggtacggg acttttcgca atgtccgcac atgcggttgt tgctggccga gcagtctccc cgttgacccg tcgtggtcga ttgaaatcct cgatcctgca agaccgcggc ccgagat tcc acgaagctcc gcgtcgatcc ac tggagcag gtccgccggc gagccgtcta ccgactacat gaaaaatcga cattcacgcc t cggca tgga tggtcacgca ccgtgtttaa caggatgggc aaatcgagcg ttcgtcgcca acaggaaaac gtcgctggcc :aacattccc :ctggaggcc igaggcgcgcc :ggaatcgca ~aaacccccg c ;ctgacgttg a ~acgcgatcg t ~cagtatggc g :tggaagaaa c rcccgcgcag c gatggactc c Igcgtttcag g .gtcgcgggc c gtcctgcgc g tcgcacggga cgaccaccag ccgcagtcac taagtatctg cgacacattc caccacatat aagcagtgt t cgtcgatcgc actgctgctc gaaggcgc tg gttccggcag tctgaactcc ggcggtgcgt gctgctggag gaatgagttt tcccgcgt tc gccgcgtcgc tatcggatcg cgaaatacgc gaacgcacgg agagacactc cgccgatcta tcatgggcca ttcccgtccg tacgcgcacg gatcccgacc ccgtaacgcc tccggaaact tggcat cggc gatgatcgcc cgcccccacg aaggcgagcc tatgatgagc aataccatag igccccgaac agcagcggc itagcgctgc Itggtgcagc :agagcctcc ;aagcgcggc z :ttctgcggac LgCcacatcat ~acgaagcgt t ractgggccg t :agctgtcgg g agacctggc g :acgcgtttg c 'tgctgctgc a gtacacaac 9r gcgatctgc 9 gat.ctcgccc gtgaagatac ccggcggt cc gcggcctaca cacgaccgag gaacagaatg accggagagc atcctggcct ttccgcgtcg gac tacatcg gcggccgacg gttatccagt gtggt caaac acgttttacg cggcaacgcg ttctttgctc ggccggt cgc cgggaagcgg agagtactga ttgaccgccg 
ggggagt tgc tggcgtatgg cacggacgct cgacgccgcc gttgtgccgc gcg tgggtcg ctgcccgatc ccggtggaac gtccatgatc cgcgtgcgcg gttcgaggct1 gccgatttgcI gccgcccaag acctctcgct atcgcatccc :ttggtttct atatccgaggt 3gcacgagag c :ggcgcgagt 9 laatggcctt g ~caagctgat c :cgcggatgc g :cgtgcaggg g :ccatcagca g Icaccttgcc t rgggcgccgt g :cttgcgtga a .tcgctatac c 'agaaacgga a 'cgacgatcc g gccgccaacg 2700 gcgggttccg 2760 gacatgctgt 2820 ttgtccccct 2880 tcgagtccga 2940 cgccgaacgc 3000 cgattccagc 3060 cgcggccgcg 3120 ctccccactg 3180 ccgctcacgc 3240 acgcgtgcga 3300 acttccccgg 3360 cgggcggcat 3420 cttctttaga 3480 tgcgttcgct 3540 tccgcgaaca 3600 ataacgagct 3660 aggagccgga 3720 cggacgctca 3780 aaagcgccat 3840 gggaccggct 3900 acgaagacct 3960 tcgacgcgac 4020 tggccggccc 4080 agttgcgtac 4140 tgctccacga 4200 ccgagcccag 4260 aggtactcgc 4320 acttcttcga 4380 acatgctcca 4440 tcgccgtcgc 4500 tgatcgctgt 4560 ictaggaaca 4620 :gcCcgccgc 4680 3cggcgtgaa 4740 ~catcagctc 4800 :ccgctggat 4860 :ctgcgcagc 4920 jacactggaa 4980 rcgtgatgcc 5040 :tgcctcgat 5100 rtggtcggtc 5160 rcggccatcg 5220 acgtcgctg 5280 ttcctcgac 5340 gagaccaca 5400 ggagcgacg 5460 gcgcaggaa 5520 ggtctcgtc 5580 tcgtttcgc 5640 agtcttctcc ttcgaacgcc caggtctcct atctcgcgcS ccatccggcS accatcgcct gacgtccgca cagcacaatg gctcaggcgg t acgc cgag c ggcccgggaa cttgcgattc cgtctcgccc gctgagatga ctcgtcagcg gggcgaccga cagcgcgagc gatattgccg acccgcgaga acggtcatgc ggtaaaggct attctcgata tcggccgtgc cccatcgaca gt tccgggag gagctcacgc gat ctggc tc atcaagctgc at tgccgtga ccggcaacgg cttcccgatt aacggcaaaa gagccgatgc gtggagcacg acacgggtgc ttccggcata gcggaacc tg ttcccggggg atcgccaggc gacccgagct ttcttcggct gagtgcgcgt atcggcgttt gagccgt tcg t tcc tggcca acggcgtgct gcctcggaca ctgcaccagc gctcaaggca gctc tggccg icccgcacccc tggttgagga tcgcgctgct Iagtacatgca Iatggactgat *ccctgctcga tttcaaccgc cgacacggcq aacgcactcc tgaatgcgcg *agatcatcgc tgaagtccgg ggattctcga tggcgatgat gcagcaagcc agggcgtgga cgggt ctgac gactggagat tcgtggttga aggcgacccc tccgtgtttt gtggcgtaga acaagacaca acacgcagt t agctgtacat 
gtgagaaat t gctaccgttc gcgggt ttcg agcaggcgat gcgacgtgcg acatgatccc tcgacgcgaa gcggcgatgt tcgactatcg gcggactgct cgacgatcga cggctgcggt cgcgcaatgt tttcgccgga acgtgccggc acagt ccgcg gggaagcgat tcgcgggatg atttctcacg cgcgtgtctc ccacctcgct ttgcc Ctggc cgggcatgat cggtgccggg atggcgacac cgacaccgct *actgcatcct *gcccgatgcg caacggcgga ggcgtcCgCC tgcgtaccga tgcgctgttg cgatgccggt *gcacgccgtc ggccaaccgc tctggcgatg cagcgcgtac tgaagtgcaa ggcga tgatg cgacgatctc gatccgccac agccgccgat ctggc tgccg cggcgagcgg gagcggt tgg c tgcggcggt gctgtggaat aagac tgggt atacatcctg cggaggagcg ccgcgagtgg cgacggcgca catcgaaccg tacggtcgtg cgat ctgcag ctcggcgttt cgcgcttccc ggtggagacg gcagaacttc cgaggagcgc gtcgcttgcc cgcagaagat ggaggagt tc agatctgctg caagggtctg cgaagcggag ggagaacgcg cggcgtcaa C cccctccgcg ttacaagctg ggtgtcggtg cgggggagt C cctgtcgccc caacggcgcg gatctacgcc C tgagcgccc ccgcgggacc ccggccatca tccaaactcg gaatacaaca accctgctgg tcccccgcgg ccgaacgggt gccgttgtct ctggctcatc gagcgctcgc ctgcctctcg ccgcacgcgg gcggtcgccg gcctacatca tcgtcgctag gggctggtcg ttgatcaccg ctcaccaccc cggcaattgc gaagctctgc ctttacggac gcctccgata gattcgcgca ggac tggcgc cg tgat cgag gt cgagtgcc gccgagattg aaggacga Cc agcgatttgc gtcagcctgt ggtttgccca attgcgtcca tt Cgatgtcg ctggggttga ggcctggcag cggatCcgcag tggcgcaatc gcgggcggca ctggacggca atcatggacc ggatatgcgg acctacctgc taccagctgc aacctccgcg gtgatggcat gccatcaatg gacgggcgct ggtgtggt cg gtcattcgcg tctctcatca Cgagccggtc ccgtcatgcc acctcggcgt ccgatttgtt cgagcgtggt tccgaagccg gtgcgcatga tcgaagacca gtctgagcgc tggagatggt atcccgcgca tcctcacgca Ccgaaccaga tatatacctc tcaatctgct ccgtcaccac gcgcccgcgt tgctggataa tggattcggg cgccggaact cgacggagac gcatcgtgcc tggagccggt ggggctatca gacgcattta tgggacgagt aggccgcgat ggctgatcgc ggtcgtggct cctcccttcc caacgccggt tctggcgtga gcgggcactc cgcIZctccgt aaaaatccga Ctatcgggat tgcgcgacgg tcagcccgga tcgagttttt cgcagcatcg cgcgaagc Ca tgaacaacct tgacggccaa ggcccagcct gcgagagctt ttccgcagtc gccgcgcctt tcctcaagcg qaqcggctat ggactttcct 5700 gcctgtatCC 5760 tgggctcacc 5820 
gaccctcgag 5880 cgatgcggca 5940 gacggatccc 6000 gatgctcgag 6060 actggtcgaa 6120 tcagttgacc 6180 atccggcgcg 6240 gattgcgctg 6300 ccccaaggat 6360 ggaggcggtg 6420 agctgcgaat 6480 cggatcgacg 6540 gcgctccatg 6600 cgtgtcattc 6660 catcgtcgcc 6720 gtcgggcgcC 6780 ctggaagccg 6840 ggcgcgccgc 6900 caccatatgg 6960 gatcggccat 7020 Cccccccgga 7080 Ccgcaacccc 7140 ctctaccggc 7200 cgatcgccag 7260 cgagacgcac 7320 ctatctcgtt 7380 ggcaacgcgc 7440 gctgacgccc 7500 Cgctgctcgc 7560 agttctgcgc 7620 gctaatgctc 7680 cgtcgatctg 7740 acccgccgct 7800 ggccggccgg 7860 tgtggattcc 7920 ggtcttccag 7980 cgatgccgcg 8040 cgtgtttctc 8100 taagggttcg 8160 cgccaccgcg 8220 cgacaaggat 8280 gacggttcag 8340 gcagcgcggc 8400 cgtggggtac 8460 cgatgagtcc 8520 ~ctgagccgc 8580 taataatgat 8640 ggcgccgagc cgcactcaag acagccacgc aacggaagcg gcggccggtg cccagcc tga gtgag tacgc tcgt tcggga ccggccgcag gcggccctcg tcgttcggcg tgcat tgtgg cgcatcgccc gcatggggt t agatggcggg cgctcggcga gcgatgtgta tggccgggct atttccagcg ggctgctcga tcggcggcac ctgcgcccga aagaactggt atgtcgcatt cccgg tcgag agacgcgccc taccgctcca ggtgtggacg gtcagacgcg gcgggcatgg gccgccagtt tgcgcagcc ctcgacgcga ttgtggaagt gcggcggcgt cgcggccggc cagcgc tgtg gaggtcgtga gagggcatca ccgattctgg ggcttggttt tactggcgag.
gacgaaggct caaaagtgcc gagtgggaga tggcaggagt gagagacgcc ggtct cgtcg tcgacggct t gtgtacttcc acgctggaaa gttcagctcg tccgacggct cccgtgcccc ctgctgcgcc ttcggtgaac gaattggcgg z ggcgccgaac z ctggagggcg c ggcgaggcgc a ttccggcgcg t gtcgagtggc 9 ctgatcctgg c ggcgagatgt g gactggcgcg g actctggccc t gcgcaggcga. c tgctgaatgg ccgccggcgc cctggggtgt gtattgccgg tcatgcagaa ccgcagcgat tttcgggtgc aaacgcagat cggacctgca cgaacctcac atcacgctcg gcgacgtgt t tgcccgacga cgattctcag tcgaccgtcc gccattggat ggtgccggct cgcctctact tggccatggc c acgtgaactt tgctttcaca c cgtggaactt a 4actggtcga t icctcgagat a iggaagcgct 9 iagccggcct g Ltccgatatt c ~cgtatgggg g Lgttgctgga c .cactcgcgc a gcccgaacc g 'cgacagtgg c, 'cgttaccgt g, gatcgtcaa ci ggtgaag~c c cagtgcggt g( cgtgaagccg tccggtggag tatcggatcc gatcaagacg tccgaatccg ctggcccgcc caacgc tcac acgat ccgca gcgctcgtat cacggccaat cgacgaggct caagat tgcg ctacgagtcg acggctcgat gcagcccgcg gacgcccgac cgccgtcagc cctgccggaa cacctcgcgc gccgcaggat gctggccgtt acgccgggcg gggcaaactg aaaccctgtc :ctcgagatc 7gccaagcag t "agtgtggcg a Itattcgcga a Igagcggagt t Itcgctaccg g ;tcagaccac c fctcgaggcg t !gcgcaccct c 'agcgatgac c catgctgcc g~ gaacgccgg c gaactgggg c gccgcgatt g~ caattgctt a~ gctccgctc g~ gccgcgcaa at :cggagggc tc :gggcgcag cc 7tccgccag cc 3gcgcggcc cc :cgccagcc gc ,tgtacagt ct :tgaagtcc 9 :acaatccc at gagtccatcg atcgccgcca gtcaagacca gtgcttgccg cgaat tgatt ggaaagaccc gtgattctgg catgtgcttt cgcggccata gcagggcgcg cgccaacgac tttcttttca cagccggtgt c tgccggcgc Ctgtttgctt ctggtgatgg ctgccggatg ggtgcgatgg gtctccattg attgagagcg 3cgcgcgcct jcggcgatcg ;ccggcgagg 9tttcgccg ;gtcctaagc ~ggctgccgt ~cgctatatc a iggcgtgtcg c ccagaccgg a 'tggcagacg t gatattacg 9 cggcggagg t tgatccttt c ggcatgcct c gcaatattg c ctgcggtgg a cgagctacc g attccgcaa c gcgccgcgg C 3tatcgatc g :tctccggc at cgttctcg gc ~ctcggaac gc ~ttcgcgaa cc ~cgctctgg cE rcgagtaca tS .cgatgatt at tccgcggc ta gcaggccg cg gctacatcga tcgctgccaa acatcggtca tccatcgcgg tcgcaaacac cgagacgagc agcaagcgcc gcctgt ccgc tggacaacca tgcact 
tccc tgacggaggc ccgggcaagg ttcgcgccgc tgttggccga tgcagtgggc gacacagcgt cgctcggctt ctgcggtcag cggccatcaa cgctggcaac ttcacagctc cgtggcgcaa 3acagctggc acqgtatcca :ggttctact :gctgcgtaa a ~gggtgggtt c :cctgccggc c Lacctgtagc g .tatcttcga g rttcggtggt g gtttggcgc C agcggagcg c gttccgcat a cgcccacgc t tggagacac g ccgcataca g gccgctcaa t gagtcccgc gi ttcgccgga c 3gaacttca gc laagcccga at ;ctacagcc tc ~gatgcgct cE ;tccctagt cc :gagctcgg ct tggctggc ae rctctgggg ct attgattcgg 8700 ggcccacgga 8760 ctttccgaaa 8820 tctagacgtc 8880 ccagattcct 8940 tccgtttcgt 9000 ggcagtcagt 9060 gccggtgacg 9120 caatacagac 9180 gcccggtttg 9240 gcaccgtatc 9300 acgacgggtt 9360 tgcgcaatac 9420 catggatgaa 9480 tgacgcgttg 9540 cttggcgcag 9600 cggcgaatac 9660 agttgccgaa. 9720 cgccggcgag 9780 cggacccgct 9840 tctacgtgcg 9900 gagcatggat 9960 tccttcgatc 10020 3aatccgctg 10080 iacgctcaag 10140 :ggcatgggc 10200 ~ggccgcgat 10260 :gacatcgat 10320 :tatcctttc 10380 jgttgcgagt 10440 rtcgaaacta 10500 rgcCccggcc 10560 'ggccggcac 10620 gacacggct 10680 ctcagcttg 10740 ggtgtcgct 10800 tactattcg 10860 cgcattcat 10920 ccccgttgt 10980 cttgcggat 11040 tacggcagc 11100 ggctttacc 11160 4gcgtgagt 11220 ttgtatgag 11280 ;gggcatgg 11340 ~cagctcag 11400 ;gtgagcgt 11460 :gccgcagc 11520 Lcggccggc 11580 tcggccgg 11640 gtgatcgcgi gcgcatgc t tcggcgtgg.
gcggcagtcc ggacttacac.
cgccggcctc acggtcgac( acgtcaccg( aatcaggact ctccatcatc tcgctcttac ctggcgcatc tcgggagccc gtggacgagS gcgctaccgc caactgcgca cacatcaggS cagcagcctt ctctcccaat gatgcgatcS gcgccggagt gcgcaggtcg agcttcgccg gcgcgcgcag atctcgatgc ccgaccgctg acacgcggta tcgctccgcg gggaggcgat tcttcatggg atatcgacgc acatcctcgg tcgcggtgca gcggcg tcaa tggcagc cga agggc tgcgg tgatggcggt cgccgaacgg agacgctgga tcgaagccgg agctcgggtc tgatcaaggt cgcccagccc cctggccggt gtacgaatgc caccgtacct cataccgcga gcacttcata ggc tggacag gcggccagaa cgcgagcaccc t cggcggccgg a~ gaagcaaccg :gtctggtttc g tcgcgaaatg :caaacgagga 3 tcagccggga tgcgcggcgt ggacgcggat acacccgcga Igtcct gccgg -accggcgcgg Igaatggccgc Igtctacacat Icgggctcgat Lcccgcttgaa iagtcactggc tgggtgaact cac tggggca itcagttacgt ttgagaacct accagtggtt cgccgtgatt cgagcccatc ctattggtcg ggacatcgat cggcggcttc cgaggcgatc cgagaacgcc gatcttttcc gtacaccggc gctgcagggc ccttgcctgt tctgattctc cggtcgctgt z tgtggttgtg c gattcgcggc z gcccgcacag tgtgagctat S agcccttgcg g ggtgaagacc a ggcgctgatg c gcacatcgat t tgcacccggc g gcacgtgctc a gcttccgcta t cgtggtgaac g cgaacaccgc g ttttctggcg g gcgaaaagtc g ggatctgtgg cgcggccgcg gcgc tacgtg gggcgcgac t gatggtggag gcagcagcgc agaagaggtc tatccatgcc cgcaagcgtc tctgccgctc gcaggcaggc actgggtt tg gcgcaccagc tct cgaggcc taccggcgag cgaagccaca gcgctttgtc gggact cgat gcc tttgccc gctccatgcg cgcccgcgaa ggagacgcaa gcgctcgaca gccctcatcg ttgctgcgcg gcctactacg atcgatcagg agcctggatc gggcttccac aacgattatt acgggcaata :cgaacatgg :agagcctgc :cgccggatc ~aggcattcg ~tgaagcgac Lcggcaatca z ~aagccgtgat jtcgaggcgc a rCcgcgctgg S iacttcggcc a tgcagaacg a ggaacacgc t ggcggcgcg t tcgagcagg c cggcgcgca g acaaccccg c cggcattca c gcaacccga a ttttcgttt t ggcgggctca cagatgcgtg ccgcgactga tatttgatca cacggcgcca gtgctgcaac gcggatctca gcgggtgtgc atggcgccga gacttcttcg tacgccgcgg ccggcgacca cagtcgatgg gtcctgcatg ttgctgcgtc ccccggcagc ggcatcgcga tcgctaatgg gcgagtctgc gtatttccac gaac tggaag taaacatgag aggtgcagaa gcgcgggctg agggccgcag atccggatcc ttgaccgttt cacagcagcg ccgaccggct 
acaacctgca cggccagcgt cgatcgacac 3ctcaggtga a Igacgatcta c itgccgcagc a :ctccgacgc g ~ccaggacgg a :ccgccaggc t ~cggaaccgg c ~agcggggcg C Lcctcgaggc g 4agccattcc g .tCCCctcga a cgccggcat c 'gccgcaaca g tccggaggc g cgacacctg c cgggacgaa c ccgcgatac c gccgggaca a tcgatctgga atttcgacgg cccgccgacc ccggcgggct ctcgcgtcgt agat tggtgc C tcgccgcat tggacgacgg aggcggaagg tgctcttttc ccaacgccgt gcat taactg ccggcgtggc aatgccccat ccgccgcgc C gcgaagccat cttccacacc ccatagaact tgttcgacta ccgaagcatc cgctgctcga cgggtcagac acgcatcgac ~CgcttCcc cgcggtacgt cggcgcgacg tgacgcccgg 3Ctgcttctg 3gcggggagc iatgcgcggc :gccgccggg ;gcatgctcg1 iagcgacctc ~ttctgcaag Lgacggctac ~ctgcgcgat .cgcagcaat .gtgggagac 'acgccgctg 'accaacggc a gcagcgggc 9 ccccatctg a atcccggca c aactcgttc g gccgcgtcc a ctgcgtgat c tacacggcg t gcgcaggac t gccacaggt t ggatcgcag t tcccgacgat 11700 cgaagatcag 11760 cagcgcgcga 11820 cggagccctg 11880 gctggccggg 11940 gacggcagag 12000 ccacacCgaa 12060 cgtactgctg 12120 cgctgtacac 12180 atCggcatcc 12240 tctcgatgcg 12300 ggggcgctgg 12360 gagCCtctcc 12420 tcagattgcc 12480 gccttcacct 12540 cctcattgcg 12600 gctcgatcca 12660 tcgcaactcg 12720 tccgtcgctc 12780 accggtggaa 12840 ttcgcggctg 12900 gatctcagca 12960 cagctggaga 13020 ggcgcatcca 13080 gaagttccac 13140 ggccgaatgt 13200 ttcttcggca 13260 gaagtcacct 13320 cggaccggcg 13380 ggggatgcgc 13440 cgtctctcgt 13500 tcatcgctgg 13560 3cgctggcgg 13620 ctgaaggcga 13680 4tccgcggtg 13740 :gcgatccgg 13800 ;gactgacgg 13860 ;cgcgcttgc 13920 ~gcgatccca 13980 ~acaagctga 14040 jtggccgcac 14100 Latctgacca 14160 :ggctcaccc 14220 Igcttgagcg 14280 .gtacgcccg 14340 'tggcgcgcg 14400 gcgctcgcc 14460 tgatggccg 14520 ttgtgccgc 14580 ggcccggca 14640 tgggccgcg, gcgccatgc.
gcatcgacgi gccattgggc cgcacattg( gga tgct cgc aggcca tcgc cgcgcagcac aggcgaaage tggactccgt cccttggcat cgtactgggc cggcgggtgg tccaggaaac acggaaacct actggtctcg agcgtgagcg gccggcggat atttcgcgga tggcgc tcgc cat tgacggg tccgtcatga agctgcacag cgattcgcgc atcgcggcta gtgaggtgct tgcagt tgcc gtctggaccg tgcggccgga acctgatgg a gccttacg gattcaac Saatcgagc aggtgcgc cggagtac tgccatcg :cgtcctgt Lcgtcttct gtgcgcggi gtactcca tcgtaatct tcatgatgt gc tcggag cgcactgcc tatttatcc ttattggat Cccgtcgcc t caccggct cgctgcgcg cgcgctgac agagggcgg cgaaggcat ccgctgcac tcacttcgg ttgtcgcgt cgcggccc t attctcgcta ttccacggtc rct tctgaacc tc gactggtc cg gccctgtt cg gacgccgt tg actctcga 9C ggccaggg CC gggcgctc, C9 ggcgacag, 9c cgtcgcgt4 C9 ttgccggg( cc gtcaccggc :t cgccaacc( it cgggcagcc "c cggacgctc "c aacggcggc "c gatatccgt :g ctgccggac g cacgatgtg 'c caaggtctc g ctgccggaa c ggagcttcc g ctgcaggcg g gcggagctc t cccaccttc 9 gacattccg, "gtccatcac, "gctgaagtg( 3 gatgtccgtc 3 agagtcgcc( I accgcgtcgc -atcggcgagc gatacgtgct gacctgcttt tggctgctga caagcctggc acgctcgtcg tcgaatatcg agtcttcaca atcacgggcg gcgcgccatc aacgtcaaga atcatcgatc gacggcatgc gccggcgcgt ttctcttccg tcatttctcg gcgtggggac gccgcgcgcg t tgacgcaga 'gg tgttccgt 9c tgacgcaa cg cagtcggg ga tcggccac tg aagccgct ag aaatggct, gg atcgggtc cg cagctctg! ga aagtggac :g9 tggtgggac :g cagcgattz :g tgatgctgt Ic tgagtcccc ;a ttgccgctc Ig gagcgctgc :c aaactcgcc c cgccgcagg ra tgcagttcg [a tcgtgacc g gcgccgggc a acgatgctg t tccgcatct g gcgattcca a cagccgatg, c gaaccatcg! 
tgacggaaal g acgatttgai ccactggccc tcgtcaccgc atagcggccz I ttcgccgcgc cggcgattgc cgggtccgcc ccgtattgcg cgcgcggatc tgcacggaat atctcgatgc cagaggatca agatcccatc ggctcggcgg tcgttctgct tcatccatgc gcgacatgcc tgctcaatct ggaacctgca ccagcgcgac acgcgctggc cgtggacaca gcatctcggt ttcggccgca gc cgccatcg ga gttgcagg gt cgccttgg ag catgggcg cg ggtgattt 9t cgtggaat tc gattgccg gg cgaactgci at tgcctcqc 3c gcttcagc( Ig cggtgaagi :c gacggccgt :a cccgttgtt ;CcCtcgttgcc ,t gactaacgc :g gctgcccaa [t cgagtctca ,a gtccactgt gggagcgtg tcaccatgt "cCaggcaggt a cagccgcga ggcatccat "cCttctattc 4 ccccatctgl :gcagacgat4 3L agatgtgcai :ggtctggggi I caccggcagc igctcggcgaE Icgatgccaat cgagactctc gtgccgtcaz Fcagcatcgtc tgccgcggt t tgggcggacg gcacagcaac agttgcgatc cgcacccgat actcggactg gggacgcagc ggacgtggcg gccgttgcgg cacgaccgaa cgaactcacc agtgggatct tcatctgcgc ggttggtttg tattcaaccg cgtcgctgtc 'aa gagtgcggcc 14700 gg ccgctcgacc 14760 cc ggactgtggc 14820 aa gtcgcggcag 14880 9c ctgcgcagcc 14940 ta gcgctggacg 15000 cc agcaacagcc 15060 t9 cgggaactgg 15120 ac agccatctga 15180 g9 cggccggccg 15240 1g ctggtttctg 15300 gccgcagccg 15360 g9 gtccagccga 15420 1g cgcgatgaag 15480 ;a gtcactccgg 15540 Lc tatccctggc 15600 L9 gcttgcctg 1.5660 g gaggcgaaag 15720 ,g cacctggcaa 15780 c gaacacgtgt 15840 t caactggtac 15900 g gattcctgga 15960 c gatctggatg 16020 g cgactgtggg 16080 9 cgcggcaacg 16140 cgactgctgtc 16200 t gtgccggtag 16260 a tacgcggtct 16320 gtggtggcgg 16380 i tcggagattc 16440 gccggcaatg 16500 icaaaagcgcg 16560 iattgtgtact 16620 rcaagcgggct 16680 ctcaactccg 16740 attgcctatg 16800 gactgcgggc 16860 cggcaaggca 16920 gtggcgttcc 16980 caggtggcgg 17040 gagcgtcctc 17100 gaccggcagc 17160 ggcgtgttcc 17220 cgcttcgaag 17280 gccggccggc 17340 cccggccagg 17400 cgcgcccagg 17460 gccgcacagg 17520 caacagggat 17580 atgaacttcg 17640 aattggtggg gctgcagtcc ccacctggac ctggcggacc gccaaacctg gtccctcgcc ggcctgagcc acaaagatat agcatcccga atctcgcgac cggtatgggc gtgccgacgc gatggctcgc ggccacaact agctatcgga atctggcagg ccgccatggc cgctggatca 
gcaactacgc gtct tcccgc cgaaccgcgg ggtgcaatgc ttggctcgtc fccgcacggcc gcgcatcgac gccgcgcctc tgatattcga gctgcgctgc gctgatgctg gccgcgcctc aacctatctg cgccgccgga ggaaggtgc tgcgctcgcg cacgc tggcc tccgaaagta ttttgttctc cgccggcaat cgtcagcatc agaccgtctg tgcgcgcgct ctacaaagca atatcgcgc tcgcacccc t tccagccc acgtgctgc tcgattcgt tttctgcca acaagct cg cCgttgcta cagacgatc ctttgttga gcgattccc acgtctcga ccggtttccl tgctattcg' atcgttgaai tttcgacaai gcggctgct(.
gctctccggc ctggatgca< tagcgtgatc cgacacggcc cggcgagtgt gtacgccacc ggcggcggac cgatgcac tc ggatggccgc gtcggcattS cacggggact cccgcgacct ggagggagcg gataccgccg gctcgacatt cgtcagcgcg gactggtaga ggcggctgtc cgacac tgca tgcggcaact ggacgaac tg gagcctgcgt cgCtgtttcc ttcccagcga ggggcgcagc cgccatgcgg cgaggtcat t ctggggaatc tgtcgcgggt gt tgagccgg tgaggccgtg Ctccaccgtc gcgaggcgtg :a gtggctcc< ic ggccgcggz 19 gcggcagcc :g cttcgatcc t gatggccct ,c cctgatctc ,g cctgccgct c ccttgctac t cgacgccgt a acagaagtt aaccttacg t gcggccgaa g ggcggcgat t gaggtaccg cccgccacg =gattttttci a ttggaagtgi :agccgcacg! 4 accgccgat4 -gccggccga( tgctcgtctt acgCtggcc tcgaagatgc ggcatcgtgt gcggccggac tcggccgggc fgccaatgcgc ccgctcggcc gtCggcgatc gcaggcatag agcttacacg 9 tgaaggaag tttggttggt ggcgaagctg Cccctcgcgg gacactcccg acgggcattg cgtcgggtgg gatctttgct ggcagaacg cggaagacgg tggatggacc ccttatgtgg cagcctgcgc gagccggatg gcgctgacgc atcagcggcc ctgtcgactt atctccggrtg tcttgccggc 3t tactatccc ic accaaaccc IC cgcgcgcgs -a gcgaaactc c gagtttcgc fg cgttaccCq .g gaaagcatg :c gttggcacc gcaaaccag gctcattt cgttgcagc c gcgacgcgc g ggcccgatg ctggacgat, cggtgaaga g gcatttcgci g Cgtgggagg, 3 gCgtCttcgi g gcgcgcgcal tttcctattt Cgctggcggc I tggccggcg5 ;gaaccgcctc tcggagaagc Iaccgggtgtc :tcaccgctcc Fgcgtcgcggc fatcccatcga tgtgcgcggt cgggattgat tgagacagct tccggccgtg ccggcacgaa cgagcgggt t agggggacac acactgcaga Cagacgcgat cgcgggcata acacggccgc ctgaagaact tattcgtct gcgaacccgt actggtcgct tCttcgcgct ccgtcatcgg tgcaggatgc tgggcgggat acacggaacg aagtcgaagc cggtgaaagt jr! 
cggccgcat 1g cggccgac iC tggaaacgc :g acggcagac a accgtctgc [a cattctccc 'g ccggcaatc 9 ccgcgggcs a tcgccgggt t caggagcct a ggCgtacct 9 cgaacccat a gttctggca g ggacgagga t tcaagccgg :acgcgaggc zactggagga cgggatccal :cgatccgtal :gctgaactt( :ggttcatcc I agtgaatctc ,gcccagcggt Ictgcggcgtc Iggccgtggtc caatgtcgtc gcagcagatc gatcgaggcc cgggtccctc9 taaagcggtg gaacccgaa t gcccgcgggt cgcgcatgtc ccattcccga tgggggcact cactcccgac gtatgtgctt cggggaa ttg agtccgccgc ggcggcgcag ctcgggacag tattcgcgag ga aaga aga a gcaggtcgcc gcacagcatg ggcgcgga tc ggcgatggtg ac tatcgccc cctggccgag ggacttcgcc :c gatgtccctg it gcgcagcgag :t gctgatgcac ;c gacgctgggt la agccgggctg Ic cctggcgcag [c tgaaccttcg fa ggaccggagt t gggggacaaa C cgagtgagtt g acgctggagc c gcgatcgtgg g atgttgcgca g tcggtccggc a tttctcgatt gtcagcattg t gcggggcaga zagccaaagca t accgccaccg g caaggacca 4 gcgtgccaga 3 cgcttctcgc :cgctgccgcg I gtggtgctga i cgcggctccg Itctcagcagg -ggttacaetcg rctggcggaaa faaatcgaaca ctcgcattga atccggttgg tcgagacgaa gttcttgaag a CCCCCcgccg c cccgacattg c attgcaggga c CCgctgtccg C ctgacagcgt c aCgCatcacC g ctccagggga t ggatCgcaat g gCgttggaaC g ctggcgaagc t atcgccgcat t ggagaggtCg c atttgCagcC g gagctgccgc t~ gCggtgtcga a( gtcgtcgCga cC gcgCatagcc cC CtggCCggCa 17700 ctcctggcag 17760 gaagCCggaC 17820 gatctcggat 17880 cgCgtcaagc 17940 CatCtCgCCg 18000 accgttgctg 18060 CcCCCC9ctg 18120 gaaatcgaag 18180 Cgatatccga 18240 aCatgcagcg 18300 gtctgggCtg 18360 gtggagtcga 18420 gCatCCtgaa 18480 CCatcgatgg 18540 atCCgcagCa 18600 cgatggaagg 18660 gcgactattt 18720 gCaCggCgCa 18780 gcatCgcgCt 18840 gcctgCgCag 18900 cggagtttat 18960 ccttcgaCgC 19020 agcgcctgtc 19080 cggtcaatCa 19140 tcgtcatCCg 19200 iagccatgg 19260 :CgtcggCCt 19320 tcgcac 19380 tcacgagac 19440 ~gggaacgtc 19500 ~gtttgcggg 19560 LagCggcgcc 19620 ~cgctgcgcg 19680 ~aggcactcc 19740 tgCaggcaC 19800 'gCatggtgC 19860 gcaCgcacc 19920 atgCCggct 19980 cacgatccc 20040 gatCggaat 20100 ctgCgaggC 20160 cgaccgcgt 20220 gtggCgttC 20280 cgccgCtCa 20340 cagccggCt 20400 
:gcggaatg 20460 cggacccaa 20520 3ctgga9cg 20580 3caagtgga 20640 ccattgtc accttta ctgggctcc cagcgggcz Cggcaatgc cacgtcgct t tccggcaa cgct tcccc agcctcgac ttggctggc tatggc tc t gactttcac cgcggtcga ggtcctgca ggattctgc gcgtcagat ctggagt tg gccggcgt t aacc tggc t4 gtggacgcal acaactggt4 cgacatgcg( gccgccggc< caccgcatgc tgaactccac ccgcattctc atggctggtc tcaggcgcct cacgctgatc gaacaacggc tcggcacgaa cgagatcgat cc cgcaagc c tctgctcgcc cctgggcggc catcggagat cgccttccgc tatcgcgttt acgagtcctg gcgtgcgggc ctcgctgggc caat tggacg ggaggcgagc ttactatgcc ggtcgatttg gatggtcgca caccgaatcg catggcgatg cgagggaacc gatgatcgga Ic gacgaact( ic tccacggtc Ic aatctgcga LC gatgtcttt :g gcgctggt .9 ggcgccctc: t tgcgtgcgc :c gcgcgacac 'g caacccggc a gaccatcgc g gccggaact a cagatgctc t agacccggg t gcttccggg g gagacggtg g gcggagcac t ccaggtgag c ctCgatgca g cccgccggC, t gcgcggctgl gcccgCatc4.
ggctggttg( 3 gcgcgagcg 4 ctgcgcgcti 3 CC9CCgCtCc ;catctcctc< acgcgcggcc ttctggggtt gatctcgatc gagaatcaaa Lgcggatatgc gcccccggac ggtgaagtgS ctCggcgtta gaatgctcgg gaagtcgtgg gttgccttga ctcaccgccg attcacgctg gcggagatct atcgcgcatg aa tcaagaag ttcgatCtgc ggccgcaagc ctcggcatgt aaattcgaat gtggaggcgt cgagat tgcg tacttgatta cgcggcgc cc :c tgcagtcgc ;a ccggCgcgz it cgccggttc :c tcgagatca :c cgtctctgc ,t atgaggct9 -c tgccccggt :g cgatcacgt a ctttcttct !9 tgCagggCg t ccgagacct a ttgtgccgc a tggcgtcgt g acattcgtc c aggcccgct g gcgtcgagt g cgatcgggci t gtctgcaga a tcgaccggal g aaggacctal g agggtctgC( 7acgaactgc g CgCggtcatc 3L ccggcaaccc a aggaaatcgt :agaCcctggc I cgcagccggt tgggccggac ccgccggcgs Ltcgcctttcc aacccgccat tCctcgaccg agattgaagt tgcccgacga gccgtatcgt cccttgcc agccggccaa attacgcgct ccaccggcgg t cgctactgc tttcggattc gagtagacgt tgcgcgatca tggggcttcg CCC tgaagcg cggaaacc tg t tcgcaccat ccaatgcgc ccggcggact ggcggc tggt -C cgacgggatt IC gctggagacc :t gttCtggCag Lg CCCtCatCCC :g ccgcgaccag fg gCaCaCtgtC a tCCCtggCag t gggCaatCcg g ggagaCggaa a agtCgtCttg t Cggtgaaagt 9 CgacggCagC t tCggatttc a gacgCCtgCg g CCCCaCagtg a tggtccggct g tCtgCgtagc t Cat CgCCgC9 t gCgctggCtg t cgCCgatctg 4 gctgcagcgc 4 CtgggtCgCt 4 gctCattgtC I CgtgaCgcag :gttttgctc I9cgCacgccc cgatggacag cgtgcattac fcgaagaggaa ICggCggCgCg gttcaaggcC gCtgCgCttg C CtgCgCCgcg S t9CgcCcggC g ggCCatgggg a ttgCagtttc g CattCCCgCC 9 ctcgCgagcg 9 tgtgggattg g CgggagtCcg 9 gCgCtCgatg g CgtCCtgaat t tggaCggttc a CCCgttcct9 a cccggcattg a gCggCCCCtg g4 ggCgCaggcg cc CatCgCaCCC cl tggcgggCtC- g~ gctgCtgagC cC caaccgcggC accagcctcg ggcatccgcc atcctgctgc gacgaacgcg gcatggcgga cgtcgtcgtt ctgttgggaa ctcagt~ctcg ccggctactg CCgtgCgtgC atgacgttgc agccggcagg gatgcatcga gtgcCggCgg ttccgcgCgc tcggaaacgc gcgtttggtc catcccgcac tCgctgCtgg Ctggatgcgt ca9ccgcacg ~ggCgtgtgg gagcacgaac tggcgtcaag atcctgcaag aacatccgg :tcctgcacg z :gttaCgtcg c ~gcgatcggc c :gggCCaCat c ~gcctgaaCt t ICgattgCCg 9 LaaggCgtCa c rgtCgcttcg t raacaggccg c 
cggcaatcc a aaaaaCgag c ctttcgtgg a CgctttCCg 9 tcgagatCg g agaaCCtCt c cccgggagC t aaaCgcgag t ggcacatcg g taCgctCgg cC 3tCttaccg t 3CC9CgcgC Ct CCgCgacCat 20700 aCagcaCgta 20760 atCttgCCga 20820 CC9Ccatcgg 20880 gttccatgct 20940 CCgtgtaccc 21000 tctggCtCga 21060 aaCgCgtCga 21120 CttCCgtgCC 21180 CgtatctCga 21240 tggagcatgt 21300 agctggCcat 21360 CatCgacatg 21420 CCgtcccgCC 21480 cggagctgtg 21540 tCgagcagat 21600 gttCCaCtgC 21660 CCgCCggtgg 21720 gttCCgtggt 21780 acggagaggg 21840 cggagcgcat 21900 ~cgctgcaga 21960 atagCgCgct 22020 aaaagCtCga 22080 zctcatgcga 22140 :accgcgcct 22200 :CggtatCgC 22260 iaCtgaactg 22320 IaCtgctgaC 22380 ~gcgcgtggc 22440 :gttCcggct 22500 :gCgCCgccc 22560 .CCtCgaCgt 22620 FCagCCCgCg 22680 CgaCtttCg 22740 caCCaC9cc 22800 cgCCctgcc 22860 gCCCggCga 22920 gatCgCaca 22980 gtatCtgCg 23040 CgacatcCg 23100 cgatCtgct 23160 caagcgcga 23220 gtacacgct 23280 gctgcagga 23340 4acgaccat 23400 7aaaatcgt 23460 3ttcgatag 23520 :gcacgctg 23580 :tcacccga 23640 ggtccagca tcagcgcga tgccgcagc cgtgatggc gctcgatca aaactacgc cctgccggc ggacaatcg cctcgctat caatgtccg tt tgcatga, atcggccga4 gcgcgtgct( cttcgattc( gctccccgcc gtcgcaaatc ccgcgccatc tgcgc tccgz gcgctattgS cccatcgccz ttctggacgc aactcggacc gccgcct tcc gaagctctga gaggacgccq tcctgcgccc tcgggttccg cgcggtccga gcttgccaaa atcctgactc cgctgcaaga gtgctgctga cgcggc tcgg gcgcagaagg tcgttgatcg ctgcagtcgg aacatcgggc ctgcagcatc c tggacggca ccgcgtctgg gaagaggcgc t cggcgcgca tcgcacccga tatagtcacc gcggcgccgc ggccagggcg cgcgacgcgc .a gccatcgccg t gaactcgagc c gttctcgacg c gcgaaaatcg t ttcgtgctct g gccgcgaacg g ctgagcatcg c ggatcgcggc t ctggaacagc c cagtggcggc gacgcggcga 3 cctcagaccc 3 cgcatcgact :ctcatggccc 3 accctgattt 3 ggactgccgc 3 aaaactgcac iggagcaaggt I cgcggaacat itcgtcggcat *tgttgaagaa *agtactactc *tcgaacgcat Lacatggatcc gcatctctcc aggacttcgg gcgtggcgca gcatggcggt gcctgcgccg ccgagggcat cgttcgacgcc agcggctctc c caatcaatca cggtcctgca a aggcgcatgg c tctacgacgc g atctggaggg c gcaccattcc t gccggtttcg c ccggcgtcag c 
ctgcactccc t ccgacgaagc g atgcgttgct g gcttggcgat c ggcgcgaagt a cgcagtacgc c tcgatcgttg c tcgagtcggt ti tcatggacgc gcgtgatctc atgcgctgct acggtgcctg tctCCtCcgc cctttcttga gt tggggtgc tggctttgcg tgctgaacag agttctatcc gcgaagccga gcaggacat t ctcaaactat tggagtttcg ggggtcatcc tggt cgaagc tcagcgggtt cgtgagggaa gcggcaaaac ggcgtgtcgt cggtgtcgac ctccgatccc tgacggtttc gcagcagcgg :ggccctctg actgtttcag tagcatgttg :gatacggcct 3cgcgaatgc latcgctttgt :gcagccgac S :gatgcgctg g ~gacggacgg a tgaggcggtg g :acgggcacg t rccggactct g 'gcggcggga a ccgcacctg c atcgccacg g tcgttcggt t ttgccgaag c ctcggcgaa c tccgacgtt t accgccgca g tcgttgcgc c ggcatgggc g( gccgattgg ci tcgccgttg c~ gctcagttc tc gagtatgtt g agatgtccgg ttccatcgat actgaaccag gaacctgcac tgcaggac tg Cgcgctggcc gtggtcggag cggcatggaa ctcggcttgc caaggcggcg tgcgccaaac gc tcgaagaa cgatcccctg caaccgtctc cacgctggcc gcaggccgcg ggacgacatg aaaat tgcgc atcgcaggct tttccgggcg ggtgtcaccg gatgctccgg gatgcggaat ctgctgctgg 3cgggcagcg .acgccgacc ccaatcgca ~gctcctccg ~atgcggcatt *cgaaggctc ~gttatgtgc
~ccgatggcg a .gcaatggca t ccaacgcgc a cgctgggcg a 'cgccttgtc t tcgccgggc t attttcgcc g aatcgtcgc c ttggaggga g cggtcacac g tggccggcc a gcttcacca g atgcggcag a ggcggccgg C.
cagagcttt al -ccgtcccc a icgagacgg cc 4gctgtcgc tc ggcgtgtg tc acggtgcagg cgat tgcgcg acggaagcgc ttgctcaccc ctgggcgcgc tactaccgga gtcgggc tgg aacctgacgc cacgtcgccg cagtctgcac gcgt tgcgcg catctacagc cgcccgctga gaactcacac ggtcttgccc gctgctgcgg tcggaagaag ccatgtcgtc tcgacctggt gcgcgaagaa aggtgccgcc gcaaggcgta tcttcggcat aagtgtgctg cgaccggcgt ctgcccgcat tctcctatct 7gctcgtcgc :cgccggcgg ;catgttggc ~cggcgagggc Ltgccatccg t :cacggcgcc Lcatcgatcc a .tcctatcga g gctgggttc c gatcaaagc c gctgaatcc g gtggacgtc g caacgcgca c cccgcagct t cttcgcgga g tcaggttgg g ggctgtagc g aatcgcttt t taaaacgca g4 gctcgatgt t ?tataccca g~ 7ggcgtccg g( 3gccggcgc ci ccgatgtttC 23700 gcgtgattca 23760 atttccgcaa 23820 gcgactgccc 23880 ccgcccaggg 23940 aggcccaagg 24000 CtgccgcgCa 24060 cgcaacacgg 24120 cgatgcccat 24180 tgttcgagct 24240 cgcggctgca 24300 agcagctggc 24360 aggaactcgg 24420 tgggtctcac 24480 cgcacctggc 24540 aaggagacag 24600 cagccgtggc 24660 ggtcaaactc 24720 tcacgccgaa 24780 tccggacgcc 24840 agaccgctgg 24900 tgcgcgatat 24960 ctccccccgc 25020 ggaagcggca 25080 :tttgccggc 25140 cggagcttgg 25200 gctcgacctg 25260 :gtccatctg 25320 agtgaacttg 25380 ~cccgacgga 25440 ~tgcggcatc 25500 :9cagtcatc 25560 ;aatctgcag 25620 ttcccacgta 25680 ratcgaggcc 25740 gtaaagacc 25800 'gtactcgcc 25860 'aacatctca 25920 gaaggacgg 25980 gtcatcctc 26040 ctcactctg 26100 ttcctgcag 26160 cgcgacgca 26220 gcattggcc 26280 ctcttcacc 26340 cctgttttt 26400 Zcgctgacc 26460 ccggcaatg 26520 ccggactac 26580 tttagcgtg 26640 tttgccctgg aatgggctct gtgctgggcc acagtctcgg gaggacggcc tgcggctggt gaccgccagg gggcggctg aaagcggtc gtcgccgca gaagcgcaa gcgttccat atcgcgtat aaaggcacg gaaagcgcg catcccacg cactcccta' tttaccgc gtcgcgctgi agagagccg atcttcgaa gcggtcatt( t tcggtccgc ccggatacg( gtgcaggtct gcggcgactc ggcgatgcgt tgggtggaac gctgaggatc gtattcggac atcgaagcgc cgcccgagct gtcatcgcc9 tcgtggctgc ggccctggca atgccgcaag cagactttgg agtgatgacg gcgatcgcgc gccgacatcg atcgcat tgc aagcggccgc t taggacgaa ggccggcata gcttgcgatg ccgctgcgtg acgtgggctc cagctcactc 
ctcggttccg cacatgcgcc gaaggcatgg gaagtgggtg ttccaagtct ctcaccgaag gccggcggcg gaatacggac cgcaagagcc a tcgttcacg t cgaatgcgc g atgacctgc t cgccgctga agccgctgg, a cactcgacgi tgcgaaccci tcaccacgcl t ctaagggac( g gcgtgaatcc =cgacgtatc( 4 cgcgcggcgc a attcgctaac 4 tgcccggcgc 3 ttccctgcgc :cggtcacggt :tcagccaggz I ccggcgccgt :tctacggcgc jaagtctggcc Igcgcgaacgc Icgacttggcc Itgcgcttcta cgagcggccc fagttttccgg aggatgtgca agccggagga agctgcgcgt tcggccgccc atgcgactcc gcgagcatcc ccgccgccat gcaacggacg ctgccatttc gggtggcacg cggaggcagt tgagt tccga gagtcgtgca gtttcgagaa gccaccatgc ccggacagag gcgcgcaagg ccgcgcgcat cgcgcatctt ccgccgaaaa ctgcgccgga tgctggcgct tcgattcgct t tcccgcgac caatccgag ggaccgcac a tcgcgccgg t ggatccgat c gatcccgct z ccgctactg t. ggcggaccg t ggggcgata 3 atcggattgi -cgactggcgi :gtttcagcgl I catgttgggi cacggagac( ctggc-acgt( cgtctccgat gcaagcgatt Ltggcgattcc tcatttcgac gatgaacgcz ftcgcgatggc ttaccggctc cgcggagcgt ccgtccgccc gttcgtcggt actggctgta gtggcaggag ctggttgctg cgtgtccggc ggcgcggctc cgtcgatcct cgagctgtgg gctgc tggat ccgctacgt t agccgacggc ccgcttgatc tgccgatctc gcaacagctg tgccgcaggc ggtgctggcg gctcgacttt caattactcg actaccggcg cgcgcggcaa cggcgatctg aaggcggagc gcggcggcag cgatgcgtcc gatggcgc tg at tgctatac c cgcatcgcq c gtgatctcc c gtggaaacg t ttggacaag g gtgtcgaac g cggcgacag c gagtgcaag t tgtctgccc 3 tCcgtgctg CggtCtCtati Sgacaccttc, a gcgcgcctc, gtcatgatg( gtcacacccc I tggaagctcc I ctgccggcgc Icgcggcgtcc *gaggcgctgc caccccggcc *tgccagcccc gcaggttctc *gatctgacgc atgcatgccc tgcgagcgat tgtgccggcc gtcactctcc tggctgatca t tccaggctc ggcggcctga gaaatccgtt cgccggctgg gtctatctga gagcaaggcg gagcaactcg gcggcgctgc gtgctcgatg ccgaagctgc t tcgtac tc t gcggccaacg ctgagcatca ggcctgccgg c tgggcgaga ccggcgagcg gaactgctgc aagacgctcg gatctggcgc g tcaatgcgc g cgctcgccg g gcacggctg c gagagctga t tcgaagcgc g tcagcggag t tgcgcgaaa c tgtttctgg, g atgacggcgi c tggaaagtcl 9 ccggggaat( a gcctgagac( a acagcgcgti atgagcacgt 0tcgaagcggc 0ggcaggcact 3 gcgaggacgc :acacggcagc -agccttccgz I 
atcttggccc I ggcgaatgcc tgatcgattc Igcgcatacgt tgcgctgtca tggttgaaga Igtacgctgca cgacaacgtt cagacgatgt gccaggcgct cgcgcggcgt cactgtgggg tcgacctcgg atgccggcga tgcggcacaa tcaccggcgg cgcgccgtct gggc tgcagt tggcggaccc* acggggtagt agggtgcctg tctcttccgc catttctcga at tggggacc gggtaccgct ctgccgctca atcccggctt agatgcgcat acccgcgccg gcgcca tcgg t tccccgcggc 26700 caaggtggca 26760 agaaatcgcg 26820 a cgtatcgcat 26880 t tgcaggtgcg 26940 c cgtattgccg 27000 c cgtgcagttt 27060 a aatcggcccg 27120 cggtctggctg 27180 t tggcggcctg 27240 acccagccgc 27300 g cgtacccgcg 27360 gggcgatgtc 27420 gatctacgac 27480 acaggaagtc 27540 ggccatcccg 27600 1 cgaagcaaag 27660 :cagtctgcgc 27720 iagtcatttcc 27780 cgccttcagt 27840 1tctgccggtg 27900 *ttgttttcaa 27960 *gccggtcggg 28020 tgcgcgtctg 28080 *gaccggcgcg 28140 *atoogcacag 28200 gaagtccgac 28260 cgccggtttg 28320 ggaacagacc 28380 gcatcgcatc 28440 actcgggcag 28500 ttgcgacaat 28560 cgacaaagcg 28620 ggaaacgtcg 28680 tctcggcgca 28740 ggtactggtc 28800 catggttgct 28860 gcgcacccag 28920 tacagaacag 28980 gaatcttcac 29040 cgottcgctg 29100 cagccttgcc 29160 atgggcgggc 29220 gctgccgccg 29280 gatcgcggtg 29340 catccagcaa 29400 ccgcaagcag 29460 gccgctcaag 29520 agagctggtg 29580 attggccggc 29640 gaccatccga ccgtcgagaa catgtcctcc cagc t t ccg gaacgatgag cgcgtctcga gccgctttcc atgccatccg ccaacgcgcc gcttcgacgc agcgcctgct gtctcgcggg tgaaacctac c tgc cgccgg cggcgtgctc agtgcagcat acttctgccg ccgacggt ta cgacqcgtga gccgcagcaa cgctcaagaa ggacgccgct gtgccgtcga cggcggcagg cgccccatct agatagccac gctcgt tcgg tagaagcgaa t caaggcgag tcctcgaggg gcgaggaagt cgctttcggC gctttcacgc aacatcgcgc tccggcgcaa cgaaactcgc tgtattccga gcttcgtgga agatcgatcg aatcctgggg cccatgtcgc ggctgttgct gggcgaaggc gaccacgcag tcgagaggcg aggtggacc ccgccgtgcc cgtcgtactg tggctggtga cgatcgagga gcgacgaac agcgtccgga gcgaactcgg agcaggagat cgatctcact cgaac tggag cggcgcggac cgaaattcct gggaaagatg cggcttcttc gctcgaggtg cagcga 'cacc cgatccggcg acggatctcc ttcctcactc ggcgctggcc cctgcgggcc cggccgcggc cggcgatcgt cggcctcacg 
cgccggcatg gggagatCc ttctccgttg tatcgccggc gcatttcaac cgcatgttcg aatcagtggc gacgaatgta tgtagaggcc ggac agc cgc gccggtcccg gcgc cat ccg tccgctctCC agcgtttgtg tgaaaccaat cttcatcttt cgagcctgtc atggcggctt cgtgcagcct aattcgcccg aggcattctc cggact tcgc cgtgctcgct cacggtgttc cggcgtct tc gctcgagaac gttcttCtCC ggtagccaat tgagttcacg tagtctgcgg ggagcggcgt c tggcgcgcg actcgacgtc ggcggcgttc cctcttcaac agcgtccaca t cgccggaag gcgggccgtt tacacgcgtc ggaatcacgc gcatgggaag ggagtgt tca ctcattgacg tatctgctgg gtggcggttc ggcggcgtga atggcggccg gagggatgcg attctggcgc gcgccgaacg gcccccgccg atcgaactgc atcgtcgggt ctgatcaaga gcgcccaacc ccatggccct accaattcgc gaggcgaaga aaagggaatg ccgcgaagcg gatcaactcc caggctc tgc gcgc tgtgt t gcctcatccc cgcggcgtCg tccggccagg ttccgatcgg gcggacctgc gcgctgttcg gacggcgtgg accctggagg ggccggggag gaacgcggtc tcgggagacc tgccggc tga gaat tgcgcc acggt tgaag c tgcgacagc cagttcctgg acgctcggca gagctgctcg ctcgcttcgt cccagcgatt atcacggaaa aggcggtcc t acgaacccat cattttggca gggatgccga tgggcggatt cgcgcgaggt ctttggagcg tcggga tcag cctataccgg ggttgcaggg atctggcgtg acctgat tct atggccgttg gaatgctggt tgattcgcgg gtccggcgca atgtcgatta gggcgatggc cggtgaaaac ccattctcgc cgcacgtact ccaacggccg acgtcgtcct cgaatgtaga tggaggctaa gcggcggggg atgccgaaga gcgatctCgC ccgccgccag tgcccgagtt ccaccggttt gcggacagta cgatcgaacg tcgccgacga ccgttcaaat ccggacacag acgcggcccg cgatggctct tcactactgt gtg tggc tc t ttcaggtgga aggaactcgg gacagt tgag cagtccgttt agatcagtcc taaacggact agttgctcgc ctcccgacac ccctcgtcga ccttgcacca ggcgc tcaag cgcgatcgtc gctcctgcac tgcgttttac cct cga tggt cgccggtctg tgcgggtCgg caccgacgac taccggaacc accgaact tc ccgcagc ttg ggcgccggaa caaaagtttc gctgaagcgg atcggccgtc ggaagccgtg cgtggaagcc agcggtgc tg caacttcggc cctgcagcac ctggaatgag cccccgagt t cgcagaagcg ggcgaagacg ggctagtgct gtcgggcCgg cggccgcgaa cggcgcctat cctgacgcgc caatcaattg cgccgatccc cccgcgcatg ttgcgacgcc gtcgggagca cgcgctggtc catgggagaa catcatctgt ggtcgaactq ttctgtcgcg cgagcatttg tgtcgcttca ccgcgttatt cacgggcgag ctgggagtcg gcatcctgtg 
ggt tcgcc cc cgcgctctac gcgcctggat tgaagtgcgg tctgggagag cgcacgcgag ggcatggctt gatggcatcg gatcccgatc gccgtcgacg gatccgcagc ccgcccgaca tacagccggc gcgt tcagca cccgtcgaca cagtcgcgag ag cacgatct gctgcctccg ctgtccgatg aaccacggcg attcgggcgg cacggaaccg ggcgaggggc cacctggagg cgagaga ttc ctgccgctaa gccggggtga aagacgaatg ag tgaagagg agtgtccccc ccgcccagcc tacctcctac cgcgatgggc agtcactacg ctcgaggcct ggagttcgtc gcgtatcgcc gccttccgca tggctgagcc gaac tgctgc gtggcggcgg cgccgcagcc c cgc tcga tc gccagcaacg aaggacgact cacagctcgC gcaaaacgt t gcgtgcgacg ttgcaggcga ctgacgccgt gtactgcgcc gtgaatgggc ctgccgacgt 29700 29760 29820 29880 29940 30000 30060 30120 30180 30240 30300 30360 30420 30480 30540 30600 30660 30720 30780 30840 30900 30960 31020 31080 31140 31200 31260 31320 31380 31440 31500 31560 31620 31680 31740 31800 31860 31920 31980 32040 32100 32160 32220 32280 32340 32400 32460 32520 32580 32640 atccc tggca ttggcggtca tctgggagtc agcttgtggt tgttcgcggg ccgcgggcgc tggccgaagg acaaggtcgc tctatgcggc aagtgtggcg aagcagagtc gcgcgacgct tgcactgttt t cgagggcga tgcgtgccca agctgctggc ccggcggcat gtcgcgatgc atgagaccga tgcgcatcgt ccgaatcgca aga tgccggt agcatcccga aggcgctzctg gcgcgagcgc tCCgCtgctc cgtgctctct gcttcccggt tggctgcagc ctcgcgagtg cggttccgat gcccacggtg cttcgcatcg gcgcgapggc cggttacacg tggcggcgaa cggcagaccc tgtcaccctg ggaactggag cgagagtcca cgcagccgcg cgagataccg gga tgagacc tcaagaattc gcatccgcga ggtgcccgcg gttcgcttgc ccgagagatt ttctggttcg ggtcgcaagg ctggatgcgc gccgcttatg c tggaagaga caggtcatac tggaccgagc agcctgccca caggggatgc gaggcagtgg cttcatcctg ggcagcgccg gccggcgatc tgtgatgcgg cgccaatccg acggcaacgg gtggcgcgag gatcagcctt gatccgtcgg ggagtgggac atCtggCtga cgggcaccgg acctgcatcg ct cgcgggga cgacctcgac tcgagattgc tgccgtttct tggaga tggc tccggtttga tcgagggaca acgcgcgcgg cacttgggga attacggcga cgcgactgag cct tgctcga gtccttgcgt t tagggtgca aaggccacgt aatggttcca tgtcgggtgc ggc tgggcac accggggcgt c tgcgggggg gcatacagct ttacggcggg tgtggggtct atctcgacac gttctgaacg gcggcgaagt gctggcgccg cgccgatcac go 
tggccgca acaaatgctg cgcattccgc caccatggct tcgcatcgag caccttccgc cgtgccggat tgcctgtttg gcctgtcgcc tgcgcggCtg catcctcgag cgc ta tggaa.
atggctggtc aaacacggt t cattcactgc aaccgcctgc gacgaaaca~a cgttcatgcg gggacgtacc tgccggtgaa t cagggcccg ttgccggcag 32700 gacacacacg 32760 cggctcaacg 32820 gccaaggaag 32880 gttgttcctt 32940 atctccagtc 33000 gcggcgCCgg 33060 ggcgatgact 33120 ggcatcgcgg 33180 gccgttcgcg 33240 caggtgctgg 33300 atcgaacggt 33360 acggggcggc 33420 gtccaaggcc 33480 tgggagccgc 33540 attgccgatg 33600 gtgatttcgg 33660 gggagcctgg 33720 gaagacattt 33780 gcgtccgacg 33840 gagcatctgc 33900 atcgcggccg 33960 gtcgaggtgc 34020 g 34071
<210> 115
<211> 4615
<212> DNA
<213> bacterium
<400> 115
actgcagtgc ctcgctcatg catggtccga.
aagcggcctc tgccctctcc gccggaaagt gcaggcgt tg ga ttgcggag cgaagtgat c gcccttcaat tcaggttctg ggtggatgaa gccggtcgcg gggcgagcgg tctcccgtcg ccgaatcgaa go tgcatgcg agagatcctg ccggaatcgg gcggtgcaat tttctagacg agcgaacgca taccctctca cccgcataca cgccgt tcgc agtggcggcg ccgtgttcgc ctcggcgaaa gccatcgtgg ctccgcagta agcttcgccg ctttggaact oatcgtccca cccgcgctga acgctgatgg accggcaccc cggtggactt tacgcaaccg gccccagcgt cgacggtggc gcgccggcca, acatcgcgtg tgcaggacct cacccgttCa cggacgatga, actgtttccg tgcatcacat tctacctcgc ctttcgtccg actggtcctc g-tccgccggt ctgcgaaact cggcgtttca tcaccaacgg acagcagccg gatcgatacg tgcggaactg gccggaacct gcaggcgctt ga t ccgcgc ggtggatcgt aacggtccac ggcggtgctg ctcgcgtctc cctcgccgac gaggacagoc ctggcagaac gcagctttcc gcagagtttc gaaggcgctc agtgcttctc tcggacgcaa ctggtgcgta.
gatctgcgcg gccagggatc gcggcgcagg tggtttattt gcgagaggcg catccggcgc agcagcgtcc atcgacggcg c tggtgcagt ttctggtcac ggcggtccgc gaactgttgg ggccagc ttc cggggaaact gcgcggcggc tcccgt tgga ccggaattcg tgggattgga. tCttgctgcc 120 taagcgatct 180 cctcggttcc 240 accgaagcgc 300 ctttcgatcc 360 tgcgaacgac 420 cggtggattt 480 tcttccacgc 540 cggggaagga, 600 tgctggtgat 660 ctgtcgcgcc 720 ccggaaccga 780 cggttctgaa 840 ctcactcgtt 900 agaacgcgac 960 cctcacaaga 1020 ccgatctcgt 1080 cggatacttc Cacggtgctc gtatgcccgg gcctcatcgc c Cggggcagc ggatctgatg catttttgat aatcgcggag catccagctg cgagctgt Cc gcagaatctg ctcgcgcggc cgtcgcaggg atatccggcg tgtcacggaa cgccgatctc aatcacacac cgaccgcgat ctttttgccc gtgaatcccg gcccggat Cc atcgtggagc attcccgaat ctcacactgg atggtcgaaa gctgccacga aatcccgcct ctcgaagagt gaagctcagg acatatcgcg gctggacccg ctgttgggcg caacgtct Cc tcggaagtat gcctatgtcc caggccgtcg acgctgctcg C tgagcgccg tggtgagagg ctggcgaggg cgccacctgc ctgcggcggt gctatggaat gacaggtgac tgacgatcgg cgggctcgcc ttcgttcgac cgacggcgcg catcgaaacc ggtcaccgcc tgctgacggg gcacgtgacg ggat ccggag tgccgagatg tcgcgtgctc C Ccggagtac ggaccgcacc gatcgacagC cgaagcgtat cgtatttgtc agttcagcgc cgcgtcgcag gattccggag gacccgcCtc at cggatCcgc gccggagt Cg tgtgacctgg gcgccagacg cgtctgctgc gaagcccttc ctttacggac ggaccggttt fatgcagcccg cgtggatacc ccC ca tggca ctcgagtatc ggcgagatCcg agagaaaatg catcgcgcga cagtggcaat ttcaacatcg cgggagtggg gagattggct tgggccacgg ggcc tggcaa cgctcgtgcg ctgcggcgcg ggcgatgtcc gcacccgcgt gaagaggaac atcggccgga cgctaccagg aggcgttgcc gccgcattta atgaacggtg tcgccttccg Caatcctgcg ggcaaacgc C ggttgggtcc ccgtgccgC C agtccctggc ccgatggagg ttgaacgtct gtccagttgt ggaatgcgac tggagt tgac aactcaacgg gcgaaatggt tCCctgaaggc ggctgatgct ggacgcagcc tgtacacctc tcaaCCCtct ccc Ccacgac gcgcgcgcgt aactcgcgcg tcgca Cccgg ctcgcgatct ctCaccgaaac cgattggccg cacccatcgg tgaatcgtcc ~ccggctgta :cggccggat aggccgcggt 3cgcggccgg cggcagccgc =cgtctggga :cggctggag z :gcaggattc c Itggtacggg a cttttcgca S Cgtccgcac g Cgcggttgt t gctggccga g cagtctccc g gttgacccg 
g cgtggtcga t Cgaaatcct g gatcctgca C gaccgcggc c cgagattcc g :gaagctcc a :gtcgatcc c4 ctggagcag t tccgccggc Ci aggagaact C tctcggcgcg cggac Cgcgg catgttgggt gatgccgctg CCtCtCcgcc ctccCCgcac cgatctaccg cgccgcggaa gcccgacgcc gagcgccaac Cggcat cca C cggcgcggcc ggaagagacc cgacaccaa C cggttcgacc ttcgtcgatg gttcatgttc cgtggtggcg c agca aagcg ctggcccggc tgccgaccgg gacaatttgg ccccatcgca Cgttgcgggc ggaactcagc tcgcacggga cgaccaccag :cgcagtcac taagtatctg :gacacattc :accacatat ~agcagtgCC z :gtcgatcgc e Lctgctgctc t raaggcgctg g rttccggcag g .ctgaactcc g rgcggtgcgt g 'ctgctggag a aatgagttC c cccgcgttc t ccgcgtcgc g atcggatcg c gaaatacgc a aacgcacgg C gagacactc g gccgatcta Ct catgggcca c~ tcccgtcca cc C caggcga ec atcgagcacc gttctattcg cagtccggcg cgacagagcc tCtctgcaat Ctcgccgtgc c Cgctgacaa ttcccgtccc atcgcgt Cga cggat cgcgc gtcacgcggt Cacgttccgc aggccggtcg ccgaacccgc ggccggccga cggcatgagc gacatttccg aaccaggaga acaatgatgc gaccgccgcc ctcctgcaac Cccgccatcc aacactcagc gaactgtaca gcggacaagt gatctcgccc gtgaagatac =cggcggtcc 3cggcctaca -acgaccgag ;aacagaatg Lccggagagc LtcctggccC :Cccgcgtcg c ~actacatcg c rcggccgacg a rttatccagt a 'Cggtcaaac c CgCCCtacg c ggcaacgcg t tCCCgctc t gccggtcgc a gggaagcgg a gagtactga c Cgaccgccg a gggagttgc g qgcgtatgg a acggacgct t gacgccgcc t~ cggaCCtcaa 1140 aggagtaccc 1200 Cgctccagca 1260 gtcgcatggc 1320 ggtttgacct 1380 acaacacgga 1440 Cgctggaagg 1500 cccgggaacg 1560 aatgcgtgca 1620 gcttcggtga 1680 actatctccg 1740 cgctcgaaac 1800 Cggaaccgga 1860 ttgtgctgaa 1920 Ccgcgactcc 1980 aaggcgtgca 2040 cgggcatcag 2100 cgctcgagat 2160 cggccgtcga 2220 aggcaactcc 2280 Cgacggcgct 2340 gaaccgcggc 2400 aacgggtgac 2460 CctatgtgcC 2520 Ccggcggcgc 2580 Ccgtcgccaa 2640 gccgccaacg 2700 gcgggttccg 2760 3acatgctgC 2820 :Cgtccccct 2880 :cgagtccga 2940 :gccgaacgc 3000 :gattccagc 3060 ~gcggccgcg 3120 :Cccccactg 3180 :cgctcacgc 3240 Lcgcgtgcga 3300 .Cttccccgg 3360 'gggcggcat 3420 'Ctctttaga 3480 gcgttcgcC 3540 ccgcgaaca 3600 Caacgagct 3660 ggagccgga 3720 ggacgctca 3780 aagcgccat 
3840 ggaccggct 3900 cgaagacct 3960 cgacgcgac 4020 ggccggccc 4080 gccgtaccgc gtggcaatcg cCtctgccgC gcggcggccg gtatacgaac tcatzctgaag aatgccgctg ccggcgagcc ccacat ttgg ctctggagga cgtggaagtg tat tcaggac ttcccaaatg gatccgctgc gagaagctgc acgcccaacg cacgccgaag ggcgagg tgc cattcgctgc ccctttcgaa ggcgtagacc tcagatgttc gagccgtcta ccgactacat gaaaaatcga cattcacgcc tcggcatgga tggtcacgca ccgtgt ttaa caggatgggc aaatcgagcg tacgcgcacg gatcccgacc ccgtaacgcc tccggaaact tggcatcggc gatgatcgcc cgcccccacg aaggcgagcc tatgatgagc gttgtgccgc gcgtgggtcg ctgcccgatc ccggtggaac gtccatgatc cgcgtgcgcg gttcgaggct gccgatttgc gccgcccaag agttgcgtac tgctccacga Ccgagcccag agg tact cgc acttc t tcga acatgctcca tcgccgtcgc tgatcgctgt act ag 4140 4200 4260 4320 4380 4440 4500 4560 4615 <210> 116 <211> 8301 <212> DNA <213> bacterium <400> 116 atgcagaat cgtctgctgi gcacccttgl cccgcctaci ctgcggagt( gtgga tggat caggctgacc gacctgcgaz attctcctgc cgcgacctgE ctgccgattc cagcagtact cgcccccggC gat ttgaccg gcaatcgcgg ggggt tccag aacatgatcg cgcacccgcg gttgaggaac gcgctgctgc tacatgcaca ggactgatgg ctgctcgatg tcaaccgctg acacggcgcg cgcactccgc aatgcgcggg atcatcgctc aagtccggca attctcgatg gcgatgatgg agc aagc cc g ggcgtggaga t cgtcgccaa aggaaaacal Scgctggccc, a acattcccal :tggaggccgi 3 aggcgcgcci Igaatcgcagi iaacccccgct ,tgacgttgac Lcgcgatcgte ,agtatggcgz ggaagaaaca ccgcgcagca atggactcca cgtttcaggt tcgcgggccg tcctgcgcgg acaccgcttt tgcatcctcc ccgatgcgcc acggcggatc cgtccgccga cgt accgaac cgctgttgtc a tgccggt cc acgccgtcgc ccaaccgcct tggcga tgga gcgcgtacct aagtgcaacc cgatgatggc acgatctcgc tccgccactc a taccatagai g ccccgaacai a gcagcggcti t agcgctgcat :ggtgcagcgc 3L gagcctcctc i agcgcggcaz tctgcggacc Sccacatcatc Lcgaagcgttc Lctgggccgtc LgctgtcgggC Lgacctggcgg3 cgcgtttgcc gctgc tgcat t aca ca acga cgatctgcgc gagcgccc tc gcgggacc tg ggccatcacc caaactcgac atacaacacc cctgctggcg ccccgcggtc gaacgggtgt cgttgtcttc ggctcatcgt gcgctcgctg gcctctcgat gcacgcga tc ggtcgccgtc ctacatcata Cctctcgctcg t cgcatcccgc :tggtttctcc :atccgaggtc 3 cacgagagcc I 
gcgcgagtga iatggccttgc -aagctgatct ,gcggatgcgt gtgcaggggc catcagcaga accttgcctt ggcgccgtgg t tgcgtgaag cgctataccg gaaacggaag gacgatccgt tctcatcagg agccggtcgc gtcatgcctg ctcggcgtga agcgtggtga cgaagccggat gcgcatgaac t gaagaccatc a ctgagcgcat c gagatggtga t cccgcgcacc c ctcacgcagg a gaaccagaag c tatacctccq g ccc9gcca ggcgtgaaaa atcagctcga cgctggatat tgcgcagc tg cac tggaact gtgatgccca gcctcgatga ggtcggtcga ggccatcgcc cgtcgctgaa tcctcgacct agaccacagc gagcgacggt cgcaggaaga gtctcgtcgg cgt ttcgcag actttccttt Ctgtatttca.
4gctcaccat :cctcgagcc itgcggcaac ~ggatcccga :gctcgagca :ggtcgaagct Lgttgaccta c :cggcgcggg C tgcgctgct t :caaggatcg t ggc9gtggC t tgcgaatct c atcgacggg g ctccatgca g attgctcgac ccgggatgcc 120 cccggattct 180 tcgcgtcctc 240 cattggcggt 300 tccggttgtt 360 gatcccgttc 420 caagcagcag 480 gacgttcgtc 540 gctcatggaa 600 ccaaaccgcg 660 tcctaccgat 720 cctcggccgt 780 gttcatgacg 840 catccttatc 900 ttgtttcgcc 960 tcttctcgcc 1020 cgaacgcctg 1080 ggtctccttc 1140 ctcgcgcgag 1200 itccggcgat 1260 :atcgcctcc 1320 'gtccgcatt 1380 ;cacaatgcg 1440 :caggcggaa 1500 :gccgagctg 1560 :ccgggaaag 1620 .gcgattctg 1680 ctcgcccgg 1740 gagatgatg 1800 gtcagcggc 1860 cgaccgaag 1920 cgcgagccg 1980 gtcgctagtc aatctgctgc ggtctgacag ccgccgatgg gctggtcgcc gtcaccaccg tgtcattcga ctggagatct gtggttgacg gcgaccccga cgtgttttct ggcgtagagc aagacacaaa acgcagttat ctgtacatcg gagaaattcc taccgttccg gggtttcgca caggcgatta gacgtgcgcg atgatcccct gacgcgaacg ggcgatgtgg gactatcggc ggac tgc tcg acgatcgagt gc tgcggtcg cgcaatgtgg t cgccggaag gtgccggcca agtccgcgcg gaagcgatgg gcgggatgcg ttctcacgcc cgtgtctctt acctcgctgg gccttggccg ggcatgatcc gtgccgggca ggcgacacga atggggttta atggcgggcg ctcggcgatc gatgtgtatat gccgggctgat ttccagcgtc c ctgctcgact g ggcggcacca a gcgcccgaac g gaactggtgc g gtcgcattca c cggtcgagcg a acgcgcccca a cgccagt tct a ctgaatggac g gccggcgcgc a ggctgccgtt gcgagcggct gcggttggcg gcggcggtga tgtggaatct gactgggtgc acatcc tgga gaggagcggg gcgagtggcg acggcgcagt tcgaaccggc cggtcgtgaa atctgcagag cggcgtttgt cgct tcccgg tggagacgat agaacttctt aggagcgcct Cgcttgccgg cagaagatcg aggagttctg atctgctggc agggtctgct aagcggagat agaacgcggg gcgtcaatac cctccgcgta acaagctgaa tgtcggtggt ggggagttgc tgtcgcccga c acggcgcggg t :ctacgccgt c :cgctccagg t :gaagccgga g :ggtggagat c :cggatccgt c ~caagacggt g :gaatccgcg a rgcccgccgg a icgctcacgt g ratccgcaca t ctcgtatcg c ggccaatgc a cgaggctcg c gattgcgtt t cgagtcgca gi gctcgatct g~ gcccgcgct gi gatcaccggc caccaccctg gcaattgctg agctctgccg ttacggaccg ctccgatagc t tcgcgcatg ac tggcgcgg tgatcgagga 
cgagtgcctg cgagat tgag ggacgatcgg cgat ttgcgg cagcctgtcc tttgcccaca tgcgtccatc tgatgtcggc ggggt tgacg cctggcagaa gatcgcagt t gcgcaatc tg gggcggcatc ggacggcatc .atggacccg atatgcggcg ctacctgctg ccagctgctg :Ctccgcggg latggcatgc ~atcaatgtt :gggcgctgc :gtggtcgtc :attcgcgga ~gtggacggt c rtccatcggc t gccgccatc g aagaccaac a 'cttgccgtc c attgatttc g aagaccccg a attctggag c gtgctttgc c ggccatactg g gggcgcgtg c caacgactg a cttttcacc g cggtgttt c ccggcgctg t :ttgctttg c gcccgcgtca ctggataagt ga tt cgggc t ccggaactgg acggagacca atcgtgccga gagccggt tc ggctatcatc cgcatttact ggacgagtcg gccgcgatcg ctgatcgcct tcgtggctgg tccct tccgc acgccggttg tggcgtgaag gggcactcgc Ctctccgtcg aaatccgaac atcgggatgg cgcgacggtg agcccggagg gagtttttcg cagcatcgcg cgaagctata aacaacctcg acggccaacg cccagcctga 3agagcttgc :cgcagtccgI :gcgccttcg ~tcaagcgct ;cggctatta :agacgcgatt ~acatcgagg c rctgccaact t .tcggtcatc t atcgcggcc a 'caaacactc c gacgagcgg c aagcgccgc c tgtccgcca a acaaccagc c acttcccgc a cggaggcac g ggcaaggtg c gcgccgcca t tggccgatg a agtgggcct t tcgtcgccac cgggcgctac ggaagc cggg cgcgccgcat ccatatggtc tcggccatcc cccccggagt gcaaccccga ctaccggcga atcgccagat agacgcacat atctcgttcc caacgcgcct tgacgcccaa ctgctcgcga ttctgcgcgt taatgctcac tcgatctgtt CCgccgctgc ccggccggt t tggattccat tcttccagga atgccgcgt t tgtttctcga agggttcgat ccaccgcgga acaaggattt cggttcagac agcgcggcgc :ggggtacct itgagtccgc :gagccgcgc itaatgatggc :gattcggcg c :ccacggaac z ~tccgaaaaa c ~agacgtcgc g igattcctcc c :gtttcgtgt g agtcagttc g ggtgacgcc g tacagacgc g cggtttgtc g ccgtatctg c acgggt tog o gcaatacgo g ggatgaatg c CgCgttgot c ggcgcagtt g tattgcogga 2040 ccgcgagatc 2100 ggtcatgcag 2160 taaaggcttc 2220 tctcgatagt 2280 ggccgtgcac 2340 oatcgacaac 2400 tccgggagag 2460 gotcacgcgt 2520 tctggctcgo 2580 caagctgcgc 2640 tgccgtgaag 2700 ggcaacgggc 2760 toocgattac 2820 cggoaaaatc 2880 gccgatgcgc 2940 ggagoacgtc 3000 aogggtgcgc 3060 ccggcatacg 3120 ggaaoctgcg 3180 cccgggggcg 3240 cgccaggott 3300 occgagctac 3360 cttoggctao 3420 gtgcgcgtgg 3480 
cggcgtttto 3540 gccgttcgat 3600 cctggcoacg 3660 ggcgtgctcc 3720 ctcggaoatt 3780 4caooagccg 3840 tcaaggoacg 3900 :otggccgat 3960 :gccgagcgc 4020 ~actcaagag 4080 tgccacgcog 4140 :ggaagcggo 4200 ~gccggtgtg 4260 :agcctgaat 4320 ragtacgcgg 4380 rttogggato 4440 'gocgcagct 4500 gcoctogaa 4560 ttcggcgat 4620 attgtggoc 4680 atcgcccag 4740 ggoatgggc 4800 gcagctctg 4860 gaogogaoo 4920 tggaagtcc 4980 tggggtgtc at tgccggc a Lgcagaac gcagcgatc tcgggtgcc acgcagatg gacc tgcaa aacctcacg cacgctcga gacgtgttt cccgacgac attctcagc gaccgtccg cattggatc, tgccggctg cctctactc gccatggcg4 gtgaacttc( ct ttcacagi tggaact tac c tggt cga tc ctcgagatac gaagcgctgc gccggcc tgc ccgatattcc gtatgggggc ttgctggact actcgcgcat cccgaaccgc gacagtggcg gttaccgtgc atcgtcaacc gtgaagtccc agtgcggtgc gagcacccgg gcggccggcg agcaaccggc ctggtttcgg gcgaaatgga aacgaggagc agccgggaag cgcggcgt La acgcggatcg acccgcgatc cctgccgggc cggcgcggac atggccgcgC ctacacattc ggctcgatta cgct tgaacg ;a cgcccgacc -g ccgtcagcc :c tgccggaac :a cctcgcgcc [c cgcaggatz Ic tggccgttc C gccgggcgc ,g gcaaactgS a accctgtcc tcgagatcg g ccaagcagt a gtgtggcga t attcgcgaa g agcggagtt t cgctaccgg t cagaccacc tcgaggcgt 3 cgcaccctc a gcgatgacc 7atgctgccg! 
3 aacgccggco aactggggcc I ccgcgattgi ,aattgcttac jctccgctcgc Iccgcgcaaat cggagggctc gggcgcagcc tccgccagcc gcgcggcccc cgccagccgc tgtacagtct tgaagtccgg acaatcccat at ctgtgggg cggccgcgca gctacgtgcc gcgcgact La tggtggagca agcagcgcg t aagaggtcgc tccatgccgc caagcgtcat tgccgctcga aggcaggcta tgggtttgcc gcaccagcca t cgaggccg t ccgcgccagtt aagccacacc :t ggtgatggc :t gccggatgc Ig tgcgatggc It ctccattgc it tgagagcgc Ic gcgcgcctt ;c ggcgatcgc rc cggcgaggg :g tttcgccga fg tcctaagcc g gctgccgtc c gctatatca g gcgtgtcgc c cagaccgga t ggcagacgt g atattacgg c ggcggaggtc g gcatgcctc< g caatattgc( tgcggtggat gagctaccgc 3L ttccgcaacc 3 cgccgcggcc 3 tatcgatcgc :tctccggcat I cgttctcggs Ictcggaacg9 :ttcgcgaacq Icgctctggca rcgagtacatg cgatgattat tccgcggcta gcaggccgcg cgggctcatc gatgcgtgat gcgactgacc tttgatcacc cggcgccact gc tgcaacag ggatctcatt gggtgtgctg ggcgccgaag Cttcttcgtg CgCCgcggcc ggcgaccagc gtcgatggcc cctgcatgaa g-ctgcgtccc ccggcagcgc ja cacagcgtcg '9 ctcggct tag .t gcggtcagcg .9 gccatcaacg .9 Ctggcaactc t cacagctcga ,g tggcgcaatc a cagctggcga ggtatccaaa g gttctactcg g Ctgcgtaaag g ggtgggttcg c Ctgccggcct a cctgtagcgg t atcttcgagt t tcggtggtgg 4 tttggcgccg a gcggagcgcg g ttccgcatac -gcccacgctg :ggagacacgt cgcatacagc Iccgctcaatc Iagtcccgcgc -gtttgttttt *tcgccggacg fgaacttcagg faagcccgaat ctacagcctg gatgcgctca c tccctagtcg gagctcggct tggctggtaa c ctctggggct t gatctggatc c tcgacggcg a cgccgaccda g ggcgggctcg 9 cgcgtcgtgc t attggtgcga c cgccgcatcc a gacgacggcg t.
gcggaaggcg c ctcttttcat cc aacgccgttc tc attaactggg gc ggcgtggcga g~ tgccccattc ac gcc-gcgct-gc ct gaagccatcc tc gcgaatacgc t tgccgaacg ccggcgagca gacccgctga t acg tgcgga gcatggatcc Cttcgatcgg atCCgCtgta' cgctcaagga gcatgggcca gccgcgatga acatcgattg atcctttcga t tgcgagtgg cgaaactatc CCCcggccgt gccggcacac acacggctgt tcagcttgtc gtgtcgctcc actattcgct gcattcattt cccgttgtga ttgcggatgg acggcagcct 5: 4ctttaccgg c ;cgtgagttt c :gtatgaggt c ;ggcatggct g ~agctcaggg c ~tgagcgtga c ~ccgcagcac t Cggccgggt g 'cgacgatgc g agatcagtc g cgcgcgagc 9 agccctggg a ggccgggcg C ggcagagac g~ caccgaaac gi actgctgaa t( tgtacacct c ggcatcctc gc cgatgcgct gc cgctggtc gc -Ctctccgt gc ~attgccgc gc :tcacctca ac :attgcgca ca ggcggcgtgt 5040 cggccggctc 5100 gcgctgtgcc 5160 ggtcgtgatt 5220 gggcatcaaa 5280 gattctggcg 5340 cttggtttcg 5400 ctggcgagat 5460 cgaaggctgc 5520 aaagtgcctg 5580 gtgggagacg 5640 gcaggagttc 5700 gagacgccgc 5760 tctcgtcggg 5820 gacggcttcg 5880 gtacttcctg 5940 gctggaaaac 6000 tCagctcgtg 6060 cgacggctcg 6120 cgtgccccga 6180 gctgcgccac 6240 cggtgaacag 6300 attggcggaa 6360 7gccgaacat 6420 ~gagggcgcc 6480 :gaggcgcag 6540 :cggcgcgtc 6600 :gagtggcgg 6660 !atcctggcc 6720 gagatgtgc 6780 tggcgcggg 6840 ctggccctg 6900 caggcgacc 6960 atcgcgcgc 7020 catgcttcg 7080 gcgtggaga 7140 gcagtccgt 7200 cttacagtc 7260 cggcctcca 7320 gtcgacgtc 7380 tcaccgctg 7440 caggactgg 7500 :atcatcac 7560 :tcttaggt 7620 ~cgcatcac 7680 ~gagccgga 7740 iacgagggt 7800 :Laccggcg 7860 ~tgcgcacc 7920 .tcagggag 7980 tcactggcgc ggtgaactgg ctggggcagc agt tacgtgc gagaacctcg cagtggt tgg gc tt~tg tcgg gactcgattc ctttgcccgc t cca tgcggt cccgcgaaga agacgcaata catcgcgact gctaatggcc gagtctgctg atttccaccc ac tggaagcg tccacaccgc at.agaacttc ttcgactatc gaagcatcac ctgctcgatt tcgatccaca gcaactcgct cgtcgctcga cggtggaagc cgcggctggC gcagcctttg ctcccaatca tgcgjatcgtc gccggagt tt gcaggtcgac 8040 8100 8160 8220 8280 8301 <210> 117 <211> 5292 <212> DNA <213> bacterium <400> 117 atgagcgggt cagaaacgca ggctgccgct cgc agcgcgg gatcccggcg cgttttgacg 
cagcggctgc cggctggCgg ctgcaaatgc agcgttgccg gacacggcat ggtgaaagcg atctacttct gcagcagacg gacgcgc tgc gacggacgca caggc tgtgg ac cggcacgc gggcgcacca gaggcggcag attccgcccc ctcgaaatc ggcatcaact caacaggccg gaggcgctgc acctgctaca acgaacgcgc gataccgcca ggacaagga t cgtgccgcca caagagt tgc ggggtcgcct cacagcatgg gctcgggtga gctgtcgtgg gtctcgattg cagacgatct t cgaccagc t tccccggcgc t acgtgaagt cgacgggccg cccggttctt ttctggaagt ggagccggac gcggcgggga ccgggcgtct gctcgtcatc acctcgcgct gcaagctgaa gc tacgtccg gcgatcgcga gcaatggact gagacgcgcg cgctgggcga acggcaacaa cgggcgtggc atctgaatct cggcacggct cgttcggctt cgtccagtac gtgatctggc cggcgtgcgc aggacttgat caggttttgt cgcagtggcc tcgaagagtg aggggccgct tggccggact gcgaagt cgc tttgcctgcg aattagcgct ccgccagcaa cagcaagctt ggagagcgcg atccaatctc tccacccgac aatgtacacg cggcatcgc t cacctgggag CggcgtCttc tgcgcatatc ctcgtacatc gctggtcgcg ggcgggcggc ggcgatggca cggtgagggc tccggtgatg gacggcgccg cttgcagacg tcccatcgaa gctgaagctc cgcactgatc gaccacgccc caccccctgg gagcggtacg gcccgcaccg gcgcgcatac tcgccgcact ggccgggctg gccgcgcggc cggcatgggc cggccgcgcc cgaccgcatc gtggcgccat ggcagcgcac cagccggatg ggacgaggcc cagcccgcgc cgccgcgCCg cgcagcgagc gatgcctatt cgc tgggaca cggtacggcg ccgcgcgagg gcgat cgaga atggggatct gacgcgtaca ctcgggctgc gtgcaccttg gtcaatctga gccgacggtc tgcggtgtgg gcggtgattc aacgggcccg ctggatgtga gccggagccc gggtcggtga aaggtggcgc agcccgcaca ccggt tgcac aatgcgcacg tacctgcttc cgcgacgtgg tcatacgaac gacagttttc cagaagcgaa cgcgacc tga atgcagcctt gacgtgat tc tggggaatcg attgcaggtg ctcgccggag atcgctgcca agcaccgtcc tgattgcgct ccatcgccct ggtcgttgc t tcgatgccta gcttcatcga cgatcagcct acgccgggct tttccaacga ccggcacggg agggcccgaa.
cctgtcagag ttctctcgcc gctgtaaggc ttgtgctgaa gcggcacggc cacaggaagc gctatgtcga t tgcggccgc agaccaact t tgatgc tgca tcgattggaa ccggcgggcg tgctcatcga cgctatcggc tgaacgacaa accgcgcggc tggcgggcaa aagtcgtttt tggcttctga acgtcgactg aac cggc cct agccggacgc cgctgactct tacgcggcca tcgccgggcg tgtcgggcga cgacaaggtg ca tcggcgcg gcgcgagggc ctacgatccg tcaggt tgac ggatccacag tccacccgac ttattacaac caa tacggcc catggcgatc cctgcgctca ggatcggacg attcgatgcc gcgactctcc aatcaaccag cgtgatccgc ggcgcacgga gctgggagcg cggccacctc gaacgaagcc CacgcttCCC gcgcgtcgCC gcaggcgccg gcgcagtccg ccccgccgac attcaccggg cccgaaccgc cgttttgccg accggtgttc gtcgctgacg gttcgcagtc cgtgatcggc cgatgaagcc gggagaaatg ctcggatcgg cagcgcagct 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 81 ctgggcgaac tgctgcggga actggaggcg aaagacgtct tctgccgtcg cgtgaaagtg 2220 gacat tgcc ggagcgct t attagcggt ctgtcgacg ccccacccg gctgcctcg ctgctgact cgccggctg' caggt cgag ttcgagtcci actccggga.
gggcctcac( gc tgccagg; atctacagc( tccacggcat gatgccttct atcggcccc gaaatgcagz ttgaaagatc ggcccggtct accggcaccc ggccagctcc cgcggcga tc attgccgaga ccgccgtgcc t tgcgcagca ggatctgccg ggaattgggc gatgcgcaca gatcaagttg ccatccgcac ggcggactcg ctgctgggac catgcggacg atgccgccgt aatctcacga ctgcacgaac gcgacagtgg ctggctcatc acacaggt tg tcggttattc ccgcacgtcg gcatcgatgt gacatgcgca acgctgctga agagcgacgc c tggaagccg tccgccctgg aatgctgaac t cgcacagcc c agccgcggc g aagagctgg g ccgtcgccg t tgtCggtcc t tgcggcgcg a acggagtca, r- ccaactatc t ctcaggqttl a ctgtggagg( g cgtggcaccl atgtcgaaci aggttcaact gcgaggatt( ccatcgatct attcgcgact I. tctggcgcgc i cgatcgactc I tgcatgtgcc ggggatacgc Igcagcgtggt Igcgaatcgga fccaatgccgg Lctctgcaaaa gtcaaattgt tcgtgcaagc cggttctcaa ggacgattgc gcaacgac tg cgatccggca ccgatgtggc gactgcaggt gcagcgagcg tggcggaccg tgcggggcgt ccgaacgctt tcaccgccgg gatctcccgg tgcgccgcgc gtttggccgc aaccgcaaca ctgtcatgaa ccc tgc tggc gcgagctcct tgcacgaagc tgggtgatct ggctgcgcgt cgcagcatct cttcgaccgt a tctgatggac ggccgccctt t ttctgcgtac c agccgcggcg a gccgatccag a tgaagacgga c tccggactgg =ctggcagcgt t gcctggccgg :gaaagatttc :ggcaatggcg i cgtgtcattg :ggtactccgt :ctggaagctg :ggatgcgatt :gtgggatcgc Icaacggtgag Ictgtctgcag ggtaggtctg ggtcttgcgg ggcggaattg gattcccacc caatgctggc gcgcggccaa gtactgtccc gggctggcct ctccgacaaa ctatgagcat cgggcatctc agg cac ggtCa gttccgtgcc ggcgggatgg tcctcggcca gcagcagcta gttccatctg cgaagccgcc ccggccgctg ccagggcaact ccagggtctt c acaggcgaac c gggattgcgc cttcgatatc g cggcatcgca c ggcagttcca g cggacacgtg c cggattcgat t caagctttct g cgccgacaag c tgctgccgtt q tccgtgtgcg ggcatgtact tgggctcgta ggtggtcatg gaaacgctcg aacctcgcac tctcgtattt gagcgttatt cggatcccgt gcggatcacc ctcgccgctg acgggcgcgc catgaagagg cacagcgaag cgcgcccgct ggctatcact gtgctttgtc C tgcccgcgg gaccgattct ccggattcca gtggggctgc tggacgg tgc ggacctCtggc acctgccgca tcgccgcgca gagccgccgc gatattgata1 cccgagctgc gcgacgctga tgggcgccgc gacgcaacct :tcgccgccg caactggaag tcggatgcgct 3caggcacgc t itggctccga a ;atcattttg t 
:acgccgccg g :ccgccgtca g :gcggagacc g ~cgctctaca a rcgcagtggc t :ccgcggccg c rccgggcggc a 'tgcgcttcg a cgttgatgg c 'ccaccctga t tcggcctgc c ctacccttg c cggcgttgcc ccaccgtcac atcttcgcca atgtgtttct gagatcgggc tgcgccggac a tcccaacgg ggatcgatat cgccgctgcc ggctgcacga cgcgccaagg tgacgctgcc gcggcggagc gcatgctgca gcacggcgga tcggtcccac gcgtggacat ccctcgtcca cgctcgctga cggtggatgt agtcgagagt aatggaccgc tcgtcatcgg cggccgatac tcgacgacc C gc ctgtggctC ttcgacaagc gctgcacgct tgctgtcgaa 3cctcagtct itctgatcac :cggagcgcg ;tgtcaacgt :cgcgatcat :ggccgacgg iagtagccgg tctcttctc jcaattcatt rcatcgcgtg S rtctggccgc S agcattgac ccgttacta t ggacaccaa a gcgccgcgc 9 tccagcgaa a cctcgagtt t cCtggcgtta c gctggaaag c taccgttgg c gggcgtggtg 2280 CggCgcagcg 2340 acccgtgatg 2400 ggaactgagt 2460 agcgatLgcc 2520 gctgggagcg 2580 cggccaaact 2640 ccgtccgccg 2700 ggagatgcag 2760 tgtgatcgtg 2820 tctcggcgcc 2880 ggaaaacgat 2940 ttccttccgc 3000 ggcgggcgat 3060 gctcacagcc 3120 cttccgaacc 3180 tccgctgacg 3240 tcacgacgat 3300 agtgcccact 3360 ccgtctcgtc 3420 cgcccatagc 3480 gtcggttcgc 3540 cgagccggcg 3600 gtgctcgggt 3660 gctttccgta 3720 gctgacgcgc 3780 ctggctgcac 3840 cgtcgatctc 3900 tatcgcagag 3960 tcacaagatc 4020 gggcgggctc 4080 ccatctcgtt 4140 caagatcatc 4200 cgatcgcgac 4260 catgctgctc 4320 cgcgtggaaC 4380 ttccgccagc 4440 :ctcgacgcg 4500 ;ggaccgtgg 4560 ;cgcggcatc 4620 ;cagattcgg 4680 :ccgtcggcc 4740 tccggcggcc 4800 ~cggctggaa 4860 LcCcgacggc 4920 cgcaaccgt 4980 ccgacatCtc 5040 'atggccggc 5100 accgccgcg 5160 ggcgaggacc ggagtcccgc cgctgcagac gggttggggg acaaagaaat cgaagctttg gcctccgagt ga gatctcgacg ccgtcgcaaa ccagatcgcc 5220 ttgaaacaga agttcgctca tttttcagga 5280 5292 <210> 118 <z211> 6462 <212> DNA (213> bacterium <400> 118 gtgagttcg, c tggagcac atcgtgggt( t tgcgcagt( gtccggcgci ctcgattcci agcattgatc gggcagacgi caaagcagcc gccaccggcz ggacccagcz tgccagagcc ttctcgccgs tgccgcgcct gtgctgaagc ggctccgcgs cagcaggt cg tacatcgaag gcggaaaccg tcgaacatcg gcattgagtc cggttggagg agacgaaggt ct tgaagaag cccgCcgccg gaca ttgcag 
gcagggactg ctgtccgcgc acagcgtcgc catcaccgat caggggatca tcgcaatgga t tggaacgct gcgaagctcg gccgcattgt gaggtcgccg tgcagccgca ctgccgctcg gtgt cgaacg atatccgagc a tgcagcgac :tgggctgcc, 3 gagtcgatg i tcctgaaati cgcagcagc( i tggaagggcl j actatttctc icggcgcatac ktcgcgCtcq :tgcgcagcgc agtttatgtz tcgacgcggc gcctgtccgE tcaatcagge tcatccggtc cccatggcac tcggcctccc gccacctgga acgagacgat gaacgtcgct ttgcgggcgt cggcgccgac c tgcgcgggc gcactcccga caggcac tgc atggtgcgga acgcaccgag gccggctcgc cgatcccttc tcggaatggg gcgaggccgc accgcgtcga ggcgttcctgr ccgctcatgt gccggctgtt cggaatgtga gacccaactc g attccccaa, g tctcgatgc.
g gtttccgggi C tattcgtga.
c gttgaaccc( t cgacaacgal 3 gctgctgttc ctccggcagc 3 gatgcagacc 3 cgtgatcgcc i cacggcctgc cgagtgtacc Icgccacctcc ,ggcggacggc Ltgcactcgcc Ltggccgctcc gcattggcc ggggactccg gcgacctgtc gggagcggca accgccgagc cgacattgtg cagcgcgttt tggtagaggc ggctgtcccc cactgcagac ggc aac tacg cgaactgcgt cctgcgtgat tgtttccggc ccagcgacgg gcgcagctgg catgcggcct ggtcattcag gggaatcgag cgcgggtgcg gagccggatc ggccgtgctg caccgtcatc c cttacgccgt g gccgaacgcg :ggcgatgggc g gtaccgcctg :gccacgccgg :tttttcggca 3 gaagtggcgt :cgcacgggcg :gccgatggcg -ggccgacttt -tcgtcttcgc Ictggccgtgg Iaagatgggaa :atcgtgttcg rgccggagacc gccgggctca aatgcgggcg ictcggcgatc ggcgatgtgt ggcatagcgg ttacacgtga aaggaagtcc ggttggtccg gaagctgcga ctcgcggagg actcccgaca ggcattgcag cgggtggcgc ctttgctaca agaacggctg aagacggtat atggaccgcgz tatgtggact cctgcgctctt ccggatgccg t ctgacgctgc a agcggcctgg g tcgacttaca c tccggtgaag t tgcagcaggc acgcgcgcga ccgatgagtt gacgatggga tgaagat tca tttcgccacg gggaggcact tcttcgtcgg cgcgcatcga cctatttgct tggcggcggt c cggcggagt ccgcctcgcc gagaaggctg gggtgtgggc ccgctcccaa tcgcggcgca ccatcgagat gcgcggt cgg gattgattaa gacagctgaa ggccgtggcc gcacgaacgc gcgggt tcca gggacactgg ctgcagacac acgcgatgta 3ggcatacgg :ggccgcagt aagaactggc :cgtcttctc ~acccgttat ;gtcgctgaa :cgcgctgca ~catcgggca c Lggatgcggc rcgggatggc :ggaacgact a .cgaagccct g gtacctgacg acccatcgcg 120 ctggcagatg 180 cgaggagtcg.240 agccggattt 300 cgaggccgtc 360 ggaggatgcg 420 gatccacagc 480 tccgtatacc 540 gaacttgcaa 600 tcatctggcg 660 gaatctgcgc 720 Cagcggtcgc 780 cggcgtggtg 840 cgtggtgcgc 900 tgtcgtgtct 960 gcagatcggt 1020 cgaggcgctg 1080 gtccctgaaa 1140 agcggtgctc 1200 cccgaatatc 1260 cgcgggttcg 1320 gcatgtcgtt 1380 ttcccgaccc 1440 gggcactccc 1500 tcccgacatt 1560 tgtgcttccg 1620 ggaattgctg 1680 Ccgccgcacg 1740 ggcgcagctc 1800 jggacaggga 1860 :cgcgaggcg 1920 igaagaactg 1980 ;gtcgccatc 2040 :agcatggga 2100 ;cggatcatt. 
2160 ;atggtggag 2220 Ltcgcccgcg 2280 rgccgaggtc 2340 gtcgcgacgc tggagcggcg aggcgtgtct tgccggccgg catagcccg, ccgcggcccl agcctcgaci atccgccati ctgctgccc( gaacgcggt 1 tggcggaccc cgtcgtttcl ttzgggaaaa( agtctcgctt gctactgcgt tgcgtgctgc acgttgcagc cggcaggcat gcatcgaccc ccggcggc9c cgcgcgctcc gaaacgcgtt tttggtcccc cccgcacgtt ctgctggacS gatgcgtcgg ccgcacgccg gctgtggata tcgccggaaa cacgaaccct cgtcaagcac c tgcaagccg catccggaac ctgcacgaac tacgtcgcgc gatcggccgt gccacatcgc ctgaacttcc attgccggca ggcgtcaccg cgcttcgtca caggccgccg cggctggcgc gcaatccaga aaacgagcgt t tcgtggacg ctttccggcg gagatcggca aacctctcgt cgggagc tgc acgcgagtga cacatcggca cgctcggcgt caagtggacci g cgaccatac4 a gcacgtact( tgccgacac g ccatcggcgc ccatgctcac 4 tgtaccctt< :ggctcgacgc -gcgtcgaagc :ccgtgccttc :atctcgatat j agcatgtgac -tggccatcgc :cgacatgggt Itcccgccgge Fagctgtggcc IagcagatctS ,ccactgcgcc iccggtggaac .Ccgtggtgtg fgagagggaca agcgcatcga ctgcagagcc gcgcgctcac agctcgatga catgcgaccg cgcgcctatg gtatcgctca tgaac tgcac tgctgacgaa gcgtggctcg tccggctcga gccgcccccc tcgacgttct gcccgcgcct actttcgcat ccacgcccgc ccctgcctat ccggcgaacg t cgcacagcg atctgcgctc acatccgcaa atctgctgga agcgcgat ta acacgc tggt tgcaggagat cgaccatcac aaatcgtcat tcgatagcga attgtgcgac -tttttactcc 3 ggctcgcaat 3 cgggcacgat 3 caatgcggcg gtCgCtgggc cggcaattgc ttcccccgcg ctcgacgcaa Igctggcagac *ggctctggcc tttcacacag qgtcgataga cctgcatgct Lttctgcggag itcagatggcg fgagttgtcca ggcgttcctc ctggctgccc *gacgcatgcg actggtcgcc catgcgcggc gccggcggcg cgcatggc tg actccagccg cattctgcat gctggtcacg ggcgcctttc gctgatcgat caacggcgag gcacgaagcg gatcgatgcc gcaagccggt gctcgccctc gggcggcgaaI cggagatgaa cttccgcgt~t cgcgtttctc agtcctgatt tgcgggcgcg gctgggcatcc ttggacgaat c ggcgagcttcc ctatgccggc c cgatttgctc S ggtcgcaaaa t cgaatcggtg g ggcgatgcga g gggaacctac t gaactcctgc acggtgaccg ctgcgatcgc gtctttctcg ctggttccgt gccctctatg gtgcgcctgc cgacacgcga cccggcac tt catcgcgtgc ggaacttccg atgctcattg cccgggatgg tccggggaca acggtgcagg gagcacggcg ggtgaggcga gatgcatgtc gccggcatcg cggctggaag 
cgcatcgagg tggt tgcacg cgagcggcgc cgcgctaccg ccgctcgagg ctcctccaga cgcggcgcgc tggggtttgg ctcgatcccg aatcaaatcg gatatgcaac cccggagtcc gaagtggagaI ggcgttatgc :gctcgggcc 3tcgtggccc gccttgaagc lccgccgattz acgctgcca ~agatcttcg c ;cgcatgttt c ~aagaaggag t ;atctgctgc :gcaagctgg g ~gcatgtccc t ~tcgaatcgg a raggcgtttc g rattgcgcca a tgattaccg, g tgaaag tgga agtcgctcga gcgcgacgc t cggttctgtt agatcagccc ctctgcgccg aggctgggca cccggtatcc tcacgttggg tcttctggga agggcgaagt agacct tcgg tgccgcgcga cgtcgtttcg ttcgtcagac cccgctgccc tcgagtatgg tcgggcgtct tgcagatcat accggatgcg gacctatcgc gtctgcggct aac tgcgctg ggtcatggct gcaaccgcgt aaatcgtgtt ccctggggcg agccggtcga gccggaccgt ccggcggcga CCtttCgcgg ccgccatgtt tcgaccggct ttgaagtctg ccgacgatgc 3tatcgtggc :tgcgccttg :ggccaacat lcgcgctctc :cggcggtgt :tactgccgg ~ggattcgcg c :agacgtcgt c ~cgatcatgg z ~gcttcgccc .gaagcgccc g .aacctggcg g rcaccatggc g tgcgcccat c 'cggacttgg c cttcgccgcg 2400 cgggattcaa 2460 ggagaccacc 2520 ctggcagggc 2580 tcatcccatc 2640 cgaccaggac 2700 cactgtcgca 2760 ctggcagcgt 2820 caatccgctg 2880 gacggaactc 2940 cgtcttgccg 3000 tgaaagtccg 3060 cggcagcatg 3120 gatttccagc 3180 gcctgcggat 3240 cacagtggtg 3300 tccggctttc 3360 gcgtagctcg 3420 cgccgcggcg 3480 ctggctgcat 3S40 cgatctgtcg 3600 gcagcgcctg 3660 ggtcgctcag 3720 cattgtcggc 3780 gacgcagacc 3840 tttgctcgag .3900 cacgccctgg 3960 tggacagatc 4020 gcattacgaa 4080 agaggaactc 4140 cggcgcgcgt 4200 caaggccggc 4260 gcgcttgcgg 4320 cgccgcgggc 4380 gcccggcgcg 4440 catggggaaa 4500 :agtttcggt 4560 :cccgccgaa 4620 4cgagcggcg 4680 Iggattggcg 4740 ~agtccggaa 4800 :tcgatggct 4860 :ctgaattcg 4920 Lcggttcatc 4980 ttcctgaag 5040 rgcattgacc 5100 jcccctggaa 5160 jcaggcgcgg 5220 gcaccccta 5280 :gggctcggt 5340 cttaccgtcg cacgctggat gatcggacgc cgcgcgcct t gtgcaggccg t tgcgcggcg gaagcgcat t ctcacccgcg ggcgcgcccg taccggaagg gggctggctg ctgacgccgc gtcgccgcga tctgcactgt ttgcgcgcgc ctacagcagc ccgctgaagg ctcacactgg cttgccccgc gc tgcggaag gaagaagcag cacccgaggt atgtttctca tgat tcatgc tccgcaacgt actgcccgct 
cccagggaaa
cccaaggcct ccgcgcagga aacacggcct tgcccatcaa tcgagcttt ggctgcaatc agctggcgcg aactcggctt gtctcacgct acctggcgtc gagacagccg ccgtggctgc ccagcaagcc gcgcgatgaa cgcagccgtt gatggccgcg cgatcatttc ctacgcggcc gccggcgctg caatcgcgga cgctattctg tgtccgccag gcatgacgac ggccgagcct cgtgctgcgc cgattccctc ccccgcgacc gcaaatggga cgccatgaaa gctccgagga ggcgcccggc atcgccgtca ctcgagcgcg ctcgacgatg aaaatcgacg gtgctcttct gcgaacgcct agcatcggtt tcgcggctgg gaacagctgc tggcggcagt gcggcgagcg cagacccgca atcgactctc atggccctgg ctgatttggg ctgccgctgg actgcactca
4caaggtcgt ggctggtgct tggacgcaga tgatt ~c t tc cgctgctact gtgcctggaa Cctccgctgc ttcttgacgc ggggtgcgtg Ct ttgCgCgg tgaacagctc tctatcccaa aagccga tgc ggacattgct aaactatcga agtttcgcaa gtcatcccac tcgaagcgca gcgggt tgga ga.
gctgagccgc tgtccggacg catcgatcga
gaaccagacg cctgcacttg aggac tgc tg gc tggc ctac gtcggaggtc catggaaaac ggct tgccac ggcggcgcag gccaaacgcg cgaagaacat tCCCCtgCgC ccgtctcgaa gctggccggt ggccgcggct cgacatgtcg 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6462 <210> 119 <211> 5088 <212> DNA <213> bacterium <400> 119 gtgagggaaa aaattgcgcc catgtcgtcg cggcaaaaca gcgtgtcgtt ggtgtcgacg tccgatcccg gacggtttcg cagcagcggc ggccctctgg ctgtttcagt agcatgttgg gatacggcct cgcgaatgcg atcgctttgt gcagccgacg gatgcgctgg gacggacgga, gaggcggtgg acgggcacgt ccggactctg gcggcgggaa ccgcacctgc atcgccacgg tcgttcggtt tcgcaggctt t tccgggcgg gtgtcaccga atgctccggg atgcggaatt tgctgctgga cgggcagcgc acgccgaccc ccaatcgcat gctcctccgc atgcggcatt cgaaggctcg gttatgtgcg ccgatggcga gcaatggca t ccaacgcgca cgctgggcga cgccttgtct tcgccgggct attttcgccg aatcgtlcgcc t tggagggag cgacctggtt cgcgaagaat ggtgccgcca caaggcgtat cttcggcatc agtgtgctgg gaccggcgt c tgcccgcatc ctcctatctg gctcgtcgcc cgccggcgga catgt tggcg cggcgagggc tgccatccgt cacggcgccg catcgatcca tcctatcgag gctgggttcc gatcaaagcc gctgaatccg gt;ggacgtcg caacgcgcac gtcaaactcg cacgccgaac ccggacgcct gaccgctgga gcgcgatatg tcCCCCCgcg gaagcggcag tttgccggct ggagcttggt ctcgacctgc gtccatctgg gtgaacttga.
cccgacggac tgcggcatcg gcagtcatcc aatctgcagg tcccacgtat atcgaggccc gtaaagacca gtactcgccc aacatctcac gaaggacggc gtcatcctcg cgctattggc ccatcgccat tctggacgct actcggacca ccgccttcct aagctctgaa
aggacgccgg cctgcgccca cgggt tccgg gcggtccgag ct tgccaaag tcctgactcc gctgcaagac tgctgctgaa gcggctcggc cgcagaaggc cgttgatcga tgcagtcggt acatcgggca tgcagcatcg tggacggcag cgcgtctzggc aagaggcgcc gcggaacatg cgtcggcatg 120 gttgaagaac 180 gtactactcc 240 cgaacgcatt 300 catggatccg 360 catctctccc 420 ggacttcgga 480 cgtggcgcat 540 catggcggtc 600 cctgcgccgg 660 cgagggcatg 720 gttcgacgcc 780 gcggctctcc 840 aatcaatcag 900 ggtcctgcaa 960 ggcgcatggc 1020 ctacgacgcg 1080 tctggagiggc 1140 caccattcct 1200 ccggtttcgc 1260 cggcgtcagc 1320 tgcactccct 1380 ttgccgaagc cggtcacacg cccgcagctt ctcactctgt~ cggcgcgcac c tcggcgaai tccgacgttl accgccgca.
tcgt tgcgcc ggcatgggc, gccgattgg( tcgccgt tg( gctcagttct gag tatgt tc accgccagg aatccgagcc gaccgcaccc cgcgccggcc gatccgattt atcccgctgS cgctactggc gcggaccgcc gggcgatatt tcggat tggt gactggcgcS tttcagcgtS atgttgggag acggagacgc tggcacgtgt gtctccgatg caagcgat tg ggcga ttcgt catttcgagc atgaacgcac cgcgatggcg taccggctgc gcggagcgtt cg tccgccgg ttcgtcggtg ctggctgtaa tggcaggagt tggttgctgt gegt ccggcg gcgcggctct gt.cgatcctt gagctgtggg ctgctggatg cgctacgttc gccgacggcg cgcttgatcg gccgatctcg caacagctgg gccgcaggcg gtgctggcgc 2tggccggcc t gcttcacca 4 atgcggcag :ggcggccgg 4 cagagcttt tccgtcccc acgagacgg ggctgtcgc I cggcgtgtg 3 ggcggctggl gcatcgcggi ;tgatctccg Itggaaacgc( :tggacaagtl Itgtcgaacgi -ggcgacagtt Iagtgcaagct gtctgcccgz *ccgtgctgct gtctctatgc acaccttcac cgcgcctcaz *ctctgctccz cggcatttct tcatgatgcc tcacacccgS ggaagctcca tgccggcgca gcggcgtcga aggcgctggg accccggcct gccagcccgg caggt tct ct atctgacgct tgcatgccgg gcgagcgatc gtgccggcgc tcactctccg ggctgatcac tccaggctcc gcggcctgat aaatccgtta gccggctggt tctatctgat agcaaggcgc agcaactcgg cggcgctgc t tgctcgatga cgaagctgca a cttcgcgga(.
g tcaggttggl a ggctgtagcc.
aatcgcttti a taaaacgcac gctcgatgti gtatacccac t cggcgtccgc t ggccggcgcc t. caatgcgctt :gctcgccgcc g cacggctgcz 3 agagctgaac :cgaagcgctt :cagcggagcc :gcgcgaaacc :gtttctggaa itgacggcgcc :ggaaagtctt :cggggaatca jcctgagacgc Lcagcgcgttg Ltgagcacgtg :cgaagcggca Fgcaggcactg cgaggacggc cacggcagcc gccttccgaa tcttggcccc gcgaatgcgt gatcgattct cgcatacgtg gcgctgtcat ggt tgaagag tacgc tgcaa gacaacgttg agacgatgtc ccaggcgctg gcgcggcgtg actgtgggga cgacc tcggt tgccggcgac gcggcacaag caccggcggt gcgccgtctg ggctgcagtc ggcggacccg cggggtagtt gggtgcctgg 3 ttcctgcag' 3 cgcgacgca g gCattggcci, ctCttcaccc 3 Cctgttttti ccgctgaccc.
;ccggcaatgl Sccggactacc tttagcgtgc ccccgcggci aaggtggcac Lgaaatcgcgc gtatcgcatc gcaggtgcgz gtattgccgz gtgcagtttc Latcggcccgc rgtctggctgc ggcggcctgt cccagccgcc gtacccgcga iggcgatgtca atctacgacS caggaagtct gccatcccgc gaagcaaagg agtctgcgcg gtcatttccg gccttcagtt ctgccggtgg tgttttcaag ccggtcggga gcgcgtctgc accggcgcgg tccgcacagt aagtccgacg gccggtttga gaacagaccc catcgcatca ctcgggcagg tgcgacaatg gacaaagcga gaaacgt cga ctcggcgcat gtactggtcg atggttgctg cgcacc cagc acagaacaga aatcttcacc t cgcacccgaa t atagtcaccg 9 cggcgccgcg 3 gccagggcgc :gcgacgcgct 3 ttctcttgtt ttgccctgga 3 tgctgggcca 3 aggacggcct i aagcggtcat jtcgccgcatc ;aagcgcaaga I cgttccattc itcgcgtatca iaaggcacgac jaaagcgcgat atcccacgct actccctatc ttaccgcggg Itcgcgctgcc gagagccggc tcttcgaaaa cggtcattgt tcggtccggt cggatacgcc tgcaggtctt cggcgac tgc gcgatgcgt t gggtggaaga ctgaggatgg tattcggagc tcgaagcggt gcccgagctc tCatcgccga cgtggctgca gcCCtggcaa tgccgcaaga.
agactttggt gtgatgacga cgatcgcgcg ccgacatcgc tcgcattgcg agcggccgcc taggacgaag gccggcatac cttgcgatgt cgctgcgtgg 2 cgtcggctcg t agctcactcg c cgacgaagcg 1440 tgcgttgctg 1500 cttggcgatc 1560 gcgcgaagta 1620 gcagtacgcc 1680 cgatcgttgc 1740 cgagtcggtt 1800 atgggctctg 1860 cagtctcggc 1920 gCggctggtg 1980 cgttcacgcc 2040 gaatgcgccg 2100 tgacctgcat 2160 gccgctgatg 2220 gccgctggcg 2280 actcgacgcc 2340 gcgaaccctg 2400 caccacgctg 2460 taagggacga 2520 cgtgaatccc 2580 gacgtatccg 2640 gcgcggcygc 2700 ttcgctaacc 2760 gcccggcgcc 2820 tcc ctgcgcc 2880 ggtcacggtg 2940 cagccaggat 3000 cggcgccgtt 3060 ctacggcgcg 3120 agtctggcgt 3180 cgcgaacgct 3240 gacttggccc 3300 gcgcttctac 3360 gagcggcccg 3420 gttttccgga 3480 ggatgtgcag 3540 gccggaggac 3600 qctgcgcgtc 3660 cggccgcccg 3720 :gcgactccc 3780 :gagcatccc 3840 :gccgccatg 3900 -aacggacgc 3960 :gccatttca 4020 3gtggcacgc 4080 ;gaggcagtt 4140 ~agttccgag 4200 Lgtcgtgcat 4260 :ttcgagaag 4320 :caccatgcg 4380 ctcgactttt aattactcgg ctaccggcgc gcgcggcaag ggcgatctgc aggcggagcc cggcggcagg gatgcgtcca atggcgctgg ttgctatacg ctcgacgtcc gcggcgttca tcgtactctt cggccaacgc tgagcatcaa gcc tgccggg tgggcgagac cggcgagcga aactgctgca agacgctcga atzctggcgcg accatccqgac ccagcgattc t cacggaaac ctcttccgcc atttctcgac t tggggacca ggtaccgctg tgccgctcag tcccggcttc gatgcgcatc cccgcgccgg cgccatcgga cgtcgagaaa cctcgtcgat ct tgcaccat gcttcgctgc agcc ttgccc tgggcgggcg ctgccgccgg atcgcggtgt atccagcaac cgcaagcagg ccgctcaagg gagctggtgc ttggccggcc gaagtgcggc ctgggagagg tcggttccgc acatgcgccg aaggca tggc aagtgggtgc tccaagtctc tcaccgaagc ccggcggcgt aatacggact gcaagagcct atgtcctccg agctgtccga aacgatga cggacagagc cgcgcaagga cgcgcgcatc gcgcatcttc cgccgaaaaa tgcgccggag gctggcgctc cgattcgctg tcccgcgaca cgaactcgga gcaggagatg 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5088 <210> 120 <211> 4306 <212> DNA <213> bacterium <400> 120 atgagcgatc ctcgacgaac tttcccggcg atccgcgaaa gcgccgggaa gacgccggct c tgc tgctcg gcgggcagcg cctaccgatc gccggacgga t9CtCttcct agcatggcgc tgccgcctgc ggttacggc 
cgtgacggcg agcaacggcc aagaacgccg ccgctgggag gtcgattctc gcaggtatcg catctgcatt gccaccgcat t tcggaatca gcgaagacga gcgagtgtag gagggggaca gaagtgccgg tCggcgcgCC cacgctccgccgcgcagcgt tcactcctct tggagagcgt cggactcgcc ttcctgcggg agatgtacac tcttcggaat aggtggca tg acaccggagt cggcgctcat tctcctatct cactcgtggc tggccggcgg gggccatggc gcggcgaggg atcgtattct t cacggcgcc gcatggcccc atcccatcga cgttgatcgt ccggcctgat tcaacgcgcc gttcgccatg g tggcac caa atgtagaggc aggccaaagg gccgeccgcg tcccggatca atccgcaggc tctcCgCqct ttgtggcctc tcaacaggcg ccacaacgaa ggaagcatt t ccgttgggat gcgtctgggc cacgccgcgc ggaagct ttg gt tcatcggg tg acgC ct a t gctggggttg ggttcatctg cgtgaacctg ggccgatggc a tgcggaatg ggcgctgat t gaacggtccg cgccgatgtc ac tgcgggcg cgggtcggtg caagaccat t caacccgcac gccctccaac ttcgcacgtc gaagacgaat gaatgtggag aagcggcggc actccatgcc tctgcgcgat gtLgttccgcc atccctgccc gtcctggcgc cccatcgcga tggcagctcc gccgatgcgt ggattcctcg gaggtcgccg gagcgtgcgg atcagcaccg accggtaccg cagggaccga gcgtgccgca attctggcgc cgt tgcaaaa ctggtgctga cgcggatcgg gcgcaggaag gattacgtgg atggcagcgg aaaaccaact ctcgccctgc gtactctgga ggCCgCCCCC gtcctcgcag gtagaggcga gctaaggcta ggggggt cgg gaagacggcc ctcgccggcg gccagcctga gagttcaatc t caagcgcac tcgtcggcat tgcacgatgg tttacgatcc atggtgccgt gtctggatcc gt cggc cgcc acgactacag gaaccgcgtt acttccccgt gcttgcagtc cggaaagcac gtttcgctgc agcggctgtc ccgtcaacca ccgtgattcg aagcccacgg tgc tgggcga tcggccacct agcaccgaga atgagctgcc gagttgccgg aagcgaagac agacgagtga gtgctagtgt gccggccgcc gcgaatacct cctatcgcga cgcgcagtca aattgctcga gcgagcgcgt ggcttgccgc 120 catcgatgcc 180 cgatcccaac 240 cgacggcttc 300 gcagcagcgc 360 cgacagtctc 420 ccggctgaaa 480 cagcactgcc 540 cgacacggcg 600 gcgagagtgc 660 gatctacttc 720 ctccgccgac 780 cgatgcgacg 840 cggCggccgc 900 ggcggcgctc 960 aaccgggacg 1020 ggggcgtgcc 1080 ggaggcggcg 1140 gattccgccc 1200 gctaaagata 1260 ggtgagctcg 1320 gaatgtagaa 1380 agaggtcaag 1440 ccccctcctc 1500 cagccgcgag 1560 cctaccgctt 1620 tgggcgcttt 1680 ctacgaacat 1740 ggccttccgg 1800 cgcaatgaaa ctcgccttca 
tccgacgagc gtggaatggc gatcgcgtgc tggggaattc gtcgcaggca ttgctcggac aaggccgtgc cgcagcacgg aggcgcggcg gacccgctcg gtgccgttct tactgggtag ggtgatgagt gaggatagtc gaaccggagc ccggactggc tggcagcgcg ggt cat ccgc gagtccgtgc gtggtgct tc gcgggtggCt ggcgcctcgc gaaggcggtt gtcgcgccca gcggccttcg tggcggCgCg gagtccggtt acgcttggce tgtttCggCa ggcga tg tca gcccaggaac ctggccgagz ggcatcgca ga tgccgagz accgaggatc at cg ttcaa tcgcagcatc ccggtggtgC cccgagttcc ctctgccga ccaatcgcgg tcttttccgg ctgtcttccg ggcttgcgga agcctgcgct gcccggacgg ttctcaccct ttcgcggCCg t cgctgaacg tgttctcggg tcttctgccg agaacgaatt tctccacggt ccaatctgcg tcacgcagtt tgcggacgct ggcgtgagct gcgcgc tcgc agcgcttctg tgctcggtCg tctctctgga ccggtgcCgC gcagcctgga gagtgcaggt ccgattggac cggtgagcct catcgcaggg iacggcgaggc acacgcttca gcgaaggcag Lgacccgccgg Lccctgtgtga -tggagcgcca kgtccaacggc IccgcggtggC Ltaccggatca Iagaccgatcc aattcggagt cgcgaatctg ccgcgCgggC I cttgcacctg 3 agattctcgc cgtcgccacc ccagggcgga atcggcgatc cctgCtCgCC gt tcgccgt t cg tggccgga ggaggacgcg gggagcgatg cggtctcact agaccgtgtg gctgattcag gcgccaggaa tgaaggacag acagccagtc cctggagatc cggcataaac gctcgagttg ttcgtctcCC gt tcgcgacc caaggtcgag tgcgctgccg ttatgtggag agagatccgg catactcgag cgagcacgcg gcccacactt gatgcattac agtggcgcga tcctgccttg cgccggtcct cgatcttagg tgcggaaggc atccgaatgg aacggtgtcq gcgagggctg gccttaccgg gtcggctgCe gggacgcata gctgattacS accggtgtgS catcgatctc ggggagttct ggtttcgccg cagtacccgc gaacgttgcg gacgagtcgg caaatcgcgc cacagcatgg gcccgcatca gctctggtcg actgtttctg gctctcgagc gtggatgtcg ctcggccgcg ttgagcacgg cgt t tctggg agtccgcatc ggactggttc ctcgccgcgC gacacgcgcc t cgacgcggc attgcgctgg tttctcgccg atggcgctgg tttgaacaaa ggacacgcat cgcggcacca ggggatcgca ggcgacacct c tgagcgtgc ctcgatgcct tgcgtgcctg gtgcatgcgc cacgtcatcc ttccacgcta ggtgcatggc ggcacaaaca ggcgtcattc gggggaaccg cagctgacga fgcgggcgttc Iggtctgggac gacactgccg gaacgtcagg atcccggagt gcatggcgta acgccgcctt gagcatggct tggtcgaact gagaagtggc tctgtcgCCg aac tgccgc t tcgcggccag atttgaagga cttcacacag ttattgcaaa gcgaggcg 
tg agtcgttgca ctgtgctgac gccccgtact tctacgtgaa tggatctgcc gaagtttgcc cgccggacac atcaccggct ccgcagccaa tgc tggt tgt tccgcatctc tggctgcggC t cgagggcga tccgcggcat cggatgccgt gtttgcaggt tcgccatcga ggctgacggg tcgaggtcca tggaatggga tggtcattgc cggttgtgat actgcgggag cctgcgaaga aacaagcgtc atgcggagca gtaccatcgc gtgaagtcga gcccgg tcgtccgaaa 1860 tcgcctgtat 1920 ccgcagcttc 1980 gagccagatc 2040 gctgcaatcc 2100 ggcggcccat 2160 cagccggctg 2220 cgatcgggcg 2280 caacggacca 2340 cgacttcgag 2400 ctcgcaggtg 2460 acgttccgcc 2520 cgacgcgtcg 2580 ggcgatggct 2640 gccgtcgatc 2700 gcgcCgCgaC 2760 tgggcagcgt 2820 gacgtatcc 2880 ggcagttggc 2940 acacgtctgg 3000 caacgagctt 3060 ggaagtgttc 3120 tccttccgcg 3180 cagtctggcc 3240 gccggacaag 3300 tgacttctat 3360 cgcggaagtg 3420 tcgcgaagca 3480 gctgggcgcg 3540 acggttgcac 3600 gcggctcgag 3660 aggcctgcgt 3720 gccgcagctg 3780 cgatgccggc 3840 ttcgggtcgc 3900 cctggatgag 3960 cattttgcgc 4020 cgacgccgaa 4080 tctgcagatg 4140 ggccgagcat 4200 ggtgcaggcg 4260 4306 <210> 121 <211> 1537 <212> PRT <2.13> bDacterlum <400> 121 88 Leu Gin Cys Pro Glu Ser Ala Val Asp Leu Gin Gin Pro Leu Val Arg Met Thr Ser Giu Ala Tyr Arg Asp Gly 145 Giu Val Leu I His I Arg S 225 Pro V Gi As' Va.
Arc Le.
Arc Ala Leu 130 Gly Val1 The ~eu :le er al1 y Leu p Leu 1. Ala 0 a Thr 1 Ser Ser *Arg 115 *Val Ala Ile His2 Val C 195 Leu Ile TI Ala S 10 Gin Leu Arg Asn Arg Ile Asp Asp Arg Giu Thr Tyr Al a 100 Gly Asp Pro Pro kla 180 lnI lia yr L er P Ser Val1 Leu Leu Leu Ala Met Leu Arg 55 Val1 Pro Pro Ala Arg Val ,ys 165 Pro Ala 70 Leu Giu Phe His Gin 150 Ser Phe Pro Ser Ser Asp Pro 135 Thr Pro Asn Ala Va.
2! Pro Met 40 Asp LeL Giu Pro Ala Gly Pro Ala 105 Pro Gin 120 Ala Leu Val His Asp Asp Leu Giy Val Sex Ala Gin 90 Tyr Ala Arg Ser Giu 170 Glu *Arg Phe Leu 4S Asp Leu Ser Ala Gin Ala 75 *Gin Ala Leu Asn Ile Ala Leu Arg Arg 125 Thr Thr Ile 140 Ser Val Pro 155 Ala Val Leu Asn Cys Phe
L
Asp Gly Ser Trp Trp 110 Ser Ala Val Ile 190 Gly *Leu *Vai Phe Ile Leu Giu Asp Asp 175 Ser Pro Ser Pro Ile Al a Gin Ser Phe 160 Gly Arg 185 ~er Gly Lys Asp Gin Val Leu Ala Ile Val Val His 4.p .eu 'he Phe Ala 230 Ala Trp 215 Arg Ala 200 Ser Thr Phe Leu Ala Val Leu Gly Arg 250 Val1 Gly 235 Trp Met 220 Pro Gin 205 Val1 Pro Asn Asp Val Glu Glu Ala Leu 255 Leu Pro 240 Leu 245 Ala Gly Thr Giu 260 Gly Giu Arg Leu Trp 265 Asn Tyr Trp Ser Ser Gin Leu 270
I
Ser Gly Gin Leu Pro Val Leu Asn Leu Pro Ser Asp Arg Pro Ser Pro Pro Ala 305 Leu Thr Gin Leu Arg 385 Tyr Val Gly Leu Val 465 Ile Leu I Va.
29( Let HiE Sex Pro Arg 370 Ile Ala Leu Glin kl a 450 3iu ?he ~eu 27~ 1G1r i Thi Ala Gir Gitu 355 Gly Arg Arg Gin Ser 435 Met Thr Asp Giu i Ser 7Ala Thr Giu 340 Phe Gin Ile Gin 420 Giy Pro Asp Ala1 Gly I 500 Ph Ly Lei 32~ G1~ Ala Let Thr Val1 405 Pro Gly Leu la.
:le Arg Leu 310 *1 Met 5 iI le i Asp 1Ser Leu 390 Glu His Arg Arg Gly 470 ThrI Ala C Gi 29 Ly Al Let Let Gl 375 Leu Arg PArg M4et 31n 455 eu lie iu lu 280 y Asn 5 Ala Ala .1 Thr Val 360 Asp Gly Leu Ile Ala 440 Ser Ser Giu Asn I
C
Arg I Sei Let Phe Gi) 345 Giy Pro Ala Gly Pro 425 Trp Arg kl a krg ~ro ;05 le :His i Ala Gin 330 Thr Tyr Asp Ile Pro 410 Giu Gly Phe Phe Leu 490 Ala Gin I Ser Arg 315 Val Leu Phe Phe Glu 395 Gly Ser Ser Asp Leu 475 3er '~ys I ~eu I Phe 3 0( Arc LeL Thr Val1 Asn 380 His Leu Val Leu Leu 460 31n eu 285 Arg Gln Leu Asn Asn 365 Thr Gin Arg Pro Thr 445 Asp Tyr His Ile Asr Sex Gly 350 Pro Val1 Giu Val Phe 430 Leu Leu ksn ?he Git 1Ala Arc 335 Arc Vai Leu Tyr Leu 415 Met Glu Met Thr Ala i Pro SThr 320 Trp Thr Ile Ala Pro 400 Phe Leu Ser Met Asp 480 Val1 495
G
?ro Val aeu Giu Val1 510 Glu Asp Trp Leu Asn Pro Leu Leu Thr Thr Arg 515 520 525 Ala Thr Ala Ala Giu Phe Pro Ser Gin Cys Val His Giu Leu Phe Giu 530 535 540 Ala Gin Val Giu Leu Thr Pro Asp Ala Ile Ala Leu Ser Phe Gly Giu 545 550 555 560 Gin Asn Leu Thr Tyr Arg Giu Leu Asn Gly Ser Ala Asn Arg Ile Ala 565 570 575 His Tyr Leu Arg Ser Arg Gly Ala Gly Pro Gly Glu Met Val Gly Ile 580 585 590 His Val Thr Arg Ser Leu Giu Thr Val Ala Gly Leu Leu Gly Val Leu 595 600 605 Lys Ala Gly Ala Ala Tyr Val Pro Leu Glu Pro Giu Tyr Pro Ala Gin 610 615 620 Arg Leu Arg Leu Met Leu Glu Glu Thr Arg Pro Val Val Val Leu Asn 625 630 635 640 Val Thr Glu Ser Glu Val Trp Thr Gin Pro Asp Thr Asn Pro Asn Pro 645 650 655 Leu Ala Thr Pro Ala Asp Leu Ala Tyr Val Leu Tyr Thr Ser Gly Ser 660 665 670 Thr Gly Arg Pro Lys Gly Val Gin Ile Thr His Gin Ala Val Val Asn 675 680 685 Phe Leu Ser Ser Met Arg His Glu Pro Gly Ile Ser Asp Arg Asp Thr 690 695 700 Leu Leu Ala Leu Thr Thr Phe Met Phe Asp Ile Ser Ala Leu Glu Ile 705 710 715 720 Phe Leu Pro Leu Ser Ala Gly Ala Arg Val Val Val Ala Asn Gin Glu 725 730 735 Thr Ala Val Asp Gly Giu Arg Leu Ala Arg Glu Leu Ala Arg Ser Lys 740 745 750 Ala Thr Met Met Gin Ala Thr Pro Ala Thr Trp Arg Leu Leu Leu Ala 755 760 765 Ser Gly Trp Pro Gly Asp Arg Arg Leu Thr Ala Leu Cys Gly Gly Glu 770 775 780 Ala Leu Pro Arg Asp Leu Ala Asp Arg Leu Leu Gin Arg Thr Ala Ala 785 790 795 800 91 Leu Trp Asn Leu Tyr Gly Pro Thr Glu Thr Thr Ile Trp Ser Ala Ile Gin Arg Ala Asn Ile Gly 850 Gly Tyr 865 Ser Phe Arg Arg Gin Val Ala Val 930 Giu Asn 945 Ala Asp Val Glu Tyr Glu Trp, Arg 1010 Glu Trp 1025 Val Thi 835 Val Leu Asp Gin Lys 915 Arg Asp Gly Ser Gin 995 Ser Val *Th~ 82( *Gir *Ala *Asn Pro Arg 900 Ile Ser Al a His Glu 980 Asn Ser Gln 805 7 Thr Gly Asp Gly Leu Tyr Val Leu 840 Gly Giu Leu Tyr 855 Arg Pro Glu Le~u 870 His Gly Thr Arg 885 Asp Giy Ala Leu Arg Gly Phe Arg 920 His Pro Ala Val 935 Ala Gly Lys Tyr 950 Arg Ala Thr Ala 965 His Val Thr Gin Ala Pro Asn Ala 1000 Val Thr Gly Glu 1015 Asp 
Ser Val Asp2 1030 Ile Gly Cys Gly I] Prc 821 AsE Ile Ser Leu Giu 905 Ile Arg Leu Ala Trp 985 Asp Pro k.rg ~hr Vai Ser Asp Arg Gly Gly Ala Asp 875 Tyr Arg 890 Tyr Leu Glu Thr His Ala Ala Ala 955 Ala Asp 970 Gin Ser Pro Giu Ile Pro Ile Leu 1035 Gly Leu I Ile Gil Met Gir 84E Ala Gl 860 Lys Phe Thr Giy Gly Arg Gly Glu 925 Val Val 940 Tyr Ile Thr Phe Val Trp Phe Asn 1005 klia Ala 020 kla Ser .eu Leu 810 815 tArg Pro Ile 830 iPro Ala Pro Leu Aia Arg Val Ala Asn 880 Asp Leu Ala 895 Ile Asp His 910 Ile Glu Ala Thr Aia Arg Val Pro Leu 960 His Asp Arg 975 Asp Thr Thr 990 Ile Val Gly Giu Met Arg Arg Pro Arg 1040 Phe Arg Val 1055 Arg Val Leu Giu 1045 1050 Ala Pro His Cys Ser Giu Tyr Trp Ala Thr Asp Phe Ser Gin Lys Ala 1060 1065 1070 92 Leu Asp Tyr Ile Ala Ala His Ala Asp Arg Thr Gly Leu Ala Asn Val 1075 1080 1085 Arg Thr Phe Arg Gin Ala Ala Asp Asp Ala Cys Glu Ile Asp Ser Arg 1090 1095 1100 Ser Cys Asp Ala Val Val Leu Asn Ser Val Ile Gin Tyr Phe Pro Gly 1105 1110 1115 1120 Giu Ala Tyr Leu Arg Arg Val Leu Ala Giu Ala Val Arg Val Val Lys 1125 1130 1135 Pro Giy Gly Ile Val Phe Val Gly Asp Val Arg Ser Leu Pro Leu Leu 1140 1145 1150 Giu Thr Phe Tyr Ala Ser Leu Giu Val Gin Arg Ala Pro Ala Ser Leu 1155 1160 1165 Thr Arg Asn Giu Phe Arg Gin Arg Val Arg Ser Leu Ala Ser Gin Glu 1170 1175 1180 Giu Giu Leu Val Val Asp Pro Ala Phe Phe Phe Ala Leu Arg Glu Gin 1185 1190 1195 1200 Ile Pro Glu Ile Gly Arg Ile Glu Ile Leu Pro Arg Arg Gly Arg Ser 1205 1210 1215 His Asn Giu Leu Thr Arg Phe Arg Tyr Gin Ala Ile Leu His Ile Gly 1220 1225 1230 Ser Arg Giu Ala Giu Glu Pro Glu Ser Asp Arg Arg Arg Cys Gin Thr 1235 1240 1245 Ala Ala Glu Ile Arg Arg Val Leu Thr Asp Ala Gin Pro Giu Leu Ala 1250 1255 1260 Ala Phe Thr Giu Ile Pro Asn Ala Arg Leu Thr Ala Giu Ser Ala Ile 1265 1270 1275 1280 Val Thr Trp Met Asn Gly Asp Giu Ala Pro Giu Thr Leu Gly Glu Leu 1285 1290 1295 Arg Asp Arg Leu Arg Gin Thr Ser Pro Ser Gly Val Asp Pro Ala Asp 1300 1305 1310 Leu Trp Arg Met Asp Giu Asp Leu Pro Tyr Arg Val Ala Ile Asp Trp 
1315 1320 1325 Ser Ser His Gly Pro His Gly Arg Phe Asp Ala Thr Phe Cys Arg Ala 1330 1335 1340 Ala Ala Gly Pro Pro Ala Ser Arg Pro Arg Arg Arg Leu Ala Gly Pro 1345 1350 1355 1360 Tyr Thr Asn Asp Pro Leu Arg Ala Val Tyr Thr Arg Thr Val Val Pro 1365 1370 1375 Gin Leu Arg Thr His Leu Lys Glu Lys Leu Pro Asp Tyr Met Ile Pro 1380 1385 1390 Thr Ala Trp Val Val Leu His Glu Met Pro Leu Thr Pro Asn Gly Lys 1395 1400 1405 Ile Asp. Arg Asn Ala Leu Pro Asp Pro Glu Pro Ser Arg Arg Ala His 1410 1415 1420 Ala Glu Ala Phe Thr Pro Pro Glu Thr Pro Val Glu Gin Val Leu Ala 1425 1430 1435 1440 His Ile Trp Gly Glu Val Leu Gly Met Asp Gly Ile Gly Val His Asp 1445 1450 1455 His Phe Phe Asp Ser Gly Gly His Ser Leu Leu Val Thr Gin Met Ile 1460 1465 1470 Ala Arg Val Arg Asp Met Leu His Val Glu Val Pro Phe Arg Thr Val 1475 1480 1485 Phe Asn Ala Pro Thr Val Arg Gly Phe Ala Val Ala Ile Gin Asp Gly 1490 1495 1500 Val Asp Pro Gly Trp Ala Arg Arg Ala Ala Asp Leu Leu Ile Ala Val 1505 1510 1515 1520 Ser Gin Met Ser Asp Val Gin Ile Glu Arg Met Met Ser Ala Ala Gin 1525 1530 1535 Asp <210> 122 <211> 2766 <212> PRT <213> bacterium <400> 122 Met Gin Asn Ser Ser Pro Asn Thr Ile Asp Leu Ser Leu Ala Arg Arg 1 Gin Pro Arg Ile Leu Cys Val1 Arg Pro 145 Ile Giu Giy Al a Lys 225 Arg Leu Arg Leu Pro Arg Ile Thr Gin 130 Pro Leu Thr Arg Val 210 Lys Pro Leu Arg Trp Ile Ser Giy Leu 115 MetI Leu Leu Phe Pro 195 His Gin Arg 5 Asp Arg Glu Asn Phe Leu Ala Leu Leu Giu Gly Val 100 Giu Leu Ala Leu Leu Arg Leu Thr 165 Vai Arg 180 Ser Pro Gin Gin Leu Ser Pro Ala 245 Leu Arg His His 70 Ala Asp Pro Arg Thr 150 Leu Asp Leu Thr Gly 230 Glm Leu Asp Gin Ile Val Gly Val1 Asp 135 Lys Ser Leu Met Ser 215 Thr Gin Gin Ala 40 Leu Arg Val Giu Val 120 Ala Leu His Thr Giu 200 Leu ELeu rhr Gi.
Ala Asp Gly Gin Ala 105 Gin Gin Ile Ile Arg 185 Leu Asn Pro rrp 10 Asn Pro Pro Pro Arg 90 Arg Al a Ile Cys Ile 170 Ser Pro Gin Phe Arg 250 Ser Pro Glu His Leu Asp Leu 75 His Gin Asp Pro Leu 155 Aia Tyr Ile Thr Leu 235 Gly Ser Ser Asp Giu Ser Giy Phe 140 Asp Asp Giu Gin Ala 220 Asp Ala Leu Pro Ile Ser Leu Ile 125 Asp Asp Al a Ala Tyr 205 Gin Leu Val Ala Al a Arg Leu Leu 110 Ala Leu Lys Trp Phe 190 Gly Gin Pro Glu Arg Ile Gin Gin Tyr Asn Val Leu Arg Ser Ala Arg Giu Ala Arg Lys Gin Gin 160 Ser Val 175 Val Gin Asp Trp Tyr Trp Thr Asp 240 Thr Thr 255 Ala Leu Gly Arg 260 Asp Leu Thr Asp Gly 265 Leu His Ala Phe Ala Leu Arg 270 Giu Gly Ala Thr Val Phe Met Thr Ala Ile Ala Ala Phe Gin Val Leu 275 280 285 Leu Ala 305 Asn Ser Gin Asp Asp 385 Tyr Pro Phe Leu Leu 465 Thr Ala His His 290 Gly Met Leu Asp Leu 370 Ala Met Ser Asp Ala 450 Leu Arg~ Gin Gin Arg Arg Ile Leu Phe 355 Ser Pro His Gly Ala 435 Ser Ser Arg kla Lieu Tyr Thr Val1 Ala 340 Pro Arg Ala Asn Asp 420 Ala Val1 Pro Asp Glu 500 Thr *Thr *Gin Leu 325 Arg Phe Ser Ile Gly 405 Gly Thr Val Ala Ala 485 Arg Tyr Ala Arc 310 Arc Thr Glu Pro Thr 390 Gly Leu Ile Thr Val1 470 Gly rhr kl a Gin 295 Giu Gly Arg Arg Val 375 Val1 Ser Met Ala Asp 455 Arg Pro Pro Glu Giu Thr Asp Asp Leu 360 Phe Met Lys Ala Ser 440 Pro Ser Pisf H~is Leu 520 Asp *Giu Leu Thr 345 Val Gin Pro Leu Ser 425 Leu Asp Arg Gly Ala 505 Asn Ile Gly Arg 330 Ala Glu Val1 Gly Asp 410 Al a Leu Val1 Met Cys 490 Val Ala Leu Leu 315 Asp Leu Giu Ser Leu 395 Leu Glu Asp Arg Leu 475 Aia Ala Arg Ile 300 Val1 Asp Ser Leu Phe 380 Thr Gly Tyr Al a Ile 460 Giu H{is Val kl a Gly Gly Pro Ala His 365 Ala Ile Val1 Asn Tyr 445 Ser Gin Giu Val1 Asn 525 Vai Cys Ser Leu 350 Pro Leu Ser Thr Thr 430 Arg Thr His Leu Phe 510 Arg Pro Phe Phe 335 Ser Pro Leu Arg Leu 415 Asp Thr Ala Asn.
Val 495 Giu Leu Val Ala 320 A-rg His Arg Pro Glu 400 Giu Leu Leu Ala Ala 480 Glu Asp Al a 515 His Arg Leu Ser Ala Ser Giy Ala Gly Pro Gly Lys Ile Ile Ala Leu 530 S35 540 96 Ala Met Glu Arg Ser Leu Glu Met Val Ile Ala Leu Leu Ala Ile Leu 545 550 555 560 Lys Ser Gly Ser Ala Tyr Leu Pro Leu Asp Pro Ala His Pro Lys Asp 565 570 575 Arg Leu Ala Arg Ile Leu Asp Glu Val Gin Pro His Ala Val Leu Thr 580 585 590 Gin Glu Ala Val Ala Glu Met Met Ala Met Met Ala Met Met Ala Val 595 600 605 Ala Val Glu Pro Glu Ala Ala Asn Leu Val Ser Gly Ser Lys Pro Asp 610 615 620 Asp Leu Ala Tyr Ile Ile Tyr Thr Ser Gly Ser Thr Gly Arg Pro Lys 625 630 635 640 Gly Val Glu Ile Arg His Ser Ser Leu Val Asn Leu Leu Arg Ser Met 645 650 655 Gin Arg Glu Pro Gly Leu Thr Ala Ala Asp Gly Leu Val Ala Val Thr 660 665 670 Thr Val Ser Phe Asp Ile Ala Gly Leu Glu Ile Trp Leu Pro Leu lie 675 680 685 Thr Gly Ala Arg Val Ile Val Ala Thr Arg Glu Ile Val Val Asp Gly 690 695 700 Glu Arg Leu Thr Thr Leu Leu Asp Lys Ser Gly Ala Thr Val Met Gin 705 710 715 720 Ala Thr Pro Ser Gly Trp Arg Gin Leu Leu Asp Ser Gly Trp Lys Pro 725 730 735 Gly Lys Gly Phe Arg Val Phe Cys Gly Gly Glu Ala Leu Pro Pro Glu 740 745 750 Leu Ala Arg Arg Ile Leu Asp Ser Gly Val Glu Leu Trp Asn Leu Tyr 755 760 765 Gly Pro Thr Glu Thr Thr Ile Trp Ser Ala Val His Lys Thr Gin Arg 770 775 780 Leu Gly Ala Ser Asp Ser Ile Val Pro Ile Gly His Pro Ile Asp Asn 785 790 795 800 Thr Gin Leu Tyr Ile Leu Asp Ser Arg Met Glu Pro Val Pro Pro Gly 810 Val Pro His Arg Arg Gly 850 Gly Ala 865 Gly Phe Ile Ala Ala Tyr Leu Arg 930 Ala Phe 945 Asp Ala Glu Pro Giu Val Val Gly 1010 Glu Arg 1025 Thr Ile Gl) Asr 83E ArS ValI Arg Val Leu 915 Ser Val Asn Met Leu 995 Gly Leu -lu rGlu Leu Tyr Ile Gly 820 Pro Giu Leu Thr Arg 840 Ile Tlyr Ser Thr Gly 855 Glu Cys Leu Gly Arg 870 Ile Glu Pro Ala Glu 885 Lys Gin Ala Ile Thr 900 Val Pro Ala Thr Gly 920 TI-p Leu Ala Thr Arg 935 Ser Leu Ser Ser Leu 950 Ala Leu Pro Gly Leu 965 Arg Gly Asp Val Val 980 Arg Val Giu His Val 1000 His Ser Leu Met 
Leu 1015 Gly Leu Thr Leu Ser 1 1030 Ser Leu Ala Gly Leu 1045 Gl 82 Gi As Va Ii Va 90 As Le Pr Pr< kr Pal la .y Ala Giy Leu Ala 5 ui Lys Phe Arg Giu 845 p Leu Ala Arg Tyr 860 1 Asp Arg Gin Ile 875 e Glu Ala Ala Ile 890 1 Val Lys Asp Asp 5 p Val Arg Asp Leu 925 u Pro Asp Tyr Met 940 o Leu Thr Pro Asri 955 3 Thr Thr Pro Val 970 aThr Ile Ala Ser Tyr Arg Gin Asn 1005 -Arg Vai Arg Gly 1020 *Val Asp Leu Phe 1035 *Giu Lys Ser Giu 1050 815 Arg Gly Tyr 830 Trp Arg Asp Arg Ser Asp Lys Leu Arg 880 Giu Thr His 895 Arg Leu Ile 910 Gin Ser Asp Ile Pro Ser Giy Lys Ile 960 Ala Ala Arg 975 Ile Trp, Arg 990 Phe Phe Asp Leu Leu Glu Arg His Thr 1040 Pro Ala Ala 1055 Ala Giu Pro Ala Ala Ala Val Ala Glu Asp Arg Ile Ala Val Ile Gly 1060 1065 1070 98 Met Ala Gly Arg Phe Pro Gly Ala Arg Asn Val Glu Glu Phe Trp Arg 1075 1080 1085 Asn Leu Arg Asp Gly Val Asp Ser Ile Ala Arg Leu Ser Pro Giu Asp 1090 1095 1100 Leu Leu Ala Gly Gly Ile Ser Pro Glu Val Phe Gin Asp Pro Ser Tyr 1105 1110 1115 1120 Val Pro Ala Lys Gly Leu Leu Asp Gly Ile Glu Phe Phe Asp Ala Ala 1125 1130 1135 Phe Phe Gly Tyr Ser Pro Arg Glu Ala Glu Ile Met Asp Pro Gin His 1140 1145 1150 Arg Val Phe Leu Giu Cys Ala Trp Glu Ala Met Giu Asn Ala Gly Tyr 1155 1160 1165 Ala Ala Arg Ser Tyr Lys Gly Ser Ile Gly Val Phe Ala Gly Cys Gly 1170 1175 1180 Val Asn Thr Tyr Leu Leu Asn Asn Leu Ala Thr Ala Giu Pro Phe Asp 1185 1190 1195 1200 Phe Ser Arg Pro Ser Ala Tyr Gin Leu Leu Thr Ala Asn Asp Lys Asp 1205 1210 1215 Phe Leu Ala Thr Arg Val Ser Tyr Lys Leu Asn Leu Arg Gly Pro Ser 1220 1225 1230 Leu Thr Val Gin Thr Ala Cys Ser Thr Ser Leu Val Ser Val Val Met 1235 1240 1245 Ala Cys Glu Ser Leu Gin Arg Gly Ala Ser Asp Ile Ala Leu Ala Gly 1250 1255 1260 Gly Val Ala Ile Asn Val Pro Gin Ser Val Gly Tyr Leu His Gin Pro 1265 1270 1275 1280 Gly Met Ile Leu Ser Pro Asp Gly Arg Cys Arg Ala Phe Asp Giu Ser 1285 1290 1295 Ala Gin Gly Thr Val Pro Gly Asn Gly Ala Gly Val Val Val Leu Lys 1300 1305 1310 Arg Leu Ser Arg Ala Leu Ala Asp Gly Asp Thr Ile Tyr Ala Val Ile 
1315 1320 132S Arg Gly Ala Ala Ile Asn Asn Asp Gly Ala Giu Arg Met Gly Phe Thr 1330 1335 1340 99 Ala Pro Gly Val Asp Gly Gin Thr Arg Leu Ile Arg Arg Thr Gin Glu 1345 1350 1355 1360 Met Ala Gly Val Lys Pro Glu Ser Ile Gly Tyr Ile Glu Ala His Gly 1365 1370 1375 Thr Ala Thr Pro Leu Gly Asp Pro Val Glu Ile Ala Ala Ile Ala Ala 1380 1385 1390 Asn Phe Pro Lys Asn Gly Ser Gly Asp Val Tyr Ile Gly Ser Val Lys 1395 1400 1405 Thr Asn Ile Gly His Leu Asp Val Ala Ala Gly Val Ala Gly Leu Ile 1410 1415 1420 Lys Thr Val Leu Ala Val His Arg Gly Gin Ile Pro Pro Ser Leu Asn 1425 1430 1435 1440 Phe Gin Arg Pro Asn Pro Arg Ile Asp Phe Ala Asn Thr Pro Phe Arg 1445 1450 1455 Val Ser Thr Arg Leu Leu Asp Trp Pro Ala Gly Lys Thr Pro Arg Arg 1460 1465 1470 Ala Ala Val Ser Ser Phe Gly Ile Gly Gly Thr Asn Ala His Val Ile 1475 1480 1485 Leu Glu Gin Ala Pro Pro Val Thr Pro Ala Ala Ala Ala Pro Glu Arg 1490 1495 1500 Ser Ala His Val Leu Cys Leu Ser Ala Asn Thr Asp Ala Ala Leu Glu 1505 1510 1515 1520 Glu Leu Val Arg Ser Tyr Arg Gly His Met Asp Asn Gin Pro Gly Leu 1525 1530 1535 Ser Phe Gly Asp Val Ala Phe Thr Ala Asn Ala Gly Arg Val His Phe 1540 1545 1550 Pro His Arg Ile Cys Ile Val Ala Arg Ser Ser Asp Glu Ala Arg Gin 1555 1560 1565 Arg Leu Thr Glu Ala Arg Arg Val Arg Ile Ala Gin Thr Arg Pro Lys 1570 1575 1580 Ile Ala Phe Leu Phe Thr Gly Gin Gly Ala Gin Tyr Ala Gly Met Gly 1585 1590 1595 1600 Arg Gin Phe Tyr Glu Ser Gin Pro Val Phe Arg Ala Ala Met Asp Glu 100 1605 1610 1615 Cys Ala Ala Leu Leu Asn Gly Arg Leu Asp Leu Pro Ala Leu Leu Ala 1620 1625 1630 Asp Asp Ala Leu Leu Asp Ala Thr Ala Gly Ala Gin Pro Ala Leu Phe 1635 1640 1645 Ala Leu Gin Trp Ala Leu Ala Gin Leu Trp Lys Ser Trp Gly Val Thr 1650 1655 1660 Pro Asp Leu Val Met Gly His Ser Val Gly Glu Tyr Ala Ala Ala Cys 1665 1670 1675 1680 Ile Ala Gly Ala Val Ser Leu Pro Asp Ala Leu Gly Leu Val Ala Glu 1685 1690 1695 Arg Gly Arg Leu Met Gin Asn Leu Pro Glu Gly Ala Met Ala Ala Val 1700 1705 1710 Ser Ala Gly Glu Gin Arg Cys Ala Ala Ala Ile Thr Ser Arg 
Val Ser 1715 1720 1725 Ile Ala Ala Ile Asn Gly Pro Ala Glu Val Val Ile Ser Gly Ala Pro 1730 1735 1740 Gin Asp Ile Glu Ser Ala Leu Ala Thr Leu Arg Ala Glu Gly Ile Lys 1745 1750 1755 1760 Thr Gin Met Leu Ala Val Ala Arg Ala Phe His Ser Ser Ser Met Asp 1765 1770 1775 Pro Ile Leu Ala Asp Leu Gin Arg Arg Ala Ala Ala Ile Ala Trp Arg 1780 1785 1790 Asn Pro Ser Ile Gly Leu Val Ser Asn Leu Thr Gly Lys Leu Ala Gly 1795 1800 1805 Glu Gly Gin Leu Ala Asn Pro Leu Tyr Trp Arg Asp His Ala Arg Asn 1810 1815 1820 Pro Val Arg Phe Ala Asp Gly Ile Gin Thr Leu Lys Asp Glu Gly Cys 1825 1830 1835 1840 Asp Val Phe Leu Glu Ile Gly Pro Lys Pro Val Leu Leu Gly Met Gly 1845 1850 1855 Gin Lys Cys Leu Pro Asp Asp Ala Lys Gin Trp Leu Pro Ser Leu Arg 1860 1865 1870 101 Lys Gly Arg Asp Glu Trp Glu Thr Ile Leu Ser Ser Val Ala Thr Leu 1875 1880 1885 Tyr Gin Gly Gly Phe Asp Ile Asp Trp Gin Glu Phe Asp Arg Pro Tyr 1890 1895 1900 Ser Arg Arg Arg Val Ala Leu Pro Ala Tyr Pro Phe Glu Arg Arg Arg 1905 1910 1915 1920 His Trp Ile Glu Arg Ser Ser Arg Pro Glu Pro Val Ala Val Ala Ser 1925 1930 1935 Gly Leu Val Gly Cys Arg Leu Ser Leu Pro Val Ala Asp Val Ile Phe 1940 1945 1950 Glu Ser Lys Leu Ser Thr Ala Ser Pro Leu Leu Ser Asp His Arg Tyr 1955 1960 1965 Tyr Gly Ser Val Val Ala Pro Ala Val Tyr Phe Leu Ala Met Ala Leu 1970 1975 1980 Glu Ala Ser Ala Glu Val Phe Gly Ala Gly Arg His Thr Leu Glu Asn 1985 1990 1995 2000 Val Asn Phe Ala His Pro Leu Ile Leu Ser Ala Glu Arg Asp Thr Ala 2005 2010 2015 Val Gin Leu Val Leu Ser Gin Ser Asp Asp Arg His Ala Ser Phe Arg 2020 2025 2030 Ile Leu Ser Leu Ser Asp Gly Ser Trp Asn Leu His Ala Ala Gly Asn 2035 2040 2045 Ile Ala Ala His Ala Gly Val Ala Pro Val Pro Arg Leu Val Asp Glu 2050 2055 2060 Arg Arg Pro Ala Val Asp Gly Asp Thr Tyr Tyr Ser Leu Leu Arg His 2065 2070 2075 2080 Leu Glu Ile Glu Leu Gly Pro Ser Tyr Arg Arg Ile Gin Arg Ile His 2085 2090 2095 Phe Gly Glu Gin Glu Ala Leu Ala Ala Ile Asp Ser Ala Thr Pro Leu 2100 2105 2110 Asn Pro Arg Cys Glu Leu Ala Glu Ala Gly Leu Gin Leu 
Leu Ser Ala 2115 2120 2125 Ala Ala Ser Pro Ala Leu Ala Asp Gly Ala Glu His Pro Ile Phe Ala 2130 2135 2140 102 Pro Leu Gly Ile Asp Arg Val Cys Phe Tyr Gly Ser Leu Glu Gly Ala 2145 2150 2155 2160 Val Trp Gly Ala Ala Gin Ile Leu Arg His Ser Pro Asp Gly Phe Thr 2165 2170 2175 Gly Glu Ala Gin Leu Leu Asp Ser Glu Gly Cys Val Leu Gly Glu Leu 2180 2185 2190 Gin Gly Val Ser Phe Arg Arg Val Thr Arg Ala Trp Ala Gin Arg Ser 2195 2200 2205 Glu Arg Lys Pro Glu Leu Tyr Glu Val Glu Trp Arg Pro Glu Pro Leu 2210 2215 2220 Arg Gin Pro Ser Arg Thr Leu Gin Pro Gly Ala Trp Leu Ile Leu Ala 2225 2230 2235 2240 Asp Ser Gly Gly Ala Ala Arg Ala Leu Ala Asp Ala Leu Thr Ala Gin 2245 2250 2255 Gly Glu Met Cys Val Thr Val Pro Pro Ala Gly Glu Tyr Met Ser Leu 2260 2265 2270 Val Gly Glu Arg Asp Trp Arg Gly Ile Val Asn Leu Tyr Ser Leu Asp 2275 2280 2285 Asp Tyr Glu Leu Gly Cys Arg Ser Thr Leu Ala Leu Val Lys Ser Leu 2290 2295 2300 Lys Ser Gly Pro Arg Leu Trp Leu Val Thr Ala Gly Ala Gin Ala Thr 2305 2310 2315 2320 Ser Ala Val His Asn Pro Met Gin Ala Ala Leu Trp Gly Phe Gly Arg 2325 2330 2335 Val Ile Ala Arg Glu His Pro Asp Leu Trp Gly Gly Leu Ile Asp Leu 2340 2345 2350 Asp Pro Asp Asp Ala His Ala Ser Ala Ala Gly Ala Ala Ala Gin Met 2355 2360 2365 Arg Asp Phe Asp Gly Glu Asp Gin Ser Ala Trp Arg Ser Asn Arg Arg 2370 2375 2380 Tyr Val Pro Arg Leu Thr Arg Arg Pro Ser Ala Arg Ala Ala Val Arg 2385 2390 2395 2400 Leu Val Ser Gly Ala Thr Tyr Leu Ile Thr Gly Gly Leu Gly Ala Leu 103 2405 2410 2415 Gly Leu Thr Val Ala Lys Trp Met Val Glu His Gly Ala Thr Arg Val 2420 2425 2430 Val Leu Ala Gly Arg Arg Pro Pro Asn Glu Glu Gin Gin Arg Val Leu 2435 2440 2445 Gin Gin Ile Gly Ala Thr Ala Glu Thr Val Asp Val Ser Arg Glu Glu 2450 2455 2460 Glu Val Ala Asp Leu Ile Arg Arg Ile His Thr Glu Thr Ser Pro Leu 2465 2470 2475 2480 Arg Gly Val Ile His Ala Ala Gly Val Leu Asp Asp Gly Val Leu Leu 2485 2490 2495 Asn Gin Asp Trp Thr Arg Ile Ala Ser Val Met Ala Pro Lys Ala Glu 2500 2505 2510 Gly Ala Val His Leu His His His Thr Arg Asp 
Leu Pro Leu Asp Phe 2515 2520 2525 Phe Val Leu Phe Ser Ser Ala Ser Ser Leu Leu Gly Pro Ala Gly Gln 2530 2535 2540 Ala Gly Tyr Ala Ala Ala Asn Ala Val Leu Asp Ala Leu Ala His His 2545 2550 2555 2560 Arg Arg Gly Leu Gly Leu Pro Ala Thr Ser Ile Asn Trp Gly Arg Trp 2565 2570 2575 Ser Gly Ala Gly Met Ala Ala Arg Thr Ser Gln Ser Met Ala Gly Val 2580 2585 2590 Ala Ser Leu Ser Val Asp Glu Gly Leu His Ile Leu Glu Ala Val Leu 2595 2600 2605 His Glu Cys Pro Ile Gln Ile Ala Ala Leu Pro Ala Gly Ser Ile Thr 2610 2615 2620 Gly Glu Leu Leu Arg Pro Ala Ala Leu Pro Ser Pro Gln Leu Arg Thr 2625 2630 2635 2640 Arg Leu Asn Glu Ala Thr Pro Arg Gln Arg Glu Ala Ile Leu Ile Ala 2645 2650 2655 His Ile Arg Glu Ser Leu Ala Arg Phe Val Gly Ile Ala Thr Ser Thr 2660 2665 2670 Pro Leu Asp Pro Gln Gln Pro Leu Gly Glu Leu Gly Leu Asp Ser Leu 2675 2680 2685 Met Ala Ile Glu Leu Arg Asn Ser Leu Ser Gln Ser Leu Gly Gln Pro 2690 2695 2700 Leu Pro Ala Ser Leu Leu Phe Asp Tyr Pro Ser Leu Asp Ala Ile Val 2705 2710 2715 2720 Ser Tyr Val Leu His Ala Val Phe Pro Pro Glu Ala Ser Pro Val Glu 2725 2730 2735 Ala Pro Glu Phe Glu Asn Leu Ala Arg Glu Glu Leu Glu 
Ala Leu Leu 2740 2745 2750 Asp Ser Arg Leu Ala Gin Val Asp Gin Trp Leu Glu Thr Gin 2755 2760 2765 <210> 123 <211> 1763 <212> PRT <213> bacterium <400> 123 Met Ser Gly Ser Asp Asp Leu Ser Lys Leu Arg Arg Ala Val Ile Ala 1 5 10 Leu Asp Lys Val Gin Lys Arg Ile Asp Gin Leu Glu Ser Ala Arg Ser 25 Glu Pro Ile Ala Leu Ile Gly Ala Gly Cys Arg Phe Pro Gly Ala Ser 40 Asn Leu Asp Ala Tyr Trp Ser Leu Leu Arg Giu Gly Arg Ser Ala Val s0 55 Arg Giu Val Pro Pro Asp Arg Trp Asp Ile Asp Ala Tyr Tyr Asp Pro 70 75 Asp Pro Gly Ala Thr Gly Arg Met Tyr Thr Arg Tyr Gly Giy Phe Ile 90 Asp Gin Val Asp Arg Phe Asp Ala Arg Phe Phe Gly Ile Ala Pro Arg 100 105 110 Giu Ala Ile Ser Leu Asp Pro Gin Gin Arg Leu Leu, Leu Giu Val Thr 115 120 125 105 Trp Glu Ala Ile Giu Asn Ala Gly Leu Pro Pro Asp Arg Leu Ala Gly Ser 14S Leu Gly Leu Val Leu 225 Ile Ala Val Val Asn 305 Gin Glu Ala Lys 130 Arg Gin Asn Gin Ala 210 Ala Tyr Phe Val Met 290 Gly Ala Ala Leu Leu 370 Thr Met Thr Gly 195 Val Leu Phe Asp Va1 275 Ala Leu Va1 His Ala 355 Gly Gly Arg Ala 180 Pro His Ala Cys Ala 260 Leu Va1 Thr Gly Gly 340 Ala Ser Val Gly 165 Ser Asn Leu Gly Lys 245 Ala Lys Ile Ala Asp 325 Thr Ala Val Phe 150 Gly Va1 Met Ala Gly 230 Leu Ala Arg Arg Pro 310 Ala Gly Leu Lys 135 Met Asp Ala Ala Cys 215 Val Lys Asp Leu Gly 295 Asn Arg Thr Gly Thr 375 Gl Ala Ala Ile 200 Gin Asn Ala Gly Ser 280 Thr Gly Leu Pro Ala 360 Asn Ile His Gly 185 Asp Ser Leu Met Tyr 265 Asp Ala Pro Gin Leu 345 Gly Phe Phe Ile 170 Arg Thr Leu Ile Ala 250 Val Ala Ile Ala Thr 330 Gly Arg Gly Ser 155 Asp Leu Ala Arg Leu 235 Ala Arg Leu Asn Gin 315 Leu Asp Thr His 140 Asn Ala Ser Cys Ser 220 Ser Asp Gly Arg Gin 300 Glu Asp Pro Asn Leu 380 Asp Tyr Tyr Ser 205 Gly Pro Gly Glu Asp 285 Asp Ala Val Ile Gly 365 Glu Tyr Thr Ile 190 Ser Glu Asp Arg Gly 270 Arg Gly Val Ser Glu 350 Asn Ala Tyr Gly 175 Leu Ser Ser Arg Cys 255 Cys Asp Arg Ile Tyr 335 Ala Lys Ala Asn 160 Thr Gly Leu Asp Thr 240 Lys Gly Pro Ser Arg 320 Va1 Gly Leu Ala Gly Val Ala Ala Leu Ile Lys Val Ala Leu Met Leu 
Gin Asn Giu Ala 390 395 400 106 Ile Pro Pro His Leu Asn Leu Thr Thr Pro Ser Pro His Ile Asp Trp 405 410 415 Asn Thr Leu Pro Leu Glu Ile Pro Ala Arg Leu Thr Pro Trp Pro Val 420 425 430 Ala Pro Gly Gly Arg Arg Val Ala Gly Ile Asn Ser Phe Gly Leu Ser 435 440 445 Gly Thr Asn Ala His Val Leu Ile Glu Gin Ala Pro Gin Gin Ala Ala 450 455 460 Ser Ser Thr Pro Ala Pro Tyr Leu Leu Pro Leu Ser Ala Arg Ser Pro 465 470 475 480 Glu Ala Leu Arg Asp Leu Ala Arg Ala Tyr Arg Asp Val Val Asn Asp 485 490 495 Asn Pro Ala Asp Thr Cys Tyr Thr Ala Cys Ala Arg Arg Thr Ser Tyr 500 505 510 Glu His Arg Ala Ala Phe Thr Gly Thr Asn Ala Gin Asp Leu Met Ala 515 520 525 Gly Leu Asp Ser Phe Leu Ala Gly Asn Pro Asn Arg Asp Thr Ala Thr 530 535 540 Gly Phe Val Pro Arg Gly Gin Lys Arg Lys Val Val Phe Val Leu Pro 545 550 555 560 Gly Gin Gly Ser Gin Trp Pro Gly Met Gly Arg Asp Leu Met Ala Ser 565 570 575 Glu Pro Val Phe Arg Ala Ala Ile Glu Glu Cys Gly Arg Ala Met Gin 580 585 590 Pro Tyr Val Asp Trp Ser Leu Thr Gln Glu Leu Gin Gly Pro Leu Asp 595 600 605 Arg Ile Asp Val Ile Gin Pro Ala Leu Phe Ala Val Gly Val Ala Leu 610 615 620 Ala Gly Leu Trp Arg His Trp Gly Ile Glu Pro Asp Ala Val Ile Gly 625 630 635 640 His Ser Met Gly Glu Val Ala Ala Ala His Ile Ala Gly Ala Leu Thr 645 650 655 Leu Asp Glu Ala Ala Arg Val Ile Cys Leu Arg Ser Arg Met Leu Ala Gly Glu Ala 705 Leu Arg Cys Ala Glu I 785 Leu S Leu G Leu G Asp G: 8i Gly Va 865 Arg Ar Ile Ar Pro Se
V
A
6
SI
G1 Va Al Le 77 el ;ei ly 1 iy 50 il -9 g r al Arg 675 la Ile 90 er Asn Ly Glu li Lys a Ala 755 u Gly 0 u Val S r Thr A i Leu S 8 SAsp A 835 Asn L Thr Pi Leu Pr Pro Pr Pro Le 915 6
G.
A:
Se Le Va 74 Le Me ;er .la er 20 rg eu -0 0
U
ro 'o o 0 u 60 ly Gin la Ala -r Pro u Leu 725 1 Asp 0 u Pro t Tyr r Ala Val A 805 Pro H Ala A Ala LI Asp T] 8" Asn Ty 885 Gin Va Pro Gl
G
I:
Ar 71 Ar Il Gl Se: yI 79( la 'is la eu rp 10 r .1
U
ly Glu Le Ala 695 rg Ser i0 g Glu e Ala y Val r Thr 775 r Trp 0) SAla A Pro L Ile A 8 Arg A 855 Ser AJ Pro Ti Glu Se Met Gl 92
M
6
G
T
Le Se Va 76 Va Ae la 40 rg rg -p .r n 0 107 665 et Ala 80 ly Arg S ir Val L u Glu A 7 *r His Si 745 1 Gly A: 0 1 Thr GJ a Arg As a Ala Al 81 i Leu Va 825 Ala Al Thr Let Ile Tyr Gin Arg 890 Gin Ala 905 Phe Glu
VE
Je 1 3 e
L
0 1 a u al Val er Asp u Ser 715 .a Lys 0 r His a Leu y Ala J i Leu 795 SGly G Gin P Ser L Gly A; 8 Pro As 875 Glu Ar Leu Pr Ser Th
G
A
7
G
A
Le G1 A1 78 rc eu la n *g o r lu L 6 rg V, 00 ly As sp Va _u Me n Pr 76 a Il 0 g Glr y His Ile Arg 845 Leu Gly Tyr Gly Val 925 670 eu Ala al Ser sp Ser li Phe t Asp 750 o Arg e Ser C i Pro V Asp V 8 Gin G 830 Arg A Leu T Gly G1 Trp Il 89 Arg Ar 910 Glu Al
L
I:
Al Cy 73 Se Pr ral 'al lu sp ihr .n e g a eu Asp Le Ala .a Ala 720 s Arg r Val o Ala y Glu SMet 800 SPhe Thr Glu Asn Thr 880 Asp Ile Lys Asp Phe Ala Asp His Arg Leu His Asp Val Ile Val Thr Pro Gly Ala 930 935 940 Trp His Leu Ala Met Ala Leu Ala Ala Ala Arg Gin Gly Leu Gly Ala 945 950 955 960 Gly Pro His His Val Glu His Val Ser Leu Thr Gly Ala Leu Thr Leu 965 970 975 Pro Glu Asn Asp Ala Ala Arg Gln Val Gin Leu Val Leu Arg His Glu 980 985 990 Glu Gly Gly Gly Ala Ser Phe Arg Ile Tyr Ser Arg Glu Asp Ser Trp 995 1000 1005 Lys Leu His Ser Glu Gly Met Leu Gln Ala Gly Asp Ser Thr Ala Ser 1010 1015 1020 Ile Asp Leu Asp Ala Ile Arg Ala Arg Cys Thr Ala Glu Leu Thr Ala 1025 1030 1035 1040 Asp Ala Phe Tyr Ser Arg Leu Trp Asp Arg Gly Tyr His Phe Gly Pro 1045 1050 1055 Thr Phe Arg Thr Ile Gly Pro Ile Trp Arg Gly Asn Gly Glu Val Leu 1060 1065 1070 Cys Arg Val Asp Ile Pro Leu Thr Glu Met Gin Thr Ile Asp Cys Cys 1075 1080 1085 Leu Gin Leu Pro Ala Ala Leu Val His His Asp Asp Leu Lys Asp Val 1090 1095 1100 His Val Pro Val Gly Leu Asp Arg Phe Ser Leu Ala Glu Val Pro Thr 1105 1110 1115 1120 Gly Pro Val Trp Gly Tyr Ala Val Leu Arg Pro Asp Ser Thr Val Asp 1125 1130 1135 Val Arg Leu Val Thr Gly Thr Gly Ser Val Val Ala Glu Leu Val Gly 1140 1145 1150 Leu Gin Ser Arg Val Ala His Ser Gly Gin Leu Gly Glu Ser Glu Ile 1155 1160 1165 Pro Thr Trp Thr Val Gin Trp Thr Ala Ser Val Arg Arg Gly Asp Ala 1170 1175 1180 Asn Ala Gly Asn Ala Gly Gly Pro Trp Leu Val Ile Gly Glu Pro Ala 1185 i190 1195 1200 109 Ile Ala.Glu Thr Leu Gin Lys Arg Gly Gin Thr Cys Arg Thr Ala Asp 1205 1210 1215 Thr Cys Ser Gly Pro Pro Cys Arg Gin Ile Val Tyr Cys Pro Ser Pro 1220 1225 1230 Arg Ile Asp Asp Leu Leu Ser Val Leu Arg Ser Ile Val Gin Ala Gly 1235 1240 1245 Trp Pro Glu Pro Pro Arg Leu Trp Leu Leu Thr Arg Gly Ser Ala Ala 1250 1255 1260 Val Leu Asn Ser Asp Lys Asp Ile Asp Ile Arg Gin Ala Trp Leu His 1265 1270 1275 1280 Gly Ile Gly Arg Thr Ile Ala Tyr Glu His Pro Glu Leu Arg Cys Thr 1285 1290 1295 Leu Val Asp Leu Asp Ala His 
Ser Asn Asp Cys Gly His Leu Ala Thr 1300 1305 1310 Leu Met Leu Ser Asn Ile Ala Glu Asp Gin Val Ala Ile Arg Gin Gly 1315 1320 1325 Thr Val Trp Ala Pro Arg Leu Ser Leu His Lys Ile Pro Ser Ala Pro 1330 1335 1340 Asp Val Ala Phe Arg Ala Asp Ala Thr Tyr Leu Ile Thr Gly Gly Leu 1345 1350 1355 1360 Gly Gly Leu Gly Leu Gin Val Ala Gly Trp Leu Ala Ala Ala Gly Ala 1365 1370 1375 Arg His Leu Val Leu Leu Gly Arg Ser Glu Arg Pro Arg Pro Gin Leu 1380 1385 1390 Glu Gly Val Asn Val Lys Ile Ile His Ala Asp Val Ala Asp Arg Gin 1395 1400 1405 Gin Leu Ser Asp Ala Leu Ala Ile Ile Asp Arg Asp Met Pro Pro Leu 1410 1415 1420 Arg Gly Val Phe His Leu Ala Gly Thr Leu Ala Asp Gly Met Leu Leu 1425 1430 1435 1440 Asn Leu Thr Thr Glu Arg Phe Glu Ala Ala Met Ala Pro Lys Val Ala 1445 1450 1455 Gly Ala Trp Asn Leu His Glu Leu Thr Ala Gly Arg Pro Leu Asp His 110 1460 1465 1470 Phe Val Leu Phe Ser Ser Ala Ser Ala Thr Val Gly Ser Pro Gly Gin 1475 1480 1485 Gly Asn Tyr Ala Ala Gly Asn Ser Phe Leu Asp Ala Leu Ala His Leu 1490 1495 1500 Arg Arg Ala Gin Gly Leu Pro Ala Val Ser Ile Ala Trp Gly Pro Trp 1505 1510 1515 1520 Thr Gin Val Gly Leu Ala Ala Gin Ala Asn Arg Gly Asp Arg Leu Ala 1525 1530 1535 Ala Arg Gly Ile Ser Val Ile Gin Pro Gin Gin Gly Leu Arg Ala Leu 1540 1545 1550 Tyr Lys Ala Leu Thr Gin Ile Arg Pro His Val Ala Val Met Asn Phe 1555 1560 1565 Asp Ile Ala Gin Trp Leu Arg Tyr Tyr Pro Ser Ala Ala Ser Met Ser 1570 1575 1580 Leu Leu Ala Gly Ile Ala Pro Ala Ala Ala Asp Thr Lys Pro Ala Ala 1585 1590 1595 1600 Asp Met Arg Ser Glu Leu Leu Ala Val Pro Ala Gly Arg Gin Arg Arg 1605 1610 1615 Ala Arg Leu Glu Thr Leu Leu Met His Glu Ala Gly His Val Leu Arg 1620 1625 1630 Phe Asp Pro Ala Lys Leu Asp Gly Arg Ala Thr Leu Gly Asp Leu Gly 1635 1640 1645 Phe Asp Ser Leu Met Ala Leu Glu Phe Arg Asn Arg Leu Glu Ala Gly 1650 1655 1660 Leu Arg Val Lys Leu Ser Ala Thr Leu Ile Trp Arg Tyr Pro Thr Phe 1665 1670 1675 1680 Ser Ala Leu Ala Gin His Leu Ala Asp Lys Leu Gly Leu Pro Leu Glu 1685 1690 1695 Ser Met Ala Gly Asn Ala 
Glu Pro Ser Thr Val Ala Ala Val Ala Thr 1700 1705 1710 Leu Ala Thr Val Gly Thr Ala Ala Gly Glu Asp Arg Ser Pro Ala Ala 1715 1720 1725 Ala Asp Asp Leu Asp Ala Val Ala Asn Gln Ile Ala Gly Leu Gly Asp 1730 1735 1740 Lys Glu Ile Glu Ala Leu Leu Lys Gln Lys Phe Ala His Phe Ser Gly 1745 1750 1755 1760 Ala Ser Glu <210> 124 <211> 2153 <212> PRT <213> bacterium <400> 124 Met Ser Ser 1 Ala Tyr Leu Arg Asp Ala Pro Gly Gly Val Asp Ala I Val Arg Arg I Gln Ala Gly P 1 Gly Ile Ser P 115 Leu Leu Glu V 130 Glu Gly Leu S IlE rhx krg ksp :le le he 00 ro al Sei Let Glu Gly Arg Leu Leu Arg Ala Gly Git Gli.
Prc Pro Glu 70 Lys Asp Glu Trp Ser 1 Arg His Ile IAsp 5 Val Ser Ser Ala Glu 135 Arg Phe Met Al a 40 Giu.
Pro Leu Ile Val 120 ka Phr Pro Gin 25 Ile Phe Pro Asn Asp 105 Ser Leu Gly Asn 10 *Arg Val1 Trp Gly Pro 90 Gly Ile Glu Val Let Arc Gly Gin Arg 75 Ala Phe ksp ksp ?he aThr Leu.
Leu Met Trp Thr Asp Pro Ala 140 Val Pro Asp Gly Leu Ala Cys Gin Ala Arg Gin Glu Phe Leu Asp Pro Asn Gin 125 31
Y
Arg Giu Vai Asp 110 Gin Gin Ile Ser Giu Lys Phe Arg Thr His Gly Ser Ile Phe Leu.
Met Ser 160 145 150 155 CGIn Ser Ser Asp 165 P11he Trp Met Gin Thr 170 Ala Asp Giy Ala Arg Ile 175 Asp Pro Tyr Thr Ala Thr Giy Th Leu Ala Arg 225 Phe Pro Phe Leu Asn 305 Gin Gin Asp Pro His 385 Ala Asn I Ser Cys 210 Ser Ser Ser Gly Ala 290 Gin Gin Gin Pro Val 370 eu ,eu 'ro Tyl 19~ Sei Gi Pro Giy Glu 275 Ala Asp Val1 Ile Ile 355
G
1 y 3iu 3er .sn 180 Leu Ser Glu Giu *Arg 260 *Giy Giy Gly Val1 Gly 340 Glu Asp Gly His Ile 420 Lei Se2 Cys Phe 245 Cys Cys Asp Arg Ile 325 Tyr Ile VIal klia.
31u 105 krg i Asn Leu Thr 230 Met Arg Giy Arg Ser 310 Arg Ile Giu.
Cys Ala 390 Thr LeuC Leu Ala 215S Leu Tyr Ala Val Val 295 Ala Ser Glu Ala Ala 375 Giy Ile iluC Git 20( Al Al a Ala Phe Val1 280 Trp, Gly Ala Ala Leu 360 Val Ile Pro ;ily SAla 1Gly i Val Val Thr Asp 265 Val Ala Leu Leu His 345 Ala Gly Ala Pro Thr 425 His Ser Val Ile Ala Gly Arg 190 Prc His Ala Sex 250 Ala Leu Val1 Thr Ala 330 Gly Glu Ser Gly Ser 410 Ser Ser Leu Gly 235 *Lys *Ala Lys Val Ala 315 Asn Thr Thr Leu Leu 395 Leu I Leu IZ IlE Ala 220 Gly Met Ala Arg Arg 300 Pro Ala Gly Val Lays 380 Ile is sp Ala 205 Cys Vai Gly Asp Leu 285 Gly Asn Gly Thr Gly 365 Ser Lys Val I Ile Lei.
Gir As2 Thr Gly 270 Ser Ser Val Val1 Pro 350 Leu ksn kla .rg Tal 1Asp Ser Leu Ala 255 Ile Asp Ala Val Ala 335 Leu Pro Ile Vai Gin 415 LysC Thr Leu Arg 240 Ser Val Ala Val1 Ser 320 Al a Gly Arg Gly eu 400 eu lu 430 Gly Val Ser Val Arg Pro 435 Trp Pro Ala Gly Ser 440 Arg Arg Arg Phe Al a 445 113 Ala Phe Gly Trp Ser Gly Thr Asn Ala His Val Val Leu Glu Glu Ala 450 Ala 465 Pro Gly Asp Thr Gly 545 Thr Val1 Ala Arg Gly 625 Leu Lys Leu Ile Pro Ala Gly Thr Thr 530 Ala Al a Arg Giu Arg 610 Met Glu Glu Phe Glu Thr Ala Thr Ala Gly Asp Ser Arg Giu 595 Lys Gly Axrg Glu Ala 675 Pro Gly Ala Pro 500 Asp Ile Glu His Thr 580 Leu Thr Arg Cys Leu 660 Leu Asp Arg Ala 485 Asp Thr Ala Leu Ala 565 His Al a Val Ser Giu 645 Al a Gin Ala Gly 470 Arg Ile Pro Asp Arg 550 Pro His Ala Phe Trp 630 Ala Lys Val1 Val1 455 Glu Ala Ala Asp Ala 535 Arg.
Ser Arg Gin Val 615 Met Ala Leu Ala Ile Ala Ala Giy Ile 520 Met Val Leu Cys Leu 600 Phe Asp Met Asp Ile 680 Gly Ala Val Thr 505 Ala Tyr Ala Arg Arg 585 Gin Ser Arg Arg Arg 665 Ala His Ser Pro 490 Pro Gly Val Arg Asp 570 Leu Gly Gly Glu Pro 650 Val Ala S er Gly 475 Leu Asp Thr Leu Ala 555 Leu Al a Ile Gin Pro 635 Tyr Giu Leu Met 460 Phe Al a Thr Ala Pro 540 Tyr Cys Val1 Thr Gly 620 Val Val1 Val1 Trp Gly His Giu Ala Giy 525 Leu Gly Tyr Ser Ile 605 Ser Ile Asp Ile Arg 685 Glu Ser Gly Asp 510 Thr Ser Giu Thr Gly 590 Pro Gin Arg Trp Gin 670 Ser Val Arg Asp 495 Thr Ala Al a Leu Ala 575 Arg Ser Trp Giu Ser 655 Pro Trp Ala.
Pro 480 Thr Pro Ala His Leu 560 Ala Thr Gin Ile Ala 640 Leu Ala Gly Ala 690 695 700 Ala His Val Ala Gly Ala Leu Thr Leu Gin Asp Ala Ala Arg Ile Ile 114 705 Cys Al a Tyr Val1 Glu 785 His Asp Thr Arg Ala 865 Leu Arg Tyr Asn Leu 945 Sei Met Thx Ile 770 Arg Ser Gly Gly Asn 850 Asp Leu Asp Glu :ys 930 ksp *Arg *Val Glu 755 Ser Arg Pro Ile Ala 835 Leu Ser Pro Gln Ala 915 Val2 Ala Ser Glu 740 Arg Gly Gly Gln Gln 820 Thr Arg Gly Al a Asp 900 3ly krg 3er Ax 72! Let Let Gl.
Val Val1 805 Pro Leu Ser His Ile 885 Glu Hi s Leu ro 710 SLeu .i Pro i Ser 1Val Ser 790 Asp IArg Glu Pro Asp 870 Gly Arg4 Thr Pro2 Ala3 950 715 720 Met Lei Let Prc Glt 775 Cys Prc Pro Thr Val 855 Val Gly Gly Lal krg )35S .z Ser Arg Ile 730 i Ala Glu Cys 745 Ala Val Ser 760 1Ala Leu Ala Arg Pro Val Leu Cys Asp 810 Ala Thr Ile 825 Thr Ser Leu 840 Leu Phe Trp Phe Leu Glu Asn Ala Ala 890 Ser Met Leu 905 Ala Trp Arg 920 Tyr Pro Trp His Ala Ile Ser Gly Leu Gly Gly Gb.
Asr Gltu Lys 795 Glu Pro Asp Gln Ile 875 Leu Thr Thr Gln rhr 955 IAla Gly Val 780 Val Leu Phe Ser Gly 860 Ser Val Ser Val Arg 940 Leu Val1 Pro 765 Val1 Asp Leu Tyr Thr 845 Ile Pro Pro Leu T~yr 925 krg 3lY *Leu 750 Asn Ala Phe Gin Ser 830 Tyr Arg His Se r Gly.
910 Pro Arg Asn Ser Ser Thr Ala Ser 815 Thr Trp His Pro Leu 895 Al a Ser Phe Pro *Thr *Thr *Leu Al a 800 Leu Val Ala Leu Ile 880 Arg Leu Gly Trp Leu 960 Leu Gly Lys Arg Val1 965 Glu Ala Ser Thr Gin 970 Pro Gly Thr Phe Phe Trp 975 Giu Thr Giu Val Gin Gly 995 Leu Ala Gly 1010 His Val Thr 1025S Thr Let' Gin Arg Ile Ser Leu Ser Leu Ala Ser 980 Glu Vai Val Let' Pro 1000 Thr Ser Git' Thr Phe 1015 Phe Thr Gin Met Let' 1030 Let' Ala Ile Ala Val 1045 Ser Arg Gin Ala Ser 1060 115 Vai Pro Trp, Leu Aia Asp His Arg 985 990 Ala Thr Ala Tyr Let' Asp Met Ala 1005 Gly Glu Ser Pro Cys Val Leu Giu 1020 Ile Val Pro Arg Asp Gly Ser Met 1035 1040 Asp Arg Pro Gly Met Ala Ser Phe 1050 1055 Thr Trp Val Let' His Ala Ser Gly 1065 1070 Asp Ile Arg Gin Thr Pro Ala Asp Ala Ser Thr Val Pro Pro ASP Ser 107S 1080 1085 Ala Git' Thr Val Gin Ala Arg Cys Pro Thr Val Val Pro Ala Ala Git' 1090 1095 1100 Leu Trp Arg Gin Met Ala Giu His Gly Val Git' Tyr Gly Pro Ala Phe 1105 1110 1115 1120 Arg Ala Leu Giu Gn Ile Trp Ser Cys Pro Gly Glu Ala Ile Gly Arg 1125 1130 1135 Leu Arg Ser Ser Glu Thr Arg Ser Thr Ala Pro Ala Phe Let' Asp Ala 1140 1145 1150 Cys Leu Gin Ilie Ile Ala Ala Ala Phe Gly Pro Ala Gly Gly Thr Trp 1155 1160 1165 Let' Pro Ala Gly Ile Asp Arg Met Arg Trp Leu His Pro Ala Arg Ser 1170 1175 1180 17-1 Val Tro Thr His Ala Arg Leu Git' Gly Pro Ile 1185 Leu Leu Leu Gin His Git 1190 Asp Giy Glu Gly 1205 Arg Let' Asp Ala 1220 Leu Arg Trp Vai 1235 Gin Ser Ala Leu Val Ala Arg 1210 Git' Arg Ile Asp 122S Gin Pro His Ala 1240 Ile Met Ala 1200 Giu Gly Leu Arg 1215 Arg Giy Trp Leu 1230 Ala Giu Pro Pro 1245 116 Ala Ala Arg Ala Ala Arg Ser Trp Leu Ile Val Gly Ala Val Asp Ser 1250. 
1255 1260 Ala Leu Thr Ala Trp Leu Arg Ala Thr Gly Asn Arg Val Thr Gin Thr 1265 1270 1275 1280 Ser Pro Glu Lys Leu Asp Glu Leu Gin Pro Pro Leu Glu Glu Ile Val 1285 1290 1295 Phe Leu Leu Glu His Glu Pro Ser Cys Asp Arg Ile Leu His Leu Leu 1300 1305 1310 Gin Thr Leu Gly Arg Thr Pro Trp Arg Gin Ala Pro Arg Leu Trp Leu 1315 1320 1325 Val Thr Arg Gly Ala Gin Pro Val Asp Gly Gin Ile Leu Gin Ala Gly 1330 1335 1340 Ile Ala Gin Ala Pro Phe Trp Gly Leu Gly Arg Thr Val His Tyr Glu 1345 1350 1355 1360 His Pro Glu Leu Asn Cys Thr Leu Ile Asp Leu Asp Pro Ala Gly Gly 1365 .1370 1375 Glu Glu Glu Leu Leu His Glu Leu Leu Thr Asn Asn Gly Glu Asn Gin 1380 1385 1390 Ile Ala Phe Arg Gly Gly Ala Arg Tyr Val Ala Arg Val Ala Arg His 1395 1400 1405 Glu Ala Asp Met Gin Pro Ala Met Phe Lys Ala Gly Asp Arg Pro Phe 1410 1415 1420 Arg Leu Glu Ile Asp Ala Pro Gly Val Leu Asp Arg Leu Arg Leu Arg 1425 1430 1435 1440 Ala Thr Ser Arg Arg Pro Pro Gin Ala Gly Glu Val Glu Ile Glu Val 1445 1450 1455 Cys Ala Ala Gly Leu Asn Phe Leu Asp Val Leu Leu Ala Leu Gly Val 1460 1465 1470 Met Pro Asp Asp Ala Pro Gly Ala Ile Ala Gly Ser Pro Arg Leu Gly 1475 1480 1485 Gly Glu Cys Ser Gly Arg Ile Val Ala Met Gly Lys Gly Val Thr Asp 1490 1495 1500 Phe Arg Ile Gly Asp Glu Val Val Ala Leu Ala Pro Cys Ser Phe Gly 117 1505 1510 1515 1520 Arg Phe Val Thr Thr Pro Ala Phe Arg Val Ala Leu Lys Pro Ala Asn 1525 1530 1535 Ile Pro Ala Glu Gin Ala Ala Ala Leu Pro Ile Ala Phe Leu Thr Ala 1540 1545 1550 Asp Tyr Ala Leu Ser Arg Ala Ala Arg Leu Ala Pro Gly Glu Arg Val 1555 1560 1565 Leu Ile His Ala Ala Thr Gly Gly Val Gly Leu Ala Ala Ile Gin Ile 1570 1575 1580 Ala Gin Arg Ala Gly Ala Glu Ile.Phe Ala Thr Ala Gly Ser Pro Glu 1585 1590 -1595 1600 Lys Arg Ala Tyr Leu Arg Ser Leu Gly Ile Ala His Val Ser Asp Ser 1605 1610 1615 Arg Ser Met Ala Phe Val Asp Asp Ile Arg Asn Trp Thr Asn Gin Giu 1620 1625 1630 Gly Val Asp Val Val Leu Asn Ser Leu Ser Gly Asp Leu Leu Giu Ala 1635 1640 1645 Ser Phe Asp Leu Leu Arg Asp His Gly Arg Phe Ile Glu Ile Gly 
Lys 1650 1655 1660 Arg Asp Tyr Tyr Ala Gly Arg Lys Leu Gly Leu Arg Pro Phe Leu Lys 1665 1670 1675 1680 Asn Leu Ser Tyr Thr Leu Val Asp Leu Leu Gly Met Ser Leu Lys Arg 1685 1690 1695 Pro Ala Leu Thr Arg Giu Leu Leu Gin Glu Met Val Ala Lys Phe Giu 1700 1705 1710 Ser Giu Thr Trp Arg Pro Leu Giu Thr Arg Val Thr Thr Ile Thr Glu 1715 1720 1725 Ser Val Glu Ala Phe Arg Thr Met Ala Gin Ala Arg His Ile Gly Lys 1730 1735 1740 Ile Val Met Ala Met Arg Asp Cys Ala Asn Ala Pro Ile Ala Pro Leu 1745 1750 1755 1760 Arg Ser Ala Phe Asp Ser Glu Gly Thr Tyr Leu Ile Thr Gly Gly Leu 1765 1770 1775 118 Gly Gly Leu Gly Leu Thr Val Ala Arg Trp Met Ile Gly Arg Gly Ala 1780 1785 1790 Arg Arg Leu Val Leu Leu Ser Arg Arg Ala Pro Ser Pro Glu Val Gin 1795 1800 1805 Gin Ala Ile Ala Val Met Asp Ala Asp Val Arg Thr Val Gin Ala Asp 1810 1815 1820 Val Ser Gin Arg Asp Glu Leu Glu Arg Val Ile Ser Ser Ile Asp Arg 1825 1830 1835 1840 Leu Arg Gly Val Ile His Ala Ala Ala Val Leu Asp Asp Ala Leu Leu 1845 1850 1855 Leu Asn Gin Thr Glu Ala His Phe Arg Asn Val Met Ala Ala Lys Ile 1860 1865 1870 Asp Gly Ala Trp Asn Leu His Leu Leu Thr Arg Asp Cys Pro Leu Asp 1875 1880 1885 His Phe Val Leu Phe Ser Ser Ala Ala Gly Leu Leu Gly Ala Pro Ala 1890 1895 1900 Gin Gly Asn Tyr Ala Ala Ala Asn Ala Phe Leu Asp Ala Leu Ala Tyr 1905 1910 1915 1920 Tyr Arg Lys Ala Gin Gly Leu Pro Ala Leu Ser Ile Gly Trp Gly Ala 1925 1930 1935 Trp Ser Glu Val Gly Leu Ala Ala Ala Gin Asp Asn Arg Gly Ser Arg 1940 1945 1950 Leu Ala Leu Arg Gly Met Glu Asn Leu Thr Pro Gin His Gly Leu Ala 1955 1960 1965 Ile Leu Glu Gin Leu Leu Asn Ser Ser Ala Cys His Val Ala Ala Met 1970 1975 1980 Pro Ile Asn Val Arg Gin Trp Arg Gin Phe Tyr Pro Lys Ala Ala Gin 1985 1990 1995 2000 Ser Ala Leu Phe Glu Leu Leu His Asp Asp Ala Ala Ser Glu Ala Asp 2005 2010 2015 Ala Pro Asn Ala Leu Arg Ala Arg Leu Gin Ser Ala Glu Pro Gin Thr 2020 2025 2030 Arg Arg Thr Leu Leu Glu Glu His Leu Gin Gin Gln Leu Ala Arg Val 2035 2040 ZUt3 119 Leu Arg Ile Asp Ser Gin Thr Ile Asp Pro Leu Arg Pro 
Leu Lys Glu 2050 2055 2060 Leu Gly Phe Asp Ser Leu Met Ala Leu Glu Phe Arg Asn Arg Leu Glu 2065 2070 2075 2080 Leu Thr Leu Gly Leu Thr Leu Pro Ala Thr Leu Ile Trp Gly His Pro 2085 2090 2095 Thr Leu Ala Gly Leu Ala Pro His Leu Ala Ser Gin Met Gly Leu Pro 2100 2105 2110 Leu Val Glu Ala Gin Ala Ala Ala Ala Ala Glu Gly Asp Ser Arg Ala 2115 2120 2125 Met Lys Thr Ala Leu Ser Gly Leu Asp Asp Met Ser Glu Glu Ala Ala 2130 2135 2140 Val Ala Ala Leu Arg Gly Ala Arg Ser 2145 2150 <210> 125 <211> 1695 <212> PRT <213> bacterium <400> 125 Met Arg Glu Lys Ile Ala Pro Met Ser Ser Val Lys Leu Ala Leu Leu 1 5 10 Ala Arg Asn Met Arg Gin Asn Ile Ala Gly Phe Asp Leu Val His Ala 25 Glu Pro Ile Ala Ile Val Gly Met Ala Cys Arg Phe Pro Gly Gly Ala 40 Lys Asn Pro Asp Ala Phe Trp Thr Leu Leu Lys Asn Gly Val Asp Gly 55 Val Thr Glu Val Pro Pro Asp Arg Trp Asn Ser Asp Gin Tyr Tyr Ser 70 75 Ser Asp Pro Asp Ala Pro Gly Lys Ala Tyr Ala Arg Tyr Ala Ala Phe 90 Leu Glu Arg Ile Asp Gly Phe Asp Ala Glu Phe Phe Gly Ile Ser Pro 100 105 110 120 Arg Glu Ala Leu Asn Met Asp Pro Gin Gin Arg Leu Leu Leu Glu Val 115 Cys Gly 145 Leu Gly Leu Val Ala 225 Ile Thr Ile Ile Asn 305 Glu Glu Ala Trr 130 Ser Phe Val Arg Ala 210 Ala Ala Phe Val Arg 290 Gly Ala Ala Leu Glu Ala Gin Ala Gly 195 Val Phe Leu Asp Leu 275 Ala Ile Val His Gin Ala Thr Tyr His 180 Pro His Ala Ser Ala 260 Leu Val rhr kla .7y 340 3er L Ala Gl Ala 165 Set Ser Leu Gly Lys 245 Ala Lys Ile Ala Asn 325 Thr Val Git Val 150 Asp Met Met Ala Gly 230 Ala Ala Arg Arg Pro 310 Ala Gly Tyr Asp 135 Phe Pro Leu Ala Cys 215 Val Arg Asp Leu Gly 295 Asn His Thr Asp 2 120 Ala Ala Ala Ala Val 200 Gin Asn Met Gly Ser 280 Ser Leu Ile 3er kla Gi Glj Arc Asr 185 Asp Set Leu Leu Tyr 265 Asp Ala Gin Asp Leu 345 Pro Ile r Ser Ile 170 Arg Thr Leu Ile Ala 250 Val Ala Ile Ala Pro 330 Gly Asp Se2 CyE 155 Gl Ile Ala Arg Leu 235 Pro Arg Leu Asn Gin 315 Ser ksp 3er Pro 140 Ala Ala Ser Cys Arg 220 Thr Asp Gly Ala Gin 300 Lys His Pro Ala 125 Gly Gin Trp Tyr Ser 205 Arg Pro Gly Glu Asp 
285 Asp Ala Val Ile Pro Pro Asp Set Leu 190 Ser Glu Glu Arg Gly 270 Gly Gly Va1 Ser 3lu 350 -ys Leu Phe Gly 175 Leu Ala Cys Gly Cys 255 Cys Asp Arg Leu Leu 335 Ile Leu Ala Gly 160 Ser Asp Leu Asp Met 240 Lys Gly Ala Ser Gin 320 Ile Glu Leu 355 360 365 Gly Ser Val Lys Thr Asn Ile Giy His Leu Glu Gly Ala Ala Gly Ile 121 370 375 380 Ala Gly Leu Ile Lys Ala Val Leu Ala Leu Gin His Arg Thr Ile Pro 385 390 395 400 Pro His Leu His Phe Arg Arg Leu Asn Pro Asn Ile Ser Leu Asp Gly 405 410 415 Ser Arg Phe Arg Ile Ala Thr Glu Ser Ser Pro Trp Thr Ser Glu Gly 420 425 430 Arg Pro Arg Leu Ala Gly Val Ser Ser Phe Gly Phe Gly Gly Ser Asn 435 440 445 Ala His Val Ile Leu Glu Glu Ala Pro Ala Leu Pro Leu Pro Lys Pro 450 455 460 Val Thr Arg Pro Gin Leu Leu Thr Leu Ser Ala Arg Thr Asp Glu Ala 465 470 475 480 Leu Gly Glu Leu Ala Gly His Phe Ala Glu Phe Leu Gin Ser His Pro 485 490 495 Asn Ala Leu Leu Ser Asp Val Cys Phe Thr Ser Gin Val Gly Arg Asp 500 505 510 Ala Tyr Ser His Arg Leu Ala Ile Thr Ala Ala Asp Ala Ala Glu Ala 515 520 525 Val Ala Ala Leu Ala Ala Ala Pro Arg Arg Glu Val Ser Leu Arg Arg 530 535 540 Arg Pro Ala Ile Ala Phe Leu Phe Thr Gly Gin Gly Ala Gin Tyr Ala 545 550 555 560 Gly Met Gly Ala Glu Leu Tyr Lys Thr Gin Pro Val Phe Arg Asp Ala 565 570 575 Leu Asp Arg Cys Ala Asp Trp Leu Arg Pro Gin Leu Asp Val Pro Leu 580 585 590 Thr Val Leu Leu Phe Glu Ser Val Ser Pro Leu His Glu Thr Ala Tyr 595 600 605 Thr Gin Pro Ala Met Phe Ala Leu Glu Trp Ala Leu Ala Gin Phe Trp 610 615 620 Leu Ser Leu Gly Val Arg Pro Asp Tyr Val Leu Gly His Ser Leu Gly 625 630 635 640 Glu T Leu A Gly L Ala Ile 705 Arg Ser Ala Gly Arg 785 Ala Leu Leu Ser Leu 865 yr V .rg L Jys P~ 6 la I 3er kia Pro Ile Ala 770 Gin Asp Thr His Leu 850 Tyr al eu ila ;75 ;y Gly Leu Ala Val1 Ala 645 Thr Cys Ala Val1 Arg Ala Gly 122 Gly Ala 650 Arg Leu Phe Val Ser Asn
V.
A
660 665 Val Val Ile Al a His Ala Ala 680 Ala Asn Ser
P
695 Thr Val Met Ala Glu 725 Asp Ala 710 Thr pro Giu Arg Ile 740 Ala 755 Val Leu Arg Thr Ser 835 Gly Ala Tyr Leu Arg Giu Leu 820 Leu Gly Gly Gln Pro Glu Cys 805 Gly Ser Leu Glu Pro Lys Thr 790 Lys Arg Lys Phe Ser Leu Gly 775 Val Leu Tyr Gly Thr 855 Pro Ile Glu Leu Ala 760 Thr Gin Phe Cys ArS 84( Al~ Se~ Ala Leu Asp 745 Ile Thr *Phe *Leu Leu 825 Ser i Gly r Arg ~ro sn 31u %.sn 730 Lys Pro Leu Giu Giu 810 Pro Asr Val Va Ser Arg I 6 Ala Pro A 700 Ala Gin P 715 Val Ser k Phe Giti Leu Val Asp Ala 780 Ser Ala 795 Ile Gly Asp Asp Trp Ser Asfl Pro 860 1 Ala Leti 875 SVal Pro al G la L.
6 le P .sp 7 sp I [is kla Ser 765 A.rg Met Pro Gly Val1 845 Asp Pro Ala lu eu lia ~rg ksp klia Leu 750 Asri Tyr Arg HiE Al 83( Lei Trj Th Ar Asp G 655 Pro A Ala L Thr V Leu 1 Phe 1 735 Ala Val Trp Thr Pro 815 i. Val 0 :i Leti SArg r Tyr g Giu rg ,eu ral [is is Ser Arg Leu 800 Thr Trp Glu Gly Pro 880 pro Phe Gin Arg Ala Arg Gly Asp Thr Phe Ser Leu Arg Ar 885 89( 895 Gly 900 Met Leu Gly Ala Arg 905 Leu Asn Ser Ala Leu Gly Asp 910 123 Val Ile Phe Glu Asn Ser Leu Thr Thr Glu Thr Pro Leu Leu His Glu 915 920 925 His Val Ile Tyr Asp Ala Val Ile Val Pro Gly Ala Trp His Val Ser 930 935 940 Ala Phe Leu Glu Ala Ala Gin Glu Val Phe Gly Pro Val Pro Cys Ala 945 950 955 960 Val Ser Asp Val Met Met Arg Gin Ala Leu Ala Ile Pro Pro Asp Thr 965 970 975 Pro Val Thr Val Gin Ala Ile Val Thr Pro Gly Glu Asp Gly Glu Ala 980 985 990 Lys Val Gin Val Phe Ser Gin Asp Gly Asp Ser Trp Lys Leu His Thr 995 1000 1005 Ala Ala Ser Leu Arg Ala Ala Thr Ala Gly Ala Val His Phe Glu Leu 1010 1015 1020 Pro Ala Gin Pro Ser Glu Val Ile Ser Gly Asp Ala Phe Tyr Gly Ala 1025 1030 1035 1040 Met Asn Ala Arg Gly Val Asp Leu Gly Pro Ala Phe Ser Trp Val Glu 1045 1050 1055 Glu Val Trp Arg Arg Asp Gly Glu Ala Leu Gly Arg Met Arg Leu Pro 1060 1065 1070 Val Ala Glu Asp Gly Ala Asn Ala Tyr Arg Leu His Pro Gly Leu Ile 1075 1080 1085 Asp Ser Cys Phe Gin Val Phe Gly Ala Thr Trp Pro Ala Glu Arg Cys 1090 1095 1100 Gin Pro Gly Ala Tyr Val Pro Val Gly Ile Glu Ala Val Arg Phe Tyr 1105 1110 1115 1120 Arg Pro Pro Ala Gly Ser Leu Arg Cys His Ala Arg Leu Arg Pro Ser 1125 1130 1135 Ser Ser Gly Pro Phe Val Gly Asp Leu Thr Leu Val Glu Glu Thr Gly 1140 1145 1150 Ala Val Ile Ala Glu Phe Ser Gly Leu Ala Val Met His Ala Gly Thr 1155 1160 1165 Leu Gin Ser Ala Gin Ser Trp Leu Gin Asp Val Gin Trp Gin Glu Cys 124 1170 1175 1180 Glu Arg Ser Thr Thr Leu Lys Ser Asp Gly Pro Gly Lys Pro Giu Asp 1185 1190 1195 1200 Trp, Leu Leu Cys Ala Gly Ala Asp Asp Val Ala Gly Leu Met Pro Gin 1205 1210 1215 Glu Leu Arg Val Val Ser Gly Val Thr Leu A-rg Gin Ala 
Leu Glu Gin 1220 1225 1230 Thr Gin Thr Leu Val Gly Arg Pro Ala Arg Leu Trp Leu Ile Thr Arg 1235 1240 1245 Gly Val His Arg Ile Ser Asp.Asp Asp Ala Thr Pro Val Asp Pro Phe 1250 1255 1260 Gin Ala Pro Leu Trp, Gly Leu Gly Gin Ala Ile Ala Arg Glu His Pro 1265 1270 1275 1280 Giu Leu Trp Gly Gly Leu Ile Asp Leu Gly Cys Asp Asn Ala Asp Ile 1285 1290 1295 Ala Ala Ala Met Leu Leu Asp Glu Ile Arg Tyr Ala Gly Asp Asp Lys 1300 1305 1310 Aia Ile Ala Leu Arg Asn Gly Arg Arg Tyr Val Arg Arg Leu Val Arg 1315 1320 1325 His Lys Giu Thr Ser Lys Arg Pro Pro Al1a Ile Ser Ala Asp Gly Val 1330 1335 1340 Tyr Leu Ile Thr Gly Gly Leu Gly Ala Leu Gly Arg Arg Val Ala Arg 1345 1350 1355 1360 Arg Leu Ile Glu Gin Gly Ala Arg Arg Leu Val Leu Val Gly Arg His 1365 1370 1375 Thr Glu Ala Val Ala Asp Leu Giu Gin Leu Gly Ala Ala Val Met Val 1380 1385 1390 Ala Ala Cys Asp Val Ser Ser Giu Gin Gin Leu Ala Ala Leu Leu Ala 1395 1400 1405 Asp Pro Arg Thr Gln Pro Leu Arg Gly Val Val His Ala Ala Gly Val 1410 1415 1420 Leu Asp Asp Gly Val Val Thr Glu Gin Thr Trp Ala Arg Phe Giu Lys 1425 1430 1435 1440 125 Val Leu Ala Pro Lys Leu Gin Gly Ala Trp Asn Leu His Gin Leu Thr 1445 1450 1455 Arg His His Ala Leu Asp Phe Phe Val Leu Phe Ser Ser Ala Ala Ser 1460 1465 1470 Leu Leu Gly Ser Ala Gly Gin Ser Asn Tyr Ser Ala Ala Asn Ala Phe 1475 1480 1485 Leu Asp Ser Leu Ala His Met Arg Arg Ala Gin Gly Leu Pro Ala Leu 1490 1495 1500 Ser Ile Asn Trp Gly Pro Trp Ala Gly Glu Gly Met Ala Ala Arg Ile 1505 1510 1515 1520 Ala Arg Gin Gly Leu Pro Gly Val Pro Leu Leu Pro Pro Glu Val Gly 1525 1530 1535 Ala Arg Ile Phe Gly Asp Leu Leu Gly Glu Thr Ala Ala Gin Ile Ala 1540 1545 1550 Val Phe Gin Val Ser Ala Glu Lys Arg Arg Ser Pro Ala Ser Asp Pro 1555 1560 1565 Gly Phe Ile Gin Gin Leu Thr Glu Ala Ala Pro Glu Arg Arg Gin Glu 1570 1575 1580 Leu Leu Gin Met Arg Ile Arg Lys Gin Ala Gly Gly Val Leu Ala Leu 1585 1590 1595 1600 Asp Ala Ser Lys Thr Leu Asp Pro Arg Arg Pro Leu Lys Glu Tyr Gly 1605 1610 1615 Leu Asp Ser Leu Met Ala Leu Asp Leu Ala Arg 
Ala Ile Gly Glu Leu 1620 1625 1630 Val Arg Lys Ser Leu Pro Ala Thr Leu Leu Tyr Asp His Pro Thr Val 1635 1640 1645 Glu Lys Leu Ala Gly His Val Leu Arg Glu Leu Gly Leu Asp Val Pro 1650 1655 1660 Ser Asp Ser Leu Val Asp Glu Val Arg Gin Leu Ser Glu Gin Glu Met 1665 1670 1675 1680 Ala Ala Phe Ile Thr Glu Thr Leu His His Leu Gly Glu Glu Arg 1685 1690 1695 126 <210> 126 <211> 1434 <212> PRT <213> bacterium <400> 126 Met Ser Asp Leu Thr Pro Leu Gin Gin Ala Val Leu Ala Leu Lys Arg is Thr Al a Ala Pro Ala Val Ala Ala Thr 145 Pro Phe Pro His *Arg Ile *Phe so Ala Pro Asp Gly Leu 130 Gly Thr Ser Asn Leu 210 Ala Val1 Trp Gly Gly Gly Leu 115 Glu Val1 Asp Thr Phe 195 Ala Arg Gly Gin Arg Lys Phe 100 Asp Arg Phe Pro Ala 180 Pro Cys LeL Met Leu Trp Met Asp Pro Ala Ile Ala 165 Ala Val Arg Asp Ala Leu Asp 70 Tyr Ala Gin Gly Gly 150 Leu Gly Asp Ser Glu Cys His 55 Al a Thr Gly Gin Arg 135 Ile Ile Arg Thr Leu 215 *Leu Arg 40 Asp Asp Arg Phe Arg 120 Pro Ser Asp Ile Ala 200 Gin Gi u 25 Phe Gly Ala Leu Phe 105 Leu Pro Thr Al a Ser 185 :ys Ser Ser Pro Ile Phe Gly 90 Gly Leu Asp Asp Tyr 170 Tyr Ser Arg Val Gly Asp Tyr 75 Gly Ile Leu Ser Asp 155 Thr Leu Ser Glu His Ala Ala Asp Phe Thr Giu Leu 140 Tyr Gly Leu Ser Cys 220 Asn Asp Ile Pro Leu Pro Val 125 Al a Ser Thr Gly Leu 205 Ser Giu Ser Arg Asp Asp Arg 110 Ala Gly Arg Gly Leu 190 Val1 Met Pro Pro Giu Pro Gly Glu Trp Ser Leu Thr 175 Gin Ala Ala Ile Glu Ile Asn Ala Val Giu Asp Lys 160 Ala Gly Val1 Leu Ala Gly Gly Val Asn Leu Ile Leu Ala Pro Glu Ser Thr lie Tyr Phe 225 230 235 240 127 Cys Arg Leu Arg Ala Met Ala Ala Asp Gly Arg Cys Lys Ser Phe Ala 245 250 255 Ala Ser Ala Asp Gly Tyr Gly Arg Gly Glu Gly Cys Gly Met Leu Val 260 265 270 Leu Lys Arg Leu Ser Asp Ala Thr Arg Asp Gly Asp Arg Ile Leu Ala 275 280 285 Leu Ile Arg Gly Ser Ala Val Asn His Gly Gly Arg Ser Asn Gly Leu 290 295 300 Thr Ala Pro Asn Gly Pro Ala Gin Glu Ala Val Ile Arg Ala Ala Leu 305 310 315 320 Lys Asn Ala Gly Met Ala Pro Ala Asp Val Asp Tyr Val Giu Ala His 325 330 335 Gly Thr 
Gly Thr Pro Leu Giy Asp Pro Ile Glu Leu Arg Ala Met Ala 340 345 350 Ala Val Leu Gly Glu Gly Arg Ala Val Asp Ser Pro Leu Ile Val Gly 355 360 365 Ser Val Lys Thr Asn Phe Gly His Leu Giu Ala Ala Ala Gly Ile Ala 370 375 380 Gly Leu Ile Lys Thr Ile Leu Ala Leu Gln His Arg Giu Ile Pro Pro 385 390 395 400 His Leu His Phe Asn Ala Pro Asn Pro His Val Leu Trp Asn Giu Leu 405 410 415 Pro Leu Lys Ile Ala Thr Ala Cys Ser Pro Trp Pro Ser Asn Gly Arg 420 425 430 Pro Arg Val Ala Gly Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ser 435 440 445 His Val Val Leu Ala Glu Ala Lys Thr Asri Val Giu Ala Lys Thr Asn 450 455 460 Val Giu Ala Lys Thr Asn Val Glu Ala Lys Thr Ser Glu Glu Val Lys 465 470 475 480 Ala Ser Val Giu Ala Lys Gly Asn Val Giu Ala Lys Ala Ser Ala Ser 485 490 49S Val Pro Leu Leu Glu Gly Asp Ser Arg Pro Arg Ser Gly Gly Gly Gly 128 S00 Pro S0S Pro Ser Arg Glu Glu Val Pro Val Pro 520 510 Asp Gin Leu Ser Gly Arg 515 His Pro 545 His His Asn Ala Phe 625 Ser Phe Ser Ala Pro 705 Val Arg Alz 53( Gir Al a Tyr Gin Thr 610 Ser Asp Arg Gly Val 690 Asp Ala Ser Glu Ala Pro Glu Leu 595 Gly Gly Glu Ser Ala 675 Gin Gly1 Gly Arg I As! Let Lei.
His 580 Leu Phe Gin Pro Phe 660 Trp Ile lal Ile .eu 140 ?Gly i Arg Ser 565 Arg Giu Ala Gly Val1 645 Val Leu Ala AlaC Leu 'I 725 Leu L Arc Asr 550 Al a Ala Ala Asp Gly 630 Phe G1u Ser eu ;ly 1l0 'hr ~eu Gb.
53~ Let Let Ala Phe Pro 615 Gin Arg Trp Gin Val1 695 His Leu Gly .i Tyr xAla 1Cys Phe Arg 600 IGly Tyr Ser Arg Ile 680 Giu Ser D~ Glu Leu A~ Leu Gly Ser Val1 585 Arg Val1 Pro Al.a ELeu 665 ksp eu e t ~sp Lrg 145 Leu Ala Ala 570 Ala Asn Arg Arg Ile 650 Ala Arg Leu GlyC Ala 730 Gly P Prc Tyr 555 Ala Ser Giu Pro Met 635 Giu Asp VJal 31n ilu 115 ~rg Leu 540 *Arg *Ser Ser Thr Lys 620 Ala Arg Leu Gin Ser 700 Val Arg Gly2 Sex Asp Leu Leu Asn 605 Leu Tyr Cys Leu Pro 685 Trp kla Ile kla Ala Gly Thr Pro 590 Arg Ala Arg Asp Ala 670 Al a Gly Ala Ile Met 750 ArS Are Arg 575 Glu Gly Phe Leu Ala 655 Asp Leu Ile kla Zys 735 kla His Phe 560 Ser Phe Val Ile Tyr 640 Ala Glu Phe Arg His 720 Arg Leu 7 Val Giu Leu Pro Leu Asp Arg Ala Lys Ala Val Leu Ala Giu Arg Gly 129 Leu Thr Thr Val Ser Val Ala Ala Ser Asn 770 Phe Ser 785 Arg Arg Ser Ser Arg Val Gly Gin 850 Asn Leu 865 Gly Asp Thr Pro Val Arg Glu Leu 930 Ala Leu 945 Trp Gin Pro Ala Leu Ala Leu Pro 1010 775 Gly Gly Gin Ile 835 Leu Arg Giu Ser Pro 915 Leu Ala Arg Val Pro 995 Phe Asp Val1 Val1 820 Ala Ser Gin Phe Ile 900 Val Ala Ser Giu Gly 980 Asp Leu Arg Phe 805 Asp Lys Thr Pro Thr 885 Giu Leu Ala Ser Arg 965 Gly Thr Ala Val Ala Leu 790 Cys Arg Leu Pro Leu Glu Arg Ser Ala 840 Gly Giu Ala 855 Val Arg Phe 870 Gin Phe Leu Asp Ser Leu Arg Arg Asp 920 Leu Tyr Val 935 Pro Asp Thr 950 Phe Trp, Phe His Pro Leu His Val Trp 1000 Asp His Arg 1015 Glu Ile As n 825 Val1 Cys Trp, Giu Arg 905 Giu Asn Arg Ala Leu 985 Glu Leu His Gin 810 Glu Pro Asp Glu Ile 890 Thr Pro Gly Leu Thr 970 Gly Ser Asn Gly Pro A-rg 780 Leu Lys Asp 795 Val Asp Val Leu Arg Gin Phe Phe Ser 845 Ala Ser Tyr 860 Ser Leu Gin 875 Ser Pro His Leu Gly Ile Glu Arg Arg 925 Gin Arg Pro 940 Asp Leu Pro 955 Ser Thr Arg Arg Lys Val Val Leu Ser 1005 Giu Leu Val 1020 Ser Asp Ala Giu 830 Thr Trp Ala Pro Asn 910 Giu Asp Thr Arg Giu 990 Leu Val Thr Phe Ser 815 Leu Val1 Val1 Met Val 895 Gly Leu Trp Tyr Ser 975 Ile Asp Leu Val1 Glu 800 His Gly Giu Ala Ala 880 Leu Leu Leu Arg Pro 960 Leu Ala Ala 
Pro Gly Ala Ala Tyr Val Glu Met Ala Leu Ala Ala Ala Lys Glu Val Phe 1025 1030 1035 1040 Ala Val Ala Gly Gly Cys Ser 1045 Pro Ser Ala Gly 1060 Phe Arg Ile Ser 1075 Leu Ala Ser Glu Ser Leu 130 Giu Ile Arg 1050 Arg Val Gin 1065 Ala Glu Gly Phe Val1 Gly Glu Ile Ser Gin Met Leu loss Leu Giu Gly 1070 Asp Trp Thr Val1 His Glu 1080 1085 His Ala Arg 1090 Val Ser Leu 1105 Ala Ala Phe Ile Ala Glu Gly Thr Met Ala 1095 Pro Thr Leu Gly 1110 Ala Ser Gin Gly 1125 Val Trp Arg Arg Al Asr Met Asp Ala Pr Arg 11E His Tyz 1130 IGly Glu 1145 1140 DAsp Lys Val Ala Pro Thr 1100 SGlu Gly Asp Asp Phe Tyr 1115 1120 *Gly Asp Thr Phe Arg Gly 1135 Ala Val Ala Arg Leu Ser 1150 *Gly Tyr Thr Leu His Pro 1165 Gly Ala Thr Leu Gly Gly 1180 Ala Ile Glu Arg Leu His 1195 1200 Val His Ala Arg Leu Thr 1215 Val Pro Asp Ala Val Arg Giu Ala Glu Set 1155 1160 Ala Leu LeU Asp Ala Cys Leu Gin Val Leu 1170 1175 Glu Gly Ser Ala Gly Pro Cys Val Pro Val 1185 1190 Cys Phe Gly Arg Pro Ala Gly Asp Leu A rg 1205 1210 Giy Arg Leu Glu Gly Asp Val Thr Leu Cys 1220 1225 Ile Leu Glu Val Gin Gly Leu Arg Aia Gin 1235 1240 Glu Tr-p Phe His Ala Met Glu Trp Glu Pro 1250 1255 Pro Thr Ala Thr Val Ser Gly Ala Trp Leu 1265 1270 1 Asp Giu Gin Val1 .275 *Ala Glu Gly 1230 Leu Glu Arg 1245 Leu Leu Ala 1260 Ile Ala Asp His Vai Gln Ser Glu Ser Ala Gly 1280 Gly Ile Ile Ser Ala Ala Val 1285 Arg Asp Ala Ala Glu Arg Ile Gly Leu Gly 1290 Pro Asp Gln Thr Pro Asn Tyr Thr Val Vai 1295 Arg Gly Vai 131 1300 1305 1310 Ile His Cys Gly Ser Leu Asp Giu Thr Giu Asp Giu Thr Asp Pro Ser 1315 1320 1325 Ala Ala Giy Gly Thr Ala Cys Giu Asp Ile Leu Arg Ile Val Gin Glu 1330 1335 1340 Phe Gly Val Gly Arg Ile Gin Leu Thr Lys Gin Ala Ser Asp Ala Glu 1345 1350 1355 1360 Ser Gin His Pro Arg Ile Trp Leu Ile Thr Ala Gly Val His Ala Giu 1365 1370 1375 His Leu Gin Met Pro Val Val Pro Ala Arg Ala Pro Val Trp, Gly Leu 1380 1385 1390 Gly Arg Thr Ile Ala Ala Glu His Pro Giu Phe Ala Cys Thr Cys Ile 1395 1400 1405 Asp Leu Asp Thr Ala Gly Glu Val Glu Val Gin Ala Leu Cys Arg 
Glu 1410 1415 1420 Ile Leu Ala Gly Ser Ser Glu Arg Gln Gly 1425 1430
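
The listing above gives each protein as three-letter residue codes interleaved with position markers. As a minimal sketch, and not part of the patent itself, the following Python helper converts such a fragment to one-letter codes so that its length can be compared with the declared `<211>` value. The function name `listing_to_one_letter` is hypothetical, and the sketch assumes only the 20 standard residues; any other token is reported back rather than guessed at.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def listing_to_one_letter(text):
    """Return (one_letter_sequence, unknown_tokens) for a listing fragment.

    Purely numeric tokens are treated as position markers and skipped.
    Tokens that are not standard three-letter codes are collected in
    unknown_tokens and left out of the sequence.
    """
    residues = []
    unknown = []
    for token in text.split():
        if token.isdigit():
            continue  # position marker, not a residue
        code = THREE_TO_ONE.get(token)
        if code is None:
            unknown.append(token)
        else:
            residues.append(code)
    return "".join(residues), unknown


if __name__ == "__main__":
    # Closing residues of SEQ ID No. 126 as printed above.
    fragment = "Ile Leu Ala Gly Ser Ser Glu Arg Gln Gly 1425 1430"
    sequence, unknown = listing_to_one_letter(fragment)
    print(sequence, len(sequence), unknown)
```

Comparing `len(sequence)` with the `<211>` field of an entry is a quick consistency check, and a non-empty `unknown` list points to tokens that need to be checked against the original publication.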

Claims (6)

1. Process for preparing a collection of recombinant vectors, wherein a collection of nucleic acids is prepared from a soil sample containing organisms, by a process comprising the following sequence of steps: I obtaining microparticles by grinding a pre-dried or pre-desiccated soil sample, followed by suspension of the microparticles in a liquid buffer medium; and extracting the nucleic acids present in the microparticles; and passage of the solution containing the nucleic acids over a molecular sieve, followed by recovery of the elution fractions enriched in nucleic acids and passage of the elution fractions enriched in nucleic acids over an anion-exchange chromatography support, followed by recovery of the elution fractions containing the purified nucleic acids, and the obtained nucleic acids are inserted into a cloning and/or expression vector.
2. Process according to claim 1, characterized in that step is followed by an additional step of: treating the microparticles suspended in a liquid buffer by sonication.
3. Process according to claim 1, characterized in that step is followed by the following additional steps: treatment of the microparticles suspended in a liquid buffer by sonication; incubation of the suspension at 37 °C after sonication in the presence of lysozyme and achromopeptidase; addition of SDS; and recovery of the nucleic acids.
4. Process according to claim 1, characterized in that step is followed by the following additional steps: homogenization of the microparticles using a step of vigorous mixing (vortex) followed by a step of simple stirring; freezing the homogeneous suspension followed by thawing; treatment of the suspension by sonication after thawing; incubation of the suspension at 37 °C after sonication in the presence of lysozyme and achromopeptidase; and addition of SDS.
5. Process according to any one of claims 1 to 4, characterized in that the nucleic acids are DNA molecules.
6. Process according to any one of claims 1 to 5, characterized in that the nucleic acids are separated as a function of their size prior to inserting them into the cloning and/or expression vector.
7. Process according to any one of claims 1 to 5, characterized in that the average size of the nucleic acids is made substantially uniform by physical rupture, prior to inserting them into the cloning and/or expression vector.
8. Process according to any one of claims 1 to 5, characterized in that the cloning and/or expression vector is of the plasmid type.
9. Process according to any one of claims 1 to 5, characterized in that the cloning and/or expression vector is of the cosmid type.
10. Process according to claim 9, characterized in that it is a cosmid which is replicative in E. coli and integrative in Streptomyces.
11. Process according to claim 10, characterized in that it is the cosmid pOS7001.
12. Process according to claim 9, characterized in that it is a cosmid which is conjugative and integrative in Streptomyces.
13. Process according to claim 12, characterized in that the cosmid is chosen from cosmids pOSV303, pOSV306 and pOSV307.
14. Process according to claim 9, characterized in that it is a cosmid which is replicative both in E. coli and in Streptomyces.
15. Process according to claim 14, characterized in that it is the cosmid pOS700R.
16. Process according to claim 9, characterized in that it is a cosmid which is replicative in E. coli and Streptomyces and conjugative in Streptomyces.
17. Process according to claim 1, characterized in that the cloning and/or expression vector is of the BAC type.
18. Process according to claim 17, characterized in that it is a BAC vector which is integrative and conjugative in Streptomyces.
19. Process according to claim 18, characterized in that the vector is chosen from BAC vectors pOSV403, pMBD-1, pMBD-2, pMBD-3, pMBD-4, pMBD-5 and pMBD-6.
20. Collection of recombinant vectors obtained by the process of any one of claims 1 to
21. Collection of recombinant vectors as obtained according to the process of any one of claims 7 to 19.
22. Collection of recombinant host cells, each of the constituent host cells of the collection comprising a nucleic acid from the collection of nucleic acids according to claim
23. Process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to claim 22, characterized in that it comprises the following steps: placing the collection of recombinant host cells in contact with a pair of primers which hybridize with the given nucleotide sequence or which hybridize with the nucleotide sequence that is structurally similar to a given nucleotide sequence; carrying out at least three amplification cycles; and detecting any nucleic acid amplified.
24. Process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to claim 22, characterized in that it comprises the following steps: placing the collection of recombinant host cells in contact with a probe which hybridizes with the given nucleotide sequence or which hybridizes with a nucleotide sequence that is structurally similar to the given nucleotide sequence; and detecting the hybrid possibly formed between the probe and the nucleic acids included in the vectors of the collection.
25. Process for identifying the production of a compound of interest by one or more recombinant host cells in a collection of recombinant host cells according to claim 22, characterized in that it comprises the following steps: culturing the recombinant host cells of the collection in a suitable culture medium; and detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured.
26. Process for selecting a recombinant host cell which produces a compound of interest in a collection of recombinant host cells according to claim 22, characterized in that it comprises the following steps: culturing recombinant host cells of the collection in a suitable culture medium; detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured; and selecting recombinant host cells which produce the compound of interest.
27. Process for producing a compound of interest, characterized in that it comprises the following steps: culturing a recombinant host cell selected according to the process of claim 26; and recovering and, where appropriate, purifying the compound produced by the said recombinant host cell.
28. A process according to any one of claims 1 and 23 to 27, substantially as hereinbefore described and/or exemplified.
DATED this 29th day of April, 2005 Aventis Pharma S.A. By DAVIES COLLISON CAVE Patent Attorneys for the Applicants
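
The primer-based screening of claim 23 can be mimicked in silico before any bench work. The following is a minimal sketch under stated assumptions, not the patented procedure: the function names `reverse_complement` and `in_silico_amplicon` and the example sequences are hypothetical, primers must match exactly, and only one strand orientation of the insert is searched.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def in_silico_amplicon(template, forward, reverse):
    """Return the region a primer pair would bound on the template, or None.

    Assumptions of this sketch: exact matching only, the forward primer
    is read on the given strand, and the reverse primer anneals to the
    opposite strand downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None  # forward primer has no exact binding site
    # The reverse primer's site appears on the given strand as its reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None  # no downstream site for the reverse primer
    return template[start:end + len(site)]


if __name__ == "__main__":
    # Hypothetical insert and primers, for illustration only.
    insert = "GGATCCATGAAACCCGGGTTTTAGCTCGAG"
    print(in_silico_amplicon(insert, "ATGAAA", "CTAAAA"))
```

Running such a check over every insert of a sequenced library gives a rough list of clones expected to yield a product; real primers tolerate mismatches, so an empty result here does not rule out amplification.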
AU21791/01A 1999-11-29 2000-11-27 Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds Ceased AU781961B2 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
FR9915032 1999-11-29
FR9915032A FR2801609B1 (en) 1999-11-29 1999-11-29 PROCESS FOR OBTAINING NUCLEIC ACIDS FROM A SAMPLE OF THE ENVIRONMENT, NUCLEIC ACIDS THUS OBTAINED AND THEIR APPLICATION TO THE SYNTHESIS OF NEW COMPOUNDS
US20980000P 2000-06-07 2000-06-07
US60/209800 2000-06-07
PCT/FR2000/003311 WO2001040497A2 (en) 1999-11-29 2000-11-27 Method for obtaining nucleic acids from an environment sample

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2005211587A Division AU2005211587A1 (en) 1999-11-29 2005-09-19 Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds

Publications (2)

Publication Number Publication Date
AU2179101A AU2179101A (en) 2001-06-12
AU781961B2 true AU781961B2 (en) 2005-06-23

Family

ID=26235170

Family Applications (2)

Application Number Title Priority Date Filing Date
AU21791/01A Ceased AU781961B2 (en) 1999-11-29 2000-11-27 Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds
AU2005211587A Abandoned AU2005211587A1 (en) 1999-11-29 2005-09-19 Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds

Country Status (9)

Country Link
EP (1) EP1268764A2 (en)
JP (1) JP2003520578A (en)
KR (1) KR20020060242A (en)
AU (2) AU781961B2 (en)
BR (1) BR0015993A (en)
CA (1) CA2393041A1 (en)
IL (1) IL149846A0 (en)
NO (1) NO20022532L (en)
WO (1) WO2001040497A2 (en)

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010047029A1 (en) * 2000-04-26 2001-11-29 Handelsman Jo E. Triaryl cation antibiotics from environmental DNA
US20040121313A1 (en) 2002-12-06 2004-06-24 Ecker David J. Methods for rapid detection and identification of bioagents in organs for transplantation
US20030027135A1 (en) 2001-03-02 2003-02-06 Ecker David J. Method for rapid detection and identification of bioagents
US7666588B2 (en) 2001-03-02 2010-02-23 Ibis Biosciences, Inc. Methods for rapid forensic analysis of mitochondrial DNA and characterization of mitochondrial DNA heteroplasmy
US7226739B2 (en) 2001-03-02 2007-06-05 Isis Pharmaceuticals, Inc Methods for rapid detection and identification of bioagents in epidemiological and forensic investigations
US8073627B2 (en) 2001-06-26 2011-12-06 Ibis Biosciences, Inc. System for indentification of pathogens
US7217510B2 (en) 2001-06-26 2007-05-15 Isis Pharmaceuticals, Inc. Methods for providing bacterial bioagent characterizing information
FR2834997B1 (en) * 2002-01-21 2004-03-19 Evialis METHOD FOR DETECTION OF OCHRATOXIN A OR CITRININ PRODUCING FUNGI
EP1386966A1 (en) 2002-07-24 2004-02-04 Libragen Method for the expression of unknown environmental DNA into adapted host cells
AU2003264077C1 (en) * 2002-08-20 2009-06-11 B.R.A.I.N. Biotechnology Research And Information Network Ag Isolation and cloning of DNA from uncultivated organisms
EP1905833B1 (en) 2002-10-08 2015-08-12 Aventis Pharma S.A. Polypeptides involved in the biosynthesis of spiramycins, nucleotide sequences coding these polypeptides and their uses
CA2508726A1 (en) 2002-12-06 2004-07-22 Isis Pharmaceuticals, Inc. Methods for rapid identification of pathogens in humans and animals
US8057993B2 (en) 2003-04-26 2011-11-15 Ibis Biosciences, Inc. Methods for identification of coronaviruses
US8158354B2 (en) 2003-05-13 2012-04-17 Ibis Biosciences, Inc. Methods for rapid purification of nucleic acids for subsequent analysis by mass spectrometry by solution capture
US7964343B2 (en) 2003-05-13 2011-06-21 Ibis Biosciences, Inc. Method for rapid purification of nucleic acids for subsequent analysis by mass spectrometry by solution capture
US20120122103A1 (en) 2003-09-11 2012-05-17 Rangarajan Sampath Compositions for use in identification of bacteria
US8097416B2 (en) 2003-09-11 2012-01-17 Ibis Biosciences, Inc. Methods for identification of sepsis-causing bacteria
US8546082B2 (en) 2003-09-11 2013-10-01 Ibis Biosciences, Inc. Methods for identification of sepsis-causing bacteria
US8163895B2 (en) 2003-12-05 2012-04-24 Ibis Biosciences, Inc. Compositions for use in identification of orthopoxviruses
WO2005073377A1 (en) * 2004-01-28 2005-08-11 Toudai Tlo, Ltd. Method of collecting dna from environmental sample
US7666592B2 (en) 2004-02-18 2010-02-23 Ibis Biosciences, Inc. Methods for concurrent identification and quantification of an unknown bioagent
JP4810533B2 (en) 2004-05-24 2011-11-09 アイビス バイオサイエンシズ インコーポレイティッド Mass spectrometry using selective ion filtration by digital thresholding.
US20050266411A1 (en) 2004-05-25 2005-12-01 Hofstadler Steven A Methods for rapid forensic analysis of mitochondrial DNA
WO2005121336A1 (en) * 2004-06-08 2005-12-22 Toagosei Co., Ltd. Method of purifying environmental dna and method of efficiently screening protein-encoding gene from environmental dna
US7811753B2 (en) 2004-07-14 2010-10-12 Ibis Biosciences, Inc. Methods for repairing degraded DNA
US20060205040A1 (en) 2005-03-03 2006-09-14 Rangarajan Sampath Compositions for use in identification of adventitious viruses
US8084207B2 (en) 2005-03-03 2011-12-27 Ibis Bioscience, Inc. Compositions for use in identification of papillomavirus
US8026084B2 (en) 2005-07-21 2011-09-27 Ibis Biosciences, Inc. Methods for rapid identification and quantitation of nucleic acid variants
CA2663029C (en) 2006-09-14 2016-07-19 Ibis Biosciences, Inc. Targeted whole genome amplification method for identification of pathogens
EP2126132B1 (en) 2007-02-23 2013-03-20 Ibis Biosciences, Inc. Methods for rapid foresnsic dna analysis
US9598724B2 (en) 2007-06-01 2017-03-21 Ibis Biosciences, Inc. Methods and compositions for multiple displacement amplification of nucleic acids
EP2349549B1 (en) 2008-09-16 2012-07-18 Ibis Biosciences, Inc. Mixing cartridges, mixing stations, and related kits, and system
US8534447B2 (en) 2008-09-16 2013-09-17 Ibis Biosciences, Inc. Microplate handling systems and related computer program products and methods
US8148163B2 (en) 2008-09-16 2012-04-03 Ibis Biosciences, Inc. Sample processing units, systems, and related methods
US8637249B2 (en) 2008-11-14 2014-01-28 Gen-Probe Incorporated Compositions, kits and methods for detection of Campylobacter nucleic acid
WO2010093943A1 (en) 2009-02-12 2010-08-19 Ibis Biosciences, Inc. Ionization probe assemblies
US9194877B2 (en) 2009-07-17 2015-11-24 Ibis Biosciences, Inc. Systems for bioagent indentification
US8950604B2 (en) 2009-07-17 2015-02-10 Ibis Biosciences, Inc. Lift and mount apparatus
ES2628739T3 (en) 2009-10-15 2017-08-03 Ibis Biosciences, Inc. Multiple displacement amplification
EP2325312A1 (en) * 2009-11-19 2011-05-25 Qiagen GmbH Method for selective enrichment and isolation of microbial and optional additionally of viral nucleic acids

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4900677A (en) * 1986-09-26 1990-02-13 E. I. Du Pont De Nemours And Company Process for rapid isolation of high molecular weight DNA
KR19990008000A (en) * 1995-04-24 1999-01-25 로버트 에스. 화이트 헤드 How to create a new metabolic pathway and screen it
US6261842B1 (en) * 1997-10-23 2001-07-17 Wisconsin Alumni Research Foundation Microorganism genomics, compositions and methods related thereto
EP1090113A1 (en) * 1998-06-23 2001-04-11 Biosearch Italia S.p.A. Methods for transferring the capability to produce a natural product into a suitable production host

Also Published As

Publication number Publication date
WO2001040497A3 (en) 2002-10-17
JP2003520578A (en) 2003-07-08
AU2179101A (en) 2001-06-12
NO20022532D0 (en) 2002-05-28
WO2001040497A2 (en) 2001-06-07
BR0015993A (en) 2002-08-06
EP1268764A2 (en) 2003-01-02
AU2005211587A1 (en) 2005-10-13
NO20022532L (en) 2002-06-26
CA2393041A1 (en) 2001-06-07
IL149846A0 (en) 2002-11-10
KR20020060242A (en) 2002-07-16

Similar Documents

Publication Publication Date Title
AU781961B2 (en) Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds
CN108368517B (en) Methods and compositions for rapid plant transformation
DK2271666T3 (en) NRPS-PKS GROUP AND ITS MANIPULATION AND APPLICABILITY
JP2001197895A (en) Method and composition for cell technology and metabolism technology
AU2017233862B2 (en) Methods and compositions for increased double stranded RNA production
KR20100039443A (en) Compositions and methods relating to the daptomycin biosynthetic gene cluster
CN109843909A (en) The cell and method of rhamnolipid are generated using the glucose transporter of substitution
JP2009539368A (en) Plasmid RK2 system wide host range cloning vector useful for transfer of metagenomic library to various bacterial species
US7255989B1 (en) Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds
CN108753674B (en) Gene cluster for regulating and controlling milbemycin synthesis, recombinant streptomycete, and preparation method and application thereof
CN108456703B (en) Method for heterogeneously expressing epothilone
WO2002059322A9 (en) Compositions and methods relating to the daptomycin biosynthetic gene cluster
CN110857447B (en) Method for increasing yield of milbemycins A3/A4 or derivatives thereof
US20020164747A1 (en) Gene cluster for ramoplanin biosynthesis
Sarovich et al. pPSX: A novel vector for the cloning and heterologous expression of antitumor antibiotic gene clusters
KR102159415B1 (en) Uk-2 biosynthetic genes and method for improving uk-2 productivity using the same
NZ519883A (en) Extraction of DNA from microorganisms in dry soil samples thereby identifying the diversity of the organisms present
KR101189475B1 (en) Genes and proteins for biosynthesis of tricyclocompounds
WO2004018635A2 (en) Myxococcus xanthus bacteriophage mx9 transformation and integration system
KR20100034055A (en) Vectors and methods for cloning gene clusters or portions thereof
US20030171562A1 (en) Genes and proteins for the biosynthesis of polyketides
JP3944577B2 (en) Method for producing recombinant proteins in Rhodococcus bacteria
MXPA02003819A (en) Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds.
CN107164394B (en) Biosynthetic gene cluster of atypical keratinocyte compound nenestatin A and application thereof
KR100952754B1 (en) A gene cluster encoding antibiotic activity from soil metagenome

Legal Events

Date Code Title Description
MK6 Application lapsed section 142(2)(f)/reg. 8.3(3) - pct applic. not entering national phase
TH Corrigenda

Free format text: IN VOL 15, NO 40, PAGE(S) 8688-8692 UNDER THE HEADING APPLICATIONS LAPSED, REFUSED OR WITHDRAWN PLEASE DELETE ALL REFERENCE TO APPLICATION NOS. 21791/01, 50796/01, 12935/01, 23577/01, 16879/01, 16877/01, 16875/01, 15084/01 AND 15090/01