WO1999058722A1 - Characterization of interactions between molecular interaction sites of RNA and ligands therefor

Characterization of interactions between molecular interaction sites of RNA and ligands therefor

Info

Publication number
WO1999058722A1
Authority
WO
WIPO (PCT)
Prior art keywords
rna
molecular interaction
compounds
organic compounds
interaction site
Prior art date
Application number
PCT/US1999/010510
Other languages
French (fr)
Inventor
Richard Griffey
Venkatraman Mohan
Original Assignee
Isis Pharmaceuticals, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Isis Pharmaceuticals, Inc.
Priority to AU39010/99A
Publication of WO1999058722A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction

Definitions

  • the present invention is directed to methods of identifying compounds which bind to molecular interaction sites of nucleic acids, especially RNA.
  • the present invention is also directed to the numerical representations of the three dimensional structures of molecular interaction sites and the compounds which interact with those sites.
  • nucleic acids has been recognized as a valid strategy for interference with biological pathways and the treatment of disease.
  • DNA: deoxyribonucleic acids
  • RNA: ribonucleic acids
  • a wide variety of "small" molecules, oligomers and oligonucleotides have been shown to possess binding affinity for nucleic acids.
  • the vast majority of experience in interfering with nucleic acid function has been via the specific binding of ligands to a particular base, base pair, and/or primary sequence of bases in the nucleic acid target.
  • Some compounds have also demonstrated a composite specificity that arises from recognition and interactions with both the primary and secondary structural features of the nucleic acid, such as preferential binding to A-T base pairs in the DNA minor groove, with little or no binding to corresponding RNA sequences.
  • Many approaches to predicting RNA structure have been discussed in the scientific literature. Essentially, these involve sequencing and genomic analysis of nucleic acids, such as RNA, as a first step to establish the primary sequence structure and potential folded structures of the target. A second step entails definition of structural constraints such as base pairing and long range interactions among bases based on information derived from cross-linking, biochemical and genetic structure-function studies. This information, together with modeling and simulation software, has allowed scientists to predict three dimensional models of RNA and DNA. While such models may not be as powerful as X-ray crystal structures, they have been useful in ascertaining some structural features and structure-function relationships.
  • a hairpin motif comprising a double helical stem and a single-stranded loop is believed to be one of the simplest yet most important structural elements in nucleic acids.
  • Such hairpin structures are proposed to be nucleation sites and serve as major building blocks for the folded three dimensional structure of RNAs. Shen, et al., FASEB J., 1995, 9, 1023. Hairpins are also involved in specific interactions with a variety of proteins to regulate gene expression.
  • Nucleic acid hairpin structures have therefore been widely studied by NMR, molecular modeling techniques such as constrained molecular dynamics and distance geometry (Cheong, et al, Nature, 1990, 346, 680 and Cain, et al, Nuc.
  • MC-SYM is yet another approach to predicting the three dimensional structure of RNAs using a constraint-satisfaction method.
  • Major, et al., Proc. Natl. Acad. Sci., 1993, 90, 9408.
  • the MC-SYM program is an algorithm based on constraint satisfaction that searches conformational space for all models that satisfy query input constraints, and is described in, for example, Cedergren, et al., RNA Structure And Function, 1998, Cold Spring Harbor Lab. Press, p. 37-75.
  • Three dimensional structures of RNA are produced by this method by the stepwise addition of a nucleotide having one or several different conformations to a growing oligonucleotide model.
  • Westhof and Altman have described the generation of a three-dimensional working model of M1 RNA, the catalytic RNA subunit of RNase P from E. coli, via an interactive computer modeling protocol.
  • This modeling protocol incorporated data from chemical and enzymatic protection experiments, phylogenetic analysis, studies of the activities of mutants and the kinetics of reactions catalyzed by the binding of substrate to M1 RNA. Modeling was performed for the most part as described in the literature. Westhof, et al., in "Theoretical Biochemistry and Molecular Biophysics," Beveridge and Lavery (eds.), Adenine, NY, 1990, 399.
  • a method to model nucleic acid hairpin motifs has been developed based on a set of reduced coordinates for describing nucleic acid structures and a sampling algorithm that equilibrates structures using Monte Carlo (MC) simulations. Tung, Biophysical J., 1997, 72, 876, incorporated herein by reference.
  • the stem region of a nucleic acid can be adequately modeled by using a canonical duplex formation.
  • an algorithm that is capable of generating structures of single stranded loops with a pair of fixed ends was created. This allows efficient structural sampling of the loop in conformational space.
  • the comparison of molecular interaction sites of RNA with compounds is achieved through comparison of numerical representations of the three-dimensional structure of the molecular interaction site with the three dimensional structure of the ligands in a fashion such that such interactions can be compared as to quality.
  • Another object of the present invention is the preparation of hierarchies of ligands ranked or ordered in accordance with their ability to interact with molecular interaction sites of RNA and other nucleic acid targets.
  • Yet another object of the present invention is the establishment of databases of the numerical representations of three-dimensional structures of molecular interaction sites of nucleic acids and three-dimensional structures of libraries of ligands.
  • such databases and libraries provide powerful tools for the elucidation of structure and interactions of molecular interaction sites with potential ligands and predictions thereof.
  • the present invention is directed to methods of identifying compounds which bind to a molecular interaction site of a nucleic acid comprising providing a numerical representation of the three-dimensional structure of the molecular interaction site and providing a compound data set comprising numerical representations of the three dimensional structures of a plurality of organic compounds.
  • the numerical representation of the molecular interaction site is then compared with members of the compound data set to generate a hierarchy of organic compounds ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site.
  • the present invention is also directed to data sets comprising the numerical representations of the three dimensional structures of molecular interaction sites and to the numerical representations of the three dimensional structure of a plurality of organic compounds.
  • the present invention is directed to methods of identifying compounds which bind to a molecular interaction site of nucleic acids. They comprise providing a numerical representation of the three dimensional structure of the molecular interaction site, providing a compound data set comprising numerical representations of the three dimensional structures of a plurality of organic compounds, and comparing the numerical representation of the molecular interaction site with members of the compound data set to generate a hierarchy of organic compounds which is ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site.
  • Figure 1 shows exemplary compounds which were docked to TAR with subsequent evaluation of the solvation/desolvation energy.
  • Figure 2 shows the target RNA for 4.5S-P48.
  • Figure 3A shows a representative demonstration of cap-dependent translation of three DNA plasmids with a wheat germ lysate system: a) a luciferase gene with a 9 base leader sequence before the AUG start codon; b) translation of a construct with the TAR RNA structure adjacent to the cap; c) translation of a construct with the TAR RNA structure separated from the cap by a 9 base leader sequence. Solid bars: no added m7G. Hatched bars: added m7G.
  • Figure 3B shows an exemplary inhibition of translation of an mRNA construct containing the TAR RNA structure by a 39 amino acid tat peptide: a) translation of a luciferase mRNA with a 9 base leader sequence with and without 10 µM added tat peptide; b) translation of luciferase mRNA containing the TAR RNA structure adjacent to the cap; c) translation of the luciferase/TAR RNA construct with a 9 base leader in the presence/absence of 10 µM tat peptide.
  • Figure 4 shows an exemplary dose-dependent inhibition of translation of a luciferase mRNA construct containing a TAR RNA structure in the 5'-UTR by ACD 00001199 (DecpBlue-3).
  • Solid line: inhibition of translation of the control luc+9 plasmid.
  • Dashed line: inhibition of expression of the luc+9 mRNA containing the TAR RNA structure of the 5'-UTR.
  • Figure 5 shows a representative lowest energy structure of paromomycin (dark grey) bound to bacterial 16S ribosomal A site (not shown) identified using the QXP method for the lowest energy conformers.
  • the target RNA was held rigid whereas the paromomycin was treated as fully flexible.
  • the structure obtained using NMR is shown in light grey.
  • Figure 6 shows a representative correlation between the observed rms deviation and QXP energy scores obtained for the bacterial 16S ribosomal A site bound to paromomycin. 11-15 represent separate runs.
  • a molecular interaction site is a region of a nucleic acid which has secondary structure.
  • the molecular interaction site is conserved between a plurality of different taxonomic species.
  • the nucleic acid can be either eukaryotic or prokaryotic.
  • the nucleic acid is preferably mRNA, pre-mRNA, tRNA, rRNA, or snRNA.
  • the RNA can be viral, fungal, parasitic, bacterial, or yeast.
  • the molecular interaction site is present in a region of an RNA which is highly conserved among a plurality of taxonomic species. Molecular interaction sites are described in further detail in U.S. Application Serial No.
  • RNA targets may be derived from a number of sources.
  • RNA targets can be identified by any means, rendered into three dimensional representations and employed for the identification of compounds which can interact with them to effect modulation of the RNA.
  • the three dimensional structure of a molecular interaction site can be manipulated as a numerical representation.
  • Computer software that provides one skilled in the art with the ability to design molecules based on the chemistry being performed and on available reaction building blocks is commercially available.
  • Software packages from companies such as, for example, Tripos (St. Louis, MO), Molecular Simulations (San Diego, CA), MDL Information Systems (San Leandro, CA) and Chemical Design (NJ) provide means for computational generation of structures. These software products also provide means for evaluating and comparing computationally generated molecules and their structures. In silico collections of molecular interaction sites can be generated using the software from any of the above-mentioned vendors and others which are or may become available.
  • a set of structural constraints for the molecular interaction site of the RNA can be generated from biochemical analyses such as, for example, enzymatic mapping and chemical probes, and from genomics information such as, for example, covariance and sequence conservation. Information such as this can be used to pair bases in the stem or other region of a particular secondary structure. Additional structural hypotheses can be generated for noncanonical base pairing schemes in loop and bulge regions.
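  • As an illustration of how pairing constraints of this kind might be checked against a sequence, the following minimal Python sketch flags proposed stem pairs that are neither Watson-Crick nor G-U wobble pairs. The function name, the 0-based indexing, and the choice to treat G-U as an allowed stem pair are assumptions made for this example; they are not taken from the application.

```python
# Hypothetical helper: validate proposed base-pair constraints for a stem.
# Treating G-U wobble pairs as allowed is an assumption of this sketch.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def find_noncanonical_pairs(sequence, pairs):
    """Return the (i, j) constraints (0-based) whose bases are not in ALLOWED_PAIRS.

    Pairs returned here are candidates for the noncanonical pairing
    hypotheses mentioned for loop and bulge regions.
    """
    seq = sequence.upper()
    return [(i, j) for i, j in pairs if (seq[i], seq[j]) not in ALLOWED_PAIRS]


if __name__ == "__main__":
    hairpin = "GGCGAAAGCC"  # made-up example sequence
    print(find_noncanonical_pairs(hairpin, [(0, 9), (1, 8), (2, 7), (3, 6)]))
```

Pairs that fail the check are not necessarily wrong; they simply fall outside the stem-pairing rules assumed here.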
  • a Monte Carlo search procedure can sample the possible conformations of the RNA consistent with the program constraints and produce three dimensional structures.
  • the present invention preferably employs computer software that allows the construction of three dimensional models of RNA structure, the construction of three dimensional, in silico representations of a plurality of organic compounds, "small" molecules, polymeric compounds, oligonucleotides and other nucleic acids, screening of such in silico representations against RNA molecular interaction sites in silico, scoring and identifying the best potential binders from the plurality of compounds, and finally, synthesizing such compounds in a combinatorial fashion and testing them experimentally to identify new ligands for such targets.
  • an automated computational search algorithm, such as those described above, is used to predict all of the allowed three dimensional molecular interaction site structures, preferably from RNA, which are consistent with the biochemical and genomic constraints specified by the user. Based, for example, on their root-mean-squared deviation values, these structures are clustered into different families.
  • a representative member or members of each family can be subjected to further structural refinement via molecular dynamics with explicit solvent and cations.
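  • The clustering of predicted structures into families by root-mean-squared deviation can be sketched as follows. This is a minimal illustration only: it assumes the structures are already superimposed and have identical atom ordering, and it uses a simple greedy "leader" clustering with a user-chosen cutoff. Neither the clustering method nor the names `rmsd` and `cluster_into_families` come from the application.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes both structures are already superimposed; no alignment is done.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def cluster_into_families(structures, cutoff):
    """Greedy leader clustering (an assumed method, chosen for brevity).

    Each structure joins the first family whose representative (its first
    member) lies within `cutoff`; otherwise it founds a new family.
    """
    families = []
    for structure in structures:
        for family in families:
            if rmsd(family[0], structure) <= cutoff:
                family.append(structure)
                break
        else:
            families.append([structure])
    return families
```

The first member of each family can then serve as the representative passed on to further refinement.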
  • Structural enumeration and representation by these software programs is typically done by drawing molecular scaffolds and substituents in two dimensions. Once drawn and stored in the computer, these molecules may be rendered into three dimensional structures using algorithms present within the commercially available software.
  • MC-SYM is used to create three dimensional representations of the molecular interaction site.
  • the rendering of two dimensional structures of molecular interaction sites into three dimensional models typically generates a low energy conformation or a collection of low energy conformers of each molecule.
  • the end result of these commercially available programs is the conversion of a nucleic acid sequence containing a molecular interaction site into families of similar numerical representations of the three dimensional structures of the molecular interaction site. These numerical representations form an ensemble data set.
  • the three dimensional structures of a plurality of compounds can be designated as a compound data set comprising numerical representations of the three dimensional structures of the compounds.
  • "Small" molecules in this context refer to non-oligomeric organic compounds.
  • Two dimensional structures of compounds can be converted to three dimensional structures, as described above for the molecular interaction sites, and used for querying against three dimensional structures of the molecular interaction sites.
  • the two dimensional structures of compounds can be generated rapidly using structure rendering algorithms commercially available.
  • the three dimensional representation of the compounds which are polymeric in nature, such as oligonucleotides or other nucleic acid structures, may be generated using the literature methods described above.
  • a three dimensional structure of "small" molecules or other compounds can be generated and a low energy conformation can be obtained from a short molecular dynamics minimization.
  • These three dimensional structures can be stored in a relational database.
  • the compounds upon which three dimensional structures are constructed can be proprietary, commercially available, or virtual.
  • a compound data set comprising numerical representations of the three dimensional structure of a plurality of organic compounds is provided by, for example, Converter (MSI, San Diego) from two dimensional compound libraries generated by, for example, a computer program modified from commercial programs.
  • Other suitable databases can be constructed by converting two dimensional structures of chemical compounds into three dimensional structures, as described above.
  • the software is described in greater detail in U.S. Application Serial No. 09/076,405, filed May 12, 1998, which is assigned to the assignee of the present application, and which is incorporated herein by reference in its entirety.
  • the end result is the conversion of two dimensional structures of organic compounds into numerical representations of the three dimensional structures of a plurality of organic compounds.
  • the numerical representations of the molecular interaction sites are compared with members of the compound data set to generate a hierarchy of the organic compounds.
  • the hierarchy is ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site.
  • the comparing is carried out seriatim upon the members of the compound data set.
  • the comparison can be performed with a plurality of molecular interaction sites at the same time.
  • DOCK is a software program that allows structure-based database searches to find and identify molecules that are expected to bind to a receptor of interest. Kuntz, et al., Acc. Chem. Res., 1994, 27, 117, and
  • DOCK allows the screening of a large collection of molecules whose three dimensional structures have been generated in silico, i.e., in computer readable format, but for which no prior knowledge of interactions with the ligands is available. DOCK, therefore, is a significant tool to the process of discovering new ligands to a molecule of interest and is presently preferred for use herein.
  • the DOCK program has been widely applied to protein targets and the identification of ligands that bind to them. Typically, new classes of molecules that bind to known targets have been identified, and later verified by in vitro experiments.
  • the DOCK software program consists of several modules, including SPHGEN (Kuntz, et al., J. Mol. Biol., 1982, 161, 269) and CHEMGRID (Meng, et al., J. Comput. Chem., 1992, 13, 505). SPHGEN generates clusters of overlapping spheres that describe the solvent-accessible surface of the binding pocket within the target receptor. Each cluster represents a possible binding site for small molecules.
  • CHEMGRID precalculates and stores in a grid file the information necessary for force field scoring of the interactions between binding molecule and target.
  • the scoring function approximates molecular mechanics interaction energies and consists of van der Waals and electrostatic components.
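  • A scoring function with van der Waals and electrostatic components can be illustrated with the generic sketch below, which sums a 12-6 Lennard-Jones term and a Coulomb term over all ligand-target atom pairs. It is not the DOCK or CHEMGRID implementation (which precalculates values on a grid). The single `eps` and `sigma` values, the constant dielectric of 4.0, and the function names are placeholder assumptions.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), a common molecular mechanics unit choice.
COULOMB_K = 332.06


def pair_energy(r, eps, sigma, q1, q2, dielectric=4.0):
    """12-6 Lennard-Jones plus Coulomb energy for one atom pair at distance r (r > 0)."""
    ratio6 = (sigma / r) ** 6
    van_der_waals = 4.0 * eps * (ratio6 * ratio6 - ratio6)
    electrostatic = COULOMB_K * q1 * q2 / (dielectric * r)
    return van_der_waals + electrostatic


def interaction_score(ligand_atoms, target_atoms, eps=0.1, sigma=3.4):
    """Sum pair energies over all ligand/target atom pairs.

    Each atom is (x, y, z, partial_charge). Using one eps/sigma for every
    pair is a simplification made for this sketch.
    """
    total = 0.0
    for lx, ly, lz, lq in ligand_atoms:
        for tx, ty, tz, tq in target_atoms:
            r = math.dist((lx, ly, lz), (tx, ty, tz))
            total += pair_energy(r, eps, sigma, lq, tq)
    return total
```

More negative totals indicate more favorable interactions under this sign convention.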
  • DOCK uses the selected cluster of spheres to orient ligand molecules in the targeted site on the receptor. Each molecule within a previously generated three dimensional database is tested in thousands of orientations within the site, and each orientation is evaluated by the scoring function. Only that orientation with the best score for each compound so screened is stored in the output file. Finally, all compounds of the database are ranked in a hierarchy, e.g., ordered by scores, and a collection of the best candidates may then be screened experimentally.
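  • The bookkeeping described above (keep only the best-scoring orientation per compound, then order the compounds by score) can be sketched in a few lines. The function name and the input layout are hypothetical, and the sketch assumes that a lower (more negative) score is better.

```python
def rank_compounds(orientation_scores):
    """Build a ranked hierarchy from per-orientation scores.

    `orientation_scores` maps a compound identifier to the list of scores
    obtained for its tested orientations. Lower is assumed to be better.
    Returns (compound_id, best_score) tuples, best compound first.
    """
    best = {
        compound_id: min(scores)
        for compound_id, scores in orientation_scores.items()
        if scores  # skip compounds with no scored orientation
    }
    return sorted(best.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up scores for three made-up compounds.
    print(rank_compounds({"A": [-10.0, -30.0], "B": [-50.0, -5.0], "C": [-1.0]}))
```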
  • Using DOCK, numerous ligands have been identified for a variety of protein targets. Recent efforts in this area have resulted in reports of the use of DOCK to identify and design small molecule ligands that exhibit binding specificity for nucleic acids such as RNA double helices. While RNA plays a significant role in many diseases such as AIDS, viral and bacterial infections, few studies have been made on small molecules capable of specific RNA binding. Compounds possessing specificity for the RNA double helix, based on the unique geometry of its deep major groove, were identified using the DOCK methodology. Chen, et al., Biochemistry, 1997, 36, 11402 and Kuntz, et al., Acc. Chem. Res., 1994, 27, 117. Recently, the application of DOCK to the problem of ligand recognition in DNA quadruplexes has been reported. Chen, et al., Proc. Natl. Acad. Sci., 1996, 93, 2635.
  • QXP is a method that permits flexible ligand docking calculations (McMartin, C. and Bohacek, R.S., J. Comput.-Aided Mol. Design, 1997, 11, 333). In this method, full conformational searches on flexible ligands are carried out.
  • QXP search algorithms employ the Monte Carlo perturbation technique with energy minimization in Cartesian space. An additional fast search step is introduced between the initial perturbation and energy minimization. This method is also presently preferred for use herein.
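  • The general idea of a Monte Carlo perturbation followed by energy minimization can be illustrated on a toy energy function. The sketch below is a generic Monte Carlo-minimization loop with Metropolis acceptance, not the QXP algorithm: it omits the fast search step, uses a crude numerical gradient descent as the minimizer, and every name and parameter value is an assumption chosen for the example.

```python
import math
import random


def minimize(energy, x, step=0.01, iters=200, h=1e-5):
    """Crude gradient descent using a central-difference numerical gradient."""
    x = list(x)
    for _ in range(iters):
        grad = []
        for i in range(len(x)):
            plus, minus = x[:], x[:]
            plus[i] += h
            minus[i] -= h
            grad.append((energy(plus) - energy(minus)) / (2 * h))
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


def monte_carlo_minimize(energy, start, n_steps=50, kick=2.0, kT=0.6, seed=0):
    """Perturb randomly, minimize, then accept or reject (Metropolis criterion).

    Returns the lowest-energy coordinates found and their energy.
    """
    rng = random.Random(seed)
    current = minimize(energy, start)
    e_current = energy(current)
    best, e_best = current, e_current
    for _ in range(n_steps):
        trial = [xi + rng.uniform(-kick, kick) for xi in current]
        trial = minimize(energy, trial)
        e_trial = energy(trial)
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def double_well(x):
    """Toy one-dimensional energy with a shallow well near +1 and a deeper one near -1."""
    return (x[0] ** 2 - 1) ** 2 + 0.3 * x[0]
```

Starting `monte_carlo_minimize(double_well, [1.0])` in the shallow well shows the purpose of the perturbation step: plain minimization stays in the starting well, whereas the random kicks let the search reach the deeper one.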
  • individual compounds to be used in these methods are designated as mol files, for example, and combined into a collection of in silico representations using appropriate computer software, such as the software described in greater detail in U.S. Application Serial No. 09/076,405, filed May 12, 1998, which is assigned to the assignee of the present application, and which is incorporated herein by reference in its entirety.
  • These two dimensional mol files are exported and converted into three dimensional structures using commercial software such as Converter (Molecular Simulations Inc., San Diego) or equivalent software, as described above.
  • Atom types suitable for use with a docking program such as DOCK or QXP are assigned to all atoms in the three dimensional mol file using software such as, for example, Babel, or with other equivalent software.
  • a low-energy conformation of each molecule is generated with software such as Discover (MSI, San Diego).
  • An orientation search is performed by bringing each compound of the plurality of compounds into proximity with the molecular interaction site in many orientations using DOCK or QXP.
  • a contact score is determined for each orientation, and the optimum orientation of the compound is subsequently used.
  • the conformation of the compound can be determined from a template conformation of the scaffold determined previously.
  • the interaction of a plurality of compounds and molecular interaction sites is examined by comparing the numerical representations of the molecular interaction sites with members of the compound data set.
  • a plurality of compounds such as those generated by computer programs or otherwise, is compared to the molecular interaction site and allowed to undergo random "motions" among the dihedral bonds of the compounds.
  • about 20,000 to 100,000 compounds are compared to at least one molecular interaction site.
  • 20,000 compounds are compared to about five molecular interaction sites and scored. Individual conformations of the three dimensional structures are placed at the target site in many orientations.
  • the compounds and molecular interaction sites are allowed to be "flexible" such that the optimum hydrogen bonding, electrostatic, and van der Waals contacts can be realized.
  • the energy of the interaction is calculated and stored for 10-15 possible orientations of the compounds and molecular interaction sites.
  • QXP methodology allows true flexibility in both the ligand and target and is presently preferred.
  • the relative weights of each energy contribution are updated constantly to ensure that the calculated binding scores for all compounds reflect the experimental binding data.
  • the binding energy for each orientation is scored on the basis of hydrogen bonding, van der Waals contacts, electrostatics, solvation/desolvation, and the quality of the fit.
  • the lowest-energy van der Waals, dipolar, and hydrogen bonding interactions between the compound and the molecular interaction site are determined, and summed. In preferred embodiments, these parameters can be adjusted according to the results obtained empirically.
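  • A weighted sum of energy contributions with adjustable weights might look like the sketch below. The term names, the default weights of 1.0, and the function name are illustrative assumptions; the application does not specify numerical weights.

```python
# Hypothetical default weights; in practice these would be tuned against
# experimental binding data.
DEFAULT_WEIGHTS = {
    "hydrogen_bond": 1.0,
    "van_der_waals": 1.0,
    "electrostatic": 1.0,
    "solvation": 1.0,
}


def weighted_binding_score(terms, weights=None):
    """Combine per-term energies into a single score using adjustable weights.

    `terms` maps a term name to its energy for one orientation.
    Raises KeyError if a term has no weight defined.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight defined for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())
```

Passing a different `weights` mapping re-scores the same orientation without recomputing the individual energy terms.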
  • the binding energies for each molecule against the target are output to a relational database.
  • the relational database contains a hierarchy of the compounds ranked in accordance with the ability of the compounds to form physical interactions with the molecular interaction site.
  • the higher ranked compounds are better able to form physical interactions with the molecular interaction site.
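  • Storing binding energies in a relational database and reading back a ranked hierarchy can be sketched with Python's built-in `sqlite3` module. The table name, column names, and function name are hypothetical, and the sketch assumes that a lower binding energy means a higher rank.

```python
import sqlite3


def store_and_rank(binding_energies, target_id):
    """Store (compound_id, target_id, energy) rows and return a ranking for one target.

    Uses an in-memory SQLite database; the schema is an assumption of this
    sketch. Returns (compound_id, binding_energy) tuples, best first.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE docking_result ("
            "compound_id TEXT, target_id TEXT, binding_energy REAL)"
        )
        conn.executemany(
            "INSERT INTO docking_result VALUES (?, ?, ?)", binding_energies
        )
        rows = conn.execute(
            "SELECT compound_id, binding_energy FROM docking_result "
            "WHERE target_id = ? ORDER BY binding_energy ASC",
            (target_id,),
        ).fetchall()
    finally:
        conn.close()
    return rows
```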
  • the highest ranking, i.e., the best fitting, compounds
  • those compounds which are likely to have desired binding characteristics based on binding data are selected for synthesis.
  • the highest ranking 5% are selected for synthesis.
  • the highest ranking 10% are selected for synthesis.
  • the highest ranking 20% are selected for synthesis.
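  • Selecting the highest ranking 5%, 10%, or 20% of a hierarchy reduces to slicing an already ranked list. The helper below is a minimal sketch; its name and the decision to round up (so that small libraries still yield at least one compound) are assumptions.

```python
def select_top_percent(ranked_ids, percent):
    """Return the top `percent` of an already ranked list (best first).

    `percent` is an integer from 1 to 100. The count is rounded up using
    integer arithmetic to avoid floating-point surprises.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    count = (len(ranked_ids) * percent + 99) // 100
    return ranked_ids[:count]


if __name__ == "__main__":
    ranked = [f"compound_{i}" for i in range(20)]  # made-up identifiers
    for pct in (5, 10, 20):
        print(pct, select_top_percent(ranked, pct))
```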
  • the synthesis of the selected compounds can be automated using a parallel array synthesizer or prepared using solution-phase or other solid-phase methods and instruments.
  • the interaction of the highly ranked compounds with the nucleic acid containing the molecular interaction site is assessed as described below.
  • the interaction of the highly ranked organic compounds with the nucleic acid containing the molecular interaction site can be assessed by numerous methods known to those skilled in the art.
  • the highest ranking compounds can be tested for activity in high-throughput (HTS) functional and cellular screens.
  • HTS assays for each target RNA can be determined by scintillation proximity, precipitation, luminescence-based formats, filtration based assays, colorimetric assays, and the like.
  • Lead compounds can then be scaled up and tested in animal models for activity and toxicity.
  • the assessment preferably comprises mass spectrometry of a mixture of the nucleic acid and at least one of the compounds or a functional bioassay.
  • the highest ranking 20% of compounds from the hierarchy generated using the DOCK program or QXP are used to generate a further data set of three dimensional representations of organic compounds comprising compounds which are chemically related to the compounds ranking high in the hierarchy.
  • additional compounds up to about 20%
  • This process ensures that small errors in the molecular interaction sites are not propagated into the compound identification process.
  • the resulting structure/score data from the highest ranking 20% is studied mathematically (clustered) to find trends or features within the compounds which enhance binding.
  • the compounds are clustered into different groups. Chemical synthesis and screening of the compounds, described above, allows the computed DOCK or QXP scores to be correlated with the actual binding data. After the compounds have been prepared and screened, the predicted binding energy and the observed Kd values are correlated for each compound.
  • the results are used to develop a predictive scoring scheme, which weighs various factors (steric, electrostatic) appropriately.
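  • One simple way to correlate predicted binding energies with observed Kd values is to convert each Kd to a free energy with the standard relation ΔG = RT ln(Kd) and compute a Pearson correlation coefficient. The sketch below assumes Kd in molar units, a temperature of 298.15 K, and energies in kcal/mol. The function names are hypothetical, and the application does not state which statistical measure was used.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # kcal/(mol*K)


def kd_to_free_energy(kd_molar, temperature=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in molar units."""
    return GAS_CONSTANT_KCAL * temperature * math.log(kd_molar)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def score_vs_binding_correlation(predicted_energies, observed_kds):
    """Correlate predicted energies with free energies derived from observed Kd values."""
    observed = [kd_to_free_energy(kd) for kd in observed_kds]
    return pearson(predicted_energies, observed)
```

A coefficient near 1 would suggest that the current weighting of steric and electrostatic factors tracks experiment; a low value would argue for re-weighting.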
  • the above strategy allows rapid evaluation of a number of scaffolds with varying sizes and shapes of different functional groups for the high ranked compounds.
  • a further data set of representations of organic compounds comprising compounds which are chemically related to the organic compounds which rank high in the hierarchy can be compared to the numerical representations of the molecular interaction site to determine a further hierarchy ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site.
  • the further data set of representations of the three dimensional structures of compounds which are related to the compounds ranked high in the hierarchy is produced and has, in effect, been optimized by correlating actual binding with virtual binding.
  • the entire cycle can be iterated as desired until the desired number of those compounds highest in the hierarchy are produced.
  • Target biomolecule especially a target RNA or which otherwise have been shown to be able to bind to the target RNA to effect modulation thereof
  • labeling may include all of the labeling forms known to persons of skill in the art such as fluorophore, radiolabel, enzymatic label and many other forms.
  • labeling or tagging facilitates detection of molecular interaction sites and permits facile mapping of chromosomes and other useful processes.
  • the compounds are screened for binding affinity using MASS or conventional high-throughput functional screens.
  • the best scoring compounds from docking a 256-member library against the 16S A-site ribosomal RNA structure are shown in the table below.
  • the DOCK scores ranged from -308.8 to -144.2 as listed in Table 1.
  • the MASS assay was performed with the 27-mer model RNA sequence of the 16S A-site whose NMR structure has been determined.
  • the transcription/translation assay was based on expression of a luciferase plasmid.
  • Paromomycin is an aminoglycoside antibiotic known to bind to the A-site RNA structure.
  • the NMR structure was determined with paromomycin bound at the A-site.
  • Paromomycin had the best DOCK contact score, along with high chemical and energy scores.
  • the docking results for these compounds have been correlated with their binding affinity for a 16S RNA fragment using MASS mass spectrometry, and their ability to inhibit protein synthesis in a transcription/translation assay.
  • Four of the 12 compounds with the best DOCK scores had good affinity ( ⁇ 10 µM) for the RNA in the MASS assay and inhibited translation of a luciferase plasmid at ⁇ 10 µM.
  • all 9 of the "good" binders in the MASS assay scored in the top 30% in the DOCK calculation.
  • Ibis compound 169970 had the best energy score of any compound, but had a poor contact score. This result suggests that the biological activity may be increased further by modifying the structure to increase the number of close contacts with the 16S A-site RNA.
  • the NMR solution structure of TAR RNA (Varani, et al., J. Mol. Biol., 1995, 253, 313) has been used in the study of virtual screening for HIV-1 TAR RNA ligands.
  • the compounds present in the Available Chemicals Database (ACD) have been partitioned into a number of subsets according to their formal charges (neutral, +1, +2, etc.) and DOCKed to the TAR structure. Five aminoglycoside antibiotics were among the 20 compounds with the best binding energies.
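  • Partitioning a compound collection into subsets by formal charge before docking can be sketched as a simple grouping step. The function name and the input layout (pairs of identifier and integer formal charge) are assumptions for illustration; how formal charges were actually assigned is not described here.

```python
from collections import defaultdict


def partition_by_formal_charge(compounds):
    """Group compound identifiers by formal charge.

    `compounds` is an iterable of (compound_id, formal_charge) pairs.
    Returns a dict mapping each charge (0 for neutral, 1, 2, ...) to the
    list of compound identifiers carrying that charge, in input order.
    """
    subsets = defaultdict(list)
    for compound_id, charge in compounds:
        subsets[charge].append(compound_id)
    return dict(subsets)


if __name__ == "__main__":
    # Made-up identifiers and charges.
    library = [("cmpd_a", 0), ("cmpd_b", 1), ("cmpd_c", 2), ("cmpd_d", 1)]
    print(partition_by_formal_charge(library))
```

Each subset can then be docked and ranked separately, so that compounds are compared with others of the same charge.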
  • Example 3: L11/Thiostrepton - An Example Of A High Throughput RNA/Protein Assay
  • RNA molecules play numerous roles in cellular functions that range from structural to enzymatic in nature. These RNA molecules may work as single large molecules, in complexes with one or more proteins, or in partnership with one or more RNA molecules. Some of these complexes, such as those found in the ribosome, have been virtually intractable as high throughput screening targets due to their immense size and complexity. The ribosome presents a particularly rich source of RNA structures and functions that would appear, at first glance, to be highly effective drug targets. A large number of natural antibiotics exist that are directed against ribosomal targets, indicating the general success of this strategy.
  • thiostrepton, a cyclic peptide-based antibiotic, inhibits several reactions at the ribosomal GTPase center of the 50S ribosomal subunit.
  • thiostrepton acts by binding to the 23S rRNA component of the 50S subunit at the same site as the large ribosomal protein L11. The binding of L11 to the 23S rRNA causes a large conformational shift in the protein's tertiary structure.
  • thiostrepton has very poor solubility and relatively high toxicity, and is not generally useful as an antibiotic. The discovery of novel antibiotics directed against these types of targets would be of great value.
  • thiostrepton appears to stabilize a region of the 23S rRNA and, by doing so, to prevent a structural transition in the L11 protein.
  • a SPA assay has been designed to look for small molecules that could be effective as thiostrepton 'like' agents.
  • This assay uses a radiolabeled small fragment of the 23S rRNA, a biotinylated 75 amino acid fragment of the L11 protein that contains the 23S rRNA binding domain, and thiostrepton.
  • the folding conditions of the secondary and tertiary structures of the 23S rRNA fragment have been examined, as have the binding conditions of the L11 fragment to the 23S rRNA.
  • the L11-thiostrepton assay has been optimized so that the 23S rRNA fragment is in an unfolded state prior to the addition of compounds. Addition of the L11 fragment to this unfolded RNA results in no detectable binding interaction.
  • the high throughput assay is run by mixing the 23S rRNA fragment, under destabilizing conditions, with compounds of interest, incubating this mixture, and then adding the L11 fragment. Streptavidin-coated SPA beads are added for binding detection. Thiostrepton is used as a positive control. Addition of thiostrepton to the RNA promotes the correct secondary and/or tertiary folding of the structure and allows the L11 fragment to bind, leading to the generation of a signal in the assay.
  • a tested paradigm has been developed for designing, developing and performing high and low throughput assays to look at RNA protein function, structure, and binding in bacteria.
  • the L11/thiostrepton assay described above is but one of a number of RNA-protein interaction and functional assays that we have designed and developed for high and low throughput screening.
  • Others include functional assays to measure RNase P, RNase E, and EF-Tu activity.
  • Assays to examine the function of the bacterial signal recognition particle and S30 assembly are also contemplated.
  • the P48 protein-binding region of the 4.5S RNA present in the signal recognition particle of bacteria has been selected as a target.
  • the binding of P48 to 4.5S RNA is essential for bacteria to survive, and development of an inhibitor of this binding should generate a novel class of antimicrobial agent.
  • initial screening using DOCK (Meng, et al, J. Comp. Chem., 1992, 13, 505-524, incorporated herein by reference in its entirety) (version 4.0) can be carried out.
  • New compounds (~20,000) will be prepared through combinatorial addition and/or repositioning of hydrogen bonding, aromatic, and charged functional groups to enhance the activity and specificity of the compounds for the bacterial SRP relative to the human counterpart.
  • a pseudobrownian Monte Carlo search in torsion angle space using the program ICM2.6 (Abagyan, et al, J. Comp. Chem., 1994, 15, 488-506, incorporated herein by reference in its entirety) will be performed, coupled with local minimization of each conformation, for automated flexible docking of the truncated database to the NMR structural models.
  • RNA secondary structures near the 5'-cap can affect the rates of translation of mRNAs. Kozak, J. Biol. Chem., 1991, 266, 19867-19870. These RNA structures can bind proteins and inhibit the level of translation.
  • the translational machinery has an ATP-dependent RNA helicase activity associated with the eIF-4a/eIF-4b complex, and under normal conditions, the RNA structures are opened by the helicase and do not slow the rate of translation of the mRNA.
  • the eIF-4a has a low, i.e., μM, affinity for the pre-initiation complex.
  • Insertion of a 9-base leader before the TAR structure enhanced the translational efficiency, presumably by allowing the pre-initiation complex to form.
  • the helicase activity associated with the pre-initiation complex can transiently melt out the TAR RNA structure, and the message is translated (see, Figure 3A).
  • Addition of a 39 amino acid tat peptide to the lysate stabilized the TAR RNA structure and inhibited the expression of the luciferase protein, as expected from a specific interaction between the TAR RNA and tat (see, Figure 3B).
  • "Small" organic molecules were then found that could inhibit the translation of the TAR-luciferase mRNA by stabilizing the TAR RNA structure.
  • Compounds from the Available Chemicals Directory were docked to the TAR RNA structure and scored for binding energies. Among the best 25 compounds was ACD 00001199, whose structure is shown below. This compound has been shown to bind to TAR RNA with sufficient affinity to disrupt the interaction with tat peptide at a 1 μM concentration. [Figure: structure of ACD 00001199]
  • the QXP method employs a Monte Carlo-type algorithm to search the conformational space.
  • to make sure that the method is reliable in yielding the global minimum, QXP docking simulations were run with very different initial ligand structures.
  • the performance of the QXP docking method can be quantified by its ability to identify the bound conformation of the ligand within 1.0 Å rms deviation from the crystallographically observed conformation.
  • the success rate of the QXP runs is in the 80% range.
  • the nearly linear correlation between the rms deviation from the crystal structure and the score of the docked structure indicates that the QXP method is sufficiently accurate in predicting structures of ligand-receptor complexes.
  • the QXP method was used to derive an accurate structure of a bound ligand to the RNA target.
  • the NMR structure of the bacterial 16S ribosomal A site bound to paromomycin (Fourmy et al., Science, 1996, 274, 1367; PDB ID: 1pbr) was used as the reference state.
  • the aminoglycoside antibiotic was removed from the ligand-RNA complex.
  • the conformation space of paromomycin was exhaustively searched using the QXP method for the lowest energy conformers.
  • the target RNA was held rigid whereas the paromomycin was treated as fully flexible. Multiple docking searches with the randomly disrupted paromomycin as initial structures were performed.
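The success criterion described above (a docked pose within 1.0 Å rms deviation of the reference conformation) can be illustrated with a minimal Python sketch. This is not the QXP implementation; the function names `rmsd` and `success_rate` and the three-atom toy coordinates are hypothetical. It assumes the receptor is held rigid, so docked and reference ligand coordinates share one frame and no superposition is needed.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists.

    No superposition is applied: with a rigid receptor, the docked and
    reference ligand coordinates are assumed to share the same frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def success_rate(docked_poses, reference, cutoff=1.0):
    """Fraction of docking runs whose final pose lies within `cutoff` angstroms."""
    hits = sum(1 for pose in docked_poses if rmsd(pose, reference) <= cutoff)
    return hits / len(docked_poses)

# Hypothetical three-atom ligand; coordinates in angstroms.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
runs = [
    [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0), (1.6, 1.5, 0.0)],  # rigid shift of 0.1
    [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0), (4.5, 1.5, 0.0)],  # rigid shift of 3.0
]
```

Calling `success_rate(runs, reference)` on several independent runs started from randomly disrupted ligand structures gives the kind of figure quoted above for the QXP runs.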

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention is directed to methods of identifying compounds which bind to a molecular interaction site of a nucleic acid comprising providing a numerical representation of the three-dimensional structure of the molecular interaction site, providing a compound data set comprising numerical representations of the three-dimensional structures of a plurality of organic compounds, and comparing the numerical representation of the molecular interaction site with members of the compound data set to generate a hierarchy of organic compounds ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site. Data sets comprising the numerical representations of the three-dimensional structures of molecular interaction sites, and comprising the numerical representations of the three-dimensional structure of a plurality of organic compounds are also described.

Description

CHARACTERIZATION OF INTERACTIONS BETWEEN MOLECULAR INTERACTION SITES OF RNA AND LIGANDS THEREFOR
CROSS REFERENCE TO RELATED APPLICATIONS
The present application is a continuation-in-part of U.S. Serial No. 09/076,447 filed May 12, 1998, which claims priority to provisional U.S. Serial No. 60/085,092 filed May 12, 1998, each of which is incorporated herein by reference in its entirety.
FIELD OF THE INVENTION
The present invention is directed to methods of identifying compounds which bind to molecular interaction sites of nucleic acids, especially RNA. The present invention is also directed to the numerical representations of the three dimensional structures of molecular interaction sites and the compounds which interact with those sites.
BACKGROUND OF THE INVENTION
The selection of compounds for synthesis and screening is a critical step in any drug discovery process. This is particularly true for combinatorial chemistry-based discovery strategies, where a very much larger number of compounds can be conceived than can be prepared in a reasonable time frame. Computational chemistry methods have been applied to find the "best" sets of compounds for screening. One strategy optimizes the chemical "diversity" in a library in order to increase the likelihood of finding a hit with biological activity in a screen against a macromolecular target of unknown structure.
Targeting nucleic acids has been recognized as a valid strategy for interference with biological pathways and the treatment of disease. In this regard, both deoxyribonucleic acids (DNA) and ribonucleic acids (RNA) have been the target of numerous therapeutic strategies. A wide variety of "small" molecules, oligomers and oligonucleotides have been shown to possess binding affinity for nucleic acids. The vast majority of experience in interfering with nucleic acid function has been via the specific binding of ligands to a particular base, base pair, and/or primary sequence of bases in the nucleic acid target. Some compounds have also demonstrated a composite specificity that arises from recognition and interactions with both the primary and secondary structural features of the nucleic acid, such as preferential binding to A-T base pairs in the DNA minor groove, with little or no binding to corresponding RNA sequences.
Exploiting the knowledge of the three-dimensional structure of biological targets is a promising strategy from a drug design and discovery standpoint. This has been demonstrated by the design and development of numerous drugs and drug candidates targeted to proteins involved in various pathophysiological pathways. While three dimensional structures of proteins have been widely determined by techniques such as X-ray crystallography, molecular modeling and NMR, nucleic acid targets have been difficult to study. The literature reveals few three dimensional structures of biologically active RNA, including a tRNA, said to have been determined via X-ray crystallography. Quigley, et al., Nucleic Acids Res., 1975, 2, 2329; and Moras, et al., Nature (London), 1980, 288, 669. The difficulties associated with proper crystallization and study of nucleic acids by X-ray methods, along with the increasing number of biologically important small RNAs, have increased the need for new structure determination and drug discovery strategies for such targets.
Many approaches to predicting RNA structure have been discussed in the scientific literature. Essentially, these involve sequencing and genomic analysis of nucleic acids, such as RNA, as a first step to establish the primary sequence structure and potential folded structures of the target. A second step entails definition of structural constraints such as base pairing and long range interactions among bases based on information derived from cross-linking, biochemical and genetic structure-function studies. This information, together with modeling and simulation software, has allowed scientists to predict three dimensional models of RNA and DNA. While such models may not be as powerful as X-ray crystal structures, they have been useful in ascertaining some structural features and structure-function relationships.
An understanding of the structural features of specific motifs in nucleic acids, especially hairpins, loops, helices and double helices, has been found to be useful in gaining molecular insights. For example, a hairpin motif comprising a double helical stem and a single-stranded loop is believed to be one of the simplest yet most important structural elements in nucleic acids. Such hairpin structures are proposed to be nucleation sites and serve as major building blocks for the folded three dimensional structure of RNAs. Shen, et al., FASEB J., 1995, 9, 1023. Hairpins are also involved in specific interactions with a variety of proteins to regulate gene expression. Feng, et al., Nature, 1988, 334, 165, Witherell, et al., Prog. Nucleic Acids Res. Mol. Biol., 1991, 40, 185, and Phillipe, et al., J. Mol. Biol., 1990, 211, 415. Nucleic acid hairpin structures have therefore been widely studied by NMR, molecular modeling techniques such as constrained molecular dynamics and distance geometry (Cheong, et al., Nature, 1990, 346, 680 and Cain, et al., Nuc. Acids Res., 1995, 23, 2153), X-ray crystallography (Valegard, et al., Nature, 1994, 371, 623 and Chattopadhyaya, et al., Nature, 1988, 334, 175), and theoretical methods (Tung, Biophysical J., 1997, 72, 876, Erie, et al., Biopolymers, 1993, 33, 75, and Raghunathan, et al., Biochemistry, 1991, 30, 782).
The determination of potential three dimensional structures of nucleic acids and their attendant structural motifs affords insights into areas such as the study of catalysis by RNA, RNA-RNA interactions, RNA-nucleic acid interactions, RNA-protein interactions, and the recognition of small molecules by nucleic acids. Four general approaches to the generation of model three dimensional structures of RNA have been demonstrated in the literature. All of these employ sophisticated molecular modeling and computational algorithms for the simulation of folding and tertiary interactions within target nucleic acids, such as RNA. Westhof and Altman (Proc. Natl. Acad. Sci., 1994, 91, 5133) have described the generation of a three-dimensional working model of M1 RNA, the catalytic RNA subunit of RNase P from E. coli, via an interactive computer modeling protocol. Leveraging the significant body of work in the area of cryo-electron microscopy (cryo-EM) and biochemical studies on ribosomal RNAs, Mueller and Brimacombe (J. Mol. Biol., 1997, 271, 524) have constructed a three dimensional model of E. coli 16S ribosomal RNA. A method to model nucleic acid hairpin motifs has been developed based on a set of reduced coordinates for describing nucleic acid structures and a sampling algorithm that equilibrates structures using Monte Carlo (MC) simulations (Tung, Biophysical J., 1997, 72, 876, incorporated herein by reference in its entirety). MC-SYM is yet another approach to predicting the three dimensional structure of RNAs using a constraint-satisfaction method. Major, et al., Proc. Natl. Acad. Sci., 1993, 90, 9408. The MC-SYM program is an algorithm based on constraint satisfaction that searches conformational space for all models that satisfy query input constraints, and is described in, for example, Cedergren, et al., RNA Structure And Function, 1998, Cold Spring Harbor Lab. Press, p. 37-75.
Three dimensional structures of RNA are produced by this method by the stepwise addition of a nucleotide having one or several different conformations to a growing oligonucleotide model.
Westhof and Altman (Proc. Natl. Acad. Sci., 1994, 91, 5133) have described the generation of a three-dimensional working model of M1 RNA, the catalytic RNA subunit of RNase P from E. coli, via an interactive computer modeling protocol. This modeling protocol incorporated data from chemical and enzymatic protection experiments, phylogenetic analysis, studies of the activities of mutants and the kinetics of reactions catalyzed by the binding of substrate to M1 RNA. Modeling was performed for the most part as described in the literature. Westhof, et al., in "Theoretical Biochemistry and Molecular Biophysics," Beveridge and Lavery (eds.), Adenine, NY, 1990, 399. In general, starting with the primary sequence of M1 RNA, the stem-loop structures and other elements of secondary structure were created. Subsequent assembly of these elements into a three dimensional structure using a computer graphics station and FRODO (Jones, J. Appl. Crystallogr., 1978, 11, 268), followed by refinement using NUCLIN-NUCLSQ, afforded an RNA model that had correct geometries, the absence of bad contacts, and appropriate stereochemistry. The model so generated was found to be consistent with a large body of empirical data on M1 RNA and opens the door for hypotheses about the mechanism of action of RNase P. However, the models generated by this method are less well resolved than the structures determined via X-ray crystallography.
Mueller and Brimacombe (J. Mol. Biol., 1997, 271, 524) have constructed a three dimensional model of E. coli 16S ribosomal RNA using a modeling program called ERNA-3D. This program generates three dimensional structures such as A-form RNA helices and single-strand regions via the dynamic docking of single strands to fit electron density obtained from low resolution diffraction data. After helical elements have been defined and positioned in the model, the configurations of the single strand regions are adjusted so as to satisfy any known biochemical constraints such as RNA-protein cross-linking and footprinting data.
A method to model nucleic acid hairpin motifs has been developed based on a set of reduced coordinates for describing nucleic acid structures and a sampling algorithm that equilibrates structures using Monte Carlo (MC) simulations. Tung, Biophysical J., 1997, 72, 876, incorporated herein by reference. The stem region of a nucleic acid can be adequately modeled by using a canonical duplex formation. Using a set of reduced coordinates, an algorithm that is capable of generating structures of single stranded loops with a pair of fixed ends was created. This allows efficient structural sampling of the loop in conformational space. Combining this algorithm with a modified Metropolis Monte Carlo algorithm afforded a structure simulation package that simplifies the study of nucleic acid hairpin structures by computational means. Knowledge and mastery of the foregoing techniques is assumed to be part of the ordinary skill in the art.
There has been a long-felt need in the art to provide methods for improved determination of the three-dimensional structure of important regulatory and other elements in nucleic acids, especially RNA. It has also been greatly desired to achieve improved knowledge about the nature of interactions between ligands or potential ligands and nucleic acids, especially RNA.
The present invention is directed towards satisfaction of these objectives.
Accordingly, it is an objective of the present invention to provide improved characterization of interactions between RNA and other nucleic acids and ligands or potential ligands therefor.
A further object of the invention is to compare molecular interaction sites of RNA with compounds proposed for interaction therewith.
In accordance with preferred embodiments of the present invention, the comparison of molecular interaction sites of RNA with compounds is achieved through comparison of numerical representations of the three-dimensional structure of the molecular interaction site with the three dimensional structure of the ligands in a fashion such that such interactions can be compared as to quality.
Another object of the present invention is the preparation of hierarchies of ligands ranked or ordered in accordance with their ability to interact with molecular interaction sites of RNA and other nucleic acid targets.
Yet another object of the present invention is the establishment of databases of the numerical representations of three-dimensional structures of molecular interaction sites of nucleic acids and three-dimensional structures of libraries of ligands. Such databases provide powerful tools for the elucidation of structure and interactions of molecular interaction sites with potential ligands and predictions thereof.
Other objectives will become apparent to persons of ordinary skill in the art upon review of the present specification and appended claims.
SUMMARY OF THE INVENTION
The present invention is directed to methods of identifying compounds which bind to a molecular interaction site of a nucleic acid comprising providing a numerical representation of the three-dimensional structure of the molecular interaction site and providing a compound data set comprising numerical representations of the three dimensional structures of a plurality of organic compounds. The numerical representation of the molecular interaction site is then compared with members of the compound data set to generate a hierarchy of organic compounds ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site.
The present invention is also directed to data sets comprising the numerical representations of the three dimensional structures of molecular interaction sites and to the numerical representations of the three dimensional structure of a plurality of organic compounds. The present invention is directed to methods of identifying compounds which bind to a molecular interaction site of nucleic acids. They comprise providing a numerical representation of the three dimensional structure of the molecular interaction site, providing a compound data set comprising numerical representations of the three dimensional structures of a plurality of organic compounds, and comparing the numerical representation of the molecular interaction site with members of the compound data set to generate a hierarchy of organic compounds which is ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site.
While there are a number of ways to identify molecular interaction sites, identify compounds likely to interact with molecular interaction sites of RNA and other biological molecules, synthesize such compounds and analyze their binding, preferred methodologies are described in U.S. Serial Numbers 09/076,440, 09/076,405, 09/076,447, 09/076,206, 09/076,214, and 09/076,404, each of which was filed on May 12, 1998 and each assigned to the assignee of this invention. All of the foregoing applications are incorporated by reference herein in their entirety.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows exemplary compounds which were docked to TAR with subsequent evaluation of the solvation/desolvation energy.
Figure 2 shows the target RNA for 4.5S-P48.
Figure 3A shows a representative demonstration of cap-dependent translation of three DNA plasmids with a wheat germ lysate system: a) a luciferase gene with a 9 base leader sequence before the AUG start codon; b) translation of a construct with the TAR RNA structure adjacent to the cap; c) translation of a construct with the TAR RNA structure separated from the cap by a 9 base leader sequence. Solid bars: no added m7G. Hatched bars: added m7G.
Figure 3B shows an exemplary inhibition of translation of an mRNA construct containing the TAR RNA structure by a 39 amino acid tat peptide: a) translation of a luciferase mRNA with a 9 base leader sequence with and without 10 μM added tat peptide; b) translation of luciferase mRNA containing the TAR RNA structure adjacent to the cap; c) translation of the luciferase/TAR RNA construct with a 9 base leader in the presence/absence of 10 μM tat peptide.
Figure 4 shows an exemplary dose-dependent inhibition of translation of a luciferase mRNA construct containing a TAR RNA structure in the 5'-UTR by ACD 00001199 (DecpBlue-3). Solid line: inhibition of translation of the control luc+9 plasmid. Dashed line: inhibition of expression of the luc+9 mRNA containing the TAR RNA structure of the 5'-UTR.
Figure 5 shows a representative lowest energy structure of paromomycin (dark grey) bound to bacterial 16S ribosomal A site (not shown) identified using the QXP method for the lowest energy conformers. The target RNA was held rigid whereas the paromomycin was treated as fully flexible. The structure obtained using NMR is shown in light grey.
Figure 6 shows a representative correlation between the observed rms deviation and QXP energy scores obtained for the bacterial 16S ribosomal A site bound to paromomycin. 11-15 represent separate runs.
DETAILED DESCRIPTION OF THE INVENTION
A molecular interaction site is a region of a nucleic acid which has secondary structure. Preferably, the molecular interaction site is conserved between a plurality of different taxonomic species. The nucleic acid can be either eukaryotic or prokaryotic. The nucleic acid is preferably mRNA, pre-mRNA, tRNA, rRNA, or snRNA. The RNA can be viral, fungal, parasitic, bacterial, or yeast. Preferably, the molecular interaction site is present in a region of an RNA which is highly conserved among a plurality of taxonomic species. Molecular interaction sites are described in further detail in U.S. Application Serial No. 09/076,440, filed May 12, 1998, which is assigned to the assignee of the present application, and which is incorporated herein by reference in its entirety. In accordance with some preferred embodiments of this invention, it will be appreciated that the biomolecules having a molecular interaction site or sites, especially RNAs, may be derived from a number of sources. Thus, such RNA targets can be identified by any means, rendered into three dimensional representations and employed for the identification of compounds which can interact with them to effect modulation of the RNA.
The three dimensional structure of a molecular interaction site, preferably of an RNA, can be manipulated as a numerical representation. Computer software that provides one skilled in the art with the ability to design molecules based on the chemistry being performed and on available reaction building blocks is commercially available. Software packages from companies such as, for example, Tripos (St. Louis, MO), Molecular Simulations (San Diego, CA), MDL Information Systems (San Leandro, CA) and Chemical Design (NJ) provide means for computational generation of structures. These software products also provide means for evaluating and comparing computationally generated molecules and their structures. In silico collections of molecular interaction sites can be generated using the software from any of the above-mentioned vendors and others which are or may become available.
A set of structural constraints for the molecular interaction site of the RNA can be generated from biochemical analyses such as, for example, enzymatic mapping and chemical probes, and from genomics information such as, for example, covariance and sequence conservation. Information such as this can be used to pair bases in the stem or other region of a particular secondary structure. Additional structural hypotheses can be generated for noncanonical base pairing schemes in loop and bulge regions. A Monte Carlo search procedure can sample the possible conformations of the RNA consistent with the program constraints and produce three dimensional structures.
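A constraint-guided Monte Carlo search of the kind described above can be sketched in a few lines of Python. This is a toy illustration, not MC-SYM or any program named in this specification: a conformation is reduced to a list of torsion angles, each constraint is a hypothetical allowed range for one torsion, and the penalty function, step size and `kt` value are assumptions chosen only to make the example run.

```python
import math
import random

def violation(torsions, restraints):
    """Sum of squared excursions (degrees squared) outside each allowed range."""
    total = 0.0
    for angle, (low, high) in zip(torsions, restraints):
        if angle < low:
            total += (low - angle) ** 2
        elif angle > high:
            total += (angle - high) ** 2
    return total

def metropolis_search(restraints, steps=5000, step_size=15.0, kt=50.0, seed=0):
    """Sample torsion-angle conformations; return those satisfying all restraints."""
    rng = random.Random(seed)
    current = [rng.uniform(-180.0, 180.0) for _ in restraints]
    energy = violation(current, restraints)
    satisfied = []
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        # Perturb one torsion; clamping to [-180, 180] is a simplification
        # (a real search would treat the angle as periodic).
        moved = trial[i] + rng.uniform(-step_size, step_size)
        trial[i] = max(-180.0, min(180.0, moved))
        trial_energy = violation(trial, restraints)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / kt):
            current, energy = trial, trial_energy
            if energy == 0.0:
                satisfied.append(list(current))
    return satisfied

# Hypothetical allowed ranges (degrees) for three torsions.
restraints = [(-80.0, -40.0), (150.0, 180.0), (40.0, 80.0)]
models = metropolis_search(restraints)
```

Every conformation in `models` is consistent with all of the input ranges; in a real application the constraints would instead encode base pairing and other biochemical or genomic information.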
Reports of the generation of three dimensional, in silico representations are available from the standpoint of library design, generation, and screening against protein targets. Likewise, some efforts in the area of generating RNA models have been reported in the literature. However, there are no reports on the use of structure-based design approaches to query in silico representations of organic molecules, "small" molecules, oligonucleotides or other nucleic acids, with three dimensional, in silico, representations of RNA structures. The present invention preferably employs computer software that allows the construction of three dimensional models of RNA structure, the construction of three dimensional, in silico representations of a plurality of organic compounds, "small" molecules, polymeric compounds, oligonucleotides and other nucleic acids, screening of such in silico representations against RNA molecular interaction sites in silico, scoring and identifying the best potential binders from the plurality of compounds, and finally, synthesizing such compounds in a combinatorial fashion and testing them experimentally to identify new ligands for such targets.
In preferred embodiments of the invention, an automated computational search algorithm, such as those described above, is used to predict all of the allowed three dimensional molecular interaction site structures, preferably from RNA, which are consistent with the biochemical and genomic constraints specified by the user. Based, for example, on their root-mean-squared deviation values, these structures are clustered into different families.
A representative member or members of each family can be subjected to further structural refinement via molecular dynamics with explicit solvent and cations.
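Clustering predicted structures into families by their root-mean-squared deviation values, and picking a representative of each family, might look like the following Python sketch. The greedy "leader" scheme, the function name `cluster_by_rmsd`, the 1.5 Å cutoff and the toy matrix are all assumptions made for illustration; the specification does not state which clustering algorithm is used.

```python
def cluster_by_rmsd(pairwise, cutoff):
    """Greedy leader clustering on a symmetric pairwise RMSD matrix.

    Each structure joins the first family whose representative (its first
    member) lies within `cutoff`; otherwise it founds a new family.
    """
    families = []  # each family is a list of structure indices
    for index in range(len(pairwise)):
        for family in families:
            if pairwise[index][family[0]] <= cutoff:
                family.append(index)
                break
        else:
            # No existing representative was close enough.
            families.append([index])
    return families

# Hypothetical pairwise RMSD values (angstroms) for four predicted structures.
pairwise = [
    [0.0, 0.8, 4.1, 4.3],
    [0.8, 0.0, 3.9, 4.0],
    [4.1, 3.9, 0.0, 0.6],
    [4.3, 4.0, 0.6, 0.0],
]
families = cluster_by_rmsd(pairwise, cutoff=1.5)
representatives = [family[0] for family in families]
```

The indices in `representatives` identify the structures that would be passed on to further refinement, for example by molecular dynamics.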
Structural enumeration and representation by these software programs is typically done by drawing molecular scaffolds and substituents in two dimensions. Once drawn and stored in the computer, these molecules may be rendered into three dimensional structures using algorithms present within the commercially available software. Preferably, MC-SYM is used to create three dimensional representations of the molecular interaction site. The rendering of two dimensional structures of molecular interaction sites into three dimensional models typically generates a low energy conformation or a collection of low energy conformers of each molecule. The end result of these commercially available programs is the conversion of a nucleic acid sequence containing a molecular interaction site into families of similar numerical representations of the three dimensional structures of the molecular interaction site. These numerical representations form an ensemble data set.
The three dimensional structures of a plurality of compounds, preferably "small" organic compounds, can be designated as a compound data set comprising numerical representations of the three dimensional structures of the compounds. "Small" molecules in this context refers to non-oligomeric organic compounds. Two dimensional structures of compounds can be converted to three dimensional structures, as described above for the molecular interaction sites, and used for querying against three dimensional structures of the molecular interaction sites. The two dimensional structures of compounds can be generated rapidly using commercially available structure rendering algorithms. The three dimensional representation of the compounds which are polymeric in nature, such as oligonucleotides or other nucleic acid structures, may be generated using the literature methods described above. A three dimensional structure of "small" molecules or other compounds can be generated and a low energy conformation can be obtained from a short molecular dynamics minimization. These three dimensional structures can be stored in a relational database. The compounds upon which three dimensional structures are constructed can be proprietary, commercially available, or virtual.
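As one way to picture storing such three dimensional structures in a relational database, the following Python sketch uses the standard-library `sqlite3` module. The two-table schema, the table and column names, and the helper functions `open_store`, `store_compound` and `load_coordinates` are hypothetical; the specification does not describe a particular schema.

```python
import sqlite3

# Hypothetical schema: one row per compound, one row per atom.
SCHEMA = """
CREATE TABLE compound (
    compound_id TEXT PRIMARY KEY,
    source TEXT NOT NULL
        CHECK (source IN ('proprietary', 'commercial', 'virtual'))
);
CREATE TABLE atom (
    compound_id TEXT NOT NULL REFERENCES compound(compound_id),
    atom_index INTEGER NOT NULL,
    element TEXT NOT NULL,
    x REAL NOT NULL,
    y REAL NOT NULL,
    z REAL NOT NULL,
    PRIMARY KEY (compound_id, atom_index)
);
"""

def open_store():
    """Create an in-memory database holding the compound data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def store_compound(conn, compound_id, source, atoms):
    """Store one conformer; `atoms` is a list of (element, x, y, z) tuples."""
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO compound VALUES (?, ?)", (compound_id, source))
        conn.executemany(
            "INSERT INTO atom VALUES (?, ?, ?, ?, ?, ?)",
            [(compound_id, i, el, x, y, z) for i, (el, x, y, z) in enumerate(atoms)],
        )

def load_coordinates(conn, compound_id):
    """Return the stored (element, x, y, z) tuples in atom order."""
    rows = conn.execute(
        "SELECT element, x, y, z FROM atom WHERE compound_id = ? ORDER BY atom_index",
        (compound_id,),
    )
    return rows.fetchall()
```

The `source` column mirrors the distinction drawn above between proprietary, commercially available and virtual compounds.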
In some preferred embodiments of the invention, a compound data set comprising numerical representations of the three dimensional structure of a plurality of organic compounds is provided by, for example, Converter (MSI, San Diego) from two dimensional compound libraries generated by, for example, a computer program modified from commercial programs. Other suitable databases can be constructed by converting two dimensional structures of chemical compounds into three dimensional structures, as described above. The software is described in greater detail in U.S. Application Serial No. 09/076,405, filed May 12, 1998, which is assigned to the assignee of the present application, and which is incorporated herein by reference in its entirety. The end result is the conversion of two dimensional structures of organic compounds into numerical representations of the three dimensional structures of a plurality of organic compounds. These numerical representations are presented as a compound data set. After both the numerical representations of the three-dimensional structure of the molecular interaction sites and the compound data set comprising numerical representations of the three dimensional structures of a plurality of organic compounds are obtained, the numerical representations of the molecular interaction sites are compared with members of the compound data set to generate a hierarchy of the organic compounds. The hierarchy is ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site. Preferably, the comparing is carried out seriatim upon the members of the compound data set. In accordance with some embodiments, the comparison can be performed with a plurality of molecular interaction sites at the same time.
A variety of theoretical and computational methods are known by those skilled in the art to study and optimize the interactions of "small" molecules or organic compounds with biological targets such as nucleic acids. These structure-based drug design tools have been very useful in modeling the interactions of proteins with small molecule ligands and in optimizing these interactions. Typically this type of study has been performed when the structure of the protein receptor was known by querying individual small molecules, one at a time, against this receptor. Usually these small molecules had either been co-crystallized with the receptor, were related to other molecules that had been co-crystallized, or were molecules for which some body of knowledge existed concerning their interactions with the receptor. A significant advance in this area was the development of a software program called DOCK that allows structure-based database searches to find and identify molecules that are expected to bind to a receptor of interest. Kuntz, et al., Acc. Chem. Res., 1994, 27, 111, and Gschwend and Kuntz, J. Comput.-Aided Mol. Des., 1996, 10, 123. DOCK 4.0 is commercially available from the Regents of the University of California. Equivalent programs are also comprehended in the present invention. DOCK allows the screening of a large collection of molecules whose three dimensional structures have been generated in silico, i.e., in computer readable format, but for which no prior knowledge of interactions with the ligands is available. DOCK, therefore, is a significant tool in the process of discovering new ligands to a molecule of interest and is presently preferred for use herein.
The DOCK program has been widely applied to protein targets and the identification of ligands that bind to them. Typically, new classes of molecules that bind to known targets have been identified, and later verified by in vitro experiments. The DOCK software program consists of several modules, including SPHGEN (Kuntz, et al., J. Mol. Biol., 1982, 161, 269) and CHEMGRID (Meng, et al., J. Comput. Chem., 1992, 13, 505). SPHGEN generates clusters of overlapping spheres that describe the solvent-accessible surface of the binding pocket within the target receptor. Each cluster represents a possible binding site for small molecules. CHEMGRID precalculates and stores in a grid file the information necessary for force field scoring of the interactions between binding molecule and target. The scoring function approximates molecular mechanics interaction energies and consists of van der Waals and electrostatic components. DOCK uses the selected cluster of spheres to orient ligand molecules in the targeted site on the receptor. Each molecule within a previously generated three dimensional database is tested in thousands of orientations within the site, and each orientation is evaluated by the scoring function. Only the orientation with the best score for each compound so screened is stored in the output file. Finally, all compounds of the database are ranked in a hierarchy, e.g., ordered by scores, and a collection of the best candidates may then be screened experimentally. Using DOCK, numerous ligands have been identified for a variety of protein targets. Recent efforts in this area have resulted in reports of the use of DOCK to identify and design small molecule ligands that exhibit binding specificity for nucleic acids such as RNA double helices. While RNA plays a significant role in many diseases such as AIDS, viral and bacterial infections, few studies have been made on small molecules capable of specific RNA binding.
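The scoring and orientation-selection steps described above can be sketched in a few lines. This is a minimal illustration, not the DOCK implementation: it sums a 12-6 van der Waals term and a Coulomb term directly over atom pairs instead of reading a precomputed CHEMGRID grid, and the atom tuple layout, the dielectric value, and the function names are assumptions made for the example.

```python
import math

# Hypothetical atom layout: (x, y, z, partial_charge, vdw_radius, well_depth).
# Distances are in angstroms; 332.0 converts e^2/angstrom to kcal/mol.

def interaction_energy(ligand, receptor, dielectric=4.0):
    """Sum a 12-6 van der Waals term and a Coulomb term over all atom pairs.

    Assumes no two atoms coincide (distance > 0).
    """
    total = 0.0
    for lx, ly, lz, l_q, l_rad, l_eps in ligand:
        for rx, ry, rz, r_q, r_rad, r_eps in receptor:
            d = math.dist((lx, ly, lz), (rx, ry, rz))
            r_min = l_rad + r_rad
            eps = math.sqrt(l_eps * r_eps)
            vdw = eps * ((r_min / d) ** 12 - 2.0 * (r_min / d) ** 6)
            elec = 332.0 * l_q * r_q / (dielectric * d)
            total += vdw + elec
    return total

def best_orientation(orientations, receptor):
    """Score every orientation and return (energy, index) of the lowest one."""
    scored = [(interaction_energy(o, receptor), i) for i, o in enumerate(orientations)]
    return min(scored)
```

Only the best-scoring orientation is returned, mirroring the statement that only the orientation with the best score for each compound is stored. Ranking the compounds then amounts to sorting these per-compound best scores.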
Compounds possessing specificity for the RNA double helix, based on the unique geometry of its deep major groove, were identified using the DOCK methodology. Chen, et al., Biochemistry, 1997, 36, 11402 and Kuntz, et al., Acc. Chem. Res., 1994, 27, 117. Recently, the application of DOCK to the problem of ligand recognition in DNA quadruplexes has been reported. Chen, et al., Proc. Natl. Acad. Sci., 1996, 93, 2635.
Programs such as DOCK typically assume knowledge of the conformation of the bound ligand and use a rigid conformation for a given ligand in molecular docking studies to arrive at structures of ligand-receptor complexes (which is a prerequisite for computing binding energies). Most ligands, however, possess a number of rotatable bonds, thus increasing the complexity of the calculations. Docking of flexible ligands would be desirable, but requires one to search an enormous amount of conformational space. For example, the study of an aminoglycoside antibiotic (paromomycin) bound to the 16S A-site RNA target would constitute a search space of approximately 10^30 possible solutions.
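The combinatorial growth behind the flexible-docking problem can be made concrete with a short calculation. The sketch below assumes each rotatable bond is sampled on a regular torsion grid; the step size and bond counts are illustrative choices and are not taken from the paromomycin study.

```python
def conformer_count(rotatable_bonds, step_degrees=30):
    """Number of torsion-angle combinations when each bond is sampled on a regular grid."""
    states_per_bond = 360 // step_degrees
    return states_per_bond ** rotatable_bonds

# Each additional rotatable bond multiplies the search space by the
# number of sampled torsion states (12 for a 30 degree step).
for bonds in (2, 5, 10):
    print(bonds, conformer_count(bonds))
```

Multiplying this count by the number of rigid-body placements of the ligand in the site gives the total number of poses, which is why exhaustive enumeration quickly becomes impractical.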
QXP is a method that permits flexible ligand docking calculations (McMartin, C. and Bohacek, R.S., J. Comput.-Aided Mol. Design, 1997, 11, 333). In this method, full conformational searches on flexible ligands are carried out. QXP search algorithms employ the Monte Carlo perturbation technique with energy minimization in Cartesian space. An additional fast search step is introduced between the initial perturbation and energy minimization. This method is also presently preferred for use herein.
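The general idea of Monte Carlo perturbation followed by energy minimization can be sketched as below. This is not the QXP algorithm: it works in torsion-angle space rather than Cartesian space, uses a toy threefold torsional energy in place of a force field, substitutes a crude coordinate descent for the minimizer, and omits the fast search step. All function names and parameters are assumptions for illustration.

```python
import math
import random

def torsion_energy(angles):
    """Toy energy: a threefold torsional term per bond (stand-in for a force field)."""
    return sum(1.0 + math.cos(3.0 * a) for a in angles)

def local_minimize(angles, energy, step=0.05, iterations=200):
    """Crude coordinate descent standing in for energy minimization."""
    current = list(angles)
    best = energy(current)
    for _ in range(iterations):
        improved = False
        for i in range(len(current)):
            for delta in (step, -step):
                trial = list(current)
                trial[i] += delta
                e = energy(trial)
                if e < best:
                    current, best, improved = trial, e, True
        if not improved:
            step /= 2.0  # refine the step once no move lowers the energy
    return current, best

def monte_carlo_search(n_bonds, energy, cycles=50, temperature=0.6, seed=0):
    """Perturb one torsion at random, minimize, then accept or reject (Metropolis)."""
    rng = random.Random(seed)
    current = [rng.uniform(-math.pi, math.pi) for _ in range(n_bonds)]
    current, e_cur = local_minimize(current, energy)
    best, e_best = current, e_cur
    for _ in range(cycles):
        trial = list(current)
        trial[rng.randrange(n_bonds)] = rng.uniform(-math.pi, math.pi)
        trial, e_trial = local_minimize(trial, energy)
        accept = e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / temperature)
        if accept:
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best
```

Running several searches from different seeds and comparing the lowest energies found is a simple way to check that the search is not trapped in a local minimum.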
Preferably, individual compounds to be used in these methods are designated as mol files, for example, and combined into a collection of in silico representations using appropriate computer software, such as the software described in greater detail in U.S. Application Serial No. 09/076,405, filed May 12, 1998, which is assigned to the assignee of the present application, and which is incorporated herein by reference in its entirety. These two dimensional mol files are exported and converted into three dimensional structures using commercial software such as Converter (Molecular Simulations Inc., San Diego) or equivalent software, as described above. Atom types suitable for use with a docking program such as DOCK or QXP are assigned to all atoms in the three dimensional mol file using software such as, for example, Babel, or with other equivalent software.
A low-energy conformation of each molecule is generated with software such as Discover (MSI, San Diego). An orientation search is performed by bringing each compound of the plurality of compounds into proximity with the molecular interaction site in many orientations using DOCK or QXP. A contact score is determined for each orientation, and the optimum orientation of the compound is subsequently used. Alternatively, the conformation of the compound can be determined from a template conformation of the scaffold determined previously.
The interaction of a plurality of compounds and molecular interaction sites is examined by comparing the numerical representations of the molecular interaction sites with members of the compound data set. Preferably, a plurality of compounds such as those generated by computer programs or otherwise, is compared to the molecular interaction site and allowed to undergo random "motions" among the dihedral bonds of the compounds. Preferably about 20,000 to 100,000 compounds are compared to at least one molecular interaction site. Typically, 20,000 compounds are compared to about five molecular interaction sites and scored. Individual conformations of the three dimensional structures are placed at the target site in many orientations. Moreover, during execution of the DOCK program, the compounds and molecular interaction sites are allowed to be "flexible" such that the optimum hydrogen bonding, electrostatic, and van der Waals contacts can be realized. The energy of the interaction is calculated and stored for 10-15 possible orientations of the compounds and molecular interaction sites. QXP methodology allows true flexibility in both the ligand and target and is presently preferred.
The relative weights of each energy contribution are updated constantly to ensure that the calculated binding scores for all compounds reflect the experimental binding data. The binding energy for each orientation is scored on the basis of hydrogen bonding, van der Waals contacts, electrostatics, solvation/desolvation, and the quality of the fit. The lowest-energy van der Waals, dipolar, and hydrogen bonding interactions between the compound and the molecular interaction site are determined, and summed. In preferred embodiments, these parameters can be adjusted according to the results obtained empirically. The binding energies for each molecule against the target are output to a relational database. The relational database contains a hierarchy of the compounds ranked in accordance with the ability of the compounds to form physical interactions with the molecular interaction site. The higher ranked compounds are better able to form physical interactions with the molecular interaction site. In a preferred embodiment, the highest ranking, i.e., the best fitting compounds, are selected for synthesis. In preferred embodiments of the invention, those compounds which are likely to have desired binding characteristics based on binding data are selected for synthesis. Preferably the highest ranking 5% are selected for synthesis. More preferably, the highest ranking 10% are selected for synthesis. Even more preferably, the highest ranking 20% are selected for synthesis. The synthesis of the selected compounds can be automated using a parallel array synthesizer or prepared using solution-phase or other solid-phase methods and instruments. In addition, the interaction of the highly ranked compounds with the nucleic acid containing the molecular interaction site is assessed as described below.
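Storing binding energies in a relational database and selecting the highest ranking fraction can be sketched with an in-memory SQLite database. The table name, column names, and compound identifiers are hypothetical; the sketch assumes that lower (more negative) energies rank higher.

```python
import sqlite3

def build_hierarchy(scores):
    """Store (compound, site, energy) rows in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE binding (compound TEXT, site TEXT, energy REAL)")
    conn.executemany("INSERT INTO binding VALUES (?, ?, ?)", scores)
    return conn

def top_fraction(conn, site, fraction):
    """Return the best-scoring fraction of compounds for one interaction site."""
    rows = conn.execute(
        "SELECT compound, energy FROM binding WHERE site = ? ORDER BY energy ASC",
        (site,),
    ).fetchall()
    keep = max(1, int(len(rows) * fraction))  # always keep at least one compound
    return rows[:keep]
```

Calling `top_fraction(conn, site, 0.05)`, `0.10`, or `0.20` corresponds to the 5%, 10%, and 20% selections named in the text.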
The interaction of the highly ranked organic compounds with the nucleic acid containing the molecular interaction site can be assessed by numerous methods known to those skilled in the art. For example, the highest ranking compounds can be tested for activity in high-throughput (HTS) functional and cellular screens. HTS assays for each target RNA can be determined by scintillation proximity, precipitation, luminescence-based formats, filtration based assays, colorimetric assays, and the like. Lead compounds can then be scaled up and tested in animal models for activity and toxicity. The assessment preferably comprises mass spectrometry of a mixture of the nucleic acid and at least one of the compounds or a functional bioassay.
Certain preferred evaluation techniques employing mass spectroscopy are disclosed in U.S. Patent Application Serial No. 09/076,206 filed May 12, 1998, which is assigned to the assignee of the present application. The foregoing patent application is incorporated herein by reference in its entirety as exemplary of certain useful and preferred mass spectrometric techniques for use herewith. It is to be specifically understood, however, that it is not essential that these particular mass spectrometric techniques be employed in order to perform the present invention. Rather, any evaluative technique may be undertaken so long as the objectives of the present invention are maintained.
In some embodiments of the invention, the highest ranking 20% of compounds from the hierarchy generated using the DOCK program or QXP are used to generate a further data set of three dimensional representations of organic compounds comprising compounds which are chemically related to the compounds ranking high in the hierarchy. Although the best fitting compounds are likely to be in the highest ranking 1%, additional compounds, up to about 20%, are selected for a second comparison so as to provide diversity (ring size, chain length, functional groups). This process ensures that small errors in the molecular interaction sites are not propagated into the compound identification process. The resulting structure/score data from the highest ranking 20%, for example, is studied mathematically (clustered) to find trends or features within the compounds which enhance binding. The compounds are clustered into different groups. Chemical synthesis and screening of the compounds, described above, allows the computed DOCK or QXP scores to be correlated with the actual binding data. After the compounds have been prepared and screened, the predicted binding energy and the observed Kd values are correlated for each compound.
The results are used to develop a predictive scoring scheme, which weighs various factors (steric, electrostatic) appropriately. The above strategy allows rapid evaluation of a number of scaffolds with varying sizes and shapes of different functional groups for the high ranked compounds. In this manner, a further data set of representations of organic compounds comprising compounds which are chemically related to the organic compounds which rank high in the hierarchy can be compared to the numerical representations of the molecular interaction site to determine a further hierarchy ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site. In this manner, the further data set of representations of the three dimensional structures of compounds which are related to the compounds ranked high in the hierarchy is produced and has, in effect, been optimized by correlating actual binding with virtual binding. The entire cycle can be iterated as desired until the desired number of those compounds highest in the hierarchy are produced.
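The correlation of predicted binding energies with observed Kd values can be illustrated with a Pearson coefficient computed against log10(Kd). The choice of a Pearson coefficient and of the log transform is an assumption for this sketch (binding free energy scales with the logarithm of Kd); the text does not specify the statistic used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def score_vs_affinity(predicted_energies, kd_molar):
    """Correlate docking energies with log10 of the measured dissociation constants."""
    return pearson(predicted_energies, [math.log10(kd) for kd in kd_molar])
```

A coefficient near +1 indicates that more negative predicted energies track tighter measured binding; a weak coefficient suggests that the weights of the scoring terms need adjustment before the next iteration.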
Compounds which have been determined to have affinity and specificity for a target biomolecule, especially a target RNA, or which otherwise have been shown to be able to bind to the target RNA to effect modulation thereof, can, in accordance with preferred embodiments of this invention, be tagged or labeled in a detectable fashion. Such labeling may include all of the labeling forms known to persons of skill in the art such as fluorophore, radiolabel, enzymatic label and many other forms. Such labeling or tagging facilitates detection of molecular interaction sites and permits facile mapping of chromosomes and other useful processes.
EXAMPLES
Example 1: Functional Screening
The compounds are screened for binding affinity using MASS or conventional high-throughput functional screens. The best scoring compounds from docking a 256-member library against the 16S A-site ribosomal RNA structure are shown in the table below. The DOCK scores ranged from -308.8 to -144.2 as listed in Table 1. The MASS assay was performed with the 27-mer model RNA sequence of the 16S A-site whose NMR structure has been determined. The transcription/translation assay was based on expression of a luciferase plasmid.
Table 1. DOCK scores correlated with mass spectrometry and biological assay
Compound       DOCK score   MASS KD (μM)   Activity(1) (μM)
Paromomycin    -308.8       0.5            0.3
170046         -303.4       >50            >100
169999         -299.0       >50            >100
169963         -293.9       >50            >100
170070         -290.2       >50            >100
169970         -288.9       1.5            2.5
169961         -288.5       5.0            10
170003         -287.8       >50            >100
169995         -286.4       >50            >100
169993         -286.0       >50            >100
170072         -282.6       >50            >100
170078         -281.6       5.0            10
169985         -280.1       4.0            10
169998         -278.0       >50            >100
(1) Inhibition of protein synthesis in transcription/translation assay for luciferase reporter.
Paromomycin is an aminoglycoside antibiotic known to bind to the A-site RNA structure. The NMR structure was determined with paromomycin bound at the A-site.
Paromomycin had the best DOCK contact score, along with high chemical and energy scores. The docking results for these compounds have been correlated with their binding affinity for a 16S RNA fragment using the MASS mass spectrometry assay, and their ability to inhibit protein synthesis in a transcription/translation assay. Four of the 12 compounds with the best DOCK scores had good affinity (<10 μM) for the RNA in the MASS assay and inhibited translation of a luciferase plasmid at <10 μM. In addition, all 9 of the "good" binders in the MASS assay scored in the top 30% in the DOCK calculation.
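The comparison between docking rank and measured affinity can be reproduced from the Table 1 rows. In this sketch the library compounds are transcribed as tuples, the reference compound paromomycin is left out, values reported as ">50" are encoded as None (an encoding chosen for the example), and the values are taken to be in μM.

```python
# (compound, DOCK score, MASS KD in micromolar or None when reported as ">50")
TABLE_1 = [
    ("170046", -303.4, None), ("169999", -299.0, None), ("169963", -293.9, None),
    ("170070", -290.2, None), ("169970", -288.9, 1.5), ("169961", -288.5, 5.0),
    ("170003", -287.8, None), ("169995", -286.4, None), ("169993", -286.0, None),
    ("170072", -282.6, None), ("170078", -281.6, 5.0), ("169985", -280.1, 4.0),
    ("169998", -278.0, None),  # KD assumed to be the ">50" entry of the last row
]

def good_binders(rows, cutoff_um=10.0):
    """Return compounds whose measured KD is below the cutoff, in DOCK-score order."""
    ranked = sorted(rows, key=lambda row: row[1])
    return [name for name, _score, kd in ranked if kd is not None and kd < cutoff_um]

print(good_binders(TABLE_1))
```

The filter returns the four library compounds with a measured KD below 10 μM, matching the four binders discussed in the paragraph above.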
Ibis compound 169970 had the best energy score of any compound, but had a poor contact score. This result suggests that the biological activity may be increased further by modifying the structure to increase the number of close contacts with the 16S A-site RNA.
Example 2: Target Site of TAR
The NMR solution structure of TAR RNA (Varani, et al., J. Mol. Biol., 1995, 253, 313) has been used in the study of virtual screening for HIV-1 TAR RNA ligands. The compounds present in the Available Chemicals Database (ACD) have been partitioned into a number of subsets according to their formal charges (neutral, +1, +2, etc.) and DOCKed to the TAR structure. Five aminoglycoside antibiotics were among the 20 compounds with the best binding energies.
In addition, a number of compounds were docked to TAR with subsequent evaluation of the solvation/desolvation energy. An exemplary result is illustrated in Figure 1, which shows that ACD 00001199 and ACD 00192509 show relatively low energies of solvation/desolvation as well as low IC50 values.
Example 3: L11/Thiostrepton - An Example Of A High Throughput RNA/Protein Assay
RNA molecules play numerous roles in cellular functions that range from structural to enzymatic in nature. These RNA molecules may work as single large molecules, in complexes with one or more proteins, or in partnership with one or more RNA molecules. Some of these complexes, such as those found in the ribosome, have been virtually intractable as high throughput screening targets due to their immense size and complexity. The ribosome presents a particularly rich source of RNA structures and functions that would appear, at first glance, to be highly effective drug targets. A large number of natural antibiotics exist that are directed against ribosomal targets, indicating the general success of this strategy. These include the aminoglycosides, kirromycin, neomycin, paromomycin, thiostrepton, and many others. Thiostrepton, a cyclic peptide based antibiotic, inhibits several reactions at the ribosomal GTPase center of the 50S ribosomal subunit. Evidence exists that thiostrepton acts by binding to the 23S rRNA component of the 50S subunit at the same site as the large ribosomal protein L11. The binding of L11 to the 23S rRNA causes a large conformation shift in the protein's tertiary structure. The binding of thiostrepton to the rRNA appears to cause an increase in the strength of the L11/23S rRNA interactions and prevents a conformational transition event in the L11 protein, thereby stalling translation. Unfortunately, thiostrepton has very poor solubility, relatively high toxicity, and is not generally useful as an antibiotic. The discovery of novel antibiotics directed against these types of targets would be of great value.
The design of high throughput assays to discover new antibiotics directed against ribosomal targets has been difficult, in part, due to the large structures involved and the low binding affinity of the RNA/protein interactions. Recently, a tremendous amount of data has been generated concerning RNA structures in the ribosome. These data have allowed the elucidation of a number of structures and enabled the prediction of many others. Further, the use of the SPA assay format, as described below, allows for assays to be run without washing or other steps that lower the concentrations of binding components. This allows one to examine low-affinity binding interactions (Kd > 1 μM).
The mode of action of thiostrepton appears to be to stabilize a region of the 23S rRNA and by doing so prevent a structural transition in the L11 protein. Among the many assays that look at RNA/protein interactions, a SPA assay has been designed to look for small molecules that could be effective as thiostrepton-like agents. This assay uses a radiolabeled small fragment of the 23S rRNA, a biotinylated 75 amino acid fragment of the L11 protein that contains the 23S rRNA binding domain, and thiostrepton. The folding conditions of the secondary and tertiary structures of the 23S rRNA fragment have been examined, as have the binding conditions of the L11 fragment to the 23S rRNA. The L11-thiostrepton assay has been optimized so that the 23S rRNA fragment is in an unfolded state prior to the addition of compounds. Addition of the L11 fragment to this unfolded RNA results in no detectable binding interaction. The high throughput assay is run by mixing the 23S rRNA fragment, under destabilizing conditions, with compounds of interest, incubating this mixture, and then adding the L11 fragment. Streptavidin-coated SPA beads are added for binding detection. Thiostrepton is used as a positive control. Addition of thiostrepton to the RNA promotes the correct secondary and/or tertiary folding of the structure and allows the L11 fragment to bind, leading to the generation of a signal in the assay.
A tested paradigm has been developed for designing, developing and performing high and low throughput assays to look at RNA/protein function, structure, and binding in bacteria. The L11/thiostrepton assay described above is but one of a number of RNA/protein interaction and functional assays that we have designed and developed for high and low throughput screening. Others include functional assays to measure RNase P, RNase E, and EF-Tu activity. Assays to examine the function of the bacterial signal recognition particle and S30 assembly are also contemplated.
Example 4: P48-4.5S Interaction
The P48 protein-binding region of the 4.5S RNA present in the signal recognition particle of bacteria has been selected as a target. The binding of P48 to 4.5S RNA is essential for bacteria to survive, and development of an inhibitor of this binding should generate a novel class of antimicrobial agent. Using compounds (~2 x 10^5) from the Available Chemicals Directory (ACD), as well as from additional libraries, initial screening using DOCK (Meng, et al., J. Comp. Chem., 1992, 13, 505-524, incorporated herein by reference in its entirety) (version 4.0) can be carried out. This should leave about 15-20% of the compounds in the database which have reasonably good shape complementarity in docking to the NMR structure of the 46mer, which is from the asymmetric bulged regions of E. coli 4.5S RNA. A pseudobrownian Monte Carlo search in torsion angle space is performed using the program ICM (version 2.6), coupled with local minimization of each conformation, for automated flexible docking of that truncated set of potential ligands to the NMR structure and scoring for predicted affinity using an empirical free energy function.
Approximately 2000 of the best scoring compounds will be examined for experimental testing of their capability to inhibit the binding of P48 to 4.5S RNA. Inhibition of P48-4.5S RNA binding produced by the selected compounds will be measured using (his)6-tagged P48 and 33P-RNA in a high-throughput scintillation proximity assay system. The structure-activity relationship among these 2000 compounds will serve as the basis for an expanded synthetic effort.
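The staged reduction of the library described above can be followed with simple arithmetic. The sketch below uses the figures given in the text (about 2 x 10^5 compounds, 15-20% retained after rigid docking, about 2000 selected for assay); the function and key names are illustrative.

```python
def screening_funnel(library_size, retained_fraction, final_count):
    """Compound counts remaining after each stage of the virtual screen."""
    after_rigid = round(library_size * retained_fraction)
    return {
        "library": library_size,
        "after_rigid_docking": after_rigid,
        "selected_for_assay": min(final_count, after_rigid),
    }

# Lower and upper bounds of the 15-20% retention range.
for fraction in (0.15, 0.20):
    print(screening_funnel(200_000, fraction, 2000))
```

With these inputs roughly 30,000 to 40,000 compounds pass to flexible docking, of which about 2000 go on to experimental testing.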
Docking of small molecules to the region of the asymmetric RNA bulges is expected to identify compounds with a high probability of selectively destabilizing the 4.5S-P48 interaction in vitro. The structure for the target RNA, shown in Figure 2, will be determined using NMR. Compounds (approaching 2 x 10^5) from the Available Chemicals Directory (ACD) will be docked to the structure and scored for predicted affinity. The best molecules will be screened for their ability to disrupt the RNA-protein interaction. Quantitative structure-activity relationship (QSAR) studies will be performed on the most active compounds to identify critical features and interactions with the RNA. New compounds (~20,000) will be prepared through combinatorial addition and/or repositioning of hydrogen bonding, aromatic, and charged functional groups to enhance the activity and specificity of the compounds for the bacterial SRP relative to the human counterpart. In addition, a pseudobrownian Monte Carlo search in torsion angle space using the program ICM2.6 (Abagyan, et al., J. Comp. Chem., 1994, 15, 488-506, incorporated herein by reference in its entirety) will be performed, coupled with local minimization of each conformation, for automated flexible docking of the truncated database to the NMR structural models.
In order to rank the ligands after flexible docking is completed, a function to estimate their binding free energies is used. There are a number of empirical methods for estimation of the free energy of binding, but we intend to use the empirical free energy function we derived from the thermodynamic binding cycle (Filikov, et al., J. Comp.-Aided Molec. Design, 1998, 12, 1-12, which is incorporated herein by reference in its entirety).
Example 5: Inhibition of Translation of an mRNA Containing a Molecular Interaction Site by a "Small" Molecule Identified by Molecular Docking
Translation of mRNAs in eukaryotic cells follows formation of an initiation complex at the 5'-cap (m7Gppp). A variety of initiation factors bind to the 5'-cap to form a pre-initiation complex before the 40S ribosomal subunit binds to the 5'-untranslated region upstream of the AUG start codon. Pain, Eur. J. Biochem., 1996, 236, 747-771. It has been demonstrated that RNA secondary structures near the 5'-cap can affect the rates of translation of mRNAs. Kozak, J. Biol. Chem., 1991, 266, 19867-19870. These RNA structures can bind proteins and inhibit the level of translation. Standart, et al., Biochimie, 1994, 76, 867-879. The translational machinery has an ATP-dependent RNA helicase activity associated with the eIF-4a/eIF-4b complex, and under normal conditions, the RNA structures are opened by the helicase and do not slow the rate of translation of the mRNA. The eIF-4a has a low, i.e., μM, affinity for the pre-initiation complex.
It is believed that stabilization of mRNA structures near the 5'-cap also could be effected by specific "small" molecules, and that such binding would reduce the translational efficiency of the mRNA. To test this hypothesis, a plasmid was constructed containing the luciferase message behind a 5'-UTR containing a 27-mer RNA construct of the HIV TAR stem-loop bulge whose structure had been determined by NMR. The resulting mRNA could be expressed and capped in a wheat germ lysate translation system supplemented with T7 polymerase following addition of m7G to the lysate (see, Figure 3A). Insertion of a 9-base leader before the TAR structure (HIVluc + 9) enhanced the translational efficiency, presumably by allowing the pre-initiation complex to form. The helicase activity associated with the pre-initiation complex can transiently melt out the TAR RNA structure, and the message is translated (see, Figure 3A). Addition of a 39 amino acid tat peptide to the lysate stabilized the TAR RNA structure and inhibited the expression of the luciferase protein, as expected from a specific interaction between the TAR RNA and tat (see, Figure 3B). "Small" organic molecules were then found that could inhibit the translation of the TAR-luciferase mRNA by stabilizing the TAR RNA structure. Compounds from the Available Chemicals Directory were docked to the TAR RNA structure and scored for binding energies. Among the best 25 compounds was ACD 00001199, whose structure is shown below. This compound has been shown to bind to TAR RNA with sufficient affinity to disrupt the interaction with tat peptide at a 1 μM concentration.
[Figure: structure of ACD 00001199]
Addition of 00001199 to the wheat germ lysate translation system with the luciferase mRNA produced some inhibition of translation at very high concentrations (see, Figure 4). However, the compound was much more efficient in inhibiting translation of the luciferase mRNA containing the TAR RNA structure in the 5'-UTR, reducing translation by 50% at a 50 μM concentration. Small molecules that do not bind specifically to the TAR RNA structure did not affect translation of either mRNA construct.
Example 6: Comparison of QXP predicted ligand-DNA structures to X-ray crystallography
The utility of QXP in the context of ligands that bind to nucleic acid targets was evaluated. The X-ray data for netropsin (a minor groove binding drug) bound to two different duplex DNA sequences (PDB ID: 261d and 195d respectively (PDB IDs are identification codes for structures deposited in the Protein Data Bank, maintained at the Research Collaboratory for Structural Bioinformatics)) and an intercalator bound to an octamer duplex (PDB ID: 2d55) were used in validation studies. Root mean square (rms) deviations between the lowest energy docked structure (with randomly disordered ligands as initial structures) and the energy minimized X-ray structure fall within 0.6 Å in all the cases. Because the QXP method employs a Monte Carlo type algorithm to search the conformational space, at least 10 QXP docking simulations were run with very different initial ligand structures to make sure that the method reliably yields the global minimum. The performance of the QXP docking method can be quantified by its ability to identify the bound conformation of the ligand within 1.0 Å rms deviation from the crystallographically observed conformation. In the test cases described above, the success rate of the QXP runs is in the 80% range. The nearly linear correlation between the rms deviation from the crystal structure and the score of the docked structure indicates that the QXP method is sufficiently accurate in predicting structures of ligand-receptor complexes.
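The rms deviation and success-rate measures used in this validation can be computed as sketched below. The sketch assumes that docked and reference ligand coordinates are lists of (x, y, z) tuples in the same atom order and already in the same receptor frame, so no superposition is performed; the function names are illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def success_rate(docked_runs, reference, cutoff=1.0):
    """Fraction of docking runs whose pose lies within the rms cutoff of the reference."""
    hits = sum(1 for run in docked_runs if rmsd(run, reference) <= cutoff)
    return hits / len(docked_runs)
```

Applying `success_rate` to the final poses of repeated docking runs, with the crystallographic ligand coordinates as the reference and a 1.0 Å cutoff, gives the kind of percentage quoted in the text.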
Example 7: Prediction of paromomycin-RNA complex structure using the QXP method
The QXP method was used to derive an accurate structure of a ligand bound to the RNA target. The NMR structure of the bacterial 16S ribosomal A site bound to paromomycin (Fourmy et al., Science, 1996, 274, 1367; PDB ID: 1pbr) was used as the reference state. The aminoglycoside antibiotic was removed from the ligand-RNA complex. The conformation space of paromomycin was exhaustively searched using the QXP method for the lowest energy conformers. The target RNA was held rigid whereas the paromomycin was treated as fully flexible. Multiple docking searches with the randomly disrupted paromomycin as initial structures were performed. The representative lowest energy structure identified from the search (dark grey) is superimposed on the NMR structure (light grey) of the bound complex as shown in Figure 5. The robustness of the QXP method is indicated in Figure 6 through a correlation between the observed rms deviation and QXP energy scores.

Claims

What is claimed is:
1. A method of identifying compounds which bind to a molecular interaction site of a nucleic acid comprising: providing a numerical representation of the three dimensional structure of said molecular interaction site; providing a compound data set comprising numerical representations of the three dimensional structures of a plurality of organic compounds; and comparing the numerical representation of the molecular interaction site with members of the compound data set to generate a hierarchy of said organic compounds, said hierarchy being ranked in accordance with the ability of said organic compounds to form physical interactions with said molecular interaction site.
2. The method of claim 1 wherein said ranked hierarchy identifies those compounds which bind to the molecular interaction site.
3. The method of claim 1 wherein the comparing is carried out seriatim upon the members of the compound data set.
4. The method of claim 1 further comprising chemically synthesizing said organic compounds which rank high in said hierarchy.
5. The method of claim 4 further comprising assessing the interaction of said highly ranked organic compounds with said nucleic acid.
6. The method of claim 5 wherein said assessment comprises mass spectrometry of a mixture of said nucleic acid and at least one of said compounds.
7. The method of claim 5 wherein said assessment comprises a functional bioassay.
8. The method of claim 1 further comprising generating a further data set of representations of organic compounds, said organic compounds comprising compounds which are chemically related to the organic compounds which rank high in said hierarchy, and comparing the numerical representation of the molecular interaction site with members of the further data set to determine a further hierarchy ranked in accordance with the ability of the organic compounds to form physical interactions with said molecular interaction site.
9. The method of claim 8 performed iteratively.
10. The method of claim 1 wherein said nucleic acid is RNA.
11. The method of claim 10 wherein said RNA is eukaryotic.
12. The method of claim 11 wherein said RNA is selected from the group consisting of mRNA, pre-mRNA, tRNA, rRNA, and snRNA.
13. The method of claim 10 wherein said nucleic acid is prokaryotic.
14. The method of claim 13 wherein said RNA is viral.
15. The method of claim 13 wherein said RNA is bacterial.
16. The method of claim 1 wherein said comparing is performed in silico.
17. The method of claim 1 wherein said molecular interaction site is present in a region of an RNA which is highly conserved among a plurality of taxonomic species.
18. The method of claim 1 performed for a plurality of molecular interaction sites.
19. The method of claim 1 wherein said molecular interaction site is located in the 3' or 5' untranslated region of a prokaryotic or eukaryotic mRNA.
20. The method of claim 1 wherein said molecular interaction site is located in the 5' untranslated region of mRNA associated with a disease process.
21. A data set comprising the numerical representations of the three dimensional structures of molecular interaction sites determined in accordance with claim 18.
22. A data set comprising the numerical representations of the three dimensional structure of organic compounds ranked high in the hierarchy generated in accordance with the method of claim 1.
23. A data set comprising the numerical representations of the three dimensional structures of a plurality of organic compounds in accordance with the method of claim 1.
PCT/US1999/010510 1998-05-12 1999-05-12 Characterization of interactions between molecular interaction sites of rna and ligands therefor WO1999058722A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU39010/99A AU3901099A (en) 1998-05-12 1999-05-12 Characterization of interactions between molecular interaction sites of rna and ligands therefor

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US7644798A 1998-05-12 1998-05-12
US8509298P 1998-05-12 1998-05-12
US60/085,092 1998-05-12
US09/076,447 1998-05-12

Publications (1)

Publication Number Publication Date
WO1999058722A1 (en) 1999-11-18

Family

ID=26758117

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/010510 WO1999058722A1 (en) 1998-05-12 1999-05-12 Characterization of interactions between molecular interaction sites of rna and ligands therefor

Country Status (2)

Country Link
AU (1) AU3901099A (en)
WO (1) WO1999058722A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5888738A (en) * 1993-11-26 1999-03-30 Hendry; Lawrence B. Design of drugs involving receptor-ligand-DNA interactions

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BRADDOCK M., ET AL.: "BLOCKING OF TAT-DEPENDENT HIV-1 RNA MODIFICATION BY AN INHIBITOR OF RNA POLYMERASE II PROCESSIVITY.", NATURE, NATURE PUBLISHING GROUP, UNITED KINGDOM, vol. 350., 1 January 1991 (1991-01-01), United Kingdom, pages 439 - 441., XP002919419, ISSN: 0028-0836, DOI: 10.1038/350439a0 *
CHEN Q., SHAFER R.H., KUNTZ I.D.: "STRUCTURE-BASED DISCOVERY OF LIGANDS TARGETED TO THE RNA DOUBLE HELIX.", BIOCHEMISTRY, AMERICAN CHEMICAL SOCIETY, US, vol. 36., 1 January 1997 (1997-01-01), US, pages 11402 - 11407., XP002919417, ISSN: 0006-2960, DOI: 10.1021/bi970756j *
CRAIN P F, ET AL.: "APPLICATIONS OF ELECTROSPRAY IONIZATION MASS SPECTROMETRY TO STRUCTURAL STUDIES OF OLIGONUCLEOTIDES AND RNA", BOOK OF ABSTRACTS. ACS NATIONAL MEETING., XX, XX, 13 March 1994 (1994-03-13), XX, pages COMPLETE, XP002919418 *
RAMEZANI A., JOSHI S.: "COMPARATIVE ANALYSIS OF FIVE HIGHLY CONSERVED TARGET SITES WITHIN THE HIV-1 RNA FOR THEIR SUSCEPTIBILITY TO HAMMERHEAD RIBOZYME MEDIATED CLEAVAGE IN VITRO AND IN VIVO.", ANTISENSE & NUCLEIC ACID DRUG DEVELOPMENT., MARY ANN LIEBERT, INC., NEW YORK., US, vol. 06., 1 January 1996 (1996-01-01), US, pages 229 - 235., XP002919420, ISSN: 1087-2906 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002016930A2 (en) * 2000-08-21 2002-02-28 Ribotargets Limited Computer-based modelling of ligand/receptor structures
WO2002016930A3 (en) * 2000-08-21 2003-01-16 Ribotargets Ltd Computer-based modelling of ligand/receptor structures

Also Published As

Publication number Publication date
AU3901099A (en) 1999-11-29

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase