WO2007148106A2 - Detection of acetylation of prokaryotic proteins by mass spectrometry - Google Patents


Info

Publication number
WO2007148106A2
Authority
WO
WIPO (PCT)
Prior art keywords
protein
bacterium
pnat
tuberculosis
identifying
Application number
PCT/GB2007/002332
Other languages
French (fr)
Other versions
WO2007148106A3 (en)
Inventor
Jens Mattow
Neil Graham Stoker
Peter Roman Jungblut
Stuart Christopher Gorthorn Rison
Original Assignee
The Royal Veterinary College
Application filed by The Royal Veterinary College
Publication of WO2007148106A2
Publication of WO2007148106A3


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G01N33/6851 Methods of protein analysis involving laser desorption ionisation mass spectrometry
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/35 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Mycobacteriaceae (F)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/48 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving transferase
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6842 Proteomic analysis of subsets of protein mixtures with reduced complexity, e.g. membrane proteins, phosphoproteins, organelle proteins
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/90 Enzymes; Proenzymes
    • G01N2333/91 Transferases (2.)
    • G01N2333/91045 Acyltransferases (2.3)
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2500/00 Screening for compounds of potential therapeutic value

Definitions

  • This invention relates to acetylation, and in particular to N-terminal acetylation of proteins in bacteria.
  • Proteomic approaches have been used for protein analysis in recent years, and mass spectrometric methods, such as MALDI-MS, allow the identification both of proteins and post-translational modifications. Modification of proteins extends the range of possible molecular structures beyond the limits imposed by the 20 encoded amino acids and, if reversible, gives a means of control and signaling.
  • acetylation is the most common.
  • Amino (N)-terminal acetylation occurs on approximately 50% of yeast proteins and 80-90% of proteins in higher eukaryotes (Polevoda & Sherman 2002). It occurs on numerous and diverse proteins and affects many aspects of protein function, including enzymatic activity, stability, DNA binding, protein-protein interaction and peptide-receptor recognition.
  • acetylation is believed to be very rare in prokaryotic proteins (Polevoda & Sherman 2003b; Polevoda & Sherman, 2006) because the data, mainly from E. coli, have shown very few bacterial proteins to be N-terminally acetylated (Walker 1963).
  • Tuberculosis, caused by Mycobacterium tuberculosis (M. tuberculosis), has been declared a global emergency and is the most frequent infectious cause of mortality in the world.
  • M. tuberculosis resistant to the currently available chemotherapeutics, particularly isoniazid (INH), is an additional cause of alarm.
  • INH chemotherapeutics
  • M. tuberculosis is likely to contain many N-acetyl transferases (NATs).
  • aNAT M. tuberculosis arylamine N-acetyl transferase
  • aNAT arylamine N-acetyl transferase
  • Given that N-terminal acetylation of proteins is common in eukaryotes, the presence of this activity in bacteria suggests that the process originally evolved in bacteria and was later lost in species such as E. coli. It is axiomatic that the more closely related an organism is to M. tuberculosis, the more likely it is to have similar properties. Accordingly, we consider that other high-GC Gram-positive bacteria, especially other actinomycetes, and in particular other mycobacteria, are likely to share widespread N-terminal acetylation of proteins.
  • a first aspect of the invention thus provides a method of identifying an N-terminally acetylated protein in a bacterium, the method comprising: providing details of at least one putative translation start site (TSS) for at least one protein expressed in the bacterium; confirming the actual TSS of the at least one protein using mass spectrometry (MS); and determining, using MS, whether the at least one protein is N-terminally acetylated.
  • TSS putative translation start site
  • MS mass spectrometry
  • Example 1 This method is described in Example 1 for a representative sample of 13 proteins from M. tuberculosis, of which 3 (23%) were found to be N-terminally acetylated.
  • By an N-terminally acetylated protein which is expressed in a bacterium, we do not mean that the protein is translated in its acetylated form.
  • N-terminal acetylation is a post-translational modification.
  • the MS methods used are matrix-assisted laser desorption/ionisation (MALDI) MS (Karas & Hillenkamp, 1988) or electrospray MS (Fenn et al., 1989).
  • MALDI matrix-assisted laser desorption/ionisation
  • electrospray MS Fenn et al., 1989.
  • MS techniques include tandem mass spectrometry (MS/MS; Medzihradszky et al., 2000). Suitable MS methods are reviewed by Aebersold & Mann (2003) and Domon & Aebersold (2006).
  • MALDI-MS is used for confirming the actual TSS of the at least one protein.
  • MS/MS is used for determining whether the actual TSS of the at least one protein is N-terminally acetylated.
  • different MS methods are used for confirming the actual TSS of the at least one protein, and for determining whether the actual TSS of the at least one protein is N-terminally acetylated.
  • MS/MS is used for determining whether the at least one protein is N-terminally acetylated.
  • This aspect of the invention thus includes the use of mass spectrometry (MS) in the identification of an N-terminally acetylated protein expressed in a bacterium.
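The first aspect compares observed peptide masses with those predicted from a putative TSS. That comparison can be illustrated with a minimal Python sketch. It assumes trypsin digestion, singly protonated [M+H]+ ions as typically seen in MALDI, and monoisotopic masses. The function names, the 50 ppm tolerance and the example sequence are hypothetical and do not come from the patent.

```python
"""Sketch: match an observed MALDI [M+H]+ to candidate N-terminal peptides."""

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O of the intact peptide termini
PROTON = 1.007276   # added charge of a singly protonated ion
ACETYL = 42.010565  # C2H2O added by N-terminal acetylation


def n_terminal_tryptic_peptide(seq):
    """First tryptic peptide: cleave after K or R unless the next residue is P."""
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            return seq[: i + 1]
    return seq


def mh_plus(peptide, acetylated=False):
    """Monoisotopic [M+H]+ of a peptide, optionally N-terminally acetylated."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
    return mass + (ACETYL if acetylated else 0.0)


def match_n_terminus(protein, observed_mh, tol_ppm=50.0):
    """Return (label, theoretical m/z) for candidates within tol_ppm of observed_mh.

    Candidates: the putative start with or without the initiator Met,
    each with or without an N-terminal acetyl group.
    """
    starts = {"Met retained": protein}
    if protein.startswith("M"):
        starts["Met removed"] = protein[1:]
    hits = []
    for label, seq in starts.items():
        pep = n_terminal_tryptic_peptide(seq)
        for acetylated in (False, True):
            mz = mh_plus(pep, acetylated)
            if abs(mz - observed_mh) / mz * 1e6 <= tol_ppm:
                state = "acetylated" if acetylated else "free"
                hits.append((f"{label}, {state}: {pep}", mz))
    return hits
```

For the hypothetical sequence MSAKLLR, an observed [M+H]+ of 347.1925 matches only the Met-removed, acetylated peptide Ac-SAK. A mass match of this kind is supporting evidence only. MS/MS is still needed to locate the modification.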
  • the protein is not a ribosomal protein.
  • the protein is one that does not contain an N-terminal signal sequence.
  • N-acetyl transferase (NAT) mutant bacteria can be employed to identify N-terminally acetylated proteins.
  • a second aspect of the invention thus provides a method of identifying an N-terminally acetylated protein in a bacterium, the method comprising: providing a mutant strain of the bacterium comprising at least one protein N-acetyl transferase (pNAT) in mutant form; providing a wild-type strain of the bacterium comprising the at least one pNAT in wild-type form; and identifying a protein that is differentially N-terminally acetylated between the mutant and wild-type bacterial strains.
  • pNAT protein N-acetyl transferase
  • this aspect of the invention includes the use of protein N-acetyl transferase (pNAT) mutant bacteria in the identification of N-terminally acetylated proteins.
  • the at least one mutant pNAT in the mutant bacterial strain is nonfunctional.
  • the at least one mutant pNAT has been deleted ("knocked-out") and hence cannot possess any partial or residual activity.
  • the mutant bacterial strain contains at least two pNATs in mutant form. This guards against functional redundancy between the at least two pNATs.
  • the mutant bacterial strain is M. tuberculosis and the at least one pNAT in mutant form is selected from RimI, Rv0133 and Rv3225c.
  • the step of identifying proteins that are differentially N-terminally acetylated between the mutant and wild-type bacterial strains comprises: extracting proteins from the mutant and wild-type bacterial strains and separating the proteins by 2D-PAGE; determining a difference between the 2D-PAGE patterns obtained from the mutant and wild-type bacterial strains; and identifying a protein which corresponds to the difference between the two 2D-PAGE patterns.
  • liquid chromatography could be used instead of 2D-PAGE (Yates et al., 1999).
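The 2D-PAGE comparison can be sketched as a naive spot-matching step in Python, with each spot reduced to a (pI, Mr) pair. The Spot class, the tolerances and the example coordinates are hypothetical. In practice, dedicated gel image-analysis software would do this matching. N-terminal acetylation neutralises the free α-amino group, so a change in acetylation status is generally expected to show up as a small pI shift at similar Mr. The test below models that case.

```python
"""Sketch: naive matching of 2D-PAGE spots between wild-type and mutant gels."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    pi: float      # isoelectric point (first dimension)
    mr_kda: float  # apparent molecular mass in kDa (second dimension)


def unmatched(spots_a, spots_b, d_pi=0.05, d_mr_rel=0.03):
    """Spots in spots_a with no partner in spots_b within the pI and relative Mr tolerances."""
    def close(s, t):
        return (abs(s.pi - t.pi) <= d_pi
                and abs(s.mr_kda - t.mr_kda) <= d_mr_rel * s.mr_kda)
    return [s for s in spots_a if not any(close(s, t) for t in spots_b)]


def differential_spots(wild_type, mutant, d_pi=0.05, d_mr_rel=0.03):
    """Return (wild-type-only, mutant-only) spots as candidates for MS identification."""
    return (unmatched(wild_type, mutant, d_pi, d_mr_rel),
            unmatched(mutant, wild_type, d_pi, d_mr_rel))
```

Each unmatched spot would then be excised and identified by MS, as described in the following paragraphs.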
  • the method will typically further comprise confirming the N-terminal acetylation status of the identified protein from one or both of the mutant and wild-type bacterial strains.
  • MS is used to determine the identity of the proteins that correspond to the differences between the two 2D-PAGE patterns, and to confirm the N-terminal acetylation status of the identified protein.
  • MALDI-MS is used initially to identify the N-terminally acetylated peptide, and MS/MS is used to confirm the location of the acetyl group. It is also possible to determine the N-terminal sequence using Edman degradation, but this is much less preferred, not least because an N-terminal acetyl group blocks Edman chemistry.
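MS/MS places the acetyl group by showing which fragment ions carry the extra mass. The sketch below computes singly charged b- and y-ion ladders for a peptide. It is a simplified model that uses monoisotopic masses, considers only singly charged fragments and ignores neutral losses. The example peptide SAK is hypothetical.

```python
"""Sketch: b- and y-ion ladders that localise an N-terminal acetyl group."""

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
ACETYL = 42.010565


def b_ions(peptide, n_term_mod=0.0):
    """Singly charged b1..b(n-1); an N-terminal modification shifts every b ion."""
    ions, running = [], PROTON + n_term_mod
    for aa in peptide[:-1]:
        running += RESIDUE_MASS[aa]
        ions.append(running)
    return ions


def y_ions(peptide):
    """Singly charged y1..y(n-1); these contain the C-terminus, not the N-terminal acetyl."""
    ions, running = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        running += RESIDUE_MASS[aa]
        ions.append(running)
    return ions


if __name__ == "__main__":
    peptide = "SAK"  # hypothetical N-terminal peptide
    print("b, free:      ", [round(m, 4) for m in b_ions(peptide)])
    print("b, acetylated:", [round(m, 4) for m in b_ions(peptide, ACETYL)])
    print("y (both):     ", [round(m, 4) for m in y_ions(peptide)])
```

In this model, an N-terminal acetyl group adds 42.0106 Da to every b ion: b1 of SAK moves from about 88.039 to 130.050. The y ions are unchanged. An acetyl group on the lysine side chain would instead shift the y ions that contain K.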
  • the bacterium used in the first and second aspects of the invention is a Gram-positive bacterium, and more preferably it is a firmicute.
  • the bacterium is an actinomycete, which may be selected from a Mycobacterium, a Corynebacterium or a Nocardia.
  • M. tuberculosis A large number of species are included within the genus Mycobacterium, including M. tuberculosis, M. avium, M. leprae, M. bovis, M. smegmatis, M. paratuberculosis and M. marinum.
  • suitable mycobacteria are mentioned by Tsukamura (1983, Microbiol Immunol. 27(4): 315-34, incorporated herein by reference) who describes a numerical classification of 280 strains of slowly growing mycobacteria. M. tuberculosis is most preferred.
  • the bacterium may be a pathogenic bacterium and the protein identified by the methods of the first and second aspects of the invention may be a drug discovery target for a disease or condition caused by the pathogenic bacterium.
  • a third aspect of the invention thus provides a method of identifying a drug discovery target in a pathogenic bacterium, the method comprising identifying an N-terminally acetylated protein expressed in a pathogenic bacterium according to the methods of the first or second aspects of the invention.
  • the method further comprises determining at least one property or activity of the protein that is affected by N-terminal acetylation of the protein, and which affected property or activity is relevant to the pathogenicity of the bacterium.
  • this aspect of the invention includes identifying a protein that is N-terminally acetylated in a pathogenic bacterium, and determining the effect of the N-terminal acetylation on the protein in vitro and/or on the bacterium in culture.
  • determining the effect of N-terminal acetylation on at least one property or activity of the protein which is relevant to the pathogenicity of the bacterium is carried out by comparing N-terminally acetylated and non-acetylated forms of the protein in vitro.
  • protein N-acetyl transferases preferentially acetylate proteins starting with particular amino acids, which are serine (Ser) and alanine (Ala), followed by methionine (Met), glycine (Gly) and threonine (Thr).
  • Ser serine
  • Ala alanine
  • Met methionine
  • Gly glycine
  • Thr threonine
  • the three N-terminally acetylated proteins that we identified in M. tuberculosis have N-terminal sequences that fit with the eukaryotic pattern. Therefore it should be possible to prevent acetylation by altering the N-terminal sequence. For example, mutating the protein so that it has a proline (Pro) at the N-terminus inhibits acetylation in eukaryotes and is also expected to prevent acetylation in prokaryotes.
  • Pro proline
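The N-terminal preferences listed above can be turned into a rough screening heuristic, sketched below in Python. It adds one assumption that the patent does not state: methionine aminopeptidase removes the initiator Met when the second residue is small (G, A, S, C, T, P or V). The sketch then checks the first mature residue against Ser, Ala, Met, Gly and Thr, and treats an N-terminal Pro as blocking. All function names are hypothetical. This is a heuristic for choosing candidates, not a predictor validated for bacteria.

```python
"""Sketch: heuristic call of N-terminal acetylation from the predicted mature N-terminus."""

# Assumption (not from the patent): the initiator Met is removed when the
# second residue is one of these small amino acids.
METAP_SMALL_RESIDUES = set("GASCTPV")

# First residues listed in the text as preferred by protein N-acetyl transferases.
PREFERRED_FIRST_RESIDUES = ("S", "A", "M", "G", "T")


def mature_n_terminus(seq):
    """Predict the mature N-terminus after initiator-Met processing."""
    if len(seq) > 1 and seq[0] == "M" and seq[1] in METAP_SMALL_RESIDUES:
        return seq[1:]
    return seq


def likely_acetylated(seq):
    """Heuristic: preferred first residue, and no Pro at the mature N-terminus."""
    if not seq:
        return False
    first = mature_n_terminus(seq)[0]
    if first == "P":  # an N-terminal Pro inhibits acetylation in eukaryotes
        return False
    return first in PREFERRED_FIRST_RESIDUES


def proline_variant(seq):
    """Hypothetical mutant encoding Pro at position 2, so that the mature protein
    starts with Pro once the initiator Met is removed (under the assumption above)."""
    if len(seq) < 2 or seq[0] != "M":
        raise ValueError("expected a sequence starting with the initiator Met")
    return "M" + "P" + seq[2:]
```

For the hypothetical sequence MSAKLLR, the heuristic predicts a mature Ser N-terminus and acetylation. The Pro variant MPAKLLR is predicted to escape acetylation.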
  • one approach for generating a non-acetylated version of a protein that is normally subject to N-terminal acetylation includes carrying out site-directed mutagenesis to alter the start of the gene such that a non-acetylatable amino acid is encoded, and expressing and purifying the protein.
  • An alternative approach for generating N-terminal acetylated and non-acetylated protein for carrying out an in vitro assay includes expressing and purifying an in vivo N-terminally acetylated protein, and treating it with a deacetylase. Conversely, a non-acetylated protein can be expressed and purified, and then acetylated using a pNAT.
  • a further approach is to express a protein with an N-terminal tag which can be cleaved off following purification. Thereafter the functional activity or property of the protein can be assayed, and the effect of N-terminal acetylation can be assessed.
  • Suitable activities and properties of a protein that may be affected by its N-terminal acetylation, and which can be tested in vitro and/or in culture, depend upon what the protein is known or predicted to do. For example, as would be appreciated by the skilled person, if the protein has an enzymatic function, this function could be assayed in vitro. Acetylation may also affect protein-protein binding (which can be tested by co-immunoprecipitation), dimerisation and protein stability. If the protein is related to iron metabolism, growth in medium containing high or low iron may be tested. If the protein is related to oxidative stress, survival in the presence of hydrogen peroxide, or in activated versus resting macrophages, could be tested. Other suitable properties of the bacterium that may be affected by N-terminal acetylation of a protein, and which can be tested in cell culture, include survival, growth rate and drug resistance.
  • determining at least one activity or property of a protein that is N-terminally acetylated in a pathogenic bacterium, and which may be affected by its N-terminal acetylation may be performed in vivo.
  • properties of the bacterium that may be affected by N-terminal acetylation of a protein may relate to the pathogenic characteristics of the bacterium.
  • an animal model of the pathogenic condition caused by the pathogenic bacterium may usefully be employed.
  • Properties of the pathogenic bacterium which may be tested in vivo include infectivity, induction of pathology and drug resistance (Parish et al., 2003). It is thus appreciated that the identified protein may be a drug discovery target for a disease or condition caused by the pathogenic bacterium.
  • the bacterium is M. tuberculosis and the protein is a drug discovery target for tuberculosis.
  • the bacterium may be M. leprae and the protein may be a drug discovery target for leprosy; the bacterium may be M. avium, and the protein may be a drug discovery target for the M. avium complex; the bacterium may be Corynebacterium diphtheriae and the protein may be a drug discovery target for diphtheria.
  • the bacterium may be Nocardia asteroides and the protein may be a drug discovery target for nocardiosis; the bacterium may be Actinomyces spp and the protein may be a drug discovery target for actinomycosis; and the bacterium may be Arcanobacterium spp and the protein may be a drug discovery target for Arcanobacterium spp infection.
  • the bacterium may be Listeria monocytogenes and the protein may be a drug discovery target for listeriosis; the bacterium may be a streptococcus such as Strep pneumoniae or Strep pyogenes and the protein may be a drug discovery target for streptococcal infections; the bacterium may be a staphylococcus such as Staph aureus and the protein may be a drug discovery target for staphylococcal infections including MRSA; the bacterium may be an enterococcus and the protein may be a drug discovery target for enterococcal infections; the bacterium may be Bacillus anthracis and the protein may be a drug discovery target for anthrax; the bacterium may be Bacillus cereus and the protein may be a drug discovery target for food poisoning; or the bacterium may be a Clostridium such as C. perfringens, C. difficile, C. tetani or C. botulinum, and the protein may be a drug discovery target for diseases caused by these Clostridia, including tetanus and botulism (see http://www.textbookofbacteriology.net/).
  • a fourth aspect of the invention thus provides an isolated, N-terminally acetylated M. tuberculosis protein which is selected from GlpX, PrcA and ArgD, or an N-terminal fragment thereof.
  • the amino acid sequence of GlpX from M. tuberculosis strain H37Rv is listed in GenBank Accession No. NP_215615 and in Figure 4 (SEQ ID No: 1), while GlpX from M. tuberculosis strain CDC1551 is listed in GenBank Accession No. NP_335575.
  • the amino acid sequence of PrcA from M. tuberculosis strain H37Rv is listed in GenBank Accession No. NP_216625 and in Figure 5 (SEQ ID No: 2), while PrcA from M. tuberculosis strain CDC1551 is listed in GenBank Accession No. NP_336638.
  • the amino acid sequence of ArgD from M. tuberculosis strain H37Rv is listed in GenBank Accession No. NP_216171 and in Figure 6 (SEQ ID No: 3), while ArgD from M. tuberculosis strain CDC1551 is listed in GenBank Accession No. NP_336148.
  • by an N-terminal fragment of the specified proteins we include a region of at least 5 consecutive amino acids, more preferably at least 10, or at least 15, or at least 20, or at least 30, or at least 50 amino acid residues of the protein.
  • the N-terminal fragment may be at least 100 or at least 150 amino acid residues of the protein, but less than 100% of the length of the whole polypeptide.
  • an N-terminal fragment of the invention is itself N-terminally acetylated. Such fragments may be useful, for example, to prepare antibodies which will specifically bind the N-terminally acetylated form of the protein.
  • the invention is not limited to an isolated N-terminally acetylated M. tuberculosis protein having the sequence listed in Figures 4, 5 or 6 (SEQ ID Nos: 1-3), but includes naturally occurring variants thereof in which one or more of the amino acid residues have been replaced with another amino acid.
  • the invention includes isolated, N-terminally acetylated M. tuberculosis GlpX, PrcA and ArgD proteins from strains other than H37Rv, in particular strain CDC1551 and strain Erdman.
  • the invention further includes isolated, N-terminally acetylated GlpX, PrcA and ArgD proteins from Mycobacteria other than M. tuberculosis.
  • the percentage sequence identities between the GlpX, PrcA and ArgD proteins from M. tuberculosis and those from M. bovis, M. marinum, M. leprae, M. paratuberculosis and M. smegmatis are listed in Table 1.
  • a fifth aspect of the invention provides an isolated, N-terminally acetylated protein having at least 80% sequence identity with an M. tuberculosis protein selected from GlpX, PrcA and ArgD, as provided in Figures 4 to 6 (SEQ ID Nos: 1-3), respectively, or an N-terminal fragment thereof.
  • the protein has at least 81%, 82%, 83%, 84% or 85% sequence identity, and yet more preferably at least 86%, 87%, 88%, or 89% or at least 90%, 91%, 92%, 93%, 94% or 95% sequence identity, and yet more preferably at least 96%, 97%, 98% or at least 99% sequence identity with the M. tuberculosis GlpX, PrcA or ArgD protein from strain H37Rv, as provided in Figures 4 to 6 (SEQ ID Nos: 1-3), respectively.
  • the percent sequence identity between two polypeptides may be determined using suitable computer programs, for example the GAP program of the University of Wisconsin Genetics Computer Group, and it will be appreciated that percent identity is calculated in relation to polypeptides whose sequences have been aligned optimally.
  • the alignment may alternatively be carried out using the Clustal W program (Thompson et al. (1994) Nucleic Acids Res 22, 4673-80).
  • the parameters used may be as follows:
  • Fast pairwise alignment parameters: K-tuple (word) size, 1; window size, 5; gap penalty, 3; number of top diagonals, 5. Scoring method: x percent. Multiple alignment parameters: gap open penalty, 10; gap extension penalty, 0.05. Scoring matrix: BLOSUM.
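Once two sequences have been aligned, for example with GAP or Clustal W as described above, percent identity reduces to counting identical aligned positions. The Python sketch below uses one convention: identical positions divided by the ungapped length of the reference sequence. That denominator is an assumption. GAP and Clustal W report identity using their own conventions, so their figures can differ. The example alignments are hypothetical.

```python
"""Sketch: percent identity of a pre-aligned query against a reference sequence."""


def percent_identity(query_aln, reference_aln):
    """Identical aligned residues / ungapped reference length, as a percentage ('-' = gap)."""
    if len(query_aln) != len(reference_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_len = sum(1 for r in reference_aln if r != "-")
    if ref_len == 0:
        raise ValueError("reference contains no residues")
    identical = sum(1 for q, r in zip(query_aln, reference_aln) if q == r and r != "-")
    return 100.0 * identical / ref_len


def meets_threshold(query_aln, reference_aln, threshold=80.0):
    """True if identity to the reference is at least threshold percent (80% in the fifth aspect)."""
    return percent_identity(query_aln, reference_aln) >= threshold
```

For example, the query MSA-KL aligned to the reference MSAGKL shares 5 of 6 reference residues, which gives about 83.3% identity.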
  • the invention also provides an isolated, N-terminally acetylated GlpX, PrcA or ArgD protein from a Mycobacterium other than M. tuberculosis, in particular those Mycobacterium species listed above.
  • a sixth aspect of the invention provides a method of determining the effect of N-terminal acetylation on the function or activity of a protein as defined in the fourth or fifth aspects of the invention.
  • the method comprises providing N-terminally acetylated and non N-terminally acetylated forms of the protein, and determining at least one property of the protein that is affected by N-terminal acetylation of the protein.
  • the at least one property of the protein or the at least one property of the cell which is tested is one that is relevant to the pathogenicity of M. tuberculosis.
  • This method is typically performed in vitro, using methods well known in the art. For example, with respect to GlpX, fructose-1,6-bisphosphatase activity may be tested; with respect to ArgD, acetylornithine aminotransferase activity may be tested; and with respect to PrcA, either or both of protein degradation and sensitivity to nitric oxide may be tested (Darwin et al., 2003).
  • the method of determining the effect of N-terminal acetylation on the function or activity of a protein as defined in the fourth or fifth aspects of the invention typically comprises providing a cell which contains the N-terminally acetylated form of the protein and a cell which contains the non N-terminally acetylated form of the protein, and determining at least one property of the cell that is affected by N-terminal acetylation of the protein.
  • the cell is a bacterial cell, most preferably M. tuberculosis.
  • the present invention provides, for the first time, motivation for the skilled person to attempt to identify protein N-acetyl transferase (pNAT) enzymes in bacteria, and to assess their suitability as drug discovery targets.
  • inhibitors of protein N-acetyl transferase enzymes may be therapeutically useful.
  • a seventh aspect of the invention thus provides a method of identifying a protein N-acetyl transferase (pNAT) in a bacterium, the method comprising: providing details of at least one putative pNAT; providing a mutant strain of the bacterium in which the putative pNAT gene has been knocked-out; and determining the amount, rate and/or level of N-terminal protein acetylation in the mutant bacterial strain, wherein a reduction in the amount, rate and/or level of N-terminal protein acetylation in the mutant bacterial strain in comparison to the amount, rate and/or level of N-terminal protein acetylation in a wild-type strain of that bacterium indicates that the putative pNAT is an actual pNAT.
  • determining the amount, rate and/or level of N-terminal protein acetylation in the mutant bacterial strain comprises measuring the level of pNAT enzyme activity in vitro.
  • a suitable method for determining the amount, rate and/or level of N-terminal protein acetylation is described below and in Example 2B.
  • putative bacterial pNATs may be identified by virtue of their homology to known yeast pNATs such as NatA, NatB and NatC. Putative bacterial pNATs may alternatively be identified by homology searches of protein motif databases such as Pfam (http://www.sanger.ac.uk/Software/Pfam/). Further alternatively, putative bacterial pNATs may be identified by virtue of their homology to other bacterial pNATs, such as the M. tuberculosis proteins RimI, Rv0133 and Rv3225c.
  • the identified pNAT is further tested by complementing the mutant strain of the bacterium with a polynucleotide encoding the knocked-out pNAT, to show that the N-terminal acetylation activity of the mutant bacteria is thereby restored.
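The seventh aspect's decision rule is a drop in N-terminal acetylation in the knock-out relative to the wild type. It can be expressed as a comparison of acetylated fractions. In this Python sketch, each protein is summarised by the signals of its acetylated and non-acetylated N-terminal peptides. The input format, the 0.5 threshold and the protein labels are hypothetical. MALDI peak intensities are at best semi-quantitative.

```python
"""Sketch: flag proteins whose N-terminal acetylation drops in a pNAT knock-out."""


def acetylated_fraction(acetylated_signal, free_signal):
    """Fraction of the N-terminal peptide signal that is acetylated."""
    total = acetylated_signal + free_signal
    return acetylated_signal / total if total > 0 else 0.0


def reduced_in_mutant(wild_type, mutant, min_drop=0.5):
    """Proteins whose acetylated fraction falls by at least min_drop in the mutant.

    wild_type, mutant: dict mapping protein label -> (acetylated, non-acetylated) signal.
    """
    hits = {}
    for protein, wt_signals in wild_type.items():
        if protein not in mutant:
            continue
        drop = acetylated_fraction(*wt_signals) - acetylated_fraction(*mutant[protein])
        if drop >= min_drop:
            hits[protein] = drop
    return hits


def supports_pnat(wild_type, mutant, min_drop=0.5):
    """True if at least one protein loses acetylation in the knock-out strain."""
    return bool(reduced_in_mutant(wild_type, mutant, min_drop))
```

The same comparison, run between the complemented strain and the knock-out with the arguments reversed, would indicate whether complementation restores acetylation.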
  • Suitable bacteria include those described above with respect to the first and second aspects of the invention.
  • the bacterium is a pathogenic bacterium and the pNAT identified by the methods of this aspect of the invention may be a drug discovery target for a disease or condition caused by the pathogenic bacterium.
  • Suitable pathogenic bacteria, and the diseases that they cause include those listed above with respect to the third aspect of the invention.
  • An eighth aspect of the invention provides a method of identifying a polynucleotide encoding a pNAT from a desired bacterium, the method comprising: providing a library of polynucleotides from the desired bacterium, which library comprises at least one polynucleotide that encodes a putative pNAT; providing microbial cells of a strain in which a pNAT gene is naturally absent or has been knocked-out; transforming the library of polynucleotides into the microbial cells; incubating the transformed microbial cells under conditions that allow expression of the polypeptides encoded by the polynucleotides; and identifying one or more microbial cells in which the levels of protein N-terminal acetylation have been increased under those conditions, wherein increased levels of protein N-terminal acetylation in an identified microbial cell indicate that the transformed polynucleotide encodes a pNAT from the desired bacterium.
  • the library of polynucleotides is a cDNA library.
  • the library may be derived from a desired bacterial source, typically the bacteria listed above with respect to the first and second aspects of the invention, and more preferably, the pathogenic bacteria listed above with respect to the third aspect of the invention.
  • the microbial cells may be yeast cells, such as S. cerevisiae. If the microbial cells are S. cerevisiae, the knocked-out pNAT may be NatA, NatB or NatC.
  • the microbial cells may be bacterial cells, such as M. tuberculosis, and the knocked-out pNAT may be RimI, Rv0133 or Rv3225c.
  • the microbial cells may be bacterial cells, such as E. coli, in which pNAT activity is thought not to be present.
  • the library of polynucleotides is under the control of an inducible promoter.
  • inducible promoters are known in the art.
  • the promoter may be a tetracycline-inducible promoter, a methionine-inducible promoter, a galactose-inducible promoter such as GAL1 or GAL10, or the CUP1 metallothionein promoter (induced in the presence of Cu2+, Zn2+).
  • lac or ara promoters are preferred.
  • a ninth aspect of the invention provides a method of identifying a pNAT from a desired bacterium, the method comprising: identifying a protein from the desired bacterium which requires N-terminal acetylation for a specified activity; randomly mutagenising genes in the desired bacterium; selecting or screening for mutagenised bacteria that have lost the specified activity; and identifying at least one mutagenised gene from the bacteria that have lost the specified activity, wherein a loss of the specified protein activity indicates that the mutagenised gene encodes a pNAT from the desired bacterium.
  • the Esat6 protein requires acetylation for an activity (Okkels et al., 2004). Since Esat6 requires N-terminal acetylation for secretion from the M. tuberculosis cell, an anti-Esat6 antibody could be used to screen for Esat6 secretion, and the absence thereof, using methods well known in the art.
  • a tenth aspect of the invention provides a method of screening for an inhibitor of a bacterial pNAT, the method comprising: expressing and purifying the bacterial pNAT; contacting the pNAT with a test compound; and assaying for pNAT activity, wherein a reduction in pNAT activity in the presence of the test compound indicates that the test compound is a potential inhibitor of the pNAT.
  • the bacterial pNAT may be one identified by the method of the seventh, eighth or ninth aspects of the invention.
  • the M. tuberculosis proteins RimI, Rv0133 and Rv3225c are pNAT enzymes by virtue of their homology to known S. cerevisiae pNATs.
  • the bacterial pNAT is an M. tuberculosis pNAT selected from RimI, Rv0133 and Rv3225c.
  • the bacterial pNAT may be a homologue of M. tuberculosis RimI, Rv0133 or Rv3225c from a mycobacterium other than M. tuberculosis.
  • homologues of RimI, Rv0133 and Rv3225c can readily be identified and used in the screening methods; for example, the percentage sequence identities between the RimI, Rv0133 and Rv3225c proteins from M. tuberculosis and those from M. bovis, M. marinum, M. leprae, M. paratuberculosis and M. smegmatis are listed in Table 2.
  • the assay may comprise: incubating the purified pNAT protein with [3H]acetyl-CoA and an acceptor such as adrenocorticotropic hormone peptide, or a bacterial protein or its N-terminal peptide (such as M. tuberculosis GlpX), in the presence or absence of the test compound; separating the acetylated peptide/protein from the radioactive substrate, e.g. by cation-exchange chromatography, affinity chromatography or size-exclusion chromatography; and counting the radioactivity of the acetylated peptide/protein in the presence or absence of the test compound.
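The read-out of the radioactive assay comes down to comparing counts with and without the test compound. This minimal Python sketch assumes counts per minute (cpm) from a vehicle-only reaction, a no-enzyme blank and each compound. The blank, the 50% hit threshold and the compound names are hypothetical choices, not values from the patent.

```python
"""Sketch: percent inhibition of [3H]acetyl incorporation by test compounds."""


def percent_inhibition(cpm_compound, cpm_vehicle, cpm_background):
    """Background-corrected percent reduction in incorporated radioactivity.

    cpm_background: counts from a reaction without enzyme (assumed blank).
    """
    window = cpm_vehicle - cpm_background
    if window <= 0:
        raise ValueError("no enzyme-dependent signal above background")
    return 100.0 * (1.0 - (cpm_compound - cpm_background) / window)


def flag_hits(compound_cpm, cpm_vehicle, cpm_background, threshold=50.0):
    """Sorted names of compounds whose inhibition is at least threshold percent."""
    return sorted(
        name for name, cpm in compound_cpm.items()
        if percent_inhibition(cpm, cpm_vehicle, cpm_background) >= threshold
    )
```

In a high-throughput format, the same calculation would be applied per well against on-plate vehicle and blank controls.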
  • the above aspect of the invention includes screening methods to identify drugs or lead compounds of use in treating the disease or condition caused by the bacterium (e.g. tuberculosis). It is appreciated that screening assays which are capable of high throughput operation are particularly preferred.
  • the compound may be a drug-like compound or a lead compound for the development of a drug-like compound.
  • the term "drug-like compound" is well known to those skilled in the art, and may include the meaning of a compound that has characteristics that may make it suitable for use in medicine, for example as the active ingredient in a medicament.
  • a drug-like compound may be a molecule that may be synthesised by the techniques of organic chemistry, less preferably by techniques of molecular biology or biochemistry, and is preferably a small molecule, which may be of less than 5000 daltons and which may be water-soluble.
  • a drug-like compound may additionally exhibit features of selective interaction with a particular protein or proteins and be bioavailable and/or able to penetrate target cellular membranes or the blood:brain barrier, but it will be appreciated that these features are not essential.
  • the term "lead compound" is similarly well known to those skilled in the art, and may include the meaning that the compound, whilst not itself suitable for use as a drug (for example because it is only weakly potent against its intended target, nonselective in its action, unstable, poorly soluble, difficult to synthesise or has poor bioavailability), may provide a starting point for the design of other compounds that may have more desirable characteristics.
  • the method further comprises modifying the test compound, and testing the modified compound for the ability to inhibit the pNAT in vitro or in culture.
  • the method may further comprise determining whether the test compound or the modified compound has the ability to inhibit the pNAT in the bacterium, such as M. tuberculosis, in vivo.
  • the method may comprise determining whether the test compound or the modified compound has the ability to inhibit the pNAT in an in vivo model of the disease or condition caused by the bacterium.
  • a suitable experimental model of tuberculosis is the experimental infection of a mouse with M. tuberculosis administered intravenously, intranasally or by aerosol. Bacterial numbers are measured in the lung, liver and spleen. Details vary according to the route of administration, the mouse strain and the bacterial dose. Flynn (2006) provides a review of this and other suitable animal models.
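The source does not say how bacterial numbers are determined. A common approach, assumed here, is to plate serial dilutions of organ homogenates and count colony-forming units (CFU). The arithmetic can then be sketched as follows. The helper names, volumes and counts are hypothetical, and a real study would average over animals and plates.

```python
import math


def cfu_per_organ(colonies, dilution_factor, plated_ml, homogenate_ml):
    """Scale a colony count on one plate back to total CFU in the organ homogenate."""
    return colonies * dilution_factor * homogenate_ml / plated_ml


def log10_reduction(untreated_cfu, treated_cfu):
    """Difference in log10 CFU between untreated and treated animals."""
    return math.log10(untreated_cfu) - math.log10(treated_cfu)


# Hypothetical lung counts: 0.1 ml plated from a 1 ml homogenate.
untreated = cfu_per_organ(colonies=50, dilution_factor=1e3, plated_ml=0.1, homogenate_ml=1.0)
treated = cfu_per_organ(colonies=25, dilution_factor=1e1, plated_ml=0.1, homogenate_ml=1.0)
print(round(log10_reduction(untreated, treated), 2))  # 2.3
```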
  • the method may also comprise the step of formulating a compound which has the ability to inhibit the pNAT into a pharmaceutically acceptable composition.
  • a pharmaceutical composition comprising a compound which has the ability to inhibit a bacterial pNAT that has been identified as described above, and a pharmaceutically acceptable carrier, diluent or excipient.
  • whilst it is possible for a compound to be administered alone, it is preferable to present it as a pharmaceutical formulation, together with one or more acceptable carriers.
  • the carrier(s) must be "acceptable" in the sense of being compatible with the compound of the invention and not deleterious to the recipients thereof.
  • typically, the carriers will be water or saline, which will be sterile and pyrogen-free.
  • the aforementioned compounds or a formulation thereof may be administered by any conventional method including oral, which is preferred, as well as parenteral (eg subcutaneous or intramuscular) injection.
  • the treatment may consist of a single dose or a plurality of doses over a period of time.
  • the invention further provides a method of treating an individual suffering from tuberculosis, the method comprising administering to the individual an appropriate quantity of a pNAT inhibitor compound identified as described above.
  • this embodiment of the invention provides the use of the M. tuberculosis pNAT inhibitor compound in the manufacture of a medicament for preventing or treating tuberculosis.
  • where the bacterium is other than M. tuberculosis, references to models of tuberculosis and to treating tuberculosis are to be construed as referring to the disease or condition caused by that bacterium, as known in the art and discussed above.
  • FIG. 1 This figure illustrates the general strategy for identifying alternative translation start positions and tryptic digest peptides, as applied to GlpX (Rv1099c).
  • the six alternative translation starts for GIpX are indicated by the labelled arrows.
  • the m(inus)1, m1, m3 and m4 starts are located upstream of the original (p0) translation start indicated in the M. tuberculosis genome annotation.
  • the p(lus)1 and p2 starts are downstream of the original translation start prediction.
  • the putative trypsin cleavage sites are indicated by the 'T'-labelled double-tailed bars.
  • the resulting tryptic peptides are shown by the horizontal boxes. Tryptic fragments identified by MS/MS are shown in white, those not detected are shown in black.
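Figure 1 works from in-frame alternative start codons located upstream and downstream of the annotated (p0) start. Below is a minimal sketch of that enumeration. It assumes an unambiguous coding-strand sequence and treats ATG, GTG and TTG as possible initiation codons. The first codon of each candidate is read as Met, because an initiating GTG or TTG is decoded as (formyl)methionine. The upstream search stops at the first in-frame stop codon, since a start beyond it could not belong to the same open reading frame. Function names, the window size and the toy sequence are hypothetical. Offsets are given in codons and do not correspond to the m/p labels used in the figure.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}


def alternative_starts(seq, annotated, max_downstream_codons=30):
    """(codon offset, 0-based position, codon) for in-frame starts around `annotated`."""
    seq = seq.upper()
    hits = []
    pos = annotated - 3
    while pos >= 0:  # upstream: stop at the first in-frame stop codon
        codon = seq[pos:pos + 3]
        if codon in STOPS:
            break
        if codon in STARTS:
            hits.append(((pos - annotated) // 3, pos, codon))
        pos -= 3
    for k in range(max_downstream_codons + 1):  # annotated start and downstream
        codon = seq[annotated + 3 * k:annotated + 3 * k + 3]
        if len(codon) < 3 or codon in STOPS:
            break
        if codon in STARTS:
            hits.append((k, annotated + 3 * k, codon))
    return sorted(hits)


def translate_from(seq, start):
    """Translate to the first stop codon; the initiation codon is read as Met."""
    seq = seq.upper()
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append("M" if pos == start else aa)
    return "".join(protein)


# Hypothetical 24-nt region; annotated (p0) start ATG at position 9.
region = "TAAGTGAAAATGGCTTTGCGTTGA"
for offset, pos, codon in alternative_starts(region, annotated=9):
    print(offset, codon, translate_from(region, pos))
```

Each candidate protein would then be digested in silico to predict its N-terminal tryptic peptide, which is the peptide the mass spectra are searched for.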
  • the N-terminal region of the M. tuberculosis GlpX sequence (SEQ ID No: 4) was aligned with the N-terminal region of GlpX homologues from six other actinomycetes (Streptomyces coelicolor, SEQ ID No: 5; Corynebacterium diphtheriae, SEQ ID No: 6; Corynebacterium glutamicum, SEQ ID No: 7; Mycobacterium smegmatis, SEQ ID No: 8; M. avium, SEQ ID No: 9; M.
  • E. coli: Esco
  • Streptomyces coelicolor: Sco
  • Corynebacterium diphtheriae: Cdi
  • Corynebacterium glutamicum: Cgl
  • Mycobacterium smegmatis: Msm
  • Mycobacterium avium: Mav
  • Mycobacterium leprae: Mlp
  • M. tuberculosis: Mtb
  • TAEGSGSSTAAVASHDPSHTRPSR (2366.1; MS/MS: y1, y4, y8, y9, y21).
  • FIG. 3 Translation start site reassignment for ribH (Rv1416).
  • for ribH (Rv1416), there is a 14 bp gap between the stop codon of ribA2 and the start codon (p0) predicted by the M. tuberculosis annotation. No peak corresponding to this start was seen; however, peaks corresponding to the m1 start variant (dashed box) were identified.
  • the corrected TSS overlaps the upstream ribA2 (Rv1415) gene by four base pairs, an arrangement often observed in prokaryotic genomes for functionally related genes in a common operon.
  • SEQ ID No: 14 is the C-terminal sequence of ribA2 shown in Figure 3
  • SEQ ID No: 15 is the N-terminal sequence of ribH shown in Figure 3
  • SEQ ID No: 16 is the nucleotide sequence shown in Figure 3 spanning the ribA2 and ribH genes.
  • Figure 4. Amino acid sequence of GlpX from M. tuberculosis strain H37Rv taken from GenBank Accession No. NP_215615 (SEQ ID No: 1).
  • FIG. 7 Sequence alignment of the catalytic domains of three Saccharomyces cerevisiae N-terminal acetyl transferases NatA (SEQ ID No: 17), NatB (SEQ ID No: 18) and NatC (SEQ ID No: 19).
  • FIG. 8 Sequence alignment of the catalytic domain of S. cerevisiae NatC (SEQ ID No: 19) and the M. tuberculosis protein RimI (SEQ ID No: 20).
  • FIG. 9 Sequence alignment of the catalytic domain of S. cerevisiae NatC (SEQ ID No: 19) and the M. tuberculosis protein Rv0133 (SEQ ID No: 21).
  • FIG. 10 Sequence alignment of the catalytic domain of S. cerevisiae NatC (SEQ ID No: 19) and the M. tuberculosis protein Rv3225c (SEQ ID No: 22).
  • Example 1 Rapid experimental determination of translational starts using peptide mass mapping and tandem mass spectrometry within the proteome of Mycobacterium tuberculosis
  • Determining the translation start site (TSS) accurately is important, not only because it defines the amino acid sequence of the protein (the stop codon being unambiguous), but also because it defines the upstream region in the DNA. Genome-wide and focused studies of promoter structure and regulatory motifs depend on the intergenic regions defined by the gene-finding process (Edwards et al, 2005; Salgado et al, 2000). The TSS assignment therefore affects analysis of both protein function and transcriptional regulation.
  • TSS identification has been achieved using N-terminal sequencing by Edman degradation (Edman, 1950). This is often technically demanding and requires large (pmol) quantities of protein. Furthermore, some proteins are blocked through N-terminal modifications and cannot be sequenced. Proteomic approaches using sensitive mass spectrometry methods (Jungblut et al, 1999; Jungblut et al, 2000) have revolutionised protein analysis because of the speed and sensitivity of the technology, but are not used to identify TSSs.
  • The M. tuberculosis protein GlpX (Rv1099c) was added to the list because, although it does not meet all our selection criteria, we had strong previous evidence of incorrect translational start prediction for it (Movahedzadeh et al, 2004). From these proteins, the 15 with the best available spectral data were selected for further investigation (Table 3). All these proteins were tested for the presence of a signal peptide using the online SignalP 3.0 resource available at http://www.cbs.dtu.dk/services/SignalP/ (Bendtsen et al, 2004). No signal peptide was detected in any of them.
  • Mass spectra were recorded in reflectron mode with delayed extraction on; 256 laser shots constituted one spectrum. Reanalysis for MS/MS confirmation of the N-termini was performed with a 4700 Proteomics Analyzer (Applied Biosystems, Foster City, CA, USA).
  • the gel pieces were destained in 500 µl of destaining buffer (200 mM NH4HCO3, 50% acetonitrile (ACN)) and equilibrated in 500 µl digestion buffer (50 mM NH4HCO3, 5% ACN). The supernatant was removed and the gel piece was dried within 30 min in a Micro concentrator 5301 (Eppendorf) at 30 °C and then digested in 25 µl digestion buffer containing 0.1 µg trypsin (Sequencing grade modified Trypsin, Promega, WI, USA). After digestion overnight at 37 °C, the reaction tube was centrifuged and the supernatant (S1) was transferred into another tube.
  • the dried peptides were dissolved in 1 µl of 33% ACN and 0.1% TFA, and 0.25 µl of this solution was mixed with 0.5 µl of alpha-cyano-4-hydroxycinnamic acid (CHCA) solubilised in 50% ACN, 0.3% TFA on parafilm; the resulting mixture was transferred to the template of the mass spectrometer and analysed.
  • the peptide masses were obtained using the following parameters: reflectron mode, 20 kV accelerating voltage, a low mass gate of 500 Da and a mass range between 500 and 4000 Da.
  • MS/MS spectra were obtained with and without collision gas.
  • the database searches were performed with Mascot (http://www.matrixscience.com). Search parameters were: 30 ppm peptide mass tolerance for peptide mass fingerprints and 0.3 Da for MS/MS spectra. MS/MS data were compared with the theoretical sequence data of the protein under investigation.
  • Search criteria were: one and two missed cleavages allowed; possible oxidation of methionine, N-terminal acetylation of the protein, pyroGlu formation from N-terminal Gln, and propionamide and sodium adducts. MS/MS spectra were manually evaluated.
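The search settings above can be approximated in a few lines: trypsin with up to two missed cleavages, the listed variable modifications, and a 30 ppm peptide tolerance. This is a simplified sketch and does not reproduce Mascot's algorithm. It assumes singly protonated [M+H]+ ions as in MALDI, applies propionamide to Cys, and uses the common "cleave after K/R except before P" trypsin rule. All function names are hypothetical. For the peptide TAEGSGSSTAAVASHDPSHTRPSR, the unmodified [M+H]+ computes to about 2366.108, in line with the 2366.1 listed for this peptide.

```python
from itertools import product

# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER, PROTON = 18.010565, 1.007276
OXIDATION = 15.994915        # Met -> Met sulfoxide
ACETYL = 42.010565           # N-terminal acetylation
PYROGLU_FROM_Q = -17.026549  # loss of NH3 from N-terminal Gln
PROPIONAMIDE = 71.037114     # acrylamide adduct on Cys
NA_FOR_H = 21.981943         # Na+ replacing H+


def tryptic_peptides(protein, missed=2):
    """(starts_at_protein_n_term, peptide) pairs with up to `missed` missed cleavages."""
    frags, current = [], ""
    for i, aa in enumerate(protein):
        current += aa
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            frags.append(current)
            current = ""
    if current:
        frags.append(current)
    peptides = []
    for i in range(len(frags)):
        for j in range(i, min(i + missed + 1, len(frags))):
            peptides.append((i == 0, "".join(frags[i:j + 1])))
    return peptides


def mh(peptide):
    """Singly protonated monoisotopic mass [M+H]+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def variant_masses(peptide, protein_n_term):
    """[M+H]+ for every combination of the variable modifications."""
    base = mh(peptide)
    acetyl_opts = (0, 1) if protein_n_term else (0,)
    pyro_opts = (0, 1) if peptide.startswith("Q") else (0,)
    variants = []
    for ox, prop, ac, pyro, na in product(
        range(peptide.count("M") + 1), range(peptide.count("C") + 1),
        acetyl_opts, pyro_opts, (0, 1),
    ):
        if ac and pyro:
            continue  # both modifications would consume the same N-terminal amine
        mass = (base + ox * OXIDATION + prop * PROPIONAMIDE + ac * ACETYL
                + pyro * PYROGLU_FROM_Q + na * NA_FOR_H)
        variants.append((f"ox{ox} prop{prop} ac{ac} pyro{pyro} na{na}", mass))
    return variants


def matches(observed, peptide, protein_n_term, tol_ppm=30.0):
    """Variants whose theoretical mass lies within tol_ppm of the observed mass."""
    hits = []
    for label, theo in variant_masses(peptide, protein_n_term):
        ppm = (observed - theo) / theo * 1e6
        if abs(ppm) <= tol_ppm:
            hits.append((label, round(ppm, 1)))
    return hits


print(round(mh("TAEGSGSSTAAVASHDPSHTRPSR"), 3))
print(matches(2366.1, "TAEGSGSSTAAVASHDPSHTRPSR", protein_n_term=True))
```

An acetylated form of the same N-terminal peptide would be expected about 42.011 Da higher. Whether the acetyl group sits on the N-terminal residue still has to be settled by MS/MS.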
  • Table 3 shows the data we obtained. We initially analysed existing data, which were adequate in some instances but not in others, requiring repeat MALDI-MS analyses. Of the 15, we had difficulties with two. In one case (Rv1017c; PrsA), the predicted N-terminal peptide coincided with a common contamination peak. Another (Rv2557) came from a spot containing more than one protein and was therefore unusable for the present investigation. The other 13 proteins were successfully analysed, and are briefly presented below.
  • the fMet is cleaved, and in addition to an unmodified protein species there is also a less abundant spot with an acetylated N-terminal peptide ( Figure 2).
  • the methionine was encoded by GTG, which would normally code for valine, further confirming that this was the translational start. This sequence was confirmed by MS/MS.
  • the GGA... peptide was confirmed by 8 y ions (1,2,3,5,6,9,10, 12) and 3 b ions (4,5,8) and the MKGGA... peptide by 8 y ions (1,3,4,5,6,9,10,12).
  • the expected ions y5 and y10, both arising from cleavage after D, and the mass loss of 64 further confirm the sequence.
  • This new ribH start would overlap with the ribA2 stop codon (GTGA; Figure 3) at position -4.
  • Such an arrangement is known to be the most common for prokaryotic genes thought to be part of the same operon (Salgado et al, 2000), as is the case with ribA2 and ribH.
  • proteomic methods can be used to identify protein TSSs.
  • for ribH (Rv1416), the alteration is minor but makes biological sense, and indicates that a predicted ribosome-binding site is probably non-functional.
  • while the correction of mis-assigned start sites underlines the need to carry out this work, confirmation of current predictions is also an important contribution.
  • because proteins are mainly identified through homology, it is critical that at least a small number of representatives have confirmed TSSs, in order to anchor the remaining predictions within an experimental context.
  • ATG is the classical initiation codon, but in GC-rich genomes such as that of M. tuberculosis, GTG is also utilised.
  • Our method conclusively confirms the starts where a methionine residue is seen with a GTG or TTG codon, as this is the only scenario where such a substitution would take place.
  • Met residues were detected with both RibH and Rv0738, even though they start with GTG (valine) codons.
  • One benefit of proteomic analyses is the ability to identify post-translational modifications. A common modification is removal of fMet, and we observed this for 11 of the 13 proteins. In the two cases where there was no cleavage, the residue had been deformylated. In some cases, methionine residues (both N-terminal and internal) had been oxidised (Schmidt et al, 2006); we cannot tell if this occurred intracellularly or during the experimental procedure.
  • N-terminal peptides from three proteins were at least partially acetylated. MS-MS sequencing showed the acetyl group to be on the N-terminal residue.
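Fragment-ion arithmetic explains why MS/MS can place the acetyl group on the N-terminal residue. Every b ion contains residue 1 and so carries the +42.0106 Da acetyl shift, whereas y1 to y(n-1) do not contain it and are unchanged. The minimal sketch below computes singly charged b and y ladders, assuming monoisotopic masses and no other modifications. The function name is hypothetical, and the peptide is the one listed with Figure 2.

```python
# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER, PROTON, ACETYL = 18.010565, 1.007276, 42.010565


def fragment_ladders(peptide, n_acetyl=False):
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values."""
    residues = [MONO[aa] for aa in peptide]
    n_term = ACETYL if n_acetyl else 0.0
    b_ions = [n_term + sum(residues[:i]) + PROTON for i in range(1, len(residues))]
    y_ions = [sum(residues[-i:]) + WATER + PROTON for i in range(1, len(residues))]
    return b_ions, y_ions


pep = "TAEGSGSSTAAVASHDPSHTRPSR"
b_plain, y_plain = fragment_ladders(pep)
b_ac, y_ac = fragment_ladders(pep, n_acetyl=True)
print(round(b_ac[0] - b_plain[0], 4))  # 42.0106: every b ion shifts
print(y_ac == y_plain)                 # True: y ions are unaffected
```

In practice, observing shifted b ions alongside unshifted y ions, together with the intact-mass difference, is what assigns the acetyl group to residue 1 rather than to an internal Lys.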
  • Although N-terminal acetylation is the norm in eukaryotic proteins, it is reported to be rare in prokaryotes (Polevoda & Sherman, 2002). N-terminal acetylation is amino-acid dependent (Persson et al, 1985), and our observations of acetylated serine or threonine residues are in line with previous work. N-terminal acetylation may alter the function of a protein. The only M. tuberculosis protein that has been reported to be N-terminally acetylated is the antigen ESAT-6, which normally interacts with the protein CFP-10 but fails to do so when acetylated (Okkels et al., 2004).
  • CP cellular proteins
  • CSN culture supernatant
  • Curved brackets indicate a missing residue, presumed to be a cleaved fMet, and the amino acid predicted from the DNA sequence
  • Non-acetylated peptide also found, mass 2366.1.
  • Spot from M. bovis BCG Chicago.
  • Non-acetylated peptide also found, mass 1113.52.
  • Non-acetylated peptide also found, mass 1588.74.
  • the first sequence for each gene corresponds to the sequence of the detected N-terminal peptide
  • the second sequence corresponds to the N-terminal peptide sequence predicted from the nucleotide sequence.
  • Example 2A Identification of M. tuberculosis protein N-terminal acetylases
  • Protein N-terminal acetylation in Saccharomyces cerevisiae occurs by any of three protein N-terminal acetyl transferases: NatA, NatB or NatC (Polevoda & Sherman 2003a, b). These enzymes have different specificities: NatA tends to acetylate proteins starting with the amino acids Ser, Ala, Gly or Thr; NatB acetylates proteins starting with Met-Glu, Met-Asp, Met-Asn or Met-Met; and NatC functions with proteins beginning Met-Ile, Met-Leu, Met-Trp or Met-Phe.
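The yeast specificity rules just listed can be written as a simple lookup on the mature N-terminus, meaning the sequence as it exists after any removal of the initiator Met. This sketch encodes the yeast rules only as they are summarised here. The text does not establish that any mycobacterial pNAT follows the same rules; treating it that way would be an assumption. The function name is hypothetical.

```python
# Yeast NAT substrate classes, as summarised in the text.
NATA_FIRST = set("SAGT")   # Ser, Ala, Gly, Thr (initiator Met already removed)
NATB_SECOND = set("EDNM")  # Met-Glu, Met-Asp, Met-Asn, Met-Met
NATC_SECOND = set("ILWF")  # Met-Ile, Met-Leu, Met-Trp, Met-Phe


def yeast_nat_class(mature_n_terminus: str):
    """Return 'NatA', 'NatB', 'NatC' or None for a mature protein N-terminus."""
    s = mature_n_terminus.upper()
    if len(s) >= 2 and s[0] == "M":
        if s[1] in NATB_SECOND:
            return "NatB"
        if s[1] in NATC_SECOND:
            return "NatC"
        return None
    if s and s[0] in NATA_FIRST:
        return "NatA"
    return None


for n_term in ("SAEK", "MELK", "MLKT", "MKGG"):
    print(n_term, yeast_nat_class(n_term))
```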
  • the yeast enzymes each have a catalytic subunit and 1-2 auxiliary subunits.
  • the three catalytic subunits (which are 176-195 residues long, with predicted molecular weights of 19.7-22.9 kDa) are relatively dissimilar, but are clearly related. They show homology along the length of the proteins, as shown in Figure 7.
  • Sc_NatC_cat matches M. tuberculosis RimI (E = 9e-05), Rv0133 (E = 0.008) and Rv3225c (E = 0.0932) (see Figures 8-10, respectively).
  • RimI is a 158-residue protein whose sequence is shown in Figure 8 (SEQ ID No: 20). The rimI gene encodes a probable ribosomal-protein-alanine acetyltransferase that acetylates ribosomal protein S18.
  • rimI appears to lie within an operon of six genes, which includes an essential peptidoglycan biosynthesis gene, alr, encoding alanine racemase.
  • the other gene with a predicted function is gcp, which encodes a putative O-sialoglycoprotein endopeptidase.
  • Rv0133 is a 201-residue protein whose sequence is shown in Figure 9 (SEQ ID No: 21).
  • Rv0133 codes for a 22.8 kDa protein that shows low but significant homology with Sc_NatC_cat. This type of scattered homology results in poor matches using BLAST (which looks for clustered local alignments), but gives confidence that the proteins are genuinely related.
  • Sc_NatC_cat is more closely related to Rv0133 than are the other yeast proteins.
  • the Rv0133 gene is not obviously in an operon, so adjacent genes cannot give clear indications of function.
  • The TubercuList annotation of Rv0133 indicates that it is "a probable acetyl transferase (EC 2.3.1.-), highly similar to others, e.g. PUAC_STRLP".
  • Rv3225c is a 474-residue protein whose sequence is shown in Figure 10 (SEQ ID No: 22). Rv3225c encodes a larger protein (51.2 kDa) that shows a relatively high level of homology to Sc_NatC_cat in the N-terminal half of the protein. Rv3225c does not appear to lie in an operon.
  • Example 2B Identification of M. tuberculosis protein N-terminal acetylases
  • the following in vitro N-acetylation assay (adapted from Sugiura et al 2003 and Arnesen et al 2005) is used to determine their protein NAT activity.
  • the purified proteins are incubated with [3H]acetyl-CoA and an acetyl group acceptor such as adrenocorticotropic hormone peptide, or the M. tuberculosis GlpX protein or its N-terminal peptide.
  • the acetylated peptide/protein is separated from the radioactive substrate by cation-exchange chromatography, affinity chromatography or size-exclusion chromatography.
  • the radioactive count is taken.
  • the presence and level of the radioactivity are indicative of the level of N-acetyl transferase activity.
  • alternatively, this assay can be carried out using unlabelled acetyl-CoA, in which case the location of the acetyl group can be confirmed by mass spectrometry.
  • Arylamine N-acetyltransferase is required for synthesis of mycolic acids and complex lipids in Mycobacterium bovis BCG and represents a novel drug target. J Exp Med 199, 1191-1199.
  • The Mycobacterium tuberculosis Rv1099c gene encodes a GlpX-like class II fructose 1,6-bisphosphatase. Microbiology 150, 3499-3505.
  • Okkels, L. M., Muller, E. C., Schmid, M., Rosenkrands, I., Kaufmann, S. H., Andersen, P. & Jungblut, P. R. (2004). CFP10 discriminates between nonacetylated and acetylated ESAT-6 of Mycobacterium tuberculosis by differential interaction. Proteomics 4, 2954-2960.


Abstract

A method of identifying an N-terminally acetylated protein in a bacterium, the method comprising providing details of at least one putative translation start site (TSS) for at least one protein expressed in the bacterium, confirming the actual TSS of the at least one protein using mass spectrometry (MS), and determining whether the at least one protein is N-terminally acetylated using MS. A method of identifying an N-terminally acetylated protein in a bacterium, the method comprising providing a mutant strain of the bacterium comprising at least one protein N-acetyl transferase (pNAT) in mutant form, providing a wild-type strain of the bacterium comprising the at least one pNAT in wild-type form, and identifying a protein that is differentially N-terminally acetylated between the mutant and wild-type bacterial strains. A method of identifying a drug discovery target in a pathogenic bacterium, the method comprising determining at least one property of an N-terminally acetylated protein that is expressed in a pathogenic bacterium and is relevant to the pathogenicity of the pathogenic bacterium. A method of identifying a bacterial pNAT. A method of screening for an inhibitor of a bacterial pNAT.

Description

ACETYLATION
This invention relates to acetylation, and in particular to N-terminal acetylation of proteins in bacteria.
The listing or discussion of a prior-published document in this specification should not necessarily be taken as an acknowledgement that the document is part of the state of the art or is common general knowledge.
Proteomic approaches have been used for protein analysis in recent years, and mass spectrometric methods, such as MALDI-MS, allow the identification both of proteins and post-translational modifications. Modification of proteins extends the range of possible molecular structures beyond the limits imposed by the 20 encoded amino acids and, if reversible, gives a means of control and signaling.
Over 200 types of covalent modification of proteins have been reported and, at least for eukaryotic proteins, acetylation is the most common. Amino (N) terminal acetylation occurs on approximately 50% of yeast proteins and 80-90% of proteins in higher eukaryotes (Polevoda & Sherman 2002), and affects many protein functions including enzymatic activity, stability, DNA binding, protein-protein interaction and peptide-receptor recognition, and occurs on numerous and diverse proteins. However, acetylation is believed to be very rare in prokaryotic proteins (Polevoda & Sherman 2003b; Polevoda & Sherman, 2006) because the data, mainly from E. coli, have shown very few bacterial proteins to be N-terminally acetylated (Walker 1963).
Tuberculosis, caused by Mycobacterium tuberculosis (M. tuberculosis), has been declared a global emergency and is the most frequent infectious cause of mortality in the world. The emergence of many strains of M. tuberculosis resistant to the currently available chemotherapeutics, particularly isoniazid (INH), is an additional cause of alarm. There is thus a desperate need for additional drugs against tuberculosis, and hence for the identification of new targets for drug discovery in M. tuberculosis. M. tuberculosis is likely to contain many N-acetyl transferases (NATs). For example, an arginine N-acetylating enzyme has been described, although there is no evidence that this particular NAT is involved in protein acetylation (Errey & Blanchard, 2005). It has also been shown that heterologous expression of an M. tuberculosis arylamine N-acetyl transferase (referred to herein as aNAT) in M. smegmatis results in a concurrent increase in resistance to INH, and that disruption of the NAT gene delayed growth of the knockout strain compared with the wild type (Bhakta et al, 2004). Although the endogenous function of this aNAT is still uncertain, it is believed to have a significant role in xenobiotic metabolism and to be responsible for inactivation of the anti-tubercular drug isoniazid. Some studies have also suggested that N-acetyl transferases have a role in cancer development (Hein et al, 2000).
A polyclonal antiserum against a recombinant arylamine N-acetyl transferase (aNAT) protein has been developed, and aNATs have been suggested as drug targets through the use of specific aNAT substrates and inhibitors (Payton et al, 2001). In addition, high-throughput screening has identified novel substrates for bacterial aNAT, suggesting an endogenous role for aNAT in protecting bacteria from aromatic and lipophilic toxins (Brooke et al, 2003).
One M. tuberculosis protein has previously been shown to be N-terminally acetylated. In the virulence factor ESAT-6, acetylation is required for dimerisation, indicating that, in this example at least, acetylation is critical for function (Okkels et al, 2004). Nevertheless, the general expectation in the art is that bacterial proteins are generally not N-terminally acetylated.
As part of a study to determine the actual translational start sites of M. tuberculosis proteins predicted by bioinformatics analyses, we adopted a proteomic approach to identify N-terminal peptides of proteins from M. tuberculosis. Surprisingly and unexpectedly, we found that 3 out of 13 (23%) of these proteins had been acetylated. Since these proteins were chosen in a way that does not select for acetylated proteins, we now consider that bacteria other than E. coli have a much higher degree of acetylation than was previously accepted, potentially regulating protein activity in these bacteria. This has significant implications for identifying acetylated proteins as potential targets for drug discovery, and for potential therapeutic treatment via use of inhibitors of protein N-acetyl transferases.
Indeed, prior to the present invention, there was no reason to suspect that N-terminal acetylation of proteins might be present to a significant extent, or have a major role, in prokaryotes. Thus there was no motivation for a skilled person even to attempt to identify N-terminally acetylated proteins in bacteria or to assess their suitability as drug discovery targets. Furthermore, prior to the present invention there was no reason to identify protein N-acetyl transferase enzymes in bacteria, nor any motivation to test inhibitors of protein N-acetyl transferases as potential therapeutic agents.
Without wishing to be bound by theory, we consider that this finding of higher levels of N-terminal protein acetylation is applicable to bacteria other than M. tuberculosis. The general expectation in the art that bacterial proteins are not N-terminally acetylated was derived from data in the Gram-negative bacterium E. coli (Walker 1963). By contrast, M. tuberculosis is a Gram-positive bacterium in the genus Mycobacterium, class Actinobacteria, order Actinomycetales and family Mycobacteriaceae. Thus we now consider that many bacteria, particularly Gram-positive bacteria, contain N-acetylated proteins, with E. coli being unusual. Furthermore, without wishing to be bound by theory, as N-terminal acetylation of proteins is common in eukaryotes, the presence of this activity in bacteria suggests that this process originally evolved in bacteria and was then lost in species such as E. coli. It is axiomatic that the more closely related an organism is to M. tuberculosis, the more likely it is to have similar properties. Accordingly, we consider that other high-GC Gram-positive bacteria, especially other actinomycetes, and in particular other mycobacteria, are likely to share widespread N-terminal acetylation of proteins.
A first aspect of the invention thus provides a method of identifying an N-terminally acetylated protein in a bacterium, the method comprising: providing details of at least one putative translation start site (TSS) for at least one protein expressed in the bacterium; confirming the actual TSS of the at least one protein using mass spectrometry (MS); and determining, using MS, whether the at least one protein is N-terminally acetylated.
This method is described in Example 1 for a representative sample of 13 proteins from M. tuberculosis in which 3 out of 13 (23%) of these proteins were found to be N-terminally acetylated.
It is appreciated that by identifying an N-terminally acetylated protein which is expressed in a bacterium, we do not mean that the protein is translated in its acetylated form. As is well known in the art, N-terminal acetylation is a post-translational modification.
Preferably, the MS methods used are matrix-assisted laser desorption/ionisation (MALDI) MS (Karas & Hillenkamp, 1988) or electrospray MS (Fenn et al, 1989). Other suitable MS techniques include tandem mass spectrometry (MS/MS; Medzihradszky et al, 2000). Suitable MS methods are reviewed by Aebersold & Mann (2003) and Domon & Aebersold (2006).
Conveniently, MALDI-MS is used for confirming the actual TSS of the at least one protein.
Preferably, MS/MS is used for determining whether the actual TSS of the at least one protein is N-terminally acetylated.
In one embodiment, different MS methods are used for confirming the actual TSS of the at least one protein, and for determining whether the actual TSS of the at least one protein is N-terminally acetylated. Typically, MALDI-MS is used for confirming the actual TSS of the at least one protein, and MS/MS is used for determining whether the at least one protein is N-terminally acetylated. This aspect of the invention thus includes the use of mass spectrometry (MS) in the identification of an N-terminally acetylated protein expressed in a bacterium.
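In the first aspect, each candidate start predicts its own N-terminal peptide, and observed peptide masses are matched against those predictions. The initiator Met may be retained or removed, and the peptide may or may not be acetylated. Below is a minimal sketch under stated assumptions: the trypsin rule "after K/R, not before P", singly protonated monoisotopic masses, and the 30 ppm window used in Example 1. The candidate labels and the residues surrounding TAEGSGSSTAAVASHDPSHTRPSR are invented for illustration, and the helper names are hypothetical. A mass match alone does not locate the acetyl group, which still requires MS/MS.

```python
# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER, PROTON, ACETYL = 18.010565, 1.007276, 42.010565


def mh(peptide):
    """Singly protonated monoisotopic mass [M+H]+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def n_terminal_peptide(protein):
    """Protein N-terminal tryptic peptide (cut after K/R unless followed by P)."""
    for i, aa in enumerate(protein):
        if aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            return protein[: i + 1]
    return protein


def candidate_forms(starts):
    """N-terminal peptide forms per candidate start: Met kept/removed, +/- acetyl."""
    forms = []
    for label, protein in starts.items():
        for met_removed in (False, True):
            peptide = n_terminal_peptide(protein[1:] if met_removed else protein)
            for acetylated in (False, True):
                mass = mh(peptide) + (ACETYL if acetylated else 0.0)
                forms.append((label, met_removed, acetylated, peptide, mass))
    return forms


def assign(observed_mh, starts, tol_ppm=30.0):
    """Observed masses explained by a candidate N-terminal form within tol_ppm."""
    hits = []
    for obs in observed_mh:
        for label, met_removed, acetylated, peptide, theo in candidate_forms(starts):
            if abs(obs - theo) / theo * 1e6 <= tol_ppm:
                hits.append((obs, label, met_removed, acetylated, peptide))
    return hits


# Hypothetical candidate proteins translated from two alternative starts.
starts = {
    "p0": "MTAEGSGSSTAAVASHDPSHTRPSRGLK",
    "m1": "MKDTRMTAEGSGSSTAAVASHDPSHTRPSRGLK",
}
for hit in assign([2366.1], starts):
    print(hit)
```

Here the listed mass 2366.1 is explained only by the "p0" start with the initiator Met removed and no acetyl group. A peak about 42.011 Da higher would point to the acetylated form of the same start.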
It is preferred if the protein is not a ribosomal protein.
It is also preferred if the protein is one that does not contain an N-terminal signal sequence.
It is appreciated that there are additional methods that can be used for identifying an N-terminally acetylated protein in a bacterium other than those described in the first aspect of the invention. For example, N-acetyl transferase (NAT) mutant bacteria can be employed to identify N-terminally acetylated proteins.
A second aspect of the invention thus provides a method of identifying an N-terminally acetylated protein in a bacterium, the method comprising: providing a mutant strain of the bacterium comprising at least one protein N-acetyl transferase (pNAT) in mutant form; providing a wild-type strain of the bacterium comprising the at least one pNAT in wild-type form; and identifying a protein that is differentially N-terminally acetylated between the mutant and wild-type bacterial strains.
In other words, this aspect of the invention includes the use of protein N-acetyl transferase (pNAT) mutant bacteria in the identification of N-terminally acetylated proteins.
It is preferred if the at least one mutant pNAT in the mutant bacterial strain is nonfunctional. Typically, the at least one mutant pNAT has been deleted ("knocked out") and hence cannot possess any partial or residual activity.
Methods for making pNAT mutant bacterial strains, including knock-out mutants, are well known in the art. For example, Parish & Stoker (2000) describe suitable techniques for cloning and mutagenesis of Mycobacteria, and Sambrook et al (2001) "Molecular Cloning, a Laboratory Manual", 3rd edition, Sambrook et al (eds), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA, describes general techniques for bacterial manipulation that would be useful for this purpose.
In an embodiment, the mutant bacterial strain contains at least two pNATs in mutant form. This guards against functional redundancy between the at least two pNATs.
As described below in Example 2, we consider the M. tuberculosis proteins RimI, Rv0133 and Rv3225c to be pNAT enzymes. Accordingly, in one preferred embodiment of this aspect of the invention, the mutant bacterial strain is M. tuberculosis and the at least one pNAT in mutant form is selected from RimI, Rv0133 and Rv3225c.
Typically, the step of identifying proteins that are differentially N-terminally acetylated between the mutant and wild-type bacterial strains comprises: extracting proteins from the mutant and wild-type bacterial strains and separating the proteins by 2D-PAGE; determining a difference between the 2D-PAGE patterns obtained from the mutant and wild-type bacterial strains; and identifying a protein which corresponds to the difference between the two 2D-PAGE patterns.
Alternatively, liquid chromatography could be used instead of 2D-PAGE (Yates et al., 1999).
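By way of illustration only, the comparison of 2D-PAGE patterns can be reduced to matching spots between the two gels by position and flagging spots that are unmatched or strongly changed in intensity. The sketch below assumes that spot detection has already produced a list of (pI, apparent molecular weight, normalised intensity) values for each gel; the Spot class, the compare_gels function and the tolerance and fold-change values are hypothetical. Because N-terminal acetylation removes the positive charge of the free alpha-amino group, the acetylated and non-acetylated forms of a protein would be expected to give spots of similar molecular weight but different pI, so the comparison is run in both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    """A detected 2D-PAGE spot (hypothetical representation)."""
    pi: float         # isoelectric point (first dimension)
    mw_kda: float     # apparent molecular weight in kDa (second dimension)
    intensity: float  # normalised spot volume


def _closest(spot, gel, pi_tol, mw_tol):
    """Closest spot in `gel` lying within the pI and MW tolerances, or None."""
    candidates = [s for s in gel
                  if abs(s.pi - spot.pi) <= pi_tol and abs(s.mw_kda - spot.mw_kda) <= mw_tol]
    if not candidates:
        return None
    return min(candidates, key=lambda s: abs(s.pi - spot.pi) + abs(s.mw_kda - spot.mw_kda))


def _differences(reference, other, pi_tol, mw_tol, fold):
    """Spots of `reference` that are unmatched in `other` or change by >= `fold`."""
    hits = []
    for spot in reference:
        partner = _closest(spot, other, pi_tol, mw_tol)
        if partner is None or partner.intensity <= 0:
            hits.append((spot, "no matching spot"))
            continue
        ratio = spot.intensity / partner.intensity
        if ratio >= fold or ratio <= 1.0 / fold:
            hits.append((spot, f"intensity ratio {ratio:.2f}"))
    return hits


def compare_gels(wild_type, mutant, pi_tol=0.05, mw_tol=1.0, fold=2.0):
    """Candidate differential spots, reported from each gel's point of view."""
    return {
        "wild_type": _differences(wild_type, mutant, pi_tol, mw_tol, fold),
        "mutant": _differences(mutant, wild_type, pi_tol, mw_tol, fold),
    }


# Hypothetical example: one protein appears at a more basic pI in the mutant.
wt_gel = [Spot(5.10, 35.0, 100.0), Spot(6.20, 50.0, 80.0)]
mutant_gel = [Spot(5.20, 35.0, 100.0), Spot(6.21, 50.2, 78.0)]
for side, hits in compare_gels(wt_gel, mutant_gel).items():
    for spot, reason in hits:
        print(side, spot.pi, spot.mw_kda, reason)
```

In practice, gel-image software would also correct for gel-to-gel distortion before matching; the fixed tolerances here stand in for that step.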
The method will typically further comprise confirming the N-terminal acetylation status of the identified protein from one or both of the mutant and wild-type bacterial strains.
Usually, MS is used to determine the identity of the proteins that correspond to the differences between the two 2D-PAGE patterns, and to confirm the N-terminal acetylation status of the identified protein. Typically, MALDI-MS is used initially to identify the N-terminally acetylated peptide, and MS/MS is used to confirm the location of the acetyl group. It is also possible to determine the N-terminal sequence using Edman degradation, but this is much less preferred, not least because an acetylated N-terminus blocks Edman chemistry.
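The reasoning behind using MS/MS to place the acetyl group can be illustrated numerically. An acetyl group on the N-terminal amine adds 42.0106 Da (monoisotopic) to every b-ion, which contains the N-terminus, but leaves every y-ion, which contains the C-terminus, unchanged. The following minimal sketch computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses for the GlpX N-terminal peptide of Figure 2; the function name and the restriction to singly charged ions without neutral losses are simplifying assumptions.

```python
# Standard monoisotopic residue masses (Da) for the residues in the example peptide.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "D": 115.02694, "E": 129.04259,
    "H": 137.05891, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.007276
ACETYL = 42.010565  # mass added by N-terminal acetylation


def fragment_ions(peptide, n_acetyl=False):
    """Singly charged b- and y-ion m/z values (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE[aa] for aa in peptide]
    n_term = ACETYL if n_acetyl else 0.0
    b_ions = [n_term + sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions, y_ions


peptide = "TAEGSGSSTAAVASHDPSHTRPSR"
b_plain, y_plain = fragment_ions(peptide)
b_acetyl, y_acetyl = fragment_ions(peptide, n_acetyl=True)
print(round(b_acetyl[1] - b_plain[1], 4))  # b2 shifts by the acetyl mass: 42.0106
print(y_acetyl == y_plain)                 # y-ions are unchanged: True
```

A +42 Da shift confined to the b-series therefore points to the N-terminus rather than to a residue further along the chain.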
Preferably, the bacterium used in the first and second aspects of the invention is a Gram-positive bacterium, and more preferably it is a firmicute. In an embodiment, the bacterium is an actinomycete, which may be selected from a Mycobacterium, a Corynebacterium or a Nocardia.
A large number of species are included within the genus Mycobacterium, including M. tuberculosis, M. avium, M. leprae, M. bovis, M. smegmatis, M. paratuberculosis and M. marinum. Other examples of suitable mycobacteria are mentioned by Tsukamura (1983, Microbiol Immunol. 27(4): 315-34, incorporated herein by reference), who describes a numerical classification of 280 strains of slowly growing mycobacteria. M. tuberculosis is most preferred.
It is appreciated that the bacterium may be a pathogenic bacterium and the protein identified by the methods of the first and second aspects of the invention may be a drug discovery target for a disease or condition caused by the pathogenic bacterium.
A third aspect of the invention thus provides a method of identifying a drug discovery target in a pathogenic bacterium, the method comprising identifying an N-terminally acetylated protein expressed in a pathogenic bacterium according to the methods of the first or second aspects of the invention.
In a preferred embodiment, the method further comprises determining at least one property or activity of the protein that is affected by N-terminal acetylation of the protein, and which affected property or activity is relevant to the pathogenicity of the bacterium. Thus this aspect of the invention includes identifying a protein that is N-terminally acetylated in a pathogenic bacterium, and determining the effect of the N-terminal acetylation on the protein in vitro and/or on the bacterium in culture.
Typically, determining the effect of N-terminal acetylation on at least one property or activity of the protein which is relevant to the pathogenicity of the bacterium is carried out by comparing N-terminally acetylated and non-acetylated forms of the protein in vitro.
As is well known by the skilled person, protein N-acetyl transferases (pNATs) in eukaryotes preferentially acetylate proteins starting with particular amino acids, which are serine (Ser) and alanine (Ala), followed by methionine (Met), glycine (Gly) and threonine (Thr). The three N-terminally acetylated proteins that we identified in M. tuberculosis have N-terminal sequences that fit with the eukaryotic pattern. Therefore it should be possible to prevent acetylation by altering the N-terminal sequence. For example, mutating the protein so that it has a proline (Pro) at the N-terminus inhibits acetylation in eukaryotes and is also expected to prevent acetylation in prokaryotes.
Accordingly, one approach for generating a non-acetylated version of a protein that is normally subject to N-terminal acetylation includes carrying out site-directed mutagenesis to alter the start of the gene such that a non-acetylatable amino acid is encoded, and expressing and purifying the protein.
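As a rough illustration of how a non-acetylatable variant might be designed, the sketch below classifies the mature N-terminal residue using the preference order given above (Ser and Ala preferred, then Met, Gly and Thr; Pro inhibitory) and generates a hypothetical mutant in which the residue after the initiator Met is replaced by Pro. The rule that the initiator Met is removed when the next residue is small (Ala, Cys, Gly, Pro, Ser, Thr or Val) is a commonly used approximation assumed here, not something established by the present work; the function names are illustrative only.

```python
# Residue classes from the eukaryotic preference order described above.
PREFERRED = {"S", "A"}
SECONDARY = {"M", "G", "T"}
BLOCKING = {"P"}
# Assumed rule of thumb: the initiator Met is removed before a small residue.
SMALL_AFTER_MET = set("ACGPSTV")


def mature_n_terminus(protein):
    """Drop the initiator Met when the following residue is small (assumed rule)."""
    if len(protein) > 1 and protein[0] == "M" and protein[1] in SMALL_AFTER_MET:
        return protein[1:]
    return protein


def acetylation_class(protein):
    """Classify the mature N-terminal residue as preferred, secondary, blocked or other."""
    first = mature_n_terminus(protein)[0]
    if first in BLOCKING:
        return "blocked"
    if first in PREFERRED:
        return "preferred"
    if first in SECONDARY:
        return "secondary"
    return "other"


def proline_variant(protein):
    """Hypothetical site-directed mutant: residue 2 (after the start Met) becomes Pro."""
    if len(protein) < 2 or not protein.startswith("M"):
        raise ValueError("expected a sequence beginning with the initiator Met")
    return "M" + "P" + protein[2:]


wild_type = "MTAEGSGSS"  # toy N-terminus; Met is removed, leaving Thr
print(acetylation_class(wild_type), acetylation_class(proline_variant(wild_type)))
```

Under the assumed Met-excision rule, Pro at position 2 still allows removal of the initiator Met, so the mature protein would begin with Pro.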
An alternative approach for generating N-terminal acetylated and non-acetylated protein for carrying out an in vitro assay includes expressing and purifying an in vivo N-terminally acetylated protein, and treating it with a deacetylase. Conversely, a non-acetylated protein can be expressed and purified, and then acetylated using a pNAT.
A further approach is to express a protein with an N-terminal tag which can be cleaved off following purification. Thereafter the functional activity or property of the protein can be assayed, and the effect of N-terminal acetylation can be assessed.
Techniques for cloning, manipulation, modification and expression of nucleic acids, including protein engineering and site-directed mutagenesis and purification of expressed proteins, are very well known in the art and are described for example in Sambrook et al (2001), supra.
Suitable activities and properties of a protein that may be affected by its N-terminal acetylation, and which can be tested in vitro and/or in culture, depend upon what the protein is known or predicted to do. For example, as would be appreciated by the skilled person, if the protein has an enzymatic function, this function could be assayed in vitro. Acetylation may also affect protein-protein binding (which can be tested by co-immunoprecipitation), dimerisation and protein stability. If the protein is related to iron metabolism, growth in medium containing high or low iron concentrations may be tested. If the protein is related to oxidative stress, survival in the presence of hydrogen peroxide, or in activated versus resting macrophages, could be tested. Other suitable properties of the bacterium that may be affected by N-terminal acetylation of a protein, and which can be tested for in cell culture, include survival, growth rate and drug resistance.
Additionally or alternatively, determining at least one activity or property of a protein that is N-terminally acetylated in a pathogenic bacterium, and which may be affected by its N-terminal acetylation, may be performed in vivo.
As will be appreciated by the skilled person, properties of the bacterium that may be affected by N-terminal acetylation of a protein may relate to the pathogenic characteristics of the bacterium. Thus, in an embodiment, an animal model of the pathogenic condition caused by the pathogenic bacterium may usefully be employed. Properties of the pathogenic bacterium which may be tested in vivo include infectivity, induction of pathology and drug resistance (Parish et al, 2003). It is thus appreciated that the identified protein may be a drug discovery target for a disease or condition caused by the pathogenic bacterium.
In a preferred embodiment, the bacterium is M. tuberculosis and the protein is a drug discovery target for tuberculosis.
In alternative embodiments, the bacterium may be M. leprae and the protein may be a drug discovery target for leprosy; the bacterium may be M. avium and the protein may be a drug discovery target for M. avium complex infections; or the bacterium may be Corynebacterium diphtheriae and the protein may be a drug discovery target for diphtheria.
In other embodiments, the bacterium may be Nocardia asteroides and the protein may be a drug discovery target for nocardiosis; the bacterium may be Actinomyces spp and the protein may be a drug discovery target for actinomycosis; and the bacterium may be Arcanobacterium spp and the protein may be a drug discovery target for Arcanobacterium spp infection.
In additional embodiments, the bacterium may be Listeria monocytogenes and the protein may be a drug discovery target for listeriosis; the bacterium may be a streptococcus such as Strep. pneumoniae or Strep. pyogenes and the protein may be a drug discovery target for streptococcal infections; the bacterium may be a staphylococcus such as Staph. aureus and the protein may be a drug discovery target for staphylococcal infections, including MRSA; the bacterium may be an enterococcus and the protein may be a drug discovery target for enterococcal infections; the bacterium may be Bacillus anthracis and the protein may be a drug discovery target for anthrax; the bacterium may be Bacillus cereus and the protein may be a drug discovery target for food poisoning; or the bacterium may be a Clostridium such as C. perfringens, C. difficile, C. tetani or C. botulinum, and the protein may be a drug discovery target for diseases caused by these clostridia, including tetanus and botulism (see http://www.textbookofbacteriology.net/).
As discussed in Example 1, we have carried out the method of the first aspect of the invention and have identified three proteins from M. tuberculosis that were found to be N-terminally acetylated. These are GlpX, PrcA and ArgD.
A fourth aspect of the invention thus provides an isolated, N-terminally acetylated M. tuberculosis protein which is selected from GlpX, PrcA and ArgD, or an N-terminal fragment thereof.
The amino acid sequence of GlpX from M. tuberculosis strain H37Rv is listed in Genbank Accession No. NP_215615 and in Figure 4 (SEQ ID No: 1), while GlpX from M. tuberculosis strain CDC1551 is listed in Genbank Accession No. NP_335575. The amino acid sequence of PrcA from M. tuberculosis strain H37Rv is listed in Genbank Accession No. NP_216625 and in Figure 5 (SEQ ID No: 2), while PrcA from M. tuberculosis strain CDC1551 is listed in Genbank Accession No. NP_336638. The amino acid sequence of ArgD from M. tuberculosis strain H37Rv is listed in Genbank Accession No. NP_216171 and in Figure 6 (SEQ ID No: 3), while ArgD from M. tuberculosis strain CDC1551 is listed in Genbank Accession No. NP_336148.
By "an N-terminal fragment" of the specified proteins we include a region of at least 5 consecutive amino acids, more preferably at least 10, or at least 15, or at least 20, or at least 30, or at least 50 amino acid residues of the protein. Alternatively, the N-terminal fragment may be at least 100 or at least 150 amino acid residues of the protein, but less than 100% of the length of the whole polypeptide. For avoidance of doubt, an N-terminal fragment of the invention is itself N-terminally acetylated. Such fragments may be useful, for example, to prepare antibodies which will specifically bind the N-terminally acetylated form of the protein.
It is well known that certain polypeptides are polymorphic, and it will be appreciated that some natural variation of these M. tuberculosis protein sequences may occur. Thus, in an embodiment, the invention is not limited to an isolated N-terminally acetylated M. tuberculosis protein having the sequence listed in Figures 4, 5 or 6 (SEQ ID Nos: 1-3), but includes naturally occurring variants thereof in which one or more of the amino acid residues have been replaced with another amino acid. In particular, the invention includes isolated, N-terminally acetylated M. tuberculosis GlpX, PrcA and ArgD proteins from strains other than H37Rv, in particular strain CDC1551 and strain Erdman.
The invention further includes isolated, N-terminally acetylated GlpX, PrcA and ArgD proteins from mycobacteria other than M. tuberculosis. The percentage sequence identities between the GlpX, PrcA and ArgD proteins of M. tuberculosis and those of M. bovis, M. marinum, M. leprae, M. paratuberculosis and M. smegmatis are listed in Table 1.
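Table 1: Percentage sequence identity between M. tuberculosis GlpX, PrcA and ArgD and their homologues in other mycobacteria (table image not reproduced).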
Accordingly, a fifth aspect of the invention provides an isolated, N-terminally acetylated protein having at least 80% sequence identity with an M. tuberculosis protein selected from GlpX, PrcA and ArgD, as provided in Figures 4 to 6 (SEQ ID Nos: 1-3), respectively, or an N-terminal fragment thereof.
More preferably, the protein has at least 81%, 82%, 83%, 84% or 85% sequence identity, and yet more preferably at least 86%, 87%, 88%, or 89% or at least 90%, 91%, 92%, 93%, 94% or 95% sequence identity, and yet more preferably at least 96%, 97%, 98% or at least 99% sequence identity with the M. tuberculosis GlpX, PrcA or ArgD protein from strain H37Rv, as provided in Figures 4 to 6 (SEQ ID Nos: 1-3), respectively.
The percent sequence identity between two polypeptides may be determined using suitable computer programs, for example the GAP program of the University of Wisconsin Genetics Computer Group (GCG), and it will be appreciated that percent identity is calculated in relation to polypeptides whose sequences have been optimally aligned.
The alignment may alternatively be carried out using the Clustal W program (Thompson et al, (1994) Nucleic Acids Res 22, 4673-80). The parameters used may be as follows:
Fast pairwise alignment parameters: K-tuple (word) size, 1; window size, 5; gap penalty, 3; number of top diagonals, 5; scoring method, percent. Multiple alignment parameters: gap open penalty, 10; gap extension penalty, 0.05. Scoring matrix: BLOSUM.
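Because different programs count identity differently, it may help to make the calculation explicit. The sketch below assumes two sequences that have already been optimally aligned (for example by GAP or Clustal W, with gaps written as "-") and, as one common convention, divides the number of identical residues by the number of alignment columns in which neither sequence has a gap; other denominators, such as the length of the shorter sequence, give different values. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Convention assumed here: identical residues divided by the number of
    columns in which neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment: 4 identities over 5 gap-free columns.
score = percent_identity("MKT-AV", "MKTGAI")
print(score, score >= 80.0)  # 80.0 True
```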
The invention also provides an isolated, N-terminally acetylated GlpX, PrcA or ArgD protein from a Mycobacterium other than M. tuberculosis, in particular those Mycobacterium species listed above.
A sixth aspect of the invention provides a method of determining the effect of N-terminal acetylation on the function or activity of a protein as defined in the fourth or fifth aspects of the invention. The method comprises providing N-terminally acetylated and non-N-terminally acetylated forms of the protein, and determining at least one property of the protein that is affected by N-terminal acetylation of the protein.
Typically, the at least one property of the protein, or the at least one property of the cell, which is tested is one that is relevant to the pathogenicity of M. tuberculosis. This method is typically performed in vitro, using methods well known in the art. For example, with respect to GlpX, fructose-1,6-bisphosphatase activity may be tested; with respect to ArgD, acetylornithine aminotransferase activity may be tested; and with respect to PrcA, either or both of protein degradation and sensitivity to nitric oxide may be tested (Darwin et al, 2003).
Alternatively the method may be performed in culture. In this embodiment, the method of determining the effect of N-terminal acetylation on the function or activity of a protein as defined in the fourth or fifth aspects of the invention typically comprises providing a cell which contains the N-terminally acetylated form of the protein and a cell which contains the non N-terminally acetylated form of the protein, and determining at least one property of the cell that is affected by N-terminal acetylation of the protein.
Preferably the cell is a bacterial cell, most preferably M. tuberculosis.
As discussed above, the present invention provides, for the first time, motivation for the skilled person to attempt to identify protein N-acetyl transferase (pNAT) enzymes in bacteria, and to assess their suitability as drug discovery targets. In particular, inhibitors of protein N-acetyl transferase enzymes may be therapeutically useful.
A seventh aspect of the invention thus provides a method of identifying a protein N-acetyl transferase (pNAT) in a bacterium, the method comprising: providing details of at least one putative pNAT; providing a mutant strain of the bacterium in which the putative pNAT gene has been knocked-out; and determining the amount, rate and/or level of N-terminal protein acetylation in the mutant bacterial strain, wherein a reduction in the amount, rate and/or level of N-terminal protein acetylation in the mutant bacterial strain in comparison to the amount, rate and/or level of N-terminal protein acetylation in a wild-type strain of that bacterium indicates that the putative pNAT is an actual pNAT.
Typically, determining the amount, rate and/or level of N-terminal protein acetylation in the mutant bacterial strain comprises measuring the level of pNAT enzyme activity in vitro. A suitable method for determining the amount, rate and/or level of N-terminal protein acetylation is described below and in Example 2B.
As described below in Example 2A, putative bacterial pNATs may be identified by virtue of their homology to known yeast pNATs such as NatA, NatB and NatC. Putative bacterial pNATs may alternatively be identified by homology searches of protein motif databases such as Pfam (http://www.sanger.ac.uk/Software/Pfam/). Further alternatively, putative bacterial pNATs may be identified by virtue of their homology to other bacterial pNATs, such as the M. tuberculosis proteins RimI, Rv0133 and Rv3225c.
It is appreciated that, usually, the identified pNAT is further tested by complementing the mutant strain of the bacterium with a polynucleotide encoding the knocked-out pNAT, to show that the N-terminal acetylation activity of the mutant bacteria is thereby restored.
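One way to make the decision rule of this aspect explicit is sketched below. It assumes that the acetylation level is read out as the fraction of a reporter protein present in its N-terminally acetylated form (for example, from the intensities of the acetylated and non-acetylated GlpX forms resolved in Figure 2), measured in the wild-type, knock-out and complemented strains. The thresholds and function names are hypothetical and would need to be set from replicate experiments.

```python
def acetylated_fraction(acetylated, unacetylated):
    """Fraction of a reporter protein present in the N-terminally acetylated form."""
    total = acetylated + unacetylated
    if total <= 0:
        raise ValueError("no signal for either form")
    return acetylated / total


def classify_putative_pnat(wild_type, knockout, complemented,
                           max_knockout_ratio=0.5, min_restored_ratio=0.8):
    """Interpret acetylated fractions from wild-type, knock-out and complemented strains.

    Both thresholds are hypothetical placeholders, expressed relative to wild type.
    """
    if wild_type <= 0:
        raise ValueError("reporter is not acetylated in the wild type")
    if knockout > max_knockout_ratio * wild_type:
        return "no clear reduction: not supported as a pNAT"
    if complemented >= min_restored_ratio * wild_type:
        return "reduced and restored: consistent with an actual pNAT"
    return "reduced but not restored by complementation"


wt = acetylated_fraction(90, 10)    # hypothetical spot intensities
ko = acetylated_fraction(5, 95)
comp = acetylated_fraction(85, 15)
print(classify_putative_pnat(wt, ko, comp))
```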
As described above, suitable methods for making pNAT mutant bacterial strains, including knock-out mutants, are well known in the art (see Parish & Stoker, 2000; Sambrook et al, 2001).
Suitable bacteria include those described above with respect to the first and second aspects of the invention. However, in a preferred embodiment, the bacterium is a pathogenic bacterium and the pNAT identified by the methods of this aspect of the invention may be a drug discovery target for a disease or condition caused by the pathogenic bacterium. Suitable pathogenic bacteria, and the diseases that they cause, include those listed above with respect to the third aspect of the invention.
An eighth aspect of the invention provides a method of identifying a polynucleotide encoding a pNAT from a desired bacterium, the method comprising: providing a library of polynucleotides from the desired bacterium, which library comprises at least one polynucleotide that encodes a putative pNAT; providing microbial cells of a strain in which a pNAT gene is naturally absent or has been knocked-out; transforming the library of polynucleotides into the microbial cells; incubating the transformed microbial cells under conditions that allow expression of the polypeptides encoded by the polynucleotides; and identifying one or more microbial cells in which the level of protein N-terminal acetylation has been increased under those conditions, wherein an increased level of protein N-terminal acetylation in the identified microbial cell indicates that the transformed polynucleotide encodes a pNAT from the desired bacterium.
Preferably, the library of polynucleotides is a cDNA library. The library may be derived from a desired bacterial source, typically the bacteria listed above with respect to the first and second aspects of the invention, and more preferably, the pathogenic bacteria listed above with respect to the third aspect of the invention.
It is appreciated that in this aspect of the invention the microbial cells may be yeast cells, such as S. cerevisiae. If the microbial cells are S. cerevisiae, the knocked-out pNAT may be NatA, NatB or NatC.
The microbial cells may be bacterial cells, such as M. tuberculosis, and the knocked-out pNAT may be RimI, Rv0133 or Rv3225c. Alternatively, the microbial cells may be bacterial cells, such as E. coli, in which pNAT activity is thought not to be present.
Typically, the library of polynucleotides is under the control of an inducible promoter. Many such inducible promoters are known in the art. For yeast cells, the promoter may be a tetracycline-inducible promoter, a methionine-inducible promoter, a galactose-inducible promoter such as GAL1 or GAL10, or the CUP1 metallothionein promoter (induced in the presence of Cu2+ or Zn2+). For E. coli, lac or ara promoters are preferred.
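By way of example only, clones giving increased protein N-terminal acetylation might be picked out of the library screen by comparing each clone's signal with that of control cells carrying the empty vector. The sketch assumes a single numeric acetylation signal per clone (how it is measured is left open) and uses a z-score cut-off of 3 against the controls; both the readout and the cut-off are assumptions, and any hits would still need confirmation, for example by MS.

```python
from statistics import mean, stdev


def call_hits(clone_signals, control_signals, z_cutoff=3.0):
    """Clones whose acetylation signal exceeds the control mean by >= z_cutoff SDs."""
    if len(control_signals) < 2:
        raise ValueError("need at least two control measurements")
    mu = mean(control_signals)
    sd = stdev(control_signals)
    if sd == 0:
        raise ValueError("control signals show no variation")
    return sorted(name for name, signal in clone_signals.items()
                  if (signal - mu) / sd >= z_cutoff)


# Hypothetical readouts: empty-vector controls and three library clones.
controls = [1.0, 1.2, 0.8, 1.0]
clones = {"clone_A": 2.0, "clone_B": 1.1, "clone_C": 0.9}
print(call_hits(clones, controls))
```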
A ninth aspect of the invention provides a method of identifying a pNAT from a desired bacterium, the method comprising: identifying a protein from the desired bacterium which requires N-terminal acetylation for a specified activity; randomly mutagenising genes in the desired bacterium; selecting or screening for mutagenised bacteria that have lost the specified activity; and identifying at least one mutagenised gene from the bacteria that have lost the specified activity, wherein loss of the specified protein activity indicates that the mutagenised gene encodes a pNAT from the desired bacterium.
For example, in M. tuberculosis, the Esat6 protein requires acetylation for activity (Okkels et al, 2004). Since Esat6 requires N-terminal acetylation for secretion from the M. tuberculosis cell, an anti-Esat6 antibody could be used to screen for Esat6 secretion, and its absence, using methods well known in the art.
A variety of methods are very well known in the art for carrying out the mutagenesis methods of the ninth aspect of the invention, and include transposon mutagenesis.
A tenth aspect of the invention provides a method of screening for an inhibitor of a bacterial pNAT, the method comprising: expressing and purifying the bacterial pNAT; contacting the pNAT with a test compound; and assaying for pNAT activity, wherein a reduction in pNAT activity in the presence of the test compound indicates that the test compound is a potential inhibitor of the pNAT.
The bacterial pNAT may be one identified by the method of the seventh, eighth or ninth aspects of the invention.
As described below in Example 2, we consider the M. tuberculosis proteins RimI, Rv0133 and Rv3225c to be pNAT enzymes by virtue of their homology to known S. cerevisiae pNATs. Thus, typically the bacterial pNAT is an M. tuberculosis pNAT selected from RimI, Rv0133 and Rv3225c. In addition, the bacterial pNAT may be a homologue of M. tuberculosis RimI, Rv0133 or Rv3225c from a mycobacterium other than M. tuberculosis. In these other mycobacteria, homologues of RimI, Rv0133 and Rv3225c can readily be identified and used in the screening methods; for example, the percentage sequence identities between the RimI, Rv0133 and Rv3225c proteins of M. tuberculosis and those of M. bovis, M. marinum, M. leprae, M. paratuberculosis and M. smegmatis are listed in Table 2.
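Table 2: Percentage sequence identity between M. tuberculosis RimI, Rv0133 and Rv3225c and their homologues in other mycobacteria (table image not reproduced).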
* = The available sequence information does not contain a match having > 49% sequence identity.
Suitable assays for measuring N-terminal acetylation, and hence inhibition of pNAT enzymes, are known in the art (see Sugiura et al, 2003; and Arnesen et al, 2005). For example, the assay may comprise: incubating the purified pNAT protein with [3H]acetyl-CoA and an acceptor such as adrenocorticotropic hormone peptide, or a bacterial protein or its N-terminal peptide (such as M. tuberculosis GlpX), in the presence or absence of the test compound; separating the acetylated peptide/protein from the radioactive substrate, e.g. by cation-exchange chromatography, affinity chromatography or size-exclusion chromatography; and counting the radioactivity of the acetylated peptide/protein in the presence or absence of the test compound.
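For the radioactive assay described above, the effect of a test compound is commonly expressed as percent inhibition relative to a vehicle-only reaction after subtracting background counts. The sketch assumes that background is measured in a reaction lacking enzyme; the function name and the counts in the example are hypothetical.

```python
def percent_inhibition(cpm_compound, cpm_vehicle, cpm_background):
    """Percent inhibition of acetyl transfer, from background-corrected counts."""
    window = cpm_vehicle - cpm_background
    if window <= 0:
        raise ValueError("vehicle counts must exceed background")
    return 100.0 * (1.0 - (cpm_compound - cpm_background) / window)


# Hypothetical counts per minute: vehicle 10,000; background (no enzyme) 1,000.
print(round(percent_inhibition(3000, 10000, 1000), 1))  # 77.8
```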
The above aspect of the invention includes screening methods to identify drugs or lead compounds of use in treating the disease or condition caused by the bacterium (e.g. tuberculosis). It is appreciated that screening assays which are capable of high throughput operation are particularly preferred.
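Where the assay is run in high-throughput format, its robustness is often summarised with the Z'-factor, computed from positive-control wells (uninhibited enzyme) and negative-control wells (for example, no enzyme): Z' = 1 - 3(SD_pos + SD_neg) / |mean_pos - mean_neg|. Values above about 0.5 are conventionally taken to indicate an assay suitable for screening. The sketch below is a minimal implementation with hypothetical control values; the choice of control wells is an assumption.

```python
from statistics import mean, stdev


def z_prime(positive_controls, negative_controls):
    """Z'-factor from control wells: 1 - 3(SD_pos + SD_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive_controls) - mean(negative_controls))
    if separation == 0:
        raise ValueError("control means do not differ")
    return 1.0 - 3.0 * (stdev(positive_controls) + stdev(negative_controls)) / separation


# Hypothetical counts: uninhibited enzyme (positive) and no-enzyme (negative) wells.
print(round(z_prime([100, 102, 98], [10, 11, 9]), 2))
```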
It is appreciated that in the methods described herein, which may be drug screening methods (a term well known to those skilled in the art), the compound may be a drug-like compound or a lead compound for the development of a drug-like compound.
The term "drug-like compound" is well known to those skilled in the art, and may include the meaning of a compound that has characteristics that may make it suitable for use in medicine, for example as the active ingredient in a medicament. Thus, for example, a drug-like compound may be a molecule that may be synthesised by the techniques of organic chemistry, less preferably by techniques of molecular biology or biochemistry, and is preferably a small molecule, which may be of less than 5000 daltons and which may be water-soluble. A drug-like compound may additionally exhibit features of selective interaction with a particular protein or proteins and be bioavailable and/or able to penetrate target cellular membranes or the blood-brain barrier, but it will be appreciated that these features are not essential.
The term "lead compound" is similarly well known to those skilled in the art, and may include the meaning that the compound, whilst not itself suitable for use as a drug (for example because it is only weakly potent against its intended target, non-selective in its action, unstable, poorly soluble, difficult to synthesise or has poor bioavailability), may provide a starting-point for the design of other compounds that may have more desirable characteristics.
Thus in an embodiment, the method further comprises modifying the test compound, and testing the modified compound for the ability to inhibit the pNAT in vitro or in culture.
The method may further comprise determining whether the test compound or the modified compound has the ability to inhibit the pNAT in the bacterium, such as M. tuberculosis, in vivo.
Still further, the method may comprise determining whether the test compound or the modified compound has the ability to inhibit the pNAT in an in vivo model of the disease or condition caused by the bacterium. A suitable experimental model of tuberculosis is the experimental infection of a mouse with M. tuberculosis administered intravenously, intranasally or by aerosol. Bacterial numbers are measured in the lung, liver and spleen. Details vary according to the route of administration, mouse strain and bacterial dose. Flynn (2006) provides a review of this and other suitable animal models.
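Bacterial numbers in the lung, liver and spleen are usually obtained by plating serial dilutions of organ homogenates and counting colonies. The sketch below shows the back-calculation to colony-forming units (CFU) per organ and the log10 reduction between untreated and treated groups; the helper names, volumes and counts are hypothetical, and in practice group means of log10 CFU from several animals would be compared.

```python
import math


def cfu_per_organ(colonies, dilution_factor, plated_volume_ml, homogenate_volume_ml):
    """Back-calculate CFU in a whole-organ homogenate from one plate count."""
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor * homogenate_volume_ml / plated_volume_ml


def log10_reduction(cfu_untreated, cfu_treated):
    """Difference in log10 CFU between untreated and treated animals."""
    return math.log10(cfu_untreated) - math.log10(cfu_treated)


# Hypothetical lung counts: 0.1 ml plated from a 1 ml homogenate at two dilutions.
untreated = cfu_per_organ(50, 1000, 0.1, 1.0)
treated = cfu_per_organ(50, 10, 0.1, 1.0)
print(f"{untreated:.3g} {treated:.3g} {log10_reduction(untreated, treated):.2f}")
```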
In a further embodiment, the method may also comprise the step of formulating a compound which has the ability to inhibit the pNAT into a pharmaceutically acceptable composition.
Accordingly, the invention includes a pharmaceutical composition comprising a compound which has the ability to inhibit a bacterial pNAT that has been identified as described above, and a pharmaceutically acceptable carrier, diluent or excipient.
Whilst it is possible for a compound to be administered alone, it is preferable to present it as a pharmaceutical formulation, together with one or more acceptable carriers. The carrier(s) must be "acceptable" in the sense of being compatible with the compound of the invention and not deleterious to the recipients thereof. Typically, the carriers will be water or saline which will be sterile and pyrogen free.
The aforementioned compounds or a formulation thereof may be administered by any conventional method, including oral administration, which is preferred, and parenteral (e.g. subcutaneous or intramuscular) injection. The treatment may consist of a single dose or a plurality of doses over a period of time.
"When the identified inhibitor is an inhibitor of a M. tuberculosis pNAT such as Riml, RvOl 33 and Rv3225c, the invention further provides a method of treating an individual suffering from a tuberculosis, the method comprising administering to a patient an appropriate quantity of a pNAT inhibitor compound identified as described above. Similarly, this embodiment of the invention provides the use of the M. tuberculosis pNAT inhibitor compound in the manufacture of a medicament for preventing or treating tuberculosis.
It is appreciated that in the context of screening for inhibitors of a pNAT from bacteria other than M. tuberculosis, references to models of tuberculosis and to treating tuberculosis are to be construed in relation to a disease or condition caused by the other bacteria as known in the art and discussed above.
All of the documents referred to herein are incorporated herein, in their entirety, by reference.
The invention will now be described in more detail by reference to the following Examples and Figures.
Figure 1. This figure illustrates the general strategy for identifying alternative translation start positions and tryptic digest peptides, applied to GlpX (Rv1099c). The six alternative translation starts for GlpX are indicated by the labelled arrows. The m(inus)1, m2, m3 and m4 starts are located upstream of the original (p0) translation start indicated in the M. tuberculosis genome annotation. The p(lus)1 and p2 starts are downstream of the original translation start prediction. The putative trypsin cleavage sites are indicated by the 'T'-labelled double-tailed bars. The resulting tryptic peptides are shown by the horizontal boxes. Tryptic fragments identified by MS/MS are shown in white; those not detected are shown in black.
The figure shows that the detected fragments could only have resulted from the digest of the m2 variant. For reference, the N-terminal region of the M. tuberculosis GlpX sequence (SEQ ID No: 4) was aligned with the N-terminal region of GlpX homologues from six other actinomycetes (Streptomyces coelicolor, SEQ ID No: 5; Corynebacterium diphtheriae, SEQ ID No: 6; Corynebacterium glutamicum, SEQ ID No: 7; Mycobacterium smegmatis, SEQ ID No: 8; M. avium, SEQ ID No: 9; M. leprae, SEQ ID No: 10), and with the N-terminal region of the GlpX homologue from E. coli (SEQ ID No: 11). Asterisks (*) indicate perfectly conserved residues, and colons (:) indicate conserved substitutions (start codon methionines are ignored). The Corynebacterium glutamicum homologue TSS (at position -14) has been confirmed experimentally (Rittmann et al, 2003). Organisms are E. coli (Eco), Streptomyces coelicolor (Sco), Corynebacterium diphtheriae (Cdi), Corynebacterium glutamicum (Cgl), Mycobacterium smegmatis (Msm), Mycobacterium avium (Mav), Mycobacterium leprae (Mlp), and M. tuberculosis (Mtb).
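The strategy of Figure 1 can be sketched in a few lines: each alternative start gives a different N-terminal sequence, each sequence is digested in silico, and a start is supported only if its predicted N-terminal peptide is among the peptides detected. The sketch below uses a simplified trypsin rule (cleavage after Lys or Arg, but not before Pro) and the assumed rule that the initiator Met is removed when the following residue is small; the sequences are hypothetical toy fragments, not the GlpX sequence, and the function names are illustrative.

```python
import re

# Assumed rule of thumb: the initiator Met is removed when the next residue is small.
SMALL_AFTER_MET = set("ACGPSTV")


def tryptic_peptides(protein):
    """Simplified in-silico trypsin digest: cut after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def n_terminal_peptide(protein):
    """First tryptic peptide of the mature protein (after possible Met removal)."""
    if len(protein) > 1 and protein[0] == "M" and protein[1] in SMALL_AFTER_MET:
        protein = protein[1:]
    return tryptic_peptides(protein)[0]


def supported_starts(variants, detected):
    """Names of start variants whose predicted N-terminal peptide was detected."""
    return [name for name, seq in variants.items() if n_terminal_peptide(seq) in detected]


# Hypothetical toy sequences: upstream starts extend the same reading frame.
p0 = "MELVRAGKDLR"          # annotated start
m1 = "MTAEGSRNLA" + p0      # nearest upstream alternative start
m2 = "MKQLD" + m1           # further upstream alternative start
variants = {"p0": p0, "m1": m1, "m2": m2}
detected = {"TAEGSR", "NLAMELVR", "AGK"}
print(supported_starts(variants, detected))
```

In this toy case the detected peptide NLAMELVR straddles the annotated start, which already argues against p0, and only m1 predicts the detected N-terminal peptide.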
Figure 2. Mass spectrometry of GlpX. (a) SDS-PAGE indicating two forms of GlpX, acetylated (1) and non-acetylated (2).
(b) Identification of the peptide spanning the annotated start (NLAMELVR (SEQ ID No: 12) = 945.52; MS/MS-detected b-ions: b2, b3, b4, b5, b6; detected y-ions: y1, y2, y3, y5).
(c) Identification of the acetylated N-terminal peptide (Ac-TAEGSGSSTAAVASHDPSHTRPSR (SEQ ID No: 13) = 2408.1; MS/MS: y4, 5, 8, 9, 10, 21).
(d) Identification of the non-acetylated N-terminal peptide (TAEGSGSSTAAVASHDPSHTRPSR (SEQ ID No: 13) = 2366.1; MS/MS: y1, 4, 8, 9, 21).
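The masses quoted in this legend can be checked from the peptide sequences. The sketch below assumes that they are singly protonated monoisotopic masses, [M+H]+, computed from standard residue masses; on that assumption the values calculated for NLAMELVR and for the non-acetylated and acetylated TAEGSGSSTAAVASHDPSHTRPSR peptides round to 945.52, 2366.1 and 2408.1 respectively, and the difference between the last two corresponds to one acetyl group (42.011 Da). The function name is illustrative.

```python
# Standard monoisotopic residue masses (Da) for the residues in the example peptides.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "R": 156.10111,
}
WATER = 18.01056     # added once per linear peptide
PROTON = 1.007276    # charge carrier for [M+H]+
ACETYL = 42.010565   # N-terminal acetylation


def mh_plus(peptide, n_acetyl=False):
    """Monoisotopic [M+H]+ of a linear peptide, optionally N-terminally acetylated."""
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON
    return mass + ACETYL if n_acetyl else mass


for label, peptide, acetylated in [
    ("NLAMELVR", "NLAMELVR", False),
    ("TAEGS...PSR", "TAEGSGSSTAAVASHDPSHTRPSR", False),
    ("Ac-TAEGS...PSR", "TAEGSGSSTAAVASHDPSHTRPSR", True),
]:
    print(f"{label}: {mh_plus(peptide, acetylated):.2f}")
```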
Figure 3. Translation start site reassignment for ribH (Rv1416). There is a 14 bp gap between the stop codon of ribA2 and the start codon (p0) of ribH predicted by the M. tuberculosis annotation. No peak corresponding to this start was seen; however, peaks corresponding to the m1 start variant (dashed box) were identified. Thus, the corrected TSS overlaps the upstream ribA2 (Rv1415) gene by four base pairs, an arrangement often observed in prokaryotic genomes for functionally related genes in a common operon. SEQ ID No: 14 is the C-terminal sequence of ribA2 shown in Figure 3, SEQ ID No: 15 is the N-terminal sequence of ribH shown in Figure 3, and SEQ ID No: 16 is the nucleotide sequence shown in Figure 3 spanning the ribA2 and ribH genes.
Figure 4. Amino acid sequence of GlpX from M. tuberculosis strain H37Rv taken from Genbank Accession No. NP_215615 (SEQ ID No: 1).
Figure 5. Amino acid sequence of PrcA from M. tuberculosis strain H37Rv taken from Genbank Accession No. NP_216625 (SEQ ID No: 2).
Figure 6. Amino acid sequence of ArgD from M. tuberculosis strain H37Rv taken from Genbank Accession No. NP_216171 (SEQ ID No: 3).
Figure 7. Sequence alignment of the catalytic domains of three Saccharomyces cerevisiae N-terminal acetyl transferases NatA (SEQ ID No: 17), NatB (SEQ ID No: 18) and NatC (SEQ ID No: 19).
Figure 8. Sequence alignment of the catalytic domain of S. cerevisiae NatC (SEQ ID No: 19) and the M. tuberculosis protein RimI (SEQ ID No: 20).
Figure 9. Sequence alignment of the catalytic domain of S. cerevisiae NatC (SEQ ID No: 19) and the M. tuberculosis protein Rv0133 (SEQ ID No: 21).
Figure 10. Sequence alignment of the catalytic domain of S. cerevisiae NatC (SEQ ID No: 19) and the M. tuberculosis protein Rv3225c (SEQ ID No: 22).
Example 1: Rapid experimental determination of translational starts using peptide mass mapping and tandem mass spectrometry within the proteome of Mycobacterium tuberculosis
Abstract
Identification of protein translation start sites has generally been considered a bioinformatics exercise, with relatively few start sites confirmed by N-terminal sequencing. Translation start site determination is critical for defining both the protein sequence and the upstream DNA, which may contain regulatory motifs. We demonstrate here that translation start sites can be determined during routine protein identification, using MALDI-MS and MS/MS data to select the correct N-terminal sequence from a list of alternatives generated in silico. Applying our method to 13 proteins from Mycobacterium tuberculosis, we confirmed 11 predicted translational start sites and reassigned two. We also showed that N-acetylation, reported to be rare in prokaryotes, was present in 3 of the 13 proteins (23%), suggesting that in the mycobacteria, and related bacteria, this modification may be common and an important regulator of protein function.
Introduction
The sequencing of complete genomes allows every potentially encoded protein to be identified. In prokaryotes, gene-finding using a combination of bioinformatic factors (homology with other predicted proteins, third-base preference, etc.) is rather efficient, and is generally achieved by a combination of automated and manual curation (Brent, 2005). However, the final proof that a gene is expressed as a protein can only be provided by experimental protein analysis. Proteomic approaches have been used to confirm many predicted genes, as well as to identify genes not easily found bioinformatically (Jungblut et al, 2001). However, one aspect of the annotation is difficult to predict and usually remains experimentally untested: the translational start site (TSS).
Determining the TSS accurately is important not only because it defines the amino acid sequence of the protein (the stop codon being unambiguous), but also because it defines the upstream region in the DNA. Genome-wide and focused studies of promoter structure and regulatory motifs depend on the intergenic regions defined by the gene-finding process (Edwards et al, 2005; Salgado et al, 2000). The TSS assignment therefore affects the analysis of both protein function and transcriptional regulation.
Traditionally, TSS identification has been achieved by N-terminal sequencing using Edman degradation (Edman, 1950). This is often technically demanding and requires large (pmol) quantities of protein. Furthermore, some proteins are blocked by N-terminal modifications and cannot be sequenced. Proteomic approaches using sensitive mass spectrometry methods (Jungblut et al, 1999; Jungblut et al, 2000) have revolutionised protein analysis because of the speed and sensitivity of the technology, but have not generally been used to identify TSSs.
We have applied rapid proteomic analysis methods to the problem of identifying TSSs. Because the data generated for protein identification can simultaneously identify protein starts, this is a highly efficient approach. We confirmed the predicted N-termini of 11 proteins in Mycobacterium tuberculosis and corrected the predictions for two further M. tuberculosis proteins.
Surprisingly and unexpectedly, we also demonstrated that, in contrast to the situation in Escherichia coli, a high proportion (23%) of the proteins showed N-terminal acetylation.
Materials and methods
Alternative start codon identification
All Mycobacterium tuberculosis sequences in this manuscript were derived from the M. tuberculosis H37Rv complete genome entry in the EMBL database (Accession number: AL123456 (version 2)).
The following strategy was used to generate alternative TSSs for each predicted gene in the M. tuberculosis genome. Each protein coding sequence in the EMBL entry (identified as a CDS in the feature table) was considered in turn. The region upstream of the gene was scanned until an in-frame stop codon was identified; by definition, alternative start codons for the gene cannot lie upstream of such a stop codon. The in-frame codons between this stop codon and the annotated start were then scanned, and the position and triplet of each alternative start codon (ATG, GTG or TTG) were recorded (these are the 'm' start codons in Figure 1). Similarly, in-frame codons downstream of the original start codon (p0) were scanned for alternative start codons. Only the first two downstream alternative start codons were considered at this stage (these are the 'p1' and 'p2' codons in Figure 1). For each predicted alternative start, the new encoded protein sequence was generated. Thus, from the 3,999 coding sequences listed in AL123456.2, 15,199 alternative start protein sequences were identified. This procedure was performed using a Perl program (AlternaStart.pl).
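A minimal Python sketch of this scan is given below. It is not AlternaStart.pl itself: it assumes the gene is supplied on its coding strand as a plain string with a 0-based index for p0 (genes on the complementary strand would first be reverse-complemented), and it labels the upstream start nearest to p0 as m1, which is our assumption about the convention of Figure 1. The function name and toy sequence are hypothetical.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def alternative_starts(genome: str, p0: int, n_downstream: int = 2):
    """Return (label, 0-based position, codon) for p0 and its alternative starts.

    Assumes `genome` is the coding strand and `p0` indexes the annotated
    start codon. m1 is taken to be the upstream start nearest to p0
    (an assumed labelling convention).
    """
    variants = [("p0", p0, genome[p0:p0 + 3])]

    # Upstream: walk back codon by codon and stop at the first in-frame stop
    # codon, since no alternative start can lie beyond it.
    pos, k = p0 - 3, 0
    while pos >= 0:
        codon = genome[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            k += 1
            variants.append((f"m{k}", pos, codon))
        pos -= 3

    # Downstream: keep only the first n_downstream in-frame start codons.
    pos, k = p0 + 3, 0
    while pos + 3 <= len(genome) and k < n_downstream:
        codon = genome[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            k += 1
            variants.append((f"p{k}", pos, codon))
        pos += 3

    return variants


# Hypothetical toy sequence: in-frame stop, two upstream starts, p0 (index 15),
# then two downstream starts.
toy = "TAA" "ATG" "CCC" "GTG" "AAA" "ATG" "GCT" "TTG" "CCC" "GTG" "ATG" "TAA"
print(alternative_starts(toy, 15))
```

Each returned position would then be translated to give the corresponding alternative protein sequence.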
In silico tryptic digestions
The alternative protein sequences generated above were subjected to an in silico tryptic digest using the 'proteogest' software from the University of Toronto (Cagney et al, 2003) (downloaded from http://www.utoronto.ca/emililab/program/proteogest.htm). This run was performed using the standard tryptic digest settings (no missed cleavages, no modifications, no cleavage where R or K is followed by P).
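The digestion rule used here can be illustrated with the hypothetical function below. It is not proteogest; it implements only the trypsin rule stated above (cleave after K or R, but not before P) and adds an optional missed-cleavage parameter, corresponding to the one or two missed cleavages later allowed in the database searches.

```python
def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Cleave after K or R unless the next residue is P (classic trypsin rule)."""
    fragments, start = [], 0
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])

    # Optionally join up to `missed_cleavages` neighbouring fragments.
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


# Hypothetical sequence: the R-P bond is not cleaved, the K-G bond is.
print(tryptic_digest("MKGGARPSRLAK"))
print(tryptic_digest("MKGGARPSRLAK", missed_cleavages=1))
```

With missed_cleavages=1 the hypothetical sequence also yields MKGGARPSR, the kind of partially digested start peptide later observed for RibH (MKGGA...).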
Protein test set selection
In order to assess the feasibility of our strategy, we decided to focus initially on the proteins most likely to have their definitive translation start identified by mass spectrometry. The alternative protein sequences were therefore screened according to a number of technical criteria. First, we discarded any protein not in a set of 289 proteins for which a 2D-gel spot had been identified in a previous large-scale proteomic analysis of M. tuberculosis (http://web.mpiib-berlin.mpg.de/cgi-bin/pdbs/2d-page/extern/index.cgi). We next eliminated any protein for which the predicted mass of the first N-terminal tryptic fragment of the p0 protein was not between 800 and 2000 Da (the best resolution range for our mass spectrometry equipment), and any protein without an arginine at the C-terminal end of the p0 tryptic peptide (as ionisation in MALDI-MS is less effective for lysine-containing peptides than for arginine-containing ones (Krause et al, 1999)). The remaining 76 proteins were then ranked by the minimal number of alternative starts (i.e. the smallest total number of 'm' and 'p' variants) and the smallest number of fragments assuming a p0 start. This selection and classification task was performed using a program called ParseProteogest.pl, which also calculates N-terminal tryptic fragment masses including possible modifications such as methionine formylation and fragment acetylation.
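The mass-window and C-terminal arginine criteria can be sketched as follows. This is not ParseProteogest.pl; it is a minimal illustration that assumes monoisotopic [M+H]+ values are compared with the 800-2000 Da window, and it handles only unmodified residues plus one optional N-terminal modification (acetylation, +42.0106 Da, or formylation, +27.9949 Da). Under these assumptions the function reproduces the masses quoted for GlpX in Figure 2 (945.52 for NLAMELVR; 2366.1 and 2408.1 for the non-acetylated and acetylated start peptide). Function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
N_TERM_MODS = {"none": 0.0, "acetyl": 42.010565, "formyl": 27.994915}


def peptide_mh(seq: str, n_term: str = "none") -> float:
    """Monoisotopic [M+H]+ of a peptide with an optional N-terminal modification."""
    return sum(MONO[aa] for aa in seq) + WATER + PROTON + N_TERM_MODS[n_term]


def passes_selection(n_term_peptide: str, low: float = 800.0, high: float = 2000.0) -> bool:
    """Mass-window and C-terminal arginine criteria described in the text."""
    return low <= peptide_mh(n_term_peptide) <= high and n_term_peptide.endswith("R")


for pep in ("NLAMELVR", "TAEGSGSSTAAVASHDPSHTRPSR"):
    print(pep, round(peptide_mh(pep), 2), passes_selection(pep))
```

The 24-residue GlpX start peptide falls outside the window, which is consistent with GlpX having been added to the test set despite not meeting all the criteria.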
The data for the M. tuberculosis protein GlpX (Rv1099c) were added to the list because, although the protein does not meet all our selection criteria, we had strong previous evidence that its translational start had been incorrectly predicted (Movahedzadeh et al, 2004). From these proteins, the 15 with the best available spectral data were selected for further investigation (Table 3). All these proteins were tested for the presence of a signal peptide using the online SignalP 3.0 resource available at http://www.cbs.dtu.dk/services/SignalP/ (Bendtsen et al, 2004). No signal peptide was detected in any of them.
2-DE/MALDI-MS and MS/MS
Cellular proteins of Mycobacterium tuberculosis H37Rv were prepared and 300 μg analysed by 2-D gel electrophoresis (2-DE) as previously described (Jungblut et al, 1999). Protein spots on analytical gels were visualized by Coomassie Brilliant Blue G250 staining (Doherty et al, 1998). Spots were excised and digested in-gel (Lamer & Jungblut, 2001). MALDI-MS was performed on a PerSeptive Voyager Elite time-of-flight instrument (PerSeptive, Framingham, MA, USA), for which 0.5 μl of peptide solution was mixed with an equal volume of dihydroxybenzoic acid (DHB) matrix and applied to a MALDI sample template.
Mass spectra were recorded in reflectron mode with delayed extraction on; 256 laser shots constituted one spectrum. Reanalysis for MS/MS confirmation of the N-termini was performed with a 4700 Proteomics Analyzer (Applied Biosystems, Foster City, CA, USA).
For the digestion of spots, the gel pieces were destained in 500 μl of destaining buffer (200 mM NH4HCO3, 50% acetonitrile (ACN)) and equilibrated in 500 μl of digestion buffer (50 mM NH4HCO3, 5% ACN). The supernatant was removed and the gel piece was dried within 30 min in a Microconcentrator 5301 (Eppendorf) at 30 °C and then digested in 25 μl of digestion buffer containing 0.1 μg trypsin (Sequencing Grade Modified Trypsin, Promega, WI, USA). After digestion overnight at 37 °C, the reaction tube was centrifuged and the supernatant (S1) transferred into another tube. To the gel piece, 25 μl of 60% ACN, 0.3% TFA was added to stop the trypsin reaction and to shrink and wash the gel piece. After 10 min the supernatant (S2) was added to S1. The gel piece was washed and shrunk with 25 μl of 100% ACN. This supernatant was then added to S1 + S2 and dried in the Microconcentrator 5301 at 65 °C. For the MS analysis, the dried peptides were dissolved in 1 μl of 33% ACN, 0.1% TFA, and 0.25 μl of this solution was mixed with 0.5 μl of alpha-cyano-4-hydroxycinnamic acid (CHCA) solubilised in 50% ACN, 0.3% TFA on Parafilm; the resulting mixture was transferred to the template of the mass spectrometer and analysed.
The peptide masses were obtained using the following parameters: reflectron mode, 20 kV accelerating voltage, a low-mass gate of 500 Da and a mass range of 500 to 4000 Da. MS/MS spectra were obtained with and without collision gas. Database searches were performed with Mascot (http://www.matrixscience.com). Search parameters were 30 ppm peptide mass tolerance for peptide mass fingerprints and 0.3 Da for MS/MS spectra. MS/MS data were compared with the theoretical sequence data of the protein under investigation. Search criteria were: one and two missed cleavages allowed, and possible oxidation of methionine, N-terminal acetylation of the protein, pyro-Glu formation from N-terminal Gln, propionamide and sodium adducts. MS/MS spectra were manually evaluated.
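Matching observed peaks to candidate start peptides within the 30 ppm tolerance can be sketched as below. This illustrates only the tolerance arithmetic and is not how Mascot works internally; the helper names and the example peaks are hypothetical, each variable modification is applied on its own (no combinations), and the deltas are the standard monoisotopic values for acetylation, methionine oxidation and pyro-Glu formation from Gln.

```python
ACETYL = 42.010565     # N-terminal acetylation (+C2H2O)
OXIDATION = 15.994915  # methionine oxidation (+O)
NH3 = 17.026549        # loss on pyro-Glu formation from N-terminal Gln


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def with_variable_mods(label: str, seq: str, mh: float) -> dict[str, float]:
    """Expand one candidate [M+H]+ into singly modified variants."""
    variants = {label: mh, f"Ac-{label}": mh + ACETYL}
    if seq.startswith("Q"):
        variants[f"pyroGlu-{label}"] = mh - NH3
    for k in range(1, seq.count("M") + 1):
        variants[f"{label}+{k}Ox"] = mh + k * OXIDATION
    return variants


def match_peaks(peaks, candidates, tol_ppm=30.0):
    """Return (candidate, observed peak, ppm error) for every match within tolerance."""
    hits = []
    for name, theo in candidates.items():
        for obs in peaks:
            err = ppm_error(obs, theo)
            if abs(err) <= tol_ppm:
                hits.append((name, obs, round(err, 1)))
    return hits


# GlpX m2 start peptide ([M+H]+ 2366.108) against two hypothetical peaks.
cands = with_variable_mods("m2", "TAEGSGSSTAAVASHDPSHTRPSR", 2366.108)
print(match_peaks([2408.12, 1500.0], cands))
```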
Results
Confirming the M. tuberculosis GIpX start site
We have previously predicted that the annotated translational starts for two Mycobacterium tuberculosis proteins were incorrect (Movahedzadeh et al, 2004). This is despite the fact that the genome of M. tuberculosis was one of the earliest completed genome sequences in which each ORF was manually annotated (Cole et al, 1998), and which has since been rigorously reannotated (Camus et al, 2002).
Our predicted reassignment of the start of GlpX (Rv1099c) was based mainly on extensive comparative genomic analysis (Movahedzadeh et al, 2004). We observed that amino acid homology extended beyond the annotated start, and we proposed that translation actually started 34 residues earlier (Figure 1). In addition, some experimental N-terminal sequence data had been reported for the Corynebacterium glutamicum GlpX orthologue, Fbp (Rittmann et al, 2003), which made us more confident in our new prediction for the M. tuberculosis GlpX. The extension of GlpX led to a conflict with the adjacent, divergently transcribed gene Rv1100, and we proposed an alternative start for this as well. This apparent misannotation therefore completely altered the predicted shared promoter region of the two genes.
As GlpX had been identified proteomically in M. tuberculosis extracts (http://web.mpiib-berlin.mpg.de/cgi-bin/pdbs/2d-page/extern/index.cgi), we reasoned that existing MS data might confirm or refute our reannotation. We therefore predicted peptide masses that would distinguish the alternative predictions (Figure 1; Table 3). Analysis of the existing data was inconclusive, but newly generated MS data confirmed (a) the presence of a peptide (NLAMELVR; SEQ ID No: 12) that crosses the originally predicted start site and (b) the presence of a peptide that corresponds to our new prediction (Figure 2). No peak corresponding to the original start prediction was seen.
The start peptide we identified (TAEGSGSSTAAVASHDPSHTRPSR; SEQ ID No: 13) lacked an N-terminal formylmethionine, which is not an unusual finding (see Discussion). Interestingly, GlpX occurred in two spots on the 2D gel; in the more acidic one, this peptide showed a mass shift of 42 Da, suggesting acetylation (Figure 2). MS/MS spectra gave y ions up to y21 without this shift; as y21 covers residues 4-24 of the 24-residue peptide, the modification must lie within the first three residues, suggesting acetylation of the N-terminal threonine. We concluded that the start of the new prediction is correct, with post-translational modifications that cleave fMet and partially acetylate the N-terminus.
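The reasoning from the y-ion series can be made concrete: y ions contain the C-terminal part of the peptide and so never carry an N-terminal modification, whereas every b ion does. The sketch below (hypothetical function names; singly charged monoisotopic ions, no neutral losses) computes both series for the 24-residue start peptide.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
ACETYL = 42.010565


def b_ions(seq: str, n_term_delta: float = 0.0) -> list[float]:
    """b1..b(n-1): N-terminal fragments, each including any N-terminal modification."""
    ions, running = [], n_term_delta
    for aa in seq[:-1]:
        running += MONO[aa]
        ions.append(running + PROTON)
    return ions


def y_ions(seq: str) -> list[float]:
    """y1..y(n-1): C-terminal fragments, which never include the N-terminus."""
    ions, running = [], WATER
    for aa in reversed(seq[1:]):
        running += MONO[aa]
        ions.append(running + PROTON)
    return ions


peptide = "TAEGSGSSTAAVASHDPSHTRPSR"
print(peptide[-21:])  # residues covered by y21: the first three residues are absent
print(round(b_ions(peptide, ACETYL)[0] - b_ions(peptide)[0], 4))  # b ions shift by 42.0106
```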
Developing a higher throughput approach
The success with GlpX showed the potential of peptide mass mapping to verify protein starts experimentally on a large scale. We therefore calculated the masses of start peptides for all predicted proteins in the M. tuberculosis genome. We called the currently annotated start codon 'p0'; identification of these peptides would support the current annotation. We then identified potential alternative start codons upstream (m1, m2, etc., for 'minus') and downstream (p1, p2, etc., for 'plus') (see Figure 1). For each of these we recalculated the predicted masses of start peptides. Use of an upstream start codon would result in a longer protein, with the residue encoded by the p0 codon contained within another peptide, the identification of which would be evidence against p0 (as occurred with GlpX). We therefore also calculated the masses of such trans-p0 peptides.
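The trans-p0 peptide for an upstream-start variant can be located as sketched below, assuming the extended ('m') protein sequence and the 0-based index of the residue encoded by the p0 codon are known. The function names and the flanking sequence in the example are hypothetical; only NLAMELVR is taken from the GlpX data.

```python
def cleavage_sites(protein: str) -> list[int]:
    """0-based indices at which a new tryptic peptide begins (after K/R, not before P)."""
    return [i + 1 for i in range(len(protein) - 1)
            if protein[i] in "KR" and protein[i + 1] != "P"]


def trans_p0_peptide(m_protein: str, p0_index: int) -> str:
    """Tryptic peptide of an upstream-start protein that contains the p0 residue."""
    sites = cleavage_sites(m_protein)
    start = max((s for s in sites if s <= p0_index), default=0)
    end = min((s for s in sites if s > p0_index), default=len(m_protein))
    return m_protein[start:end]


# Hypothetical extended protein; the Met at index 9 stands in for the p0 residue.
print(trans_p0_peptide("MSKDGRNLAMELVRAGK", 9))
```

Detecting such a peptide is evidence against p0, because under a p0 start that residue would begin the protein rather than sit inside a tryptic peptide.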
To test our approach more rigorously, a shortlist of 15 proteins was drawn up. This was based on (a) the presence of existing MS data, (b) the absence of an obvious signal sequence, (c) the mass of the p0 peptide falling within the range of ideal sensitivity for MS, and (d) the peptide preferably being cleaved by trypsin at arginine rather than lysine (see Materials and methods).
Table 3 shows the data we obtained. We initially analysed existing data, which were adequate in some instances but not in others, where repeat MALDI-MS analyses were required. Of the 15 proteins, we had difficulties with two. In one case (Rv1017c; PrsA), the predicted N-terminal peptide coincided with a common contamination peak. The other (Rv2557) came from a spot containing more than one protein and was therefore unusable for the present investigation. The remaining 13 proteins were successfully analysed and are briefly presented below.
Peptide start analyses
A. Revised start sites
GlpX (Rv1099c): m2
We confirmed that GlpX is 34 residues longer than originally predicted (Figure 1).
The fMet is cleaved, and in addition to the unmodified protein species there is a less abundant spot with an acetylated N-terminal peptide (Figure 2).
RibH (Rv1416): m1
The ribH gene lies downstream of ribA2 in what appears to be an operon (Figure 3). There is a 14 bp gap between the stop codon of ribA2 and the predicted start valine of ribH. No peak corresponding to this start was seen; instead there was a clear peak corresponding to the peptide crossing the originally predicted start. The m1 start lies two amino acids upstream of the peptide identified and, as the second residue is a lysine, should be cleaved off in the procedure used. We identified both the predicted peptide, cleaved before the amino acid sequence GGA in the N-terminal region of ribH in Figure 3, and a peptide beginning with the amino acid sequence MKGGA (SEQ ID No: 23) where trypsin had not cleaved. The methionine was encoded by a GTG, which would normally code for valine, further confirming that this was the translational start. This sequence was confirmed by MS/MS. The GGA... peptide was confirmed by 8 y ions (1, 2, 3, 5, 6, 9, 10, 12) and 3 b ions (4, 5, 8), and the MKGGA... peptide by 8 y ions (1, 3, 4, 5, 6, 9, 10, 12). The expected ions y5 and y10, both with a cleavage after D, and the mass loss of 64 further confirm the sequence. This new ribH start would overlap the ribA2 stop codon (GTGA; Figure 3) at position -4. Such an arrangement is known to be the most common for prokaryotic genes thought to be part of the same operon (Salgado et al, 2000), as is the case with ribA2 and ribH.
B. Confirmed start sites
As shown in Table 3, we confirmed the start sites for 11 proteins. In most cases a peptide lacking formylmethionine was detected, although in one case (Rv0738) deformylated methionine was present. In two cases (ArgD and PrcA), N-terminally acetylated peptides were found in addition to the non-acetylated form.
Discussion
We have described a simple strategy for the experimental identification of translation start sites. The correct identification of these sites is important when assessing the length and function of the encoded proteins, as well as key regulatory features such as the presence or absence of upstream regulatory sequences, or the identification of genes likely to be co-expressed as part of an operon.
We have shown that proteomic methods can be used to identify protein TSSs. Of 15 proteins tested, we confirmed predicted starts for 11 and reassigned the starts of two. In one case (glpX; Rv1099c), the reassigned start affects the adjacent gene and completely alters the predicted promoter region for both genes. In the second case (ribH; Rv1416), the alteration is minor but makes biological sense, and indicates that a predicted ribosome-binding site is probably non-functional. While the correction of misassigned start sites underlines the need to carry out this work, confirmation of current predictions is also an important contribution. As proteins are mainly predicted through homology, it is critical that at least a small number of representatives have experimentally confirmed TSSs, in order to anchor the remaining predictions within an experimental context.
Our approach is simple in concept and has the great advantage that most of the data are routinely generated during standard proteomic procedures. TSS identification has previously been a relatively neglected area; in the past, N-terminal sequencing was the main method for protein identification, but it has been largely supplanted by more sensitive MALDI-MS technology. There have been large-scale N-terminal sequencing projects; for example, Link et al (1997) studied the N-termini of 295 Escherichia coli proteins. Of these, 72 failed because of signal peptide cleavage or N-terminal blockage. Of the 223 N-terminal starts characterised, 10 (4.5%) were reassigned. However, they used Edman degradation, and the need for different chemistries makes this less practical as a routine method. We suggest that application of the method described here in proteomics laboratories would dramatically increase the number of experimentally determined protein TSSs with minimal extra resource. Pre-existing MS datasets could also be analysed retrospectively, although we preferred to carry out confirmatory analyses. This emphasises the importance of storing MS data in proteome databases.
More recent large-scale projects have used proteomics to help genome annotation (Jaffe et al, 2004; Lipton et al, 2002). In a reanalysis of the Mycoplasma pneumoniae genome, over 81% of the predicted ORFs were identified proteomically, and 16 new ORFs were detected. The algorithm used extended the predicted N-termini of 19 proteins, on the basis that peptides were identified that extended beyond the annotated start; the translational start was then reassigned by looking upstream for a start codon. However, the start peptides themselves were not identified.
We looked for proteins starting at ATG, GTG or TTG codons. ATG is the classical initiation codon, but in GC-rich genomes such as M. tuberculosis, GTG is also used. Our method conclusively confirms starts where a methionine residue is seen at a GTG or TTG codon, as this is the only situation in which such a substitution would take place. Thus Met residues were detected in both RibH and Rv0738, even though they start with GTG (valine) codons.
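The codon logic behind this argument can be sketched in Python: an initiator GTG or TTG is decoded as Met, while the same codon inside a gene gives Val or Leu. The sketch uses the standard codon table (which bacteria share for sense codons) and hypothetical function names; it is an illustration, not part of the authors' pipeline.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
INITIATORS = {"ATG", "GTG", "TTG"}


def translate_cds(cds: str) -> str:
    """Translate a CDS up to the first stop, reading an initiator GTG/TTG as Met."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    if protein and cds[:3] in INITIATORS:
        protein[0] = "M"
    return "".join(protein)


def start_confirmed(observed_first_residue: str, start_codon: str) -> bool:
    """A Met observed at a GTG/TTG codon can only arise from initiation there."""
    return observed_first_residue == "M" and start_codon in {"GTG", "TTG"}


print(translate_cds("GTGAAAGGTGGCGCTTAA"))  # hypothetical CDS
```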
There are some minor limitations to this method of identifying TSSs, since proteins that are processed (for example by removal of signal sequences) would not be resolved. However, as will be readily appreciated by the person of skill in the art, the method can be adapted to identify such processed N-termini, since signal sequence cleavage sites can be predicted.
We selected a subset of proteins that would maximise the chance of success in identifying TSSs, so a comparable success rate (84%) is less likely with a random set of proteins. One factor we used was to choose proteins whose predicted N-terminal peptide was resolvable under our experimental conditions and with our mass spectrometer. If a protein has an unsuitable N-terminal peptide, alternative proteases could be used, or an orthologue in another species may be more amenable. Also, the presence or absence of a peptide that crosses the predicted start might be detectable even if the start peptide itself is not. We encountered a potential difficulty with Rv2908c, where a predicted start (p2) corresponded to the start of a tryptic peptide (because it lies immediately after an arginine or lysine residue, so the same peptide could also arise by tryptic cleavage of a longer protein); in such cases it may be preferable to obtain additional evidence before unambiguously assigning this as a start. Finally, our approach is directly applicable to all prokaryotic genomes, but will need adapting for eukaryotes, where exons need to be considered.
Both of the misannotated proteins we identified were longer than predicted. Although our sample is small, we might expect this to be the case with a carefully annotated genome, as alignments to other homologues will tend to give a conservative indication of length. With genomes for which gene start assignment uses other assumptions, the balance of 'm' and 'p' reassignments might change.
One benefit of proteomic analyses is the ability to identify post-translational modifications. A common modification is removal of fMet, and we observed this for 11 of the 13 proteins. In the two cases where there was no cleavage, the residue had been deformylated. In some cases, methionine residues (both N-terminal and internal) had been oxidised (Schmidt et al, 2006); we cannot tell whether this occurred intracellularly or during the experimental procedure.
Most interestingly, and quite unexpectedly, we found that N-terminal peptides from three proteins (GlpX, ArgD and PrcA) were at least partially acetylated. MS/MS sequencing showed the acetyl group to be on the N-terminal residue. Although N-terminal acetylation is the norm for eukaryotic proteins, it is reported to be rare in prokaryotes (Polevoda & Sherman, 2002). N-terminal acetylation is amino-acid dependent (Persson et al, 1985), and our observations of acetylated serine or threonine residues are in line with previous work. N-terminal acetylation may alter the function of a protein. The only M. tuberculosis protein reported to be N-terminally acetylated is the antigen ESAT-6, which normally interacts with the protein CFP-10 but fails to do so when acetylated (Okkels et al, 2004).
We have shown that 3 out of 13 proteins (23%) have an acetylated form. This raises the question of whether the common assumption in the art, that only very few prokaryotic proteins are acetylated, is actually true. Much of the existing dogma came from analysing E. coli proteins (Polevoda & Sherman, 2002), and our data suggest that the situation may be very different in other bacteria, in particular the mycobacteria.
Table 3: MS analyses of N-terminal peptides from M. tuberculosis proteins
[Table 3 image not reproduced]
a All from M. tuberculosis unless otherwise mentioned.
b CP: cellular proteins; CSN: culture supernatant.
c Curved brackets indicate a missing residue, presumed to be a cleaved fMet, and the amino acid predicted from the DNA sequence.
d Identity of start peptide. 'p0' indicates that the annotation in the published genome sequence is correct; 'm1'/'m2' indicate that the first or second upstream start codon is used. 'Ac' indicates that the peptide is N-terminally acetylated.
e Square brackets indicate the predicted amino acid encoded by the codon present where an uncleaved methionine was identified (M = ATG, V = GTG, L = TTG).
f Non-acetylated peptide also found, mass 2366.1.
g Spot from M. bovis BCG Chicago.
h Non-acetylated peptide also found, mass 1113.52.
i Non-acetylated peptide also found, mass 1588.74.
j Spot from M. tuberculosis Erdman.
The first sequence for each gene corresponds to the sequence of the detected N-terminal peptide, and the second sequence corresponds to the N-terminal peptide sequence predicted from the nucleotide sequence.
Example 2A: Identification of M. tuberculosis protein N-terminal acetylases
Protein N-terminal acetylation in Saccharomyces cerevisiae is carried out by any of three protein N-terminal acetyl transferases: NatA, NatB or NatC (Polevoda & Sherman 2003a, b). These enzymes have different specificities: NatA tends to acetylate proteins starting with Ser, Ala, Gly or Thr; NatB acetylates proteins starting with Met-Glu, Met-Asp, Met-Asn or Met-Met; and NatC acts on proteins beginning Met-Ile, Met-Leu, Met-Trp or Met-Phe. The yeast enzymes each have a catalytic subunit and one or two auxiliary subunits. The three catalytic subunits (176-195 residues long, with predicted molecular weights of 19.7-22.9 kDa) are relatively dissimilar but clearly related, showing homology along the length of the proteins (Figure 7).
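The yeast specificity rules above can be expressed as a simple lookup. The sketch below applies only the yeast rules summarised here (hypothetical function name); whether any M. tuberculosis enzyme follows them is not established. The N-terminus is taken as observed, i.e. after any initiator Met removal.

```python
# Yeast NAT substrate classes as summarised in the text.
NATA_FIRST = {"S", "A", "G", "T"}
NATB_PAIRS = {"ME", "MD", "MN", "MM"}
NATC_PAIRS = {"MI", "ML", "MW", "MF"}


def yeast_nat_class(n_terminus: str):
    """Return the yeast NAT class matching a mature N-terminus, or None."""
    if n_terminus[:2] in NATB_PAIRS:
        return "NatB"
    if n_terminus[:2] in NATC_PAIRS:
        return "NatC"
    if n_terminus[:1] in NATA_FIRST:
        return "NatA"
    return None


print(yeast_nat_class("TAEGSGSSTAAVASHDPSHTRPSR"))  # GlpX start peptide
```

Applied to the GlpX start peptide, which begins with threonine after fMet removal, the lookup returns the NatA-type class.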
We searched the M. tuberculosis genome sequence (http://genolist.pasteur.fr/TubercuList/) using BLAST with each of the catalytic subunits of the three yeast NATs, and identified the following matches (E value <0):
Sc_NatA_cat: match to M. tuberculosis RimI (E 3e-05)
Sc_NatB_cat: no matches.
Sc_NatC_cat: matches to M. tuberculosis RimI (E 9e-05), Rv0133 (E 0.008), and Rv3225c (E 0.0932) (see Figures 8-10, respectively).
RimI
RimI is a 158-residue protein whose sequence is shown in Figure 8 (SEQ ID No: 20). The rimI gene encodes a probable ribosomal-protein-alanine acetyltransferase that acetylates ribosomal protein S18.
Bacterial genes are often organised in cotranscribed units (operons) whose members may have related functions. rimI appears to lie within an operon of six genes, which includes an essential peptidoglycan biosynthesis gene, alr, encoding alanine racemase. The other gene with a predicted function is gcp, encoding a putative O-sialoglycoprotein endopeptidase.
Rv0133
We now consider the Rv0133 protein to be an important M. tuberculosis protein N-terminal acetyl transferase. Rv0133 is a 201-residue protein whose sequence is shown in Figure 9 (SEQ ID No: 21). Rv0133 codes for a 22.8 kDa protein that shows low but significant homology with Sc_NatC_cat. This type of scattered homology results in poor matches using BLAST (which looks for clustered local alignments), but gives confidence that the proteins are genuinely related. A Clustal analysis with all the yeast proteins and Rv0133 suggests that NatC_cat is more closely related to Rv0133 than the other yeast proteins are. The Rv0133 gene is not obviously in an operon, so adjacent genes cannot give clear indications of function.
The TubercuList annotation of Rv0133 indicates that it is "a probable acetyl transferase (EC 2.3.1.-), highly similar to others e.g. PUAC_STRLP|P13249 puromycin N-acetyltransferase". Puromycin is an aminonucleoside antibiotic, and a puromycin N-acetyltransferase is therefore not a protein N-terminal acetyl transferase. Indeed, neither Rv0133 nor the PUAC_STRLP|P13249 puromycin N-acetyltransferase has been shown, suggested or predicted to acetylate proteins.
Rv3225c
Rv3225c is a 474-residue protein whose sequence is shown in Figure 10 (SEQ ID No: 22). Rv3225c encodes a larger protein (51.2 kDa) that shows a relatively high level of homology to Sc_NatC_cat in the N-terminal half of the protein. Rv3225c does not appear to lie in an operon.
Searching the TubercuList database (http://genolist.pasteur.fr/TubercuList/) for annotations referring to acetyltransferases identifies a number of other proteins, but none appears as likely a candidate as the three listed above. We therefore consider that RimI, Rv0133 and Rv3225c, and especially Rv0133, act as protein N-terminal acetyl transferases in M. tuberculosis. Since we now consider N-terminal acetylation of proteins to be important for their function, at least in mycobacteria, we would predict that loss of the acetylase would affect growth of the bacterium; an inhibitor of such proteins would therefore be expected to be clinically useful.
Example 2B: Identification of M. tuberculosis protein N-terminal acetylases
Based on the sequences of the M. tuberculosis proteins RimI, Rv0133 and Rv3225c, and their corresponding polynucleotide sequences (http://genolist.pasteur.fr/TubercuList/), the three genes are cloned into expression vectors using techniques well known in the art. Sambrook et al (2001) "Molecular Cloning, a Laboratory Manual", 3rd edition, Sambrook et al (eds), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA, describes general bacterial cloning techniques that would be used for this purpose. The three proteins are each expressed in E. coli and purified by affinity chromatography (as described by Sambrook et al, 2001).
The following in vitro N-acetylation assay (adapted from Sugiura et al 2003 and Arnesen et al 2005) is used to determine their protein NAT activity. The purified proteins are incubated with [3H]acetyl-CoA and an acetyl group acceptor such as adrenocorticotropic hormone peptide, or the M. tuberculosis GlpX protein or its N-terminal peptide. The acetylated peptide/protein is separated from the radioactive substrate by cation-exchange chromatography, affinity chromatography or size-exclusion chromatography, and the radioactivity is counted. The presence and level of radioactivity indicate the level of N-acetyl transferase activity. Alternatively, the assay can be carried out using unlabelled acetyl-CoA, and the location of the acetyl group can be confirmed by mass spectrometry.
References
Aebersold, R. & Mann, M. (2003). Mass spectrometry-based proteomics. Nature 422, 198-207.
Arnesen et al (2005). Identification and characterization of the human ARD1-NATH protein acetyltransferase complex. Biochem J 386, 433-443.
Bendtsen, J. D., Nielsen, H., von Heijne, G. & Brunak, S. (2004). Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340, 783-795.
Bhakta, S., Besra, G. S., Upton, A. M. & other authors (2004). Arylamine N-acetyltransferase is required for synthesis of mycolic acids and complex lipids in Mycobacterium bovis BCG and represents a novel drug target. J Exp Med 199, 1191-1199.
Brent, M. R. (2005). Genome annotation past, present, and future: How to define an ORF at each locus. Genome Res 15, 1777-1786.
Brooke, E. W., Davies, S. G., Mulvaney, A. W., Pompeo, F., Sim, E. & Vickers, R. J. (2003). An approach to identifying novel substrates of bacterial arylamine N-acetyltransferases. Bioorg Med Chem 11, 1227-1234.
Cagney, G., Amiri, S., Premawaradena, T., Lindo, M. & Emili, A. (2003). In silico proteome analysis to facilitate proteomics experiments using mass spectrometry. Proteome Sci 1, 5.
Camus, J. C, Pryor, M. J., Medigue, C. & Cole, S. T. (2002). Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology 148, 2967-2973.
Cole, S. T., Brosch, R., Parkhill, J. & other authors (1998). Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537-544.
Darwin, K. H., Ehrt, S., Gutierrez-Ramos, J. C., Weich, N. & Nathan, C. F. (2003). The proteasome of Mycobacterium tuberculosis is required for resistance to nitric oxide. Science 302, 1963-1966.
Doherty, N. S., Littman, B. H., Reilly, K., Swindell, A. C., Buss, J. M. & Anderson, N. L. (1998). Analysis of changes in acute-phase plasma proteins in an acute inflammatory response and in rheumatoid arthritis using two-dimensional gel electrophoresis. Electrophoresis 19, 355-363.
Domon, B. & Aebersold, R. (2006). Mass spectrometry and protein analysis. Science 312, 212-217.
Edman, P. J. (1950). Method for determination of the amino acid sequence in peptides. Acta Chem Scand 4, 283-293.
Edwards, M. T., Rison, S. C., Stoker, N. G. & Wernisch, L. (2005). A universally applicable method of operon map prediction on minimally annotated genomes using conserved genomic context. Nucleic Acids Res 33, 3253-3262.
Errey, J. C. & Blanchard, J. S. (2005). Functional characterization of a novel ArgA from Mycobacterium tuberculosis. J Bacteriol 187, 3039-3044.
Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F. & Whitehouse, C. M. (1989). Electrospray ionization for mass spectrometry of large biomolecules. Science 246, 64-71.
Flynn, J. L. (2006). Lessons from experimental Mycobacterium tuberculosis infections. Microbes Infect 8, 1179-1188.
Hein, D. W., Doll, M. A., Fretland, A. J., Leff, M. A., Webb, S. J., Xiao, G. H., Devanaboyina, U. S., Nangju, N. A. & Feng, Y. (2000). Molecular genetics and epidemiology of the NAT1 and NAT2 acetylation polymorphisms. Cancer Epidemiol Biomarkers Prev 9, 29-42.
Jaffe, J. D., Berg, H. C. & Church, G. M. (2004). Proteogenomic mapping as a complementary method to perform genome annotation. Proteomics 4, 59-77.
Jungblut, P. R., Bumann, D., Haas, G., Zimny-Arndt, U., Holland, P., Lamer, S., Siejak, F., Aebischer, A. & Meyer, T. F. (2000). Comparative proteome analysis of Helicobacter pylori. Mol Microbiol 36, 710-725.
Jungblut, P. R., Muller, E. C., Mattow, J. & Kaufmann, S. H. (2001). Proteomics reveals open reading frames in Mycobacterium tuberculosis H37Rv not predicted by genomics. Infect Immun 69, 5905-5907.
Jungblut, P. R., Schaible, U. E., Mollenkopf, H. J. & other authors (1999). Comparative proteome analysis of Mycobacterium tuberculosis and Mycobacterium bovis BCG strains: towards functional genomics of microbial pathogens. Mol Microbiol 33, 1103-1117.
Karas, M. & Hillenkamp, F. (1988). Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem 60, 2299-2301.
Krause, E., Wenschuh, H. & Jungblut, P. R. (1999). The dominance of arginine-containing peptides in MALDI-derived tryptic mass fingerprints of proteins. Anal Chem 71, 4160-4165.
Lamer, S. & Jungblut, P. R. (2001). Matrix-assisted laser desorption-ionization mass spectrometry peptide mass fingerprinting for proteome analysis: identification efficiency after on-blot or in-gel digestion with and without desalting procedures. J Chromatogr B Biomed Sci Appl 752, 311-322.
Link, A. J., Robison, K. & Church, G. M. (1997). Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12. Electrophoresis 18, 1259-1313.
Lipton, M. S., Pasa-Tolic, L., Anderson, G. A. & other authors (2002). Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc Natl Acad Sci USA 99, 11049-11054.
Medzihradszky, K. F., Campbell, J. M., Baldwin, M. A., Falick, A. M., Juhasz, P., Vestal, M. L. & Burlingame, A. L. (2000). The characteristics of peptide collision-induced dissociation using a high-performance MALDI-TOF/TOF tandem mass spectrometer. Anal Chem 72, 552-558.
Movahedzadeh, F. & other authors (2004). The Mycobacterium tuberculosis ino1 gene is essential for growth and virulence. Mol Microbiol 51, 1003-1014.
Movahedzadeh, F., Rison, S. C., Wheeler, P. R., Kendall, S. L., Larson, T. J. & Stoker, N. G. (2004). The Mycobacterium tuberculosis Rv1099c gene encodes a GlpX-like class II fructose 1,6-bisphosphatase. Microbiology 150, 3499-3505.
Okkels, L. M., Muller, E. C., Schmid, M., Rosenkrands, I., Kaufmann, S. H., Andersen, P. & Jungblut, P. R. (2004). CFP10 discriminates between nonacetylated and acetylated ESAT-6 of Mycobacterium tuberculosis by differential interaction. Proteomics 4, 2954-2960.
Parish, T. & Stoker, N. G. (2000). Use of a flexible cassette method to generate a double unmarked Mycobacterium tuberculosis tlyA plcABC mutant by gene replacement. Microbiology 146 (Pt 8), 1969-1975.
Parish, T., Smith, D. A., Kendall, S. L., Casali, N., Bancroft, G. J. & Stoker, N. G. (2003). Deletion of two-component regulatory systems increases virulence of Mycobacterium tuberculosis. Infect Immun 71, 1134-1140.
Payton, M., Gifford, C., Schartau, P., Hagemeier, C., Mushtaq, A., Lucas, S., Pinter, K. & Sim, E. (2001). Evidence towards the role of arylamine N-acetyltransferase in Mycobacterium smegmatis and development of a specific antiserum against the homologous enzyme of Mycobacterium tuberculosis. Microbiology 147, 3295-3302.
Persson, B., Flinta, C., von Heijne, G. & Jornvall, H. (1985). Structures of N-terminally acetylated proteins. Eur J Biochem 152, 523-527.
Polevoda, B. & Sherman, F. (2002). The diversity of acetylated proteins. Genome Biol 3, reviews0006.
Polevoda, B. & Sherman, F. (2003a). Composition and function of the eukaryotic N-terminal acetyltransferase subunits. Biochem Biophys Res Commun 308, 1-11.
Polevoda, B. & Sherman, F. (2003b). N-terminal acetyltransferases and sequence requirements for N-terminal acetylation of eukaryotic proteins. J Mol Biol 325, 595-622.
Rittmann, D., Schaffer, S., Wendisch, V. F. & Sahm, H. (2003). Fructose-1,6-bisphosphatase from Corynebacterium glutamicum: expression and deletion of the fbp gene and biochemical characterization of the enzyme. Arch Microbiol 180, 285-292.
Salgado, H., Moreno-Hagelsieb, G., Smith, T. F. & Collado-Vides, J. (2000). Operons in Escherichia coli: genomic analyses and predictions. Proc Natl Acad Sci USA 97, 6652-6657.
Schmidt, F., Krah, A., Schmid, M., Jungblut, P. R. & Thiede, B. (2006). Distinctive mass losses of tryptic peptides generated by matrix-assisted laser desorption/ionization time-of-flight/time-of-flight. Rapid Commun Mass Spectrom 20, 933-936.
Sugiura, N., Adams, S. M. & Corriveau, R. A. (2003). An evolutionarily conserved N-terminal acetyltransferase complex associated with neuronal development. J Biol Chem 278, 40113-40120.
Vetting, M. W., de Carvalho, L. P. S., Yu, M., Hegde, S. S., Magnet, S., Roderick, S. L. & Blanchard, J. S. (2005). Structure and functions of the GNAT superfamily of acetyltransferases. Arch Biochem Biophys 433, 212-226.
Waller, J.-P. (1963). The NH2-terminal residues of the proteins from cell-free extracts of E. coli. J Mol Biol 7, 483-496.
Yates, J. R., 3rd, Carmack, E., Hays, L., Link, A. J. & Eng, J. K. (1999). Automated protein identification using microcolumn liquid chromatography-tandem mass spectrometry. Methods Mol Biol 112, 553-569.
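The radiometric N-acetylation assay described in the description reports NAT activity as radioactivity in the separated acetylated fraction. As a minimal sketch under stated assumptions (not a procedure given in the patent), those counts can be converted into a specific activity. The sketch assumes a no-enzyme control for background, a known specific activity of the [3H]acetyl-CoA stock in DPM per pmol, and activity expressed per minute per microgram of purified protein; the function names and units are hypothetical.

```python
"""Hypothetical sketch: converting counts from a [3H]acetyl-CoA N-acetylation
assay into a specific activity. Names, units and controls are assumptions."""


def acetyl_transferred_pmol(sample_dpm: float, no_enzyme_dpm: float,
                            dpm_per_pmol: float) -> float:
    """Background-corrected pmol of [3H]acetyl groups bound to the acceptor.

    sample_dpm:     counts (DPM) in the separated acetylated fraction
    no_enzyme_dpm:  counts from an assumed no-enzyme control (background)
    dpm_per_pmol:   assumed specific activity of the [3H]acetyl-CoA stock
    """
    if dpm_per_pmol <= 0:
        raise ValueError("specific activity must be positive")
    # Clamp at zero so a sample below background is not reported as negative transfer.
    net_dpm = max(sample_dpm - no_enzyme_dpm, 0.0)
    return net_dpm / dpm_per_pmol


def specific_activity(sample_dpm: float, no_enzyme_dpm: float, dpm_per_pmol: float,
                      minutes: float, enzyme_ug: float) -> float:
    """pmol acetyl transferred per minute per microgram of purified protein."""
    if minutes <= 0 or enzyme_ug <= 0:
        raise ValueError("incubation time and enzyme amount must be positive")
    pmol = acetyl_transferred_pmol(sample_dpm, no_enzyme_dpm, dpm_per_pmol)
    return pmol / (minutes * enzyme_ug)


if __name__ == "__main__":
    # Illustrative numbers only: 12,500 DPM sample, 500 DPM background,
    # 2,000 DPM/pmol stock, 30 min incubation, 2 ug of purified protein.
    print(specific_activity(12500, 500, 2000, 30, 2))  # 0.1
```

Normalising to a no-enzyme control separates enzymatic transfer from non-enzymatic acetylation and carry-over of free label; a real assay would also need the acceptor to be in excess and the reaction to be measured in its linear range.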

Claims

1. A method of identifying an N-terminally acetylated protein in a bacterium, the method comprising: providing details of at least one putative translation start site (TSS) for at least one protein expressed in the bacterium; confirming the actual TSS of the at least one protein using mass spectrometry (MS); and determining whether the at least one protein is N-terminally acetylated using MS.
2. A method according to Claim 1 wherein confirming the actual TSS of the at least one protein comprises the use of matrix-assisted laser desorption/ionisation (MALDI) MS.
3. A method according to Claim 1 wherein determining whether the at least one protein is N-terminally acetylated comprises the use of tandem mass spectrometry (MS/MS).
4. A method according to Claim 1 wherein confirming the actual TSS of the at least one protein comprises the use of MALDI MS and wherein determining whether the at least one protein is N-terminally acetylated comprises the use of MS/MS.
5. Use of MS in the identification of an N-terminally acetylated protein expressed in a bacterium.
6. Use according to Claim 5 wherein the MS is selected from MALDI-MS and MS/MS.
7. A method or a use according to any of the preceding claims wherein the protein is not a ribosomal protein.
8. A method or a use according to any of the preceding claims wherein the protein does not contain an N-terminal signal sequence.
9. A method of identifying an N-terminally acetylated protein in a bacterium, the method comprising: providing a mutant strain of the bacterium comprising at least one protein N-acetyltransferase (pNAT) in mutant form; providing a wild-type strain of the bacterium comprising the at least one pNAT in wild-type form; and identifying a protein that is differentially N-terminally acetylated between the mutant and wild-type bacterial strains.
10. A method according to Claim 9 wherein identifying a protein that is differentially N-terminally acetylated between the mutant and wild-type bacterial strains comprises: extracting proteins from the mutant and wild-type bacterial strains and separating the proteins by 2D-PAGE; determining a difference between the 2D-PAGE patterns obtained from the mutant and wild-type bacterial strains; and identifying a protein which corresponds to the difference in 2D-PAGE patterns.
11. A method according to Claim 9 or 10 further comprising determining the N-terminal acetylation status of the identified protein from one or both of the mutant and wild-type bacterial strains.
12. A method according to Claim 9 or 10 wherein identifying the protein and/or determining the N-terminal acetylation status of the protein comprises the use of MS.
13. A method according to Claim 12 wherein the MS is selected from MALDI-MS and MS/MS.
14. A method according to any of Claims 9 to 13 wherein the at least one pNAT in mutant form has been knocked-out of the mutant bacterial strain.
15. A method according to any of Claims 9 to 14 wherein the mutant bacterial strain contains at least two pNATs in mutant form.
16. A method or a use according to any of the preceding claims wherein the bacterium is a Gram-positive bacterium.
17. A method or a use according to Claim 16 wherein the Gram-positive bacterium is a firmicute.
18. A method or a use according to Claim 17 wherein the firmicute is an actinomycete.
19. A method or a use according to Claim 18 wherein the actinomycete is selected from a Mycobacterium, a Corynebacterium or a Nocardia.
20. A method or a use according to Claim 19 wherein the Mycobacterium is selected from M. tuberculosis, M. avium, M. leprae, M. bovis, M. smegmatis, M. paratuberculosis and M. marinum.
21. A method or a use according to Claim 20 wherein the Mycobacterium is M. tuberculosis.
22. A method according to any of Claims 9 to 15, wherein the bacterium is M. tuberculosis and the at least one pNAT is selected from RimI, Rv0133 and Rv3225c.
23. A method of identifying a drug discovery target in a pathogenic bacterium, the method comprising: identifying an N-terminally acetylated protein expressed in a pathogenic bacterium according to the method or the use of any of the preceding claims.
24. A method according to Claim 23 further comprising determining at least one property of the protein or of the bacterium that is affected by N-terminal acetylation of the protein and which is relevant to the pathogenicity of the pathogenic bacterium.
25. A method according to Claim 24 wherein determining the at least one property of the protein is performed in vitro.
26. A method according to Claim 25 wherein the affected property of the protein is selected from enzymatic function, protein-protein binding, dimerisation and protein stability.
27. A method according to Claim 24 wherein determining the at least one property of the bacterium is performed in culture.
28. A method according to Claim 27 wherein the affected property of the bacterium is selected from survival, growth rate and drug resistance.
29. A method according to Claim 24 wherein determining the at least one property of the bacterium is performed in vivo.
30. A method according to Claim 29 wherein the affected property is selected from infectivity, growth rate, induction of pathology and drug resistance.
31. A method according to any of Claims 23 to 30 wherein the pathogenic bacterium is M. tuberculosis and the protein is a drug discovery target for tuberculosis.
32. A method according to any of Claims 23 to 30 wherein the bacterium is M. leprae and the protein is a drug discovery target for leprosy; or the bacterium is M. avium and the protein is a drug discovery target for M. avium complex; or the bacterium is Corynebacterium diphtheriae and the protein is a drug discovery target for diphtheria.
33. A method according to any of Claims 23 to 30 wherein the bacterium is Nocardia asteroides and the protein is a drug discovery target for nocardiosis; or the bacterium is an Actinomyces spp and the protein is a drug discovery target for actinomycosis; or the bacterium is an Arcanobacterium spp and the protein is a drug discovery target for Arcanobacterium spp infection.
34. A method according to any of Claims 23 to 30 wherein the bacterium is Listeria monocytogenes and the protein is a drug discovery target for listeriosis; or the bacterium is a streptococcus, such as Strep pneumoniae or Strep pyogenes, and the protein is a drug discovery target for streptococcal infections; or the bacterium is a staphylococcus, such as Staph aureus, and the protein is a drug discovery target for staphylococcal infections including MRSA; or the bacterium is an enterococcus and the protein is a drug discovery target for enterococcal infections; or the bacterium is Bacillus anthracis and the protein is a drug discovery target for anthrax; or the bacterium is Bacillus cereus and the protein is a drug discovery target for food poisoning; or the bacterium is a Clostridium, such as C. perfringens, C. difficile, C. tetani or C. botulinum, and the protein is a drug discovery target for Clostridial infections and diseases caused by these Clostridia, including tetanus and botulism.
35. An isolated, N-terminally acetylated M. tuberculosis protein wherein the protein is selected from GlpX, PrcA and ArgD as defined in Figures 4 to 6, respectively.
36. An isolated, N-terminally acetylated protein having at least 80% sequence identity with an M. tuberculosis protein selected from GlpX, PrcA and ArgD as defined in Figures 4 to 6, respectively.
37. An isolated, N-terminally acetylated protein according to Claim 36 having at least 95% sequence identity with an M. tuberculosis protein selected from GlpX, PrcA and ArgD as defined in Figures 4 to 6, respectively.
38. A method of determining the effect of N-terminal acetylation on the function or activity of a protein as defined in any of Claims 35 to 37, the method comprising: providing N-terminally acetylated and non-N-terminally acetylated forms of the protein, and determining at least one property of the protein that is affected by N-terminal acetylation of the protein.
39. A method according to Claim 38 wherein providing N-terminally acetylated and non-N-terminally acetylated forms of the protein comprises providing a cell which contains the N-terminally acetylated form of the protein and a cell which contains the non-N-terminally acetylated form of the protein, and wherein determining at least one property of the protein that is affected by N-terminal acetylation of the protein comprises determining at least one property of the cell that is affected by N-terminal acetylation of the protein.
40. A method according to Claim 39 wherein the cell is a bacterial cell.
41. A method according to Claim 40 wherein the bacterial cell is M. tuberculosis.
42. A method according to any of Claims 38 to 41 wherein the at least one property of the protein is one which is relevant to the pathogenicity of M. tuberculosis or a related Mycobacterium.
43. A method of identifying a protein N-acetyltransferase (pNAT) in a bacterium, the method comprising: providing details of at least one putative pNAT; providing a mutant strain of the bacterium in which the putative pNAT gene has been knocked-out; and determining the amount, rate and/or level of N-terminal protein acetylation in the mutant bacterial strain, wherein a reduction in the amount, rate and/or level of N-terminal protein acetylation in the mutant bacterial strain in comparison to the amount, rate and/or level of N-terminal protein acetylation in a wild-type strain of that bacterium indicates that the putative pNAT is an actual pNAT.
44. A method according to Claim 43 wherein providing details of at least one putative pNAT comprises identifying a putative pNAT by virtue of its homology to a microbial pNAT.
45. A method according to Claim 44 wherein the microbial pNAT is selected from Saccharomyces cerevisiae NatA, NatB and NatC.
46. A method according to Claim 44 wherein the microbial pNAT is selected from M. tuberculosis RimI, Rv0133 and Rv3225c.
47. A method according to any of Claims 43 to 46 wherein the bacterium is as defined in any of Claims 16 to 21.
48. A method according to any of Claims 43 to 47 wherein the bacterium is a pathogenic bacterium and the identified pNAT is a drug discovery target for a disease or condition caused by the pathogenic bacterium.
49. A method according to Claim 48 wherein the pathogenic bacterium is as defined in any of Claims 31 to 34.
50. A method of identifying a pNAT from a desired bacterium, the method comprising: providing a library of polynucleotides from the desired bacterium, which library comprises at least one polynucleotide that encodes a putative pNAT; providing microbial cells of a strain in which a pNAT gene is naturally absent or has been knocked-out; transforming the library of polynucleotides into the microbial cells; incubating the transformed microbial cells under conditions that allow expression of the polypeptides encoded by the polynucleotides; and identifying one or more microbial cells in which the levels of protein N-terminal acetylation have been increased under the conditions in the previous step, wherein increased levels of protein N-terminal acetylation in the identified microbial cell indicates that the transformed polynucleotide encodes a pNAT from the desired bacterium.
51. A method according to Claim 50 wherein the library of polynucleotides is a cDNA library.
52. A method according to Claim 50 or 51 wherein the library of polynucleotides is derived from a bacterium as defined in any of Claims 16 to 21 or 31 to 34.
53. A method according to any of Claims 50 to 52 wherein the microbial cell is Saccharomyces cerevisiae and the pNAT gene which has been knocked-out is NatA, NatB or NatC.
54. A method according to any of Claims 50 to 52 wherein the microbial cell is M. tuberculosis and the pNAT gene which has been knocked-out is RimI, Rv0133 or Rv3225c.
55. A method according to any of Claims 50 to 52 wherein the microbial cell in which a pNAT is naturally absent is E. coli.
56. A method of identifying a pNAT from a desired bacterium, the method comprising: identifying a protein from the desired bacterium which requires N-terminal acetylation for a specified activity; randomly mutagenising genes in the desired bacterium; selecting or screening for bacteria that have lost the specified activity; and identifying at least one gene that has been mutagenised in the bacteria which have lost the specified activity, wherein a loss of the specified protein activity when a gene is mutagenised indicates that the gene encodes a pNAT from the desired bacterium.
57. A method according to Claim 56 wherein the desired bacterium is M. tuberculosis, the protein is ESAT-6 and the specified activity which requires acetylation is secretion from the cell.
58. A method of screening for an inhibitor of a bacterial pNAT, the method comprising: purifying the pNAT; contacting the pNAT with a test compound; and assaying for pNAT activity in vitro, wherein a reduction in pNAT activity in the presence of the test compound indicates that it is a potential inhibitor of the pNAT.
59. A method according to Claim 58 wherein the bacterial pNAT is selected from M. tuberculosis RimI, Rv0133 and Rv3225c; a homologue of M. tuberculosis RimI, Rv0133 and Rv3225c from a Mycobacterium other than M. tuberculosis; and a bacterial pNAT identified by the method of any of Claims 44 to 58.
60. A method according to Claim 58 or 59 further comprising modifying the test compound, and testing the modified compound for the ability to inhibit the pNAT in vitro and/or in culture.
61. A method according to any of Claims 58 to 60 further comprising determining whether the test compound or the modified compound has the ability to inhibit the pNAT in the bacterium in vivo.
62. A method according to Claim 61 wherein the bacterium is M. tuberculosis.
63. A method according to any of Claims 58 to 62 wherein the bacterium is a pathogenic bacterium, the method further comprising determining whether the test compound or the modified compound has the ability to inhibit the pNAT in an in vivo model of a disease or condition caused by the pathogenic bacterium.
64. A method according to any of Claims 58 to 63 further comprising the step of formulating a compound which has the ability to inhibit the bacterial pNAT into a pharmaceutically acceptable composition.
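Claims 1 to 4 confirm the actual TSS and the N-terminal acetylation state of a protein by MS. One way to picture the MALDI matching step is to compute the theoretical [M+H]+ of the N-terminal tryptic peptide for each putative TSS, with and without the initiator methionine and with and without the +42.0106 Da acetyl shift, and compare these with an observed peak. The sketch below is illustrative only and makes simplifying assumptions: monoisotopic masses, strict trypsin specificity (cleavage after K/R unless followed by P), no N-formyl-Met or other modifications, and an arbitrary 50 ppm tolerance; the candidate sequences and function names are hypothetical. A mass match alone cannot distinguish N-terminal from lysine acetylation, which is why MS/MS (Claim 3) is still needed to place the acetyl group.

```python
"""Hypothetical sketch: matching an observed MALDI [M+H]+ peak to the N-terminal
tryptic peptide of each putative translation start site (TSS), allowing for
initiator-Met removal and N-terminal acetylation."""

# Monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
ACETYL = 42.010565  # C2H2O added to the alpha-amine by N-terminal acetylation


def n_terminal_tryptic_peptide(protein: str) -> str:
    """Return the first tryptic peptide (cleave after K/R unless followed by P)."""
    for i, aa in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and following != "P":
            return protein[: i + 1]
    return protein  # no cleavage site: the whole sequence is one peptide


def mh_plus(peptide: str, acetylated: bool = False) -> float:
    """Theoretical monoisotopic [M+H]+ of an unmodified or N-acetylated peptide."""
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON
    return mass + ACETYL if acetylated else mass


def match_n_terminus(observed_mh: float, tss_candidates: dict, tol_ppm: float = 50.0):
    """List (tss, met_state, acetylated, peptide, ppm_error) hits within tolerance."""
    hits = []
    for tss, protein in tss_candidates.items():
        forms = [("Met retained", protein)]
        if protein.startswith("M"):
            forms.append(("Met removed", protein[1:]))
        for met_state, seq in forms:
            peptide = n_terminal_tryptic_peptide(seq)
            for acetylated in (False, True):
                theo = mh_plus(peptide, acetylated)
                ppm = (observed_mh - theo) / theo * 1e6
                if abs(ppm) <= tol_ppm:
                    hits.append((tss, met_state, acetylated, peptide, round(ppm, 1)))
    return hits


if __name__ == "__main__":
    # Hypothetical candidates: an upstream and a downstream start for one ORF.
    candidates = {"upstream TSS": "MAQLTMTEYRGLK", "downstream TSS": "MTEYRGLK"}
    observed = mh_plus("TEYR", acetylated=True)  # simulated peak
    print(match_n_terminus(observed, candidates))
    # [('downstream TSS', 'Met removed', True, 'TEYR', 0.0)]
```

In this toy case only the acetylated, Met-removed peptide of the downstream candidate matches, which would support that TSS and suggest N-terminal acetylation, pending MS/MS confirmation.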

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0612564A GB0612564D0 (en) 2006-06-24 2006-06-24 Acetylation
GB0612564.5 2006-06-24

Publications (2)

Publication Number Publication Date
WO2007148106A2 true WO2007148106A2 (en) 2007-12-27
WO2007148106A3 WO2007148106A3 (en) 2008-03-27

Family

ID=36803845

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2007/002332 WO2007148106A2 (en) 2006-06-24 2007-06-25 Detection of acetylation of prokaryotic proteins by mass spectrometry

Country Status (2)

Country Link
GB (1) GB0612564D0 (en)
WO (1) WO2007148106A2 (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHANG ET AL: "Improved analysis of membrane protein by PVDF-aided, matrix-assisted laser desorption/ionization mass spectrometry" ANALYTICA CHIMICA ACTA, ELSEVIER, AMSTERDAM, NL, vol. 556, no. 1, 18 January 2006 (2006-01-18), pages 237-246, XP022213746 ISSN: 0003-2670 *
OKKELS LIMEI MENG ET AL: "CFP10 discriminates between nonacetylated and acetylated ESAT-6 of Mycobacterium tuberculosis by differential interaction" PROTEOMICS, vol. 4, no. 10, October 2004 (2004-10), pages 2954-2960, XP002454945 ISSN: 1615-9853 *
RISON STUART C G ET AL: "Experimental determination of translational starts using peptide mass mapping and tandem mass spectrometry within the proteome of Mycobacterium tuberculosis" MICROBIOLOGY (READING), vol. 153, no. Part 2, February 2007 (2007-02), pages 521-528, XP002454946 ISSN: 1350-0872 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011088421A2 (en) * 2010-01-15 2011-07-21 California Institute Of Technology Discovery and applications of the proteolytic function of n-terminal acetylation of cellular proteins
WO2011088421A3 (en) * 2010-01-15 2011-11-10 California Institute Of Technology Discovery and applications of the proteolytic function of n-terminal acetylation of cellular proteins
US8999896B2 (en) 2010-01-15 2015-04-07 California Institute Of Technology Discovery and applications of the proteolytic function of N-terminal acetylation of cellular proteins
CN115873775A (en) * 2022-12-12 2023-03-31 华东理工大学 Method for improving secondary metabolic capacity of actinomycetes based on regulatory protein BldD posttranslational modification
CN115873775B (en) * 2022-12-12 2024-02-20 华东理工大学 Method for improving secondary metabolic capacity of actinomycetes based on posttranslational modification of regulatory protein BldD

Also Published As

Publication number Publication date
WO2007148106A3 (en) 2008-03-27
GB0612564D0 (en) 2006-08-02

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 07733327; Country of ref document: EP; Kind code of ref document: A2)
NENP Non-entry into the national phase (Ref country code: DE)
NENP Non-entry into the national phase (Ref country code: RU)
122 Ep: PCT application non-entry in European phase (Ref document number: 07733327; Country of ref document: EP; Kind code of ref document: A2)