AU2015203546A1 - Reengineering mRNA primary structure for enhanced protein production

Info

Publication number
AU2015203546A1
AU2015203546A1 AU2015203546A AU2015203546A AU2015203546A1 AU 2015203546 A1 AU2015203546 A1 AU 2015203546A1 AU 2015203546 A AU2015203546 A AU 2015203546A AU 2015203546 A AU2015203546 A AU 2015203546A AU 2015203546 A1 AU2015203546 A1 AU 2015203546A1
Authority
AU
Australia
Prior art keywords
sequence
codons
protein
initiation
mirna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2015203546A
Inventor
Stephen A. Chappell
Gerald M. Edelman
Vincent P. Mauro
Wei Zhou
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Scripps Research Institute
Original Assignee
Scripps Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2010218388A (AU2010218388B2)
Application filed by Scripps Research Institute
Priority to AU2015203546A (AU2015203546A1)
Publication of AU2015203546A1
Legal status: Abandoned

Landscapes

  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)

Abstract

Described herein are rules to modify natural mRNAs or to engineer synthetic mRNAs to increase their translation efficiencies. These rules describe modifications to mRNA coding and 3' UTR sequences intended to enhance protein synthesis by: 1) decreasing ribosomal diversion via AUG or non-canonical initiation codons in coding sequences and/or 2) evading miRNA-mediated down-regulation by eliminating one or more miRNA binding sites in coding sequences.

Description

REENGINEERING MRNA PRIMARY STRUCTURE FOR ENHANCED PROTEIN PRODUCTION

[0001] The entire disclosure in the complete specification of our Australian patent application 2010218388 is by this cross-reference incorporated into the present application.

BACKGROUND

[0002] Translation initiation in eukaryotes involves recruitment by mRNAs of the 40S ribosomal subunit and other components of the translation machinery at either the 5' cap-structure or an internal ribosome entry site (IRES). Following its recruitment, the 40S subunit moves to an initiation codon. One widely held notion of translation initiation postulates that the 40S subunit moves from the site of recruitment to the initiation codon by scanning through the 5' leader in a 5' to 3' direction until the first AUG codon that resides in a good nucleotide context is encountered (Kozak "The Scanning Model for Translation: An Update" J. Cell Biol. 108:229-241 (1989)). More recently, it has been postulated that translation initiation does not involve scanning, but may involve tethering of ribosomal subunits at either the cap-structure or an IRES, or clustering of ribosomal subunits at internal sites (Chappell et al., "Ribosomal shunting mediated by a translational enhancer element that base pairs to 18S rRNA" PNAS USA 103(25):9488-9493 (2006); Chappell et al., "Ribosomal tethering and clustering as mechanisms for translation initiation" PNAS USA 103(48):18077-82 (2006)). The 40S subunit moves to an accessible AUG codon that is not necessarily the first AUG codon in the mRNA. Once the subunit reaches the initiation codon by whatever mechanism, the initiator Methionine-tRNA, which is associated with the subunit, base-pairs to the initiation codon, the large (60S) ribosomal subunit attaches, and peptide synthesis begins.
[0003] Inasmuch as translation is generally thought to initiate by a scanning mechanism, the effects on translation of AUG codons contained within 5' leaders, termed upstream AUG codons, have been considered, and it is known that an AUG codon in the 5' leader can have either a positive or a negative effect on protein synthesis depending on the gene, the nucleotide context, and cellular conditions. For example, an upstream AUG codon can inhibit translation initiation by diverting ribosomes from the authentic initiation codon. However, the notion that translation initiates by a scanning mechanism does not consider the effects of potential initiation codons in coding sequences on protein synthesis. In contrast, the tethering/clustering mechanisms of translation initiation suggest that putative initiation codons in coding sequences, which include both AUG codons and non-canonical codons, may be utilized, consequently lowering the rate of protein synthesis by competing with the authentic initiation codon for ribosomes.

[0004] Micro RNA (miRNA)-mediated down-regulation can also negatively impact translation efficiency. miRNAs are generally 21-23 nucleotides in length and are components of ribonucleoprotein complexes. It has been suggested that miRNAs can negatively impact protein levels by base-pairing to mRNAs and reducing mRNA stability, nascent peptide stability and translation efficiency (Eulalio et al., "Getting to the Root of miRNA-Mediated Gene Silencing" Cell 132:9-14 (2008)). Although miRNAs generally mediate their effects by base-pairing to binding sites in the 3' untranslated sequences (UTRs) of mRNAs, they have been shown to have similar repressive effects from binding sites contained within coding sequences and 5' leader sequences. Base-pairing occurs via the so-called "seed sequence," which includes nucleotides 2-8 of the miRNA. There may be more than 1,000 different miRNAs in humans.
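The seed-pairing rule in paragraph [0004] can be illustrated with a short sketch. It is not part of the specification: the function names, the example miRNA and the example mRNA are hypothetical, and the sketch assumes perfect Watson-Crick pairing between miRNA nucleotides 2-8 and the mRNA, which simplifies real miRNA target recognition.

```python
# Illustrative only: locate perfect seed matches for a miRNA in an mRNA.
# Assumes RNA alphabet (A, C, G, U) and sequences written 5'->3'.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_sequence(mirna):
    """Return nucleotides 2-8 (1-based) of the miRNA."""
    return mirna[1:8]


def seed_match(mirna):
    """Return the mRNA sequence (5'->3') that pairs perfectly with the seed.

    This is the reverse complement of the seed.
    """
    return "".join(COMPLEMENT[n] for n in reversed(seed_sequence(mirna)))


def find_seed_sites(mrna, mirna):
    """Return 0-based start positions of perfect seed matches in the mRNA."""
    site = seed_match(mirna)
    return [
        i
        for i in range(len(mrna) - len(site) + 1)
        if mrna[i:i + len(site)] == site
    ]


# Hypothetical example sequences, chosen only to exercise the functions.
EXAMPLE_MIRNA = "UAGCUUAUCAGACUGAUGUUGA"
EXAMPLE_MRNA = "GGCAUAAGCUACGGAUAAGCUUU"

sites = find_seed_sites(EXAMPLE_MRNA, EXAMPLE_MIRNA)

# Changing a single nucleotide inside the first site removes that match.
MUTATED_MRNA = EXAMPLE_MRNA[:6] + "G" + EXAMPLE_MRNA[7:]
sites_after_mutation = find_seed_sites(MUTATED_MRNA, EXAMPLE_MIRNA)
```

In this toy case a single substitution abolishes one of the two perfect seed matches. Whether such a change reduces miRNA binding in a cell would have to be tested experimentally.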
[0005] The negative impact of putative initiation codons in mRNA coding sequences and miRNA-binding sites in mRNAs poses challenges to the pharmaceutical industry. For example, the industrial production of protein drugs, DNA vaccines for antigen production, general research purposes and gene therapy applications are all affected by a sub-optimal rate of protein synthesis or sequence stability. Improving protein yields and achieving higher protein concentrations can minimize the costs associated with industrial-scale cultures, reduce the costs of producing drugs, and facilitate protein purification. Poor protein expression limits the large-scale use of certain technologies, for example, through problems in expressing enough antigen from a DNA vaccine to generate an immune response to conduct a phase 3 clinical trial.

SUMMARY

[0006] There is a need in the art for improving the efficiency and stability of protein translation and improving protein yield and concentration, for example, in the industrial production of protein drugs.

[0007] Accordingly, in one aspect the present invention provides a method of improving full-length protein expression efficiency, the method comprising: a) providing a polynucleotide sequence comprising a 5' leader sequence, a coding sequence for the protein, a 3' untranslated region, and one or more miRNA binding sites; and b) mutating the one or more miRNA binding sites, wherein the mutation results in a decrease in miRNA binding at the one or more miRNA binding sites, resulting in a reduction of miRNA-mediated down-regulation of protein translation, thereby increasing full-length protein expression efficiency.

[0008] The method can also include mutating one or more nucleotides such that the amino acid sequence remains unaltered.

[0009] Also disclosed is a method of improving full-length protein expression efficiency.
The method includes providing a polynucleotide sequence having a coding sequence for the protein and one or more miRNA binding sites located within the coding sequence; and mutating the one or more miRNA binding sites. The mutation results in a decrease in miRNA binding at the one or more miRNA binding sites, resulting in a reduction of miRNA-mediated down-regulation of protein translation, thereby increasing full-length protein expression efficiency.

[0010] The methods can include mutating one or more nucleotides in an miRNA seed sequence. The methods can include mutating one or more nucleotides such that initiation codons are not introduced into the polynucleotide sequence. The methods can include mutating one or more nucleotides such that rare codons are not introduced into the polynucleotide sequence. The methods can include mutating one or more nucleotides such that additional miRNA seed sequences are not introduced into the polynucleotide sequence. The one or more miRNA binding sites can be located within the coding sequence. The one or more miRNA binding sites can be located within the 3' untranslated region. The one or more miRNA binding sites can be located within the 5' leader sequence.

[0011] A further understanding of the nature and advantages of the present disclosure may be realized by reference to the remaining portions of the specification and claims.
BRIEF DESCRIPTION OF THE DRAWINGS

[0012] Figures 1A-1B show growth curves of E. coli DH5α cell cultures transformed with CAT (diamonds) or mCAT expression constructs (squares);

[0013] Figure 2 shows a Western blot analysis of lysates collected from E. coli DH5α cells transformed with CAT (C) or mCAT (mC) expression constructs;

[0014] Figure 3 shows a Western blot analysis of extracts from DG44 cells transformed with wild type CAT or modified CAT expression constructs;

[0015] Figure 4 shows a Western blot analysis of supernatants from DG44 cells transformed with the wild type CD5 (cd5-1) or modified CD5 signal peptide α-thyroglobulin light chain expression constructs (cd5-2 to cd5-5).

DETAILED DESCRIPTION

I. Overview

[0016] Described herein are methods to modify natural mRNAs or to engineer synthetic mRNAs to increase levels of the encoded protein. These rules describe modifications to mRNA coding and 3' UTR sequences intended to enhance protein synthesis by: 1) decreasing ribosomal diversion via AUG or non-canonical initiation codons in coding sequences, and/or 2) evading miRNA-mediated down-regulation by eliminating miRNA binding sites in coding sequences.

[0017] Described are methods of reengineering mRNA primary structure that can be used to increase the yield of specific proteins in eukaryotic and bacterial cells. The methods described herein can be applied to the industrial production of protein drugs as well as for research purposes, gene therapy applications, and DNA vaccines for increasing antigen production. Greater protein yields minimize the costs associated with industrial-scale cultures and reduce drug costs. In addition, higher protein concentrations can facilitate protein purification. Moreover, processes that may otherwise not be possible due to poor protein expression, e.g., in the conduct of phase 3 clinical trials, or in expressing enough antigen from a DNA vaccine to generate an immune response, can be possible using the methods described herein.

1.
Definitions

[0018] This specification is not limited to the particular methodology, protocols, and reagents described, as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present methods, which will be described by the appended claims.

[0019] As used herein, the singular forms "a", "an", and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to "a cell" includes a plurality of such cells, reference to "a protein" includes one or more proteins and equivalents thereof known to those skilled in the art, and so forth.

[0020] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which this disclosure pertains. The following references provide one of skill with a general definition of many of the terms used in this disclosure: Academic Press Dictionary of Science and Technology, Morris (Ed.), Academic Press (1st ed., 1992); Oxford Dictionary of Biochemistry and Molecular Biology, Smith et al. (Eds.), Oxford University Press (revised ed., 2000); Encyclopaedic Dictionary of Chemistry, Kumar (Ed.), Anmol Publications Pvt. Ltd. (2002); Dictionary of Microbiology and Molecular Biology, Singleton et al. (Eds.), John Wiley & Sons (3rd ed., 2002); Dictionary of Chemistry, Hunt (Ed.), Routledge (1st ed., 1999); Dictionary of Pharmaceutical Medicine, Nahler (Ed.), Springer-Verlag Telos (1994); Dictionary of Organic Chemistry, Kumar and Anandand (Eds.), Anmol Publications Pvt. Ltd. (2002); and A Dictionary of Biology (Oxford Paperback Reference), Martin and Hine (Eds.), Oxford University Press (4th ed., 2000). Further clarifications of some of these terms as they apply specifically to this disclosure are provided herein.

[0021] The term "agent" includes any substance, molecule, element, compound, entity, or a combination thereof. It includes, but is not limited to, e.g., protein, polypeptide, small organic molecule, polysaccharide, polynucleotide, and the like. It can be a natural product, a synthetic compound, or a chemical compound, or a combination of two or more substances. Unless otherwise specified, the terms "agent", "substance", and "compound" are used interchangeably herein.

[0022] The term "cistron" means a unit of DNA that encodes a single polypeptide or protein. The term "transcriptional unit" refers to the segment of DNA within which the synthesis of RNA occurs.

[0023] The term "DNA vaccines" refers to a DNA that can be introduced into a host cell or a tissue and therein expressed by cells to produce a messenger ribonucleic acid (mRNA) molecule, which is then translated to produce a vaccine antigen encoded by the DNA.

[0024] The language "gene of interest" is intended to include a cistron, an open reading frame (ORF), or a polynucleotide sequence which codes for a protein product (protein of interest) whose production is to be modulated. Examples of genes of interest include genes encoding therapeutic proteins, nutritional proteins and industrially useful proteins. Genes of interest can also include reporter genes or selectable marker genes such as enhanced green fluorescent protein (EGFP) and luciferase genes (Renilla or Photinus).

[0025] Expression is the process by which a polypeptide is produced from DNA. The process involves the transcription of the gene into mRNA and the subsequent translation of the mRNA into a polypeptide.

[0026] The term "endogenous" as used herein refers to a gene normally found in the wild-type host, while the term "exogenous" refers to a gene not normally found in the wild-type host.
[0027] A "host cell" refers to a living cell into which a heterologous polynucleotide sequence is to be or has been introduced. The living cell includes both a cultured cell and a cell within a living organism. Means for introducing the heterologous polynucleotide sequence into the cell are well known, e.g., transfection, electroporation, calcium phosphate precipitation, microinjection, transformation, viral infection, and/or the like. Often, the heterologous polynucleotide sequence to be introduced into the cell is a replicable expression vector or cloning vector. In some embodiments, host cells can be engineered to incorporate a desired gene on their chromosome or in their genome. Many host cells that can be employed in the practice of the present methods (e.g., CHO cells) are well known in the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press (3rd ed., 2001); and Brent et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (ringbound ed., 2003). In some embodiments, the host cell is a eukaryotic cell.

[0028] The term "inducing agent" is used to refer to a chemical, biological or physical agent that effects translation from an inducible translational regulatory element. In response to exposure to an inducing agent, translation from the element generally is initiated de novo or is increased above a basal or constitutive level of expression. An inducing agent can be, for example, a stress condition to which a cell is exposed, for example, a heat or cold shock, a toxic agent such as a heavy metal ion, or a lack of a nutrient, hormone, growth factor, or the like; or can be a compound that affects the growth or differentiation state of a cell such as a hormone or a growth factor.

[0029] The phrase "isolated or purified polynucleotide" is intended to include a piece of polynucleotide sequence (e.g.,
DNA) which has been isolated at both ends from the sequences with which it is immediately contiguous in the naturally occurring genome of the organism. The purified polynucleotide can be an oligonucleotide which is either double or single stranded; a polynucleotide fragment incorporated into a vector; a fragment inserted into the genome of a eukaryotic or prokaryotic organism; or a fragment used as a probe. The phrase "substantially pure", when referring to a polynucleotide, means that the molecule has been separated from other accompanying biological components so that, typically, it makes up at least 85 percent of a sample or a greater percentage.

[0030] The term "nucleotide sequence," "nucleic acid sequence," "nucleic acid," or "polynucleotide sequence," refers to a deoxyribonucleotide or ribonucleotide polymer in either single or double-stranded form, and unless otherwise limited, encompasses known analogs of natural nucleotides that hybridize to nucleic acids in a manner similar to naturally occurring nucleotides. Nucleic acid sequences can be, e.g., prokaryotic sequences, eukaryotic mRNA sequences, cDNA sequences from eukaryotic mRNA, genomic DNA sequences from eukaryotic DNA (e.g., mammalian DNA), and synthetic DNA or RNA sequences, but are not limited thereto.

[0031] The term "promoter" means a nucleic acid sequence capable of directing transcription and at which transcription is initiated. A variety of promoter sequences are known in the art. For example, such elements can include, but are not limited to, TATA-boxes, CCAAT-boxes, bacteriophage RNA polymerase specific promoters (e.g., T7, SP6, and T3 promoters), an SP1 site, and a cyclic AMP response element. If the promoter is of the inducible type, then its activity increases in response to an inducing agent.

[0032] The five prime leader or untranslated region (5' leader, 5' leader sequence or 5' UTR) is a particular section of messenger RNA (mRNA) and the DNA that codes for it.
It starts at the +1 position (where transcription begins) and ends just before the start codon (typically AUG) of the coding region. In bacteria, it may contain a ribosome binding site (RBS) known as the Shine-Dalgarno sequence. 5' leader sequences range in length from no nucleotides (in rare leaderless messages) up to >1,000 nucleotides. 3' UTRs tend to be even longer (up to several kilobases in length).

[0033] The term "operably linked" or "operably associated" refers to functional linkage between genetic elements that are joined in a manner that enables them to carry out their normal functions. For example, a gene is operably linked to a promoter when its transcription is under the control of the promoter and the transcript produced is correctly translated into the protein normally encoded by the gene. Similarly, a translational enhancer element is operably associated with a gene of interest if it allows up-regulated translation of an mRNA transcribed from the gene.

[0034] A sequence of nucleotides adapted for directional ligation, e.g., a polylinker, is a region of an expression vector that provides a site or means for directional ligation of a polynucleotide sequence into the vector. Typically, a directional polylinker is a sequence of nucleotides that defines two or more restriction endonuclease recognition sequences, or restriction sites. Upon restriction cleavage, the two sites yield cohesive termini to which a polynucleotide sequence can be ligated to the expression vector. In an embodiment, the two restriction sites provide, upon restriction cleavage, cohesive termini that are non-complementary and thereby permit directional insertion of a polynucleotide sequence into the cassette. For example, the sequence of nucleotides adapted for directional ligation can contain a sequence of nucleotides that defines multiple directional cloning means.
Where the sequence of nucleotides adapted for directional ligation defines numerous restriction sites, it is referred to as a multiple cloning site.

[0035] The term "subject" for purposes of treatment refers to any animal classified as a mammal, e.g., human and non-human mammals. Examples of non-human animals include dogs, cats, cattle, horses, sheep, pigs, goats, rabbits, etc. Except when noted, the terms "patient" or "subject" are used herein interchangeably. In an embodiment, the subject is human.

[0036] Transcription factor refers to any polypeptide that is required to initiate or regulate transcription. For example, such factors include, but are not limited to, c-Myc, c-Fos, c-Jun, CREB, cEts, GATA, GAL4, GAL4/Vp16, c-Myb, MyoD, NF-κB, bacteriophage-specific RNA polymerases, HIF-1, and TRE. Examples of sequences encoding such factors include, but are not limited to, GenBank accession numbers K02276 (c-Myc), K00650 (c-fos), BC002981 (c-Jun), M27691 (CREB), X14798 (cEts), M77810 (GATA), K01486 (GAL4), AY136632 (GAL4/Vp16), M95584 (c-Myb), M84918 (MyoD), 2006293A (NF-κB), NP_853568 (SP6 RNA polymerase), AAB28111 (T7 RNA polymerase), NP_523301 (T3 RNA polymerase), AF364604 (HIF-1), and X63547 (TRE).

[0037] A "substantially identical" nucleic acid or amino acid sequence refers to a nucleic acid or amino acid sequence which includes a sequence that has at least 90% sequence identity to a reference sequence as measured by one of the well-known programs described herein (e.g., BLAST) using standard parameters. The sequence identity can be at least 95%, at least 98%, or at least 99%. In some embodiments, the subject sequence is of about the same length as the reference sequence, i.e., consisting of about the same number of contiguous amino acid residues (for polypeptide sequences) or nucleotide residues (for polynucleotide sequences).
[0038] Sequence identity can be readily determined with various methods known in the art. For example, the BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)). Percentage of sequence identity is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may include additions or deletions (i.e., gaps) as compared to the reference sequence (which does not include additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.

[0039] The term "treating" or "alleviating" includes the administration of compounds or agents to a subject to prevent or delay the onset of the symptoms, complications, or biochemical indicia of a disease (e.g., a cardiac dysfunction), alleviating the symptoms or arresting or inhibiting further development of the disease, condition, or disorder.
Subjects in need of treatment include patients already suffering from the disease or disorder as well as those prone to have the disorder or those in whom the disorder is to be prevented.

[0040] Treatment may be prophylactic (to prevent or delay the onset of the disease, or to prevent the manifestation of clinical or subclinical symptoms thereof) or therapeutic (suppression or alleviation of symptoms after the manifestation of the disease). In the treatment of cardiac remodeling and/or heart failure, a therapeutic agent may directly decrease the pathology of the disease, or render the disease more susceptible to treatment by other therapeutic agents.

[0041] The term "vector" or "construct" refers to polynucleotide sequence elements arranged in a definite pattern of organization such that the expression of genes/gene products that are operably linked to these elements can be predictably controlled. Typically, they are transmissible polynucleotide sequences (e.g., plasmid or virus) into which a segment of foreign DNA can be spliced in order to introduce the foreign DNA into host cells to promote its replication and/or transcription.

[0042] A cloning vector is a DNA sequence (typically a plasmid or phage) which is able to replicate autonomously in a host cell, and which is characterized by one or a small number of restriction endonuclease recognition sites. A foreign DNA fragment may be spliced into the vector at these sites in order to bring about the replication and cloning of the fragment. The vector may contain one or more markers suitable for use in the identification of transformed cells. For example, markers may provide tetracycline or ampicillin resistance.

[0043] An expression vector is similar to a cloning vector but is capable of inducing the expression of the DNA that has been cloned into it, after transformation into a host.
The cloned DNA is usually placed under the control of (i.e., operably linked to) certain regulatory sequences such as promoters or enhancers. Promoter sequences may be constitutive, inducible or repressible.

[0044] An "initiation codon" or "initiation triplet" is the position within a cistron where protein synthesis starts. It is generally located at the 5' end of the coding sequence. In eukaryotic mRNAs, the initiation codon typically consists of the three nucleotides Adenine, Uracil, and Guanine (AUG), which encode the amino acid Methionine (Met). In bacteria, the initiation codon is also typically AUG, but this codon encodes a modified Methionine (N-formylmethionine (fMet)). Nucleotide triplets other than AUG are sometimes used as initiation codons, both in eukaryotes and in bacteria.

[0045] A "downstream initiation codon" refers to an initiation codon that is located downstream of the authentic initiation codon, typically in the coding region of the gene. An "upstream initiation codon" refers to an initiation codon that is located upstream of the authentic initiation codon in the 5' leader region.

[0046] As used herein, reference to "downstream" and "upstream" refers to a location with respect to the authentic initiation codon. For example, an upstream codon on an mRNA sequence is a codon that is towards the 5'-end of the mRNA sequence relative to another location within the sequence (such as the authentic initiation codon) and a downstream codon refers to a codon that is towards the 3'-end of the mRNA sequence relative to another location within the sequence.

[0047] As used herein, "authentic initiation codon" or "primary initiation codon" refers to the initiation codon of a cistron that encodes the first amino acid of the coding sequence of the encoded protein of interest whose production is to be modulated.
A "secondary initiation codon" refers to an initiation codon that is other than the primary or authentic initiation codon for the encoded protein of interest. The secondary initiation codon is generally downstream of the primary or authentic initiation codon and located within the coding sequence.

[0048] As used herein, "increased protein expression" refers to translation of a modified mRNA, in which one or more secondary initiation codons are mutated, that generates a polypeptide concentration at least about 5%, 10%, 20%, 30%, 40%, 50% or greater over the polypeptide concentration obtained from the wild type mRNA in which the one or more secondary initiation codons have not been mutated. Increased protein expression can also refer to protein expression of a mutated mRNA that is 1.5-fold, 2-fold, 3-fold, 5-fold, 10-fold or more over the wild type mRNA.

[0049] As used herein, "ribosomal recruitment site" refers to a site within an mRNA to which a ribosome subunit associates prior to initiation of translation of the encoded protein. Ribosomal recruitment sites can include the cap structure, a modified nucleotide (m7G cap-structure) found at the 5' ends of mRNAs, and sequences termed internal ribosome entry sites (IRES), which are contained within mRNAs. Other ribosomal recruitment sites can include a 9-nucleotide sequence from the Gtx homeodomain mRNA. The ribosomal recruitment site is often upstream of the authentic initiation codon, but can also be downstream of the authentic initiation codon.

[0050] As used herein, "usage bias" refers to the particular preference an organism shows for one of the several codons that encode the same amino acid. Altering usage bias refers to mutations that lead to use of a different codon for the same amino acid with a higher or lower preference than the original codon.
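As a rough illustration of the "usage bias" definition in paragraph [0050], the sketch below computes, for a single coding sequence, the fraction of times each codon is used among the synonymous codons for its amino acid. This is not taken from the specification: the function name and example sequence are hypothetical, the standard genetic code is assumed, and real usage tables are normally compiled from many genes of an organism rather than one sequence.

```python
from collections import Counter

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def relative_codon_usage(cds):
    """Map each codon seen in `cds` to its share among synonymous codons.

    `cds` is a DNA coding sequence read in frame from its first nucleotide;
    any trailing partial codon is ignored.
    """
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {
        codon: n / per_amino_acid[CODON_TABLE[codon]]
        for codon, n in counts.items()
    }


# Hypothetical example: Met, then four Leu codons (CTG, CTG, CTC, TTA), stop.
usage = relative_codon_usage("ATGCTGCTGCTCTTATAA")
```

In this toy sequence CTG accounts for half of the leucine codons, so replacing a CTG with TTA would move to a codon with a lower observed preference.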
[0051] As used herein, "full-length protein" refers to a protein which encompasses essentially every amino acid encoded by the gene encoding the protein. Those of skill in the art know there are subtle modifications of some proteins in living cells so that the protein is actually a group of closely related proteins with slight alterations. For example, some but not all proteins a) have amino acids removed from the amino-terminus, and/or b) have chemical groups added which could increase molecular weight. Most bacterial proteins as encoded contain a methionine and an alanine residue at the amino-terminus of the protein; one or both of these residues are frequently removed from active forms of the protein in the bacterial cell. These types of modifications are typically heterogeneous, so not all modifications happen to every molecule. Thus, the natural "full-length" molecule is actually a family of molecules that start from the same amino acid sequence but have small differences in how they are modified. The term "full-length protein" encompasses such a family of molecules.

[0052] As used herein, "rescued" or "modified" refer to nucleotide alterations that remove most to all secondary initiation codons from the coding region. "Partially modified" refers to nucleotide alterations that remove a subset of all possible mutations of secondary initiation codons from the coding region.

II. Reduction of Ribosomal Diversion via Downstream Initiation Codons

[0053] As mentioned above, it is well-known that features contained within 5' leaders can affect translation efficiency. For example, an AUG codon in the 5' leader, termed an upstream AUG codon, can have either a positive or a negative effect on protein synthesis depending on the gene, the nucleotide context, and cellular conditions.
An upstream AUG codon can inhibit translation initiation by diverting ribosomes from the authentic initiation codon (Meijer et al., "Translational Control of the Xenopus laevis Connexin-41 5'-Untranslated Region by Three Upstream Open Reading Frames" J. Biol. Chem. 275(40):30787-30793 (2000)). For example, Figures 6 and 8 in Meijer et al. show the ribosomal diversion effect of an upstream AUG codon in the 5' leader sequence.

[0054] Although AUG/ATG is the usual translation initiation codon in many species, it is known that translation can sometimes also initiate at other upstream codons, including ACG, GUG/GTG, UUG/TTG, CUG/CTG, AUA/ATA, AUC/ATC, and AUU/ATT in vivo. For example, it has been shown that mammalian ribosomes can initiate translation at a non-AUG triplet when the initiation codon of mouse dihydrofolate reductase (dhfr) was mutated to ACG (Peabody, D.S. (1987) J. Biol. Chem. 262, 11847-11851). A further study by Peabody showed that mutant initiation codons replacing the AUG of dhfr (GUG, UUG, CUG, AUA, AUC and AUU) were all able to direct the synthesis of apparently normal dhfr (Peabody, D.S. (1989) J. Biol. Chem. 264, 5031-5035).

[0055] The tethering and clustering models of translation initiation postulate that translation can initiate at an accessible initiation codon, and studies have shown that an initiation codon can be used in a distance-dependent manner downstream of the ribosomal recruitment site (cap or IRES) (Chappell et al., "Ribosomal tethering and clustering as mechanisms for translation initiation" PNAS USA 103(48):18077-82 (2006)). This suggests that putative initiation codons in coding sequences may also be utilized. Translation initiation at downstream initiation codons, or secondary initiation sites, can compete with the authentic initiation codon, or primary initiation site, for ribosomes and lower the expression of the encoded protein.
Decreasing the availability of these secondary initiation sites, such as by mutating them into a non-initiation codon, increases the availability of the primary initiation sites to the ribosome and leads to more efficient expression of the encoded protein.

[0056] The present method allows for improved and more efficient protein expression and reduces the competition between various initiation codons for the translation machinery. By eliminating downstream initiation codons in coding sequences that are in the same reading frame as the encoded protein, the generation of truncated proteins, with potential altered function, will be eliminated. In addition, by eliminating downstream initiation codons that are out-of-frame with the coding sequence, the generation of various peptides, some of which may have negative effects on cell physiology or protein production, will also be eliminated. This advantage can be particularly important for applications in DNA vaccines or gene therapy.

[0057] Direct mutation of downstream initiation codons can take place such that the encoded amino acid sequence remains unaltered. This is possible in many cases because the genetic code is degenerate and most amino acids are encoded by two or more codons. The only exceptions are Methionine and Tryptophan, which are each encoded by only one codon, AUG and UGG, respectively. Mutation of a downstream initiation codon that also alters the amino acid sequence can also be considered. In such cases, the effects of altering the amino acid sequence can be evaluated. Alternatively, if the amino acid sequence is to remain unaltered, the nucleotides flanking the putative initiation codon can sometimes be mutated to diminish the efficiency of the initiation codon. For AUG codons, this can be done according to the nucleotide context rules established by Marilyn Kozak (Kozak, M.
(1984) Nature 308, 241-246), which state that an AUG in excellent context contains a purine at position -3 and a G at +4, where AUG is numbered +1, +2, +3.

[0058] For non-AUG codons, similar rules seem to apply, with additional determinants from nucleotides at positions +5 and +6. In designing mutations, the codon usage bias can, in many cases, remain relatively unaltered, e.g. by introducing mutated codons with similar codon bias as the wild type codon. Inasmuch as different organisms have different codon usage frequencies, the specific mutations for expression in cells from different organisms will vary accordingly.

[0059] It should be appreciated that the methods disclosed herein are not limited to eukaryotic cells, but also apply to bacteria. Although bacterial translation initiation is thought to differ from that of eukaryotes, ribosomal recruitment still occurs via cis elements in mRNAs, which include the so-called Shine-Dalgarno sequence. Non-AUG initiation codons in bacteria include ACG, GUG, UUG, CUG, AUA, AUC and AUU.

[0060] In an embodiment, disclosed are modifications to coding sequences that enhance protein synthesis by decreasing ribosomal diversion via downstream initiation codons. These codons can include AUG/ATG and other nucleotide triplet codons known to function as initiation codons in cells, including but not limited to ACG, GUG/GTG, UUG/TTG, CUG/CTG, AUA/ATA, AUC/ATC, and AUU/ATT. In one embodiment, the downstream initiation codon is mutated. Reengineering of mRNA coding sequences to increase protein production can involve mutating all downstream initiation codons or can involve mutating just some of the downstream initiation codons. In another embodiment, the flanking nucleotides are mutated to a less favorable nucleotide context. In an embodiment, ATG codons in the signal peptide can be mutated to ATC codons resulting in a Methionine to Isoleucine substitution.
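The nucleotide context rule cited above (a purine at position -3 and a G at +4, with the initiation codon numbered +1 to +3) can be checked mechanically. The sketch below is illustrative only: the function name kozak_context is hypothetical, and treating a missing flanking position as unfavorable is an assumption made for this example rather than something stated in the text.

```python
def kozak_context(seq, i):
    """Report the two context positions for the codon starting at 0-based index i.

    Returns (purine_at_minus3, g_at_plus4). Position -3 is three nucleotides
    upstream of the first base of the codon (there is no position 0), and +4
    is the nucleotide immediately after the codon. A position that falls
    outside the sequence is reported as False (an assumption of this sketch).
    """
    seq = seq.upper().replace("U", "T")
    minus3 = seq[i - 3] if i >= 3 else None
    plus4 = seq[i + 3] if i + 3 < len(seq) else None
    return (minus3 in ("A", "G"), plus4 == "G")


def is_excellent_context(seq, i):
    """True when both features of the cited rule are present."""
    purine, g4 = kozak_context(seq, i)
    return purine and g4


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(kozak_context("GCCACCATGG", 6))
    print(kozak_context("TTTTTTATGT", 6))
```

A check of this kind could be used in two directions: to confirm that the authentic initiation codon sits in a favorable context, and to flag downstream codons whose flanking nucleotides might be changed to a less favorable context when the codon itself cannot be altered.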
In another embodiment, CTG codons in the signal peptide can be mutated to CTC. In another embodiment, ATG codons can be mutated to ATC codons resulting in a Methionine (M) to Isoleucine (I) amino acid substitution, and CTG codons can be mutated to CTCs. In another embodiment, ATG codons can be mutated to ATC codons, CTG codons can be mutated to CTC codons, and the context of the initiator AUG can be improved by changing the codon 3' of the initiator from CCC to GCT resulting in a Proline (P) to Arginine (R) amino acid substitution. In other embodiments, modifications can be made to the signal peptide in which one or more AUG and CUG codons can be removed. Modifications can be made including a modified signal peptide by removal of most of the potential initiation codons, removal of ATG and CTGs of the signal peptide, or removal of ATG, CTG and ACG codons resulting in a Glutamic acid (E) to Glutamine (Q) amino acid substitution or a Histidine (H) to Arginine (R) amino acid substitution.

[0061] Standard techniques in molecular biology can be used to generate the mutated nucleic acid sequences. Such techniques include various nucleic acid manipulation techniques, nucleic acid transfer protocols, nucleic acid amplification protocols and other molecular biology techniques known in the art. For example, point mutations can be introduced into a gene of interest through the use of oligonucleotide mediated site-directed mutagenesis. Modified sequences also can be generated synthetically by using oligonucleotides synthesized with the desired mutations. These approaches can be used to introduce mutations at one site or throughout the coding region. Alternatively, homologous recombination can be used to introduce a mutation or exogenous sequence into a target sequence of interest.
Nucleic acid transfer protocols include calcium chloride transformation/transfection, electroporation, liposome mediated nucleic acid transfer, N-[1-(2,3-Dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate mediated transformation, and others. In an alternative mutagenesis protocol, point mutations in a particular gene can also be selected for using a positive selection pressure. See, e.g., Current Techniques in Molecular Biology (Ed. Ausubel, et al.). Nucleic acid amplification protocols include but are not limited to the polymerase chain reaction (PCR). The use of nucleic acid tools such as plasmids, vectors, promoters and other regulating sequences is well known in the art for a large variety of viruses and cellular organisms. Further, a large variety of nucleic acid tools are available from many different sources including ATCC and various commercial sources. One skilled in the art will be readily able to select the appropriate tools and methods for genetic modifications of any particular virus or cellular organism according to the knowledge in the art and design choice. Protein expression can also be measured using various standard methods. These include, but are not limited to, Western blot analysis, ELISA, metabolic labeling, and enzymatic activity measurements.

IV. Evasion of miRNA-mediated down-regulation

[0062] MicroRNAs are an abundant class of small noncoding RNAs that generally function as negative gene regulators. In an embodiment, modifications can be made to mRNA sequences, including 5' leader, coding sequence, and 3' UTR, to evade miRNA-mediated down-regulation. Such modification can thereby alter mRNA or nascent peptide stability, and enhance protein synthesis and translation efficiency.

[0063] miRNAs are generally RNAs of 21-23 nucleotides that are components of ribonucleoprotein complexes. miRNAs can affect mRNA stability or protein synthesis by base-pairing to mRNAs.
miRNAs generally mediate their effects by base-pairing to binding sites in the 3' UTRs of mRNAs. However, they have been shown to have similar repressive effects from binding sites contained within coding sequences and 5' leader sequences. Base-pairing occurs via the so-called "seed sequence," which consists of nucleotides 2-8 of the miRNA. There may be more than 1,000 different miRNAs in humans.

[0064] Reengineering mRNAs to circumvent miRNA-mediated repression can involve mutating all seed sequences within an mRNA. As with the initiation codon mutations described above, these mutations can ensure that the encoded amino acid sequence remains unaltered, and act not to introduce initiation codons, rare codons, or other miRNA seed sequences.

[0065] A computer program can be used to reengineer mRNA sequences according to a cell type of interest, e.g. rodent cells for expression in Chinese hamster ovary cells, or human cells for expression in human cell lines or for application in DNA vaccines. This program can recode an mRNA to eliminate potential initiation codons except for the initiation codon. In the case of in-frame AUG codons in the coding sequence, the context of these downstream initiation codons can be weakened if possible. Mutations can be performed according to the codon bias for the cell line of interest, e.g. human codon bias information can be used for human cell lines, Saccharomyces cerevisiae codon bias information can be used for this yeast, and E. coli codon bias information can be used for this bacterium. In higher eukaryotic mRNAs, the recoded mRNA can then be searched for all known seed sequences in the organism of interest, e.g. human seed sequences for human cell lines. Seed sequences can be mutated with the following considerations: 1) without disrupting the amino acid sequence, 2) without dramatically altering the usage bias of mutated codons, 3) without introducing new putative initiation codons.
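The recoding step described in paragraph [0065] can be sketched as a greedy pass over the coding sequence. This is a minimal illustration under stated assumptions, not the program referred to in the specification: it uses the standard genetic code, leaves the first codon and stop codons untouched, ignores codon usage bias and miRNA seed sequences entirely, and all names (recode, count_sites, translate) are hypothetical. For each codon it picks the synonymous codon that leaves the fewest listed initiation triplets in the local window, counting triplets in all three frames so that out-of-frame sites spanning codon boundaries are also considered.

```python
from collections import defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = defaultdict(list)
for _codon, _aa in sorted(CODON_TABLE.items()):
    SYNONYMS[_aa].append(_codon)

# Triplets named in the text as known initiation codons (DNA alphabet).
INITIATION_CODONS = {"ATG", "ACG", "GTG", "TTG", "CTG", "ATA", "ATC", "ATT"}


def split_codons(orf):
    """Split an ORF into in-frame codons, dropping any incomplete trailing codon."""
    seq = orf.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(orf):
    return "".join(CODON_TABLE[c] for c in split_codons(orf))


def count_sites(seq):
    """Number of listed initiation triplets in seq, in all three frames."""
    seq = seq.upper().replace("U", "T")
    return sum(seq[i:i + 3] in INITIATION_CODONS for i in range(len(seq) - 2))


def recode(orf):
    """Greedy synonymous recoding that keeps the encoded protein unchanged."""
    codons = split_codons(orf)
    for idx in range(1, len(codons)):  # index 0 is the authentic initiation codon
        aa = CODON_TABLE[codons[idx]]
        if aa == "*":
            continue  # stop codons are left as they are
        left = codons[idx - 1]
        right = codons[idx + 1] if idx + 1 < len(codons) else ""
        best = codons[idx]
        best_score = count_sites(left + best + right)
        for alt in SYNONYMS[aa]:
            score = count_sites(left + alt + right)
            if score < best_score:  # change a codon only when it strictly helps
                best, best_score = alt, score
        codons[idx] = best
    return "".join(codons)


if __name__ == "__main__":
    original = "ATGCTGGTGACGTAA"  # toy ORF, not taken from the specification
    print(recode(original))
```

Because Methionine and Tryptophan have a single codon each, in-frame ATG codons are never changed by this sketch; weakening their flanking context or accepting an amino acid substitution would have to be handled separately. A fuller version would also weight candidate codons by the codon usage of the host organism, as the paragraph above requires.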
[0066] While this specification contains many specifics and is described with reference to preferred embodiments thereof, these should not be construed as limitations on the scope of a method that is claimed or of what may be claimed, but rather as descriptions of features specific to particular embodiments. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the meaning of the subject matter described. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination. The scope of the subject matter is defined by the claims that follow.

[0067] All publications, databases, GenBank sequences, patents, and patent applications cited in this specification are herein incorporated by reference as if each was specifically and individually indicated to be incorporated by reference.

EXAMPLES

[0068] The following examples are provided as further illustration, but not to limit the scope. Other variants will be readily apparent to one of ordinary skill in the art and are encompassed by the appended claims.

Example 1: [...] initiation sites within mRNA transcripts

[0069] The presence of multiple translation initiation sites within the 5' UTR and coding regions of mRNA transcripts decreases translation efficiency by, for example, diverting ribosomes from the authentic or demonstrated translation initiator codon.
Alternatively, or in addition, the presence of multiple translation initiation sites downstream of the authentic or demonstrated translation initiator codon induces initiation of translation of one or more protein isoforms that reduce the translation efficiency of the full-length protein. To improve translation efficiency of mRNA transcripts encoding commercially valuable human proteins, potential translation initiation sites within all reading frames upstream and downstream of the authentic or demonstrated translation initiator codon are mutated to eliminate these sites. In a preferred aspect of this method, the mRNA sequence is altered but the resultant encoded amino acid remains the same. Alternatively, conservative changes are introduced that substitute amino acids having similar physical properties.

[0070] The canonical translation initiation codon is AUG/ATG. Other identified initiator codons include, but are not limited to, ACG, GUG/GTG, UUG/TTG, CUG/CTG, AUA/ATA, AUC/ATC, and AUU/ATT.

Intracellular protein: Chloramphenicol Acetyl transferase (CAT)

[0071] Chloramphenicol is an antibiotic that interferes with bacterial protein synthesis by binding the 50S ribosomal subunit and preventing peptide bond formation. The resistance gene (cat) encodes an acetyl transferase enzyme that inactivates this antibiotic by acetylating the drug at one or both of its two hydroxyl groups. The unmodified open reading frame of CAT contains 113 potential initiation codons (20 ATG, including the authentic initiation codon, 8 ATC, 8 ACG, 12 GTG, 8 TTG, 11 CTG, 6 AGG, 10 AAG, 16 ATA, and 14 ATT codons) (SEQ ID NO: 120). SEQ ID NO: 121 is a fully modified CAT ORF and SEQ ID NO: 122 is a partially modified CAT ORF in which only some of the potential modifications were made.

[0072] Bacterial expression constructs were generated containing the CAT cistron (CAT) and a partially modified CAT cistron (mCAT) and tested in the E. coli bacterial strain DH5α (Figures 1A-B).
DH5α cells were transformed with the CAT and mCAT expression constructs and plated onto LB/ampicillin plates. Cultures were obtained from single colonies and cultured in LB/ampicillin (~50 µg/ml) at 37°C with shaking at 220 rpm until logarithmic growth was reached, as determined by measuring the A600 of the culture. The cultures were then diluted with LB/ampicillin to comparable A600 values. The A600 of the culture derived from DH5α cells transformed with the CAT expression construct was 0.3, while that from the cells transformed with the mCAT expression construct was 0.25. Chloramphenicol acetyltransferase expression was induced via the lac operon contained within the CAT and mCAT plasmids by the introduction of Isopropyl β-D-1-thiogalactopyranoside (IPTG, final concentration of 0.4 mM). Three milliliters of each culture was transferred to a fresh tube containing chloramphenicol, resulting in a final concentration of 20, 40, 80, 160, 320, 640, 1280, and 2560 µg/ml. Cultures were incubated at 37°C with shaking at 220 rpm and the A600 of each culture was measured at 1 hour intervals.

[0073] Figures 1A-1B show growth curves of cultures of DH5α cells transformed with CAT (diamonds) and mCAT (squares) expression constructs. Chloramphenicol acetyltransferase expression was induced by the addition of IPTG (0.4 mM final concentration). 3 milliliters of IPTG-containing culture was added to fresh tubes containing Chloramphenicol, resulting in final concentrations of 0, 40, 80, 160, 320, 640, 1280 and 2560 µg/ml. Cultures were incubated at 37°C with shaking at 220 rpm and the A600 of each culture was measured over time. The results for cultures grown in the presence of 320 and 640 µg/ml Chloramphenicol are shown. The X-axis represents time in hours; the Y-axis represents normalized A600 (relative to starting A600).

[0074] The results showed that bacteria transformed with the mCAT expression construct grew better than the bacteria transformed with the CAT expression construct at all concentrations. As shown in Figures 1A-1B, in high concentrations of Chloramphenicol (320 and 640 µg/ml), cells with the modified CAT still grew, but cells with the wild type CAT did not.
These results indicate that more functional Chloramphenicol acetyltransferase enzyme was expressed from the mCAT construct, thus allowing the bacteria transformed with this expression construct to grow better in the presence of this antibiotic.

[0075] To determine the relative amounts of Chloramphenicol acetyltransferase enzyme synthesized from DH5α cells transformed with the CAT and mCAT expression constructs, Western blot analysis was performed on cell extracts at 5, 30, 60 and 90 minutes after induction by IPTG. 50 µl of culture at each time point was centrifuged, and bacterial pellets were resuspended in 30 µl of TE buffer and 10 µl of a 4 x SDS gel loading buffer. The sample was heated at 95°C for 3 minutes and loaded onto a 10% Bis-Tris/SDS polyacrylamide gel. Proteins were transferred to a PVDF membrane and probed with an anti-CAT antibody. Figure 2 is a Western blot analysis of lysates from DH5α cells transformed with the CAT (C) and mCAT (mC) expression constructs at various times after IPTG induction. The results showed that the amount of Chloramphenicol acetyltransferase protein (above the 19 kDa marker) is substantially increased in DH5α cells transformed with the mCAT expression construct (mC) at all time points tested.

[0076] Analysis of the Chloramphenicol acetyltransferase ORF was also performed in mammalian cells. The CAT ORF and the partially modified CAT ORF were cloned into mammalian expression constructs containing a CMV promoter and tested by transient transfection into Chinese Hamster Ovary (DG44) cells. In brief, 0.5 µg of each expression construct along with 20 ng of a co-transfection control plasmid that expresses the β-galactosidase reporter protein (pCMVβ, Clontech) was transfected into 100,000 DG44 cells using the Fugene 6 (Roche) transfection reagent according to the manufacturer's instructions. Twenty-four hours post transfection, cells were lysed using 250 µl of lysis buffer.
A Lac Z reporter assay was performed to ensure equal transfection efficiencies between samples. 30 µl of lysate was added to 10 µl of a 4 x SDS gel loading buffer. The sample was heated at 72°C for 10 minutes and loaded onto a 10% Bis-Tris/SDS polyacrylamide gel. Proteins were transferred to a PVDF membrane and probed with an α-CAT antibody.

[0077] Figure 3 shows a Western blot analysis of extracts from the DG44 cells transfected with wild type (CAT) and modified CAT expression constructs. Cell extracts were fractionated on 10% Bis-Tris gels in 1 x MOPS/SDS, transferred to PVDF membrane and probed with an anti-CAT antibody. Experiments were performed in triplicate with extracts from cells in which transfection efficiency was the same.

[0078] Comparisons were made between three transfections with the wild type (CAT) and three with the modified CAT. The results showed that the amount of CAT protein (above the 19 kDa marker) is substantially increased in DG44 cells transfected with the mCAT construct. Modification of the CAT ORF by eliminating multiple translation initiation sites within the resulting mRNA transcripts demonstrated that this technology may be of practical use in numerous organisms besides just mammalian and bacterial cells.

Secreted Proteins

[0079] The usefulness of this technology was also investigated with secreted proteins. Mammalian expression constructs were generated for a signal peptide that is encoded within the Homo sapiens CD5 molecule (CD5) mRNA.
Mammalian expression constructs were generated in which transcription was driven by a CMV promoter and where the cd5 signal peptide was placed at the 5' end of the ORF that encodes a light chain of an antibody against the thyroglobulin protein (cd5-1, SEQ ID NO: 123). The CD5 signal peptide sequence contains 7 potential initiation codons including 3 ATG, 1 TTG and 3 CTG codons. A series of expression constructs was generated. In one variation, ATG codons in the cd5 signal peptide were changed to ATC codons resulting in a Methionine to Isoleucine substitution (cd5-2, SEQ ID NO: 124). In another variation, CTG codons in the cd5 signal peptide were changed to CTC (cd5-3, SEQ ID NO: 125). In another variation, ATG codons were mutated to ATC codons resulting in a Methionine (M) to Isoleucine (I) amino acid substitution, and CTG codons were changed to CTCs (cd5-4, SEQ ID NO: 126). In another variation, ATG codons were changed to ATC codons resulting in a Methionine (M) to Isoleucine (I) amino acid substitution, CTG codons were changed to CTC codons, and the context of the initiator AUG was improved by changing the codon 3' of it from CCC to GCT resulting in a Proline (P) to Arginine (R) amino acid substitution (cd5-5, SEQ ID NO: 127).

[0080] These constructs were then tested by transient transfection into Chinese Hamster Ovary (DG44) cells. In brief, 0.5 µg of each expression construct along with 20 ng of a co-transfection control plasmid that expresses the β-galactosidase reporter protein (pCMVβ, Clontech) was transfected into 100,000 DG44 cells using the Fugene 6 (Roche) transfection reagent according to the manufacturer's instructions. Twenty-four hours post transfection, cells were lysed using 250 µl of lysis buffer. Lac Z reporter assays were performed to ensure equal transfection efficiencies between samples. 30 µl of supernatant was added to 10 µl of a 4 x SDS gel loading buffer. The sample was heated at 72°C for 10 minutes and loaded onto a 10% Bis-Tris/SDS polyacrylamide gel.
Proteins were transferred to a PVDF membrane and probed with an α-kappa light chain antibody.

[0081] Figure 4 shows a Western blot analysis of supernatant from DG44 cells transfected with the wild type (cd5-1) and modified cd5 signal peptide α-thyroglobulin light chain expression constructs (cd5-2 to cd5-5). Cell extracts were fractionated on 10% Bis-Tris gels in 1 x MOPS/SDS, transferred to PVDF membrane and probed with an α-kappa light chain antibody. Experiments were performed with supernatant from cells in which transfection efficiency was the same. The results show that the levels of the secreted antibody light chain product (above 28 kDa) in the supernatant of cells were substantially increased for the expression construct lacking CTG codons in the signal peptide (cd5-3). The expression construct lacking CTG and ATG codons and with improved nucleotide context around the authentic initiation codon in the signal peptide (fully rescued) also had levels of protein product in the supernatant that were substantially increased.

[0082] Thy-1 Variable Light chain ORF containing light chain signal peptide 1 (SEQ ID NO: 128) contains 104 potential initiation codons including 8 ATG, including the authentic initiation codon, 15 ATC, 6 ACG, 14 GTG, 4 TTG, 26 CTG, 16 AGG, 10 AAG, 3 ATA, and 2 ATT codons. Modifications were made in the signal peptide in which AUG and CUG codons were removed (SEQ ID NO: 129). Thy-1 Variable Light chain ORF containing light chain signal peptide 2 (SEQ ID NO: 130) contains 104 potential initiation codons including 7 ATG, including the authentic initiation codon, 16 ATC, 6 ACG, 13 GTG, 4 TTG, 27 CTG, 15 AGG, 10 AAG, 4 ATA, and 2 ATT codons. Thy-1 Variable Heavy chain ORF containing heavy chain signal peptide 1 contains 225 potential initiation codons including 18 ATG, including the authentic initiation codon, 14 ATC, 18 ACG, 42 GTG, 7 TTG, 43 CTG, 43 AGG, 33 AAG, 5 ATA,
and 2 ATT codons (SEQ ID NO: 131). Modifications were made in the signal peptide by removing an AUG and a CUG codon (SEQ ID NO: 132). Thy-1 Variable Heavy chain ORF containing heavy chain signal peptide 2 contains 227 potential initiation codons including 18 ATG, including the authentic initiation codon, 14 ATC, 18 ACG, 43 GTG, 9 TTG, 41 CTG, 43 AGG, 33 AAG, 5 ATA, and 3 ATT codons (SEQ ID NO: 133).

[0083] Thy-1 Variable Light chain ORF in which the signal peptide is replaced with the CD5 signal peptide (SEQ ID NO: 137) contains 104 potential initiation codons including 8 ATG, including the authentic initiation codon, 15 ATC, 6 ACG, 13 GTG, 5 TTG, 27 CTG, 14 AGG, 10 AAG, 3 ATA, and 2 ATT codons. A modification was made in which the ATG codons were changed to ATC codons that resulted in a Methionine (M) to Isoleucine (I) amino acid substitution (SEQ ID NO: 138). A modification was also made in which the CTG codons were changed to CTC codons (SEQ ID NO: 139). Another modification was made in which the ATG codons were mutated to ATC codons that resulted in a Methionine (M) to Isoleucine (I) amino acid substitution and CTG codons were changed to CTC codons (SEQ ID NO: 140). Another modification was made in which ATG codons were changed to ATC codons resulting in a Methionine (M) to Isoleucine (I) amino acid substitution, CTG codons were changed to CTC codons, and the context of the initiator AUG was improved by changing the codon 3' of it from CCC to GGT resulting in a Proline (P) to Arginine (R) amino acid substitution (SEQ ID NO: 141).

[0084] Signal peptides from other organisms were mutated as well (see Table 1). DNA sequences for signal peptides that function in yeast and mammalian cells were analyzed and mutated to create mutated versions (SEQ ID NOS: 145-156). It should be appreciated that in signal peptides, which are cleaved off of the protein, in-frame ATG codons can be mutated, e.g. to ATT or ATC, to encode Isoleucine, which is another hydrophobic amino acid.
DNA constructs can be generated that contain these signal sequences fused in frame with a light chain from a human monoclonal antibody. Upon expression in different organisms (such as the yeast Pichia pastoris and mammalian cell lines), protein gel and Western assays can be used to check the expression level of human light chain antibody.

Table 1: DNA sequences for signal peptides that function in yeast and mammalian cells.

SEQ ID NO | Organism / signal sequence | DNA sequence
145 | Pichia pastoris / Kar2 signal sequence | ATG/CTG/TCG/TTA/AAA/CCA/TCT/TGG/CTr/ACT/TTG/GCG/GCA/TTA/ATG/TAT/GCC/ATG/CTA/TTG/GTC/GTA/GTG/CCA/TTT/GCT/AAA/CCT/GTT/AGA/GCT
146 | Pichia pastoris / Kar2 signal sequence rescue version | ATG/CTC/TCG/TTA/AAA/CCA/TCT/TGG/CTC/ACT/TTG/GCG/GCA/TTA/ATT/TAC/GCC/ATC/CTA/TTG/GTC/GTA/GTG/CCA/TTT/GCT/AAA/CCC/GTT/AGA/GCT
147 | Chicken / lysozyme signal sequence | ATG/CTG/GGT/AAG/AAG/GAC/CCA/ATG/TGT/CTT/GTT/TTG/GTC/TTG/TTG/GGA/TTG/ACT/GCT/TTG/TTG/GGT/ATC/TGT/CAA/GGT
148 | Chicken / lysozyme signal sequence rescue version | ATG/CTC/GGT/AAG/AAC/GAC/CCA/ATT/TGT/CTT/GTT/TTG/GTC/TTG/TTG/GGA/TTG/ACC/GCT/TTG/TTG/GGT/ATT/TGT/CAA/GGT
149 | Human / G-CSF-R signal sequence | ATG/AGG/CTG/GGA/AAC/TGC/AGC/CTG/ACT/TGG/GCT/GCC/CTG/ATC/ATC/CTG/CTG/CTC/CCC/GGA/AGT/CTG/GAG
150 | Human / G-CSF-R signal sequence rescue version | ATG/AGG/CTT/OGA/AAT/TGTAOC/CTIC/AC T/TGG/GCC/GCC/CTC/ATC/ATC/CTC!CTT C TCCC/GG AJAGT/CTC/GAG
151 | Human / calcitonin receptor precursor signal sequence | ATG/AGG/ACA/TT/ACA/ACCGGTGC TT G/GCA/CTG/TTT/CTTICTT/CTA/AAT/CAC/CC A/ACC/CCA/'ATT/CTT/CCT17 G
152 | Human / calcitonin receptor precursor signal sequence rescue version | ATG/AGO/ACA/TTT/ACA/AGC CT/TGCTT G/GCA/CTC/TTT/CTT/CTT/CTA/AAT/CAC/CC A/ACC/CCA/ATT/CTT/CCC/G
153 | Human / cell adhesion molecule 3 precursor (Immunoglobulin superfamily member 4B) signal sequence | TGne/GCC GOCCCC/TCGCTC/CTG/CTC /CT&/CTC/CTG/CTGITTC/GCC/TGC/TCC/TGG /GCG/CCC/GGCIGGO/GCC
154 | Human / cell adhesion molecule 3 precursor (Immunoglobulin superfamily member 4B) signal sequence rescue version | /ATG/GCC/CCAiGCC/GCC/TCG/CTC/CTT/CTC /CTT/CTC/CTT/CTC/TTTIGCT/TOT/TGT/TGO /GCG/CCC/GGC/GGG/GCC
155 | Human / HLA class I histocompatibility antigen signal sequence | /ATG/GTC/GCG/CCC/CGAIACC/CTC!CTC/CTG CTA/CTC/TCGGOG/GCC/CTG/OCC/CTC/AC C/CAG/AC§I/TGG/GCG
156 | Human / HLA class I histocompatibility antigen signal sequence rescue version | A TO/GTC/OCG/CCC/CGAIACC/GTC/CTC/CTT /CTT/CTC/TCG/GCG/GCC/CTC/OCC/CTT/AC C/GAG/ACT/TGG/GCC

HcRed1

[0085] HcRed1 encodes a far-red fluorescent protein whose excitation and emission maxima occur at 558 nm and 618 nm +/- 4 nm, respectively. HcRed1 was generated by mutagenesis of a non-fluorescent chromoprotein from the reef coral Heteractis crispa. The HcRed1 coding sequence was subsequently human codon optimized for higher expression in mammalian cells. This ORF contains 99 potential initiation codons including 9 ATG, including the authentic initiation codon, 8 ATC, 12 ACG, 16 GTG, 21 CTG, 18 AGG, and 15 AAG codons (SEQ ID NO: 134). Full and partial modifications of the HcRed1 ORF were generated (SEQ ID NOS: 135 and 136, respectively).

Erythropoietin (EPO)

[0086] Human erythropoietin (EPO) is a valuable therapeutic agent. Using methods described herein, the mRNA sequence that encodes this protein (provided below and available as GenBank Accession No. NM_000799) is optimized to eliminate multiple translation initiation sites within this mRNA transcript.
[0087] An exemplary human erythropoietin (EPO) protein is encoded by the following mRNA transcript, wherein the sequence encoding the mature peptide is underlined, all potential translation initiation start sites within all three reading frames are bolded, the canonical initiator codon corresponding to methionine is capitalized, and uracil (u) is substituted for thymidine (t) (SEQ ID NO: 111):

[Sequence of SEQ ID NO: 111 is not legible in this copy.]

[0088] [...] which are encoded by only one codon (aug/atg and ugg/tgg, respectively), [...] substitution replaces the sequence encoding methionine or tryptophan with a sequence encoding an amino acid of similar physical properties. Physical properties that are considered important when making conservative amino acid substitutions include, but are not limited to, side chain geometry, size, and branching; hydrophobicity; polarity; acidity; aromatic versus aliphatic structure; and Van der Waals volume. For instance, the amino acids leucine or isoleucine can be substituted for methionine because these amino acids are all similarly hydrophobic, non-polar, and occupy equivalent Van der Waals volumes. Thus, a substitution of leucine or isoleucine for methionine would not affect protein folding.
Leucine is a preferred amino acid for methionine substitution. Alternatively, the amino acids tyrosine or phenylalanine can be substituted for tryptophan because these amino acids are all similarly aromatic, and occupy equivalent Van der Waals volumes.

[0089] The following sequence is an example of a modified mRNA transcript encoding human erythropoietin (EPO), wherein all potential translation initiation start sites upstream of the demonstrated initiator methionine (encoded by nucleotides 182-184), and those potential translation initiation start sites downstream of the demonstrated initiator methionine within the coding region, are mutated (mutations in italics) (SEQ ID NO: 113).

[Sequence of SEQ ID NO: 113 is not legible in this copy.]

[0090] The unmodified open reading frame for erythropoietin contains 88 potential initiation codons (8 ATG, including the authentic initiation codon, 5 ATC, 4 ACG, 7 GTG, 3 TTG, 32 CTG, 14 AGG, 10 AAG, 3 ATA, and 2 ATT codons) (SEQ ID NO: 112). Modifications were made including a modified signal peptide by removal of most of the potential initiation codons (SEQ ID NO: 116), removal of ATG and CTGs of the signal peptide (SEQ ID NO: 117), and removal of ATG,
CTG and AC codons resulting in a Glutamic acid (E) to Giutamine (Q) amino acid substitution (SEQ ID NO: 118) or a Histidine (H) to Argenine (R) amino acid substitution(SEQ ID NO: 119) [0091] MicroRNA (miRNA) binding to target mRNA transcripts decreases translation efficiency by either inducing degradation of the target mRNA transcript, or by preventing translation of the target mRNA transcript. To improve translation efficiency of mRNA transcripts encoding commercially-valuable human proteins, all known or predicted miRNA binding sites within a target mRNA's 5 leader sequence, 5 untranslated region (UTR) sequence, coding sequence, and 3? untranslated region (UTR) sequence are first identified, and secondly mutated or altered in order to inhibit miRNA binding. [00923 In a preferred aspect of this method, the seed sequence, comprising the first eight 5'- nucleotides of the mature niRNA sequence is specifically targeted. Seed sequences either include 5' nucleotides 1-7 or 2-Sof the mature miRNA sequence. Thus, a seed sequence, for the purposes of this method, encompasses both alternatives. The miRNA seed sequence is functionally significant because it is the only portion of the nnRNA which binds according to Watson-Crick base-pairing rules. Without absolute complementarity of binding within the seed sequence region of the miRNA, binding of the miRNA to its target mRNA does not occur-However, unlike most nucleotide pairings, the seed sequence of a n-iRNA is capable of pairing with a target mRNA such that a guanine nucleotide pairs with a uracil nucleotide, known as the G:U wobble. [0093] For example, human erythropoietin (EPO) is a valuable therapeutic agent tht has been difficult to produce in sufficient quantities., Using the instant methods, the sequence of the mRNA sequence that encodes this protein (GenlBank Accession No. 29 NM 000799) is optimized to inhibit miRNA downregulation. 
The PicTar Web Interface (publicly available at pictar.mdc-berlin.de/cgi-bin/PicTar_vertebrate.cgi) predicted that human miRNAs hsa-miR-328 and hsa-miR-122a targeted the mRNA encoding for human EPO (the mature and seed sequences of these miRNAs are provided below in Table 2). Thus, in the case of hsa-miR-122a, for instance, having a seed sequence of uggagugu, one or more nucleotides are mutated such that hsa-miR-122a no longer binds, and the seed sequence of another known miRNA is not created. One possible mutated hsa-miR-122a seed sequence that should prevent binding is "uagagugu." It is unlikely that this mutated seed sequence belongs to another known miRNA because this sequence is not represented, for instance, within Table 2 below.
[0094] Similarly, the PicTar Web Interface predicted that human miRNAs hsa-miR-149, hsa-let7, hsa-let7c, hsa-let7b, hsa-let7g, hsa-let7a, hsa-miR-98, hsa-let7i, hsa-let7e and hsa-miR-26b targeted the mRNA encoding for human interferon beta 2 (also known as IL-6, GenBank Accession No. NM_000600) (the mature and seed sequences of these miRNAs are provided below in Table 2).
[0095] MiRNA binding sites can also be identified by entering any sequence of less than 1000 base pairs into the Sanger Institute's MiRNA:Sequence database (publicly available at microrna.sanger.ac.uk/sequences/search.shtml).
Table 2: Known Human MiRNAs, mature sequences and seed sequences
MiRNA Mature Sequence SEQ ID Seed Sequence
[The mature and seed sequences of Table 2 are not legible in the source text. The rows whose miRNA name and SEQ ID NO remain legible are listed below as miRNA name followed by SEQ ID NO.]
hsa-miR-135b 34
hsa-miR-138 37
hsa-miR-139 38
hsa-miR-143 42
hsa-miR-144 43
hsa-miR-145 44
hsa-miR-146 45
hsa-miR-147 46
hsa-miR-148a 47
hsa-miR-148b 48
hsa-miR-154 54
hsa-miR-15a 56
hsa-miR-205 92
hsa-miR-206 93
hsa-miR-208 94
hsa-miR-210 96
hsa-miR-211 97
hsa-miR-212 98
hsa-miR-213 (hsa-miR-181a) 61
hsa-miR-216 101
hsa-miR-218 103
hsa-miR-219 104
hsa-miR-22 105
hsa-miR-220 106
hsa-miR-221 107
hsa-miR-222 108
hsa-miR-223 109
hsa-miR-224 110
hsa-miR-26b 114
[0096] The miR-183 binding sequence (SEQ ID NO: 59) was mutated (SEQ ID NO: 142) and embedded into the coding sequence of a reporter gene, such as a CAT gene that also contains a FLAG Tag (SEQ ID NO: 143). This allows for the evaluation of expression in cells by Western blot analyses using an anti-FLAG Tag antibody in which mutations of the miR-183 binding sequence were made (SEQ ID NO: 144).
[0097] The discussion of documents, acts, materials, devices, articles and the like is included in this specification solely for the purpose of providing a context for the present invention. It is not suggested or represented that any or all of these matters formed part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
[0098] Throughout the description and claims of this specification, the word "comprise" and variations of the word, such as "comprising" and "comprises", is not intended to exclude other additives, components, integers or steps.
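By way of illustration only, the inventory of potential initiation codons described for the erythropoietin open reading frame (ATG together with ATC, ACG, GTG, TTG, CTG, AGG, AAG, ATA and ATT) can be sketched as a scan over every position of a sequence, so that all three reading frames are covered. The function names below are hypothetical and not part of the specification, and it is an assumption that the counts reported in the description were obtained by a scan of this kind.

```python
from collections import Counter

# ATG plus the nine codons that differ from ATG at exactly one position.
# This is the codon set named in the description; the names used here
# (POTENTIAL_INITIATION_CODONS and the two functions) are illustrative only.
POTENTIAL_INITIATION_CODONS = {
    "ATG", "ATC", "ACG", "GTG", "TTG", "CTG", "AGG", "AAG", "ATA", "ATT",
}


def find_potential_initiation_codons(seq):
    """Return (position, frame, codon) for every potential initiation codon.

    The sequence may be written as RNA or DNA, in upper or lower case.
    Positions are 0-based; frame is position modulo 3.
    """
    dna = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(dna) - 2):
        codon = dna[i:i + 3]
        if codon in POTENTIAL_INITIATION_CODONS:
            hits.append((i, i % 3, codon))
    return hits


def count_by_codon(seq):
    """Tally potential initiation codons by codon identity."""
    return Counter(codon for _, _, codon in find_potential_initiation_codons(seq))
```

Applied to a full transcript, the tally from `count_by_codon` could be compared against the per-codon counts given in the description; any difference would indicate that a different counting convention (for example, a single reading frame) was used.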

Claims (9)

1. A method of improving full-length protein expression efficiency, the method comprising: a) providing a polynucleotide sequence comprising a 5' leader sequence, a coding sequence for the protein, a 3' untranslated region, and one or more miRNA binding sites; and b) mutating the one or more miRNA binding sites, wherein the mutation results in a decrease in miRNA binding at the one or more miRNA binding sites resulting in a reduction of miRNA-mediated down regulation of protein translation, thereby increasing full-length protein expression efficiency.
2. The method of claim 1, wherein mutating the one or more miRNA binding sites comprises mutating one or more nucleotides such that the amino acid sequence remains unaltered.
3. The method of claim 1 or claim 2, wherein mutating the one or more miRNA binding sites comprises mutating one or more nucleotides in a miRNA seed sequence.
4. The method of any one of claims 1 to 3, wherein mutating the one or more miRNA binding sites comprises mutating one or more nucleotides such that initiation codons are not introduced into the polynucleotide sequence.
5. The method of any one of claims 1 to 4, wherein mutating the one or more miRNA binding sites comprises mutating one or more nucleotides such that rare codons are not introduced into the polynucleotide sequence.
6. The method of any one of claims 1 to 5, wherein mutating the one or more miRNA binding sites comprises mutating one or more nucleotides such that additional miRNA seed sequences are not introduced into the polynucleotide sequence.
7. The method of any one of claims 1 to 6, wherein the one or more miRNA binding sites is located within the coding sequence.
8. The method of any one of claims 1 to 6, wherein the one or more miRNA binding sites is located within the 3' untranslated region.
9. The method of any one of claims 1 to 6, wherein the one or more miRNA binding sites is located within the 5' leader sequence.
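As a non-limiting sketch of how the constraints of claims 2 to 4 might be combined, the following Python example searches a coding sequence for a site complementary to a miRNA seed (nucleotides 1-7 or 2-8 of the mature miRNA) and tries single synonymous codon swaps until the site is removed without increasing the number of potential initiation codons. All function names are hypothetical. The sketch assumes strict Watson-Crick pairing (G:U wobble pairs are ignored), considers only one codon swap at a time, and does not check rare codons or the seed sequences of other miRNAs (claims 5 and 6).

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# ATG plus the nine codons that differ from ATG at one position.
INITIATION_CODONS = {
    "ATG", "ATC", "ACG", "GTG", "TTG", "CTG", "AGG", "AAG", "ATA", "ATT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq):
    """Normalise RNA or DNA text to upper-case DNA letters."""
    return seq.upper().replace("U", "T")


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    cds = to_dna(cds)
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def seed_target_sites(mature_mirna):
    """Return the target sites complementary to seed nt 1-7 and nt 2-8."""
    m = to_dna(mature_mirna)
    return {m[s:s + 7].translate(COMPLEMENT)[::-1] for s in (0, 1)}


def count_initiation_codons(seq):
    """Count potential initiation codons at every position (all frames)."""
    return sum(seq[i:i + 3] in INITIATION_CODONS for i in range(len(seq) - 2))


def remove_seed_site(cds, mature_mirna):
    """Return a synonymous variant of cds lacking the seed target sites.

    Returns the input unchanged if no site is present, or None if no single
    synonymous codon swap satisfies the constraints.
    """
    cds = to_dna(cds)
    sites = seed_target_sites(mature_mirna)

    def has_site(seq):
        return any(site in seq for site in sites)

    if not has_site(cds):
        return cds
    baseline = count_initiation_codons(cds)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        for alt, aa in CODON_TABLE.items():
            if alt == codon or aa != CODON_TABLE[codon]:
                continue  # keep the encoded amino acid unchanged (claim 2)
            candidate = cds[:i] + alt + cds[i + 3:]
            # Seed match must be gone (claim 3) and no potential
            # initiation codon may be added in any frame (claim 4).
            if not has_site(candidate) and count_initiation_codons(candidate) <= baseline:
                return candidate
    return None
```

A caller would pass the in-frame coding sequence and a mature miRNA sequence (or at least its first eight nucleotides) and compare `translate` of the input and output to confirm that the encoded protein is unchanged. A fuller implementation would presumably also screen candidates against a codon-usage table and against the seed sequences of other known miRNAs.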
AU2015203546A 2009-02-24 2015-06-25 Reengineering mrna primary structure for enhanced protein production Abandoned AU2015203546A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2015203546A AU2015203546A1 (en) 2009-02-24 2015-06-25 Reengineering mrna primary structure for enhanced protein production

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US61/155,049 2009-02-24
AU2010218388A AU2010218388B2 (en) 2009-02-24 2010-02-24 Reengineering mRNA primary structure for enhanced protein production
AU2015203546A AU2015203546A1 (en) 2009-02-24 2015-06-25 Reengineering mrna primary structure for enhanced protein production

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU2010218388A Division AU2010218388B2 (en) 2009-02-24 2010-02-24 Reengineering mRNA primary structure for enhanced protein production

Publications (1)

Publication Number Publication Date
AU2015203546A1 true AU2015203546A1 (en) 2015-07-16

Family

ID=53673666

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2015203546A Abandoned AU2015203546A1 (en) 2009-02-24 2015-06-25 Reengineering mrna primary structure for enhanced protein production

Country Status (1)

Country Link
AU (1) AU2015203546A1 (en)

Similar Documents

Publication Publication Date Title
JP5735927B2 (en) Re-engineering the primary structure of mRNA to enhance protein production
US11976308B2 (en) CRISPR DNA targeting enzymes and systems
CN113728098A (en) Enzymes with RUVC domains
KR20210042130A (en) ACIDAMINOCOCCUS SP. A novel mutation that enhances the DNA cleavage activity of CPF1
WO2019206233A1 (en) Rna-edited crispr/cas effector protein and system
KR102358375B1 (en) Method of establishing modified host cell
US20220282283A1 (en) Novel crispr dna targeting enzymes and systems
CA3173526A1 (en) Rna-guided genome recombineering at kilobase scale
AU2015203546A1 (en) Reengineering mrna primary structure for enhanced protein production
WO2023081762A2 (en) Serine recombinases
CN118139979A (en) Enzymes with HEPN domains
JP2015180203A (en) REENGINEERING mRNA PRIMARY STRUCTURE FOR ENHANCED PROTEIN PRODUCTION
CA3207525A1 (en) Compositions comprising a variant cas12i4 polypeptide and uses thereof
AU2013201458B2 (en) COMPOSITIONS AND METHODS RELATED TO mRNA TRANSLATIONAL ENHANCER ELEMENTS
JP2022528252A (en) Protein translation using circular RNA and its applications
CN110804626B (en) Method for constructing high-efficiency expression vector by combining high CG segment and low CG promoter
EP4303310A1 (en) Method for producing cas3 protein
CN107365770B (en) siRNA sequence aiming at 174-194 site of coding region 174 of autophagy-related gene Beclin1 and application thereof
CN107873056A (en) Novel clock reaches modulability RNA molecule and application thereof
CN115491377A (en) Nucleotide sequence for inducing RNA interference, reducing and eliminating virus pollution in cells and application
Go et al. A genetic system for RNase E variant-controlled overproduction of ColE1-type plasmid DNA
CN110863008A (en) Method for constructing high-efficiency expression vector by using MAR sequence to regulate weak promoter

Legal Events

Date Code Title Description
MK1 Application lapsed section 142(2)(a) - no request for examination in relevant period