WO2012039954A2 - Reliable stabilization of n-linked polypeptide native states with enhanced aromatic sequons located in polypeptide tight turns - Google Patents

Info

Publication number
WO2012039954A2
Authority
WO
WIPO (PCT)
Prior art keywords
asn
amino acid
thr
sequence
turn
Prior art date
Application number
PCT/US2011/050900
Other languages
French (fr)
Other versions
WO2012039954A3 (en)
Inventor
Jeffery W. Kelly
Joshua L. Price
Elizabeth K. Culyba
Evan T. Powers
Original Assignee
The Scripps Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Scripps Research Institute
Publication of WO2012039954A2
Publication of WO2012039954A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/575 Hormones
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/67 General methods for enhancing the expression

Definitions

  • the present invention was made with
  • N-Glycosylation can increase the stability of proteins; however, the molecular basis for this enhanced stability is incompletely understood.
  • the enzyme oligosaccharyl transferase (OST) attaches the highly conserved Glc3Man9GlcNAc2 (where Glc is glucose, Man is mannose, and GlcNAc is N-acetylglucosamine) glycan (oligosaccharide) en bloc to the N atom of the Asn side chain in a subset of Asn-Xxx-Thr/Ser sequons [Kornfeld et al., Annu Rev Biochem 54, 631-664
  • N-linked glycans have important extrinsic effects on folding in the ER by allowing
  • N-glycans can also have intrinsic effects on protein folding by
  • a β-turn or reverse turn contains a sequence of four consecutive amino acid residues that are designated i, i+1, i+2 and i+3, in the direction from N-terminus toward C-terminus of the polypeptide.
  • the five residues of an α-turn are designated i, i+1, i+2, i+3 and i+4.
  • the β-turns are usually described as orienting structures because they orient α-helices and β-sheets, indirectly defining the topology of proteins. They are one of the most abundant
  • Types I, I', II, III, IV, V and VI are the most common reverse turns, the essential difference between them being the orientation of the peptide bond between residues at i+1 and i+2.
  • Thr/Ser where Aro is an aromatic amino acid residue such as histidine, phenylalanine, tyrosine or
  • n is zero, 1, 2, 3 or 4
  • Xxx is an amino acid residue other than an aromatic residue
  • p is zero or one
  • Zzz is any amino acid residue
  • Asn is asparagine
  • Yyy is any amino acid residue other than proline
  • Thr/Ser is one or the other of the amino acid residues threonine and serine
  • RnCD2ad glycosylation-naive rat CD2 adhesion domain
  • AcyP2 human muscle acylphosphatase
  • a chimeric therapeutic polypeptide of a pre-existing therapeutic polypeptide is contemplated. Such a therapeutic chimeric polypeptide is often present in isolated and purified form.
  • the pre-existing therapeutic polypeptide has a length of about 15 to about 1000, preferably about 25 to about 500, and more preferably about 35 to about 300, amino acid residues, and exhibits a secondary structure that comprises at least one tight turn containing a sequence of four to about seven amino acid residues in which at least two amino acid side chains extend on the same side of the tight turn and are within less than about 7 Å of each other.
  • the pre-existing therapeutic polypeptide lacks the sequon, in the direction from left to right and from N-terminus to C-terminus, Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser within that sequence of four to about seven amino acid residues.
  • Aro is an aromatic amino acid residue such as histidine, phenylalanine, tyrosine or tryptophan
  • n is zero, 1, 2, 3 or 4
  • Xxx is an amino acid residue other than an aromatic residue
  • p is zero or one
  • Zzz is any amino acid residue
  • Asn is asparagine
  • Yyy is any amino acid residue other than proline
  • Thr/Ser is one or the other of the amino acid residues threonine and serine.
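The sequon definition above can be sketched as a pattern match over a one-letter amino acid sequence. This is only an illustrative sketch, not a method stated in the document: the function name `find_enhanced_aromatic_sequons`, the use of one-letter codes, and the restriction to the 20 standard residues are assumptions made here.

```python
import re

ALL = "ACDEFGHIKLMNPQRSTVWY"      # 20 standard residues (assumption)
AROMATIC = "FHWY"                 # Phe, His, Trp, Tyr
NON_AROMATIC = "".join(a for a in ALL if a not in AROMATIC)
NON_PROLINE = ALL.replace("P", "")

# Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser with n = 0..4 and p = 0 or 1.
# The lookahead allows candidates that overlap one another to be reported;
# at most one match is reported per aromatic start position.
SEQUON = re.compile(
    "(?=([%s][%s]{0,4}[%s]?N[%s][ST]))" % (AROMATIC, NON_AROMATIC, ALL, NON_PROLINE)
)

def find_enhanced_aromatic_sequons(sequence):
    """Return (start_index, matched_text) for each candidate sequon."""
    return [(m.start(), m.group(1)) for m in SEQUON.finditer(sequence.upper())]
```

A match only shows that the residue pattern is present. Whether the residues sit in a tight turn with their side chains on the same side is a structural question that this sketch does not address.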
  • a contemplated chimeric therapeutic polypeptide has the same length, at least one tight turn and substantially the same amino acid residue sequence as the pre-existing therapeutic polypeptide.
  • the two sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser [SEQ ID NO: 1
  • "n" is 1 and "p" is 1 and the chimeric polypeptide contains a Type II β-turn in a six-residue loop.
  • "n" is 1 and "p" is zero.
  • the two polypeptide sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Xxx-Asn-Yyy-Thr/Ser as defined above.
  • the chimeric polypeptide preferably contains a five-residue type I β-bulge turn.
  • "n" is zero and "p" is zero.
  • the two sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Asn-Yyy-Thr/Ser as defined above.
  • a preferred chimeric polypeptide contains a four-residue type I' β-turn.
  • the therapeutic chimeric polypeptide, when the sequon is glycosylated, exhibits a folding stabilization enhancement of about -0.5 to about -4 kcal/mol compared to the before-mentioned pre-existing therapeutic polypeptide in non-glycosylated form.
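To give a feel for the stated range of about -0.5 to about -4 kcal/mol, the standard relation between a change in folding free energy and the folding equilibrium constant, K_chimer / K_pre-existing = exp(-ΔΔG_f / RT), can be evaluated numerically. The sketch below is illustrative only; the default temperature of 298.15 K and the function name are assumptions, not values taken from the document.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def equilibrium_fold_change(ddg_kcal_per_mol, temperature_k=298.15):
    """Factor by which the folding equilibrium constant changes for a given ddG_f.

    A negative ddG_f (stabilization) gives a factor greater than 1.
    """
    return math.exp(-ddg_kcal_per_mol / (R_KCAL * temperature_k))
```

Under these assumptions, -0.5 kcal/mol corresponds to a roughly 2.3-fold shift of the folding equilibrium toward the native state, and -4 kcal/mol to a shift of several hundred-fold.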
  • substantially any and every therapeutic polypeptide that contains a tight turn in its secondary structure is contemplated herein.
  • substantially all of the Fc portions of human IgG antibodies contain one or two tight turn sequences to which the present invention can be applied. One of those sequences is often glycosylated, whereas the other is not glycosylated.
  • the sequon has the sequence, in the direction from left to right and from N-terminus to C-terminus, -Lys-(Zzz)m-Aro-
  • a method of enhancing folded stabilization of a therapeutic polypeptide is also contemplated.
  • a contemplated therapeutic polypeptide has a sequence of about 15 to about 1000, preferably about 25 to about 500, and more preferably about 35 to about 300, amino acid residues, and exhibits a secondary
  • a therapeutic chimeric polypeptide is
  • That therapeutic chimeric polypeptide has the same length and substantially same amino acid sequence as the therapeutic polypeptide, and exhibits a secondary structure containing at least one tight turn at the same sequence position within the tight turn of the therapeutic polypeptide except that the sequence of preferably glycosylation-free four to about seven amino acid residues is replaced with the sequon, in the direction from left to right and from N-terminus to C-terminus, Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser, wherein Aro is an aromatic amino acid residue, n is zero, 1, 2, 3 or 4, Xxx is an amino acid residue other than an aromatic residue, p is zero or one, Zzz is any amino acid residue, Asn(Glycan) is glycosylated asparagine, Yyy is any amino acid residue other than proline, Thr/Ser is one or the other of the amino acid residues threonine and serine, and the side chains
  • a therapeutic chimeric polypeptide is prepared by expressing a nucleic acid sequence that encodes the polypeptide sequence of the therapeutic chimeric polypeptide in a host cell that glycosylates the amino acid residue sequence Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser when present in a polypeptide sequence expressed therein to form a polypeptide containing the amino acid residue
  • polypeptide is prepared by in vitro peptide
  • Another embodiment is a pharmaceutical composition that comprises an effective amount of a before-discussed chimeric therapeutic polypeptide dissolved or dispersed in a pharmaceutically
  • That pharmaceutical composition typically also contains water, at least when administered.
  • the present invention has several benefits and advantages.
  • One benefit is that the folding of a therapeutic polypeptide can be made thermodynamically more stable by preparing a glycosylated chimer whose amino acid residue sequence is almost identical to that of the therapeutic polypeptide.
  • An advantage of the invention is that the preparation of a glycosylated chimeric therapeutic polypeptide is readily accomplished.
  • Fig. 1 in four parts illustrates that matching enhanced aromatic sequons with reverse turn hosts can facilitate stabilizing interactions among Phe, Asn(GlcNAc1), and Thr.
  • Fig. 1A shows a space-filling model of the Phe63-Asn65-GlcNAc-Thr67 interaction of a glycosylated five-residue type I β-bulge turn from the adhesion domain of the human protein CD2 [PDB accession code: 1GYA; Wyss et al., Science 269, 1273-1278 (1995)]; Fig. 1B illustrates a Type II β-turn in a six-residue loop [PDB accession code: 1PIN; Ranganathan et al., Cell 89, 875-886 (1997)]; Fig. 1C shows a five-residue type I β-bulge turn [PDB accession code: 2F21; Jager et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006)]; and Fig. 1D illustrates a four-residue type I' β-turn
  • Figs. 1B-1D are from variants of the WW domain of human protein Pin1 having incorporated components of the enhanced aromatic sequon. Structures are rendered in PyMOL (a user-sponsored molecular visualization system on an open-source foundation) with dotted lines depicting hydrogen bonds. Interatomic distances between the side-chain beta carbons (Cβ) in Å are depicted.
  • Fig. 2 in six parts shows in Fig. 2A that residues 63-67 of the RnCD2ad retain the same five-residue type I β-bulge turn geometry found in HsCD2ad but RnCD2ad does not require N-glycosylation to fold;
  • Figs. 2B and 2C show that the stabilities and folding kinetics of the eight RnCD2* sequences required for the thermodynamic cycle were determined by equilibrium denaturation and stopped-flow kinetic studies;
  • Fig. 2D is a western blot showing that the relative ratio of N-glycosylated to non-glycosylated polypeptides from Sf9 insect cells is substantially higher for a RnCD2* variant having a Phe residue in the tight turn relative to a variant that lacks the Phe residue; tabulated data are shown in Fig. 2E (N refers to N-glycosylated Asn); and Fig. 2F illustrates contact of the Phe and Thr side chains with the first GlcNAc of the N-glycan of four polypeptides found in a PDB search of proteins that contain type I β-bulge turns with a Phe at the i position, a glycosylated Asn residue at the i+2 position, and a Thr at the i+4 position.
  • Fig. 3 in four parts illustrates in Fig. 3A that the Thr43Phe (i) and Lys45Asn (i+2) mutations in the β-bulge turn of human muscle acylphosphatase (AcyP2) create an enhanced aromatic sequon in that the i+4 position is already Thr;
  • Fig. 3B shows data from an equilibrium denaturation study for determining folding free energy;
  • Fig. 3C illustrates the
  • Fig. 3D is a western blot showing that the relative ratio of N-glycosylated to non-glycosylated polypeptides from Sf9 insect cells is substantially higher for an AcyP2* variant having a Phe residue in the tight turn relative to a variant that lacks the Phe residue.
  • Fig. 4 in five parts illustrates in Fig. 4A the residues of loop 1 of the 34-residue WW domain from human Pin1 (Pin WW or Pin1 WW), a
  • Fig. 4B shows melting curves of glycosylated (g-WW-F,T) and non-glycosylated (WW-F,T) variants
  • Figs. 4D and 4E show illustrative plots from variable temperature circular dichroism spectroscopy and laser temperature jump studies
  • Fig. 4F tabulates the thermal stability and folding rate data for the eight Pin WW variants studied (SEQ ID NOs: ).
  • Fig. 5 in three parts illustrates triple mutant cycle cubes formed by protein 4, glycoprotein 4g, and their derivatives (Fig. 5A); Protein 5, glycoprotein 5g, and their derivatives (Fig. 5B); and Protein 6, glycoprotein 6g, and their derivatives (Fig. 5C).
  • Fig. 6 is a graph showing the origin of the increase in stability of Pin1 protein derivatives 4-F,T, 5-F,T, and 6-F,T upon glycosylation.
  • ΔΔG_f,total is the sum of the energetic effects of (1) the Asn19 to Asn(GlcNAc)19 mutation (C_N); (2) the two-way interaction between Phe16 and Asn(GlcNAc)19 (C_F,N); (3) the two-way interaction between Asn(GlcNAc)19 and Thr21 (C_N,T); and (4) the three-way interaction between Phe16, Asn(GlcNAc)19, and Thr21 (C_F,N,T). C_N, C_F,N, C_N,T and C_F,N,T are parameters obtained from least-squares regression of Equation A; error bars represent the corresponding standard errors.
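The decomposition described for Fig. 6 can be sketched numerically. Equation A is not reproduced in this excerpt, so the sketch assumes a conventional triple mutant cycle model in which the folding free energy of each of the eight variants is a constant plus one-, two- and three-way terms in indicator variables F, N and T (1 when Phe, the glycan on Asn, or Thr is present; 0 otherwise). With exactly one free energy per cube corner the terms follow from inclusion-exclusion; the least-squares regression with standard errors mentioned in the text is not reproduced here. All function names are hypothetical.

```python
from itertools import product

def cycle_terms(dg):
    """Solve the assumed interaction model from the eight cube corners.

    dg maps (F, N, T) indicator tuples (each 0 or 1) to folding free energies.
    Returns a dict keyed by '0', 'F', 'N', 'T', 'FN', 'FT', 'NT', 'FNT'.
    """
    names = "FNT"
    terms = {}
    for subset in product((0, 1), repeat=3):
        total = 0.0
        # Inclusion-exclusion over every corner contained in `subset`.
        for corner in product((0, 1), repeat=3):
            if all(c <= s for c, s in zip(corner, subset)):
                sign = (-1) ** (sum(subset) - sum(corner))
                total += sign * dg[corner]
        label = "".join(n for n, s in zip(names, subset) if s) or "0"
        terms[label] = total
    return terms

def glycosylation_ddg(terms):
    """ddG_f,total = C_N + C_F,N + C_N,T + C_F,N,T."""
    return terms["N"] + terms["FN"] + terms["NT"] + terms["FNT"]
```

Under this model the sum returned by `glycosylation_ddg` equals the free energy difference between the glycosylated and non-glycosylated variants that both carry Phe and Thr, that is `dg[(1, 1, 1)] - dg[(1, 0, 1)]`.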
  • antibody refers to a molecule that is a member of a family of glycosylated proteins called immunoglobulins, which can specifically bind to an antigen.
  • chimer or “chimeric” is used to describe a polypeptide that is man-made and does not occur in nature.
  • a contemplated chimeric polypeptide is encoded by a nucleotide sequence made by a
  • polypeptide is used herein to denote a sequence of about 15 to about 1000 peptide-bonded amino acid residues. A whole protein as well as a portion of a protein having the stated minimal length is a polypeptide.
  • Tight turn is used herein as defined in Chou, Anal Biochem 286, 1-16 (2000) to mean a polypeptide site where (i) a polypeptide chain reverses its overall direction, and (ii) the amino acid residues directly involved in forming the turn are no more than six.
  • Tight turns are generally categorized as δ-turn, γ-turn, β-turn, α-turn, and π-turn, which are formed by two-, three-, four-, five-, and six-amino-acid residues, respectively. According to the folding mode, each of such tight turns can be further classified into several
  • β-Turns, also known as "reverse turns", are of most interest herein, and of those tight turns, the tight turns referred to as a type-I β-bulge turn, a type-I' β-turn and a type-II β-turn are of particular interest.
  • Methods for predicting the presence of β-turns in polypeptides are provided in the citations of Chou, Anal Biochem 286, 1-16 (2000), and are otherwise well known in the art.
  • the present invention contemplates a therapeutic chimeric polypeptide that is typically present in isolated and purified form, and is a chimer of a pre-existing therapeutic polypeptide.
  • the pre-existing therapeutic polypeptide has a length of about 15 to about 1000, preferably about 25 to about 500, and more preferably about 35 to about 300 amino acid residues.
  • a pre-existing therapeutic polypeptide is a polypeptide used as a pharmaceutical or nutraceutical that is administered to a human or other animal.
  • a contemplated pre-existing therapeutic polypeptide is typically prepared exogenously of the recipient's body, but can be an endogenous polypeptide.
  • a contemplated chimeric therapeutic polypeptide is typically prepared as an exogenous polypeptide, but can be produced endogenously via gene therapy.
  • a contemplated pre-existing therapeutic polypeptide exhibits a secondary structure that comprises at least one tight turn containing a sequence of four to about seven amino acid residues in which at least two amino acid side chains extend on the same side of the tight turn and are within less than about 7 Å of each other.
  • the four to about seven amino acid residues present do not necessarily participate in the formation of the tight turn, but are present in the turn.
  • polypeptide has substantially the same length, at least one tight turn and substantially the same amino acid residue sequence as the pre-existing therapeutic polypeptide.
  • a contemplated chimer is different in its total amino acid sequence from the pre-existing polypeptide, and can be longer or shorter by one to about three residues than the pre-existing therapeutic polypeptide (substantially the same length), but is preferably the same length.
  • the two sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, in the direction from left to right and from N-terminus to C-terminus,
  • Aro is an aromatic amino acid residue such as histidine, phenylalanine, tyrosine or tryptophan, of which phenylalanine, tyrosine and tryptophan are preferred,
  • n is zero, 1, 2, 3 or 4,
  • Xxx is an amino acid residue other than an aromatic residue
  • p is zero or one
  • Zzz is any amino acid residue
  • Yyy is any amino acid residue other than proline
  • Thr/Ser is one or the other of the amino acid residues threonine and serine, of which threonine is preferred.
  • the above sequon is located at the same position in the tight turn as the sequence of four to about seven amino acid residues present in the pre-existing polypeptide such that the side chains of three amino acid residues (Aro, Asn and Thr/Ser) project on the same side of the turn and are within less than about 7 Å of each other.
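The proximity criterion can be sketched as a pairwise distance check. The text does not say which atoms the distance is measured between, so using one representative coordinate per side chain (for example the beta carbon) is an assumption made here, as are the function name and the coordinate format.

```python
import math
from itertools import combinations

def within_cutoff(coords, cutoff=7.0):
    """True if every pair of points in `coords` is closer than `cutoff` angstroms.

    coords maps a residue label to an (x, y, z) tuple in angstroms.
    """
    return all(
        math.dist(a, b) < cutoff
        for a, b in combinations(coords.values(), 2)
    )
```

For example, `within_cutoff({"Aro": (0, 0, 0), "Asn": (3, 4, 0), "Thr/Ser": (0, 4, 3)})` checks the three pairwise distances among hypothetical coordinates. Whether the side chains project on the same side of the turn is a separate geometric question that this check does not cover.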
  • sequence of four to about seven amino acid residues present in the pre-existing polypeptide is preferably glycosylation-free.
  • the above sequon is glycosylated.
  • the therapeutic chimeric polypeptide exhibits a folding stabilization
  • residues Xxx, Yyy and Zzz be other than cysteine.
  • Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser sequence is referred to herein as an "enhanced aromatic sequon" because of its increased propensity to form a stabilizing compact structure upon
  • OST oligosaccharyl transferase
  • a glycan bonded to the amido nitrogen of an asparagine side chain is illustrated herein as "Asn(Glycan)" to denote any glycan.
  • During the translocation of a glycosylated polypeptide through the endoplasmic reticulum (ER), several sugars including each glucose (Glc) and several of the mannose (Man) groups are removed from the glycan portion.
  • the specific resulting glycan is dependent upon the plant or animal in which the polypeptide is expressed, and at what stage after expression the glycopolypeptide is recovered.
  • Illustrative glycosylated Asn residues include those with one N-acetylglucosamine
  • N-acetylglucosamines [Asn(ManGlcNAc2)], and with three mannoses and two N-acetylglucosamines that is referred to as "paucimannose" (Man3GlcNAc2) that forms the glycosylated residue Asn(Man3GlcNAc2), and the like. Additionally, glycosylated asparagine residues can be utilized in an in vitro polypeptide synthetic scheme.
  • the sequon contemplated has the formula, from left to right and in the direction from N-terminus to C-terminus,
  • n is zero, 1, 2, or 3
  • Lys is lysine
  • Zzz, Aro, Xxx, n, p, Yyy and Thr/Ser are as defined previously.
  • this sequon is positioned in the tight turn sequence of the chimeric polypeptide at the same position in the tight turn as the sequence of four to about seven amino acid residues present in the pre-existing polypeptide such that the side chains of four amino acid residues (Lys, Aro, Asn and Thr/Ser) project on the same side of the turn and are within less than about 7 Å of each other. That is, each of the Lys, Aro and Thr/Ser residue side chains interacts with the glycan of the Asn residue after proper folding, as for example, after expression and passage of the expressed polypeptide through the ER.
  • Another way to identify the position of the about four to seven amino acid residues present in the pre-existing polypeptide is through use of the numbering system utilized for the location of residues present in a hydrogen bonded sequence of a β-turn, even though a hydrogen bond need not be present in a contemplated tight turn.
  • the N-terminal residue of the sequence that participates in the hydrogen bond is designated the "i" residue. Going in the direction toward the
  • residues are numbered "i+1", "i+2", "i+3", "i+4", "i+5", etc.
  • Residues to the N-terminal side of residue "i" are numbered
  • type-I β-bulge turn present in the non-therapeutic genetically-engineered polypeptide rat glycoprotein CD2 (RnCD2*).
  • the sequon in that type-I β-bulge turn was engineered to be Asn-Gly-Thr, within the seven residue sequence Glu-Ile-Leu-Ala-Asn-Gly-
  • the pre-existing sequence in the pre-existing RnCD2* is Asn-Gly-Thr, where the Asn is at the i position, whereas the Gly is at the i+1, and Thr is at the i+2 position.
  • the Asn, Gly and Thr are as before, and the Lys, Ile, Phe, and Ala are at positions i-4, i-3, i-2, and
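The relative numbering used above (i, i+1, i+2 toward the C-terminus and i-1, i-2 toward the N-terminus) can be sketched as a small labelling helper. The function name and the list-of-residues input are hypothetical choices for illustration.

```python
def label_turn_positions(residues, i_index):
    """Pair each residue with its position label relative to the 'i' residue.

    residues is an N-to-C ordered list; i_index is the list index of residue i.
    """
    labels = []
    for k, res in enumerate(residues):
        offset = k - i_index
        label = "i" if offset == 0 else "i%+d" % offset
        labels.append((label, res))
    return labels
```

Applied to the sequence Lys-Ile-Phe-Ala-Asn-Gly-Thr with Asn taken as residue i (list index 4), this labels Lys through Ala as i-4 through i-1 and Gly and Thr as i+1 and i+2, matching the positions listed in the text as far as they are given.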
  • "n" is 1 and "p" is 1 and the chimeric polypeptide contains a Type II β-turn in a six-residue loop.
  • the resulting enhanced aromatic sequon present in the chimeric polypeptide has the sequence: Aro-Xxx-Zzz-Asn-Yyy-Thr/Ser.
  • "n" is 1 and "p" is zero.
  • the pre-existing and chimeric polypeptide sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Xxx-Asn-Yyy-Thr/Ser as defined above.
  • the chimeric polypeptide preferably contains a five-residue type I β-bulge turn.
  • "n" is zero and "p" is zero.
  • the pre-existing and chimeric polypeptide sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Asn-Yyy-Thr/Ser as defined above.
  • a preferred chimeric polypeptide contains a four-residue type I' β-turn.
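The three preferred combinations of n and p described above can be summarized in a short sketch that spells out the general sequon formula for given values and looks up the turn type the text associates with each combination. The names `PREFERRED_TURN` and `sequon_template` are hypothetical.

```python
# Turn types the text pairs with each (n, p) combination.
PREFERRED_TURN = {
    (1, 1): "Type II beta-turn in a six-residue loop",
    (1, 0): "five-residue type I beta-bulge turn",
    (0, 0): "four-residue type I' beta-turn",
}

def sequon_template(n, p):
    """Spell out Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser for n in 0..4 and p in 0..1."""
    if not 0 <= n <= 4 or p not in (0, 1):
        raise ValueError("n must be 0..4 and p must be 0 or 1")
    parts = ["Aro"] + ["Xxx"] * n + ["Zzz"] * p + ["Asn", "Yyy", "Thr/Ser"]
    return "-".join(parts)
```

The template length tracks the turn size in each case: six residues for n = 1, p = 1, five for n = 1, p = 0, and four for n = 0, p = 0.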
  • One group of exemplary pre-existing therapeutic polypeptides is constituted of
  • the heavy chain of all IgG-type antibodies has three constant domains: CH1, CH2, and CH3.
  • the CH2 and CH3 domains form what is called the Fc fragment, or the crystallizable fragment.
  • a complete human antibody heavy chain contains about 450 amino acid residues, of which about one-half are present in the Fc portion.
  • Table A provides a list of USAN names of therapeutic antibodies that are approved or at some point in clinical trials.
  • the CH2 and CH3 domains of human antibody Fc portions contain reverse turns, each of which can be modified to form one or two enhanced aromatic sequons.
  • the pre-existing tight turn sequence of illustrative antibodies or antibody Fc portions such as those below and exemplary replacement sequons contemplated herein that can provide enhanced folding stability are provided in Table B thereafter.
  • "efungumab", "elotuzumab", "epratuzumab", "ertumaxomab", "etaracizumab", "figitumumab", "galiximab", "ganitumab", "gemtuzumab", "genmab golimumab", "ibalizumab", "ibritumomab", "infliximab", "ipilimumab", "lexatumumab", "lintuzumab", "lumiliximab", "mapatumumab", "matuzumab", "mepolizumab", "milatuzumab", "motavizumab", "natalizumab", "necitumumab", "nimotuzumab", "ofatumuma
  • Another group of exemplary pre-existing therapeutic polypeptides is hormones.
  • hormones are erythropoietin, darbepoetin alfa (an erythropoietin variant with two additional N-glycans), interferon beta, follicle stimulating hormone, follitropin beta, peginterferon alfa-2b, becaplermin, sermorelin, somatropin, pramlintide, sargramostim, insulin, thyrotropin alfa, choriogonadotropin alfa, lepirudin, lutropin alfa, secretin, bivalirudin, corticotrophin, exenatide and the like.
  • enzymes are laronidase, collagenase, and others.
  • pancrelipase, streptokinase, urokinase, imiglucerase, reteplase, coagulation factor VII, coagulation factor VII, coagulation factor IX, alglucerase, agalsidase beta, asparaginase, hyaluronidase, tenecteplase, pegademase bovine, dornase alfa, anistreplase, pegaspargase, reteplase, and the like.
  • polypeptides include denileukin diftitox, botulinum toxin type B, nesiritide, pegfilgrastim, human serum albumin, mecasermin, aldesleukin, antihemophilic factor, aprotinin, palifermin, peginterferon alfa-2a, teriparatide, urofollitropin, anakinra, menotropins, OspA lipoprotein, pegvisomant, thymalfasin, follitropin beta, peginterferon alfa-2b, alpha-1-proteinase inhibitor, filgrastim, oprelvekin, rasburicase, darbepoetin alfa, enfuvirtide and the like.
  • Table C illustrates five residue native sequences within tight turns of two of the above polypeptides, the alpha chain of follitropin beta, which has a type VI β-turn, and imiglucerase, which has a type I β-bulge turn. Also illustrated for each of those polypeptides are replacement sequon sequences for the illustrated native five residue sequences.
  • PDB Protein Data Bank
  • glycosylated Asn residues include those with one N-acetylglucosamine
  • glycosylated asparagine residues can be utilized in an in vitro polypeptide synthetic scheme.
  • a method of enhancing folded stabilization of a chimeric therapeutic polypeptide compared to a pre-existing therapeutic polypeptide is also contemplated.
  • the pre-existing therapeutic polypeptide comprises a sequence of about 15 to about 1000 amino acid residues, preferably about 25 to about 500 residues, and more preferably about 35 to about 300 residues, and exhibits a secondary
  • a therapeutic chimeric polypeptide is prepared that is of the same length and substantially same sequence as the therapeutic polypeptide and exhibits a secondary structure comprising at least one tight turn at the same sequence position within the tight turn as in the therapeutic polypeptide, except that said sequence of four to about seven amino acid residues is replaced with the sequon, in the direction from left to right and from N-terminus to C-terminus, Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser,
  • Aro is an aromatic amino acid residue, n is zero, 1, 2, 3 or 4,
  • Xxx is an amino acid residue other than an aromatic residue
  • p is zero or 1
  • Zzz is any amino acid residue
  • Asn(Glycan) is glycosylated asparagine
  • Yyy is any amino acid residue other than proline
  • Thr/Ser is one or the other of the amino acid residues threonine and serine
  • the side chains of the Aro, Asn(Glycan) and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other.
  • the Asn(Glycan) is Asn(GlcNAc)1. In other embodiments, Asn(Glycan) is
  • Asn(Glycan) is Asn(GlcNAc)2Man1.
  • the glycan of Asn(Glycan) is
  • a contemplated polypeptide can be prepared in a number of manners. Longer polypeptides, such as those of about 50 residues and longer, are most readily prepared by genetic engineering following well known techniques. Thus, for example, a
  • therapeutic chimeric polypeptide is prepared by expressing a nucleic acid sequence that encodes the polypeptide sequence of the therapeutic chimeric polypeptide in a host cell that glycosylates the amino acid sequence Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser when present in a polypeptide sequence expressed therein to form the sequence Aro-(Xxx)n-(Zzz)p-
  • any of several eukaryotic host cells can be utilized for the
  • yeast cells such as
  • Saccharomyces cerevisiae Pichia pastoris
  • mammalian cells such as CHO cells
  • insect cells such as
  • Unstabilized (unglycosylated or non-glycosylated) therapeutic polypeptides useful for comparative purposes can be expressed in bacterial cells that do not glycosylate their expressed
  • polypeptides such as E. coli.
  • an illustrative polypeptide is expressed as a fusion protein that contains isolation and purification sequences.
  • One such sequence is a 6-residue hexa-histidine sequence at the N-terminus of the polypeptide to assist in purifying and isolating the desired chimer via binding to a nickel affinity ligand on a solid support.
  • Additional affinity tags include the
  • Strep-tag® II which consists of a
  • streptavidin-recognizing octapeptide can be any streptavidin-recognizing octapeptide.
  • Because it is desirable to remove most tags at the end of the purification process, considerable advances have been made in design of affinity tags so that they can be cleaved without leaving any residues behind and also to simplify the entire process of purification and cleavage.
  • One such system is the "Profinity eXact™" fusion-tag system (Bio-Rad
  • subtilisin protease to carry out affinity binding and tag cleavage.
  • the protease is not only involved with the binding and recognition of the tag, but upon application of the elution buffer, it also serves to precisely cleave the tag from the fusion protein directly after the cleavage recognition sequence. This delivers a native, tag-free
  • polypeptide in a single step.
  • Another system for simple purification of proteins is based on elastin-like polypeptides (ELP) and intein.
  • ELPs consist of several repeats of a peptide motif and undergo a reversible transition from soluble to insoluble upon temperature upshift.
  • the fusion protein is purified by temperature-induced aggregation and separation by centrifugation, and intein is used for tag removal. No affinity columns are needed for initial
  • Solubility-enhancing tags are generally large peptides or proteins that increase the
  • Fusion tags like GST and MBP also act as affinity tags and as a result, they are very popular for protein purification.
  • Other fusion tags like NusA,
  • TRX thioredoxin
  • SUMO small ubiquitin-like modifier
  • Ub ubiquitin
  • An expressed polypeptide also preferably includes a peptide cleavage site so that a purified polypeptide can be cleaved from any tags utilized in its purification and isolation. This cleavage or tag-removal step almost always involves using a protease to cleave a specific peptide bond between the tag and the protein of interest. A small number of highly specific proteases are routinely used for this purpose.
  • TEV tobacco etch virus
  • thrombin factor IIa, fIIa
  • factor Xa factor Xa
  • EK enterokinase
  • SUMOstar e.g. SUMOstar, Profinity
  • all of these enzymes have the potential to cleave within the protein of interest.
  • the SUMO proteases recognize not only their specific cleavage site, but also the tertiary structure of SUMO itself, giving them a very high degree of specificity.
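The concern that a tag-removal protease might also cut inside the protein of interest can be checked with a simple sequence scan before a cleavage site is chosen. The sketch below is illustrative only: the recognition motifs are commonly cited consensus sequences supplied here as assumptions (they are not given in the text), real protease specificity is broader than a single motif, and the function name is hypothetical.

```python
import re

# Commonly cited recognition motifs in one-letter code (assumptions, not from the text).
PROTEASE_MOTIFS = {
    "TEV": "ENLYFQ[GS]",
    "thrombin": "LVPRGS",
    "factor Xa": "I[DE]GR",
    "enterokinase": "DDDDK",
}

def internal_cleavage_sites(sequence):
    """Return {protease: [start indices]} for motifs found inside `sequence`."""
    hits = {}
    for name, motif in PROTEASE_MOTIFS.items():
        starts = [m.start() for m in re.finditer(motif, sequence.upper())]
        if starts:
            hits[name] = starts
    return hits
```

An empty result for a given protease suggests, under these assumptions, that its canonical motif is absent from the protein of interest. It does not rule out cleavage at non-canonical sites.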
  • a desired polypeptide can also be prepared by one or more of the well known in vitro polypeptide synthesis techniques, particularly solid phase synthesis. This mode of synthesis is also
  • a contemplated chimeric therapeutic polypeptide is an active ingredient in a pharmaceutical composition for administration to a human patient or suitable animal host such as a chimpanzee, mouse, rat, horse, sheep or the like.
  • a contemplated chimeric therapeutic polypeptide is dissolved or dispersed in a
  • When administered to a host animal in need of the polypeptide, such as a mammal (e.g., a mouse, dog, goat, sheep, horse, bovine, monkey, ape, or human) or bird (e.g., a chicken, turkey, duck or goose), the polypeptide provides the benefit of the pre-existing polypeptide.
  • a mammal e.g., a mouse, dog, goat, sheep, horse, bovine, monkey, ape, or human
  • bird e.g., a chicken, turkey, duck or goose
  • the amount of chimeric therapeutic polypeptide present in a pharmaceutical composition is referred to as an effective amount and can vary widely, depending inter alia, upon the polypeptide used and the presence of adjuvants and/or other excipients present in the composition.
  • the amount of chimeric therapeutic polypeptide that constitutes an effective amount varies with the polypeptide and the condition to be treated. Starting dosages are taken from the literature or the product label of the corresponding pre-existing therapeutic polypeptide usage, and are typically ultimately somewhat less than that used for the pre-existing therapeutic polypeptide.
  • compositions that contain proteinaceous materials as active ingredients are well understood in the art.
  • compositions are prepared as parenterals, either as liquid solutions or
  • suspensions solid forms suitable for solution in, or suspension in, liquid prior to injection can also be prepared.
  • the preparation can also be emulsified.
  • a contemplated chimeric therapeutic polypeptide is typically recovered by lyophilization.
  • a pharmaceutical composition is typically prepared from a recovered chimeric
  • polypeptide preferably in particulate form, in a physiologically tolerable (acceptable) diluent vehicle such as water, saline, phosphate-buffered saline (PBS), acetate-buffered saline (ABS), Ringer's solution, or the like to form an aqueous composition.
  • PBS phosphate-buffered saline
  • ABS acetate-buffered saline
  • Ringer's solution or the like to form an aqueous composition.
  • the lyophilized polypeptide is mixed with additional solid excipients and stored as such for constitution with water, saline and the like as discussed above.
  • Excipients that are pharmaceutically acceptable and compatible with the active ingredient are often mixed with the solid polypeptide, or can be predissolved in the liquid medium. Suitable
  • excipients are, for example, water, saline, dextrose, glycerol, ethanol, or the like and combinations thereof.
  • a composition can contain minor amounts of auxiliary substances such as wetting or emulsifying agents, pH buffering agents that enhance the effectiveness of the composition.
  • HsCD2ad human glycoprotein CD2
  • The adhesion domain of human glycoprotein CD2 (HsCD2ad), a non-therapeutic polypeptide, is glycosylated at Asn65, within the Asn65-Gly66-Thr67 sequon (Fig. 1A).
  • NMR and crystallographic data demonstrate that Asn65 occupies the i+2 position of a five-residue type I β-bulge turn that spans from Phe63 (i) to Thr67 (i+4; Fig. 1B), with Gly66
  • NOE evidence also suggests the possibility of a stabilizing protein-glycan interaction between GlcNAc2 of the glycan and Lys61 [Wyss et al., Science 269, 1273-1278 (1995)].
  • Wyss et al. hypothesized that this interaction disperses the positive charge present in a cluster of five Lys residues, but the energetics of this interaction were not probed [Wyss et al., Science 269, 1273-1278 (1995)].
  • Previous kinetic studies of glycan-dependent HsCD2ad folding suggest that the N-glycan does much more than
  • HsCD2ad nonglycosylated HsCD2ad is unfolded, we used the structurally homologous rat ortholog of HsCD2ad
  • Residues 63-67 of RnCD2ad retain the same five-residue type I β-bulge turn geometry found in HsCD2ad (Fig. 2A, inset) [Jones et al., Nature 360, 232-239 (1992)].
  • glycosylation (expressed in E. coli), and has Glu at position 61 and Leu at position 63 in contrast to the Lys61 and Phe63 in HsCD2ad (Fig. 2A).
  • Glycosylation stabilizes g-RnCD2* by -0.6 kcal mol⁻¹ relative to RnCD2*, which is -2.5 kcal mol⁻¹ less than the increase in stability observed upon glycosylation of HsCD2ad.
  • g-RnCD2*-K Glu61Lys
  • g-RnCD2*-F Leu63Phe
  • These effects are each about -1 kcal mol⁻¹ greater than the observed increase in stability upon glycosylation of the unmodified RnCD2*, suggesting that Lys61 and Phe63 in these RnCD2* variants are each able to form
  • glycosylation-naive proteins would also result in substantial increases to stability.
  • a PDB search supports this possibility by revealing four additional proteins that contain type I β-bulge turns with a Phe at the i position, a glycosylated Asn residue at the i+2 position, and a Thr at the i+4 position.
  • the Phe and Thr side chains contact the first GlcNAc of the N-glycan (Fig. 2F).
  • glycosylated type I β-bulge turns in four additional proteins in which aromatic residues other than Phe (Tyr, Trp, or His) occupy the i position, making analogous contacts. This observation highlights the view that aromatic amino acid side chains other than Phe can also enhance glycosylation sequons by engaging in
  • the portability of the stabilization conferred by the enhanced aromatic sequon was tested by integrating it into a glycosylation-naive reverse turn in human muscle acylphosphatase (AcyP2), a two-layer α/β protein, in which two α-helices pack against a four-stranded β-sheet [Pastore et al., J Mol Biol 224, 427-440 (1992)].
  • Reverse turn residues 43 to 47 are not well-enough defined in the NMR structure of AcyP2 to discern their precise conformation, but homologous residues in the crystal structure of common type acylphosphatase (57% identical to AcyP2) adopt a type I β-bulge turn conformation [Yeung et al., Acta Crystallogr Sect F Struct Biol Cryst Commun 62, 80-82 (2006)].
  • Thr43Phe (i) and Lys45Asn (i+2) mutations in the β-bulge turn create the enhanced aromatic sequon (the i+4 position is already Thr;
  • AcyP2 glycosylated, as AcyP2 is a cytosolic protein
  • Ser to Ala mutations at positions 44, 82 and 95 to create a modified version of AcyP2 (AcyP2*) that is N-glycosylated only at Asn45.
  • fucosylated paucimannose glycans were expressed in Sf9 insect cells.
  • nonglycosylated AcyP2*-F from E. coli.
  • glycoprotein g-AcyP2* is destabilized relative to the non-glycosylated AcyP2* by +0.5 kcal mol⁻¹.
  • the estimated N-glycan-dependent contribution of the Phe-glycan interaction is -2.5 kcal mol⁻¹, suggesting that an interaction between Phe43 and the N-glycan at position 45 (and putatively Thr47) stabilizes the reverse turn, and thus the protein.
  • glycosylation efficiency was consistently enhanced.
  • the ratio of N-glycosylated to non-glycosylated proteins from Sf9 insect cells is substantially higher for both RnCD2* and AcyP2* variants relative to variants that lack the Phe residue (Fig. 2D and Fig. 3D), suggesting that the enhanced glycosylation sequon may be a better substrate for glycosylation by OST.
  • This observation should prove useful for enhancing glycoprotein yields, as sequon occupancy can be variable.
  • the enzymology of this observation merits further investigation, but it is plausible to speculate that OST may have evolved to favor
  • Pin1 WW can be synthesized chemically, enabling us to examine the contributions of the Thr side chain to N-glycan-dependent stabilization of Pin WW, in addition to the Phe-glycan interaction
  • the N-glycan in WW (GlcNAc) is much smaller than the N-glycans in RnCD2 (oligomannose) and AcyP2
  • the N-glycan-dependent contribution of Phe16 to Pin WW stability is -0.19 kcal mol⁻¹ in the absence of Thr21, but is -0.62 kcal mol⁻¹ in the presence of Thr21.
  • the N-glycan-dependent contribution of Thr21 to Pin WW stability is -0.18 kcal mol⁻¹ in the absence of Phe16, but is -0.63 kcal mol⁻¹ in the presence of Phe16.
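As a worked check of the two statements above, the short sketch below subtracts each "absent partner" contribution from the corresponding "present partner" contribution. The difference is one way of expressing the extra stabilization that appears only when Phe16, the N-glycan, and Thr21 are all present. The function name is illustrative and does not come from the source; the four input values are the ones quoted above.

```python
def coupling_energy(with_partner, without_partner):
    """Change in a residue's N-glycan-dependent contribution (kcal/mol)
    caused by the presence of the partner residue."""
    return with_partner - without_partner


# Values quoted in the text, in kcal/mol.
phe_without_thr, phe_with_thr = -0.19, -0.62
thr_without_phe, thr_with_phe = -0.18, -0.63

via_phe = coupling_energy(phe_with_thr, phe_without_thr)
via_thr = coupling_energy(thr_with_phe, thr_without_phe)
print(f"via Phe16: {via_phe:.2f} kcal/mol, via Thr21: {via_thr:.2f} kcal/mol")
```

Both routes give roughly the same value (about -0.4 kcal/mol), differing only at the level of rounding in the quoted numbers, which is what a closed thermodynamic cycle would lead one to expect.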
  • N-glycosylation at a given site is likely to
  • the WW domain from human Pin 1 also conveniently provides a single protein into which several types of enhanced aromatic sequons and their corresponding reverse turn types can be inserted without changing the overall structure or the
  • the WW domain is ideal for these requirements: many WW variants harboring different reverse turn types in loop 1 have been characterized structurally [Ranganathan et al., Cell 89, 875-886 (1997); Jager et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006); and Fuller et al., Proc. Natl. Acad. Sci. USA 106, 11067-11072 (2009)] and biophysically [Jager et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006); Fuller et al., Proc. Natl. Acad. Sci.
  • sequences of the enhanced aromatic sequons in the four-, five-, and six-residue reverse turns comprising loop 1 include Phe16-Asn(GlcNAc1)19-Gly20-Thr21, Phe16-Ala18-Asn(GlcNAc1)19-Gly20-Thr21, and Phe16-Arg17-Ser18-Asn(GlcNAc1)19-Gly20-Thr21, respectively.
  • glycosylation can be estimated using triple mutant cycle analyses, done previously [Culyba et al . ,
  • the WW variants are named by the number of amino acids in the loop 1 reverse turn, followed by the letter “g” if the variant is N-glycosylated on Asn19, the letter “F” if it has Phe at position 16, and the letter “T” if it has Thr at position 21.
  • the lack of the letters g, F, and/or T indicates that the variant is not N-glycosylated on Asn19, that position 16 is Ser, and/or that position 21 is Arg,
  • variant 4g-F,T has a 4-residue loop 1 type I' β-turn, with Asn(GlcNAc1) at position 19, Phe at position 16, and Thr at position 21.
  • Variant 4 has a 4-residue loop 1 type I' β-turn, with Asn at position 19, Ser at position 16, and Arg at position 21 (see the table hereinafter for the names of the WW variants studied).
  • variable temperature circular dichroism (CD) spectropolarimetry to analyze the thermodynamic stability of WW variants 4-F,T, 4g-F,T,
  • glycosylating the Phe-Asn-Yyy-Thr enhanced aromatic sequon in the context of a four-residue type I' β-turn stabilizes WW.
  • thermodynamic stabilities of each WW variant were measured in the four-, five-, and six-residue reverse turn groups in the table above.
  • the data from each group of eight WW variants comprise a triple mutant cycle (Fig. 5) .
  • Triple mutant cycles contain more information than conventional double mutant cycles, because each of the six "faces" of a triple mutant cycle "cube" is itself a double mutant cycle
  • ΔΔG_f,2 = -0.18 ± 0.08 kcal mol⁻¹ at 65 °C
  • ΔΔG_f,2 = 0.05 ± 0.10 kcal mol⁻¹ at 65 °C
  • ΔΔΔG_f,back = -0.51 ± 0.15 kcal mol⁻¹ at 65 °C
  • The attribution of ΔΔΔG_f,front and ΔΔΔG_f,back values to the interaction between Phe16 and
  • Equation A shows how the ΔG_f of a given variant of 4 is related to the average ΔG_f° of 4, plus a series of correction terms that account for the interactions amongst the amino acids at positions 16, 19, and 21.
  • Each correction term is a product of one or more indicator variables W (that reflect whether a mutation is present in the given variant) and a free energy contribution factor C.
  • W_F is 0 when position 16 is Ser or 1 when it is Phe
  • W_N is 0 when position 19 is Asn or 1 when it is Asn(GlcNAc1)
  • W_T is 0 when position 21 is Arg or 1 when it is Thr.
  • C_F, C_N, and C_T describe the energetic consequences of the Ser16 to Phe16, Asn19 to Asn(GlcNAc1)19, and Arg21 to Thr21 mutations, respectively. These energies are thought to reflect the difference in conformational
  • C_F,N, C_F,T, and C_N,T describe the free energies of the two-way interactions between Phe16 and Asn(GlcNAc1)19, between Phe16 and Thr21, and between Asn(GlcNAc1)19 and Thr21, respectively.
  • C_F,N,T describes the energetic impact of the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21.
  • C_F,N, C_F,T, C_N,T, and C_F,N,T are essentially equivalent to the two- and three-way interaction energies (ΔΔΔG_f and ΔΔΔΔG_f values) that could be calculated by a
  • ΔΔΔΔG_f values obtained by comparison of the front and back double mutant cycles in each triple mutant cube in Fig. 5, confirming that the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21 stabilizes each reverse turn type by similar amounts.
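Equation A itself is not reproduced in this excerpt. Read from the definitions above, it presumably has the form ΔG_f = ΔG_f° + C_F·W_F + C_N·W_N + C_T·W_T + C_F,N·W_F·W_N + C_F,T·W_F·W_T + C_N,T·W_N·W_T + C_F,N,T·W_F·W_N·W_T. The sketch below assumes that form and shows how the eight parameters follow exactly from the eight variants of one triple mutant cycle by inclusion-exclusion. This is a swapped-in illustration rather than the fitting procedure used in the source, and the ΔG_f values in `dg_example` are invented placeholders, not measured data.

```python
from itertools import combinations

# Indicator labels: F = Phe16, N = Asn(GlcNAc)19, T = Thr21.
SITES = ("F", "N", "T")


def subsets(items):
    """All subsets of items, returned as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def decompose(dg):
    """Solve the assumed form of Equation A for its eight terms.

    dg maps the set of indicators equal to 1 (e.g. {"F", "N"}) to the
    folding free energy of that variant in kcal/mol. The empty set is the
    reference variant and yields the baseline term.
    """
    terms = {}
    for s in subsets(SITES):
        # Inclusion-exclusion over every sub-variant of s.
        terms[s] = sum((-1) ** (len(s) - len(t)) * dg[t]
                       for t in subsets(tuple(s)))
    return terms


def predict(terms, variant):
    """Evaluate the model: add every term whose indicators are all 1."""
    return sum(c for s, c in terms.items() if s <= variant)


# Hypothetical folding free energies (kcal/mol), for illustration only.
dg_example = {
    frozenset(): -1.00,
    frozenset("F"): -0.90,
    frozenset("N"): -1.10,
    frozenset("T"): -0.95,
    frozenset("FN"): -1.20,
    frozenset("FT"): -0.85,
    frozenset("NT"): -1.25,
    frozenset("FNT"): -1.80,
}
terms = decompose(dg_example)
print(f"C_F,N,T = {terms[frozenset('FNT')]:.2f} kcal/mol")
```

With eight variants and eight parameters the system is exactly determined, so `predict` reproduces every input value; uncertainty estimates would require replicate measurements and a regression, which this sketch does not attempt.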
  • N-glycans can extend serum half-life [Egrie et al., Exp Hematol 31(4), 290-299 (2003); Su et al., Int J Hematol 91(2), 238-244 (2010); and Ceaglio et al., Biochimie 90(3), 437-449 (2008)] and shelf-life, owing in part to increased protease resistance [Raju et al., Biochem Bioph Res Co 341(3), 797-803 (2006)], decreased aggregation propensity, and compensation for the destabilizing effect of methionine oxidation [Liu et al., Biochemistry 47(18), 5088-5100 (2008)].
  • the present invention has provided engineering guidelines by which N-glycosylation can reliably stabilize proteins. These matches include Phe-Asn-Yyy-Thr for type I' β-turns, Phe-Xxx-Asn-Yyy-Thr for type I β-bulge turns, and Phe-Xxx-Zzz-Asn-Yyy-Thr [SEQ ID NO:
  • the type I β-bulge turn and the type II β-turn in a six-residue loop comprise less than 9% of all reverse turns in the PDB [Sibanda et al., J Mol Biol 206(4), 759-777 (1989); and Oliva et al., J Mol Biol 266(4), 814-830 (1997)].
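The three Phe-containing motifs listed above can be searched for in a one-letter sequence with simple patterns. The sketch below is only a sequence-level screen under stated assumptions: Xxx and Zzz are taken to be any residue, Yyy any residue except proline, and the test sequence is invented. It says nothing about whether a match actually sits in the corresponding reverse turn, which would need structural data.

```python
import re

# Keys are motif lengths in residues. The lookahead lets overlapping
# candidates be reported. Assumed: Xxx/Zzz = any residue, Yyy = not Pro.
ENHANCED_SEQUONS = {
    4: re.compile(r"(?=(FN[^P]T))"),    # Phe-Asn-Yyy-Thr
    5: re.compile(r"(?=(F.N[^P]T))"),   # Phe-Xxx-Asn-Yyy-Thr
    6: re.compile(r"(?=(F..N[^P]T))"),  # Phe-Xxx-Zzz-Asn-Yyy-Thr
}


def find_enhanced_sequons(sequence):
    """Return (motif length, 1-based Phe position, matched text) per hit."""
    hits = []
    for length, pattern in ENHANCED_SEQUONS.items():
        for match in pattern.finditer(sequence.upper()):
            hits.append((length, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical test sequence containing one motif of each length.
hits = find_enhanced_sequons("AAFNGTAAFANGTAAFRSNGTAA")
for hit in hits:
    print(hit)
```

A hit is a candidate site for further inspection, not a prediction of stabilization.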
  • PBS Phosphate buffered saline
  • 50 mM acetate buffer was prepared from a 4X solution made from 4X solutions of acetic acid (Acros Organic 124040025) and sodium acetate
  • Acetate buffer was also prepared with 0.5 mM TCEP and 0.01% sodium azide. All buffer solutions were filtered (Millipore 0.2 µm). Protein was concentrated using Amicon centrifugation devices, MWCO 3 kDa (Millipore). Final concentrations of
  • oligonucleotides for site-directed mutagenesis were purchased from Integrated DNA Technologies (IDT), 25 nmole DNA oligo normalized to 100 µM in IDTE pH 8.0. Wild type RnCD2 and AcyP2 gene constructs were ordered from IDT as miniGenes in pZErO-2 vectors (Kan resistant).
  • the first 6 residues are a 6 Histidine-tag, which was included for Nickel affinity chromatography purification. This tag is followed by a 7-residue Tobacco Etch Virus protease cleavage site (TEVs) tag. This tag/protease cleavage site combination is followed by a 9-residue FLAG-tag, which in turn is followed by the 4-residue Factor Xa cleavage site (Xas) that was included so that all of the tags could be removed from the expressed gene construct (which was done before all measurements were taken).
  • TEVs Tobacco Etch Virus protease cleavage site
  • the wild type RnCD2 sequence contains three glycosylation sequons.
  • To confer glycosylation at Asn65 (bold), Asp67 (bold and underlined) was mutated to threonine [SEQ ID NO: ].
  • the same purification/protease site tag used in the RnCD2* variants was used for AcyP2* variants and, as with RnCD2*, the entire tag was removed via Factor Xa cleavage prior to all studies. Note that the residues are numbered starting with the first residue (Met) after the Factor Xa cleavage site. It should also be noted that some sequence changes were made to all mutants to ensure that the protein was only glycosylated at the desired position (45) when expressed in Sf9 cells.
  • the wild type AcyP2 sequence contains three glycosylation sequons. The serines in these positions, Ser44, Ser82, and Ser96 (underlined), were mutated to alanine.
  • Lys45 (bold and underlined) was mutated to asparagine
  • NIMA- interacting 1 is an enzyme (EC 5.2.1.8) that regulates mitosis presumably by interacting with NIMA and attenuating its mitosis-promoting activity.
  • the enzyme displays a preference for an acidic residue N-terminal to the isomerized proline bond.
  • the enzyme catalyzes pSer/Thr-Pro cis/trans
  • Residues 6 through 44 at the N-terminus constitute the WW domain of Pin1 [Ranganathan et al., Cell 89, 875-886 (1997)].
  • amino acid residue sequences used as illustrative herein are from position 6 through position 38. Amino acid residue position changes made to the WW domain are designated with the original amino acid residue position from the N-terminus.
  • the amino acid residue sequences utilized herein are shown in the tables below along with their expected and observed MALDI-TOF [M+H+] values.
  • N Asn (GlcNAc) ; ⁇ Monoisotopic masses; ⁇ Determined previously [Culyba et al., Science 331, 571-575 (2011)].
  • RnCD2 structural coordinates were obtained from the PDB (accession code 1HNG).
  • AcyP2 structural coordinates were obtained from the PDB for horse muscle acylphosphatase (accession code 1APS.pdb), which shares 94% sequence homology with the human protein. Coordinates were manipulated and rendered using PyMOL software (Schrödinger LLC).
  • a Superdex™ 75 10/300 GL column (24 mL) was run in PBS (RnCD2*) or acetate (AcyP2*) at a flow rate of 0.4 mL/minute at room temperature (retention times: RnCD2* with glycan 12.5 minutes, RnCD2* without glycan 12.75 minutes, AcyP2* with glycan 14.75 minutes, AcyP2* without glycan 15 minutes).
  • RnCD2* and AcyP2* have at least one tryptophan residue buried in the hydrophobic core, allowing for an intrinsic fluorescence that depends on the folding status. Fluorescence measurements for RnCD2* and AcyP2* variants were obtained using either a CARY Eclipse (Varian) or an ATF-105 (Aviv)
  • Fluorescence emission spectra were collected from 315 to 400 nm, following excitation at 280 nm.
  • CD measurements were made using an Aviv™ 62A DS spectropolarimeter, using quartz cuvettes with path lengths of 0.1 or 1 cm.
  • CD spectra were obtained by monitoring molar ellipticity from 340 to 200 nm in 1 nm increments, with 5-second averaging times.
  • Variable temperature CD data were obtained by monitoring molar ellipticity at 227 nm from 0.2 to 98.2°C at 2°C intervals, with 90 second equilibration time between data points and 30 second averaging times.
  • the variable temperature CD data were fit to obtain T_m and ΔG_f values for each protein, as
  • insoluble fraction was treated with 6 M guanidine hydrochloride (GdnHCl) in the appropriate binding buffer and subjected to Ni-NTA purification under denaturing conditions (6 M GdnHCl).
  • GdnHCl guanidine hydrochloride
  • RnCD2* final buffer: PBS, 0.5 mM TCEP, 0.01% sodium azide, pH 7.2.
  • AcyP2* final buffer: 50 mM acetate, 0.5 mM TCEP, 0.01% sodium azide, pH 5.5. HisFLAG-free
  • a 5' SacI site (gagctc), a 3' KpnI site (ggtacc), and a preprotrypsin leader sequence (PLS, for excretion into the medium) were designed into both the RnCD2 and AcyP2 genes ordered from IDT. Digestion (SacI and KpnI) and ligation of the products and the insect shuttle vector pFastBac™ (Invitrogen) yielded clones pPLSHisFLAG-RnCD2i and pPLSHisFLAG-AcyP2i (sometimes referred to as RnCD2i and AcyP2i, respectively, herein).
  • growth medium was collected and 0.2 µm filtered.
  • Protease inhibitors (1 tablet/200 mL; Roche EDTA-free), 0.5 mM TCEP, and 1 mM EDTA were added to the filtered growth media extract.
  • Superflow® Ni-NTA resin (Qiagen) was used to affinity-purify proteins via the 6xHis tag, using conditions described in the Qiagen manual. Briefly, precipitated protein was resuspended in 1/4 of expression volume of lysis buffer (same as non-glycosylated variants), stirred for 1 hour at 4 °C, and 0.2 µm filtered. Filtered medium was applied to a gravity Ni-NTA column in appropriate lysis buffer, and washed with 10 column volumes of lysis buffer and 50 column volumes of washing buffer (18 mM
  • Bound protein was removed with 4 column volumes of elution buffer (20 mM TrisHCl, 300 mM imidazole, pH 8.0 for all variants).
  • an FPLC HisTrap HP column (1 mL) was used for purification with the same buffer conditions as above. Eluted fractions were exchanged into Concanavalin A (ConA) binding buffer (25 mM TrisHCl, 500 mM NaCl, 1 mM MnCl2, 1 mM CaCl2, pH 7.4) and 0.5 mM TCEP and concentrated in Amicon
  • RnCD2* variant final buffer: PBS, 0.5 mM TCEP, 0.01% sodium azide, pH 7.2.
  • AcyP2* variant final buffer: 50 mM acetate, 0.5 mM TCEP, 0.01% sodium azide, pH 5.5. If cleavage was incomplete, Nickel-NTA resin was used to remove uncleaved protein. ESI-MS characterization
  • LCMS analysis was performed using an Agilent 1100 LC coupled to an Agilent 1100 single quad ESI mass spectrometer. LC was performed with a 4.6 mm × 50 mm ZORBAX C8 column (Agilent Technologies, Inc.).
  • PBS buffer (1x, 0.5 mM TCEP, 0.01% sodium azide, pH 7.2) was made fresh daily from a 10x stock and filtered. Urea and guanidine solutions were prepared fresh daily in 1x PBS, filtered, and
  • thermodynamic stability of the L63F variants and the saturation point of urea at 25° C all measurements were also taken in guanidine hydrochloride solutions for this mutant (variants g-RnCD2*-F and RnCD2*-F) . Further data can be found in Culyba et al . , Science 331, 571-575 (2011) .
  • Fluorescence measurements related to kinetic studies were obtained using an AVIV® ATF-105 stopped-flow fluorimeter for single-mixing studies.
  • the set-up consisted of two syringes (syringe 1: 1 mL, syringe 2: 2 mL) that permitted up to a 25-fold dilution of the components of syringe 1 with syringe 2, in a minimum of 80 µL, of which the flow cell holds 40 µL.
  • the dead time between start of mixing and acquisition of data was estimated to be 50-100 ms; in general, only data after the first 200 ms were used for fitting.
  • Excitation was set at 280 nm (bandwidth: 2 nm) and emission was measured at 330 nm (bandwidth: 8 nm).
  • the photomultiplier voltage was set to 1000 V and data were recorded for 20-200 seconds.
  • the decrease in intensity at 330 nm was monitored after native protein in PBS or low concentrations of urea or guanidine in syringe 1 was mixed with varying volumes of concentrated urea or guanidine solutions in syringe 2.
  • the increase in intensity at 330 nm was monitored after denatured protein in a urea or guanidine solution in syringe 1 was diluted with varying volumes of PBS buffer or low concentrations of urea or guanidine from syringe 2. All shots of a particular dilution were typically repeated at least 4 times.
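The final denaturant concentration reached in each single-mixing shot follows from a simple volume-weighted average of the two syringe contents. The sketch below assumes ideal volume additivity; the function names and the 8 M urea example are illustrative and are not taken from the source.

```python
def mixed_concentration(c1, v1, c2, v2):
    """Denaturant concentration (M) after mixing volume v1 at c1 with v2 at c2."""
    return (c1 * v1 + c2 * v2) / (v1 + v2)


def dilution_factor(v1, v2):
    """Fold dilution of the syringe 1 contents after mixing with syringe 2."""
    return (v1 + v2) / v1


# Hypothetical unfolding jump: 1 part native protein in buffer (0 M urea)
# mixed with 24 parts of 8 M urea.
fold = dilution_factor(1.0, 24.0)
final_urea = mixed_concentration(0.0, 1.0, 8.0, 24.0)
print(f"{fold:.0f}-fold dilution, final urea {final_urea:.2f} M")
```

The same two functions cover refolding jumps by swapping which syringe holds the concentrated denaturant.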
  • solutions of RnCD2* variants were prepared in PBS and high concentration of urea or guanidine (in 1x PBS) at matched protein concentrations (15-20 µg/mL).
  • the solutions were mixed to produce approximately thirty 120 µL samples at regular intervals of urea or
  • V-shapes (hence the term "chevron plot").
  • the quantity k_obs is equal to the sum of the unfolding and folding rate constants, k_u and k_f. Chevron plots therefore result from the dependence of ln k_u and ln k_f on urea concentration.
  • the unfolding rate constant dominates k obs at high denaturant concentrations, where the chevron plots for several of the RnCD2* variants are slightly curved. Curvature in the unfolding arm of a chevron plot is often attributed to changes in the structure of the folding transition state.
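A minimal sketch of how a chevron arises from k_obs = k_f + k_u is given below. It assumes the simplest case, in which ln k_f decreases and ln k_u increases linearly with denaturant concentration, so it does not capture the curvature in the unfolding arm mentioned above. The parameter names and numerical values are hypothetical and were chosen only to produce a V-shaped curve.

```python
import math


def ln_k_obs(denaturant, ln_kf0, m_f, ln_ku0, m_u):
    """ln(k_obs) at a denaturant concentration (M) for a two-state folder.

    Assumed linear dependences:
        ln k_f = ln_kf0 - m_f * [denaturant]
        ln k_u = ln_ku0 + m_u * [denaturant]
    """
    k_f = math.exp(ln_kf0 - m_f * denaturant)
    k_u = math.exp(ln_ku0 + m_u * denaturant)
    return math.log(k_f + k_u)


# Hypothetical parameters: k_f = 100 s^-1 and k_u = 0.01 s^-1 in buffer.
params = dict(ln_kf0=math.log(100.0), m_f=1.5, ln_ku0=math.log(0.01), m_u=0.5)
curve = [(c, ln_k_obs(c, **params)) for c in range(0, 10)]
for conc, value in curve:
    print(f"{conc} M: ln k_obs = {value:.2f}")
```

At low denaturant the folding arm dominates and ln k_obs falls; at high denaturant the unfolding arm dominates and it rises again.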
  • This equation can be fit to folding kinetics vs.
  • the equilibrium data were weighted as follows: 1) the equilibrium and kinetic data were fit separately to their models; 2) the root mean squared residuals for the two fits were calculated; 3) the ratio of the kinetic and equilibrium RMS residuals was calculated (RMS_kinetic/RMS_equilibrium); 4) the equilibrium data points were multiplied by this ratio.
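Steps 2 to 4 of the weighting procedure above can be sketched as follows. The separate fits of step 1 are assumed to have been done already, so the function takes their residuals as inputs. All names and the example numbers are illustrative and do not come from the source.

```python
import math


def rms(residuals):
    """Root mean squared value of a list of fit residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def weight_equilibrium(equilibrium_points, kinetic_residuals,
                       equilibrium_residuals):
    """Scale equilibrium data by RMS_kinetic / RMS_equilibrium.

    Returns the ratio and the scaled equilibrium data points.
    """
    ratio = rms(kinetic_residuals) / rms(equilibrium_residuals)
    return ratio, [ratio * y for y in equilibrium_points]


# Hypothetical residuals from two separate fits, and two equilibrium points.
ratio, weighted = weight_equilibrium(
    equilibrium_points=[1.0, 0.5],
    kinetic_residuals=[0.2, -0.2],
    equilibrium_residuals=[0.1, -0.1],
)
print(ratio, weighted)
```

Scaling by this ratio puts the two data sets on a comparable residual scale, so that neither dominates the simultaneous fit.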
  • the combined kinetic and (weighted) equilibrium data sets were then fit simultaneously to the combined kinetic and
  • Acetate buffer 50 mM Acetate, 0.5 mM TCEP, 0.01% sodium azide, pH 5.5; Acetate
  • Urea solutions were prepared fresh daily in 1x Acetate, filtered, and concentrations were confirmed by index of refraction (IOR).
  • IOR index of refraction
  • Subsequent dilutions of urea were made with 1x Acetate and concentrations were checked by IOR.
  • Constants defined in equations include the universal gas constant (R) and temperature (T). The value of RT at 25 °C was taken to be 0.592 kcal/mol. Data were imported and fit in Microsoft Excel.
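The quoted value of RT can be checked directly from the gas constant given elsewhere in this document (0.0019872 kcal/mol/K). The helper names below are illustrative, and the conversion of a free energy difference into a ratio of equilibrium constants is shown only as a usage example.

```python
import math

R_KCAL = 0.0019872  # universal gas constant, kcal/(mol K)


def rt(temp_celsius):
    """RT in kcal/mol at the given temperature in degrees Celsius."""
    return R_KCAL * (temp_celsius + 273.15)


def equilibrium_ratio(ddg, temp_celsius=25.0):
    """Fold change in folding equilibrium constant for a change ddg (kcal/mol)."""
    return math.exp(-ddg / rt(temp_celsius))


print(f"RT at 25 C = {rt(25.0):.3f} kcal/mol")
```

For example, `equilibrium_ratio(-0.6)` gives the factor by which a stabilization of 0.6 kcal/mol shifts the folding equilibrium at 25 °C.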
  • solutions of AcyP2* variants were prepared in Acetate and high concentration of urea (in 1x Acetate) at matched concentrations (15-30 µg/mL).
  • the solutions were mixed to produce approximately thirty 120 µL samples at regular intervals of urea or guanidine concentrations. Solutions were permitted to
  • Pin1 WW domain proteins were synthesized as C-terminal acids, employing a solid phase peptide synthesis approach using a standard Fmoc Nα
  • Piperidine and N,N-diisopropylethylamine were purchased from Aldrich, N-methyl pyrrolidinone (NMP) was purchased from Applied Biosystems, and N,N-dimethylformamide (DMF) was obtained from Fisher.
  • dimethylformamide (DMF).
  • Solvent was drained from the resin using a vacuum manifold.
  • To remove the Fmoc protecting group on the resin-linked amino acid 2.5 mL of 20% piperidine in DMF was added to the resin, and the resulting mixture was stirred at room temperature for 5 minutes.
  • the deprotection solution was drained from the resin with a vacuum manifold.
  • an additional 2.5 mL of 20% piperidine in DMF was added to the resin, and the resulting mixture was stirred at room temperature for 15 minutes.
  • the deprotection solution was drained from the resin using a vacuum manifold, and the resin was rinsed five times with DMF.
  • the desired Fmoc-protected amino acid (250 µmol, 5 eq.) and HBTU (250 µmol, 5 eq.) were dissolved by vortexing in 2.5 mL 0.1 M HOBt (250 µmol, 5 eq.) in NMP.
  • To the dissolved amino acid solution was added 87.1 µL DIEA (500 µmol, 10 eq.). Only 1.5 eq. of amino acid were used during the coupling of the expensive Fmoc-Asn(Ac3GlcNAc)-OH monomer, and the required amounts of HBTU, HOBt, and DIEA were adjusted accordingly. The resulting mixture was vortexed briefly and allowed to react for at least 1 minute.
  • the activated amino acid solution was then added to the resin, and the resulting mixture was stirred at room temperature for at least 1 hour.
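The coupling stoichiometry above can be reproduced with a few lines of arithmetic. The sketch below infers a 50 µmol resin scale from "250 µmol, 5 eq.", and uses literature values for the molar mass and density of DIEA, which are assumptions not stated in the source. It also assumes that "adjusted accordingly" means proportional scaling for the 1.5 eq. glycoamino acid coupling.

```python
DIEA_MW = 129.25       # g/mol, literature value (assumption)
DIEA_DENSITY = 0.742   # g/mL, literature value (assumption)
HOBT_MOLARITY = 0.1    # mol/L HOBt in NMP, as stated in the text


def coupling_amounts(resin_umol, aa_eq=5.0):
    """Reagent amounts for one coupling.

    HBTU and HOBt match the amino acid 1:1 and DIEA is used at twice
    that amount, mirroring the 5/5/5/10 eq. ratio in the text.
    """
    aa_umol = resin_umol * aa_eq
    diea_umol = 2.0 * aa_umol
    return {
        "amino_acid_umol": aa_umol,
        "hbtu_umol": aa_umol,
        # umol divided by mol/L gives uL; convert to mL.
        "hobt_ml": aa_umol / HOBT_MOLARITY / 1000.0,
        "diea_umol": diea_umol,
        # umol * g/mol gives ug; divided by g/mL gives nL; convert to uL.
        "diea_ul": diea_umol * DIEA_MW / DIEA_DENSITY / 1000.0,
    }


standard = coupling_amounts(50.0)            # ordinary residues, 5 eq.
glyco = coupling_amounts(50.0, aa_eq=1.5)    # Fmoc-Asn(Ac3GlcNAc)-OH, 1.5 eq.
print(standard)
print(glyco)
```

The standard coupling reproduces the 2.5 mL HOBt solution and 87.1 µL DIEA quoted in the text, which supports reading the garbled units as µmol and µL.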
  • Acid-labile side-chain protecting groups were globally removed and proteins were cleaved from the resin by stirring the resin for about 4 hours in a solution of phenol (0.5 g), water (500 µL),
  • TFA trifluoroacetic acid
  • the TFA solution was drained from the resin, the resin was rinsed with additional TFA, and the resulting solution was concentrated under Ar. Proteins were precipitated from the concentrated TFA solution by addition of diethyl ether (about 45 mL). Following centrifugation, the ether was decanted, and the pellet (containing the crude protein) was stored at -20 °C until purification.
  • Acetate protecting groups were subsequently removed from the 3-, 4-, and 6-hydroxyl groups of GlcNAc in Asn(GlcNAc)-containing proteins by
  • the WW domains were purified by reverse-phase HPLC on a C18 column using a linear gradient of water in acetonitrile with 0.2% v/v TFA. The identity of each WW domain was confirmed by matrix-assisted laser desorption/ionization time-of-flight spectrometry (MALDI-TOF), and purity was evaluated by analytical HPLC.
  • MALDI-TOF matrix-assisted laser desorption/ionization time-of-flight spectrometry
  • Acetate protecting groups were removed from the 3-, 4-, and 6-hydroxyl groups on the Asn-linked GlcNAc residues in proteins g-WW, g-WW-F, g-WW-T, and g-WW-F,T via hydrazinolysis as described previously [Ficht et al., Chem. Eur. J. 14, 3620-3629 (2008)]. Briefly, the crude protein was dissolved in a solution of 5% hydrazine solution in 60 mM aqueous dithiothreitol (sometimes containing as much as 50% acetonitrile, to facilitate dissolution of the crude protein) and allowed to stand at room temperature for about 1 hour with intermittent agitation. The deprotection reaction was quenched by the addition of about 1 mL TFA and about 20 mL water. The quenched reaction mixture was frozen and lyophilized to give the crude deprotected protein as a white powder.
  • glycosylated proteins even though these proteins were readily soluble in water after purification
  • Proteins were purified by preparative reverse-phase HPLC on a C18 column using a linear gradient of water in acetonitrile with 0.2% v/v TFA. HPLC fractions containing the desired protein product were pooled, frozen, and lyophilized. Polypeptides were
  • MALDI-TOF desorption/ionization time-of-flight spectrometry
  • CD spectra were obtained by monitoring molar ellipticity from 340 to 200 nm, with 5 second averaging times.
  • Variable temperature CD data were obtained by monitoring molar ellipticity at 227 nm from 0.2 to 98.2 °C at 2 °C intervals, with 90 seconds equilibration time between data points and 30 second averaging times.
  • y-intercept and D_1 is the slope of the post-transition baseline
  • N_0 is the y-intercept and N_1 is the slope of the pre-transition baseline
  • K_f is the
  • K_f is related to the temperature-dependent free energy of folding ΔG_f(T) according to the following equation: $K_f = \exp[-\Delta G_f(T)/RT]$, where R is the universal gas constant (0.0019872 kcal/mol/K).
  • T_m melting temperature
  • $\Delta G_f(T) = \Delta G_0 + \Delta G_1 \times (T - T_m) + \Delta G_2 \times (T - T_m)^2$ (13), in which ΔG_0, ΔG_1, and ΔG_2 are parameters of the fit and T_m is a constant obtained from the van't Hoff fit (in equation 12).
  • the ΔG_f values displayed in Figure 4F for each Pin WW domain protein were obtained by averaging the ΔG_f values (calculated at 328.15 K using equation 13) from each of three or more replicate variable temperature CD studies on the same protein.
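Equation 13 and the relation between K_f and ΔG_f can be combined into a small two-state model of a thermal denaturation curve. In the sketch below, the expression for the CD signal is the commonly used two-state form with linear native and denatured baselines; it is an assumption, because the corresponding equation is not reproduced in this excerpt. All parameter values are hypothetical.

```python
import math

R_KCAL = 0.0019872  # universal gas constant, kcal/(mol K)


def delta_g_fold(temp_k, t_m, dg0, dg1, dg2):
    """Equation 13: quadratic expansion of the folding free energy about T_m."""
    dt = temp_k - t_m
    return dg0 + dg1 * dt + dg2 * dt ** 2


def k_fold(temp_k, t_m, dg0, dg1, dg2):
    """Folding equilibrium constant, K_f = exp(-dG_f / RT)."""
    dg = delta_g_fold(temp_k, t_m, dg0, dg1, dg2)
    return math.exp(-dg / (R_KCAL * temp_k))


def fraction_folded(temp_k, t_m, dg0, dg1, dg2):
    """Fraction of folded protein for a two-state equilibrium."""
    k = k_fold(temp_k, t_m, dg0, dg1, dg2)
    return k / (1.0 + k)


def cd_signal(temp_k, fit, d0, d1, n0, n1):
    """Assumed two-state signal: population-weighted sum of the
    post-transition (D) and pre-transition (N) linear baselines."""
    k = k_fold(temp_k, *fit)
    return ((d0 + d1 * temp_k) + k * (n0 + n1 * temp_k)) / (1.0 + k)


# Hypothetical fit parameters: T_m (K), dG_0, dG_1, dG_2.
fit = (333.15, 0.0, 0.05, -0.0005)
print(f"dG_f(328.15 K) = {delta_g_fold(328.15, *fit):.4f} kcal/mol")
print(f"fraction folded at T_m = {fraction_folded(333.15, *fit):.2f}")
```

Evaluating `delta_g_fold` at 328.15 K for each replicate fit and averaging the results mirrors the procedure described above for Figure 4F.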
  • C_1 and C_2 are constants describing the amplitude of the fluorescence decay
  • x_0 is a constant that adjusts the measured time to zero after the
  • η(59 °C) is the solvent viscosity at 59 °C and η(T) is the solvent viscosity at temperature T, both calculated with equation 21:


Abstract

A chimeric therapeutic polypeptide of a pre-existing therapeutic polypeptide is disclosed, as are a method of enhancing folded stabilization and a pharmaceutical composition of the glycosylated chimer. The pre-existing and chimeric polypeptides have substantially the same length, substantially the same amino acid residue sequence, and exhibit at least one tight turn containing a sequence of four to about seven amino acid residues in which at least two amino acid side chains extend on the same side of the tight turn and are within less than about 7 Å of each other. The chimeric therapeutic polypeptide has the sequon Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser [SEQ ID NO:___] within that tight turn sequence such that the side chains of the Aro, Asn and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other. That sequon is absent from the pre-existing therapeutic polypeptide.

Description

RELIABLE STABILIZATION OF N-LINKED POLYPEPTIDE NATIVE STATES WITH ENHANCED AROMATIC SEQUONS LOCATED IN POLYPEPTIDE TIGHT TURNS
GOVERNMENTAL SUPPORT
The present invention was made with governmental support from National Institutes of Health grant GM051105 and NRSA NIH post-doctoral fellowship F32 GM086039. The US government has certain rights in the invention.
CROSS-REFERENCE TO RELATED APPLICATION
This application claims priority from provisional application No. 61/514,202, filed August 2, 2011 and also from provisional application No. 61/380,967, filed 08 September 2010, both of whose disclosures are incorporated herein by reference.
BACKGROUND ART
Nearly one-third of the eukaryotic proteome traverses the cellular secretory pathway [Imperiali, Acc. Chem. Res. 30, 452-459 (1997)]. Many of these proteins are co-translationally N-glycosylated at Asn residues within the conserved Asn1-Yyy2-Thr3/Ser3 sequon, where Yyy is any amino acid residue other than proline and is located at position 2 between an asparagine (Asn) at the amino-terminal end of the sequon and a threonine or serine (Thr/Ser) at the carboxy-terminal end of the sequon. N-Glycosylation can increase the stability of proteins; however, the molecular basis for this enhanced stability is incompletely understood. As the ribosome inserts polypeptide chains into the endoplasmic reticulum (ER), the enzyme oligosaccharyl transferase (OST) attaches the highly conserved Glc3Man9GlcNAc2 (where Glc is glucose, Man is mannose, and GlcNAc is N-acetylglucosamine) glycan (oligosaccharide) en bloc to the N atom of the Asn side chain in a subset of Asn-Xxx-Thr/Ser sequons [Kornfeld et al., Annu Rev Biochem 54, 631-664 (1985); and Kelleher et al., Glycobiology 16:47R-62R (2006)]. N-linked glycans have important extrinsic effects on folding in the ER by allowing glycoproteins to enter the calnexin/calreticulin (CNX/CRT) folding/degradation pathway [Molinari, Nat Chem Biol 3, 313-320 (2007); Helenius et al., Science 291, 2364-2369 (2001)]. N-glycans can also have intrinsic effects on protein folding by enhancing protein folding efficiency in cells, even when the CNX/CRT pathway is absent [Banerjee et al., Proc Natl Acad Sci U S A 104, 11676-11681 (2007); Trombetta, Glycobiology 13, 77R-91R (2003)] or when the N-glycan does not allow CNX/CRT interactions [Stanley et al., FASEB J 9, 1436-1444 (1995)], consistent with reports that N-glycans stabilize protein structure, accelerate folding, and reduce aggregation in vitro [Wormald et al., Structure with Folding & Design 7, R155-R160 (1999); Jitsuhara et al., J Biochem 132, 803-811 (2002); Mitra et al., Trends in Biochemical Sciences 31, 156-163 (2006)].
The increased use of protein therapeutics has made issues such as stabilized polypeptide structure, accelerated folding, and reduced aggregation of paramount importance to the pharmaceutical industry [Li et al., Curr Opin Biotechnol 20, 678-684 (2009); Sinclair et al., J Pharm Sci 94, 1626-1635 (2005); Sola et al., BioDrugs 24, 9-21 (2010); Walsh et al., Nat Biotechnol 24, 1241-1252 (2006)]. The therapeutic benefits of N-glycosylation are exemplified in darbepoetin alfa (an erythropoietin variant with two additional N-glycans) [Egrie et al., Exp Hematol 31, 290-299 (2003)], interferon β [Runkel et al., Pharm Res 15, 641-649 (1998)], and follicle stimulating hormone [Perlman et al., J Clin Endocrinol Metab 88, 3227-3235 (2003)].
A number of types of tight turns within secondary protein or polypeptide sequences have been described in the literature. These structures are referred to as a δ-turn that encompasses two amino acid residues, a γ-turn that involves three residues, a β-turn that involves four amino acid residues, an α-turn that involves five residues and a π-turn that involves six residues. [Chou, Anal Biochem 286, 1-16 (2000).]
A β-turn or reverse turn contains a sequence of four consecutive amino acid residues that are designated i, i+1, i+2 and i+3, in the direction from N-terminus toward C-terminus of the polypeptide. The five residues of an α-turn are designated i, i+1, i+2, i+3 and i+4. Most, but not all, reverse turns and α-turns contain a hydrogen bond between the first and fourth or first and fifth residues, respectively, in which the residue designated i contains a peptide bond (peptidyl) carbonyl group (>C=O), whereas the fourth residue, i+3, or the fifth residue, i+4, contains the peptidyl -NH- group whose hydrogen is hydrogen-bonded to the carbonyl oxygen of the i residue. Residues bonded to the amino group of the i residue (toward the amino-terminus from the i residue) are designated i-1, i-2, i-3, etc.
Another way to define a reverse turn and an α-turn motif is by the close approach, less than 7 Å, of Cα atoms (alpha-carbon atoms) of the residues of the motif. Thus, one can define a β-turn and an α-turn by the close approach of Cα atoms of residues i and i+3 or i and i+4, respectively. [Chou, Anal Biochem 286, 1-16 (2000).] This distance implies a particular geometry of the corresponding backbone, which turns back on itself or, more generally, that corresponds to a change of direction, and that the residue side chains are on the same side of the backbone chain.
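By way of illustration only, the distance-based definition above can be sketched in Python. The function names, the coordinate format (one (x, y, z) tuple per Cα atom, in ångströms) and the default cutoff argument are assumptions made for this sketch and are not part of the disclosure.

```python
import math

def ca_distance(a, b):
    """Euclidean distance (in angstroms) between two (x, y, z) C-alpha coordinates."""
    return math.dist(a, b)

def candidate_turns(ca_coords, span=3, cutoff=7.0):
    """Return each start index i for which the C-alpha atoms of residues
    i and i+span approach closer than `cutoff` angstroms.

    span=3 tests the i / i+3 pair (beta-turn criterion);
    span=4 tests the i / i+4 pair (alpha-turn criterion).
    """
    return [
        i
        for i in range(len(ca_coords) - span)
        if ca_distance(ca_coords[i], ca_coords[i + span]) < cutoff
    ]

# Toy coordinates (hypothetical, not taken from any PDB entry).
hairpin = [(0, 0, 0), (3.8, 0, 0), (5.5, 3.4, 0), (3.8, 5.5, 0), (0, 5.5, 0)]
extended = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
```

Such a test flags candidates only: residues within helices can also satisfy the distance criterion, and the side-chain orientation discussed above is not examined.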
The β-turns are usually described as orienting structures because they orient α-helices and β-sheets, indirectly defining the topology of proteins. They are one of the most abundant secondary structures.
Several types of reverse turns have been identified and are designated types I, I', II, III, IV, V and VI. Types I and II are the most common reverse turns, the essential difference between them being the orientation of the peptide bond between residues at i+1 and i+2. The i+2 residue of the type
II turn can substantially only be occupied by glycine because of steric interference of the carbonyl group of the i+1 residue.
It was recently shown that naturally occurring N-glycosylation at a single Asn residue comprising a reverse turn within the adhesion domain of human glycoprotein CD2 (HsCD2ad) stabilizes the protein by -3.1 kcal mol⁻¹, makes folding four times faster, and makes unfolding 50 times slower in vitro [Hanson et al., Proc Natl Acad Sci U S A 106, 3131-3136 (2009)]. However, introducing N-glycans into proteins that are not normally glycosylated (naive proteins) has previously rarely led to substantially improved folding energetics [Hackenberger et al., J Am Chem Soc 127, 12882-12889 (2005); Wang et al., Biochemistry 35, 7299-7307 (1996); Elliott et al., J Biol Chem 279, 16854-16862 (2004)].
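As an illustrative consistency check only, the kinetic and thermodynamic figures quoted for HsCD2ad can be related through the two-state relation ΔΔG = -RT ln[(folding speed-up) x (unfolding slow-down)]. The sketch below assumes simple two-state folding and a temperature of 298.15 K; neither assumption is stated in the passage above, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def ddg_from_rate_ratios(fold_speedup, unfold_slowdown, temp_k=298.15):
    """Two-state estimate of the change in folding free energy (kcal/mol).

    The equilibrium constant K = k_fold / k_unfold changes by the factor
    fold_speedup * unfold_slowdown, so ddG = -RT ln(that factor).
    temp_k defaults to 298.15 K, an assumption of this sketch.
    """
    return -R_KCAL * temp_k * math.log(fold_speedup * unfold_slowdown)

# Folding four times faster and unfolding 50 times slower, as quoted above.
ddg = ddg_from_rate_ratios(4, 50)
```

Under these assumptions the estimate is close to the -3.1 kcal mol⁻¹ reported above.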
The present inventors and co-workers recently showed that glycosylation of an Asn residue within the sequence Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser, where Aro is an aromatic amino acid residue such as histidine, phenylalanine, tyrosine or tryptophan, n is zero, 1, 2, 3 or 4, Xxx is an amino acid residue other than an aromatic residue, p is zero or one, Zzz is any amino acid residue, Asn is asparagine, Yyy is any amino acid residue other than proline, Thr/Ser is one or the other of the amino acid residues threonine and serine, stabilizes the glycosylation-naive rat CD2 adhesion domain (RnCD2ad) and human muscle acylphosphatase (AcyP2) by about -2 kcal mol⁻¹, provided that Asn is located at the i+2 position of a type I β-turn with a G1 β-bulge using the terminology of Sibanda et al., J Mol Biol 206(4), 759-777 (1989); Richardson, Adv Protein Chem 34, 167-339 (1981), hereafter called a type I β-bulge turn [Culyba et al., Science 331, 571-575 (2011); Application Serial No. 61/380967, filed 08 September 2010].
Published structural data [Wyss et al., Science 269, 1273-1278 (1995)] from the human ortholog of RnCD2ad (HsCD2ad, Fig. 1A) suggest that placement of an N-glycan at i+2 in the type I β-bulge turn context permits the α-face of GlcNAc1 of the N-glycan to engage in stabilizing hydrophobic interactions with the aromatic ring of Phe at the i position, and the side-chain methyl group of Thr at the i+4 position {a stabilizing C-H/π interaction may also play a role [Laughrey et al., J Am Chem Soc 130(44), 14625-14633 (2008)]}.
Thus, it is hypothesized that the substantial energetic benefits of glycosylating a protein such as HsCD2ad depend on both the reverse turn context of the glycosylation site and the surrounding amino acid sequence. Some results showing the correctness of this hypothesis as applied to therapeutic polypeptides are shown and discussed hereinafter.
BRIEF DESCRIPTION OF THE INVENTION
A chimeric therapeutic polypeptide of a pre-existing therapeutic polypeptide is contemplated. Such a therapeutic chimeric polypeptide is often present in isolated and purified form.
The pre-existing therapeutic polypeptide has a length of about 15 to about 1000, preferably about 25 to about 500, and more preferably about 35 to about 300, amino acid residues, and exhibits a secondary structure that comprises at least one tight turn containing a sequence of four to about seven amino acid residues in which at least two amino acid side chains extend on the same side of the tight turn and are within less than about 7 Å of each other. The pre-existing therapeutic polypeptide lacks the sequon, in the direction from left to right and from N-terminus to C-terminus, Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser [SEQ ID NO: ], within that sequence of four to about seven amino acid residues. In that sequon, Aro is an aromatic amino acid residue such as histidine, phenylalanine, tyrosine or tryptophan, n is zero, 1, 2, 3 or 4, Xxx is an amino acid residue other than an aromatic residue, p is zero or one, Zzz is any amino acid residue, Asn is asparagine, Yyy is any amino acid residue other than proline, Thr/Ser is one or the other of the amino acid residues threonine and serine. Except for the four to about seven residues within the tight turn, a contemplated chimeric therapeutic polypeptide has the same length, at least one tight turn and substantially the same amino acid residue sequence as the pre-existing therapeutic polypeptide. The two sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser [SEQ ID NO: ] as defined above. That sequon is located at the same position in the tight turn as the sequence of four to about seven amino acid residues such that the side chains of the Aro, Asn and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other. In one preferred embodiment, "n" is 1 and "p" is 1 and the chimeric polypeptide contains a type II β-turn in a six-residue loop.
In another preferred embodiment, "n" is 1 and "p" is zero. The two polypeptide sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Xxx-Asn-Yyy-Thr/Ser as defined above. The chimeric polypeptide preferably contains a five-residue type I β-bulge turn.
In still another preferred embodiment, "n" is zero and "p" is zero. The two sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Asn-Yyy-Thr/Ser as defined above. Here, a preferred chimeric polypeptide contains a four-residue type I' β-turn.
In preferred practice, when the sequon is glycosylated, the therapeutic chimeric polypeptide exhibits a folding stabilization enhancement by about -0.5 to about -4 kcal/mol compared to the before-mentioned pre-existing therapeutic polypeptide in non-glycosylated form.
It is to be understood that substantially any and every therapeutic polypeptide that contains a tight turn in its secondary structure is contemplated herein. For example, substantially all of the Fc portions of human IgG antibodies contain one or two tight turn sequences to which the present invention can be applied. One of those sequences is often glycosylated, whereas the other is not glycosylated.
In some preferred embodiments, the sequon has the sequence, in the direction from left to right and from N-terminus to C-terminus, -Lys-(Zzz)m-Aro-(Xxx)n-Zzz-Asn-Yyy-Thr/Ser, wherein m is zero, 1, 2, or 3, and Lys is lysine, and Zzz, Aro, Xxx, Asn, Yyy and Thr/Ser are as defined above.
A method of enhancing folded stabilization of a therapeutic polypeptide is also contemplated. A contemplated therapeutic polypeptide has a sequence of about 15 to about 1000, preferably about 25 to about 500, and more preferably about 35 to about 300, amino acid residues, and exhibits a secondary structure that comprises at least one tight turn in which the side chains of two residues in a preferably glycosylation-free sequence of four to about seven amino acid residues within the tight turn project on the same side of the turn and are within less than about 7 Å of each other. In accordance with the method, a therapeutic chimeric polypeptide is prepared. That therapeutic chimeric polypeptide has the same length and substantially the same amino acid sequence as the therapeutic polypeptide, and exhibits a secondary structure containing at least one tight turn at the same sequence position within the tight turn of the therapeutic polypeptide except that the sequence of preferably glycosylation-free four to about seven amino acid residues is replaced with the sequon, in the direction from left to right and from N-terminus to C-terminus, Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser, wherein Aro is an aromatic amino acid residue, n is zero, 1, 2, 3 or 4, Xxx is an amino acid residue other than an aromatic residue, p is zero or one, Zzz is any amino acid residue, Asn(Glycan) is glycosylated asparagine, Yyy is any amino acid residue other than proline, Thr/Ser is one or the other of the amino acid residues threonine and serine, and the side chains of the Aro, Asn(Glycan) and Thr/Ser amino acid residues project on the same side of the tight turn and are within less than about 7 Å of each other.
In some embodiments, a therapeutic chimeric polypeptide is prepared by expressing a nucleic acid sequence that encodes the polypeptide sequence of the therapeutic chimeric polypeptide in a host cell that glycosylates the amino acid residue sequence Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser when present in a polypeptide sequence expressed therein to form a polypeptide containing the amino acid residue sequence Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser. In other embodiments, a therapeutic chimeric polypeptide is prepared by in vitro peptide synthesis.
Another embodiment is a pharmaceutical composition that comprises an effective amount of a before-discussed chimeric therapeutic polypeptide dissolved or dispersed in a pharmaceutically acceptable diluent composition. That pharmaceutical composition typically also contains water, at least when administered.
The present invention has several benefits and advantages. One benefit is that the folding of a therapeutic polypeptide is made thermodynamically more stable by the preparation of a glycosylated chimer whose amino acid residue sequence is almost identical to that of the therapeutic polypeptide.
An advantage of the invention is that the preparation of a glycosylated chimeric therapeutic polypeptide is readily accomplished.
Still further benefits and advantages will be apparent to those of skill in the art from the disclosures that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings forming a portion of this disclosure,
Fig. 1 in four parts illustrates the matching of enhanced aromatic sequons with reverse turn hosts that can facilitate stabilizing interactions among Phe, Asn(GlcNAc1), and Thr. Fig. 1A shows a space-filling model of the Phe63-Asn65-GlcNAc-Thr67 interaction of a glycosylated five-residue type I β-bulge turn from the adhesion domain of the human protein CD2 [PDB accession code: 1GYA; Wyss et al., Science 269, 1273-1278 (1995)]; Fig. 1B illustrates a type II β-turn in a six-residue loop [PDB accession code: 1PIN; Ranganathan et al., Cell 89, 875-886 (1997)]; Fig. 1C shows a five-residue type I β-bulge turn [PDB accession code: 2F21; Jager et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006)]; and Fig. 1D illustrates a four-residue type I' β-turn [PDB accession code: 1ZCN; Jager et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006)]. Figs. 1B-1D are from variants of the WW domain of human protein Pin1 having incorporated components of the enhanced aromatic sequon. Structures are rendered in PyMOL (a user-sponsored molecular visualization system on an open-source foundation) with dotted lines depicting hydrogen bonds. Interatomic distances between the side-chain beta carbons (Cβ's) in A are depicted.
Fig. 2 in six parts shows in Fig. 2A that residues 63-67 of the RnCD2ad retain the same five-residue type I β-bulge turn geometry found in HsCD2ad but RnCD2ad does not require N-glycosylation to fold; Figs. 2B and 2C show the stabilities and folding kinetics of the eight RnCD2* sequences required for the thermodynamic cycle, as determined by equilibrium denaturation and stopped-flow kinetic studies; Fig. 2D is a western blot showing that the relative ratio of N-glycosylated to non-glycosylated polypeptides from Sf9 insect cells is substantially higher for a RnCD2* variant having a Phe residue in the tight turn relative to a variant that lacks the Phe residue; tabulated data are shown in Fig. 2E (N refers to N-glycosylated Asn); and Fig. 2F illustrates contact of the Phe and Thr side chains with the first GlcNAc of the N-glycan of four polypeptides found in a PDB search of proteins that contain type I β-bulge turns with a Phe at the i position, a glycosylated Asn residue at the i+2 position, and a Thr at the i+4 position.
Fig. 3 in four parts illustrates in Fig. 3A that the Thr43Phe (i) and Lys45Asn (i+2) mutations in the β-bulge turn of human muscle acylphosphatase (AcyP2) create an enhanced aromatic sequon in that the i+4 position is already Thr; Fig. 3B shows data from an equilibrium denaturation study for determining folding free energy; Fig. 3C illustrates the sequences at positions 41-47 of four AcyP2* variants (SEQ ID NOs: ) differing in the identity of the side chain at position 43 (Phe or Thr) and in the presence or absence of a glycan at Asn45; and Fig. 3D is a western blot showing that the relative ratio of N-glycosylated to non-glycosylated polypeptides from Sf9 insect cells is substantially higher for an AcyP2* variant having a Phe residue in the tight turn relative to a variant that lacks the Phe residue.
Fig. 4 in five parts illustrates in Fig. 4A the residues of loop 1 of the 34-residue WW domain from human Pin1 (Pin WW or Pin1 WW), a glycosylation-naive β-sheet protein, that contains a four-residue type II β-turn within a larger six-residue H-bonded loop; Fig. 4B shows melting curves of glycosylated (g-WW-F,T) and non-glycosylated (WW-F,T) variants; Figs. 4D and 4E show illustrative plots from variable temperature circular dichroism spectroscopy and laser temperature jump studies; and Fig. 4F tabulates the thermal stability and folding rate data for the eight Pin WW variants studied (SEQ ID NOs: ).
Fig. 5 in three parts illustrates triple mutant cycle cubes formed by protein 4, glycoprotein 4g, and their derivatives (Fig. 5A); protein 5, glycoprotein 5g, and their derivatives (Fig. 5B); and protein 6, glycoprotein 6g, and their derivatives (Fig. 5C).
Fig. 6 is a graph showing the origin of the increase in stability of Pin1 protein derivatives 4-F,T, 5-F,T, and 6-F,T upon glycosylation. ΔΔGf,total is the sum of the energetic effects of (1) the Asn19 to Asn(GlcNAc)19 mutation (CN); (2) the two-way interaction between Phe16 and Asn(GlcNAc)19 (CF,N); (3) the two-way interaction between Asn(GlcNAc)19 and Thr21 (CN,T); and (4) the three-way interaction between Phe16, Asn(GlcNAc)19, and Thr21 (CF,N,T). CN, CF,N, CN,T, and CF,N,T are parameters obtained from least-squares regression of Equation A; error bars represent the corresponding standard errors.
DEFINITIONS
To facilitate understanding of the invention, a number of terms are defined below.
The term "antibody" refers to a molecule that is a member of a family of glycosylated proteins called immunoglobulins, which can specifically bind to an antigen.
The term "chimer" or "chimeric" is used to describe a polypeptide that is man-made and does not occur in nature. A contemplated chimeric polypeptide is encoded by a nucleotide sequence made by a splicing together of two or more complete or partial genes or cDNA, or by synthetically constructing such a polypeptide by in vitro methods. The pieces used can be from different species. In the present instance, the sequence of the sequon Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser, as defined before, is typically spliced into a tight turn present in a pre-existing therapeutic polypeptide using genetic engineering techniques.
The term "polypeptide" is used herein to denote a sequence of about 15 to about 1000 peptide-bonded amino acid residues. A whole protein as well as a portion of a protein having the stated minimal length is a polypeptide.
The term "tight turn" is used herein as defined in Chou, Anal Biochem 286, 1-16 (2000) to mean a polypeptide site where (i) a polypeptide chain reverses its overall direction, and (ii) the amino acid residues directly involved in forming the turn are no more than six. Tight turns are generally categorized as δ-turn, γ-turn, β-turn, α-turn, and π-turn, which are formed by two-, three-, four-, five-, and six-amino-acid residues, respectively. According to the folding mode, each of such tight turns can be further classified into several different types. β-Turns, also known as "reverse turns", are of most interest herein, and of those tight turns, the tight turns referred to as a type-I β-bulge turn, a type-I' β-turn and a type-II β-turn are of particular interest. Methods for predicting the presence of β-turns in polypeptides are provided in the citations of Chou, Anal Biochem 286, 1-16 (2000), and are otherwise well known in the art.
All amino acid residues identified herein are in the natural L-configuration. In keeping with standard polypeptide nomenclature, J. Biol. Chem., 243, 3557-3559 (1969), abbreviations for amino acid residues are as shown in the following Table of Correspondence:
TABLE OF CORRESPONDENCE
SYMBOL
1-Letter   3-Letter   AMINO ACID
Y          Tyr        L-tyrosine
G          Gly        glycine
F          Phe        L-phenylalanine
M          Met        L-methionine
A          Ala        L-alanine
S          Ser        L-serine
I          Ile        L-isoleucine
L          Leu        L-leucine
T          Thr        L-threonine
V          Val        L-valine
P          Pro        L-proline
K          Lys        L-lysine
H          His        L-histidine
Q          Gln        L-glutamine
E          Glu        L-glutamic acid
W          Trp        L-tryptophan
R          Arg        L-arginine
D          Asp        L-aspartic acid
N          Asn        L-asparagine
C          Cys        L-cysteine
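For illustration only, the Table of Correspondence can be expressed as a lookup that converts a hyphen-separated three-letter sequence into one-letter code. The dictionary and function names are hypothetical, and the sketch handles only the twenty residues listed in the table.

```python
# Three-letter to one-letter symbols, following the Table of Correspondence.
THREE_TO_ONE = {
    "Tyr": "Y", "Gly": "G", "Phe": "F", "Met": "M", "Ala": "A",
    "Ser": "S", "Ile": "I", "Leu": "L", "Thr": "T", "Val": "V",
    "Pro": "P", "Lys": "K", "His": "H", "Gln": "Q", "Glu": "E",
    "Trp": "W", "Arg": "R", "Asp": "D", "Asn": "N", "Cys": "C",
}

def to_one_letter(three_letter_sequence):
    """Convert e.g. 'Asn-Gly-Thr' to 'NGT'; raises KeyError on unknown symbols."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_sequence.split("-"))

one_letter = to_one_letter("Asn-Gly-Thr")
```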
DETAILED DESCRIPTION OF THE INVENTION
The present invention contemplates a therapeutic chimeric polypeptide that is typically present in isolated and purified form, and is a chimer of a pre-existing therapeutic polypeptide. The pre-existing therapeutic polypeptide has a length of about 15 to about 1000, preferably about 25 to about 500, and more preferably about 35 to about 300 amino acid residues.
A pre-existing therapeutic polypeptide is a polypeptide used as a pharmaceutical or nutraceutical that is administered to a human or other animal. A contemplated pre-existing therapeutic polypeptide is typically prepared exogenously of the recipient's body, but can be an endogenous polypeptide. A contemplated chimeric therapeutic polypeptide is typically prepared as an exogenous polypeptide, but can be produced endogenously via gene therapy.
A contemplated pre-existing therapeutic polypeptide exhibits a secondary structure that comprises at least one tight turn containing a sequence of four to about seven amino acid residues in which at least two amino acid side chains extend on the same side of the tight turn and are within less than about 7 Å of each other. The four to about seven amino acid residues present do not necessarily participate in the formation of the tight turn, but are present in the turn.
A contemplated chimeric therapeutic polypeptide has substantially the same length, at least one tight turn and substantially the same amino acid residue sequence as the pre-existing therapeutic polypeptide. However, a contemplated chimer is different in its total amino acid sequence from the pre-existing polypeptide, and can be longer or shorter by one to about three residues than the pre-existing therapeutic polypeptide (substantially the same length), but is preferably the same length. The two sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, in the direction from left to right and from N-terminus to C-terminus,
Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser, [SEQ ID NO: ]
wherein Aro is an aromatic amino acid residue such as histidine, phenylalanine, tyrosine or tryptophan, of which phenylalanine, tyrosine and tryptophan are preferred,
n is zero, 1, 2, 3 or 4,
Xxx is an amino acid residue other than an aromatic residue,
p is zero or one,
Zzz is any amino acid residue,
Asn is asparagine,
Yyy is any amino acid residue other than proline, and
Thr/Ser is one or the other of the amino acid residues threonine and serine, of which threonine is preferred.
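As a non-limiting sketch, the sequence pattern of the sequon defined above can be searched for in a one-letter polypeptide sequence with a regular expression. The names below are hypothetical, and the sketch tests the sequence pattern only; it does not examine whether the Aro, Asn and Thr/Ser side chains lie on the same side of a tight turn.

```python
import re

ALL = "ACDEFGHIKLMNPQRSTVWY"   # the twenty standard residues, one-letter code
AROMATIC = "FHWY"              # Aro: Phe, His, Trp, Tyr
NON_AROMATIC = "".join(r for r in ALL if r not in AROMATIC)   # Xxx
NOT_PRO = ALL.replace("P", "")                                # Yyy

# Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser with n = 0..4 and p = 0 or 1.
# The lookahead lets matches that start at different positions overlap.
PATTERN = re.compile(
    rf"(?=([{AROMATIC}][{NON_AROMATIC}]{{0,4}}[{ALL}]?N[{NOT_PRO}][ST]))"
)

def find_enhanced_aromatic_sequons(sequence):
    """Return (start_index, matched_text) pairs, at most one per Aro start
    position (the regex engine's greedy choice when several lengths fit)."""
    return [(m.start(), m.group(1)) for m in PATTERN.finditer(sequence.upper())]

hits = find_enhanced_aromatic_sequons("KIFANGT")
```

For the seven-residue sequence Lys-Ile-Phe-Ala-Asn-Gly-Thr (KIFANGT), the search reports the five-residue match Phe-Ala-Asn-Gly-Thr beginning at the Phe residue.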
The above sequon is located at the same position in the tight turn as the sequence of four to about seven amino acid residues present in the pre-existing polypeptide such that the side chains of three amino acid residues (Aro, Asn and Thr/Ser) project on the same side of the turn and are within less than about 7 Å of each other.
The sequence of four to about seven amino acid residues present in the pre-existing polypeptide is preferably glycosylation-free. In preferred practice, the above sequon is glycosylated. When the sequon is glycosylated, the therapeutic chimeric polypeptide exhibits a folding stabilization enhancement by about -0.5 to about -4 kcal/mol compared to the pre-existing therapeutic polypeptide in non-glycosylated form.
Returning to the formula for the above-mentioned sequon, it is seen that the length can be four residues when n = p = zero, five residues when n = zero and p = 1 or when n = 1 and p = 0, and ten residues when n = 5 and p = 1. Of course, intermediate lengths between four and ten residues are also contemplated. It is additionally preferred that the residues Xxx, Yyy and Zzz be other than cysteine.
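The length arithmetic above follows from counting the residues of the formula: one Aro, n Xxx, p Zzz, and the three residues Asn-Yyy-Thr/Ser, that is, n + p + 4. A minimal sketch follows; the function name is hypothetical. It may be noted that for n of zero through 4, as n is defined for the sequon, the formula gives at most nine residues, whereas the ten-residue case recited above corresponds to n = 5.

```python
def sequon_length(n, p):
    """Residue count of Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser: 1 + n + p + 3."""
    if n < 0 or p not in (0, 1):
        raise ValueError("n must be non-negative and p must be 0 or 1")
    return 1 + n + p + 3

# Lengths for the (n, p) pairs recited in the text.
lengths = {(n, p): sequon_length(n, p) for n, p in [(0, 0), (0, 1), (1, 0), (5, 1)]}
```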
The above Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser sequence is referred to herein as an "enhanced aromatic sequon" because of its increased propensity to form a stabilizing compact structure upon glycosylation, relative to the canonical Asn-Yyy-Thr sequon, and because it is more efficiently glycosylated by eukaryotic cells than the Asn-Yyy-Thr sequon [Culyba et al., Science 331, 571-575 (2011)].
The sequon Asn-Yyy-Thr/Ser is a sequon recognized by the enzyme oligosaccharyl transferase (OST). OST attaches the highly conserved Glc3Man9GlcNAc2 glycan en bloc to the N atom of the Asn amido side chain. A glycan bonded to the amido nitrogen of an asparagine side chain is illustrated herein as "Asn(Glycan)" to denote any glycan.
During the translocation of a glycosylated polypeptide through the endoplasmic reticulum (ER), several sugars including each glucose (Glc) and several of the mannose (Man) groups are removed from the glycan portion. The specific resulting glycan is dependent upon the plant or animal in which the polypeptide is expressed, and at what stage after expression the glycopolypeptide is recovered. Illustrative glycosylated Asn residues include those with one N-acetylglucosamine [Asn(GlcNAc)], two N-acetylglucosamines [Asn(GlcNAc2)], with one mannose and two N-acetylglucosamines [Asn(ManGlcNAc2)], and with three mannoses and two N-acetylglucosamines that is referred to as "paucimannose" (Man3GlcNAc2) that forms the glycosylated residue Asn(Man3GlcNAc2), and the like. Additionally, glycosylated asparagine residues can be utilized in an in vitro polypeptide synthetic scheme.
In another embodiment, the sequon contemplated has the formula, from left to right and in the direction from N-terminus to C-terminus,
-Lys-(Zzz)m-Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser, [SEQ ID NO: ]
wherein
m is zero, 1, 2, or 3, and
Lys is lysine, and
Zzz, Aro, Xxx, n, p, Yyy and Thr/Ser are as defined previously.
Again, as in the previous discussion, this sequon is positioned in the tight turn sequence of the chimeric polypeptide at the same position in the tight turn as the sequence of four to about seven amino acid residues present in the pre-existing polypeptide such that the side chains of four amino acid residues (Lys, Aro, Asn and Thr/Ser) project on the same side of the turn and are within less than about 7 Å of each other. That is, each of the Lys, Aro and Thr/Ser residue side chains interacts with the glycan of the Asn residue after proper folding, as for example, after expression and passage of the expressed polypeptide through the ER.
Another way to identify the position of the about four to seven amino acid residues present in the pre-existing polypeptide is through use of the numbering system utilized for the location of residues present in a hydrogen bonded sequence of a β-turn, even though a hydrogen bond need not be present in a contemplated tight turn. In this system, the N-terminal residue of the sequence that participates in the hydrogen bond is designated the "i" residue. Going in the direction toward the C-terminus of the sequence, the residues are numbered "i+1", "i+2", "i+3", "i+4", "i+5", etc. Residues to the N-terminal side of residue "i" are numbered "i-1", "i-2", "i-3", "i-4", "i-5", etc.
Illustrative examples of this type of nomenclature can be seen hereinafter such as in work regarding the type-I β-bulge turn present in the non-therapeutic genetically-engineered polypeptide rat glycoprotein CD2 (RnCD2*). The sequon in that type-I β-bulge turn was engineered to be Asn-Gly-Thr, within the seven residue sequence Glu-Ile-Leu-Ala-Asn-Gly-Thr (SEQ ID NO: ) and was replaced in the chimeric polypeptide by the sequon -Lys-(Zzz)m-Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser, where m = 1, n = 1 and p = zero, Lys-Ile-Phe-Ala-Asn-Gly-Thr (SEQ ID NO: ). The pre-existing sequence in the pre-existing RnCD2* is Asn-Gly-Thr, where the Asn is at the i position, whereas the Gly is at the i+1, and Thr is at the i+2 position. In the chimeric polypeptide, the Asn, Gly and Thr are as before, and the Lys, Ile, Phe, and Ala are at positions i-4, i-3, i-2, and i-1, respectively.
In the above polypeptide [Lys-Ile-Phe-Ala-Asn-Gly-Thr (SEQ ID NO: )] the Phe, Thr and Asn(Glycan) interact, and the Lys also appears to interact with those residues. As a result, looking from the viewpoint of the chimeric therapeutic polypeptide, one can base the numbering nomenclature upon the Phe as the i residue, the Ala as i+1, the Asn as i+2, the Gly as i+3, and the Thr as i+4. Both methods of numbering are used herein.
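The two numbering conventions just described can be illustrated with a short sketch that labels each residue of Lys-Ile-Phe-Ala-Asn-Gly-Thr relative to a chosen reference ("i") residue. The function name and the list representation of the sequence are assumptions of this illustration.

```python
def turn_labels(residues, ref_index):
    """Label each residue relative to the residue at ref_index, which is 'i'.

    Residues toward the N-terminus receive i-1, i-2, ...; residues toward
    the C-terminus receive i+1, i+2, ...
    """
    labels = []
    for k, residue in enumerate(residues):
        offset = k - ref_index
        labels.append((residue, "i" if offset == 0 else f"i{offset:+d}"))
    return labels

chimer_turn = ["Lys", "Ile", "Phe", "Ala", "Asn", "Gly", "Thr"]

asn_based = turn_labels(chimer_turn, chimer_turn.index("Asn"))  # Asn as i
phe_based = turn_labels(chimer_turn, chimer_turn.index("Phe"))  # Phe as i
```

With Asn as the reference, Lys through Ala fall at i-4 through i-1; with Phe as the reference, Asn falls at i+2 and Thr at i+4, in agreement with the text.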
Turning again to the before-discussed preferred sequon, Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser, in one preferred embodiment, "n" is 1 and "p" is 1 and the chimeric polypeptide contains a type II β-turn in a six-residue loop. The resulting enhanced aromatic sequon present in the chimeric polypeptide has the sequence: Aro-Xxx-Zzz-Asn-Yyy-Thr/Ser.
In another preferred embodiment, "n" is 1 and "p" is zero. The pre-existing and chimeric polypeptide sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Xxx-Asn-Yyy-Thr/Ser as defined above. The chimeric polypeptide preferably contains a five-residue type I β-bulge turn.
In still another preferred embodiment, "n" is zero and "p" is zero. The pre-existing and chimeric polypeptide sequences differ by the presence in the chimeric therapeutic polypeptide of the sequon, Aro-Asn-Yyy-Thr/Ser as defined above. Here, a preferred chimeric polypeptide contains a four-residue type I' β-turn.
One group of exemplary pre-existing therapeutic polypeptides is constituted of therapeutic antibodies, and particularly the heavy chains of human antibodies. The heavy chain of all IgG-type antibodies has three constant domains: CH1, CH2, and CH3. The CH2 and CH3 domains form what is called the Fc fragment, or the crystallizable fragment. A complete human antibody heavy chain contains about 450 amino acid residues, of which about one-half are present in the Fc portion.
The following Table A provides a list of USAN names of therapeutic antibodies that are approved or at some point in clinical trials. The CH2 and CH3 domains of human antibody Fc portions contain reverse turns, each of which can be modified to form one or two enhanced aromatic sequons . The pre-existing tight turn sequence of illustrative antibodies or antibody Fc portions such as those below and exemplary replacement sequons contemplated herein that can provide enhanced folding stability are provided in Table B thereafter.
Table A
USAN Names of Therapeutic Antibodies
"abagovomab", "adalimumab", "alemtuzumab", "apolizumab", "basiliximab", "basliximab", "belimumab", "bevacizumab", "canakinumab", "catumaxomab", "certolizumab", "cetuximab", "cixutumumab", "conatumumab", "consumab", "daclizumab", "dalotuzumab", "denosumab", "dermab", "eculizumab", "edrecolomab", "efalizumab", "efungumab", "elotuzumab", "epratuzumab", "ertumaxomab", "etaracizumab", "figitumumab", "galiximab", "ganitumab", "gemtuzumab", "genmab", "golimumab", "ibalizumab", "ibritumomab", "infliximab", "ipilimumab", "lexatumumab", "lintuzumab", "lumiliximab", "mapatumumab", "matuzumab", "mepolizumab", "milatuzumab", "motavizumab", "natalizumab", "necitumumab", "nimotuzumab", "ofatumumab", "omalizumab", "oregovomab", "otelixizumab", "palivizumab", "panitumumab", "pertuzumab", "ramucirumab", "ranibizumab", "reslizumab", "rituximab", "siplizumab", "sonepcizumab", "tanezumab", "tefibazumab", "teplizumab", "ticilimumab", "tocilizumab", "tositumomab", "trastuxumab", "trastuzumab", "tremelimumab", "tucotuzumab", "ustekinumab", "veltuzumab", "visilizumab", "volociximab", "zalutumumab"
Table B
Figure imgf000025_0001
CAMPATH-
1H : Heavy
Chain 1
1CE1 :H DSDGS FSNGT
CAMPATH-
1H : Heavy
Chain 2
DB00113 | Arcitumomab | 1CLO: Anti-CEA heavy chain 1 | DYNST, DSDGS | FYNST, FSNGT
DB00113 | Arcitumomab | 1CLO: Anti-CEA heavy chain 2 | DYNST, DSDGS | FYNST, WSNGS
DB00043 | Omalizumab | Anti-IgE antibody VH domain chain 1 | QYNST, DSDGS | WYNST, YSNGT
DB00043 | Omalizumab | Anti-IgE antibody VH domain chain | QYNST, DSDGS | HYNST, YSNGS
DB00057 | Satumomab Pendetide | Heavy chain 1 B72.3 | DYNST, DSDGS | FYNST, HSNGT
DB00057 | Satumomab Pendetide | Heavy chain 2 B72.3 | DYNST, DSDGS | WYNST, YSNGT
DB00092 | Alefacept | Human LFA fused to human Fc | QYNST, DSDGS | WYNST, YSNGT
DB00111 | Daclizumab | Humanized Anti-CD25 Heavy Chain 1 | QYNST, DSDGS | FYNST, YSNGS
DB00111 | Daclizumab | Humanized Anti-CD25 Heavy Chain 2 | QYNST, DSDGS | HYNST, FSNGS
DB00002 | Cetuximab | Anti-EGFR heavy chain 1 | DSDGS | WSNGS
DB00002 | Cetuximab | Anti-EGFR heavy chain 2 | DSDGS | WSNGT
DB00081 | Tositumomab | Mouse-Human chimeric Anti-CD20 heavy chain 1 | QYNST, DSDGS | FYNST, FSNGT
DB00081 | Tositumomab | Mouse-Human chimeric Anti-CD20 heavy chain 2 | QYNST, DSDGS | FYNSS, WSNGT
DB00072 | Trastuzumab | Anti-HER2 Heavy chain 1 | QYNST, DSDGS | FYNST, WSNGS
DB00072 | Trastuzumab | Anti-HER2 Heavy chain 2 | QYNST, DSDGS | FYNST, WSNGS
DB00075 | Muromonab | 1SY6:H OKT3 Heavy Chain 1 | QYNST, DSDGS | WYNST, FSNGS
DB00075 | Muromonab | 1SY6:H OKT3 Heavy Chain 2 | QYNST, DSDGS | HYNST, YSNGS
DB00054 | Abciximab | 1TXV:H ReoPro-like antibody Heavy Chain 1 | QYNST, DSDGS | WYNST, WSNGT
DB00054 | Abciximab | 1TXV:H ReoPro-like antibody Heavy Chain 2 | QYNST, DSDGS | FYNST, FSNGT
DB00074 | Basiliximab | 1MIM:H Anti-CD25 antibody heavy CHIMERIC chain 1 | QYNST, DSDGS | YYNST, FSNGT
DB00074 | Basiliximab | 1MIM:H Anti-CD25 antibody heavy CHIMERIC chain 2 | QYNST, DSDGS | WYNST, FSNGS
DB00073 | Rituximab | Mouse-Human chimeric Anti-CD20 Heavy Chain 1 | QYNST, DSDGS | HYNSS, YSNGS
DB00073 | Rituximab | Mouse-Human chimeric Anti-CD20 Heavy Chain 2 | QYNST, DSDGS | FYNSS, WSNGS
Another group of exemplary pre-existing therapeutic polypeptides is hormones. Illustrative of such hormones are erythropoietin, darbepoetin alfa (an erythropoietin variant with two additional N-glycans), interferon beta, follicle stimulating hormone, follitropin beta, peginterferon alfa-2b, becaplermin, sermorelin, somatropin, pramlintide, sargramostim, insulin, thyrotropin alfa, choriogonadotropin alfa, lepirudin, lutropin alfa, secretin, bivalirudin, corticotrophin, exenatide and the like.
Yet another group of exemplary pre-existing therapeutic polypeptides is enzymes. Illustrative of such enzymes are laronidase, collagenase,
pancrelipase, streptokinase, urokinase, imiglucerase, reteplase, coagulation factor VII, coagulation factor VII, coagulation factor IX, alglucerase, agalsidase beta, asparaginase, hyaluronidase, tenecteplase, pegademase bovine, dornase alfa, anistreplase, pegaspargase, alteplase, and the like.
Further pre-existing polypeptides include denileukin diftitox, botulinum toxin type B,
nesiritide, pegfilgrastim, human serum albumin, mecasermin, aldesleukin, antihemophilic factor, aprotinin, palifermin, peginterferon alfa-2a, teriparatide, urofollitropin, anakinra, menotropins, OspA lipoprotein, pegvisomant, thymalfasin,
follitropin beta, peginterferon alfa-2b, alpha-1-proteinase inhibitor, filgrastim, oprelvekin, rasburicase, darbepoetin alfa, enfuvirtide and the like.
Table C, below, illustrates five residue native sequences within tight turns of two of the above polypeptides, the alpha chain of follitropin beta, which has a type VI β-turn, and imiglucerase, which has a type I β-bulge turn. Also illustrated for each of those polypeptides are replacement sequon sequences for the illustrated native five residue sequences .
Table C
Figure imgf000029_0001
FMNGS, WMNGS, YMNGS, HMNGS
1Y7V | Imiglucerase | Human beta-glucosidase | HPDGS | FPNGT, WPNGT, YPNGT, HPNGT, FPNGS, WPNGS, YPNGS, HPNGS
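The replacement sequons listed in Table C for a five-residue native turn follow a regular pattern: an aromatic residue at position i, Asn at i+2, and Thr or Ser at i+4, with the native residues at i+1 and i+3 retained. The following is a minimal sketch that enumerates those candidates, assuming one-letter codes and the four aromatic residues used in the tables (F, W, Y, H); the function name `replacement_sequons` is hypothetical.

```python
AROMATICS = "FWYH"  # Phe, Trp, Tyr, His
HYDROXYLS = "TS"    # Thr, Ser


def replacement_sequons(native):
    """Enumerate five-residue enhanced aromatic sequons for a native turn.

    Positions i+1 and i+3 of the native sequence are kept; position i
    becomes aromatic, i+2 becomes Asn, and i+4 becomes Thr or Ser.
    """
    if len(native) != 5:
        raise ValueError("expected a five-residue turn sequence")
    return [
        aro + native[1] + "N" + native[3] + hydroxyl
        for hydroxyl in HYDROXYLS
        for aro in AROMATICS
    ]


print(replacement_sequons("HPDGS"))
```

For the imiglucerase native sequence HPDGS this produces the eight candidates listed in Table C. Whether a given candidate actually adopts the required turn geometry is not addressed by this enumeration.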
Nearly 9% of the reverse turns in the Protein Data Bank (PDB) are type I β-bulge turns [Sibanda et al., J Mol Biol 206(4), 759-777 (1989); and Oliva et al., J Mol Biol 266(4), 814-830 (1997)], so installing the Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser enhanced aromatic sequon could be an attractive strategy for increasing the stability of the many proteins that harbor type I β-bulge turns.
Identifying other suitable reverse turn types that could position Phe, GlcNAc1, and Thr close enough to facilitate a tripartite interaction would further expand the number of proteins that could benefit from the increased stability and possibly the increased glycosylation efficiency afforded by the enhanced aromatic sequon.
Illustrative glycosylated Asn residues include those with one N-acetylglucosamine [Asn(GlcNAc)], two N-acetylglucosamines [Asn(GlcNAc2)], with one mannose and two N-acetylglucosamines [Asn(ManGlcNAc2)], and with three mannoses and two N-acetylglucosamines that is referred to as "paucimannose" (Man3GlcNAc2) that forms the glycosylated residue Asn(Man3GlcNAc2), and the like. Additionally, glycosylated asparagine residues can be utilized in an in vitro polypeptide synthetic scheme.
PREPARATION OF A CHIMERIC THERAPEUTIC POLYPEPTIDE
A method of enhancing folded stabilization of a chimeric therapeutic polypeptide compared to a pre-existing therapeutic polypeptide is also contemplated. The pre-existing therapeutic polypeptide comprises a sequence of about 15 to about 1000 amino acid residues, preferably about 25 to about 500 residues, and more preferably about 35 to about 300 residues, and exhibits a secondary structure that comprises at least one tight turn in which the side chains of two residues in a sequence of four to about seven amino acid residues within the tight turn project on the same side of the turn and are within less than about 7 Å of each other. Those four to about seven amino acid residues are preferably glycosylation-free. In accordance with that method, a therapeutic chimeric polypeptide is prepared that is of the same length and substantially the same sequence as the therapeutic polypeptide and exhibits a secondary structure comprising at least one tight turn at the same sequence position within the tight turn as in the therapeutic polypeptide, except that said sequence of four to about seven amino acid residues is replaced with the sequon, in the direction from left to right and from N-terminus to C-terminus,
Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser,
[SEQ ID NO: ]
wherein
Aro is an aromatic amino acid residue,
n is zero, 1, 2, 3 or 4,
Xxx is an amino acid residue other than an aromatic residue,
p is zero or 1,
Zzz is any amino acid residue,
Asn(Glycan) is glycosylated asparagine,
Yyy is any amino acid residue other than proline,
Thr/Ser is one or the other of the amino acid residues threonine and serine, and
the side chains of the Aro, Asn(Glycan) and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other.
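The sequence portion of this definition (not the side-chain geometry or the glycan) can be expressed as a pattern over one-letter amino acid codes. The sketch below is an illustration under stated assumptions: the aromatic set is taken as F, W, Y and H, the input is an unglycosylated sequence string, and the names `SEQUON` and `find_enhanced_aromatic_sequons` are hypothetical. A match indicates only that the residues are present in the required order; it says nothing about whether they sit in a suitable tight turn.

```python
import re

# Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser with n = 0..4 and p = 0 or 1:
#   [FWYH]        Aro, an aromatic residue
#   [^FWYH]{0,4}  Xxx, zero to four non-aromatic residues
#   [A-Z]?        Zzz, an optional residue of any kind
#   N[^P][TS]     Asn, then Yyy (not Pro), then Thr or Ser
# The lookahead wrapper lets overlapping candidates be reported.
SEQUON = re.compile(r"(?=([FWYH][^FWYH]{0,4}[A-Z]?N[^P][TS]))")


def find_enhanced_aromatic_sequons(sequence):
    """Return (start index, matched segment) for each candidate sequon."""
    return [(m.start(), m.group(1)) for m in SEQUON.finditer(sequence)]


print(find_enhanced_aromatic_sequons("KIFANGT"))
```

For the Lys-Ile-Phe-Ala-Asn-Gly-Thr sequence discussed earlier, the single candidate reported is the five-residue segment FANGT starting at the Phe.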
In some embodiments, the Asn(Glycan) is Asn(GlcNAc)1. In other embodiments, Asn(Glycan) is Asn(GlcNAc)2, whereas in other embodiments Asn(Glycan) is Asn(GlcNAc)2Man1. In still other embodiments, the glycan of Asn(Glycan) is paucimannose.
A contemplated polypeptide can be prepared in a number of manners. Longer polypeptides, such as those of about 50 residues and longer, are most readily prepared by genetic engineering following well known techniques. Thus, for example, a therapeutic chimeric polypeptide is prepared by expressing a nucleic acid sequence that encodes the polypeptide sequence of the therapeutic chimeric polypeptide in a host cell that glycosylates the amino acid sequence Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser when present in a polypeptide sequence expressed therein to form the sequence Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser. Examples of such preparations are illustrated hereinafter.
In such a preparation, any of several eukaryotic host cells can be utilized for the preparation of a glycosylated chimeric therapeutic polypeptide. Examples include yeast cells such as Saccharomyces cerevisiae and Pichia pastoris, mammalian cells such as CHO cells, insect cells such as Spodoptera frugiperda (Sf9) cells, and plant cells such as those of tobacco (Nicotiana tabacum M38) or Arabidopsis thaliana. Unstabilized (unglycosylated or non-glycosylated) therapeutic polypeptides useful for comparative purposes can be expressed in bacterial cells, such as E. coli, that do not glycosylate their expressed polypeptides.
In the following examples, an illustrative polypeptide is expressed as a fusion protein that contains isolation and purification sequences. One such sequence is a 6-residue hexa-histidine sequence at the N-terminus of the polypeptide to assist in purifying and isolating the desired chimer via binding to a nickel affinity ligand on a solid support. Additional affinity tags include the 9-residue FLAG-tag and the myc-tag that are bound by solid-support-linked antibody binding sites. The so-called Strep-tag® II, which consists of a streptavidin-recognizing octapeptide, can be affinity-purified using a matrix with a modified streptavidin and eluted with a biotin analog.
Because it is desirable to remove most tags at the end of the purification process, considerable advances have been made in the design of affinity tags so that they can be cleaved without leaving any residues behind and also to simplify the entire process of purification and cleavage. One such system is the "Profinity eXact™" fusion-tag system (Bio-Rad Laboratories, Hercules, CA), which uses an immobilized subtilisin protease to carry out affinity binding and tag cleavage. The protease is not only involved with the binding and recognition of the tag, but upon application of the elution buffer, it also serves to precisely cleave the tag from the fusion protein directly after the cleavage recognition sequence. This delivers a native, tag-free polypeptide in a single step. Another system for simple purification of proteins is based on elastin-like polypeptides (ELP) and intein. ELP consist of several repeats of a peptide motif that undergo a reversible transition from soluble to insoluble upon temperature upshift. The fusion protein is purified by temperature-induced aggregation and separation by centrifugation, and intein is used for tag removal. No affinity columns are needed for initial purification.
Solubility-enhancing tags are generally large peptides or proteins that increase the expression and solubility of fusion proteins. Fusion tags like GST and MBP also act as affinity tags and as a result, they are very popular for protein purification. Other fusion tags like NusA, thioredoxin (TRX), small ubiquitin-like modifier (SUMO), and ubiquitin (Ub), on the other hand, require additional affinity tags for use in protein purification.
An expressed polypeptide also preferably includes a peptide cleavage site so that a purified polypeptide can be cleaved from any tags utilized in its purification and isolation. This cleavage or tag-removal step almost always involves using a protease to cleave a specific peptide bond between the tag and the protein of interest. A small number of highly specific proteases are routinely used for this purpose. These include the tobacco etch virus (TEV) protease; thrombin (factor IIa, fIIa) and factor Xa (fXa) from the blood coagulation cascade; an enzyme involved in the cleavage or activation of trypsin in the mammalian intestinal tract, enterokinase (EK); proteases involved in the maturation and deconjugation of SUMO, SUMO proteases (Ulp1, Senp2, and SUMOstar); and a mutated form of the Bacillus subtilis protease, subtilisin BPN' (Bio-Rad's Profinity eXact system). Many of these enzymes have been genetically engineered to enhance their stability (e.g., AcTEV™, ProTEV) or their specificity (e.g., SUMOstar, Profinity). With the exception of the SUMO proteases, all of these enzymes have the potential to cleave within the protein of interest. The SUMO proteases recognize not only their specific cleavage site, but also the tertiary structure of SUMO itself, giving them a very high degree of specificity.
A desired polypeptide can also be prepared by one or more of the well known in vitro polypeptide synthesis techniques, particularly solid phase synthesis. This mode of synthesis is also illustrated hereinafter.
PHARMACEUTICAL COMPOSITIONS
In yet another embodiment of the invention, a contemplated chimeric therapeutic polypeptide is an active ingredient in a pharmaceutical composition for administration to a human patient or suitable animal host such as a chimpanzee, mouse, rat, horse, sheep or the like.
Thus, a contemplated chimeric therapeutic polypeptide is dissolved or dispersed in a
pharmaceutically acceptable diluent composition that typically also contains water. When administered to a host animal in need of the polypeptide, such as a mammal (e.g., a mouse, dog, goat, sheep, horse, bovine, monkey, ape, or human) or bird (e.g., a chicken, turkey, duck or goose) , the polypeptide provides the benefit of the pre-existing polypeptide.
The amount of chimeric therapeutic polypeptide present in a pharmaceutical composition is referred to as an effective amount and can vary widely, depending, inter alia, upon the polypeptide used and the presence of adjuvants and/or other excipients in the composition. The amount of chimeric therapeutic polypeptide that constitutes an effective amount varies with the polypeptide and the condition to be treated. Starting dosages are taken from the literature or the product label of the corresponding pre-existing therapeutic polypeptide usage, and are typically ultimately somewhat less than that used for the pre-existing therapeutic polypeptide.
The preparation of pharmaceutical compositions that contain proteinaceous materials as active ingredients is well understood in the art. Typically, such compositions are prepared as parenterals, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid prior to injection can also be prepared. The preparation can also be emulsified.
Once purified, a contemplated chimeric therapeutic polypeptide is typically recovered by lyophilization. A pharmaceutical composition is typically prepared from a recovered chimeric therapeutic polypeptide by dispersing the polypeptide, preferably in particulate form, in a physiologically tolerable (acceptable) diluent vehicle such as water, saline, phosphate-buffered saline (PBS), acetate-buffered saline (ABS), Ringer's solution, or the like to form an aqueous composition. Alternatively, the lyophilized polypeptide is mixed with additional solid excipients and stored as such for constitution with water, saline and the like as discussed above.
Excipients that are pharmaceutically acceptable and compatible with the active ingredient are often mixed with the solid polypeptide, or can be predissolved in the liquid medium. Suitable
excipients are, for example, water, saline, dextrose, glycerol, ethanol, or the like and combinations thereof. In addition, if desired, a composition can contain minor amounts of auxiliary substances such as wetting or emulsifying agents, pH buffering agents that enhance the effectiveness of the composition.
ILLUSTRATIVE EXAMPLES
The adhesion domain of human glycoprotein CD2 (HsCD2ad), a non-therapeutic polypeptide, is glycosylated at Asn65, within the Asn65-Gly66-Thr67 sequon (Fig. 1A). NMR and crystallographic data demonstrate that Asn65 occupies the i+2 position of a five-residue type I β-bulge turn that spans from Phe63 (i) to Thr67 (i+4; Fig. 1B), with Gly66 occupying the i+3 bulge position [Wyss et al., Science 269, 1273-1278 (1995); Wang et al., Cell 97, 791-803 (1999)]. Nuclear Overhauser effects (NOEs) suggest that the side chain of Phe63 at the i position interacts with the hydrophobic face of the first GlcNAc residue of the glycan (Asn65-GlcNAc1-GlcNAc2-), which also packs into the side chain methyl group of Thr67 (see Fig. 1C for a space-filling view of this cluster).
NOE evidence also suggests the possibility of a stabilizing protein-glycan interaction between GlcNAc2 of the glycan and Lys61 [Wyss et al., Science 269, 1273-1278 (1995)]. Wyss et al. hypothesized that this interaction disperses the positive charge present in a cluster of five Lys residues, but the energetics of this interaction were not probed [Wyss et al., Science 269, 1273-1278 (1995)]. Previous kinetic studies of glycan-dependent HsCD2ad folding suggest that the N-glycan does much more than attenuate unfavorable electrostatic interactions [Hanson et al., Proc Natl Acad Sci U S A 106:3131-3136 (2009)].
Bioinformatic analysis of the Protein Data Bank (PDB) has revealed that aromatic residues are overrepresented two residues before Asn in occupied sequons [Petrescu et al., Glycobiology 14, 103-114 (Feb, 2004)], leading us to hypothesize that the unusually large stabilizing effect of glycosylation on HsCD2ad folding is largely due to a tripartite Phe63-GlcNAc1-Thr67 interaction. Because nonglycosylated HsCD2ad is unfolded, we used the structurally homologous rat ortholog of HsCD2ad (RnCD2ad) to test this hypothesis, as RnCD2ad does not require N-glycosylation to fold. Residues 63-67 of RnCD2ad retain the same five-residue type I β-bulge turn geometry found in HsCD2ad (Fig. 2A, inset) [Jones et al., Nature 360, 232-239 (1992)].
We installed the Asn65-Gly66-Thr67 glycosylation sequon from HsCD2ad into the β-bulge turn of RnCD2 by mutating Asp67 to Thr (residues 65 and 66 in the wild-type RnCD2ad sequence are already Asn and Gly, respectively). To generate a version of RnCD2ad that would be glycosylated only within this turn context, we removed three naturally occurring N-glycosylation sequons (by mutating Asn72, Asn82 and Asn89 to Gln, Gln and Asp, respectively). This modified RnCD2ad sequence (which contains only one glycosylation site at Asn65) is referred to as RnCD2*.
RnCD2* folds in the absence of
glycosylation (expressed in E. coli) , and has Glu at position 61 and Leu at position 63 in contrast to the Lys61 and Phe63 in HsCD2ad (Fig. 2A) . These
differences make RnCD2* an ideal sequence in which to study the kinetic and thermodynamic consequences of the interactions between the N-glycan, Lys61, and Phe63, using a triple mutant thermodynamic cycle
[Jones et al . , Nature 360, 232-239 (1992)]. The stabilities and folding kinetics of the eight RnCD2* sequences required for the cycle were determined by equilibrium denaturation and stopped-flow kinetic studies (see Fig. 2B,C for representative data) and tabulated data are shown in Fig. 2E (N refers to N-glycosylated Asn). Glycosylated (g-RnCD2*)
variants appended with Man6-8 oligomannose glycans (determined by ESI-MS; Fig SX) were expressed in Sf9 insect cells.
Glycosylation stabilizes g-RnCD2* by -0.6 kcal mol-1 relative to RnCD2*, which is -2.5 kcal mol-1 less than the increase in stability observed upon glycosylation of HsCD2ad. g-RnCD2*-K (Glu61Lys) and g-RnCD2*-F (Leu63Phe) are stabilized by -1.5 and -1.8 kcal mol-1 relative to the corresponding non-glycosylated variants, respectively. These effects are each about -1 kcal mol-1 greater than the observed increase in stability upon glycosylation of the unmodified RnCD2*, suggesting that Lys61 and Phe63 in these RnCD2* variants are each able to form stabilizing interactions with the N-glycan at position 65 that are putatively similar to the interactions observed in the NMR structure of HsCD2ad.
The N-glycan-dependent contributions of Lys61 (ΔΔGf[RnCD2*-K] - ΔΔGf[RnCD2*]) and Phe63 (ΔΔGf[RnCD2*-F] - ΔΔGf[RnCD2*]) to RnCD2* stability are comparable: -0.9 kcal mol-1 and -1.2 kcal mol-1, respectively. Notably, these interactions are synergistic: according to the data in the triple mutant cycle, this synergy amounts to -1.0 kcal mol-1. A comparison of kinetic measurements shows that glycosylated variants that contain Phe63 unfold 20 to 200 times more slowly than the corresponding nonglycosylated variants, suggesting that an interaction between Phe63 and the N-glycan at position 65 stabilizes the native state of RnCD2* (Fig. 2E).
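The quoted contributions of Lys61 and Phe63 follow from subtracting the effect of glycosylation on RnCD2* from its effect on each single-residue variant. The following is a minimal sketch of that arithmetic using only the rounded values stated in the text; the dictionary and function names are hypothetical.

```python
# Change in folding free energy upon glycosylation (kcal/mol), as quoted
# in the text for each RnCD2* background.
DDG_GLYCOSYLATION = {
    "RnCD2*": -0.6,    # Glu61, Leu63
    "RnCD2*-K": -1.5,  # Glu61Lys
    "RnCD2*-F": -1.8,  # Leu63Phe
}


def glycan_dependent_contribution(variant, reference="RnCD2*"):
    """Extra stabilization from glycosylation attributable to the mutation."""
    difference = DDG_GLYCOSYLATION[variant] - DDG_GLYCOSYLATION[reference]
    return round(difference, 2)


lys61 = glycan_dependent_contribution("RnCD2*-K")
phe63 = glycan_dependent_contribution("RnCD2*-F")
print(lys61, phe63)
```

The results reproduce the -0.9 and -1.2 kcal mol-1 values given above. The synergy term requires the remaining variants of the triple mutant cycle, whose individual values are tabulated in Fig. 2E and are not reproduced here.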
Unlike the interaction of the N-glycan with Lys61, which may depend on the presence of a nearby cluster of positively charged residues, the stabilizing tripartite interaction between Phe-i, Asn-N-glycan-i+2 and Thr-i+4 in RnCD2* and HsCD2ad appears to be a self-contained structural module, which we call an enhanced aromatic sequon. We next explored whether incorporating this enhanced aromatic sequon into reverse turns in other glycosylation-naive proteins would also result in substantial increases in stability.
A PDB search supports this possibility by revealing four additional proteins that contain type I β-bulge turns with a Phe at the i position, a glycosylated Asn residue at the i+2 position, and a Thr at the i+4. In each case, the Phe and Thr side chains contact the first GlcNAc of the N-glycan (Fig. 2F) . Furthermore, we identified glycosylated type I β-bulge turns in four additional proteins in which aromatic residues other than Phe (Tyr, Trp, or His) occupy the i position, making analogous contacts. This observation highlights the view that aromatic amino acid side chains other than Phe can also enhance glycosylation sequons by engaging in
similarly stabilizing interactions with N-glycans in reverse turns.
As is disclosed in detail hereinafter, we demonstrate that placing a Phe two or three residues prior to (upstream of or toward the amino-terminus from) a glycosylated Asn in certain reverse turn contexts leads to substantial stabilization in three different proteins, and constitutes a portable method for increasing glycoprotein stability.
The portability of the stabilization conferred by the enhanced aromatic sequon was tested by integrating it into a glycosylation-naive reverse turn in human muscle acylphosphatase (AcyP2), a two-layer α/β protein, in which two α-helices pack against a four-stranded β-sheet [Pastore et al., J Mol Biol 224, 427-440 (1992)]. Reverse turn residues 43 to 47 are not well-enough defined in the NMR structure of AcyP2 to discern their precise conformation, but homologous residues in the crystal structure of common type acylphosphatase (57% identical to AcyP2) adopt a type I β-bulge turn conformation [Yeung et al., Acta Crystallogr Sect F Struct Biol Cryst Commun 62, 80-82 (2006)].
Thus, Thr43Phe (i) and Lys45Asn (i+2) mutations in the β-bulge turn create the enhanced aromatic sequon (the i+4 position is already Thr;
Fig. 3A) . The three additional sequons present in wild-type AcyP2 (but which are not normally
glycosylated, as AcyP2 is a cytosolic protein) were removed by Ser to Ala mutations at positions 44, 82 and 95 to create a modified version of AcyP2 (AcyP2*) that is N-glycosylated only at Asn45.
Four AcyP2* variants, differing in the identity of the side chain at position 43 (Phe or Thr) and in the presence or absence of a glycan at Asn45 (Fig. 3A,C) were prepared. Glycoproteins g-AcyP2* and g-AcyP2*-F with predominantly
fucosylated paucimannose glycans were expressed in Sf9 insect cells.
The folding free energies of each variant were determined by equilibrium denaturation (see Fig. 3B for representative data). Glycoprotein g-AcyP2*-F is stabilized by -2 kcal mol-1 relative to nonglycosylated AcyP2*-F from E. coli. In contrast, glycoprotein g-AcyP2* is destabilized relative to the non-glycosylated AcyP2* by +0.5 kcal mol-1. Thus, the estimated N-glycan-dependent contribution of the Phe-glycan interaction is -2.5 kcal mol-1, suggesting that an interaction between Phe43 and the N-glycan at position 45 (and putatively Thr47) stabilizes the reverse turn, and thus the protein.
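The estimate is the difference between the effect of glycosylation in the Phe43 background and its effect in the Thr43 background. Written out with the rounded values quoted above (the subscript label "Phe-glycan" is introduced here only for illustration):

```latex
\Delta\Delta G_{\text{Phe-glycan}}
  = \Delta\Delta G_{f}[\text{AcyP2*-F}] - \Delta\Delta G_{f}[\text{AcyP2*}]
  \approx (-2.0) - (+0.5)
  = -2.5\ \text{kcal mol}^{-1}
```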
In fact, the contribution of the Phe-glycan interaction in AcyP2* is about -1 kcal mol-1 larger than was observed in RnCD2* (Fig. 2E). Even in the absence of structural data for g-AcyP2*-F, it is clear that the enhanced aromatic sequon is a portable module that can stabilize proteins, like RnCD2* and AcyP2*, whose glycosylation-naive reverse turns have not been tailored by evolution for optimal protein-glycan interactions.
In addition to the stabilization conferred by the enhanced aromatic sequon, cellular glycosylation efficiency was consistently enhanced. The ratio of N-glycosylated to non-glycosylated proteins from Sf9 insect cells is substantially higher for both RnCD2* and AcyP2* variants relative to variants that lack the Phe residue (Fig. 2D and Fig. 3D), suggesting that the enhanced glycosylation sequon may be a better substrate for glycosylation by OST. This observation should prove useful for enhancing glycoprotein yields, as sequon occupancy can be variable. The enzymology of this observation merits further investigation, but it is tempting to speculate that OST may have evolved to favor sequences, like the enhanced aromatic sequon, that stabilize proteins upon glycosylation.
The structural information in the PDB suggests that the origin of the enhanced aromatic sequon effect depends on the Phe(i)-Xxx-Asn(Glycan)-Gly-Thr(i+4) type I β-bulge turn substructure, which allows the Phe, GlcNAc and Thr side chains to interact optimally.
Several features of this substructure are also likely to be important. The Thr side chain accepts an H-bond from the NH of the first GlcNAc residue, and the C=O of the i+2 Asn residue accepts an H-bond from the backbone NH of Thr at the i+4 position. These H-bonds, and the characteristic H-bond between the C=O of the i+4 Thr and the NH of the i position Phe, are largely solvent occluded and may contribute additional enthalpic stabilization to this portable stabilizing substructure.
In principle, reverse turns other than the type I β-bulge turn could also benefit from the tripartite stabilizing interaction between Phe, Thr and Asn-glycan. This hypothesis was tested using a portion of the 34-residue WW domain from human Pin 1 (Pin WW), a glycosylation-naive β-sheet protein in which three anti-parallel β-strands are connected by two loops. In wild-type WW, loop 1 adopts an unusual six-residue hydrogen-bonded loop harboring an internal type II β-turn (Fig. 1B); 0.1% of the reverse turns in the PDB have this conformation [Oliva et al., J Mol Biol 266(4):814-830 (1997)].
In the Pin1 WW crystal structure, the side chains of Ser16, Ser19, and Arg21 all project on the same side of loop 1, such that the side-chain β-carbons (Cβ's) at each position are within 5-6 Å [Ranganathan et al., Cell 89, 875-886 (1997)]. Those distances are close enough to facilitate a stabilizing interaction between Phe, GlcNAc1 and Thr, similar to the interactions observed in the glycosylated type I β-bulge turn of HsCD2ad (Fig. 1A) [Wang et al., Cell 97, 791-803 (1999)]. The similar Cβ-Cβ distances in HsCD2ad suggest that positions 16, 19, and 21 might be suitable locations for incorporating the individual elements of the enhanced sequon (Phe at position 16, Asn-linked glycan at position 19, and Thr at position 21, Fig. 4A). In this version of the enhanced aromatic sequon, Phe is at the -3 position relative to the glycosylated Asn, instead of at the -2 position, as in the examples above.
Pin1 WW can be synthesized chemically, enabling us to examine the contributions of the Thr side chain to N-glycan dependent stabilization of Pin WW, in addition to the Phe-glycan interaction explored above. Here, a simple Asn-GlcNAc side chain is used. Eight Pin WW variants were synthesized (Fig. 4E), which contain all possible combinations of the Ser16Phe, Asn19Asn-GlcNAc, and Arg21Thr mutations, enabling triple mutant thermodynamic cycle analysis. The thermal stability and folding rates of these variants were determined by variable temperature circular dichroism spectroscopy and laser temperature jump studies, respectively (see Fig. 3B-D for representative data and Table SX and Figures SX for the remaining data), and tabulated data appear in Fig. 4E.
Chemical glycosylation of the Phe-Xxx-Zzz-Asn-Yyy-Thr sequon (with a single GlcNAc, GlcNAc1) in the six-residue loop of WW increased the stability of the resulting WW variant by -0.7 kcal mol-1 [herein and in Culyba et al., Science 331, 571-575 (2011)], a smaller effect than observed for the Phe-Xxx-Asn-Yyy-Thr [SEQ ID NO: ] sequon in the type I β-bulge turns of RnCD2 and AcyP2 (ΔΔGf = -1.8 kcal mol-1 and -2.5 kcal mol-1, respectively). One possible interpretation of these results is that the type II β-turn within the six-residue loop does not promote the stabilizing tripartite interaction between Phe, GlcNAc, and Thr as effectively as does the five-residue type I β-bulge turn. However, key host context differences between the WW, RnCD2, and AcyP2 proteins could also be partially responsible for these observations, including differences in folding topology and mechanism [Nickson et al., Methods 52(1), 38-50 (2010)], and differences in the amino acids that flank the glycosylated reverse turns [Culyba et al., Science 331, 571-575 (2011)].
Moreover, because the WW domain is synthesized chemically via a solid-phase strategy, the N-glycan in WW (GlcNAc) is much smaller than the N-glycans in RnCD2 (oligomannose) and AcyP2 (fucosylated paucimannose). Interactions between the host sequences and these extended glycans could also contribute to the stabilization associated with glycosylating the Phe-Xxx-Asn-Yyy-Thr sequon in the type I β-bulge turns within RnCD2 and AcyP2.
The N-glycan-dependent contribution of Phe16 to Pin WW stability is -0.19 kcal mol-1 in the absence of Thr21, but is -0.62 kcal mol-1 in the presence of Thr21. Similarly, the N-glycan-dependent contribution of Thr21 to Pin WW stability is -0.18 kcal mol-1 in the absence of Phe16, but is -0.63 kcal mol-1 in the presence of Phe16.
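The dependence of each residue's contribution on the presence of the other can be expressed as a coupling energy, computed here from the four values quoted above. This is an illustrative sketch only; the variable and function names are hypothetical, and the inputs are the rounded values given in the text.

```python
# N-glycan-dependent contributions to Pin WW stability (kcal/mol),
# as quoted in the text.
PHE16 = {"without_Thr21": -0.19, "with_Thr21": -0.62}
THR21 = {"without_Phe16": -0.18, "with_Phe16": -0.63}


def coupling(with_partner, without_partner):
    """Additional stabilization gained when the partner residue is present."""
    return round(with_partner - without_partner, 2)


from_phe = coupling(PHE16["with_Thr21"], PHE16["without_Thr21"])
from_thr = coupling(THR21["with_Phe16"], THR21["without_Phe16"])
print(from_phe, from_thr)
```

Viewed from either residue, the coupling is about -0.4 kcal mol-1 (-0.43 and -0.45 from the rounded inputs), which is the sense in which the interaction among the three components is cooperative rather than additive.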
These results strongly suggest the presence of a stabilizing tripartite interaction between Phe16, Asn19-GlcNAc, and Thr21 and provide evidence that the enhanced aromatic sequon can be successfully applied in reverse turn contexts other than the type I β-bulge turns present in HsCD2ad, RnCD2* and AcyP2*. Significant slowing of the unfolding rate in g-WW-F,T relative to WW-F,T suggests that the Phe-GlcNAc-Thr interaction stabilizes the Pin WW native state. Notably, both the kinetic and thermodynamic data provide strong evidence for the importance of Thr in this reverse turn context. Thr has long been known to play a crucial role in the biology of OST-mediated glycosylation, but this is the first clear demonstration of its energetic importance.
It is well established that N-glycosylation can enhance glycoprotein stability, but it was not previously possible to know where to put a glycan to achieve predictable stabilization. Our findings indicate that the Asn-N-glycan, Phe and Thr side chains contribute key interactions that significantly stabilize glycoproteins when appropriately placed in reverse turn contexts, even those that are not normally glycosylated. This observation may account in part for the high frequency of glycosylation in the reverse turns of secreted proteins [Petrescu et al., Glycobiology 14 , 103-114 (2004); Zielinska et al., Cell 141 , 897-907 (2010)].
The results obtained herein are useful for predicting with increased accuracy whether
N-glycosylation at a given site is likely to
stabilize a protein, and whether that site is likely to be glycosylated efficiently, information that is critical for glycoprotein engineering. That the enhanced aromatic sequon in a type I β-bulge turn context is found in the PDB with all possible
aromatic residues at the i position indicates that aromatics other than Phe are useful. The WW domain from human Pin 1 also conveniently provides a single protein into which several types of enhanced aromatic sequons and their corresponding reverse turn types can be inserted without changing the overall structure or the
flanking sequences. The WW domain is ideal for these requirements: many WW variants harboring different reverse turn types in loop 1 have been characterized structurally [Ranganathan et al., Cell 89, 875-886 (1997); Jager et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006); and Fuller et al., Proc. Natl. Acad. Sci. USA 106, 11067-11072 (2009)] and biophysically [Jager et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006); Fuller et al., Proc. Natl. Acad. Sci. USA 106, 11067-11072 (2009); Jager et al., J. Mol. Biol. 311, 373-393 (2001); and Kaul et al., J. Am. Chem. Soc. 123, 5206-5212 (2001)].
Crystal structures exist for WW domains harboring a type II β-turn in a six-residue loop (Fig. 1B), a five-residue type I β-bulge turn (Fig. 1C), and a four-residue type I' β-turn (Fig. 1D) as loop 1. It is thought that a type I' β-turn, which makes up 11% of the reverse turns in the PDB [Oliva et al., J Mol Biol 266(4), 814-830 (1997)], could also serve as a conformational host for a complementary enhanced aromatic sequon: the Cβ's of the side chains at the i, i+1, and i+3 positions are close enough (< 5.6 Å; see Fig. 1D) to support a stabilizing tripartite interaction among Phe, Asn(GlcNAc1), and Thr. Importantly, the chemical synthesis of homogeneously glycosylated [Bertozzi et al., Science 291, 2357-2364 (2001)] WW domains is efficient [Culyba et al., Science 331, 571-575 (2011); and Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010)], enabling numerous analogs to be prepared, each having an identical N-glycan (in this case, GlcNAc1).
The data herein show that type I' β-turns are suitable conformational hosts for a stabilizing enhanced aromatic sequon. This result significantly expands the scope of protein stabilization by glycosylating enhanced aromatic sequons. Furthermore, these data show that the order of stabilization by glycosylating enhanced aromatic sequons in the different turn types is: type I β-bulge turns > type II β-turns in a six-residue loop > type I' β-turns.
Because enhanced aromatic sequons in proper turn contexts are stabilizing and may be preferred OST substrates, engineering glycoproteins with these sequences is a useful tool for protein evolution. Thermodynamic stabilization has proven essential for the discovery of mutants with enhanced activity where functional requirements might be at odds with optimal folding energetics. The enhanced aromatic sequon design concepts outlined within should also be immediately applicable to pharmacologic proteins, including antibodies, which could benefit from additional thermodynamic stabilization (and thus increased resistance to proteolysis and aggregation) beyond the numerous other benefits of N-glycosylation such as improved serum half-life, solubility, and lowered immunogenicity [Li et al., Curr Opin Biotechnol 20, 678-684 (2009); Sinclair et al., J Pharm Sci 94, 1626-1635 (2005); Sola et al., BioDrugs 24, 9-21 (2010); Walsh et al., Nat Biotechnol 24, 1241-1252 (2006)]. Using the previously noted platform offered by the Pin 1 WW domain loop 1 reverse turn types, four-, five-, and six-residue reverse turns comprising loop 1 of WW were converted to their corresponding enhanced aromatic sequons by replacing the amino acid at position 16 (Ser in all cases) with Phe, replacing the amino acid at position 19 (Asn, Asp, or Ser, respectively) with Asn(GlcNAc1), and replacing the amino acid at position 21 (Arg in all cases) with Thr [Jager et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006); and Jager et al., J Mol Biol 311, 373-393 (2001)].
Note that the same number is used to indicate amino acid residues in analogous positions in WW variants with different loop 1 lengths [Jager et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006); and Fuller et al., Proc. Natl. Acad. Sci. USA 106, 11067-11072 (2009)]. Thus, the sequences of the enhanced aromatic sequons in the four-, five-, and six-residue reverse turns comprising loop 1 include Phe16-Asn(GlcNAc1)19-Gly20-Thr21, Phe16-Ala18-Asn(GlcNAc1)19-Gly20-Thr21, and Phe16-Arg17-Ser18-Asn(GlcNAc1)19-Gly20-Thr21, respectively.
The stabilizing effect of glycosylating enhanced aromatic sequons can be quantified by comparing the stabilities of WW variants with glycosylated enhanced aromatic sequons to the stabilities of their non-glycosylated counterparts. The contributions of two- and three-way interactions amongst the Phe16, Asn19(GlcNAc1) and Thr21 side chains to the overall stabilizing effect of glycosylation can be estimated using triple mutant cycle analyses, as done previously [Culyba et al., Science 331, 571-575 (2011)]. This parsing of stabilization energies through energetic comparisons was accomplished by replacing Phe16, Asn19(GlcNAc1) and Thr21 by Ser16, Asn19, and Arg21, respectively, in every possible combination, for a total of eight proteins in each of the three correlated enhanced aromatic sequon-reverse turn contexts. The results of these analyses are described hereinafter.
The WW variants are named by the number of amino acids in the loop 1 reverse turn, followed by the letter "g" if the variant is N-glycosylated on Asn19, the letter "F" if it has Phe at position 16, and the letter "T" if it has Thr at position 21. The lack of the letters g, F, and/or T indicates that the variant is not N-glycosylated on Asn19, that position 16 is Ser, and/or that position 21 is Arg, respectively. For example, variant 4g-F,T has a 4-residue loop 1 type I' β-turn, with Asn(GlcNAc1) at position 19, Phe at position 16, and Thr at position 21. Variant 4 has a 4-residue loop 1 type I' β-turn, with Asn at position 19, Ser at position 16, and Arg at position 21 (see the table hereinafter for the names of the WW variants studied).
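The naming convention just described is regular enough to be decoded mechanically. The sketch below is illustrative only: the `WWVariant` class and `parse_variant` function are hypothetical names introduced here, and the mapping of letters to residues simply restates the rules in the preceding paragraph.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WWVariant:
    loop_length: int    # number of residues in the loop 1 reverse turn
    glycosylated: bool  # True if Asn19 carries GlcNAc
    residue_16: str     # "Phe" if the name contains F, otherwise "Ser"
    residue_21: str     # "Thr" if the name contains T, otherwise "Arg"

def parse_variant(name: str) -> WWVariant:
    """Decode a WW variant name such as '4g-F,T', '5-T' or '6'."""
    head, _, tail = name.replace(" ", "").partition("-")
    if head[:1] not in {"4", "5", "6"} or head[1:] not in {"", "g"}:
        raise ValueError(f"unrecognized variant name: {name!r}")
    letters = set(tail.split(",")) if tail else set()
    if not letters <= {"F", "T"}:
        raise ValueError(f"unrecognized substitution letters in {name!r}")
    return WWVariant(
        loop_length=int(head[0]),
        glycosylated=head.endswith("g"),
        residue_16="Phe" if "F" in letters else "Ser",
        residue_21="Thr" if "T" in letters else "Arg",
    )

print(parse_variant("4g-F,T"))
print(parse_variant("4"))
```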
Stabilization from Glycosylating Enhanced Aromatic Sequons
To quantify the stabilizing effect of glycosylating enhanced aromatic sequons in loop 1 of the corresponding four-, five-, and six-residue reverse turns, we used variable temperature circular dichroism (CD) spectropolarimetry to analyze the thermodynamic stability of WW variants 4-F,T, 4g-F,T, 5-F,T, 5g-F,T, 6-F,T, and 6g-F,T. CD data for 6-F,T and 6g-F,T and their derivatives (described below) have been published previously at a protein concentration of 50 μM [Culyba et al., Science 331, 571-575 (2011)], but were further studied herein at a protein concentration of 10 μM (the energetic data are comparable at both concentrations) to facilitate direct comparisons with 4-F,T, 4g-F,T, 5-F,T, and 5g-F,T and their derivatives (some of which were not completely soluble at 50 μM).
The table below shows the melting temperature (Tm) and free energy of folding (ΔGf, at 65 °C) for each protein and corresponding glycoprotein, along with the effect of glycosylation on the Tm (ΔTm) and on ΔGf (ΔΔGf, at 65 °C) for each protein. A reference temperature of 65 °C was used because it is within the transition regions of all the variants studied herein. Extrapolating ΔGf to temperatures outside the transition region using thermodynamic parameter estimates from fits to variable temperature CD data is unreliable (because errors in ΔCp, the least well-defined parameter from such fits, become magnified outside the transition region). For sets of proteins with similar ΔCp values, the differences between their Tm values should reflect the differences between their ΔGf values both at 65 °C and at lower temperatures.
Table*
Protein   Sequence†   Tm (°C)             ΔTm (°C)            ΔGf (kcal/mol)   ΔΔGf (kcal/mol)
          15    21
4         MS--NGR     [illegible]                             0.06 ± 0.04
4g        MS--NGR     [illegible]         [illegible]         -0.17 ± 0.04     -0.23 ± 0.06
4-F       MF--NGR     [illegible]                             -0.18 ± 0.08
4g-F      MF--NGR     [illegible]         [illegible]         -0.36 ± 0.05     -0.18 ± 0.08
4-T       MS--NGT     62.2 ± 0.4                              0.30 ± 0.04
4g-T      MS--NGT     61.4 ± 0.5          -0.8 ± 0.6          0.37 ± 0.05      0.07 ± 0.07
4-F,T     MF--NGT     [illegible]                             0.18 ± 0.03
4g-F,T    MF--NGT     66.7 ± 0.6          3.2 ± 0.7           -0.21 ± 0.08     -0.39 ± 0.09
          15    21
5         MS-ANGR     [illegible]                             -0.38 ± 0.02
5g        MS-ANGR     [illegible]         [illegible]         -0.46 ± 0.03     -0.07 ± 0.04
5-F       MF-ANGR     [illegible]                             -0.02 ± 0.03
5g-F      MF-ANGR     70.3 ± 0.2          [illegible] ± 0.4   -0.58 ± 0.02     -0.55 ± 0.04
5-T       MS-ANGT     68.9 ± 0.2                              -0.42 ± 0.02
5g-T      MS-ANGT     [illegible] ± 0.3   2.4 ± 0.3           -0.65 ± 0.03     -0.23 ± [illegible]
5-F,T     MF-ANGT     66.0 ± 0.2                              -0.11 ± 0.02
5g-F,T    MF-ANGT     75.2 ± 0.2          9.2 ± 0.2           -1.05 ± 0.02     -0.94 ± 0.03
          15    21
6         MSRSNGR     56.2 ± 0.3                              0.95 ± 0.04
6g        MSRSNGR     53.6 ± 0.3          -2.6 ± 0.4          1.16 ± 0.04      0.21 ± 0.06
6-F       MFRSNGR     51.0 ± 0.3                              1.45 ± 0.06
6g-F      MFRSNGR     51.7 ± 0.3          0.7 ± 0.4           1.28 ± 0.04      -0.17 ± 0.08
6-T       MSRSNGT     52.5 ± 0.3                              1.22 ± 0.05
6g-T      MSRSNGT     52.3 ± 0.3          -0.2 ± 0.5          1.26 ± 0.05      0.04 ± 0.07
6-F,T     MFRSNGT     47.4 ± 0.4                              1.72 ± 0.09
6g-F,T    MFRSNGT     55.0 ± 0.3          7.6 ± 0.5           1.02 ± 0.04      -0.70 ± 0.10
* Tabulated data are given as mean ± standard error at 65 °C for WW variants at 10 μM in 20 mM aqueous sodium phosphate, pH 7. ΔTm and ΔΔGf (glycoprotein minus protein) are listed on the glycoprotein row of each pair.
† In the glycoprotein (g) rows, N = Asn(glycan).
The Tm of glycoprotein 4g-F,T is 3.2 ± 0.7 °C higher than that of protein 4-F,T (ΔΔGf = -0.39 ± 0.09 kcal mol-1 at 65 °C), indicating that glycosylating the Phe-Asn-Yyy-Thr enhanced aromatic sequon in the context of a four-residue type I' β-turn stabilizes WW. Glycosylating the Phe-Xxx-Asn-Yyy-Thr sequon in the context of the five-residue type I β-bulge turn also stabilizes WW (ΔTm = 9.2 ± 0.2 °C, ΔΔGf = -0.94 ± 0.03 kcal mol-1 at 65 °C), as does glycosylating the Phe-Xxx-Zzz-Asn-Yyy-Thr sequon in the type II β-turn in a six-residue loop (ΔTm = 7.6 ± 0.5 °C, ΔΔGf = -0.70 ± 0.10 kcal mol-1 at 65 °C).
These data indicate that the Phe-Xxx-Asn-Yyy-Thr enhanced aromatic sequon corresponding to the five-residue type I β-bulge turn is, overall, the best for stabilizing WW amongst those studied here.
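The ΔΔGf values quoted above are differences between the tabulated folding free energies of each glycoprotein and its non-glycosylated counterpart. The sketch below reproduces two of them; it assumes that the two standard errors are independent and combine in quadrature, which the source does not state explicitly, and the function name `glycosylation_effect` is hypothetical.

```python
import math

def glycosylation_effect(protein, glycoprotein):
    """Return (ΔΔGf, standard error) in kcal/mol from two (ΔGf, error) pairs.

    Assumes independent errors combined in quadrature (an assumption,
    not a procedure confirmed by the source)."""
    (dg_protein, err_protein), (dg_glyco, err_glyco) = protein, glycoprotein
    return dg_glyco - dg_protein, math.hypot(err_protein, err_glyco)

# Tabulated ΔGf values at 65 °C (kcal/mol).
pairs = {
    "5-F,T -> 5g-F,T": ((-0.11, 0.02), (-1.05, 0.02)),
    "6-F,T -> 6g-F,T": ((1.72, 0.09), (1.02, 0.04)),
}

results = {name: glycosylation_effect(p, g) for name, (p, g) in pairs.items()}
for name, (ddg, err) in results.items():
    print(f"{name}: ΔΔGf = {ddg:.2f} ± {err:.2f} kcal/mol")
```

For other pairs in the table, the quadrature estimate can differ from the published error in the last digit, presumably because the published values were computed from unrounded data.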
Interaction Energies in Enhanced Aromatic Sequons from Triple Mutant Cycle Analysis
To determine whether Phe, Asn(GlcNAc1) and Thr side chains interact similarly in each correlated enhanced aromatic sequon/reverse turn context, the thermodynamic stabilities of each WW variant were measured in the four-, five-, and six-residue reverse turn groups in the table above. The data from each group of eight WW variants comprise a triple mutant cycle (Fig. 5). Triple mutant cycles contain more information than conventional double mutant cycles, because each of the six "faces" of a triple mutant cycle "cube" is itself a double mutant cycle [Horovitz et al., J Mol Biol 224(3), 733-740 (1992)]. Whereas double mutant cycles provide information about the energetic impact of an interaction between two residues, a triple mutant cycle provides information about the energetic impact of the two- and three-way interactions.
Extracting this information from a triple mutant cycle is straightforward, and begins with analyzing the double mutant cycle faces of the triple mutant cycle cube (Fig. 5). The double mutant cycle formed by proteins 4 and 4-F and glycoproteins 4g and 4g-F (the front face of the triple mutant cycle cube in Fig. 5A) reveals that glycosylation of Asn19 (in the presence of Arg21) stabilizes glycoprotein 4g relative to protein 4 (ΔΔGf,1 = -0.23 ± 0.06 kcal mol-1 at 65 °C). Similarly, glycosylation of Asn19 (in the presence of Arg21) stabilizes 4g-F relative to 4-F (ΔΔGf,2 = -0.18 ± 0.08 kcal mol-1 at 65 °C). The difference between ΔΔGf,2 and ΔΔGf,1 (ΔΔΔGf,front = 0.05 ± 0.10 kcal mol-1 at 65 °C) indicates that changing Ser16 to Phe16 (while keeping Arg21 constant) does not significantly change the effect of glycosylating Asn19 in the four-residue type I' β-turn. In other words, Phe16 and Asn(GlcNAc1)19 do not interact favorably in 4g-F.
Changing Arg21 to Thr21 changes this trend. The double mutant cycle formed by proteins 4-T, 4g-T, 4-F,T, and 4g-F,T (the back face of the triple mutant cycle "cube" shown in Fig. 5A) reveals that in the presence of Thr21 (instead of Arg21), Phe16 and Asn(GlcNAc1) interact favorably (ΔΔΔGf,back = -0.46 ± 0.11 kcal mol-1 at 65 °C). The difference between the front and back double mutant cycles is an estimate of the energy of the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21. The large difference between ΔΔΔGf,back and ΔΔΔGf,front for the four-residue type I' β-turn (ΔΔΔΔGf = -0.51 ± 0.15 kcal mol-1 at 65 °C) indicates that Phe16, Asn(GlcNAc1)19 and Thr21 engage in a favorable three-way interaction in 4g-F,T.
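The front-face, back-face, and three-way energies for the four-residue loop follow from four glycosylation effects by successive subtraction. The sketch below retraces that arithmetic; the helper `diff` and the variable names are hypothetical, and combining standard errors in quadrature is an assumption rather than a procedure stated in the source.

```python
import math

def diff(a, b):
    """Return a - b for (value, error) pairs, with errors combined in quadrature."""
    return a[0] - b[0], math.hypot(a[1], b[1])

# ΔΔGf of glycosylation (kcal/mol at 65 °C) for the four-residue loop,
# taken from the text and the stability table.
ddg_ser16_arg21 = (-0.23, 0.06)  # 4 -> 4g
ddg_phe16_arg21 = (-0.18, 0.08)  # 4-F -> 4g-F
ddg_ser16_thr21 = (0.07, 0.07)   # 4-T -> 4g-T
ddg_phe16_thr21 = (-0.39, 0.09)  # 4-F,T -> 4g-F,T

front = diff(ddg_phe16_arg21, ddg_ser16_arg21)  # Phe/glycan coupling with Arg21
back = diff(ddg_phe16_thr21, ddg_ser16_thr21)   # Phe/glycan coupling with Thr21
three_way = diff(back, front)                   # Phe/glycan/Thr coupling

for label, (value, err) in [("front", front), ("back", back), ("three-way", three_way)]:
    print(f"{label}: {value:.2f} ± {err:.2f} kcal/mol")
```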
Similar analyses of the triple mutant cycles formed by proteins 5 and 6 and their derivatives (Fig. 5B, 5C) reveal a favorable interaction between Phe16, Asn(GlcNAc1)19, and Thr21 in the five-residue type I β-bulge turn (ΔΔΔΔGf = -0.23 ± 0.07 kcal mol-1 at 65 °C) and in the type II β-turn in a six-residue loop (ΔΔΔΔGf = -0.36 ± 0.15 kcal mol-1 at 65 °C). This three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21 is similarly favorable in each reverse turn context (perhaps more favorable in the type I' β-turn than in the type I β-bulge turn, but recall that this is only part of the overall stabilizing effect of N-glycosylation).
The attribution of ΔΔΔGf,front and ΔΔΔGf,back values to the interaction between Phe16 and Asn(GlcNAc1)19, and of ΔΔΔΔGf to the tripartite interaction among Phe16, Asn(GlcNAc1)19, and Thr21, assumes that the Ser16 side chain does not interact with the side chains at positions 19 or 21, and that the Arg21 side chain does not interact with the side chains at positions 16 or 19, in any variant. This assumption is, to a first approximation, consistent with the available structural data. Crystal structures of WW in the context of the full-length Pin1 protein [Ranganathan et al., (1997) Cell 89:875-886; and Jager et al., (2006) Proc. Natl. Acad. Sci. USA 103:10648-10653] show that the side chains at positions 16, 19, and 21 generally interact only with solvent or the main chain (see Fig. 1B-1D).
The lone exception is an interaction between the side chain hydroxyl of Ser16 and the side chain carboxylate of Asp19 in the type I β-bulge turn (Fig. 1C). However, the equivalent interaction (between the Ser16 hydroxyl and the Asn19 side chain carbonyl) in the variants of 5 that have Ser at position 16 (5, 5g, 5-T, and 5g-T) should be the same whether Asn19 is N-glycosylated or not, and thus should not affect the analysis.
It is also noted that the reverse turn structures are likely to depend primarily on loop length and the identities of a few key residues (e.g., Asn19 and Gly20 in the variants of 4 and Gly20 in the variants of 5, because these amino acids are strongly favored in these positions of type I' β-turns and type I β-bulge turns, respectively) [Sibanda et al., J Mol Biol 206(4), 759-777 (1989); and Jager et al., Proc. Natl. Acad. Sci. USA 103, 10648-10653 (2006)]. Because these factors are kept constant within the variants that make up each triple mutant cycle, the corresponding reverse turn structures should remain roughly constant as well.
Least-squares regression was used to extract additional information about interactions amongst Phe, Asn(GlcNAc1), and Thr from the triple mutant cycles formed by WW variant groups 4, 5, and 6. The folding free energy data (at 65 °C) from the triple mutant cycle formed by 4 and its derivatives were fit to the following Equation (A):
ΔGf = ΔGf° + CF·WF + CN·WN + CT·WT + CF,N·WF·WN + CF,T·WF·WT + CN,T·WN·WT + CF,N,T·WF·WN·WT    (A)
Equation A shows how the ΔGf of a given variant of 4 is related to the average ΔGf° of 4, plus a series of correction terms that account for the interactions amongst the amino acids at positions 16, 19, and 21. Each correction term is a product of one or more indicator variables W (that reflect whether a mutation is present in the given variant) and a free energy contribution factor C. WF is 0 when position 16 is Ser or 1 when it is Phe; WN is 0 when position 19 is Asn or 1 when it is Asn(GlcNAc1); WT is 0 when position 21 is Arg or 1 when it is Thr. CF, CN, and CT describe the energetic consequences of the Ser16 to Phe16, Asn19 to Asn(GlcNAc1)19, and Arg21 to Thr21 mutations, respectively. These energies are thought to reflect the difference in conformational preferences between Ser and Phe at position 16, Asn and Asn(GlcNAc1) at position 19, and Arg and Thr at position 21.
CF,N, CF,T, and CN,T describe the free energies of the two-way interactions between Phe16 and Asn(GlcNAc1)19, between Phe16 and Thr21, and between Asn(GlcNAc1)19 and Thr21, respectively. CF,N,T describes the energetic impact of the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21. CF,N, CF,T, CN,T, and CF,N,T are essentially equivalent to the two- and three-way interaction energies (ΔΔΔGf and ΔΔΔΔGf values) that could be calculated by a conventional analysis (e.g., as in the preceding section) of the triple mutant cycle data [Horovitz et al., (1992) J Mol Biol 224(3):733-740], but obtaining them by regression is more convenient, and can provide their standard errors in the regression output. Similar triple mutant cycle analyses of folding free energy data at 65 °C (338.15 K) for glycosylated and non-glycosylated WW variants harboring either a four-, five-, or six-residue reverse turn in loop 1 were performed and the results are shown in the table below. Note that the caveats to the conventional analysis of triple mutant cycle data mentioned in the preceding section apply to this analysis as well.
Table*
Parameter   Four-residue type I' β-turn   Five-residue type I β-bulge turn   Type II β-turn in a six-residue loop
ΔGf°        0.06 ± 0.06 (0.287)           -0.38 ± 0.04 (0.000)               0.95 ± 0.04 (0.000)
CF          -0.24 ± 0.08 (0.005)          0.36 ± 0.06 (0.000)                0.50 ± 0.06 (0.000)
CN          -0.23 ± 0.08 (0.009)          -0.07 ± 0.06 (0.248)               0.21 ± 0.06 (0.000)
CT          [illegible]                   [illegible]                        [illegible]
CF,N        0.05 ± 0.11 (0.661)           -0.48 ± 0.09 (0.000)               -0.38 ± 0.08 (0.000)
CF,T        0.15 ± 0.11 (0.168)           -0.05 ± 0.09 (0.562)               0.00 ± 0.08 (0.983)
CN,T        0.31 ± 0.12 (0.015)           -0.16 ± 0.09 (0.088)               -0.17 ± 0.08 (0.051)
CF,N,T      -0.54 ± 0.15 (0.001)          -0.23 ± 0.12 (0.078)               -0.36 ± 0.11 (0.006)
* Parameters are given in kcal mol-1 as mean ± standard error. P values given in parentheses indicate the probability that random sampling error accounts for the difference between zero and the observed value of the parameter.
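Equation A has eight parameters and each triple mutant cycle has eight variants, so when only one mean ΔGf per variant is available the model is saturated and its least-squares solution reduces to an exact inclusion-exclusion calculation. The sketch below illustrates this for the five-residue loop using the tabulated mean ΔGf values. It is not the regression used in the source, which also yields standard errors (presumably from the underlying data); the function names `coefficients` and `predict` and the tuple encoding of the indicator variables are hypothetical.

```python
from itertools import product

# Mean ΔGf (kcal/mol, 65 °C) for the five-residue loop variants, keyed by
# the indicator variables (W_F, W_N, W_T) of Equation A.
dg = {
    (0, 0, 0): -0.38,  # 5
    (0, 1, 0): -0.46,  # 5g
    (1, 0, 0): -0.02,  # 5-F
    (1, 1, 0): -0.58,  # 5g-F
    (0, 0, 1): -0.42,  # 5-T
    (0, 1, 1): -0.65,  # 5g-T
    (1, 0, 1): -0.11,  # 5-F,T
    (1, 1, 1): -1.05,  # 5g-F,T
}

def coefficients(dg):
    """Solve Equation A exactly by inclusion-exclusion over the 8 variants.

    The key (1, 1, 0) holds C_F,N, (1, 1, 1) holds C_F,N,T, (0, 0, 0) holds ΔGf°."""
    coef = {}
    for term in product((0, 1), repeat=3):
        total = 0.0
        for sub in product((0, 1), repeat=3):
            if all(s <= t for s, t in zip(sub, term)):
                sign = (-1) ** (sum(term) - sum(sub))
                total += sign * dg[sub]
        coef[term] = total
    return coef

def predict(coef, w):
    """Evaluate Equation A for the indicator tuple w = (W_F, W_N, W_T)."""
    return sum(c for term, c in coef.items() if all(t <= x for t, x in zip(term, w)))

coef = coefficients(dg)
print(f"C_F,N   = {coef[(1, 1, 0)]:.2f} kcal/mol")
print(f"C_F,N,T = {coef[(1, 1, 1)]:.2f} kcal/mol")
```

Because the inputs are rounded to two decimals, individual coefficients can differ from the tabulated regression values by about 0.01 kcal/mol.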
According to Equation A, the stabilizing effect of glycosylating the enhanced aromatic sequon in 4-F,T [ΔΔGf = ΔGf(4g-F,T) - ΔGf(4-F,T)] is equal to the sum of the corresponding values of CN, CF,N, CN,T, and CF,N,T. The same is true for 5-F,T and 6-F,T.
Thus, by comparing CN, CF,N, CN,T, and CF,N,T values one can trace the origins of the stabilizing effect of glycosylating the enhanced aromatic sequon in each reverse turn context (Fig. 6).
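The decomposition can be checked by summing the four glycan-dependent parameters for each loop type. The sketch below does so with the regression values quoted in this section; the dictionary layout and the name `glycan_terms` are illustrative assumptions.

```python
# Glycan-dependent regression parameters (kcal/mol) for each loop type.
glycan_terms = {
    "four-residue type I' β-turn":      {"C_N": -0.23, "C_F,N": 0.05, "C_N,T": 0.31, "C_F,N,T": -0.54},
    "five-residue type I β-bulge turn": {"C_N": -0.07, "C_F,N": -0.48, "C_N,T": -0.16, "C_F,N,T": -0.23},
    "six-residue type II β-turn loop":  {"C_N": 0.21, "C_F,N": -0.38, "C_N,T": -0.17, "C_F,N,T": -0.36},
}

# Sum of the four terms = predicted ΔΔGf of glycosylating the full sequon.
predicted_ddg = {turn: sum(terms.values()) for turn, terms in glycan_terms.items()}

for turn, ddg in predicted_ddg.items():
    print(f"{turn}: predicted ΔΔGf = {ddg:.2f} kcal/mol")
```

The five- and six-residue sums match the measured ΔΔGf values (-0.94 and -0.70 kcal mol-1). The four-residue sum (-0.41) differs slightly from the measured -0.39 kcal mol-1, presumably because the individual parameters are rounded.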
Changing Asn19 to Asn(GlcNAc1)19 affects each turn type differently: it stabilizes the four-residue type I' β-turn (CN = -0.23 ± 0.08 kcal mol-1), does not affect the five-residue type I β-bulge turn substantially (CN = -0.07 ± 0.06 kcal mol-1), and destabilizes the type II β-turn within a six-residue loop (CN = 0.21 ± 0.06 kcal mol-1). It is possible that Asn(GlcNAc1) has backbone dihedral angle preferences that are more compatible with the i+1 position of a type I' β-turn than with the i+2 position of a type I β-bulge turn or with the i+3 position of a type II β-turn. If so, such preferences would differ substantially from those of Asn itself [Hovmoller et al., Acta Crystallogr D 58, 768-776 (2002)], which is favored at i+1 in a type I' β-turn, and at i+2 in a type I β-turn, but not at i+3 in a type II β-turn [Hutchinson et al., Protein Sci 3(12), 2207-2216 (1994)].
The two-way interaction between Phe16 and Asn(GlcNAc1)19 stabilizes the five-residue type I β-bulge turn (CF,N = -0.48 ± 0.09 kcal mol-1) and the type II β-turn within the six-residue loop (CF,N = -0.38 ± 0.08 kcal mol-1), but does not substantially change the stability of the four-residue type I' β-turn (CF,N = 0.05 ± 0.11 kcal mol-1). These differences appear not to correlate with differences among the Cβ-Cβ distances between positions 16 and 19 in the four-, five-, and six-residue turns (Fig. 1B-D), although it is possible that the backbone flexibility and/or direction of the Cα-Cβ bond vectors in the five- and six-residue turns permit better two-way interactions between Phe16 and Asn(GlcNAc1)19 than are possible in the four-residue turn.
The two-way interaction between Asn(GlcNAc1)19 and Thr21 stabilizes the five- and six-residue turns (CN,T = -0.16 ± 0.09 kcal mol-1 and -0.17 ± 0.08 kcal mol-1, respectively), but substantially destabilizes the four-residue turn (CN,T = 0.31 ± 0.12 kcal mol-1). Published structural data [Wyss et al., Science 269, 1273-1278 (1995)] indicate that the glycosylated enhanced aromatic sequon in an analogous type I β-bulge turn in HsCD2ad involves three hydrogen bonds between Thr and Asn(GlcNAc1): one between the Thr side-chain oxygen and the amide proton of the 2-acetamido group of GlcNAc, and two between the Asn side-chain amide carbonyl oxygen and the backbone amide and side-chain hydroxyl protons of Thr (Fig. 1A). The differences observed here between the CN,T values in the four-, five-, and six-residue turn contexts could reflect the presence of analogous hydrogen bonds in the type I β-bulge turn of 5g-F,T and in the six-residue loop of 6g-F,T, but not in the type I' β-turn of 4g-F,T.
The CF,N,T values for the four-residue type I' β-turn (CF,N,T = -0.54 ± 0.15 kcal mol-1), the five-residue type I β-bulge turn (CF,N,T = -0.23 ± 0.12 kcal mol-1), and the type II β-turn within a six-residue loop (CF,N,T = -0.36 ± 0.11 kcal mol-1) mirror the ΔΔΔΔGf values obtained by comparison of the front and back double mutant cycles in each triple mutant cube in Fig. 5, confirming that the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21 stabilizes each reverse turn type by similar amounts.
Discussion
Glycosylating an enhanced aromatic sequon in its correlated reverse turn context is stabilizing. However, the origins of this stabilizing effect differ amongst the enhanced aromatic sequon/reverse turn pairs (Fig. 6). In the type I' β-turn, this effect comes predominantly from the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21 (CF,N,T) and from the Asn19 to Asn(GlcNAc1)19 mutation (CN), offset by an unfavorable two-way interaction between Asn(GlcNAc1)19 and Thr21 (CN,T).
In the type I β-bulge turn, the two-way interaction between Phe16 and Asn(GlcNAc1)19 (CF,N) contributes more than does the three-way interaction between Phe16, Asn(GlcNAc1)19, and Thr21 (CF,N,T). In the type II β-turn within a six-residue loop, the two-way interaction between Phe16 and Asn(GlcNAc1)19 (CF,N) and the three-way interaction between Phe16, Asn(GlcNAc1)19 and Thr21 (CF,N,T) contribute similar amounts, offset by the unfavorable effect of the Asn19 to Asn(GlcNAc1)19 mutation (CN). Despite these differences, the results provided here show that each reverse turn type is a suitable host for its corresponding enhanced aromatic sequon.
Adding N-glycans to naive sites in proteins can be an attractive strategy for increasing their stability. This approach has been used in the development of protein drugs [Walsh et al., Nat Biotechnol 24(10), 1241-1252 (2006); Sinclair et al., J Pharm Sci-Us 94(8), 1626-1635 (2005); Li et al., Curr Opin Biotech 20(6), 678-684 (2009); and Sola et al., Biodrugs 24(1), 9-21 (2010)], where new N-glycans can extend serum half-life [Egrie et al., Exp Hematol 31(4), 290-299 (2003); Su et al., Int J Hematol 91(2), 238-244 (2010); and Ceaglio et al., Biochimie 90(3), 437-449 (2008)] and shelf-life, owing in part to increased protease resistance [Raju et al., Biochem Bioph Res Co 341(3), 797-803 (2006)], decreased aggregation propensity, and compensation for the destabilizing effect of methionine oxidation [Liu et al., Biochemistry 47(18), 5088-5100 (2008)]. Historically, efforts to increase protein stability via N-glycosylation have depended on a trial-and-error approach [Ceaglio et al., Biochimie 90(3), 437-449 (2008); and Elliott et al., J. Biol. Chem. 279, 16854-16862 (2004)], which resulted in unpredictable energetic consequences [Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010); Hackenberger et al., J. Am. Chem. Soc. 127, 12882-12889 (2005); and Chen et al., Proc. Natl. Acad. Sci. USA 107(52), 22528-22533 (2010)].
By matching each enhanced aromatic sequon to an appropriate reverse turn conformation, the present invention has provided engineering guidelines by which N-glycosylation can reliably stabilize proteins. These matches include Phe-Asn-Yyy-Thr for type I' β-turns, Phe-Xxx-Asn-Yyy-Thr for type I β-bulge turns, and Phe-Xxx-Zzz-Asn-Yyy-Thr [SEQ ID
NO: ] for type II β-turns within a six-residue loop. Each appears to facilitate native-state stabilizing interactions between Phe, Asn(GlcNAc) and Thr in glycosylation-naive proteins that have not evolved to optimize protein-carbohydrate interactions [Culyba et al . , Science 331, 571-575 (2011)]. The structure-stability relationships unveiled by this work also enable investigators to better predict which glycans can be removed from a glycoprotein to increase crystallization propensity, without yielding an unfolded or destabilized protein.
As noted earlier, the type I β-bulge turn and the type II β-turn in a six-residue loop (in which the Phe-Xxx-Asn-Yyy-Thr and Phe-Xxx-Zzz-Asn- Yyy-Thr sequons were previously applied, respectively) comprise less than 9% of all reverse turns in the PDB [Sibanda et al . , J Mol Biol 206(4), 759-777 (1989); and Oliva et al . , J Mol Biol 266(4), 814-830 (1997)]. By successfully applying the new Phe-Asn-Yyy-Thr enhanced aromatic sequon to the type I' β-turn (which comprises nearly 11% of all reverse turns in the PDB) , the number of candidate proteins in which enhanced aromatic sequons can be employed without altering the conformation or the number of residues comprising the native reverse turn is doubled [DeGrado et al . , Annu Rev Biochem 68, 779-819 (1999); and Gellman, Curr Opin Chem Biol 2(6), 717- 725 (1998) ] .
MATERIALS AND METHODS
General
Unless otherwise noted, chemicals and products were purchased from Fisher Scientific or Sigma-Aldrich. Phosphate buffered saline (PBS) was prepared from PBS tablets (SIGMA P-4417) and maintained at pH 7.2 with 0.5 mM TCEP and 0.01% sodium azide. 50 mM acetate buffer was prepared from a 4X solution made from 4X solutions of acetic acid (Acros Organic 124040025) and sodium acetate trihydrate (SIGMA 236500) to achieve a final pH of 5.5. Acetate buffer was also prepared with 0.5 mM TCEP and 0.01% sodium azide. All buffer solutions were filtered (Millipore 0.2 μm). Protein was concentrated using Amicon centrifugation devices, MWCO 3 kDa (Millipore). Final concentrations of RnCD2* and AcyP2* variants were determined by evaluation of absorbance at 280 nm using calculated extinction coefficients (ExPASy, ProtParam tool, Swiss Institute of Bioinformatics). All oligonucleotides for site directed mutagenesis were purchased from Integrated DNA Technologies (IDT), 25 nmole DNA oligo normalized to 100 μM in IDTE pH 8.0. Wild type RnCD2 and AcyP2 gene constructs were ordered from IDT as miniGenes in pZErO-2 vectors (Kan resistant).
RnCD2 amino acid sequence
The sequence of wild type RnCD2 that was subjected to site directed mutagenesis to produce the mutant sequences used herein is:
HHHHHHENLYFQS DYKDDDDKIEGR ADCRDSGTVW
GALGHGINLN IPNFQMTDDI DEVRWERGSTLV
AEFKRKMKPF LKSGAFEILA NGDLKIKNLT
RDDSGTYNVTVY STNGTRILDK ALDLRILEM
SEQ ID NO:
The first 6 residues are a 6-histidine tag, which was included for nickel affinity chromatography purification. This tag is followed by a 7-residue Tobacco Etch Virus protease cleavage site (TEVs) tag. This tag/protease cleavage site combination is followed by an 8-residue FLAG-tag (DYKDDDDK), which in turn is followed by the 4-residue Factor Xa cleavage site (Xas) that was included so that all of the tags could be removed from the expressed gene construct (which was done before all measurements were taken).
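The tag layout can be checked directly against the printed construct. The sketch below is a hypothetical helper, not part of the original methods: it splits the N-terminal block into the segments as they appear in the printed sequence (the FLAG segment as printed, DYKDDDDK, spans eight residues) and returns what remains after Factor Xa cleavage, which cuts after the Arg of IEGR.

```python
# Tag segments in the order they appear at the N-terminus of the printed constructs.
TAG_SEGMENTS = [
    ("His6 tag", "HHHHHH"),
    ("TEV protease site", "ENLYFQS"),
    ("FLAG tag", "DYKDDDDK"),
    ("Factor Xa site", "IEGR"),
]

def strip_tags(construct: str) -> str:
    """Return the sequence left after Factor Xa cleavage of the tag block."""
    seq = "".join(construct.split())  # drop the spacing used in the printed listing
    prefix = "".join(segment for _, segment in TAG_SEGMENTS)
    if not seq.startswith(prefix):
        raise ValueError("construct does not begin with the expected tag block")
    return seq[len(prefix):]

# First residues of the printed RnCD2 construct.
mature = strip_tags("HHHHHHENLYFQS DYKDDDDKIEGR ADCRDSGTVW")
print(mature)
```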
All residues are numbered to correspond to homologous residues in human CD2ad. Thus, the numbering begins with 3; i.e., Ala3, and all
following residue numbers increase sequentially. It should also be noted that some sequence changes were made to all mutants to ensure that the protein was only glycosylated at the desired position (Asn65) when expressed in Sf9 cells.
The wild type RnCD2 sequence contains three glycosylation sequons. The asparagines in these positions, Asn72, Asn82, and Asn89, were mutated to glutamine, glutamine, and aspartic acid (underlined), respectively. Finally, to confer glycosylation at Asn65 (bold), Asp67 (bold and underlined) was mutated to threonine [SEQ ID NO: ].
AcyP2 amino acid sequence
The sequence of wild type AcyP2 that was subjected to site directed mutagenesis to produce the mutant sequences used herein is:
HHHHHHENLYFQS DYKDDDDKIEGR MSTAQSLKSV DYEVFGRVQG VCFRMYTEDE ARKIGVVGWV KNTSKGTVTG QVQGPEDKVN SMKSWLSKVG SPSSRIDRTN FSNEKTISKL EYSNFSIRY
SEQ ID
The same purification/protease site tag used in the RnCD2* variants was used for AcyP2* variants and, as with RnCD2*, the entire tag was removed via Factor Xa cleavage prior to all studies. Note that the residues are numbered starting with the first residue (Met) after the Factor Xa cleavage site. It should also be noted that some sequence changes were made to all mutants to ensure that the protein was only glycosylated at the desired position (45) when expressed in Sf9 cells. The wild type AcyP2 sequence contains three glycosylation sequons. The serines in these positions, Ser44, Ser82, and Ser96 (underlined), were mutated to alanine.
Finally, to confer glycosylation at position 45, Lys45 (bold and underlined) was mutated to asparagine [SEQ ID NO: ].
Pinl amino acid sequence
Peptidyl-prolyl cis-trans isomerase NIMA-interacting 1 (Pin1) is an enzyme (EC 5.2.1.8) that regulates mitosis presumably by interacting with NIMA and attenuating its mitosis-promoting activity. The enzyme displays a preference for an acidic residue N-terminal to the isomerized proline bond. The enzyme catalyzes pSer/Thr-Pro cis/trans isomerizations, and its amino acid residue sequence in single letter code is shown below, from left to right and from N-terminus to C-terminus.
MADEEKLPPG WEKRMSRSSG RVYYFNHITN ASQWERPSGN SSSGGKNGQG EPARVRCSHL LVKHSQSRRP SSWRQEKITR TKEEALELIN GYIQKIKSGE EDFESLASQF SDCSSAKARG DLGAFSRGQM QKPFEDASFA LRTGEMSGPV FTDSGIHIIL RTE
SEQ ID NO :
Pinl WW domain amino acid sequence
Residues 6 through 44 at the N-terminus constitute the WW domain of Pin1 [Ranganathan et al., Cell 89, 875-886 (1997)]. The WW domain sequences used illustratively herein are from position 6 through position 38. Amino acid residue position changes made to the WW domain are designated with the original amino acid residue position from the N-terminus. The amino acid residue sequences utilized herein are shown in the tables below along with their expected and observed MALDI-TOF [M+H+] values.
Figure imgf000068_0001
* N = Asn(GlcNAc), Dash = deletion
MALDI-TOF [M+H+]
Figure imgf000068_0002
Protein     Expected    Observed
4-F         3826.9      3826.4
4g-F        4030.0      4030.5
4-T         3711.8      3711.8
4g-T        3914.9      3916.3
4-F,T       3771.9      3770.8
4g-F,T      3974.9      3975.1
5           3837.9      3837.4
5g          4041.0      4041.7
5-F         3897.9      3898.1
5g-F        4101.0      4101.6
5-T         3782.9      3783.2
5g-T        3985.9      3986.2
5-F,T       3842.9      3842.7
5g-F,T      4046.0      4045.4
6           4010.0      Φ
6g          4213.1      Φ
6-F         4070.0      Φ
6g-F        4273.1      Φ
6-T         3954.9      Φ
6g-T        4158.0      Φ
6-F,T       4015.0      Φ
6g-F,T      4218.1      Φ
* N = Asn(GlcNAc); Φ Monoisotopic masses; Φ Determined previously [Culyba et al., Science 331, 571-575 (2011)].
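As a quick consistency check on the table, the expected masses of each glycosylated protein and its non-glycosylated counterpart should differ by the mass of one GlcNAc residue. The sketch below uses only the expected values listed above; the residue mass of about 203.08 (C8H13NO5, monoisotopic) is an assumption supplied here, not a value taken from the source.

```python
# Expected [M+H+] values from the table: (non-glycosylated, glycosylated).
EXPECTED = {
    "4-F": (3826.9, 4030.0),
    "4-T": (3711.8, 3914.9),
    "4-F,T": (3771.9, 3974.9),
    "5": (3837.9, 4041.0),
    "5-F": (3897.9, 4101.0),
    "5-T": (3782.9, 3985.9),
    "5-F,T": (3842.9, 4046.0),
    "6": (4010.0, 4213.1),
    "6-F": (4070.0, 4273.1),
    "6-T": (3954.9, 4158.0),
    "6-F,T": (4015.0, 4218.1),
}

# Assumed monoisotopic mass of one GlcNAc residue (C8H13NO5).
GLCNAC_RESIDUE = 203.08

# Mass shift on glycosylation for each pair, rounded to the table precision.
shifts = {name: round(glyco - plain, 1) for name, (plain, glyco) in EXPECTED.items()}

for name, shift in shifts.items():
    print(f"{name:6s} shift = {shift:.1f}")
```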
Structural coordinates
RnCD2 structural coordinates were obtained from the PDB (accession code 1HNG). AcyP2 structural coordinates were obtained from the PDB for horse muscle acylphosphatase (accession code 1APS.pdb), which shares 94% sequence homology with the human protein. Coordinates were manipulated and rendered using PyMOL software (Schrödinger LLC).
Molecular Biology
All PCR was performed using Pfu Turbo® DNA polymerase (Stratagene) under recommended conditions. Restriction enzymes were obtained from New England Biolabs and applied as indicated. DNA fragments were ligated with standard conditions supplied for T4 ligase (Roche). Amplified and digested DNA was purified using 1% agarose (molecular biology grade) gel prepared in TAE buffer. DNA isolation/purification steps, including genomic isolation, plasmid isolation, restriction digestion clean-up, and PCR purification, were performed with Qiagen kits. Clones were transformed, amplified, and maintained in DH5α E. coli. All clones were verified for accuracy by sequencing.
Protein purification steps on FPLC
All FPLC procedures were carried out on an AKTA FPLC from GE Healthcare. HisTrap™ HP columns (1 mL) were run in 25 mM sodium phosphate, 300 mM NaCl, 5-300 mM imidazole, pH 8.0 at a flow rate of 3 mL/minute at room temperature. A Superdex™ 75 10/300 GL column (24 mL) was run in PBS (RnCD2*) or acetate (AcyP2*) at a flow rate of 0.4 mL/minute at room temperature (retention times: RnCD2* with glycan 12.5 minutes, RnCD2* without glycan 12.75 minutes, AcyP2* with glycan 14.75 minutes, AcyP2* without glycan 15 minutes ) .
Fluorescence spectrometry
Both RnCD2* and AcyP2* have at least one tryptophan residue buried in the hydrophobic core, allowing for an intrinsic fluorescence that depends on the folding status. Fluorescence measurements for RnCD2* and AcyP2* variants were obtained using either a CARY Eclipse (Varian) or an ATF-105 (Aviv) fluorescence spectrometer. Measurements were made in quartz cuvettes, at 25° C, at protein concentrations of 5-30 μg/mL, unless otherwise noted. Fluorescence emission spectra were collected from 315 to 400 nm, following excitation at 280 nm.
CD spectrometry
CD measurements were made using an Aviv™ 62A DS spectropolarimeter, using quartz cuvettes with path lengths of 0.1 or 1 cm. WW domain solutions were prepared in 20 mM sodium phosphate buffer, pH 7; protein solution concentrations were determined spectroscopically from tyrosine and tryptophan absorbance at 280 nm in 6 M guanidine hydrochloride + 20 mM sodium phosphate (εTrp = 5690 M⁻¹cm⁻¹, εTyr = 1280 M⁻¹cm⁻¹) as described previously [Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010); and Edelhoch, Biochemistry 6, 1948-1954 (1967)]. CD spectra were obtained by monitoring molar ellipticity from 340 to 200 nm in 1 nm increments, with 5-second averaging times. Variable temperature CD data were obtained by monitoring molar ellipticity at 227 nm from 0.2 to 98.2° C at 2° C intervals, with 90 second equilibration time between data points and 30 second averaging times. The variable temperature CD data were fit to obtain Tm and ΔGf values for each protein, as described previously [Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010)], and elsewhere herein.
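The spectroscopic concentration determination described above amounts to a Beer-Lambert calculation with an extinction coefficient summed over the Trp and Tyr residues. The following is a minimal sketch under that assumption; the function names, the residue counts, and the absorbance reading in the example are hypothetical and serve only to illustrate the arithmetic.

```python
# Per-residue molar extinction coefficients at 280 nm quoted in the text
# (6 M guanidine hydrochloride + 20 mM sodium phosphate), in M^-1 cm^-1.
EPS_TRP = 5690.0
EPS_TYR = 1280.0


def molar_extinction(n_trp, n_tyr):
    """Extinction coefficient of a protein with n_trp Trp and n_tyr Tyr residues."""
    return n_trp * EPS_TRP + n_tyr * EPS_TYR


def concentration_molar(a280, n_trp, n_tyr, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * path length)."""
    return a280 / (molar_extinction(n_trp, n_tyr) * path_cm)


# Hypothetical example: 2 Trp, 2 Tyr, A280 = 0.1394 in a 1 cm cuvette.
example_conc = concentration_molar(0.1394, n_trp=2, n_tyr=2)
print(f"{example_conc * 1e6:.1f} uM")
```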
Preparation of RnCD2* and AcyP2* variants
Construction of non-glycosylated variant genes
Genes for non-glycosylated versions of RnCD2* and AcyP2* were subcloned into pT7-7 expression vectors using the PIPES method [Klock et al., Methods Mol Biol 498, 91-103 (2009)] to create pHisFLAG-RnCD2b and pHisFLAG-AcyP2b, with the native sequons removed from the sequences. The total N- to C-protein coding region is: Met-6His-TEVs-FLAGtag-FXas-RnCD2* or AcyP2*.
Site directed mutagenesis:
All mutant variants were engineered from these constructs using quick change site directed mutagenesis .
Expression of non-glycosylated variants in E. coli (rich medium)
Bacterial RnCD2* and AcyP2* were expressed as described previously [Hanson et al., Proc Natl Acad Sci USA 106, 3131-3136 (2009)].
Nickel Affinity Purification
Cells were thawed and resuspended in an appropriate purification buffer (RnCD2* variants: 25 mM sodium phosphate, 300 mM NaCl, 5 mM imidazole, 0.5 mM TCEP, pH 8.0; AcyP2* variants: same as above with 25 mM TrisHCl in place of phosphate) in 1/20th of the original growth volume. Protease inhibitors (1 tablet/50 mL; Roche EDTA-free) were added. Cells were lysed by sonication. The cell lysate was spun down (15,000 rpm, 30 minutes, 4° C), and the soluble fraction (supernatant) was separated from the insoluble fraction (pellet) and used for Ni-NTA purification. In the case of RnCD2* variants RnCD2*K and RnCD2*KF and AcyP2* variant AcyP2*-F, the insoluble fraction was treated with 6 M guanidine hydrochloride (GdnHCl) in the appropriate binding buffer and subjected to Ni-NTA purification under denaturing conditions (6 M GdnHCl).
Superflow™ Ni-NTA resin was used to affinity purify proteins via the 6xHis tag, using conditions described in the Qiagen manual.
Denaturing purification was performed similarly with the addition of 6 M GdnHCl to all solutions. Eluted fractions were exchanged into Factor Xa cleavage buffer (50 mM TrisHCl, 100 mM NaCl, pH 7.9) and concentrated in Amicon centrifugation devices.
Factor Xa cleavage of N-terminal tags from non-glycosylated proteins
5 mM CaCl2 was added to concentrated protein in 50 mM TrisHCl, 100 mM NaCl, pH 7.9 before Factor Xa (New England Biolabs) treatment (1 μg Factor Xa : 100 μg of RnCD2* or AcyP2*). For RnCD2* variants, the protease reaction was carried out at 4° C for 12 hours. For AcyP2* variants, the protease reaction was carried out at 25° C for 2 hours. The cleavage mixture was quenched with 100 μM PMSF and separated and buffer exchanged by FPLC (Superdex® 75). RnCD2* final buffer: PBS, 0.5 mM TCEP, 0.01% sodium azide, pH 7.2. AcyP2* final buffer: 50 mM acetate, 0.5 mM TCEP, 0.01% sodium azide, pH 5.5. HisFLAG-free RnCD2* ESI MS found: 11578; RnCD2*-K ESI MS found: 11576; RnCD2*-F ESI MS found: 11612; RnCD2*-KF ESI MS found: 11611. HisFLAG-free AcyP2* ESI MS found: 11078; AcyP2*-F ESI MS found: 11124.
Glycosylated variants
Cloning for RnCD2 and AcyP2 into an insect shuttle vector
A 5' SacI site (gagctc), a 3' KpnI site (ggtacc), and a preprotrypsin leader sequence (PLS, for excretion into the medium) were designed into both the RnCD2 and AcyP2 genes ordered from IDT. Digestion (SacI and KpnI) and ligation of the products and the insect shuttle vector pFastBac™ (Invitrogen) yielded clones pPLSHisFLAG-RnCD2i and pPLSHisFLAG-AcyP2i (sometimes referred to as RnCD2i and AcyP2i, respectively, herein).
Site directed mutagenesis
All mutant variants were engineered from these constructs [in the pFastBac™ vector (Invitrogen)] using quick change site directed mutagenesis.
Expression of RnCD2* and AcyP2* in Sf9 (insect) cells
Expression in insect cells was carried out as previously described [Hanson et al., Proc Natl Acad Sci USA 106, 3131-3136 (2009)]. After expression, growth medium was collected and 0.2 μm filtered. Protease inhibitors (1 tablet/200 mL; Roche EDTA-free), 0.5 mM TCEP, and 1 mM EDTA were added to the filtered growth media extract.
Ammonium sulfate precipitation of glycosylated variants
Growth medium was incubated for 1 hour with ammonium sulfate (30% wt/vol) at 4° C with constant stirring and precipitating species were removed.
Addition of more ammonium sulfate (80% total wt/vol) to the soluble fraction for 1 hour at 4° C resulted in the precipitation of either RnCD2* or AcyP2* variants from the medium. The precipitate was collected with centrifugation followed by vacuum filtration (Whatman Grade 5 qualitative filter paper) . Precipitate was stored at -80° C.
Purification of glycosylated variants by Nickel Affinity Chromatography
Superflow® Ni-NTA resin (Qiagen) was used to affinity-purify proteins via the 6xHis tag, using conditions described in the Qiagen manual. Briefly, precipitated protein was resuspended in 1/4 of the expression volume of lysis buffer (same as for non-glycosylated variants), stirred for 1 hour at 4° C, and 0.2 μm filtered. Filtered medium was applied to a gravity Ni-NTA column in the appropriate lysis buffer, and washed with 10 column volumes of lysis buffer and 50 column volumes of washing buffer (18 mM imidazole). Bound protein was removed with 4 column volumes of elution buffer (20 mM TrisHCl, 300 mM imidazole, pH 8.0 for all variants).
Alternatively, an FPLC HisTrap HP column (1 mL) was used for purification with the same buffer conditions as above. Eluted fractions were exchanged into Concanavalin A (ConA) binding buffer (25 mM TrisHCl, 500 mM NaCl, 1 mM MnCl2, 1 mM CaCl2, pH 7.4) and 0.5 mM TCEP and concentrated in Amicon centrifugation devices.
Isolation of glycosylated protein by Lectin Chromatography
Lectin chromatography with Concanavalin A (ConA) was performed on Nickel column eluate with the ConA Glycoprotein Isolation Kit (Pierce), following the protocols described therein. High mannose and paucimannose species were separated from the non-glycosylated protein found in every expression.
Elution and wash fractions that contained only glycosylated protein were pooled and exchanged into Factor Xa cleavage buffer (50 mM TrisHCl, 100 mM NaCl, pH 7.9).
Factor Xa cleavage of N-terminal tags from glycosylated proteins
5 mM CaCl2 was added to concentrated protein in 50 mM TrisHCl, 100 mM NaCl, pH 7.9 before Factor Xa (New England Biolabs) treatment (1 μg Factor Xa : 100 μg of RnCD2* or AcyP2*). For RnCD2* variants, the protease reaction was carried out at 4° C for 12 hours. For AcyP2* variants, the protease reaction was carried out at 25° C for 2 hours. The cleavage mixture was quenched with 100 μM PMSF and separated and buffer exchanged by FPLC (Superdex® 75). RnCD2* variant final buffer: PBS, 0.5 mM TCEP, 0.01% sodium azide, pH 7.2. AcyP2* variant final buffer: 50 mM acetate, 0.5 mM TCEP, 0.01% sodium azide, pH 5.5. If cleavage was incomplete, Nickel-NTA resin was used to remove uncleaved protein.
ESI-MS characterization
Liquid chromatography mass spectrometry (LCMS)
LCMS analysis was performed using an Agilent 1100 LC coupled to an Agilent 1100 single quad ESI mass spectrometer. LC was performed with a 4.6 mm × 50 mm ZORBAX C8 column (Agilent Technologies, Inc.).
Table
Characterization of glycosylated RnCD2* and AcyP2* variants
Variant        Masses (ESI-MS)    %      Structure
g-RnCD2*       12956    12956     25     Man6GlcNAc2
               13119    13118     44     Man7GlcNAc2
               13282    13282     31     Man8GlcNAc2
g-RnCD2*-K     12468    12469     6      Man3GlcNAc2
               12792    12793     13     Man5GlcNAc2
               12955    12956     22     Man6GlcNAc2
               13118    13118     43     Man7GlcNAc2
               13281    13280     16     Man8GlcNAc2
g-RnCD2*-F     12990    12991     23     Man6GlcNAc2
               13153    13153     54     Man7GlcNAc2
               13316    13315     23     Man8GlcNAc2
g-RnCD2*-KF    12989    12989     21     Man6GlcNAc2
               13152    13151     33     Man7GlcNAc2
               13315    13314     46     Man8GlcNAc2
AcyP2*         12117    12116     100    Man3GlcNAc2 (Fuc)
AcyP2*         12163    12162     100    Man3GlcNAc2 (Fuc)
Details for the characterization of RnCD2* variants folding kinetics and thermodynamics
General
PBS buffer (1x, 0.5 mM TCEP, 0.01% sodium azide, pH 7.2) was made fresh daily from a 10x stock and filtered. Urea and guanidine solutions were prepared fresh daily in 1x PBS, filtered, and concentrations were confirmed by index of refraction (IOR). Subsequent dilutions of urea or guanidine were made with 1x PBS and concentrations were checked by IOR. Constants defined in equations include the universal gas constant (R) and temperature (T). The value of RT at 25° C was taken to be 0.592 kcal/mol.
Data were imported and fitted in Mathematica® 7 software (Wolfram Research). Urea was used exclusively as the chaotrope for all RnCD2* variants except for L63F variants. Due to the high thermodynamic stability of the L63F variants and the saturation point of urea at 25° C, all measurements were also taken in guanidine hydrochloride solutions for this mutant (variants g-RnCD2*-F and RnCD2*-F). Further data can be found in Culyba et al., Science 331, 571-575 (2011).
Folding kinetics of RnCD2* variants
Fluorescence measurements related to kinetic studies were obtained using an AVIV® ATF-105 stopped-flow fluorimeter for single-mixing studies. The set-up consisted of two syringes (syringe 1: 1 mL, syringe 2: 2 mL) that permitted up to a 25-fold dilution of the components of syringe 1 with syringe 2, in a minimum of 80 μL, of which the flow cell holds 40 μL. The dead time between start of mixing and acquisition of data was estimated to be 50-100 ms; in general, only data after the first 200 ms were used for fitting.
Excitation was set at 280 nm (bandwidth: 2 nm) and emission was measured at 330 nm (bandwidth: 8 nm) . The photomultiplier voltage was set to 1000 V and data was recorded for 20-200 seconds.
For unfolding studies, the decrease in intensity at 330 nm was monitored after native protein in PBS or low concentrations of urea or guanidine in syringe 1 was mixed with varying volumes of concentrated urea or guanidine solutions in syringe 2. For refolding studies, the increase in intensity at 330 nm was monitored after denatured protein in a urea or guanidine solution in syringe 1 was diluted with varying volumes of PBS buffer or low concentrations of urea or guanidine from syringe 2. All shots of a particular dilution were typically repeated at least 4 times.
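The final denaturant concentration after each stopped-flow shot follows from a simple volume-weighted average of the two syringe contents. The sketch below illustrates that arithmetic; the function name and the example concentrations are hypothetical, and ideal (additive) mixing volumes are assumed.

```python
def mixed_concentration(c1, v1, c2, v2):
    """Concentration after mixing volume v1 at c1 with volume v2 at c2.

    Assumes volumes are additive; c1 and c2 share one unit (e.g. M),
    as do v1 and v2 (e.g. uL).
    """
    return (c1 * v1 + c2 * v2) / (v1 + v2)


# Hypothetical refolding shot: protein in 8 M urea (syringe 1) diluted
# 25-fold with buffer containing no urea (syringe 2), i.e. 1 part + 24 parts.
refold_urea = mixed_concentration(8.0, 1.0, 0.0, 24.0)

# Hypothetical unfolding shot: native protein in buffer (syringe 1) mixed
# 1:10 with 9 M urea (syringe 2).
unfold_urea = mixed_concentration(0.0, 1.0, 9.0, 10.0)

print(f"refolding: {refold_urea:.2f} M urea, unfolding: {unfold_urea:.2f} M urea")
```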
Continuous irradiation of RnCD2* at 280 nm led to a decrease in fluorescence intensity over time that correlated with the excitation bandwidth, indicating that photobleaching was taking place. The fluorescence intensity at 330 nm (F330) was therefore fit to a double exponential containing a photobleaching (kpb) component and a folding/unfolding component (kobs):
$$F_{330} = e^{-k_{pb}t}\left(c_1 + c_2 e^{-k_{obs}t}\right) \qquad [1]$$
where t is time, c1 is the fluorescence intensity at t = 0, and c2 is the difference in fluorescence between the initial and final states. Note that c2 was positive in unfolding studies and negative in refolding studies. There was no indication in any of the kinetic studies performed that Eq. 1 was inadequate to describe the observed folding kinetics. Thus, after accounting for photobleaching, folding was a monoexponential process for all variants.
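A minimal sketch of the double-exponential form used for the kinetic traces is given below, written as a photobleaching decay multiplying a single-exponential folding/unfolding term. It only evaluates the model; the parameter values are hypothetical, and the actual fitting in the source was done in Mathematica, which is not reproduced here.

```python
import math


def f330(t, k_pb, k_obs, c1, c2):
    """Fluorescence at 330 nm: exp(-k_pb*t) * (c1 + c2*exp(-k_obs*t)).

    k_pb  : photobleaching rate constant (1/s)
    k_obs : observed folding/unfolding rate constant (1/s)
    c1, c2: amplitude parameters of the fit
    """
    return math.exp(-k_pb * t) * (c1 + c2 * math.exp(-k_obs * t))


# Hypothetical unfolding trace (c2 > 0), sampled after the first 200 ms.
times = [0.2 + 0.5 * i for i in range(10)]
trace = [f330(t, k_pb=0.002, k_obs=0.8, c1=100.0, c2=40.0) for t in times]
print([round(v, 2) for v in trace])
```

With a positive c2 the simulated signal decreases monotonically, as described for unfolding; switching the sign of c2 gives a rising refolding trace until photobleaching dominates.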
Thermodynamic stability of RnCD2* variants using chaotrope denaturation
All fluorescence measurements for equilibrium chaotrope denaturation studies were taken on a CARY Eclipse fluorescence spectrophotometer. The temperature at reading was kept constant at 25° C using a CARY single cell Peltier accessory (Agilent Technologies).
For equilibrium denaturation studies, solutions of RnCD2* variants were prepared in PBS and in a high concentration of urea or guanidine (in 1x PBS) at matched protein concentrations (15-20 μg/mL). The solutions were mixed to produce approximately thirty 120 μL samples at regular intervals of urea or guanidine concentration. Solutions were permitted to equilibrate for at least 30 minutes before fluorescence emission spectra were scanned; the average of three scans was taken.
Global fit to kinetic and equilibrium data
Plots of the natural logarithm of the observed rate of equilibration between the folded and unfolded states of a protein, ln(kobs), vs. denaturant concentration have characteristic V-shapes (hence the term "chevron plot"). The quantity kobs is equal to the sum of the unfolding and folding rate constants, ku and kf. Chevron plots therefore result from the dependence of ln ku and ln kf on urea concentration. The unfolding rate constant dominates kobs at high denaturant concentrations, where the chevron plots for several of the RnCD2* variants are slightly curved. Curvature in the unfolding arm of a chevron plot is often attributed to changes in the structure of the folding transition state. This behavior is accounted for by assuming that ln ku has a quadratic dependence on denaturant concentration:
$$\ln k_u = \ln k_{u,0} + m_{u1}[D] + m_{u2}[D]^2 \qquad [2]$$
where [D] is denaturant concentration, ku,0 is the unfolding rate constant at [D] = 0, and mu1 and mu2 are the coefficients of the linear and squared terms in the dependence of ln ku on [D]. The folding rate constant dominates kobs at low denaturant concentrations, where, again, the chevron plots for many of the RnCD2* variants are curved. This has been observed previously by Parker et al. [Parker et al., Biochemistry 36, 13396-13405 (1997)], and was attributed to the rapid formation of an off-pathway intermediate. Thus, the effective folding rate constant, kf*, depends as follows on denaturant concentration:
Figure imgf000081_0001
where fu is the fraction of not-yet-folded protein that is in the unfolded state (instead of the off-pathway intermediate state; i.e., fu = [U]/([U] + [I]) = 1/(1 + Ki)), kf is the true folding rate constant at a given denaturant concentration, [D] is denaturant concentration, kf,0 is the true folding rate constant at [D] = 0, mf is the slope of the dependence of ln kf on [D], Ki,0 is the equilibrium constant for formation of the off-pathway intermediate at [D] = 0, and mi is the slope of the dependence of ln Ki on [D]. Summing the expressions for kf* and ku yields an equation for kobs:
Figure imgf000082_0001
This equation can be fit to folding kinetics vs. denaturant concentration data to get the parameters of interest (primarily kf,0 and ku,0). However, the robustness of the fit can be improved by simultaneously fitting kinetics and equilibrium data. The folding equilibrium constant at a given denaturant concentration (Kf) is related to the parameters above as follows:
$$K_f = \frac{e^{\ln k_{f,0} + m_f[D]}}{e^{\ln k_{u,0} + m_{u1}[D] + m_{u2}[D]^2}} \qquad [5]$$
This expression can be inserted into the equation for fluorescence-detected equilibrium denaturation:
$$F = F_{f,0} + \varphi_f[D] + \frac{\Delta F + \Delta\varphi[D]}{1 + K_f} = F_{f,0} + \varphi_f[D] + \frac{\Delta F + \Delta\varphi[D]}{1 + \dfrac{e^{\ln k_{f,0} + m_f[D]}}{e^{\ln k_{u,0} + m_{u1}[D] + m_{u2}[D]^2}}} \qquad [6]$$
where F is the total fluorescence, Ff,0 is the fluorescence of the folded protein at [D] = 0, φf is the slope of the fluorescence of the folded state vs. [D], ΔF is the difference in fluorescence between the unfolded and folded states, and Δφ is the difference between the slopes of the fluorescences of the folded and unfolded states vs. [D]. Some of the same parameters occur in the models for the dependence on [D] of the folding kinetics and equilibrium. This circumstance enables the simultaneous fitting of kinetic and equilibrium data mentioned above.
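The shared-parameter structure of the kinetic and equilibrium models can be sketched as follows. The effective folding rate is written from the definitions in the text (kf* = fu·kf with fu = 1/(1 + Ki), and ln kf and ln Ki linear in [D]), since the corresponding equations survive only as figure images; that form, the function names, and all parameter values are assumptions made for illustration.

```python
import math


def ln_k_unfold(D, ln_ku0, mu1, mu2):
    """ln k_u with a quadratic dependence on denaturant concentration D."""
    return ln_ku0 + mu1 * D + mu2 * D ** 2


def k_fold_effective(D, ln_kf0, mf, ln_Ki0, mi):
    """Effective folding rate k_f* = f_u * k_f, with f_u = 1 / (1 + K_i)."""
    k_f = math.exp(ln_kf0 + mf * D)
    K_i = math.exp(ln_Ki0 + mi * D)
    return k_f / (1.0 + K_i)


def ln_k_obs(D, ln_kf0, mf, ln_Ki0, mi, ln_ku0, mu1, mu2):
    """Chevron model: ln(k_f* + k_u) at denaturant concentration D."""
    k_u = math.exp(ln_k_unfold(D, ln_ku0, mu1, mu2))
    return math.log(k_fold_effective(D, ln_kf0, mf, ln_Ki0, mi) + k_u)


def K_fold(D, ln_kf0, mf, ln_ku0, mu1, mu2):
    """Folding equilibrium constant K_f = k_f / k_u at concentration D."""
    return math.exp(ln_kf0 + mf * D - ln_k_unfold(D, ln_ku0, mu1, mu2))


# Hypothetical parameter set, for illustration only.
params = dict(ln_kf0=math.log(10.0), mf=-1.5, ln_Ki0=math.log(0.5), mi=-1.0,
              ln_ku0=math.log(0.01), mu1=0.8, mu2=-0.02)

chevron = [(D, ln_k_obs(D, **params)) for D in (0.0, 2.0, 4.0, 6.0, 8.0)]
print(chevron)
```

Because ln kf,0, mf, ln ku,0, mu1, and mu2 appear in both `ln_k_obs` and `K_fold`, a single parameter vector can be constrained by kinetic and equilibrium data at once, which is the point made in the paragraph above.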
To ensure that the kinetic and equilibrium data had equal influence on the parameter estimates, the equilibrium data were weighted as follows: 1) the equilibrium and kinetic data were fit separately to their models; 2) the root mean squared residuals for the two fits were calculated; 3) the ratio of the kinetic and equilibrium RMS residuals was calculated (RMSkinetic/RMSequilibrium); 4) the equilibrium data points were multiplied by this ratio. The combined kinetic and (weighted) equilibrium data sets were then fit simultaneously to the combined kinetic and equilibrium model using Mathematica® 7.0 (Wolfram Research). The fit yielded estimates for kf,0 and ku,0, which were converted to a folding free energy (ΔGf,0) through the relation:
$$\Delta G_{f,0} = -RT\ln K_{f,0} = -RT\ln\left(k_{f,0}/k_{u,0}\right) \qquad [7]$$
The slope of the dependence of ΔGf on [D] at [D] = 0, meq,0, was determined from the values of mf and mu1 through the relation:
$$m_{eq,0} = -RT\,(m_f - m_{u1}) \qquad [8]$$
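The conversion from fitted rate constants to a folding free energy and an equilibrium m-value is a two-line calculation. The sketch below uses RT = 0.592 kcal/mol as stated in the text and, as a cross-check, recomputes it from R = 0.0019872 kcal/mol/K at 298.15 K; the rate constants and slopes in the example are hypothetical.

```python
import math

RT = 0.592              # kcal/mol at 25 C, as taken in the text
R = 0.0019872           # kcal/mol/K
RT_CHECK = R * 298.15   # independent check of the RT value


def dG_f0(kf0, ku0):
    """Folding free energy at [D] = 0: -RT ln(kf0 / ku0), in kcal/mol."""
    return -RT * math.log(kf0 / ku0)


def m_eq0(mf, mu1):
    """Slope of the folding free energy vs. [D] at [D] = 0: -RT (mf - mu1)."""
    return -RT * (mf - mu1)


# Hypothetical fitted values: kf0 = 10 1/s, ku0 = 0.01 1/s, mf = -1.5, mu1 = 0.8.
example_dG = dG_f0(10.0, 0.01)
example_m = m_eq0(-1.5, 0.8)
print(f"dG_f0 = {example_dG:.2f} kcal/mol, m_eq0 = {example_m:.2f} kcal/mol/M")
```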
Further data from these studies can be found in Culyba et al., Science 331, 571-575 (2011).
Details for the characterization of AcyP2* variants thermodynamics
General
Acetate buffer (50 mM acetate, 0.5 mM TCEP, 0.01% sodium azide, pH 5.5; Acetate) was made fresh daily from a 4x stock and filtered. Urea solutions were prepared fresh daily in 1x Acetate, filtered, and concentrations were confirmed by index of refraction (IOR). Subsequent dilutions of urea were made with 1x Acetate and concentrations were checked by IOR. Constants defined in equations include the universal gas constant (R) and temperature (T). The value of RT at 25° C was taken to be 0.592 kcal/mol. Data were imported and fit in Microsoft Excel.
Thermodynamic stability of AcyP2* variants using chaotrope denaturation
All fluorescence measurements for equilibrium chaotrope denaturation studies were taken on a CARY Eclipse fluorescence spectrophotometer. The temperature at reading was kept constant at 25° C using a CARY single cell Peltier accessory (Agilent Technologies) . Each chaotrope denaturation study was repeated at least three times for each variant.
For equilibrium denaturation studies, solutions of AcyP2* variants were prepared in Acetate and in a high concentration of urea (in 1x Acetate) at matched concentrations (15-30 μg/mL). The solutions were mixed to produce approximately thirty 120 μL samples at regular intervals of urea or guanidine concentration. Solutions were permitted to equilibrate for at least 30 minutes before fluorescence emission spectra were scanned; an average of three scans was taken. Like RnCD2*, AcyP2* unfolding in response to increasing concentrations of urea or guanidine causes a shift and intensity change in the fluorescence spectrum. Thus, fluorescence intensity at single wavelengths (Fλ) was plotted versus chaotrope concentration to demonstrate unfolding. ΔGf,0 and meq values for AcyP2* were estimated by fitting fluorescence intensity at 330 nm (F330) vs. urea concentration data to:
$$F = F_{f,0} + \varphi_f[D] + \frac{\Delta F + \Delta\varphi[D]}{1 + e^{-(\Delta G_{f,0} + m_{eq}[D])/RT}} \qquad [9]$$
where F is the total fluorescence, Ff,0 is the fluorescence of the folded protein at [D] = 0, φf is the slope of the fluorescence of the folded state vs. [D], ΔF is the difference in fluorescence between the unfolded and folded states, and Δφ is the difference between the slopes of the fluorescences of the folded and unfolded states vs. [D].
ΔGf,0 and meq values derived from single chaotrope denaturation studies were averaged to give the ΔGf,0 and meq values and fits reported.
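The two-state equilibrium denaturation model can be sketched as a function of urea concentration. The form below assumes a linear dependence of the folding free energy on [D] and a sloping folded-state baseline, matching the parameter definitions above; the function names and all numerical values are hypothetical.

```python
import math

RT = 0.592  # kcal/mol at 25 C, as taken in the text


def fluorescence(D, F_f0, phi_f, dF, dphi, dG_f0, m_eq):
    """Two-state model: folded baseline plus the unfolded-fraction term.

    dG_f0 : folding free energy at [D] = 0 (kcal/mol, negative if folded)
    m_eq  : slope of the folding free energy vs. [D] (kcal/mol/M)
    """
    dG = dG_f0 + m_eq * D
    frac_unfolded = 1.0 / (1.0 + math.exp(-dG / RT))
    return F_f0 + phi_f * D + (dF + dphi * D) * frac_unfolded


def midpoint(dG_f0, m_eq):
    """Denaturant concentration at which the folding free energy is zero."""
    return -dG_f0 / m_eq


# Hypothetical parameters: dG_f0 = -4.0 kcal/mol, m_eq = 1.0 kcal/mol/M.
cm = midpoint(-4.0, 1.0)
curve = [fluorescence(D, 200.0, -2.0, -120.0, 1.0, -4.0, 1.0)
         for D in (0.0, 2.0, 4.0, 6.0, 8.0)]
print(cm, [round(v, 1) for v in curve])
```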
Polypeptide Synthesis
General
Pin1 WW domain proteins were synthesized as C-terminal acids, employing a solid phase peptide synthesis approach using a standard Fmoc Nα protecting group strategy either manually (protein WW) or via a combination of manual and automated methods (proteins g-WW, WW-F, g-WW-F, WW-T, g-WW-T, WW-F,T, and g-WW-F,T were synthesized on an Applied Biosystems 433A automated peptide synthesizer except for the manual coupling of Fmoc-Asn(Ac3GlcNAc)-OH, as discussed below). See also Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010).
Amino acids were activated by 2-(1H-benzotriazole-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate (HBTU, purchased from Advanced ChemTech) and N-hydroxybenzotriazole hydrate (HOBt, purchased from Advanced ChemTech). Fmoc-Gly-loaded NovaSyn® TGT resin and all Fmoc-protected α-amino acids (with acid-labile side-chain protecting groups) were purchased from EMD Biosciences, including the glycosylated amino acid Fmoc-Asn(Ac3GlcNAc)-OH {N-α-Fmoc-N-β-[3,4,6-tri-O-acetyl-2-(acetylamino)-deoxy-2-β-glucopyranosyl]-L-asparagine} [Meldal et al., Tetrahedron Lett. 31, 6987-6990 (1990); Otvos et al., Tetrahedron Lett. 31, 5889-5892 (1990)].
Piperidine and N,N-diisopropylethylamine (DIEA) were purchased from Aldrich, N-methyl pyrrolidinone (NMP) was purchased from Applied Biosystems, and N,N-dimethylformamide (DMF) was obtained from Fisher.
A general protocol for manual solid phase peptide synthesis follows: Fmoc-Gly-loaded NovaSyn® TGT resin (217 mg, 50 μmol at 0.23 mmol/g resin loading) was aliquotted into a fritted polypropylene syringe and allowed to swell in CH2Cl2 and dimethylformamide (DMF). Solvent was drained from the resin using a vacuum manifold. To remove the Fmoc protecting group on the resin-linked amino acid, 2.5 mL of 20% piperidine in DMF was added to the resin, and the resulting mixture was stirred at room temperature for 5 minutes. The deprotection solution was drained from the resin with a vacuum manifold. Then, an additional 2.5 mL of 20% piperidine in DMF was added to the resin, and the resulting mixture was stirred at room temperature for 15 minutes. The deprotection solution was drained from the resin using a vacuum manifold, and the resin was rinsed five times with DMF.
For coupling of an activated amino acid to a newly deprotected amine on resin, the desired Fmoc-protected amino acid (250 μmol, 5 eq.) and HBTU (250 μmol, 5 eq.) were dissolved by vortexing in 2.5 mL 0.1 M HOBt (250 μmol, 5 eq.) in NMP. To the dissolved amino acid solution was added 87.1 μL DIEA (500 μmol, 10 eq.). Only 1.5 eq. of amino acid were used during the coupling of the expensive Fmoc-Asn(Ac3GlcNAc)-OH monomer, and the required amounts of HBTU, HOBt, and DIEA were adjusted accordingly. The resulting mixture was vortexed briefly and allowed to react for at least 1 minute.
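The reagent quantities in the coupling protocol follow from the resin loading and the stated equivalents. The sketch below reproduces that arithmetic; the molar mass (129.25 g/mol) and density (0.742 g/mL) of DIEA are assumed literature values not given in the source, and the function names are illustrative.

```python
RESIN_MG = 217.0          # mass of resin used, from the text
LOADING_MMOL_PER_G = 0.23  # resin loading, from the text

# Assumed physical constants for DIEA (not stated in the source).
DIEA_MW_G_PER_MOL = 129.25
DIEA_DENSITY_G_PER_ML = 0.742


def scale_umol(resin_mg, loading_mmol_per_g):
    """Synthesis scale in umol: (mg resin) x (mmol/g) equals umol."""
    return resin_mg * loading_mmol_per_g


def reagent_umol(equivalents, scale=50.0):
    """Amount of reagent for a given number of equivalents at the nominal scale."""
    return equivalents * scale


def diea_volume_ul(umol):
    """Volume of neat DIEA (uL) that delivers the requested umol."""
    grams = umol * 1e-6 * DIEA_MW_G_PER_MOL
    return grams / DIEA_DENSITY_G_PER_ML * 1000.0


scale = scale_umol(RESIN_MG, LOADING_MMOL_PER_G)  # about 50 umol
amino_acid = reagent_umol(5)                      # 250 umol
hobt = 2.5e-3 * 0.1 * 1e6                         # 2.5 mL of 0.1 M, in umol
diea_ul = diea_volume_ul(reagent_umol(10))        # 10 eq. of DIEA
print(round(scale, 2), amino_acid, round(hobt), round(diea_ul, 1))
```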
The activated amino acid solution was then added to the resin, and the resulting mixture was stirred at room temperature for at least 1 hour.
Selected amino acids were double coupled as needed to allow the coupling reaction to proceed to completion. Following the coupling reaction, the activated amino acid solution was drained from the resin with a vacuum manifold, and the resin was subsequently rinsed five times with DMF. The cycles of
deprotection and coupling were alternately repeated to give the desired full-length protein.
Acid-labile side-chain protecting groups were globally removed and proteins were cleaved from the resin by stirring the resin for about 4 hours in a solution of phenol (0.5 g), water (500 μL), thioanisole (500 μL), ethanedithiol (250 μL), and triisopropylsilane (100 μL) in trifluoroacetic acid (TFA, 8 mL). Following the cleavage reaction, the TFA solution was drained from the resin, the resin was rinsed with additional TFA, and the resulting solution was concentrated under Ar. Proteins were precipitated from the concentrated TFA solution by addition of diethyl ether (about 45 mL). Following centrifugation, the ether was decanted, and the pellet (containing the crude protein) was stored at -20° C until purification.
Acetate protecting groups were subsequently removed from the 3-, 4-, and 6-hydroxyl groups of GlcNAc in Asn(GlcNAc)-containing proteins by hydrazinolysis, as described previously [Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010); and Ficht et al., Chem. Eur. J. 14, 3620-3629 (2008)] and elsewhere herein. The WW domains were purified by reverse-phase HPLC on a C18 column using a linear gradient of water in acetonitrile with 0.2% v/v TFA. The identity of each WW domain was confirmed by matrix-assisted laser desorption/ionization time-of-flight spectrometry (MALDI-TOF), and purity was evaluated by analytical HPLC.
Removal of Acetate Protecting Groups on Asn-linked GlcNAc Residues in Glycosylated Pin1 WW Domain Proteins
Acetate protecting groups were removed from the 3-, 4-, and 6-hydroxyl groups on the Asn-linked GlcNAc residues in proteins g-WW, g-WW-F, g-WW-T, and g-WW-F,T via hydrazinolysis as described previously [Ficht et al., Chem. Eur. J. 14, 3620-3629 (2008)]. Briefly, the crude protein was dissolved in a solution of 5% hydrazine in 60 mM aqueous dithiothreitol (sometimes containing as much as 50% acetonitrile, to facilitate dissolution of the crude protein) and allowed to stand at room temperature for about 1 hour with intermittent agitation. The deprotection reaction was quenched by the addition of about 1 mL TFA and about 20 mL water. The quenched reaction mixture was frozen and lyophilized to give the crude deprotected protein as a white powder.
Purification and Characterization
Immediately prior to purification, the crude proteins were dissolved in either 1:1 water:acetonitrile, DMSO, or 8 M GdnHCl (depending on the solubility of the crude protein; 8 M GdnHCl was frequently required to dissolve the crude glycosylated proteins even though these proteins were readily soluble in water after purification).
Proteins were purified by preparative reverse-phase HPLC on a C18 column using a linear gradient of water in acetonitrile with 0.2% v/v TFA. HPLC fractions containing the desired protein product were pooled, frozen, and lyophilized. Polypeptides were identified by matrix-assisted laser desorption/ionization time-of-flight spectrometry (MALDI-TOF) and purity was established by analytical HPLC. Further data from these studies can be found in Culyba et al., Science 331, 571-575 (2011).
Circular Dichroism Spectroscopy
Measurements were made with an Aviv 62A DS Circular Dichroism Spectrometer, using quartz cuvettes with a 0.1 cm path length. Protein solutions were prepared in 10 mM sodium phosphate buffer, pH 7, and protein concentrations were determined spectroscopically based on tyrosine and tryptophan absorbance at 280 nm in 6 M guanidine hydrochloride + 20 mM sodium phosphate (εTrp = 5690 M⁻¹cm⁻¹, εTyr = 1280 M⁻¹cm⁻¹) [Price et al., J. Am. Chem. Soc. 132, 15359-15367 (2010); and Edelhoch, Biochemistry 6, 1948-1954 (1967)]. CD spectra were obtained by monitoring molar ellipticity from 340 to 200 nm, with 5 second averaging times. Variable temperature CD data were obtained by monitoring molar ellipticity at 227 nm from 0.2 to 98.2° C at 2° C intervals, with 90 seconds equilibration time between data points and 30 second averaging times.
Variable temperature CD data were fit to the following model for two-state thermally induced unfolding transitions:
$$[\theta] = \frac{(D_0 + D_1 T) + K_f\,(N_0 + N_1 T)}{1 + K_f} \qquad (10)$$
where T is temperature in Kelvin, D0 is the y-intercept and D1 is the slope of the post-transition baseline; N0 is the y-intercept and N1 is the slope of the pre-transition baseline; and Kf is the temperature-dependent folding equilibrium constant. Kf is related to the temperature-dependent free energy of folding ΔGf(T) according to the following equation:
Figure imgf000090_0001
where R is the universal gas constant (0.0019872 kcal/mol/K). The midpoint of the thermal unfolding transition (or melting temperature Tm) was calculated by fitting ΔGf(T) to either of two equations. The first equation is derived from the van't Hoff relationship:
ΔGf(T) = (12)
Figure imgf000091_0001
where ΔH(Tm) is the enthalpy of folding at the melting temperature and ΔCp is the heat capacity of folding (ΔH(Tm), ΔCp, and Tm are parameters of the fit). The second equation represents ΔGf(T) as a Taylor series expansion about the melting temperature:
$$\Delta G_f(T) = \Delta G_0 + \Delta G_1\,(T - T_m) + \Delta G_2\,(T - T_m)^2 \qquad (13)$$
in which ΔG0, ΔG1, and ΔG2 are parameters of the fit and Tm is a constant obtained from the van't Hoff fit (in equation 12). The ΔGf values displayed in Figure 4F for each Pin WW domain protein were obtained by averaging the ΔGf values (calculated at 328.15 K using equation 13) from each of three or more replicate variable temperature CD studies on the same protein.
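The thermal unfolding analysis can be sketched as three small functions. Several pieces are assumptions rather than transcriptions, because the corresponding equations survive only as images: the van't Hoff expression is written in its standard Gibbs-Helmholtz form, Kf is taken as exp(-ΔGf/RT), and the CD signal is modeled as a population-weighted average of linear folded and unfolded baselines. All parameter values are hypothetical.

```python
import math

R = 0.0019872  # kcal/mol/K, as given in the text


def dG_vant_hoff(T, dH_Tm, dCp, Tm):
    """Standard Gibbs-Helmholtz form (assumed to correspond to equation 12)."""
    return dH_Tm * (Tm - T) / Tm + dCp * (T - Tm - T * math.log(T / Tm))


def dG_taylor(T, Tm, dG0, dG1, dG2):
    """Equation 13: Taylor expansion of the folding free energy about Tm."""
    return dG0 + dG1 * (T - Tm) + dG2 * (T - Tm) ** 2


def K_fold(dG, T):
    """Folding equilibrium constant from the folding free energy (assumed form)."""
    return math.exp(-dG / (R * T))


def ellipticity(T, Kf, D0, D1, N0, N1):
    """Two-state signal: unfolded baseline D0 + D1*T, folded baseline N0 + N1*T."""
    return ((D0 + D1 * T) + Kf * (N0 + N1 * T)) / (1.0 + Kf)


# Hypothetical parameters: Tm = 333.15 K, folding enthalpy -30 kcal/mol.
Tm = 333.15
dG_328 = dG_vant_hoff(328.15, dH_Tm=-30.0, dCp=-0.3, Tm=Tm)
print(f"dG_f(328.15 K) = {dG_328:.3f} kcal/mol, K_f = {K_fold(dG_328, 328.15):.2f}")
```

Evaluating the free energy at a common reference temperature such as 328.15 K, as the text does, allows variants with different melting temperatures to be compared directly.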
CD spectra and variable-temperature CD data for Pin WW domain proteins WW, g-WW, WW-F, g-WW-F, WW-T, g-WW-T, WW-F,T, and g-WW-F,T appear in the Supplemental Information along with parameters from equations 12 and 13 that were used to fit the variable temperature CD data. The standard error for each fitted parameter is also shown. These standard parameter errors were used to estimate the uncertainty in the average ΔGf values, along with the uncertainty in the folding and unfolding rate ratios shown in Figure 4F, by propagation of error. Further data from these studies can be found in Culyba et al., Science 331, 571-575 (2011).
Laser Temperature Jump Studies
Relaxation times following a rapid laser-induced temperature jump of about 12 °C were measured for 50 μM solutions of Pin WW domain proteins WW, g-WW, WW-F, g-WW-F, WW-T, g-WW-T, WW-F,T, and g-WW-F,T in 20 mM sodium phosphate (pH 7) using a nanosecond laser temperature jump apparatus, as described previously [Ballew et al., Rev. Sci. Instrum. 67, 3694-3699 (1996); Ballew et al., Proc. Natl. Acad. Sci. USA 93, 5759-5764 (1996); Ervin et al., J. Photochem. Photobiol. sect. B 54, 1-15 (2000); Jager et al., J. Mol. Biol. 311, 373-393 (2001)]. The fluorescence decay of a Trp residue in each protein was monitored after a laser-induced temperature jump at each of several temperatures.
The relaxation traces represent the average of at least 10 individual temperature-jump studies, and were obtained by fitting the shape f of each fluorescence decay at time t to a linear combination of the fluorescence decay shapes before (f1) and after (f2) the temperature jump:
$$f(t) = a_1(t)\, f_1 + a_2(t)\, f_2 \qquad (14)$$
where a1(t) and a2(t) are the coefficients of the linear combination describing the relative contributions of f1 and f2 to the shape of the fluorescence decay at time t [Jager et al., J. Mol. Biol. 311, 373-393 (2001)]. Then, the relaxation of the protein to equilibrium at the new temperature following the laser-induced temperature jump can be represented as χ1(t):
$$\chi_1(t) = \frac{a_1(t)}{a_1(t) + a_2(t)} \qquad (15)$$
plotted as a function of time for each protein at several temperatures [Ballew et al., Proc. Natl. Acad. Sci. USA 93, 5759-5764 (1996); and Ervin et al., J. Photochem. Photobiol. sect. B 54, 1-15 (2000)].
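The linear-combination fit of equation 14 can be sketched as an ordinary least-squares problem with two unknown coefficients. The Python example below is a minimal illustration, not the analysis code used in these studies: the function names are hypothetical, the decay shapes are made-up numbers, and χ1 is assumed to be the normalized weight a1/(a1 + a2).

```python
def fit_linear_combination(f, f1, f2):
    """Least-squares coefficients (a1, a2) such that f ~ a1*f1 + a2*f2.

    Solves the 2x2 normal equations directly, so no external library is needed.
    """
    s11 = sum(x * x for x in f1)
    s22 = sum(y * y for y in f2)
    s12 = sum(x * y for x, y in zip(f1, f2))
    b1 = sum(x * z for x, z in zip(f1, f))
    b2 = sum(y * z for y, z in zip(f2, f))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("basis decay shapes are linearly dependent")
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (b2 * s11 - b1 * s12) / det
    return a1, a2


def chi1(a1, a2):
    """Normalized contribution of the pre-jump decay shape (assumed definition)."""
    return a1 / (a1 + a2)


# Hypothetical decay shapes sampled at four time points.
f1 = [1.0, 0.5, 0.25, 0.125]    # shape before the temperature jump
f2 = [1.0, 0.8, 0.64, 0.512]    # shape after the temperature jump
f = [0.3 * x + 0.7 * y for x, y in zip(f1, f2)]  # synthetic observed shape

a1, a2 = fit_linear_combination(f, f1, f2)
print(round(a1, 6), round(a2, 6), round(chi1(a1, a2), 6))
```

Repeating the fit for the decay recorded at each time point after the jump would give χ1 as a function of time, which is the relaxation trace described above.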
The relaxation traces at each temperature were then fit to the following equation:
$$\chi_1(t) = C_1 \exp\left[-\frac{t - x_0}{\tau}\right] + C_2 \qquad (16)$$
where C1 and C2 are constants describing the amplitude of the fluorescence decay, x0 is a constant that adjusts the measured time to zero after the instantaneous temperature jump, and τ is the relaxation time, which is the inverse of the observed rate constant kobs (kobs = 1/τ). Using the temperature-dependent equilibrium constant Kf for each protein (from the variable temperature CD studies), folding kf and unfolding ku rate constants can be extracted from kobs according to the following equations:
$$k_{obs} = k_f + k_u \qquad (17)$$
$$K_f = \frac{k_f}{k_u} \qquad (18)$$
$$k_f = \frac{k_{obs}\, K_f}{K_f + 1} \qquad (19)$$
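To make the extraction of rate constants concrete, the sketch below assumes a two-state model in which the observed rate is the sum of the folding and unfolding rates (kobs = kf + ku) and the equilibrium constant is their ratio (Kf = kf/ku). The function names and the numerical inputs are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def relaxation(t, c1, c2, x0, tau):
    """Single-exponential relaxation trace: C1 * exp(-(t - x0) / tau) + C2."""
    return c1 * math.exp(-(t - x0) / tau) + c2


def rate_constants(tau, kf_eq):
    """Split the observed rate 1/tau into folding and unfolding rate constants.

    Assumes two-state kinetics: k_obs = k_fold + k_unfold and
    kf_eq = k_fold / k_unfold.
    """
    k_obs = 1.0 / tau
    k_fold = k_obs * kf_eq / (kf_eq + 1.0)
    k_unfold = k_obs / (kf_eq + 1.0)
    return k_obs, k_fold, k_unfold


# Hypothetical inputs: a 100 microsecond relaxation time and Kf = 3.
k_obs, k_fold, k_unfold = rate_constants(1e-4, 3.0)
print(k_obs, k_fold, k_unfold)
```

By construction the two returned rate constants sum to the observed rate and their ratio reproduces the equilibrium constant, which is a quick consistency check on any fitted values.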
The folding rates for each protein can then be fit as a function of temperature to the following Kramers model [Kramers, Physica 7, 284 (1940); Lapidus et al., Proc. Natl. Acad. Sci. USA 97, 7220-7225 (2000); Hanggi et al., Rev. Mod. Phys. 62, 251-341 (1990)] equation:
$$k_f(T) = \nu(59\,^{\circ}\mathrm{C})\, \frac{\eta(59\,^{\circ}\mathrm{C})}{\eta(T)}\, \exp\left[-\frac{\Delta G_0 + \Delta G_1 (T - T_m) + \Delta G_2 (T - T_m)^2}{RT}\right] \qquad (20)$$
in which the temperature-dependent free energy of activation for folding, ΔG‡f(T), is represented as a second order Taylor series expansion about the melting temperature Tm, and ΔG0, ΔG1, and ΔG2 are parameters of the fit. The pre-exponential term in equation 20 represents the viscosity-corrected frequency ν of the characteristic diffusional folding motion at the barrier [Bieri et al., Proc. Natl. Acad. Sci. USA 96, 9597-9601 (1999); Ansari et al., Science 256, 1796-1798 (1992)] (at 59 °C, ν = 5 × 10⁵ s⁻¹) [Fuller et al., Proc. Natl. Acad. Sci. USA 106, 11067-11072 (2009)]. η(59 °C) is the solvent viscosity at 59 °C and η(T) is the solvent viscosity at temperature T, both calculated with equation 21:
$$\eta(T) = A \cdot 10^{B/(T - C)} \qquad (21)$$
where A = 2.41 × 10⁻⁵ Pa·s, B = 247.8 K, and C = 140 K [Weast, CRC Handbook of Chemistry and Physics; CRC Press: Boca Raton, 1982].
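The viscosity correction and the Kramers expression can be combined in a few lines of Python. This is a sketch under stated assumptions rather than the fitting code used here: A is taken as 2.41 × 10⁻⁵ Pa·s (the value that gives a water viscosity near 1 mPa·s at 20 °C), the activation free energy is the second-order expansion about Tm with a negative sign in the exponent, and the function names and example activation parameters are hypothetical.

```python
import math

R = 0.0019872     # kcal/mol/K
A = 2.41e-5       # Pa*s (assumed exponent sign, see lead-in)
B = 247.8         # K
C = 140.0         # K
NU_REF = 5.0e5    # s^-1, prefactor quoted in the text for 59 degrees C
T_REF = 332.15    # 59 degrees C expressed in kelvin


def viscosity(temp_k):
    """Equation 21: solvent viscosity in Pa*s at absolute temperature temp_k."""
    return A * 10.0 ** (B / (temp_k - C))


def kramers_folding_rate(temp_k, dg0, dg1, dg2, tm_k):
    """Equation 20: viscosity-corrected Kramers folding rate (s^-1)."""
    d = temp_k - tm_k
    dg_act = dg0 + dg1 * d + dg2 * d * d          # activation free energy, kcal/mol
    prefactor = NU_REF * viscosity(T_REF) / viscosity(temp_k)
    return prefactor * math.exp(-dg_act / (R * temp_k))


# Hypothetical activation parameters evaluated at 328.15 K.
rate = kramers_folding_rate(328.15, 2.5, 0.02, 0.001, 332.15)
print(f"viscosity at 328.15 K = {viscosity(328.15):.3e} Pa*s, kf = {rate:.3e} s^-1")
```

At the reference temperature the viscosity ratio is 1, so the prefactor reduces to the quoted frequency ν; at lower temperatures the higher solvent viscosity scales the prefactor down.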
The parameters for equations 13 and 20 were used to calculate the folding and unfolding rate ratios at 328.15 K for Pin WW domain proteins WW, g-WW, WW-F, g-WW-F, WW-T, g-WW-T, WW-F,T, and g-WW-F,T shown in Figure 4F.
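The uncertainties attached to averaged free energies and to rate ratios can be estimated by first-order propagation of error. The sketch below shows the two generic formulas involved, for a mean of independent values and for a ratio of two independent quantities. The exact procedure used for Figure 4F is not spelled out here, so the function names, the independence assumption, and the example numbers are all illustrative.

```python
import math


def mean_with_error(values, errors):
    """Mean of independent values and its propagated standard error."""
    n = len(values)
    mean = sum(values) / n
    err = math.sqrt(sum(e * e for e in errors)) / n
    return mean, err


def ratio_with_error(a, sigma_a, b, sigma_b):
    """Ratio a/b and its first-order propagated error for independent a and b."""
    r = a / b
    err = abs(r) * math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return r, err


# Hypothetical replicate free energies (kcal/mol) with their standard errors.
mean_dg, mean_dg_err = mean_with_error([-0.20, -0.18, -0.22], [0.02, 0.03, 0.02])

# Hypothetical folding rates (s^-1) for a modified and an unmodified protein.
ratio, ratio_err = ratio_with_error(1.2e4, 6.0e2, 8.0e3, 4.0e2)
print(mean_dg, mean_dg_err, ratio, ratio_err)
```

Relative errors add in quadrature for a ratio, so a 5% uncertainty in each rate gives roughly a 7% uncertainty in their ratio.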
Each of the patents, patent applications and articles cited herein is incorporated by reference. The use of the article "a" or "an" is intended to include one or more.
The foregoing description and the examples are intended as illustrative and are not to be taken as limiting. Still other variations within the spirit and scope of this invention are possible and will readily present themselves to those skilled in the art.
SUPPLEMENTAL INFORMATION
SEQUENCES OF ANTIBODY FC PORTIONS FOR PREPARATION OF ENHANCED AROMATIC SEQUONS
SEQUENCE PORTIONS TO BE REVISED ARE UNDERLINED
>drugbank_drug|DB00078 Ibritumomab - Mouse Anti-CD20 Heavy chain 1
QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARWYYSNSYWYFDVWGTGTTVTV SAPSVYPLAPVCGDTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDLYT LSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRGPTIKPCPPCKCPAPNLLGGPSV FIFPPKIKDVLMI SLSPIVTCVVVDVSEDDPDVOI SWFVNNVEVHTAOTOTHREDYNSTL RVVSALPIQHQDWMSGKEFKCKVNNKDLPAPIERTI SKPKGSVRAPQVYVLPPPEEEMTK KQVTLTCMVTDFMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVER NSYSCSVVHEGLHNHHTTKSFSR
>drugbank_drug|DB00078 Ibritumomab - Mouse Anti-CD20 Heavy chain 2
QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SAPSVYPLAPVCGDTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDLYT LSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRGPTIKPCPPCKCPAPNLLGGPSV FIFPPKIKDVLMI SLSPIVTCVVVDVSEDDPDVQI SWFVNNVEVHTAQTQTHREDYNSTL RVVSALPIQHQDWMSGKEFKCKVNNKDLPAPIERTI SKPKGSVRAPQVYVLPPPEEEMTK KQVTLTCMVTDFMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVER NSYSCSVVHEGLHNHHTTKSFSR
>drugbank_drug|DB00078 Ibritumomab - Mouse Anti-CD20 Light chain 1
QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTI SRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRADAAPTVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFN
>drugbank_drug|DB00078 Ibritumomab - Mouse Anti-CD20 Light chain 2
QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTI SRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRADAAPTVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFN
>drugbank_drug|DB00028 Immune globulin - IGG1
PSALTQPPSASGSLGQSVTISCTGTSSDVGGYNYVSWYQQHAGKAPKVI IYEVNKRPSGV PDRFSGSKSGNTASLTVSGLQAEDEADYYCSSYEGSDNFVFGTGTKVTVLGQPKANPTVT LFPPSSEELQANKATEVCLI SDFYPGAVTVAWKADGSPVKAGVETTKPSKQSNNKYAASS YLSLTPEQWKSHRSYSCQVTHEGSTVEKTVAPTECSPLVLQESGPGLVKPSEALSLTCTV SGDS INTILYYWSWIRQPPGKGLEWIGYIYYSGSTYGNPSLKSRVTI SVNTSKNQFYSKL SSVTAADTAVYYCARVPLVVNPWGQGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGC LVKDYFPQPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNH KPSNTKVDKRVAPELLGGPSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPQVKFNWYV DGVOVHNAKTKPREOOYNSTYRVVSVLTVLHONWLDGKEYKCKVSNKALPAPIEKTI SKA KGQPREPQVYTLPPSREEMTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLD SDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSL
>drugbank_drug|DB00028 Immune globulin - IgA2
ELVMTQSPSSLSASVGDRVNIACRASQGI SSALAWYQQKPGKAPRLLIYDASNLESGVPS RFSGSGSGTDFTLTI SSLQPEDFAIYYCQQFNSYPLTFGGGTKVEIKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNRGECQVKLLEQSGAEVKKPGASVKVSCKAS GYSFTSYGLHWVRQAPGQRLEWMGWI SAGTGNTKYSQKFRGRVTFTRDTSATTAYMGLSS LRPEDTAVYYCARDPYGGGKSEFDYWGQGTLVTVSSASPTSPKVFPLSLDSTPQDGNVVV ACLVQGFFPQEPLSVTWSESGQNVTARNFPPSQDASGDLYTTSSQLTLPATQCPDGKSVT CHVKHYTNPSQDVTVPCPVPPPPPCCHPRLSLHRPALEDLLLGSEANLTCTLTGLRDASG ATFTWTPSSGKSAVQGPPERDLCGCYSVSSVLPGCAQPWNHGETFTCTAAHPELKTPLTA NITKSGNTFRPEVHLLPPPSEELALNELVTLTCLARGFSPKDVLVRWLQGSQELPREKYL TWASRQEPSQGTTTFAVTS ILRVAAEDWKKGDTFSCMVGHEALPLAFTQKTIDRLAGKPT HVNVSVVMAEVDGTCY
>drugbank_drug|DB00005 Etanercept - DB00005 sequence
LPAQVAFTPYAPEPGSTCRLREYYDQTAQMCCSKCSPGQHAKVFCTKTSDTVCDSCEDST
YTQLWNWVPECLSCGSRCSSDQVETQACTREQNRICTCRPGWYCALSKQEGCRLCAPLRK
CRPGFGVARPGTETSDWCKPCAPGTFSNTTSSTDICRPHQICNWAIPGNASMDAVCTS
TSPTRSMAPGAVHLPQPVSTRSQHTQPTPEPSTAPSTSFLLPMGPSPPAEGSTGDEPKSC
DKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPQVKFNWYVD
GVOVHNAKTKPREOOY STYRVVSVLTVLHONWLDGKEYKCKVSNKALPAPIEKTI SKAK
GOPREPOVYTLPPSREEMTKNOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDS
DGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug|DB00087 Alemtuzumab - 1CE1:H CAMPATH-1H: Heavy Chain 1
QVQLQESGPGLVRPSQTLSLTCTVSGFTFTDFYMNWVRQPPGRGLEWIGFIRDKAKGYTT EYNPSVKGRVTMLVDTSKNQFSLRLSSVTAADTAVYYCAREGHTAAPFDYWGQGSLVTVS SASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQS SGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCDKTHTCPPCPAPELLG GPSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQY NSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRD ELTKNOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSR WQQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug|DB00087 Alemtuzumab - 1CE1:H CAMPATH-1H: Heavy Chain 2
QVQLQESGPGLVRPSQTLSLTCTVSGFTFTDFYMNWVRQPPGRGLEWIGFIRDKAKGYTT EYNPSVKGRVTMLVDTSKNQFSLRLSSVTAADTAVYYCAREGHTAAPFDYWGQGSLVTVS SASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQS SGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCDKTHTCPPCPAPELLG GPSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQY NSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRD ELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSR WQQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug|DB00087 Alemtuzumab - 1CE1:L CAMPATH-1H: Light Chain 1
DIQMTQSPSSLSASVGDRVTITCKASQNIDKYLNWYQQKPGKAPKLLIYNTNNLQTGVPS RFSGSGSGTDFTFTI SSLQPEDIATYYCLQHI SRPRTFGQGTKVEIKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNR
>drugbank_drug|DB00087 Alemtuzumab - 1CE1:L CAMPATH-1H: Light Chain 2
DIQMTQSPSSLSASVGDRVTITCKASQNIDKYLNWYQQKPGKAPKLLIYNTNNLQTGVPS RFSGSGSGTDFTFTI SSLQPEDIATYYCLQHI SRPRTFGQGTKVEIKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNR
>drugbank_drug|DB00113 Arcitumomab - 1clo: Anti-CEA antigen light chain 1
QTVLSQSPAILSASPGEKVTMTCRASSSVTYIHWYQQKPGSSPKSWIYATSNLASGVPAR FSGSGSGTSYSLTI SRVEAEDAATYYCQHWSSKPPTFGGGTKLEIKRADAAPTVS IFPPS SEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLTL TKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC
>drugbank_drug|DB00113 Arcitumomab - 1clo: Anti-CEA antigen light chain 2
QTVLSQSPAILSASPGEKVTMTCRASSSVTYIHWYQQKPGSSPKSWIYATSNLASGVPAR FSGSGSGTSYSLTI SRVEAEDAATYYCQHWSSKPPTFGGGTKLEIKRADAAPTVS IFPPS SEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLTL TKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC
>drugbank_drug|DB00113 Arcitumomab - 1clo: Anti-CEA heavy chain 1
EVKLVESGGGLVQPGGSLRLSCATSGFTFTDYYMNWVRQPPGKALEWLGFIGNKANGYTT EYSASVKGRFTI SRDKSQS ILYLQMNTLRAEDSATYYCTRDRGLRFYFDYWGQGTTLTVS SAKTTPPSVYPLAPGSAAQTNSMVTLGCLVKGYFPEPVTVTWNSGSLSSGVHTFPAVLQS DLYTLSSSVTVPSSPRPSETVTCNVAHPASSTKVDKKIVPRDCPPCKCPAPNLLGGPSVF IFPPKIKDVLMI SLSPIVTCVVVDVSEDDPDVQI SWFVNNVEVHTAQTQTHREDYNSTLR VVSALPIQHQDWMSGKEFKCKVNNKDLPAPIERTI SKPKGSVRAPQVYVLPPPEEEMTKK QVTLTCMVTDFMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVERN SYSCSVVHEGLHNHHTTKSFSR
>drugbank_drug|DB00113 Arcitumomab - 1clo: Anti-CEA heavy chain 2
EVKLVESGGGLVQPGGSLRLSCATSGFTFTDYYMNWVRQPPGKALEWLGFIGNKANGYTT EYSASVKGRFTI SRDKSQS ILYLQMNTLRAEDSATYYCTRDRGLRFYFDYWGQGTTLTVS SAKTTPPSVYPLAPGSAAQTNSMVTLGCLVKGYFPEPVTVTWNSGSLSSGVHTFPAVLQS DLYTLSSSVTVPSSPRPSETVTCNVAHPASSTKVDKKIVPRDCPPCKCPAPNLLGGPSVF IFPPKIKDVLMI SLSPIVTCVVVDVSEDDPDVOI SWFVNNVEVHTAOTOTHREDYNSTLR VVSALPIQHQDWMSGKEFKCKVNNKDLPAPIERTI SKPKGSVRAPQVYVLPPPEEEMTKK QVTLTCMVTDFMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVERN SYSCSVVHEGLHNHHTTKSFSR
>drugbank_drug|DB00103 Agalsidase beta - DB00103 sequence
LDNGLARTPTMGWLHWERFMCNLDCQEEPDSCI SEKLFMEMAELMVSEGWKDAGYEYLCI DDCWMAPQRDSEGRLQADPQRFPHGIRQLA YVHSKGLKLGIYADVGNKTCAGFPGSFGY YDIDAQTFADWGVDLLKFDGCYCDSLENLADGYKHMSLALNRTGRS IVYSCEWPLYMWPF QKPNYTEIRQYCNHWRNFADIDDSWKS IKS ILDWTSFNQERIVDVAGPGGWNDPDMLVIG NFGLSWNQQVTQMALWAIMAAPLFMSNDLRHISPQAKALLQDKDVIAINQDPLGKQGYQL RQGDNFEVWERPLSGLAWAVAMINRQEIGGPRSYTIAVASLGKGVACNPACFITQLLPVK RKLGFYEWTSRLRSHINPTGTVLLQLENTMQMSLKDLL
>drugbank_drug|DB00064 Serum albumin iodonated - DB00064 sequence
DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAE NCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEV DVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLP KLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTK VHTECCHGDLLECADDRADLAKYICENQDS I SSKLKECCEKPLLEKSHCIAEVENDEMPA DLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKC CAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVST PTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTES LVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKAT KEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL
>drugbank_drug|DB00043 Omalizumab - Anti IgE antibody VH domain chain 1
EVQLVESGGGLVQPGGSLRLSCAVSGYSITSGYSWNWIRQAPGKGLEWVASITYDGSTNY ADSVKGRFTI SRDDSKNTFYLQMNSLRAEDTAVYYCARGSHYFGHWHFAVWGQGTLVTVS SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEOYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDELTK NOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWOOG NVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug|DB00043 Omalizumab - Anti IgE antibody VH domain chain 2
EVQLVESGGGLVQPGGSLRLSCAVSGYSITSGYSWNWIRQAPGKGLEWVASITYDGSTNY ADSVKGRFTI SRDDSKNTFYLQMNSLRAEDTAVYYCARGSHYFGHWHFAVWGQGTLVTVS SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEOYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDELTK NQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQG NVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug|DB00043 Omalizumab - Anti IgE antibody VL domain chain 1
DIQLTQSPSSLSASVGDRVTITCRASQSVDYDGDSYMNWYQQKPGKAPKLLIYAASYLES GVPSRFSGSGSGTDFTLTI SSLQPEDFATYYCQQSHEDPYTFGQGTKVEIKRTVAAPSVF IFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLS STLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNR
>drugbank_drug|DB00043 Omalizumab - Anti IgE antibody VL domain chain 2
DIQLTQSPSSLSASVGDRVTITCRASQSVDYDGDSYMNWYQQKPGKAPKLLIYAASYLES GVPSRFSGSGSGTDFTLTI SSLQPEDFATYYCQQSHEDPYTFGQGTKVEIKRTVAAPSVF IFPPSDEQLKSGTASWCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLS STLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNR
>drugbank_drug|DB00100 Coagulation Factor IX - DB00100 sequence
YNSGKLEEFVQGNLERECMEEKCSFEEAREVFENTERTTEFWKQYVDGDQCESNPCLNGG SCKDDINSYECWCPFGFEGKNCELDVTCNIKNGRCEQFCKNSADNKVVCSCTEGYRLAEN QKSCEPAVPFPCGRVSVSQTSKLTRAEAVFPDVDYVNSTEAETILDNITQSTQSFNDFTR VVGGEDAKPGQFPWQVVLNGKVDAFCGGS IVNEKWIVTAAHCVETGVKITVVAGEH IEE TEHTEQKRNVIRI IPHHNYNAAINKYNHDIALLELDEPLVLNSYVTPICIADKEYTNIFL KFGSGYVSGWGRVFHKGRSALVLQYLRVPLVDRATCLRSTKFTIYNNMFCAGFHEGGRDS CQGDSGGPHVTEVEGTSFLTGI I SWGEECAMKGKYGIYTKVSRYVNWIKEKTKLT
>drugbank_drug|DB00046 Insulin Lyspro recombinant - A chain
GIVEQCCTSICSLYQLENYCN
>drugbank_drug|DB00046 Insulin Lyspro recombinant - B chain
FVNQHLCGSHLVEALYLVCGERGFFYTKPT
>drugbank_drug|DB00088 Alglucerase - DB00088 sequence
ARPCIPKSFGYSSWCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANH TGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIR VPMASCDFS IRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWT SPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGL LSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPE AAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRG MQYSHS I ITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPI IVDITKDTFYKQPMFYHL GHFSKFIPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFL ETI SPGYS IHTYLWRRQ
>drugbank_drug|DB00016 Epoetin alfa - DB00016 sequence
APPRLICDSRVLERYLLEAKEAENITTGCAEHCSLNENITVPDTKVNFYAWKRMEVGQQA VEVWQGLALLSEAVLRGQALLVNSSQPWEPLQLHVDKAVSGLRSLTTLLRALGAQKEAIS PPDAASAAPLRTITADTFRKLFRVYSNFLRGKLKLYTGEACRTGDR
>drugbank_drug|DB00057 Satumomab Pendetide - Heavy chain 1 B72.3 (murine)
QVQLQQSDAELVKPGASVKISCKASGYTFTDHAIHWAKQKPEQGLEWIGYISPGNDDIKY NEKFKGKATLTADKSSSTAYMQLNSLTSEDSAVYFCKRSYYGHWGQGTTLTVSSASTKGP SVFPLAPCSRSTSESTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLS SVVTVPSSSLGTKTYTCNVDHKPSNTKVDKRVCPPCKCPAPNLLGGPSVFIFPPKIKDVL MISLSPIVTCVVVDVSEDDPDVOISWFVNNVEVHTAOTOTHREDYNSTLRVVSALPIOHO DWMSGKEFKCKVNNKDLPAPIERTI SKPKGSVRAPQVYVLPPPEEEMTKKQVTLTCMVTD FMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVERNSYSCSVVHEG LHNHHTTKSFSR
>drugbank_drug|DB00057 Satumomab Pendetide - Heavy chain 2 B72.3 (murine)
QVQLQQSDAELVKPGASVKISCKASGYTFTDHAIHWAKQKPEQGLEWIGYISPGNDDIKY NEKFKGKATLTADKSSSTAYMQLNSLTSEDSAVYFCKRSYYGHWGQGTTLTVSSASTKGP SVFPLAPCSRSTSESTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLS SWTVPSSSLGTKTYTCNVDHKPSNTKVDKRVCPPCKCPAPNLLGGPSVFIFPPKIKDVL MI SLSPIVTCVVVDVSEDDPDVQI SWFVNNVEVHTAQTQTHREDYNSTLRVVSALPIQHQ DWMSGKEFKCKVNNKDLPAPIERTI SKPKGSVRAPQVYVLPPPEEEMTKKQVTLTCMVTD FMPEDIYVEWTNNGKTELNYKNTEPVLDSDGSYFMYSKLRVEKKNWVERNSYSCSVVHEG LHNHHTTKSFSR
>drugbank_drug|DB00057 Satumomab Pendetide - Light chain 1 B72.3 (murine)
DIQMTQSPASLSVSVGETVTITCRASENIYSNLAWYQQKQGKSPQLLVYAATNLADGVPS RFSGSGSGTQYSLKINSLQSEDFGSYYCQHFWGTPYTFGGGTRLEIKRADAAPTVFIFPP SDEQLKSGTASWCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFN
>drugbank_drug|DB00057 Satumomab Pendetide - Light chain 2 B72.3 (murine)
DIQMTQSPASLSVSVGETVTITCRASENIYSNLAWYQQKQGKSPQLLVYAATNLADGVPS RFSGSGSGTQYSLKINSLQSEDFGSYYCQHFWGTPYTFGGGTRLEIKRADAAPTVFIFPP SDEQLKSGTASWCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFN
>drugbank_drug|DB00045 OspA lipoprotein - DB00045 sequence
CKQNVSSLDEKNSVSVDLPGEMNVLVSKEKNKDGKYDLIATVDKLELKGTSDKNNGSGVL EGVKADKSKVKLTI SDDLGQTTLEVFKEDGKTLVSKKVTSKDKSSTEEKFNEKGEVSEKI ITRADGTRLEYTEIKSDGSGKAKEVLKSYVLEGTLTAEKTTLVVKEGTVTLSK I SKSGE VSVELNDTDSSAATKKTAAWNSGTSTLTITVNSKKTKDLVFTKENTITVQQYDSNGTKLE GSAVEITKLDEIKNALK
>drugbank_drug|DB00068 Interferon beta-1b - DB00068 sequence
MSYNLLGFLQRSSNFQSQKLLWQLNGRLEYCLKDRMNFDIPEEIKQLQQFQKEDAALTIY EMLQNIFAIFRQDSSSTGWNETIVENLLANVYHQINHLKTVLEEKLEKEDFTRGKLMSSL HLKRYYGRILHYLKAKEYSHCAWTIVRVEILRNFYFINRLTGYLRN
>drugbank_drug|DB00047 Insulin Glargine recombinant - A chain
GIVEQCCTSICSLYQLENYCG
>drugbank_drug|DB00047 Insulin Glargine recombinant - B chain
FVNQHLCGSHLVEALYLVCGERGFFYTPKTRR
>drugbank_drug|DB00092 Alefacept - DB00092 sequence
CFSQQIYGVVYGNVTFHVPSNVPLKEVLWKKQKDKVAELE SEFRAFSSFKNRVYLDTVS
GSLTIYNLTSSDEDEYEMESP ITDTMKFFLYVLESLPSPTLTCALTNGS IEVQCMIPEH
YNSHRGLIMYSWDCPMEQCKRNSTS IYFKMENDLPQKIQCTLSNPLFNTTSS I ILTTCIP
SSGHSRHRYALIPIPLAVITTCIVLYMNGILKCDRKPDRTNSNRVEPKSCDKTHTCPPCP
APELLGGPSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPQVKFNWYVDGVQVHNAKTK
PREOOY STYRVVSVLTVLHONWLDGKEYKCKVSNKALPAPIEKTI SKAKGOPREPOVYT
LPPSREEMTKNOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKL
TVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug|DB00011 Interferon alfa-n1 - DB00011 sequence
CDLPQTHSLGSRRTLMLLAQMRKI SLFSCLKDRHDFGFPQEEFGNQFQKAETIPVLHEMI QQIFNLFSTKDSSAAWDETLLDKFYTELYQQLNDLEACVIQGVGVTETPLMKEDS ILAVR KYFQRITLYLKEKKYSPCAWEWRAEIMRSFSLSTNLQESLRSKE
>drugbank_drug|DB00030 Insulin recombinant - A chain
GIVEQCCTSICSLYQLENYCN
>drugbank_drug|DB00030 Insulin recombinant - B chain
FVNQHLCGSHLVEALYLVCGERGFFYTPKT
>drugbank_drug|DB00023 Asparaginase - DB00023 sequence
QMSLQQELRYIEALSAIVETGQKMLEAGESALDWTEAVRLLEECPLFNAGIGAVFTRDE
THELDACVMDGNTLKAGAVAGVSHLRNPVLAARLVMEQSPHVMMIGEGAENFAFARGMER
VSPEIFSTSLRYEQLLAARKEGATVLDHSGAPLDEKQKMGTVGAVALDLDGNLAAATSTG
GMTNKLPGRVGDSPLVGAGCYANNASVAVSCTGTGEVFIRALAAYDIAALMDYGGLSLAE
ACERVVMEKLPALGGSGGLIAIDHEGNVALPFNTEGMYRAWGYAGDTPTTGIYREKGDTV ATQ
>drugbank_drug|DB00024 Thyrotropin Alfa - Alpha chain
APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCC
VAKSYNRVTVMGGFKVENHTACHCSTCYYHKS
>drugbank_drug|DB00024 Thyrotropin Alfa - Beta chain
NSCELTNITIAIEKEECRFCI S INTTWCAGYCYTRDLVYKDPARPKIQKTCTFKELVYET VRVPGCAHHADSLYTYPVATQCHCGKCDSDSTDCTVRGLGPSYCSFGEMKE
>drugbank_drug|DB00082 Pegvisomant - DB00082 sequence
FPTIPLSRLFDNAMLRAHRLHQLAFDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSES IPT PSNREETQQKSNLELLRI SLLLIQSWLEPVQFLRSVFANSLVYGASDSNVYDLLKDLEEG IQTLMGRLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVETFLRIV QCRSVEGSCGF
>drugbank_drug|DB00013 Urokinase - DB00013 sequence
KPSSPPEELKFQCGQKTLRPRFKI IGGEFTTIENQPWFAAIYRRHRGGSVTYVCGGSLMS PCWVI SATHCFIDYPKKEDYIVYLGRSRLNSNTQGEMKFEVENLILHKDYSADTLAHHND IALLKIRSKEGRCAQPSRTIQTICLPSMYNDPQFGTSCEITGFGKENSTDYLYPEQLKMT VVKLISHRECQQPHYYGSEVTTKMLCAADPQWKTDSCQGDSGGPLVCSLQGRMTLTGIVS WGRGCALKDKPGVYTRVSHFLPWIRSHTKEENGLAL
>drugbank_drug|DB00097 Choriogonadotropin alfa - Alpha chain
APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCC VAKSYNRVTVMGGFKVENHTACHCSTCYYHKS
>drugbank_drug|DB00097 Choriogonadotropin alfa - Beta chain
SKEPLRPRCRPINATLAVEKEGCPVCITVNTTICAGYCPTMTRVLQGVLPALPQVVCNYR DVRFES IRLPGCPRGVNPVVSYAVALSCQCALCRRSTTDCGGPKDHPLTCDDPRFQDSSS SKAPPPSLPSPSRLPGPSDTPILPQ
>drugbank_drug|DB00111 Daclizumab - Humanized Anti-CD25 Heavy Chain 1
QVQLVQSGAEVKKPGSSVKVSCKASGYTFTSYRMHWVRQAPGQGLEWIGYINPSTGYTEY NQKFKDKATITADESTNTAYMELSSLRSEDTAVYYCARGGGVFDYWGQGTTLTVSSGPSV FPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSV VTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSVFLFPP KPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQY STYRVVSV LTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDELTKNQVSL TCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWOOGNVFSC SVMHEALHNHYTQKSLSLSPGK
>drugbank_drug|DB00111 Daclizumab - Humanized Anti-CD25 Heavy Chain 2
QVQLVQSGAEVKKPGSSVKVSCKASGYTFTSYRMHWVRQAPGQGLEWIGYINPSTGYTEY NQKFKDKATITADESTNTAYMELSSLRSEDTAVYYCARGGGVFDYWGQGTTLTVSSGPSV FPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSV VTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSVFLFPP KPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEOY STYRVVSV LTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDELTKNQVSL TCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWOOGNVFSC SVMHEALHNHYTQKSLSLSPGK
>drugbank_drug|DB00111 Daclizumab - Humanized Anti-CD25 Light Chain 1
DIQMTQSPSTLSASVGDRVTITCSASSSISYMHWYQQKPGKAPKLLIYTTSNLASGVPAR FSGSGSGTEFTLTI SSLQPDDFATYYCHQRSTYPLTFGSGTKVEVKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR
>drugbank_drug|DB00111 Daclizumab - Humanized Anti-CD25 Light Chain 2
DIQMTQSPSTLSASVGDRVTITCSASSSISYMHWYQQKPGKAPKLLIYTTSNLASGVPAR FSGSGSGTEFTLTI SSLQPDDFATYYCHQRSTYPLTFGSGTKVEVKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR
>drugbank_drug|DB00034 Interferon Alfa-2a, Recombinant - DB00034 sequence
CDLPQTHSLGSRRTLMLLAQMRKI SLFSCLKDRHDFGFPQEEFGNQFQKAETIPVLHEMI QQIFNLFSTKDSSAAWDETLLDKFYTELYQQLNDLEACVIQGVGVTETPLMKEDS ILAVR KYFQRITLYLKEKKYSPCAWEVVRAEIMRSFSLSTNLQESLRSKE
>drugbank_drug|DB00096 Serum albumin - DB00096 sequence
DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAE
NCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEV
DVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLP
KLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTK
VHTECCHGDLLECADDRADLAKYICENQDS I SSKLKECCEKPLLEKSHCIAEVENDEMPA
DLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKC
CAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVST
PTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTES
LVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKAT
KEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL
>drugbank_drug|DB00001 Lepirudin - DB00001 sequence
LVYTDCTESGQNLCLCEGSNVCGQGNKCILGSDGEKNQCVTGEGTPKPQSHNDGDFEEIP
EEYLQ
>drugbank_drug|DB00044 Lutropin alfa - AlphaChain
APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCC
VAKSYNRVTVMGGFKVENHTACHCSTCYYHKS
>drugbank_drug|DB00044 Lutropin alfa - BetaChain (LH)
SREPLRPWCHPINAILAVEKEGCPVCITVNTTICAGYCPTMMRVLQAVLPPLPQVVCTYR DVRFES IRLPGCPRGVDPVVSFPVALSCRCGPCRRSTSDCGGPKDHPLTCDHPQLSGLLF L
>drugbank_drug|DB00070 Hyaluronidase - DB00070 sequence
LNFRAPPVIPNVPFLWAWNAPSEFCLGKFDEPLDMSLFSFIGSPRINATGQGVTIFYVDR LGYYPYIDS ITGVTVNGGIPQKI SLQDHLDKAKKDITFYMPVDNLGMAVIDWEEWRPTWA RNWKPKDVYKNRS IELVQQQNVQLSLTEATEKAKQEFEKAGKDFLVETIKLGKLLRPNHL WGYYLFPDCYNHHYKKPGYNGSCFNVEIKRNDDLSWLWNESTALYPS IYLNTQQSPVAAT LYVRNRVREAIRVSKIPDAKSPLPVFAYTRIVFTDQVLKFLSQDELVYTFGETVALGASG IVIWGTLS IMRSMKSCLLLDNYMETILNPYI INVTLAAKMCSQVLCQEQGVCIRKNWNSS DYLHLNPDNFAIQLEKGGKFTVRGKPTLEDLEQFSEKFYCSCYSTLSCKEKADVKDTDAV DVCIADGVCIDAFLKPPMETEEPQIFYNASPSTLSATMFIVS ILFLI I SSVASL
>drugbank_drug|DB00031 Tenecteplase - DB00031 sequence
SYQVICRDEKTQMIYQQHQSWLRPVLRSNRVEYCWCNSGRAQCHSVPVKSCSEPRCFNGG
TCQQALYFSDFVCQCPEGFAGKCCEIDTRATCYEDQGI SYRGNWSTAESGAECTQWNSSA
LAQKPYSGRRPDAIRLGLGNHNYCRNPDRDSKPWCYVFKAGKYSSEFCSTPACSEGNSDC
YFGNGSAYRGTHSLTESGASCLPWNSMILIGKVYTAQNPSAQALGLGKHNYCRNPDGDAK
PWCHVLKNRRLTWEYCDVPSCSTCGLRQYSQPQFRIKGGLFADIASHPWQAAAAAKHRRS
PGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVH
KEFDDDTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEAL
SPFYSERLKEAHVRLYPSSRCTSQHLLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGG
PLVCLNDGRMTLVGI I SWGLGCGQKDVPGVYTKVTNYLDWIRDNMRP
>drugbank_drug|DB00076 Digoxin Immune Fab - 26-10 Heavy chain (murine)
EVQLQQSGPELVKPGASVRMSCKSSGYIFTDFYMNWVRQSHGKSLDYIGYISPYSGVTGY NQKFKGKATLTVDKSSSTAYMELRSLTSEDSAVYYCAGSSGNKWAMDYWGHGASVTVSSA KTTAPSVYPLAPVCGDTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDL YTLSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEP
>drugbank_drug|DB00076 Digoxin Immune Fab - 26-10 Light chain (murine)
DVVMTQTPLSLPVSLGDQAS I SCRSSQSLVHSNGNTYLNWYLQKAGQSPKLLIYKVSNRF SGVPDRFSGSGSGTDFTLKI SRVEAEDLGIYFCSQTTHVPPTFGGGTKLEIKRADAAPTV SIFPPSSEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSM SSTLTLTKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC
>drugbank_drug|DB04900 Thymalfasin - Thymalfasin
SDAAVDTSSEITTKDLKEKKEVVEEAEN
>drugbank_drug I DB00061 Pegademase bovine - DB00061 sequence AQTPAFNKPKVELHVHLDGAIKPETILYYGRKRGIALPADTPEELQ I IGMDKPLSLPEF LAKFDYYMPAIAGCREAVKRIAYEFVEMKAKDGVVYVEVRYSPHLLANSKVEPIPWNQAE GDLTPDEVVSLVNQGLQEGERDFGVKVRS ILCCMRHQPSWSSEVVELCKKYREQTVVAID LAGDETIEGSSLFPGHVKAYAEAVKSGVHRTVHAGEVGSANVVKEAVDTLKTERLGHGYH TLEDATLYNRLRQENMHFEVCPWSSYLTGAWKPDTEHPVVRFKNDQVNYSLNTDDPLIFK STLDTDYQMTKNEMGFTEEEFKRLNINAAKSSFLPEDEKKELLDLLYKAYGMPSPASAEQ CL
>drugbank_drug I DB00004 Denileukin diftitox - DB00004 sequence MGADDVVDSSKSFVMENFSSYHGTKPGYVDS IQKGIQKPKSGTQGNYDDDWKGFYSTDNK YDAAGYSVDNENPLSGKAGGVVKVTYPGLTKVLALKVDNAETIKKELGLSLTEPLMEQVG TEEFIKRFGDGASRVVLSLPFAEGSSSVEYINNWEQAKALSVELEINFETRGKRGQDAMY EYMAQACAGNRVRRSVGSSLSCINLDWDVIRDKTKTKIESLKEHGPIKNKMSESPNKTVS EEKAKQYLEEFHQTALEHPELSELKTVTGTNPVFAGANYAAWAVNVAQVIDSETADNLEK TTAALS ILPGIGSVMGIADGAVHHNTEEIVAQS IALSSLMVAQAIPLVGELVDIGFAAYN FVES I INLFQVVHNSYNRPAYSPGHKTHAPTSSSTKKTQLQLEHLLLDLQMILNGINNYK NPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLI S I VI VLELKGSETTFMCEYADETATIVEFLNRWITFCQS I I STLT
>drugbank_drug I DBO 0042 Botulinum Toxin Type B - Botulinum neurotoxin type B - Clostridium botulinum MPVTI NFNYNDPIDNN I IMMEPPFARGTGRYYKAFKITDRIWI IPERYTFGYKPEDFN KSSGIFNRDVCEYYDPDYLNTNDKK IFLQTMIKLFNRIKSKPLGEKLLEMI I GIPYLG DRRVPLEEFNTNIASVTVNKLI SNPGEVERKKGIFANLI IFGPGPVLNENETIDIGIQNH FASREGFGGIMQMKFCPEYVSVFNNVQENKGAS IFNRRGYFSDPALILMHELIHVLHGLY GIKVDDLPIVPNEKKFFMQSTDAIQAEELYTFGGQDPS I ITPSTDKS IYDKVLQNFRGIV DRLNKVLVCI SDP I I IYKNKFKDKYKFVEDSEGKYS IDVESFDKLYKSLMFGFTETN IAENYKIKTRASYFSDSLPPVKIKNLLDNEIYTIEEGF I SDKDMEKEYRGQNKAINKQA YEEI SKEHLAVYKIQMCKSVKAPGICIDVDNEDLFFIADK SFSDDLSKNERIEYNTQSN YIENDFPINELILDTDLISKIELPSENTESLTDFNVDVPVYEKQPAIKKIFTDENTIFQY LYSQTFPLDIRDI SLTSSFDDALLFSNKVYSFFSMDYIKTANKVVEAGLFAGWVKQIVND FVIEANKSNTMDKIADISLIVPYIGLALNVGNETAKGNFENAFEIAGASILLEFIPELLI PVVGAFLLESYIDNKNKI IKTIDNALTKRNEKWSDMYGLIVAQWLSTVNTQFYTIKEGMY KALNYQAQALEEI IKYRY IYSEKEKS I IDFNDINSKLNEGINQAID INNFINGCSV SYLMKKMIPLAVEKLLDFDNTLKKNLLNYIDENKLYLIGSAEYEKSKVNKYLKTIMPFDL SIYTNDTILIEMFNKYNSEILNN11LNLRYKDNNLIDLSGYGAKVEVYDGVELNDKNQFK LTSSANSKIRVTQNQ I IFNSVFLDFSVSFWIRIPKYKNDGIQNYIHNEYTI INCMKNNS GWKI S IRGNRI IWTLIDINGKTKSVFFEY IREDI SEYINRWFFVTITNNLNNAKIYING KLESNTDIKDIREVIANGEI IFKLDGDIDRTQFIWMKYFS IFNTELSQS IEERYKIQSY SEYLKDFWGNPLMYNKEYYMFNAGNKNSYIKLKKDSPVGEILTRSKYNQNSKYINYRDLY IGEKFI IRRKSNSQS INDDIVRKEDYIYLDFFNLNQEWRVYTYKYFKKEEEKLFLAPI SD SDEFYNTIQIKEYDEQPTYSCQLLFKKDEESTDEIGLIGIHRFYESGIVFEEYKDYFCIS KWYLKEVKRKPYNLKLGCNWQFIPKDEGWTE
>drugbank_drug|DB04899 Nesiritide - DB04899: Natriuretic peptides B
SPKMVQGSGCFGRKMDRI SSSSGLGCKVLRRH
>drugbank_drug|DB00107 Oxytocin - DB00107 sequence
CYIQNCPLG
>drugbank_drug I DB00003 Dornase Alfa - DB00003 sequence LKIAAFNIQTFGETKMSNATLVSYIVQILSRYDIALVQEVRDSHLTAVGKLLDNLNQDAP DTYHYVVSEPLGRNSYKERYLFVYRPDQVSAVDSYYYDDGCEPCGNDTFNREPAIVRFFS RFTEVREFAIVPLHAAPGDAVAEIDALYDVYLDVQEKWGLEDVMLMGDFNAGCSYVRPSQ WSS IRLWTSPTFQWLIPDSADTTATPTHCAYDRIVVAGMLLRGAVVPDSALPFNFQAAYG LSDQLAQAI SDHYPVEVMLK
>drugbank_drug I DB00002 Cetuximab - Anti-EGFR heavy chain 1 QVQLKQSGPGLVQPSQSLSITCTVSGFSLTNYGVHWVRQSPGKGLEWLGVIWSGGNTDYN TPFTSRLS INKD SKSQVFFKM SLQSNDTAIYYCARALTYYDYEFAYWGQGTLVTVSAA STKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSG LYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPKSPKSCDKTHTCPPCPAPELL GGPSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQ Y STYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSR DELTKNOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKS RWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug I DB00002 Cetuximab - Anti-EGFR heavy chain 2 QVQLKQSGPGLVQPSQSLSITCTVSGFSLTNYGVHWVRQSPGKGLEWLGVIWSGGNTDYN TPFTSRLS INKDNSKSQVFFKMNSLQSNDTAIYYCARALTYYDYEFAYWGQGTLVTVSAA STKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSG LYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPKSPKSCDKTHTCPPCPAPELL GGPSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQ YNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSR DELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKS RWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug I DB00002 Cetuximab - Anti-EGFR light chain 1 DILLTQSPVILSVSPGERVSFSCRASQSIGTNIHWYQQRTNGSPRLLIKYASESISGIPS RFSGSGSGTDFTLS INSVESEDIADYYCQQNNNWPTTFGAGTKLELKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNRGA
>drugbank_drug I DB00002 Cetuximab - Anti-EGFR light chain 2 DILLTQSPVILSVSPGERVSFSCRASQSIGTNIHWYQQRTNGSPRLLIKYASESISGIPS RFSGSGSGTDFTLS INSVESEDIADYYCQQNNNWPTTFGAGTKLELKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNRGA
>drugbank_drug I DB00019 Pegfilgrastim - DB00019 sequence MTPLGPASSLPQSFLLKCLEQVRKIQGDGAALQEKLCATYKLCHPEELVLLGHSLGIPWA PLSSCPSQALQLAGCLSQLHSGLFLYQGLLQALEGISPELGPTLDTLQLDVADFATTIWQ QMEELGMAPALQPTQGAMPAFASAFQRRAGGVLVASHLQSFLEVSYRVLRHLAQP
>drugbank_drug|DB00036 Coagulation factor VIIa - DB00036 sequence
A AFLEELRPGSLERECKEEQCSFEEAREIFKDAERTKLFWI SYSDGDQCASSPCQNGGS CKDQLQSYICFCLPAFEGRNCETHKDDQLICVNENGGCEQYCSDHTGTKRSCRCHEGYSL LADGVSCTPTVEYPCGKIPILEKRNASKPQGRIVGGKVCPKGECPWQVLLLVNGAQLCGG TLINTIWVVSAAHCFDKIKNWRNLIAVLGEHDLSEHDGDEQSRRVAQVI IPSTYVPGTTN HDIALLRLHQPVVLTDHVVPLCLPERTFSERTLAFVRFSLVSGWGQLLDRGATALELMVL NVPRLMTQDCLQQSRKVGDSPNITEYMFCAGYSDGSKDSCKGDSGGPHATHYRGTWYLTG IVSWGQGCATVGHFGVYTRVSQYIEWLQKLMRSEPRPGVLLRAPFP
>drugbank_drug|DB00081 Tositumomab - Mouse-Human chimeric Anti-CD20 Heavy Chain 1
QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEOYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDELTK NQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQG NVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug|DB00081 Tositumomab - Mouse-Human chimeric Anti-CD20 Heavy Chain 2
QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTY RWSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDELTK NOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWOOG NVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug|DB00081 Tositumomab - Mouse-Human chimeric Anti-CD20 Light Chain 1
QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTI SRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR
>drugbank_drug|DB00081 Tositumomab - Mouse-Human chimeric Anti-CD20 Light Chain 2
QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTI SRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR
>drugbank_drug|DB00062 Human Serum Albumin - DB00062 sequence
DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAE
NCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEV
DVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLP
KLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTK
VHTECCHGDLLECADDRADLAKYICENQDS I SSKLKECCEKPLLEKSHCIAEVENDEMPA
DLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKC
CAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVST
PTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTES
LVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKAT
KEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL
>drugbank_drug I DB01277 Mecasermin - Mecasermin
GPEILCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMY CAPLKPAKSA
>drugbank_drug I DB00041 Aldesleukin - DB00041 sequence
PTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEE ELKPLEEVLNLAQSKNFHLRPRDLI SNINVIVLELKGSETTFMCEYADETATIVEFLNRW ITFAQS I I STLT
>drugbank_drug | DB00029 Anistreplase - DB00029 sequence
SYQVICRDEKTQMIYQQHQSWLRPVLRSNRVEYCWCNSGRAQCHSVPVKSCSEPRCFNGG
TCQQALYFSDFVCQCPEGFAGKCCEIDTRATCYEDQGI SYRGTWSTAESGAECTNWNSSA
LAQKPYSGRRPDAIRLGLGNHNYCRNPDRDSKPWCYVFKAGKYSSEFCSTPACSEGNSDC
YFGNGSAYRGTHSLTESGASCLPWNSMILIGKVYTAQNPSAQALGLGKHNYCRNPDGDAK
PWCHVLKNRRLTWEYCDVPSCSTCGLRQYSQPQFRIKGGLFADIASHPWQAAIFAKHRRS
PGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVH
KEFDDDTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEAL
SPFYSERLKEAHVRLYPSSRCTSQHLLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGG
PLVCLNDGRMTLVGI I SWGLGCGQKDVPGVYTKVTNYLDWIRDNMRP
>drugbank_drug I DB00071 Insulin, porcine - A chain
GIVEQCCTSICSLYQLENYCN
>drugbank_drug I DB00071 Insulin, porcine - B chain
FVNQHLCGSHLVEALYLVCGERGFFYTPKT
>drugbank_drug I DB00025 Antihemophilic Factor - DB00025 sequence
ATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFNTSWYKKTLFVEFTDHLFN
IAKPRPPWMGLLGPTIQAEVYDTWITLKNMASHPVSLHAVGVSYWKASEGAEYDDQTSQ
REKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSHVDLVKDLNSGLIGALLVCR
EGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRDAASARAWPKMHTVNGYVNR
SLPGLIGCHRKSVYWHVIGMGTTPEVHS IFLEGHTFLVRNHRQASLEI SPITFLTAQTLL
MDLGQFLLFCHI SSHQHDGMEAYVKVDSCPEEPQLRMKNNEEAEDYDDDLTDSEMDVVRF
DDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLAPDDRSYKSQYLNNGPQRIG
RKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTLLI IFKNQASRPY IYPHGI
TDVRPLYSRRLPKGVKHLKDFPILPGEIFKYKWTVTVEDGPTKSDPRCLTRYYSSFVNME
RDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDENRSWYLTENIQRFLPNPAG
VQLEDPEFQASNIMHS INGYVFDSLQLSVCLHEVAYWYILS IGAQTDFLSVFFSGYTFKH
KMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFRNRGMTALLKVSSCDKNTGDYYE
DSYEDI SAYLLSKNNAIEPRSFSQNSRHPSTRQKQFNATTIPENDIEKTDPWFAHRTPMP
KIQNVSSSDLLMLLRQSPTPHGLSLSDLQEAKYETFSDDPSPGAIDSNNSLSEMTHFRPQ
LHHSGDMVFTPESGLQLRLNEKLGTTAATELKKLDFKVSSTSNNLI STIPSDNLAAGTDN
TSSLGPPSMPVHYDSQLDTTLFGKKSSPLTESGGPLSLSEENNDSKLLESGLMNSQESSW
GKNVSSTESGRLFKGKRAHGPALLTKDNALFKVS I SLLKTNKTSNNSATNRKTHIDGPSL
LIENSPSVWQNILESDTEFKKVTPLIHDRMLMDKNATALRLNHMSNKTTSSKNMEMVQQK
KEGPIPPDAQNPDMSFFKMLFLPESARWIQRTHGKNSLNSGQGPSPKQLVSLGPEKSVEG
QNFLSEKNKVVVGKGEFTKDVGLKEMVFPSSRNLFLTNLDNLHENNTHNQEKKIQEEIEK
KETLIQENVVLPQIHTVTGTKNFMKNLFLLSTRQNVEGSYDGAYAPVLQDFRSLNDSTNR
TKKHTAHFSKKGEEENLEGLGNQTKQIVEKYACTTRI SPNTSQQNFVTQRSKRALKQFRL
PLEETELEKRI IVDDTSTQWSKNMKHLTPSTLTQIDYNEKEKGAITQSPLSDCLTRSHS I
PQANRSPLPIAKVSSFPSIRPIYLTRVLFQDNSSHLPAASYRKKDSGVQESSHFLQGAKK
NNLSLAILTLEMTGDQREVGSLGTSATNSVTYKKVENTVLPKPDLPKTSGKVELLPKVHI
YQKDLFPTETSNGSPGHLDLVEGSLLQGTEGAIKWNEANRPGKVPFLRVATESSAKTPSK
LLDPLAWDNHYGTQIPKEEWKSQEKSPEKTAFKKKDTILSLNACESNHAIAAINEGQNKP
EIEVTWAKQGRTERLCSQNPPVLKRHQREITRTTLQSDQEEIDYDDTISVEMKKEDFDIY
DEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSVPQFKKVVFQEFTD
GSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYSSLI SYEEDQRQGA
EPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDVHSGLIGPLLVCHT
NTLNPAHGRQVTVQEFALFFTIFDETKSWYFTENMERNCRAPCNIQMEDPTFKENYRFHA
INGYIMDTLPGLVMAQDQRIRWYLLSMGSNE IHS IHFSGHVFTVRKKEEYKMALYNLYP
GVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGMASGHIRDFQITAS
GQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMI IHGIKTQGARQKFSSLYISQ
FI IMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKH IFNPPI IARYIRLHPTHYS IRS
TLRMELMGCDLNSCSMPLGMESKAI SDAQITASSYFTNMFATWSPSKARLHLQGRSNAWR
PQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQDGHQWTLFFQNGKV
KVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCEAQDLY
>drugbank_drug I DB00059 Pegaspargase - DB00059 sequence
QMSLQQELRYIEALSAIVETGQKMLEAGESALDVVTEAVRLLEECPLFNAGIGAVFTRDE
THELDACVMDGNTLKAGAVAGVSHLRNPVLAARLVMEQSPHVMMIGEGAENFAFARGMER
VSPEIFSTSLRYEQLLAARKEGATVLDHSGAPLDEKQKMGTVGAVALDLDGNLAAATSTG
GMTNKLPGRVGDSPLVGAGCYANNASVAVSCTGTGEVFIRALAAYDIAALMDYGGLSLAE
ACERVVMEKLPALGGSGGLIAIDHEGNVALPFNTEGMYRAWGYAGDTPTTGIYREKGDTV ATQ
>drugbank_drug I DB00009 Alteplase - DB00009 sequence
SYQVICRDEKTQMIYQQHQSWLRPVLRSNRVEYCWCNSGRAQCHSVPVKSCSEPRCFNGG TCQQALYFSDFVCQCPEGFAGKCCEIDTRATCYEDQGI SYRGTWSTAESGAECTNWNSSA LAQKPYSGRRPDAIRLGLGNHNYCRNPDRDSKPWCYVFKAGKYSSEFCSTPACSEGNSDC YFGNGSAYRGTHSLTESGASCLPWNSMILIGKVYTAQNPSAQALGLGKHNYCRNPDGDAK PWCHVLKNRRLTWEYCDVPSCSTCGLRQYSQPQFRIKGGLFADIASHPWQAAIFAKHRRS PGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVH KEFDDDTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEAL SPFYSERLKEAHVRLYPSSRCTSQHLLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGG PLVCLNDGRMTLVGI I SWGLGCGQKDVPGVYTKVTNYLDWIRDNMRP
>drugbank_drug I DB06692 Aprotinin - Aprotinin (bovine pancreatic trypsin inhibitor)
RPDFCLEPPYTGPCKARI IRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA
>drugbank_drug I DB00039 Palifermin - DB00039 sequence
YDYMEGGDIRVRRLFCRTQWYLRIDKRGKVKGTQEMKNNYNIMEIRTVAVGIVAIKGVES EFYLAMNKEGKLYAKKECNEDCNFKELILENHYNTYASAKWTHNGGEMFVALNQKGIPVR GKKTKKEQKTAHFLPMAIT
>drugbank_drug I DBO 0072 Trastuzumab - Anti-HER2 Heavy chain 1 EVQLVESGGGLVQPGGSLRLSCAASGF IKDTYIHWVRQAPGKGLEWVARIYPTNGYTRY ADSVKGRFTI SADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS ASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSS GLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPPKSCDKTHTCPPCPAPELLG GPSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEOY NSTYRVVSVLTVLHODWLNGKEYKCKVSNKALPAPIEKTI SKAKGOPREPOVYTLPPSRD ELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSR WQQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug I DBO 0072 Trastuzumab - Anti-HER2 Heavy chain 2 EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRY ADSVKGRFTI SADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS ASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSS GLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPPKSCDKTHTCPPCPAPELLG GPSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEOY NSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRD ELTKNOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSR WQQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug I DBO 0072 Trastuzumab - Anti-HER2 Light chain 1 DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPS RFSGSRSGTDFTLTI SSLQPEDFATYYCQQHYTTPPTFGQGTKVEIKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC
>drugbank_drug I DBO 0072 Trastuzumab - Anti-HER2 Light chain 2 DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPS RFSGSRSGTDFTLTI SSLQPEDFATYYCQQHYTTPPTFGQGTKVEIKRTVAAPSVFIFPP SDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLT LSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC
>drugbank_drug | DBO 0075 Muromonab - 1SY6:H OKT3 Heavy Chain 1 QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNY NQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSA KTTAPSVYPLAPVCGGTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDL YTLSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRPKSCDKTHTCPPCPAPELLGG PSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYN STYRVVSVLTVLHODWLNGKEYKCKVSNKALPAPIEKTI SKAKGOPREPOVYTLPPSRDE LTKNOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRW QQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug | DB00075 Muromonab - 1SY6:H OKT3 Heavy Chain 2 QVQLQQSGAELARPGASVKMSCKASGYTFTRYTMHWVKQRPGQGLEWIGYINPSRGYTNY NQKFKDKATLTTDKSSSTAYMQLSSLTSEDSAVYYCARYYDDHYCLDYWGQGTTLTVSSA KTTAPSVYPLAPVCGGTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDL YTLSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRPKSCDKTHTCPPCPAPELLGG PSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEOYN STYRVVSVLTVLHODWLNGKEYKCKVSNKALPAPIEKTI SKAKGOPREPOVYTLPPSRDE LTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRW QQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug | DB00075 Muromonab - 1SY6:L OKT3 Light Chain 1 QIVLTQSPAIMSASPGEKVTMTCSASSSVSYMNWYQQKSGTSPKRWIYDTSKLASGVPAH FRGSGSGTSYSLTI SGMEAEDAATYYCQQWSSNPFTFGSGTKLEINRADTAPTVS IFPPS SEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLTL TKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC
>drugbank_drug I DB00075 Muromonab - 1SY6:L OKT3 Light Chain 2 QIVLTQSPAIMSASPGEKVTMTCSASSSVSYMNWYQQKSGTSPKRWIYDTSKLASGVPAH FRGSGSGTSYSLTI SGMEAEDAATYYCQQWSSNPFTFGSGTKLEINRADTAPTVS IFPPS SEQLTSGGASWCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLTL TKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC
>drugbank_drug I DBO 0008 Peginterferon alfa-2a - DB00008 sequence CDLPQTHSLGSRRTLMLLAQMRKI SLFSCLKDRHDFGFPQEEFGNQFQKAETIPVLHEMI QQIFNLFSTKDSSAAWDETLLDKFYTELYQQLNDLEACVIQGVGVTETPLMKEDS ILAVR KYFQRITLYLKEKKYSPCAWEWRAEIMRSFSLSTNLQESLRSKE
>drugbank_drug I DB06285 Teriparatide - Parathyroid hormone precursor - Homo sapiens (1-34)
SVSEIQLMHNLGKHLNSMERVEWLRKKLQDVHNF
>drugbank_drug I DB00054 Abciximab - 1TXV:H ReoPro-like antibody Heavy Chain 1
EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYVHWVKQRPEQGLEWIGRIDPANGYTKY DPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCVRPLYDYYAMDYWGQGTSVTVSSA KTTAPSVYPLAPVCGDTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDL YTLSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRPKSCDKTHTCPPCPAPELLGG PSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYN STYRVVSVLTVLHODWLNGKEYKCKVSNKALPAPIEKTI SKAKGOPREPOVYTLPPSRDE LTKNOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRW QQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug I DB00054 Abciximab - 1TXV:H ReoPro-like antibody Heavy Chain 2
EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYVHWVKQRPEQGLEWIGRIDPANGYTKY DPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCVRPLYDYYAMDYWGQGTSVTVSSA KTTAPSVYPLAPVCGDTTGSSVTLGCLVKGYFPEPVTLTWNSGSLSSGVHTFPAVLQSDL YTLSSSVTVTSSTWPSQSITCNVAHPASSTKVDKKIEPRPKSCDKTHTCPPCPAPELLGG PSVFLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEOYN STYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDE LTKNOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRW QQGNVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug I DB00054 Abciximab - 1TXV:L ReoPro-like antibody Light Chain 1
DILMTQSPSSMSVSLGDTVSITCHASQGISSNIGWLQQKPGKSFMGLIYYGTNLVDGVPS RFSGSGSGADYSLTI SSLDSEDFADYYCVQYAQLPYTFGGGTKLEIKRADAAPTVS IFPP SSEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLT LTKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC
>drugbank_drug I DB00054 Abciximab - 1TXV:L ReoPro-like antibody Light Chain 2
DILMTQSPSSMSVSLGDTVSITCHASQGISSNIGWLQQKPGKSFMGLIYYGTNLVDGVPS RFSGSGSGADYSLTI SSLDSEDFADYYCVQYAQLPYTFGGGTKLEIKRADAAPTVS IFPP SSEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLT LTKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC
>drugbank_drug I DB00094 Urofollitropin - Alpha chain
APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCC
VAKSYNRVTVMGGFKVENHTACHCSTCYYHKS
>drugbank_drug I DB00094 Urofollitropin - Beta chain
NSCELTNITIAIEKEECRFCI S INTTWCAGYCYTRDLVYKDPARPKIQKTCTFKELVYET VRVPGCAHHADSLYTYPVATQCHCGKCDSDSTDCTVRGLGPSYCSFGEMKE
>drugbank_drug | DB00033 Interferon gamma-1b - DB00033 sequence
CYCQDPYVKEAENLKKYFNAGHSDVADNGTLFLGILKNWKEESDRKIMQSQIVSFYFKLF KNFKDDQS IQKSVETIKEDMNVKFFNSNKKKRDDFEKLTNYSVTDLNVQRKAIHELIQVM AELSPAAKTGKRKRSQMLFRGRRASQ
>drugbank_drug | DB00026 Anakinra - DB00026 sequence
MRPSGRKSSKMQAFRIWDVNQKTFYLRNNQLVAGYLQGPNVNLEEKIDVVPIEPHALFLG IHGGKMCLSCVKSGDETRLQLEAVNITDLSENRKQDKRFAFIRSDSGPTTSFESAACPGW FLCTAMEADQPVSLTNMPDEGVMVTKFYFQEDE
>drugbank_drug I DB00021 Secretin - DB00021 sequence
HSDGTFTSELSRLRDSARLQRLLQGLV
>drugbank_drug I DB00006 Bivalirudin - DB00006 sequence
FPRPGGGGNGDFEEIPEEYL
>drugbank_drug | DB01285 Corticotropin - ACTH(1-39)
SYSMEHFRWGKPVGKKRRPVKVYPDGAEDQLAEAFPLEF
>drugbank_drug I DB00074 Basiliximab - 1MIM:H Anti-CD25 antibody heavy CHIMERIC chain 1
QLQQSGTVLARPGASVKMSCKASGYSFTRYWMHWIKQRPGQGLEWIGAIYPGNSDTSYNQ KFEGKAKLTAVTSASTAYMELSSLTHEDSAVYYCSRDYGYYFDFWGQGTTLTVSSASTKG PSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSL SSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPPKSCDKTHTCPPCPAPELLGGPSVF LFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQY STYR VVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDELTKN OVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWOOGN VFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug I DB00074 Basiliximab - 1MIM:H Anti-CD25 antibody heavy CHIMERIC chain 2
QLQQSGTVLARPGASVKMSCKASGYSFTRYWMHWIKQRPGQGLEWIGAIYPGNSDTSYNQ KFEGKAKLTAVTSASTAYMELSSLTHEDSAVYYCSRDYGYYFDFWGQGTTLTVSSASTKG PSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSL SSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPPKSCDKTHTCPPCPAPELLGGPSVF LFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEOY STYR VVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDELTKN OVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWOOGN VFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug I DB00074 Basiliximab - 1MIM:L Anti-CD25 antibody light CHIMERIC chain 1
QIVSTQSPAIMSASPGEKVTMTCSASSSRSYMQWYQQKPGTSPKRWIYDTSKLASGVPAR FSGSGSGTSYSLTISSMEAEDAATYYCHQRSSYTFGGGTKLEIKRTVAAPSVFIFPPSDE QLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSK ADYEKHKVYACEVTHQGLSSPVTKSFNRGE
>drugbank_drug I DB00074 Basiliximab - 1MIM:L Anti-CD25 antibody light CHIMERIC chain 2
QIVSTQSPAIMSASPGEKVTMTCSASSSRSYMQWYQQKPGTSPKRWIYDTSKLASGVPAR FSGSGSGTSYSLTISSMEAEDAATYYCHQRSSYTFGGGTKLEIKRTVAAPSVFIFPPSDE QLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSK ADYEKHKVYACEVTHQGLSSPVTKSFNRGE
>drugbank_drug I DB01276 Exenatide - Exenatide
HGEGTFTSDLSKQMEEEAVRLFIEWLKNGGPSSGAPPPS
>drugbank_drug | DB00073 Rituximab - Mouse-Human chimeric Anti-CD20 Heavy Chain 1
QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDELTK NOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWOOG NVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug | DB00073 Rituximab - Mouse-Human chimeric Anti-CD20 Heavy Chain 2
QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQGLEWIGAIYPGNGDTSY NQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARVVYYSNSYWYFDVWGTGTTVTV SGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLY SLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSV FLFPPKPKDTLMI SRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEOYNSTY RVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTI SKAKGQPREPQVYTLPPSRDELTK NOVSLTCLVKGFYPSDIAVEWESNGOPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWOOG NVFSCSVMHEALHNHYTQKSLSLSPGK
>drugbank_drug | DB00073 Rituximab - Mouse-Human chimeric Anti-CD20 Light Chain 1
QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTI SRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRTVAAPSVFIFPPS DEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR
>drugbank_drug | DB00073 Rituximab - Mouse-Human chimeric Anti-CD20 Light Chain 2
QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKPWIYAPSNLASGVPAR FSGSGSGTSYSLTI SRVEAEDAATYYCQQWSFNPPTFGAGTKLELKRTVAAPSVFIFPPS DEQLKSGTASWCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTL SKADYEKHKVYACEVTHQGLSSPVTKSFNR
>drugbank_drug I DB00055 Drotrecogin alfa - Heavy chain
LIDGKMTRRGDSPWQVVLLDSKKKLACGAVLIHPSWVLTAAHCMDESKKLLVRLGEYDLR
RWEKWELDLDIKEVFVHPNYSKSTTDNDIALLHLAQPATLSQTIVPICLPDSGLAERELN
QAGQETLVTGWGYHSSREKEAKRNRTFVLNFIKIPVVPHNECSEVMSNMVSENMLCAGIL
GDRQDACEGDSGGPMVASFHGTWFLVGLVSWGEGCGLLHNYGVYTKVSRYLDWIHGHIRD
KEAPQKSWAP
>drugbank_drug I DB00055 Drotrecogin alfa - Light chain
SKHVDGDQCLVLPLEHPCASLCCGHGTCIXGIGSFSCDCRSGWEGRFCQREVSFLNCSLD
NGGCTHYCLEEVGWRRCSCAPGYKLGDDLLQCHPAVKFPCGRPWKRMEKKRSHL
SEQUENCES OF NON-ANTIBODY POLYPEPTIDE FOR PREPARATION
OF ENHANCED SEQUONS
SEQUENCE PORTIONS TO BE REVISED ARE UNDERLINED
Alpha chain of Follitropin beta:
1 71
APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCCVAKSYNRVTVM
72 92
GGFKVENHTACHCSTCYYHKS
Beta chain of Follitropin beta:
1 71
NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKTCTFKELVYETVRVPGCAHHAD
72 111
SLYTYPVATQCHCGKCDSDSTDCTVRGLGPSYCSFGEMKE
Imiglucerase :
1 71
ARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHTGTGLLLTLQP
72 142
EQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDDF
143 213
QLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYF
214 284
VKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQ
285 355
RLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLG
356 426
SWDRGMQYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKF
427 497
IPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
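As an illustration only, the conventional N-glycosylation motif that forms the C-terminal part of the sequon discussed in this document (Asn-Yyy-Thr/Ser, with Yyy other than proline) can be located in a listed sequence with a short script. The sketch below is not part of the disclosure: the function name `find_native_sequons` and the constant names are hypothetical, and the alpha-chain sequence is copied from the Urofollitropin alpha-chain entry in the listing above. Finding such a site says nothing about whether it lies in a tight turn.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sites are all reported.
NATIVE_SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_native_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, tripeptide) for each Asn-Yyy-Ser/Thr site."""
    # Remove the spaces and line breaks that appear in the printed listings.
    cleaned = re.sub(r"\s+", "", sequence).upper()
    return [(m.start() + 1, m.group(1)) for m in NATIVE_SEQUON.finditer(cleaned)]


# Alpha chain (92 residues) as printed in the listing above.
ALPHA_CHAIN = (
    "APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCC"
    "VAKSYNRVTVMGGFKVENHTACHCSTCYYHKS"
)

if __name__ == "__main__":
    print(find_native_sequons(ALPHA_CHAIN))  # [(52, 'NVT'), (78, 'NHT')]
```

Asn-Pro-Phe at position 15 is skipped because proline follows the asparagine, and Asn-Arg-Val at position 66 is skipped because the third residue is neither serine nor threonine.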

Claims

WHAT IS CLAIMED
1. A chimeric therapeutic polypeptide of a pre-existing therapeutic polypeptide, said pre-existing therapeutic polypeptide having a length of about 15 to about 1000 amino acid residues and exhibiting a secondary structure that comprises at least one tight turn containing a sequence of four to about seven amino acid residues in which at least two amino acid side chains extend on the same side of the tight turn and are within less than about 7 Å of each other, said pre-existing therapeutic polypeptide lacking the sequon within that sequence of four to about seven amino acid residues, in the direction from left to right and from N-terminus to C-terminus,
Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser, [SEQ ID NO: ]
wherein
Aro is an aromatic amino acid residue,
n is zero, 1, 2, 3 or 4,
Xxx is an amino acid residue other than an aromatic residue,
p is zero or 1,
Zzz is any amino acid residue,
Asn is asparagine,
Yyy is any amino acid residue other than proline, and
Thr/Ser is one or the other of the amino acid residues threonine and serine,
said chimeric therapeutic polypeptide having substantially the same length, at least one tight turn and substantially the same sequence as the pre-existing therapeutic polypeptide, the two sequences differing by the presence in the chimeric therapeutic polypeptide of said sequon,
Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser, and
said sequon being located at the same position in said tight turn as said sequence of four to about seven amino acid residues such that the side chains of the Aro, Asn and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other.
2. The chimeric polypeptide according to claim 1, wherein said chimeric polypeptide, when said sequon is glycosylated, exhibits a folding stabilization enhanced by about -0.5 to about -4 kcal/mol compared to said pre-existing therapeutic polypeptide in non-glycosylated form.
3. The chimeric polypeptide according to claim 1, wherein said aromatic amino acid residue is Phe, Trp, Tyr or His.
4. The chimeric polypeptide according to claim 1, wherein n is 1.
5. The chimeric polypeptide according to claim 1, wherein Thr/Ser is threonine.
6. The chimeric polypeptide according to claim 1, wherein said pre-existing therapeutic chimeric polypeptide is an antibody.
7. The chimeric polypeptide according to claim 6, wherein said antibody is OKT3.
8. The chimeric polypeptide according to claim 1, wherein said pre-existing therapeutic chimeric polypeptide is a hormone.
9. The chimeric polypeptide according to claim 8, wherein said pre-existing hormone is follicle-stimulating hormone, luteinizing hormone, human choriogonadotropin, human growth factor, or insulin.
10. The chimeric polypeptide according to claim 1, wherein said pre-existing therapeutic chimeric polypeptide is selected from the group consisting of factor VIII, factor IX, erythropoietin, hepatitis B surface protein, tPA, plasmin, streptokinase, urokinase, thrombin, follicle-stimulating hormone, luteinizing hormone, human choriogonadotropin, IL-2, GM-CSF and IFN-γ.
11. The chimeric polypeptide according to claim 1 that has a length of about 25 to about 500 amino acid residues.
12. The chimeric polypeptide according to claim 1, wherein Asn occupies the i+2 position of a five-residue type I β-bulge turn that spans from Aro at the i position to Ser/Thr at the i+4 position.
13. The chimeric polypeptide according to claim 1, wherein Asn occupies the i+1 position of a five-residue type I' β-turn that spans from Aro at the i position to Ser/Thr at the i+3 position.
14. The chimeric polypeptide according to claim 1, wherein Asn occupies the i+3 position of a six-residue 4:6 hairpin loop turn that spans from Aro at the i position to Ser/Thr at the i+5 position.
15. The chimeric polypeptide according to claim 1, wherein said sequon has the sequence
Lys-(Zzz)m-Aro-(Xxx)n-Zzz-Asn-Yyy-Thr/Ser, [SEQ ID NO: ]
wherein
m is zero to three, and
Lys is lysine.
16. The chimeric polypeptide according to claim 1, wherein said sequon has the sequence
Aro-Xxx-Zzz-Asn-Yyy-Thr/Ser [SEQ ID NO: ].
17. The chimeric polypeptide according to claim 1, wherein said sequon has the sequence
Aro-Xxx-Asn-Yyy-Thr/Ser [SEQ ID NO: ].
18. The chimeric polypeptide according to claim 1, wherein said sequon has the sequence
Aro-Asn-Yyy-Thr/Ser [SEQ ID NO: ].
19. The chimeric polypeptide according to claim 1, wherein the Asn of said sequon is glycosylated.
20. A method of enhancing folded stabilization of a chimeric therapeutic polypeptide compared to a pre-existing therapeutic polypeptide, wherein said pre-existing therapeutic polypeptide comprises a sequence of about 15 to about 1000 amino acid residues and exhibits a secondary structure that comprises at least one tight turn in which the side chains of two residues in a sequence of four to about seven amino acid residues within said tight turn project on the same side of the turn and are within less than about 7 Å of each other, said sequence of four to about seven amino acid residues free of the sequon Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser [SEQ ID NO: ], as defined below, said method comprising the step of:
preparing a therapeutic chimeric polypeptide of the same length and substantially same sequence as the therapeutic polypeptide that exhibits a secondary structure comprising at least one tight turn at the same sequence position within the tight turn of said therapeutic polypeptide except that said sequence of four to about seven amino acid residues is replaced with the sequon, in the direction from left to right and from N-terminus to C-terminus,
Aro-(Xxx)n-(Zzz)p-Asn(Glycan)-Yyy-Thr/Ser, wherein
Aro is an aromatic amino acid residue,
n is zero, 1, 2, 3 or 4,
Xxx is an amino acid residue other than an aromatic residue,
p is zero or 1,
Zzz is any amino acid residue,
Asn(Glycan) is glycosylated asparagine,
Yyy is any amino acid residue other than proline,
Thr/Ser is one or the other of the amino acid residues threonine and serine, and
the side chains of the Aro, Asn(Glycan) and Thr/Ser amino acid residues project on the same side of the turn and are within less than about 7 Å of each other.
21. The method according to claim 20, wherein Asn(Glycan) is a [2-(acetylamino)-deoxy-2-β-glucopyranosyl]-L-asparaginyl residue [Asn(GlcNAc)_].
22. The method according to claim 20, wherein Asn(Glycan) is Asn(GlcNAc)2.
23. The method according to claim 20, wherein Asn(Glycan) is Asn(GlcNAc)2Man1.
24. The method according to claim 20, wherein the glycan of Asn(Glycan) is paucimannose.
25. The method according to claim 20, wherein said therapeutic chimeric polypeptide is prepared by expressing a nucleic acid sequence that encodes the polypeptide sequence of said therapeutic chimeric polypeptide in a host cell that glycosylates the amino acid sequence Aro-(Xxx)n-Zzz-Asn(Glycan)-Yyy-Thr/Ser [SEQ ID NO: ] when present in a polypeptide sequence expressed therein.
26. The method according to claim 20, wherein said therapeutic chimeric polypeptide is prepared by in vitro peptide synthesis.
27. The method according to claim 26, wherein said in vitro peptide synthesis is by solid phase means.
28. The method according to claim 20, wherein said sequence of four to about seven amino acid residues within said tight turn of said therapeutic polypeptide is glycosylation-free.
29. A pharmaceutical composition comprising a pharmaceutically acceptable diluent having dissolved or dispersed therein an effective amount of a therapeutic chimeric polypeptide according to claim 1 in which the Asn of said sequon is glycosylated.
30. The pharmaceutical composition according to claim 29, wherein said chimeric therapeutic polypeptide is an antibody or a hormone.
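For readers who want to test a sequence against the sequon formula Aro-(Xxx)n-(Zzz)p-Asn-Yyy-Thr/Ser recited in claims 1 and 20, the following is a minimal sketch, not the applicants' method. It assumes the aromatic set of claim 3 (Phe, Trp, Tyr, His) and one-letter residue codes; the function name `find_enhanced_sequons` and the toy input strings are hypothetical. It checks sequence only. The claims additionally require that the residues sit in a tight turn with side chains within about 7 Å, which cannot be decided from sequence alone.

```python
AROMATIC = set("FWYH")  # Phe, Trp, Tyr, His, as listed in claim 3


def find_enhanced_sequons(sequence: str) -> list[tuple[int, int, str]]:
    """Return (Aro position, Asn position, matched segment), positions 1-based."""
    seq = "".join(sequence.split()).upper()
    hits = []
    for asn in range(len(seq) - 2):
        # Require Asn-Yyy-Thr/Ser with Yyy other than Pro.
        if seq[asn] != "N" or seq[asn + 1] == "P" or seq[asn + 2] not in "ST":
            continue
        # k = n + p spacer residues lie between Aro and Asn (0 to 5 in total).
        for k in range(0, 6):
            aro = asn - k - 1
            if aro < 0:
                break
            if seq[aro] not in AROMATIC:
                continue
            spacer = seq[aro + 1 : asn]
            # The last spacer residue may be Zzz (any residue, p = 1);
            # every earlier spacer residue must be a non-aromatic Xxx.
            if all(res not in AROMATIC for res in spacer[:-1]):
                hits.append((aro + 1, asn + 1, seq[aro : asn + 3]))
    return hits


if __name__ == "__main__":
    # Toy peptides, invented for illustration only.
    for toy in ("AFNGTA", "AFANGTA", "AFNPTA", "WFAAANGS"):
        print(toy, find_enhanced_sequons(toy))
```

Scanning backwards from each Asn-Yyy-Thr/Ser site, rather than using a single regular expression, reports every aromatic residue that satisfies the spacing rule for a given asparagine. In the last toy peptide the Trp at position 1 is rejected because an aromatic Phe would fall among the Xxx spacer residues, while the Phe at position 2 is accepted.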
PCT/US2011/050900 2010-09-08 2011-09-09 Reliable stabilization of n-linked polypeptide native states with enhanced aromatic sequons located in polypeptide tight turns WO2012039954A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US38096710P 2010-09-08 2010-09-08
US61/380,967 2010-09-08
US201161514202P 2011-08-02 2011-08-02
US61/514,202 2011-08-02

Publications (2)

Publication Number Publication Date
WO2012039954A2 true WO2012039954A2 (en) 2012-03-29
WO2012039954A3 WO2012039954A3 (en) 2012-08-23

Family

ID=45874281

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/050900 WO2012039954A2 (en) 2010-09-08 2011-09-09 Reliable stabilization of n-linked polypeptide native states with enhanced aromatic sequons located in polypeptide tight turns

Country Status (1)

Country Link
WO (1) WO2012039954A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1464702A1 (en) * 2001-12-28 2004-10-06 Chugai Seiyaku Kabushiki Kaisha Method of stabilizing protein

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANDREI-J. PETRESCU ET AL.: 'Statistical analysis of the protein environment of N-glycosylation sites: implications for occupancy, structure, and folding' GLYCOBIOLOGY vol. 14, no. 2, 2004, ISSN 0959-6658 pages 103 - 114 *
SIGNE PERLMAN ET AL.: 'Glycosylation of an N-terminal extension prolongs the half-life and increases the in vivo activity of follicle stimulating hormone' THE JOURNAL OF CLINICAL ENDOCRINOLOGY & METABOLISM vol. 88, no. 7, 2003, ISSN 0021-972X pages 3227 - 3235 *
STEVE ELLIOTT ET AL.: 'Control of rHuEPO biological activity: The role of carbohydrate' EXPERIMENTAL HEMATOLOGY vol. 32, no. 12, 2004, ISSN 0301-472X pages 1146 - 1155 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109790545A (en) * 2016-03-10 2019-05-21 约翰·霍普金斯大学 Generate the method and therapeutical uses of the monomer diphtheria toxin fusion protein without aggregation
EP3426785A4 (en) * 2016-03-10 2019-12-25 The Johns Hopkins University Methods of producing aggregate-free monomeric diphtheria toxin fusion proteins and therapeutic uses
US10988512B2 (en) * 2016-03-10 2021-04-27 The Johns Hopkins University Methods of producing aggregate-free monomeric diphtheria toxin fusion proteins and therapeutic uses
US11203626B2 (en) 2016-03-10 2021-12-21 The Johns Hopkins University Methods of producing aggregate-free monomeric diphtheria toxin fusion proteins and therapeutic uses
AU2017230792B2 (en) * 2016-03-10 2023-08-10 The Johns Hopkins University Methods of producing aggregate-free monomeric diphtheria toxin fusion proteins and therapeutic uses
US11965009B2 (en) 2016-03-10 2024-04-23 The Johns Hopkins University Methods of producing aggregate-free monomeric diphtheria toxin fusion proteins and therapeutic uses

Also Published As

Publication number Publication date
WO2012039954A3 (en) 2012-08-23

Legal Events

Date Code Title Description
NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11827210

Country of ref document: EP

Kind code of ref document: A2