WO2011038187A1

WO2011038187A1 - Controlled adeno-associated virus (AAV) diversification and libraries prepared therefrom

Info

Publication number: WO2011038187A1
Application number: PCT/US2010/050135
Authority: WO
Inventors: James M. Wilson; Luc H. Vandenberghe
Original assignee: The Trustees Of The University Of Pennsylvania
Priority date: 2009-09-25
Filing date: 2010-09-24
Publication date: 2011-03-31

Abstract

A method for the directed production of a combinatorial library of an altered protein coding sequence is described. The method involves predetermining at least two different nucleotide sequence fragments (regions) of interest (ROI) into which sequence diversity is to be introduced, in a manner that may differ between the different domains. The full-length protein coding sequence is divided into predetermined subdomains, which are fragments spanning the full-length protein coding sequence. At least one of the subdomains contains the first ROI and at least one of the subdomains contains the second ROI. Primers are designed to specifically amplify these subdomains in a manner that permits the subdomains, following treatment as described herein, to be directionally and positionally assembled into a combinatorial library of the full-length protein coding sequence containing altered sequences. Each of the subdomains is amplified with a series of primer sets comprising at least a first primer set and at least a second primer set, wherein each primer set consists of a right primer and a left primer that are complementary to a junction located between consecutive fragments in the protein coding sequence. A predetermined subset of the amplified subdomains is subjected to different amplification conditions that generate subdomains with altered sequences. Thereafter, the amplified subdomains are pooled and assembled in a directional and positional manner using SOE PCR or other cloning methods to generate a combinatorial library of full-length protein coding sequences comprising the altered sequences.

Description

CONTROLLED ADENO-ASSOCIATED VIRUS (AAV) DIVERSIFICATION AND LIBRARIES PREPARED THEREFROM

Statement Regarding Federally Sponsored Research or Development

This invention was made with government support under P01-HL-059407 and P01-HL-051746 awarded by the National Institutes of Health. The US Government has certain rights in the invention.

Background of the Invention

The use of adeno-associated viruses (AAV) as viral vectors has been widely described. AAV are attractive viral vectors because they are safe: wild-type AAV is not associated with any pathology in humans. Further, AAV offers efficient gene delivery and sustained transgene expression in a number of tissues.

However, several problems remain with existing AAV vectors, including preexisting immunity against human serotypes, the difficulty of achieving targeted and efficient delivery, limited packaging capacity, and poor infection of nonpermissive cell types. The search for new AAV capsids is therefore ongoing, because the capsid mediates cellular targeting (and thus gene delivery) as well as the immune response. Several attempts to generate new AAV viral vectors have been described.

One such approach is so-called directed evolution, which may be performed in a high-throughput format [N. Maheshri, et al, "Directed evolution of adeno-associated virus yields enhanced gene delivery vectors", Nature Biotechnology, 24(2): 198-204 (February 2006); J. T. Koerber, et al, "Construction of diverse adeno-associated viral libraries for directed evolution of enhanced gene delivery vehicles", Nature Protocols, 1(2): 701-706 (2006)]. This method diversifies AAV capsid (cap) genes through error-prone polymerase chain reaction (PCR) [Maheshri et al, cited above, and Koerber et al, cited above] followed by the staggered extension process [H. Zhao, et al, "Molecular evolution by staggered extension process (StEP) in vitro recombination", Nat. Biotechnol, 16: 258-261 (1998); the staggered extension process is the subject of US Patent No. 6,153,410]. The mutant cap genes are then cloned into an AAV packaging plasmid to produce a large, diverse AAV library containing randomly distributed capsid mutations [Maheshri (2006) and Koerber (2006), cited above]. After an optional purification step, the library may be selected for variants with enhanced function, and the genomic DNA is recovered from the remaining virions. The process is repeated until the desired phenotype is achieved (Maheshri (2006) and Koerber (2006), cited above).

In another publication, a method for generating a library of AAV capsids has been described [Kay and Grim, US Patent Application Publication No. US 2007/0243526 (Oct. 18, 2007)]. This approach involves fragmenting nucleic acids isolated from the capsid genes of multiple AAV serotypes, reassembling the fragments into larger pieces by PCR, and amplifying and cloning the full-length PCR products to create a relatively large library of randomly shuffled AAV capsid genes. To produce a viral library of AAV with the shuffled AAV capsids, the full-length capsid sequence may be cloned into a plasmid, and the plasmid can be used to produce a recombinant AAV in a host cell. Through the use of one or more types of selective pressure, recombinant AAV that survive multiple passages under the selective pressure are obtained.

Several major obstacles have limited the construction of AAV libraries to date. First, two or more different AAV capsids may be architecturally incompatible, with steric obstruction at the tertiary and quaternary levels. Second, even though existing error-prone PCR and shuffling protocols generate enormous diversity, this diversity is still too limited to yield all permutations of capsid composition, and these protocols therefore do not provide efficient libraries. A hypothetical AAV capsid of only 10 amino acids (AA), for example, would require a library with a clonal complexity of approximately 10¹³ (20¹⁰) to contain all AA permutations. Technically, classical plasmid libraries are limited, mainly by the bacterial transformation efficiency of ligated DNA, to approximately 10⁷ clones, which can accommodate only a fraction of the diversity required for the hypothetical capsid. The AAV capsid is not 10 AA but over 700 AA in size, bringing the number of permutations to more than 20⁷⁰⁰, far beyond the technical constraints of a bacterial library. Given the limited capacity of bacterial libraries, strategies that increase functional diversity rather than random structural diversity are desired. Existing approaches produce libraries containing large numbers of altered AAV capsid genes and require time-consuming procedures to eliminate potentially significant numbers of non-functional capsids. What are needed are efficient methods for generating novel AAV capsids while reducing the number of non-functional AAV capsids generated.
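The permutation figures in the preceding paragraph follow from 20 possible amino acids at each position. A minimal Python sketch of that arithmetic, assuming every position is independent and may take any of the 20 standard amino acids (the constant and function names are illustrative only):

```python
import math

N_AMINO_ACIDS = 20               # standard amino acids available at each position
BACTERIAL_LIBRARY_LIMIT = 10**7  # approximate transformation-limited plasmid library size

def permutations(length_aa: int) -> int:
    """Exact number of distinct sequences for a protein of length_aa residues."""
    return N_AMINO_ACIDS ** length_aa

def log10_permutations(length_aa: int) -> float:
    """log10 of the permutation count, convenient for very long proteins."""
    return length_aa * math.log10(N_AMINO_ACIDS)

toy = permutations(10)
print(f"10-AA capsid: {toy:.3e} sequences")
print(f"fraction a 1e7 library can hold: {BACTERIAL_LIBRARY_LIMIT / toy:.1e}")
print(f"700-AA capsid: about 10^{log10_permutations(700):.1f} sequences")
```

Working in log10 for the 700-AA case avoids printing an integer with over 900 digits while showing the same gap relative to a 10⁷-clone library.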

Brief Description of the Drawings

Fig. 1 provides a cartoon example of a two-step method to generate diversity within delineated domains of a coding sequence (CDS). In this example, two regions of interest (ROIs) are defined within a CDS. In the first step, the ROIs are delineated by primers in such a manner that all eventual permutations of the ROIs universally link up with the flanking regions (FRs). Diversity is generated by any number of methods in any number of combinations; e.g., one ROI may be amplified from different templates with or without different primers or pools of primers, whereas another ROI undergoes a DNA polymerization or ligation step that introduces variation through shuffling or mutagenesis. In the second and final step, the entire CDS is amplified by SOE PCR, which splices all individual components in the correct order owing to the complementarity of the primers. N: number of ROI; EP: error-prone; STEP: staggered extension process; Δ: different.

Fig. 2A is a Neighbor-Joining dendrogram of AAV capsids illustrating the structural distinctness of rh32.33 from all serotypes, with AAV4 being the most homologous capsid.

Fig. 2B provides various quantitative measures of AAV8 and rh32.33 neutralization by human serum, including IVIG neutralization in the in vitro NAB assay (titer represents the dilution at which transduction is inhibited by 50%), an estimate of seroprevalence in 888 individuals from worldwide populations (Europe, US, Africa and Australia), and the lowest IVIG dose at which statistically significant in vivo transduction inhibition is observed at a dose of 10ⁿ GC per mouse.

Fig. 3A is a cartoon showing the viral genome backbones of pAAV8 and pAAVivo-8.

Fig. 3B is a cartoon showing the AAVrh32.33 capsid open reading frame (ORF) broken down into preliminary subdomains, which consist of either constant (C) or hypervariable (H) domains. Constant (C) regions C1, C2, C3, C4, C5, C6, C7 and C8 are illustrated, as are hypervariable (H) regions H1, H2, H3, H4, H5, H6, H7 and H9.

Fig. 3C depicts the first-cycle polymerase chain reaction (PCR) in the method of the invention. Individual domains (here, H domains) are amplified with flanking oligonucleotide mixtures whose sequences overlap the flanking domains, with complementarity to either the same or a different serotype.

Fig. 3D provides a schematic of the second-round PCR of a full capsid open reading frame using only the individual subdomains as template, yielding BglII-SpeI-flanked capsids compatible with pAAV and pAAVivo.

Fig. 4 is a bar chart comparing the ability of vectors containing mutant AAV8 capsid proteins, generated using an AAV combinatorial library approach of the invention, to express a marker gene product (enhanced green fluorescent protein) following incubation with different monoclonal antibodies, including two anti-AAV8 antibodies.

Summary of the Invention

Advantageously, the method of the invention permits the generation of novel proteins and protein coding sequences while providing the ability to control where diversity is introduced and to retain regions of interest within the protein in which no changes are introduced. Thus, the method of the invention permits the number of nonfunctional proteins to be reduced by allowing one to define regions that are critical to the desired function. This allows for a more representative, functional combinatorial library of a size manageable within the context of a bacterial library (about 10⁷ molecules).

In one embodiment, a method is provided for the directed production of a combinatorial library of an altered protein coding sequence. According to this method, at least a first region of interest (ROI) and at least a second ROI within a protein coding sequence are predetermined, and at least a first primer set that is specific for at least a first subdomain containing the at least first ROI and at least a second primer set that is specific for at least a second subdomain containing the at least second ROI are provided. The subdomains are specifically amplified with a series of primer sets comprising the at least first primer set and the at least second primer set, wherein each primer set consists of a right primer and a left primer that are complementary to a junction located between consecutive fragments in the protein coding sequence, and wherein a predetermined subset of the amplified subdomains containing the at least first ROI and/or the at least second ROI is subjected to different conditions in order to generate a subset of subdomains with altered sequences in the at least first ROI and/or the at least second ROI. The resulting amplified subdomains (both altered and non-altered) are directionally and positionally assembled to generate a combinatorial library of altered full-length protein coding sequences.

In one embodiment, the invention provides a method for the directed production of a combinatorial library of an altered adeno-associated virus gene. The method may utilize a single AAV gene sequence in which at least two regions of interest (ROI) have been identified. Alternatively, the method may utilize two or more different AAVs for preparation of the directed combinatorial library.

Suitably, at least a first primer set that specifically amplifies one of the at least two ROI is designed. The primer set consists of a right primer (P_R1) and a left primer (P_L1), wherein one of P_R1 or P_L1 has 5' complementarity to nucleotide sequences in the sequence flanking the first ROI and the other has 3' complementarity to the first ROI. Further, at least a second primer set that specifically amplifies a second of the at least two ROI is provided, wherein said second primer set consists of a right primer (P_R2) and a left primer (P_L2), and wherein one of P_R2 or P_L2 has 5' complementarity to nucleotide sequences in the sequence flanking the second ROI and the other has 3' complementarity to the second ROI. These primer sets are used to specifically amplify the ROI (templates) to which they are directed, thereby generating a series of building blocks that correspond to subdomains within the full-length gene sequence; when assembled, these subdomains span the full-length AAV gene sequence. The method permits a subset of building blocks containing altered sequences to be generated from at least one of the first or second ROI, forming a plurality of diverse building blocks corresponding to the first or second ROI. The resulting amplified regions (building blocks) are directionally and positionally assembled to generate a combinatorial library of full-length AAV genes, each of which contains at least one ROI with altered sequences.

In another aspect, the invention provides a combinatorial library produced using the method of the invention. In one embodiment, this method provides a collection or library of sequences which contains a number of variant sequences, which variants are unknown in nature.

These and other aspects of the invention will be readily apparent from the following detailed description of the invention.

Detailed Description of the Invention

The present invention provides a method for engineering DNA and/or proteins in defined regions of biological relevance in a combinatorial manner. One advantage of this invention is that variability can be introduced specifically into regions of interest, in a combinatorial fashion, while preserving domains that one chooses not to perturb.

A method for the directed production of a combinatorial library of an altered protein coding sequence is described. The method involves predetermining at least two different nucleotide sequence fragments (regions) which are of interest (ROI) in which the sequence diversity is to be introduced in a manner that may be different for the different subdomains. The full-length protein coding sequence is divided into predetermined subdomains which are fragments spanning the full-length protein coding sequence.

As used herein, a protein coding sequence is an open reading frame (ORF) for a selected protein or polypeptide. A fragment of the protein coding sequence refers to a nucleic acid sequence that is a discrete portion of the protein coding sequence and is shorter than the full-length protein coding sequence. Such a fragment may be internal, or located at the 5' or 3' terminus of the protein coding sequence. In one embodiment, the fragment is at least about 15 base pairs in length and is at least 15 base pairs shorter than the full-length open reading frame, preferably more.

As used throughout this specification and the claims, the terms "comprising" and "including" are inclusive of other components, elements, integers, steps and the like. Conversely, the term "consisting" and its variants are exclusive of other components, elements, integers, steps and the like.

The protein coding sequence is divided into "subdomains", which are nucleic acid fragments of the protein coding sequence used to generate a combinatorial library. In one embodiment, the entire protein coding sequence is divided into contiguous subdomains such that the full-length protein coding sequence is represented. The location where two contiguous subdomains meet is termed herein a junction. Optionally, a protein coding sequence or a portion thereof may be divided into more than one set of subdomains. Suitably, each subdomain is large enough to accommodate a primer for PCR isolation, e.g., at least 15, 25, 30, or 50 nucleotides in length, and may be as large a fragment as can be amplified via PCR. In some cases this may be up to 95% of the length of the protein coding sequence. In other embodiments, the fragment may be 5%, 10%, or 75% of the protein coding sequence, or shorter or longer. Suitably, a subdomain may contain a ROI. In some embodiments, a subdomain may contain no sequences other than the ROI. In other embodiments, a subdomain may contain nucleic acids that are 5' and/or 3' to the ROI. A protein coding sequence is divided into at least one set of subdomains. In one embodiment, a protein coding sequence is divided into two or more sets of subdomains. It follows, then, that the method of the invention may utilize multiple subdomains of different lengths that span the coding sequence.
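The division of a coding sequence into contiguous subdomains separated by junctions can be sketched in Python as follows. This is a minimal illustration, assuming 0-based coordinates, a caller-supplied list of internal junction positions, and the 15-nucleotide minimum subdomain length given above; `Subdomain` and `divide_cds` are hypothetical names, and the 60-nt sequence is a placeholder rather than a real CDS.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subdomain:
    name: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    is_roi: bool  # True if this subdomain was predetermined as a region of interest

def divide_cds(cds: str, boundaries: list[int], roi_names: set[str],
               min_len: int = 15) -> list[Subdomain]:
    """Split a CDS into contiguous subdomains at the given internal junctions."""
    cuts = [0, *sorted(boundaries), len(cds)]
    parts = []
    for i, (lo, hi) in enumerate(zip(cuts, cuts[1:]), start=1):
        if hi - lo < min_len:
            raise ValueError(f"subdomain S{i} is {hi - lo} nt; too short for a primer")
        name = f"S{i}"
        parts.append(Subdomain(name, lo, hi, name in roi_names))
    return parts

cds = "ATGGCT" * 10  # placeholder 60-nt sequence
parts = divide_cds(cds, boundaries=[20, 40], roi_names={"S2"})
junctions = [p.end for p in parts[:-1]]

# Contiguous subdomains must reconstitute the full-length sequence exactly.
assert "".join(cds[p.start:p.end] for p in parts) == cds
print([(p.name, p.start, p.end, p.is_roi) for p in parts])
print("junctions:", junctions)
```

Representing subdomains by coordinates rather than copied strings makes it easy to keep several alternative subdivisions of the same sequence, as the text allows.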

In one embodiment, the preparation of a library according to the present invention involves the use of two or more subdomains containing the same ROI (i.e., different ROI templates). Because one technique for modifying the polymerase chain reaction of an ROI involves varying the template, there can be multiple subdomains that contain the same ROI.

The term "region of interest" or "ROI" refers to a nucleic acid fragment within a protein coding sequence that has been pre-selected either to be altered by some means or to remain unchanged (unperturbed). Suitably, an ROI is at least 15 nucleotides in length and may be up to 95% of the length of the protein coding sequence. An ROI may range from about 15 or 25 nucleotides to about 60% of the length of the protein coding sequence. In one embodiment, the ROI may be about 50 base pairs to about 800 bp. However, one can readily design shorter or longer ROIs. Often, the ROI is pre-selected based upon the function performed by the peptide or polypeptide for which the ROI codes. For example, one of skill in the art may pre-select the coding sequence corresponding to a specific protein structural element to remain unchanged and thus identify that coding sequence as an ROI. Alternatively, one of skill in the art may select a hypervariable region of a viral capsid as an ROI to be targeted for sequence alteration, or to be swapped with another hypervariable region, e.g., from another source or from a non-contiguous region within the same virus source. For example, within an AAV capsid, one may designate one or more conserved regions as an ROI, or pre-select as an ROI one or more regions associated with AAV capsid functional variation (variable regions V, VIII and/or IX), transduction efficiency (variable regions I, II, III, IV, V, VI and/or IX), or antigenic recognition (variable regions I, III, IV, V, VI, VII, VIII and/or IX). In another embodiment, an ROI may be selected from receptor binding sites, DNA binding regions, phospholipase activity regions, or other desired domains. However, an ROI may also be pre-selected based upon other considerations.

As used herein, "altered ROIs" and/or "altered subdomains" are sequences in which nucleic acid bases have been changed compared with the ROI and/or subdomain derived from the respective protein coding sequence. Typically, these changes are introduced by changing the polymerase chain reaction conditions, as described herein and/or as known to those of skill in the art. These different PCR conditions may include, e.g., varying the primer (e.g., changing the length or the sequence of the primer), varying the templates (e.g., changing the location of the junction between the ROI), varying the dNTP concentrations, and varying the salt concentrations and/or buffer conditions. However, other methods may be utilized to introduce these changes, e.g., site-specific mutagenesis. In places in this specification, changing the sequences of ROIs and/or subdomains is referred to as introducing diversity, which covers introduction of diversity by any means.
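To illustrate confining alterations to a predetermined ROI while leaving the flanking sequence unperturbed, the following toy Python model applies random base substitutions only inside one window. It is a stand-in for error-prone PCR, not a model of its actual error spectrum (real error-prone conditions bias substitution types and can also cause indels); `mutate_region` and its uniform per-base `rate` are hypothetical simplifications, and the template is an arbitrary 45-nt sequence.

```python
import random

BASES = "ACGT"

def mutate_region(seq: str, start: int, end: int, rate: float,
                  rng: random.Random) -> str:
    """Return seq with random substitutions introduced only within seq[start:end].

    Each base in the window is replaced by a different base with probability
    `rate`; bases outside the window are left unchanged.
    """
    out = list(seq)
    for i in range(start, end):
        if rng.random() < rate:
            out[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(out)

rng = random.Random(0)  # fixed seed so the run is reproducible
template = "ATGGCTGCTGACGGTTATCTTCCAGATTGGCTCGAGGACAACCTC"  # arbitrary 45-nt template
variant = mutate_region(template, 15, 30, rate=0.1, rng=rng)
diffs = [i for i, (a, b) in enumerate(zip(template, variant)) if a != b]
print(variant)
print("mutated positions:", diffs)  # any changes fall within positions 15..29
```

Running the same function over many seeds yields a pool of diverse ROI variants, whereas the flanks stay identical to the template, which is the property the assembly step relies on.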

Advantageously, the method of the invention permits one to predetermine which ROIs and/or subdomains are unperturbed or unaltered.

The term "altered protein coding sequences" includes protein coding sequences that comprise altered ROIs and/or altered subdomains as defined in the prior paragraph. "Altered protein coding sequences" also include sequences that are hybrids or chimeras formed by recombining subdomains from at least a second protein coding sequence with those of a first protein coding sequence. Such "altered protein coding sequences" may contain subdomains derived from altered ROIs and/or altered subdomains from the same or a different protein coding sequence source and/or unaltered sequences from at least one other protein coding sequence source. Thus, altered protein coding sequences may encode chimeric or hybrid proteins, optionally with altered ROI(s) and/or altered subdomain(s). "Altered proteins" are the products encoded by the "altered protein coding sequences" as defined herein. Such altered proteins may include proteins, polypeptides, enzymes, viral capsids, viral capsid proteins, viral envelopes, viral envelope proteins, or other products.

The term "unperturbed" or "conserved" ROI and/or subdomain refers to sequences in which the nucleic acid bases remain unchanged, relative to the ROI and/or subdomain derived from the respective protein coding sequence, during the amplification process. In one embodiment, this is accomplished through the use of a proofreading enzyme during amplification. For example, the use of a proofreading enzyme such as bacteriophage phi29 DNA polymerase has been described. Other suitable enzymes may include Phusion™ polymerase from NEB or (Ultra)Pfu enzymes from Stratagene.

Additionally, commercial kits are available, including, e.g., Advantage® cDNA PCR Kits [Clontech; an enzyme blend consisting of an N-terminal deletion mutant of Taq DNA polymerase, a proofreading enzyme, and TaqStart® Antibody].

For a general discussion of polymerase chain reaction conditions, see PCR Primer: A Laboratory Manual, 2d ed., eds. Carl W. Dieffenbach and Gabriela S. Dveksler, Cold Spring Harbor Press, Cold Spring Harbor, NY (2003) and Molecular Cloning: A Laboratory Manual, 3d Ed., Vol. 2, eds. J Sambrook and DW Russell, Cold Spring Harbor Press, Cold Spring Harbor, NY (2001).

The method of the invention may be used to generate a combinatorial library containing altered protein coding sequences. Suitably, the resulting library of altered protein coding sequences can be used to generate a combinatorial library of the proteins encoded by the altered coding sequences using expression methods known to those of skill in the art. In another embodiment, the library of altered protein coding sequences is used to generate a viral library. In this embodiment, the viral capsid and/or envelope may be the product of the altered protein coding sequences. Alternatively, another viral element may be an altered protein coding sequence or the product thereof. Suitable techniques for generating these protein and/or viral libraries from the combinatorial library prepared by the method of the invention are known to those of skill in the art. For example, after establishing a plasmid library, an AAV capsid library may be generated in two additional steps [I. Virella-Lowell, et al, J Gene Med l: 842-850], e.g., by partially packaging the library containing recombinant Cap genes into AAV2 capsids by cotransfecting a suitable plasmid along with the AAV plasmid library, infecting suitable host cells (e.g., HEK 293 cells) with the resulting AAV library, and then superinfecting with adenovirus (Ad) dl309, thus ensuring production of chimeric AAV capsids packaging the corresponding chimeric AAV genome. See, e.g., JT Koerber et al, Nature Protocols, 1, 701-706 (2006) and, more generally, Sambrook et al, cited above.

In another embodiment, such an altered protein coding sequence is a protein coding sequence in which the original sequences of the ROI are conserved (unchanged), but in which ROI from different protein coding sequence sources are combined. Thus, a library comprising an adeno-associated virus (AAV) capsid coding sequence may be generated through the method of the invention using two or more different AAV capsid proteins. For example, such a library may be generated that retains the conserved regions of the capsid sequences of at least a first AAV and at least a second, different AAV, such that the library contains various combinations of the hypervariable regions of each of the first and second AAVs superimposed on the unchanged conserved regions of the first and second AAVs.

At least one of the subdomains contains a first ROI and at least one of the subdomains contains a second ROI. Primers are designed to specifically amplify these subdomains in a manner which permits the subdomains following treatment as described herein to be directionally and positionally assembled into a combinatorial library of the full-length protein coding sequence which contains altered sequences.

Each of the subdomains is amplified with at least one primer set consisting of a right primer and a left primer. The primers are designed to be complementary to the junctions between contiguous subdomains of the protein coding sequence (at both the 5' and 3' end of each subdomain). Thus, a given primer set is 5' identical to the subdomain located 5' to a selected subdomain and 3' identical to the subdomain located 3' to this selected subdomain.

This permits amplification of the entire protein coding sequence and directional and positional assembly of the resulting amplified regions. Generally, the primers are independently selected from a length of about 15 to about 45 base pairs, about 20 to about 40 base pairs, about 25 to about 35 base pairs, or 30 base pairs. In one embodiment, a primer has 100% complementarity at the 5' end of its subdomain template over at least 8 base pairs. In another embodiment, a primer has 100% complementarity at the 3' end of its template (subdomain) over at least 15 base pairs.
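The junction-spanning primer layout described in the two preceding paragraphs can be sketched as follows. This is a minimal illustration, assuming that the "left" primer is the forward (top-strand) primer and the "right" primer is the reverse primer, and using a 15-nt tail copied from the neighbouring subdomain plus a 15-nt annealing region, giving 30-nt primers within the stated 15 to 45 bp range. `junction_primers` is a hypothetical helper; a practical design would also check melting temperature, secondary structure and mispriming, which this sketch ignores.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def junction_primers(cds: str, start: int, end: int,
                     tail: int = 15, anneal: int = 15) -> tuple[str, str]:
    """Left (forward) and right (reverse) primers for subdomain cds[start:end].

    Each primer's 3' portion anneals within the subdomain, and its 5' tail copies
    the neighbouring subdomain across the junction. At the ends of the CDS there
    is no neighbour, so the tail is simply truncated.
    """
    left = cds[max(0, start - tail):start + anneal]
    right = revcomp(cds[end - anneal:min(len(cds), end + tail)])
    return left, right

rng = random.Random(42)
cds = "".join(rng.choice("ACGT") for _ in range(90))  # placeholder 90-nt sequence
subdomains = [(0, 30), (30, 60), (60, 90)]
primers = [junction_primers(cds, s, e) for s, e in subdomains]
left2, right2 = primers[1]
print("S2 left :", left2)
print("S2 right:", right2)
```

With these defaults the amplicon of subdomain S2 spans cds[15:75], so adjacent amplicons share 2 × tail = 30 nt around each junction; that shared sequence is what lets the building blocks anneal only to their correct neighbours during assembly.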

Predetermined subsets of the amplified subdomains are subjected to different amplification conditions that specifically generate either unperturbed subdomains or altered subdomains. Suitably, the ROIs are preselected to be treated under different conditions prior to pooling for assembly into a full-length altered protein coding sequence. For example, one or more ROIs may be preselected to have diversity introduced by altering their nucleotide sequences, and one or more different ROIs may be preselected to remain conserved (i.e., with no nucleotide sequences altered). Each of the ROIs to be modified may be independently modified using a different technique. Thus, the method of the invention permits multiple ROIs to be treated in parallel by different treatments (e.g., introducing diversity or retaining sequences without changes). Thereafter, the amplified altered or unperturbed subdomains are pooled and assembled in a directional and positional manner. Typically this is achieved using splicing-by-overlap-extension (SOE) PCR [Horton, R.M., et al, Gene 77, 61-68 (1989)] to generate a combinatorial library comprising the chimeric/hybrid and/or altered sequences. In one embodiment, the amplified subdomains, alternatively termed building blocks herein, are assembled in a directional manner through the use of the 3'-5' exonuclease activity of poxvirus DNA polymerase. This enzyme is available commercially in kits, including, e.g., the In-Fusion™ cloning kit [Clontech]; the Choo-Choo™ kit; the ClonEZ® PCR cloning kit [GenScript]; and the FAST SEAMLESS PCR™ Cloning Kit [DoGene]. Without wishing to be bound by theory, it is believed that when linear duplex DNAs with homologous ends are incubated in the presence of Mg²⁺ and low concentrations of dNTPs, the 3'-5' proofreading activity of poxvirus DNA polymerase progressively removes nucleotides from the 3' ends.
This exposes complementary regions on the substrate DNAs that can spontaneously anneal through base pairing, resulting in seamless fusions.
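The directional, positional assembly of building blocks can be mimicked at the sequence level: each pair of consecutive fragments is joined at the sequence they share across their junction, and the overlap is kept once. This sketch stands in for SOE PCR or exonuclease-mediated fusion only in terms of the resulting sequence (it models no annealing kinetics, polymerase behaviour or mispriming), and `splice_pair` and `splice_all` are hypothetical names. The fragments are the overlapping amplicons cds[0:45], cds[15:75] and cds[45:90] of a placeholder 90-nt sequence.

```python
import random

def splice_pair(a: str, b: str, min_overlap: int = 15) -> str:
    """Join two fragments whose ends overlap, keeping the shared sequence once.

    The longest suffix of `a` that equals a prefix of `b` (at least
    `min_overlap` nt) is treated as the annealed overlap.
    """
    lower = max(min_overlap, 1)  # guard: a zero-length overlap is not a join
    for k in range(min(len(a), len(b)), lower - 1, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    raise ValueError("no sufficient overlap; fragments would not anneal")

def splice_all(fragments: list[str], min_overlap: int = 15) -> str:
    """Splice fragments left to right; their order encodes the positional assembly."""
    assembled = fragments[0]
    for frag in fragments[1:]:
        assembled = splice_pair(assembled, frag, min_overlap)
    return assembled

rng = random.Random(7)
cds = "".join(rng.choice("ACGT") for _ in range(90))   # placeholder full-length sequence
building_blocks = [cds[0:45], cds[15:75], cds[45:90]]  # 30-nt overlaps at each junction
full_length = splice_all(building_blocks)
print(full_length == cds)
```

Taking the longest matching overlap is a simplification: in a repetitive sequence a spurious longer match could produce a false join, which parallels the real need for unique junction sequences.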

Because of the design of the primers for the subdomains, the amplified subdomains are assembled in the proper order and orientation. Alternatively, one of skill in the art may select another technique that permits directional and positional assembly of the subdomains to form a full-length, altered protein coding sequence. The SOE PCR oligonucleotide primers are the primers at the extreme 5' and 3' ends of the final assembled altered protein coding sequence. Such primers may be of the size described above for PCR primers and are readily determined by one of skill in the art.

In one embodiment, with reference to Figure 1, a two-step method for generating diversity within delineated domains of a protein coding sequence (CDS) is provided. In this illustration, two regions of interest (ROIs) are defined within a CDS. The ROIs are delineated by primers in such a manner that all eventual permutations of the ROIs universally link up with the flanking regions (FRs). Diversity is generated by any number of methods in any number of combinations; e.g., one ROI may be amplified from different templates with or without different primers or pools of primers, whereas another ROI undergoes a DNA polymerization or ligation step that introduces variation through shuffling or mutagenesis.

For example, the oligonucleotide primers P_left and P_right are used to amplify the subdomain containing the region of interest in such a manner that diversity is introduced through the use of different combinations of primers, through varying or providing a mixture of templates, and/or through modifying the PCR conditions (e.g., shuffling or error-prone conditions). In one embodiment, primers complementary to P_left and P_right are used in separate reactions, in combination with primers further outside the region of interest (P_outsideleft and P_outsideright, respectively), to amplify the subdomains containing the flanking regions (Fig. 1). The flanking regions are combined with the regions of interest in a final PCR reaction containing all the internal components that one intends to combine in a combinatorial fashion, as well as P_outsideleft and P_outsideright. Subsequently, the entire protein coding sequence is amplified by SOE PCR or another cloning process, which splices all individual components in the correct order owing to the complementarity of the primers used in the first step to amplify the subdomains. These full-length libraries can then be screened and selected in a functional assay in order to identify hybrids and/or mutants with desirable properties.

In one approach, combinatorial libraries maintaining the constant (C) domains of one serotype will be produced while generating all permutations of the embedded hypervariable (H) domains. The anticipated complexity of this library is 2⁸, or 256. Another strategy would allow both the C and H domains of either capsid origin to recombine (2¹⁶, or 65,536 permutations). Both complexities fall well within the size limitation of bacterial libraries (~10⁷) and will provide increased functional diversity compared with random approaches. Optionally, error-prone PCR may be performed to add further diversity, whereby a limited number of mutations may provide compensatory structural modifications that rescue vector function or phenotype. Alternatively, known structural defects (e.g., singleton mutations) can be identified and corrected through rational design intervention [LH Vandenberghe, et al, Gene Therapy, advance online publication, 3 September 2009; WO 2006/110689 and US Patent Application Publication No. 2009/0197338].
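The library sizes quoted for the two strategies follow from each domain having one of two parental origins. The sketch below enumerates the H-only library explicitly and counts the full C+H library, assuming two parental capsids and eight C plus eight H domains (consistent with the 2⁸ and 2¹⁶ figures); the parent labels `P1` and `P2` are placeholders, not specific serotypes.

```python
from itertools import product

PARENTS = ("P1", "P2")   # placeholder labels for the two parental capsids
N_C, N_H = 8, 8          # constant and hypervariable domains assumed in this sketch
BACTERIAL_LIMIT = 10**7  # approximate bacterial library capacity

# Strategy 1: C domains fixed to one parent; each H domain drawn from either parent.
h_only = list(product(PARENTS, repeat=N_H))

# Strategy 2: every C and H domain may come from either parent.
full_size = len(PARENTS) ** (N_C + N_H)

print(len(h_only), full_size)        # 256 65536
print(full_size <= BACTERIAL_LIMIT)  # True
```

Each tuple in `h_only` records, in order, which parent supplies H1 through H8, so a member such as `("P1", "P2", "P1", ...)` maps directly onto one set of building blocks to pool for assembly. Only the larger library is counted rather than enumerated, since its size is all the comparison needs.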

Prior art methods are less refined than the method described above in that they generate random diversity throughout a coding sequence. The method of the invention allows structural and functional knowledge about the coding sequence to inform the combinatorial approach, with the net effect that functional diversity is generated more efficiently. For example, when diversity is generated across a coding region by other methods, it is introduced irrespective of whether a given domain is conserved or variable, and mutants in conserved domains are often defective a priori. In the present method, diversification can be focused on the variable domains. In addition, this can be done for multiple cis domains simultaneously, increasing the combinatorial complexity so that additive or synergistic functions may emerge or compensatory modifications may be incorporated. Advantageously, the method of the invention permits one to control undesired sequence modifications within a selected subdomain (e.g., a constant region), permits reassembly of the various subdomains in the proper orientation such that a full-length open reading frame is obtained, and permits control of subdomains such that a selected subdomain (e.g., a constant region or a hypervariable domain) is not altered. For example, in one embodiment, a library of AAV capsid proteins (e.g., vp1, the full-length capsid protein, or vp2 or vp3) or of AAV particles may be desirable. In another embodiment, a library of AAV rep proteins may be desired.

In one embodiment, the method of the invention permits one to generate and select novel, tailor-made AAV gene transfer vectors with desirable properties for clinical or commercial use. For example, the method of the invention is useful for generating combinatorial diversity in the hypervariable and/or conserved regions of the proteins of adeno-associated virus (AAV), a gene therapy vector, in order to select and identify novel vectors with desirable properties. For example, one may wish to screen an AAV library for variants that combine the tropism of one AAV (e.g., hepatotropism) with low susceptibility to antibody-mediated neutralization in humans.

The structures of a number of AAVs, including AAV1, 2, 4, 5, 6, 7, 8 and 9, have been determined by cryo-electron microscopy and/or X-ray crystallography [Q. Xie, et al., Proc Natl Acad Sci U S A. 2002 August 6; 99(16): 10405-10410 (AAV2 crystal structure); M. Mitchell et al., Acta Crystallogr Sect F Struct Biol Cryst Commun, 2009 Jul 1; 65(Pt 7): 715-718, Epub 2009 Jun 27 (preliminary AAV9 structure studies); Quesada, O., et al., Acta Crystallogr Sect F Struct Biol Cryst Commun 63, 1073-1076 (2007) (AAV7 crystal structure); Nam, H.J., et al., J Virol 81, 12260-12271 (2007) (AAV8 crystal structure); Miller, E.B., et al., Acta Crystallogr Sect F Struct Biol Cryst Commun 62, 1271-1274 (2006) (AAV1 crystal structure); Padron, E., et al., J Virol 79, 5047-5058 (2005) (AAV4 crystal structure); Walters, R.W., et al., J Virol 78, 3361-3371 (2004) (AAV5 structure)]. Comparative analysis of these AAV capsid structures shows that common variable surface regions are associated with functional variation among AAV capsids, such as receptor recognition (variable regions V, VIII, and IX), transduction efficiency (variable regions I, II-IV, V, VI, and IX), and antigenic recognition (variable regions I, III, IV, V, VI, VII, VIII, and IX). Using this information, one can readily identify an ROI either to be changed or not changed within a selected AAV. For example, for an AAV for which no crystal structure is available, one can predict variable or constant regions that may be desirable ROIs by aligning its sequence with those of other AAVs and taking into consideration the information available on the three-dimensional structures of other AAV proteins.

Similar methods can be applied to other proteins and their coding sequences by taking into consideration the secondary and/or tertiary structure of the protein, a target domain (e.g., a binding site or epitope), and/or another functional domain.

Alignments are performed using any of a variety of publicly or commercially available multiple sequence alignment programs. Examples of such programs include "Clustal W", "CAP Sequence Assembly", "MAP", and "MEME", which are accessible through web servers on the internet. Other sources for such programs are known to those of skill in the art. Alternatively, Vector NTI utilities can be used. There are also a number of algorithms known in the art that can be used to measure nucleotide sequence identity, including those contained in the programs described above. As another example, polynucleotide sequences can be compared using Fasta™, a program in GCG Version 6.1. Fasta™ provides alignments and percent sequence identity of the regions of best overlap between the query and search sequences. For instance, percent sequence identity between nucleic acid sequences can be determined using Fasta™ with its default parameters (a word size of 6 and the NOPAM factor for the scoring matrix) as provided in GCG Version 6.1, herein incorporated by reference. Multiple sequence alignment programs are also available for amino acid sequences, e.g., the "Clustal X", "MAP", "PIMA", "MSA", "BLOCKMAKER", "MEME", and "Match-Box" programs. Generally, any of these programs is used at default settings, although one of skill in the art can alter these settings as needed. Alternatively, one of skill in the art can utilize another algorithm or computer program that provides at least the level of identity or alignment provided by the referenced algorithms and programs. See, e.g., J. D. Thompson et al., "A comprehensive comparison of multiple sequence alignment programs", Nucleic Acids Res. 27(13):2682-2690 (1999).
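Programs differ in how they score gaps, so percent identity is not uniquely defined. As a hedged illustration only (this is not the Fasta™ or Clustal algorithm), the sketch below computes identity over the columns of an existing pairwise alignment. The helper name `percent_identity` and the gap convention are assumptions: columns that are gaps in both rows are skipped, and a gap opposite a residue counts as a mismatch.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of an existing pairwise alignment.

    Columns that are gaps in both rows are skipped; a gap opposite a residue
    counts as a mismatch. Other programs may use a different denominator."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(round(percent_identity("ACGT-A", "ACCTTA"), 1))  # 66.7
```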

An algorithm developed to determine areas of sequence divergence in AAV2 has yielded 12 hypervariable regions (HVRs), of which 5 overlap with or are part of the four variable regions [Chiorini et al., J. Virol. 73:1309-19 (1999); Rutledge et al., J. Virol. 72:309-319]. Using this algorithm and/or the alignment techniques described herein, the HVRs of a selected AAV serotype can be determined. For example, the HVRs are located as follows: HVR1, aa 146-152; HVR2, aa 182-186; HVR3, aa 262-264; HVR4, aa 381-383; HVR5, aa 450-474; HVR6, aa 490-495; HVR7, aa 500-504; HVR8, aa 514-522; HVR9, aa 534-555; HVR10, aa 581-594; HVR11, aa 658-667; and HVR12, aa 705-719 [the numbering system is based on an alignment that uses AAV2 vp1 as the point of reference]. Using an alignment prepared as described herein, e.g., with the Clustal X program at default settings, or with other commercially or publicly available alignment programs at default settings such as those described herein, one of skill in the art can readily determine the corresponding fragments of the novel protein coding sequences and novel proteins (e.g., AAV capsids) of the invention.
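Locating the "corresponding fragment" of a new capsid amounts to carrying reference (AAV2 vp1) residue numbers through a pairwise alignment. The Python sketch below assumes a gapped alignment with '-' for gaps has already been produced by one of the programs above; the function names and the toy sequences are hypothetical, while the HVR boundaries are those listed in the preceding paragraph.

```python
# HVR boundaries listed above, in AAV2 vp1 reference numbering (inclusive).
HVR_AAV2_VP1 = {
    1: (146, 152), 2: (182, 186), 3: (262, 264), 4: (381, 383),
    5: (450, 474), 6: (490, 495), 7: (500, 504), 8: (514, 522),
    9: (534, 555), 10: (581, 594), 11: (658, 667), 12: (705, 719),
}


def map_reference_positions(aligned_ref: str, aligned_query: str) -> dict[int, int]:
    """Map 1-based reference residue numbers to 1-based query residue numbers
    through a pairwise alignment ('-' = gap). Reference residues opposite a
    query gap have no counterpart and are left out."""
    mapping = {}
    ref_pos = query_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_pos += 1
        if q != "-":
            query_pos += 1
        if r != "-" and q != "-":
            mapping[ref_pos] = query_pos
    return mapping


def corresponding_region(aligned_ref: str, aligned_query: str, start: int, end: int) -> str:
    """Return the query segment aligned to reference residues start..end,
    including any query insertions that fall inside that span."""
    mapping = map_reference_positions(aligned_ref, aligned_query)
    hits = [mapping[p] for p in range(start, end + 1) if p in mapping]
    if not hits:
        return ""
    query = aligned_query.replace("-", "")
    return query[min(hits) - 1 : max(hits)]


# Toy alignment (not real AAV sequence): the query has one insertion and one deletion.
ref, qry = "MA-SKLT", "MAGS-LT"
print(corresponding_region(ref, qry, 2, 3))  # AGS
# With real (hypothetical) alignments: corresponding_region(aav2_aln, new_aln, *HVR_AAV2_VP1[10])
```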

Still other protein coding sequences can be readily used in the method of the invention by one of skill in the art, without limitation to those discussed or illustrated herein.

The following examples are illustrative only and do not limit the present invention.

Example 1 - Engineering of Functional Chimera of AAV8 and rh32.33

AAV8 and rh32.33 are structurally and phylogenetically two of the most distinct primate AAVs (see, e.g., the dendrogram of Fig. 2A). Fig. 2B provides several quantitative measures of AAV8 and rh32.33 neutralization by human serum: intravenous immune globulin (IVIG) neutralization in the in vitro neutralizing antibody (NAb) assay (the titer represents the dilution giving 50% transduction inhibition); an estimate of seroprevalence in 888 individuals from worldwide populations (Europe, US, Africa and Australia); and the lowest IVIG dose at which statistically significant in vivo transduction inhibition is observed at a vector dose of 10^M genome copies (GC) per mouse.
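The NAb titer above is defined as the dilution that gives 50% transduction inhibition. One simple way to read such a titer off a dilution series, assumed here for illustration because the text does not specify the calculation, is to report the highest reciprocal dilution at which the reporter signal is at most 50% of the no-antibody control. The function name and the example numbers are hypothetical.

```python
from typing import Optional, Sequence


def nab50_titer(dilutions: Sequence[int], relative_signal: Sequence[float]) -> Optional[int]:
    """Highest reciprocal dilution with >= 50% transduction inhibition.

    dilutions: reciprocal serum/IVIG dilutions, e.g. 20 for 1:20.
    relative_signal: transduction at each dilution divided by the
        no-antibody control (1.0 means no inhibition).
    Returns None if no dilution reaches 50% inhibition."""
    qualifying = [d for d, s in zip(dilutions, relative_signal) if 1.0 - s >= 0.5]
    return max(qualifying) if qualifying else None


# Hypothetical dilution series.
print(nab50_titer([20, 40, 80, 160, 320], [0.05, 0.20, 0.45, 0.70, 0.95]))  # 80 (i.e., 1:80)
```

Laboratories often interpolate between the two dilutions that bracket 50% instead; the threshold rule is used here only because it is the shortest correct reading of the definition.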

The goal of the following experiment is to illustrate the combination of functional determinants of the AAV8 and rh32.33 capsids to obtain a chimera that is both low in seroprevalence and high in liver transduction efficiency. Fig. 3A is a cartoon showing the viral genome backbones of pAAV and pAAVivo. Whereas both systems receive BglII-SpeI-flanked AAV capsid genes (here AAV8), pAAV yields replication-competent particles owing to the presence of Rep in its wild-type-like backbone. In contrast, pAAVivo is replication-deficient because Rep is replaced with 1) a CMV promoter driving capsid expression and 2) a PGK.ZsGreen minigene for diagnostic follow-up and enrichment of transduced cells. CMV-driven cap expression permits in vivo application of the directed evolution system, because capsid genomes can be rescued at the RNA level.

Briefly, a first-cycle PCR uses primers designed by a bioinformatics algorithm to amplify the hypervariable (H) and constant (C) domains of the AAV capsid structure. For example, the primers ordered for combinatorial shuffling of AAVrh32.33 and AAV8 are listed below. The primers are named using the convention X.Y#Pd, where X.Y defines the transition at the junction, X being the parent of the left fragment (5' on the dsDNA) and Y the parent of the right fragment (3' on the dsDNA); # identifies the domain; P (position) defines the position of the junction on that domain, i.e., here the start (S) or the end (E) of the domain; and d (direction) defines the direction of the primer, i.e., sense (s) or antisense (as). For example, "32.8IIEas" is the antisense (as) primer located at the end (E) of hypervariable domain II (II) that defines a junction of rh32.33 (5') and AAV8 (3') (32.8) in the final dsDNA molecule.
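The naming convention can be made explicit with a small parser. This is an illustrative Python sketch, not part of the described method: the regular expression, the `PARENTS` code table (8 for AAV8, 32 for rh32.33) and the function name are assumptions inferred from the convention and the example above, and names must be written in their clean form (e.g., "8.32VI+VIISs").

```python
import re

# Parent codes used in the primer names (inferred from the convention above).
PARENTS = {"8": "AAV8", "32": "rh32.33"}

# X.Y#Pd: left parent, right parent, domain (Roman numerals, '+' for merged
# domains such as VI+VII), position (S/E), direction (s/as).
NAME_RE = re.compile(
    r"^(?P<left>\d+)\.(?P<right>\d+)(?P<domain>[IVX+]+)(?P<pos>[SE])(?P<dir>as|s)$"
)


def parse_primer_name(name: str) -> dict:
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not in X.Y#Pd form: {name!r}")
    return {
        "left_parent": PARENTS[match["left"]],
        "right_parent": PARENTS[match["right"]],
        "domain": match["domain"],
        "position": {"S": "start", "E": "end"}[match["pos"]],
        "direction": {"s": "sense", "as": "antisense"}[match["dir"]],
    }


print(parse_primer_name("32.8IIEas"))
# {'left_parent': 'rh32.33', 'right_parent': 'AAV8', 'domain': 'II', 'position': 'end', 'direction': 'antisense'}
```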

Primer sequence (5'-3')  Name  SEQ ID NO:
aacaaccacctctacctgcggctcggaaca  8.32ISs  1
aacaaccacttgtacaagcaaatctccaac  32.8ISs  2
tgttccgagccgcaggtagaggtggttgtt  8.32ISas  3
gttggagatttgcttgtacaagtggttgtt  32.8ISas  4
acctacttcggctactccaccccctgggga  8.32IEs  5
acctacaacggattcagcaccccctggggg  32.8IEs  6
tccccagggggtggagtagccgaagtaggt  8.32IEas  7
cccccagggggtgctgaatccgttgtaggt  32.8IEas  8
gtcaaggaggtcacgacgtcgaacggc  8.32IISs  9
gttaaggaggtcacacagaatgaaggc  32.8IISs  10
gccgttcgacgtcgtgacctccttgac  8.32IISas  11
gccttcattctgtgtgacctccttaac  32.8IISas  12
accaagaccatcgctaataaccttacc  8.32IIEs  13
gagactacggtcgccaataacctcacc  32.8IIEs  14
ggtaaggttattagcgatggtcttggt  8.32IIEas  15
ggtgaggttattggcgaccgtagtctc  32.8IIEas  16
ccccagtacggctactgtggcattgtgact  8.32IIISs  17
cctcaatatggctacctaacactcaacaac  32.8IIISs  18
agtcacaatgccacagtagccgtactgggg  8.32IIISas  19
gttgttgagtgttaggtagccatattgagg  32.8IIISas  20
gtgggacgctcctccttctactgcctgga  8.32IIIEs  21
acggacagaaatgctttctactgcctgga  32.8IIIEs  22
tccaggcagtagaaggaggagcgtcccac  8.32IIIEas  23
tccaggcagtagaaagcatttctgtccgt  32.8IIIEas  24
attgaccagtacctgtggcacttacagtcg  8.32IVSs  25
ctggaccagtacctgtactacttgtctcgg  32.8IVSs  26
cgactgtaagtgccacaggtactggtcaat  8.32IVSas  27
ccgagacaagtagtacaggtactggtccag  32.8IVSas  28
atggccaatcaggcaaagaactggctgcc  8.32IVEs  29
tttgccttttacagaaagaactggctgcc  32.8IVEs  30
ggcagccagttctttgcctgattggccat  8.32IVEas  31
ggcagccagttctttctgtaaaaggcaaa  32.8IVEas  32
tggctgccaggaccctgtgttaaacagcag  8.32VSs  33
tggctgcctgggccttgttaccgccaacaa  32.8VSs  34
ctgctgtttaacacagggtcctggcagcca  8.32VSas  35
ttgttggcggtaacaaggcccaggcagcca  32.8VSas  36
gggaccaaataccatttaaacaaccgctgg  8.32VEs  37
gacacccactataccctgaatggaagaaat  32.8VEs  38
ccagcggttgtttaaatggtatttggtccc  8.32VEas  39
atttcttccattcagggtatagtgggtgtc  32.8VEas  40
atcgctatggcaacagctggaccttcagat  8.32VI+VIISs  41
cctccaatggcaacacacaaagacgacgag  32.8VI+VIISs  42
atctgaaggtccagctgttgccatagcgat  8.32VI+VIISas  43
ctcgtcgtctttgtgtgttgccattggagg  32.8VI+VIISas  44
agcgatgtcatgctcacatcagaagaagaa  8.32VI+VIIEs  45
aacaatctgttgtttaccagcgaggaagaa  32.8VI+VIIEs  46
ttcttcttctgatgtgagcatgacatcgct  8.32VI+VIIEas  47
ttcttcctcgctggtaaacaacagattgtt  32.8VI+VIIEas  48
acagaggaatacggtcagattgctgacaat  8.32VIIISs  49
acggacatgtttggtatcgtggcagataac  32.8VIIISs  50
attgtcagcaatctgaccgtattcctctgt  8.32VIIISas  51
gttatctgccacgataccaaacatgtccgt  32.8VIIISas  52
agccagggggccttacctggcatggtgtgg  8.32VIIIEs  53
gctatgggagtgcttcccggtatggtctgg  32.8VIIIEs  54
ccacaccatgccaggtaaggccccctggct  8.32VIIIEas  55
ccagaccataccgggaagcactcccatagc  32.8VIIIEas  56
tacacctccaactacgggaaccagtcttct  8.32IXSs  57
tttacttcaaactattacaaatctacaagt  32.8IXSs  58
agaagactggttcccgtagttggaggtgta  8.32IXSas  59
acttgtagatttgtaatagtttgaagtaaa  32.8IXSas  60
aacaaccacctctacaagcaaatctccaac  8.8ISs  61
acctacttcggctacagcaccccctggggg  8.8IEs  62
gtcaaggaggtcacgcagaatgaaggc  8.8IISs  63
accaagaccatcgccaataacctcacc  8.8IIEs  64
ccccagtacggctacctaacactcaacaac  8.8IIISs  65
gtgggacgctcctccttctactgcctgga  8.8IIIEs  66
attgaccagtacctgtactacttgtctcgg  8.8IVSs  67
atggccaatcaggcaaagaactggctgcc  8.8IVEs  68
tggctgccaggaccctgttaccgccaacaa  8.8VSs  69
gggaccaaataccatctgaatggaagaaat  8.8VEs  70
atcgctatggcaacacacaaagacgacgag  8.8VI+VIISs  71
agcgatgtcatgctcaccagcgaggaagaa  8.8VI+VIIEs  72
acagaggaatacggtatcgtggcagataac  8.8VIIISs  73
agccagggggccttacccggtatggtctgg  8.8VIIIEs  74
tacacctccaactactacaaatctacaagt  8.8IXSs  75
gttaatacagaaggcgtgtactctgaaccc  8.8IXEs  76
aacaaccacttgtacctgcggctcggaaca  32.32ISs  77
acctacaacggattctccaccccctgggga  32.32IEs  78
gttaaggaggtcacaacgtcgaacggc  32.32IISs  79
gagactacggtcgctaataaccttacc  32.32IIEs  80
cctcaatatggctactgtggcattgtgact  32.32IIISs  81
acggacagaaatgctttctactgcctgga  32.32IIIEs  82
ctggaccagtacctgtggcacttacagtcg  32.32IVSs  83
tttgccttttacagaaagaactggctgcc  32.32IVEs  84
tggctgcctgggccttgtgttaaacagcag  32.32VSs  85
gacacccactataccttaaacaaccgctgg  32.32VEs  86
cctccaatggcaacagctggaccttcagat  32.32VI+VIISs  87
aacaatctgttgtttacatcagaagaagaa  32.32VI+VIIEs  88
acggacatgtttggtcagattgctgacaat  32.32VIIISs  89
gctatgggagtgcttcctggcatggtgtgg  32.32VIIIEs  90
tttacttcaaactatgggaaccagtcttct  32.32IXSs  91
cccgatacaactgggaagtatacagagccg  32.32IXEs  92
gttggagatttgcttgtagaggtggttgtt  8.8ISas  93
cccccagggggtgctgtagccgaagtaggt  8.8IEas  94
gccttcattctgcgtgacctccttgac  8.8IISas  95
ggtgaggttattggcgatggtcttggt  8.8IIEas  96
gttgttgagtgttaggtagccgtactgggg  8.8IIISas  97
tccaggcagtagaaggaggagcgtcccac  8.8IIIEas  98
ccgagacaagtagtacaggtactggtcaat  8.8IVSas  99
ggcagccagttctttgcctgattggccat  8.8IVEas  100
ttgttggcggtaacagggtcctggcagcca  8.8VSas  101
atttcttccattcagatggtatttggtccc  8.8VEas  102
ctcgtcgtctttgtgtgttgccatagcgat  8.8VI+VIISas  103
ttcttcctcgctggtgagcatgacatcgct  8.8VI+VIIEas  104
gttatctgccacgataccgtattcctctgt  8.8VIIISas  105
ccagaccataccgggtaaggccccctggct  8.8VIIIEas  106
acttgtagatttgtagtagttggaggtgta  8.8IXSas  107
gggttcagagtacacgccttctgtattaac  8.8IXEas  108
tgttccgagccgcaggtacaagtggttgtt  32.32ISas  109
tccccagggggtggagaatccgttgtaggt  32.32IEas  110
gccgttcgacgttgtgacctccttaac  32.32IISas  111
ggtaaggttattagcgaccgtagtctc  32.32IIEas  112
agtcacaatgccacagtagccatattgagg  32.32IIISas  113
tccaggcagtagaaagcatttctgtccgt  32.32IIIEas  114
cgactgtaagtgccacaggtactggtccag  32.32IVSas  115
ggcagccagttctttctgtaaaaggcaaa  32.32IVEas  116
ctgctgtttaacacaaggcccaggcagcca  32.32VSas  117
ccagcggttgtttaaggtatagtgggtgtc  32.32VEas  118
atctgaaggtccagctgttgccattggagg  32.32VI+VIISas  119
ttcttcttctgatgtaaacaacagattgtt  32.32VI+VIIEas  120
attgtcagcaatctgaccaaacatgtccgt  32.32VIIISas  121
ccacaccatgccaggaagcactcccatagc  32.32VIIIEas  122
agaagactggttcccatagtttgaagtaaa  32.32IXSas  123
cggctctgtatacttcccagttgtatcggg  32.32IXEas  124
gttaatacagaaggcaagtatacagagccg  8.32IXEs  125
cccgatacaactggggtgtactctgaaccc  32.8IXEs  126
cggctctgtatacttgccttctgtattaac  8.32IXEas  127
gggttcagagtacaccccagttgtatcggg  32.8IXEas  128
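Within each junction, the sense and antisense primers listed above are reverse complements of one another (for example, SEQ ID NOs: 1 and 3). A quick consistency check of that relationship, useful when transcribing or ordering such a primer set, might look like the Python sketch below; the helper names are illustrative, and only unambiguous bases (a, c, g, t) are handled.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_sense_antisense_pair(sense: str, antisense: str) -> bool:
    return reverse_complement(sense.lower()) == antisense.lower()


pairs = {
    "8.32IS": ("aacaaccacctctacctgcggctcggaaca", "tgttccgagccgcaggtagaggtggttgtt"),  # SEQ ID NOs 1/3
    "32.8IS": ("aacaaccacttgtacaagcaaatctccaac", "gttggagatttgcttgtacaagtggttgtt"),  # SEQ ID NOs 2/4
}
for junction, (s, a) in pairs.items():
    print(junction, is_sense_antisense_pair(s, a))  # True for both
```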

The SOE oligonucleotide PCR primers used in the first reaction are mixtures of primers with partial complementarity to both AAV8 and AAVrh32.33 at their 5' and 3' extremities (Fig. 3C), and this primer design is the key requirement for making the approach work. In one design, 16 fragments, ranging in size from ~50 to 800 bp, are amplified from AAV8 and rh32.33 and purified by electrophoresis. Because of the mixed primers, the PCR progeny of these initial combinatorial domain diversification building blocks consist of 4 slightly distinct versions of each H or C fragment, whose overlapping regions permit assembly with either an AAV8 or an AAVrh32.33 component at every junction. In a second round of PCR, final combinatorial domain diversification is accomplished by adding these fragments to a single reaction that stitches them together in a directed but combinatorial fashion (Fig. 3D).
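The statement that four versions of each fragment permit assembly with either parent at every junction can be checked with a small combinatorial model. The sketch below is an abstraction under stated assumptions: each building block is represented as (left junction, template parent, right junction), two blocks are assumed to anneal only if they carry the identical junction, the function names are hypothetical, and four domains are used instead of sixteen to keep the enumeration small.

```python
from itertools import product

PARENTS = ("AAV8", "rh32.33")


def building_blocks(n_domains: int) -> list[list[tuple]]:
    """First-round PCR products for each domain position, modelled as
    (left junction, template parent, right junction). A junction is the pair
    (parent 5' of it, parent 3' of it); outer ends carry None."""
    positions = []
    for i in range(n_domains):
        blocks = []
        for body in PARENTS:
            lefts = [None] if i == 0 else [(p, body) for p in PARENTS]
            rights = [None] if i == n_domains - 1 else [(body, p) for p in PARENTS]
            blocks += [(left, body, right) for left in lefts for right in rights]
        positions.append(blocks)
    return positions


def assemblies(n_domains: int) -> list[tuple[str, ...]]:
    """Parent order of every full-length chain whose neighbouring blocks share a junction."""
    chains = []
    for combo in product(*building_blocks(n_domains)):
        if all(a[2] == b[0] for a, b in zip(combo, combo[1:])):
            chains.append(tuple(block[1] for block in combo))
    return chains


blocks = building_blocks(4)
print([len(b) for b in blocks])   # [4, 8, 8, 4]: 4 internal versions per template
print(len(assemblies(4)))         # 16 == 2**4: every parent combination, each assembled once
```

Each internal domain needs 2 left x 2 right junction variants per template, which is where the four versions per fragment come from; with them, every one of the 2^n parent combinations can be assembled and none is assembled twice.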

Preliminary experiments have yielded a small library consisting of several hybrid open reading frames (ORFs) with multiple rh32.33/AAV8 recombination junctions. Some of these hybrids were produced and yielded titers of functional virus. In addition, as a proof of concept, capsid ORFs that combine all C domains of AAV8 with all H domains of rh32.33, and vice versa, have been successfully generated and confirmed by sequencing.

The resulting hybrid ORF library can be used to generate a library of AAVs with hybrid capsids. This may be accomplished by cloning the hybrid ORFs into a suitable plasmid backbone such as pAAVivo (e.g., using the SpeI site) and generating vectors using conventional methods. Optionally, the ORFs can be cloned into the BglII-SpeI site of pAAVivo and subjected to in vivo selection in an IVIG passive-transfer C57BL/6 mouse model. It is anticipated that capsids rescued by RT-PCR in this system will have undergone purifying selection both for liver transduction and for escape from IVIG neutralization.

Example 2 - Construction of Combinatorial Library Based upon Diversified AAV8 Region of Interest

A diversified library of AAV8 capsids was prepared by diversifying hypervariable region 8 (HVR8) [amino acids 583, 588, 589, and 594-597, based on the numbering of AAV8 vp1, SEQ ID NO: 165]. The resulting capsids were assembled into vectors and assessed for recognition by neutralizing antibodies (NAbs). Those that escaped NAbs were marked for further study.

A. Degenerate PCR

A collection of AAV8 nucleic acid sequences within the open reading frame for hypervariable region 8 was amplified using degenerate PCR primers. Degenerate PCR primers are well suited for introducing controlled variation in this HVR (the region of interest). In the present case, the goal is to generate AAV8 capsids having variants in the AAV8 HVR8 region in order to minimize or avoid the effect of neutralizing antibodies.

Primer sequence (5'-3')  Name  SEQ ID NO:
GGCTCACGTCTCTGTAGCCACAGGGTTAGTGGTT  A2B2VIII.-1  129
CGGACACGTCTCGCTACAGAGGAATACGGTATCGTG  A2B2VIII.F  130
GGCTCACGTCTCGGTAAGGCCCCCTGGCTG  A2B2VIII.R  131
CGGACACGTCTCCTTACCCGGTATGGTCTGGCAGAA  A2B2VIII.F2  132
TGGACCGGCTGATGAATCCT  AV8VP1.1298.F  133
CGGTGCTGTATTGCGTGATG  AV8VP1.2035.R  134
CTACAGAGGAATACGGTATCGTGNNKGATAACTTGCAGNNKNNKAACACGGCTCCTNNKNNKNNKNNKGTCAACAGCCAGGGGGCCTTAC  A2.VIII.P1-1  135
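In the degenerate primer A2.VIII.P1-1, each targeted codon is written NNK (IUPAC codes: N = A/C/G/T, K = G/T). The Python sketch below, which uses the standard genetic code, shows why this scheme is common: its 32 codons encode all 20 amino acids with only one stop codon (TAG). The last line gives the theoretical nucleotide diversity if seven codons are randomized, as for the seven positions listed in part C; it is an upper bound, not a measured library size. The helper name `expand` is illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

# IUPAC degeneracy codes used in the primer (N = any base, K = G or T).
IUPAC = {"N": "ACGT", "K": "GT"}


def expand(degenerate: str) -> list[str]:
    """All concrete sequences matched by a degenerate DNA string."""
    return ["".join(b) for b in product(*(IUPAC.get(base, base) for base in degenerate))]


nnk = expand("NNK")
encoded = Counter(CODON_TABLE[codon] for codon in nnk)
print(len(nnk))                                   # 32 codons
print(len([aa for aa in encoded if aa != "*"]))   # 20 amino acids
print(encoded["*"])                               # 1 stop codon (TAG)
# Theoretical nucleotide diversity with seven NNK codons (upper bound, not a library size).
print(len(nnk) ** 7)                              # 34359738368
```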

The PCR was performed at 98°C for 10 s and 66°C for 15 s, for 30 cycles.

B. PCR2

All PCR products from subpart A above were purified with the QIAquick™ PCR Purification Kit (Qiagen), combined, digested with BsmBI (New England Biolabs) and purified again. T4 DNA ligase and its buffer were added to the purified mix, which was kept at 16°C overnight. The overnight ligation was then purified with the QIAquick™ PCR Purification Kit (Qiagen) and digested with BsmBI, and a 428-bp fragment was extracted with the QIAquick™ Gel Extraction Kit (Qiagen). The 428-bp fragment was ligated overnight using T4 DNA ligase at 16°C to a 6908-bp fragment prepared from a pAAV2/8 plasmid that had been cut with BsmBI, dephosphorylated and then purified. [The pAAV2/8 plasmid is described in G. Gao, et al. (2002), Proc. Natl. Acad. Sci., 99, 11854-11859: the XhoI site of the p5E18 plasmid at 3,169 bp was ablated, and the modified plasmid was restricted with XbaI and XhoI in a complete digestion to remove the AAV2 cap gene and replace it with a 2,267-bp SpeI-XhoI fragment containing the AAV8 cap gene, producing the pAAV2/8 packaging construct, which retains AAV2 ITRs.] One of the ligation products of the 428-bp fragment and the pAAV2/8 fragment was used as the template for the following PCR.
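BsmBI is a type IIS enzyme that recognizes CGTCTC and cuts outside it (CGTCTC(N1)/(N5)), leaving a 4-nt 5' overhang whose sequence is set by the primer designer. Reading the overhangs off the primers listed in part A (SEQ ID NOs: 129-132) suggests that the BsmBI-tailed primers were designed in pairs whose digested ends are complementary (GTAG/CTAC and GTAA/TTAC); this interpretation is inferred from the sequences and is not stated in the text. A minimal Python sketch of that reading, which considers only sites on the strand as written and uses a hypothetical helper name:

```python
from typing import Optional

BSMBI_SITE = "CGTCTC"  # BsmBI recognition sequence; cuts CGTCTC(N1)/(N5)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def bsmbi_overhang(primer: str) -> Optional[str]:
    """4-nt 5' overhang produced downstream of the first CGTCTC site in the
    primer as written (sites on the opposite strand are ignored)."""
    primer = primer.upper()
    i = primer.find(BSMBI_SITE)
    if i == -1:
        return None
    start = i + len(BSMBI_SITE) + 1     # top strand is cut 1 nt past the site
    overhang = primer[start:start + 4]  # bottom strand is cut 5 nt past the site
    return overhang if len(overhang) == 4 else None


primers = {
    "A2B2VIII.-1": "GGCTCACGTCTCTGTAGCCACAGGGTTAGTGGTT",
    "A2B2VIII.F": "CGGACACGTCTCGCTACAGAGGAATACGGTATCGTG",
    "A2B2VIII.R": "GGCTCACGTCTCGGTAAGGCCCCCTGGCTG",
    "A2B2VIII.F2": "CGGACACGTCTCCTTACCCGGTATGGTCTGGCAGAA",
}
ends = {name: bsmbi_overhang(seq) for name, seq in primers.items()}
print(ends)  # {'A2B2VIII.-1': 'GTAG', 'A2B2VIII.F': 'CTAC', 'A2B2VIII.R': 'GTAA', 'A2B2VIII.F2': 'TTAC'}
print(ends["A2B2VIII.F"] == reverse_complement(ends["A2B2VIII.-1"]))   # True
print(ends["A2B2VIII.F2"] == reverse_complement(ends["A2B2VIII.R"]))  # True
```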

Using the reagents in the following table, the PCR was performed at 98°C for 10 s and 72°C for 60 s for 25 cycles, followed by 72°C for 7 minutes.

C. Cloning into plasmid

A fragment of ~2.2 kb was isolated by electrophoresis of the PCR product, digested with AarI (Fermentas) and SpeI (New England Biolabs), and purified. The purified fragment was ligated overnight with T4 DNA ligase at 16°C to a 5478-bp fragment prepared by double digestion of the pAAVivo plasmid, dephosphorylated and then purified. The ligation was transformed into Escherichia coli Stbl4 (Invitrogen). The transformation was transferred to TB medium (with 60 μg/mL carbenicillin) and cultured overnight, and the plasmid was extracted with the EndoFree Plasmid Purification Mega Kit (Qiagen). This was the final plasmid for the mutagenesis library of AAV8 VP1 (amino acids 583, 588, 589, 594, 595, 596, and 597; VP1 numbering, SEQ ID NO: 165).

D. Small-scale packaging of the AAV mutagenesis library

A triple transfection protocol was used to package the library.

0.83 μg of the library plasmid was mixed with 1.66 μg of a helper plasmid (AF6), 0.83 μg of another helper plasmid (pRep) and 8.33 μL of CaCl₂ (2.5 M), and the final volume was brought to 83.3 μL with water. The mixture was then quickly mixed with an equal volume of 2x HEPES-buffered saline (HBS), and the total mixture was applied to 293 cells. Three days later, the cells were harvested, lysed by three freeze/thaw cycles and spun down at 13,000 rpm for 10 min at room temperature; Benzonase (Merck) was then added to the supernatant (final concentration: 41.7 U/mL) and incubated at 37°C for 30 min. The final AAV library product was stored at -20°C.
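The reagent amounts above can be checked with the dilution relation C1V1 = C2V2. Assuming, as described, that the 8.33 μL of 2.5 M CaCl₂ is diluted into the full 83.3 μL DNA mix and that this mix is then combined with an equal volume of 2x HBS, the CaCl₂ concentrations work out as below; the DNA masses correspond to a 1:2:1 library:AF6:pRep ratio. A short Python check (the helper name is illustrative):

```python
def diluted_concentration(stock: float, stock_volume: float, final_volume: float) -> float:
    """C1*V1 = C2*V2 solved for C2; units follow the inputs."""
    return stock * stock_volume / final_volume


# Values from the protocol above (concentrations in M, volumes in uL, DNA in ug).
cacl2_dna_mix = diluted_concentration(2.5, 8.33, 83.3)               # ~0.25 M
cacl2_with_hbs = diluted_concentration(cacl2_dna_mix, 83.3, 166.6)   # ~0.125 M after an equal volume of 2x HBS
dna = {"library": 0.83, "AF6": 1.66, "pRep": 0.83}
print(round(cacl2_dna_mix, 3), round(cacl2_with_hbs, 4))  # 0.25 0.125
print(round(sum(dna.values()), 2), {k: round(v / dna["library"], 2) for k, v in dna.items()})
```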

Nineteen (19) unique nucleic acid sequences were recovered, corresponding to 12 unique amino acid sequences. The unique amino acid sequences are shown in the table below.
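Collapsing recovered clones to unique amino acid sequences amounts to translating each unique nucleotide sequence and grouping synonymous variants. The Python sketch below uses the standard genetic code; the helper names are illustrative, and the clone sequences are hypothetical three-codon examples, not the recovered HVR8 sequences.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def group_by_protein(clones: list[str]) -> dict[str, list[str]]:
    """Map each unique translation to the unique nucleotide variants encoding it."""
    groups: dict[str, list[str]] = {}
    for nt in dict.fromkeys(c.upper() for c in clones):  # de-duplicate, keep order
        groups.setdefault(translate(nt), []).append(nt)
    return groups


clones = ["CTGCAGCAG", "TTGCAGCAG", "CTGCGGCAG", "CTGCAGCAG"]  # hypothetical
groups = group_by_protein(clones)
print(len(set(clones)), "unique nucleotide,", len(groups), "unique protein")  # 3 unique nucleotide, 2 unique protein
print(groups)  # {'LQQ': ['CTGCAGCAG', 'TTGCAGCAG'], 'LRQ': ['CTGCGGCAG']}
```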

The sequences in the table below are listed in order of homology to AAV8, with cDNA.9 being the most similar and cDNA.6 the least similar. Based on the primary alignment, the sequences cluster into 4 groups (G1: cDNA.9, gDNA.5 and gDNA.7; G2: cDNA.2, gDNA.2 and cDNA.3; G3: cDNA.8, cDNA.10 and gDNA.3; G4: gDNA.10, cDNA.1 and cDNA.6). * The mutation in this clone lies outside the targeted area; testing of this clone was therefore terminated.

Overall, significant diversification was achieved at all targeted positions within HVR8 (see the table below). Variants generated via the library approach of the invention (double underlined) differ from the natural variants, and the majority of the library substitutions do not occur naturally. Some of the naturally occurring amino acids were not represented within the library at all (boxed).

E. In vitro Selection

1 x 10⁹ genome copies (GC) of the AAV library was mixed with 0.5 μL of ADK8 (AAV8 NAb titer 1:2560), incubated at 37°C for 37 min, and then applied to 293 cells. Two days later, the cell culture was split at a ratio of 1:5. Two days after that, the cells were transfected with plasmids AF6 and pRep (per well of a 6-well plate, 2.49 μg of AF6 and 0.83 μg of pRep). Two days later, the cells were harvested, and RNA and genomic DNA were extracted. RNA was converted to cDNA with the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) using the random primers provided with the kit. The sequences of the target region of AAV were retrieved by the following PCR.

The PCR conditions were as follows: 94°C for 30 s, 61°C for 30 s, and 72°C for 45 s, for 40 cycles.

F. Packaging of the individual mutants

1. Small-Scale

The PCR product was purified with the QIAquick™ Gel Extraction Kit and inserted into the pCR4-Topo plasmid (Invitrogen) according to the manufacturer's manual. The resulting plasmids were treated with BsmBI, and the 428-bp fragments were isolated and inserted back into the pAAV2/8 backbone using techniques known to one of ordinary skill in the art.

2. Large-Scale

The pAAV2/8 plasmids carrying the inserts were submitted for sequencing and then used as the trans plasmid in the following transfection to package individual AAV vectors. Packaging of the individual AAV vectors was essentially as described in D above, with the pAAV2/8 plasmids carrying the inserts replacing the library plasmid and pAAV.CMV.EGFP replacing the pRep plasmid.

G. In vitro Neutralizing Antibody Assay

1 x 10⁹ GC of each AAV mutant was mixed with different monoclonal antibodies (ADK8, [NAb]_AAV8 = 1:2560, 0.5 μL/well; ADK8/9; no-antibody control: medium), made up to 100 μL with medium, incubated at 37°C for 30 minutes, and then applied to 293 cells (5 x 10⁴ cells/well, seeded in a 96-well plate one day before infection). GFP expression was photographed by microscopy and semi-quantified with ImageJ. The recognition epitope of the ADK8 neutralizing antibody is LQQQNT. The majority of the library substitutions do not occur naturally, yet they result in packageable and infective virus that escapes recognition by the anti-AAV8 antibody. These results are illustrated in Figure 4. Of particular interest is cDNA2 (cDNA42), which has a considerably higher level of infectivity and incorporates the most unexpected amino acid substitutions, specifically R and W at positions 589 and 595.

All publications cited in this specification are incorporated herein by reference. It will be appreciated that modifications can be made to the compositions and methods described herein without departing from the spirit of the invention embodied in the claims. Such modifications are intended to fall within the scope of the appended claims.

Claims


1. A method for the directed production of a combinatorial library of an altered protein coding sequence, the method comprising:

(a) predetermining at least a first region of interest (ROI) and at least a second ROI within a protein coding sequence, and providing at least a first primer set which is specific for at least a first subdomain containing the at least first ROI and at least a second primer set which is specific for at least a second subdomain containing the at least second ROI, wherein the subdomains which comprise the at least first subdomain and the at least second subdomain are nucleic acid fragments which span the full length of the protein coding sequence;

(b) specifically amplifying each of the subdomains with a series of primer sets which comprises the at least first primer set and the at least second primer set, wherein each of the primer sets consists of a right primer and a left primer which are complementary to a junction located between consecutive fragments in the protein coding sequence, wherein a predetermined subset of the amplified subdomains containing the at least first ROI and/or the at least second ROI is subjected to different conditions in order to generate subdomains with altered sequences in the at least first ROI and/or the at least second ROI;

(c) directionally and positionally assembling the amplified subdomains of (b) to generate a combinatorial library of full-length protein coding sequences which comprise the altered first ROI sequences and/or the altered second ROI sequences.

2. The method according to claim 1, wherein (b) further comprises separating one or more preselected amplified subdomains and subjecting the preselected subdomains to different polymerase chain reaction conditions in order to generate sequence diversity in the preselected subdomains.

3. The method according to claim 1, wherein (b) further comprises amplifying a subset of the subdomains containing the first ROI and/or the second ROI under conditions which prevent sequence alterations.

4. The method according to claim 1, wherein primer sets are provided for at least two different subdomains which comprise the at least first ROI, such that the primer sets amplify subdomains which are different templates in the protein coding sequence which comprise the at least first ROI.

5. A method for the directed production of a combinatorial library of an altered adeno-associated virus sequence, the method comprising:

(a) providing a full length AAV gene sequence in which at least two regions of interest (ROI) have been identified;

(b) providing at least a first primer set which specifically amplifies one of the at least two ROI, wherein said first primer set consists of a right primer (P_R1) and a left primer (P_L1), wherein one of P_R1 or P_L1 has 5' complementarity to nucleotide sequences in the sequence flanking the first ROI and the other has 3' complementarity to the first ROI;

(c) providing at least a second primer set which specifically amplifies a second of the at least two ROI, wherein said second primer set consists of a right primer (P_R2) and a left primer (P_L2), wherein one of P_R2 or P_L2 has 5' complementarity to nucleotide sequences in the sequence flanking the second ROI and the other has 3' complementarity to the second ROI;

(d) generating a series of building blocks which correspond to subdomains within the full-length gene sequence which subdomains when assembled span the full length AAV gene sequence;

wherein a subset of the building blocks contain altered sequences from at least one of the first or second ROI to form a plurality of diverse building blocks corresponding to the first or second ROI; and

(e) directionally and positionally assembling the building blocks to generate a combinatorial library of full-length AAV genes, each of which contains the at least one ROI with altered sequences.

6. The method according to claim 5, wherein the subset of building blocks having altered sequences are generated by amplifying one or more of the subdomains under different polymerase chain reaction conditions.

7. The method according to claim 6, wherein at least two primer sets are provided which specifically amplify the sequence comprising the first ROI, wherein the at least two primer sets comprise P_R1 and P_L1 and a second primer set having a right primer P_R1a and a left primer P_L1a which differ in length from P_R1 and P_L1.

8. The method according to claim 5, wherein at least one of the subdomains is amplified under conditions which conserves the sequence of the at least one subdomain such that sequence changes are not introduced.

9. The method according to claim 5, wherein (e) is performed by splicing by overlap extension (SOE) polymerase chain reaction (PCR).

10. The method according to claim 5, wherein the primer having complementarity at the 5' end of the first or second ROI is 100% complementary over at least 8 base pairs.

11. The method according to claim 5, wherein the primer having complementarity at the 3' end of the first or second ROI is 100% complementary over at least 15 base pairs.

12. The method according to claim 5, wherein both the first ROI and the second ROI are modified by using different polymerase chain reaction conditions.

13. The method according to claim 5, wherein the AAV gene sequence is the AAV vp1 capsid sequence.

14. The method according to claim 5, wherein the AAV gene sequence is an AAV rep gene.

15. The method according to claim 5, wherein the first AAV is AAV8 and the second AAV is rh32.33.

16. A method for generating a combinatorial library of modified adeno-associated virus (AAV) capsids while controlling the number of non-functional AAV, the method comprising:

(a) providing at least a first AAV capsid gene sequence in which at least two regions of interest (ROI_AAV1st) have been identified, and at least a first primer set which specifically amplifies one of the at least two ROI_AAV1st, wherein said first primer set consists of a right primer (P_R1) and a left primer (P_L1), wherein one of P_R1 or P_L1 has 5' complementarity to nucleotide sequences in the sequence flanking the first ROI_AAV1st and the other has 3' complementarity to the first ROI_AAV1st, and at least a second primer set which specifically amplifies a second of the at least two ROI_AAV1st, wherein said second primer set consists of a right primer (P_R2) and a left primer (P_L2), wherein one of P_R2 or P_L2 has 5' complementarity to nucleotide sequences in the sequence flanking the second ROI_AAV1st and the other has 3' complementarity to the second ROI_AAV1st;

(b) providing at least a second AAV capsid gene sequence in which at least two regions of interest (ROI_AAV2nd) have been identified, which second AAV sequence differs from the first AAV gene sequence, and at least a third primer set which specifically amplifies one of the at least two ROI_AAV2nd, wherein said third primer set consists of a right primer (P_R3) and left primer (P_L3), wherein one of the P_R3 or P_L3 has 5' complementarity to nucleotide sequences in the sequence flanking the first ROI_AAV2nd and the other has 3' complementarity to the first ROI_AAV2nd, and at least a fourth primer set which specifically amplifies a second of the at least two ROI_AAV2nd, wherein said fourth primer set consists of a right primer (P_R4) and left primer (P_L4), wherein one of the P_R4 or P_L4 has 5' complementarity to nucleotide sequences in the sequence flanking the second ROI_AAV2nd and the other has 3' complementarity to the second ROI_AAV2nd,

(d) generating a series of building blocks which correspond to subdomains within the full-length gene sequence, which subdomains, when assembled, span the full length of the AAV gene;

wherein a subset of the building blocks contain altered sequences to form a plurality of diverse building blocks, and

(e) directionally and positionally assembling the building blocks to generate a combinatorial library of full-length AAV capsids which comprise the building blocks with altered sequences.
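
Steps (d) and (e) of claim 16 amount to cutting each parental gene into the same ordered subdomains and then recombining the variants position by position. The minimal sketch below illustrates this under two simplifying assumptions. First, the subdomains are treated as non-overlapping slices that can be joined by plain concatenation, whereas the claims join overlapping fragments, for example by SOE PCR. Second, identical blocks from different parents are merged. The function names are hypothetical. The sketch also makes the library size explicit: it is the product of the number of distinct variants at each position, which is one way to see how restricting diversity to chosen subdomains keeps the library, and the share of non-functional members, under control.

```python
from itertools import product


def split_into_subdomains(gene: str, boundaries: list[int]) -> list[str]:
    """Cut a gene into consecutive, non-overlapping subdomains at the given positions."""
    cuts = [0, *sorted(boundaries), len(gene)]
    return [gene[a:b] for a, b in zip(cuts, cuts[1:])]


def assemble_library(blocks_per_position: list[list[str]]) -> list[str]:
    """Directional, positional assembly: block i of every member comes from position i."""
    return ["".join(combo) for combo in product(*blocks_per_position)]


def pooled_blocks(parents: list[str], boundaries: list[int]) -> list[list[str]]:
    """Pool the variants of each subdomain across parents, dropping exact duplicates."""
    split = [split_into_subdomains(p, boundaries) for p in parents]
    return [list(dict.fromkeys(variants)) for variants in zip(*split)]


if __name__ == "__main__":
    # Toy 12-nt "parents" (hypothetical); the middle block is identical in both.
    blocks = pooled_blocks(["AAAACCCCGGGG", "TTTTCCCCAAAA"], boundaries=[4, 8])
    library = assemble_library(blocks)
    print(len(library), library)
```

Because the middle subdomain is conserved, it contributes a factor of 1. The library in this toy case therefore has 2 x 1 x 2 = 4 members rather than 2^3 = 8.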

17. The method according to claim 16, wherein the subset of building blocks having altered sequences are generated by amplifying one or more of the subdomains under different polymerase chain reaction conditions.
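
Claim 17 produces the altered subset by amplifying selected subdomains under different PCR conditions. Claims 8 and 19 describe amplifying other subdomains so that their sequence is conserved. The toy simulation below contrasts the two treatments. It uses uniform random base substitution as a crude, assumed stand-in for a mutagenic (for example, error-prone) PCR condition. Real polymerase error spectra are biased, and the application does not specify this model. The names `mutagenize` and `diversify` and the `rate` parameter are hypothetical.

```python
import random

BASES = "ACGT"


def mutagenize(block: str, rate: float, rng: random.Random) -> str:
    """Replace each base with a different base with probability `rate` (toy error model)."""
    out = []
    for base in block:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def diversify(blocks: list[str], altered: set[int], rate: float, seed: int = 0) -> list[str]:
    """Mutagenize only the subdomains whose index is in `altered`; copy the rest unchanged."""
    rng = random.Random(seed)  # seeded so that a simulated library is reproducible
    return [mutagenize(b, rate, rng) if i in altered else b for i, b in enumerate(blocks)]


if __name__ == "__main__":
    # Only the middle subdomain is "amplified under mutagenic conditions".
    print(diversify(["AAAACCCC", "GGGGTTTT", "ACGTACGT"], altered={1}, rate=0.25, seed=1))
```

Keeping the conserved blocks as exact copies mirrors the point of claims 8 and 19. Diversity is confined to the chosen regions of interest instead of being spread across the whole capsid gene.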

18. The method according to claim 17, wherein at least two primer sets are provided which specifically amplify the sequence comprising the first ROI, wherein the at least two primer sets comprise P_R1 and P_L1 and a second primer set having a right primer P_R1a and left primer P_L1a which differ in length from P_R1 and P_L1.

19. The method according to claim 15, wherein at least one of the subdomains is amplified under conditions which conserve the sequence of the at least one subdomain such that sequence changes are not introduced.

20. The method according to claim 15, wherein (e) is performed by splicing by overlap extension (SOE) polymerase chain reaction (PCR).

21. The method according to claim 15, wherein the primers are approximately 30 nucleotides in length.
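
Claim 21 gives a primer length of about 30 nucleotides, and claim 18 allows a second primer set for the same ROI that differs in length. The following sketch shows one simple way to derive such primer pairs from a template. It assumes that the primers sit immediately outside the ROI: the left primer is the template strand just 5' of the ROI, and the right primer is the reverse complement of the sequence just 3' of it. The helper names and this placement are assumptions for illustration. The claims also cover primers whose complementarity falls within the ROI.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def flanking_primers(template: str, roi_start: int, roi_end: int, length: int = 30) -> tuple[str, str]:
    """(left, right) primers of `length` nt that amplify template[roi_start:roi_end]."""
    if roi_start < length or roi_end + length > len(template):
        raise ValueError("not enough flanking sequence for primers of this length")
    left = template[roi_start - length:roi_start].upper()            # anneals to the bottom strand
    right = reverse_complement(template[roi_end:roi_end + length])   # anneals to the top strand
    return left, right


if __name__ == "__main__":
    template = "A" * 30 + "GGGG" + "C" * 30  # toy template; the ROI is the central GGGG
    first_set = flanking_primers(template, 30, 34, length=30)
    second_set = flanking_primers(template, 30, 34, length=20)  # shorter set, as in claim 18
    print(first_set, second_set)
```

Changing only `length` yields a second primer set that covers the same ROI with different primer lengths. That is the relationship claim 18 describes between P_R1/P_L1 and the second primer set.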