WO2008083203A1

WO2008083203A1 - Identification of multivalent viral envelope protein epitopes

Info

Publication number: WO2008083203A1
Application number: PCT/US2007/088910
Authority: WO
Inventors: Xiaofeng Fan; Adrian M. Di Bisceglie
Original assignee: Saint Louis University
Priority date: 2006-12-28
Filing date: 2007-12-27
Publication date: 2008-07-10
Also published as: CA2674244A1; EP2118124A1

Abstract

The present invention provides methods for identifying multivalent vaccine antigens derived from envelope proteins.

Description

Identification of Multivalent Viral Envelope Protein Epitopes

CROSS-REFERENCE TO PRIOR APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application Serial No. 60/882,287, filed December 28, 2006, the contents of which are hereby specifically incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

A. Field of the Invention

The invention is directed to methods of identifying nucleic acid and polypeptide sequences useful in the development of universal multivalent viral vaccines.

B. Description of the Related Art

Many important infectious agents belong to the family of RNA viruses, such as hepatitis C virus (HCV), human immunodeficiency virus (HIV), dengue virus and influenza virus. A common feature shared by these viruses is great genetic heterogeneity, manifested as multiple serotypes, genotypes and subtypes, and as a quasispecies population even within a single infected individual. This genetic heterogeneity has many physiological and pathogenic consequences, including immune escape and drug resistance. More importantly, it is a major challenge for vaccine development. For each of the infectious diseases mentioned above, a long-sought goal is to identify or design a single antigen which, upon vaccination, is able to induce broadly neutralizing antibodies against all viral isolates. However, such antigens have not been identified despite considerable effort.

HIV provides an excellent example of this problem. The HIV envelope region is highly diverse and contains multiple potential N-linked glycosylation sites that shield most potential neutralizing epitopes. However, despite this hurdle, several neutralizing monoclonal antibodies, produced from either antibody libraries or transformed B cells, are available against most primary HIV isolates. All recognize conformational epitopes as mapped onto the 3-D structure of the HIV envelope protein, but no HIV envelope antigen is presently known to induce such neutralizing antibodies. In animal experiments, HIV envelope proteins do induce neutralizing antibodies, but these neutralize only weakly and are mostly isolate- or genotype-specific. These observations indicate that the virus contains many neutralizing epitopes with differential antigenicity and immunogenicity.

In addition, while the mechanism of virus neutralization is not completely elucidated, it is well documented that virus neutralization can occur at each step of virus entry, i.e., attachment, fusion and post-entry. Virus envelope proteins undergo conformational alterations at these steps and, as a result, display different neutralizing epitopes. The immune system, especially its humoral branch, responds well only to epitopes displayed before and perhaps during attachment, but not after. In other words, active immunization with wild-type or wild-type-like viral envelope proteins elicits neutralizing antibodies only to epitopes displayed before and, to a lesser extent, during viral attachment. Neutralizing epitopes displayed after viral attachment are hardly attacked. There is only one report in HIV of using fusion intermediates, in which HIV envelope proteins were covalently cross-linked to CD4 (the viral receptor), as immunogens, which resulted in antibodies that neutralized some primary HIV isolates. However, it was not known to which target - HIV envelope protein or CD4 - these antibodies bound.

Besides the application of wild-type viral envelope proteins as immunogens, researchers have used other strategies to engineer viral envelope proteins so as to redirect the immune response to evolutionarily conserved, multivalent epitopes, thereby providing strong neutralizing antibodies with broad specificity. These include site-directed mutagenesis, hyperglycosylation and deglycosylation in certain domains. Unfortunately, none of these has provided the desired results.

Another approach, fueled by frustration with conventional strategies, involves the construction of antigen libraries. Such libraries increase the chance of recovering antigen variants that provide the characteristics needed for ideal vaccine antigens. There are many approaches for the construction of antigen libraries, and among them the DNA shuffling technique is the most attractive. The DNA shuffling technique has been extensively applied by Maxygen (www.maxygen.com) in the search for vaccine antigens of various viral diseases, including dengue and AIDS. Recently, a dengue envelope protein produced by DNA shuffling was shown to elicit broadly neutralizing antibodies to all four dengue serotypes, although the titer was found not to be high. Because it uses wild-type viral sequences as starting material and is intrinsically biased towards homologous recombination, the DNA shuffling technique is less powerful for the generation of novel viral antigens. Thus, there remains a need for improved methods of finding antigenic universal multivalent vaccine epitopes.

SUMMARY OF THE INVENTION

Thus, in accordance with the present invention, there is provided a method of identifying a multivalent vaccine epitope comprising (a) providing a data set comprising a plurality of viral envelope protein (VEP) nucleic acid or protein sequences; (b) subjecting said data set to simulation under evolutionary models to obtain an ancestral VEP sequence; (c) subjecting said ancestral VEP sequence to saturated mutagenesis to obtain a VEP antigen library; (d) constructing a pseudotyped viral particle library, wherein said particles express members of said VEP antigen library; (e) neutralizing said pseudotyped viral particle library with antiserum raised against wild-type VEP sequence; (f) infecting cells with the neutralized viral particle library and selecting variants; (g) selecting survival variants; and (h) obtaining sequence information on said survival variants, wherein these sequences are potential vaccine candidates that contain pluripotent epitopes and merit further refinement. Step (c) of the method may comprise codon optimization. The VEP may be from hepatitis C virus, human immunodeficiency virus, dengue virus, influenza virus, ebola virus, coronavirus, or hantavirus. The plurality of VEP sequences may comprise at least 5, at least 10, at least 15, at least 20 or at least 25 sequences. The simulated evolution may comprise reconstruction of a phylogenetic tree, rooting of the phylogenetic tree, and inference of said ancestral VEP sequence. Reconstruction of the phylogenetic tree may comprise a phylogenetic inference method. The phylogenetic inference method may comprise a maximum likelihood method. The rooting of the phylogenetic tree may also comprise the use of a molecular clock hypothesis or an outgroup criterion. The method may further comprise selecting a nucleotide or amino acid substitution model that fits the data set. The method may also further comprise identifying domains in said ancestral sequence for saturated mutagenesis.
Constructing a pseudotyped viral particle library may comprise PCR assembly. The pseudotyped viral particles may be retroviral particles, including lentiviral particles. The antiserum may be human antiserum or from a non-human animal. Selecting survival variants may comprise fluorescence activated cell sorting of infected cells.

As used herein in the specification, "a" or "an" may mean one or more. As used herein in the claim(s), when used in conjunction with the word "comprising," the words "a" or "an" may mean one or more than one. As used herein "another" may mean at least a second or more.

Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein:

FIG. 1 - Schematic representation of the relationship between the potential sequence space of a given protein and current methods for library construction. Due to the application of different strategies, each method takes up only a small but distinct part of the space although the possibility for the overlap among methods cannot be totally excluded. RACHITT, random chimeragenesis on transient templates (Coco et al., 2001; Coco, 2003); StEP, staggered extension process (Aguinaldo and Arnold, 2003; 1998); ITCHY, iterative truncation for the creation of hybrid enzymes (Ostermeier and Lutz, 2003; Ostermeier et al., 1999); MAX randomization (Hughes et al., 2003); Error-prone PCR (Cirino et al., 2003; Matsumura and Ellington, 2001).

FIGS. 2A-B - The saturation mutagenesis of an ancestral gene. (FIG. 2A) The design of mutagenic primers is shown. Each primer consists of three parts (5' to 3'), the overlapped domain for assembly PCR, the middle part for saturation mutagenesis and the priming domain for single PCR. (FIG. 2B) An example of EGCAL is depicted. Assuming that three domains (green) are identified for saturation mutagenesis, a given ancestral gene should be divided into three overlapped fragments. The saturation mutagenesis is done for each fragment, followed by the assembly PCR. The final product is cloned into appropriate E. coli cells to generate EGCAL.

DETAILED DESCRIPTION OF THE INVENTION

I. The Present Invention

The inventor now presents a novel approach for construction of antigen libraries. Basically, this approach takes molecular evolution into account. The first step is the inference of the putative ancestor of a given viral gene by simulating its evolutionary history. The ancestor is then used as the backbone for saturated mutagenesis of those regions that give the higher dN/dS values when the entire gene is scanned with an appropriate sliding window. The ratio of the number of nonsynonymous substitutions per nonsynonymous site (dN) to the number of synonymous substitutions per synonymous site (dS) is generally used as an indicator of evolutionary pressure. While keeping certain sites stable due to functional constraint, viruses continually search sequence space for sites that can tolerate change, in order to increase replicative fitness and/or to escape immune or drug attack.
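The sliding-window scan described above can be illustrated with a minimal sketch. It is not the inventors' procedure: it compares only two aligned, in-frame coding sequences, reports uncorrected proportions (pN and pS) as a stand-in for dN and dS, counts synonymous sites in the style of Nei and Gojobori, treats changes to stop codons as nonsynonymous, and skips codon pairs that differ at more than one position. The function names, window size and step are hypothetical choices made for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Assumption: a change to a stop codon counts as nonsynonymous.
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1.0 / 3.0
    return total


def sliding_pn_ps(seq_a, seq_b, window=30, step=5):
    """Return (first_codon_index, pN, pS, pN/pS) for each window of codons."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    codons_a = [seq_a[i:i + 3] for i in range(0, len(seq_a) - 2, 3)]
    codons_b = [seq_b[i:i + 3] for i in range(0, len(seq_b) - 2, 3)]
    n_codons = min(len(codons_a), len(codons_b))
    results = []
    for start in range(0, n_codons - window + 1, step):
        s_sites = n_sites = s_diff = n_diff = 0.0
        pairs = zip(codons_a[start:start + window], codons_b[start:start + window])
        for ca, cb in pairs:
            if ca not in CODON_TABLE or cb not in CODON_TABLE:
                continue  # gap or ambiguity code
            if "*" in (CODON_TABLE[ca], CODON_TABLE[cb]):
                continue  # stop codon
            changes = sum(x != y for x, y in zip(ca, cb))
            if changes > 1:
                continue  # simplification: multi-hit codons are skipped
            s = (syn_sites(ca) + syn_sites(cb)) / 2.0
            s_sites += s
            n_sites += 3.0 - s
            if changes == 1:
                if CODON_TABLE[ca] == CODON_TABLE[cb]:
                    s_diff += 1
                else:
                    n_diff += 1
        p_s = s_diff / s_sites if s_sites else 0.0
        p_n = n_diff / n_sites if n_sites else 0.0
        ratio = p_n / p_s if p_s > 0 else None  # undefined without synonymous change
        results.append((start, p_n, p_s, ratio))
    return results
```

Windows with the highest ratio would be the candidate regions for mutagenesis. A production analysis would normally use an established package with multiple-hit correction and many sequences rather than a single pairwise comparison.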

Every library needs a screening method. For instance, screening antibody libraries is achieved through the ligand-binding mechanism, which is fast and powerful. In fact, for a given antigen, it is now possible to obtain new monoclonal antibodies with extremely high affinity in just a few weeks. However, it is impossible to find an ideal candidate vaccine antigen from an antigen library using the screening strategies applied to antibody libraries. Each antigen variant must be expressed as a recombinant protein and then inoculated into mice or rabbits. The resulting antiserum is subsequently assessed for its neutralizing potential. Although the process may be accelerated by a DNA vaccine approach that omits the step of protein expression/purification, the time required is prohibitive even for a small antigen library containing 10⁴ variants, from which at least 200 randomly chosen antigen variants may need to be tested to obtain a complete picture.

The inventor describes a novel screening strategy based on the concept of redirecting the immune hierarchy, which focuses on novel protective immune epitopes that are not dominant on wild-type virus or that are displayed only after virus attachment. A pseudotyped virus pool will be produced by transfection of lentiviral vectors containing envelope variants from the antigen libraries. Neutralization assays based on pseudotyped viral particles are now extensively used for many viruses, including HIV, HCV, SARS, Ebola, etc., because they are fast and convenient and avoid biosafety concerns. This pseudotyped virus pool will first be incubated with antiserum raised against the wild-type viral envelope gene, followed by infection of permissive cells. The pseudotyped viral particles that succeed in infecting cells should carry viral envelope variants that have not been blocked by the antiserum raised against the wild-type viral gene. Therefore, these survival viral envelope variants may contain the expected multivalent epitopes that are able to elicit broadly neutralizing antibodies against all isolates of a given virus.

II. Generation of Ancestral Viral Sequences by Evolutionary Simulation

Pursuing ancestral characteristics of organisms or genes represents one of the most promising strategies for understanding complex biological or biomolecular functions. Unlike animals or plants, a virus evolves without leaving fossil traces. To reconstruct ancestral characteristics of viral genes, the only approach is to exploit nature's principles of Darwinian evolution, i.e., variation and selection, which can be simulated mathematically under a given nucleotide substitution model. Results are shown as phylogenies, in which terminal branches represent contemporary viral sequences while internal nodes are taken as ancestral nodes from which viruses started to diversify (FIG. 2?). Ancestral sequences, also called most recent common ancestor (MRCA), can be inferred from ancestral nodes within a given phylogeny. It should be noted that a viral MRCA is different from its consensus sequence, which is inferred from the nucleotide frequency at each position over multiple viral sequences. The consensus sequence does not contain any evolutionary information.
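To make the contrast concrete, a consensus sequence can be computed by a simple per-column majority vote, as in the following minimal sketch. The function name and the handling of gaps are assumptions made for illustration; note that no tree, substitution model or branch length enters the calculation, which is why the result carries no evolutionary information.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Majority-rule consensus of equal-length aligned sequences."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned_sequences):
        # Assumption: gap characters are ignored when counting.
        counts = Counter(base for base in column if base != "-")
        result.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(result)
```

For example, `consensus(["ACGT", "ACGA", "TCGA"])` takes the most frequent base in each of the four columns, regardless of how the three sequences are related to one another.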

Biological characteristics of evolutionary ancestors have been studied for a few families of enzymes (Belinda et al., 2002; Jermann et al., 1995). However, such an approach has not been applied in vaccinology. Recently, consensus sequences have been suggested for human immunodeficiency virus (HIV) vaccine development, in which high genetic heterogeneity is a major obstacle (Gaschen et al., 2002). Preliminary data showed that consensus HIV sequences induced broadly neutralizing antibodies to many genotypes of HIV-1 (Gao et al., 2005). Artificial sequences created by other approaches, such as the DNA shuffling technique (Locher et al., 2005), have also been tested for improving vaccine antigens. However, none of these strategies takes into account adaptive evolution, which is experienced by most, if not all, microbes. Vaccine development using ancestral sequences represents a novel approach for viruses with great genetic variability.

The present invention utilizes a method for developing an ancestral nucleotide sequence through reconstruction of phylogenetic trees. The ancestral nucleotide sequence may be directed to any one of myriad viruses or virus families, with HCV being exemplified herein. The method involves, generally, the steps of retrieving virus nucleic acid sequences from a genetic database (e.g., GenBank) and then editing and aligning those sequences using editing and alignment programs, which include for example Clustal W (Higgins and Sharp, 1988), the BioEdit program available from North Carolina State University (available at www.mbio.ncsu.edu/BioEdit/bioedit.html), and the SegEd program available in the GCG package (Wisconsin GCG package. Version 10.0. Oxford Molecular Group, Inc.). Any missing information, for example, the genotype for a given sequence, may be determined by phylogenetic analyses, such as for example Molecular Evolutionary Genetics Analysis (MEGA; Kumar et al., 2001).
The sequences are then filtered to remove sequences that are below a particular size cut-off. Of the remaining sequences, those that show signs of recombination are eliminated based on the detection of phylogenetic noise through split decomposition analysis (Fan et al., 2003). Surviving sequences having greater than 99% identity at the nucleotide level are reduced to a single representative sequence.
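The size filter and the collapse of near-identical sequences can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, identity is computed over aligned columns where neither sequence has a gap, the first sequence encountered in each near-identical group is kept as its representative (a greedy choice the source does not specify), and the recombination screen by split decomposition is not reproduced here.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def filter_and_collapse(records, min_length, identity_cutoff=99.0):
    """Drop short sequences, then keep one representative per near-identical group.

    records maps a sequence name to its aligned nucleotide sequence.
    """
    kept = {}
    for name, seq in records.items():
        if len(seq.replace("-", "")) < min_length:
            continue  # below the size cut-off
        if any(percent_identity(seq, rep) > identity_cutoff for rep in kept.values()):
            continue  # more than 99% identical to an existing representative
        kept[name] = seq
    return kept
```

Because the collapse is greedy, the outcome can depend on input order; sorting the records by length or sampling date beforehand is one way to make the choice of representative deliberate.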

Model simulation and phylogenetic reconstruction are applied to the now remaining sequences. A particular model is selected through a hierarchical likelihood ratio test (hLRT) simulated with the program Modeltest (Posada and Crandall, 1998; 2001). A phylogenetic tree is then constructed by heuristic search using a maximum likelihood (ML) approach for all genotypes. ML trees can be constructed using any one or more of known programs (e.g., PAUP, Swofford, Sinauer Associates; PHYML, Guindon and Gascuel, 2003). Once the tree is produced, it is then rooted, for example, by assuming molecular clock hypothesis. The rooted tree is then used as a template to simulate an ancestral sequence. Simulation of ancestral sequences at each internal node as well as the most recent common ancestor (MRCA) is conducted using an evolutionary program, such as for example the baseml program of the PAML package (Yang, 1997). The ancestral sequence(s) are reconstructed at the nucleotide level.

A. Reconstruction of Phylogenetic Tree

There are numerous methods available for the construction of phylogenetic trees (for a detailed discussion see Felsenstein, 1988; Swofford et al., 1996). The maximum likelihood (ML) approach is generally selected because it outperforms other methods when evolutionary rates are heterotachous, for instance, among different viral genotypes (Gadagkar and Kumar, 2005). The ML approach requires an explicit evolutionary model, which can be determined by model simulation. When the best-fit model and the associated parameters obtained from the model simulation are applied, the best trees for a given sequence data set can be recovered by heuristic search. All processes can be computed with existing programs, such as PAUP* (Swofford, PAUP Ver. 4.02b). A frequently observed drawback of the ML approach is its intensive computation, which is sometimes unaffordable, especially when the data set contains a large number of sequences and complicated evolutionary models are applied. The inventors solved this issue by producing initial ML trees with the PHYML program, which implements a simple hill-climbing algorithm for heuristic tree search and uses a distance-based tree as a starting point (Guindon and Gascuel, 2003). The trees produced by PHYML are then transferred into PAUP* for further optimization, including tree rooting under the molecular clock hypothesis, which is otherwise impossible for large ML phylogenies.

B. Rooting of the Phylogenetic Tree

The root of a phylogenetic tree represents its first and deepest split, and it therefore provides the crucial time point for polarizing the historical sequences of all subsequent evolutionary events. An incorrectly rooted tree can result in profoundly misleading inferences of taxonomic relationships and character evolution. It is mandatory to root the tree to generate a correct topology, which will serve as the template for the inference of ancestral sequences.

There are several methods available for rooting phylogenetic trees, including non-reversible models of substitution, midpoint rooting, the outgroup criterion and the molecular clock (Graham et al., 2002; Huelsenbeck et al., 2002). The first two approaches have been proven to be problematic (Wheeler, 1990; Yang, 1994). Although the outgroup criterion has been frequently used in phylogenetic practice, appropriate outgroups are sometimes difficult to identify. The inventors therefore suggest rooting the tree under the molecular clock hypothesis, especially when working on RNA viruses. For real viral sequence data, it should be noted that the molecular clock hypothesis is rejected in most situations by the likelihood ratio test (LRT) (Goldman, 1993). This may be partially attributed to missing information on the sampling dates of individual viral sequences. When the time scale is considered, the evolution of a given virus generally shows a molecular clock pattern, referred to as a "relaxed" clock model (Twiddy et al., 2003; Twiddy et al., 2002). Additionally, the root is more consistently identified under the molecular clock assumption than with the outgroup approach. The rooting can be done with many programs, such as PAUP*.
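The likelihood ratio test of the molecular clock mentioned above can be written out in a few lines. The sketch below assumes the standard form of the test, in which twice the difference in log-likelihood between the unconstrained and the clock-constrained tree is compared with a chi-square distribution with n - 2 degrees of freedom for n sequences. The function names are hypothetical, the log-likelihoods would come from a program such as PAUP* or baseml, and the chi-square tail probability is computed with the standard library only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting values: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def clock_lrt(lnl_clock, lnl_free, n_sequences):
    """Return (statistic, degrees of freedom, p-value) for the clock test."""
    statistic = 2.0 * (lnl_free - lnl_clock)
    df = n_sequences - 2
    return statistic, df, chi2_sf(statistic, df)
```

A small p-value means the clock is rejected for the data set, which, as noted above, is the usual outcome for real viral sequences even when clock-based rooting is still used in practice.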

C. Inference

The simulation of ancestral sequences at each internal node can be done with the "baseml" program in the PAML package for both marginal and joint ancestral reconstruction (Yang, 1997). The rooted trees described above serve as the template. It is recommended to reconstruct ancestral sequences at the nucleotide level rather than at the codon or amino acid level, since the latter two approaches ignore synonymous substitutions that may also experience positive selection (Novella et al., 2004). The posterior probability of the reconstruction should be examined for reliability at each nucleotide site.

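The per-site reliability check on a reconstructed ancestral sequence can be sketched as follows. The input format is hypothetical (one dictionary of nucleotide posterior probabilities per site) and is not the output format of baseml; the 0.95 threshold is likewise an assumption chosen for illustration.

```python
def call_ancestor(site_posteriors, threshold=0.95):
    """Pick the most probable base at each site and flag low-confidence sites.

    site_posteriors is a list with one dict per site, mapping each
    nucleotide to its posterior probability at the ancestral node.
    """
    sequence = []
    uncertain = []
    for index, posterior in enumerate(site_posteriors):
        base, prob = max(posterior.items(), key=lambda item: item[1])
        sequence.append(base)
        if prob < threshold:
            uncertain.append((index, base, prob))
    return "".join(sequence), uncertain
```

Sites returned in the second list are those where the reconstruction is ambiguous and may deserve manual inspection before the ancestral sequence is synthesized.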
III. Viral Envelope Proteins

A variety of viral envelope proteins may be subjected to the methods described herein. The following are possible candidates: hepatitis C virus, human immunodeficiency virus, dengue virus, influenza virus, ebola virus, coronavirus, hepatitis B virus, Japanese encephalitis virus or hantavirus.

IV. Viral Envelope Polynucleotides

The term "gene" is used here to refer to the nucleic acid giving rise to a functional protein, polypeptide, or peptide-encoding unit, in this case a viral envelope protein. In addition to the full length gene, which may contain non-coding sequences, the polynucleotides of the present invention contemplate shorter lengths that comprise less than all of a complete viral envelope polypeptide, in particular epitopes thereof, including about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 441, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 600, 650, 700, 750, 800, 900, 1000, 1100, 1200, 1500, 2000 or more nucleotides, nucleosides, or base pairs.

In particular embodiments, the invention concerns isolated nucleic acid segments and recombinant vectors incorporating DNA sequences that encode envelope polypeptides or peptides. Such vectors used in the present invention, regardless of the length of the coding sequence itself, may be combined with other DNA or RNA sequences, such as promoters, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding segments, and the like, such that their overall length may vary considerably. It is therefore contemplated that a nucleic acid fragment of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant DNA protocol.

A. Vectors Encoding HCV Envelope Proteins

The present invention encompasses the use of vectors that encode all or part of viral envelope polypeptides. The term "vector" is used to refer to a carrier nucleic acid molecule into which a nucleic acid sequence can be inserted for introduction into a cell where it can be replicated. A nucleic acid sequence can be "exogenous," which means that it is foreign to the cell into which the vector is being introduced or that the sequence is homologous to a sequence in the cell but in a position within the host cell nucleic acid in which the sequence is ordinarily not found. Vectors include plasmids, cosmids, viruses (bacteriophage, animal viruses, and plant viruses), and artificial chromosomes (e.g., YACs). In particular embodiments, gene therapy or immunization vectors are contemplated. One of skill in the art would be well equipped to construct a vector through standard recombinant techniques, which are described in Maniatis et al. (1990) and Ausubel et al. (1996), both incorporated herein by reference.

The term "expression vector" or "expression construct" refers to a vector containing a nucleic acid sequence coding for at least part of a gene product capable of being transcribed. In some cases, RNA molecules are then translated into a protein, polypeptide, or peptide. Expression vectors can contain a variety of "control sequences," which refer to nucleic acid sequences necessary for the transcription and possibly translation of an operably linked coding sequence in a particular host organism. In addition to control sequences that govern transcription and translation, vectors and expression vectors may contain nucleic acid sequences that serve other functions as well and are described infra.

B. Promoters and Enhancers

A "promoter" is a control sequence that is a region of a nucleic acid sequence at which initiation and rate of transcription are controlled. It may contain genetic elements at which regulatory proteins and molecules may bind such as RNA polymerase and other transcription factors. The phrases "operatively positioned," "operatively linked," "under control," and "under transcriptional control" mean that a promoter is in a correct functional location and/or orientation in relation to a nucleic acid sequence to control transcriptional initiation and/or expression of that sequence. A promoter may or may not be used in conjunction with an "enhancer," which refers to a cis-acting regulatory sequence involved in the transcriptional activation of a nucleic acid sequence. A promoter may be one naturally-associated with a gene or sequence, as may be obtained by isolating the 5' non-coding sequences located upstream of the coding segment and/or exon. Such a promoter can be referred to as "endogenous." Similarly, an enhancer may be one naturally associated with a nucleic acid sequence, located either downstream or upstream of that sequence. Alternatively, certain advantages will be gained by positioning the coding nucleic acid segment under the control of a recombinant or heterologous promoter, which refers to a promoter that is not normally associated with a nucleic acid sequence in its natural environment. A recombinant or heterologous enhancer refers also to an enhancer not normally associated with a nucleic acid sequence in its natural environment. Such promoters or enhancers may include promoters or enhancers of other genes, and promoters or enhancers isolated from any other prokaryotic, viral, or eukaryotic cell, and promoters or enhancers not "naturally-occurring," i.e., containing different elements of different transcriptional regulatory regions, and/or mutations that alter expression.
In addition to producing nucleic acid sequences of promoters and enhancers synthetically, sequences may be produced using recombinant cloning and/or nucleic acid amplification technology, including PCR™, in connection with the compositions disclosed herein (see U.S. Patent 4,683,202 and U.S. Patent 5,928,906, each incorporated herein by reference). Furthermore, it is contemplated that control sequences that direct transcription and/or expression of sequences within non-nuclear organelles such as mitochondria, chloroplasts, and the like, can be employed as well.

Naturally, it will be important to employ a promoter and/or enhancer that effectively directs the expression of the nucleic acid segment in the cell type, organelle, and organism chosen for expression. Those of skill in the art of molecular biology generally know the use of promoters, enhancers, and cell type combinations for protein expression, for example, see Sambrook et al (2001), incorporated herein by reference. The promoters employed may be constitutive, tissue-specific, inducible, and/or useful under the appropriate conditions to direct high level expression of the introduced DNA segment, such as is advantageous in the large-scale production of recombinant proteins and/or peptides. The promoter may be heterologous or exogenous, i.e., from a different source than the viral envelope sequence. In some examples, a prokaryotic promoter is employed for use with in vitro transcription of a desired sequence. Prokaryotic promoters for use with many commercially available systems include T7, T3, and Sp6.

Table 2 lists several elements/promoters that may be employed, in the context of the present invention, to regulate the expression of a gene. This list is not intended to be exhaustive of all the possible elements involved in the promotion of expression but, merely, to be exemplary thereof. Table 3 provides examples of inducible elements, which are regions of a nucleic acid sequence that can be activated in response to a specific stimulus.

C. Initiation Signals

A specific initiation signal also may be required for efficient translation of coding sequences. These signals include the ATG initiation codon or adjacent sequences. Exogenous translational control signals, including the ATG initiation codon, may need to be provided. One of ordinary skill in the art would readily be capable of determining this and providing the necessary signals. It is well known that the initiation codon must be "in-frame" with the reading frame of the desired coding sequence to ensure translation of the entire insert. The exogenous translational control signals and initiation codons can be either natural or synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate transcription enhancer elements.
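The in-frame requirement can be checked mechanically when a construct is designed. The following minimal sketch assumes the construct is given as a DNA string together with the position of the intended ATG and the position at which the insert begins; the function name and both index parameters are hypothetical. It confirms that the two positions are separated by a multiple of three nucleotides and that no stop codon lies between them in that frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def start_codon_in_frame(construct, atg_index, insert_start):
    """True if the ATG at atg_index reads into the insert in frame.

    Indices are 0-based positions in the construct sequence.
    """
    construct = construct.upper()
    if construct[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if (insert_start - atg_index) % 3 != 0:
        return False  # the insert would be read in a shifted frame
    for i in range(atg_index, insert_start, 3):
        if construct[i:i + 3] in STOP_CODONS:
            return False  # translation would terminate before the insert
    return True
```

For instance, with the construct `GCCACCATGGGAAAACCC` the ATG sits at index 6, so an insert beginning at index 12 is in frame while one beginning at index 13 is not.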

D. Multiple Cloning Sites

Vectors can include a multiple cloning site (MCS), which is a nucleic acid region that contains multiple restriction enzyme sites, any of which can be used in conjunction with standard recombinant technology to digest the vector (see Carbonelli et al., 1999; Levenson et al., 1998; and Cocea, 1997; all incorporated herein by reference). "Restriction enzyme digestion" refers to catalytic cleavage of a nucleic acid molecule with an enzyme that functions only at specific locations in a nucleic acid molecule. Many of these restriction enzymes are commercially available. Use of such enzymes is widely understood by those of skill in the art. Frequently, a vector is linearized or fragmented using a restriction enzyme that cuts within the MCS to enable exogenous sequences to be ligated to the vector. "Ligation" refers to the process of forming phosphodiester bonds between two nucleic acid fragments, which may or may not be contiguous with each other. Techniques involving restriction enzymes and ligation reactions are well known to those of skill in the art of recombinant technology.
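Site-specific cleavage can be illustrated with a short in silico digest. The sketch below uses EcoRI, which recognizes GAATTC and cuts after the first G, purely as an example; the function name is hypothetical, only the top strand of a linear molecule is modeled, and sticky-end structure is ignored.

```python
def digest_linear(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear top-strand sequence at every occurrence of a recognition site.

    cut_offset is the number of bases of the site left on the upstream fragment.
    """
    sequence = sequence.upper()
    cuts = []
    index = sequence.find(site)
    while index != -1:
        cuts.append(index + cut_offset)
        index = sequence.find(site, index + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]
```

A vector with a single site inside its MCS would yield two fragments here because the input is treated as linear; a circular plasmid cut once would instead give one linearized molecule.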

E. Termination/Polyadenylation Signals

The vectors or constructs of the present invention will generally comprise at least one termination signal. A "termination signal" or "terminator" is comprised of the DNA sequences involved in specific termination of an RNA transcript by an RNA polymerase. Thus, in certain embodiments a termination signal that ends the production of an RNA transcript is contemplated. A terminator may be necessary in vivo to achieve desirable message levels.

In eukaryotic systems, the terminator region may also comprise specific DNA sequences that permit site-specific cleavage of the new transcript to expose a polyadenylation site. This signals a specialized endogenous polymerase to add a stretch of about 200 A residues (polyA) to the 3' end of the transcript. RNA molecules modified with this polyA tail appear to be more stable and are translated more efficiently. Thus, in other embodiments involving eukaryotes, it is preferred that the terminator comprises a signal for the cleavage of the RNA, and it is more preferred that the terminator signal promotes polyadenylation of the message. The terminator and/or polyadenylation site elements can serve to enhance message levels and/or to minimize read through from the cassette into other sequences.

Terminators contemplated for use in the invention include any known terminator of transcription described herein or known to one of ordinary skill in the art, including but not limited to, for example, the termination sequences of genes, such as for example the bovine growth hormone terminator or viral termination sequences, such as for example the SV40 terminator. In certain embodiments, the termination signal may be a lack of transcribable or translatable sequence, such as due to a sequence truncation.

F. Origins of Replication

In order to propagate a vector in a host cell, it may contain one or more origin of replication sites (often termed "ori"), each of which is a specific nucleic acid sequence at which replication is initiated. Alternatively, an autonomously replicating sequence (ARS) can be employed if the host cell is yeast.

G. Selectable and Screenable Markers

In certain embodiments of the invention, the cells containing a nucleic acid construct of the present invention may be identified in vitro or in vivo by including a marker in the expression vector. Such markers would confer an identifiable change to the cell permitting easy identification of cells containing the expression vector. Generally, a selectable marker is one that confers a property that allows for selection. A positive selectable marker is one in which the presence of the marker allows for its selection, while a negative selectable marker is one in which its presence prevents its selection. An example of a positive selectable marker is a drug resistance marker.

Usually the inclusion of a drug selection marker aids in the cloning and identification of transformants; for example, genes that confer resistance to neomycin, puromycin, hygromycin, DHFR, GPT, zeocin and histidinol are useful selectable markers. In addition to markers conferring a phenotype that allows for the discrimination of transformants based on the implementation of conditions, other types of markers including screenable markers such as GFP, whose basis is colorimetric analysis, are also contemplated. Alternatively, screenable enzymes such as herpes simplex virus thymidine kinase (tk) or chloramphenicol acetyltransferase (CAT) may be utilized. One of skill in the art would also know how to employ immunologic markers, possibly in conjunction with FACS analysis. The marker used is not believed to be important, so long as it is capable of being expressed simultaneously with the nucleic acid encoding a gene product. Further examples of selectable and screenable markers are well known to one of skill in the art.

H. Host Cells

As used herein, the terms "cell," "cell line," and "cell culture" may be used interchangeably. All of these terms also include their progeny, which refers to any and all subsequent generations. It is understood that all progeny may not be identical due to deliberate or inadvertent mutations. In the context of expressing a heterologous nucleic acid sequence, "host cell" refers to a prokaryotic or eukaryotic cell, and it includes any transformable organism that is capable of replicating a vector and/or expressing a heterologous gene encoded by a vector. A host cell can be, and has been, used as a recipient for vectors. A host cell may be "transfected" or "transformed," which refers to a process by which exogenous nucleic acid is transferred or introduced into the host cell. A transformed cell includes the primary subject cell and its progeny.

Host cells may be derived from prokaryotes or eukaryotes, depending upon whether the desired result is replication of the vector, expression of part or all of the vector-encoded nucleic acid sequences, or production of infectious viral particles.

Numerous cell lines and cultures are available for use as a host cell, and they can be obtained through the American Type Culture Collection (ATCC), which is an organization that serves as an archive for living cultures and genetic materials. An appropriate host can be determined by one of skill in the art based on the vector backbone and the desired result. A plasmid or cosmid, for example, can be introduced into a prokaryote host cell for replication of many vectors. Bacterial cells used as host cells for vector replication and/or expression include DH5α, JM109, and KC8, as well as a number of commercially available bacterial hosts such as SURE® Competent Cells and SOLOPACK™ Gold Cells (STRATAGENE®, La Jolla). Alternatively, bacterial cells such as E. coli LE392 could be used as host cells for phage viruses.

Examples of eukaryotic host cells for replication and/or expression of a vector include HeLa, NIH3T3, Jurkat, 293, Cos, CHO, Saos, and PC12. Many host cells from various cell types and organisms are available and would be known to one of skill in the art. Similarly, a viral vector may be used in conjunction with either a eukaryotic or prokaryotic host cell, particularly one that is permissive for replication or expression of the vector.

I. Expression Systems

Numerous expression systems exist that comprise at least all or part of the compositions discussed above. Prokaryote- and/or eukaryote-based systems can be employed for use with the present invention to produce nucleic acid sequences, or their cognate polypeptides, proteins and peptides. Many such systems are commercially and widely available.

The insect cell/baculovirus system can produce a high level of protein expression of a heterologous nucleic acid segment, such as described in U.S. Patents 5,871,986 and 4,879,236, both herein incorporated by reference, and which can be bought, for example, under the name MAXBAC® 2.0 from INVITROGEN® and BACPACK™ BACULOVIRUS EXPRESSION SYSTEM from CLONTECH®.

Other examples of expression systems include STRATAGENE®'s COMPLETE CONTROL™ Inducible Mammalian Expression System, which involves a synthetic ecdysone-inducible receptor, or its pET Expression System, an E. coli expression system. Another example of an inducible expression system is available from INVITROGEN®, which carries the T-REX™ (tetracycline-regulated expression) System, an inducible mammalian expression system that uses the full-length CMV promoter. The Tet-On™ and Tet-Off™ systems from CLONTECH® can be used to regulate expression in a mammalian host using tetracycline or its derivatives. The implementation of these systems is described in Gossen et al. (1992) and Gossen et al. (1995), and U.S. Patent 5,650,298, all of which are incorporated by reference. INVITROGEN® also provides a yeast expression system called the Pichia methanolica Expression System, which is designed for high-level production of recombinant proteins in the methylotrophic yeast Pichia methanolica. One of skill in the art would know how to express a vector, such as an expression construct, to produce a nucleic acid sequence or its cognate polypeptide, protein, or peptide.

J. Introduction of Nucleic Acids into Cells

In certain embodiments, a nucleic acid may be introduced into a cell in vitro for production of polypeptides or in vivo for immunization purposes. There are a number of ways in which nucleic acid molecules such as expression vectors may be introduced into cells. In certain embodiments of the invention, the expression vector comprises an HCV infectious particle or engineered vector derived from an HCV genome. In other embodiments, an expression vector known to one of skill in the art may be used to express an HCV polypeptide. The ability of certain viruses to enter cells via receptor-mediated endocytosis, to integrate into the host cell genome and to express viral genes stably and efficiently has made them attractive candidates for the transfer of foreign genes into mammalian cells (Ridgeway, 1988; Nicolas and Rubenstein, 1988; Baichwal and Sugden, 1986; Temin, 1986).

"Viral expression vector" is meant to include those vectors containing sequences of that virus sufficient to (a) support packaging of the vector and (b) to express a polynucleotide that has been cloned therein. In this context, expression may require that the gene product be synthesized. A number of such viral vectors have already been thoroughly researched, including adenovirus, adeno-associated viruses, retroviruses, herpesviruses, and vaccinia viruses.

Delivery may be accomplished in vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment of certain disease states. One mechanism for delivery is via viral infection where the expression vector is encapsidated in an infectious viral particle. Several non-viral methods for the transfer of expression vectors into cultured mammalian cells also are contemplated by the present invention. These include calcium phosphate precipitation (Graham and Van Der Eb, 1973; Chen and Okayama, 1987; Rippe et al., 1990), DEAE-dextran (Gopal, 1985), electroporation (Tur-Kaspa et al., 1986; Potter et al., 1984), direct microinjection (Harland and Weintraub, 1985), DNA-loaded liposomes (Nicolau and Sene, 1982; Fraley et al., 1979) and lipofectamine-DNA complexes, cell sonication (Fechheimer et al., 1987), gene bombardment using high velocity microprojectiles (Yang et al., 1990), liposome (Ghosh and Bachhawat, 1991; Kaneda et al., 1989) and receptor-mediated transfection (Wu and Wu, 1987; Wu and Wu, 1988). Some of these techniques may be successfully adapted for in vivo or ex vivo use.

In certain embodiments, e.g., in vitro transformation of cells, the polynucleotide encoding an HCV gene may be stably integrated into the genome of the cell. In yet further embodiments, the nucleic acid may be stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid segments or "episomes" encode sequences sufficient to permit maintenance and replication independent of or in synchronization with the host cell cycle. How the expression vector is delivered to a cell and where in the cell the nucleic acid remains is dependent on the type of expression vector employed.

K. Saturation Mutagenesis

Mutagenesis may be accomplished by a variety of standard mutagenic procedures. Mutation is the process whereby changes occur in the quantity or structure of the genetic material of an organism. Mutation can involve modification of the nucleotide sequence of a single gene, blocks of genes or whole chromosomes. Changes in single genes may be the consequence of point mutations, which include the deletion, insertion or substitution of a single nucleotide base within a DNA sequence, or they may be the consequence of changes involving the insertion or deletion of large numbers of nucleotides.

Mutations can arise spontaneously as a result of events such as errors in the fidelity of DNA replication or the movement of transposable genetic elements (transposons) within the genome. They also are induced following exposure to chemical or physical mutagens. Such mutation-inducing agents include ionizing radiation, ultraviolet light and a diverse array of chemicals such as alkylating agents and polycyclic aromatic hydrocarbons, all of which are capable of interacting either directly or indirectly (generally following some metabolic biotransformations) with nucleic acids. The DNA lesions induced by such environmental agents may lead to modifications of base sequence when the affected DNA is replicated or repaired and thus to a mutation. Mutation also can be site-directed through the use of particular targeting methods.

1. Random Mutagenesis

Insertional Mutagenesis. Insertional mutagenesis is based on the inactivation of a gene via insertion of a known DNA fragment. Because it involves the insertion of some type of DNA fragment, the mutations generated are generally loss-of-function, rather than gain-of-function mutations. However, there are several examples of insertions generating gain-of-function mutations (Oppenheimer et al., 1991). Insertion mutagenesis has been very successful in bacteria and Drosophila (Cooley et al., 1988) and recently has become a powerful tool in corn (Schmidt et al., 1987), Arabidopsis (Marks et al., 1991; Koncz et al., 1990) and Antirrhinum (Sommer et al., 1990).

Transposable genetic elements are DNA sequences that can move (transpose) from one place to another in the genome of a cell. The first transposable elements to be recognized were the Activator/Dissociation elements of Zea mays (McClintock, 1957). Since then, they have been identified in a wide range of organisms, both prokaryotic and eukaryotic.

Transposable elements in the genome are characterized by being flanked by direct repeats of a short sequence of DNA that has been duplicated during transposition and is called a target site duplication. Virtually all transposable elements, whatever their type and mechanism of transposition, make such duplications at the site of their insertion. In some cases the number of bases duplicated is constant; in other cases it may vary with each transposition event. Most transposable elements have inverted repeat sequences at their termini. These terminal inverted repeats may be anything from a few bases to a few hundred bases long and in many cases they are known to be necessary for transposition. Prokaryotic transposable elements have been most studied in E. coli and Gram negative bacteria, but also are present in Gram positive bacteria. They are generally termed insertion sequences if they are less than about 2 kb long, or transposons if they are longer. Bacteriophages such as Mu and D108, which replicate by transposition, make up a third type of transposable element. Elements of each type encode at least one polypeptide, a transposase, required for their own transposition. Transposons often further include genes coding for functions unrelated to transposition, for example, antibiotic resistance genes.

Transposons can be divided into two classes according to their structure. First, compound or composite transposons have copies of an insertion sequence (IS) element at each end, usually in an inverted orientation. These transposons require transposases encoded by one of their terminal IS elements. The second class of transposon has terminal repeats of about 30 base pairs and does not contain sequences from IS elements. Transposition usually is either conservative or replicative, although in some cases it can be both. In replicative transposition, one copy of the transposing element remains at the donor site, and another is inserted at the target site. In conservative transposition, the transposing element is excised from one site and inserted at another.

Eukaryotic elements also can be classified according to their structure and mechanism of transposition. The primary distinction is between elements that transpose via an RNA intermediate, and elements that transpose directly from DNA to DNA.

Elements that transpose via an RNA intermediate often are referred to as retrotransposons, and their most characteristic feature is that they encode polypeptides that are believed to have reverse transcriptase activity. There are two types of retrotransposon. Some resemble the integrated proviral DNA of a retrovirus in that they have long direct repeat sequences, long terminal repeats (LTRs), at each end. The similarity between these retrotransposons and proviruses extends to their coding capacity. They contain sequences related to the gag and pol genes of a retrovirus, suggesting that they transpose by a mechanism related to a retroviral life cycle. Retrotransposons of the second type have no terminal repeats. They also code for gag- and pol-like polypeptides and transpose by reverse transcription of RNA intermediates, but do so by a mechanism that differs from that of retrovirus-like elements. Transposition by reverse transcription is a replicative process and does not require excision of an element from a donor site.

Transposable elements are an important source of spontaneous mutations, and have influenced the ways in which genes and genomes have evolved. They can inactivate genes by inserting within them, and can cause gross chromosomal rearrangements either directly, through the activity of their transposases, or indirectly, as a result of recombination between copies of an element scattered around the genome. Transposable elements that excise often do so imprecisely and may produce alleles coding for altered gene products if the number of bases added or deleted is a multiple of three.

Transposable elements themselves may evolve in unusual ways. If they were inherited like other DNA sequences, then copies of an element in one species would be more like copies in closely related species than copies in more distant species. This is not always the case, suggesting that transposable elements are occasionally transmitted horizontally from one species to another.

Chemical mutagenesis. Chemical mutagenesis offers certain advantages, such as the ability to find a full range of mutant alleles with degrees of phenotypic severity, and is facile and inexpensive to perform. The majority of chemical carcinogens produce mutations in DNA. Benzo[a]pyrene, N-acetoxy-2-acetylaminofluorene and aflatoxin B1 cause GC to TA transversions in bacteria and mammalian cells. Benzo[a]pyrene also can produce base substitutions such as AT to TA. N-nitroso compounds produce GC to AT transitions. Alkylation of the O4 position of thymine induced by exposure to N-nitrosoureas results in TA to CG transitions. A high correlation between mutagenicity and carcinogenicity is the underlying assumption behind the Ames test (McCann et al., 1975), which speedily assays for mutants in a bacterial system, together with an added rat liver homogenate, which contains the microsomal cytochrome P450, to provide the metabolic activation of the mutagens where needed. In vertebrates, several carcinogens have been found to produce mutation in the ras proto-oncogene. N-nitroso-N-methyl urea induces mammary, prostate and other carcinomas in rats, with the majority of the tumors showing a G to A transition at the second position in codon 12 of the Ha-ras oncogene. Benzo[a]pyrene-induced skin tumors contain an A to T transversion in the second codon of the Ha-ras gene.
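The distinction between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse) used in the preceding paragraph can be expressed as a short sketch. The function name is hypothetical and the code is offered only as an illustration of the terminology, not as part of the disclosed methods.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base change as a 'transition' or a 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    # A change within the same chemical class is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, the GC to TA change described above corresponds to `classify_substitution("G", "T")`, and the GC to AT change corresponds to `classify_substitution("G", "A")`.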

Radiation Mutagenesis. The integrity of biological molecules is degraded by ionizing radiation. Absorption of the incident energy leads to the formation of ions and free radicals, and breakage of some covalent bonds. Susceptibility to radiation damage appears quite variable between molecules, and between different crystalline forms of the same molecule. It depends on the total accumulated dose, and also on the dose rate (as once free radicals are present, the molecular damage they cause depends on their natural diffusion rate and thus upon real time). Damage is reduced and controlled by making the sample as cold as possible.

Ionizing radiation causes DNA damage and cell killing, generally proportional to the dose rate. Ionizing radiation has been postulated to induce multiple biological effects by direct interaction with DNA, or through the formation of free radical species leading to DNA damage (Hall, 1988). These effects include gene mutations, malignant transformation, and cell killing. Although ionizing radiation has been demonstrated to induce expression of certain DNA repair genes in some prokaryotic and lower eukaryotic cells, little is known about the effects of ionizing radiation on the regulation of mammalian gene expression (Borek, 1985). Several studies have described changes in the pattern of protein synthesis observed after irradiation of mammalian cells. For example, ionizing radiation treatment of human malignant melanoma cells is associated with induction of several unidentified proteins (Boothman et al, 1989). Synthesis of cyclin and co-regulated polypeptides is suppressed by ionizing radiation in rat REF52 cells, but not in oncogene-transformed REF52 cell lines (Lambert and Borek, 1988). Other studies have demonstrated that certain growth factors or cytokines may be involved in x-ray-induced DNA damage. In this regard, platelet-derived growth factor is released from endothelial cells after irradiation (Witte, et al, 1989).

In the present invention, the term "ionizing radiation" means radiation comprising particles or photons that have sufficient energy or can produce sufficient energy via nuclear interactions to produce ionization (gain or loss of electrons). An exemplary and preferred ionizing radiation is x-radiation. The amount of ionizing radiation needed in a given cell generally depends upon the nature of that cell. Typically, an effective expression-inducing dose is less than a dose of ionizing radiation that causes cell damage or death directly. Means for determining an effective amount of radiation are well known in the art. In certain embodiments, an effective expression-inducing amount is from about 2 to about 30 Gray (Gy) administered at a rate of from about 0.5 to about 2 Gy/minute. Even more preferably, an effective expression-inducing amount of ionizing radiation is from about 5 to about 15 Gy. In other embodiments, doses of 2-9 Gy are used in single doses. An effective dose of ionizing radiation may be from 10 to 100 Gy, with 15 to 75 Gy being preferred, and 20 to 50 Gy being more preferred.

Any suitable means for delivering radiation to a tissue may be employed in the present invention in addition to external means. For example, radiation may be delivered by first providing a radiolabeled antibody that immunoreacts with an antigen of the tumor, followed by delivering an effective amount of the radiolabeled antibody to the tumor. In addition, radioisotopes may be used to deliver ionizing radiation to a tissue or cell.

In Vitro Scanning Mutagenesis. Random mutagenesis also may be introduced using error-prone PCR (Cadwell and Joyce, 1992). The rate of mutagenesis may be increased by performing PCR in multiple tubes with dilutions of templates. One particularly useful mutagenesis technique is alanine scanning mutagenesis, in which a number of residues are substituted individually with the amino acid alanine so that the effects of losing side-chain interactions can be determined, while minimizing the risk of large-scale perturbations in protein conformation (Cunningham et al., 1989). In recent years, techniques for estimating the equilibrium constant for ligand binding using minuscule amounts of protein have been developed (Blackburn et al., 1991; U.S. Patents 5,221,605 and 5,238,808). The ability to perform functional assays with small amounts of material can be exploited to develop highly efficient, in vitro methodologies for the saturation mutagenesis of antibodies. The inventors bypassed cloning steps by combining PCR mutagenesis with coupled in vitro transcription/translation for the high throughput generation of protein mutants. Here, the PCR products are used directly as the template for the in vitro transcription/translation of the mutant single chain antibodies. Because of the high efficiency with which all 19 amino acid substitutions can be generated and analyzed in this way, it is now possible to perform saturation mutagenesis on numerous residues of interest, a process that can be described as in vitro scanning saturation mutagenesis (Burks et al., 1997).

In vitro scanning saturation mutagenesis provides a rapid method for obtaining a large amount of structure-function information including: (i) identification of residues that modulate ligand binding specificity, (ii) a better understanding of ligand binding based on the identification of those amino acids that retain activity and those that abolish activity at a given location, (iii) an evaluation of the overall plasticity of an active site or protein subdomain, and (iv) identification of amino acid substitutions that result in increased binding.
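The enumeration of variants underlying alanine scanning and scanning saturation mutagenesis can be sketched in silico as follows. This is only an illustration of which sequences such a scan would cover (all 19 alternative residues at one position, or alanine at every position); the function names and the example peptide are hypothetical, and the code does not model the PCR or transcription/translation steps.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def saturation_variants(protein, position):
    """Return the 19 single substitutions at a 1-based position.

    Keys use conventional mutation labels such as 'K2A'.
    """
    if not 1 <= position <= len(protein):
        raise ValueError("position is outside the protein sequence")
    original = protein[position - 1]
    if original not in AMINO_ACIDS:
        raise ValueError("residue at position is not a standard amino acid")
    variants = {}
    for residue in AMINO_ACIDS:
        if residue == original:
            continue  # skip the wild-type residue
        label = f"{original}{position}{residue}"
        variants[label] = protein[:position - 1] + residue + protein[position:]
    return variants


def alanine_scan(protein):
    """Return one alanine substitution per non-alanine position."""
    return {
        f"{residue}{index}A": protein[:index - 1] + "A" + protein[index:]
        for index, residue in enumerate(protein, start=1)
        if residue != "A"
    }
```

Applying `saturation_variants` to each residue of interest in turn gives the full variant list for a scanning saturation experiment on those residues.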

Fragmentation and Reassembly. A method for generating libraries of displayed polypeptides is described in U.S. Patent 5,380,721. The method comprises obtaining polynucleotide library members, pooling and fragmenting the polynucleotides, reforming fragments therefrom, and performing PCR amplification, thereby homologously recombining the fragments to form a shuffled pool of recombined polynucleotides.

2. Site-Directed Mutagenesis

Structure-guided site-specific mutagenesis represents a powerful tool for the dissection and engineering of protein-ligand interactions (Wells, 1996; Braisted et al., 1996). The technique provides for the preparation and testing of sequence variants by introducing one or more nucleotide sequence changes into a selected DNA. Site-specific mutagenesis uses specific oligonucleotide sequences which encode the DNA sequence of the desired mutation, as well as a sufficient number of adjacent, unmodified nucleotides. In this way, a primer sequence is provided with sufficient size and complexity to form a stable duplex on both sides of the deletion junction being traversed. A primer of about 17 to 25 nucleotides in length is preferred, with about 5 to 10 residues on both sides of the junction of the sequence being altered. The technique typically employs a bacteriophage vector that exists in both a single-stranded and double-stranded form. Vectors useful in site-directed mutagenesis include vectors such as the M13 phage. These phage vectors are commercially available and their use is generally well known to those skilled in the art. Double-stranded plasmids are also routinely employed in site-directed mutagenesis, which eliminates the step of transferring the gene of interest from a phage to a plasmid.

In general, one first obtains a single-stranded vector, or melts two strands of a double-stranded vector, which includes within its sequence a DNA sequence encoding the desired protein or genetic element. An oligonucleotide primer bearing the desired mutated sequence, synthetically prepared, is then annealed with the single-stranded DNA preparation, taking into account the degree of mismatch when selecting hybridization conditions. The hybridized product is subjected to DNA polymerizing enzymes such as E. coli polymerase I (Klenow fragment) in order to complete the synthesis of the mutation-bearing strand. Thus, a heteroduplex is formed, wherein one strand encodes the original non-mutated sequence, and the second strand bears the desired mutation. This heteroduplex vector is then used to transform appropriate host cells, such as E. coli cells, and clones are selected that include recombinant vectors bearing the mutated sequence arrangement.
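The construction of a mutagenic oligonucleotide, with the altered base flanked by unmodified template sequence, can be sketched as below. The function name, the default flank length of 10 nucleotides (giving a 21-nucleotide primer, within the preferred 17 to 25 range) and the example template are assumptions made for illustration; the sketch ignores melting temperature, secondary structure and strand orientation, which a practical design would also consider.

```python
def design_mutagenic_primer(template, position, new_base, flank=10):
    """Build a primer carrying one base change at a 0-based template position.

    The primer is: `flank` unmodified bases, the new base, `flank` unmodified bases.
    """
    template = template.upper()
    new_base = new_base.upper()
    if new_base not in "ACGT" or len(new_base) != 1:
        raise ValueError("new_base must be a single A, C, G or T")
    if not 0 <= position < len(template):
        raise ValueError("position is outside the template")
    if position - flank < 0 or position + flank >= len(template):
        raise ValueError("not enough flanking sequence around the position")
    if template[position] == new_base:
        raise ValueError("new_base matches the template; nothing to mutate")
    left = template[position - flank:position]
    right = template[position + 1:position + 1 + flank]
    return left + new_base + right


# Hypothetical 30-nucleotide template used only for illustration.
example_template = "ATGGCTAGCAAGGGCGAGGAGCTGTTCACC"
example_primer = design_mutagenic_primer(example_template, 15, "T")
```

The resulting primer differs from the template at exactly one central position, which is the mismatch to be taken into account when selecting hybridization conditions.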

Comprehensive information on the functional significance and information content of a given residue of a protein can best be obtained by saturation mutagenesis, in which all 19 amino acid substitutions are examined. The shortcoming of this approach is that the logistics of multiresidue saturation mutagenesis are daunting (Warren et al., 1996; Zeng et al., 1996; Burton and Barbas, 1994; Yelton et al., 1995; Hilton et al., 1996). Hundreds, and possibly even thousands, of site-specific mutants must be studied. However, improved techniques make production and rapid screening of mutants much more straightforward. See also U.S. Patents 5,798,208 and 5,830,650 for a description of "walk-through" mutagenesis.

Other methods of site-directed mutagenesis are disclosed in U.S. Patents 5,220,007; 5,284,760; 5,354,670; 5,366,878; 5,389,514; 5,635,377; and 5,789,166.

L. Codon Optimization

Most amino acids are encoded by more than one codon. For a given amino acid, different species may favor different codons, which creates possible codon bias.

There is a documented effect of codon bias on the expression of viral genes that is closely related to vaccine efficiency (Andre et al., 1998; Matsumura and Ellington, 2001). It is therefore necessary to reformat the codon usage of inferred ancestral genes according to the intended host species. The optimization can be done with several web-based programs, such as JCat (Grote et al., 2005), which implements an algorithm for the calculation of the codon adaptation index (CAI). The CAI is the prevailing empirical measure of expressivity (Sharp and Li, 1984).
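For orientation, the CAI of a coding sequence is commonly computed as the geometric mean of the relative adaptiveness of its codons, where the relative adaptiveness of a codon is its frequency in a reference set of highly expressed genes divided by the frequency of the most used synonymous codon. The sketch below illustrates that calculation only; it is not the JCat implementation, the codon counts are invented for the example, and codons absent from the small reference table are simply skipped.

```python
import math

# Hypothetical codon counts from a reference set of highly expressed genes.
# Real tables cover all amino acids and are specific to the host species.
REFERENCE_COUNTS = {
    "K": {"AAA": 20, "AAG": 80},
    "E": {"GAA": 30, "GAG": 60},
    "F": {"TTT": 25, "TTC": 50},
}


def relative_adaptiveness(reference_counts):
    """Map each codon to count / (count of the best synonymous codon)."""
    weights = {}
    for codon_counts in reference_counts.values():
        best = max(codon_counts.values())
        for codon, count in codon_counts.items():
            weights[codon] = count / best
    return weights


def codon_adaptation_index(cds, weights):
    """Geometric mean of codon weights over the codons found in `weights`."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    scored = [weights[codon] for codon in codons if codon in weights]
    if not scored:
        raise ValueError("no codon in the sequence is present in the table")
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


WEIGHTS = relative_adaptiveness(REFERENCE_COUNTS)
```

A sequence built entirely from the preferred codons of the reference set scores 1.0, and replacing preferred codons with rarer synonymous codons lowers the index.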

V. Pseudotyped Virus Production and Neutralization Experiments

A. Pseudotyped Viruses

It has long been known that concurrent productive infection of cells with two types of enveloped virus can potentially lead to the production of mixed viral particles or "pseudotypes." These naturally produced viral particles may carry the core and genetic information of one virus, and in addition the surface proteins of the other virus (Weiss, 1993).

Retroviral vectors are some of the most commonly pseudotyped viruses. Murine leukemia virus (MLV), a retrovirus that is able to infect many different cell types, can be manipulated to infect a wide range of cells by pseudotyping. Pseudotyped retroviruses can be engineered in the laboratory using packaging cell lines that produce gag and pol proteins from one virus and env proteins from a second virus. For example, non-targeted, pseudotyped retroviruses based upon MLV and carrying the envelope protein of the highly promiscuous vesicular stomatitis virus (VSV) have been produced (Yee et al., 1994). These vectors give titers higher than 10⁹ (cf. 10⁶ for MLV-based RVs) and are more stable, facilitating their concentration. These MLV/VSV pseudotyped RVs show a very wide infection spectrum and are able to infect even fish cells.

Pseudotyped retroviral vectors based upon MoMuLV (MLV) and carrying the envelope of gibbon ape leukemia virus (GaLV) or the HTLV-I envelope protein also have been described (Wilson et al., 1989). The GaLV SEATO-MoMuLV hybrid particles were generated at titers approximately equivalent to those obtained with the MoMuLV particles, and the infection spectrum correlates exactly with the previously reported in vitro host range of wild-type GaLV SEATO, i.e., bat, mink, bovine and human cells. The apparent titers of HTLV-I MoMuLV (1-10 CFU/ml) were substantially lower than the titers achieved with either the MoMuLV or GaLV-MoMuLV recombinant virions. The HTLV-I hybrid virions were able to infect human and mink cells (Wilson et al., 1989).

Using techniques similar to those described above, the inventors will introduce putative vaccine envelope sequences into expression vectors that will be introduced into host cells, which are subsequently infected with viral vectors lacking envelope sequences. The pseudotyped particles will contain the vaccine envelope sequence(s) to be tested by neutralization assays, as described below. In one embodiment, the virus population will be mixed, i.e., it will contain a large number of distinct virus envelope sequences. In other embodiments, the population will be homogeneous, i.e., it will contain a single envelope sequence, thereby permitting more precise assessment of the degree of virus neutralization.

B. Neutralization

Following production of pseudotyped virus populations, viruses (and hence virus envelope sequences) will be tested for their ability to be neutralized by antibodies. The antibodies may come from a variety of sources, including natural (serum) and synthetic (polyclonal immune sera; monoclonal antibodies) sources. For neutralization assays, the antibody preparation will be diluted to appropriate concentrations and, alternatively, multiple concentrations of antibody may be used in replicate experiments. The antibody preparation is then mixed with the pseudotyped viral particles and incubated for an appropriate period of time. Virus infectivity is then assessed by plating of the neutralized virus onto permissive cells, followed by incubation under conditions permitting uptake and replication of the virus in the cells.

Assessment of virus replication can be undertaken by several different means. First, cells may be solubilized and Western blot or radioimmune precipitation may be performed. Alternatively, virus replication can be assessed by incorporation of a radiolabel, such as tritiated thymidine, into the produced genomes, followed by scintillation counting. Finally, plaque assays may be performed where virus is plated onto confluent cell monolayers followed by observation of areas of cell death, called "plaques."
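Whichever readout is used, the degree of neutralization is often summarized as the percentage reduction in infectivity relative to a control without antibody. The specification does not prescribe a formula; the sketch below shows one common convention as an assumption, with a hypothetical function name and an optional background subtraction.

```python
def percent_neutralization(signal_with_antibody, signal_no_antibody, background=0.0):
    """Percent reduction in infectivity relative to a no-antibody control.

    Signals may be plaque counts, scintillation counts or any other readout
    that scales with infectivity. `background` is the signal from uninfected
    cells (assumed 0 if not measured).
    """
    control = signal_no_antibody - background
    if control <= 0:
        raise ValueError("no-antibody control must exceed background")
    # Clamp at zero so readings below background do not exceed 100 percent.
    treated = max(signal_with_antibody - background, 0.0)
    return 100.0 * (1.0 - treated / control)
```

Computing this value at each antibody dilution in a replicate series gives a titration curve from which a neutralization titer can be read.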

VI. Examples

The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.

EXAMPLE 1 - Inference of Ancestral Envelope Sequences

Data compilation. Full-length HCV E1E2 sequences were retrieved from GenBank (www.ncbi.nlm.nih.gov/Genbank/index.html) and the Los Alamos HCV database (hcv.lanl.gov) (Kuiken et al., 2005). Each sequence was manually examined to determine its genotype and exact length. Sequences were edited and aligned with Clustal W (Higgins and Sharp, 1988), BioEdit (www.mbio.ncsu.edu/BioEdit/bioedit.html) and the SeqEd program in the GCG package (Wisconsin GCG Package, Ver. 10.0). Missing genotype information for some sequences was determined by phylogenetic analyses with MEGA (Molecular Evolutionary Genetics Analysis) (Kumar et al., 2001) using the neighbor-joining approach with the Kimura 2-parameter nucleotide substitution model. The data set was further filtered by excluding recombinants and, among groups of highly similar sequences, retaining only one representative. The final data set consists of 119 full-length HCV E1E2 sequences, designated number 1 to 119, respectively matching their GenBank accession numbers AF011753, AF271632, AF290978, AF511948, AF511949, AF511950, AF529293, AJ278830, AJ557444, AY388455, AY615798, AY695436, AY695437, AY885238, AY958064, D10749, DQ061303, DQ061307, DQ061312, DQ061318, DQ061322, DQ061326, DQ061327, M62321, AB049087, AB049090, AB049093, AB049096, AB154186, AB154188, AB154192, AB154194, AB154198, AB154200, AB154202, AB154206, AF165056, AF176573, AF207758, AF207759, AF207761, AF207763, AF207764, AF207770, AF207773, AF356827, AF483269, AJ849974, AY070174, AY460204, D10934, D13406, D14484, D45172, D50480, D50481, D50484, D89815, D90208, L02836, L20498, M84754, M96362, U01214, U89019, X61592, AY051292, AY651061, D14853, AB030907, AB031663, AB047645, AF169002, AF169003, AF169004, AF169005, AF177036, AF238481, AF238482, AF238483, AF238484, AF238485, AF238486, AY232731, AY232733, AY232735, AY232737, AY232739, AY232741, AY232743, AY232745, AY232747, AY232749, AY746460, D00944, D10988, D50409, DQ155561, AF046866, AY958004, AY958024, AY958044, D17763, D28917, D49374, D63821, X76918, Y11604, AF064490, Y13184, AY859526, AY878650, D63822, D84262, D84263, D84264, D84265, DQ155560 and Y12083.
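Neighbor-joining trees are built from pairwise distances between aligned sequences. The following is a minimal sketch of one such distance, the Kimura two-parameter distance, on the assumption that this is the substitution model intended in the paragraph above. It is not the MEGA implementation; the function name is hypothetical, and sites containing gaps or ambiguity codes are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    d = -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitional and transversional differences.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1   # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))
```

The same pairwise distances could also support the filtering step, for example by keeping one representative from any group of sequences whose mutual distance falls below a chosen threshold; the specification does not state what threshold was used.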

Model selection. The inventor then evaluated the most appropriate nucleotide substitution model for these 119 HCV sequences. This was done using hierarchical likelihood ratio tests (hLRTs) carried out with the program Modeltest across a total of 56 evolutionary models (Goldman, 1993; Posada and Crandall, 1998; Posada and Crandall, 2001). The general time-reversible (GTR) model (Tavare, 1986) was selected together with among-site rate variation, in which the proportion of invariable sites (I) and the gamma distribution shape parameter (G) are 0.2188 and 0.6121, respectively.
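Each step of a hierarchical likelihood ratio test compares a simpler (null) model with a more parameter-rich (alternative) model that contains it. The following is a minimal sketch of a single such comparison, assuming the test statistic is referred to a plain chi-square distribution. It does not reproduce Modeltest's model hierarchy or any corrections it may apply, and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, starting from
    Q(1/2, y) = erfc(sqrt(y)) for odd df or Q(1, y) = exp(-y) for even df.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, extra_params):
    """Return (statistic, p_value) for two nested models.

    lnl_null and lnl_alt are maximized log-likelihoods; extra_params is the
    number of additional free parameters in the alternative model.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods: the richer model is kept if p is small.
stat, p = likelihood_ratio_test(-1002.0, -1000.0, extra_params=1)
```

Repeating this comparison along a fixed sequence of nested models, and keeping the richer model whenever the null is rejected, yields a single best-fit model such as the GTR model with invariable sites and gamma-distributed rates reported above.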

Reconstruction of phylogenetic trees. The best tree was recovered by heuristic search using the maximum likelihood (ML) approach for the whole data set. The best-fit model and associated parameters described above were applied. All processes were completed with the program PAUP* (Swofford, PAUP Ver. 4.02b). Initially, the tree could not be constructed directly with PAUP* because the large number of sequences made the computation prohibitively long (years). As an alternative approach, the inventor first produced ML trees with the PHYML program, which implements a simple hill-climbing algorithm for heuristic tree search and uses a distance-based tree as a starting point (Guindon and Gascuel, 2003). The tree produced by PHYML was then transferred into PAUP* for further optimization and rooted by the molecular clock approach (FIG. 3).

Inference of the ancestral HCV envelope sequence. The simulation of ancestral sequences at each internal node was done with the "baseml" program in the PAML package for both marginal and joint ancestral reconstruction (Yang, 1997). The tree shown in FIG. 3 served as the template. The inventor reconstructed ancestral sequences at the nucleotide level rather than at the codon or amino acid level, since the latter two approaches ignore synonymous substitutions that may also experience positive selection (Novella et al., 2004). The inventor successfully inferred two ancestral sequences at the deepest and interior roots of the tree, representing the ancestors of all HCV isolates (HCVA1) and HCV genotype 1 isolates (HCVA2), respectively. Furthermore, the codon usage of HCVA1 and HCVA2 was optimized for mammalian species with the program JCat (Grote et al., 2005), which implements an algorithm for the calculation of the codon adaptation index (CAI), a prevailing empirical measure of expressivity (Sharp and Li, 1987). The nucleotide sequences after codon optimization are designated HCVA11 for HCVA1 and HCVA22 for HCVA2. All sequences are shown below.
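The codon adaptation index can be illustrated with a short sketch. Each codon receives a relative adaptiveness equal to its usage divided by the usage of the most frequent synonymous codon, and the index is the geometric mean of these values over the coding sequence. The usage table below is a toy, hypothetical table covering only three amino acids; it is not the reference set used by JCat, and the function names are assumptions.

```python
import math

# Hypothetical codon counts for three amino acids, for illustration only.
TOY_USAGE = {
    "L": {"CTG": 40, "CTC": 20, "TTG": 13, "CTT": 13, "TTA": 7, "CTA": 7},
    "K": {"AAG": 32, "AAA": 24},
    "E": {"GAG": 40, "GAA": 29},
}


def relative_adaptiveness(usage):
    """Weight of each codon relative to the most used synonymous codon."""
    weights = {}
    for codon_counts in usage.values():
        best = max(codon_counts.values())
        for codon, count in codon_counts.items():
            weights[codon] = count / best
    return weights


def codon_adaptation_index(sequence, weights):
    """Geometric mean of codon weights over an in-frame coding sequence.

    Codons absent from the weight table are skipped, which is a
    simplification made for this sketch.
    """
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3].upper() for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


WEIGHTS = relative_adaptiveness(TOY_USAGE)
```

Under this definition, replacing each codon with the most frequently used synonymous codon raises the index toward 1.0 without changing the encoded amino acid sequence, which is the sense in which codon usage is optimized.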

The similarity of the ancestral HCV envelope sequences was examined at both nucleotide and amino acid levels using the SimPlot program (Lole et al., 1999). HCVA1 and HCVA2 share 84% nucleotide and 86% amino acid similarity. Interestingly, the codon-optimized versions, HCVA11 and HCVA22, share 93% nucleotide similarity. The comparison of the ancestral sequences and the HCV Chiron strain (M62321) is shown in FIG. 4.
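A percentage of this kind can be computed from a pairwise alignment. The following is a minimal sketch of a global percent-identity calculation over aligned sequences, assuming gapped columns are excluded from the comparison. It is not the SimPlot method, which reports similarity in sliding windows along the alignment, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns at which two sequences match.

    Works for aligned nucleotide or amino acid sequences of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

Applied to the nucleotide alignment and to the alignment of the translated sequences, a function of this form would yield one value at each level, comparable in kind to the figures reported above.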

All of the compositions and methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure.

While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the compositions and methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope, and concept of the invention as defined by the appended claims.

VII. References

The following references, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.

U.S. Patent 4,683,202

U.S. Patent 5,220,007

U.S. Patent 5,221,605

U.S. Patent 5,238,808

U.S. Patent 5,284,760

U.S. Patent 5,354,670

U.S. Patent 5,366,878

U.S. Patent 5,380,721

U.S. Patent 5,389,514

U.S. Patent 5,635,377

U.S. Patent 5,789,166

U.S. Patent 5,798,208

U.S. Patent 5,830,650

U.S. Patent 5,928,906

Aguinaldo and Arnold, Methods Mol. Biol., 231:105-10, 2003.

Almendro et al., J. Immunol., 157(12):5411-5421, 1996.

Alter et al., N. Engl. J. Med., 341:556-562, 1999.

Andre et al., J. Virol., 72:1497-1503, 1998.

Angel et al., Cell, 49:729, 1987b.

Angel et al., Mol. Cell. Biol., 7:2256, 1987a.

Atchison and Perry, Cell, 46:253, 1986.

Atchison and Perry, Cell, 48:121, 1987.

Ausubel et al., In: Current Protocols in Molecular Biology, John Wiley & Sons, Inc., New York, 1996.

Baichwal and Sugden, In: Gene Transfer, Kucherlapati (Ed.), NY, Plenum Press, 117-148, 1986.

Banerji et al., Cell, 27(2 Pt 1):299-308, 1981.

Banerji et al., Cell, 33(3):729-740, 1983.

Barany and Merrifield, In: The Peptides, Gross and Meienhofer (Eds.), Academic Press, NY, 1-284, 1979.

Belinda et al., Mol. Biol. Evol., 19:1483-1489, 2002.

Berkhout et al., Cell, 59:273-282, 1989.

Blackburn et al., J. Lipid. Res., 32(12):1911-1918, 1991.

Blanar et al., EMBO J., 8:1139, 1989.

Bodine and Ley, EMBO J., 6:2997, 1987.

Boothman et al., Cancer Res., 49(11):2871-2878, 1989.

Borek, Carcinog. Compr. Surv., 10:303-316, 1985.

Boshart et al., Cell, 41:521, 1985.

Bosze et al., EMBO J., 5(7):1615-1623, 1986.

Braddock et al., Cell, 58:269, 1989.

Braisted and Wells, Proc. Natl. Acad. Sci. USA, 93(12):5688-5692, 1996.

Bulla and Siddiqui, J. Virol., 62:1437, 1986.

Burks et al., Proc. Natl. Acad. Sci. USA, 94(2):412-417, 1997.

Burton and Barbas, Adv. Immunol., 57:191-280, 1994.

Cadwell and Joyce, PCR Methods Appl., 2(1):28-33, 1992.

Campbell and Villarreal, Mol. Cell. Biol., 8:1993, 1988.

Campere and Tilghman, Genes and Dev., 3:537, 1989.

Campo et al., Nature, 303:77, 1983.

Carbonelli et al., FEMS Microbiol. Lett., 177(1):75-82, 1999.

Celander and Haseltine, J. Virology, 61:269, 1987.

Celander et al., J. Virology, 62:1314, 1988.

Chandler et al., Cell, 33:489, 1983.

Chang et al., Mol. Cell. Biol., 9:2153, 1989.

Chatterjee et al., Proc. Natl. Acad. Sci. USA, 86:9114, 1989.

Chen and Okayama, Mol. Cell Biol., 7(8):2745-2752, 1987.

Choi et al., Cell, 53:519, 1988.

Cirino et al., Methods Mol. Biol., 231:3-9, 2003.

Cocea, Biotechniques, 23(5):814-816, 1997.

Coco et al., Nat. Biotechnol., 19:354-359, 2001.

Coco, Methods Mol. Biol., 231:111-27, 2003.

Cohen et al., J. Cell. Physiol., 5:75, 1987.

Costa et al., Mol. Cell. Biol., 8:81, 1988.

Cripe et al., EMBO J., 6:3745, 1987.

Culotta and Hamer, Mol. Cell. Biol., 9:1376, 1989.

Cunningham and Wells, Science, 244(4908):1081-1085, 1989.

Dandolo et al., J. Virology, 47:55-64, 1983.

De Jager et al., Semin. Nucl. Med., 23(2):165-179, 1993.

De Villiers et al., Nature, 312(5991):242-246, 1984.

Deschamps et al., Science, 230:1174-1177, 1985.

Doolittle and Ben-Zeev, Methods Mol. Biol., 109:215-237, 1999.

Edbrooke et al., Mol. Cell. Biol., 9:1908, 1989.

Edlund et al., Science, 230:912-916, 1985.

Fan et al., Hepatology, 38:25-33, 2003.

Fechheimer et al., Proc. Natl. Acad. Sci. USA, 84:8463-8467, 1987.

Felsenstein, Annu. Rev. Genet., 22:521-565, 1988.

Feng and Holland, Nature, 334:6178, 1988.

Firak and Subramanian, Mol. Cell. Biol., 6:3667, 1986.

Foecking and Hofstetter, Gene, 45(1):101-105, 1986.

Fraley et al., Proc. Natl. Acad. Sci. USA, 76:3348-3352, 1979.

Fujita et al., Cell, 49:357, 1987.

Gabizon et al., Cancer Res., 50(19):6371-6378, 1990.

Gadagkar and Kumar, Molec. Biol. Evol., 22:2139-2141, 2005.

Gao et al., J. Virol., 79:1154-1163, 2005.

Gaschen et al., Science, 296:2354-2360, 2002.

Ghosh and Bachhawat, In: Liver Diseases, Targeted Diagnosis and Therapy Using Specific Receptors and Ligands, Wu et al. (Eds.), Marcel Dekker, NY, 87-104, 1991.

Gilles et al., Cell, 33:717, 1983.

Gloss et al., EMBO J., 6:3735, 1987.

Godbout et al., Mol. Cell. Biol., 8:1169, 1988.

Goldman, J. Mol. Evol., 36:182-198, 1993.

Goodbourn and Maniatis, Proc. Natl. Acad. Sci. USA, 85:1447, 1988.

Goodbourn et al., Cell, 45:601, 1986.

Gopal, Mol. Cell Biol., 5:1188-1190, 1985.

Gossen et al., Proc. Natl. Acad. Sci. USA, 89:5547-5551, 1992.

Gossen et al., Science, 268(5218):1766-1769, 1995.

Graham and Van Der Eb, Virology, 52:456-467, 1973.

Graham et al., Mol. Biol. Evol., 19:1769-1781, 2002.

Greene et al., Immunology Today, 10:272, 1989.

Grosschedl and Baltimore, Cell, 41:885, 1985.

Grote et al., Nucleic Acids Res., 33:W526-531, 2005.

Guindon and Gascuel, Systematic Biology, 52:696-704, 2003.

Gulbis and Galand, Hum. Pathol., 24(12):1271-1285, 1993.

Hall, Genetics, 120(4):887-897, 1988.

Harland and Weintraub, J. Cell Biol., 101(3):1094-1099, 1985.

Haslinger and Karin, Proc. Natl. Acad. Sci. USA, 82:8572, 1985.

Hauber and Cullen, J. Virology, 62:673, 1988.

Hen et al., Nature, 321:249, 1986.

Hensel et al., Lymphokine Res., 8:347, 1989.

Herr and Clarke, Cell, 45:461, 1986.

Higgins and Sharp, Gene, 73:237-244, 1988.

Hilton et al., J. Biol. Chem., 271(9):4699-4708, 1996.

Hirochika et al., J. Virol., 61:2599, 1987.

Hirsch et al., Mol. Cell. Biol., 10:1959, 1990.

Holbrook et al., Virology, 157:211, 1987.

Holland et al., Science, 215:1577-1585, 1982.

Horlick and Benfield, Mol. Cell. Biol., 9:2396, 1989.

Huang et al., Cell, 27:245, 1981.

Huelsenbeck et al., Syst. Biol., 51:32-43, 2002.

Hug et al., Mol. Cell. Biol., 8:3065, 1988.

Hughes et al., J. Mol. Biol., 331:973-979, 2003.

Hwang et al., Mol. Cell. Biol., 10:585, 1990.

Imagawa et al., Cell, 51:251, 1987.

Imbra and Karin, Nature, 323:555, 1986.

Imler et al., Mol. Cell. Biol., 7:2558, 1987.

Imperiale and Nevins, Mol. Cell. Biol., 4:875, 1984.

Ishii et al., Hepatology, 28:1117-1120, 1998.

Jakobovits et al., Mol. Cell. Biol., 8:2555, 1988.

Jameel and Siddiqui, Mol. Cell. Biol., 6:710, 1986.

Jaynes et al., Mol. Cell. Biol., 8:62, 1988.

Jermann et al., Nature, 374:57-59, 1995.

Johnson et al., Mol. Cell. Biol., 9:3393, 1989.

Kadesch and Berg, Mol. Cell. Biol., 6:2593, 1986.

Kaneda et al., Science, 243:375-378, 1989.

Karin et al., Mol. Cell. Biol., 7:606, 1987.

Katinka et al., Cell, 20:393, 1980.

Katinka et al., Nature, 290:720, 1981.

Kato et al., Biochem. Biophys. Res. Commun., 189:119-127, 1992.

Kato et al., J. Virol., 68:4776-4784, 1994.

Kawamoto et al., Mol. Cell Biol., 8:267, 1988.

Kiledjian et al., Mol. Cell. Biol., 8:145, 1988.

Klamut et al., Mol. Cell. Biol., 10:193, 1990.

Koch et al., Mol. Cell. Biol., 9:303, 1989.

Koncz et al., EMBO J., 9(5):1337-1346, 1990.

Kriegler and Botchan, In: Eukaryotic Viral Vectors, Gluzman (Ed.), Cold Spring Harbor: Cold Spring Harbor Laboratory, NY, 1982.

Kriegler and Botchan, Mol. Cell. Biol., 3:325, 1983.

Kriegler et al., Cell, 38:483, 1984a.

Kriegler et al., Cell, 53:45, 1988.

Kriegler et al., In: Cancer Cells 2/Oncogenes and Viral Genes, Van de Woude et al. (Eds.), Cold Spring Harbor, Cold Spring Harbor Laboratory, 1984b.

Kriegler et al., In: Gene Expression, Alan Liss (Ed.), Hamer and Rosenberg, New York, 1983.

Kuhl et al., Cell, 50:1057, 1987.

Kuiken et al., Bioinformatics, 21(3):379-384, 2005.

Kumar et al., Bioinformatics, 17:1244-1245, 2001.

Kunz et al., Nucl. Acids Res., 17:1121, 1989.

Lambert and Borek, J. Natl. Cancer Inst., 80(18):1492-1497, 1988.

Larsen et al., Proc. Natl. Acad. Sci. USA, 83:8283, 1986.

Laspia et al., Cell, 59:283, 1989.

Latimer et al., Mol. Cell. Biol., 10:760, 1990.

Lee et al., Nature, 294:228, 1981.

Lee et al., Nucleic Acids Res., 12:4191-206, 1984.

Levenson et al., Hum. Gene Ther., 9(8):1233-1236, 1998.

Lin et al., Mol. Cell. Biol., 10:850, 1990.

Locher et al., DNA & Cell Biology, 24:256-263, 2005.

Lole et al., J. Virol., 73:152-160, 1999.

Luria et al., EMBO J., 6:3307, 1987.

Lusky and Botchan, Proc. Natl. Acad. Sci. USA, 83:3609, 1986.

Lusky et al., Mol. Cell. Biol., 3:1108, 1983.

Macejak and Sarnow, Nature, 353:90-94, 1991.

Majors and Varmus, Proc. Natl. Acad. Sci. USA, 80:5866, 1983.

Maniatis et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 1990.

Marks et al., J. Mol. Biol., 222:581-597, 1991.

Matsumura and Ellington, J. Mol. Biol., 305:331-339, 2001.

McCann et al., Proc. Natl. Acad. Sci. USA, 72(3):979-983, 1975.

McClintok et al., Am. J. Physiol., 189(3):463-469, 1957.

McNeall et al., Gene, 76:81, 1989.

Miksicek et al., Cell, 46:203, 1986.

Mordacq and Linzer, Genes and Dev., 3:760, 1989.

Moreau et al., Nucl. Acids Res., 9:6047, 1981.

Muesing et al., Cell, 48:691, 1987.

Ng et al., Nuc. Acids Res., 17:601, 1989.

Nicolas and Rubinstein, In: Vectors: A survey of molecular cloning vectors and their uses, Rodriguez and Denhardt (Eds.), Stoneham: Butterworth, pp. 494-513, 1988.

Nicolau and Sene, Biochim. Biophys. Acta, 721:185-190, 1982.

Novella et al., J. Mol. Biol., 342:1415-1421, 2004.

Ondek et al., EMBO J., 6:1017, 1987.

Ornitz et al., Mol. Cell. Biol., 7:3466, 1987.

Ostermeier and Lutz, Methods Mol. Biol., 231:129-141, 2003.

Ostermeier et al., Nat. Biotechnol., 17:1205-1209, 1999.

Palmiter et al., Nature, 300:611, 1982.

Pech et al., Mol. Cell. Biol., 9:396, 1989.

Perez-Stable and Constantini, Mol. Cell. Biol., 10:1116, 1990.

Picard and Schaffner, Nature, 307:83, 1984.

Pinkert et al., Genes and Dev., 1:268, 1987.

Ponta et al., Proc. Natl. Acad. Sci. USA, 82:1020, 1985.

Porton et al., Mol. Cell. Biol., 10:1076, 1990.

Posada and Crandall, Bioinformatics, 14:817-818, 1998.

Posada and Crandall, Syst. Biol., 50:580-601, 2001.

Potter et al., Proc. Natl. Acad. Sci. USA, 81:7161-7165, 1984.

Queen and Baltimore, Cell, 35:741, 1983.

Quinn et al., Mol. Cell. Biol., 9:4713, 1989.

Redondo et al., Science, 247:1225, 1990.

Reisman and Rotter, Mol. Cell. Biol., 9:3571, 1989.

Resendez Jr. et al., Mol. Cell. Biol., 8:4579, 1988.

Ridgeway, In: Vectors: A Survey of Molecular Cloning Vectors and Their Uses, Rodriguez et al. (Eds.), Stoneham: Butterworth, 467-492, 1988.

Ripe et al., Mol. Cell. Biol., 9:2224, 1989.

Rippe et al., Mol. Cell Biol., 10:689-695, 1990.

Rittling et al., Nuc. Acids Res., 17:1619, 1989.

Rollenhagen et al., Proc. Natl. Acad. Sci. USA, 101:8739-8744, 2004.

Rosen et al., Cell, 41:813, 1988.

Sakai et al., Genes and Dev., 2:1144, 1988.

Sambrook et al., In: Molecular cloning, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2001.

Satake et al., J. Virology, 62:970, 1988.

Schaffner et al., J. Mol. Biol., 201:81, 1988.

Schmidt et al., Science, 238(4829):960-963, 1987.

Searle et al., Mol. Cell. Biol., 5:1480, 1985.

Sharp and Li, Nucleic Acids Res., 15:1281-1295, 1987.

Shaul and Ben-Levy, EMBO J., 6:1913, 1987.

Sherman et al., Mol. Cell. Biol., 9:50, 1989.

Sleigh and Lockett, J. EMBO, 4:3831, 1985.

Sommer et al., EMBO J., 9(3):605-613, 1990.

Spalholz et al., Cell, 42:183, 1985.

Spandau and Lee, J. Virology, 62:427, 1988.

Spandidos and Wilkie, EMBO J., 2:1193, 1983.

Stephens and Hentschel, Biochem. J., 248:1, 1987.

Stuart et al., Nature, 317:828, 1985.

Sullivan and Peterlin, Mol. Cell. Biol., 7:3315, 1987.

Swartzendruber and Lehman, J. Cell. Physiology, 85:179, 1975.

Swofford et al., In: Molecular Systematics, 2nd Ed., Hillis et al. (Eds.), Sinauer Associates, Sunderland, Massachusetts, 407-453, 1996.

Swofford, PAUP*: Phylogenetic Analysis using Parsimony and Other Methods. Version 4.02b. Sinauer Associates, Sunderland, MA.

Takebe et al., Mol. Cell. Biol., 8:466, 1988.

Tavare, Lectures Math. Life Sci., 17:57-86, 1986.

Tavernier et al., Nature, 301:634, 1983.

Taylor and Kingston, Mol. Cell. Biol., 10:165, 1990a.

Taylor and Kingston, Mol. Cell. Biol., 10:176, 1990b.

Taylor et al., J. Biol. Chem., 264:15160, 1989.

Temin, In: Gene Transfer, Kucherlapati (Ed.), NY, Plenum Press, 149-188, 1986.

Thiesen et al., J. Virology, 62:614, 1988.

Treisman, Cell, 46(4):567-174, 1986.

Tronche et al., Mol. Biol. Med., 7:173, 1990.

Tronche et al., Mol. Cell Biol., 9(11):4759-4766, 1989.

Trudel and Constantini, Genes and Dev., 6:954, 1987.

Tur-Kaspa et al., Mol. Cell Biol., 6:716-718, 1986.

Twiddy et al., J. Gen. Virol., 83:1679-1689, 2002.

Twiddy et al., Mol. Biol. Evol., 20:122-129, 2003.

Tyndell et al., Nuc. Acids. Res., 9:6231, 1981.

Vannice and Levinson, J. Virology, 62:1305, 1988.

Vasseur et al., Proc. Natl. Acad. Sci. USA, 77:1068, 1980.

Wang and Calame, Cell, 47:241, 1986.

Warren et al., Biochemistry, 35(27):8855-8862, 1996.

Weber et al., Cell, 36:983, 1984.

Weinberger et al., Mol. Cell. Biol., 8:988, 1984.

Weiss, R. A., In: The Retroviridae, 2:1-108, Levy (Ed.), Plenum Press, NY, 1993.

Wells et al., J. Leukoc. Biol., 59(1):53-60, 1996.

Wheeler, Cladistics, 6:363-368, 1990.

Wilson et al., J. of Virology, 63(5):2374-2378, 1989.

Winoto and Baltimore, Cell, 59:649, 1989.

Wisconsin GCG package. Version 10.0. Oxford Molecular Group, Inc.

Witte et al., Cancer Res., 49(18):5066-5072, 1989.

Wu and Wu, Biochemistry, 27:887-892, 1988.

Wu and Wu, J. Biol. Chem., 262:4429-4432, 1987.

Yang et al., Proc. Natl. Acad. Sci. USA, 87:4144-4148, 1990.

Yang, Comput. Appl. Biosci., 13:555-556, 1997.

Yang, J. Mol. Evol., 39:105-111, 1994.

Yee et al., Proc. Natl. Acad. Sci. USA, 91:9564-9568, 1994.

Yelton et al., J. Immunol., 155(4):1994-2004, 1995.

Zeng et al., Biochemistry, 35(40):13157-13164, 1996.

Zhao et al., Nat. Biotechnol., 16:258-261, 1998.

Claims

1. A method of identifying a multivalent vaccine epitope comprising:

(a) providing a data set comprising a plurality of viral envelope protein (VEP) nucleic acid or protein sequences;

(b) subjecting said data set to simulated evolution to obtain an ancestral VEP sequence;

(c) subjecting said ancestral VEP sequence to saturated mutagenesis to obtain a VEP antigen library;

(d) constructing a pseudotyped viral particle library, wherein said particles express members of said VEP antigen library;

(e) neutralizing said pseudotyped viral particle library with antiserum raised against wild-type VEP sequence;

(f) infecting cells with the neutralized viral particle library and selecting variants;

(g) selecting survival variants; and

(h) obtaining sequence information on said survival variants, wherein epitopes common to said survival variants are pluripotent vaccine epitopes.

2. The method of claim 1, wherein step (c) comprises codon optimization.

3. The method of claim 1, wherein said VEP are from hepatitis C virus, human immunodeficiency virus, dengue virus, influenza virus, ebola virus, coronavirus, or hantavirus.

4. The method of claim 1, wherein said plurality of VEP sequences comprises at least 5, at least 10, at least 15, at least 20 or at least 25 sequences.

5. The method of claim 1, wherein said simulated evolution comprises reconstruction of a phylogenetic tree, rooting of the phylogenetic tree, and inference of said ancestral VEP sequence.

6. The method of claim 5, wherein reconstruction of the phylogenetic tree comprises a phylogenetic inference method.

7. The method of claim 6, wherein said phylogenetic inference method comprises a maximum likelihood method.

8. The method of claim 5, wherein rooting of the phylogenetic tree comprises use of a molecular clock hypothesis or outgroup criterion.

9. The method of claim 1, further comprising selecting a nucleotide or amino acid substitution model that fits the data set.

10. The method of claim 1, further comprising identifying domains in said ancestral sequence for saturated mutagenesis.

11. The method of claim 1, wherein constructing a pseudotyped viral particle library comprises PCR assembly.

12. The method of claim 1, wherein said pseudotyped viral particles are retroviral particles.

13. The method of claim 1, wherein said antiserum is human antiserum.

14. The method of claim 1, wherein said antiserum is from a non-human animal.

15. The method of claim 1, wherein selecting survival variants comprises fluorescence activated cell sorting of infected cells.