MXPA01008415A - Compositions and methods for non-targeted activation of endogenous genes - Google Patents

Compositions and methods for non-targeted activation of endogenous genes

Info

Publication number
MXPA01008415A
Authority
MX
Mexico
Prior art keywords
vector
cell
gene
further characterized
promoter
Prior art date
Application number
MXPA/A/2001/008415A
Other languages
Spanish (es)
Inventor
John J Harrington
Bruce Sherf
Stephen Rundlett
Original Assignee
Athersys Inc
Priority date
Filing date
Publication date
Application filed by Athersys Inc

Abstract

The present invention is directed generally to activating gene expression or causing over-expression of a gene by recombination methods in situ. The invention also is directed generally to methods for expressing an endogenous gene in a cell at levels higher than those normally found in the cell. In one embodiment of the invention, expression of an endogenous gene is activated or increased following integration into the cell, by non-homologous or illegitimate recombination, of a regulatory sequence that activates expression of the gene. In another embodiment, the expression of the endogenous gene may be further increased by co-integration of one or more amplifiable markers, and selecting for increased copies of the one or more amplifiable markers located on the integrated vector. In another embodiment, the invention is directed to activation of endogenous genes by non-targeted integration of specialized activation vectors, which are provided by the invention, into the genome of a host cell. The invention also provides methods for the identification, activation, isolation, and/or expression of genes undiscoverable by current methods since no target sequence is necessary for integration. The invention also provides methods for isolation of nucleic acid molecules (particularly cDNA molecules) encoding a variety of proteins, including transmembrane proteins, and for isolation of cells expressing such transmembrane proteins which may be heterologous transmembrane proteins. The invention also is directed to isolated genes, gene products, nucleic acid molecules, to compositions comprising such genes, gene products and nucleic acid molecules, and to vectors and host cells comprising such genes and nucleic acid molecules, that may be used in a variety of therapeutic and diagnostic applications. 
Thus, by the present invention, endogenous genes, including those associated with human disease and development, may be activated and isolated without prior knowledge of the sequence, structure, function, or expression profile of the genes.

Description

COMPOSITIONS AND METHODS FOR NON-TARGETED ACTIVATION OF ENDOGENOUS GENES
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of U.S. Application No. of John J. Harrington, Bruce Sherf, and Stephen Rundlett, entitled "Compositions and Methods for Non-Targeted Activation of Endogenous Genes", filed March 8, 1999, which is a continuation-in-part of U.S. Application No. 09/253,002, filed February 19, 1999, which is a continuation-in-part of U.S. Application No. 09/159,643, filed September 24, 1998, which is a continuation-in-part of U.S. Application No. 08/941,223, filed September 26, 1997, the disclosures of all of which are hereby incorporated by reference in their entirety.
BACKGROUND OF THE INVENTION FIELD OF THE INVENTION The present invention is in the fields of molecular biology and cell biology. The invention is generally directed to the activation of the expression of a gene or to causing overexpression of a gene by in situ recombination methods. More specifically, the invention is directed to the activation of endogenous genes by the non-targeted integration of specialized activation vectors, which are provided by the invention, within the genome of a host cell. The invention is also directed to methods for the identification, activation, and isolation of genes that until now had not been discovered, and to host cells and vectors comprising said isolated genes. The invention is also directed to isolated genes, gene products, nucleic acid molecules, and compositions comprising said genes, gene products, and nucleic acid molecules, which can be used in a variety of therapeutic and diagnostic applications. Therefore, by the present invention, endogenous genes, including those associated with disease and human development, can be identified, activated, and isolated without prior knowledge of the sequence, structure, function, or expression profile of the genes.
BACKGROUND ART
The identification and overexpression of novel genes associated with human diseases is an important step towards the development of new therapeutic drugs. Current approaches to generating cell libraries for protein overexpression are based on the production and cloning of cDNA. Therefore, in order to identify a new gene using this approach, the gene must be expressed in the cells that were used to generate the library. The gene must also be expressed at sufficient levels to be adequately represented in the library. This is problematic, since many genes are expressed only in very small amounts, in a rare population of cells, or during short periods of development. In addition, due to the large size of some messenger RNAs, it is difficult or impossible to produce full-length cDNA molecules capable of expressing the biologically active protein. A lack of full-length cDNA molecules has also been observed for smaller messenger RNAs and is thought to be related to sequences in the message that are difficult to produce by reverse transcription or that are unstable during propagation in bacteria. As a result, even the most complete cDNA libraries express only a fraction of the entire set of possible genes. Finally, many cDNA libraries are produced in bacterial vectors. The use of these vectors to express biologically active mammalian proteins is severely limited, since most mammalian proteins do not fold correctly and/or are inadequately glycosylated in bacteria. Accordingly, a method for generating a more representative library for protein expression, which can facilitate the reliable expression of biologically active proteins, would be extremely valuable. Suitable methods for overexpressing proteins include cloning the gene of interest and placing it, in a construct, together with a suitable promoter/enhancer, a polyadenylation signal, and a splice site, and introducing the construct into a suitable host cell.
An alternative approach involves the use of homologous recombination to activate gene expression by targeting a strong promoter or other regulatory sequence to a previously identified gene. WO90/14092 describes the in situ modification, in mammalian cells, of genes that encode proteins of interest. This application describes single-stranded oligonucleotides for the site-directed modification of genes encoding proteins of interest. A marker can also be included. However, the methods are limited to providing an oligonucleotide sequence substantially homologous to a target site. Therefore, the method requires knowledge of the site required for activation by site-directed modification and homologous recombination. Novel genes cannot be discovered by such methods. WO91/06667 describes methods for expressing a mammalian gene in situ. With this method, an amplifiable gene is introduced near a selected gene by homologous recombination. When the cell is then cultured in the appropriate medium, both the amplifiable gene and the selected gene are amplified and expression of the selected gene is increased. As in the previous case, the methods for introducing the amplifiable gene are limited to homologous recombination, and are not useful for activating novel genes whose sequence (or existence) is unknown. WO91/01140 describes the inactivation of endogenous genes by modifying cells by homologous recombination.
With these methods, homologous recombination is used to modify and deactivate genes and to produce cells that can be useful as donors in gene therapy. WO92 / 20808 describes methods for modifying target genomic sites in situ. The modifications are described as very reduced, for example, changing a single base in the DNA. The method is based on genomic modification using a homologous DNA for targeting. WO92 / 19255 describes a method for improving the expression of a target gene, which is achieved by homologous recombination, in which a DNA sequence or a larger genomic fragment is integrated into the genome. This modified sequence can then be transferred to a secondary host for expression. An amplifiable gene can be integrated near the selected gene so that the selected region can be amplified to obtain improved expression. Homologous recombination is necessary for this targeting approach. WO93 / 09222 describes methods for making proteins by activating an endogenous gene that encodes a desired product. A regulatory region is targeted by homologous recombination and the region normally associated with the gene whose expression is desired is replaced or deactivated. This deactivation or replacement causes the gene to be expressed at higher than normal levels. WO94 / 12650 describes a method for activating the expression of an endogenous gene and amplifying it in situ in a cell, whose gene is not expressed per se or is not expressed at desired levels in the cell. The cell is transfected with exogenous DNA sequences that repair, alter, delete or replace a sequence present in the cell, or that are regulatory sequences that are not normally linked functionally to the endogenous gene in the cell. To do the foregoing, DNA sequences homologous to the genomic DNA sequences are used to a site previously selected to direct the endogenous gene. In addition, an amplifiable DNA encoding a selection marker can be included. 
By culturing the homologously recombinant cells under conditions that select for amplification, both the endogenous gene and the amplifiable marker are amplified together and the expression of the gene is increased. WO95/31560 discloses DNA constructs for homologous recombination. The constructs include a targeting sequence, a regulatory sequence, an exon, and an unpaired splice donor site. Targeting is achieved by homologous recombination of the construct with genomic sequences in the cell and allows the production of a protein in vitro or in vivo. WO96/29411 describes methods using an exogenous regulatory sequence, an exogenous exon, either coding or non-coding, and a splice donor site introduced at a preselected site in the genome by homologous recombination. In this application, the introduced DNA is positioned such that the transcripts under control of the exogenous regulatory region include both the exogenous exon and the endogenous exons present in the thrombopoietin, DNase I, or β-interferon genes, resulting in transcripts in which the exogenous and endogenous exons are operably linked. The novel transcription units are produced by homologous recombination. U.S. Patent No. 5,272,071 describes the transcriptional activation of transcriptionally silent genes in a cell by inserting a DNA regulatory element that can promote the expression of a gene normally expressed in that cell. The regulatory element is inserted so that it is operably linked to the normally silent gene. The insertion is achieved through homologous recombination, by generating a DNA construct with a segment of the normally silent gene (the target DNA) and the DNA regulatory element used to induce the desired transcription. U.S. Patent No. 5,578,461 describes the expression of selected mammalian genes by homologous recombination.
A DNA sequence is integrated into the genome or into a large genomic fragment to increase the expression of the target gene. The modified construct can then be transferred to a secondary host cell. An amplifiable gene can be integrated adjacent to the selected gene such that the selected region is amplified to achieve improved expression. Both of the foregoing approaches (the construction of an overexpression construct by cloning, or homologous recombination in vivo) require that the gene be cloned and sequenced before it can be overexpressed. Additionally, when homologous recombination is used, the genomic sequence and structure must also be known. Unfortunately, many genes have not yet been identified and/or sequenced. Therefore, a method for overexpressing a gene of interest would be useful, whether or not the gene has been previously cloned, and whether or not its sequence and structure are known.
BRIEF DESCRIPTION OF THE INVENTION
Therefore, the invention relates generally to methods for overexpressing an endogenous gene in a cell, comprising introducing a vector containing a transcriptional regulatory sequence into the cell, allowing the vector to integrate into the cell genome by non-homologous recombination, and allowing overexpression of the endogenous gene in the cell. The method does not require prior knowledge of the sequence of the endogenous gene or even of the existence of the gene. Therefore, the invention is directed to non-targeted gene activation, which, as the term is used herein, means the activation of endogenous genes by non-targeted or non-homologous (as opposed to targeted or homologous) integration of specialized activation vectors into the genome of a host cell. The invention also encompasses novel vector constructs for activating gene expression or overexpressing a gene through non-homologous recombination. The novel construct lacks homologous targeting sequences. That is, it does not contain nucleotide sequences that target the host cell DNA and promote homologous recombination at the target site, causing overexpression of a cellular gene by the introduced transcriptional regulatory sequence. The novel vector constructs include a vector that contains a transcriptional regulatory sequence operably linked to an unpaired splice donor sequence and that further contains one or more amplifiable markers.
The novel vector constructs include constructs with a transcriptional regulatory sequence operably linked to a translation start codon, a secretion signal sequence, and an unpaired splice donor site; constructs with a transcriptional regulatory sequence operably linked to a translation start codon, an epitope tag, and an unpaired splice donor site; constructs with a transcriptional regulatory sequence operably linked to a translation start codon, a signal sequence, an epitope tag, and an unpaired splice donor site; and constructs with a transcriptional regulatory sequence operably linked to a translation start codon, a secretion signal sequence, an epitope tag, a sequence-specific protease site, and an unpaired splice donor site. The vector construct may contain one or more selection markers for the selection of the recombinant host cell. Alternatively, selection can be effected by phenotypic selection for a trait provided by the activated endogenous gene product.
These vectors, and indeed any of the vectors described herein, as well as the variants of the vectors that those skilled in the art will readily recognize, can be used in any of the methods described herein to form any of the compositions that can be produced through these methods. The transcriptional regulatory sequence used in the vector constructs of the invention includes, but is not limited to, a promoter. In preferred embodiments, the promoter is a viral promoter. In highly preferred embodiments, the viral promoter is the cytomegalovirus immediate early promoter. In alternative embodiments, the promoter is a cellular, non-viral or inducible promoter. The transcriptional regulatory sequence used in the vector constructs of the invention may also include, but is not limited to, an enhancer. In preferred embodiments, the enhancer is a viral enhancer. In highly preferred embodiments, the viral enhancer is the cytomegalovirus immediate early enhancer. In alternative embodiments, the enhancer is a cellular non-viral enhancer. In preferred embodiments of the methods described herein, the vector construct is, or may contain, linear RNA or DNA. The cell containing the vector can be selected for the expression of the gene. The cell that overexpresses the gene can be cultured in vitro under conditions that favor the production, by the cell, of desired amounts of the gene product (also referred to interchangeably herein as "the expression product") of the endogenous gene that has been activated or whose expression has been increased. The expression product can then be isolated and purified for use, for example, in protein therapy or drug discovery. Alternatively, the cell expressing the desired gene product may be allowed to express the gene product in vivo.
In certain mentioned aspects of the invention, the cell containing a vector construct of the invention integrated into its genome can be introduced into a eukaryote (such as a vertebrate, particularly a mammal and most particularly a human) under conditions that favor overexpression or activation of the gene by the cell in vivo in the eukaryote. In those related aspects of the invention, the cell can be isolated and cloned before being introduced into the eukaryote. The invention is also directed to methods for overexpressing an endogenous gene in a cell, comprising the introduction of a vector containing a transcriptional regulatory sequence and one or more amplifiable markers in the cell, allowing the vector to integrate into the genome of the cell by non-homologous recombination and allowing overexpression of the endogenous gene in the cell. The cell containing the vector can be selected for overexpression of the gene.
The cell that overexpresses the gene is cultured in such a way that amplification of the endogenous gene is obtained. The cell can then be cultured in vitro to produce desired quantities of the gene product of the amplified endogenous gene that has been activated or whose expression has been increased. The gene product can be isolated and purified. Alternatively, after amplification, the cell may be allowed to express the endogenous gene and produce desired quantities of the gene product in vivo. It should be understood, however, that any vector used in the methods described herein may include one or more amplifiable markers. Accordingly, amplification of both the vector and the DNA of interest (i.e., the DNA containing the overexpressed gene) occurs in the cell, and further increased expression of the endogenous gene is obtained. In accordance with the foregoing, the methods may include a step in which the endogenous gene is amplified. The invention also relates to methods for overexpressing an endogenous gene in a cell, comprising introducing a vector containing a transcriptional regulatory sequence and an unpaired splice donor sequence into the cell, allowing the vector to integrate into the cell genome by non-homologous recombination, and allowing overexpression of the endogenous gene in the cell. The cell containing the vector can be selected for expression of the gene.
The cell that overexpresses the gene can be cultured in vitro to produce desired amounts of the gene product of the endogenous gene whose expression has been activated or increased. The gene product can then be isolated and purified. Alternatively, the cell may be allowed to express the desired gene product in vivo. The vector construct may consist essentially of the transcriptional regulatory sequence. The vector construct may consist essentially of the transcriptional regulatory sequence and one or more amplifiable markers. The vector construct may consist essentially of the transcriptional regulatory sequence and the splice donor sequence. Any of the vector constructs of the invention can also include a secretion signal sequence. The secretion signal sequence is placed in the construct in such a way that it will be operably linked to the activated endogenous protein. Therefore, the protein of interest is secreted from the cell, and the purification of that protein is facilitated. In accordance with the above, the methods may include a step in which the protein expression product is secreted from the cell. The invention further relates to cells generated by any of the above methods. The invention also relates to cells that contain the vector constructs, cells in which the vector constructs have been integrated into the cell genome, and cells that overexpress the desired gene products from an endogenous gene, said overexpression being driven by the introduced transcriptional regulatory sequence. The cells can be isolated and cloned. The methods can be carried out in any cell of eukaryotic origin, such as fungal, plant or animal. In preferred embodiments, the methods of the invention can be carried out in vertebrate cells, and particularly in mammalian cells, including but not limited to cells from rats, mice, cattle, swine, sheep, goats and humans, and very particularly in human cells.
A single cell generated by the methods described above can overexpress a single gene or more than one gene. More than one gene in a cell can be activated by integrating a single type of construction into multiple sites in the genome. Similarly, more than one gene in a cell can be activated by integrating multiple constructs (ie, more than one type of construction) at multiple sites in the genome. Therefore, a cell can contain only one type of vector construct or different types of constructions, each of which can activate an endogenous gene. The invention also relates to methods for generating the cells that were described above by one or more of the following steps: introducing one or more of the vector constructs of the invention into a cell; allow the introduced construct (s) to be integrated into the cell genome by non-homologous recombination; allow overexpression of one or more endogenous genes in the cell; as well as isolate and clone the cell. The invention also relates to cells produced by said methods, which may be isolated cells. The invention also encompasses methods for using the cells described above to overexpress a gene, such as an endogenous cellular gene, which has been characterized (eg, sequenced), or uncharacterized, (e.g., a gene whose function is known but which it has not been cloned or sequenced), or a gene whose existence, prior to overexpression, was unknown. The cells can be used to produce desired amounts of an expression product in vitro or in vivo. If desired, this expression product can be isolated and purified, for example, by cell lysis or by isolation of the growth medium (as in the case when the vector contains a secretion signal sequence). The invention also encompasses cell libraries generated by the methods described above. A library can encompass all clones of a single transfection experiment. The subgroup may overexpress the same gene or more than one gene, for example, a class of genes. 
The transfection can be carried out with a single construction or with more than one construction. A library can also be formed by combining all of the recombinant cells from two or more transfection experiments, combining one or more subsets of cells from a single transfection experiment or by combining cell subgroups from transfection experiments performed separately. The resulting library can express the same gene, or more than one gene, for example, a class of genes. Again, in each of the individual transfections, a single construction or more than one construction can be used. The libraries can be formed from the same cell type or from different cell types. The invention also relates to methods for forming libraries by selecting various subgroups of cells from the same or different transfection experiments. The invention further relates to methods for using the cells or libraries of cells described above, for overexpressing or activating endogenous genes, or for obtaining the gene expression products of said overexpressed or activated genes. According to this aspect of the invention, the cell or library can be selected for the expression of the gene, just as the cells expressing the desired gene product can be selected. Then the cell can be used to isolate or purify the gene product for subsequent use. Expression in the cell can occur by culturing the cell in vitro, under conditions that favor the production of the endogenous gene expression product by the cell, or by allowing the cell to express the gene in vivo. In preferred embodiments of the invention, the methods include a method wherein the expression product is isolated or purified. In highly preferred embodiments, cells expressing the endogenous gene product are cultured under conditions that favor the production of sufficient quantities of the gene product for commercial purposes, and especially for diagnostic uses, drug discovery and therapeutic uses. 
Any of the methods may further comprise the introduction of double-strand breaks in the genomic DNA in the cell before or simultaneously with the integration of the vector. The invention is also directed to vector constructs that are useful for activating the expression of endogenous genes and for isolating the mRNA and cDNA corresponding to the activated genes. In one such embodiment, the vector construct may comprise (a) a first transcriptional regulatory sequence operably linked to a first unpaired splice donor sequence; (b) a second transcriptional regulatory sequence operably linked to a second unpaired splice donor sequence; and (c) a linearization site, which can be located between the first and second transcriptional regulatory sequences. In accordance with the invention, when the vector construct is transformed into a host cell and then integrated into the genome of the host cell, the first transcriptional regulatory sequence is preferably in an inverted orientation relative to the orientation of the second transcriptional regulatory sequence. In certain preferred embodiments, the vector can be linearized by cleaving the linearization site. In another embodiment, the invention provides a linear vector construct having a 3' end and a 5' end, comprising a transcriptional regulatory sequence operably linked to an unpaired splice donor site, where the transcriptional regulatory sequence is oriented in the linear vector construct in an orientation that directs transcription towards the 3' end or towards the 5' end of the linear vector construct. In another embodiment, the invention provides a vector construct comprising, in sequential order, (a) a transcriptional regulatory sequence, (b) an unpaired splice donor site, (c) a rare-cutting restriction site, and a linearization site.
In another embodiment, the invention provides a vector construct comprising (a) a first transcriptional regulatory sequence operably linked to a selection marker lacking a polyadenylation signal; and (b) a second transcriptional regulatory sequence operably linked to an exon-splice donor site complex, where the first transcriptional regulatory sequence is in the same orientation in the vector construct as the second transcriptional regulatory sequence, and where the first transcriptional regulatory sequence is located upstream (towards the 5' end) of the second transcriptional regulatory sequence in the vector construct. In further embodiments, the invention provides vector constructs comprising a transcriptional regulatory sequence operably linked to a selection marker that lacks a polyadenylation signal, and further comprising an unpaired splice donor site. In another embodiment, the invention provides vector constructs comprising a first transcriptional regulatory sequence operably linked to a selection marker lacking a polyadenylation signal, and further comprising a second transcriptional regulatory sequence operably linked to an unpaired splice donor site. According to the invention, the transcriptional regulatory sequence (or the first or second transcriptional regulatory sequence, in vector constructs having more than one transcriptional regulatory sequence) can be a promoter, an enhancer, or a repressor, and preferably is a promoter, including an animal cell promoter, a plant cell promoter, or a fungal cell promoter, more preferably a promoter selected from the group consisting of a CMV immediate early gene promoter, an SV40 T antigen promoter, and a β-actin promoter. Other promoters of animal, plant, or fungal cell origin can also be used according to the invention; they are known in the art and will be familiar to one skilled in the art in view of the teachings set forth herein.
The selection marker used in the vector constructs of the invention can be any marker or marker gene which, after integration of a vector containing the selection marker into the genome of a host cell, allows the selection of a cell that contains or expresses the marker gene. Suitable selection markers include, but are not limited to, a neomycin gene, a hypoxanthine phosphoribosyltransferase gene, a puromycin gene, a dihydroorotase gene, a glutamine synthetase gene, a histidine D gene, a carbamyl phosphate synthase gene, a dihydrofolate reductase gene, a multidrug resistance 1 gene, an aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyltransferase gene, an adenosine deaminase gene, and a thymidine kinase gene. In related embodiments, the invention provides vector constructs comprising a positive selection marker, a negative selection marker, and an unpaired splice donor site, wherein the positive and negative selection markers and the splice donor site are oriented in the vector construct such that, when the vector construct is integrated into the genome of a eukaryotic host cell and activates an endogenous gene in the genome, the positive selection marker is expressed in an active form and the negative selection marker is either not expressed or expressed in an inactive form. In certain preferred embodiments, either the positive selection marker, the negative selection marker, or both, may lack a polyadenylation signal.
The positive selection marker used in this aspect of the invention can be any selection marker that, upon expression, produces a protein capable of facilitating the isolation of cells expressing the marker, including but not limited to a neomycin gene, a hypoxanthine phosphoribosyltransferase gene, a puromycin gene, a dihydroorotase gene, a glutamine synthetase gene, a histidine D gene, a carbamyl phosphate synthase gene, a dihydrofolate reductase gene, a multidrug resistance 1 gene, an aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyltransferase gene, or an adenosine deaminase gene. Similarly, the negative selection marker used in these aspects of the invention can be any selection marker which, upon expression, produces a protein capable of facilitating the removal of the cells expressing the marker, including but not limited to a hypoxanthine phosphoribosyltransferase gene, a thymidine kinase gene, or a diphtheria toxin gene. The invention is also directed to eukaryotic host cells, which may be isolated host cells, comprising one or more of the vector constructs of the invention. Preferred eukaryotic host cells include, but are not limited to, animal cells (including, but not limited to, mammalian (particularly human) cells, insect cells, bird cells, annelid cells, amphibian cells, reptile cells, and fish cells), plant cells, and fungal cells (particularly yeast). In certain such host cells, the vector construct can be integrated into the genome of the host cell. The invention is also directed to primer molecules comprising a PCR-amplifiable sequence and a degenerate 3' end.
The primer molecules according to this aspect of the invention preferably have the general structure 5'-(dT)a-X-Nb-TTTATT-3', where a is an integer from 1 to 100 (preferably from 10 to 30), X is a PCR-amplifiable sequence consisting of a nucleic acid sequence of about 10-20 nucleotides in length, N is any nucleotide, and b is an integer from 0 to 6. A preferred primer has the nucleotide sequence 5'-TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT-3' (SEQ ID NO: 10). In related embodiments, the primer molecules according to this aspect of the invention may be biotinylated. The invention is also directed to methods for first-strand cDNA synthesis comprising (a) annealing a first primer of the invention (such as the primer described above) to an RNA template molecule to form a first primer-RNA complex, and (b) treating this first primer-RNA complex with reverse transcriptase and one or more deoxynucleoside triphosphate molecules under conditions favoring reverse transcription of the first primer-RNA complex to synthesize a first strand of cDNA.
The invention is also directed to methods for isolating activated genes, particularly from a host cell genome. These methods exploit the structure of the mRNA molecules produced using the non-targeted gene activation vectors of the invention. One such method comprises, for example, (a) introducing a vector construct comprising a transcriptional regulatory sequence and an unpaired splice donor site into a host cell (preferably one of the eukaryotic host cells described above), (b) allowing the vector construct to integrate into the genome of the host cell by non-homologous recombination, under conditions such that the vector activates an endogenous gene comprising an exon in the genome, (c) isolating RNA from the host cell, (d) synthesizing first-strand cDNA according to the method of the invention described above, (e) annealing a second primer specific for the vector-encoded exon to the first strand of cDNA to create a second primer-first strand cDNA complex, and (f) contacting the second primer-first strand cDNA complex with a DNA polymerase under conditions favoring the production of a second strand of cDNA substantially complementary to the first strand of cDNA. Methods according to this aspect of the invention may comprise one or more additional steps, such as treating the second strand of cDNA with a restriction enzyme that cleaves a restriction site located on the vector downstream of the unpaired splice donor site, or amplifying the second strand of cDNA using a third primer specific for the vector-encoded exon and a fourth primer specific for the first primer. The invention is also directed to isolated genes produced according to these methods and to vectors (which may be expression vectors) and host cells comprising these isolated genes. The invention is also directed to methods for producing a polypeptide, comprising culturing a host cell comprising the isolated gene (or a vector, particularly an expression vector, comprising the isolated gene) under conditions favoring expression by the host cell of a polypeptide encoded by the isolated gene.
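As an illustration of the primer structure 5'-(dT)a-X-Nb-TTTATT-3' given above, the following minimal Python sketch assembles and parses primers of that form. The function names are hypothetical, and the sketch assumes that X is exactly 10-20 unambiguous bases (the text says "about 10-20") and that a leading run of T residues belongs to the (dT)a segment.

```python
import re

# General structure from the text: 5'-(dT)a-X-Nb-TTTATT-3'
# Assumption: X is restricted to 10-20 bases drawn from A, C, G, T.
_X = re.compile(r"[ACGT]{10,20}")
_PRIMER = re.compile(r"(T{1,100})([ACGT]{10,20})(N{0,6})TTTATT")


def build_primer(a: int, x: str, b: int) -> str:
    """Assemble a primer from a (dT run length), X, and b (number of N positions)."""
    if not 1 <= a <= 100:
        raise ValueError("a must be between 1 and 100")
    if not 0 <= b <= 6:
        raise ValueError("b must be between 0 and 6")
    if _X.fullmatch(x) is None:
        raise ValueError("X must be 10-20 bases drawn from A, C, G, T")
    return "T" * a + x + "N" * b + "TTTATT"


def parse_primer(seq: str):
    """Split a primer into its parts, or return None if it does not fit the structure."""
    m = _PRIMER.fullmatch(seq.upper())
    if m is None:
        return None
    return {"a": len(m.group(1)), "X": m.group(2), "b": len(m.group(3))}


# The preferred primer from the text corresponds to a=12, X=CGTCAGCGGCCGCATC, b=4.
preferred = build_primer(12, "CGTCAGCGGCCGCATC", 4)
print(preferred)
print(parse_primer(preferred))
```

Because the (dT)a run is matched greedily, an X segment that itself begins with T cannot be distinguished from a longer dT run by sequence alone; the sketch resolves that ambiguity in favor of the dT run.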
The invention also provides additional methods for producing a polypeptide, comprising introducing into a host cell a vector comprising a transcriptional regulatory sequence operably linked to an exonic region followed by an unpaired splice donor site, and culturing the host cell under conditions favoring expression by the host cell of a polypeptide encoded by the exonic region, wherein the exon contains a translation start site located at any of the open reading frame positions relative to the 5'-most base of the unpaired splice donor site (for example, the "A" of the ATG start codon may be at position -3 or at an increment of three bases upstream thereof (e.g., -6, -9, -12, -15, -18, etc.), at position -2 or at an increment of three bases upstream thereof (e.g., -5, -8, -11, -14, -17, -20, etc.), or at position -1 or at an increment of three bases upstream thereof (e.g., -4, -7, -10, -13, -16, -19, etc.), relative to the 5'-most base of the splice donor site). In related embodiments, the methods of the invention may further comprise isolating the polypeptide. The invention is also directed to polypeptides, which may or may not be isolated polypeptides, produced according to these methods. Other preferred embodiments of the present invention will be apparent to those skilled in the art in light of the following drawings and description of the invention, and of the claims.
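The three families of start codon positions listed above can be reproduced with simple modular arithmetic. The following Python sketch is illustrative only: the function names and the numbering of the frame classes (0, 1, 2) are assumptions, not terms used in the specification.

```python
def start_codon_frame(position: int) -> int:
    """Frame class (0, 1, or 2) for the 'A' of an ATG at a negative position
    counted from the 5'-most base of the splice donor site."""
    if position >= 0:
        raise ValueError("position must be negative (upstream of the splice donor site)")
    return (-position) % 3


def positions_in_frame(frame: int, count: int = 6) -> list:
    """First `count` positions in a frame class, nearest the splice donor first."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    first = -(frame if frame else 3)
    return [first - 3 * i for i in range(count)]


for frame in (0, 2, 1):
    print(frame, positions_in_frame(frame))
```

Positions -3, -6, -9, ... fall in one class, -2, -5, -8, ... in a second, and -1, -4, -7, ... in the third, so three vectors, one per class, cover all three reading frames of a downstream endogenous exon.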
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1. Schematic diagram of the gene activation events described herein. The activation construct is transfected into cells and allowed to integrate into the host cell chromosomes at DNA breaks. If a break occurs upstream of a gene of interest (e.g., Epo) and the appropriate activation construct integrates at the break such that its regulatory sequence becomes operably linked to the gene of interest, activation of the gene will occur. Transcription and splicing produce a chimeric RNA molecule containing exonic sequences from the activation construct and from the endogenous gene. Subsequent translation results in production of the protein of interest. After isolation of the recombinant cell, gene expression can be further increased by gene amplification.
Figure 2. Schematic diagram of non-translated activation constructs. Arrows denote promoter sequences. Exonic sequences are shown as open boxes and the splice donor sequence is indicated by S/D. The construct numbers corresponding to the description below are shown on the left. Selection and amplification markers are not shown.
Figure 3. Schematic diagram of translated activation constructs. Arrows denote promoter sequences. Exonic sequences are shown as open boxes and the splice donor sequence is indicated by S/D. The translated, signal peptide, epitope tag, and protease cleavage sequences are shown in the legend below the constructs. The construct numbers corresponding to the description below are shown on the left. Selection and amplification markers are not shown.
Figure 4. Schematic diagram of an activation construct that can activate endogenous genes.
Figures 5A-5D. Nucleotide sequence of pRIG8R1-CD2 (SEQ ID NO: 7).
Figures 6A-6C. Nucleotide sequence of pRIG8R2-CD2 (SEQ ID NO: 8).
Figures 7A-7C. Nucleotide sequence of pRIG8R3-CD2 (SEQ ID NO: 9).
Figures 8A-8F. Examples of poly(A) trap vectors. Each vector is illustrated schematically in its linearized form. Each horizontal line represents a DNA molecule. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of a promoter. Untranslated regions are designated by shaded boxes and open reading frames by open boxes. The following designations are used: splice donor site (S/D), secretion signal sequence (SP), epitope tag (ET), neomycin resistance gene (Neo). In the vectors depicted in Figures 8B-8E, the splice donor site immediately downstream of the Neo gene may be omitted. In vectors lacking a splice donor site between the Neo gene and the downstream promoter, the Neo transcript will use the splice donor site located 3' of the downstream promoter. In addition, as shown in the vectors depicted in Figures 8B-8E, a downstream promoter can drive expression of an exon. It is recognized that this exon, when present, can encode codons in any reading frame. Using multiple vectors, codons can be created in each of the three possible reading frames.
Figures 9A-9F. Examples of splice acceptor trap vectors containing a positive selection marker and a negative selection marker driven from a single promoter. Each vector is illustrated schematically in its linearized form. Each horizontal line represents a DNA molecule. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of a promoter. Untranslated regions are designated by shaded boxes. Poly(A) signals are not present in these examples. As described in the specification, however, poly(A) signals may be located on the vector 3' of either or both selection markers. The following designations are used: splice donor site (S/D), secretion signal sequence (SP), epitope tag (ET), internal ribosome entry site (ires), hypoxanthine phosphoribosyltransferase (HPRT), and neomycin resistance gene (Neo). In these examples, Neo represents the positive selection marker and HPRT represents the negative selection marker. In the vectors shown in Figures 9C and 9F, the region designated exon contains a translation start codon. As described in the detailed description, the exon can encode a methionine residue, a partial signal sequence, a complete secretion signal sequence, a portion of a protein, or an epitope tag. In addition, the codons may be present in any reading frame relative to the splice donor site. In other examples of vectors not shown, the region designated exon lacks a translation start codon.
Figures 10A-10F. Examples of splice acceptor trap vectors containing a positive selection marker and a negative selection marker driven from different promoters. Each vector is illustrated schematically in its linearized form. Each horizontal line represents a DNA molecule. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of a promoter. Untranslated regions are designated by shaded boxes. Poly(A) signals are not present in these examples. As described in the specification, however, poly(A) signals may be located on the vector 3' of either or both selection markers. The following designations are used: splice donor site (S/D), internal ribosome entry site (ires), hypoxanthine phosphoribosyltransferase (HPRT), and neomycin resistance gene (Neo). In the vectors shown in Figures 10A-10F, Neo represents the positive selection marker and HPRT represents the negative selection marker. As shown, the vectors depicted in Figures 10A-10F do not contain a splice donor site located 3' of the Neo gene to facilitate splicing of the positive selection marker to an endogenous exon. In the vectors shown in Figures 10C and 10F, the region designated exon contains a translation start codon. As described in the detailed description, the exon can encode a methionine residue, a partial signal sequence, a complete secretion signal sequence, a portion of a protein, or an epitope tag. In addition, the codons may be present in any reading frame relative to the splice donor site. In other examples of vectors not shown, the region designated exon lacks a translation start codon.
Figures 11A-11C. Schematic diagram of bidirectional activation vectors. Arrows denote promoter sequences. Exons are shown as checkered boxes and splice donor sites are indicated by S/D. Shaded boxes indicate exon sequences operably linked to the upstream promoter. It is understood that the exons of the vectors may be untranslated, or may contain a start codon and additional codons as described herein. As illustrated in Figures 11B-11C, the vectors may contain a selection marker. In these vectors, the neomycin resistance gene (Neo) is illustrated. In Figure 11B, a polyadenylation signal (pA) is located downstream of the selection marker. In Figure 11C, polyadenylation signals are omitted from the vector.
Figures 12A-12G. Examples of vectors useful for recovering exon I from activated endogenous genes. Each vector is illustrated schematically in its linearized form. Each horizontal line represents a DNA molecule. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of a promoter. Untranslated regions are designated by shaded boxes. Poly(A) signals are not present in the vectors depicted. As discussed in the detailed description, however, poly(A) signals may be located on the vector 3' of either or both selection markers. The following designations are used: splice donor site (S/D), internal ribosome entry site (ires), hypoxanthine phosphoribosyltransferase (HPRT), and neomycin resistance gene (Neo). In these examples, Neo represents the positive selection marker and HPRT represents the negative selection marker. It is also recognized that in these examples, the region designated exon, when present, lacks a translation start codon. In other examples not shown, the region designated exon contains a translation start codon. When the vector exon contains a translation start codon, the exon can encode a methionine residue, a partial signal sequence, a complete secretion signal sequence, a portion of a protein, or an epitope tag. In addition, the codons may be present in any of the reading frames relative to the splice donor site.
Figure 13. Illustration depicting two transcripts produced from the integrated vectors shown in Figures 12A-12G. DNA strands are depicted as horizontal lines. Vector DNA is shown as a black line. Endogenous genomic DNA is shown as a gray line. Rectangles depict exons. Vector-encoded exons are shown as open rectangles, whereas endogenous exons are shown as shaded rectangles. S/D denotes a splice donor site. Following integration, the vector-encoded promoters activate transcription of the endogenous gene. Transcription from the upstream promoter produces a spliced RNA molecule containing the vector-encoded exon joined to the second and subsequent exons of an endogenous gene. Transcription from the downstream promoter, on the other hand, produces a transcript containing the sequences 3' of the integrated vector joined to exon I and the subsequent exons of an endogenous gene.
Figures 14A-14B. Nucleotide sequence of pRIG1 (SEQ ID NO: 18).
Figures 15A-15B. Nucleotide sequence of pRIG21b (SEQ ID NO: 19).
Figures 16A-16B. Nucleotide sequence of pRIG22b (SEQ ID NO: 20).
Figures 17A-17G. Examples of poly(A) trap vectors. Each vector is illustrated schematically in its linearized form. Each horizontal line represents a DNA molecule. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of a promoter. Boxes indicate exons. Shaded boxes indicate untranslated regions. The following designations are used: splice donor site (S/D), secretion signal sequence (SP), epitope tag (ET), neomycin resistance gene (Neo), vector promoter #1 (VP #1), and vector promoter #2 (VP #2). As shown in the vectors depicted in Figures 17C-17G, a promoter operably linked to an exon and an unpaired splice donor site can be located upstream of the selection marker. It is recognized that this exon, when present, can encode codons, including a start codon, in any reading frame relative to the splice donor site. To activate protein expression from genes with different reading frames, three separate vectors can be used, each with a start codon in a different reading frame relative to the splice donor site.
Figure 18. Illustration of the transcripts produced by the vector of Figure 17C after integration into a host cell genome upstream of a multi-exon endogenous gene. Each horizontal line represents a DNA molecule. The vertical lines crossing the DNA strand mark the upstream and downstream vector/cellular genome boundaries. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of a promoter. Boxes indicate exons. Shaded boxes indicate untranslated regions. Endogenous exons are numbered using Roman numerals. The following designations are used: splice donor site (S/D), neomycin resistance gene (Neo), vector promoter #1 (VP #1), vector promoter #2 (VP #2), endogenous promoter (EP), and polyadenylation signal (pA). Following integration, vector promoter #1 expresses a chimeric transcript containing the Neo gene linked to the genomic sequences downstream of the integration site, including the spliced exons of the endogenous gene. Since transcript #1 contains a poly(A) signal from the endogenous gene, the Neo gene product will be efficiently produced, thereby conferring drug resistance on the cell. In addition to transcript #1, the integrated vector will generate a second transcript, designated transcript #2, which originates from vector promoter #2. The structure of transcript #2 facilitates efficient translation of the protein encoded by the endogenous gene. As exemplified in Figure 17, vectors containing alternative coding information in the vector-encoded exon can be used to produce different chimeric proteins containing, for example, signal sequences and/or epitope tags.
Figure 19. Example of a dual positive selection marker vector. The vector is illustrated schematically in its linearized form. The horizontal line represents a DNA molecule. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of a promoter. Boxes indicate exons. Shaded boxes indicate untranslated regions. Poly(A) signals are not present in this example. The following designations are used: splice donor site (S/D), hygromycin resistance gene (Hyg), neomycin resistance gene (Neo), vector promoter #1 (VP #1), and vector promoter #2 (VP #2).
Figures 20A-20B. Examples of transcripts produced by a dual positive selection marker vector integrated into a host cell genome adjacent to an endogenous gene. Figure 20A illustrates the transcripts produced upon integration of the vector near a multi-exon gene. Figure 20B illustrates the transcripts produced upon integration of the vector near a single-exon gene. Each horizontal line represents a DNA molecule. The vertical lines crossing the DNA strand mark the upstream and downstream vector/cellular genome boundaries. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of each promoter. Boxes indicate exons. Shaded boxes indicate untranslated regions. Endogenous exons are numbered using Roman numerals. The following designations are used: splice donor site (S/D), hygromycin resistance gene (Hyg), neomycin resistance gene (Neo), vector promoter #1 (VP #1), vector promoter #2 (VP #2), endogenous promoter (EP), and polyadenylation signal (pA). Following integration, vector promoter #1 expresses a chimeric transcript containing the Hyg gene linked to the genomic sequences downstream of the integration site, including the spliced exons of the endogenous gene. Since transcript #1 contains a poly(A) signal from the endogenous gene, the Hyg gene product will be efficiently produced, thereby conferring drug resistance on the cell. In addition to transcript #1, the integrated vector will generate a second transcript, designated transcript #2, which originates from vector promoter #2. In Figure 20A, the Neo gene is removed from transcript #2 by splicing from the vector-encoded splice donor site to the first endogenous splice acceptor site located downstream of the vector integration site (i.e., exon II in this example). Since multi-exon genes contain splice acceptor sites at the 5' end of each exon (except exon I), the Neo gene will be removed from transcript #2 in cells in which the vector has integrated near, and transcriptionally activated, a multi-exon gene. As a result, cells that have activated multi-exon genes can be eliminated by selection with G418 and hygromycin. In Figure 20B, the Neo gene is not removed from transcript #2 by splicing, since single-exon genes do not contain any splice acceptor sequence. Therefore, cells containing a vector integrated near a single-exon gene will survive double selection with G418 and hygromycin. These cells can be used to efficiently isolate activated single-exon genes using the methods described herein.
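The selection logic described for Figures 20A-20B can be summarized in a small toy model. The Python sketch below is an illustration under simplifying assumptions (the vector has integrated near, and activated, one endogenous gene; splicing and poly(A) trapping are fully efficient); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Integration:
    """A vector integration that has activated a nearby endogenous gene (toy model)."""
    endogenous_exons: int  # number of exons in the activated gene

    def __post_init__(self):
        if self.endogenous_exons < 1:
            raise ValueError("a gene has at least one exon")

    @property
    def has_downstream_acceptor(self) -> bool:
        # Exon I has no splice acceptor; exon II and later exons do.
        return self.endogenous_exons > 1


def expressed_markers(site: Integration) -> set:
    """Markers expected to be expressed from the integrated dual positive vector."""
    markers = {"Hyg"}  # transcript #1 gains a poly(A) signal from the endogenous gene
    if not site.has_downstream_acceptor:
        markers.add("Neo")  # no acceptor to splice to, so Neo stays on transcript #2
    return markers


def survives_g418_and_hygromycin(site: Integration) -> bool:
    return {"Hyg", "Neo"} <= expressed_markers(site)


print(survives_g418_and_hygromycin(Integration(endogenous_exons=1)))
print(survives_g418_and_hygromycin(Integration(endogenous_exons=4)))
```

In this model only integrations near single-exon genes retain both resistance markers, which is why double selection enriches for activated single-exon genes.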
Figures 21A-21B. Examples of dual trap vectors containing a positive selection marker and a negative selection marker. Each vector is illustrated schematically in its linearized form. Each horizontal line represents a DNA molecule. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of a promoter. Boxes indicate exons. Shaded boxes indicate untranslated regions. The following designations are used: splice donor site (S/D), hypoxanthine phosphoribosyltransferase (HPRT), neomycin resistance gene (Neo), vector promoter #1 (VP #1), vector promoter #2 (VP #2), and vector promoter #3 (VP #3). In the vectors shown in Figures 21A-21B, Neo represents the positive selection marker and HPRT represents the negative selection marker. In Figure 21B, a third promoter is located 3' of the selection markers. This promoter is operably linked to an exon and an unpaired splice donor site. The region designated exon contains a translation start codon in this example. As described herein, the exon can encode a methionine residue, a partial signal sequence, a complete secretion signal sequence, a portion of a protein, or an epitope tag. In addition, the codons may be present in any reading frame relative to the splice donor site. In other examples of vectors not shown, the region designated exon lacks a translation start codon.
Figure 22. Examples of transcripts produced by a dual positive/negative selection marker vector integrated into the genome of a host cell upstream of a multi-exon endogenous gene. Each horizontal line represents a DNA molecule. The vertical lines crossing the DNA strand mark the upstream and downstream vector/cellular genome boundaries. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of each promoter. Boxes indicate exons. Shaded boxes indicate untranslated regions. Endogenous exons are numbered using Roman numerals. The following designations are used: splice donor site (S/D), neomycin resistance gene (Neo), vector promoter #1 (VP #1), vector promoter #2 (VP #2), vector promoter #3 (VP #3), polyadenylation signal (pA), and endogenous promoter (EP). Following integration, vector promoter #1 expresses a chimeric transcript containing the Neo gene linked to the genomic sequences downstream of the integration site, including the spliced exons of the endogenous gene. Since transcript #1 contains a poly(A) signal from the endogenous gene, the Neo gene product will be efficiently produced, thereby conferring drug resistance on the cell. In addition to transcript #1, the integrated vector will generate a second transcript, designated transcript #2, which originates from vector promoter #2. In this example, the vector has integrated upstream of a multi-exon gene. Since multi-exon genes contain splice acceptor sites at the 5' end of each exon (except exon I), the HPRT gene will be removed from transcript #2 in cells in which the vector has integrated near, and transcriptionally activated, a multi-exon gene. As a result, cells containing activated multi-exon genes can be isolated by selecting with G418 and 8-azaguanine or 6-thioguanine (AgThg); that is, cells containing a vector integrated near a multi-exon gene will survive double selection with G418 and AgThg. These cells can be used to efficiently isolate the activated multi-exon genes using the methods described herein. In addition to transcripts #1 and #2, a third transcript, designated transcript #3, is produced from the integrated vector. Transcript #3 originates from vector promoter #3, which is linked to an exonic sequence suitable for directing protein expression from the endogenous gene. This occurs following splicing from the splice donor site downstream of promoter #3 to the first downstream splice acceptor site of the endogenous gene. In addition to directing protein expression, transcript #3, and/or transcripts #1 and/or #2, can be isolated for gene discovery purposes using the methods described herein.
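The positive/negative selection outcome for Figure 22 follows from the fact that HPRT is spliced out of transcript #2 only when a downstream splice acceptor exists. The Python sketch below is a toy model with hypothetical function names; it assumes fully efficient splicing and poly(A) trapping, and a host cell with no HPRT activity of its own, so that only vector-encoded HPRT renders the cell sensitive to AgThg.

```python
def marker_expression(endogenous_exons: int) -> dict:
    """Expected marker expression after the vector activates a gene with the given exon count."""
    if endogenous_exons < 1:
        raise ValueError("a gene has at least one exon")
    # Exons II onward carry a splice acceptor; a single-exon gene has none.
    hprt_spliced_out = endogenous_exons > 1
    return {
        "Neo": True,                   # transcript #1 traps the endogenous poly(A) signal
        "HPRT": not hprt_spliced_out,  # retained on transcript #2 only without an acceptor
    }


def survives_g418_and_agthg(endogenous_exons: int) -> bool:
    """G418 requires Neo expression; AgThg kills cells that express HPRT."""
    expr = marker_expression(endogenous_exons)
    return expr["Neo"] and not expr["HPRT"]


for exons in (1, 2, 5):
    print(exons, survives_g418_and_agthg(exons))
```

Under these assumptions, integrations near single-exon genes keep HPRT and are removed by AgThg, while integrations that activate multi-exon genes survive both drugs.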
Figures 23A-23D. Examples of multi-promoter/activation exon vectors. Each vector is illustrated schematically in its linearized form. Each horizontal line represents a DNA molecule. Arrows denote promoter sequences. Boxes indicate exons. Shaded boxes indicate untranslated regions. It is understood that the exons of these vectors may be untranslated, or may contain a start codon and additional codons as described herein. The following designations are used: splice donor site (S/D), vector promoter #1 (VP #1), vector promoter #2 (VP #2), vector promoter #3 (VP #3), and vector promoter #4 (VP #4). The individual vector activation exons are designated A, B, C, and D. Each activation exon may have a different structure. The structure of each activation exon and its flanking intron is shown below. It will be understood, however, that any of the activation exons described herein can be used in these vectors, in any combination and/or order, including exons encoding signal sequences, partial signal sequences, epitope tags, proteins, portions of proteins, and protein motifs. Any of the exons may lack a start codon. In addition, although not illustrated in these examples, these vectors may contain a selection marker and/or an amplifiable marker. The selection marker may contain a poly(A) signal or a splice donor site. When present, the splice donor site may be located upstream or downstream of the selection marker. Alternatively, the selection marker may not be operably linked to a poly(A) signal and/or a splice donor site.
Figure 24. Examples of the transcripts produced from a multi-promoter/activation exon vector after integration into the genome of a host cell upstream of an endogenous gene. Each horizontal line represents a DNA molecule. The vertical lines crossing the DNA strand mark the upstream and downstream vector/cellular genome boundaries. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of each promoter. Boxes indicate exons. Shaded boxes indicate untranslated regions. Endogenous exons are numbered using Roman numerals. The following designations are used: splice donor site (S/D), vector promoter #1 (VP #1), vector promoter #2 (VP #2), vector promoter #3 (VP #3), vector promoter #4 (VP #4), endogenous promoter (EP), and polyadenylation signal (pA). The individual vector activation exons are designated A, B, C, and D. Following integration, each vector-encoded promoter is capable of producing a different transcript. Each transcript contains a different activation exon joined to the first downstream splice acceptor site of an endogenous gene (exon II in this example). The individual activation exons are designated (A), (B), (C), and (D). The endogenous exons are designated (I), (II), (III), and (IV). Generally, the coding sequences and/or reading frames, if present, differ among the activation exons. Although four activation exons are illustrated in this example, any number of activation exons may be present on the integrated vector.
Figures 25A-25D. Examples of activation vectors useful for detecting protein-protein interactions. Each vector is illustrated schematically in its linearized form. Each horizontal line represents a DNA molecule. Arrows denote promoter sequences. Boxes denote exons. Shaded boxes indicate untranslated regions. The following designations are used: splice donor site (S/D), neomycin resistance gene (Neo). It is also recognized that the DNA binding domain and the activation domain can be encoded in any reading frame (relative to the splice donor site), allowing activation of endogenous genes with different reading frames.
Figure 26. Schematic illustration of a method for detecting protein-protein interactions using the vectors shown in Figure 25. Each horizontal line represents a DNA molecule. The vertical lines crossing the DNA strand mark the upstream and downstream vector/cellular genome boundaries. Arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. Transcribed regions include all sequences located downstream of each promoter. Boxes indicate exons. Shaded boxes indicate untranslated regions. Endogenous exons are numbered using Roman numerals. The following designations are used: splice donor site (S/D), binding domain (BD), activation domain (AD), recognition sequence (RS), and polyadenylation signal (pA). The binding domain vector is shown integrated into the genome of a host cell upstream of an endogenous gene, designated gene A. The activation domain vector is shown integrated into the genome of the same host cell upstream of an endogenous gene, designated gene B. Both vectors are integrated into the genome of the same host cell. Following integration, each vector is capable of producing a fusion protein containing the binding domain (or activation domain, as the case may be) and the protein encoded by the downstream endogenous gene. If the binding domain fusion protein interacts with the activation domain fusion protein, a protein complex will be formed. This complex is capable of increasing the expression of a reporter gene present in the cell.
Figure 27. Examples of activation vectors useful for in vitro and in vivo transposition. Each vector is illustrated schematically in its linearized form. Each horizontal line represents a DNA molecule. Arrows denote promoter sequences. Boxes denote exons. Shaded boxes indicate untranslated regions. Solid boxes indicate the transposon signals. It is recognized that the transposon signals have a directionality, and that the signals are oriented in the configuration appropriate for the type of transposition reaction (integration, inversion, or deletion). The following designations are used: splice donor site (S/D), neomycin resistance gene (Neo), dihydrofolate reductase (DHFR), puromycin resistance gene (Puro), poly(A) signal (pA), and Epstein-Barr virus origin of replication (ori P). It is also recognized that the activation exon can encode amino acids in any reading frame (relative to the splice donor site), allowing activation of endogenous genes with different reading frames.
Figure 28. Schematic illustration of the integration of an activation vector into a cloned genomic DNA fragment by in vitro transposition. Each horizontal line represents a DNA molecule. The cloned genomic DNA is in a BAC vector. The single line represents the genomic DNA and the rectangle depicts the BAC vector sequences. The arrows denote promoter sequences located on the DNA molecule and point in the direction of transcription. The transcribed regions include all sequences located downstream (3') of each promoter. The activation vector exon is depicted as an open box. The exons of a gene encoded in the cloned genomic fragment are depicted as shaded boxes. The solid boxes indicate the transposon signals. It is recognized that the transposon signals have directionality, and that the signals are oriented in the proper configuration for the type of transposition reaction (integration, inversion, or deletion). The following designations are used: splice donor site (S/D) and polyadenylation signal (pA). To integrate the vector into the genomic fragment, the activation vector is incubated with the cloned genomic DNA in the presence of transposase. Following integration of the activation vector into the genomic fragment, the plasmid can be transfected directly into a suitable eukaryotic host cell to express the gene located downstream of the vector's integration site. Alternatively, the BAC plasmid can be transformed into E. coli to produce larger amounts of plasmid for transfection into the appropriate eukaryotic host cell.
Figures 29A-29B. Nucleotide sequence of pRIG14.
Figures 30A-30C. Nucleotide sequence of pRIG19.
Figures 31A-31C. Nucleotide sequence of pRIG20.
Figures 32A-32C. Nucleotide sequence of pRIGadl.
Figures 33A-33D. Nucleotide sequence of pRIGdbl.
Figures 34A-34B. Nucleotide sequence of pUniBAC.
Figures 35A-35B. Nucleotide sequence of pRIG22.
Figure 36. Schematic diagram of pRIG-TP.
The vector is shown in its linearized form. The horizontal line represents a DNA molecule. The arrows denote promoters. The open boxes indicate exons. The filled boxes represent transposon recombination signals (from Tn5, compatible with the in vitro transposition kit available from Epicenter Technologies). The following designations are used: splice donor site (S/D), puromycin resistance gene (puro), dihydrofolate reductase gene (DHFR), Epstein-Barr nuclear antigen 1 replication protein (EBNA-1), Epstein-Barr virus origin of replication (ori P), poly(A) signal (pA), and activation exon (AE). It is understood that the activation exon can contain any sequence capable of directing protein synthesis, including a translation start codon in any reading frame, a partial secretion signal sequence, a complete secretion signal sequence, an epitope tag, a protein, a portion of a protein, or a protein motif. The activation exon may also lack a translation start codon.
Figures 37A-37C. Nucleotide sequence of pRIG-T.
DETAILED DESCRIPTION OF THE INVENTION
Gene activation through non-homologous recombination has major advantages over other gene activation procedures. Unlike previous methods of protein overexpression, the methods described herein do not require that the gene of interest be cloned (isolated from the cell). Nor do they require any knowledge of the DNA sequence or structure of the gene to be overexpressed (i.e., the sequence of the ORF (open reading frame), introns, exons, or upstream and downstream regulatory elements) or knowledge of gene expression patterns (i.e., tissue specificity, developmental regulation, etc.). Moreover, the methods do not require any knowledge of the genomic organization of the gene of interest (i.e., the intron and exon structure). The methods of the present invention therefore involve vector constructs that do not contain targeting nucleotide sequences for homologous recombination. A targeting sequence allows homologous recombination of vector DNA with cellular DNA at a predetermined site in the cellular DNA, the site having homology to sequences in the vector; homologous recombination at the predetermined site results in the introduction of the transcriptional regulatory sequence into the genome and the subsequent activation of the endogenous gene. The method of the present invention does not involve integration of the vector at predetermined sites. Instead, the methods of the present invention involve integration of the vector constructs of the invention into cellular DNA (e.g., the cell genome) by non-homologous or "illegitimate" recombination, also called "non-targeted gene activation." In related embodiments, the present invention also relates to non-targeted gene activation. Non-targeted gene activation has further important applications.
First, by activating genes that are not normally expressed in a given cell type, it becomes possible to isolate a cDNA copy of genes independently of their normal expression pattern. This facilitates the isolation of genes that are normally expressed in rare cells, during short periods of development, and at very low levels. Second, by translationally activating genes, it is possible to produce protein expression libraries without the need to clone the full-length cDNA. These libraries can be screened for novel enzymes and proteins and/or for interesting phenotypes that result from the overexpression of an endogenous gene. Third, cell lines that overexpress a specific protein can be created and used to produce commercial quantities of protein. Thus, activating endogenous genes provides a powerful method to discover and isolate novel genes and proteins, and to produce large amounts of specific proteins for commercialization. The vectors described herein do not contain targeting sequences. A targeting sequence is a sequence in the vector that has homology to a sequence or sequences within the gene to be activated or upstream of the gene to be activated, the upstream region extending up to and including the first functional splice acceptor site on the same coding strand of the gene of interest, and by means of which homology the transcriptional regulatory sequence that activates the gene of interest is integrated into the genome of the cell containing the gene to be activated. In the case of an enhancer integration vector that activates an endogenous gene, the vector contains no homology to any sequence in the genome upstream or downstream of the gene of interest (or within the gene of interest) over the distance within which enhancer function is operative.
Therefore, the present methods are able to identify new genes that have been missed or may be missed using conventional and currently available cloning techniques. By using the constructs and methodology described herein, one can easily identify unknown and/or uncharacterized genes and overexpress them to produce proteins. The proteins have uses such as, among others, human therapeutics and diagnostics and as targets for drug discovery. The described methods are also capable of producing overexpression of known and/or characterized genes for the production of proteins in vitro or in vivo. The term "known" gene refers to the level of characterization of a gene. The invention allows the expression of genes that have been characterized, as well as the expression of genes that have not been characterized. Different levels of characterization are possible. These include detailed characterization, such as cloning, DNA, RNA, and/or protein sequencing, and relating the regulation and function of the gene to the cloned sequence (e.g., recognition of promoter and enhancer sequences, functions of the open reading frames, introns, and the like). The characterization may be less detailed, such as having mapped a gene and its related function, or having a partial amino acid or nucleotide sequence, or having purified a protein and investigated a function. The characterization may be minimal, as when a nucleotide or amino acid sequence is known or a protein has been isolated but the function is unknown. Alternatively, a function may be known but the associated protein or nucleotide sequence is not known, or is known but has not been correlated with the function. Finally, there may be no characterization, when neither the existence nor the function of the gene is known. The invention allows the expression of any gene at any of these or other specific degrees of characterization.
Many different proteins (also referred to herein interchangeably as "gene products" or "expression products") can be activated or overexpressed by a single activation construct and in a single set of transfections. Therefore, a single cell or different cells in a set of transfectants (library) can overexpress more than one protein after transfection with the same construct or with different constructs. Previous activation methods require a unique construct to be generated for each gene to be activated. In addition, many different integration sites adjacent to a single gene can be generated and tested simultaneously using a single construct. This allows rapid determination of the optimal genomic position of the activation construct for expression of the protein. Using previous methods, the 5' end of the gene of interest had to be characterized extensively with respect to sequence and structure. For each activation construct to be produced, an appropriate targeting sequence had to be isolated. Generally, this had to be an isogenic sequence isolated from the same person or laboratory animal strain as the cells to be activated. In some cases, this DNA can be 50 kb or more from the gene of interest. Therefore, the production of each targeting construct required an arduous amount of cloning and sequencing of the endogenous gene. However, since sequence and structure information is not required for the methods of the present invention, unknown genes and genes with uncharacterized upstream regions can be activated. This is made possible by in situ gene activation using non-homologous recombination of exogenous DNA sequences with cellular DNA. The methods and compositions (for example, vector constructs) required to achieve such in situ gene activation using non-homologous recombination are provided by the present invention.
DNA molecules can recombine to redistribute their genetic content by several distinct mechanisms, including homologous recombination, site-specific recombination, and non-homologous/illegitimate recombination. Homologous recombination involves recombination between stretches of DNA that are very similar in sequence. It has been shown that homologous recombination involves pairing between the homologous sequences along their length prior to the redistribution of the genetic material. The exact site of crossover can be any point in the homologous segments. The efficiency of recombination is proportional to the length of the homologous targeting sequence (Hope, Development 113: 399 (1991); Reddy et al., J. Virol. 65? 507 (1991)), the degree of sequence identity between the two recombining sequences (von Melchner et al., Genes Dev. 6: 919 (1992)), and the ratio of homologous to non-homologous DNA present in the construct (Letson, Genetics 777: 759 (1987)). Site-specific recombination, on the other hand, involves the exchange of genetic material at a predetermined site, designated by specific DNA sequences. In this reaction, a recombinase protein binds to the recombination signal sequences, creates a strand break, and facilitates the exchange of DNA strands. Cre/lox recombination is an example of site-specific recombination. Non-homologous/illegitimate recombination, such as that advantageously used by the methods of the present invention, involves the joining (exchange or redistribution) of genetic material that does not share significant sequence homology and does not occur at site-specific recombination sequences. Examples of non-homologous recombination include the integration of exogenous DNA into chromosomes at non-homologous sites, chromosomal translocations and deletions, DNA end-joining, repair of double-strand chromosomal breaks, bridge-breakage-fusion, and concatemerization of transfected sequences.
In most cases, non-homologous recombination is thought to occur through the joining of "free DNA ends." Free ends are DNA molecules that contain an end that can be joined to a second DNA end, either directly or after repair or processing. The DNA end may consist of a 5' overhang, a 3' overhang, or a blunt end. As used herein, retroviral insertion and other transposition reactions are loosely considered forms of non-homologous recombination. These reactions do not involve the use of homology between the recombining molecules. Moreover, unlike site-specific recombination, these types of recombination reactions do not occur between discrete sites. Instead, a specific protein/DNA complex is required on only one of the recombination partners (i.e., the retrovirus or transposon), with the second DNA partner (i.e., the cell genome) generally being relatively nonspecific. As a result, these "vectors" do not integrate into the cell genome in a targeted manner, and can therefore be used to deliver the activation construct according to the present invention. Vector constructs useful for the methods described herein ideally contain a transcriptional regulatory sequence that undergoes non-homologous recombination with genomic sequences in a cell to overexpress an endogenous gene in that cell.
The vector constructs of the invention also lack homologous targeting sequences. That is, they do not contain DNA sequences that target the DNA of the host cell and promote homologous recombination at the targeted site. Therefore, integration of the vector constructs of the present invention into the cell genome occurs by non-homologous recombination and can lead to overexpression of a cellular gene through the introduced transcriptional regulatory sequence contained in the integrated vector construct. The invention generally relates to methods for overexpressing an endogenous gene in a cell, comprising introducing a vector containing a transcriptional regulatory sequence into the cell, allowing the vector to integrate into the genome of the cell by non-homologous recombination, and thereby allowing overexpression of the endogenous gene in the cell. The method does not require prior knowledge of the sequence of the endogenous gene or even of the existence of the gene. However, when the sequence of the gene to be activated is known, the constructs can be engineered to contain the appropriate configuration of vector elements (e.g., position of the start codon, addition of codons present in the first exon of the endogenous gene, and the appropriate reading frame) to achieve maximum overexpression and/or the proper protein sequence. In certain embodiments of the invention, the cell containing the vector can be screened for expression of the gene. The cell that overexpresses the gene can be cultured in vitro under conditions that favor production, by the cell, of desired amounts of the gene product of the endogenous gene that has been activated or whose expression has been increased. If desired, the gene product can be isolated or purified for use, for example, in protein therapy or drug discovery.
Alternatively, the cell expressing the desired gene product may be allowed to express the gene product in vivo. The vector construct may consist essentially of the transcriptional regulatory sequence. Alternatively, the vector construct may consist essentially of the transcriptional regulatory sequence and one or more amplifiable markers. Accordingly, the invention also relates to methods for overexpressing an endogenous gene in a cell, comprising introducing a vector containing a transcriptional regulatory sequence and an amplifiable marker into the cell, allowing the vector to integrate into the genome of the cell by non-homologous recombination, and allowing overexpression of the endogenous gene in the cell. The cell containing the vector is screened for overexpression of the gene. The cell that overexpresses the gene is cultured in such a way that amplification of the endogenous gene is obtained. The cell can then be cultured in vitro to produce the desired amounts of the gene product of the amplified endogenous gene that has been activated or whose expression has been increased. The gene product can then be isolated and purified.
Alternatively, after amplification, the cell can be allowed to express the endogenous gene and produce desired amounts of the gene product in vivo. The vector construct can consist essentially of the transcriptional regulatory sequence and a splice donor sequence. Therefore, the invention also encompasses methods for overexpressing an endogenous gene in a cell, comprising introducing a vector containing a transcriptional regulatory sequence and an unpaired splice donor sequence into the cell, allowing the vector to integrate into the genome of the cell by non-homologous recombination, and allowing overexpression of the endogenous gene in the cell. The cell containing the vector is screened for expression of the gene. The cell that overexpresses the gene can be cultured in vitro to produce desired amounts of the gene product of the endogenous gene whose expression has been activated or increased. The gene product can then be isolated and purified. Alternatively, the cell can be allowed to express the desired gene product in vivo. The vector construct may consist essentially of a transcriptional regulatory sequence operably linked to an unpaired splice donor sequence and further containing an amplifiable marker.
Other activation vectors include constructs with a transcriptional regulatory sequence and an exonic sequence containing a translation start codon; constructs with a transcriptional regulatory sequence and an exonic sequence containing a translation start codon and a secretion signal sequence; constructs with a transcriptional regulatory sequence and an exonic sequence containing a translation start codon and an epitope tag; constructs containing a transcriptional regulatory sequence and an exonic sequence containing a translation start codon, a signal sequence, and an epitope tag; and constructs containing a transcriptional regulatory sequence and an exonic sequence with a translation start codon, a secretion signal sequence, an epitope tag, and a sequence-specific protease site. In each of the above constructs, the exon in the construct is located immediately upstream of the unpaired splice donor site. The constructs may also contain a regulatory sequence, a selectable marker lacking a poly(A) signal, an internal ribosome entry site ("ires"), and an unpaired splice donor site (Figure 4). Optionally, a start codon, a secretion signal sequence, an epitope tag, and/or a protease cleavage site may be included between the ires and the unpaired splice donor sequence. When this construct integrates upstream of a gene, the selectable marker will be expressed efficiently, since the poly(A) site will be provided by the endogenous gene. In addition, the downstream gene will also be expressed, because the ires will allow protein translation to initiate at the downstream open reading frame (i.e., the endogenous gene). Therefore, the message produced by this activation construct will be polycistronic.
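The selection logic of the construct carrying a selectable marker without its own poly(A) signal can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Integration` class, its two fields, and the `marker_expressed` function are hypothetical names introduced here, and a real integration event is not reduced to two yes/no properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Integration:
    """One hypothetical integration event of the activation construct."""
    upstream_of_gene: bool   # construct landed upstream (5') of an endogenous gene
    same_orientation: bool   # construct is transcribed in the same direction as the gene


def marker_expressed(event: Integration) -> bool:
    # Assumption of this sketch: the marker is expressed efficiently only
    # when a downstream endogenous gene, in the same orientation, supplies
    # the poly(A) signal that the marker itself lacks.
    return event.upstream_of_gene and event.same_orientation


# Enumerate the four combinations of the two properties.
events = [
    Integration(True, True),
    Integration(True, False),
    Integration(False, True),
    Integration(False, False),
]

# Only events that supply a poly(A) signal would yield a drug-resistant colony.
resistant = [e for e in events if marker_expressed(e)]
print(resistant)
```

In this simplified model, only the event that is both upstream of a gene and correctly oriented passes selection, which is the behavior the text attributes to this construct.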
The advantage of this construct is that integration events that do not occur near genes and in the proper orientation will not produce a drug-resistant colony. The reason is that without a poly(A) signal (supplied by the endogenous gene), the neomycin resistance gene will not be expressed efficiently. By reducing the number of non-productive integration events, the complexity of the library can be reduced without affecting its coverage (the number of activated genes), and this will facilitate the screening procedure. In another embodiment of this construct, cre-lox recombination sequences may be included between the regulatory sequence and the neo start codon and between the ires and the unpaired splice donor site (between the ires and the start codon, if present). After isolation of cells that have activated the gene of interest, the neo gene and the ires can be removed by transfecting the cells with a plasmid encoding cre recombinase. This would eliminate production of the polycistronic message and allow the endogenous gene to be expressed directly from the regulatory sequence in the integrated activation construct. The use of cre recombination to facilitate the deletion of genetic elements from mammalian chromosomes has been described (Gu et al., Science 265: 103 (1994); Sauer, Meth. Enzymology 225: 890-900 (1993)). Therefore, constructs useful in the methods described herein include, but are not limited to, the following (see also Figures 1-4):
1) Construct with a regulatory sequence and an exon lacking a translation start codon.
2) Construct with a regulatory sequence and an exon lacking a translation start codon, followed by a splice donor site.
3) Construct with a regulatory sequence and an exon containing a translation start codon in reading frame 1 (relative to the splice donor site), followed by an unpaired splice donor site.
4) Construct with a regulatory sequence and an exon containing a translation start codon in reading frame 2 (relative to the splice donor site), followed by an unpaired splice donor site.
5) Construct with a regulatory sequence and an exon containing a translation start codon in reading frame 3 (relative to the splice donor site), followed by an unpaired splice donor site.
6) Construct with a regulatory sequence and an exon containing a translation start codon and a secretion signal sequence in reading frame 1 (relative to the splice donor site), followed by an unpaired splice donor site.
7) Construct with a regulatory sequence and an exon containing a translation start codon and a secretion signal sequence in reading frame 2 (relative to the splice donor site), followed by an unpaired splice donor site.
8) Construct with a regulatory sequence and an exon containing a translation start codon and a secretion signal sequence in reading frame 3 (relative to the splice donor site), followed by an unpaired splice donor site.
9) Construct with a regulatory sequence and an exon containing (from 5' to 3') a translation start codon and an epitope tag in reading frame 1 (relative to the splice donor site), followed by an unpaired splice donor site.
10) Construct with a regulatory sequence and an exon containing (from 5' to 3') a translation start codon and an epitope tag in reading frame 2 (relative to the splice donor site), followed by an unpaired splice donor site.
11) Construct with a regulatory sequence and an exon containing (from 5' to 3') a translation start codon and an epitope tag in reading frame 3 (relative to the splice donor site), followed by an unpaired splice donor site.
12) Construct with a regulatory sequence and an exon containing (from 5' to 3') a translation start codon, a secretion signal sequence, and an epitope tag in reading frame 1 (relative to the splice donor site), followed by an unpaired splice donor site.
13) Construct with a regulatory sequence and an exon containing (from 5' to 3') a translation start codon, a secretion signal sequence, and an epitope tag in reading frame 2 (relative to the splice donor site), followed by an unpaired splice donor site.
14) Construct with a regulatory sequence and an exon containing (from 5' to 3') a translation start codon, a secretion signal sequence, and an epitope tag in reading frame 3 (relative to the splice donor site), followed by an unpaired splice donor site.
15) Construct with a regulatory sequence and an exon containing (from 5' to 3') a translation start codon, a secretion signal sequence, an epitope tag, and a sequence-specific protease site in reading frame 1 (relative to the splice donor site), followed by an unpaired splice donor site.
16) Construct with a regulatory sequence and an exon containing (from 5' to 3') a translation start codon, a secretion signal sequence, an epitope tag, and a sequence-specific protease site in reading frame 2 (relative to the splice donor site), followed by an unpaired splice donor site.
17) Construct with a regulatory sequence and an exon containing (from 5' to 3') a translation start codon, a secretion signal sequence, an epitope tag, and a sequence-specific protease site in reading frame 3 (relative to the splice donor site), followed by an unpaired splice donor site.
18) Construct with a regulatory sequence linked to a selectable marker, followed by an internal ribosome entry site and an unpaired splice donor site.
19) Construct 18 in which a cre/lox recombination signal is located a) between the regulatory sequence and the open reading frame of the selectable marker and b) between the internal ribosome entry site (ires) and the unpaired splice donor site.
20) Construct with a regulatory sequence operably linked to an exon containing green fluorescent protein lacking a stop codon, followed by an unpaired splice donor site.
It should be understood, however, that any vector used in the methods described herein may include one or more (i.e., one, two, three, four, five, or more, and most preferably one or two) amplifiable markers. Accordingly, the methods may include a step in which the endogenous gene is amplified.
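The reason constructs 3 through 17 are each listed in three reading frames can be illustrated with a short Python sketch. It assumes a simplified model in which the coding part of the activation exon is spliced directly onto an endogenous exon, and the fusion stays in frame only if the combined length before the first complete endogenous codon is a multiple of three. The sequences, the `in_frame` helper, and the mapping of the labels "frame 1" to "frame 3" onto specific lengths are hypothetical and chosen only for illustration.

```python
def in_frame(exon_coding: str, downstream_offset: int) -> bool:
    """Return True if the activation exon keeps the endogenous exon in frame.

    exon_coding: coding part of the activation exon, from the start codon
        up to the splice donor junction.
    downstream_offset: number of nucleotides at the start of the endogenous
        exon that belong to a codon split by the exon boundary (0, 1, or 2).
    """
    return (len(exon_coding) + downstream_offset) % 3 == 0


# Hypothetical activation exon: a start codon plus one further codon.
base = "ATGGCT"

# Three variants differing by 0, 1, or 2 extra nucleotides before the
# splice donor, standing in for the three reading-frame constructs.
variants = {f"frame {i + 1}": base + "G" * i for i in range(3)}

# For each possible phase of the endogenous exon, find the matching variant(s).
compatible = {
    offset: [name for name, seq in variants.items() if in_frame(seq, offset)]
    for offset in range(3)
}

for offset, names in compatible.items():
    print(f"endogenous exon offset {offset}: in frame with {names}")
```

Under these assumptions, every possible phase of the endogenous exon is matched by exactly one of the three variants, which is why a set of three frame variants is sufficient to cover genes with different reading frames.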
The placement of one or more amplifiable markers in the activation construct results in the juxtaposition of the gene of interest and the one or more amplifiable markers in the activated cell. Once the activated cell has been isolated, expression can be further increased by selecting for cells that contain an increased copy number of the locus containing both the gene of interest and the activation construct. This can be achieved by selection methods known in the art, for example by culturing cells in selective culture media containing one or more selection agents that are specific for the one or more amplifiable markers contained in the genetic construct or vector. After activation of an endogenous gene by non-homologous integration of any of the vectors described above, the expression of the endogenous gene can be further increased by selecting for increased copies of the amplifiable marker(s) located on the integrated vector. Although such an approach can be carried out using one amplifiable marker on the integrated vector, in an alternative embodiment of the invention, methods are provided wherein two or more (i.e., two, three, four, five, or more, and most preferably two) amplifiable markers are included in the vector to facilitate more efficient selection of cells that have amplified the vector and the flanking gene of interest. This approach is particularly useful in cells that have a functional endogenous copy of one or more of the amplifiable marker(s) contained in the vector, since the selection procedure can result in the isolation of cells that have incorrectly amplified the endogenous amplifiable marker(s) instead of the vector-encoded amplifiable marker(s). This approach is also useful for selecting against cells that develop resistance to the selective agent by mechanisms that do not involve gene amplification.
The approach using two or more amplifiable markers is advantageous in these situations because the probability that a cell will develop resistance to two or more selective agents (resistance encoded by two or more amplifiable markers) without amplifying the integrated vector and the flanking gene of interest is significantly lower than the probability that the cell will develop resistance to any individual selective agent. Therefore, when selecting for two or more vector-encoded amplifiable markers, either simultaneously or sequentially, a higher percentage of the cells ultimately isolated will contain the amplified vector and gene of interest. Thus, in another embodiment, the vectors of the invention may contain two or more (i.e., two, three, four, five, or more, and most preferably two) amplifiable markers. This approach allows more efficient amplification of the vector sequences and the adjacent gene of interest after activation of expression. Examples of amplifiable markers that can be used to construct the present vectors include, but are not limited to, dihydrofolate reductase, adenosine deaminase, aspartate transcarbamylase, dihydro-orotase, and carbamyl phosphate synthase. It is also understood that any of the constructs described herein may contain a eukaryotic viral origin of replication, either in place of, or in conjunction with, an amplifiable marker. The presence of the viral origin of replication allows the integrated vector and the adjacent endogenous gene to be isolated as an episome and/or amplified to a high copy number upon introduction of the appropriate viral replication protein. Examples of useful viral origins include, but are not limited to, SV40 ori and EBV ori P. The invention also encompasses embodiments in which the constructs described herein consist essentially of the components specifically described for those constructs.
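The probability argument for using two or more amplifiable markers can be made concrete with a small calculation. The Python sketch below assumes that resistance to each selective agent by a mechanism other than vector amplification arises independently, so the joint probability is the product of the individual ones. The function name and the frequencies used are hypothetical illustrations, not measured values.

```python
import math


def escape_probability(per_agent_escape):
    """Probability that a cell resists every selective agent without
    amplifying the integrated vector, assuming independent mechanisms."""
    return math.prod(per_agent_escape)


# Hypothetical per-agent frequencies of amplification-independent resistance.
single_marker = escape_probability([1e-4])
double_marker = escape_probability([1e-4, 1e-4])

print(f"one marker:  {single_marker:.1e}")
print(f"two markers: {double_marker:.1e}")
```

Under the independence assumption, adding a second marker multiplies the chance of a false positive by the second agent's escape frequency, so a larger fraction of the surviving cells should carry the amplified vector and flanking gene.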
It is also understood that the above constructs are examples of constructs useful in the methods described herein, and that the invention encompasses functional equivalents of said constructs. It is understood that the term "vector" generally refers to the vehicle by which the nucleotide sequence is introduced into the cell. It is not intended to be limited to any specific sequence. The vector may itself be the nucleotide sequence that activates the endogenous gene or may contain the sequence that activates the endogenous gene. Therefore, the vector may simply be a linear or circular polynucleotide containing essentially only those sequences necessary for activation, or these sequences could be in a larger polynucleotide or other construct, such as a viral DNA or RNA genome, a complete virion, or another biological construct used to introduce the critical nucleotide sequences into a cell. It is also understood that the phrase "vector construct" or the term "construct" may be used interchangeably with the term "vector" herein. The vector may contain DNA sequences that exist in nature or that have been generated by genetic engineering or synthetic procedures.
After non-homologous integration into the genome of a cell, the construct can activate the expression of an endogenous gene. Expression of the endogenous gene may result in the production of full-length protein or of a truncated, biologically active form of the endogenous protein, depending on the site of integration (e.g., upstream region versus intron 2). The activated gene may be a known gene (for example, previously cloned or characterized) or an unknown gene (not previously cloned or characterized). The function of the gene can be known or unknown. Examples of proteins with known activities include, but are not limited to, cytokines, growth factors, neurotransmitters, enzymes, structural proteins, cell surface receptors, intracellular receptors, hormones, antibodies, and transcription factors. Specific examples of known proteins that can be produced by this method include, but are not limited to, erythropoietin, insulin, growth hormone, glucocerebrosidase, tissue plasminogen activator, granulocyte colony stimulating factor (G-CSF), granulocyte/macrophage colony stimulating factor (GM-CSF), macrophage colony stimulating factor (M-CSF), interferon-α, interferon-β, interferon-γ, interleukin-2, interleukin-3, interleukin-4, interleukin-6, interleukin-8, interleukin-10, interleukin-11, interleukin-12, interleukin-13, interleukin-14, TGF-β, blood coagulation factor V, blood coagulation factor VII, blood coagulation factor VIII, blood coagulation factor IX, blood coagulation factor X, TSH-β, bone growth factor-2, bone growth factor-7, tumor necrosis factor, alpha-1 antitrypsin, antithrombin III, leukemia inhibitory factor, glucagon, protein C, protein kinase C, stem cell factor, follicle stimulating hormone β, urokinase, nerve growth factors, insulin-like growth factors, insulinotropin, parathyroid hormone, lactoferrin, complement inhibitors, platelet-derived growth factor, keratinocyte growth factor, hepatocyte growth factor, endothelial cell growth factor, neurotrophin-3, thrombopoietin, chorionic gonadotropin, thrombomodulin, alpha glucosidase, epidermal growth factor, and fibroblast growth factor. The invention also allows the activation of a variety of genes encoding transmembrane proteins, and the production and isolation of such proteins, including but not limited to cell surface receptors for growth factors, hormones, neurotransmitters, and cytokines such as those described above, transmembrane ion channels, receptors for cholesterol, lipoproteins (including LDL and HDL), and other lipid moieties, integrins and other extracellular matrix receptors, cytoskeletal anchoring proteins, immunoglobulin receptors, CD antigens (including CD2, CD3, CD4, CD8, and CD34 antigens), and other structural and functional transmembrane cell surface proteins that are known in the art. As will be appreciated by those skilled in the art, other cellular proteins and receptors that are known in the art can also be produced by the methods of the invention. One of the advantages of the method described herein is that virtually any gene can be activated. However, since genes have different genomic structures, including different intron/exon boundaries and start codon positions, a variety of activation constructs is provided to activate the maximum number of different genes within a population of cells. These constructs can be transfected separately into cells to produce libraries. Each library contains cells with a unique set of activated genes. Some genes will be activated by several different activation constructs. Further, portions of a gene can be activated to produce truncated, biologically active proteins. Truncated proteins can be produced, for example, by integrating an activation construct into introns or exons in the middle of an endogenous gene rather than upstream of the second exon.
The use of different constructs also allows the activated gene to be modified to contain new sequences. For example, a secretion signal sequence may be included in the activation construct to facilitate secretion of the product of the activated gene. In some cases, depending on the exon structure of the gene of interest, the secretion signal sequence can replace all or part of the signal sequence of the endogenous gene. In other cases, the signal sequence will allow a protein that is normally localized intracellularly to be secreted. The regulatory sequence in the vector can be a constitutive promoter. Alternatively, the promoter may be inducible. The use of inducible promoters allows the activated protein to be produced at low basal levels by the cell during routine culture and expansion. The cells can then be induced to produce large amounts of the desired proteins, for example during manufacture or screening. Examples of inducible promoters include, but are not limited to, the tetracycline-inducible promoter and the metallothionein promoter. In preferred embodiments of the invention, the regulatory sequence in the vectors of the invention may be a promoter, enhancer or repressor, any of which may be tissue-specific. The regulatory sequence in the vector can be isolated from cellular or viral genomes. Examples of cellular regulatory sequences include, but are not limited to, regulatory elements of the actin gene, metallothionein I gene, immunoglobulin genes, casein I gene, serum albumin gene, collagen gene, globin genes, laminin gene, spectrin gene, ankyrin gene, sodium-potassium ATPase gene and tubulin gene. Examples of viral regulatory sequences include, but are not limited to, regulatory elements of the cytomegalovirus (CMV) immediate early gene, adenovirus late genes, SV40 genes, retroviral LTRs, and herpesvirus genes.
Typically, the regulatory sequences contain binding sites for transcription factors such as NF-kB, SP-1, TATA-binding protein, AP-1 and CAAT-binding protein. Functionally, the regulatory sequence is defined by its ability to promote, increase or otherwise alter the transcription of an endogenous gene. In certain preferred embodiments, the regulatory sequence is a viral promoter. In particularly preferred embodiments, the promoter is the promoter of the cytomegalovirus (CMV) immediate early gene. In alternative embodiments, the regulatory element is a cellular, non-viral promoter. In alternative preferred embodiments, the regulatory element may be or may contain an enhancer. In particularly preferred embodiments, the enhancer is the enhancer of the cytomegalovirus immediate early gene. In preferred embodiments, the enhancer is a cellular, non-viral enhancer. In alternative preferred embodiments, the regulatory element may be or may contain a repressor. In particularly preferred embodiments, the repressor may be a viral repressor or a cellular, non-viral repressor. The transcriptional regulatory sequence may also be composed of one or more scaffold attachment regions or matrix attachment sites, negative regulatory elements, and transcription factor binding sites. Regulatory sequences may also include locus control regions. The invention also encompasses the use of retroviral transcriptional regulatory sequences, for example, long terminal repeats. Where these are used, however, they are not necessarily linked to any retroviral sequence that materially affects the function of the transcriptional regulatory sequence as a promoter or enhancer of transcription of the endogenous gene to be activated (i.e., the cellular gene with which the transcriptional regulatory sequence recombines to activate it). The vector constructs of the invention may also contain a regulatory sequence that is not operably linked to exonic sequences in the vector.
For example, when the regulatory element is an enhancer, it can integrate near an endogenous gene (e.g., upstream, downstream, or in an intron) and stimulate expression of the gene from its endogenous promoter. With this activation mechanism, the exonic sequences of the vector are absent from the transcript of the activated gene. Alternatively, the regulatory element can be operably linked to an exon. The exon may be a naturally occurring sequence or a sequence that does not occur naturally (e.g., produced synthetically). To activate endogenous genes lacking a start codon in their first exon (e.g., follicle stimulating hormone β), a start codon is preferably omitted from the exon in the vector. To activate endogenous genes containing a start codon in the first exon (e.g., erythropoietin and growth hormone), the exon in the vector preferably contains a start codon, generally ATG, and preferably an efficient translation start site (Kozak, J. Mol. Biol. 196: 947 (1987)). The exon may contain additional codons after the start codon. These codons may be derived from a naturally occurring gene or may be non-naturally occurring (e.g., synthetic). The codons may be the same as those present in the first exon of the endogenous gene to be activated. Alternatively, the codons may differ from those present in the first exon of the endogenous gene. For example, the codons can encode an epitope tag, a secretion signal sequence, a transmembrane domain, a selectable marker, or a screenable marker. Optionally, an unpaired splice donor site may be present immediately 3' of the exonic sequence. When the structure of the gene to be activated is known, the splice donor site should be placed adjacent to the vector exon in a position such that the codons in the vector will be in frame with the codons of the second exon of the endogenous gene after splicing.
When the structure of the endogenous gene to be activated is not known, separate constructs are used, each containing a different reading frame. Operably linked is defined as a configuration that allows transcription through the designated sequence(s). For example, a regulatory sequence that is operably linked to an exonic sequence indicates that the exonic sequence is transcribed. When a start codon is present in the vector, operably linked also means that the open reading frame of the vector exon is in frame with the open reading frame of the endogenous gene. After non-homologous integration, the regulatory sequence (e.g., a promoter) in the vector becomes operably linked to an endogenous gene and facilitates the initiation of transcription at a site generally referred to as a CAP site. Transcription proceeds through the exonic elements in the vector (and, if present, through the start codon, the open reading frame, and/or the unpaired splice donor site), and through the endogenous gene. The primary transcript produced by this operable linkage is spliced to generate a chimeric transcript that contains exonic sequences from both the vector and the endogenous gene. This transcript can produce the endogenous protein when it is translated. An exon or "exon sequence" is defined as any transcribed sequence that is present in the mature RNA molecule. The exon in the vector may contain untranslated sequences, e.g., a 5' untranslated region. Alternatively, in conjunction with the untranslated sequences, the exon may contain coding sequences such as a start codon and an open reading frame. The open reading frame can encode naturally occurring amino acid sequences or amino acid sequences that do not occur naturally (e.g., synthetic codons).
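The reading-frame requirement described above can be illustrated with a minimal sketch. It assumes a hypothetical set of three activation constructs whose vector exons contribute 3, 4, or 5 coding nucleotides (counted from the A of the ATG to the splice donor site); the construct names, the lengths, and the `exon_phase` parameter are illustrative assumptions, not values taken from the specification.

```python
# Sketch: which of several activation constructs stays in frame with a
# downstream endogenous exon after splicing. All names are hypothetical.

def in_frame(vector_coding_nt: int, exon_phase: int) -> bool:
    """Return True if the vector ORF is in frame with the endogenous exon.

    vector_coding_nt: coding nucleotides in the vector exon, counted from
        the first base of the start codon up to the splice donor site.
    exon_phase: number of nucleotides (0, 1, or 2) at the 5' end of the
        endogenous exon that precede its first complete codon in the
        native reading frame.
    """
    return (vector_coding_nt + exon_phase) % 3 == 0


def matching_constructs(constructs: dict, exon_phase: int) -> list:
    """List the constructs whose reading frame matches the given exon phase."""
    if exon_phase not in (0, 1, 2):
        raise ValueError("exon_phase must be 0, 1, or 2")
    return [name for name, n in constructs.items() if in_frame(n, exon_phase)]


# Hypothetical constructs, one per reading frame.
FRAME_CONSTRUCTS = {"frame_a": 3, "frame_b": 4, "frame_c": 5}

if __name__ == "__main__":
    for phase in (0, 1, 2):
        print(phase, matching_constructs(FRAME_CONSTRUCTS, phase))
```

Because exactly one of the three hypothetical constructs matches each phase, transfecting all three separately covers every possible frame of an uncharacterized gene, which is the rationale the text gives for using separate constructs.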
The open reading frame may also encode a secretion signal sequence, epitope tag, exon, selectable marker, screenable marker, or nucleotides that function to allow the open reading frame to be preserved upon splicing to an endogenous gene. Splicing of primary transcripts, the process by which introns are removed, is directed by a splice donor site and a splice acceptor site, located at the 5' and 3' ends of introns, respectively. The consensus sequence for splice donor sites is (A/C)AG GURAGU (where R represents a purine nucleotide), with the nucleotides at positions 1-3 located in the exon and the nucleotides GURAGU located in the intron. An unpaired splice donor site is defined herein as a splice donor site present in the activation construct without a downstream splice acceptor site. When the vector is integrated by non-homologous recombination into a host cell genome, the unpaired splice donor site becomes paired with a splice acceptor site from an endogenous gene. The splice donor site of the vector, together with the splice acceptor site of the endogenous gene, will then direct the excision of all sequences between the vector splice donor site and the endogenous splice acceptor site. The excision of these intervening sequences removes sequences that interfere with translation of the endogenous protein. The terms upstream and downstream (that is, toward the 5' end and toward the 3' end), as used herein, mean in the 5' direction or in the 3' direction, respectively, relative to the coding strand. The term "upstream region" of a gene is defined as the nucleic acid sequence 5' of its second exon (relative to the coding strand) up to and including the last exon of the first adjacent gene having the same coding strand.
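The splice donor consensus (A/C)AG GURAGU given above translates directly into a pattern match. The following is a minimal sketch, assuming the sequence is supplied as a DNA or RNA string of the coding strand; the function name and return format are hypothetical, and real donor sites often deviate from the consensus, so an exact match of this kind would miss many functional sites.

```python
import re

# Consensus (A/C)AG|GURAGU, with R = A or G. A lookahead is used so that
# overlapping candidate sites are all reported.
DONOR_CONSENSUS = re.compile(r"(?=([AC]AGGU[AG]AGU))")


def find_donor_sites(seq: str) -> list:
    """Return (match_start, exon_intron_boundary) for each exact consensus hit.

    match_start is the 0-based index of position 1 of the consensus;
    exon_intron_boundary is the index of the first intronic nucleotide (the G
    of GU), since consensus positions 1-3 lie in the exon.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    return [(m.start(1), m.start(1) + 3) for m in DONOR_CONSENSUS.finditer(rna)]


if __name__ == "__main__":
    # Hypothetical vector exon ending in a consensus donor site.
    print(find_donor_sites("CCCAAGGTAAGTCCC"))
```

The boundary index marks where the vector-derived exon would end after splicing, which is the position that matters when placing the donor site to keep the vector codons in frame.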
Functionally, the upstream region is any site 5' of the second exon of an endogenous gene that can allow a non-homologously integrated vector to become operably linked to the endogenous gene. The vector construct may contain a selectable marker to facilitate identification and isolation of cells that contain a non-homologously integrated activation construct. Examples of selectable markers include genes encoding neomycin resistance (neo), hypoxanthine phosphoribosyl transferase (HPRT), puromycin resistance (pac), dihydro-orotase, glutamine synthetase (GS), histidine D (his D), carbamyl phosphate synthase (CAD), dihydrofolate reductase (DHFR), multidrug resistance 1 (mdr1), aspartate transcarbamylase, xanthine-guanine phosphoribosyl transferase (gpt) and adenosine deaminase (ada). Alternatively, the vector may contain a screenable marker instead of, or in addition to, the selectable marker. A screenable marker allows cells containing the vector to be isolated without placing them under drug or other selective pressures. Examples of screenable markers include genes that encode cell surface proteins, fluorescent proteins, and enzymes. Cells containing the vector can be isolated, for example, by FACS using fluorescently labeled antibodies to the cell surface protein or substrates that can be converted to fluorescent products by a vector-encoded enzyme. Alternatively, selection can be effected by phenotypic selection for a trait conferred by the endogenous gene product. The activation construct may therefore lack any selectable marker other than the "marker" provided by the endogenous gene itself. In this embodiment, the activated cells can be selected based on a phenotype conferred by the activated gene.
Examples of selectable phenotypes include cell proliferation, growth factor-independent growth, colony formation, cell differentiation (e.g., differentiation into a neuronal cell, muscle cell, epithelial cell, etc.), anchorage-independent growth, activation of cellular factors (e.g., kinases, transcription factors, nucleases, etc.), expression of cell surface receptors/proteins, gain or loss of cell-to-cell adhesion, migration, and cellular activation (e.g., resting versus activated T cells). A selectable marker can also be omitted from the construct when the transfected cells are screened for gene activation products without selecting for stable integrants. This is particularly useful when the efficiency of stable integration is very high. The vector may contain one or more (i.e., one, two, three, four, five, or more, and most preferably one or two) amplifiable markers to allow the selection of cells containing increased copies of the integrated vector and the adjacent activated endogenous gene. Examples of amplifiable markers include, but are not limited to, dihydrofolate reductase (DHFR), adenosine deaminase (ada), dihydro-orotase, glutamine synthetase (GS), and carbamyl phosphate synthase (CAD). The vector may contain eukaryotic viral origins of replication useful for gene amplification. These origins may be present in place of, or in conjunction with, an amplifiable marker. The vector may also contain genetic elements useful for propagation of the construct in microorganisms. Examples of useful genetic elements include microbial origins of replication and antibiotic resistance markers. These vectors, and any of the vectors described herein, and obvious variants recognized by those skilled in the art, can be used in any of the methods described herein to form any of the compositions that can be produced by those methods.
Non-homologous integration of the construct into the genome of a cell results in an operable linkage between the regulatory elements of the vector and the exons of an endogenous gene. In preferred embodiments, the insertion of the vector regulatory sequences is used to upregulate the expression of the endogenous gene. Upregulation of gene expression includes converting a transcriptionally silent gene to a transcriptionally active gene. It also includes increasing the expression of genes that are already transcriptionally active but that produce protein at levels lower than desired. In other embodiments, the expression of the endogenous gene may be affected in other ways, such as down-regulation of expression, generation of an inducible phenotype, or a change in the tissue specificity of expression. According to the invention, in vitro methods of producing an expression product of a gene can comprise, for example, a) introducing a vector of the invention into a cell; b) allowing the vector to integrate into the cell genome by non-homologous recombination; c) allowing overexpression of an endogenous gene in the cell by upregulation of the gene by the transcriptional regulatory sequence contained in the vector; d) screening the cell for overexpression of the endogenous gene; and e) culturing the cell under conditions that favor the production of the expression product of the endogenous gene by the cell. Said in vitro methods of the invention may further comprise isolating the expression product to produce an isolated gene expression product. In such methods, any method of protein isolation known in the art can be used, including but not limited to chromatography (e.g., HPLC, FPLC, LC, ion exchange, affinity, size exclusion, and the like), precipitation (e.g., ammonium sulfate precipitation, immunoprecipitation and the like), electrophoresis and other methods of protein isolation and purification that will be known to those skilled in the art.
Analogously, in vivo methods of producing a gene expression product may comprise, for example, a) introducing a vector of the invention into a cell; b) allowing the vector to integrate into the cell genome by non-homologous recombination; c) allowing overexpression of an endogenous gene in the cell by upregulation of the gene by the transcriptional regulatory sequence contained in the vector; d) screening the cell for overexpression of the endogenous gene; and e) introducing the isolated and cloned cell into a eukaryote under conditions that favor overexpression of the endogenous gene by the cell in vivo in the eukaryote. According to this aspect of the invention, any eukaryote can be used, including fungi (particularly yeasts), plants and animals, more preferably animals, still more preferably vertebrates, and most preferably mammals, particularly humans. In certain related embodiments, the invention provides such methods that further comprise isolating and cloning the cell before introducing it into the eukaryote. As used herein, the phrases "conditions favoring the production" of an expression product, "conditions favoring overexpression" of a gene, and "conditions favoring the activation" of a gene, in a cell or by a cell in vitro, refer to any and all environmental, physical, nutritional or biochemical parameters that allow, facilitate, or promote the production of an expression product, or overexpression or activation of a gene, by a cell in vitro. Such conditions may, of course, include the use of culture media, incubation, delivery, humidity, etc., that are optimal for or that allow, facilitate, or promote the production of an expression product, or overexpression or activation of a gene, by a cell in vitro.
Analogously, as used herein, the phrases "conditions favoring the production" of an expression product and "conditions favoring overexpression" of a gene, in a cell or by a cell in vivo, refer to any and all environmental, physical, nutritional, biochemical, behavioral, genetic and emotional parameters under which an animal containing the cell is maintained that allow, facilitate or promote the production of an expression product, or overexpression or activation of a gene, by a cell in a eukaryote in vivo. A person skilled in the art can determine whether a given set of conditions is favorable for expression, activation, or overexpression of the gene in vitro or in vivo, using the screening methods described and exemplified below, or using other methods for measuring expression, activation or overexpression of the gene that are routine in the art. As used herein, the phrase "activating an endogenous gene" means inducing the production of a transcript encoding the endogenous gene at levels greater than those normally found in the cell containing the endogenous gene. In some applications, "activating an endogenous gene" may also mean producing the protein, or a portion of the protein, encoded by the endogenous gene at levels higher than those normally found in the cell containing the endogenous gene. The invention also encompasses cells obtained by any of the above methods. The invention encompasses cells that contain the vector constructs, cells in which the vector constructs have been integrated, and cells that overexpress the desired gene products from an endogenous gene, where said overexpression is driven by the introduced transcriptional regulatory sequence. The cells used in this invention can be derived from any eukaryotic species and can be primary, secondary or immortalized. Moreover, the cells can be derived from any tissue in the body.
Examples of useful tissues from which the cells can be isolated and activated include, but are not limited to, liver, kidney, bladder, bone marrow, thymus, heart, muscle, lung, brain, testis, ovary, islet, intestine, skin, bone, gallbladder, prostate, vesicle, embryos, and the immune and hematopoietic systems. Cell types include fibroblastic, epithelial, neuronal, stem, and follicular cells. However, any cell or cell type can be used to activate gene expression in this invention. The methods can be carried out in cells of any eukaryotic origin, such as fungal, plant or animal cells. Preferred embodiments include vertebrate cells, particularly mammalian cells, and most particularly human cells. The construct can be integrated into primary, secondary or immortalized cells. Primary cells are cells that have been isolated from a vertebrate and have not been passaged. Secondary cells are cells that have been passaged but are not immortalized. Immortalized cells are cell lines that can apparently be passaged indefinitely. In preferred embodiments, the cells are immortalized cell lines. Examples of immortalized cell lines include, but are not limited to, KM 080, HeLa, Jurkat, 293 cells, KB carcinoma, the T84 colonic epithelial cell line, Raji, the Hep G2 and Hep 3B hepatoma cell lines, A2058 melanoma, U937 lymphoma, the WI38 fibroblast cell line, somatic cell hybrids, and hybridomas. The cells used in this invention can be derived from any eukaryotic species, including but not limited to mammalian cells (such as rat, mouse, bovine, porcine, sheep, goat and human cells), avian cells, fish cells, amphibian cells, reptilian cells, plant cells, and yeast cells. Preferably, overexpression of an endogenous gene or gene product of a particular species is achieved by activating gene expression in a cell of that species. For example, human cells are used to overexpress endogenous human proteins.
Similarly, bovine cells are used to overexpress endogenous bovine proteins, for example bovine growth hormone. The cells can be derived from any tissue in the eukaryotic organism. Examples of useful vertebrate tissues from which cells can be isolated and activated include, but are not limited to, liver, kidney, bladder, bone marrow, thymus, heart, muscle, lung, brain, immune system (including lymphatic), testis, ovary, islet, intestine, stomach, skin, bone, gallbladder, prostate, zygote, embryos and hematopoietic tissue. Useful types of vertebrate cells include, but are not limited to, fibroblasts, epithelial cells, neuronal cells, germ cells (i.e., spermatocytes/spermatozoa and oocytes), stem cells, and follicular cells. Examples of plant tissues from which cells can be isolated and activated include, but are not limited to, leaf tissues, ovary tissues, stamen tissues, pistil tissues, root tissues, tubers, gametes, seeds, embryos and the like. Those skilled in the art will appreciate, however, that any eukaryotic cell or cell type can be used to activate gene expression using the present invention. Any of the cells produced by any of the described methods are useful for screening for expression of a desired gene product and for providing desired amounts of a gene product that is overexpressed in the cell. The cells can be isolated and cloned. The cells produced by this method can be used to produce the protein in vitro (for example, for use as a protein therapeutic) or in vivo (for example, for use in cell therapy).
Commercial growth and production conditions frequently differ from the conditions used to grow and prepare cells for analytical use (e.g., cloning, protein or nucleic acid sequencing, raising antibodies, X-ray crystallographic analysis, enzymatic analysis, and the like). Scaling up cells for production in roller bottles involves increasing the surface area to which the cells can attach; microcarrier beads are therefore frequently added to increase the surface area for commercial production. Scaling up cells in spinner cultures may involve increases in volume. Five liters or more may be required for microcarrier and spinner production methods. Depending on the inherent potency (specific activity) of the protein of interest, the volume can be as low as 1-10 liters; 10-15 liters is more common. However, up to 50-100 liters may be necessary, and the volume may be as high as 10,000 to 15,000 liters. In some cases higher volumes may be required. The cells can also be grown in large numbers of T flasks, for example 50 to 100. Regardless of the production conditions, protein purification on a commercial scale can also differ considerably from purification for analytical purposes. Protein purification in a practical commercial setting may start with the cell mass equivalent of 10 liters of cells at approximately 10^4 cells/ml. The cell mass equivalent used to begin protein purification may be as high as 10 liters of cells at up to 10^6 or 10^7 cells/ml. However, as will be appreciated by one skilled in the art, a lower or higher initial cell mass equivalent may also be used advantageously in such methods.
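The cell mass equivalents quoted above can be converted to total cell numbers with simple arithmetic. The sketch below assumes the densities are meant as 10^4, 10^6 and 10^7 cells/ml (the exponents appear to have been flattened in the source text); the function name is hypothetical.

```python
ML_PER_LITER = 1000


def total_cells(volume_liters: int, cells_per_ml: int) -> int:
    """Total cell number for a culture of the given volume and density."""
    return volume_liters * ML_PER_LITER * cells_per_ml


if __name__ == "__main__":
    # 10 liters at the three densities mentioned in the text.
    for density in (10**4, 10**6, 10**7):
        print(f"10 L at {density:.0e} cells/ml -> {total_cells(10, density):.0e} cells")
```

Under these assumptions, the starting material for purification spans roughly three orders of magnitude, from 10^8 to 10^11 cells.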
Another condition of commercial production, especially when the final product is used clinically, is cell growth in serum-free medium, meaning medium that either contains no serum or does not contain serum in the amount required for cell growth. This avoids unwanted co-purification of toxic contaminants (e.g., viruses) or other types of contaminants, e.g., proteins that would complicate purification. Serum-free media for cell growth, commercial sources of such media, and methods for culturing cells in serum-free media are known to those skilled in the art. A single cell obtained by the methods described above can overexpress a single gene or more than one gene. More than one gene can be activated by integration of a single construct or by integration of multiple constructs into the same cell (i.e., more than one type of construct). Therefore, a cell can contain only one type of vector construct or different types of constructs, each of which can activate an endogenous gene. The invention is also directed to methods for obtaining the cells described above by one or more of the following steps: introducing one or more of the vector constructs; allowing the introduced constructs to integrate into the cell genome by non-homologous recombination; allowing overexpression of one or more endogenous genes in the cell; and isolating and cloning the cell. The term "transfection" has been used herein for convenience when describing the introduction of a polynucleotide into a cell. However, it should be understood that this term is used to refer generally to the introduction of a polynucleotide into a cell, and it is also intended to refer to introduction by the other methods described herein, such as electroporation, liposome-mediated introduction, retrovirus-mediated introduction, and the like (as well as in its specific sense).
The vector can be introduced into the cell by a number of methods known in the art. These include, but are not limited to, electroporation, calcium phosphate precipitation, DEAE-dextran, lipofection, receptor-mediated endocytosis, polybrene, particle bombardment, and microinjection. Alternatively, the vector can be delivered to the cell as a viral particle (either replication-competent or replication-deficient). Examples of viruses useful for the delivery of nucleic acid include, but are not limited to, adenoviruses, adeno-associated viruses, retroviruses, herpesviruses, and vaccinia viruses. Other viruses suitable for the delivery of nucleic acid molecules into cells are known to those skilled in the art and can be used equivalently in the present methods. After transfection, the cells are cultured under conditions, known in the art, that are suitable for non-homologous integration between the vector and the host cell genome. Cells that contain the non-homologously integrated vector can be further cultured under conditions, known in the art, that allow expression of the activated endogenous genes.
The vector construct can be introduced into cells as a single DNA construct or as separate constructs that are allowed to concatemerize. While in preferred embodiments the vector construct is a double-stranded DNA construct, vector constructs also include single-stranded DNA, combinations of single- and double-stranded DNA, single-stranded RNA, double-stranded RNA, and combinations of single- and double-stranded RNA. Thus, for example, the vector construct could be single-stranded RNA that is converted to cDNA by reverse transcriptase, with the cDNA converted to double-stranded DNA and the double-stranded DNA ultimately recombining with the host cell genome. In preferred embodiments, the constructs are linearized prior to introduction into the cell. Linearization of the activation construct generates free DNA ends capable of reacting with chromosomal ends during the integration process. In general, the construct is linearized downstream of the regulatory element (and of the exonic and splice donor sequences, if present). Linearization can be facilitated by, for example, placing a unique restriction site downstream of the regulatory sequences and treating the construct with the corresponding restriction enzyme prior to transfection. While not required, it is advantageous to place a "spacer" sequence between the linearization site and the nearest functional element (for example, the unpaired splice donor site) in the construct. When present, the spacer sequence protects the important functional elements of the vector from exonuclease degradation during the transfection process. The spacer sequence can be composed of any nucleotide sequence that does not change the essential functions of the vector as described herein. Circular constructs can also be used to activate expression of the endogenous gene.
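The linearization design described above (a unique restriction site downstream of the functional elements, optionally separated from them by a spacer) can be checked with a short sketch. It assumes the vector is given as a single-strand string, that the recognition site is palindromic (such as the EcoRI site GAATTC) so that scanning one strand suffices, and that the position where the last functional element ends is known; the function name, parameters and toy sequence are hypothetical.

```python
def unique_site_downstream(vector_seq: str, site: str,
                           functional_end: int, min_spacer: int = 0):
    """Return the site position if it is a usable linearization site, else None.

    A site is considered usable when it occurs exactly once in the vector and
    begins at least `min_spacer` nucleotides after `functional_end` (the
    0-based index just past the last functional element, e.g. the unpaired
    splice donor site). The exact cut position within the site is ignored.
    """
    seq, site = vector_seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:                 # collect all (overlapping) occurrences
        positions.append(start)
        start = seq.find(site, start + 1)
    if len(positions) != 1:
        return None                    # absent, or would cut more than once
    pos = positions[0]
    return pos if pos >= functional_end + min_spacer else None


if __name__ == "__main__":
    # Toy vector: 8 nt of "functional" sequence, a 6-nt spacer, then GAATTC.
    toy_vector = "AAAAGGGG" + "TTTTTT" + "GAATTC" + "CC"
    print(unique_site_downstream(toy_vector, "GAATTC", functional_end=8, min_spacer=6))
```

Requiring uniqueness reflects the point that a second copy of the site inside the regulatory or exonic sequences would destroy the elements needed for activation; a non-palindromic site would additionally require scanning the reverse complement.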
Circular plasmids, as is known in the art, can integrate into the genome of the host cell upon transfection. Presumably, DNA breaks occur in the circular plasmid during the transfection process, thereby generating free DNA ends capable of joining to chromosomal ends. Some of these breaks in the construct will occur at a position that does not destroy the essential functions of the vector (for example, the break will occur downstream of the regulatory sequence) and will therefore allow the construct to integrate into a chromosome in a configuration that can activate an endogenous gene. As described above, spacer sequences can be placed in the construct (e.g., downstream of the regulatory sequences). During transfection, breaks occurring in the spacer region will generate free ends at a site in the construct appropriate for activation of an endogenous gene after integration into the host cell genome. The invention also encompasses libraries of cells formed by the methods described above. A library can encompass all clones from a single transfection experiment or a subset of clones from a single transfection experiment. The subset may overexpress the same gene or more than one gene, for example, a class of genes. The transfection may have been carried out with a single type of construct or with more than one type of construct. A library can also be formed by combining all recombinant cells from two or more transfection experiments, by combining one or more subsets of cells from a single transfection experiment, or by combining subsets of cells from separate transfection experiments. The resulting library can express the same gene or more than one gene, for example a class of genes. Again, in each of these individual transfections, a single construct or more than one construct can be used. The libraries can be formed from the same cell type or from different cell types.
The library can be composed of a single cell type containing a single type of activation construct that has integrated into chromosomes at spontaneous DNA breaks or at breaks generated by radiation, restriction enzymes, and/or DNA breaking agents, applied either together (to the same cells) or separately (applied to individual groups of cells, which are then combined to produce the library). The library can also be composed of multiple cell types containing single or multiple constructs that have integrated into the genome of cells treated with radiation, restriction enzymes, and/or DNA breaking agents, applied either together (to the same cells) or separately (applied to individual groups of cells, which are then combined to produce the library). The invention also relates to methods for making libraries by selecting various subsets of cells from the same or different transfection experiments. For example, all cells expressing nuclear factors (as determined by the presence of nuclear green fluorescent protein in cells transfected with construct 20) can be pooled to create a library of cells with activated nuclear factors. Similarly, cells expressing membrane or secreted proteins can be pooled. Cells can also be grouped by phenotype, e.g., growth factor-independent growth, growth factor-independent proliferation, colony formation, cellular differentiation (e.g., differentiation into a neuronal cell, muscle cell, epithelial cell, etc.), anchorage-independent growth, activation of cellular factors (e.g., kinases, transcription factors, nucleases, etc.), gain or loss of cell-cell adhesion, migration, or cellular activation (e.g., resting versus activated T cells). The invention further relates to methods of using libraries of cells that overexpress an endogenous gene. The library is screened for expression of the gene, and cells that express the desired gene product are selected. 
The cell can then be used to purify the gene product for subsequent use. Expression can occur when the cell is cultured in vitro or when the cell is allowed to express the gene in vivo. The invention also relates to methods of using the libraries to identify novel genes and gene products. The invention also relates to methods for increasing the efficiency of gene activation by treating the cells with agents that stimulate or otherwise affect the patterns of non-homologous integration. It has been shown that gene expression patterns, chromatin structure, and methylation patterns can differ dramatically from one cell type to another. Even different cell lines of the same cell type can differ significantly. These differences can affect non-homologous integration patterns by influencing both the pattern of DNA breakage and the repair process. For example, chromatinized stretches of DNA (a characteristic likely associated with inactive genes) may be more resistant to breakage by restriction enzymes and chemical agents, whereas they may be susceptible to breakage by radiation. In addition, inactive genes can be methylated. In this case, restriction enzymes that are blocked by CpG methylation will be unable to cut methylated sites near the inactive gene, making it more difficult to activate that gene using methylation-sensitive enzymes. These problems can be circumvented by creating activation libraries in several different cell lines using a variety of DNA breaking agents. In this way, a more complete integration pattern can be generated and the probability of activating a given gene can be maximized. The methods of the invention may include introducing double-strand breaks into the DNA of the cell containing the endogenous gene to be overexpressed. These methods introduce double-strand breaks into the genomic DNA of the cell prior to or simultaneously with integration of the vector. 
The mechanism of DNA breakage can have a significant effect on the pattern of DNA breaks in the genome. As a result, DNA breaks produced spontaneously or artificially with radiation, restriction enzymes, bleomycin, or other breaking agents can occur at different positions. To increase integration efficiency and to improve the random distribution of integration sites, cells can be treated with low, intermediate, or high doses of radiation before or after transfection. When double-strand breaks are induced artificially, the transfected DNA can integrate into the host cell chromosome as part of the DNA repair process. Normally, generation of the double-strand breaks that serve as integration sites is the rate-limiting step. Thus, by increasing chromosomal breaks using radiation (or other DNA damaging agents), a larger number of integrants can be obtained in a given transfection. Moreover, the mechanism of DNA breakage by radiation differs from that of spontaneous breakage. Radiation can induce DNA breaks directly when a high-energy photon strikes the DNA molecule. Alternatively, radiation can activate compounds in the cell that in turn react with and break the DNA strand. Spontaneous breaks, on the other hand, are thought to arise from the interaction between reactive compounds produced in the cell (such as superoxides and peroxides) and the DNA molecule. However, DNA in the cell is not present as a naked, deproteinized polymer; instead, it is bound in chromatin and present in a condensed state. As a result, some regions are not accessible to agents in the cell that cause double-strand breaks. Photons produced by radiation have wavelengths short enough to strike highly condensed regions of DNA, thereby inducing breaks in regions of DNA that are underrepresented among spontaneous breaks. 
Therefore, radiation can generate different DNA breakage patterns, which in turn should lead to different integration patterns. As a result, libraries produced using the same activation construct in cells with and without radiation treatment will potentially contain different sets of activated genes. Finally, radiation treatment increases the efficiency of non-homologous integration by up to 5- to 10-fold, allowing complete libraries to be created using fewer cells. Thus, radiation treatment increases the efficiency of gene activation and generates new integration and activation patterns in transfected cells. Useful types of radiation include α, β, γ, X-ray, and ultraviolet radiation. Useful doses of radiation vary for different cell types, but in general, dose ranges that result in cell viabilities of 0.1% to >99% are useful. For HT1080 cells, this corresponds to radiation doses from a 137Cs source of about 0.1 rads to 100 rads. Other doses may also be useful, as long as the dose either increases the integration frequency or changes the pattern of integration sites. In addition to radiation, restriction enzymes can be used to artificially induce chromosomal breaks in transfected cells. As with radiation, DNA restriction enzymes can create chromosomal breaks which, in turn, serve as integration sites for the transfected DNA. This larger number of DNA breaks increases the overall integration efficiency of the activation construct. Additionally, the mechanism of breakage by restriction enzymes differs from that of radiation, so the pattern of chromosomal breaks is also likely to be different. Restriction enzymes are relatively large molecules compared to photons and to the small metabolites capable of damaging DNA. As a result, restriction enzymes will tend to break regions that are less condensed than the genome as a whole. 
If the gene of interest lies within an accessible region of the genome, then treating the cells with a restriction enzyme may increase the probability of integrating the activation construct upstream of the gene of interest. Because restriction enzymes recognize specific sequences, and because a given restriction site may not lie upstream of the gene of interest, a variety of restriction enzymes can be used. It may also be important to use a variety of restriction enzymes because each enzyme has different properties (for example, size, stability, ability to cut methylated sites, and optimal reaction conditions) that affect which sites in the host chromosome will be cut. Each enzyme, owing to the different distribution of cleavable restriction sites, will create a different integration pattern. Therefore, introducing restriction enzymes (or plasmids capable of expressing restriction enzymes) before, during, or after introduction of the activation construct will result in the activation of different sets of genes. Finally, restriction enzyme-induced breaks increase the integration efficiency by up to 5- to 10-fold (Yorifuji et al., Mut. Res. 243:121 (1990)), allowing fewer cells to be transfected to produce a complete library. Thus, restriction enzymes can be used to create new integration patterns, allowing the activation of genes that were not activated in libraries produced by non-homologous recombination at spontaneous breaks or at other artificially induced breaks. Restriction enzymes can also be used to bias integration of the activation construct toward a desired site in the genome. For example, several rare-cutting restriction enzymes have been described that cut eukaryotic DNA every 50 to 1000 kilobases, on average. 
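As a rough illustration of why enzymes with longer recognition sites cut rarely, the following sketch computes the mean spacing between occurrences of a single recognition site. It assumes a random sequence with equal base frequencies, which real genomes do not satisfy, so the figures are order-of-magnitude estimates only; the function name `expected_site_spacing_bp` is hypothetical and is not taken from the specification.

```python
def expected_site_spacing_bp(site_length: int) -> int:
    """Mean spacing (in bp) between occurrences of one specific,
    unambiguous recognition site of the given length.

    Assumption (not from the specification): the sequence is random
    with equal frequencies of A, C, G and T, so a given site of
    length n occurs on average once every 4**n bp.
    """
    if site_length < 1:
        raise ValueError("site_length must be at least 1")
    return 4 ** site_length


# Compare typical 4-, 6- and 8-bp recognition sites.
for n in (4, 6, 8):
    spacing = expected_site_spacing_bp(n)
    print(f"{n}-bp site: about one site per {spacing:,} bp")
```

Under this simplified model, an 8-bp site occurs about once every 65,536 bp (roughly 65 kb). Sites containing CpG dinucleotides would be expected to occur less often than the model predicts wherever CpG is depleted in the genome.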
If a rare restriction recognition sequence happens to lie upstream of a gene of interest, then introducing the restriction enzyme at the time of transfection, together with the activation construct, can direct DNA breaks preferentially upstream of the gene of interest. These breaks can then serve as sites for integration of the activation construct. Any enzyme can be used that cuts at an appropriate position in or near the gene of interest and whose site is either underrepresented in the rest of the genome or overrepresented near genes (e.g., restriction sites containing CpG). For genes that have not been previously identified, restriction enzymes with 8 bp recognition sites (e.g., NotI, SfiI, PmeI, SwaI, SseI, SrfI, SgrAI, PacI, AscI, SgfI, and Sse8387I), enzymes that recognize CpG-containing sites (e.g., EagI, BsiWI, MluI, and BssHII), and other rare-cutting enzymes can be used. In this way, "biased" libraries can be created that are enriched for certain types of activated genes. In this regard, restriction enzyme sites containing CpG dinucleotides are particularly useful, since these sites are underrepresented in the genome at large but overrepresented in the form of CpG islands at the 5' end of many genes, the very position that is useful for gene activation. Accordingly, enzymes that recognize these sites will cut preferentially at the 5' end of genes. Restriction enzymes can be introduced into the host cell by several methods. First, restriction enzymes can be introduced into the cell by electroporation (Yorifuji et al., Mut. Res. 243:121 (1990); Winegar et al., Mut. Res. 225:49 (1989)). In general, the amount of restriction enzyme introduced into the cell is proportional to its concentration in the electroporation medium. The pulse conditions must be optimized for each cell line by adjusting the voltage, capacitance, and resistance. 
Second, the restriction enzyme can be transiently expressed from a plasmid encoding the enzyme under the control of eukaryotic regulatory elements. The level of enzyme produced can be controlled by using inducible promoters and varying the strength of induction. In some cases, it may be desirable to limit the amount of restriction enzyme produced (because of its toxicity). In these cases, weak or mutant promoters, splice sites, translation start codons, and poly A tails can be used to reduce the amount of restriction enzyme produced. Third, restriction enzymes can be introduced by agents that fuse with or permeabilize the cell membrane. Liposomes and streptolysin O (Pimplikar et al., J. Cell Biol. 125:1025 (1994)) are examples of this type of agent. Finally, mechanical perforation (Beckers et al., Cell 50:523-534 (1987)) and microinjection can also be used to introduce nucleases and other proteins into cells. However, any method capable of delivering active enzymes to a living cell is suitable. DNA breaks induced by bleomycin and other DNA damaging agents can also produce different DNA breakage patterns. Thus, any agent or incubation condition capable of generating double-strand breaks in cells is useful for increasing the efficiency and/or altering the sites of non-homologous recombination. Examples of classes of chemical DNA breaking agents include, but are not limited to, peroxides and other free radical generating compounds, alkylating agents, topoisomerase inhibitors, antineoplastic drugs, acids, substituted nucleotides, and enediyne antibiotics. 
Specific chemical DNA breaking agents include, but are not limited to, bleomycin, hydrogen peroxide, cumene hydroperoxide, tert-butyl hydroperoxide, hypochlorous acid (reacted with aniline, 1-naphthylamine, or 1-naphthol), nitric acid, phosphoric acid, doxorubicin, 9-deoxydoxorubicin, demethyl-6-deoxyrubicin, 5-iminodaunorubicin, adriamycin, 4'-(9-acridinylamino)methanesulfon-m-anisidide, neocarzinostatin, 8-methoxycaffeine, etoposide, ellipticine, iododeoxyuridine, and bromodeoxyuridine. It has been shown that the DNA repair machinery of the cell can be induced by prior exposure of the cell to low doses of a DNA breaking agent such as radiation or bleomycin. When cells are pretreated with these agents approximately 24 hours before transfection, they become more efficient at repairing DNA breaks and at integrating DNA following transfection. In addition, higher doses of radiation or other DNA breaking agents can be used, since the LD50 (the dose that kills 50% of the exposed cells) is higher after the pretreatment. This allows random activation libraries to be created at multiple doses and results in a different distribution of integration sites within the chromosomes of the host cell.
Screening. Once an activation library (or libraries) has been created, it can be screened using a number of assays. Depending on the characteristics of the protein(s) of interest (e.g., secreted versus intracellular proteins) and the nature of the activation construct used to create the library, any or all of the assays described below can be used. Other assay formats can also be used.
ELISA. Activated proteins can be detected using the enzyme-linked immunosorbent assay (ELISA). If the activated gene product is secreted, culture supernatants from pools of activation library cells are incubated in wells containing a bound antibody specific for the protein of interest. If a cell or group of cells has activated the gene of interest, then the protein will be secreted into the culture medium. By screening pools of library clones (pools can range from 1 to more than 100,000 library members), the pools containing a cell or cells that have activated the gene of interest can be identified. The cell of interest can then be purified away from the other library members by sib selection, limiting dilution, or other methods known in the art. In addition to secreted proteins, ELISA can be used to screen for cells that express intracellular and membrane-bound proteins. In these cases, instead of screening culture supernatants, a small number of cells is removed from the library pool (each cell is represented at least 100 to 1000 times in each pool), lysed, clarified, and added to the antibody-coated wells.
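The narrowing of a positive pool down to a single clone by sib selection can be sketched as follows. This is a minimal illustration only, under the assumptions that exactly one assay readout is available per sub-pool and that the assay has no false negatives; the function `sib_select` and the callback `assay_is_positive` are hypothetical names that do not appear in the specification, and the callback stands in for whatever screening assay (e.g., ELISA on a sub-pool supernatant) is actually used.

```python
from typing import Callable, Sequence, Tuple


def sib_select(
    pool: Sequence[int],
    assay_is_positive: Callable[[Sequence[int]], bool],
    n_subpools: int = 10,
) -> Tuple[int, int]:
    """Repeatedly split a positive pool into sub-pools, assay each one,
    and keep a positive sub-pool until a single clone remains.

    Returns (clone_id, number_of_rounds).
    """
    if n_subpools < 2:
        raise ValueError("n_subpools must be at least 2")
    current = list(pool)
    rounds = 0
    while len(current) > 1:
        # Ceiling division so that every clone lands in some sub-pool.
        size = -(-len(current) // n_subpools)
        subpools = [current[i:i + size] for i in range(0, len(current), size)]
        positives = [s for s in subpools if assay_is_positive(s)]
        if not positives:
            raise ValueError("no sub-pool scored positive")
        current = positives[0]  # follow one positive sub-pool
        rounds += 1
    return current[0], rounds


# Hypothetical example: a pool of 1000 clones in which clone 737 is positive.
clone, rounds = sib_select(range(1000), lambda sub: 737 in sub)
print(clone, rounds)
```

With ten sub-pools per round, a pool of 1000 clones is resolved in three rounds in this idealized model; real screens may need replicate wells or re-testing to tolerate assay noise.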
ELISA spot. ELISA spot wells are coated with antibodies specific for the protein of interest. After coating, the wells are blocked with 1% BSA/PBS for one hour at 37°C. After blocking, 100,000 to 500,000 cells from the random activation library are applied to each well (this represents approximately 10% of the total pool). In general, one pool is applied to each well. If the frequency of a cell expressing the protein of interest is 1 in 10,000 (i.e., the pool consists of 10,000 individual clones, one of which expresses the protein of interest), then plating 500,000 cells per well will yield 50 specific cells. The cells are incubated in the wells at 37°C for 24 to 48 hours without being moved or disturbed. At the end of the incubation, the cells are removed and the plate is washed 3 times with PBS/0.05% Tween 20 and 3 times with PBS/1% BSA. Secondary antibodies are applied to the wells at the appropriate concentration and incubated for two hours at room temperature or 16 hours at 4°C. These antibodies can be biotinylated or directly labeled with horseradish peroxidase (HRP). The secondary antibodies are then removed and the plate is washed with PBS/1% BSA. The tertiary antibody or HRP-labeled streptavidin is added and incubated for one hour at room temperature.
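The plating arithmetic in the preceding passage can be reproduced with a short calculation. This is only an illustrative sketch; the function name `expected_positive_cells` is hypothetical, and the calculation assumes that every clone in the pool is equally represented among the plated cells.

```python
from fractions import Fraction


def expected_positive_cells(cells_plated: int, clones_in_pool: int,
                            positive_clones: int = 1) -> Fraction:
    """Expected number of plated cells that express the protein of interest,
    assuming all clones in the pool are equally represented."""
    if clones_in_pool < 1:
        raise ValueError("clones_in_pool must be at least 1")
    return Fraction(cells_plated * positive_clones, clones_in_pool)


# Values from the text: 500,000 cells per well, 1 positive clone in 10,000.
print(expected_positive_cells(500_000, 10_000))
# Lower end of the stated plating range: 100,000 cells per well.
print(expected_positive_cells(100_000, 10_000))
```

Exact fractions are used so that the result does not depend on floating-point rounding.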
FACS assay. The fluorescence-activated cell sorter (FACS) can be used to screen the random activation library in several ways. If the gene of interest encodes a cell surface protein, then fluorescently labeled antibodies are incubated with cells from the activation library. If the gene of interest encodes a secreted protein, then the cells can be biotinylated and incubated with streptavidin conjugated to an antibody specific for the protein of interest (Manz et al., Proc. Natl. Acad. Sci. (USA) 92:1921 (1995)). After incubation, the cells are placed in a high concentration of gelatin (or another polymer, such as agarose or methylcellulose) to limit diffusion of the secreted protein. As the protein is secreted by the cell, it is captured by the antibody bound to the cell surface. The presence of the protein of interest is then detected by a second antibody that is fluorescently labeled. For both secreted and membrane-bound proteins, the cells can be sorted according to their fluorescence signal. The fluorescent cells can then be isolated, expanded, and further enriched by FACS, limiting dilution, or other cell purification techniques known in the art.
Magnetic bead separation. The principle of this technique is similar to that of FACS. Membrane-bound proteins and captured secreted proteins (as described above) are detected by incubating the activation library with antibody-conjugated magnetic beads that are specific for the protein of interest. If the protein is present on the surface of a cell, the magnetic beads will bind to that cell. Using a magnet, cells expressing the protein of interest can be purified away from the other cells in the library. The cells are then released from the beads, expanded, analyzed, and further purified if necessary.
RT-PCR. A small number of cells (equivalent to at least the number of individual clones in the pool) is cultured and lysed to allow RNA purification. After isolation, the RNA is reverse transcribed using reverse transcriptase. PCR is then carried out using primers specific for the cDNA of the gene of interest. Alternatively, a primer spanning the synthetic exon in the activation construct and the exon of the endogenous gene can be used. This primer will not hybridize to or amplify the endogenously expressed gene of interest. Conversely, if the activation construct has integrated upstream of the gene of interest and has activated expression of the gene, then this primer, together with a second primer specific for the gene, will amplify the activated gene because the synthetic exon is spliced onto the exon of the endogenous gene. This method can therefore be used to detect activated genes in cells that normally express the gene of interest at lower than desired levels.
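The logic of the junction-spanning primer described above can be illustrated with simple string matching. This is a simplified sketch only: all sequences below are made-up placeholders rather than sequences from the specification, the function `junction_primer` is a hypothetical helper, and exact substring matching is used as a crude stand-in for primer hybridization.

```python
def junction_primer(vector_exon: str, gene_exon: str,
                    left: int = 10, right: int = 10) -> str:
    """Build a primer that spans the splice junction between the synthetic
    (vector-encoded) exon and the downstream endogenous exon."""
    return vector_exon[-left:] + gene_exon[:right]


# Hypothetical placeholder sequences (not from the specification).
VECTOR_EXON = "ATGGCCTCTAGAGTCGACCTG"      # synthetic exon on the construct
ENDOGENOUS_EXON_1 = "CCCTTTAAACCCGGGTTTAAA"  # native first exon
ENDOGENOUS_EXON_2 = "GATTACAGATTACAGATTACA"  # exon the vector exon splices onto

primer = junction_primer(VECTOR_EXON, ENDOGENOUS_EXON_2)

# Transcript from the activated locus: synthetic exon spliced onto exon 2.
activated_transcript = VECTOR_EXON + ENDOGENOUS_EXON_2
# Transcript from the unmodified locus: native exon 1 spliced onto exon 2.
endogenous_transcript = ENDOGENOUS_EXON_1 + ENDOGENOUS_EXON_2

print("matches activated transcript: ", primer in activated_transcript)
print("matches endogenous transcript:", primer in endogenous_transcript)
```

Because half of the primer derives from the synthetic exon, it finds a full-length match only in the chimeric transcript, which is why the endogenously expressed message is not amplified in this model.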
Phenotypic Selection. In this embodiment, cells can be selected based on a phenotype conferred by the activated gene. Examples of phenotypes that can be selected for include proliferation, growth factor-independent growth, colony formation, cellular differentiation (e.g., differentiation into a neuronal cell, muscle cell, epithelial cell, etc.), anchorage-independent growth, activation of cellular factors (e.g., kinases, transcription factors, nucleases, etc.), gain or loss of cell-cell adhesion, migration, and cellular activation (e.g., resting versus activated T cells). Isolating activated cells that display a phenotype such as those described above is important because activation of an endogenous gene by the integrated construct is presumably responsible for the observed cellular phenotype. The activated gene may therefore be an important therapeutic drug or drug target for treating or inducing the observed phenotype. The sensitivity of each of the above assays can be effectively increased by transiently upregulating gene expression in the library cells. For promoters containing NF-κB sites (in the activation construct), this can be accomplished by adding PMA and tumor necrosis factor-α, for example, to the library. Separately, or in conjunction with PMA and TNF-α, sodium butyrate can be added to further increase gene expression. Adding these reagents can increase expression of the protein of interest, thereby allowing a less sensitive assay to be used to identify the cell in which the gene of interest has been activated. Because large activation libraries are created to maximize the number of genes activated, it is advantageous to organize the library clones into pools. Each pool can consist of 1 to more than 100,000 individual clones. 
Therefore, in a given pool, many activated proteins are produced, often at dilute concentrations (because of the overall size of the pool and the limited number of cells within the pool that produce a given activated protein). Concentrating the proteins prior to screening therefore effectively increases the ability to detect activated proteins in the screening assay. A particularly useful method of concentration is ultrafiltration; however, other methods can also be used. For example, proteins can be concentrated non-specifically or semi-specifically by adsorption onto ion exchange, hydrophobic, dye, hydroxyapatite, lectin, and other suitable resins under conditions that bind most or all of the proteins present. The bound proteins can then be eluted in a small volume prior to screening. It is advantageous to grow the cells in serum-free medium to facilitate concentration of the proteins. In another embodiment, a useful sequence that can be included in the activation construct is an epitope tag. The epitope tag consists of an amino acid sequence that allows affinity purification of the activated protein (for example, on immunoaffinity or chelating matrices). Thus, by including an epitope tag in the activation construct, all of the activated proteins from an activation library can be purified. Purifying the activated proteins away from other cellular and media proteins can facilitate screening for novel proteins and enzymatic activities. In some cases, it may be desirable to remove the epitope tag after purification of the activated protein. This can be accomplished by including a protease recognition sequence (e.g., a Factor IIa or enterokinase cleavage site) downstream of the epitope tag in the activation construct. Incubating the purified protein(s) with the appropriate protease will release the epitope tag from the protein(s).
In libraries in which an epitope tag sequence is located in the activation construct, all of the activated proteins can be purified away from all other cellular and media proteins using affinity purification. This not only concentrates the activated proteins, but also purifies them away from other activities that may interfere with the assay used to screen the library. Once a pool of clones containing cells that overexpress the gene of interest has been identified, steps can be taken to isolate the activated cell. Isolation of the activated cell can be accomplished by a variety of methods known in the art. Examples of cell purification methods include limiting dilution, fluorescence-activated cell sorting, magnetic bead separation, sib selection, and purification of single colonies using cloning rings. In preferred embodiments of the invention, the methods include a step in which the expression product is purified. In highly preferred embodiments, the cells expressing the endogenous gene product are cultured so as to produce amounts of the gene product feasible for commercial application, and especially for diagnostic, therapeutic, and drug discovery uses. Any vector used in the methods described herein can include an amplifiable marker. Amplification of both the vector and the DNA of interest (i.e., the DNA containing the overexpressed gene) then occurs in the cell, and further enhanced expression of the endogenous gene is obtained. Accordingly, the methods can include a step in which the endogenous gene is amplified. Once the activated cell has been isolated, expression can be further increased by amplifying the locus containing both the gene of interest and the activation construct. This can be accomplished by each of the methods described below, either separately or in combination. Amplifiable markers are genes that can be selected for higher copy number. 
Examples of amplifiable markers include dihydrofolate reductase, adenosine deaminase, aspartate transcarbamylase, dihydro-orotase, and carbamyl phosphate synthase. For these examples, an elevated copy number of the amplifiable marker and of the flanking sequences (including the gene of interest) can be selected for using a drug or toxic metabolite that is acted upon by the amplifiable marker. In general, as the concentration of the drug or toxic metabolite increases, cells containing fewer copies of the amplifiable marker die, while cells containing increased copies of the marker survive and form colonies. These colonies can be isolated, expanded, and analyzed for increased levels of production of the gene of interest. Placing an amplifiable marker in the activation construct results in the juxtaposition of the gene of interest and the amplifiable marker in the activated cell. Selection for activated cells containing an increased copy number of the amplifiable marker and the gene of interest can be achieved by growing the cells in the presence of increasing amounts of a selective agent (generally a drug or metabolite). For example, amplification of dihydrofolate reductase (DHFR) can be selected for using methotrexate. As drug-resistant colonies arise at each increasing drug concentration, individual colonies can be picked and characterized for the copy number of the amplifiable marker and the gene of interest, and analyzed for expression of the gene of interest. Individual colonies with the highest levels of activated gene expression can be selected for further amplification at higher drug concentrations. At the highest drug concentrations, the clones will express greatly increased amounts of the protein of interest. When amplifying DHFR, it is convenient to plate approximately 1 x 10^7 cells at several different concentrations of methotrexate. 
Useful initial methotrexate concentrations range from about 5 nM to 100 nM. However, the optimal concentration of methotrexate must be determined empirically for each cell line and for each integration site. After growth in methotrexate-containing medium, colonies from the highest concentration of methotrexate are picked and analyzed for increased expression of the gene of interest. The clone(s) resistant to the highest concentration of methotrexate are then grown in higher concentrations of methotrexate to select for further amplification of DHFR and the gene of interest. Methotrexate concentrations in the micromolar and millimolar range can be used for clones containing the highest degree of gene amplification. Placing a viral origin of replication (e.g., ori P or SV40 in human cells, and polyoma ori in mouse cells) in the activation construct will result in the juxtaposition of the gene of interest and the viral origin of replication in the activated cell. The origin and flanking sequences can then be amplified by introducing the viral replication protein(s) in trans. For example, when ori P (the origin of replication of Epstein-Barr virus) is used, EBNA-1 can be expressed transiently or stably. EBNA-1 will initiate replication from the integrated ori P locus. Replication will extend bidirectionally from the origin. As each replication product is created, it too can initiate replication. As a result, many copies of the viral origin and the flanking genomic sequences, including the gene of interest, are created. This higher copy number allows the cells to produce larger amounts of the product of the gene of interest. At some frequency, the replication product will recombine to form a circular molecule containing flanking genomic sequences, including the gene of interest. Cells that contain circular molecules carrying the gene of interest can be isolated by single-cell cloning and analysis by Hirt extraction and Southern blotting. 
Once purified, the cell containing the episomal genomic locus at elevated copy number (typically 10 to 50 copies) can be propagated in culture. To achieve greater amplification, the episome copy number can be boosted further by including a second origin adjacent to the first in the original construct. For example, T antigen can be used to raise the copy number of ori P/SV40 episomes to approximately 1000 (Heinzel et al., J. Virol. 62:3738 (1988)). This substantial increase in copy number can dramatically increase protein expression. The invention contemplates overexpression of endogenous genes both in vivo and in vitro. Accordingly, the cells could be used in vitro to produce desired amounts of a gene product, or they could be used in vivo to provide the gene product in the intact animal. The invention also contemplates the proteins produced by the methods described herein. The proteins can be produced from either known or previously unknown genes. 
Examples of known proteins that can be produced by this method include, but are not limited to, erythropoietin, insulin, growth hormone, glucocerebrosidase, tissue plasminogen activator, granulocyte colony stimulating factor, granulocyte/macrophage colony stimulating factor, interferon α, interferon β, interferon γ, interleukin-2, interleukin-6, interleukin-11, interleukin-12, TGF β, blood clotting factor V, blood clotting factor VII, blood clotting factor VIII, blood clotting factor IX, blood clotting factor X, TSH-β, bone growth factor-2, bone growth factor-7, tumor necrosis factor, alpha-1-antitrypsin, antithrombin III, leukemia inhibitory factor, glucagon, protein C, protein kinase C, macrophage colony stimulating factor, stem cell factor, follicle stimulating hormone β, urokinase, nerve growth factors, insulin-like growth factors, insulinotropin, parathyroid hormone, lactoferrin, complement inhibitors, platelet-derived growth factor, keratinocyte growth factor, neurotrophin-3, thrombopoietin, chorionic gonadotropin, thrombomodulin, alpha glucosidase, epidermal growth factor, FGF, macrophage colony stimulating factor, and cell surface receptors for each of the proteins described above. When the protein product of the activated cell is purified, any method of protein purification known in the art can be used.
Isolation of cells that contain activated genes encoding membrane proteins
The genes that encode membrane-associated proteins are particularly interesting from the point of view of drug development. These genes, and the proteins they encode, can be used, for example, to develop small molecule drugs using combinatorial chemistry libraries and high throughput screening assays. Alternatively, the proteins or soluble forms of the proteins (e.g., truncated proteins lacking the transmembrane region) can be used as therapeutically active agents in humans or animals. Identified membrane proteins can also be used to identify new ligands (e.g., cytokines, growth factors and other effector molecules) using two-hybrid approaches or affinity capture techniques. Other diverse uses of membrane proteins are also possible. Current approaches for identifying genes encoding integral membrane proteins involve the isolation and sequencing of genes from cDNA libraries. Integral membrane proteins are then identified by ORF (open reading frame) analysis using hydropathy plots, which can identify the transmembrane region of the protein. Unfortunately, with this approach a gene encoding an integral membrane protein cannot be identified unless the gene is expressed in the cells used to produce the cDNA library. Moreover, many genes are expressed only in rare cells, during narrow windows of time, and/or at very low levels. As a result, these genes cannot be identified efficiently using the currently available approaches.
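The hydropathy analysis mentioned above can be illustrated with a minimal Python sketch. It assumes the widely used Kyte-Doolittle hydropathy scale, a 19-residue sliding window, and a mean-hydropathy cutoff of 1.6; these are conventional choices for flagging candidate membrane-spanning segments and are not taken from this document. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
# Assumption: this scale, the window and the threshold are illustrative choices.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full-length window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_candidate_tm_segment(protein, window=19, threshold=1.6):
    """True if any window is hydrophobic enough to suggest a membrane-spanning segment."""
    return any(mean >= threshold for mean in hydropathy_profile(protein, window))
```

A sequence shorter than the window yields an empty profile and is therefore never flagged. A screen of this kind only examines ORFs that are already present in a cDNA library, which is the limitation the passage above describes.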
The present invention allows endogenous genes to be activated without any knowledge of the sequence, structure, function, or expression profile of the genes. Using the described methods, genes can be activated at the level of transcription only, or at the levels of both transcription and translation. As a result, the proteins encoded by the activated endogenous gene can be produced in cells containing the integrated vector. Furthermore, by using the specific vectors described herein, the protein produced from the activated endogenous gene can be modified, for example, to include an epitope tag. Other vectors (e.g., vectors 12-17 described above) can encode a signal peptide followed by an epitope tag. Such a vector can be used to isolate cells that have activated the expression of an integral membrane protein (see Example 5 below). Such a vector can also be used to direct the secretion of proteins that are not normally secreted. Therefore, the invention is also directed to methods for identifying an endogenous gene that encodes an integral cell membrane protein or a transmembrane protein. Said methods of the invention may comprise one or more steps. For example, such a method of the invention may comprise: a) introducing one or more vectors of the invention into a cell; b) allowing the vector to integrate into the genome of the cell by non-homologous recombination; c) allowing overexpression of an endogenous gene in the cell by upregulation of the gene by the transcriptional regulatory sequence contained in the integrated vector construct; d) screening the cell for overexpression of the endogenous gene; and e) characterizing the activated gene to determine its identity as a gene encoding an integral cell membrane protein. In related embodiments, the invention provides such methods that further comprise isolating the activated gene from the cell prior to characterization of the activated gene.
To identify genes that encode integral membrane proteins, the vectors integrated into the cell genome will comprise a regulatory sequence linked to an exonic sequence containing a start codon, a signal sequence and an epitope tag, followed by an uncoupled processing donation site. When integration and activation of an endogenous gene occur, a chimeric protein is produced which contains the signal peptide and the epitope tag of the vector fused to the protein encoded by the exons toward the 3' end of the endogenous gene. This chimeric protein, by virtue of the vector-encoded signal peptide, is directed to the secretory pathway, where translation of the protein is completed and the protein is secreted. However, if the activated endogenous gene encodes an integral membrane protein, and the transmembrane region of that gene is encoded by exons located 3' of the vector's integration site, then the chimeric protein will be directed to the cell surface, and the epitope tag will be displayed on the surface of the cell. Using known methods of cell isolation (e.g., flow cytometric sorting, magnetic bead cell sorting, immunoadsorption, or other methods that will be familiar to those skilled in the art), antibodies to the epitope tag can then be used to isolate from the population the cells that display the epitope tag and have activated a gene that encodes an integral membrane protein. These cells can then be used to study the function of the membrane protein. Alternatively, the activated gene can then be isolated from these cells using any method known in the art, for example by hybridization with a DNA probe specific for the vector-encoded exon to screen a cDNA library produced from these cells, or by using the genetic constructs described herein.
The epitope tag encoded by the vector exon can be a short peptide that can bind to an antibody, a short peptide that can bind to a substance (e.g., polyhistidine/divalent metal ion supports, maltose binding protein/maltose supports, glutathione-S-transferase/glutathione supports), or an extracellular domain (lacking a transmembrane domain) of an integral membrane protein for which an antibody or ligand exists. However, it will be understood that other types of epitope tags that are familiar to those skilled in the art may be used in an equivalent manner according to the invention.
Vectors for non-targeted activation of endogenous genes
As evidenced above, non-targeted gene activation has a number of important applications, including the activation of endogenous genes in host cells, which provides a powerful method for discovering and isolating new genes and proteins and for producing large quantities of specific proteins for commercialization. For some applications of non-targeted gene activation, it is desirable to create cell libraries in which each member of the library contains an activation vector integrated at a unique location in the genome of the host cell, and in which each member of the library has a different activated endogenous gene. In addition, it would be desirable to remove from the library cells that contain an integrated vector but have failed to activate an endogenous gene. Since eukaryotic genomes frequently contain large regions lacking genes, integration of an activation vector into a region lacking genes can occur frequently. These integrated vectors fail to activate an endogenous gene, yet they are still capable of conferring drug resistance on the host cells when a selection marker (driven by a suitable promoter and followed by a polyadenylation signal) is included on the activation vector. Even more problematic for gene discovery applications, a transcript containing vector sequences is produced in these cells regardless of whether a gene is activated. Where a gene has not been activated, these vector-containing transcripts carry non-genic genomic DNA sequences rather than gene sequences. As a result, when isolating activated genes, one cannot simply isolate all the RNA (or cDNA) molecules that are derived from the integrated vector (i.e., transcripts containing vector sequences), since many of these transcripts do not encode an endogenous gene.
To overcome these difficulties, the present invention provides highly specific vectors and methods that facilitate the isolation of genes activated by the vector. These vectors of the invention are useful for activating expression of endogenous genes and for isolating the mRNA and cDNA corresponding to the activated genes. One type of vector reduces the number of cells in which the vector has integrated into the genome but has failed to activate expression of (or transcription from) an endogenous gene. By removing these cells, fewer library members need to be created and screened to isolate a given number of activated genes. In addition, cells that contain vectors that fail to activate gene expression produce an RNA molecule that can interfere with the isolation of genuinely activated genes. Therefore, the vectors described herein are particularly useful for producing cells suitable for overexpression of proteins and/or for isolating cDNA molecules corresponding to activated genes. The second type of vector of the invention is useful for isolating exon I from activated endogenous genes. As a result, these vectors can be used to obtain full-length genes from activated RNA transcripts. Each of the functional vector components described herein can be used separately, or in combination with each other.
Poly(A) trap activation vectors
To facilitate the isolation of activated genes, the present invention provides novel gene activation vectors that are capable of producing a drug-resistant colony preferentially upon activation of an endogenous gene. Said vectors are referred to herein as "poly(A) trap vectors". Examples of poly(A) trap vectors are shown in Figures 8A-8F. The nucleotide sequence of one such poly(A) trap vector, designated pRIG21b, is shown in Figures 15A-15B (SEQ ID NO: 19). These vectors contain a transcriptional regulatory sequence (which can be any transcriptional regulatory sequence, including but not limited to the promoters, enhancers, and repressors described herein, and which preferably is a promoter or enhancer, and more preferably a promoter such as a promoter of the CMV immediate early gene, an SV40 T antigen promoter, a tetracycline-inducible promoter, or a β-actin promoter) operably linked to a selection marker gene that lacks a poly(A) signal. Since the selection marker gene lacks a polyadenylation signal, its message will not be stable, and the product of the marker gene will not be produced efficiently. However, if the activation vector integrates toward the 5' end of an endogenous gene, the selection marker can use the polyadenylation signal of the endogenous gene, thereby allowing production of the selection marker protein in amounts sufficient to confer resistance to the drug. Therefore, cells containing this activation vector generally form a drug-resistant colony only if an endogenous gene has been activated. The poly(A) trap activation vectors can include any selection marker or screenable marker. Likewise, the selection marker can be expressed from any promoter that is functional in the cells used to create the integration library, whether viral or non-viral.
Optionally, an uncoupled processing donation site can be included in the construct, preferably 3' of the selection marker, to allow the exon encoding the selection marker to be processed directly onto the exons of the endogenous gene. When a transcriptional regulatory sequence toward the 3' end and a processing donation site are also included on the vector, the inclusion of a processing donation site adjacent to the selection marker results in the removal of these 3' elements from the messenger RNA. In a related embodiment, a second transcriptional regulatory sequence (which can be any transcriptional regulatory sequence, including but not limited to the promoters, enhancers, and repressors described herein, and which preferably is a promoter or an enhancer, and more preferably a promoter) can be located toward the 3' end of, and in the same orientation as, the selection marker. Optionally, an uncoupled processing donation site can be linked to the transcriptional regulatory sequence toward the 3' end. In this configuration, the poly(A) trap vector is capable of producing a message that contains the 3' vector-encoded exon processed onto endogenous exons. As described below, these chimeric transcripts can be translated into native or modified protein, depending on the nature of the exon encoded by the vector. As used herein, an "exon encoded by the vector" means the region of a vector toward the 3' end of the transcriptional regulatory sequence, between the transcription initiation site and the uncoupled processing donation site found on the vector. The exon encoded by the vector is present at the 5' end of the transcript containing the endogenous gene in the fully processed message. Similarly, as used herein, an "intron encoded by the vector" is the region of the vector located toward the 3' end of the uncoupled processing donation site.
When a linearization site is present on the vector, the intron encoded by the vector is the region of the vector that lies toward the 3' end of the exon encoded by the vector, between the uncoupled processing donation site and the linearization site. The intron encoded by the vector is removed from the transcript of the activated gene during RNA processing.
Trap acceptor processing (SAT) vectors
As an alternative method for removing cells that fail to activate an endogenous gene, the invention provides additional vectors designated herein as "trap acceptor processing" vectors (SAT vectors, from "splice acceptor trap"). These vectors are designed to process from a processing donation site encoded by the vector to an endogenous processing acceptor. In addition, the vectors are designed to produce a product that is toxic to host cells (or a product that can be selected against) if processing does not occur. Therefore, these vectors facilitate the elimination of cells in which the exon encoded by the vector has not been processed onto an endogenous gene. The trap acceptor processing vectors may contain both a positive selection marker gene and a negative selection marker gene, oriented in the same direction on the vector. As used herein, a positive selection marker is a gene that, upon expression, produces a protein capable of facilitating the isolation of cells expressing the marker. Analogously, as used herein, a negative selection marker gene is a gene that, upon expression, produces a protein capable of facilitating the removal of cells expressing the marker. The positive selection marker and the negative selection marker are preferably separated in the vector construct by an uncoupled processing donation site. In other embodiments, however, the positive selection marker can be fused to the negative selection marker gene. In this configuration, an uncoupled processing donation site is located between the positive and negative selection markers, such that the reading frame of the negative selection marker is preserved. The uncoupled processing donation site is preferably located at the junction of the positive and negative selection markers.
However, the uncoupled processing donation site can be located anywhere in the fusion gene such that, after processing to an endogenous processing acceptor site, the positive selection marker will be expressed in an active form and the negative selection marker will be expressed in an inactive form, or will not be expressed. In this configuration, the positive selection marker is located toward the 5' end of the negative selection marker. It will be apparent to one skilled in the art in view of the description contained herein that the positive and negative selection markers of the SAT vector do not need to be expressed as a fusion protein. In one embodiment, an internal ribosome entry site (ires) is inserted between the positive selection marker and the negative selection marker. In this configuration, the uncoupled processing donation site can be located between the two markers, or in the open reading frame of either marker gene, such that, after processing, the positive selection marker will be expressed in an active form and the negative selection marker will be expressed in an inactive form, or will not be expressed. In another embodiment, the positive selection marker can be driven by a transcriptional regulatory sequence different from that of the negative selection marker. In this configuration, the uncoupled processing donation site is located in the 5' untranslated region of the negative selection marker or anywhere in the open reading frame of the negative selection marker, such that, after processing, the negative selection marker will be produced in an inactive form or will not be produced. In addition, when the positive and negative markers are driven by different transcriptional regulatory sequences, the positive selection marker can be located toward the 5' end or toward the 3' end of the negative selection marker, and the positive selection marker may contain or lack a processing donation site at its 3' end.
The vectors described herein may contain any positive selection marker. Examples of positive selection markers useful in this invention include genes encoding neomycin resistance (neo), hypoxanthine phosphoribosyl transferase (HPRT), puromycin resistance (pac), dihydroorotase, glutamine synthetase (GS), histidine D (his D), carbamyl phosphate synthase (CAD), dihydrofolate reductase (DHFR), multidrug resistance gene 1 (mdr1), aspartate transcarbamylase, xanthine-guanine phosphoribosyl transferase (gpt), and adenosine deaminase (ada). Alternatively, the vectors may contain a screenable marker in place of the positive selection marker. Screenable markers include any protein capable of producing a recognizable phenotype in the host cell. Examples of screenable markers include cell surface epitopes (such as CD2) and enzymes (such as β-galactosidase). The vectors described herein may also, or alternatively, contain any negative selection marker that can be selected against. Examples of negative selection markers include hypoxanthine phosphoribosyl transferase (HPRT), thymidine kinase (TK), and diphtheria toxin. A negative selection marker may also be a screenable marker, such as a cell surface protein or an enzyme. Cells expressing negative screenable markers can be removed by, for example, fluorescence-activated cell sorting (FACS) or magnetic bead cell sorting. To isolate cells that have activated expression of an endogenous gene, the cells containing the integrated vector can be placed under the appropriate drug selection. Selection for the positive selection marker and against the negative selection marker can occur simultaneously. In another embodiment, the selection may occur sequentially. When the selection occurs sequentially, selection for the positive selection marker may occur first, followed by selection against the negative selection marker.
Alternatively, selection against the negative selection marker may occur first, followed by selection for the positive selection marker. Positive and negative markers are expressed from a transcriptional regulatory element located toward the 5' end of the translation start site of each gene. When a positive/negative marker fusion gene or an ires sequence is used, a single transcriptional regulatory element drives the expression of both markers. A poly(A) signal can be located 3' of each selection marker. If a positive/negative fusion gene is used, a single poly(A) signal is located 3' of the markers. Alternatively, a poly(A) signal can be excluded from the vector to provide additional specificity for a gene activation event (see the dual poly(A)/trap processing acceptor vectors below).
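The selection logic described for these vectors can be summarized in a small Python sketch. This is a deliberately simplified model under stated assumptions, not a description of any actual vector: message stability is reduced to a single flag, the negative selection marker is assumed to lie 3' of the uncoupled processing donation site (that is, an unpaired splice donor), and the function and parameter names are hypothetical.

```python
def cell_survives(processed_onto_endogenous_exon,
                  vector_has_poly_a=True,
                  endogenous_poly_a_captured=False):
    """Simplified survival model under combined positive/negative selection.

    Assumptions (illustrative only):
    - the marker message is stable only if it is polyadenylated, either by a
      signal on the vector or by a captured endogenous signal;
    - processing from the vector's donation site removes or inactivates the
      negative selection marker.
    """
    message_is_stable = vector_has_poly_a or endogenous_poly_a_captured
    positive_marker_active = message_is_stable
    negative_marker_active = message_is_stable and not processed_onto_endogenous_exon
    return positive_marker_active and not negative_marker_active
```

With `vector_has_poly_a=True` the model requires only processing for survival. Setting it to `False` corresponds to excluding the poly(A) signal from the vector, in which case survival requires both processing and capture of an endogenous poly(A) signal.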
Dual poly(A)/trap processing acceptor vectors
To further reduce the number of cells that lack a gene activation event, the invention also provides vectors that confer survival on the host cell only if the exon encoded by the vector has been processed onto an exon from an endogenous gene and has acquired a poly(A) signal. These vectors are referred to herein as "dual poly(A)/trap processing acceptor vectors" or as "dual poly(A)/SAT vectors". By requiring both processing and polyadenylation to occur for cell survival, cells that fail to activate an endogenous gene are eliminated more efficiently from the activation library. The dual poly(A)/trap processing acceptor vectors contain a positive selection marker and a negative selection marker configured as described for the SAT vectors; however, neither gene contains a functional poly(A) signal. Therefore, the positive selection marker will not be expressed at high levels unless processing occurs and an endogenous poly(A) signal is captured. Apart from the absence of a poly(A) signal, all other features and embodiments for this type of vector are the same as those for the SAT vectors as described herein. Examples of the dual poly(A)/SAT vectors are shown in Figures 9A-9F and 10A-10F. The nucleotide sequence of one of said dual poly(A)/SAT vectors, designated pRIG22b, is shown in Figures 16A-16B (SEQ ID NO: 20).
Vectors for the activation of protein expression from endogenous genes
In many applications of non-targeted gene activation, it is desirable to produce protein from an activated endogenous gene. To accomplish this, a second transcriptional regulatory sequence (which can be any transcriptional regulatory sequence, including but not limited to the promoters, enhancers, and repressors described herein, and which preferably is a promoter or enhancer, and more preferably a promoter) can be placed toward the 3' end of the selection marker(s) in any of the vectors described herein.
When the poly(A) trap vectors, SAT vectors, or dual poly(A)/SAT vectors are used, the transcriptional regulatory sequence toward the 3' end is positioned to drive expression in the same direction as the selection marker(s) toward the 5' end. To activate expression of the full-length protein with this type of vector, however, the vector must integrate within the 5' UTR of the endogenous gene, to avoid cryptic ATG start codons toward the 5' end of exon I. Alternatively, to increase the frequency of protein expression using non-targeted gene activation, the transcriptional regulatory sequence toward the 3' end on the vector can be operably linked to an exonic sequence followed by a processing donation site. In a preferred embodiment, the exon of the vector lacks a start codon. This vector is particularly useful for activating protein expression from genes that do not encode the translation initiation codon in exon I. In an alternative preferred embodiment, the exon of the vector contains a start codon. Additional codons can be located between the translation start codon and the processing donation site. For example, a partial secretion signal sequence can be encoded in the exon of the vector. The partial signal sequence can be any amino acid sequence capable of completing a partial signal sequence from an endogenous gene to produce a functional signal sequence. The partial sequence may encode between one and one hundred amino acids, and may be derived from existing genes or may consist of novel sequences. Therefore, this vector is useful for the production and secretion of proteins from genes that encode part of the endogenous signal sequence in exon I and the remainder in subsequent exons. In another example of a vector useful for activating a particular type of endogenous gene, a functional signal sequence can be encoded in the exon of the vector.
This vector allows protein to be produced and secreted from genes that encode a signal sequence in exon I. It can also be used to produce secreted forms of proteins that are not normally secreted. In cases where a start codon is included in the exon of the vector, it may be advantageous to produce a vector in each reading frame. This is achieved by varying the number of nucleotides between the start codon and the junction of the processing donation site. Together, the preferred configurations of the vector are capable of producing protein from endogenous genes, independently of the exon/intron structure, the location of the translation start codon, or the reading frame.
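The point about producing a vector in each reading frame can be made concrete with a short Python sketch. It assumes the vector exon is given as the DNA sequence running from the start codon to the junction of the processing donation site, and that the frame is shifted by appending zero, one, or two filler nucleotides; the filler `"GC"`, the example sequence, and the function names are hypothetical choices for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def donor_phase(exon_cds):
    """Nucleotides left over after the last complete codon before the donor junction."""
    if not exon_cds.startswith("ATG"):
        raise ValueError("vector exon must begin with the start codon ATG")
    return len(exon_cds) % 3


def three_frame_exons(exon_cds, filler="GC"):
    """Return three exon variants that leave the donor junction in phases 0, 1 and 2."""
    variants = [exon_cds + filler[:n] for n in range(3)]
    for variant in variants:
        # Only complete codons are checked; the trailing 1-2 nt join the next exon.
        usable = len(variant) - len(variant) % 3
        codons = {variant[i:i + 3] for i in range(0, usable, 3)}
        if codons & STOP_CODONS:
            raise ValueError("in-frame stop codon in vector exon")
    return variants
```

For example, `three_frame_exons("ATGGCC")` yields exons of length 6, 7 and 8, so whichever phase the first downstream endogenous exon requires, one of the three variants keeps the endogenous coding sequence in frame with the vector's start codon.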
Vectors to isolate exon I from activated endogenous genes
The non-targeted gene activation vectors described above are useful for activating and isolating endogenous genes and for producing protein from endogenous genes. After integration toward the 5' end of an endogenous gene, however, each of these vectors produces a transcript lacking exon I of the endogenous gene. Since the vectors are designed to produce a transcript containing the vector-encoded exon processed onto the first processing acceptor site toward the 3' end of the vector's integration site, and since the first exon of eukaryotic genes does not contain a processing acceptor site, the first exon of endogenous genes will normally not be recovered in mRNA molecules derived from non-targeted gene activation. For some genes, such as genes that contain coding information in the first exon, there is a need to efficiently recover the first exon of the activated endogenous gene. To recover the first exon of activated endogenous genes, a transcriptional regulatory sequence (which may be any transcriptional regulatory sequence, including but not limited to the promoters, enhancers, and repressors described herein, and which preferably is a promoter or an enhancer, and more preferably a promoter) is included on the activation vector toward the 3' end of a second transcriptional regulatory sequence (which may also be any transcriptional regulatory sequence, including but not limited to the promoters, enhancers, and repressors described herein, and which preferably is a promoter or an enhancer, and more preferably a promoter) that drives expression of the vector-encoded exon. Thus, the transcriptional regulatory sequence toward the 5' end is linked to an uncoupled processing donation site, and the transcriptional regulatory sequence toward the 3' end is not linked to a processing donation site.
Both transcriptional regulatory sequences are oriented to drive expression in the same direction. Examples of said exon I recovery vectors are shown in Figures 12A-12G. The integration of this type of vector will create at least two different types of RNA transcripts (Figure 13). The first transcript is derived from the transcriptional regulatory sequence toward the 5' end and contains the exon of the vector processed onto exon II of an endogenous gene. The second transcript is derived from the transcriptional regulatory sequence toward the 3' end and contains, from 5' to 3', the region between the vector and the transcription start site of the gene, exon I, exon II, and all the exons toward the 3' end. Using the methods described herein, both transcripts can be recovered and analyzed, allowing the characterization of exon I from genes isolated by non-targeted gene activation. The exon located on the activation vector can encode a selection marker, a protein, a portion of a protein, a secretion signal sequence, a portion of a signal sequence, an epitope, or nothing. When a protein is encoded by the exon, a poly(A) signal can be included toward the 3' end of the gene encoded by the vector. Alternatively, a poly(A) signal may be omitted. In another embodiment, a positive and a negative selection marker can be operably linked to the transcriptional regulatory sequence(s) toward the 5' end. In this embodiment, the position of the uncoupled processing donation site in relation to the selection markers is as described above for the SAT vectors and the dual poly(A)/SAT vectors.
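The two transcript types described above for an exon I recovery vector can be laid out schematically. The following Python sketch only assembles ordered lists of segment labels; the function name and the labels are hypothetical, and the model assumes a multi-exon endogenous gene with the vector integrated toward its 5' end.

```python
def exon_one_recovery_transcripts(vector_exon, upstream_region, endogenous_exons):
    """Return the two schematic transcripts, each as a 5'-to-3' list of segments.

    vector_exon      -- label for the exon encoded by the vector
    upstream_region  -- label for the genomic region between the vector and
                        the transcription start site of the endogenous gene
    endogenous_exons -- ordered exon labels of the endogenous gene
    """
    if len(endogenous_exons) < 2:
        raise ValueError("this sketch assumes a gene with at least two exons")
    # Transcript from the 5' regulatory sequence: vector exon processed onto exon II.
    first = [vector_exon] + list(endogenous_exons[1:])
    # Transcript from the 3' regulatory sequence: no donation site, so exon I is retained.
    second = [upstream_region] + list(endogenous_exons)
    return first, second
```

Comparing the two lists shows why recovering both transcripts allows exon I to be characterized: it is absent from the first transcript and present in the second.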
Gene activation vectors for trapping single-exon and multi-exon genes
As previously evidenced, in one embodiment the poly(A) trap vectors of the invention may contain a promoter operably linked to a selection marker followed by an uncoupled processing donation site. Such vectors, when integrated into or near a gene, produce transcripts in which the selection marker is processed onto an endogenous gene. Since the endogenous gene encodes a poly(A) signal, the resulting mRNA is polyadenylated, thereby allowing the transcript to be translated at levels sufficient to confer drug resistance on the cell containing the integrated vector. Although the vectors described above are capable of "trapping" endogenous genes, the processing donation site toward the 3' end of the selection marker is not useful in, and in some cases may interfere with, several potential applications of such vectors. First, these vectors cannot be used to selectively trap single-exon genes, since these genes do not contain a processing acceptor site. Second, these vectors frequently "trap" cryptic genes, since drug resistance depends solely on integration of the vector toward the 5' end of a poly(A) signal. Unfortunately, cryptic poly(A) signals exist in the genome, leading to the formation of drug-resistant cells and the creation of non-gene transcripts that contain the selection marker. These cells and transcripts can interfere with gene discovery applications using these vectors. Third, without novel modifications such as those described herein (see above), these vectors are not capable of efficiently producing protein from the activated endogenous gene.
In addition, the expression of protein from an endogenous gene may be inefficient even when an internal ribosome entry site (ires) is included between the selection marker and the processing donation site, since translation from an ires is generally less efficient than translation from the first start codon at the 5' end of a transcript. Therefore, there is a need for vectors that are able to trap endogenous genes more specifically, including single-exon genes, and that are capable of efficiently expressing protein from the activated endogenous genes. Accordingly, in additional embodiments, the present invention provides such vectors. In one such embodiment, the vector may contain a promoter operably linked to one or more (i.e., one, two, three, four, five or more) selection markers, where the selection marker is not followed by a processing donation site or a poly(A) signal (see Figures 17A-17G). In general, after integration into a host cell genome, this vector will not produce sufficient amounts of the selection marker, since the marker transcript will not be polyadenylated. However, if the vector integrates in close proximity to, or within, a gene, including a single-exon gene, the selection marker will acquire a poly(A) signal from the endogenous gene, thereby stabilizing the marker transcript and conferring a drug-resistant phenotype on the cell. In addition to selecting for integration of the vector into or near genes, vectors according to this aspect of the invention can also be used to recover exon I from the activated gene, as described in the section of this application entitled "Vectors to isolate exon I from activated endogenous genes".
In a preferred embodiment, the vector may contain a second selection marker toward the 5' end of the first selection marker (see Figure 18). The selection marker toward the 5' end is preferably operably linked to a transcriptional regulatory sequence, more preferably to a promoter. Optionally, an uncoupled processing donation site can be located between the transcription start site and the translation start site of the selection marker toward the 5' end. Alternatively, the processing donation site can be located anywhere in the open reading frame of the selection marker toward the 5' end, such that, following integration of the vector into a host cell genome, and after processing from the vector-encoded processing donation site to an endogenous exon, the selection marker toward the 5' end will be produced in an inactive form, or will not be produced. By selecting for cells that produce the positive selection marker toward the 3' end in an active form, cells containing the vector integrated into or near a gene can be isolated. In addition, by selecting for cells that produce the selection marker toward the 5' end in an active form, cells in which the vector transcript has been processed onto an exon of an endogenous multi-exon gene are eliminated. In other words, these vectors can be used to isolate cells that contain a vector integrated within a single-exon gene or within the 3'-most exon of a multi-exon gene since, in these cases, a processing acceptor site is absent between the processing donation site encoded by the vector and the endogenous poly(A) signal. Therefore, most cells containing activated multi-exon genes will not survive selection and, as a result, cells containing activated single-exon genes will be highly enriched in the library. In another preferred embodiment, the vectors according to this aspect of the invention may contain one or more (i.e., one, two, three, four, five, or more, and preferably one) negative selection marker(s) toward the 5' end of the first selection marker (see Figures 19A and 19B). The negative selection marker is preferably operably linked to a promoter.
Optionally, an uncoupled processing donation site can be located between the transcription start site and the translation start site of the negative selection marker. Alternatively, the processing donation site can be located anywhere in the open reading frame of the negative selection marker, so that, following integration of the vector into a host cell genome, and after processing from the vector-encoded processing donation site to an endogenous exon, the negative selection marker will be produced in an inactive form, or will not be produced. By selecting for cells that produce the positive selection marker in an active form and selecting against cells that produce the negative selection marker in an active form, these vectors can be used to identify cells that contain the vector integrated within, or towards the 5' end of, an endogenous gene. Since both (1) processing to an endogenous gene and (2) acquisition of a poly(A) signal are required for cell survival, cells containing cryptic gene trap events are reduced within the library. The reason is that the probability of a vector integrating next to both a cryptic processing acceptor site and a cryptic poly(A) signal is substantially lower than the probability of a vector integrating next to a single cryptic site. Therefore, these vectors provide a higher degree of specificity for trapping genes than the previous vectors.
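The dual requirement described above can be sketched as a small truth table. This is only an illustrative model: the class name, field names, and the probabilities are hypothetical, and treating the two cryptic events as independent is an assumption made here for the arithmetic, not a statement from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationEvent:
    """Hypothetical summary of one vector integration (illustrative only)."""
    processed_to_endogenous_exon: bool  # vector donor site joined to an acceptor site
    acquired_poly_a: bool               # marker transcript gained a poly(A) signal


def survives_selection(event: IntegrationEvent) -> bool:
    # The positive marker is active only if its transcript is polyadenylated.
    positive_marker_active = event.acquired_poly_a
    # The negative marker is inactivated only if processing has occurred.
    negative_marker_active = not event.processed_to_endogenous_exon
    return positive_marker_active and not negative_marker_active


def joint_cryptic_probability(p_cryptic_acceptor: float, p_cryptic_poly_a: float) -> float:
    # Assumption: the two cryptic events are independent, so probabilities multiply.
    return p_cryptic_acceptor * p_cryptic_poly_a


# Made-up probabilities, chosen only to show that the joint event is rarer.
p_acceptor, p_poly_a = 0.01, 0.02
p_both = joint_cryptic_probability(p_acceptor, p_poly_a)
print(survives_selection(IntegrationEvent(True, True)))   # both requirements met
print(p_both < min(p_acceptor, p_poly_a))
```

Only the event that satisfies both requirements survives in this model, which is why a background event needing two cryptic sites is rarer than one needing a single cryptic site.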
It will also be recognized by one skilled in the art, in view of the teachings contained herein, that vectors containing positive and negative selection markers can be used to produce protein from an activated endogenous gene. One vector configuration capable of directing protein production places the processing donation site in the 5' UTR of the negative selection marker. After processing, a chimeric transcript is produced containing the 5' UTR from the negative selection marker linked to the second exon of an endogenous gene. This vector is capable of activating protein production from genes that encode a translation start codon in the second or a subsequent exon. Similarly, the processing donation site can be located in the open reading frame of the negative selection marker, in a position that does not interfere with the function of the marker unless processing has occurred. Similar vectors that contain the processing donation site in different reading frames relative to the translation start codon can also be used. After processing to an endogenous gene, these vectors will produce a chimeric transcript containing a start codon from the negative selection marker fused to exon II of the activated endogenous gene. Therefore, these vectors will be able to activate the expression of proteins from genes that encode a translation start codon in exon I. Additional positive/negative selection vector designs capable of efficiently producing protein from activated endogenous genes are described below. Any of the vectors of the invention may contain an internal ribosome entry site (ires) 3' of the selection marker located towards the 3' end.
The ires allows translation of the endogenous gene after integration of the vector into an endogenous gene. Optionally, a translation start codon may be included between the selection marker and the ires sequence. When a start codon is present, additional codons may be present on the exon. The start codon and, if present, the additional codons can be present in any, and collectively all, reading frames relative to the processing donation site. In addition, codons towards the 3' end of the translation start codon, if present, can encode, for example, a secretion signal sequence, a partial signal sequence, a protein (including a full-length protein, a portion of a protein, a protein motif, or an epitope tag), or a spacer region.
In additional preferred embodiments, any of the vectors described herein may contain, towards the 5' end of the selection marker(s), a second transcriptional regulatory sequence (more preferably a promoter) operably linked to an exonic region, followed by an uncoupled processing donation site. This region towards the 5' end is particularly useful for expressing protein from activated endogenous genes. The exon may lack a translation start codon. Alternatively, the exon may contain a translation start codon. When a start codon is present, additional codons may be present on the exon. The start codon and, if present, the additional codons can be present in any, and collectively all, reading frames relative to the processing donation site. Additionally, codons towards the 3' end of the translation start codon, if present, may encode, e.g., a secretion signal sequence, a partial signal sequence, a protein (including a full-length protein, a portion of a protein, a protein motif, or an epitope tag), or a spacer region.
Activation Vectors Useful for Detecting Protein-Protein Interactions
Genetic methods for detecting protein-protein interactions have been previously described (see, for example, U.S. Patent Nos. 5,283,173; 5,468,614; and 5,667,973, the disclosures of which are hereby fully incorporated by reference). This method relies on cloning a first cDNA molecule next to, and in frame with, a fragment of a gene encoding a DNA binding domain, and cloning a second cDNA molecule next to, and in frame with, a fragment of a gene encoding a transcription activation domain. Each chimeric gene is expressed from a promoter region located towards the 5' end of the chimeric gene. To detect an interaction, both chimeric genes are transfected into a reporter cell. If the first chimeric protein interacts with the second chimeric protein (via the proteins encoded by the cloned cDNAs fused to the DNA binding and transcription activation domains), then the DNA binding domain and the transcription activation domain will be joined within a single protein complex. As a result, the protein-protein interaction complex can bind to the regulatory region of the reporter gene and activate its expression. A limitation of this prior method is that it can only detect protein-protein interactions between genes that have been cloned as cDNAs. As described herein, many genes are expressed at very low levels, in rare cell types, or during brief windows of development; therefore, these genes are typically absent from cDNA libraries. In addition, many genes are too large to be efficiently isolated as full-length clones, which makes these previous methods difficult to use. The present invention is capable of activating the expression of proteins from endogenous genes or from transfected genomic DNA. Unlike previous methods, virtually any gene can be expressed efficiently, regardless of its normal pattern of expression.
In addition, since the present invention is also capable of modifying the protein expressed from the endogenous gene (or from the transfected genomic DNA), it is also possible to produce chimeric proteins for use in protein-protein interaction assays.
To detect protein-protein interactions by the present invention, two vectors are used. The first vector, generally referred to as BD/SD (binding domain/processing donor, i.e., splice donor), contains a promoter operably linked to a polynucleotide encoding a DNA binding domain and an uncoupled processing donation site. The second vector, generally referred to as AD/SD (activation domain/processing donor), contains a promoter operably linked to a polynucleotide encoding a transcription activation domain and an uncoupled processing donation site. To accommodate genes that have different reading frames, the binding domain and the activation domain can be encoded in each of the three possible reading frames relative to the uncoupled processing donation site. In addition, BD/SD and AD/SD vectors may have other functional elements, as described herein for other vectors, including selection markers and amplifiable markers. The vectors may also contain selection markers oriented in a configuration that allows selection for cells in which the vector has activated a gene. Multi-promoter/activation exon vectors are also useful. Several examples of BD/SD and AD/SD vectors are illustrated in Figure 25. An example of the detection of a protein-protein interaction using these vectors is illustrated in Figure 26.
The DNA binding domain of the BD/SD vector can encode any protein domain capable of binding to a specific nucleotide sequence. When a transcription activation protein is used to supply the DNA binding domain, the transcription activation domain is omitted from the BD/SD vector. Examples of genes encoding proteins with DNA binding domains include, but are not limited to, the yeast GAL4 gene, the yeast GCN4 gene, and the yeast ADR1 gene. Other genes from prokaryotic and eukaryotic sources can also be used to supply DNA binding domains.
The transcription activation domain of the AD/SD vector encodes a protein domain capable of enhancing transcription of a reporter gene when it is positioned near the reporter gene promoter region. When a transcription activation protein is used to supply the transcription activation domain, the DNA binding domain is omitted from the AD/SD vector. Examples of genes encoding proteins with transcription activation domains include, but are not limited to, the yeast GAL4 gene, the yeast GCN4 gene, and the yeast ADR1 gene. Other genes from prokaryotic and eukaryotic sources can also be used to supply transcription activation domains.
In the present invention, protein-protein interactions are detected by using the BD/SD and AD/SD vectors described above to activate the expression of genes located in stretches of genomic DNA. In one embodiment, the BD/SD vector is randomly integrated into the genome of a reporter cell line. As with other vectors described herein, BD/SD vectors are capable of activating the expression of proteins from genes located towards the 3' end of the vector's integration site. Since the activation exon in the BD/SD vector encodes a DNA binding domain, the activated endogenous protein will be produced as a fusion protein containing the DNA binding domain at its N-terminus. Therefore, by integrating the BD/SD vector into the genome of a host cell, a library of fusion proteins can be created, where each protein contains a DNA binding domain at its N-terminus. It is also recognized that the AD/SD vector can likewise be integrated into the genome of a reporter cell line to produce a cell library, where each member of the library expresses a different endogenous gene fused to a transcription activation domain. Once created, the BD/SD library can be transfected with a vector that expresses a specific gene (referred to below as gene X) fused to a transcription activation domain.
This allows virtually any gene encoded in the genome to be tested for an interaction with gene X. Similarly, the AD/SD library can be transfected with a vector that expresses a specific gene (e.g., gene X) fused to a DNA binding domain. This allows virtually any gene encoded in the genome to be tested for an interaction with gene X. It is also recognized that the specific gene can be stably expressed in the host cell prior to construction of the BD/SD or AD/SD library.
In an alternative embodiment, genomic DNA is cloned into the BD/SD and/or AD/SD vector(s) towards the 3' end of the DNA binding domain and the activation domain, respectively. If a gene is present and correctly oriented in the genomic DNA, then the BD/SD vector (or the AD/SD vector) will be able to express the gene as a fusion protein useful for detecting protein-protein interactions. As with integration of the BD/SD (or AD/SD) vectors in situ, any gene can be evaluated regardless of whether it has previously been isolated as a cDNA molecule.
In another embodiment, a second library is created in the cells of the first library. For example, the AD/SD vector can be integrated into cells comprising the BD/SD library. Conversely, the BD/SD vector can be integrated into cells comprising the AD/SD library. This allows all proteins expressed as fusion proteins with a binding domain to be evaluated against all proteins expressed as fusion proteins with an activation domain. Since the present invention is capable of expressing substantially all proteins (as fusions with the binding and activation domains) in a eukaryotic organism, this method, for the first time, allows all combinations of protein-protein interactions to be evaluated in a single library. To survey the protein-protein interactions in an organism, the library must be substantially comprehensive.
For example, to detect ~50% of the protein-protein interactions in an organism containing 100,000 genes, the first library should contain at least 100,000 cells, each expressing an activated gene. Within each clone of the first library, the second vector could be used to create a library of at least 100,000 clones, each containing an activated gene. Therefore, the total library could contain 100,000 clones times 100,000 clones, or 10^10 clones in total. This assumes that all genes are activated at equal frequencies, and that each gene activation event results in the production of a fusion protein in frame with the activated endogenous gene. To produce libraries with more than 50% coverage of protein-protein interactions and/or to ensure that proteins activated at lower frequencies are represented, larger libraries can be created.
It is also recognized that a library-versus-library screen can be set up in several ways. First, both libraries are produced, simultaneously or sequentially, by integrating the BD/SD and AD/SD vectors into the genome of the same reporter cells. Second, a first library is created by integrating a BD/SD vector into the genome of a reporter cell, and a second library is produced by transfecting the AD/SD vector containing cloned genomic DNA. It is recognized that in this method, the AD/SD library can instead be created first, followed by the introduction of a BD/SD vector containing cloned genomic DNA. It is also recognized that the first library can be created by transfecting the BD/SD vector (or AD/SD vector) containing cloned genomic DNA, followed by the integration of a second vector into the reporter cell genome. Third, both libraries are created, simultaneously or sequentially, by transfecting cells with BD/SD and AD/SD vectors, where each vector contains a cloned fragment of genomic DNA.
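The library-size estimate above is a simple product of the two library sizes. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the calculation inherits the text's own assumptions (equal activation frequencies and an in-frame fusion from every activation event).

```python
def combined_library_size(first_library_clones: int, second_library_clones: int) -> int:
    """Each clone of the first library hosts a full second library."""
    return first_library_clones * second_library_clones


genes_in_organism = 100_000  # figure used in the text
total_clones = combined_library_size(genes_in_organism, genes_in_organism)
print(f"{total_clones:,}")  # prints 10,000,000,000
```

Because the total grows with the square of the gene number, modest gains in activation efficiency or in-frame fusion rate reduce the required library size considerably.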
Fourth, it is recognized that when cloned genomic fragments are used in either the BD/SD vector or the AD/SD vector, a cDNA library can be created in the other vector and introduced into the cells. This allows all the genes present in the cDNA library to be tested for interaction with the other genes in the genome.
Since library-versus-library screening involves the creation of large cell libraries, it is important to maximize the frequency of gene activation and of in-frame fusion protein production among the members of the library. This can be achieved in at least three ways. First, the BD/SD and AD/SD vectors can contain selection markers in a configuration that "traps" genes. Examples of trapping selection vectors are shown in Figures 8, 9, 10, 17, 19, 21, and 25. These vectors select for cells in which the activation vector has transcriptionally activated a gene. Second, multiple promoter/activation exon units can be included in the BD/SD and AD/SD vectors. Each promoter/activation exon unit encodes the binding domain (or activation domain) in a different reading frame relative to the uncoupled processing donation site. An example of a multi-promoter/activation exon vector is illustrated in Figure 23. This type of vector ensures that any gene activated at the transcription level will produce an in-frame fusion protein from one of the promoter/activation exon units on the vector. Third, vectors can be introduced into reporter cells using efficient transfection procedures. In this respect, the BD/SD and AD/SD vectors can be inserted by retroviral integration.
Reporter cells useful in the present invention include any cell that is capable of appropriately processing the transcripts produced by the BD/SD and AD/SD vectors.
The reporter cells contain a reporter gene that is expressed at higher levels in the presence of a protein-protein interaction between the proteins expressed from the BD/SD and AD/SD vectors. The reporter gene can be a selectable marker, such as any of the markers described herein. Alternatively, the reporter gene can be a screenable marker. Examples of useful selectable markers and screenable markers are described herein. In the reporter cell, a minimal promoter is operably linked to the reporter gene. To allow for increased expression of the reporter gene in the presence of a protein-protein interaction, a DNA binding site is located at or near the minimal promoter, so that the DNA binding site is recognized by the protein encoded by the DNA binding domain region of the BD/SD vector. In the absence of a protein-protein interaction, the DNA binding domain fusion protein produced from BD/SD lacks a transcription activation domain and therefore cannot activate transcription from the minimal promoter of the reporter gene. However, if the DNA binding domain fusion protein produced by BD/SD interacts with the activation domain fusion protein from the AD/SD vector, then the protein complex can activate the expression of the reporter gene. Increased expression of the reporter gene can be detected using an assay for the screenable marker, or using drug selection for a selectable marker. It is also recognized that other reporter systems can be used in conjunction with the present invention to detect protein-protein interactions. Specifically, any protein containing two separable domains, each requiring close proximity to the other to produce a biochemical or structural activity, can be used in conjunction with the present invention.
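The reporter logic described above can be summarized as a truth table. The sketch below is a deliberately simplified boolean model; the function and parameter names are hypothetical, and real reporter expression is quantitative rather than on/off.

```python
def reporter_is_activated(bd_fusion_bound_to_site: bool,
                          ad_fusion_present: bool,
                          fusions_interact: bool) -> bool:
    """Toy model: the minimal promoter fires only when the DNA-bound BD fusion
    recruits an AD fusion through a protein-protein interaction."""
    return bd_fusion_bound_to_site and ad_fusion_present and fusions_interact


# BD fusion alone lacks an activation domain, so the reporter stays off.
print(reporter_is_activated(True, False, False))
# Both fusions present but non-interacting: still off.
print(reporter_is_activated(True, True, False))
# Interaction brings the activation domain to the promoter: on.
print(reporter_is_activated(True, True, True))
```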
Multi-Promoter/Activation Exon Vectors
In applications of non-targeted gene activation where the goal is to activate the expression of proteins from an unknown gene, typically a collection of vectors should be used. Therefore, in an additional embodiment, the invention provides vectors containing one or more promoter/activation exon units (see Figures 20A-20E). To accommodate the variety of gene structures that exist in the genome of eukaryotic cells, the vectors according to this aspect of the invention preferably each contain a transcriptional regulatory sequence (e.g., a promoter) operably linked to an activation exon with a different structure. Collectively, these activation exons are capable of activating the expression of proteins from substantially all endogenous genes. For example, to activate the expression of proteins from genes that encode a translation start codon in exon II (or in exons towards the 3' end of exon II), a vector may contain a transcriptional regulatory sequence (e.g., a promoter) operably linked to an activation exon that lacks a translation start codon. To activate the expression of proteins from all types of genes that encode a translation start codon in exon I, three separate vectors must be used, each containing a transcriptional regulatory sequence (e.g., a promoter) operably linked to a different activation exon. Each activation exon encodes a start codon in a different reading frame. Additional activation exon configurations are also useful. For example, to activate protein expression and secretion from genes that encode a portion of their secretion signal sequence in exon I, three separate vectors should be used, each containing a transcriptional regulatory sequence (e.g., a promoter) operably linked to a different activation exon. Each activation exon encodes a partial signal sequence in a different reading frame.
To activate protein expression and secretion from genes that encode their complete signal sequence in exon I, three vectors must be used, each containing a transcriptional regulatory sequence (e.g., a promoter) operably linked to a different activation exon. Each activation exon contains a complete secretion signal sequence in a different reading frame. In addition to activating the expression of genes encoding secreted proteins, promoter/activation exons that encode complete signal sequences will also activate the expression and secretion of proteins that are not normally secreted. This can, for example, facilitate the purification of proteins that are normally located intracellularly. Other useful coding sequences may be included in the activation exon of the vectors in accordance with this aspect of the invention, including but not limited to sequences that encode proteins (including full-length proteins, portions of proteins, protein motifs, and/or epitope tags).
As described herein, vectors according to this aspect of the invention can be integrated, individually or collectively, into the genome of a host cell to produce a cell library. Each member of the library will potentially overexpress a different endogenous protein. Thus, these vector collections make it possible to activate all or substantially all of the endogenous genes in a eukaryotic host cell.
By integrating a collection of vectors into host cells, as described above, activation of protein expression can be achieved from substantially any gene. Unfortunately, to produce protein from all endogenous genes, a large number of library members must be generated. In part, this is due to the large number of genes encoded by the host cell.
In addition, using this method, many cells will contain a vector integrated in or near an endogenous gene, but the integrated vector will contain an activation exon with a structure that is incompatible with the activation of protein expression from that endogenous gene. For example, the exon of the vector can encode a start codon in reading frame 1 (relative to the processing junction), whereas the protein encoded by the first exon towards the 3' end of the integrated vector can be in reading frame 2 (relative to the processing junction). Therefore, many library members will contain an integrated vector that has activated transcription of an endogenous gene but does not produce the protein encoded by the endogenous gene.
To decrease the number of cells that fail to activate protein expression after integration of the vector into or near an endogenous gene, a vector containing multiple promoter/activation exon units can be used. In this vector, each promoter/activation exon unit may be capable of activating protein expression from an endogenous gene with a different structure. Since a single vector comprising multiple activation exons is capable of producing multiple transcripts, each containing a different activation exon, a single vector integrated into or near a gene may be capable of activating protein expression regardless of the structure of the endogenous gene (see Figure 21).
The multi-promoter/activation exon vectors may contain two or more promoter/activation exon units. Each promoter/activation exon unit can be followed by an uncoupled processing donation site. In one such embodiment, two promoter/activation exon units are included in the vector, wherein each unit is capable of activating the expression of proteins from a different type of endogenous gene.
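The reading-frame argument above can be illustrated with a toy model in which each activation exon and each endogenous exon is simply labeled with a frame (1, 2, or 3) relative to the processing junction. This is a simplification assumed here for illustration, and all names are hypothetical; it only counts which frame labels match.

```python
FRAMES = (1, 2, 3)  # reading frames relative to the processing junction


def vector_can_express(activation_exon_frames, endogenous_exon_frame: int) -> bool:
    """True if at least one activation exon on the vector matches the frame
    of the first endogenous exon towards the 3' end of the vector."""
    return any(frame == endogenous_exon_frame for frame in activation_exon_frames)


single_exon_vector = [1]        # one promoter/activation exon unit, frame 1
multi_exon_vector = [1, 2, 3]   # three units, one per reading frame

single_hits = sum(vector_can_express(single_exon_vector, f) for f in FRAMES)
multi_hits = sum(vector_can_express(multi_exon_vector, f) for f in FRAMES)

print(f"single-frame vector is compatible with {single_hits} of {len(FRAMES)} frames")
print(f"three-frame vector is compatible with {multi_hits} of {len(FRAMES)} frames")
```

In this model a single-frame vector yields an in-frame product for one of the three possible endogenous frames, while a vector carrying one activation exon per frame covers all three.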
In a preferred embodiment, the vector may contain three promoter/activation exon units, where each exon encodes a translation start codon in a different reading frame. In another preferred embodiment, the vector may contain three promoter/activation exon units, where each exon encodes a partial secretion signal sequence in a different reading frame. In yet another preferred embodiment, the vector may contain three promoter/activation exon units, where each exon encodes a complete secretion signal sequence in a different reading frame. Additional embodiments include each of the above vectors containing a fourth promoter/activation exon unit, where the fourth activation exon does not encode a translation initiation codon. Any number (e.g., one or more, two or more, three or more, five or more, etc.) of promoter/activation exon units can be included in the vector.
When multiple promoter/activation exon units are present in a single vector, they are preferably oriented in the same direction relative to each other (i.e., the promoters direct expression in the same direction). The promoters that direct transcription of the different activation exons may all be the same, or one or more promoters may be different. The promoters can be viral, cellular, or synthetic. The promoters can be inducible or constitutive. Other types of promoters and regulatory sequences, recognizable by those skilled in the art or as described herein, can also be used to prepare the vectors according to this aspect of the invention.
Any of the vectors containing multiple promoter/activation exon units may optionally include one or more selection marker(s) and/or amplifiable marker(s). The selectable and/or amplifiable markers may contain a poly(A) signal. Alternatively, the markers may lack a poly(A) signal. The selectable marker can be a positive or negative selection marker.
The selection marker may contain an uncoupled processing donor site towards the 5' end of, within, or towards the 3' end of the marker. Alternatively, the selection marker may lack an uncoupled processing donor site. The selectable marker(s) and/or amplifiable marker(s), when present, can be located towards the 5' end of, between, or towards the 3' end of the promoter/activation exon units. The selectable and/or amplifiable marker(s) can be located on the vector in any orientation relative to the promoter/activation exon units. When the purpose of the selection marker is to trap endogenous genes, the selection marker is preferably oriented in the same direction as the promoter/activation exon units.
Amplifiable Markers
Any of the vectors described herein may also optionally comprise one or more (e.g., two, three, four, five, or more) amplifiable markers. Examples of amplifiable markers include those described in detail hereinabove. Preferably, the amplifiable marker(s) are located towards the 5' end of the positive/negative selection marker(s). When using poly(A) trap vectors, it may be advantageous to omit a polyadenylation signal from the amplifiable marker(s), to eliminate the possibility of capturing a vector-encoded poly(A) signal derived from a vector that has concatemerized prior to integration. When present, the amplifiable marker(s) can be located towards the 5' end of the activating transcriptional regulatory sequence (i.e., the promoter responsible for directing transcription from the vector through the endogenous gene). The amplifiable marker(s) may be present on the vector in any orientation (i.e., the open reading frame may be present on either strand of DNA).
It is also understood that the amplifiable marker(s) may be the same gene as the positive selection marker. Examples of genes that can be used both as positive selection markers and as amplifiable markers include dihydrofolate reductase, adenosine deaminase (ada), dihydroorotase, glutamine synthetase (GS), and carbamyl phosphate synthase (CAD).
In some embodiments and for certain applications, it may be desirable to place multiple amplifiable markers on the vector. The use of more than one amplifiable marker allows dual selection, or alternatively sequential selection, for each amplifiable marker. This facilitates the isolation of cells that have amplified the vector and the flanking genomic locus, including the gene of interest.
Promoters
It is understood that any promoter and regulatory element can be used on these activation vectors to direct the expression of the selection marker, the amplifiable marker (if present), and/or the endogenous gene. In additional preferred embodiments, the promoter that directs the expression of the endogenous gene is a strong promoter. The CMV immediate early gene promoter, the SV40 T antigen promoter, and the β-actin promoter are examples of this type of promoter. In another preferred embodiment, an inducible promoter is used to direct the expression of the endogenous genes. This allows the endogenous proteins to be expressed in a more controlled manner. The tetracycline-inducible promoter, heat shock promoter, ecdysone promoter, and metallothionein promoter are examples of this type of promoter. In yet another embodiment, a tissue-specific promoter is used to direct the expression of endogenous genes. Examples of tissue-specific promoters include, but are not limited to, immunoglobulin promoters, the casein promoter, and growth hormone promoters.
Restriction Sites
The vectors of the invention may contain one or more restriction sites located towards the 3' end of the uncoupled processing donation site in the vector. These restriction sites can be used to linearize plasmid vectors before transfection. In a linearized configuration, the activation vector contains, from 5' to 3' relative to the transcribed strand, a promoter, a processing donation site, and a linearization site.
One or more restriction sites can also be included in the vector intron to facilitate the removal of cDNA molecules that contain the vector intron. In this embodiment, the vectors contain, from 5' to 3' relative to the transcribed strand, a promoter, a processing donation site, a restriction site, and a linearization site. By including a restriction site between the uncoupled processing donation site and the linearization site, unprocessed transcripts can be removed by digesting the cDNA with the appropriate restriction enzyme. The cDNA molecules derived from gene activation have had the vector intron containing the restriction site removed and therefore will not be digested. This allows gene-activated transcripts to be preferentially enriched during amplification/cloning, and greatly facilitates the identification and analysis of the endogenous genes.
One or more restriction sites can also be included in the vector exon to facilitate the cloning of activated genes. Following gene activation, the mRNA is recovered from the cells and converted to cDNA. When the cDNA is digested with a restriction enzyme that cleaves in the vector exon, the gene-activated cDNA molecules will carry an appropriate overhang at the 5' end for subsequent cloning into a suitable vector. This facilitates the isolation of the gene-activated cDNA molecules.
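The intron-site enrichment step described above can be sketched with plain string matching. Everything in the example is hypothetical: the short sequences are invented, and the 8 bp site GCGGCCGC (the Not I recognition sequence) is used only as one possible choice of intron site. A real endogenous cDNA could also contain the site by chance, which this sketch ignores.

```python
INTRON_SITE = "GCGGCCGC"  # example 8 bp site assumed for the vector intron


def is_digested(cdna: str, site: str = INTRON_SITE) -> bool:
    """A cDNA is cut if it still carries the restriction site from the vector intron."""
    return site in cdna.upper()


def enrich_processed(cdnas):
    """Keep only cDNAs that escape digestion (vector intron removed by processing)."""
    return [cdna for cdna in cdnas if not is_digested(cdna)]


# Invented sequences for illustration only.
vector_exon = "ATGGCTAGC"
vector_intron = "GTAAGT" + INTRON_SITE + "TTTT"
endogenous_exon = "GGATCCAAA"

unprocessed_cdna = vector_exon + vector_intron    # intron retained, site present
processed_cdna = vector_exon + endogenous_exon    # intron removed, site absent

print(enrich_processed([unprocessed_cdna, processed_cdna]) == [processed_cdna])
```

Searching one strand is sufficient here because the example site is palindromic; a non-palindromic site would need the reverse complement checked as well.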
In one embodiment, the restriction site located in the vector exon is different from the restriction site(s) located in the vector intron. This facilitates the removal of cDNA molecules containing the vector intron, since the fragments digested from cDNAs containing the vector intron can be designed to have an overhang that is incompatible with the cloning vector (see below). Alternatively, degenerate restriction sites recognized by the same enzyme can be located in the exon and intron of the vector. The enzymes that cleave these sites are capable of cleaving multiple sites, sites with an odd number of bases in the recognition sequence, sites with interrupted palindromes, non-palindromic sequences, or sites containing one or more degenerate bases. In other words, restriction sites recognized by the same restriction endonuclease can be used if the enzyme produces an overhang in the vector exon that is different from the overhang produced in the vector intron. Since different overhangs are produced, a cloning vector containing a site that is compatible with the vector exon overhang, and incompatible with the vector intron overhang, can be used to preferentially clone cDNA molecules that contain the vector exon but lack the vector intron. Examples of useful degenerate restriction sites include DNA sequences recognized by Sfi I, Acc I, Afl III, Sap I, Ple I, Tsp45 I, ScrF I, Tse I, PpuM I, Rsr II and SgrA I.
The restriction site(s) located in the intron and/or exon of the vector can be a rare restriction site (for example, an 8 bp restriction site) or an ultra-rare site (for example, a site recognized by intron-encoded nucleases). Examples of restriction enzymes with 8 bp recognition sites include Not I, Sfi I, Pac I, Asc I, Fse I, Pme I, Sgf I, Srf I, Sbf I, Sse8387 I and Swa I.
Examples of intron-encoded restriction enzymes include I-Ppo I, I-Sce I, I-Ceu I, PI-Psp I and PI-Tli I. Alternatively, restriction sites smaller than 8 bp can be placed in the vector. For example, restriction sites composed of 7 bp, 6 bp, 5 bp, or 4 bp can be used. In general, shorter recognition sites occur more frequently within cDNA, so their use will lead to the cloning of less than full-length genes. In some cases, such as the creation of hybridization probes, the isolation of smaller cDNA clones can be advantageous.
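The relationship between recognition-site length and clone length can be illustrated with a rough calculation. The sketch below assumes random DNA with equal frequencies of the four bases, in which a fully specified site of n bp occurs on average once every 4^n bp; real genomes and cDNA populations deviate from this, and the function name is hypothetical.

```python
def expected_site_spacing(site_length_bp: int) -> int:
    """Average distance in bp between occurrences of a fully specified
    recognition site, assuming random DNA with equal base frequencies."""
    if site_length_bp < 1:
        raise ValueError("site length must be at least 1 bp")
    return 4 ** site_length_bp


# Compare the site lengths mentioned in the text (4 bp to 8 bp).
for n in (4, 5, 6, 7, 8):
    spacing = expected_site_spacing(n)
    print(f"{n} bp site: roughly one site every {spacing:,} bp")
```

Under this simplified model an 8 bp site is expected far less often than once per typical cDNA, whereas a 4 bp site is expected to cut most cDNA molecules internally, which is why shorter sites tend to yield truncated clones.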
Bidirectional activation vectors
The activation vectors described herein can also be bidirectional. When a single activating transcriptional regulatory sequence is present in the vector, gene activation occurs only when the vector integrates at an appropriate location (for example, upstream of the gene) and in the correct orientation. That is, in order to activate an endogenous gene, the promoter in the activation construct must face the endogenous gene, allowing transcription of the coding strand. As a result of this directional requirement, only half of the integration events within a locus can result in transcriptional activation of an endogenous gene. In the other half of the integration events, the vector transcribes away from the gene of interest. Therefore, to increase the frequency of gene activation by a factor of two, the present invention provides bidirectional vectors that can be used to activate an endogenous gene irrespective of the orientation in which the vector integrates into the host cell genome. A bidirectional vector according to this aspect of the invention preferably comprises two transcriptional regulatory sequences (which may be any transcriptional regulatory sequence, including but not limited to the promoters, enhancers, and repressors described herein, and which are preferably promoters or enhancers, and more preferably enhancers), two splice donor sites, and a linearization site.
When splice donor sites are used, each transcriptional regulatory sequence is operably linked to a separate splice donor site, and the transcriptional regulatory sequence/splice donor site pairs may be in inverse orientation relative to each other (i.e., the first transcriptional regulatory sequence can be integrated into the host cell genome in an orientation that is inverse relative to the orientation in which the second transcriptional regulatory sequence has been integrated into the host cell genome). The two opposing transcriptional regulatory sequence/splice donor site pairs can be separated by the linearization site. The function of the linearization site is to produce free DNA ends between the transcriptional regulatory sequence/splice donor site pairs (i.e., at a location suitable for activation of endogenous genes). Examples of bidirectional vectors of the invention are shown in Figures 11A-11C. The two opposing transcriptional regulatory sequences may be the same or different. Optionally, a translation initiation codon (i.e., ATG) and one or more additional codons can be included in either or both of the vector-encoded exons. When a translation initiation codon is present, either or both vector exons may encode a protein, a portion of a protein, a secretion signal sequence, a portion of a secretion signal sequence, a protein motif, or an epitope tag. Alternatively, either or both vector exons may lack a translation start codon. Bidirectional vectors in accordance with this aspect of the invention may optionally include one or more selectable markers and one or more amplifiable markers, including those selectable markers and amplifiable markers described in detail herein.
The bidirectional vectors may also be configured as poly(A) trap, splice acceptor trap, or dual poly(A)/splice acceptor trap vectors, as described above. Other vector configurations described for unidirectional vectors may also be incorporated into bidirectional vectors.
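The factor-of-two argument for bidirectional vectors can be made concrete by enumerating the two possible integration orientations. This is a minimal sketch under stated assumptions: the vector is taken to have already integrated at a suitable location upstream of a gene on the "+" strand, both orientations are treated as equally likely, and the names `FLIP` and `activation_fraction` are hypothetical.

```python
# Map a direction to its opposite, used when the vector integrates inverted.
FLIP = {"+": "-", "-": "+"}


def activation_fraction(promoter_directions):
    """Fraction of integration orientations in which at least one vector
    promoter faces a downstream gene lying on the '+' strand.

    promoter_directions: directions of the vector promoters relative to the
    vector itself, e.g. ("+",) for unidirectional or ("+", "-") for
    bidirectional.
    """
    hits = 0
    for integration in ("+", "-"):
        if integration == "+":
            genomic = set(promoter_directions)
        else:
            genomic = {FLIP[d] for d in promoter_directions}
        if "+" in genomic:
            hits += 1
    return hits / 2


print("unidirectional:", activation_fraction(("+",)))
print("bidirectional: ", activation_fraction(("+", "-")))
```

The model ignores integration position, chromatin context, and promoter strength; it only captures the orientation requirement described in the text.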
Co-transfection of genomic DNA with non-targeted activation vectors
Any of the vectors described herein can be integrated into, or otherwise combined with, genomic DNA prior to transfection into a eukaryotic host cell. This allows high-level expression of virtually any gene in the genome, regardless of the gene's normal expression characteristics. Thus, the vectors of the invention can be used to activate expression of genes encoded by isolated genomic DNA fragments. To accomplish this, the vector is integrated into, or otherwise combined with, genomic DNA that contains at least one gene, or a portion of a gene. Typically, the activation vector must be located within or upstream of a gene in order to activate expression of the gene. Once inserted (or joined), the downstream gene can be expressed (as a transcript or a protein) by introducing the vector/genomic DNA into an appropriate eukaryotic host cell. Following introduction into the host cell, the vector-encoded promoter drives expression through the gene encoded on the isolated DNA and, upon splicing, produces a mature mRNA molecule. Using the appropriate activation vectors, this process allows protein to be expressed from any gene encoded by the transfected genomic DNA. In addition, using the methods described herein, cDNA molecules corresponding to genes encoded by the transfected genomic DNA can be generated and isolated.
To achieve stable expression of the activated gene, the transfected activation vector/genomic DNA can be integrated into the host cell genome. Alternatively, the transfected activation vector/genomic DNA can be maintained as a stable episome (e.g., using a viral origin of replication and/or a nuclear retention function; see below). In yet another embodiment, the activated gene can be expressed transiently, for example, from a plasmid. As used herein, the term "genomic DNA" refers to the unspliced genetic material of a cell. Splicing refers to the removal of introns from genes following transcription. Thus, genomic DNA, in contrast to mRNA and cDNA, contains exons and introns in an unspliced form. In the present invention, genomic DNA derived from eukaryotic cells is particularly useful since most eukaryotic genes contain exons and introns, and since many of the vectors of the present invention are designed to activate genes encoded in the genomic DNA by splicing to the first downstream exon and removing the intervening introns. Genomic DNA useful in the present invention can be isolated using any method known in the art. A number of methods for isolating high molecular weight genomic DNA and ultra-high molecular weight genomic DNA (intact and embedded in agarose plugs) have been described (Sambrook et al., Molecular Cloning, Cold Spring Harbor Laboratory Press (1989)). In addition, commercial kits for isolating genomic DNA of various sizes are available (Gibco/BRL, Stratagene, Clontech, etc.). The genomic DNA used in the invention can encompass the entire genome of an organism. Alternatively, the genomic DNA may include only a portion of the genome of an organism. For example, genomic DNA can contain multiple chromosomes, a single chromosome, a portion of a chromosome, a genetic locus, a single gene, or a portion of a gene.
The genomic DNA useful in the invention can be substantially intact (i.e., not fragmented) before introduction into a host cell. Alternatively, the genomic DNA can be fragmented before introduction into a host cell. This can be achieved, for example, by mechanical shearing, nuclease treatment, chemical treatment, irradiation, or other methods known in the art. When the genomic DNA is fragmented, the fragmentation conditions can be adjusted to produce DNA fragments of any desired size. Typically, the DNA fragments must be large enough to contain at least one gene, or a portion of a gene (e.g., at least one exon). Genomic DNA can be introduced directly into an appropriate eukaryotic host cell without prior cloning. Alternatively, genomic DNA (or genomic DNA fragments) can be cloned into a vector before transfection. Useful vectors include, but are not limited to, high- and intermediate-copy-number plasmids (e.g., pUC, pBluescript, pACYC184, pBR322, etc.), cosmids, bacterial artificial chromosomes (BAC), yeast artificial chromosomes (YAC), P1 artificial chromosomes (PAC), and phages (e.g., M13, etc.). Other cloning vectors known in the art can also be used. When the DNA has been cloned into a cloning vector, specific cloned DNA fragments can be isolated and used in the present invention. For example, YAC, BAC, PAC, or cosmid libraries can be screened by hybridization to identify clones that map to specific chromosomal regions. Optionally, once isolated, these clones can be ordered to produce a contig spanning the chromosomal region of interest. To rapidly isolate cDNA copies of the genes present in this contig, the genomic clones can be transfected, separately or en masse, with the activation vector into a host cell. cDNA containing a vector-encoded exon, and lacking a vector-encoded intron, can then be isolated and analyzed.
Thus, since all the genes present in a contig can be rapidly isolated as cDNA clones, this method greatly improves the speed of positional cloning. Any activation vector described herein, including derivatives recognized by those skilled in the art, can be co-transfected with genomic DNA and is therefore useful in the present invention. In its simplest form, the vector may contain a promoter operably linked to an exon followed by an unpaired splice donor site. Examples of other useful vectors include, but are not limited to, poly(A) trap vectors (e.g., vectors illustrated in Figures 8, 9, 11C, 12F, and 17), dual poly(A)/splice acceptor trap vectors (e.g., vectors illustrated in Figures 9, 10, 12G, 19, and 21), bidirectional vectors (e.g., vectors illustrated in Figure 11), single-exon trap vectors (e.g., the vector illustrated in Figure 19), multi-promoter/exon activation vectors (e.g., the vector illustrated in Figure 23), vectors for isolating cDNA corresponding to activated genes, and vectors for activating expression of proteins from activated genes (e.g., vectors illustrated in Figures 2, 3, 4, 8B-F, 9B-C, 9E-F, 10B-C, 10E-F, 11, 12, 17B-G, and 23). The activation vector may also contain a viral origin of replication. The presence of a viral origin of replication allows vectors containing genomic fragments to propagate as episomes in the host cell. Examples of useful viral origins of replication include ori P (Epstein-Barr virus), SV40 ori, BPV ori, and vaccinia ori. To facilitate replication from these origins, the appropriate viral replication proteins can be expressed from the vector. For example, vectors containing EBV ori P or SV40 ori can also encode and express EBNA-1 or T antigen, respectively. Alternatively, the vectors can be introduced into cells that already express the viral replication protein (e.g., EBNA-1 or T antigen).
Examples of cells expressing EBNA-1 and T antigen include human 293 cells transfected with an EBNA-1 expression unit (Clontech) and COS-7 cells (American Type Culture Collection; ATCC No. CRL-1651), respectively. The activation vector may also contain an amplifiable marker. This enables isolation of cells that contain increased copies of the vector and flanking genomic DNA, whether episomal or integrated into the host cell genome. Cells containing increased copies of the vector and flanking genomic DNA express the activated gene at higher levels, facilitating gene isolation and protein production. The activation vectors and the genomic DNA can be introduced into any host cell capable of splicing from the vector-encoded splice donor site to a splice acceptor site encoded by the genomic DNA. In a preferred embodiment, the genomic DNA/activation vector is transfected into a host cell of the same species as the cell from which the genomic DNA was isolated. In some cases, however, it is advantageous to transfect genomic DNA into a host cell of a species different from that of the cell from which the genomic DNA was isolated. For example, transfection of genomic DNA from one species into a host cell of a second species can facilitate analysis of the activated genes in the transfected genomic DNA using hybridization techniques. Under high-stringency hybridization conditions, activated genes encoded by the transfected DNA can be distinguished from genes derived from the host cell. Transfection of genomic DNA from one species into a host cell of another species can also be used to produce protein in a heterologous cell. This may allow the protein to be produced in heterologous cells that provide growth, protein modification, or manufacturing advantages.
The activation vector can be co-transfected into a host cell together with the genomic DNA, where the vector is not linked to the genomic DNA before introduction into the cell. In this embodiment, the genomic DNA will be fragmented during the transfection process, thus creating free DNA ends. These DNA ends can be joined to the co-transfected activation vector by the cell's DNA repair machinery. Following joining to the activation vector, the genomic DNA and the activation vector can be integrated into the host cell genome by non-homologous recombination. If, during this process, a vector becomes joined to a gene encoded by the transfected genomic DNA, the vector will activate its expression. Alternatively, the non-targeted activation vector may be physically linked to the genomic DNA before transfection. In a preferred embodiment, the genomic DNA fragments are ligated to the vector before transfection. This is advantageous because it maximizes the probability that the vector is operably linked to a gene encoded by the genomic DNA, and minimizes the probability that the vector integrates into the host cell genome without the heterologous genomic DNA. In a related embodiment, the genomic DNA can be cloned into the activation vector, downstream of the activation exon. In this embodiment, cloning is facilitated by using vectors capable of accommodating large genomic fragments. Thus, the activation vector can be constructed in BACs, YACs, PACs, cosmids, or similar vectors capable of propagating large fragments of genomic DNA. Another method for joining the activation vector to genomic DNA involves transposition. In this embodiment, the activation vector is integrated into the genomic DNA by transposition or retroviral integration reactions prior to transfection into the cell.
Accordingly, the activation vectors may contain cis sequences necessary to facilitate transposition and / or retroviral integration. Examples of vectors containing transposon signals are illustrated in Figure 27; however, it is recognized that any vector described herein may contain transposon signals. Any transposition system capable of inserting foreign sequences into genomic DNA can be used in the present invention.
In addition, transposons capable of facilitating inversions and deletions can also be used to practice the invention. Although deletion and inversion systems do not integrate the activation vector into the genomic DNA, they allow the activation vector to change position relative to the cloned genomic DNA when the genomic DNA has been cloned into the activation vector. Therefore, multiple genes within a given genomic fragment can be activated by shuffling the activation vector (by integration, inversion, or deletion) into multiple positions within, or outside of, the genomic fragment. Examples of transposition systems useful for the present invention include, but are not limited to, γδ, Tn3, Tn5, Tn7, Tn9, Tn10, Ty, retroviral integration, and retrotransposons (Berg et al., Mobile DNA, ASM Press, Washington DC, pp. 879-925 (1989); Strathman et al., Proc. Natl. Acad. Sci. USA 88:1247 (1991); Berg et al., Gene 113:9 (1992); Liu et al., Nucl. Acids Res. 15:9461 (1987); Martin et al., Proc. Natl. Acad. Sci. USA 92:8398 (1995); Phadnis et al., Proc. Natl. Acad. Sci. USA 86:5908 (1989); Tomcsanyi et al., J. Bacteriol. 172:6348 (1990); Way et al., Gene 32:369 (1984); Bainton et al., Cell 65:805 (1991); Ahmed et al., J. Mol. Biol. 178:941 (1984); Benjamin et al., Cell 59:373 (1989); Brown et al., Cell 49:347 (1987); Eichinger et al., Cell 54:955 (1988); Eichinger et al., Genes Dev. 4:324 (1990); Braiterman et al., Mol. Cell. Biol. 14:5719 (1994); Braiterman et al., Mol. Cell. Biol. 14:5731 (1994); York et al., Nucl. Acids Res. 26:1927 (1998); Devine et al., Nucl. Acids Res. 18:3765 (1994); Goryshin et al., J. Biol. Chem. 273:7367 (1998)).
Using transposition, an activation vector can be integrated into any form of genomic DNA. For example, the activation vector can be integrated into either intact or fragmented genomic DNA. Alternatively, the activation vector can be integrated into a cloned genomic DNA fragment (Figure 28). In this embodiment, the genomic DNA can reside in any cloning vector, including high- and intermediate-copy-number plasmids (e.g., pUC, pBluescript, pACYC184, pBR322, etc.), cosmids, bacterial artificial chromosomes (BAC), yeast artificial chromosomes (YAC), P1 artificial chromosomes (PAC), and phages (e.g., lambda, M13, etc.). Other cloning vectors known in the art can also be used. As described above, genomic fragments from specific genetic loci can be isolated and used as a substrate for integration of the activation vector. Following integration of the activation vector, the genomic DNA can be introduced directly into a host cell suitable for expression of the activated gene. Alternatively, the genomic DNA can be introduced into and propagated in an intermediate host cell. For example, following integration of an activation vector into a BAC genomic library, the BAC library can be transformed into E. coli. This allows plasmids containing the transposon to be enriched by selecting for an antibiotic resistance marker residing on the activation vector. As a result, BAC plasmids lacking an integrated activation vector will be eliminated by antibiotic selection. Transposition-mediated integration of the activation vector can occur in vitro using purified enzymes. Alternatively, the transposition reaction can occur in vivo. For example, transposition can be carried out in bacteria, using a donor strain that carries the transposon either on a vector or as integrated copies in the genome. A target of interest is introduced into the transposition host, where it receives the integrations.
The targets containing the inserts are then recovered from the host by genetic selection. Similarly, eukaryotic host cells, such as yeast, plant, insect, or mammalian cells, can be used to carry out the transposon-mediated integration of an activation vector into a genomic DNA fragment.
Isolation of mRNA and cDNA produced from activated endogenous genes
In further embodiments, the present invention is directed to methods for isolating genes, particularly genes contained within the genome of a eukaryotic cell, that are activated using the vectors of the invention. These methods exploit the structure of the mRNA molecules produced using the non-targeted gene activation vectors of the invention. The methods described herein allow virtually any activated gene to be isolated, regardless of whether it has been previously isolated and characterized, and regardless of whether its biological activity is known. This is possible because of the nature of the chimeric transcripts produced from the integrated vectors of the present invention. Using the methods described herein, activation vectors can be integrated into the genome of a cell. Typically, however, activation vectors are integrated into the genomes of many cells to produce a library of unique integration events. Each member of the library contains the vector located at a unique integration site(s), and potentially contains an activated endogenous gene. Gene activation occurs when the activation vector integrates upstream of the 3'-most exon of an endogenous gene, in an orientation that allows transcription from the vector to proceed through the endogenous gene. The integration site can be in an intron or exon of the endogenous gene, or it can be upstream of the transcription start site of the gene. Following integration, the activation construct is designed to produce a transcript capable of being spliced from an exon encoded by the activation vector to an exon encoded by the endogenous gene. As a result, a chimeric message is produced containing the vector exon joined to exons from an endogenous gene, where the endogenous exons are derived from the region located downstream of the vector integration site.
The structure of this chimeric transcript can be exploited for gene discovery purposes. For example, chimeric transcripts can be rapidly isolated for use as probes (to isolate full-length cDNA or genomic copies of the gene, or to characterize the gene) or for direct sequencing and/or characterization. To isolate the chimeric transcripts activated by vector insertion, cDNA is produced from a library member containing the activation event. It is also possible to isolate chimeric transcripts from pools of library members in order to increase the throughput of the process. The cDNA can be produced from mRNA harvested from the activated cells. Alternatively, total RNA can be used to produce cDNA. In each case, first strand synthesis can be carried out using an oligo dT primer, an oligo dT/poly(A) signal primer, or a random primer. To facilitate cloning of the cDNA product, an oligo dT-based primer can be used with the structure: 5'-Primer X-(dT)?-?00-3'. The oligo dT/poly(A) signal primer can have the structure: 5'-(dT)?0-30-Primer X-N0-6-TTTATT-3'. The random primer can have the structure: 5'-(Primer X)-NNNNNN-3'. In each primer, Primer X is any sequence that can be used to subsequently amplify the target nucleic acid molecules by PCR. Where it is desired to clone the amplification product of the activated gene, it is useful to include one or more restriction sites within the Primer X sequence to facilitate subsequent cloning. Other primers recognized by those skilled in the art can be used to create first strand cDNA products, including primers lacking the Primer X region. According to the invention, the primers can be conjugated to one or more hapten molecules to facilitate subsequent isolation of nucleic acid molecules (e.g., first and/or second strand cDNA products) comprising said primers.
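The three first-strand primer layouts described above can be sketched as simple string constructions. This is an illustrative sketch only: the Primer X sequence, the dT lengths, and the function names are hypothetical, degenerate positions are written as the IUPAC symbol N, and the example Primer X embeds a Not I site (GCGGCCGC) merely to illustrate the suggestion of including a restriction site.

```python
POLY_A_SIGNAL_COMPLEMENT = "TTTATT"  # reverse complement of the AATAAA poly(A) signal


def oligo_dt_primer(primer_x: str, dt_length: int) -> str:
    """5'-Primer X-(dT)n-3'."""
    return primer_x + "T" * dt_length


def oligo_dt_polya_signal_primer(primer_x: str, dt_length: int, n_spacer: int) -> str:
    """5'-(dT)n-Primer X-N(0-6)-TTTATT-3'."""
    if not 0 <= n_spacer <= 6:
        raise ValueError("the N spacer must be 0 to 6 nucleotides long")
    return "T" * dt_length + primer_x + "N" * n_spacer + POLY_A_SIGNAL_COMPLEMENT


def random_primer(primer_x: str) -> str:
    """5'-Primer X-NNNNNN-3' (random hexamer written as six degenerate bases)."""
    return primer_x + "N" * 6


# Hypothetical Primer X containing a Not I site for later cloning.
PRIMER_X = "GACTAGTGCGGCCGC"

print(oligo_dt_primer(PRIMER_X, dt_length=20))
print(oligo_dt_polya_signal_primer(PRIMER_X, dt_length=20, n_spacer=3))
print(random_primer(PRIMER_X))
```

In each case the Primer X portion provides a defined sequence for later PCR, while the 3' portion determines where on the RNA the primer anneals.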
After the primer has been incorporated into the nucleic acid molecule (during cDNA synthesis), selective isolation of the molecule containing the haptenylated primer can be carried out using a corresponding ligand that specifically interacts with and binds to the hapten through ligand-hapten interactions. In preferred aspects, the ligand can be attached to, for example, a solid support. Once bound to the solid support, the molecules of interest (nucleic acid molecules containing the haptenylated primer) can be separated from contaminating nucleic acids and other materials by washing the support matrix with a solution, preferably a buffer or water. Cleavage of one or more cleavage sites within the primer, or treatment of the solid support containing the nucleic acid molecule with a high ionic strength elution buffer, then allows removal of the nucleic acid molecule of interest from the solid support. Preferred solid supports for use in this aspect of the invention include, but are not limited to, nitrocellulose, diazocellulose, glass, polystyrene, polyvinyl chloride, polypropylene, polyethylene, dextran, Sepharose, agar, starch, nylon, latex beads, magnetic beads, paramagnetic beads, superparamagnetic beads, or microtiter plates, and more preferably a magnetic bead, a paramagnetic bead, or a superparamagnetic bead, comprising one or more ligand molecules that specifically recognize and bind to the hapten molecule on the primer. Particularly preferred hapten molecules for use on the primer molecules of the invention include, without limitation: (i) biotin; (ii) an antibody; (iii) an enzyme; (iv) lipopolysaccharide; (v) apotransferrin; (vi) ferrotransferrin; (vii) insulin; (viii) cytokines (growth factors, interleukins, or colony-stimulating factors); (ix) gp120; (x) β-actin; (xi) LFA-1; (xii) Mac-1; (xiii) glycophorin; (xiv) laminin; (xv) collagen; (xvi) fibronectin; (xvii) vitronectin; (xviii) integrins αvβ1 and αvβ3; (xix) integrins α3β1, α4β1, α4β7, α5β1, αvβ1, αIIbβ3, αvβ3, and αvβ6; (xx) integrins α1β1, α2β1, α3β1, and αvβ1; (xxi) integrins α1β1, α2β1, α3β1, α6β1, α7β1, and α6β5; (xxii) ankyrin; (xxiii) fibrinogen or factor X; (xxiv) ICAM-1 or ICAM-2; (xxv) spectrin or fodrin; (xxvi) CD4; (xxvii) a cytokine (e.g., growth factor, interleukin, or colony-stimulating factor) receptor; (xxviii) an insulin receptor; (xxix) a transferrin receptor; (xxx) Fe+++; (xxxi) polymyxin B or endotoxin-neutralizing protein (ENP); (xxxii) an enzyme-specific substrate; (xxxiii) protein A, protein G, a cell surface Fc receptor, or an antibody-specific antigen; and (xxxiv) avidin and streptavidin. Particularly preferred is biotin.
Particularly preferred ligand molecules according to this aspect of the invention, which correspond in order to the hapten molecules described above, include without limitation: (i) avidin and streptavidin; (ii) protein A, protein G, a cell surface Fc receptor, or an antibody-specific antigen; (iii) an enzyme-specific substrate; (iv) polymyxin B or endotoxin-neutralizing protein (ENP); (v) Fe+++; (vi) a transferrin receptor; (vii) an insulin receptor; (viii) a cytokine (e.g., growth factor, interleukin, or colony-stimulating factor) receptor; (ix) CD4; (x) spectrin or fodrin; (xi) ICAM-1 or ICAM-2; (xii) C3bi, fibrinogen, or factor X; (xiii) ankyrin; (xiv) integrins α1β1, α2β1, α3β1, α6β1, α7β1, and α6β5; (xv) integrins α1β1, α2β1, α3β1, and αvβ1; (xvi) integrins α3β1, α4β1, α4β7, α5β1, αvβ1, αIIbβ3, and αvβ6; (xvii) integrins αvβ1 and αvβ3; (xviii) vitronectin; (xix) fibronectin; (xx) collagen; (xxi) laminin; (xxii) glycophorin; (xxiii) Mac-1; (xxiv) LFA-1; (xxv) β-actin; (xxvi) gp120; (xxvii) cytokines (growth factors, interleukins, or colony-stimulating factors); (xxviii) insulin; (xxix) ferrotransferrin; (xxx) apotransferrin; (xxxi) lipopolysaccharide; (xxxii) an enzyme; (xxxiii) an antibody; and (xxxiv) biotin. Particularly preferred for use with the biotinylated primers of the invention are avidin and streptavidin. Following first strand synthesis, second strand cDNA synthesis can be carried out using a primer specific for the vector-encoded exon. This creates double-stranded cDNA from all transcripts that were derived from the vector-encoded promoter. All cellular mRNAs (and cDNAs) produced from endogenous promoters remain single-stranded, since those transcripts lack the vector exon at the 5' end. Once second strand synthesis has been carried out, the cDNA can be digested with a restriction enzyme, cloned into a vector, and propagated.
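The selectivity of second strand synthesis can be illustrated with a toy model. The sketch below assumes full-length first-strand cDNA, writes RNA in the DNA alphabet for simplicity, and uses hypothetical sequences and function names; it only checks whether a vector-exon primer has a complementary annealing site on a given first strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def first_strand(mrna: str) -> str:
    """Full-length first-strand (antisense) cDNA, written 5'->3'."""
    return reverse_complement(mrna)


def primes_second_strand(first_strand_cdna: str, vector_exon_primer: str) -> bool:
    """The vector-exon primer has the sense sequence of the vector exon, so it
    anneals only where the first strand carries its reverse complement."""
    return reverse_complement(vector_exon_primer) in first_strand_cdna


# Hypothetical sequences for illustration only.
VECTOR_EXON = "ACCATGGCTTCGAGC"
ENDOGENOUS_EXONS = "GGTACCTTGACA"
chimeric_mrna = VECTOR_EXON + ENDOGENOUS_EXONS + "AAAAAAAA"
endogenous_mrna = "CTGACGTTAGGC" + ENDOGENOUS_EXONS + "AAAAAAAA"

for name, mrna in (("chimeric", chimeric_mrna), ("endogenous", endogenous_mrna)):
    double_stranded = primes_second_strand(first_strand(mrna), VECTOR_EXON)
    print(name, "-> double-stranded" if double_stranded else "-> single-stranded")
```

Only the transcript that begins with the vector exon yields a template for the vector-exon primer, mirroring the statement that cDNA from endogenous promoters stays single-stranded.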
To facilitate cloning, the cDNA molecules containing the vector exon can be amplified using a primer specific for the vector exon and a primer specific for the first strand cDNA primer (for example, Primer X). PCR amplification results in the production of variable-length DNA fragments, representing different priming locations during first strand synthesis and/or amplification of multiple chimeric transcripts from different genes. These amplification products can be cloned into plasmids for characterization, or they can be labeled and used as probes. Other amplification techniques, such as linear amplification using RNA polymerase (Van Gelder, Proc. Natl. Acad. Sci. USA 87:1663-1667 (1990); Eberwine, Methods 10:283-288 (1996)), can also be used. For example, when linear amplification by RNA polymerase is used, a promoter (e.g., T7) can be located in the vector exon. As a result, the gene-activated transcripts will contain the promoter sequence at the 5' end of the transcript. Alternatively, a promoter can be ligated onto the cDNA molecules following first and second strand synthesis. Using either strategy, RNA polymerase is then incubated with the cDNA in the presence of ribonucleotide triphosphates to create RNA transcripts from the cDNA. These transcripts are then reverse transcribed to produce cDNA. Since RNA polymerase can create several thousand transcripts from a single cDNA molecule, and since each of these transcripts can be reverse transcribed into cDNA, a large amplification can be achieved. As with PCR, amplification with RNA polymerase can facilitate cloning of activated genes. Other types of amplification strategies are also possible. In another embodiment, the cDNA molecules containing the vector exon are isolated without amplification. This may be useful in cases where bias occurs during amplification (for example, when one DNA fragment is amplified more efficiently than another).
To produce cDNA enriched for tagged messages, RNA is isolated from the activation library. A primer (e.g., a random hexamer, oligo(dT), or a hybrid primer containing a primer sequence linked to poly(dT) or to random nucleotides) is annealed to the RNA and used to direct first strand synthesis. The first strand cDNA molecules are then hybridized to a primer specific for the vector-encoded exon. This primer directs second strand synthesis. Following second strand synthesis, the cDNA can be digested with restriction enzymes that cut in the vector exon and in the first strand primer (for example, in Primer X; see above). The second strand products can then be cloned into a suitable vector to allow their propagation. It will be apparent to one skilled in the art, in view of the description contained herein, that cDNA products made in accordance with the methods of the invention can also be cloned into a cloning vector suitable for transfection or transformation of a variety of prokaryotic (bacterial) or eukaryotic (yeast, plant, or animal, including human and other mammalian) cells. Such cloning vectors, which may be expression vectors, include but are not limited to chromosomal, episomal, and virus-derived vectors, e.g., vectors derived from bacterial plasmids or bacteriophages, and vectors derived from combinations thereof, such as cosmids and phagemids, BACs, MACs, YACs, and the like. Other vectors suitable for use in accordance with this aspect of the invention, and methods of inserting DNA fragments therein and transforming host cells with such cloning vectors, will be familiar to those skilled in the art.
Removal of unspliced transcription products
In some cases, the activation vector will integrate into the genome in a region that lacks genes. Alternatively, it can integrate into a region that contains a gene(s) but in an orientation that results in transcription of the non-coding strand. In each of these cases, transcripts are produced that contain normally untranscribed DNA sequences together with the vector-encoded exon. These sequences could complicate the identification and analysis of novel genes. Therefore, it is advantageous to selectively remove these molecules. To remove cDNA molecules containing a vector-encoded intron, the double-stranded cDNA is treated with a restriction enzyme that recognizes a sequence located in the vector-encoded intron. Preferably, the restriction enzyme creates an overhang that is different from the overhang produced by cleavage in the vector exon. This ensures that only activated genes are cloned, by preventing the intron cleavage products from ligating into the cloning vector.
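The digestion-based removal described above can be sketched as a filter over cDNA sequences. This is a simplified model with hypothetical sequences and function names: a cDNA is treated as removed whenever it contains the intron site, overhang chemistry is not modelled, and the Not I recognition sequence (GCGGCCGC) is used only as an example of an 8 bp site placed in the vector intron. Because that site is palindromic, searching one strand is sufficient.

```python
NOT_I_SITE = "GCGGCCGC"  # 8 bp palindromic recognition sequence of Not I


def survives_digest(cdna: str, intron_site: str) -> bool:
    """A cDNA survives if it lacks the restriction site carried by the vector intron."""
    return intron_site not in cdna


def enrich_for_spliced(cdnas: dict, intron_site: str) -> dict:
    """Keep only cDNAs that are not cleaved at the vector-intron site."""
    return {name: seq for name, seq in cdnas.items() if survives_digest(seq, intron_site)}


# Hypothetical sequences for illustration only.
VECTOR_EXON = "ACCATGGCTTCGAGC"
VECTOR_INTRON = "GTAAGT" + NOT_I_SITE + "TTCA"  # begins at the splice donor

cdnas = {
    # Vector intron removed: vector exon joined directly to an endogenous exon.
    "spliced": VECTOR_EXON + "GGTACCTTGACA",
    # Vector intron retained: transcript ran into flanking genomic sequence.
    "unspliced": VECTOR_EXON + VECTOR_INTRON + "CATTGACCTA",
}

print(sorted(enrich_for_spliced(cdnas, NOT_I_SITE)))
```

In practice the cleaved molecules are not physically destroyed; they are excluded because their ends are incompatible with the cloning vector, which this filter approximates.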
Recovery of exon I from activated endogenous genes
To recover exon I from activated genes, specialized vectors can be used to create non-targeted gene activation libraries. In its simplest form, this vector contains, from 5' to 3', a promoter, an unpaired splice donor site, and a second promoter. The downstream promoter is oriented in the same direction as the upstream promoter. After integration upstream of an endogenous gene, this type of vector produces two types of transcripts. The first transcript contains the vector exon linked to exon II of the endogenous gene. The methods for isolating this transcript were described above. The second transcript contains the region upstream of the endogenous gene followed by exon I linked to exon II and other downstream exons of the endogenous gene (Figure 6). Using a two-step process, exon I can be recovered from the cells that contain the integrated vector. First, the transcripts containing the vector exon (i.e., transcript type #1, Figure 13) are isolated using the methods described above. Once isolated, the 5' end of the transcript, including exon II, can be sequenced to determine the sequence of the flanking endogenous exons. Second, once the sequence of the flanking endogenous exons is known, PCR primers capable of annealing to exon II (or a downstream exon) of the activated gene can be designed. These primers can be used to amplify exon I from transcript #2 (Figure 13) using a modified form of PCR (Zeiner, M., Biotechniques 17(6):1051-1053 (1994)). Priming from the endogenous gene is achieved by carrying out first-strand cDNA synthesis with a gene-specific primer, based on the sequence information described above. Second-strand synthesis can be carried out using E. coli DNA polymerase I under conditions well known to those skilled in the art. The double-stranded cDNA is then digested with a restriction enzyme that cleaves at least once in the endogenous gene upstream of the first-strand cDNA primer, and that does not cleave in the vector exon. Following digestion, the cDNA is self-ligated to produce circular molecules. Using inverse PCR primers that anneal in the endogenous gene upstream of the restriction/circularization site, PCR amplification produces a DNA product that contains exon I sequences from the endogenous gene.
Methods to select cells that contain high levels of gene-activated transcripts/protein
In various embodiments of the invention, the activation vector contains an amplifiable marker (for example, DHFR) and a viral origin of replication (for example, oriP of EBV). In other embodiments, an amplifiable marker and a viral origin of replication are present on a cloning vector containing a cloned genomic DNA fragment. In yet another embodiment, the activation vector contains one element (e.g., DHFR) and a cloning vector carrying a genomic insert contains the other element (e.g., oriP). Regardless of the initial location of the amplifiable marker and the viral origin, the elements are combined on the same DNA molecule before or during introduction into a host cell. In addition to the cis-acting elements, a trans-acting viral protein is generally required for efficient replication of episomes. Examples of trans-acting viral proteins include EBNA-1 and SV40 T antigen. To promote efficient replication of episomes, the trans-acting viral protein can be expressed from the episome. Thus, the trans-acting viral protein can be expressed from the activation vector, or it can be located on the backbone of the cloning vector. Alternatively, the trans-acting viral protein can be expressed by the eukaryotic host cells into which the episome is introduced. Once the amplifiable marker and the viral origin of replication are on the same molecule and present in a host cell that expresses the appropriate viral replication protein(s), the copy number of the episome can be increased. To increase the episome copy number, the cells can be placed under the appropriate selection. For example, if DHFR is present on the episome, methotrexate can be added to the culture.
The agent can be applied at relatively high concentrations to isolate cells in the population that at that time has a large number of copies of episome. Alternatively, the selection agent can be applied at a lower concentration, and periodically increased in concentrations. Increases of twice the concentration of the drug will result in a step-by-step increase in the number of copies. To reduce the frequency of non-specific drug resistance (i.e. resistance to the drug that is not associated with an increased number of copies of the episome), more than one amplifiable marker can be placed on the vector. The inclusion of multiple amplifiable markers on the episome allows the cells to be selected with multiple drugs (either simultaneously or sequentially). Since non-specific drug resistance is a relatively rare event, the likelihood that a cell will develop non-specific drug resistance to multiple drugs is extremely rare. Therefore, the presence of multiple amplifiable markers in the episome facilitates the isolation of cells that have a high copy number of episome. The amplification of the copy number of the episome increases the number of transcripts derived from the gene activated by the vector. This, in turn, facilitates the isolation of cDNA molecules derived from the activated gene. In addition, the amplification of episome copy number can dramatically increase protein expression from the activated gene. Higher levels of protein production facilitate the generation of protein for selection by bioassays, cell selection assays, and manufacturing purposes. As a result of the highly desirable characteristics described above, vectors containing a viral origin of replication and an amplifiable marker, and the use of these vectors to rapidly amplify the number of copies of episomal vectors, represent a breakthrough extending beyond the range of activated genes present in genomic DNA. 
For example, these vectors can be used to overexpress genes encoded by cDNA to produce higher levels of protein expression without the need to integrate the gene into a host cell genome with an amplification marker. In addition, as the amplification of chromosomal sequences, the cell has several hundred to several thousand episomal copies of the vector that can be isolated and maintained in culture. Therefore, the vectors described herein, and their uses, allow high levels of cloned genomic DNA to be propagated in mammalian cells, facilitating the isolation of copies of cDNA genes present in the vector as genomic inserts, and maximizing production of protein from cloned cDNA and genomic copies of eukaryotic genes. Other modifications and adaptations suitable to the methods and applications described herein, will be readily apparent to those skilled in the relevant arts and may be carried out without departing from the spirit of the invention and any modality thereof. As the present invention has already been described in detail, it will be more readily understood by reference to the following examples, which are included herein for purposes of illustration only and which are not intended to be limiting of the invention.EXAMPLES EXAMPLE 1 Transfection of cells for the activation of endogenous gene expression Method: Construction of pRIG-1 Human DHFR was amplified by PCR from cDNA produced from HT1080 cells by PCR using the DHFR-F1 primers (5 'TCCTTCGAAGCTTGTCATGGTTGGTTCGCTAAACTGCAT 3') (SEQ ID NO: 1) and DHFR-R1 (5 'AAACTTAAGATCGATTAATCATTC-TTCTCATATACTTCAA 3') (SEQ ID NO: 2), and cloned into the T site in pTARGET ™ (Promega) to generate pTARGET: DHFR. The RSV promoter was isolated from 'PREP9 by digestion with Nhel and Xbal, and inserted into the Nhel site of PTARGET: DHFR to generate pTgT: RSV + DHFR. 
Oligonucleotides JH169 (5' ATCCACCATGGCTACAGGTGAGTACTCG 3') (SEQ ID NO: 3) and JH170 (5' GATCCGAGTACTCACCTGTAGCCATGGTGGATTTAA 3') (SEQ ID NO: 4) were annealed and inserted into the I-PpoI and NheI sites of pTgT: RSV + DHFR to generate pTgT: RSV + DHFR + Ex1. A 279 bp region corresponding to nucleotides 230-508 of pBR322 was amplified by PCR using primers Tet F1 (5' GGCGAGATCTAGCGCTATATGCGTTGATGCAAT 3') (SEQ ID NO: 5) and Tet F2 (5' GGCCAGATCTGCTACCTTAAGAGAGCCGAAACAAGCGCTCATGAGCCCGAA 3') (SEQ ID NO: 6). The amplification products were digested with BglII and cloned into the BamHI site of pTgT: RSV + DHFR + Ex1 to generate pRIG-1.
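The cloning logic of this step can be checked with a short script. The following is an illustrative sketch only and is not part of the described method: it confirms that both Tet primers carry a BglII recognition site (AGATCT), that nucleotides 230-508 span 279 bp, and that BglII (A^GATCT) and BamHI (G^GATCC) leave the same GATC overhang, which is why BglII-digested products can be ligated into a BamHI site. All function and variable names are hypothetical; the primer sequences are taken from the text, with the line-break hyphen in Tet F2 removed.

```python
# Illustrative check of the Tet-fragment cloning step (all names hypothetical).
TET_F1 = "GGCGAGATCTAGCGCTATATGCGTTGATGCAAT"
TET_F2 = "GGCCAGATCTGCTACCTTAAGAGAGCCGAAACAAGCGCTCATGAGCCCGAA"

# Recognition site and top-strand cut offset for two palindromic 6-cutters:
# BglII cuts A^GATCT, BamHI cuts G^GATCC.
ENZYMES = {"BglII": ("AGATCT", 1), "BamHI": ("GGATCC", 1)}


def overhang(enzyme):
    """Return the 5' single-stranded overhang left by the enzyme."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def region_length(first_nt, last_nt):
    """Length of an inclusive, 1-based nucleotide range."""
    return last_nt - first_nt + 1


# Position (0-based) of the BglII site in each primer, or -1 if absent.
bglii_positions = {
    "Tet F1": TET_F1.find(ENZYMES["BglII"][0]),
    "Tet F2": TET_F2.find(ENZYMES["BglII"][0]),
}
compatible_ends = overhang("BglII") == overhang("BamHI")
fragment_bp = region_length(230, 508)

print(bglii_positions, compatible_ends, fragment_bp)
```

Because the ligated BglII/BamHI junction (AGATCC or GGATCT) matches neither recognition sequence, the insert cannot be released again with either enzyme.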
Transfection-Generation of the activation library of the pRIG-1 gene in HT1080 cells To activate the expression of the gene, an appropriate activation construct of the group of constructions described above is selected. The selected activation construct is then introduced into the cells by any transfection method known in the art. Examples of transfection methods include electroporation, lipofection, calcium phosphate precipitation, DEAE dextran and receptor-mediated endocytosis. After introduction into the cells, the DNA is allowed to integrate into the genome of the host cell by non-homologous recombination. Integration can occur in spontaneous chromosomal breaks or in artificially induced chromosomal breaks.
Method: Transfection of human cells with pRIG-1
2 x 10^9 HH1 cells, an HPRT-minus subclone of HT1080 cells, were grown in 150 mm tissue culture plates to 90% confluence. The medium was removed from the cells and saved as conditioned medium (see below). Cells were removed from the plates by brief incubation with trypsin, added to medium containing 10% fetal calf serum to neutralize the trypsin, and pelleted at 1000 rpm in a Jouan centrifuge for 5 minutes. The cells were washed in 1X PBS, counted, and pelleted again as above. The cell pellet was resuspended at a final concentration of 2.5 x 10^7 cells/ml in 1X PBS (Gibco BRL, cat # 14200-075). The cells were then exposed to 50 rads of γ irradiation from a 137Cs source. pRIG-1 (Figure 14A-14B; SEQ ID NO: 18) was linearized with BamHI, purified with phenol/chloroform, precipitated with ethanol and resuspended in PBS. The purified, linearized activation construct was added to the cell suspension to produce a final concentration of 40 μg/ml. The DNA/irradiated cell mixture was then mixed, and 400 μl was placed in each 0.4 cm electroporation cuvette (Biorad). The cuvettes were pulsed at 250 Volts, 600 μFarads, 50 Ohms using an electroporation apparatus (Biorad). After the electrical pulse, the cells were incubated at room temperature for 10 minutes, and then placed in alpha MEM/10% FBS containing penicillin/streptomycin (Gibco/BRL). The cells were then seeded at approximately 7 x 10^5 cells per 150 mm plate containing 35 ml of alpha MEM/10% FBS/penstrep (33% conditioned medium/67% fresh medium). After an incubation period of 24 hours at 37 °C, G418 (Gibco/BRL) was added to each plate to a final concentration of 500 μg/ml from a 60 mg/ml stock. After 4 days of selection, the medium was replaced with fresh alpha MEM/10% FBS/penstrep/500 μg/ml G418.
The cells were then incubated for another 7 to 10 days, and the culture supernatant was tested for the presence of new protein factors or stored at -80 ° C for further analysis. The drug resistant clones were stored in liquid nitrogen for further analysis.
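The quantities used in this transfection follow from simple dilution arithmetic. The sketch below is a minimal illustration, assuming the cell density reads as 2.5 x 10^7 cells/ml; the function names are hypothetical and the stock-volume calculation ignores the small volume contributed by the added stock.

```python
# Illustrative arithmetic for the Example 1 transfection (names hypothetical).

def per_cuvette(cells_per_ml, dna_ug_per_ml, volume_ul):
    """Cells and DNA (ug) contained in one electroporation cuvette."""
    volume_ml = volume_ul / 1000.0
    return cells_per_ml * volume_ml, dna_ug_per_ml * volume_ml


def stock_volume_ul(stock_mg_per_ml, final_ug_per_ml, culture_ml):
    """Volume of drug stock (ul) needed to reach the final concentration.

    Uses C1*V1 = C2*V2 and the identity 1 mg/ml == 1 ug/ul; the volume
    added by the stock itself is neglected.
    """
    stock_ug_per_ul = stock_mg_per_ml
    return final_ug_per_ml * culture_ml / stock_ug_per_ul


# 400 ul of a suspension at 2.5e7 cells/ml (assumed exponent) and 40 ug/ml DNA.
cells, dna_ug = per_cuvette(2.5e7, 40, 400)

# G418 from a 60 mg/ml stock to 500 ug/ml in a 35 ml plate.
g418_ul = stock_volume_ul(60, 500, 35)

print(f"{cells:.2e} cells and {dna_ug:.1f} ug DNA per cuvette")
print(f"{g418_ul:.1f} ul G418 stock per plate")
```

Under these assumptions each cuvette holds about 1 x 10^7 cells with 16 μg of DNA, and each 35 ml plate needs a little under 0.3 ml of G418 stock.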
EXAMPLE 2
Use of ionizing irradiation to increase the frequency and randomness of DNA integration
Method
HH1 cells were harvested at 90% confluence, washed in 1X PBS, and resuspended at a concentration of 7.5 x 10^6 cells/ml in 1X PBS. 15 μg of linearized DNA (pRIG-1) was added to the cells and mixed. 400 μl was added to each electroporation cuvette and pulsed at 250 Volts, 600 μFarads, 50 Ohms using an electroporation apparatus (Biorad). After the electrical pulse, the cells were incubated at room temperature for 10 minutes, and then placed in 2.5 ml of alpha MEM/10% FBS/1X penicillin-streptomycin. 300 μl of cells from each stock were irradiated at 0, 50, 500 and 5000 rads immediately before transfection, or at 1 hour or 4 hours after transfection. Immediately after irradiation, the cells were plated in tissue culture plates in complete medium. At 24 hours after plating, G418 was added to the culture to a final concentration of 500 μg/ml. After 7 days of selection, the culture medium was replaced with fresh complete medium containing 500 μg/ml of G418. After 10 days of selection, the medium was removed from the plates, the colonies were stained with Coomassie blue/90% methanol/10% acetic acid, and colonies of more than 50 cells were counted.
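The experimental design in this example is a full cross of irradiation dose and irradiation timing. As a hedged illustration (the variable names are hypothetical and nothing here comes from the original protocol beyond the listed doses and time points), the conditions can be enumerated as follows:

```python
# Enumerate the dose x timing conditions of Example 2 (names hypothetical).
from itertools import product

DOSES_RADS = [0, 50, 500, 5000]
TIMINGS = [
    "immediately before transfection",
    "1 h after transfection",
    "4 h after transfection",
]

# One record per plate to be scored for G418-resistant colonies.
conditions = [
    {"dose_rads": dose, "timing": timing}
    for dose, timing in product(DOSES_RADS, TIMINGS)
]

for condition in conditions:
    print(condition)
```

The 0-rad entries serve as unirradiated controls at each time point, so colony counts at 50, 500 and 5000 rads can be compared against them.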
EXAMPLE 3
Use of restriction enzymes to generate random, semi-random or targeted breaks in the genome
Method
HH1 cells were harvested at 90% confluence, washed in 1X PBS, and resuspended at a concentration of 7.5 x 10^6 cells/ml in 1X PBS. To test the integration efficiency, 15 μg of linearized DNA (PGK-βgeo) was added to each 400 μl aliquot of cells and mixed. The restriction enzymes XbaI, NotI, HindIII and I-PpoI (10-500 units) were then added to separate cell/DNA mixtures. 400 μl was added to each electroporation cuvette and pulsed at 250 Volts, 600 μFarads, 50 Ohms using an electroporation apparatus (BioRad). After the electrical pulse, the cells were incubated at room temperature for 10 minutes, and then placed in 2.5 ml of alpha MEM/10% FBS/1X penicillin-streptomycin. 300 μl of the 2.5 ml of total cells from each stock were plated in tissue culture plates in complete medium. At 24 hours after plating, G418 was added to the culture to a final concentration of 600 μg/ml. After 7 days of selection, the medium was replaced with fresh complete medium containing 600 μg/ml of G418. After 10 days of selection, the medium was removed from the plates, the colonies were stained with Coomassie blue/90% methanol/10% acetic acid, and colonies of more than 50 cells were counted.
EXAMPLE 4 Amplification by selection for two amplifiable markers located in the integrated vector After integration of the vector into the genome of a host cell, the genetic locus can be amplified in number of copies by simultaneous or sequential selection for one or more amplifiable markers located in the integrated vector. For example, a vector comprising two amplifiable markers can be integrated into the genome, and the expression of a given gene (i.e., a gene located at the vector's integration site) can be increased by selecting for both amplifiable markers located on the vector . This method greatly facilitates the isolation of clones from cells that have amplified the correct locus (i.e., the locus that contains the integrated vector). Once the vector has been integrated into the genome by non-homologous recombination, individual clones of cells containing the integrated vector can be isolated at a unique position from other cells containing the integrated vector to other positions in the genome. In alternative form, mixed populations of cells can be selected for amplification. Cells containing the integrated vector are then cultured in the presence of a first selective agent that is specific for the first amplifiable marker. This agent selects for cells that have amplified the amplifiable marker in the vector or in the endogenous chromosome. These cells are then selected for the amplification of the second selectable marker by culturing the cells in the presence of a second selective agent that is specific for the second amplifiable marker. The cells that amplified the vector and that flank the genomic DNA will survive this second selective step, while the cells that amplified the first endogenous amplifiable marker or that developed nonspecific resistance will not survive. 
Additional selections can be carried out in a similar manner when vectors containing more than two (eg, three, four, five or more) amplifiable markers are integrated into the cell genome, by sequential culture of the cells in the presence of selective agents that are specific for the additional amplifiable markers contained in the integrated vector. After selection, the surviving cells are tested to determine the level of expression of a desired gene, and the cells expressing the highest levels are selected for further amplification. Alternatively, groups of cells resistant to both selective agents can be further cultured (if two amplifiable markers are used) or all selective agents (if more than two amplifiable markers are used), without isolation of individual clones. These cells are then expanded and cultured in the presence of higher concentrations of the first selective agent (usually greater than double). The procedure is repeated until the desired level of expression is obtained. Alternatively, cells containing the integrated vector can be selected simultaneously for both amplifiable markers (if two of them are used) or all amplifiable markers (if more than two are used). The simultaneous selection is achieved by incorporating both selection agents (if two markers are used) or all selection agents (if more than two markers are used) in the selection means in which the transfected cells are cultured. The majority of the surviving cells will have amplified the integrated vector. These clones can then be individually selected to identify cells with level 0 of major expression, or they can be brought as a group. A higher concentration of each selective agent (usually greater than double) is then applied to the cells. Then, the surviving cells are tested for expression levels. This procedure is repeated until the desired expression levels are obtained. 
With either selection strategy (i.e., simultaneous or sequential selection), the initial concentration of each selective agent is determined independently by titrating the agent from low concentrations that cause no cytotoxicity to high concentrations that kill the majority of the cells. In general, a concentration that yields distinct colonies (e.g., several hundred colonies per 100,000 cells plated) is chosen as the initial concentration.
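The stepwise selection described in this example can be sketched numerically. The code below is an illustration under stated assumptions, not the procedure itself: the starting concentration, the number of rounds and the per-drug resistance rates are hypothetical placeholders, and the combined rate assumes that non-specific resistance to each drug arises independently.

```python
# Illustrative stepwise-selection helper (all values and names hypothetical).

def stepwise_schedule(initial, fold, rounds):
    """Drug concentrations for the initial selection plus `rounds` increases."""
    return [initial * fold ** i for i in range(rounds + 1)]


def combined_nonspecific_rate(rates):
    """Chance that one cell is non-specifically resistant to every drug.

    Assumes the individual events are independent, so the rates multiply.
    """
    combined = 1.0
    for rate in rates:
        combined *= rate
    return combined


# Hypothetical: start at 20 (arbitrary units) and double four times.
schedule = stepwise_schedule(20, 2, 4)

# Hypothetical per-drug non-specific resistance rates for two markers.
two_marker_rate = combined_nonspecific_rate([1e-5, 1e-5])

print(schedule)
print(two_marker_rate)
```

With two markers, the hypothetical per-drug rate of 1 in 10^5 drops to roughly 1 in 10^10 for simultaneous resistance, which illustrates why survivors of a second selection are expected to carry the amplified vector locus.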
EXAMPLE 5 Isolation of cDNA molecules that encode transmembrane proteins The pRIG8RI-CD2 vectors (Fig. 5A-5D; SEQ ID NO: 7), pRIG8R2-CD2 (Fig. 6A-6C; SEQ ID NO: 8 and pRIG8R3-CD2 (Fig. 7A-7C; SEQ ID NO: 9) contain the promoter of the CMV immediate early gene operably linked to an exon, followed by an uncoupled processing donation site. The exon in the vector codes for a signal peptide linked to the extracellular domain of CD2 (which lacks a stop codon in the reading frame). Each vector codes for CD2 in a different reading frame relative to the donor site of processing. To generate a library of activated genes, 2 x 10 7 cells were irradiated with 50 rads from a source of 137Cs and electroporated with 15 μg of linearized pRIG8R-CD2 (SEQ ID NO: 7). Separately, this was repeated with pRIG8R2-CD2 (SEQ ID NO: 8), and again with pRIG8R3-CD2 SEQ ID NO: 9). After transfection, the three groups of cells were combined and plated in 150 mm plates at 5 x 10 6 transfected cells per plate to generate library # 1. At 24 hours after transfection, library # 1 was placed under 500 μg. / ml of G418 for selection for 14 days. The drug-resistant clones containing the integrated vector in the genome of the host cell were combined, aliquoted and frozen for analysis. Generator # 2 was generated as described above, except that 3 x 10 7 cells, 3 x 10 7 cells and 1 x 10 7 cells were transfected with pRIG8R2-CD2, pRIG8R2-CD2 and pRIG8R3-CD2, respectively. To isolate the cells containing the activated genes encoding integral membrane proteins, 3 x 10 6 cells from each library were cultured and treated as follows: • The cells were treated with trypsin using 4 ml trypsin-EDTA. • After the cells were released, the trypsin was neutralized by the addition of 8 ml of MEM alpha / 10% FBS. • The cells were washed once with sterile PBS, and collected by centrifugation at 800 x g for 7 minutes. • The cell pellet was resuspended in 2 ml of MEM alpha / 10% FBS. 
One ml was used for sorting, while the other ml was replated in alpha MEM/10% FBS containing 500 μg/ml G418, expanded and saved.
• The cells used for sorting were washed once with sterile alpha MEM / 10% FBS, and collected by centrifugation at 800 x g for 7 minutes. • The supernatant was removed, and the pellet was resuspended in 1 ml of MEM alpha / 10% FBS. 100 μl of these cells were removed for staining with isotype control. • 200 μl of anti-CD2 FITC (Pharmingen, catalog number) 30054X), were added to the 900 μl of the cells, while 20 μl of the isotype control of IgG? of mice (Pharmingen, catalog number 33814X) were added from the 100 μl of the cells. The cells were incubated on ice for 20 minutes. • To the tube containing the cells stained with the anti-human FITC CD2, 5 ml of PBS / 1% FBS was added. To isotype control, 900 μl of PBS / FBS at 1% was added. Cells were harvested by centrifugation at 600 x g for 6 minutes. • The supernatant was removed from the tubes. Cells that had been stained with the isotype control were resuspended in 500 μl of 10% MEM alpha / FBS, and the cells that had been stained with anti-CD2 FITC were resuspended in 1.5 ml of MEM alpha / FBS a 10% The cells were classified through five sequential classes in a FACS Vantage flow cytometry (Becton Dickinson Immunocytometry Systems; Mountain View, CA). In each class, the indicated percentage of the total cells, representing the most strongly fluorescent cells (see below) was collected, expanded and reclassified. The HT1080 cells were classified as negative control. The following populations were classified and collected in each class: The cells from each of the final classes of each library were expanded and stored in liquid nitrogen.
Isolation of activated cells from cells classified by FACS Once the cells were classified as described above, the endogenous genes activated from the sorted cells were isolated by PCR-based cloning. However, one skilled in the art will appreciate that any method known in the art of gene cloning can be used in an equivalent manner to isolate activated genes from cells classified by FACS. The genes were isolated by the following protocol: 1) Using the PoIyATract 1000 messenger RNA isolation kit (Promega), messenger RNA was isolated from 3x107 CD2 + cells (five cycles sorted by FACS, as described above) from the libraries number 1 and 2. 2) After the isolation of the messenger RNA, the concentration of the messenger RNA was determined by diluting 0.5 μl of isolated messenger RNA, in 99.5 μl of water, and measuring the OD260. 25 μg of messenger RNA was recovered from the CD2 + cells. 3) First-strand cDNA synthesis was then carried out in the following manner: a) While the PCR machine was maintained at 4 ° C; the reaction mixtures of the first chain were adjusted by the sequential addition of the following components: 41 μl of ddH20 treated with DEPC 4 μl of dNTP, each at 10 mM 8 μl of MDTT to OJ M 16 μl of buffer pH 5x first MMLV chain (Gibco-BRL) 5 μl (10 pmol / μl) of the polyadenylation consensus site initiator GD.R1 (SEQ ID NO: 10) * 1 μl ofRNAs (Promega) 3 μl (1.25 μg / μl) of Messenger RNA * Note: GD.R1, 5 'TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTT-ATT 3' (SEQ ID NO: 10), is an initiator for the "Discovery of genes" for the synthesis of first strand cDNA messenger RNA; this initiator is designed to bind to the AATAAA polyadenylation signal and the poly-A region towards the 3 'end. This initiator will introduce a Notl site in the first string. Once the samples were obtained, they were incubated as follows: b) 70 ° C for 1 minute c) maintenance at 42 ° C. 
2 μl of 400 U / μl SuperScript II (Gibco-BRL, Rockville, MD) was then added to each sample to give a final total volume of 82 μl. After approximately three minutes, the samples were incubated as follows: d) 37 ° C for 30 min. e) 94 ° C for 2 min. f) 4 ° C for 5 min. Then 2 μl of 20 U / μl of RNace-IT (Stratagene) was added to each sample, and the samples were incubated at 37 ° C for 10 minutes. 4) After the synthesis of the first strand, cDNA was purified using a PCR cleaning kit (Qiagen) in the following manner: a) 80 μl of the reaction of the first strand was transferred to a 1.7 ml siliconized Eppendorf tube , and adding 400 μl of PB. b) The samples were then transferred to a PCR cleaning column, and centrifuged for two minutes at 14,000 RPM. c) The columns were disassembled then, decanted completely, 750 μl of PE was added to the pellets, and the tubes were centrifuged for two minutes at 14,000 RPM. d) The columns were disassembled and decanted completely, and the tubes were then centrifuged for two minutes at 14,000 RPM to dry the resin. e) The cDNA was then eluted using 50 μl of EB, through a transfer column, to a new siliconized Eppendorf tube, which was then centrifuged for two minutes at 14,000 RPM. 5) The synthesis of the cDNA of the second chain was carried out in the following manner: a) The reaction mixtures of the second chain were adjusted to room temperature, by the sequential addition of the following components: ddH20 55 μl PCR buffer 10 x 10 μl 50 mM MgCl 2 5 μl 10 mM dNTP 2 μl RIG. 751 -Bio * 25 pmol / μl 4 μl GD. R2 ** 25 pmoles / μl 4 μl Product of the first chain 20 μl * Note: RIG.F751-Bio, 5 'Biotin-CAGATCACTAGAAGCTTTATTGCGG 3' (SEQ ID NO: 11), binds to the cap site of the expressed transcript of pRIG vectors. ** Note: GD.R2, 5 'TTTTCGTCAGCGGCCGCATC 3' (SEQ ID NO: 12), is an initiator used to PCR amplify cDNA molecules generated using the primer GD.R1 (SEQ ID NO: 10). 
GD.R2 is a subsequence of GD.R1 with a mating sequence with the degenerate bases that precede the poly A signal sequence. B) Begin synthesis of the second strand: 94 ° C for 1 min; add 1 μl of Taq (5U / μl, Gibco-BRL); add 1 μl of Vent DNA polymerase (OJ U / μl, New England Biolabs) c) Incubate at 63 ° C for 2 minutes d) Incubate at 72 ° C for 3 minutes e) Repeat step four times b) f) Incubate 72 ° C for 6 minutes g) Incubate at 4 ° C (keep at this temperature) h) End of the procedure. 6) 200 μl of 1 mg / ml of streptavidin-paramagnetic particles (SA-PMP) was then prepared by washing three times with STE 7) The products of the reaction of the second chain were added directly to SA-PMPs, and incubated room temperature for 30 minutes 8) After the binding, SA-PMPs were collected by the use of the magnet, and the material was fully recovered 9) The spheres were washed three times with 500 μl of STE 10) The spheres were resuspended in 50 μl of STE, and collected at the bottom of the tube using the magnet. The supernatant of STE was then carefully pipetted. 11) The beads were resuspended in 50 μl of ddH20, and placed in a water bath at 100 ° C for two minutes, to release the purified cDNA of PMP. 12) Purified cDNA was recovered by collecting PMP in the magnet, and carefully removing the supernatant containing the cDNA. 13) The purified products were transferred to a clean tube, and centrifuged at 14,000 RPM for two minutes to remove all residual PMPs. 14) A PCR reaction was then carried out to specifically amplify RIG-activated cDNA molecules from the following way: a) The PCR reaction mixtures were adjusted to room temperature, by sequential addition of the following components: H20 59 μl 10 x pH regulator for 10 μl PCR MgCI 2 at 50 mM 5 μl DNTP at 10 mM 2 μl 25 pmoles / μl of RIG. 
2 μl F781 * 25 pmol / μl of GD.R2 2 μl Product of the second 20 μl chain * Note RIG.F781, 5 'ACTCATAGGCCATAGAGGCCTATCACAG- TTAAATTGCTAACGCAG 3 '(SEQ ID NO: 13), binds to the 3' end of GD.F1 GD.F3, GD.Fd-Bio and RIG.F751-Bio, and adds a Sfil site for the 'cloning of cDNA molecules. This initiator is used in the amplification nested PCR of second-strand cDNA molecules specific to the exon 1 of RIG. b) The thermal cycler is operated: 94 ° C for 3 minutes; add 1 μl of Taq (5U / μl, Gibco-BRL); add 1 μl of Vent DNA polymerase to OJ U / μl (New England Biolabs) PCR was then carried out by 10 cycles of steps c) to e): c) 94 ° C for 30 sec. d) 60 ° C for 40 sec. e) 72 ° C for 3 min. The PCR was then concluded carrying out the following steps: f) 94 ° C for 30 sec. g) 60 ° C for 40 sec. h) 72 ° C for 3 min. i) 72 ° C + each cycle of 20 sec. for 10 cycles j) 72 ° C for 5 min. k) maintenance at 4 ° C. 15) After elution of the library material with 50 μl of EB, the samples were digested by adding 10 μl of buffer pH 2 NEB, 40 μl of dH2Oμl of Sfil, and digesting for 1 hour at 50 ° C, to cut the 5 'end of the cDNA at the Sfñ site encoded by the forward primer (RIG.F781; SEQ ID NO: 13). 16) After digestion of Sß, 5 μl of NaCl to 1 M and 2 μl of Noti were added to each sample, and the samples were digested for 1 hour at 37 ° C, to cut the 3 'end of the cDNA in the site? / oil encoded by the first chain primer (GD.R1; SEQ ID NO: 10). 17) The digested cDNA was then separated on a 1% low melting point agarose gel. Gel molecules were separated from the gel CDNAs that varied in size from 1.2Kb to 8Kb. 18) cDNA was recovered from the separated agarose gel using Qiaex II gel extraction (Qiagen). 2 μl of cDNA (approximately 30 mg) were ligated to 7 μl (35 ng) of pBS-HSB (linearized with S / l / oyl) in a total volume of 10 μl of pH regulator 1X for T4 ligase (NEB), using 400 units of T4 DNA ligase (NEB). 
19) 0.5 μl of the reaction mixture for ligation of step (18), were transformed into DH10B of E. coli. 20) 103 colonies / 0.5 μl of bound DNA was recovered. 21) These colonies were selected for exons using primers M13F20 and JH182 (specific for exon 1 of RIG) by PCR in volumes of 12.5 μl, as follows: a) 100 μl of LB (with selective antibiotic) were supplied in the Appropriate number of 96-well plates. b) The individual colonies were selected and inoculated in the individual wells of the 96-well plate, and the plate was placed in an incubator at 37 ° C for 2 to 3 hours without agitation. c) A "master mix" was prepared for PCR reaction on ice, as follows: d) 10 μl of the master mix were supplied in each well of the PCR reaction plate e) 2.5 μl of each 100 μl of the E. coli culture were transferred to the corresponding wells of the PCR reaction plate. f) PCR was carried out, using typical conditions of the PCR cycle of: (i) 94 ° C / 2min. (bacterial lysis and denaturation of plasmid); (I) 30 cycles of denaturation at 92 ° C for 15 seconds; primer binding at 60 ° C for 20 seconds; and extension of the initiator at 72 ° C for 40 seconds; (iii) final extension at 72 ° C for 5 minutes; (iv) maintenance at 4 ° C. g) Bromophenol blue was then added to the PCR reaction; the samples were mixed and centrifuged, and then the entire reaction mixture was loaded onto an agarose gel. 23) Of 200 selected clones, 78% were positive for the exon of the vector. 96 of these clones were developed as minipreparations, and purified using a Qiagen 96-well turbo preparation, following the Qiagen miniprep manual (April 1997). 24) Many duplicate clones were eliminated through the simultaneous digestion of 2 μl of DNA with Notl, Bam Hl, Xhol, Xbal, Hindlll and EcoRl in buffer pH 3 NEB, in a total volume of 22 μl, followed by electrophoresis on a 1% agarose gel.
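The primer design described in this protocol can be verified directly from the sequences. The following sketch is illustrative only, with hypothetical names: it checks that GD.R1 carries a NotI site (GCGGCCGC), that GD.R2 is contained within GD.R1, that the 3' end of GD.R1 is complementary to the AATAAA polyadenylation signal, and that RIG.F781 carries an SfiI site (GGCCNNNNNGGCC). The sequences are taken from the text with line-break hyphens removed.

```python
# Illustrative primer checks for the cDNA cloning protocol (names hypothetical).
import re

GD_R1 = "TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT"
GD_R2 = "TTTTCGTCAGCGGCCGCATC"
RIG_F781 = "ACTCATAGGCCATAGAGGCCTATCACAGTTAAATTGCTAACGCAG"

NOTI_SITE = "GCGGCCGC"
SFII_SITE = re.compile(r"GGCC[ACGT]{5}GGCC")  # GGCCNNNNNGGCC

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


has_noti = NOTI_SITE in GD_R1
r2_within_r1 = GD_R2 in GD_R1
# The last six bases of GD.R1 should pair with the poly(A) signal in the mRNA.
polya_signal = reverse_complement(GD_R1[-6:])
sfii_match = SFII_SITE.search(RIG_F781)

print(has_noti, r2_within_r1, polya_signal)
print(sfii_match.group() if sfii_match else "no SfiI site")
```

Having an SfiI site at the 5' end and a NotI site at the 3' end means the two digestion steps produce cDNA with different ends, so inserts are cloned in a single orientation.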
Results: Two different cDNA libraries were screened using this protocol. In the first library (TMT #1), eight of the isolated activated genes were sequenced. Of these eight genes, four genes encoded known integral membrane proteins, and six were novel genes. In the second library (TMT #2), 11 isolated activated genes were sequenced. Of these 11 genes, one gene encoded a known integral membrane protein, one gene encoded a partially sequenced gene homologous to an integral membrane protein, and nine were novel genes. In all cases where the isolated gene corresponded to a known characterized gene, that gene encoded an integral membrane protein. Below are examples of significant alignments (obtained from GenBank) for genes isolated from each library:
Significant TMT #1 alignments:
179761 | gb | M76559 | HUMCACNLB: Human neuronal DHP-sensitive, voltage-dependent calcium channel alpha-2b subunit mRNA, complete cds. Length = 3600
> gi | 3183974 | emb | Y10183 | HSMEMD: H. sapiens mRNA for MEMD protein. Length = 4235
Significant TMT #2 alignments:
> gi 1476590 | gb | U06715 | HSU06715: Human cytochrome B561, HCYTO B561, mRNA, partial cds. Length = 2463
> gi | 2184843 | gb | AA459959 | AA459959 zx66c01.s1: Soares total fetus Nb2HF8 9w; Homo sapiens cDNA clone 796414, 3', similar to gb:J03171 interferon-alpha receptor precursor (human). Length = 431
EXAMPLE 6: Activation of endogenous genes using a poly(A) trap vector
HT1080 cells (1 x 10^7 cells) were irradiated with 50 rads using a 137Cs source and electroporated with 15 μg of pRIG14 (Figs. 29A-29B). Following transfection, the cells were plated into 150 mm dishes at 5 x 10^6 cells/dish. At 24 hours, puromycin was added at 3 μg/ml. The cells were incubated at 37 °C for 12 days in the presence of 3 μg/ml puromycin. The medium was replaced every 5 days.
At 12 days, the number of colonies was counted, and the cells were trypsinized and replated onto a new dish. The cells were grown to 90% confluence and harvested for frozen storage and gene isolation. Typically, 1000-3000 colonies were produced per 1 x 10^7 transfected cells.
EXAMPLE 7: Activation of endogenous genes using a dual poly(A) trap/SAT vector
HH1 cells (HPRT-minus HT1080 cells; 1 x 10^7 cells) were irradiated with 50 rads using a 137Cs source and electroporated with 15 μg of linearized pRIG-22. Following transfection, the cells were plated into 150 mm dishes at 5 x 10^6 cells/dish. At 24 hours, G418 was added at 500 μg/ml. The cells were incubated at 37 °C for 4 days in the presence of 500 μg/ml G418. The medium was then replaced with fresh medium containing 500 μg/ml G418 and AgThg, and the cells were grown in the presence of both drugs for an additional 7 days. Alternatively, as a control for HPRT activity, the medium was replaced with fresh medium containing 500 μg/ml G418 and HAT (available from Life Technologies, Inc., Rockville, MD, and used at the concentration recommended by the manufacturer), and the cells were grown in the presence of both drugs for an additional 7 days. At 12 days post-transfection, the number of colonies was counted, and the cells were trypsinized and replated onto a new dish. The cells were grown to 90% confluence and harvested for freezing and gene isolation. Typically, G418/AgThg selection produced 1000-3000 colonies per 1 x 10^7 transfected cells. In contrast, G418/HAT selection produced approximately 100 colonies per 1 x 10^7 transfected cells.
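The colony yields reported for Example 7 can be expressed as per-cell frequencies, which makes the two selections easier to compare. The sketch below is plain arithmetic on the figures quoted above; `colony_frequency` is a hypothetical helper, and no biological interpretation beyond the stated counts is implied.

```python
def colony_frequency(colonies, cells_transfected):
    """Return drug-resistant colonies per transfected cell."""
    if cells_transfected <= 0:
        raise ValueError("cells_transfected must be positive")
    return colonies / cells_transfected


CELLS = 10**7  # transfected cells per experiment, as stated in Example 7

# G418/AgThg selection: 1000-3000 colonies per 1 x 10^7 cells.
agthg_low = colony_frequency(1000, CELLS)
agthg_high = colony_frequency(3000, CELLS)

# G418/HAT selection: approximately 100 colonies per 1 x 10^7 cells.
hat = colony_frequency(100, CELLS)

print(f"G418/AgThg: {agthg_low:.1e} to {agthg_high:.1e} colonies per cell")
print(f"G418/HAT:   {hat:.1e} colonies per cell")
print(f"Fold difference: {1000 // 100}x to {3000 // 100}x")
```

On these numbers, G418/AgThg selection yields roughly 10 to 30 times more colonies than G418/HAT selection.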
EXAMPLE 8: Isolation of activated genes
The non-targeted gene activation vectors were integrated into the genome of eukaryotic cells using the methods of the invention. By integrating the vector into multiple cells, a library was created in which the cells express different genes activated by the vector. RNA is isolated from these cells using a commercial RNA isolation kit. In this example, the RNA is isolated from the cells using the Poly(A) Tract 1000 system (Promega). The RNA is converted to cDNA, amplified, size-fractionated, and cloned into a plasmid for analysis and sequencing. A brief description of this procedure follows.
1) Place 4 ml of GTC extraction buffer (Poly(A) Tract 1000 kit, Promega) in a 15 ml polycarbonate screw-cap tube, add 168 μl of 2-mercaptoethanol, and place in a water bath at 70 °C.
2) Place 8 ml of dilution buffer in a 15 ml polycarbonate screw-cap tube for each pellet processed, add 168 μl of 2-mercaptoethanol, and place in a water bath at 70 °C.
3) Remove from storage at -80 °C the cell pellets (1 x 10^7 - 1 x 10^8 cells) containing the non-targeted gene activation vector integrated into their genome. Immediately pipette 4 ml of GTC extraction buffer onto each cell pellet. Pipette up and down several times until the pellet is resuspended, and transfer to 15 ml polypropylene screw-cap tubes.
4) Add 8 ml of the dilution buffer and mix by inversion.
5) Add 10 μl (500 pmol) of the biotinylated oligo dT primer and mix.
6) Incubate at 70 °C for 5 minutes, inverting every couple of minutes to ensure even heating.
7) Centrifuge in a Sorvall HB-6 rotor at 7800 rpm (10k x g) at 25 °C for 10 minutes. During this time, wash 6 ml of streptavidin-coated paramagnetic particles (SA-PMP) 3x with 6 ml of 0.5x SSC using the Poly(A) Tract 1000 magnet system.
8) After the 3 washes, resuspend the SA-PMP in 6 ml of 0.5x SSC.
9) Pipette the supernatant off the RNA preparation and add it to the resuspended SA-PMP (take care not to disturb the pellet when removing the supernatant).
10) Mix the SA-PMP/RNA and incubate for 2 minutes at room temperature.
11) Capture the magnetic beads using the Poly(A) Tract 1000 magnet system. Note that it takes some time for all the beads to collect because of the high viscosity of the liquid.
12) Pour off the supernatant, resuspend the beads in 1.7 ml of 0.5x SSC using a 2 ml pipette, and transfer to a 2 ml screw-cap tube.
13) Capture the SA-PMP using the magnet and remove the supernatant by pipetting with a P1000.
14) Add 1.7 ml of 0.5x SSC and invert the tube several times to mix.
15) Repeat steps 13 and 14 two more times.
16) Resuspend the SA-PMP in 1 ml of nuclease-free water and invert several times to mix.
17) Capture the SA-PMP and pipette off the mRNA.
18) Place 0.5 ml of the mRNA into each of the siliconized tubes and add 50 μl of DEPC-treated 3 M NaOAc solution and 0.55 ml of isopropanol. Invert several times to mix and place at -20 °C for at least 4 hours.
19) Centrifuge the mRNA for 10 minutes at maximum RPM (14K).
20) Carefully pipette off the supernatant and wash the pellets with 200 μl of 80% ethanol, re-centrifuging for 2 minutes at 14K RPM. Note that the pellets are often brown or tan. This color results from residual SA-PMP.
21) Remove the wash and allow the pellets to air dry for no more than 10 minutes at room temperature.
22) Resuspend the pellets in 5 μl each and combine in a single tube.
23) Centrifuge at 14K RPM for 2 minutes to pellet the residual SA-PMP and carefully remove the mRNA.
24) Determine the concentration of mRNA by diluting 0.5 μl into 99.5 μl of water and measuring OD260. Note that 1 OD260 = 40 μg/ml of RNA.
25) Set up the first-strand reaction for both the test samples and the negative control (HT1080) by sequentially adding the following components while the PCR machine is held at 4 °C:
Step 1:
42 μl DEPC-treated ddH2O
4 μl 10 mM each dNTP
8 μl 0.1 M DTT
16 μl 5x MMLV first-strand buffer
5 μl (10 pmol/μl) GDR1
1 μl RNAsin (Promega)
4 μl (1.25 μg/μl) mRNA
Step 2: 70 °C / 1 min
Step 3: 42 °C / hold
Step 4: After 1 minute add 2 μl SUPERSCRIPT II® (Life Technologies, Inc., Rockville, MD) and incubate at 37 °C for 30 minutes.
Step 5: 94 °C / 2 min
Step 6: 4 °C / ∞
Step 7: Add 2 μl of RNase and incubate at 37 °C for 10 minutes.
Step 8: 4 °C / ∞
26) Analyze 8 μl of cDNA on a 1% agarose gel to check the cDNA synthesis, and purify the remaining cDNA using the Qiagen PCR cleanup kit by transferring 70 μl of the first-strand reaction to a 1.5 ml siliconized Eppendorf tube and adding 400 μl of PB.
27) Transfer to a PCR cleanup column and centrifuge for 2 minutes at maximum RPM.
28) Disassemble the column and discard the flow-through. Add 750 μl of PE and centrifuge for 2 minutes at maximum RPM.
29) Disassemble the column and discard the flow-through, then centrifuge for 2 minutes at maximum RPM to dry the resin.
30) Elute with 50 μl of EB by transferring the column to a new siliconized Eppendorf tube and centrifuging for 2 minutes at maximum RPM.
31) Set up second-strand cDNA synthesis at RT:
H2O 8.5 μl
10x PCR buffer 5 μl
50 mM MgCl2 2.5 μl
50 mM dNTP 1 μl
GDF5Bio 25 pmol/μl 10 μl
GDR2 25 pmol/μl 10 μl
First-strand product 15 μl
Step 9: 94 °C / 1 min
Step 10: 60 °C / 10 min. Add 0.25 μl of Taq polymerase.
Step 11: 60 °C / 2 min
Step 12: 72 °C / 10 min
Step 13: 94 °C / 1 min
Step 14: go to "Step 11" four more times
Step 15: 60 °C / 2 min
Step 16: 72 °C / 10 min
Step 17: end
32) Prepare 100 μl of SA-PMP by washing 3x with STE, collecting with a magnet. After the final wash, resuspend the beads in 150 μl of STE.
33) Purify the second-strand reaction products using the Qiagen PCR cleanup kit. Elute in 50 μl of EB and add the second-strand reaction products to the 150 μl of PMPs.
34) Mix gently at RT for 30 minutes.
35) Collect the SA-PMP complexes using a magnet and recover the flow-through material (save this material).
36) Wash the beads 3x with 500 μl of STE and 1x with NEB 2 (1x).
37) Resuspend the beads in 100 μl of NEB 2 (1x).
38) Add 2 μl of SfiI and digest at 50 °C for 30 minutes with gentle mixing every 10 minutes.
39) Recover the purified cDNA by capturing the beads with a magnet and carefully removing the supernatant.
40) Transfer the products to a new tube and centrifuge at maximum RPM for 2 minutes to remove all the beads.
41) Set up a PCR reaction to specifically amplify RAGE-activated cDNAs:
H2O 37 μl
10x PCR buffer 10 μl
50 mM dNTP 2 μl
GDF781 25 pmol/μl 10 μl
GDR2 25 pmol/μl 10 μl
Second-strand product 25 μl
Step 1: 94 °C / 2 min
Step 2: 94 °C / 45 sec
Step 3: 60 °C / 10 min. Add 0.5 μl of Taq polymerase.
Step 4: 72 °C / 10 min
Step 6: 60 °C / 2 min
Step 7: 72 °C / 10 min
Step 8: cycle to step 5, 8 more times
Step 9: 94 °C / 45 sec
Step 10: 60 °C / 2 min
Step 11: 72 °C / 10 min + 20 sec each cycle
Step 12: cycle to step 9, 14 more times
Step 13: 72 °C / 5 min
Step 14: 4 °C / hold
42) Check the specificity of the PCR amplification by comparing the HT1080 control with the library material on a 1% agarose gel. If the cDNA amplification is highly specific, use the Qiagen PCR cleanup kit to purify the PCR products.
43) After eluting the library material with 50 μl of EB, add 10 μl of NEB2, 40 μl of dH2O and 2 μl of SfiI, and digest for 1 hour at 50 °C.
44) Add 5 μl of 1 M NaCl and 2 μl of NotI and digest for 1 hour at 37 °C.
45) Prepare a 1% L.M. agarose gel and run the library material on it. After visualizing the material, excise the fragments in the size range of 500 bp to 10 kb.
46) Recover the library DNA from the agarose using the Qiaex II gel extraction protocol (Qiagen) and elute the DNA in 10 μl of EB. Ligate 5 μl of this material to 4 μl of pBS-HSB (SfiI/NotI) or pBS-SNS in a total volume of 10 μl.
47) Transform E. coli with 0.5 μl of ligated DNA per 40 μl of cells.
48) Pick colonies, grow them overnight in LB, and isolate the plasmids.
49) Analyze the gene-activated cDNA inserts by restriction digestion and DNA sequencing.
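The OD260 reading taken in step 24 of Example 8 (0.5 μl of mRNA diluted into 99.5 μl of water) can be converted to a stock concentration as sketched below. This assumes the usual conversion of 1 OD260 unit to 40 μg/ml of RNA at a 1 cm path length; the function name and its parameters are hypothetical and only illustrate the arithmetic.

```python
def rna_conc_ug_per_ul(a260, sample_ul=0.5, diluent_ul=99.5,
                       ug_per_ml_per_od=40.0):
    """Convert an A260 reading of a diluted sample to stock RNA
    concentration in ug/ul.

    Assumes a 1 cm path length and that 1 A260 unit corresponds to
    40 ug/ml of RNA.
    """
    if sample_ul <= 0:
        raise ValueError("sample_ul must be positive")
    dilution_factor = (sample_ul + diluent_ul) / sample_ul  # 200x by default
    ug_per_ml = a260 * ug_per_ml_per_od * dilution_factor
    return ug_per_ml / 1000.0  # ug/ml -> ug/ul


# An A260 of 0.15625 on the 200x dilution corresponds to the 1.25 ug/ul
# mRNA stock used in the first-strand reaction of step 25.
print(rna_conc_ug_per_ul(0.15625))
```

With the default 200-fold dilution the conversion reduces to multiplying the A260 reading by 8 to obtain μg/μl.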
EXAMPLE 9: Isolation of activated genes from subtracted cDNA pools
Purified mRNA from non-transfected HT1080 cells was prepared using the Poly-A Tract 1000 system (Promega), as described in Example 8, steps 1-24, and was biotinylated using the EZ-Link™ Biotin LC-ASA reagent (Pierce), as follows:
1.) 25 μl of DEPC-treated dH2O and 15 μl containing 10 μg of HT1080 mRNA were added to a siliconized microcentrifuge tube and kept on ice.
2.) Working in low light, 40 μl of the prepared LC-ASA stock reagent (1 mg/ml in 100% ethanol) was added to the reaction tube.
3.) A UV lamp (365 nm wavelength) was placed 5 cm above the microcentrifuge tube and used to irradiate the reaction mixture for 15 minutes.
4.) Uncoupled biotin reagent was removed from the HT1080 mRNA by passing the reaction mixture through an RNase-free P-30 microcentrifuge column (BioRad), as prescribed by the manufacturer.
HT1080 cells were transfected with a pRIG poly(A) trap activation vector and were grown in selection medium to produce a population of drug-resistant colonies, as described in Example 1. Purified mRNA was prepared from the pooled colonies using the Promega Poly-A Tract 1000 system, as described in Example 8. First-strand cDNA was prepared from 5 μg of this mRNA using oligo GD.R1 (TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT) (SEQ ID NO: 10), as described in Example 8, step 25. The reaction mixture was passed through a Qiagen PCR Quick Clean-up column and the purified first-strand cDNA was recovered in 100 μl of EB. Subtractive hybridization of the biotinylated HT1080 mRNA (subtracted population) and the first-strand cDNA prepared from the pool of pRIG-transfected colonies (target population) was carried out as follows:
1.) 9 μg of biotinylated mRNA was added to a 0.5 ml microcentrifuge tube containing 0.5 μg of first-strand cDNA.
2.) 
1/100x volume of 10 mg/ml glycogen, 1/10x volume of 3 M sodium acetate, pH 5.5, and 2.6x volume of 100% ethanol were added to the tube and mixed.
3.) The tube was placed at -80 °C for 1 hour, then centrifuged in a refrigerated microcentrifuge for 20 minutes.
4.) The precipitated nucleic acid pellet was dried, washed once with 70% ethanol, and then air dried.
5.) The pellet was dissolved in 5 μl of HBS (50 mM HEPES, pH 7.6; 2 mM EDTA; 0.2% SDS; 500 mM NaCl) and overlaid with 5 μl of light mineral oil, then heated at 95 °C for 2 minutes followed by 68 °C for 24 hours.
6.) The reaction mixture was diluted with 100 μl of HB (HBS without SDS) and extracted once with 100 μl of chloroform to remove the oil.
7.) The diluted hybridization mixture was added to 300 μl of streptavidin-coated paramagnetic particles (Promega) that had been pre-washed 3x in 300 μl of HB.
8.) The mixture was incubated for 10 minutes at room temperature, and the SA-PMP and the biotin-bound mRNA:cDNA hybrids were removed from the solution by magnetic capture.
9.) Steps 7 and 8 were repeated once.
10.) The cleared solution was subjected to an additional round of subtractive hybridization and magnetic removal of captured hybrids (steps 1-9), with the following exceptions: Step 6: the hybridization reaction was diluted with 2x PCR buffer (40 mM Tris-HCl, pH 8.4, 100 mM KCl). Step 7: the PMP were pre-washed in 1x PCR buffer.
The twice-subtracted first-strand cDNA was used to generate second-strand cDNA by combining 45 μl of first-strand cDNA with 7 μl of dH2O, 5 μl of 50 mM MgCl2, 2 μl of dNTP premix (10 mM each), 1 μl of 10x PCR buffer, 20 μl of GD19.F1-Bio at 12.5 pmol/μl (5' Biotin-CTCGTTTAGTGCGGCCGCTCAG-ATCACTGAATTCTGACGACCT) (SEQ ID NO: 14), 20 μl of GD.R2 at 12.5 pmol/μl (TTTTCGTCAGCGGCCGCATC) (SEQ ID NO: 12), and 0.5 μl of Taq polymerase, with thermocycling as described in Example 8, step 31. The second-strand cDNA product was amplified and further processed to produce an E. coli-based cDNA library, as described in Example 8, steps 32-49.
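The precipitation in step 2 of the subtractive hybridization in Example 9 is specified as volume ratios (1/100 volume of 10 mg/ml glycogen, 1/10 volume of 3 M sodium acetate, 2.6 volumes of 100% ethanol). The sketch below turns those ratios into pipetting volumes. It assumes that all three ratios are taken relative to the starting sample volume, which the text does not state explicitly, and the function name and the 100 μl example volume are hypothetical.

```python
def precipitation_additions(sample_ul):
    """Return the volumes (in ul) to add to a nucleic acid sample.

    Assumption: every ratio is relative to the starting sample volume,
    not to the running total after earlier additions.
    """
    if sample_ul <= 0:
        raise ValueError("sample_ul must be positive")
    return {
        "glycogen_10mg_per_ml_ul": sample_ul / 100.0,
        "sodium_acetate_3M_pH5.5_ul": sample_ul / 10.0,
        "ethanol_100pct_ul": 2.6 * sample_ul,
    }


# Hypothetical 100 ul sample; the source does not give the actual volume.
for reagent, volume in precipitation_additions(100.0).items():
    print(f"{reagent}: {volume:.1f} ul")
```

If the ethanol ratio is instead meant relative to the volume after adding glycogen and sodium acetate, the ethanol figure would be about 11% higher; the source leaves this open.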
EXAMPLE 10: Selective capture of transcripts activated by RIG
HT1080 cells were transfected with the pRIG19 activation vector (Figures 30A-30C) and cultured for 2 weeks in selection medium, as described in Example 6. Total RNA was prepared from a pellet comprising 10^8 cells using TRIzol® reagent (Life Technologies, Inc., Rockville, MD) following the manufacturer's protocol, and dissolved in 720 μl of DEPC-treated dH2O (dH2ODEPC). Contaminating genomic DNA was removed from the RNA preparation by mixing in 80 μl of 10x NEB buffer 2, 8 μl of Promega RNasin, and 20 μl of Promega RNase-free DNase RQ1, incubating at 37 °C for 30 minutes, sequentially extracting with equal volumes of phenol:chloroform (1:1) and chloroform, mixing with 1/10x volume of sodium acetate (pH 5.5), precipitating the RNA with 2x volumes of 100% ethanol, and dissolving the dry RNA pellet in dH2ODEPC to a final concentration of 4.8 μg/μl. mRNA transcripts derived from the genes activated by pRIG19 were selectively captured from the pool of total cellular RNA by mixing, in a 2 ml RNase-free microcentrifuge tube, 150 μl of total RNA, 150 μl of HBDEPC (50 mM HEPES, pH 7.6; 2 mM EDTA; 500 mM NaCl), 3 μl of Promega RNasin, and 2.5 μl (25 pmol/μl) of oligo GD19.R1-Bio (see Table 1), then incubating at 70 °C for 5 minutes followed by 50 °C for 15 minutes. One ml of Promega streptavidin-coated paramagnetic particles (SA-PMP) was magnetically captured and washed 3x with 1.5 ml of 0.5x SSC each time, and the SA-PMPs were left unresuspended. The hot oligo:RNA hybridization reaction was added directly to the tube containing the semi-dry SA-PMPs. After incubation for 10 minutes at room temperature, the SA-PMPs were washed 3x with 1 ml of 0.5x SSC.
TABLE 1 Sequences of primers and oligonucleotides
After the final magnetic capture, the SA-PMPs were suspended in 190 μl of DEPC-treated dH2O and incubated at 68 °C for 15 minutes. The PMPs were immobilized by exposure to a magnet, and the cleared solution containing RIG-activated transcripts was transferred to a microcentrifuge tube. 63 μl of the captured RIG-activated transcripts was transferred to a PCR tube, where first- and second-strand cDNA synthesis was carried out using the PCR program "1+2CDNA", as follows:
Step 1: 4 °C / ∞: add to the PCR tube containing the RIG-activated transcripts 20 μl of GibcoBRL 5x RT buffer, 1 μl of Promega RNasin, 10 μl of 100 mM DTT, 5 μl of dNTP premix at 10 mM each, and 1 μl of oligo GD.R1 (see Table 1) at 25 pmol/μl.
Step 2: 70 °C / 3 minutes.
Step 3: 42 °C / 10 minutes.
Step 4: Add 2.5 μl of SuperScript II® (Life Technologies, Inc.), incubate at 37 °C / 1 hour.
Step 5: 94 °C / 2 minutes.
Step 6: 4 °C / ∞.
To the first-strand cDNA mixture, 2 μl of Stratagene RNase I was added and the mixture was incubated at 37 °C for 15 minutes. 600 μl of Qiagen PB reagent was added to the reaction, which was then transferred to a Qiagen PCR cleanup column and processed according to the manufacturer's protocol. The cDNA was eluted from the column in 50 μl of EB and transferred to a PCR tube. The second-strand cDNA reaction was carried out using oligos GD19.F2-Bio (Table 1) and GD.R2 (Table 1) as described in Example 9. The second-strand product was captured on Promega SA-PMP as described in Example 9, except that the final suspension of SA-PMP was made in 1x NEB4 buffer and the captured cDNAs were released from the particles using the restriction endonuclease Ase I.
Amplification of the second-strand cDNA products using oligos GD19.F2 and GD.R2, digestion of the amplified cDNAs with SfiI and NotI endonucleases, and size selection of the cDNAs before cloning were carried out as described in Example 9. The final cDNA cleanup was achieved by eluting the cDNA pool from a Qiagen PCR cleanup column in 30 μl of EB. 11 μl of cDNA was mixed with 4 μl of 5x GibcoBRL ligase buffer and 4 μl of pGD5 vector DNA previously prepared by digestion with SfiI, NotI and CIP. 1 μl of T4 DNA ligase was added, and the reaction mixture was incubated at 16 °C overnight. 1 μl of the ligation reaction was used to transform electrocompetent E. coli DH10B cells, which were subsequently plated onto LB agar plates containing 12.5 μg/ml chloramphenicol. Typically, 60 to 80 bacterial colonies were recovered per μl of transformed ligation mixture.
EXAMPLE 11: Selective capture of transcripts activated by RIG
HT1080 cells were transfected with the pRIG19 activation vector and cultured for two weeks in selective medium, as described in Example 6. Total RNA was prepared from a pellet comprising 10^8 cells using TRIzol® reagent (Life Technologies, Inc.) following the manufacturer's protocol, and dissolved in 720 μl of DEPC-treated dH2O (dH2ODEPC). Contaminating genomic DNA was removed from the RNA preparation by mixing in 80 μl of 10x NEB buffer 2, 8 μl of Promega RNasin, and 20 μl of Promega RNase-free DNase RQ1, incubating at 37 °C for 30 minutes, sequentially extracting with equal volumes of phenol:chloroform (1:1) and chloroform, mixing with 1/10x volume of sodium acetate (pH 5.5), precipitating the RNA with 2x volumes of 100% ethanol, and dissolving the dry RNA pellet in dH2ODEPC to a final concentration of 4.8 μg/μl. mRNA transcripts derived from the genes activated by pRIG19 were selectively captured from the pool of total cellular RNA by mixing, in a 2 ml RNase-free microcentrifuge tube, 150 μl of total RNA, 150 μl of HBDEPC (50 mM HEPES, pH 7.6; 2 mM EDTA; 500 mM NaCl), 3 μl of Promega RNasin, and 2.5 μl (25 pmol/μl) of oligo GD19.R1-Bio (see Table 1), then incubating at 70 °C for 5 minutes followed by 50 °C for 15 minutes. One ml of Promega streptavidin-coated paramagnetic particles (SA-PMP) was magnetically captured and washed 3x with 1.5 ml of 0.5x SSC each time, and the SA-PMPs were left unresuspended. The hot oligo:RNA hybridization reaction was added directly to the tube containing the semi-dry SA-PMPs. After incubation for 10 minutes at room temperature, the SA-PMPs were washed 3x with 1 ml of 0.5x SSC. After the final magnetic capture, the SA-PMPs were suspended in 190 μl of DEPC-treated dH2O and incubated at 68 °C for 15 minutes.
The PMPs were immobilized by exposure to a magnet, and the cleared solution containing RIG-activated transcripts was transferred to a microcentrifuge tube. 63 μl of the captured RIG-activated transcripts was transferred to a PCR tube, where first- and second-strand cDNA synthesis was carried out using the PCR program "1+2CDNA", as follows:
Step 1: 4 °C / ∞: add to the PCR tube containing the RIG-activated transcripts 20 μl of GibcoBRL 5x RT buffer, 1 μl of Promega RNasin, 10 μl of 100 mM DTT, 5 μl of dNTP premix at 10 mM each, and 1 μl of oligo GD.R1 (see Table 1) at 25 pmol/μl.
Step 2: 70 °C / 3 minutes.
Step 3: 42 °C / 10 minutes.
Step 4: Add 2.5 μl of SUPERSCRIPT II® (Life Technologies, Inc.), then incubate at 37 °C / 1 hour.
Step 5: 94 °C / 2 minutes.
Step 6: 60 °C / ∞; while holding this temperature, the following were added: 2 μl of 50 mM MgCl2, 1 μl of oligo GD19.F1-Bio (Table 1) at 25 pmol/μl, and 2 μl of Stratagene RNace-It. After 10 minutes, 0.5 μl of Taq DNA polymerase (Life Technologies, Inc.) was added and the program continued:
Step 7: 72 °C / 10 minutes.
Step 8: 4 °C / ∞.
The 100 μl cDNA reaction mixture was transferred to a siliconized 1.5 ml microcentrifuge tube and extracted sequentially with equal volumes of phenol:chloroform (1:1) and chloroform, and the aqueous phase was transferred to a tube and placed in a speed-vac for 5 minutes at 37 °C. Restriction digestion of the cDNA was carried out by adding 74 μl of dH2O, 20 μl of 10x NEB buffer 2, 2 μl of 1 mg/ml BSA, and 4 μl of SfiI and incubating at 50 °C for 1 hour, then adding 10 μl of 1 M NaCl and 4 μl of NotI and incubating for an additional 1 hour at 37 °C. The reaction mixture was extracted sequentially with equal volumes of phenol:chloroform (1:1) and chloroform, then the cDNAs were precipitated by adding 1/100x volume of 10 mg/ml glycogen, 1/30x volume of 3 M sodium acetate (pH 7.5), and 2x volume of 100% ethanol, and freezing at -80 °C for 1 hour. The cDNA pellet was washed once with 70% ethanol and air dried for 15 minutes, then dissolved in 5 μl of dH2O, 1 μl of 10x NEB ligation buffer, and 4 μl of vector DNA previously prepared by digestion with SfiI, and CIP. 0.5 μl of T4 DNA ligase was added, and the reaction mixture was incubated at 16 °C overnight. 10 μl of dH2O was added to the ligation reaction, and 0.5 μl was used to transform electrocompetent E. coli DH10B cells. Typically, 6 to 10 clones were observed per μl of transformed ligation mixture.
EXAMPLE 12: Ligation of genomic DNA to activation vectors and transfection into human cells
Genomic DNA was harvested from a human cell line, HT1080 (10^8 cells), in accordance with published procedures (Sambrook et al., Molecular Cloning, Cold Spring Harbor Laboratory Press, (1989)). The isolated genomic DNA was digested with BamHI under conditions that resulted in incomplete digestion. This was achieved by titrating the amount of BamHI in the reaction. Each reaction contained 10 μg of genomic DNA and BamHI at either 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.62, or 11.24 units. After 1 hour of incubation at 37 °C, the reactions were stopped by phenol extraction, followed by ethanol precipitation. The digested DNA from each reaction was separated by agarose gel electrophoresis. Reactions containing DNA predominantly in the range of 10 kb to 400 kb were pooled for ligation to the activation vector. The pooled, digested genomic DNA was then added to BamHI-linearized activation vector in 1x ligation buffer. Ligase (Life Technologies, Inc., 40 units) was added and the ligation reaction was incubated at 16 °C for 24 hours. After ligation, the genomic DNA/activation vector was transfected into HT1080 cells using LIPOFECTIN® (Life Technologies, Inc.) in accordance with the manufacturer's procedure. Optionally, the HT1080 cells were irradiated before or after transfection. When the cells were irradiated, doses in the range of 0.1 rads to 200 rads were found to be particularly useful. Following transfection, the cells were grown in complete medium. At 36 hours post-transfection, G418 (300 μg/ml) was added to the medium. At 10-14 days post-selection, the drug-resistant clones were pooled, expanded, and harvested. Total RNA or mRNA was collected from the harvested cells.
The cDNA derived from the genes activated by the vector was synthesized and isolated using the methods described herein (see, for example, Example 8 above).
EXAMPLE 13: Co-transfection of BAC contig clones with the activation vector
Genomic libraries were created in pUniBAC (Figures 34A-34B) according to published procedures (Shizuya et al., Proc. Natl. Acad. Sci. USA 89: 8794 (1992)). Typically, the size of the genomic fragments may be between 1 kb and 500 kb, and preferably between 50 kb and 500 kb. The BAC library was propagated in E. coli. To prepare plasmids for transfection, the library was plated on LB agar plates containing 12.5 μg/ml chloramphenicol. Approximately 1000 clones were plated on each 150 mm plate. Following growth and selection, the colonies from each plate were eluted from the agar by the addition of LB and pooled. Each pool (~10,000 clones) was grown overnight in 1 liter of LB / 12.5 μg/ml chloramphenicol. The BAC plasmids were then isolated from each pool using a commercial kit (Qiagen). The purified BAC clones were digested with I-PpoI, which cleaves a unique site in the BAC vector flanking the cloning site. Since I-PpoI is an ultra-rare cutter, it will not digest the vast majority of genomic DNA inserts. Following digestion, the linearized genomic library clones were co-transfected into HT1080 cells using LIPOFECTIN® (Life Technologies, Inc.) according to the manufacturer's instructions. Briefly, 10 μg of BAC genomic DNA was combined with 1 μg of linearized pRIG20 (Figures 31A-31C) in α-MEM (without serum). 5 μg of LIPOFECTIN® was added to the DNA and the mixture was incubated at room temperature for 15 minutes. The DNA/LIPOFECTIN® mixture was then added to 10^5 HT1080 cells in a 6-well plate. The cells were incubated with DNA/LIPOFECTIN® in serum-free α-MEM for 12 hours, washed, and placed in α-MEM / 10% FBS for 36 hours. To select for cells that had integrated the vector and the genomic DNA, the transfected cells were plated onto a 10 cm dish and incubated in the presence of 300 μg/ml G418 for 10 days.
The drug-resistant clones were expanded and harvested to allow isolation of the activated cDNA molecules as described herein in Example 8.
EXAMPLE 14: In vitro integration of the activation vector into purified genomic DNA and transfection of the integration products into host cells
Genomic DNA was isolated and cloned into the bacterial artificial chromosome pUniBAC (Figures 34A-34B) using published procedures (Sambrook et al., Molecular Cloning, Cold Spring Harbor Laboratory Press, (1989); Shizuya et al., Proc. Natl. Acad. Sci. USA 89: 8794 (1992)). Following ligation of the genomic inserts into pUniBAC, the plasmids were transformed into E. coli strain DH10B (Life Technologies, Inc.) and selected. The individual bacterial clones were combined into pools containing approximately 1000 clones. Each pool was grown to saturation in 1 liter of LB/tetracycline. pUniBAC plasmids containing the genomic DNA inserts were isolated from the bacteria using a commercial kit (Qiagen). For each pool of pUniBAC clones, 2 μg of the library was incubated with 50 ng of the activation vector pRIG-T and 1 unit of mutant Tn5 transposase for 2 hours at 37 °C (transposase available from Epicentre Technologies). Following incubation, the pUniBAC clones were transformed into DH10B cells and selected on chloramphenicol. All clones from each pool were combined and grown in 1 liter of LB/chloramphenicol. The plasmids were harvested using Qiagen Tip-500 columns in accordance with the manufacturer's instructions. For each pool, 20 μg of the library was transfected into 2 x 10^6 HT1080 cells with 30 μg of ExGen 500 (MBI Fermentas) according to the manufacturer's instructions. At 48 hours post-transfection, the cells were placed in medium containing 3 μg/ml puromycin. After 10 days of growth in the presence of puromycin, the drug-resistant clones were pooled, expanded and harvested for gene discovery. To isolate vector-activated genes, mRNA from each pool of cells was isolated, converted to cDNA, and cloned into plasmids as described in Example 8.
The individual cDNA clones were analyzed by restriction digestion and sequencing.
EXAMPLE 15: Creation of a protein expression library from cloned genomic DNA
A genomic library containing genomic DNA inserts (100 kb average size) was created in pUniBAC as described in Examples 13 and 14. (Note: in some embodiments of the invention, the genomic fragments are cloned into the linearization site of an activation vector, wherein the activation vector is preferably a YAC, BAC, PAC or cosmid-based vector.) In this example, the activation vector pRIG-TP was integrated into the BAC genomic library by in vitro transposition as described in Example 14. pRIG-TP is shown in Figure 36. Following integration, the library plasmids were transformed into E. coli, and BAC vectors containing an integrated pRIG-TP vector were selected on chloramphenicol plates. The colonies were pooled and grown to saturation in LB/tetracycline. The BAC plasmids were harvested using a commercial kit (Qiagen). For each transfection, 20 μg of the BAC library was transfected into 2 x 10^6 HT1080 cells using 30 μg of ExGen 500 (MBI Fermentas) in accordance with the manufacturer's instructions. At 48 hours post-transfection, the cells were placed in medium containing 3 μg/ml puromycin. After 10 days of selection, the drug-resistant clones were pooled and expanded. The expanded pools of drug-resistant clones were divided into separate groups for freezing, protein production, and episomal amplification. To isolate and evaluate activated secreted proteins, culture supernatants were harvested and stored at -80 °C until used in specific assays. Activated intracellular proteins were harvested from cell lysates (prepared by any method known in the art) and used for in vitro assays. To amplify the copy number of the BAC episomes, the cells were selected with increasing concentrations of methotrexate. In these experiments, the initial concentration of methotrexate was 20 nM.
The concentration of methotrexate was doubled every 7 days until cells resistant to 5 μM were obtained. At each methotrexate concentration, a portion of the cells was removed for storage and protein production. Activated secreted and intracellular proteins were harvested from these cells as described for cells that were not selected with methotrexate. Having now fully described the present invention in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious to one of ordinary skill in the art that the same can be performed by modifying or changing the invention within a wide and equivalent range of conditions, formulations and other parameters without affecting the scope of the invention or any specific embodiment thereof, and that such modifications or changes are intended to be encompassed within the scope of the appended claims. All publications, patents and patent applications mentioned in this specification are indicative of the level of skill of those skilled in the art to which this invention pertains, and are incorporated herein by reference to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference.

Claims (197)

NOVELTY OF THE INVENTION

CLAIMS

  1. A vector construct characterized in that it comprises: (a) a first transcriptional regulatory sequence operably linked to a first unpaired splice donor sequence; and (b) a second transcriptional regulatory sequence operably linked to a second unpaired splice donor sequence.
  2. The vector construct according to claim 1, further characterized in that said first transcriptional regulatory sequence is in the same orientation as said second transcriptional regulatory sequence.
  3. The vector construct according to claim 1, further characterized in that said first transcriptional regulatory sequence is in an inverted orientation relative to the orientation of said second transcriptional regulatory sequence.
  4. The vector according to claim 2 or 3, further characterized in that said vector has been linearized.
  5. A vector construct characterized in that it comprises, in sequential order: (a) a transcriptional regulatory sequence; (b) an unpaired splice donor site; (c) a rare-cutting restriction site; and (d) a linearization site.
  6. A vector construct characterized in that it comprises, in sequential order: (a) a transcriptional regulatory sequence; (b) an exon comprising a rare-cutting restriction site; (c) an unpaired splice donor site; and (d) a linearization site.
  7. The vector construct according to claim 6, further characterized in that it comprises a second rare-cutting restriction site located between said unpaired splice donor site and said linearization site.
  8. A vector construct characterized in that it comprises a first transcriptional regulatory sequence operably linked to a selection marker that lacks a polyadenylation signal, and that further comprises a second transcriptional regulatory sequence operably linked to an unpaired splice donor site.
  9. The vector construct according to claim 1 or claim 8, further characterized in that said first transcriptional regulatory sequence or said second transcriptional regulatory sequence is a promoter.
  10. The vector construct according to claim 9, further characterized in that said promoter is selected from the group consisting of a CMV immediate early gene promoter, an SV40 T antigen promoter, a tetracycline-inducible promoter, and a β-actin promoter.
  11. The vector construct according to any of claims 5-7, further characterized in that said transcriptional regulatory sequence is a promoter.
  12. The vector construct according to claim 11, further characterized in that said promoter is selected from the group consisting of a CMV immediate early gene promoter, an SV40 T antigen promoter, a tetracycline-inducible promoter, and a β-actin promoter.
  13.
The vector construct according to claim 8, further characterized in that said selection marker is selected from the group consisting of a neomycin gene, a hypoxanthine phosphoribosyl transferase gene, a puromycin gene, a dihydroorotase gene, a glutamine synthetase gene, a histidine D gene, a carbamyl phosphate synthase gene, a dihydrofolate reductase gene, a multidrug resistance 1 gene, an aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyl transferase gene, an adenosine deaminase gene, and a thymidine kinase gene.
  14. A eukaryotic host cell characterized in that it comprises the vector construct as claimed in any of claims 1, 5, 6, 7 and 8.
  15. The eukaryotic host cell according to claim 14, further characterized in that said cell is an animal cell.
  16. The eukaryotic host cell according to claim 15, further characterized in that said animal cell is selected from the group consisting of a mammalian cell, an insect cell, an avian cell, an annelid cell, an amphibian cell, a reptile cell, and a fish cell.
  17. The eukaryotic host cell according to claim 15, further characterized in that said animal cell is a mammalian cell.
  18. The eukaryotic host cell according to claim 17, further characterized in that said mammalian cell is a human cell.
  19. The eukaryotic host cell according to claim 14, further characterized in that said cell is a plant cell.
  20. The eukaryotic host cell according to claim 14, further characterized in that said cell is a fungal cell.
  21. The eukaryotic host cell according to claim 20, further characterized in that said fungal cell is a yeast cell.
  22. The eukaryotic host cell according to claim 15, further characterized in that said cell is an isolated cell.
  23. The eukaryotic host cell according to claim 15, further characterized in that said vector construct is integrated into the genome of said host cell.
  24.
A primer molecule characterized in that it comprises a PCR-amplifiable sequence and a degenerate 3' end, wherein said primer molecule has the structure 5'-(dT)_a-X-N_b-TTTATT-3', wherein a is an integer from 1 to 100, X is a PCR-amplifiable sequence consisting of a nucleic acid sequence of about 10-20 nucleotides in length, N is any nucleotide, and b is an integer from 0 to 6.
  25. The primer molecule according to claim 24, further characterized in that said PCR-amplifiable sequence comprises one or more restriction sites.
  26. The primer molecule according to claim 24, further characterized in that a is an integer from 10 to 30.
  27. The primer molecule according to claim 24, further characterized in that the primer molecule comprises one or more hapten molecules conjugated to one or more bases of said primer molecule.
  28. The primer molecule according to claim 27, further characterized in that said hapten molecules are selected from the group consisting of biotin, digoxigenin, an antibody, an enzyme, lipopolysaccharide, apotransferrin, ferrotransferrin, insulin, a cytokine, an extracellular matrix protein, an integrin, ankyrin, C3bi, fibrinogen, spectrin, a cytokine receptor, an insulin receptor, a transferrin receptor, polymyxin B, endotoxin neutralizing protein (ENP), a specific enzyme substrate, protein A, protein G, cell surface Fc receptor, an antibody-specific antigen, an antibody-specific peptide, avidin, and streptavidin.
  29. The primer molecule according to claim 27, further characterized in that said hapten molecule is biotin.
  30. A method for first-strand cDNA synthesis characterized in that it comprises: (a) annealing the primer according to claim 24 to an RNA template molecule to form a primer-RNA complex; and (b) treating said primer-RNA complex with reverse transcriptase and one or more deoxynucleotide molecules under conditions that favor reverse transcription of said primer-RNA complex to synthesize a first strand of cDNA.
  31. A vector construct characterized in that it comprises a transcriptional regulatory sequence operably linked to an unpaired splice donor sequence and one or more amplifiable markers, wherein said vector construct does not comprise a homologous targeting sequence.
  32. A vector construct characterized in that it comprises a transcriptional regulatory sequence, an amplifiable marker and a viral origin of replication.
  33. A vector construct characterized in that it comprises a selection marker, a transcriptional regulatory sequence operably linked to a translation start codon, a secretion signal sequence, an epitope tag, and an unpaired splice donor site.
  34. A vector construct characterized in that it comprises a transcriptional regulatory sequence operably linked to a translation start codon, a secretion signal sequence, an epitope tag, a sequence-specific protease site, and an unpaired splice donor site.
  35. A vector characterized in that it comprises: (a) a transcriptional regulatory sequence operably linked to a translation start codon; (b) a nucleic acid sequence encoding an amino acid sequence of four or more amino acids, wherein said amino acid sequence alone is not sufficient to constitute signal peptide activity, but is sufficient to constitute signal peptide activity when said nucleic acid sequence is combined with or located upstream of an exon of an endogenous gene; and (c) an unpaired splice donor site.
  36.
The vector construct according to any of claims 33-35, further characterized in that said construct further comprises one or more amplifiable markers.
  37. The vector construct according to any of claims 31 and 33-35, further characterized in that said transcriptional regulatory sequence is a promoter.
  38. The vector construct according to claim 37, further characterized in that said promoter is a viral promoter.
  39. The vector construct according to claim 38, further characterized in that said viral promoter is a cytomegalovirus immediate early gene promoter.
  40. The vector construct according to claim 38, further characterized in that said promoter is a non-viral promoter.
  41. The vector construct according to claim 38, further characterized in that said promoter is an inducible promoter.
  42. A cell containing the vector construct as claimed in any of claims 31-35.
  43. A cell containing the vector construct as claimed in claim 36.
  44. The cell according to claim 42, further characterized in that said vector construct has been integrated into the cellular genome.
  45. The cell according to claim 43, further characterized in that said vector construct has been integrated into the cellular genome.
  46. The cell according to claim 44 or 45, further characterized in that an endogenous gene is overexpressed in said cell by upregulation of the gene by said transcriptional regulatory sequence in said vector construct.
  47. The cell according to claim 42, further characterized in that said cell is an isolated cell.
  48. The cell according to claim 43, further characterized in that said cell is an isolated cell.
  49. A method for making a host cell, characterized in that it comprises introducing the construct as claimed in any of claims 31-35 into a cell.
  50.
A method for producing an expression product of an endogenous cellular gene or a portion thereof, characterized in that it comprises: (a) introducing the construct of any of claims 31-35 into a cell comprising a genome; (b) integrating said construct into the genome of said cell by non-homologous recombination; and (c) overexpressing said endogenous gene in said cell.
  51. The method according to claim 50, further characterized in that said overexpression is carried out in vitro.
  52. The method according to claim 50, further characterized in that said overexpression is carried out in vivo.
  53. The method according to claim 50, further characterized in that it comprises isolating said expression product from said cell.
  54. A cell library characterized in that it comprises a collection of cells transformed with the construct as claimed in any of claims 31-35, further characterized in that said construct is integrated into the genomes of said cells by non-homologous recombination.
  55. A method for obtaining a gene product from a cell library, characterized in that it comprises screening the library as claimed in claim 54 for expression of said gene product, selecting from said library a cell that overexpresses said gene product, and obtaining said gene product from said selected cell.
  56. A method for producing an expression product of an endogenous cellular gene, characterized in that it comprises: (a) introducing into a cell a vector comprising a transcriptional regulatory sequence operably linked to a secretion signal sequence and an unpaired splice donor sequence; (b) integrating said vector into the genome of said cell by non-homologous recombination; (c) overexpressing an endogenous gene or a portion thereof in said cell by upregulation of said gene by said transcriptional regulatory sequence; (d) screening said cell for overexpression of said endogenous gene or a portion thereof; and (e) culturing said cell under conditions that favor production of the expression product of said endogenous gene or portion thereof by said cell.
  57. The method according to claim 56, further characterized in that it comprises isolating said expression product.
  58. A method for the overexpression of an endogenous gene in a cell in vivo, characterized in that it comprises: (a) introducing into a cell a vector comprising a transcriptional regulatory sequence; (b) integrating said vector into the genome of said cell by non-homologous recombination; (c) overexpressing an endogenous gene or a portion thereof in said cell by upregulation of said gene by said transcriptional regulatory sequence; (d) screening said cell for overexpression of said endogenous gene; and (e) introducing said isolated and cloned cell into an animal under conditions that favor overexpression of said endogenous gene by said cell in vivo.
  59.
A method for producing an expression product of an endogenous cellular gene in vivo, characterized in that it comprises: (a) introducing into a cell a vector comprising a transcriptional regulatory sequence operably linked to an unpaired splice donor sequence; (b) integrating said vector into the genome of said cell by non-homologous recombination; (c) overexpressing an endogenous gene or a portion thereof in said cell by upregulation of said gene by said transcriptional regulatory sequence; (d) screening said cell for overexpression of said endogenous gene; and (e) introducing said isolated and cloned cell into an animal under conditions that favor overexpression of said endogenous gene by said cell in vivo.
  60. A method for producing an expression product of an endogenous cellular gene, characterized in that it comprises: (a) introducing into a cell a vector comprising a transcriptional regulatory sequence and one or more amplifiable markers; (b) integrating said vector into the genome of said cell by non-homologous recombination; (c) overexpressing an endogenous gene or a portion thereof in said cell by upregulation of said gene by said transcriptional regulatory sequence; (d) screening said cell for overexpression of said endogenous gene; (e) culturing said cell under conditions in which said vector and said endogenous gene are amplified in said cell; and (f) culturing said cell under conditions that favor production of the expression product of said endogenous gene by said cell.
  61. The method according to claim 60, further characterized in that it comprises isolating said expression product.
  62. The method according to claim 60, further characterized in that said vector further comprises a splice donor site operably linked to said transcriptional regulatory sequence.
  63. The method according to claim 60 or claim 62, further characterized in that said endogenous gene or portion thereof encodes a protein selected from the group of proteins consisting of erythropoietin, insulin, growth hormone, glucocerebrosidase, tissue plasminogen activator, granulocyte colony stimulating factor (G-CSF), granulocyte/macrophage colony stimulating factor (GM-CSF), macrophage colony stimulating factor (M-CSF), interferon-α, interferon-β, interferon-γ, interleukin-2, interleukin-3, interleukin-4, interleukin-6, interleukin-8, interleukin-10, interleukin-11, interleukin-12, interleukin-13, interleukin-14, TGF-β, blood coagulation factor V, blood coagulation factor VII, blood coagulation factor VIII, blood coagulation factor IX, blood coagulation factor X, TSH-β, bone growth factor-2, bone growth factor-7, tumor necrosis factor, alpha-1 antitrypsin, antithrombin III, leukemia inhibitory factor, glucagon, protein C, protein kinase C, stem cell factor, follicle-stimulating hormone β, urokinase, a nerve growth factor, an insulin-like growth factor, insulinotropin, parathyroid hormone, lactoferrin, a complement inhibitor, platelet-derived growth factor, keratinocyte growth factor, hepatocyte growth factor, endothelial cell growth factor, neurotrophin-3, thrombopoietin, chorionic gonadotropin, thrombomodulin, alpha glucosidase, epidermal growth factor, fibroblast growth factor, a cell surface receptor, a transmembrane ion channel, a cholesterol receptor, a receptor for a lipoprotein, an integrin, a cytoskeletal anchor protein, an immunoglobulin receptor, and a CD antigen.
  64. The method according to claim 60 or claim 62, further characterized in that said endogenous gene or portion thereof encodes an erythropoietin protein.
  65. The method according to claim 60 or claim 62, further characterized in that said endogenous gene or portion thereof encodes a growth hormone protein.
  66. The method according to claim 60 or claim 62, further characterized in that said endogenous gene or portion thereof encodes a G-CSF protein.
  67. A gene expression product produced by the method as claimed in claim 60 or claim 62, characterized in that it is a protein selected from the group of proteins consisting of erythropoietin, insulin, growth hormone, glucocerebrosidase, tissue plasminogen activator, granulocyte colony stimulating factor (G-CSF), granulocyte/macrophage colony stimulating factor (GM-CSF), macrophage colony stimulating factor (M-CSF), interferon-α, interferon-β, interferon-γ, interleukin-2, interleukin-3, interleukin-4, interleukin-6, interleukin-8, interleukin-10, interleukin-11, interleukin-12, interleukin-13, interleukin-14, TGF-β, blood coagulation factor V, blood coagulation factor VII, blood coagulation factor VIII, blood coagulation factor IX, blood coagulation factor X, TSH-β, bone growth factor-2, bone growth factor-7, tumor necrosis factor, alpha-1 antitrypsin, antithrombin III, leukemia inhibitory factor, glucagon, protein C, protein kinase C, stem cell factor, follicle-stimulating hormone β, urokinase, a nerve growth factor, an insulin-like growth factor, insulinotropin, parathyroid hormone, lactoferrin, a complement inhibitor, platelet-derived growth factor, keratinocyte growth factor, hepatocyte growth factor, endothelial cell growth factor, neurotrophin-3, thrombopoietin, chorionic gonadotropin, thrombomodulin, alpha glucosidase, epidermal growth factor, fibroblast growth factor, a cell surface receptor, a transmembrane ion channel, a cholesterol receptor, a receptor for a lipoprotein, an integrin, a cytoskeletal anchor protein, an immunoglobulin receptor, and a CD antigen.
  68. A gene expression product produced by the method as claimed in claim 60 or claim 62, further characterized in that said gene expression product is an erythropoietin protein.
  69. A gene expression product produced by the method as claimed in claim 60 or 62, further characterized in that said gene expression product is a growth hormone protein.
  70. A gene expression product produced by the method as claimed in claim 60 or 62, further characterized in that said gene expression product is a G-CSF protein.
  71. A method for the overexpression of an endogenous gene in a cell in vivo, characterized in that it comprises: (a) introducing into a cell a vector comprising a transcriptional regulatory sequence and one or more amplifiable markers; (b) integrating said vector into the genome of said cell by non-homologous recombination; (c) overexpressing an endogenous gene or a portion thereof in said cell by upregulation of said gene by said transcriptional regulatory sequence; (d) screening said cell for overexpression of said endogenous gene; and (e) introducing said isolated and cloned cell into an animal under conditions that favor overexpression of said endogenous gene by said cell in vivo.
  72. The method according to any of claims 56, 58-60, 62 and 71, further characterized in that said transcriptional regulatory sequence is a promoter.
  73. The method according to claim 72, further characterized in that said promoter is a viral promoter.
  74. The method according to claim 73, further characterized in that said promoter is a cytomegalovirus immediate early gene promoter.
  75. The method according to claim 72, further characterized in that said promoter is a non-viral promoter.
  76. The method according to claim 72, further characterized in that said promoter is inducible.
  77. The method according to any of claims 56, 58-60, 62 and 71, further characterized in that it comprises introducing a double-strand break into the genomic DNA of said cell before or simultaneously with the integration of said vector.
  78. The method according to claim 49, further characterized in that it comprises introducing a double-strand break into the genomic DNA of said cell before or simultaneously with the integration of said vector.
  79. The method according to claim 50, further characterized in that it comprises introducing a double-strand break into the genomic DNA of said cell before or simultaneously with the integration of said vector.
  80. A gene expression product produced by the method as claimed in any of claims 56, 58-60, 62 and 71.
  81. The method according to any of claims 56, 58-60, 62 and 71, further characterized in that said vector construct is linear.
  82. A method for producing an expression product of an endogenous gene in a cell comprising: (a) introducing a vector comprising a transcriptional regulatory sequence into at least one isolated cell comprising a genome; (b) integrating said vector into the genome of said cell by non-homologous recombination; (c) overexpressing an endogenous gene or a portion thereof in said cell by upregulation of said gene by said transcriptional regulatory sequence; (d) screening said cell for overexpression of said endogenous gene; and (e) culturing said cell in reduced-serum medium.
  83. A method for discovering a protein characterized in that it comprises: (a) introducing a vector comprising a transcriptional regulatory sequence into at least one isolated cell comprising a genome; (b) integrating said vector into the genome of said cell by non-homologous recombination; (c) culturing said cell in reduced-serum medium under conditions that allow overexpression of an endogenous gene or a portion thereof in said cell by upregulation of said gene by said transcriptional regulatory sequence, thereby producing cell-conditioned medium; and (d) screening said cell-conditioned medium for the presence of the expression product of said gene or portion thereof.
  84.
The method according to claim 83, further characterized in that it comprises concentrating said cell-conditioned medium before screening it in (d).
  85. The method according to any of claims 82-84, further characterized in that said method comprises a high fidelity test.
  86. A method for the production of an expression product of an endogenous cellular gene, characterized in that it comprises: (a) introducing into a cell a vector comprising a transcriptional regulatory sequence; (b) integrating said vector into the genome of said cell by non-homologous recombination; (c) overexpressing an endogenous gene or a portion thereof in said cell by upregulation of said gene by said transcriptional regulatory sequence; (d) screening said cell for overexpression of said endogenous gene; (e) culturing said cell under conditions that favor the production of the expression product of said endogenous gene by said cell; and (f) isolating said expression product from a cell mass equivalent to at least 10 liters of cells at 10^4 cells/ml.
  87. The method according to any of claims 82-84 and 86, further characterized in that said vector comprises one or more amplifiable markers.
  88. The method according to any of claims 82-84 and 86, further characterized in that said vector comprises an unpaired splice donor site.
  89. A method for increasing the expression of an endogenous gene in a cell in situ, knowing the phenotype of said cell, without making use of any sequence information of the gene, characterized in that it comprises the steps of: (a) constructing a vector comprising an amplifiable marker, a transcriptional regulatory sequence and an unpaired splice donor sequence; (b) administering copies of the vector to a plurality of cells; (c) culturing the cells under conditions that allow non-homologous recombination events between the inserted vector and the genome of the cells; (d) screening the recombinant cells by assaying for the phenotype of said endogenous gene to identify cells in which the expression of said gene has been enhanced; and (e) selecting for cells with increased expression of said amplifiable marker and said endogenous gene.
  90. The method according to claim 89, further characterized in that the phenotype is the production of a particular protein and the assay is conducted by testing for increased production of the protein.
  91. An isolated cell characterized in that it comprises a genetic construct inserted in its genome, said genetic construct comprising an amplifiable marker and a transcriptional regulatory sequence, wherein said construct is inserted within a gene or in a region upstream of a gene and activates the expression of said gene, and wherein said gene and said region upstream of said gene have no nucleotide sequence homology to said genetic construct.
  92. The isolated cell according to claim 91, further characterized in that said genetic construct further comprises a splice donor sequence not paired to the exon.
  93.
The isolated cell according to claim 91 or claim 92, further characterized in that said gene encodes a protein selected from the group of proteins consisting of erythropoietin, insulin, growth hormone, glucocerebrosidase, tissue plasminogen activator, granulocyte colony stimulating factor (G-CSF), granulocyte/macrophage colony stimulating factor (GM-CSF), macrophage colony stimulating factor (M-CSF), interferon-α, interferon-β, interferon-γ, interleukin-2, interleukin-3, interleukin-4, interleukin-6, interleukin-8, interleukin-10, interleukin-11, interleukin-12, interleukin-13, interleukin-14, TGF-β, blood coagulation factor V, blood coagulation factor VII, blood coagulation factor VIII, blood coagulation factor IX, blood coagulation factor X, TSH-β, bone growth factor-2, bone growth factor-7, tumor necrosis factor, alpha-1 antitrypsin, antithrombin III, leukemia inhibitory factor, glucagon, protein C, protein kinase C, stem cell factor, follicle-stimulating hormone β, urokinase, nerve growth factor, an insulin-like growth factor, insulinotropin, parathyroid hormone, lactoferrin, a complement inhibitor, platelet-derived growth factor, keratinocyte growth factor, hepatocyte growth factor, endothelial cell growth factor, neurotrophin-3, thrombopoietin, chorionic gonadotropin, thrombomodulin, alpha glucosidase, epidermal growth factor, fibroblast growth factor, a cell surface receptor, a transmembrane ion channel, a cholesterol receptor, a receptor for a lipoprotein, an integrin, a cytoskeletal anchor protein, an immunoglobulin receptor, and a CD antigen.
  94. The isolated cell according to claim 91 or claim 92, further characterized in that said gene encodes an erythropoietin protein.
  95. The isolated cell according to claim 91 or claim 92, further characterized in that said gene encodes a growth hormone protein.
  96. The isolated cell according to claim 91 or claim 92, further characterized in that said gene encodes a G-CSF protein.
  97. A method for enhancing the expression of a gene, characterized in that it comprises: (a) introducing a vector into the genome of a cell, said vector containing an enhancer sequence and one or more amplifiable markers, wherein said vector does not contain gene-specific targeting sequences; (b) screening said cell for the expression of an endogenous gene; and (c) selecting for cells with increased expression of said amplifiable marker and said endogenous gene.
  98. The method according to claim 97, further characterized in that it comprises isolating the cell in which the expression of said endogenous gene has been increased.
  99. A method for enhancing the expression of an endogenous gene in a cell, characterized in that it comprises: (a) integrating a vector into a cell by non-homologous recombination, said vector comprising an enhancer sequence and one or more amplifiable markers; (b) selecting for non-homologous recombinant cells expressing said endogenous gene, wherein said gene and the upstream and downstream regions of said gene in which said enhancer sequence is active are not homologous to said vector; and (c) selecting for cells with increased expression of said amplifiable marker and said endogenous gene.
  100. An isolated cell characterized in that it comprises in its genome an inserted artificial genetic construct, the genetic construct comprising one or more amplifiable markers and an enhancer effective to enhance the expression of a gene in said cell, wherein said genetic construct is inserted within a gene or in the upstream or downstream regions of a gene, and wherein said gene and said regions, in which said enhancer sequence is active, are not homologous to said genetic construct.
  101. The method according to any of claims 53, 56, 58-60, 62, 71, 82-84 and 86, further characterized in that said endogenous gene encodes a transmembrane protein.
  102.
The method according to any of claims 89, 97 and 99, further characterized in that said gene encodes a cellular transmembrane protein.
  103. The method according to any of claims 58, 59, 62 and 71, further characterized in that it comprises the isolation and cloning of said cell before introducing said cell into an animal.
  104. The method according to any of claims 58, 59, 62 and 71, further characterized in that said animal is a mammal.
  105. The method according to claim 104, further characterized in that said mammal is a human.
  106. A method for identifying a cell that expresses an endogenous gene that encodes an integral membrane protein, characterized in that it comprises: (a) introducing into a cell a vector comprising: (i) a transcriptional regulatory sequence operably linked to an exonic sequence containing a start codon, (ii) a signal sequence, and (iii) an epitope tag followed by an unpaired splice donor site; (b) integrating said vector into the genome of said cell by non-homologous recombination; (c) overexpressing an endogenous gene or a portion thereof in said cell by upregulation of said gene by said transcriptional regulatory sequence; and (d) screening said cell for the expression of said epitope tag on the surface of said cell.
  107. A method for identifying a cell that expresses an endogenous gene that encodes an integral membrane protein, characterized in that it comprises: (a) isolating genomic DNA from eukaryotic host cells; (b) combining said isolated genomic DNA with a vector to form a genomic DNA-vector complex, said vector comprising: (i) a transcriptional regulatory sequence operably linked to an exonic sequence containing a start codon, (ii) a signal sequence, and (iii) an epitope tag; (c) introducing said genomic DNA-vector complex into a eukaryotic host cell; (d) overexpressing an endogenous gene in said cell by upregulation of said gene by said transcriptional regulatory sequence; and (e) screening said cell for the expression of said epitope tag on the surface of said cell.
  108. A method for identifying a cell that expresses an endogenous gene that encodes an integral membrane protein, characterized in that it comprises: (a) preparing cDNA from a eukaryotic host cell; (b) combining said isolated cDNA with a vector to form a cDNA-vector complex, said vector comprising: (i) a transcriptional regulatory sequence operably linked to an exonic sequence containing a start codon, (ii) a signal sequence, and (iii) an epitope tag followed by an unpaired splice donor site; (c) introducing said cDNA-vector complex into a eukaryotic host cell; (d) overexpressing an endogenous gene in said cell by upregulation of said gene by said transcriptional regulatory sequence; and (e) screening said cell for the expression of said epitope tag on the surface of said cell.
  109. The method according to any of claims 106-108, further characterized in that it comprises isolating said cell expressing said epitope tag.
  110. The method according to claim 109, further characterized in that it comprises isolating said overexpressed endogenous gene from said isolated cell.
  111.
A vector characterized in that it comprises: (a) a first promoter operably linked to an exon and an unpaired splice donor site, and (b) a second promoter operably linked to a selection marker lacking a polyadenylation signal. 112. The vector according to claim 111, further characterized in that said first and second promoters are present in said vector in the same orientation. 113. The vector according to claim 112, further characterized in that said vector is linear and wherein said selection marker is located 3' of said first promoter. 114. The vector according to claim 112, further characterized in that said vector is linear and wherein said second promoter is located 5' of said unpaired splice donor site. 115. The vector according to claim 111, further characterized in that said exon lacks a translation start codon. 116. The vector according to claim 111, further characterized in that said exon comprises a translation start codon. 117. The vector according to claim 111, further characterized in that said exon comprises a translation start codon and a secretion signal sequence. 118. A vector construct characterized in that it comprises: (a) a first promoter; (b) a positive selection marker; (c) a negative selection marker; and (d) an unpaired splice donor site, wherein said positive and negative selection markers and said splice donor site are oriented in said vector construct such that, when said vector construct is integrated into the genome of a eukaryotic host cell in such a way that splicing occurs between said vector-encoded splice donor site and a genome-encoded splice acceptor site, said positive selection marker is expressed in an active form and said negative selection marker is either not expressed or is expressed in an inactive form. 119. The vector according to claim 118, further characterized in that said positive and negative selection markers are present as a fusion gene.
120. The vector according to claim 118, further characterized in that said positive selection marker, said negative selection marker, or both positive and negative selection markers, lack a polyadenylation site. 121. The vector according to claim 118, further characterized in that said vector further comprises a second promoter operably linked to a second unpaired splice donor site. 122. A vector comprising a first promoter and a second promoter, said first and second promoters being oriented in the same direction, characterized in that: (a) said first promoter, but not said second promoter, is operably linked to an unpaired splice donor site; and (b) said vector does not comprise polyadenylation signals 3' of either said first promoter or said second promoter. 123. The vector according to claim 122, further characterized in that said vector is linear and wherein said second promoter is located 3' of said first promoter. 124. A vector characterized in that it comprises: (a) a first promoter operably linked to a first selection marker containing an unpaired splice donor site; and (b) a second promoter operably linked to a second selection marker, wherein neither said first selection marker nor said second selection marker contains a polyadenylation signal. 125. The vector according to claim 124, further characterized in that said first and second selection markers are positive selection markers. 126. The vector according to claim 124, further characterized in that said first selection marker is located 5' of said second selection marker. 127. A vector characterized in that it comprises: (a) a first promoter operably linked to a first exon and a first unpaired splice donor site; and (b) a second promoter operably linked to a second exon and a second unpaired splice donor site, wherein the nucleotide sequence of said first exon is different from the nucleotide sequence of said second exon.
128. The vector according to claim 127, further characterized in that said first and second exons each comprise a translation start codon and an open reading frame that does not end with a stop codon. 129. The vector according to claim 127, further characterized in that said first exon, said second exon, or both said first and second exons, lack a translation start codon. 130. A vector construct characterized in that it comprises: (a) a first promoter operably linked to a positive selection marker; (b) a second promoter operably linked to a negative selection marker; and (c) an unpaired splice donor site, wherein said positive and negative selection markers and said splice donor site are oriented in said vector construct such that, when said vector construct is integrated into the genome of a eukaryotic host cell in such a way that an endogenous gene in said genome is transcriptionally activated, said positive selection marker is expressed in an active form and said negative selection marker is either not expressed or is expressed in an inactive form. 131. The vector construct according to claim 130, further characterized in that it comprises a third promoter operably linked to a second unpaired splice donor site. 132. The vector according to any of claims 1, 5-7, 8, 35, 111, 118, 122, 124, 127, 130 and 131, further characterized in that said vector comprises one or more transposition signals. 133. The vector according to any of claims 111, 118, 122, 124, 127, 130 and 131, further characterized in that said vector comprises one or more amplifiable markers. 134. The vector according to any of claims 1, 5-7, 8, 31, 32, 35, 111, 118, 122, 124, 127, 130 and 131, further characterized in that said vector comprises one or more viral origins of replication. 135.
The vector according to any of claims 1, 5-7, 8, 31, 32, 35, 111, 118, 122, 124, 127, 130 and 131, further characterized in that said vector comprises one or more viral replication factor genes. 136. The vector according to claim 133, further characterized in that said amplifiable marker is selected from the group consisting of dihydrofolate reductase, adenosine deaminase, aspartate transcarbamylase, dihydroorotase, and carbamyl phosphate synthase. 137. The vector according to claim 134, further characterized in that said viral origin of replication is selected from the group consisting of Epstein-Barr virus oriP and SV40 ori. 138. The vector according to any of claims 1, 5-7, 8, 31, 32, 35, 111, 118, 122, 124, 127, 130 and 131, further characterized in that said vector comprises genomic DNA. 139. A host cell characterized in that it comprises the vector as claimed in any of claims 31, 32, 35, 111, 118, 122, 124, 127, 130 and 131. 140. A host cell comprising the vector as claimed in claim 132. 141. A host cell comprising the vector as claimed in claim 133. 142. A host cell comprising the vector as claimed in claim 134. 143. A host cell comprising the vector as claimed in claim 135. 144. A host cell comprising the vector as claimed in claim 138. 145. The host cell according to claim 139, further characterized in that said host cell is an isolated cell. 146. The host cell according to any of claims 140-144, further characterized in that said host cell is an isolated cell. 147. A cell library characterized in that it comprises the vector as claimed in any of claims 1, 5-7, 8, 31, 32, 35, 111, 118, 122, 124, 127, 130 and 131. 148. A cell library characterized in that it comprises the vector as claimed in claim 132. 149. A cell library characterized in that it comprises the vector as claimed in claim 133. 150. A cell library characterized in that it comprises the vector as claimed in claim 134. 151.
A cell library characterized in that it comprises the vector as claimed in claim 135. 152. A cell library characterized in that it comprises the vector as claimed in claim 138. 153. A method for the activation of an endogenous gene in a cell, characterized in that it comprises: (a) transfecting a genome-containing cell with the vector as claimed in any of claims 1, 5-7, 8, 31, 32, 35, 111, 118, 122, 124, 127, 130 and 131; and (b) culturing said cell under conditions suitable for non-homologous integration of said vector into the genome of said cell, wherein said integration results in the activation of an endogenous gene in said cell's genome. 154. A method for identifying a gene, characterized in that it comprises: (a) transfecting a plurality of genome-containing cells with the vector according to any of claims 1, 5-7, 8, 31, 32, 35, 111, 118, 122, 124, 127, 130 and 131; (b) culturing said cells under conditions suitable for non-homologous integration of the vector into the genomes of said cells; (c) selecting for cells in which said vector has been integrated into the genomes of said cells; (d) isolating RNA from said selected cells; (e) producing cDNA from said isolated RNA; and (f) identifying a gene in said cDNA by isolating one or more cDNA molecules containing one or more nucleotide sequences from said vector. 155. The method according to claim 154, further characterized in that said identification in (f) is carried out by hybridizing said cDNA to said vector. 156. The method according to claim 154, further characterized in that said identification in (f) is carried out by sequencing said cDNA and comparing the nucleotide sequence of said cDNA to the nucleotide sequence of said vector. 157.
The vector according to claim 124, further characterized in that said unpaired splice donor site is located 5' of, or within, said first selection marker such that, when integration of said vector into the genome of a eukaryotic host cell results in splicing of said unpaired splice donor site to a genome-encoded splice acceptor site, said first selection marker is either expressed in an inactive form or not expressed at all. 158. A method for isolating cells in which a single-exon gene has been activated, characterized in that it comprises: (a) transfecting a plurality of genome-containing eukaryotic cells with the vector according to claim 157; (b) culturing said cells under conditions suitable for non-homologous integration of the vector into the genomes of said cells; and (c) selecting for cells in which said first and second selection markers are expressed in their active forms. 159. The method according to claim 158, further characterized in that it comprises: (d) isolating RNA from the selected cells; (e) producing cDNA from said isolated RNA; and (f) isolating a single-exon gene from said cDNA.
160. A method for isolating exon I from a gene, characterized in that it comprises: (a) transfecting one or more genome-containing eukaryotic cells with the vector according to any of claims 111, 112, 114, 122, 124 and 127; (b) culturing said cells under conditions suitable for non-homologous integration of the vector into the genome of said cells; (c) selecting for cells in which said vector has transcriptionally activated an endogenous gene containing one or more exons; (d) isolating RNA from said selected cells; (e) producing cDNA from said isolated RNA; (f) recovering cDNA molecules containing a first exon from said vector spliced to a second exon from said endogenous gene, thereby obtaining one or more vector exon-tagged cDNA molecules; and (g) using said vector exon-tagged cDNA molecules to recover the activated endogenous gene containing exon I. 161. A method for expressing a transcript containing exon I of a gene, characterized in that it comprises: (a) transfecting one or more genome-containing eukaryotic cells with the vector as claimed in any of claims 111, 112, 114, 122, 124 and 127; (b) culturing said cells under conditions suitable for non-homologous integration of the vector into the genome of said cells; and (c) culturing said cells under conditions suitable for the expression of a transcript containing exon I from an endogenous gene. 162.
A method for producing a gene product, characterized in that it comprises: (a) isolating genomic DNA containing at least one gene from a eukaryotic cell; (b) inserting into said isolated genomic DNA, by in vitro transposition, a vector comprising one or more transposition signals, one or more promoters, one or more exons, and one or more unpaired splice donor sites, thereby forming a genomic DNA-vector complex; (c) introducing said genomic DNA-vector complex into a eukaryotic host cell; and (d) culturing said host cell under conditions suitable for the expression of said gene. 163. The method according to claim 162, further characterized in that it comprises isolating an expression product of said gene. 164. A method for producing a gene product encoded by an endogenous cellular gene, characterized in that it comprises: (a) isolating genomic DNA containing at least one gene from a eukaryotic cell; (b) inserting into or otherwise combining with said isolated genomic DNA the vector according to any of claims 111, 112, 114, 122, 124 and 127, thereby producing a genomic vector-DNA complex; (c) transfecting said genomic vector-DNA complex into a suitable eukaryotic host cell; and (d) culturing said host cell under conditions suitable to result in the transcription of one or more genes encoded by said vector contained in said genomic vector-DNA complex. 165. The method according to claim 164, further characterized in that it comprises: (e) isolating RNA produced by said transcription from said host cell; (f) producing one or more cDNA molecules from said isolated RNA; and (g) recovering one or more cDNA molecules containing vector sequence at the 5' end of said cDNA molecules, thereby isolating said gene. 166. The method according to claim 164, wherein said vector further comprises one or more transposition signals, and wherein said vector is inserted into said isolated DNA by in vitro transposition. 167.
The method according to claim 164, further characterized in that said isolated genomic DNA is present in a cloning vector. 168. A method for producing a protein, characterized in that it comprises: (a) isolating genomic DNA from one or more cells; (b) inserting into or otherwise combining with said isolated genomic DNA the vector as claimed in any of claims 111, 112, 114, 122, 124 and 127, thereby producing a genomic vector-DNA complex; (c) transfecting said genomic vector-DNA complex into a suitable host cell; and (d) culturing said cell under conditions suitable to result in the expression of protein from the genomic DNA contained in said genomic vector-DNA complex. 169. A method for producing a protein, characterized in that it comprises: (a) isolating genomic DNA from one or more cells; (b) integrating a vector comprising one or more transposition signals and a transcriptional regulatory sequence operably linked to an exon/unpaired splice donor complex into said isolated genomic DNA by transposition, thereby producing a genomic vector-DNA complex; (c) transfecting said genomic vector-DNA complex into a suitable host cell; and (d) culturing said cell under conditions suitable to result in the expression of protein from said genomic DNA contained in said genomic vector-DNA complex.
170. A method for expressing a gene, characterized in that it comprises: (a) isolating genomic DNA, which contains one or more genes, from one or more eukaryotic cells; (b) combining said isolated genomic DNA with a vector comprising: (i) a selection marker, (ii) a transcriptional regulatory sequence operably linked to a translation start codon, (iii) a secretion signal sequence, (iv) an epitope tag, and (v) an unpaired splice donor site, thereby producing a genomic vector-DNA complex; (c) introducing said genomic vector-DNA complex into a cell; (d) selecting for cells containing said genomic vector-DNA complex; and (e) culturing said cell under conditions suitable to result in the expression of a gene contained in said genomic vector-DNA complex. 171. The method according to claim 168, further characterized in that said host cell is selected for the presence of said transfected genomic vector-DNA complex prior to, during, or following culturing under conditions suitable to result in protein expression. 172. The method according to claim 169, further characterized in that said vector comprises a selection marker, and wherein said host cell is selected for the presence of said transfected genomic vector-DNA complex before being cultured under conditions suitable to result in the expression of protein or gene. 173. The method according to claim 167, further characterized in that said cloning vector is selected from the group consisting of a BAC, a YAC, a PAC, a cosmid, a phage, and a plasmid. 174. The method according to claim 164, further characterized in that it comprises isolating said protein. 175. A protein produced by the method as claimed in claim 168. 176. A protein produced by the method as claimed in any of claims 170-172. 177. A protein produced by the method as claimed in claim 174.
178. A method for protein expression, characterized in that it comprises: (a) transfecting a host cell with a vector comprising a heterologous promoter operably linked to: (i) a heterologous exon, (ii) a heterologous splice donor site, (iii) a genomic DNA fragment encoding a gene or a portion thereof, and (iv) one or more selection markers, wherein said heterologous exon either lacks a translation start codon or encodes a translation start codon and an open reading frame that is not terminated by a stop codon; (b) selecting for a cell containing said transfected vector; and (c) culturing said selected transfected host cell under conditions suitable for protein expression from said vector. 179. The method according to claim 178, further characterized in that said vector comprises a viral origin of replication. 180. The method according to claim 179, further characterized in that said viral origin of replication is Epstein-Barr virus oriP. 181. A vector characterized in that it comprises: (a) a heterologous promoter; (b) a heterologous exon; (c) a heterologous splice donor site; (d) a genomic fragment encoding a gene or a portion thereof; (e) one or more selection markers; and (f) one or more viral origins of replication, wherein said heterologous exon either lacks a translation start codon or encodes a translation start codon and an open reading frame that is not terminated by a stop codon, and wherein said genomic fragment is located 3' of said heterologous promoter, said exon and said splice donor site, such that after introduction of said vector into a host cell, protein is expressed from said gene or portion thereof encoded by said genomic fragment. 182. The vector according to claim 181, further characterized in that said selection marker lacks a polyadenylation signal. 183.
The vector according to claim 181, further characterized in that it comprises one or more genes encoding one or more viral replication factors. 184. The vector according to claim 181, further characterized in that it comprises an amplifiable marker. 185. A cell comprising the vector as claimed in any of claims 181-184. 186. The cell according to claim 185, further characterized in that said cell is an isolated cell. 187. The vector construct according to claim 8, further characterized in that said first transcriptional regulatory sequence is in the same orientation in said vector construct as said second transcriptional regulatory sequence. 188. The vector construct according to claim 118 or 130, further characterized in that said positive selection marker is selected from the group consisting of a neomycin gene, a hypoxanthine phosphoribosyl transferase gene, a puromycin gene, a dihydroorotase gene, a glutamine synthetase gene, a histidine D gene, a carbamyl phosphate synthase gene, a dihydrofolate reductase gene, a multidrug resistance 1 gene, an aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyl transferase gene, and an adenosine deaminase gene. 189. The vector construct according to claim 118 or 130, further characterized in that said negative selection marker is selected from the group consisting of a hypoxanthine phosphoribosyl transferase gene, a thymidine kinase gene, and a diphtheria toxin gene. 190. The vector according to claim 130, further characterized in that said negative selection marker is located 5' of said positive selection marker. 191. A host cell that stably expresses a protein, characterized in that it comprises a vector comprising a promoter, an exon/splice donor complex, and a genomic fragment encoding said protein or a portion thereof, wherein said promoter and exon/splice donor complex are heterologous to said genomic fragment. 192.
The host cell according to claim 191, further characterized in that said vector is integrated into the genome of said cell. 193. The host cell according to claim 191, further characterized in that said vector further comprises a viral origin of replication and is maintained within said host cell as an episome. 194. The cell according to claim 190 or claim 192, further characterized in that said vector comprises one or more selection markers. 195. The cell according to claim 192, further characterized in that the viral origin of replication is Epstein-Barr virus oriP. 196. A method for activating the expression of an endogenous gene, characterized in that it comprises: (a) introducing into a host cell containing a chromosome a vector suitable for the activation of an endogenous gene; (b) treating said cell with an agent capable of introducing DNA breaks into the chromosome of said host cell before or after the introduction of said vector; and (c) integrating said vector into said DNA breaks so as to form an operable linkage between said vector and said endogenous gene, whereby said endogenous gene is activated by one or more nucleotide sequences encoded by the vector. 197. The method according to claim 196, further characterized in that said activation in (d) is carried out by isolating said host cell and culturing said host cell under conditions that favor the activation of said endogenous gene. 198. A vector characterized in that it comprises: (a) a transcriptional regulatory sequence operably linked to a gene; (b) a viral origin of replication; and (c) an amplifiable marker. 199. A method for increasing the expression of a gene, characterized in that it comprises: (a) introducing the vector as claimed in claim 198 into a host cell, wherein said vector is maintained as an episome within said host cell; and (b) selecting for increased expression of said amplifiable marker and said gene. 200.
A method for cleaving a cDNA molecule derived from an unspliced cellular transcript, characterized in that it comprises: (a) integrating the vector as claimed in claim 5 or claim 7 into the genome of one or more eukaryotic host cells; (b) culturing said host cell under conditions suitable for expression from said transcriptional regulatory sequence; (c) isolating RNA from said host cell and producing cDNA from said isolated RNA; and (d) digesting said cDNA with an enzyme that cleaves said rare-cutting restriction site. 201. A method for drug discovery, characterized in that it comprises: (a) integrating a vector into the genome of a eukaryotic host cell, wherein said vector integration activates the expression of an endogenous gene in said host cell; (b) culturing said cell under conditions that favor the expression of said activated gene, thereby producing a gene product of said activated gene; (c) treating said cell with one or more test compounds to be screened for drug activity; and (d) determining the ability of said one or more test compounds to interact with, or affect a cellular phenotype induced by, said gene product. 202. A method for drug discovery, characterized in that it comprises: (a) integrating a vector into the genome of a eukaryotic host cell, wherein said vector integration activates the expression of an endogenous gene in said host cell; (b) culturing said cell in reduced-serum medium under conditions that favor the production of a gene product of said activated gene, thereby producing cell-conditioned medium comprising said gene product; and (c) selecting one or more test compounds for drug activity by determining the ability of said test compounds to interact with said gene product in said cell-conditioned medium. 203.
The method according to claim 202, further characterized in that it comprises concentrating said cell-conditioned medium before said selection in (c).

SUMMARY OF THE INVENTION

The present invention generally relates to activating gene expression or causing overexpression of a gene by in situ recombination methods. The invention also generally relates to methods for expressing an endogenous gene in a cell at levels higher than those normally found in the cell. In one embodiment of the invention, the expression of an endogenous gene is activated or increased following integration into the cell, by non-homologous or illegitimate recombination, of a regulatory sequence that activates the expression of the gene. In another embodiment, the expression of the endogenous gene can be further increased by co-integration of one or more amplifiable markers, and selecting for increased copies of the one or more amplifiable markers located on the integrated vector. In another embodiment, the invention relates to the activation of endogenous genes by non-targeted integration of specialized activation vectors, which are provided by the invention, into the genome of a host cell. The invention also provides methods for the identification, activation, isolation and/or expression of genes that cannot be discovered by current methods, since no target sequence is necessary for integration. The invention further provides methods for the isolation of nucleic acid molecules (particularly cDNA molecules) that encode a variety of proteins, including transmembrane proteins, and for the isolation of cells expressing said transmembrane proteins, which may be heterologous transmembrane proteins. The invention is also directed to isolated genes, gene products, and nucleic acid molecules, to compositions comprising said genes, gene products and nucleic acid molecules, and to vectors and host cells comprising said genes and nucleic acid molecules, which may be used in a variety of therapeutic and diagnostic applications. Therefore, by the present invention, endogenous genes, including those associated with human disease and development, can be activated and isolated without prior knowledge of the sequence, structure, function or expression profile of the genes.
MXPA/A/2001/008415A 1999-02-19 2001-08-17 Compositions and methods for non-targeted activation of endogenous genes MXPA01008415A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/253,022 1999-02-19
US09/263,814 1999-03-08
US09/276,820 1999-03-26

Publications (1)

Publication Number Publication Date
MXPA01008415A (en) 2002-06-05
