
HK1059281B - Method of producing functional protein domains

Info

Publication number
HK1059281B
Authority
HK
Hong Kong
Prior art keywords
exon
gene
sequence
fragment
protein
Application number
HK04101655.0A
Other languages
Chinese (zh)
Other versions
HK1059281A1 (en)
Inventor
G. De Luca
L. Falciola
Original Assignee
Merck Serono S.A.
Priority claimed from GB0018876.3 (GB0018876D0)
Application filed by Merck Serono S.A.
Publication of HK1059281A1
Publication of HK1059281B

Description

Method for preparing functional protein domains
Technical Field
The present invention relates to a method for preparing a protein, in particular to a method for preparing a functional protein domain.
Background
The functionality of proteins is closely related to their structural properties, which are described at different levels: primary structure (the amino acid sequence encoded by the gene), secondary structure (the preferred relative backbone positions of residues that are adjacent in the sequence), tertiary structure (the relative positions of all atoms of the polypeptide chain) and quaternary structure (the arrangement into a complex of different protein subunits, each corresponding to a distinct polypeptide chain).
Beyond these levels of organization, portions of a polypeptide may correspond to protein domains, which can be defined in two ways. Structurally, a domain is a compactly folded region of a polypeptide containing one or more secondary and/or tertiary structural elements. Small proteins (fewer than 100 amino acids) usually consist of a single domain, whereas large proteins generally consist of several. In the three-dimensional structure of a protein, a domain is viewed as an independently folded polypeptide unit, distinct from the other portions of the protein. Related to this structural definition, a domain can also be defined functionally as the smallest part of a protein that retains a function, arising from the interaction of the domain with one or more proteins, nucleic acids, carbohydrates, lipids, or any other organic or inorganic compounds. Functional protein domains typically comprise 50-350 amino acids and may therefore include one or more structural domains.
A protein may thus comprise one or more functional protein domains, connected in sequence and arranged in three dimensions. A specific amino acid sequence can be assigned to a class of domains, and its functional relevance established, by nuclear magnetic resonance or X-ray crystallography, by identifying sequence homology with known functional protein domains, and/or by generating alternative forms of the original protein to be tested in biochemical or biological assays in so-called structure-activity or structure-function studies.
Many proteins share highly homologous functions and/or domains, possibly as a result of a process called "exon shuffling". According to this evolutionary theory, new genes can arise through a series of duplications, intronic recombination, combinatorial assembly and mutation of existing exons encoding a protein "template" (Patthy L, Gene, 1999; 238 (1): 103-).
The theory that protein evolution in eukaryotes has also been driven by homologous recombination between introns of the parental genomes during sexual reproduction, generating new genes from combinations of exons associated with specific protein domains, is now widely accepted, not least because a series of studies have confirmed a relationship between intron phase and intron position at the structural boundaries between protein domains (de Souza SJ et al, Proc Natl Acad Sci USA, 1996; 93 (25): 14632-6). In most cases, a protein template is associated with one or more exons bounded either by phase-zero introns (introns that do not interrupt a codon) or by introns of the same phase at both ends. Exons encoding such "mobile" domains can therefore be combined more readily into novel proteins that are chimeras of these templates, without any reading-frame problems (de Souza SJ et al, Proc Natl Acad Sci USA, 1998; 95 (9): 5094-9; Kolkman JA and Stemmer WP, Nat Biotechnol., 2001; 19 (5): 423-8).
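The notion of intron phase used above reduces to simple arithmetic. The Python sketch below is an illustration, not part of the patent: it assumes that only the coding portion of each exon is counted (from the ATG onward) and takes the phase of each intron as the cumulative coding length modulo 3. An internal exon flanked by introns of equal phase (a "symmetric" exon) has a coding length divisible by 3, so it can be duplicated, deleted or moved between such introns without shifting the reading frame. The function names and example lengths are hypothetical.

```python
from typing import List


def intron_phases(coding_exon_lengths: List[int]) -> List[int]:
    """Return the phase of each intron of a gene.

    coding_exon_lengths holds the number of coding bases in each exon, 5' to 3'.
    Phase 0: the intron lies between two codons.
    Phase 1: the intron follows the first base of a codon.
    Phase 2: the intron follows the second base of a codon.
    """
    phases = []
    cumulative = 0
    # Every exon except the last one is followed by an intron.
    for length in coding_exon_lengths[:-1]:
        cumulative += length
        phases.append(cumulative % 3)
    return phases


def is_symmetric(exon_index: int, coding_exon_lengths: List[int]) -> bool:
    """True if an internal exon is flanked by two introns of the same phase."""
    if not 0 < exon_index < len(coding_exon_lengths) - 1:
        raise ValueError("only internal exons are flanked by two introns")
    phases = intron_phases(coding_exon_lengths)
    # Intron exon_index - 1 precedes the exon; intron exon_index follows it.
    return phases[exon_index - 1] == phases[exon_index]


if __name__ == "__main__":
    lengths = [100, 90, 95, 60]  # hypothetical coding lengths
    print(intron_phases(lengths))    # [1, 1, 0]
    print(is_symmetric(1, lengths))  # True: 90 is a multiple of 3
    print(is_symmetric(2, lengths))  # False: 95 is not
```

Because the phase after an exon equals the phase before it plus the exon length (mod 3), checking phase equality and checking divisibility of the exon length by 3 are equivalent.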
However, it has also been found that a functional protein domain, when physically separated from the rest of the primary translation product (that is, the protein translated directly from the mRNA transcribed from a gene), can have a biological activity distinct from that of the complete protein including its preceding and following sequences. Such functional protein domains are generated in vivo by proteolytic cleavage by endopeptidases produced either by the cell that expresses the primary translation product or by other cells, for example when the primary translation product is secreted or associated with the cell membrane. As these events are increasingly characterized, it is becoming clear that functional protein domains generated in this way can have important physiological activities (Kiessling LL and Gordon EJ, Chem. Biol., 1998; 5 (3): R49-R62; Halim NS, The Scientist, 2000; 1 (16): 20; Blobel CP, Curr. Opin. Cell Biol., 2000; 12 (5): 606-).
Many commercially valuable eukaryotic proteins correspond to such functional protein domains, which are encoded by a subset of the coding exons that make up the complete gene.
An example is endostatin, an endogenous inhibitor of angiogenesis and tumor growth, which corresponds to a C-terminal proteolytic fragment of collagen XVIII α1, a collagen of the extracellular matrix. Endostatin is essentially encoded by the 3 exons at the 3' end of the collagen XVIII α1 (COL18A1) gene, but it exerts its full function only after being proteolytically released from the primary translation product, collagen XVIII α1 (O'Reilly MS et al, Cell, 1997; 88 (2): 277-285).
Another example is TNF-related activation-induced cytokine (TRANCE, also known as RANKL, OPGL or ODF), a type II transmembrane protein of the tumor necrosis factor (TNF) family that acts in a signaling pathway triggering osteoclast development. TRANCE, a membrane-anchored primary translation product, is cleaved by the metalloprotease-disintegrin TNF-α converting enzyme (TACE), and the resulting soluble TRANCE is a fully functional protein that promotes dendritic cell survival and has osteoclastogenic activity. The sequence of this soluble protein corresponds to the portion encoded by the last 3 exons of the TRANCE gene (Lum L et al, J. Biol. Chem., 1999; 274 (19): 13613-8).
There are two main methods for the production of functional protein domains on an industrial scale, but each has drawbacks.
Attempts have been made to emulate nature by first preparing a primary translation product and then subjecting it to proteolytic processing to obtain the desired protein.
Technically, this requires the whole process to be reproduced by recombinant DNA technology: not only must the primary translation product be expressed, but the protease specific for the desired functional protein domain must also be identified and expressed so that it can act on the primary translation product in a cellular model or in vivo system, ensuring correct cleavage before further processing.
The second method involves preparing an expression construct containing only the DNA sequence coding for the functional protein domain, isolated from the original mRNA or gene. Even this, the most commonly used technique, requires a series of manipulations that can significantly delay the development of the recombinant product (Makrides SC, Protein Expr. Purif., 1999; 17 (2): 183-; Kaufman RJ, Mol. Biotechnol., 2001; 16 (2): 151-60). The relevant coding sequence must be isolated from the complete cDNA sequence and modified so that it can be subcloned into an expression vector containing all the transcriptional and translational regulatory elements required for correct expression in the host cell. The construct is then used to transform host cells, and finally the transformants are screened so that clones expressing the foreign protein at high levels can be isolated.
Isolating these clones is time-consuming because, in addition to the steps listed above, conventional expression vectors must also recombine with the genomic DNA of the host cell: an expression construct that remains extrachromosomal is not stable, allows only transient expression of the protein, and is often insufficient for commercial-scale production.
Recombination between the foreign DNA (coding and non-coding) and the host cell genomic DNA is therefore a prerequisite for transmitting the expression construct to all the cells generated by subsequent rounds of DNA replication and mitosis of the originally transformed cell. Since the expression vector has no specific feature that promotes complete integration of the foreign sequence into the host cell genome, this process is essentially random and error-prone. Because the vector has little homology to any endogenous sequence, the non-homologous recombination event can involve any part of it; moreover, as is well known, non-homologous recombination occurs much more frequently than homologous recombination, often resulting in incomplete integration of the necessary exogenous sequences.
These problems have led to the development of enrichment and selection steps aimed at eliminating transformants that have integrated the expression construct incompletely. Since essential parts of the foreign sequence may be lost or altered during recombination, the relevant coding sequence may be mutated, truncated or not expressed at all. In any case, regardless of the techniques used to introduce the DNA and select the cells, most transformants do not produce the desired protein.
Finally, the literature has established that correct expression of recombinant proteins in eukaryotic cells depends on a number of factors associated with the particular host cell. Features such as toxicity of the recombinant protein, mRNA processing and stability, and post-translational events depend strongly on the product itself, on the coding and non-coding sequences of the expression vector, and on the interaction of the foreign sequences with the genomic background of the host cell. Random insertion of a complete recombinant gene, which may span many kilobases of DNA, can severely disrupt the host cell genome, compromising its stability and viability. Therefore, even when the exogenous sequences have been fully integrated, several clones cannot be used for protein production because the sequences have been inserted into genomic regions essential for cell metabolism and/or replication. Such clones may be lost because of selection pressures in cell culture or their slow replication, making it difficult to obtain enough cells stably expressing the desired protein.
To minimize the drawbacks of essentially uncontrolled integration of foreign genes, alternative strategies have been developed. They are mainly based on homologous recombination, a technique that uniquely allows specific exogenous sequences to be inserted at predetermined genomic positions in mammalian cells. This technique has been used mainly to modify genes or regulatory sequences in animal and cellular models, producing disrupted (non-functional) or chimeric genes, as described in recent reviews (Muller U, Mech. Dev., 1999; 82 (1-2): 3-21; Sedivy JM and Dutriaux A, Trends Genet., 1999; 15 (3): 88-90). For example, various vectors and selectable marker genes have been introduced into the genome of mouse embryonic stem (ES) cells to study the effects of genetic alterations on phenotypic characteristics such as hormone regulation, fertility, immune response and organ development.
The feasibility of producing recombinant proteins by homologous recombination has been demonstrated for complete primary translation products, obtained by placing the entire endogenous gene under the control of exogenous transcriptional regulatory sequences (WO 91/09955, WO 95/31560). Truncated proteins can also be obtained with antisense oligonucleotides which, once transfected into cells, pair with the endogenous mRNA and block ribosomes from traversing the entire coding sequence (WO 97/23244). However, none of these documents mentions the use of homologous recombination to integrate exogenous regulatory sequences so as to selectively express one or more exons of an endogenous gene of interest that encode a functional protein domain contained in the primary translation product.
Disclosure of Invention
It has now been found that a functional protein domain can be prepared in an alternative way when, for example, it corresponds to the N- or C-terminus of a primary translation product and is encoded by at least the 5'-most or 3'-most exon, respectively, of an intron-containing gene.
Therefore, a first object of the present invention is to provide a method for producing a protein which is a functional protein domain corresponding to the N- or C-terminus of a primary translation product of a gene, wherein the biological activity of the protein is distinct from that of the primary translation product, said method comprising:
(i) growing a host cell transfected with a DNA construct comprising:
(a) a regulatory DNA sequence capable of initiating or terminating transcription and translation of the DNA encoding the protein; and
(b) a DNA targeting region comprising sequences homologous to the genomic region 5' or 3', respectively, of the sequence encoding the protein, the construct being integrated into the host cell genomic DNA at a position determined by the DNA targeting region such that expression of the protein is under the control of the regulatory DNA;
(ii) culturing the homologously recombinant cells; and
(iii) collecting the functional protein domain.
The invention is useful for preparing proteins that form functional protein domains encoded by one or more coding exons which, in the cell, lack at one end the regulatory DNA sequences needed to express and translate them directly as a complete molecule, rather than as a fragment of a primary translation product of a completely different nature. The term protein includes short peptides or oligopeptides (e.g. comprising no more than 30 amino acid residues) and longer polypeptides (comprising more than 30 amino acid residues). Peptides and polypeptides that can be prepared using the invention include those normally generated by proteolytic maturation of the primary translation product.
The present invention also discloses a method for selectively expressing exons encoding functional protein domains whose sequences correspond to the C- or N-terminal sequences of the primary translation product of a gene, wherein the proteins encoded by said exons have a biological activity distinct from that of the primary translation product, said method comprising growing a host cell transfected with a DNA construct comprising:
(a) a regulatory DNA sequence capable of initiating the transcription and translation or of terminating the transcription and translation of an exon coding for a protein corresponding to a functional protein domain; and
(b) a DNA targeting region comprising sequences homologous to the genomic region 5' or 3', respectively, of an exon encoding the protein, the construct being integrated into the host cell genomic DNA at a position determined by the DNA targeting region such that expression of the exon is under the control of the regulatory DNA.
A host cell modified by the method of the invention selectively expresses the exonic sequences in its genome that code for the functional protein domain as a new mRNA molecule, which is translated directly into the functional protein domain without the need for proteolytic processing specific to any cell or tissue type.
In a preferred embodiment, the peptide or polypeptide forming the functional protein domain is encoded by at least the 5'-most or 3'-most exon of an intron-containing gene. Thus, in a first preferred embodiment, there is provided a method for producing a protein which is a functional protein domain corresponding to the C-terminus of the primary translation product of a target gene and encoded by at least the 3'-most exon of the intron-containing target gene, said method comprising:
(i) growing a host cell transfected with a DNA construct which, following integration into the host cell genome by homologous recombination, is operably linked to said exon, said construct comprising:
(a) a DNA targeting region comprising a sequence homologous to a genomic region 5' of an exon encoding said protein;
(b) a transcription module comprising a DNA sequence capable of activating the transcription of a DNA coding for a functional protein domain;
(c) a translation module comprising a DNA sequence capable of initiating the translation of the functional protein domain; and optionally
(d) a splicing module comprising an unpaired 5' splice donor site, complementary to the unpaired 3' splice acceptor site of the endogenous exon coding for the N-terminus of the functional protein domain, which allows splicing of the primary transcript such that the translation module is juxtaposed in frame with the sequence coding for the functional protein domain;
(ii) culturing the homologously recombinant cells; and
(iii) collecting the functional protein domain.
Accordingly, in this embodiment of the invention, the construct should provide exogenous sequences forming a regulatory unit that enables correct transcription and translation of the exon(s) coding for the functional protein domain. The DNA targeting region is formed by sequences belonging to the genomic region 5' of the sequence coding for the functional protein domain (e.g. the adjacent intron of the gene of interest at its 5' end), alone or together with sequences belonging to the exon(s) coding for the functional protein domain. To increase the length of the homologous regions, DNA sequences belonging to adjacent coding and/or non-coding sequences of the gene of interest (or possibly of adjacent genes) may also be included.
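How the targeting region fixes the integration site can be illustrated with a toy string model. The sketch below is a simplification under stated assumptions: sequences are plain strings, each homology arm must occur exactly once in the "genome", and recombination is modelled as replacing everything between a left (5') and a right (3') arm with the exogenous regulatory unit, which covers both simple insertion into an intron and replacement of part of it. The two-arm design, the function names and the example sequences are hypothetical and are not taken from the patent.

```python
def find_unique(genome: str, arm: str) -> int:
    """Return the start index of a homology arm that must occur exactly once."""
    start = genome.find(arm)
    if not arm or start == -1:
        raise ValueError("homology arm not found")
    if genome.find(arm, start + 1) != -1:
        raise ValueError("homology arm matches more than one site")
    return start


def recombine(genome: str, left_arm: str, cassette: str, right_arm: str) -> str:
    """Model a double crossover: the segment between the arms becomes the cassette."""
    left_end = find_unique(genome, left_arm) + len(left_arm)
    right_start = find_unique(genome, right_arm)
    if right_start < left_end:
        raise ValueError("arms overlap or are in the wrong order")
    # Endogenous bases between the arms are replaced; if the arms are
    # contiguous, the cassette is simply inserted between them.
    return genome[:left_end] + cassette + genome[right_start:]


if __name__ == "__main__":
    # Upper case: hypothetical endogenous intron; lower case: exogenous cassette.
    intron = "GATTACANNNNCCTAGG"
    print(recombine(intron, "GATTACA", "gccaccatgg", "CCTAGG"))
    # GATTACAgccaccatggCCTAGG
```

Requiring each arm to be unique mirrors the practical point that a targeting region should not direct the construct to more than one genomic site.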
The methionine (ATG) codon required for translation initiation, which must lie within an appropriate Kozak consensus sequence, can be provided by the intron 5' of the exon coding for the functional protein domain, by the exon containing the N-terminus of the functional protein domain, or by one or more natural or synthetic exons, contained in the exogenous sequence of the regulatory unit that initiates transcription, which are inserted upstream of the exons coding for the functional protein domain and become operatively linked to the endogenous coding sequence. If the translation initiation codon is endogenous, the transcription module will include a suitable 5' untranslated region.
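The requirement that the initiating ATG sit in a suitable Kozak context can be checked mechanically. The sketch below applies the commonly used rule of thumb that the two most influential positions are a purine (A or G) at -3 and a G at +4, counting the A of ATG as +1. The "strong/adequate/weak" labels and the function name are illustrative assumptions, not criteria stated in the patent.

```python
def kozak_strength(seq: str, atg_index: int) -> str:
    """Classify the context of the ATG starting at seq[atg_index].

    Positions follow the usual convention: the A of ATG is +1, so -3 is
    atg_index - 3 and +4 is atg_index + 3 (0-based string indices).
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("sequence too short to inspect positions -3 and +4")
    purine_at_minus3 = seq[atg_index - 3] in "AG"
    g_at_plus4 = seq[atg_index + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


if __name__ == "__main__":
    print(kozak_strength("GCCACCATGG", 6))  # strong
    print(kozak_strength("GCCGCCATGC", 6))  # adequate
    print(kozak_strength("GCCTCCATGA", 6))  # weak
```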
The splicing module comprising the unpaired splice donor site is associated with natural or synthetic exons placed upstream of the exon coding for the N-terminus of the functional protein domain. Because, after splicing, the methionine codon lies in the same reading frame as the functional protein domain, the splicing module makes this methionine the N-terminal residue of the functional protein domain. In this embodiment, the gene of interest encoding the primary translation product may already be expressed at a significant level; otherwise, the exogenous regulatory sequence must provide a new transcription and translation start site 5' of the exon(s) encoding the functional protein domain.
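The in-frame requirement for the splicing module reduces to modular arithmetic. Assume the exogenous exon contributes n coding bases from its ATG up to the splice donor, and the endogenous acceptor exon was originally preceded by an intron of phase p. The downstream codons of the functional protein domain are then read in their original frame exactly when n mod 3 = p. With p = 2, for instance, the exogenous exon must end with the first two bases of a codon that the endogenous exon completes. The function names and sequences below are hypothetical.

```python
from typing import List


def junction_in_frame(exogenous_coding_len: int, acceptor_intron_phase: int) -> bool:
    """Check that splicing keeps the endogenous exon in its original frame.

    exogenous_coding_len: bases from the exogenous ATG to the splice donor.
    acceptor_intron_phase: phase (0, 1 or 2) of the intron that originally
    preceded the endogenous exon coding for the domain's N-terminus.
    """
    if acceptor_intron_phase not in (0, 1, 2):
        raise ValueError("intron phase must be 0, 1 or 2")
    return exogenous_coding_len % 3 == acceptor_intron_phase


def codons(cds: str) -> List[str]:
    """Split a coding sequence into complete codons, dropping leftover bases."""
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


if __name__ == "__main__":
    exogenous = "ATGGC"    # hypothetical: ATG plus the first two bases of a codon
    endogenous = "TGGAGCC"  # hypothetical exon that followed a phase-2 intron
    print(junction_in_frame(len(exogenous), 2))  # True
    print(codons(exogenous + endogenous))        # ['ATG', 'GCT', 'GGA', 'GCC']
```

Note that the junction codon (GCT here) is a hybrid of exogenous and endogenous bases, so its identity depends on the exogenous sequence chosen.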
FIGS. 1A-F show some examples of how this embodiment of the invention can be practised.
In a second preferred embodiment, the present invention provides a method for producing a protein which is a functional protein domain corresponding to the N-terminus of a primary translation product of a target gene and encoded by at least the 5'-most exon of the intron-containing target gene, said method comprising:
(i) growing a host cell transfected with a DNA construct which, following integration into the host cell genome by homologous recombination, is operably linked to said exon, said construct comprising:
(a) a DNA targeting region comprising a sequence homologous to a genomic region 3' of the exon encoding said protein;
(b) a transcription module comprising a DNA sequence capable of terminating transcription of the genomic DNA;
(c) a translation module comprising a DNA sequence capable of terminating the translation of the functional protein domain; and optionally
(d) a splicing module comprising an unpaired 3' splice acceptor site, complementary to the unpaired 5' splice donor site of the endogenous exon coding for the C-terminus of the functional protein domain, which allows splicing of the primary transcript such that the translation module is juxtaposed in frame with the sequence coding for the functional protein domain;
(ii) culturing the homologously recombinant cells; and
(iii) collecting the functional protein domain.
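The splicing step of this embodiment, which joins the last endogenous exon of the functional protein domain to an exogenous exon, can be mimicked by removing the intron between an unpaired donor and acceptor. The sketch assumes the canonical GT...AG boundaries of most nuclear introns and ignores branch points and other splicing signals. The sequences are hypothetical: an endogenous exon ending in the middle of a codon, a hybrid intron made of an endogenous donor and an exogenous acceptor, and an exogenous exon carrying a stop codon followed by the AATAAA polyadenylation signal.

```python
def splice_out(pre_mrna: str, intron_start: int, intron_end: int) -> str:
    """Remove pre_mrna[intron_start:intron_end] as an intron.

    The removed segment must start with the GT donor dinucleotide and end
    with the AG acceptor dinucleotide (the canonical GT-AG rule).
    """
    intron = pre_mrna[intron_start:intron_end]
    if not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("segment does not follow the GT-AG rule")
    return pre_mrna[:intron_start] + pre_mrna[intron_end:]


if __name__ == "__main__":
    endogenous_exon = "ATGAAAGC"         # hypothetical; ends inside a codon
    hybrid_intron = "GTAAGTCTTTTTTTCAG"  # endogenous donor ... exogenous acceptor
    exogenous_exon = "TTAAAATAAA"        # completes GCT, then TAA stop, then AATAAA
    pre_mrna = endogenous_exon + hybrid_intron + exogenous_exon
    start = len(endogenous_exon)
    print(splice_out(pre_mrna, start, start + len(hybrid_intron)))
    # ATGAAAGCTTAAAATAAA
```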
Accordingly, in this embodiment of the invention, the construct should provide exogenous sequences that correctly terminate transcription and translation of the exon(s) encoding the protein that forms the functional protein domain. The DNA targeting region is formed by sequences belonging to the genomic region 3' of the sequence encoding the functional protein domain (e.g. the adjacent intron of the gene of interest at its 3' end), alone or in combination with sequences belonging to the exon encoding the C-terminus of the protein, the intron at its 3' end, or both. To increase the length of the homologous regions, DNA sequences belonging to adjacent coding and/or non-coding sequences of the gene of interest (or possibly of adjacent genes) may also be included.
The stop codon required for translation termination can be provided by the intron 3' of the exon coding for the functional protein domain, or by one or more natural or synthetic exons, introduced by homologous recombination together with the exogenous transcription-termination sequences, which are inserted downstream of the exons coding for the functional protein domain and become operatively linked to the endogenous coding sequence. If the translation termination codon is endogenous, the transcription module will include a suitable 3' untranslated region.
The splicing module comprising the unpaired splice acceptor site is associated with natural or synthetic exons placed downstream of the exon coding for the C-terminus of the functional protein domain. Because, after splicing, the stop codon lies in the same reading frame as the functional protein domain, the splicing module makes it the stop codon of the functional protein domain. In this embodiment, the gene of interest encoding the primary translation product must be expressed at a significant level, since the exogenous sequence serves only to terminate transcription and translation at a different position.
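Whether a spliced message ends the functional protein domain at the intended place can be checked by translating it until the first in-frame stop codon. The sketch below builds the standard genetic code table and stops at the first TAA, TAG or TGA in the reading frame that starts at the given position. The example mRNAs are hypothetical; the second one shows how a single extra base in the exogenous exon moves the stop codon and changes the C-terminus.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_to_stop(mrna: str, start: int = 0) -> str:
    """Translate from `start` until the first in-frame stop codon (not included)."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            return "".join(protein)
        protein.append(aa)
    raise ValueError("no in-frame stop codon found")


if __name__ == "__main__":
    print(translate_to_stop("ATGAAAGCTTAAAATAAA"))   # MKA
    print(translate_to_stop("ATGAAAGCTTTAAAATAAA"))  # MKALK (frameshifted stop)
```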
Some examples of implementing this embodiment are shown in FIGS. 1G-L.
Another object of the invention is to provide constructs which, once correctly integrated into the host cell genome by homologous recombination, allow the host cell to express a new mRNA that comprises, of all the exons of the gene of interest coding for the primary translation product, only the exon(s) coding for the functional protein domain.
The choice of homologous recombination strategy influences the final structure of the gene of interest, since the regulatory unit can either be inserted into the adjacent intron or replace part of this intron together with all or part of the genomic region of the same gene that does not code for the functional protein domain.
The regulatory unit is integrated by homologous recombination into the gene comprising the exon(s) coding for the functional protein domain, and it comprises exogenous sequences including a transcription module, a translation module and optionally a splicing module. These exogenous sequences provide the elements that must be joined to the endogenous sequences encoding the functional protein domain. The result is a new recombinant gene in the host cell genome comprising endogenous transcriptional and translational regulatory elements at one end, the endogenous exons and introns encoding the functional protein domain in the middle, and exogenous transcriptional and translational regulatory elements at the other end.
The invention thus provides a method for preparing functional protein domains, namely fragments of a primary translation product consisting of one or more distinct protein domains encoded by a specific subset of exonic sequences. The method is based on integrating, by a single homologous recombination event, a regulatory unit into a eukaryotic gene encoding a primary translation product, at the level of an intronic genomic region immediately adjacent to the exonic sequence encoding the functional protein domain. Depending on the position of the relevant exonic sequence in the gene of interest, the targeted intron lies immediately 5' or 3' of the exonic sequence when the functional protein domain is at the C- or N-terminus, respectively, of the primary translation product.
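The rule in the last sentence, which intron receives the regulatory unit, can be written as a small function. Exons are numbered 0 to n-1 from 5' to 3', and intron i lies between exons i and i+1. The sketch assumes the functional protein domain is encoded by a contiguous run of exons that includes either the first or the last exon; the function name, the returned labels and the example gene are hypothetical.

```python
from typing import Tuple


def target_intron(n_exons: int, first: int, last: int) -> Tuple[int, str]:
    """Choose the intron into which the regulatory unit is integrated.

    Exons are numbered 0..n_exons-1 (5' to 3'); intron i separates exons
    i and i+1. The domain is encoded by exons first..last inclusive.
    Returns the intron index and the kind of regulatory unit required.
    """
    if not 0 <= first <= last < n_exons:
        raise ValueError("invalid exon range")
    c_terminal = last == n_exons - 1
    n_terminal = first == 0
    if c_terminal and n_terminal:
        raise ValueError("the range covers the whole gene, not a domain")
    if c_terminal:
        # Intron 5' of the first domain exon: initiate transcription/translation.
        return first - 1, "initiation"
    if n_terminal:
        # Intron 3' of the last domain exon: terminate transcription/translation.
        return last, "termination"
    raise ValueError("internal domains are not handled by this sketch")


if __name__ == "__main__":
    # Hypothetical 5-exon gene whose C-terminal domain is encoded by the last 3 exons.
    print(target_intron(5, 2, 4))  # (1, 'initiation')
    # Hypothetical 5-exon gene whose N-terminal domain is encoded by the first 2 exons.
    print(target_intron(5, 0, 1))  # (1, 'termination')
```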
The invention has several important advantages over the prior art, in which a functional protein domain is prepared, using methods known in the art, either by isolating its coding sequence and expressing it under the control of flanking regulatory sequences, or by expressing both the primary translation product and the specific protease that generates the functional protein domain.
The invention provides a method for preparing functional protein domains that are C-terminal or N-terminal fragments of primary translation products and correspond to known exon/intron structures of the genomic DNA. The method involves integrating exogenous regulatory sequences into the genomic DNA 5' (if the functional protein domain corresponds to a C-terminal fragment) or 3' (if it corresponds to an N-terminal fragment) of the exons coding for these protein domains. The method generates recombinant genes which, unlike the genes encoding the primary translation products, are directly transcribed and translated by the cell into the functional protein domain.
The method of the invention produces functional protein domains through precise modification of the host cell genome, made possible by the nature of homologous recombination. The amount of foreign sequence to be integrated into the host cell genome is very limited, since the coding sequence already present in the host cell genome is used as such; only the additional elements actually required, such as transcriptional and/or translational regulatory elements, need to be integrated.
Using the host cell's own sequences encoding the functional protein domain has the further advantages of, on the one hand, eliminating any variation of these coding sequences arising from recombination and, on the other hand, preserving the same post-transcriptional (e.g. splicing) and/or post-translational (e.g. glycosylation, phosphorylation) processes that accompany the natural maturation of the functional protein domain in vivo. The use of a single regulatory unit makes it unnecessary to manipulate the cDNA coding for the primary translation product, to isolate the fragment coding for the functional protein domain and to ligate it into an expression vector.
Finally, it has been demonstrated that genomic expression constructs (i.e. containing one or more synthetic and/or natural introns) are expressed more efficiently, as a result of the splicing process, than the same constructs lacking introns (i.e. constructs typically prepared by techniques disclosed in the literature). The method of the invention avoids introducing foreign intron sequences by using the introns that naturally interrupt the sequence encoding the functional protein domain.
Detailed description of the invention
In the following sections, the basic elements of the invention are described with reference, where appropriate, to the literature on homologous recombination techniques (Muller U, Mech. Dev., 1999; 82 (1-2): 3-21; Hasty P et al, in "Gene Targeting: A Practical Approach", ed. Joyner AL, Oxford Univ. Press, 1999; 1-35) and protein expression techniques (Makrides SC, Protein Expr. Purif., 1999; 17 (2): 183-202; Kaufman RJ, Mol. Biotechnol., 2001; 16 (2): 151-160).
Functional protein domain
The expression "functional protein domain" (FPD) refers to a protein fragment of the primary translational product of a gene with biological function. The functional protein domain may consist of one or more protein domains (identical or different from each other), should contain all the biological characteristics necessary for proper folding as an independent folding unit, and has a predetermined biological activity.
In the context of cellular genomic DNA, a functional protein domain is encoded by a portion of the entire coding region of a gene, while the primary translational product is encoded by the entire gene coding sequence. The genomic sequence encoding the functional protein domain is deleted at one or both ends of appropriately placed regulatory sequences, which are recognized by the expression apparatus (machinery) of the cell to generate the primary transcription and translation products. As a result, the functional protein domain corresponds to a part of the translation region contained in the mRNA transcribed by the cell, and the active regulatory sequences at both ends are required for the alternative splicing event, but such a functional protein domain cannot be obtained by the alternative splicing event.
Therefore, a complete and functional reading frame is subject to transcriptional and translational regulatory sequences at both ends, and genes with such a reading frame or mRNAs encoding functional protein domains are not present, since there is no genomic DNA directly transcribed and translated into the protein corresponding to the functional protein domain. Starting from a gene contained in its own genome, a cell can prepare a functional protein domain in which the coding sequence for such a protein is embedded only after specific proteolytic modification after transcription and translation of the gene into a primary translation product.
In the sense of the present invention, proteolysis, which is derived from a functional protein domain, is not a simple determination of the position of the protein, as when the signal peptide is recognized and eliminated extracellularly or intraproteolytically. Proteolysis, derived from a functional protein domain, is more used to separate the functional protein domain from the primary translational product, in order to perform one or more biological activities whose physiological effects are distinct from those of the primary translational product. Similar proteolytic activities are not expressed constitutively in any cell, e.g., enzymes that eliminate signal peptides, and are expressed specifically only in certain cell types or only under certain metabolic conditions.
The function of a protein sequence arises from its interactions with other proteins, nucleic acids, lipids, sugars or any other organic or inorganic ligand. Accordingly, in the sense of the present invention, a functional protein domain is defined by distinctive properties relating to the nature of these interactions, which fall into three basic groups.
The distinctive effect of the first basic group of functional protein domains rests on the following property. The domain retains the same binding properties (ligand specificity and affinity) whether it is isolated or remains part of the primary translational product. However, once separated from the remainder of the primary translational product, it no longer allows the cell or organism to sense the presence of these ligands. For example, extracellular proteases can separate the extracellular ligand-binding domain of a membrane receptor from the receptor's transmembrane and intracellular portions. The released domain sequesters the receptor's ligand and thereby blocks subsequent activation of the intracellular signaling pathway.
Released extracellular domains are functional protein domains in the sense of the present invention because they trap ligands and silence signaling pathways. They prevent the specific cellular responses that would follow ligand binding to the intact primary translational product, without signaling the presence of the ligands to the cell. These functional protein domains, commonly known as decoy receptors, play an important physiological role because they can fine-tune, for example, the effects of circulating chemokines and cytokines in the organism (Mantovani A et al., Trends Immunol., 2001; 22(6): 328-336).
The distinctive effect of the second basic group of functional protein domains rests on the following property. When separated from the primary translational product, the domain produces a new physiological effect through high-affinity recognition of ligands that the primary translational product binds weakly or not at all. Functional protein domains with important physiological effects are increasingly described in the literature, for example proteolytic fragments of extracellular matrix structural proteins with strong anti-angiogenic properties (Cao Y, Int. J. Biochem. Cell Biol., 2001; 33(4): 357-69). This indicates that proteins also act as reservoirs of functional protein domains. These domains remain hidden in the primary translational products until proteolysis releases them in functional form, as part of specific physiological mechanisms.
The third basic group of functional protein domains comprises proteins released proteolytically from inactive precursor proteins. Many signaling and secreted proteins, such as pro-inflammatory cytokines, can exert their physiological functions only after being released from their primary translational products (Dinarello CA, Chest, 2000; 118(2): 503-508).
Various methods can be used to identify functional protein domains. A common practice is to digest the protein with a panel of proteases of different specificities and then test the activity of the resulting fragments (Carrey E., in "Protein Structure: A Practical Approach", ed. Creighton T., Oxford Univ. Press, pp. 117-144 (1989)). This method is fast, simple and sensitive. It is limited, however, by the amount of intact native protein available for digestion, by the specificity of the proteases, and by the sensitivity of the assay used to confirm that a proteolytic fragment is a functional protein domain.
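Protease specificity determines which fragments such a digest can yield. The following Python sketch illustrates this with a simplified in-silico trypsin digest. It assumes the textbook rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) except when the next residue is proline (P). The function name and example sequence are hypothetical, and the sketch ignores missed cleavages and other exceptions seen in real digests.

```python
import re

# Simplified trypsin rule (an assumption for illustration): cut after K or R,
# but not when the following residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def trypsin_digest(protein: str) -> list[str]:
    """Return the fragments of a hypothetical complete tryptic digest."""
    # re.split on a zero-width pattern (Python 3.7+) cuts between residues;
    # the empty string produced after a C-terminal K/R is discarded.
    return [frag for frag in TRYPSIN_SITE.split(protein.upper()) if frag]
```

For the made-up sequence MKWVTFISLLRPAEKGAR, this returns MK, WVTFISLLRPAEK and GAR. The R-P bond is left intact. Swapping in another regular expression would model a protease with a different specificity.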
Moreover, advances in protein separation and sequencing, together with advances in bioinformatics, now allow many protein samples to be analyzed in parallel. These techniques provide identity and quantitative information even for relatively low-abundance molecular species (Lottspeich F., Angew. Chem. Int. Ed. Engl., 1999; 38(17): 2476-). For example, it is now possible to identify and isolate most, if not all, of the proteins in a mixture that has not undergone preliminary fractionation. This applies both to intact proteins and to fragments obtained by controlled digestion of the biological sample with proteolytic enzymes that act on all proteins (Spahr CS et al., Proteomics, 2001; 1(1): 93-107). Novel functional protein domains can be identified by combining protein digestion, detection, isolation and sequence comparison with high-throughput analysis in suitable cell-biological or biochemical assays (Kuhlmann J, Int. J. Clin. Pharmacol. Ther., 1997; 35(12): 541-552).
The proteolytic activity of proteases such as metalloproteases (Raza SL, Cornelius LA, J. Investig. Dermatol. Symp. Proc., 2000; 5(1): 47-54) and cysteine proteases (caspases) (Los M et al., Trends Immunol., 2001; 22(1): 31-4) has been found to modulate important physiological activities precisely, going well beyond simple protein degradation. Several trends make it possible to identify functional protein domains for therapeutic and commercial use:
- proteases account for 1.5-2% of all human genes (Southan C, J. Pept. Sci., 2000; 6(9): 453-8);
- proteolytic cleavage-site motifs are becoming increasingly easy to identify (Turk BE, Nat. Biotechnol., 2001; 19(7): 661-7);
- the number of proteases and protein species that can be tested keeps growing.
Suppose a functional protein domain has been identified at the N- or C-terminus of the primary translational product, and the relationship between the protein's domain organization and the exon/intron organization of the corresponding gene has been established. Constructs containing a regulatory unit and appropriate targeting regions can then be used, according to the method of the invention, to generate cells expressing these protein species.
The choice of the exon(s) used to express the functional protein domain is guided by evidence from the analysis of fragmented or unfragmented protein samples. This evidence is combined with the calculated three-dimensional structure of the protein, homology with other known functional protein domains, mutagenesis and structure-function studies, and other modeling and computational information.
When first characterized in vitro or in vivo, the functional protein domain may not correspond exactly to a discrete number of exons. Even so, the structure and/or homology of the protein sequence usually allows its main elements to be assigned to specific N- or C-terminal exons. The N- or C-terminal residues of the functional protein domain prepared by the method of the invention need not coincide with those of the domain initially identified in vitro or in vivo, but the prepared domain must have comparable activity.
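Once the coding length of each exon is known, assigning a domain to exons reduces to coordinate arithmetic. The sketch below assumes hypothetical inputs:
- a list of per-exon coding lengths, in nucleotides, counted from the native ATG and excluding untranslated parts;
- a domain given as 1-based, inclusive residue positions.

It returns the exons whose coding sequence overlaps the domain.

```python
def exons_for_domain(exon_cds_lengths: list[int], first_res: int, last_res: int) -> list[int]:
    """Return 0-based indices of exons encoding any residue in first_res..last_res.

    exon_cds_lengths: coding nucleotides contributed by each exon, in order
    (hypothetical input; untranslated parts of exons are excluded).
    Residues are 1-based and inclusive.
    """
    nt_start = 3 * (first_res - 1)  # first coding nucleotide of the domain (0-based)
    nt_end = 3 * last_res           # one past its last coding nucleotide
    hits = []
    exon_start = 0
    for idx, length in enumerate(exon_cds_lengths):
        exon_end = exon_start + length
        if exon_start < nt_end and exon_end > nt_start:  # half-open interval overlap
            hits.append(idx)
        exon_start = exon_end
    return hits
```

Consider a made-up gene with coding exons of 90, 150, 120 and 201 nucleotides. `exons_for_domain([90, 150, 120, 201], 85, 187)` returns `[2, 3]`. The domain therefore lies in the last two exons, i.e. it is C-terminal, and the regulatory unit would be targeted 5' of exon index 2.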
Other functional screening methods can further confirm and characterize a proteolytic fragment as a functional protein domain. These include molecular biology techniques such as random PCR/deletion mutagenesis, proteolytic cleavage-site mapping, phage display and two-hybrid systems (WO 96/31625; WO 90/04788; Parry S et al., Biochem. Biophys. Res. Commun., 2001; 283(3): 715-20; Kawasaki M and Inagaki F, Biochem. Biophys. Res. Commun., 2001; 280(3): 842-4). They also include non-homology-based methods (Marcotte EM, Curr. Opin. Struct. Biol., 2000; 10(3): 359-65; WO 00/11206). Before functional domains are produced on a large scale by the method of the invention, small quantities of fragments of the primary translational product can be prepared by conventional expression techniques. These fragments, of varying length, are tested in an assay to identify the minimal or sufficient protein sequence corresponding to the functional protein domain.
Finally, a growing number of algorithms and software tools are available on the Internet (Teichmann SA et al., Curr. Opin. Struct. Biol., 2001; 11(3): 354-63; Skolnick J and Fetrow JS, Trends Biotechnol., 2000; 18(1): 34). Used separately or in combination, they compare protein or DNA sequences of unknown function against databases of sequences of known structure and/or biological activity. The query sequences can be, for example, translated ESTs or the output of genome sequencing projects. Such in silico analysis gives a better approximation of the position and function of protein domains in proteins of unknown function, or in genes that have not yet been cloned. These bioinformatic tools therefore facilitate the identification of functional protein domains to be expressed by the method of the invention.
Among commercially valuable functional protein domains, a preferred group consists of those with therapeutic efficacy. Most therapeutic proteins fall into three classes:
- regulatory factors, including hormones, cytokines, lymphokines, chemokines, receptors, and other regulators of cell growth and metabolism;
- blood products, including serum-derived blood factors and enzymatic plasminogen activators;
- monoclonal antibodies.

Genes belonging to these three classes encode primary translational products that contain functional protein domains. These genes can be used as target genes, and the functional protein domains can be prepared from their endogenous exons by the method of the invention.
However, the scientific literature cited in the background and examples of the present invention shows that therapeutically effective functional protein domains have also been identified in primary translational products not originally classified in these three groups. Examples include membrane-bound proteins, transmembrane proteins, enzymes, extracellular matrix proteins, intracellular signaling proteins, structural proteins and nuclear proteins. The genes encoding these primary translational products can therefore also be used as target genes for producing functional protein domains from endogenous exons by the method of the invention.
Host cell
The invention is generally applicable to protein sequences of eukaryotic origin, since essentially only eukaryotic genes contain introns. The host cell is therefore typically a eukaryotic cell, such as a mammalian cell, although other eukaryotic cells, for example from plants, insects, yeasts or fungi, may also be used. Since most functional protein domains of interest are expected to belong to human proteins, human host cells are preferred.
Any eukaryotic cell may be used that contains at least one intron-containing copy of a gene encoding a primary translational product with the functional protein domain of interest. Preferred host cells are differentiated and/or immortalized mammalian cells, particularly cells of human, monkey or rodent origin, that carry the exons encoding the functional protein domain of interest. Examples include:
- SV40-transformed African green monkey kidney CV1 cells (commonly called COS cells);
- Chinese hamster ovary (CHO) cells;
- human embryonic kidney (HEK) 293 cells;
- baby hamster kidney (BHK) cells;
- Madin-Darby canine kidney (MDCK) epithelial cells;
- other stem cells and differentiated or undifferentiated eukaryotic cell lines.
Before the method of the invention is applied, these cells may already have been modified at the genomic level by homologous or non-homologous integration of other viral or non-viral constructs. Such modifications may alter the expression and/or structure of the gene of interest or of other genes. In transformed or immortalized cell lines carrying multiple copies of the gene of interest, the regulatory unit can be inserted at one or more of the available sites by successive rounds of homologous recombination.
The choice of host cell must also take into account the exogenous regulatory units integrated 5' or 3' of the relevant exon(s), so that these sequences are fully active, either constitutively or after induction. For example, the method of the invention can be applied to immortalized somatic hybrid cells for the expression of specific functional protein domains. These include, in particular, functional peptides or polypeptides derived from immunoglobulins. They also include any other functional protein domain whose transcription is placed, by the method of the invention, under the control of immunoglobulin-specific promoter and/or enhancer elements.
Primary cells in culture are modified by homologous recombination at frequencies comparable to those of immortalized cell lines (Hatada S et al., Proc. Natl. Acad. Sci. USA, 2000; 97(25): 13807-11). The method of the invention is therefore also applicable to primary cells whenever the functional protein domain must be produced in a comparable cellular environment, for example for gene therapy. In that case the functional protein domain is of human origin and the host cell is a human cell.
Two further criteria apply when selecting a host cell type for the method of the invention: the cell type's intrinsic capacity for homologous recombination, and the actual transcriptional state of the gene of interest. Evaluating these characteristics in candidate cell types beforehand helps identify a cell type from which clones producing the desired functional protein domain can be isolated more quickly and directly.
Comparative tests of homologous recombination frequency in primate and mouse fibroblast cell lines have shown that the two differ significantly (Taghian DG and Nickoloff JA, Mol. Cell. Biol., 1997; 17(11): 6386-93). Similar tests with standard homologous recombination vectors, based on nuclear extracts and/or transfection, allow the recombination activity of a specific cell type to be quantified.
Gene-targeting efficiency at different loci in cultured human cells has been evaluated by comparing targeting at these loci with and without a medium that stimulates transcription at the targeted site (Thyagarajan B et al., Nucleic Acids Res., 1995; 23(14): 2784-90). In general, transcription at the target site significantly enhances gene targeting.
Finally, the host cell should express the enzymes that modify the functional protein domain post-translationally. It need not express the enzymes required to release the functional protein domain from the primary translational product of the gene of interest. For example, if the functional protein domain corresponds to the N-terminus of a secreted primary translational product, the host cell should be able to process the signal peptide. In other cases, the host cell should allow correct glycosylation or phosphorylation of the functional protein domain. The method of the invention yields clones producing the functional protein domain more efficiently in certain cell types. In these cell types, the gene of interest is expressed under culture conditions, but the proteolytic activity needed to release the domain is absent, and/or the gene is not expressed at a level sufficient for commercial exploitation.
Transcription module
The transcription module is the first DNA sequence contained in the exogenous regulatory unit. It supplies the transcriptional regulatory elements missing at one end of the targeted exon(s). The resulting primary transcript contains, of all the exons of the gene of interest, only those encoding the functional protein domain. The transcription module is integrated at one of two positions:
- 5' of the most 5' exon encoding the functional protein domain, when the domain corresponds to the C-terminus of the primary translational product (FIGS. 1A-F);
- 3' of the most 3' exon encoding it, when the domain corresponds to the N-terminus of the primary translational product (FIGS. 1G-L).
The transcription module is integrated in the gene of interest in the same orientation as the gene. Depending on the position and sequence of the functional protein domain, it can contain various DNA sequences:
- promoters;
- enhancers;
- transcription factor recognition sites;
- polyadenylation sites;
- any other DNA sequence capable of regulating transcription of DNA into an mRNA template, including sequences within any intron(s) added to the construct.
A functional promoter is defined as the site at which the RNA polymerase II complex initiates transcription of a gene. Enhancers and other transcription factor recognition sites interact with accessory proteins to promote the assembly of active transcription complexes, and can thereby increase promoter activity. When the functional protein domain is located at the C-terminus of the primary translational product, a promoter, with or without an enhancer, is an essential element of the DNA construct.
It is preferable to use a promoter/enhancer combination known to initiate gene expression, constitutively or after induction, in the host cell line in which homologous recombination is performed. For example, if the host cell line consists of pituitary cells that naturally express proteins such as growth hormone and prolactin, the promoter of one of these genes may be used. Promiscuous or constitutive regulatory DNA fragments that function in most cell types are also suitable. Examples include those from Rous sarcoma virus (RSV), simian virus 40 (SV40), mouse mammary tumor virus (MMTV), Moloney murine leukemia virus (MoMLV), cytomegalovirus (CMV) and Sindbis virus (SG). Other examples are promoters that regulate transcription of human genes, such as interferon-alpha (IFN-alpha), heat shock proteins (HSP), elongation factor-1alpha (EF-1alpha), metallothionein-I/-II (MT-I/-II), ubiquitin C (UbC) and leukosialin (LS). These latter promoters are particularly useful when the host cell is highly differentiated. An example is a T cell, in which the LS promoter is active.
A functional protein domain may, for whatever reason, be toxic to the host cell and/or inhibit its growth, so an inducible promoter may be needed to produce it. Examples of inducible promoters are:
- metallothionein-I/-II, which contains multiple metal response elements activated in the presence of heavy metals;
- Lac, a bacterial operator-repressor system induced by isopropyl thiogalactoside (IPTG) and adapted for use in mammalian cells.
When the functional protein domain is located at the N-terminus of the primary translational product, the transcription module should contain sequences that allow correct termination and 3'-end processing of the mRNA. This is a complex process in which cleavage of the primary transcript is coupled to polyadenylation. Most mammalian mRNAs carry a poly(A) tail, which is critical for mRNA stability and translation efficiency. The polyadenylation signal consists of two elements:
- an AATAAA sequence located 20-30 nucleotides upstream of the polyadenylation site;
- a GT-rich element located immediately downstream of that site.

Several potent poly(A) signals have been used in expression vectors and can also be used in regulatory units. They have been isolated from eukaryotic genes (bovine growth hormone, mouse beta-globin) and from viral genes (SV40 early transcription unit, herpes simplex virus thymidine kinase).
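The spacing rule for the polyadenylation signal can be checked mechanically when designing a transcription module. The sketch below lists the AATAAA hexamers that fall within the window. It assumes that "20-30 nucleotides upstream" is measured from the 3' end of the hexamer to the polyadenylation site. The function name and example sequence are hypothetical, and the downstream GT-rich element is not checked.

```python
def polya_signal_upstream(seq: str, site: int, min_dist: int = 20, max_dist: int = 30) -> list[int]:
    """Return start positions of AATAAA hexamers lying min_dist..max_dist nt upstream of `site`.

    site: 0-based index of the polyadenylation site in seq.
    Distance is measured from the 3' end of the hexamer (an assumption).
    """
    seq = seq.upper()
    hits = []
    pos = seq.find("AATAAA")
    while pos != -1:
        distance = site - (pos + 6)  # nucleotides between hexamer end and the site
        if min_dist <= distance <= max_dist:
            hits.append(pos)
        pos = seq.find("AATAAA", pos + 1)  # allow overlapping occurrences
    return hits
```

Take a made-up 3' region made of 10 G, AATAAA, 25 C, A and GTGTGT. Placing the site at index 41 returns `[10]`. Moving the site to index 47 (31 nucleotides away) returns an empty list.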
The level of gene expression depends mainly on the promoter and other transcriptional regulatory regions at the 5' end of the gene. A gene of interest modified with a construct that contains only transcription and translation termination sites must therefore already be sufficiently expressed by the host cell. In this case, the host cells selected should be those that already strongly express the gene encoding the primary translational product.
An additional sequence associated with the poly(A) signal, called a transcription terminator, can be included in the transcription module. It ensures that transcription does not continue into the adjacent 3' genomic sequence, which is unrelated to the functional protein domain. Such read-through could have two consequences:
- unnecessary sequences would be included in the primary transcript and could reduce or alter translation of the desired functional protein domain;
- downstream promoters, which may regulate genes important for host-cell replication or metabolism, could be inhibited.

No clear consensus sequence has emerged, even from the analysis of several mRNAs, but some of these sequences have been well characterized in the literature (Petitclerc D et al., J. Biotechnol., 1995; 40(3): 169-78).
The DNA construct may also include other DNA sequences that affect transcription. Examples are DNA sequences called ubiquitous chromatin opening elements (UCOE), inserted near a gene. They may insulate the gene from putative negative regulatory sequences surrounding the gene of interest, or force adjacent chromatin domains to open. Either effect can improve expression of a poorly expressed or silenced gene. Some of these elements have been reported to increase expression from heterologous promoters in transgenic mice or cultured cell lines, in a tissue-dependent or tissue-independent manner (WO 00/05393).
Translation module
The translation module is the second DNA sequence contained in the exogenous regulatory unit. It supplies, in the correct reading frame, the translational regulatory elements missing at one end of the targeted exon(s), so that the primary transcript is translated correctly and efficiently into the functional protein domain. The translation module is integrated in the same orientation at one of two positions:
- between the transcription module and the most 5' exon encoding the functional protein domain, when the domain corresponds to the C-terminus of the primary translational product;
- between the transcription module and the most 3' exon encoding the domain, when the domain corresponds to the N-terminus of the primary translational product.
Depending on the position and sequence of the functional protein domain, the translation module can comprise various DNA sequences:
- a translation initiation codon, which together with its surrounding nucleotide context forms a Kozak sequence;
- a translation termination codon;
- 5' and 3' untranslated regions;
- any other DNA sequence capable of regulating translation of mRNA into protein.
The translation initiation codon is usually ATG (encoding methionine). It must be introduced into the translation module whenever two conditions hold: the functional protein domain is located at the C-terminus of the primary translational product, and the endogenous sequence (intron or exon) lacks an ATG at a position suitable for correct translation of the domain. In such cases, the translation module should contain an exogenous ATG codon embedded in a context matching the Kozak consensus, to obtain optimal translation initiation efficiency. This consensus, CC(A/G)CCATGG, was derived from the analysis of translation initiation sequences of several hundred mRNAs. Not all of its nucleotides are equally important: one or more cytosines may be replaced by another nucleotide, but the purine (A/G) must be retained (Kozak M, Gene, 1999; 234(2): 187-208).
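The Kozak comparison can be written as a position-by-position match against the consensus given above. The sketch below counts matching positions in the 9-nucleotide window around an ATG and separately reports whether the key purine at position -3 is present. Scoring every position equally is a simplification chosen for illustration, and the function name is hypothetical.

```python
# Kozak consensus as given in the text; R stands for a purine (A or G).
KOZAK = "CCRCCATGG"


def kozak_score(seq: str, atg_pos: int) -> tuple[int, bool]:
    """Compare the 9-nt context around an ATG with CC(A/G)CCATGG.

    atg_pos is the 0-based index of the A of the ATG. Returns the number of
    matching positions (0-9) and whether the purine at -3 is present.
    """
    start = atg_pos - 5
    if start < 0 or atg_pos + 4 > len(seq):
        raise ValueError("not enough flanking sequence around the ATG")
    window = seq[start:atg_pos + 4].upper()
    if window[5:8] != "ATG":
        raise ValueError("no ATG at atg_pos")
    matches = sum(
        (base in "AG") if cons == "R" else (base == cons)
        for base, cons in zip(window, KOZAK)
    )
    return matches, window[2] in "AG"  # window[2] is position -3
```

For the made-up context GCCACCATGGCT with the ATG at index 6, the result is `(9, True)`. For TTTTTTATGA, it is `(3, False)`, because the -3 purine is missing.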
The 5' untranslated region (5' UTR) is the portion of the primary transcript lying 5' of the translation initiation codon. Physically, it spans the sequence between the transcription start site and the translation initiation codon. The transcription start site is usually located 20-30 nucleotides downstream of the promoter and is marked by the cap, a modified G nucleotide added by the capping enzymes. The 5' UTR can be entirely exogenous, if the ATG is also introduced by homologous recombination, or only partly so. Which case applies depends on the sequence of the intron targeted by homologous recombination and of the adjacent exon encoding the functional protein domain. No specific length or consensus sequence for the 5' UTR has been described in the literature. However, it should not exceed 100-200 nucleotides, to minimize interference with correct and efficient translation of the functional protein domain. It should also be free of additional ATGs and of sequences that can base-pair into secondary structures (e.g., GC-rich regions), since these can delay or stop ribosome scanning during translation. In some cases, for example if the 5' UTR is particularly long, an internal ribosome entry site (IRES) or a translation-enhancing element may be incorporated into it. Such elements promote interaction between the primary transcript and ribosomal proteins and increase translation efficiency, as shown for various mRNAs in some mammalian cell types (Liu X et al., Anal. Biochem., 2000; 280(1): 20-8).
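The 5' UTR design rules above can be screened with a short function. The sketch below reports three things: whether the UTR stays within a length limit (200 nucleotides by default, the upper end of the range stated above), the positions of any upstream ATGs, and the overall GC fraction. Using the overall GC fraction as a proxy for structure-prone sequence is a crude assumption. Real secondary-structure prediction requires a folding algorithm, which this sketch does not attempt.

```python
def check_5utr(utr: str, max_len: int = 200) -> dict:
    """Screen a candidate 5' UTR (DNA sense strand, ending just before the start ATG)."""
    utr = utr.upper()
    # Any ATG upstream of the intended start codon could divert scanning ribosomes.
    upstream_atgs = [i for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG"]
    # Overall GC fraction as a crude proxy for GC-rich, structure-prone sequence.
    gc = (utr.count("G") + utr.count("C")) / len(utr) if utr else 0.0
    return {
        "length_ok": len(utr) <= max_len,
        "upstream_atgs": upstream_atgs,
        "gc_fraction": gc,
    }
```

For the made-up UTR AGCTTATGCC, the function flags an upstream ATG at index 5 and reports a GC fraction of 0.5.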
When the functional protein domain is located at the N-terminus of the primary translational product, the translation module should include sequences that allow correct termination of translation, namely a stop codon and a 3' untranslated region (3' UTR). As mentioned above, these sequences can be wholly or only partly exogenous, depending on the sequence of the intron targeted by homologous recombination and of the adjacent exon encoding the functional protein domain. In most cases, however, the translation module will provide both elements. The stop codons themselves (TGA, TAA, TAG) are fixed, but the sequence surrounding the triplet affects the efficiency of translation termination. For example, termination is more efficient if the nucleotide immediately following the stop codon is A or G.
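The stop-codon context rule can be checked by scanning a coding sequence codon by codon. The sketch below finds the first in-frame stop codon and reports whether the following nucleotide is a purine (A or G), which is the favorable context described above. It assumes the input starts at a codon boundary. The function name is hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_context(cds: str) -> Optional[tuple[int, str, str, bool]]:
    """Return (index, stop codon, next base, next-base-is-purine) for the first in-frame stop.

    cds is read in frame 0 from its first nucleotide; returns None if no stop is found.
    """
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in STOP_CODONS:
            nxt = cds[i + 3] if i + 3 < len(cds) else ""  # empty if the stop ends the sequence
            return i, codon, nxt, nxt in ("A", "G")
    return None
```

For the made-up sequence ATGGCTTGAGCC, the result is `(6, "TGA", "G", True)`. For ATGTAACC, the TAA is followed by C and is reported as a less favorable context.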
The 3' UTR comprises the segment of the primary transcript between the translation stop codon and the polyadenylation site. As with the 5' UTR, no particular length or consensus sequence has been reported for it. However, to obtain the highest expression, it should not contain destabilizing regions such as AT-rich sequences.
The translation initiation or termination codon must be in the same reading frame as the coding sequence of the functional protein domain. In some genes, an intron adjacent to the exon encoding the functional protein domain contains a trinucleotide corresponding to a start or stop codon. This trinucleotide must lie in the correct reading frame and at a position where the number of added amino acids is compatible with the activity of the functional protein domain. In that case, the functional protein domain can be expressed correctly by integrating a suitable transcription module together with a translation module that contains only an appropriate untranslated region. The trinucleotide and the intronic sequence between it and the proximal end of the adjacent exon then become fully functional as part of the exon.
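For the N-terminal case, one can test whether an intronic stop trinucleotide qualifies by reading from the exon into the adjacent intron in the same frame. The sketch below counts how many extra codons would be translated before the first in-frame stop. It assumes that the exon sequence supplied starts at a codon boundary and contains no in-frame stop itself. The function name and the cutoff of 10 extra codons are hypothetical choices for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def intronic_stop(exon_cds: str, intron: str, max_extra_codons: int = 10):
    """Count sense codons added before an in-frame stop when reading into the intron.

    exon_cds: coding sequence from a codon boundary (frame 0) to the exon's 3' end,
    assumed to contain no in-frame stop itself.
    intron: the intronic sequence immediately downstream.
    Returns the number of codons (each containing at least one intronic base)
    translated before the stop, or None if no stop occurs within max_extra_codons.
    """
    seq = (exon_cds + intron).upper()
    start = (len(exon_cds) // 3) * 3  # first codon that includes intronic bases
    extra = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return extra
        extra += 1
        if extra > max_extra_codons:
            break
    return None
```

Consider the made-up exon end ATGGCCAA followed by the intron GTGAGTTAG. Reading continues through one extra codon (AAG) before hitting TGA, so the function returns 1. Whether one added residue is compatible with the domain's activity would still have to be tested.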
Any peptide sequence thereby fused to the N- or C-terminus of the functional protein domain should not interfere with the domain's function. If it does, it should be easy to remove or inactivate during purification. The same approach can be used to define the N-terminus of a functional protein domain corresponding to the C-terminus of the primary translational product. This requires the most 5' exon to contain an ATG in frame with the functional protein domain, usually one encoding an internal methionine. Integrating a transcription module that activates transcription 5' of this ATG should then allow correct transcription and translation of the functional protein domain.
In general, however, suitable in-frame trinucleotides are found in only a small number of genes. If located in an intron, they are removed by splicing and thus cannot act as translation initiation or termination sites. If located in an exon, they are preempted by other translation initiation sites. Translation initiation or termination codons should therefore be included in synthetic or natural exons as part of the translation module. The module may include one or more exogenous exons, separated by natural or synthetic intron sequences. These exons may additionally encode protein sequences homologous or heterologous to those in the primary translational product encoded by the gene of interest.
As in the case described above, the protein sequence fused to the functional protein domain should not interfere with the domain. If it does, it should be easy to remove or inactivate, possibly through enzymatic activities of the host cell itself. For example, if the functional protein domain is a C-terminal fragment of the primary translational product, the exogenous exon(s) may encode a signal peptide (or its N-terminal portion). Placed in the correct reading frame between the translation initiation site and the endogenous exon sequence, this signal peptide allows secretion of the functional protein domain into the culture medium.
The additional protein sequence may also be a simple spacer or linker peptide, or a sequence that facilitates purification and/or recovery of the functional protein domain. It may further encode a recognition site for a proteolytic enzyme. If the functional protein domain is then affinity-purified through the additional sequence binding to a ligand immobilized on a carrier, the additional sequence can afterwards be removed with commercially available proteases.
Splicing module
As mentioned above, the endogenous sequence of the gene of interest may lack a translation start or stop codon at a usable position, i.e. near or within the codons encoding the functional protein domain and in the same reading frame. In that case, the construct used for homologous recombination will carry the start or stop codon in an exogenous natural or synthetic exon.
However, an exogenous exon belonging to the translation module must be in frame with the endogenous exon. This can be achieved in either of two ways:
- by using a splice site complementary to that of the proximal endogenous exon (FIGS. 1A, 1D, 1G, 1J);
- by choosing targeting sequences such that the exogenous exon is fused directly to the proximal endogenous exon (FIGS. 1B, 1E, 1H, 1K).
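The in-frame requirement for splicing an ATG-containing exogenous exon onto an endogenous exon can be expressed as a congruence modulo 3. The coding length from the exogenous ATG to the exon's 3' end must leave the same remainder as the total coding length of the native exons that normally precede the target exon. The sketch below computes how many nucleotides (0, 1 or 2) would have to be added to the exogenous exon to satisfy this. The inputs are hypothetical. Any added nucleotides must also not create a stop codon, which the sketch does not check.

```python
def frame_padding(exogenous_cds_len: int, native_upstream_cds_lens: list[int]) -> int:
    """Nucleotides to add to an exogenous ATG-containing exon so that it splices in frame.

    exogenous_cds_len: coding length from the exogenous ATG to the exon's 3' end.
    native_upstream_cds_lens: coding lengths of the native exons that precede the
    target endogenous exon (hypothetical inputs). Frame is preserved when the two
    totals are congruent modulo 3.
    """
    target_phase = sum(native_upstream_cds_lens) % 3
    return (target_phase - exogenous_cds_len) % 3  # Python's % is never negative here
```

Suppose the native exons upstream of the target exon contribute 90 and 151 coding nucleotides (241, remainder 1). A 70-nucleotide exogenous coding segment then needs no padding, a 69-nucleotide segment needs 1 nucleotide and a 71-nucleotide segment needs 2.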
Thus, when the exogenous exon contains a translation initiation codon, the splicing module is a 5' splice donor site located at the 3' end of the translation module (FIGS. 1A, 1D). This donor site is complementary to the 3' splice acceptor site associated with the proximal endogenous exon encoding the functional protein domain. Conversely, when the synthetic exon contains a translation stop codon, the splicing module is a 3' splice acceptor site located 5' of the translation module (FIGS. 1G, 1J). This acceptor site is complementary to the 5' splice donor site associated with the proximal endogenous exon encoding the functional protein domain.
After integration, residual intronic sequence may separate the translation module from the exon(s) encoding the functional protein domain. The regulatory unit may therefore comprise a splicing module that allows the transcript encoding the functional protein domain to be spliced correctly. The endogenous splicing sequences normally join exons other than those encoding the functional protein domain to the exons now placed under the exogenous regulatory unit. After integration, they can no longer reconstitute the normal splicing pattern of the gene of interest in the host cell. This is because the regulatory unit either replaces these splicing sequences or moves them too far away to act efficiently, so that they remain unpaired.
The 5'/3' splice sites are essential elements of gene expression because they allow the primary transcript of an intron-containing gene to be spliced correctly. These sequences are well conserved in many vertebrate genes, particularly at the 5' and 3' ends of introns. Most 5' donor splice sites in higher eukaryotes conform to the consensus AG][GTRAGT, where AG is a dinucleotide conserved at the 3' end of the exon, ][ denotes the splice site, GT is a dinucleotide highly conserved at the 5' end of the intron, and R is a purine. The 3' acceptor splice site is essentially defined by the consensus YAG][G, where AG is a dinucleotide highly conserved at the 3' end of the intron (generally preceded by a pyrimidine, Y), ][ denotes the splice site, and G is a nucleotide conserved at the 5' end of the exon.
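The two consensus sequences can be scanned for with regular expressions. This is a minimal sketch using the patterns exactly as written above (R = A/G, Y = C/T); the function names and the test sequence are hypothetical. Exact consensus matches occur frequently by chance, so such a scan is only a first filter, and splice-site prediction in practice relies on scoring models such as position weight matrices.

```python
import re

# 5' donor consensus AG][GTRAGT; the lookahead lets overlapping hits be reported.
DONOR = re.compile(r"(?=AGGT[AG]AGT)")
# 3' acceptor consensus YAG][G.
ACCEPTOR = re.compile(r"(?=[CT]AGG)")


def donor_sites(seq):
    """0-based index of the first intron base (the G of GT) for each donor consensus match."""
    return [m.start() + 2 for m in DONOR.finditer(seq.upper())]


def acceptor_sites(seq):
    """0-based index of the first exon base after YAG for each acceptor consensus match."""
    return [m.start() + 3 for m in ACCEPTOR.finditer(seq.upper())]


# Hypothetical exon | intron | exon sequence.
seq = "GGGAG" + "GTAAGT" + "TTTTTTTTTTCAG" + "GCC"
print(donor_sites(seq))     # [5]
print(acceptor_sites(seq))  # [24]
```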
In addition to the appropriate consensus splice sequence, the splicing module is followed by intronic sequence, derived from the adjacent targeting region and/or from a natural or synthetic intron sequence added to the construct between the splice site and the targeting region. These intronic sequences may contain a further sequence element, the branch site, which is typically located 18-40 nucleotides upstream of the 3' splice site. This site displays the sequence CTRACT, wherein N is any nucleotide and A is the nucleotide whose -OH group reacts with the phosphoester bond of the G nucleotide at the 5' end of the intron during the catalytic step of mRNA splicing. Consensus sequences, mechanisms, factors and other regulatory sequences involved in mammalian pre-mRNA splicing, such as exonic splicing enhancers, have been reviewed (Long M et al, Proc. Natl Acad. Sci USA, 1998; 95 (1): 219-.
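A candidate branch site can be located by combining a motif search with the 18-40 nucleotide window stated above. This sketch swaps in the commonly cited mammalian branch-site consensus YNYURAY (written YNYTRAY for DNA, with the branch A at the sixth position) because the motif given in the text is ambiguous as extracted; the function name, default window, and test sequence are assumptions for illustration.

```python
import re

# Assumed branch-site consensus YNYTRAY (Y = C/T, N = any, R = A/G); branch A is the 6th base.
BRANCH = re.compile(r"(?=[CT][ACGT][CT]T[AG]A[CT])")


def branch_points(intron, acceptor_boundary, min_up=18, max_up=40):
    """Positions of candidate branch-point A's lying min_up..max_up nt upstream of the 3' splice site.

    `acceptor_boundary` is the 0-based index of the first exon base after the intron's final AG.
    """
    hits = []
    for m in BRANCH.finditer(intron.upper()):
        a_pos = m.start() + 5  # the conserved A within the motif
        if min_up <= acceptor_boundary - a_pos <= max_up:
            hits.append(a_pos)
    return hits


# Hypothetical intron tail: branch motif, polypyrimidine stretch, CAG][G acceptor.
seq = "GGGGG" + "CTCTAAC" + "T" * 20 + "CAG" + "G"
print(branch_points(seq, acceptor_boundary=35))  # [10], i.e. 25 nt upstream
```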
Progress in understanding the splicing machinery, even without a more precise definition of universal consensus sequences, suggests that it is possible to select splicing modules specific to the cell type to be modified by homologous recombination. For example, a series of sequence features and motifs has been found to be shared by genes spliced specifically in the brain, suggesting that cell-type-specific splicing can be modulated by selecting these sequences (Brudno M et al, Nucleic Acids Res., 2001; 29 (11): 2338-48).
The splicing module is separated from the endogenous splicing regulatory elements of the target gene, which lie next to the exon(s) coding for the functional protein domain, by an intron whose sequence is entirely endogenous, entirely exogenous, or a hybrid of endogenous and exogenous sequence. Since this sequence is eventually spliced out, its exact sequence is not important, provided it does not interfere with correct expression of the desired functional protein domain.
DNA targeting sequences
Because the DNA targeting sequences are responsible for the correct integration and positioning of the exogenous sequences (regulatory unit, positive marker gene) in the cell genome, they are indispensable elements of the construct. These sequences are typically cloned from genomic DNA or amplified by PCR, and are functionally defined by a level of homology to the endogenous DNA sufficient to cause homologous recombination (strand pairing and displacement) in a specific genomic region.
The DNA targeting sequence may consist of a single DNA fragment or be split into two DNA fragments, which are separated in the construct by the exogenous sequences of the regulatory unit and, optionally, the positive marker gene. Two-segment targeting sequences are preferred because they improve the efficiency and accuracy of integration, but the invention also encompasses single-segment targeting sequences. The simplest form of DNA for use in the invention is a circular fragment containing the regulatory unit next to the targeting segment; the targeting fragment pairs with its genomic counterpart, and the regulatory unit is inserted into the gene of interest after the crossover.
The size (length) of each targeting fragment (i.e. homologous region) is not critical, although shorter homologous regions are less likely to find their genomic counterpart and recombine at the desired site. The shorter the homologous region, the less efficient the homologous recombination, i.e. the smaller the percentage of successfully recombined clones. A minimum sequence homology of 25 base pairs has been suggested (Ayares D et al, Proc Natl Acad Sci USA, 1986; 83 (14): 5199-. The best results are obtained when the total homologous region, comprising the two targeting regions, is larger, e.g. one to five kilobases or more. The size (length) of the targeting segments is not limited, provided the regulatory unit can be introduced at the appropriate site in the genome and the stability of the vector is not affected.
In many cases, the DNA targeting sequence comprises all or part of the intron located 5' (if the functional protein domain lies at the C-terminus of the primary translation product) or 3' (if the functional protein domain lies at the N-terminus of the primary translation product) of the exon(s) coding for the functional protein domain (FIG. 1A, D, G, J). In addition to this intron, the targeting region may include other parts of the gene of interest, including the exon(s) coding for the functional protein domain (FIG. 1B, E, H, K). In other cases, sequences located in adjacent genes can be used so that the introns and exons proximal to the exon(s) coding for the functional protein domain are partly or almost completely removed, or replaced with the exogenous regulatory unit (FIG. 1C, F, I, L). If the DNA targeting sequence consists of a single DNA fragment, it should be homologous to a fragment lying only in the part of the gene comprising the exon(s) coding for the functional protein domain, to a fragment lying only in the intron proximal to those exon(s), or to a fragment spanning both regions. In every case, integration of the construct into the genomic DNA of the host cell by homologous recombination, at the site defined by the DNA targeting segment, places expression of the functional protein domain under the control of the regulatory unit.
Such targeting strategies are possible as long as integration of the construct does not alter the order and/or sequence of the genomic elements encoding the functional protein domain, or of other genomic sequences required for cell survival or metabolism. If the primary translation product of the gene of interest is itself essential for cell survival or metabolism, the copies of the gene not modified by homologous recombination usually provide an expression level sufficient to maintain the metabolic activity of the cell.
Knowing the complete gene sequence and structure, from the promoter to the polyadenylation site, helps in choosing the most suitable targeting strategy, but constructs can be generated starting only from the sequence and the exon/intron structure associated with the functional protein domain. This is particularly true for human genes, for which genomic clones containing appropriate candidate sequences at the 5' or 3' end can be identified, particularly by comparison with the corresponding mouse genes that have already been sequenced and characterized. Based on homology to other genes in the same or other organisms and on in silico predictions (Rogic S et al, Genome Res., 2001; 11 (5): 817-32), the relevant exon/intron junctions are defined on the genomic clone, together with flanking (homologous) sequences limited to one end of the functional protein domain. Cloned gene fragments can then be selected whose genomic length is sufficient to generate the targeting sequences required to direct the regulatory unit to the correct location in the genome.
The DNA targeting sequence may comprise sequences homologous to contiguous or non-contiguous regions of the gene of interest in the targeted genome. In the first case, the exogenous sequence is simply inserted within, or at one end of, the intron proximal to the exon(s) coding for the functional protein domain (FIG. 1A, B, D, E, G, H, J, K). In the second case, the part of the gene of interest that separates the non-contiguous sequences proximal to the exon(s) coding for the functional protein domain (exons and/or introns) is deleted or replaced by exogenous sequences as a result of homologous recombination (FIG. 1C, F, I, L). In both cases, homologous recombination is driven by the exogenous DNA, which aligns directly with the homologous sequences in the target gene encoding the functional protein domain. The specific linear arrangement of the homologous sequences in the exogenous DNA determines the position and orientation of the regulatory unit within the intron region proximal to the exon(s) coding for the functional protein domain.
Whenever the targeting fragments are obtained by PCR and/or from cells that are not isogenic with the host cell, they should be sequenced, and the resulting constructs checked, to identify any sequence differences in the targeting regions (due to their origin) that could significantly alter the expected sequence and expression of the functional protein domain.
While the type of targeting construct selected determines the nature of the genomic modification of the gene of interest that yields a recombinant gene encoding the functional protein domain, the actual efficiency with which a targeted cell line is generated depends largely on the specific targeting construct.
In particular, the absolute targeting frequency achieved with a construct has been found to depend on several factors, including the length of the homologous sequences in the targeting construct, the degree of homology between these sequences and the gene of interest, and the particular genomic region to be targeted. The targeting frequency was also found to increase with the length of sequence homology between the targeting vector and the locus, reaching a peak between 10 and 14 kb, which is not consistent with the frequency difference between insertion and replacement vectors (Deng C and Capecchi MR, Mol Cell Biol., 1992; 12 (8): 3365-71). This peak may reflect a limit on the size of the intact DNA fragments that can be introduced into the cell, rather than a limit on the effect of homology length on targeting frequency.
Regarding the level of homology required for correct integration of the construct by homologous recombination, the DNA targeting sequence must hybridize with the endogenous sequence under stringent hybridization conditions as described in the literature (Sambrook et al, "Molecular Cloning: A Laboratory Manual", Cold Spring Harbor Press, 1989), for example using the following washing conditions: two washes of 30 minutes each with 2X SSC, 0.1% SDS at room temperature; one further wash with 2X SSC, 0.1% SDS at 50 ℃ for 30 minutes; and finally two washes of 10 minutes each with 2X SSC at room temperature.
Homologous sequences containing up to about 25-30% base pair mismatches can be identified in this way. The homologous nucleic acid strands more preferably contain 15-25% base pair mismatches, and especially preferably 5-15%. As is well known to those skilled in the art, a higher degree of homology can be selected by using more stringent washing conditions when identifying clones from a gene bank (or from other genetic material).
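The mismatch ranges above can be expressed as simple arithmetic on an aligned pair of sequences. This is a minimal sketch assuming the targeting arm and the genomic sequence are already aligned to equal length without gaps (real comparisons would use an alignment tool, and hybridization stringency remains the experimental criterion); the function names and category labels are hypothetical.

```python
def mismatch_percent(arm, genomic):
    """Percent of mismatched positions between two pre-aligned, equal-length, gap-free sequences."""
    if len(arm) != len(genomic) or not arm:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    mismatches = sum(1 for a, b in zip(arm.upper(), genomic.upper()) if a != b)
    return 100.0 * mismatches / len(arm)


def homology_class(percent):
    """Place a mismatch percentage in the ranges named in the text."""
    if percent <= 15:
        return "especially preferred"  # 5-15% mismatches (or fewer)
    if percent <= 25:
        return "more preferred"        # 15-25% mismatches
    if percent <= 30:
        return "identifiable"          # up to about 25-30% mismatches
    return "outside stated range"


arm = "ACGTACGTAC"
genomic = "ACGTACGTAA"  # one mismatch in ten positions
pct = mismatch_percent(arm, genomic)
print(pct, homology_class(pct))  # 10.0 especially preferred
```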
Selectable marker genes
The construct may also comprise one or more of the following: a positive selection gene, an amplifiable gene, and a negative selection gene. The construct used for homologous recombination may thus contain exogenous sequences other than those making up the regulatory unit. In particular, one or more positive, amplifiable, or negative selection genes are added to the construct to facilitate the identification of transformed clones in which the regulatory unit is correctly integrated into the genome. To this end, amplifiable and/or marker genes are placed between the targeting regions, typically between the transcription module and one of the targeting regions (FIG. 1D-F, J-L). Whatever selection marker is used, its transcription and translation units are distinct from those of the functional protein domain to avoid any interference, and it is separated from the regulatory unit by sequences that prevent read-through, such as the transcription terminator mentioned above. Finally, the positive marker gene may lack transcriptional and/or translational signal sequences at one end; these are then supplied by the gene of interest upon correct integration, producing a fusion gene.
A positive selection marker gene makes the transfected host cell resistant to an otherwise toxic environment. Examples of such genes are adenosine deaminase (ADA), aminoglycoside phosphotransferase (neo), dihydrofolate reductase (DHFR), hygromycin B phosphotransferase (HPH), thymidine kinase (tk), xanthine-guanine phosphoribosyl transferase (gpt), the multidrug resistance gene (MDR), ornithine decarboxylase (ODC), and CAD, which confers resistance to N-(phosphonacetyl)-L-aspartate.
In addition to, or in place of, the positive selection marker gene, an amplifiable gene may optionally be included in the construct. An amplifiable gene is one whose copy number increases under selective pressure; the copy number of genes located near it, including the new gene encoding the functional protein domain, increases as well. Useful amplifiable genes include DHFR, MDR, ODC, ADA, and CAD. Because the positive selection marker genes overlap with the amplifiable genes, a single gene could in theory serve for both positive selection and amplification. However, since most cell lines contain endogenous copies of these amplifiable genes, the cells are already somewhat resistant to the selection conditions, and it is difficult to distinguish cells that have taken up the transfected DNA from those that have not. Therefore, when gene amplification is desired, a dominant positive selection gene such as HPH, gpt, neo, or tk (in tk− cells) should also be included in the construct. In some applications it is possible, or even preferable, to omit amplification markers, even though increasing the copy number of the new gene encoding the functional protein domain would ultimately yield more protein; for example, amplification may be unnecessary when the regulatory unit already drives efficient transcription and translation of the functional protein domain. The positive selection gene can also be omitted and cells screened solely for production of the desired protein or mRNA, but most preferably at least a positive selection gene is included.
A negative selectable marker gene may also be present in the construct outside the targeting sequences. Such a gene is not expressed in cells in which the DNA construct has been correctly inserted by homologous recombination, because it is lost, but it is expressed in cells in which the construct has been incorrectly inserted, e.g. by random integration: when the vector recombines correctly within the homologous regions, sequences outside those regions are lost. One such gene is herpes simplex virus thymidine kinase (HSVtk). HSVtk has a broader substrate specificity than mammalian thymidine kinase and can phosphorylate nucleoside analogs that normal mammalian cells cannot. If HSVtk is present in the cell, nucleoside analogs such as acyclovir and 9-[1,3-dihydroxy-2-propoxymethyl]guanine (ganciclovir) are phosphorylated and incorporated into the host cell DNA, killing the cell.
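The logic of combined positive and negative selection can be summarized in a few lines. This is an idealized sketch: the `Clone` record and `survives` function are hypothetical, neo with its selection drug stands in for any positive marker, and the model assumes random integrants always retain HSVtk. In practice some random integrants lose the negative marker, so double selection enriches for homologous recombinants rather than guaranteeing them.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    name: str
    has_positive_marker: bool  # e.g. neo, placed between the targeting arms
    has_hsv_tk: bool           # negative marker, placed outside the targeting arms


def survives(clone, positive_drug=True, ganciclovir=True):
    """Idealized outcome of double selection (positive-selection drug plus ganciclovir)."""
    if positive_drug and not clone.has_positive_marker:
        return False  # no resistance gene: killed by the positive-selection drug
    if ganciclovir and clone.has_hsv_tk:
        return False  # HSVtk phosphorylates ganciclovir: cell is killed
    return True


clones = [
    # Homologous recombination keeps the marker between the arms and drops HSVtk.
    Clone("homologous recombinant", True, False),
    # Random integration usually keeps the whole construct, HSVtk included.
    Clone("random integrant", True, True),
    Clone("untransfected", False, False),
]
print([c.name for c in clones if survives(c)])  # ['homologous recombinant']
```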
Whatever marker gene is used, the literature teaches that it can later be removed using a site-specific recombinase (e.g. Flp or Cre) that is already present in the cell genome, co-transfected with the construct for homologous recombination, or introduced after transformation by any recombinant technique. These steps are described in the literature; they may be indispensable when a marker gene affects the transcription of nearby genes, and they may also be used to activate or inactivate elements of the regulatory unit (Gorman C and Bullock C, Curr Opin Biotechnol., 2000; 11 (5): 455-60; Kuhn R and Schwenk F, Curr Opin Immunol., 1997; 9 (2): 183-8).
Constructs
As described above, a linear or circular construct or targeting vector comprising these DNA sequences is introduced into a host cell so that the DNA targeting regions hybridize to the homologous genomic sequences and homologous recombination occurs between the endogenous and exogenous sequences, whereby the construct or targeting vector is stably integrated into the host cell genome.
The assembly of the construct must take into account the orientation of the exon(s) coding for the functional protein domain. The targeting sequences are therefore cloned into the construct so that homologous recombination integrates the elements of the regulatory unit in the same orientation, establishing the operative linkage between the exogenous regulatory unit and the region of the endogenous gene comprising the exon(s) coding for the functional protein domain. By contrast, the orientation of the selectable marker genes is arbitrary, but, as described above, they must not interfere with the activity of the exogenous regulatory unit and are therefore typically placed between the transcription module and the more distal targeting sequence (FIG. 1D-F, J-L).
DNA sequences of the construct that belong neither to the regulatory unit nor to the selectable marker genes (for example, sequences needed for replication or selection in bacterial cells) may also end up integrated by homologous recombination, and they should not affect the transcription or translation of the functional protein domain.
Preparation of functional protein domains
Once the amino acid sequence and the gene structure of the functional protein domain are known, the method of the invention can be used to produce it by the following steps:
● identifying a region of the gene of interest in which integration (by insertion or replacement) of the regulatory unit allows accurate and efficient expression of the functional protein domain;
● constructing a targeting vector comprising a targeting sequence, a regulatory unit and optionally an amplification and/or selection marker gene;
● selecting an appropriate host cell comprising the targeting sequence and the exon(s) coding for the functional protein domain;
● transforming the host cells with the construct by known techniques (lipofection, electroporation, calcium phosphate precipitation);
● identifying transformants containing the desired recombinant gene after homologous recombination by positive and/or negative selection, amplification, restriction site analysis, genomic/reverse transcription PCR, Southern blot, or DNA sequencing;
● expanding the selected transformants and selecting clones that correctly express the functional protein domain, by collecting and processing samples of the cultured cells and/or culture medium with conventional separation techniques (lysis/disruption, extraction, precipitation, chromatography) and analyzing them with mRNA analysis techniques (primer extension, Northern blot) and protein analysis techniques (Western blot, ELISA, protein sequencing, epitope mapping, two-dimensional polyacrylamide gel electrophoresis, affinity purification, enzyme assays, CD/NMR spectroscopy, HPLC gel filtration, mass spectrometry);
● further expanding the selected homologous recombinant clones in culture to quantify expression, choose clones for industrial production, and establish efficient protocols for expressing the functional protein domain and purifying it from the expanded cell culture medium.
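For the restriction site analysis step, the fragment lengths expected from a correctly recombined amplicon can be predicted in silico and compared with the gel. This is a minimal sketch for a linear sequence and exact (non-degenerate) recognition sites; the helper name and the 47-bp amplicon are hypothetical, and EcoRI (G^AATTC, top-strand cut after the first base) is used only as a familiar example.

```python
import re


def digest_lengths(seq, site, cut_offset):
    """Fragment lengths from cutting a linear sequence at every occurrence of `site`.

    `cut_offset` is the top-strand cut position within the site (1 for EcoRI, G^AATTC).
    Degenerate recognition sites (e.g. containing N) are not handled.
    """
    seq = seq.upper()
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site.upper()})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 47-bp amplicon with two EcoRI sites.
amplicon = "A" * 10 + "GAATTC" + "T" * 20 + "GAATTC" + "C" * 5
print(digest_lengths(amplicon, "GAATTC", 1))  # [11, 26, 10]
```

Running the same function on the predicted recombinant and wild-type sequences gives the two band patterns to look for; sticky-end overhangs are ignored, which is adequate for agarose-gel resolution.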
Drawings
FIGS. 1A-L show some possible ways of preparing novel genes for expressing exons coding for functional protein domains by homologous recombination of a DNA construct according to the invention with a gene of interest.
HS1 and HS2 are the construct fragments, homologous to the target gene containing the exon(s) coding for the functional protein domain, that are used as DNA targeting sequences. In this schematic, the gene of interest contains four exons (EX1, EX2, EX3, EX4) and three introns (IN1, IN2, IN3). The exon(s) coding for the functional protein domain, indicated by the black bar, are located either at the 3' end (FIGS. 1A-F) or at the 5' end (FIGS. 1G-L) of the gene of interest. The constructs in FIGS. 1D-F and J-L also contain a marker gene (MK).
The regulatory unit consists of a transcription module (TS) and a translation module (TL) and, where necessary, an exogenous splicing module with a splice donor site (SD in FIGS. 1A and D) or a splice acceptor site (SA in FIGS. 1G and J) for excising the intronic sequence between the translation module and the more proximal exon coding for the functional protein domain.
The DNA construct allows the insertion of a regulatory unit, optionally together with a marker gene, into the intron region adjacent to the exon(s) coding for the functional protein domain, in order to initiate (FIGS. 1A-F) or terminate (FIGS. 1G-L) transcription and translation of the functional protein domain. The initiation (→) and termination regulatory elements for transcription/translation of the endogenous and marker genes are shown with symbols in the figures.
The splicing module can be located at the 3' end (FIGS. 1A and D) or the 5' end (FIGS. 1G and J) of the translation module. For example, insertion of the exogenous regulatory unit splits the intron (IN2) into two parts (IN2' and IN2"), and the splicing module excises the intervening intronic sequence (IN2" in FIGS. 1A and D; IN2' in FIGS. 1G and J) so that EX3/EX4 (FIGS. 1A and D) or EX1/EX2 (FIGS. 1G and J) are selectively transcribed and translated into the functional protein domain. Alternatively, the DNA construct can insert the regulatory unit, optionally with a marker gene, between the intron and the exon (IN2 and EX3 in FIGS. 1B and E; IN2 and EX2 in FIGS. 1H and K) without splitting the intron, so that EX3/EX4 (FIGS. 1B and E) or EX1/EX2 (FIGS. 1H and K) are selectively transcribed and translated.
Alternatively, the DNA construct may be designed so that the exogenous regulatory unit, optionally together with the marker gene, replaces a fragment of the endogenous gene adjacent to the exon(s) encoding the functional protein domain. In this case, one targeting sequence must belong to the exon coding for the functional protein domain (EX3 in FIGS. 1B, C, E and F; EX2 in FIGS. 1H, I, K and L), while the other corresponds to an endogenous sequence outside the genomic region of the functional protein domain. As a result, the exogenous regulatory sequences replace the fragment between the two homologous regions (IN2 and part of EX2 in FIGS. 1C and F; IN2 and part of EX3 in FIGS. 1I and L), generating a new intron sequence (IN1/EX2 in FIGS. 1C and F; EX3/IN3 in FIGS. 1I and L).
FIG. 2 shows a sequence alignment of the human and mouse TRANCE proteins (SEQ ID NO: 1 and SEQ ID NO: 2) in the region that is proteolytically processed to produce sTRANCE (identical residues are marked, and + marks homologous residues). The likely N-terminal residues of mouse sTRANCE (Schlondorff J et al, J Biol Chem., 2001; 276 (18): 14665-74) are marked with §. Dashed lines show the positions corresponding to the N-terminus of sTRANCE on the mouse TRANCE protein (IC, TM and EC denote the intracellular, transmembrane and extracellular regions, respectively), on the mouse TRANCE gene (EX denotes an exon and IN an intron, followed by the relevant number), and on human chromosome 13 (nucleotide numbering relative to GenBank version NT_009935.3). The numbering of the human and mouse protein sequences follows the original publications (Anderson DM et al, Nature, 1997; 390 (6656): 175-9; Wong BR et al, J Biol Chem., 1997; 272 (40): 25190-4).
FIG. 3 shows the protein sequence of the human collagen XVIII α1 NC1 domain with the corresponding coding exons (SEQ ID NO: 3). The numbering corresponds to the full-length long variant sequence as published (Oh SP et al, Genomics, 1994 Feb; 19 (3): 494-9). Residues reported in the literature as possible N-termini of the functional protein domain of mouse or human endostatin are marked with §.
FIG. 4A shows the positions on the human COL18A1 gene of the 5' and 3' targeting sequences used to construct pEnd-HR #1 and pEnd-HR #2. Also shown are the primers used to amplify the COL18A1 genomic sequences needed to construct the targeting vectors from the original AL163302 GenBank clone, and the lengths of the exons and amplified sequences. FIG. 4B shows a schematic of pEnd-HR #1 and pEnd-HR #2 indicating the regulatory units, the targeting sequences (dotted lines), and the relative positions of the positive and negative marker genes. Boxed is the sequence encoding the mouse Ig signal peptide (mIgSP; nucleotides 1-56 of GenBank record M13329), followed by a splice donor site (SD; underlined sequence) comprising the last nucleotide of the coding sequence and the first six nucleotides of the intron, selected from motifs that stimulate splicing (SEQ ID NO: 4). The splice sites are indicated by a symbol. The last codon of the original mouse Ig signal peptide, TCA, was changed to TCG to give mIgSP a 3' end more favorable for correct splicing; this mutation does not change the encoded amino acid (serine).
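That the TCA to TCG change is silent can be confirmed with the standard genetic code. The sketch below (helper name hypothetical) builds NCBI translation table 1 and checks both codons; it also shows that the change makes the last exonic base a G, the base conserved at the exon side of the donor consensus AG][GTRAGT.

```python
# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def is_synonymous(codon_a, codon_b):
    """True if both codons encode the same amino acid (or are both stop codons)."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


original, modified = "TCA", "TCG"
print(CODON_TABLE[original], CODON_TABLE[modified], is_synonymous(original, modified))  # S S True
# The last exonic base becomes G, matching the G just before the ][ in AG][GTRAGT.
print(original[-1], "->", modified[-1])  # A -> G
```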
FIG. 5 shows a schematic diagram of plasmid pBS-EF1 alpha-mIgSP-SD with relevant restriction sites.
FIG. 6 shows a schematic diagram of plasmid pGEM-3Z-mPGK-TK-HR with relevant restriction enzyme sites.
FIG. 7 shows a schematic diagram of plasmid pEnd-HR #1, indicating the positions of the 5' and 3' targeting regions (5' HR and 3' HR), the positive (SEL+) and negative (SEL-) selection genes, and the regulatory unit (RU) containing the EF-1 alpha promoter, the mouse Ig signal peptide and the splice donor site. A unique NotI restriction site, used to linearize the plasmid before transfection, is located at the end of the 3' HR. Plasmid pEnd-HR #2 differs from pEnd-HR #1 in the length and genomic position of the targeting sequences (FIG. 4A) and is slightly larger (16.9 Kb).
FIG. 8 shows the sequence of the novel mRNA expressed in cells in which plasmid pEnd-HR #1 is correctly integrated in the human COL18A1 gene, in particular the mIgSP exon and the endogenous coding sequence. The start and stop codons are shown in bold. The complete mRNA sequence contains 2210 nucleotides (only the coding part of exon 41 is shown), and the functional protein domain is encoded as a protein of 275 amino acids (19 from the mIgSP exon and 256 from the coding sequence of human COL18A1 exons 38-41).
FIG. 9 shows the sequence of the novel mRNA expressed in cells in which plasmid pEnd-HR #2 is correctly integrated in the human COL18A1 gene, in particular the mIgSP exon and the endogenous coding sequence. The start and stop codons are shown in bold. The complete mRNA sequence contains 1964 nucleotides (only the coding part of exon 41 is shown), and the functional protein domain is encoded as a protein of 193 amino acids (19 from the mIgSP exon and 174 from the coding sequence of human COL18A1 exons 39-41).
FIG. 10A is a diagram of the pEAK-HR #2 fragment, in which the intron 38 and exon 39-41 coding sequences of COL18A1 are fused to a FLAG epitope and a polyadenylation site. pEAK-HR #1 contains intron 37 instead of intron 38, and also contains exon 38 between intron 37 and exons 39-41. FIG. 10B shows a Western blot of whole-cell extracts probed with a rabbit polyclonal antibody against human endostatin at 1:100 dilution (Chemicon AB 1878). FIG. 10C shows a Western blot of 1.5 ml of conditioned medium from the same transfected cells, immunoprecipitated with a 1:1000 dilution of mouse monoclonal anti-FLAG M2 antibody (Sigma F3165) and 20 μl of M2-FLAG agarose (Sigma A1205). The secondary antibodies were horseradish peroxidase-labeled goat anti-rabbit or goat anti-mouse antibodies diluted 1:10000 (Amersham-Pharmacia), detected with ECL Western Pico reagent (Pierce). The predicted molecular weights of the functional protein domains from pEnd-HR #1 and pEnd-HR #2 are 31 kDa and 22 kDa, respectively, before removal of the signal peptide (FIG. 10B), or 29 kDa and 19 kDa after removal of the signal peptide and secretion into the culture medium (FIG. 10C). The molecular weight shift is due to glycosylation of the sequence encoded by exon 38.
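The residue counts and molecular weights given for pEnd-HR #1 and pEnd-HR #2 can be cross-checked with simple arithmetic. This sketch assumes the common rule of thumb of about 110 Da per amino acid residue, ignores glycosylation, and assumes signal-peptide cleavage removes exactly the 19 mIgSP residues; the function name is hypothetical.

```python
AVG_RESIDUE_KDA = 0.110  # rule-of-thumb mean residue mass (~110 Da); ignores glycosylation
SIGNAL_PEPTIDE = 19      # residues encoded by the mIgSP exon (assumed to be the cleaved part)


def estimate(col18a1_residues):
    """Return (full-length aa, coding nt incl. stop, est. kDa with and without signal peptide)."""
    full_length = SIGNAL_PEPTIDE + col18a1_residues
    coding_nt = 3 * (full_length + 1)  # one codon per residue plus the stop codon
    return (full_length, coding_nt,
            full_length * AVG_RESIDUE_KDA, col18a1_residues * AVG_RESIDUE_KDA)


# name: (COL18A1-derived residues, stated kDa before and after signal-peptide removal)
stated = {"pEnd-HR #1": (256, 31, 29), "pEnd-HR #2": (174, 22, 19)}
for name, (residues, full_kda, mature_kda) in stated.items():
    aa, nt, est_full, est_mature = estimate(residues)
    print(f"{name}: {aa} aa, {nt} coding nt, ~{est_full:.1f} kDa vs {full_kda}, "
          f"~{est_mature:.1f} kDa vs {mature_kda}")
```

With these assumptions every estimate falls within about 1 kDa of the stated value, so the figures are mutually consistent; the backbone-only estimate cannot account for the glycosylation-dependent shift.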
FIG. 11A is a schematic of the 1.0 Kb coding region present in expressing human 293-EBNA cells after integration of pEnd-HR #1. The positions of the primers used to identify the transcript are shown. FIG. 11B is a schematic of the 2.4 Kb COL18A1 genomic region present in human 293-EBNA cells after pEnd-HR #1 integration. The positions of the primers and restriction sites used to characterize the clones are shown.
FIG. 12 shows agarose gel electrophoresis of the products obtained by amplifying cDNA from selected cells with primers o-1165 and o-1175. Each pool corresponds to the cells obtained from a single plate.
FIG. 13 shows restriction analysis of two amplified fragments, FragA and FragB, from pool 4 of pEnd-HR #1 positive clones. The expected lengths of the fragments from FragA are 501 (A), 577 (B), 618 (C) and 851 (D) bases; those from FragB are 255 (A), 331 (B), 372 (C) and 605 (D) bases.
FIG. 14A shows DNA fragments amplified with the exon-specific primers o-1165 and o-1166 from genomic DNA extracted from the original pool 4 of 293-EBNA cells transfected with pEnd-HR #1 (1), from clone pools further isolated from that pool (2-4), or from untransfected 293-EBNA cells (5). FIG. 14B shows DNA fragments amplified with the intron-specific primers o-1121 and o-1168 from genomic DNA isolated from the original pool 4 of 293-EBNA cells transfected with pEnd-HR #1 (2-4), or from untransfected 293-EBNA cells (5).
FIG. 15A shows DNA fragments amplified from 293-EBNA genomic DNA with sequence-specific primers located in intron 37 (1), or from genomic DNA of a cloned pEnd-HR #1 positive pool, previously identified by RT-PCR, with primers that hybridize in the mIgSP exon and exon 38 (2), together with the patterns obtained by digesting each of the two fragments with a series of restriction endonucleases. The table in FIG. 15B lists the expected DNA lengths.
The invention will now be described with reference to the following examples, which should not be construed as limiting the invention in any way.
Detailed Description
Soluble TRANCE
The mouse TRANCE gene contains 5 exons, the first exon essentially encoding the intracellular and transmembrane domains of the protein, while most of the extracellular domain is encoded by the remaining 4 exons. In particular, the fragment specifically encoding the functional protein domain in vivo (soluble TRANCE or sTRANCE) is encoded exclusively by the 3 rd, 4 th and 5 th exons (Lum L. et al, J Biol chem., 1999; 274 (19): 13613-8; Kodaira K. et al, Gene, 1999; 230(1) 121-. The corresponding human gene structure is not known, but it was determined that the human genome fragment (GenBank record NT _009935) associated with human chromosome 13 contains the coding sequence of human TRANCE protein, which is split into fragments with a sequence and length very similar to the exons of the mouse TRANCE gene. The length of the intron sequences of the two genes also appeared to be similar (FIG. 2).
Recently, a version of sTRANCE with a slightly different N-terminal sequence has been characterized (Schlondorff J et al, J Biol Chem., 2001; 276 (18): 14665-74), indicating that the functional protein domain can be reduced to the sequence encoded by exons 4-5. Interestingly, we noted that the homology between mouse and human TRANCE is lower in the region around the likely N-terminus of sTRANCE.
To prepare a functional protein domain corresponding to soluble TRANCE, for example, mouse or human host cells can be modified to express exons 3-5 by using two adjacent or non-adjacent fragments of intron 2 (20 kb in length) as targeting sequences for homologous recombination. The regulatory unit comprises a transcription module containing promoter and enhancer sequences active in human cells, and a translation module containing a synthetic exon with an appropriate 5' untranslated region, a Met codon and a splice donor site.
Furthermore, given the homology between the human and mouse TRANCE cDNA and protein sequences, as well as the recently identified form of sTRANCE, intron 3 can also be targeted in mouse or human cells so that only exons 4 and 5 are expressed. The construct may contain a regulatory unit similar to that used for expression of exons 3-5; however, because a methionine codon is conserved at the start of exon 4 in both the human and mouse TRANCE genomic sequences, a simplified construct can be used that contains only the appropriate transcription module and a 5' untranslated region, without a splicing module.
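The exon-selection logic used for TRANCE (and later for COL18A1) reduces to a simple rule: a regulatory unit placed in intron n drives expression of exon n+1 through the last exon. A minimal Python sketch, assuming a simple linear gene model; the function name `expressed_exons` is hypothetical:

```python
from typing import List


def expressed_exons(targeted_intron: int, n_exons: int) -> List[int]:
    """Exons expected downstream of a regulatory unit placed in an intron.

    Simplified gene model (an assumption): intron k lies between exon k and
    exon k + 1, and the inserted unit is spliced to (or reads into) the next
    exon, so every exon after the targeted intron is expressed.
    """
    if not 1 <= targeted_intron < n_exons:
        raise ValueError("targeted_intron must lie between 1 and n_exons - 1")
    return list(range(targeted_intron + 1, n_exons + 1))


# TRANCE (5 exons): targeting intron 2 or intron 3.
print(expressed_exons(2, 5))  # [3, 4, 5]
print(expressed_exons(3, 5))  # [4, 5]
```

Applied to the 41-exon human COL18A1 gene, a unit acting at the level of intron 37 gives exons 38-41, matching the pEnd-HR #1 design described below.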
The methods of the invention can also be used to prepare the extracellular domains of other TNF family proteins with a similar gene structure, e.g., CD40L, CD70 and FasL (Kodaira K et al, Gene, 1999; 230 (1): 121-127; Locksley RM et al, Cell, 2001; 104 (4): 487-501).
Anti-angiogenic factors derived from collagen XVIII 1 alpha (endostatin)
a) Targeting strategy
Endostatin belongs to a growing group of angiogenesis-related functional protein domains that are produced in vivo as proteolytic fragments of secreted primary translation products, which themselves lack any angiogenesis-related activity. Known angiogenesis inhibitors such as PEX, endostatin and restin are C-terminal fragments of MMP-2, collagen XVIII 1α and collagen XV, respectively, while Fn-f and vasostatin are N-terminal fragments of fibronectin and calreticulin, respectively, as summarized in a recent review (Cao Y, Int J Biochem Cell Biol., 2001; 33 (4): 357-69).
In particular, several proteins of the collagen family, which function as structural elements of the extracellular matrix, are proteolytically processed to produce angiogenesis inhibitors, the most studied of which is endostatin. This functional protein domain is encoded by the collagen XVIII 1α (COL18A1) gene, is highly similar to restin, a fragment encoded by the collagen XV (COL15) gene, and corresponds to the C-terminal non-collagenous domain (NC1) of the primary translation product (John H et al, Biochemistry, 1999; 38 (32): 10217-24; Sasaki T et al, J Mol Biol., 2000; 301 (5): 1179-90).
The human COL18A1 gene contains 41 exons and the mouse gene contains 43 exons, but in both organisms the non-collagenous domain is encoded by the last 6 exons. Within this region, the exons of the human COL18A1 gene can be assigned to a multimerization domain (exons 36-37), a hinge region (exon 38) and the endostatin core domain (exons 39-41).
Endostatin was originally characterized in mice as a 183-amino-acid fragment corresponding to the last 9 amino acids encoded by exon 40 plus the amino acids encoded by exons 41-43 (O'Reilly MS et al, Cell, 1997; 88 (2): 277-. However, the corresponding human fragment, containing the last 9 amino acids encoded by exon 38 and the amino acids encoded by exons 39-41, has not been found in human samples. Several studies have shown that the hinge region is particularly sensitive to various proteases (Felbor U et al, EMBO J., 2000; 19 (6): 1187-94; Ferreras M et al, FEBS Lett., 2000; 486 (3): 247-51; John H et al, Biochemistry, 1999; 38 (32): 10217-24; Wen W et al, Cancer Res., 1999; 59 (24): 6052-6056), which results in a series of fragments with different N-terminal sequences encoded by exon 38 (FIG. 3). The literature also shows that the first amino acid encoded by exon 39 already lies within the structured region of the protein (Hohenester E et al, EMBO J., 1998; 17 (6): 1656-64), and that proteins whose N-terminus extends at least 4 amino acids beyond, or is at least 4 amino acids shorter than, the N-terminus of the protein encoded by exons 39-41 are active and inactive, respectively (Yamaguchi N et al, EMBO J., 1999; 18 (16): 4414-. Finally, isolated or chimeric fragments derived from different sequences encoded by exons 39-41 differ in their effects on cell motility and proliferation (WO 00/63249, WO 00/667771).
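The correspondence used above (mouse exon 40 and exons 41-43 versus human exon 38 and exons 39-41) follows from counting exons from the 3' end, since the last 6 exons encode the NC1 domain in both species. A minimal Python sketch of that mapping, assuming a one-to-one correspondence holds only within the NC1-encoding exons; the function name is hypothetical:

```python
MOUSE_EXONS = 43  # mouse COL18A1
HUMAN_EXONS = 41  # human COL18A1
NC1_EXONS = 6     # last 6 exons encode the non-collagenous domain in both species


def mouse_to_human_exon(mouse_exon: int) -> int:
    """Human COL18A1 exon at the same distance from the 3' end as a mouse exon.

    Assumes exons correspond one-to-one when counted from the 3' end, which
    the text supports only for the NC1-encoding exons.
    """
    from_end = MOUSE_EXONS - mouse_exon
    if not 0 <= from_end < NC1_EXONS:
        raise ValueError("exon lies outside the NC1-encoding region")
    return HUMAN_EXONS - from_end


print([mouse_to_human_exon(e) for e in range(38, 44)])  # [36, 37, 38, 39, 40, 41]
print(mouse_to_human_exon(40))  # 38
```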
In conclusion, human COL18A1 exons 39-41 encode an autonomously folding unit of the human collagen XVIII 1α protein, which corresponds to the genuine functional core protein domain that provides the anti-angiogenic properties of native endostatin. Furthermore, it has been shown in the literature that the sequences encoded by exons 36-38, or even any heterologous sequence added to the N-terminus of these functional protein domains, if of limited length (Yamaguchi N et al, EMBO J., 1999; 18 (16): 4414-.
The genomic DNA sequence of the human COL18A1 gene is located on chromosome 21 (Hattori M et al, Nature, 2000; 405 (6784): 311-9) and is contained in a 340-kilobase genomic clone readily available in GenBank (accession number AL163302); having the gene within a single clone also makes it easier to design primers for fragment-specific amplification. Thus, according to the present invention, functional protein domains with the anti-angiogenic properties of human endostatin can be produced by modifying human cells with a vector containing a regulatory unit and targeting sequences that, by homologous recombination, integrate the regulatory unit at the level of intron 37 or intron 38 of the human COL18A1 gene.
Two different constructs (pEnd-HR #1 and pEnd-HR #2) were assembled with the same regulatory unit and different targeting sequences from the human COL18A1 gene (FIG. 4A). By homologous recombination, construct pEnd-HR #1 replaces the 3' end of intron 36, the complete exon 37 and the 5' end of intron 37 with a regulatory unit that drives expression of exons 38 to 41. Construct pEnd-HR #2 replaces the 3' end of intron 37 and the complete exon 38 with a regulatory unit that drives expression of exons 39 to 41.
Cloning of DNA fragments, construction and transfection of plasmids, and selection and analysis of cells were carried out by standard techniques described in the literature (Ausubel FM et al, "Current Protocols in Molecular Biology", John Wiley & Sons Inc., 1999; Sambrook et al, "Molecular Cloning: A Laboratory Manual", Cold Spring Harbor Press, 1989; Hasty P et al, in "Gene Targeting: A Practical Approach", ed. Joyner AL, Press, pages 1-35, 1999). Before transfection into human cells, all plasmids were maintained and propagated in the commonly used E. coli strains DH5α or XL1-Blue.
b) Construction of targeting vectors
The COL18A1 genomic fragments required for generating the targeting vectors were cloned by PCR amplification of the appropriate homologous regions of GenBank clone AL163302, specifically within the 9.7-kilobase region between exon 32 and the 3' untranslated region in exon 41 (FIG. 4A).
The 37-base primer o-1124 (SEQ ID NO: 5) contains at its 5' end a 10-base sequence that includes a SalI restriction site, while the 27 bases at its 3' end correspond to nucleotides 202790-202816 of clone AL163302. This 3' sequence allows o-1124 to hybridize in exon 32 of the human COL18A1 gene, so that it serves as the forward primer for amplifying the 5' targeting region of both constructs.
The 36-base primer o-1125 (SEQ ID NO: 6) contains at its 5' end a 9-base sequence that includes a BamHI restriction site, while the 27 bases at its 3' end are complementary to nucleotides 206301-206327 of clone AL163302. This 3' sequence allows o-1125 to hybridize in intron 36 of the human COL18A1 gene, so that it serves as the reverse primer for amplifying the 5' targeting region of construct pEnd-HR #1.
The 35-base primer o-1121 (SEQ ID NO: 7) contains at its 5' end a 9-base sequence that includes a BamHI restriction site, while the 26 bases at its 3' end are complementary to nucleotides 208099-208125 of clone AL163302. This 3' sequence allows o-1121 to hybridize in intron 37 of the human COL18A1 gene, so that it serves as the reverse primer for amplifying the 5' targeting region of construct pEnd-HR #2.
The 35-base primer o-1116 (SEQ ID NO: 8) contains at its 5' end a 10-base sequence that includes an XbaI restriction site, while the 25 bases at its 3' end correspond to nucleotides 206382-206406 of clone AL163302. This 3' sequence allows o-1116 to hybridize in intron 37 of the human COL18A1 gene, so that it serves as the forward primer for amplifying the 3' targeting region of construct pEnd-HR #1.
The 40-base primer o-1117 (SEQ ID NO: 9) contains at its 5' end a 16-base sequence that includes a NotI restriction site, while the 24 bases at its 3' end are complementary to nucleotides 208098-208121 of clone AL163302. This 3' sequence allows o-1117 to hybridize in intron 37 of the human COL18A1 gene, so that it serves as the reverse primer for amplifying the 3' targeting region of construct pEnd-HR #1.
The 34-base primer o-1126 (SEQ ID NO: 10) contains at its 5' end a 9-base sequence that includes an XbaI restriction site, while the 25 bases at its 3' end correspond to nucleotides 208381-208405 of clone AL163302. This 3' sequence allows o-1126 to hybridize in intron 38 of the human COL18A1 gene, so that it serves as the forward primer for amplifying the 3' targeting region of construct pEnd-HR #2.
The 43-base primer o-1123 (SEQ ID NO: 11) contains at its 5' end a 17-base sequence that includes a NotI restriction site, while the 26 bases at its 3' end are complementary to nucleotides 209828-209853 of clone AL163302. This 3' sequence allows o-1123 to hybridize in intron 39 of the human COL18A1 gene, so that it serves as the reverse primer for amplifying the 3' targeting region of construct pEnd-HR #2.
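Each primer description above follows the same arithmetic: stated length = 5' tail + genomic 3' portion, and the genomic portion should span (end - start + 1) coordinates of AL163302. A small Python sketch makes that bookkeeping explicit; the `Primer` class is hypothetical and exists only for this consistency check:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primer:
    name: str
    total_len: int  # stated oligonucleotide length (bases)
    tail_len: int   # stated 5' tail carrying the restriction site
    start: int      # first AL163302 coordinate of the genomic 3' portion
    end: int        # last AL163302 coordinate of the genomic 3' portion

    @property
    def annealing_len(self) -> int:
        """Length of the genomic portion implied by the coordinates."""
        return self.end - self.start + 1

    def is_consistent(self) -> bool:
        """True when tail + genomic portion equals the stated total length."""
        return self.tail_len + self.annealing_len == self.total_len


# Values transcribed from the primer descriptions above.
PRIMERS = [
    Primer("o-1124", 37, 10, 202790, 202816),
    Primer("o-1125", 36, 9, 206301, 206327),
    Primer("o-1121", 35, 9, 208099, 208125),
    Primer("o-1116", 35, 10, 206382, 206406),
    Primer("o-1117", 40, 16, 208098, 208121),
    Primer("o-1126", 34, 9, 208381, 208405),
    Primer("o-1123", 43, 17, 209828, 209853),
]

for p in PRIMERS:
    print(f"{p.name}: genomic part {p.annealing_len} b, consistent={p.is_consistent()}")
```

Run on the values as written, every primer passes except o-1121, whose coordinates 208099-208125 span 27 nucleotides while its 3' portion is described as 26 bases; one of the two figures is presumably off by one.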
The 5' targeting DNA fragment of construct pEnd-HR #1 was prepared by PCR using the genomic clone AL163302 as template and o-1124 and o-1125 as primers. This 3.5 kb fragment contains, from 5' to 3', the 3' end of exon 32, the complete introns 32-35 and exons 33-36, and the 5' end of intron 36, flanked by unique SalI (5') and BamHI (3') restriction sites.
The 5' targeting DNA fragment of construct pEnd-HR #2 was prepared by PCR using the genomic clone AL163302 as template and o-1124 and o-1121 as primers. This 5354 bp fragment contains, from 5' to 3', the 3' end of exon 32, the complete introns 32-36 and exons 33-37, and the 5' end of intron 37, flanked by unique SalI (5') and BamHI (3') restriction sites.
The 3' targeting DNA fragment of construct pEnd-HR #1 was prepared by PCR using the genomic clone AL163302 as template and o-1116 and o-1117 as primers. This 1.7 kb fragment contains the central region of intron 37, flanked by unique XbaI (5') and NotI (3') restriction sites.
The 3' targeting DNA fragment of construct pEnd-HR #2 was prepared by PCR using the genomic clone AL163302 as template and o-1126 and o-1123 as primers. This 1.5 kb fragment contains, from 5' to 3', intron 38 (from the o-1126 binding site onward), the complete exon 39 and the 5' end of intron 39, flanked by unique XbaI (5') and NotI (3') restriction sites.
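The sizes quoted for the four targeting fragments can be cross-checked from the primer coordinates: the product spans the genomic interval from the forward primer's first base to the reverse primer's last base, plus both non-templated 5' tails. A minimal Python sketch, assuming the coordinates and tail lengths given in the primer descriptions; the helper name `amplicon_size` is hypothetical:

```python
def amplicon_size(fwd_start: int, rev_end: int, fwd_tail: int, rev_tail: int) -> int:
    """Length of a PCR product whose primers carry 5' tails.

    fwd_start: first template coordinate covered by the forward primer
    rev_end:   last template coordinate covered by the reverse primer
    The tails are not template-encoded, so they are added to the genomic span.
    """
    if rev_end < fwd_start:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return (rev_end - fwd_start + 1) + fwd_tail + rev_tail


# (forward start, reverse end, forward tail, reverse tail) on clone AL163302
ARMS = {
    "pEnd-HR #1 5' arm (o-1124/o-1125)": (202790, 206327, 10, 9),
    "pEnd-HR #2 5' arm (o-1124/o-1121)": (202790, 208125, 10, 9),
    "pEnd-HR #1 3' arm (o-1116/o-1117)": (206382, 208121, 10, 16),
    "pEnd-HR #2 3' arm (o-1126/o-1123)": (208381, 209853, 9, 17),
}

for name, args in ARMS.items():
    print(f"{name}: {amplicon_size(*args)} bp")
```

This gives 3557, 5355, 1766 and 1499 bp, in line with the approximately 3.5 kb, 5354 bp, 1.7 kb and 1.5 kb stated above; the one-base difference for the pEnd-HR #2 5' arm may reflect the 26- versus 27-base ambiguity in the o-1121 description.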
Because some of these PCR products are particularly long, it is preferable to amplify the fragments required for the targeting vectors with enzymes and protocols known in the art for long, high-fidelity amplification. Kits for high-fidelity, long-range PCR are commercially available, such as the Herculase PCR kit (Stratagene).
Each amplified 5' homologous region was cloned between the BamHI and SalI sites, and each 3' homologous region between the XbaI and NotI sites, of plasmid pBluescript-KS II (pBS-KS II; Stratagene). The identity of the amplified and cloned genomic fragments was then confirmed by restriction mapping and partial DNA sequencing.
The regulatory unit was constructed by assembling DNA sequences disclosed in the literature (FIG. 4B). The transcription module was selected among human or non-human promoters with high activity in cultured human cells. One example is the promoter of the cloned human elongation factor-1 alpha (EF-1α) gene (Uetsuki T et al, J Biol Chem., 1989; 264 (10): 5791-8), which has proven very effective in a wide range of host cells (Mizushima S and Nagata S, Nucleic Acids Res., 1990; 18 (17): 5322). The 1.2 kb form of this promoter present in several commercially available plasmids (Invitrogen) comprises a TATA box followed by the transcription start site, a 22-base untranslated exon and a 0.9 kb intron that enhances transcription owing to the presence of several Sp1 and AP-1 sites.
A translation module comprising a translation initiation site is combined with a splicing module comprising a consensus splice donor site. In addition, to facilitate the isolation of the functional protein domain encoded at the 3' end of the gene, a sequence encoding a signal peptide is included between the translation initiation site and the splice consensus sequence, in frame with the exon(s) encoding the functional protein domain. The signal peptide chosen was the mouse Ig signal peptide (mIgSP) sequence (GenBank accession number M13329), and a suitable splice consensus sequence was chosen among those recently shown to be functional in human cells (Long M et al, Proc Natl Acad Sci USA, 1998; 95 (1): 219-.
The human EF-1α promoter fragment (nucleotides 373-1561 of GenBank record J04617) was cloned between the ClaI and NheI sites of pBS-KS II, generating vector pBS-EF1α. The exon encoding mIgSP and the splice donor site were combined in a 0.2 kb synthetic DNA fragment carrying an XbaI restriction site at its 5' end and NheI and NotI restriction sites at its 3' end. This fragment was cloned between the XbaI and NotI sites of pBS-EF1α, so that it lies at the 3' end of the human EF-1α promoter, resulting in plasmid pBS-EF1α-mIgSP-SD (FIG. 5).
Construction of the homologous recombination vector backbone begins with plasmid pGEM-3Z (Promega) to which positive and negative selection markers, targeting regions, transcription, translation and splicing modules are added as follows.
The gene used for negative selection against non-homologous integration was the HSV-1 thymidine kinase gene (HSV-TK; a 1.8 kb fragment derived from the complete genome of human herpesvirus 1, GenBank accession number NC_001806), placed under the control of the ubiquitously active mouse phosphoglycerate kinase (mPGK) promoter and polyadenylation signal (508- and 480-base fragments, respectively, derived from the plasmid deposited in GenBank under accession number X76683).
The multiple cloning site 3' of HSV-TK was modified to allow cloning of all other elements: two new unique restriction sites, NotI and ClaI, were introduced by cloning two annealed oligonucleotides, and two XbaI sites adjacent to the mPGK promoter and polyadenylation site were removed by digestion and religation, resulting in plasmid pGEM-3Z-mPGK-TK-HR (FIG. 6). This plasmid was used to clone first the 5' targeting region, then the positive selection marker, and finally the 3' targeting region of each construct together with the transcription/translation modules.
The 5' targeting region of each construct was excised from the pBS-KS II vector with BamHI and SalI and subcloned between the BamHI and XhoI sites of pGEM-3Z-mPGK-TK-HR, generating plasmids pGEM-3Z-mPGK-TK-5'HR #1 and pGEM-3Z-mPGK-TK-5'HR #2. The hygromycin resistance gene was chosen as the positive selection gene and was obtained from plasmid pHygEGFP (Clontech), in which the resistance gene is expressed as a fusion protein with green fluorescent protein (GFP) under the control of a viral promoter (CMV). The commercial plasmid was modified to remove two adjacent NotI sites by NotI digestion and fill-in with Klenow enzyme (Life Technologies). The CMV-HygEGFP-polyA cassette was cloned as a ClaI-BglII fragment into the ClaI and BamHI sites of pGEM-3Z-mPGK-TK-5'HR #1 and pGEM-3Z-mPGK-TK-5'HR #2, generating plasmids pGEM-3Z-mPGK-TK-HYG-5'HR #1 and pGEM-3Z-mPGK-TK-HYG-5'HR #2.
The 3' targeting region of each construct was excised from the pBS-KS II vector as an XbaI-NotI fragment and cloned into plasmid pBS-EF1α-mIgSP-SD using the NheI and NotI sites located downstream of the consensus splice donor site that follows the exon encoding the mIgSP signal peptide. The resulting plasmids pBS-EF1α-mIgSP-SD-3'HR #1 and pBS-EF1α-mIgSP-SD-3'HR #2 contain the EF1α-mIgSP fragment fused to the 3' targeting region between the ClaI and NotI sites. Finally, these ClaI-NotI fragments were introduced between the ClaI and NotI sites of pGEM-3Z-mPGK-TK-HYG-5'HR #1 and pGEM-3Z-mPGK-TK-HYG-5'HR #2, downstream of the positive selection cassette, resulting in the pEnd-HR #1 (FIG. 7) and pEnd-HR #2 vectors. Before transfection into cells, these plasmids were linearized at the unique NotI site located 3' of the 3' targeting region, for targeting the exogenous sequence to the specific location in the human COL18A1 gene. Once integrated, the regulatory unit drives transcription of an mRNA in which the mIgSP-coding exon is fused in frame to COL18A1 exons 38-41 (pEnd-HR #1; SEQ ID NO: 12; SEQ ID NO: 13; FIG. 8) or to exons 39-41 (pEnd-HR #2; SEQ ID NO: 14; SEQ ID NO: 15; FIG. 9).
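Several subcloning steps above join ends produced by different enzymes: SalI into XhoI, BglII into BamHI, and XbaI into NheI. This works because each pair leaves the same 4-base 5' overhang, and the ligated junction is a hybrid sequence that neither parent enzyme recognizes. A short Python sketch of that reasoning, using the standard recognition sequences and cut positions of these enzymes; the `Enzyme` class and `junction` helper are illustrative, not from the source:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str  # palindromic recognition sequence, 5'->3'
    cut: int   # bases before the cut on the top strand

    @property
    def overhang(self) -> str:
        """5' overhang of a palindromic site cut symmetrically on both strands
        (valid for 5'-overhang cutters, i.e. cut < len(site) / 2)."""
        return self.site[self.cut:len(self.site) - self.cut]


BAMHI = Enzyme("BamHI", "GGATCC", 1)
BGLII = Enzyme("BglII", "AGATCT", 1)
SALI = Enzyme("SalI", "GTCGAC", 1)
XHOI = Enzyme("XhoI", "CTCGAG", 1)
NHEI = Enzyme("NheI", "GCTAGC", 1)
XBAI = Enzyme("XbaI", "TCTAGA", 1)
NOTI = Enzyme("NotI", "GCGGCCGC", 2)


def junction(left: Enzyme, right: Enzyme) -> str:
    """Top-strand sequence formed when a `left` end is ligated to a `right` end."""
    if left.overhang != right.overhang:
        raise ValueError(f"{left.name} and {right.name} ends are not compatible")
    return left.site[:left.cut] + left.overhang + right.site[len(right.site) - right.cut:]


for a, b in [(BAMHI, BGLII), (XHOI, SALI), (NHEI, XBAI)]:
    j = junction(a, b)
    print(a.name, b.name, a.overhang, j, "recut:", j in (a.site, b.site))
```

Each pair prints `recut: False`, and `junction(BAMHI, NOTI)` raises because a GATC overhang cannot pair with the GGCC overhang left by NotI.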
The activity of the exogenous regulatory unit contained in pEnd-HR #1 and pEnd-HR #2 was first tested by transient transfection of pEAK plasmids (Edge BioSystems) containing the endostatin coding sequence. The regulatory unit was cloned 5' of a DNA fragment containing the intron sequence that would be transcribed and spliced after homologous recombination in the human COL18A1 gene, followed by the coding sequence of the downstream human COL18A1 exons fused in frame to a heterologous epitope (which allows detection of even small amounts of protein), and by the stop codon and polyadenylation site present in the vector (FIG. 10A). Both constructs (pEAK-pEnd-HR #1 and pEAK-pEnd-HR #2) were transfected into human 293-EBNA cells (later used for homologous recombination), which do not express the COL18A1 gene (Yamaguchi N et al, EMBO J., 1999; 18 (16): 4414-.
The transfected cells were analyzed for mRNA and for secreted and intracellular protein to confirm that the construct was correctly transcribed, spliced and translated.
First, cDNA obtained from the transfected cells was amplified by RT-PCR, and Western blot analysis was performed using commercially available antibodies against endostatin (Chemicon Inc.) and the FLAG epitope (Amersham-Pharmacia). In particular, protein analysis showed that the EF-1α promoter is readily active in 293-EBNA cells, that the transcript is correctly spliced, and that the resulting mRNA is translated into a functional protein domain of the expected size (FIG. 10B). The protein is then secreted into the medium through the action of the exogenous signal peptide (FIG. 10C), as described for a construct containing only COL18A1 exon sequences (Blezinger P et al, Nat. Biotechnol., 1999; 17 (4): 343-348).
c) Transfection of targeting vectors and clonal selection
The pEnd-HR #1 and pEnd-HR #2 vectors may be used to transfect any human cell type in which the exogenous regulatory sequences are active or inducible, regardless of whether the endogenous COL18A1 gene is already expressed. This gene is highly expressed in the vascular tissues of liver, heart and kidney and in hepatocytes (Saarela J et al, Am. J. Pathol., 1998; 153 (2): 611-626). Cell lines derived from these cell types may therefore be suitable for the methods of the invention, but cell types that do not express COL18A1 can also be used: even if the chromatin structure of the locus represses transcription, sufficiently strong, ubiquitously active or inducible regulatory sequences can overcome this limitation. The choice of cell type can therefore be extended to immortalized human cell lines that are easy to transfect and expand, such as HT1080, WI38, HepG2 or 293 cells.
As indicated previously, the immortalized human cell line 293-EBNA, derived from human embryonic kidney, can be efficiently modified by the method of the invention to selectively express the exon(s) encoding the anti-angiogenic functional protein domain of the human COL18A1 gene. These commercially available cells (Invitrogen) express Epstein-Barr virus nuclear antigen 1 (EBNA-1), can be efficiently transfected by electroporation, and are grown in standard DMEM (Dulbecco's Modified Eagle's Medium) containing 10% fetal bovine serum, 4.5 g/L glucose and antibiotics (100 μg/ml penicillin and streptomycin; Gibco-BRL).
Electroporation was carried out with a multiwell device (Eppendorf) with electrodes spaced 4 mm apart, under the recommended conditions (conductivity 60 microSiemens, voltage 500 V), in the hypotonic buffer optimized for mammalian cells supplied by the manufacturer. Eight aliquots of exponentially growing 293-EBNA cells (2.5 × 10^6 cells per aliquot) were electroporated, each in 800 μl of buffer with 12 μg of NotI-linearized pEnd-HR #1 or pEnd-HR #2. After the pulse, all 2 × 10^7 cells transfected with the same linearized vector were seeded onto four 150 mm diameter tissue culture plates (Nunc) coated with poly-D-lysine (Sigma). After 72 hours, selection was started with 250 μg/ml hygromycin (Life Technologies), and the medium was changed every two days. After 4 days of hygromycin selection, negative selection for homologous recombinants was started by incubating the cells in hygromycin-containing medium supplemented with 10 μM 9-[(1,3-dihydroxy-2-propoxy)methyl]guanine (ganciclovir; Cymevene, Roche).
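The positive-negative selection scheme can be summarized as a small model: homologous recombination transfers only the elements lying between the two targeting arms (the hygromycin marker and the regulatory unit), whereas random integration typically carries the flanking HSV-TK gene as well, making the cells ganciclovir-sensitive. A simplified Python sketch; the element labels are shorthand, and the assumption that random integrants keep an intact, expressed TK gene is an idealization:

```python
from typing import List

# Linearized targeting vector, 5'->3', simplified from the pEnd-HR layout
# (labels are illustrative shorthand, not sequence names from the text).
VECTOR: List[str] = [
    "mPGK-HSV-TK",       # negative selection marker, outside the homology arms
    "5' targeting arm",
    "CMV-HygEGFP",       # positive selection marker
    "EF-1a-mIgSP-SD",    # regulatory unit
    "3' targeting arm",
]


def integrated_elements(event: str) -> List[str]:
    """New (non-genomic) vector elements that end up in the chromosome."""
    if event == "homologous":
        # Crossovers inside the two arms transfer only what lies between them.
        start = VECTOR.index("5' targeting arm") + 1
        stop = VECTOR.index("3' targeting arm")
        return VECTOR[start:stop]
    if event == "random":
        return list(VECTOR)  # idealization: whole vector integrates intact
    if event == "none":
        return []
    raise ValueError(f"unknown event: {event}")


def survives_selection(elements: List[str]) -> bool:
    """Hygromycin kills cells without HygEGFP; ganciclovir kills cells with TK."""
    hygromycin_resistant = "CMV-HygEGFP" in elements
    ganciclovir_sensitive = "mPGK-HSV-TK" in elements
    return hygromycin_resistant and not ganciclovir_sensitive


for event in ("homologous", "random", "none"):
    print(event, survives_selection(integrated_elements(event)))
```

In practice some random integrants lose or silence HSV-TK and escape ganciclovir, so surviving pools still require the PCR screening described below.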
Approximately 25 days after transfection, surviving colonies were picked individually under a microscope with a pipette tip. The colonies from each plate were combined into one pool (approximately 300 clones per pool), and the 4 pools were kept under positive selection for 4 to 5 weeks to expand them sufficiently for further analysis before smaller pools of clones were selected.
d) Identification and analysis of endostatin expressing clones
Among the clones obtained after positive-negative selection, cells in which homologous recombination has integrated the exogenous regulatory unit into the human COL18A1 gene, and which therefore express the functional anti-angiogenic protein domain, can be identified in a number of ways. The following experiments were carried out to confirm correct integration of the regulatory unit and specific expression of the exon(s) encoding the anti-angiogenic functional protein domain.
The structure of the human COL18A1 gene and of its transcripts in the selected clones was analyzed on genomic DNA and mRNA extracted from positive pools, using techniques described in the literature and known to those skilled in the art. To identify the sequences expected in genomic DNA and mRNA after correct targeting and splicing (FIG. 11A-B), the preferred method is selective amplification of DNA fragments by PCR with specific primers, because it is fast and requires little biological material. Some of the amplified fragments can also be cloned and sequenced to further confirm their identity.
PCR was performed on an Applied Biosystems 9700 thermocycler using commercially available Qiagen kits, HotStarTaq PCR (when the starting material was genomic DNA or cDNA) or OneStep HotStar RT-PCR (when the starting material was total RNA), essentially following the manufacturer's instructions. The final reaction volume was 25 or 50 μl. After the PCR, 10 or 20 μl of the reaction mixture was run on an agarose gel and examined for the presence of the expected PCR fragment, indicating correct integration of the exogenous sequence at the targeted position in the COL18A1 gene, or of the expected spliced mRNA encoding the functional anti-angiogenic protein domain.
Pools of clones transfected with either pEnd-HR #1 or pEnd-HR #2 were first screened for the new transcripts. mRNA extracted from the clones was either reverse transcribed and then amplified by PCR (two-step method), or reverse transcribed and amplified in the same tube (one-step method). The primers were designed to hybridize to different exon sequences: the same forward primer always hybridized within the mIgSP exon, while different reverse primers hybridized within endogenous human COL18A1 exons (FIG. 11A).
For the two-step procedure, 1 μg of total RNA was reverse transcribed with an oligo-dT18 primer using the Superscript-II cDNA kit (Life Technologies) and 5 units of MMLV reverse transcriptase (Promega Biotech), according to the manufacturer's instructions. After incubation at 37 ℃ for 45 minutes, 1 unit of RNase H was added to remove the RNA hybridized to the newly synthesized cDNA, and incubation was continued at 37 ℃ for a further 15 minutes. Finally, the resulting complementary DNA (cDNA) was diluted with RNase-free water to a concentration of 10 ng/μl. PCR was then performed with 20 ng of oligo-dT-primed cDNA and 0.5 μM of each primer, as follows:
1 cycle: 95 ℃ for 15 minutes;
5 cycles: 95 ℃ for 45 seconds, 60-56 ℃ (annealing temperature lowered by 1 ℃ per cycle), 72 ℃ for 1 minute;
35 cycles: 95 ℃ for 45 seconds, 54 ℃ for 30 seconds, 72 ℃ for 1 minute;
1 cycle: 72 ℃ for 10 minutes.
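The first amplification stage above is a touchdown program: the annealing temperature starts high and drops by 1 ℃ each cycle before the fixed-temperature cycles begin. A minimal Python sketch of that schedule; the helper `touchdown_temps` is hypothetical, and annealing times are not needed for the calculation:

```python
from typing import List


def touchdown_temps(start_c: int, end_c: int, step_c: int = 1) -> List[int]:
    """Annealing temperature for each touchdown cycle, lowered by step_c per
    cycle from start_c down to end_c (inclusive when the step divides evenly)."""
    if step_c <= 0 or end_c > start_c:
        raise ValueError("expected start_c >= end_c and a positive step_c")
    return list(range(start_c, end_c - 1, -step_c))


rt_touchdown = touchdown_temps(60, 56)
print(rt_touchdown)            # [60, 59, 58, 57, 56]
print(len(rt_touchdown) + 35)  # 40 amplification cycles in this program

genomic_touchdown = touchdown_temps(65, 58)
print(len(genomic_touchdown))  # 8
```

For the two-step RT-PCR program, 60-56 ℃ at 1 ℃ per cycle gives exactly the 5 stated cycles. The same helper applied to the genomic DNA program described later shows that going from 65 ℃ to 58 ℃ at 1 ℃ per cycle takes 8 cycles, whereas 5 cycles starting at 65 ℃ would end at 61 ℃.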
For the one-step method, OneStep HotStar RT-PCR was performed with 500 ng of total RNA and 0.5 μM of each primer. Reverse transcription was carried out at 50 ℃ for 30 minutes under the conditions recommended by the manufacturer, followed by:
1 cycle: 95 ℃ for 15 minutes;
35 cycles: 95 ℃ for 30 seconds, 57 ℃ for 30 seconds, 72 ℃ for 1 minute;
1 cycle: 72 ℃ for 10 minutes.
Initially, an oligonucleotide hybridizing within the coding region of the exogenous mIgSP exon (o-1165; SEQ ID NO: 16) was used together with a primer hybridizing within exon 40 of endogenous COL18A1 (o-1175; SEQ ID NO: 17), since exon 40 should be expressed after integration of either construct into the human COL18A1 gene. Amplification of cDNA from pools of selected clones transfected with pEnd-HR #1 or pEnd-HR #2 identified, for each targeting construct, one pool (pool 4 for pEnd-HR #1 and pool 3 for pEnd-HR #2) expressing mRNA molecules that contain both exogenous and endogenous exon sequences and yield a product of the length expected after correct targeting, transcription and splicing (577 bases for pEnd-HR #1 and 331 bases for pEnd-HR #2). The PCR results were negative for untransfected cells (FIG. 12).
In the pEnd-HR #1-positive pool, a weaker band of the size expected for pEnd-HR #2 was also clearly detected. To characterize the two molecular species in detail, pool 4 (pEnd-HR #1-positive) was analyzed by PCR with the external primers o-1165 (SEQ ID NO: 16) and o-1131 (SEQ ID NO: 18), which amplified two larger bands (FragA and FragB) showing the same difference in length as the bands amplified with the internal reverse primer o-1175. FragA and FragB were isolated from the gel, cloned and used independently as templates for PCR with o-1165 as the forward primer and the following oligonucleotides as nested reverse primers: o-1164 (SEQ ID NO: 19), o-1175 (SEQ ID NO: 17) and o-1179 (SEQ ID NO: 20).
If the weaker band is due to alternative splicing, the fragments amplified from FragA and FragB as templates should differ in length by the length of the skipped exon. Because all primer pairs amplified fragments that differed by approximately 250 base pairs between FragA and FragB, it can be concluded that the weaker band originally identified in pool 4 corresponds to a transcript from which exon 38 (246 bases), although transcribed, is removed by alternative splicing (FIG. 13). This splicing pattern was later confirmed by sequencing the cloned fragments. Because exon 38 is 246 bases long (82 codons), its removal keeps exon 39 in frame with the mIgSP exon, and the resulting transcript is identical to that of cells transfected with pEnd-HR #2 (FIG. 8).
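The reasoning about exon 38 rests on two pieces of arithmetic: the two expected o-1165/o-1175 products differ by exactly the length of exon 38, and a 246-base exon contains a whole number of codons, so skipping it leaves exon 39 in frame. A Python sketch using the numbers given in the text; the constant and function names are illustrative:

```python
EXON38_LEN = 246    # bases, from the text
AMPLICON_HR1 = 577  # o-1165/o-1175 product expected for pEnd-HR #1
AMPLICON_HR2 = 331  # o-1165/o-1175 product expected for pEnd-HR #2


def skipping_preserves_frame(exon_len: int) -> bool:
    """Skipping an exon leaves the downstream reading frame unchanged only
    if the exon is a whole number of codons long."""
    return exon_len % 3 == 0


print(AMPLICON_HR1 - AMPLICON_HR2)           # 246, the length of exon 38
print(skipping_preserves_frame(EXON38_LEN))  # True
print(EXON38_LEN // 3)                       # 82 codons
```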
Finally, a 0.9 kb fragment was obtained by RT-PCR on mRNA from pool 4 (transfected with pEnd-HR #1), using the mIgSP-specific primer o-1165 and the primer o-1193 (SEQ ID NO: 21), which is specific for the 3' untranslated region in exon 41. The fragment was cloned and sequenced, and its sequence matched the expected sequence (FIG. 8), further confirming that the DNA construct had integrated correctly and drives transcription of the COL18A1 exons encoding endostatin.
Pools of clones transfected with pEnd-HR #1 and expressing the correct transcripts were further analyzed at the genomic level by PCR, using oligonucleotides that hybridize either to exon sequences, such as o-1165 (SEQ ID NO: 16) and o-1166 (SEQ ID NO: 22), or to intron sequences, such as o-1168 (SEQ ID NO: 23) and o-1121 (SEQ ID NO: 7) (FIG. 11B).
PCR was performed with 200 ng of genomic DNA (isolated from RT-PCR-identified pools of pEnd-HR #1-positive clones, or from untransfected 293-EBNA cells) and 0.5 μM of each primer, as follows:
1 cycle: 95 ℃ for 15 minutes;
5 cycles: 95 ℃ for 30 seconds, 65-58 ℃ for 30 seconds (annealing temperature lowered by 1 ℃ per cycle), 72 ℃ for 2 minutes 15 seconds;
25 cycles: 95 ℃ for 30 seconds, 57 ℃ for 30 seconds, 72 ℃ for 2 minutes 15 seconds;
1 cycle: 72 ℃ for 10 minutes.
Primers hybridizing in exon regions (one in the exon inserted by homologous recombination and the other in an endogenous exon) amplified a fragment of the expected length (1820 bases) from the original positive pool and from all pools of clones derived from it, but not from untransfected cells (FIG. 14A), confirming integration of the mIgSP exon. Primers hybridizing in intron regions amplified fragments of the expected length (1784 bases) from both transfected and untransfected cells (FIG. 14B), confirming the structural integrity of the gene.
Further evidence that the expected structure was obtained after integration of the regulatory unit contained in the targeting vector pEnd-HR #1 came from digesting the fragments amplified with the exon- and intron-specific primers with a series of restriction enzymes, which confirmed that the resulting subfragments have the expected lengths. All enzymes tested produced the expected restriction pattern (FIG. 15A, B).
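The restriction analysis compares observed subfragments with lengths predicted from the expected sequence. A minimal in-silico digest in Python, assuming a linear fragment and palindromic recognition sites (so searching the top strand is sufficient); the enzyme table covers only a few enzymes mentioned in this document, and the sequence used here is a hypothetical toy example, not a COL18A1 amplicon:

```python
import re
from typing import Dict, List, Tuple

# Recognition site and top-strand cut offset for a few enzymes named in the text.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "BamHI": ("GGATCC", 1),
    "NheI": ("GCTAGC", 1),
    "SpeI": ("ACTAGT", 1),
    "XbaI": ("TCTAGA", 1),
}


def cut_positions(seq: str, enzyme: str) -> List[int]:
    """Top-strand cut positions (0-based, between bases) for one enzyme."""
    site, offset = ENZYMES[enzyme]
    # A zero-width lookahead also reports overlapping occurrences.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def digest(seq: str, enzymes: List[str]) -> List[int]:
    """Fragment lengths (5'->3') of a linear DNA cut with all given enzymes."""
    cuts = sorted({pos for e in enzymes for pos in cut_positions(seq, e)})
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 26-base sequence with one BamHI and one NheI site.
toy = "AAAA" + "GGATCC" + "TTTTTTTT" + "GCTAGC" + "CC"
print(digest(toy, ["BamHI"]))          # [5, 21]
print(digest(toy, ["BamHI", "NheI"]))  # [5, 14, 7]
```

With the expected amplicon sequence in place of the toy string, the same function would produce the kind of expected-length table shown in FIG. 15B.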
PCR analysis of mRNA and genomic DNA extracted from cells modified by the method of the invention identifies clones in which the regulatory unit, integrated into the human COL18A1 gene, drives specific expression of the exon(s) encoding the functional protein domain. This domain is responsible for the antiangiogenic properties of endostatin. The analysis therefore allows the clones that best express the desired functional protein domain to be isolated and characterized further.

After sufficient expansion of these clones, additional analysis can be performed by hybridizing endostatin-specific probes to total RNA (northern blot) or genomic DNA (Southern blot). For example, a northern blot may detect signal bands of 2.2 kb or 1.9 kb in RNA isolated from clones transfected with pEnd-HR #1 or pEnd-HR #2, respectively.

Genomic DNA from positive and untransfected cells can also be digested with NheI and SpeI, separated on an agarose gel and transferred to a filter. The filter is then probed with a radiolabeled fragment of the human COL18A1 genomic region that includes exons 32-36 and introns 32-36. The hybridization patterns differ between the two cell types. Untransfected cells show a 12.4 kb fragment spanning human COL18A1 from intron 31 to the end of the gene. In positive cells this fragment is replaced by a shorter one (4.4 kb in pEnd-HR #1-transfected cells), because of the additional NheI and SpeI sites introduced into the targeted gene.
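Band sizes on a Southern or northern blot are usually estimated from a size ladder, using the approximately linear relation between migration distance and the logarithm of fragment length. This is a minimal Python sketch that assumes a hypothetical ladder and a least-squares fit. The 12.4 kb and 4.4 kb reference sizes come from the NheI/SpeI digest described above. The nearest-match rule and the function names are illustrative assumptions.

```python
import math


def fit_semilog(ladder):
    """Least-squares fit of log10(size) = intercept + slope * distance.

    ladder: (migration distance, size in kb) pairs for the marker lane.
    """
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(ladder)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def band_size(distance, fit):
    """Estimated size (kb) of a band that migrated `distance`."""
    intercept, slope = fit
    return 10 ** (intercept + slope * distance)


# Reference sizes from the text for the NheI/SpeI digest probed with exons 32-36.
EXPECTED_KB = {"untransfected (wild-type)": 12.4, "pEnd-HR #1 targeted": 4.4}


def closest_expected(size_kb):
    """Label of the reference size nearest to size_kb on a log scale."""
    return min(EXPECTED_KB, key=lambda label: abs(math.log10(EXPECTED_KB[label] / size_kb)))


ladder = [(12.0, 10.0), (21.0, 5.0), (30.0, 2.5), (39.0, 1.25)]  # hypothetical marker lane, mm vs kb
fit = fit_semilog(ladder)
size = band_size(24.0, fit)
print(round(size, 2), closest_expected(size))  # 3.97 pEnd-HR #1 targeted
```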
To identify clones with higher production levels, clones were screened at the protein level with antibody-based techniques (ELISA, western blot, immunoprecipitation). This screening took place before further expansion of the clones and before collection and purification of the anti-angiogenic functional protein domain.
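Ranking clones by ELISA signal requires converting absorbance into concentration with a standard curve. The Python sketch below uses piecewise-linear interpolation between standards. This is a simplification, since four-parameter logistic fits are also common. The standard values, units and clone readings are hypothetical, and the function names are illustrative.

```python
from bisect import bisect_left


def interpolate_concentration(od, standards):
    """Piecewise-linear interpolation of concentration from absorbance.

    standards: (concentration, OD) pairs whose OD rises with concentration.
    """
    points = sorted(standards, key=lambda p: p[1])
    ods = [o for _, o in points]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the standard curve; dilute the sample and re-assay")
    i = bisect_left(ods, od)
    if ods[i] == od:
        return points[i][0]
    (c0, od0), (c1, od1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (od - od0) / (od1 - od0)


def rank_clones(clone_ods, standards, dilution=1.0):
    """Clones ordered from highest to lowest estimated concentration."""
    estimates = {clone: dilution * interpolate_concentration(od, standards)
                 for clone, od in clone_ods.items()}
    return sorted(estimates.items(), key=lambda item: item[1], reverse=True)


standards = [(0, 0.05), (10, 0.25), (50, 1.05), (100, 2.05)]  # hypothetical ng/ml vs OD
ranking = rank_clones({"clone A": 0.65, "clone B": 1.55, "clone C": 0.15}, standards)
print([clone for clone, _ in ranking])  # ['clone B', 'clone A', 'clone C']
```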
After further purification, the anti-angiogenic activity of the functional protein domain can be determined in endothelial cell-based assays by one of several methods described in the literature. Protein extracts, purified preparations or culture media from positive clones were tested in endothelial cell migration assays, with recombinant or purified human endostatin as a standard. One of the more commonly used assays employs commercially available (Clonetics) human umbilical vein endothelial cells (HUVECs), which in culture allow reliable migration assays to be established (Yamaguchi N et al, EMBO J., 1999; 18(16): 4414-.
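Migration data from such assays are commonly normalized to the window between unstimulated (basal) and stimulated migration. This is a minimal Python sketch that assumes that normalization and uses hypothetical cell counts. The potency ratio against the endostatin standard is an illustrative summary, not a read-out prescribed by the text.

```python
def percent_inhibition(sample, stimulated, basal):
    """Inhibition of stimulated migration, as a percentage of the assay window.

    sample, stimulated, basal: migrated cells (or another migration read-out)
    for treated wells, stimulated control wells and unstimulated wells.
    """
    window = stimulated - basal
    if window <= 0:
        raise ValueError("stimulated migration must exceed basal migration")
    return 100.0 * (stimulated - sample) / window


def relative_to_standard(sample_inhibition, standard_inhibition):
    """Inhibition by the test sample as a fraction of that of the endostatin standard."""
    return sample_inhibition / standard_inhibition


# Hypothetical mean cell counts per field.
clone_medium = percent_inhibition(sample=60, stimulated=100, basal=20)
endostatin_std = percent_inhibition(sample=40, stimulated=100, basal=20)
print(clone_medium, endostatin_std, round(relative_to_standard(clone_medium, endostatin_std), 2))
# 50.0 75.0 0.67
```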
Trx80
Human thioredoxin (Trx) is an enzyme that catalyzes the reduction of intracellular disulfides. Truncated thioredoxin (Trx80) consists of the 80-84 N-terminal residues and has no enzymatic activity. It is cleaved and secreted by monocytic cell lines and is itself a potent mitogenic cytokine that stimulates the growth of resting human peripheral blood mononuclear cells (Pekkari K et al, J Biol Chem., 2000; 275(48): 37474-80). Furthermore, Trx80 specifically induced differentiation of purified human CD14+ monocytes in culture, as measured by increased expression of CD14, CD40, CD54 and CD86. Trx80 also induced secretion of IL-12 by CD40+ monocytes in human peripheral blood mononuclear cell (PBMC) cultures, and it was shown to enhance IL-2-induced interferon-γ secretion in PBMC cultures (Pekkari K et al, Blood, 97(10): 3184-90, 15 May 2001). Thioredoxin may have co-cytokine activity with interleukins after leaderless secretion (Bertini R et al, J Exp Med., 1999; 189(11): 1783-9). Even so, the effects obtained with Trx80 cannot be reproduced with the full-length protein.
The human thioredoxin gene (GenBank records X54539 and X54540) contains 5 exons encoding a protein of 105 residues (Kaghad M et al, Gene, 1994; 140(2): 273-8). The first 4 exons all have reading-frame phase zero, meaning each ends on a complete codon. Together they encode 85 residues, which correspond essentially to Trx80. A 1.3 kb sequence containing part of intron 3, exon 4, intron 4 and exon 5 (GenBank record X70288) can be used to construct a targeting vector. This vector integrates, at the level of intron 4, a regulatory unit that comprises a splice acceptor site and terminates transcription and translation.
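The phase of the preceding exon boundary determines which position in an inserted exon after a new splice acceptor in intron 4 is in frame with the upstream coding sequence. It therefore determines where a stop codon must sit to end translation right after residue 85. This is a minimal Python sketch that computes intron phases from the coding lengths of successive exons. The exon lengths in the example are hypothetical, chosen only so that four phase-0 exons encode 85 residues, and are not the actual thioredoxin exon lengths.

```python
def intron_phases(coding_exon_lengths):
    """Phase of each intron: coding nucleotides upstream of it, modulo 3.

    coding_exon_lengths: coding (CDS) nucleotides in each exon, 5' to 3',
    counted from the first base of the start codon. Phase 0 means the
    preceding exon ends on a complete codon.
    """
    phases, cumulative = [], 0
    for length in coding_exon_lengths[:-1]:  # no intron after the last exon
        cumulative += length
        phases.append(cumulative % 3)
    return phases


def codons_in_first_exons(coding_exon_lengths, n_exons):
    """Whole codons encoded by the first n_exons exons."""
    return sum(coding_exon_lengths[:n_exons]) // 3


hypothetical = [60, 90, 45, 60, 63]  # illustrative only, not the thioredoxin gene
print(intron_phases(hypothetical))             # [0, 0, 0, 0]
print(codons_in_first_exons(hypothetical, 4))  # 85
```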
Cytokines derived from human tyrosine-tRNA synthetases
Aminoacyl-tRNA synthetases catalyze the aminoacylation of transfer RNA (tRNA). Native human tyrosine-tRNA synthetase is inactive as a cell-signaling molecule. However, under apoptotic conditions it is secreted and then cleaved by an extracellular protease, leukocyte elastase, into two distinct cytokines (Wakasugi K and Schimmel P, Science, 1999; 284(5411): 147-51). The N-terminal fragment, which contains the catalytic site, acts as an interleukin-8-like cytokine. The C-terminal domain is an endothelial monocyte-activating polypeptide II (EMAP II)-like cytokine. It has potent leukocyte and monocyte chemotactic activity and stimulates the production of myeloperoxidase and tumor necrosis factor alpha.
The putative cleavage site of this 528-residue protein is located at residue 360, but fragments cleaved at residue 344 are also active. The human genome can be searched with the known coding sequence of tyrosine-tRNA synthetase (GenBank record BC001933). This search finds a genomic clone (GenBank record AL356780) containing the human tyrosine-tRNA synthetase coding sequence interrupted by introns. Specifically, the sequences encoding amino acids 303-348 and 349-380 correspond to clone positions 98110-97970 and 96712-96615, respectively, numbered in the reverse orientation.
In this case, the cloned fragment between positions 98110 and 96615, together with other surrounding sequences available in the clone, can be used to target a regulatory unit to the region between positions 97970 and 96712. Depending on whether expression of the N-terminal (amino acids 1-348) or the C-terminal (amino acids 349-528) cytokine is desired, this regulatory unit terminates or initiates transcription and translation.
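Once each exon segment's CDS start and clone coordinates are known, translating residue numbers into coordinates of a clone numbered in the reverse orientation is simple bookkeeping. This is a minimal Python sketch. The segment table in the example is a hypothetical toy, not the AL356780 annotation, and the function names are illustrative. In the text's terms, the window between the segment ending at 97970 and the one starting at 96712 is where the regulatory unit would be placed.

```python
def cds_to_clone(cds_pos, segments):
    """Map a 1-based CDS nucleotide position to a clone coordinate.

    segments: (cds_first, clone_first, clone_last) per exon segment, in
    transcript order; clone_first > clone_last because the gene runs on
    the reverse strand of the clone.
    """
    for cds_first, clone_first, clone_last in segments:
        length = clone_first - clone_last + 1
        if cds_first <= cds_pos < cds_first + length:
            return clone_first - (cds_pos - cds_first)
    raise ValueError(f"CDS position {cds_pos} is not covered by any segment")


def codon_coordinates(residue, segments):
    """Clone coordinates of the three bases of codon `residue` (1-based)."""
    return [cds_to_clone(3 * residue - offset, segments) for offset in (2, 1, 0)]


def intron_window(segments, i):
    """Clone coordinates strictly between segment i and segment i + 1."""
    return segments[i][2] - 1, segments[i + 1][1] + 1


# Toy example: codons 1-3 in one exon, codons 4-6 in the next.
toy = [(1, 1000, 992), (10, 500, 492)]
print(codon_coordinates(3, toy))  # [994, 993, 992]
print(codon_coordinates(4, toy))  # [500, 499, 498]
print(intron_window(toy, 0))      # (991, 501)
```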
Antigen binding sites of immunoglobulin heavy chains
In some embodiments, the regulatory unit may comprise a sequence capable of terminating transcription and translation at a position corresponding to the 3' end of the functional protein domain. This approach can be used whenever the functional protein domain of interest is encoded by the first exon of the gene of interest and that gene is highly expressed, constitutively or after induction, in the cells in which the method is used.
An example is the antigen-binding site located at the N-terminus of an immunoglobulin molecule. In conventional antibodies, the antigen-binding site is formed mainly by the hypervariable loops of both the heavy-chain and light-chain variable regions. However, a functional antigen-binding site can also be formed by a heavy-chain variable region (VH) alone. This is the case in camelids, whose heavy-chain antibodies contain only two heavy chains and no light chains. Analysis of the amino acid differences between the VH domains of these heavy-chain-only camelid antibodies and conventional human VH domains has guided the design of alternative human heavy-chain variable regions. Such camelized human VH domains have been shown to be robust and efficient small recognition units formed by a single immunoglobulin (Ig) domain (Riechmann L. et al, J Immunol Methods, 231(1-2): 25-38, 10 December 1999; Davies J. et al, Biotechnology (NY), 13(5): 475-9, May 1995).
The exons encoding the VH domain of IgG are generated by the rearrangements and mutations that occur during B-cell development. When human myeloma cells are fused with B cells encoding antibodies with high affinity for an antigen, the resulting hybridoma cells actively transcribe and translate the complete IgG gene. These cells can also efficiently integrate foreign sequences by homologous recombination (Shulman MJ et al, Mol Cell Biol., 1990; 10(9): 4466-4472). If only the VH region is desired as the functional protein domain, the IgG gene can be modified by the method of the invention. This is done by integrating, through homologous recombination, a regulatory unit containing a transcription and translation termination module into the intron that follows the VH exon.

Claims (33)

1. A method for producing a protein that is a fragment of the primary translation product of a gene of interest, the fragment lying at the N- or C-terminus of the primary translation product of the gene, wherein said fragment has a biological activity distinct from the activity of the primary translation product, said method comprising:
(i) growing a host cell transfected with a DNA construct comprising:
(a) regulatory DNA capable of initiating transcription and translation of the DNA encoding the fragment or of terminating transcription and translation of the DNA encoding the fragment; and
(b) a DNA targeting region comprising a sequence homologous to a region of the gene of interest 5' or 3' to the sequence encoding the fragment, the construct being integrated into the host cell genomic DNA at a position determined by the DNA targeting region such that expression of the fragment is under the control of the regulatory DNA;
(ii) culturing the homologously recombined cells; and
(iii) collecting the protein,
wherein the fragment is selected from the group consisting of the extracellular binding domain of a membrane receptor, a proteolytic fragment of an extracellular structural matrix protein having anti-angiogenic properties, and a protein proteolytically released from an inactive precursor protein.
2. The method of claim 1, wherein said fragment is encoded by one or more exons, and wherein said DNA targeting region comprises a sequence homologous to a genomic region 5' or 3' to the exon(s) encoding the protein.
3. The method of claim 1 or 2, wherein:
(i) said fragment corresponds to the C-terminus of the primary translation product of the gene of interest and is encoded by at least the 3'-most exon of the gene of interest, which contains introns;
(ii) the DNA targeting region comprises a sequence homologous to a genomic region 5' of the exon(s) encoding the fragment;
(iii) following integration into the host cell genome by homologous recombination, the DNA construct is operably linked to the exon(s);
(iv) the DNA construct includes:
(a) a transcription module consisting of a DNA sequence capable of activating transcription of the DNA encoding the fragment; and
(b) a translation module consisting of a DNA sequence capable of initiating translation of said fragment; and
(v) the DNA construct optionally includes a splicing module comprising an unpaired 5' splice donor site complementary to an unpaired 3' splice acceptor site in the endogenous exon encoding the N-terminus of the protein, thereby enabling splicing of the primary transcript such that the translation module is juxtaposed in frame with the sequence encoding the protein.
4. The method of claim 3, wherein the translation module comprises a 5' untranslated region, and the translation initiation codon is contained in the exon encoding the N-terminus of the fragment.
5. The method of claim 3, wherein the translation module comprises one or more exogenous exons containing a 5' untranslated region and a translation initiation codon.
6. The method of claim 5, wherein said exogenous exon is juxtaposed in frame with the sequence encoding said fragment by a splicing template to become the N-terminal exon of said fragment.
7. The method of claim 5, wherein said exogenous exon is juxtaposed in frame with the sequence encoding said fragment by selecting a targeting sequence such that said exogenous exon is fused to the exon encoding the N-terminus of said fragment.
8. The method of any one of claims 5 to 7, wherein the exogenous exon encodes a signal peptide or the N-terminus of a signal peptide.
9. The method of claim 1 or 2, wherein:
(i) said fragment corresponds to the N-terminus of the primary translation product of the gene of interest and is encoded by at least the 5'-most exon of the gene of interest, which contains introns;
(ii) the DNA targeting region comprises a sequence homologous to a genomic region 3' of the exon(s) encoding the fragment;
(iii) following integration into the host cell genome by homologous recombination, the DNA construct is operably linked to the exon(s);
(iv) the DNA construct includes:
(a) a transcription module consisting of a DNA sequence capable of terminating transcription of the genomic DNA; and
(b) a translation module consisting of a DNA sequence capable of terminating translation of the fragment; and
(v) the DNA construct optionally includes a splicing module comprising an unpaired 3' splice acceptor site complementary to an unpaired 5' splice donor site in the endogenous exon encoding the C-terminus of the fragment, such that the primary transcript is spliced correctly and the translation module is juxtaposed in frame with the sequence encoding the fragment.
10. The method of claim 9, wherein the translation module includes a 3' untranslated region and a translation stop codon.
11. The method of claim 9, wherein the translation module comprises one or more exogenous exons containing a 3' untranslated region and a translation stop codon, which becomes the translation stop codon of the fragment when the module is spliced in frame with the sequence encoding the fragment.
12. The method of claim 5 or 11, wherein, when the exogenous exon is in frame with the exon encoding the fragment, the exogenous exon encodes a protein sequence that is heterologous to the protein sequence contained in the primary translation product encoded by the gene of interest.
13. The method of claim 5 or 11, wherein, when the exogenous exon is in frame with the exon encoding the fragment, the exogenous exon encodes a protein sequence that is homologous to the protein sequence contained in the primary translation product encoded by the gene of interest.
14. The method of claim 5 or 11, wherein the exogenous exon also encodes a recognition site for a proteolytic enzyme.
15. The method of claim 14, wherein the proteolytic enzyme is expressed by the host cell.
16. The method of claim 14, wherein the proteolytic enzyme is a commercially available protease.
17. The method of claim 1, wherein the targeting region comprises two targeting segments.
18. The method of claim 1, wherein the construct further comprises one or more selection and/or amplification marker genes.
19. The method of claim 1, wherein the fragment is the extracellular binding domain of a membrane receptor.
20. The method of claim 1, wherein the fragment is a proteolytic fragment of an extracellular structural matrix protein having anti-angiogenic properties.
21. The method of claim 1, wherein the fragment is a protein proteolytically released from an inactive precursor protein.
22. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is a regulatory factor.
23. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is a hormone.
24. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is a cytokine.
25. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is a lymphokine.
26. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is a chemokine.
27. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is a regulator of the growth and metabolism of other cells.
28. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is a membrane-bound protein.
29. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is a transmembrane protein.
30. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is an extracellular matrix protein.
31. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is an intracellular protein.
32. The method of claim 1, wherein the primary translation product of the gene of interest comprising the exon(s) encoding the protein is a nuclear protein.
33. The method of claim 1, wherein the fragment is selected from the group consisting of soluble tumor necrosis factor-related, inducible cytokines, endostatin, truncated thioredoxin, tyrosine-tRNA synthetase-derived cytokines and immunoglobulin antigen-binding sites.
HK04101655.0A, priority date 2000-08-01, filed 2001-08-01: Method of producing functional protein domains (HK1059281B)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GBGB0018876.3A GB0018876D0 (en) 2000-08-01 2000-08-01 Method of producing polypeptides
GB0018876.3 2000-08-01
PCT/GB2001/003455 WO2002010372A1 (en) 2000-08-01 2001-08-01 Method of producing functional protein domains

Publications (2)

Publication Number Publication Date
HK1059281A1 HK1059281A1 (en) 2004-06-25
HK1059281B HK1059281B (en) 2011-07-29
