HK1128490B

HK1128490B - Selection of host cells expressing protein at high levels

Info

Publication number: HK1128490B
Application number: HK09107655.2A
Authority: HK
Inventors: A. P. Otte; H. J. M. van Blokland; T. H. J. Kwaks; R. G. A. B. Sewalt
Original assignee: ChromaGenics B.V.
Priority date: 2006-02-21
Filing date: 2007-02-21
Publication date: 2013-07-05

Description

Selection of host cells expressing high levels of protein

Technical Field

The present invention relates to the fields of molecular biology and biotechnology. More specifically, the present invention relates to methods and means for improving the selection of host cells that express proteins at high levels.

Background

Proteins can be produced in a variety of host cells and are widely used in biology and biotechnology, for example as biopharmaceuticals. Eukaryotic host cells, and mammalian host cells in particular, are preferred for the expression of a variety of proteins, for example when such proteins have certain post-translational modifications such as glycosylation. Methods for such production are well established and generally require the expression of a nucleic acid encoding a protein of interest (also referred to as a "transgene") in a host cell. Typically, the transgene is introduced into precursor cells along with a selectable marker gene, the cells are selected for expression of the selectable marker gene, and one or more clones expressing high levels of the protein of interest are identified and used to express the protein of interest.

One problem with transgene expression is that it is unpredictable, due to the high probability that the transgene becomes inactive due to gene silencing (McBurney et al, 2002), and thus numerous host cell clones need to be tested in order to obtain high expression of the transgene.

Methods for selecting recombinant host cells that express relatively high levels of a desired protein are known, and some such methods are discussed in the introduction to WO2006/048459, which is incorporated herein by reference.

Certain advantageous prior art methods describe bicistronic expression vectors for the rapid and efficient production of stable mammalian cell lines expressing recombinant proteins. These vectors contain an Internal Ribosome Entry Site (IRES) between the upstream coding sequence of the protein of interest and the downstream coding sequence of the selectable marker (Rees et al, 1996). Such vectors are commercially available, for example the pIRES1 vector is available from Clontech (CLONTECHniques, October 1996). When such vectors are introduced into host cells, selection for sufficient expression of the downstream marker protein automatically selects for high transcript levels of the polycistronic mRNA, and thus can increase the likelihood of high expression of the protein of interest. Preferably, the IRES used in such a method gives a relatively low level of translation of the selectable marker gene, which further improves the chance of selecting host cells with high expression levels of the protein of interest by selecting for expression of the selectable marker protein (see, e.g., WO03/106684 and WO2006/005718).

The present invention aims to provide improved methods and means for selecting host cells that express high levels of a protein of interest.

Brief description of the invention

WO2006/048459, which was filed before the priority date of the present application but published thereafter, is incorporated herein by reference in its entirety. WO2006/048459 discloses a concept for selecting host cells that express a polypeptide of interest at high levels, which concept is referred to herein as "dependent translation". In this concept, a polycistronic transcription unit is used in which a sequence encoding a selectable marker polypeptide is located upstream of a sequence encoding a polypeptide of interest, wherein translation of the selectable marker polypeptide is impaired by mutations therein, whereas translation of the polypeptide of interest is very high (see, e.g., the schematic of figure 13 herein). The present invention provides alternative methods and means for selecting host cells that express high levels of a polypeptide.

In one aspect, the invention provides a DNA molecule comprising a polycistronic transcription unit encoding i) a polypeptide of interest and ii) a selectable marker polypeptide functional in a eukaryotic host cell, wherein the polypeptide of interest has a translation initiation sequence that is independent of the translation initiation sequence of the selectable marker polypeptide, wherein in the polycistronic transcription unit the coding sequence for the polypeptide of interest is upstream of the coding sequence for the selectable marker polypeptide, wherein an Internal Ribosome Entry Site (IRES) is present downstream of the coding sequence for the polypeptide of interest and upstream of the coding sequence for the selectable marker polypeptide, and wherein the nucleic acid sequence encoding the selectable marker polypeptide on the coding strand comprises a translation initiation sequence selected from the group consisting of: a) a GTG start codon; b) a TTG start codon; c) a CTG start codon; d) an ATT start codon; and e) an ACG start codon.

The translation initiation sequence of the selectable marker polypeptide in the coding strand comprises an initiation codon that differs from the ATG initiation codon, such as a GTG, TTG, CTG, ATT or ACG sequence, with the first two being most preferred. Such a non-ATG initiation codon is preferably flanked by sequences that give relatively good recognition of the non-ATG sequence as an initiation codon, such that at least some ribosomes start translation from it, i.e. the translation initiation sequence preferably comprises ACC[non-ATG initiation codon]G or GCC[non-ATG initiation codon]G.

In preferred embodiments, the selectable marker protein provides resistance to lethal and/or growth inhibitory effects of a selective agent, such as an antibiotic.

The invention further provides an expression cassette comprising a DNA molecule of the invention, the expression cassette further comprising a promoter upstream of the polycistronic expression unit, which is functional in a eukaryotic host cell and which can initiate transcription of the polycistronic expression unit, said expression cassette further comprising a transcription termination sequence downstream of the polycistronic expression unit.

In a preferred embodiment, such an expression cassette further comprises at least one chromatin control element selected from the group consisting of: a matrix or scaffold attachment region (MAR/SAR), an insulator sequence, a Universal Chromatin Opening Element (UCOE), and an anti-repressor (STAR) sequence. Preferred in this regard are anti-repressor sequences, which in certain embodiments are selected from the group consisting of: a) any one of SEQ ID No. 1 to SEQ ID No. 66; b) a fragment of any one of SEQ ID No. 1 to SEQ ID No. 66, wherein said fragment has anti-repressor activity; c) a sequence that is at least 70% identical in nucleotide sequence to a) or b), wherein said sequence has anti-repressor activity; and d) the complement of any one of a) to c).

The invention also provides host cells comprising the DNA molecules of the invention.

The invention further provides a method of producing a host cell expressing a polypeptide of interest, the method comprising: introducing a DNA molecule or expression cassette of the invention into a plurality of precursor host cells, culturing the cells under conditions that select for expression of the selectable marker polypeptide, and selecting at least one host cell that produces the polypeptide of interest.

In another aspect, the invention provides a method of producing a polypeptide of interest, the method comprising culturing a host cell comprising an expression cassette of the invention, and expressing the polypeptide of interest from the expression cassette. In a preferred embodiment, the polypeptide of interest is further isolated from the host cell and/or the host cell culture medium.

Brief Description of Drawings

FIG. 1. Results for expression constructs of the invention. The expression construct contains a sequence encoding a polypeptide of interest (here exemplified by d2EGFP) upstream of an IRES, which is upstream of a sequence encoding a selectable marker of the invention (here exemplified by a zeocin resistance gene having a TTG initiation codon (TTG Zeo) or, as a control, having its conventional ATG initiation codon (ATG Zeo)). See example 1 for details. The dots represent individual data points; straight lines represent mean expression levels; the constructs used are represented on the horizontal axis and depicted schematically above the diagram; the vertical axis represents the d2EGFP signal.

FIG. 2. Results with the tricistronic expression cassette with dhfr as a maintenance marker. The expression construct contains a zeocin selectable marker gene, which has a TTG initiation codon and lacks internal ATG sequences, upstream of the sequence encoding the polypeptide of interest (here exemplified by d2EGFP), which is further operably linked by an IRES to a downstream metabolic selectable marker dhfr gene (with ATG initiation codon). The dots represent individual data points (the vertical axis shows the GFP fluorescence signal of Zeo^R colonies); the straight lines indicate the mean expression level. The constructs used are shown above the graph, with conditions indicated on the horizontal axis (d: days). See example 2 for details.

FIG. 3. As FIG. 2, but the dhfr gene has a GTG start codon.

FIG. 4. As FIG. 2, but the dhfr gene has a TTG start codon.

FIG. 5. Copy number of clones with the dhfr enzyme (ATG start codon) under different conditions. See example 3 for details.

FIG. 6. As FIG. 5, but the dhfr gene has a GTG start codon.

FIG. 7. As FIG. 5, but the dhfr gene has a TTG start codon.

Detailed Description

In one aspect, the invention provides a DNA molecule according to claim 1. Such DNA molecules of the invention may be used to obtain eukaryotic host cells expressing high levels of a polypeptide of interest by selecting for expression of a selectable marker polypeptide. One or more host cells expressing the polypeptide of interest can then or concurrently be identified for further use in expressing high levels of the polypeptide of interest.

The term "monocistronic gene" is defined as a gene that provides an RNA molecule encoding a polypeptide. A "polycistronic transcription unit", also known as a polycistronic gene, is defined as a gene that can provide an RNA molecule encoding at least 2 polypeptides. The term "bicistronic gene" is defined as a gene that provides an RNA molecule encoding 2 polypeptides. Accordingly, the bicistronic gene is encompassed in the definition of the polycistronic gene. As used herein, a "polypeptide" comprises at least 5 amino acids joined by peptide bonds, and may be, for example, a protein or a portion of a protein, such as a subunit. In most cases, the terms polypeptide and protein are used interchangeably herein. "Gene" or "transcription unit" used in the present invention may comprise chromosomal DNA, cDNA, artificial DNA, a combination thereof, and the like. A transcription unit comprising several cistrons is transcribed as a single mRNA.

The polycistronic transcription unit of the present invention is preferably a bicistronic transcription unit, encoding from 5' to 3' the polypeptide of interest and the selectable marker polypeptide. Thus, the polypeptide of interest is encoded upstream of the coding sequence for the selectable marker polypeptide. The IRES is operably linked to the sequence encoding the selectable marker polypeptide, such that translation of the selectable marker polypeptide is dependent on the IRES.

Preferably, separate transcription units are used to express different polypeptides of interest, and the same applies when these form part of a multimeric protein (see, e.g., example 13 of WO2006/048459, incorporated herein by reference: the heavy and light chains of an antibody are each encoded by separate transcription units, each expression unit being a bicistronic expression unit).

The DNA molecule of the invention may exist in the form of a double-stranded DNA having a coding strand for the selectable marker polypeptide and the polypeptide of interest and a non-coding strand, the coding strand being the strand with the same sequence as the translated RNA except that U is replaced by T. Thus, the AUG initiation codon is encoded by the ATG sequence on the coding strand, and the strand containing this ATG sequence corresponding to the AUG initiation codon on the RNA is referred to as the coding strand of the DNA. It will be clear to the skilled person that the initiation codon or translation initiation sequence is actually present in the RNA molecule, but these can equally be considered on the DNA molecule encoding such an RNA molecule; thus, wherever reference is made herein to an initiation codon or translation initiation sequence, this is meant to include the corresponding DNA molecule having the same sequence as the RNA sequence but with a T instead of a U on the coding strand of the DNA molecule, and vice versa, unless otherwise indicated. In other words, the initiation codon is, for example, an AUG sequence in RNA, but the corresponding ATG sequence on the DNA coding strand is also referred to as the initiation codon in the present invention. The same usage applies to "in frame" coding sequences, meaning triplets (3 bases) in the RNA molecule that are translated into amino acids, but also interpreted as the corresponding trinucleotide sequences on the coding strand of the DNA molecule.
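As an illustration of this convention, the following minimal Python sketch converts between the coding strand of the DNA and the corresponding RNA, and splits a sequence into in-frame triplets. The function names and the example sequences are hypothetical and are not taken from the application.

```python
def coding_strand_to_rna(dna: str) -> str:
    """Return the RNA sequence that matches a DNA coding strand (T -> U)."""
    return dna.upper().replace("T", "U")


def rna_to_coding_strand(rna: str) -> str:
    """Return the DNA coding strand that matches an RNA sequence (U -> T)."""
    return rna.upper().replace("U", "T")


def codons_in_frame(seq: str, start: int = 0) -> list[str]:
    """Split a sequence into complete triplets, reading in frame from `start`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


# Hypothetical example: the ATG on the coding strand corresponds to AUG in the RNA.
print(coding_strand_to_rna("ATGGCC"))
print(codons_in_frame("ATGGCCTTGA"))
```

Because the two notations differ only in T versus U, a start codon or an in-frame triplet can be located on either molecule with the same index arithmetic.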

The selectable marker polypeptide and the polypeptide of interest encoded by the polycistronic gene each have their own translation initiation sequence and thus each have their own initiation codon (and stop codon), i.e., they are encoded by separate open reading frames.

The term "selection marker" or "selectable marker" typically refers to a gene and/or protein whose presence in a cell can be detected directly or indirectly, e.g., a polypeptide (e.g., an antibiotic resistance gene and/or protein) that inactivates a selective agent and protects the host cell from lethal or growth inhibitory effects of the selective agent. Another possibility is that the selectable marker induces fluorescence or a color deposit (e.g., Green Fluorescent Protein (GFP) and derivatives (e.g., d2EGFP), luciferase, lacZ, alkaline phosphatase, etc.), which can be used to select for cells expressing the polypeptide that induces the color deposit, e.g., using a Fluorescence Activated Cell Sorter (FACS) to select cells expressing GFP. Preferably, the selectable marker polypeptide of the present invention provides resistance to lethal and/or growth inhibitory effects of the selective agent. The selectable marker polypeptide is encoded by the DNA of the present invention. The selectable marker polypeptide of the present invention must be functional in eukaryotic host cells and therefore capable of being selected for in eukaryotic host cells. Any selectable marker polypeptide meeting such criteria may in principle be used in the present invention. Such selectable marker polypeptides are well known in the art and are routinely used to obtain eukaryotic host cell clones, several examples of which are provided herein. In certain embodiments, the selectable marker used in the present invention is zeocin. In other embodiments, blasticidin is used. Those skilled in the art will recognize that other selectable markers are available and may be used, such as neomycin, puromycin, bleomycin, hygromycin and the like. In other embodiments, kanamycin is used. In other embodiments, the DHFR gene is used as a selectable marker that can be selected for using methotrexate, particularly with increasing concentrations of methotrexate.
The DHFR gene can also be used to complement a dhfr deficiency, for example in CHO cells with a dhfr- phenotype, in a medium containing folate and lacking glycine, hypoxanthine and thymidine. Similarly, the Glutamine Synthetase (GS) gene can also be used, and selection can be performed in GS-deficient cells (e.g., NS-0 cells) by culturing in a medium without glutamine, or in cells with sufficient GS (e.g., CHO cells) by adding the GS inhibitor Methionine Sulfoximine (MSX). Other selectable marker genes and their selection agents that may be used are described, for example, in table 1 of U.S. patent No. 5,561,053, which is incorporated herein by reference; see also Kaufman, Methods in Enzymology, 185: 537-566 (1990). If the selectable marker polypeptide is dhfr, in an advantageous embodiment the host cell is cultured in a medium containing folate, which medium is substantially free of hypoxanthine and thymidine, and preferably also free of glycine.

When two polycistronic transcription units are selected in a single host cell according to the present invention, each preferably contains a coding sequence for a different selectable marker, allowing for selection of two polycistronic transcription units. Of course, two polycistronic transcription units may be present on one nucleic acid molecule or each on a separate nucleic acid molecule.

The term "selection" is typically defined as the process of using a selectable marker and a selection agent to identify host cells with specific genetic properties (e.g., that the host cell contains a transgene integrated into its genome). It is clear to the person skilled in the art that a combination of various selection markers is feasible. One particularly advantageous antibiotic is zeocin, because the zeocin-resistance protein (zeocin-R) acts by binding the drug and thereby rendering it harmless. It is therefore easy to titrate the amount of drug that kills cells expressing zeocin-R at low levels while allowing cells with high expression levels to survive. All other commonly used antibiotic resistance proteins are enzymes and therefore act catalytically (not 1:1 with the drug). Therefore, the antibiotic zeocin is a preferred selection marker. Another preferred selectable marker is the 5,6,7,8-tetrahydrofolate synthesizing enzyme (dhfr). However, the present invention is applicable to other selection markers.

The selectable marker polypeptide of the invention is a protein encoded by a nucleic acid of the invention, which polypeptide is functionally useful for selection, for example because it provides resistance to a selection agent such as an antibiotic. Thus, when an antibiotic is used as a selection agent, the DNA encodes a polypeptide that confers resistance to the selection agent, which is a selectable marker polypeptide. DNA sequences encoding such selectable marker polypeptides are known, and several examples of wild-type DNA sequences encoding selectable marker proteins are provided herein (e.g., FIGS. 26-32 of WO2006/048459, incorporated herein by reference). It is clear that mutants or derivatives of the selectable marker may also be suitable according to the present invention and are therefore included within the scope of the term "selectable marker polypeptide" as long as the selectable marker protein is functional.

For convenience, and as is also generally accepted by those skilled in the art, in many publications and herein, genes and proteins encoding resistance to a selection agent are often referred to as "selection agent (resistance) genes" or "selection agent (resistance) proteins", respectively, although the formal names may differ, e.g., the gene encoding the protein conferring resistance to neomycin (as well as to G418 and kanamycin) is often referred to as the neomycin (resistance) (or neo^r) gene, although its formal name is aminoglycoside 3'-phosphotransferase gene.

For the present invention, it is advantageous that the expression level of the selectable marker polypeptide is low, so that stringent selection can be performed. In the present invention, this is due to the use of a selectable marker coding sequence having a non-ATG initiation codon. In the selection, only those cells are selected which still have a sufficient level of the selectable marker polypeptide, meaning that these cells must have sufficient transcription of the polycistronic transcription unit and sufficient translation of the selectable marker polypeptide, which provides for selection of cells in which the polycistronic transcription unit has been integrated or is present at a location in the host cell where the expression level of this transcription unit is high.

The DNA molecules of the invention have the coding sequence for a selectable marker polypeptide downstream of the coding sequence for the polypeptide of interest. Thus, the polycistronic transcription unit comprises in the 5' to 3' direction (in both the transcribed strand of DNA and the resulting transcribed RNA) a sequence encoding a polypeptide of interest and a coding sequence for a selectable marker polypeptide. The IRES is located upstream of the coding sequence for the selectable marker polypeptide.
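The 5' to 3' arrangement described above can be pictured as an ordered list of elements. The following Python sketch is only an illustrative aid: the element labels and the function name are hypothetical, and the check covers nothing beyond the relative order of the three elements named in this paragraph.

```python
# Hypothetical labels for the three elements, listed in the described 5' to 3' order.
EXPECTED_ORDER = ("polypeptide_of_interest", "IRES", "selectable_marker")


def follows_described_order(elements: list[str]) -> bool:
    """Return True if all three elements are present in the described 5' to 3' order.

    Other elements (e.g. a promoter or terminator) may appear before, between
    or after them; only the relative order of the three labels is checked.
    """
    try:
        positions = [elements.index(name) for name in EXPECTED_ORDER]
    except ValueError:
        # At least one of the three elements is missing.
        return False
    return positions == sorted(positions)


cassette = ["promoter", "polypeptide_of_interest", "IRES",
            "selectable_marker", "terminator"]
print(follows_described_order(cassette))
```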

According to the present invention, the coding region of the gene of interest is preferably translated from the cap-dependent ORF and the polypeptide of interest is produced in large quantities. The selectable marker polypeptide is translated from an IRES. To reduce translation of the selectable marker cistron, according to the present invention, the nucleic acid sequence encoding the selectable marker polypeptide comprises a mutation in the initiation codon that reduces the efficiency of translation initiation of the selectable marker polypeptide in a eukaryotic host cell. Preferably, a GTG start codon or, more preferably, a TTG start codon is engineered into the selectable marker polypeptide coding sequence. The resulting translation efficiency is lower than that of the corresponding wild-type sequence in the same cell, i.e. the mutation results in less polypeptide per cell per time unit and thus less selection marker polypeptide.

The translation initiation sequence is often referred to in the art as a "Kozak sequence", and an optimal Kozak sequence is RCCATGG, in which ATG is the start codon and R is a purine, i.e., A or G (see Kozak M, 1986, 1987, 1989, 1990, 1997, 2002). Thus, in addition to the initiation codon itself, its context, particularly the nucleotides at positions -3 to -1 and +4, is also relevant, and an optimal translation initiation sequence contains an optimal initiation codon (i.e., ATG) in the optimal context (i.e., RCC directly before ATG, G directly after ATG). Translation by ribosomes is most efficient when an optimal Kozak sequence is present (see Kozak M, 1986, 1987, 1989, 1990, 1997, 2002). However, in a small number of cases, non-optimal translation initiation sequences are recognized by the ribosome and used to initiate translation. The present invention utilizes this principle, allowing the amount of translation, and hence of expression, of the selection marker polypeptide to be reduced or even fine-tuned, which can thus be used to increase the stringency of the selection system.

In the present invention, the ATG initiation codon of the selectable marker polypeptide is mutated to another codon that has been reported to provide some translation initiation, such as GTG, TTG, CTG, ATT or ACG (collectively referred to herein as "non-ATG initiation codons"). In a preferred embodiment, the ATG start codon is mutated to a GTG start codon. This provides a lower level of expression (lower translation) than an intact ATG start codon in a non-optimal context. More preferably, the ATG start codon is mutated to a TTG start codon, which provides an even lower level of expression of the selectable marker polypeptide than the GTG start codon (Kozak M, 1986, 1987, 1989, 1990, 1997, 2002; see also examples 9-13 of WO2006/048459, incorporated herein by reference). The use of a non-ATG initiation codon in the coding sequence for a selectable marker polypeptide in a polycistronic transcription unit of the present invention has not been disclosed or suggested in the prior art and, preferably in combination with chromatin control elements, results in very high levels of expression of the polypeptide of interest, as also shown in WO2006/048459, incorporated herein by reference.

When a non-ATG initiation codon is used according to the invention, it is strongly preferred to provide an optimal context for it, i.e. the non-ATG initiation codon is preferably directly preceded by the nucleotides RCC at positions -3 to -1 and directly followed by a G nucleotide (position +4). However, for the sequence TTTGTGG (in which GTG is the start codon) it has been reported that at least some initiation is observed in vitro, so although strongly preferred, it may not be absolutely necessary to provide an optimal context for the non-ATG start codon.
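The context rule described above (RCC at positions -3 to -1, G at position +4) can be checked mechanically on a coding-strand sequence. The following Python sketch shows one way to do so; the function name, the returned keys and the short test sequences other than TTTGTGG are hypothetical, and the check says nothing about actual translation efficiency.

```python
# Non-ATG start codons named in the text (coding-strand notation).
NON_ATG_START_CODONS = ("GTG", "TTG", "CTG", "ATT", "ACG")


def describe_start_site(seq: str, start: int) -> dict:
    """Describe the codon at index `start` and its surrounding context.

    `start` is the 0-based index of the first base of the putative start codon.
    """
    seq = seq.upper()
    codon = seq[start:start + 3]
    upstream = seq[max(0, start - 3):start]  # positions -3 to -1
    plus4 = seq[start + 3:start + 4]         # position +4
    optimal_context = (
        len(upstream) == 3
        and upstream[0] in "AG"              # R = purine (A or G)
        and upstream[1:] == "CC"
        and plus4 == "G"
    )
    return {
        "codon": codon,
        "non_atg_start": codon in NON_ATG_START_CODONS,
        "optimal_context": optimal_context,
    }


print(describe_start_site("ACCTTGG", 3))  # TTG in an RCC...G context
print(describe_start_site("TTTGTGG", 3))  # GTG preceded by TTT instead of RCC
```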

The ATG sequences within the coding sequence of the polypeptide (excluding the ATG start codon) are referred to as "internal ATGs" and if these are in frame with the ORF, thus encoding methionine, the methionine in the resulting polypeptide is referred to as "internal methionine". In the invention of WO2006/048459, the coding region encoding the selectable marker polypeptide (i.e. the sequence following the start codon, not necessarily including the start codon itself) does not contain any ATG sequence on the coding strand of the DNA up to (but not including) the start codon of the polypeptide of interest. WO2006/048459 discloses how to achieve this and how to test the functionality of the resulting selectable marker polypeptide. For the purposes of the present invention, where the selectable marker polypeptide coding sequence is located downstream of the IRES and downstream of the coding sequence for the polypeptide of interest, internal ATGs may be retained in the sequence encoding the selectable marker polypeptide.
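To make the notion of an internal ATG concrete, the following Python sketch lists every ATG triplet that follows the start codon of a coding-strand sequence and reports whether it is in frame. It assumes the input begins with its start codon at index 0; the function name and the example sequence are hypothetical.

```python
def internal_atgs(coding_seq: str) -> list[dict]:
    """List ATG triplets located after the start codon of a coding sequence.

    The sequence is assumed to begin with its start codon at index 0, so an
    ATG at a position divisible by 3 is in frame and would encode methionine.
    """
    seq = coding_seq.upper()
    hits = []
    pos = seq.find("ATG", 1)  # skip the start codon itself
    while pos != -1:
        hits.append({"position": pos, "in_frame": pos % 3 == 0})
        pos = seq.find("ATG", pos + 1)
    return hits


# Hypothetical sequence with a TTG start codon and two internal ATGs.
for hit in internal_atgs("TTGGCCATGAATGTAA"):
    print(hit)
```

An empty result corresponds to a coding region without internal ATG sequences, as described for WO2006/048459; a non-empty result is acceptable for a marker placed downstream of the IRES as described in this paragraph.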

Clearly, it is strongly preferred according to the invention that the translation initiation sequence of the polypeptide of interest comprises an optimal translation initiation sequence, i.e. has the consensus sequence RCCATGG (in which ATG is the start codon). This results in very efficient translation of the polypeptide of interest.

The stringency of selection can be increased by providing coding sequences for markers with different mutations, resulting in several levels of reduced translational efficiency. The selection system can thus be fine-tuned using the polycistronic transcription unit of the invention: for example, using GTG as the start codon for the selectable marker polypeptide, only a few ribosomes will start translation from this start codon, resulting in a low level of selectable marker protein, and thus high stringency of selection; the use of a TTG start codon increases the stringency of selection even further, as even fewer ribosomes will translate the selectable marker polypeptide from this start codon.

It is shown in WO2006/048459 incorporated herein by reference that the polycistronic expression units disclosed therein can be used in a very robust selection system, resulting in a very large proportion of clones expressing the polypeptide of interest at high levels as desired. In addition, the expression levels of the polypeptide of interest obtained are significantly higher than those of clones obtained when screening even larger numbers of colonies with the selection systems known so far.

In addition to reduced translation initiation efficiency, it may be beneficial to reduce the efficiency of translation elongation of the selectable marker polypeptide, for example by mutating its coding sequence such that it contains several codons that are non-preferred in the host cell, in order to further reduce the level of translation of the marker polypeptide and, if desired, allow even more stringent selection conditions. In certain embodiments, the selectable marker polypeptide further comprises a mutation that reduces the activity of the selectable marker polypeptide (compared to the wild-type) in addition to the mutation of the invention that reduces translation efficiency. This can be used to further increase the stringency of selection. By way of non-limiting example, the proline at position 9 of the zeocin resistance polypeptide may be mutated, for example, to Thr or Phe (see example 14 of WO2006/048459, incorporated herein by reference), and for neomycin resistance polypeptides, amino acid residues 182 or 261 or both may be further mutated (see, for example, WO 01/32901).

In some embodiments of the invention, a so-called spacer sequence is placed downstream of the sequence encoding the start codon of the selectable marker polypeptide, which spacer sequence is preferably in-frame with the start codon and encodes several amino acids, without secondary structure (Kozak, 1990). If secondary structure is present in the RNA of the selection marker polypeptide (e.g.for zeocin or for blasticidin), such a spacer sequence may be used to further reduce the translation initiation frequency (Kozak, 1990) and thus increase the stringency of the selection system of the invention (see example 14 of WO2006/048459, incorporated herein by reference).

It will be clear that any DNA molecule as described above but having a mutation in the sequence downstream of the first ATG sequence (start codon) encoding the selectable marker protein may also be used and is therefore also encompassed within the scope of the present invention, as long as the respective encoded selectable marker polypeptide is still active. For example, any silent mutation that does not alter the encoded protein due to redundancy of the genetic code is also contemplated. Other mutations that result in conservative amino acid mutations or other mutations are also contemplated, as long as the encoded protein is still active, which may or may not be less active than the wild-type protein encoded by the sequence. In particular, it is preferred that the encoded protein is at least 70%, more preferably at least 80%, more preferably at least 90%, more preferably at least 95% identical to the protein encoded by the respective representative sequence (e.g. as provided in SEQ ID No.68-80 of the sequence listing in this application). The activity of the selectable marker protein can be detected by conventional methods.

A preferred aspect of the invention provides an expression cassette comprising a DNA molecule of the invention having a polycistronic transcription unit. Such expression cassettes can be used to express a sequence of interest, for example in a host cell. As used herein, an "expression cassette" is a nucleic acid sequence comprising at least one promoter functionally linked to a sequence to be expressed. Preferably, the expression cassette further comprises transcription termination and polyadenylation sequences. Other regulatory sequences such as enhancers may also be included. Accordingly, the present invention provides an expression cassette comprising, in the following order: 5'-promoter-polycistronic transcription unit of the invention, encoding the polypeptide of interest and downstream thereof the selectable marker polypeptide-transcription termination sequence-3'. The promoter must be capable of functioning in a eukaryotic host cell, i.e., it must be capable of driving transcription of the polycistronic transcription unit. The promoter is thus operably linked to the polycistronic transcription unit. The expression cassette may optionally further contain other elements known in the art, such as splice sites to include introns and the like. In some embodiments, an intron is present after the promoter and before the sequence encoding the polypeptide of interest. The IRES is operably linked to the cistron containing the coding sequence for the selectable marker polypeptide. In further embodiments, a sequence encoding a second selectable marker is present in the polycistronic transcription unit (i.e., in these embodiments this is at least a tricistronic transcription unit).
In a preferred embodiment, the sequence encoding the second selectable marker polypeptide: a) having a translation initiation sequence separate from the translation initiation sequence of the polypeptide of interest, b) being located upstream of said sequence encoding the polypeptide of interest, c) having no ATG sequence in the coding chain between the start codon of said second selectable marker polypeptide and the start codon of the polypeptide of interest, and d) having a non-optimal translation initiation sequence, such as a GTG start codon or a TTG start codon. For such embodiments, a preferred selectable marker polypeptide is 5,6, 7, 8-tetrahydrofolate synthase (dhfr). This allows for a continuous selection of high levels of expression of the polypeptide of interest, as described in example 2.
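
Condition c) above lends itself to a simple sequence check. The following is a minimal sketch, assuming the coding strand is available as a plain DNA string and that the 0-based positions of the two start codons are known; the function name, the coordinate convention and the toy sequences in the usage note are hypothetical and not taken from this application.

```python
def internal_atg_positions(coding_strand: str, marker_start: int, poi_start: int) -> list[int]:
    """Return 0-based positions of every ATG, in any reading frame, that lies
    entirely between the marker start codon and the start codon of the
    polypeptide of interest (both start codons themselves are excluded)."""
    seq = coding_strand.upper()
    hits = []
    # Begin searching just after the marker's own three-nucleotide start codon.
    pos = seq.find("ATG", marker_start + 3)
    # Keep only ATGs that end at or before the first nucleotide of the POI start codon.
    while pos != -1 and pos + 3 <= poi_start:
        hits.append(pos)
        pos = seq.find("ATG", pos + 1)
    return hits
```

For a toy strand such as `GTGGCCAAGCTGACCTAAATGGCC`, with the marker start codon at position 0 and the start codon of the polypeptide of interest at position 18, the function returns an empty list, so condition c) would be met for that string.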

To obtain expression of a nucleic acid sequence encoding a protein, it is known to those skilled in the art that sequences capable of driving such expression can be functionally linked to the nucleic acid sequence encoding the protein to produce the recombinant nucleic acid sequence encoding the protein in an expressible form. In the present invention, the expression cassette comprises a polycistronic transcription unit. Typically, the promoter sequence is located upstream of the expressed sequence. Widely used expression vectors are available in the art, such as the pcDNA and pEF vector series of Invitrogen, pMSCV and pTK-Hyg of BD Sciences, pCMV-Script of Stratagene, and the like, which can be used to obtain appropriate promoter and/or transcription terminator sequences, polyA sequences, and the like.

The sequence encoding the polypeptide of interest is suitably inserted into a sequence controlling the transcription and translation of the encoded polypeptide, and the resulting expression cassette can be used to produce the polypeptide of interest, referred to as expression. Sequences that drive expression may include promoters, enhancers, and the like, as well as combinations thereof. These should be capable of functioning in the host cell, thus driving expression of those nucleic acid sequences to which they are functionally linked. One skilled in the art recognizes that different promoters can be used to obtain expression of a gene in a host cell. Promoters may be constitutive or regulated, and may be obtained from different sources, including viral, prokaryotic or eukaryotic sources, or artificially designed. Expression of the nucleic acid of interest may be initiated from a native promoter or derivative thereof or from a completely heterologous promoter (Kaufman, 2000). According to the present invention, strong promoters giving high transcription levels in selected eukaryotic cells are preferred. Suitable promoters are well known and available to those skilled in the art, some of which are described in WO2006/048459 (e.g., pages 28-29), incorporated herein by reference, including the CMV Immediate Early (IE) promoter (referred to herein as the CMV promoter) (e.g., obtained from pcDNA from Invitrogen) and many others.

In certain embodiments, the DNA molecule of the invention is part of a vector, such as a plasmid. Such vectors are readily manipulated by methods well known to those skilled in the art and can, for example, be designed to replicate in prokaryotic and/or eukaryotic cells. In addition, many vectors can be used to transform eukaryotic cells, either directly or as fragments of interest isolated therefrom, and integrate all or part of these into the genome of these cells, resulting in a stable host cell comprising the desired nucleic acid in its genome.

Conventional expression systems are DNA molecules in the form of recombinant plasmids or recombinant viral genomes. Plasmids or viral genomes are introduced into (eukaryotic) cells and preferably integrated into their genome by methods known in the art, some aspects of which are described in WO2006/048459 (e.g. pages 30-31), incorporated herein by reference.

It is widely recognized that chromatin structure and other epigenetic regulatory mechanisms can influence transgene expression in eukaryotic cells (e.g., Whitelaw et al, 2001). The polycistronic expression unit in this aspect forms part of a selection system with a stringent selection regime. This usually requires high transcription levels in the host cell of choice. In order to increase the chance of finding a host cell clone that survives the stringent selection regime, and possibly to increase the stability of expression in the resulting clones, it is generally preferred to increase the predictability of transcription. Thus, in a preferred embodiment, the expression cassette of the invention further comprises at least one chromatin control element. "Chromatin control element" is used herein as a generic term for DNA sequences which have an effect on the chromatin structure in eukaryotic cells and thus on the expression level and/or expression stability of a transgene in their vicinity (they act in "cis" and are therefore preferably located within 5 kb, more preferably 2 kb, more preferably 1 kb of the transgene). These elements are sometimes used to increase the number of clones with desired transgene expression levels. Several types of these elements that may be used in the present invention are described in WO2006/048459 (e.g., pages 32-34), incorporated herein by reference. For the purposes of the present invention, chromatin control elements are selected from the following: matrix or scaffold attachment regions (MAR/SAR), insulators such as the beta-globin insulator element (5' HS4 of the chicken beta-globin locus), scs, scs', and the like, Universal Chromatin Opening Elements (UCOE) and anti-repressor sequences (also known as "STAR" sequences).

Preferably, the chromatin control element is an anti-repressor sequence, preferably selected from the group consisting of: a) any one of SEQ. ID. NO. 1 to SEQ. ID. NO. 66; b) a fragment of any one of SEQ. ID. NO. 1 to SEQ. ID. NO. 66, wherein said fragment has anti-repressor activity ("functional fragment"); c) a sequence which is at least 70% identical in nucleotide sequence to a) or b), wherein said sequence has anti-repressor activity ("functional derivative"); d) the complement of any one of a) to c). Preferably, the chromatin control element is selected from the group consisting of: STAR67 (SEQ. ID. NO. 66), STAR7 (SEQ. ID. NO. 7), STAR9 (SEQ. ID. NO. 9), STAR17 (SEQ. ID. NO. 17), STAR27 (SEQ. ID. NO. 27), STAR29 (SEQ. ID. NO. 29), STAR43 (SEQ. ID. NO. 43), STAR44 (SEQ. ID. NO. 44), STAR45 (SEQ. ID. NO. 45), STAR47 (SEQ. ID. NO. 47), STAR61 (SEQ. ID. NO. 61), or a functional fragment or derivative of said STAR sequences. In a preferred embodiment, the STAR sequence is STAR67 (SEQ. ID. NO. 66) or a functional fragment or derivative thereof. In certain preferred embodiments, STAR67 or a functional fragment or derivative thereof is located upstream of a promoter that drives expression of the polycistronic transcription unit. In other preferred embodiments, the expression cassette of the invention is flanked on both sides by at least one anti-repressor sequence, for example by one of SEQ. ID. NO. 1 to SEQ. ID. NO. 65, preferably with the 3' end of each of these sequences facing the transcription unit. In certain embodiments, the expression cassettes of the invention comprise, in order from 5' to 3': anti-repressor sequence A-anti-repressor sequence B-[promoter-polycistronic transcription unit of the invention (encoding the polypeptide of interest and, downstream thereof, a functional selectable marker protein)-transcription termination sequence]-anti-repressor sequence C, wherein A, B and C may be the same or different.

Sequences having anti-repressor activity (anti-repressor sequences) and their characteristics, as well as functional fragments or derivatives thereof, as well as their structural and functional definitions, and methods of obtaining and using them (these sequences can be used in the present invention) are described in WO2006/048459 (e.g. pages 34-38), incorporated herein by reference.

For the production of multimeric proteins, two or more expression cassettes may be used. Preferably, both expression cassettes are polycistronic expression cassettes of the invention, each encoding a different selectable marker protein, such that selection for both expression cassettes is possible. This embodiment has been shown to give good results, for example for expression of antibody heavy and light chains. It will be clearly understood that the two expression cassettes may be located on one nucleic acid molecule or may be present on separate nucleic acid molecules prior to introduction into the host cell. The advantage of placing them on one nucleic acid molecule is that the two expression cassettes are present in a predetermined ratio (1:1) when introduced into the host cell. On the other hand, when they are present on two different nucleic acid molecules, the molar ratio of the two expression cassettes can be varied when they are introduced into the host cell, which is advantageous where the preferred molar ratio is not 1:1 or where the preferred molar ratio is not known beforehand, so that the person skilled in the art can easily vary the ratio and find the optimum empirically. According to the present invention, preferably at least one expression cassette, but more preferably each expression cassette, comprises a chromatin control element, more preferably an anti-repressor sequence.
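
When the two expression cassettes are on separate nucleic acid molecules, a chosen molar ratio translates into DNA masses in proportion to the length of each molecule. The sketch below assumes double-stranded DNA whose mass is proportional to its length in base pairs; the function name, the plasmid sizes and the total DNA amount are hypothetical values chosen for illustration and do not come from this application.

```python
def dna_masses_for_molar_ratio(total_mass_ug, length_a_bp, length_b_bp,
                               molar_a=1.0, molar_b=1.0):
    """Split a total DNA mass between two molecules so that they are present
    at the molar ratio molar_a : molar_b.

    Assumption: mass per mole is proportional to length in base pairs, so
    each molecule's share of the mass is (molar part x length).
    """
    if min(total_mass_ug, length_a_bp, length_b_bp, molar_a, molar_b) <= 0:
        raise ValueError("all inputs must be positive")
    weight_a = molar_a * length_a_bp
    weight_b = molar_b * length_b_bp
    mass_a = total_mass_ug * weight_a / (weight_a + weight_b)
    return mass_a, total_mass_ug - mass_a


# Hypothetical example: a 6000 bp and a 4000 bp plasmid at a 1:1 molar ratio.
mass_a, mass_b = dna_masses_for_molar_ratio(10.0, 6000, 4000)
```

Changing `molar_a` and `molar_b` is one way to set up the kind of empirical titration of the ratio described above.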

In another embodiment, different subunits or portions of the multimeric protein are present in one expression cassette.

Configurations of useful anti-repressor sets and expression cassettes have been described in WO2006/048459 (e.g., page 40), incorporated herein by reference.

In certain embodiments, the transcription unit or expression cassette of the invention provided further comprises a transcription pause (TRAP) sequence, substantially as described in WO2006/048459, pages 40-41, incorporated herein by reference. A non-limiting example of a TRAP sequence is given in seq.id No. 81. Other TRAP sequence examples, methods for their discovery and their use are described in WO 2004/055215.

DNA molecules comprising the polycistronic transcription unit and/or the expression cassette of the invention may be used to improve expression of the nucleic acid, preferably in a host cell. The terms "cell"/"host cell" and "cell line"/"host cell line" are typically defined as a cell and a homogeneous population thereof, respectively, that can be maintained in cell culture by methods known in the art and that have the ability to express heterologous or homologous proteins.

Several examples of host cells that can be used are described in WO2006/048459 (e.g., pages 41-42), incorporated herein by reference. Such cells include, for example, mammalian cells, including but not limited to CHO cells, e.g., CHO-K1, CHO-S, CHO-DG44, CHO-DUKXB11, including CHO cells having a dhfr⁻ phenotype, as well as myeloma cells (e.g., Sp2/0, NS0), HEK 293 cells, and PER.C6 cells.

These eukaryotic host cells can express the desired polypeptide and are often used for this purpose. They can be obtained by introducing a DNA molecule of the invention, preferably in the form of an expression cassette, into cells. Preferably, the expression cassette is integrated into the genome of the host cell. It can integrate at different sites in different host cells, and clones can be selected in which the transgene has integrated at a suitable site, resulting in host cell clones with the desired properties in terms of expression level, stability and growth characteristics. Alternatively, the polycistronic transcription unit may be targeted to, or randomly selected for, integration into a transcriptionally active chromosomal region, e.g., behind a promoter present in the genome. Selection of cells containing the DNA of the invention is carried out by selecting for the selectable marker polypeptide using conventional methods known to those skilled in the art. When such a polycistronic transcription unit is integrated behind a promoter in the genome, the expression cassette of the invention may be generated in situ, i.e., in the genome of the host cell.

Preferably, the host cells are obtained from stable clones selected and passaged according to standard methods known to those skilled in the art. If these cells contain the polycistronic transcription unit of the invention, a culture of such a clone will produce the polypeptide of interest.

Introduction of the expressed nucleic acid into the cell may be carried out by one of several methods, which are known to the person skilled in the art and depend on the form of the introduced nucleic acid. Such methods include, but are not limited to, transfection, infection, injection, transformation, and the like. Suitable host cells for expression of the polypeptide of interest may be obtained by selection.

In a preferred embodiment, the DNA molecule comprising the polycistronic transcription unit of the invention, preferably in the form of an expression cassette, is integrated into the genome of the eukaryotic host cell of the invention. This will provide stable inheritance of polycistronic transcription units.

Selection for the presence of the selectable marker polypeptide, and thus for expression, can be performed during the initial obtaining of the cells. In certain embodiments, the selective agent is present in the culture medium at least part of the time during the culturing process, either at a concentration sufficient to select for cells expressing the selectable marker polypeptide, or at a lower concentration. In a preferred embodiment, the selective agent is no longer present in the culture medium during the production phase, when the polypeptide is expressed.

The polypeptide of interest of the invention may be any protein, and may be a monomeric protein or (a part of) a multimeric protein. A multimeric protein comprises at least two polypeptide chains. Non-limiting examples of proteins of interest of the present invention are enzymes, hormones, immunoglobulin chains, therapeutic proteins such as anti-cancer proteins, blood coagulation proteins such as factor VIII, multifunctional proteins such as erythropoietin, diagnostic proteins, or proteins or fragments thereof for vaccination purposes, all of which are known to those skilled in the art.

In certain embodiments, the expression cassette of the invention encodes an immunoglobulin heavy or light chain or antigen-binding portion, derivative and/or analog thereof. In a preferred embodiment, a protein expression unit of the invention is provided, wherein the protein of interest is an immunoglobulin heavy chain. In another preferred embodiment, a protein expression unit of the invention is provided, wherein the protein of interest is an immunoglobulin light chain. When these two protein expression units are present in the same (host) cell, the multimeric protein, more specifically the immunoglobulin, is synthesized. Thus, in certain embodiments, the protein of interest is an immunoglobulin, e.g., an antibody, that is a multimeric protein. Preferably, such an antibody is a human or humanized antibody. In certain embodiments, it is an IgG, IgA, or IgM antibody. The immunoglobulin may be encoded by the heavy and light chains on different expression cassettes or on one expression cassette. Preferably, the heavy and light chains are present on different expression cassettes, each having its own promoter (which may be the same or different for both expression cassettes), each comprising a polycistronic transcription unit of the invention, the heavy and light chains being a polypeptide of interest, preferably each encoding a different selectable marker protein, such that the heavy and light chain expression cassettes can be selected when the expression cassettes are introduced into and/or present in a eukaryotic host cell.

The polypeptide of interest may be from any source, and in certain embodiments is a mammalian protein, an artificial protein (e.g., a fusion protein or a mutein), preferably a human protein.

Obviously, the expression cassette configuration of the invention can also be used when the ultimate goal is not to produce the polypeptide of interest but the RNA itself, e.g., to produce an increased amount of RNA from the expression cassette, which may be used for the purpose of regulating other genes (e.g., RNAi, antisense RNA), gene therapy, in vitro protein production, etc.

In one aspect, the invention provides a method for producing a host cell expressing a polypeptide of interest, the method comprising introducing a DNA molecule or expression cassette of the invention into a plurality of precursor cells, culturing the resulting cells under selective conditions, and selecting at least one host cell that produces the polypeptide of interest. The advantages of this new method are similar to those described for the alternative method in WO2006/048459 (e.g., pages 46-47), incorporated herein by reference.

Although clones with relatively low copy numbers of the polycistronic transcription unit and high expression levels can be obtained, the selection system of the present invention can nevertheless be combined with amplification methods to further improve expression levels. This can be achieved, for example, by amplifying a co-integrated dhfr gene using methotrexate, for example by placing dhfr on the same nucleic acid molecule as the polycistronic transcription unit of the invention, or by co-transfection when dhfr is on a different DNA molecule. The dhfr gene may also be part of a polycistronic expression unit of the invention.

The invention also provides a method of producing one or more polypeptides of interest, the method comprising culturing a host cell of the invention.

A cell is cultured to enable it to metabolize and/or grow and/or divide and/or produce the recombinant protein of interest. This can be accomplished by methods well known to those skilled in the art, including but not limited to providing nutrients to the cells. The methods comprise adherent growth, suspension growth, or a combination thereof. The cultivation may be carried out, for example, in a petri dish, a spinner flask or a bioreactor, using batch, fed-batch, or continuous systems such as perfusion systems, etc. In order to achieve large-scale (continuous) production of recombinant proteins by cell culture, it is preferred in the art to use cells which can be cultured in suspension, preferably cells which can be cultured in the absence of serum of animal or human origin or serum components of animal or human origin.

The conditions under which cells are grown or propagated (see, for example, Tissue Culture, Academic Press, Kruse and Paterson, editors (1973)) and the conditions under which the recombinant product is expressed are known to those skilled in the art. In general, the principles, procedures and operating techniques to maximize the productivity of mammalian cell cultures can be found in Mammalian Cell Biotechnology: A Practical Approach (M. Butler, ed., IRL Press, 1991).

In preferred embodiments, the expressed protein is collected (isolated) from the cell or from the culture medium or from both. It may be further purified by known methods such as filtration, column chromatography, etc. well known to those skilled in the art.

The selection methods of the invention work without chromatin control elements, but improved results are obtained when polycistronic expression units are provided with such elements. The selection method of the invention works particularly well when the expression cassette of the invention used comprises at least one anti-repressor sequence. Depending on the selection agent and conditions, the selection may in some cases be so stringent that only very few or even no host cells survive the selection unless an anti-repressor sequence is present. Thus, the combination of the new selection method and the anti-repressor sequence provides a very attractive way of obtaining a limited number of clones with a greatly improved chance of highly expressing the polypeptide of interest, while the obtained clones comprising expression cassettes with anti-repressor sequences provide stable expression of the polypeptide of interest, i.e. they are less prone to silencing or other expression-reducing mechanisms than conventional expression cassettes.

In one aspect, the present invention provides a polycistronic transcription unit having a different configuration than that disclosed in WO 2006/048459: in the configuration of the present invention, the sequence encoding the polypeptide of interest is located upstream of the sequence encoding the selectable marker polypeptide, and the latter is operably linked to a cap-independent translation initiation sequence, preferably an Internal Ribosome Entry Site (IRES). Such polycistronic transcription units are known as such (e.g., Rees et al, 1996, WO 03/106684), but have not been combined with a non-ATG initiation codon. According to this aspect of the invention, the initiation codon of the selectable marker polypeptide is changed to a non-ATG initiation codon to further reduce the translation initiation rate of the selectable marker. This results in the desired decrease in the expression level of the selectable marker polypeptide and may result in efficient selection of host cells that express high levels of the polypeptide of interest, as with the embodiments disclosed in WO 2006/048459. One possible benefit of this aspect of the invention compared to the embodiments of WO2006/048459 is that the coding sequence for the selectable marker polypeptide need not be further modified with respect to internal ATG sequences: any internal ATG sequences therein may remain unchanged, since they are no longer relevant for the translation of a downstream polypeptide. This is particularly advantageous when the coding sequence for the selectable marker polypeptide contains several internal ATG sequences, since for the present invention it is no longer necessary to alter these sequences and test the functionality of the resulting construct: in this case, it is sufficient to mutate only the ATG initiation codon. Example 1 below shows that this alternative provided by the present invention also produces very good results.
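
The point that only the ATG initiation codon needs to be mutated, while internal ATG sequences stay untouched, can be sketched in a few lines. This is an illustrative assumption about how such an edit might be represented on a sequence string, not a cloning protocol from this application; the function name and the toy coding sequence in the usage note are hypothetical, and the set of allowed codons is limited to the non-ATG codons named elsewhere in this description (GTG, TTG, CTG, ATT, ACG).

```python
NON_ATG_START_CODONS = {"GTG", "TTG", "CTG", "ATT", "ACG"}


def replace_start_codon(cds: str, new_start: str = "GTG") -> str:
    """Return the coding sequence with its ATG start codon replaced by a
    non-ATG start codon. Everything after the first three nucleotides,
    including any internal ATG, is left unchanged."""
    cds = cds.upper()
    new_start = new_start.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence is expected to begin with ATG")
    if new_start not in NON_ATG_START_CODONS:
        raise ValueError(f"unsupported start codon: {new_start}")
    return new_start + cds[3:]
```

For a toy sequence such as `ATGGCCATGAAGTAA`, calling `replace_start_codon(..., "TTG")` changes only the first three nucleotides; the internal ATG at position 6 is preserved.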

The coding sequence for the selectable marker polypeptide in the DNA molecule of the invention is translated under the control of an IRES, however the coding sequence for the polypeptide of interest is preferably translated in a cap-dependent manner. The coding sequence for the polypeptide of interest comprises a stop codon such that translation of the first cistron terminates upstream of the IRES operably linked to the second cistron.

It will be readily apparent to those skilled in the art upon reading the present disclosure that most parts of these polycistronic expression units can be advantageously varied in the same manner as for the polycistronic expression units having the coding sequences for the polypeptide of interest and the selectable marker polypeptide in the reverse order (i.e., the polycistronic transcription units of WO2006/048459, incorporated herein by reference). For example, the preferred initiation codon for the selectable marker polypeptide, the incorporation into an expression cassette, the host cell, the promoter, the presence of chromatin control elements, and the like, can be varied and used in preferred embodiments as described above. The use of these polycistronic expression units and expression cassettes is also as described above. Thus, this aspect is in effect an alternative to the methods and means described in WO2006/048459, the main differences being that the order of the polypeptides in the polycistronic expression unit is reversed and that the IRES is now essential for translation of the selectable marker polypeptide.

As used herein, "internal ribosome entry site" or "IRES" refers to an element that promotes direct internal ribosome entry to the initiation codon of a cistron (a protein coding region), which is usually ATG but in the present invention is preferably GTG or TTG, thereby leading to cap-independent translation of the gene. See, for example, Jackson R J, Howell M T, Kaminski A (1990) Trends Biochem Sci 15 (12): 477-83 and Jackson R J and Kaminski, A. (1995) RNA 1 (10): 985-1000. The present invention encompasses the use of any cap-independent translation initiation sequence, in particular any IRES element that can promote direct internal ribosome entry to the initiation codon of a cistron. As used herein, "under translation control of an IRES" means that translation is associated with the IRES and proceeds in a cap-independent manner. The term "IRES" as used herein includes functional variants of an IRES sequence, provided that such variants are capable of promoting direct internal ribosome entry to the initiation codon of a cistron. As used herein, "cistron" refers to a polynucleotide sequence or gene for a protein, polypeptide, or peptide of interest. "Operably linked" refers to a state in which the described components are in a relationship such that they can function in the intended manner. Thus, for example, a promoter is "operably linked" to a cistron in such a way that expression of the cistron is achieved under conditions compatible with the promoter. Similarly, the nucleotide sequence of an IRES is operably linked to a cistron in such a way that translation of the cistron is achieved under conditions compatible with the IRES.

Internal ribosome entry site (IRES) elements are known from viral and mammalian genes (Martinez-Salas, 1999), and have also been identified in screens of small synthetic oligonucleotides (Venkatesan & Dasgupta, 2001). The IRES of encephalomyocarditis virus has been analyzed in detail (Mizuguchi et al, 2000). An IRES is an element encoded in DNA that gives rise to a structure in the transcribed RNA at which eukaryotic ribosomes can bind and initiate translation. An IRES allows two or more proteins to be produced from a single RNA molecule (the first protein is translated by ribosomes bound to the RNA 5' end cap structure (Martinez-Salas, 1999)). Protein translation from IRES elements is less efficient than cap-dependent translation: the amount of protein obtained from IRES-dependent Open Reading Frames (ORFs) ranges from 20% to 50% less than the amount from the first ORF (Mizuguchi et al, 2000). The reduced efficiency of IRES-dependent translation provides an advantage that is exploited by this embodiment of the invention. Furthermore, mutations in IRES elements may attenuate their activity, reducing expression from the IRES-dependent ORF to less than 10% of that of the first ORF (Lopez de Quinto & Martinez-Salas, 1998, Rees et al, 1996). Thus, it is clear to one skilled in the art that an IRES can be altered without changing the essence of its function (that is, providing a protein translation initiation site with reduced translation efficiency), resulting in a modified IRES. The use of such modified IRESs that still provide a small percentage of translation (compared to 5' cap translation) is therefore also encompassed by the present invention. The present invention uses a non-ATG start codon to significantly further reduce the initiation of translation of the selectable marker ORF, thus further improving the chances of obtaining a preferred host cell, i.e., a host cell expressing high levels of the recombinant protein of interest.

U.S. Pat. Nos. 5,648,267 and 5,733,779 describe the use of a dominant selectable marker sequence with an attenuated consensus Kozak sequence ([Py]xxATG[Py], wherein [Py] is a pyrimidine nucleotide (i.e., C or T), x is any nucleotide (i.e., G, A, T or C), and ATG is the initiation codon). U.S. Pat. No. 6,107,477 describes the use of a non-optimal Kozak sequence (AGATCTTTATGGACC, wherein ATG is the initiation codon) for a selectable marker gene. None of these patents describes the use of a non-ATG initiation codon, nor do they provide any suggestion to do so. Further, they do not combine this with an IRES. Moreover, because the IRES itself already has reduced translation initiation compared to cap-dependent translation, it was not possible to predict prior to the present invention whether combining an IRES with a non-ATG initiation codon of a selection marker would provide sufficient translation to produce any selectable level of selection marker polypeptide. The present invention shows that it does, providing a surprisingly effective selection system.

The invention also provides a DNA molecule comprising a sequence encoding a selectable marker polypeptide operably linked to an IRES sequence, wherein the coding sequence encoding the selectable marker polypeptide comprises a translation initiation sequence selected from the group consisting of: a) a GTG start codon; b) a TTG start codon; c) a CTG start codon; d) an ATT start codon; and e) an ACG start codon.
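
The attenuated Kozak pattern cited above, a pyrimidine three nucleotides before the ATG and a pyrimidine directly after it, can be expressed as a regular expression. The sketch below is a literal reading of that pattern for illustration only; the function name, the 0-based indexing of the ATG and the toy sequence in the usage note are assumptions and do not come from the cited patents or from this application.

```python
import re

# [Py] x x ATG [Py]: C or T, two arbitrary nucleotides, ATG, then C or T.
ATTENUATED_KOZAK = re.compile(r"[CT][ACGT]{2}ATG[CT]")


def has_attenuated_kozak(sequence: str, atg_index: int) -> bool:
    """Check whether the ATG starting at atg_index (0-based) sits in a
    [Py]xxATG[Py] context. Returns False if the ATG is too close to either
    end of the sequence for the full seven-nucleotide window to exist."""
    seq = sequence.upper()
    if atg_index < 3 or atg_index + 4 > len(seq):
        return False
    window = seq[atg_index - 3:atg_index + 4]
    return ATTENUATED_KOZAK.fullmatch(window) is not None
```

For the toy sequence `CAAATGT`, with the ATG at index 3, the function returns `True`. None of this applies to the non-ATG initiation codons that are the subject of the present invention.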

Those skilled in the art will appreciate that further modifications to the present invention are possible, such as those described in US2006/0195935, particularly examples 20-27 thereof, which are incorporated herein by reference.

In certain embodiments, the mammalian 5,6, 7, 8-tetrahydrofolate synthetase dihydrofolate reductase (dhfr) can be made to have dhfr by removing hypoxanthine and thymidine from the culture medium (preferably glycine is also removed) and including folate in the culture medium (or (dihydrofolate)^-Phenotypic cells (e.g., CHO-DG44 cells) are used as selectable markers (Simonsen et al, 1988). The dhfr gene may for example be derived from a mouse genome or mouse cDNA and used in the present invention, it is preferably provided with a GTG or TTG start codon (see dhfr gene sequence of seq. In all of these embodiments, "removed from the medium" means that the medium is substantially free of the indicated component, meaning that there are insufficient indicated components present in the medium to sustain cell growth, such that when the genetic information for the indicated enzyme is expressed in the cell and the indicated precursor component is present in the mediumThen, a good selection can be made. For example, the indicated component is present at a concentration of less than 0.1% of the concentration it would normally be used in the culture medium of a certain cell type. Preferably, the indicated components are not present in the culture medium. Media without the indicated components can be prepared by standard methods by those skilled in the art or can be obtained from commercial media suppliers. One potential advantage of using these types of metabolic enzymes as selectable marker polypeptides is that they can be used to place polycistronic transcription units under conditions of sequential selection, which may result in higher expression of the polypeptide of interest.

In another aspect, the invention uses a dhfr metabolic selectable marker as an additional selectable marker on a polycistronic transcription unit of the invention. In such embodiments, selection of host cell clones with high expression is first established by using, for example, antibiotic selection markers such as zeocin, neomycin, etc., the coding sequences of these markers of the invention will have either a GTG or TTG start codon. After appropriate clonal selection, antibiotic selection is complete and continuous or intermittent selection using a metabolic enzyme selection marker can be performed by culturing the cells in a medium lacking the appropriate identified component described above and containing the appropriate precursor component described above. In this regard, the metabolic selection marker is operably linked to an IRES and may have its usual ATG component, and the initiation codon may be appropriately selected from GTG or TTG. In this regard, the polycistronic transcription unit is at least a tricistron.

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of immunology, molecular biology, microbiology, cell biology and recombinant DNA, which are within the knowledge of one of ordinary skill in the art. See Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, second edition, 1989; Current Protocols in Molecular Biology, Ausubel FM, et al, eds, 1987; the series Methods in Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach, MacPherson MJ, Hames BD, Taylor GR, eds, 1995; Antibodies: A Laboratory Manual, Harlow and Lane, eds, 1988.

The invention will be further elucidated in the following examples. The examples are not intended to limit the invention in any way. They are merely intended to illustrate the invention.

Examples

Example 1 describes a selection system with a polycistronic transcription unit of the present invention. It is clear that the variations described in examples 8-26 of WO2006/048459, which is incorporated herein by reference, can also be applied to and tested with a polycistronic transcription unit of the present invention. The same holds for examples 20-27 of US 2006/0195935.

Example 1: stringent selection by placing the modified Zeocin resistance gene after an IRES sequence

Examples 8-26 of WO2006/048459 (incorporated herein by reference in its entirety) have shown a selection system in which the sequence encoding the selectable marker protein on the polycistronic transcription unit is upstream of the sequence encoding the protein of interest, in which the translation start sequence of the selectable marker is non-optimal, and in which the remaining internal ATGs have been removed from the selectable marker coding sequence. This results in a high-stringency selection system. For example, a Zeo selection marker in which the translation initiation codon is changed to TTG shows very high selection stringency, as well as very high expression levels of the downstream encoded protein of interest.

In another possible selection system (i.e., the system of the present invention), a selection marker such as Zeo is placed downstream of the IRES sequence. This results in a polycistronic mRNA from which the Zeo gene product is translated via IRES-dependent initiation. In the usual d2EGFP-IRES-Zeo construct (i.e. one of the prior art, e.g. WO 2006/005718), the Zeo initiation codon is the optimal ATG. We tested whether altering the Zeo ATG initiation codon to, for example, TTG (referred to as IRES-TTG Zeo) results in increased selection stringency compared to the usual IRES-ATG Zeo.

Results

The constructs used are shown in FIG. 1. The control construct consisted of the CMV promoter, the d2EGFP gene, the IRES sequence (the sequence of the IRES used in this example (Rees et al, 1996) is GCCCCTCTCCCTCCCCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAATAAGGCCGGTGTGCGTTTGTCTATATGTGATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTCTTCTTGACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATGTCGTGAAGGAAGCAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAGCGACCCTTTGCAGGCAGCGGAACCCCCCACCTGGCGACAGGTGCCTCTGCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAACCCCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTCTCCTCAAGCGTATTCAACAAGGGGCTGAAGGATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCTGGGGCCTCGGTGCACATGCTTTACATGTGTTTAGTCGAGGTTAAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTTTCCTTTGAAAAACACGATGATAAGCTTGCCACAACCCCGGGATA; SEQ. ID. NO. 82) and the TTG Zeo selection marker, i.e. the zeocin resistance gene with a TTG start codon ('d2EGFP-IRES-TTG Zeo'). The other construct was identical, but additionally contained a combination of STAR7 and STAR67 upstream of the expression cassette and STAR7 downstream of the expression cassette ('STAR7/67 d2EGFP-IRES-TTG Zeo STAR7'). Both constructs were transfected into CHO-K1 cells, which were selected in medium containing 100 μg/ml Zeocin. 4 colonies appeared after transfection with the control construct, and 6 colonies appeared after transfection with the STAR-containing construct. These independent colonies were isolated and propagated before d2EGFP expression levels were analyzed. As shown in FIG. 1, incorporation of STAR elements in the construct resulted in colonies with high d2EGFP expression levels, whereas only one of the control colonies without STAR elements ('d2EGFP-IRES-TTG Zeo') showed some d2EGFP expression. The expression levels were also much higher than those obtained with other control constructs containing an IRES followed by a conventional Zeo with a standard ATG initiation codon, with or without STAR elements ('d2EGFP-IRES-ATG Zeo' and 'STAR7/67 d2EGFP-IRES-ATG Zeo STAR7'); the enhancing effect of STAR elements was also seen with these ATG Zeo constructs, but was small compared with that seen with the new TTG Zeo variants.

These results indicate that placing a Zeo selection marker with a TTG initiation codon downstream of the IRES sequence, in combination with STAR elements, works well and establishes a stringent selection system.

From these data, together with examples 8-26 of WO2006/048459 and examples 20-27 of US 2006/0195935, it is clear that the markers can be varied in the same way as in those examples. For example, a GTG start codon may be used instead of the TTG start codon, and the marker may be changed from Zeo to a different marker, e.g., Neo, Blas, dhfr, puro, etc., all with GTG or TTG as start codon. The STAR elements can be varied by using different STAR sequences or different permutations thereof, or they can be replaced with other chromatin control elements, such as MAR sequences. This leads to improvements over the prior art selection systems, which have an IRES followed by a marker with the usual ATG initiation codon.

As a non-limiting example, instead of the modified Zeo resistance gene (TTG Zeo), a modified neomycin resistance gene is placed downstream of the IRES sequence. The modification is the replacement of the ATG translation initiation codon of the Neo coding sequence with a TTG translation initiation codon, resulting in TTG Neo. CHO-K1 cells were transfected with CMV-d2EGFP-IRES-TTG Neo constructs with or without STAR elements. Colonies were picked, and the cells were propagated and tested for d2EGFP values. This system ('IRES-TTG Neo') resulted in an improvement over the known selection system with Neo having an ATG start codon downstream of an IRES ('IRES-ATG Neo'). This improvement is particularly evident when the TTG Neo construct contains STAR elements.
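The start codon substitution described above (ATG replaced by TTG or GTG at the first codon of the marker coding sequence) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the short sequence used in the example is a toy placeholder, not the actual Zeo, Neo or dhfr coding sequence.

```python
def replace_start_codon(coding_sequence, new_start="TTG"):
    """Return the coding sequence with its ATG start codon replaced by a
    non-optimal start codon (GTG or TTG). Only the first three
    nucleotides are changed; internal ATGs are left untouched."""
    seq = coding_sequence.upper()
    if new_start not in ("GTG", "TTG"):
        raise ValueError("new_start must be 'GTG' or 'TTG'")
    if not seq.startswith("ATG"):
        raise ValueError("coding sequence does not begin with ATG")
    return new_start + seq[3:]


# Toy open reading frame (hypothetical, for illustration only).
toy_orf = "ATGGCTAAACTGTAA"
print(replace_start_codon(toy_orf))         # TTG variant
print(replace_start_codon(toy_orf, "GTG"))  # GTG variant
```

The sketch deliberately leaves any internal ATGs in place, since only the initiation codon is changed in this modification.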

Example 2: stability of expression with the modified dhfr gene placed after an IRES sequence

Modifying the translation initiation codon of the Zeocin selection marker to a translation initiation codon that is used much less frequently than the usual ATG codon results in a high-stringency selection system. In the selection system described in WO2006/048459, TTG Zeo is placed upstream of the gene of interest. In another possible selection system, a Zeo selection marker is placed downstream of the IRES sequence (see example 1 herein). This results in a bicistronic mRNA from which the Zeo gene product is translated by initiation at the IRES sequence.

In this experiment, we combined the two systems. We placed a TTG selection marker upstream of the reporter gene and coupled a GTG- or TTG-modified metabolic marker to the reporter gene via an IRES. Different selectable marker genes can be used, such as the Zeocin and neomycin resistance genes and the dhfr gene. Here we placed the modified Zeocin resistance gene TTG Zeo (see WO 2006/048459) upstream of the gene of interest, and the dhfr selection gene downstream of the gene of interest, coupled to it by an IRES (FIG. 2). The purpose of this expression cassette was to select mammalian cell clones that produce high levels of protein, with selection on Zeocin applied first. The TTG Zeo-gene of interest configuration achieves this goal most efficiently. After this initial selection period, the properties of the dhfr protein were used to maintain high expression levels in the absence of the antibiotic Zeocin.

Active selection pressure appears to be beneficial in maintaining the protein expression levels of TTG Zeo selected colonies at the same high levels over a long period of time. This can be achieved, for example, by keeping a minimum amount of Zeocin in the culture medium, but this is not favoured in an industrial setting for economic or possible regulatory reasons (Zeocin is toxic and expensive).

Another approach is to couple the gene of interest to a selection marker that is an enzyme catalyzing one or more essential steps in a metabolic pathway. By essential is meant that the cell cannot synthesize specific essential metabolic components by itself, so that these components must be present in the culture medium for the cell to survive. Well-known examples are the essential amino acids, which cannot be synthesized by mammalian cells and must therefore be present in the culture medium. Another example relates to the dhfr gene for 5,6,7,8-tetrahydrofolate synthesis. The corresponding dhfr protein is an enzyme in the folate metabolic pathway. The dhfr protein specifically converts folate to 5,6,7,8-tetrahydrofolate, the methyl shuttle required for de novo synthesis of purines (hypoxanthine), thymidylate (thymidine), and the amino acid glycine. For this to work, the non-toxic substance folic acid must be present in the culture medium (Urlaub et al, 1980). Furthermore, the medium must be devoid of hypoxanthine and thymidine, because the requirement for the dhfr enzyme is bypassed when these components are available to the cell. CHO-DG44 cells lack the dhfr gene and therefore require glycine, hypoxanthine and thymidine in the culture medium to survive. However, if the end products glycine, hypoxanthine and thymidine are absent from the culture medium, folate is present, and the dhfr gene is provided on the expression cassette in the cell, the cell can convert folate to 5,6,7,8-tetrahydrofolate and survive in such a medium. This principle has been used for many years as a selection method for generating stably transfected mammalian cell lines.
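The selection logic of this paragraph can be summarized as a small Boolean model in Python. This is a deliberately simplified sketch: the function and parameter names are hypothetical, glycine is not modeled separately, and real cell survival is of course not a strict yes/no outcome.

```python
def cell_survives(dhfr_expressed, folate_present, ht_present):
    """Simplified survival rule for a dhfr-deficient host such as CHO-DG44.

    - If hypoxanthine and thymidine (HT) are supplied in the medium, the
      requirement for the dhfr enzyme is bypassed and the cell survives.
    - Otherwise the cell survives only if dhfr is expressed and folate is
      available as the precursor.
    """
    if ht_present:
        return True
    return dhfr_expressed and folate_present


# Untransfected dhfr-deficient cell in HT-free medium containing folate:
print(cell_survives(dhfr_expressed=False, folate_present=True, ht_present=False))
# Cell carrying the dhfr gene on the expression cassette, same medium:
print(cell_survives(dhfr_expressed=True, folate_present=True, ht_present=False))
```

In this model, removing HT from the medium is what makes survival depend on dhfr expression, which is the basis of the metabolic selection pressure.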

Here we apply this principle not to select stable clones initially (this is done with Zeocin), but to keep the cells under metabolic selection pressure. This has the advantage that very high protein expression can initially be achieved with the TTG Zeo selection system, and that these high expression levels can then be maintained without keeping Zeocin in the culture medium. Instead, Zeocin can be removed from the culture medium, with the absence of glycine, hypoxanthine and thymidine (GHT), or of hypoxanthine and thymidine (HT) alone, being sufficient to maintain a selection pressure high enough to ensure high levels of protein expression. Such a configuration requires the presence of two selectable markers, the Zeocin resistance gene and the dhfr gene, in the expression cassette. As described above, this is achieved efficiently when the two marker genes and the gene of interest are arranged so that a tricistronic mRNA is transcribed from a single promoter. When the modified Zeocin resistance gene (TTG Zeo) is located upstream of the d2EGFP gene, the dhfr gene needs to be coupled downstream of the d2EGFP gene by, for example, an IRES sequence (fig. 1).

Results

We generated constructs in which the TTG Zeo selection marker was located upstream of the d2EGFP reporter gene and the dhfr selection marker was located downstream of the d2EGFP gene, coupled by an IRES sequence (fig. 2). These constructs were flanked by STAR 7/67/7. Three versions of this construct were made: ATG dhfr, GTG dhfr and TTG dhfr, each name indicating the initiation codon of the dhfr gene. These constructs were transfected into CHO-DG44 cells. DNA transfection was performed using Lipofectamine 2000 (Invitrogen), and cells were grown in IMDM medium (Gibco) + 10% FBS (Gibco) + HT-supplement in the presence of 400 μg/ml Zeocin.

The average d2EGFP value of 14 TTG Zeo IRES ATG dhfr clones, measured in the presence of 400 μg/ml Zeocin, was 341 (day one). After measurement, the cells were split and cultured under three conditions:

(1) the medium contained 400 μg/ml Zeocin as well as hypoxanthine and thymidine (HT-supplement),

(2) the medium contained no Zeocin, but did contain HT-supplement,

(3) the medium contained neither Zeocin nor HT-supplement.

Briefly, in condition 1 the cells were under Zeocin selection pressure only, in condition 2 the cells were not under any selection pressure, and in condition 3 the cells were under dhfr selection pressure. Condition 3 requires continuous expression of the dhfr gene, so that dhfr protein is produced and the cells survive.

After 65 days we measured the d2EGFP values again. The average d2EGFP value of the TTG Zeo IRES ATG dhfr clones under Zeocin selection was now 159 (fig. 2). The average d2EGFP value of the TTG Zeo IRES ATG dhfr clones without Zeocin but with HT supplement was 20 (FIG. 2). The average d2EGFP value of the TTG Zeo IRES ATG dhfr clones without Zeocin selection and without HT supplement was 37 (FIG. 2). Overall we observed a decrease in d2EGFP values, which was most severe in the absence of Zeocin, whether or not HT supplement was present.

We performed the same experiment with the TTG Zeo IRES GTG dhfr construct. The average d2EGFP value of 15 TTG Zeo IRES GTG dhfr clones, measured in the presence of 400 μg/ml Zeocin, was 455 (day one) (FIG. 3). After measurement, the cells were split and further cultured under the three conditions described above. The d2EGFP values were measured again after 65 days. The average d2EGFP value of the TTG Zeo IRES GTG dhfr clones under Zeocin selection was now 356 (fig. 3). The average d2EGFP value of the TTG Zeo IRES GTG dhfr clones without Zeocin selection but with HT supplement was 39 (FIG. 3). The average d2EGFP value of the TTG Zeo IRES GTG dhfr clones without Zeocin selection and without HT supplement was 705 (FIG. 3).

In this case, we observed a decrease in d2EGFP values only in the absence of Zeocin but with HT supplement (condition 2). The d2EGFP values became very high in the absence of both Zeocin and HT supplement (condition 3). This may indicate that, because of the impaired translation frequency of the GTG dhfr mRNA, the expression level of the dhfr protein is low enough to result in very high selection stringency. This selection pressure, in the absence of any toxic substance, is high enough to maintain high protein expression levels over a long period and apparently even to improve these expression levels over time.

We tested the TTG Zeo IRES TTG dhfr construct in the same way. The average d2EGFP value of 18 TTG Zeo IRES TTG dhfr clones, measured in the presence of 400 μg/ml Zeocin, was 531 (day one) (FIG. 4). After measurement, the cells were split and further cultured under the three conditions described above. The d2EGFP values were measured again after 65 days. The average d2EGFP value of the TTG Zeo IRES TTG dhfr clones under Zeocin selection was now 324 (fig. 4). The average d2EGFP value of the TTG Zeo IRES TTG dhfr clones without Zeocin selection but with HT supplement was 33 (FIG. 4). The average d2EGFP value of the TTG Zeo IRES TTG dhfr clones without Zeocin selection and without HT supplement was 1124 (FIG. 4).

Again, we observed a decrease in d2EGFP values only in the absence of Zeocin but with HT supplement (condition 2). In the absence of both Zeocin and HT supplement (condition 3), the d2EGFP values became even higher than for the TTG Zeo IRES GTG dhfr construct. Because TTG variants are more stringent than GTG variants, the TTG dhfr variant is expected to give even less dhfr protein than the GTG dhfr variant. This increased selection pressure of the TTG dhfr variant is high enough to maintain high protein expression levels over a long period in the absence of any toxic substance, and apparently even improves protein expression levels over time.

The data show that coupling a non-ATG start codon variant of the dhfr gene to the d2EGFP gene by an IRES results in high and highly stable levels of d2EGFP expression in CHO-DG44 cells. This occurs when the medium contains neither Zeocin nor the essential metabolic end products. Preselection with Zeocin using the modified TTG Zeo selection marker allowed efficient establishment of colonies with high levels of d2EGFP expression. These high levels of d2EGFP expression can then be maintained, and even improved, simply by changing the medium (removing Zeocin and HT).
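The trend in the values reported in this example can be made explicit by computing, for each dhfr start codon variant, the ratio of the day-65 average d2EGFP value to the day-one average. The Python sketch below uses only the averages quoted in the text; the dictionary layout and function name are illustrative assumptions, and the ratios are a crude summary of clone averages rather than a statistical analysis.

```python
# Average d2EGFP values on day one (all clones under Zeocin selection).
DAY_ONE = {"ATG": 341, "GTG": 455, "TTG": 531}

# Average d2EGFP values after 65 days, keyed by culture condition:
# 1 = Zeocin + HT, 2 = no Zeocin + HT, 3 = no Zeocin, no HT.
DAY_65 = {
    "ATG": {1: 159, 2: 20, 3: 37},
    "GTG": {1: 356, 2: 39, 3: 705},
    "TTG": {1: 324, 2: 33, 3: 1124},
}


def fold_change(variant, condition):
    """Ratio of the day-65 average to the day-one average."""
    return DAY_65[variant][condition] / DAY_ONE[variant]


for variant in ("ATG", "GTG", "TTG"):
    ratios = ", ".join(
        f"condition {c}: {fold_change(variant, c):.2f}" for c in (1, 2, 3)
    )
    print(f"{variant} dhfr -> {ratios}")
```

On these figures, only the GTG and TTG dhfr variants under condition 3 show a ratio above 1, while condition 2 gives a ratio below 0.1 for all three variants.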

Example 3: increased expression of the modified dhfr gene after placement in the attenuated IRES sequence is not the result of gene amplification.

In the prior art, the use of the dhfr gene as a selectable marker generally relies on amplification of the dhfr gene. In this system a toxic agent, methotrexate, is used to amplify the dhfr gene together with the desired transgene, of which up to several thousand copies can be integrated into the CHO cell genome following such amplification. While these high copy numbers produce high expression levels, they are also considered a disadvantage, because so many copies may cause increased genomic instability, and subsequent removal of methotrexate from the culture medium results in rapid loss of many of the amplified loci.

In example 2, methotrexate was not used to inhibit dhfr enzyme activity. Only the hypoxanthine and thymidine precursors were removed from the culture medium, which was sufficient to stabilize protein expression and could even increase the expression level. We therefore examined whether the use of the dhfr enzyme in our setup leads to gene amplification.

Results

We isolated DNA from the clones described in example 2 on the same day that the d2EGFP values were measured (day 65). We used this DNA to determine the d2EGFP copy number.

The average d2EGFP copy number of the TTG Zeo IRES ATG dhfr clones under Zeocin selection was 86 (condition 1) (fig. 5). The average d2EGFP copy number of the TTG Zeo IRES ATG dhfr clones without Zeocin selection but with HT supplement was 53 (condition 2) (FIG. 5). The average d2EGFP copy number of the TTG Zeo IRES ATG dhfr clones without Zeocin selection and without HT supplement was 59 (condition 3) (FIG. 5).

The average d2EGFP copy number of the TTG Zeo IRES GTG dhfr clones under Zeocin selection was 23 (condition 1) (fig. 6). The average d2EGFP copy number of the TTG Zeo IRES GTG dhfr clones without Zeocin but with HT supplement was 14 (condition 2) (FIG. 6). The average d2EGFP copy number of the TTG Zeo IRES GTG dhfr clones without Zeocin selection and without HT supplement was 37 (condition 3) (FIG. 6).

The average d2EGFP copy number of the TTG Zeo IRES TTG dhfr clones under Zeocin selection was 33 (condition 1) (fig. 7). The average d2EGFP copy number of the TTG Zeo IRES TTG dhfr clones without Zeocin but with HT supplement was 26 (condition 2) (FIG. 7). The average d2EGFP copy number of the TTG Zeo IRES TTG dhfr clones without Zeocin selection and without HT supplement was 32 (condition 3) (FIG. 7).

In no case did we observe a significant increase in d2EGFP copy number following removal of the HT supplement, the condition that resulted in increased d2EGFP values for the GTG dhfr and TTG dhfr variants. The fact that the d2EGFP values remained stable over time, and even increased significantly with both constructs, must therefore be due to the action of the dhfr protein. No increase in d2EGFP copy number was observed in the TTG Zeo TTG dhfr clones, and only a slight increase was observed in the TTG Zeo GTG dhfr clones. Interestingly, in the lowest producers, the TTG Zeo ATG dhfr clones, the overall d2EGFP copy number was higher than in both other variants, although these clones did not maintain their original high d2EGFP fluorescence values (see example 2). We conclude from these data that the well-known gene amplification observed when the dhfr protein is used in combination with methotrexate plays no role in maintaining stable d2EGFP expression levels over time or in the observed increase in these expression levels. Instead, in both the GTG and TTG dhfr variants, it appears that more d2EGFP protein is expressed per d2EGFP gene copy.
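The conclusion that more d2EGFP is expressed per gene copy can be checked roughly by dividing the day-65 average d2EGFP value (example 2) by the average d2EGFP copy number for the same variant and condition. The Python sketch below uses only the averages quoted in the text; the names are illustrative assumptions, and because fluorescence values are in arbitrary units and both inputs are averages over clones, the ratios indicate a trend only.

```python
# Day-65 average d2EGFP values (example 2), keyed by condition:
# 1 = Zeocin + HT, 2 = no Zeocin + HT, 3 = no Zeocin, no HT.
D2EGFP_VALUE = {
    "ATG": {1: 159, 2: 20, 3: 37},
    "GTG": {1: 356, 2: 39, 3: 705},
    "TTG": {1: 324, 2: 33, 3: 1124},
}

# Average d2EGFP copy numbers determined on the same day (example 3).
D2EGFP_COPIES = {
    "ATG": {1: 86, 2: 53, 3: 59},
    "GTG": {1: 23, 2: 14, 3: 37},
    "TTG": {1: 33, 2: 26, 3: 32},
}


def value_per_copy(variant, condition):
    """Average d2EGFP value divided by average d2EGFP copy number."""
    return D2EGFP_VALUE[variant][condition] / D2EGFP_COPIES[variant][condition]


for variant in ("ATG", "GTG", "TTG"):
    for condition in (1, 2, 3):
        print(f"{variant} dhfr, condition {condition}: "
              f"{value_per_copy(variant, condition):.1f} per copy")
```

Under condition 3, the ratio for the ATG dhfr clones stays below 1, whereas it is roughly 19 for the GTG dhfr clones and roughly 35 for the TTG dhfr clones.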

We further analyzed the d2EGFP mRNA levels of the different clones under the different conditions described above and found that these mRNA levels generally follow the d2EGFP fluorescence values. We therefore conclude that the increase in d2EGFP fluorescence values is due to an increase in mRNA levels, rather than to a change in translation efficiency.

References

Kaufman, RJ. (2000) Overview of vector design for mammalian gene expression. Mol Biotechnol 16, 151-160.

Kozak, M. (1986) Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283-292.

Kozak, M. (1987) An analysis of 5′-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res 15, 8125-8148.

Kozak, M. (1989) Context effects and inefficient initiation at non-AUG codons in eucaryotic cell-free translation systems. Mol Cell Biol 9, 5073-5080.

Kozak, M. (1990) Downstream secondary structure facilitates recognition of initiator codons by eukaryotic ribosomes. Proc Natl Acad Sci USA 87, 8301-8305.

Kozak, M. (1997) Recognition of AUG and alternative initiator codons is augmented by G in position +4 but is not generally affected by the nucleotides in positions +5 and +6. EMBO J 16, 2482-2492.

Kozak, M. (2002) Pushing the limits of the scanning mechanism for initiation of translation. Gene 299, 1-34.

Lopez de Quinto, S, and Martinez-Salas, E. (1998) Parameters influencing translational efficiency in aphthovirus IRES-based bicistronic expression vectors. Gene 217, 51-6.

Martinez-Salas, E. (1999) Internal ribosome entry site biology and its use in expression vectors. Curr Opin Biotechnol 10, 458-64.

McBurney, MW, Mai, T, Yang, X, and Jardine, K. (2002) Evidence for repeat-induced gene silencing in cultured Mammalian cells: inactivation of tandem repeats of transfected genes. Exp Cell Res 274, 1-8.

Mizuguchi, H, Xu, Z, Ishii-Watabe, A, Uchida, E, and Hayakawa, T. (2000) IRES-dependent second gene expression is significantly lower than cap-dependent first gene expression in a bicistronic vector. Mol Ther 1, 376-82.

Rees, S, Coote, J, Stables, J, Goodson, S, Harris, S, and Lee, MG. (1996) Bicistronic vector for the creation of stable mammalian cell lines that predisposes all antibiotic-resistant cells to express recombinant protein. Biotechniques 20, 102-104, 106, 108-110.

Urlaub, G, and Chasin, LA. (1980) Isolation of Chinese hamster cell mutants deficient in dihydrofolate reductase activity. Proc Natl Acad Sci USA 77, 4216-20.

Venkatesan, A, and Dasgupta, A. (2001) Novel fluorescence-based screen to identify small synthetic internal ribosome entry site elements. Mol Cell Biol 21, 2826-37.

Whitelaw, E, Sutherland, H, Kearns, M, Morgan, H, Weaving, L, and Garrick, D. (2001) Epigenetic effects on transgene expression. Methods Mol Biol 158, 351-68.

Sequence listing

<110> ChromaGenics B.V.

<120> selection of host cells expressing proteins at high levels

<130>0117 A WO 01 ORD

<150>US11/359,953

<151>2006-02-21

<150>US11/269,525

<151>2005-11-07

<150>US60/626,301

<151>2004-11-08

<150>US60/696,610

<151>2005-07-05

<150>EP04105593.0

<151>2004-11-08

<160>82

<170>PatentIn version 3.3

<210>1

<211>749

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR1

<400>1

atgcggtggg ggcgcgccag agactcgtgg gatccttggc ttggatgttt ggatctttct 60

gagttgcctg tgccgcgaaa gacaggtaca tttctgatta ggcctgtgaa gcctcctgga 120

ggaccatctc attaagacga tggtattgga gggagagtca cagaaagaac tgtggcccct 180

ccctcactgc aaaacggaag tgattttatt ttaatgggag ttggaatatg tgagggctgc 240

aggaaccagt ctccctcctt cttggttgga aaagctgggg ctggcctcag agacaggttt 300

tttggccccg ctgggctggg cagtctagtc gaccctttgt agactgtgca cacccctaga 360

agagcaacta cccctataca ccaggctggc tcaagtgaaa ggggctctgg gctccagtct 420

ggaaaatctg gtgtcctggg gacctctggt cttgcttctc tcctcccctg cactggctct 480

gggtgcttat ctctgcagaa gcttctcgct agcaaaccca cattcagcgc cctgtagctg 540

aacacagcac aaaaagccct agagatcaaa agcattagta tgggcagttg agcgggaggt 600

gaatatttaa cgcttttgtt catcaataac tcgttggctt tgacctgtct gaacaagtcg 660

agcaataagg tgaaatgcag gtcacagcgt ctaacaaata tgaaaatgtg tatattcacc 720

ccggtctcca gccggcgcgc caggctccc 749

<210>2

<211>883

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR2

<400>2

gggtgcttcc tgaattcttc cctgagaagg atggtggccg gtaaggtccg tgtaggtggg 60

gtgcggctcc ccaggccccg gcccgtggtg gtggccgctg cccagcggcc cggcaccccc 120

atagtccatg gcgcccgagg cagcgtgggg gaggtgagtt agaccaaaga gggctggccc 180

ggagttgctc atgggctcca catagctgcc ccccacgaag acggggcttc cctgtatgtg 240

tggggtccca tagctgccgt tgccctgcag gccatgagcg tgcgggtcat agtcgggggt 300

gccccctgcg cccgcccctg ccgccgtgta gcgcttctgt gggggtggcg ggggtgcgca 360

gctgggcagg gacgcagggt aggaggcggg gggcagcccg taggtaccct gggggggctt 420

ggagaagggc gggggcgact ggggctcata cgggacgctg ttgaccagcg aatgcataga 480

gttcagatag ccaccggctc cggggggcac ggggctgcga cttggagact ggccccccga 540

tgacgttagc atgcccttgc ccttctgatc ctttttgtac ttcatgcggc gattctggaa 600

ccagatcttg atctggcgct cagtgaggtt cagcagattg gccatctcca cccggcgcgg 660

ccggcacagg tagcggttga agtggaactc tttctccagc tccaccagct gcgcgctcgt 720

gtaggccgtg cgcgcgcgct tggacgaagc ctgccccggc gggctcttgt cgccagcgca 780

gctttcgcct gcgaggacag agagaggaag agcggcgtca ggggctgccg cggccccgcc 840

cagcccctga cccagcccgg cccctccttc caccaggccc caa 883

<210>3

<211>2126

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR3

<400>3

atctcgagta ctgaaatagg agtaaatctg aagagcaaat aagatgagcc agaaaaccat 60

gaaaagaaca gggactacca gttgattcca caaggacatt cccaaggtga gaaggccata 120

tacctccact acctgaacca attctctgta tgcagattta gcaaggttat aaggtagcaa 180

aagattagac ccaagaaaat agagaacttc caatccagta aaaatcatag caaatttatt 240

gatgataaca attgtctcca aaggaacaag gcagagtcgt gctagcagag gaagcacgtg 300

agctgaaaac agccaaatct gctttgtttt catgacacag gagcataaag tacacaccac 360

caactgacct attaaggctg tggtaaaccg attcatagag agaggttcta aatacattgg 420

tccctcacag gcaaactgca gttcgctccg aacgtagtcc ctggaaattt gatgtccagt 480

atagaaaagc agagcagtca aaaaatatag ataaagctga accagatgtt gcctgggcaa 540

tgttagcagc accacactta agatataacc tcaggctgtg gactccctcc ctggggagcg 600

gtgctgccgg cggcgggcgg gctccgcaac tccccggctc tctcgcccgc cctcccgttc 660

tcctcgggcg gcggcggggg ccgggactgc gccgctcaca gcggcggctc ttctgcgccc 720

ggcctcggag gcagtggcgg tggcggccat ggcctcctgc gttcgccgat gtcagcattt 780

cgaactgagg gtcatctcct tgggactggt tagacagtgg gtgcagccca cggagggcga 840

gttgaagcag ggtggggtgt cacctccccc aggaagtcca gtgggtcagg gaactccctc 900

ccctagccaa gggaggccgt gagggactgt gcccggtgag agactgtgcc ctgaggaaag 960

gtgcactctg gcccagatac tacacttttc ccacggtctt caaaacccgc agaccaggag 1020

attccctcgg gttcctacac caccaggacc ctgggtttca accacaaaac cgggccattt 1080

gggcagacac ccagctagct gcaagagttg tttttttttt tatactcctg tggcacctgg 1140

aacgccagcg agagagcacc tttcactccc ctggaaaggg ggctgaaggc agggaccttt 1200

agctgcgggc tagggggttt ggggttgagt gggggagggg agagggaaaa ggcctcgtca 1260

ttggcgtcgt ctgcagccaa taaggctacg ctcctctgct gcgagtagac ccaatccttt 1320

cctagaggtg gagggggcgg gtaggtggaa gtagaggtgg cgcggtatct aggagagaga 1380

aaaagggctg gaccaatagg tgcccggaag aggcggaccc agcggtctgt tgattggtat 1440

tggcagtgga ccctcccccg gggtggtgcc ggaggggggg atgatgggtc gaggggtgtg 1500

tttatgtgga agcgagatga ccggcaggaa cctgccccaa tgggctgcag agtggttagt 1560

gagtgggtga cagacagacc cgtaggccaa cgggtggcct taagtgtctt tggtctcctc 1620

caatggagca gcggcggggc gggaccgcga ctcgggttta atgagactcc attgggctgt 1680

aatcagtgtc atgtcggatt catgtcaacg acaacaacag ggggacacaa aatggcggcg 1740

gcttagtcct acccctggcg gcggcggcag cggtggcgga ggcgacggca ctcctccagg 1800

cggcagccgc agtttctcag gcagcggcag cgcccccggc aggcgcggtg gcggtggcgc 1860

gcagccaggt ctgtcaccca ccccgcgcgt tcccaggggg aggagactgg gcgggagggg 1920

ggaacagacg gggggggatt caggggcttg cgacgcccct cccacaggcc tctgcgcgag 1980

ggtcaccgcg gggccgctcg gggtcaggct gcccctgagc gtgacggtag ggggcggggg 2040

aaaggggagg agggacaggc cccgcccctc ggcagggcct ctagggcaag ggggcggggc 2100

tcgaggagcg gaggggggcg gggcgg 2126

<210>4

<211>1625

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR4

<400>4

gatctgagtc atgttttaag gggaggattc ttttggctgc tgagttgaga ttaggttgag 60

ggtagtgaag gtaaaggcag tgagaccacg taggggtcat tgcagtaatc caggctggag 120

atgatggtgg ttcagttgga atagcagtgc atgtgctgta acaacctcag ctgggaagca 180

gtatatgtgg cgttatgacc tcagctggaa cagcaatgca tgtggtggtg taatgacccc 240

agctgggtag ggtgcatgtg gtgtaacgac ctcagctggg tagcagtgtg tgtgatgtaa 300

caacctcagc tgggtagcag tgtacttgat aaaatgttgg catactctag atttgttatg 360

agggtagtgc cattaaattt ctccacaaat tggttgtcac gtatgagtga aaagaggaag 420

tgatggaaga cttcagtgct tttggcctga ataaatagaa gacgtcattt ccagttaatg 480

gagacaggga agactaaagg tagggtggga ttcagtagag caggtgttca gttttgaata 540

tgatgaactc tgagagagga aaaacttttt ctacctctta gtttttgtga ctggacttaa 600

gaattaaagt gacataagac agagtaacaa gacaaaaata tgcgaggtta tttaatattt 660

ttacttgcag aggggaatct tcaaaagaaa aatgaagacc caaagaagcc attagggtca 720

aaagctcata tgccttttta agtagaaaat gataaatttt aacaatgtga gaagacaaag 780

gtgtttgagc tgagggcaat aaattgtggg acagtgatta agaaatatat gggggaaatg 840

aaatgataag ttattttagt agatttattc ttcatatcta ttttggcttc aacttccagt 900

ctctagtgat aagaatgttc ttctcttcct ggtacagaga gagcaccttt ctcatgggaa 960

attttatgac cttgctgtaa gtagaaaggg gaagatcgat ctcctgtttc ccagcatcag 1020

gatgcaaaca tttccctcca ttccagttct caaccccatg gctgggcctc atggcattcc 1080

agcatcgcta tgagtgcacc tttcctgcag gctgcctcgg gtagctggtg cactgctagg 1140

tcagtctatg tgaccaggag ctgggcctct gggcaatgcc agttggcagc ccccatccct 1200

ccactgctgg gggcctccta tccagaaggg cttggtgtgc agaacgatgg tgcaccatca 1260

tcattcccca cttgccatct ttcaggggac agccagctgc tttgggcgcg gcaaaaaaca 1320

cccaactcac tcctcttcag gggcctctgg tctgatgcca ccacaggaca tccttgagtg 1380

ctgggcagtc tgaggacagg gaaggagtga tgaccacaaa acaggaatgg cagcagcagt 1440

gacaggagga agtcaaaggc ttgtgtgtcc tggccctgct gagggctggc gagggccctg 1500

ggatggcgct cagtgcctgg tcggctgcaa gaggccagcc ctctgcccat gaggggagct 1560

ggcagtgacc aagctgcact gccctggtgg tgcatttcct gccccactct ttccttctaa 1620

gatcc 1625

<210>5

<211>1571

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR5

<400>5

cacctgattt aaatgatctg tctggtgagc tcactgggtc tttactcgca tgctgggtcc 60

acagctccac tgtcctgcag ggtccgtgag tgtgggcccc ttatctattt catcatcata 120

accctgcgtg tcctcaactc ctggcacata ttgggtggcc ccatccacac acggttgttg 180

agtgaatcca tgagatgaca aaggctatga tgtagactat atcatgagcc agaaccaggc 240

tttcctacct ccagacaatc aagggccttg atttgggatt gagggagaaa ggagtagaag 300

ccaggaagga gaagagattg aggtttacca agggtgcaaa gtcctggccc ctgactgtag 360

gctgaaaact atagaaatga tagaacaatt ttgcaatgaa atgcagaaga ccctgcatca 420

actttaggtg ggacttcggg tatttttatg gccacagaac atcctcccat ttacctgcat 480

ggcccagaca cagacttcaa aacagttgag gccagcaggc tccaggtaag tggtaggatt 540

ccagaatgcc ctcagagtgt tgtgggaggc agcaggcgat tttcctggac ttctgagttt 600

atgagaaccc caaaccccaa ttggcattaa cattgaggtc tcaatgtatc atggcaggaa 660

gcttccgagt ggtgaaaagg aaagtgaaca tcaaagctcg gaagacaaga gggtggagtg 720

atggcaacca agagcaagac ccttccctct cctgtgatgg ggtggctcta tgtgaagccc 780

ccaaactgga cacaggtctg gcagaatgag gaacccactg agatttagcg ccaacatcca 840

gcataaaagg gagactgaca tagaatttga gttagttaaa aataaggcac aatgcttttc 900

atgtattcct gagttttgtg gactggtgtt caatttgcag cattcttagt tgattaaatc 960

tgagatgaag aaagagtgtc caacactttc accttggaaa gctctggaaa agcaaaaggg 1020

agagacaatt agcttcatcc attaactcac ttagtcatta tgcattcatt catgtaacta 1080

ccaaacacgt actgagtgcc taacactcct gagacactga gaagtttctt gggaatacaa 1140

agatgaataa aaaccacgcc aggcaggagt tggaggaagg ttctggatgc caccacgctc 1200

tacctcctgg ctggacacca ggcaatgttg gtaaccttct gcctccaatt tctgcaaata 1260

cataattaat aaacacaagg ttatcttcta aacagttctt aaaatgagtc aactttgttt 1320

aaacttgttc tttttagaga aaaatgtatt tttgaaagag ttggttagtg ctaggggaaa 1380

tgtctgggca cagctcagtc tggtgtgaga gcaggaagca gctctgtgtg tctggggtgg 1440

gtacgtatgt aggacctgtg ggagaccagg ttgggggaag gcccctcctc atcaagggct 1500

cctttgcttt ggtttgcttt ggcgtgggag gtgctgtgcc acaagggaat acgggaaata 1560

agatctctgc t 1571

<210>6

<211>1173

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR6

<400>6

tgacccacca cagacatccc ctctggcctc ctgagtggtt tcttcagcac agcttccaga 60

gccaaattaa acgttcactc tatgtctata gacaaaaagg gttttgacta aactctgtgt 120

tttagagagg gagttaaatg ctgttaactt tttaggggtg ggcgagaggg atgacaaata 180

acaacttgtc tgaatgtttt acatttctcc ccactgcctc aagaaggttc acaacgaggt 240

catccatgat aaggagtaag acctcccagc cggactgtcc ctcggccccc agaggacact 300

ccacagagat atgctaactg gacttggaga ctggctcaca ctccagagaa aagcatggag 360

cacgagcgca cagagcaggg ccaaggtccc agggacagaa tgtctaggag ggagattggg 420

gtgagggtaa tctgatgcaa ttactgtggc agctcaacat tcaagggagg gggaagaaag 480

aaacagtccc tgtcaagtaa gttgtgcagc agagatggta agctccaaaa tttgaaactt 540

tggctgctgg aaagttttag ggggcagaga taagaagaca taagagactt tgagggttta 600

ctacacacta gacgctctat gcatttattt atttattatc tcttatttat tactttgtat 660

aactcttata ataatcttat gaaaacggaa accctcatat acccatttta cagatgagaa 720

aagtgacaat tttgagagca tagctaagaa tagctagtaa gtaaaggagc tgggacctaa 780

accaaaccct atctcaccag agtacacact cttttttttt ttccagtgta atttttttta 840

atttttattt tactttaagt tctgggatac atgtgcagaa ggtatggttt gttacatagg 900

tatatgtgtg ccatagtgga ttgctgcacc tatcaacccg tcatctaggt ttaagcccca 960

catgcattag ctatttgtcc tgatgctctc cctcccctcc ccacaccaga caggccttgg 1020

tgtgtgatgt tcccctccct gtgtccatgt gttctcactg ttcagctccc acttatgagt 1080

gagaacgtgt ggtatttggt tttctgttcc tgtgttagtt tgctgaggat gatggcttcc 1140

agcttcatcc atgtccctgc aaaggacacg atc 1173

<210>7

<211>2101

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR7

<400>7

aggtgggtgg atcacccgag gtcaggagtt caagaccagc ctggccaaca tggtaaaacc 60

tcgtctctac taaaaaatac gaaaaattag ctggttgtgg tggtgcgtgc ttgtaatccc 120

agctactcgg gaggctgagg caggagaatc acttgaatct gggaggcaga ggttgcagtg 180

agctgagata gtgccattgc actccagcct gggcaacaga cggagactct gtctccaaaa 240

aaaaaaaaaa aaatcttaga ggacaagaat ggctctctca aacttttgaa gaaagaataa 300

ataaattatg cagttctaga agaagtaatg gggatatagg tgcagctcat gatgaggaag 360

acttagctta actttcataa tgcatctgtc tggcctaaga cgtggtgagc tttttatgtc 420

tgaaaacatt ccaatataga atgataataa taatcacttc tgacccccct tttttttcct 480

ctccctagac tgtgaagcag aaaccccata tttttcttag ggaagtggct acgcactttg 540

tatttatatt aacaactacc ttatcaggaa attcatattg ttgccctttt atggatgggg 600

aaactggaca agtgacagag caaaatccaa acacagctgg ggatttccct cttttagatg 660

atgattttaa aagaatgctg ccagagagat tcttgcagtg ttggaggaca tatatgacct 720

ttaagatatt ttccagctca gagatgctat gaatgtatcc tgagtgcatg gatggacctc 780

agttttgcag attctgtagc ttatacaatt tggtggtttt ctttagaaga aaataacaca 840

tttataaata ttaaaatagg cccaagacct tacaagggca ttcatacaaa tgagaggctc 900

tgaagtttga gtttgttcac tttctagtta attatctcct gcctgtttgt cataaatgcg 960

tttagtaggg agctgctaat gacaggttcc tccaacagag tgtggaagaa ggagatgaca 1020

gctggcttcc cctctgggac agcctcagag ctagtgggga aactatgtta gcagagtgat 1080

gcagtgacca agaaaatagc actaggagaa agctggtcca tgagcagctg gtgagaaaag 1140

gggtggtaat catgtatgcc ctttcctgtt ttatttttta ttgggtttcc ttttgcctct 1200

caattccttc tgacaataca aaatgttggt tggaacatgg agcacctgga agtctggttc 1260

attttctctc agtctcttga tgttctctcg ggttcactgc ctattgttct cagttctaca 1320

cttgagcaat ctcctcaata gctaaagctt ccacaatgca gattttgtga tgacaaattc 1380

agcatcaccc agcagaactt aggttttttt ctgtcctccg tttcctgacc tttttcttct 1440

gagtgcttta tgtcacctcg tgaaccatcc tttccttagt catctaccta gcagtcctga 1500

ttcttttgac ttgtctccct acaccacaat aaatcactaa ttactatgga ttcaatccct 1560

aaaatttgca caaacttgca aatagattac gggttgaaac ttagagattt caaacttgag 1620

aaaaaagttt aaatcaagaa aaatgacctt taccttgaga gtagaggcaa tgtcatttcc 1680

aggaataatt ataataatat tgtgtttaat atttgtatgt aacatttgaa taccttcaat 1740

gttcttattt gtgttatttt aatctcttga tgttactaac tcatttggta gggaagaaaa 1800

catgctaaaa taggcatgag tgtcttatta aatgtgacaa gtgaatagat ggcagaaggt 1860

ggattcatat tcagttttcc atcaccctgg aaatcatgcg gagatgattt ctgcttgcaa 1920

ataaaactaa cccaatgagg ggaacagctg ttcttaggtg aaaacaaaac aaacacgcca 1980

aaaaccttta ttctctttat tatgaatcaa atttttcctc tcagataatt gttttattta 2040

tttattttta ttattattgt tattatgtcc agtctcactc tgtcgcctaa gctggcatga 2100

t 2101

<210>8

<211>1821

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR8

<400>8

gagatcacct cgaagagagt ctaacgtccg taggaacgct ctcgggttca caaggattga 60

ccgaacccca ggatacgtcg ctctccatct gaggcttgct ccaaatggcc ctccactatt 120

ccaggcacgt gggtgtctcc cctaactctc cctgctctcc tgagcccatg ctgcctatca 180

cccatcggtg caggtccttt ctgaagagct cgggtggatt ctctccatcc cacttccttt 240

cccaagaaag aagccaccgt tccaagacac ccaatgggac attccccttc cacctccttc 300

tccaaagttg cccaggtgtt catcacaggt tagggagaga agcccccagg tttcagttac 360

aaggcatagg acgctggcat gaacacacac acacacacac acacacacac acacacacac 420

acacgactcg aagaggtagc cacaagggtc attaaacact tgacgactgt tttccaaaaa 480

cgtggatgca gttcatccac gccaaagcca agggtgcaaa gcaaacacgg aatggtggag 540

agattccaga ggctcaccaa accctctcag gaatattttc ctgaccctgg gggcagaggt 600

tggaaacatt gaggacattt cttgggacac acggagaagc tgaccgacca ggcattttcc 660

tttccactgc aaatgaccta tggcgggggc atttcacttt cccctgcaaa tcacctatgg 720

cgaggtacct ccccaagccc ccacccccac ttccgcgaat cggcatggct cggcctctat 780

ccgggtgtca ctccaggtag gcttctcaac gctctcggct caaagaagga caatcacagg 840

tccaagccca aagcccacac ctcttccttt tgttataccc acagaagtta gagaaaacgc 900

cacactttga gacaaattaa gagtccttta tttaagccgg cggccaaaga gatggctaac 960

gctcaaaatt ctctgggccc cgaggaaggg gcttgactaa cttctatacc ttggtttagg 1020

aaggggaggg gaactcaaat gcggtaattc tacagaagta aaaacatgca ggaatcaaaa 1080

gaagcaaatg gttatagaga gataaacagt tttaaaaggc aaatggttac aaaaggcaac 1140

ggtaccaggt gcggggctct aaatccttca tgacacttag atataggtgc tatgctggac 1200

acgaactcaa ggctttatgt tgttatctct tcgagaaaaa tcctgggaac ttcatgcact 1260

gtttgtgcca gtatcttatc agttgattgg gctcccttga aatgctgagt atctgcttac 1320

acaggtcaac tccttgcgga agggggttgg gtaaggagcc cttcgtgtct cgtaaattaa 1380

ggggtcgatt ggagtttgtc cagcattccc agctacagag agccttattt acatgagaag 1440

caaggctagg tgattaaaga gaccaacagg gaagattcaa agtagcgact tagagtaaaa 1500

acaaggttag gcatttcact ttcccagaga acgcgcaaac attcaatggg agagaggtcc 1560

cgagtcgtca aagtcccaga tgtggcgagc ccccgggagg aaaaaccgtg tcttccttag 1620

gatgcccgga acaagagcta ggcttccgga gctaggcagc catctatgtc cgtgagccgg 1680

cgggagggag accgccggga ggcgaagtgg ggcggggcca tccttctttc tgctctgctg 1740

ctgccgggga gctcctggct ggcgtccaag cggcaggagg ccgccgtcct gcagggcgcc 1800

gtagagtttg cggtgcagag t 1821

<210>9

<211>1929

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR9

<400>9

cacttcctgg gagtggagca gaggctctgc gtggagcatc catgtgcagt actcttaggt 60

acggaaggga ttgggctaaa ccatggatgg gagctgggaa gggaagggac caacttcagg 120

ccccactggg acactggagc tgccaccctt tagagccctc ctaaccctac accagaggct 180

gagggggacc tcagacatca cacacatgct ttcccatgtt ttcagaaatc tggaaacgta 240

gaacttcagg ggtgagagtg cctagatatt gaatacaagg ctagattggg cttctgtaat 300

atcccaaagg accctccagc tttttcacca gcacctaatg cccatcagat accaaagaca 360

cagcttagga gaggttcacc ctgaagctga ggaggaggca gccggattag agttgactga 420

gcaaggatga ctgccttctc cacctgacga tttcagctgc tgcccttttc ttttcctggg 480

aatgcctgtc gccatggcct tctgtgtcca caggagagtt tgacccagat actcatggac 540

caggcaaagg tgctgttcct cccagcccag ggcccaccat gaagcatgcc tgggagcctg 600

gtaaggaccc agccactcct gggctgttga cattggcttc tcttgcccag cattgtagcc 660

acgccactgc attgtactgt gagataagtc aaggtgggct caccaggacc tgcactaaat 720

tgtgaaattc agctccaaag aactttggaa attacccatg catttaagca aaatgaatga 780

tacctgagca aaccctttca cattggcaca agttacaatc ctgtctcatc ctcttgatta 840

caaattccat ccaggcaaga gctgtatcac cctgaggtct ccccattcat gttttggtca 900

ataatattta gtttcctttt gaaaatagat ttttgtgtta ctccattatg atgggcagag 960

gccagatgct tatattctat ttaaatgact atgtttttct atctgtaact gggtttgtgt 1020

tcaggtggta aatgcttttt ttttgcagtc agaagattcc tggaaggcga ccagaaatta 1080

gctggccgct gtcagacctg aagttacttc taaagggcct ttagaaatga attctttttt 1140

atgccttctc tgaattctga gaagtaggct tgacttcccc taagtgtgga gttgggagtc 1200

aactcttctg aaaagaaagt ttcagagcat tttccaaagc catggtcagc tgtgggaagg 1260

gaagacgatg gatagtacag ttgccggaaa acactgatgg aggcggatgc tccagctcag 1320

ccaaagacct ttgttctgcc caccccagaa atgccccttc ctcaatcgca gaaacgttgc 1380

cccatggctc ctgatactca gaatgcagcc tctgaccagg accatctgca tcctccagga 1440

gctcgtaaga aatgcagcat cgtgggacct gctggcacct ggtgaaccca aacctgcagg 1500

gctcctgggt gtgcttgggg cggctgcagg ggaagaggga gtcagcagcc tcctcctgac 1560

cttcccgggg gctgcttttc tgaggggcca gaatgcaccg gttgaccttg ttgcatcact 1620

ggcccatgac tggctgcttt ggtcaggtgt aaaaaggtgt ttccagaggg tctgctcctc 1680

tcactatcgg accaggtttc catggagagc tcagcctccc agcaaggata gagaacttca 1740

aatggctcaa agaactgaga ggccacacat gtgtgacctg aatagtctct gctgcaaaac 1800

aaagggtttc ttaatgtaaa acgttctctt cctcacagag gggttcccag ctgctagtgg 1860

gcatgttgca ggcatttcct gggctgcatc aggttgtcat aagccagagg atcatttttg 1920

ggggctcat 1929

<210>10

<211>1167

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR10

<220>

<221>misc_feature

<222>(452)..(1143)

<223>n is a, c, g, or t at various positions

<400>10

aggtcaggag ttcaagacca gcctggccaa catggtgaaa ccctgtccct acaaaaaata 60

caaaaattag ccgggcgtgg tggggggcgc ctataatccc agctactcag gatgctgaga 120

caggagaatt gtttgaaccc gggaggtgga ggttgcagtg aactgagatc gcgccactgc 180

actccagcct ggtgacagag agagactccg tctcaacaac agacaaacaa acaaacaaac 240

aacaacaaaa atgtttactg acagctttat tgagataaaa ttcacatgcc ataaaggtca 300

ccttctacag tatacaattc agtggattta gtatgttcac aaagttgtac gttgttcacc 360

atctactcca gaacatttac atcaccccta aaagaagctc tttagcagtc acttctcatt 420

ctccccagcc cctgccaacc acgaatctac tntctgtctc tattctgaat atttcatata 480

aaggagtcct atcatatggg ccttttacgt ctaccttctt tcacttagca tcatgttttt 540

aagattcatc cacagtgtag cacgtgtcag ttaattcatt tcatcttatg gctggataat 600

gctctattgt atgcatatcc ctcactttgc ttatccattc atcaactgat tgacatttgg 660

gttatttcta ctttttgact attatgagta atgctgctat gaacattcct gtaccaatcg 720

ttacgtggac atatgctttc aattctcctg agtatgtaac tagggttgga gttgctgggt 780

catatgttaa ctcagtgttt catttttttg aagaactacc aaatggtttt ccaaagtgga 840

tgcaacactt tacattccca ccagcaagat atgaaggttc caatgtctct acatttttgc 900

caacacttgt gattttcttt tatttattta tttatttatt tatttttgag atggagtctc 960

actctgtcac ccaggctgga gtgcagtggc acaatttcag ctcactgcaa tctccacctc 1020

tcgggctcaa gcgatactcc tgcctcaacc tcccgagtaa ctgggattac aggcgcccac 1080

caccacacca agctaatttt ttgtattttt agtagagacg gggtttcatc atgtcggcca 1140

ggntgtactc gaactctgac ctcaagt 1167

<210>11

<211>1377

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR11

<400>11

aggatcactt gagcccagga gttcaagacc agcctgggca acatagcgag aacatgtctc 60

aaaaaggaaa aaaatggggg aaaaaaccct cccagggaca gatatccaca gccagtcttg 120

ataagctcca tcattttaaa gtgcaaggcg gtgcctccca tgtggatgat tatttaatcc 180

tcttgtactt tgtttagtcc tttgtggaaa tgcccatctt ataaattaat agaattctag 240

aatctaatta aaatggttca actctacatt ttactttagg ataatatcag gaccatcaca 300

gaatgtctga gatgtggatt taccctatct gtagctcact tcttcaacca ttcttttagc 360

aaggctagtt atcttcagtg acaacccctt gctgccctct actatctcct ccctcagatg 420

gactactctg attaagcttg agctagaata agcatgttat cccgggattt catatggaat 480

attttataca tgagtgagcc attatgagtt gtttgaaaat ttattatgtt gagggagggt 540

aaccgctgta acaaccatca ccaaatctaa tcgactgaat acatttgacg tttatttctt 600

gttcacctga cagttcagtg ttacctaaat ttacatgaag acccagaggc ccacgctcct 660

tcattttggg ctccaccgac ctccaaggtt tcagggccct ctgccccgcc ttctgcaccc 720

acaggggaag agagtggagg atgcacacgc ccaggcctgg aagtgacgca tgtggcttcc 780

ccgtccacag acttcaccca cagtccattg gccttcttaa gtcatggact cctgctgagc 840

tgccagggtg catgggaaat ccatgtgact gtgtgccctg gaggaagggg agcgtttcgg 900

tgagcacaca ggagtctttg ccactagacg ctgatgagga ttccccacag gcgatgaagc 960

atggagactc atcttgtaac aaacagatga gttgttgaca tctcttaagt ttactttgtg 1020

tgcagttttt attcagatag gaaaggctgt taaaatctta acacctaact ggaagaaggg 1080

ttttagagaa gtgtggtttt cagtaagcca gttctttcca caatccaaga aacgaaataa 1140

atttccagca tggagcagtt ggcaggtaag gtttttgttg tggtctcgcc caggcttgag 1200

tgtaaccggt gtggtcatag ctcactacat tctcaaactc ctggccttaa gtcatcctcc 1260

tgcctcagcc tcccaaaggc aagtaaggtt aagaataggg gaaaggtgaa gtttcacagc 1320

ttttctagaa ttctttttat tcaagggact ctcagatcat caaacccacc cagaatc 1377

<210>12

<211>1051

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR12

<400>12

atcctgcttc tgggaagaga gtggcctccc ttgtgcaggt gactttggca ggaccagcag 60

aaacccaggt ttcctgtcag gaggaagtgc tcagcttatc tctgtgaagg gtcgtgataa 120

ggcacgagga ggcaggggct tgccaggatg ttgcctttct gtgccatatg ggacatctca 180

gcttacgttg ttaagaaata tttggcaaga agatgcacac agaatttctg taacgaatag 240

gatggagttt taagggttac tacgaaaaaa agaaaactac tggagaagag ggaagccaaa 300

caccaccaag tttgaaatcg attttattgg acgaatgtct cactttaaat ttaaatggag 360

tccaacttcc ttttctcacc cagacgtcga gaaggtggca ttcaaaatgt ttacacttgt 420

ttcatctgcc tttttgctaa gtcctggtcc cctacctcct ttccctcact tcacatttgt 480

cgtttcatcg cacacatatg ctcatcttta tatttacata tatataattt ttatatatgg 540

cttgtgaaat atgccagacg agggatgaaa tagtcctgaa aacagctgga aaattatgca 600

acagtgggga gattgggcac atgtacattc tgtactgcaa agttgcacaa cagaccaagt 660

ttgttataag tgaggctggg tggtttttat tttttctcta ggacaacagc ttgcctggtg 720

gagtaggcct cctgcagaag gcattttctt aggagcctca acttccccaa gaagaggaga 780

gggcgagact ggagttgtgc tggcagcaca gagacaaggg ggcacggcag gactgcagcc 840

tgcagagggg ctggagaagc ggaggctggc acccagtggc cagcgaggcc caggtccaag 900

tccagcgagg tcgaggtcta gagtacagca aggccaaggt ccaaggtcag tgagtctaag 960

gtccatggtc agtgaggctg agacccaggg tccaatgagg ccaaggtcca gagtccagta 1020

aggccgagat ccagggtcca gggaggtcaa g 1051

<210>13

<211>1291

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR13

<400>13

agccactgag gtcctaactg cagccaaggg gccgttctgc acatgtcgct caccctctgt 60

gctctgttcc ccacagagca aacgcacatg gcaacgttgg tccgctcagc cactggttct 120

gtggtggaac ggtggatgtc tgcactgtga catcagctga gtaagtaaca acgactgagg 180

atgccgctga cccagggctg gggaagggga ctcccagctc agacaggctt ggctgtggtt 240

tgctttggga ggagagtgaa catcacaggg aatggctcat gtcagcccca ggagggtggg 300

ctggcccctg gtccccgggc tccttctggc cctgcaggcg atagagagcc tcaacctgct 360

gccgcttctc cttggcccgg gtgatggccg tctggaagag cctgcagtag aggtgcacag 420

ccagcggaga gtcgtcattg ccgggtacag ggtaggtgat gaggcagggg ttgcagttgg 480

tgtccacgat gcccactgtg gggatgttca tcttggctgc gtctctcacg gccacgtgtg 540

gctcaaagat gttgttgagc gtgtgcagga agatgatgag gtccggcagg cggaccgtgg 600

ggccaaagag gaggcgcgcg ttggtcagca tgccgcccct gaagtagcga gtgtgggcgt 660

actcgccaca gtcacgggcc atgttctcaa tcaggtacga gaactgccgg ttgcggctta 720

taaacaagat gatgcccttg cggtaggcca tgtgggcggt gaagttcaag gccagctgga 780

ggtgcgtggc tgtctgttcc aggtcgatga tgtcgtggtc caggcggctc ccaaagatgt 840

acggctccat aaacctgcca gagaccccac caaggcaagg gggatgagag ttcacggggc 900

catctccact ggctccttgc aggaacacag acgcccacca gggactcccg ggctcctctg 960

tgggggcact atgggctggg aagcacaatt tgcaacgctc cccgtgtgca tggacagcag 1020

tgcagaccca tccaggccac ccctctgcat gcctcgtctc gtggcttaac ccctcctacc 1080

ctctacctct tcccgaagga atcctaatag aactgacccc atatggatgt gtggacatcc 1140

aacatgacgc caaaaggaca ttctgccccg tgcagctcac agggcagccg cctccgtcac 1200

tgtcctcttc ccgaggcttt gcggatgagg cccctctggg gttggactta gcggggtgct 1260

ctgggccaaa agcattaagg gatcagggca g 1291

<210>14

<211>711

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR14

<400>14

ccctggacca gggtccgtgg tcttggtggg cactggcttc ttcttgctgg gtgttttcct 60

gtgggtctct ggcaaggcac tttttgtggc gctgcttgtg ctgtgtgcgg gaggggcagg 120

tgctctttcc tcttggagct ggaccctctg gggcgggtcc ccgtcggcct ccttgtgtgt 180

tttctgcacc tggtacagct ggatggcctc ctcaatgccg tcgtcgctgc tggagtcgga 240

cgcctcgggc gcctgtacgg cgctcgtgac tcgctttccc ctccttgcgg tgctggcgtt 300

ccttttaatc ccacttttat tctgtactgc ttctgaaggg cggtgggggt tgctggcttt 360

gtgctgccct ccttctcctg cgtggtcgtg gtcgtgacct tggacctgag gcttctgggc 420

tgcacgtttg tctttgctaa ccgggggagg tctgcagaag gcgaactcct tctggacgcc 480

catcaggccc tgccggtgca ccacctttgt agccggctct tggtgggatt tcgagagtga 540

cttcgccgaa ttttcatgtg tgtctggttt cttctccact gacccatcac atttttgggt 600

ctcatgctgt cttttctcat tcagaaactg ttctatttct gccctgatgc tctgctcaaa 660

ggagtctgct ctgctcatgc tgactgggga ggcagagccc tggtccttgc t 711

<210>15

<211>1876

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR15

<400>15

gagtccaaga tcaaggtgcc agcatcttgt gagggccttc ttgttacgtc actccctagc 60

gaaagggcaa agagagggtg agcaagagaa aggggggctg aactcgtcct tgtagaagag 120

gcccattccc gagacaatgg cattcatcca ttcactccac cctcatggcc tcaccacctc 180

tcatgaggct ccacctccca gccctggttt gttggggatt aaatttccaa cacatgcctt 240

ttgggggaca tgttaaaatt atagcacccc aaatgttaca ctatcttttg atgagcggta 300

gttctgattt taagtctagc tggcctactt tttcttgcac gtgggatgct ttctgcctgt 360

tccagggcag gcagctcttc tctgtccctc tgctggcccc acctcatcct ctgttgtcct 420

cttccctcct tctgtgccct ggggtcctgg tgggggtgtg actgtcaact gcgttgggct 480

aacttttttc cctgctggtg gcccgtaatg aaagaaagct tcttgctccc aagttcctta 540

aatccaagct catagacaac gcggtctcac agcaggcctg gggccagcct cacgtgagcc 600

ccttccctgg tgtagtcact ggcatggggg aatgggattt cctgttgccc tactgtgtgg 660

ctgaggtggg ggttgcttcc tggagccagg ccttgtggaa gggcagtgcc cactgcagtg 720

gatgctgggc cctgaatctg accccagtgt tcattggctc tgtgagaccc agtgagggca 780

gggagggaag tggagctggg gtgagaagta gaggccctgc agggcccacg tgccagccac 840

caggcctcag actaggctca gatgacggag agctgcacac ctgcccaacc caggccctgc 900

agtgcccaca tgccagccgc tggggcccag acttgctcca gagggcggag agctttacac 960

cggcccaacc caggccatgg ctccaaatgc gtgacagttt tgctgttgct tcttttagtc 1020

attgtcaagt tgatgcttgt tttgcagagg accaaggctt tatgaaccta ttaccctgtg 1080

tgaagagttt caccaggtta tggaaatttc tttaaaacca taccacagtt ttttcattat 1140

tcatgtatat ttttaaaaat aattactgca ctcagtagaa taacatgaaa atgttgcctg 1200

ttagcccttt tccagtttgc cccgagaata ctgggggcac ttgtggctgc aatgtttatc 1260

ctgcggcagc tttgccatga agtatctcac ttttattatt atttttgcat tgctcgagta 1320

tattgacttt ggaaacaaaa gacatcattc tatttatagc attatgtttt tagtagtggt 1380

atttccatat acaagataca gtaattttcc gtcaatgaaa atgtcaaatt ctagaaaatg 1440

taacattcct atgcgtggtg ttaacatcgt tctctaacag ttgttggccg aagattcgtt 1500

tgatgaatcc gatttttcca aaatagccga ttctgatgat tcagacgatt ctgatgttct 1560

gtttagaaat aattccaaga acagttttta cattttattt tcacattgaa aatcagtcag 1620

atttgcttca gcctcaaaga gcacgtttat gtaaaattaa atgagtgctg gcagccagct 1680

gcgctttgtt tttctaaatg ggaaaagggt taaatttcac tcagctttta aatgacagcg 1740

cacagcctgt gtcatagagg gttggaggag atgactttaa ctgcctgtgg ttaggatccc 1800

tttcccccag gaatgtctgg gagcccactg ccgggtttgc tgtccgtctc gtttggactc 1860

agttctgcat gtactg 1876

<210>16

<211>1282

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR16

<400>16

cgcccacctc ggctttccaa agtgctggga ttacaggcat gagtcactgc gcccatcctg 60

attccaagtc tttagataat aacttaactt tttcgaccaa ttgccaatca ggcaatcttt 120

gaatctgcct atgacctagg acatccctct ccctacaagt tgccccgcgt ttccagacca 180

aaccaatgta catcttacat gtattgattg aagttttaca tctccctaaa acatataaaa 240

ccaagctata gtctgaccac ctcaggcacg tgttctcagg acctccctgg ggctatggca 300

tgggtcctgg tcctcagatt tggctcagaa taaatctctt caaatatttt ccagaatttt 360

actcttttca tcaccattac ctatcaccca taagtcagag ttttccacaa ccccttcctc 420

agattcagta atttgctaga atggccacca aactcaggaa agtattttac ttacaattac 480

caatttatta tgaagaactc aaatcaggaa tagccaaatg gaagaggcat agggaaaggt 540

atggaggaag gggcacaaag cttccatgcc ctgtgtgcac accaccctct cagcatcttc 600

atgtgttcac caactcagaa gctcttcaaa ctttgtcatt taggggtttt tatggcagtt 660

ccactatgta ggcatggttg ataaatcact ggtcatcggt gatagaactc tgtctccagc 720

tcctctctct ctcctcccca gaagtcctga ggtggggctg aaagtttcac aaggttagtt 780

gctctgacaa ccagccccta tcctgaagct attgaggggt cccccaaaag ttaccttagt 840

atggttggaa gaggcttatt atgaataaca aaagatgctc ctatttttac cactagggag 900

catatccaag tcttgcggga acaaagcatg ttactggtag caaattcata caggtagata 960

gcaatctcaa ttcttgcctt ctcagaagaa agaatttgac caagggggca taaggcagag 1020

tgagggacca agataagttt tagagcagga gtgaaagttt attaaaaagt tttaggcagg 1080

aatgaaagaa agtaaagtac atttggaaga gggccaagtg ggcgacatga gagagtcaaa 1140

caccatgccc tgtttgatgt ttggcttggg gtcttatatg atgacatgct tctgagggtt 1200

gcatccttct cccctgattc ttcccttggg gtgggctgtc cgcatgcaca atggcctgcc 1260

agcagtaggg aggggccgca tg 1282

<210>17

<211>793

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR17

<400>17

atccgagggg aggaggagaa gaggaaggcg agcagggcgc cggagcccga ggtgtctgcg 60

agaactgttt taaatggttg gcttgaaaat gtcactagtg ctaagtggct tttcggattg 120

tcttatttat tactttgtca ggtttcctta aggagagggt gtgttggggg tgggggagga 180

ggtggactgg ggaaacctct gcgtttctcc tcctcggctg cacagggtga gtaggaaacg 240

cctcgctgcc acttaacaat ccctctatta gtaaatctac gcggagactc tatgggaagc 300

cgagaaccag tgtcttcttc cagggcagaa gtcacctgtt gggaacggcc cccgggtccc 360

cctgctgggc tttccggctc ttctaggcgg cctgatttct cctcagccct ccacccagcg 420

tccctcaggg acttttcaca cctccccacc cccatttcca ctacagtctc ccagggcaca 480

gcacttcatt gacagccaca cgagccttct cgttctcttc tcctctgttc cttctctttc 540

tcttctcctc tgttccttct ctttctctgt cataatttcc ttggtgcttt cgccacctta 600

aacaaaaaag agaaaaaaat aaaataaaaa aaacccattc tgagccaaag tattttaaga 660

tgaatccaag aaagcgaccc acatagccct ccccacccac ggagtgcgcc aagacgcacc 720

caggctccat cacagggccg agagcagcgc cactctggtc gtacttttgg gtcaagagat 780

cttgcaaaag agg 793

<210>18

<211>492

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR18

<400>18

atctttttgc tctctaaatg tattgatggg ttgtgttttt tttcccacct gctaataaat 60

attacattgc aacattcttc cctcaacttc aaaactgctg aactgaaaca atatgcataa 120

aagaaaatcc tttgcagaag aaaaaaagct attttctccc actgattttg aatggcactt 180

gcggatgcag ttcgcaaatc ctattgccta ttccctcatg aacattgtga aatgaaacct 240

ttggacagtc tgccgcattg cgcatgagac tgcctgcgca aggcaagggt atggttccca 300

aagcacccag tggtaaatcc taacttatta ttcccttaaa attccaatgt aacaacgtgg 360

gccataaaag agtttctgaa caaaacatgt catctttgtg gaaaggtgtt tttcgtaatt 420

aatgatggaa tcatgctcat ttcaaaatgg aggtccacga tttgtggcca gctgatgcct 480

gcaaattatc ct 492

<210>19

<211>1840

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR19

<400>19

tcacttcctg atattttaca ttcaaggcta gctttatgca tatgcaacct gtgcagttgc 60

acagggcttt gtgttcagaa agactagctc ttggtttaat actctgttgt tgccatcttg 120

agattcatta taatataatt tttgaatttg tgttttgaac gtgatgtcca atgggacaat 180

ggaacattca cataacagag gagacaggtc aggtggcagc ctcaattcct tgccaccctt 240

ttcacataca gcattggcaa tgccccatga gcacaaaatt tgggggaacc atgatgctaa 300

gactcaaagc acatataaac atgttacctc tgtgactaaa agaagtggag gtgctgacag 360

cccccagagg ccacagttta tgttcaaacc aaaacttgct tagggtgcag aaagaaggca 420

atggcagggt ctaagaaaca gcccatcata tccttgttta ttcatgttac gtccctgcat 480

gaactaatca cttacactga aaatattgac agaggaggaa atggaaagat agggcaaccc 540

atagttcttt ttccttttag tctttcctta tcagtaaacc aaagatagta ttggtaaaat 600

gtgtgtgagt taattaatga gttagtttta ggcagtgttt ccactgttgg ggtaagaaca 660

aaatatatag gcttgtattg agctattaaa tgtaaattgt ggaatgtcag tgattccaag 720

tatgaattaa atatccttgt atttgcattt aaaattggca ctgaacaaca aagattaaca 780

gtaaaattaa taatgtaaaa gtttaatttt tacttagaat gacattaaat agcaaataaa 840

agcaccatga taaatcaaga gagagactgt ggaaagaagg aaaacgtttt tattttagta 900

tatttaatgg gactttcttc ctgatgtttt gttttgtttt gagagagagg gatgtggggg 960

cagggaggtc tcattttgtt gcccaggctg gacttgaact cctgggctcc agctatcctg 1020

ccttagcttc ttgagtagct gggactacag gcacacacca cagtgtctga cattttctgg 1080

attttttttt tttttttatt ttttttgtga gacaggttct ggctctgtta ctcaggttgc 1140

agtgcagtgg catgatagcg gctcactgca gcctcaacct cctcagctta agctactctc 1200

ccacttcagc ctcctgagta gccaggacta cagttgtgtg ccaccacacc tgtggctaat 1260

ttttgtagag atggggtctc tccacgttgc cgaggctggt ctccaactcc tggtctcaag 1320

cgaacctcct gacttggcct cccgaagtgc tgggattaca ggcttgagcc actgcatcca 1380

gcctgtcctc tgtgttaaac ctactccaat ttgtctttca tctctacata aacggctctt 1440

ttcaaagttc ccatagacct cactgttgct aatctaataa taaattatct gccttttctt 1500

acatggttca tcagtagcag cattagattg ggctgctcaa ttcttcttgg tatattttct 1560

tcatttggct tctggggcat cacactctct ttgagttact cattcctcat tgatagcttc 1620

ttcctagtct tctttactgg ttcttcctct tctccctgac tccttaatat tgtttttctc 1680

cccaggcttt agttcttagt cctcttctgt tatctattta cacccaattc tttcagagtc 1740

tcatccagag tcatgaactt aaacctgttt ctgtgcagat aattcacatt attatatctc 1800

cagcccagac tctcccgcaa actgcagact gatcctactg 1840

<210>20

<211>780

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR20

<400>20

gatctcaagt ttcaatatca tgttttggca aaacattcga tgctcccaca tccttaccta 60

aagctaccag aaaggctttg ggaactgtca acagagctac agaaaagtca gtaaagacca 120

atggacccct caaacaaaaa cagccaagct tttctgccaa aaagatgact gagaagactg 180

ttaaagcaaa aaactctgtt cctgcctcag atgatggcta tccagaaata gaaaaattat 240

ttcccttcaa tcctctaggc ttcgagagtt ttgacctgcc tgaagagcac cagattgcac 300

atctcccctt gagtgaagtg cctctcatga tacttgatga ggagagagag cttgaaaagc 360

tgtttcagct gggcccccct tcacctttga agatgccctc tccaccatgg aaatccaatc 420

tgttgcagtc tcctttaagc attctgttga ccctggatgt tgaattgcca cctgtttgct 480

ctgacataga tatttaaatt tcttagtgct ttagagtttg tgtatatttc tattaataaa 540

gcattatttg tttaacagaa aaaaagatat atacttaaat cctaaaataa aataaccatt 600

aaaaggaaaa acaggagtta taactaataa gggaacaaag gacataaaat gggataataa 660

tgcttaatcc aaaataaagc agaaaatgaa gaaaaatgaa atgaagaaca gataaataga 720

aaacaaatag caatatgaaa gacaaacttg accgggtgtg gtggctgatg cctgtaatcc 780

<210>21

<211>607

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR21

<400>21

gatcaataat ttgtaatagt cagtgaatac aaaggggtat atactaaatg ctacagaaat 60

tccattcctg ggtataaatc ctagacatat ttatgcatat gtacaccaag atatatctgc 120

aagaatgttc acagcaaatc tctttgtagt agcaaaaggc caaaaggtct atcaacaaga 180

aaattaatac attgtggcac ataatggcat ccttatgcca ataaaaatgg atgaaattat 240

agttaggttc aaaaggcaag cctccagata atttatatca tataattcca tgtacaacat 300

tcaacaacaa gcaaaactaa acatatacaa atgtcaggga aaatgatgaa caaggttaga 360

aaatgattaa tataaaaata ctgcacagtg ataacattta atgagaaaaa aagaaggaag 420

ggcttaggga gggacctaca gggaactcca aagttcatgg taagtactaa atacataatc 480

aaagcactca aaatagaaaa tattttagta atgttttagc tagttaatat cttacttaaa 540

acaaggtcta ggccaggcac ggtggctcac acctgtaatc ccagcacttt gggaggctga 600

ggcgggt 607

<210>22

<211>1380

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR22

<400>22

cccttgtgat ccacccgcct tggcctccca aagtgctggg attacaggcg tgagtcacta 60

cgcccggcca ccctccctgt atattatttc taagtatact attatgttaa aaaaagttta 120

aaaatattga tttaatgaat tcccagaaac taggatttta catgtcacgt tttcttatta 180

taaaaataaa aatcaacaat aaatatatgg taaaagtaaa aagaaaaaca aaaacaaaaa 240

gtgaaaaaaa taaacaacac tcctgtcaaa aaacaacagt tgtgataaaa cttaagtgcc 300

tgaaaattta gaaacatcct tctaaagaag ttctgaataa aataaggaat aaaataatca 360

catagttttg gtcattggtt ctgtttatgt gatggattat gtttattgat ttgtgtatgt 420

tgaacttatc tcaatagatg cagacaaggc cttgataaaa gtttttaaca ccttttcatg 480

ttgaaaactc tcaatagact aggtattgat gaaacatatc tcaaaataat agaagctatt 540

tatgataaac ccatagccaa tatcatactg agtgggcaaa agctggaagc attccctttg 600

aaaactggca caagacaagg atgccctctc tcaccactcc tattaaatgt agtattggaa 660

gttctggcca gagcaatcag gcaggagaaa gaaaaggtat taaaatagga agagaggaag 720

tcaaattgtc tctgtttgca gtaaacatga ttgtatattt agaaaacccc attgtctcat 780

cctaaaaact ccttaagctg ataaacaact tcagcaaagt ctcaggatac aaaatcaatg 840

tgcaaaaatc acaagcattc ctatacaccg ataatagaca gcagagagcc aaatcatgag 900

tgaagtccca ttcacaattg cttcaaagaa aataaaatac ttaggaatac aactttcacg 960

ggacatgaag gacattttca aggacaacta aaaaccactg ctcaaggaaa tgagagagga 1020

cacaaagaaa tggaaaaaca ttccatgctc atggaagaat caatatcatg aaaatggcca 1080

tactgcccaa agtaatttat agattcaatg ctaaccccat caagccacca ttgactttct 1140

tcacagaact agaaaaaaac tattttaaaa ctcatatgta gtcaaaaaga gtcggtatag 1200

ccaagacaat cctaagcata aagaacaaag ctggatgcat cacgctgact tcaaaccata 1260

ctacaaggct acagtaacca aaacagcatg gtactggtac caaaacagat agatagaccg 1320

atagaacaga acagaggcct cggaaataac accacacatc tacaaccctt tgatcttcaa 1380

<210>23

<211>1246

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR23

<400>23

atcccctcat ccttcagggc agctgagcag ggcctcgagc agctggggga gcctcactta 60

atgctcctgg gagggcagcc agggagcatg gggtctgcag gcatggtcca gggtcctgca 120

ggcggcacgc accatgtgca gccgccccca cctgttgctc tgcctccgcc acctggccat 180

gggcttcagc agccagccac aaagtctgca gctgctgtac atggacaaga agcccacaag 240

cagctagagg accttgtgtt ccacgtgccc agggagcatg gcccacagcc caaagaccag 300

tcaggagcag gcaggggctt ctggcaggcc cagctctacc tctgtcttca cacagatggg 360

agatttctgt tgtgattttg agtgatgtgc ccctttggtg acatccaaga tagttgctga 420

agcaccgctc taacaatgtg tgtgtattct gaaaacgaga acttctttat tctgaaataa 480

ttgatgcaaa ataaattagt ttggatttga aattctattc atgtaggcat gcacacaaaa 540

gtccaacatt gcatatgaca caaagaaaag aaaaagcttg cattccttaa atacaaatat 600

ctgttaacta tatttgcaaa tatatttgaa tacacttcta ttatgttaca tataatatta 660

tatgtatatg tatatataat atacatatat atgttacata taatatactt ctattatgtt 720

acatataata tttatctata agtaaataca taaatataaa gatttgagta gctgtagaac 780

attgtcttat gtgttatcag ctactactac aaaaatatct cttccactta tgccagtttg 840

ccatataaat atgatcttct cattgatggc ccagggcaag agtgcagtgg gtacttattc 900

tctgtgagga gggaggagaa aagggaacaa ggagaaagtc acaaagggaa aactctggtg 960

ttgccaaaat gtcaagtttc acatattccg agacggaaaa tgacatgtcc cacagaagga 1020

ccctgcccag ctaatgtgtc acagatatct caggaagctt aaatgatttt tttaaaagaa 1080

aagagatggc attgtcactt gtttcttgta gctgaggctg tgggatgatg cagatttctg 1140

gaaggcaaag agctcctgct ttttccacac cgagggactt tcaggaatga ggccagggtg 1200

ctgagcacta caccaggaaa tccctggaga gtgtttttct tactta 1246

<210>24

<211>939

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR24

<400>24

acgaggtcac gagttcgaga ccagcctggc caagatggtg aagccctgtc tctactaaaa 60

atacaacaag tagccgggcg cggtgacggg cgcctgtaat cccagctact caggaggctg 120

aagcaggaga atctctagaa cccaggaggc ggaggtgcag tgagctgaga ctgccccgct 180

gcactctagc ctgggcaaca cagcaagact ctgtctcaaa taaataaata aataaataaa 240

taaataaata aataaataaa tagaaaggga gagttggaag tagatgaaag agaagaaaag 300

aaatcctaga tttcctatct gaaggcacca tgaagatgaa ggccacctct tctgggccag 360

gtcctcccgt tgcaggtgaa ccgagttctg gcctccattg gagaccaaag gagatgactt 420

tggcctggct cctagtgagg aagccatgcc tagtcctgtt ctgtttgggc ttgatcctgt 480

atcacttgat tgtctctcct ggactttcca tggattccag ggatgcaact gagaagttta 540

tttttaatgc acttacttga agtaagagtt attttaaaac attttagcaa aggaaatgaa 600

ttctgacagg ttttgcactg aagacattca catgtgagga aaacaggaaa accactatgc 660

tagaaaaagc aaatgctgtt gagattgtct cacaaacaca aattgcgtgc cagcaggtag 720

gtttgagcct caggttgggc acattttacc ttaagcgcac tgttggtgga acttaaggtg 780

actgtaggac ttatatatac atacatacat ataatatata tacatattta tgtgtatata 840

cacacacaca cacacacaca cacacagggt cttgctatct tgcccagggt ggtctccaac 900

tctgggtctc aagcgatcct ctgcctcccc ttcccaaag 939

<210>25

<211>1067

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR25

<400>25

cagcccctct tgtgtttttc tttatttctc gtacacacac gcagttttaa gggtgatgtg 60

tgtataatta aaaggaccct tggcccatac tttcctaatt ctttagggac tgggattggg 120

tttgactgaa atatgttttg gtggggatgg gacggtggac ttccattctc cctaaactgg 180

agttttggtc ggtaatcaaa actaaaagaa acctctggga gactggaaac ctgattggag 240

cactgaggaa caagggaatg aaaaggcaga ctctctgaac gtttgatgaa atggactctt 300

gtgaaaatta acagtgaata ttcactgttg cactgtacga agtctctgaa atgtaattaa 360

aagtttttat tgagcccccg agctttggct tgcgcgtatt tttccggtcg cggacatccc 420

accgcgcaga gcctcgcctc cccgctgccc tcagcctccg atgacttccc cgcccccgcc 480

ctgctcggtg acagacgttc tactgcttcc aatcggaggc acccttcgcg ggagcggcca 540

atcgggagct ccggcaggcg gggaggccgg gccagttaga tttggaggtt caacttcaac 600

atggccgaag caagtagcgc caatctaggc agcggctgtg aggaaaaaag gcatgagggg 660

tcgtcttcgg aatctgtgcc acccggcact accatttcga gggtgaagct cctcgacacc 720

atggtggaca cttttcttca gaagctggtc gccgccggca ggtaaagtgg acgcagccgc 780

ggtgggagtg tttgttggca ccgaagctca aatcccgcga ggtcaggacg gccgcaggct 840

ggcgcgcggt gacgtgggtc cgcgttgggg gcggggcagt cggacgaggc gacccagtca 900

aatcctgagc cttaggagtc agggtattca cgcactgata acctgtagcg gaccgggata 960

gctagctact ccttcctaca ggaagccccg ttttcactaa aatttcaggt ggttgggagg 1020

aaagatagag cctttgcaaa ttagagcagg gttttttatt tttttat 1067

<210>26

<211>540

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR26

<400>26

ccccctgaca agccccagtg tgtgatgttc cccactctgt gtccatgcat tctcattgtt 60

caactcccat ctgtgagtga gaacatgcag tgtttggttt tctgtccttg agatagtttg 120

ctgagaatga tggtttccag cttcatccat gtccttgcaa aggaagtgaa cttatccttt 180

tttatggctt catagtattc catggcacat atgtgccaca tttttttaat ccagtctatc 240

attgatggac atttgggttg gttccaagtc tttgctattg tgaatagcac cacaattaac 300

atatgtgtgc atgtatacat ctttatagta gcatgattta taatccttcg ggtatatacc 360

ctgtaatggg atcgctgggt caaatggtat ttctagttct agatccttga ggaatcacca 420

cactgctttc cacaatggtt gaactaattt acgctcccac cagcagtgta aaagcattcc 480

tatttctcca cgtcctctcc agtatctgtt gtttcctgac tttttaatga tcatcattct 540

<210>27

<211>1520

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR27

<400>27

cttggccctc acaaagcctg tggccaggga acaattagcg agctgcttat tttgctttgt 60

atccccaatg ctgggcataa tgcctgccat tatgagtaat gccggtagaa gtatgtgttc 120

aaggaccaaa gttgataaat accaaagaat ccagagaagg gagagaacat tgagtagagg 180

atagtgacag aagagatggg aacttctgac aagagttgtg aagatgtact aggcaggggg 240

aacagcttaa ggagagtcac acaggaccga gctcttgtca agccggctgc catggaggct 300

gggtggggcc atggtagctt tcccttcctt ctcaggttca gagtgtcagc cttgaacttc 360

taattcccag aggcatttat tcaatgtttt cttctagggg catacctgcc ctgctgtgga 420

agactttctt ccctgtgggt cgccccagtc cccagatgag acggtttggg tcagggccag 480

gtgcaccgtt gggtgtgtgc ttatgtctga tgacagttag ttactcagtc attagtcatt 540

gagggaggtg tggtaaagat ggagatgctg ggtcacatcc ctagagaggt gttccagtat 600

gggcacatgg gagggctgga aggataggtt actgctagac gtagagaagc cacatccttt 660

aacaccctgg cttttcccac tgccaagatc cagaaagtcc ttgtggtttc gctgctttct 720

cctttttttt tttttttttt tttctgagat ggagtctggc tctgtcgccc aggctggagt 780

gcagtggcac gatttcggct cactgcaagt tccgcctcct aggttcatac cattctccca 840

cctcagcctc ccgagtagct gggactacag gcgccaccac acccagctaa ttttttgtat 900

ttttagtaga gacggcgttt caccatgtta gccaggatgg tcttgatccg cctgcctcag 960

cctcccaaag tgctgggatt acaggcgtga gccaccgcgc ccggcctgct ttcttctttc 1020

atgaagcatt cagctggtga aaaagctcag ccaggctggt ctggaactct tgacctcaag 1080

tgatctgcct gcctcagcct cccaaagtgc tgagattaca ggcatgagcc agtccgaatg 1140

tggctttttt tgttttgttt tgaaacaagg tctcactgtt gcccaggctg cagtgcagtg 1200

gcatacctca gctccactgc agcctcgacc tcctgggctc aagcaatcct cccaactgag 1260

cctccccagt agctggggct acaagcgcat gccaccacgc ctggctattt tttttttttt 1320

tttttttttt gagaaggagt ttcattcttg ttgcccaggc tggagtgcaa tggcacagtc 1380

tcagctcact gcagcctccg cctcctgggt tcaagcgatt ctcctgcctc agcctcccga 1440

gtagctggga ttataggcac ctgccaccat gcctggctaa tttttttgta tttttagtag 1500

ggatggggtt tcaccatgtt 1520

<210>28

<211>961

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR28

<400>28

aggaggttat tcctgagcaa atggccagcc tagtgaactg gataaatgcc catgtaagat 60

ctgtttaccc tgagaagggc atttcctaac tctccctata aaatgccaag tggagcaccc 120

cagatgaaat agctgatatg ctttctatac aagccatcta ggactggctt tatcatgacc 180

aggatattca cccactgaat atggctatta cccaagttat ggtaaatgct gtagttaagg 240

gggtcccttc cacatggaca ccccaggtta taaccagaaa gggttcccaa tctagactcc 300

aagagagggt tcttagacct catgcaagaa agaacttggg gcaagtacat aaagtgaaag 360

caagtttatt aagaaagtaa agaaacaaaa aaatggctac tccataagca aagttatttc 420

tcacttatat gattaataag agatggatta ttcatgagtt ttctgggaaa ggggtgggca 480

attcctggaa ctgagggttc ctcccacttt tagaccatat agggtatctt cctgatattg 540

ccatggcatt tgtaaactgt catggcactg atgggagtgt cttttagcat tctaatgcat 600

tataattagc atataatgag cagtgaggat gaccagaggt cacttctgtt gccatattgg 660

tttcagtggg gtttggttgg cttttttttt tttttaacca caacctgttt tttatttatt 720

tatttattta tttatttatt tatatttttt attttttttt agatggagtc ttgctctgtc 780

acccaggtta gagtgcagtg gcaccatctc ggctcactgc aagctctgcc tccttggttc 840

acgccattct gctgcctcag cctcccgagt agctgggact acaggtgcct gccaccatac 900

ccggctaatt ttttctattt ttcagtagag acggggtttc accgtgttag ccaggatggt 960

c 961

<210>29

<211>2233

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR29

<400>29

agcttggaca cttgctgatg ccactttgga tgttgaaggg ccgccctctc ccacaccgct 60

ggccactttt aaatatgtcc cctctgccca gaagggcccc agaggagggg ctggtgaggg 120

tgacaggagt tgactgctct cacagcaggg ggttccggag ggaccttttc tccccattgg 180

gcagcataga aggacctaga agggccccct ccaagcccag ctgggcgtgc agggccagcg 240

attcgatgcc ttcccctgac tcaggtggcg ctgtcctaaa ggtgtgtgtg ttttctgttc 300

gccagggggt ggcggataca gtggagcatc gtgcccgaag tgtctgagcc cgtggtaagt 360

ccctggaggg tgcacggtct cctccgactg tctccatcac gtcaggcctc acagcctgta 420

ggcaccgctc ggggaagcct ctggatgagg ccatgtggtc atccccctgg agtcctggcc 480

tggcctgaag aggaggggag gaggaggcca gcccctccct agccccaagg cctgcgaggc 540

tgcaagcccg gccccacatt ctagtccagg cttggctgtg caagaagcag attgcctggc 600

cctggccagg cttcccagct aggatgtggt atggcagggg tgggggacat tgaggggctg 660

ctgtagcccc cacaacctcc ccaggtaggg tggtgaacag taggctggac aagtggacct 720

gttcccatct gagattcaag agcccacctc tcggaggttg cagtgagccg agatccctcc 780

actgcactcc agcctgggca acagagcaag actctgtctc aaaaaaacag aacaacgaca 840

acaaaaaacc cacctctggc ccactgccta actttgtaaa taaagtttta ttggcacata 900

gacacaccca ttcatttaca tactgctgcg gctgcttttg cattaccctt gagtagacga 960

cagaccacgt ggccatggaa gccaaaaata tttactgtct ggccctttac agaagtctgc 1020

tctagaggga gaccccggcc catggggcag gaccactggg cgtgggcaga agggaggcct 1080

cggtgcctcc acgggcctag ttgggtatct cagtgcctgt ttcttgcatg gagcaccagg 1140

ggtcagggca agtacctgga ggaggcaggc tgttgcccgc ccagcactgg gacccaggag 1200

accttgagag gctcttaacg aatgggagac aagcaggacc agggctccca ttggctgggc 1260

ctcagtttcc ctgcctgtaa gtgagggagg gcagctgtga aggtgaactg tgaggcagag 1320

cctctgctca gccattgcag gggcggctct gccccactcc tgttgtgcac ccagagtgag 1380

gggcacgggg tgagatgtca ccatcagccc ataggggtgt cctcctggtg ccaggtcccc 1440

aagggatgtc ccatcccccc tggctgtgtg gggacagcag agtccctggg gctgggaggg 1500

ctccacactg ttttgtcagt ggtttttctg aactgttaaa tttcagtgga aaattctctt 1560

tcccctttta ctgaaggaac ctccaaagga agacctgact gtgtctgaga agttccagct 1620

ggtgctggac gtcgcccaga aagcccaggt actgccacgg gcgccggcca ggggtgtgtc 1680

tgcgccagcc atgggcacca gccaggggtg tgtctacgcc ggccaggggt aggtctccgc 1740

cggcctccgc tgctgcctgg ggagggccgt gcctgacact gcaggcccgg tttgtccgcg 1800

gtcagctgac ttgtagtcac cctgcccttg gatggtcgtt acagcaactc tggtggttgg 1860

ggaaggggcc tcctgattca gcctctgcgg acggtgcgcg agggtggagc tcccctccct 1920

ccccaccgcc cctggccagg gttgaacgcc cctgggaagg actcaggccc gggtctgctg 1980

ttgctgtgag cgtggccacc tctgccctag accagagctg ggccttcccc ggcctaggag 2040

cagccgggca ggaccacagg gctccgagtg acctcagggc tgcccgacct ggaggccctc 2100

ctggcgtcgc ggtgtgactg acagcccagg agcgggggct gttgtaattg ctgtttctcc 2160

ttcacacaga accttttcgg gaagatggct gacatcctgg agaagatcaa gaagtaagtc 2220

ccgcccccca ccc 2233

<210>30

<211>1851

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR30

<400>30

gggtgcattt ccacccaggg gacacttggc aatggtggga gacattgctt gttgtcacaa 60

ctgggcatgg gagtgctgct gcgtctagtg ggtagaggcc agagatgctc ctaatatcct 120

acaaggcaca gaacagcccc ccacaacaga gaattatcca gcctgaaaat gtccacagtg 180

ctgaggttgg gaaaccctat tctagagcca acaggctgtg aagcttgact catggttcca 240

tcaccaatag ctgcgtgacc ttggtgagtt ccttagctgc tctgtgcctc ggattcatgg 300

taggttttcc ttgttaggtt taaatgagtg aagttataca gagggcctga agtctcatgg 360

tattttacta gagcctcatt gtgttttagt tataattaga aattgggtaa ggtaaggaca 420

cagaagaagc catctgatct gggggcttca cacttagaag tgacctcgga gcaattgtat 480

tggggtggaa agggactaac agccaggagc agagggcaca ttggaattgg ggccagaggg 540

cacagactgc cttgtccatc aggcatagca atggacagag gaaggggaat gactagttat 600

ggctgcaagg ccaagtacag gggacttatt tctcatatct atctatctat ctacctaccg 660

tctatttatc tatcatctat ctacttattt atctatctat ttatgcatgt gtaccaaccg 720

aaagttttag taaatgcaca aactgcgata taatgaaaat ggaaattttc aaaagaagag 780

aaatcacctg ccacctgact accttaacaa atgagtggtt ttcatctctc cttccaggcc 840

tgtcattttt acagtgcttt agtcataaaa caggtcctct attctattgt tttatgtcac 900

atgaaattgt accataagca ttttccatga tgtgactcca ctgtttcatt ttccattttt 960

ttccagaatg aagataacct cattgttttt ttcctgattg taaaaatgct ctgtgctctt 1020

tttttttttt tttaacaatg caggcagtac caaaaagtat gaagaagaat gtaatagttc 1080

ccatttccca tctcactctt taaggccagc attttggtga acatccatcc gaacaaatct 1140

ccacgcgttt atcaatttgt tgacttactc cttcttttat gtaaatatga acatgattta 1200

actgccagtc catttggaac cttaaagtga aggtttttta ttgttggggt ttgctatggt 1260

ctgaatatgt gtgtcccccc aaaatttatg ttgaatccta acgcccaatg cgattaggag 1320

gtggggccat taggaggtga ttaagtcatg aagtcatcag ccctaatgaa tgggatttgt 1380

ggccttgaaa agggacccca gagagctgcc ttgccccttc tgccatgtaa ggacacagtg 1440

aggagctagg aagggggcct cagcagagac caaatgtgat ggtgcctcga tattggactt 1500

cccagcctcc agaatgtgag aaatgaattt ctgttgttta taagtcaccc agtctatagt 1560

attttgttct agcagcccaa acagactaag tcagggttgt tgttttagga agtggggaat 1620

ggggccatgc atgggtgtac gccagaacaa aggaagccag caagtcctga aagatactgg 1680

aaaagggaat agtgggcacg tgcagtgtgt tagtttcctg aggctgctat aacaaagcac 1740

cacaggttgg gtggcttaaa taacagaaat tcattctccc atcattctgg ggaccagacg 1800

tctgaaatca agactcctat gccatgctcc ttctgaaggc tccaggggag g 1851

<210>31

<211>1701

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR31

<220>

<221>misc_feature

<222>(159)..(1696)

<223>n is a, c, g, or t at various positions

<400>31

cacccgcctt ggccccccag agtgctggga ttacaagtgt aaaccaccat tcctggctag 60

atttaatttt ttaaaaaata aagagaagta ggaatagttc attttaggga gagcccctta 120

actgggacag gggcaggaca ggggtgaggc ttcccttant tcaagctcac ctcaaaccca 180

cccaggactg tgtgtcacat tctccaataa aggaaaggtt gctgcccccg cctgtgagtg 240

ctgcagtgga gggtagaggg ccgtgggcag agtgcttcat ggactgctca tcaagaaagg 300

cttcatgaca atcggcccag ctgctgtcat cccacattct acttccagct aggagaaggc 360

ggcttgccca cagtcaccca gccggcaagt gtcacccctg ggttggaccc agagctatga 420

tcctgcccag gggtccagct gagaatcagg cccacgttct aggcagaggg gctcacctac 480

tgggactcca gtagctgtag tgcatggagg catcatggct gcagcagcct ggacctggtc 540

tcacactggc tgtccctgtg ggcaggccat cctcaatgcc aggtcaggcc caagcatgta 600

tcccagacaa tgacaatggg gtggaatcct ctcttgtccc agaagccact cctcactgtt 660

ctacctgagg aaggcagggg catggtggaa tcctgaagcc tgctgtgagg gtctccagcg 720

aacttgcaca tggtcagccc tgccttctcc tccctgaact agattgagcg agagcaagaa 780

ggacattgaa ccagcaccca aagaattttg gggaacggcc tctcatccag gtcaggctca 840

cctccttttt aaaatttaat taattaatta attaattttt ttttagagac agagtcttac 900

tgtgtggccc aggctgtagt gcagtggcac aatcatagtt cactgcagcc tcaaactccc 960

cacctcagcc tctggattag ctgagactac aggtgcacca ccaccacacc cagctaatat 1020

ttttattttt gtagagagag ggtttcacca tcttgcccag gctggtctca aactcctggg 1080

ctcaagtgat cccgcccagg tctgaaagcc cccaggctgg cctcagactg tggggttttc 1140

catgcagcca cccgagggcg cccccaagcc agttcatctc ggagtccagg cctggccctg 1200

ggagacagag tgaaaccagt ggtttttatg aacttaactt agagtttaaa agatttctac 1260

tcgatcactt gtcaagatgc gccctctctg gggagaaggg aacgtgactg gattccctca 1320

ctgttgtatc ttgaataaac gctgctgctt catcctgtgg gggccgtggc cctgtccctg 1380

tgtgggtggg gcctcttcca tttccctgac ttagaaacca cagtccacct agaacagggt 1440

ttgagaggct tagtcagcac tgggtagcgt tttgactcca ttctcggctt tcttcttttt 1500

ctttccagga tttttgtgca gaaatggttc ttttgttgcc gtgttagtcc tccttggaag 1560

gcagctcaga aggcccgtga aatgtcgggg gacaggaccc ccagggaggg aaccccaggc 1620

tacgcacttt agggttcgtt ctccagggag ggcgacctga cccccgnatc cgtcggngcg 1680

cgnngnnacn aannnnttcc c 1701

<210>32

<211>771

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR32

<400>32

gatcacacag cttgtatgtg ggagctagga ttggaacccc agaagtctgg ccccaggttc 60

atgctctcac ccactgcata caatggcctc tcataaatca atccagtata aaacattaga 120

atctgcttta aaaccataga attagtagcg taagtaataa atgcagagac catgcagtga 180

atggcattcc tggaaaaagc ccccagaagg aattttaaat cagctttcgt ctaatcttga 240

gcagctagtt agcaaatatg agaatacagt tgttcccaga taatgcttta tgtctgacca 300

tcttaaactg gcgctgtttt tcaaaaactt aaaaacaaaa tccatgactc ttttaattat 360

aaaagtgata catgtctact tgggaggctg aggtggtggg aggatggctt gagtttgagg 420

ctgcagtatg ctactatcat gcctataaat agccgctgca ttccagcttg ggcaacatac 480

ccaggcccta tctcaaaaaa ataaaaagta atacatctac attgaagaaa attaatttta 540

ttgggttttt ttgcattttt attatacaca gcacacacag cacatatgaa aaaatgggta 600

tgaactcagg cattcaactg gaagaacagt actaaatcaa tgtccatgta gtcagcgtga 660

ctgaggttgg tttgtttttt cttttttctt ctcttctctt ctcttttctt tttttttgag 720

acggagcttt gctctttttg cccaggcttg attgcaatgg cgtgatctca g 771

<210>33

<211>1368

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR33

<400>33

gcttttatcc tccattcaca gctagcctgg cccccagagt acccaattct ccctaaaaaa 60

cggtcatgct gtatagatgt gtgtggcttg gtagtgctaa agtggccaca tacagagctc 120

tgacaccaaa cctcaggacc atgttcatgc cttctcactg agttctggct tgttcgtgac 180

acattatgac attatgatta tgatgacttg tgagagcctc agtcttctat agcactttta 240

gaatgcttta taaaaaccat ggggatgtca ttatattcta acctgttagc acttctgttc 300

gtattaccca tcacatccca acatcaattc tcatatatgc aggtacctct tgtcacgcgc 360

gtccatgtaa ggagaccaca aaacaggctt tgtttgagca acaaggtttt tatttcacct 420

gggtgcaggt gggctgagtc tgaaaagaga gtcagtgaag ggagacaggg gtgggtccac 480

tttataagat ttgggtaggt agtggaaaat tacaatcaaa gggggttgtt ctctggctgg 540

ccagggtggg ggtcacaagg tgctcagtgg gagagccttt gagccaggat gagccagaag 600

gaatttcaca aggtaatgtc atcagttaag gcagggactg gccattttca cttcttttgt 660

ggtggaatgt catcagttaa ggcaggaacc ggccattttc acttcttttg tgattcttca 720

cttgcttcag gccatctgga cgtataggtg caggtcacag tcacagggga taagatggca 780

atggcatagc ttgggctcag aggcctgaca cctctgagaa actaaagatt ataaaaatga 840

tggtcgcttc tattgcaaat ctgtgtttat tgtcaagagg cacttatttg tcaattaaga 900

acccagtggt agaatcgaat gtccgaatgt aaaacaaaat acaaaacctc tgtgtgtgtg 960

tgtgtgtgag tgtgtgtgta tgtgtgtgtg tgtgtattag agaggaaaag cctgtatttg 1020

gaggtgtgat tcttagattc taggttcttt cctgcccacc ccatatgcac ccaccccaca 1080

aaagaacaaa caacaaatcc caggacatct tagcgcaaca tttcagtttg catattttac 1140

atatttactt ttcttacata ttaaaaaact gaaaatttta tgaacacgct aagttagatt 1200

ttaaattaag tttgttttta cactgaaaat aatttaatat ttgtgaagaa tactaataca 1260

ttggtatatt tcattttctt aaaattctga acccctcttc ccttatttcc ttttgacccg 1320

attggtgtat tggtcatgtg actcatggat ttgccttaag gcaggagg 1368

<210>34

<211>755

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR34

<400>34

actgggcacc ctcctaggca ggggaatgtg agaactgccg ctgctctggg gctgggcgcc 60

atgtcacagc aggagggagg acggtgttac accacgtggg aaggactcag ggtggtcagc 120

cacaaagctg ctggtgatga ccaggggctt gtgtcttcac tctgcagccc taacacccag 180

gctgggttcg ctaggctcca tcctgggggt gcagaccctg agagtgatgc cagtgggagc 240

ctcccgcccc tccccttcct cgaaggccca ggggtcaaac agtgtagact cagaggcctg 300

agggcacatg tttatttagc agacaaggtg gggctccatc agcggggtgg cctggggagc 360

agctgcatgg gtggcactgt ggggagggtc tcccagctcc ctcaatggtg ttcgggctgg 420

tgcggcagct ggcggcaccc tggacagagg tggatatgag ggtgatgggt ggggaaatgg 480

gaggcacccg agatggggac agcagaataa agacagcagc agtgctgggg ggcaggggga 540

tgagcaaagg caggcccaag acccccagcc cactgcaccc tggcctccca caagccccct 600

cgcagccgcc cagccacact cactgtgcac tcagccgtcg atacactggt ctgttaggga 660

gaaagtccgt cagaacaggc agctgtgtgt gtgtgtgcgt gtatgagtgt gtgtgtgtga 720

tccctgactg ccaggtcctc tgcactgccc ctggg 755

<210>35

<211>1193

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR35

<220>

<221>misc_feature

<222>(312)..(1191)

<223>n is a, c, g, or t at various positions

<400>35

cgacttggtg atgcgggctc ttttttggtt ccatatgaac tttaaagtag tcttttccaa 60

ttctgtgaag aaagtcattg gtaggttgat ggggatggca ttgaatctgt aaattacctt 120

gggcagtatg gccattttca caatgttgat tcttcctatc catgatgatg gaatgttctt 180

ccattagttt gtatcctctt ttatttcctt gagcagtggt ttgtagttct ccttgaagag 240

gtccttcaca tcccttgtaa gttggattcc taggtatttt attctctttg aagcaaattg 300

tgaatgggag tncactcacg atttggctct ctgtttgtct gctgggtgta taaanaatgt 360

ngtgatnttn gtacattgat ttngtatccn tgagacttng ctgaatttgc ttnatcngct 420

tnngggaacc ttttgggctg aaacnatggg attttctaaa tatacaatca tgtcgtctgc 480

aaacagggaa caatttgact tcctcttttc ctaattgaat acactttatc tccttctcct 540

gcctaattgc cctgggcaaa acttccaaca ctatgntngn aataggagnt ggtgagagag 600

ggcatccctg ttcttgttgc cagnttttca aagggaatgc ttccagtttt ggcccattca 660

gtatgatatg ggctgtgggt ngtgtcataa atagctctta tnattttgaa atgtgtccca 720

tcaataccta atttattgaa agtttttagc atgaangcat ngttgaattt ggtcaaaggc 780

tttttctgca tctatggaaa taatcatgtg gtttttgtct ttggctcntg tttatatgct 840

ggatnacatt tattgatttg tgtatatnga acccagcctn ncatcccagg gatgaagccc 900

acttgatcca agcttggcgc gcngnctagc tcgaggcagg caaaagtatg caaagcatgc 960

atctcaatta gtcagcaccc atagtccgcc cctacctccg cccatccgcc cctaactcng 1020

nccgttcgcc cattctcgcc catggctgac taatnttttt annatccaag cggngccgcc 1080

ctgcttganc attcagagtn nagagnnttg gaggccnagc cttgcaaaac tccggacngn 1140

ttctnnggat tgaccccnnt taaatatttg gttttttgtn ttttcanngg nga 1193

<210>36

<211>1712

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR36

<400>36

gatcccatcc ttagcctcat cgatacctcc tgctcacctg tcagtgcctc tggagtgtgt 60

gtctagccca ggcccatccc ctggaactca ggggactcag gactagtggg catgtacact 120

tggcctcagg ggactcagga ttagtgagcc ccacatgtac acttggcctc agtggactca 180

ggactagtga gccccacatg tacacttggc ctcaggggac tcaggattag tgagccccca 240

catgtacact tggcctcagg ggactcagga ttagtgagcc ccacatgtac acttggcctc 300

aggggactca ggactagtga gccccacatg tacacttggc ctcaggggac tcagaactag 360

tgagccccac atgtacactt ggcttcaggg gactcaggat tagtgagccc cacatgtaca 420

cttggacacg tgaaccacat cgatgtgctg cagagctcag ccctctgcag atgaaatgtg 480

gtcatggcat tccttcacag tggcacccct cgttccctcc ccacctcatc tcccattctt 540

gtctgtcttc agcacctgcc atgtccagcc ggcagattcc accgcagcat cttctgcagc 600

acccccgacc acacacctcc ccagcgcctg cttggccctc cagcccagct cccgcctttc 660

ttccttgggg aagctccctg gacagacacc ccctcctccc agccatggct ttttcctgct 720

ctgccccacg cgggaccctg ccctggatgt gctacaatag acacatcaga tacagtcctt 780

cctcagcagc cggcagaccc agggtggact gctcggggcc tgcctgtgag gtcacacagg 840

tgtcgttaac ttgccatctc agcaactagt gaatatgggc agatgctacc ttccttccgg 900

ttccctggtg agaggtactg gtggatgtcc tgtgttgccg gccacctttt gtccctggat 960

gccatttatt tttttccaca aatatttccc aggtctcttc tgtgtgcaag gtattagggc 1020

tgcagcgggg gccaggccac agatctctgt cctgagaaga cttggattct agtgcaggag 1080

actgaagtgt atcacaccaa tcagtgtaaa ttgttaactg ccacaaggag aaaggccagg 1140

aaggagtggg gcatggtggt gttctagtgt tacaagaaga agccagggag ggcttcctgg 1200

atgaagtggc atctgacctg ggatctggag gaggagaaaa atgtcccaaa agagcagaga 1260

gcccacccta ggctctgcac caggaggcaa cttgctgggc ttatggaatt cagagggcaa 1320

gtgataagca gaaagtcctt gggggccaca attaggattt ctgtcttcta aagggcctct 1380

gccctctgct gtgtgacctt gggcaagtta cttcacctct agtgctttgg ttgcctcatc 1440

tgtaaagtgg tgaggataat gctatcacac tggttgagaa ttgaagtaat tattgctgca 1500

aagggcttat aagggtgtct aatactagta ctagtaggta cttcatgtgt cttgacaatt 1560

ttaatcatta ttattttgtc atcaccgtca ctcttccagg ggactaatgt ccctgctgtt 1620

ctgtccaaat taaacattgt ttatccctgt gggcatctgg cgaggtggct aggaaagcct 1680

ggagctgttt cctgttgacg tgccagacta gt 1712

<210>37

<211>1321

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR37

<400>37

aggatcacat ttaaggaagt gtgtggggtc cctggatgac accagcaccc agtgcggctc 60

tgtctggcaa ccgctcccaa ggtggcagga gtgggtgtcc cctgtgtgtc agtgggcagc 120

tcctgctgag cctacagctc actggggagc ctgacagcgg ggccatgtgc ctgacactcc 180

tctctgcttg tggacctggc aaggcaggga gcagaaaaca gagccacttg aaggctttct 240

gtctgcgtct gtgtgcagtg tggatttagt tgtgcttttt tcttgctggg agagcacagc 300

caccatttac aagcagtgtc accctcatgg gtggcgagga cagaacagga gcctctgctc 360

tctgtaccta tctgggcccg gtgggctccc ttgtcctggc ttccatctct gtctcagcga 420

ccattcagcc ctgcgcagga acacatgttg cttagaaaag ccaaattcag cccttgtctc 480

tgcctcctct ggtctcatga tgtgcatctg ttaccttgaa actggaaacc agtctatcaa 540

tgtctgtgcc aattttttat tccctcccca acctccttcc ccatacgact ttttatttat 600

gtaggatgtg tgctgtctaa tgatgggatg accacatttt tccatgttct aaaagtgctc 660

ctctcccgca gggtcccagg gctggtggtt gctttgggtc tacagctacg tcttacccgc 720

ctcctgcctc aacagcctgt gtggtggcaa agccggtgtg gggctgggga acgcagcgtt 780

ctccaggagg gggacccggc tctccttctg cagtgcaggc gaaggcctag atgccagtgt 840

gacctcccac aaggcgtggc ttccagactc cccggctgga agtgatgctt ttttgcctcc 900

ggccctgggt ttgaagcagc ctggctttct cttggtaagt ggctggtgtc ttagcagctg 960

caatctgagc tcagccacct acacaccacc gtggccgaca ctttcattaa aaagtttcct 1020

gagacgactt gcgtgcatgt tgacttcatg atcagcgccg ctgggaagaa cccctgagcc 1080

ggtggggtgg ggctggaagc agcaggtgca gtgatggggc tgggtgccca ggaggcctca 1140

gtgctcaatc aggccaaggt ggccaagccc aggctgcagg gaaggccggc ctgggggttg 1200

tgggtgagca caggcaggca ccagctgggc agtgttagga tgctggagca gcatccgtaa 1260

ccccactgag tggggtagtc tggttggggc agggaccgct gttgctttgg cagagagaga 1320

t 1321

<210>38

<211>1445

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR38

<220>

<221>misc_feature

<222>(348)..(949)

<223>n is a, c, g, or t at various positions

<400>38

gatctatggg agtagcttcc ttagtgagct ttcccttcaa atactttgca accaggtaga 60

gaattttgga gtgaaggttt tgttcttcgt ttcttcacaa tatggatatg catcttcttt 120

tgaaaatgtt aaagtaaatt acctctcttt tcagatactg tcttcatgcg aacttggtat 180

cctgtttcca tcccagcctt ctataaccca gtaacatctt ttttgaaacc agtgggtgag 240

aaagacacct ggtcaggaac gcggaccaca ggacaactca ggctcaccca cggcatcaga 300

ctaaaggcaa acaaggactc tgtataaagt accggtggca tgtgtatnag tggagatgca 360

gcctgtgctc tgcagacagg gagtcacaca gacacttttc tataatttct taagtgcttt 420

gaatgttcaa gtagaaagtc taacattaaa tttgattgaa caattgtata ttcatggaat 480

attttggaac ggaataccaa aaaatggcaa tagtggttct ttctggatgg aagacaaact 540

tttcttgttt aaaataaatt ttattttata tatttgaggt tgaccacatg accttaagga 600

tacatataga cagtaaactg gttactacag tgaagcaaat taacatatct accatcgtac 660

atagttacat ttttttgtgt gacaggaaca gctaaaatct acgtatttaa caaaaatcct 720

aaagacaata catttttatt aactatagcc ctcatgatgt acattagatc gtgtggttgt 780

ttcttccgtc cccgccacgc cttcctcctg ggatggggat tcattcccta gcaggtgtcg 840

gagaactggc gcccttgcag ggtaggtgcc ccggagcctg aggcgggnac tttaanatca 900

gacgcttggg ggccggctgg gaaaaactgg cggaaaatat tataactgna ctctcaatgc 960

cagctgttgt agaagctcct gggacaagcc gtggaagtcc cctcaggagg cttccgcgat 1020

gtcctaggtg gctgctccgc ccgccacggt catttccatt gactcacacg cgccgcctgg 1080

aggaggaggc tgcgctggac acgccggtgg cgcctttgcc tgggggagcg cagcctggag 1140

ctctggcggc agcgctggga gcggggcctc ggaggctggg cctggggacc caaggttggg 1200

cggggcgcag gaggtgggct cagggttctc cagagaatcc ccatgagctg acccgcaggg 1260

cggccgggcc agtaggcacc gggcccccgc ggtgacctgc ggacccgaag ctggagcagc 1320

cactgcaaat gctgcgctga ccccaaatgc tgtgtccttt aaatgtttta attaagaata 1380

attaataggt ccgggtgtgg aggctcaagc cttaatcccc agcacctggc gaggccgagg 1440

aggga 1445

<210>39

<211>2331

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR39

<400>39

gtgaaataga tcactaaagc tgattcctct tgtctaaatg aaactttcta ccctttgatg 60

gacagctatg ctttccccat cctctcccgt cccccagccc ttggtaacca tcatcctact 120

ctctacttgt aggagttcaa cttgtttaga ttttgtgagt gagaacatgt ggtatttgcc 180

tttagagtcc tctaggttta tccatattgt gttaaatgac aggattccct gcctttttaa 240

ggctgaatag tatttcattg taatatatat acatacacac acacatatac acacacatat 300

atatacatat atacatatat gtacatagat acatatatat gtacatatat acacacacat 360

atacacacat atatacacat atatacatat acatatatac acatatatgt acatatatat 420

aacttttttt catttatcca ttcacttaat acatatgatg gagggcttta tatatgccag 480

gctctgtgat gaatgctgga aattcaatag tgagaaagac tcagtctctg cctccaaaga 540

gcatcatggg ctaggtgctg caacgaggaa ttgccaactg ttgtcatgag agcacagaga 600

agggactcaa ccagccttga agaatcaggg gaggcttcta agctaatggt gtgtgcctgg 660

ggatcacatt gtttcaagca gcagtaacag gatgtgctca ggtccagatg tgagagagag 720

agagagcata tgtcttcaag aaactaacag tagctcccta tagctgaagc aggagtacaa 780

aatagtgagt ttaagtgatg aggcaagaga tatgaagaag cttgaccatg cagctacacc 840

gggcagcatg ccctctgaga catctcatgg aagccggaaa tgggagtgcc ttgataccaa 900

gccagagaaa ttataatact aagtagatag actgagcagc actcctcctg ggaagaatga 960

gacaagccct gaatttggag gtaagttgtg gattggtgat tagaggagag gtaacaggca 1020

ccaaagcaag aaatagtatt gatgcaaagc tgaggttaat tggatgacaa aatgaagagc 1080

ataaggggct cagacacaga ctgagcagaa aacgagtagc atctgaacct agattgagtt 1140

actaatggat gagaaagagt tcttaaagtt gatgaccacg ggatccatat ataagaatgt 1200

ccaatctccc caaattgatc cacgagttca gtgcaatgcc aatcaaaatc ccactaacaa 1260

gtttatttta aaatgtaaat gaaaatacaa aatttttaaa aagcaaagca atattgaaaa 1320

cccaggaaaa attaggagga cttacacaac ctgatctcaa aacttaccat tatcaagaca 1380

gagtgttatt gacacaagga gagacaaata gataaacgga atgtggtagt ctggagatgc 1440

acccacatgt atgtggtcaa ttgatttttg gccaaggcac caagtcaatt caaaggagca 1500

aggaaagtag tacagaaaca accaaatatt gttttggaaa ataatgacaa agggcttata 1560

accagaatat aagcatataa atataattct ttcaaatcaa taataagaag gcaaatatct 1620

aataaaaatg agcaaagact tgaaaagtca cttaaaaagg cttattaatt agaaatatgc 1680

aaatgttatt agtcttcagt ggaatttaca ttaaaccaca agggatacta ttatatctta 1740

tgcccactag aataaccaaa ggaaaaaaga cagacaaaac aaaatgctgg tgaggatgtg 1800

aagcaactgg aactctcata cattattggt ggtaatgtaa aatttataca accattatga 1860

ataaaggttt ggcagtttct tacaaagttg aatgcacttc tccacgatga ctaggctttt 1920

cactcatagg cgtctggctc cctagaactg aaaacatatg ttcacaagaa gacttgcaaa 1980

tatatattct cccacgtcag gagatatttg ctatgcattt aactgacata agattagtgc 2040

tagagtttat aatgaggttc ttcaaatcta aaagaaaatg caaagcatat aatagtaagg 2100

ggtgcaggcc aggcgcagtg gctcactctg taatcccagc actttgggag gccgaggtgg 2160

gcggatcaca aggtcaggag ttcgagacca acctggccaa catagtgaaa ccctgtctct 2220

actaaaaata caaaaactag ccaggtgcgg tgtcatgcac ctgtagtccc agctactcgg 2280

gaggccgagg caggagaatc acttgaacct gggaggtgga ggttgcagtg a 2331

<210>40

<211>1071

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR40

<400>40

gctgtgattc aaactgtcag cgagataagg cagcagatca agaaagcact ccgggctcca 60

gaaggagcct tccaggccag ctttgagcat aagctgctga tgagcagtga gtgtcttgag 120

tagtgttcag ggcagcatgt taccattcat gcttgacttc tagccagtgt gacgagaggc 180

tggagtcagg tctctagaga gttgagcagc tccagcctta gatctcccag tcttatgcgg 240

tgtgcccatt cgctttgtgt ctgcagtccc ctggccacac ccagtaacag ttctgggatc 300

tatgggagta gcttccttag tgagctttcc cttcaaatac tttgcaacca ggtagagaat 360

tttggagtga aggttttgtt cttcgtttct tcacaatatg gatatgcatc ttcttttgaa 420

aatgttaaag taaattacct ctcttttcag atactgtctt catgcgaact tggtatcctg 480

tttccatccc agccttctat aacccagtaa catctttttt gaaaccagtg ggtgagaaag 540

acacctggtc aggaacgcgg accacaggac aactcaggct cacccacggc atcagactaa 600

aggcaaacaa ggactctgta taaagtaccg gtggcatgtg tattagtgga gatgcagcct 660

gtgctctgca gacagggagt cacacagaca cttttctata atttcttaag tgctttgaat 720

gttcaagtag aaagtctaac attaaatttg attgaacaat tgtatattca tggaatattt 780

tggaacggaa taccaaaaaa tggcaatagt ggttctttct ggatggaaga caaacttttc 840

ttgtttaaaa taaattttat tttatatatt tgaggttgac cacatgacct taaggataca 900

tatagacagt aaactggtta ctacagtgaa gcaaattaac atatctacca tcgtacatag 960

ttacattttt ttgtgtgaca ggaacagcta aaatctacgt atttaacaaa aatcctaaag 1020

acaatacatt tttattaact atagccctca tgatgtacat tagatctcta a 1071

<210>41

<211>1135

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR41

<400>41

cgtgtgcagt ccacggagag tgtgttctcc tcatcctcgt tccggtggtt gtggcgggaa 60

acgtggcgct gcaggacacc aacatcagtc acgtatttca ttctggaaaa aaaagtagca 120

caagcctcgg ctggttccct ccagctctta ccaggcagcc taagcctagg ctccattccc 180

gctcaaggcc ttcctcaggg gcctgctcac cacaggagct gttcccatgc agggactaag 240

gacatgcagc ctgcatagaa accaagcacc caggaaaaca tgattggatg gagcgggggg 300

gtgtggtctc tagccttgtc cacctccggt cctcatgggt ctcacacctc ctgagaatgg 360

gcaccgcaga ggccacagcc catacagcca agatgacaga ctccgtaagt gacagggatc 420

cacagcagag tgggtgaaat gttccctata aactttacaa aattaatgag ggcaggggga 480

ggggagaaat gaaaatgaac ccagctcgca gcacatcagc atcagtcact aggtcggcgt 540

gctctctgac tgcttcctcg tagctgcttg gtgtctcatt gcctcagaag catgtagacc 600

ctgtcacaag attgtagttc ccctaactgc tccgtagatc acaacttgaa ccttaggaaa 660

tgctgttttc cctttgagat attcctttgg gtcctgtata ctgatggagc tactgactga 720

gctgctccga aggaccccac gaggagctga ctaaaccaag agtgcagttt gtacaccctg 780

atgattacat cccccttgcc ccaccaatca actctcccaa ttttccagcc cctcaccctc 840

cagtcccctt aaaagcccca gcccaggccg ggcacagtgg ctcatgcctg taatcccagc 900

actttgggag gccaaggtgg gcagatcacc tgagggcagg aatttgagac cagcctgacc 960

aacatgaaga aaccccgtct ctattacaaa tacaaaatta gccgggcgtg ttgctgcata 1020

ctggtaatcc cagctacttg ggagggtgag gcaggagaat cacttgaatc tgggaggcgg 1080

aggttgcgat gagccgagac agcgccattg cactgcagcc tgggcaacaa gagca 1135

<210>42

<211>735

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR42

<400>42

aagggtgaga tcactaggga gggaggaagg agctataaaa gaaagaggtc actcatcaca 60

tcttacacac tttttaaaac cttggttttt taatgtccgt gttcctcatt agcagtaagc 120

cctgtggaag caggagtctt tctcattgac caccatgaca agaccctatt tatgaaacat 180

aatagacaca caaatgttta tcggatattt attgaaatat aggaattttt cccctcacac 240

ctcatgacca cattctggta cattgtatga atgaatatac cataatttta cctatggctg 300

tatatttagg tcttttcgtg caggctataa aaatatgtat gggccggtca cagtgactta 360

cgcccgtagt cccagaactt tgggaggccg aggcgggtgg atcacctgag gtcgggagtt 420

caaaaccagc ctgaccaaca tggagaaacc ccgtctctgc taaaaataca aaaattaact 480

ggacacggtg gcgtatgcct gtaatcccag ctactcggga agctgaggca ggagaactgc 540

ttgaacccag gaggcggagg ttgtggtgag tcgagattgc gccattgcac tccagcctgg 600

gcaacaagag cgaaattcca tctcaaaaaa aagaaaaaag tatgactgta tttagagtag 660

tatgtggatt tgaaaaatta ataagtgttg ccaacttacc ttagggttta taccatttat 720

gagggtgtcg gtttc 735

<210>43

<211>1227

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR43

<400>43

caaatagatc tacacaaaac aagataatgt ctgcccattt ttccaaagat aatgtggtga 60

agtgggtaga gagaaatgca tccattctcc ccacccaacc tctgctaaat tgtccatgtc 120

acagtactga gaccaggggg cttattccca gcgggcagaa tgtgcaccaa gcacctcttg 180

tctcaatttg cagtctaggc cctgctattt gatggtgtga aggcttgcac ctggcatgga 240

aggtccgttt tgtacttctt gctttagcag ttcaaagagc agggagagct gcgagggcct 300

ctgcagcttc agatggatgt ggtcagcttg ttggaggcgc cttctgtggt ccattatctc 360

cagcccccct gcggtgttgc tgtttgcttg gcttgtctgg ctctccatgc cttgttggct 420

ccaaaatgtc atcatgctgc accccaggaa gaatgtgcag gcccatctct tttatgtgct 480

ttgggctatt ttgattcccc gttgggtata ttccctaggt aagacccaga agacacagga 540

ggtagttgct ttgggagagt ttggacctat gggtatgagg taatagacac agtatcttct 600

ctttcatttg gtgagactgt tagctctggc cgcggactga attccacaca gctcacttgg 660

gaaaacttta ttccaaaaca tagtcacatt gaacattgtg gagaatgagg gacagagaag 720

aggccctaga tttgtacatc tgggtgttat gtctataaat agaatgcttt ggtggtcaac 780

tagacttgtt catgttgaca tttagtcttg ccttttcggt ggtgatttaa aaattatgta 840

tatcttgttt ggaatatagt ggagctatgg tgtggcattt tcatctggct ttttgtttag 900

ctcagcccgt cctgttatgg gcagccttga agctcagtag ctaatgaaga ggtatcctca 960

ctccctccag agagcggtcc cctcacggct cattgagagt ttgtcagcac cttgaaatga 1020

gtttaaactt gtttattttt aaaacattct tggttatgaa tgtgcctata ttgaattact 1080

gaacaacctt atggttgtga agaattgatt tggtgctaag gtgtataaat ttcaggacca 1140

gtgtctctga agagttcatt tagcatgaag tcagcctgtg gcaggttggg tggagccagg 1200

gaacaatgga gaagctttca tgggtgg 1227

<210>44

<211>1586

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR44

<400>44

cacctgcctc agcctcccaa agtgctgaga ttcaaagaaa ttttcatgga gaggggacag 60

atggagtcaa ttcttgtggg gtgaacatga gtaccacagt tagactgagg ttgggaaaga 120

ttttccagac aattggaaga gcatgtgaaa gacacagatt ttgagaaatg ttaagtctag 180

ggaactgcaa ggcttttggc acaagaaagc cactgtagac tatagaggca ggatgcctag 240

attcaaatcc caactgctac acttctaagc tttgtaattt tggcaagttt ttaccctcta 300

ttttcttatc tataaaatat agattttata tatatagata tagatatata gatagataat 360

aattgtgcat gcctaataaa gttgtcaaag attaaatgtt atatgtgaag tattttgtac 420

ggtgatagga acccaggaag ggctctatga atattatgta ttattattat tctaaagtag 480

ctggaataca atgttcaaag gagatagtgg caggagataa gtttgaattg aaagattgag 540

gccagaacat aaagtgcctc ctatattata ttttacataa ttggaacatc attgaaaaat 600

ttaagtatta tttatgtgtg tatgtgtgtt ttatataatt aattctagtt catcatttta 660

aaatatcttt ctgatgtcac tgtgaacaac agatgagaag aagtgaatcc tgagttaagg 720

agaccagctc tctgattact gccataatcc agggagggta ccataaggat ttcaactgga 780

agtgaatcca tcatgatgga gaggaaggac agggctgaaa aatacttagg aagtagtatc 840

agtaggactg gttaagagag agcagaggca ggctacaggg gttggaggtg tcaatcacag 900

agatagggaa aatgggagga gaagcaggct ttgaaaaagt ggcttgtctt gtaaaattat 960

gtgctgttaa aacagtacaa gaaattaata tattcaatcc caaaatacag ggacaattct 1020

ttttgaaaga gttacccaga tagtcttcct tgaagttttc agttaaagaa atttcttgtt 1080

aacaaataat gtagtcatag aagaaaacac ttaaaacttt attgaataaa gctaataaat 1140

catttaatat aatttatagg aaattgttac ataacacaca cattcaatac tttttgctaa 1200

agtataaatt aatggaagga gagcacgcac acagaggttg aattatgttt atgactttat 1260

tagtcaagaa tacaaaattg agtagctaca tcaagcagaa gcacatgctt tacaatccag 1320

cacagaatcc cttgacatcc aaactcccga aacagacatg taaatacaga tgacattgtc 1380

agaacaaaat agggtctcac ccgacctata atgttctttt cttgatataa atatgcacat 1440

gaattgcata cggtcatatg gttccaatta ccattatttc ctctgggctt agctatccat 1500

ctaaggggaa tttacaccaa cactgtactt ctacttgcaa gaatatatga aagcatagtt 1560

aacttctggc ttaggacccc aactca 1586

<210>45

<211>1981

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR45

<400>45

atggatcata gggtaaataa atttataatt tcttgagaaa gcttcgtact gttttccaag 60

atggctgtac taatttccat tcctaccaac agtgtacagg gtttcttttt ctccacatcc 120

tcaccaacac ttatcttcca tcttttttta taatagccct agtaaaatgt gtgaggtgat 180

atctcattgt ggcattgatt tgcacttctc tgataattag gaatgtttat gattttttca 240

tgtacctggt tggccttttg tatgatgtag gaaatgtcta ttctgattct ttgcttattt 300

tttaataagc atagtttttt tcttattttt gagtaggttg agttgcttat atattattat 360

atgagcccct tacctgatgt atggtttaaa aatattatcc catttgtggg ttctcttaat 420

tctatcattg cttcttttcc tgtggaaaag ttttaagttt tatgcagtct catttgtgtg 480

ttttgctttt gttgcctttt ggaataatct acagaaaatc atagctcagg ccaatgtcat 540

acagtctcct tctatatttc cttgtagtag ttttacattt aaactttaat tttgatttga 600

tgcttgtata aagagcaaaa taaaagtcaa attttattct tctgtatgtg gatagtcagt 660

tttgtctaca ccatttattg aaaataattt tctttcttca ctgtgtattt ttagttattt 720

tatcaaaaaa tcaattgacc acagacacac ggatttattt acaggttcta tatccctttg 780

tactgtttta catgtctgtt tttatgccat tgctatgctg ttttaattcc tatagctttg 840

taatagagtt tggagtcagg tagtctgatg cctccagctt tgttcttttt gttcaagatt 900

gctttggttg gtccaggtct tttgtggttc catacaaatt ttagcagtaa tttttctatt 960

tctgtgaaga atgacattgg aatttgatag tggttgcatt taatctgtag attgctttgg 1020

gtagcattga cacttttaca atactaattt ttgaatccat caatgaagga tgtttctcca 1080

tttatttatg ccattttaat ttttttcatc aatgtgctat agttttcagt atgtaaatct 1140

tttatggttt tgattaaatt tactcctgtc ttttatatat ttatatatct gttttgattc 1200

tattataaat tgaattgcct ttatttttca ggtaatagtt tgtcattagt taatagaaac 1260

aataatgata tttgtatgtt gattttgtaa ctattaactt tattgaattt cttcatcagc 1320

tataaccatt tattttggtg gaatctttaa gattttctct atcttaagat tatattttca 1380

aaaaacagaa acaatcttac ctcttccttc cctatgtgga tttcttttac gtctttgtct 1440

tgtgtaactg ttctggctag gcaattacac ataatgtttt catcatttat aattttacat 1500

cacatccatc tattgtggca cattgattgc tacttttcaa gttgtaaacc tggacattta 1560

tcactactct tcctccaata caggagtcca tggcgtggtg tgggccctac tgtgccacag 1620

tccagggcac ggctgggctg aggttctctt gtgcaagagt ccgtggctct gcggagcaag 1680

agttctccag tgccttagtc cagggttagg caggggtggg gctccttcag tagcttagtc 1740

cagtgcgccg ccctgcgagg gtcctcctga gcaggagtac acgatgaggc agggtcctac 1800

tgtgccttag cccaggaagc ggggggctgg gtcctctggt gccatagtcc aggctgccgg 1860

gagctgggtc ctctggtgcc atagctcagg ccggcgggag ctgggtcctc tggtgccgta 1920

gtccagggtg cagcagaaca ggagtcctgc ggagcagtag tccagggcac gctggggcgt 1980

g 1981

<210>46

<211>1859

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR46

<400>46

attgtttttc tcgcccttct gcattttctg caaattctgt tgaatcattg cagttactta 60

ggtttgcttc gtctccccca ttacaaacta cttactgggt ttttcaaccc tagttccctc 120

atttttatga tttatgctca tttctttgta cacttcgtct tgctccatct cccaactcat 180

ggcccctggc tttggattat tgttttggtc ttttattttt tgtcttcttc tacctcaaca 240

cttatcttcc tctcccagtc tccggtaccc tatcaccaag gttgtcatta acctttcata 300

ttattcctca ttatccatgt attcatttgc aaataagcgt atattaacaa aatcacaggt 360

ttatggagat ataattcaca taccttaaaa ttcaggcttt taaagtgtac ctttcatgtg 420

gtttttggta tattcacaaa gttatgcatt gatcaccacc atctgattcc ataacatgtt 480

caatacctca aaaagaagtc tgtactcatt agtagtcatt tcacattcac cactccctct 540

ggctctgggc agtcactgat ctttgtgtct ctatggattt gcctagtcta ggtattttta 600

tgtaaatggc atcatacaac atgtgacctt ttgtttggct tttttcattt agcaaaatgt 660

tatcaaggtc tgtccctgtt gtagcatgta ttagcacttc atttcttata tgctgaatga 720

tatactttat ttgtccatca gttgttcatg ctttatttgt ccatcagttg atgaacattt 780

gcgtttttgc cactttgggc tattaagaat aatgctactg tgaacaagtg tgtacaagtt 840

cctctacaaa tttttgtgtg gacatatcct ttcagttctc tcaggtgtat atctgggaat 900

tgaattgctg ggtcgtgtag tagctatgtt aaacactttg agaaactgct ataatgttct 960

ccagagctgt accattttaa attctgtgta tgaggattcc acgttctcca cttcctcacc 1020

agtgtatgga tttgggggta tactttttaa aaagtgggat taggctgggc acagtggctc 1080

acacctgtaa tcccaacact tcaggaagct gaggtgggag gatcacttga gcctagtagt 1140

ttgagaccag cctgggcaac atagggagac cctgtctcta caaaaaataa tttaaaataa 1200

attagctggg cgttgtggca cacacctgta gtcccagcta catgggaggc tgaggtggaa 1260

ggattccctg agcccagaag tttgaggttg cagtgagcca tgatggcagc actatactgt 1320

agcctgggtg tcagagcaag actccgtttc agggaagaaa aaaaaaagtg ggatgatatt 1380

tttgacactt ttcttcttgt tttcttaatt tcatacttct ggaaattcca ttaaattagc 1440

tggtaccact ctaactcatt gtgtttcatg gctgcatagt aatattgcat aatataaata 1500

taccattcat tcatcaaagt tagcagatat tgactgttag gtgccaggca ctgctctaag 1560

cgttaaagaa aaacacacaa aaacttttgc attcttagag tttattttcc aatggagggg 1620

gtggagggag gtaagaattt aggaaataaa ttaattacat atatagcata gggtttcacc 1680

agtgagtgca gcttgaatcg ttggcagctt tcttagtagt ataaatacag tactaaagat 1740

gaaattactc taaatggtgt tacttaaatt actggaatag gtattactat tagtcacttt 1800

gcaggtgaaa gtggaaacac catcgtaaaa tgtaaaatag gaaacagctg gttaatgtt 1859

<210>47

<211>1082

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR47

<400>47

atcattagtc attagggaaa tgcaaatgaa aaacacaagc agccaccaat atacacctac 60

taggatgatt taaaggaaaa taagtgtgaa gaaggacgta aagaaattgt aaccctgata 120

cattgatggt agaaatggat aaagttgcag ccactgtgaa aaacagtctg cagtggctca 180

gaaggttaaa tatagaaccc ctgttggacc caggaactct actcttaggc accccaaaga 240

atagagaaca gaaatcaaac agatgtttgt atactaatgt ttgtagcatc acttttcaca 300

ggagccaaaa ggtggaaata atccaaccat cagtgaacaa atgaatgtaa taaaagcaag 360

gtggtctgca tgcaatgcta catcatccat ctgtaaaaaa cgaacatcat tttgatagat 420

gatacaacat gggtggacat tgagaacatt atgcttagtg aaataagcca gacacaaaag 480

gaatatattg tataattgta attacatgaa gtgcctagaa tagtcaaatt catacaagag 540

aaagtgggat aggaatcacc atgggctgga aataggggga aggtgctata ctgcttattg 600

tggacaaggt ttcgtaagaa atcatcaaaa ttgtgggtgt agatagtggt gttggttatg 660

caaccctgtg aatatattga atgccatgga gtgcacactt tggttaaaag gttcaaatga 720

taaatattgt gttatatata tttccccacg atagaaaaca cgcacagcca agcccacatg 780

ccagtcttgt tagctgcctt cctttacctt caagagtggg ctgaagcttg tccaatcttt 840

caaggttgct gaagactgta tgatggaagt catctgcatt gggaaagaaa ttaatggaga 900

gaggagaaaa cttgagaatc cacactactc accctgcagg gccaagaact ctgtctccca 960

tgctttgctg tcctgtctca gtatttcctg tgaccacctc ctttttcaac tgaagacttt 1020

gtacctgaag gggttcccag gtttttcacc tcggcccttg tcaggactga tcctctcaac 1080

ta 1082

<210>48

<211>1242

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR48

<400>48

atcatgtatt tgttttctga attaattctt agatacatta atgttttatg ttaccatgaa 60

tgtgatatta taatataata tttttaattg gttgctactg tttataagaa tttcattttc 120

tgtttacttt gccttcatat ctgaaaacct tgctgatttg attagtgcat ccacaaattt 180

tcttggattt tctatgggta attacaaatc tccacacaat gaggttgcag tgagccaaga 240

tcacaccact gtactccagc ctgggcgaca gagtgagaca ccatctcaca aaaacacata 300

aacaaacaaa cagaaactcc acacaatgac aacgtatgtg ctttcttttt ttcttcctct 360

ttctataata tttctttgtc ctatcttaac tgaactggcc agaaacccca ggacaatgat 420

aaatacgagc agtgtcaaca gacatctcat tccctttcct agcttttata aaaataacga 480

ttatgcttca acattacata tggtggtgtc gatggttttg ttatagataa gcttatcagg 540

ttaagaaatt tgtctgcgtt tcctagtttg gtataaagat tttaatataa atgaatgttg 600

tattttatca tcttattttt ttcctacatc tgctaaggta atcctgtgtt ttcccctttt 660

caatctccta atgtggtgaa tgacattaaa ataccttcta ttgttaaaat attcttgcaa 720

cgctgtatag aaccaatgcc tttattctgt attgctgatg gatttttgaa aaatatgtag 780

gtggacttag ttttctaagg ggaatagaat ttctaatata tttaaaatat tttgcatgta 840

tgttctgaag gacattggtg tgtcatttct ataccatctg gctactagag gagccgactg 900

aaagtcacac tgccggagga ggggagaggt gctcttccgt ttctggtgtc tgtagccatc 960

tccagtggta gctgcagtga taataatgct gcagtgccga cagttctgga aggagcaaca 1020

acagtgattt cagcagcagc agtattgcgg gatccccacg atggagcaag ggaaataatt 1080

ctggaagcaa tgacaatatc agctgtggct atagcagctg agatgtgagt tctcacggtg 1140

gcagcttcaa ggacagtagt gatggtccaa tggcgcccag acctagaaat gcacatttcc 1200

tcagcaccgg ctccagatgc tgagcttgga cagctgacgc ct 1242

<210>49

<211>1015

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR49

<400>49

aaaccagaaa cccaaaacaa tgggagtgac atgctaaaac cagaaaccca aaacaatggg 60

agggtcctgc taaaccagaa acccaaaaca atgggagtga agtgctaaaa ccagaaaccc 120

aaaacaatgg gagtgtcctg ctacaccaga aacccaaaac gatgggagtg acgtgataaa 180

accagacacc caaaacaatg ggagtgacgt gctaaaccag aaacccaaaa caatgggagt 240

gacgtgctaa aacctggaaa cctaaaacaa tgcgagtgag gtgctaacac cagaatccat 300

aacaatgtga gtgacgtgct aaaccagaac ccaaaacaat gggagtgacg tgctaaaaca 360

ggaacccaaa acaatgagag tgacgtgcta aaccagaaac ccaaaacaat gggaatgacg 420

tgctaaaacc ggaacccaaa acaatgggag tgatgtgcta aaccagaaac ccaaaacaat 480

gggaatgaca tgctaaaact ggaacccaaa acaatggtaa ctaagagtga tgctaaggcc 540

ctacattttg gtcacactct caactaagtg agaacttgac tgaaaaggag gatttttttt 600

tctaagacag agttttggtc tgtcccccag agtggagtgc agtggcatga tctcggctca 660

ctgcaagctc tgcctcccgg gttcaggcca ttctcctgcc tcagcctcct gagtagctgg 720

gaatacaggc acccgccacc acacttggct aattttttgt atttttagta gagatggggt 780

ttcaccatat tagcaaggat ggtctcaatc tcctgacctc gtgatctgcc cacctcaggc 840

tcccaaagtg ctgggattac aggtgtgagc caccacaccc agcaaaaagg aggaattttt 900

aaagcaaaat tatgggaggc cattgttttg aactaagctc atgcaatagg tcccaacaga 960

ccaaaccaaa ccaaaccaaa atggagtcac tcatgctaaa tgtagcataa tcaaa 1015

<210>50

<211>2355

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR50

<400>50

caaccatcgt tccgcaagag cggcttgttt attaaacatg aaatgaggga aaagcctagt 60

agctccattg gattgggaag aatggcaaag agagacaggc gtcattttct agaaagcaat 120

cttcacacct gttggtcctc acccattgaa tgtcctcacc caatctccaa cacagaaatg 180

agtgactgtg tgtgcacatg cgtgtgcatg tgtgaaagta tgagtgtgaa tgtgtctata 240

tgggaacata tatgtgattg tatgtgtgta actatgtgtg actggcagcg tggggagtgc 300

tggttggagt gtggtgtgat gtgagtatgc atgagtggct gtgtgtatga ctgtggcggg 360

aggcggaagg ggagaagcag caggctcagg tgtcgccaga gaggctggga ggaaactata 420

aacctgggca atttcctcct catcagcgag cctttcttgg gcaatagggg cagagctcaa 480

agttcacaga gatagtgcct gggaggcatg aggcaaggcg gaagtactgc gaggaggggc 540

agagggtctg acacttgagg ggttctaatg ggaaaggaaa gacccacact gaattccact 600

tagccccaga ccctgggccc agcggtgccg gcttccaacc ataccaacca tttccaagtg 660

ttgccggcag aagttaacct ctcttagcct cagtttcccc acctgtaaaa tggcagaagt 720

aaccaagctt accttcccgg cagtgtgtga ggatgaaaag agctatgtac gtgatgcact 780

tagaagaagg tctagggtgt gagtggtact cgtctggtgg gtgtggagaa gacattctag 840

gcaatgagga ctggggagag cctggcccat ggcttccact cagcaaggtc agtctcttgt 900

cctctgcact cccagccttc cagagaggac cttcccaacc agcactcccc acgctgccag 960

tcacacatag ttacacacat acaatcacat atatgttccc atatagacac attcacactc 1020

ataccttcac acatgcacac gcatgtgcac acacagtcac tcatttctgt gttggagatt 1080

gggtgaggac attcaatggg tgaggaccaa caggtgtgaa gattgctttc tagaaaatga 1140

ctcctgtctc tctttgccat tcttcccaat ccgatggagc tactaggctt ttccctcatt 1200

tcatgtttaa taaaccttcc caatggcgaa atgggctttc tcaagaagtg gtgagtgtcc 1260

catccctgcg gtggggacag gggtggcagc ggacaagcct gcctggaggg aactgtcagg 1320

ctgattccca gtccaactcc agcttccaac acctcatcct ccaggcagtc ttcattcttg 1380

gctctaattt cgctcttgtt ttctttttta tttttatcga gaactgggtg gagagctttt 1440

ggtgtcattg gggattgctt tgaaaccctt ctctgcctca cactgggagc tggcttgagt 1500

caactggtct ccatggaatt tcttttttta gtgtgtaaac agctaagttt taggcagctg 1560

ttgtgccgtc cagggtggaa agcagcctgt tgatgtggaa ctgcttggct cagatttctt 1620

gggcaaacag atgccgtgtc tctcaactca ccaattaaga agcccagaaa atgtggcttg 1680

gagaccacat gtctggttat gtctagtaat tcagatggct tcacctggga agccctttct 1740

gaatgtcaaa gccatgagat aaaggacata tatatagtag ctagggtggt ccacttctta 1800

ggggccatct ccggaggtgg tgagcactaa gtgccaggaa gagaggaaac tctgttttgg 1860

agccaaagca taaaaaaacc ttagccacaa accactgaac atttgttttg tgcaggttct 1920

gagtccaggg agggcttctg aggagagggg cagctggagc tggtaggagt tatgtgagat 1980

ggagcaaggg ccctttaaga ggtgggagca gcatgagcaa aggcagagag gtggtaatgt 2040

ataaggtatg tcatgggaaa gagtttggct ggaacagagt ttacagaata gaaaaattca 2100

acactattaa ttgagcctct actacgtgct cgacattgtt ctagtcactg agataggttt 2160

ggtatacaaa acaaaatcca tcctctatgg acattttagt gactaacaac aatataaata 2220

ataaaagtga acaaaagctc aaaacatgcc aggcactatt atttatttat ttatttattt 2280

atttatttat tttttgaaac agagtctcgc tctgttgccc aggctggagt gtagtggtgc 2340

gatctcggct cactg 2355

<210>51

<211>2289

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR51

<400>51

tcacaggtga caccaatccc ctgaccacgc tttgagaagc actgtactag attgactttc 60

taatgtcagt cttcattttc tagctctgtt acagccatgg tctccatatt atctagtaca 120

acacacatac aaatatgtgt gatacagtat gaatataata taaaaatatg tgttataata 180

taaatataat attaaaatat gtctttatac tagataataa tacttaataa cgttgagtgt 240

ttaactgctc taagcacttt acctgcagga aacagttttt tttttatttt ggtgaaatac 300

aactaacata aatttattta caattttaag catttttaag tgtatagttt agtggagtta 360

atatattcaa aatgttgtgc agccgtcacc atcatcagtc ttcataactc ttttcatatt 420

gtaaaattaa aagtttatgc tcatttaaaa atgactccca atttcccccc tcctcaacct 480

ctggaaacta ccattctatt ttctgcctcc gtagttttgc ccactctaag tacctcacat 540

aagtggaatt tgtcttattt gcctgtttgt gaccggctga tttcatttag tataatgtcc 600

tcaagtttta ttcacgttat atagcatatg tcataatttt cttcactttt aagcttgagt 660

aatatttcat cgtatgtatc tcacattttg cttatccatt catctctcag tggacacttg 720

agttgcttct acattttagc tgttgtgaat actgctgcta tgaacatggg tgtataaata 780

tctcaagacc tttttatcag ttttttaaaa tatatactca gtagtagttt agctggatta 840

tatggtaatt ttatttttaa tttttgagga actgtcctac ccttttattc aatagtagct 900

ataccaattg acaattggca ttcctaccaa cagggcataa gggttctcaa ttctccacat 960

attccctgat acttgttatt ttcaggtgtt tttttttttt tttttttttt atgggagcca 1020

tgttaatggg tgtaaggtga tatttcatta tagttttgat ttgcatttcc ctaatgatta 1080

gtgatgttaa gcatctcttc atgtgcctat tggccatttg tatatcttct ttaaaaatat 1140

atatatactc attcctttgc ccatttttga attatgttta ttttttgtta ttgagtttca 1200

atacttttct atataaccta ggtattaatc ctttatcaga cttaagattt gcaaatattc 1260

tctttcattc cacaggttgc taattctctc tgttggtaat atcttttgat gctgttgtgt 1320

ccagaattga ttcattcctg tgggttcttg gtctcactga cttcaagaat aaagctgcgg 1380

accctagtgg tgagtgttac acttcttata gatggtgttt ccggagtttg ttccttcaga 1440

tgtgtccaga gtttcttcct tccaatgggt tcatggtctt gctgacttca ggaatgaagc 1500

cgcagacctt cgcagtgagg tttacagctc ttaaaggtgg cgtgtccaga gttgtttgtt 1560

ccccctggtg ggttcgtggt cttgctgact tcaggaatga agccgcagac cctcgcagtg 1620

agtgttacag ctcataaagg tagtgcggac acagagtgag ctgcagcaag atttactgtg 1680

aagagcaaaa gaacaaagct tccacagcat agaaggacac cccagcgggt tcctgctgct 1740

ggctcaggtg gccagttatt attcccttat ttgccctgcc cacatcctgc tgattggtcc 1800

attttacaga gtactgattg gtccatttta cagagtgctg attggtgcat ttacaatcct 1860

ttagctagac acagagtgct gattgctgca ttcttacaga gtgctgattg gtgcatttac 1920

agtcctttag ctagatacag aacgctgatt gctgcgtttt ttacagagtg ctgattggtg 1980

catttacaat cctttagcta gacacagtgc tgattggtgg gtttttacag agtgctgatt 2040

ggtgcgtctt tacagagtgc tgattggtgc atttacaatc ctttagctag acacagagtg 2100

ctgattggtg cgtttataat cctctagcta gacagaaaag ttttccaagt ccccacctga 2160

ccgagaagcc ccactggctt cacctctcac tgttatactt tggacatttg tccccccaaa 2220

atctcatgtt gaaatgtaac ccctaatgtt ggaactgagg ccagactgga tgtggctggg 2280

ccatgggga 2289

<210>52

<211>1184

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR52

<400>52

ctcttctttg tttttttatt ttggggtgtg tgggtacgtg taagatgaga aatgtacaaa 60

cacaagtatt tcagaaactc caagtaatat tctgtctgtg agttcacggt aaataaataa 120

aaagggcaaa gtgacagaaa tacaggatta ttaaaagcaa aataatgttc tttgaaatcc 180

cccccttggt gtatttttta tcttaggatg cagcactttc agcatgccca agtattgaaa 240

gcagtgtttt tacgctacca cggtaatttt atttagaaac cccatgttca cttttagttt 300

taaaatggtc tttatgacat aaaattatca gcattcatat ttttgtgttt taatattcct 360

ttggctactt attgaaacag taaacattac gaaaattagt aaacaaatct ttgatagttg 420

cttatttttg tttaattgaa tgtttatttt attaggtaaa tatacaatca aatttattta 480

aaaataatga ggaaaagaat acttttcttt cgctttgcga aagcaaagtg atttttcatt 540

cttctccgtc cgattccttc tcttccagct gccacagccg actgacaggc tcccggcggc 600

ctgaggagta gtatgcaaat tttggatgat tgacacctac agtagaagcc aatcacgtca 660

aagtaggatg ctgattggtt gacaacaata ggcgtaaacc ttgacgtttt aaaaacctga 720

cacccaatcc aggcgattca tgcaaataaa ggaagggagt cacattacca ggggccagag 780

agacttgagt acgacctcac gtgttcagtg gtggatattg cacagacgtc tgcaaggtct 840

atataaacgc tacataatgt tcaactcaat tgcttgcctt ggcctttccc aaacttgtca 900

ctggaatata aattatccct tttttaaaaa taaaaaaata agaattatgt agtgcacata 960

tatgatggtt catgtagaaa tctaaatgga cttccaacgc atggaatttt cctatttccc 1020

cctttcttta aattaatcct cagtgaagga ggctgttttc ccctagattt caaaaggacg 1080

agatttacag agcctttcct tggagaaacc cgctctaggc acagatggtc agtaaattta 1140

gcttcttcag cgaagttcca catggcaccg ccagatggca taag 1184

<210>53

<211>1431

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR53

<400>53

ccctgaggaa gatgacgagt aactccgtaa gagaaccttc cactcatccc ccacatccct 60

gcagacgtgc tattctgtta tgatactggt atcccatctg tcacttgctc cccaaatcat 120

tcccttctta caattttcta ctgtacagca ttgaggctga acgatgagag atttcccatg 180

ctctttctac tccctgccct gtatatatcc ggggatcctc cctacccagg atgctgtggg 240

gtcccaaacc ccaagtaagc cctgatatgc gggccacacc tttctctagc ctaggaattg 300

ataacccagg cgaggaagtc actgtggcat gaacagatgg ttcacttcga ggaaccgtgg 360

aaggcgtgtg caggtcctga gatagggcag aatcggagtg tgcagggtct gcaggtcagg 420

aggagttgag attgcgttgc cacgtggtgg gaactcactg ccacttattt ccttctctct 480

tcttgcctca gcctcaggga tacgacacat gcccatgatg agaagcagaa cgtggtgacc 540

tttcacgaac atgggcatgg ctgcggaccc ctcgtcatca ggtgcatagc aagtgaaagc 600

aagtgttcac aacagtgaaa agttgagcgt catttttctt agtgtgccaa gagttcgatg 660

ttagcgttta cgttgtattt tcttacactg tgtcattctg ttagatacta acattttcat 720

tgatgagcaa gacatactta atgcatattt tggtttgtgt atccatgcac ctaccttaga 780

aaacaagtat tgtcggttac ctctgcatgg aacagcatta ccctcctctc tccccagatg 840

tgactactga gggcagttct gagtgtttaa tttcagattt tttcctctgc atttacacac 900

acacgcacac aaaccacacc acacacacac acacacacac acacacacac acacacacac 960

acacaccaag taccagtata agcatctgcc atctgctttt cccattgcca tgcgtcctgg 1020

tcaagctccc ctcactctgt ttcctggtca gcatgtactc ccctcatccg attcccctgt 1080

agcagtcact gacagttaat aaacctttgc aaacgttccc cagttgtttg ctcgtgccat 1140

tattgtgcac acagctctgt gcacgtgtgt gcatatttct ttaggaaaga ttcttagaag 1200

tggaattgct gtgtcaaagg agtcatttat tcaacaaaac actaatgagt gcgtcctcgt 1260

gctgagcgct gttctaggtg ctggagcgac gtcagggaac aaggcagaca ggagttcctg 1320

acccccgttc tagaggagga tgtttccagt tgttgggttt tgtttgtttg tttcttctag 1380

agatggtggt cttgctctgt ccaggctaga gtgcagtggc atgatcatag c 1431

<210>54

<211>975

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR54

<400>54

ccataaaagt gtttctaaac tgcagaaaaa tccccctaca gtcttacagt tcaagaattt 60

tcagcatgaa atgcctggta gattacctga ctttttttgc caaaaataag gcacagcagc 120

tctctcctga ctctgacttt ctatagtcct tactgaatta tagtccttac tgaattcatt 180

cttcagtgtt gcagtctgaa ggacacccac attttctctt tgtctttgtc aattctttgt 240

gttgtaaggg caggatgttt aaaagttgaa gtcattgact tgcaaaatga gaaatttcag 300

agggcatttt gttctctaga ccatgtagct tagagcagtg ttcacactga ggttgctgct 360

aatgtttctg cagttcttac caatagtatc atttacccag caacaggata tgatagagga 420

cttcgaaaac cccagaaaat gttttgccat atatccaaag ccctttggga aatggaaagg 480

aattgcgggc tcccattttt atatatggat agatagagac caagaaagac caaggcaact 540

ccatgtgctt tacattaata aagtacaaaa tgttaacatg taggaagtct aggcgaagtt 600

tatgtgagaa ttctttacac taattttgca acattttaat gcaagtctga aattatgtca 660

aaataagtaa aaatttttac aagttaagca gagaataaca atgattagtc agagaaataa 720

gtagcaaaat cttcttctca gtattgactt ggttgctttt caatctctga ggacacagca 780

gtcttcgctt ccaaatccac aagtcacatc agtgaggaga ctcagctgag actttggcta 840

atgttggggg gtccctcctg tgtctcccca ggcgcagtga gcctgcaggc cgacctcact 900

cgtggcacac aactaaatct ggggagaagc aacccgatgc cagcatgatg cagatatctc 960

agggtatgat cggcc 975

<210>55

<211>501

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR55

<400>55

cctgaactca tgatccgccc acctcagcct cctgaagtgc tgggattaca ggtgtgagcc 60

accacaccca gccgcaacac actcttgagc aaccaatgtg tcataaaaga aataaaatgg 120

aaatcagaaa gtatcttgag acagacaaaa atggaaacac aacataccaa aatttatggg 180

acacagcaaa agcagtttta ggagggaagt ttatagtgat gaatacctac ctcaaaatca 240

ttagcctgat tggatgacac tacagtgtat aaatgaattg aaaaccacat tgtgccccat 300

acatatatac aatttttatt tgttaattaa aaataaaata aaactttaaa aaagaagaaa 360

gagctcaaat aaacaaccta actttatacc tcaaggaaat agaagagcca gctaagccca 420

aagttgacag aaggaaaaaa atattggcag aaagaaatga aacagagact agaaagacaa 480

ttgaagagat cagcaaaact a 501

<210>56

<211>741

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR56

<400>56

acacaggaaa agatcgcaat tgttcagcag agctttgaac cggggatgac ggtctccctc 60

gttgcccggc aacatggtgt agcagccagc cagttatttc tctggcgtaa gcaataccag 120

gaaggaagtc ttactgctgt cgccgccgga gaacaggttg ttcctgcctc tgaacttgct 180

gccgccatga agcagattaa agaactccag cgcctgctcg gcaagaaaac gatggaaaat 240

gaactcctca aagaagccgt tgaatatgga cgggcaaaaa agtggatagc gcacgcgccc 300

ttattgcccg gggatgggga gtaagcttag tcagccgttg tctccgggtg tcgcgtgcgc 360

agttgcacgt cattctcaga cgaaccgatg actggatgga tggccgccgc agtcgtcaca 420

ctgatgatac ggatgtgctt ctccgtatac accatgttat cggagagctg ccaacgtatg 480

gttatcgtcg ggtatgggcg ctgcttcgca gacaggcaga acttgatggt atgcctgcga 540

tcaatgccaa acgtgtttac cggatcatgc gccagaatgc gctgttgctt gagcgaaaac 600

ctgctgtacc gccatcgaaa cgggcacata caggcagagt ggccgtgaaa gaaagcaatc 660

agcgatggtg ctctgacggg ttcgagttct gctgtgataa cggagagaga ctgcgtgtca 720

cgttcgcgct ggactgctgt g 741

<210>57

<211>1365

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR57

<400>57

tccttctgta aataggcaaa atgtatttta gtttccacca cacatgttct tttctgtagg 60

gcttgtatgt tggaaatttt atccaattat tcaattaaca ctataccaac aatctgctaa 120

ttctggagat gtggcagtga ataaaaaagt tatagtttct gattttgtgg agcttggact 180

ttaatgatgg acaaaacaac acattcttaa atatatattt catcaaaatt atagtgggtg 240

aattatttat atgtgcattt acatgtgtat gtatacataa atgggcggtt actggctgca 300

ctgagaatgt acacgtggcg cgaacgaggc tgggcggtca gagaaggcct cccaaggagg 360

tggctttgaa gctgagtggt gcttccacgt gaaaaggctg gaaagggcat tccaagaaaa 420

ggctgaggcc agcgggaaag aggttccagt gcgctctggg aacggaaagc gcacctgcct 480

gaaacgaaaa tgagtgtgct gaaataggac gctagaaagg gaggcagagg ctggcaaaag 540

cgaccgagga ggagctcaaa ggagcgagcg gggaaggccg ctgtggagcc tggaggaagc 600

acttcggaag cgcttctgag cgggtaaggc cgctgggagc atgaactgct gagcaggtgt 660

gtccagaatt cgtgggttct tggtctcact gacttcaaga atgaagaggg accgcggacc 720

ctcgcggtga gtgttacagc tcttaaggtg gcgcgtctgg agtttgttcc ttctgatgtt 780

cggatgtgtt cagagtttct tccttctggt gggttcgtgg tctcgctggc tcaggagtga 840

agctgcagac cttcgcggtg agtgttacag ctcataaaag cagggtggac tcaaagagtg 900

agcagcagca agatttattg caaagaatga aagaacaaag cttccacact gtggaagggg 960

accccagcgg gttgccactg ctggctccgc agcctgcttt tattctctta tctggcccca 1020

cccacatcct gctgattggt agagccgaat ggtctgtttt gacggcgctg attggtgcgt 1080

ttacaatccc tgcgctagat acaaaggttc tccacgtccc caccagatta gctagataga 1140

gtctccacac aaaggttctc caaggcccca ccagagtagc tagatacaga gtgttgattg 1200

gtgcattcac aaaccctgag ctagacacag ggtgatgact ggtgtgttta caaaccttgc 1260

ggtagataca gagtatcaat tggcgtattt acaatcactg agctaggcat aaaggttctc 1320

caggtcccca ccagactcag gagcccagct ggcttcaccc agtgg 1365

<210>58

<211>1401

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR58

<400>58

aagtttacct tagccctaaa ttatttcatt gtgattggca ttttaggaaa tatgtattaa 60

ggaatgtctc ttaggagata aggataacat atgtctaaga aaattatatt gaaatattat 120

tacatgaact aaaatgttag aactgaaaaa aaattattgt aactccttcc agcgtaggca 180

ggagtatcta gataccaact ttaacaactc aactttaaca acttcgaacc aaccagatgg 240

ctaggagatt cacctattta gcatgatatc ttttattgat aaaaaaatat aaaacttcca 300

ttaaattttt aagctactac aatcctatta aattttaact taccagtgtt ctcaatgcta 360

cataatttaa aatcattgaa atcttctgat tttaactcct cagtcttgaa atctacttat 420

ttttagttac atatatatcc aatctactgc cgctagtaga agaagcttgg aatttgagaa 480

aaaaatcaga cgttttgtat attctcatat tcactaattt attttttaaa tgagtttctg 540

caatgcatca agcagtggca aaacaggaga aaaattaaaa ttggttgaaa agatatgtgt 600

gccaaacaat cccttgaaat ttgatgaagt gactaatcct gagttattgt ttcaaatgtg 660

tacctgttta tacaagggta tcacctttga aatctcaaca ttaaatgaaa ttttataagc 720

aatttgttgt aacatgatta ttataaaatt ctgatataac attttttatt acctgtttag 780

agtttaaaga gagaaaagga gttaagaata attacatttt cattagcatt gtccgggtgc 840

aaaaacttct aacactatct tcaaatcttt ttctccattg ccttctgaac atacccactt 900

gggtatctca ttagcactgc aaattcaaca ttttcgattg ctaatttttc tccctaaata 960

tttatttgtt ttctcagctt tagccaatgt ttcactattg accatttgct caagtatagt 1020

gacgcttcaa tgaccttcag agagctgttt cagtccttcc tggactactt gcatgcttcc 1080

aacaaaatga agcactcttg atgtcagtca ctcaaataaa tggaaatggg cccatttact 1140

aggaatgtta acagaataaa aagatagacg tgacaccagt tgcttcagtc catctccatt 1200

tacttgctta aggcctggcc atatttctca cagttgatat ggcgcagggc acatgtttaa 1260

atggctgttc ttgtaggatg gtttgactgt tggattcctc atcttccctc tccttaggaa 1320

ggaaggttac agtagtactg ttggctcctg gaatatagat tcataaagaa ctaatggagt 1380

atcatctccc actgctcttg t 1401

<210>59

<211>866

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR59

<400>59

gagatcacgc cactgcactc cagcctgggg gacagagcaa gactccatct cagaaacaaa 60

caaacacaca aagccagtca aggtgtttaa ttcgacggtg tcaggctcag gtctcttgac 120

aggatacatc cagcacccgg gggaaacgtc gatgggtggg gtggaatcta ttttgtggcc 180

tcaagggagg gtttgagagg tagtcccgca agcggtgatg gcctaaggaa gcccctccgc 240

ccaagaagcg atattcattt ctagcctgta gccacccaag agggagaatc gggctcgcca 300

cagaccccac aacccccaac ccaccccacc cccacccctc ccacctcgtg aaatgggctc 360

tcgctccgtc aggctctagt cacaccgtgt ggttttggaa cctccagcgt gtgtgcgtgg 420

gttgcgtggt ggggtggggc cggctgtgga cagaggaggg gataaagcgg cggtgtcccg 480

cgggtgcccg ggacgtgggg cgtggggcgt gggtggggtg gccagagcct tgggaactcg 540

tcgcctgtcg ggacgtctcc cctcctggtc ccctctctga cctacgctcc acatcttcgc 600

cgttcagtgg ggaccttgtg ggtggaagtc accatccctt tggactttag ccgacgaagg 660

ccgggctccc aagagtctcc ccggaggcgg ggccttgggc aggctcacaa ggatgctgac 720

ggtgacggtt ggtgacggtg atgtacttcg gaggcctcgg gccaatgcag aggtatccat 780

ttgacctcgg tgggacaggt cagctttgcg gagtcccgtg cgtccttcca gagactcatc 840

cagcgctagc aagcatggtc ccgagg 866

<210>60

<211>2067

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR60

<220>

<221>misc_feature

<222>(92)..(1777)

<223>n is a, c, g, or t at various positions

<400>60

agcagtgcag aactggggaa gaagaagagt ccctacacca cttaatactc aaaagtactc 60

gcaaaaaata acacccctca ccaggtggca tnattactct ccttcattga gaaaattagg 120

aaactggact tcgtagaagc taattgcttt atccagagcc acctgcatac aaacctgcag 180

cgccacctgc atacaaacct gtcagccgac cccaaagccc tcagtcgcac caagcctctg 240

ctgcacaccc tcgtgccttc acactggccg ttccccaagc ctggggcata ctncccagct 300

ctgagaaatg tattcatcct tcaaagccct gctcatgtgt cctnntcaac aggaaaatct 360

cccatgagat gctctgctat ccccatctct cctgccccat agcttaggca nacttctgtg 420

gtggtgagtc ctgggctgtg ctgtgatgtg ttcgcctgcn atgtntgttc ttccccacaa 480

tgatgggccc ctgaattctc tatctctagc acctgtgctc agtaaaggct tgggaaacca 540

ggctcaaagc ctggcccaga tgccaccttt tccagggtgc ttccgggggc caccaaccag 600

agtgcagcct tctcctccac caggaactct tgcagcccca cccctgagca cctgcacccc 660

attacccatc tttgtttctc cgtgtgatcg tattattaca gaattatata ctgtattctt 720

aatacagtat ataattgtat aattattctt aatacagtat ataattatac aaatacaaaa 780

tatgtgttaa tggaccgttt atgttactgg taaagcttta agtcaacagt gggacattag 840

ttaggttttt ggcgaagtca aaagttatat gtgcattttc aacttcttga ggggtcggta 900

cntctnaccc ccatgttgtt caanggtcaa ctgtctacac atatcatagc taattcacta 960

cagaaatgtt agcttgtgtc actagtatct ccccttctca taagcttaat acacatacct 1020

tgagagagct cttggccatc tctactaatg actgaagttt ttatttatta tagatgtcat 1080

aataggcata aaactacatt acatcattcg agtgccaatt ttgccacctt gaccctcttt 1140

tgcaaaacac caacgtcagt acacatatga agaggaaact gcccgagaac tgaagttcct 1200

gagaccagga gctgcaggcg ttagatagaa tatggtgacg agagttacga ggatgacgag 1260

agtaaatact tcatactcag tacgtgccaa gcactgctat aagcgctctg tatgtgtgaa 1320

gtcatttaat cctcacagca tcccacggtg taattatttt cattatcccc atgagggaac 1380

agaaactcag aacggttcaa cacatatgcg agaagtcgca gccggtcagt gagagagcag 1440

gttcccgtcc aagcagtcag accccgagtg cacactctcg acccctgtcc agcagactca 1500

ctcgtcataa ggcggggagt gntctgtttc agccagatgc tttatgcatc tcagagtacc 1560

caaaccatga aagaatgagg cagtattcan gagcagatgg ngctgggcag taaggctggg 1620

cttcagaata gctggaaagc tcaagtnatg ggacctgcaa gaaaaatcca ttgtttngat 1680

aaatagccaa agtccctagg ctgtaagggg aaggtgtgcc aggtgcaagt ggagctctaa 1740

tgtaaaatcg cacctgagtc tcctggtctt atgagtnctg ggtgtacccc agtgaaaggt 1800

cctgctgcca ccaagtgggc catggttcag ctgtgtaagt gctgagcggc agccggaccg 1860

cttcctctaa cttcacctcc aaaggcacag tgcacctggt tcctccagca ctcagctgcg 1920

aggcccctag ccagggtccc ggcccccggc ccccggcagc tgctccagct tccttcccca 1980

cagcattcag gatggtctgc gttcatgtag acctttgttt tcagtctgtg ctccgaggtc 2040

actggcagca ctagccccgg ctcctgt 2067

<210>61

<211>1470

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR61

<220>

<221>misc_feature

<222>(130)..(976)

<223>n is a, c, g, or t at various positions

<400>61

cagcccccac atgcccagcc ctgtgctcag ctctgcagcg gggcatggtg ggcagagaca 60

cagaggccaa ggccctgctt cggggacggt gggcctggga tgagcatggc cttggccttc 120

gccgagagtn ctcttgtgaa ggaggggtca ggaggggctg ctgcagctgg ggaggagggc 180

gatggcactg tggcangaag tgaantagtg tgggtgcctn gcaccccagg cacggccagc 240

ctggggtatg gacccggggc cntctgttct agagcaggaa ggtatggtga ggacctcaaa 300

aggacagcca ctggagagct ccaggcagag gnacttgaga ggccctgggg ccatcctgtc 360

tcttttctgg gtctgtgtgc tctgggcctg ggcccttcct ctgctccccc gggcttggag 420

agggctggcc ttgcctcgtg caaaggacca ctctagactg gtaccaagtc tggcccatgg 480

cctcctgtgg gtgcaggcct gtgcgggtga cctgagagcc agggctggca ggtcagagtc 540

aggagaggga tggcagtgga tgccctgtgc aggatctgcc taatcatggt gaggctggag 600

gaatccaaag tgggcatgca ctctgcactc atttctttat tcatgtgtgc ccatcccaac 660

aagcagggag cctggccagg agggcccctg ggagaaggca ctgatgggct gtgttccatt 720

taggaaggat ggacggttgt gagacgggta agtcagaacg ggctgcccac ctcggccgag 780

agggccccgt ggtgggttgg caccatctgg gcctggagag ctgctcagga ggctctctag 840

ggctgggtga ccaggnctgg ggtacagtag ccatgggagc aggtgcttac ctggggctgt 900

ccctgagcag gggctgcatt gggtgctctg tgagcacaca cttctctatt cacctgagtc 960

ccnctgagtg atgagnacac ccttgttttg cagatgaatc tgagcatgga gatgttaagt 1020

ggcttgcctg agccacacag cagatggatg gtgtagctgg gacctgaggg caggcagtcc 1080

cagcccgagg acttcccaag gttgtggcaa actctgacag catgacccca gggaacaccc 1140

atctcagctc tggtcagaca ctgcggagtt gtgttgtaac ccacacagct ggagacagcc 1200

accctagccc cacccttatc ctctcccaaa ggaacctgcc ctttcccttc attttcctct 1260

tactgcattg agggaccaca cagtgtggca gaaggaacat gggttcagga cccagatgga 1320

cttgcttcac agtgcagccc tcctgtcctc ttgcagagtg cgtcttccac tgtgaagttg 1380

ggacagtcac accaactcaa tactgctggg cccgtcacac ggtgggcagg caacggatgg 1440

cagtcactgg ctgtgggtct gcagaggtgg 1470

<210>62

<211>1011

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR62

<400>62

agtgtcaaat agatctacac aaaacaagat aatgtctgcc catttttcca aagataatgt 60

ggtgaagtgg gtagagagaa atgcatccat tctccccacc caacctctgc taaattgtcc 120

atgtcacagt actgagacca gggggcttat tcccagcggg cagaatgtgc accaagcacc 180

tcttgtctca atttgcagtc taggccctgc tatttgatgg tgtgaaggct tgcacctggc 240

atggaaggtc cgttttgtac ttcttgcttt agcagttcaa agagcaggga gagctgcgag 300

ggcctctgca gcttcagatg gatgtggtca gcttgttgga ggcgccttct gtggtccatt 360

atctccagcc cccctgcggt gttgctgttt gcttggcttg tctggctctc catgccttgt 420

tggctccaaa atgtcatcat gctgcacccc aggaagaatg tgcaggccca tctcttttat 480

gtgctttggg ctattttgat tccccgttgg gtatattccc taggtaagac ccagaagaca 540

caggaggtag ttgctttggg agagtttgga cctatgggta tgaggtaata gacacagtat 600

cttctctttc atttggtgag actgttagct ctggccgcgg actgaattcc acacagctca 660

cttgggaaaa ctttattcca aaacatagtc acattgaaca ttgtggagaa tgagggacag 720

agaagaggcc ctagatttgt acatctgggt gttatgtcta taaatagaat gctttggtgg 780

tcaactagac ttgttcatgt tgacatttag tcttgccttt tcggtggtga tttaaaaatt 840

atgtatatct tgtttggaat atagtggagc tatggtgtgg cattttcatc tggctttttg 900

tttagctcag cccgtcctgt tatgggcagc cttgaagctc agtagctaat gaagaggtat 960

cctcactccc tccagagagc ggtcccctca cggctcattg agagtttgtc a 1011

<210>63

<211>1410

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR63

<400>63

ccacagcctg atcgtgctgt cgatgagagg aatctgctct aagggtctga gcggagggag 60

atgccgaagc tttgagcttt ttgtttctgg cttaaccttg gtggattttc accctctggg 120

cattacctct tgtccagggg aggggctggg ggagtgcctg gagctgtagg gacagagggc 180

tgagtggggg ggactgcttg ggctgaccac ataatattct gctgcgtatt aatttttttt 240

tgagacagtc tttctctgtt gcccaggctg gagtgtaatg gcttgatagc tcactgccac 300

ctccgcctcc tgggttcaag tgattctcct gcttcagctt ccggagtagc tgggactgca 360

ggtgcccgcc accatggctg gctaattttt gtatttttat tagcaatggg gttttgctat 420

gttgcccagg ccggtcccga actcctgccc tcaagtgata cacctgcctc ggcctcccaa 480

agtgctggga ttagaggctt gagccactgc gcctggccag ctgcatattg ttaattagac 540

ataaaatgca aaataagatg atataaacac aaaggtgtga aataagatgg acacctgctg 600

agcgcgcctg tcctgaagca tcgcccctct gcaaaagcag gggtcagcat gtgttctccg 660

gtccttgctc ttacagagga gtgagctgcc tatgcgtctt ccagccactt cctgggctgc 720

tcagaggcct ctcacgggtg ttctgggttg ctgccacttg caggggtgct gaggcggggc 780

tcctcccgtg cggggcatgt ccaggccgcc ctctctgaag gcttggcagg tacaggtggg 840

agtgggggtc tctgggctgc tgtggggact gggcaggctc ctggaagacc tccctgtgtt 900

tgggctgaaa gcgcagcccg aggggaggtc cccagggagg ccgctgtcgg gggtgggggc 960

ttggaggagg gaggggccga ggagccggcg acactccgtg acggcccagg aacgtcccta 1020

aacaaggcgc cgcgttctcg atggggtggg gtccgctttc ttttctcaaa agctgcagtt 1080

actccatgct cggaggactg gcgtccgcgc cctgttccaa tgctgccccg gggccctggc 1140

cttggggaat cggggccttg gactggaccc tgggggcttc gcggagccgg gcctggcggg 1200

gcgagcggag cagaggctgg gcagccccgg ggaagcgctc gccaaagccg ggcgctgctc 1260

ccagagcgcg aggtgcagaa ccagaggctg gtcccgcggc gctaacgaga gaagaggaag 1320

cgcgctgtgt agagggcgcc caccccgtgg ggcgaacccc cttcctcaac tccatggacg 1380

gggctcatgg gttcccagcg gctcagacgc 1410

<210>64

<211>1414

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR64

<400>64

tggatcagat ttgttttata ccctcccttc tactgctctg agagttgtac atcacagtct 60

actgtatctg tttcccatta ttataatttt tttgcactgt gcttgcctga agggagcctc 120

aagttcatga gtctccctac cctcctccca aatgagacat ggacctttga atgctttcct 180

gggaccacca ccccaccttt catgctgctg ttatccagga ttttagttca acagtgtttt 240

aaccccccaa atgagtcatt tttattgttt cgtatagtga atgtgtattt gggtttgctt 300

atatggtgac ctgtttattt gctcctcatt gtacctcatg ctctgctctt tccttctaga 360

ttcagtctct ttcctaatga ggtgtctcgc agcaattctt tacaagacag ccaagatagg 420

ccagctctca gagcacttgt tgtctgaaaa agtcttgtct tatttaattt ctttttctta 480

gagatggggt ctcattatgt tacccacact ggtctcaaac ttctggctta aagcggtcct 540

cccaccttgg cctcccaaag tgctaggatt acaggcgtga gcgacctcgt ccagcctgtc 600

tgagaaagcg tttgttttgc ccttgctctc agatgacagt ttggggatag aattctaggt 660

ggacggtttt tttccttcag ccctttgaag agtctgtatt ttcattatct ccctgcatta 720

gatgttcttt tgcaagtaac gtgtcttttc tctctgggta ttcttaaggt tttctctttg 780

cctttggtga gctgcagtgg atttgctttt ttcaagaggt caagagaaag gaaagtgtga 840

ggtttctgtt ttttactgac aatttgtttg ttgatttgtt ttcccaccca gaggttcctt 900

gccactttgc caggctggaa ggcagacttc ttctggtgtc ctgttcacag acggggcagc 960

ctgcggaagg ccctgccaca tgcagggcct cggtcctcat tcccttgcat gtggacccgg 1020

gcgtgactcc tgttcaggct ggcacttccc agagctgagc cccagcctga ccttcctccc 1080

atactgtctt cacaccccct cctttcttct gatacctgga ggttttcctt tctttcctgt 1140

cacctccact tggattttaa atcctctgtc tgtggaattg tattcggcac aggaagatgc 1200

ttgcaagggc caggctcatc agccctgtcc ctgctgctgg aagcagcaca gcagagcctc 1260

atgctcaggc tgagatggag cagaggcctg cagacgagca cccagctcag ctggggttgg 1320

cgccgatggt ggagggtcct cgaaagctct ggggacgatg gcagagctat tggcagggga 1380

gccgcagggt cttttgagcc cttaaaagat ctct 1414

<210>65

<211>1310

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR65

<400>65

gtgaatgttg atggatcaaa tatctttctg tgttgtttat caaagttaaa ataaatgtgg 60

tcatttaaag gacaaaagat gaggggttgg agtctgttca agcaaagggt atattaggag 120

aaaagcagaa ttctctccct gtgaagggac agtgactcct attttccacc tcatttttac 180

taactctcct aactatctgc ttaggtagag atatatccat gtacatttat aaaccacagt 240

gaatcatttg attttggaat aaagatagta taaaatgtgt cccagtgttg atatacatca 300

tacattaaat atgtctggca gtgttctaat tttacagttg tccaaagata atgttagggc 360

atactggcta tggatgaagc tccaatgttc agattgcaaa gaaacttaga attttactaa 420

tgaaaccaaa tacatcccaa gaaatttttc agaagaaaaa aagagaaact agtagcaaag 480

taaagaatca ccacaatatc atcagatttt ttttatatgt agaatattta ttcagttctt 540

ttttcaagta caccttgtct tcattcattg tactttattt tttgtgaagg tttaaattta 600

tttcttctat gtgtttagtg atatttaaaa tttttattta atcaagttta tcagaaagtt 660

ctgttagaaa atatgacgag gctttaattc cgccatctat attttccgct attatataaa 720

gataattgtt ttctcttttt aaaacaactt gaattgggat tttatatcat aattttttaa 780

tgtctttttt tattatactt taagttctgg gatacatgtg cagaacgtgc aggtgtgtta 840

catagatata cacgtgccat ggtggtttgc tgcacccact aacctgttat cgacattagg 900

tatttctcct aatgctatca ccccctattt ccccaccccc cgagaggccc cagtgtgtga 960

tgttctcctc cctgtgtcca tgtgttctca ttgttcatct cccacttatg gtatctacca 1020

taaccttgaa attgtcttat gcattcactt gtttggttgt tatatagcct ccatcaggac 1080

agggatattt gctgctgctt cttttttttt tctttttgag acagtcttgc tccgtcatcc 1140

aggctggagt gcttctcggc tcaatgcaac ctccacctcc caggtttaag cgattctcca 1200

acttcagcct cccaaatggc tgggactgca ggcatgcacc actacacctg gctaattttt 1260

gtatttgtaa tagagacaat gtttcaccat gttggccagg ctggtctcga 1310

<210>66

<211>1917

<212>DNA

<213>Homo sapiens

<220>

<221>misc_feature

<223>sequence of STAR67

<400>66

aggatcctaa aattttgtga ccctagagca agtactaact atgaaagtga aatagagaat 60

gaaggaatta tttaattaag tccagcaaaa cccaaccaaa tcatctgtaa aatatatttg 120

ttttcaacat ccaggtattt tctgtgtaaa aggttgagtt gtatgctgac ttattgggaa 180

aaataattga gttttcccct tcactttgcc agtgagagga aatcagtact gtaattgtta 240

aaggttaccc atacctacct ctactaccgt ctagcatagg taaagtaatg tacactgtga 300

agtttcctgc ttgactgtaa tgttttcagt ttcatcccat tgattcaaca gctatttatt 360

cagcacttac tacaaccatg ctggaaaccc aagagtaaat aggctgtgtt actcaacagg 420

actgaggtac agccgaactg tcaggcaagg ttgctgtcct ttggacttgc ctgctttctc 480

tctatgtagg aagaagaaat ggacataccg tccaggaaat agatatatgt tacatttcct 540

tattccataa ttaatattaa taaccctgga cagaaactac caagtttcta gacccttata 600

gtaccacctt accctttctg gatgaatcct tcacatgttg atacatttta tccaaatgaa 660

aattttggta ctgtaggtat aacagacaaa gagagaacag aaaactagag atgaagtttg 720

ggaaaaggtc aagaaagtaa ataatgcttc tagaagacac aaaaagaaaa atgaaatggt 780

aatgttggga aagttttaat acattttgcc ctaaggaaaa aaactacttg ttgaaattct 840

acttaagact ggaccttttc tctaaaaatt gtgcttgatg tgaattaaag caacacaggg 900

aaatttatgg gctccttcta agttctaccc aactcaccgc aaaactgttc ctagtaggtg 960

tggtatactc tttcagattc tttgtgtgta tgtatatgtg tgtgtgtgtg tgtgtttgta 1020

tgtgtacagt ctatatacat atgtgtacct acatgtgtgt atatataaat atatatttac 1080

ctggatgaaa tagcatatta tagaatattc ttttttcttt aaatatatat gtgcatacat 1140

atgtatatgc acatatatac ataaatgtag atatagctag gtaggcattc atgtgaaaca 1200

aagaagccta ttacttttta atggttgcat gatattccat cataggagta tagtacaact 1260

tatgtaacac acatttggct tgttgtaaaa ttttggtatt aataaaatag cacatatcat 1320

gcaaagacac ccttgcatag gtctattcat tctttgattt ttaccttagg acaaaattta 1380

aaagtagaat ttctgggtca agcagtatgc tcatttaaaa tgtcattgca tatttccaaa 1440

ttgtcctcca gaaaagtagt aacagtaaca attgatggac tgcgtgtttt ctaaaacttg 1500

catttttttc cttattggtg aggtttggca ttttccatat gtttattggc attttaattt 1560

tttttggttc atgtctttta ttcccttcct gcaaatttgt ggtgtgtctc aactttattt 1620

atactctcat tttcataatt ttctaaagga atttgacttt aaaaaaataa gacagccaat 1680

gctttggttt aatttcattg ctgctttttg aagtgactgc tgtgttttta tatactttta 1740

tattttgttg ttttagcaaa ttcttctata ttataattgt gtatgctgga acaaaaagtt 1800

atatttctta atctagataa aatatttcaa gatgttgtaa ttacagtccc ctctaaaatc 1860

atataaatag acgcatagct gtgtgatttg taattagtta tgtccattga tagatcc 1917

<210>67

<211>375

<212>DNA

<213>Artificial

<220>

<223>wt zeocin resistance gene

<220>

<221>CDS

<222>(1)..(375)

<400>67

atg gcc aag ttg acc agt gcc gtt ccg gtg ctc acc gcg cgc gac gtc 48

Met Ala Lys Leu Thr Ser Ala Val Pro Val Leu Thr Ala Arg Asp Val

1 5 10 15

gcc gga gcg gtc gag ttc tgg acc gac cgg ctc ggg ttc tcc cgg gac 96

Ala Gly Ala Val Glu Phe Trp Thr Asp Arg Leu Gly Phe Ser Arg Asp

20 25 30

ttc gtg gag gac gac ttc gcc ggt gtg gtc cgg gac gac gtg acc ctg 144

Phe Val Glu Asp Asp Phe Ala Gly Val Val Arg Asp Asp Val Thr Leu

35 40 45

ttc atc agc gcg gtc cag gac cag gtg gtg ccg gac aac acc ctg gcc 192

Phe Ile Ser Ala Val Gln Asp Gln Val Val Pro Asp Asn Thr Leu Ala

50 55 60

tgg gtg tgg gtg cgc ggc ctg gac gag ctg tac gcc gag tgg tcg gag 240

Trp Val Trp Val Arg Gly Leu Asp Glu Leu Tyr Ala Glu Trp Ser Glu

65 70 75 80

gtc gtg tcc acg aac ttc cgg gac gcc tcc ggg ccg gcc atg acc gag 288

Val Val Ser Thr Asn Phe Arg Asp Ala Ser Gly Pro Ala Met Thr Glu

85 90 95

atc ggc gag cag ccg tgg ggg cgg gag ttc gcc ctg cgc gac ccg gcc 336

Ile Gly Glu Gln Pro Trp Gly Arg Glu Phe Ala Leu Arg Asp Pro Ala

100 105 110

ggc aac tgc gtg cac ttc gtg gcc gag gag cag gac tga 375

Gly Asn Cys Val His Phe Val Ala Glu Glu Gln Asp

115 120

<210>68

<211>124

<212>PRT

<213>Artificial

<220>

<223>Synthetic Construct

<400>68

Met Ala Lys Leu Thr Ser Ala Val Pro Val Leu Thr Ala Arg Asp Val

1 5 10 15

Ala Gly Ala Val Glu Phe Trp Thr Asp Arg Leu Gly Phe Ser Arg Asp

20 25 30

Phe Val Glu Asp Asp Phe Ala Gly Val Val Arg Asp Asp Val Thr Leu

35 40 45

Phe Ile Ser Ala Val Gln Asp Gln Val Val Pro Asp Asn Thr Leu Ala

50 55 60

Trp Val Trp Val Arg Gly Leu Asp Glu Leu Tyr Ala Glu Trp Ser Glu

65 70 75 80

Val Val Ser Thr Asn Phe Arg Asp Ala Ser Gly Pro Ala Met Thr Glu

85 90 95

Ile Gly Glu Gln Pro Trp Gly Arg Glu Phe Ala Leu Arg Asp Pro Ala

100 105 110

Gly Asn Cys Val His Phe Val Ala Glu Glu Gln Asp

115 120

<210>69

<211>399

<212>DNA

<213>Artificial

<220>

<223>wt blasticidin resistance gene

<220>

<221>CDS

<222>(1)..(399)

<400>69

atg gcc aag cct ttg tct caa gaa gaa tcc acc ctc att gaa aga gca 48

Met Ala Lys Pro Leu Ser Gln Glu Glu Ser Thr Leu Ile Glu Arg Ala

1 5 10 15

acg gct aca atc aac agc atc ccc atc tct gaa gac tac agc gtc gcc 96

Thr Ala Thr Ile Asn Ser Ile Pro Ile Ser Glu Asp Tyr Ser Val Ala

20 25 30

agc gca gct ctc tct agc gac ggc cgc atc ttc act ggt gtc aat gta 144

Ser Ala Ala Leu Ser Ser Asp Gly Arg Ile Phe Thr Gly Val Asn Val

35 40 45

tat cat ttt act ggg gga cct tgt gca gaa ctc gtg gtg ctg ggc act 192

Tyr His Phe Thr Gly Gly Pro Cys Ala Glu Leu Val Val Leu Gly Thr

50 55 60

gct gct gct gcg gca gct ggc aac ctg act tgt atc gtc gcg atc gga 240

Ala Ala Ala Ala Ala Ala Gly Asn Leu Thr Cys Ile Val Ala Ile Gly

65 70 75 80

aat gag aac agg ggc atc ttg agc ccc tgc gga cgg tgc cga cag gtg 288

Asn Glu Asn Arg Gly Ile Leu Ser Pro Cys Gly Arg Cys Arg Gln Val

85 90 95

ctt ctc gat ctg cat cct ggg atc aaa gcc ata gtg aag gac agt gat 336

Leu Leu Asp Leu His Pro Gly Ile Lys Ala Ile Val Lys Asp Ser Asp

100 105 110

gga cag ccg acg gca gtt ggg att cgt gaa ttg ctg ccc tct ggt tat 384

Gly Gln Pro Thr Ala Val Gly Ile Arg Glu Leu Leu Pro Ser Gly Tyr

115 120 125

gtg tgg gag ggc taa 399

Val Trp Glu Gly

130

<210>70

<211>132

<212>PRT

<213>Artificial

<220>

<223>Synthetic Construct

<400>70

Met Ala Lys Pro Leu Ser Gln Glu Glu Ser Thr Leu Ile Glu Arg Ala

1 5 10 15

Thr Ala Thr Ile Asn Ser Ile Pro Ile Ser Glu Asp Tyr Ser Val Ala

20 25 30

Ser Ala Ala Leu Ser Ser Asp Gly Arg Ile Phe Thr Gly Val Asn Val

35 40 45

Tyr His Phe Thr Gly Gly Pro Cys Ala Glu Leu Val Val Leu Gly Thr

50 55 60

Ala Ala Ala Ala Ala Ala Gly Asn Leu Thr Cys Ile Val Ala Ile Gly

65 70 75 80

Asn Glu Asn Arg Gly Ile Leu Ser Pro Cys Gly Arg Cys Arg Gln Val

85 90 95

Leu Leu Asp Leu His Pro Gly Ile Lys Ala Ile Val Lys Asp Ser Asp

100 105 110

Gly Gln Pro Thr Ala Val Gly Ile Arg Glu Leu Leu Pro Ser Gly Tyr

115 120 125

Val Trp Glu Gly

130

<210>71

<211>600

<212>DNA

<213>Artificial

<220>

<223>wt puromycin resistance gene

<220>

<221>CDS

<222>(1)..(600)

<400>71

atg acc gag tac aag ccc acg gtg cgc ctc gcc acc cgc gac gac gtc 48

Met Thr Glu Tyr Lys Pro Thr Val Arg Leu Ala Thr Arg Asp Asp Val

1 5 10 15

ccc agg gcc gta cgc acc ctc gcc gcc gcg ttc gcc gac tac ccc gcc 96

Pro Arg Ala Val Arg Thr Leu Ala Ala Ala Phe Ala Asp Tyr Pro Ala

20 25 30

acg cgc cac acc gtc gat ccg gac cgc cac atc gag cgg gtc acc gag 144

Thr Arg His Thr Val Asp Pro Asp Arg His Ile Glu Arg Val Thr Glu

35 40 45

ctg caa gaa ctc ttc ctc acg cgc gtc ggg ctc gac atc ggc aag gtg 192

Leu Gln Glu Leu Phe Leu Thr Arg Val Gly Leu Asp Ile Gly Lys Val

50 55 60

tgg gtc gcg gac gac ggc gcc gcg gtg gcg gtc tgg acc acg ccg gag 240

Trp Val Ala Asp Asp Gly Ala Ala Val Ala Val Trp Thr Thr Pro Glu

65 70 75 80

agc gtc gaa gcg ggg gcg gtg ttc gcc gag atc ggc ccg cgc atg gcc 288

Ser Val Glu Ala Gly Ala Val Phe Ala Glu Ile Gly Pro Arg Met Ala

85 90 95

gag ttg agc ggt tcc cgg ctg gcc gcg cag caa cag atg gaa ggc ctc 336

Glu Leu Ser Gly Ser Arg Leu Ala Ala Gln Gln Gln Met Glu Gly Leu

100 105 110

ctg gcg ccg cac cgg ccc aag gag ccc gcg tgg ttc ctg gcc acc gtc 384

Leu Ala Pro His Arg Pro Lys Glu Pro Ala Trp Phe Leu Ala Thr Val

115 120 125

ggc gtc tcg ccc gac cac cag ggc aag ggt ctg ggc agc gcc gtc gtg 432

Gly Val Ser Pro Asp His Gln Gly Lys Gly Leu Gly Ser Ala Val Val

130 135 140

ctc ccc gga gtg gag gcg gcc gag cgc gcc ggg gtg ccc gcc ttc ctg 480

Leu Pro Gly Val Glu Ala Ala Glu Arg Ala Gly Val Pro Ala Phe Leu

145 150 155 160

gag acc tcc gcg ccc cgc aac ctc ccc ttc tac gag cgg ctc ggc ttc 528

Glu Thr Ser Ala Pro Arg Asn Leu Pro Phe Tyr Glu Arg Leu Gly Phe

165 170 175

acc gtc acc gcc gac gtc gag tgc ccg aag gac cgc gcg acc tgg tgc 576

Thr Val Thr Ala Asp Val Glu Cys Pro Lys Asp Arg Ala Thr Trp Cys

180 185 190

atg acc cgc aag ccc ggt gcc tga 600

Met Thr Arg Lys Pro Gly Ala

195

<210>72

<211>199

<212>PRT

<213>Artificial

<220>

<223>Synthetic Construct

<400>72

Met Thr Glu Tyr Lys Pro Thr Val Arg Leu Ala Thr Arg Asp Asp Val

1 5 10 15

Pro Arg Ala Val Arg Thr Leu Ala Ala Ala Phe Ala Asp Tyr Pro Ala

20 25 30

Thr Arg His Thr Val Asp Pro Asp Arg His Ile Glu Arg Val Thr Glu

35 40 45

Leu Gln Glu Leu Phe Leu Thr Arg Val Gly Leu Asp Ile Gly Lys Val

50 55 60

Trp Val Ala Asp Asp Gly Ala Ala Val Ala Val Trp Thr Thr Pro Glu

65 70 75 80

Ser Val Glu Ala Gly Ala Val Phe Ala Glu Ile Gly Pro Arg Met Ala

85 90 95

Glu Leu Ser Gly Ser Arg Leu Ala Ala Gln Gln Gln Met Glu Gly Leu

100 105 110

Leu Ala Pro His Arg Pro Lys Glu Pro Ala Trp Phe Leu Ala Thr Val

115 120 125

Gly Val Ser Pro Asp His Gln Gly Lys Gly Leu Gly Ser Ala Val Val

130 135 140

Leu Pro Gly Val Glu Ala Ala Glu Arg Ala Gly Val Pro Ala Phe Leu

145 150 155 160

Glu Thr Ser Ala Pro Arg Asn Leu Pro Phe Tyr Glu Arg Leu Gly Phe

165 170 175

Thr Val Thr Ala Asp Val Glu Cys Pro Lys Asp Arg Ala Thr Trp Cys

180 185 190

Met Thr Arg Lys Pro Gly Ala

195

<210>73

<211>564

<212>DNA

<213>Artificial

<220>

<223>wt DHFR gene (from mouse)

<220>

<221>CDS

<222>(1)..(564)

<400>73

atg gtt cga cca ttg aac tgc atc gtc gcc gtg tcc caa aat atg ggg 48

Met Val Arg Pro Leu Asn Cys Ile Val Ala Val Ser Gln Asn Met Gly

1 5 10 15

att ggc aag aac gga gac cta ccc tgg cct ccg ctc agg aac gag ttc 96

Ile Gly Lys Asn Gly Asp Leu Pro Trp Pro Pro Leu Arg Asn Glu Phe

20 25 30

aag tac ttc caa aga atg acc aca acc tct tca gtg gaa ggt aaa cag 144

Lys Tyr Phe Gln Arg Met Thr Thr Thr Ser Ser Val Glu Gly Lys Gln

35 40 45

aat ctg gtg att atg ggt agg aaa acc tgg ttc tcc att cct gag aag 192

Asn Leu Val Ile Met Gly Arg Lys Thr Trp Phe Ser Ile Pro Glu Lys

50 55 60

aat cga cct tta aag gac aga att aat ata gtt ctc agt aga gaa ctc 240

Asn Arg Pro Leu Lys Asp Arg Ile Asn Ile Val Leu Ser Arg Glu Leu

65 70 75 80

aaa gaa cca cca cga gga gct cat ttt ctt gcc aaa agt ttg gat gat 288

Lys Glu Pro Pro Arg Gly Ala His Phe Leu Ala Lys Ser Leu Asp Asp

85 90 95

gcc tta aga ctt att gaa caa ccg gaa ttg gca agt aaa gta gac atg 336

Ala Leu Arg Leu Ile Glu Gln Pro Glu Leu Ala Ser Lys Val Asp Met

100 105 110

gtt tgg ata gtc gga ggc agt tct gtt tac cag gaa gcc atg aat caa 384

Val Trp Ile Val Gly Gly Ser Ser Val Tyr Gln Glu Ala Met Asn Gln

115 120 125

cca ggc cac ctc aga ctc ttt gtg aca agg atc atg cag gaa ttt gaa 432

Pro Gly His Leu Arg Leu Phe Val Thr Arg Ile Met Gln Glu Phe Glu

130 135 140

agt gac acg ttt ttc cca gaa att gat ttg ggg aaa tat aaa ctt ctc 480

Ser Asp Thr Phe Phe Pro Glu Ile Asp Leu Gly Lys Tyr Lys Leu Leu

145 150 155 160

cca gaa tac cca ggc gtc ctc tct gag gtc cag gag gaa aaa ggc atc 528

Pro Glu Tyr Pro Gly Val Leu Ser Glu Val Gln Glu Glu Lys Gly Ile

165 170 175

aag tat aag ttt gaa gtc tac gag aag aaa gac taa 564

Lys Tyr Lys Phe Glu Val Tyr Glu Lys Lys Asp

180 185

<210>74

<211>187

<212>PRT

<213>Artificial

<220>

<223>Synthetic Construct

<400>74

Met Val Arg Pro Leu Asn Cys Ile Val Ala Val Ser Gln Asn Met Gly

1 5 10 15

Ile Gly Lys Asn Gly Asp Leu Pro Trp Pro Pro Leu Arg Asn Glu Phe

20 25 30

Lys Tyr Phe Gln Arg Met Thr Thr Thr Ser Ser Val Glu Gly Lys Gln

35 40 45

Asn Leu Val Ile Met Gly Arg Lys Thr Trp Phe Ser Ile Pro Glu Lys

50 55 60

Asn Arg Pro Leu Lys Asp Arg Ile Asn Ile Val Leu Ser Arg Glu Leu

65 70 75 80

Lys Glu Pro Pro Arg Gly Ala His Phe Leu Ala Lys Ser Leu Asp Asp

85 90 95

Ala Leu Arg Leu Ile Glu Gln Pro Glu Leu Ala Ser Lys Val Asp Met

100 105 110

Val Trp Ile Val Gly Gly Ser Ser Val Tyr Gln Glu Ala Met Asn Gln

115 120 125

Pro Gly His Leu Arg Leu Phe Val Thr Arg Ile Met Gln Glu Phe Glu

130 135 140

Ser Asp Thr Phe Phe Pro Glu Ile Asp Leu Gly Lys Tyr Lys Leu Leu

145 150 155 160

Pro Glu Tyr Pro Gly Val Leu Ser Glu Val Gln Glu Glu Lys Gly Ile

165 170 175

Lys Tyr Lys Phe Glu Val Tyr Glu Lys Lys Asp

180 185

<210>75

<211>1143

<212>DNA

<213>Artificial

<220>

<223>wt hygromycin resistance gene

<220>

<221>CDS

<222>(1)..(1143)

<400>75

atg aaa aag cct gaa ctc acc gcg acg tct gtc gag aag ttt ctg atc 48

Met Lys Lys Pro Glu Leu Thr Ala Thr Ser Val Glu Lys Phe Leu Ile

1 5 10 15

gaa aag ttc gac agc gtc tcc gac ctg atg cag ctc tcg gag ggc gaa 96

Glu Lys Phe Asp Ser Val Ser Asp Leu Met Gln Leu Ser Glu Gly Glu

20 25 30

gaa tct cgt gct ttc agc ttc gat gta gga ggg cgt gga tat gtc ctg 144

Glu Ser Arg Ala Phe Ser Phe Asp Val Gly Gly Arg Gly Tyr Val Leu

35 40 45

cgg gta aat agc tgc gcc gat ggt ttc tac aaa gat cgt tat gtt tat 192

Arg Val Asn Ser Cys Ala Asp Gly Phe Tyr Lys Asp Arg Tyr Val Tyr

50 55 60

cgg cac ttt gca tcg gcc gcg ctc ccg att ccg gaa gtg ctt gac att 240

Arg His Phe Ala Ser Ala Ala Leu Pro Ile Pro Glu Val Leu Asp Ile

65 70 75 80

ggg gaa ttc agc gag agc ctg acc tat tgc atc tcc cgc cgt gca cag 288

Gly Glu Phe Ser Glu Ser Leu Thr Tyr Cys Ile Ser Arg Arg Ala Gln

85 90 95

ggt gtc acg ttg caa gac ctg cct gaa acc gaa ctg ccc gct gtt ctg 336

Gly Val Thr Leu Gln Asp Leu Pro Glu Thr Glu Leu Pro Ala Val Leu

100 105 110

cag ccg gtc gcg gag gcc atg gat gcg atc gct gcg gcc gat ctt agc 384

Gln Pro Val Ala Glu Ala Met Asp Ala Ile Ala Ala Ala Asp Leu Ser

115 120 125

cag acg agc ggg ttc ggc cca ttc gga ccg caa gga atc ggt caa tac 432

Gln Thr Ser Gly Phe Gly Pro Phe Gly Pro Gln Gly Ile Gly Gln Tyr

130 135 140

act aca tgg cgt gat ttc ata tgc gcg att gct gat ccc cat gtg tat 480

Thr Thr Trp Arg Asp Phe Ile Cys Ala Ile Ala Asp Pro His Val Tyr

145 150 155 160

cac tgg caa act gtg atg gac gac acc gtc agt gcg tcc gtc gcg cag 528

His Trp Gln Thr Val Met Asp Asp Thr Val Ser Ala Ser Val Ala Gln

165 170 175

gct ctc gat gag ctg atg ctt tgg gcc gag gac tgc ccc gaa gtc cgg 576

Ala Leu Asp Glu Leu Met Leu Trp Ala Glu Asp Cys Pro Glu Val Arg

180 185 190

cac ctc gtg cac gcg gat ttc ggc tcc aac aat gtc ctg acg gac aat 624

His Leu Val His Ala Asp Phe Gly Ser Asn Asn Val Leu Thr Asp Asn

195 200 205

ggc cgc ata aca gcg gtc att gac tgg agc gag gcg atg ttc ggg gat 672

Gly Arg Ile Thr Ala Val Ile Asp Trp Ser Glu Ala Met Phe Gly Asp

210 215 220

tcc caa tac gag gtc gcc aac atc ttc ttc tgg agg ccg tgg ttg gct 720

Ser Gln Tyr Glu Val Ala Asn Ile Phe Phe Trp Arg Pro Trp Leu Ala

225 230 235 240

tgt atg gag cag cag acg cgc tac ttc gag cgg agg cat ccg gag ctt 768

Cys Met Glu Gln Gln Thr Arg Tyr Phe Glu Arg Arg His Pro Glu Leu

245 250 255

gca gga tcg ccg cgg ctc cgg gcg tat atg ctc cgc att ggt ctt gac 816

Ala Gly Ser Pro Arg Leu Arg Ala Tyr Met Leu Arg Ile Gly Leu Asp

260 265 270

caa ctc tat cag agc ttg gtt gac ggc aat ttc gat gat gca gct tgg 864

Gln Leu Tyr Gln Ser Leu Val Asp Gly Asn Phe Asp Asp Ala Ala Trp

275 280 285

gcg cag ggt cga tgc gac gca atc gtc cga tcc gga gcc ggg act gtc 912

Ala Gln Gly Arg Cys Asp Ala Ile Val Arg Ser Gly Ala Gly Thr Val

290 295 300

ggg cgt aca caa atc gcc cgc aga agc gcg gcc gtc tgg acc gat ggc 960

Gly Arg Thr Gln Ile Ala Arg Arg Ser Ala Ala Val Trp Thr Asp Gly

305 310 315 320

tgt gta gaa gta ctc gcc gat agt gga aac cga cgc ccc agc act cgt 1008

Cys Val Glu Val Leu Ala Asp Ser Gly Asn Arg Arg Pro Ser Thr Arg

325 330 335

ccg gag gca aag gaa ttc ggg aga tgg ggg agg cta act gaa aca cgg 1056

Pro Glu Ala Lys Glu Phe Gly Arg Trp Gly Arg Leu Thr Glu Thr Arg

340 345 350

aag gag aca ata ccg gaa gga acc cgc gct atg acg gca ata aaa aga 1104

Lys Glu Thr Ile Pro Glu Gly Thr Arg Ala Met Thr Ala Ile Lys Arg

355 360 365

cag aat aaa acg cac ggg tgt tgg gtc gtt tgt tca taa 1143

Gln Asn Lys Thr His Gly Cys Trp Val Val Cys Ser

370 375 380

<210>76

<211>380

<212>PRT

<213>Artificial

<220>

<223>Synthetic Construct

<400>76

Met Lys Lys Pro Glu Leu Thr Ala Thr Ser Val Glu Lys Phe Leu Ile

1 5 10 15

Glu Lys Phe Asp Ser Val Ser Asp Leu Met Gln Leu Ser Glu Gly Glu

20 25 30

Glu Ser Arg Ala Phe Ser Phe Asp Val Gly Gly Arg Gly Tyr Val Leu

35 40 45

Arg Val Asn Ser Cys Ala Asp Gly Phe Tyr Lys Asp Arg Tyr Val Tyr

50 55 60

Arg His Phe Ala Ser Ala Ala Leu Pro Ile Pro Glu Val Leu Asp Ile

65 70 75 80

Gly Glu Phe Ser Glu Ser Leu Thr Tyr Cys Ile Ser Arg Arg Ala Gln

85 90 95

Gly Val Thr Leu Gln Asp Leu Pro Glu Thr Glu Leu Pro Ala Val Leu

100 105 110

Gln Pro Val Ala Glu Ala Met Asp Ala Ile Ala Ala Ala Asp Leu Ser

115 120 125

Gln Thr Ser Gly Phe Gly Pro Phe Gly Pro Gln Gly Ile Gly Gln Tyr

130 135 140

Thr Thr Trp Arg Asp Phe Ile Cys Ala Ile Ala Asp Pro His Val Tyr

145 150 155 160

His Trp Gln Thr Val Met Asp Asp Thr Val Ser Ala Ser Val Ala Gln

165 170 175

Ala Leu Asp Glu Leu Met Leu Trp Ala Glu Asp Cys Pro Glu Val Arg

180 185 190

His Leu Val His Ala Asp Phe Gly Ser Asn Asn Val Leu Thr Asp Asn

195 200 205

Gly Arg Ile Thr Ala Val Ile Asp Trp Ser Glu Ala Met Phe Gly Asp

210 215 220

Ser Gln Tyr Glu Val Ala Asn Ile Phe Phe Trp Arg Pro Trp Leu Ala

225 230 235 240

Cys Met Glu Gln Gln Thr Arg Tyr Phe Glu Arg Arg His Pro Glu Leu

245 250 255

Ala Gly Ser Pro Arg Leu Arg Ala Tyr Met Leu Arg Ile Gly Leu Asp

260 265 270

Gln Leu Tyr Gln Ser Leu Val Asp Gly Asn Phe Asp Asp Ala Ala Trp

275 280 285

Ala Gln Gly Arg Cys Asp Ala Ile Val Arg Ser Gly Ala Gly Thr Val

290 295 300

Gly Arg Thr Gln Ile Ala Arg Arg Ser Ala Ala Val Trp Thr Asp Gly

305 310 315 320

Cys Val Glu Val Leu Ala Asp Ser Gly Asn Arg Arg Pro Ser Thr Arg

325 330 335

Pro Glu Ala Lys Glu Phe Gly Arg Trp Gly Arg Leu Thr Glu Thr Arg

340 345 350

Lys Glu Thr Ile Pro Glu Gly Thr Arg Ala Met Thr Ala Ile Lys Arg

355 360 365

Gln Asn Lys Thr His Gly Cys Trp Val Val Cys Ser

370 375 380

<210>77

<211>804

<212>DNA

<213>Artificial

<220>

<223>wt neomycin resistance gene

<220>

<221>CDS

<222>(1)..(804)

<400>77

atg gga tcg gcc att gaa caa gat gga ttg cac gca ggt tct ccg gcc 48

Met Gly Ser Ala Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala

1 5 10 15

gct tgg gtg gag agg cta ttc ggc tat gac tgg gca caa cag aca atc 96

Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile

20 25 30

ggc tgc tct gat gcc gcc gtg ttc cgg ctg tca gcg cag ggg cgc ccg 144

Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro

35 40 45

gtt ctt ttt gtc aag acc gac ctg tcc ggt gcc ctg aat gaa ctg cag 192

Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln

50 55 60

gac gag gca gcg cgg cta tcg tgg ctg gcc acg acg ggc gtt cct tgc 240

Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys

65 70 75 80

gca gct gtg ctc gac gtt gtc act gaa gcg gga agg gac tgg ctg cta 288

Ala Ala Val Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu

85 90 95

ttg ggc gaa gtg ccg ggg cag gat ctc ctg tca tct cac ctt gct cct 336

Leu Gly Glu Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro

100 105 110

gcc gag aaa gta tcc atc atg gct gat gca atg cgg cgg ctg cat acg 384

Ala Glu Lys Val Ser Ile Met Ala Asp Ala Met Arg Arg Leu His Thr

115 120 125

ctt gat ccg gct acc tgc cca ttc gac cac caa gcg aaa cat cgc atc 432

Leu Asp Pro Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile

130 135 140

gag cga gca cgt act cgg atg gaa gcc ggt ctt gtc gat cag gat gat 480

Glu Arg Ala Arg Thr Arg Met Glu Ala Gly Leu Val Asp Gln Asp Asp

145 150 155 160

ctg gac gaa gag cat cag ggg ctc gcg cca gcc gaa ctg ttc gcc agg 528

Leu Asp Glu Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg

165 170 175

ctc aag gcg cgc atg ccc gac ggc gat gat ctc gtc gtg acc cat ggc 576

Leu Lys Ala Arg Met Pro Asp Gly Asp Asp Leu Val Val Thr His Gly

180 185 190

gat gcc tgc ttg ccg aat atc atg gtg gaa aat ggc cgc ttt tct gga 624

Asp Ala Cys Leu Pro Asn Ile Met Val Glu Asn Gly Arg Phe Ser Gly

195 200 205

ttc atc gac tgt ggc cgg ctg ggt gtg gcg gac cgc tat cag gac ata 672

Phe Ile Asp Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile

210 215 220

gcg ttg gct acc cgt gat att gct gaa gag ctt ggc ggc gaa tgg gct 720

Ala Leu Ala Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala

225 230 235 240

gac cgc ttc ctc gtg ctt tac ggt atc gcc gct ccc gat tcg cag cgc 768

Asp Arg Phe Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg

245 250 255

atc gcc ttc tat cgc ctt ctt gac gag ttc ttc tga 804

Ile Ala Phe Tyr Arg Leu Leu Asp Glu Phe Phe

260 265

<210>78

<211>267

<212>PRT

<213>Artificial

<220>

<223>Synthetic Construct

<400>78

Met Gly Ser Ala Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala

1 5 10 15

Ala Trp Val Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile

20 25 30

Gly Cys Ser Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro

35 40 45

Val Leu Phe Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln

50 55 60

Asp Glu Ala Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys

65 70 75 80

Ala Ala Val Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu

85 90 95

Leu Gly Glu Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro

100 105 110

Ala Glu Lys Val Ser Ile Met Ala Asp Ala Met Arg Arg Leu His Thr

115 120 125

Leu Asp Pro Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile

130 135 140

Glu Arg Ala Arg Thr Arg Met Glu Ala Gly Leu Val Asp Gln Asp Asp

145 150 155 160

Leu Asp Glu Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg

165 170 175

Leu Lys Ala Arg Met Pro Asp Gly Asp Asp Leu Val Val Thr His Gly

180 185 190

Asp Ala Cys Leu Pro Asn Ile Met Val Glu Asn Gly Arg Phe Ser Gly

195 200 205

Phe Ile Asp Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile

210 215 220

Ala Leu Ala Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala

225 230 235 240

Asp Arg Phe Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg

245 250 255

Ile Ala Phe Tyr Arg Leu Leu Asp Glu Phe Phe

260 265

<210>79

<211>1121

<212>DNA

<213>Artificial

<220>

<223>wt glutamine synthetase gene (human)

<220>

<221>CDS

<222>(1)..(1119)

<400>79

atg acc acc tca gca agt tcc cac tta aat aaa ggc atc aag cag gtg 48

Met Thr Thr Ser Ala Ser Ser His Leu Asn Lys Gly Ile Lys Gln Val

1 5 10 15

tac atg tcc ctg cct cag ggt gag aaa gtc cag gcc atg tat atc tgg 96

Tyr Met Ser Leu Pro Gln Gly Glu Lys Val Gln Ala Met Tyr Ile Trp

20 25 30

atc gat ggt act gga gaa gga ctg cgc tgc aag acc cgg acc ctg gac 144

Ile Asp Gly Thr Gly Glu Gly Leu Arg Cys Lys Thr Arg Thr Leu Asp

35 40 45

agt gag ccc aag tgt gtg gaa gag ttg cct gag tgg aat ttc gat ggc 192

Ser Glu Pro Lys Cys Val Glu Glu Leu Pro Glu Trp Asn Phe Asp Gly

50 55 60

tcc agt act tta cag tct gag ggt tcc aac agt gac atg tat ctc gtg 240

Ser Ser Thr Leu Gln Ser Glu Gly Ser Asn Ser Asp Met Tyr Leu Val

65 70 75 80

cct gct gcc atg ttt cgg gac ccc ttc cgt aag gac cct aac aag ctg 288

Pro Ala Ala Met Phe Arg Asp Pro Phe Arg Lys Asp Pro Asn Lys Leu

85 90 95

gtg tta tgt gaa gtt ttc aag tac aat cga agg cct gca gag acc aat 336

Val Leu Cys Glu Val Phe Lys Tyr Asn Arg Arg Pro Ala Glu Thr Asn

100 105 110

ttg agg cac acc tgt aaa cgg ata atg gac atg gtg agc aac cag cac 384

Leu Arg His Thr Cys Lys Arg Ile Met Asp Met Val Ser Asn Gln His

115 120 125

ccc tgg ttt ggc atg gag cag gag tat acc ctc atg ggg aca gat ggg 432

Pro Trp Phe Gly Met Glu Gln Glu Tyr Thr Leu Met Gly Thr Asp Gly

130 135 140

cac ccc ttt ggt tgg cct tcc aac ggc ttc cca ggg ccc cag ggt cca 480

His Pro Phe Gly Trp Pro Ser Asn Gly Phe Pro Gly Pro Gln Gly Pro

145 150 155 160

tat tac tgt ggt gtg gga gca gac aga gcc tat ggc agg gac atc gtg 528

Tyr Tyr Cys Gly Val Gly Ala Asp Arg Ala Tyr Gly Arg Asp Ile Val

165 170 175

gag gcc cat tac cgg gcc tgc ttg tat gct gga gtc aag att gcg ggg 576

Glu Ala His Tyr Arg Ala Cys Leu Tyr Ala Gly Val Lys Ile Ala Gly

180 185 190

act aat gcc gag gtc atg cct gcc cag tgg gaa ttt cag att gga cct 624

Thr Asn Ala Glu Val Met Pro Ala Gln Trp Glu Phe Gln Ile Gly Pro

195 200 205

tgt gaa gga atc agc atg gga gat cat ctc tgg gtg gcc cgt ttc atc 672

Cys Glu Gly Ile Ser Met Gly Asp His Leu Trp Val Ala Arg Phe Ile

210 215 220

ttg cat cgt gtg tgt gaa gac ttt gga gtg ata gca acc ttt gat cct 720

Leu His Arg Val Cys Glu Asp Phe Gly Val Ile Ala Thr Phe Asp Pro

225 230 235 240

aag ccc att cct ggg aac tgg aat ggt gca ggc tgc cat acc aac ttc 768

Lys Pro Ile Pro Gly Asn Trp Asn Gly Ala Gly Cys His Thr Asn Phe

245 250 255

agc acc aag gcc atg cgg gag gag aat ggt ctg aag tac atc gag gag 816

Ser Thr Lys Ala Met Arg Glu Glu Asn Gly Leu Lys Tyr Ile Glu Glu

260 265 270

gcc att gag aaa cta agc aag cgg cac cag tac cac atc cgt gcc tat 864

Ala Ile Glu Lys Leu Ser Lys Arg His Gln Tyr His Ile Arg Ala Tyr

275 280 285

gat ccc aag gga ggc ctg gac aat gcc cga cgt cta act gga ttc cat 912

Asp Pro Lys Gly Gly Leu Asp Asn Ala Arg Arg Leu Thr Gly Phe His

290 295 300

gaa acc tcc aac atc aac gac ttt tct ggt ggt gta gcc aat cgt agc 960

Glu Thr Ser Asn Ile Asn Asp Phe Ser Gly Gly Val Ala Asn Arg Ser

305 310 315 320

gcc agc ata cgc att ccc cgg act gtt ggc cag gag aag aag ggt tac 1008

Ala Ser Ile Arg Ile Pro Arg Thr Val Gly Gln Glu Lys Lys Gly Tyr

325 330 335

ttt gaa gat cgt cgc ccc tct gcc aac tgc gac ccc ttt tcg gtg aca 1056

Phe Glu Asp Arg Arg Pro Ser Ala Asn Cys Asp Pro Phe Ser Val Thr

340 345 350

gaa gcc ctc atc cgc acg tgt ctt ctc aat gaa acc ggc gat gag ccc 1104

Glu Ala Leu Ile Arg Thr Cys Leu Leu Asn Glu Thr Gly Asp Glu Pro

355 360 365

ttc cag tac aaa aat ta 1121

Phe Gln Tyr Lys Asn

370

<210>80

<211>373

<212>PRT

<213>Artificial

<220>

<223>Synthetic Construct

<400>80

Met Thr Thr Ser Ala Ser Ser His Leu Asn Lys Gly Ile Lys Gln Val

1 5 10 15

Tyr Met Ser Leu Pro Gln Gly Glu Lys Val Gln Ala Met Tyr Ile Trp

20 25 30

Ile Asp Gly Thr Gly Glu Gly Leu Arg Cys Lys Thr Arg Thr Leu Asp

35 40 45

Ser Glu Pro Lys Cys Val Glu Glu Leu Pro Glu Trp Asn Phe Asp Gly

50 55 60

Ser Ser Thr Leu Gln Ser Glu Gly Ser Asn Ser Asp Met Tyr Leu Val

65 70 75 80

Pro Ala Ala Met Phe Arg Asp Pro Phe Arg Lys Asp Pro Asn Lys Leu

85 90 95

Val Leu Cys Glu Val Phe Lys Tyr Asn Arg Arg Pro Ala Glu Thr Asn

100 105 110

Leu Arg His Thr Cys Lys Arg Ile Met Asp Met Val Ser Asn Gln His

115 120 125

Pro Trp Phe Gly Met Glu Gln Glu Tyr Thr Leu Met Gly Thr Asp Gly

130 135 140

His Pro Phe Gly Trp Pro Ser Asn Gly Phe Pro Gly Pro Gln Gly Pro

145 150 155 160

Tyr Tyr Cys Gly Val Gly Ala Asp Arg Ala Tyr Gly Arg Asp Ile Val

165 170 175

Glu Ala His Tyr Arg Ala Cys Leu Tyr Ala Gly Val Lys Ile Ala Gly

180 185 190

Thr Asn Ala Glu Val Met Pro Ala Gln Trp Glu Phe Gln Ile Gly Pro

195 200 205

Cys Glu Gly Ile Ser Met Gly Asp His Leu Trp Val Ala Arg Phe Ile

210 215 220

Leu His Arg Val Cys Glu Asp Phe Gly Val Ile Ala Thr Phe Asp Pro

225 230 235 240

Lys Pro Ile Pro Gly Asn Trp Asn Gly Ala Gly Cys His Thr Asn Phe

245 250 255

Ser Thr Lys Ala Met Arg Glu Glu Asn Gly Leu Lys Tyr Ile Glu Glu

260 265 270

Ala Ile Glu Lys Leu Ser Lys Arg His Gln Tyr His Ile Arg Ala Tyr

275 280 285

Asp Pro Lys Gly Gly Leu Asp Asn Ala Arg Arg Leu Thr Gly Phe His

290 295 300

Glu Thr Ser Asn Ile Asn Asp Phe Ser Gly Gly Val Ala Asn Arg Ser

305 310 315 320

Ala Ser Ile Arg Ile Pro Arg Thr Val Gly Gln Glu Lys Lys Gly Tyr

325 330 335

Phe Glu Asp Arg Arg Pro Ser Ala Asn Cys Asp Pro Phe Ser Val Thr

340 345 350

Glu Ala Leu Ile Arg Thr Cys Leu Leu Asn Glu Thr Gly Asp Glu Pro

355 360 365

Phe Gln Tyr Lys Asn

370

<210>81

<211>154

<212>DNA

<213>Artificial

<220>

<223>combined synthetic polyadenylation sequence and pausing signal

from the human alpha2 globin gene

<220>

<221>synthetic polyadenylation sequence

<222>(1)..(49)

<220>

<221>cloning site

<222>(50)..(62)

<220>

<221>pausing signal from the human alpha2 globin gene

<222>(63)..(154)

<400>81

aataaaatat ctttattttc attacatctg tgtgttggtt ttttgtgtga atcgatagta 60

ctaacatacg ctctccatca aaacaaaacg aaacaaaaca aactagcaaa ataggctgtc 120

cccagtgcaa gtgcaggtgc cagaacattt ctct 154

<210>82

<211>596

<212>DNA

<213>Artificial

<220>

<223>IRES sequence

<400>82

gcccctctcc ctcccccccc cctaacgtta ctggccgaag ccgcttggaa taaggccggt 60

gtgcgtttgt ctatatgtga ttttccacca tattgccgtc ttttggcaat gtgagggccc 120

ggaaacctgg ccctgtcttc ttgacgagca ttcctagggg tctttcccct ctcgccaaag 180

gaatgcaagg tctgttgaat gtcgtgaagg aagcagttcc tctggaagct tcttgaagac 240

aaacaacgtc tgtagcgacc ctttgcaggc agcggaaccc cccacctggc gacaggtgcc 300

tctgcggcca aaagccacgt gtataagata cacctgcaaa ggcggcacaa ccccagtgcc 360

acgttgtgag ttggatagtt gtggaaagag tcaaatggct ctcctcaagc gtattcaaca 420

aggggctgaa ggatgcccag aaggtacccc attgtatggg atctgatctg gggcctcggt 480

gcacatgctt tacatgtgtt tagtcgaggt taaaaaaacg tctaggcccc ccgaaccacg 540

gggacgtggt tttcctttga aaaacacgat gataagcttg ccacaacccc gggata 596

Claims

1. A DNA molecule comprising a polycistronic transcription unit comprising at least one coding sequence for each of:

i) a polypeptide of interest, and

ii) a selectable marker polypeptide functional in a eukaryotic host cell,

wherein the polypeptide of interest has a translation initiation sequence that is independent of the translation initiation sequence of the selectable marker polypeptide,

wherein the at least one coding sequence for the polypeptide of interest is located upstream of the at least one coding sequence for the selectable marker polypeptide in the polycistronic transcription unit,

wherein an Internal Ribosome Entry Site (IRES) is located downstream of at least one coding sequence for a polypeptide of interest and upstream of at least one coding sequence for a selectable marker polypeptide, and

characterized in that the coding sequence encoding the selectable marker polypeptide comprises a translation initiation sequence selected from the group consisting of:

a) a GTG start codon;

b) a TTG start codon;

c) a CTG start codon;

d) an ATT start codon; and

e) an ACG start codon.
2. The DNA molecule of claim 1, wherein the translation initiation sequence of the selectable marker polypeptide comprises a GTG start codon or a TTG start codon.
3. The DNA molecule of claim 1 or 2, wherein the selectable marker polypeptide provides resistance to the lethal or growth-inhibitory effects of a selective agent.
4. The DNA molecule of claim 3, wherein the selective agent is selected from the group consisting of: zeocin, puromycin, blasticidin, hygromycin, neomycin, methotrexate, methionine sulphoximine and kanamycin.
5. The DNA molecule of claim 3, wherein said selective agent is zeocin.
6. The DNA molecule of claim 1 or 2, wherein the selectable marker polypeptide is a 5,6,7,8-tetrahydrofolate synthesizing enzyme.
7. The DNA molecule of claim 1 or 2, wherein said polycistronic transcription unit further comprises a sequence encoding a second selectable marker polypeptide functional in eukaryotic cells, wherein said sequence encoding the second selectable marker polypeptide:

a) has a translation initiation sequence that is independent of the translation initiation sequence of the polypeptide of interest,

b) is located upstream of said sequence encoding the polypeptide of interest,

c) has no ATG sequence on the coding strand after the start codon of said second selectable marker polypeptide up to the start codon of the polypeptide of interest, and

d) has a GTG start codon or a TTG start codon.
8. An expression cassette comprising a DNA molecule according to any one of claims 1 to 7, said expression cassette comprising a promoter upstream of said polycistronic transcription unit and a transcription termination sequence downstream of said polycistronic transcription unit, wherein said expression cassette is functional in a eukaryotic host cell and is capable of initiating transcription of the polycistronic transcription unit.
9. The expression cassette of claim 8, further comprising at least one chromatin control element selected from the group consisting of: a matrix or scaffold attachment region, an insulator sequence, a ubiquitous chromatin opening element, and an anti-repressor sequence.
10. A host cell comprising the DNA molecule of any one of claims 1 to 7 or the expression cassette of any one of claims 8 to 9, wherein the host cell is a mammalian cell.
11. The host cell of claim 10, wherein the cell is a CHO cell.
12. A method of producing a host cell capable of expressing a polypeptide of interest, the method comprising:

a) introducing a DNA molecule according to any one of claims 1 to 7 or an expression cassette according to any one of claims 8 to 9 into a plurality of precursor cells;

b) culturing the plurality of precursor cells under conditions suitable for expression of the selectable marker polypeptide; and

c) selecting at least one host cell expressing the polypeptide of interest.
13. A method of expressing a polypeptide of interest comprising culturing a host cell comprising the expression cassette of any one of claims 8-9 and expressing the polypeptide of interest from the expression cassette.
14. The method of claim 13, further comprising harvesting the polypeptide of interest.
15. The method of claim 13 or 14, wherein the host cell is a CHO cell having a dhfr⁻ phenotype, and wherein the expression cassette comprises a coding sequence for a selectable marker polypeptide which is a 5,6,7,8-tetrahydrofolate synthesizing enzyme, wherein the cell is cultured in a medium that comprises folate and is free of hypoxanthine and thymidine.
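
For illustration only, the two sequence-level conditions recited in the claims (a non-ATG start codon on the selectable marker coding sequence, and the absence of any ATG on the coding strand between the start codon of an upstream second marker and the start codon of the polypeptide of interest) can be checked with a short script. This is a minimal sketch and not part of the claimed subject matter: the function names, the coordinate convention (0-based positions of the first base of each start codon) and the toy sequences are assumptions introduced here and are not taken from the sequence listing.

```python
# Start codons listed in claim 1, items a) to e).
CLAIMED_START_CODONS = {"GTG", "TTG", "CTG", "ATT", "ACG"}


def has_claimed_start_codon(marker_cds: str) -> bool:
    """Return True if the marker coding sequence begins with one of the
    non-ATG start codons listed in claim 1."""
    return marker_cds[:3].upper() in CLAIMED_START_CODONS


def atg_free_between(coding_strand: str, marker_start: int, poi_start: int) -> bool:
    """Return True if no ATG occurs, in any frame, after the marker start
    codon and before the start codon of the polypeptide of interest.

    marker_start and poi_start are 0-based positions of the first base of
    each start codon (an assumed convention for this sketch).
    """
    region = coding_strand[marker_start + 3:poi_start].upper()
    return "ATG" not in region


# Hypothetical toy sequences, for demonstration only.
toy_ok = "GTGGCCAAGTTGACCTAAATGAAA"   # GTG-initiated marker, then ATG of the polypeptide of interest
toy_bad = "GTGGCCATGTTGACCTAAATGAAA"  # contains an internal ATG inside the marker region
poi_start = 18                        # position of the downstream ATG in both toy sequences

print(has_claimed_start_codon(toy_ok))          # True
print(has_claimed_start_codon("ATGGCCAAG"))     # False
print(atg_free_between(toy_ok, 0, poi_start))   # True
print(atg_free_between(toy_bad, 0, poi_start))  # False
```

The substring search covers all three reading frames, which matches the wording of claim 7 c) that no ATG sequence is present on the coding strand, rather than only in-frame ATG codons.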