AU770871B2 - Non-endogenous, constitutively activated human G protein-coupled receptors - Google Patents

Non-endogenous, constitutively activated human G protein-coupled receptors

Info

Publication number
AU770871B2
Authority
AU
Australia
Prior art keywords
leu
ala
val
ser
ile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU62991/99A
Other versions
AU6299199A (en
Inventor
Dominic P. Behan
Derek T. Chalmers
Chen W. Liaw
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Arena Pharmaceuticals Inc
Original Assignee
Arena Pharmaceuticals Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/170,496 external-priority patent/US6555339B1/en
Application filed by Arena Pharmaceuticals Inc filed Critical Arena Pharmaceuticals Inc
Priority claimed from PCT/US1999/024065 external-priority patent/WO2000022131A2/en
Publication of AU6299199A publication Critical patent/AU6299199A/en
Application granted granted Critical
Publication of AU770871B2 publication Critical patent/AU770871B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Landscapes

  • Peptides Or Proteins (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Description

NON-ENDOGENOUS CONSTITUTIVELY ACTIVATED HUMAN G PROTEIN-COUPLED RECEPTORS
This patent application claims priority from U.S. Patent Number 6,555,339, filed with the U.S. Patent and Trademark Office as USSN 09/170,496 on October 13, 1998; U.S. Serial Number 09/417,044, filed on October 12, 1999; and U.S. Serial Number 09/416,760, filed on October 12, 1999. This application also claims the benefit of priority from the following provisional applications, all filed via U.S. Express Mail with the United States Patent and Trademark Office on the indicated dates: U.S. Provisional Number 60/110,060, filed November 27, 1998; U.S. Provisional Number 60/120,416, filed February 16, 1999; U.S. Provisional Number 60/121,852, filed February 26, 1999; U.S. Provisional Number 60/109,213, filed November 1998; U.S. Provisional Number 60/123,944, filed March 12, 1999; U.S. Provisional Number 60/123,945, filed March 12, 1999; U.S. Provisional Number 60/123,948, filed March 12, 1999; U.S. Provisional Number 60/123,951, filed March 12, 1999; U.S. Provisional Number 60/123,946, filed March 12, 1999; U.S. Provisional Number 60/123,949, filed March 12, 1999; U.S. Provisional Number 60/152,524, filed September 3, 1999; U.S. Provisional Number 60/151,114, filed August 27, 1999; U.S. Provisional Number 60/108,029, filed November 12, 1998; U.S. Provisional Number 60/136,436, filed May 28, 1999; U.S. Provisional Number 60/136,439, filed May 28, 1999; U.S. Provisional Number 60/136,567, filed May 28, 1999; U.S. Provisional Number 60/137,127, filed May 28, 1999; U.S. Provisional Number 60/137,131, filed May 28, 1999; U.S. Provisional Number 60/141,448, filed June 29, 1999; U.S. Provisional Number 60/136,437, filed May 28, 1999; U.S. Provisional Number 60/156,633, filed September 29, 1999; U.S. Provisional Number 60/156,555, filed September 29, 1999; U.S. Provisional Number 60/156,634, filed September 29, 1999; U.S. Provisional Number 60/156,653, filed September 29, 1999; U.S. Provisional Number 60/157,280, filed October 1, 1999; U.S. Provisional Number 60/157,281, filed October 1, 1999; U.S. Provisional Number 60/157,282, filed October 1, 1999; U.S. Provisional Number 60/157,293, filed October 1, 1999; and U.S. Provisional Number 60/157,294, filed October 1, 1999. This application is also related to U.S. Serial Number 09/364,425, filed on July 30, 1999. Each of the foregoing applications is incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
The invention disclosed in this patent document relates to transmembrane receptors, and more particularly to human G protein-coupled receptors, and specifically to GPCRs that have been altered to establish or enhance constitutive activity of the receptor. Preferably, the altered GPCRs are used for the direct identification of candidate compounds as receptor agonists, inverse agonists or partial agonists having potential applicability as therapeutic agents.
BACKGROUND OF THE INVENTION
Although a number of receptor classes exist in humans, by far the most abundant and therapeutically relevant is represented by the G protein-coupled receptor (GPCR or GPCRs) class. It is estimated that there are some 100,000 genes within the human genome, and of these, approximately 2%, or 2,000 genes, are estimated to code for GPCRs. Receptors, including GPCRs, for which the endogenous ligand has been identified are referred to as "known" receptors, while receptors for which the endogenous ligand has not been identified are referred to as "orphan" receptors.
Nucleic acid encoding the native or wild-type GPR38 receptor was first described in isolated form by McKee et al., Genomics 46(3), 426-434, 1997. GPCRs represent an important area for the development of pharmaceutical products: from approximately 30 of the 100 known GPCRs, 60% of all prescription pharmaceuticals have been developed.
GPCRs share a common structural motif. All these receptors have seven sequences of between 22 and 24 hydrophobic amino acids that form seven alpha helices, each of which spans the membrane (each span is identified by number, i.e., transmembrane-1, transmembrane-2, etc.). The transmembrane helices are joined by strands of amino acids between transmembrane-2 and transmembrane-3, transmembrane-4 and transmembrane-5, and transmembrane-6 and transmembrane-7 on the exterior, or "extracellular" side, of the cell membrane (these are referred to as "extracellular" regions 1, 2 and 3 (EC-1, EC-2 and EC-3), respectively). The transmembrane helices are also joined by strands of amino acids between transmembrane-1 and transmembrane-2, transmembrane-3 and transmembrane-4, and transmembrane-5 and transmembrane-6 on the interior, or "intracellular" side, of the cell membrane (these are referred to as "intracellular" regions 1, 2 and 3 (IC-1, IC-2 and IC-3), respectively). The "carboxy" terminus of the receptor lies in the intracellular space within the cell, and the "amino" terminus of the receptor lies in the extracellular space outside of the cell.
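The topology described above can be sketched as a small data structure. This is only an illustrative Python sketch: the function names `gpcr_topology` and `loop_between` and the "TM-n" labels are assumptions introduced here, not terms from this patent document. It walks from the amino terminus to the carboxy terminus and alternates intracellular and extracellular loops between consecutive helices.

```python
def gpcr_topology(n_helices=7):
    """Return ordered (segment, location) pairs from amino to carboxy terminus.

    Hypothetical helper: loops after odd-numbered helices are intracellular
    (IC-1, IC-2, IC-3); loops after even-numbered helices are extracellular
    (EC-1, EC-2, EC-3), matching the description in the text.
    """
    segments = [("N-terminus", "extracellular")]
    ic = ec = 0
    for i in range(1, n_helices + 1):
        segments.append((f"TM-{i}", "membrane"))
        if i < n_helices:
            if i % 2 == 1:
                ic += 1
                segments.append((f"IC-{ic}", "intracellular"))
            else:
                ec += 1
                segments.append((f"EC-{ec}", "extracellular"))
    segments.append(("C-terminus", "intracellular"))
    return segments


def loop_between(a, b):
    """Name the loop joining consecutive helices TM-a and TM-b (b must equal a + 1)."""
    if b != a + 1:
        raise ValueError("helices must be consecutive")
    names = [name for name, _ in gpcr_topology()]
    return names[names.index(f"TM-{a}") + 1]


if __name__ == "__main__":
    for name, location in gpcr_topology():
        print(f"{name:<12}{location}")
```

For instance, `loop_between(5, 6)` names the third intracellular loop, the region that the text later associates with G protein interaction.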
Generally, when an endogenous ligand binds with the receptor (often referred to as "activation" of the receptor), there is a change in the conformation of the intracellular region that allows for coupling between the intracellular region and an intracellular "G-protein." It has been reported that GPCRs are "promiscuous" with respect to G proteins, i.e., that a GPCR can interact with more than one G protein. See, Kenakin, 43 Life Sciences 1095 (1988). Although other G proteins exist, currently, Gq, Gs, Gi, Gz and Go are G proteins that have been identified. Endogenous ligand-activated GPCR coupling with the G-protein begins a signaling cascade process (referred to as "signal transduction"). Under normal conditions, signal transduction ultimately results in cellular activation or cellular inhibition. It is thought that the IC-3 loop as well as the carboxy terminus of the receptor interact with the G protein.
Under physiological conditions, GPCRs exist in the cell membrane in equilibrium between two different conformations: an "inactive" state and an "active" state. A receptor in an inactive state is unable to link to the intracellular signal transduction pathway to produce a biological response. Changing the receptor conformation to the active state allows linkage to the transduction pathway (via the G-protein) and produces a biological response.
A receptor may be stabilized in an active state by an endogenous ligand or a compound such as a drug. Recent discoveries, including but not exclusively limited to modifications to the amino acid sequence of the receptor, provide means other than endogenous ligands or drugs to promote and stabilize the receptor in the active state conformation. These means effectively stabilize the receptor in an active state by simulating the effect of an endogenous ligand binding to the receptor. Stabilization by such ligand-independent means is termed "constitutive receptor activation."
SUMMARY OF THE INVENTION
Disclosed herein are non-endogenous versions of endogenous, human GPCRs and uses thereof.
In its endogenous form, GPR38 is not constitutively active, i.e. GPR38 signaling via G protein is ligand-dependent. Thus, it is not feasible to search directly for inverse agonists of endogenous GPR38. Accordingly, the present inventors sought to employ a mutation approach to identify a constitutively activated version of GPR38 to permit screening of candidate compounds against the non-endogenous, constitutively activated version of GPR38.
One embodiment of the present invention provides an isolated polynucleotide encoding a non-endogenous, constitutively activated version of a human G protein-coupled receptor (GPCR), in particular a polynucleotide encoding a non-endogenous, constitutively activated version of the human G protein-coupled receptor designated GPR38.
Preferably, the isolated polynucleotide of the present invention comprises a nucleotide sequence selected from the group consisting of: (a) a sequence encoding a polypeptide that comprises the amino acid sequence set forth in SEQ ID NO: 130; (b) the sequence set forth in SEQ ID NO: 129; (c) a sequence having at least about 80% identity to SEQ ID NO: 129 other than a sequence encoding a non-endogenous, constitutively activated version of a human G protein-coupled receptor comprising a valine residue at position 297 of SEQ ID NO: 130; and (d) the sequence of (c) wherein the constitutively activated version of a human G protein-coupled receptor comprises an amino acid sequence having a lysine residue at a position equivalent to position 297 of SEQ ID NO: 130.
Even more preferably, the isolated polynucleotide comprises a nucleotide sequence selected from the group consisting of: a sequence that is identical or substantially identical to SEQ ID NO: 129 wherein the codon at nucleotide positions 889-891 encoding lysine is unchanged or substituted with a codon that encodes an amino acid other than valine; a sequence encoding a constitutively activated version of a human G protein-coupled receptor having an amino acid sequence identical or substantially identical to SEQ ID NO: 130 wherein the lysine residue at amino acid position 297 is unchanged or substituted with an amino acid other than valine; and a sequence encoding a variant of a non-endogenous, constitutively activated version of a human G protein-coupled receptor comprising the amino acid sequence set forth in SEQ ID NO: 130 in which the lysine residue at position 297 is substituted with a different amino acid other than valine.
In a particularly preferred embodiment, the isolated polynucleotide of the present invention comprises a nucleotide sequence selected from the group consisting of: a sequence encoding the amino acid sequence set forth in SEQ ID NO: 130; and the nucleotide sequence set forth in SEQ ID NO: 129.
For the purposes of nomenclature, the nucleotide and amino acid sequences set forth in SEQ ID NO: 129 and SEQ ID NO: 130, respectively, relate to the non-endogenous, constitutively activated version of the human G protein-coupled receptor designated GPR38.
A further embodiment of the present invention provides an isolated polynucleotide encoding a GPCR fusion protein, wherein said polynucleotide comprises a nucleotide sequence of an isolated polynucleotide encoding a GPCR, preferably fused to a nucleotide sequence encoding a G protein, in particular a Gs protein.
A further embodiment of the present invention provides a vector, such as an expression vector, comprising a nucleotide sequence encoding a GPCR, preferably operably linked to a promoter. In one embodiment, the nucleotide sequence encoding the GPCR is fused to a nucleotide sequence encoding a G protein, in particular a Gs protein such that a GPCR fusion protein is capable of being expressed from said expression vector.
A further embodiment of the present invention provides a recombinant host cell comprising the vector or expression vector of the present invention.
A further embodiment of the present invention provides a method of producing a non-endogenous, constitutively activated version of a human G protein-coupled receptor or a GPCR fusion protein comprising the steps of: transfecting the expression vector of the present invention into a host cell thereby producing a transfected host cell; and culturing the transfected host cell under conditions sufficient to express a non-endogenous, constitutively activated version of a human G protein-coupled receptor or GPCR fusion protein from the expression vector.
In one preferred embodiment, the method further comprises obtaining the transfected host cell and preferably, further comprises obtaining or isolating a membrane fraction from the transfected host cell.
A further embodiment of the present invention provides an isolated membrane of a transfected host cell wherein said isolated membrane comprises a non-endogenous, constitutively activated version of said human G protein-coupled receptor or a GPCR fusion protein encoded by an isolated polynucleotide of the invention.
A further embodiment of the present invention provides an isolated or recombinant non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide comprising an amino acid sequence selected from the group consisting of: (a) a sequence comprising the amino acid sequence set forth in SEQ ID NO: 130; (b) a sequence having at least about 80% identity to SEQ ID NO: 130 other than a sequence comprising a valine residue at position 297 of SEQ ID NO: 130; and (c) the sequence of (b) wherein said sequence comprises a lysine residue at a position equivalent to position 297 of SEQ ID NO: 130.
Preferably, the isolated or recombinant non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide comprises an amino acid sequence selected from the group consisting of: a sequence that is substantially identical to SEQ ID NO:130 wherein the lysine residue at amino acid position 297 is unchanged or substituted with an amino acid other than valine; and a sequence comprising the amino acid sequence set forth in SEQ ID NO: 130 in which the lysine residue at position 297 is substituted for a different amino acid other than valine.
In a particularly preferred embodiment, the isolated or recombinant non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 130.
A further embodiment of the present invention provides an isolated or recombinant GPCR fusion protein comprising an amino acid sequence of the isolated or recombinant non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide. Preferably, the GPCR fusion protein comprises a G protein, in particular a Gs protein.
A further embodiment of the present invention provides a method of identifying a modulator of a G protein-coupled receptor. In one embodiment, the method comprises the steps of: contacting a candidate compound with a recombinant host cell that expresses the non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide of the invention or an isolated membrane comprising said non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide; and measuring the ability of the compound to inhibit or stimulate functionality of the G protein-coupled receptor polypeptide wherein inhibition or stimulation of said functionality indicates that the candidate compound is a modulator of the G protein-coupled receptor polypeptide.
In an alternative embodiment, the method of identifying a modulator of a G protein-coupled receptor comprises the steps of: contacting a candidate compound with a recombinant host cell that expresses the GPCR fusion protein of the invention or an isolated membrane comprising said GPCR fusion protein; and measuring the ability of the compound to inhibit or stimulate functionality of the G protein-coupled receptor polypeptide portion of said GPCR fusion protein wherein inhibition or stimulation of said functionality indicates that the candidate compound is a modulator of the G protein-coupled receptor polypeptide.
Preferably, the method of identifying a modulator of a G protein-coupled receptor further comprises providing the host cell or membrane expressing/comprising the GPCR polypeptide or GPCR fusion protein.
In a particularly preferred embodiment, the present invention provides a method of identifying a modulator of a G protein-coupled receptor comprising the steps of: providing a recombinant host cell that expresses a GPCR fusion protein comprising the amino acid sequence set forth in SEQ ID NO: 130 and a Gs protein or an isolated membrane comprising said GPCR fusion protein; contacting a candidate compound with the recombinant host cell or isolated membrane; and measuring the ability of the compound to inhibit or stimulate functionality of the G protein-coupled receptor polypeptide portion of said GPCR fusion protein wherein inhibition or stimulation of said functionality indicates that the candidate compound is a modulator of the G protein-coupled receptor polypeptide.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a representation of 8XCRE-Luc reporter plasmid (see, Example 4(c)3.)
FIGS. 2A and 2B are graphic representations of the results of ATP and ADP binding to endogenous TDAG8 (2A) and comparisons in serum and serum free media (2B).
FIG. 3 is a graphic representation of the comparative signaling results of CMV versus the GPCR Fusion Protein H9(F236K):Gsα.
FIG. 4 is a graphic representation showing a comparative analysis of endogenous GPR38 relative to non-endogenous constitutively-activated GPR38 ("V297K"). The control is designated "CMV".
DETAILED DESCRIPTION
The scientific literature that has evolved around receptors has adopted a number of terms to refer to ligands having various effects on receptors. For clarity and consistency, the following definitions will be used throughout this patent document. To the extent that these definitions conflict with other definitions for these terms, the following definitions shall control:
AGONISTS shall mean materials (e.g., ligands, candidate compounds) that activate the intracellular response when they bind to the receptor, or enhance GTP binding to membranes.
AMINO ACID ABBREVIATIONS used herein are set out in Table A:
TABLE A
ALANINE          ALA    A
ARGININE         ARG    R
ASPARAGINE       ASN    N
ASPARTIC ACID    ASP    D
CYSTEINE         CYS    C
GLUTAMIC ACID    GLU    E
GLUTAMINE        GLN    Q
GLYCINE          GLY    G
HISTIDINE        HIS    H
ISOLEUCINE       ILE    I
LEUCINE          LEU    L
LYSINE           LYS    K
METHIONINE       MET    M
PHENYLALANINE    PHE    F
PROLINE          PRO    P
SERINE           SER    S
THREONINE        THR    T
TRYPTOPHAN       TRP    W
TYROSINE         TYR    Y
VALINE           VAL    V
PARTIAL AGONISTS shall mean materials (e.g., ligands, candidate compounds) that activate the intracellular response when they bind to the receptor to a lesser degree/extent than do agonists, or enhance GTP binding to membranes to a lesser degree/extent than do agonists.
ANTAGONIST shall mean materials (e.g., ligands, candidate compounds) that competitively bind to the receptor at the same site as the agonists but which do not activate the intracellular response initiated by the active form of the receptor, and can thereby inhibit the intracellular responses by agonists or partial agonists. ANTAGONISTS do not diminish the baseline intracellular response in the absence of an agonist or partial agonist.
CANDIDATE COMPOUND shall mean a molecule (for example, and not limitation, a chemical compound) that is amenable to a screening technique. Preferably, the phrase "candidate compound" does not include compounds which were publicly known to be compounds selected from the group consisting of inverse agonist, agonist or antagonist to a receptor, as previously determined by an indirect identification process ("indirectly identified compound"); more preferably, not including an indirectly identified compound which has previously been determined to have therapeutic efficacy in at least one mammal; and, most preferably, not including an indirectly identified compound which has previously been determined to have therapeutic utility in humans.
COMPOSITION means a material comprising at least one component; a "pharmaceutical composition" is an example of a composition.
COMPOUND EFFICACY shall mean a measurement of the ability of a compound to inhibit or stimulate receptor functionality, as opposed to receptor binding affinity.
Exemplary means of detecting compound efficacy are disclosed in the Example section of this patent document.
CODON shall mean a grouping of three nucleotides (or equivalents to nucleotides) which generally comprise a nucleoside (adenosine (A), guanosine (G), cytidine (C), uridine (U) and thymidine (T)) coupled to a phosphate group and which, when translated, encodes an amino acid.
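The relationship between an amino acid position and its codon can be illustrated with a short Python sketch. It assumes the open reading frame is numbered from nucleotide 1 at the first base of the first codon, which is consistent with the pairing of nucleotide positions 889-891 with amino acid position 297 stated in the summary. The helper names, the partial codon table and the nine-base toy sequence are hypothetical and are not taken from SEQ ID NO: 129.

```python
# Partial excerpt of the standard genetic code; only the codons used below.
CODON_TABLE = {"ATG": "M", "GTG": "V", "AAG": "K", "TTC": "F"}


def codon_span(residue):
    """Return the 1-based, inclusive nucleotide positions of a residue's codon."""
    start = 3 * (residue - 1) + 1
    return start, start + 2


def parse_substitution(label):
    """Split a label such as 'V297K' into (original, position, replacement)."""
    return label[0], int(label[1:-1]), label[-1]


def substitute_codon(orf, residue, new_codon):
    """Return a copy of the ORF with the codon for `residue` replaced."""
    start, end = codon_span(residue)
    return orf[:start - 1] + new_codon + orf[end:]


def translate(orf):
    """Translate an ORF using the partial table above (toy sequences only)."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


if __name__ == "__main__":
    print(codon_span(297))              # codon positions for residue 297
    toy = "ATGGTGTTC"                   # hypothetical ORF: Met-Val-Phe
    mutated = substitute_codon(toy, 2, "AAG")
    print(translate(toy), translate(mutated))
```

A single codon change of this kind is the sort of alteration that labels such as "V297K" denote: the residue before the number is replaced by the residue after it.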
CONSTITUTIVELY ACTIVATED RECEPTOR shall mean a receptor subject to constitutive receptor activation. A constitutively activated receptor can be endogenous or nonendogenous.
CONSTITUTIVE RECEPTOR ACTIVATION shall mean stabilization of a receptor in the active state by means other than binding of the receptor with its endogenous ligand or a chemical equivalent thereof.
CONTACT or CONTACTING shall mean bringing at least two moieties together, whether in an in vitro system or an in vivo system.
DIRECTLY IDENTIFYING or DIRECTLY IDENTIFIED, in relationship to the phrase "candidate compound", shall mean the screening of a candidate compound against a constitutively activated receptor, preferably a constitutively activated orphan receptor, and most preferably against a constitutively activated G protein-coupled cell surface orphan receptor, and assessing the compound efficacy of such compound. This phrase is, under no circumstances, to be interpreted or understood to be encompassed by or to encompass the phrase "indirectly identifying" or "indirectly identified."
ENDOGENOUS shall mean a material that a mammal naturally produces. ENDOGENOUS in reference to, for example and not limitation, the term "receptor," shall mean that which is naturally produced by a mammal (for example, and not limitation, a human) or a virus. By contrast, the term NON-ENDOGENOUS in this context shall mean that which is not naturally produced by a mammal (for example, and not limitation, a human) or a virus. For example, and not limitation, a receptor which is not constitutively active in its endogenous form, but when manipulated becomes constitutively active, is most preferably referred to herein as a "non-endogenous, constitutively activated receptor." Both terms can be utilized to describe both "in vivo" and "in vitro" systems. For example, and not limitation, in a screening approach, the endogenous or non-endogenous receptor may be in reference to an in vitro screening system. As a further example and not limitation, where the genome of a mammal has been manipulated to include a non-endogenous constitutively activated receptor, screening of a candidate compound by means of an in vivo system is viable.
G PROTEIN COUPLED RECEPTOR FUSION PROTEIN and GPCR FUSION PROTEIN, in the context of the invention disclosed herein, each mean a non-endogenous protein comprising an endogenous, constitutively active GPCR or a non-endogenous, constitutively activated GPCR fused to at least one G protein, most preferably the alpha (α) subunit of such G protein (this being the subunit that binds GTP), with the G protein preferably being of the same type as the G protein that naturally couples with the endogenous orphan GPCR. For example, and not limitation, in an endogenous state, if the G protein "Gsα" is the predominant G protein that couples with the GPCR, a GPCR Fusion Protein based upon the specific GPCR would be a non-endogenous protein comprising the GPCR fused to Gsα; in some circumstances, as will be set forth below, a non-predominant G protein can be fused to the GPCR. The G protein can be fused directly to the C-terminus of the constitutively active GPCR or there may be spacers between the two.
HOST CELL shall mean a cell capable of having a Plasmid and/or Vector incorporated therein. In the case of a prokaryotic Host Cell, a Plasmid is typically replicated as an autonomous molecule as the Host Cell replicates (generally, the Plasmid is thereafter isolated for introduction into a eukaryotic Host Cell); in the case of a eukaryotic Host Cell, a Plasmid is integrated into the cellular DNA of the Host Cell such that when the eukaryotic Host Cell replicates, the Plasmid replicates. Preferably, for the purposes of the invention disclosed herein, the Host Cell is eukaryotic, more preferably, mammalian, and most preferably selected from the group consisting of 293, 293T and COS-7 cells.
INDIRECTLY IDENTIFYING or INDIRECTLY IDENTIFIED means the traditional approach to the drug discovery process involving identification of an endogenous ligand specific for an endogenous receptor, screening of candidate compounds against the receptor for determination of those which interfere and/or compete with the ligand-receptor interaction, and assessing the efficacy of the compound for affecting at least one second messenger pathway associated with the activated receptor.
INHIBIT or INHIBITING, in relationship to the term "response" shall mean that a response is decreased or prevented in the presence of a compound as opposed to in the absence of the compound.
INVERSE AGONISTS shall mean materials (e.g., ligand, candidate compound) which bind to either the endogenous form of the receptor or to the constitutively activated form of the receptor, and which inhibit the baseline intracellular response initiated by the active form of the receptor below the normal base level of activity which is observed in the absence of agonists or partial agonists, or decrease GTP binding to membranes. Preferably, the baseline intracellular response is inhibited in the presence of the inverse agonist by at least more preferably by at least 50%, and most preferably by at least 75%, as compared with the baseline response in the absence of the inverse agonist.
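The percentage comparison in this definition can be written out as a minimal Python sketch. The function names and the signal units are assumptions for illustration only. Only the 50% and 75% levels are modeled, because the lowest preferred percentage is not legible in the sentence above.

```python
def percent_inhibition(baseline, with_compound):
    """Percent decrease of the baseline response in the presence of a compound.

    Both arguments are readings of the same intracellular response (arbitrary
    units); `baseline` is the reading in the absence of the compound.
    """
    if baseline <= 0:
        raise ValueError("baseline response must be positive")
    return 100.0 * (baseline - with_compound) / baseline


def preference_level(pct):
    """Map a percent inhibition onto the two thresholds legible in the text."""
    if pct >= 75:
        return "at least 75%"
    if pct >= 50:
        return "at least 50%"
    return "below 50%"


if __name__ == "__main__":
    # Hypothetical readings: baseline 200 units, 50 units with the compound.
    pct = percent_inhibition(200, 50)
    print(pct, preference_level(pct))
```

A negative result from `percent_inhibition` would indicate that the response rose above baseline, which corresponds to stimulation rather than inverse agonism.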
KNOWN RECEPTOR shall mean an endogenous receptor for which the endogenous ligand specific for that receptor has been identified.
LIGAND shall mean an endogenous, naturally occurring molecule specific for an endogenous, naturally occurring receptor.
MUTANT or MUTATION in reference to an endogenous receptor's nucleic acid and/or amino acid sequence shall mean a specified change or changes to such endogenous sequences such that a mutated form of an endogenous, non-constitutively activated receptor evidences constitutive activation of the receptor. In terms of equivalents to specific sequences, a subsequent mutated form of a human receptor is considered to be equivalent to a first mutation of the human receptor if the level of constitutive activation of the subsequent mutated form of a human receptor is substantially the same as that evidenced by the first mutation of the receptor; and the percent sequence (amino acid and/or nucleic acid) homology between the subsequent mutated form of the receptor and the first mutation of the receptor is at least about 80%, more preferably at least about 90% and most preferably at least 95%. Ideally, and owing to the fact that the most preferred cassettes disclosed herein for achieving constitutive activation include a single amino acid and/or codon change between the endogenous and the non-endogenous forms of the GPCR, the percent sequence homology should be at least 98%.
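The sequence-similarity part of this equivalence test can be sketched in Python. This is an illustration under stated assumptions: "percent homology" is computed here as simple position-by-position identity between two already aligned sequences of equal length, which the text does not specify; the function names and the toy sequences are hypothetical. The other requirement, a substantially similar level of constitutive activation, is an experimental measurement and is not modeled.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def homology_level(pct):
    """Map a percentage onto the thresholds named in the definition."""
    if pct >= 98:
        return "ideal (at least 98%)"
    if pct >= 95:
        return "most preferred (at least 95%)"
    if pct >= 90:
        return "more preferred (at least about 90%)"
    if pct >= 80:
        return "minimum (at least about 80%)"
    return "below the 80% threshold"


if __name__ == "__main__":
    # Toy 300-residue sequences differing at one position (not real receptors).
    first = "A" * 296 + "V" + "A" * 3
    second = "A" * 296 + "K" + "A" * 3
    pct = percent_identity(first, second)
    print(round(pct, 2), homology_level(pct))
```

The toy case shows why a single amino acid change in a receptor of a few hundred residues leaves the percentage well above the 98% figure.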
NON-ORPHAN RECEPTOR shall mean an endogenous naturally occurring molecule specific for an endogenous naturally occurring ligand wherein the binding of a ligand to a receptor activates an intracellular signaling pathway.
ORPHAN RECEPTOR shall mean an endogenous receptor for which the endogenous ligand specific for that receptor has not been identified or is not known.
PHARMACEUTICAL COMPOSITION shall mean a composition comprising at least one active ingredient, whereby the composition is amenable to investigation for a specified, efficacious outcome in a mammal (for example, and not limitation, a human). Those of ordinary skill in the art will understand and appreciate the techniques appropriate for determining whether an active ingredient has a desired efficacious outcome based upon the needs of the artisan.
PLASMID shall mean the combination of a Vector and cDNA. Generally, a Plasmid is introduced into a Host Cell for the purposes of replication and/or expression of the cDNA as a protein.
STIMULATE or STIMULATING, in relationship to the term "response" shall mean that a response is increased in the presence of a compound as opposed to in the absence of the compound.
VECTOR in reference to cDNA shall mean a circular DNA capable of incorporating at least one cDNA and capable of incorporation into a Host Cell.
The order of the following sections is set forth for presentational efficiency and is not intended, nor should be construed, as a limitation on the disclosure or the claims to follow.
A. Introduction
The traditional study of receptors has always proceeded from the a priori assumption (historically based) that the endogenous ligand must first be identified before discovery could proceed to find antagonists and other molecules that could affect the receptor. Even in cases where an antagonist might have been known first, the search immediately extended to looking for the endogenous ligand. This mode of thinking has persisted in receptor research even after the discovery of constitutively activated receptors. What has not been heretofore recognized is that it is the active state of the receptor that is most useful for discovering agonists, partial agonists, and inverse agonists of the receptor. For those diseases which result from an overly active receptor or an under-active receptor, what is desired in a therapeutic drug is a compound which acts to diminish the active state of a receptor or enhance the activity of the receptor, respectively, not necessarily a drug which is an antagonist to the endogenous ligand.
This is because a compound that reduces or enhances the activity of the active receptor state need not bind at the same site as the endogenous ligand. Thus, as taught by a method of this invention, any search for therapeutic compounds should start by screening compounds against the ligand-independent active state.
B. Identification of Human GPCRs
The efforts of the Human Genome project have led to the identification of a plethora of information regarding nucleic acid sequences located within the human genome; it has been the case in this endeavor that genetic sequence information has been made available without an understanding or recognition as to whether or not any particular genomic sequence does or may contain open-reading frame information that translates into human proteins. Several methods of identifying nucleic acid sequences within the human genome are within the purview of those having ordinary skill in the art. For example, and not limitation, a variety of human GPCRs, disclosed herein, were discovered by reviewing the GenBank™ database, while other GPCRs were discovered by utilizing a nucleic acid sequence of a GPCR, previously sequenced, to conduct a BLAST™ search of the EST database. Table B, below, lists several endogenous GPCRs that we have discovered, along with a GPCR's respective homologous receptor.
TABLE B
Disclosed Human Orphan GPCRs | Accession Number Identified | Open Reading Frame (Base Pairs) | Per Cent Homology To Designated GPCR | Reference To Homologous GPCR (Accession No.)
hARE-3 | AL033379 | 1,260 bp | 52.3% LPA-R | U92642
hARE-4 | AC006087 | 1,119 bp | 36% P2Y5 | AF000546
hARE-5 | AC006255 | 1,104 bp | 32% Oryzias latipes | D43633
hGPR27 | AA775870 | 1,128 bp | |
hARE-1 | AI090920 | 999 bp | 43% KIAA0001 | D13626
hARE-2 | AA359504 | 1,122 bp | 53% GPR27 |
hPPR1 | H67224 | 1,053 bp | 39% EBI1 | L31581
hG2A | AA754702 | 1,113 bp | 31% GPR4 | L36148
hRUP3 | AL035423 | 1,005 bp | 30% Drosophila melanogaster | 2133653
hRUP4 | AI307658 | 1,296 bp | 32% pNPGPR | NP_004876
 | | | 28% and 29% Zebrafish Ya and Yb, respectively | AAC41276 and AAB94616
hRUP5 | AC005849 | 1,413 bp | 25% DEZ | Q99788
 | | | 23% FMLPR | P21462
hRUP6 | AC005871 | 1,245 bp | 48% GPR66 | NP_006047
hRUP7 | AC007922 | 1,173 bp | 43% H3R | AF140538
hCHN3 | EST 36581 | 1,113 bp | 53% GPR27 |
hCHN4 | AA804531 | 1,077 bp | 32% thrombin | 4503637
hCHN6 | EST 2134670 | 1,503 bp | 36% edg-1 | NP_001391
hCHN8 | EST 764455 | 1,029 bp | 47% KIAA0001 | D13626
hCHN9 | EST 1541536 | 1,077 bp | 41% LTB4R | NM_000752
hCHN10 | EST 1365839 | 1,055 bp | 35% P2Y | NM_002563

Receptor homology is useful in terms of gaining an appreciation of a role of the receptors within the human body. As the patent document progresses, we will disclose techniques for mutating these receptors to establish non-endogenous, constitutively activated versions of these receptors.
The techniques disclosed herein have also been applied to other human, orphan GPCRs known to the art, as will be apparent as the patent document progresses.
C. Receptor Screening Screening candidate compounds against a non-endogenous, constitutively activated version of the human GPCRs disclosed herein allows for the direct identification of candidate compounds which act at this cell surface receptor, without requiring use of the receptor's endogenous ligand. By determining areas within the body where the endogenous version of human GPCRs disclosed herein is expressed and/or over-expressed, it is possible to determine related disease/disorder states which are associated with the expression and/or over-expression of the receptor; such an approach is disclosed in this patent document.
The creation of a mutation that may evidence constitutive activation of the human GPCRs disclosed herein is based upon the distance from the proline residue that is presumed to be located within TM6 of the GPCR; this algorithmic technique is disclosed in co-pending and commonly assigned patent document U.S. Serial Number 09/170,496, incorporated herein by reference. The algorithmic technique is not predicated upon traditional sequence "alignment" but rather a specified distance from the aforementioned TM6 proline residue. By mutating the amino acid residue located 16 amino acid residues from this residue (presumably located in the IC3 region of the receptor) to, most preferably, a lysine residue, such activation may be obtained. Other amino acid residues may be useful in the mutation at this position to achieve this objective.
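The positional rule described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the function name and the toy sequence are hypothetical, the caller is assumed to already know the index of the TM6 proline, and the count of 16 residues is assumed to run toward the N-terminus because the IC3 region precedes TM6.

```python
def mutate_relative_to_tm6_proline(sequence, proline_index, offset=16, new_residue="K"):
    """Replace the residue `offset` positions N-terminal of the TM6 proline.

    sequence      -- one-letter amino acid sequence of the receptor
    proline_index -- 0-based index of the conserved TM6 proline (supplied by the caller)
    offset        -- distance from the proline; 16 is the value given in the text
    new_residue   -- replacement residue; lysine ("K") is the preferred choice
    """
    if sequence[proline_index] != "P":
        raise ValueError("residue at proline_index is not a proline")
    # Assumption: count toward the N-terminus, i.e. toward the IC3 side of TM6
    target = proline_index - offset
    if target < 0:
        raise ValueError("sequence is too short for the requested offset")
    original = sequence[target]
    mutated = sequence[:target] + new_residue + sequence[target + 1:]
    # Conventional label: original residue, 1-based position, new residue
    label = f"{original}{target + 1}{new_residue}"
    return mutated, label


# Toy sequence (not a real receptor): proline at 0-based index 20
toy = "M" + "A" * 19 + "P" + "L" * 5
mutated, label = mutate_relative_to_tm6_proline(toy, 20)
print(label)
```

Because the rule depends only on a distance from one residue, no alignment against other receptors is needed; the only receptor-specific input is the position of the TM6 proline.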
D. Disease/Disorder Identification and/or Selection As will be set forth in greater detail below, most preferably inverse agonists to the non-endogenous, constitutively activated GPCR can be identified by the methodologies of this invention. Such inverse agonists are ideal candidates as lead compounds in drug discovery programs for treating diseases related to this receptor. Because of the ability to directly identify inverse agonists to the GPCR, thereby allowing for the development of pharmaceutical compositions, a search for diseases and disorders associated with the GPCR is relevant. For example, scanning both diseased and normal tissue samples for the presence of the GPCR now becomes more than an academic exercise or one which might be pursued along the path of identifying an endogenous ligand to the specific GPCR. Tissue scans can be conducted across a broad range of healthy and diseased tissues. Such tissue scans provide a preferred first step in associating a specific receptor with a disease and/or disorder. See, for example, co-pending application (docket number ARE-0050) for exemplary dot-blot and RT-PCR results of several of the GPCRs disclosed herein.
Preferably, the DNA sequence of the human GPCR is used to make a probe for (a) dot-blot analysis against tissue-mRNA, and/or (b) RT-PCR identification of the expression of the receptor in tissue samples. The presence of a receptor in a tissue source, or a diseased tissue, or the presence of the receptor at elevated concentrations in diseased tissue compared to a normal tissue, can be preferably utilized to identify a correlation with a treatment regimen, including but not limited to, a disease associated with that disease.
Receptors can equally well be localized to regions of organs by this technique. Based on the known functions of the specific tissues to which the receptor is localized, the putative functional role of the receptor can be deduced.
E. Screening of Candidate Compounds
1. Generic GPCR screening assay techniques When a G protein receptor becomes constitutively active, it binds to a G protein (e.g., Gq, Gs, Gi, Gz, Go) and stimulates the binding of GTP to the G protein. The G protein then acts as a GTPase and slowly hydrolyzes the GTP to GDP, whereby the receptor, under normal conditions, becomes deactivated. However, constitutively activated receptors continue to exchange GDP to GTP. A non-hydrolyzable analog of GTP, [³⁵S]GTPγS, can be used to monitor enhanced binding to membranes which express constitutively activated receptors.
It is reported that [³⁵S]GTPγS can be used to monitor G protein coupling to membranes in the absence and presence of ligand. An example of this monitoring, among other examples well-known and available to those in the art, was reported by Traynor and Nahorski in 1995. The preferred use of this assay system is for initial screening of candidate compounds because the system is generically applicable to all G protein-coupled receptors regardless of the particular G protein that interacts with the intracellular domain of the receptor.
2. Specific GPCR screening assay techniques Once candidate compounds are identified using the "generic" G protein-coupled receptor assay (i.e., an assay to select compounds that are agonists, partial agonists, or inverse agonists), further screening to confirm that the compounds have interacted at the receptor site is preferred. For example, a compound identified by the "generic" assay may not bind to the receptor, but may instead merely "uncouple" the G protein from the intracellular domain.
a. Gs, Gz and Gi.
Gs stimulates the enzyme adenylyl cyclase. Gi (and Gz and Go), on the other hand, inhibit this enzyme. Adenylyl cyclase catalyzes the conversion of ATP to cAMP; thus, constitutively activated GPCRs that couple the Gs protein are associated with increased cellular levels of cAMP. On the other hand, constitutively activated GPCRs that couple Gi (or Gz, Go) protein are associated with decreased cellular levels of cAMP. See, generally, "Indirect Mechanisms of Synaptic Transmission," Chpt. 8, From Neuron To Brain (3rd Ed.) Nichols, J.G. et al eds. Sinauer Associates, Inc. (1992). Thus, assays that detect cAMP can be utilized to determine if a candidate compound is, e.g., an inverse agonist to the receptor (i.e., such a compound would decrease the levels of cAMP). A variety of approaches known in the art for measuring cAMP can be utilized; a most preferred approach relies upon the use of anti-cAMP antibodies in an ELISA-based format. Another type of assay that can be utilized is a whole cell second messenger reporter system assay. Promoters on genes drive the expression of the proteins that a particular gene encodes. Cyclic AMP drives gene expression by promoting the binding of a cAMP-responsive DNA binding protein or transcription factor (CREB) that then binds to the promoter at specific sites called cAMP response elements and drives the expression of the gene. Reporter systems can be constructed which have a promoter containing multiple cAMP response elements before the reporter gene, e.g., β-galactosidase or luciferase. Thus, a constitutively activated Gs-linked receptor causes the accumulation of cAMP that then activates the gene and expression of the reporter protein. The reporter protein such as β-galactosidase or luciferase can then be detected using standard biochemical assays (Chen et al. 1995).
b. Go and Gq.
Gq and Go are associated with activation of the enzyme phospholipase C, which in turn hydrolyzes the phospholipid PIP2, releasing two intracellular messengers: diacylglycerol (DAG) and inositol 1,4,5-triphosphate (IP3). Increased accumulation of IP3 is associated with activation of Gq- and Go-associated receptors. See, generally, "Indirect Mechanisms of Synaptic Transmission," Chpt. 8, From Neuron To Brain (3rd Ed.) Nichols, J.G. et al eds. Sinauer Associates, Inc. (1992). Assays that detect IP3 accumulation can be utilized to determine if a candidate compound is, e.g., an inverse agonist to a Gq- or Go-associated receptor (i.e., such a compound would decrease the levels of IP3). Gq-associated receptors can also be examined using an AP1 reporter assay in that Gq-dependent phospholipase C causes activation of genes containing AP1 elements; thus, activated Gq-associated receptors will evidence an increase in the expression of such genes, whereby inverse agonists thereto will evidence a decrease in such expression, and agonists will evidence an increase in such expression. Commercially available assays for such detection are available.
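The direction-of-change logic described for the Gs (cAMP) and Gq (IP3 or AP1 reporter) assays can be sketched as follows. This is a hypothetical illustration only: the function name, the labels, and the 10% tolerance are assumptions chosen for the example, and the sketch covers only those readouts for which the text states that an inverse agonist lowers the signal and an agonist raises it.

```python
def classify_response(constitutive_signal, treated_signal, tolerance=0.10):
    """Compare a readout with and without a candidate compound.

    constitutive_signal -- readout from the constitutively activated receptor alone
    treated_signal      -- readout from the same receptor plus the candidate compound
    tolerance           -- fractional change treated as noise (an arbitrary assumption)
    """
    if constitutive_signal <= 0:
        raise ValueError("constitutive_signal must be positive")
    change = (treated_signal - constitutive_signal) / constitutive_signal
    if change <= -tolerance:
        return "inverse agonist candidate"
    if change >= tolerance:
        return "agonist or partial agonist candidate"
    return "no detectable effect"


print(classify_response(100.0, 60.0))
print(classify_response(100.0, 150.0))
print(classify_response(100.0, 103.0))
```

In practice the tolerance would be set from the measured assay variability rather than fixed in advance.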
3. GPCR Fusion Protein The use of an endogenous, constitutively activated orphan GPCR or a non-endogenous, constitutively activated orphan GPCR, for use in screening of candidate compounds for the direct identification of inverse agonists, agonists and partial agonists provides an interesting screening challenge in that, by definition, the receptor is active even in the absence of an endogenous ligand bound thereto. Thus, in order to differentiate between, e.g., the non-endogenous receptor in the presence of a candidate compound and the non-endogenous receptor in the absence of that compound, with an aim of such a differentiation to allow for an understanding as to whether such compound may be an inverse agonist, agonist, partial agonist or have no effect on such a receptor, it is preferred that an approach be utilized that can enhance such differentiation. A preferred approach is the use of a GPCR Fusion Protein.
Generally, once it is determined that a non-endogenous orphan GPCR has been constitutively activated using the assay techniques set forth above (as well as others), it is possible to determine the predominant G protein that couples with the endogenous GPCR.
Coupling of the G protein to the GPCR provides a signaling pathway that can be assessed.
Because it is most preferred that screening take place by use of a mammalian expression system, such a system will be expected to have endogenous G protein therein. Thus, by definition, in such a system, the non-endogenous, constitutively activated orphan GPCR will continuously signal. In this regard, it is preferred that this signal be enhanced such that in the presence of, e.g., an inverse agonist to the receptor, it is more likely that it will be able to more readily differentiate, particularly in the context of screening, between the receptor when it is contacted with the inverse agonist.
The GPCR Fusion Protein is intended to enhance the efficacy of G protein coupling with the non-endogenous GPCR. The GPCR Fusion Protein is preferred for screening with a non-endogenous, constitutively activated GPCR because such an approach increases the signal that is most preferably utilized in such screening techniques. This is important in facilitating a significant "signal to noise" ratio; such a significant ratio is preferred for the screening of candidate compounds as disclosed herein.
The construction of a construct useful for expression of a GPCR Fusion Protein is within the purview of those having ordinary skill in the art. Commercially available expression vectors and systems offer a variety of approaches that can fit the particular needs of an investigator. The criteria of importance for such a GPCR Fusion Protein construct are that the endogenous GPCR sequence and the G protein sequence both be in-frame (preferably, the sequence for the endogenous GPCR is upstream of the G protein sequence) and that the "stop" codon of the GPCR be deleted or replaced such that upon expression of the GPCR, the G protein can also be expressed. The GPCR can be linked directly to the G protein, or there can be spacer residues between the two (preferably, no more than about 12, although this number can be readily ascertained by one of ordinary skill in the art). We have a preference (based upon convenience) for use of a spacer in that some restriction sites that are not used will, effectively, upon expression, become a spacer. Most preferably, the G protein that couples to the non-endogenous GPCR will have been identified prior to the creation of the GPCR Fusion Protein construct. Because there are only a few G proteins that have been identified, it is preferred that a construct comprising the sequence of the G protein (i.e., a universal G protein construct) be available for insertion of an endogenous GPCR sequence therein; this provides for efficiency in the context of large-scale screening of a variety of different endogenous GPCRs having different sequences.
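The construct criteria stated above (both sequences in-frame, GPCR upstream, GPCR stop codon removed, a short optional spacer) can be sketched in Python. This is a simplified illustration under assumptions, not a cloning protocol: the function name, the toy sequences, and the treatment of "about 12" spacer residues as a configurable limit are hypothetical, and the sketch handles only deletion of the stop codon, not its replacement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_fusion_orf(gpcr_orf, g_protein_orf, spacer="", max_spacer_residues=12):
    """Join a GPCR ORF and a G protein ORF into one in-frame coding sequence."""
    gpcr_orf, g_protein_orf, spacer = gpcr_orf.upper(), g_protein_orf.upper(), spacer.upper()
    # Every part must be a whole number of codons or the reading frame shifts
    for name, seq in (("GPCR", gpcr_orf), ("G protein", g_protein_orf), ("spacer", spacer)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} sequence length is not a multiple of 3")
    if len(spacer) // 3 > max_spacer_residues:
        raise ValueError("spacer is longer than the preferred maximum")
    # Delete the GPCR stop codon so translation continues into the G protein
    if gpcr_orf[-3:] in STOP_CODONS:
        gpcr_orf = gpcr_orf[:-3]
    fusion = gpcr_orf + spacer + g_protein_orf
    codons = [fusion[i:i + 3] for i in range(0, len(fusion), 3)]
    # Any stop codon before the last codon would truncate the fusion protein
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("premature in-frame stop codon in the fusion")
    return fusion


# Toy sequences only; a 6-nucleotide BamHI site stands in for an unused restriction site
fusion = build_fusion_orf("ATGGCTTGA", "ATGAAATAA", spacer="GGATCC")
print(fusion)
```

The spacer in the usage example mirrors the convenience noted in the text: a restriction site left between the two coding sequences is simply translated as two extra residues.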
As noted above, constitutively activated GPCRs that couple to Gi, Gz and Go are expected to inhibit the formation of cAMP, making assays based upon these types of GPCRs challenging (i.e., the cAMP signal decreases upon activation, thus making the direct identification of, e.g., inverse agonists (which would further decrease this signal), interesting).
As will be disclosed herein, we have ascertained that for these types of receptors, it is possible to create a GPCR Fusion Protein that is not based upon the endogenous GPCR's endogenous G protein, in an effort to establish a viable cyclase-based assay. Thus, for example, for a Gz coupled receptor such as H9, a GPCR Fusion Protein can be established that utilizes a Gs fusion protein; we believe that such a fusion construct, upon expression, "drives" or "forces" the non-endogenous GPCR to couple with, e.g., Gs rather than the "natural" Gz protein, such that a cyclase-based assay can be established. Thus, for Gi, Gz and Go coupled receptors, we prefer that when a GPCR Fusion Protein is used and the assay is based upon detection of adenylyl cyclase activity, the fusion construct be established with Gs (or an equivalent G protein that stimulates the formation of the enzyme adenylyl cyclase).
F. Medicinal Chemistry Generally, but not always, direct identification of candidate compounds is preferably conducted in conjunction with compounds generated via combinatorial chemistry techniques, whereby thousands of compounds are randomly prepared for such analysis. Generally, the results of such screening will be compounds having unique core structures; thereafter, these compounds are preferably subjected to additional chemical modification around a preferred core structure(s) to further enhance the medicinal properties thereof. Such techniques are known to those in the art and will not be addressed in detail in this patent document.
G. Pharmaceutical compositions Candidate compounds selected for further development can be formulated into pharmaceutical compositions using techniques well known to those in the art. Suitable pharmaceutically-acceptable carriers are available to those in the art; for example, see Remington's Pharmaceutical Sciences, 16th Edition, 1980, Mack Publishing Co., (Osol et al., eds.).
H. Other Utility Although a preferred use of the non-endogenous versions of the human GPCRs disclosed herein may be for the direct identification of candidate compounds as inverse agonists, agonists or partial agonists (preferably for use as pharmaceutical agents), these versions of human GPCRs can also be utilized in research settings. For example, in vitro and in vivo systems incorporating GPCRs can be utilized to further elucidate and understand the roles these receptors play in the human condition, both normal and diseased, as well as understanding the role of constitutive activation as it applies to understanding the signaling cascade. The value in non-endogenous human GPCRs is that their utility as a research tool is enhanced in that, because of their unique features, non-endogenous human GPCRs can be used to understand the role of these receptors in the human body before the endogenous ligand therefor is identified. Other uses of the disclosed receptors will become apparent to those in the art based upon, inter alia, a review of this patent document.
EXAMPLES
The following examples are presented for purposes of elucidation, and not limitation, of the present invention. While specific nucleic acid and amino acid sequences are disclosed herein, those of ordinary skill in the art are credited with the ability to make minor modifications to these sequences while achieving the same or substantially similar results reported below. The traditional approach to application or understanding of sequence cassettes from one sequence to another (e.g., from rat receptor to human receptor or from human receptor A to human receptor B) is generally predicated upon sequence alignment techniques whereby the sequences are aligned in an effort to determine areas of commonality.
The mutational approach disclosed herein does not rely upon this approach but is instead based upon an algorithmic approach and a positional distance from a conserved proline residue located within the TM6 region of human GPCRs. Once this approach is secured, those in the art are credited with the ability to make minor modifications thereto to achieve substantially the same results (i.e., constitutive activation) disclosed herein. Such modified approaches are considered within the purview of this disclosure.
Example 1 ENDOGENOUS HUMAN GPCRS
1. Identification of Human GPCRs Certain of the disclosed endogenous human GPCRs were identified based upon a review of the GenBank™ database information. While searching the database, the following cDNA clones were identified as evidenced below (Table C).
TABLE C
Disclosed Human Orphan GPCRs | Accession Number | Complete DNA Sequence (Base Pairs) | Open Reading Frame (Base Pairs) | Nucleic Acid SEQ.ID.NO. | Amino Acid SEQ.ID.NO.
hARE-3 | AL033379 | 111,389 bp | 1,260 bp | 1 | 2
hARE-4 | AC006087 | 226,925 bp | 1,119 bp | 3 | 4
hARE-5 | AC006255 | 127,605 bp | 1,104 bp | 5 | 6
hRUP3 | AL035423 | 140,094 bp | 1,005 bp | 7 | 8
hRUP5 | AC005849 | 169,144 bp | 1,413 bp | |
hRUP6 | AC005871 | 218,807 bp | 1,245 bp | |
hRUP7 | AC007922 | 158,858 bp | 1,173 bp | |

Other disclosed endogenous human GPCRs were identified by conducting a BLAST™ search of the EST database (dbEST) using the following EST clones as query sequences. The following EST clones identified were then used as a probe to screen a human genomic library (Table D).
TABLE D
Disclosed Human Orphan GPCRs | Query (Sequence) | EST Clone/Accession No. Identified | Open Reading Frame (Base Pairs) | Nucleic Acid SEQ.ID.NO. | Amino Acid SEQ.ID.NO.
hGPCR27 | Mouse GPCR27 | AA775870 | 1,125 bp | 17 | 18
hARE-1 | TDAG | 1689643 AI090920 | 999 bp | 19 |
hARE-2 | GPCR27 | 68530 AA359504 | 1,122 bp | 21 | 22
hPPR1 | Bovine PPR1 | 238667 H67224 | 1,053 bp | 23 | 24
hG2A | Mouse 1179426 | See Example 2(a), below | 1,113 bp | 25 | 26
hCHN3 | N.A. | EST 36581 (full length) | 1,113 bp | 27 | 28
hCHN4 | TDAG | 1184934 AA804531 | 1,077 bp | 29 |
hCHN6 | N.A. | EST 2134670 (full length) | 1,503 bp | 31 | 32
hCHN8 | KIAA0001 | EST 764455 | 1,029 bp | 33 | 34
hCHN9 | 1365839 | EST 1541536 | 1,077 bp | 35 | 36
hCHN10 | Mouse EST 1365839 | Human 1365839 | 1,005 bp | 37 | 38
hRUP4 | N.A. | AI307658 | 1,296 bp | 39 |
N.A. = "not applicable".
2. Full Length Cloning
a. Human G2A Mouse EST clone 1179426 was used to obtain a human genomic clone containing all but three amino acid G2A coding sequences. The 5' end of this coding sequence was obtained by using 5' RACE, and the template for PCR was Clontech's Human Spleen Marathon-Ready™ cDNA. The disclosed human G2A was amplified by PCR using the G2A cDNA specific primers for the first and second round PCR as shown in SEQ.ID.NO.: 41 and SEQ.ID.NO.: 42 as follows: 5'-CTGTGTACAGCAGTTCGCAGAGTG-3' (SEQ.ID.NO.: 41; first round PCR) 5'-GAGTGCCAGGCAGAGCAGGTAGAC-3' (SEQ.ID.NO.: 42; second round PCR).
PCR was performed using Advantage GC Polymerase Kit (Clontech; manufacturer's instructions followed), at 94°C for 30 sec followed by 5 cycles of 94°C for 5 sec and 72°C for 4 min; and 30 cycles of 94°C for 5 sec and 70°C for 4 min. An approximate 1.3 Kb PCR fragment was purified from agarose gel, digested with Hind III and Xba I and cloned into the expression vector pRC/CMV2 (Invitrogen). The cloned insert was sequenced using the T7 Sequenase™ kit (USB Amersham; manufacturer instructions followed) and the sequence was compared with the presented sequence. Expression of the human G2A was detected by probing an RNA dot blot (Clontech; manufacturer instructions followed) with the ³²P-labeled fragment.
b. CHN9 Sequencing of the EST clone 1541536 showed CHN9 to be a partial cDNA clone having only an initiation codon; the termination codon was missing. When CHN9 was used to BLAST against the database, the 3' sequence of CHN9 was 100% homologous to the 5' untranslated region of the leukotriene B4 receptor cDNA, which contained a termination codon in frame with the CHN9 coding sequence. To determine whether the 5' untranslated region of LTB4R cDNA was the 3' sequence of CHN9, PCR was performed using primers based upon the 5' sequence flanking the initiation codon found in CHN9 and the 3' sequence around the termination codon found in the LTB4R 5' untranslated region.
The 5' primer sequence utilized was as follows: 5'-CCCGAATTCCTGCTTGCTCCCAGCTTGGCCC-3' (SEQ.ID.NO.: 43; sense) and 5'-TGTGGATCCTGCTGTCAAAGGTCCCATTCCGG-3' (SEQ.ID.NO.: 44; antisense).
PCR was performed using thymus cDNA as a template and rTth polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of 94°C for 1 min, 65°C for 1 min and 72°C for 1 min and 10 sec. A 1.1 kb fragment consistent with the predicted size was obtained from PCR. This PCR fragment was subcloned into pCMV (see below) and sequenced (see, SEQ.ID.NO.: ).
c. RUP4 The full length RUP4 was cloned by RT-PCR with human brain cDNA (Clontech) as template, using the following primers: 5'-TCACAATGCTAGGTGTGGTC-3' (SEQ.ID.NO.: 45; sense) and 5'-TGCATAGACAATGGGATTACAG-3' (SEQ.ID.NO.: 46; antisense).
PCR was performed using TaqPlus Precision™ polymerase (Stratagene; manufacturer's instructions followed) by the following cycles: 94°C for 2 min; 94°C for 30 sec; 55°C for 30 sec; 72°C for 45 sec; and 72°C for 10 min. Cycles 2 through 4 were repeated 30 times.
The PCR products were separated on a 1% agarose gel and a 500 bp PCR fragment was isolated and cloned into the pCRII-TOPO™ vector (Invitrogen) and sequenced using the T7 DNA Sequenase™ kit (Amersham) and the SP6/T7 primers (Stratagene). Sequence analysis revealed that the PCR fragment was indeed an alternatively spliced form of AI307658 having a continuous open reading frame with similarity to other GPCRs. The completed sequence of this PCR fragment was as follows:
5'-TCACAATGCTAGGTGTGGTCTGGCTGGTGGCAGTCATCGTAGGATCACCCATGTGGCAC
GTGCAACAACTTGAGATCAAATATGACTTCCTATATGAAAAGGAACACATCTGCTGCTTAAGA
GTGGACCAGCCCTGTGCACCAGAAGATCTACACCACCTTCATCCTTGTCATCCTCTTCCTCCTGC
CTCTTATGGTGATGCTTATTCTGTACGTAAAATTGGTTATGAACTTTGGATAAAGAAAAGAGTT
GGGGATGGTTCAGTGCTTCGAACTATTCATGGAAAAGAAATGTGCAAAATAGCCAGGAAGAAG
AAACGAGCTGTCATTATGATGGTGACAGTGGTGGCTCTCTTTGCTGTGTGCTGGGGACCATTCC
ATGTTGTCCATATGATGATTGAATACAGTAATTTTGAAAAGGAATATGATGATGTCACAATCAA
GATGATFITIGCTATCGTGCAAATFATGGA'ITICCAACTCCATCTGTAATCCCATTGTCTATGCA-3' (SEQ.ID.NO.: 47)
Based on the above sequence, two sense oligonucleotide primer sets: 5'-CTGCTITAGAAGAGTGGACCAG-3' (SEQ.ID.NO.: 48; oligo 1), 5'-CTGTGCACCAGAAGATCTACAC-3' (SEQ.ID.NO.: 49; oligo 2) and two antisense oligonucleotide primer sets: 5'-CAAGGATGAAGGTGGTGTAGA-3' (SEQ.ID.NO.: 50; oligo 3), 5'-GTGTAGATCTTCTGGTGCACAGG-3' (SEQ.ID.NO.: 51; oligo 4) were used for 3'- and 5'-RACE PCR with a human brain Marathon-Ready™ cDNA (Clontech, Cat# 7400-1) as template, according to the manufacturer's instructions. DNA fragments generated by the RACE PCR were cloned into the pCRII-TOPO™ vector (Invitrogen) and sequenced using the SP6/T7 primers (Stratagene) and some internal primers.
The 3' RACE product contained a poly(A) tail and a completed open reading frame ending at a TAA stop codon. The 5' RACE product contained an incomplete 5' end; the ATG initiation codon was not present.
Based on the new 5' sequence, oligo 3 and the following primer: (SEQ.ID.NO.: 52; oligo were used for the second round of 5' race PCR and the PCR products were analyzed as above.
A third round of 5' RACE PCR was carried out utilizing antisense primers: 5'-TGGAGCATGGTGACGGGAATGCAGAAG-3' (SEQ.ID.NO.: 53; oligo 6) and 5'-GTGATGAGCAGGTCACTGAGCGCCAAG-3' (SEQ.ID.NO.: 54; oligo 7).
The sequence of the 5' RACE PCR products revealed the presence of the initiation codon ATG, and further rounds of 5' RACE PCR did not generate any more 5' sequence. The completed 5' sequence was confirmed by RT-PCR using sense primer 5'-GCAATGCAGGCGCTTAACATTAC-3' (SEQ.ID.NO.: 55; oligo 8) and oligo 4 as primers and sequence analysis of the 650 bp PCR product generated from human brain and heart cDNA templates (Clontech, Cat# 7404-1). The completed 3' sequence was confirmed by RT-PCR using oligo 2 and the following antisense primer: 5'-TTGGGTTACAATCTGAAGGGCA-3' (SEQ.ID.NO.: 56; oligo 9) and sequence analysis of the 670 bp PCR product generated from human brain and heart cDNA templates (Clontech, Cat# 7404-1).
d. RUP5 The full length RUP5 was cloned by RT-PCR using a sense primer upstream from ATG, the initiation codon (SEQ.ID.NO.: 57), and an antisense primer containing TCA as the stop codon (SEQ.ID.NO.: 58), which had the following sequences: 5'-ACTCCGTGTCCAGCAGGACTCTG-3' (SEQ.ID.NO.: 57) 5'-TGCGTGTTCCTGGACCCTCACGTG-3' (SEQ.ID.NO.: 58) and human peripheral leukocyte cDNA (Clontech) as a template. Advantage™ cDNA polymerase (Clontech) was used for the amplification in a 50 µl reaction by the following cycle with step 2 through step 4 repeated 30 times: 94°C for 30 sec; 94°C for 15 sec; 69°C for 40 sec; 72°C for 3 min; and 72°C for 6 min. A 1.4 kb PCR fragment was isolated and cloned with the pCRII-TOPO™ vector (Invitrogen) and completely sequenced using the T7 DNA Sequenase™ kit (Amersham). See, SEQ.ID.NO.: 9.
e. RUP6 The full length RUP6 was cloned by RT-PCR using primers: 5'-CAGGCCTTGGATTTTAATGTCAGGGATGG-3' (SEQ.ID.NO.: 59) and 5'-GGAGAGTCAGCTCTGAAAGAATTCAGG-3' (SEQ.ID.NO.: ) and human thymus Marathon-Ready™ cDNA (Clontech) as a template. Advantage cDNA polymerase (Clontech, according to manufacturer's instructions) was used for the amplification in a 50 µl reaction by the following cycle: 94°C for 30 sec; 94°C for 5 sec; 66°C for 40 sec; 72°C for 2.5 sec and 72°C for 7 min. Cycles 2 through 4 were repeated 30 times.
A 1.3 Kb PCR fragment was isolated and cloned into the pCRII-TOPO™ vector (Invitrogen) and completely sequenced (see, SEQ.ID.NO.: 11) using the ABI Big Dye Terminator™ kit (P.E. Biosystem).
f. RUP7 The full length RUP7 was cloned by RT-PCR using primers: 5'-TGATGTGATGCCAGATACTAATAGCAC-3' (SEQ.ID.NO.: 61; sense) and 5'-CCTGATTCATTTAGGTGAGATTGAGAC-3' (SEQ.ID.NO.: 62; antisense) and human peripheral leukocyte cDNA (Clontech) as a template. Advantage™ cDNA polymerase (Clontech) was used for the amplification in a 50 µl reaction by the following cycle with step 2 to step 4 repeated 30 times: 94°C for 2 minutes; 94°C for 15 seconds; for 20 seconds; 72°C for 2 minutes; 72°C for 10 minutes. A 1.25 Kb PCR fragment was isolated and cloned into the pCRII-TOPO™ vector (Invitrogen) and completely sequenced using the ABI Big Dye Terminator™ kit (P.E. Biosystem). See, SEQ.ID.NO.: 13.
3. Angiotensin II Type 1 Receptor ("AT1") The endogenous human angiotensin II type 1 receptor ("AT1") was obtained by PCR using genomic DNA as template and rTth polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of 94°C for 1 min, 55°C for 1 min and 72°C for 1.5 min. The 5' PCR primer contains a HindIII site with the sequence: 5'-CCCAAGCTTCCCCAGGTGTATTTGAT-3' (SEQ.ID.NO.: 63) and the 3' primer contains a BamHI site with the following sequence: 5'-GTTGGATCCACATAATGCATTTTCTC-3' (SEQ.ID.NO.: 64).
The resulting 1.3 kb PCR fragment was digested with HindIII and BamHI and cloned into HindIII-BamHI site of pCMV expression vector. The cDNA clone was fully sequenced.
Nucleic acid (SEQ.ID.NO.: 65) and amino acid (SEQ.ID.NO.: 66) sequences for human AT1 were thereafter determined and verified.
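The restriction sites said to be carried by the AT1 primers can be checked directly against the primer sequences. The following Python sketch is illustrative only; the dictionary and function names are hypothetical, and the recognition sequences used (HindIII AAGCTT, BamHI GGATCC, EcoRI GAATTC) are the standard ones for these enzymes. Because all three sites are palindromic, searching one strand is sufficient.

```python
# Standard recognition sequences for the enzymes named in this Example
RECOGNITION_SITES = {"HindIII": "AAGCTT", "BamHI": "GGATCC", "EcoRI": "GAATTC"}


def find_sites(primer):
    """Return {enzyme: 0-based position} for each recognition site present in the primer."""
    primer = primer.upper()
    return {
        enzyme: primer.find(site)
        for enzyme, site in RECOGNITION_SITES.items()
        if site in primer
    }


at1_5prime = "CCCAAGCTTCCCCAGGTGTATTTGAT"  # SEQ.ID.NO.: 63
at1_3prime = "GTTGGATCCACATAATGCATTTTCTC"  # SEQ.ID.NO.: 64

print(find_sites(at1_5prime))
print(find_sites(at1_3prime))
```

In both primers the site begins after three leading nucleotides, which is consistent with the common practice of adding a few extra bases so the enzyme can cut near the end of a PCR product.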
4. GPR38 To obtain GPR38, PCR was performed by combining two PCR fragments, using human genomic cDNA as template and rTth polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition for each PCR reaction was 30 cycles of 94°C for 1 min, 62°C for 1 min and 72°C for 2 min.
The first fragment was amplified with the 5' PCR primer that contained an end site with the following sequence: 5'-ACCATGGGCAGCCCCTGGAACGGCAGC-3' (SEQ.ID.NO.: 67) and a 3' primer having the following sequence: 5'-AGAACCACCACCAGCAGGACGCGGACGGTCTGCCGGTGG-3' (SEQ.ID.NO.: 68).
The second PCR fragment was amplified with a 5' primer having the following sequence: 5'-GTCCGCGTCCTGCTGGTGGTGGTTCTGGCATTTATAATT-3' (SEQ.ID.NO.: 69) and a 3' primer that contained a BamHI site and having the following sequence: 5'-CCTGGATCCTTATCCCATCGTCTTCACGTTAGC-3' (SEQ.ID.NO.: 70). The two fragments were used as templates to amplify GPR38, using SEQ.ID.NO.: 67 and SEQ.ID.NO.: 70 as primers (using the above-noted cycle conditions). The resulting 1.44 kb PCR fragment was digested with BamHI and cloned into Blunt-BamHI site of pCMV expression vector.
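Combining two PCR fragments in this way depends on the two inner primers sharing complementary sequence, so that the end of the first fragment can anneal to the start of the second. The Python sketch below, offered as an illustration rather than as part of the disclosed method, measures that shared region for SEQ.ID.NO.: 68 and SEQ.ID.NO.: 69; the helper function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_suffix_prefix_overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


frag1_3prime = "AGAACCACCACCAGCAGGACGCGGACGGTCTGCCGGTGG"  # SEQ.ID.NO.: 68
frag2_5prime = "GTCCGCGTCCTGCTGGTGGTGGTTCTGGCATTTATAATT"  # SEQ.ID.NO.: 69

# The 3' primer is written on the antisense strand, so compare its reverse complement
overlap = longest_suffix_prefix_overlap(reverse_complement(frag1_3prime), frag2_5prime)
print(overlap)
```

For these two primers the shared region is 26 nucleotides long, which is what allows the two fragments to serve together as a template for the outer primers SEQ.ID.NO.: 67 and SEQ.ID.NO.: 70.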
5. MC4 To obtain MC4, PCR was performed using human genomic cDNA as template and rTth polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition for each PCR reaction was 30 cycles of 94°C for 1 min, 54°C for 1 min and 72°C for 1.5 min.
The 5' PCR primer contained an EcoRI site with the sequence: 5'-CTGGAATTCTCCTGCCAGCATGGTGA-3' (SEQ.ID.NO.: 71) and the 3' primer contained a BamHI site with the sequence: 5'-GCAGGATCCTATATTGCGTGCTCTGTCCCC-3' (SEQ.ID.NO.: 72).
The 1.0 kb PCR fragment was digested with EcoRI and BamHI and cloned into EcoRI-BamHI site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 73) and amino acid (SEQ.ID.NO.: 74) sequences for human MC4 were thereafter determined.
6. CCKB To obtain CCKB, PCR was performed using human stomach cDNA as template and rTth polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition for each PCR reaction was 30 cycles of 94°C for 1 min, 65°C for 1 min and 72°C for 1 min and 30 sec.
The 5' PCR primer contained a HindIII site with the sequence: 5'-CCGAAGCTTCGAGCTGAGTAAGGCGGCGGGCT-3' (SEQ.ID.NO.: ) and the 3' primer contained an EcoRI site with the sequence: 5'-GTGGAATTCATTTGCCCTGCCTCAACCCCCA-3' (SEQ.ID.NO.: 76).
The resulting 1.44 kb PCR fragment was digested with HindIII and EcoRI and cloned into HindIII-EcoRI site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 77) and amino acid (SEQ.ID.NO.: 78) sequences for human CCKB were thereafter determined.
7. TDAG8
To obtain TDAG8, PCR was performed using genomic DNA as template and rTth polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of 94°C for 1 min, 56°C for 1 min and 72°C for 1 min and 20 sec. The 5' PCR primer contained a HindIII site with the following sequence: 5'-TGCAAGCTTAAAAAGGAAAAAATGAACAGC-3' (SEQ.ID.NO.: 79) and the 3' primer contained a BamHI site with the following sequence: (SEQ.ID.NO.: ). The resulting 1.1 kb PCR fragment was digested with HindIII and BamHI and cloned into the HindIII-BamHI site of the pCMV expression vector. The three clones sequenced contained three potential polymorphisms involving changes of amino acid 43 from Pro to Ala, amino acid 97 from Lys to Asn and amino acid 130 from Ile to Phe. Nucleic acid (SEQ.ID.NO.: 81) and amino acid (SEQ.ID.NO.: 82) sequences for human TDAG8 were thereafter determined.
8. H9
To obtain H9, PCR was performed using pituitary cDNA as template and rTth polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of 94°C for 1 min, 62°C for 1 min and 72°C for 2 min. The 5' PCR primer contained a HindIII site with the following sequence: 5'-GGAAAGCTTAACGATCCCCAGGAGCAACAT-3' and the 3' primer contained a BamHI site with the following sequence: 5'-CTGGGATCCTACGAGAGCATTTTTCACACAG-3' (SEQ.ID.NO.: 16).
The resulting 1.9 kb PCR fragment was digested with HindIII and BamHI and cloned into the HindIII-BamHI site of the pCMV expression vector. H9 contained three potential polymorphisms involving the amino acid changes P320S, S493N and G448A. Nucleic acid (SEQ.ID.NO.: 139) and amino acid (SEQ.ID.NO.: 140) sequences for human H9 were thereafter determined and verified.
Example 2
Preparation of Non-endogenous, Constitutively Activated GPCRs
Those skilled in the art are credited with the ability to select techniques for mutation of a nucleic acid sequence. Presented below are approaches utilized to create non-endogenous versions of several of the human GPCRs disclosed above. The mutations disclosed below are based upon an algorithmic approach whereby the 16th amino acid (located in the IC3 region of the GPCR) from a conserved proline residue (located in the TM6 region of the GPCR, near the TM6/IC3 interface) is mutated, most preferably to a lysine amino acid residue.
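The positional rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counting direction (toward the N-terminus, because IC3 precedes TM6 in the primary sequence), the convention that the 16th residue is the proline position minus 16, the function name, and the toy sequence are all hypothetical and are not taken from the Sequence Listing.

```python
def lysine_mutation_label(protein_seq, proline_pos, offset=16, new_residue="K"):
    """Return a label such as 'V297K' for the residue `offset` positions
    N-terminal of the conserved proline (all positions are 1-based).

    Assumption: the rule counts back from the TM6 proline into IC3.
    """
    if protein_seq[proline_pos - 1] != "P":
        raise ValueError("residue at proline_pos is not a proline")
    target_pos = proline_pos - offset
    if target_pos < 1:
        raise ValueError("offset runs past the start of the sequence")
    old_residue = protein_seq[target_pos - 1]
    return f"{old_residue}{target_pos}{new_residue}"


# Hypothetical 30-residue fragment with a proline at position 20;
# it is not a real receptor sequence.
toy_seq = "MAVLKRTSGEAFLLIVCWTPYHAMNQDSGE"
print(lysine_mutation_label(toy_seq, proline_pos=20))
```

For a real receptor, the proline position would have to be identified from an alignment of the TM6 region, which this sketch does not attempt.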
1. Transformer Site-Directed™ Mutagenesis
Preparation of non-endogenous human GPCRs may be accomplished on human GPCRs using the Transformer Site-Directed™ Mutagenesis Kit (Clontech) according to the manufacturer's instructions. Two mutagenesis primers are utilized, most preferably a lysine mutagenesis oligonucleotide that creates the lysine mutation, and a selection marker oligonucleotide. For convenience, the codon mutation to be incorporated into the human GPCR is also noted, in standard form (Table E):

TABLE E
Receptor Identifier: hARE-3, hARE-4, hGPCR14, hGPCR27, hGPR38, hARE-1, hARE-2, hPPR1, hG2A, hRUP3, hRUP4, hRUP6, hRUP7, hCHN4, hMC4, hCHN3, hCHN6, hCHN8, hCHN9, hCHN10, hH9, hCCKB, hTDAG8
Codon Mutation: F313K, V233K, A240K, L257K, C283K, V297K, E232K, G285K, L239K, K232A, L224K, V272K, A236K, N267K, A302K, V236K, A244K, S284K, L352K, N235K, G223K, L231K, F236K, V332K, I225K

The following GPCRs were mutated in accordance with the above method using the designated sequence primers (Table F).
TABLE F
Receptor Identifier | Codon Mutation | Lysine Mutagenesis (SEQ.ID.NO.), orientation, mutation sequence underlined | Selection Marker (SEQ.ID.NO.), orientation
hRUP4 | V272K | CAGGAAGAAGAAACGAGCTGTCATTATGATGGTGACAGTG (83) | CACTGTCACCATCATAATGACAGCTCGTTTCTTCTTCCTG (84)
hAT1 | see below | alternative approach; see below | alternative approach; see below
hGPR38 | V297K | GGCCACCGGCAGACCAAACGCGTCCTGCTG (85) | CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT (86)
hCCKB | V332K | alternative approach; see below | alternative approach; see below
hTDAG8 | I225K | GGAAAAGAAGAGAATCAAAAAACTACTTGTCAGCATC (87) | CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT (88)
hH9 | F236K | GCTGAGGTTCGCAATAAACTAACCATGFFTGTG (143) | CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT (144)
hMC4 | A244K | GCCAATATGAAGGGAAAAATTACCTTGACCATC (137) | CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT (138)

The non-endogenous human GPCRs were then sequenced, and the derived and verified nucleic acid and amino acid sequences are listed in the accompanying "Sequence Listing" appendix to this patent document, as summarized in Table G below:

TABLE G
Non-Endogenous Human GPCR | Nucleic Acid Sequence Listing | Amino Acid Sequence Listing
hRUP4 (V272K) | SEQ.ID.NO.: 127 | SEQ.ID.NO.: 128
hAT1 (see alternative approaches below) | (see alternative approaches below) | (see alternative approaches below)
hGPR38 (V297K) | SEQ.ID.NO.: 129 | SEQ.ID.NO.: 130
hCCKB (V332K) | SEQ.ID.NO.: 131 | SEQ.ID.NO.: 132
hTDAG8 (I225K) | SEQ.ID.NO.: 133 | SEQ.ID.NO.: 134
hH9 (F236K) | SEQ.ID.NO.: 141 | SEQ.ID.NO.: 142
hMC4 (A244K) | SEQ.ID.NO.: 135 | SEQ.ID.NO.: 136

2. Alternative Approaches For Creation of Non-Endogenous Human GPCRs
a. AT1
1. F239K Mutation
Preparation of a non-endogenous, constitutively activated human AT1 receptor was accomplished by creating an F239K mutation (see, SEQ.ID.NO.: 89 for nucleic acid sequence, and SEQ.ID.NO.: 90 for amino acid sequence). Mutagenesis was performed using the Transformer Site-Directed™ Mutagenesis Kit (Clontech) according to the manufacturer's instructions. Two mutagenesis primers were used, a lysine mutagenesis oligonucleotide (SEQ.ID.NO.: 91) and a selection marker oligonucleotide (SEQ.ID.NO.: 92), which had the following sequences: 5'-CCAAGAAATGATGATATTAAAAAGATAATTATGGC-3' (SEQ.ID.NO.: 91) and 5'-CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT-3' (SEQ.ID.NO.: 92), respectively.
2. N111A Mutation
Preparation of a non-endogenous human AT1 receptor was also accomplished by creating an N111A mutation (see, SEQ.ID.NO.: 93 for nucleic acid sequence, and SEQ.ID.NO.: 94 for amino acid sequence). Two PCR reactions were performed using pfu polymerase (Stratagene) with the buffer system provided by the manufacturer, supplemented with 10% DMSO, 0.25 µM of each primer, and 0.5 mM of each of the 4 nucleotides. The 5' PCR sense primer used had the following sequence: 5'-CCCAAGCTTCCCCAGGTGTATTTGAT-3' (SEQ.ID.NO.: ) and the antisense primer had the following sequence: 5'-CCTGCAGGCGAAACTGACTCTGGCTGAAG-3' (SEQ.ID.NO.: 96).
The resulting 400 bp PCR fragment was digested with HindIII and subcloned into the HindIII-SmaI site of the pCMV vector (5' construct). The 3' PCR sense primer used had the following sequence: 5'-CTGTACGCTAGTGTGTTTCTACTCACGTGTCTCAGCATTGAT-3' (SEQ.ID.NO.: 97) and the antisense primer had the following sequence: 5'-GTTGGATCCACATAATGCATTTTCTC-3' (SEQ.ID.NO.: 98).
The resulting 880 bp PCR fragment was digested with BamHI and inserted into the Pst (blunted by T4 polymerase) and BamHI site of the 5' construct to generate the full length N111A construct. The cycle condition was 25 cycles of 94°C for 1 min, 60°C for 1 min and 72°C for 1 min (5' PCR) or 1.5 min (3' PCR).
3. AT2K255IC3 Mutation
Preparation of a non-endogenous, constitutively activated human AT1 was accomplished by creating an AT2K255IC3 "domain swap" mutation (see, SEQ.ID.NO.: 99 for nucleic acid sequence, and SEQ.ID.NO.: 100 for amino acid sequence). Restriction sites flanking IC3 of AT1 were generated to facilitate replacement of the IC3 with the corresponding IC3 from angiotensin II type 2 receptor (AT2). This was accomplished by performing two PCR reactions. A 5' PCR fragment (Fragment A) encoding from the 5' untranslated region to the beginning of IC3 was generated by utilizing SEQ.ID.NO.: 63 as sense primer and the following sequence: 5'-TCCGAATTCCAAAATAACTTGTAAGAATGATCAGAAA-3' (SEQ.ID.NO.: 101) as antisense primer. A 3' PCR fragment (Fragment B) encoding from the end of IC3 to the 3' untranslated region was generated by using the following sequence: 5'-AGATCTTAAGAAGATAATTATGGCAATTGTGCT-3' (SEQ.ID.NO.: 102) as sense primer and SEQ.ID.NO.: 64 as antisense primer. The PCR condition was cycles of 94°C for 1 min, 55°C for 1 min and 72°C for 1.5 min using endogenous AT1 cDNA clone as template and pfu polymerase (Stratagene), with the buffer systems provided by the manufacturer, supplemented with 10% DMSO, 0.25 µM of each primer, and 0.5 mM of each of the 4 nucleotides. Fragment A (720 bp) was digested with HindIII and EcoRI and subcloned. Fragment B was digested with BamHI and subcloned into pCMV vector with an EcoRI site 5' to the cloned PCR fragment.
The DNA fragment (Fragment C) encoding IC3 of AT2 with a L255K mutation and containing an EcoRI cohesive end at 5' and an AflII cohesive end at 3' was generated by annealing 2 synthetic oligonucleotides having the following sequences: G-3' (sense; SEQ.ID.NO.: 103) AAGTGTTTTCG-3' (antisense; SEQ.ID.NO.: 104).
Fragment C was inserted in front of Fragment B through the EcoRI and AflII sites. The resulting clone was then ligated with Fragment A through the EcoRI site to generate AT1 with AT2K255IC3.
4. A243+ Mutation
Preparation of a non-endogenous human AT1 receptor was also accomplished by creating an A243+ mutation (see, SEQ.ID.NO.: 105 for nucleic acid sequence, and SEQ.ID.NO.: 106 for amino acid sequence). An A243+ mutation was constructed using the following PCR based strategy: Two PCR reactions were performed using pfu polymerase (Stratagene) with the buffer system provided by the manufacturer supplemented with DMSO, 0.25 µM of each primer, and 0.5 mM of each of the 4 nucleotides. The 5' PCR sense primer utilized had the following sequence: 5'-CCCAAGCTTCCCCAGGTGTATTTGAT-3' (SEQ.ID.NO.: 107) and the antisense primer had the following sequence: 5'-AAGCACAATTGCTGCATAATTATCTTAAAAATATCATC-3' (SEQ.ID.NO.: 108).
The 3' PCR sense primer utilized had the following sequence: 5'-AAGATAATTATGGCAGCAATTGTGCTTTTCTTTTTCTTT-3' (SEQ.ID.NO.: 109), containing the Ala insertion, and the antisense primer had the following sequence: 5'-GTTGGATCCACATAATGCATTTTCTC-3' (SEQ.ID.NO.: 110).
The cycle condition was 25 cycles of 94°C for 1 min, 54°C for 1 min and 72°C for 1.5 min.
An aliquot of each of the 5' and 3' PCR products was then used as co-template to perform a secondary PCR using the 5' PCR sense primer and the 3' PCR antisense primer. The PCR condition was the same as for the primary PCR except that the extension time was 2.5 min. The resulting PCR fragment was digested with HindIII and BamHI and subcloned into pCMV vector. (See, SEQ.ID.NO.: 105)
b. CCKB
Preparation of the non-endogenous, constitutively activated human CCKB receptor was accomplished by creating a V332K mutation (see, SEQ.ID.NO.: 111 for nucleic acid sequence and SEQ.ID.NO.: 112 for amino acid sequence). Mutagenesis was performed by PCR via amplification using the wildtype CCKB from Example 1.
The first PCR fragment (1 kb) was amplified by using SEQ.ID.NO.: 75 and an antisense primer comprising the V332K mutation: 5'-CAGCAGCATGCGCTTCACGCGCTTCTTAGCCCAG-3' (SEQ.ID.NO.: 113).
The second PCR fragment (0.44 kb) was amplified by using a sense primer comprising the V332K mutation: 5'-AGAAGCGCGTGAAGCGCATGCTGCTGGTGATCGTT-3' (SEQ.ID.NO.: 114) and SEQ.ID.NO.: 76.
The two resulting PCR fragments were then used as template for amplifying CCKB comprising V332K, using SEQ.ID.NO.: 75 and SEQ.ID.NO.: 76 and the above-noted system and conditions. The resulting 1.44 kb PCR fragment containing the V332K mutation was digested with HindIII and EcoRI and cloned into the HindIII-EcoRI site of the pCMV expression vector. (See, SEQ.ID.NO.: 111).
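The two-fragment strategy above only works if the mutagenic antisense primer (SEQ.ID.NO.: 113) and the mutagenic sense primer (SEQ.ID.NO.: 114) share sequence, so that the two fragments can anneal to one another in the final amplification. The following Python sketch checks this for the two primers as printed; the helper names are illustrative and the check says nothing about annealing temperature or efficiency.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def longest_overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


antisense_113 = "CAGCAGCATGCGCTTCACGCGCTTCTTAGCCCAG"   # SEQ.ID.NO.: 113
sense_114 = "AGAAGCGCGTGAAGCGCATGCTGCTGGTGATCGTT"      # SEQ.ID.NO.: 114

# Express primer 113 on the same strand as primer 114 before comparing.
top_strand_113 = reverse_complement(antisense_113)
overlap = longest_overlap(top_strand_113, sense_114)
print(overlap, sense_114[:overlap])
```

On the sequences as printed, the two primers share a 26-nucleotide stretch, which is the region through which the first and second PCR fragments would be expected to join.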
3. QuikChange™ Site-Directed Mutagenesis
Preparation of non-endogenous human GPCRs can also be accomplished by using the QuikChange™ Site-Directed Mutagenesis Kit (Stratagene, according to manufacturer's instructions). Endogenous GPCR is preferably used as a template and two mutagenesis primers utilized, as well as, most preferably, a lysine mutagenesis oligonucleotide and a selection marker oligonucleotide (included in kit). For convenience, the codon mutation incorporated into the human GPCR and the respective oligonucleotides are noted, in standard form (Table H):

TABLE H
Receptor Identifier | Codon Mutation | Lysine Mutagenesis (SEQ.ID.NO.), orientation, mutation underlined | Selection Marker (SEQ.ID.NO.), orientation
hCHN3 | S284K | ATGGAGAAAAGAATCAAAAGAATGTTCTATATA (115) | TATATAGAACATTCTTTTGATTCTTTTCTCCAT (116)
hCHN6 | L352K | CGCTCTCTGGCCTTGAAGCGCACGCTCAGC (117) | GCTGAGCGTGCGCTTCAAGGCCAGAGAGCG (118)
hCHN8 | N235K | CCCAGGAAAAAGGTGAAAGTCAAAGTTTTC (119) | GAAAACTTTGACTTTCACCTTTTTCCTGGG (120)
hCHN9 | G223K | GGGGCGCGGGTGAAACGGCTGGTGAGC (121) | GCTCACCAGCCGTTTCACCCGCGCCCC (122)
hCHN10 | L231K | CCCCTTGAAAAGCCTAAGAACTTGGTCATC (123) | GATGACCAAGTTCTTAGGCTTTTCAAGGGG (124)

Example 3
RECEPTOR EXPRESSION
Although a variety of cells are available to the art for the expression of proteins, it is most preferred that mammalian cells be utilized. The primary reason for this is predicated upon practicalities, i.e., utilization of, e.g., yeast cells for the expression of a GPCR, while possible, introduces into the protocol a non-mammalian cell which may not (indeed, in the case of yeast, does not) include the receptor-coupling, genetic-mechanism and secretory pathways that have evolved for mammalian systems; thus, results obtained in non-mammalian cells, while of potential use, are not as preferred as those obtained from mammalian cells. Of the mammalian cells, COS-7, 293 and 293T cells are particularly preferred, although the specific mammalian cell utilized can be predicated upon the particular needs of the artisan.
On day one, 1x10⁷ 293T cells per 150 mm plate were plated out. On day two, two reaction tubes were prepared (the proportions to follow for each tube are per plate): tube A was prepared by mixing 20 µg DNA (e.g., pCMV vector; pCMV vector with receptor cDNA, etc.) in 1.2 ml serum free DMEM (Irvine Scientific, Irvine, CA); tube B was prepared by mixing 120 µl lipofectamine (Gibco BRL) in 1.2 ml serum free DMEM. Tubes A and B were admixed by inversions (several times), followed by incubation at room temperature for 30-45 min. The admixture is referred to as the "transfection mixture".
Plated 293T cells were washed with 1X PBS, followed by addition of 10 ml serum free DMEM. 2.4 ml of the transfection mixture were added to the cells, followed by incubation for 4 hrs at 37°C/5% CO2. The transfection mixture was removed by aspiration, followed by the addition of 25 ml of DMEM/10% Fetal Bovine Serum. Cells were incubated at 37°C/5% CO2. After 72 hr incubation, cells were harvested and utilized for analysis.
Example 4
ASSAYS FOR DETERMINATION OF CONSTITUTIVE ACTIVITY OF NON-ENDOGENOUS GPCRS
A variety of approaches are available for assessment of constitutive activity of the non-endogenous human GPCRs. The following are illustrative; those of ordinary skill in the art are credited with the ability to determine those techniques that are preferentially beneficial for the needs of the artisan.
1. Membrane Binding Assays: [35S]GTPγS Assay
When a G protein-coupled receptor is in its active state, either as a result of ligand binding or constitutive activation, the receptor couples to a G protein and stimulates the release of GDP and subsequent binding of GTP to the G protein. The alpha subunit of the G protein-receptor complex acts as a GTPase and slowly hydrolyzes the GTP to GDP, at which point the receptor normally is deactivated. Constitutively activated receptors continue to exchange GDP for GTP. The non-hydrolyzable GTP analog, [35S]GTPγS, can be utilized to demonstrate enhanced binding of [35S]GTPγS to membranes expressing constitutively activated receptors. The advantage of using [35S]GTPγS binding to measure constitutive activation is that: (a) it is generically applicable to all G protein-coupled receptors; and (b) it is proximal at the membrane surface, making it less likely to pick up molecules which affect the intracellular cascade.
The assay utilizes the ability of G protein-coupled receptors to stimulate [35S]GTPγS binding to membranes expressing the relevant receptors. The assay can, therefore, be used in the direct identification method to screen candidate compounds to known, orphan and constitutively activated G protein-coupled receptors. The assay is generic and has application to drug discovery at all G protein-coupled receptors.
The [35S]GTPγS assay can be incubated in 20 mM HEPES and between 1 and about 20 mM MgCl2 (this amount can be adjusted for optimization of results, although 20 mM is preferred) pH 7.4, binding buffer with between about 0.3 and about 1.2 nM [35S]GTPγS (this amount can be adjusted for optimization of results, although 1.2 is preferred) and 12.5 to 75 µg membrane protein (e.g., COS-7 cells expressing the receptor; this amount can be adjusted for optimization, although 75 µg is preferred) and 1 µM GDP (this amount can be changed for optimization) for 1 hour. Wheatgerm agglutinin beads (25 µl; Amersham) should then be added and the mixture incubated for another 30 minutes at room temperature. The tubes are then centrifuged at 1500 x g for 5 minutes at room temperature and then counted in a scintillation counter.
A less costly but equally applicable alternative has been identified which also meets the needs of large scale screening. Flash Plates™ and Wallac™ scintistrips may be utilized to format a high throughput [35S]GTPγS binding assay. Furthermore, using this technique, the assay can be utilized for known GPCRs to simultaneously monitor tritiated ligand binding to the receptor at the same time as monitoring the efficacy via [35S]GTPγS binding. This is possible because the Wallac beta counter can switch energy windows to look at both tritium and 35S-labeled probes. This assay may also be used to detect other types of membrane activation events resulting in receptor activation. For example, the assay may be used to monitor 32P phosphorylation of a variety of receptors (both G protein-coupled and tyrosine kinase receptors). When the membranes are centrifuged to the bottom of the well, the bound [35S]GTPγS or the 32P-phosphorylated receptor will activate the scintillant which is coated on the wells. Scinti® strips (Wallac) have been used to demonstrate this principle. In addition, the assay also has utility for measuring ligand binding to receptors using radioactively labeled ligands. In a similar manner, when the radiolabeled bound ligand is centrifuged to the bottom of the well, the scintistrip label comes into proximity with the radiolabeled ligand, resulting in activation and detection.
2. Adenylyl Cyclase
A Flash Plate™ Adenylyl Cyclase kit (New England Nuclear; Cat. No. SMP004A) designed for cell-based assays can be modified for use with crude plasma membranes. The Flash Plate wells contain a scintillant coating which also contains a specific antibody recognizing cAMP. The cAMP generated in the wells was quantitated by a direct competition for binding of radioactive cAMP tracer to the cAMP antibody. The following serves as a brief protocol for the measurement of changes in cAMP levels in membranes that express the receptors.
Transfected cells are harvested approximately three days after transfection.
Membranes were prepared by homogenization of suspended cells in buffer containing HEPES, pH 7.4 and 10 mM MgCl2. Homogenization is performed on ice using a Brinkman Polytron™ for approximately 10 seconds. The resulting homogenate is centrifuged at 49,000 x g for 15 minutes at 4°C. The resulting pellet is then resuspended in buffer containing mM HEPES, pH 7.4 and 0.1 mM EDTA, homogenized for 10 seconds, followed by centrifugation at 49,000 x g for 15 minutes at 4°C. The resulting pellet can be stored at 0 C until utilized. On the day of measurement, the membrane pellet is slowly thawed at room temperature, resuspended in buffer containing 20 mM HEPES, pH 7.4 and mM MgCl2 (these amounts can be optimized, although the values listed herein are preferred), to yield a final protein concentration of 0.60 mg/ml (the resuspended membranes were placed on ice until use).
cAMP standards and Detection Buffer (comprising 2 µCi of tracer [125I]cAMP (100 µl) to 11 ml Detection Buffer) are prepared and maintained in accordance with the manufacturer's instructions. Assay Buffer is prepared fresh for screening and contained 20 mM HEPES, pH 7.4, 1 mM MgCl2, 20 mM (Sigma), 0.1 units/ml creatine phosphokinase (Sigma), 50 µM GTP (Sigma), and 0.2 mM ATP (Sigma); Assay Buffer can be stored on ice until utilized. The assay is initiated by addition of 50 µl of assay buffer followed by addition of 50 µl of membrane suspension to the NEN Flash Plate.
The resultant assay mixture is incubated for 60 minutes at room temperature followed by addition of 100 µl of detection buffer. Plates are then incubated an additional 2-4 hours followed by counting in a Wallac MicroBeta™ scintillation counter. Values of cAMP/well are extrapolated from a standard cAMP curve that is contained within each assay plate.
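Reading cAMP/well from the on-plate standard curve can be sketched as follows. Everything numeric here is hypothetical: the standard amounts and counts are made-up illustration values, and the choice of piecewise interpolation that is linear in log10(pmol) is an assumption, not the curve-fitting method of the kit or of the counter software. Because the assay is a competition format, counts fall as unlabeled cAMP rises.

```python
import math

# Hypothetical standards: (pmol cAMP per well, counts per minute).
# These numbers are illustrative only and are not taken from the protocol.
standards = [(0.5, 9000.0), (1.0, 8000.0), (5.0, 5000.0),
             (10.0, 3500.0), (50.0, 1000.0)]


def camp_from_counts(cpm, curve):
    """Interpolate pmol cAMP/well from counts, linear in log10(pmol)
    between neighbouring standards. Counts outside the curve are rejected
    rather than extrapolated."""
    points = sorted(curve, key=lambda p: p[1])  # ascending counts
    if not points[0][1] <= cpm <= points[-1][1]:
        raise ValueError("counts fall outside the standard curve")
    # In each neighbouring pair, the lower-count point carries more cAMP.
    for (hi_pmol, lo_cpm), (lo_pmol, hi_cpm) in zip(points, points[1:]):
        if lo_cpm <= cpm <= hi_cpm:
            frac = (cpm - lo_cpm) / (hi_cpm - lo_cpm)
            log_pmol = math.log10(hi_pmol) + frac * (
                math.log10(lo_pmol) - math.log10(hi_pmol))
            return 10 ** log_pmol


print(camp_from_counts(4250.0, standards))
```

A zero-cAMP standard cannot be placed on a logarithmic axis, so a real analysis would need to treat it separately or use a fitted model instead of this simple interpolation.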
3. Reporter-based Assays
1. CREB Reporter Assay (Gs-associated Receptors)
A method to detect Gs stimulation depends on the known property of the transcription factor CREB, which is activated in a cAMP-dependent manner. A PathDetect™ CREB trans-Reporting System (Stratagene, Catalogue #219010) can be utilized to assay for Gs coupled activity in 293 or 293T cells. Cells are transfected with the plasmid components of this system and the indicated expression plasmid encoding endogenous or mutant receptor using a Mammalian Transfection Kit (Stratagene, Catalogue #200285) according to the manufacturer's instructions. Briefly, 400 ng pFR-Luc (luciferase reporter plasmid containing Gal4 recognition sequences), 40 ng pFA2-CREB (Gal4-CREB fusion protein containing the Gal4 DNA-binding domain), 80 ng pCMV-receptor expression plasmid (comprising the receptor) and 20 ng CMV-SEAP (secreted alkaline phosphatase expression plasmid; alkaline phosphatase activity is measured in the media of transfected cells to control for variations in transfection efficiency between samples) are combined in a calcium phosphate precipitate as per the Kit's instructions. Half of the precipitate is equally distributed over 3 wells in a 96-well plate, kept on the cells overnight, and replaced with fresh medium the following morning.
Forty-eight (48) hr after the start of the transfection, cells are treated and assayed for, e.g., luciferase activity.
2. AP1 reporter assay (Gq-associated receptors)
A method to detect Gq stimulation depends on the known property of Gq-dependent phospholipase C to cause the activation of genes containing AP1 elements in their promoter.
A PathDetect™ AP-1 cis-Reporting System (Stratagene, Catalogue #219073) can be utilized following the protocol set forth above with respect to the CREB reporter assay, except that the components of the calcium phosphate precipitate were 410 ng pAP1-Luc, 80 ng pCMV-receptor expression plasmid, and 20 ng CMV-SEAP.
3. CRE-LUC Reporter Assay
293 and 293T cells are plated out on 96-well plates at a density of 2 x 10⁴ cells per well and were transfected using Lipofectamine Reagent (BRL) the following day according to manufacturer instructions. A DNA/lipid mixture is prepared for each 6-well transfection as follows: 260 ng of plasmid DNA in 100 µl of DMEM were gently mixed with 2 µl of lipid in 100 µl of DMEM (the 260 ng of plasmid DNA consisted of 200 ng of a 8xCRE-Luc reporter plasmid (see below and Figure 1 for a representation of a portion of the plasmid), 50 ng of pCMV comprising endogenous receptor or non-endogenous receptor or pCMV alone, and 10 ng of a GPRS expression plasmid (GPRS in pcDNA3 (Invitrogen))). The 8xCRE-Luc reporter plasmid was prepared as follows: vector SRIF-β-gal was obtained by cloning the rat somatostatin promoter at the BglV-HindIII site in the pβgal-Basic Vector (Clontech).
Eight copies of cAMP response element were obtained by PCR from an adenovirus template AdpCF126CCRE8 (see, 7 Human Gene Therapy 1883 (1996)) and cloned into the SRIF-β-gal vector at the Kpn-BglV site, resulting in the 8xCRE-β-gal reporter vector. The 8xCRE-Luc reporter plasmid was generated by replacing the beta-galactosidase gene in the 8xCRE-β-gal reporter vector with the luciferase gene obtained from the pGL3-basic vector (Promega) at the HindIII-BamHI site. Following 30 min incubation at room temperature, the DNA/lipid mixture was diluted with 400 µl of DMEM and 100 µl of the diluted mixture was added to each well. 100 µl of DMEM with 10% FCS were added to each well after a 4 hr incubation in a cell culture incubator. The following day the medium on the transfected cells was changed to 200 µl/well of DMEM with 10% FCS. Eight hours later, the wells were changed to 100 µl/well of DMEM without phenol red, after one wash with PBS. Luciferase activity was measured the next day using the LucLite™ reporter gene assay kit (Packard) following manufacturer instructions and read on a 1450 MicroBeta™ scintillation and luminescence counter (Wallac).
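The quantities in the CRE-LUC transfection above can be cross-checked with simple arithmetic. The short Python sketch below derives the mass left for the GPRS expression plasmid from the stated 260 ng total and confirms that the diluted mixture covers six wells. The variable names are illustrative, and the 2 µl lipid volume is ignored as a simplifying assumption.

```python
# Plasmid masses stated for one 6-well transfection (ng).
total_dna_ng = 260
reporter_ng = 200      # 8xCRE-Luc reporter plasmid
receptor_ng = 50       # pCMV with endogenous/non-endogenous receptor, or pCMV alone

# Mass remaining for the third component, the GPRS expression plasmid.
gprs_ng = total_dna_ng - reporter_ng - receptor_ng

# Volumes (µl); the 2 µl of lipid is neglected here (assumption).
dna_lipid_mix_ul = 100 + 100            # DNA in DMEM plus lipid in DMEM
diluted_ul = dna_lipid_mix_ul + 400     # after dilution with DMEM
wells = diluted_ul // 100               # 100 µl of diluted mixture per well
dna_per_well_ng = total_dna_ng / wells

print(gprs_ng, wells, round(dna_per_well_ng, 1))
```

The six wells obtained this way agree with the protocol's description of the mixture as being prepared "for each 6-well transfection".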
To detect Gs stimulation of non-endogenous constitutively activated GPR38, a PathDetect™ pCRE-Luc trans-Reporting System (Stratagene, Catalogue #219075) was utilized in 293T cells. Cells were transfected with the plasmid components of this system and the indicated expression plasmid encoding endogenous or non-endogenous receptor using a Mammalian Transfection Kit (Stratagene, Catalogue #200285) according to the manufacturer's instructions. Briefly, 400 ng pCRE-Luc, 80 ng pCMV (comprising the receptor) and 20 ng CMV-SEAP (secreted alkaline phosphatase expression plasmid; alkaline phosphatase activity was measured in the media of transfected cells to control for variations in transfection efficiency between samples) were combined in a calcium phosphate precipitate as per the manufacturer's instructions. Half of the precipitate was equally distributed over 3 wells in a 96-well plate, kept on the cells overnight, and replaced with fresh medium the following day. Forty-eight (48) hr after the start of the transfection, cells were treated and assayed for luciferase activity using a Luclite™ Kit (Packard, Cat. #6016911) and "Trilux 1450 Microbeta" liquid scintillation and luminescence counter (Wallac) as per the manufacturer's instructions. The data were analyzed using GraphPad Prism™ (GraphPad Software Inc.).
Results shown in Figure 4 indicate an 83.1% increase in activity of the non-endogenous, constitutively active version of human GPR38 (V297K) (11,505 relative light units) compared with that of the endogenous GPR38 (1950 relative light units).
4. SRF-LUC Reporter Assay
One method to detect Gq stimulation depends on the known property of Gq-dependent phospholipase C to cause the activation of genes containing serum response factors in their promoter. A PathDetect™ SRF-Luc-Reporting System (Stratagene) can be utilized to assay for Gq coupled activity in, e.g., COS7 cells. Cells are transfected with the plasmid components of the system and the indicated expression plasmid encoding endogenous or non-endogenous GPCR using a Mammalian Transfection™ Kit (Stratagene, Catalogue #200285) according to the manufacturer's instructions. Briefly, 410 ng SRF-Luc, 80 ng pCMV-receptor expression plasmid and 20 ng CMV-SEAP (secreted alkaline phosphatase expression plasmid; alkaline phosphatase activity is measured in the media of transfected cells to control for variations in transfection efficiency between samples) are combined in a calcium phosphate precipitate as per the manufacturer's instructions. Half of the precipitate is equally distributed over 3 wells in a 96-well plate and kept on the cells in a serum free media for 24 hours. The last hours the cells are incubated with 1 µM Angiotensin, where indicated. Cells are then lysed and assayed for luciferase activity using a Luclite™ Kit (Packard, Cat. #6016911) and "Trilux 1450 Microbeta" liquid scintillation and luminescence counter (Wallac) as per the manufacturer's instructions. The data can be analyzed using GraphPad Prism™ (GraphPad Software Inc.).
Intracellular IP3 Accumulation Assay
On day 1, cells comprising the receptors (endogenous and/or non-endogenous) can be plated onto 24 well plates, usually lxl 0 cells/well (although this number can be optimized). On day 2 cells can be transfected by firstly mixing 0.25 µg DNA in 50 µl serum free DMEM/well and 2 µl lipofectamine in 50 µl serum free DMEM/well. The solutions are gently mixed and incubated for 15-30 min at room temperature. Cells are washed with ml PBS and 400 µl of serum free media is mixed with the transfection media and added to the cells. The cells are then incubated for 3-4 hrs at 37°C/5% CO2 and then the transfection media is removed and replaced with 1 ml/well of regular growth media. On day 3 the cells are labeled with 3H-myo-inositol. Briefly, the media is removed and the cells are washed with 0.5 ml PBS. Then 0.5 ml inositol-free/serum free media (GIBCO BRL) is added/well with 0.25 µCi of 3H-myo-inositol/well and the cells are incubated for 16-18 hrs o/n at 37°C/5% CO2. On day 4 the cells are washed with 0.5 ml PBS and 0.45 ml of assay medium is added containing inositol-free/serum free media, 10 µM pargyline, 10 mM lithium chloride, or 0.4 ml of assay medium and 50 µl of 10x ketanserin (ket) to a final concentration of 10 µM. The cells are then incubated for 30 min at 37°C. The cells are then washed with 0.5 ml PBS and 200 µl of fresh/ice cold stop solution (1M KOH; 18 mM Na-borate; 3.8 mM EDTA) is added/well. The solution is kept on ice for 5-10 min or until cells were lysed and then neutralized by 200 µl of fresh/ice cold neutralization sol. (7.5 HCL). The lysate is then transferred into 1.5 ml eppendorf tubes and 1 ml of chloroform/methanol is added/tube. The solution is vortexed for 15 sec and the upper phase is applied to a Biorad AG1-X8™ anion exchange resin (100-200 mesh).
Firstly, the resin is washed with water at 1:1.25 W/V and 0.9 ml of upper phase is loaded onto the column. The column is washed with 10 ml of 5 mM myo-inositol and 10 ml of mM Na-borate/60 mM Na-formate. The inositol tris phosphates are eluted into scintillation vials containing 10 ml of scintillation cocktail with 2 ml of 0.1 M formic acid/1 M ammonium formate. The columns are regenerated by washing with 10 ml of 0.1 M formic acid/3 M ammonium formate, rinsed twice with dd H2O and stored at 4°C in water.
Exemplary results are presented below in Table I:

TABLE I
Receptor | Mutation | Assay Utilized | Signal Generated, Endogenous Version (Relative Light Units) | Signal Generated, Non-Endogenous Version (Relative Light Units) | Percent Difference
hAT1 | F239K | SRF-LUC | 34 | 137 | 75%
hAT1 | AT2K255IC3 | SRF-LUC | 34 | 127 | 73%
hTDAG8 | I225K | CRE-LUC (293 cells) | 2,715 | 14,440 | 81%
hTDAG8 | I225K | CRE-LUC (293T cells) | 65,681 | 185,636 | 65%
hH9 | F236K | CRE-LUC | 1,887 | 6,096 | 69%
hCCKB | V332K | CRE-LUC | 785 | 3,223 | 76%
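The "Percent Difference" values in Table I, and the 83.1% figure quoted for GPR38, are all reproduced if the difference is expressed relative to the non-endogenous signal. That formula is inferred here from the reported numbers and is not stated in the text; the Python sketch below simply recomputes the column under that assumption.

```python
def percent_difference(endogenous_rlu, non_endogenous_rlu):
    """Difference between the two signals as a percentage of the
    non-endogenous signal (formula inferred from the reported values)."""
    return (non_endogenous_rlu - endogenous_rlu) / non_endogenous_rlu * 100


# (receptor, mutation, endogenous RLU, non-endogenous RLU) from Table I.
table_i = [
    ("hAT1", "F239K", 34, 137),
    ("hAT1", "AT2K255IC3", 34, 127),
    ("hTDAG8", "I225K", 2715, 14440),
    ("hTDAG8", "I225K", 65681, 185636),
    ("hH9", "F236K", 1887, 6096),
    ("hCCKB", "V332K", 785, 3223),
]

for receptor, mutation, endo, non_endo in table_i:
    print(receptor, mutation, round(percent_difference(endo, non_endo)))

# GPR38 (V297K) values quoted for the CRE-Luc experiment.
print(round(percent_difference(1950, 11505), 1))
```

Note that a conventional fold-change calculation relative to the endogenous signal would give much larger percentages for the same data, so the choice of denominator matters when comparing these figures with other reports.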
Example Cell-Based Detection Assay (Example-TDAG8) 293 cells were plated-out on 150 mm plates at a density of 1.3x107 cells per plate, and were transfected using 12 ug of the respective DNA and 60 ul of Lipofectamine Reagent (BRL) per plate. The transfected cells were grown in media containing serum for an assay performed 24 hours post-transfection. For detection assay performed 48 hours post-transfecfion (assay comparing serum and serum-free 30 media; see FIG. the initial media was changed to either serum or serum-free media.
The serum-free media was comprised solely of Dulbecco's Modified Eagle's (DME) High Glucose Medium (Irvine Scientific #9024). In addition to the above DME Medium, the media with serum contained the following: 10% Fetal Bovine Serum (Hyclone #SH30071.03), 1% of 100 mM Sodium Pyruvate (Irvine Scientific #9334), 1% of 20 mM L-Glutamine (Irvine Scientific #9317), and 1% of Penicillin- WO 00/22131 PCT/US99/24065 -51- Streptomycin solution (Irvine Scientific #9366).
A 96-well Adenylyl Cyclase Activation Flashplate™ was used (NEN: #SMP004A).
First, 50 ul of the standards for the assay were added to the plate, in duplicate, ranging from concentrations of 50 pmol to zero pmol cAMP per well. The standard cAMP (NEN: #SMP004A) was reconstituted in water, and serial dilutions were made using 1xPBS (Irvine Scientific: #9240). Next, 50 ul of the stimulation buffer (NEN: #SMP004A) was added to all wells. In the case of using compounds to measure activation or inactivation of cAMP, 10 ul of each compound, diluted in water, was added to its respective well, in triplicate. The final concentrations used ranged from 1 uM up to 1 mM. Adenosine 5'-triphosphate, ATP (Research Biochemicals International: #A-141), and adenosine 5'-diphosphate, ADP (Sigma: #A2754), were used in the assay. Next, the 293 cells transfected with the respective cDNA (CMV or TDAG8) were harvested 24 hours (assay detection in serum media) or 48 hours post-transfection (assay detection comparing serum and serum-free media). The media was aspirated and the cells were washed once with 1xPBS. Then 5 ml of 1xPBS was added to the cells along with 3 ml of cell dissociation buffer (Sigma: #C-1544). The detached cells were transferred to a centrifuge tube and centrifuged at room temperature for five minutes. The supernatant was removed and the cell pellet was resuspended in an appropriate amount of 1xPBS to obtain a final concentration of 2 x 10^6 cells per milliliter. To the wells containing the compound, 50 ul of the cells in 1xPBS (1 x 10^5 cells/well) were added. The plate was incubated on a shaker for 15 minutes at room temperature. The detection buffer containing the tracer cAMP was then prepared: to 11 ml of detection buffer (NEN: #SMP004A), 50 ul (equal to 1 uCi) of [125I]cAMP (NEN: #SMP004A) was added. Following incubation, 50 ul of this detection buffer containing tracer cAMP was added to each well. The plate was placed on a shaker and incubated at room temperature for two hours.
Finally, the solution was aspirated from the wells of the plate and the Flashplate was counted using the Wallac MicroBeta™ scintillation counter.
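The cell-handling arithmetic in the protocol above (resuspending to 2 x 10^6 cells per milliliter and delivering 50 ul per well) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the pellet size used in the example run is an assumed number, not a value reported in the text.

```python
def resuspension_volume_ml(total_cells, target_cells_per_ml=2e6):
    """Volume of 1xPBS needed to bring a counted pellet to the target density."""
    return total_cells / target_cells_per_ml


def cells_per_well(cells_per_ml, volume_ul):
    """Number of cells delivered when volume_ul of suspension is added to a well."""
    return cells_per_ml * volume_ul / 1000.0


if __name__ == "__main__":
    # Hypothetical pellet of 1.3e7 cells (an assumption for illustration only).
    print(resuspension_volume_ml(1.3e7), "ml of 1xPBS")
    # 50 ul of a 2e6 cells/ml suspension, as in the protocol.
    print(cells_per_well(2e6, 50), "cells per well")
```

The second call reproduces the 1 x 10^5 cells/well figure quoted in the protocol.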
In Figure 2A, ATP and ADP bind to endogenous TDAG8, resulting in an increase of cAMP of about 59% and about 55%, respectively. Figure 2B evidences ATP and ADP binding to endogenous TDAG8 where endogenous TDAG8 was transfected and grown in serum and serum-free medium. ATP binding to endogenous TDAG8 grown in serum media evidences an increase in cAMP of about 65% compared to the endogenous TDAG8 with no compounds; in serum-free media there was an increase of about 68%. ADP binding to endogenous TDAG8 in serum evidences about a 61% increase, while in serum-free media ADP binding evidences an increase of about 62%. ATP and ADP bind to endogenous TDAG8 with EC50 values of 139.8 uM and 120.5 uM, respectively (data not shown).
Although the results presented in Figure 2B indicate substantially the same results when serum and serum-free media were compared, our choice is to use a serum-based media, although a serum-free media can also be utilized.
Example 6 GPCR FUSION PROTEIN PREPARATION The design of the constitutively activated GPCR-G protein fusion construct was accomplished as follows: both the 5' and 3' ends of the rat G protein Gsα (long form; Itoh, H. et al., 83 PNAS 3776 (1986)) were engineered to include a HindIII (5'-AAGCTT-3') sequence thereon. Following confirmation of the correct sequence (including the flanking HindIII sequences), the entire sequence was shuttled into pcDNA3.1(-) (Invitrogen, cat. no. V795-20) by subcloning using the HindIII restriction site of that vector. The correct orientation for the Gsα sequence was determined after subcloning into pcDNA3.1(-). The modified pcDNA3.1(-) containing the rat Gsα gene at the HindIII site was then verified; this vector was now available as a "universal" Gsα protein vector. The pcDNA3.1(-) vector contains a variety of well-known restriction sites upstream of the HindIII site, thus beneficially providing the ability to insert, upstream of the Gs protein, the coding sequence of an endogenous, constitutively active GPCR. This same approach can be utilized to create other "universal" G protein vectors, and, of course, other commercially available or proprietary vectors known to the artisan can be utilized; the important criterion is that the sequence for the GPCR be upstream of and in-frame with that of the G protein.
TDAG8 couples via Gs, while H9 couples via Gz. For the following exemplary GPCR Fusion Proteins, fusion to Gsα was accomplished.
A TDAG8(I225K)-Gsα Fusion Protein construct was made as follows: primers were designed as follows: 5'-gatcTCTAGAATGAACAGCACATGTATTGAAG-3' (SEQ.ID.NO.: 125; sense) 5'-ctagGGTACCCGCTCAAGGACCTCTAATTCCATAG-3' (SEQ.ID.NO.: 126; antisense).
Nucleotides in lower case are included as spacers in the restriction sites between the G protein and TDAG8. The sense and anti-sense primers included the restriction sites for XbaI and KpnI, respectively.
PCR was then utilized to secure the respective receptor sequences for fusion within the Gsα universal vector disclosed above, using the following protocol for each: 100 ng cDNA for TDAG8 was added to separate tubes containing 2 ul of each primer (sense and anti-sense), 3 uL of 10 mM dNTPs, 10 uL of 10X TaqPlus™ Precision buffer, 1 uL of TaqPlus™ Precision polymerase (Stratagene: #600211), and 80 uL of water. Reaction temperatures and cycle times for TDAG8 were as follows: the initial denaturing step was done at 94°C for five minutes, and a cycle of 94°C for 30 seconds; 55°C for 30 seconds; 72°C for two minutes. A final extension was done at 72°C for ten minutes. The PCR product was run on a 1% agarose gel and then purified (data not shown). The purified product was digested with XbaI and KpnI (New England Biolabs) and the desired inserts were purified and ligated into the Gs universal vector at the respective restriction site. The positive clones were isolated following transformation and identified by restriction enzyme digest; expression using 293 cells was accomplished following the protocol set forth infra. Each positive clone for the TDAG8:Gs Fusion Protein was sequenced to verify correctness.
GPCR Fusion Proteins comprising non-endogenous, constitutively activated TDAG8(I225K) were analyzed as above and verified for constitutive activation.
An H9(F236K)-Gsα Fusion Protein construct was made as follows: primers were designed as follows: 5'-TTAgatatcGGGGCCCACCCTAGCGGT-3' (SEQ.ID.NO.: 145; sense) 5'-ggtaccCCCACAGCCATTTCATCAGGATC-3' (SEQ.ID.NO.: 146; antisense).
Nucleotides in lower case are included as spacers in the restriction sites between the G protein and H9. The sense and anti-sense primers included the restriction sites for EcoRV and KpnI, respectively, such that spacers (attributed to the restriction sites) exist between the G protein and H9.
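As a quick check on the primer designs given for the TDAG8 and H9 fusion constructs, the sketch below locates the stated restriction site in each primer. The primer sequences are copied from the text; the recognition sequences used (XbaI TCTAGA, KpnI GGTACC, EcoRV GATATC) are the standard ones for these enzymes, and the helper name is illustrative.

```python
# Standard recognition sequences for the enzymes named in the text.
RECOGNITION_SITES = {"XbaI": "TCTAGA", "KpnI": "GGTACC", "EcoRV": "GATATC"}

# (label, primer 5'->3' as printed, enzyme whose site the primer should carry)
PRIMERS = [
    ("SEQ.ID.NO.: 125 (TDAG8 sense)", "gatcTCTAGAATGAACAGCACATGTATTGAAG", "XbaI"),
    ("SEQ.ID.NO.: 126 (TDAG8 antisense)", "ctagGGTACCCGCTCAAGGACCTCTAATTCCATAG", "KpnI"),
    ("SEQ.ID.NO.: 145 (H9 sense)", "TTAgatatcGGGGCCCACCCTAGCGGT", "EcoRV"),
    ("SEQ.ID.NO.: 146 (H9 antisense)", "ggtaccCCCACAGCCATTTCATCAGGATC", "KpnI"),
]


def site_position(primer, enzyme):
    """Return the 0-based index of the enzyme's site in the primer, or -1 if absent."""
    return primer.upper().find(RECOGNITION_SITES[enzyme])


if __name__ == "__main__":
    for label, primer, enzyme in PRIMERS:
        print(f"{label}: {enzyme} site at index {site_position(primer, enzyme)}")
```

Each primer carries its stated site at or near the 5' end, which is what allows the amplified receptor sequence to be cut and ligated directionally into the universal vector.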
PCR was then utilized to secure the respective receptor sequences for fusion within the Gsα universal vector disclosed above, using the following protocol for each: 80 ng cDNA for H9 was added to separate tubes containing 100 ng of each primer (sense and anti-sense), and 45 uL of PCR Supermix™ (Gibco-BRL, LifeTech) (50 ul total reaction volume). Reaction temperatures and cycle times for H9 were as follows: the initial denaturing step was done at 94°C for one minute, and a cycle of 94°C for 30 seconds; 55°C for 30 seconds; 72°C for two minutes. A final extension was done at 72°C for seven minutes. The PCR product was run on a 1% agarose gel and then purified (data not shown). The purified product was cloned into the pCRII-TOPO™ System, followed by identification of positive clones. Positive clones were isolated and digested with EcoRV and KpnI (New England Biolabs), and the desired inserts were isolated, purified and ligated into the Gs universal vector at the respective restriction site.
The positive clones were isolated following transformation and identified by restriction enzyme digest; expression using 293 cells was accomplished following the protocol set forth infra. Each positive clone for the H9(F236K):Gs Fusion Protein was sequenced to verify correctness. Membranes were frozen until utilized.
To ascertain the ability of measuring a cAMP response mediated by the Gs protein (even though H9 couples with Gz), the following cAMP membrane assay was utilized, based upon an NEN Adenylyl Cyclase Activation Flashplate™ Assay kit (96 well format). "Binding Buffer" consisted of 10 mM HEPES, 100 mM NaCl and 10 mM MgCl2 (pH ). "Regeneration Buffer" was prepared in Binding Buffer and consisted of 20 mM phosphocreatine, creatine phosphokinase, 20 uM GTP, 0.2 mM ATP, and 0.6 mM IBMX. "cAMP Standards" were prepared in Binding Buffer as follows:

Standard   cAMP Stock (5,000 pmol/ml      Added to indicated amount   Final Assay Concentration (50 ul into
           in 2 ml H2O), in ul            of Binding Buffer           100 ul) to achieve indicated pmol/well
A          250                            1 ml
B          500 of A                       500 ul
C          500 of B                       500 ul                      12.5
D          500 of C                       750 ul
E          500 of D                       500 ul
F          500 of E                       500 ul                      1.25
G          500 of F                       750 ul

Frozen membranes (both pCMV as control and the non-endogenous H9-Gs Fusion Protein) were thawed (on ice at room temperature until in solution). Membranes were homogenized with a polytron until in suspension (2 x 15 seconds). Membrane protein concentration was determined using the Bradford Assay Protocol (see infra).
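The cAMP standard series described above is a serial dilution, so the amount delivered per well can be computed from the stock concentration and the listed transfer and buffer volumes. The sketch below does this under the assumption that 50 ul of each standard is added per well (as stated for the control wells); the variable and function names are illustrative, not part of the kit protocol.

```python
STOCK_PMOL_PER_ML = 5000.0   # cAMP stock concentration given in the text
WELL_VOLUME_UL = 50.0        # volume of standard added per well (per the text)

# (standard, source solution, volume transferred in ul, Binding Buffer added in ul)
DILUTION_STEPS = [
    ("A", "stock", 250.0, 1000.0),
    ("B", "A", 500.0, 500.0),
    ("C", "B", 500.0, 500.0),
    ("D", "C", 500.0, 750.0),
    ("E", "D", 500.0, 500.0),
    ("F", "E", 500.0, 500.0),
    ("G", "F", 500.0, 750.0),
]


def standards_pmol_per_well():
    """Walk the dilution series and return pmol cAMP per well for each standard."""
    concentration = {"stock": STOCK_PMOL_PER_ML}  # pmol/ml
    per_well = {}
    for name, source, transfer_ul, buffer_ul in DILUTION_STEPS:
        concentration[name] = concentration[source] * transfer_ul / (transfer_ul + buffer_ul)
        per_well[name] = concentration[name] * WELL_VOLUME_UL / 1000.0
    return per_well


if __name__ == "__main__":
    for name, pmol in standards_pmol_per_well().items():
        print(f"Standard {name}: {pmol:g} pmol/well")
```

The computed values for standards C and F agree with the two figures that are legible in the standards listing (12.5 and 1.25 pmol/well), which supports reading the buffer volume for standard A as 1 ml.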
Membrane concentration was diluted to 0.5 mg/ml in Regeneration Buffer (final assay ug/well). Thereafter, 50 ul of Binding Buffer was added to each well.
For control, 50 ul/well of cAMP standard was added to wells 11 and 12 A-G, with Binding Buffer alone to 12H (on the 96-well format). Thereafter, 50 ul/well of protein was added to the wells and incubated at room temperature (on shaker) for 60 min. 100 ul of [125I]cAMP in Detection Buffer (see infra) was added to each well (final ul [125I]cAMP into 11 ml Detection Buffer). These were incubated for 2 hrs at room temperature. Plates were aspirated with an 8 channel manifold and sealed with plate covers. Results (pmoles cAMP bound) were read in a Wallac™ 1450 on "prot ...". Results are presented in FIG. 3.
The results presented in FIG. 3 indicate that the Gs coupled fusion was able to "drive" the cyclase reaction such that measurement of the constitutive activation of H9(F236K) was viable. Based upon these results, the direct identification of candidate compounds that are inverse agonists, agonists and partial agonists is possible using a cyclase-based assay.
Example 7 Protocol: Direct Identification of Inverse Agonists and Agonists Using [35S]GTPγS Although we have utilized endogenous, constitutively active GPCRs for the direct identification of candidate compounds as inverse agonists, for reasons that are not altogether understood, intra-assay variation can become exacerbated.
Preferably, then, a GPCR Fusion Protein, as disclosed above, is also utilized with a non-endogenous, constitutively activated GPCR. We have determined that when such a protein is used, intra-assay variation appears to be substantially stabilized, whereby an effective signal-to-noise ratio is obtained. This has the beneficial result of allowing for a more robust identification of candidate compounds. Thus, it is preferred that for direct identification, a GPCR Fusion Protein be used and that when utilized, the following assay protocols be utilized.
Membrane Preparation Membranes comprising the non-endogenous, constitutively active orphan GPCR Fusion Protein of interest and for use in the direct identification of candidate compounds as inverse agonists, agonists or partial agonists are preferably prepared as follows:
a. Materials "Membrane Scrape Buffer" is comprised of 20 mM HEPES and 10 mM EDTA, pH 7.4; "Membrane Wash Buffer" is comprised of 20 mM HEPES and 0.1 mM EDTA, pH 7.4; "Binding Buffer" is comprised of 20 mM HEPES, 100 mM NaCl, and 10 mM MgCl2, pH 7.4.
b. Procedure All materials are kept on ice throughout the procedure. Firstly, the media is aspirated from a confluent monolayer of cells, followed by a rinse with 10 ml cold PBS, followed by aspiration. Thereafter, 5 ml of Membrane Scrape Buffer is added to scrape cells; this is followed by transfer of the cellular extract into 50 ml centrifuge tubes (centrifuged at 20,000 rpm for 17 minutes at 4°C). Thereafter, the supernatant is aspirated and the pellet is resuspended in 30 ml Membrane Wash Buffer, followed by centrifugation at 20,000 rpm for 17 minutes at 4°C.
The supernatant is then aspirated and the pellet resuspended in Binding Buffer. This is then homogenized using a Brinkman polytron™ homogenizer (15-20 second bursts until all the material is in suspension). This is referred to herein as "Membrane Protein".
Bradford Protein Assay Following the homogenization, protein concentration of the membranes is determined using the Bradford Protein Assay (protein can be diluted to about 1.5 mg/ml, aliquoted and frozen (-80°C) for later use; when frozen, the protocol for use is as follows: on the day of the assay, frozen Membrane Protein is thawed at room temperature, followed by vortex, and then homogenized with a polytron at about 12 x 1,000 rpm for about 5-10 seconds; it is noted that for multiple preparations, the homogenizer should be thoroughly cleaned between homogenization of different preparations).
a. Materials Binding Buffer (as per above); Bradford Dye Reagent; Bradford Protein Standard are utilized, following manufacturer instructions (Biorad, cat. no. 500-0006).
b. Procedure Duplicate tubes are prepared, one including the membrane, and one as a control "blank". Each contains 800 ul Binding Buffer. Thereafter, 10 ul of Bradford Protein Standard (1 mg/ml) is added to each tube, and 10 ul of Membrane Protein is then added to just one tube (not the blank). Thereafter, 200 ul of Bradford Dye Reagent is added to each tube, followed by vortex of each. After five minutes, the tubes are re-vortexed and the material therein is transferred to cuvettes. The cuvettes are then read using a CECIL 3041 spectrophotometer, at a wavelength of 595 nm.
Direct Identification Assay a. Materials GDP Buffer consists of 37.5 ml Binding Buffer and 2 mg GDP (Sigma, cat. no. G-7127), followed by a series of dilutions in Binding Buffer to obtain 0.2 uM GDP (final concentration of GDP in each well was 0.1 uM GDP); each well comprising a candidate compound has a final volume of 200 ul consisting of 100 ul GDP Buffer (final concentration, 0.1 uM GDP), 50 ul Membrane Protein in Binding Buffer, and 50 ul [35S]GTPγS (0.6 nM) in Binding Buffer (2.5 ul [35S]GTPγS per 10 ml Binding Buffer).
b. Procedure Candidate compounds are preferably screened using a 96-well plate format (these can be frozen at -80°C). Membrane Protein (or membranes with expression vector excluding the GPCR Fusion Protein, as control) is homogenized briefly until in suspension. Protein concentration is then determined using the Bradford Protein Assay set forth above. Membrane Protein (and control) is then diluted to 0.25 mg/ml in Binding Buffer (final assay concentration, 12.5 ug/well). Thereafter, 100 ul GDP Buffer is added to each well of a Wallac Scintistrip™ (Wallac). A 5 ul pin-tool is then used to transfer 5 ul of a candidate compound into such well (i.e., 5 ul in a total assay volume of 200 ul is a 1:40 ratio, such that the final screening concentration of the candidate compound is 10 uM). Again, to avoid contamination, after each transfer step the pin tool should be rinsed in three reservoirs comprising water (1X), ethanol (1X) and water (2X); excess liquid should be shaken from the tool after each rinse and dried with paper and kimwipes. Thereafter, 50 ul of Membrane Protein is added to each well (a control well comprising membranes without the GPCR Fusion Protein is also utilized), and pre-incubated for 5-10 minutes at room temperature. Thereafter, 50 ul of [35S]GTPγS (0.6 nM) in Binding Buffer is added to each well, followed by incubation on a shaker for 60 minutes at room temperature (again, in this example, plates were covered with foil). The assay is then stopped by spinning the plates at 4000 RPM for 15 minutes at 22°C. The plates are then aspirated with an 8 channel manifold and sealed with plate covers. The plates are then read on a Wallac 1450 using setting "Prot. #37" (as per manufacturer instructions).
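The per-well quantities quoted in the Direct Identification Assay follow from simple dilution arithmetic, sketched below. The function names are illustrative; the 200 ul total volume is taken as stated in the text (the 5 ul compound transfer is treated as negligible, as the text does), and the compound stock concentration is a value implied by the 1:40 ratio rather than one given explicitly.

```python
TOTAL_ASSAY_VOLUME_UL = 200.0  # total assay volume as stated in the text


def final_concentration(stock_concentration, added_ul, total_ul=TOTAL_ASSAY_VOLUME_UL):
    """Concentration after adding added_ul of stock into a well of total_ul."""
    return stock_concentration * added_ul / total_ul


def protein_per_well_ug(protein_mg_per_ml, added_ul):
    """Micrograms of membrane protein delivered (1 mg/ml equals 1 ug/ul)."""
    return protein_mg_per_ml * added_ul


if __name__ == "__main__":
    dilution_ratio = TOTAL_ASSAY_VOLUME_UL / 5.0          # 5 ul pin-tool transfer
    implied_compound_stock_uM = 10.0 * dilution_ratio     # assumption: back-calculated
    print(f"compound dilution 1:{dilution_ratio:g}, implied stock {implied_compound_stock_uM:g} uM")
    print(f"GDP in well: {final_concentration(0.2, 100.0):g} uM")
    print(f"membrane protein: {protein_per_well_ug(0.25, 50.0):g} ug/well")
```

These reproduce the 1:40 ratio, the 0.1 uM final GDP concentration and the 12.5 ug/well of membrane protein given in the protocol.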
Example 8 Protocol: Confirmation Assay Using an independent assay approach to provide confirmation of a directly identified candidate compound as set forth above, it is preferred that a confirmation assay then be utilized. In this case, the preferred confirmation assay is a cyclase-based assay.
A modified Flash Plate™ Adenylyl Cyclase kit (New England Nuclear; Cat. No. SMP004A) is preferably utilized for confirmation of candidate compounds directly identified as inverse agonists and agonists to non-endogenous, constitutively activated orphan GPCRs in accordance with the following protocol.
Transfected cells are harvested approximately three days after transfection.
Membranes are prepared by homogenization of suspended cells in buffer containing HEPES, pH 7.4 and 10 mM MgCl2. Homogenization is performed on ice using a Brinkman Polytron™ for approximately 10 seconds. The resulting homogenate is centrifuged at 49,000 x g for 15 minutes at 4°C. The resulting pellet is then resuspended in buffer containing HEPES, pH 7.4 and 0.1 mM EDTA, homogenized for 10 seconds, followed by centrifugation at 49,000 x g for 15 minutes at 4°C. The resulting pellet can be stored at 0 C until utilized. On the day of direct identification screening, the membrane pellet is slowly thawed at room temperature and resuspended in buffer containing 20 mM HEPES, pH 7.4 and 10 mM MgCl2, to yield a final protein concentration of 0.60 mg/ml (the resuspended membranes are placed on ice until use).
cAMP standards and Detection Buffer (comprising 2 uCi of tracer [125I]cAMP (100 ul) to 11 ml Detection Buffer) are prepared and maintained in accordance with the manufacturer's instructions. Assay Buffer is prepared fresh for screening and contains HEPES, pH 7.4, 10 mM MgCl2, 20 mM phosphocreatine (Sigma), 0.1 units/ml creatine phosphokinase (Sigma), 50 uM GTP (Sigma), and 0.2 mM ATP (Sigma); Assay Buffer can be stored on ice until utilized.
Candidate compounds identified as per above (if frozen, thawed at room temperature) are added, preferably, to 96-well plate wells (3 ul/well; 12 uM final assay concentration), together with 40 ul Membrane Protein (30 ug/well) and 50 ul of Assay Buffer. This admixture is then incubated for 30 minutes at room temperature, with gentle shaking.
Following the incubation, 100 ul of Detection Buffer is added to each well, followed by incubation for 2-24 hours. Plates are then counted in a Wallac MicroBeta™ plate reader using "Prot. #31" (as per manufacturer instructions).
It is intended that each of the patents, applications, and printed publications mentioned in this patent document be hereby incorporated by reference in their entirety.
As those skilled in the art will appreciate, numerous changes and modifications may be made to the preferred embodiments of the invention without departing from the spirit of the invention. It is intended that all such variations fall within the scope of the invention.
Although a variety of expression vectors are available to those in the art, for purposes of utilization for both the endogenous and non-endogenous human GPCRs, it is most preferred that the vector utilized be pCMV. This vector was deposited with the American Type Culture Collection (ATCC) on October 13, 1998 (10801 University Blvd., Manassas, VA 20110-2209 USA) under the provisions of the Budapest Treaty for the International Recognition of the Deposit of Microorganisms for the Purpose of Patent Procedure. The DNA was tested by the ATCC and determined to be. The ATCC has assigned the following deposit number to pCMV: ATCC #203351.
EDITORIAL NOTE APPLICATION NUMBER 62991/99 The following Sequence Listing pages 1 to 116 are part of the description. The claims pages follow on pages "62" to "68".
SEQUENCE LISTING
GENERAL INFORMATION:
APPLICANT: Behan, Dominic P.
Lehmann-Bruinsma, Karin
Chalmers, Derek T.
Lowitz, Kevin P.
Lin, I-Lin
Dang, Huong T.
Chen, Ruoping
Liaw, Chen W.
Gore, Martin J.
White, Carol
(ii) TITLE OF INVENTION: Non-Endogenous, Constitutively Activated Human G Protein-Coupled Receptors
(iii) NUMBER OF SEQUENCES: 146
(iv) CORRESPONDENCE ADDRESS:
ADDRESSEE: Arena Pharmaceuticals, Inc.
STREET: 6166 Nancy Ridge Drive
CITY: San Diego
STATE: CA
COUNTRY: USA
ZIP: 92121
COMPUTER READABLE FORM:
MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release Version #1.30
(vi) CURRENT APPLICATION DATA:
APPLICATION NUMBER: US
FILING DATE:
CLASSIFICATION:
(viii) ATTORNEY/AGENT INFORMATION:
NAME: Burgoon, Richard P.
REGISTRATION NUMBER: 34,787
(ix) TELECOMMUNICATION INFORMATION:
TELEPHONE: (858)453-7200
TELEFAX: (858)453-7210
INFORMATION FOR SEQ ID NO:1:
SEQUENCE CHARACTERISTICS:
LENGTH: 1260 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi)
SEQUENCE DESCRIPTION: SEQ ID NO:1:
ATGGTCTTCT CGGCAGTGTT GACTGCGTTC CATACCGGGA CATCCAACAC AACATTTGTC
GTGTATGAAA ACACCTACAT GAATATTACA CTCCCTCCAC CATTCCAGCA TCCTGACCTC 120
AGTCCATTGC TTAGATATAG TTTTGAAACC ATGGCTCCCA CTGGTTTGAG TTCCTTGACC 180
GTGAATAGTA CAGCTGTGCC CACAACACCA GCAGCATTTA AGAGCCTAAA CTTGCCTCTT 240
CAGATCACCC TTTCTGCTAT AATGATATTC ATTCTGTTTG TGTCTTTTCT TGGGAACTTG 300
GTTGTTTGCC TCATGGTTTA CCAAAAAGCT GCCATGAGGT CTGCAATTAA CATCCTCCTT 360
GCCAGCCTAG CTTTTGCAGA CATGTTGCTT GCAGTGCTGA ACATGCCCTT TGCCCTGGTA 420
ACTATTCTTA CTACCCGATG GATTTTTGGG AAATTCTTCT GTAGGGTATC TGCTATGTTT 480
TTCTGGTTAT TTGTGATAGA AGGAGTAGCC ATCCTGCTCA TCATTAGCAT AGATAGGTTC 540
CTTATTATAG TCCAGAGGCA GGATAAGCTA AACCCATATA GAGCTAAGGT TCTGATTGCA 600
GTTTCTTGGG CAACTTCCTT TTGTGTAGCT TTTCCTTTAG CCGTAGGAAA CCCCGACCTG 660
CAGATACCTT CCCGAGCTCC CCAGTGTGTG TTTGGGTACA CAACCAATCC AGGCTACCAG 720
GCTTATGTGA TTTTGATTTC TCTCATTTCT TTCTTCATAC CCTTCCTGGT AATACTGTAC 780
TCATTTATGG GCATACTCAA CACCCTTCGG CACAATGCCT TGAGGATCCA TAGCTACCCT 840
GAAGGTATAT GCCTCAGCCA GGCCAGCAAA CTGGGTCTCA TGAGTCTGCA GAGACCTTTC 900
CAGATGAGCA TTGACATGGG CTTTAAAACA CGTGCCTTCA CCACTATTTT GATTCTCTTT 960
GCTGTCTTCA TTGTCTGCTG GGCCCCATTC ACCACTTACA GCCTTGTGGC AACATTCAGT 1020
AAGCACTTTT ACTATCAGCA CAACTTTTTT GAGATTAGCA CCTGGCTACT GTGGCTCTGC 1080
TACCTCAAGT CTGCATTGAA TCCGCTGATC TACTACTGGA GGATTAAGAA ATTCCATGAT 1140
GCTTGCCTGG ACATGATGCC TAAGTCCTTC AAGTTTTTGC CGCAGCTCCC TGGTCACACA 1200
AAGCGACGGA TACGTCCTAG TGCTGTCTAT GTGTGTGGGG AACATCGGAC GGTGGTGTGA 1260
INFORMATION FOR SEQ ID NO:2: SEQUENCE CHARACTERISTICS: LENGTH: 419 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: Met Val Phe Ser Ala Val Leu Thr Ala Phe His Thr Gly Thr Ser Asn 1. 5 10 Thr Pro Giu Ala Gin Leu Arg Leu Thr 145 Phe Ile Tyr Val Arg 225 Ala Thr Pro Thr Val Ile Giy Ser Leu 130 Arg Trp Asp Arg Ala 210 Ala Tyr Phe Phe Met Pro Thr Asn Ala 115 Ala Trp Leu Arg Ala 195 Phe Pro Val Val Gin Ala Thr Leu Leu 100 Ile Val Ile Phe Phe 180 Lys Pro Gin Ile Val His Pro Thr Ser Vai Asn Leu Phe Val 165 Leu Val Leu Cys Leu 245 Tyr Pro Thr Pro 70 Ala Val Ile Asn Gly 150 Ile Ile Leu Al a Val 230 Ile Glu Asp Gly 55 Al a Ile Cys Leu Met 135 Lys Glu Ile Ile Val 215 Phe Ser Asn Leu 40 Leu Al a Met Leu Leu 120 Pro Phe Gly Val Ala 200 Gly Gly Leu Thr 25 Ser Ser Phe Ile Met 105 Ala Phe Phe Val Gin 185 Val Asn Tyr Ile Tyr Pro Ser Lys Phe 90 Val Ser Ala Cys Al a i7 0 Arg Ser Pro Thr Ser 250 Met Leu Leu Ser 75 Ile Tyr Leu Leu Arg 155 Ile Gin Trp, Asp Thr 235 Phe Asn Leu Thr Leu Leu Gin Al a Val 140 Val Leu Asp Ala Leu 220 Asn Phe Ile Arg Val Asn Phe Lys Phe 125 Thr Ser Leu Lys Thr 205 Gin Pro Ile Thr Tyr Asn Leu Val Ala 110 Ala Ile Ala Ile Leu 190 Ser Ile Gly Pro Leu Ser Ser Pro Ser Ala Asp Leu Met Ile 175 Asn Phe Pro Tyr Phe 255 Pro Phe Thr Leu Phe Met Met Thr Phe 160 Ser Pro Cys Ser Gin 240 Leu Val Ile Leu Tyr Ser Phe Met Gly Ile Leu Asn Thr Leu Arg His Asn 260 265 270 Ala Leu Arg 275 Ser Lys Leu Ile His Ser Tyr Pro 280 Leu Glu Gly Ile Cys Leu Ser Gin Ala Gly Leu Met 290 Asp Met Ser 295 Arg Gin Arg Pro Gin Met Ser Ile Gly Phe Lys 305 Ala Thr 310 Cys Ala Phe Thr Leu Ile Leu Phe 320 Val Phe Ile Trp Ala Pro Phe 330 Gin Thr Tyr Ser Leu Val 335 Ala Thr Phe Ser Thr Trp 355 Leu Ile Tyr Ser 340 Leu His Phe Tyr His Asn
Phe Leu Trp Leu Cys 360 Lys Leu Lys Ser Ala 365 Ala Phe Glu Ile 350 Leu Asn Pro Cys Leu Asp Tyr Trp Arg Lys Phe His 370 Met Met Asp 380 Leu Pro Lys Ser 385 Lys Phe 390 Phe Leu Pro Pro Gly His Arg Arg Ile Arg Pro Ser Ala Val Tyr Val Cys Gly Glu His Arg 405 410 415 Thr Val Val
INFORMATION FOR SEQ ID NO:3: SEQUENCE CHARACTERISTICS: LENGTH: 1119 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
ATGTTAGCCA ACAGCTCCTC AACCAACAGT TCTGTTCTCC CGTGTCCTGA CTACCGACCT
ACCCACCGCC TGCACTTGGT GGTCTACAGC TTGGTGCTGG CTGCCGGGCT CCCCCTCAAC 120
GCGCTAGCCC TCTGGGTCTT CCTGCGCGCG CTGCGCGTGC ACTCGGTGGT GAGCGTGTAC 180
ATGTGTAACC TGGCGGCCAG CGACCTGCTC TTCACCCTCT CGCTGCCCGT TCGTCTCTCC 240
TACTACGCAC TGCACCACTG GCCCTTCCCC GACCTCCTGT GCCAGACGAC GGGCGCCATC 300
TTCCAGATGA ACATGTACGG CAGCTGCATC TTCCTGATGC TCATCAACGT GGACCGCTAC 360
GCCGCCATCG TGCACCCGCT GCGACTGCGC CACCTGCGGC GGCCCCGCGT GGCGCGGCTG
CTCTGCCTGG GCGTGTGGGC GCTCATCCTG GTGTTTGCCG TGCCCGCCGC CCGCGTGCAC 480
AGGCCCTCGC GTTGCCGCTA CCGGGACCTC GAGGTGCGCC TATGCTTCGA GAGCTTCAGC 540
GACGAGCTGT GGAAAGGCAG GCTGCTGCCC CTCGTGCTGC TGGCCGAGGC GCTGGGCTTC 600
CTGCTGCCCC TGGCGGCGGT GGTCTACTCG TCGGGCCGAG TCTTCTGGAC GCTGGCGCGC 660
CCCGACGCCA CGCAGAGCCA GCGGCGGCGG AAGACCGTGC GCCTCCTGCT GGCTAACCTC 720
GTCATCTTCC TGCTGTGCTT CGTGCCCTAC AACAGCACGC TGGCGGTCTA CGGGCTGCTG 780
CGGAGCAAGC TGGTGGCGGC CAGCGTGCCT GCCCGCGATC GCGTGCGCGG GGTGCTGATG 840
GTGATGGTGC TGCTGGCCGG CGCCAACTGC GTGCTGGACC CGCTGGTGTA CTACTTTAGC 900
GCCGAGGGCT TCCGCAACAC CCTGCGCGGC CTGGGCACTC CGCACCGGGC CAGGACCTCG 960
GCCACCAACG GGACGCGGGC GGCGCTCGCG CAATCCGAAA GGTCCGCCGT CACCACCGAC 1020
GCCACCAGGC CGGATGCCGC CAGTCAGGGG CTGCTCCGAC CCTCCGACTC CCACTCTCTn 1080
TCTTCCTTCA CACAGTGTCC CCAGGATTCC GCCCTCTGA 1119
INFORMATION FOR SEQ ID NO:4: SEQUENCE CHARACTERISTICS: LENGTH: 372 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: Met Leu Ala Asn Ser Ser Ser Thr Asn Ser 1 5 10 Asp Tyr Arg Pro Thr His Arg Leu His Leu 20 25 Leu Ala Ala Gly Leu Pro Leu Asn Ala Leu 40 Arg Ala Leu Arg Val His Ser Val Val Ser 55 Ala Ala Ser Asp Leu Leu Phe Thr Leu Ser 70 Ser Val Leu Pro Cys Pro Val Val Tyr Ala Leu Trp Val Tyr Met Ser Leu Val Val Phe Leu Cys Asn Leu Leu Pro 75 Asp Leu Tyr Val Arg Leu Leu Cys Gin Ser Thr Tyr Ala Leu His His Trp Pro Phe Pro WO 00/22131 PCT/US99/24065 -6- 90 Thr Gly Ala Ile Phe Gin Met Asn Met Tyr Gly Ser Cys Ile Phe Leu 100 105 110 Met Leu Ile Asn Val Asp Arg Tyr Ala Ala Ile Val His Pro Leu Arg 115 120 125 Leu Arg His Leu Arg Arg Pro Arg Val Ala Arg Leu Leu Cys Leu Gly 130 135 140 Val Trp Ala Leu Ile Leu Val Phe Ala Val Pro Ala Ala Arg Val His 145 150 155 160 Arg Pro Ser Arg Cys Arg Tyr Arg Asp Leu Glu Val Arg Leu Cys Phe 165 170 175 Glu Ser Phe Ser Asp Glu Leu Trp Lys Gly Arg Leu Leu Pro Leu Val 180 185 190 Leu Leu Ala Glu Ala Leu Gly Phe Leu Leu Pro Leu Ala Ala Val Val 195 200 205 Tyr Ser Ser Gly Arg Val Phe Trp Thr Leu Ala Arg Pro Asp Ala Thr 210 215 220 Gin Ser Gin Arg Arg Arg Lys Thr Val Arg Leu Leu Leu Ala Asn Leu 225 230 235 240 Val Ile Phe Leu Leu Cys Phe Val Pro Tyr Asn Ser Thr Leu Ala Val 245 250 255 Tyr Gly Leu Leu Arg Ser Lys Leu Val Ala Ala Ser Val Pro Ala Arg 260 265 270 Asp Arg Val Arg Gly Val Leu Met Val Met Val Leu Leu Ala Gly Ala 275 280 285 Asn Cys Val Leu Asp Pro Leu Val Tyr Tyr Phe Ser Ala Glu Gly Phe 290 295 300 Arg Asn Thr Leu Arg Gly Leu Gly Thr Pro His Arg Ala Arg Thr Ser 305 310 315 320 Ala Thr Asn Gly Thr Arg Ala Ala Leu Ala Gin Ser Glu Arg Ser Ala 325 330 335 Val Thr Thr Asp Ala Thr Arg Pro Asp Ala Ala Ser Gin Gly Leu Leu 340 345 350 Arg Pro Ser Asp Ser His Ser Leu Ser Ser Phe Thr Gin Cys Pro Gin 355 360 365 Asp Ser Ala Leu 370 WO 00/22131 PCT/US99/24065 -7- INFORMATION FOR SEQ ID i) SEQUENCE CHARACTERISTICS: LENGTH: 1107 base pairs TYPE: nucleic acid STRANDEDNESS: 
single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID
ATGGCCAACT CCACAGGGCT GAACGCCTCA GAAGTCGCAG GCTCGTTGGG GTTGATCCTG
GCAGCTGTCG TGGAGGTGGG GGCACTGCTG GGCAACGGCG CGCTGCTGGT CGTGGTGCTG 120
CGCACGCCGG GACTGCGCGA CGCGCTCTAC CTGGCGCACC TGTGCGTCGT GGACCTGCTG 180
GCGGCCGCCT CCATCATGCC GCTGGGCCTG CTGGCCGCAC CGCCGCCCGG GCTGGGCCGC 240
GTGCGCCTGG GCCCCGCGCC ATGCCGCGCC GCTCGCTTCC TCTCCGCCGC TCTGCTGCCG 300
GCCTGCACGC TCGGGGTGGC CGCACTTGGC CTGGCACGCT ACCGCCTCAT CGTGCACCCG 360
CTGCGGCCAG GCTCGCGGCC GCCGCCTGTG CTCGTGCTCA CCGCCGTGTG GGCCGCGGCG 420
GGACTGCTGG GCGCGCTCTC CCTGCTCGGC CCGCCGCCCG CACCGCCCCC TGCTCCTGCT 480
CGCTGCTCGG TCCTGGCTGG GGGCCTCGGG CCCTTCCGGC CGCTCTGGGC CCTGCTGGCC 540
TTCGCGCTGC CCGCCCTCCT GCTGCTCGGC GCCTACGGCG GCATCTTCGT GGTGGCGCGT 600
CGCGCTGCCC TGAGGCCCCC ACGGCCGGCG CGCGGGTCCC GACTCCGCTC GGACTCTCTG 660
GATAGCCGCC TTTCCATCTT GCCGCCGCTC CGGCCTCGCC TGCCCGGGGG CAAGGCGGCC 720
CTGGCCCCAG CGCTGGCCGT GGGCCAATTT GCAGCCTGCT GGCTGCCTTA TGGCTGCGCG 780
TGCCTGGCGC CCGCAGCGCG GGCCGCGGAA GCCGAAGCGG CTGTCACCTG GGTCGCCTAC 840
TCGGCCTTCG CGGCTCACCC CTTCCTGTAC GGGCTGCTGC AGCGCCCCGT GCGCTTGGCA 900
CTGGGCCGCC TCTCTCGCCG TGCACTGCCT GGACCTGTGC GGGCCTGCAC TCCGCAAGCC 960
TGGCACCCGC GGGCACTCTT GCAATGCCTC CAGAGACCCC CAGAGGGCCC TGCCGTAGGC 1020
CCTTCTGAGG CTCCAGAACA GACCCCCGAG TTGGCAGGAG GGCGGAGCCC CGCATACCAG 1080
GGGCCACCTG AGAGTTCTCT CTCCTGA 1107
INFORMATION FOR SEQ ID NO:6: SEQUENCE CHARACTERISTICS: LENGTH: 368 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: Met Ala Asn Ser Thr Gly Leu Asn Ala Ser Glu Val Ala Gly Ser Leu 1 5 10 Gly Leu Ile Leu Ala Ala Val Val Glu Val Gly Ala Leu Leu Gly Asn 25 Gly Ala Leu Leu Val Val Val Leu Arg Thr Pro Gly Leu Arg Asp Ala 40 Leu Tyr Leu Ala His Leu Cys Val Val Asp Leu Leu Ala Ala Ala Ser 55 Ile Met Pro Leu Gly Leu Leu Ala Ala Pro Pro Pro Gly Leu Gly Arg 65 70 75 Val Arg Leu Gly Pro Ala Pro Cys Arg Ala Ala Arg Phe Leu Ser Ala 90 Ala Leu Leu Pro Ala Cys Thr Leu Gly Val Ala Ala Leu Gly Leu Ala 100 105 110 Arg Tyr Arg Leu Ile Val His Pro Leu Arg Pro Gly Ser Arg Pro Pro 115 120 125 Pro Val Leu Val Leu Thr Ala Val Trp Ala Ala Ala Gly Leu Leu Gly 130 135 140 Ala Leu Ser Leu Leu Gly Pro Pro Pro Ala Pro Pro Pro Ala Pro Ala 145 150 155 160 Arg Cys Ser Val Leu Ala Gly Gly Leu Gly Pro Phe Arg Pro Leu Trp 165 170 175 Ala Leu Leu Ala Phe Ala Leu Pro Ala Leu Leu Leu Leu Gly Ala Tyr 180 185 190 Gly Gly Ile Phe Val Val Ala Arg Arg Ala Ala Leu Arg Pro Pro Arg 195 200 205 Pro Ala Arg Gly Ser Arg Leu Arg Ser Asp Ser Leu Asp Ser Arg Leu 210 215 220 Ser Ile Leu Pro Pro Leu Arg Pro Arg Leu Pro Gly Gly Lys Ala Ala 225 230 235 240 Leu Ala Pro Ala Leu Ala Val Gly Gin Phe Ala Ala Cys Trp Leu Pro WO 00/22131 WO 0022131PCTIUS99/24065 -9- 245 Cys 250 Ala Tyr Gly Cys Ala Ala Val 275 Leu Tyr Gly Ala 260 Thr Leu Ala Pro Arg Ala Ala 255 Glu Ala Glu 270 His Pro Phe Gly Arg Leu Trp, Val Ala Tyr 280 Pro Ala Phe Ala Al a 285 Leu Leu Leu Gln 290 Ser Arg Arg 295 Gly Val Arg Leu Arg Ala Leu Pro Val Arg 305 Trp Al a 315 Gln Thr Pro Gln Ala 320 His Pro Arg Leu Gln Cys Leu 330 Glu Arg Pro Pro Glu Gly 335 Pro Ala Val Gly Gly Arg 355 Gly 340 Ser Ser Glu Ala Pro 345 Gly Gln Thr Pro Glu Leu Ala 350 Ser Leu Ser Pro Ala Tyr Pro Pro Glu Ser 365 INFORMATION FOR SEQ ID NO:7: SEQUENCE CHARACTERISTICS: LENGTH: 1008 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID 
NO:7: ATGGAATCAT CTTTCTCATT TGGAGTGATC CTTGCTGTCC
ACTAACACAC
CTCTGCTTCA
CTACTCACAG
CGGATGGCAT
TTTGACAGGT
GTGGCCGGGG
CTCGGAATCC
TTTCACCCTC
TTTGTCTTCT
TAGTGGCTGT
CCTTGAATCT
ACCAGCTCTC
TTGTCACTTC
ACCTTGCCAT
CCTGCATTGC
CCATGTTCCA
ACTTCGTGCT
TCTACTGCGA
GGCTGTGCTG
GGCTGTGGCT
CAGCCCTTCT
CTCCGCAGCT
CAAGCAGCCC
CGGGCTGTGG
GCAGACTGCC
GACCCTCTCC
CATGCTCAAG
CTGTTGATCC
GACACCTTGA
CGGCCCACAC
GCCTCTGTCC
TTCCGCTACT
TTAGTGTCTT
TACAAAGGGC
TGCGTTGGCT
ATTGCCTCCA
TGGCCTCCCT
ACAAGAATGA
TTGGTGTGGC
AGAAGACCCT
TCACGGTCAT
TGAAGATCAT
ACCTCATTGG
AGTGCAGCTT
TCTTCCCAGC
TGCACAGCCA
CATCATTGCT
TGGTGTCAGT
CATCTCTGGC
GTGCAGCCTG
GCTGATCACC
GAGTGGGTTC
CTTCCTCCCA
CTTTGCTGTA
CATGCTCCTC
GCAGATTCGA
120 180 240 300 360 420 480 540 600 AAGATGGAAC ATGCAGGAGC CATGGCTGGA GGTTATCGAT TTCAAAGCTC TCCGTACTGT GTCTGTTCTC ATTGGGAGCT TTCCTTATCA CTGGCATTGT GCAGGTGGCC TGCCAGGAGT GAACGGTACC TGTGGCTGCT CGGCGTGGGC AACTCCCTGC TATTGGCAGA AGGAGGTGCG ACTGCAGCTC TACCACATGG CTCACCTCAT TCCTCCTCTT TCTCTCGGCC AGGAATTGTG AGTTCCTGTC ACATCGTCAC TATCTCCAGC TCAGAGTTTG INFORMATION FOR SEQ ID NO:8: SEQUENCE CHARACTERISTICS: LENGTH: 335 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein
CCCCACGGAC
TTGCTCTATC
GTCACCTCTA
TCAACCCACT
CCCTAGGAGT
GCCCAGAGAG
ATGGCTAA
TCCCAGCGAC
CTGGACCCCC
CCTAGTGCTG
CATCTATGCC
GAAGAAGGTG
GCCCAGGGAA
660 720 780 840 900 960 1008 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: Met Glu Ser Ser Phe Ser Phe Gly Val Ile Leu Ala Val Leu Ala Ser 1 Leu Ile Ile Ala Thr Asn Thr Leu Val 25 Leu Ala Val Ala Val Ile His Lys Val Ala Asp Asn Asp Gly Val Ser 40 Val Cys Phe Thr Leu Leu Leu Leu Leu Asn Leu Ala Leu Thr Asp Thr Leu Ile Gln Leu Gly 55 Arg Ala Ile Ser Gly Thr Ser Ser Pro Arg Ser 70 Thr Pro Thr Gin Lys 75 Al a Leu Cys Ser Leu Met Ala Phe Val Phe Ser Ser Ala Ser Val Leu Thr Val Met Leu Ile Tyr Leu Lys 115 Leu Trp Leu 130 Asp Arg Tyr Leu 105 Val Ile Lys Gin Met Ser Gly Ala Gly Ala Cys 125 Leu Pro Phe Arg 110 Ile Ala Gly Gly Ile Pro Val Ser Tyr Leu 135 Gly Phe Leu Pro 140 Met Phe Gln Gin Thr Ala Tyr Lys Gly Gin Cys Ser Phe Phe Ala Vai WO 00/22131 PCT/US99/24065 -11- 145 150 155 160 Phe His Pro His Phe Val Leu Thr Leu Ser Cys Val Gly Phe Phe Pro 165 170 175 Ala Met Leu Leu Phe Val Phe Phe Tyr Cys Asp Met Leu Lys Ile Ala 180 185 190 Ser Met His Ser Gin Gin Ile Arg Lys Met Glu His Ala Gly Ala Met 195 200 205 Ala Gly Gly Tyr Arg Ser Pro Arg Thr Pro Ser Asp Phe Lys Ala Leu 210 215 220 Arg Thr Val Ser Val Leu Ile Gly Ser Phe Ala Leu Ser Trp Thr Pro 225 230 235 240 Phe Leu Ile Thr Gly Ile Val Gin Val Ala Cys Gin Glu Cys His Leu 245 250 255 Tyr Leu Val Leu Glu Arg Tyr Leu Trp Leu Leu Gly Val Gly Asn Ser 260 265 270 Leu Leu Asn Pro Leu Ile Tyr Ala Tyr Trp Gin Lys Glu Val Arg Leu 275 280 285 Gin Leu Tyr His Met Ala Leu Gly Val Lys Lys Val Leu Thr Ser Phe 290 295 300 Leu Leu Phe Leu Ser Ala Arg Asn Cys Gly Pro Glu Arg Pro Arg Glu 305 310 315 320 Ser Ser Cys His Ile Val Thr Ile Ser Ser Ser Glu Phe Asp Gly 325 330 335 INFORMATION FOR SEQ ID NO:9: SEQUENCE CHARACTERISTICS: LENGTH: 1413 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: ATGGACACTA CCATGGAAGC TGACCTGGGT GCCACTGGCC ACAGGCCCCG CACAGAGCTT GATGATGAGG ACTCCTACCC CCAAGGTGGC TGGGACACGG TCTTCCTGGT GGCCCTGCTG 120 CTCCTTGGGC TGCCAGCCAA TGGGTTGATG 
GCGTGGCTGG CCGGCTCCCA GGCCCGGCAT 180 GGAGCTGGCA CGCGTCTGGC GCTGCTCCTG CTCAGCCTGG CCCTCTCTGA CTTCTTGTTC 240
CTGGCAGCAG
ACAGCTGCCT
CTGCTGGCCG
GGGCACCGCC
CTCTTCAGCG
ATCTGCCTGG
GGCTTCCTGC
CGCACCTGCC
ACCATTCTGT
CTGGCCTTCC
GACTACCTGA
GACCTCCGGA
CGGCCGGGCA
CTGCCAGAGC
AACCCCACAC
CAGCCACAGT
GATTCTGTGG
TCTGTGCCCA
GCCCTTGAGG
CGGCCTTCCA GATCCTAGAG ATCCGGCATG GGGGACACTG GCCGCTGGGG
GCCGCTTCTA
CCCTCAGCCT
CAGTCCGCCT
TGCCCTGGCT
ACTTCTGGGA
CTTTCCTCCT
ACCGCCAACA
CAGCCTATGT
TGTGGGACGT
TCCTACTCAA
CCCTGCTGCG
GCTTCACGCC
CGATGGCAGA
TCCAGCCACG
CGGATCCCAC
CCCAGCCACA
GTCCCTGTGA
ACCCAGCCAC
CTACTTCCTA
CGACCGCTGC
GCCCCTCTGG
GGTCTTCCCC
CAGCGAGGAG
GCTGCTCGTC
GCAGCCCGCA
GGTCCTGAGG
CTACTCTGGC
CAGCTGCCTC
CTCCGTGCTC
CACTGAGCCA
GGCCCAGTCA
ATCGGATCCC
AGCCCAGCCA
GGCAGACACT
TGAAGCTTCC
ACCTCCTGCC
TGGGGCGTGT
CTGCTGGCGC
GTCTGCGCCG
GAGGCTGCCG
CTGTCGCTGA
TGCCACGTGC
GCCTGCCGGG
CTGCCCTACC
TACCTGCTCT
AGCCCCTTCC
TCGTCCTTCG
CAGACCCAGC
CAGATGGATC
ACAGCTCAGC
CAGCTGAACC
AACGTCCAGA
CCAACCCCAT
TCTGAAGGAG
CCTACTCCTC
TGTGCCCACA
GTGTCTGGGT
TCTGGTGGTA
GGATGCTGGA
TCACCCAGGC
GCTTCGCCCG
AGCTGGCCCA
GGGAGGCCCT
TCTGCCTCAT
CGGCAGCTCT
TAGATTCTGA
CTGTGGCCCA
CACAGCTGAA
TCATGGCCCA
CCCCTGCACC
CCTCGCATCC
AAAGCCCCAG
CGGCCTCTTC
CTGGTACCCT
GCTGGCCACA
CGACCTGGTC
GGTCCTGGGG
CACAGCCTGT
TGTGGCCAGG
GCTGCTCTAC
GGTCTACTCC
GGCCAGTGCC
CTGCGAGGAG
GGGTCCAACT
GCCTCAGGTG
CCCTACGGCC
GCCACAGTCA
TGCTGCCAGT
TACCCCAGGG
CAGCACCCCG
300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1413 CCAGAGGCGG CCCCGGGCGC AGGCCCCACG TGA (11) INFORMATION FOR SEQ ID NO:10: SEQUENCE CHARACTERISTICS: LENGTH: 468 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1O: Met Asp Thr Thr Met Glu Ala Asp Leu Gly Ala Thr Gly His Arg Pro 1 5 10 PCT/US99/24065 WO 00/22131 -13- Arg Thr Leu Arg Leu Trp Val Arg Val 145 Leu Tyr Leu Leu Gin 225 Leu Leu Glu Ser Thr Glu Val Phe Met Ala Leu Ala Ala Ala Pro Leu Ser Tyr 115 Cys Leu 130 Arg Leu Phe Ser Asp Leu Arg Met 195 Val Cys 210 Gin Pro Ser Ala Tyr Leu Ala Leu 275 Pro Phe 290 Leu Asp Leu Val Trp Leu Leu Leu Ala Ala Gly Thr 100 Ser Ser Leu Ala Pro Leu Val Pro 165 Val Ile 180 Leu Glu His Val Ala Ala Tyr Val 245 Ala Phe 260 Val Tyr Leu Cys Asp Ala Ala Leu 70 Phe Ala Gly Leu Trp 150 Trp Cys Val Leu Cys 230 Val Leu Ser Leu Glu Leu Gly 55 Leu Gin Ala Leu Cys 135 Val Leu Leu Leu Thr 215 Arg Leu Trp Asp Met 295 Asp Leu 40 Ser Ser Ile Cys Phe 120 Pro Cys Val Asp Gly 200 Gin Gly Arg Asp Tyr 280 Ala Ser 25 Leu Gin Leu Leu Arg 105 Leu His Ala Phe Phe 185 Gly Ala Phe Leu Val 265 Leu Ser Tyr Leu Ala Ala Glu 90 Phe Leu Trp Gly Pro 170 Trp Phe Thr Ala Pro 250 Tyr Ile Ala Pro Gly Arg Leu 75 Ile Tyr Ala Tyr Val 155 Glu Asp Leu Arg Arg 235 Tyr Ser Leu Asp Gin Leu His Ser Arg Tyr Ala Pro 140 Trp Ala Ser Pro Thr 220 Val Gin Gly Leu Leu 300 Gly Pro Gly Asp His Phe Leu 125 Gly Val Ala Glu Phe 205 Cys Ala Leu Tyr Asn 285 Arg Gly Ala Ala Phe Gly Leu 110 Ser His Leu Val Glu 190 Leu His Arg Ala Leu 270 Ser Thr Trp Asn Gly Leu Gly Trp Leu Arg Ala Trp 175 Leu Leu Arg Thr Gin 255 Leu Cys Leu Asp Gly Thr Phe His Gly Asp Pro Thr 160 Trp Ser Leu Gin Ile 240 Leu Trp Leu Leu Arg Ser Val Leu Ser Ser Phe Ala Ala Ala Leu Cys Glu Glu Arg Pro WO 00/22131 PCT/US99/24065 -14- 305 Gly 315 Gin Ser Phe Thr Glu Pro Gin Leu Asp Ser 320 Glu Gly 335 Pro Thr Leu Val Ala Gin 355 Thr Ala Gin Pro 340 Pro Pro Met Ala Glu 345 Thr Gin Ser Gin Gin Val Asn Leu Gin Pro Arg 365 Gin Met Asp Pro 350 Ser Asp Pro Ser Asp Pro Pro Gin Leu Thr Ala Gin 370 Thr Ala Pro 380 Pro Gin Pro Gin 385 Val Leu 390 Ala Leu Met Ala Gin 395 Gin Gin Ser Asp Ser 400 Ala Gin Pro Gin 
405 Pro Asp Thr Asn Thr Pro Ala Pro Ala 415 Ala Ser Ser Ser His Pro 435 Val 420 Thr Ser Pro Cys Asp 425 Glu Ala Ser Pro Thr Pro Ser 430 Pro Pro Ala Pro Gly Ala Leu 440 Asp Pro Ala Thr 445 Ser Glu Gly Glu 450 Ala Gly Pro Thr 465 Ser Pro Ser Ser Thr Pro Pro Glu Ala Ala Pro Gly 455 460 (12) INFORMATION FOR SEQ ID NO:11: SEQUENCE CHARACTERISTICS: LENGTH: 1248 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: ATGTCAGGGA TGGAAAAACT TCAGAATGCT TCCTGGATCT ACCAGCAGAA ACTAGAAGAT CCATTCCAGA AACACCTGAA CAGCACCGAG GAGTATCTGG CCTTCCTCTG CGGACCTCGG CGCAGCCACT TCTTCCTCCC CGTGTCTGTG GTGTATGTGC CAATTTTTGT GGTGGGGGTC ATTGGCAATG TCCTGGTGTG CCTGGTGATT CTGCAGCACC AGGCTATGAA GACGCCCACC AACTACTACC TCTTCAGCCT GGCGGTCTCT GACCTCCTGG TCCTGCTCCT TGGAATGCCC 120 180 240 300 CTGGAGGTCT ATGAGATGTG GCGCAACTAC CCTTTCTTGT TCGGGCCCGT GGGCTGCTAC
TTCAAGACGG
AGCGTGGAGC
CGCCGGGCCC
AACACCAGCA
TCGGCCACCT
TCCTTCCTAT
CTCAGACTAA
CCCTGCAGAA
TGGGCCCCGT
CTGGCTGCTG
GCTGTCAACC
GTGATCTCTT
CAGCGGAACA
CAATTCCCAT
CCCTCTTTGA
GCTACGTGGC
TCAGGATCCT
TCCATGGCAT
GTACGGTCAT
TCTACCTCCT
AGAAAGACAA
AATCAGTCAA
TCCACATTGA
TGTTCAACCT
CCATTATCTA
CTTTCCACAA
TCTTCCTGAC
GTCAGTCATC
GACCGTGTGC
CATCCTACAC
CGGCATCGTC
CAAGTTCCAC
CAAGCCCATG
CCCCATGACT
ATCTCTTGAG
CAAGATGCTG
CCGACTCTTC
CGTCCATGTG
TAACCTACTG
ACAGTGGCAC
AGAATGCCAC
CATGCACAAC
TTCGCCTCCA
CCGTTCCGCG
TGGGGCTTCT
TACTTCCCCA
TGGATCTACA
GTCATCAGTG
GCAGATGAAG
TTTGTCTTGG
TTCAGCTTTG
GTGTCAGGTG
TCTCGCCGCT
TCCCAGCATG
TTTGTGGAGC
TCTCACCTCC
TCCTCAGCAT
CCAAACTGCA
CCGTGCTCTT
ATGGGTCCCT
ATTTCATCAT
TCCTCTACTA
GGAATGCAAA
TCTTAGTGTT
TGGAGGAGTG
TCTTCTTCTA
TCCAGGCAGC
ACCCACAGTT
TGACCGAAGA
CAACAGCCCT
CACCACCGTC
GAGCACCCGG
CTCCCTGCCC
GGTCCCAGGT
CCAGGTCACC
CCTCATGGCA
TATTCAAAGA
TGCTATCTGT
GAGTGAATCC
CCTGAGCTCA
ATTCCAGAAT
GCCACCTGCC
TATAGGTCCC
CTCTAGTGAA
420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1248 CAGATGTCAA GAACAAACTA TCAAAGCTTC CACTTTAACA AAACCTGA (13) INFORMATION FOR SEQ ID NO:12: (i) SEQUENCE CHARACTERISTICS: LENGTH: 415 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULhE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: Met Ser Gly Met Glu Lys Leu Gin Asn Ala Ser Trp Ile Tyr Gin Gin 1 5 10 Lys Leu Glu Asp Pro Phe Gin Lys His Leu Asn Ser Thr Giu Giu Tyr 25 Leu Ala Phe Leu Cys Gly Pro Arg Arg Ser His Phe Phe Leu Pro Val 40 Ser Val Val Tyr Val Pro Ile Phe Val Val Gly Val Ile Gly Asn Val WO 00/22131 WO 0022131PCT[US99/24065 16 Leu Asn Leu Leu Val Tyr 145 Arg Phe Pro Pro Tyr 225 Leu Asn Leu Leu Phe 305 Ala Val Tyr Gly Phe Cys 130 Val Arg Ser Asn Met 210 Leu Arg Ile Val Phe 290 Asn Val Cys Tyr Met Gly 115 Phe Ala Ala Leu Gly 195 Trp Leu Leu Gin Leu 275 Phe Leu Asn Leu Val Leu Phe Pro Leu 100 Pro Val Ala Ser Ile Leu Leu Arg 165 Pro Asn 180 Ser Leu Ile Tyr Pro Met Lys Lys 245 Arg Pro 260 Val Phe Ser Phe Vai His Pro Ile 325 55 Ile Leu 70 Ser Leu Giu Val Gly Cys Ile Leu 135 His Pro 150 Ile Leu Thr Ser Val Pro Asn Phe 215 Thr Val 230 Asp Lys Cys Arg Ala Ile Val Giu 295 Val Val 310 Ile Tyr Gin His Ala Val Tyr Giu 105 Tyr Phe 120 Ser Ile Phe Arg Gly Ile Ile His 185 Gly Ser 200 Ile Ile Ile Ser Ser Leu Lys Ser 265 Cys Trp 280 Giu Trp Ser Gly Asn Leu Gin Ala 75 Ser Asp 90 Met Trp Lys Thr Thr Thr Ala Lys 155 Val Trp 170 Gly Ile Ala Thr Gin Val Val Leu 235 Giu Ala 250 Val Asn Ala Pro Ser Giu Val Phe 315 Leu Ser 330 Met Lys Leu Leu Arg Asn Ala Leu 125 Val Ser 140 Leu Gin Gly Phe Lys Phe Cys Thr 205 Thr Ser 220 Tyr Tyr Asp Glu Lys Met Phe His 285 Ser Leu 300 Phe Tyr Arg Arg Thr Val Tyr 110 Phe Val Ser Ser His 190 Val Phe Leu Gly Leu 270 Ile Al a Leu Phe Pro Leu Pro Glu Giu Thr Val 175 Tyr Ile Leu Met Asn 255 Phe Asp Al a Ser Gin 335 Thr Leu Phe Thr Arg Arg 160 Leu Phe Lys Phe Al a 240 Ala Val1 Arg Val Ser 320 Ala Ala Phe Gin Asn Val Ile Ser Ser Phe His Lys Gin Trp His Ser Gin 340 345 350 WO 00/22131 WO 0022131PCT[US99/24065 -17- His Asp Pro Gin Leu Pro Pro Ala Gin Arg Asn Ile Phe Leu Thr Giu 355 360 365 Cys His Phe Val Glu Leu Thr Giu Asp Ile Gly Pro Gin Phe Pro Cys 370 375 380 Gin Ser Ser Met His Asn Ser His 
Leu Pro Thr Ala Leu Ser Ser Glu 385 390 395 400 Gln Met Ser Arg Thr Asn Tyr Gln Ser Phe His Phe Asn Lys Thr 405 410 415 (14) INFORMATION FOR SEQ ID NO:13: SEQUENCE CHARACTERISTICS: LENGTH: 1173 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: ATGCCAGATA CTAATAGCAC AATCAATTTA TCACTAAGCA CTCGTGTTAC TTTAGCATTT
TTTATGTCCT
GTGGTGGACA
GACTTCTTTG
GATTTTGGAA
TCTGTATATA
TCTTATAGAA
GTGCTGGCCT
GGTAGTGAAT
TTGGAATTCG
CTGTGGAAGC
TCCAACATCT
TCGACAGAAG
TTTTCCTCAA
CAATCAGATT
TAGTAGCTTT
AAAACCTTAG
TGGGTGTGAT
AGGAAATCTG
ACATTGTCCT
CTCAACATAC
TCTTAGTGAA
GTGAACCTGG
TGATCCCAGT
GTGATCATCT
GTGGACACTC
TTCCTGCATC
GAACCAAGAT
CTGTAGCTCT
TGCTATAATG
ACATCGAAGT
CTCCATTCCT
TGTATTTTGG
CATCAGCTAT
TGGGGTCTTG
TGGGCCAATG
ATTTTTTTCG
CATCTTAGTC
CAGTAGGTGC
ATTCAGAGGT
CTTTCATTCA
GAATAGCAAT
TCACCAAAGG
CTAGGAAATG
AGTTATTTTT
TTGTACATCC
CTCACTACTG
GATCGATACC
AAGATTGTTA
ATTCTAGTTT
GAATGGTACA
GCTTATTTCA
CAAAGCCATC
AGACTATCTT
GAGAGACAGA
ACAATTGCTT
GAACATGTTG
CTTTGGTCAT
TTCTTAACTT
CTCACACGCT
ACTATCTGTT
TGTCAGTCTC
CTCTGATGGT
CAGAGTCTTG
TCCTTGCCAT
ACATGAATAT
CTGGACTGAC
CAAGGAGATC
GGAGAAAGAG
CCAAAATGGG
AACTGCTTAG
TTTAGCTTTT
GGCCATCTCT
GTTCGAATGG
ATGTACAGCA
AAATGCTGTG
GGCCGTTTGG
GAAGGATGAA
CACATCATTC
TTATTGGAGC
TGCTGTCTCT
TCTTTCTGCA
TAGTCTCATG
TTCCTTCTCC
AGCCAGGAGA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 TTAGCCAAGT CACTGGCCAT TCTCTTAGGG GTTTTTGCTG TTTGCTGGGC TCCATATTCT CTGTTCACAA TTGTCCTTTC ATTTTATTCC TCAGCAACAG GTCCTAAATC AGTTTGGTAT AGAATTGCAT TTTGGCTTCA GTGGTTCAAT TCCTTTGTCA ATCCTCTTTT GTATCCATTG TGTCACAAGC GCTTTCAAAA GGCTTTCTTG AAAATATTTT GTATAAAAAA GCAACCTCTA CCATCACAAC ACAGTCGGTC AGTATCTTCT TAA INFORMATION FOR SEQ ID NO:14: SEQUENCE CHARACTERISTICS: LENGTH: 390 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein 960 1020 1080 1140 1173 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:i4: Met Pro Asp Thr Asn Ser Thr Ile Asn Len Ser Len Ser Thr 1 Thr 5 Phe 10 Ala Arg Val Len Ala Phe Met Ser Leu Val 25 Val Phe Ala Ile Asn Ala Leu Arg Ser Ser Val Ile Len Ala Phe 40 Asn Val Asp Lys Asn Asp Met Len Gly Len Arg His Phe Phe Val Tyr Phe Phe Gly Val Leu Len Leu Ala Ile Ile Ser Ile Tyr Ile Pro Asp His 75 Len Leu Phe Gin Trp Phe Gly Lys Gin Ser Cys Val Phe Trp 90 Thr Thr Asp Tyr Leu Len Cys Thr Tyr Len Ser 115 Val Leu Lys Ala 100 Val Val Tyr Asn Ile 105 Ser Val Len Ile Ser Ser Asn Ala Tyr Arg Thr Gin 125 Val Tyr Asp Arg 110 His Thr Gly Len Ala Phe Ile Val Thr Val Ala Val 130 Val Trp 140 Ser Len 145 Asn Gly Pro Met 150 Leu Val Ser Gin 155 Trp Lys Asp Gin 160 Giy Ser Gin Cys Gin Pro Giy Phe Phe Ser Gin Trp Tyr Ile Len Ala 165 170 17S WO 0022131PCTIUS99/24065 WO 00/22131 19 Ile Phe Arg Giy 225 Ser Ser Al a Gin Leu 305 Leu Ser Val Phe Ser 385 Thr Asn Cys 210 His Thr Ser Ser Arg 290 Aia Phe Val Asn Leu 370 Arg Ser Met 195 Gin Ser Giu Leu Lys 275 Glu Ile Thr Trp Pro 355 Lys Ser Phe 180 Asn Ser Phe Val Met 260 Met His Leu Ile Tyr 340 Leu Ile Val Leu Ile His Arg Pro 245 Phe Gly Val Leu Val 325 Arg Leu Phe Ser Giu Phe Tyr Trp Pro Giy 215 Giy Arg 230 Ala Ser Ser Ser Ser Phe Giu Leu 295 Giy Val 310 Leu Ser Ile Aia Tyr Pro Cys Ile 375 Ser 390 Vai Ser 200 Leu Leu Phe Arg Ser 280 Leu Phe Phe Phe Leu 360 Ile 185 Leu Thr Ser His Thr 265 Gin Arg Ala Tyr Trp 345 Cys Pro Trp Al a Ser Ser 250 Lys Ser Al a Val Ser 330 Leu His Val Lys Val Arg 235 Giu Met Asp Arg Cys 315 Ser Gin Lys Ile Arg Ser 220 Arg Arg Asn Ser Arg 300 Trp Ala Trp Arg Leu Val 190 Asp His 205 Ser Asn Ser Leu Gin Arg Ser Asn 270 Val Ala 285 Leu Ala Ala Pro Thr Gly Phe Asn 350 Phe Gin 365 Ala Leu Ile Ser Arg 255 Thr Leu Lys Tyr Pro 335 Ser Lys Tyr Ser Cys Ala 240 Lys Ile His Ser Ser 320 Lys Phe Al a Lys Lys Gin Pro Leu Pro Ser Gin His 380 (16) INFORMATION FOR SEQ ID Wi SEQUENCE CHARACTERISTICS: LENGTH: 
30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: GGAAAGCTTA ACGATCCCCA GGAGCAACAT (17) INFORMATION FOR SEQ ID NO:16: SEQUENCE CHARACTERISTICS: LENGTH: 31 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: CTGGGATCCT ACGAGAGCAT TTTTCACACA G 31 (18) INFORMATION FOR SEQ ID NO:17: SEQUENCE CHARACTERISTICS: LENGTH: 1128 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: ATGGCGAACG CGAGCGAGCC GGGTGGCAGC GGCGGCGGCG AGGCGGCCGC CCTGGGCCTC
AAGCTGGCCA
CTGCTGATCG
TGCCTGGCCG
CGTGCGGCGG
CTGGCCGCGC
TACCTGGCCA
GCCATGCTGG
GACGGCGGTG
CCCGGCGCGC
TACCTCCGCC
CGCTCAGCCT
TGCGGGAGCG
ACGGGCTGCG
CCGCGGCGGG
TCTTCTGCTT
TCGCGCACCA
TGTGCGCCGC
GCGACGACGA
TGGGCTTCCT
TGCTCTTCTT
GCTGCTGTGC
CAGCCTGCAC
CGCGCTCGCC
GGCGCCGCCG
CCACGCCGCC
CCGCTTCTAT
CTGGGCGCTG
GGACGCGCCG
GCTGCTGCTG
CATCCACGAC
GTGAGCCTAG
CGCGCCCCGT
TGCCTCCCGG
GGCGCGCTGG
TTCCTGCTGC
GCAGAGCGCC
GCGCTGGCCG
TGCGCCCTGG
GCCGTGGTGG
CGCCGCAAGA
CGGGCAACGT
ACTACCTGCT
CCGTCATGCT
GCTGCAAGCT
TGGGCGTGGG
TGGCCGGCTG
CGGCCTTCCC
AGCAGCGGCC
TGGGCGCCAC
TGCGGCCCGC
GCTGTTCGCG
GCTCGACCTG
GGCGGCGCGG
GCTCGCCTTC
CGTCACCCGC
GCCGTGCGCC
GCCAGTGCTG
CGACGGCGCC
GCACCTCGTC
GCGCCTGGTG
120 180 240 300 360 420 480 540 600 660 WO 00/22131 WO 00/213 1PCT/US99/24065 -21- CCCGCCGTC.A GCCACGACTG GACCTTCCAC GGCCCGGGCG AACTGGACGG CGGGCTTCGG CCGCGGGCCC ACGCCGCCCG GCAGGGCCGG GCCGCGGCGC GCGCCGCCTC CTCGTGCTGG AGGCTGTGCA AGATGTTCTA CGCCGTCACG CTGCTCTTCC GTCGTGGCC.A GCTACCTGCG GGTCCTGGTG CGGCCCGGCG ACGGCCTCCG TGTGGCTGAC CTTCGCGCAG GCCGGCATCA TTCAACAGGG AGCTGAGGGA CTGCTTCAGG GCCCAGTTCC ACCACCCAGG CGACCCATCC CTGCGACCTG AAAGGCATTG (19) INFORMATION FOR SEQ ID NO:18: SEQUENCE CHARACTERISTICS: LENGTH: 375 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein
CCACCGGCCA
CGCTTGTGGG
AAGAATTCAA
TGCTCCTCTG
CCGTCCCCCA
ACCCCGTCGT
CCTGCTGCCA
GTTTATGA
GGCGGCCGCC
CATCCGGCCC
GACGGAGAAG
GGGGCCCTAC
GGCCTACCTG
GTGCTTCCTC
GAGCCCCCGG
720 780 840 900 960 1020 1080 1128 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: Met Ala Asn Ala Ser Glu Pro Gly Gly Ser Gly Gly Gly Glu Ser Ala Ala Ala Leu Gly Leu Ala Gly Leu His Arg Leu Asn Lys Leu Ala Thr Leu 25 Leu Leu Leu Leu Cys Val Ser Glu Arg Ser Val Leu Phe Al a 40 Leu Leu Ile Val Arg Ala Pro Tyr Gly Leu Tyr 55 Cys Leu Leu Asp Leu Met Cys Leu Ala Asp Arg Ala Leu Arg Ala 70 Ala Leu Pro Ala Val Gly Leu Ala Ala Ala Ala Ala Gly Ala Pro Pro Cys Ala Leu Gly Cys Lys Leu Leu Ala Leu Leu Gly 115 Phe 100 Val Ala Ala Leu Phe His Ala Gly Val Thr Arg 120 Leu Ala Ile Ala 125 Al a Ala Phe Leu 110 His His Arg Met Leu Val Phe Tyr Ala Glu Arg Leu Ala Gly Trp Pro Cys Ala 130 135 140 WO 00/22131 WO 0022131PCTIUS99/24065 -22 Cys 145 Asp Pro Val His His 225 Asn Gly Leu Val Tyr 305 Thr Val Phe Asp Al a Gly Asp Val Asp 210 Asp Trp Ile Glu Thr 290 Leu Ala Cys Pro Leu 370 Trp Gly Ala 180 Ala Arg Thr Ala Pro 260 Phe Leu Val Val Leu 340 Cys Gly Ala Asp 165 Pro Thr Lys Phe Gly 245 Ala Lys Phe Leu Trp 325 Phe Gin Ile Leu 150 Asp Gly His Met His 230 Phe Gly Thr Leu Val 310 Leu Asn Ser Gly Ala Leu Ala Ala Ala Phe Pro Pro Val Leu 155 160 Glu Ala Leu Arg 215 Gly Gly Pro Glu Leu 295 Arg Thr Arg Pro Leu 375 Ala Gly 185 Tyr Ala Gly Gly Arg 265 Arg Trp Gly Ala Leu 345 Thr Cys Leu Arg Leu Thr 235 Thr Ala Cys Pro Val 315 Al a Asp Gin Al a Leu Leu Val 220 Gly Pro Arg Lys Tyr 300 Pro Gly Cys Ala Leu Leu Leu 205 Pro Gin Pro Arg Met 285 Val Gin Ile Phe Thr 365 Gin 175 Ala Phe Val Ala Leu 255 Leu Tyr Ala Tyr Pro 335 Ala Pro Arg Val Ile Ser Al a 240 Val Val Aila Ser Leu 320 Val Gin Cys INFORMATION FOR SEQ ID NO:19: SEQUENCE CHARACTERISTICS: LENGTH: 1002 base pairs TYPE: nucieic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) WO 00/22131 WO 0022131PCT/US99/24065 -23 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: ATGAACACCA CAGTGATGCA AGGCTTCAAC AGATCTGAGC GGTGCCCCAG AGACACTCGG ATAGTACAGC TGGTATTCCC AATACTTTGG CTCTGTGGGT CTCAAAAACA CTTTGGTGGC TCTGACTCAC ACCTGGCACC ATATTTTATG 
AGACCATGTA TTCCTCAAGA TCATCAGACC ACGGTCTCAA TCTTCATCTG AGCAACAAGG AAGCAACACC GGGCTGAAAT GGCATCAAAT ATCCTAATGC TTGTGTTTTA TCCAAAAGTA AGGACAGAAA
GCTGTCTTCT TTGTGTGTTT CAAACCAACA ATAAGACTGA ACTCTCTTTT TGGCAGCAAC AAAAAATTCA CAGAAAAGCT
AGCCCTCTAC
GTTTGTTCAC
CGACTTGATA
CTGGCAGCTC
TGTGGGCATC
TTTGAGAAAT
GTTCTTTTTG
ATCGTCTGTG
GGTAAATAAC
TGTGGTTATT
AAACAACAAA
TGCTCCATTT
CTGTAGACTG
TAACATTTGT
ACCATGTATG
ACAGTGGTTT
AT CCCCAG CT
ATGACACTCA
AGAGCTTTTG
GTGCTGTTAG
ATTTTTCTAA
TTCTTCATCT
AAAAAGTGTG
ATATGCCAGT
GCAAAAAAAG
AAGCTGGAAG
CATTTTGCCA
CAAAATCAAC
ATGGATCCCT
CAAGGGAGAA
TCTTGACCGG
CCTCCACCTT
TGCTTCCTTT
TGTGTCGTTT
GGCTCATAGC
AAAAACCTGT
CCCTGCCAAA
CTTCCTTAAA
TTATTTTCTG
TATATGATTC
GCAAAGTATT
GAGTTCCATA
TGTTTATTGC
TAATATACAT
AGACCACAGC
CATCCTGCTG
CATCATCTAC
CAAAATCCTC
TTCTTCGGTG
CTTTGACAGA
TTTTGCAAAA
TACGATCTTG
GGGGCCTCTG
GACTGTTTTT
TTATAGAAAG
TGTTGTCGTG
TACTCACAGT
TAAAGAAACA
ATTCTTATGT
ATCAAGCCAA
120 240 300 360 420 480 540 600 660 720 780 840 900 960 1002 GAAAATCATA GCAGTCAGAC AGACAACATA ACCTTAGGCT GA (21) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS:, LENGTH: 333 amino acids TYPE: amino acid STRAN1fEDNESS: TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Met Asn Thr Thr Val Met Gin Gly Phe Asn Arg Ser Glu Arg Cys Pro 1 5 10 Arg Asp Thr Arg Ile Val Gin Leu Val Phe Pro Ala Leu Tyr Thr Val 25 WO 0022131PCT/US99/24065 WO 00/22131 -24 Val Phe Leu Thr Gly Ile Leu Leu Asn Thr Leu Ala Leu Trp Val Phe Val His Leu Val Ser Asp Phe Ser Leu Gly Arg Asn 130 Phe Ile 145 Ser Asn Lys Gly Gin Phe Val Ile 210 Asp Arg 225 Ala Val Tyr Thr Gin Leu Ile Cys 290 Giu Lys Ile Pro Ala Asp Ser His Ser Vai 100 Leu Ile 115 Ile Phe Trp Phe Lys Giu Pro Leu 180 Ile Phe 195 Ala Lys Lys Asn Phe Phe His Ser 260 Phe Ile 275 Met Asp Leu Pro Ser Leu Leu Ile Ala Leu Phe Al a 165 Gly Trp Lys Asn Vai 245 Gin Ala Pro Cys Ser Ile 70 Ala Phe Phe Lys Leu 150 Thr Leu Thr Val Lys 230 Cys Thr Lys Leu Met 310 40 Ser Thr 55 Met Thr Pro Trp Tyr Glu Asp Arg 120 Lys Pro 135 Phe Phe Pro Ser Lys Trp Val Phe 200 Tyr Asp 215 Lys Leu Phe Ala Asn Asn Giu Thr 280 Ile Tyr 295 Gin Gly Phe Leu Gin Thr 105 Phe Val Ile Ser His 185 Ile Ser Giu Pro Lys 265 Thr Ile Arg Ile Met Leu 90 Met Leu Phe Se r Val 170 Gin Leu Tyr Gly Phe 250 Thr Leu Phe Lys Ile Leu 75 Arg Tyr Lys Al a Leu 155 Lys Met Met Arg Lys 235 His Asp Phe Leu Thr Tyr Pro Al a Val Ile Lys 140 Pro Lys Val Leu Lys 220 Val Phe Cys Leu Cys 300 Thr Leu Lys Phe Lys Phe Val Gly Ile 110 Ile Arg 125 Thr Val Asn Thr Cys Ala Asn Asn 190 Val Phe 205 Ser Lys Phe Val Ala Arg Arg Leu 270 Ala Ala 285 Lys Lys Ala Ser Asn Ile Cys Val Pro Ser Ile Ser 175 Ile Tyr Ser Val Val 255 Gin Thr Phe Ser Thr Leu Arg Leu Leu Ile Leu 160 Leu Cys Val Lys Vai 240 Pro Asn Asn Thr Gin 320 Giu Asn His Ser Ser Gin Thr Asp Asn Ile Thr Leu Gly WO 00/22131 PCTIUS99/24065 (22) INFORMATION FOR SEQ ID NO:21: SEQUENCE CHARACTERISTICS: LENGTH: 1122 base pairs TYPE: 
nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: ATGGCCAACA CTACCGGAGA GCCTGAGGAG GTGAGCGGCG CTCTGTCCCC ACCGTCCGCA TCAGCTTATG TGAAGCTGGT GCCATCTTGT CCCTGCTGGT CTGCTGGACC TGTGCCTGGC GCTTCTGTGC GCCACGGCTC TTTATGGCCG TGCTCTTTTG CGCTACATGG CCATCGCCCA GCGGCTGTCA TCTGCATGGC GACGTGGGCA CCTACAAGTT TTCAAGGCCA ATGACACGCT CATGCTGTCT ACGGCAAGCT CAGATGGTGC CAGCCATCAG GCTGCTGCCA ACTGGATCGC ATCCGGCAGA ATGGGCATGC GAAAAGCAGC TGGGCCGCAT CCCTACATCG TGGCCTGCTA TACCTGGCCA CTGCTGTTTG TTCCTGCTCA ACAAGGACCT
ACTGCTGGGA
GCTCAAGGAG
CGATGGCATA
TTCATGGACC
CTTCCATGCG
CCACCGCTTC
CTGGACCCTG
TATTCGGGAG
GGGCTTCATG
GCTCCTCTTC
CCAGAACTGG
CGGCTTTGGC
AGCCAGCCGG
GTTCTACGCG
CTGGCGAGTG
GATGAGCTTC
CAAGAAGTGC
CTGATTATGT
CGTGCCCTGC
CGCTCTGCCG
TTCAGTGCAC
GCCTTCATGC
TACGCCAAGC
TCTGTGGCCA
GAGGACCAGT
CTTATGTTGG
GAGTATCGTC
ACATTCCATG
CGTGGGCCCA
CGGCTACTGG
ATCACACTGC
TTTGTGAAAG
GCCCAGGCTG
CTGACCACTC
GCGTGAGCCT
ACAAGGCTCC
TCTGCTTCCC
TCAGCTGCAA
TGTTCTGCAT
GCATGACACT
TGGCCTTCCC
GCATCTTTGA
CTGTGCTCAT
ACCGCAAGAT
GTCCCGGGGC
TGCCACCAAC
GCATGGACGA
TCTTTCTGCT
CCTGTGCTGT
CCGTCAACCC
ACGCCCCCTG
GGCGGGTAAC
TTACTACTTC
CTTTGTGCTG
GATTGTGGCC
CAGCGTCACC
CTGGACATGC
ACCTGTCTTT
GCATCGCTAC
GGCAGCTACC
GAAGCCAGTG
CACCGGCCAG
CCTGCTGGGT
GGTCAAGGGT
CCTCTGGTCA
GCCCCACCGC
AATTGTCTGC
CTGGGGCACA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1122 GGAGGTGCCC CGGCTCCCAG AGAACCCTAC TGTGTCATGT GA (23) INFORMATION FOR SEQ ID NO:22: (i) SEQUENCE CHARACTERISTICS: LENGTH: 373 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: Met Ala Asn Thr Thr Gly Glu Pro Glu Glu Val Ser Gly Ala Leu Ser Pro Met Lys Cys Ala Lys Met Arg Cys 145 Asp Glu Leu Leu Pro Cys Glu Leu Ser Ile Leu Phe 130 Met Val His Ala Phe 210 Ser Val Arg Ala Val Val Phe 115 Tyr Ala Gly Arg Val 195 Glu Ala Ser Ala Asp Arg Ala 100 Cys Ala Trp Thr Tyr 180 Leu Tyr Ser Leu Leu Gly His Phe Ile Lys Thr Tyr 165 Phe Met Arg Ala Al a His Ile 70 Gly Met Ser Arg Leu 150 Lys Lys Ala His Tyr Gly Lys 55 Arg Ser Al a Val Met 135 Ser Phe Al a Al a Arg 215 Val Asn 40 Al a Ser Ser Val Thr 120 Thr Val Ile Asn Thr 200 Lys Lys 25 Ala Pro Al a Trp Leu 105 Arg Leu Ala Arg Asp 185 His Met Leu Ile Tyr Val Thr 90 Phe Tyr Trp Met Glu 170 Thr Al a Lys Val Leu Tyr Cys 75 Phe Cys Met Thr Al a 155 Glu Leu Val Pro Leu Ser Phe Phe Ser Phe Al a Cys 140 Phe Asp Gly Tyr Val 220 Leu Leu Leu Pro Ala His Ile 125 Al a Pro Gln Phe Gly 205 Gln Gly Leu Leu Phe Leu Al a 110 Ala Ala Pro Cys Met 190 Lys Met Leu Val Asp Val Ser Ala His Val Val Ile 175 Leu Leu Val Ile Leu Leu Leu Cys Phe His Ile Phe 160 Phe Met Leu Pro Ala Ile Ser Gin Asn Trp Thr Phe His Gly Pro Gly Ala Thr Gly Gln 230 235 240 WO 00/22131 PCT/US99/24065 -27- Ala Ala Ala Asn Ile Ala Gly Phe Gly 250 His Arg Gly Pro Met Pro Pro 255 Thr Leu Leu Leu Gly Met 275 Tyr Ala Ile 290 Ala Cys Tyr Arg Gin Asn Gly 265 Glu Ala Ala Ser Glu Val Lys Gly 280 Leu Lys Gin Leu Gly 285 Pro Arg Arg Leu 270 Arg Met Phe Tyr Ile Val Thr Leu Leu Trp Arg Val 310 Thr Ala Val Leu Leu Trp Ser 300 Ala Val Lys Ala 305 Tyr Cys 315 Ala Val Pro His Leu Ala Trp Met Ser Phe 330 Asp Gin Ala Ala Val Asn 335 Pro Ile Val Leu Leu Asn Lys 345 Gly Leu Lys Lys Cys Leu Thr 350 Pro Arg Glu Thr His Ala Pro Cys 355 Pro Tyr Cys Val Met 370 Trp Gly Thr 360 Gly Ala Pro (24) INFORMATION FOR SEQ ID NO:23: SEQUENCE CHARACTERISTICS: LENGTH: 1053 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE 
DESCRIPTION: SEQ ID NO:23: ATGGCTTTGG AACAGAACCA GTCAACAGAT TATTATTATG AGGAAAATGA ACTTATGACT ACAGTCAATA TGAATTGATC TGTATCAAAG AAGATGTCAG AAAGTTTTCC TCCCTGTATT CCTCACAATA GCTTTCGTCA TTGGACTTGC ATGGTAGTGG CAATTTATGC CTATTACAAG AAACAGAGAA CCAAAACAGA CTGAATTTGG CTGTAGCAGA TTTACTCCTT CTATTCACTC TGCCTTTTTG GCAGTTCATG GGTGGGTTTT AGGGAAAATA ATGTGCAAAA TAACTTCAGC CTAAACTTTG TCTCTGGAAT GCAGTTTCTG GCTTGCATCA GCATAGACAG GTAACTAATG TCCCCAGCCA ATCAGGAGTG GGAAAACCAT GCTGGATCAT
AATGAATGGC
AGAATTTGCA
AGGCAATTCC
TGTGTACATC
GGCTGTTAAT
CTTGTACACA
ATATGTGGCA
CTGTTTCTGT
120 180 240 300 360 420 480
GTCTGGATGG
AATGCTAGGT
CAAATGCTAG
TTTATCACGG
GTTCTGCTCA
TTCTGCCGAG
ATGGACATCG
ATCCTTTATG
TATGGGTCCT
CTGCCATCTT
GCATTCCCAT
AGATCTGCAT
CAAGGACACT
CAGTCGTTAT
CCATAGACAT
CCATCCAAGT
TTTTTATGGG
GGAGAAGACA
GCTGAGCATA CCCCAGCTGG TTTCCCCCGC TACCTAGGAA TGGATTTGTA GTACCCTTTC CATGAAGATG CCAAACATTA AGTTTTCATT GTCACTCAAC CATCTACTCC CTGATCACCA CACAGAAAGC ATTGCACTCT AGCATCTTTC AAAAACTACG GAGACAAAGT GTGGAGGAGT
TTTTTTATAC
CATCAATGAA
TTATTATGGG
AAATATCTCG
TGCCTTATAA
GCTGCAACAT
TTCACAGCTG
TTATGAAAGT
TTCCTTTTGA
AGTAAATGAC
AGCATTGATT
GGTGTGCTAC
ACCCCTAAAA
CATTGTCAAG
GAGCAAACGC
CCTCAACCCA
GGCCAAGAAA
TTCTGAGGGT
540 600 660 720 780 840 900 960 1020 1053 CCTACAGAGC CAACCAGTAC TTTTAGCATT TAA INFORMATION FOR SEQ ID NO:24: SEQUENCE CHARACTERISTICS: LENGTH: 350 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: Met Ala Leu Glu Gin Asn Gin Ser Thr Asp Tyr Tyr Tyr Glu 1 Glu Met Asn Gly 5 Thr Arg Tyr Asp Tyr Ser 25 Glu Phe Ala Lys Ile Glv Leu Ala Giu Asn Gin Tyr Giu Leu Lys Glu Asp Thr Ile Ala Val Phe Val Val Phe Leu Gly Asn Ser Thr Lys Thr Ile Cys Ile Vai Phe Leu Val Val Ala Ile Tyr 55 Lys Ala Tyr Tyr Leu Lys 70 Al a Gin Arg Asn Leu Ala Val Asp Val Tyr Asp Leu Leu Leu 90 Val 75 Leu Ile Phe Thr Leu Pro Phe Met Cys Trp Ala Val Asn Ala Val His Gly Trp 100 105 Leu Gly Lys Ile 110 Lys Ile Thr Ser Aia Leu Tyr Thr Leu Asn Phe Val Ser Gly Met Gin WO 00/22131 PCT/US99/24065 -29- 115 120 125 Phe Leu Ala Cys Ile Ser Ile Asp Arg Tyr Val Ala Val Thr Asn Val 130 135 140 Pro Ser Gin Ser Gly Val Gly Lys Pro Cys Trp Ile Ile Cys Phe Cys 145 150 155 160 Val Trp Met Ala Ala Ile Leu Leu Ser Ile Pro Gin Leu Val Phe Tyr 165 170 175 Thr Val Asn Asp Asn Ala Arg Cys Ile Pro Ile Phe Pro Arg Tyr Leu 180 185 190 Gly Thr Ser Met Lys Ala Leu Ile Gin Met Leu Glu Ile Cys Ile Gly 195 200 205 Phe Val Val Pro Phe Leu Ile Met Gly Val Cys Tyr Phe Ile Thr Ala 210 215 220 Arg Thr Leu Met Lys Met Pro Asn Ile Lys Ile Ser Arg Pro Leu Lys 225 230 235 240 Val Leu Leu Thr Val Val Ile Val Phe Ile Val Thr Gin Leu Pro Tyr 245 250 255 Asn Ile Val Lys Phe Cys Arg Ala Ile Asp Ile Ile Tyr Ser Leu Ile 260 265 270 Thr Ser Cys Asn Met Ser Lys Arg Met Asp Ile Ala Ile Gin Val Thr 275 280 285 Glu Ser Ile Ala Leu Phe His Ser Cys Leu Asn Pro Ile Leu Tyr Val 290 295 300 Phe Met Gly Ala Ser Phe Lys Asn Tyr Val Met Lys Val Ala Lys Lys 305 310 315 320 Tyr Gly Ser Trp Arg Arg Gin Arg Gin Ser Val Glu Glu Phe Pro Phe 325 330 335 Asp Ser Glu Gly Pro Thr Glu Pro Thr Ser Thr Phe Ser Ile 340 345 350 (26) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 1116 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) WO 00/22131 WO 0022131PCT/US99/24065 30 (xi) SEQUENCE DESCRIPTION: SEQ ID ATGCCAGGAA 
ACGCCACCCC AGTGACCACC ACTGCCCCGT GGGCCTCCCT GGGCCTCTCC
GCCAAGACCT
AGCGCGGTGT
CAGGTACTGC
CTGTACACAG
CTAGGCCTGC
ATCCTCTTCC
AGTCGGGGCC
GTCGGGATCG
CTGCAGATGG
ATCCCTCTCT
ATGGGCTTAA
ATCTTCCTAG
TCCTACTACA
TCTGTGGTGT
CTGGCCACGG
TCCATGAAGA
CCCGTGGCCC
TGCCCTGCAA
GCAACAACGT
GCACGCTGGG
AGGGCAACGT
GCACGCTGCC
TGGCCTCGAA
TGTGCTGCAT
GCCGCCGCCG
TTCACTACCC
ACAGCAGGAT
CCATCATCGC
GCGCTGCCCA
TCTGCTTCGC
GAGGAGACAG
TTCTGTGCCT
ACCATTCCCG
CAGACGTCAC
TTGCAGACCA
GTCCTTCGAA
GGTGCCGGCC
GCTGGCCGTC
ACTCTGGGTC
GGTGACCGCC
CTCCTGCGAC
GAGGACCGCC
GGTGTTCCAG
TGCCGGGTAC
CTTCACCAAC
GAAGGCCAAG
CCCGTACCAC
GAACGCCATG
GTCCACGGTG
CCAAGAAGTG
CAGGCTCACC
CTACACCTTC
GAGAGCAGGA
AACTGCCTGA
TACCTGCTCT
ATCTATATCC
TACATCTTCT
CGCTTCGTGG
ATCCTCATCT
ACGGAAGACA
TACTACGCCA
CACCGGATTT
GTGAAGCACT
CTGGTTCTCC
TGCGGCTTGG
AACGGCGTGG
TCCAGAATCC
CACAGCAGGG
TCCAGGCCCG
TAGTCCTGGT
CTGCGTGGCT
GCCTGGCACT
GCAACCAGCA
TCTGCAACAT
CCGTGGTGTA
CCGCCTGCAT
AGGAGACCTG
GGTTCACCGT
TCAGGAGCAT
CGGCCATCGC
TCGTCAAAGC
AGGAAAGGCT
CTGACCCCAT
ATAAGGGGTG
ACACCGAGGA
TGCACCCACC
CGTGGTGTAC
GGCGCTGCTG
CTGCGAACTG
CCGCTGGACC
CTACGTCAGC
CGCGCTGGAG
CTTCATCCTC
CTTTGACATG
TGGCTTTGCC
CAAGCAGAGC
GGTGGTTGTC
CGCTGCCTTT
GTACACAGCC
TATCTACGTG
GAAAGAGTGG
GCTGCAGTCG
AGGGTCACCA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1116 AGAGGCTGAT TGAGGAGTCC TGCTGA (28) INFORMATION FOR SEQ ID NO:26: SEQUENCE CHARACTERISTICS: LENGTH: 371 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: Met Pro Gly Asn Ala Thr Pro Val Thr Thr Thr Ala Pro Trp Ala Ser 1 5 10 WO 00/22131 WO 0022131PCT/US99/24065 -31- Leu Arg Pro Gly Leu His Phe Cys Arg 145 Val Cys Ala Thr Ala 225 Ile Al a Leu Gly Leu Ile Val Ala Asn Asn Val Tyr Thr Arg Trp Phe Cys 115 Asp Arg 130 Arg Arg Gly Ile Phe Asp Arg Phe 195 Asn His 210 Ala Gin Phe Leu Ala Ala Giu Giu Ser Leu Cys Leu Gly Thr 100 Asn Phe Arg Val Met 180 Thr Arg Lys Val Phe 260 Arg Ala Val Leu Ala Thr Leu Ile Val Thr His 165 Leu Val Ile Ala Cys 245 Ser Leu Lys Val Thr Val 70 Leu Gly Tyr Ala Ala 150 Tyr Gin Gly Phe Lys 230 Phe Tyr Tyr Thr Val Al a 55 Tyr Pro Leu Val Val 135 Ile Pro Met Phe Arg 215 Val Ala Tyr Thr Cys Tyr 40 Trp Leu Leu Leu Ser 120 Val Leu Val Asp Al a 200 Ser Lys Pro Arg Ala 280 Asn 25 Ser Leu Leu Trp Ala 105 Ile Tyr Ile Phe Ser 185 Ile Ile His Tyr Gly 265 Ser Asn Al a Ala Cys Val 90 Ser Leu Ala Ser Gin 170 Arg Pro Lys Ser His 250 Asp Val Val Val Leu Leu 75 Ile Lys Phe Leu Ala 15 Thr Ile Leu Gln Al a 235 Leu Arg Val Ser Cys Leu Ala Tyr Val Leu Glu 140 Cys Glu Al a Ser Ser 220 Ilie Val Asn Phe Phe Thr Gin Leu Ile Thr Cys 125 Ser Ile Asp Gly Ile 205 Met Al a Leu Ala Leu 285 Glu Leu Val Cys Arg Al a 110 Cys Arg Phe Lys Tyr 190 Ile Gly Val Leu Met 270 Cys Glu Gly Leu Glu Asn Tyr Ile Gly Ile Giu 175 Tyr Ala Leu Val1 Val 255 Cys Leu Ser Val Gin Leu Gin Ile Ser Arg Leu 160 Thr Tyr Phe Ser Val 240 Lys Gly Ser 275 Thr Val Asn Gly Val Ala Asp Pro Ile Ile Tyr Val Leu Ala Thr Asp 290 295 300 WO 00/22131 WO 0022131PCTIUS99/24065 -32 His 305 Ser Giu Ser Arg Gin Glu Vai 310 Met Lys Thr Asp Val 325 Leu Gin Ser Pro Val 340 Ser Arg Ile His Lys Gly 315 Thr Arg Leu Thr His Ser 330 Aia Leu Ala Asp His Tyr 345 Ser Pro Cys Pro Ala Lys 360 Trp Lys Glu Trp 320 Arg Asp Thr Glu 335 Thr Phe Ser Arg 350 Arg Leu Ile Glu 365 Pro Val His 355 Pro Pro Gly Giu Ser Cys 370 (28) INFORMATION FOR SEQ ID NO:27: SEQUENCE CHARACTERISTICS: LENGTH: 1113 base pairs 
TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: ATGGCGAACT ATAGCCATGC AGCTGACAAC ATTTTGCAAA ATCTCTCGCC TCTAACAGCC
TTTCTGAAAC
ATCTCCATTT
GATCTTTGCT
GTCAAAAATG
GGGGTTTTGT
TTAGCTATCG
GTGATCTGTA
GGCACTTACT
GCTAATGATT
GTCTACCTCA
GTAGCAGCAG
GCCAATTGGC
CAAAATGCAA
TGACTTCCTT
TGCTAGTGAA
GTTCAGATAT
GCTCTACCTG
CCTGTTTCCA
CCCATCACCG
TGGTGTGGAC
CATTCATTAG
CCTTAGGATT
AGCTGATATT
TCAGCCAGAA
TAGCAGGATT
ACACCACAGG
GGGTTTCATA
AGATAAGACC
CCTCAGATCT
GACTTATGGG
CACTGCTTTC
CTTCTATACA
TCTGTCTGTG
GGAGGAAGAT
TATGCTGCTT
TTTCGTCCAC
CTGGACTTTT
TGGAAGGGGT
CAGAAGAAGG
ATAGGAGTCA
TTGCATAGAG
GCAATTTGTT
ACTCTGACTT
ATGCTCTTCT
AAGAGGCTGA
GCCATGGCAT
CAATGCACCT
CTTGCTCTCA
GATCGAAGAA
CATGGTCCTG
CCCACACCAC
CTATTGGTCT
GCGTGGTGGG
CACCTTACTA
TCCCATTTGT
GCAAAGTGAT
GCATCAGTGT
CCTTTTGGAC
TTCCCCCGGT
TCCAACACCG
TCCTCCTAGC
AAATGAAGCC
GAGCCAGTGG
CCACCTTGCT
TAGACGAGTT
CAACCTCCTG
CTTCCTGTTG
GTTCAACTCT
TGCCTTTCTG
CACCAGATAC
GTGTCTGGCT
TTTAGACGTG
CTCCTTCAGG
CACACAGCTT
AGTCCAGTTT
CCAGGCAGCT
GGGCATCAGG
CAAAATGGAG
120 180 240 300 360 420 480 540 600 660 720 780 840 WO 00/22131 WO 00/213 1PCTIUS99/24065 -33 AAAAGAATCA GCAGAATGTT CTATATAATG ACTTTTCTGT TTCTAACCTT GTGGGGCCCC TACCTGGTGG CCTGTTATTG GAGAGTTTTT GCAAGAGGGC CTGTAGTACC AGGGGGATTT CTAACAGCTG CTGTCTGGAT GAGTTTTGCC CAAGCAGGAA TCAATCCTTT TGTCTGCATT TTCTCAAACA GGGAGCTGAG GCGCTGTTTC AGCACAACCC TTCTTTACTG CAGAAAATCC AGGTTACCAA GGGAACCTTA CTGTGTTATA TGA (29) INFORMATION FOR SEQ ID NO:28: SEQUENCE CHARACTERISTICS: LENGTH: 370 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein 900 960 1020 1080 1113 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: Met Ala Asn Tyr Ser His Ala Ala Asp Asn Ile Leu Gin Asn 1 Pro 10 Ser Leu Ser Leu Thr Ala Phe Leu Lys Leu Thr 25 Ile Leu Gly Phe Ile Ile Gly Val Lys Asp Val Ser Val Val Gly Asn Leu Ser Ile Leu Leu Lys Thr Ser Asp Leu His Arg Ala Tyr Tyr Phe Leu Asp Leu Cys Cys Ile Leu Arg Ile Cys Phe Val Pro Thr Val Phe Asn Lys Asn Gly Ser Gly Trp Thr Tyr Gly 90 Phe Leu Thr Cys Lys Val Ile Ala Phe Phe Cys Ile 115 Tyr Thr Lys Leu 100 Ser Val Leu Ser Cys 105 Leu His Thr Ala Val Thr Arg Tyr 120 Trp Ala Ile Ala His 125 Val Phe Met Leu 110 His Arg Phe Ile Cys Met Arg Leu Thr Thr Cys Leu 130 Val Trp Ala 140 Pro Thr Leu Ser Met Ala Phe 145 Gly Pro 155 Gin Val Leu Asp Val 160 His Thr Tyr Ser Phe Arg Giu Glu Asp Cys Thr Phe Gin WO 00/22131 PCT/US99/24065 -34- 165 170 175 Arg Ser Phe Arg Ala Asn Asp Ser Leu Gly Phe Met Leu Leu Leu Ala 180 185 190 Leu Ile Leu Leu Ala Thr Gin Leu Val Tyr Leu Lys Leu Ile Phe Phe 195 200 205 Val His Asp Arg Arg Lys Met Lys Pro Val Gin Phe Val Ala Ala Val 210 215 220 Ser Gin Asn Trp Thr Phe His Gly Pro Gly Ala Ser Gly Gin Ala Ala 225 230 235 240 Ala Asn Trp Leu Ala Gly Phe Gly Arg Gly Pro Thr Pro Pro Thr Leu 245 250 255 Leu Gly Ile Arg Gin Asn Ala Asn Thr Thr Gly Arg Arg Arg Leu Leu 260 265 270 Val Leu Asp Glu Phe Lys Met Glu Lys Arg Ile Ser Arg Met Phe Tyr 275 280 285 Ile Met Thr Phe Leu Phe Leu Thr Leu Trp Gly Pro Tyr Leu Val Ala 290 295 300 Cys Tyr Trp Arg Val Phe Ala Arg Gly Pro Val Val Pro Gly Gly Phe 305 310 315 320 Leu Thr Ala Ala Val Trp Met Ser Phe Ala Gin Ala Gly Ile Asn Pro 325 330 335 Phe Val Cys Ile Phe Ser Asn Arg Glu Leu Arg Arg Cys Phe Ser Thr 340 345 350 Thr Leu Leu Tyr Cys Arg Lys Ser Arg Leu Pro Arg Glu Pro Tyr Cys 355 360 365 Val Ile 370 INFORMATION FOR SEQ ID NO:29: SEQUENCE CHARACTERISTICS: LENGTH: 1080 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: 
SEQ ID NO:29: ATGCAGGTCC CGAACAGCAC CGGCCCGGAC AACGCGACGC TGCAGATGCT GCGGAACCCG WO 00/22131 WO 00/213 1PCT/US99/24065 GCGATCGCGG TGGCCCTGCC CGTGGTGTAC TCGCTGGTGG CGGCGGTCAG CATCCCGGGC
AACCTCTTCT
TTCATGATCA
TACTACCATT
GTGGCCTTTT
CGCTTCCTGG
GTGGCCGCGT
ACCGATCTCA
TGGACGATGC
CTGTTCCTCA
TTGCGCACGG
GTGGTCTTGC
ATCGTGAGCC
CTCAGCTGCC
CAGCTGCGCC
CGCCGCGAGA
CCTGAAGGGA
CTCTGTGGGT
ACCTGAGCGT
GCAACCGCCA
ACGCAAACAT
GGGTCCTGTA
GTGCAGGGAC
CCTACCCGGT
TCCCCAGCGT
TCCCGTTCGT
AGGAGGCGCA
TGGCCTTTGT
GCCTGTTCTA
TCAACAACTG
TGCGGGAATA
GCCTCTTCTC
TGGAGGGAGC
GCTGTGCCGG
CACGGACCTG
CCACTGGGTA
GTATTCCAGC
CCCGCTCAGC
CTGGCTGCTG
GCACGCCCTG
GGCCATGTGG
GATCACCGTG
CGGCCGGGAG
CACCTGCTTC
CGGCAAGAGC
TCTGGACCCG
TTTGGGCTGC
CGCCAGGACC
CACCAGGCCC
CGCATGGGGC
ATGCTGGCCA
TTCGGGGTGC
ATCCTCACCA
TCCAAGCGCT
CTCCTGACCG
GGCATCATCA
GCCGTGTTCC
GCTTGTTACA
CAGCGGAGGC
GCCCCCAACA
TACTACCACG
TTTGTTTATT
CGCCGGGTGC
ACGTCCGTGC
GGCCTCCAGA
CCAGATCCCC
GCGTGTTGCC
TGCTTTGCAA
TGACCTGTAT
GGCGCCGCCG
CCCTGTGCCC
CCTGCTTCGA
TCTTCACCAT
CGGCCACCAT
GCGCGGTGGG
ACTTCGTGCT
TGTACAAGCT
ACTTTGCGTC
CCAGAGACAC
GCTCCGAGGC
GGCAGGAGAG
GTCGGTCATC
TTTCCAAATC
CGTGGTGACC
CAGCGTGGAG
TCGTTACGCG
GCTGGCGCGC
CGTCCTCAAG
CTTCATCCTG
CCTCAAGCTG
CCTGGCCGCG
CCTGGCGCAC
CACGCTGTGT
CCGGGAATTC
CCTGGACACG
CGGTGCGCAC
TGTGTTCTGA
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 (31) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 359 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Met Gin Val Pro Asn Ser Thr Gly Pro Asp Asn Ala Thr Leu Gin Met 1 5 10 Leu Arg Asn Pro Ala Ile Ala Val Ala Leu Pro Val Val Tyr Ser Leu 25 Val Ala Ala Val Ser Ile Pro Gly Asn Leu Phe Ser Leu Trp, Val Leu WO 00/22131 WO 0022131PCTIUS99/24065 36 Cys Arg Leu Ser Tyr Tyr Asn Val Thr Met Leu Ser 130 Ala Giy 145 T hr Asp Asp Val Phe Leu Thr Val 210 Glu Ala 225 Val Val Leu Leu His Val Asp Pro 290 Arg Glu 305 Arg Arg Arg Met Val Thr His Cys Val. Thr 100 Thr Cys 115 Ser Lys Thr Trp Leu Thr Leu Lys 180 Phe Thr 195 Ala Cys His Gly Leu Leu Ala His 260 Tyr Lys 275 Phe Vai Tyr Leu Giu Ser Gly Pro Asp Leu 70 Asn Arg Val Ala Ile Ser Arg Trp Leu Leu 150 Tyr Pro 165 Trp Thr Ile Phe Tyr Thr Arg Giu 230 Ala Phe 245 Ilie Val Leu Thr Tyr Tyr Gly Cys 310 Leu Phe 325 Arg Ser 55 Met Leu His His Phe Tyr Val Giu 120 Arg Arg 135 Leu Leu Val His Met Leu Ile Leu 200 Ala Thr 215 Gin Arg Val Thr Ser Arg Leu Cys 280 Phe Ala 295 Arg Arg Ser Ala Pro Ala Trp Al a 105 Arg Arg Thr Ala Pro 185 Leu Ile Arg Cys Leu 265 Leu Ser Val Arg Ser Ser Vai 90 Asn Phe Arg Ala Leu 170 Ser Phe Leu Arg Phe 250 Phe Ser Arg Pro Thr 330 Val Val 75 Phe Met Leu Tyr Leu 155 Gly Val Leu Lys Ala 235 Ala Tyr Cys Giu Arg 315 Thr Ile Leu Gly Tyr Gly Ala 140 Cys Ile Ala Ile Leu 220 Val.
Pro Gly Leu Phe 300 Asp Ser Phe Pro Val Ser Val i2 Val Pro Ile Met Pro 205 Leu Gly Asn Lys Asn 285 Gin Thr Val Met Phe Leu Ser 110 Leu Ala Leu Thr Trp 190 Phe Arg Leu Asn Ser 270 Asn Leu Leu Arg Ile Gin Leu Ile Tyr Ala Ala Cys 175 Ala Val Thr Al a Phe 255 Tyr Cys Arg Asp Ser 335 Asn Ile Cys Leu Pro Cys Arg 160 Phe Val Ile Glu Ala 240 Val Tyr Leu Leu Thr 320 Glu WO 00/22131 WO 0022131PCTIUS99/24065 -37 Ala Gly Ala His Pro Glu Gly Met Glu Gly Ala Thr Arg Pro Gly Leu 340 345 350 Gin Arg Gin Glu Ser Val Phe 355 (32) INFORMATION FOR SEQ ID NO:31: SEQUENCE CHARACTERISTICS: LENGTH: 1503 base pairs TYPE: nucleic acid STRANDEDNESS: single 0 TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: ATGGAGCGTC CCTGGGAGGA CAGCCCAGGC CCGGAGGGGG CAGCTGAGGG CTCGCCTGTG
CCAGTCGCCG
GCTGAGTGCC
CGCTGGCCCG
TCGGTTCAAG
CGGCCCATGG
TACAACTACA
GACGCCGTGG
TTGGTGCTCG
ACGTTGTCGG
CTCACGCTGA
CTCACTGCGT
CGCAGGGGGC
TGGGGCGTGT
CTGGACGCTT
CTCGCCTTCG
GTACGCGCCA
CCGGGGCGCG
CGGGACCCAA
CCCCCTCGCC
GCAGCGCGAC
AGTCGGGGCT
CCGGCAAGCT
TGTGCCTGGC
GACGCCACCC
ATCTGCTGGC
AACTGTCCCC
CCGTGCTGAG
CCGCGCCCGT
CGCTGCTCCT
GCTCCACTGT
TGGGCATCCT
ACGCGCGGCG
CTCCGGTGCC
GGGGAGGGGG
TGCCAGCTCC
TGCGGGTGGC
GCTGCGGCCG
CCGCGGTGCG
GGTGTGCGCC
GCGCTTCCAC
AGGCGCCGCC
CGCGCTCTGG
CCTCCTGGCC
CTCCAGTCGG
CGGGCTCCTG
CTTGCCGCTC
GGCCGCGATC
CCTGCCGGCA
GCGGCGAGTG
CAACTGCTGG
AGCCCCGCCC
GCACGACCAG
GCGCCGGTGA
AGCTACCAGC
TTCATCGTGC
GCTCCCATGT
TACGCCGCCA
TTCGCACGGG
ATCGCGCTGG
GGGCGCACGC
CCAGCGCTGG
TACGCCAAGG
TGTGCACTCT
CGGCCCGGGA
GCACAGGCTG
CGACCGCCGG
CCGGAGCGGC
GGCGCAGACC
GCGAGGTCAT
CGGGTGCCGG
TAGAGAATCT
TCCTGCTCCT
ACATCCTACT
AGGGAGGCGT
AGCGCAGCCT
TGGCGATGGC
GCTGGAATTG
CCTACGTGCT
ACGCGCGCAT
CTGCGGGGAC
GCAGCCATGG
CCCTTTGCGT
GTCCGCTCAC
TTGGGGCGCG
CGTCCTGCAT
CCTGCGCGCC
AGCCGTGTTG
GGGCAGCCTC
GTCGGGGCCG
CTTCGTGGCA
CACCATGGCG
AGCCGCGGCC
CCTGGGTCGC
CTTCTGCGTG
CTACTGCCAG
CACCTCGACC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 CGGGCGCGTC GCAAGCCGCG CTCTCTGGCC TTGCTGCGCA CGCTCAGCGT GGTGCTCCTG WO 00/22131 PCTIUS99/24065 38 GCCTTTGTGG CATGTTGGGG CCCCCTCTTC CTGCTGCTGT GCGCGCACCT GTCCTGTACT CCTGCAGGCC GATCCCTTCC TCACTTCTGA ACCCCATCAT CTACACGCTC ACCAACCGCG CGCCTGGTCT GCTGCGGACG CCACTCCTGC GGCAGAGACC GCGAGCGCGG CTGAGGCTTC CGGGGGCCTG CGCCGCTGCC AGCTTCAGCG GCTCGGAGCG CTCATCGCCC CAGCGCGACG ACAGGCAGCC CCGGTGCACC CACAGCCGCC CGGACTCTGG
TGA
(33) INFORMATION FOR SEQ ID NO:32: SEQUENCE CHARACTERISTICS: LENGTH: 500 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein
TGCTCGACGT
TGGGACTGGC
ACCTGCGCCA
CGAGTGGCTC
TGCCCCCGGG
GGCTGGACAC
TATCAGAACC
GGCGTGCCCG
CATGGCCAAC
CGCGCTCCTG
CCAGCAGTCG
CCTTGATGGG
CAGCGGCTCC
GGCTGCAGAC
1140 1200 1260 1320 1380 1440 1500 1503 (xi) SEQUENCE DESCRIPTION: SEQ ID N0:32: Met Glu Arg Pro Trp Glu Asp Ser Pro Gly Pro Glu Gly Ala 1 Gly Ser Pro Ser Gly Thr Arg Gly Gin 10 Al a Ala Glu Val Gly Pro Val Ala Ala Arg Ser Gly Trp Gin Pro Trp 40 Ala Glu Cys Pro Gly Arg Ala Ala Ala Pro Lys Gly Trp Pro Ala Leu Leu Ala Pro Ser Thr 55 Ser Gly Pro Leu Arg Ala Pro Ala Ser Ser Ser 70 Ala Pro Ala Pro Gly 75 Ala Ala Ser Ala His Val Gin Gly Ser Arg Thr Ala Gly Gly 90 Gly Arg Pro Gly Arg Arg Pro Trp Gly Val Ser Glu 115 Pro Met Glu Leu Leu Arg Pro Ala Pro 110 Lys Leu Arg Ile Val Leu His 120 Asn Tyr Thr Gly 125 Gly Ala 130 Ser Tyr Gln Pro Gly Ala Gly Leu Arg 135 Ala Asp Ala Val Val 140 WO 00/22131 WO 0022131PCTIUS99/24065 39 Cys 145 Leu Leu Ala Leu Val 225 Arg Al a Leu Pro Gly 305 Val Thr Arg Leu Pro 385 Ser His Leu Val Gly Asn Trp 210 Leu Arg Al a Gly Leu 290 Ile Arg Thr Thr Phe 370 Val Leu Ala Ala Leu Ser Ile 195 Phe Ser Gly Al a Trp 275 Tyr Leu Al a Ser Leu 355 Leu Leu Leu Leu Val Cys Gly Arg 165 Leu Thr 180 Leu Leu Ala Arg Leu Leu Pro Ala 245 Ala Trp 260 Asn Cys Ala Lys Ala Ala Asn Ala 325 Thr Arg 340 Ser Val Leu Leu Leu Gln Asn Pro 405 Leu Arg 420 Ala Phe 150 His Pro Leu Ser Ser Gly Glu Gly 215 Ala Ile 230 Pro Val Gly Val Leu Gly Ala Tyr 295 Ile Cys 310 Arg Arg Ala Arg Val Leu Leu Leu 375 Ala Asp 390 Ile Ile Leu Val Ile Arg Asp Pro 200 Gly Ala Ser S er Arg 280 Val Ala Leu Arg Leu 360 Asp Pro Tyr Cys Val Leu Giu Asn Leu Ala Val Leu 155 160 Phe Leu 185 Leu Val Leu Ser Leu 265 Leu Leu Leu Pro Lys 345 Ala Val Phe Thr Cys 425 His 170 Leu Thr Phe Giu Arg 250 Leu Asp Phe Tyr Ala 330 Pro Phe Ala Leu Leu 410 Gly Al a Ala Leu Val Arg 235 Gly Leu Ala Cys Al a 315 Arg Arg Val Cys Gly 395 Thr Arg Pro Gly Lys Ala 220 Ser Arg Gly Cys Val 300 Arg Pro Ser Ala Pro 380 Leu Asn His Met Ala Leu 205 Leu Leu Thr Leu Ser 285 Leu Ile Gly Leu Cys 365 Ala Ala Arg Ser Phe Leu 175 Ala Tyr 190 Ser Pro Thr Ala Thr Met Leu Ala 255 Leu Pro 270 Thr Val Ala Phe Tyr Cys Thr Ala 335 Ala Leu 350 Trp Gly Arg 
Thr Met Ala Asp Leu 415 Cys Gly 430 Leu Ala Al a Ser Ala 240 Met Al a Leu Val Gln 320 Gly Leu Pro Cys Asn 400 Arg Arg Asp Pro Ser Gly Ser Gln Gln Ser Ala Ser Ala Ala Glu Ala Ser Gly WO 00/22131 WO 00/213 1PCT/US99/24065 Gly Leu 450 Ser Glu 465 Thr Gly 435 440 445 Arg Arg Cys Leu Pro Pro Gly Leu Asp Gly Ser Phe Ser Gly 455 460 Arg Ser Ser Pro Gin Arg Asp Gly Leu Asp Thr Ser Gly Ser 470 475 480 Ser Pro Gly Ala Pro Thr Ala Ala Arg Thr Leu Val Ser Glu 485 490 495 Pro Ala Ala Asp 500 (34) INFORMATION FOR SEQ ID NO: 33: SEQUENCE CHARACTERISTICS: LENGTH: 1029 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID N0:33: ATGCAAGCCG TCGACAATCT CACCTCTGCG CCTGGGAACA CCAGTCTGTG CACCAGAGAC
TACAAAATCA
ATCACAAATG
ATTTTTCTTA
ATTCTTAGTG
TCCGTCATAT
GATCGCTACC
GCTAAGATTC
ATTCTGACCA
GAGTTCGGTC
AATTTCTTAA
GTAAGAACGA
ATCATTGCTG
CTGAGCCAAA
CCCAGGTCCT
GCCTGGCGAT
AGAACACAGT
ATGCCAAACT
TTTATTTCAC
AGAAGACCAC
TCTCTGTTGT
ACAGGCAGCC
TAGTCTGGCA
TTGTTATTGT
GGGGTGTAGG
TATTCTTTAT
CCCGGGATGT
CTTCCCACTG
GAGGATTTTC
CATTTCTGAT
GGGAACAGGA
AATGTATATC
CAGGCCATTT
CATCTGGGCA
GAGAGACAAG
TGAAATAGTA
ATGTTATACA
TAAAGTCCCC
TTGTTTTGTT
CTTTGACTGC
CTCTACACTG
TTTCAAATCC
CTTCTCATGA
CCACTGAGAA
AGTATTTCAT
AAAACATCCA
TTCATGTTCT
AATGTGAAGA
AATTACATCT
CTCATTACAA
AGGAAAAAGG
CCTTTCCATT
ACTGCTGAAA
TCCTGTTTTT
GGAGTAAATC
TTCTGACTTT
CTTTTGTGTG
TCCTGGGACT
ACCCCAAAAA
TACTCTCTTT
AATGCTCTTT
GTCAAGTCAT
AAGAACTGTA
TGAACGTCAA
TTGCCCGAAT
ATACTCTGTT
TGTTGGACTT
AAACTTTATT
TCCATTCAAA
TCAAGTTACC
GATAACTATC
TCTCTTGGGG
GCCTAACATG
CCTTAAATCA
TTTCTGGATT
CCGGTCATAC
AGTTTTCATT
TCCTTACACC
CTATGTGAAA
120 180 240 300 360 420 480 540 600 660 720 780 840 WO 00/22131 PCT/US99/24065 -41- GAGAGCACTC TGTGGTTAAC TTCCTTAAAT GCATGCCTGG ATCCGTTCAT CTATTTTTTC 900 CTTTGCAAGT CCTTCAGAAA TTCCTTGATA AGTATGCTGA AGTGCCCCAA TTCTGCAACA 960 TCTCTGTCCC AGGACAATAG GAAAAAAGAA CAGGATGGTG GTGACCCAAA TGAAGAGACT 1020 CCAATGTAA 1029 (35) INFORMATION FOR SEQ ID NO:34: SEQUENCE CHARACTERISTICS: LENGTH: 342 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: Met Gin Ala Val Asp Asn Leu Thr Ser Ala Pro Gly Asn Thr Ser Leu 1 5 10 Cys Thr Arg Asp Tyr Lys Ile Thr Gin Val Leu Phe Pro Leu Leu Tyr 25 Thr Val Leu Phe Phe Val Gly Leu Ile Thr Asn Gly Leu Ala Met Arg 40 Ile Phe Phe Gin Ile Arg Ser Lys Ser Asn Phe Ile Ile Phe Leu Lys 50 55 Asn Thr Val Ile Ser Asp Leu Leu Met Ile Leu Thr Phe Pro Phe Lys 70 75 Ile Leu Ser Asp Ala Lys Leu Gly Thr Gly Pro Leu Arg Thr Phe Val 90 Cys Gin Val Thr Ser Val Ile Phe Tyr Phe Thr Met Tyr Ile Ser Ile 100 105 110 Ser Phe Leu Gly Leu Ile Thr Ile Asp Arg Tyr Gin Lys Thr Thr Arg 115 120 125 Pro Phe Lys Thr Ser Asn Pro Lys Asn Leu Leu Gly Ala Lys Ile Leu 130 135 140 Ser Val Val Ile Trp Ala Phe Met Phe Leu Leu Ser Leu Pro Asn Met 145 150 155 160 Ile Leu Thr Asn Arg Gin Pro Arg Asp Lys Asn Val Lys Lys Cys Ser 165 170 175 Phe Leu Lys Ser Glu Phe Gly Leu Val Trp His Glu Ile Val Asn Tyr WO 00/22131 PCT/US99/24065 -42- 180 Val 185 Asn Ile Cys Gin 195 Tvr Thr Leu Ile Phe Trp Phe Leu Ile 190 Ile Val Cys Arg Thr Arg Ile Thr Lys 210 Gly Val Glu 215 Arg Tyr Arg Ser Gly Lys Val Lys Lys Val 225 Ile Asn 235 Pro Lys Val Phe Ile 240 Ile Ala Val Ile Cys Phe Val 250 Asp Phe His Phe Ala Arg 255 Ser Gin Ile Pro Tyr Glu Asn Thr 275 Leu Asn Ala Thr Arg 265 Lys Glu Val Phe Asp Phe Tyr Val Ser Thr Leu Trp 285 Leu Cys Thr Ala 270 Leu Thr Ser Cys Lys Ser Cys Leu Asp 290 Phe Arg Pro 295 Ser Ile Tyr Phe Phe 300 Pro Asn Ser Leu Met Leu Lys Asn Ser Ala 305 Ser Thr 320 Leu Ser Gin Asp Asn 325 Pro Met Arg Lys Lys Glu Gin Asp Gly Gly Asp Pro 330 335 Asn Glu Glu Thr 340 (36) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 1077 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID ATGTCGGTCT GCTACCGTCC CCCAGGGAAC GAGACACTGC TGAGCTGGAA GACTTCGCGG GCCACAGGCA CAGCCTTCCT GCTGCTGGCG GCGCTGCTGG GGCTGCCTGG CAACGGCTTC GTGGTGTGGA GCTTGGCGGG CTGGCGGCCT GCACGGGGGC 
GACCGCTGGC GGCCACGCTT GTGCTGCACC TGGCGCTGGC CGACGGCGCG GTGCTGCTGC TCACGCCGCT CTTTGTGGCC TTCCTGACCC GGCAGGCCTG GCCGCTGGGC CAGGCGGGCT GCAAGGCGGT GTACTACGTG 120 180 240 300 WO 00/22131 WO 00/213 1PCTIUS99/24065 43 TGCGCGCTCA GCATGTACGC CAGCGTGCTG CTCACCGGCC TGCTCAGCCT GCAGCGCTGC
CTCGCAGTCA
CTGCTGCTGG
CACCTGTGGA
CACCTGAGCC
TACAGCGTGA
CGGGTGGGCC
CACGCAGTCA
AAGCTGGGCG
TCTAGCGTCA
CCCCGTTTCC
AGGGAAGGGA
CCCGCCCCTT
CGGTCTGGCT
GGGACCGCGT
TGGAGACTCT
CGCTGGCACG
GGCTGGTGAG
ACCTTCTGCA
GAGCCGGCCA
ACCCGGTGCT
TCACGCGGCT
CCATGGAGCT
CCTGGCGCCT
GGCCGCCCTG
ATGCCAGCTG
GACCGCTTTC
GCTGCGGGGC
CGCCATCGTG
GGCGGTCGCA
GGCGGCGCGA
CTACGTCTTC
CTTCGAAGGC
CCGAACTACC
CGGCTGCGCA
TTGCTCGCCG
TGCCACCCGT
GTGCTTCCTT
GCCCGCTGGG
CTTGCCTTCG
GCGCTGGCTC
GCGGGAACTA
ACCGCTGGAG
TCTGGGGAGG
CCTCAGCTGA
GCCCGGCCCT
TCCCGGCCGC
CGCCGGTCCA
TCGGGCTGAT
GCTCCGGGCG
GCTTGCTCTG
CACCGGAAGG
CGGCCTTGGC
ATCTGCTGCC
CCCGAGGGGG
AAGTGGTGGG
GGCCCGCCGC
CGTCTACCGC
CGCCGCCGCC
GCTCGGCTGC
GCACGGGGCG
GGCCCCCTAC
GGCCTTGGCG
CTTCTTCAGT
CCGGGCAGGT
CGGCCGCTCT
GCAGGGCCGC
360 420 480 540 600 660 720 780 840 900 960 1020 1077 GGCAATGGAG ACCCGGGGGG TGGGATGGAG AAGGACGGTC CGGAATGGGA CCTTTGA (37) INFORMATION FOR SEQ ID NO:36: SEQUENCE CHARACTERISTICS: LENGTH: 358 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: Met Ser Val Cys Tyr Arg Pro Pro Gly Asn Glu 1 5 10 Lys Thr Ser Arg Ala Thr Gly Thr Ala Phe Leu 25 Leu Gly Leu Pro Gly Asn Gly Phe Val Val Trp 40 Arg Pro Ala Arg Gly Arg Pro Leu Ala Ala Thr Ala Leu Ala Asp Gly Ala Val Leu Leu Leu Thr 70 75 Thr Leu Leu Ser Trp Leu Leu Ser Leu Leu Val Ala Ala Leu Ala Gly Trp Leu His Leu Pro Leu Phe Val Phe Ala Ala eLeu Thr Arg Gln Ala Trp Pro Leu Gly Gln Ala Gly Cys Lys WO 00/22131 WO 00/213 1PCTIUS99/24065 -44- Val Gly Al a Val 145 His His Pro Arg Leu 225 His Gly Thr Val Thr 305 Arg Gly Gly Tyr Leu Pro 130 Trp Leu Ala Phe Gly 210 Val Ala Ala Thr Phe 290 Arg Glu Gln Tyr Leu 115 Arg Leu Trp Ala Gly 195 Ala Ser Val Leu Al a 275 Thr Leu Gly Gly Val 100 Ser Leu Ala Arg Ala 180 Leu Arg Ala Asn Ala 260 Leu Ala Phe Thr Arg 340 Cys Leu Arg Ala Asp.
165 His Met Trp Ile Leu 245 Lys Ala Gly Glu Met 325 Gly Ala Gln Ser Leu 150 Arg Leu Leu Gly Val 230 Leu Leu Phe Asp Gly 310 Glu Asn Leu Leu Arg Pro 135 Leu Val Ser Gly Ser 215 Leu Gln Gly Phe Leu 295 Ser Leu Ser Dys 120 Al a Leu Cys Leu Cys 200 Gly Al a Al a Gly Ser 280 Leu Gly Arg Met 105 Leu Leu Ala Gln Glu 185 Tyr Arg Phe Val Ala 265 Ser Pro Glu Thr Tyr Ala Al a Val Leu 170 Thr Ser His Gly Ala 250 Gly Ser Arg Al a Thr 330 Ala VTal Arg Pro 155 Cys Leu Val Gly Leu 235 Al a Gln Val Ala Arg 315 Pro Ser Thr Arg 140 Ala His Thr Thr Ala 220 Leu Leu Ala Asn Gly 300 Gly Gln Val Arg 125 Leu Al a Pro Al a Leu 205 Arg Trp Ala Ala Pro 285 Pro Gly Leu Leu 110 Pro Leu Val Ser Phe 190 Al a Val Al a Pro Arg 270 Val Arg Gly Lys Leu Phe Leu Tyr Pro 175 Val1 Arg Gly Pro Pro 255 Ala Leu Phe Arg Val 335 Thr Leu Ala Arg 160 Val Leu Leu Arg Tyr 240 Glu Gly Tyr Leu Ser 320 Val Gly Asp Pro Gly Gly Gly Met Glu Lys Asp Pro Glu Trp Asp 355 (38) INFORMATION FOR SEQ ID NO:37: WO 00/22131 WO 0022131PCTIUJS99/24065 45 Wi SEQUENCE CHARACTERISTICS: LENGTH: 1005 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: ATGCTGGGGA TCATGGCATG GAATGCAACT TGCAAAAACT GGCTGGCAGC AGAGGCTGCC
CTGGAAAAGT
AATACCATTG
TATCTCTTTA
ACTACCTTTC
TTGTTTACGG
ACCTCTCTGT
AGGAGTTATG CCAATGGAAA
GTGCTTCATG
TACTTGATAA
TTAATCTCCT
ATAAATCCTG
CCCAACTACA
TTTGTGATGT
GTTGCTACTG
TTCTCTGTGC
GGGAGTTGGA
CGGCCTTTGG
CACTTCAGGG
CCAACCTCTA
TTAAGTATCC
TGGCCATTTG
TTATAACTGA
ACCTCATTTA
GTTTCTTTTA
CTCTGCCCCT
TTTTTACACC
AGCAGTATCA
CCTTTCTGAA
ACATGCTGAT
CATTTTTTAT
CTACATCTTC
CTCTGACTTA
CTGGATATAT
TACCAGCATT
TTTCCGAGAA
GGTTTTAGTA
CAATGGCACC
CAGCATGTGT
TTACAAGATT
TGAAAAGCCT
CTATCACGTC
GTGCACTCAG
CAGTGTCATC
GAATCAACTG
GGGATTGAGT
TCTCTGAAGA
GCTTTTCTGT
GGAGACGTGC
CTCTTTCTCA
CACCTTCTGC
ACCTTAGAGT
ACCTGTAATG
CTAACACTGT
GCTCTCTTCC
CTCAACTTGG
ATGCGGAATG
GTCGTCATCA
AACCCTGTCT
AGACACAACT
TCGTTGTGGG
ACTGGAACAG
GCACCCTCCC
TCTGCATAAG
CTTTTATCAG
AAAAGAAAGA
TACTACCCAT
ATTTTGCAAG
TGGGGTTCCT
TAAAGCAGAG
TCATCATGGC
TGAGGATCGC
ACTCCTTTTA
TCTATTTTCT
TCAAATCCCT
AGTCCTTGGA
CAGTAATATT
CATGCTGATA
CAACCGATAT
CATAGATCGA
GTTTGCTATT
ACTTCCCCTT
TTCTGGAGAC
TATTCCTCTT
GAATAGGCAG
AGTGGTAATC
TTCACGCCTG
CATTGTGACA
TTTGGGAGAT
TACATCCTTT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 AGCAGATGGG CTCATGAACT CCTACTTTCA TTCAGAGAAA AGTGA (39) INFORMATION FOR SEQ ID NO:38: SEQUENCE CHARACTERISTICS: LENGTH: 334 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein 1005 WO 00/22131 PCTIUS99/24065 46 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: Met Leu Gly Ilie Met Ala Trp Asn Ala Thr Cys Lys Asn Trp Leu Ala 1 5 10 Ala Giu Ala Ala Leu Glu Lys Tyr Tyr Leu Ser Ile Phe Tyr Gly Ile 20 25 Giu Phe Val Val Gly Val Leu Gly Asn Thr Ile Val Val Tyr Gly Tyr 40 Ile Phe Ser Leu Lys Asn Trp Asn Ser Ser Asn Ile Tyr Leu Phe Asn 55 Leu Ser Val Ser Asp Leu Ala Phe Leu Cys Thr Leu Pro Met Leu Ile 70 75 Arg Ser Tyr Ala Asn Gly Asn Trp Ile Tyr Gly Asp Val Leu Cys Ile 90 Ser Asn Arg Tyr Val Leu His Ala Asn Leu Tyr Thr Ser Ile Leu Phe 100 105 110 Leu Thr Phe Ile Ser Ile Asp Arg Tyr Leu Ile Ile Lys Tyr Pro Phe 115 120 125 Arg Giu His Leu Leu Gin Lys Lys Glu Phe Ala Ile Leu Ile Ser Leu 130 135 140 Ala Ile Trp, Vai Leu Val Thr Leu Giu Leu Leu Pro Ile Leu Pro Leu 145 150 155 160 Ile Asn Pro Val Ile Thr Asp Asn Gly Thr Thr Cys Asn Asp Phe Ala 165 170 175 Ser Ser Gly Asp Pro Asn Tyr Asn Leu Ile Tyr Ser Met Cys Leu Thr 180 185 190 Leu Leu Gly Phe Leu Ile Pro Leu Phe Val Met Cys Phe Phe Tyr Tyr 195 200 205 Lys Ile Ala Leu Phe Leu Lys Gin Arg Asn Arg Gin Val Ala Thr Ala 210 215 220 Leu Pro Leu Giu Lys Pro Leu Asn Leu Val Ile Met Ala Val Vai Ile 225 230 235 240 Phe Ser Val Leu Phe Thr Pro Tyr His Vai Met Arg Asn Vai Arg Ile 245 250 255 Ala Ser Arg Leu Gly Ser Trp Lys Gin Tyr Gin Cys Thr Gin Val Val 260 265 270 Ile Asn Ser Phe Tyr Ile Val Thr Arg Pro Leu Ala Phe Leu Asn Ser WO 00/22131 PTU9/46 PCT/US99/24065 Val Ile 290 Met Leu 305 275 Asri Met -47- 280 285 Pro Val Phe Tyr Phe Leu Leu Gly Asp His Phe Arg Asp 295 300 Asn Gin Leu Arg His Asn Phe Lys Ser Leu Thr Ser Phe 310 315 320 Aia His Giu Leu Leu Leu Ser Phe Arg Giu Lys 325 330 Ser Arg Trp INFORMATION FOR SEQ ID NO:39: SEQUENCE CHARACTERISTICS: LENGTH: 1296 base pairs TYPE: nucleic acid STRANDfEDNESS: singie TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: ATGCAGGCGC TTAACATTAC CCCGGAGCAG TTCTCTCGGC TGCTGCGGGA CCACAACCTG
ACGCGGGAGC
CCGGGACGCG
TTTGGCAATG
AACATCTTTA
GTCACCATGC
GTGCCATTTG
GTGGAAAGGC
AGGGCTTTCA
TGGCACGTGC
TGCTTAGAAG
ATCCTCTTCC
CTTTGGATAA
ATGTCCAAAA
CTCTTTGCTG
TTTGAAAAGG
AGTTCATCGC
CCAAGCTGGC
CTCTGGTGTT
TCTGCTCCTT
TCCAGAACAT
TCCAGTCTAC
ACCAGGGACT
CAATGCTAGG
AACAACTTGA
AGTGGACCAG
TCCTGCCTCT
AGAAAAGAGT
TAGCCAGGAA
TGTGCTGGGC
AATATGATGA
TCTGTACCGG
CCTCGTGCTC
CTACGTGGTG
GGCGCTCAGT
TTCCGACAAC
CGCTGTTGTG
TGTGCATCCT
TGTGGTCTGG
GATCAAATAT
CCCTGTGCAC
TATGGTGATG
TGGGGATGGT
GAAGAAACGA
ACCATTCCAT
TGTCACAATC
CTGCGACCGC
ACCGGCGTGC
ACCCGCAGCA
GACCTGCTCA
TGGCTGGGGG
ACAGAAATGC
TTTAAAATGA
CTGGTGGCAG
GACTTCCTAT
CAGAAGATCT
CTTATTCTGT
TCAGTGCTTC
GCTGTCATTA
GTTGTCCATA
AAGATGATTT
TCGTCTACAC
TCATCTTCGC
AGGCCATGCG
TCACCTTCTT
GTGCTTTCAT
TCACTATGAC
AGTGGCAATA
TCATCGTAGG
ATGAAAAGGA
ACACCACCTT
ACAGTAAAAT
GAACTATTCA
TGATGGTGAC
TGATGATTGA
TTGCTATCGT
CCCAGAGCTG
CCTGGCGCTC
CACCGTCACC
CTGCATTCCC
TTGCAAGATG
CTGCATTGCT
CACCAACCGA
ATCACCCATG
ACACATCTGC
CATCCTTGTC
TGGTTATGAA
TGGAAAAGAA
AGTGGTGGCT
ATACAGTAAT
GCAAATTATT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 WO 00/22131 WO 0022131PCTIUS99/24065 48 GGATTTTCCA ACTCCATCTG TAATCCCATT AAAAATGTTT TGTCTGCAGT TTGTTATTGC AGGCATGGAA ATTCAGGAAT TACAATGATG AATCCAGTGG AGGAAACCAA AGGAGAAGCA TGTGAACAGA CAGAGGAGAA GAAAAAGCTC CTGGCTGAGA ATTCTCCTTT AGACAGTGGG (41) INFORMATION FOR SEQ ID NO:4C SEQUENCE CHARACTERISTICE LENGTH: 431 amino z TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein GTCTATGCAT TTATGAATGA AAACTTCAAA ATAGTAAATA AAACCTTCTC TCCAGCACAA CGGAAGAAAG CAAAGTTTTC CCTCAGAGAG TTCAGTGATG GCAACATTGA AGTCAAATTG AAACGACATC TTGCTCTCTT TAGGTCTGAA
CATTAA
1020 1080 1140 1200 1260 1296 (xi) SEQUENCE DESCRIPTION: SEQ ID N0:40: Met Gin Ala Leu Asn Ile Thr Pro Glu Gin Phe Ser Arg Leu Thr 10 Ile Leu Arg His Asn Leu Leu Val Tyr Arg Giu Gin Ala Leu Tyr Pro Thr Pro Giu Val Leu Thr Leu 40 Phe Gly Arg Aia Lys Phe Arg Leu Arg Leu Ala Leu Gly Asn Ala Gly Val Leu Leu Val Ile 55 Thr Ala Leu Ala Leu Met Phe Tyr Val Asn Val Ser Arg Ser Lys Arg Thr Vai Ile Phe Ile Cys Val Leu Ala Leu Leu Leu Ile Thr Phe Phe Cys Ile Gly Gly Ala 115 Val Vai Thr 130 Pro 100 Phe Thr Met Leu Gin 105 Val Ile Ser Asp Ile Cys Lys Pro Phe Val Gin 125 Val Asn Trp Leu 110 Ser Thr Ala Giu Arg His Giu Met Leu Thr 135 Thr Cys Ile Gin Gly Leu Val His Pro Phe Lys Met Lys Trp Gin Tyr Thr Asn Arg 145 150 155 160 WO 00/22131 WO 0022131PCTIUS99/24065 -49- Arg Gly Leu Val Leu 225 Leu His Ile Phe Tyr 305 Giy Giu Asn Met Glu 385 Cys Phe Ala Phe Thr Met Leu Gly Vai Ser Pro Tyr Glu 195 His Gin 210 Pro Leu Trp Ile Gly Lys Met Met 275 His Vai 290 Asp Asp Phe Ser Asn Phe Lys Thr 355 Met Arg 370 Thr Lys Giu Gin Arg Ser 165 Met Trp 180 Lys Glu Lys Ile Met Val Lys Lys 245 Glu Met 260 Val Thr Val His Vai Thr Asn Ser 325 Lys Lys 340 Phe Ser Lys Lys Giy Giu Thr Glu 405 Glu Leu 420 His His Tyr Met 230 Arg Ser Val Met Ile 310 Ile Asn Pro Ala Al a 390 Giu Ala Val Ile Thr 215 Leu Val Lys Val Met 295 Lys Cys Val Ala Lys 375 Phe Lys Glu Gin Cys 200 Thr Ile Gly Ile Ala 280 Ile Met Asn Leu Gin 360 Phe Ser Lys Asn Val Trp 170 Gin Leu 185 Cys Leu Phe Ile Leu Tyr Asp Gly 250 Ala Arg 265 Leu Phe Giu Tyr Ile Phe Pro Ile 330 Ser Ala 345 Arg His Ser Leu Asp Gly Lys Leu 410 Ser Pro Leu Glu Giu Leu Ser 235 Ser Lys Al a Ser Al a 315 Val Val Giy Arg Asn 395 Lys Leu Val Ala Ile Lys Giu Trp 205 Val Ile 220 Lys Ile Val Leu Lys Lys Val Cys 285 Asn Phe 300 Ile Val Tyr Ala Cys Tyr Asn Ser 365 Glu Asn 380 Ile Glu Arg His Asp Ser Val Tyr 190 Thr Leu Gly Arg Arg 270 Trp, Glu Gin Phe Cys 350 Gly Pro Val Leu Gly 430 Ile 175 Asp Ser Phe Tyr Thr 255 Al a Ala Lys Ile Met 335 Ile Ile Val Lys Al a 415 His Val Phe Pro 
Leu Glu 240 Ile Val Pro Glu Ile 320 Asn Val Thr Giu Leu 400 Leu (42) INFORMATION FOR SEQ ID NO:41: SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs WO 00/22131 PCT/US99/24065 TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: CTGTGTACAG CAGTTCGCAG AGTG 24 (43) INFORMATION FOR SEQ ID NO:42: SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: GAGTGCCAGG CAGAGCAGGT AGAC 24 (44) INFORMATION FOR SEQ ID NO:43: SEQUENCE CHARACTERISTICS: LENGTH: 31 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: CCCGAATTCC TGCTTGCTCC CAGCTTGGCC C 31 INFORMATION FOR SEQ ID NO:44: SEQUENCE CHARACTERISTICS: LENGTH: 32 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES WO 00/22131 PCT/US99/24065 -51- (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: TGTGGATCCT GCTGTCAAAG GTCCCATTCC GG 32 (46) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID TCACAATGCT AGGTGTGGTC (47) INFORMATION FOR SEQ ID NO:46: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: TGCATAGACA ATGGGATTAC AG 22 (48) INFORMATION FOR SEQ ID NO:47: SEQUENCE CHARACTERISTICS: LENGTH: 511 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: TCACAATGCT AGGTGTGGTC TGGCTGGTGG CAGTCATCGT AGGATCACCC ATGTGGCACG TGCAACAACT TGAGATCAAA TATGACTTCC TATATGAAAA GGAACACATC 
TGCTGCTTAG 120 WO 00/22131 PCT/US99/24065 -52- AAGAGTGGAC CAGCCCTGTG CACCAGAAGA TCTACACCAC TCCTCCTGCC TCTTATGGTG ATGCTTATTC TGTACGTAAA AAAGAAAAGA GTTGGGGATG GTTCAGTGCT TCGAACTATT AATAGCCAGG AAGAAGAAAC GAGCTGTCAT TATGATGGTG TGTGTGCTGG GCACCATTCC ATGTTGTCCA TATGATGATT GGAATATGAT GATGTCACAA TCAAGATGAT TTTTGCTATC CAACTCCATC TGTAATCCCA TTGTCTATGC A (49) INFORMATION FOR SEQ ID NO:48: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO
CTTCATCCTT
ATTGGTTATG
CATGGAAAAG
ACAGTGGTGG
GAATACAGTA
GTGCAAATTA
GTCATCCTCT
AACTTTGGAT
AAATGTCCAA
CTCTCTTTGC
ATTTTGAAAA
TTGGATTTTC
180 240 300 360 420 480 511 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: CTGCTTAGAA GAGTGGACCA G INFORMATION FOR SEQ ID NO:49: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: CTGTGCACCA GAAGATCTAC AC (51) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear WO 00/22131 PCT/US99/24065 -53- (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID CAAGGATGAA GGTGGTGTAG A 21 (52) INFORMATION FOR SEQ ID NO:51: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51: GTGTAGATCT TCTGGTGCAC AGG 23 (53) INFORMATION FOR SEQ ID NO:52: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: GCAATGCAGG TCATAGTGAG C 21 (54) INFORMATION FOR SEQ ID NO:53: SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: YES (iv) ANTI-SENSE: YES WO 00/22131 PCT/US99/24065 -54- (xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: TGGAGCATGG TGACGGGAAT GCAGAAG INFORMATION FOR SEQ ID NO:54: SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: GTGATGAGCA GGTCACTGAG CGCCAAG (56) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID GCAATGCAGG CGCTTAACAT TAC (57) INFORMATION FOR SEQ ID 
NO:56: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:56: TTGGGTTACA ATCTGAAGGG CA WO 00/22131 PCT/US99/24065 (58) INFORMATION FOR SEQ ID NO:57: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: ACTCCGTGTC CAGCAGGACT CTG 23 (58) INFORMATION FOR SEQ ID NO:58: SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: TGCGTGTTCC TGGACCCTCA CGTG 24 (58) INFORMATION FOR SEQ ID NO:59: SEQUENCE CHARACTERISTICS: LENGTH: 29 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: CAGGCCTTGG ATTTTAATGT CAGGGATGG 29 (61) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs WO 00/22131 PCT/US99/24065 -56- TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear MOLECULE TYPE: DNA (genomic) ANTI-SENSE: YES (ii) (iv) (xi) SEQUENCE DESCRIPTION: SEQ ID GGAGAGTCAG CTCTGAAAGA ATTCAGG (62) INFORMATION FOR SEQ ID NO:61: SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:61: TGATGTGATG CCAGATACTA ATAGCAC (63) INFORMATION FOR SEQ ID NO:62: SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:62: CCTGATTCAT TTAGGTGAGA TTGAGAC (64) INFORMATION FOR SEQ ID NO:63: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single 
TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:63: CCCAAGCTTC CCCAGGTGTA TTTGAT (65) INFORMATION FOR SEQ ID NO:64: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:64: GTTGGATCCA CATAATGCAT TTTCTC (66) INFORMATION FOR SEQ ID NO:65: SEQUENCE CHARACTERISTICS: LENGTH: 1080 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:65: ATGATTCTCA ACTCTTCTAC TGAAGATGGT ATTAAAAGAA TCCAAGATGA TTGTCCCAAA
GCTGGAAGGC
GTGGGAATAT
ACTGTGGCCA
TTGCCACTAT
TGTAAGATTG
TGTCTCAGCA
ACAATGCTTG
TTGCCAGCTA
GCTTTCCATT
ATAATTACAT
TTGGAAACAG
GTGTTTTTCT
GGGCTGTCTA
CTTCAGCCAG
TTGATCGATA
TAGCCAAAGT
TAATCCATCG
ATGAGTCCCA
ATTTGTCATG
CTTGGTGGTG
TTTGAATTTA
CACAGCTATG
CGTCAGTTTC
CCTGGCTATT
CACCTGCATC
AAATGTATTT
AAATTCAACC
ATTCCTACTT
ATAGTCATTT
GCACTGGCTG
GAATACCGCT
AACCTGTACG
GTTCACCCAA
ATCATTTGGC
TTCATTGAGA
CTTCCGATAG
TATACAGTAT
ACTTTTATAT
ACTTATGCTT
GGCCCTTTGG
CTAGTGTGTT
TGAAGTCCCG
TGCTGGCAGG
ACACCAATAT
GGCTGGGCCT
CATCTTTGTG
GAAGCTGAAG
TTTACTGACT
CAATTACCTA
TCTACTCACG
CCTTCGACGC
CTTGGCCAGT
TACAGTTTGT
GACCAAAAAT
120 180 240 300 360 420 480 540 600
ATACTGGGTT
GCCCTAAAGA
ATAATTATGG
TTTCTGGATG
GACACGGCCA
TTTTATGGCT
CCCCCAAAAG
CCCTCAGATA
TCCTGTTTCC
AGGCTTATGA
CAATTGTGCT
TATTGATTCA
TGCCTATCAC
TTCTGGGGAA
CCAAATCCCA
ATGTAAGCTC
TTTTCTGATC ATTCTTACAA AATTCAGAAG AACAAACCAA TTTCTTTTTC TTTTCCTGGA ACTAGGCATC ATACGTGACT CATTTGTATA GCTTATTTTA AAAATTTAAA AGATATTTTC CTCAAACCTT TCAACAAAAA ATCCACCAAG AAGCCTGCAC
GTTATACTCT
GAAATGATGA
TTCCCCACCA
GTAGAATTGC
ACAATTGCCT
TCCAGCTTCT
TGAGCACGCT
CATGTTTTGA
TATTTGGAAG
TATTTTTAAG
AATATTCACT
AGATATTGTG
GAATCCTCTT
AAAATATATT
TTCCTACCGC
GGTTGAGTGA
660 720 780 840 900 960 1020 1080 (67) INFORMATION FOR SEQ ID NO:66: SEQUENCE CHARACTERISTICS: LENGTH: 359 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:66: Met Ile Leu Asn Ser Ser Thr Giu Asp Gly Ile Lys Arg Ile 1 Asp 5 Al a Gin Asp Cys Pro Lys Gly Arg His Asn 25 Val Ile Phe Val Thr Leu Tyr Val Val Ile Ser Ile Ile Phe Val 40 Tyr Gly Ile Phe Gly Thr Met Ile Pro Asn Ser Leu Val Ala Ser Val Ile Tyr Phe Phe 55 Ala Met Lys Leu Lys Cys Val Leu Leu Leu Asn Leu 70 Val Leu Ala Asp Leu 75 Phe Leu Leu Thr Pro Leu Trp Ala Cys Tyr Thr Ala Gly Asn Tyr Tyr Ala Ser 115 Leu 100 Val1 Lys Ile Ala Ser 105 Leu Leu Thr Cys 120 Met Glu Tyr Arg Trp Pro Phe 90 Ala Ser Val Ser Phe Asn Leu 110 Leu Ser Ile Asp Arg Tyr Leu 125 Phe Ala Ile Val His Pro Met Lys Ser Arg Leu Arg Arg Thr Met Leu Val WO 00/22131 PCT/US99/24065 -59- 130 Ala Lys 145 Leu Pro Ile Thr Ile Gly Leu Ile 210 Ala Tyr 225 Ile Ile Gin Ile Asp Cys Cys Ile 290 Leu Gly 305 Pro Pro Leu Ser Ala Pro Val Thr Ala Ile Val Cys 180 Leu Gly 195 Ile Leu Glu Ile Met Ala Phe Thr 260 Arg Ile 275 Ala Tyr Lys Lys Lys Ala Tyr Arg 340 Cys Phe 355 Cys Ile 165 Ala Leu Thr Gin Ile 245 Phe Ala Phe Phe Lys 325 Pro Glu Ile 150 His Phe Thr Ser Lys 230 Val Leu Asp Asn Lys 310 Ser Ser Val 135 Ile Arg His Lys Tyr 215 Asn Leu Asp Ile Asn 295 Arg His Asp Glu Ile Asn Tyr Asn 200 Thr Lys Phe Val Val 280 Cys Tyr Ser Trp Val Glu 185 Ile Leu Pro Phe Leu 265 Asp Leu Phe Asn Leu Leu 155 Phe Phe 170 Ser Gin Leu Gly Ile Trp Arg Asn 235 Phe Phe 250 Ile Gin Thr Ala Asn Pro Leu Gin 315 Leu Ser 330 140 Ala Gly Ile Glu Asn Ser Phe Leu 205 Lys Ala 220 Asp Asp Ser Trp Leu Gly Met Pro 285 Leu Phe 300 Leu Leu Thr Lys Leu Asn Thr 190 Phe Leu Ile Ile Ile 270 Ile Tyr Lys Met Ala Thr 175 Leu Pro Lys Phe Pro 255 Ile Thr Gly Tyr Ser 335 Ser 160 Asn Pro Phe Lys Lys 240 His Arg Ile Phe Ile 320 Thr Asn Val Ser Ser Ser Thr Lys Lys Pro 345 350 (68) INFORMATION FOR SEQ ID NO:67: SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) WO 00/22131 PCT/US99/24065 (xi) 
SEQUENCE DESCRIPTION: SEQ ID NO:67: ACCATGGGCA GCCCCTGGAA CGGCAGC 27 (69) INFORMATION FOR SEQ ID NO:68: SEQUENCE CHARACTERISTICS: LENGTH: 39 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: AGAACCACCA CCAGCAGGAC GCGGACGGTC TGCCGGTGG 39 INFORMATION FOR SEQ ID NO:69: SEQUENCE CHARACTERISTICS: LENGTH: 39 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:69: GTCCGCGTCC TGCTGGTGGT GGTTCTGGCA TTTATAATT 39 (71) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 33 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: not relevant (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID CCTGGATCCT TATCCCATCG TCTTCACGTT AGC 33 (72) INFORMATION FOR SEQ ID NO:71: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single WO 00/22131 PCT/US99/24065 -61- TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:71: CTGGAATTCT CCTGCCAGCA TGGTGA 26 (73) INFORMATION FOR SEQ ID NO:72: SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: GCAGGATCCT ATATTGCGTG CTCTGTCCCC (74) INFORMATION FOR SEQ ID NO:73: SEQUENCE CHARACTERISTICS: LENGTH: 999 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:73: ATGGTGAACT CCACCCACCG TGGGATGCAC ACTTCTCTGC ACCTCTGGAA TACAGACTGC ACAGCAATGC CAGTGAGTCC CTTGGAAAAG GCTACTCTGA TACGAGCAAC TTTTTGTCTC TCCTGAGGTG TTTGTGACTC TGGGTGTCAT GAGAATATCT TAGTGATTGT GGCAATAGCC AAGAACAAGA ATCTGCATTC TTTTTCATCT GCAGCTTGGC TGTGGCTGAT ATGCTGGTGA GCGTTTCAAA ACCATTATCA TCACCCTATT AAACAGTACA GATACGGATG CACAGAGTTT ATTGATAATG TCATTGACTC GGTGATCTGT AGCTCCTTGC 
TTGCATCCAT
CCGCAGCAGT
TGGAGGGTGC
CAGCTTGTTG
ACCCATGTAC
TGGATCAGAA
CACAGTGAAT
TTGCAGCCTG
CTTTCAATTG
ATGACAGTTA
GGCATTTTGT
TTCTTCACCA
CTTCACATTA
ATGAAGGGAG
TTCTTCCTCC
ATGTCTCACT
ATTTATGCAC
CAGTGGACAG
AGCGGGTTGG
TCATCATTTA
TGCTGGCTCT
AGAGGATTGC
CGATTACCTT
ACTTAATATT
TTAACTTGTA
TCCGGAGTCA
GTACTTTACT ATCTTCTATG GATCAGCATA AGTTGTATCT CTCAGATAGT AGTGCTGTCA CATGGCTTCT CTCTATGTCC TGTCCTCCCC GGCACTGGTG GACCATCCTG ATTGGCGTCT CTACATCTCT TGTCCTCAGA TCTCATACTG ATCATGTGTA AGAACTGAGG AAAACCTTCA
CTCTCCAGTA
GGGCAGCTTG
TCATCTGCCT
ACATGTTCCT
CCATCCGCCA
TTGTTGTCTG
ATCCATATTG
ATTCAATCAT
AAGAGATCAT
CCATAACATT
CACGGTTTCA
CATCACCATG
GATGGCCAGG
AGGTGCCAAT
CTGGGCCCCA
TGTGTGCTTC
CGATCCTCTG
CTGTTGCTAT
480 540 600 660 720 780 840 900 960 CCCCTGGGAG GCCTTTGTGA CTTGTCTAGC AGATATTAA INFORMATION FOR SEQ ID NO:74: SEQUENCE CHARACTERISTICS: LENGTH: 332 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:74: Met Val Asn Ser Thr His Arg Gly Met His Thr Ser Leu His Asn Arg Ser Lys Gly Tyr Giu Val Phe Ser Tyr Arg Leu His Ser Asn 25 Tyr Glu Leu Trp is Ile Ser Asp Gly Gly Cys 40 Val Thr Leu Gly Val 55 Ala Ile Ala Lys Asn 70 Cys Ser Leu Ala Val Ala Ser Glu Gin Leu Phe Leu Leu Giu Val Phe Val1 Ser Leu Gly Val Ser Pro Asn Ile Leu Lys Asn Leu 75 His Ser Pro Met Tyr Phe Ile Ala Asp Thr Leu 105 Met Leu Val Ser Leu Asn Ser Thr 110 Val Ser Asp Thr Asn Gly Ser Giu Thr Ile Ile Ile 100 Asp Ala Gin Ser Phe Thr Val Asn Ile Asp Asn Vai Ile Asp Ser Val WO 00/22131 PCT/US99/24065 -63- 115 120 125 Ile Cys Ser Ser Leu Leu Ala Ser Ile Cys Ser Leu Leu Ser Ile Ala 130 135 140 Val Asp Arg Tyr Phe Thr Ile Phe Tyr Ala Leu Gin Tyr His Asn Ile 145 150 155 160 Met Thr Val Lys Arg Val Gly Ile Ser Ile Ser Cys Ile Trp Ala Ala 165 170 175 Cys Thr Val Ser Gly Ile Leu Phe Ile Ile Tyr Ser Asp Ser Ser Ala 180 185 190 Val Ile Ile Cys Leu Ile Thr Met Phe Phe Thr Met Leu Ala Leu Met 195 200 205 Ala Ser Leu Tyr Val His Met Phe Leu Met Ala Arg Leu His Ile Lys 210 215 220 Arg Ile Ala Val Leu Pro Gly Thr Gly Ala Ile Arg Gin Gly Ala Asn 225 230 235 240 Met Lys Gly Ala Ile Thr Leu Thr Ile Leu Ile Gly Val Phe Val Val 245 250 255 Cys Trp Ala Pro Phe Phe Leu His Leu Ile Phe Tyr Ile Ser Cys Pro 260 265 270 Gin Asn Pro Tyr Cys Val Cys Phe Met Ser His Phe Asn Leu Tyr Leu 275 280 285 Ile Leu Ile Met Cys Asn Ser Ile Ile Asp Pro Leu Ile Tyr Ala Leu 290 295 300 Arg Ser Gin Glu Leu Arg Lys Thr Phe Lys Glu Ile Ile Cys Cys Tyr 305 310 315 320 Pro Leu Gly Gly Leu Cys Asp Leu Ser Ser Arg Tyr 325 330 (76) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 32 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID CCGAAGCTTC GAGCTGAGTA AGGCGGCGGG CT WO 00/22131 PCT/US99/24065 -64- (77) INFORMATION FOR SEQ ID NO:76: SEQUENCE CHARACTERISTICS: LENGTH: 31 base 
pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:76: GTGGAATTCA TTTGCCCTGC CTCAACCCCC A (78) INFORMATION FOR SEQ ID NO:77: SEQUENCE CHARACTERISTICS: LENGTH: 1344 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:77: ATGGAGCTGC TAAAGCTGAA CCGGAGCGTG CAGGGAACCG GACCCGGGCC GGGGGCTTCC
CTGTGCCGCC
CCCCCTCGCA
TACGCAGTGA
CTGAGCCGCC
CTCCTGCTGG
ATCTTTGGCA
TCCACGCTAA
CAGGCACGAG
CTGTCCGGAC
CGTGTGCTGC
CTGCTGCTTC
ATCTCTCGCG
CGGGGGCGCC
TTCGCGGAGC
TCTTCCTGAT
GCCTGAGGAC
CTGTGGCTTG
CCGTCATCTG
GCCTCGTGGC
TGTGGCAGAC
TACTCATGGT
AGTGCGTGCA
TGCTCTTGTT
AGCTCTACTT
TCTCCTCAAC
CGGGACACGA
GAGCGTTGGA
TGTCACCAAT
CATGCCCTTC
CAAGGCGGTT
CATCGCACTG
GCGCTCCCAC
GCCCTACCCC
TCGCTGGCCC
CTTCATCCCA
AGGGCTTCGC
AGCAGCAGTG
GAATTGGAGC
GGAAATATGC
GCCTTCCTCC
ACCCTCCTGC
TCCTACCTCA
GAGCGATATA
GCGGCTCGCG
GTGTACACTG
AGTGCGCGGG
GGTGTGGTTA
TTTGACGGCG
TGGGCAACCT
TGGCCATTAG
TCATCATCGT
TCTCACTGGC
CCAATCTCAT
TGGGGGTGTC
GCGCCATCTG
TGATTGTAGC
TCGTGCAACC
TCCGCCAGAC
TGGCCGTGGC
ACAGTGACAG
CAGCTGCGAG
AATCACTCTT
GGTCCTGGGA
AGTCAGCGAC
GGGCACATTC
TGTGAGTGTG
CCGACCACTG
CACGTGGCTG
AGTGGGGCCT
CTGGTCCGTA
CTACGGGCTT
CGACAGCCAA
120 180 240 300 360 420 480 540 600 660 720 780 840 AGCAGGGTCC GAAACCAAGG CGGGCTGCCA GGGGCTGTTC ACCAGAACGG GCGTTGCCGG CCTGAGACTG GCGCGGTTGG CAAAGACAGC GATGGCTGCT CGGCCTGCCC TGGAGCTGAC GGCGCTGACG GCTCCTGGGC ACCCAGGCCA AGCTGCTGGC TAAGAAGCGC GTGGTGCGAA CTTTTTTTTC TGTGTTGGTT GCCAGTTTAT AGTGCCAACA CCGGGTGCAC ACCGAGCACT CTCGGGTGCT CCTATCTCCT GCCTCGGCCT GTGTCAACCC CCTGGTCTAC TGCTTCATGC TGCCTGGAAA CTTGCGCTCG CTGCTGCCCC CGGCCTCCAC CCCGATGAGG ACCCTCCCAC TCCCTCCATT GCTTCGCTGT ATCAGCACAC TGGGCCCTGG CTGA (79) INFORMATION FOR SEQ ID NO:78: SEQUENCE CHARACTERISTICS: LENGTH: 447 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein
ACGTGCAACT
CGGGATCCGG
TGTTGCTGGT
CGTGGCGCGC
TCATTCACTT
ACCGTCGCTT
GAGCTCGCCC
CCAGGCTTAG
TCCACGTTCC
CTCCCGGCCC
GATCGTTGTG
CTTTGATGGC
GCTGAGCTAC
TCGCCAGGCC
CAGGGCTCTT
CTACACCACC
900 960 1020 1080 1140 1.200 1260 1320 1344 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:78: Met Giu Leu Leu Lys Leu Asn Arg Ser Val Gin Gly Thr Gly Ala Pro Gly Pro Gly Ala Ser Leu Cys Arg Pro Gly Pro Pro Leu Leu Ser Val Gly Thr Arg Giu s0 Phe Leu Met Asn. Leu Ser Cys Giu 40 Ile Pro Arg Ile Arg Tyr Asn Ser Ser Gly Ala Giy Ala Val Ile Leu Glu Leu Ser Val Giy 70 Arg Leu Arg Arg Ile Thr Asn Met Leu Leu Ile 75 Al a Val Val Leu Gly Ser Arg Ala Vai Ser Asp 100 Leu Pro Asn Leu 115 Leu Leu Met Giy Thr Val Thr Leu Ala Val 105 Thr Phe Ile 120 Asn 90 Phe Leu Leu Ser Leu Ala Cys Met Pro Phe Thr Leu 110 Ile Cys Lys Phe Gly Thr WO 00/22131 PCT/US99/24065 -66- Ala Val Ser Tyr Leu Met Gly Val Ser Val Ser Val Ser Thr Leu Ser 130 135 140 Leu Val Ala Ile Ala Leu Glu Arg Tyr Ser Ala Ile Cys Arg Pro Leu 145 150 155 160 Gln Ala Arg Val Trp Gln Thr Arg Ser His Ala Ala Arg Val Ile Val 165 170 175 Ala Thr Trp Leu Leu Ser Gly Leu Leu Met Val Pro Tyr Pro Val Tyr 180 185 190 Thr Val Val Gin Pro Val Gly Pro Arg Val Leu Gin Cys Val His Arg 195 200 205 Trp Pro Ser Ala Arg Val Arg Gin Thr Trp Ser Val Leu Leu Leu Leu 210 215 220 Leu Leu Phe Phe Ile Pro Gly Val Val Met Ala Val Ala Tyr Gly Leu 225 230 235 240 Ile Ser Arg Glu Leu Tyr Leu Gly Leu Arg Phe Asp Gly Asp Ser Asp 245 250 255 Ser Asp Ser Gin Ser Arg Val Arg Asn Gin Gly Gly Leu Pro Gly Ala 260 265 270 Val His Gin Asn Gly Arg Cys Arg Pro Glu Thr Gly Ala Val Gly Lys 275 280 285 Asp Ser Asp Gly Cys Tyr Val Gin Leu Pro Arg Ser Arg Pro Ala Leu 290 295 300 Glu Leu Thr Ala Leu Thr Ala Pro Gly Pro Gly Ser Gly Ser Arg Pro 305 310 315 320 Thr Gin Ala Lys Leu Leu Ala Lys Lys Arg Val Val Arg Met Leu Leu 325 330 335 Val Ile Val Val Leu Phe Phe Leu Cys Trp Leu Pro Val Tyr Ser Ala 340 345 350 Asn Thr Trp Arg Ala Phe Asp Gly Pro Gly Ala His Arg Ala Leu Ser 355 360 365 Val Ala Pro Ile Ser Phe Ile His Leu Leu Ser Tyr Ala Ser Ala Cys 370 375 380 Val Asn Pro Leu Val Tyr Cys Phe Met His Arg Arg Phe Arg Gin Ala 385 390 395 400 Cys Leu Glu Thr Cys Ala Arg Cys Cys Pro Arg 
Pro Pro Arg Ala Arg 405 410 415 Pro Arg Ala Leu Pro Asp Glu Asp Pro Pro Thr Pro Ser Ile Ala Ser WO 00/22131 PCT/US99/24065 -67- 420 425 430 Leu Ser Arg Leu Ser Tyr Thr Thr Ile Ser Thr Leu Gly Pro Gly 435 440 445 INFORMATION FOR SEQ ID NO:79: SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:79: TGCAAGCTTA AAAAGGAAAA AATGAACAGC (81) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID TAAGGATCCC TTCCCTTCAA AACATCCTTG (82) INFORMATION FOR SEQ ID NO:81: SEQUENCE CHARACTERISTICS: LENGTH: 1014 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:81: ATGAACAGCA CATGTATTGA AGAACAGCAT GACCTGGATC ACTATTTGTT TCCCATTGTT TACATCTTTG TGATTATAGT CAGCATTCCA GCCAATATTG GATCTCTGTG TGTGTCTTTC 120 CTGCAACCCA AGAAGGAAAG TGAACTAGGA ATTTACCTCT TCAGTTTGTC ACTATCAGAT 180 TTACTCTATG CATTAACTCT CCCTTTATGG ATTGATTATA CTTGGAATAA AGACAACTGG 240 WO 00/22131 WO 00/213 1PCTIUS99/24065 68 ACTTTCTCTC CTGCCTTGTG CAAAGGGAGT GCTTTTCTCA TGTACATGAA GTTTTA.CAGC
AGCACAGCAT
AAGTTTTTTT
TTGGAAACCA
GATGCCGAA
ATCAACCTCA
ATCTGTAACC
AAGAAGAGAA
CCCTTTCATG
CACAGCAATT
TTAAATTGTG
ATGTGGAATA
TCCTCACCTG
TCCTAAGGAC
TCTTCAATGC
AGTCTAATTT
ACTTGTTCAG
GGAAAGTCTA
TCATAAAACT
TGATGTTGCT
CTGGGAAGCG
TTGCTGATCC
TATTAAAATT
CATTGCCGTT
AAGAAGAATT
TGTCATGTTG
TACTTTATGC
GACGTGTACA
CCAAGCTGTG
ACTTGTCAGC
GATTCGCTGC
AACTTACACA
AATTCTGTAC
CTGCACTGGG
GATCGGTATT
GCACTCATGG
TGGGAAGATG
TATGACAAAT
GGCTATGCAA
CGGCACAATA
ATCACAGTTA
ATTTTAGAGC
ATGTATAGAA
TGTTTTGTTA
AGOT~gTAATA
TGGCTGTTGT
TCAGCCTGTC
AAACAGTTGT
ACCCTTTAGA
TA.CCTTTGGT
AAGCCACGGA
CTTTTGTCTT
ATGCTGTGAA
TCACGGTTGC
CCGAAACAGG
CATCACAAAG
CTACCCTTTG
CATCTGGATA
TGAATATTGC
GAAATGGCAA
CACCATCCTG
AAACAAGGAA
ATGCTTTACT
CTTCGAAGAC
ATTAACAAGT
AAGATATGAT
ACAAAGAAAA
300 360 420 480 540 600 660 720 780 840 900 960 CGCATACTTT CTGTGTCTAC AAAAGATACT ATGGAATTAG AGGTCCTTGA GTAG (83) INFORMATION FOR SEQ ID NO:82: SEQUENCE CHARACTERISTICS: LENGTH: 337 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein 1014 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:82: Met Asn Ser Thr Cys Ile Giu Glu Gin His Asp 1 5 10 Phe Pro Ile Val Tyr Ile Phe Val Ile Ile Val 25 Ile Gly Ser Leu Cys Val Ser Phe Leu Gin Pro 40 Leu Gly Ile Tyr Leu Phe Ser Leu Ser Leu Ser 55 Leu*Thr Leu Pro Leu Trp Ile Asp Tyr Thr Trp 70 75 Thr Phe Ser Pro Ala Leu Cys Lys Giy Ser Ala Leu Asp His Tyr Leu Ser Ile Lys Lys Asp Leu Pro Ala Asn Glu Ser Glu Leu Tyr Ala Asn Lys Asp Asn Trp Met Phe Leu Met Tyr WO 00/22131 WO 0022131PCT[US99/24065 69 Lys Tyr Arg Phe 145 Asp Giu Ala Ala Ile 225 Pro Asn Arg Leu Leu 305 Phe Leu Ile 130 Asn Al a Lys Ile Val 210 Lys Phe Phe Ile Tyr 290 Lys Tyr Al a 115 Al a Al a Glu Trp Pro 195 Arg Leu His Glu Thr 275 Cys Phe Ser 100 Val Leu Val Lys Gin 180 Leu His Leu Val Asp 260 Val Phe Cys Ser Val Met Met Ser 165 Ile Val Asn Val Met 245 His Ala Val Thr Thr Ala Tyr Pro Val Ser 135 Leu Trp 150 Asn Phe Asn Leu Thr Ile Lys Ala 215 Ser Ile 230 Leu Leu Ser Asn Leu Thr Thr Glu 295 Gly Arg 310 Phe Leu 120 Leu Glu Thr Asn Leu 200 Thr Thr Ile Ser Ser 280 Thr Cys 90 Leu Thr 105 Lys Phe Ser Ile Asp Glu Leu Cys 170 Leu Phe 185 Ile Cys Giu Asn Val Thr Arg Cys 250 Gly Lys 265 Leu Asn Gly Arg Asn Thr Asp Arg Cys Ile Ala Val 110 Phe Phe Trp Ile 140 Thr Val 155 Tyr Asp Arg Thr Asn Arg Lys Giu 220 Phe Val 235 Ile Leu Arg Thr Cys Val Tyr Asp 300 Ser Gln 315 Leu 125 Leu Val Lys Cys Lys 205 Lys Leu Glu Tyr Ala 285 Met Arg Arg Glu Glu Tyr Thr 190 Val Lys Cys His Thr 270 Asp Trp Gin Thr Thr Tyr Pro 175 Gly Tyr Arg Phe Al a 255 Met Pro Asn Arg Arg Ile Cys 160 Leu Tyr Gin Ile Thr 240 Val Tyr Ile Ile Lys 320 Arg Ile Leu Ser Val Ser Thr Lys Asp Thr Met Glu Leu Giu Val Leu 325 330 335 Glu (84) INFORMATION FOR SEQ ID NO:83: SEQUENCE CHARACTERISTICS: LENGTH: 40 base pairs TYPE: nucieic acid WO 00/22131 PCT/US99/24065 STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:83: CAGGAAGAAG AAACGAGCTG TCATTATGAT GGTGACAGTG 
INFORMATION FOR SEQ ID NO:84: SEQUENCE CHARACTERISTICS: LENGTH: 40 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:84: CACTGTCACC ATCATAATGA CAGCTCGTTT CTTCTTCCTG (86) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID GGCCACCGGC AGACCAAACG CGTCCTGCTG (87) INFORMATION FOR SEQ ID NO:86: SEQUENCE CHARACTERISTICS: LENGTH: 31 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:86: WO 00/22131 PCT/US99/24065 -71- CTCCTTCGGT CCTCCTATCG TTGTCAGAAG T 31 (88) INFORMATION FOR SEQ ID NO:87: SEQUENCE CHARACTERISTICS: LENGTH: 37 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:87: GGAAAAGAAG AGAATCAAAA AACTACTTGT CAGCATC 37 (89) INFORMATION FOR SEQ ID NO:88: SEQUENCE CHARACTERISTICS: LENGTH: 31 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:88: CTCCTTCGGT CCTCCTATCG TTGTCAGAAG T 31 INFORMATION FOR SEQ ID NO:89: SEQUENCE CHARACTERISTICS: LENGTH: 1080 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:89: ATGATTCTCA ACTCTTCTAC TGAAGATGGT ATTAAAAGAA TCCAAGATGA TTGTCCCAAA GCTGGAAGGC ATAATTACAT ATTTGTCATG ATTCCTACTT TATACAGTAT CATCTTTGTG 120 GTGGGAATAT TTGGAAACAG CTTGGTGGTG ATAGTCATTT ACTTTTATAT GAAGCTGAAG 180 ACTGTGGCCA GTGTTTTTCT TTTGAATTTA GCACTGGCTG ACTTATGCTT TTTACTGACT 240 TTGCCACTAT GGGCTGTCTA CACAGCTATG GAATACCGCT GGCCCTTTGG CAATTACCTA 300 WO 00/22131 PTU9/46 PCTIUS99/24065 72
TGTAAGATTG
TGTCTCAGCA
ACAATGCTTG
TTGCCAGCTA
GCTTTCCATT
ATACTGGGTT
GCCCTAAAGA
ATAATTATGG
TTTCTGGATG
GACACGGCCA
TTTTATGGCT
CCCCCAAAAG
CTTCAGCCAG
TTGATCGATA
TAGCCAAAGT
TAATCCATCG
ATGAGTCCCA
TCCTGTTTCC
AGGCTTATGA
CAATTGTGCT
TATTGATTCA
TGCCTATCAC
TTCTGGGGAA
CCAAATCCCA
CGTCAGTTTC
CCTGGCTATT
CACCTGCATC
AAATGTATTT
AAATTCAACC
TTTTCTGATC
AATTCAGAAG
TTTCTTTTTC
ACTAGGCATC
CATTTGTATA
AAAATTTAAA
CTCAAACCTT
AACCTGTACG
GTTCACCCAA
ATCATTTGGC
TTCATTGAGA
CTTCCGATAG
ATTCTTACAA
AACAAACCAA
TTTTCCTGGA
ATACGTGACT
GCTTATTTTA
AGATATTTTC
TCAACAAAAA
CTAGTGTGTT
TGAAGTCCCG
TGCTGGCAGG
ACACCAATAT
GGCTGGGCCT
GTTATACTCT
GAAATGATGA
TTCCCCACCA
GTAGAATTGC
ACAATTGCCT
TCCAGCTTCT
TGAGCACGCT
TCTACTCACG
CCTTCGACGC
CTTGGCCAGT
TACAGTTTGT
GACCAAAAAT
TATTTGGAAG
TATTAAAAAG
AATATTCACT
AGATATTGTG
GAATCCTCTT
AAAATATATT
TTCCTACCGC
360 420 480 540 600 660 720 780 840 900 960 1020 1080 CCCTCAGATA ATGTAAGCTC ATCCACCAAG AAGCCTGCAC CATGTTTTGA GGTTGAGTGA (91) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 359 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Met Ile Leu Asn Ser Ser Thr Glu Asp Gly 1 5 10 Asp Cys Pro Lys Ala Gly Arg His Asn Tyr 25 Thr Leu Tyr Ser Ile Ile Phe Val Val Gly 40 Val Val Ile Val Ile Tyr Phe Tyr Met Lys 55 Val Phe Leu Leu Asn Leu Ala Leu Ala Asp Ile Lys Arg Ile Gin Asp Ile Phe Val Ile Phe Gly Leu Lys Thr Met Ile Pro Asn Ser Leu Val Ala Ser Leu Cys Phe Leu Leu Leu 75 Glu Thr Phe Pro Leu Trp, Ala Val Tyr Thr Ala Met TrAgTpPo Tyr Arg Trp, Pro WO 00/22131 WO 0022131PCT/US99/24065 73 Gly Tyr Ala Ala 145 Leu Ile Ile Leu Ala 225 Ile Gin Asp Cys Leu 305 Pro Leu Ala Asn Aia Ile 130 Lys Pro Thr Giy Ile 210 Tyr Ile Ile Cys Ile 290 Gly Pro Ser Pro Tyr Ser 115 Val Val Al a Val Leu 195 Ile Giu Met Phe Arg 275 Ala Lys Lys Tyr Cys 355 Leu 100 Val His Thr Ile Cys 180 Gly Leu Ile Ala Thr 260 Ile Tyr Lys Ala Arg 340 Phe Cys Phe Pro Cys Ile 165 Ala Leu Thr Gin Ile 245 Phe Ala Phe Phe Lys 325 Pro Giu Lys Leu Met Ile 150 His Phe Thr Ser Lys 230 Val Leu Asp Asn Lys 310 Ser Ser Val Ile Leu Lys 135 Ile Arg His Lys Tyr 215 Asn Leu Asp Ile Asn 295 Arg His Asp Glu Ala Thr 120 Ser Ile Asn Tyr Asn 200 Thr Lys Phe Vai Val 280 Cys Tyr Ser Asn Ser 105 Cys Arg Trp Vai Giu 185 Ile Leu Pro Phe Leu 265 Asp Leu Phe Asn Val 34S Al a Leu Leu Leu Phe 170 Ser Leu Ile Arg Phe 250 Ile Thr Asn Leu Leu 330 Ser Ser Ser Arg Leu 155 Phe Gin Gly Trp Asn 235 Phe Gin Ala Pro Gin 315 Ser Ser Val Ile Arg 140 Ala Ile Asn Phe Lys 220 Asp Ser Leu Met Leu 300 Leu Thr Ser Ser Asp 125 Thr Gly Glu Ser Leu 205 Ala Asp Trp Gly Pro 285 Phe Leu Lys Thr Phe 110 Arg Met Leu Asn Thr 190 Phe Leu Ile Ile Ile 270 Ile Tyr Lys Met Lys 350 Asn Tyr Leu Ala Thr 175 Leu Pro Lys Lys Pro 255 Ile Thr Gly Tyr Ser 335 Lys Leu Leu Val Ser 160 Asn Pro Phe Lys Lys 240 His Arg Ile Phe Ile 320 Thr Pro (92) INFORMAATION FOR SEQ ID NO:91: WO 00/22131 PCT/US99/24065 -74- SEQUENCE CHARACTERISTICS: LENGTH: 35 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA 
(genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:91: CCAAGAAATG ATGATATTAA AAAGATAATT ATGGC (93) INFORMATION FOR SEQ ID NO:92: SEQUENCE CHARACTERISTICS: LENGTH: 31 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:92: CTCCTTCGGT CCTCCTATCG TTGTCAGAAG T (94) INFORMATION FOR SEQ ID NO:93: SEQUENCE CHARACTERISTICS: LENGTH: 1080 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:93: ATGATTCTCA ACTCTTCTAC TGAAGATGGT ATTAAAAGAA TCCAAGATGA GCTGGAAGGC ATAATTACAT ATTTGTCATG ATTCCTACTT TATACAGTAT GTGGGAATAT TTGGAAACAG CTTGGTGGTG ATAGTCATTT ACTTTTATAT ACTGTGGCCA GTGTTTTTCT TTTGAATTTA GCACTGGCTG ACTTATGCTT TTGCCACTAT GGGCTGTCTA CACAGCTATG GAATACCGCT GGCCCTTTGG TGTAAGATTG CTTCAGCCAG CGTCAGTTTC GCCCTGTACG CTAGTGTGTT TGTCTCAGCA TTGATCGATA CCTGGCTATT GTTCACCCAA TGAAGTCCCG
TTGTCCCAAA
CATCTTTGTG
GAAGCTGAAG
TTTACTGACT
CAATTACCTA
TCTACTCACG
CCTTCGACGC
120 180 240 300 360 420
ACAATGCTTG
TTGCCAGCTA
GCTTTCCATT
ATACTGGGTT
GCCCTAAAGA
ATAATTATGG
TTTCTGGATG
GACACGGCCA
TTTTATGGCT
CCCCCAAAAG
CCCTCAGATA
TAGCCAAAGT
TAATCCATCG
ATGAGTCCCA
TCCTGTTTCC
AGGCTTATGA
CAATTGTGCT
TATTGATTCA
TGCCTATCAC
TTCTGGGGAA
CCAAATCCCA
ATGTAAGCTC
CACCTGCATC
AAATGTATTT
AAATTCAACC
TTTTCTGATC
AATTCAGAAG
TTTCTTTTTC
ACTAGGCATC
CATTTGTATA
AAAATTTAAA
CTCAAACCTT
ATCCACCAAG
ATCATTTGGC
TTCATTGAGA
CTTCCGATAG
ATTCTTACAA
AACAAACCAA
TTTTCCTGGA
ATACGTGACT
GCTTATTTTA
AGATATTTTC
TCAACAAAAA
TGCTGGCAGG
ACACCAATAT
GGCTGGGCCT
GTTATACTCT
GAAATGATGA
TTCCCCACCA
GTAGAATTGC
ACAATTGCCT
TCCAGCTTCT
TGAGCACGCT
CTTGGCCAGT
TACAGTTTGT
GACCAAAAAT
TATTTGGAAG
TATTTTTAAG
AATATTCACT
AGATATTGTG
GAATCCTCTT
AAAATATATT
TTCCTACCGC
480 540 600 660 720 780 840 900 960 1020 1080 AAGCCTGCAC CATGTTTTGA GGTTGAGTGA INFORMATION FOR SEQ ID NO:94: SEQUENCE CHARACTERISTICS: LENGTH: 359 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:94: Met Ile Leu Asn Ser Ser Thr Glu Asp Gly Ile Lys Arg Ile 1 Asp Cys Pro Lys Thr Leu Tyr Ser Val Val Ile Val 10 Tyr Gin Asp Ala Gly Arg His Asn 25 Val Ile Phe Val Ile Ile Phe Ile Tyr Phe 55 Val 40 Tyr Gly Ile Phe Gly Thr Met Ile Pro Asn Ser Leu Val Ala Ser Met Lys Leu Val Phe Lys Cys Leu Leu Asn Leu Leu 70 Val Aia Leu Ala Asp Leu 75 Glu Phe Leu Leu Thr Pro Leu Trp Tyr Thr Aia Met Al a Tyr Arg Trp Pro Phe Aia Leu Gly Asn Tyr Leu Lys Ile Ala Ser Ser Val Ser Phe WO 00/22131 PCT/US99/24065 -76- 100 105 110 Tyr Ala Ser Val Phe Leu Leu Thr Cys Leu Ser Ile Asp Arg Tyr Leu 115 120 125 Ala Ile Val His Pro Met Lys Ser Arg Leu Arg Arg Thr Met Leu Val 130 135 140 Ala Lys Val Thr Cys Ile Ile Ile Trp Leu Leu Ala Gly Leu Ala Ser 145 150 155 160 Leu Pro Ala Ile Ile His Arg Asn Val Phe Phe Ile Glu Asn Thr Asn 165 170 175 Ile Thr Val Cys Ala Phe His Tyr Glu Ser Gin Asn Ser Thr Leu Pro 180 185 190 Ile Gly Leu Gly Leu Thr Lys Asn Ile Leu Gly Phe Leu Phe Pro Phe 195 200 205 Leu Ile Ile Leu Thr Ser Tyr Thr Leu Ile Trp Lys Ala Leu Lys Lys 210 215 220 Ala Tyr Glu Ile Gin Lys Asn Lys Pro Arg Asn Asp Asp Ile Phe Lys 225 230 235 240 Ile Ile Met Ala Ile Val Leu Phe Phe Phe Phe Ser Trp Ile Pro His 245 250 255 Gin Ile Phe Thr Phe Leu Asp Val Leu Ile Gin Leu Gly Ile Ile Arg 260 265 270 Asp Cys Arg Ile Ala Asp Ile Val Asp Thr Ala Met Pro Ile Thr Ile 275 280 285 Cys Ile Ala Tyr Phe Asn Asn Cys Leu Asn Pro Leu Phe Tyr Gly Phe 290 295 300 Leu Gly Lys Lys Phe Lys Arg Tyr Phe Leu Gin Leu Leu Lys Tyr Ile 305 310 315 320 Pro Pro Lys Ala Lys Ser His Ser Asn Leu Ser Thr Lys Met Ser Thr 325 330 335 Leu Ser Tyr Arg Pro Ser Asp Asn Val Ser Ser Ser Thr Lys Lys Pro 340 345 350 Ala Pro Cys Phe Glu Val Glu 355 (97) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid WO 00/22131 PCT/US99/24065 -77- STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO 
(xi) SEQUENCE DESCRIPTION: SEQ ID CCCAAGCTTC CCCAGGTGTA TTTGAT 26 (97) INFORMATION FOR SEQ ID NO:96: SEQUENCE CHARACTERISTICS: LENGTH: 29 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:96: CCTGCAGGCG AAACTGACTC TGGCTGAAG 29 (98) INFORMATION FOR SEQ ID NO:97: SEQUENCE CHARACTERISTICS: LENGTH: 42 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:97: CTGTACGCTA GTGTGTTTCT ACTCACGTGT CTCAGCATTG AT 42 (99) INFORMATION FOR SEQ ID NO:98: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) WO 00/22131 PCT/US99/24065 -78- (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:98: GTTGGATCCA CATAATGCAT TTTCTC (100) INFORMATION FOR SEQ ID NO:99: SEQUENCE CHARACTERISTICS: LENGTH: 1080 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:99: ATGATTCTCA ACTCTTCTAC TGAAGATGGT ATTAAAAGAA TCCAAGATGA TTGTCCCAAA GCTGGAAGGC ATAATTACAT GTGGGAATAT TTGGAAACAG ACTGTGGCCA GTGTTTTTCT TTGCCACTAT GGGCTGTCTA TGTAAGATTG CTTCAGCCAG TGTCTCAGCA TTGATCGATA ACAATGCTTG TAGCCAAAGT TTGCCAGCTA TAATCCATCG GCTTTCCATT ATGAGTCCCA ATACTGGGTT TCCTGTTTCC CACTTACTGA AGACGAATAG ATAATTATGG CAATTGTGCT TTTCTGGATG TATTGATTCA GACACGGCCA TGCCTATCAC TTTTATGGCT TTCTGGGGAA CCCCCAAAAG CCAAATCCCA CCCTCAGATA ATGTAAGCTC
ATTTGTCATG
CTTGGTGGTG
TTTGAATTTA
CACAGCTATG
CGTCAGTTTC
CCTGGCTATT
CACCTGCATC
AAATGTATTT
AAATTCAACC
TTTTCTGATC
CTATGGGAAG
TTTCTTTTTC
ACTAGGCATC
CATTTGTATA
AAAATTTAAA
CTCAAACCTT
ATCCACCAAG
ATTCCTACTT
ATAGTCATTT
GCACTGGCTG
GAATACCGCT
AACCTGTACG
GTTCACCCAA
ATCATTTGGC
TTCATTGAGA
CTTCCGATAG
ATTCTTACAA
AACAGGATAA
TTTTCCTGGA
ATACGTGACT
GCTTATTTTA
AGATATTTTC
TCAACAAAAA
AAGCCTGCAC
TATACAGTAT
ACTTTTATAT
ACTTATGCTT
GGCCCTTTGG
CTAGTGTGTT
TGAAGTCCCG
TGCTGGCAGG
ACACCAATAT
GGCTGGGCCT
GTTATTTTGG
CCCGTGACCA
TTCCCCACCA
GTAGAATTGC
ACATTTGCCT
TCCAGCTTCT
TGAGCACGCT
CATGTTTTGA
CATCTTTGTG
GAAGCTGAAG
TTTACTGACT
CAATTACCTA
TCTACTCACG
CCTTCGACGC
CTTGGCCAGT
TACAGTTTGT
GACCAAAAAT
AATTCGAAAA
AGTTAAGAAG
AATATTCACT
AGATATTGTG
GAATCCTCTT
AAAATATATT
TTCCTACCGC
GGTTGAGTGA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 WO 00/22131 WO 0022131PCT/US99/24065 79 (101) INFORMATION FOR SEQ ID NO:l00: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 359 amino acids TYPE: amino acid STRANiDEDNESS: TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:100: Met Ile Leu Asn Ser Ser Thr Glu Asp Gly Ile Lys Arg Ile Gin Asp 1 Asp Thr Vai Val Leu Gly Tyr Ala Ala 145 Leu Ile Ile Cys Leu Val Phe Pro Asn Ala Ile 130 Lys Pro Thr Gly Pro Tyr Ile Leu Leu Tyr Ser 115 Val Val Al a Val Leu 195 Lys Ser Val Leu Trp Leu 100 Val His Thr Ile Cys 180 Gly 5 Ala Ile Ile Asn Al a Cys Phe Pro Cys I le 165 Al a Leu Gly Ile Tyr Leu 70 Val Lys Leu Met Ile 150 His Phe Thr Arg Phe Phe 55 Ala Tyr Ile Leu Lys 135 Ile Arg His Lys His Val 40 Tyr Leu Thr Al a Thr 120 Ser Ile Asn Tyr Asn 200 Asn 25 Val Met Al a Al a Ser 105 Cys Arg Trp Val Glu 185 Ile 10 Tyr Gly Lys Asp Met 90 Ala Leu Leu Leu Phe 170 Ser Leu Ile Ile Leu Leu 75 Glu Ser Ser Arg Leu 1S5 Phe Gin Gly Phe Phe Lys Cys Tyr Val Ile Arg 140 Ala Ile Asn Phe Val Gly Thr Phe Arg Ser Asp 125 Thr Gly Glu Ser Leu 205 Met Asn Val Leu Trp Phe 110 Arg Met Leu Asn Thr 190 Phe Ile Ser Ala Leu Pro Asn Tyr Leu Al a Thr 175 Leu Pro Pro Leu Ser Thr Phe Leu Leu Val Ser 160 Asn Pro Phe Leu Ile Ile Leu Thr Ser Tyr Phe Gly Ile Arg Lys His Leu Leu Lys 210 215 220 WO 00/22131 PCT/US99/24065 Asn Arg Thr Asn Ser Tyr Gly Lys 225 230 Ile Thr Arg Asp Gin Val Lys Lys 240 235 Ile Gin Asp Cys Leu 305 Pro Leu Ala Ile Ile Cys Ile 290 Gly Pro Ser Pro Met Phe Arg 275 Ala Lys Lys Tyr Cys Ala Thr 260 Ile Tyr Lys Ala Arg 340 Phe Ile 245 Phe Ala Phe Phe Lys 325 Pro Glu Leu Asp Ile Asn 295 Arg His Asp Glu Phe Leu 265 Asp Leu Phe Asn Val 345 Phe 250 Ile Thr Asn Leu Leu 330 Ser Phe Gin Ala Pro Gin 315 Ser Ser Trp Gly Pro 285 Phe Leu Lys Thr Pro 255 Ile Thr Gly Tyr Ser 335 Lys His Arg Ile Phe Ile 320 Thr Pro 355 (102) INFORMATION FOR SEQ ID NO:101: SEQUENCE CHARACTERISTICS: LENGTH: 37 base pairs TYPE: nucleic acid 
STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:101: TCCGAATTCC AAAATAACTT GTAAGAATGA TCAGAAA (103) INFORMATION FOR SEQ ID NO:102: SEQUENCE CHARACTERISTICS: LENGTH: 33 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:102: AGATCTTAAG AAGATAATTA TGGCAATTGT GCT (104) INFORMATION FOR SEQ ID NO:103: SEQUENCE CHARACTERISTICS: LENGTH: 62 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:103: AATTCGAAAA CACTTACTGA AGACGAATAG CTATGGGAAG AACAGGATAA CCCGTGACCA
AG
(105) INFORMATION FOR SEQ ID NO:104: SEQUENCE CHARACTERISTICS: LENGTH: 62 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:104: TTAACTTGGT CACGGGTTAT CCTGTTCTTC CCATAGCTAT TCGTCTTCAG TAAGTGTTTT
CG
(106) INFORMATION FOR SEQ ID NO:105: SEQUENCE CHARACTERISTICS: LENGTH: 1083 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:105: ATGATTCTCA ACTCTTCTAC TGAAGATGGT ATTAAAAGAA TCCAAGATGA GCTGGAAGGC ATAATTACAT GTGGGAATAT TTGGAAACAG ACTGTGGCCA GTGTTTTTCT TTGCCACTAT GGGCTGTCTA TGTAAGATTG CTTCAGCCAG TGTCTCAGCA TTGATCGATA ACAATGCTTG TAGCCAAAGT TTGCCAGCTA TAATCCATCG GCTTTCCATT ATGAGTCCCA ATACTGGGTT TCCTGTTTCC GCCCTAAAGA AGGCTTATGA ATAATTATGG CAGCAATTGT ACTTTTCTGG ATGTATTGAT GTGGACACGG CCATGCCTAT CTTTTTTATG GCTTTCTGGG ATTCCCCCAA AAGCCAAATC CGCCCCTCAG ATAATGTAAG
TGA
(107) INFORMATION FOR
ATTTGTCATG
CTTGGTGGTG
TTTGAATTTA
CACAGCTATG
CGTCAGTTTC
CCTGGCTATT
CACCTGCATC
AAATGTATTT
AAATTCAACC
TTTTCTGATC
AATTCAGAAG
GCTTTTCTTT
TCAACTAGGC
CACCATTTGT
GAAAAAATTT
CCACTCAAAC
CTCATCCACC
ATTCCTACTT
ATAGTCATTT
GCACTGGCTG
GAATACCGCT
AACCTGTACG
GTTCACCCAA
ATCATTTGGC
TTCATTGAGA
CTTCCGATAG
ATTCTTACAA
AACAAACCAA
TTCTTTTCCT
ATCATACGTG
ATAGCTTATT
AAAAGATATT
CTTTCAACAA
AAGAAGCCTG
TATACAGTAT
ACTTTTATAT
ACTTATGCTT
GGCCCTTTGG
CTAGTGTGTT
TGAAGTCCCG
TGCTGGCAGG
ACACCAATAT
GGCTGGGCCT
GTTATACTCT
GAAATGATGA
GGATTCCCCA
ACTGTAGAAT
TTAACAATTG
TTCTCCAGCT
AAATGAGCAC
CACCATGTTT
TTGTCCCAAA
CATCTTTGTG
GAAGCTGAAG
TTTACTGACT
CAATTACCTA
TCTACTCACG
CCTTCGACGC
CTTGGCCAGT
TACAGTTTGT
GACCAAAAAT
TATTTGGAAG
TATTTTTAAG
CCAAATATTC
TGCAGATATT
CCTGAATCCT
TCTAAAATAT
GCTTTCCTAC
TGAGGTTGAG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1083 SEQ ID NO:106: SEQUENCE CHARACTERISTICS: LENGTH: 360 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:106: Met Ile Leu Asn Ser Ser Thr Glu Asp Gly Ile Lys Arg Ile Gln Asp 1 5 10 Asp Cys Pro Lys Ala Gly Arg His Asn Tyr Ile Phe Val Met Ile Pro Val Gly Ile Phe Gly Asn Ser Leu Thr Leu Tyr Ser Ile Ile Val Val Leu Gly Tyr Ala Ala 145 Leu Ile Ile Leu Ala 225 Ile His Arg Ile Val Phe Pro Asn Ala Ile 130 Lys Pro Thr Gly Ile 210 Tyr Ile Gln Asp Cys Ile Leu Leu Tyr Ser 115 Val Val Ala Val Leu 195 Ile Glu Met Ile Cys 275 Ile Val Leu Trp Leu 100 Val His Thr Ile Cys 180 Gly Leu Ile Ala Phe 260 Arg Ala Ile Asn Ala Cys Phe Pro Cys Ile 165 Ala Leu Thr Gln Ala 245 Thr Ile Tyr Tyr Leu 70 Val Lys Leu Met Ile 150 His Phe Thr Ser Lys 230 Ile Phe Ala Phe Phe Phe Ala Tyr Ile Leu Lys 135 Ile Arg His Lys Tyr 215 Asn Val Leu Asp Asn 295 Val 40 Tyr Leu Thr Ala Thr 120 Ser Ile Asn Tyr Asn 200 Thr Lys Leu Asp Ile 280 Asn Met Ala Ala Ser 105 Cys Arg Trp Val Glu 185 Ile Leu Pro Phe Val 265 Val Cys Lys Asp Met 90 Ala Leu Leu Leu Phe 170 Ser Leu Ile Arg Phe 250 Leu Asp Leu Leu Leu 75 Glu Ser Ser Arg Leu 155 Phe Gln Gly Trp Asn 235 Phe Ile Thr Asn Lys Cys Tyr Val Ile Arg 140 Ala Ile Asn Phe Lys 220 Asp Phe Gln Ala Pro 300 Thr Phe Arg Ser Asp 125 Thr Gly Glu Ser Leu 205 Ala Asp Ser Leu Met 285 Leu Val Leu Trp Phe 110 Arg Met Leu Asn Thr 190 Phe Leu Ile Trp Gly 270 Pro Phe Ala Leu Pro Asn Tyr Leu Ala Thr 175 Leu Pro Lys Phe Ile 255 Ile Ile Tyr Ser Thr Phe Leu Leu Val Ser 160 Asn Pro Phe Lys Lys 240 Pro Ile Thr Gly 290 Phe Leu Gly Lys Lys Phe Lys Arg Tyr Phe Leu Gln Leu Leu Lys Tyr 305 310 315 Ile Pro Pro Lys Ala Lys Ser His Ser Asn Leu Ser Thr Lys Met Ser 325 330 335 Thr Leu Ser Tyr Arg Pro Ser Asp Asn Val Ser Ser Ser Thr Lys Lys 340 345 350 Pro Ala Pro Cys Phe Glu Val Glu 355 360 (108) INFORMATION FOR SEQ ID NO:107: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE
TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:107: CCCAAGCTTC CCCAGGTGTA TTTGAT 26 (109) INFORMATION FOR SEQ ID NO:108: SEQUENCE CHARACTERISTICS: LENGTH: 38 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:108: AAGCACAATT GCTGCATAAT TATCTTAAAA ATATCATC 38 (110) INFORMATION FOR SEQ ID NO:109: SEQUENCE CHARACTERISTICS: LENGTH: 39 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:109: AAGATAATTA TGGCAGCAAT TGTGCTTTTC TTTTTCTTT (111) INFORMATION FOR SEQ ID NO:110: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:110: GTTGGATCCA CATAATGCAT TTTCTC (112) INFORMATION FOR SEQ ID NO:111: SEQUENCE CHARACTERISTICS: LENGTH: 1344 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:111: ATGGAGCTGC TAAAGCTGAA CCGGAGCGTG CAGGGAACCG GACCCGGGCC GGGGGCTTCC
CTGTGCCGCC
CCCCCTCGCA
TACGCAGTGA
CTGAGCCGCC
CTCCTGCTGG
ATCTTTGGCA
TCCACGCTAA
CAGGCACGAG
CTGTCCGGAC
CGTGTGCTGC
CGGGGGCGCC
TTCGCGGAGC
TCTTCCTGAT
GCCTGAGGAC
CTGTGGCTTG
CCGTCATCTG
GCCTCGTGGC
TGTGGCAGAC
TACTCATGGT
AGTGCGTGCA
TCTCCTCAAC
CGGGACACGA
GAGCGTTGGA
TGTCACCAAT
CATGCCCTTC
CAAGGCGGTT
CATCGCACTG
GCGCTCCCAC
GCCCTACCCC
TCGCTGGCCC
AGCAGCAGTG
GAATTGGAGC
GGAAATATGC
GCCTTCCTCC
ACCCTCCTGC
TCCTACCTCA
GAGCGATATA
GCGGCTCGCG
GTGTACACTG
AGTGCGCGGG
TGGGCAACCT
TGGCCATTAG
TCATCATCGT
TCTCACTGGC
CCAATCTCAT
TGGGGGTGTC
GCGCCATCTG
TGATTGTAGC
TCGTGCAACC
TCCGCCAGAC
CAGCTGCGAG
AATCACTCTT
GGTCCTGGGA
AGTCAGCGAC
GGGCACATTC
TGTGAGTGTG
CCGACCACTG
CACGTGGCTG
AGTGGGGCCT
CTGGTCCGTA
120 180 240 300 360 420 480 540 600 660 CTGCTGCTTC TGCTCTTGTT CTTCATCCCA GGTGTGGTTA TGGCCGTGGC CTACGGGCTT
ATCTCTCGCG
AGCAGGGTCC
CCTGAGACTG
CGGCCTGCCC
ACCCAGGCCA
CTTTTTTTTC
CCGGGTGCAC
GCCTCGGCCT
TGCCTGGAAA
CCCGATGAGG
ATCAGCACAC
AGCTCTACTT
GAAACCAAGG
GCGCGGTTGG
TGGAGCTGAC
AGCTGCTGGC
TGTGTTGGTT
ACCGAGCACT
GTGTCAACCC
CTTGCGCTCG
ACCCTCCCAC
TGGGCCCTGG
AGGGCTTCGC
CGGGCTGCCA
CAAAGACAGC
GGCGCTGACG
TAAGAAGCGC
GCCAGTTTAT
CTCGGGTGCT
CCTGGTCTAC
CTGCTGCCCC
TCCCTCCATT
CTGA
TTTGACGGCG
GGGGCTGTTC
GATGGCTGCT
GCTCCTGGGC
GTGAAACGAA
AGTGCCAACA
CCTATCTCCT
TGCTTCATGC
CGGCCTCCAC
GCTTCGCTGT
ACAGTGACAG
ACCAGAACGG
ACGTGCAACT
CGGGATCCGG
TGTTGCTGGT
CGTGGCGCGC
TCATTCACTT
ACCGTCGCTT
GAGCTCGCCC
CCAGGCTTAG
CGACAGCCAA
GCGTTGCCGG
TCCACGTTCC
CTCCCGGCCC
GATCGTTGTG
CTTTGATGGC
GCTGAGCTAC
TCGCCAGGCC
CAGGGCTCTT
CTACACCACC
780 840 900 960 1020 1080 1140 1200 1260 1320 1344 (113) INFORMATION FOR SEQ ID NO:112: SEQUENCE CHARACTERISTICS: LENGTH: 447 amino acids TYPE: amino acid STRANDEDNESS: TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) Met 1 Pro SEQUENCE DESCRIPTION: SEQ ID NO:112: Glu Leu Leu Lys Leu Asn Arg Ser Val 5 10 Gly Ala Ser Leu Cys Arg Pro Gly Ala Gln Gly Thr Gly Pro Gly Pro Leu Leu Asn 25 Pro Ser Val Gly Thr Arg Glu Leu Ser Cys Pro Arg Ile Arg Tyr Asn Ser Ser Gly Ala Gly Ala Val Ile Phe Leu Met Leu Leu Glu Leu Ala 55 Ser Val Gly Gly 70 Arg Leu Arg Thr Arg Ile Thr Asn Met Leu Val Thr Asn 90 Ile 75 Ala Leu Ile Val Val Leu Gly Leu Ser Arg Phe Leu Leu Ser Ala Leu Ala Leu 145 Gln Ala Thr Trp Leu 225 Ile Ser Val Asp Glu 305 Thr Val Asn Val Val Pro Val 130 Val Ala Thr Val Pro 210 Leu Ser Asp His Ser 290 Leu Gln Ile Thr Ala Ser Asn 115 Ser Ala Arg Trp Val 195 Ser Phe Arg Ser Gln 275 Asp Thr Ala Val Trp 355 Pro Asp 100 Leu Tyr Ile Val Leu 180 Gln Ala Phe Glu Gln 260 Asn Gly Ala Lys Val 340 Arg Ile Leu Met Leu Ala Trp 165 Leu Pro Arg Ile Leu 245 Ser Gly Cys Leu Leu 325 Leu Ala Ser Leu Gly Met Leu 150 Gln Ser Val Val Pro 230 Tyr Arg Arg Tyr Thr 310 Leu Phe Phe Phe Leu Thr Gly 135 Glu Thr Gly Gly Arg 215 Gly Leu Val Cys Val 295 Ala Ala Phe Asp Ile 375 Ala Val 105 Phe Ile 120 Val Ser Arg Tyr Arg Ser Leu Leu 185 Pro Arg 200 Gln Thr Val Val Gly Leu Arg Asn 265 Arg Pro 280 Gln Leu Pro Gly Lys Lys Leu Cys 345 Gly Pro 360 His Leu Ala Phe Val Ser His 170 Met Val Trp Met Arg 250 Gln Glu Pro Pro Arg 330 Trp Gly Leu Cys Gly Ser Ala 155 Ala Val Leu Ser Ala 235 Phe Gly Thr Arg Gly 315 Val Leu Ala Ser Met Thr Val 140 Ile Ala Pro Gln Val 220 Val Asp Gly Gly Ser 300 Ser Lys Pro His Tyr 380 Pro Val 125 Ser Cys Arg Tyr Cys 205 Leu Ala Gly Leu Ala 285 Arg Gly Arg Val Arg 365 Ala Phe 110 Ile Thr Arg Val Pro 190 Val Leu Tyr Asp Pro 270 Val Pro Ser Met Tyr 350 Ala Ser Thr Cys Leu Pro Ile 175 Val His Leu Gly Ser 255 Gly Gly Ala Arg Leu 335
Ser Leu Ala Leu Lys Ser Leu 160 Val Tyr Arg Leu Leu 240 Asp Ala Lys Leu Pro 320 Leu Ala Ser Cys 370 Val Asn Pro Leu Val Tyr Cys Phe Met His Arg Arg Phe Arg Gln Ala 385 390 395 400 Cys Leu Glu Thr Cys Ala Arg Cys Cys Pro Arg Pro Pro Arg Ala Arg 405 410 415 Pro Arg Ala Leu Pro Asp Glu Asp Pro Pro Thr Pro Ser Ile Ala Ser 420 425 430 Leu Ser Arg Leu Ser Tyr Thr Thr Ile Ser Thr Leu Gly Pro Gly 435 440 445 (114) INFORMATION FOR SEQ ID NO:113: SEQUENCE CHARACTERISTICS: LENGTH: 34 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:113: CAGCAGCATG CGCTTCACGC GCTTCTTAGC CCAG (115) INFORMATION FOR SEQ ID NO:114: SEQUENCE CHARACTERISTICS: LENGTH: 33 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: not relevant (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:114: AGAAGCGCGT GAAGCGCATG CTGCTGGTGA TCGTT (116) INFORMATION FOR SEQ ID NO:115: SEQUENCE CHARACTERISTICS: LENGTH: 33 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:115: ATGGAGAAAA GAATCAAAAG AATGTTCTAT ATA 33 (117) INFORMATION FOR SEQ ID NO:116: SEQUENCE CHARACTERISTICS: LENGTH: 33 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:116: TATATAGAAC ATTCTTTTGA TTCTTTTCTC CAT 33 (118) INFORMATION FOR SEQ ID NO:117: SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:117: CGCTCTCTGG CCTTGAAGCG CACGCTCAGC (119) INFORMATION FOR SEQ ID NO:118: SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv)
ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:118: GCTGAGCGTG CGCTTCAAGG CCAGAGAGCG (120) INFORMATION FOR SEQ ID NO:119: SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:119: CCCAGGAAAA AGGTGAAAGT CAAAGTTTTC (121) INFORMATION FOR SEQ ID NO:120: SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:120: GAAAACTTTG ACTTTCACCT TTTTCCTGGG (122) INFORMATION FOR SEQ ID NO:121: SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:121: GGGGCGCGGG TGAAACGGCT GGTGAGC 27 (123) INFORMATION FOR SEQ ID NO:122: SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:122: GCTCACCAGC CGTTTCACCC GCGCCCC 27 (124) INFORMATION FOR SEQ ID NO:123: SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:123: CCCCTTGAAA AGCCTAAGAA CTTGGTCATC (125) INFORMATION FOR SEQ ID NO:124: SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:124: GATGACCAAG TTCTTAGGCT TTTCAAGGGG (126) INFORMATION FOR SEQ ID NO:125: SEQUENCE CHARACTERISTICS: LENGTH: 32 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv)
ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:125: GATCTCTAGA ATGAACAGCA CATGTATTGA AG (127) INFORMATION FOR SEQ ID NO:126: SEQUENCE CHARACTERISTICS: LENGTH: 35 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iv) ANTI-SENSE: YES (xi) SEQUENCE DESCRIPTION: SEQ ID NO:126: CTAGGGTACC CGCTCAAGGA CCTCTAATTC CATAG (128) INFORMATION FOR SEQ ID NO:127: SEQUENCE CHARACTERISTICS: LENGTH: 1296 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:127: ATGCAGGCGC TTAACATTAC CCCGGAGCAG TTCTCTCGGC TGCTGCGGGA ACGCGGGAGC AGTTCATCGC TCTGTACCGG CTGCGACCGC TCGTCTACAC CCGGGACGCG CCAAGCTGGC CCTCGTGCTC ACCGGCGTGC TCATCTTCGC TTTGGCAATG CTCTGGTGTT CTACGTGGTG ACCCGCAGCA AGGCCATGCG AACATCTTTA TCTGCTCCTT GGCGCTCAGT GACCTGCTCA TCACCTTCTT GTCACCATGC TCCAGAACAT TTCCGACAAC TGGCTGGGGG GTGCTTTCAT GTGCCATTTG TCCAGTCTAC CGCTGTTGTG ACAGAAATGC TCACTATGAC GTGGAAAGGC ACCAGGGACT TGTGCATCCT TTTAAAATGA AGTGGCAATA
CCACAACCTG
CCCAGAGCTG
CCTGGCGCTC
CACCGTCACC
CTGCATTCCC
TTGCAAGATG
CTGCATTGCT
CACCAACCGA
120 180 240 300 360 420 480 AGGGCTTTCA CAATGCTAGG TGTGGTCTGG CTGGTGGCAG TCATCGTAGG ATCACCCATG
TGGCACGTGC
TGCTTAGAAG
ATCCTCTTCC
CTTTGGATAA
ATGTCCAAAA
CTCTTTGCTG
TTTGAAAAGG
GGATTTTCCA
AAAAATGTTT
AGGCATGGAA
AATCCAGTGG
TGTGAACAGA
CTGGCTGAGA
AACAACTTGA
AGTGGACCAG
TCCTGCCTCT
AGAAAAGAGT
TAGCCAGGAA
TGTGCTGGGC
AATATGATGA
ACTCCATCTG
TGTCTGCAGT
ATTCAGGAAT
AGGAAACCAA
CAGAGGAGAA
ATTCTCCTTT
GATCAAATAT
CCCTGTGCAC
TATGGTGATG
TGGGGATGGT
GAAGAAACGA
ACCATTCCAT
TGTCACAATC
TAATCCCATT
TTGTTATTGC
TACAATGATG
AGGAGAAGCA
GAAAAAGCTC
GACTTCCTAT
CAGAAGATCT
CTTATTCTGT
TCAGTGCTTC
GCTAAGATTA
GTTGTCCATA
AAGATGATTT
GTCTATGCAT
ATAGTAAATA
CGGAAGAAAG
TTCAGTGATG
AAACGACATC
ATGAAAAGGA
ACACCACCTT
ACAGTAAAAT
GAACTATTCA
TGATGGTGAC
TGATGATTGA
TTGCTATCGT
TTATGAATGA
AAACCTTCTC
CAAAGTTTTC
GCAACATTGA
TTGCTCTCTT
ACACATCTGC
CATCCTTGTC
TGGTTATGAA
TGGAAAAGAA
AGTGGTGGCT
ATACAGTAAT
GCAAATTATT
AAACTTCAAA
TCCAGCACAA
CCTCAGAGAG
AGTCAAATTG
TAGGTCTGAA
540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1296 AGACAGTGGG CATTAA (129) INFORMATION FOR SEQ ID NO:128: (i) SEQUENCE CHARACTERISTICS: LENGTH: 431 amino acids TYPE: amino acid
STRANDEDNESS:
(D) TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:128: Met Gln Ala Leu Asn Ile Thr Pro Glu Gln Phe Ser Arg Leu 1 5 10 Asp His Asn Leu Thr Arg Glu Gln Phe Ile Ala Leu Tyr Arg 25 Pro Leu Val Tyr Thr Pro Glu Leu Pro Gly Arg Ala Lys Leu 40 Val Leu Thr Gly Val Leu Ile Phe Ala Leu Ala Leu Phe Gly 55 Leu Val Phe Tyr Val Val Thr Arg Ser Lys Ala Met Arg Thr 70 75 Leu Arg Leu Arg Ala Leu Asn Ala Val Thr Asn Ile Phe Ile Cys Ser Leu Ala Leu Ser Asp Leu Leu Ile Thr Phe Phe Gly Val Gln 145 Arg Gly Leu Val Leu 225 Leu His Ile Phe Tyr 305 Gly Glu Asn Cys Gly Val 130 Gly Ala Ser Tyr His 210 Pro Trp Gly Met His 290 Asp Phe Asn Lys Ile Ala 115 Thr Leu Phe Pro Glu 195 Gln Leu Ile Lys Met 275 Val Asp Ser Phe Thr Pro 100 Phe Glu Val Thr Met 180 Lys Lys Met Lys Glu 260 Val Val Val Asn Lys 340 Phe Val Ile Met His Met 165 Trp Glu Ile Val Lys 245 Met Thr His Thr Ser 325 Lys Ser Thr Cys Leu Pro 150 Leu His His Tyr Met 230 Arg Ser Val Met Ile 310 Ile Asn Pro Met Lys Thr 135 Phe Gly Val Ile Thr 215 Leu Val Lys Val Met 295 Lys Cys Val Ala Leu Met 120 Met Lys Val Gln Cys 200 Thr Ile Gly Ile Ala 280 Ile Met Asn Leu Gln 360 Gln 105 Val Thr Met Val Gln 185 Cys Phe Leu Asp Ala 265 Leu Glu Ile Pro Ser 345 Arg Asn Pro Cys Lys Trp 170 Leu Leu Ile Tyr Gly 250 Arg Phe Tyr Phe Ile 330 Ala His Ile Phe Ile Trp 155 Leu Glu Glu Leu Ser 235 Ser Lys Ala Ser Ala 315 Val Val Gly Ser Val Ala 140 Gln Val Ile Glu Val 220 Lys Val Lys Val Asn 300 Ile Tyr Cys Asn Asp Gln 125 Val Tyr Ala Lys Trp 205 Ile Ile Leu Lys Cys 285 Phe Val Ala Tyr Ser Asn 110 Ser Glu Thr Val Tyr 190 Thr Leu Gly Arg Arg 270 Trp Glu Gln Phe Cys 350 Gly Trp Thr Arg Asn Ile 175 Asp Ser Phe Tyr Thr 255 Ala Ala Lys Ile Met 335 Ile Ile Leu Ala His Arg 160 Val Phe Pro Leu Glu 240 Ile Lys Pro Glu Ile 320 Asn Val Thr 355 Met Met Arg Lys Lys Ala Lys Phe Ser Leu Arg Glu Asn Pro Val Glu 370 375 380 Glu Thr Lys Gly Glu
Ala Phe Ser Asp Gly Asn Ile Glu Val Lys Leu 385 390 395 400 Cys Glu Gln Thr Glu Glu Lys Lys Lys Leu Lys Arg His Leu Ala Leu 405 410 415 Phe Arg Ser Glu Leu Ala Glu Asn Ser Pro Leu Asp Ser Gly His 420 425 430 (130) INFORMATION FOR SEQ ID NO:129: (i) SEQUENCE CHARACTERISTICS: LENGTH: 2040 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:129: ATGGGCAGCC CCTGGAACGG CAGCGACGGC CCCGAGGGGG CGCGGGAGCC GCCGTGGCCC
GCGCTGCCGC
120
GTGACCGCTG
180
ATGCTGATCG
240
GCCGTGTCCG
300
TCGCGGCCCT
360
TGCACCTACG
420
TGCCGCCCGC
480
GTGCTCTGGG
540
CAGGACCCCG
600
CCTCTCGCCT
CTTGCGACGA
TGTGCCTGTG
GGCGCTACCG
ACCTACTCAT
GGGTGTTCGG
CCACGCTGCT
TCCGCGCCCG
CCGTGGCGCT
GCATCTCCGT
CGTCGCCGCC
GCGCCGCTGC
CCTGTTCGTC
GGACATGCGG
CCTGCTCGGG
GCCGCTGCTC
GCACATGACC
CGTCTTGGTC
GCTCTCTGCC
AGTCCCGGGC
TCTCTGGCTC
TCGCCCTTTC
GTCGGGGTGA
ACCACCACCA
CTGCCGTTCG
TGCCGCCTGT
GCGCTCAGCG
ACCCGGCGCC
GGTCCCTTCT
CTCAATGGCA
TCGCGGGCGC
CCCTGGGGGC
GCGGCAACGT
ACTTGTACCT
ACCTGTACCG
CCCTCTACGT
TCGAGCGCTA
GCGTCCGCGC
TGTTCCTGGT
CCGCGCGGAT
CACCGCCGTC
GCTGGTGCCG
GGTGACCGTG
GGGCAGCATG
CCTCTGGCGC
GGGCGAGGGC
CCTGGCCATC
GCTCATCGCT
GGGCGTCGAG
CGCCTCCTCG
CCCGCCGTCG
660
GGGCCCGAGA
720
CAGCTGGGCG
780
CTGTGCCTCA
840
CTGCGAGGCC
900
CGTAAGTGGA
960
GCGCAAACGC
1020
TTTCCTATTT
1080
TCCTGTCCCC
1140
CGATTCAGTA
1200
TTCTTAATCC
1260
AGACGAGGGA
1320
TAAAGTAAAC
1380
TTCAACAGAG
1440
CGGCTTGTTC
1500
AGCCTACTAT
1560
GGGGTGAGGA
1620
GCAGATGGTT
1680
GGTGCTGTGT
1740
CCGCGGAGGC
CGCTGCGTGT
GCATCCTCTA
CGGCCGCCTC
GCCGCCGTGG
TGGGTCCCCT
CGATTCCAGC
CAGGAGCTCT
ACCAGCAGTG
AACCACCTGT
GATTTCATTA
CTTGCTCGTA
AACAGAAAAC
AGAGAAATTG
GCAGTTTTAA
TCTGCCTAGG
CCTTGTCGGG
CTTATGTTGC
CGCGGCGCTG
CATGCTGTGG
CGGGCTCATC
GGGGCGGGAG
TTCCAAAGAC
TCCCCTGCTC
CTCCACCCGC
GGGGGACCCC
CTTTTCCAGA
TAGATGCCAC
AGCTAAAATT
TCAAAAAGTA
TTGTCTCCGA
CTCCTTCTGG
AGCAAGTATC
TAGAAGTTTT
GTGGGGGGTT
AGTGGTGGTG
TTCAGCCGCG
GTCACCACCG
GGGCGGGAGC
AGAGGCCACC
GCCTGCCTGC
GCCCAGCTCT
CGGTACTTCC
AGGGCGCTTT
GCCTCTGAGA
AAATGAGGAG
TTTTATTTAA
AAGATTGTGC
AGTGGGTTTG
TTTATGTCCA
CATGCAGCCT
CTCTAATTTA
TATTTGCTTC
GTTCTGGCAT
AATGCCGGCC
CCTACTTCTT
TGTGGAGCAG
GGCAGACCAA
AGTCCGCCCC
GGGCGCCGCT
CATCCCCCGA
GAGGGTGGGA
CCAGAAAGGA
TCCTCACAGT
TGTTAAGTGA
AGACCTGTTG
TGGAAGGAAG
GCCTTGATAA
GCAGCCTGGT
TTTTGCTGTT
CCAATGCTTT
TTATAATTTG
GAGCCCCGCG
CCTGCCCTTT
CCGGCGGCCG
ACGCGTCCTG
GCCGGGGACC
TCCAGCTCCC
GAAAACCATG
TCCCCGGATC
GAGTTGGTAA
GCTCTTGAGA
TGCTGAAGGC
TAGAATTCTT
CCTGCCAAGG
CACATATGGG
CATTTTTTCT
ACTTGTTATT
TGTTAATCCC
CTGGTTGCCC
TTCCACGTTG GCAGAATCAT TTACATAAAC ACGGAAGATT CGCGGATGAT 1800 CAGTACTTTA ACATCGTCGC TCTGCAACTT TTCTATCTGA GCGCATCTAT 1860 CTCTACAACC TCATTTCAAA GAAGTACAGA GCGGCGGCCT TTAAACTGCT 1920 AAGTCCAGGC CGAGAGGCTT CCACAGAAGC AGGGACACTG CGGGGGAAGT 1980 ACTGGAGGAG ACACGGTGGG CTACACCGAG ACAAGCGCTA ACGTGAAGAC 2040 (131) INFORMATION FOR SEQ ID NO:130: (i) SEQUENCE CHARACTERISTICS: LENGTH: 412 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:130:
GTACTTCTCT
CAACCCAATC
GCTCGCAAGG
TGCAGGGGAC
GATGGGATAA
Met Gly Ser Pro Trp Asn Gly Ser Asp Gly Pro Glu Gly Ala Arg Glu 1 Pro Phe Phe Arg Ala Arg Leu Pro Pro Val Tyr Val Leu Ser Trp Leu Val Arg Ser Trp Leu 115 Pro Gly Gly Asp Asp Arg 100 Tyr 5 Ala Ala Val Met Leu Ser Val Leu Leu Ser Arg 70 Leu Arg Gly Pro Val Gly 55 Thr Ile Pro Glu Pro Pro 40 Asn Thr Leu Trp Gly 120 Cys 25 Val Val Thr Leu Val 105 Cys 10 Asp Thr Val Asn Gly 90 Phe Thr Glu Ala Thr Leu 75 Leu Gly Tyr Arg Val Val 60 Tyr Pro Pro Ala Arg Cys Met Leu Phe Leu Thr 125 Cys Leu Leu Gly Asp Leu 110 Leu Ser Cys Ile Ser Leu Cys Leu Pro Leu Gly Met Tyr Arg His Met Thr Ala Leu Ser Val Glu Arg Tyr Leu Ala Ile Cys Arg Pro Leu 130 135 Arg Ala Arg Val Leu Val Thr Arg Arg Arg Val Arg Ala Leu Ile Ala 145 150 155 160 Val Leu Trp Ala Val Ala Leu Leu Ser Ala Gly Pro Phe Leu Phe Leu 165 170 175 Val Gly Val Glu Gln Asp Pro Gly Ile Ser Val Val Pro Gly Leu Asn 180 185 190 Gly Thr Ala Arg Ile Ala Ser Ser Pro Leu Ala Ser Ser Pro Pro Leu 195 200 205 Trp Leu Ser Arg Ala Pro Pro Pro Ser Pro Pro Ser Gly Pro Glu Thr 210 215 220 Ala Glu Ala Ala Ala Leu Phe Ser Arg Glu Cys Arg Pro Ser Pro Ala 225 230 235 240 Gln Leu Gly Ala Leu Arg Val Met Leu Trp Val Thr Thr Ala Tyr Phe 245 250 255 Phe Leu Pro Phe Leu Cys Leu Ser Ile Leu Tyr Gly Leu Ile Gly Arg 260 265 270 Glu Leu Trp Ser Ser Arg Arg Pro Leu Arg Gly Pro Ala Ala Ser Gly 275 280 285 Arg Glu Arg Gly His Arg Gln Thr Lys Arg Val Leu Leu Val Val Val 290 295 300 Leu Ala Phe Ile Ile Cys Trp Leu Pro Phe His Val Gly Arg Ile Ile 305 310 315 320 Tyr Ile Asn Thr Glu Asp Ser Arg Met Met Tyr Phe Ser Gln Tyr Phe 325 330 335 Asn Ile Val Ala Leu Gln Leu Phe Tyr Leu Ser Ala Ser Ile Asn Pro 340 345 350 Ile Leu Tyr Asn Leu Ile Ser Lys Lys Tyr Arg Ala Ala Ala Phe Lys 355 360 365 Leu Leu Leu Ala Arg Lys Ser Arg Pro Arg Gly Phe His Arg Ser Arg 370 375 380 Asp Thr Ala Gly Glu Val Ala Gly Asp Thr Gly Gly Asp Thr Val Gly 385 390 395 400 Tyr Thr Glu Thr Ser Ala Asn Val Lys Thr Met Gly 405 410 (132) INFORMATION FOR SEQ ID NO:131:
SEQUENCE CHARACTERISTICS: LENGTH: 1344 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:131: ATGGAGCTGC TAAAGCTGAA CCGGAGCGTG CAGGGAACCG GACCCGGGCC GGGGGCTTCC
CTGTGCCGCC
120
CCCCCTCGCA
180
TACGCAGTGA
240
CTGAGCCGCC
300
CTCCTGCTGG
360
ATCTTTGGCA
420
TCCACGCTAA
480
CAGGCACGAG
540
CTGTCCGGAC
600
CGTGTGCTGC
660 CTGCTGCTTC 720
ATCTCTCGCG
780
AGCAGGGTCC
840
CCTGAGACTG
900
CGGCCTGCCC
CGGGGGCGCC
TTCGCGGAGC
TCTTCCTGAT
GCCTGAGGAC
CTGTGGCTTG
CCGTCATCTG
GCCTCGTGGC
TGTGGCAGAC
TACTCATGGT
AGTGCGTGCA
TGCTCTTGTT
AGCTCTACTT
GAAACCAAGG
GCGCGGTTGG
TGGAGCTGAC
TCTCCTCAAC
CGGGACACGA
GAGCGTTGGA
TGTCACCAAT
CATGCCCTTC
CAAGGCGGTT
CATCGCACTG
GCGCTCCCAC
GCCCTACCCC
TCGCTGGCCC
CTTCATCCCA
AGGGCTTCGC
CGGGCTGCCA
CAAAGACAGC
GGCGCTGACG
AGCAGCAGTG
GAATTGGAGC
GGAAATATGC
GCCTTCCTCC
ACCCTCCTGC
TCCTACCTCA
GAGCGATATA
GCGGCTCGCG
GTGTACACTG
AGTGCGCGGG
GGTGTGGTTA
TTTGACGGCG
GGGGCTGTTC
GATGGCTGCT
GCTCCTGGGC
TGGGCAACCT
TGGCCATTAG
TCATCATCGT
TCTCACTGGC
CCAATCTCAT
TGGGGGTGTC
GCGCCATCTG
TGATTGTAGC
TCGTGCAACC
TCCGCCAGAC
TGGCCGTGGC
ACAGTGACAG
ACCAGAACGG
ACGTGCAACT
CGGGATCCGG
CAGCTGCGAG
AATCACTCTT
GGTCCTGGGA
AGTCAGCGAC
GGGCACATTC
TGTGAGTGTG
CCGACCACTG
CACGTGGCTG
AGTGGGGCCT
CTGGTCCGTA
CTACGGGCTT
CGACAGCCAA
GCGTTGCCGG
TCCACGTTCC
CTCCCGGCCC
960 ACCCAGGCCA AGCTGCTGGC TAAGAAGCGC GTGAAACGAA T 1020 CTTTTTTTTC TGTGTTGGTT GCCAGTTTAT AGTGCCAACA C 1080 CCGGGTGCAC ACCGAGCACT CTCGGGTGCT CCTATCTCCT T 1140 GCCTCGGCCT GTGTCAACCC CCTGGTCTAC TGCTTCATGC A 1200 TGCCTGGAAA CTTGCGCTCG CTGCTGCCCC CGGCCTCCAC G 1260 CCCGATGAGG ACCCTCCCAC TCCCTCCATT GCTTCGCTGT C 1320 ATCAGCACAC TGGGCCCTGG CTGA 1344 (133) INFORMATION FOR SEQ ID NO:132: SEQUENCE CHARACTERISTICS: LENGTH: 447 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:132: Met Glu Leu Leu Lys Leu Asn Arg Ser Val 1 5 10 Pro Gly Ala Ser Leu Cys Arg Pro Gly Ala 25 Ser Val Gly Asn Leu Ser Cys Glu Pro Pro 40 Thr Arg Glu Leu Glu Leu Ala Ile Arg Ile 55 Phe Leu Met Ser Val Gly Gly Asn Met Leu Leu Ser Arg Arg Leu Arg Thr Val Thr Asn 85 90 Ala Val Ser Asp Leu Leu Leu Ala Val Ala
GTTGCTGGT
GTGGCGCGC
CATTCACTT
CCGTCGCTT
AGCTCGCCC
CAGGCTTAG
GATCGTTGTG
CTTTGATGGC
GCTGAGCTAC
TCGCCAGGCC
CAGGGCTCTT
CTACACCACC
Gly Leu Ile Leu Ile Phe Met Thr Leu Arg Tyr Val Leu Pro Gly Asn Gly Ala Val Leu Phe Pro Ser Ala Val Leu Ser Thr 100 105 110 Leu Pro Asn Leu Met Gly Thr Phe Ile Phe Gly Thr Val Ile Cys Lys 115 120 125 Ala Val Ser Tyr Leu Met Gly Val Ser Val Ser Val Ser Thr Leu Ser 130 135 140 Leu Val Ala Ile Ala Leu Glu Arg Tyr Ser Ala Ile Cys Arg Pro Leu 145 150 155 160 Gln Ala Arg Val Trp Gln Thr Arg Ser His Ala Ala Arg Val Ile Val 165 170 175 Ala Thr Trp Leu Leu Ser Gly Leu Leu Met Val Pro Tyr Pro Val Tyr 180 185 190 Thr Val Val Gln Pro Val Gly Pro Arg Val Leu Gln Cys Val His Arg 195 200 205 Trp Pro Ser Ala Arg Val Arg Gln Thr Trp Ser Val Leu Leu Leu Leu 210 215 220 Leu Leu Phe Phe Ile Pro Gly Val Val Met Ala Val Ala Tyr Gly Leu 225 230 235 240 Ile Ser Arg Glu Leu Tyr Leu Gly Leu Arg Phe Asp Gly Asp Ser Asp 245 250 255 Ser Asp Ser Gln Ser Arg Val Arg Asn Gln Gly Gly Leu Pro Gly Ala 260 265 270 Val His Gln Asn Gly Arg Cys Arg Pro Glu Thr Gly Ala Val Gly Lys 275 280 285 Asp Ser Asp Gly Cys Tyr Val Gln Leu Pro Arg Ser Arg Pro Ala Leu 290 295 300 Glu Leu Thr Ala Leu Thr Ala Pro Gly Pro Gly Ser Gly Ser Arg Pro 305 310 315 320 Thr Gln Ala Lys Leu Leu Ala Lys Lys Arg Val Lys Arg Met Leu Leu 325 330 335 Val Ile Val Val Leu Phe Phe Leu Cys Trp Leu Pro Val Tyr Ser Ala 340 345 350 Asn Thr Trp Arg Ala Phe Asp Gly Pro Gly Ala His Arg Ala Leu Ser 355 360 365 Val Ala Pro Ile Ser Phe Ile His Leu Leu Ser Tyr Ala Ser Ala Cys 370 375 380 Val Asn Pro Leu Val Tyr Cys Phe Met His Arg Arg Phe Arg Gln Ala 390 395 400 Cys Leu Glu Thr Cys Ala Arg Cys Cys Pro Arg Pro Pro Arg Ala Arg 405 410 415 Pro Arg Ala Leu Pro Asp Glu Asp Pro Pro Thr Pro Ser Ile Ala Ser 420 425 Leu Ser Arg Leu Ser Tyr Thr Thr Ile Ser Thr Leu Gly Pro Gly 435 440 445 (134) INFORMATION FOR SEQ ID NO:133: SEQUENCE CHARACTERISTICS: LENGTH: 1014 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE
DESCRIPTION: SEQ ID NO:133: ATGAACAGCA CATGTATTGA AGAACAGCAT GACCTGGATC ACTATTTGTT TCCCATTGTT
TACATCTTTG
CTGCAAGCAA
TTACTCTATG
ACTTTCTCTC
AGCACAGCAT
AAGTTTTTTT
TTGGAAACCA
GATGCCGAAA
ATCAACCTCA
ATCTGTAACC
AAGAAGAGAA
CCCTTTCATG
CACAGCAATT
TTAAATTGTG
ATGTGGAATA
CGCATACTTT
TGATTATAGT
AGAAGGAAAG
CATTAACTCT
CTGCCTTGTG
TCCTCACCTG
TCCTAAGGAC
TCTTCAATGC
AGTCTAATTT
ACTTGTTCAG
GGAAAGTCTA
TCAAAAAACT
TGATGTTGCT
CTGGGAAGCG
TTGCTGATCC
TATTAAAATT
CTGTGTCTAC
CAGCATTCCA
TGAACTAGGA
CCCTTTATGG
CAAAGGGAGT
CATTGCCGTT
AAGAAGATTT
TGTCATGTTG
TACTTTATGC
GACGTGTACA
CCAAGCTGTG
ACTTGTCAGC
GATTCGCTGC
AACTTACACA
AATTCTGTAC
CTGCACTGGG
AAAAGATACT
GCCAATATTG
ATTTACCTCT
ATTGATTATA
GCTTTTCTCA
GATCGGTATT
GCACTCATGG
TGGGAAGATG
TATGACAAAT
GGCTATGCAA
CGGCACAATA
ATCACAGTTA
ATTTTAGAGC
ATGTATAGAA
TGTTTTGTTA
AGGTGTAATA
ATGGAATTAG
GATCTCTGTG
TCAGTTTGTC
CTTGGAATAA
TGTACATGAA
TGGCTGTTGT
TCAGCCTGTC
AAACAGTTGT
ACCCTTTAGA
TACCTTTGGT
AAGCCACGGA
CTTTTGTCTT
ATGCTGTGAA
TCACGGTTGC
CCGAAACAGG
CATCACAAAG
AGGTCCTTGA
TGTGTCTTTC
ACTATCAGAT
AGACAACTGG
TTTTTACAGC
CTACCCTTTG
CATCTGGATA
TGAATATTGC
GAAATGGCAA
CACCATCCTG
AAACAAGGAA
ATGCTTTACT
CTTCGAAGAC
ATTAACAAGT
AAGATATGAT
ACAAAGAAAA
GTAG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1014 (135) INFORMATION FOR SEQ ID NO:134: SEQUENCE CHARACTERISTICS: LENGTH: 337 amino acids TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:134: Met Asn Ser Thr Cys Ile Glu Glu Gln His Asp Leu Asp His Tyr Leu 1 Phe Ile Leu Leu Thr Asn Tyr Arg Phe 145 Asp Glu Ala Pro Gly Gly Thr Phe Phe Leu Phe 130 Asn Ala Lys Ile Ile Ser Ile Leu Ser Tyr Ala 115 Ala Ala Gln Trp Pro 195 5 Val Tyr Leu Cys Tyr Leu Pro Leu Pro Ala Ser Ser 100 Val Val Leu Met Val Met Lys Ser 165 Gln Ile 180 Leu Val Ile Val Phe Trp 70 Leu Thr Tyr Val Leu 150 Asn Asn Thr Phe Ser Ser 55 Ile Cys Ala Pro Ser 135 Trp Phe Leu Ile Val Phe 40 Leu Asp Lys Phe Leu 120 Leu Glu Thr Asn Leu 200 Ile 25 Leu Ser Tyr Gly Leu 105 Lys Ser Asp Leu Leu 185 Ile 10 Ile Gln Leu Thr Ser 90 Thr Phe Ile Glu Cys 170 Phe Cys Val Ala Ser Trp 75 Ala Cys Phe Trp Thr 155 Tyr Arg Asn Ser Lys Asp Asn Phe Ile Phe Ile 140 Val Asp Thr Arg Ile Lys Leu Lys Leu Ala Leu 125 Leu Val Lys Cys Lys 205 Pro Glu Leu Asp Met Val 110 Arg Glu Glu Tyr Thr 190 Val Ala Ser Tyr Asn Tyr Asp Thr Thr Tyr Pro 175 Gly Tyr Asn Gln Ala Trp Met Arg Arg Ile Cys 160 Leu Tyr Gln Ala Val Arg His Asn Lys Ala Thr Gln Asn Lys Gln Lys Lys Arg Ile 210 215 220 Lys Lys Leu Leu Val Ser Ile Thr Val Thr Phe Val Le 225 230 235 Pro Phe His Val Met Leu Leu Ile Arg Cys Ile Leu G1 245 250 Asn Phe Glu Asp His Ser Asn Ser Gly Lys Arg Thr Ty 260 265 Arg Ile Thr Val Ala Leu Thr Ser Leu Asn Cys Val Al 275 280 28 Leu Tyr Cys Phe Val Thr Glu Thr Gly Arg Tyr Asp Me 290 295 300 Leu Lys Phe Cys Thr Gly Arg Cys Asn Thr Ser Gln Ar 305 310 315 Arg Ile Leu Ser Val Ser Thr Lys Asp Thr Met Glu Le 325 330 Glu (136) INFORMATION FOR SEQ ID NO:135: SEQUENCE CHARACTERISTICS: LENGTH: 999 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:135: ATGGTGAACT CCACCCACCG TGGGATGCAC ACTTCTCTGC ACCTCTGGAA TACAGACTGC ACAGCAATGC CAGTGAGTCC CTTGGAAAAG GCTACTCTGA 120 TACGAGCAAC TTTTTGTCTC TCCTGAGGTG TTTGTGACTC TGGGTGTCAT 180 GAGAATATCT TAGTGATTGT
GGCAATAGCC AAGAACAAGA ATCTGCATTC 240 TTTTTCATCT GCAGCTTGGC TGTGGCTGAT ATGCTGGTGA GCGTTTCAAA 300 ACCATTATCA TCACCCTATT AAACAGTACA GATACGGATG CACAGAGTTT 360 Cys His Thr 270 Asp Trp Gln Glu Phe Ala 255 Met Pro Asn Arg Val 335 Thr 240 Val Tyr Ile Ile Lys 320 Leu
CCGCAGCAGT
TGGAGGGTGC
CAGCTTGTTG
ACCCATGTAC
TGGATCAGAA
CACAGTGAAT
ATTGATAATG TCATTGACTC GGTGATCTGT AGCTCCTTGC TTGCATCCAT TTGCAGCCTG 420
CTTTCAATTG
480
ATGACAGTTA
540
GGCATTTTGT
600
TTCTTCACCA
660
CTTCACATTA
720
ATGAAGGGAA
780
TTCTTCCTCC
840
ATGTCTCACT
900
ATTTATGCAC
960
CCCCTGGGAG
999
CAGTGGACAG
AGCGGGTTGG
TCATCATTTA
TGCTGGCTCT
AGAGGATTGC
AAATTACCTT
ACTTAATATT
TTAACTTGTA
TCCGGAGTCA
GCCTTTGTGA
GTACTTTACT
GATCAGCATA
CTCAGATAGT
CATGGCTTCT
TGTCCTCCCC
GACCATCCTG
CTACATCTCT
TCTCATACTG
AGAACTGAGG
CTTGTCTAGC
ATCTTCTATG
AGTTGTATCT
AGTGCTGTCA
CTCTATGTCC
GGCACTGGTG
ATTGGCGTCT
TGTCCTCAGA
ATCATGTGTA
AAAACCTTCA
AGATATTAA
CTCTCCAGTA
GGGCAGCTTG
TCATCTGCCT
ACATGTTCCT
CCATCCGCCA
TTGTTGTCTG
ATCCATATTG
ATTCAATCAT
AAGAGATCAT
CCATAACATT
CACGGTTTCA
CATCACCATG
GATGGCCAGG
AGGTGCCAAT
CTGGGCCCCA
TGTGTGCTTC
CGATCCTCTG
CTGTTGCTAT
(137) INFORMATION FOR SEQ ID NO:136: (i) SEQUENCE CHARACTERISTICS: LENGTH: 332 amino acids TYPE: amino acid STRANDEDNESS: TOPOLOGY: not relevant (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:136: Met Val Asn Ser Thr His Arg Gly Met His Thr Ser Leu His Leu Trp 1 5 10 Asn Arg Ser Ser Tyr Arg Leu His Ser Asn Ala Ser Glu Ser Leu Gly 25 Lys Gly Tyr Ser Asp Gly Gly Cys Tyr Glu Gln Leu Phe Val Ser Pro 40 Glu Val Phe Asn Asp Ile Val 145 Met Cys Val Ala Arg 225 Met Cys Gln Ile Arg 305 Pro Val Ile Phe Gly Ala Cys 130 Asp Thr Thr Ile Ser 210 Ile Lys Trp Asn Leu 290 Ser Leu Phe Val Ile Ser Gln Ser Arg Val Val Ile 195 Leu Ala Gly Ala Pro 275 Ile Gln Gly Val Ala Cys Glu 100 Ser Ser Tyr Lys Ser 180 Cys Tyr Val Lys Pro 260 Tyr Met Glu Gly Thr Ile Ser Thr Phe Leu Phe Arg 165 Gly Leu Val Leu Ile 245 Phe Cys Cys Leu Leu 325 Leu Gly 55 Ala Lys 70 Leu Ala Ile Ile Thr Val Leu Ala 135 Thr Ile 150 Val Gly Ile Leu Ile Thr His Met 215 Pro Gly 230 Thr Leu Phe Leu Val Cys Asn Ser 295 Arg Lys 310 Cys Asp Val Ile Asn Lys Val Ala Ile Thr 105 Asn Ile 120 Ser Ile Phe Tyr Ile Ser Phe Ile 185 Met Phe 200 Phe Leu Thr Gly Thr Ile His Leu 265 Phe Met 280 Ile Ile Thr Phe Leu Ser Ser Leu Asn Leu 75 Asp Met 90 Leu Leu Asp Asn Cys Ser Ala Leu 155 Ile Ser 170 Ile Tyr Phe Thr Met Ala Ala Ile 235 Leu Ile 250 Ile Phe Ser His Asp Pro Lys Glu 315 Ser Arg 330 Leu His Leu Asn Val Leu 140 Gln Cys Ser Met Arg 220 Arg Gly Tyr Phe Leu 300 Ile Tyr Glu Ser Val Ser Ile 125 Leu Tyr Ile Asp Leu 205 Leu Gln Val Ile Asn 285 Ile Ile Asn Ile Pro Met Ser Val Thr Asp 110 Asp Ser Ser Ile His Asn Trp Ala 175 Ser Ser 190 Ala Leu His Ile Gly Ala Phe Val 255 Ser Cys 270 Leu Tyr Tyr Ala Cys Cys Leu Tyr Ser Thr Val Ala Ile 160 Ala Ala Met Lys Asn 240 Val Pro Leu Leu Tyr 320 (138) INFORMATION FOR SEQ ID NO:137: SEQUENCE CHARACTERISTICS: LENGTH: 33 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA
(genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:137: GCCAATATGA AGGGAAAAAT TACCTTGACC ATC 33 (139) INFORMATION FOR SEQ ID NO:138: SEQUENCE CHARACTERISTICS: LENGTH: 31 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:138: CTCCTTCGGT CCTCCTATCG TTGTCAGAAG T 31 (140) INFORMATION FOR SEQ ID NO:139: SEQUENCE CHARACTERISTICS: LENGTH: 1842 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:139: ATGGGGCCCA CCCTAGCGGT TCCCACCCCC TATGGCTGTA TTGGCTGTAA CCAGAATACC CACCGGCTCT AATCATCTTT ATGTTCTGCG CGATGGTTAT GTAGACCTAA TCGGCAACTC CATGGTCATT TTGGCTGTGA CGAAGAACAA AATTCTGGCA ACATCTTCGT GGTCAGTCTC TCTGTGGCCG ATATGCTGGT CCATACCCTT TGATGCTGCA TGCCATGTCC ATTGGGGGCT GGGATCTGAG TGCCAGATGG TCGGGTTCAT CACAGGGCTG AGTGTGGTCG GCTCCATCTT
GCTACCCCAG
CACCATCGTT
GAAGCTCCGG
GGCCATCTAC
CCAGTTACAG
CAACATCGTG
WO 00/22131 WO 0022131PCT/US99/24065 -108- GCAATCGCTA TCAACCGTTA CTGCTACATC TGCCACAGCC TCCAGTACGA ACGGATCTTC
AGTGTGCGCA.
CTGCCCAACA
AACTATCTGA
CTCCTCATCG
CCTGCAGGGC
GTGATCTTCC
GCTGTCAGTC
TTCATAGCCT
TTCCGAAGAG
GGCCTCATCA
CATGCTCGCG
ACCCCGATGA
CGTGCCTCTG
TCTACCCACC
GTCTCTGGCC
CCTGCCTCTG
AAGCCTGACT
CATGTCTCTG
CCCATCAAGC
ACTACCAGCC
CCCGAGATCC
TCTAGCCCTG
GCTGACCTTC
GTTGTTGATG
ATACCTGCAT
TGTACATTGG
ACAACCCTGT
TGGGTTTCTG
AGAA.TCCTGA
TCCTCTTTGC
CGAAGGAGAT
ACTTCAACAG
AATACTGGAC
GTGATATTCG
ACCAAGCTCG
ATGTCCGGAA
GCCACCCTAA
ACAAGTCTGT
ACTCCAAGCC
TCCATTTCAA
CTGTTCATTT
CTGGCAGCCA
CAGCTACCAG
ACCCTAAGCC
CTGCCATTGC
CCGCTGGGCC
CTGACCCTAC
TTGAAGATGA
CTACCTGGTC
CACCATCGAG
CTTCACTGTT
CTACGTGAGG
CAACCAACTT
AGTGTGCTGG
GGCAGGCAAG
CTGCCTCAAC
CATCTTCCAT
TGAGATGCAG
TGAACAAGAC
TGTTCCATTA
GCCCCATTCC
CTTTAGCCAC!
TGCCTCTGGT
GGGTGACTCT
CAAGCCTGCT
CTCCAAGTCT
CCATGCTGAG
CGCTGCTGCT
CCACCCTGTG
CACCAAGCCT
TGTAGTCACT
TCCTGATGAA
ATCACCTGGA
TACGATCCTC
ACCATCGTCT
ATCTGGACCA
GCTGAGGTTC
TGCCCTATCA
ATCCCCAACT
GCTGTGATCT
GCTATGCGGC
GAGGCCCGTA
CGTGCCCATG
CCTGGTGATG
AGATCCTCCT
TCCA.AGGCTG
CACCCCAAGT
GTCCATTTCA
TCCAGCAACC
GCCTTCAGTG
CCCACCACTG
GACAACCCTG
TCTGACGACA
GCTGCCAGCC
ACCAGTACCA
ATGGCTGTGT
TCATGACCGT
GCACCTACAC
GCATCCACTT
AAGTGCTGGC
GCAATTTTCT
ACGTGCTCAC!
GGCTTTATCT
ACGGGCTCCT
ACCCTATCAT
CCCTGGCCCG
CCTGTCCTGC
CTGCAGCTGG
CTGCCTATCG
CCTCTGGTCA
CTGCCACTGT
AGGGTGACTC
CCAAGCCCAT
CTGCCACCAG
CTGACTATCC
AGCTCTCTGC
GTGACCTCCC
AGCTGGAGTC
ATGATTACCA
GA
CCTGGCTGTC
CTGCATCTTC
CGTCCTCCCT
GGCCCGTGAC
AACCATGTTT
TGTCTTGGTG
TGCAGCCTAC
CAATGAGAAT
ATTCTTCCCT
CGCCCGTGCC
TGTGGAGGAA
CCACCCCGAC
CAAATCTGCC
CCTCAAGCCT
CTACCCTAAG
TGTCCATTTC
CACTGGCCAC
CCACCCTAAA
CAAGCCTGCC
CTCCCATTGC
TGAGTCGGCC
TGACACCATC
TGATGTCGTG
420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1842 (141) INFORMATION FOR SEQ ID NO:140: SEQUENCE CHARACTERISTICS: LENGTH: 613 amino acids TYPE: amino acid WO 00/22131 PCT/US99/24065 -109-
STRANDEDNESS:
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:140:
Met Gly Pro Thr Leu Ala Val Pro Thr Pro Tyr Gly Cys Ile Gly Cys 16
Lys Leu Pro Gln Pro Glu Tyr Pro Pro Ala Leu Ile Ile Phe Met Phe 32
Cys Ala Met Val Ile Thr Ile Val Val Asp Leu Ile Gly Asn Ser Met 48
Val Ile Leu Ala Val Thr Lys Asn Lys Lys Leu Arg Asn Ser Gly Asn 64
Ile Phe Val Val Ser Leu Ser Val Ala Asp Met Leu Val Ala Ile Tyr 80
Pro Tyr Pro Leu Met Leu His Ala Met Ser Ile Gly Gly Trp Asp Leu 96
Ser Gln Leu Gln Cys Gln Met Val Gly Phe Ile Thr Gly Leu Ser Val 112
Val Gly Ser Ile Phe Asn Ile Val Ala Ile Ala Ile Asn Arg Tyr Cys 128
Tyr Ile Cys His Ser Leu Gln Tyr Glu Arg Ile Phe Ser Val Arg Asn 144
Thr Cys Ile Tyr Leu Val Ile Thr Trp Ile Met Thr Val Leu Ala Val 160
Leu Pro Asn Met Tyr Ile Gly Thr Ile Glu Tyr Asp Pro Arg Thr Tyr 176
Thr Cys Ile Phe Asn Tyr Leu Asn Asn Pro Val Phe Thr Val Thr Ile 192
Val Cys Ile His Phe Val Leu Pro Leu Leu Ile Val Gly Phe Cys Tyr 208
Val Arg Ile Trp Thr Lys Val Leu Ala Ala Arg Asp Pro Ala Gly Gln 224
Asn Pro Asp Asn Gln Leu Ala Glu Val Arg Asn Phe Leu Thr Met Phe 240
Val Ile Phe Leu Leu Phe Ala Val Cys Trp Cys Pro Ile Asn Val Leu 256
Thr Val Leu Val Ala Val Ser Pro Lys Glu Met Ala Gly Lys Ile Pro 272
Asn Trp Leu Tyr Leu Ala Ala Tyr Phe Ile Ala Tyr Phe Asn Ser Cys 288
Leu Asn Ala Val Ile Tyr Gly Leu Leu Asn Glu Asn Phe Arg Arg Glu 304
Tyr Trp Thr Ile Phe His Ala Met Arg His Pro Ile Ile Phe Phe Pro 320
Gly Leu Ile Ser Asp Ile Arg Glu Met Gln Glu Ala Arg Thr Leu Ala 336
Arg Ala Arg Ala His Ala Arg Asp Gln Ala Arg Glu Gln Asp Arg Ala 352
His Ala Cys Pro Ala Val Glu Glu Thr Pro Met Asn Val Arg Asn Val 368
Pro Leu Pro Gly Asp Ala Ala Ala Gly His Pro Asp Arg Ala Ser Gly 384
His Pro Lys Pro His Ser Arg Ser Ser Ser Ala Tyr Arg Lys Ser Ala 400
Ser Thr His His Lys Ser Val Phe Ser His Ser Lys Ala Ala Ser Gly 416
His Leu Lys Pro Val Ser Gly His Ser Lys Pro Ala Ser Gly His Pro 432
Lys Ser Ala Thr Val Tyr Pro Lys Pro Ala Ser Val His Phe Lys Gly 448
Asp Ser Val His Phe Lys Gly Asp Ser Val His Phe Lys Pro Asp Ser 464
Val His Phe Lys Pro Ala Ser Ser Asn Pro Lys Pro Ile Thr Gly His 480
His Val Ser Ala Gly Ser His Ser Lys Ser Ala Phe Ser Ala Ala Thr 496
Ser His Pro Lys Pro Ile Lys Pro Ala Thr Ser His Ala Glu Pro Thr 512
Thr Ala Asp Tyr Pro Lys Pro Ala Thr Thr Ser His Pro Lys Pro Ala 528
Ala Ala Asp Asn Pro Glu Leu Ser Ala Ser His Cys Pro Glu Ile Pro 544
Ala Ile Ala His Pro Val Ser Asp Asp Ser Asp Leu Pro Glu Ser Ala 560
Ser Ser Pro Ala Ala Gly Pro Thr Lys Pro Ala Ala Ser Gln Leu Glu 576
Ser Asp Thr Ile Ala Asp Leu Pro Asp Pro Thr Val Val Thr Thr Ser 592
Thr Asn Asp Tyr His Asp Val Val Val Val Asp Val Glu Asp Asp Pro 608
Asp Glu Met Ala Val 613
(142) INFORMATION FOR SEQ ID NO:141:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 1842 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:141:
ATGGGGCCCA CCCTAGCGGT TCCCACCCCC TATGGCTGTA TTGGCTGTAA GCTACCCCAG 60
CCAGAATACC CACCGGCTCT AATCATCTTT ATGTTCTGCG CGATGGTTAT CACCATCGTT 120
GTAGACCTAA TCGGCAACTC CATGGTCATT TTGGCTGTGA CGAAGAACAA GAAGCTCCGG 180
AATTCTGGCA ACATCTTCGT GGTCAGTCTC TCTGTGGCCG ATATGCTGGT GGCCATCTAC 240
CCATACCCTT TGATGCTGCA TGCCATGTCC ATTGGGGGCT GGGATCTGAG CCAGTTACAG 300
TGCCAGATGG TCGGGTTCAT CACAGGGCTG AGTGTGGTCG GCTCCATCTT CAACATCGTG 360
GCAATCGCTA TCAACCGTTA CTGCTACATC TGCCACAGCC TCCAGTACGA ACGGATCTTC 420
AGTGTGCGCA ATACCTGCAT CTACCTGGTC ATCACCTGGA TCATGACCGT CCTGGCTGTC 480
CTGCCCAACA TGTACATTGG CACCATCGAG TACGATCCTC GCACCTACAC CTGCATCTTC 540
AACTATCTGA ACAACCCTGT CTTCACTGTT ACCATCGTCT GCATCCACTT CGTCCTCCCT 600
CTCCTCATCG TGGGTTTCTG CTACGTGAGG ATCTGGACCA AAGTGCTGGC GGCCCGTGAC 660
CCTGCAGGGC AGAATCCTGA CAACCAACTT GCTGAGGTTC GCAATAAACT AACCATGTTT 720
GTGATCTTCC TCCTCTTTGC AGTGTGCTGG TGCCCTATCA ACGTGCTCAC TGTCTTGGTG 780
GCTGTCAGTC CGAAGGAGAT GGCAGGCAAG ATCCCCAACT GGCTTTATCT TGCAGCCTAC 840
TTCATAGCCT ACTTCAACAG CTGCCTCAAC GCTGTGATCT ACGGGCTCCT CAATGAGAAT 900
TTCCGAAGAG AATACTGGAC CATCTTCCAT GCTATGCGGC ACCCTATCAT ATTCTTCTCT 960
GGCCTCATCA GTGATATTCG TGAGATGCAG GAGGCCCGTA CCCTGGCCCG CGCCCGTGCC 1020
CATGCTCGCG ACCAAGCTCG TGAACAAGAC CGTGCCCATG CCTGTCCTGC TGTGGAGGAA 1080
ACCCCGATGA ATGTCCGGAA TGTTCCATTA CCTGGTGATG CTGCAGCTGG CCACCCCGAC 1140
CGTGCCTCTG GCCACCCTAA GCCCCATTCC AGATCCTCCT CTGCCTATCG CAAATCTGCC 1200
TCTACCCACC ACAAGTCTGT CTTTAGCCAC TCCAAGGCTG CCTCTGGTCA CCTCAAGCCT 1260
GTCTCTGGCC ACTCCAAGCC TGCCTCTGGT CACCCCAAGT CTGCCACTGT CTACCCTAAG 1320
CCTGCCTCTG TCCATTTCAA GGCTGACTCT GTCCATTTCA AGGGTGACTC TGTCCATTTC 1380
AAGCCTGACT CTGTTCATTT CAAGCCTGCT TCCAGCAACC CCAAGCCCAT CACTGGCCAC 1440
CATGTCTCTG CTGGCAGCCA CTCCAAGTCT GCCTTCAATG CTGCCACCAG CCACCCTAAA 1500
CCCATCAAGC CAGCTACCAG CCATGCTGAG CCCACCACTG CTGACTATCC CAAGCCTGCC 1560
ACTACCAGCC ACCCTAAGCC CGCTGCTGCT GACAACCCTG AGCTCTCTGC CTCCCATTGC 1620
CCCGAGATCC CTGCCATTGC CCACCCTGTG TCTGACGACA GTGACCTCCC TGAGTCGGCC 1680
TCTAGCCCTG CCGCTGGGCC CACCAAGCCT GCTGCCAGCC AGCTGGAGTC TGACACCATC 1740
GCTGACCTTC CTGACCCTAC TGTAGTCACT ACCAGTACCA ATGATTACCA TGATGTCGTG 1800
GTTGTTGATG TTGAAGATGA TCCTGATGAA ATGGCTGTGT GA 1842
(143) INFORMATION FOR SEQ ID NO:142:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 613 amino acids
TYPE: amino acid
STRANDEDNESS:
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:142:
Met Gly Pro Thr Leu Ala Val Pro Thr Pro Tyr Gly Cys Ile Gly Cys 16
Lys Leu Pro Gln Pro Glu Tyr Pro Pro Ala Leu Ile Ile Phe Met Phe 32
Cys Ala Met Val Ile Thr Ile Val Val Asp Leu Ile Gly Asn Ser Met 48
Val Ile Leu Ala Val Thr Lys Asn Lys Lys Leu Arg Asn Ser Gly Asn 64
Ile Phe Val Val Ser Leu Ser Val Ala Asp Met Leu Val Ala Ile Tyr 80
Pro Tyr Pro Leu Met Leu His Ala Met Ser Ile Gly Gly Trp Asp Leu 96
Ser Gln Leu Gln Cys Gln Met Val Gly Phe Ile Thr Gly Leu Ser Val 112
Val Gly Ser Ile Phe Asn Ile Val Ala Ile Ala Ile Asn Arg Tyr Cys 128
Tyr Ile Cys His Ser Leu Gln Tyr Glu Arg Ile Phe Ser Val Arg Asn 144
Thr Cys Ile Tyr Leu Val Ile Thr Trp Ile Met Thr Val Leu Ala Val 160
Leu Pro Asn Met Tyr Ile Gly Thr Ile Glu Tyr Asp Pro Arg Thr Tyr 176
Thr Cys Ile Phe Asn Tyr Leu Asn Asn Pro Val Phe Thr Val Thr Ile 192
Val Cys Ile His Phe Val Leu Pro Leu Leu Ile Val Gly Phe Cys Tyr 208
Val Arg Ile Trp Thr Lys Val Leu Ala Ala Arg Asp Pro Ala Gly Gln 224
Asn Pro Asp Asn Gln Leu Ala Glu Val Arg Asn Lys Leu Thr Met Phe 240
Val Ile Phe Leu Leu Phe Ala Val Cys Trp Cys Pro Ile Asn Val Leu 256
Thr Val Leu Val Ala Val Ser Pro Lys Glu Met Ala Gly Lys Ile Pro 272
Asn Trp Leu Tyr Leu Ala Ala Tyr Phe Ile Ala Tyr Phe Asn Ser Cys 288
Leu Asn Ala Val Ile Tyr Gly Leu Leu Asn Glu Asn Phe Arg Arg Glu 304
Tyr Trp Thr Ile Phe His Ala Met Arg His Pro Ile Ile Phe Phe Ser 320
Gly Leu Ile Ser Asp Ile Arg Glu Met Gln Glu Ala Arg Thr Leu Ala 336
Arg Ala Arg Ala His Ala Arg Asp Gln Ala Arg Glu Gln Asp Arg Ala 352
His Ala Cys Pro Ala Val Glu Glu Thr Pro Met Asn Val Arg Asn Val 368
Pro Leu Pro Gly Asp Ala Ala Ala Gly His Pro Asp Arg Ala Ser Gly 384
His Pro Lys Pro His Ser Arg Ser Ser Ser Ala Tyr Arg Lys Ser Ala 400
Ser Thr His His Lys Ser Val Phe Ser His Ser Lys Ala Ala Ser Gly 416
His Leu Lys Pro Val Ser Gly His Ser Lys Pro Ala Ser Gly His Pro 432
Lys Ser Ala Thr Val Tyr Pro Lys Pro Ala Ser Val His Phe Lys Ala 448
Asp Ser Val His Phe Lys Gly Asp Ser Val His Phe Lys Pro Asp Ser 464
Val His Phe Lys Pro Ala Ser Ser Asn Pro Lys Pro Ile Thr Gly His 480
His Val Ser Ala Gly Ser His Ser Lys Ser Ala Phe Asn Ala Ala Thr 496
Ser His Pro Lys Pro Ile Lys Pro Ala Thr Ser His Ala Glu Pro Thr 512
Thr Ala Asp Tyr Pro Lys Pro Ala Thr Thr Ser His Pro Lys Pro Ala 528
Ala Ala Asp Asn Pro Glu Leu Ser Ala Ser His Cys Pro Glu Ile Pro 544
Ala Ile Ala His Pro Val Ser Asp Asp Ser Asp Leu Pro Glu Ser Ala 560
Ser Ser Pro Ala Ala Gly Pro Thr Lys Pro Ala Ala Ser Gln Leu Glu 576
Ser Asp Thr Ile Ala Asp Leu Pro Asp Pro Thr Val Val Thr Thr Ser 592
Thr Asn Asp Tyr His Asp Val Val Val Val Asp Val Glu Asp Asp Pro 608
Asp Glu Met Ala Val 613
(144) INFORMATION FOR SEQ ID NO:143:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 33 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:143:
GCTGAGGTTC GCAATAAACT AACCATGTTT GTG 33
(145) INFORMATION FOR SEQ ID NO:144:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 31 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:144:
CTCCTTCGGT CCTCCTATCG TTGTCAGAAG T 31
(146) INFORMATION FOR SEQ ID NO:145:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 27 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:145:
TTAGATATCG GGGCCCACCC TAGCGGT 27
(147) INFORMATION FOR SEQ ID NO:146:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 29 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:146:
GGTACCCCCA CAGCCATTTC ATCAGGATC 29
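As an illustrative cross-check of the listing (not part of the original specification), the short Python sketch below translates the SEQ ID NO:143 oligonucleotide and the matching 33-base window of SEQ ID NO:139 with the standard genetic code. The helper name `translate`, the window coordinates (bases 691-723), and the starting residue number 231 are assumptions worked out from the sequences above for illustration only. Under those assumptions the two translations differ at a single residue, consistent with Phe at position 236 of SEQ ID NO:140 and Lys at position 236 of SEQ ID NO:142.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    dna = dna.replace(" ", "").upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Assumed window: bases 691-723 of SEQ ID NO:139, aligned with SEQ ID NO:143.
WINDOW_139 = "GCTGAGGTTC GCAATTTTCT AACCATGTTT GTG"
PRIMER_143 = "GCTGAGGTTC GCAATAAACT AACCATGTTT GTG"
FIRST_RESIDUE = 231  # assumed residue number encoded by the first codon

wild_type = translate(WINDOW_139)
mutant = translate(PRIMER_143)

# Collect (residue number, original, replacement) for every mismatch.
changes = [
    (FIRST_RESIDUE + i, a, b)
    for i, (a, b) in enumerate(zip(wild_type, mutant))
    if a != b
]
print(wild_type, mutant, changes)
```

The same `translate` helper can be pointed at any in-frame row of the nucleotide listings to spot-check the corresponding protein row.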

Claims (31)

1. An isolated polynucleotide encoding a non-endogenous, constitutively activated version of a human G protein-coupled receptor, wherein said polynucleotide comprises a nucleotide sequence selected from the group consisting of: (a) a sequence encoding a polypeptide that comprises the amino acid sequence set forth in SEQ ID NO: 130; (b) the sequence set forth in SEQ ID NO: 129; (c) a sequence having at least about 80% identity to SEQ ID NO: 129 other than a sequence encoding a non-endogenous, constitutively activated version of a human G protein-coupled receptor comprising a valine residue at position 297 of SEQ ID NO: 130; and (d) the sequence of (c) wherein the constitutively activated version of a human G protein-coupled receptor comprises an amino acid sequence having a lysine residue at a position equivalent to position 297 of SEQ ID NO: 130.
2. The isolated polynucleotide according to claim 1 wherein said polynucleotide comprises a nucleotide sequence selected from the group consisting of: a sequence that is identical or substantially identical to SEQ ID NO: 129 wherein the codon at nucleotide positions 889-891 encoding lysine is unchanged or substituted with a codon that encodes an amino acid other than valine; a sequence encoding a constitutively activated version of a human G protein-coupled receptor having an amino acid sequence identical or substantially identical to SEQ ID NO:130 wherein the lysine residue at amino acid position 297 is unchanged or substituted with an amino acid other than valine; and a sequence encoding a variant of a non-endogenous, constitutively activated version of a human G protein-coupled receptor comprising the amino acid sequence set forth in SEQ ID NO: 130 in which the lysine residue at position 297 is substituted for a different amino acid other than valine.
3. An isolated polynucleotide encoding a non-endogenous, constitutively activated version of a human G protein-coupled receptor, wherein said polynucleotide comprises a nucleotide sequence selected from the group consisting of: a sequence encoding the amino acid sequence set forth in SEQ ID NO: 130; and the nucleotide sequence set forth in SEQ ID NO: 129.
4. An isolated polynucleotide encoding a GPCR fusion protein, wherein said polynucleotide comprises a nucleotide sequence of the isolated polynucleotide according to any one of claims 1 to 3.
5. The isolated polynucleotide according to claim 4 further comprising nucleic acid encoding a G protein.
6. The isolated polynucleotide according to claim 5 wherein the G protein is a Gsα protein.
7. A vector comprising the polynucleotide according to any one of claims 1 to 6.
8. The vector of claim 7, wherein said vector is an expression vector and wherein the polynucleotide according to any one of claims 1 to 6 is operably linked to a promoter.
9. A recombinant host cell comprising the vector of claim 7.
10. A recombinant host cell comprising the vector of claim 8.
11. A method of producing a non-endogenous, constitutively activated version of a human G protein-coupled receptor or a GPCR fusion protein comprising the steps of: transfecting the expression vector according to claim 8 into a host cell thereby producing a transfected host cell; and culturing the transfected host cell under conditions sufficient to express a non-endogenous, constitutively activated version of a human G protein-coupled receptor or GPCR fusion protein from the expression vector.
12. The method of claim 11 further comprising obtaining the transfected host cell.
13. The method of claim 12 further comprising obtaining or isolating a membrane fraction from the obtained transfected host cell.
14. An isolated membrane of a transfected host cell obtained by the method of claim 13 wherein said isolated membrane comprises a non-endogenous, constitutively activated version of said human G protein-coupled receptor encoded by the isolated polynucleotide according to any one of claims 1 to 3 or a GPCR fusion protein encoded by the isolated polynucleotide according to any one of claims 4 to 6 or a polypeptide expressed by the vector of claim 8.
15. An isolated membrane of the recombinant host cell of claim 10 wherein said isolated membrane comprises a non-endogenous, constitutively activated version of said human G protein-coupled receptor encoded by the isolated polynucleotide according to any one of claims 1 to 3 or a GPCR fusion protein encoded by the isolated polynucleotide according to any one of claims 4 to 6 or a polypeptide expressed by the vector of claim 8.
16. An isolated or recombinant non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide comprising an amino acid sequence selected from the group consisting of: (a) a sequence comprising the amino acid sequence set forth in SEQ ID NO: 130; (b) a sequence having at least about 80% identity to SEQ ID NO: 130 other than a sequence comprising a valine residue at position 297 of SEQ ID NO: 130; and (c) the sequence of (b) wherein said sequence comprises a lysine residue at a position equivalent to position 297 of SEQ ID NO: 130.
17. The isolated or recombinant non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide according to claim 16 wherein said polypeptide comprises an amino acid sequence selected from the group consisting of: a sequence that is substantially identical to SEQ ID NO:130 wherein the lysine residue at amino acid position 297 is unchanged or substituted with an amino acid other than valine; and a sequence comprising the amino acid sequence set forth in SEQ ID NO: 130 in which the lysine residue at position 297 is substituted for a different amino acid other than valine.
18. An isolated or recombinant non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide comprising the amino acid sequence set forth in SEQ ID NO: 130.
19. An isolated or recombinant GPCR fusion protein comprising an amino acid sequence of the isolated or recombinant non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide according to any one of claims 16 to 18.
20. The isolated or recombinant GPCR fusion protein according to claim 19 further comprising a G protein.
21. The isolated or recombinant GPCR fusion protein according to claim 20 wherein the G protein is a Gsα protein.
22. An isolated membrane of a cell wherein said isolated membrane comprises the isolated or recombinant non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide according to any one of claims 16 to 18 or the isolated or recombinant GPCR fusion protein according to any one of claims 19 to 21.
23. A method of identifying a modulator of a G protein-coupled receptor comprising the steps of: contacting a candidate compound with a recombinant host cell that expresses the non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide according to any one of claims 16 to 18 or an isolated membrane comprising said non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide; and measuring the ability of the compound to inhibit or stimulate functionality of the G protein-coupled receptor polypeptide wherein inhibition or stimulation of said functionality indicates that the candidate compound is a modulator of the G protein-coupled receptor polypeptide.
24. The method of claim 23 further comprising providing the host cell or membrane.
25. The method of claim 23 or 24 wherein the host cell comprises the expression vector of claim 8.
26. The method according to any one of claims 23 to 25 wherein the human G protein-coupled receptor polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 130.
27. A method of identifying a modulator of a G protein-coupled receptor comprising the steps of: contacting a candidate compound with a recombinant host cell that expresses the GPCR fusion protein according to any one of claims 19 to 21 or an isolated membrane comprising said GPCR fusion protein; and measuring the ability of the compound to inhibit or stimulate functionality of the G protein-coupled receptor polypeptide portion of said GPCR fusion protein wherein inhibition or stimulation of said functionality indicates that the candidate compound is a modulator of the G protein-coupled receptor polypeptide.
28. The method of claim 27 further comprising providing the host cell or membrane.
29. The method of claim 26 or 27 wherein the host cell comprises the expression vector of claim 8.
30. The method according to any one of claims 26 to 29 wherein the GPCR fusion protein comprises the amino acid sequence set forth in SEQ ID NO: 130.
31. The method according to any one of claims 26 to 30 wherein the GPCR fusion protein comprises a Gsα protein.
32. A method of identifying a modulator of a G protein-coupled receptor comprising the steps of: providing a recombinant host cell that expresses a GPCR fusion protein comprising the amino acid sequence set forth in SEQ ID NO: 130 and a Gsα protein or an isolated membrane comprising said GPCR fusion protein; contacting a candidate compound with the recombinant host cell or isolated membrane; and measuring the ability of the compound to inhibit or stimulate functionality of the G protein-coupled receptor polypeptide portion of said GPCR fusion protein wherein inhibition or stimulation of said functionality indicates that the candidate compound is a modulator of the G protein-coupled receptor polypeptide.
33. The isolated polynucleotide according to any one of claims 1 to 6 substantially as hereinbefore described with reference to the examples and/or drawings.
34. The vector of claim 7 or 8 substantially as hereinbefore described with reference to the examples and/or drawings.
35. The recombinant host cell of claim 9 or 10 substantially as hereinbefore described with reference to the examples and/or drawings.
36. The method according to any one of claims 11 to 13 substantially as hereinbefore described with reference to the examples and/or drawings.
37. The isolated membrane according to any one of claims 14, 15, or 22 substantially as hereinbefore described with reference to the examples and/or drawings.
38. The isolated or recombinant non-endogenous, constitutively activated version of a human G protein-coupled receptor polypeptide according to any one of claims 16 to 18 substantially as hereinbefore described with reference to the examples and/or drawings.
39. The isolated or recombinant GPCR fusion protein according to any one of claims 19 to 21 substantially as hereinbefore described with reference to the examples and/or drawings.
40. The method according to any one of claims 23 to 32 substantially as hereinbefore described with reference to the examples and/or drawings.
Dated this twenty-eighth day of October 2003
Arena Pharmaceuticals, Inc.
Patent Attorneys for the Applicant: F B RICE CO
AU62991/99A 1998-10-12 1999-10-13 Non-endogenous, constitutively activated human G protein-coupled receptors Ceased AU770871B2 (en)

Applications Claiming Priority (63)

Application Number Priority Date Filing Date Title
US09/170496 1998-10-13
US09/170,496 US6555339B1 (en) 1997-04-14 1998-10-13 Non-endogenous, constitutively activated human protein-coupled receptors
US10802998P 1998-11-12 1998-11-12
US60/108029 1998-11-12
US10921398P 1998-11-20 1998-11-20
US60/109213 1998-11-20
US11006098P 1998-11-27 1998-11-27
US60/110060 1998-11-27
US12041699P 1999-02-16 1999-02-16
US60/120416 1999-02-16
US12185299P 1999-02-26 1999-02-26
US60/121852 1999-02-26
US12394899P 1999-03-12 1999-03-12
US12394699P 1999-03-12 1999-03-12
US12394599P 1999-03-12 1999-03-12
US12395199P 1999-03-12 1999-03-12
US12394499P 1999-03-12 1999-03-12
US12394999P 1999-03-12 1999-03-12
US60/123946 1999-03-12
US60/123948 1999-03-12
US60/123945 1999-03-12
US60/123949 1999-03-12
US60/123944 1999-03-12
US60/123951 1999-03-12
US13643999P 1999-05-28 1999-05-28
US13643799P 1999-05-28 1999-05-28
US13713199P 1999-05-28 1999-05-28
US13756799P 1999-05-28 1999-05-28
US13712799P 1999-05-28 1999-05-28
US13643699P 1999-05-28 1999-05-28
US60/137127 1999-05-28
US60/136439 1999-05-28
US60/136567 1999-05-28
US60/136436 1999-05-28
US60/137131 1999-05-28
US60/136437 1999-05-28
US14144899P 1999-06-29 1999-06-29
US60/141448 1999-06-29
US15111499P 1999-08-27 1999-08-27
US60/151114 1999-08-27
US15252499P 1999-09-03 1999-09-03
US60/152524 1999-09-03
US15665399P 1999-09-29 1999-09-29
US15663399P 1999-09-29 1999-09-29
US15655599P 1999-09-29 1999-09-29
US15663499P 1999-09-29 1999-09-29
US60/156634 1999-09-29
US60/156555 1999-09-29
US60/156633 1999-09-29
US60/156653 1999-09-29
US15729399P 1999-10-01 1999-10-01
US15729499P 1999-10-01 1999-10-01
US15728299P 1999-10-01 1999-10-01
US15728199P 1999-10-01 1999-10-01
US15728099P 1999-10-01 1999-10-01
US60/157280 1999-10-01
US60/157293 1999-10-01
US60/157281 1999-10-01
US60/157282 1999-10-01
US60/157294 1999-10-01
US09/416760 1999-10-12
US09/417044 1999-10-12
PCT/US1999/024065 WO2000022131A2 (en) 1998-10-13 1999-10-13 Non-endogenous, constitutively activated human g protein-coupled receptors

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2004202476A Division AU2004202476A1 (en) 1998-10-12 2004-06-03 Human G Protein-Coupled Receptors

Publications (2)

Publication Number Publication Date
AU6299199A AU6299199A (en) 2000-05-01
AU770871B2 AU770871B2 (en) 2004-03-04

Family

ID=31999975

Family Applications (2)

Application Number Title Priority Date Filing Date
AU62991/99A Ceased AU770871B2 (en) 1998-10-12 1999-10-13 Non-endogenous, constitutively activated human G protein-coupled receptors
AU2004202476A Abandoned AU2004202476A1 (en) 1998-10-12 2004-06-03 Human G Protein-Coupled Receptors

Family Applications After (1)

Application Number Title Priority Date Filing Date
AU2004202476A Abandoned AU2004202476A1 (en) 1998-10-12 2004-06-03 Human G Protein-Coupled Receptors

Country Status (1)

Country Link
AU (2) AU770871B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2007202139B2 (en) * 1998-11-20 2009-05-21 Arena Pharmaceuticals, Inc. Human Orphan G protein-coupled receptor hRUP3

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2298803A3 (en) * 1999-09-24 2011-06-22 Abbott Healthcare Products B.V. Human G-protein coupled receptor and ligands thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GENBANK ACCESSION NO. AF034632 *
GENOMICS 1997 46(3) MCKEE ET AL 426-34 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2007202139B2 (en) * 1998-11-20 2009-05-21 Arena Pharmaceuticals, Inc. Human Orphan G protein-coupled receptor hRUP3
AU2007202139B8 (en) * 1998-11-20 2009-09-17 Arena Pharmaceuticals, Inc. Human Orphan G protein-coupled receptor hRUP3

Also Published As

Publication number Publication date
AU2004202476A1 (en) 2004-07-01
AU6299199A (en) 2000-05-01

Similar Documents

Publication Publication Date Title
KR100926208B1 (en) Human Orphan G Protein-Coupled Receptors
AU2005244540B2 (en) Endogenous and non-endogenous versions of human G protein-coupled receptors
CA2348688A1 (en) Non-endogenous, constitutively activated human g protein-coupled receptors
US20030175891A1 (en) Human orphan G protein-coupled receptors
EP1121431A1 (en) Non-endogenous, constitutively activated human g protein-coupled receptors
AU770871B2 (en) Non-endogenous, constitutively activated human G protein-coupled receptors
US20030018182A1 (en) Non-endogenous, constitutively activated human G protein-coupled receptors
US20030229216A1 (en) Constitutively activated human G protein coupled receptors
MXPA01003726A (en) Non-endogenous, constitutively activated human g protein-coupled receptors
CN101597608A (en) With human g-protein coupled orphan receptor
MXPA01005021A (en) Human orphan g protein-coupled receptors
EP2264068A1 (en) Non-endogenous, constitutively activated human G protein-coupled receptors
EP1137776A2 (en) Non-endogenous, constitutively activated human g protein-coupled receptors

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)
DA3 Amendments made section 104

Free format text: THE NATURE OF THE AMENDMENT IS: AMEND INVENTORS NAMES TO READ: CHEN W LIAW, DEREK T CHALMERS, AND DOMINIC P BEHAN AMEND FIVE PRIORITY DETAILS TO READ: US 19990528 60/136567, US 19990629 60/141448, US 19990929 60/156653, US 19981012 09/417044 AND US 19981012 09/416760