MXPA97002205A

MXPA97002205A - Human galactokinase gene

Info

Publication number: MXPA97002205A
Application number: MXPA/A/1997/002205A
Authority: MX
Inventors: Jon Bergsma Derk; Edward Stambolian Dwight
Original assignee: Smithkline Beecham Corporation; University Of Pennsylvania
Priority date: 1994-09-23
Filing date: 1997-03-20
Publication date: 1997-06-01

Abstract

The present invention relates to human galactokinase and to the identification of galactokinase mutations, a missense mutation and a nonsense mutation, as well as to nucleic acids encoding them, to a recombinant host cell transformed with DNA coding for such proteins, and to uses of the expressed proteins and nucleic acid sequences in therapeutic and diagnostic applications.

Description

HUMAN GALACTOKINASE GENE

The present invention was made in part with government support under grant EY-09404 awarded by the National Institutes of Health. The U.S. government has certain rights in the invention.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of application Ser. No. PCT/US94/10825, filed on September 23, 1994.

FIELD OF THE INVENTION

The present invention relates to human galactokinase and to the identification of galactokinase mutations, a missense mutation and a nonsense mutation, as well as to isolated nucleic acids encoding them, to a recombinant host cell transformed with DNA coding for such proteins, and to uses of the expressed proteins and nucleic acid sequences in therapeutic and diagnostic applications.

BACKGROUND OF THE INVENTION

There are numerous inherited diseases of human metabolism, most of which are recessive. Many have devastating effects that may include a combination of several clinical features, such as severe mental retardation, deterioration of the peripheral nervous system, blindness, hearing impairment and organomegaly. Most of these diseases are rare. However, most of them cannot be treated with medication.

Galactokinase deficiency is one of the three known forms of galactosemia. The other forms are galactose-1-phosphate uridyltransferase deficiency and UDP-galactose-4-epimerase deficiency. These three enzymes are involved in the metabolism of galactose, i.e., the conversion of galactose to glucose in the body. Galactokinase deficiency is inherited as an autosomal recessive trait with a heterozygote frequency estimated at 0.2% in the general population (see, e.g., Levy et al., J. Pediatr., 92: 871-877 (1978)). Patients with homozygous galactokinase deficiency usually become symptomatic in early infancy, showing galactosemia, galactosuria, increased galactitol levels, cataracts and, in a few cases, mental retardation (Segal et al., J. Pediatr., 95: 750-752 (1979)). These symptoms generally improve dramatically on a galactose-free diet. Heterozygotes for galactokinase deficiency tend to present with cataracts beginning between 20 and 50 years of age (Stambolian et al., Invest. Ophthal. Vis. Sci., 27: 429-433 (1986)).
Galactokinase activity has been found in several mammalian tissues, including liver, kidney, brain, lens, placenta, erythrocytes and leukocytes. While the protein has been purified from E. coli, purification of the protein from mammalian tissues has proven difficult because of its low cellular concentration. In addition, the molecular basis of galactokinase deficiency is unknown.

The present invention provides a human galactokinase gene. The DNAs of the present invention, such as the specific sequences described herein, are useful because they encode the genetic information required for expression of this protein. Additionally, the sequences can be used as probes for isolating and identifying additional members of the family, type and/or subtype, as well as mutations that may form the basis of galactokinase deficiency, which can be characterized by site-specific mutations or by atypical expression of the galactokinase gene. The galactokinase gene is also useful as a diagnostic agent for identifying mutant galactokinase proteins, or as a therapeutic agent via gene therapy.

The first clinical trials of gene therapy began in 1990. Since that time, more than 70 clinical trial protocols have been reviewed and approved by a regulatory authority such as the Recombinant DNA Advisory Committee (RAC) of the NIH (National Institutes of Health); see, e.g., Anderson, W.F., Human Gene Therapy, 5: 281-282 (1994). The therapeutic treatment of diseases and disorders by gene therapy involves the transfer and stable insertion of new genetic information into cells. The correction of a genetic defect by reintroduction of the normal allele of a gene has thus demonstrated that this concept is clinically feasible (see, e.g., Rosenberg et al., New Engl. J. Med., 323: 570 (1990)).

These and other uses for the reagents described herein will become apparent to those of ordinary skill in the art upon reading this specification.

BRIEF DESCRIPTION OF THE INVENTION

The present invention provides isolated nucleic acid molecules that code for human galactokinase, as well as nucleic acid molecules that encode missense and nonsense mutations, which include mRNAs and DNAs (e.g., cDNA, genomic DNA, etc.), as well as antisense analogs thereof and diagnostically and therapeutically useful fragments thereof.

The present invention also provides recombinant vectors, such as cloning and expression plasmids useful as reagents in the recombinant production of galactokinase proteins, as well as recombinant prokaryotic and/or eukaryotic host cells comprising a human galactokinase nucleic acid sequence.

The present invention also provides a method for preparing human galactokinase proteins, which comprises culturing recombinant prokaryotic and/or eukaryotic host cells containing a human galactokinase nucleic acid sequence under conditions that promote expression of said protein, and subsequently recovering said protein. A related aspect of the present invention is the isolated human galactokinase proteins produced by said method. In yet another aspect, the present invention provides antibodies that are directed to (e.g., bind) galactokinase proteins.

The present invention also provides isolated human galactokinase proteins having a missense or nonsense mutation, and antibodies (monoclonal or polyclonal) that are specifically reactive with said proteins.

The present invention also provides nucleic acid probes and PCR primers comprising nucleic acid molecules of sufficient length to hybridize to human galactokinase sequences.
The present invention also provides a method for diagnosing human galactokinase deficiency by isolating a nucleic acid sample from an individual, testing the sequence of said sample against the reference gene of the present invention, and comparing differences between said sample and the nucleic acid of the instant invention, wherein said differences indicate mutations in the human galactokinase gene isolated from the individual. The sample can be tested by direct sequence comparison (e.g., DNA sequencing); by hybridization of the sample nucleic acid to the galactokinase reference gene (e.g., mobility shift assays such as heteroduplex gel electrophoresis or SSCP, or other techniques such as Northern or Southern blot analysis, which are based on the length of the nucleic acid sequence); or by other known gel electrophoresis methods such as RFLP (e.g., restriction endonuclease digestion of a sample amplified by PCR (for DNA) or RT-PCR (reverse transcription PCR) (for RNA)). Alternatively, the diagnostic method comprises isolating cells from an individual containing genomic DNA and testing said sample (e.g., cellular RNA) by in situ hybridization using as a probe the DNA sequence of the present invention, or at least one exon, or a fragment containing at least 15, preferably 18, and more preferably 21 contiguous base pairs.

The present invention also provides an antisense oligonucleotide having a sequence capable of binding to mRNAs coding for human galactokinase in order to identify mutant galactokinase genes.

The present invention also provides yet another method for diagnosing human galactokinase deficiency, which comprises obtaining a serum or tissue sample; allowing said sample to come into contact with an antibody or antibody fragment that binds specifically to a human galactokinase mutant protein of the present invention under conditions such that an antigen-antibody complex is formed between said antibody (or antibody fragment) and said galactokinase mutant protein; and detecting the presence or absence of said complex.

The present invention also provides non-human transgenic animals comprising a nucleic acid molecule encoding human galactokinase. Methods for the use of such transgenic animals as models for disease states, mutation and SAR are also provided.

The present invention also provides a method for treating conditions related to insufficient human galactokinase activity, which comprises administering to a patient in need thereof a pharmaceutical composition containing the galactokinase protein of the invention in an amount effective to supplement the patient's endogenous galactokinase and thereby alleviate said condition.

The present invention also provides a method for treating conditions related to insufficient human galactokinase activity via gene therapy. An additional, or reference, gene comprising the non-mutant galactokinase gene of the instant invention is inserted into the cells of a patient, with the result that the protein encoded by the reference gene corrects the defect (e.g., galactokinase deficiency), thus allowing the transfected cells to function normally and alleviating the disease conditions (or symptoms).
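The direct sequence comparison mentioned in this summary (testing a sample against the reference gene and reporting differences) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_differences` is hypothetical, the two toy sequences are invented for the example and are not taken from the sequence listing, and the sample is assumed to be already aligned to the reference with no insertions or deletions.

```python
def find_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every mismatch.

    Assumption: both sequences are pre-aligned and of equal length, so only
    single-base substitutions are reported (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length, pre-aligned sequences")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1
        )
        if ref_base != sample_base
    ]


# Toy sequences for illustration only (hypothetical, not from the patent).
reference = "ATGGCTGCTTTGAGA"
sample = "ATGGCTACTTTGAGA"

for position, ref_base, sample_base in find_differences(reference, sample):
    print(f"position {position}: {ref_base} -> {sample_base}")
```

In practice a sample would first be aligned to the reference by a dedicated alignment tool; the sketch covers only the final comparison step.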

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the intron/exon organization of the human galactokinase gene.

Figure 2 is the genomic DNA sequence (with single-letter amino acid abbreviations) of human galactokinase [SEQ ID NO: 7]. The DNA sequence in bold corresponds to the exon regions, while the sequence in normal typeface corresponds to the intron regions of human galactokinase.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to human galactokinase (amino acid and nucleotide sequences) and its use as a diagnostic and therapeutic agent. The particular cDNA and amino acid sequence of human galactokinase are identified by SEQ ID NO: as is more fully described below. The present invention also relates to the genomic DNA sequence for human galactokinase [SEQ ID NO:] and to mutant human galactokinase gene and amino acid sequences [SEQ ID NOS: 5 and 6] and their use for diagnostic purposes.

In order to describe the present invention more fully, the following additional terms will be employed, and are intended to be defined as indicated below.

An "antigen" refers to a molecule that contains one or more epitopes that will stimulate a host's immune system to make a humoral and/or cellular response specific for the antigen. The term is also used interchangeably with "immunogen".

The term "epitope" refers to the site on an antigen or hapten to which a specific antibody molecule binds. The term is also used interchangeably with "antigenic determinant" or "antigenic determinant site".

A coding sequence is "operably linked" to another coding sequence when RNA polymerase transcribes the two coding sequences into a single mRNA, which is then translated into a single polypeptide having amino acids derived from both coding sequences. The coding sequences need not be contiguous with each other as long as the expressed sequence is ultimately processed to produce the desired protein.

"Recombinant" polypeptides refer to polypeptides produced by recombinant DNA techniques; e.g., produced from cells transformed by an exogenous DNA construct encoding the desired polypeptide. "Synthetic" polypeptides are those prepared by chemical synthesis.

A "replicon" is any genetic element (e.g., plasmid, chromosome, virus) that functions as an autonomous unit of DNA replication in vivo; e.g., capable of replication under its own control.

A "vector" is a replicon, such as a plasmid, phage or cosmid, to which another DNA segment can be attached so as to bring about replication of the attached segment.

A "replication-deficient virus" is a virus in which the excision and/or replication functions have been altered such that, after transfection into a host cell, the virus is not able to reproduce and/or infect additional cells.

A "reference" gene refers to the galactokinase sequence of the present invention and is understood to include the various sequence polymorphisms that exist, in which nucleotide substitutions in the gene sequence do not affect the essential function of the gene product.

A "mutant" gene refers to galactokinase sequences different from those of the reference gene, in which nucleotide substitutions and/or deletions and/or insertions result in impairment of the essential function of the gene product such that the levels of galactose in an individual (or patient) are atypically elevated. For example, a G-to-A substitution at position 122 of human galactokinase [SEQ ID NO: 5] is a missense mutation associated with patients who are galactokinase deficient. Another substitution, G to T, produces an in-frame nonsense codon at amino acid position 80 of the mature protein. The result is a truncated protein consisting of the first 79 amino acids of human galactokinase.

A "DNA coding sequence" or a "nucleotide sequence encoding" a particular protein is a DNA sequence that is transcribed and translated into a polypeptide when placed under the control of appropriate regulatory sequences.

A "promoter sequence" is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence.
For purposes of defining the present invention, the promoter sequence is bounded at its 3' terminus by the translation initiation codon (e.g., ATG) of a coding sequence and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site (conveniently defined by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CAT" boxes. Prokaryotic promoters contain Shine-Dalgarno sequences in addition to the -10 and -35 consensus sequences.

"DNA control sequences" refers collectively to promoter sequences, ribosome binding sites, polyadenylation signals, transcription termination sequences, upstream regulatory domains, enhancers, and the like, which collectively provide for the expression (e.g., transcription and translation) of a coding sequence in a host cell.

A control sequence "directs the expression" of a coding sequence in a cell when RNA polymerase binds to the promoter sequence and transcribes the coding sequence into mRNA, which is then translated into the polypeptide encoded by the coding sequence.

A "host cell" is a cell that has been transformed or transfected, or is capable of transformation or transfection, by an exogenous DNA sequence.

A cell has been "transformed" by exogenous DNA when such exogenous DNA has been introduced inside the cell membrane. The exogenous DNA may or may not be integrated (covalently linked) into the chromosomal DNA making up the genome of the cell. In prokaryotes and yeasts, for example, the exogenous DNA can be maintained on an episomal element, such as a plasmid.
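The distinction drawn in the definition of a "mutant" gene, between a missense substitution and an in-frame nonsense codon that truncates the protein, can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the helper names `classify_substitution` and `translate` are hypothetical, the codons and the toy coding sequence are invented and are not taken from SEQ ID NO: 5 or 6, and only the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop (nonsense) codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a substitution within one sense codon as silent, missense or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


def translate(coding_sequence: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    sequence = coding_sequence.upper()
    protein = []
    for start in range(0, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE[sequence[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical codons: a G-to-A change giving a different amino acid,
# and a G-to-T change creating a stop codon.
print(classify_substitution("GTG", "ATG"))  # Val -> Met
print(classify_substitution("GAG", "TAG"))  # Glu -> stop

# Hypothetical 4-codon coding sequence and the same sequence with a stop at codon 3.
print(translate("ATGGTGGAGCTGTAA"))
print(translate("ATGGTGTAGCTGTAA"))
```

In the toy example a stop codon introduced at codon 3 leaves a product of only the first 2 residues, which mirrors how a nonsense codon at amino acid position 80 leaves a truncated protein of the first 79 amino acids.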
With respect to eukaryotic cells, a stably transformed or transfected cell is one in which the exogenous DNA has become integrated into the chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones comprising a population of daughter cells containing the exogenous DNA.

"Transfection" or "transfected" refers to a process by which cells take up foreign DNA and integrate it into their chromosome. Transfection can be accomplished, for example, by various techniques in which cells take up DNA (e.g., calcium phosphate precipitation, electroporation, liposome-mediated uptake, etc.) or by infection, in which viruses are used to transfer DNA into cells.

A "target cell" is a cell (or cells) that is selectively transfected over other cell types (or cell lines).

A "clone" is a population of cells derived from a single cell or common ancestor by mitosis. A "cell line" is a clone of a primary cell that is capable of stable growth in vitro for many generations.

A "heterologous" region of a DNA construct is an identifiable segment of DNA within or attached to another DNA molecule that is not found in association with the other molecule in nature. Thus, when the heterologous region encodes a gene, the gene will usually be flanked by DNA that does not flank the gene in the genome of the source animal. Another example of a heterologous coding sequence is a construct in which the coding sequence itself is not found in nature (e.g., synthetic sequences having codons different from the native gene). Allelic variation or naturally occurring mutational events do not give rise to a heterologous region of DNA as used herein.

"Conditions related to insufficient human galactokinase activity" or a "deficiency in galactokinase activity" means mutations of the galactokinase protein that affect galactokinase activity, or that may affect galactokinase expression, or both, such that a patient's galactose levels are atypically elevated. This definition is also intended to cover atypically low levels of galactokinase expression in a patient due to defective control sequences for the reference galactokinase protein.

The present invention provides an isolated nucleic acid molecule encoding a human galactokinase protein, and substantially similar sequences. Isolated nucleic acid sequences are "substantially similar" if: (i) they are approximately the same length (e.g., at least 80% of the coding region of SEQ ID NO: 4); (ii) they code for a protein with the same galactokinase activity (e.g., within an order of magnitude) as the protein encoded by SEQ ID NO: 4; and (iii) they are capable of hybridizing under moderately stringent conditions to SEQ ID NO: 4; or if they are DNA sequences that are degenerate with respect to SEQ ID NO: 4. Degenerate DNA sequences encode the same amino acid sequence as SEQ ID NO: 4, but have variation(s) in the nucleotide coding sequences.

Hybridization under moderately stringent conditions can be carried out as follows. Nitrocellulose filters are prehybridized at 65 °C in a solution containing 6X SSPE, 5X Denhardt's solution (10.0 g Ficoll, 10.0 g BSA and 10.0 g polyvinylpyrrolidone per liter of solution), 0.05% SDS and 100 micrograms of tRNA. Hybridization probes are labeled, preferably with radioactive labels (e.g., using the Bios TAG-IT kit). Hybridization is then carried out for about 18 hours at 65 °C. The filters are washed in a solution of 2X SSC and 0.5% SDS at room temperature for 15 minutes (repeated once).
Subsequently, the filters are washed at 58 °C, air dried and exposed to X-ray film overnight at -70 °C with an intensifying screen.

Alternatively, sequences are "substantially similar" when 66% (preferably 75%, and more preferably 90%) of the nucleotides or amino acids match over a defined length of the molecule (e.g., at least 80% of the coding region of SEQ ID NO: 4) and the protein encoded by such a sequence has the same galactokinase activity (e.g., within an order of magnitude) as the protein encoded by SEQ ID NO: 4. As used herein, "substantially similar" refers to sequences that have similar identity to the sequences of the instant invention. Thus, nucleotide sequences that are substantially the same can be identified by hybridization or by sequence comparison. Protein sequences that are substantially the same can be identified by one or more of the following: proteolytic digestion, gel electrophoresis and/or microsequencing.

The present invention also provides nucleic acid molecules that encode a missense mutation (SEQ ID NO: 5) or a nonsense mutation (SEQ ID NO: 6) of the human galactokinase protein, and DNA sequences that are degenerate with respect to SEQ ID NO: 5 or 6. Degenerate DNA sequences encode the same amino acid sequence (or termination site) as SEQ ID NO: 5 or 6, but have variation(s) in the nucleotide coding sequences.

One means of isolating a nucleic acid molecule encoding a human galactokinase is to probe a human genomic DNA or cDNA library with a natural or artificially designed probe using art-recognized procedures (see, for example, "Current Protocols in Molecular Biology", Ausubel, F.M., et al. (Eds.), Greene Publishing Assoc. and John Wiley Interscience, New York, 1989, 1992). One skilled in the art will appreciate that SEQ ID NO: 4, or fragments thereof (comprising at least 15 contiguous nucleotides), is a particularly useful probe.
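The percent-match form of the "substantially similar" definition above (66%, 75% or 90% of positions matching over at least 80% of the coding region) lends itself to a small worked example. The Python sketch below is an assumption-laden illustration, not a method stated in the specification: the function names are hypothetical, the toy sequences are invented, the two sequences are assumed to be pre-aligned without gaps, and the galactokinase activity criterion, which must be measured experimentally, is not modelled.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percentage of matching positions between two pre-aligned, equal-length sequences."""
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty, pre-aligned and of equal length")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_sequence_criteria(
    candidate_aligned: str,
    reference_aligned: str,
    reference_coding_length: int,
    identity_threshold: float = 66.0,
    min_coverage: float = 0.80,
) -> bool:
    """Check only the sequence-based part of the definition.

    The compared region must cover at least `min_coverage` of the reference
    coding region and match at `identity_threshold` percent or better.
    Enzyme activity is outside the scope of this sketch.
    """
    coverage = len(reference_aligned) / reference_coding_length
    identity = percent_identity(candidate_aligned, reference_aligned)
    return coverage >= min_coverage and identity >= identity_threshold


# Toy 10-base example (hypothetical): 8 of 10 positions match.
reference_region = "ACGTACGTAC"
candidate_region = "ACGTACGTTT"

print(percent_identity(candidate_region, reference_region))
for threshold in (66.0, 75.0, 90.0):
    print(threshold, meets_sequence_criteria(candidate_region, reference_region, 10, threshold))
```

With these toy inputs the candidate satisfies the 66% and 75% thresholds but not the 90% threshold. Real comparisons would rely on an alignment program that handles gaps.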
Several probes particularly useful for this purpose are set forth in Table 1, as are hybridizable fragments thereof (e.g., comprising at least 15 contiguous nucleotides). It is also appreciated that such probes can be, and preferably are, labeled with an analytically detectable reagent to facilitate identification of the probe. Useful reagents include, but are not limited to, radioactivity, fluorescent dyes or enzymes capable of catalyzing the formation of a detectable product. The probes are thus capable of isolating complementary copies of genomic DNA, cDNA or RNA from human, mammalian or other animal sources, or of screening such sources for related sequences (e.g., additional members of the family, type and/or subtype), including the transcriptional regulatory and control elements defined above as well as other stability, processing, translation and tissue specificity-determining regions from the 5' and/or 3' regions relative to the coding sequences described in the present invention.

The present invention also contemplates gene therapy. "Gene therapy" means gene supplementation. That is, an additional (e.g., reference) copy of the gene of interest is inserted into the patient's cells. As a result, the protein encoded by the reference gene corrects the defect (e.g., galactokinase deficiency) and allows the cells to function normally, thereby alleviating the disease symptoms.

The gene therapy of the present invention can occur in vivo or ex vivo. Ex vivo gene therapy requires the isolation and purification of patient cells, the introduction of a therapeutic gene, and the introduction of the genetically altered cells back into the patient. A replication-deficient virus such as a modified retrovirus can be used to introduce the therapeutic gene (galactokinase) into such cells. For example, Moloney murine leukemia virus (MoMLV) is a well-known vector in clinical gene therapy trials (see, e.g., Boris-Lawrie et al., Curr. Opin. Genet. Dev., 3: 102-109 (1993)).
In contrast, in vivo gene therapy does not require isolation and purification of patient cells. The therapeutic gene is typically "packaged" for administration to a patient in liposomes or in a replication-deficient virus such as adenovirus (see, e.g., Berkner, K.L., Curr. Top. Microbiol. Immunol., 158: 39-66 (1992)) or adeno-associated virus (AAV) vectors (see, e.g., Muzyczka, N., Curr. Top. Microbiol. Immunol., 158: 97-129 (1992) and US Patent 5,252,479, "Safe Vector for Gene Therapy"). Another approach is the administration of so-called "naked DNA", in which the therapeutic gene is injected directly into the bloodstream or muscle tissue.

Cell types useful for the gene therapy of the present invention include hepatocytes, fibroblasts, lymphocytes, any cell of the eye (e.g., retina), and epithelial and endothelial cells. Preferably the cells are hepatocytes, any cells of the eye, or respiratory (or pulmonary) epithelial cells. Transfection of pulmonary epithelial cells can occur via inhalation of a nebulized preparation of DNA vectors in liposomes, DNA-protein complexes or replication-deficient adenoviruses (see, e.g., US Patent 5,240,846, "Gene Therapy Vector for Cystic Fibrosis").

The present invention also contemplates a process for preparing human galactokinase proteins. The non-mutant proteins are defined with reference to the amino acid sequence listed in SEQ ID NO: 4 and include variants that have a substantially similar amino acid sequence or that have the same galactokinase activity. Additional proteins of the present invention include the mutant human galactokinase proteins set forth in SEQ ID NO: 5 or 6. The proteins of the present invention are preferably made by recombinant genetic engineering techniques. The isolated nucleic acids, particularly the DNAs, can be introduced into expression vectors by operatively linking the DNA to the expression control regions (e.g., regulatory regions) required for gene expression.
Vectors can be introduced into appropriate host cells such as prokaryotic (e.g., bacterial) or eukaryotic (e.g., yeast or mammalian) cells by methods well known in the art (Ausubel et al., supra). The coding sequences for the desired proteins, once prepared or isolated, can be cloned into any suitable vector or replicon. Numerous cloning vectors are known to those skilled in the art, and the selection of an appropriate cloning vector is a matter of choice. Examples of recombinant DNA vectors for cloning and the host cells that they can transform include, but are not limited to, bacteriophage lambda (E. coli), pBR322 (E. coli), pACYC177 (E. coli), pKT230 (gram-negative bacteria), pGV1106 (gram-negative bacteria), pLAFR1 (gram-negative bacteria), pME290 (non-E. coli gram-negative bacteria), pHV14 (E. coli and Bacillus subtilis), pBD9 (Bacillus), pIJ61 (Streptomyces), pUC6 (Streptomyces), YIp5 (Saccharomyces), a baculovirus insect cell system, a Drosophila insect system, and YCp19 (Saccharomyces). See, generally, "DNA Cloning": Vols. I & II, Glover et al. ed., IRL Press, Oxford (1985) (1987); and T. Maniatis et al. ("Molecular Cloning", Cold Spring Harbor Laboratory (1982)).

The gene can be placed under the control of a promoter, a ribosome binding site (for bacterial expression) and, optionally, an operator (collectively referred to herein as "control" elements), so that the DNA sequence encoding the desired protein is transcribed into RNA in the host cell transformed by a vector containing this expression construct. The coding sequence may or may not contain a signal peptide or leader sequence. The subunit antigens of the present invention can be expressed using, for example, the E. coli tac promoter or the protein A gene (spa) promoter and signal sequence. Leader sequences can be removed by the bacterial host in post-translational processing. See, e.g., U.S. Patent Nos. 4,431,739; 4,425,437; 4,338,397.

In addition to control sequences, it may be desirable to add regulatory sequences that allow expression of the protein sequences to be regulated relative to the growth of the host cell. Regulatory sequences are known to those skilled in the art, and examples include those that cause the expression of a gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. Other types of regulatory elements may also be present in the vector, for example, enhancer sequences.

An expression vector is constructed so that the particular coding sequence is located in the vector with the appropriate regulatory sequences, the positioning and orientation of the coding sequence with respect to the control sequences being such that the coding sequence is transcribed under the "control" of the control sequences (e.g., RNA polymerase that binds to the DNA molecule at the control sequences transcribes the coding sequence). Modification of the sequences encoding the particular antigen of interest may be desirable to achieve this end. For example, in some cases it may be necessary to modify the sequence so that it can be attached to the control sequences in the proper orientation; e.g., to maintain the reading frame. The control sequences and other regulatory sequences may be ligated to the coding sequence prior to insertion into a vector, such as the cloning vectors described above. Alternatively, the coding sequence can be cloned directly into an expression vector that already contains the control sequences and an appropriate restriction site.

In some cases, it may be desirable to produce other mutants or analogs of the galactokinase protein. Mutants or analogs can be prepared by deleting a portion of the sequence encoding the protein, by inserting a sequence, and/or by substituting one or more nucleotides within the sequence. Techniques for modifying nucleotide sequences, such as site-directed mutagenesis, are well known to those skilled in the art. See, e.g., T. Maniatis et al., supra; DNA Cloning, Vols. I and II, supra; Nucleic Acid Hybridization, supra.

A number of prokaryotic expression vectors are known in the art. See, e.g., U.S. Patent Nos. 4,578,355; 4,440,859; 4,436,815; 4,431,740; 4,431,739; 4,428,941; 4,425,437; 4,418,149; 4,411,994; 4,366,246; 4,342,832; see also British patent applications GB 2,121,054; GB 2,008,123; GB 2,007,675; and European application 103,395. Yeast expression vectors are also known in the art. See, e.g., U.S. Patent Nos. 4,446,235; 4,443,539; 4,430,428; see also European patent applications 103,409; 100,561; 96,491. pSV2neo (as described in J. Mol. Appl. Genet., 1: 327-341) uses the SV40 late promoter to drive expression in mammalian cells, and pCDNA1neo, a vector derived from pCDNA1 (Mol. Cell Biol., 7: 4125-29), uses the CMV promoter to drive expression. Both of these latter two vectors can be used for transient or stable (using G418 resistance) expression in mammalian cells. Insect cell expression systems, e.g., Drosophila, are also useful; see, for example, PCT applications WO 90/06358 and WO 92/06212 as well as EP 290,261-B1.

Depending on the expression system and host selected, the proteins of the present invention are produced by culturing host cells transformed by an expression vector described above under conditions whereby the protein of interest is expressed. Preferred mammalian cells include human embryonic kidney cells (HEK-293), monkey kidney fibroblast cells (COS), Chinese hamster ovary (CHO) cells, Drosophila or murine L cells. If the expression system secretes the protein into the culture medium, the protein can be purified directly from the medium. If the protein is not secreted, it is isolated from cell lysates or recovered from cell membrane fractions. The selection of appropriate culture conditions and recovery methods is within the skill of the art.

An alternative method for identifying proteins of the present invention is to construct gene libraries, use the resulting clones to transform E. coli, and pool and screen individual colonies using polyclonal serum or monoclonal antibodies to galactokinase.

The proteins of the present invention can also be produced by chemical synthesis, such as solid-phase peptide synthesis, using known amino acid sequences or amino acid sequences derived from the DNA sequence of the genes of interest. Such methods are known to those skilled in the art. Chemical synthesis of peptides is not particularly preferred.

The proteins of the present invention, or fragments thereof comprising at least one epitope, can be used to produce polyclonal and monoclonal antibodies. If polyclonal antibodies are desired, a selected mammal (e.g., mouse, rabbit, goat, horse, etc.) is immunized with a protein of the present invention, or a fragment thereof, capable of eliciting an immune response (e.g., having at least one epitope). Serum from the immunized animal is collected and treated according to known procedures. If serum containing polyclonal antibodies is used, the polyclonal antibodies can be purified by immunoaffinity chromatography or other known methods.

Monoclonal antibodies against the proteins of the present invention, and against fragments thereof, can also be readily produced by one skilled in the art. The general methodology for making monoclonal antibodies using hybridoma technology is well known. Immortal antibody-producing cell lines can be created by cell fusion, and also by other techniques such as direct transformation of B lymphocytes with oncogenic DNA, or transfection with Epstein-Barr virus. See, e.g., M. Schreier et al., "Hybridoma Techniques" (1980); Hammerling et al., "Monoclonal Antibodies and T-cell Hybridomas" (1981); Kennett et al., "Monoclonal Antibodies" (1980); see also U.S. Patent Nos. 4,341,761; 4,399,121; 4,427,783; 4,444,887; 4,452,570; 4,466,917; 4,472,500; 4,491,632 and 4,493,890. Panels of monoclonal antibodies produced against the antigen of interest, or a fragment thereof, can be screened for various properties; e.g., isotype, epitope, affinity, etc. Accordingly, one skilled in the art can produce monoclonal antibodies specifically reactive with mutant galactokinase proteins, e.g., the missense mutation of SEQ ID NO: 5 or the nonsense mutation of SEQ ID NO: 6. Monoclonal antibodies are useful in the purification, using immunoaffinity techniques, of the individual antigens against which they are directed. Alternatively, the genes encoding the monoclonal antibodies of interest can be isolated from the hybridomas by PCR techniques known in the art, and cloned and expressed in appropriate vectors. The antibodies of the present invention, whether polyclonal or monoclonal, have additional utility in that they can be used as reagents in immunoassays, RIAs, ELISAs and the like.

As used herein, "monoclonal antibody" is understood to include antibodies derived from one species (e.g., murine, rabbit, goat, rat, human, etc.) as well as antibodies derived from two (or perhaps more) species (e.g., chimeric and humanized antibodies). Chimeric antibodies, in which non-human variable regions are joined or fused to human constant regions (see, e.g., Liu et al., Proc. Natl. Acad. Sci. USA, 84: 3439 (1987)), can also be used in assays or therapeutically. Preferably, a therapeutic monoclonal antibody would be "humanized" as described in Jones et al., Nature, 321: 522 (1986); Verhoeyen et al., Science, 239: 1534 (1988); Kabat et al., J. Immunol., 147: 1709 (1991); Queen et al., Proc. Natl. Acad. Sci. USA, 88: 34181 (1991); and Hodgson et al., Bio/Technology, 9: 421 (1991).
Therefore, the present invention also contemplates antibodies, polyclonal or monoclonal (including chimeric and "humanized" antibodies), directed to epitopes corresponding to the amino acid sequences of human galactokinase described herein. Methods for the production of polyclonal and monoclonal antibodies are well known; see, for example, Chapter 11 of Ausubel et al. (supra). When the antibody is labeled with an analytically detectable reagent such as a radioactive label, a fluorescent label or an enzyme, the antibody can be used to detect the presence or absence of human galactokinase and/or its quantitative level. In addition, antibodies (polyclonal or monoclonal) specific for the missense or nonsense mutations of the present invention are useful for diagnostic purposes. A serum or tissue sample (e.g., liver, lung, etc.) is obtained and brought into contact with an antibody or antibody fragment that binds specifically to a mutant human galactokinase protein of the present invention, under conditions such that an antigen-antibody complex is formed between said antibody (or antibody fragment) and said mutant galactokinase protein. Detecting the presence or absence of said complex is within the skill in the art (e.g., ELISA, RIA, Western blot analysis, optical biosensor (e.g., BIAcore, Pharmacia Biosensor, Uppsala, Sweden)) and does not limit the present invention.

The present invention also contemplates pharmaceutical compositions comprising an effective amount of the galactokinase protein of the invention and a pharmaceutically acceptable carrier. The pharmaceutical compositions of the proteinaceous drug compounds of the present invention are particularly useful for parenteral administration, e.g., subcutaneous, intramuscular or intravenous. Optionally, the galactokinase protein is encapsulated in a membrane-bound vesicle, such as a liposome.
Compositions for parenteral administration will commonly comprise a solution of the compounds of the invention, or a mixture thereof, dissolved in an acceptable carrier, preferably an aqueous carrier. A variety of aqueous carriers can be employed, e.g., water, buffered water, 0.4% saline, 0.3% glycine, and the like. These solutions are sterile and generally free of particulate matter. They can be sterilized by conventional, well-known techniques. The compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions, such as pH adjusting and buffering agents. The concentration of the compound of the present invention in such a formulation can vary widely, e.g., from less than 0.5%, usually at or at least about 1%, to as much as 15 or 20% by weight, and will be selected primarily on the basis of fluid volumes, viscosities, etc., in accordance with the particular mode of administration selected. Thus, a pharmaceutical composition of the present invention for intramuscular injection could be prepared to contain 1 ml of sterile buffered water and 50 mg of a compound of the invention. Similarly, a pharmaceutical composition of the present invention for intravenous infusion could be made up to contain 250 ml of sterile Ringer's solution and 150 mg of a compound of the invention. Actual methods for preparing parenterally administrable compositions are well known or will be apparent to those skilled in the art and are described in more detail in, for example, Remington's Pharmaceutical Science, 15th ed., Mack Publishing Company, Easton, Pennsylvania. The compounds described in the present invention can be lyophilized for storage and reconstituted in a suitable carrier prior to use. This technique has been shown to be effective with conventional proteins, and lyophilization and reconstitution techniques known in the art can be employed.
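As a worked check on the two formulation examples above, the following minimal Python sketch converts each one to a percent concentration. It assumes weight/volume percent (grams per 100 ml) as an approximation to the "by weight" figures in the text, which holds only for dilute aqueous solutions with a density close to 1 g/ml; the function name is illustrative and is not part of the original disclosure.

```python
def percent_w_v(mass_mg: float, volume_ml: float) -> float:
    """Return concentration in % w/v, i.e. grams of solute per 100 ml."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    grams = mass_mg / 1000.0
    return grams / volume_ml * 100.0


# Intramuscular example from the text: 50 mg in 1 ml of buffered water.
intramuscular = percent_w_v(50, 1)

# Intravenous example from the text: 150 mg in 250 ml of Ringer's solution.
intravenous = percent_w_v(150, 250)

print(f"intramuscular: {intramuscular:.2f}% w/v")
print(f"intravenous:   {intravenous:.2f}% w/v")
```

Under that assumption, the injection example falls inside the stated 1% to 20% range, while the infusion example corresponds to the "less than 0.5%" end of the range.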
The physician will determine the dosage of the present therapeutic agents that will be most suitable, and it will vary with the particular patient under treatment. The physician will generally wish to initiate treatment with small doses, substantially less than the optimum dose of the compound, and increase the dosage by small increments until the optimum effect under the circumstances is reached. It will generally be found that when the composition is administered orally, larger quantities of the active agent will be required to produce the same effect as a smaller quantity given parenterally. The therapeutic dosage will generally be from 1 to 10 milligrams per day, although it may be administered in several different dosage units. Depending on the condition of the patient, the pharmaceutical composition of the present invention can be administered for prophylactic and/or therapeutic treatments. In therapeutic applications, the compositions are administered to a patient already suffering from a disease, in an amount sufficient to cure or at least partially arrest the disease and its complications. In prophylactic applications, compositions containing the present compounds or a mixture thereof are administered to a patient who is not already in a disease state, to enhance the patient's resistance. Single or multiple administrations of the pharmaceutical compositions can be carried out, with the physician selecting the dose levels and regimen. In any event, the pharmaceutical composition of the invention should provide a quantity of the compound of the invention sufficient to treat the patient effectively.

The present invention also contemplates the use of the galactokinase genes of the instant invention as a diagnostic. For example, some diseases result from inherited defective genes. These genes can be detected by comparing the sequence of the defective gene with that of a normal gene.
Subsequently, one can verify that a "mutant" gene is associated with galactokinase deficiency by measuring galactose. That is, a mutant gene would be associated with atypically high levels of galactose in a patient. In addition, mutant galactokinase genes can be inserted into a suitable vector for expression in a functional assay system (e.g., colorimetric assays, expression on MacConkey plates, or complementation experiments, e.g., in a galactokinase-deficient strain of yeast or E. coli) as yet another means to verify or identify galactokinase mutations. As an example, RNA from an individual can be transcribed with reverse transcriptase to cDNA, which can then be amplified by the polymerase chain reaction (PCR), cloned into an E. coli expression vector and transformed into a galactokinase-deficient strain. When grown on MacConkey indicator plates, galactokinase-deficient cells produce white colonies, whereas cells that have been transformed/complemented with a functional galactokinase gene produce red colonies (see, e.g., the Examples section). If most or all of the colonies from an individual are red, then the individual is considered normal with respect to galactokinase activity. If approximately 50% of the colonies are red (the other 50% being white), then that individual is likely to be a carrier of galactokinase deficiency. If most colonies are white, then that individual is likely to have galactokinase deficiency. Once the "mutant" genes are identified, the population can be screened for carriers of the "mutant" galactokinase gene. (A carrier is an apparently healthy person whose chromosomes contain a "mutant" galactokinase gene that can be passed on to that person's offspring.) In addition, monoclonal antibodies that are specific for the galactokinase proteins can be used for diagnostic purposes as described above. Individuals who carry mutations in the human galactokinase gene can be detected at the DNA level by a variety of techniques.
Nucleic acids used for diagnosis (genomic DNA, mRNA, etc.) can be obtained from a patient's cells, such as from blood, urine, saliva, tissue biopsy (e.g., chorionic villus sampling or removal of amniotic fluid cells) and autopsy material. Genomic DNA can be used directly for detection or can be amplified enzymatically using PCR, ligase chain reaction (LCR), strand displacement amplification (SDA), etc. (see, e.g., Saiki et al., Nature, 324: 163-166 (1986); Bej et al., Crit. Rev. Biochem. Molec. Biol., 26: 301-334 (1991); Birkenmeyer et al., J. Virol. Meth., 35: 117-126 (1991); Van Brunt, J., Bio/Technology, 8: 291-294 (1990)) prior to analysis. RNA can also be used for the same purpose. The RNA can be reverse-transcribed and amplified at the same time by RT-PCR (reverse transcriptase polymerase chain reaction), or reverse-transcribed to cDNA without amplification. As an example, PCR primers complementary to the nucleic acid of the instant invention can be used to identify and analyze galactokinase mutations. For example, deletions and insertions can be detected by a change in the size of the amplified product compared with the normal galactokinase genotype. Point mutations can be identified by hybridizing the amplified DNA to radiolabeled galactokinase RNA (of the present invention) or, alternatively, to radiolabeled galactokinase antisense DNA sequences (of the present invention). Perfectly matched sequences can be distinguished from mismatched duplexes by RNase A digestion or by differences in melting temperature (Tm). Such a diagnostic would be particularly useful for prenatal and even neonatal testing. In addition, point mutations and other sequence differences between the reference gene and "mutant" genes can be identified by other well-known techniques, e.g., direct DNA sequencing or single-strand conformation polymorphism (SSCP; Orita et al., Genomics, 5: 874-879 (1989)).
For example, a sequencing primer is used with a double-stranded PCR product or with a single-stranded template molecule generated by a modified PCR. The sequence is determined by conventional procedures with radiolabeled nucleotides or by automated sequencing procedures with fluorescent tags. Cloned DNA segments can also be used as probes to detect specific DNA segments. The sensitivity of this method is greatly enhanced when it is combined with PCR. The presence of nucleotide repeats may correlate with a change in galactokinase activity (causative change) or serve as a marker for various polymorphisms.

Genetic testing based on DNA sequence differences can be carried out by detecting alterations in the electrophoretic mobility of DNA fragments in gels with or without denaturing agents. Small sequence deletions and insertions can be visualized by high-resolution gel electrophoresis. DNA fragments of different sequence can be distinguished on denaturing formamide gradient gels, in which the mobilities of the different DNA fragments are retarded at different positions in the gel according to their specific melting or partial melting temperatures (see, e.g., Myers et al., Science, 230: 1242 (1985)). In addition, sequence alterations, in particular small deletions, can be detected as changes in the migration pattern of DNA heteroduplexes in non-denaturing gel electrophoresis (i.e., heteroduplex electrophoresis) (see, e.g., Nagamine et al., Am. J. Hum. Genet., 45: 337-339 (1989)). Sequence changes at specific sites can also be revealed by nuclease protection assays, such as RNase and S1 protection, or by the chemical cleavage method (e.g., Cotton et al., Proc. Natl. Acad. Sci. USA, 85: 4397-4401 (1985)).

Thus, the detection of a specific DNA sequence can be achieved by methods such as hybridization (e.g., heteroduplex electrophoresis, see White et al., Genomics, 12: 301-306 (1992)), RNase protection (e.g., Myers et al., Science, 230: 1242 (1985)), chemical cleavage (e.g., Cotton et al., Proc. Natl. Acad. Sci. USA, 85: 4397-4401 (1985)), direct DNA sequencing, or the use of restriction enzymes (e.g., restriction fragment length polymorphisms (RFLP), where variations in the number and size of restriction fragments can indicate insertions, deletions, the presence of nucleotide repeats and any other mutation which creates or destroys a restriction endonuclease site). Southern blot analysis of genomic DNA can also be used to identify large deletions and insertions (i.e., greater than 100 base pairs).

In addition to conventional gel electrophoresis and DNA sequencing, mutations (e.g., microdeletions, aneuploidies, translocations, inversions) can also be detected by in situ analysis (see, e.g., Keller et al., DNA Probes, 2nd Ed., Stockton Press, New York, NY, USA (1993)). That is, DNA (or RNA) sequences in cells can be analyzed for mutations without isolation and/or immobilization on a membrane. Fluorescence in situ hybridization (FISH) is currently the most commonly applied method, and numerous reviews of FISH have appeared. See, e.g., Tkachuk et al., Science, 250: 559-562 (1990) and Trask et al., Trends Genet., 7: 149-154 (1991), which are incorporated herein by reference for background purposes. Hence, by using nucleic acids based on the structure of specific genes, e.g., galactokinase, diagnostic tests for galactokinase deficiency can be developed.

In addition, some diseases result from, or are characterized by, changes in gene expression that can be detected by changes in the mRNA. Alternatively, the galactokinase gene can be used as a reference to identify individuals expressing a decreased level of galactokinase, e.g., by Northern blot analysis or in situ hybridization. Defining appropriate hybridization conditions is within the skill in the art. See, e.g., "Current Protocols in Mol. Biol.", Vol. I and II, Wiley Interscience, Ausubel et al. (eds.) (1992).
Probing technology is well known in the art, and it is appreciated that the size of the probes can vary widely, but it is preferred that the probe be at least 15 nucleotides in length. It is also appreciated that such probes can be, and preferably are, labeled with an analytically detectable reagent to facilitate identification of the probe. Useful reagents include, but are not limited to, radioactive labels, fluorescent dyes, and enzymes capable of catalyzing the formation of a detectable product. As a general rule, the more stringent the hybridization conditions, the more closely related the genes that will be recovered.

Also within the scope of the present invention are antisense oligonucleotides based on the sequences described herein for human galactokinase. Synthetic oligonucleotides or related antisense chemical structural analogs are designed to recognize and bind specifically to a target nucleic acid encoding galactokinase and its mutations. The general field of antisense technology is illustrated by the following disclosures, which are incorporated herein by reference for background purposes: Cohen, J.S., Trends in Pharm. Sci., 10: 435 (1989) and Weintraub, H.M., Scientific American, Jan. (1990) at page 40.

Transgenic, non-human animals can be obtained by transfecting appropriate fertilized eggs or embryos of a host with nucleic acids encoding the human galactokinase described herein; see, for example, U.S. Patent Nos. 4,736,866; 5,175,385; 5,175,384 and 5,175,386.

The resulting transgenic animal can be used as a model for the study of galactokinase. Particularly useful transgenic animals are those that exhibit a detectable phenotype associated with receptor expression. Drugs can then be screened for their ability to reverse or exacerbate the relevant phenotype. The present invention also contemplates operatively linking the gene encoding the receptor to regulatory elements that respond differentially to various temperature or metabolic conditions, thereby effectively turning phenotypic expression on or off in response to those conditions. Although not necessarily limiting of the present invention, the following experimental data are illustrative of this invention.

EXAMPLE I

Purification of human galactokinase from placental tissue

Galactokinase (galK) is obtained from human placenta as described by Stambolian et al. (Biochim. Biophys. Acta, 831: 306-312 (1985)), which is incorporated herein by reference in its entirety. In essence, human placental tissue (obtained within the first hour after delivery) is homogenized and centrifuged, and the supernatant is adsorbed onto DEAE-Sephacel. The material is eluted, precipitated with ammonium sulfate and then run through a sizing column (Sephadex G-100 SF). The active fractions are pooled and concentrated. The purified protein is obtained by separation on SDS-polyacrylamide gel electrophoresis followed by Western blot analysis using standard techniques (see Laemmli, Nature, 227: 680-685 (1970), or LeGendre et al., Biotechniques, 6: 154 (1988)). Small amounts of galactokinase (micrograms) were isolated from multiple rounds of protein purification. After tryptic digestion, 7 peptide sequences were eventually isolated and identified. The three largest fragments are presented below:

[SEQ ID NO: 1] Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg

[SEQ ID NO: 2] His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys

[SEQ ID NO: 3] Ala Gln Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys

The fragments were compared with peptide sequences encoded by partially sequenced cDNAs. These cDNAs (also known as expressed sequence tags or ESTs) were obtained from Human Genome Sciences, Inc. (Rockville, MD, USA).
The best alignments occurred with an EST sequence from a human osteoclastoma stromal cell library (SEQ ID NO: 1 showed 100% identity over 18 contiguous amino acids) and with an EST sequence from a human pituitary library (SEQ ID NO: 2 showed 95.5% identity over 22 contiguous amino acids). A full-length cDNA was identified from the human osteoclastoma stromal cell library and sequenced (SEQ ID NO: 4) on an ABI 373A automated sequencer. The sequence was confirmed on both strands. The corresponding amino acid sequence (SEQ ID NO: 4) was compared against the peptide fragments identified above. SEQ ID NO: 1 corresponds to amino acids 38-68 of the complete human galactokinase protein. Similarly, SEQ ID NOS: 2 and 3 correspond to amino acids 367-388 and 167-195, respectively, of human galactokinase.
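The identity figures quoted above follow from a simple count of matching positions over an ungapped window: 95.5% over 22 residues corresponds to 21 matches out of 22. The Python sketch below illustrates that arithmetic. The query is SEQ ID NO: 2 written in one-letter code; the EST-derived peptide is hypothetical, because the text does not say which residue differed, so the single mismatch is placed arbitrarily at the first position.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues between two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# SEQ ID NO: 2 in one-letter code (22 residues).
SEQ_ID_NO_2 = "HIQEHYGGTATFYLSQAADGAK"

# Hypothetical EST translation with one arbitrary mismatch (H -> R at position 1).
HYPOTHETICAL_EST_PEPTIDE = "RIQEHYGGTATFYLSQAADGAK"

identity = percent_identity(SEQ_ID_NO_2, HYPOTHETICAL_EST_PEPTIDE)
print(f"{identity:.1f}% identity over {len(SEQ_ID_NO_2)} residues")
```

A real comparison against an EST collection would use a gapped alignment tool; this sketch only reproduces how the reported percentage relates to the window length.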

Analysis of the human galactokinase gene

A comparison of the sequence of human galactokinase with that of E. coli galactokinase (Debouck et al., Nuc. Acids Res., 13: 1841-1853 (1985)) shows 61% similarity and 44.5% identity. A subsequent comparison with another reported human galactokinase gene (GK2) (Lee et al., Proc. Natl. Acad. Sci. USA, 89: 10887-10891 (1992)) shows 54% similarity and 34.6% identity at the amino acid level. Furthermore, the GK2 gene is mapped to human chromosome 17, position q24, as determined by fluorescence in situ hybridization (FISH) analysis. SEQ ID NO: 4 was hybridized to a Northern blot containing human messenger RNA from placenta, brain, skeletal muscle, kidney, intestine, heart, lung and liver according to standard procedures (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, 1989). The strongest hybridization was with human liver and lung tissue.

Galactokinase complementation

SEQ ID NO: 4 was subcloned into an E. coli vector, plasmid pBluescript (Stratagene). When transformed into C600K, a galactokinase-deficient strain, the transformed E. coli grew on MacConkey agar plates containing 1% galactose (and ampicillin at 50 µg/ml for plasmid selection) and produced brick-red colonies, indicating fermentation of the sugar. Specifically, the red color is due to the action of the acids produced by galactose fermentation on the bile salts and the indicator (neutral red) in MacConkey medium.
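The description interprets this MacConkey readout for diagnostic use as follows: mostly red colonies indicate a normal individual, roughly half red colonies a likely carrier, and mostly white colonies a likely deficiency. A minimal Python sketch of that decision rule is given below. The numeric cut-offs (80% for "most" and a 35% to 65% band for "approximately 50%") are assumptions chosen only for illustration; the text gives no thresholds, and the function name is hypothetical.

```python
def classify_by_colony_color(red: int, white: int,
                             majority: float = 0.80,
                             carrier_band: tuple = (0.35, 0.65)) -> str:
    """Classify an individual from counts of red and white MacConkey colonies.

    The thresholds are illustrative assumptions, not values from the text.
    """
    total = red + white
    if total == 0:
        raise ValueError("no colonies were scored")
    red_fraction = red / total
    if red_fraction >= majority:
        return "normal"
    if carrier_band[0] <= red_fraction <= carrier_band[1]:
        return "likely carrier"
    if red_fraction <= 1.0 - majority:
        return "likely deficient"
    return "indeterminate"


for red, white in [(95, 5), (48, 52), (3, 97), (75, 25)]:
    print(red, white, classify_by_colony_color(red, white))
```

Counts that fall between the bands are reported as indeterminate rather than forced into a category, since the text describes only the three clear-cut cases.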

Expression in mammalian cells

SEQ ID NO: 4 was also subcloned for expression in COS-1 cells (ATCC CRL 1650). The cells were transfected, cultured and harvested. The lysates were tested by a 14C galactokinase assay as described in Stambolian et al. (Exp. Eye Res., 38: 231-237 (1984)), which is incorporated herein by reference in its entirety. When expressed in transiently transfected COS cells, galactokinase activity was ten times higher than control levels (6600 vs. 640 counts per minute; repeated three times). These results definitively confirm that SEQ ID NO: 4 encodes a complete, biologically active human galactokinase gene. The nucleic acid molecule of the invention can also be subcloned into an expression vector to produce high levels of human galactokinase (either fused to another protein, e.g., operably linked at the 5' end to another coding sequence, or non-fused) in transfected cells. For mammalian cells, the expression vector would optionally encode a neomycin resistance gene, to select transfectants on the basis of their ability to grow in G418, and a dihydrofolate reductase gene, which allows amplification of the transfected gene in DHFR- cells. The plasmid could then be introduced into host cell lines, e.g., CHO ACC98, a non-adherent DHFR- cell line adapted to grow in serum-free medium, and human embryonic kidney 293 cells (ATCC CRL 1573), and the transfected cell lines could then be selected for G418 resistance.

Human galactokinase gene - genomic sequence

The coding region of a full-length genomic galactokinase gene was identified from a human genomic lambda phage library (Lambda Fix II, made from placental tissue) using the galK cDNA as a probe. One isolated clone, designated clone 17, was deposited on May 3, 1995 with the American Type Culture Collection (ATCC), Rockville, MD, USA, under accession number ATCC 97135 and has been accepted as a patent deposit in accordance with the Budapest Treaty of 1977 governing the deposit of microorganisms for the purposes of patent procedure. The coding region of the genomic gene is divided into at least 8 exons, isolated from four DNA fragments. The arrangement is depicted in Figure 1. The DNA sequence is determined using multiple PCR primer oligonucleotides corresponding to the galK cDNA sequence (i.e., corresponding to the galK genomic exons) as well as subsequently designed PCR primer oligonucleotides corresponding to non-coding regions (i.e., galK genomic introns). The structure of the genomic galactokinase gene is summarized in Table 1 below (see also Fig. 2 and SEQ ID NO: 7):

TABLE 1

Genomic   Galactokinase amino    PCR primer # / [SEQ ID NO]
exon      acids encoded
1         1-55                   3333/[8]   3334/[9]   3598/[10]  3599/[11]
2         56-118                 1888/[12]  3332/[13]  3604/[14]  3605/[15]
3         119-158                3331/[16]  3606/[17]
4         159-204                1657/[18]  3034/[19]
5         205-264                3330/[20]  3607/[21]
6         265-315                1539/[22]  2665/[23]
7         316-369                1891/[24]  2665/[25]
8         370-392                2665/[26]  2666/[27]  2667/[28]

Gene marker for galactokinase deficiency

A fibroblast cell line (GM00334) derived from a patient with galactokinase deficiency was obtained from the Coriell Institute for Medical Research, 401 Haddon Avenue, Camden, New Jersey 08103. Total RNA was isolated from cultured cells using an RNAzol RNA isolation kit (Biotecx, Houston, TX).
Reverse transcription of cytoplasmic RNA (1 µg) was performed with primer oligonucleotides 1823 [SEQ ID NO: 29] and 1825 [SEQ ID NO: 30]. The sample was amplified by 35 cycles of 94 °C for 1 minute, 50 °C for one minute and 72 °C for 7 minutes. The DNA product was purified electrophoretically, ligated into the TA cloning vector (Invitrogen) and sequenced. Twelve cDNAs were sequenced in total (representing PCR products cloned from multiple independent PCR reactions). This procedure was also repeated with fibroblasts cultured from normal controls (i.e., people who did not exhibit galactokinase deficiency). Comparison with the normal controls identified a single base substitution of A for G at position 122 of the "normal" human galactokinase gene [SEQ ID NO: 4]. The result is a missense mutation at amino acid 32 from Val to Met [SEQ ID NO: 5].

The G to A change creates an MscI restriction endonuclease site (i.e., TGGCCA) in the mutant allele. This restriction site was then used to screen rapidly for the mutant allele in the parents of the patient with galactokinase deficiency. In essence, the 5' exon encoding galactokinase residues 1 to 55 (i.e., exon 1, see Table 1) was cloned from a genomic lambda phage library and its DNA sequence was determined, including a portion of the flanking intron sequences. Oligonucleotide primers (X2-50UT [SEQ ID NO: 31] and X2-30UT [SEQ ID NO: 32]) were designed to hybridize to the intron sequences for amplification of a 346 bp DNA fragment from genomic DNA. The point mutation was analyzed in the PCR product by RFLP, i.e., by the presence of the newly created MscI site as detected by 1.5% agarose gel electrophoresis. A "normal" allele remains uncut by the MscI enzyme and thus migrates as a 346 bp fragment on an agarose gel. The PCR product from the patient with galactokinase deficiency (i.e., carrying the G to A change) is cleaved by MscI, resulting in two fragments of 193 and 153 bp, respectively. The absence of the 346 bp fragment indicates that the patient is homozygous for this allele. In contrast, the PCR products from the parents of this patient, after MscI digestion, yielded three fragments (346, 193 and 153 bp), which is consistent with a heterozygous pattern for the G to A change. That is, both parents were carriers of the same mutation.

To determine whether the missense mutation results in decreased enzyme activity, a cDNA clone containing the G to A change was subcloned for expression in COS cells and galactokinase activity was tested as previously described. COS cells transfected with cDNA encoding the missense mutation have the same level of galactokinase activity as host COS cells, namely 0.02 units/µg protein. In contrast, COS cells transfected with the non-mutant galactokinase cDNA [SEQ ID NO: 4] have an activity fifty times higher than host COS cells (i.e., the control). These results support the substitution of Val32 by Met32 as the cause of the decreased enzymatic activity.

Another mutation was discovered in an unrelated patient who had cataracts and was diagnosed with galactokinase deficiency (galactokinase activity was found to be close to zero). Genomic DNA was isolated from lymphoblastoid cell lines and sequenced by automated sequencing on an ABI 373A sequencer. This revealed a single substitution of T for G that produces an in-frame nonsense codon (i.e., TAG) at amino acid position 80 [SEQ ID NO: 6]. This mutation causes premature termination of human galactokinase, resulting in a truncated protein of 79 amino acids that would be expected to be non-functional. (The parents of this patient were heterozygous for this mutation and therefore did not have galactokinase deficiency.)
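The MscI RFLP test described above can be sketched in a few lines of Python. The block checks that the G to A change at nucleotide 122 falls on the first base of codon 32 (using the CDS start at nucleotide 29 given in the sequence listing), that the change creates the TGGCCA site, and it calls a genotype from the fragment sizes reported in the text (346, 193 and 153 bp). The 18-nucleotide snippets cover codons 29 to 34; the mutant snippet is taken from SEQ ID NO: 5 and the normal snippet is inferred from it by reverting the stated substitution. Function and constant names are illustrative assumptions.

```python
MSCI_SITE = "TGGCCA"  # MscI recognition sequence, as given in the text

# Codons 29-34 (Glu Leu Ala Val/Met Ser Ala); see the assumptions in the lead-in.
NORMAL_SNIPPET = "GAGCTGGCCGTGTCAGCG"   # codon 32 = GTG (Val)
MUTANT_SNIPPET = "GAGCTGGCCATGTCAGCG"   # codon 32 = ATG (Met)

UNCUT_BP = {346}        # normal allele: PCR product not cut by MscI
CUT_BP = {193, 153}     # mutant allele: PCR product cut once by MscI


def codon_number(nt_position: int, cds_start: int = 29) -> int:
    """1-based codon index of a 1-based nucleotide position within the CDS."""
    if nt_position < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    return (nt_position - cds_start) // 3 + 1


def has_msci_site(seq: str) -> bool:
    """True if the sequence contains an MscI recognition site."""
    return MSCI_SITE in seq.upper()


def call_genotype(fragment_sizes_bp) -> str:
    """Call the genotype from the fragment sizes seen after MscI digestion."""
    sizes = set(fragment_sizes_bp)
    if sizes == UNCUT_BP:
        return "homozygous normal"
    if sizes == CUT_BP:
        return "homozygous mutant"
    if sizes == UNCUT_BP | CUT_BP:
        return "heterozygous carrier"
    return "uninterpretable"


print(codon_number(122))                    # codon hit by the substitution
print(has_msci_site(NORMAL_SNIPPET), has_msci_site(MUTANT_SNIPPET))
print(call_genotype([193, 153]))            # pattern reported for the patient
print(call_genotype([346, 193, 153]))       # pattern reported for the parents
```

Band sizes read from a real gel are approximate, so a practical implementation would match sizes within a tolerance instead of requiring exact values.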

The above description and examples fully disclose the invention, including preferred embodiments thereof. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the present invention. Such equivalents are within the scope of the following claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION
(i) APPLICANT: Bergsma, Derk J.; Stambolian, Dwight
(ii) TITLE OF THE INVENTION: Human Galactokinase Gene
(iii) NUMBER OF SEQUENCES: 32
(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: SmithKline Beecham Corp. / Corporate Intellectual Property
(B) STREET: 709 Swedeland Road / UW2220
(C) CITY: King of Prussia
(D) STATE: Pennsylvania
(E) COUNTRY: UNITED STATES OF AMERICA
(F) POSTAL CODE: 19406-0939
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30
(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:

(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr
1 5 10
Leu Ser Gln Ala Ala Asp Gly Ala Lys
15 20

(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
Ala Gln Val Cys Gln Gln Ala Glu
His Ser Phe Ala Gly 1 5 10 Met Pro Cys Gly He Met Asp Gln Phe He Ser Leu Met 15 20 25 Gl > > Gln Lye (2) INFORMATION FOR SEO ID NO: 4: (i) CHARACTERISTICS OF THE SEQUENCE: (P) LENGTH: 1349 paree of bases (I) TYPE: nucleic acid (C) TYPE OF CHAIN: individual (E) TOPOLOGY: linear (Ü) TYPE OF MOLECULE: cDNA (xi) FEATURE: (A) NAME / KEY: CDS (B) LOCATION: 29..1204 (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 4: GAAGTCGGCA CGAGTGCAGG C3C3C37C? TG GCT GCT TTG? G? C? G CC. C? G 52 Met Ala Ala Leu Arg Gln? R; Gln: 5 GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GTC TTC 100 Val Wing Glu Leu Leu Wing Glu Wing Arg Arg Wing Phe Arg Glu Glu Phe 10 15 20 GGG GCC GAG CCC GAG CTG GZC G7G TCA GCG CCG 148 GGC CGC GTC AAC CTC Gly? The Glu Pro Glu Leu Wing Val Ser Wing Pro Gly Arg Val Asn Leu 35 40 ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT A7G GCT 196 He Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Mee Wing 45 5s0u 55 CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG 244 Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu 60 65 ™ GTG TCT CTC CTC ACC ACC TCT GAG GGT GCC GAT GAG CCC CAG CGG CTG 292 Val Ser Leu Leu Thr Thr Ser Glu Gly Ala Asp Glu Pro Gln Arg Leu 75 '80 85 CAG TTT CCA CTG CCC ACÁ GCC CAG CGC TCG CTG GAG CCT GGG ACT CCT 340 Gln Phe Pro Leu Pro Thr Ala Gln Arg Ser Leu Glu Pro Gly Thr Pro 90 95 100 CGG TGG GCC AAC TAT GTC AAG GGA GTG ATT CAG TAC TAC CCA GCT GCC 388 Arg Trp Wing Asn Tyr Val Lys Gly Val He Gln Tyr Tyr Pro Ala Wing 105 110 115 120 CCC CTC CCT GGC TTC AGT GCA GTG GTG GTC AGC TCA GTG CCC CTG GGG 436 Pro Leu Pro Gly Phe? 
Er Wing Val Val Val Ser Ser Val Pro Leu Gly 125 130 135 GGT GGC CTG TCC AGC TCA GCA TCC TTG GAA GTG GCC ACG TAC ACC TTC 484 Gly Gly Leu Ser Ser Wing Ser Leu Glu Val Wing Thr Tyr Thr Phe 140 145 150 CTC CAG CAG CTC TGT CCA GAC TCG GGC ACÁ ATA GCT GCC CGC GCC CAG 532 Leu Gln Gln Leu Cys Pro Asp Ser Gly Thr lie Ala Ala Arg Ala Gln 155 160 165 GTG TGT CAG CAG GCC GAG CAC AGC TTC GCA GGG ATG CCC TGT GGC ATC 580 Val Cys Gln Gln Wing Glu His Ser Phe Wing Gly Met Pro Cys Gly He 170 175 180 ATG GAC CAG TTC ATC TCA CTT ATG GGA CAG AAA GGC CAC GCG CTG CTC 628 Met Asp Gln Phe He Ser Leu Met Gly Glr. Lys Gly His Ala Leu Leu 185 190 195 200 ATT GAC TGC AGG TCC TTG GAG ACC AGC CTG GTG CCA CTC TCG GAC CCC 676 He Asp Cys Arg Ser Leu Glu Thr Ser Leu Val Pro Leu Ser Asp Pro 205 210 215 AAG CTG GCC GTG CTC ATC ACC AAC TCT AAT GTC CGC CAC TCC CTG GCC 724 Lys Leu Wing Val Leu He Thr Asn Ser Asn to Arg His Ser Leu Wing 220 225 230 TCC AGC GAG TAC CCT GTG CGG CGG CGC CAA TGT GAA GAA GTG GCC CGG 772 Ser Ser Glu Tyr Pro Val Arg Arg Arg Gln Cys Glu Giu Val Ala Arg 235 240 245 GCG CTG GGC AAG GAA AGC CTC CGG GAG GTA CAG CTG GAA GAG CTA GAG 820 Wing Leu Gly Lys Glu Ser Leu Arg Glu Val Gln Leu Giu Glu Leu Glu 250 255 260 GCT GCC AGG GAC CTG GTG AGC AAA GAG GGC TTC CGG CGG GCC CGG CAC 868 Ala Ala Arg Asp Leu Val Ser Lys Glu Gly Phe Arg Arg Ala Arg His 265 270 275 280 GTG GTG GGG GAG ATT CGG CGC ACG GCC CAG GCA CC3 GCC GCC CTG AGA 916 Val Vile Gly Glu He Arg Arg Thr Ala Gin Ala Ala Ala Ala Ala Leu Arg 285 290 295 CGT GGC GAC TAC AG GCC TTT GGC CGC CTC ATG GTG GAG AGC CAC CGC 964 Arg Gly Asp Tyr Arg Wing Phe Gly Arg Leu Met Val Glu Ser His Arg 300 305 310 TCA CTC AGA GAC GAC TAT GAG GTG AGC TGC CCA GAG CTG GAC CAG CTG 1012 Ser Leu Arg Asp Asp Tyr Glu Val Ser Cys Pro Glu Leu Asp Gln Leu 315 320 325 GTG GAG GCT GCG CTT GCT GTG CCT GGG G7T TAT GGC AGC CGC ATG ACG 1060 Val Glu Ala Ala Leu Ala Val Pro Gly Val Tyr Gly Ser Arg Met Thr 333 - 335 340 GGC GGT GGC TTC GGT GGC TGG ACG GTG CT CTG GAG GCC TCC 
GCT 1108 Gly Gly Gly Phe Gly Gly Cys Thr Val Thr Leu Leu Glu Wing Be Ala 345 350 355 360 GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC 1156 Ala Pro His Ala Met Arg Kis He Gln Glu His Tyr Giy Gly Thr Ala 365 370 375 ACC TTC TAC CTC TCT CAA GCC GCC GAT GGA GCC AAG GTG CTG TGC TTG 1204 Thr Phe Tyr Leu Ser Gin Wing Wing Asp Gly Wing Lys Val Leu Cys Leu 380 385 390 TGAGGGACCC CCAGGACAGC ACACGGTGAG GGTGCGGGGC CTGCAGGCCA GTCCCACGGC 1264 TCTGTGCCCG GTGCCATCTT CCATATCCGG GTGCTCAATA AACTTGTGCC TCCAATGTGG 1324 AAAAAAAAAA AAAAAAAAAC TCGAG 1349 (2) INFORMATION FOR SEO ID NO: 5: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 1349 base pairs (B) TYPE: nucleic acid (C) TYPE OF CHAIN: double (D) TOPOLOGY: linear ( ii) TYPE OF MOLECULE: cDNA (i) CHARACTERISTIC: (A) NAME / KEY: CDS (3) LOCATION: 29..1204 (xí.) SEQUENCE DESCRIPTION: SEO ID NO: 5: GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG 52 Met Ala Ala Leu Arg Gln Pro Gln 1 5 GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GTC TTC 100 Val Ala Glu Leu Leu Ala Giu Ala Arg Arg Ala Phe Arg Glu Giu Phe 10 15 20 GGG GCC GAG CCC GAG CTG GCC ATG TCA GCG CCG GGC CGC GTC AAC CTC 148 Gly Wing Glu Pro Glu Leu Wing Met Ser Wing Pro Gly Arg Val Asn Leu 2 30 35 40 ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT 19IÍ H Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Wing 45 50 55 CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG 24 < l, Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu GTG TCT CTC CTC ACC ACC TCT GAG GGT ZCC GAT GAG CCC CAG CGG CTG 292 Val Ser Leu Leu Thr Thr Ser Glu Gly Wing Asp Glu Pro Gln Arg Leu 75 80 85 CAG TTT CCA CTG CCC ACÁ GCC CAG CGC TCG CTG GAG CCT GGG ACT CCT 340 Gln Píie Pro Leu Pro Thr Ala Glu Arg Ser Leu Glu Pro Gly Thr Pro < 0 95 100 CGG TCG GCC AAC TAT GTC AAG GGA GTG ATT CAG TAC TAC CCA GCT GCC 388 Arg Trp Wing Asn Tyr Val Lys Gly Val He Gln Tyr Tyr Pro Ala Wing 105 110 115 120 CCC CTC CCT GGC TTC AGT GCA GTG GTG GTC 
AGC TCA GTG CCC CTG GGG 436 Pro Leu Pro Gly Phe Ser Wing Val Val Val Ser Ser Val Pro Leu Gly 125 130 135 GGT GGC "CTG TCC AGC TCA GCA TCC TTG GAA GTG GCC ACG TAC ACC TTC 484 Gly Gly Leu Ser Ser Wing Ser Leu Glu Val Wing Thr Tyr Thr Phe 140 145 150 CTC CAG CAG CTC TGT CCA GAC TCG GGC ACA ATA GCT GCC CGC GCC CAG 532 Leu Gln Gln Leu Cys Pro Asp Ser Gly Thr He Ala Wing Arg Ala Gln 155 160 165 GTG TGT CAG CAG GCC GAG CAC AGC TTC GCA GGG ATG CCC TGT GGC ATC 580 Val Cy, j Gln Gln Wing Glu His Ser Phe Wing Gly Met Pro Cys Gly He 171) 175 180 ATG GAC CAG TTC ATC TCA CTT ATG GGA CAG AAA GGC CAC GCG CTG CTC 628 Met Asp Gln Phe He Ser Leu Met Gly Gln Lys Gly His Ala Leu Leu 185 190 195 200 ATT GAC TGC AGG TCC TTG GAG ACC AGC CTG GTG CCA CTC TCG GAC CCC 676 He Asp Cys Arg Ser Leu Glu Thr Ser Leu Val Pro Leu Ser Asp Pro 205 210 215 AAG CTG GCC GTG CTC ATC ACC AAC TCT AAT GTC CGC CAC TCC CTG GCC 724 Lys Leu Wing Val Leu He Thr Asn Ser Asn Val Arg His Ser Leu Wing 220 225 230 TCC AGC GAG TAC CCT GTG CGG CGG CGC CAA TGT GAA GAA GTG GCC CGG 772 Ser Ser Glu Tyr Pro val Arg Arg Arg Gln Cys Glu Glu Val Ala Arg 235 240 245 GCG CTG GGC AAG GAA AGC CTC CGG GAG GTA CAG CTG GAA GAG CTA GAG 820 Wing Leu Gly Lys Glu Ser Leu Arg Glu Val Gin Leu Glu Glu Leu Glu 250 255 260 GCT GCC AGG GAC CTG GTG AGC AAA GAG GGC TTC CGG CGG GCC CGG CAC 868 Ala Ala Arg Asp Leu Val Ser Lys Glu Gly Phe Arg Arg Ala Arg His 265 270 275 280 GTG GTG GGG GAG ATT CGG CGC ACG GCC CAG GCA GCG GCC GCC CTG AGA 916 Val Val Gly Glu He Arg Arg Thr Ala Gln Ala Ala Ala Ala Ala Leu Arg 285 290 295 CGT GGC GAC TAC AGA GCC TTT GGC CGC C C ATG GTG GAG AGC CAC CGC 964 Arg GJ.and Asp Tyr Arg Ala Phe Gly Arg Leu Met Val Glu Ser His Arg 300 305 310 TCA CTC AGA GAC GAC TAT GAG GTG AGC TGC CCA GAG CTG GAC CAG CTG 1012 Ser Leu Arg Asp Asp Tyr Glu Val Ser Cys Pro Glu Leu Asp Gln Leu 315 320 325 GTG GAG GCT GCG CTT GCT GTG CCT GGG GTT TAT GGC AGC CGC ATG ACG 1060 Val Glu Ala Ala Leu Ala Val Pro Giy Val Tyr Giy Ser Arg Met Thr 330 335 340 GGC 
G3T GGC TTC GGT GGC TGC ACG GTG ACE CTG CTG GAG GCC TCC GCT 1108 Gly Giy Gly Phe Gly Gly Cys Thr Val Thr Leu Leu Giu Wing Being Wing 345 350 355 360 GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC 1156 Ala Pro His Ala Met Arg His He Gln Glu His Tyr Gly Giy Thr Ala 365 370 375 ACC TTC TAC CTC TCT CAA GCC GCC GAT GGA GCC AAG GTG CTG TGC TTG 1204 Thr Phe Tyr Leu Ser Gln Wing Wing Asp Gly Wing Lys Val Leu Cys Leu 380 385 390 TGAGGCACCC CCAGGACAGC ACACGGTGAG GGTGCGGGGC CTGCAGGCCA GTCCCACGGC 1264 TCTGTGOCCG GTGCCATCTT CCATATCCGG GTGCTCAATA AACTTGTGCC TCCAATGTGG 1324 AAAAAAAAAA AAAAAAAAAC TCGAG 1349 (2) INFORMATION FOR SEO ID NO: 6: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 1349 base pairs (B) TYPE: nucleic acid (C) TYPE OF CHAIN: double (D) TOPOLOGY: linear ( ii) TYPE OF MOLECULE: cDNA (xi) FEATURE: (A) NAME / KEY: CDS (B) LOCATION: 29.265 (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 6: GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG 52 Met Wing Ala Leu Arg Gln Pro Gln 1 5 GTC GCG GAG CTG GCC GCC CGC CGA CGA GCC TTC CGG GAG GTC TTC 100 Val Glu Wing Leu Leu Wing Glu Wing Arg Arg Wing Phe Arg Glu Glu Phe 10 15 twenty GGG GCC GAG CCC GAG CTG GCC GTG TCA GCG CCG GGC CGC GTC AAC CTC 148 Gly Wing Glu Pro Giu Leu Wing Val Ser Wing Pro Gly Arg Val Asn Leu 25 30 35 40 ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT 196 He GJ..y Glu His Thr Asp Tyr Asn Gln Giy Leu Val Leu Pro Met Wing 45 50 55 CTG G? 
G CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG 244 Leu GJLu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu GTG TCT CTC CTC ACC ACC TCT TAGGGTGCCG ATGAGCCCCA GCGGCTGCAG 295 Val Sur Leu Leu Thr Thr Ser 75 TTTCCACTGC CCACAGCCCA GCGCTCGCTG GA TGGGA CTCCTCGGTG GGCCAACTAT 355 GTCAAGGGAG TGATTCAGTA CTACCCAGCT GCCCCCCTCC CTGGCTTCAG TGCAGTGGTG 415 GTCAGCTCAG TGCCCCTGGG GGGTGGCCTG TCCAGCTCAG CATCCTTGGA AGTGGCCACG 475 TACACCTTCC TCCAGCAGCT CTGTCCAGAC TCGGGCACAA TAGCTGCCCG CGCCCAGGTG 535 TGTC GCAGG CCGAGCACAG CTTCGCAGGG ATGCCCTGTG GCATCATGGA CCAGTTCATC 595 TCACTTATGG GACAGAAAGG CCACGCGCTG CTCATTGACT GCAGGTCCTT GGAGACCAGC 655 CTGGTGCCAC TCTCGGACCC CAAGCTGGCC GTGCTCATCA CCAACTCTAA TGTCCGCCAC 715 TCCCTGGCCT CCAGCGAGTA CCCTGTGCGG CGGCGCCAAT GTGAAGAAGT GGCCCGGGCG 775 CTGGGCAAGG AAAGCCTCCG GGAGGTACAA CTGGAAGAGC TAGAGGCTGC CAGGGACCTG 835 GTGAi? CAAAG AGGGCTTCCG GCGGGCCCGG CACGTGGTGG GGGAGATTCG GCGCACGGCC 895 CAGGCAGCGG CCGCCCTGAG ACGTGGCGAC TACAGAGCCT TTGGCCGCCT CATGGTGGAG 955 AGCCACCGCT CACTCAGAGA CGACTATGAG GTGAGCTGCC CAGAGCTGGA CCAGCTGGTG 1015 GAGGCT &CGC TTGCTGTGCC TGGGGTTTAT GGCAGCCGCA TGACGGGCGG TGGCTTCGGT 1075 GGCTGCACGG TGACACTGCT GGAGGCCTCC GCTGCTCCCC ACGCCATGCG GCACATCCAG 1135 GAGCACT.SCG GCGGGACTGC CACCTTCTAC CTCTCTCAAG CAGCCGATGG AGCCAAGGTG 1195 CTGTGCT GT GAGGCACCCC CAGGACAGCA CACGGTGAGG GTGCGGGGCC TGCAGGCCAG 1255 TCCCACGGCT CTGTGCCCGG TGCCATCTTC CATATCCGGG TGCTCAATAA ACTTGTGCCT 1315 CCAATGTGGA AAAAAAAAAA AAAAAAAACT CGAG 1349 (2) INFORMATION FOR SEO ID NO: 7: (i) CHARACTERISTICS OF THE SEQUENCE: (P) LENGTH: 7676 base pairs (E) TYPE: nucleic acid (C) TYPE OF CHAIN: double (E) TOPOLOGY: linear ( ii) TYPE OF MOLECULE: DNA (genomic) (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 4: CCGAGCATCC CGCGCCGACG GGTCTGTGCC GGAGCAGCTG TGCAGAGCTG CAGGCGCGCG 60 TCATGGCTGC TTTGAGACAG CCCCAGGTCG CGGAGCTGCT GGCCGAGGCC CGGCGAGCCT 120 TCCGGGAGGA GTTCGGGGCC GAGCCCGAGC TGGCCGTGTC AGCGCCGGGC CGCGTCAACC 180 TCATCGGGGA ACACACGGAC TACAACCAGG GCCTGGTGCT GCCTATGGTG AGGGGCTGCA 240 
CGGCiGAGCCC CTAGCCCGCC GCCGCCTGTC CCGGTCGCCG AGGAGGGCGG GCCTCGGGGA 300 CGCTGGGGGC GAGTTCTTCC CGCGGGAGAT GTGGGGCGGG CAGCTGCGCC TGGAGCACCG 360 GTGCACGGAA GAGTCCCCGG GACAGGCTGT TCCCCACGTT GGAAGGGAGG AAGCGAAGAA 420 GTGGTCCCCA GAGGGTGCGC GGCCGCCTCT TGGCTCAAGC CCGCCCTCTG GGGGCTGGGG 480 CTCCTCGCCT TCAACCTGGG AGCATGTTCC CCTTAAACTG TGAGGCCCTG TGTGCCACGC 540 AGAAGGGGAC ACTCCGCGCC TCCGGCCACC GTGGGGCCCC AACCGCAGAC TGGGCGAAC 600 GTAGCCTTCT GGCCCAGCCC GTTCAATTTA CAGAGGAGGA AACTGAGGCC TAGAGAGGCC 660 CAGTGAACTG CTGGAGGTCA C: "CAGGT TCTTGGCGGG GCTGCGACTT GGGAGTGAGG 720 ACTCCCAGCT TTCAGCGGGG GGCGCTTTCC GCCCCATCTG CAGCTTGGGG AGTGCACAGG 780 TACAGGATGT CCAGAGCCAC CCAAAATGTA AAGGCTTTGG AGCTCCAGTG ATCTGTTTTC 840 CCTTTGGGCT ÍAGCTCTCCC CCCTTGCCCC ACAGCTCAGG GCAGAGTCCA GGTCTGTGCT 900 CCAGCTGCAG CCGCCCCGCC CCTGAAGACC TAAGGGGGCA GGGCTCAAGC CCCCAAGGTC 960 AGCTGGCCCT CAGGATCTTC CCTGCGACGC TGAACCTGGA GGTTCAGAAC CTGATGACTG 1020 TGGAGGCATC AGAACCTCGG CTGGAGGCAG TGTCATTGGA GAGGCTTACT CCAGCTGGCG 1080 GAAGCCTCAC GTACTGCTTG TCTCTCCTGC CAGGCTCTGG AGCTCATGAC GGTGCTGGTG 1140 GGCAGCCCCC GCAAGGATGG GCTGGTGTCT CTCCTCACCA CCTCTGAGGG TGCCGATGAG 1200 CCCCAGCGGC TGCAGTTTCC ACTGCCCACA GCCCAGCGCT CGCTGGAGCC TGGGACTCCT 1260 CGGTGGGCOA ACTATGTC ?? GGGAGTGATT CAGT? 
CTACC CAGGTATGGG GCCCAGGCCT 1320 GAGCCAAGTC CTCACTGATA CTAGGAGTGC CACCTCACAG CCACAGAGCC CATTCATTTG 1380 TCTGATACAC TGTGGGGAAG GCTTGTAGAG TGGAGCATCC CATTGTACAG ATGAGGAAAC 1440 TGATGCCCCC: AGAAGGTCGG GAACTTGCCC TGGGTTTCCC GTGACCTGAT TGGAGGAGCC 1500 AGGATTTGAA CCCCAGCCTT TTTTCCCTCC AGAGCCCTAA ACCAGGAGGA CAATTAGAAG 1560 TGTCCCAGCA ACCTCAGAGG GTGGGAAAAT GGAGGGGAGT GGGTCCCTTG GGCCAGCAGG 1620 TTGGTGGGGT TCTTGACAAT TGAGA CAC ACCTAGAAAC AGTTGCTAGG CCGTTGCTGC 1680 CCTTCCCGCC AGGACACCTG CCCTTCCTGT CCAATCCTCC CAGGCAGCCT CTCTTACCAT 1740 CACCTGTTCT TTCCCCCTGC AGCTGCCCCC CTCCCTGGCT TCAGTGCAGT GGTGGTCAGC 1800 TCAGTGCCCC GGGGGGGTGG CCTGTCCAGC TCAGCATCCT ^ -AGTGGC CACGTACACC 1860 TTCCTCCAGC AGCTCTGTCC AGGTACCAGC TAGGCCCCAG CCCTGACCCA GCCCTCCTTC 1920 CCTGAGGTCT CCAGGTGGTC CCAGCTTCTA CTATGCCTTA TGGAGGGGGT GGCAGGGAAT 1980 CTCCCTGGAG TGTCATTGAA GCCACTGCTG CTTCCACCAG CCCTAGCCTC CCCACCTCAC 2040 CCTGTACTGC AGACTCGGGC ACAATAGCTG CCCGCGCCCA GGTGTGTCAG CAGGCCGAGC 2100 ACAGCTTCGC AGGGATGCCC TGTGGCATCA TGGACCAGTT CATCTCACTT ATGGGACAGA 2160 AAGGCCAC C GCTGCTCATT GACTGCAGGT TGGGCTCGCT CCCCTCGTCC CCTCCCGCCC 2220 TGCACTCAGC AGCTCCTGGG TGGAGTGTGC CCACTGCCTG GCGCAGCAAG CACACGCTTG 2280 GCCTCGTCAT CTCCCCCATT GTAACTCCAC CCCAGGTCCT TGGAGACCAG CCTGGTGCCA 2340 CTCTCGGACC CCAAGCTGGC CGTGCTCATC ACCAACTCTA ATGTCCGCCA CTCCCTGGCC 2400 TCCAGCGAGT ACCCTGTGCG GCGGCGCCAA TGTGAAGAAG TGGCCCGGGC GCTGGGCAAG 2460 GAAAGCCTCC GGGAGGTACA ACTGGAAGAG CTAGAGGGTG AGAACTGCCA GGGTGCTCTA 2520 TCCTGGACSGC GGCTGTGCTC CCTGCTGGCG CCTCAGTGTG GCCTTGACCC TGCCTGGGAC 2580AGGGGCTTCT GCCATGCTCT CCCCAGTCCC TTCAAACACT GCGCACCCAG 2640 GGTTCCAATC TCAGCAGGGG TGCTTGAAAT CCTAAAATGG TCTTATCTAÁ "TCAGAAAAAT 2700 CATGTTTCCA TTGTGGAAAA TGTAGAAAAG TACAAAGTAG AAAATAATAA GCTATAAGGG 2760 CACTACCCAG AGATAGGCAC TGCTGACATT TTCACGTTTC CTTTCAGTAT TTTTCCACAT 2820 CTGTCTTCAA AGCTGAGTAT ATGTAATATA CATCACTTT CCCCCCCCAC CCCCTTTTTT 2880 TTAAGAGGCA GGGTCTCATT CTGTTGCCCA AGCTGGAGTG TAGTGGTGTG ATCATAGCTT 2940 ACTGCAAACT TGAACTCTTG AGCTCAAGGG ATCCTCCCAG CTCAGCCTTC 
CAAGTAGCTG 3000 AGATTACAGG TGTGCCACCA TGCCCGGCTA ATTTTTATCT TCGTAAAGAC GGCCTTGTAG_3060_TGTTGCCCAG GATGATCCTG AACTCTGGCC TCAAGAGGTC CTCCTGCCTT GGGCTCCCAA 3120 AGTGTTGGGA TTATAGGCAT GAGCCACTGC GGCCAGCCCA TTTGCCGTGT TTTTTTTTTG 3180 GACACAGAGT TTCGGTCTTG TCACCCATGC TGGAGTGCAA TGGTGCGATC TCAGCTCACT 3240 GTAACCrCTG CCTCCCGGGT TCAAGTGATT CTCCTGCCTC AGCCTCCCGA GTAGCTGGGA 3300 CTACAG3CGC CCGCCACTAC GCCTGGCACA TTTTTTATAG TTCTAGTAGA GACTGGGGTT 3360 TCACCATGTT GGCCAGGCTG GTCTCAAACG CCTGACCTCA GGTGATCCTC CCGCCTCAGC 3420 CTTCCA? AGT GCTGGGATTA CAGGCGTGAG CCATAGTGCC GGTCTCTTTT TTTTTTTTTT 3480 TTAAACTAAA CATAATCTCA GAACCCAGAA CCCTATCTTA TCTTATGCCA TGAAAGGCAT 3540 ATCTCGGCGT GGCTCTTTTT TTTTTTTTTTT CTTTTTTTTTT GGGCGAGGTG GAGGCTTGCC 3600 CTGTTGCCCA GGCTGGAGTG CAGCGGCGCA ATCTCGGTTC ACTGCATCCT CCACCTCCTG 3660 GGTCCAAATG ATCCTCCTGC CTTAGCTTCC TGAGTAGGTG GGATTACTGG AACCCACCAC 3720 CACGCCCAGC CAATTTTTAT ATTTTTAGTA GAGACGGGGT TTCATGTTGG CCAGGCTGGC 3780 CTCGA? CTCC GACCTCGTG ATCTGCCCGC CTCAGCCTCC CAATGTGCTA GGATTACATG 3840 TGTGAGCCAC TGCACCTGGC CTCCGTGTGG CTCTTTAAAG CTCCACAATA TTTTAGCATT 3900 CAGGTGCTCT GTCATTTACT TAACTATTTT CTGATACACC TCACACTGCG ATTAACTTTC 3960 CTTATTTATC TTTTTTATTA TTTATTTATT TATTTATTTG AGACAGAGTC TTGCTCTGTC 4020 ACCCAGGCTG GAGTGCAGTG GCACGATCTC GGCTCACTGC AACCTCTGCC TCCCAGGTTC 4080 AAGTGATGCT CCTGCCTCAG CCTCCTGAGT AGCTAGGATT AGAGGCATGT GCCACCACAC 4140 CTGGCTA? 
TC TTCGTATTTT TAGCAGAGAT GAGGTTTTAC CATGTTGGTC GGGCTGGTCG 4200 TGAACTCCTG ACCTGGTGAT CTGCCCACCT CAGCCTCCCA AAGTACTGGG ATGACAGGCA 4260 TGAACCACTG TGCCTGGCCA TCTTTTTTTAT TTTTTAAAGA GATGGGTTCT GCTAAGTTGC 4320 CCAGGCTGGA CCTGAACTCT TGGGCTCAAG TAATCTTCTC ACCTAGTCTC CTGGGTAGCT 4380 6T GCAACCAAAG GCACCCGGTT TATCTGCATT CTCTTTTTTT TCTTTGAGAC TGAGTCTTGC 4440 TCTGTAGCCC AGGCTGGAGC GCAGTGGCGT GATCTCGGCT CACTGCAACC TCCGTCTTCA 4500 GGGTTCAAGC AATTCTCCTG CCTCAGCCTC TGGAGTGGCT GGGACTACAG GCGTGTGCCA 4560 CCAGAGCGAG TTAATTTTTT TTTTTTTTTG TATTTTTAGT GGACACTGGG TTTCACTATA 4620 TTGGCCAGGC TGGTCTTGGA CTCCTGACCT CAAGTGATCC GCCTGCCTTG GCCTCCCAAA 4680 GTGCTGGGAT TACAGGCACA GGCGTGAGCC ACTACACCTG GCCTATCTGC ATTCTCTTAA 4740 TAGTTTCTTA GAAATGGATT CTTAGGAGTA GGATTACAGA GTCAAGAGAC ACAAGTTTTG 4800 TAGGCTGGGT GCGGTGGCTC ACGTCTGTGC CTGTAATCCC AGTACTTTAG GAGGCCAAGG 4860 TGGGCAGATT CATTGAGCTC AGGAATTCGA GACCAGCCTG GGCAACATGG CAAAACCCCA 4920 TCTCTAAAGA AATACAAAAA TTAGCCAGGT GTGGTGGTGT GTGCCTGTAG TCCTAGCTAC 4980 TTAGGAGGCT GGGGTGGGAG GATCAATTGA GCCCAGGAGG TTGAGACTGC AGTGAGCTGT 5040 GATTGCACCA TGGCACTCCA GCCTGGGCCT CAAAGTGAGA TCCTGTCTCC AAAACAAAAA 5100 AGATACAAGT ATCCTTAAGG CTCCTGCTAC ACATGGCCAG GAAGGTAGTC TATTGGACAG 5160 TTTTAAGGTC ATTATCAATA TTAGCTCATT TAATTCCCTC CAAAACTCTG TAAAGCACAT 5223 TCTGCTACCA TAGTTGTCAT ATTTTTGATG GGGGAATCTA CAGTGAGAGG CAGTGCTGGG 5280 ATCTGAACCC CATCTGGACA GATTAGCTCC AGGGCCCATG CTCTTGACTG GCTGGCCGCG 534C CTGCCCACAC TGAGTTGTTC CTTCCTGGCA GGGTAGGTGT GCCTATCTCA GGGACACTAG_5400_ACAGCTCCGA GGGACCTCCC TGTCCTTTTC CTTTGTGAAC TGTGTCACGT TCTCCAGAGC 5460 AGGGCTCAGA CCTGCCCTGC CTGCTCTGTG CAGATGCCCT TGGCCAAGGT TTTCACACTG 5520 GAAC / AGTTG GTCCCTCCTC CCCACCCCAG CCTGTCCTTG GCCCTCCTCC AGGTCTCCTT 5580 CTGCATAGGA GCAGCTCACC CTGCCTCCTC CAGAGTCCTG CCCTAGAAGC GCAATCCCTC 5640 TCCTTCCATC CCCTGCCTGG CTGCCTGGCT CCTTCCCTCA GCCTCCAAGA CATGCTCAGT 5700 TTTCTTCCCT CCTAAAACAC CACCCACTGT CTCATTTCCA TTCATTTCTT TCTTTCTTTC 5760 TTTCTTTTTT TTTTTTGAGA GGGAGCCTCA CTCTGTCACC CAGGCTGAAG TGCAGTGGCA 5820 TGATCICCAC 
TCACTGCAAC CTCCGCCTCC CAGGTTCAAG CAATTCTCCT GCCTCAGCCT 5880 • * CCTGAGTAGC TGGGATTACA GGCGCCTGCC ACGATGCCCG GCTAACTTTT GTATTTTTAG_5940_TAGAGACGGG GTTTCGCCAT GTTGGCCAGG CTGGTCTCGA GCTCCTGACC TCAGGCAATC 6000 TGCCTGCCTC AGCTTCCCAA AGTGCTGGGA TTACAGGTGT GAGCCACCGC GCCCACCCAT 6060 TCATTTCTCA GTCCTTTGAA TCTACTTGCC CCTCCATCCC GCCATGCCAC CTACCCTAAC 6120 AACCTTCCCC CTTAAACCTG CGGGTTTGGC CGGGCGCAGT ACACTGAGTC AGTACTGGTA 6180 CTGACCCAGG TACCCCTCCA GCCTCAGCTC CAGTCAGATG GGACAGCCTG CTGGTCCCTG 6240 GCTGCTT TG CCCCCTCTTC TGGAGCCCCA GCCCTGGAGG CTCCATGTGG CTCAGCAGAA 6300 CTTCTTCTCC TCCTGCTCTG TGGTGGCCTC TTGAGGGCAG CACTCACCTT GGAAAGCATG 6360 GAGTGTTTCA ACCCTCACTG CTCCCTGAAG GACCAAGGTG TCCCATTTTA CAGTCGGGGG 6420 AGGAGGCACT GTGATAAAGG GGCTCTTCAG ACCCACGTCT GAGAGAGCCA GGCTGCGCCG 6480 CCCCCGCGGC CTTCCACCCT TCACCGTCCA GCCAGGGCCA CTGCCATCAC CGCCTGCTGG 6540 TCCTCACAGG CGTCGGGGCC CCAGGCAGTG AGAAGGCGGC TGCTGACTCC TCTTTCCTCC 6600 CCAGCTGCCA GGGACCTGGT GAGCAAAGAG GGCTTCCGGC GGGCCCGGCA CGTGGTGGGG 6660 GAGATTCGGC GCACGGCCCA GGCAGCGGCC GCCCTGAGAC GTGGCGACTA CAGAGCCTTT GGCCGCCTCA TGGTGGAGAG CCACCGCTCA CTCAGGTGAG GCCCTCTGGG CGCCCCGCTC 6780 CTGCCGGGCA CAGGCCGGCC CAGGCCCACC CCTTCAATAT CCTCTCTGCA GAGACGACTA 6840 TGAGGTGAGC TGCCCAGAGC TGGACCAGCT GGTGGAGGCT GCGCTTGCTG TGCCTGGGGT 6900 TTATGGCAGC CGCATGACGG GCGGTGGCTT CGGTGGCTGC ACGGTGACAC TGCTGGAGGC 6960 CTCCGC7GCT CCCCACGCCA TGCGGCACAT CCAGGTGGGC GGGCACCAGG GCCTGGGCGG 7020 GCAGGAGCGG CAGCTTCCCG GGGCCCTGCC ACTCACCCCC AGCCCGCCTC TACAGGAGC 7080 ACTACGXGG GACTGCCACC TTCTACCTCT CTCAAGCAGC CGATGGAGCC AAGGTGCTGT 7140 GCTTGTGAGG CACCCCCAGG ACAGCACACG GTGAGGGTGC GGGGCCTGCA GGCCAGTCCC 7200 ACGGCTCTGT GCCCGGTGCC ATCTTCCATA TCCGGGTGCT CAATAAACTT GTGCCTCCAA 7260 TGTGGTACCT GCCTCCTCTA GAGGTGGGTG TATGCTTGGG TGTCAGAGAA TGGGGGATGT 7320 CAGAACCGCT CCCCTACCCT AGGGGAGCAC CTCTCAGGCC CCAGAAGAAT GGGCAAGGCA 7380 GGGCCTAGCA GTAGCAAAAC CATTTATTAA GTGCAGAACA AAGGCTGGGT CCTTGTGCTG 7440 CTCCCAGCTC TTTGGTTACA AATAGGTTTG GGCCCACAGA GGACGGACCT TGCCCCCTTC 7500 ATGCCTCCCA GGAGACACCT 
AGCCCCTGCT CTGTGCATGC C3CTGGGCTG GGCCCCCAGC 7560 GGTGCAAGGA TGGAGTAGCT GAGGAGGCTC CGGGAGAGGA GTC3GGAGGA CGCCTAGTGG 7620 GACATTGCGG CGGTGGCGCA GGGTGCGGTC AAGTTTGGAA GAAAC-GTTG GGTC A 7676 (2) INFORMATION FOR SEO ID NO: 8: (L) SEQUENCE CHARACTERISTICS: (Ai LENGTH: 21 base pairs (Bi TYPE: nucleic acid (C i) CHAIN TYPE: individual (Di TOPOLOGY: linear (p) TYPE) OF MOLECULE: DNA yenornico (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 8 AGOCTTCCGG GAGGAGTTCG 0 21 (2) INFORMATION FOR SEO ID NO: 9: (l SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) TYPE OF CHAIN: individual (D) TOPOLOGY: ineal (n) ) TYPE OF MOLECULE: Genomic DNA (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 9: CTGGTÍGTAG TCCGTTTGTT C 21 (2) INFORMATION FOR SEO ID NO: 10: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 base pairs (B) TYPE : nucleic acid (C) TYPE OF CHAIN: individual (I) TOPOLOGY: linear (ii) TYPE OF MOLECULE: genomic DNA (xi) DESCRIPTION OF SEQUENCE: SEO ID NO: 10; GCCAGCAGCT CCGCGACCTG G 21 (2) INFORMATION FOR SEO ID NO: 11: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) TYPE OF CHAIN: individual (D) TOPOLOGY: linear ( li) TYPE OF MOLECULE: Genomic DNA (XL) DESCRIPTION OF SEQUENCE: SEO ID NO: 11: GCTTCCTCCC TTCCAACGTG G 1 (2) INFORMATION FOR SEO ID NO: .2: (i) CHARACTERISTICS OF THE SEQUENCE: (AT LENGTH: 21 base pairs (B> TYPE: nucleic acid (Cl TYPE OF CHAIN: individual (DI TOPOLOGY: linear (i) :.) TYPE OF MOLECULE: Genomic DNA (xl) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 12: CCCAGGCTCC AGCGZGCGCT G 21 (2) INFORMATION FOR SEO ID NO: 13: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) TYPE OF CHAIN: individual (D) TOPOLOGY: linear ( ii.) TYPE OF MOLECULE: Genomic DNA (i.) 
DESCRIPTION OF SEQUENCE: SEO ID NO: 13; ACOTCTGAGG GTGCCGATGA G 1 (2) INFORMATION FOR SEO ID NO: 14: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 2.1 base pairs (B) TYPE: nucleic acid (Cl> CHAIN TYPE: i dividual (DÍ TOPOLOGÍA: lineal (iL) TYPE OF MOLECULE: Genoin DNA (XJ.) DESCRIPTION OF SEQUENCE: SEO ID NO: 14: CCCACAGCTC AGGGCAGAGT C 1 (2) INFORMATION FOR SEO ID NO: .1.5: (i!) CHARACTERISTICS OF THE SEQUENCE: (Ai LENGTH: 21 base pairs (B TYPE: nucleic acid (C: CHAIN TYPE: individual (Di TOPOLOGY: linear (Ll ) TYPE OF MOLECULE: Genetic DNA (Xl) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 15 GGACACTTCT AATTGTCCTC C 1 5 (2) INFORMATION FOR SEO ID NO: 5: () SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 pairs of bases (B) TYPE: nucleic acid 10 (C) TYPE OF CHAIN: individual (D) TOPOLOGY: linear (u) TYPE OF MOLECULE: Genomic DNA (XL) DESCRIPTION OF SEQUENCE: SEO ID NO: 16: TGAACTGG TCCATGA7GC C LS 21 (?) INFORMATION FOR SEO ID NO: 17: (H SEQUENCE CHARACTERISTICS: (AT LENGTH: 21 base pairs JU (Bl TYPE: nucleic acid (C> TYPE OF CHAIN: individual (DI TOPOLOGY: linear (n) TYPE OF MOLECULE: Genetic DNA (x) DESCRIPTION OF SEQUENCE: SEO ID NO: 17: ? h AGGGGCACTG AGCTGACCAC C 21 (2) INFORMATION FOR SEO ID NO: 18: (L SEQUENCE CHARACTERISTICS: (ft) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) CHAIN TYPE: individual (D) TOPOLOGY: linear (Ü) ) TYPE OF MOLECULE: genomic DNA (xi) DESCRIPTION OF SEQUENCE: SEO ID NO: 18; CACTTCTACA CATTGGCGCC G 21 (2) INFORMATION FOR SEO ID NO: 19: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (OR TYPE OF CHAIN: individual (D) TOPOLOGY: linear (ii) ) TYPE OF MOLECULE: Genomic DNA (XL) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 19: CT? CGCAGGG ATGCCCTGTG G 21 (2) INFORMATION FOR SEO ID NO: 20: (i) CHARACTERISTICS OF THE SEQUENCE: (AT LENGTH: 21 base pairs (Bl TYPE: nucleic acid (Ci TYPE OF CHAIN: individual. (DI TOPOLOGY: linear (i :) TYPE OF MOLECULE: Genomic DNA (x. 
\) DESCRIPTION OF SEQUENCE: SEO ID NO: 20; TCATCACCAA CTCTAAT6TC C 21 (2) INFORMATION FOR SEO ID NO: 21: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (OR TYPE OF CHAIN: individual (Dt TOPOLOGY: linear (ix) TYPE OF MOLECULE: genomic DNA (xi) DESCRIPTION OF SEQUENCE: SEO ID NO: 21 TGTCAGCAGT GCCTATCTCT G 21 (2! INFORMATION FOR SEO ID NO: 22: (i) CHARACTERISTICS OF THE SEQUENCE: (A: LENGTH: 21 base pairs (B! TYPE: nucleic acid (C: CHAIN TYPE: individual (D)) TOPOLOGY: linear (ii) TYPE OF MOLECULE: genomic DNA (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 22: AGCAGCGGAG GCCTCCAGCA G 2.1. (2) INFORMATION FOR SEO ID NO: 23: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 2.1 base pairs (B) TYPE: nucleic acid (C) TYPE OF CHAIN: individual (D) TOPOLOGY: linear ( ii) TYPE OF MOLECULE: genomic DNA (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 23: CCTCACCGTG TGCTGTCCTG G 21 (2) INFORMATION FOR SEO ID NO: 24: (j) SEQUENCE CHARACTERISTICS: (A) LENGTH: 2.1 base pairs (I¡¡) TYPE: nucleic acid (C) TYPE OF CHAIN: individual (Ti) TOPOLOGY: linear (ii) TYPE OF MOLECULE: genomic DNA (xi) DESCRIPTION OF SEQUENCE: SEO ID NO: 24, GGCTGCGCTT GCTGTGCCTG G 21 (2) INFORMATION FOR SEO ID NO: 25: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (OR CHAIN TYPE: individual (D) TOPOLOGY: linear ( ii) TYPE OF MOLECULE: genomic DNA (i) DESCRIPTION OF SEQUENCE: SEO ID NO: 25; CCÍCACCGTG TGCTGTCCTG G 21 (2) INFORMATION FOR SEO ID NO: 26: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (Cl CHAIN TYPE: individual (D) TOPOLOGY: linear (ii) TYPE OF MOLECULE: genomic DNA (?: L) DESCRIPTION OF SEQUENCE: SEO ID NO: 26: CCTCACCGTG TGCTGTCCTG G 2.1 (2) INFORMATION FOR SEO ID NO: 27: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH : 21 base pairs (B> TYPE: nucleic acid (Cl TYPE OF CHAIN: individual (D) TOPOLOGY: linear (i:?.) 
TYPE OF MOLECULE: Genomic DNA (X: L) DESCRIPTION OF SEQUENCE: SEO ID NO: 27; GCGGGACTGC CACCTTCTAC C 21 (2) INFORMATION FOR SEO ID NO: 28: (i) CHARACTERISTICS OF THE SEQUENCE: (A. 'LENGTH: 21 base pairs (B) TYPE: nucleic acid (C: TYPE OF CHAIN: individual (D3 TOPOLOGY: linear ( ij) TYPE OF MOLECULE: Genomic DNA (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 28: CTCAATAAAC TTGTGCCTCC A 21 (2) INFORMATION FOR SEO ID NO: 29: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 23 base pairs (B) TYPE: nucleic acid (C) TYPE OF CHAIN: individual (D) TOPOLOGY: linear ( ii) TYPE OF MOLECULE: Genomic DNA (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 29: CGGATATGGA AGATGGCACC GGG 23 (2) INFORMATION FOR SEO ID NO: 30: 5 (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 22 base pairs (B) TYPE: nucleic acid (C) CHAIN TYPE: individual (D) TOPOLOGY: linear Ul (il) TYPE OF MOLECULE: DNA genórni co (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 30; AGAGCTGCAG GCGCGCGTCA TG 22 (2) INFORMATION FOR SEO ID NO: 1: (i: SEQUENCE CHARACTERISTICS: (A) LENGTH: 19 base pairs (B) TYPE: nucleic acid (C) TYPE OF CHAIN: individual 0 (D) TOPOLOGY : linear (ii) TYPE OF MOLECULE: genomic DNA (xi) DESCRIPTION OF THE SEQUENCE: SEO ID NO: 31: CCGAGCATCC CGCGCCGAC 19 25 (2) INFORMATION FOR SEO ID NO: 32: (i) CHARACTERISTICS OF THE SEQUENCE: (A) ) LENGTH: 20 base pairs (B) TYPE: nucleic acid 0 (C) TYPE OF CHAIN: individual. (D) TOPOLOGY: linear (ii) TYPE OF MOLECULE: Genomic DNA (i) DESCRIPTION OF THE SEQUENCE: SEO ID NO : 32; CAGCTGCCCG CCCCACATCT 0
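
As a reading aid only (not part of the original filing), the CDS feature locations given in the listing (29..1204 for SEQ ID NO: 4 and 5, 29..265 for SEQ ID NO: 6) can be checked with a short Python sketch. It assumes the standard genetic code and 1-based inclusive locations. The function names are hypothetical, and the codon strings are short excerpts copied from the listing, not the full sequences.

```python
# Minimal sketch, assuming the standard genetic code and 1-based inclusive
# feature locations as written in the listing (e.g. 29..1204).
from itertools import product

BASES = "TCAG"
# Amino acids in standard-table order for codons TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_count(start: int, end: int) -> int:
    """Number of codons in a CDS spanning start..end (hypothetical helper)."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3


def translate(dna: str) -> str:
    """Translate DNA to one-letter amino acids; '*' marks a stop codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Excerpts copied from the listing (illustrative only).
START_EXCERPT = "ATG GCT GCT TTG AGA CAG CCC CAG"  # nucleotides 29-52, SEQ ID NO: 5
NORMAL = "ACC ACC TCT GAG GGT GCC"                 # codons 77-82, SEQ ID NO: 4
NONSENSE = "ACC ACC TCT TAG GGT GCC"               # same region, SEQ ID NO: 6

if __name__ == "__main__":
    print(codon_count(29, 1204))     # codons in the full-length CDS
    print(codon_count(29, 265))      # codons before the premature stop
    print(translate(START_EXCERPT))
    print(translate(NORMAL), translate(NONSENSE))
```

Under these assumptions the full-length CDS spans 392 codons and the truncated CDS of SEQ ID NO: 6 spans 79 codons, with codon 80 reading TAG (stop) instead of GAG (Glu).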

Claims

NOVELTY OF THE INVENTION CLAIMS
1. An isolated nucleic acid molecule encoding human genomic galactokinase, said nucleic acid molecule characterized in that it is selected from the group comprising: (a) a nucleic acid molecule comprising the sequence as set forth in SEQ ID NO: 7; and (b) a nucleic acid molecule that differs from the nucleic acid molecule of (a) in codon sequence due to the degeneracy of the genetic code.
2. A vector characterized in that it comprises the nucleic acid molecule according to claim 1.
3. A recombinant host cell characterized in that it comprises the vector according to claim 2.
4. An isolated nucleic acid molecule characterized in that it comprises a DNA sequence encoding nucleotides 29 to 1204 of SEQ ID NO: 5 or nucleotides 29 to 265 of SEQ ID NO: 6.
5. A vector characterized in that it comprises the nucleic acid molecule according to claim 4.
6. The vector according to claim 5 further characterized in that it is a plasmid.
7. A recombinant host cell characterized in that it comprises the vector according to claim 5.
8. A method for preparing a human galactokinase protein characterized in that it comprises culturing the recombinant host cell according to claim 7 under conditions that promote the expression of said protein, and recovering said protein.
9. An isolated protein encoded by the DNA sequence according to claim 4.
10. A monoclonal antibody that is specifically reactive with the protein according to claim 9.
11. A method for diagnosing conditions associated with human galactokinase deficiency characterized in that it comprises isolating a serum or tissue sample from an individual; allowing the sample to come into contact with an antibody or antibody fragment which binds specifically to the human galactokinase protein according to claim 9 under conditions such that an antigen-antibody complex is formed between said antibody or antibody fragment and said galactokinase protein; and detecting the presence or absence of said complex.
12. An in vitro method for diagnosing conditions associated with human galactokinase deficiency characterized in that it comprises isolating a sample of nucleic acid from an individual; testing said sample and the DNA sequence, or corresponding RNA sequence, encoding a human galactokinase gene; and comparing differences between said sample and said DNA (or RNA) encoding nucleotides 29 to 1204 of SEQ ID NO: 4, wherein said differences indicate mutations in the human galactokinase gene.
13. The in vitro method according to claim 12 further characterized in that said sample is RNA which is subsequently amplified by RT-PCR.
14. The in vitro method according to claim 13 further characterized in that testing said sample comprises a digestion with restriction endonuclease.
15. The in vitro method according to claim 14 further characterized in that said restriction endonuclease is MscI.
16. The in vitro method according to claim 12 further characterized in that testing said sample comprises a hybridization test.
17. The in vitro method according to claim 16 further characterized in that the hybridization test is heteroduplex electrophoresis, which comprises determining the differential mobility of heteroduplex products in polyacrylamide gels, said heteroduplex products being the result of hybridization between the nucleic acid sample and the DNA sequence, or corresponding RNA sequence, which encodes nucleotides 29 to 1204 of SEQ ID NO: 4.
18. The in vitro method according to claim 12 further characterized in that testing said sample comprises gel electrophoresis of restriction fragment length polymorphisms of said nucleic acid sample and the DNA sequence, or corresponding RNA sequence, which encodes nucleotides 29 to 1204 of SEQ ID NO: 4.
19. The in vitro method according to claim 12 further characterized in that testing said sample comprises DNA sequencing.
20. An in vitro method for diagnosing conditions associated with human galactokinase deficiency characterized in that it comprises isolating cells containing genomic DNA from an individual and testing said sample by in situ hybridization using, as a probe, the DNA sequence encoding nucleotides 29 to 1204 of SEQ ID NO: 4, nucleotides 29 to 1204 of SEQ ID NO: 5, or nucleotides 29 to 265 of SEQ ID NO: 6; or a fragment encoding at least one exon of said sequence; or a fragment containing at least 15 contiguous base pairs of said sequence.
21. A non-human transgenic mammal capable of expressing, in any cell thereof, the DNA according to claim 4.
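
The comparison and restriction-digestion steps recited in claims 12 to 15 can be illustrated with a minimal Python sketch, offered only as a reading aid and not as the assay protocol of the specification. It assumes that MscI recognizes the site TGGCCA. It uses 18-nucleotide excerpts spanning codons 29 to 34, copied from the sequence listing: the reference reading GTG (Val) at codon 32, as in SEQ ID NO: 6, and the variant reading ATG (Met), as in SEQ ID NO: 5. The function names are hypothetical.

```python
# Minimal sketch under stated assumptions; not the patent's laboratory method.
MSCI_SITE = "TGGCCA"  # assumed MscI recognition sequence (blunt cut TGG^CCA)


def point_differences(reference: str, sample: str):
    """Return (1-based position, reference base, sample base) per mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, s)
            for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def site_positions(seq: str, site: str = MSCI_SITE):
    """Return the 1-based start position of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


# Codons 29-34 (Glu Leu Ala Val/Met Ser Ala), excerpted from the listing.
REFERENCE = "GAGCTGGCCGTGTCAGCG"  # GTG at codon 32
VARIANT = "GAGCTGGCCATGTCAGCG"    # ATG at codon 32

if __name__ == "__main__":
    print(point_differences(REFERENCE, VARIANT))
    print(site_positions(REFERENCE), site_positions(VARIANT))
```

In this excerpt the single G to A difference creates a TGGCCA site that is absent from the reference, which is consistent with a digestion-based test distinguishing the two alleles.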