WO2000075294A1 - Glycogenin, glycogenin pseudogenes, and uses thereof to determine risk for developing type 2 diabetes

Info

Publication number: WO2000075294A1
Application number: PCT/US2000/015685
Authority: WO
Inventors: Joseph Lomako; Wieslawa Lomako; William J. Whelan; Ronald B. Goldberg
Original assignee: University of Miami
Current assignee: University of Miami
Priority date: 1999-06-08
Filing date: 2000-06-08
Publication date: 2000-12-14
Anticipated expiration: 2001-12-08

Abstract

Nucleic acids with nucleotide sequences related to glycogenin pseudogenes, methods for detecting expression of such pseudogenes by their specific mutations, and methods for using such expression to identify subjects with a predisposition to Type 2 diabetes mellitus, are disclosed.

Description

GLYCOGENIN, GLYCOGENIN PSEUDOGENES, AND USES THEREOF TO DETERMINE RISK FOR DEVELOPING TYPE 2 DIABETES

GOVERNMENT RIGHTS

This invention was made with U.S. government support under grant number DK 375000 awarded by The National Institutes of Health. The U.S. government has certain rights in the invention.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to human health, particularly in the area of endocrinology and metabolism. Glycogen synthesis may be impaired in some diabetics or people at risk for developing diabetes due to reduced expression of glycogenin. This protein serves as the scaffolding upon which glucose is polymerized, and thereby serves as a substrate for the production of glycogen. In particular, the invention relates to a diagnostic procedure for determining whether a human subject has a predisposition to develop Type 2 diabetes (e.g., he or she has an identified risk for becoming diabetic that is increased as compared to a control group of the general population) or is a candidate for preventative treatment.

2. Description of the Related Art

Maintaining a stable blood glucose level in mammals is an important, multifaceted regulatory process, as chronic hyperglycemia promotes well-recognized adverse consequences. Thus, the level of blood glucose in mammals is ordinarily under strict hormonal control. The principal ways in which excess glucose can be removed from the circulation are the storage of glucose as glycogen and the breakdown of glucose to lactate, or to water and carbon dioxide, in peripheral tissues. When glucose levels rise, insulin is released from the pancreas and most of the excess glucose is deposited in skeletal muscle in the form of glycogen. However, an inability to form glycogen in instances of chronic hyperglycemia may lead to pathologic conditions. Persistent hyperglycemia is a symptom of diabetes mellitus, which afflicts more than 100 million people worldwide.

Patients with diabetes mellitus are divided into two groups: those with deficient insulin secretion due to an autoimmune disorder (Type 1 diabetes) and those who secrete insulin but still develop hyperglycemia (Type 2 diabetes). The latter form accounts for almost 90% of human diabetic patients. A generally accepted principle is that both genetic and environmental factors contribute to the etiology of Type 2 diabetes. The genetics of the disease are complex and, despite much effort and achievement during recent years, the basic etiology of Type 2 diabetes remains poorly understood. See Permutt et al. (1998). An exception is maturity-onset diabetes of the young (MODY), which can be caused by an impairment or change in a protein encoded by one of at least five different genes (i.e., at least five different genetic mutations): the enzyme glucokinase (GCK/MODY2) or the transcription factors hepatocyte nuclear factor 4 alpha (HNF-4α/MODY1), HNF-1α/MODY3, insulin promoter factor 1 (IPF-1/MODY4), and HNF-1β/MODY5. See Habener and Stoffers (1998); Froguel and Velho (1999). MODY patients, however, constitute only a minority of those with Type 2 disease.

Glycogenin is a muscle protein which plays a crucial role in nonoxidative glucose disposal in mammals, such as humans. Glycogenin is an enzyme which autocatalytically creates on itself a maltosaccharide primer for glycogen synthase which, together with branching enzyme, forms the mature glycogen molecule. Glycogenin covalently binds glycogen at a 1:1 molar ratio and glycogenin remains an integral part of the growing polysaccharide in all steps of glycogen synthesis.

The first glucose unit in glycogen is joined to glycogenin via an alpha-glucosidic bond to tyrosine at position 194 (Tyr-194, N.B. the IUPAC-IUBMB one- and three-letter abbreviations for amino acids with the associated position as numbered in the mature polypeptide are used herein) of glycogenin in a unique carbohydrate-protein linkage. Using UDP-glucose (N.B. the IUPAC-IUBMB one- and three-letter abbreviations for nucleotides and their phosphorylated derivatives are used herein), glycogenin builds autocatalytically on itself a maltosaccharide chain long enough to serve as a primer for glycogen synthase which, together with branching enzyme, creates the mature glycogen molecule. When the carbohydrate portion of glycogen is degraded, the glycogenin so released may again form the maltosaccharide primer for synthesis of a new glycogen molecule. These results have been summarized by Alonso et al. (1995).

Characterization of the human sequence encoding glycogenin was described by Barbetti et al. (1996) and Lomako et al. (1996). Prior to the discovery of glycogenin, impaired synthesis of glycogen in skeletal muscle in Type 2 diabetes prompted investigators to examine the activities of, and possible mutations in, the known genes of enzymes responsible for glycogen metabolism in Type 2 diabetics. No convincing difference from normal subjects was found. This suggested that impaired metabolic enzyme function may not be responsible for the lack of muscle glycogen in Type 2 diabetics. Bailey and Lillehoj (1993) proposed glycogenin as a candidate gene for Type 2 diabetes.

It is known that genetic diseases may be caused by pseudogenes, as opposed to mutations in fully translatable genes. Pseudogenes are known to encode mRNAs including premature termination codons. Beutler et al. (1991) have shown that a pseudogene containing a single guanine (G) nucleotide insertion in the 5'-coding region of the glucocerebrosidase gene causes the expression of mRNA with a frameshift promoting the early termination of a translation product. Total glucocerebrosidase-related mRNA was not increased. According to those authors, this mechanism may participate in the etiology of some cases of Gaucher disease. They have proposed to use this observation as the basis of diagnosis for Gaucher disease. However, the efficacy of this diagnostic approach is not clear, since the expression of glucocerebrosidase pseudogenes was observed in many non-Gaucher's human cell lines (Imai et al., 1993).
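As a hedged illustration of the frameshift mechanism described above, the Python sketch below shows how a single inserted G can shift the reading frame and expose a premature stop codon. The sequence is a made-up toy open reading frame, not glucocerebrosidase or glycogenin sequence, and the names `translate` and `insert_base` are illustrative only. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf):
    """Translate a DNA reading frame up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":  # stop codon terminates translation
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_base(seq, index, base):
    """Return seq with one base inserted before the given 0-based index."""
    return seq[:index] + base + seq[index:]


# Hypothetical toy ORF (assumption, for illustration only).
NORMAL_ORF = "ATGGCTGATTGGCCTGAAGGTTAA"
# Single G inserted just after the start codon, in the 5' coding region.
MUTANT_ORF = insert_base(NORMAL_ORF, 3, "G")

normal_protein = translate(NORMAL_ORF)
mutant_protein = translate(MUTANT_ORF)
print(normal_protein, len(normal_protein))
print(mutant_protein, len(mutant_protein))
```

In this toy case the mutant product is much shorter than the normal one because the shifted frame reaches a TGA codon almost immediately; where the stop appears in a real transcript depends entirely on the actual sequence.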

Two other examples where pseudogenes may be involved in the etiology of a disease are some variants of Hunter syndrome (deficiency of iduronate-2-sulphatase) (Timms et al., 1997), and a rare inherited syndrome, chronic granulomatous disease, wherein patients' phagocytes fail to generate superoxide, making those patients very susceptible to microbial infection (Gorlach et al., 1997). In both cases, the formation of abnormal mRNA is thought to be due to recombination between the wild type gene and the pseudogene(s).

Furthermore, transcription of pseudogenes for bovine cytochrome b₅ (Cristano et al., 1993), chicken calmodulin (Stein et al., 1993), human glutamine synthase (Chakrabarti et al., 1995), human serotonin 5-hydroxytryptamine receptor (Bard et al., 1995), bovine placental aromatase (Fürbass and Vanselow, 1995), and fetal fibroblast growth factor receptor (Well et al., 1997) has been described but no adverse effects have been reported.

SUMMARY OF THE INVENTION

It is an object of the invention to provide products and processes that detect one or more mutations in human glycogenin genes. In particular, these mutations occur and accumulate in glycogenin pseudogenes which prematurely terminate translation of the open reading frame (e.g., missense or frameshift mutations that create a premature stop codon).

In one embodiment, the invention relates to the isolation of genomic and cDNA clones corresponding to the human gene encoding the glycogenin enzyme, which has permitted the identification of human glycogenin pseudogenes. We have discovered that these pseudogenes can be transcribed in diabetic patients but, in general, they would code for truncated forms of glycogenin that appear to be enzymatically inactive. Increased transcription of glycogenin pseudogenes may be correlated in diabetic patients or other subjects at risk for the development of diabetes with decreased glycogen biosynthesis and hyperglycemia (i.e., a method of genetic forecasting). Such correlations may also be used in pharmacogenomics to correlate transcription with the bioavailability and/or biologic response of a drug that is or will be administered to subjects who have had sequences determined at the gene or transcript level, or both (i.e., a method of genetic profiling or monitoring at the DNA and/or RNA level). In another embodiment, the invention may relate to using glycogenin pseudogenes in the diagnosis and possible treatment of Type 1 and/or Type 2 diabetes.

A further embodiment of the invention is products and processes useful in the aforementioned embodiments, especially with respect to isolation, detection, and identification of glycogenin mutations in the human genome and transcripts from the glycogenin pseudogenes. In particular, the mutations disclosed herein may be used in genetic fingerprinting, mapping, DNA (genomic) profiling, RNA transcript monitoring, and forecasting as single nucleotide polymorphisms. Kits comprised of the aforementioned product are also provided to practice the described processes. Such kits are preferably further comprised of instructions for performing processes, standards to calibrate processes and quantitate their results, reagents to perform the processes, or combinations thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the normal nucleotide sequence and the predicted amino acid sequence of the open reading frame (ORF) for a cDNA clone of human glycogenin (SEQ ID NOS: 1-2, respectively). Although thymidine (T) bases in a DNA polynucleotide (e.g., a gene) are replaced by uracil (U) bases in the corresponding RNA polynucleotide (e.g., a transcript produced according to the sense strand of the gene), nucleotide sequences are listed with a T base exclusively for the sake of simplicity. The skilled artisan would recognize from the context whether the chemical structure intended is actually T or U.

Figure 2 shows a physical map of the human glycogenin gene.

Figure 3A-K shows the nucleotide sequences for the normal human glycogenin gene and its pseudogenes (SEQ ID NOS:3-19). The numbers enclosed in parentheses at the beginning and end of the listed sequences show the positions of the 5'- and 3'-termini, respectively, relative to positions 1 to 999 of the normal sequence. Nucleotides appearing in bold type (A) represent a sequence difference between the pseudogene and the normal sequence; inserted nucleotides appear in italicized, bold type (A); underlined triplet bases represent a stop codon; deleted nucleotides appear as dashes.

DESCRIPTION OF SPECIFIC EMBODIMENTS

The discovery of glycogenin, an enzyme involved in glycogen synthesis, allowed us to determine whether under-expression of this protein occurs in patients with Type 2 diabetes. Because glycogen cannot be synthesized without glycogenin and no free glycogenin exists in muscle, a decrease in the supply of glycogenin would be expected to lower the amount of glycogen synthesized by muscle in response to elevation of blood glucose. In particular, factors which could decrease the amount or activity of glycogenin may be expected to lead to proportional decreases in glucose deposition in the form of glycogen. One molecule of skeletal glycogen contains up to 60,000 glucose residues. Accordingly, one molecule of glycogenin can initiate the deposition of this amount of glucose. Because there is no free glycogenin in skeletal muscle under normal circumstances, the available quantity of this protein may limit the amount of glucose that can be stored as glycogen. Decreases in glucose deposition can result in elevated blood glucose levels, the characteristic symptom of diabetes.
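The proportionality argued above can be written out as simple arithmetic. The following Python sketch assumes, as the text does, one glycogen molecule per glycogenin molecule and uses the stated upper bound of 60,000 glucose residues per molecule. The molecule counts are hypothetical numbers chosen only to show the scaling, and the function name is illustrative.

```python
# Upper bound on glucose residues per skeletal glycogen molecule (from the text).
GLUCOSE_PER_GLYCOGEN = 60_000


def max_glucose_stored(glycogenin_molecules, residues_per_glycogen=GLUCOSE_PER_GLYCOGEN):
    """Upper bound on stored glucose residues, assuming one glycogen per glycogenin."""
    return glycogenin_molecules * residues_per_glycogen


# Hypothetical counts of functional glycogenin molecules (assumptions).
normal_capacity = max_glucose_stored(1_000_000)
reduced_capacity = max_glucose_stored(750_000)  # 25% less functional glycogenin

# Under this simple model, storage capacity falls in direct proportion.
relative_capacity = reduced_capacity / normal_capacity
print(relative_capacity)
```

This is a ceiling rather than a prediction: it ignores glycogen turnover and any regulation of glycogen synthase or branching enzyme.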

Factors reducing glycogenin activity may include, for example, mutations in the glycogenin gene in one or both alleles (i.e., phenotypes of heterozygotes or homozygotes, respectively, may be a quantitative impairment and/or qualitative change), impaired or changed transcription of that glycogenin gene into RNAs, impaired or changed processing of those glycogenin RNAs, impaired or changed translation of a glycogenin mRNA into polypeptides, impaired or changed post-translational modification of a polypeptide of the glycogenin gene, or combinations thereof. Although this basis for disease etiology is genetic, the mutation may manifest itself at any one of the aforementioned art-recognized stages in gene expression and/or the regulation thereof. Genetic mutation may contribute to the initiation, establishment, progression, or maintenance of the diabetic state, or a symptom or pathologic condition associated with diabetes.

Many glycogenin pseudogenes exist in the human genome and, unusually for pseudogenes, they are heavily transcribed. Without limiting the invention to a particular mechanism, but merely to provide a possible explanation for the observations disclosed herein, such transcripts with premature stop codons may overwhelm the normal process of translating mRNA into polypeptide and thereby induce a pathologic state.

Although transcription of glycogenin pseudogenes may be the pathognomonic event for this disease or a related pathologic condition, it might not be the only event, or other events might combine in its natural history or pathogenesis: DNA replication, methylation, unwinding, topology, conformation, packing, or other higher-order structures above the linear sequence; competitive inhibition in or interference with RNA transcription, turnover, transport, editing, splicing, or other processing; competitive inhibition in or interference with protein translation, degradation, proteolytic cleavage, reduction, glycosylation, phosphorylation, methylation, sulfation, acylation, other covalent modifications, translocation across a cell membrane, targeting to an organelle or other subcellular location, splicing, folding or achieving higher-order structures above the linear sequence, conformation, binding to another protein, substrate binding, autocatalytic or catalytic activity; or combinations thereof.

Glycogenin pseudogenes were discovered during chromosomal mapping of the human gene for glycogenin. Some pseudogenes may represent duplication of a functional gene and subsequent mutation, but other pseudogenes appear to be "processed" (i.e., they lack the introns present in most eukaryotic genes) and then integrated into the germ line (e.g., by retrotransposition). The former class of pseudogenes would still contain introns, although mutation may reduce the ability of a splice donor or acceptor to function, and the non-transcribed regulatory sequences might also still function. Therefore transcription of pseudogenes initially created by gene duplication would be expected. In contrast, processed pseudogenes that have integrated into the genome would lack regulatory regions outside the transcribed region or those that were located in an intron removed by splicing. Therefore, in general, transcription of intron-less pseudogenes would not be expected. In either case, missense or nonsense mutations that reduced the activity of the translated protein might make the pseudogene a neutral player in the process of genetic selection. Such multigene families are also of interest to the study of evolution and genome dynamics, where pseudogenes may act as neutral accumulators of deleterious mutations or bystanders to the selective process.

In the process of determining the chromosome location of the glycogenin gene in the human genome, a genomic clone was isolated and then analyzed by Southern blotting. The presence of multiple hybridizing fragments resulting from restriction enzyme digestion was detected; many of these fragments hybridized to probes derived from different regions of a glycogenin cDNA clone. We hypothesized that this indicated the presence of glycogenin pseudogenes in the human genome. DNA of the genomic clone was sequenced, and the results were compared to the nucleotide sequence of a cDNA clone known to encode the complete human glycogenin polypeptide (Figure 1), as well as cDNAs obtained from RT-PCR of mRNA isolated from five Type 2 diabetic patients, starting at Ala-234 and continuing through to the end of the coding region. Nine mutations were noted in the pseudogene cDNA. Mutations in glycogenin RNA in cells reflect the transcription of related pseudogenes therein and produce abnormal RNA templates for translation. Translation of abnormal glycogenin mRNA or competition for a limiting amount of the translational apparatus might be expected to initiate, establish, further the development of, and/or maintain a pathologic state. These results are presented in Table 1.

Table 1. Occurrence of Pseudogene Mutations in cDNA (from mRNA) of Type 2 Patients.

Mutation    Number of Patients
A234P       5
M239T       1
H241Y       3
L247L       2
P296P       2
S300S       3
R303W       4
E305E       5
R306R       4
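As a reading aid for Table 1, the Python sketch below tabulates the nine entries and separates those whose label shows the same amino acid before and after the position (for example L247L) from those that change the amino acid (for example A234P). It assumes the labels follow the usual one-letter "reference, position, variant" convention and that the denominator is the five patients mentioned in the text. The names `TABLE_1`, `parse_mutation`, and `is_silent` are illustrative.

```python
import re

# Table 1 as given: mutation label -> number of patients showing it.
TABLE_1 = {
    "A234P": 5, "M239T": 1, "H241Y": 3,
    "L247L": 2, "P296P": 2, "S300S": 3,
    "R303W": 4, "E305E": 5, "R306R": 4,
}
N_PATIENTS = 5  # five Type 2 patients were examined (from the text)


def parse_mutation(label):
    """Split a label such as 'A234P' into (reference, position, variant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def is_silent(label):
    """True when the encoded amino acid is unchanged at that position."""
    ref, _, alt = parse_mutation(label)
    return ref == alt


silent = [m for m in TABLE_1 if is_silent(m)]
changing = [m for m in TABLE_1 if not is_silent(m)]

for label, count in TABLE_1.items():
    kind = "silent" if is_silent(label) else "amino acid change"
    print(f"{label}: {count}/{N_PATIENTS} patients ({kind})")
```

Under that reading, five of the nine entries leave the amino acid unchanged and four alter it; the silent entries can still serve as sequence markers that distinguish pseudogene transcripts from the normal transcript.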

The human glycogenin gene is located on chromosome 3q24 (Lomako et al., 1996). The gene spans about 7 kb and contains five exons ranging in length from 129 (exon 2) to 315 (exon 1) base pairs. Four introns are located within the human glycogenin gene, and these have been substantially sequenced. A high degree of homology with rabbit glycogenin has been observed, and both human and rabbit glycogenins are comprised of 332 amino acids. The two proteins differ by about 10% in amino acid sequence, although the sequences between Leu-63 and Lys-201 and between Glu-295 and the terminal Glu-332 display essentially 91% identity. Intron 1 has been completely sequenced, while most of introns 2-4 have been sequenced. The glycogenin gene exhibits an unusual intron boundary between exons 3 and 4, wherein the intron begins with AA at its 5'-end and terminates with GC. This unusual intron may be connected with a sequence encoding human glycogenin in the EMBL databank (publicly accessible as X79537), which is missing exon 4. Intron 3 of the glycogenin gene has an unusual splicing site (i.e., GTGT/aag . . . ggc/TTGG), but introns 1, 2 and 4 of the glycogenin gene comply with the gt/ag donor and acceptor rule. The presence of this unusual splicing site promotes alternative splicing in certain circumstances when there are particular mutations of the glycogenin gene. This may generate pseudogenes which are inactive but stable components of the genome derived by mutation of a previously active ancestral gene. Transcribing these pseudogenes might be involved in the etiology of the diabetic state, especially Type 2 diabetes.
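The gt/ag rule referred to above can be checked mechanically. The Python sketch below reads a junction written in the notation used in the text (exon bases in upper case, intron bases in lower case, a slash at each boundary) and reports the first two and last two intron bases. The intron 3 junction is taken from the text; the canonical junction is a hypothetical example, since the boundary sequences of introns 1, 2 and 4 are not given here. Function names are illustrative.

```python
def intron_termini(junction):
    """Return (first two, last two) intron bases from 'EXON/intron . . . intron/EXON'."""
    compact = junction.replace(" ", "")
    five_prime_side, three_prime_side = compact.split("...")
    intron_start = five_prime_side.split("/")[1]  # bases right after the donor boundary
    intron_end = three_prime_side.split("/")[0]   # bases right before the acceptor boundary
    return intron_start[:2].lower(), intron_end[-2:].lower()


def follows_gt_ag_rule(junction):
    """True when the intron starts with gt (donor) and ends with ag (acceptor)."""
    return intron_termini(junction) == ("gt", "ag")


# Junction of intron 3 as written in the text.
INTRON_3 = "GTGT/aag . . . ggc/TTGG"
# Hypothetical canonical junction (assumption, for comparison only).
CANONICAL_EXAMPLE = "CAAG/gtaagt . . . ttgcag/GTCC"

print(intron_termini(INTRON_3), follows_gt_ag_rule(INTRON_3))
print(intron_termini(CANONICAL_EXAMPLE), follows_gt_ag_rule(CANONICAL_EXAMPLE))
```

The check is deliberately narrow: it looks only at the terminal dinucleotides and says nothing about branch points or the wider splice-site consensus.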

A detailed physical map has been constructed (Figure 2) which identifies features of interest within the human genome in the vicinity of the glycogenin gene. Using a muscle biopsy from a Type 2 diabetic patient, a cDNA corresponding to a four amino acid deletion from the wild type sequence was identified and further analyzed. Expressed recombinant protein having this deletion is enzymatically inactive. This inactivation was determined to result from absence of the Asp-159 residue because mutants in which this residue was changed to Ala, Glu, or Lys by genetic manipulation were also inactive. Therefore, Asp-159 may be part of glycogenin's active site.

Recombinantly produced muscle glycogenin from rabbit was labeled with a photoaffinity probe (8-azidoadenosine triphosphate) and subjected to tryptic digestion. Peptides obtained therefrom were fractionated and the sequence Ser-Val-Arg was detected. This oligopeptide corresponds to positions 229-231 of the glycogenin polypeptide sequence. Substitution of Ser-229 by Arg (Ser229Arg) or Arg-231 by Met (Arg231Met) allows expression of an active enzyme which is inhibited by ATP. While a double mutant constructed by replacing Ser-Val-Arg with Arg-Val-Met exhibits wild-type properties, a single substitution of Ser-229 by Glu (Ser229Glu) yields an enzyme incapable of autoglucosylation. Thus, the ATP-binding site appears to be comprised of the linear sequence Ser-Val-Arg at positions 229-231 of the glycogenin polypeptide. Similar assays with a uridine photoaffinity probe revealed that this domain is also involved in UTP binding and, thus, may identify the UDP-glucose binding site.
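The substitution labels used above (for example Ser229Arg) follow a regular three-letter pattern that is easy to handle programmatically. The Python sketch below parses such labels, converts them to the one-letter form used in Table 1, and keeps the reported properties of each single-site variant in one lookup table. The Asp159 labels are written here by analogy with the labels in the text, the property strings paraphrase the text, and all names are illustrative.

```python
import re

# Three-letter to one-letter codes for the residues mentioned in this passage.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asp": "D", "Glu": "E",
    "Lys": "K", "Met": "M", "Ser": "S", "Val": "V",
}


def parse_substitution(label):
    """Split a label such as 'Ser229Arg' into ('Ser', 229, 'Arg')."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def short_form(label):
    """Convert 'Ser229Arg' to the one-letter form 'S229R'."""
    ref, pos, alt = parse_substitution(label)
    return f"{THREE_TO_ONE[ref]}{pos}{THREE_TO_ONE[alt]}"


# Properties of single-site variants as described in the text (paraphrased).
REPORTED_PROPERTIES = {
    "Asp159Ala": "inactive",
    "Asp159Glu": "inactive",
    "Asp159Lys": "inactive",
    "Ser229Arg": "active, inhibited by ATP",
    "Arg231Met": "active, inhibited by ATP",
    "Ser229Glu": "incapable of autoglucosylation",
}

for label, observation in REPORTED_PROPERTIES.items():
    print(f"{label} ({short_form(label)}): {observation}")
```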

Manipulation of such binding sites may prove useful in producing better reagents to detect the presence of glycogenin and modulate its enzymatic activity. Therapeutic treatments may be devised for controlling the activity of glycogenin, thereby alleviating hyperglycemia associated with Type 1 and/or Type 2 diabetes. For example, gene-based therapies could inhibit the expression of pseudogenes of glycogenin, preventing their interference in the normal pathway of glycogen synthesis, and/or the expression of the normal glycogenin gene may be increased to overcome expression of pseudogenes. Thus, assays for the amount of glycogenin present in a sample from a patient or the glycogenin activity therein could be applied in a clinical setting. Enzymatic activity of glycogenin could be measured using in vitro or in vivo assays (e.g., glycogenin autocatalysis).

The present invention contemplates testing individuals of any age group for predisposition to Type 1 and/or Type 2 diabetes. Furthermore, the present invention may also involve detecting a predisposition to diabetes in humans and animals by their aberrant transcription of one or more glycogenin pseudogenes.

It is believed on the basis of the present disclosure that there are at least ten different pseudogenes corresponding to the glycogenin gene in the human genome. Given the large size of this multigene family, it is apparent that expansion and contraction of the family might occur by genomic recombination or retrotransposition of transcripts into the genome. Thus, genetic variability in both the number of copies of glycogenin-related pseudogenes and the sequences of those pseudogenes would be expected.

A polynucleotide, polypeptide, or specific binding molecule according to the present invention may be used to identify and detect a genetic marker in family pedigrees (e.g., CEPH/NIH or Utah projects), radiation hybrids, or human-rodent somatic cell hybrids.
Fingerprinting would allow identification of an individual within a genetically similar population or construction of a genealogy among genetically related individuals. Detection of a germline or somatic mutation will determine that a disease is inherited or acquired, respectively. Identification of mutations by molecular genetic or cytogenetic techniques may also determine how glycogenin expression, especially at the level of transcriptional activity, is regulated during development. Genetic polymorphism may be used in linkage mapping, genetic fingerprinting, and molecular taxonomy. For example, detection of a polymorphism as restriction fragment length polymorphism (RFLP), by heteroduplex analysis, random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), denaturing gradient gel electrophoresis (DGGE), single-strand conformation polymorphism (SSCP), temperature gradient gel electrophoresis (TGGE), single nucleotide polymorphism (SNP), short tandem repeat (STR), variable number tandem repeat (VNTR), or microsatellite length heterogeneity may be used. See, for example, Lessa et al. (1993) and references cited therein. Such polymorphisms (or mutations if the polymorphism results in a mutant phenotype) may be linked to a genetic trait or phenotype, or correlated with gene expression or development.

A complementary DNA sequence (either single- or double-stranded cDNA) representing a messenger RNA (mRNA) transcript of glycogenin genes and related pseudogenes in human cells may be monitored by polynucleotide detection techniques. Nucleotide sequence specific for glycogenin can be used as a probe. Such probes could be full length covering the entire transcribed message or gene, at least one coding region, or a shorter length fragment which is unique to the glycogenin transcript or gene but contains only a portion of same.
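Of the detection methods listed above, RFLP is the simplest to illustrate: a sequence difference that destroys (or creates) a restriction site changes the set of fragment lengths after digestion. The Python sketch below is a toy model under stated assumptions. Both sequences are invented 26-base examples rather than glycogenin sequence, EcoRI (recognition site GAATTC, cutting after the first G) is chosen only as a familiar enzyme, and the `digest` function considers one strand of a linear molecule.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts between G and AATTC

# Hypothetical toy alleles (assumptions): the variant differs by one base
# inside the recognition site, which abolishes cutting.
REFERENCE = "ACGTACGTAC" + "GAATTC" + "TTGACCAGTA"
VARIANT = "ACGTACGTAC" + "GAGTTC" + "TTGACCAGTA"

reference_fragments = digest(REFERENCE, ECORI_SITE, ECORI_CUT_OFFSET)
variant_fragments = digest(VARIANT, ECORI_SITE, ECORI_CUT_OFFSET)
print(reference_fragments)
print(variant_fragments)
```

On a gel the two alleles would be told apart by band pattern: two shorter fragments for the reference allele and a single uncut fragment for the variant.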
The polynucleotide may be at least about 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 750, 1000, 2500, 5000, 10K, or 20K bases long; intermediate size ranges consisting of these independently selected lower and upper limits (e.g., between about 10 to about 100 bases) are also preferred.

The Sanger and Maxam-Gilbert sequencing reactions produce a collection of polynucleotide fragments by enzymatic and chemical methods, respectively. These fragments may be separated by slab or capillary gel electrophoresis as a ladder of bands with different mobilities, detected by labeling, and isolated by collecting the moving zone containing the desired fragment with a relative mobility predicted from comparison to a standard. In contrast to template-dependent extension and limited hydrolysis, polynucleotide fragments may also be produced by limited or complete nuclease digestion. Organic synthesis with phosphonate or Caruthers' phosphoramidite may also be used to produce short oligonucleotides. High-performance liquid chromatography and mass spectroscopy are alternative methods of separation and detection, respectively.

A recombinant clone or expression construct containing a nucleotide sequence is one preferred form of the polynucleotide of the present invention. The recombinant clone or expression construct may be an episome, phagemid, plasmid, bacteriophage, cosmid, yeast artificial chromosome (YAC), or bacterial artificial chromosome (BAC). Such clone or construct could be single- or double-stranded; nucleotides may be deoxyribonucleosides, ribonucleosides, nucleoside analogs and variants, nucleosides with a modified base, nucleotides with a modified ribose sugar, or combinations thereof; linkages between nucleosides may be comprised of phosphorus, nitrogen, sulfur, oxygen, carbon, or combinations thereof (e.g., phosphorothioate, peptide nucleic acids or PNA). All of these products are considered "polynucleotides" in the context of the claims.
The expression construct may be further comprised of a regulatory region for gene expression (e.g., promoter, enhancer, silencer, splice donor and acceptor sites, polyadenylation signal, cellular localization sequence) and, optionally, an origin of replication that allows chromosomal or episomal replication in a selected host cell. The expression construct may be based on a general-purpose vector with at least one regulatory region from a mammalian gene (e.g., actin, hormone responsive element of glucocorticoid receptor, histone, metallothionein) or a virus (e.g., adenovirus, baculovirus, cytomegalovirus, herpes virus, Moloney leukemia virus, mouse mammary tumor virus, Rous sarcoma virus, SV40 virus), as well as regions that facilitate engineering a polynucleotide or polypeptide (e.g., selectable marker, linker with multiple recognition sites for restriction endonucleases, promoter for in vitro transcription, primer annealing sites for in vitro replication, consensus site for initiation of translation, recognition site mediating site-specific recombination, site for proteolytic cleavage or post-translational modification).

Advantages of such clones or constructs include ease of genetic or proteomic manipulation, abundance of additional copies of polynucleotide or polypeptide, convenience of producing fragments and fusions thereof, and ability to shuttle among different host cells or organisms.

Production of such vectors and constructs, like any other recombinant molecule, is known in the art and typically involves enzymes, such as Taq polymerase, DNA and RNA polymerases, DNA and RNA ligases, restriction endonucleases, S1 nuclease, glycosylase, reverse transcriptase, and ribonuclease H. See Kornberg and Baker, DNA Replication, Freeman, 1991. The recombinant molecule may be transfected into a host, selected positively or negatively, and further manipulated. Examples of drugs for which selectable marker genes exist are ampicillin, hygromycin, kanamycin/neomycin, puromycin, and tetracycline. A metabolic enzyme (e.g., dihydrofolate reductase, thymidine kinase) may be used as a selectable marker in sensitive host cells or auxotrophs.

Vectors, reagents, and other supplies are commercially available. See, for example, the catalogs and product information contained therein of Amersham Pharmacia Biotech, Bio101, Bio-Rad, CLONTECH, Invitrogen, Molecular Probes, New England Biolabs, Novagen, PharMingen, Pierce Chemical, Promega, Roche Molecular Biochemicals, Sigma-Aldrich, Stratagene, and United States Biological.

A host cell may be transfected with an expression construct comprised of the polynucleotide of the invention. The host cell may be a human primary culture or cell line, bacterium, yeast, other fungus, insect cell, plant cell, rodent cell, cell in a primary culture, established cell line, somatic cell, or stem cell. A heterologous promoter may be used to regulate expression in a host cell or transgenic animal. See No et al. (1996); Rivera et al. (1996); Allgood and Eastman (1997); U.S. Pat. Nos. 5,589,362, 5,650,298, 5,654,168, 5,789,156, 5,814,618, 5,830,462, 5,859,310, 5,866,755, 5,869,337, and 5,871,753.

Transgenic and gene knockout non-human animals may be used to produce an animal model of diabetes or other pathologic conditions according to mechanisms disclosed herein for the effects of actively transcribing human glycogenin pseudogenes (e.g., ectopic expression of mutated host glycogenin sequences, normal human glycogenin and modulated pseudogene expression in a transgenic or knockout animal).

The invention also provides primer pairs and other polynucleotides for use in amplifying polynucleotides (e.g., polymerase chain reaction or PCR, ligation chain reaction or LCR, transcription-mediated amplification or TMA, other thermal cycling or isothermal reactions) and hybridization probes. A set of such primers may be selected and used for PCR assays to quantitate transcript abundance within cells. Oligonucleotide sequences may be selected using procedures such as those described in U.S. Pat. Nos. 5,556,749, 5,639,612, or others known to the skilled artisan. This set of primers will be specific for amplification of the glycogenin gene or pseudogenes, and can be used as a pair for PCR and RT-PCR amplification of DNA and RNA, respectively.
A hybridization probe with a sequence complementary to at least one of the mutations disclosed herein (preferably, the mutated position is located near the middle of an oligonucleotide probe) can be used for specific hybridization to a mutant polynucleotide. One or more primer probes with a sequence complementary to at least one of the mutations disclosed herein (preferably, the mutated position is located near the 3'-end of an oligonucleotide probe) can be used for specific amplification of a mutant polynucleotide. Therefore, the present invention will be useful for development and utilization of primers and other polynucleotide probes to quantitate cognate RNA and DNA within cells. This information may then be used to correlate glycogen catabolism and metabolism, or diabetes and other related pathologic conditions, with glycogenin transcription activity or other modes of gene expression. Primers that specifically amplify sequences in the vicinity of the glycogenin genetic locus also serve as a sequence tagged site (STS) for human chromosome 3q24-25.
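The two placement preferences stated above (mutated position near the middle of a hybridization probe, and near the 3'-end of an amplification primer) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 30-base sequence and the mutated position are invented, the default oligonucleotide lengths are arbitrary, and no melting temperature, secondary structure, or specificity screening is attempted. Function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def centered_window(seq, pos, length=15):
    """Sense-strand window with the mutated base (0-based pos) in the middle."""
    start = pos - length // 2
    if start < 0 or start + length > len(seq):
        raise ValueError("window does not fit inside the sequence")
    return seq[start:start + length]


def hybridization_probe(seq, pos, length=15):
    """Probe complementary to the sense strand, mutation near the middle."""
    return reverse_complement(centered_window(seq, pos, length))


def allele_specific_primer(seq, pos, length=12):
    """Forward primer whose 3'-terminal base sits on the mutated position."""
    start = pos - length + 1
    if start < 0:
        raise ValueError("primer does not fit inside the sequence")
    return seq[start:pos + 1]


# Hypothetical mutant sense-strand sequence and mutated position (assumptions).
MUTANT_SEQ = "ATGACCGATCAGTTCCTGAAGGCTATCGGA"
MUTATION_POS = 14  # 0-based index of the mutated base

window = centered_window(MUTANT_SEQ, MUTATION_POS)
probe = hybridization_probe(MUTANT_SEQ, MUTATION_POS)
primer = allele_specific_primer(MUTANT_SEQ, MUTATION_POS)
print(window, probe, primer)
```

Placing the mismatch centrally tends to maximize its destabilizing effect on a short duplex, whereas a mismatch at the 3'-terminal base of a primer interferes with polymerase extension, which is why the two designs differ.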

Based on the glycogenin nucleotide sequences, other specific-binding molecules (e.g., antisense, ribozyme) can be used to detect gene expression. Alternatively, specific-binding molecules developed to detect glycogenin protein activity may be produced in or administered to an organism. The amino acid sequence of glycogenin antigen can be used for preparation of specific-binding molecules (e.g., polyclonal or monoclonal antibody, antibody fragment, humanized antibody, single chain antibody, phage hybrid protein or other members of a combinatorial library) for monitoring protein expression, affinity purification, and functional studies. Antibody may be produced by immunizing an animal (e.g., chicken, goat, hamster, horse, mouse, rabbit, rat, sheep) with antigen. The immune response may be potentiated by immunoadjuvant, conjugation of antigen to a multivalent carrier, booster immunization, or combinations thereof. Antibody fragments may be prepared by proteolytic cleavage or genetic engineering; humanized antibody and single chain antibody may be prepared by transplanting sequences from the antigen binding regions of antibodies to framework molecules. Antigen may be full-length polypeptide or a fragment thereof. Algorithms to guide the selection of hybridization probes, oligonucleotide primers, polypeptide binding molecules, and antigenic peptides have also been implemented in computer software packages. Specific binding molecules may also be generally produced by screening a combinatorial library for a clone which specifically binds glycogenin antigen or mutants thereof (e.g., phage display library). See U.S. Pat. Nos. 5,403,484, 5,723,286, 5,733,743, 5,747,334, and 5,871,974. Preferred are molecules that specifically bind to mutant polynucleotide or polypeptide in preference to the native glycogenin gene, transcript, or enzyme to distinguish mutant from normal.
For immunological screening methods, antibody preparations, either monoclonal or polyclonal, may be utilized. Polyclonal antibodies, although generally less specific, typically are more useful in gene isolation. Immunizing an animal may produce polyclonal antibody which recognizes multiple epitopes of the glycogenin antigen or at least one immunodominant epitope. Monoclonal antibody may be produced by fusing lymphocytes of an immunized animal with a myeloma or other immortalized cell, and selecting clones producing antibody that recognizes a desired immunogenic epitope or possesses desired properties (e.g., specifically binding denatured, hydrolyzed, and/or truncated polypeptide instead of native enzyme, recognizing structural differences therebetween, precipitating glycogenin polypeptides or fragments thereof). The epitope bound may be present in the native polypeptide or only in a denatured, hydrolyzed, and/or truncated form.

A molecule able to specifically hybridize to a polynucleotide of the invention (e.g., a complementary polynucleotide) is also considered a specific binding molecule. Hybridization conditions are preferably chosen with a stringency that uniquely identifies the human glycogenin gene or pseudogenes in a population of genomic DNA or any of the human transcripts in a population of cellular RNA. Conditions may also have to be chosen to distinguish human nucleotide sequences from those of other species by a physical criterion like sedimentation velocity or electrophoretic mobility. Alternatively, hybridization conditions may be relaxed to identify orthologs or paralogs in human or other mammalian species. Specific hybridization by such a molecule is also useful for monitoring gene expression, genetic profiling and fingerprinting, and functional studies.

The specific binding molecule of the invention may be a chemical mimetic; for example, an aptamer or peptidomimetic. It is preferably a short oligomer selected for binding affinity and bioavailability (e.g., passage across the plasma and nuclear membranes, resistance to hydrolysis of oligomeric linkages, adsorbance into cellular tissue, and resistance to metabolic breakdown). The chemical mimetic may be chemically synthesized with at least one non-natural analog of a nucleoside or amino acid (e.g., modified base or ribose, designer or non-classical amino acid, D or L optical isomer). Modification may also take the form of acylation, glycosylation, methylation, phosphorylation, sulfation, or combinations thereof. Oligomeric linkages may be phosphodiester or peptide bonds; linkages comprised of a phosphorus, nitrogen, sulfur, oxygen, or carbon atom (e.g., phosphorothioate, disulfide, lactam, lactone bond); or combinations thereof. The chemical mimetic may have significant secondary structure (e.g., ribozymes) or be constrained (e.g., cyclic peptides). Solid-phase synthesis is preferred to avoid representational bias and to generate chemical diversity in making a library of non-natural mimetics. See, for example, U.S. Pat. Nos. 5,650,489 and 5,877,030. Cleavage from the solid support would produce a solution library or selectively release/retain the mimetic.

For detection, the antibody is labeled using radioactivity or any one of a variety of second antibody/enzyme conjugate systems that are commercially available (e.g., alkaline phosphatase, β-galactosidase, horseradish peroxidase). Chemical staining may be used to detect polynucleotide (e.g., intercalators like acridine orange and ethidium bromide) or polypeptide (e.g., stains like amido black, dyes like coomassie brilliant blue). Typically, polynucleotides, polypeptides, and specific binding molecules are labeled for use as probes in assays of the invention. The probe is preferably labeled with a small molecule (e.g., biotin, chromochrome, colloidal gold, digoxygenin, dinitrophenol, fluorochrome, radioisotope, spin label), fusion protein (e.g., avidin or biotin-binding analogs, immunogenic epitope), or enzyme (e.g., fluorescent proteins like GFP, hydrolases and transferases, kinases and phosphatases, luminescent proteins like LUC) to detect its presence.

Detection may involve, for example, chemical or physical phenomena like agglutination or flocculation, autoradiography, chemiluminescence or electrochemiluminescence (ECL), calorimetry, colorimetry, electron spin resonance, charge or energy transfer, fluorescence polarization or quenching, liquid scintillation, nuclear magnetic resonance, surface plasmon resonance, and other spectroscopic measures of radiation adsorption, emission, reflection, or refraction.

Related nucleotide sequences (e.g., pseudogenes) may be defined by structural and/or functional criteria. For example, related nucleotide sequences derived from the human glycogenin nucleotide sequence may hybridize under stringent conditions known in the art. Suitable conditions for oligonucleotides of about 20 or about 50 bases could be 400 mM NaCl, 40 mM PIPES pH 6.4, 1 mM EDTA, 50°C or 70°C (see Beltz et al., 1983); suitable conditions for polynucleotides longer than 50 bases could be 500 mM NaHPO₄ pH 7.2, 7% (w/v) sodium dodecyl sulfate (SDS), 1% bovine serum albumin (BSA), 1 mM EDTA, 65°C (Church and Gilbert, 1984). Short, conserved peptide domains may be used to design amplification primers which probe for related nucleotide sequences (Gould et al., 1989). Other suitable hybridization and washing conditions are described herein. Thus, relatedness may be found when there is similarity or identity of sequence and this may be determined by comparison of sequence information or through hybridization between a probe and a source (e.g., Southern or Northern blots, genomic or cDNA libraries).

These are strict definitions, however, because some nucleotide sequences which encode the glycogenin protein with 100% identity (i.e., a functional equivalent of the native glycogenin polynucleotide) would fail to hybridize under stringent conditions because of the redundancy of the genetic code, but might be desirable for heterologous expression because of the preferences of host cells and organisms for certain codons. In contrast, a large but finite set of stop codon mutations are possible. Nonsense mutations are easily produced by changing the nucleotide at each position between the native initiation of translation Met codon and the native termination of translation stop codon to the other three nucleotides (i.e., all bases except the base normally found at that position) and determining whether a premature stop codon was created in the normal open reading frame. Moreover, frameshift mutations are easily produced by inserting any one of the four possible nucleotides (i.e., single insertion) or any one of the 16 possible dinucleotides (i.e., double insertion) at each position between the native initiation of translation Met codon and the native termination of translation stop codon and determining whether a premature stop codon was created in the normal open reading frame.
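
The enumeration described above can be sketched in a few lines. The following is a minimal illustration, assuming the standard genetic code stop codons (TAA, TAG, TGA) and a short hypothetical open reading frame; the function names and the toy sequence are not part of the specification, and only single-base insertions are shown.

```python
# Minimal sketch: enumerate single-base substitutions and single-base
# insertions that create a premature stop codon in a toy open reading frame.
# TOY_ORF and the function names are hypothetical, for illustration only.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
BASES = "ACGT"

def first_stop(orf):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

def nonsense_substitutions(orf):
    """List (position, new_base) substitutions giving a premature stop."""
    native_stop = first_stop(orf)
    hits = []
    # Positions between the initiating Met codon and the native stop codon.
    for pos in range(3, native_stop * 3):
        for base in BASES:
            if base == orf[pos]:
                continue
            mutant = orf[:pos] + base + orf[pos + 1:]
            stop = first_stop(mutant)
            if stop is not None and stop < native_stop:
                hits.append((pos, base))
    return hits

def frameshift_insertions(orf):
    """List (position, inserted_base) insertions giving a premature stop."""
    native_stop = first_stop(orf)
    hits = []
    for pos in range(3, native_stop * 3):
        for base in BASES:
            mutant = orf[:pos] + base + orf[pos:]
            stop = first_stop(mutant)
            if stop is not None and stop < native_stop:
                hits.append((pos, base))
    return hits

TOY_ORF = "ATGTGGCAGTACTAA"  # Met-Trp-Gln-Tyr-stop (hypothetical)
substitutions = nonsense_substitutions(TOY_ORF)
insertions = frameshift_insertions(TOY_ORF)
```

Positions are 0-based offsets into the toy sequence. Extending the insertion loop to the 16 possible dinucleotides follows the same pattern.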

Functional criteria for mutated sequences include effects of the mutant such as alterations in DNA replication, methylation, unwinding, topology, conformation, packing, or other higher-order structures above the linear sequence; competitive inhibition in or interference with RNA transcription, folding, turnover, transport, editing, splicing, or other processing; competitive inhibition in or interference with protein translation, degradation, proteolytic cleavage, reduction, glycosylation, phosphorylation, methylation, sulfation, acylation, other covalent modifications, translocation across a cell membrane, targeting to an organelle or other subcellular location, splicing, folding or achieving higher-order structures above the linear sequence, conformation, binding to another protein, substrate binding, autocatalytic or catalytic activity; or combinations thereof. While the normal glycogenin sequence serves as a reference to determine whether a given sequence of a certain length contains a mutation at a particular position, the normal sequence is not intended to be encompassed within the scope of the invention. Similarly, a prior art sequence or fragment thereof is not generally within the scope of the invention unless used in combination with at least one novel and non-obvious sequence of the invention (e.g., mutant polynucleotide, polypeptide, or fragments thereof). A reference sequence according to the invention may be defined as a polynucleotide sequence, preferably at least 20 nucleotides in length, used as a basis for a comparison between sequences. The reference sequence may comprise a full-length cDNA or gene sequence in a sequence listing, or the reference sequence may comprise a portion of a full-length DNA or gene sequence.

A comparison of the sequences of two polynucleotides (or more than two) may be restricted to a "comparison window" in order to determine sequence similarity in a local region of the two polynucleotides. A "comparison window" is defined to be a continuous series of at least 10 nucleotides in a first polynucleotide sequence, which when aligned with a reference sequence of at least 10 contiguous nucleotides, enables a comparison of the first polynucleotide and reference sequences. The segment of contiguous nucleotides in the "comparison window" of the first polynucleotide may comprise additions and/or deletions, such as, for example, gaps, of up to 20% of the reference sequence. Local homology or sequence identity within the "comparison window" of the first polynucleotide and reference polynucleotide may be achieved via alignment of the sequences according to the algorithm of Smith and Waterman, or by another suitable alignment method or algorithm (Needleman and Wunsch, 1970; Pearson and Lipman, 1988). Sequence alignment also may be objectively achieved and optimized by other known algorithms. These algorithms may be incorporated into commercial computer packages and implemented using default parameters. Computer packages available on the internet or via commercial sources include FASTDB (Intelligenetics), BLAST (National Center for Biotechnology Information; Altschul et al., 1997), GAP, BESTFIT, FASTA and TFASTA (Wisconsin Genetics Software package Release 7.0, Genetics Computer Group, 575 Science Drive, Madison WI), DNASTAR, Genetics Computer Group, Hitachi Genetics Systems, and Oxford Molecular Group (formerly Intelligenetics). See Doolittle, Of URFs and ORFs, University Science Books, 1986; Gribskov and Devereux, Sequence Analysis Primer, Stockton Press, 1991; and references cited therein.
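
As an illustration of local alignment in the style of Smith and Waterman, the following minimal sketch computes only the best local alignment score between two sequences. The scoring values (match +2, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for the example; the packages named above use their own scoring schemes and default parameters.

```python
# Minimal sketch of Smith-Waterman local alignment scoring with a linear gap
# penalty. Scoring parameters are illustrative assumptions, not the defaults
# of any named software package.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment score never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Recovering the aligned segment itself would require keeping the full matrix and tracing back from the highest-scoring cell, which is omitted here for brevity.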

Sequence identity may be established when the first polynucleotide and reference polynucleotide exhibit complete sequence similarity on a nucleotide-by-nucleotide basis over the span of the "comparison window". However, the invention contemplates polynucleotides having substantial identity with a reference polynucleotide, wherein a polynucleotide having substantial identity with the reference polynucleotide exhibits the same function as the reference polynucleotide. By "substantial identity" is meant that a polynucleotide has nearly identical nucleic acid bases in a "comparison window" located in positions corresponding to those of the reference polynucleotide. The number of matched base positions divided by the number of total positions compared, multiplied by 100 percent, yields what is known as the percentage sequence identity (also known as percentage sequence homology) between compared polynucleotides. The invention contemplates a reference sequence having a "comparison window" of at least 10 nucleotides (preferably 20-50 nucleotides), for example a gene comprising the sequence identity of SEQ ID NO:3, and a first, second or other sequence having at least 80% sequence identity to the gene comprising SEQ ID NO:3, preferably at least 95% sequence identity, more preferably 97%, and most preferably 98%-99% sequence identity with SEQ ID NO:3.
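
The percentage sequence identity defined above (matched positions divided by total positions compared, multiplied by 100) can be written out as a short sketch. It assumes the two sequences have already been aligned to equal length within the comparison window and treats a gap character as a non-matching position; the function name and the toy sequences are hypothetical.

```python
# Minimal sketch of the percentage sequence identity calculation for two
# pre-aligned sequences of equal length. Toy inputs are hypothetical.

def percent_identity(first, reference):
    """Return 100 * matched positions / total positions compared."""
    if len(first) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("comparison window is empty")
    matches = sum(a == b for a, b in zip(first, reference))
    return 100.0 * matches / len(reference)

# Ten-nucleotide comparison window with one mismatch.
identity = percent_identity("ACGTTCGTAC", "ACGTACGTAC")
```

With nine of ten positions matching, the example window meets the 80% threshold but not the 95% threshold recited above.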

For example, a nucleotide sequence may show as little as 80% sequence identity, and more preferably at least 90% sequence identity, between the target sequence and the human polynucleotide excluding any deletions or additions which may be present, and still be considered related. Nucleotide sequence identity may be at least 95% and, most preferably, nucleotide sequence identity is at least 98% or 99%.

An "isolated" polynucleotide or polypeptide is at least partially isolated from the source of the polynucleotide or polypeptide. Using the nucleotide and amino acid sequences disclosed herein, compositions or extracts of the invention may be made substantially pure by controlled expression of the polynucleotide or polypeptide, and isolating same. Expression may be accomplished by extraction from natural sources, recombinant technology, or total or partial chemical synthesis.

By "substantially pure", a composition containing a molecule is described as being at least 80%, preferably at least 90%, more preferably at least 95%, and most preferably at least 99% pure by weight as compared to other substances (i.e., contaminants) of the same chemical character as the recited molecule (e.g., nucleotide, amino acid). Thus, "purified" polynucleotide or polypeptide could be assessed relative to the starting source (e.g., cytoplasm, nucleoplasm, cellular or nuclear lysate, cellular or tissue extract) from which purification was initiated. Preferably, such compositions are reduced by at least 95% of the initial number of intact cells (i.e., 95% free). A substantially cell-free composition is reduced by at least 99% of the initial number of intact cells, and a reduction of at least 99.99% may also be achieved. Compositions may also be cleared so that they are substantially free of membranes or other coated structures (reduced by at least 95% of the initial content by weight). The meaning of "heterologous" depends on context. For example, heterologous polynucleotide regions or polypeptide domains may mean that some regions/domains are not found in the same species in nature (e.g., a human polynucleotide encoding glycogenin and a prokaryotic promoter). Another example is that heterologous polynucleotide regions or polypeptide domains may mean that the regions/domains are not found joined together in nature (e.g., human glycogenin structural gene and tetracycline- or FK506-responsive regulatory region; human glycogenin polypeptide and epitope tag, signal for localization in the cell, or specific-binding domain). Ligation of polynucleotide regions or fusion of polypeptide domains occurs by inventive manipulation, such as by de novo synthesis or recombination. Of course, such joining may be preceded or followed by fragmentation (e.g., hydrolysis of a phosphodiester or peptide bond) through enzymatic (e.g., nuclease or protease) or chemical hydrolysis.

Similarly, the meaning of "native" depends on context. For a human polynucleotide or polypeptide, it may mean that the polynucleotide/polypeptide was purified from a human source, has a sequence identical to a non-mutant human glycogenin gene or protein (in contrast to a pseudogene), shares a conformation with properly folded polynucleotide/polypeptide, or is not denatured in its chemical structure.

Binding is described as "specific" when it is able to discriminate human glycogenin polynucleotide or polypeptide from a mixture of other chemical substances which are not related to glycogenin. Processes of isolation, detection, and identification may depend on specific binding of human glycogenin polynucleotide or polypeptide in the mixture. The skilled artisan would be able to determine appropriate process conditions to achieve specific binding by choice of length of time, temperature, ionic strength, pH, addition of surfactant, pre-treatment (e.g., adsorption, affinity purification, subtraction), and post-treatment (e.g., additional rounds of binding, signal amplification, washing). In particular, binding specific for a glycogenin mutation in a polynucleotide-containing sample or a glycogenin mutant in a polypeptide-containing sample would detect the mutation or mutant preferentially in comparison to the normal polynucleotide or polypeptide. Binding could be quantitated by appropriate correction for the manner in which a sample was prepared or conditions of the reaction, comparison to a standard curve of known amounts or types of glycogenin, competitive or other assay formats, or combinations thereof. Washing and separation are facilitated by immobilization to an insoluble polymeric support (e.g., acrylamide, agarose, cellulose, nylon, plastics) or other supports such as glass. Although binding in solution has favorable kinetics, prior or subsequent binding to a support allows increasing detection sensitivity and/or decreasing non-specific binding or background. For heterogeneous assays, after incubation to allow binding between or among components of the stable complex, the reaction components may be separated, non-binding components may be removed, and the support may be washed.
Washing may also be facilitated by forming the support into a bibulous strip with one or more zones; a well of a 96-, 192- or 384-well plate; a magnetic or non-magnetic bead; a chromatographic plate, resin, or column; or a porous or non-porous membrane. In particular, immobilization of a probe on a glass or silanized slide, nylon or nitrocellulose membrane, magnetic bead, or microtiter plate is preferred. Binding will result in capture of the solution species. Homogeneous assays will generally not require immobilization to a support.

Isolation, detection, purification, or quantitation of mutated polynucleotide or mutant polypeptide may involve conventional techniques such as in vitro transcription, in vitro translation, Northern hybridization, reverse transcription-polymerase chain reaction (RT-PCR), run-on transcription, solution hybridization, nuclease protection, Southern hybridization, metabolic protein labeling, antibody binding, enzyme linked immunosorbent assay (ELISA), immunofluorescence, immunoprecipitation (IP), fluorescence activated cell analysis (FACS), radioimmunoassay (RIA), and Western blotting. The presence of mutated polynucleotide or mutant polypeptide may be assayed by use of a reporter whose product is easily assayed. Such reporter genes include alkaline phosphatase, β-galactosidase (LacZ), chloramphenicol acetyltransferase (CAT), β-glucuronidase (GUS), green fluorescent protein (GFP), β-lactamase, luciferase (LUC), or derivatives thereof. Such reporter genes would use cognate substrates that are preferably assayed by a chromogenic, fluorescent, or luminescent signal. Alternatively, the assayed product may be tagged with a heterologous polypeptide epitope (e.g., FLAG, MYC, SV40 T antigen, glutathione-S-transferase or GST, oligohistidine, maltose binding protein or MBP) for which cognate antibodies or affinity resins are commercially available.

Changes in transcription may be detected qualitatively and/or quantitated with, for example, differential message display (U.S. Pat. Nos. 5,459,037; 5,599,672; 5,665,544; 5,707,807; 5,807,680; 5,814,445; 5,851,805; and 5,876,932); subtractive hybridization (U.S. Pat. Nos. 5,316,925; 5,643,761; 5,804,382; 5,830,662; 5,837,468; 5,846,721; and 5,853,991); computer-assisted comparison with an electronic database (e.g., U.S. Pat. No. 5,840,484); differential screening of arrayed cDNA clones or libraries (e.g., U.S. Pat. Nos. 4,981,783; 5,206,152; and 5,624,801); reciprocal subtraction differential display (RSDD; U.S. Pat. No. 5,882,874); and serial analysis of gene expression (SAGE; U.S. Pat. No. 5,866,330). See also Soares (1997) and references cited therein.

Monitoring gene expression is facilitated by using biochips or microarrays (oligonucleotides and/or oligopeptides arranged at high density on a solid substrate). See, for example, U.S. Pat. Nos. 5,445,934, 5,510,270, 5,545,531, and 5,677,195; Nature Genet. special supplement (1999) 21, 1-60. Such reagents allow capture of a molecule in solution by a specific interaction between the cognate molecules and immobilization of the solution molecule on the solid substrate. See, for example, U.S. Pat. Nos. 5,143,854, 5,639,603, 5,789,162, and 5,789,172. Other multiplex analyses to monitor gene expression employ simultaneous solution methods such as multi-probe ribonuclease protection assay or multi-primer pair polynucleotide amplification. Besides glycogenin, other genes or proteins which may be monitored are those involved in autoimmunity (e.g., pancreatic islet antigen, glutamic acid dehydrogenase); pharmacogenomics (e.g., cytochrome P450 genes); sugar catabolism and metabolism (e.g., insulin, insulin receptor, glucokinase); transcription of the aforementioned genes (e.g., HNF-1α, HNF-1β, HNF-4α, IPF-1, and IPF-1); and combinations thereof.

Nucleotide and amino acid sequences may be synthesized in situ on the substrate by solid-phase chemistry or photolithography. In situ synthesis attaches the nucleotides or amino acids directly to the substrate. Alternatively, the polynucleotide, polypeptide, or specific binding molecule may be attached by interaction of a specific binding pair (e.g., antibody-digoxygenin/hapten/peptide, biotin-avidin/streptavidin, GST-glutathione, MBP-maltose, polyhistidine-chelated nickel, protein A/G-Fc domain); crosslinking may be used if covalent attachment to the substrate is desired. Glutaraldehyde is a covalent bifunctional crosslinker suitable for immobilization on a substrate, but a photoactivatable, reversible crosslinker is preferred to identify and isolate molecules interacting in a complex (e.g., a thiol linkage that may be reduced). Polynucleotides or polypeptides may also be delivered by replicating from a reference plate, dotting with a pen, or spraying from a reservoir and then attached to a nitrocellulose or nylon membrane.

An overlapping set of polypeptides which define all possible linear epitopes of glycogenin may be arranged on a solid substrate to map the epitope specifically bound by a binding molecule (e.g., polyclonal or monoclonal antibody). See U.S. Pat. No. 5,194,392. Once a reactive epitope is defined, it may be used to isolate the specific binding molecule or to inhibit binding between glycogenin and the specific binding molecule. A polypeptide or specific binding molecule thereof may be used to establish a profiling reference panel. See U.S. Pat. Nos. 5,384,263, 5,541,070, and 5,798,275.

Polynucleotides and/or polypeptides may be arrayed at a density of at least about 10² molecules per cm², about 10³ molecules per cm², about 10⁴ molecules per cm², or about 10⁵ molecules per cm². The array may contain at least 10 different sequences, 10² different sequences, 10³ different sequences, 10⁴ different sequences, 10⁵ different sequences, or 10⁶ different sequences. Some or all of the arrayed molecules have sequences which correspond to a mutation of a glycogenin pseudogene or its product (e.g., 5, 10). Gene monitoring, fingerprinting, profiling, and other procedures suitable for single nucleotide polymorphisms (SNPs) may be used in the present invention to identify or detect transcripts of human glycogenin pseudogenes that are distinguished from normal glycogenin transcripts in at least one nucleotide. See, for example, Chen et al. (1998), Landegren et al. (1998), Germer and Higuchi (1999), Sapolsky et al. (1999), Gentalen and Chee (1999), and Gilles et al. (1999); U.S. Pat. Nos. 5,610,287, 5,679,524, and 5,885,775.

Standard procedures in the art are described in generally available references and laboratory manuals like Ausubel et al., Current Protocols in Molecular Biology, Wiley, 1998; Birren et al., Genome Analysis Series, CSHL Press, 1997-1999; Coligan et al., Current Protocols in Immunology, Wiley, 1998; Dieffenbach and Dveksler, PCR Primer, CSHL Press, 1995; Dracopoli et al., Current Protocols in Human Genetics, Wiley, 1998; Harlow and Lane, Antibodies and Using Antibodies, CSHL Press, 1988 and 1999; Keller and Manak, DNA Probes, Stockton Press, 1993; Mullis et al., The Polymerase Chain Reaction, Birkhauser, 1994; Sambrook et al., Molecular Cloning, CSHL Press, 1989; and Spector et al., Cells, CSHL Press, 1998.

The following examples are meant to be illustrative of the present invention; however, its practice is not limited or restricted in any way by them.

EXAMPLES

Example I

Isolation of cDNA Coding for Glycogenin

Total RNA was isolated from white blood cells of healthy control and Type 2 diabetic individuals, and was transcribed by reverse transcriptase using nucleic acid primers derived from the 5' and 3' ends of the glycogenin gene and amplified by RT-PCR. The forward 5'-end primer (ACAGATCAGGCCTTTGTGAC, SEQ ID NO:20) and the reverse 3'-end primer (CTGGAGGTAAGTGTCGTCAAGTTTCC, SEQ ID NO:21) were synthesized using the TITAN™ single tube RT-PCR system of Roche Molecular Biochemicals according to the manufacturer's protocol. Agarose gel electrophoresis showed semi-quantitatively that much more RT-PCR product was present in samples from diabetics versus controls, indicating higher glycogenin message levels existed in white blood cells of Type 2 diabetic patients than in the white blood cells of normal control subjects. RNA was isolated from 1.5 ml of blood using the RNEASY RNA purification kit (Qiagen). The RT-PCR product was cloned in a TA CLONING plasmid vector from Invitrogen. DNA was isolated from a single bacterial colony transfected by the recombinant clone with a WIZARD DNA purification kit from Promega following the manufacturer's methodology, and then sequenced using the AMPLI CYCLE sequencing kit from Perkin Elmer according to the manufacturer's instructions.

Example II

Sequencing cDNA Coding for Glycogenin: Stop Codons

The RT-PCR products of Example I were cloned into the TA CLONING vector, manually sequenced by the dideoxy chain termination method, and separated by polyacrylamide thin gel electrophoresis. The entire cDNAs isolated from three Type 2 patients and from two healthy subjects were sequenced. One clone from each of the three Type 2 patients presented numerous mutations in glycogenin cDNA, while preserving almost 90% of sequence identity compared to normal glycogenin cDNA. By contrast, no such mutations were found in one colony isolated from each of two healthy human subjects. The presence of cDNA mutations generally implies that mutations are present in corresponding mRNA. The most frequent mutations were silent, but also present were several premature stop codons. The presence of the premature stop codons suggested that pseudogenes were being expressed as mRNA, given that mRNA containing stop codons has been associated with other pseudogenes. See Description of the Related Art. The aberrant mRNA arose from the transcription of pseudogenes, as evidenced by nine mutations in the pseudogenes which were also detected in cDNAs obtained by RT-PCR of mRNA from five Type 2 patients, starting at Ala-234 and proceeding through the end of the coding region. Two primers designed to read the sequences surrounding these stop codons from Thr-30 to Lys-96 and from Ser-157 to Phe-203 were used to analyze an average of eight clones derived from leukocytes of each of 12 Type 2 diabetic patients.

The results indicate that one stop codon is due to a single base mutation in the triplet TGG coding for Trp-89 in which the first guanine is replaced by adenine (W89a). Another stop codon is due to a double mutation in the same TGG codon, where both guanines are replaced by adenines (W89b). Another stop codon is due to a single base mutation in the triplet CAG coding for Gln-93 in which cytosine is replaced by thymidine (Q93). A fourth stop codon is due to a single base mutation in the triplet TAC to TAG (Y194) coding for Tyr-194 (Tyr-194 is the point of attachment of the first glucose). An additional stop codon is due to a single adenine insertion in the triplet AGC coding for Ser-171, causing a shift in the reading frame, followed by a second adenine insertion in the triplet ATC coding for isoleucine at position 178 (Ile-178), creating the TAA stop codon. At least two of the mutations described lead to expression of a protein lacking Tyr-194, to which the first glucose unit of glycogen normally attaches. See Figure 3 which shows the sequences in the context of the rest of the human glycogenin gene or related pseudogenes as SEQ ID NOS:3-19.
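
The four point mutations described above can be checked against the standard genetic code with a short sketch. The mutant codons for W89a, W89b, and Q93 are derived here from the base changes stated in the text (the TAC to TAG change for Y194 is stated directly); the dictionary layout and function names are illustrative assumptions.

```python
# Minimal sketch: confirm that each described codon change converts a sense
# codon into a stop codon under the standard genetic code. Mutant codons for
# W89a, W89b, and Q93 are inferred from the base changes described in the text.

STOP_CODONS = {"TAA", "TAG", "TGA"}

POINT_MUTATIONS = {
    "W89a": ("TGG", "TAG"),  # first guanine of Trp-89 codon replaced by adenine
    "W89b": ("TGG", "TAA"),  # both guanines of Trp-89 codon replaced by adenine
    "Q93": ("CAG", "TAG"),   # cytosine of Gln-93 codon replaced by T
    "Y194": ("TAC", "TAG"),  # Tyr-194 codon TAC changed to TAG
}

def creates_stop(normal, mutant):
    """True if a sense codon has been changed into a stop codon."""
    return normal not in STOP_CODONS and mutant in STOP_CODONS

def base_changes(normal, mutant):
    """Count the positions at which two codons differ."""
    return sum(a != b for a, b in zip(normal, mutant))

stop_created = {name: creates_stop(*pair) for name, pair in POINT_MUTATIONS.items()}
changes = {name: base_changes(*pair) for name, pair in POINT_MUTATIONS.items()}
```

The frameshift ("Insertion") mutation is not covered by this sketch because it depends on the surrounding sequence shown in Figure 3.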

In further studies, most mutant cDNAs had multiple stop codons present in glycogenin cDNA isolated from single colonies. These premature stop codons could be created by a point mutation that caused a transition or transversion in the normal sequence, and/or an insertion or deletion that results in a frameshift mutation. RNA was obtained from leukocytes of test subjects, cDNA was produced by reverse transcription (RT) and amplified by the polymerase chain reaction (PCR), the RT-PCR products were cloned into a recombinant vector, bacteria were transfected to clone the products, and double-stranded product was manually sequenced. The mutations in the human glycogenin sequence are summarized in Table 2. Their detection as RNA transcripts indicates that the glycogenin pseudogenes in the human genome are actively transcribed. The numbers in parentheses correspond to total numbers of analyzed subjects (top horizontal row) and the subject carrying a particular set of mutations. The mutations are described in more detail below and in Figure 3 where they are displayed in the context of the glycogenin sequence.

Table 2. Occurrence of Stop Codons in Glycogenin cDNA (from mRNA) in Normal and Diabetic Subjects. The numbers not enclosed in parentheses are the clones having the mutation(s) indicated.

From a total of 95 clones obtained from 12 Type 2 patients, the following mutated sequences were identified as indicative of transcription of the glycogenin pseudogenes: one clone had a Trp-89 ("W89a") plus Gln-93 ("Q93") double mutation, 37 clones had a Trp-89 ("W89a") mutation plus two adenines inserted ("Insertion"), four clones had a Gln-93 ("Q93") mutation plus two adenines inserted ("Insertion"), 43 clones had a Gln-93 ("Q93") and Tyr-194 ("Y194") double mutation, two clones had a Trp-89 ("W89a") plus Gln-93 ("Q93") plus Tyr-194 ("Y194") triple mutation, three clones had only two adenines inserted ("Insertion"), one clone had a Trp-89 ("W89a") and Tyr-194 ("Y194") double mutation, and one clone had a Gln-93 ("Q93") mutation. Only three RT-PCR cDNA clones from Type 2 patients had a normal sequence. Stop codons were also observed in Type 1 patients and two subjects with normal blood glucose levels but a strong family history of Type 2 diabetes ("High Risk"). In the latter case, we observed predominantly a Gln-93 ("Q93") plus Tyr-194 ("Y194") double mutation.
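
The clone counts reported above can be tallied to confirm that they account for all 95 clones and to show how often each individual mutation occurs. The counts are taken from the paragraph above; the data structure and variable names are illustrative assumptions.

```python
# Minimal sketch: tally the clone counts reported for the 12 Type 2 patients.
# Keys are the mutation combinations named in the text; an empty tuple stands
# for clones with a normal sequence.

clone_counts = {
    ("W89a", "Q93"): 1,
    ("W89a", "Insertion"): 37,
    ("Q93", "Insertion"): 4,
    ("Q93", "Y194"): 43,
    ("W89a", "Q93", "Y194"): 2,
    ("Insertion",): 3,
    ("W89a", "Y194"): 1,
    ("Q93",): 1,
    (): 3,
}

total_clones = sum(clone_counts.values())
mutant_clones = total_clones - clone_counts[()]

# Number of clones carrying each individual mutation, alone or in combination.
per_mutation = {}
for combination, count in clone_counts.items():
    for mutation in combination:
        per_mutation[mutation] = per_mutation.get(mutation, 0) + count

mutant_fraction = mutant_clones / total_clones
```

On these figures, 92 of the 95 clones carry at least one stop codon mutation, with Q93 the most frequent individual mutation.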

Members of the normal (i.e., control) group are subdivided into subjects showing no mutations and subjects showing some mutations. Two normal subjects displayed 17 entirely normal sequences, while two other healthy subjects transcribed pseudogenes as shown by the identification of the same point and frameshift mutations which result in stop codons. These two normal subjects are over 60 years of age, and have no family history of diabetes.

Further investigation has shown that this correlation between transcription of glycogenin pseudogenes (i.e., sequences with premature stop codons) and Type 2 diabetes is not unqualified.

Ten clones obtained from blood samples of eight Type 2 patients were sequenced and the frequency of mutations is shown in Table 3. Four patients independently displayed the Gln-93 stop codon mutation six, seven, nine, and ten times. Patients displaying no stop codon nevertheless had other types of mutations, like all other Type 2 diabetic patients.

Relative amounts of transcribed normal and pseudogene RNA transcripts were further investigated using the standard procedures of colony lifting, hybridization, and dot blotting which are known in the art (Table 3). Probes labeled with ³²P were used to detect cDNA obtained from RT-PCR of human lymphocyte mRNA from Type 2 patients. Three probes were constructed, one corresponding to the normal glycogenin sequence (AGTATGCCTTGGTCCCCACCA, SEQ ID NO:22) and two corresponding to the same region but carrying pseudogene mutations (AGTAAGACTTGGTCTCCACTG for probe A and CGTATGCCTTGTTCTCCACCA for probe B, SEQ ID NO:23-24, respectively). Competitive hybridization with the labeled probe listed in each of the last three columns of the table was carried out at 52°C in a hybridization solution of 3 M tetramethylammonium chloride (TMAC); 0.6% (w/v) SDS; 1 mM EDTA; 0.1% (w/v) Ficoll 400, 0.1% (w/v) polyvinylpyrrolidone, 0.1% (w/v) BSA fraction V (i.e., 5× Denhardt's solution); and 10 mM sodium phosphate buffer (pH 6.8) in the presence of a 20-fold molar excess of the other two probes, which were unlabeled. Washing was done at the same temperature in a solution of 3 M TMAC, 0.6% (w/v) SDS, 1 mM EDTA, and 10 mM sodium phosphate buffer (pH 6.8). The results indicated that a substantial majority of lymphocyte mRNA corresponded to transcribed pseudogenes.
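Because the three probes cover the same region, the positions at which they differ determine how well competitive hybridization can discriminate among them. The following Python sketch lists the mismatched positions for each pair of probes; the sequences are those given above, while the function name and dictionary keys are illustrative.

```python
# Pairwise comparison of the three 21-base probes (SEQ ID NO:22-24).
from itertools import combinations

PROBES = {
    "normal": "AGTATGCCTTGGTCCCCACCA",  # SEQ ID NO:22
    "A": "AGTAAGACTTGGTCTCCACTG",       # SEQ ID NO:23
    "B": "CGTATGCCTTGTTCTCCACCA",       # SEQ ID NO:24
}


def mismatch_positions(first, second):
    """Return the 1-based positions at which two equal-length probes differ."""
    if len(first) != len(second):
        raise ValueError("probes must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(first, second)) if x != y]


for (name1, seq1), (name2, seq2) in combinations(PROBES.items(), 2):
    print(name1, "vs", name2, mismatch_positions(seq1, seq2))
```

Each probe differs from the other two at three or more of its 21 positions, which is what allows the labeled probe to be distinguished from the unlabeled competitors.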

Table 3. Comparison of the Relative Occurrence of Normal and Mutated cDNA Sequences in Human Lymphocytes.

Subject    No. of Observations    Percentage of dots
                                  Normal    A       B
Normal     314                    6.4       22.6    71.0
Type 2     233                    1.7       12.0    86.3
Type 2     320                    1.6       33.7    64.7
Type 2     460                    15.0      56.4    28.6
Type 2     405                    20.5      58.5    30.0
Type 2     392                    11.0      13.5    75.5
Type 2     399                    11.6      60.4    28.0
Type 2     492                    15.7      40.4    43.9
Type 2     320                    2.2       4.0     93.8
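The share of dots attributable to pseudogene transcripts (probes A and B combined) can be computed from Table 3 with a short Python sketch. The values are copied from the table as printed. Because the percentages in the row with 405 observations sum to 109.0 rather than 100, the sketch normalizes by each row's own total instead of assuming 100; that normalization is an assumption of this illustration, not part of the described procedure.

```python
# Pseudogene share (probe A + probe B) per subject, from Table 3 as printed.
# Columns: subject, observations, % normal, % probe A, % probe B.
TABLE3 = [
    ("Normal", 314, 6.4, 22.6, 71.0),
    ("Type 2", 233, 1.7, 12.0, 86.3),
    ("Type 2", 320, 1.6, 33.7, 64.7),
    ("Type 2", 460, 15.0, 56.4, 28.6),
    ("Type 2", 405, 20.5, 58.5, 30.0),
    ("Type 2", 392, 11.0, 13.5, 75.5),
    ("Type 2", 399, 11.6, 60.4, 28.0),
    ("Type 2", 492, 15.7, 40.4, 43.9),
    ("Type 2", 320, 2.2, 4.0, 93.8),
]


def pseudogene_share(normal, probe_a, probe_b):
    """Percentage of dots hybridizing to either pseudogene probe."""
    row_total = normal + probe_a + probe_b  # normalize by the row's own total
    return 100.0 * (probe_a + probe_b) / row_total


shares = [pseudogene_share(n, a, b) for (_, _, n, a, b) in TABLE3]

for (subject, observations, *_), share in zip(TABLE3, shares):
    print(f"{subject:7s} n={observations:3d} pseudogene share = {share:.1f}%")
```

In every row, including the normal subject, the pseudogene probes account for most of the dots.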

Example III

Sequencing cDNA Coding for Glycogenin: Silent Mutations

Genomic DNA of healthy and Type 2 subjects also exhibited a silent mutation in the codon for Pro-183 of glycogenin, close to the point of attachment of glycogen at Tyr-194. The silent mutation gives rise to three genotypes, one of which is found in 11 ) of Type 2 diabetic patients studied, but not in any control subjects. The normal triplet can be either CCG (termed G) or CCA (termed A) and the mutation creates three genotypes: G/A, G/G and A/A. Differences between the two subject types in terms of the relative frequency of occurrence of the genotypes are shown in Table 4.

Table 4. Genotypes Created by a Silent Mutation in the Codon for Proline-183.
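The Pro-183 genotypes can be sketched in Python as follows. The allele names G and A and their codons are from the text; the proline codon set is that of the standard genetic code, and the variable names are illustrative.

```python
# Why the Pro-183 change is silent, and which genotypes two alleles can form.
from itertools import combinations_with_replacement

# All four CCN codons encode proline in the standard genetic code.
PROLINE_CODONS = {"CCT", "CCC", "CCA", "CCG"}

# Allele labels used in the text for the Pro-183 codon.
ALLELES = {"G": "CCG", "A": "CCA"}

# Both alleles still encode proline, so the amino acid sequence is unchanged.
is_silent = all(codon in PROLINE_CODONS for codon in ALLELES.values())

# Unordered pairs of alleles give the three possible diploid genotypes.
genotypes = ["/".join(pair) for pair in combinations_with_replacement("GA", 2)]

print(is_silent, genotypes)
```

Two alleles at a single site yield exactly three unordered genotypes, matching the G/G, G/A, and A/A classes named in the text.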

Example IV

Cloning RT-PCR Products From A Muscle Biopsy

Using the same two primers as described herein above for reading the sequences surrounding aberrant stop codons from Thr-30 to Lys-96 and from Ser-157 to Phe-203, a skeletal muscle biopsy from a Type 2 diabetic patient was analyzed for the presence of nonsense mutations (i.e., the creation of a stop codon). Skeletal muscle was chosen for sampling due to the reduced muscle glycogen levels associated with hyperglycemia in Type 2 patients. cDNA sequences isolated from the muscle biopsy generated 20 clones, three of which had a mutated stop codon in the triplet TGG coding for Trp-89, followed by two adenine insertions. Another clone displayed a deletion of 12 base pairs coding for Ser-Phe-Asp-Gly located at positions 157-160 of the human glycogenin amino acid sequence, a mutation not observed when leukocyte cDNA sequences were analyzed. Sixteen clones from the muscle biopsy were normal.

Example V

Assays for Expression of Truncated Glycogenin Protein

Expression of truncated glycogenins in Type 2 patients may be confirmed by Western blotting using antibodies to muscle glycogenin. Abnormal, truncated glycogenin proteins should display predictable sizes and structures, distinguishing them from normal glycogenin protein. Skeletal muscle biopsies obtained from volunteers having either Type 1 diabetes, Type 2 diabetes, normal glycemia with a family history of Type 2 disease or a control will be analyzed and compared with results of blood sample analysis to learn whether aberrant mRNA is present. RNA will be isolated from the muscle biopsy using the RNEASY kit for RNA preparation (Qiagen) and analyzed as described hereinabove.
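The sizes predicted for truncated products can be roughly estimated from the positions of the premature stop codons. The Python sketch below assumes that residue numbering starts at the initiator methionine, that translation ends at the residue before the stop codon, and that an average residue mass of about 110 Da applies; these are generic approximations introduced for illustration and are not values stated in this description.

```python
# Rough size estimate for polypeptides truncated at the named stop codons.
# Assumptions (not from the source): numbering from the initiator Met, and
# an average mass of ~110 Da per amino acid residue.
AVG_RESIDUE_MASS_DA = 110.0

# Codon positions of the premature stop codons named in the text.
STOP_CODON_POSITIONS = {"W89a": 89, "Q93": 93, "Y194": 194}


def truncated_length(stop_codon_position):
    """Number of residues translated before the premature stop codon."""
    return stop_codon_position - 1


def approx_mass_kda(residues):
    """Approximate polypeptide mass in kDa from its residue count."""
    return residues * AVG_RESIDUE_MASS_DA / 1000.0


for label, position in STOP_CODON_POSITIONS.items():
    length = truncated_length(position)
    print(f"{label}: {length} residues, roughly {approx_mass_kda(length):.1f} kDa")
```

Actual migration on a gel would also depend on any attached carbohydrate and on post-translational processing, so these figures are only a starting point for choosing a gel range.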

A simple blood test would be desirable for detection of abnormal glycogenin expression related to the observed pseudogene expression. In preliminary tests, leukocyte proteins were analyzed for the expression of truncated glycogenin proteins. No truncated glycogenin proteins were identified by Western blotting, using polyclonal antibodies against recombinant skeletal muscle glycogenin. These results suggest that truncated glycogenin proteins were not present in the blood samples tested.

Although stable truncated products predicted from the cDNA sequences from transcripts of the human glycogenin pseudogene were not detected, one cannot conclude that the effect of pseudogenes is necessarily limited to the presence of RNA transcripts. Both normal subjects and diabetic patients have genomic DNA with pseudogenes that are indistinguishable in our analyses. Extrapolating from the central dogma of molecular biology (i.e., DNA→RNA→protein), this implies that, if the effect is not due to DNA, any difference would be attributable to the RNA or protein produced. But the absence of detectable truncated proteins in leukocytes with pseudogene transcripts does not mean that the effects of glycogenin pseudogenes are limited to RNA transcripts, because truncated polypeptides may be translated from pseudogene transcripts at an undetectable level or they may be rapidly degraded. The most likely mechanism, however, for the effects of glycogenin pseudogenes on human physiology involves their transcription or transcripts.

While the present invention has been described in detail and with reference to specific embodiments thereof, it will be apparent to those of ordinary skill in the art that various changes and modifications can be made therein without departing from the essential spirit and scope thereof. For example, additional pseudogene sequences may be obtained by cloning and sequencing techniques known in the art using the products and processes disclosed herein. Thus, legal protection of the present invention extends to the literal scope of the allowed claims and equivalents thereof instead of being limited by the description and examples which limitations are not recited in those claims.

The entire disclosures of the references, patents, books, manuals, and other references cited herein are hereby incorporated herein by reference.

REFERENCES

Allgood and Eastman (1997) Curr. Opin. Biotechnol. 8, 474-479.
Alonso et al. (1995) FASEB J. 9, 1126-1137.
Altschul et al. (1997) Nucl. Acids Res. 25, 3389-3402.
Bailey and Lillehoj (1993) Biochem. Soc. Trans. 21, 454S.
Barbetti et al. (1996) Biochem. Biophys. Res. Comm. 220, 72-77.
Bard et al. (1995) Gene 153, 295-296.
Beltz et al. (1983) Meth. Enzymol. 100, 266-285.
Beutler et al. (1991) Proc. Natl. Acad. Sci. USA 88, 10544-10547.
Chakrabarti et al. (1995) Gene 53, 163-169.
Chen et al. (1998) Genome Res. 8, 549-556.
Church and Gilbert (1984) Proc. Natl. Acad. Sci. USA 81, 1991-1995.
Cristano et al. (1993) Genomics 17, 348-354.
Froguel and Velho (1999) Trends Endocrinol. Metabol. 10, 142-146.
Fürbaß and Vanselow (1995) Gene 154, 287-291.
Gentalen and Chee (1999) Nucl. Acids Res. 27, 1485-1491.
Germer and Higuchi (1999) Genome Res. 9, 72-78.
Gilles et al. (1999) Nature Biotech. 17, 365-370.
Gorlach et al. (1997) J. Clin. Invest. 100, 1907-1918.
Gould et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1934-1938.
Habener and Stoffers (1998) Proc. Assoc. Amer. Physicians 110, 12-21.
Imai et al. (1993) Gene 136, 365-368.
Landegren et al. (1998) Genome Res. 8, 769-776.
Lessa et al. (1993) Mol. Ecol. 2, 119-129.
Lomako et al. (1996) Genomics 33, 519-522.
Needleman and Wunsch (1970) J. Mol. Biol. 48, 443.
No et al. (1996) Proc. Natl. Acad. Sci. USA 93, 3346-3351.
Pearson (1991) Genomics 11, 635-650.
Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85, 2444.
Permutt et al. (1998) Recent Prog. Horm. Res. 53, 201-216.
Rivera et al. (1996) Nature Med. 2, 1028-1032.
Sapolsky et al. (1999) Genet. Anal. 14, 187-192.
Smith and Waterman (1981) J. Mol. Biol. 147, 195-197.
Soares (1997) Curr. Opin. Biotechnol. 8, 542-546.
Stein et al. (1983) Proc. Natl. Acad. Sci. USA 80, 6485-6489.
Timms et al. (1997) Hum. Mol. Genet. 6, 479-486.
Well et al. (1997) Gene 187, 115-122.

Claims

WHAT IS CLAIMED IS:

1. An isolated polynucleotide having a sequence at least 95% identical and less than 100% identical to a corresponding portion of SEQ ID NO:3, wherein the sequence is comprised of at least one mutation present in a human glycogenin pseudogene or a sequence complementary thereto.

2. An isolated polynucleotide according to claim 1, wherein said portion of the nucleotide sequence shown in Figure 1 comprises at least 10 nucleotides.

3. A recombinant clone comprised of at least one polynucleotide according to claim 1, wherein said polynucleotide is comprised of at least one human glycogenin pseudogene RNA transcript or cDNA thereof, and a heterologous vector sequence.

4. A recombinant clone comprised of at least one polynucleotide according to claim 1, wherein said polynucleotide is comprised of at least one human glycogenin pseudogene DNA, and a heterologous vector sequence.

5. An isolated polynucleotide according to claim 1, wherein said mutation is selected from the group consisting of a substitution mutation and a premature termination codon.

6. An isolated polynucleotide according to claim 5, wherein said mutation is a substitution mutation selected from the group consisting of A234P, M239T, H241Y, L247L, P296P, S300S, R303W, E305E, and R306R.

7. An isolated polynucleotide according to claim 5, wherein said mutation is a premature termination codon selected from the group consisting of W89a (TAG), Y194 (TAG), Q93 (TAG), W89b (TAA) and a first single A insertion in triplet codon AGC coding for Ser-171 together with a second single A insertion in position Ile-78 (TAA).

8. An isolated polynucleotide according to claim 1, wherein said mutation comprises a silent mutation in a codon for Pro-183, causing genotype A/A.

9. An isolated polynucleotide of at least ten bases in length having a sequence of a human glycogenin pseudogene with at least one nucleotide different from a normal glycogenin gene.

10. An array comprised of a solid substrate on which at least ten different polynucleotides are attached at a density of more than 100 per square centimeter, wherein said polynucleotides are at least ten bases long and at least one is specific for a mutation in a human glycogenin pseudogene.

11. An array according to claim 10, wherein said polynucleotides each have a length of between 10 bases and 100 bases.

12. An array according to claim 10, wherein said polynucleotides each have a length of between 15 bases and 50 bases.

13. An array according to claim 10, wherein at least 100 different polynucleotides are attached and at least ten of the polynucleotides are specific for mutations of human glycogenin pseudogenes.

14. An array according to claim 10, wherein said mutation is selected from the group consisting of a missense mutation in an active site of the glycogenin enzyme and a mutation which creates at least a stop codon prior to termination at codon 333.

15. An array according to claim 10, wherein a detectable signal indicates hybridization to at least one polynucleotide attached to the array, and said detectable signal's location in said array indicates whether there is hybridization to said mutation.

16. A transfected cell or transgenic non-human animal in which at least one polynucleotide encoding glycogenin is integrated in said cell's or animal's genome, said polynucleotide having a sequence with a stop codon prior to a termination codon of said cell's or animal's glycogenin gene.

17. A transfected cell or transgenic non-human animal in which at least one polynucleotide encoding a human glycogenin gene and at least one polynucleotide encoding a human glycogenin pseudogene are integrated in said cell's or animal's genome.

18. A method of detecting expression of a glycogenin pseudogene selected from the group consisting of:

(a) analyzing a sample of mRNA from a subject for the presence of a mutation from an mRNA encoded by a human glycogenin pseudogene and

(b) analyzing a sample of polypeptides from a subject for the presence of a glycogenin polypeptide encoded by a human glycogenin pseudogene.

19. A method of determining relative amounts of normal glycogenin gene and glycogenin pseudogene transcription comprising:

(a) providing a sample containing RNA transcripts or polynucleotides corresponding to said transcripts;

(b) contacting said sample with at least one polynucleotide probe under specific binding conditions, wherein said probe is able to hybridize to at least a portion of said transcripts or polynucleotides and is able to distinguish between transcription from said normal gene and said pseudogene; and

(c) comparing hybridization to said transcripts or polynucleotides of said normal gene to said pseudogene, thereby determining the relative amounts of transcription from at least one of said normal gene and said pseudogene.

20. A method of detecting a transcript from a human glycogenin pseudogene comprising:

(a) providing a sample containing at least one polynucleotide, wherein said sample is obtained from a human subject;

(b) contacting said sample with a probe under specific binding conditions, wherein said probe is able to hybridize to at least a portion of said transcript; and

(c) determining whether said probe is hybridized to at least one polynucleotide contained in said sample, wherein hybridization of said probe indicates that said transcript is present in said sample.

21. A kit for detecting at least one human glycogenin pseudogene or transcription from at least one human glycogenin pseudogene comprised of at least one probe able to hybridize to at least a portion of said pseudogene, a transcript from said pseudogene, or a polynucleotide synthesized corresponding to said transcript in preference to a normal human glycogenin gene.