WO2009064897A2

WO2009064897A2 - Detection of nucleic acid sequence variations in circulating nucleic acid in bovine spongiform encephalopathy

Info

Publication number: WO2009064897A2
Application number: PCT/US2008/083420
Authority: WO
Inventors: Ekkehard Schuetz; Julia Beck; Howard Urnovitz
Original assignee: Chronix Biomedical Inc
Current assignee: Chronix Biomedical Inc
Priority date: 2007-11-14
Filing date: 2008-11-13
Publication date: 2009-05-22
Anticipated expiration: 2010-05-14
Also published as: WO2009064897A3

Abstract

The present invention provides methods and compositions for detecting a transmissible spongiform encephalopathy, e.g., BSE, based on the presence of BSE-associated polymorphisms in nucleic acid samples from acellular samples.

Description

DETECTION OF NUCLEIC ACID SEQUENCE VARIATIONS IN CIRCULATING NUCLEIC ACID IN BOVINE SPONGIFORM ENCEPHALOPATHY

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims benefit of U.S. provisional application no. 60/988,066, filed November 14, 2007, which application is herein incorporated by reference.

BACKGROUND OF THE INVENTION

[0002] Mad cow disease or bovine spongiform encephalopathy (BSE) is a progressive, invariably fatal neurodegenerative disease in cattle. BSE was recognized as a public health concern in 1996 when young Britons were diagnosed with what appeared to be a new form of a familial illness of older age, Creutzfeldt-Jakob Disease (CJD). British scientists linked the development of this "variant Creutzfeldt-Jakob Disease" (vCJD) to exposure to and/or consumption of BSE cattle. As of November 2002, 143 cases of "definite or probable" vCJD had been diagnosed in the UK.

[0003] Currently, the only available BSE tests are post mortem tests performed on brain tissue from animals that have been slaughtered. In addition, the majority of these tests detect abnormal proteins known as prions, which are not found in an animal until the disease has progressed into late stage. Early stage spongiform encephalopathies are difficult to detect by prion testing because prion accumulation is most often associated with late-stage disease. Genetic tests for prion gene polymorphisms are currently used to determine the susceptibility of sheep for scrapie (Hunter, et al. Arch Virol 141:809-824, 1996). No such diversity of prion genes is found in BSE. However, the detection of nucleic acids in cattle sera (Schütz et al. CDLI) has previously been reported. Tests for detection and monitoring of genetic material associated with chronic illnesses other than BSE can be performed using sera. Such circulating nucleic acid (CNA)-associated tests are often designed to detect unique nucleic acid targets, usually of exogenous origin, e.g., HIV-1, CMV, HCV and HBV. CNAs of possible endogenous origin also have been found to be associated with chronic illnesses in humans (Urnovitz, et al. Clin Diagn Lab Immunol 6:330-335, 1999; Durie, Urnovitz, & Murphy Acta Oncol 39:789-796, 2000). These studies focused on the detection of only repetitive genomic sequences or repetitive sequences rearranged with other genomic elements. These studies did not analyze variances or polymorphisms.

[0004] It has been suggested that current tests are not sensitive enough to fully protect against the entry of BSE cattle into the human food chain (Knight, Nature 426:216, 2003). Further, current tests cannot identify cohort herd mates of BSE-infected cattle that have an increased risk of BSE. The current invention addresses this need.

BRIEF SUMMARY OF THE INVENTION

[0005] This invention is based on the discovery that single nucleotide variances and polymorphisms in nucleic acid sequences are detected in acellular samples, such as serum or plasma, from animals at risk for transmissible spongiform encephalopathy, e.g., BSE. The invention therefore provides a method of detecting an animal with bovine spongiform encephalopathy (BSE), the method comprising: detection of an individual or multiple single nucleotide polymorphisms (SNPs), also referred to herein as single nucleotide variations (SNVs), in nucleic acids extracted from an acellular sample obtained from the animals. In some embodiments, the sample is an acellular fluid such as serum or plasma. The nucleic acid sample can be a DNA sample or RNA sample.

[0006] Typically, BSE specific SNPs are selected from a database of nucleic acid sequences. In some embodiments, the database is generated by ultra deep sequencing technology whereby sequences present in BSE animals are compared to sequences present in animals without BSE.

[0007] SNVs can be detected using methods that are well known in the art. Most assays entail one of several general protocols: hybridization using allele-specific oligonucleotides, primer extension, allele-specific ligation, sequencing, or electrophoretic separation techniques, e.g., single-stranded conformational polymorphism (SSCP) and heteroduplex analysis. Other assays include 5' nuclease assays, template-directed dye terminator incorporation, molecular beacon allele-specific oligonucleotide assays, single-base extension assays, and SNP scoring by real-time pyrophosphate sequences. Analysis of amplified sequences can be performed using various technologies such as microchips, fluorescence polarization assays, and matrix-assisted laser desorption ionization (MALDI) mass spectrometry. Two methods that can also be used are assays based on invasive cleavage with Flap nucleases and methodologies employing padlock probes. In typical embodiments, the presence of SNVs is detected by sequencing.

[0008] In another aspect, the invention provides a method of selecting technologies to use in an amplification reaction to detect an animal with BSE, the method comprising: identifying nucleic acid sequences that have differences in BSE animals compared to normal; and selecting reagents that detect specific SNP/SNV reactivity. Typically, the sequences are identified in acellular samples, such as serum. In one embodiment, the invention provides a method for the detection of SNPs/SNVs to detect an animal with BSE, the method comprising: whole genome amplification of circulating nucleic acids, ultra deep sequencing of the amplified products and identification of SNPs/SNVs in the resulting database in BSE animals as compared to normal controls.

[0009] The invention provides a method of detecting an animal with BSE, the method comprising: extracting nucleic acids from a sample and detecting the statistical presence of a SNP/SNV in the extracted nucleic acids. Detection of a SNP/SNV can be done directly or indirectly, e.g., through amplification of the target nucleic acids and query at a variant position within the target nucleic acid using an oligonucleotide that selectively hybridizes to a reference sequence or known BSE-associated variant; or by direct sequencing.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] Fig. 1 provides an example of a repetitive element queried against a database. Calculations are based on sequences from infected animals and controls, compared using a Chi-square test. The solid line shows the Chi-square value at each position of this example queried sequence, based on distribution of nucleotide sampling, i.e., A, C, G, T or an insertion or deletion, of sequences from animals with BSE as compared to sequences from normal controls. The dotted line depicts the total number of hits in the database for each position on this example queried sequence.
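The per-position comparison described for Fig. 1 can be illustrated with a minimal Python sketch. It is not the applicants' implementation: it assumes reads that are already aligned to the queried sequence and of equal length, uses "-" as a hypothetical symbol for an insertion or deletion, and computes a plain Pearson Chi-square statistic on a 2 x 5 table (group by symbol) at each position. The function names are illustrative only.

```python
from collections import Counter

SYMBOLS = "ACGT-"  # four bases plus "-" as an assumed insertion/deletion symbol


def chi_square_at_position(bse_calls, control_calls):
    """Pearson chi-square statistic for a 2 x 5 table (group x symbol)."""
    n_bse, n_ctrl = len(bse_calls), len(control_calls)
    if n_bse == 0 or n_ctrl == 0:
        raise ValueError("both groups need at least one call")
    bse, ctrl = Counter(bse_calls), Counter(control_calls)
    total = n_bse + n_ctrl
    stat = 0.0
    for symbol in SYMBOLS:
        column_total = bse[symbol] + ctrl[symbol]
        if column_total == 0:
            continue  # symbol never observed at this position
        for observed, row_total in ((bse[symbol], n_bse), (ctrl[symbol], n_ctrl)):
            expected = row_total * column_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_profile(bse_reads, control_reads):
    """Chi-square value at every position of equal-length aligned reads."""
    length = len(bse_reads[0])
    return [
        chi_square_at_position(
            [read[i] for read in bse_reads],
            [read[i] for read in control_reads],
        )
        for i in range(length)
    ]


# Toy data: half of the BSE reads carry a T at position 1 (0-based).
bse_reads = ["ACGT", "ACGT", "ATGT", "ATGT"]
control_reads = ["ACGT", "ACGT", "ACGT", "ACGT"]
profile = chi_square_profile(bse_reads, control_reads)
```

In this toy example only position 1 yields a non-zero statistic; positions where both groups show the same nucleotide distribution contribute zero.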

[0011] Fig. 2 shows exemplary SNV analysis. Whenever the dotted line reaches 1.0, the position has a single nucleotide variation found only in animals with BSE as compared to normal controls. The accompanying solid line is the respective Chi-square value, based on distribution of nucleotide sampling, i.e., A, C, G, T or an insertion or deletion, calculated per animal. Throughout this example sequence, more than one position contains a SNV that is present in BSE animals but not in normal controls.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

[0012] A "cohort" refers to birth or feeding cohorts that are defined according to the official EU definition as being raised or born on the same farm within 12 months prior to or after a BSE index case.

[0013] Animals "with BSE" refer to cattle that are incubating BSE etiologic agents but may or may not show any clinical signs of BSE or PrP^res reactivity at the time of sampling.

[0014] The term "reactivity" as used herein refers to a change in a characteristic of SNP/SNV detection, in the presence of a nucleic acid sequence that is indicative of BSE. A sample is considered reactive when it exhibits a value of at least 3, preferably 5 standard deviations above a reference standard.
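The reactivity criterion in paragraph [0014] amounts to a threshold test. The following Python sketch assumes that the "reference standard" is summarized by the mean and sample standard deviation of a set of reference measurements; that interpretation, the function name, and the example values are assumptions made for illustration.

```python
from statistics import mean, stdev


def is_reactive(sample_value, reference_values, n_sd=3.0):
    """True if the sample lies at least n_sd standard deviations above the
    mean of the reference measurements (n_sd=3 minimum, 5 preferred)."""
    threshold = mean(reference_values) + n_sd * stdev(reference_values)
    return sample_value >= threshold


# Hypothetical reference measurements with mean 2.0 and standard deviation 1.0.
reference = [1.0, 2.0, 3.0]
reactive_at_3sd = is_reactive(5.0, reference)             # threshold is 5.0
reactive_at_5sd = is_reactive(5.0, reference, n_sd=5.0)   # threshold is 7.0
```

A sample can therefore be reactive under the 3 standard deviation criterion while failing the preferred 5 standard deviation criterion.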

[0015] A "positive reference" or "positive control" is a sample that is known to contain SNPs/SNVs that are indicative of BSE. In some embodiments, a "positive reference" can be from a known cohort animal that was reactive in the assay of the invention. Alternatively, a "positive reference" can be a synthetic construct that shows reactivity in an assay of the invention.

[0016] A "reference control" is a sample that results in minimal change to the SNP/SNV detection in BSE. Often, such a sample is a known negative, e.g., from healthy animals. For example, in diagnostic applications, such a control is typically derived from a normal animal that is not a cohort with a PrP^res animal. A "reference control" is preferably included in an assay, but may be omitted.

[0017] "Amplifying" refers to a step of submitting a solution to conditions sufficient to allow for amplification of a polynucleotide if all of the components of the reaction are intact. Components of an amplification reaction include, e.g., primers, a polynucleotide template, polymerase, nucleotides, and the like. The term "amplifying" typically refers to an "exponential" increase in target nucleic acid. However, "amplifying" as used herein can also refer to linear increases in the numbers of a select target sequence of nucleic acid.

[0018] An "amplification characteristic" refers to any parameter of an amplification reaction. Such reactions typically comprise repeated cycles. An amplification characteristic may be the number of cycles, a melting curve, temperature profile, or band characteristics on a gel or other means of post-amplification detection.

[0019] In the context of this invention, the term "allele-specific probe" does not refer to an allele per se, but to a probe to a variant nucleic acid sequence relative to a reference sequence. Thus an "allele" in the context of this invention refers to a variant nucleic acid sequence in comparison to a reference sequence, e.g., the reference sequences set forth in SEQ ID NOs 1-41.

[0020] A "melting profile" or "melting curve" refers to the melting temperature characteristics of a nucleic acid fragment over a temperature gradient. In some embodiments, the melting curve is derived from the first derivative of the melting signal. The melting point of a DNA fragment depends, e.g., on its length, its G/C content, the ionic strength of the buffer and the presence of mismatches (heteroduplexes). Thus, the proportion of the molecules in the population that are melting over a temperature range generates a melting profile, which is unique to a particular fragment or population of molecules.
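As an illustration of deriving a melting curve from the first derivative of the melting signal, the Python sketch below computes the negative derivative of the signal with respect to temperature by finite differences and reports the temperature at which it peaks. The negative sign is an assumption (it is the usual convention when fluorescence falls as the duplex melts), and the readings are invented for the example.

```python
def negative_derivative(temps, signal):
    """Return interval midpoints and -dF/dT between adjacent readings."""
    mids, deriv = [], []
    for i in range(len(temps) - 1):
        dt = temps[i + 1] - temps[i]
        mids.append((temps[i] + temps[i + 1]) / 2)
        deriv.append(-(signal[i + 1] - signal[i]) / dt)
    return mids, deriv


def melting_peak(temps, signal):
    """Temperature (interval midpoint) at which -dF/dT is largest."""
    mids, deriv = negative_derivative(temps, signal)
    best = max(range(len(deriv)), key=lambda i: deriv[i])
    return mids[best]


# Hypothetical fluorescence readings taken every 2 degrees C.
temps = [70.0, 72.0, 74.0, 76.0, 78.0]
signal = [100.0, 98.0, 60.0, 10.0, 8.0]
peak = melting_peak(temps, signal)
```

Fragments that differ in length, G/C content, or mismatches would be expected to shift this peak, which is why the profile can distinguish populations of molecules.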

[0021] The term "amplification reaction" refers to any in vitro means for multiplying the copies of a target sequence of nucleic acid. Such methods include but are not limited to polymerase chain reaction (PCR), DNA ligase chain reaction (LCR), Qβ RNA replicase, RNA transcription-based (TAS and 3SR) amplification reactions, and nucleic acid sequence based amplification (NASBA). (See, e.g., Current Protocols in Human Genetics Dracopoli et al. eds., 2000, John Wiley & Sons, Inc.).

[0022] "Polymerase chain reaction" or "PCR" refers to a method whereby a specific segment or subsequence of a target double-stranded DNA is amplified in a geometric progression. PCR is well known to those of skill in the art; see, e.g., U.S. Patents 4,683,195 and 4,683,202; PCR Technology: Principles and Applications for DNA Amplification (Erlich, ed., 1992) and PCR Protocols: A Guide to Methods and Applications, Innis et al, eds, 1990.

[0023] The term "amplification reaction mixture" refers to an aqueous solution comprising the various reagents used to amplify a target nucleic acid. These include enzymes, aqueous buffers, salts, amplification primers, target nucleic acid, and nucleoside triphosphates. Depending upon the context, the mixture can be either a complete or incomplete amplification reaction mixture.

[0024] A "primer" refers to a polynucleotide sequence that hybridizes to a sequence on a target nucleic acid and serves as a point of initiation of nucleic acid synthesis. Primers can be of a variety of lengths and are often less than 50 nucleotides in length, for example 12-25 nucleotides, in length. The length and sequences of primers for use in PCR can be designed based on principles known to those of skill in the art, see, e.g., Innis et al., supra. A primer is preferably a single-stranded oligodeoxyribonucleotide. The primer includes a "hybridizing region" exactly or substantially complementary to the target sequence, preferably about 15 to about 35 nucleotides in length. A primer oligonucleotide can either consist entirely of the hybridizing region or can contain additional features which allow for the detection, immobilization, or manipulation of the amplified product, but which do not alter the ability of the primer to serve as a starting reagent for DNA synthesis. For example, a nucleic acid sequence tail can be included at the 5' end of the primer that hybridizes to a capture oligonucleotide. As appreciated by one of skill in the art, a primer for use in the invention need not exactly correspond to the sequence(s) that it amplifies in a hybridization reaction. For example, the incorporation of mismatches into a probe can be used to adjust duplex stability when the assay format precludes adjusting the hybridization conditions. The effect of a particular introduced mismatch on duplex stability is well known, and the duplex stability can be routinely both estimated and empirically determined, as described above. Suitable hybridization conditions, which depend on the exact size and sequence of the probe, can be selected empirically using the guidance provided herein and well known in the art (see, e.g., the general PCR and molecular biology technique references cited herein).

[0025] The term "subsequence" when referring to a nucleic acid refers to a sequence of nucleotides that are contiguous within a second sequence but does not include all of the nucleotides of the second sequence.

[0026] A "temperature profile" refers to the temperature and lengths of time of the denaturation, annealing and/or extension steps of a PCR reaction. A temperature profile for a PCR reaction typically consists of 10 to 60 repetitions of similar or identical shorter temperature profiles; each of these shorter profiles may typically define a two-step or three-step PCR reaction. Selection of a "temperature profile" is based on various considerations known to those of skill in the art, see, e.g., Innis et al., supra.

[0027] A "template" refers to a double or single stranded polynucleotide sequence that comprises a polynucleotide to be amplified.

[0028] An "acellular biological fluid" is a biological fluid that substantially lacks cells. Typically, such fluids are fluids prepared by removal of cells from a biological fluid that normally contains cells (e.g., whole blood). Exemplary processed acellular biological fluids include processed blood (serum and plasma), e.g., from peripheral blood or blood from body cavities or organs; and samples prepared from urine, milk, saliva, sweat, tears, phlegm, cerebrospinal fluid, semen, feces, and the like. Often, serum or plasma is the acellular sample that is analyzed in the assays of the invention. Other acellular samples that can be used include samples comprising nucleic acids obtained by washing any cell preparation to remove circulating nucleic acids that are associated with the cell surface. For example, such an acellular sample can be obtained by washing circulating blood cells, such as lymphocytes. The supernatant from the wash can then be analyzed.

[0029] "Nucleic acid" refers to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, or chimeric constructs of polynucleotides chemically linked to reporter molecules, and unless otherwise limited, would encompass known analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides.

[0030] The term "biological sample", as used herein, refers to a sample obtained from an organism or from components (e.g., cells) of an organism. The sample may be of any biological tissue or fluid. Frequently the sample will be a "clinical sample" which is a sample derived from a patient, animal or human, with a disease or suspected of having a disease. Such samples include, but are not limited to, sputum, blood, serum, plasma, body cavity blood or blood products, blood cells (e.g., white cells), tissue or fine needle biopsy samples, urine, milk, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues such as frozen sections taken for histological purposes.

[0031] An "individual" or "patient" as used herein, refers to any animals, often mammals, including, but not limited to humans, nonhuman primates such as chimpanzees and monkeys, horses, cows, deer, sheep, goats, pigs, dogs, minks, elk, cats, lagomorphs, and rodents.

[0032] A "chronic illness" is a disease, symptom, or syndrome that lasts for months to years. Examples of chronic illnesses in animals include, but are not limited to, cancers and wasting diseases as well as autoimmune diseases, and neurodegenerative diseases such as spongiform encephalopathies and others.

[0033] "Repetitive genomic sequences" or "repetitive genomic nucleic acid sequences" (RGNAS) refer to highly repeated DNA elements present in the animal genome. These sequences are usually categorized in sequence families and are broadly classified as tandemly repeated DNA or interspersed repetitive DNA (see, e.g., Jelinek and Schmid, Ann. Rev. Biochem. 51:831-844, 1982; Hardman, Biochem J. 234:1-11, 1986; and Vogt, Hum. Genet. 84:301-306, 1990). Tandemly repeated DNA includes satellite, minisatellite, and microsatellite DNA. Repetitive genomic sequences include Alu sequences, short interspersed nuclear elements (SINES), long terminal repeats (LTR), LTR and non-LTR transposable elements, LTR and non-LTR retrotransposons, endogenous retroviruses, and long interspersed nuclear elements (LINES) including L1 LINE sequences.

[New Paragraph] "Intergenic sequence" or "spacer DNA" or "non-coding sequence" refers to those nucleic acid sequences, including non-intronic sequences, that do not code for protein sequences.

[0034] A "rearranged sequence" or "recombined sequence" is a region of the genomic DNA that is rearranged compared to normal, i.e., the rearranged sequence is not contiguous in genomic DNA in healthy animals or in genomic DNA obtained from animals prior to contracting a disease or prior to exposure to a genotoxic agent.

[0035] A single nucleotide polymorphism (SNP) is used herein interchangeably with the term "single nucleotide variance" or "single nucleotide variations" (SNV). A SNP (SNV) is a difference at a single nucleotide position of a test sequence when compared to a reference nucleic acid sequence in which 1 and up to 5 nucleotides can be different. In one embodiment, the reference sequence can be derived experimentally from nucleic acids sequenced from the serum of normal individuals and the test sequence may be derived from the serum of an animal with BSE. Such variability can include regions of short nucleotide (1-5 nucleotide) deletions and insertions. SNPs (SNVs) may occur at any region in the genome or in nucleic acid sequences. In the current invention, the change is in a non-coding region of DNA, including, but not limited to, repetitive sequences and intragenic DNA.
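The comparison of a test sequence against a reference sequence can be sketched in a few lines of Python. This is an illustrative simplification: it assumes the two sequences have already been aligned to the same length, with "-" as a hypothetical gap symbol marking a deleted or inserted nucleotide, and it reports 0-based positions. The sequences are invented.

```python
def find_variants(reference, test):
    """List (position, reference_base, test_base) for every position at which
    two pre-aligned, equal-length sequences differ (0-based positions)."""
    if len(reference) != len(test):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, test_base)
        for i, (ref_base, test_base) in enumerate(zip(reference, test))
        if ref_base != test_base
    ]


# Hypothetical aligned sequences: one substitution and one deleted base.
reference = "ACGTACGT"
test = "ACGAAC-T"
variants = find_variants(reference, test)
```

Here `variants` contains a substitution at position 3 (T to A) and a single-nucleotide deletion at position 6.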

[0036] "Ultra Deep Sequencing" (454 Sequencing) is a massively-parallel pyrosequencing system capable of sequencing roughly 100 megabases of raw DNA sequence per 7-hour run using the GS FLX sequencing machine. The system relies on fixing nebulized and adapter-ligated DNA fragments to small DNA-capture beads in a water-in-oil emulsion. The DNA fixed to these beads is then amplified by PCR. Finally, each DNA-bound bead is placed into a ~44 μm well on a PicoTiterPlate, a fiber optic chip. A mix of enzymes such as polymerase, ATP sulfurylase, and luciferase is also packed into the well. The PicoTiterPlate is then placed into the GS20 for sequencing.

[0037] "Contig" refers to sequences that are computationally assembled from several overlapping physically contiguous sequences into one contiguous sequence. Such a contig is usually, but not necessarily, longer than the initial sequences.

[0038] "Whole genome amplification" is a technique in which minute amounts of DNA can be multiplied to generate quantities suitable for genetic testing and analysis.

[0039] "Circulating nucleic acids" or "CNAs" refers to DNA or RNA that is found in acellular fluids.

[0040] Typically, stringent hybridization conditions will be those in which the salt concentration is about 0.2X SSC at pH 7 and the temperature is at least about 60°C. For example, a nucleic acid of the invention or fragment thereof can be identified in standard filter hybridizations using the nucleic acids disclosed here under stringent conditions, which for purposes of this disclosure, include at least one wash (usually 2) in 0.2X SSC at a temperature of at least about 60°C, usually about 65°C, sometimes 70°C for 20 minutes, or equivalent conditions. For PCR, an annealing temperature of about 5°C below Tm is typical for low stringency amplification, although annealing temperatures may vary between about 32°C and 72°C, e.g., 40°C, 42°C, 45°C, 52°C, 55°C, 57°C, or 62°C, depending on primer length and nucleotide composition. For high stringency PCR amplification, a temperature at, or slightly (up to 5°C) above, primer Tm is typical, although high stringency annealing temperatures can range from about 50°C to about 72°C, and are often 72°C, depending on the primer and buffer conditions (Ahsen et al, Clin Chem. 47:1956-61, 2001). Typical cycle conditions for both high and low stringency amplifications include a denaturation phase of 90°C-95°C for 30 sec-2 min., an annealing phase lasting 30 sec-10 min., and an extension phase of about 72°C for 1-15 min.
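The relationship between primer Tm and annealing temperature can be illustrated with a short Python sketch. The text does not specify how Tm is estimated; the sketch assumes the simple Wallace rule (2°C per A or T, 4°C per G or C), which is only a rough approximation for short oligonucleotides, and applies the offsets stated above (about 5°C below Tm for low stringency, at Tm for high stringency). The primer sequence and function names are hypothetical.

```python
def wallace_tm(primer):
    """Rough Tm estimate in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).
    Assumed here for illustration; intended for short oligonucleotides only."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer may contain only A, C, G and T")
    return 2.0 * at + 4.0 * gc


def annealing_temperature(primer, stringency="low"):
    """Annealing temperature: about 5 degrees C below Tm for low stringency,
    at Tm for high stringency."""
    tm = wallace_tm(primer)
    if stringency == "low":
        return tm - 5.0
    if stringency == "high":
        return tm
    raise ValueError("stringency must be 'low' or 'high'")


# Hypothetical 18-nucleotide primer with 9 A/T and 9 G/C bases.
primer = "ACGTACGTACGTACGTAC"
tm = wallace_tm(primer)
low = annealing_temperature(primer, "low")
high = annealing_temperature(primer, "high")
```

In practice the chosen temperature would still be adjusted empirically, since buffer composition and primer concentration also affect duplex stability.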

Introduction

[0041] BSE is clinically characterized by increasing perturbation of central nervous function in the affected animal, ultimately leading to severe symptoms, e.g., an inability to stand, forcing the sacrifice of the animal. In contrast to other mammalian transmissible spongiform encephalopathies, the bovine form does not appear to be associated with a mutation in the prion gene, but may be caused by a post-translational misfolding of the prion protein, which leads to aggregation in the central nervous system. The diagnosis is based on the fact that misfolded prion protein has enhanced resistance to proteinase K digestion. As disease-specific prion accumulation in the plasma or blood of animals has not been identified, the diagnostic target has been the brain stem. Therefore, there is an urgent need to define a blood-borne marker for TSEs so that the disease can be diagnosed in living animals, in particular in animals that may be at increased risk for the disease, e.g., cohort animals, such as cows in the same herd as an infected animal.

[0042] The invention provides a method for diagnosing an increased risk for BSE by amplification and analysis of circulating nucleic acids (CNA) from test animals.

Nucleic acids detected in the methods of the invention

[0043] Nucleic acid molecules detected in the methods of the invention may be free, single or double stranded, molecules or complexed with protein or lipids or both. The detected nucleic acids can be DNA or RNA molecules. RNA molecules need not be transcribed from a gene, but can be transcribed from any sequence in the chromosomal DNA. Exemplary RNAs include miRNA, intergenic RNA, small nuclear RNA (snRNA), mRNA, tRNA, rRNA, and interference RNA (iRNA).

[0044] The nucleic acid molecules may comprise sequences transcribed from repetitive genomic sequences or intergenic or non-coding DNA in the genome of the individual from which the sample is derived. The detected nucleic acid molecules may also be the products of rearrangement of germline sequences and/or sequences introduced into the genome, e.g., exogenous viral sequences.

[0045] The method does not require knowledge of the polynucleotide sequences present in the test samples to be evaluated. Thus, a polynucleotide detected using this method may be a particular polynucleotide or may be a population of polynucleotides that are present in the sample. Furthermore, even in instances, where the polynucleotide to be detected has a known sequence, the polynucleotide in a particular sample, need not have that sequence, i.e., the sequence of the polynucleotide in the sample may be altered in comparison to the known sequence. Such alterations can include mutations, e.g., insertions, deletions, substitutions, and various other rearrangements. Further, the resulting amplified products may be as result of the amplification reaction and not reflect the original pool of polynucleotides.

Test samples

[0046] The test samples are typically biological samples that comprise target nucleic acids. A target nucleic acid can be from any source, but is typically from a biological sample that comprises small quantities of nucleic acid, e.g., nucleic acid samples obtained from samples that are not readily quantified by standard PCR methodology. In particular embodiments, the test sample is a nucleic acid, e.g., RNA or DNA that is isolated from serum or plasma.

SNP/SNV Detection Reactions

[0047] Detection techniques for evaluating nucleic acids for the presence of a SNP or SNV involve procedures well known in the field of molecular genetics. Further, many of the methods involve amplification of nucleic acids. Ample guidance for performing such techniques is provided in the art. Exemplary references include manuals such as PCR Technology: Principles and Applications for DNA Amplification (ed. H. A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, Calif., 1990); Current Protocols in Molecular Biology, Ausubel, 1994-1999, including supplemental updates through April 2004; Sambrook & Russell, Molecular Cloning, A Laboratory Manual (3rd Ed, 2001). In addition, microarrays can be utilized for genomewide SNP detection assays (Genomewide SNP assay reveals mutations underlying Parkinson disease. Simon-Sanchez J, Scholz S, Del Mar Matarin M, Fung HC, Hernandez D, Gibbs JR, Britton A, Hardy J, Singleton A. Hum Mutat. 2007 Nov 9).

[0048] Although the methods typically employ PCR steps, other amplification protocols may also be used. Suitable amplification methods include ligase chain reaction (see, e.g., Wu & Wallace, Genomics 4:560-569, 1988); strand displacement assay (see, e.g., Walker et al, Proc. Natl. Acad. Sci. USA 89:392-396, 1992; U.S. Pat. No. 5,455,166); and several transcription-based amplification systems, including the methods described in U.S. Pat. Nos. 5,437,990; 5,409,818; and 5,399,491; the transcription amplification system (TAS) (Kwoh et al, Proc. Natl. Acad. Sci. USA 86:1173-1177, 1989); and self-sustained sequence replication (3SR) (Guatelli et al, Proc. Natl. Acad. Sci. USA 87:1874-1878, 1990; WO 92/08800). Alternatively, methods that amplify the probe to detectable levels can be used, such as Qβ-replicase amplification (Kramer & Lizardi, Nature 339:401-402, 1989; Lomeli et al, Clin. Chem. 35:1826-1831, 1989). A review of known amplification methods is provided, for example, by Abramson and Myers in Current Opinion in Biotechnology 4:41-47, 1993.

[0049] Typically, the detection of a SNP/SNV is performed using oligonucleotide primers and/or probes. Oligonucleotides can be prepared by any suitable method, usually chemical synthesis. Oligonucleotides can be synthesized using commercially available reagents and instruments. Alternatively, they can be purchased through commercial sources. Methods of synthesizing oligonucleotides are well known in the art (see, e.g., Narang et al, Meth. Enzymol. 68:90-99, 1979; Brown et al, Meth. Enzymol. 68:109-151, 1979; Beaucage et al, Tetrahedron Lett. 22:1859-1862, 1981; and the solid support method of U.S. Pat. No. 4,458,066). In addition, modifications to the above-described methods of synthesis may be used to desirably impact enzyme behavior with respect to the synthesized oligonucleotides. For example, incorporation of modified phosphodiester linkages (e.g., phosphorothioate, methylphosphonates, phosphoamidate, or boranophosphate) or linkages other than a phosphorous acid derivative into an oligonucleotide may be used to prevent cleavage at a selected site. In addition, the use of 2'-amino modified sugars tends to favor displacement over digestion of the oligonucleotide when hybridized to a nucleic acid that is also the template for synthesis of a new nucleic acid strand.

[0050] Frequently used methodologies for analysis of nucleic acid samples to detect SNPs/SNVs are briefly described. However, any method known in the art can be used in the invention to detect the presence of short nucleotide substitutions.

Variant Specific Hybridization

[0051] This technique, also commonly referred to as allele specific oligonucleotide hybridization (ASO) (e.g., Stoneking et al., Am. J. Hum. Genet. 48:370-382, 1991; Saiki et al., Nature 324:163-166, 1986; EP 235,726; and WO 89/11548), relies on distinguishing between two DNA molecules differing at a polymorphic position, typically by one nucleotide, by hybridizing an oligonucleotide probe that is specific for one of the variants to an amplified product obtained from amplifying the nucleic acid sample. This method typically employs short oligonucleotides, e.g., 15-20 bases in length. The probes are designed to differentially hybridize to one variant versus another. For example, in some embodiments, probes are designed to hybridize to the version of the nucleic acid sequence that is present in normal cows. Principles and guidance for designing such probes are available in the art, e.g., in the references cited herein. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles. Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position (e.g., in a 15-base oligonucleotide at the 7 position; in a 16-base oligonucleotide at either the 8 or 9 position) of the probe, but this design is not required.
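The idea of placing the polymorphic site at a central position of a short probe can be sketched as follows. This Python example is illustrative only: it works on the same strand as the reference for readability (a real probe would be the complement of whichever strand is targeted), it uses 0-based indexing so that the site falls at index 7 of a 15-base probe, and the sequence and function name are hypothetical.

```python
def centered_probe(reference, site, variant_base=None, length=15):
    """Cut a probe-length window from the reference with the polymorphic
    site at a central position; optionally substitute the variant base.
    `site` is a 0-based index into `reference`."""
    offset = (length - 1) // 2          # index of the site within the probe
    start = site - offset
    end = start + length
    if start < 0 or end > len(reference):
        raise ValueError("site is too close to the end of the reference")
    probe = reference[start:end]
    if variant_base is not None:
        probe = probe[:offset] + variant_base + probe[offset + 1:]
    return probe


# Hypothetical reference with a polymorphic site at index 10 (G in normal animals).
reference = "ACGTACGTACGTACGTACGTA"
normal_probe = centered_probe(reference, site=10)
variant_probe = centered_probe(reference, site=10, variant_base="A")
```

The two probes differ only at the central position, which is where a single mismatch tends to destabilize a short duplex most.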

[0052] The amount and/or presence of an allele is determined by measuring the amount of allele-specific oligonucleotide that is hybridized to the sample. Typically, the oligonucleotide is labeled with a label such as a fluorescent label. After stringent hybridization and washing conditions, fluorescence intensity is measured for each SNP oligonucleotide.

[0053] In one embodiment, the nucleotide present at the polymorphic site is identified by hybridization under sequence-specific hybridization conditions with an oligonucleotide probe exactly complementary to one of the polymorphic alleles in a region encompassing the polymorphic site. The probe hybridizing sequence and sequence-specific hybridization conditions are selected such that a single mismatch at the polymorphic site destabilizes the hybridization duplex sufficiently so that it is effectively not formed. Thus, under sequence-specific hybridization conditions, stable duplexes will form only between the probe and the exactly complementary allelic sequence. Thus, oligonucleotides from about 10 to about 35 nucleotides in length, preferably from about 15 to about 35 nucleotides in length, which are exactly complementary to an allele sequence in a region which encompasses the polymorphic site are within the scope of the invention.

[0054] In an alternative embodiment, the nucleotide present at the polymorphic site is identified by hybridization under sufficiently stringent hybridization conditions with an oligonucleotide substantially complementary to one of the SNP/SNV alleles in a region encompassing the polymorphic site, and exactly complementary to the allele at the polymorphic site. Because mismatches which occur at non-polymorphic sites are mismatches with both allele sequences, the difference in the number of mismatches in a duplex formed with the target allele sequence and in a duplex formed with the corresponding non-target allele sequence is the same as when an oligonucleotide exactly complementary to the target allele sequence is used. In this embodiment, the hybridization conditions are relaxed sufficiently to allow the formation of stable duplexes with the target sequence, while maintaining sufficient stringency to preclude the formation of stable duplexes with non-target sequences. Under such sufficiently stringent hybridization conditions, stable duplexes will form only between the probe and the target allele. Thus, oligonucleotides from about 10 to about 35 nucleotides in length, preferably from about 15 to about 35 nucleotides in length, which are substantially complementary to an allele sequence in a region which encompasses the polymorphic site, and are exactly complementary to the allele sequence at the polymorphic site, are within the scope of the invention.

[0055] The use of substantially, rather than exactly, complementary oligonucleotides may be desirable in assay formats in which optimization of hybridization conditions is limited. For example, in a typical multi-target immobilized-probe assay format, probes for each target are immobilized on a single solid support. Hybridizations are carried out simultaneously by contacting the solid support with a solution containing target DNA. As all hybridizations are carried out under identical conditions, the hybridization conditions cannot be separately optimized for each probe. The incorporation of mismatches into a probe can be used to adjust duplex stability when the assay format precludes adjusting the hybridization conditions. The effect of a particular introduced mismatch on duplex stability is well known, and the duplex stability can be routinely both estimated and empirically determined, as described above. Suitable hybridization conditions, which depend on the exact size and sequence of the probe, can be selected empirically using the guidance provided herein and well known in the art. The use of oligonucleotide probes to detect single base pair differences in sequence is described in, for example, Conner et al., 1983, Proc. Natl. Acad. Sci. USA 80:278-282, and U.S. Pat. Nos. 5,468,613 and 5,604,099, each incorporated herein by reference.

[0056] The proportional change in stability between a perfectly matched and a single-base mismatched hybridization duplex depends on the length of the hybridized oligonucleotides. Duplexes formed with shorter probe sequences are destabilized proportionally more by the presence of a mismatch. In practice, oligonucleotides between about 15 and about 35 nucleotides in length are preferred for sequence-specific detection. Furthermore, because the ends of a hybridized oligonucleotide undergo continuous random dissociation and re-annealing due to thermal energy, a mismatch at either end destabilizes the hybridization duplex less than a mismatch occurring internally. Preferably, for discrimination of a single base pair change in target sequence, the probe sequence is selected which hybridizes to the target sequence such that the polymorphic site occurs in the interior region of the probe.
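The placement of the polymorphic site in the interior of a probe can be illustrated with a minimal Python sketch. The function name `centered_probes`, its parameters, and the choice of a 15-base window are illustrative assumptions, not part of the disclosure; the example sequence is the consensus oligo listed in Table 1 for Seq. ID: 1, position 239. The strings returned are written in the same sense as the input sequence, so a probe for the opposite strand would be the reverse complement.

```python
def centered_probes(sequence, snp_index, alleles, length=15):
    """Build one probe per allele with the polymorphic base near the middle.

    sequence  : target-region sequence (string)
    snp_index : 0-based index of the polymorphic base within `sequence`
    alleles   : iterable of single-base alleles, e.g. ("C", "G")
    length    : probe length in bases (hypothetical default)
    """
    if not 0 <= snp_index < len(sequence):
        raise ValueError("snp_index is outside the sequence")
    if length > len(sequence):
        raise ValueError("probe is longer than the sequence")

    # Put roughly half of the probe on the 5' side of the polymorphic base,
    # then clamp the window so it stays inside the sequence.
    start = snp_index - (length - 1) // 2
    start = max(0, min(start, len(sequence) - length))
    offset = snp_index - start  # position of the polymorphic base in the probe
    window = sequence[start:start + length]

    # Substitute each allele at the polymorphic position.
    return {a: window[:offset] + a + window[offset + 1:] for a in alleles}


# Consensus oligo from Table 1 (Seq. ID: 1, position 239), polymorphic base at index 12
context = "atcccggtctccctcgagaggaaca"
probes = centered_probes(context, 12, ("C", "G"))
```

Each probe differs from the other only at the interior base, which is the arrangement the paragraph above describes as giving the best single-base discrimination.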

[0057] Suitable assay formats for detecting hybrids formed between probes and target nucleic acid sequences in a sample are known in the art and include the immobilized target (dot-blot) format and immobilized probe (reverse dot-blot or line-blot) assay formats. Dot blot and reverse dot blot assay formats are described in U.S. Pat. Nos. 5,310,893; 5,451,512; 5,468,613; and 5,604,099; each incorporated herein by reference.

[0058] In a dot-blot format, amplified target DNA is immobilized on a solid support, such as a nylon membrane. The membrane-target complex is incubated with labeled probe under suitable hybridization conditions, unhybridized probe is removed by washing under suitably stringent conditions, and the membrane is monitored for the presence of bound probe. A preferred dot-blot detection assay is described in the examples.

[0059] In the reverse dot-blot (or line-blot) format, the probes are immobilized on a solid support, such as a nylon membrane or a microtiter plate. The target DNA is labeled, typically during amplification by the incorporation of labeled primers. One or both of the primers can be labeled. The membrane-probe complex is incubated with the labeled amplified target DNA under suitable hybridization conditions, unhybridized target DNA is removed by washing under suitably stringent conditions, and the membrane is monitored for the presence of bound target DNA. A preferred reverse line-blot detection assay is described in the examples.

[0060] An allele-specific probe that is specific for one of the polymorphism variants is often used in conjunction with the allele-specific probe for the other polymorphism variant. In some embodiments, the probes are immobilized on a solid support and the target sequence in an individual is analyzed using both probes simultaneously. Examples of nucleic acid arrays are described by WO 95/11995. The same array or a different array can be used for analysis of characterized polymorphisms.

Allele-Specific Primers

[0061] Polymorphisms are also commonly detected using allele-specific amplification or primer extension methods. These reactions typically involve use of primers that are designed to specifically target a polymorphism via a mismatch at the 3' end of a primer. The presence of a mismatch affects the ability of a polymerase to extend a primer when the polymerase lacks error-correcting activity. The presence of the particular allele can be determined by the ability of the primer to initiate extension. If the 3' terminus is mismatched, the extension is impeded. Thus, for example, if a primer matches the "C" allele nucleotide at the 3' end, the primer will be efficiently extended.
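As a rough illustration of the 3'-terminal design described above, the following Python sketch builds a primer whose last base is the allele of interest and applies the simplified rule that extension proceeds only when that base matches the template allele. The function names and the 12-base primer length are hypothetical choices, and the all-or-nothing extension rule is a simplification of real polymerase behavior.

```python
def allele_specific_primer(sequence, snp_index, allele, length=12):
    """Return a primer that ends (3' terminus) on the polymorphic base.

    The primer is written in the same sense as `sequence`; it would anneal
    to the complementary strand. `length` is a hypothetical default.
    """
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough sequence 5' of the polymorphic site")
    return sequence[start:snp_index] + allele


def is_extendable(primer, sample_allele):
    """Simplified model: extension occurs only if the 3' base matches."""
    return primer[-1].upper() == sample_allele.upper()


# Consensus context from Table 1 (Seq. ID: 1, position 239), polymorphic base at index 12
context = "atcccggtctccctcgagaggaaca"
primer_c = allele_specific_primer(context, 12, "C")
primer_g = allele_specific_primer(context, 12, "G")
```

With these two primers, a sample carrying the "C" allele would support extension from `primer_c` but not from `primer_g` under the simplified rule.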

[0062] Typically, the primer is used in conjunction with a second primer in an amplification reaction. The second primer hybridizes at a site unrelated to the polymorphic position. Amplification proceeds from the two primers leading to a detectable product signifying the particular allelic form is present. Allele-specific amplification- or extension-based methods are described in, for example, WO 93/22456; U.S. Pat. Nos. 5,137,806; 5,595,890; 5,639,611; and U.S. Pat. No. 4,851,331.

[0063] Using allele-specific amplification-based genotyping, identification of the alleles requires only detection of the presence or absence of amplified target sequences. Methods for the detection of amplified target sequences are well known in the art. For example, gel electrophoresis and probe hybridization assays described are often used to detect the presence of nucleic acids.

[0064] In an alternative probe-less method, the amplified nucleic acid is detected by monitoring the increase in the total amount of double-stranded DNA in the reaction mixture, as described, e.g., in U.S. Pat. No. 5,994,056; and European Patent Publication Nos. 487,218 and 512,334. The detection of double-stranded target DNA relies on the increased fluorescence that various DNA-binding dyes, e.g., SYBR Green, exhibit when bound to double-stranded DNA.

[0065] As appreciated by one in the art, allele-specific amplification methods can be performed in reactions that employ multiple allele-specific primers to target particular alleles. Primers for such multiplex applications are generally labeled with distinguishable labels or are selected such that the amplification products produced from the alleles are distinguishable by size. Thus, for example, both alleles in a single sample can be identified using a single amplification by gel analysis of the amplification product.

[0066] As in the case of allele-specific probes, an allele-specific oligonucleotide primer may be exactly complementary to one of the polymorphic alleles in the hybridizing region or may have some mismatches at positions other than the 3' terminus of the oligonucleotide, which mismatches occur at non-polymorphic sites in both allele sequences.

5'-nuclease assay

[0067] Identification of the presence of a polymorphism can also be performed using a "TaqMan®" or "5'-nuclease assay", as described in U.S. Pat. Nos. 5,210,015; 5,487,972; and 5,804,375; and Holland et al., 1991, Proc. Natl. Acad. Sci. USA 88:7276-7280. In the TaqMan® assay, labeled detection probes that hybridize within the amplified region are added during the amplification reaction. The probes are modified so as to prevent the probes from acting as primers for DNA synthesis. The amplification is performed using a DNA polymerase having 5' to 3' exonuclease activity. During each synthesis step of the amplification, any probe which hybridizes to the target nucleic acid downstream from the primer being extended is degraded by the 5' to 3' exonuclease activity of the DNA polymerase. Thus, the synthesis of a new target strand also results in the degradation of a probe, and the accumulation of degradation product provides a measure of the synthesis of target sequences.

[0068] The hybridization probe can be an allele-specific probe that discriminates between the SNP alleles. Alternatively, the method can be performed using an allele-specific primer and a labeled probe that binds to amplified product.

[0069] Any method suitable for detecting degradation product can be used in a 5' nuclease assay. Often, the detection probe is labeled with two fluorescent dyes, one of which is capable of quenching the fluorescence of the other dye. The dyes are attached to the probe, preferably with one attached to the 5' terminus and the other attached to an internal site, such that quenching occurs when the probe is in an unhybridized state and such that cleavage of the probe by the 5' to 3' exonuclease activity of the DNA polymerase occurs in between the two dyes. Amplification results in cleavage of the probe between the dyes with a concomitant elimination of quenching and an increase in the fluorescence observable from the initially quenched dye. The accumulation of degradation product is monitored by measuring the increase in reaction fluorescence. U.S. Pat. Nos. 5,491,063 and 5,571,673, both incorporated herein by reference, describe alternative methods for detecting the degradation of probe which occurs concomitant with amplification.

DNA Sequencing and single base extensions

[0070] SNPs/SNVs can also be detected by direct sequencing. Methods include, e.g., dideoxy sequencing-based methods and other methods such as Maxam and Gilbert sequencing (see, e.g., Sambrook and Russell, supra).

[0071] Other detection methods include Pyrosequencing™ of oligonucleotide-length products. Such methods often employ amplification techniques such as PCR. For example, in pyrosequencing, a sequencing primer is hybridized to a single stranded, PCR-amplified, DNA template; and incubated with the enzymes, DNA polymerase, ATP sulfurylase, luciferase and apyrase, and the substrates, adenosine 5' phosphosulfate (APS) and luciferin. The first of four deoxynucleotide triphosphates (dNTP) is added to the reaction. DNA polymerase catalyzes the incorporation of the deoxynucleotide triphosphate into the DNA strand, if it is complementary to the base in the template strand. Each incorporation event is accompanied by release of pyrophosphate (PPi) in a quantity equimolar to the amount of incorporated nucleotide. ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5' phosphosulfate. This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin that generates visible light in amounts that are proportional to the amount of ATP. The light produced in the luciferase-catalyzed reaction is detected by a charge coupled device (CCD) camera and seen as a peak in a pyrogram™. Each light signal is proportional to the number of nucleotides incorporated. Apyrase, a nucleotide degrading enzyme, continuously degrades unincorporated dNTPs and excess ATP. When degradation is complete, another dNTP is added.
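The cyclic dispensation logic described above can be sketched as an idealized simulation in Python. The function name `pyrogram`, the dispensation order "ACGT", and the cycle limit are assumptions made for illustration; the model treats each light signal as exactly equal to the number of bases incorporated and ignores enzyme kinetics and noise. The two example reads are the four bases starting at the polymorphic site in the consensus and SNP oligos listed in Table 1 for Seq. ID: 1, position 239.

```python
def pyrogram(read_sequence, dispensation_order="ACGT", max_cycles=50):
    """Idealized pyrogram: list of (nucleotide, signal) per dispensation.

    read_sequence      : bases to be synthesized, 5' to 3', after the primer
    dispensation_order : order in which dNTPs are added (assumed here)
    max_cycles         : safety limit on dispensation cycles (hypothetical)
    """
    seq = read_sequence.upper()
    pos = 0
    signals = []
    for _ in range(max_cycles):
        for nt in dispensation_order:
            # A dispensed dNTP is incorporated for as long as it matches the
            # next base; the signal is proportional to the run length.
            run = 0
            while pos < len(seq) and seq[pos] == nt:
                run += 1
                pos += 1
            signals.append((nt, run))
            if pos == len(seq):
                return signals
    return signals


consensus_read = pyrogram("CTCG")  # bases following the primer, consensus allele
variant_read = pyrogram("GTCG")    # same region with the BSE-associated allele
```

In this idealized model the two alleles give different peak patterns at the second and third dispensations, which is how a pyrogram would distinguish them.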

[0072] Another similar method for characterizing SNPs/SNVs does not require use of a complete PCR, but typically uses only the extension of a primer by a single, fluorescence-labeled dideoxynucleotide triphosphate (ddNTP) that is complementary to the nucleotide to be investigated. The nucleotide at the polymorphic site can be identified via detection of a primer that has been extended by one base and is fluorescently labeled (e.g., Kobayashi et al., Mol. Cell. Probes, 9:175-182, 1995).

Denaturing Gradient Gel Electrophoresis

[0073] Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution (see, e.g., Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification, W. H. Freeman and Co, New York, 1992, Chapter 7).

Single-Strand Conformation Polymorphism Analysis

[0074] Alleles of target sequences can be differentiated using single-strand conformation polymorphism analysis, which identifies base differences by alteration in electrophoretic migration of single stranded PCR products, as described, e.g., in Orita et al., Proc. Nat. Acad. Sci. 86, 2766-2770 (1989). Amplified PCR products can be generated as described above, and heated or otherwise denatured, to form single stranded amplification products. Single-stranded nucleic acids may refold or form secondary structures which are partially dependent on the base sequence. The different electrophoretic mobilities of single-stranded amplification products can be related to base-sequence difference between alleles of target sequences.

[0075] SNP detection methods often employ labeled oligonucleotides. Oligonucleotides can be labeled by incorporating a label detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. Useful labels include fluorescent dyes, radioactive labels, e.g., ³²P, electron-dense reagents, enzymes, such as peroxidase or alkaline phosphatase, biotin, or haptens and proteins for which antisera or monoclonal antibodies are available. Labeling techniques are well known in the art (see, e.g., Current Protocols in Molecular Biology, supra; Sambrook & Russell, supra).

Detection of a BSE animal

[0076] In typical embodiments, a BSE animal is detected by detecting the presence of any one of the 420 polymorphisms set forth in Table 1, or detecting a combination of those polymorphisms. Analysis is generally performed by querying more than one of the 420 variant positions. Thus, anywhere from 1 to all of the 420 variant positions set forth in Table 1 can be analyzed to detect BSE. Generally at least 10, 20, 30, 40, 50, 60, 70, or 100 or more positions are analyzed.

[0077] Table 1 and Table 2 provide a description of SNPs found only in BSE animals. Table 1 provides details of SNPs found only in BSE animals. Column A contains the sequence identification tags for each reference query sequence as described in Table 2. Column B describes the position of the SNP in the sequence referred to in Column A. Column C is the consensus sequence derived from the database of sequences of normal and BSE animals and designates the position (shown in a capital letter) at which a diagnostic SNP can be found in sequences from BSE animals only. Column E is the Sequence ID number for those sequences found in Column F. Column F is the consensus sequence derived from the database of sequences of BSE animals and designates the actual polymorphism (shown in a capital letter) found in sequences from BSE animals only: a "-" designates a deletion, an "N" designates an insertion. Column G is the Sequence ID number for those sequences found in Column H. Column H is an alternative consensus sequence to that found in Column F and derived from the database of sequences of BSE animals and designates the actual polymorphism (shown in a capital letter) found in sequences from BSE animals only: a "-" designates a deletion, an "N" designates an insertion. The position of the polymorphism is determined with reference to one of reference sequences SEQ ID NOs 1-41. Thus, the number of the position indicates where that position occurs in the context of the reference sequence (i.e., one of SEQ ID NOs: 1-41). An insertion occurs after the designated position.
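The notation used in Table 1 (a capital letter marking the polymorphic position, "-" for a deletion, "N" for an insertion after the designated position) can be read programmatically. The following Python sketch is one possible interpretation; the function name `classify_variant` and its return format are hypothetical, and the index it reports is the 0-based offset within the oligo, not the position in the reference sequence. The three examples are taken from the Seq. ID: 1 rows of Table 1 (positions 239, 247 and 410).

```python
def classify_variant(consensus_oligo, snp_oligo):
    """Classify a Table 1 row as substitution, deletion or insertion.

    Returns (kind, offset, consensus_base, variant_symbol), where `offset`
    is the 0-based index of the capital letter in the consensus oligo.
    """
    # The polymorphic position is the single capital letter in the consensus.
    caps = [i for i, ch in enumerate(consensus_oligo) if ch.isupper()]
    if len(caps) != 1:
        raise ValueError("expected exactly one capital letter in the consensus")
    i = caps[0]
    ref = consensus_oligo[i]

    if snp_oligo[i] == "-":
        return ("deletion", i, ref, "-")
    # An insertion adds one symbol, "N", directly after the designated base.
    if len(snp_oligo) == len(consensus_oligo) + 1 and snp_oligo[i + 1] == "N":
        return ("insertion", i, ref, "N")
    return ("substitution", i, ref, snp_oligo[i])


row_239 = classify_variant("atcccggtctccCtcgagaggaaca", "atcccggtctccGtcgagaggaaca")
row_247 = classify_variant("ctccctcgagagGaacacygaggtt", "ctccctcgagagGNaacacygaggtt")
row_410 = classify_variant("acgtggcctcgtGggtggttccaca", "acgtggcctcgt-ggtggttccaca")
```

Lower-case letters other than a, c, g and t in the oligos appear to be IUPAC ambiguity codes; the sketch passes them through unchanged and does not attempt to expand them.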

[0078] Table 2 provides a summary table of queried sequences containing diagnostic SNPs found only in BSE animals. Column A contains the sequence identification tags for each query sequence. Column B contains the repetitive element nomenclature that has the highest homology with the query sequence when applicable. Column C is the length of the query sequence. Column D contains the percentage of the query length that has homology (>70%) to the reference repetitive element in Column B. Column E contains, where appropriate, the BLAST reference of the query sequence when searching against the cow genome. Column F contains the percentage of the query length, where appropriate, that has homology (>70%) to the BLAST reference in Column E. Column G contains the highest number of individual sequences in the database, derived from the ultra deep sequencing of both BSE and normal animals, at positions that contain SNPs found only in BSE animals. Column H contains the total number of significant SNPs in the queried sequence found only in BSE animals. Column I contains the maximum number of BSE animals, out of a total of 15 BSE animals but not any normal control animals, that can be detected when using a single SNP from Column H. Column J contains the maximum number of BSE animals, out of a total of 15 BSE animals but not any normal control animals, that can be detected when using a combination of SNPs from Column H. Column K refers to the Sequence Number of oligos (as detailed in Table 1) that are located within the query sequence of Column A and containing the SNP position referred to in Column H.
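The difference between Column I (animals detected by the best single SNP) and Column J (animals detected by a combination of SNPs) can be illustrated with a short Python sketch. The mapping `animals_by_snp`, the SNP labels, and the animal numbers are entirely hypothetical and are not taken from Table 2; the sketch assumes that an animal counts as detected if it carries at least one of the queried SNPs.

```python
def best_single_snp(animals_by_snp):
    """Return the SNP carried by the largest number of BSE animals."""
    return max(animals_by_snp, key=lambda snp: len(animals_by_snp[snp]))


def detected_by_combination(animals_by_snp, snps):
    """Return the set of animals carrying at least one of the given SNPs."""
    detected = set()
    for snp in snps:
        detected |= animals_by_snp[snp]
    return detected


# Hypothetical data: SNP label -> set of BSE animal numbers carrying it
animals_by_snp = {
    "snp_a": {1, 2, 3},
    "snp_b": {3, 4},
    "snp_c": {5},
}

single = best_single_snp(animals_by_snp)                       # Column I analogue
combined = detected_by_combination(animals_by_snp, animals_by_snp)  # Column J analogue
```

In this made-up example the best single SNP detects three animals while the combination detects five, which is why querying several positions together can raise the number of animals detected.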

[0079] As noted above, a polymorphic position described herein, (i.e., a query position) can be evaluated using sequencing or any number of methods employing oligonucleotides that are competent to discriminate between the residue(s) present in the reference sequence and the indicated polymorphism present only in BSE animals. Such oligonucleotides can bind selectively to the normal sequence or in some embodiments, are designed to bind selectively to the variant sequence known to be associated with BSE. Exemplary oligonucleotides that discriminate between the reference sequence and BSE-associated variant sequence are provided in Table 1.

[0080] In some embodiments, a BSE animal is detected by sequence analysis of one or more polymorphic positions.

EXAMPLES

Example 1. Detection of polymorphisms to detect animals with BSE

[0081] This example describes detection of SNP/SNVs associated with BSE. Samples were obtained from an experimental study whereby cows were inoculated orally with BSE-infectious or control brain material. Fleckvieh/Brown Swiss cattle were fed 100 g of either PrP^res-positive brain stem macerate or normal brain material (controls). Serum samples were taken 40 months post-inoculation (15 infected, 6 control non-infected and 12 randomly selected normal animals).

Experimental Protocol

[0082] Serum collection: Special care was taken in collection, processing and storage of serum samples. Blood from the tail vein or artery was collected into 18 mL plastic tubes equipped with a coagulation accelerator and kept at room temperature for 30 min to ensure proper coagulation. Until further processing, the tubes were stored at 2 - 8 °C for not longer than 24 hours. Centrifugation was done at 2 - 8 °C, 1000 x g for 15 min. The serum supernatant was transferred into 1.5 mL microcentrifuge cups in 0.5 mL aliquots and frozen immediately at -20 °C or -80 °C until use.

[0083] Preparation of serum fractions: Frozen serum was thawed at 4 °C in an ice-water bath or refrigerator and 250 μL were transferred into a 1.5 mL microcentrifuge tube. The tube was centrifuged at 4,000 x g for 25 min at 4 °C in a Model 5214 bench top centrifuge (Eppendorf, Hamburg, Germany) to remove cell debris. The supernatant was transferred into a fresh tube and subjected to 35 min centrifugation at 20,000 x g. The supernatant was carefully removed and the pellet was used for further analyses.

[0084] Nucleic acid extraction: 20,000 x g pellets were dissolved in 5 μl 1x PBS and subsequently lysed by adding 7.5 μl of Solution D2 (0.4 M KOH, 0.01 M EDTA, 0.08 M DTT). Samples were mixed by pipetting up and down five times. Samples were kept on ice for ten minutes. Samples were neutralized by adding 7.5 μl of Solution B (0.4 M HCl, 0.6 M Tris/HCl pH 7.5). Samples were either further purified using the PLG (light) tubes (2 mL capacity; Eppendorf) according to the manufacturer's instructions or used directly in subsequent steps.

[0085] Alternatively, total nucleic acids from whole serum were extracted using the High Pure Viral Nucleic Acids Extraction Kit (Roche) according to the manufacturer's instructions except for the use of poly A-RNA as a carrier.

[0086] Whole DNA amplification: 1 μl of the extracted serum DNA solution was amplified using WGA4 Kit (Sigma) according to the manufacturer's protocol. In preparation for sequencing, samples were labelled using identifier lead sequences for later tracking of the individual samples.

[0087] Ultra deep sequencing of products from steps above was performed using a Roche/454 genome sequencer (GS20/GSFLX) with system reagents according to the manufacturer's instructions.

[0088] Raw sequences were processed using SeqMan (DNAStar Inc., Madison WI, USA). Briefly, after trimming for the used adapters/primers, an automatic contig assembly was performed in order to reduce redundancy within the dataset. A total of 227283 sequences were assembled into 26673 contigs. Out of this initial assembly 138 contigs assembled from more than 49 sequences each were selected. Sequential local alignments using the Blast program (Altschul, et al. Nucleic Acids Res 25(17): 3389-3402, 1997) were undertaken to discover sharing of domains between the contig sequences. Parameters used were -r 1 -q -3 -G 5 -E 2 -W 7 -F "m D" -e 0.01. Using this local alignment approach the sequence redundancy within the contigs was further reduced.

[0089] Blast analysis: A total of 117 contigs were compared against a database containing 808,634 sequences (total letters: 86,785,049) using Blast. Database sequences segregate into 410984 sequences from 15 animals artificially infected with BSE and 397650 sequences from 18 un-infected controls.

[0090] SNV mining: Out of the 117 initial contigs 70 gave significant hits when compared to the aforementioned database. Variations between database-sequences were extracted from the Blast output for each contig-query that gave more than five hits. Variations were tested for distribution differences between infected and control animals by chi-square test. For SNV positions displaying a specificity of 100%, flanking nucleotides (12 bp in each direction) were extracted from the contig sequence, which served as query in the Blast analysis.

Results
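The SNV mining step in paragraph [0090] (chi-square comparison, a 100% specificity filter, and extraction of 12 flanking bases on each side) can be sketched in plain Python. This is an illustration under stated assumptions and not the authors' pipeline: the function names and the counts are hypothetical, the statistic is the uncorrected Pearson chi-square for a 2x2 table, and specificity is taken here to mean the fraction of control observations that lack the variant.

```python
def chi_square_2x2(bse_with, bse_without, ctrl_with, ctrl_without):
    """Uncorrected Pearson chi-square statistic for a 2x2 table."""
    a, b, c, d = bse_with, bse_without, ctrl_with, ctrl_without
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0  # a row or column is empty; no test is possible
    return n * (a * d - b * c) ** 2 / margins


def specificity(ctrl_with, ctrl_without):
    """Fraction of control observations that do not show the variant."""
    return ctrl_without / (ctrl_with + ctrl_without)


def flanked_oligo(contig, index, flank=12):
    """Return the variant base (capitalised) with `flank` bases on each side."""
    if index - flank < 0 or index + flank >= len(contig):
        raise ValueError("not enough flanking sequence")
    return (contig[index - flank:index].lower()
            + contig[index].upper()
            + contig[index + 1:index + flank + 1].lower())


# Hypothetical counts: variant seen in 10 of 15 BSE observations, 0 of 15 controls
stat = chi_square_2x2(10, 5, 0, 15)
spec = specificity(0, 15)

# Contig fragment matching the Table 1 consensus for Seq. ID: 1, position 239
oligo = flanked_oligo("atcccggtctccctcgagaggaaca", 12)
```

A position would be retained in this sketch only when `spec` equals 1.0, after which the 25-base string from `flanked_oligo` has the same layout as the consensus oligos in Table 1.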

[0091] The ultra deep sequencing approach generated 41 sequences in which SNP/SNVs were found only in animals with BSE and not in normal controls. The sequences were mostly derived from repetitive genomic sequences, wherein most of the prevalent sequences had homology to bovine L1 LINE or SINE repetitive elements. Two sequences showed neither repetitive nor coding sequence homology. Table 1 shows that a total of 421 SNPs could be identified from the 41 sequences. These data suggest that multiple SNPs/SNVs may be involved in defining the difference between normal and BSE infected cattle.

Discussion

[0092] There is a dearth of literature on the detection of CNAs, with or without polymorphisms, that can be used as a diagnostic test with acceptable performance criteria (reviewed in Fleischhaker and Schmidt, 2007). The majority of studies to date focus on coding genes that have putative roles in a suspected disease process. In the study described here, all CNAs were analyzed without gene preference bias. SNPs/SNVs were determined from the presence of multiple occurrences from ultra deep sequencing.

[0093] The presence of CNA in the sera of cattle was reported previously (e.g., Schutz et al CDLI). The study described here was conducted to determine whether SNPs or SNVs could be used to detect the presence of BSE in living cattle. BSE was evaluated because BSE is a naturally-occurring veterinary chronic illness and clinical experimental samples with sufficient volumes, e.g., sera, are available.

[0094] The above examples are provided to illustrate the invention but not to limit its scope. Other variants of the invention will be readily apparent to one of ordinary skill in the art and are encompassed by the appended claims.

[0095] All publications, patents, accession numbers, and patent applications cited herein are hereby incorporated by reference for all purposes.

Table 1

Columns: A (Name), B (SNP Position), C (Consensus Oligo), E (SNPOligo ID), F (SNP-Oligo1), G (SNPOligo ID), H (SNP-Oligo2)

Seq. ID: 1 157 gcctmrggagmaGtcycrcgttagg Seq. ID: 50 gcctmrggasmaTtcycrcgttagg Seq. ID: 464 gcctmrvgasmaTtcycrcgwymks

Seq. ID: 1 239 atcccggtctccCtcgagaggaaca Seq. ID: 68 atcccggtctccGtcgagaggaaca Seq. ID: 465 atmccggtctccGtcgagaggaaca

Seq. ID: 1 247 ctccctcgagagGaacacygaggtt Seq. ID: 70 ctccctcgagagGNaacacygaggtt

Seq. ID: 1 259 gaacacygaggtTttccggcaccmc Seq. ID: 71 gaacacygaggtAttccggcaccmc

Seq. ID: 1 267 aggttttccggcAccmcytcctctg Seq. ID: 72 aggttttccggc-ccmcytcctctg Seq. ID: 466 aggttttccggc-ccmcctcctctg

Seq. ID: 1 310 gatctggacaggAgggtcgactccc Seq. ID: 73 gatctggacaggCgggtcgactccc

Seq. ID: 1 400 gacattccagacGtggcctcgtggg Seq. ID: 74 gacattccagacTtggcctcgtggg

Seq. ID: 1 410 acgtggcctcgtGggtggttccaca Seq. ID: 75 acgtggcctcgt-ggtggttccaca

Seq. ID: 1 419 cgtgggtggttcCacattccgtagg Seq. ID: 76 cgtgggtggttc-acattccgtagg

Seq. ID: 1 471 aagaacccgatgCccggacacctct Seq. ID: 77 aagaacccgatgGccggacacctct

Seq. ID: 1 493 tctccgaactccAscctgtgaatgm Seq. ID: 78 tctccgaactcc-scctgtgaatgm Seq. ID: 467 tctccgaactcc-ccctgtgaatga

Seq. ID: 1 505 ascctgtgaatgMagtcaacacgaa Seq. ID: 79 ascctgtgaatgTagtcaacacgaa Seq. ID: 468 accctgtgaatgTagtcaacacgaa

Seq. ID: 1 565 accccaggttccAaatacagctcga Seq. ID: 80 accccaggttcc-aatacagctcga

Seq. ID: 1 578 aatacagctcgaCaagcggcctctc Seq. ID: 81 aatacagctcgaCNaagcggcctctc Seq. ID: 469 aatacagctcgaCNaagyggcctstc

Seq. ID: 1 582 cagctcgacaagCggcctctctccc Seq. ID: 82 cagctcgacaagCNggcctctctccc

Seq. ID: 1 669 ctgtccccagtcTgcagggaccctg Seq. ID: 83 ctgtccccagtcTNgcagggaccctg

Seq. ID: 1 720 cctgmggttcctGcctcaactggag Seq. ID: 84 cctgmggttcct-cctcaactggag Seq. ID: 470 ccygmgvttccy-cctcaactsgag

Seq. ID: 1 820 tcagagccaccaTgagaagccccct Seq. ID: 85 tcwgagccaccaGgagaagccccct Seq. ID: 471 tcwgagccaccmGgwgaagcyccct

Seq. ID: 1 856 cacaagtcgaggGaacccmgggttt Seq. ID: 86 cacaagtcgagg-aacccmgggttt Seq. ID: 472 cacaagtcgagg-aacccngrgttt

Seq. ID: 1 978 tcgcmactcgmaTggagayysgact Seq. ID: 87 tcgcmactcgmaTNggagayysgact

Seq. ID: 1 991 ggagayysgactTccctggsscmmc Seq. ID: 88 ggagayysgactTNccctggsscmmc Seq. ID: 473 ggagabcsgactTNccctggsbcmmc

Seq. ID: 1 1074 mrctcgagaamaAccmcgwgrytcc Seq. ID: 42 mrctcgagaama-ccmcgwgrytcc Seq. ID: 474 crctcgagaaca-ccccgagrytcc

Seq. ID: 1 1142 rvgasmartcycRcgwymkcwctcc Seq. ID: 43 rvgasmartcycRNcgwymkcwctcs Seq. ID: 475 gvgascartcycDNcgwyckctctcs

Seq. ID: 1 1163 ctccrarckcswMaggrbrcttgrc Seq. ID: 44 ctcsrarckcswTaggrbrcttgrc Seq. ID: 476 ctcsvarckcswTaggasvcttgrc

Seq. ID: 1 1220 tacccgtcgcgaYtcgagagcagag Seq. ID: 45 tacccgtcgcgaYNtcgagagcagag Seq. ID: 477 tmcccgtcgcnmCNtcgagagvaras

Seq. ID: 1 1303 caaccccgagatCcctgtcgcccct Seq. ID: 46 caaccccgagatCNcctgtcgcccct Seq. ID: 478 saaccccgagwtCNcctgtckcmmct

Seq. ID: 1 1336 acattggcttctGgacacaagccta Seq. ID: 47 acattggcttctGNgacacaagccta Seq. ID: 479 acmytgrcttcbBNghcdcaasycka

Seq. ID: 1 1341 ggcttctggacaCaagcctagatga Seq. ID: 48 ggcttctggacaCNaagcctagatga Seq. ID: 480 grcttcbbghcdCNaasyckagatga

Seq. ID: 1 1560 gcctmrvgasmaGtcycrcgttagg Seq. ID: 49 gcctmrvgasmaTtcycrcgttagg Seq. ID: 481 gcctmrvgasmaTtcycdcgwymks

Seq. ID: 1 1642 atcccggtctccCtcgagaggaaca Seq. ID: 51 atcccggtctccGtcgagaggaaca Seq. ID: 482 atmccggtctccGtcgagaggaaca

Seq. ID: 1 1650 ctccctcgagagGaacacygaggtt Seq. ID: 52 ctccctcgagagGNaacacygaggtt

Seq. ID: 1 1672 gttttccggcacCccctcctctgag Seq. ID: 53 gttttccggcac-ccctcctctgag

Seq. ID: 1 1713 gatctggacaggAgggtcgactccc Seq. ID: 54 gatctggacaggCgggtcgactccc

Seq. ID: 1 1803 gacattccagacGtggcctcgtggg Seq. ID: 55 gacattccagacTtggcctcgtggg

Seq. ID: 1 1813 acgtggcctcgtGggtggttccaca Seq. ID: 56 acgtggcctcgt-ggtggttccaca

Seq. ID: 1 1874 aagaacccgatgCccggacacctct Seq. ID: 57 aagaacccgatgGccggacacctct

Seq. ID: 1 1908 ascctgtgaatgMagtcaacacgaa Seq. ID: 58 ascctgtgaatgTagtcaacacgaa Seq. ID: 483 accctgtgaatgTagtcaacacgaa

Seq. ID: 1 1965 accccaggttccAaatacagctcga Seq. ID: 59 accccaggttcc-aatacagctcga

Seq. ID: 1 1978 aatacagctcgaCaagcggcctctc Seq. ID: 60 aatacagctcgaCNaagcggcctctc Seq. ID: 484 aatacagctcgaCNaagyggcctstc

Seq. ID: 1 2120 cctgmggttcctGcctcaactggag Seq. ID: 61 cctgmggttcct-cctcaactggag Seq. ID: 485 ccygmgvttccy-cctcaactsgag

Seq. ID: 1 2184 cgagaggcccctCccacctccagkw Seq. ID: 62 cgagasscccct-ccamckccagdw Seq. ID: 486 cgagwbscccct-ccamctcsagdw

Seq. ID: 1 2241 cccygagrtcmcYkkcrcmmstsga Seq. ID: 63 cccygagrtcmcYNkkcrcmmstsga Seq. ID: 487 ccchgrgrtchcCNkkcvcaavtcga

Seq. ID: 1 2286 cwcaascckagaH-gaggtcwrwtd Seq. ID: 64 cwcaascckagaHN-gaggtcwrwtd

Seq. ID: 1 2293 ckagah-gaggtCwrwtdsccctgc Seq. ID: 65 ckagah-gaggt-wrwtdsccctdc Seq. ID: 488 cgwsahrgassy-wghkdcccctdc

Seq. ID: 1 2294 kagah-gaggtcWrwtdsccctgcm Seq. ID: 66 kagah-gaggtc-rwtdsccctdcm

Seq. ID: 1 2379 gsscmmcacragAggcwbmctgamy Seq. ID: 67 gsscmmcacrag-ggcwbmctgamy Seq. ID: 489 gsscmmcacrag-ggcwymctgamy

Seq. ID: 1 2451 gaa-maaccccgWgrytcccccgtc Seq. ID: 69 gaa-maaccmcgWNgrytcccccgtc Seq. ID: 490 gaanmaaccccgANgaytcccccgtc

Seq. ID: 2 49 trcgagacagcaAaagagamacwga Seq. ID: 93 trcragacagcaGargagacacwga

Seq. ID: 2 75 rtadagaayagwCttwtggactytg Seq. ID: 94 gtadagaacagwTttdtggactntg Seq. ID: 491 rtagagaacaraTttdtkganwhts

Seq. ID: 2 88 ttwtggactytgTgggagagggmga Seq. ID: 95 ttdtggactntgCgggagagggaga Seq. ID: 492 ttdtkganwhtsCrggrgrdggwga

Seq. ID: 2 118 ggatgatttgrgAgaatrgcattga Seq. ID: 89 ggawgatttgggTgaatggcattga Seq. ID: 493 ggatgawttgrgTgaatrgcattga

Seq. ID: 2 200 wggatgcttgggGctggtgcactgg Seq. ID: 90 wggatgcttgggCctggtgcactgg Seq. ID: 494 wggawgcttvrrCcwrgtgcwctgk

Seq. ID: 2 241 ggtatggggaggGaggagggaggag Seq. ID: 91 ggtatggggaggTaggagggaggag Seq. ID: 495 ggkatggggaggTaggagggaggag

Seq. ID: 2 257 agggaggagggtTca Seq. ID: 92 agggaggagggtCca

Seq. ID: 3 76 taaaaactctccAgaaagyagghat Seq. ID: 112 taaaaactctccGgaaagyaggmat Seq. ID: 496 traaaacwcnmmGcaaavtagrmmt

Seq. ID: 3 108 acatacctcaacAtaataaaagcya Seq. ID: 96 acatacctcaacGtaataaaagcya Seq. ID: 497 amhkwbctcaam-nnnwrwadgnba

Seq. ID: 3 207 agacaaggrtgcCcactytcaccac Seq. ID: 97 agacaaggrtgcTcactytcaccac Seq. ID: 498 agrcaaggatgyTcaywytydcmac

Seq. ID: 3 233 hytattcaacatAgttttggargtt Seq. ID: 98 wctattcaacatGgtdttggargtt Seq. ID: 499 wcyattcaayahGgwvytggargtt

Seq. ID: 3 280 aaaaagaaataaAaggaatccaaat Seq. ID: 99 aaaaagaaataaGaggaatccaaat

Seq. ID: 3 283 aagaaataaaagGaatccaaattgg Seq. ID: 100 aagaaataaaagCaatccaaattgg

Seq. ID: 3 293 aggaatccaaatTggaaaagaagaa Seq. ID: 101 aggaatccaaatTNggaaaagaagaa

Seq. ID: 3 299 ccaaattggaaaAgaagaagtaaaa Seq. ID: 102 ccaaattggaaaTgaagaagtaaaa

Seq. ID: 3 309 aaagaagaagtaAaactctcactrt Seq. ID: 103 aaagaagaagtaTaactctcactrt

Seq. ID: 3 311 agaagaagtaaaActctcactrttt Seq. ID: 104 agaagaagtaaaCctctcactrttt

Seq. ID: 3 318 gtaaaactctcaCtrtttgcagatg Seq. ID: 105 gtaaaactctca-trtttgcagatg

Seq. ID: 3 321 aaactctcactrTttgcagatgaca Seq. ID: 106 aaactctcactrCttgcagatgaca

Seq. ID: 3 344 catgatcctmtaCatagaaaaccct Seq. ID: 107 catgatcctcta-atagaaaaccct Seq. ID: 500 catgatcctmta-atagaaaaycct

Seq. ID: 3 361 aaaaccctaaagActccaccagaaa Seq. ID: 108 aaaaccctaaagGctccaccagaaa Seq. ID: 501 aaaaycctaaagGmtccaccagaaa

Seq. ID: 3 445 aatchcttgcatTyctatacactaa Seq. ID: 109 aatcmcttgcatAcctatacactaa

Seq. ID: 3 456 ttyctatacactAayaatgaraaaa Seq. ID: 110 ttcctatacactGayaatgaraaaa

Seq. ID: 3 472 atgaraaaacagAaagagaaattaa Seq. ID: 111 atgaraaaacag-aagagaaattaa Seq. ID: 502 atgarraawcan-aagagaaattaa

Seq. ID: 4 31 aaayaaaattgaCaaaccattagcc Seq. ID: 120 aaayaaaattgaAaaaccattagcc Seq. ID: 503 aaayaaaattgaAaaachwttagcc

Seq. ID: 4 58 actcatcaagaaAmaagggagaara Seq. ID: 129 actcatcaagaaTmaagrgagaara Seq. ID: 504 actsatcaagaaTmaagrgagaara

Seq. ID: 4 64 caagaaamaaggGagaaramtcaaa Seq. ID: 130 caagaaamaagr-agaaraatcaaa Seq. ID: 505 caagaaamaagr-agaaraawcaaa

Seq. ID: 4 81 ramtcaaatmaaYaaaattagaaat Seq. ID: 131 raatcaaatcaaYNaaaattagaaat Seq. ID: 506 raawcaaatmaaYNaaaatyagaaat

Table 1 (continued)

Seq. ID: 4 115 garrtyacaacdGayamhrcagaaa Seq. ID: 113 garatyacaacaCacamyacagaaa Seq. ID: 507 gavrtyacaacwCayamcacagaaa

Seq. ID: 4 150 cataagagamtaCtatnarcaayta Seq. ID: 114 cataagagamtaGtatvarcaayta Seq. ID: 508 cataagagaytaGtakvarcaayta

Seq. ID: 4 166 narcaaytatatGccaataaaatgg Seq. ID: 115 varcaaytatatTccaataaaatgg Seq. ID: 509 varcaaytatayTccaataaaatgg

Seq. ID: 4 217 ttagaaamgtacAacytbccaaaac Seq. ID: 116 ttagaaaagtacTacyttccaarac Seq. ID: 510 ytagaaaadtwcTacyttccaarac

Seq. ID: 4 249 rgaagaaatagaAaatmtkaacaga Seq. ID: 117 ggaagaaatagaTaatmtkaacara Seq. ID: 511 rgaagaaatagaTawtmtgaacavh

Seq. ID: 4 261 aaatmtkaacagAccmatcacaagc Seq. ID: 118 aaatmtkaacarTccmathacaagy Seq. ID: 512 aawtmtgaacavTccmatbacaarh

Seq. ID: 4 299 ctgtratcaaaaAtcttccarcaaa Seq. ID: 119 ctgtaatcaraaCtctyccarcaaa Seq. ID: 513 ctgtaathaaaaCwctyccmvmaaa

Seq. ID: 4 310 aatcttccarcaAacaaaagcccag Seq. ID: 121 aatctyccarcaTacaaaagcccag Seq. ID: 514 aawctyccmvmaTabaaaagyccag

Seq. ID: 4 355 gaattctaccaaAmatttaragaag Seq. ID: 122 gaattctaycaaChatttagagaag Seq. ID: 515 raattctaycaaChatttaragaag

Seq. ID: 4 385 acacctatcctdCtcaaactcttcc Seq. ID: 123 ayacctatcctd-tcaaactcttcc Seq. ID: 516 ayachwatcctd-wyaaactmttcc

Seq. ID: 4 386 cacctatcctdcTcaaactcttcca Seq. ID: 124 yacctatcctdc-caaactcttcca Seq. ID: 517 yachwatcctdc-yaaactmttcca

Seq. ID: 4 406 ttccaraaaattGcagaggaaggwa Seq. ID: 125 ttccaraaaatt-cagaggaaggwa Seq. ID: 518 ttccaraaaatt-magaggraggaa

Seq. ID: 4 413 aaattgcagaggAaggwaaacttcc Seq. ID: 126 aaattgcagaggGaggwamacttcc Seq. ID: 519 aaattnmagaggGaggaamacttcc

Seq. ID: 4 482 acaaagayrccaCaaaaaagaaaay Seq. ID: 127 acaaagayvccaGaaaaaagaaaay Seq. ID: 520 acaaagayvhcaGaaaaaagaaaay

Seq. ID: 4 500 agaaaaytacagGccaatatcaytg Seq. ID: 128 agaaaaytacagTccaatatcactg Seq. ID: 521 agaaaaytayagTccaatatcactg

Seq. ID: 5 61 ygactgagygacTgaactgaactga Seq. ID: 132 ygactgagygacTNgaactgaactga

Seq. ID: 6 380 tgggtacagaatGggatcaggaggt Seq. ID: 150 tgggtacagaatCggatcaggaggt

Seq. ID: 6 474 ctttgctaatggAttttaagtttct Seq. ID: 151 ctttgctaatggTttttaagtttct

Seq. ID: 6 1009 cccasccttaccAggccvcagagga Seq. ID: 133 cccasccttaccGggccvcagagga

Seq. ID: 6 1060 gggtaaggaacaAggaactaacaag Seq. ID: 134 gggtaaggaacaANggaactaacaag

Seq. ID: 6 1073 ggaactaacaagCtcccaccaacca Seq. ID: 135 ggaactaacaagTtcccaccaacca Seq. ID: 522 ggaactaacaagTtcccascaacca

Seq. ID: 6 1111 gtcaacaagaggTcagcaagagatg Seq. ID: 136 gtcaacaagaggCcagcaagagatg

Seq. ID: 6 1162 aacttccagcgaGcagcaaagaccc Seq. ID: 137 aacttccagcgaAcagcaaagaccc Seq. ID: 523 aacttccagcgaAcavmaaagaccc

Seq. ID: 6 1166 tccagcgagcagCaaagacccagca Seq. ID: 138 tccagcgagcagAaaagacccagca Seq. ID: 524 tccagcgagcavAaaagacccagca

Seq. ID: 6 1199 acgaattcaatgTtacccaaaacaa Seq. ID: 139 acgaattcaatgCtacccaaaacaa Seq. ID: 525 acgaattcaatgCkasccaaaacma

Seq. ID: 6 1201 gaattcaatgttAcccaaaacaaay Seq. ID: 140 gaattcaatgttANcccaaaacaaay Seq. ID: 526 gaattcaatgtkANsccaaaacmant

Seq. ID: 6 1253 acangactgaggTtdggggtgaggc Seq. ID: 141 acatgactgaggGtdggggtgaggc

Seq. ID: 6 1282 catgtatgtgccAggatactcttaa Seq. ID: 142 catgtatgtgccGggatactcttaa

Seq. ID: 6 1529 ggtggatttctaCaaagaacatcca Seq. ID: 143 ggtggatttctaAaaagaacatcca

Seq. ID: 6 1536 ttctacaaagaaCatccatgcaaag Seq. ID: 144 ttctacaaagaaAatccatgcaaag

Seq. ID: 6 1580 gtgcacatctgcCcccwcccccccw Seq. ID: 145 gtgcacatctgcGccchccccccct Seq. ID: 527 gtgcacatctbcGccchccccccct

Seq. ID: 6 1698 ttttctaaaaaaAaraaaaacaaaa Seq. ID: 146 ttttctaaaaaaCaraaaaacaaaa Seq. ID: 528 ttttctaaaaaaCarraaaacaaaa

Seq. ID: 6 1714 aaaacaaaaarcAaattaaaaaaaa Seq. ID: 147 aaaacaaaaarcCaattaaaaaaaa Seq. ID: 529 aaaacaaaaaasCaantwnaaaaaa

Seq. ID: 6 1717 acaaaaarcaaaTtaaaaaaaaaaa Seq. ID: 148 acaaaaarcaaa-taaaaaaaaaaa Seq. ID: 530 acaaaaaasmaa-twnaaaaaaaaa

Seq. ID: 6 1720 aaaarcaaattaAaaaaaaaaaacc Seq. ID: 149 aaaarcaaatta-aaaaaaaaaacc Seq. ID: 531 aaaaasmaantw-aaaaaaaaaacc

Seq. ID: 7 23 ctactccatttcTtctaagggattc Seq. ID: 157 ctactccatttcAtctaagggattc

Seq. ID: 7 47 cytgcccacagtAgtagatataatg Seq. ID: 158 cytgcccacagtCgtagatataatg Seq. ID: 532 cttgcccacagtCgtagatataatg

Seq. ID: 7 50 gcccacagtagtAgatataatggtc Seq. ID: 159 gcccacagtagtTgatataatggtc

Seq. ID: 7 57 gtagtagatataAtggtcatctgag Seq. ID: 160 gtagtagatataCtggtcatctgag Seq. ID: 533 gtagtagatataCtggtcatctgar

Seq. ID: 7 74 catctgagttaaAttcacccattcc Seq. ID: 169 catctgagttaaTttcacccattcc Seq. ID: 534 catctgarttaaTttcacccattcc

Seq. ID: 7 93 cattccagtccaTtttagttcrctg Seq. ID: 171 cattccagtccaGtttagttcrctg Seq. ID: 535 cattccagtccaGtttagttcactg

Seq. ID: 7 121 cctaraatgtcgAyrttcactcttg Seq. ID: 152 cctaraatgtygTyrttcactcttg Seq. ID: 536 cctaraatgtygTygttcactcttg

Seq. ID: 7 174 rccttgattcatGgacctaacattc Seq. ID: 153 gccttgattcatTgacctaacattc Seq. ID: 537 rccttgattcatTgacctaacattc

Seq. ID: 7 176 cttgattcatggAcctaacattcca Seq. ID: 154 cttgattcatggGcctaacattcca

Seq. ID: 7 185 tggacctaacatTccaggttcctat Seq. ID: 155 tggacctaacatCccaggttcctat

Seq. ID: 7 227 agcatcrgacytTrcttcyaycacc Seq. ID: 156 agcatcrgacytCrcttcyatcacc

Seq. ID: 7 613 ctctgatgccctCttgcaacaccta Seq. ID: 161 ctctgatgccctAttgcaacaccta

Seq. ID: 7 615 ctgatgccctctTgcaacacctacc Seq. ID: 162 ctgatgccctctAgcaacacctacc

Seq. ID: 7 627 tgcaacacctacCrtcttacttggg Seq. ID: 163 tgcaacacctacGrtcttacttggg

Seq. ID: 7 635 ctaccrtcttacTtgggtttctctt Seq. ID: 164 ctaccrtcttacAtgggtttctctt

Seq. ID: 7 638 ccrtcttacttgGgtttctcttacc Seq. ID: 165 ccrtcttacttgCgtttctcttacc

Seq. ID: 7 657 cttaccttggrcGtggggtatctct Seq. ID: 166 cttaccttggrcCtggggtatctct

Seq. ID: 7 658 ttaccttggrcgTggggtatctctt Seq. ID: 167 ttaccttggrcgTNggggtatctctt Seq. ID: 538 ttaccttggrcvTNggggtatctctt

Seq. ID: 7 678 ctcttcacggctGctccagcaaagc Seq. ID: 168 ctcttcacggctCctccagcaaagc Seq. ID: 539 ctcttcacrgctCctccagcaaagc

Seq. ID: 7 759 accttsaacgtgGrrtagctcctct Seq. ID: 170 accttsaacgtgTrrtagctcctct Seq. ID: 540 accttsraygtgTrrtagctcctct

Seq. ID: 8 147 ggtgcctgactgStgcctcagctca Seq. ID: 172 ggtgcctgactg-tgcctcagctca Seq. ID: 541 ggyrcctkactg-ngcytcagctca

Seq. ID: 8 148 gtgcctgactgsTgcctcagctcag Seq. ID: 173 gtgcctgactgs-gcctcagctcag Seq. ID: 542 gyrcctkactg--gcytcagctcag

Seq. ID: 8 178 gccttcccttcgGacctaggagcca Seq. ID: 174 gccttcccttcgAacctaggagcca Seq. ID: 543 gccttyccntysAwmctaggacnna

Seq. ID: 8 180 cttcccttcggaCctaggagccagg Seq. ID: 175 cttcccttcggaActaggagccagg Seq. ID: 544 cttyccntysnwActaggacnnagg

Seq. ID: 8 188 cggacctaggagCcaggcgtccttg Seq. ID: 176 cggacctaggagAcaggcgtccttg Seq. ID: 545 ysnwmctaggacAnaggmswnyttg

Seq. ID: 8 189 ggacctaggagcCaggcgtccttgc Seq. ID: 177 ggacctaggagcAaggcgtccttgc Seq. ID: 546 snwmctaggacnAaggmswnyttgs

Seq. ID: 8 194 taggagccaggcGtccttgcagagg Seq. ID: 178 taggagccaggcCtccttgcagagg Seq. ID: 547 taggacnnaggmCwnyttgsagarg

Seq. ID: 8 201 caggcgtccttgCagagggcttagg Seq. ID: 179 caggcgtccttgGagagggcttagg Seq. ID: 548 naggmswnyttgGagarggsywagg

Seq. ID: 8 205 cgtccttgcagaGggcttaggggct Seq. ID: 180 cgtccttgcagaAggcttaggggct Seq. ID: 549 mswnyttgsagaAggsywagggkct

Seq. ID: 8 208 ccttgcagagggCttaggggctggg Seq. ID: 181 ccttgcagagggGttaggggctggg Seq. ID: 550 nyttgsagarggGywagggkctksd

Seq. ID: 8 215 gagggcttagggGctgggggtgtgc Seq. ID: 182 gagggcttagggTctgggggtgtgc Seq. ID: 551 garggsywagggTctksdkbtbtvc

Seq. ID: 8 229 tgggggtgtgcaCactgccaaagcc Seq. ID: 183 tgggggtgtgcaAactgccaaagcc Seq. ID: 552 tksdkbtbtvcaAacngccaargsc

Seq. ID: 8 232 gggtgtgcacacTgccaaagcctcc Seq. ID: 184 gggtgtgcacacCgccaaagcctcc Seq. ID: 553 dkbtbtvcanacCgccaargsctmc

Seq. ID: 8 238 gcacactgccaaAgcctcccgcggg Seq. ID: 185 gcacactgccaaGgcctcccgcggg Seq. ID: 554 vcanacngccaaGgsctmcngcaat

Seq. ID: 8 240 acactgccaaagCctcccgcgggga Seq. ID: 186 acactgccaaagGctcccgcgggga Seq. ID: 555 anacngccaargGctmcngcaatga

Seq. ID: 8 243 ctgccaaagcctCccgcggggagag Seq. ID: 187 ctgccaaagcctAccgcggggagag Seq. ID: 556 cngccaargsctAcngcaatgagav

Seq. ID: 8 249 aagcctcccgcgGggagagctcttt Seq. ID: 188 aagcctcccgcgAggagagctcttt Seq. ID: 557 argsctmcngcaAtgagavctntbt

Seq. ID: 8 250 agcctcccgcggGgagagctctttc Seq. ID: 189 agcctcccgcggTgagagctctttc Seq. ID: 558 rgsctmcngcaaTgagavctntbtc

Seq. ID: 8 255 cccgcggggagaGctctttccgagg Seq. ID: 190 cccgcggggagaCctctttccgagg Seq. ID: 559 mcngcaatgagaCctntbtcmgagg

Seq. ID: 8 258 gcggggagagctCtttccgagggaa Seq. ID: 191 gcggggagagctGtttccgagggaa Seq. ID: 560 gcaatgagavctGtbtcmgaggkaa

Seq. ID: 8 263 gagagctctttcCgagggaaaggaa Seq. ID: 192 gagagctctttcAgagggaaaggaa Seq. ID: 561 gagavctntbtcAgaggkaarggaa

Seq. ID: 8 281 aaaggaagcgccCcaggctgcattc Seq. ID: 193 aaaggaagcgccAcaggctgcattc Seq. ID: 562 aarggaagvgynAnaggmtgsnttc

Seq. ID: 8 286 aagcgccccaggCtgcattcccaag Seq. ID: 194 aagcgccccaggAtgcattcccaag Seq. ID: 563 aagvgynmnaggAtgsnttcccaag

Seq. ID: 8 289 cgccccaggctgCattcccaagctg Seq. ID: 195 cgccccaggctgGattcccaagctg Seq. ID: 564 vgynmnaggmtgGnttcccaagcwg

Seq. ID: 8 300 gcattcccaagcTggtgcttccccg Seq. ID: 196 gcattcccaagcAggtgcttccccg Seq. ID: 565 gsnttcccaagcAggtgcntssyca

Seq. ID: 9 34 tctccaaagaagAcatacagatggc Seq. ID: 202 tctccaaagaagTcatacagatggc

Seq. ID: 9 52 agatggcyaacaAacacatgaaaag Seq. ID: 207 agatggcyaacaTrcacatgaaaag Seq. ID: 566 avatgrcyaahaTryacatgaaaar

Seq. ID: 9 167 gtctacaaayaaTaaatgctgga-g Seq. ID: 197 rtcyacaaahaaGaaatgctgga-g Seq. ID: 567 swctrmaaabadGaaatgytggmwg

Seq. ID: 9 209 gaaccctcttacActgttggtggga Seq. ID: 198 gaaccctcttac-ctgttggtggga Seq. ID: 568 gaachctchtwc-ntgytggtggga

Seq. ID: 9 211 accctcttacacTgttggtgggaat Seq. ID: 199 accctcttacac-gttggtgggaat Seq. ID: 569 achctchtwcan-gytggtgggaat

Seq. ID: 9 244 gtacagccactaTggaraacagtrt Seq. ID: 200 gtacagccactaTNggaraacagtrt Seq. ID: 570 gtrcagccactaTNggaraacagtdt

Seq. ID: 9 319 ccactvctgggcAtayacmchgagg Seq. ID: 201 ccactvctgggc-tayacmchgagg Seq. ID: 571 ccactvctgggy-tanaymchgagr

Seq. ID: 9 397 ayaatagccargAcatggaagcaac Seq. ID: 203 ayaatagccargCcatggaagcaac Seq. ID: 572 acaatagccaarCcatggaarcaac

Seq. ID: 9 399 aatagccargacAtggaagcaacct Seq. ID: 204 aatagccargac-tggaagcaacct Seq. ID: 573 aatagccaarac-tggaarcaaccy

Seq. ID: 9 432 atcarcagatgaAtggataarvaag Seq. ID: 205 atcarcagatga-tggataarraag Seq. ID: 574 atcaacagrdga-tggataarnaar

Seq. ID: 9 455 aghtgtggtacaTatacacaatgga Seq. ID: 206 aghtgtggtaca-atayacaatgga Seq. ID: 575 arhtgtggtaya-atayacaatgga

Seq. ID: 9 582 ccaatacagtatAytaaygcatata Seq. ID: 208 ccaatacagtat-ytaaygcatata Seq. ID: 576 cmaatacartat-htaayryatata

Seq. ID: 9 588 cagtataytaayGcatatatatgga Seq. ID: 209 cagtataytaayGNcatatatatgga Seq. ID: 577 cartatahtaayRNyatatatatgga

Seq. ID: 9 611 gaatttagaaagAtggtaacrahaa Seq. ID: 210 gaatttagaaagTtggtaacrayaa

Seq. ID: 9 613 atttagaaagatGgtaacrahaacc Seq. ID: 211 atttagaaagatCgtaacrayaacc Seq. ID: 578 mtttaraaaratCrtamhaatramc

Seq. ID: 9 627 taacrahaacccTrtdtrcraraca Seq. ID: 212 taacrayaacccCrtatrcraraca

Seq. ID: 10 52 ascctgtgaatgMagtcaacacgaa Seq. ID: 213 ascctgtgaatgTagtcaacacgaa Seq. ID: 579 accctgtgaatgTagtcaacacgaa

Seq. ID: 10 72 acgaaggggcagTttt Seq. ID: 214 acgaaggggcagAttt

Seq. ID: 11 142 tgrattgatccyTtkaycattatgt Seq. ID: 226 tgrattgatccyAtkaycattatgt Seq. ID: 580 trnatyrmtyctAtkwtsawkayrt

Seq. ID: 11 172 ccttctttgtctCttttnayvkyyt Seq. ID: 227 ccttctttgtct-ttttbayvdyyt Seq. ID: 581 ccttcttngtct-ttttnacndttt

Seq. ID: 11 224 tkagtattgcwaCtccwgctttctt Seq. ID: 228 tdagtattgctaGtcctgctttctt Seq. ID: 582 tragtatwgctaGycctgctttctt

Seq. ID: 11 250 tsdtyycyatttGcatgraatatyt Seq. ID: 229 tsntyycyattt-catgraatatct Seq. ID: 583 tbrtytcyrttt-natgraatatht

Seq. ID: 11 251 sdtyycyatttgCatgraatatytt Seq. ID: 230 sntyycyatttg-atgraatatctt

Seq. ID: 11 286 ctcactttcagtCtrtrtgtgtcyy Seq. ID: 231 ytcactttcagtGtrtrtgtgtcyy Seq. ID: 584 ytywctttcartGtdtrtktktcyh

Seq. ID: 11 366 ccattcagccagTctktgtcttttg Seq. ID: 232 ccattcagccagCctttgtcttttg

Seq. ID: 11 452 tttactttattgTtttgggttygag Seq. ID: 233 tttwctttattgCtttgggttyrag

Seq. ID: 11 461 ttgttttgggttYgagtttatacac Seq. ID: 234 ttgttttgggttGragtttatacac

Seq. ID: 11 753 tygttvyttttcYcttgctgctttt Seq. ID: 235 ttgttgyttttcActtgctgctttt Seq. ID: 585 ttgttgcttttcActtgctgctttt

Seq. ID: 11 795 tttratytttgtTartttgattadt Seq. ID: 236 tttratytttgtAartttgattart Seq. ID: 586 tttartyttyrtAadtttgattant

Seq. ID: 11 874 tcttggacttgrGtgaytatttcct Seq. ID: 237 tcttggacttgg-tgaytatttcct Seq. ID: 587 tyttggacttgr-traytdttycht

Seq. ID: 11 895 tcctttcccatkTtagggaagtttt Seq. ID: 238 tccttycccatkGtagggaagtttt Seq. ID: 588 ychttyymcatdGtagggaartttt

Seq. ID: 11 931 tcytcaartattTtctcakgbyctt Seq. ID: 239 tcytcaartattGtctcakgbyctt Seq. ID: 589 tcttcaaatattGtctcansbnytt

Seq. ID: 11 933 ytcaartattttCtcakgbyctttc Seq. ID: 240 ytcaartatttt-tcakgbyctttc

Seq. ID: 11 1010 yattgtcccagaGgtctctgagrtt Seq. ID: 215 nattgtcccagaAgtctctgagrtt Seq. ID: 590 nrttgtcccagaAgtctctsagntt

Seq. ID: 11 1028 tgagrttgtcctCatttctttthat Seq. ID: 216 tgagrttgtcctTatttcttttwat Seq. ID: 591 tsagnttgtcctTatttyttttwan

Seq. ID: 11 1040 catttctttthaTtcktttttcttt Seq. ID: 217 catttcttttwa-tcdtttttcttt Seq. ID: 592 catttyttttwa-tnktttttcttt

Seq. ID: 11 1172 gaaagrvcmtgaGaaaatayttgaa Seq. ID: 218 gaaagrvcmtga-aaaatayttgar

Seq. ID: 11 1174 aagrvcmtgagaAaatayttgaaga Seq. ID: 219 aagrvcmtgagaCaatayttgarga Seq. ID: 593 aardvbhtgagaCaatatttgaaga

Seq. ID: 11 1210 aaaacttccctaAmatgggaaagga Seq. ID: 220 aaaacttccctaCmatgggraagga Seq. ID: 594 aaaayttccctaCmatgkgraagga

Seq. ID: 11 1220 taamatgggaaaGgaaatartcacy Seq. ID: 221 taamatgggraaTgaaatartcacc Seq. ID: 595 tammatgkgraaTgaaahagtyacc

Seq. ID: 11 1225 tgggaaaggaaaTartcacycaagt Seq. ID: 222 tgggraaggaaaAartcacccaagt

Seq. ID: 11 1231 aggaaatartcaCycaagtccaaga Seq. ID: 223 aggaaatartca-ccaagtccaaga Seq. ID: 596 aggaaahagtya-ccaagtccaara

Seq. ID: 11 1326 aagatyaaacacAaasavmaaawat Seq. ID: 224 aagatyaaacacTaagaamaaatat Seq. ID: 597 aarayyaaachcTaasnnmraatat

Seq. ID: 11 1346 aawattaaaaagCagcmagrgaraa Seq. ID: 225 aatattaaaaagTagcaagggaaaa Seq. ID: 598 aatattnaaaagTagcaagggaraa

Seq. ID: 12 72 tttacctgctgrRtatttctctgyc Seq. ID: 260 tttayctgctgrCtattyctctgyc

Seq. ID: 12 108 tttadattgctgTgtttggggtgkc Seq. ID: 241 tttarattgctgGgtttggggtgkc

Seq. ID: 12 109 ttadattgctgtGtttggggtgkcc Seq. ID: 242 ttarattgctgwTtttggggtgkcc

Seq. ID: 12 193 gggkttgkacrrGtggcttgtcaag Seq. ID: 243 gggkttgkacrgAtggcttgtcaag Seq. ID: 599 gggkttgkacarAtggcttgtcaag

Seq. ID: 12 346 cctgtathttraWgctcagggctrt Seq. ID: 244 cctgtathttdaWNgctcagggctrt Seq. ID: 600 cctgththttndRNgctcagggytrt

Seq. ID: 12 403 gtcttgcyctggAacttgttggcyc Seq. ID: 245 gtcttgcyctggGacttgttggcyc Seq. ID: 601 gtcttgchctggGrcttgttggcyy

Seq. ID: 12 410 yctggaacttgtTggcycttgggtg Seq. ID: 246 yctggaacttgtAggcycttgggtg

Seq. ID: 12 447 agtgtaggtatgGaggcdtttgatg Seq. ID: 247 agtgtaggtatgCaggcdtttgatg Seq. ID: 602 agtgtaggtatgCaggcdtttgrtg

Seq. ID: 12 448 gtgtaggtatggAggcdtttgatga Seq. ID: 248 gtgtaggtatggTggcdtttgatga Seq. ID: 603 gtgtaggtatggTggcdtttgrtga

Seq. ID: 12 450 gtaggtatggagGcdtttgatgagc Seq. ID: 249 gtaggtatggagTcdtttgatgagc

Seq. ID: 12 463 cdtttgatgagcTcytrtcrattaa Seq. ID: 250 cdtttgatgagcAcytgtcdattaa Seq. ID: 604 cdtttgrtgagcAcytgtcdattaa

Seq. ID: 12 479 rtcrattaatgtTccctggagtcag Seq. ID: 251 gtcdattaatgtCccctggagtcag

Seq. ID: 12 480 tcrattaatgttCcctggagtcagg Seq. ID: 252 tcdattaatgttTcctggagtcagg

Seq. ID: 12 509 cyctggwgtcagGdtttggacttaa Seq. ID: 253 cyctggwgtcagTdtttggayttaa Seq. ID: 605 cyctgrwgtcarTdtttggabttra

Seq. ID: 12 528 acttaagcctccTgcytcyggttwt Seq. ID: 254 ayttaagcctccAgcytcyrgttwt Seq. ID: 606 abttragcctccAgcytcyrgbtwt

Seq. ID: 12 543 ytcyggttwtcrGtcttattyttac Seq. ID: 255 ytcyrgttwtcrAtcttattyttac Seq. ID: 607 ytcyrgbtwtcrAtcttattyttac

Seq. ID: 12 547 ggttwtcrgtctTattyttacagta Seq. ID: 256 rgttwtcrgtctCattyttacagta Seq. ID: 608 rgbtwtcrgtctCattyttacagta

Seq. ID: 12 558 ttattyttacagTagyytcaaract Seq. ID: 257 ttattyttacagAagyytcaaract

Seq. ID: 12 568 agtagyytcaarActtctccwtcta Seq. ID: 258 agtagyytcaarGcttctccwtcta Seq. ID: 609 agtagyhtcaarGcttctccwtcna

Seq. ID: 12 593 tacagcaccrwtGataaaacatcta Seq. ID: 259 tacarcaccrwtAataaaacatcta Seq. ID: 610 tacagcacmrwwAataaaacatcta

Seq. ID: 13 352 gtggtrcacgccTtrttcagccaag Seq. ID: 261 gtggtgcacgcmCygytcagccaag Seq. ID: 611 gtggtgcangcmCygytcagccaag

Seq. ID: 13 498 tgtsrtaaaacaActcatgcaaatg Seq. ID: 262 tgtsgtaaaacaCctcatgcaaatg Seq. ID: 612 tgtbgtaaaacaCctcatgcaaatg

Seq. ID: 14 13 ggttcaggatggGgaacacatgtat Seq. ID: 264 ggttcaggatggGNgaacacatgtat

Seq. ID: 14 29 cacatgtataccTgtggcggattca Seq. ID: 265 cacatgtataccTNgtggcggattca

Seq. ID: 14 48 gattcatkttgaTrtwtggcaaaac Seq. ID: 266 gattcatkttgaTNrtwtggcaaaac

Seq. ID: 14 69 aaacyaatacaaTwwtgtaaagttw Seq. ID: 267 aaacyaatacaaTNattgtaaagttw

Seq. ID: 14 111 aaawwwaaaawaRadwhawwawaad Seq. ID: 263 aaawwwaaaawaRNadwhawwawaad

Seq. ID: 15 20 tttcccggtcccCtcttgataagaa Seq. ID: 268 tttcccggtcccGtcttgataagaa

Seq. ID: 16 37 ragtcatrgtagGggaatctggcct Seq. ID: 269 ragtcatrgtagCggaatctggcct

Seq. ID: 17 82 yakgagytgcttGtatattttkgar Seq. ID: 297 yakgagytgyttCtatattttkgag Seq. ID: 613 yatgagttsyttCtatattttkgab

Seq. ID: 17 94 gtatattttkgaRattantystttg Seq. ID: 298 gtatattttkgaCattartystttg Seq. ID: 614 vtatattttkgaCattarthctttr

Seq. ID: 17 104 garattantystTtgtcagttgytt Seq. ID: 270 gagattartystGtgtcagttgctt Seq. ID: 615 gabattarthctGtrtcagwtrydt

Seq. ID: 17 148 ccattctgwrggYtgtcttttcayy Seq. ID: 271 ccattctgargg-tgtcttttcacc Seq. ID: 616 ccattctgtggg---tctttnn--c

Seq. ID: 17 150 attctgwrggytGtcttttcayytt Seq. ID: 272 attctgarggyt-tcttttcacctt Seq. ID: 617 attctgtggg---tctttnn--ctk

Seq. ID: 17 161 tgtcttttcayyTtgyttatagttt Seq. ID: 273 tgtcttttcacc-tgyttatrgttt Seq. ID: 618 -tctttnn-c-ktcttatggttt

Seq. ID: 17 237 ttgcttttatttCcawtaytctrgg Seq. ID: 274 ttgyttttattt-cawtaytctggg Seq. ID: 619 ttgyttttattt-cawtaytytdgg

Seq. ID: 17 262 aggtgggtcataGaggatcytgctg Seq. ID: 275 aggtgggtcataGNaggatcctgctg Seq. ID: 620 agrtgggtcataGNaggatcytgctg

Seq. ID: 17 301 gagtgttytgccTatgttytcctct Seq. ID: 276 gagtgttttgcc-atgttytcctct Seq. ID: 621 gagtgttttgcc-atgttytcytct

Seq. ID: 17 301 gagtgttytgccTatgttytcctct Seq. ID: 277 gagtgttttgccAatgttytcctct Seq. ID: 622 gagtgttttgcc-atgttytcytct

Seq. ID: 17 316 gttytcctctagGagttttatagtt Seq. ID: 278 gttytcctctagGNagttttatagtt Seq. ID: 623 gttytcytctarGNagttttatagtt

Seq. ID: 17 340 ttctggtcttacRtttagrtcttta Seq. ID: 279 ttytggtcttacRNtttagrtcttta Seq. ID: 624 ttctggtcttacANtttagrtcttta

Seq. ID: 17 492 tgcctcctttgtCaaagatharktg Seq. ID: 280 tgcctcctttgtGaaagathagktg Seq. ID: 625 tgbctcctttgtGaaaratnagktg

Seq. ID: 17 495 ctcctttgtcaaAgatharktgwcy Seq. ID: 281 ctcctttgtcaaCgathagktghcc Seq. ID: 626 ctcctttgtsaaCratnagktgvcy

Seq. ID: 17 524 gtgygtggrtttAtytctgggcttt Seq. ID: 282 gtgygtggrtttCtytctgggcttt Seq. ID: 627 rtgyrtggrtttCtytstggrytyt

Seq. ID: 17 529 tggrtttatytcTgggctttctatt Seq. ID: 283 tggrtttatytcGgggctttctatt Seq. ID: 628 tggrtttatytsGggrytytctatt

Seq. ID: 17 536 atytctgggcttTctattytgttcc Seq. ID: 284 atytctgggctt-ctattytgttcc Seq. ID: 629 atytstggryty-ctattntgttcc

Seq. ID: 17 541 tgggctttctatTytgttccattga Seq. ID: 285 tgggctttctatGytgttccattga Seq. ID: 630 tggrytytctatGntgttccattga

Seq. ID: 17 563 tgatctatatktCtgtytttgtgcc Seq. ID: 286 tgatctatatttAtgtytttgtgcc Seq. ID: 631 tgatctatrtktAtgtytttgtrcc

Seq. ID: 17 573 ktctgtytttgtGccagtaccatac Seq. ID: 287 ttctgtytttgt-ccagtaccatac Seq. ID: 632 ktctgtytttgt-ccagtaccatac

Seq. ID: 17 617 ttgtagtakagyCtgaagtcaggda Seq. ID: 288 ttgtagtanagyGtgaagtcaggha Seq. ID: 633 ttgtagtakagyGtgaagtcaggda

Seq. ID: 17 623 takagyctgaagTcaggdaggytga Seq. ID: 289 tanagyctgaagCcagghaggttga Seq. ID: 634 takagystgaagCcaggdagvbtga

Seq. ID: 17 627 gyctgaagtcagGdaggytgattcc Seq. ID: 290 gyctgaagtcagChaggttgattcc Seq. ID: 635 gystgaagtcagCdagvbtgattcc

Seq. ID: 17 641 aggytgattcctCcagytcyrttyt Seq. ID: 291 aggttgattcctGcagytccattct Seq. ID: 636 agvbtgattcctGcagbtyydttyt

Seq. ID: 17 645 tgattcctccagYtcyrttyttctt Seq. ID: 292 tgattcctccagGtccattcttctt Seq. ID: 637 tgattcctccagGtyydttyttctt

Seq. ID: 17 651 ctccagytcyrtTyttctttctcaa Seq. ID: 293 ctccagytccatGcttctttctcaa

Seq. ID: 17 668 tttctcaagattGctttggctatty Seq. ID: 294 tttctcaagattCctttggctatty Seq. ID: 638 tttctcaagattCytttggctatty

Seq. ID: 17 706 tttccatayaaaTtktraaattdtt Seq. ID: 295 tttccatacaaaGtktraaattwtt Seq. ID: 639 tttccayacaaaGtdtrdrrttwtt

Seq. ID: 17 797 ttgggtagtataC-hawywtmrtga Seq. ID: 296 ttgggtagtataThcatyttmacra

Seq. ID: 18 14 tttggggkbccCctgsctggrgtc Seq. ID: 299 gtgnggggbymcTctgsctggggtc Seq. ID: 640 gtntggggbyccTctgvctggggtc

Seq. ID: 18 35 rgtccttctctgTtgctyrgydyrt Seq. ID: 305 ggtccttctctrCtrcttdgygyrt Seq. ID: 641 ggtccttctctgCtgcttdgydyrt

Seq. ID: 18 88 ggtccttytctgTtgctyrgtgyrt Seq. ID: 306 ggtccttctctgCtgctyrgtkyrt Seq. ID: 642 ggtccttctctgCtgcttdgydyrt

Seq. ID: 18 148 ctctgttgcttgGyryrtcaggcrc Seq. ID: 300 ctctgttgcttgCygyrtcagghrc Seq. ID: 643 ctctgttgcttgCygyrtcaggcrc

Seq. ID: 18 199 ctctgttgctyrGtkyrtcaggyrc Seq. ID: 301 ctctgttgctyrCtkyrycagghrc Seq. ID: 644 ctctgttgctyrCtkyrtcagghrc

Seq. ID: 18 285 ccmcmytscctgGgktcctthtctg Seq. ID: 302 ccmcmytscctgAgktcctthtctg Seq. ID: 645 ccmcmytssctgAgktccttmtctg

Seq. ID: 18 312 gcttggtgcatcAggcactaaagcc Seq. ID: 303 gcttggtgcatcTggcactaaagcc Seq. ID: 646 gcttggtgyrtcTgghrctwaagvs

Seq. ID: 18 315 tggtgcatcaggCactaaagcctgc Seq. ID: 304 tggtgcatcaggAactaaagcctgc Seq. ID: 647 tggtgyrtcaggArctwaagvskgc

Seq. ID: 19 23 rcaaagagtcrgAcacgactgagcg Seq. ID: 309 rcaaagagtcrgANcacgactgagcg

Seq. ID: 19 147 cctttcaaggcgTgaatgtttcaga Seq. ID: 307 cctttcaaggcgCgaatgtttcaga

Seq. ID: 19 200 tcttcagtacacTgtcgtcacttag Seq. ID: 308 tcttcagtacac-gtcgtcacttag

Seq. ID: 19 295 tcgctcagtcgtGtcygactctttg Seq. ID: 310 tcgctcagtcgtGNtcygactctttg

Seq. ID: 19 429 ctgtcgtcccctTctcctctgcccy Seq. ID: 311 ctgtcgtcccct-ctcctctgcccy

Seq. ID: 19 465 mscakcakcagrTcttttccaatgw Seq. ID: 312 mbcakcakcagr-cttttccaatga Seq. ID: 648 cscakcakcaga-cytttccaatga

Seq. ID: 19 471 akcagrtcttttCcaatgwrdyhas Seq. ID: 313 akcagrtcttttTcaatgardyhas Seq. ID: 649 akcagatcytttTcaatgagtyaac

Seq. ID: 19 510 chaaaavcwcatAtcacatttratt Seq. ID: 314 cbaaaagvtcayCksacatttmaky

Seq. ID: 19 511 haaaavcwcataTcacatttrattt Seq. ID: 315 baaaagvtcaymGsacatttmakyt Seq. ID: 650 gmcaaggt-ahcGgac-tttmagyt

Seq. ID: 19 839 cttgagcactatTatcaaaaatcac Seq. ID: 316 cttgagcactatGatcaaaaatcac

Seq. ID: 20 112 aagaaatcaaagAdgacacaaabar Seq. ID: 317 aagaaatcaaag-dgacacwaayar Seq. ID: 651 aagaaatyaaan-rgacabraayar

Seq. ID: 20 118 tcaaagadgacaCaaabaratggar Seq. ID: 318 tcaaagadgacaGwaayaratggar Seq. ID: 652 tyaaanrrgacaGraayaratggar

Seq. ID: 20 147 atwccatgttchTggattggaagaa Seq. ID: 319 atwccatgttcaGggattggaagaa Seq. ID: 653 athcyrtgttcaGggatwggaagaa

Seq. ID: 20 195 ctacccaaacaaTytayagattcaa Seq. ID: 320 ctacccaaacaaCytayagattcaa

Seq. ID: 20 197 acccaaacaatyTayagattcaayr Seq. ID: 321 acccaaacaatyCayagattcaayg Seq. ID: 654 acccaaacarthCayagattcaaya

Seq. ID: 21 124 atacacyaayaaYrrrmaaacagar Seq. ID: 322 atacacyaayaa-rrrmaaacagar

Seq. ID: 21 289 baratggararaTatwccatgttcm Seq. ID: 323 yaratggararaGatwccatgttcm Seq. ID: 655 haratggaaaraGathccrtgytca

Seq. ID: 21 401 ryattyttcacaGaactagaamaaa Seq. ID: 324 ryattyttcacaTaactagaacaaa Seq. ID: 656 rhwtttttcacaTaactagaamaaa

Seq. ID: 21 426 haatttyamaatTyrtatggaamya Seq. ID: 325 haatttyamaatCyrtatggaamya Seq. ID: 657 haatyyyamaatCyrtatggaahca

Seq. ID: 22 133 gtartattccatTgtgtatatgtac Seq. ID: 326 rtartattccatAgtgtatatgtac Seq. ID: 658 rtartattccatAgtrwrtatrtdc

Seq. ID: 22 137 tattccattgtgTatatgtaccaca Seq. ID: 327 tattccattgtgAatatgtaccaca Seq. ID: 659 tattccatdgtrArtatrtdccaca

Seq. ID: 22 222 traayagtgctgCdatraacatwbg Seq. ID: 328 traatagtgctgCNdatraacatwsg Seq. ID: 660 traatartgctgCNwatgaacatwsg

Seq. ID: 22 246 gkgtrcatgtgtCtytwtvrhakha Seq. ID: 329 gkgtrcatgtgt-tttwtvrhakha Seq. ID: 661 gkgtrcawgtrt-tttwtdkwabnv

Seq. ID: 22 304 tgggattgctggGtcawatggtakt Seq. ID: 330 tgggattgctggTtcawatggtakt Seq. ID: 662 tggrattgctggTtcatatgstadt

Seq. ID: 22 312 ctgggtcawatgGtakttctakttc Seq. ID: 331 ctgggtcawatgCtakttctakttc Seq. ID: 663 ctggntcatatgCtadttytrktty

Seq. ID: 23 25 tacagcaaagtgAhtcagttataca Seq. ID: 332 tacagcaaagtgGwtcagttataca

Seq. ID: 24 6 gtggaAgagggcctcatctccagtt Seq. ID: 334 gtggaTgagggcctcatctccagtt

Seq. ID: 24 51 cagggtacctctGattgaggc Seq. ID: 333 cagggtacctctCatbgyggcgt Seq. ID: 664 cagggtwcctctCatggtgnngt

Seq. ID: 25 352 t--gggggkagaTgctcgagaatwt Seq. ID: 335 t--gggggkakaGgctcgagaatwt

Seq. ID: 25 402 ccgcmtttgcgtAtgccgagcctcc Seq. ID: 336 ccgcmtttgcgtTtgccgagcctcc

Seq. ID: 26 42 tggacaggagggTcgactcccctgc Seq. ID: 337 tggacaggagggGcgactcccctgc

Seq. ID: 27 113 tgatgggaccrgAtgccatgatctt Seq. ID: 344 tgatgggaccrgTtgccatgatctt

Seq. ID: 27 155 bttknttwwtccAactctcacatcc Seq. ID: 364 ytt-sttwdtccGactctcacatcc Seq. ID: 665 bttgnttthtccGactctcacatcc

Seq. ID: 27 197 aaaccatagcytTgactagayggac Seq. ID: 371 aaaccatagcytAgactagayggac Seq. ID: 666 aaaccatagcytAgactagacrgac

Seq. ID: 27 201 catagcyttgacTagayggaccttt Seq. ID: 373 catagcyttgacGagayggaccttt Seq. ID: 667 catagcyttgacGagacrgaccttt

Seq. ID: 27 213 tagayggaccttTgttggcaaagta Seq. ID: 375 tagayggaccttAgttggcaaagta Seq. ID: 668 tagacrgaccttAgttggcaaagta

Seq. ID: 27 265 aggttggtcataActttycttccaa Seq. ID: 386 aggttggtcataANctttycttccaa

Seq. ID: 27 306 aatttcatggctGcagtcaccatct Seq. ID: 387 aatttcatggct-cagtcaccatct Seq. ID: 669 aatttcatggct-cartcaccatct

Seq. ID: 27 326 catctgcagtgaTtttggagcccmm Seq. ID: 388 catctgcagtgaTNtttggagcccmv Seq. ID: 670 catctgcagtgaTNtttggagcccaa

Seq. ID: 27 394 tsccatgaagtgAtgggaccrgatg Seq. ID: 389 tsccatgaagtgANtgggaccrgatg

Seq. ID: 27 426 ctthgttttytgAatgttgagyttt Seq. ID: 390 ctthgttttytgCatgttgagyttt Seq. ID: 671 cttagttttytkCatrttkagyttt

Seq. ID: 27 549 gttattgatattTctcccrgcaatc Seq. ID: 391 gttattgatatt-ctccyrgcaatc Seq. ID: 672 gttrttgatatt-ctccyrgcaatc

Seq. ID: 27 624 tgcatataagttAaataagcagggt Seq. ID: 392 tgcatataagtt-aataagcagggt Seq. ID: 673 tgcatataagtt-wataarcagggt

Seq. ID: 27 625 gcatataagttaAataagcagggtg Seq. ID: 393 gcatataagttaTataagcagggtg Seq. ID: 674 gcatataagttaTataarcagggtg

Seq. ID: 27 640 aagcagggtgacAatatacagcctt Seq. ID: 394 aagcagggtgacTatatacagcctt Seq. ID: 675 aarcagggtgacTatatacagcctt

Seq. ID: 27 670 actcctttyccwAtttggaaccagt Seq. ID: 395 actcctttycctANtttggaaccagt Seq. ID: 676 actccttttccwANtttkgaaccagt

Seq. ID: 27 709 ccagttctaactGttgcttcctgac Seq. ID: 396 ccagttctaactTttgcttcctgac Seq. ID: 677 ccagttctaactTttgcttcytgac

Seq. ID: 27 710 cagttctaactgTtgcttcctgacc Seq. ID: 397 cagttctaactgGtgcttcctgacc Seq. ID: 678 cagttctaactgGtgcttcytgacc

Seq. ID: 27 837 atagtcaataaaGcagaartagatg Seq. ID: 398 atagtcaataaa-cagaartagatg

Seq. ID: 27 837 atagtcaataaaGcagaartagatg Seq. ID: 399 atagtcaataaaCcagaartagatg Seq. ID: 679 atagtcaataaa-cagaartagatg

Seq. ID: 27 838 tagtcaataaagCagaartagatgt Seq. ID: 400 tagtcaataaag-agaartagatgt Seq. ID: 680 tagtcaataaas-agaartagatgt

Seq. ID: 27 922 gttcctctgcctTttctaaawccag Seq. ID: 401 gttcctctgcctGttctaaawccag Seq. ID: 681 gttcctctgcctGttctaaahccag

Seq. ID: 27 944 cagcttgaacatCtggaagttcacg Seq. ID: 402 cagcttgaacat-tggaagttcayg Seq. ID: 682 cagcttgaacat-tggaagttcayr

Seq. ID: 27 955 tctggaagttcaCggttcayrtayt Seq. ID: 403 tctggaagttca-ggttcacrtayt Seq. ID: 683 tctggaagttca-rgttcayrtayt

Seq. ID: 27 968 ggttcayrtaytGytgaagcctggc Seq. ID: 404 ggttcacrtayt-ytgaagcctggc Seq. ID: 684 rgttcayrtayt-ytgaagcctggc

Seq. ID: 27 990 ggcttggagaatTttgagcattact Seq. ID: 405 ggcttggagaatAttgagcattact

Seq. ID: 27 1006 agcattactttrCtagcrtgtgaga Seq. ID: 338 agcattacttta-tagcrtgtgaga Seq. ID: 685 agcattactttr-tagyrtgtgaga

Seq. ID: 27 1012 actttrctagcrTgtgagatgagtg Seq. ID: 339 actttactagcrGgtgagatgagtg Seq. ID: 686 actttrctagyrGgtgagatgagtg

Seq. ID: 27 1025 gtgagatgagtgCaattgtgyggta Seq. ID: 340 gtgagatgagtgCNaattgtgyggta Seq. ID: 687 gtgagatgagtgCNaattgtgyrgta

Seq. ID: 27 1089 aatgaaaactgaCcttttccagtcc Seq. ID: 341 aatgaaaactgaCNcttttccagtcc

Seq. ID: 27 1098 tgaccttttccaGtcctgtggccac Seq. ID: 342 tgaccttttccaGNtcctgtggccac

Seq. ID: 27 1101 ccttttccagtcCtgtggccactgc Seq. ID: 343 ccttttccagtcCNtgtggccactgc

Seq. ID: 27 1168 acagcatcatctTtyaggatttgaa Seq. ID: 345 acagcatcatct-tyaggatttgaa

Seq. ID: 27 1172 catcatctttyaGgatttgaaatag Seq. ID: 346 catcatctttyaGNgatttgaaatag

Seq. ID: 27 1264 tcacattccaggAtgtctggctcta Seq. ID: 347 tcacattccaggTtgtctggctcta

Seq. ID: 27 1306 tcrtgattatctGggtcatgaagat Seq. ID: 348 tcgtgattatct-ggtcatraagat Seq. ID: 688 tcrtgrttatct-ggtcrtgaagat

Seq. ID: 27 1307 crtgattatctgGgtcatgaagatc Seq. ID: 349 cgtgattatctg-gtcatraagatc Seq. ID: 689 crtgrttatctn-gtcrtgaagatc

Seq. ID: 27 1311 attatctgggtcAtgaagatctttt Seq. ID: 350 attatctgggtc-traagatctttt Seq. ID: 690 rttatctnggtc-tgaagatctttt

Seq. ID: 27 1312 ttatctgggtcaTgaagatcttttt Seq. ID: 351 ttatctgggtca-raagatcttttt Seq. ID: 691 ttatctnggtcr-gaagatcttttt

Seq. ID: 27 1322 catgaagatcttTtttgtacagttc Seq. ID: 352 catraagatcttTNtttgtacagttc Seq. ID: 692 crtgaagatcttTNtttgtayagttc

Seq. ID: 27 1340 acagttcttctgTgtattcttgcca Seq. ID: 353 acagttcttctg-gtattcttgcca Seq. ID: 693 ayagttcttctg-gtattcttgcca

Seq. ID: 27 1345 tcttctgtgtatTcttgccacctct Seq. ID: 354 tcttctgtgtat-cttgccacctct

Seq. ID: 27 1372 ttaatatcttctGcttctgttaggt Seq. ID: 355 ttaatatcttct-cttctgttaggt

Seq. ID: 27 1374 aatatcttctgcTtctgttaggtcc Seq. ID: 356 aatatcttctgcGtctgttaggtcc

Seq. ID: 27 1431 tgcatgaaatgtTcccttggtatct Seq. ID: 357 tgcatgaaatgtGcccttggtatct Seq. ID: 694 tgcatgnaatgtGccyttggtatct

Seq. ID: 27 1462 ttcttgaagagaTctctagtctttc Seq. ID: 358 ttcttgaagagaGctctagtctttc

Seq. ID: 27 1478 tagtctttcccaTtctrttgttttc Seq. ID: 359 tagtctttcccaGtctrttgttttc Seq. ID: 695 tagtctttcccaGtctdttgttttc

Seq. ID: 27 1482 ctttcccattctRttgttttcctct Seq. ID: 360 ctttcccattctRNttgttttcctct Seq. ID: 696 ctttcccattctDNttgttttcctct

Seq. ID: 27 1489 attctrttgtttTcctctatttctt Seq. ID: 361 attctrttgtttTNcctctatttctt Seq. ID: 697 attctdttgtttTNcctctatttctt

Seq. ID: 27 1498 ttttcctctattTctttgcattgat Seq. ID: 362 ttttcctctattTNctttgcattgat

Seq. ID: 27 1510 tctttgcattgaTcrctgargaagg Seq. ID: 363 tctttgcattgaGcrctgargaagg

Seq. ID: 27 1629 agctatttgtaaGgcctccycagac Seq. ID: 365 agctatttgtaa-gcctccycagac

Seq. ID: 27 1630 gctatttgtaagGcctccycagaca Seq. ID: 366 gctatttgtaag-cctccycagaca

Seq. ID: 27 1677 cttttycwtgggGatggtcttgatc Seq. ID: 367 cttttycwtgggGNatggtcttgatc Seq. ID: 698 cttttycwtgggGNatggtyttgrtc

Seq. ID: 27 1740 ttcwtcaggcacTctrtctatcaga Seq. ID: 368 ttcwtcaggcacCctrtctatcaga

Seq. ID: 27 1836 tggtctagtggtTttccctactttc Seq. ID: 369 tggtctagtggtCttccctactttc

Seq. ID: 27 1968 aagaatataatcAatctgatttygg Seq. ID: 370 aagaatataatcCatctgatttygg

Seq. ID: 27 1988 ttyggtrttgacCatctggtgatgt Seq. ID: 372 ttyggtgttgacAatctggtgatgt

Seq. ID: 27 2103 tgcttcattcygTaytccaaggcca Seq. ID: 374 tgcttcattctgTNaytccaaggcca Seq. ID: 699 tgcttcattytdTNaytccaaggcca

Seq. ID: 27 2176 cagtcccctataAtgaaaaggacat Seq. ID: 376 cagtcccctataGtgaaaaggacat

Seq. ID: 27 2179 tcccctataatgAaaaggacatctt Seq. ID: 377 tcccctataatgTaaaggacatctt Seq. ID: 700 tsccctatartgTaaagracatctt

Seq. ID: 27 2230 tgtaggtcttcaTagaaccrttcaa Seq. ID: 378 tgtaggtcttcaAagaaccrttcaa Seq. ID: 701 tgtdggtcttcaAagaaccrttcaa

Seq. ID: 27 2233 aggtcttcatagAaccrttcaactt Seq. ID: 379 aggtcttcatagCaccrttcaactt

Seq. ID: 27 2256 ttcagcttcttcAgcrttactggtt Seq. ID: 380 ttcagcttcttcTgcrttactggtt Seq. ID: 702 ttcagcttcttcTgcrttastggtt

Seq. ID: 27 2296 attactgtgataTtgaatggtttgc Seq. ID: 381 attactgtgataGtgaatggtttgc

Seq. ID: 27 2315 gtttgccttggaAaygaacagagat Seq. ID: 382 gtttgccttgga-aygaacagagat Seq. ID: 703 gtttgccttgga-ayraacagagat

Seq. ID: 27 2316 tttgccttggaaAygaacagagatc Seq. ID: 383 tttgccttggaa-ygaacagagatc Seq. ID: 704 tttgccttggaa-yraacagagatc

Seq. ID: 27 2318 tgccttggaaayGaacagagatcat Seq. ID: 384 tgccttggaaay-aacagagatcat

Seq. ID: 27 2418 tcttttgttgacYatgatggcyact Seq. ID: 385 tcttttgttgacAatgatggctact

Seq. ID: 28 35 gcttcagtagttGcngcwyrbgggc Seq. ID: 406 gcttcagtagttTyrgcacrygggc Seq. ID: 705 gcttcagtagttTyrgcrcayrggc

Seq. ID: 28 63 gbagttghggytCmhggsbctagwg Seq. ID: 407 gtagttgyggctTvtggsyytagak

Seq. ID: 29 20 tccrggagttggTgatggacaggga Seq. ID: 408 tccgggagttggTNgatggacaggga Seq. ID: 706 tcygggagttggTNgatggacaggga

Seq. ID: 29 42 ggaggcctggygTgctgcrrtycat Seq. ID: 409 ggaggcctggygTNgctgcrrttcat

Seq. ID: 30 490 gyrggccagtccAtggggtcrcama Seq. ID: 410 gyrggccagtcc-tggggtcrcaaa Seq. ID: 707 gyrggccngtcy-nngggtcrcaaa

Seq. ID: 31 35 wagbaaadthhaAtdhtataaagaa Seq. ID: 411 dagdaaadthmaANtdhtataaadaa Seq. ID: 708 navhraavtchrRNtgntataaadaa

Seq. ID: 31 74 ttcagttcagttCagtcgctcagtc Seq. ID: 412 ttcagttcagttCNagtcrctcagtc

Seq. ID: 32 54 dakdhadtbhabTttadhtddddtc Seq. ID: 413 daddbagtbhab-ttwdhtddddtc Seq. ID: 709 garkbdgtvhr--ttaavtvrvktc

Seq. ID: 33 25 atcaagggtgccAagtgccctttcg Seq. ID: 414 atcaagggtgccCagtgccctttcg

Seq. ID: 33 36 caagtgccctttCgacctccaattc Seq. ID: 415 caagtgccctttAgacctccaattc

Seq. ID: 34 97 cyagccagtaatCasttccytawgt Seq. ID: 436 ctagccagtaatAacttccytatgt

Seq. ID: 34 279 tctggaacacacAgaaattcacaga Seq. ID: 416 tctggaacacacANranattcacaga

Seq. ID: 34 316 rgagaggggttaGgraggagacasa Seq. ID: 417 agagargggtkwAggaggagacasa Seq. ID: 710 agagargrgkkaAggaggagayasa

Seq. ID: 34 370 aaagggagagagAgcartcaagcca Seq. ID: 418 aaagggrgagagTgcartcaagcca

Seq. ID: 34 468 mwaaaagcaaarAttaaaaatctag Seq. ID: 419 mwaaaagcaaarTttaaaaatctag Seq. ID: 711 maaaaagcaaarTttaaaaatctag

Seq. ID: 34 477 aarattaaaaatCtagagtagagkt Seq. ID: 420 aarattaaaaatTtagagtagagkt

Seq. ID: 34 529 aaaagaagaaggAaaagaaagagag Seq. ID: 421 aaaagaagaagrCaaagaaagagag Seq. ID: 712 aaaaraagaarrCaaagaaagarag

Seq. ID: 34 576 aamarggtmvyrAarattrkaangw Seq. ID: 422 aamaaggtcvyrGarattakaaagw Seq. ID: 713 aavaargkvryrGarawkataragw

Seq. ID: 34 592 ttrkaangwwmdTaaaggtacaaaa Seq. ID: 423 ttakaaagwamwTNaaaggtacaaaa Seq. ID: 714 wkataragwamaTNawaggtacaaaa

Seq. ID: 34 616 attgrtaacaaaTacmwaaaagcaa Seq. ID: 424 attgataacaaaCacmaaaaagcaa Seq. ID: 715 attgataacwaaCaccaaaaagcaa

Seq. ID: 34 617 ttgrtaacaaatAcmwaaaagcaaa Seq. ID: 425 ttgataacaaatANcmaaaaagcaaa Seq. ID: 716 ttgataacwaatANccaaaaagcaaa

Seq. ID: 34 620 rtaacaaatacmWaaaagcaaarat Seq. ID: 426 ataacaaatacmCaaaagcaaarat

Seq. ID: 34 622 aacaaatacmwaAaagcaaaratta Seq. ID: 427 aacaaatacmaaTaagcaaaratta

Table 1 (continued)

Seq. ID: 34 625 aaatacmwaaaaGcaaarattaaaa Seq. ID: 428 aaatacmaaaaaGNcaaarattaaaa Seq. ID: 717 waataccaaaaaGNcaaarrttaaaa

Seq. ID: 34 678 atacratgttaaAaaaraagaagra Seq. ID: 429 atacratgttaaGaaaraagaagra

Seq. ID: 34 681 cratgttaaaaaAraagaagraaaa Seq. ID: 430 cratgttaaaaaTraagaagraaaa

Seq. ID: 34 726 aaacaaamaanaAmaaacaargtmv Seq. ID: 431 aaacaaasaaraGcaaacaargt-c

Seq. ID: 34 727 aacaaamaanaaMaaacaargtmvv Seq. ID: 432 aacaaasaaraaTaaacaargt-cv Seq. ID: 718 aavaaanaaraaTaaavaaagtnca

Seq. ID: 34 733 maanaamaaacaArgtmvvhawaar Seq. ID: 433 saaraacaaacaCrgt-cvhawaaa

Seq. ID: 34 746 rgtmvvhawaarTtataaagaaaat Seq. ID: 434 rgt-cvhawaaaAtataaagaaaat

Seq. ID: 34 758 ttataaagaaaaTaayaggtacaaa Seq. ID: 435 ttataaagaaaaAaahaggtacaaa

Seq. ID: 35 40 cagagacattacTttgccaacaaaa Seq. ID: 437 cagagacattacAttgccaacaaar Seq. ID: 719 cagagacattacAttgccaacaaaa

Seq. ID: 36 12 ctgtraagaarGctgagyrcygaa Seq. ID: 438 ctgtraagaarGcNtgagyrccgaa Seq. ID: 720 ctgtraagaaaGcNtgagygcygaa

Seq. ID: 36 77 gagagtcccttgGactgcaaggaga Seq. ID: 439 gagagtcccttgGNactgcaaggaga

Seq. ID: 37 101 waawadmahthaHhtcaaactg Seq. ID: 440 daatadhahtwaHNctcaaactg Seq. ID: 721 taayarnachha-ctcawactg

Seq. ID: 38 69 ttcctgggtgtgAcagtgtttaacc Seq. ID: 456 ttcctgggtgtgGcagtgtttaacc

Seq. ID: 38 77 tgtgacagtgttTaacctacaaact Seq. ID: 458 tgtgacagtgttCaacctacaaact Seq. ID: 722 tgtgnnngtgttCaacctacaaact

Seq. ID: 38 81 acagtgtttaacCtacaaactcctt Seq. ID: 459 acagtgtttaacTtacaaactcctt Seq. ID: 723 nnngtgtttaacTtacaaactcctt

Seq. ID: 38 252 gtgaatggttttTcacttgttgggc Seq. ID: 441 gtgaatggttttTNcacttgttgggc Seq. ID: 724 gtgaakggttttTNcacttgttgggc

Seq. ID: 38 284 tgctgctaagttTccatatccctta Seq. ID: 442 tgctgctaagtt-ccatatccctta

Seq. ID: 38 290 taagtttccataTcccttacctgct Seq. ID: 443 taagtttccataTNcccttacctgct

Seq. ID: 38 293 gtttccatatccCttacctgctgtg Seq. ID: 444 gtttccatatccCNttacctgctgtg

Seq. ID: 38 296 tccatatcccttAcctgctgtgtcc Seq. ID: 445 tccatatcccttTcctgctgtgtcc

Seq. ID: 38 363 ttaatgtttgtaAcctgggaccctt Seq. ID: 446 ttaatgtttgtaTcctgggaccctt

Seq. ID: 38 369 tttgtaacctggGacccttgagtta Seq. ID: 447 tttgtaacctggGNacccttgagtta

Seq. ID: 38 397 ctttttcttgttAtagcccaccaca Seq. ID: 448 ctttttcttgtt-tagcccaccaca

Seq. ID: 38 404 ttgttatagcccAccacacctttgc Seq. ID: 449 ttgttatagcccGccacacctttgc

Seq. ID: 38 408 tatagcccaccaCacctttgctctg Seq. ID: 450 tatagcccaccaTacctttgctctg

Seq. ID: 38 455 gcttttttggagGgtggctcctgac Seq. ID: 451 gcttttttggagTgtggctcctgac

Seq. ID: 38 469 tggctcctgaccAaccacctttaga Seq. ID: 452 tggctcctgaccGaccacctttaga Seq. ID: 725 tggctcctgaccGaycacctttaga

Seq. ID: 38 484 cacctttagagaAaaataagttttc Seq. ID: 453 cacctttagaga-aaataagttttc

Seq. ID: 38 502 agttttctgaagAaaaggtcttaaa Seq. ID: 454 agttttctgaagGaaaggtcttaaa

Seq. ID: 38 514 aaaaggtcttaaAatgttaacaggc Seq. ID: 455 aaaaggtcttaaANatgttaacaggc

Seq. ID: 38 750 gctgaattaytcAgyctcttttctc Seq. ID: 457 gctgaattaytcGgyctcttttctc

Seq. ID: 39 259 cgtgatcagtgcAtgatagc Seq. ID: 460 cgtgatcagtgcTtgatagc

Seq. ID: 40 59 atgatagccacgTgatcagtgcatg Seq. ID: 461 atgatagycacgAgatcagtgcatg
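
The rows of Table 1 follow a regular pattern that can be read programmatically. The following is a minimal Python sketch, not part of the original disclosure, and it rests on an assumed reading of the notation: the single capitalized letter in the reference oligonucleotide marks the polymorphic position, a "-" at that position in a variant marks a deletion, and a capitalized "N" directly after it marks an insertion. The names `parse_row` and `classify` are hypothetical.

```python
import re

# Assumed row layout (inferred from the rows above, not stated in the source):
# "Seq. ID: <ref id> <position> <reference oligo>" followed by one or more
# "Seq. ID: <variant id> <variant oligo>" groups.
ROW_RE = re.compile(r"Seq\. ID: (\d+) (\d+) (\S+)((?: Seq\. ID: \d+ \S+)+)")
VARIANT_RE = re.compile(r"Seq\. ID: (\d+) (\S+)")


def classify(reference: str, variant: str) -> str:
    """Classify a variant oligo against its reference oligo.

    Assumption: the one uppercase letter in the reference marks the
    polymorphic position; '-' there means deletion; an extra 'N'
    immediately after it means insertion.
    """
    pos = next(i for i, ch in enumerate(reference) if ch.isupper())
    if variant[pos] == "-":
        return f"deletion of {reference[pos]}"
    if len(variant) == len(reference) + 1 and variant[pos + 1] == "N":
        return f"insertion after {reference[pos]}"
    if variant[pos] != reference[pos]:
        return f"substitution {reference[pos]}>{variant[pos]}"
    return "no change at marked position"


def parse_row(row: str):
    """Return (reference id, position, reference oligo, [(variant id, kind)])."""
    m = ROW_RE.fullmatch(row.strip())
    if m is None:
        raise ValueError(f"unrecognized row: {row!r}")
    ref_id, position, reference, rest = m.groups()
    variants = [(int(vid), classify(reference, seq))
                for vid, seq in VARIANT_RE.findall(rest)]
    return int(ref_id), int(position), reference, variants
```

Rows whose oligonucleotides still contain extraction damage (stray spaces or symbols) would need to be cleaned before being passed to `parse_row`.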

Table 2

A B C D H I K

Seq Homology to Homo#

Repetitive ID Length Repetitive Genomic contig with highest score logy to Sequences uniqu

A C 9K Max INIDr Total Element from DB 6

IT Dependent Oligos

No. cow IT

Element genome positi ons

1 BTSAT4 2473 full length 8840 47 8 15 Seq. ID Nos. 42-88 and Seq. ID Nos. 463-490

2 L1-BT 259 full length 492 7 8 13 Seq. ID Nos. 89-95 and Seq. ID Nos. 491-495

3 L1-BT 530 91.5% 1075 17 7 15 Seq. ID Nos. 96-112 and Seq. ID Nos. 496-502

4 L1-BT 614 full length 420 19 6 15 Seq. ID Nos. 113-131 and Seq. ID Nos. 503-521

5 BOV2 103 75.0% 11585 1 6 6 Seq. ID: 132

6 BOVA2 2285 11.8% ref|NW_001495372.1|Bt7_WGA975_3 88.0% 1360 19 7 15 Seq. ID Nos. 133-151 and Seq. ID Nos. 522-531

7 BOVB 912 92.6% 1081 20 8 15 Seq. ID Nos. 152-171 and Seq. ID Nos. 532-540

8 BTSAT6 633 full length 226 25 6 8 Seq. ID Nos. 172-196 and Seq. ID Nos. 541-565

9 L1-BT 655 full length 3166 16 8 15 Seq. ID Nos. 197-212 and Seq. ID Nos. 566-578

10 BTSAT4 75 full length 4123 2 5 8 Seq. ID Nos. 213-214 and Seq. ID: 579

11 L1-BT 1475 78.6% 675 26 8 15 Seq. ID Nos. 215-240 and Seq. ID Nos. 580-598

12 L1-BT 730 full length 1046 20 9 15 Seq. ID Nos. 241-260 and Seq. ID Nos. 599-610

13 None 572 ref|NW_001493362.1|Bt15_WGA1870_3 69.8% 640 2 5 7 Seq. ID Nos. 261-262 and Seq. ID Nos. 611-612

14 L1-BT 145 full length 4690 5 9 15 Seq. ID: 263 to Seq. ID: 267

15 BTSAT4 53 full length 5138 1 9 9 Seq. ID: 268

16 BTSAT4 55 90.9% 333 1 6 6 Seq. ID: 269

17 L1-BT 929 full length 925 29 8 15 Seq. ID Nos. 270-298 and Seq. ID Nos. 613-639

18 None 361 305 8 7 14 Seq. ID Nos. 299-306 and Seq. ID Nos. 640-647

19 ART2A 1300 21.5% 19158 10 10 14 Seq. ID Nos. 307-316 and Seq. ID Nos. 648-650

20 L1-BT 211 full length 607 5 6 13 Seq. ID Nos. 317-321 and Seq. ID Nos. 651-654

21 L1-BT 499 full length 450 4 6 13 Seq. ID Nos. 322-325 and Seq. ID Nos. 655-657

22 L1PA8 334 full length 549 6 6 15 Seq. ID Nos. 326-331 and Seq. ID Nos. 658-663

23 None 125 ref|NW_001508842.1|BtX_WGA3030_3 77.0% 379 1 7 7 Seq. ID: 332

24 BTSAT4 61 85.2% 3375 2 5 8 Seq. ID: 333 to Seq. ID: 334 and Seq. ID: 664

25 BTLTR1 723 74.1% 1593 2 7 9 Seq. ID: 335 to Seq. ID: 336

26 BTSAT4 57 full length 2746 1 5 5 Seq. ID: 337

27 BOVB 2434 92.6% 4675 68 8 15 Seq. ID Nos. 338-405 and Seq. ID Nos. 665-704

28 CHR-2 120 88.3% 133 2 6 9 Seq. ID: 406 to Seq. ID: 407 and Seq. ID: 705

29 ART2A 141 full length 13335 2 5 8 Seq. ID: 408 to Seq. ID: 409 and Seq. ID: 706

30 Bov-tA 570 26.3% ref|NW_001495573.1|Bt9_WGA1221_3 99.0% 5376 1 7 7 Seq. ID: 410 and Seq. ID: 707

31 BOV2 122 50.0% ref|NW_001494076.1|Bt22_WGA2384_3 94.0% 12323 2 7 11 Seq. ID: 411 to Seq. ID: 412 and Seq. ID: 708

32 ART2A 125 44.8% ref|NW_001503113.1|BtUn_WGA4440_3 84.0% 907 1 6 6 Seq. ID: 413 and Seq. ID: 709

33 BTSAT4/5 52 full length 4498 2 7 11 Seq. ID: 414 to Seq. ID: 415

34 L1-BT 795 full length 405 21 7 15 Seq. ID Nos. 416-436 and Seq. ID Nos. 710-718

35 BOVB 62 full length 666 1 5 5 Seq. ID: 437 and Seq. ID: 719

36 BOV2 235 full length 4881 2 6 10 Seq. ID: 438 to Seq. ID: 439 and Seq. ID: 720

37 BCS 110 73.6% 1039 1 5 5 Seq. ID: 440 and Seq. ID: 721

Table 2 (continued)

38 BTLTR1 905 full length 3077 19 11 15 Seq. ID Nos. 441-459 and Seq. ID Nos. 722-725

39 BTSAT3 266 63.9% 1102 1 5 5 Seq. ID: 460

40 BTSAT4 81 full length 1320 1 5 5 Seq. ID: 461

41 ART2B_BT 81 full length 8451 1 6 6 Seq. ID: 462


Corr5_FastaConsAll.txt

>Seq. ID: 1

TGTGTTGGGKGKGBTHBKGDGGWTGTGGRMCMACMCAMGAGGCMDCNNCTGRAWTSYCKTCGTRASWCSRGMMTCMYSCY GYMRCTCGAGAAMAACCMCGWGRYTCCCCCGTCATCGCRAGATGARGSCCTTKYCCSCTRCAKSGCCTMRGGASMAGTCY CRCGTTAGGWATTGGAGSTCGAAAGGGBACTTGRCACCCTTGATGCGACCCACAAAGTTCCCCGAVATCCCGGTCTCCCT CGAGAGGAACACYGAGGTTTTCCGGCACCMCYTCCTCTGAGCCCTTTCTCCCCTCCTGATCTGGACAGGAGGGTCGACTC CCCTGCTTTGTCTGGAAGGGGTTCCCGACCTTCCGGTCGCACCTCAGGATGAGGCCGGTCTCACGAAGACATTCCAGACG TGGCCTCGTGGGTGGTTCCACATTCCGTAGGACCCCGATTTCCCGGTCCCCTCTTGATAAGAACCCGATGCCCGGACACC TCTCCGAACTCCASCCTGTGAATGMAGTCAACACGAAGGGGCAGTNNYTTKYCCGTGCATCGTTCGGAAAAAACCCCAGG

ACGAGGCCTGACTCTCCTGTCCCCAGTCTGCAGGGACCCTGCGATCGGAGTCTGAAATCAGAGGWACCCTGMGGTTCCTG CCTCAACTGGAGATGAGGCCCTCTTCCAATGCACCAAVCCCAGTGGAGTCCCGAGAGGCCCCTCCCACCKCCAGTTTCCC TGRCTTCTCAGAGCCACCATGAGAAGCCCCCTGAGGTCACCTGCACAAGTCGAGGGAACCCMGGGTTTCCTGCCTCMACY

YYCCYTCGCMACTCGMATGGNGAYYSGACTTCCCTGGSSCMMCACRAGAGGCWBMCTGAMYTCSCCGTCGTAMCTCGNNA

CAKSGCCTMRVGASMARTCYCRCGWYMKCWCTCSRARCKCSWMAGGRBRCTTGRCWCCCTTGAKKCSACCCAGWGAGCTC CMAGAGATACCCGTCGCGAYTCGAGAGCAGAGCGGGGTTCTTTGCTTCCACTCGAGATGAATGCCTGTCTCCCCGGGTGC GTCTGGAATGCAACCCCGAGATCCCTGTCGCCCCTGGAGAGGAACATTGGCTTCTGGACACAAGCCTAGATGAGGTCTAT TGGCCCTGCAGTCACTCGAGAGCAATCCCCAGCTTTCCTTCGCAACTCGAATGGAAGATTGGACTTSCCTGGGCCAACAC AAGAGGCABCCTGAATTCCCCGTCGTAACTCGAGAATCCCGCCGYMRCTCGAGAAMAACCMCGWGRYTCCCCCGTCATCG

CCTTGATGCGACCCACAAAGTTCCCCGAAATCCCGGTCTCCCTCGAGAGGAACACYGAGGTTTTCCGGCACCCCCTCCTC TGAGCCCTTTCTCCCCTCCTGATCTGGACAGGAGGGTCGACTCCCCTGCTTTGTCTGGAAGGGGTTCCCGACCTTCCGGT

ATTTCCCGGTCCCCTCTTGATAAGAACCCGATGCCCGGACACCTCTCCGAACTCCASCCTGTGAATGMAGTCAACACGAA GGGGCAGTTTTTCCGTGCATCGTTCGGAAAAAACCCCAGGTTCCAAATACAGCTCGACAAGCGGCCTCTCTCCCCGGGGA CATCTCGAGAGGCAAGCGGAGTTCCATGCCTCAACCCAAGACGAGGCCTGACTCTCCTGTCCCCAGTCTGCAGGGACCCT GCGATCGGAGTCTGAAATCAGAGGWACCCTGMGGTTCCTGCCTCAACTGGAGATGAGGCCCTCTTCCAATGCACCAAVCC

YKKCRCMMSTSGAGRGRAVCMHKGGYTTCYKGMCWCAASCCKAGAHNGAGGTCWRWTDSCCCTDCMRTSACTCGAGAGCA ATSMCSMGCTYYCCYTCGCMACTCGMATGNAGAYYSGACTTSCCTGGSSCMMCACRAGAGGCWBMCTGAMYTCSCCGTCG TAMCTCGWGAGAAHCCGCACNCTNGSGCCGCMRCTCGAGAA-MAACCMCGWGRYTCCCCCGTCATCGMGGAGA

>Seq. ID: 2

TATGGAATTTAGAMGATGGTMYF^TAACCCTRTRTRCi^GACAGCMAAGAGAMACWGAYRTADAGAACAGWCTTVfTG

GACTBTGTGGGAGAGGGAGAGGGTGGGATGATTTGGGAGAATGGCATTGAAACATGTAWAATATCATRTATGAAAYGART

YGCCAGTCCAGGTTCGATGCAYGATACWGGATGCTTGGGGCTGGTGCACTGGGAYGACCCAGAGGGATGGTATGGGGAGG

GAGGAGGGAGGAGGGTTCA

>Seq. ID: 3

TTTTGT 1 1 1 I CATGTGTTTGTT-AGYTNKGTGBTWDHNADTHAAATTCAACAYCCATTTATGATAAAAACTCTCCAGAAA

AAAAYTGAAAGCATTTCCYCTAARRTCAGGAACAAGACAAGGRTGCCCACTYTCACCACTWCTATTCAACATAGTTTTGG ARGTTTTGGCCACAGCAATCAGAGCAGAAAAAGAAATAAAAGGAATCCAAATTGGAAAAGAAGAAGTAAAACTCTCACTR

TTAAGGAAACAAAGACGCTTACTCCTTGGAAGGAAAGTTATGACCAACCA

>Seq. ID: 4

GDGTTCTTTGARARGATAAAYAAAATTGACAAACCATTAGCCAGACTCATCAAGAAAMAAGRGAGAARAATCAAATCAAY

ATATGCCAATAAAATGGACAACBTRGAAGAAATGGACAAATTCTTAGAAAMGTACAACYTTCCAAAACTGAACCARGAAG

GGWCCAGAYGGCTTCACAGSTGAATTCTACCAAAMATTTARAGAAGAGCTAAYACCTATCCTDCTCAAACTCTTCCARAA AATTGCAGAGGAAGGWAMACTTCCAAACTCATTCTATGAGGCCACCATCACCCTRATACCAAAACCWGACAAAGAYNCCA CAAAAAAGAAAAYTACAGGCCAATATCACTGATGAACATAGATGCAAAAATCCI

AACAACACATYAAAAAGRATCATACACCATGA YCAAGTGGGMKTTTATYCCCA

>Seq. ID: 5

GGGAGGCCTGGC

AHTGTDTMTATDTAHHAMAACAA

>Seq. ID: 6

TGTGTTGGGTNTGTTNGGTTTTGGGTCAAAGAGA I I I I I GACTATCCAATAAGAGTGTCTGGCACCATCCTGACATGTGA

AACGTAAGAAAGcwAA1111111 ΓGTΠTGTTTGGTTCTGATCTTGGGTGTGTGTGTGTGTGTGTGTGTGTGCGTGYGTG

YGGGGGAGGGGAGGGGCGCTACTATCACTGATAGGTATGTTCACCAATCTGTCGAATGCCAATATAAAAGTCAGTTCTTG GTTGCTGMGGGAAMTGTATGACTTGGTATTAAAAGAAGGTATGTGGCATGAAATCGCATTTTCTGAAGGAAAACGACGT CGTTCTTTCTTCCAGGTGGCAGTTTCAGGATGGGAGAAGCAAAGGAATGGGTACAGAATGGGATCAGGAGGTTGTAGGGA AMGTGGACCCCCAGAAMGAGTGTTGTGCCTAGCACMGGTTTCTTGAGAGGTTGAACTGCCTTTGCTAATGGATTTTAA

GTTTCTTTACCTCTTCCGTGATCTCTTCTAGGTTGACCTAGAASAGAMTCTTTTGTTAGCTTTGGCTAAGTGGAGAAGT ATTGCTTCACGGTGACTTGTGTGATCCTACCTGACTGTTCTAAACCCTTTTGATATCTATGCACTGCTGCTGCTGCTGCT GCTGCTGCTGCTGCTGCKAAGTCGCTTCAGTCGTGTCCGASTCTGTGYGACCCMMTVKDYGSSACCCCATGGACTGYAGC CYACCAGGCTCCTCYGTCCATGGGATTYTCCAGGCAAGAATACTGGAGTGGGTTGCCATTBCCTYCTCCARKGSATGAAA GTGAAAWGTGAWAGTKMAGTCGCTCAGTCGTGTCCGACTCTTTGYGACCCCATGGACTGYAGCMYRCCAGGCYYCYCTGT CCATSRVATTΎTCCAGGCAAGAATACTGGAGTGGGTTGCCATTBCCTTCTCCMRKRKMTMTTCMCAASCCWGCCTAAATC ATTGACTTCTAGCCACCTTTGGGATGCCAATGMCCACCCASCCTTACCAGGCCVCAGAGGAGAGAAAACACTTACCACAG GAGGTCCGGGTAAGGAACAAGGAACTAACAAGCTCCCACCAACCAGGATTCTGCAGAGGTCAACAAGAGGTCAGCAAGAG ATGGGAGACTGCAGTCCAGAGGTCCTAGCAACTTCCAGCGAGCAGCAAAGACCCAGCATAGTCAAAACGAATTCAATGTT

CAGGATACTCTTAAAGTATACTCTTAACRTAGATGAGGAGAACAAGGTAGGAGGAAAACTTACTGTTTGGTTCAGGTCGC TTCKGAGCCTTTTGGGGGAACCGACCTGAAGCCAGAGTCTAGCCGAAATACACAGGCTTTGAACATCGCTGAAGACCAAC ATTTCCTAAGGAAAAAACGCATTGCTTAGCTCAAGGTTTGTCTGGGAAATTCTTACACATCAGGAGATGGCTCCGAGGTG GATTTCTACAAAGAACATCCATGCAAAGGACAAGCCTGCACAGCTCTGTGCACATCTGCCCCCHCCCCCCCTCACAGCCA GCRGCATTTCCAAGTGACCTCCTAAGGCCTTCGCCAAACTACAGGGGCACCTCTGGGCTTCCTGTTTCCCTCAI ITTTCG GAATCTTTTCTAAAAAAMRAAAAACAAAAARCAMTTAAAAAAAAAAAACCCGCACATTTCTTCGTACAATGCCACAGT

GGMGGAACACAGGCCTTCMTATTMGACCTAMTTTGCAAAAAACTTGGAGCTTGG I I I Il I GTTTTGTTTGGTTTTG

TTTTGTTTGAATCAAACCACCACATCATTTAAAA Π III ACACGTTCTAATTATTCGGTTCCAAAACACCAAACTTGTAC

TGCACACGGTCCCCATGGAAAGGCCCCCTCAGCAGGACAGCCACAGATGCTGCCATGAAAGGGCAGCAGGGTGCCCTCCT

CCCTTGTTTGCTGCCTGGGCGGCCACCGGGTTTTGAAACCCATGCACCCTACTMGTTTCTAGGGTGCMTTCGTGGACC

CTTCCCTTTCCCAYCCCASCAGGTGGTCAAGGGCCCGCTGGCCTCCAAAGTGCCAGAAGCAGGGCTGGGCCAGCCGCCCA

GGCAGCAA--ACAAGGGAGGAGGGCACCCTGCTGCCCTTTCATGGCAGCATCTGTGGCTGTCCTGCTGAGGGGGCCTTTC

CATGGGGACCGTGTGCAGTACAAGTTTGGTGTTTKGGAACCGACC

>Seq. ID: 7

ACYATGATGGCTACTCCATTTCTTCTAAGGGATTCYTGCCCACAGTAGTAGATATAATGGTCATCTGAGTTAMTTCACC

CATTCCAGTCCATTTTAGTTCRCTGATTCCTARAATGTYGAYRTTCACTCTTGCCATCTCCTGTTTGACCACTTCCAATT

TRCCTTGATTCATGGACCTAACATTCCAGGTTCCTATGCMTATTGCTCTTTACAGCATCRGACYTTRCTTCYAYCACCA

GTCACATCCACAACTGGGTRTTG Il I I I GBTTTGGCTCCATCCCTTCATTCTTTCTGGAGTTATTTCTCCACTGATCTCC

AGTAGCATATTGGGCACCTACYGACCTGGGGAGTTCMTCTTTCAGTRTCCTATCWTTTTGCCTTTTCATWCTGTCCATGG

GATTCTCCAGGCAAGAATACTGGAGTGGGTTGCCATTYCCTTCTCCAGKGGATCAHAKTCTGTCAGABMTCTCMACCATG

WCYCDYCCRTCTTGGGTKGCCCCACRGGCATGGCTTAGTTTCATTGAGTTAGACMGGCTGTGGTCCNNGTGTGATTAGA

TTGRCTAGTTTTCTGTGAKTATGGTTTCAGTGTGTCTGCCCTCTGATGCCCTCTTGCAACACCTACCRTCTTACTTGGGT TTCTCTTACCTTGGRCGTGGGGTATCTCTTCACGGCTGCTCCAGCAAAGCRCAGCCRYTGCTCCTTACCTTGGAYGAGGG GTATCTCCTCACCRCCRCCCYTCCTGACCTTSAACGTGGRRTAGCTCCTCTMGGCCCTCCTGCRCCYRYGCAGCCACBRC TCCTTGGACGTGGGGTWGCTCCTCHCVGCCRCCVCYCCTGRCCTYVRRCRTGGGGTAGCTCCTCHCRSCNYSCYGCCCCT GRCCTCVGRCKTVGGGGHSRTGGGGTWGSTCC

>Seq. ID: 8

CCCTAGCASAGCCGCAGCGCAGGGGAACCAGCCCATGATCAGTGAGCTCCGGGCANTACTTTGGTCCCAGGTTTGGAGCC

CWATGC-CCGGAACGCGGGAGAGARBRGGYTGGNTGNSTGTGGGDGCYGTGDGTGGTGCCTGACTGSTGCCTCAGCTCAG

CTCCCGCGGGGAGAGCTCTTTCCGAGGGAAAGGAAGCGCCCCAGGCTGCATTCCCAAGCTGGTGCTTCCCCGCCCAGACT

GTGCCCAGAAAGCGCCTGCCAGACCGGGCCTTCCCTGGAGCTGTCCTCCCGCCTMKMGGCMGGNGGMTCTGTGCTABCGC

AGCCAGCCTCTTCTCTCCCGCGTTCCGGGCATWGGGCTCCAAACCTGGGACCAAAGTNACTGCCCGGAGCTCA

>Seq. ID: 9

TGTGTTGKGTGDWTAGACATTTCTCCAAAGAAGACATACAGATGGCYAACAAACACATGAAAAGATGCTCAACATCACTM

CTATGGARAACAGTRTGGAGATTCCTTAAAAAACTRRAAATAGAACTRCCATATGAYCCAGCAATCCCACTVCTGGGCAT AYACMCHGAGGAAACCADAAKKGAAARAGACACRTGTACCCCAATGTTCATHGCAGCACTRTTTAYAATAGCCARGACAT GGAAGCAACCTARATGTCCATCANCAGATGAATGGATAARRAAGHTGTGGTACATATACACAATGGAATATTACTCAGCC ATWAAAAAGAATRMATTTGARTCAKTTCTAATGAGGTGGATGAAMCTRGAGCCTATTATACAGAGTGAAGTAAGYCAGAA AGAAAAACACCAATACAGTATAYTAAYRCATATATATGGAATTTAGAAAGATGGTAACRAHAACCCTRTDTRCNARACAV MAAAASACTCACG

>Seq. ID: 10

TCGAWAAGAACCCGATGCCCGGACACCTCTCCGAACTCCASCCTGTGAATGMAGTCAACACGAAGGGGCAGTTTT

>Seq. ID: 11

GTDTTAMGTCYCCYACTATTATTGTGTTAYTGTYAATTTCTCCTTTYAWRNYTGTTAGYATTTGTCTTATRTATTGWGG TGCTCCTATGTTGGGTGCATAKATATTTAYAATTGTTATATCTTCTTSTTGRATTGATCCYTTKAYCATTATGTARTGDC CTTCTTTGTCTCTTTTBAWDYYTTTRKTTTAAAGTCTRTTTTRTCTGATATKAGTATTGCTACTCCWGCTTTCTTKTSD TYYCYATTTGCATGRAATATY I I I I I CCATCCCYTCACTTTCAGTCTRTRTGTGTCYYYWGDTYTGAGGTGGGTYTCTTG

TAGACARCATATRTAKGGGTCTTG I I I I I GTATCCATTC^GCCAGTCTKTGTCTTTTGGTTGGRGCATTYMYCCATTTA

CATTTMGGTMTTATTGATAWGTATGWTCCYRTTGCCATTTACTTTATTGTT^

GTGTTTCCTGTCTAGAGAAKWTCCTTTAGYATTTGTTGDARAGCTGGTTTGGTGGTGCTGAATTCTCTΎAGCTTTTGCTT

GTCTGWMAGCTTTTI^TTTCTCCTTCAWWTHTGAATGAKAKYCTTGCTGGGTABARTAWTCTKGGYTGTAGRTTHTTYT

CTTRCATYAYTTTRAGTATGTCYTGCCATTCCCTYCTGGCYTGWAGAGTTTCTRTTGAAAGATCAGCTGTTAKCCTKATG

GGVWTYCCCTTGTRTGTTATTTGTTGYTTTTCYCTTGCTGCTTTT^

GATTARTATGTGTCTTGGBGTGTTTCKCCTTGGGTTTATCCTGTWTGGGACTCTYTGKGYTTCTTGGACTTGRGTGAYTA

TTTCCTTYCCCATKTTAGGGAAGTTTTCARCTATTATCTCYTCAARTATTTTCTCAKGBYCTTTCI I I I ISTCTTCTTCT TCTGGGACYCCTATGATTYGAATGTTGGDGCRTTTMYATTGTCCCAGAGGTCTCTGAGRTTGTCCTCATTTCTTTTHAT TCDI 11 I ICI I I I I ICCTCTCTGHTTCATTTATTTCYACCATTCTATCTTCYACHHCACHWATCCTATCTTCTGCCCCAM

ACTTCCCTAAMATGGGRAAGGAAATARTCACCCAAGTCCAAGAARCMCAGAGAGTCCCAMCAGGATAAACCCAAGGMRA

AACACVCCAAGACACATAYTAATCAAATTMCAAAGATYAAACACAAASAAMAAATATTNAAAAGCAGCAAGRGARAARC

AACMRTAACACACAAGGGRAWTCCCATMAGRHTAACAGCTGATCTTTCAAYAGAAACTCTTCARGCCAGRAGGGAATGG

CARGACATACTWAARTRATGAAAGAAAAWMCCWA

>Seq. ID: 12

TAGATTCCCTATCTCTTCCTCTTT--NNTNTTKGGKTTGGTGGGCATTTATCMTGTTCCTTTAYCTGCTGRRTATT^CTC

TGYCTYTTCATCTTGTTTANATΓGCTGWGTTTGGGGTGKCCTTTCTGTATKCTGGMAGTTTGTGGWKTYCTCTTTATTGT

GGMGKTTCCTCVCTGTGKGTGGGKTTGKACRGGTGGCTTGTCAAGGTTTYYTGGTTAGGGAAGCTTGTGTCRGTGTTCTG

GTGGGTGGAGCTGGATΎTCTTCTCTCTGGAGTGCMTGMGTGTCCAGTARTGAGTTWTGAGATGTCTRTGGKTTTGGDG

TGACTTTGGGCAGCCTGTATHTTRAWGCTCAGGGCTRTGTTCCTKTGTTGCTGGAGAATTWGCTTGGTATGTCTTGCYCT

GGMCTTGTTGGCYCTTGGGTGGWGCTTGGTTTCAGTGTAGGTATGGAGGCDTTTGATGAGCTCYTGTCRATTMTGTTC

CCTGGAGTCAGGAGTTCYCTGGWGTCAGGDTTΓGGACTTAAGCCTCCTGCYTCYRGTTWTCRGTCTTATTYTTACAGTAG

YYTCAARACTTCTCCWTCTATACARCACCRWTGATAAAACATCTASGTTAAWGRTGAAAAGWTTCTCCACAGTGAGGRHC

ACYCAGAGAGGTTCACAGAGTTAYATRGAGMGAGAAGAGGGAGGARGGAGWYAGAGGTGRCCAGSSAGNGGTCACTCCC

TTATGTGCAC

>Seq. ID: 13

GTKTGCTGGYGDWRGGGGGDKGTGBTGCMGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTTGTAAA

ACGACGGCCAGTGAATTGKAWTACGACTCACTATAGGGCGAATTGGGCCCGACGTCGCATGCTCCCGGCCGCCATGGCCG

CGGGATTKGCTGACAMGCAAGAGATTTTDTTGGGAAASGGCRCYSGGGYRGAGAGCAGKAGGGTAAGGGAACMCAGGAG

AACAGCTCTGCCRCRTGGCTCRCAGKCTYGGGTTTTATGGTGATGGGATTAGTTTCCGGGTTGTCTTTRGCCMTCATTC

TGAYTCAGAGTCCTTCCTGGTGGTRCACGCCTTGTTCAGCCMGATGGATGCCAGAGMGMGGATTCTGGGAGGTGGTCG

GACAYRTGGTGTCTCCTKTTGACCTYTSCHGAANNNTYCYGGTTGGTGGKRGCTTRTΓAGTTCCKTGTTCCTTACCMGGA CCTCCTGTSRTAAAACAACTCATGCAAATGGTTACTATGGTGCCTGGCCAGGGTGGGYGGTTTCARTCAGTGTGCTTCCC CWAACAHAATCA

>Seq. ID: 14

GGTTCAGGATGGC

WAAAAHAWAAAAWWWMAAAAWWWAAAAWARADWHAWWAWAADDWHMWAWAADWWGWWACTAHAG

>Seq. ID: 15

TTYCCGATTTCCCGGTCCCCTCTTGATAAGAACCCGATGCCCGGACMMMCYCC

>Seq. ID: 16

TGGGGGGAGCGCGTCWTTRCTCTCRAGTCATRGTAGGGGAATCTGGCCTCGAG

>Seq. ID: 17

GGAGAAATGTCTRTTTAGKTCTTTGGCCCAI I I I I I GATTGGGTΎRTTTR I I I I I CTGGWRTTGAGCTGYAKGAGYTGYT

TGTATATTTTKGAGATTARTYSTTTGTCAGTTGYTTCATTTGCWATTATTTTCTCCCATTCTGWRGGYTGTCTTTTCAYC TTGYTTATRGTTTCCTTTGYTGTGCAFl^AGCTTTTAAGTTTAATTAGGTCCCATTTGTTTA I I I 11 GYTTTTATTTCCAW TAYTCTRGGAGGTGGGTCATAGAGGATCYTGCTGTGATKTATGTCRGAGAGTGTTTTGCCTATGTTYTCCTCTAGGAGTT TTATAGTTTCTGGTCTTACRTTTAGRTCTTTAATCCATTTTGAGTTTA I I I I I GTGTATGGTGTTAGRAAGTGKTCTART

TTCATTCTTTTACAWGTRGYTGWCCAGTTTTCCCARCACCACTTRTTRAAGAGAYTGTCTTTWMTCCATTGTATATTCTT

GCCTCCTTTGTCAAAGATHAGKTGDCCATAKGTGYGTGGRTTTATYTCTGGGCTTTCTATTYTGTTCCATTGATCTATAT

KTCTGTYTTTGTGCCAGTACCATACTGTCTTGATDACTGTRGCTTTGTAGTAKAGYCTGAAGTCAGGHAGGTTGATTCCT

CCAGYTCYRTTΎTTCTTTCTCAAGATTGCTTTGGCTATTΎRRGGTY I I I I GTRTTTCCATACAAATTKTRAAATTWTTTG

TTCTAGYTCTGTGAAAAATRCCRTTGGTADYTTGATAGGGATTGCATTGAATCTRTARATTGCTTTGGGTAGTATASWCA

TYWTMRYRAYADTGATTMTTCTNMCCAAAARYMTGKWHTTTHCATCTRTDTCTYYTTTGCTKYCYTGCATVTAGGATCRT

TGWYYTATNANNTTNNNTGNANNAGNNTCTΓTNNTTTCNTTCCTTGTAA

>Seq. ID: 18

GGTKWGGGGBYCC

TTCTCTGTTGCTYRGTGYRTCAGGYRCTTAAADGVCYMCCCTGSCTGGGGTCCTTCTCTGTTGCTTGGYGYRTCAGGCRC

TTAAAKGGCCHCTGVCTGGGGTCCTTCTCTGTTGCTYRGTKYRYCAGGHRCTTAAAKGVCSMCCCTGSCTKGRGTYYTTC

WCTGTTGCTTGGTGYRTCWGGHRCTTAAATGGCCMCMYTSCCTGGGKTCCTTHTCTGTTGCTTGGTGCATCAGGCACTAA

AGCCTGCCCCCBTCCCCCCCCACMCCMAACACACCCACACA

>Seq. ID: 19

CCATGGGGTCRCAAAGAGTCRGACACGACTGAGCGACTGAACTGAACTGAACWKAHAATHDTTTCTCMDGRTGBTDHHAD

AHACTCHTVTDTGDATAGTCAAAMTCTCTTTTGTTTBTGTGTAAMGGGAGGTCCTTTCAAGGCGTGAATGTTTCAGAA

CTTKAATTTATTTGGAMTGACCCAGCTCTTCAGTACACTGTCGTCACTTAGTTTAGCACAGGATAGAAACTCGGGTAAC

CAAAACACCTGGAGAAAHDATTDTHWGTTCAGTTCAGTTCAGTCGCTCAGTCGTGTCYGACTCTTTGYGACCCCATGRAY

YGCAGCACRCCAGGCYTCCCTGTCCATCACCAACTCCHGGAGTTYRCTCAAACTCAYGTCCATYGAGTCRGTGATGCCAT

CCARCCATCTCATCCTCTGTCGTCCCCTTCTCCNCTGCCCYCAATCWWNCCYMSCAKCAKNNGRTCTTTTCCAATGWRDY

HASTCTTCAYADKKTRBCHAAMVSTCAYATSACATTTMAKYTTTMGTWTTGTTCCTGTTCTTTCGAGGTACTTTCTYGT

TGACAAGCTTGTGACAGATAGAACGATATAGCAAI I I I IACCTTAGAACAAACCGAGGCACTATGAACATTTTGTGCTTC

ATGTTGATGACTCTTAGACATGTCTACAGTAGAGGAGCAAAAACAAAACTACTAGATATTTCATATTGACTAGTTCCCAG

TTCACGGGACTCTGACATTCCCTGAGGTCAAAGTTTTCTTGTATTGGAAGCAGTTGGGTTTGCAAGGGCTGCCTTGTCTT

GAGACCATTGAAATAAGAACTCAGAACTTGAGCACTATTATCAAAAATCACAAGGCTCACACTGACACAGACATCAATCC

AGACAGACMGACMGACATCTTCCAGTTTTCCGCCTGAGATGGAMAGATTTCTTTGAGCCG I I I I I I CTGGGGAGTGG

GGGGTGGGGCTGGCGGCCAGGCAGGCTTTGGGAAGGAGCTCATGGTTTGGAATTGCATATGAAAAGAACCAGCTTTCCGG

GTTCCAAGGAATCMGTTTCCTTGGAAACCAACTTTGTCCGGTTCTRTAGAATATCATGGCCCTCCTAGGTCAGGTCGTC

TTTCCTTTCCGTAGCCCTCGTTKGATCCCTGMTTCTAGACAGCTTGGATTGCCTCTGTGGGCTGGATGGATTGTCCTCC

CGTTTCACCGGGCGGCGGGAGCGAGGTCCCAGAGGCTCTCCTGGAACCGGGCGKRSRRKCGGGGCTCACCGGGAGCCCGT

GGTGAAGGTGGGNNGACCC

>Seq. ID: 20

TTCCATTCACCATTGCAWCNAAAAGAATAAAATACYTAGGAATAWAYCTACCTAARGARACDAAAGACCTVTANAHAGM

AACTAYAAAACACTGVTGAMGAAATCAAAGADGACACWAAYARATGGARARATATWCCATGTTCATGGATTGGAAGMT

CAATATWGTGAAAATGASTATACTACCCMNCAATYTAYAGATTCAAYRCA

>Seq. ID: 21

TGTMTAYMTAGAAMCCCTAAAGACTCMACCMRAAAWYTMCTWRARCTRATMARYRAHTWYAGYAAAGTYKCAGGATAYA AAATCAAYRYACARAAATCMCWWGCATTCYTATACACYAAYAAYRRRMAAACAGARAGMSAAATYAWGRRWRMAMTYCCA TTCACMATTGCWWCRMRAGMTAAAATACYTAGGMTMHAWCTWMCWARRGADRYDAARGACCTVTWYAHRGARAACTA

TWGTGAAMTGASTATACTACCCMAGCMTYTAYAGATTCMTGCMTCCCTATCMRHTACCMYGRYATTYTTCACA

GAACTAGMMAAAHAATTTYAMMTTYRTATGGAAMYAMMAARASCYCRAATAGCCAARDCAATCYTRAGMAARMGM

YRRARCTGGAGGMATCACG

>Seq. ID: 22

TGTTGGGTGTGTTTGNGTGTKTTTGKTITI ITSTTTYTGDSTTABTTCACTBWGTATRATRGBYTCYAGKTTCATCCAYS

TYVYTRSAAMTGAYWYDAWTKYATTCTTTTTWATGGCTGAGTARTATTCCATTGTGTATATGTACCACAKYTTBYTWATC

CAKTCWWYYRYTGATGGACATYTRGGTTGBTTCCAWGTCYTKGCTATTRTRMTAGTGCTGCDATRAACATWSGKGTRCA

TGTRTCTTTVfTVRHAKHATGDTTTMKWWTCCTYWGGGTATATRCCCAGBARTGGGATTGCTGGGTCAWATGGtAKTTCTA

KTTCTAGWTCCYTG

>Seq. ID: 23

TWGTΓΓBWGSTGTACAGCAMGTGAWTCAGTTATACATAYACATATATCYANTCTTTTTYAGATTCTTTTCCCATATAGG

TYATTACAGARTATTGAGTAGAGTKTTCHKGTAKAYATATNSTCA

>Seq. ID: 24

GTGGAAGAGGGCCTCATCTCCAGTTGAGGCAGGAACCGCAGGGTACCTCTGATBGHGGCGT

>Seq. ID: 25

GTAGCTTGGTTTACGCGGAAGACCAATCAAACTTCAAGACAAGAAGTTTGCACCACTTACGTAGGCCGCAGGCGCCCTCT CGAATAGCGAAAGGTGCCTCACCCTAGACACCTTCTCGAGTGGGTCTTAGCAGCCCAGGCATAATTAGTAAGCGTGGTGG GTTCCGCGCTCCAGATGGAGACTCAGCTGGAAGTTAAAGGGAAGAATGACAAGGAACTTTATGAATTGGAGCTGTAAGTT AACTCTTTGACAGAGAGAGCGAGATGGTGGTGGGGGACAGCCCCCMGTRAARTCAGAGGTGAGAGCACAAAGCAADAMAG TAGGCAGACTCYGGTTTTNNNNGGGGGKAKATGCTCGAGAATWTCCRGGKGGACTCCTGAGGCTCGATCCCGCMTTTGCG TATGCCGAGCCTCCTTCCTCATGACCTTTGTCMWGRGYGGARTKCCTCMCYGGCTCCMGSCACRTGATCAGTGCATGMTC AGYCACGTGATCAGTGCMTGMTCAGCCACGTGATCAGTGHMTGVWCAGYCACGTGATCAGTGCMTVTCAGYCACGTGATC

GYCACGTGATCAGTGCMTGMWCAGYCACGTGATCAGTGCMTNTCAGYCNSKGATCASTGCMTGATARHCACGTGATCAGT

GCC

>Seq. ID: 26

ACCTCWGAGCCCCTTCTCCCCTCCTGATCTGGACAGGAGGGTCGACTCCCCTGCTTT

>Seq. ID: 27

TGTTTTCATGGCTGCAGTCACCATCTGCAGTGATTTTGGAGCCCNMVAAAAATAAAGTCTGWCACTGTTTCCACTGTTTC

CCCATCTATTTSCCATGAAGTGATGGGACCRGATGCCATGATCTTMGTTTTCTGAANTTGAGYTTKNTTWDTCCAACTCT

CACATCCATACATGACYACTGGAAAAACCATAGCYTTGACTAGAYGGACCTTTGTTGGCAAAGTAATGTCTCTGCTTTTK

AATATGCTRTCTAGGTTGGTCATAACTTTYCTTCCAAGGAGYAAGCRTCTTTTAATTTCATGGCTGCAGTCACCATCTGC

AGTGATTTTGGAGCCCMVVAAAATAMGTCTGHCACTGTTTCCACTGTTTCCCCATCTATTTSCCATGAAGTGATGGGAC

CRGATGCCATGATCTTHGTTTTYTGAATGTTGAGYTTTAAGCCARC I Π I I CACTCTCCTCTTTCACTTTCATCAAGAGG

CTYTTTAGTTCYTCTTCRCTTTCTGCCATAAGGGTGGTGTCATCTGCATATCTGAGGTTATTGATATTTCTCCYRGCAAT

CTTGATTCCAGCTTGTGCTTCWTCCAGCCCAGCRTTTCTCATGATGTACTCTGCATATAAGTTAAATAAGCAGGGTGACA

ATATACAGCCTTGACGTACTCCTTTYCCWATTTGGAACCAGTCTGTTGTTCCATGTCCAGTTCTAACTGTTGCTTCCTGA

CCTGCATACAGRTTTCTCMGAGGCAGGTCAGGTGGTCTGGTATTCCCATCTCTTTMAGAATTTTCCACAGTTTRTTGTG

ATCCACACAGTCAAAGGCTTTGGCATAGTCAATAAAGCAGAARTAGATGTTTTTCTGGAACTCTCTTGC I I I I I CBATGA TCCARYRGATGTTGGCMTTTGATCTCTGGTTCCTCTGCCTTTTCTAAAWCCAGCTTGAACATCTGGAAGTTCAYGGTTC ACRTAYTGYTGAAGCCTGGCTTGGAGAATTTTGAGCATTACTTTRCTAGCRTGTGAGATGAGTGCAATTGTGYGGTAGTT TGAGCATTCTTTGGCATTGCCTTTCTTTGGGATTGGMTGAAAACTGACCTTTTCCAGTCCTGTGGCCACTGCTGAGTTT TCCAMTTTGCTGGCATATTGAGTGCAGCACTTTCACAGCATCATCTTTYAGGATTTGAMTAGCTCAACTGGAATTCCA TCACCTCCACTAGCTTTGTTCRTAGTGATGCTTYCTMGGCCCACTTGACTTCACATTCCAGGATGTCTGGCTCTAGGTG AGTGATCACACCATCRTGATTATCTGGGTCATGMGATC I l I I I I GTACAGTTCTTCTGTGTATTCTTGCCACCTCTTCT TAATATCTTCTGCTTCTGTTAGGTCCATACCATTTCTGTCCTTTATYGWGCCCATCTTTGCATGAMTGTTCCCTTGGTA TCTCTAATTTTCTTGAAGAGATCTCTAGTCTITCCCATTCTRTTGTTTTCCTCTATTTCTTTGCATTGATCRCTGARGAA GGCTTTCTTATCTCTYCTTGCTATTCTTTGGMCTCTGCATTCAGATG B KTATATCTTTCCTTTTCTCCTTTGCTTTTYR CTTCTCTTCTTTTCACAGCTATTTGTAAGGCCTCCYCAGACARCCATTTTGC I I I I I I GCATTTC I I I I YCWTGGGGATG

GTCTTGATCMCTGTCTCCTGTACAATGTCACRAACCTCHDTCCATAGTTCWTCAGGCACTCTRTCTATCAGATCTARTCC CTTRMTCTATTTSTCACTTCCACTGTATMTCATMGGGATTTGATTTAGGTCATACCTGMTGGTCTAGTGGTTTTCC CTACTTTCTTCAATTTAAGTCTGMTTTGGCMTAAGGAGTTCATGATCTGAGCCACAGTCAGCTCCYGGTCTTG I I I I I GCTGACTGTATAGAGCTTCTCCATCTTTGGCTGCAAAGAATATAATCAATCTGATΠΎGGTRTTGACCATCTGGTGATGT CCATGTGTAGAGTCTTCTCTTGTGTTGTTGGAAGAGGGTGTTTGCTATGACCAGTGCRTTCTCTTGGCAAAACTCTRTTA GCCTTTGCCCTGCTTCATTCYGTAYTCCMGGCCAAATTTGCCTGTTACTCCAGGTRTTTCTTGACTTCCTACTTTTGCA

TTCCAGTCCCCTATAATGMMGGACATC I I I I I I GGGTGTTAGTTCTARMGGTCTTGTAGGTCTTCATAGMCCRTTC

AACTTCAGCTTCTTCAGCRTTACTGGTTGGGGCATAGACTTGGATTACTGTGATATTGMTGGTTTGCCTTGGAAAYGM

CAGAGATCATTCT-GTCR I I l I I GAGATTGCATCCAAGTACTGCATTTCDGACTCTTTTGTTGCATCCMGTACTGCATT

CVGACTCTTTTGTTGACYATGATGGCYACTCCHA

>Seq. ID: 28

TGCACAGGCTCTAGGCDCVYGGGCTTCAGTAGTTGCRGCACRYGGGCTCAGTAGTTGYGGCTCMTGGSYYTAGWKKSYVC

AGGCNTCARTAGTTGYRGCANVCAGRCCVTAGAGTGCGAG

>Seq. ID: 29

GGTGAACTCCRGGAGTTGGTGATGGACAGGGAGGCCTGGYGTGCTGCRRTYCATGGGGTCMAARRRKCGGACACGACTGA

GCGACTGAACTGAACTGAAHHTDMAHTHTCADTHHTTKDHKDAGGRAAGCCCCCTGGCTTC

>Seq. ID: 30

GTTTTRTTCCTGTGTGTTCTTGCCTCCANTGTCCACAGCTRTCAGAACTAGTGTGTTTTY Ni l GTGGGAGCTCKCAATG

WCCTTTTAYATATTCCANNNNCASAGTCTGCCTAGTTGATCRTGTGGATTTAATCTGCAGCTTGTACAGCTGGTGGGMG

GTTTTGGGTCTTCTTCCTTAGCCACACTGCCCCTGGGTTTC^ATTGTGGTTTTATTTCCACCTCTGCATGTGGGTCRTCC

ACTGGGGTTTGCTCCTGAGGCTGCCCTGGAGGACTTGGGTTTGCCCCTGTGAGGGCCAGGTGTGGAGGTGGTGCAGCTGC

TTGGGTYRCAGGGGTTCTGGCAGCACCAGGTACTCAGGGGAGTTGGYRGCTAGRGCAGCAGGAAATAYAGTGBTCTRGAA

GGDTATGGGARRAAGVAAWKGSCAACCCACTCCAGTATTCτTGCCTGGARAAτCCNW--NNτGGACRGAGGAGCCτGGYR

GGNCAGTCCATGGGGTCRCAMAGAGTCRGACAYGACTGNAGCGACTDAGCACACAYAVACACAAVACTCTTTTTGCMTGT

GGCARBTCTG

>Seq. ID: 31

ATCACADTGTGACTGGTGATDGDAGBAAADTHHAATDHTATAAADAAHAHDATTDHWTCAGTTCAGTTCAGTTCAGTCRC

TCAGTCGTGTCCGACTCTTTGCGACCCCATGAAYYGCAGCAC

>Seq. ID: 32

CACCAAHACABCAADWTCAGTBTDGAGAAAAADAAATDDTDDADDHADTBHABTTTADHTDDDDTCAGWTCAGWTCAGTT

CAGTTCAGTCRCTCAGTCGTGTCCGACTCTTTGCGACCCCATGAA

>Seq. ID: 33

TTTGTGGGTCGCATCAAGGGTGCCAAGTGCCCTTTCGACCTCCAATTCCTA

>Seq. ID: 34

TGTTAYATRGAGAAGAGAAGAGGGAGGAGGGAGWTAGAGGTGACCMRRAKGAGAWGAGGKGGAATCAAWAGDGGAGAGAG

HRRKCTAGCCAGTAATCASTTCCYTAWGTGYWCTCCACMRCTGGAMCACNCAGARAKNNTCACRGAGTTRBRYAGAGAAG

ATCASTTCCCTAAGTGTTCTCCACMGTCTGGAACACACAGARATTCACAGAGTTRGRTAGAGWAGAGARGGGTKAGGGAG

AGTAGAGKTTGGAATTΓCAAAMTACRATGTTAAAGAAAAGAAGAAGRAAAAGAAAGAGAGAAAAAAHDAACAAACAAAA

AMAAAMAARGTMVYRAARATTAI^NNGWWMWTAAAGGTACAAMTTGATAACAAATACMWAAAAGCAAARATTAAAAATC

TAGAGTAGAGKTTGGAATTTCAAAAATACRATGTTANAAAARAAGAAGRAAAAGAAAGAGARAAAAAAMRAACAAACAAA

NAARAAMAAACAARGTMCVHAWAAATTATAAAGAAAATNAYAGGTACNAAAATWGAYAACWAAHMCCAAAMACAT

>Seq. ID: 35

GACCAACCTAGAYAGCATATTMAAAAGCAGAGACATTACTTTGCCAACAAAAVYCCMATGGA

>Seq. ID: 36

CTGTRAAGAARGCTGAGYRCYRAAGAATTGATGCTTTTGAACTGTGGTGTTGGAGAAGACTCTTGAGAGTCCCTTGGACT

GCAAGGAGATCCAACCAGTCCATYCTAAAGGARATCAGYCCTGRGTDTTCWTTGGAAGGAMTGATGCTRAAGCTGAAACT

CCARTACTTTGGCCACCTSATGYGAAGAGYTGACTCATTGGAAAAGACYCTGATGCTGGGARRGATTGRGGGCAG

>Seq. ID: 37

AADAHADAWAAWADHAHTWAHHTCAAACTG

>Seq. ID: 38

CCCCACCACCTCTTTCGGAGWAGIMRVWARMBTWGAGCTTACAGTTCMGTTAATAATTCCTGGGTGTGACAGTGTTTAAC

CTACAAACTccTTTGGAARTccTCTAGCCTGCCTGAATAGGi I I I ΓCCGGCCACATGKGATTGTTCAGAGCCTCCCAACT

GTGAGAGGCAGGAGATGTTCTMACTGTCTAAACACAGATTCTTTTGAGKAGTTACAAGATTGATTAGAAATTGTATTGG

TGAATGG I I I I I CACTTGTTGGGCCATTGTTTGCTGCTAAGTTTCCATATCCCTTACCTGCTGTGTCCCTGGCAGTGTAT

TGATTAATATMTTGGTGTAAGTAGTAGCTTTAATGTTTGTAACCTGGGACCCTTGAGTTAATTCT I I I l CTTGTTATAG

CCCACCACACCTTTGCTCTGTAGGAATGCMCTTTATCTMTGC I I I I I TGGAGGGTGGCTCCTGACCMCCACCTTTAG

AGAAAMTMGTTTTCTGAAGAAAAGGTCTTMAATGTTMCAGGCCTCCGGGCCAGAAGATGATGCAMTCACCTAAGC

TTTTGCATATGATAAGTTTGCAGGAAGAAAGCCTGGTTTGCTGCAAGACTCGACCCCHHCCCCCNMMNATTATCCTCTAT

GCATMCTTMGGTATMAAACTACTTTGAAAAATAAAGTGCGGGCCTTGTTCACCGAAACTTGGTCTCACCATGTCGTT

CTTTCTCTTACCTTCTGGCTGMTTAYTCAGYCTCTTTTCTCCACYGAATTTYCYYACTGAGCTMTCCTCATWCTATTAY

TCTTKAYATCYYTRATTARCATWTA/

GATCAGYYGGGGCTGGTCCCCGGCA

>Seq. ID: 39

ATCAGTGCATGANAGBCACGTGATCAGTGCAT-NGMTCAGYCACGTGATCAGTGCMTGATAGYCACGTGATCAGTGCATG

MTCAGYCACGTGATCAGTGCMTGANAGYCACGTGATCAGTGCATGMTCAGYCACGTGATCAGTGCMTGMTCAGYCACGTG

ATCAGTGCMTGANAGYCACGTGATCAGTGCATGMTCAGYCACGTGATCAGTGCMTGANAGYCACGTGATCASTGCATGMT

CAGYCACGTGATCAGTGCMTGATAGC

>Seq. ID: 40

ATGATCAGYCACGTGATCAGTGCATGATCAGYCACGTGATCAGTGCATGANAGCCACGTGATCAGTGCATGATCAGYCAC

A

>Seq. ID: 41

GAGGAGAAGGGGACGACAGAGGATGAGATGGCTGGATGGCATCANSANCRATGGACRTGAGTYTGAGTRMCTCCGGGAG

C
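
The consensus sequences above use IUPAC ambiguity codes (R, Y, M, N and so on), and the position numbers in Table 1 appear to give the 1-based position of the capitalized base within the corresponding consensus sequence. For example, the Table 1 oligonucleotide listed for Seq. ID: 33 at position 25 aligns with the Seq. ID: 33 consensus so that its capitalized base falls on base 25. The following minimal Python sketch checks that reading; it is an illustration only, the function names `compatible` and `locate_snp` are hypothetical, and two codes are treated as matching whenever the sets of bases they denote overlap.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one denotes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(x: str, y: str) -> bool:
    """True if two IUPAC codes share at least one base.

    Characters outside the IUPAC table (gaps, extraction debris)
    never match anything.
    """
    return bool(set(IUPAC.get(x.upper(), "")) & set(IUPAC.get(y.upper(), "")))


def locate_snp(consensus: str, oligo: str) -> list:
    """Return the 1-based consensus positions of the oligo's marked base.

    Assumption: the single uppercase letter in the oligo marks the
    polymorphic position. Every alignment of the oligo that is
    IUPAC-compatible with the consensus contributes one position.
    """
    offset = next(i for i, ch in enumerate(oligo) if ch.isupper())
    hits = []
    for start in range(len(consensus) - len(oligo) + 1):
        window = consensus[start:start + len(oligo)]
        if all(compatible(a, b) for a, b in zip(window, oligo)):
            hits.append(start + offset + 1)
    return hits
```

Because many of the consensus sequences in this listing still carry extraction damage, a failure to locate an oligonucleotide with this sketch does not by itself indicate an inconsistency in the original filing.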

Claims

WHAT IS CLAIMED IS:

1. A method of detecting prion disease in a subject, the method comprising detecting the presence of a single nucleotide polymorphism (SNP) in nucleic acid present in an acellular fluid sample from the subject.

2. The method of claim 1, wherein the prion disease is bovine spongiform encephalopathy.

3. The method of claim 1, wherein the SNP is present in a non-coding region.

4. The method of claim 1, wherein the SNP is present in a reference sequence set forth in SEQ ID NOs 1-41.

5. The method of claim 4, wherein the SNP comprises one of the polymorphic positions set forth in Table 1.

6. The method of claim 1, wherein the acellular fluid sample is serum.

7. The method of claim 1, wherein the acellular fluid sample is plasma.

8. The method of claim 1, wherein the nucleic acid sample comprises DNA.

9. An oligonucleotide that specifically hybridizes under stringent conditions to a sequence comprising a SNP position, or to the complement thereof, as set forth in Table 1, wherein the oligonucleotide is competent to discriminate between the reference sequence set forth in Table 1 and a polymorphism present at a SNP site as set forth in Table 1.