WO2009121091A1 - Mapping method for polyploid subjects - Google Patents
- Publication number
- WO2009121091A1 PCT/AU2008/001397
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- mutation
- polymorphism
- polyploid
- subject
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Definitions
- the present invention is in the field of molecular biology and, more particularly, relates to methods for identifying nucleotide sequence variations, e.g., polymorphisms or mutations in nucleic acid from polyploid subjects.
- Genetic variations between organisms are routinely used as genetic markers in a variety of assays, for example, for gene mapping, positional cloning, identification of individuals, marker-assisted breeding, genotype/phenotype association, or for determining a subject likely to have a trait of interest. This is because such a genetic marker may be in close proximity to, and as a consequence in linkage disequilibrium with, an allele causative of a phenotype of interest, such that detection of the genetic marker is indicative of the presence of the allele, or the genetic marker may itself be causative of a phenotype of interest.
- a genetic marker is a variation in a nucleic acid that is polymorphic or has a plurality of different alleles that occur within a population.
- Preferred genetic markers are those that are readily detectable.
- SNPs Single nucleotide polymorphisms
- Picoult-Newberg et al Genome Res., 9: 167-174, 1999.
- SNPs are a particularly versatile class of genetic marker, because they occur in all regions of a genome, e.g., both coding and non-coding regions, and occur regularly throughout a genome.
- a SNP occurs approximately every 200 base pairs (bp) (Ravel et al., In: Vollman et al. (Eds) Genetic variation for plant breeding, Eucarpia: Tulln, Austria, pp. 177-181); in cotton (Gossypium hirsutum L.) a SNP is estimated to occur approximately every 43 bp (An et al., Mol Genet Genomics. 2007 Aug 28); and in soybean a SNP is estimated to occur every 273 bp (Zhu et al., Genetics, 163: 1123-1134, 2003).
- the high density and mutational stability of SNPs make them particularly useful genetic markers for population genetics and for mapping genes associated with complex traits.
- ddNTPs 2',3'-dideoxynucleotide triphosphates
- the nucleotide sequence of the target nucleic acid is determined. This sequence is then compared to a sequence from another subject to detect the occurrence of two different nucleotide residues, i.e., two different fluorescent labels, at the same position, i.e., having the same molecular weight. The occurrence of two different nucleotide residues in the same position is indicative of a SNP.
- Detecting SNPs in polyploid subjects is more difficult than in diploid subjects because the SNPs must be detected in a mixture of nucleotide sequences.
- variation between homoeologous sequences, which occur on different copies of the genome, and paralogous sequences (i.e., duplicate copies that may exist within a given genome) dramatically increases the level of complexity of sequence information, and traditional SNP identification methods are simply unable to produce an unambiguous interpretation of such complex sequences.
- HSVs homolog-sequence variants
- PSVs paralog-sequence variants
- the nucleotide residue at the site of a SNP varies within a population of individuals.
- SNPs, HSVs and PSVs all appear as sequence variations using traditional SNP identification methods. Accordingly, such methods are unable to distinguish between genetic variations that occur between genomes within a single polyploid subject, and truly polymorphic variations between polyploid subjects.
- Detection of SNPs in polyploid subjects is further complicated by the fact that many homoeologous sequences are of different lengths, e.g., as a result of insertions and/or deletions in one or more genomes of a polyploid subject. Such variations are more common in sequences occurring in non-coding regions of the genome.
- a result of such an insertion or deletion is that there is a frame-shift when sequences on different genomes are aligned.
- the method detects two or more nucleotide residues at the same position as a result of this frame-shift.
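- The frame-shift effect described above can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the two sequences are hypothetical homoeologous copies, and the function name `mismatches` is introduced here for illustration. It shows that a single-base deletion makes a position-by-position comparison report several apparent differences downstream of the deletion.

```python
def mismatches(seq_a, seq_b):
    """Return 0-based positions at which two sequences differ when compared
    position by position (no gaps), over their shared length."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Hypothetical homoeologous copies: genome_b lacks the "G" at index 4 of genome_a.
genome_a = "ACGTGACCTA"
genome_b = "ACGTACCTA"

# Without accounting for the deletion, downstream positions appear to differ.
print(mismatches(genome_a, genome_b))  # [4, 5, 7, 8]

# Once the deleted base is removed from genome_a, no differences remain.
print(mismatches(genome_a[:4] + genome_a[5:], genome_b))  # []
```

- Only one real difference (the deleted base) exists between the two copies, yet the ungapped comparison reports four positions, which is the kind of ambiguity the passage attributes to frame-shifted alignments.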
- sequencing has traditionally been performed by amplifying a target nucleic acid from one or more genomes of a polyploid subject, subcloning the amplified nucleic acid and isolating individual clones, i.e., each comprising a target nucleic acid from one genome of the polyploid subject, and separately sequencing target nucleic acid from the individual clones to thereby determine the sequence of each genome.
- Clearly the requirement for subcloning and multiple sequencing reactions is both time-consuming and expensive.
- ESTs expressed sequence tags
- Tang et al, BMC Bioinformatics, 7: 438-453, 2006 describes an algorithm for analyzing sequences from EST data to identify SNPs in polyploid subjects.
- EST collections are usually generated from a limited number of genotypes, i.e., strains, lines, varieties and/or cultivars.
- the inventors sought to produce a method that permitted them to readily identify polymorphisms in nucleic acid from polyploid subjects, without the requirement for subcloning amplified nucleic acid.
- the inventors also sought to produce a method that distinguishes between allelic polymorphisms and HSVs or PSVs.
- the inventors have produced a method that involves performing a sequencing reaction with only a single ddNTP to thereby determine the location of each occurrence of the corresponding nucleotide in a target nucleic acid from a subject, and aligning the sequence data with sequence data of another subject, and then comparing data sets to thereby determine a sequence variation between subjects.
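- The single-ddNTP comparison described above can be sketched in Python as follows. This is a minimal illustration, not the inventors' implementation. It assumes that each genome copy is supplied as the newly synthesized strand, that fragment length equals the 1-based position of the terminating base, and that presence or absence of a fragment is all that is scored. The function names and example sequences are hypothetical.

```python
def termination_positions(templates, base):
    """Fragment lengths (1-based) at which a reaction containing only the
    ddNTP for `base` can terminate, pooled over all genome copies."""
    positions = set()
    for seq in templates:
        positions.update(i + 1 for i, nt in enumerate(seq) if nt == base)
    return positions


def variable_positions(subject_1, subject_2, base):
    """Positions at which a termination fragment is present in exactly one
    of the two subjects."""
    first = termination_positions(subject_1, base)
    second = termination_positions(subject_2, base)
    return sorted(first ^ second)


# Hypothetical hexaploid subjects, one sequence per homoeologous genome copy.
subject_1 = ["ACGTAC", "ACGTAC", "ACGAAC"]
subject_2 = ["ACGTAC", "ACGTTC", "ACGAAC"]  # A -> T at position 5 of one copy

print(variable_positions(subject_1, subject_2, "T"))  # [5]
print(variable_positions(subject_1, subject_2, "A"))  # []
```

- In this toy example the T reaction reveals a new termination fragment at position 5 in the second subject, whereas the A reaction shows no presence/absence difference because the other genome copies still carry an A at that position.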
- the inventors then detected the nucleotide variation in DNA from a plurality of polyploid subjects, wherein some of the subjects comprise one form of the variation and some subjects comprise a different form of the variation.
- the inventors demonstrated that the variation that exists between subjects is polymorphic, and is not a HSV or a PSV.
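- The distinction drawn above between a polymorphism and an HSV or PSV can be expressed as a simple rule, sketched here in Python. This is an assumed simplification for illustration only: each subject is reduced to the set of bases observed at one aligned position across all of its genomes, and the function name and labels are hypothetical.

```python
def classify_site(bases_by_subject):
    """Classify one aligned position from the set of bases each subject
    shows at that position across all of its genomes."""
    observed = list(bases_by_subject.values())
    if any(bases != observed[0] for bases in observed):
        return "polymorphic between subjects"
    if len(observed[0]) > 1:
        return "HSV/PSV candidate"
    return "invariant"


# Every subject shows both C and T: consistent with a fixed difference
# between genomes (HSV) or between duplicated copies (PSV).
print(classify_site({"line1": {"C", "T"}, "line2": {"C", "T"}}))

# Subjects differ from one another: consistent with a true polymorphism.
print(classify_site({"line1": {"C"}, "line2": {"C", "T"}}))
```

- The rule only becomes informative when a plurality of subjects is examined, which matches the passage's emphasis on detecting the variation in several polyploid subjects.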
- Following identification of the nucleotide variation, the inventors detected the presence or absence of the nucleotide variation in a plurality of polyploid subjects that differ from one another in so far as at least a region of one genome comprising the target nucleic acid is deleted. By comparing results from subjects comprising a deletion in different genomes, the inventors not only demonstrated that the nucleotide variation is polymorphic but also determined on which genome the nucleotide variation resides.
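- The genome-assignment logic described above can be sketched in Python. The sketch assumes one deletion line per genome and a simple detected/not-detected score for the variant termination fragment; the function name and the example results are hypothetical.

```python
def genomes_carrying_variant(detected_in_deletion_line):
    """Given {deleted genome: variant fragment still detected?}, return the
    genome(s) whose deletion abolishes the variant fragment."""
    return sorted(
        genome
        for genome, detected in detected_in_deletion_line.items()
        if not detected
    )


# Hypothetical results for three deletion lines of a hexaploid (genomes A, B, D).
results = {"A": True, "B": False, "D": True}
print(genomes_carrying_variant(results))  # ['B']
```

- Here the variant fragment disappears only when the region on genome B is deleted, so the variation is assigned to genome B.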
- This method has utility in detecting polymorphisms (e.g., SNPs) and/or mutations (e.g., INDELs) in polyploid subjects, e.g., mutants produced using targeting induced local lesions in genomes (TILLING), thereby permitting rapid screening of mutants having a phenotype of interest to identify the genotype of that mutant.
- polymorphisms e.g., SNPs
- INDELs mutations
- TILLING targeting induced local lesions in genomes
- Because this method has the capability of analyzing large numbers of mutations and polymorphisms in nucleic acid fragments, it is particularly compatible with, albeit not restricted to, sequencing tools for analyzing longer stretches of nucleic acid, e.g., more than about 250 bp or 300 bp or 350 bp or 400 bp or 450 bp or 500 bp or 550 bp or 600 bp, or more.
- the present inventors have shown that their generic method may be performed in pooled DNA samples, e.g., bulked segregant analysis, thereby facilitating rapid identification of polymorphisms associated with a trait of interest in a polyploid organism.
- the present inventors have shown that it is possible to rapidly detect one or more variable nucleotide sequences within or flanking an allele of interest in a target sequence, by aligning sequence data of allelic variants and detecting a mobility shift between termination fragments produced in sequencing reactions.
- This mobility shift may result from a change in molecular weight of a nucleic acid fragment comprising ddNTP as a consequence of an insertion, deletion or other mutation in an allele.
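- The mobility shift described above can be illustrated with a small Python sketch. It assumes that fragment size is measured in whole nucleotides, that the inserted bases do not include the base being terminated, and that both alleles therefore yield the same number of fragments. The sequences and function names are hypothetical.

```python
def base_positions(seq, base):
    """1-based positions of `base`, i.e. termination fragment lengths."""
    return [i + 1 for i, nt in enumerate(seq) if nt == base]


def mobility_shifts(ref_seq, alt_seq, base):
    """Size difference of each termination fragment, paired in order."""
    ref = base_positions(ref_seq, base)
    alt = base_positions(alt_seq, base)
    return [a - r for r, a in zip(ref, alt)]


reference = "ACGTTAGCATGA"
insertion_allele = "ACGTT" + "CC" + "AGCATGA"  # 2-nt insertion after position 5

print(mobility_shifts(reference, insertion_allele, "A"))  # [0, 2, 2, 2]
```

- Fragments terminating upstream of the insertion keep their size, while every fragment terminating downstream is longer by the length of the insertion, which locates the insertion between the first and second A.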
- the inventors have determined sequences flanking or surrounding one or more nucleotide variations. This permits identification of genetic markers and enables production of reagents e.g., a probe or primer to detect the nucleotide variation.
- the present inventors have shown that the method of the present invention is sufficiently sensitive to permit the detection of polymorphisms, e.g., SNPs, in mixed samples, e.g., comprising pooled genomes from a plurality of subjects.
- the method is broadly applicable to detecting alleles present at low frequency in polyploid organisms, e.g., a single allele in a diploid, tetraploid, allotetraploid or hexaploid organism such as occur in the grasses.
- the present inventors have shown that the method of the present invention is suited for detecting polymorphisms in complex mixtures of nucleic acids, e.g., wherein target nucleic acids amplified from different genomes of polyploid subjects comprise microsatellites. As also exemplified herein, e.g., in Example 8 hereof, the present inventors have shown that the method of the present invention is capable of detecting polymorphisms or mutations de novo in genomes of polyploid subjects.
- the inventive method has broad applicability to breeding in polyploid subjects, e.g., where the identification of one or more polymorphisms in one or more homoeologous genes that are not conserved across all genomes of the polyploid is required for determining the breeding contribution of a particular parent.
- the inventive method can be employed for identifying polymorphisms and/or mutations in one or more genes associated with trait(s) such as, for example, yield and/or flour quality and/or flowering and/or disease-resistance and/or tolerance to abiotic stress such as drought, salinity, frost, heat, etc.
- polymorphisms and/or mutations associated with wheat flour quality are identified in a starch biosynthesis gene of tetraploid and/or hexaploid wheat, e.g., a Wx gene encoding granule-bound starch synthase (GBSS). More particularly, the method of the invention has permitted the identification of an INDEL (i.e., one or more insertions or deletions) in a Wx gene of tetraploid wheat.
- INDEL i.e., one or more insertions or deletions
- polymorphisms and/or mutations associated with wheat flour quality are identified in a storage protein-encoding gene of tetraploid and/or hexaploid wheat, e.g., a gene encoding a glutenin subunit, e.g., Glu-A1, Glu-B1, Glu-D1, Glu-A3, Glu-B3, Glu-D3, Glu-D4 or Glu-D5 or allele thereof, or alternatively, encoding a gliadin, e.g., an allele of a Gli-1 or Gli-2 gene.
- the invention may be employed to identify and select for the presence of a Glu-D1d allele in hexaploid wheats, which enhances bread-making quality of wheat by virtue of an extra cysteine residue in the encoded glutenin subunit, thereby improving bread dough strength.
- the invention is employed to select against the presence of a glutenin null allele, e.g., a Glu-B1a null allele or a Glu-A1c null allele.
- the invention is employed to select lines having one or more overexpressed glutenin-encoding genes, e.g., an overexpressed Glu-B1al allele.
- polymorphisms and/or mutations associated with flowering time and/or time to physiological maturity of tetraploid or hexaploid wheats, e.g., winter wheat or spring wheat, are identified in a vernalization-response gene, e.g., Vrn1, Vrn2, Vrn3 or allele thereof, and/or a photoperiod gene, e.g., Ppd1 or Ppd2 or allele thereof.
- breeding lines are characterized by phenotype in the absence of vernalization and/or following vernalization, and phenotypic outliers in the population are selected and analyzed for the presence of polymorphisms and/or mutations in a Ppd2 gene.
- Measurable phenotypes for identifying polymorphisms and/or mutations in a Ppd2 gene associated with flowering time and/or time to physiological maturity of tetraploid or hexaploid wheats are selected e.g., from the group consisting of growth rate, flowering time, time to head emergence, main stem leaf number at flowering, and combinations thereof.
- polymorphisms and/or mutations associated with improved tolerance of plants to abiotic stress conditions such as drought or frost are identified in a fructan biosynthesis gene of tetraploid and/or hexaploid wheat, including Chinese Spring wheat, e.g., a gene encoding fructan 1-exohydrolase (FEH) such as an allele in a 1-FEH gene, e.g., 1-FEH-6A, 1-FEH-6B, 1-FEH-6D or allele thereof.
- FEH fructan 1-exohydrolase
- the invention may be employed to identify and select for the presence of one or more 1-FEH-6D alleles, e.g., in Chinese Spring wheat.
- QTLs in the regions of 1-FEH genes have been mapped to chromosomes 6D and 7A in a population of double haploid lines derived from a cross between the wheat lines Berkut (a high fructan wheat) and Krichauff (a low fructan wheat), and validated in a population derived from a cross between the lines Sokoll (a high fructan wheat) and Krichauff, and the QTLs on chromosome 6D shown to coincide with the 1-FEH-6D gene.
- sequence data pertaining to the 1-FEHw2-6D allele NCBI Accession No.
- primers flanking intron sequences in the 1-FEH-6D gene were designed and used in the method of the invention to identify polymorphisms in 1-FEH-6D genes by comparing the sequences of the wheat lines Berkut, Krichauff, Sokoll, and by bulked segregant analysis of Berkut x Krichauff lines and Krichauff x Sokoll lines carrying Berkut or Krichauff or Sokoll alleles in a 1-FEH-6D gene, wherein Berkut and Sokoll alleles are associated with improved abiotic stress tolerance e.g., improved frost tolerance and/or improved drought tolerance.
- polymorphisms and/or mutations associated with improved disease resistance of plants e.g., improved resistance to a nematode such as Heterodera avenae or root lesion nematode Pratylenchus neglectus.
- alleles associated with resistance to P. neglectus are identified within or in the vicinity of a Rlnn1 gene on chromosome 7 of the A, B, or D genome of hexaploid wheat, and resistance alleles are present in a number of wheat varieties, including Excalibur and Krichauff.
- the invention is employed to identify and select for the presence of one or more Rlnn1 alleles.
- a QTL has been identified positioned near the SSR marker gwm344, approximately 13.6 cM from the Rlnn1 gene on chromosome 7A in a population derived from a cross between the wheat lines Kukri (susceptible to P. neglectus) and Excalibur (resistant to P. neglectus), and deletion mapping of SSR markers performed to determine a region linked to the QTL.
- primers flanking intron sequences, especially small introns, in wheat Rlnn1 genes are designed and used in the method of the invention to identify polymorphisms in wheat Rlnn1 genes by comparing sequences in resistant wheat lines (e.g., Krichauff, Excalibur) and susceptible wheat lines (e.g., Kukri, Tammin, Trident), and by bulked segregant analysis of Krichauff x Kukri and/or Krichauff x Tammin and/or Krichauff x Trident and/or Excalibur x Kukri and/or Excalibur x Tammin and/or Excalibur x Trident lines carrying Krichauff or Excalibur or Kukri or Tammin or Trident alleles in a Rlnn1 gene, wherein Krichauff or Excalibur alleles are associated with improved resistance.
- the present inventors have shown that the method of the present invention is suitable for detecting a polymorphism in a nodulin-like gene of hexaploid bread wheat.
- the present inventors have shown that the method of the present invention is suitable for detecting a polymorphism in an acetohydroxyacid synthase (AHAS) gene of hexaploid bread wheat, wherein the polymorphism confers resistance to the herbicide imidazolinone.
- AHAS acetohydroxyacid synthase
- the present inventors have shown that the method of the present invention is suitable for detecting polymorphisms or mutations linked to the Bo1 gene on chromosome 7B of hexaploid bread wheat, wherein the polymorphism or mutation confers tolerance to the heavy metal boron.
- the present invention provides a method for identifying a polymorphism or mutation in a genome of a polyploid subject, comprising comparing sequences within a target nucleic acid from genomes of two or more polyploid subjects, and determining linkage between one or more variable nucleotides in the sequences and one or more nucleotides within a specific genome of a polyploid subject, thereby identifying a polymorphism or mutation in a genome of a polyploid subject.
- the present invention provides a method for identifying a polymorphism or mutation in a genome of a first polyploid subject, said method comprising: (i) comparing a position or a plurality of positions at which a specific nucleotide residue occurs within a target nucleic acid from genomes of the first polyploid subject to the position(s) at which the specific nucleotide residue occurs in a corresponding target nucleic acid from genomes of a second polyploid subject to thereby identify one or more variable nucleotides between the genomes of the first and second polyploid subjects; and
- the present invention provides a method for identifying a polymorphism or mutation in a genome of a first polyploid subject, said method comprising: (i) determining a position or a plurality of positions at which a specific nucleotide residue occurs within a target nucleic acid from genomes of the first polyploid subject; (ii) comparing the position(s) at which the specific nucleotide residue occurs in (i) to the position(s) at which the specific nucleotide residue occurs in a corresponding target nucleic acid from genomes of a second polyploid subject to thereby identify one or more variable nucleotides between the genomes of the first and second polyploid subjects; and
- polymorphism shall be taken to mean a difference in the nucleotide sequence of a specific site or region of the genome of a subject that occurs in a population of individuals.
- exemplary polymorphisms include a simple sequence repeat or microsatellite marker, e.g. in which the length of the marker varies between individuals in a population or a simple nucleotide polymorphism.
- a simple nucleotide polymorphism is a small change (e.g., an insertion, a deletion, a transition or a transversion) that occurs in one or more genomes of a population of polyploid subjects.
- a simple nucleotide polymorphism comprises or consists of an insertion or deletion or transition or transversion of one, or two or three or five, or ten or twenty nucleotides in one or more genomes of a polyploid subject.
- the polymorphism is a single nucleotide polymorphism (SNP).
- mutation shall be taken to mean a permanent, transmissible change in nucleotide sequence of the genome of a subject and optionally, an expression product thereof that is causative of a trait.
- mutations include an insertion of one or more new nucleotides or deletion of one or more nucleotides or substitution of one or more existing nucleotides with different nucleotides.
- polyploid subject shall be taken to mean a subject having a genome comprising more than two sets of homologous chromosomes.
- a subject having three sets of homologous chromosomes is a triploid subject
- a subject having four sets of homologous chromosomes is a tetraploid subject
- a subject comprising six sets of homologous chromosomes is a hexaploid subject, etc.
- triploid subjects include banana plants, apple plants, ginger plants or watermelon plants.
- tetraploid subjects include tetraploid wheat, such as, for example, Triticum turgidum (e.g., var. durum, polonicum, persicum, turanicum or turgidum) or T. durum, maize, cotton, potato, cabbage, leek, tobacco, peanut, kinnow or Pelargonium.
- Exemplary hexaploid subjects include hexaploid wheat, e.g., T. aestivum or T. compactum, triticale or oat.
- Exemplary octaploid subjects include strawberry, dahlia or pansy.
- the polyploid subject is a tetraploid subject or a hexaploid subject.
- the first and second polyploid subjects are from the same species, variety, cultivar or line.
- the method of the invention is performed using subjects of the same species that are different varieties thereby facilitating detection of a polymorphism or mutation occurring in one variety but not another.
- an assay is useful for identifying a polymorphism or mutation associated with or causative of a phenotype that occurs in one variety but not another variety.
- the present invention also clearly contemplates the first and second subjects being of the same or different species or the same or different varieties or the same or different cultivars or the same or different lines.
- target nucleic acid shall be taken to mean a nucleic acid that occurs within the genomes of a polyploid subject comprising a mutation or polymorphism.
- a "target sequence" may also occur a plurality of times within a single genome or within each genome of a polyploid subject, e.g., a paralogous sequence, i.e., a sequence arising from replication within a single genome.
- a "target nucleic acid" on each genome or within each genome need not be identical, and the presence of a mutation or polymorphism in one genome will not be identical to that on another genome.
- target sequence may comprise a plurality of mutations or polymorphisms or one or more HSVs that differ between target sequences in different genomes or PSVs that differ between target sequences in the same genome.
- a "target nucleic acid" in all genomes also need not be the same length.
- the target sequence is at least about 10 nucleotides in length or 20 nucleotides in length or 30 nucleotides in length or 50 nucleotides in length or 100 nucleotides in length or 200 nucleotides in length or 300 nucleotides in length or 500 nucleotides in length or 1000 nucleotides in length or 2000 nucleotides in length.
- variable nucleotide shall be taken to mean a nucleotide residue within a target sequence that varies between two or more polyploid subjects.
- a variable nucleotide has multiple allelic forms wherein one or more of those forms occur in a subject.
- a method as described herein additionally comprises amplifying a target nucleic acid prior to determining each position of a nucleotide residue within a target sequence from genomes of a polyploid subject.
- the target nucleic acid is preferably amplified using any amplification reaction such as, for example, polymerase chain reaction (PCR).
- PCR polymerase chain reaction
- the amplified nucleic acid shall also be considered to be a "target nucleic acid" within the meaning of this term in the present specification and claims. Accordingly, each position of a nucleotide residue within the amplified nucleic acid can also be determined, thereby determining each position of a nucleotide residue in the target nucleic acid.
- a position or a plurality of positions at which the specific nucleotide residue occurs within a target nucleic acid is determined by performing a ddNTP-terminated sequencing method, e.g., Sanger sequencing, with a single ddNTP, e.g., ddATP or ddTTP or ddCTP or ddGTP.
- an electrotrace, or graphical representation depicting the molecular weight of each termination fragment produced using a ddNTP-terminated sequencing method is useful for determining the relative position of each occurrence of a nucleotide within a target nucleic acid.
- one or a plurality of positions at which a nucleotide residue occurs within a target sequence is determined by performing a method comprising:
- the primer capable of annealing to the tag sequence is labeled with a detectable marker to facilitate detection of the nucleic acid fragments.
- the present invention also contemplates other methods for sequencing a nucleic acid, such as, for example, making use of a labeled ddNTP rather than a labeled primer.
- a ddNTP-terminated sequencing reaction employs multiple cycles of (i) denaturation of double-stranded nucleic acid such as a nucleic acid "template" to be copied or a hybrid between a "template" and a complementary "primer"; (ii) annealing of a primer to its complementary sequence in the single-stranded "template"; and (iii) extension of the primer in the 5'- to 3'-direction by a polymerase activity, e.g., an activity of a thermostable polymerase such as Taq, to thereby produce a double-stranded nucleic acid comprising a newly-synthesized strand complementary to the single-stranded template.
- a ddNTP-terminated sequencing reaction in the context of the present invention is performed with a ddNTP, which when incorporated into a sequence by the polymerase prevents further extension of the newly-synthesized strand, i.e., terminates the reaction.
- the sequencing reaction is performed in the presence of all deoxynucleotide triphosphates (dNTPs) required to synthesize a copy of the template nucleic acid, i.e., dATP, dTTP, dGTP and dCTP, together with one ddNTP.
- dNTPs deoxynucleotide triphosphates
- the ddNTP is included at a lower concentration than the corresponding dNTP to permit the dNTP to be incorporated more often than the ddNTP, thereby permitting production of nucleic acid fragments of sufficient length to sequence a target sequence.
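- The effect of the ddNTP concentration on fragment length can be sketched with a simple probability model in Python. The model is an assumption made for illustration: at every occurrence of the relevant base the polymerase incorporates the ddNTP with a fixed probability `dd_fraction` (taken as the ddNTP share of the ddNTP plus dNTP pool, ignoring any difference in incorporation efficiency). The function names are hypothetical.

```python
def termination_probabilities(n_sites, dd_fraction):
    """Probability that extension terminates at the 1st, 2nd, ... occurrence
    of the base: ddNTP at that site, dNTP at every earlier site."""
    return [dd_fraction * (1 - dd_fraction) ** k for k in range(n_sites)]


def read_through(n_sites, dd_fraction):
    """Probability that extension passes all n_sites without terminating."""
    return (1 - dd_fraction) ** n_sites


print(termination_probabilities(3, 0.5))  # [0.5, 0.25, 0.125]

# A low ddNTP share leaves far more strands extending past 100 sites.
print(read_through(100, 0.01) > read_through(100, 0.5))  # True
```

- Under this model a high ddNTP share concentrates terminations at the first few occurrences of the base, whereas a low share spreads them along the template, which is consistent with the stated reason for keeping the ddNTP at a lower concentration than the corresponding dNTP.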
- Sequencing reactions are known in the art and described, for example in Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, New York, Third Edition (2001) and/or Ausubel et al, Current Protocols in Molecular Biology. Wiley Interscience, ISBN 047 150338, 1987.
- the reactions described in the previously discussed references are adapted for use in the present invention by including a specific ddNTP rather than using ddATP, ddTTP, ddCTP and ddGTP in a single reaction.
- the method of the invention comprises separately or independently determining each position at which a plurality of different nucleotide residues occur within a target nucleic acid.
- the method comprises separately determining the position of a plurality of nucleotides within a target nucleic acid, e.g., determining the position of a plurality of adenosines in the target sequence and independently determining the position of a plurality of thymines in the target sequence and independently determining the position of a plurality of guanines in the target sequence and independently determining the position of a plurality of cytosines in the target sequence.
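- The four independent determinations described above can be combined into one consensus, as in the following Python sketch. It assumes each single-ddNTP reaction has already been reduced to a list of 1-based positions, and it writes positions at which more than one base is detected (as expected in a mixture of genomes) in square brackets. The function name and example data are hypothetical.

```python
def merge_reactions(positions_by_base, length):
    """Combine per-base position lists into one string; positions with
    several bases are written as [..], positions with none as N."""
    calls = [set() for _ in range(length)]
    for base, positions in positions_by_base.items():
        for pos in positions:
            calls[pos - 1].add(base)

    def render(bases):
        if not bases:
            return "N"
        if len(bases) == 1:
            return next(iter(bases))
        return "[" + "".join(sorted(bases)) + "]"

    return "".join(render(bases) for bases in calls)


# Hypothetical positions from four separate single-ddNTP reactions.
reactions = {"A": [1, 4, 5], "C": [2, 6], "G": [3], "T": [4]}
print(merge_reactions(reactions, 6))  # ACG[AT]AC
```

- Position 4 terminates in both the A and the T reaction, so it is reported as ambiguous; whether that ambiguity reflects an HSV, a PSV or a polymorphism still has to be resolved by comparison between subjects.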
- the step of comparing the position(s) at which a nucleotide residue occurs in the target nucleic acid from a first polyploid subject and a second polyploid subject comprises aligning the position or plurality of positions at which the specific nucleotide residue occurs within target nucleic acids from genomes of the first and second polyploid subjects prior to comparing a position(s) at which the specific nucleotide residue occurs within a target nucleic acid from genomes of the first and second polyploid subjects.
- the molecular weights of termination fragments from samples from each subject are analyzed to determine a pattern of termination fragments having similar molecular weights in both samples, and the results aligned based on the similar pattern of termination fragments.
- such an approach is useful for aligning sequences from target nucleic acids of different size, e.g., as a result of an insertion or deletion in the target sequence from one of the subjects or in the presence of a microsatellite marker that varies in size between subjects.
- comparison of electrotraces depicting the molecular weight of each termination fragment from target nucleic acids from different subjects is useful for such alignment.
- Sequences can be aligned manually, e.g., by identifying patterns of termination fragments having similar molecular weights.
- software implementing an algorithm that detects patterns is used to identify a pattern of termination fragments conserved between target sequences from different subjects.
- Such software is also useful for identifying differences between aligned sequences, e.g., as a result of a variable nucleotide.
- a suitable algorithm and/or computer software for implementing that algorithm will be apparent to the skilled artisan and/or described herein.
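- One simple way such pattern-based alignment could work is sketched below in Python. This is an assumed illustration rather than the algorithm used by the inventors: fragment sizes are treated as whole numbers of nucleotides, the two traces are assumed to differ by a single constant offset, and the best offset is the one that makes the most fragment sizes coincide. All names and data are hypothetical.

```python
def best_offset(frags_1, frags_2, max_shift=10):
    """Shift to add to frags_2 that maximizes the number of fragment sizes
    shared with frags_1 (ties resolved toward the smallest shift)."""
    set_1 = set(frags_1)

    def shared(shift):
        return len(set_1 & {f + shift for f in frags_2})

    return max(range(-max_shift, max_shift + 1),
               key=lambda s: (shared(s), -abs(s)))


def unmatched_after_alignment(frags_1, frags_2, max_shift=10):
    """Fragment sizes present in only one trace once the traces are aligned."""
    shift = best_offset(frags_1, frags_2, max_shift)
    return sorted(set(frags_1) ^ {f + shift for f in frags_2})


trace_1 = [3, 7, 8, 15, 21]
trace_2 = [1, 5, 6, 10, 13, 19]  # same pattern offset by 2, plus one extra fragment

print(best_offset(trace_1, trace_2))                # 2
print(unmatched_after_alignment(trace_1, trace_2))  # [12]
```

- After alignment, the only unmatched fragment is the one at size 12, which is the candidate variable nucleotide between the two hypothetical subjects.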
- determining the specific nucleotide that occurs in the first polyploid subject but not in the second polyploid subject comprises detecting the presence of the nucleotide at a position in the target nucleic acid from genomes of the first polyploid subject and the absence of the nucleotide at the position in the corresponding target nucleic acid from genomes of the second polyploid subject.
- a termination fragment is produced in a ddNTP termination sequencing method using nucleic acid from one polyploid subject, and the termination fragment is not detected using nucleic acid from another polyploid subject.
- a termination fragment will be produced from a sample from the subject comprising the polymorphism or mutation but will not be produced from a sample from the subject that does not comprise the polymorphism or mutation. It will be apparent to the skilled artisan that a mutation or polymorphism can be heterozygous and one allele of the mutation or polymorphism is the same as the alleles on the homoeologous chromosomes in a polyploid subject.
- determining the specific nucleotide that occurs in a first polyploid subject but not in a second polyploid subject comprises detecting an increased or decreased number of copies of the specific nucleotide at a position within the target nucleic acid from genomes of the first polyploid subject compared to within the corresponding target nucleic acid from genomes of the second polyploid subject.
- a reduction in the number of copies of the nucleotide is detected by detecting a reduced amount of termination fragments produced by performing ddNTP-mediated sequencing with the ddNTP corresponding to the allele that is also present on homoeologous chromosomes.
- As exemplified herein, such a change in the number of copies of a nucleotide in the genomes of a polyploid subject is readily detected using ddNTP-mediated sequencing.
- the number of nucleic acid fragments terminated at the site of the variable nucleotide is reduced in a sample from a heterozygous subject, and the relative number of fragments produced is readily detectable, e.g., in the case of a sequencing reaction using fluorescently labeled primers by detecting a reduction in the level of fluorescence of nucleic acid fragments terminated at the site of the variable nucleotide.
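A rough sketch of how such a reduction might be flagged is given below. It assumes, as a simplification not stated in the specification, that peak height scales linearly with copy number and that the two traces have already been normalized to each other. The function names and the 0.85 cut-off are hypothetical.

```python
def expected_signal_fraction(copies_with_base, total_copies):
    """Fraction of the reference peak height expected if signal scales with copy number."""
    if total_copies <= 0 or not 0 <= copies_with_base <= total_copies:
        raise ValueError("copy counts must satisfy 0 <= copies_with_base <= total_copies")
    return copies_with_base / total_copies


def has_reduced_dosage(test_height, reference_height, max_ratio=0.85):
    """Return True when the test peak is at most `max_ratio` of the reference peak."""
    if reference_height <= 0:
        raise ValueError("reference_height must be positive")
    return (test_height / reference_height) <= max_ratio


# A heterozygous change on one genome of a hexaploid leaves 5 of 6 copies.
hexaploid_het = expected_signal_fraction(5, 6)
reduced = has_reduced_dosage(test_height=830.0, reference_height=1000.0)
unchanged = has_reduced_dosage(test_height=990.0, reference_height=1000.0)
```

Under these assumptions a heterozygous change in a hexaploid gives an expected ratio of about 0.83, which is why the illustrative cut-off is set just above that value; a real assay would need an empirically determined threshold.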
- linkage between the variable nucleotide and one or more nucleotides in a genome of a polyploid subject is determined by detecting the presence or absence of the variable nucleotide in one or more first mutant polyploid subject(s) in which at least a corresponding target nucleic acid in a genome has been deleted or otherwise removed, and in one or more second mutant polyploid subject(s) in which a corresponding target nucleic acid in other genome(s) has been deleted or otherwise removed, wherein absence of the variable nucleotide in the first mutant subject, and presence of the variable nucleotide in the second mutant polyploid subject indicates that the variable nucleotide is linked to the genome comprising the deletion in the first polyploid subject.
- variable nucleotide in the first subject indicates that the variable nucleotide is linked to the genome comprising the deletion in the first subject.
- the polyploid subjects are isogenic other than the deletion in the target regions.
- isogenic shall be taken to mean that the sequence of the genomes of two subjects is substantially identical. This shall not be taken to mean that the sequence of the genomes of the two subjects is identical, rather the sequences may include small differences, e.g., simple nucleotide polymorphisms.
- Suitable mutant polyploid subjects will be apparent to the skilled artisan and include aneuploid subjects, e.g., lacking one genome that generally occurs in a population of subjects, or a subject in which a chromosome or a part of a chromosome, e.g., an arm of a chromosome of one genome of a polyploid subject is deleted, e.g., a so-called "deletion line".
- detection of the presence or absence of a variable nucleotide in a first and/or second mutant polyploid subject is performed using a computer implemented algorithm to analyse each position of a nucleotide in the first and second mutant polyploid subjects and to determine the presence or absence of the variable nucleotide.
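The linkage logic described above can be sketched as follows, assuming the presence or absence of the variable-nucleotide peak has already been scored in each deletion or aneuploid stock. The function name `assign_genome` and the genome labels are hypothetical; the specification does not prescribe a data structure.

```python
def assign_genome(peak_present_by_deleted_genome):
    """Return the genome whose deletion removes the variable-nucleotide peak.

    `peak_present_by_deleted_genome` maps the label of the deleted genome
    to True/False for whether the peak was detected in that stock.
    Returns None when the result is ambiguous (zero or several stocks lack the peak).
    """
    missing = [
        genome
        for genome, present in peak_present_by_deleted_genome.items()
        if not present
    ]
    return missing[0] if len(missing) == 1 else None


# Hypothetical scores: the peak disappears only when the A-genome copy is deleted.
linked_genome = assign_genome({"A": False, "B": True, "D": True})
ambiguous = assign_genome({"A": True, "B": True, "D": True})
```

Returning `None` for ambiguous patterns is a design choice of this sketch, so that a missing or failed reaction is not silently interpreted as linkage.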
- the method described herein according to any embodiment is performed using nucleic acid from one or more pools of nucleic acid, e.g., bulked segregant analysis (BSA).
- two pools of nucleic acid are produced or obtained each from pools of subjects phenotypically similar except for one trait.
- the subjects are also genetically similar, e.g., derived from a single breeding population.
- the method of the invention facilitates detection of a polymorphism or mutation in which an allele occurs in one population and another allele occurs in another population. Because the two populations are genetically similar, there is an increased probability that the polymorphism or mutation is associated with the trait that varies between the two populations.
- the present invention also encompasses determining an association between the polymorphism or mutation and the trait.
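As a hedged illustration of the pooled comparison, the sketch below assumes each pool has been reduced to a mapping from position to the set of nucleotides detected there, and reports positions where the two pools differ. The function name, positions and base calls are hypothetical.

```python
def contrasting_positions(pool_a, pool_b):
    """Return {position: (bases_in_a, bases_in_b)} for positions where the pools differ."""
    common = sorted(pool_a.keys() & pool_b.keys())
    return {
        position: (pool_a[position], pool_b[position])
        for position in common
        if pool_a[position] != pool_b[position]
    }


# Hypothetical calls: one pool carries an extra T alongside the C found on the other genomes.
pool_with_trait = {120: {"C"}, 134: {"C", "T"}}
pool_without_trait = {120: {"C"}, 134: {"C"}}
candidates = contrasting_positions(pool_with_trait, pool_without_trait)
```

Because homoeologous copies contribute their own nucleotides, a candidate position in a polyploid often shows an additional base in one pool rather than a clean substitution, which is why sets are compared here instead of single base calls.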
- the method of the present invention additionally comprises providing or obtaining a sample comprising the target nucleic acid.
- said sample comprises genomic DNA from a polyploid subject.
- a sample is from or comprises a nucleated cell from a polyploid subject.
- the sample is an extract of a cell from a polyploid subject.
- the method of the present invention additionally comprises determining the sequence of the nucleic acid adjacent to the polymorphism or mutation.
- the sequence of nucleic acid adjacent to the polymorphism or mutation is determined by separately determining the position of each nucleotide within a region of the target sequence adjacent to the polymorphism or mutation and deriving the nucleotide sequence adjacent to the site of the polymorphism or mutation based on the position of each nucleotide residue so determined.
- electrotrace representations of termination fragments produced by performing sequencing reactions each with an individual ddNTP are overlaid, and the nucleotide sequence of nucleic acid adjacent to the polymorphism or mutation is determined.
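Overlaying the four single-ddNTP profiles to derive a sequence can be sketched as below. This is an assumption-laden illustration: peaks are taken to be already aligned to integer positions, a single height threshold decides whether a base is called, and any position with zero or several bases above the threshold is reported as N. None of these choices is specified by the source.

```python
def call_sequence(peaks_by_base, length, threshold):
    """Combine per-ddNTP peak heights into one sequence string.

    `peaks_by_base` maps each base to {position: peak_height}, with 1-based positions.
    A position is called only when exactly one base reaches `threshold`.
    """
    calls = []
    for position in range(1, length + 1):
        bases = [
            base
            for base in "ACGT"
            if peaks_by_base.get(base, {}).get(position, 0.0) >= threshold
        ]
        calls.append(bases[0] if len(bases) == 1 else "N")
    return "".join(calls)


# Hypothetical peak heights; position 5 has both G and T above the threshold.
peaks = {
    "A": {1: 900.0, 4: 850.0},
    "C": {2: 700.0},
    "G": {3: 800.0, 5: 400.0},
    "T": {5: 450.0},
}
sequence = call_sequence(peaks, length=5, threshold=300.0)
```

In a polyploid sample a position with two bases above the threshold may reflect a homoeologous sequence variant rather than noise, so such positions are worth inspecting rather than discarding.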
- This sequence information facilitates identification of a genetic marker. Accordingly, the present embodiment of the invention applies mutatis mutandis to a method for identifying a genetic marker.
- the method of the present invention comprises determining a nucleotide specific to the genome in which the polymorphism or mutation occurs and linked to the polymorphism or mutation, e.g., a HSV or PSV in the target nucleic acid in which the polymorphism or mutation occurs.
- the present invention permits production of a primer that is able to preferentially or specifically amplify a target sequence comprising a polymorphism or mutation rather than copies of the target sequence on homoeologous chromosomes.
- the method additionally comprises: (i) determining a position or a plurality of positions at which the specific nucleotide residue occurs within a target nucleic acid in genomes of a first mutant polyploid subject in which at least the target nucleic acid in one genome of the first mutant polyploid subject has been deleted or otherwise removed, wherein the target nucleic acid that has been deleted or otherwise removed also comprises the mutation or polymorphism; and (ii) comparing the position(s) of the specific nucleotide residue determined at (i) to the position(s) of the specific nucleotide residue in the target nucleic acid in a corresponding target nucleic acid within genomes of a second mutant polyploid subject in which at least the target nucleic acid in a different genome to that of the first subject has been deleted or otherwise removed to determine a nucleotide that occurs in the first polyploid subject but not in the second polyploid subject thereby detecting a variable nucleotide, thereby detecting a nucleo
- Such a method is also applicable mutatis mutandis to the detection of a nucleic acid variation occurring in a copy of a target sequence in a genome of a subject and not in another copy of the target sequence occurring in the same genome of the subject, i.e., a PSV.
- the method additionally comprises determining the sequence of nucleic acid adjacent to a nucleotide specific to the genome in which the polymorphism or mutation occurs and linked to the polymorphism or mutation. Suitable methods for determining this sequence will be apparent to the skilled artisan and/or described herein.
- nucleotide specific to the genome in which a polymorphism or mutation occurs and nucleic acid adjacent thereto is useful as a genetic marker linked to a polymorphism or mutation, e.g., as an annealing site for a primer to preferentially or specifically amplify the target sequence comprising a polymorphism or mutation identified using the method of the invention.
- a probe or primer anneals or hybridizes to a nucleic acid to a greater extent or higher level than it does to other nucleic acid in the genome of a polyploid subject.
- the term “preferentially” does not limit the annealing or hybridization of the probe or primer to only one site or nucleic acid in the genomes of a polyploid subject. Rather, the level of annealing or hybridization of the probe or primer need only be increased to a higher level, and preferably significantly increased compared to the level of annealing or hybridization to another nucleic acid in the genomes of a polyploid subject.
- probe or primer detectably anneals to only one nucleic acid in the genomes of a polyploid subject.
- the present invention provides a method for producing a probe or primer for detecting a polymorphism or mutation, said method comprising:
- the probe or primer preferentially or specifically anneals or hybridizes to a nucleic acid comprising an allele of the polymorphism or mutation.
- the present invention also provides a method for producing a primer for amplifying a target sequence in a genome of a polyploid subject, said target sequence comprising a polymorphism or mutation, said method comprising: (i) performing a method as described herein according to any embodiment to determine the sequence of a nucleic acid comprising a nucleotide specific to a genome of a polyploid subject in which a polymorphism or mutation occurs and that is linked to the polymorphism or mutation or obtaining the sequence of said nucleic acid determined by performing said method; and (ii) producing or obtaining a probe or primer comprising the sequence determined at (i) and capable of preferentially or specifically annealing or hybridizing to nucleic acid linked to said polymorphism or mutation.
- the inventive method has broad applicability to the identification of one or more polymorphisms and/or mutations in one or more homeologous genes that are not conserved across all genomes of the polyploid is required for determining the breeding contribution of a particular parent.
- the present invention also has clear application to the discovery or identification of mutations causative of a trait of interest. Accordingly, the present invention provides a method for identifying a polymorphism or mutation associated with or causative of a trait, said method comprising:
- a polymorphism or mutation is a polymorphism or mutation in one or more genes associated with yield and/or flour quality and/or flowering and/or disease-resistance and/or tolerance to abiotic stress such as drought, salinity, frost, heat.
- a polymorphism or mutation is a polymorphism or mutation associated with wheat flour quality, including suitability for bread or noodle or biscuit making, e.g., a Wx gene encoding granule-bound starch synthase (GBSS).
- the mutation is an INDEL in a Wx gene.
- a polymorphism or mutation is a polymorphism or mutation associated with wheat flour quality, including suitability for bread or noodle or biscuit making, e.g., a polymorphism or mutation in a gene encoding a glutenin subunit, e.g., Glu-A1, Glu-B1, Glu-D1, Glu-A3, Glu-B3, Glu-D3, Glu-D4 or Glu-D5 or allele thereof, or alternatively, encoding a gliadin, e.g., an allele of a Gli-1 or Gli-2 gene.
- a polymorphism or mutation is a polymorphism or mutation in a Glu-D1d allele in hexaploid wheat that enhances bread-making quality of wheat, e.g., by virtue of providing an extra cysteine residue in the encoded glutenin subunit, thereby improving bread dough strength.
- a polymorphism or mutation is a polymorphism or mutation in a Glu-B1al allele that enhances bread-making quality by virtue of being expressed at a high level.
- a polymorphism or mutation is a polymorphism or mutation associated with flowering time and/or time to physiological maturity, e.g., a polymorphism or mutation in a vernalization-response gene, e.g., Vrn1, Vrn2, Vrn3 or allele thereof, and/or a photoperiod gene, e.g., Ppd1 or Ppd2 or allele thereof.
- a polymorphism or mutation is a polymorphism or mutation associated with improved tolerance of plants to abiotic stress conditions such as drought or frost, e.g., a polymorphism or mutation in a fructan biosynthesis gene of tetraploid and/or hexaploid wheat, including Chinese Spring wheat, e.g., a gene encoding fructan 1-exohydrolase (FEH) such as an allele in a 1-FEH gene, e.g., 1-FEH-6A, 1-FEH-6B, 1-FEH-6D or allele thereof.
- a polymorphism or mutation is a polymorphism or mutation associated with improved resistance to a nematode, preferably improved resistance to a root lesion nematode, e.g., Pratylenchus neglectus, e.g., a polymorphism or mutation in a Rlnn1 gene or allele thereof.
- a polymorphism or mutation is a polymorphism or mutation in a nodulin-like gene of hexaploid bread wheat.
- a polymorphism or mutation is a polymorphism or mutation associated with herbicide resistance of plants, e.g., a polymorphism or mutation in an acetohydroxyacid synthase (AHAS) gene, wherein the polymorphism or mutation confers resistance or tolerance to the herbicide imidazolinone.
- a polymorphism or mutation is a polymorphism or mutation associated with tolerance to a heavy metal, e.g., boron, such as a polymorphism or mutation linked to the Bo1 gene, e.g., on chromosome 7B of hexaploid bread wheat.
- the method additionally comprises determining or identifying a gene or other nucleic acid in a genome of a polyploid subject that is expressed in nature, e.g., microRNA encoding nucleic acid, and that is linked to the polymorphism or mutation.
- the method additionally comprises determining a gene or nucleic acid expressed in nature in a genome of a polyploid subject that causes the trait.
- the present invention provides for positional cloning of a gene responsible for a trait.
- the present invention is readily adapted to screening mutant polyploid subjects, e.g., polyploid plants having a desired trait.
- mutations or polymorphisms may be known previously or produced de novo, and associated with a particular trait of interest.
- mutations may be produced de novo in a zygote of a polyploid subject using, for example, a chemical mutagen, e.g., ethylnitrosourea (ENU) or ethyl methanesulfonate (EMS).
- This polyploid subject, or offspring thereof, is then screened to detect a polyploid subject having a desired trait, and the mutation causative of this trait is detected by performing a method as described herein according to any embodiment.
- mutant polyploid subjects are screened to identify those comprising mutations in a particular target nucleic acid prior to phenotypic screening to identify those having a desired trait.
- the present invention also provides a method for identifying a mutation causative of a trait, said method comprising: (i) inducing or producing a mutation in a polyploid subject;
- the present invention provides a method for identifying a mutation causative of a trait, said method comprising:
- nucleotide residues referred to herein are those recommended by the IUPAC-IUB Biochemical Nomenclature Commission, wherein A represents Adenine, C represents Cytosine, G represents Guanine, T represents Thymine, Y represents a pyrimidine residue, R represents a purine residue, M represents Adenine or Cytosine, K represents Guanine or Thymine, S represents Guanine or Cytosine, W represents Adenine or Thymine, H represents a nucleotide other than Guanine, B represents a nucleotide other than Adenine, V represents a nucleotide other than Thymine, D represents a nucleotide other than Cytosine and N represents any nucleotide residue.
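For readers who want the nomenclature above in machine-readable form, the codes can be written as a small lookup table. The table content follows the definitions just given; the names `IUPAC_CODES` and `code_matches` are hypothetical helpers introduced only for this sketch.

```python
# Each IUPAC code maps to the set of nucleotides it can represent.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT",   # pyrimidine
    "R": "AG",   # purine
    "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT",  # not G
    "B": "CGT",  # not A
    "V": "ACG",  # not T
    "D": "AGT",  # not C
    "N": "ACGT", # any nucleotide
}


def code_matches(code, base):
    """Return True if the concrete nucleotide `base` is allowed by the IUPAC `code`."""
    return base.upper() in IUPAC_CODES[code.upper()]


purine_ok = code_matches("R", "G")
not_g = code_matches("H", "G")
```

Such a table is useful when a position carries different nucleotides on different genomes and must be recorded with a single ambiguity code.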
- derived from shall be taken to indicate that a specified integer may be obtained from a particular source albeit not necessarily directly from that source.
- composition of matter, group of steps or group of compositions of matter shall be taken to encompass one and a plurality (i.e. one or more) of those steps, compositions of matter, groups of steps or group of compositions of matter.
- Figure 1A is a graphical representation showing partial termination fragment profiles for aligned ddT electrotraces corresponding to the sequence of a nodulin-like gene amplified from the bread wheat varieties Excalibur and Kukri using the primer set 18B (SEQ ID NOs: 3 and 4).
- the two bread wheat varieties harbor a known SNP, a cytosine to guanosine transition. Sequencing reaction assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
- Figure 1B is a graphical representation showing partial termination fragment profiles for aligned ddC electrotraces corresponding to the sequence of a nodulin-like gene amplified from the bread wheat varieties Excalibur and Kukri using the primer set 18B (SEQ ID NOs: 3 and 4).
- the two bread wheat varieties harbor a known SNP, a cytosine to guanosine transition, indicated by an arrow, and have the homozygous genotypes GG and CC, respectively. Sequencing reaction assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
- Figure 1C is a graphical representation showing partial termination fragment profiles for aligned ddG electrotraces corresponding to the sequence of a nodulin-like gene amplified from the bread wheat varieties Excalibur and Kukri using the primer set 18B (SEQ ID NOs: 3 and 4).
- the two bread wheat varieties harbor a known SNP, a cytosine to guanosine transition, indicated by an arrow, and have the homozygous genotypes GG and CC, respectively. Sequencing reaction assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
- Figure 2 is a graphical representation showing results of sequencing reactions to identify a SNP conferring herbicide tolerance in the AHAS gene located on chromosome 6B of bread wheat using bulked segregant analysis.
- the herbicide tolerant DNA pool (MU) has the SNP genotype TT, while the herbicide susceptible DNA pool (WT) has the genotype CC.
- the homoeologous AHAS genes amplified from the A- and D-genomes in both DNA pools have the genotypes CC and CC respectively, at the corresponding nucleotide position.
- the left-hand panel shows overlaid and individual electrotraces produced using ddT and the right-hand panel shows overlaid and individual electrotraces produced using ddC.
- the termination fragment corresponding to the target SNP is indicated with an arrow.
- Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
- Figure 3 is a graphical representation showing a comparison of overlaid and individual termination fragment profiles for detection of a polymorphism in the AHAS gene.
- the termination fragment corresponding to the target SNP is indicated with an arrow.
- Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
- Figure 4 panel A is a graphical representation showing the results of dideoxynucleotide-mediated sequencing termination fragment profiles produced using single ddNTPs and overlaid results to identify homeologue sequence variants in the AHAS gene of bread wheat using aneuploid stocks as indicated.
- the termination fragments corresponding to the HSVs are indicated with an arrow.
- Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
- Figure 4 panel B depicts the published sequence for the region of the AHAS genes depicted in Figure 4 panel A.
- Figure 5 is a graphical representation showing dideoxynucleotide-mediated sequencing termination fragment profiles produced using single ddNTPs and overlaid results to determine the nucleotide sequence of a region of the AHAS gene amplified from the bread wheat variety Excalibur. The cut-off threshold for nucleotide assignment is indicated by the solid line. Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
- Figure 6 is a graphical representation showing an alignment of overlaid termination fragment profiles for the four dideoxynucleotides amplified from two homozygous samples harboring a SNP. Shown are partial termination fragment profiles for overlaid electrotraces corresponding to a nodulin-like gene amplified from the bread wheat varieties Kukri and Excalibur. Kukri and Excalibur have the SNP genotypes CC and GG, respectively. Aligned termination fragments (solid lines) correspond to nucleotides common to both samples, while misaligned termination fragments (dashed lines) correspond to sequence variants. The termination fragments corresponding to the SNP are indicated by an arrow. Assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
- Figure 7A is a graphical representation showing alignment of overlaid partial termination fragment profiles for the four dideoxynucleotides amplified from three pooled samples harboring a SNP.
- Pool 1 consists of the wheat variety Kukri with the SNP genotype CC.
- Pool 2 comprises the varieties RAC875 and Excalibur with genotypes GG and CC, respectively.
- Pool 3 consists of the varieties Timgalen, Trident, Spear, Berkut and Krichauff with the genotypes CC, GG, GG, GG and GG, respectively.
- Aligned termination fragments (solid lines) correspond to nucleotides common to all samples. The termination fragments corresponding to the SNP are indicated by an arrow. Assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
- Figure 7B is a graphical representation showing alignment of individual partial termination fragment profiles for the four dideoxynucleotides amplified from three pooled samples harboring a SNP.
- Pool 1 consists of the wheat variety Kukri with the SNP genotype CC.
- Pool 2 comprises the varieties RAC875 and Excalibur with genotypes GG and CC, respectively.
- Pool 3 consists of the varieties Timgalen, Trident, Spear, Berkut and Krichauff with the genotypes CC, GG, GG, GG and GG, respectively.
- Aligned termination fragments (solid lines) correspond to nucleotides common to all samples. The termination fragments corresponding to the SNP are indicated by an arrow. Assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
- Figure 8 is a copy of a photographic representation showing the effect of dideoxynucleotide-termination on the mobility of oligonucleotides in denaturing polyacrylamide gel.
- the 3'-end of the oligonucleotide (5' HEX-ACG ACG TTG TAA AA 3' (SEQ ID NO: 9)) was labeled in four separate reactions with each of the four dideoxynucleotides.
- the reaction products were separated on a 10% (19:1) denaturing polyacrylamide gel using a GelScan3000 instrument and detected by fluorescence.
- Figure 9A is a graphical representation showing the sensitivity of the method of the invention for detecting a polymorphism in pooled samples. Shown are the partial termination fragment profiles for individual electrotraces corresponding to a region of a triticin precursor gene.
- DNA pool 1 contains the barley line WABAR2096 with the SNP genotype GG.
- DNA pools 2-5 contain a mixture of the barley lines WABAR2096 and WI3385, the latter barley line has the SNP genotype TT.
- the frequency of the WI3385 SNP allele in DNA pools 2, 3, 4 and 5 is 50, 33, 17 and 8%, respectively.
- Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
- Figure 9B is a graphical representation showing the sensitivity of the method of the invention for detecting a polymorphism in pooled samples. Shown are the partial termination fragment profiles for overlaid electrotraces corresponding to a region of a triticin precursor gene.
- DNA pool 1 contains the barley line WABAR2096 with the SNP genotype GG.
- DNA pools 2-5 contain a mixture of the barley lines WABAR2096 and WI3385, the latter barley line has the SNP genotype TT.
- the frequency of the WI3385 SNP allele in DNA pools 2, 3, 4 and 5 is 50, 33, 17 and 8%, respectively.
- Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
- Figure 10A is a graphical representation showing unaligned electrotraces produced using ddT-mediated sequencing for Stylet and Wylkatchem. Assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
- Figure 10B is a graphical representation showing manually aligned electrotraces produced using ddT-mediated sequencing for Stylet and Wylkatchem showing two SNPs identified in a region flanking a microsatellite repeat sequence. The termination fragments corresponding to the SNPs are shown with an arrow. Assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
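The allele frequencies quoted for DNA pools 2 to 5 in Figures 9A and 9B (50, 33, 17 and 8%) are consistent with simple whole-number mixing ratios of the two barley lines. The ratios used below (1:1, 1:2, 1:5 and 1:11) are an inference made for this worked example and are not stated in the source; the function name is likewise hypothetical.

```python
def minor_allele_percent(minor_parts, major_parts):
    """Percentage of the minor-line allele when two homozygous lines are mixed by parts."""
    return round(100 * minor_parts / (minor_parts + major_parts))


# Assumed mixing ratios (WI3385 : WABAR2096) for pools 2, 3, 4 and 5.
assumed_ratios = [(1, 1), (1, 2), (1, 5), (1, 11)]
percentages = [minor_allele_percent(minor, major) for minor, major in assumed_ratios]
```

Under these assumed ratios the rounded percentages reproduce the values given in the figure descriptions, with the lowest pool corresponding to roughly one allele copy in twelve.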
- Figure 11 is a graphical representation showing a ClustalW alignment of the Sanger sequences for sub-cloned PCR fragments amplified from chromosomes 2A and 2D of Stylet and Wylkatchem. Three SNPs located on chromosome 2A are highlighted in grey.
- Figure 12A is a graphical representation showing aligned ddA- or ddG-mediated sequencing results from a region of Gene 77 sequenced using the M13 forward sequencing primer (SEQ ID NO: 1).
- Target nucleic acid is from Chinese Spring or bulked sample A (as indicated). Results indicate the presence-absence of a peak in the ddA trace and the presence of a peak at the corresponding nucleotide position in the ddG trace, thereby indicating the presence of the G allele on all three genomes of Chinese Spring and the presence of A and G alleles in the bulked sample.
- Figure 12B is a graphical representation showing aligned ddA- or ddG-mediated sequencing results from a region of Gene 77 sequenced using the M13 forward sequencing primer (SEQ ID NO: 1). Sequenced nucleic acid is from the four individual accessions present in bulked sample A showing the presence-absence of a peak in the ddA trace and presence of a peak in the ddG trace, indicating the presence of the A allele on at least one genome.
- Figure 13 is a graphical representation demonstrating identification and validation of a SNP in a complex DNA mixture containing homoeologous sequences of different lengths.
- Gene 89 was amplified from the three wheat genomes using conserved primers, and sequencing assays performed using the M13 forward sequencing primer (SEQ ID NO: 1). Shown are the aligned ddA traces of varieties Renan, Hartog, Chinese Spring and nullisomic-tetrasomic stocks for the group 3 chromosomes. The presence-absence of a peak indicates the presence of a SNP between Excalibur and Kukri that is shown to be located on the A genome by the peak's absence in the N3A nullisomic-tetrasomic stock.
- Figure 14A is a graphical representation showing identification of allelic SNPs linked to the Bo1 gene by bulked segregant analysis.
- Gene CAT was amplified using conserved primers LS-CAT forward and reverse (SEQ ID NOs: 30 and 31, respectively), and a method of the present invention was performed using the M13 forward sequencing primer (SEQ ID NO: 1). Shown are the ddC traces for the sensitive and tolerant DNA bulks and parental lines. Arrows indicate the first allelic sequence variation identified.
- Figure 14B is a graphical representation showing results of sequencing reactions used in a blinded genotyping assay using the method of the present invention to confirm linkage of allelic variation detected in the CAT gene with the Bo1 gene.
- Gene CAT was amplified using conserved primers LS-CAT forward and reverse (SEQ ID NOs: 30 and 31, respectively), and a method of the present invention was performed using the M13 forward sequencing primer (SEQ ID NO: 1). Shown are the ddC traces for eight doubled haploid lines with Bo1 genotypes: sensitive, tolerant, sensitive, tolerant, sensitive, sensitive, tolerant and sensitive, respectively. Arrows indicate the first allelic sequence variation linked to the Bo1 gene.
- Figure 14C is a graphical representation showing HSV assignment using nullisomic-tetrasomic wheat stocks.
- Gene CAT was amplified using conserved primers LS-CAT forward and reverse (SEQ ID NOs: 30 and 31, respectively), and a method of the present invention was performed using the M13 forward sequencing primer (SEQ ID NO: 1). Shown are the ddC traces for the nullisomic stocks for group 7 chromosomes and the sensitive and tolerant DNA bulks. Double arrows indicate allelic variation linked to the Bo1 gene. A single arrow and asterisk (*) indicates an HSV that can be assigned to a copy of chromosome 7.
- a primer is a nucleic acid molecule comprising any combination of ribonucleotides, deoxyribonucleotides and/or analogs thereof such that it comprises DNA, RNA or DNA/RNA with one or more ribonucleotide or deoxyribonucleotide analogs contained therein, and is capable of annealing to a nucleic acid template to act as a binding site for an enzyme, e.g., DNA or RNA polymerase, thereby providing a site for initiation of replication of a specific nucleic acid in the 5' to 3' direction.
- the nucleotide sequence of a primer is generally substantially complementary to the nucleotide sequence of a region of a template nucleic acid to be amplified, i.e., a target sequence, or at least comprises a region of complementarity sufficient for annealing to occur and extension in the 5' to 3' direction therefrom.
- a degree of non-complementarity will not prevent a primer from initiating extension.
- Primers are generally, but not necessarily, short synthetic nucleic acids of about 12-50 nucleotides in length.
- a primer for amplifying or directly sequencing a target sequence comprises at least about 12-15 nucleotides in length and is capable of annealing to a strand of the nucleic acid template, i.e., target nucleic acid.
- Primers may also comprise at least about 20 or 25 or 30 nucleotides in length and are capable of annealing to a strand of the template.
- a primer specific to a target nucleic acid or comprising a region specific to a target nucleic acid is designed such that it comprises a sequence having at least about 80% identity overall to the sequence of a target nucleic acid. More preferably, the degree of sequence identity is at least about 85% or 90% or 95% or 98% or 99%.
- in determining whether nucleotide sequences fall within a particular percentage identity limitation recited herein, those skilled in the art will be aware that it is necessary to conduct a side-by-side comparison or multiple alignment of sequences. In such comparisons or alignments, differences may arise in the positioning of non-identical residues, depending upon the algorithm used to perform the alignment.
- reference to a percentage identity between two or more nucleotide sequences shall be taken to refer to the number of identical residues between said sequences as determined using any standard algorithm known to those skilled in the art.
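- The percentage identity described above can be sketched as a count of identical residues over aligned columns. This is a minimal illustration, not the BESTFIT or BLAST calculation: the function name `percent_identity`, the use of '-' as the gap character, and the choice to ignore columns that are gaps in both sequences are assumptions made for the example, and the inputs must already have been aligned by some standard algorithm.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns that hold identical residues.

    Both inputs must already be aligned to equal length, with '-' marking gaps.
    Columns that are gaps in both sequences are ignored (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    # A gap opposite a residue counts as a non-identical column.
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

- Because different alignment algorithms may place gaps differently, the value returned depends on the alignment supplied.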
- nucleotide sequences may be aligned and their identity calculated using the BESTFIT program or other appropriate program of the Genetics Computer Group, Inc., University Research Park, Madison, Wisconsin, United States of America (Devereux et al., Nucl. Acids Res. 12, 387-395, 1984).
- BLAST 2 Sequences, a Basic Local Alignment Search Tool (BLAST) program of the National Center for Biotechnology Information (NCBI), is a tool that is used for direct pairwise comparison of two nucleotide sequences.
- the specific composition of a primer will depend upon the sequence of the target nucleic acid. Accordingly, the sequence of the primer or region specific for a target nucleic acid is not to be taken to be limited to a particular sequence. Rather the sequence need only be sufficient to allow for annealing of the primer to a template nucleic acid and initiation of an amplification reaction and/or a sequencing reaction. Because a primer is generally extended in the 5' to 3' direction it is preferred that at least the 3'-terminal nucleotide is complementary to the relevant nucleotide in the target nucleic acid.
- At least the 3 or 4 or 6 or 8 or 10 contiguous nucleotides at the 3'-terminus of the primer are complementary to the relevant nucleotides in the target nucleic acid.
- the complementarity of the 3' terminus of the primer ensures that the extending end of the primer is capable of initiating amplification of the target nucleic acid, for example, by a polymerase.
- a primer of the invention does not comprise multiple contiguous nucleotides that are not identical to a strand of the target nucleic acid.
- the primer comprises no more than 6 or 5 or 4 or 3 or 2 contiguous nucleotides that are not identical to a strand of the target nucleic acid. More preferably, any nucleotides that are not identical to a strand of the target nucleic acid are non-contiguous.
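- The 3'-terminal complementarity and contiguous-mismatch limits described above can be checked with a short sketch. The function name `mismatch_profile` and the returned keys are hypothetical, and the example assumes the primer is compared with an equal-length stretch of the target strand it is meant to be identical to, both written 5' to 3'.

```python
def mismatch_profile(primer: str, site: str) -> dict:
    """Compare a primer with the equal-length target site it is meant to copy.

    `site` is the strand of the target that the primer should be identical to
    (both written 5'->3'), so the last index is the 3' terminus of the primer.
    """
    if not primer or len(primer) != len(site):
        raise ValueError("primer and site must be non-empty and equal in length")
    matches = [p == s for p, s in zip(primer.upper(), site.upper())]

    # Count matching bases back from the 3' terminus.
    three_prime_run = 0
    for is_match in reversed(matches):
        if not is_match:
            break
        three_prime_run += 1

    # Longest run of contiguous mismatches anywhere in the primer.
    longest = current = 0
    for is_match in matches:
        current = 0 if is_match else current + 1
        longest = max(longest, current)

    return {
        "identity_pct": 100.0 * sum(matches) / len(matches),
        "three_prime_matches": three_prime_run,
        "longest_mismatch_run": longest,
    }


if __name__ == "__main__":
    print(mismatch_profile("ACGTTGCAAGGCTTACGGAT", "ACGTTGCATTGCTTACGGAT"))
```

- A primer could then be accepted when, for example, `identity_pct` is at least 80, `three_prime_matches` is at least 3 and `longest_mismatch_run` is no more than 6; the thresholds would be chosen from the ranges recited above.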
- a primer comprises or consists of or consists essentially of at least about 10 nucleotides, more preferably at least about 12 nucleotides or at least about 15 or 20 nucleotides that anneal to a target nucleic acid or are complementary to the target nucleic acid.
- longer primers are also used in PCR reactions, for example, reactions in which a long region of nucleic acid (e.g., greater than 1000 bp) is amplified.
- the present invention additionally contemplates a primer comprising at least about 25 or 30 or 35 nucleotides that anneal to a target nucleic acid or are complementary to the target nucleic acid.
- a primer comprising one or more modified bases such as, for example, locked nucleic acid (LNA) or peptide nucleic acid (PNA) need only comprise a region of at least about 8 nucleotides that anneal to a target nucleic acid or are complementary to the target nucleic acid.
- the complementary nucleotides or modified nucleotides are contiguous.
- the number of nucleotides capable of annealing to a target nucleic acid is related to the stringency under which the primer will anneal.
- a primer of the invention anneals to a target nucleic acid under moderate to high stringency conditions.
- the stringency under which a primer of the invention anneals to a template nucleic acid is determined empirically. Generally, such a method requires performance of an amplification reaction using one or more primers under various conditions and determining the level of specific amplification produced.
- a primer of the invention is labeled with a detectable marker (e.g., a radionucleotide or a fluorescent marker) and the level of primer that has annealed to a target nucleic acid under suitably stringent conditions is determined.
- moderate stringency annealing conditions will generally be achieved using a condition selected from the group consisting of: (i) an incubation temperature between about 42°C and about 55°C; (ii) an incubation temperature between about 15°C and 10°C less than the predicted Tm for a primer; and (iii) a Mg²⁺ concentration of between about 2 mM and 3 mM.
- High stringency annealing conditions will generally be achieved using a condition selected from the group consisting of:
- a reagent such as, for example, glycerol (5-
- a temperature that is within about 5°C or within about 10°C or equal to an estimated temperature at which a primer denatures from a target nucleic acid (Tm) is considered to be high stringency.
- Medium stringency is to be considered to be within 10°C to 20°C or 10°C to 15°C of the calculated Tm of the primer.
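- The relationship between annealing temperature, Tm and stringency described above can be sketched as a simple classifier. This is an illustration under stated assumptions: the function name `classify_stringency` is hypothetical, the annealing temperature is assumed to lie at or below the Tm, and the default windows of 10°C (high) and 20°C (medium) are one choice from the ranges recited above.

```python
def classify_stringency(
    annealing_c: float,
    tm_c: float,
    high_window_c: float = 10.0,
    medium_window_c: float = 20.0,
) -> str:
    """Label an annealing temperature by its distance below the primer Tm.

    The windows are assumptions taken from the ranges in the text: within
    `high_window_c` of Tm is "high", within `medium_window_c` is "medium".
    """
    delta = tm_c - annealing_c
    if delta < 0:
        # Above the Tm the primer is expected to denature from the target.
        return "above Tm"
    if delta <= high_window_c:
        return "high"
    if delta <= medium_window_c:
        return "medium"
    return "low"


if __name__ == "__main__":
    for temperature in (60.0, 50.0, 40.0):
        print(temperature, classify_stringency(temperature, tm_c=65.0))
```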
- the conditions under which a primer anneals to a target nucleic acid are determined in silico.
- methods for determining the predicted melting temperature (or Tm) of a primer or the temperature at which a primer denatures from a specific nucleic acid are known in the art.
- the method of Wallace et al. (Nucleic Acids Res. 6, 3543, 1979) estimates the Tm of a primer based on the G, C, T and A content.
- the described method uses the formula 2(A + T) + 4(G + C) to estimate the Tm of a probe or primer, wherein A, T, G and C are the numbers of the respective nucleotides in the primer.
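- The base-composition estimate attributed to Wallace et al. is commonly written as Tm = 2(A + T) + 4(G + C) in °C, and can be sketched as below. The function name `wallace_tm` is hypothetical, and the example input is the M13 tag sequence recited elsewhere in this description; the rule is generally regarded as a rough estimate for short oligonucleotides only.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm in degrees C as 2*(A+T) + 4*(G+C) for a DNA primer."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # M13 tag sequence: 7 A/T and 9 G/C residues.
    print(wallace_tm("GTAAACGACGGCCAGT"))
```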
- the nearest neighbor method described by Breslauer et al, Proc. Natl Acad. Sci. USA, 83:3746-3750, 1986 is useful for determining the Tm of a primer.
- the nearest neighbour method uses the formula:
- $T_m(\mathrm{calc}) = \dfrac{\Delta H^\circ}{R \ln(C_t/n) + \Delta S^\circ}$, wherein:
- $\Delta H^\circ$ is standard enthalpy for helix formation
- $\Delta S^\circ$ is standard entropy for helix formation
- $C_t$ is the total strand concentration
- $n$ reflects the symmetry factor, which is 1 in the case of self-complementary strands and 4 in the case of non-self-complementary strands
- $R$ is the gas constant (1.987).
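- The nearest neighbour relationship, Tm = ΔH° / (R ln(Ct/n) + ΔS°), can be evaluated as sketched below once ΔH° and ΔS° for the duplex have been summed from a nearest neighbour parameter table. The function name `nn_tm_celsius` is hypothetical, the units (cal/mol for ΔH°, cal/(K·mol) for ΔS°, mol/L for Ct) are assumptions consistent with R = 1.987, the conversion from kelvin to °C is added for convenience, and the thermodynamic values in the usage line are invented for illustration only.

```python
import math

R_CAL = 1.987  # gas constant in cal/(K*mol)


def nn_tm_celsius(
    delta_h_cal: float,
    delta_s_cal: float,
    total_strand_conc_molar: float,
    self_complementary: bool = False,
) -> float:
    """Evaluate Tm = dH / (R*ln(Ct/n) + dS) and convert from kelvin to Celsius.

    n is 1 for self-complementary strands and 4 otherwise, as in the text.
    delta_h_cal and delta_s_cal are duplex totals supplied by the caller.
    """
    n = 1 if self_complementary else 4
    denominator = R_CAL * math.log(total_strand_conc_molar / n) + delta_s_cal
    tm_kelvin = delta_h_cal / denominator
    return tm_kelvin - 273.15


if __name__ == "__main__":
    # Illustrative (made-up) totals: dH = -150 kcal/mol, dS = -400 cal/(K*mol).
    print(round(nn_tm_celsius(-150000.0, -400.0, 0.2e-6), 2))
```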
- $T_m = \dfrac{dH}{dS + R \ln(c/4)} + 16.6 \lg \dfrac{[K^+]}{1 + 0.7[K^+]} - 273.15$, wherein dH is enthalpy for helix formation, dS is entropy for helix formation, R is molar gas constant (1.987 cal/°C mol), "c" is the nucleic acid molar concentration (determined empirically, W. Rychlik et al., supra; default value is 0.2 µM for unified thermodynamic parameters), and [K+] is salt molar concentration (default value is 50 mM).
- Suitable software for determining the Tm of an oligonucleotide using the nearest neighbor method is known in the art and available from, for example, US Department of Commerce, Northwest Fisheries Service Center and Department of Molecular Genetics and Biochemistry, University of Pittsburgh School of Medicine.
- M is the molarity of Na+ and % form is the percentage of formamide (set to
- Tm is determined using the formula (described by Giesen et al, Nucl. Acids Res., 26: 5004-5006):
- Tm(nnDNA) is the melting temperature as calculated using a nearest neighbor model for the corresponding DNA/DNA duplex applying ΔH° and ΔS° values as described by SantaLucia et al Biochemistry, 35: 3555-3562, 1995.
- a suitable program for determining the Tm of a primer comprising LNA is available from, for example, Exiqon, Vedbaek, Denmark.
- a primer or primer sequence that is predicted to be or shown to be capable of selectively annealing to a target nucleic acid is also optionally analyzed for one or more additional characteristics that make it suitable for use as a primer in the method of the invention.
- a primer is analyzed to ensure that it is unlikely to form secondary structures (i.e., the primer does not comprise regions of self-complementarity).
- primers may be assessed to determine their ability to anneal to one another and form "primer dimers".
- Methods for determining a primer that is capable of self-dimerization and/or primer dimer formation are known in the art and/or described supra.
- a primer satisfies the following criteria:
- (i) the primer comprises a region that is to anneal to a target nucleic acid having at least about 17-28 bases in length;
- (ii) the primer comprises about 50-60% (G+C);
- (iii) the 3'-terminus of the primer is a G or C, or CG or GC (this prevents "breathing" of ends and increases efficiency of initiation of amplification);
- (iv) preferably, the primer has a Tm between about 55°C and about 80°C;
- (v) the primer does not comprise three or more contiguous Cs and/or Gs at the 3'-ends of primers (as this may promote mispriming at G or C-rich sequences due to the stability of annealing);
- (vi) the 3'-end of a primer should not be complementary with another primer in a reaction; and
- (vii) the primer does not comprise a region of self-complementarity.
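- The single-primer criteria listed above can be screened with a sketch such as the following. All names (`check_primer`, `has_self_complementarity`, the result keys) are hypothetical, the 4-base window used to flag self-complementarity is an assumption rather than a value from this description, the Tm is supplied by the caller, and the criterion concerning complementarity with another primer in the reaction is not covered.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_self_complementarity(seq: str, min_len: int = 4) -> bool:
    """True if any window of `min_len` bases has its reverse complement in seq."""
    seq = seq.upper()
    kmers = {seq[i:i + min_len] for i in range(len(seq) - min_len + 1)}
    return any(reverse_complement(k) in kmers for k in kmers)


def check_primer(primer: str, tm_c=None) -> dict:
    """Evaluate a primer against the composition criteria; True means 'passes'."""
    seq = primer.upper()
    gc_pct = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    results = {
        "length_17_28": 17 <= len(seq) <= 28,
        "gc_50_60": 50.0 <= gc_pct <= 60.0,
        "three_prime_g_or_c": seq[-1] in "GC",
        "no_gc_triplet_at_3_prime": not all(base in "GC" for base in seq[-3:]),
        "no_self_complementarity": not has_self_complementarity(seq),
    }
    if tm_c is not None:
        # Tm is computed elsewhere, e.g. by a nearest neighbour method.
        results["tm_55_80"] = 55.0 <= tm_c <= 80.0
    return results


if __name__ == "__main__":
    for name, passed in check_primer("GATTCCAGACCTGTAAGCAC", tm_c=58.0).items():
        print(name, passed)
```

- Returning one flag per criterion, rather than a single pass/fail value, makes it easier to see which property of a candidate primer needs to be redesigned.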
- a program selected from the group consisting of:
- the composition of the target nucleic acid is considered (i.e. the nucleotide sequence) as is the type of amplification reaction to be used.
- a target nucleic acid is amplified with a primer comprising a region that anneals to the target nucleic acid and a tag region that provides an annealing site for a distinct labeled primer that facilitates sequencing.
- a tag region that is unable to anneal to the template nucleic acid is selected to ensure that it does not cause non-specific annealing of the first primer in the first amplification reaction and the amplification of non-template nucleic acid.
- the tag region is unable to anneal to a nucleic acid in a sample being assayed to such a degree as to amplify nucleic acid to a detectable level (i.e. background amplification).
- the requirement that the tag region not anneal to a template nucleic acid does not require that the tag region not anneal under any conditions. Rather, it is preferred that the tag region is not capable of annealing to the template nucleic acid under conditions sufficient for annealing of the region of the primer that anneals to the target nucleic acid.
- the tag region may anneal to the target nucleic acid under low stringency conditions.
- the tag comprises a sequence of nucleotides that does not naturally occur in a sample being assayed. Methods for determining a sequence that is not present in a sample being assayed will be apparent to the skilled artisan.
- the nucleotide sequence of the tag region is analyzed using a program, such as, for example, BLAST to determine whether or not that sequence (or its complement) occurs naturally in a subject being assayed.
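- As a rough illustration of screening a tag against sample sequence, the sketch below tests whether a tag or its complement occurs exactly in a sequence and then joins the tag to a locus-specific region. It is a simplified stand-in for a BLAST search, not a replacement: exact substring matching will miss partial matches that could still anneal. The function names, the sample sequences and the placement of the tag 5' of the locus-specific region are assumptions made for the example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

M13_TAG = "GTAAACGACGGCCAGT"  # SEQ ID NO: 12 as recited in the description


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tag_occurs(tag: str, sample_seq: str) -> bool:
    """True if the tag or its reverse complement is found exactly in the sample."""
    sample = sample_seq.upper()
    return tag.upper() in sample or reverse_complement(tag) in sample


def make_tagged_primer(tag: str, locus_specific: str, sample_seq: str) -> str:
    """Join tag and locus-specific region (tag 5', an assumption) if tag is absent."""
    if tag_occurs(tag, sample_seq):
        raise ValueError("tag sequence (or its complement) occurs in the sample")
    return tag.upper() + locus_specific.upper()


if __name__ == "__main__":
    sample = "TTGACCATGCAAGGCTTAGCCATTGGA"  # hypothetical sample sequence
    print(make_tagged_primer(M13_TAG, "GATTCCAGACCTGTAAGCAC", sample))
```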
- a nucleotide sequence is selected from a subject different to that from which a sample being assayed is derived.
- the tag is derived from, for example, an unrelated mammal or plant or a virus or a bacterium or a fungus that is not a pathogen of the mammal or plant.
- a tag sequence is selected from a bacterial phage gene, e.g., the tag comprises a sequence from M13 phage GTAAACGACGGCCAGT (SEQ ID NO: 12) or a sequence from T7 phage TAATACGACTCACTATAGGG (SEQ ID NO: 13).
- such a tag is useful as, for example, a tag sequence for a primer used to amplify a sequence from a polyploid plant.
- an artificial sequence is used for a tag.
- a tag sequence described by Heath et al, Med Genet 57:272-280, 2000 is used (i.e., a tag sequence comprises a nucleotide sequence selected from the group consisting of: (i) TCCGTCTTAGCTGAGTGGCGTA (SEQ ID NO: 14);
- a zip-code sequence is used as a tag sequence.
- a tag sequence comprises the nucleotide sequence GGAGCACGCTATCCCGTTAGAC (SEQ ID NO: 20) or CGCTGCCAACTACCGCACATG (SEQ ID NO: 21) or CCTCGTGCGAGGCGTATTCTG (SEQ ID NO: 22) or
- a zip code sequence is generally a sequence of nucleotides that has been produced synthetically and is predicted not to occur in a nucleic acid derived from a specific subject.
- the tag region comprises a nucleotide sequence CACGACGTTGTAAAACGAC (SEQ ID NO: 24) or GTACATTAAGTTCCCATTAC (SEQ ID NO: 25) as described in the applicant's International Patent Application No. PCT/AU2006/000318.
- a tag sequence comprises a sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2 or the complement thereof.
- the tag comprises or consists of the sequence of the primer used to sequence a target nucleic acid.
- the tag sequence need not only comprise a region that provides a suitable template for annealing of a primer to an amplicon to thereby permit dideoxynucleotide-mediated sequencing.
- the tag comprises additional sequence to facilitate binding of a polymerase to enable sequencing of amplified nucleic acid.
- the tag comprises a spacer sequence at the 5'-end such that it is positioned at the 3'-end of the region that anneals to a target nucleic acid.
- a spacer region is rich in adenosine and/or thymine rather than cytosine and/or guanine. This is because a spacer region rich in cytosine and/or guanine increases the Tm of the first primer more than a spacer region rich in adenosine and/or thymine. Accordingly, a CG rich spacer may cause background by non-specific amplification of nucleic acids.
- Primer synthesis: following primer design and/or analysis, the primer is produced and/or synthesized.
- Methods for producing/synthesizing a primer are known in the art. For example, oligonucleotide synthesis is described in Gait (Ed) (In: Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford, 1984).
- a probe or primer may be obtained by biological synthesis (e.g. by digestion of a nucleic acid with a restriction endonuclease) or by chemical synthesis. For short sequences (up to about 100 nucleotides) chemical synthesis is preferable.
- a primer comprising deoxynucleotides is produced using standard solid-phase phosphoramidite chemistry.
- this method uses protected nucleoside phosphoramidites to produce an oligonucleotide of up to about 80 nucleotides.
- an initial 5'-protected nucleoside is attached to a polymer resin by its 3'-hydroxy group. The 5' hydroxyl group is then de-protected and the subsequent nucleoside-3'-phosphoramidite in the sequence is coupled to the de-protected group.
- An internucleotide bond is then formed by oxidizing the linked nucleosides to form a phosphotriester.
- by repeating these steps, an oligonucleotide of desired length and sequence is obtained. Suitable methods of oligonucleotide synthesis are described, for example, in Caruthers, M. H., et al., "Methods in Enzymology," Vol. 154, pp. 287-314 (1988).
- other methods of oligonucleotide synthesis include, for example, phosphotriester and phosphodiester methods (Narang et al., Meth. Enzymol. 68: 90, 1979) and synthesis on a support (Beaucage et al., Tetrahedron Letters 22: 1859-1862, 1981), and others described in "Synthesis and Applications of DNA and RNA," S. A. Narang, editor, Academic Press, New York, 1987, and the references contained therein.
- a plurality of primers are produced using standard techniques, each primer comprising a portion of a desired primer and a region that allows for annealing to another primer.
- the primers are then used in an overlap extension method that comprises allowing the primers to anneal and synthesizing copies of a complete primer using a polymerase.
- a primer can also include one or more nucleic acid analogs.
- a primer comprises a phosphate ester analog and/or a pentose sugar analog.
- a primer of the invention comprises a polynucleotide in which the phosphate ester and/or sugar phosphate ester linkages are replaced with other types of linkages, such as N-(2-aminoethyl)-glycine amides and other amides (see, e.g., Nielsen et al., Science 254: 1497-1500, 1991; WO 92/20702; and USSN 5,719,262); morpholinos (see, for example, USSN 5,698,685); carbamates (for example, as described in Stirchak & Summerton, J.
- Phosphate ester analogs include, but are not limited to, (i) C1-C4 alkylphosphonate, e.g.
- a primer of the invention comprises one or more LNA and/or PNA residues.
- Primers comprising one or more LNA or PNA residues have been previously shown to anneal to nucleic acid template at a higher temperature than a primer that comprises substantially the same sequence but does not comprise the LNA or PNA residues.
- a primer used in a sequencing reaction preferably comprises a detectable marker (for example, a fluorescent dye) to enable detection of fragments to thereby determine the position of a nucleotide in a sequence.
- a primer comprises or is conjugated to a detectable marker.
- detectable marker refers to any moiety which can be attached to a primer and: (i) provides a detectable signal; and/or (ii) interacts with a second detectable marker to modify the detectable signal provided by the second detectable marker, e.g., by Fluorescent Resonance Energy Transfer (FRET).
- annealing e.g., duplex formation
- a member of a binding complex or affinity set e.g., affinity, antibody/antigen, ionic complexation, hapten/ligand, e.g. biotin/avidin.
- Labeling of a primer is accomplished using any one of a large number of known techniques employing known detectable markers, linkages, linking groups, reagents, reaction conditions, and analysis and purification methods.
- Detectable markers include, but are not limited to, light-emitting or light-absorbing compounds which generate or quench a detectable fluorescent, chemiluminescent, or bioluminescent signal (for example, as described in Kricka, L. in Nonisotopic DNA Probe Techniques (1992), Academic Press, San Diego, pp. 3-28).
- Fluorescent reporter dyes useful for labeling biomolecules include, but are not limited to, fluoresceins (see, for example USSN 5,188,934; 6,008,379; or USSN 6,020,481), rhodamines (as described, for example, in USSN 5,366,860; USSN 5,847,162; USSN 5,936,087; or USSN 6,051,719), benzophenoxazines (for example, as described in USSN U.S. Pat. No.
- Detectable markers also include, but are not limited to, semiconductor nanocrystals, or Quantum Dots® (as described, for example in US Pat. No. 5,990,479 or US Pat. No. 6,207,392). Suitable methods for linking a detectable marker to a primer (or labeling a primer) are also described in the references supra.
- a primer is produced with a fluorescent nucleotide analog to facilitate detection.
- for example, coupling allylamine-dUTP to the succinimidyl-ester derivatives of a fluorescent dye or a hapten (such as biotin or digoxigenin) enables preparation of many common fluorescent nucleotides.
- Other fluorescent nucleotide analogs are also known in the art and described, for example, in Jameson, Methods Enzymol. 278:363-390, 1997 or USSN 6,268,132.
- Such nucleotide analogs are incorporated into nucleic acids, e.g., DNA and/or RNA, or oligonucleotides, via either enzymatic or chemical synthesis (e.g., a method described supra).
- a primer is labeled with a fluorescent dye, such as, for example, 6-carboxyfluorescein (FAM), VIC, NED or PET.
- a simple two-step process is used.
- in the first step, an amine-modified nucleotide, 5-(3-aminoallyl)-dUTP, is incorporated into DNA using conventional enzymatic labeling methods. This step ensures relatively uniform labeling of the probe with primary amine groups.
- in the second step, the amine-modified DNA is chemically labeled using an amine-reactive fluorescent dye.
- Various commercial kits for labeling a primer are known in the art and available from, for example, Molecular Probes (Invitrogen detection Technology) (Eugene, OR, USA) or Applied Biosystems (Foster City, CA, USA).
- a set of first primers and/or a second primer or set thereof is synthesized.
- a primer comprising a tag region is produced by coupling an oligonucleotide comprising a tag region to an oligonucleotide comprising an allele-specific region.
- an oligonucleotide comprising a tag region is linked to another oligonucleotide using a RNA ligase, such as, for example T4 RNA ligase (as available from New England Biolabs).
- RNA ligase catalyzes ligation of a 5' phosphoryl-terminated nucleic acid donor to a 3' hydroxyl-terminated nucleic acid acceptor through the formation of a 3'-5' phosphodiester bond, with hydrolysis of ATP to AMP and PPi.
- Suitable methods for the ligation of DNA and/or RNA molecules using a RNA ligase are known in the art and/or described in Ausubel et al (In: Current Protocols in Molecular Biology. Wiley Interscience, ISBN 047 150338, 1987) and Sambrook et al (In: Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, New York, Third Edition 2001).
- a method for sequencing, e.g., to determine one or a plurality of position(s) of a specific nucleotide in a target nucleic acid from genomes of a polyploid subject, comprises performing a dideoxynucleotide-mediated sequencing reaction with one dideoxynucleotide, i.e., either ddATP or ddTTP or ddGTP or ddCTP.
- Dideoxynucleotide-mediated sequencing methods are known in the art.
- one such method is also known as Sanger sequencing.
- a Sanger sequencing method comprises amplifying a target nucleic acid, e.g., using PCR and sequencing the PCR amplicon.
- Sequencing involves annealing a primer to the amplicon and extending the sequence of nucleotides from the primer using a DNA polymerase, an enzyme that replicates DNA. Included with the primer and DNA polymerase are the four naturally-occurring deoxynucleotide bases, along with a low concentration of a chain terminating nucleotide (most commonly a dideoxynucleotide). Limited incorporation of the chain terminating nucleotide by the DNA polymerase results in a series of related DNA fragments that are terminated only at positions where that particular nucleotide is used. The dideoxynucleotides are added in limited quantities.
- the dideoxynucleotides have a detectable label, e.g., a fluorescent label.
- the primer comprises the detectable label.
- the DNA polymerase catalyzes the production of a sequence of deoxynucleotides.
- if a dideoxynucleotide is joined to a base, then that fragment of DNA can no longer be elongated since a dideoxynucleotide lacks a crucial 3'-OH group. Fragments of nucleic acid are then generated that are terminated by a ddNTP at a position corresponding to the position of each corresponding dNTP in the target nucleic acid.
- the fragments are then size-separated by electrophoresis in a slab polyacrylamide gel, or in a narrow glass tube (capillary) filled with a viscous polymer (e.g., Pop 7; Applied Biosystems, USA).
- Detection of the size of labeled nucleic acid fragments then facilitates or permits determination of one or a plurality of position(s) of the specific nucleotide corresponding to the ddNTP used in the sequencing reaction in a target nucleic acid, i.e., relative to the position of the first nucleotide of the primer.
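- The relationship between fragment size and nucleotide position in a single-ddNTP reaction can be sketched as follows. The function name `termination_fragment_lengths` and the example sequence are hypothetical; the sketch assumes the full-length extension product (the newly synthesized strand, beginning with the primer) is known, and it reports fragment lengths counted from the first nucleotide of the primer.

```python
def termination_fragment_lengths(
    extended_strand: str, primer_len: int, ddntp_base: str
) -> list:
    """Lengths of fragments terminated by one ddNTP in the synthesized strand.

    `extended_strand` is the full-length product (5'->3') starting with the
    primer; termination can only occur at bases added after the primer.
    Each returned length equals the 1-based position of a terminating base.
    """
    seq = extended_strand.upper()
    base = ddntp_base.upper()
    if base not in ("A", "C", "G", "T"):
        raise ValueError("ddntp_base must be one of A, C, G or T")
    return [i + 1 for i in range(primer_len, len(seq)) if seq[i] == base]


if __name__ == "__main__":
    # A 4-nucleotide "primer" (ACGT) followed by 10 extended bases.
    print(termination_fragment_lengths("ACGTACGTAAGTCA", 4, "A"))
```

- Each length in the returned list corresponds to one labeled fragment that would be resolved by electrophoresis when only that ddNTP is included in the reaction.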
- each position of a nucleotide in a target nucleic acid is determined by performing a method comprising:
- performing an amplification reaction, e.g., a polymerase chain reaction (PCR), with a primer comprising a locus-specific region capable of annealing to a target nucleic acid and a tag-sequence that does not anneal to the target nucleic acid to thereby amplify the target nucleic acid, wherein amplicons of the target nucleic acid comprise the tag sequence;
- the primer capable of annealing to the tag sequence is labeled with a detectable marker, e.g., a fluorescent marker to facilitate detection of the nucleic acid fragments.
- an unlabeled ddNTP is used in the sequencing reaction. Accordingly, it is not necessary to use ddNTPs labeled with a detectable marker. Accordingly, the method can be performed to determine the position of any dNTP in a target nucleic acid, or may be performed several times, each with a different ddNTP to determine the position of some or all dNTPs in the target nucleic acid. Clearly, such a method results in a cost saving in so far as it does not require multiple different labeled reagents.
- reaction conditions include a reaction mixture comprising a buffer, e.g., comprising 50 mM KCl; 10 mM Tris-HCl, pH 8.4; 2.5 mM MgCl₂; 200 µM of each dNTP; and 200 µg/mL of gelatine or 0.05% Tween 20, 0.05% NP40, 3 mM (or higher) MgCl₂, in a buffer; 10 mM Tris-HCl is preferred, at pH 8.0 to 8.5.
- the reaction mixtures also contain primer, template, a suitable polymerase, e.g., Taq polymerase and a ddNTP.
- ddNTPs are preferably included in a sequencing reaction at a ratio to dNTP of about dGTP:ddGTP (1:6), dATP:ddATP (1:32), TTP:ddTTP (1:48), and dCTP:ddCTP (1:16).
- a sequencing reaction is then performed by incubating the reaction mixture for a time and under conditions sufficient to denature a double stranded nucleic acid (e.g., at about 95°C for at least about 30 seconds), then incubating the reaction mixture for a time and under conditions sufficient for a primer to anneal to a target nucleic acid (e.g., about 65°C-75°C for about 30 seconds), then incubating the reaction mixture for a time and under conditions sufficient for a polymerase to extend the primer and preferably incorporate a ddNTP into a nucleic acid (e.g., about 70°C-75°C for about 30 seconds to about 60 seconds).
- the resulting nucleic acid fragments are then resolved, e.g., using electrophoresis and the size of the fragments determined, e.g., by detecting a label using suitable means, e.g., in the case of a fluorescent label an electrophoresis gel or capillary is exposed to a laser emitting light at a suitable wavelength to stimulate the label and emission of light from the label is detected. The position of the light emission in the gel or capillary is indicative of the size of a nucleic acid fragment.
- one or a plurality of position(s) of a specific nucleotide in a target nucleic acid from one polyploid subject is compared to one or a plurality of position(s) of the specific nucleotide in a target nucleic acid from another polyploid subject.
- this comparison comprises aligning the sequence information, e.g., one or a plurality of position(s) of a specific nucleotide in the target nucleic acid from each subject.
- Such alignment may be performed by an individual, e.g. by eye, whereby the individual identifies patterns in the sequences or positions of the nucleotides.
- sequence information or position of each nucleotide is determined using an algorithm, e.g., a computer implemented algorithm that recognizes patterns, e.g., an anti-correlation algorithm. Alignment of sequence information then permits differences between the sequences, e.g., as a result of a presence of a variable nucleotide to be determined.
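- A comparison of nucleotide positions between two subjects can be sketched with simple set operations. This is not the anti-correlation algorithm mentioned above, whose details are not given here; the function name `compare_positions` and the example positions are hypothetical, and the sketch assumes each subject is represented by the list of positions at which the specific nucleotide was detected.

```python
def compare_positions(positions_a, positions_b) -> dict:
    """Compare positions of one nucleotide detected in two subjects.

    Positions found in only one subject are candidate sites of a variable
    nucleotide; positions found in both are shared.
    """
    a, b = set(positions_a), set(positions_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


if __name__ == "__main__":
    subject_1 = [5, 9, 10, 14]   # hypothetical fragment positions
    subject_2 = [5, 9, 14, 17]
    print(compare_positions(subject_1, subject_2))
```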
- in a polymerase chain reaction (PCR), two non-complementary nucleic acid primers comprising at least about 20 nucleotides, and more preferably at least 30 nucleotides, are hybridized to different strands of a target nucleic acid, and specific nucleic acid copies of the target are amplified enzymatically.
- Reagents required for a PCR include for example, one or more primers (described herein), a suitable polymerase, deoxynucleotides and/or ribonucleotides, and a buffer. Suitable reagents are described for example, in Dieffenbach (ed) and Dveksler (ed) (In: PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratories, NY, 1995).
- suitable polymerases for use in the method of the invention include a DNA polymerase, a RNA polymerase, a reverse transcriptase, a T7 polymerase, a SP6 polymerase, a T3 polymerase, Sequenase™, a Klenow fragment, a Taq polymerase, a Taq polymerase derivative, a Taq polymerase variant, a Pfu polymerase, a Pfx polymerase, a Tfi polymerase, an AmpliTaq™ FS polymerase, a thermostable DNA polymerase with minimal or no 3'-5' exonuclease activity, or an enzymatically active variant or fragment of any of the above polymerases.
- a polymerase used in the method of the invention is a thermostable polymerase.
- a mixture of two or more polymerases is used.
- the mixture of a Pfx or Pfu polymerase and a Taq polymerase has been previously shown to be useful for amplifying templates comprising a high GC content or for amplifying a large target nucleic acid.
- Suitable commercial sources for a polymerase useful for the performance of the invention will be apparent to the skilled artisan and include, for example, Stratagene (La Jolla, CA, USA), Promega (Madison, WI, USA), Invitrogen (Carlsbad, CA, USA), Applied Biosystems (Foster City, CA, USA) and New England Biolabs (Beverly, MA, USA).
- strand displacement amplification utilizes oligonucleotides, a DNA polymerase and a restriction endonuclease to amplify a target sequence.
- the oligonucleotides are hybridized to a target nucleic acid and the polymerase used to produce a copy of this region.
- the duplexes of copied nucleic acid and target nucleic acid are then nicked with an endonuclease that specifically recognizes a sequence of nucleotides at the beginning of the copied nucleic acid.
- the DNA polymerase recognizes the nicked DNA and produces another copy of the target region at the same time displacing the previously generated nucleic acid.
- an advantage of strand displacement amplification (SDA) is that it occurs in an isothermal format, thereby facilitating high-throughput automated amplification.
- the target nucleic acid is then used in a sequencing reaction described herein according to any embodiment.
- the amplification product is isolated, e.g., by gel electrophoresis, or is separated from unincorporated dNTPs and/or other components of an amplification reaction, e.g., by size exclusion chromatography prior to sequencing.
- the sequencing reaction is performed without subcloning an amplicon, i.e., a product of an amplification reaction.
- a method for determining linkage to a genome of a polyploid subject preferably makes use of a polyploid subject in which at least a target nucleic acid in one genome of the subject is deleted.
- the present inventors have made use of an aneuploid subject.
- the aneuploid or mutant polyploid subject is a nullisomic subject, e.g., a nullisomic-tetrasomic aneuploid of a hexaploid subject.
- Such a subject lacks one genome, i.e., one chromosome set.
- Suitable nullisomic polyploid subjects are known in the art and described, for example, in Sears 1966, In: Chromosome Manipulations and Plant Genetics, Riley and Lewis (Eds), Oliver and Boyd, Edinburgh.
- an aneuploid subject or mutant polyploid subject is a ditelosomic subject.
- a ditelosomic subject has two copies of one arm of a chromosome and lacks the other arm of one of the chromosomes.
- A suitable ditelosomic subject is described, for example, in Sears and Sears Proc. 5th Int. Wheat Genetics Symposium.
- an aneuploid subject or a mutant polyploid subject is a monotelodisomic subject.
- Monotelodisomic subjects have at least one complete copy of a chromosome and another copy of a chromosome lacking one arm.
- a mutant polyploid subject comprises a mutation in one copy of a chromosome, wherein a region of the chromosome is deleted.
- a collection of lines of wheat comprising overlapping deletions have been produced and are described, for example, in Endo and Gill, J. Hered., 87: 295-307, 1996. Methods for producing additional deletion lines will be apparent to the skilled artisan and/or described, for example, in Endo and Gill, J. Hered., 87: 295-307, 1996.
- the variable nucleotide is mapped to a genome of the polyploid subject.
- failure to detect the variable nucleotide in a mutant polyploid subject described herein above indicates that the variable nucleotide occurs within the genome comprising the mutation in the mutant subject.
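- The mapping logic described above can be sketched as a lookup over mutant lines: a variable nucleotide detected in the unmodified subject but not in a line lacking a given genome is assigned to that genome. The function name `map_to_genome` is hypothetical, and the genome labels A, B and D (the genomes of hexaploid wheat) and the detection results are illustrative inputs, not data from this description.

```python
def map_to_genome(detected_in_euploid: bool, detected_by_line: dict) -> list:
    """Assign a variable nucleotide to genome(s) by its absence in mutant lines.

    `detected_by_line` maps the genome affected in each mutant line (e.g. the
    genome whose chromosome is missing in a nullisomic line) to whether the
    variable nucleotide was detected in that line.
    """
    if not detected_in_euploid:
        # Nothing to map if the nucleotide is absent from the intact subject.
        return []
    return sorted(
        genome for genome, detected in detected_by_line.items() if not detected
    )


if __name__ == "__main__":
    # Illustrative results for lines lacking the A, B or D genome copy.
    print(map_to_genome(True, {"A": True, "B": False, "D": True}))
```

- An empty result would indicate that the variable nucleotide was detected in every line tested and so could not be assigned with this set of lines.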
- the present inventors have used aneuploid stocks of wheat to determine the presence or absence of a variable nucleotide in a genome of hexaploid wheat.
- the present invention is also useful for identifying a mutation responsible for or causative of a trait.
- a method comprises inducing or producing a mutation in a polyploid subject. Suitable methods for inducing or producing a mutation will be apparent to the skilled artisan.
- the present invention is useful for detecting a mutation induced by TILLING.
- TILLING: In the first step of the TILLING process, single base pair changes are induced in a population of plants by treating seeds (or pollen) with a chemical mutagen (e.g., ENU or EMS or MNU).
- seeds or pollen of a polyploid subject are contacted with a mutagen for a time and under conditions sufficient to induce a mutation in at least one genome of said polyploid subject.
- the seed or pollen is contacted with a chemical mutagen for a period of time from about 15 minutes to about 1 hour or about 30 minutes or about 45 minutes.
- Adult polyploid subjects are then grown from the seed and/or produced using the pollen and bred to produce a generation where mutations are stably inherited.
- Methods of TILLING are known in the art and/or described, for example, in Suzuki et al., Mol Genet Genomics, 279: 213-223, 2007 or Slade et al., Nat Biotechnol. 25: 75-81, 2005.
- the plants can be screened using, for example, a phenotypic assay to identify one or more plants having a trait of interest.
- exemplary traits include, for example, resistance to a plant pathogen or toxin, resistance to drought, water stress or cold or an improved nutritional quality, e.g., wheat having an increased or reduced level of amylose relative to amylopectin for production of bread or noodles.
- mutations in a specific target nucleic acid are selected prior to phenotypic screening.
- a method as described herein according to any embodiment is performed to identify a polyploid subject comprising a mutation in a target nucleic acid.
- the identified subject(s) is(are) then screened to identify that subject (those subjects) having the desired trait.
- plants produced using a TILLING method are screened to identify those plants having a mutation in a granule-bound starch synthase I gene (encoding GBSSI).
- Wheat flour from a plant having such a mutation is then assayed to determine the level of amylose relative to the level of amylopectin.
- a plant having a reduced level of amylose relative to amylopectin is desirable for noodle making as it improves noodle texture.
- genes that may be involved in producing a trait of interest will be apparent to the skilled artisan and include, for example, a gene conferring pesticide resistance, e.g., a chitinase; ⁇-kafirin; wheatwin or WPR4; thionin; thaumatin or thaumatin-like protein such as zeamatin; or a gene encoding a protein involved in tolerance to desiccation, e.g., HVA1 or DREB1A.
- the method described supra is clearly useful for producing a polyploid subject, e.g., polyploid crop plants having a desired trait, e.g., resistance to a pest or to an abiotic stress or having an improved nutritional quality.
- the present invention also provides a polyploid subject, particularly, a polyploid crop plant produced by a method described herein according to any embodiment.
- the present invention also provides a polyploid crop plant having a desirable trait produced by performing a method as described herein according to any embodiment.
- a polymorphism or mutation as identified using a method as described herein according to any embodiment is useful as a genetic marker, e.g., to identify a polyploid subject having a desired trait or for genetic mapping or for marker assisted breeding.
- the mutation, polymorphism or genetic marker is detected using any method known in the art.
- the mutation, polymorphism or genetic marker is detected by performing a method as described in the assignee's co-pending application USSN 60/973,928 filed in the United States Patent and Trademark Office on September 20, 2007 entitled "Method of amplifying nucleic acid".
- a target nucleic acid on one genome of a polyploid subject and comprising the mutation, polymorphism or genetic marker is amplified in a first PCR amplification phase in which a set of first primers is used to selectively amplify the nucleic acid.
- a second phase amplification is performed using one or more second primers comprising (i) an allele specific region comprising a sequence complementary to the target nucleic acid adjacent to the polymorphism or mutation and that has a lower Tm than the first primers; and (ii) a tag region comprising a sequence that does not anneal to the template nucleic acid, however increases the Tm of the second primer to about the Tm of the first primer.
- the allele specific region of the second primer(s) anneals to the amplification product of the first phase amplification, thereby permitting amplification with the second primer(s) and first primers.
- the sequence of the second primer(s) is incorporated into amplification products thereby permitting the annealing temperature to be increased, and for the entire second primer and the first primer to anneal to target sequences and prime amplification by PCR.
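- The effect of the tag region on melting temperature can be illustrated with a rough calculation. The sketch below assumes the Wallace rule (2 °C per A/T and 4 °C per G/C), which is only an approximation for short oligonucleotides and is not stated in the source; both sequences are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Approximate Tm in degrees C using the Wallace rule (an assumption here)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical allele specific region and 5' tag region of a second primer.
allele_specific = "GTTGGCGATTA"
tag = "GCGGGC"

# Before the tag is incorporated into product, only the allele specific
# region anneals; afterwards the entire second primer can anneal.
print(wallace_tm(allele_specific))
print(wallace_tm(tag + allele_specific))
```

- Under these assumptions the tag raises the estimated Tm of the second primer once its sequence has been incorporated into amplification products, which is what permits the annealing temperature to be increased.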
- a polymorphism or mutation is detected.
- one or more nucleotide(s) positioned at the 3' end of the allele specific region is (are) complementary to an allele of the polymorphism or mutation. The 3' end of the second primer anneals only in the presence of that allele, and only then permits amplification by PCR.
- the use of two or more second primers complementary to different alleles permits detection of different alleles.
- the two or more second primers may be used in the same reaction if each primer is labeled so as to permit differentiation between amplification products produced by different primers, e.g., using tag regions having different molecular weights or different detectable markers.
- each second primer is used in a separate reaction.
- Such a method involves amplifying a target nucleic acid from a genome of a polyploid subject comprising the mutation or polymorphism, e.g., using PCR. Methods for producing primers for selectively or preferentially amplifying one genome of a polyploid subject will be apparent to the skilled artisan and/or are described herein. The polymorphism or mutation is then detected by any of a variety of methods.
- ligase chain reaction (described in, for example, EP 320,308 and US 4,883,750) uses two or more oligonucleotides that hybridize to adjacent regions of a target nucleic acid. A ligase enzyme is then used to link the oligonucleotides. In the presence of an allele of a polymorphism or mutation that is not complementary to the nucleotide at the end of one primer that abuts the other primer, the ligase is unable to link the primers and no detectable amplification product is produced. However, a ligation product is produced in the presence of an allele that is complementary to the end of the primer adjacent to the other primer when annealed to a target nucleic acid.
- the ligated oligonucleotides then become a target for further oligonucleotides.
- the ligated fragments are then detected, for example, using electrophoresis, or MALDI-TOF.
- one or more of the probes is labeled with a detectable marker, thereby facilitating rapid detection.
- the RNA-DNA duplex formed is a target for RNase H, which cleaves the probe.
- the cleaved probe is then detected using, for example, electrophoresis or MALDI-TOF.
- a polymorphism or mutation that introduces or alters a sequence that is a recognition sequence for a restriction endonuclease is detected by digesting DNA with the endonuclease and detecting the fragment of interest using, for example, Southern blotting (described in Ausubel et al. (In: Current Protocols in Molecular Biology. Wiley Interscience, ISBN 047 150338, 1987) and Sambrook et al. (In: Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, New York, Third Edition 2001)).
- a target nucleic acid comprising a polymorphism or mutation is amplified, e.g., using PCR, and the amplicons produced in the PCR are digested with a restriction endonuclease that cleaves an allele of the polymorphism or mutation.
- the resulting amplicons are then detected by gel electrophoresis or capillary electrophoresis or mass spectrometry or Southern blotting.
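- The digestion step can be sketched as follows. This is an illustrative example only: the amplicon sequences are hypothetical, EcoRI (recognition sequence GAATTC, cutting after the first base) is used merely as a familiar enzyme, and only the top strand is modeled.

```python
from typing import List


def digest(amplicon: str, site: str, cut_offset: int) -> List[int]:
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


# Hypothetical amplicons differing at one nucleotide within the site.
allele_1 = "ACGTGAATTCTTGCA"  # site present, amplicon is cleaved
allele_2 = "ACGTGAGTTCTTGCA"  # site destroyed, amplicon stays intact

print(digest(allele_1, "GAATTC", 1))
print(digest(allele_2, "GAATTC", 1))
```

- Comparing the fragment lengths for the two amplicons shows how the allele that retains the recognition sequence is distinguished by electrophoresis or mass spectrometry.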
- a polymorphism or mutation is detected using single stranded conformational polymorphism (SSCP) analysis.
- SSCP analysis relies upon the formation of secondary structures in nucleic acids and the sequence dependent nature of these secondary structures.
- an amplification method preferably a PCR method, is used to selectively amplify a target nucleic acid from one genome of a polyploid subject that comprises a polymorphism or mutation.
- the amplified nucleic acids are then denatured, cooled and analyzed using, for example, non-denaturing polyacrylamide gel electrophoresis, mass spectrometry, or liquid chromatography (e.g. HPLC or dHPLC).
- Regions that comprise different sequences form different secondary structures, and as a consequence migrate at different rates through, for example, a gel and/or a charged field. Accordingly, both homozygous forms of a polymorphism or mutation and a heterozygous form of the polymorphism or mutation may be detected using such analysis.
- a detectable marker may be incorporated into a probe/primer useful in SSCP analysis to facilitate rapid marker detection.
- Allele specific PCR (as described, for example, in Liu et al., Genome Research, 7: 389-398, 1997) is also useful for determining the presence of one or other allele of a polymorphism or mutation.
- An oligonucleotide is designed, in which the most 3' base of the oligonucleotide hybridizes with the polymorphism or mutation.
- During PCR amplification, if the 3' end of the oligonucleotide does not hybridize to a target sequence, little or no PCR product is produced, indicating that a base other than that present in the oligonucleotide is present at the site of polymorphism or mutation in the sample.
- PCR products are then detected using, for example, gel or capillary electrophoresis or mass spectrometry.
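- The logic of the allele specific primer can be sketched in a few lines. This is a simplified model under stated assumptions: the primer is written as the same sequence as the template strand shown (in practice it anneals to the complementary strand), priming is treated as all-or-nothing, and all sequences and names are hypothetical.

```python
def product_expected(primer: str, template: str) -> bool:
    """Return True if the primer, including its 3'-most base, matches the template."""
    body, three_prime = primer[:-1], primer[-1]
    pos = template.find(body)
    if pos == -1:
        return False  # the primer body has no binding site at all
    snp_index = pos + len(body)
    # Amplification is expected only when the 3' base matches the allele.
    return snp_index < len(template) and template[snp_index] == three_prime


# Hypothetical templates carrying a C or a T allele at the same position.
template_c = "TTGACCGTAGCATCGGA"
template_t = "TTGACCGTAGTATCGGA"
primer_c = "GACCGTAGC"  # 3'-most base corresponds to the C allele

print(product_expected(primer_c, template_c))
print(product_expected(primer_c, template_t))
```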
- PCR products are detected using real-time PCR analysis.
- a label that binds to double stranded nucleic acid and fluoresces is incorporated into a PCR, and the level of fluorescence is detected when the PCR is at a temperature permissive for double stranded nucleic acid formation.
- Detection of increasing levels of fluorescence during PCR amplification indicates that PCR product is being produced and indicates the allele present at the site of a polymorphism or mutation.
- An exemplary real-time amplification method is described in US Pat. No. 6,174,670.
- Primer extension methods are also useful for the detection of a polymorphism or mutation.
- An oligonucleotide is used that hybridizes to the region of a nucleic acid adjacent to the polymorphism or mutation. This oligonucleotide is then used in a primer extension protocol with a polymerase and a free nucleotide diphosphate or dideoxynucleotide triphosphate that corresponds to either or any of the possible nucleotides that occur at the polymorphism or mutation.
- the nucleotide diphosphate is labeled with a detectable marker (e.g., a fluorophore).
- unbound labeled nucleotide diphosphates are removed, e.g. using size exclusion chromatography or electrophoresis, or hydrolyzed, using for example, alkaline phosphatase, and the incorporation of the labeled nucleotide into the oligonucleotide is detected, indicating the nucleotide that is present at the site of the polymorphism or mutation.
- the present invention also extends to high-throughput forms of primer extension analysis, such as, for example, minisequencing (Syvanen et al., Genomics 9: 341-342,
- a probe or primer (or multiple probes or primers) is immobilized on a solid support (e.g. a glass slide).
- a sample comprising amplified target nucleic acid comprising a mutation or polymorphism is then brought into direct contact with the probe/s or primer/s, and a primer extension protocol performed with each of the free nucleotide bases labeled with a different detectable marker.
- the nucleotide present at a polymorphism or mutation is then determined by detecting the detectable marker bound to each probe and/or primer.
- LNA locked nucleic acid
- PNA protein-nucleic acid
- LNA and PNA molecules bind, with high affinity, to nucleic acid, in particular, DNA.
- Fluorophores, in particular rhodamine or hexachlorofluorescein, conjugated to the LNA or PNA probe fluoresce at a significantly greater level upon hybridization of the probe to target nucleic acid.
- the level of increase of fluorescence is not enhanced to the same level when even a single nucleotide mismatch occurs.
- the degree of fluorescence detected in a sample is indicative of the presence of a mismatch between the LNA or PNA probe and the target nucleic acid, such as, in the presence of a SNP.
- fluorescently labeled LNA or PNA technology is used to detect a single base change in a nucleic acid that has been previously amplified using, for example, an amplification method known in the art and/or described herein.
- LNA or PNA detection technology is amenable to high-throughput detection of one or more markers by immobilizing an LNA or PNA probe to a solid support, as described in Oram et al., Clin. Chem. 45: 1898-1905, 1999.
- Molecular Beacons™ are also useful for detecting polymorphisms or mutations in an amplified product (see, for example, Mhlanga and Malmberg, Methods 25: 463-471, 2001).
- Molecular Beacons™ are single stranded nucleic acid molecules with a stem-and-loop structure.
- the loop structure is complementary to the region surrounding the SNP of interest.
- the stem structure is formed by annealing two "arms" complementary to each other that are on either side of the probe (loop).
- a fluorescent moiety is bound to one arm and the other arm comprises a quenching moiety that suppresses any detectable fluorescence when the molecular beacon is not bound to a target sequence.
- Upon binding of the loop region to its target nucleic acid the arms are separated and fluorescence is detectable. However, even a single base mismatch significantly alters the level of fluorescence detected in a sample. Accordingly, the presence or absence of a particular nucleotide at the site of a polymorphism or mutation is determined by the level of fluorescence detected.
- the polymorphisms and/or mutations identified by a method described herein according to any embodiment are useful for genetic mapping, e.g., to identify a gene causative of a trait of interest. Accordingly, the present invention also provides a method for identifying a gene associated with or causative of a trait of interest, said method comprising:
- the present invention also provides a method for identifying a gene associated with or causative of a trait in a polyploid subject, said method comprising:
- each member of said panel comprises at least one of said polymorphisms or mutations.
- the polyploid subjects are plants.
- the panel of polyploid subjects are near-isogenic.
- the term "near-isogenic polyploid subjects" shall be taken to mean a population of polyploid subjects having identity over a substantial proportion of their genomes, notwithstanding the presence of sufficiently few differences to permit the contribution of a distinct allele or gene to the trait to be determined by a comparison of the trait phenotypes of the population.
- recombinant inbred lines, lines produced by several generations of backcrossing, or siblings are suitable near-isogenic lines for the present purpose.
- a segregating population is required.
- Experimental populations such as, for example, an F2 generation, a backcross (BC) population, or recombinant inbred lines (RIL), can be used as a mapping population.
- Bulk segregant analysis for the rapid detection of markers at specific genomic regions using segregating populations is described by Michelmore et al., Proc. Natl. Acad. Sci. (USA) 88: 9828-9832, 1991.
- In F2 mapping populations, F2 plants are used to determine genotype, and F2 families to determine phenotype.
- Recombinant inbred lines are produced by single-seed descent.
- Single Marker Analysis is used to detect a locus in the vicinity of a single genetic marker.
- the trait in a population of plants segregating for a particular marker is compared according to the marker class. Presence or absence or differences provides an estimate of the phenotypic effect of substituting one allele for another allele at the locus.
- a simple statistical test, such as a t-test or F-test, is used. A significant value indicates that a locus is located in the vicinity of the marker.
- Single point analysis does not require a complete molecular linkage map. The further the locus is from the marker, the less likely it is to be detected statistically, as a consequence of recombination between the marker and the gene.
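- A single marker comparison of this kind can be sketched with the standard library. The example below computes a Welch t statistic for trait values grouped by marker class; the choice of Welch's form, the function name and the trait values are assumptions made for illustration, and no significance threshold is implied.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Welch t statistic comparing trait means of two marker classes."""
    standard_error = sqrt(
        variance(class_a) / len(class_a) + variance(class_b) / len(class_b)
    )
    return (mean(class_a) - mean(class_b)) / standard_error


# Hypothetical trait values for plants carrying each marker allele.
allele_1_plants = [10.0, 12.0, 11.0, 13.0]
allele_2_plants = [7.0, 8.0, 6.0, 7.0]

print(round(welch_t(allele_1_plants, allele_2_plants), 2))
```

- A large absolute value of the statistic suggests that a locus affecting the trait lies near the marker; as noted above, recombination weakens the signal as the distance between locus and marker grows.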
- the association between marker genotype and phenotype is determined by a process comprising:
- the principle of interval mapping is to test a model for the presence of a QTL at many positions between two mapped marker loci.
- This model is a fit of a presumptive QTL to the trait, wherein the suitability of the fit is tested by determining the maximum likelihood that a QTL for the trait lies between two segregating markers. For example, in the case of a QTL located between two segregating markers, the 2-loci marker genotypes of segregating progeny will each contain mixtures of QTL genotypes. Accordingly, it is possible to search for loci parameters that best approximate the distribution in the trait for each marker class. Models are evaluated by computing the likelihood of the observed distributions with and without fitting a QTL effect. The map position of a QTL is determined as the maximum likelihood from the distribution of likelihood values (LOD scores: ratio of likelihood that the effect occurs by linkage: likelihood that the effect occurs by chance), calculated for each locus.
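- The LOD score described above can be written as a short calculation. This sketch assumes the likelihoods have already been computed by some model-fitting step that is not shown; the function names, map positions and likelihood values are hypothetical.

```python
from math import log10
from typing import Dict, Tuple


def lod_score(likelihood_linkage: float, likelihood_chance: float) -> float:
    """Log10 of the ratio of the likelihood under linkage to that under chance."""
    return log10(likelihood_linkage / likelihood_chance)


def best_position(
    likelihood_by_position: Dict[float, float], likelihood_chance: float
) -> Tuple[float, float]:
    """Return the tested position with the highest LOD score, and that score."""
    scores = {
        position: lod_score(likelihood, likelihood_chance)
        for position, likelihood in likelihood_by_position.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical likelihoods at three tested map positions (in cM).
likelihoods = {0.0: 1e-10, 5.0: 1e-7, 10.0: 1e-8}
print(best_position(likelihoods, likelihood_chance=1e-10))
```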
- Interval mapping by regression is a simplification of the maximum likelihood method supra wherein basic QTL analysis or regression on coded marker genotypes is performed, except that phenotypes are regressed on the probability of a QTL genotype as determined from the linkage between the trait and the nearest flanking markers.
- regression mapping gives estimates of QTL position and effect that are almost identical to those given by the maximum likelihood method. The approximation deviates only at places where there are large gaps, or many missing genotypes.
- the composite interval mapping may be repeated to look for additional loci.
- two or more distinct regions of the genome can be nominated as candidate loci, and a gamete relationship matrix constructed for each candidate locus, and a 2-locus regression performed for each pair of loci, determining a best fit for the interacting effects between the two loci or alleles at those loci, including any dominance or additive effects.
- the algorithm described by Carlborg et al, Genetics (2000) can be used for simultaneous mapping.
- the present example describes reagents that are used in one embodiment of a method of the invention to detect a mutation or polymorphism in nucleic acid from a polyploid subject, such as bread wheat.
- Genomic DNA was extracted from frozen leaf material of barley (Hordeum vulgare) and bread wheat (Triticum aestivum) as described by Devos et al. Theoretical and Applied Genetics, 83: 931-939, 1992. Mixed DNA samples were prepared by pooling equal amount of genomic DNA for appropriate barley and wheat lines.
- Genome-specific primers for the amplification of target genes were obtained from published databases, or designed using Primer3 software (Rozen and Skaletsky, In: Krawatz and Misener (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, New Jersey, pp365-386, 2000) from expressed sequence tag (EST) sequence containing single nucleotide polymorphisms (SNPs) identified using AutoSNP (Barker et al. Bioinformatics, 19: 421-422, 2003) and QualitySNP (Tang et al. BMC Bioinformatics, 7: 438, 2006).
- Primers were designed using Primer3 software to conserved sequence outside the polymorphic regions.
- Each set of primers was synthesized with a nucleotide sequence corresponding to a standard M13 sequencing primer at the 5'-end of each primer.
- the forward and reverse primers for each target gene were synthesized with the nucleotide sequences 5' CAC GAC GTT GTA AAA CGA C 3' (SEQ ID NO: 1) and 5' GGT TTT CCC AGT CAC GAC 3' (SEQ ID NO: 2), respectively, hereafter referred to as the M13 forward and M13 reverse sequencing primers.
- the sequences of primers used in this study are listed in Table 1.
- M13 forward and reverse sequencing primers were synthesized with one of the following fluorescent dyes: VIC, FAM, NED and PET (Applied Biosystems).
- PCR assays were performed in a 25 µl reaction mixture containing 0.2 mM dNTP, 1x PCR buffer supplemented with 1.5 mM MgCl2 (Qiagen), 0.2 µM each of forward and reverse M13-tailed primer, 50-100 ng genomic DNA and 1 U Taq DNA polymerase.
- PCR was performed for a total of 35 cycles with the profile: 30 s at 94 °C, 30 s at 55 °C, 60 s at 72 °C, and a final extension step of 10 min at 72 °C.
- PCR products were purified by ultrafiltration to remove residual primer and dNTP using a Multiscreen 384 PCR cleanup plate (Millipore) according to the manufacturer's instructions.
- the purified PCR products were resuspended in 20 µl of sterile water, and quantified using an ND-1000 spectrophotometer (NanoDrop Technologies) by measuring the absorbance at 260 nm.
- Detection of polymorphisms: Twenty-five ng of purified PCR product was added to each of four reaction wells and dried by evaporation by heating for 15 min at 80 °C. Three µl of reaction mixture containing 1x Thermo Sequenase buffer (3 mM MgCl2, 12 mM Tris-HCl, pH 9.4), 225 µM each of dATP, dTTP, dCTP and 7-deaza-dGTP (Roche), 0.5 µM dye-labeled M13 forward (or M13 reverse) sequencing primer, 0.45 U Thermo Sequenase DNA polymerase (GE Healthcare) and 0.75 µM of either ddATP, ddTTP, ddCTP or ddGTP was added to each reaction well.
- the present example describes an example of an embodiment of a method of the present invention in which a polymorphism is detected in a target nucleic acid, in this case a nodulin-like gene in a genome of a polyploid subject, in this case bread wheat.
- a target gene, i.e., a nodulin-like gene harboring known SNPs, was amplified from bread wheat lines, and the presence of a polymorphism was detected using methods described in Example 1.
- the nodulin gene was amplified using the primers cacgacgttgtaaaacgacTACTTCCTCGAGAAGTACGCCG (18B forward; SEQ ID NO: 3); and ggttttcccagtcacgacGTAGAGCGTGATCACCGTGG (18B reverse; SEQ ID NO: 4).
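- The construction of these M13-tailed primers can be reproduced as a simple concatenation. The sketch below uses the M13 tails given as SEQ ID NO: 1 and SEQ ID NO: 2 and the gene-specific portions of the 18B primers; the helper name `tail_primer` and the lowercase/uppercase convention are taken from how the primers are written here, not from any stated software.

```python
M13_FORWARD = "CACGACGTTGTAAAACGAC"  # SEQ ID NO: 1
M13_REVERSE = "GGTTTTCCCAGTCACGAC"   # SEQ ID NO: 2


def tail_primer(tail: str, gene_specific: str) -> str:
    """Prefix a gene-specific primer with an M13 tail at its 5' end.

    The tail is written in lowercase and the gene-specific portion in
    uppercase, mirroring the notation used for the primers in the text.
    """
    return tail.lower() + gene_specific.upper()


forward_18b = tail_primer(M13_FORWARD, "TACTTCCTCGAGAAGTACGCCG")
reverse_18b = tail_primer(M13_REVERSE, "GTAGAGCGTGATCACCGTGG")
print(forward_18b)
print(reverse_18b)
```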
- the repeatability of the termination fragment pattern amplified for individual dideoxynucleotides was assessed by replicated assay of the same samples.
- the reproducibility of the termination fragment patterns amplified for individual dideoxynucleotides was assessed by comparing the patterns obtained from different samples harboring known SNPs.
- This example describes an embodiment of the present invention in which a mutation is detected in an acetohydroxyacid synthase (AHAS) gene on a genome of a polyploid subject using bulked segregant analysis, i.e., using pooled genomic DNA samples from a plurality of subjects having a similar phenotype. The mutation was mapped to a genome of the polyploid subject using aneuploid stocks.
- This example also describes the identification of HSVs in a genome of a polyploid subject that are linked to the mutation.
- Bulked segregant analysis (BSA) is a rapid approach to identify genetic markers linked to a target gene.
- BSA involves genotyping two pools of DNA from phenotypically distinct subjects, e.g., plants, originating from a segregating population, for sequence polymorphisms with a skewed allele frequency (Michelmore et al., Proc. Natl. Acad. Sci. USA, 88: 9828-9832, 1991).
- the underlying principle of BSA is that each pool of DNA contains individual subjects, e.g., plants with substantially identical genotypes for the target genomic region, but comprises random genotypes at unlinked loci.
- sequence polymorphisms with a skewed allele frequency are considered to be genetically linked to the trait locus.
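- The principle of BSA can be sketched as a comparison of allele frequencies between the two pools. In this illustration the marker names, frequencies and the 0.5 cut-off are all hypothetical; the source does not specify a numerical threshold for what counts as skewed.

```python
from typing import Dict, List


def skewed_markers(
    pool_1: Dict[str, float], pool_2: Dict[str, float], min_difference: float = 0.5
) -> List[str]:
    """Return markers whose allele frequency differs strongly between pools."""
    shared = pool_1.keys() & pool_2.keys()
    return sorted(
        marker
        for marker in shared
        if abs(pool_1[marker] - pool_2[marker]) >= min_difference
    )


# Hypothetical frequency of one allele at three markers in each pool.
pool_1 = {"snp1": 0.50, "snp2": 0.95, "snp3": 0.45}
pool_2 = {"snp1": 0.55, "snp2": 0.05, "snp3": 0.50}

print(skewed_markers(pool_1, pool_2))
```

- Markers at unlinked loci show similar frequencies in both pools, so only the marker with a large frequency difference is reported as a candidate for linkage to the trait locus.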
- The use of BSA to identify SNPs in candidate genes relies on the availability of unique genomic sequence to design primers for PCR amplification of a target locus. This is especially important in higher plant genomes where gene duplication and polyploidy can confound the detection of allelic SNPs due to the presence of homoeologous and paralogous sequences.
- the present invention provides an advantage over other methods in so far as it deconvolutes sequence complexity resulting from the amplification of homoeologous and paralogous genes when genomic sequence is unavailable for the design of specific primers to amplify the target locus. This is achieved by comparing the termination fragment profiles amplified from two pools of DNA for a single nucleotide at a time.
- the second DNA pool comprised the wheat lines CF-Janz, CF-Sunstate and CF-Frame and was homozygous for the herbicide tolerance (mutant) allele on the B-genome, and fixed for the wild type allele on the A- and D-genomes.
- the two pools of DNA had identical AHAS genotypes for the A- and D-genomes but a different genotype for the B-genome, corresponding to the SNP (a cytosine to thymine transition) conferring herbicide tolerance.
- the SNP genotype for the herbicide susceptible DNA pool was CC/CC/CC for the A-, B- and D-genomes, respectively.
- the genotype for the herbicide tolerant DNA pool was CC/TT/CC for the A-, B- and D-genomes, respectively.
- the AHAS gene was amplified from the two DNA pools using primers that amplified conserved regions of the A-, B- and D-genomes (cacgacgttgtaaaacgacGTAGGACAAGAAACTTGCATG (LS-IMI900 forward; SEQ ID NO: 5), and ggttttcccagtcacgacGTAGGACAAGAAACTTGCATG (LS-IMI900 reverse; SEQ ID NO: 6)).
- Amplification, sequencing and comparison of the resulting termination fragment profiles was performed essentially as described in Example 1.
- allelic SNPs and HSVs would have been detected as mixed peaks in Sanger sequencing chromatograms produced using standard sequence analysis software.
- An advantage of the method of the invention compared to conventional Sanger sequencing is that analyzing the termination fragment profiles for one dideoxynucleotide at a time simplifies the complex DNA mixture such that allelic differences between the DNA pools can be readily identified (Figure 3).
- aneuploid stocks for bread wheat (Sears In: Riley and Lewis (eds) Chromosome manipulation and plant genetics. Oliver and Boyd, Edinburgh, pp29-50, 1965) were used to determine the genotypes at the HSVs in the AHAS gene.
- conserved primers (LS-IMI900 forward and reverse) were used to amplify the AHAS gene from the nullisomic-tetrasomic wheat stocks for the group 6 chromosomes essentially as described in Example 1. Sequencing and analysis were performed essentially as described in Example 1 using the resulting PCR products, and the termination fragment profiles for each nullisomic-tetrasomic stock were compared with those of the euploid wheat Chinese Spring.
- the absence of a termination fragment in the ddC electrotrace of the nullisomic 6B stock indicated that the HSV genotype on the B-genome was CC, while the presence of the termination fragment in the remaining two nullisomic lines and the presence of a termination fragment in the ddT electrotrace at the same position indicated that the HSV genotypes for both the A- and D-genomes were TT.
- the HSV genotypes determined from the electrotraces corresponded to the published sequence for the three AHAS genes.
- This example describes an embodiment of the present invention in which the sequence of nucleic acid flanking or adjacent to a mutation in a genome of a polyploid subject is determined by separately determining one or a plurality of positions of each of the four naturally-occurring nucleotides in a target sequence from an AHAS gene in bread wheat, aligning those sequences and overlaying those sequences.
- Characterization of both the position and nucleotide substitution of a mutation or polymorphism, as well as determination of the flanking nucleotide sequence in a single assay allows for the immediate development of a genetic marker to permit simple detection of the mutation or the SNP.
- target genes harboring known SNPs from barley and bread wheat lines were amplified.
- An exemplary gene is the AHAS gene amplified using the primers LS-IMI900 forward and LS-IMI900 reverse.
- assays were performed essentially as described in Example 1 to amplify termination fragment profiles for each of the four dideoxynucleotides.
- the nucleotide sequence of the sample was determined by overlaying the electrotraces for each of the dideoxynucleotides and assigning the nucleotide sequence from the termination fragments with the greatest peak heights (Figure 5), according to the principles for sequence determination described for manual Sanger sequencing (Sanger et al., Proc. Natl. Acad. Sci. USA, 74: 5463-5467, 1977). The accuracy of the derived nucleotide sequence was determined by comparison to the published sequence.
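- Overlaying the four electrotraces and assigning each position to the dideoxynucleotide with the greatest peak height can be sketched as follows. This assumes the traces have already been aligned so that index i in every trace refers to the same fragment length; the peak heights and the function name are hypothetical.

```python
from typing import Dict, List


def call_sequence(traces: Dict[str, List[float]]) -> str:
    """Assign each position to the ddNTP trace with the greatest peak height."""
    length = len(next(iter(traces.values())))
    called = []
    for i in range(length):
        # Pick the base whose electrotrace is highest at this position.
        called.append(max(traces, key=lambda base: traces[base][i]))
    return "".join(called)


# Hypothetical aligned peak heights for the four termination reactions.
traces = {
    "A": [900, 20, 15, 30],
    "C": [10, 850, 25, 20],
    "G": [30, 15, 40, 780],
    "T": [20, 10, 910, 25],
}
print(call_sequence(traces))
```

- In a mixed or polyploid sample a second, smaller peak at the same position would be discarded by this rule, so the sketch recovers only the predominant sequence.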
- Example 5 Polymorphisms are characterized by a mobility shift between termination fragments
- This example describes an embodiment of the present invention in which a variable nucleotide is detected in a target sequence by detecting a reduced level of nucleic acid fragments corresponding to an allele of the variable nucleotide in samples from polyploid subjects comprising the variable nucleotide. Moreover, by aligning sequencing results that detect each allele of the variable nucleotide, a change in molecular weight of nucleic acid fragments comprising ddNTP corresponding to each allele was identified.
- target genes harboring known SNPs from three pools of bread wheat DNA containing different doses of each SNP allele were amplified using the primers cacgacgttgtaaaacgacGCCAAATCTGTTGGCGATTA (37D forward; SEQ ID NO: 7) and ggttttcccagtcacgacCGTTCGCCAACGCCCGGA (37D reverse; SEQ ID NO: 8).
- the first pool of DNA comprised a single homozygous wheat line, and was therefore fixed for one of the SNP alleles present at the target locus.
- the second pool of DNA comprised two homozygous wheat lines, each with a different SNP allele at the target locus. Hence, the frequency of each SNP allele in the second pool was 50%.
- the third pool of DNA contained five homozygous wheat lines, four of which had the same SNP allele at the target locus while the fifth line had the alternate SNP allele. Hence, the frequency of the two alleles in the third DNA pool was 80 and 20%, respectively.
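- The allele frequencies quoted for the three pools follow directly from the number of homozygous lines carrying each allele, assuming each line contributes an equal amount of DNA. The sketch below reproduces that arithmetic; the allele labels "C" and "T" are placeholders, since the actual SNP alleles are not stated here.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List


def pool_allele_frequencies(line_alleles: List[str]) -> Dict[str, Fraction]:
    """Allele frequencies in a pool of homozygous lines mixed in equal amounts."""
    counts = Counter(line_alleles)
    total = len(line_alleles)
    return {allele: Fraction(n, total) for allele, n in counts.items()}


# One entry per homozygous line; allele labels are placeholders.
pool_1 = ["C"]
pool_2 = ["C", "T"]
pool_3 = ["C", "C", "C", "C", "T"]

for pool in (pool_1, pool_2, pool_3):
    print(pool_allele_frequencies(pool))
```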
- assays were performed to amplify termination fragment profiles for each of the four dideoxynucleotides from each of the three DNA pools essentially as described in Example 1. The electrotraces for the four dideoxynucleotides for each DNA pool were overlaid, and the resulting composite electrotraces were aligned with one another as shown in Figure 7.
- the presence of a SNP was associated with a shift in the mobility of the termination fragments corresponding to the sequence polymorphism when electrotraces overlaid for the four dideoxynucleotides were aligned for different samples.
- the presence of a SNP was detected as a distinct difference in the mobility of a single termination fragment corresponding to the SNP ( Figure 6 and Figure 7A).
- the presence of a SNP was observed as two distinct termination fragments with a smaller mobility difference than would be expected between termination fragments corresponding to adjacent nucleotides ( Figure 7A).
- termination fragments corresponded to the presence of the two SNP alleles in heterozygous samples, and were detectable even when the alleles were present at different frequencies (compare pools 2 and 3 in Figure 7A). Detection of a mobility shift in the termination fragments corresponding to a SNP in heterozygous samples was facilitated by examining individual electrotraces for the four dideoxynucleotides (as shown in Figure 7 A and 7B).
- Amplification of a nucleic acid harboring a SNP amplified from a heterozygous sample using the method of the invention is expected to generate two equally sized termination fragments at the position of the SNP, each of which is terminated by one of the dideoxynucleotides corresponding to the nucleotide variation.
- the two termination fragments have equal nucleotide length, termination by different dideoxynucleotides results in a subtle difference in their electrophoretic mobility that can be visualized in the aligned electrotraces of the four dideoxynucleotides.
- Sensitivity of SNP detection in mixed samples. This example describes an application of an embodiment of the present invention in the identification of a polymorphism in a sample comprising pooled genomes from a plurality of subjects. As described below, this embodiment of the invention detected an allele of the SNP in samples in which the ratio of the other allele to that allele was 5:1.
- Detection of SNPs in pooled samples presents an opportunity to increase assay throughput and reduce costs. However, it also presents several challenges due to the requirement to detect mutations that may be present at low frequency in a sample.
- Target nucleic acids harboring SNPs from pooled genomic DNA of barley lines with known genotypes were amplified.
- An exemplary nucleic acid is from the triticin precursor gene amplified using the primers cacgacgttgtaaaacgacTGCAACTTGCGAAACGAACC (hvLSP45 forward; SEQ ID NO: 10) and ggttttcccagtcacgacAGTTGCCCCGGGCTAAGAAG (hvLSP45 reverse; SEQ ID NO: 11).
- Target nucleic acids were amplified from a total of five DNA pools. The first DNA pool consisted of a single homozygous barley line, while each of the remaining pools comprised a mixture of two homozygous lines with alternate SNP alleles.
- The ratio of the SNP alleles in the second, third, fourth and fifth DNA pools was 1:1, 2:1, 5:1 and 11:1, respectively.
- Termination fragment profiles for the four dideoxynucleotides from each DNA pool were produced essentially as described in Example 1.
- The sensitivity of the assay for SNP detection was assessed using a two-tiered approach. First, the ability to detect the SNP in pooled samples with different allele dosage was assessed by overlaying the electrotraces for individual dideoxynucleotides from DNA pools 2 to 5 with the corresponding electrotrace from DNA pool 1 (Figure 9A).
- The presence of the SNP in the overlaid electrotraces could be identified by a difference in the peak height of the termination fragments corresponding to the sequence polymorphism in pooled samples with an allele frequency of 17% (5:1 ratio of allele A to allele B; Figure 9A).
- The overlaid electrotraces for the termination fragment profiles of the four dideoxynucleotides amplified from each DNA pool were aligned.
- The presence of the SNP could be visually determined by a mobility shift in the termination fragments corresponding to the sequence variation in the DNA pools with an allele frequency of 17% (5:1 allele ratio; Figure 9B). Detection of both a peak height difference and a mobility shift facilitates reliable identification of unknown sequence variation in pooled samples containing a mutation with less than 20% representation.
- This example describes an embodiment of the invention in which a polymorphism is identified in a genome of a polyploid subject and in which target nucleic acids amplified from different genomes of the polyploid subject are of different lengths as a result of a microsatellite in the sequence.
- Primers comprising the nucleotide sequences cacgacgttgtaaaacgacGCAAAGTGTAGCCGAGGAAG (SEQ ID NO: 26) and ggttttcccagtcacgacTTAGAGTTTTGCAGCGCCTT (SEQ ID NO: 27) were used to amplify homoeologous sequences containing a microsatellite repeat from chromosomes 2A and 2D of bread wheat.
- The target sequences were amplified using conserved primers from the varieties Stylet and Wylkatchem, and Chinese Spring nullisomic-tetrasomic stocks for the group 2 chromosomes. Separation of the PCR products on an 8% sequencing gel revealed the amplification of two fragments of 173-bp and 236-bp from Stylet, and 165-bp and 236-bp from Wylkatchem, which were assigned to chromosomes 2A and 2D using the nullisomic-tetrasomic stocks. Assays of the present invention using each of the four dideoxynucleotides individually were performed using the resulting PCR products amplified from each wheat line.
- To identify allelic SNPs, the termination fragment profiles for Stylet and Wylkatchem were aligned and compared for the presence of sequence variation.
- The chromosomal origin of putative SNPs was determined by further comparing the aligned termination fragment profiles of the two wheat varieties with those for the nullisomic-tetrasomic stocks.
- The PCR fragments amplified from Stylet and Wylkatchem using the cfd36 forward primer comprising the sequence CACGACGTTGTAAAACGACGCAAAGTGTAGCCGAGGAAG (SEQ ID NO: 28) and the cfd36 reverse primer comprising the sequence GGTTTTCCCAGTCACGACTTAGAGTTTTGCAGCGCCTT (SEQ ID NO: 29) were sub-cloned and sequenced by Sanger sequencing. Alignment of the sequences amplified from chromosome 2A confirmed that microsatellite repeat length variation was responsible for the observed 8-bp allele size difference, and confirmed the presence of the three SNPs identified by the method of the present invention in the flanking region. No sequence variation was observed for the 236-bp PCR fragments amplified from chromosome 2D (Figure 11).
- Nucleotide sequence flanking each SNP was determined from the termination fragment profiles of the four dideoxynucleotides and used to design allele-specific primers. DNA typing of the six wheat varieties on a Luminex 100 instrument using an allele-specific primer extension assay revealed the expected genotypes (Table 1). Subsequent genetic mapping of the SNPs in segregating doubled haploid populations confirmed the location of the SNP loci on their expected group 3 chromosome.
- Each primer set was expected to amplify homoeologous sequences from the three wheat genomes to produce a mixture of DNA fragments with either the same or different lengths.
- Each gene was also amplified from Chinese Spring nullisomic-tetrasomic wheat stocks for the group 3 chromosomes. An assay was performed essentially as described in Example 1 using the resulting PCR products to amplify termination fragment profiles for the four dideoxynucleotides, and the electrotraces were assessed for sequence variation.
- HSVs: homologue-sequence variants
- This example describes an embodiment of the present invention in which polymorphisms or mutations linked to a boron toxicity tolerance gene were identified.
- The method described in this example comprises performing a method of the present invention to identify polymorphisms or mutations in a region of chromosome 7B in bread wheat linked to boron toxicity tolerance using pooled or bulked DNA samples from bread wheat tolerant to boron toxicity or susceptible to boron toxicity. By identifying variable nucleotides between the two pools or bulks of DNA, a polymorphism or mutation associated with or linked to boron toxicity tolerance is identified. Using nullisomic-tetrasomic lines of bread wheat, the identified polymorphisms or mutations are mapped to a genome of bread wheat. HSVs were also identified to facilitate production of primers capable of amplifying the genome of bread wheat comprising a polymorphism or mutation associated with an allele of a gene conferring boron toxicity tolerance.
- Each primer pair was expected to amplify homoeologous nucleic acids from each of the three wheat genomes with, potentially, different length PCR fragments.
- Table 2 Sequences of primers used to amplify candidate genes. Sequence (5'→3') corresponding to the M13 sequencing primer sequence and locus-specific sequence is in lower and upper case, respectively.
- Allelic SNPs linked to the Bo1 gene were identified in all six candidate genes. As represented in Figure 14A, complex termination fragment profiles were observed due to the amplification of all three wheat genomes including HSVs and allelic SNPs. However, allelic SNPs were readily identified by comparing the termination fragment profiles for the two DNA bulks, in which the presence of HSVs was masked as a result of genetic identity between the bulks except at locations of the genome linked to the Bo1 locus.
- The arrows in Figure 14A indicate the first sequence variation between the two DNA bulks that could be used either directly as, or to develop, a linked marker for resistance or susceptibility to boron toxicity.
- The termination fragment profiles produced using DNA from the nullisomic-tetrasomic stocks were compared with the termination fragment profiles produced using DNA from bulked DNA samples. HSV genotypes were inferred by the absence of termination fragments in the nullisomic-tetrasomic lines and presence in bulked DNA samples.
- The absence of a termination fragment in the ddC electrotrace of the nullisomic 7B stock indicated that the HSV genotype on the B-genome was CC, while the presence of the termination fragment in the remaining two nullisomic lines and presence of a termination fragment in an alternative electrotrace at the same position indicated that the HSV genotypes for both the A- and D-genomes were GG (Figure 14C).
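The genome assignment logic described in the last two items can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout and the genome labels are hypothetical, and the sketch assumes that the fragment disappears in exactly one nullisomic stock and that the alternative base has been observed at the same position in another electrotrace.

```python
def infer_hsv_genotypes(fragment_seen, base, alt_base):
    """Infer per-genome HSV genotypes from nullisomic-tetrasomic observations.

    fragment_seen[g] is True when the `base` termination fragment is still
    observed in the stock that lacks the chromosome of genome g (hypothetical
    input layout). `alt_base` is the base seen at the same position in an
    alternative electrotrace.
    """
    missing = [g for g, seen in fragment_seen.items() if not seen]
    if len(missing) != 1:
        # The simple rule below only holds when a single genome carries `base`.
        raise ValueError("fragment must be absent in exactly one nullisomic stock")
    # The genome whose removal abolishes the fragment carries `base`;
    # the remaining genomes are assigned the alternative base.
    return {g: (alt_base if seen else base) * 2 for g, seen in fragment_seen.items()}


# Observations corresponding to the ddC electrotrace described above:
# the fragment is absent only in the stock nullisomic for the B genome.
ddc_observations = {"A": True, "B": False, "D": True}
print(infer_hsv_genotypes(ddc_observations, base="C", alt_base="G"))
```

With the observations shown, the sketch assigns CC to the B genome and GG to the A and D genomes, matching the inference described for Figure 14C.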
Abstract
The present invention provides a method for identifying a polymorphism or mutation in a genome of a polyploid subject, comprising comparing sequences within a target nucleic acid from genomes of two or more polyploid subjects, and determining linkage between one or more variable nucleotides in the sequences and one or more nucleotides within a specific genome of a polyploid subject, thereby identifying a polymorphism or mutation in a genome of a polyploid subject. The present invention also provides a method for producing a probe or primer comprising a sequence capable of preferentially or specifically annealing or hybridizing to a nucleic acid comprising a polymorphism or mutation identified by the method of the invention. The present invention also provides for the use of the method to determine variation in populations, associations between polymorphisms or mutations and specific traits, and in breeding.
Description
Mapping method for polyploid subjects
Cross-reference to related application
The present application claims priority from USSN 61/042,549 filed in the United States Patent and Trademark Office, the contents of which are incorporated by reference in their entirety.
Field of the invention
The present invention is in the field of molecular biology and, more particularly, relates to methods for identifying nucleotide sequence variations, e.g., polymorphisms or mutations in nucleic acid from polyploid subjects.
Background of the invention
Description of related art
Genetic variations between organisms, such as polymorphisms and mutations, are routinely used as genetic markers in a variety of assays for, for example, gene mapping, positional cloning, identification of individuals, marker-assisted breeding, genotype/phenotype association, or for determining a subject likely to have a trait of interest. This is because such a genetic marker may be in close proximity to and, as a consequence, in linkage disequilibrium with an allele causative of a phenotype of interest, such that detection of the genetic marker is indicative of the presence of the allele, or a genetic marker may itself be causative of a phenotype of interest. In this respect, a skilled artisan will understand that a genetic marker is a variation in a nucleic acid that is polymorphic or has a plurality of different alleles that occur within a population. Preferred genetic markers are those that are readily detectable.
Single nucleotide polymorphisms (SNPs) are the most common type of genetic marker (Picoult-Newberg et al., Genome Res., 9: 167-174, 1999). SNPs are a particularly versatile class of genetic marker, because they occur in all regions of a genome, e.g., both coding and non-coding regions, and occur regularly throughout a genome. For example, in wheat a SNP occurs approximately every 200 base pairs (bp) (Ravel et al., In: Vollman et al., (Eds) Genetic variation for plant breeding, Eucarpia: Tulln, Austria, pp177-181); in cotton (Gossypium hirsutum L.) a SNP is estimated to occur approximately every 43 bp (An et al., Mol Genet Genomics. 2007 Aug 28); and in soybean a SNP is estimated to occur every 273 bp (Zhu et al., Genetics, 163: 1123-1134, 2003). The high density and mutational stability of SNPs make them particularly useful genetic markers for population genetics and for mapping genes associated with complex traits.
Traditional methods for identifying a SNP in a diploid subject involve sequencing a target nucleic acid using dideoxynucleotide-mediated sequencing, e.g., Sanger sequencing. Generally, these methods involve amplifying a target nucleic acid using a polymerase in the presence of fluorescently labeled 2',3'-dideoxynucleotide triphosphates (ddNTPs). Each nucleotide is labeled with a different fluorescent label. In this reaction, incorporation of an individual ddNTP into an amplicon terminates amplification, with that amplicon comprising the fluorescent marker conjugated to the ddNTP. By separating the resulting amplicons according to molecular weight, e.g., by electrophoresis, and detecting the order in which each fluorescent label occurs, the nucleotide sequence of the target nucleic acid is determined. This sequence is then compared to a sequence from another subject to detect the occurrence of two different nucleotide residues, i.e., two different fluorescent labels, at the same position, i.e., having the same molecular weight. The occurrence of two different nucleotide residues in the same position is indicative of a SNP.
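The final comparison step described above, in which two sequence reads are compared position by position, can be illustrated with a minimal Python sketch. The sequences and the function name are hypothetical and chosen only for illustration; the sketch assumes both reads are already base-called, of equal length and in register.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch, using 1-based positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("reads must have the same length for a position-wise comparison")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Hypothetical reads from two diploid subjects, differing at a single position.
subject_1 = "ACGTTGCAAGCT"
subject_2 = "ACGTTGCGAGCT"
print(find_snps(subject_1, subject_2))  # [(8, 'A', 'G')]
```

A comparison of this kind is only meaningful when each read represents a single template, which is the assumption that breaks down in polyploid subjects.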
The identification of SNPs in polyploid subjects is more difficult than in diploid subjects because the SNPs must be detected in a mixture of nucleotide sequences. For example, variation between homoeologous sequences, which occur on different copies of the genome, and paralogous sequences, i.e., duplicate copies that may exist within a given genome, dramatically increases the level of complexity of sequence information, and traditional SNP identification methods are simply unable to produce an unambiguous interpretation of such complex sequences. In this respect, when homoeologous sequences or paralogous sequences in a polyploid subject have the same length, i.e., the same number of nucleotides, the identification of allelic SNPs is confounded by the presence of homolog-sequence variants (HSVs), i.e., sequence variations between homoeologous chromosomes, or by paralog-sequence variants (PSVs), i.e., sequence variations between paralogous sequences. Such HSVs and PSVs generally do not vary between individuals within a population of related polyploid subjects, e.g., polyploid subjects of the same species or cultivar or variety. In contrast to a HSV or a PSV, the nucleotide residue at the site of a SNP varies within a population of individuals. However, SNPs, HSVs and PSVs all appear as sequence variations using traditional SNP identification methods. Accordingly, such methods are unable to distinguish between genetic variations that occur between genomes within a single polyploid subject, and truly polymorphic variations between polyploid subjects.
The identification of SNPs in polyploid subjects is further complicated by the fact that many homoeologous sequences are of different lengths, e.g., as a result of insertions and/or deletions in one or more genomes of a polyploid subject. Such variations are more common in sequences occurring in non-coding regions of the genome. A result of such an insertion or deletion is that there is a frame-shift when sequences on different genomes are aligned. Moreover, when a traditional dideoxynucleotide terminated sequencing method is performed, the method detects two or more nucleotide residues at the same position as a result of this frame-shift. To account for the complexity associated with sequencing nucleic acid from a polyploid genome, sequencing has traditionally been performed by amplifying a target nucleic acid from one or more genomes of a polyploid subject, subcloning the amplified nucleic acid and isolating individual clones, i.e., comprising a target nucleic acid from one genome of the polyploid subject, and separately sequencing target nucleic acid from the individual clones to thereby determine the sequence of each genome. Clearly, the requirement for subcloning and multiple sequencing reactions is both time-consuming and expensive.
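The frame-shift effect described above can be made concrete with a short Python sketch. The two sequences are hypothetical, and the model is deliberately simplified: it assumes two homoeologous templates are read together from the same primer and simply records every base present at each read position.

```python
def overlay(homoeolog_1, homoeolog_2):
    """Return, for each read position, the bases seen across both templates."""
    length = max(len(homoeolog_1), len(homoeolog_2))
    calls = []
    for i in range(length):
        # Collect the base from each template that is long enough to reach position i.
        bases = {seq[i] for seq in (homoeolog_1, homoeolog_2) if i < len(seq)}
        calls.append("".join(sorted(bases)))
    return calls


# Hypothetical homoeologous sequences: genome_b lacks one T relative to genome_a.
genome_a = "ACGTTGCAAGCT"
genome_b = "ACGTGCAAGCT"

calls = overlay(genome_a, genome_b)
print(calls)
print(sum(len(c) > 1 for c in calls), "positions show more than one base")
```

Upstream of the deletion each position gives a single base, while most positions downstream show two bases even though the templates differ by only one deleted nucleotide.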
As a consequence of the difficulties associated with the identification of SNPs and other polymorphisms, e.g., INDELs (insertion or deletion of one or more nucleotides), in polyploid subjects, large-scale identification of SNPs in polyploid subjects has generally taken advantage of computational analysis of pre-existing datasets such as expressed sequence tags (ESTs). For example, Tang et al., BMC Bioinformatics, 7: 438-453, 2006 describes an algorithm for analyzing sequences from EST data to identify SNPs in polyploid subjects. However, EST collections are usually generated from a limited number of genotypes, i.e., strains, lines, varieties and/or cultivars. As a consequence, these methods fail to capture significant proportions of SNP variation, including diversity relevant to specific germplasm. Moreover, ESTs are only generated from sequences that are transcribed to produce mRNA and, as a consequence, methods based on EST sequence data do not identify polymorphisms located in untranscribed regions of the genome. Most in silico methods for analyzing EST data are also unable to distinguish allelic variation, e.g., a SNP, from HSVs or PSVs.
Moreover, in silico analysis methods are unable to distinguish between a sequencing error and an actual SNP or mutation in a target nucleic acid.
Other methods for SNP identification in polyploid subjects are based on analysis of the sequence of a specific candidate gene. Such methods rely on high-throughput sequencing and resequencing using approaches such as Sanger sequencing or pyrosequencing (Hinds et al., Science 307: 1072-1079, 2005; Margulies et al., Nature 437: 376-380, 2005). In polyploid plants, these approaches require an initial step to specifically amplify each target locus using PCR primers designed to unique flanking sequence, or a time-consuming and expensive sub-cloning step to isolate individual homologs. Due to a lack of sequence information for most plant species and high gene sequence conservation, it is often difficult to identify locus-specific primers. Moreover, as discussed supra, homoeologous and/or paralogous sequences are often variable in length, leading to the generation of complex sequence information that is difficult, if not impossible, to analyze. Accordingly, in most cases target nucleic acids are amplified and sub-cloned prior to sequencing. Such methods are time-consuming and expensive.
It is clear from the foregoing that there is a need in the art for a simple method for identifying polymorphisms in polyploid subjects that does not require amplification and sub-cloning of a target nucleic acid prior to sequencing and sequence analysis. Such a method has application in the discovery of genetic markers useful in, for example, gene mapping, positional cloning, identification of individuals, marker-assisted breeding, genotype/phenotype association, or for determining a subject likely to have a trait of interest.
General
The following publications provide conventional techniques of molecular biology. Such procedures are described, for example, in the following texts that are incorporated by reference:
1. Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, New York, Third Edition (2001), whole of Vols I, II, and III;
2. DNA Cloning: A Practical Approach, Vols. I and II (D. N. Glover, ed., 1985), IRL Press, Oxford, whole of text;
3. Ausubel et al., Current Protocols in Molecular Biology. Wiley Interscience, ISBN 047 150338, 1987, whole of text;
4. Oligonucleotide Synthesis: A Practical Approach (M. J. Gait, ed., 1984) IRL Press, Oxford, whole of text, and particularly the papers therein by Gait, pp1-22; Atkinson et al., pp35-81; Sproat et al., pp 83-115; and Wu et al., pp 135-151;
5. Nucleic Acid Hybridization: A Practical Approach (B. D. Hames & S. J. Higgins, eds., 1985) IRL Press, Oxford, whole of text;
6. Perbal, B., A Practical Guide to Molecular Cloning (1984);
Summary of the invention
In work leading up to the present invention, the inventors sought to produce a method that permitted them to readily identify polymorphisms in nucleic acid from polyploid subjects, without the requirement for subcloning amplified nucleic acid. The inventors also sought to produce a method that distinguishes between allelic polymorphisms and HSVs or PSVs.
As exemplified herein e.g., example 2 hereof, the inventors have produced a method that involves performing a sequencing reaction with only a single ddNTP to thereby determine the location of each occurrence of the corresponding nucleotide in a target nucleic acid from a subject, and aligning the sequence data with sequence data of another subject, and then comparing data sets to thereby determine a sequence variation between subjects. The inventors then detected the nucleotide variation in DNA from a plurality of polyploid subjects, wherein some of the subjects comprise one form of the variation and some subjects comprise a different form of the variation. The inventors demonstrated that the variation that exists between subjects is polymorphic, and is not a HSV or a PSV. Following identification of the nucleotide variation, the inventors detected the presence or absence of the nucleotide variation in a plurality of polyploid subjects that differ from one another in so far as at least a region of one genome comprising the target nucleic acid is deleted. By comparing results from subjects comprising a deletion in different genomes, the inventors not only demonstrated that the nucleotide variation is polymorphic but also determined on which genome the nucleotide variation resides.
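The first step described above, a sequencing reaction with a single ddNTP followed by comparison between subjects, can be sketched as follows. This is a simplified illustration rather than the inventors' implementation: the function names and sequences are hypothetical, each subject is modelled as a single template rather than a polyploid mixture, and fragment length is taken as the 1-based position of the terminating base, ignoring primer length.

```python
def termination_profile(template, ddntp):
    """Fragment lengths produced when extension stops at each occurrence of `ddntp`."""
    return {i + 1 for i, base in enumerate(template) if base == ddntp}


def compare_profiles(seq_1, seq_2, bases="ACGT"):
    """For each ddNTP, list fragment lengths seen in only one of the two subjects."""
    differences = {}
    for base in bases:
        profile_1 = termination_profile(seq_1, base)
        profile_2 = termination_profile(seq_2, base)
        only_1 = sorted(profile_1 - profile_2)
        only_2 = sorted(profile_2 - profile_1)
        if only_1 or only_2:
            differences[base] = {"only_subject_1": only_1, "only_subject_2": only_2}
    return differences


# Hypothetical target sequences from two subjects, differing at position 8 (A versus G).
subject_1 = "ACGTTGCAAGCT"
subject_2 = "ACGTTGCGAGCT"
print(compare_profiles(subject_1, subject_2))
```

In this sketch the variable nucleotide shows up as a fragment of length 8 that is present only in the ddA profile of the first subject and only in the ddG profile of the second, while the ddC and ddT profiles are identical.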
This method has utility in detecting polymorphisms (e.g., SNPs) and/or mutations (e.g., INDELs) in polyploid subjects, e.g., produced using targeting induced local lesions in genomes (TILLING), thereby permitting rapid screening of mutants having a phenotype of interest to identify the genotype of that mutant.
Because this method has the capability of analysing large numbers of mutations and polymorphisms in nucleic acid fragments, it is particularly compatible with, albeit not restricted to, sequencing tools for analyzing longer stretches of nucleic acid e.g., more than about 250 bp or 300 bp or 350 bp or 400 bp or 450 bp or 500 bp or 550 bp or 600 bp, or more.
As exemplified herein e.g., example 3 hereof, the present inventors have shown that their generic method may be performed in pooled DNA samples, e.g., bulked segregant analysis, thereby facilitating rapid identification of polymorphisms associated with a trait of interest in a polyploid organism.
Also exemplified herein e.g., example 5 hereof, the present inventors have shown that it is possible to rapidly detect one or more variable nucleotide sequences within or flanking an allele of interest in a target sequence, by aligning sequence data of allelic variants and detecting a mobility shift between termination fragments produced in sequencing reactions. This mobility shift may result from a change in molecular weight of a nucleic acid fragment comprising ddNTP as a consequence of an insertion, deletion or other mutation in an allele. Thus, by aligning sequence data from reactions obtained using each of the four ddNTPs, the inventors have determined sequences flanking or surrounding one or more nucleotide variations. This permits identification of genetic markers and enables production of reagents e.g., a probe or primer to detect the nucleotide variation.
Also exemplified herein e.g., example 6 hereof, the present inventors have shown that the method of the present invention is sufficiently sensitive to permit the detection of polymorphisms, e.g., SNPs, in mixed samples e.g., comprising pooled genomes from a plurality of subjects. For example, the method is broadly applicable to detecting alleles present at low frequency in polyploid organisms, e.g., a single allele in a diploid, tetraploid, allotetraploid or hexaploid organism such as occur in the grasses.
Also exemplified herein e.g., example 7 hereof, the present inventors have shown that the method of the present invention is suited for detecting polymorphisms in complex mixtures of nucleic acids, e.g., wherein target nucleic acids amplified from different genomes of polyploid subjects comprise microsatellites.
Also exemplified herein e.g., example 8 hereof, the present inventors have shown that the method of the present invention is capable of detecting polymorphisms or mutations de novo in genomes of polyploid subjects.
It will be understood by the skilled artisan that the inventive method has broad applicability to breeding in polyploid subjects, e.g., where the identification of one or more polymorphisms in one or more homoeologous genes that are not conserved across all genomes of the polyploid is required for determining the breeding contribution of a particular parent. For example, the inventive method can be employed for identifying polymorphisms and/or mutations in one or more genes associated with trait(s) such as, for example, yield and/or flour quality and/or flowering and/or disease-resistance and/or tolerance to abiotic stress such as drought, salinity, frost, heat, etc.
In one example, polymorphisms and/or mutations associated with wheat flour quality, including suitability for bread or noodle or biscuit making, are identified in a starch biosynthesis gene of tetraploid and/or hexaploid wheat, e.g., a Wx gene encoding granule-bound starch synthase (GBSS). More particularly, the method of the invention has permitted the identification of an INDEL (i.e., one or more insertions or deletions) in a Wx gene of tetraploid wheat.
In another example, polymorphisms and/or mutations associated with wheat flour quality, including suitability for bread or noodle or biscuit making, are identified in a storage protein-encoding gene of tetraploid and/or hexaploid wheat, e.g., a gene encoding a glutenin subunit e.g., Glu-A1, Glu-B1, Glu-D1, Glu-A3, Glu-B3, Glu-D3, Glu-D4 or Glu-D5 or allele thereof, or alternatively, encoding a gliadin e.g., an allele of a Gli-1 or Gli-2 gene. For example, the invention may be employed to identify and select for the presence of a Glu-D1d allele in hexaploid wheats, which enhances bread-making quality of wheat by virtue of an extra cysteine residue in the encoded glutenin subunit, thereby improving bread dough strength. Alternatively, or in addition, the invention is employed to select against the presence of a glutenin null allele e.g., a Glu-B1a null allele or a Glu-A1c null allele. Alternatively, or in addition, the invention is employed to select lines having one or more over-expressed glutenin-encoding genes e.g., an over-expressed Glu-B1al allele.
In another example, polymorphisms and/or mutations associated with flowering time and/or time to physiological maturity of tetraploid or hexaploid wheats e.g., winter wheat or spring wheat, are identified in a vernalization-response gene e.g., Vrn1, Vrn2, Vrn3 or allele thereof and/or a photoperiod gene e.g., Ppd1 or Ppd2 or allele thereof. For example, breeding lines are characterized by phenotype in the absence of vernalization and/or following vernalization, and phenotypic outliers in the population are selected and analyzed for the presence of polymorphisms and/or mutations in a Ppd2 gene. Measurable phenotypes for identifying polymorphisms and/or mutations in a Ppd2 gene associated with flowering time and/or time to physiological maturity of tetraploid or hexaploid wheats are selected e.g., from the group consisting of growth rate, flowering time, time to head emergence, main stem leaf number at flowering, and combinations thereof.
In another example, polymorphisms and/or mutations associated with improved tolerance of plants to abiotic stress conditions such as drought or frost are identified in a fructan biosynthesis gene of tetraploid and/or hexaploid wheat, including Chinese Spring wheat, e.g., a gene encoding fructan 1-exohydrolase (FEH) such as an allele in a 1-FEH gene e.g., 1-FEH-6A, 1-FEH-6B, 1-FEH-6D or allele thereof. For example, the invention may be employed to identify and select for the presence of one or more 1-FEH-6D alleles e.g., in Chinese Spring wheat. More particularly, QTLs in the regions of 1-FEH genes have been mapped to chromosomes 6D and 7A in a population of double haploid lines derived from a cross between the wheat lines Berkut (a high fructan wheat) and Krichauff (a low fructan wheat), and validated in a population derived from a cross between the lines Sokoll (a high fructan wheat) and Krichauff, and the QTLs on chromosome 6D shown to coincide with the 1-FEH-6D gene. Based on sequence data pertaining to the 1-FEHw2-6D allele (NCBI Accession No. AJ508387), primers flanking intron sequences in the 1-FEH-6D gene were designed and used in the method of the invention to identify polymorphisms in 1-FEH-6D genes by comparing the sequences of the wheat lines Berkut, Krichauff, Sokoll, and by bulked segregant analysis of Berkut x Krichauff lines and Krichauff x Sokoll lines carrying Berkut or Krichauff or Sokoll alleles in a 1-FEH-6D gene, wherein Berkut and Sokoll alleles are associated with improved abiotic stress tolerance e.g., improved frost tolerance and/or improved drought tolerance.
In another example, polymorphisms and/or mutations associated with improved disease resistance of plants, e.g., improved resistance to a nematode such as Heterodera avenae or root lesion nematode Pratylenchus neglectus, are identified. For example, alleles associated with resistance to P. neglectus are identified within or in the vicinity of a Rlnn1 gene on chromosome 7 of the A, B, or D genome of hexaploid wheat, and resistance alleles are present in a number of wheat varieties, including Excalibur and Krichauff. In this example, the invention is employed to identify and select for the presence of one or more Rlnn1 alleles. More particularly, a QTL has been identified positioned near a SSR marker gwm344, approximately 13.6 cM from the Rlnn1 gene on chromosome 7A in a population derived from a cross between the wheat lines Kukri (susceptible to P. neglectus) and Excalibur (resistant to P. neglectus), and deletion mapping of SSR markers performed to determine a region linked to the QTL. Based on sequence data pertaining to Expressed Sequence Tags (ESTs) in this region of the chromosome, and intron-exon boundary sequence data from syntenous genes in rice and Brachypodium sp., primers flanking intron sequences, especially small introns, in wheat Rlnn1 genes are designed and used in the method of the invention to identify polymorphisms in wheat Rlnn1 genes by comparing sequences in resistant wheat lines (e.g., Krichauff, Excalibur) and susceptible wheat lines (e.g., Kukri, Tammin, Trident), and by bulked segregant analysis of Krichauff x Kukri and/or Krichauff x Tammin and/or Krichauff x Trident and/or Excalibur x Kukri and/or Excalibur x Tammin and/or Excalibur x Trident lines carrying Krichauff or Excalibur or Kukri or Tammin or Trident alleles in a Rlnn1 gene, wherein Krichauff or Excalibur alleles are associated with improved resistance.
In yet another example, e.g., example 2 hereof, the present inventors have shown that the method of the present invention is suitable for detecting a polymorphism in a nodulin-like gene of hexaploid bread wheat.
In yet another example, e.g., example 3 hereof, the present inventors have shown that the method of the present invention is suitable for detecting a polymorphism in an acetohydroxyacid synthase (AHAS) gene of hexaploid bread wheat, wherein the polymorphism confers resistance to the herbicide imidazolinone.
In yet another example, e.g., example 9 hereof, the present inventors have shown that the method of the present invention is suitable for detecting polymorphisms or mutations linked to the Bo1 gene on chromosome 7B of hexaploid bread wheat, wherein the polymorphism or mutation confers tolerance to the heavy metal boron.
It will also be understood that the exemplification of the invention in a hexaploid organism demonstrates a difficult test case due to the presence of three genomes, i.e., the A, B and D genomes, and thereby supports the applicability of the invention to identifying polymorphisms in less complex organisms e.g., diploid and tetraploid organisms.
Specific embodiments
The scope of the invention will be apparent from the claims as filed with the application that follow the examples. The claims as filed with the application are hereby incorporated into the description. The scope of the invention will also be apparent from the following description of specific embodiments and/or detailed description of preferred embodiments.
In one example, the present invention provides a method for identifying a polymorphism or mutation in a genome of a polyploid subject, comprising comparing sequences within a target nucleic acid from genomes of two or more polyploid subjects, and determining linkage between one or more variable nucleotides in the sequences and one or more nucleotides within a specific genome of a polyploid subject, thereby identifying a polymorphism or mutation in a genome of a polyploid subject.
For example, the present invention provides a method for identifying a polymorphism or mutation in a genome of a first polyploid subject, said method comprising:
(i) comparing a position or a plurality of positions at which a specific nucleotide residue occurs within a target nucleic acid from genomes of the first polyploid subject to the position(s) at which the specific nucleotide residue occurs in a corresponding target nucleic acid from genomes of a second polyploid subject to thereby identify one or more variable nucleotides between the genomes of the first and second polyploid subjects; and
(ii) determining linkage between the identified one or more variable nucleotides and one or more nucleotides within a genome of the first polyploid subject, thereby identifying a polymorphism or mutation in the genome of the first polyploid subject.
For example, the present invention provides a method for identifying a polymorphism or mutation in a genome of a first polyploid subject, said method comprising:
(i) determining a position or a plurality of positions at which a specific nucleotide residue occurs within a target nucleic acid from genomes of the first polyploid subject;
(ii) comparing the position(s) at which the specific nucleotide residue occurs in (i) to the position(s) at which the specific nucleotide residue occurs in a corresponding target nucleic acid from genomes of a second polyploid subject to thereby identify one or more variable nucleotides between the genomes of the first and second polyploid subjects; and
(iii) determining linkage between the identified one or more variable nucleotides and one or more nucleotides within a genome of the first polyploid subject, thereby identifying a polymorphism or mutation in the genome of the first polyploid subject.
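The comparison step of the method above can be sketched as a set operation. This is a minimal illustration only, not the claimed implementation: it assumes each single-nucleotide determination has already been reduced to a set of integer positions at which the nucleotide was detected, and the function name `variable_positions` and the example data are hypothetical.

```python
def variable_positions(first, second):
    """Return positions of one specific nucleotide seen in only one subject.

    `first` and `second` are sets of positions at which the same nucleotide
    (e.g. A) was detected in the target nucleic acid of each subject.
    """
    only_first = sorted(first - second)    # present in subject 1 only
    only_second = sorted(second - first)   # present in subject 2 only
    return only_first, only_second


# Hypothetical positions of A residues in two lines.
line_1 = {12, 18, 25, 31, 47}
line_2 = {12, 18, 31, 47}
print(variable_positions(line_1, line_2))  # ([25], [])
```

Positions reported for only one subject are candidate variable nucleotides; linkage to a specific genome still has to be determined separately.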
As used herein, the term "polymorphism" shall be taken to mean a difference in the nucleotide sequence of a specific site or region of the genome of a subject that occurs in a population of individuals. Exemplary polymorphisms include a simple sequence repeat or microsatellite marker, e.g. in which the length of the marker varies between individuals in a population or a simple nucleotide polymorphism. The skilled artisan will understand that a simple nucleotide polymorphism is a small change (e.g., an insertion, a deletion, a transition or a transversion) that occurs in one or more genomes of a population of polyploid subjects. For example, a simple nucleotide polymorphism comprises or consists of an insertion or deletion or transition or transversion of one, or two or three or five, or ten or twenty nucleotides in one or more genomes of a polyploid subject. Preferably, the polymorphism is a single nucleotide polymorphism (SNP).
As used herein, the term "mutation" shall be taken to mean a permanent, transmissible change in nucleotide sequence of the genome of a subject and optionally, an expression product thereof that is causative of a trait. Examples of mutations include an insertion of one or more new nucleotides or deletion of one or more nucleotides or substitution of one or more existing nucleotides with different nucleotides.
As used herein, the term "polyploid subject" shall be taken to mean a subject having a genome comprising more than two sets of homologous chromosomes. For example, a subject having three sets of homologous chromosomes is a triploid subject, a subject having four sets of homologous chromosomes is a tetraploid subject, a subject comprising six sets of homologous chromosomes is a hexaploid subject, etc. Exemplary triploid subjects include banana plants, apple plants, ginger plants or watermelon plants. Exemplary tetraploid subjects include tetraploid wheat, such as, for example, Triticum turgidum (e.g., var. durum, polonicum, persicum, turanicum or turgidum) or T. durum, maize, cotton, potato, cabbage, leek, tobacco, peanut, kinnow or Pelargonium. Exemplary hexaploid subjects include hexaploid wheat, e.g., T. aestivum or T. compactum, triticale or oat. Exemplary octaploid subjects include strawberry, dahlia or pansy. Preferably, the polyploid subject is a tetraploid subject or a hexaploid subject.
In a preferred example, the first and second polyploid subjects are from the same species, variety, cultivar or line. For example, the method of the invention is performed using subjects of the same species that are different varieties thereby facilitating detection of a polymorphism or mutation occurring in one variety but not another. For example, such an assay is useful for identifying a polymorphism or mutation associated with or causative of a phenotype that occurs in one variety but not another variety. The present invention also clearly contemplates the first and second subjects being of the same or different species or the same or different varieties or the same or different cultivars or the same or different lines.
As used herein, the term "target nucleic acid" shall be taken to mean a nucleic acid that occurs within the genomes of a polyploid subject comprising a mutation or polymorphism. A "target sequence" may also occur a plurality of times within a single genome or within each genome of a polyploid subject, e.g., a paralogous sequence, i.e., a sequence arising from replication within a single genome. A "target nucleic acid" on each genome or within each genome need not be identical, and a mutation or polymorphism present in one genome need not be identical to that on another genome. This is not to say that the only variation between the copies of the target sequence in each genome or within a single genome need be the polymorphism or mutation identified using the method of the invention; rather, a target sequence may comprise a plurality of mutations or polymorphisms or one or more HSVs that differ between target sequences in different genomes or PSVs that differ between target sequences in the same genome. A "target nucleic acid" in all genomes also need not be the same length. Preferably, the target sequence is at least about 10 nucleotides in length or 20 nucleotides in length or 30 nucleotides in length or 50 nucleotides in length or 100 nucleotides in length or 200 nucleotides in length or 300 nucleotides in length or 500 nucleotides in length or 1000 nucleotides in length or 2000 nucleotides in length.
As used herein, the term "variable nucleotide" shall be taken to mean a nucleotide residue within a target sequence that varies between two or more polyploid subjects. Preferably, a variable nucleotide has multiple allelic forms wherein one or more of those forms occur in a subject.
In one example, a method as described herein according to any embodiment additionally comprises amplifying a target nucleic acid prior to determining each position of a nucleotide residue within a target sequence from genomes of a polyploid subject. The target nucleic acid is preferably amplified using any amplification reaction such as, for example, polymerase chain reaction (PCR). In such a method, the amplified nucleic acid shall also be considered to be a "target nucleic acid" within the meaning of this term in the present specification and claims. Accordingly, each position of a nucleotide residue within the amplified nucleic acid can also be determined, thereby determining each position of a nucleotide residue in the target nucleic acid.
In one example, a position or a plurality of positions at which the specific nucleotide residue occurs within a target nucleic acid is determined by performing a ddNTP-terminated sequencing method, e.g., Sanger sequencing, with a single ddNTP, e.g., ddATP or ddTTP or ddCTP or ddGTP. In this respect, by using a single ddNTP, one or a plurality of positions at which the nucleotide corresponding to that ddNTP occurs within a target nucleic acid is determined. Methods of ddNTP-terminated sequencing will be apparent to the skilled artisan and/or are described herein. The skilled artisan will appreciate that an electrotrace, or graphical representation depicting the molecular weight of each termination fragment produced using a ddNTP-terminated sequencing method, is useful for determining the relative position of each occurrence of a nucleotide within a target nucleic acid. Preferably, one or a plurality of positions at which a nucleotide residue occurs within a target sequence is determined by performing a method comprising:
(i) performing an amplification reaction with a primer comprising a locus-specific region capable of annealing to the target nucleic acid and a tag-sequence that does not anneal to the target nucleic acid to thereby amplify the target nucleic acid, wherein amplicons of the target nucleic acid comprise the tag sequence;
(ii) performing a sequencing reaction with a primer capable of annealing to the tag sequence in the presence of a specific ddNTP; and
(iii) detecting the molecular weight of the nucleic acid fragments produced at (ii), wherein the molecular weight of each fragment corresponds to the position or positions of the specific nucleotide corresponding to the specific ddNTP in the target nucleic acid. In one example, the primer capable of annealing to the tag sequence is labeled with a detectable marker to facilitate detection of the nucleic acid fragments.
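The relationship between termination fragments and nucleotide positions in steps (i) to (iii) can be illustrated with a small simulation. This is a sketch under stated assumptions: fragment size is expressed in nucleotides as a stand-in for molecular weight, every fragment is assumed to begin with the full sequencing primer, and the function name and example sequence are hypothetical.

```python
def termination_fragment_lengths(synthesized, base, primer_len):
    """Sizes (in nucleotides) of fragments terminated by a single ddNTP.

    `synthesized` is the newly synthesized strand read 5'->3', starting
    immediately after the sequencing primer. `primer_len` is the length of
    the tag-specific sequencing primer carried by every fragment.
    """
    base = base.upper()
    return [primer_len + i + 1
            for i, nt in enumerate(synthesized.upper())
            if nt == base]


# A ddATP-only reaction on a hypothetical template with a 20-nt primer.
print(termination_fragment_lengths("GATTACAGA", "A", primer_len=20))
# [22, 25, 27, 29]
```

Each reported size maps back to one occurrence of the chosen nucleotide, which is why a single-ddNTP reaction yields the positions of that nucleotide alone.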
The present invention also contemplates other methods for sequencing a nucleic acid, such as, for example, making use of a labeled ddNTP rather than a labeled primer.
The skilled artisan will be aware that a ddNTP-terminated sequencing reaction employs multiple cycles of (i) denaturation of double-stranded nucleic acid such as a nucleic acid "template" to be copied or a hybrid between a "template" and a complementary "primer"; (ii) annealing of a primer to its complementary sequence in the single-stranded "template"; and (iii) extension of the primer in the 5'- to 3'- direction by a polymerase activity e.g., an activity of a thermostable polymerase, such as Taq, to thereby produce a double-stranded nucleic acid comprising a newly-synthesized strand complementary to the single-stranded template. A ddNTP-terminated sequencing reaction in the context of the present invention is performed with a ddNTP, which when incorporated into a sequence by the polymerase prevents further extension of the newly-synthesized strand, i.e., terminates the reaction. The sequencing reaction is performed in the presence of all deoxynucleotide triphosphates (dNTPs) required to synthesize a copy of the template nucleic acid, i.e., dATP, dTTP, dGTP and dCTP, and one ddNTP. The ddNTP is included at a lower concentration than the corresponding dNTP to permit the dNTP to be incorporated more often than the ddNTP, thereby permitting production of nucleic acid fragments of sufficient length to sequence a target sequence. Sequencing reactions are known in the art and described, for example in Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, New York, Third Edition (2001) and/or Ausubel et al, Current Protocols in Molecular Biology. Wiley Interscience, ISBN 047 150338, 1987. The reactions described in the previously discussed references are adapted for use in the present invention by including a specific ddNTP rather than using ddATP, ddTTP, ddCTP and ddGTP in a single reaction.
Preferably, the method of the invention comprises separately or independently determining each position at which a plurality of different nucleotide residues occur within a target nucleic acid. For example, the method comprises separately determining the position of a plurality of nucleotides within a target nucleic acid, e.g., determining the position of a plurality of adenosines in the target sequence and independently determining the position of a plurality of thymines in the target sequence and independently determining the position of a plurality of guanines in the target sequence and independently determining the position of a plurality of cytosines in the target sequence.
In one example, the step of comparing the position(s) at which a nucleotide residue occurs in the target nucleic acid from a first polyploid subject and a second polyploid subject comprises aligning the position or plurality of positions at which the specific nucleotide residue occurs within target nucleic acids from genomes of the first and second polyploid subjects prior to comparing a position(s) at which the specific nucleotide residue occurs within a target nucleic acid from genomes of the first and second polyploid subjects. For example, the molecular weights of termination fragments from samples from each subject are analyzed to determine a pattern of termination fragments having similar molecular weights in both samples, and the results aligned based on the similar pattern of termination fragments. Such analysis is useful for aligning sequences from target nucleic acids of different size, e.g., as a result of an insertion or deletion in the target sequence from one of the subjects or in the presence of a microsatellite marker that varies in size between subjects. The skilled artisan will appreciate that comparison of electrotraces depicting the molecular weight of each termination fragment from target nucleic acids from different subjects is useful for such alignment. Sequences can be aligned manually, e.g., by identifying patterns of termination fragments having similar molecular weights. Alternatively, software implementing an algorithm that detects patterns is used to identify a pattern of termination fragments conserved between target sequences from different subjects. Such software is also useful for identifying differences between aligned sequences, e.g., as a result of a variable nucleotide. A suitable algorithm and/or computer software for implementing that algorithm will be apparent to the skilled artisan and/or described herein.
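One simple way to align termination-fragment patterns from two subjects is to search for the size offset that maximizes the number of coinciding fragments. The sketch below is a deliberately simplified stand-in for the pattern-detection software mentioned above and is not taken from the specification: it assumes integer fragment sizes and a single global shift (e.g., one insertion or deletion lying before the compared region). The function name, `max_shift` parameter and data are hypothetical.

```python
def best_offset(ref, query, max_shift=10):
    """Find the shift of `query` fragment sizes that best matches `ref`.

    Both arguments are sets of integer fragment sizes from the same
    single-ddNTP reaction in two subjects. Returns (shift, matches).
    """
    best = (0, len(ref & query))
    for shift in range(-max_shift, max_shift + 1):
        matches = len(ref & {size + shift for size in query})
        # Prefer more matches; on ties prefer the smaller absolute shift.
        if (matches, -abs(shift)) > (best[1], -abs(best[0])):
            best = (shift, matches)
    return best


ref = {22, 25, 27, 29, 40, 44}
query = {25, 28, 30, 32, 43, 47}  # same pattern, fragments 3 nt longer
print(best_offset(ref, query))  # (-3, 6)
```

An insertion or deletion in the middle of the compared region would need a different offset on each side of it, which this global-shift sketch does not attempt.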
In one example, determining the specific nucleotide that occurs in the first polyploid subject but not in the second polyploid subject comprises detecting the presence of the nucleotide at a position in the target nucleic acid from genomes of the first polyploid subject and the absence of the nucleotide at the position in the corresponding target nucleic acid from genomes of the second polyploid subject. For example, a termination fragment is produced in a ddNTP termination sequencing method using nucleic acid from one polyploid subject, and the termination fragment is not detected using nucleic acid from another polyploid subject. For example, in the case of a polymorphism or mutation that occurs in one subject but not in another subject, a termination fragment will be produced from a sample from the subject comprising the polymorphism or mutation but will not be produced from a sample from the subject that does not comprise the polymorphism or mutation.
It will be apparent to the skilled artisan that a mutation or polymorphism can be heterozygous and one allele of the mutation or polymorphism is the same as the alleles on the homoeologous chromosomes in a polyploid subject. In such a situation, the allele that is present at the site of the mutation/polymorphism and on other homoeologous chromosomes is detected by a reduction in the number of copies of the allele in a first subject compared to a second subject. Accordingly, determining the specific nucleotide that occurs in a first polyploid subject but not in a second polyploid subject comprises detecting an increased or decreased number of copies of the specific nucleotide at a position within the target nucleic acid from genomes of the first polyploid subject compared to within the corresponding target nucleic acid from genomes of the second polyploid subject. For example, a reduction in the number of copies of the nucleotide is detected by detecting a reduced amount of termination fragments produced by performing ddNTP-mediated sequencing with the ddNTP corresponding to the allele that is also present on homoeologous chromosomes. As exemplified herein, such a change in the number of copies of a nucleotide in the genomes of a polyploid subject is readily detected using ddNTP-mediated sequencing. This is because the number of nucleic acid fragments terminated at the site of the variable nucleotide is reduced in a sample from a heterozygous subject, and the relative number of fragments produced is readily detectable, e.g., in the case of a sequencing reaction using fluorescently labeled primers by detecting a reduction in the level of fluorescence of nucleic acid fragments terminated at the site of the variable nucleotide.
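The copy-number comparison described above can be sketched as a ratio of normalized peak heights. This is an illustration under assumptions that are not in the specification: peak heights are taken as proportional to copy number, each run is normalized by the median of the peaks shared by both subjects, and the 10% `threshold` is an arbitrary hypothetical cut-off that real data would need to justify. In a hexaploid, loss of one of six copies would ideally give a ratio near 5/6.

```python
from statistics import median


def dosage_changes(first, second, threshold=0.1):
    """Flag positions whose normalized peak height differs between subjects.

    `first` and `second` map fragment position -> peak height (arbitrary
    fluorescence units) for one single-ddNTP reaction. Returns
    {position: ratio} for ratios further than `threshold` from 1.
    """
    shared = sorted(first.keys() & second.keys())
    norm_1 = median(first[p] for p in shared)
    norm_2 = median(second[p] for p in shared)
    flagged = {}
    for p in shared:
        ratio = (first[p] / norm_1) / (second[p] / norm_2)
        if abs(ratio - 1.0) > threshold:
            flagged[p] = round(ratio, 2)
    return flagged


# Hypothetical peak heights; position 27 has lost about one copy in six.
subject_1 = {22: 980, 25: 1010, 27: 830, 29: 1000, 33: 1020}
subject_2 = {22: 1000, 25: 1000, 27: 1000, 29: 1000, 33: 1000}
print(dosage_changes(subject_1, subject_2))  # {27: 0.83}
```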
In one example of the present invention, linkage between the variable nucleotide and one or more nucleotides in a genome of a polyploid subject is determined by detecting the presence or absence of the variable nucleotide in one or more first mutant polyploid subject(s) in which at least a corresponding target nucleic acid in a genome has been deleted or otherwise removed, and in one or more second mutant polyploid subject(s) in which a corresponding target nucleic acid in other genome(s) has been deleted or otherwise removed, wherein absence of the variable nucleotide in the first mutant polyploid subject, and presence of the variable nucleotide in the second mutant polyploid subject, indicates that the variable nucleotide is linked to the genome comprising the deletion in the first mutant polyploid subject. Preferably, the polyploid subjects are isogenic other than the deletion in the target regions. As used herein, the term "isogenic" shall be taken to mean that the sequence of the genomes of two subjects is substantially identical. This shall not be taken to mean that the sequence of the genomes of the two subjects is identical; rather, the sequences may include small differences, e.g., simple nucleotide polymorphisms.
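The linkage logic based on mutant subjects can be sketched as follows. This is a minimal illustration, assuming one mutant line per genome (each lacking the target region on that genome) and a simple present/absent call for the variable nucleotide in each line; the function name and data layout are hypothetical.

```python
def assign_genome(presence_by_deleted_genome):
    """Assign a variable nucleotide to a genome using deletion lines.

    `presence_by_deleted_genome` maps the genome whose target region is
    deleted in a mutant line (e.g. "A", "B", "D") to True/False: whether
    the variable nucleotide is still detected in that line. The nucleotide
    is linked to the genome whose deletion abolishes the signal.
    Returns the genome name, or None if the result is ambiguous.
    """
    absent = [genome
              for genome, present in presence_by_deleted_genome.items()
              if not present]
    return absent[0] if len(absent) == 1 else None


# Hypothetical calls for three lines of hexaploid wheat.
print(assign_genome({"A": True, "B": False, "D": True}))  # B
```

Returning None when zero or several lines lack the nucleotide keeps ambiguous cases visible rather than forcing an assignment.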
Suitable mutant polyploid subjects will be apparent to the skilled artisan and include aneuploid subjects, e.g., lacking one genome that generally occurs in a population of subjects, or a subject in which a chromosome or a part of a chromosome, e.g., an arm of a chromosome of one genome of a polyploid subject is deleted, e.g., a so-called "deletion line".
In one example, detection of the presence or absence of a variable nucleotide in a first and/or second mutant polyploid subject is performed using a computer implemented algorithm to analyse each position of a nucleotide in the first and second mutant polyploid subjects and to determine the presence or absence of the variable nucleotide.
In one example, the method described herein according to any embodiment is performed using nucleic acid from one or more pools of nucleic acid, e.g., bulked segregant analysis (BSA). For example, two pools of nucleic acid are produced or obtained each from pools of subjects phenotypically similar except for one trait. Preferably, the subjects are also genetically similar, e.g., derived from a single breeding population. The method of the invention facilitates detection of a polymorphism or mutation in which an allele occurs in one population and another allele occurs in another population. Because the two populations are genetically similar, there is an increased probability that the polymorphism or mutation is associated with the trait that varies between the two populations. The present invention also encompasses determining an association between the polymorphism or mutation and the trait.
In one example, the method of the present invention additionally comprises providing or obtaining a sample comprising the target nucleic acid. Preferably, said sample comprises genomic DNA from a polyploid subject. For example, such a sample is from or comprises a nucleated cell from a polyploid subject. In another example, the sample is an extract of a cell from a polyploid subject.
In one example, the method of the present invention additionally comprises determining the sequence of the nucleic acid adjacent to the polymorphism or mutation. For example, the sequence of nucleic acid adjacent to the polymorphism or mutation is determined by separately determining the position of each nucleotide within a region of the target sequence adjacent to the polymorphism or mutation and deriving the nucleotide sequence adjacent to the site of the polymorphism or mutation based on the position of each nucleotide residue so determined. For example, electrotrace representations of termination fragments produced by performing sequencing reactions each with an individual ddNTP are overlaid, and the nucleotide sequence of nucleic acid adjacent to the polymorphism or mutation is determined. This sequence information facilitates identification of a genetic marker. Accordingly, the present embodiment of the invention applies mutatis mutandis to a method for identifying a genetic marker.
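Overlaying the four single-ddNTP results to derive a sequence can be sketched as below. This is an illustration under assumptions: each reaction has been reduced to a set of 1-based positions, and any position with no call or with more than one base (as may occur where copies of the target sequence differ) is written as "N". The function name and data are hypothetical.

```python
def overlay_tracks(tracks, length):
    """Derive a sequence by overlaying four single-ddNTP position sets.

    `tracks` maps each base ("A", "C", "G", "T") to the set of 1-based
    positions at which it was detected. Positions with exactly one base
    are called; all other positions are reported as "N".
    """
    seq = []
    for pos in range(1, length + 1):
        calls = [b for b in "ACGT" if pos in tracks.get(b, set())]
        seq.append(calls[0] if len(calls) == 1 else "N")
    return "".join(seq)


tracks = {"A": {2, 5, 7}, "C": {6}, "G": {1, 8}, "T": {3, 4, 5}}
print(overlay_tracks(tracks, 8))  # GATTNCAG
```

Position 5 is reported as "N" because both A and T were detected there, which is the kind of site that merits inspection as a candidate variable nucleotide.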
In another example, the method of the present invention comprises determining a nucleotide specific to the genome in which the polymorphism or mutation occurs and linked to the polymorphism or mutation, e.g., a HSV or PSV in the target nucleic acid in which the polymorphism or mutation occurs. By determining such a variation, the present invention permits production of a primer that is able to preferentially or specifically amplify a target sequence comprising a polymorphism or mutation rather than copies of the target sequence on homoeologous chromosomes. For example, the method additionally comprises:
(i) determining a position or a plurality of positions at which the specific nucleotide residue occurs within a target nucleic acid in genomes of a first mutant polyploid subject in which at least the target nucleic acid in one genome of the first mutant polyploid subject has been deleted or otherwise removed, wherein the target nucleic acid that has been deleted or otherwise removed also comprises the mutation or polymorphism; and
(ii) comparing the position(s) of the specific nucleotide residue determined at (i) to the position(s) of the specific nucleotide residue in a corresponding target nucleic acid within genomes of a second mutant polyploid subject in which at least the target nucleic acid in a different genome to that of the first subject has been deleted or otherwise removed to determine a nucleotide that occurs in the first polyploid subject but not in the second polyploid subject, thereby detecting a variable nucleotide that occurs in one genome of a polyploid subject and not in another genome of the polyploid subject, wherein the nucleotide is linked to the polymorphism or mutation.
Such a method is also applicable mutatis mutandis to the detection of a nucleic acid variation occurring in a copy of a target sequence in a genome of a subject and not in another copy of the target sequence occurring in the same genome of the subject, i.e., a PSV.
In one example, the method additionally comprises determining the sequence of nucleic acid adjacent to a nucleotide specific to the genome in which the polymorphism or mutation occurs and linked to the polymorphism or mutation. Suitable methods for determining this sequence will be apparent to the skilled artisan and/or described herein.
As discussed supra, a nucleotide specific to the genome in which a polymorphism or mutation occurs and nucleic acid adjacent thereto is useful as a genetic marker linked to a polymorphism or mutation, e.g., as an annealing site for a primer to preferentially or specifically amplify the target sequence comprising a polymorphism or mutation identified using the method of the invention.
By "preferentially" as used throughout the specification and claims is meant that a probe or primer anneals or hybridizes to a nucleic acid to a greater extent or higher level than it does to other nucleic acid in the genome of a polyploid subject. However, the term "preferentially" does not limit the annealing or hybridization of the probe or primer to only one site or nucleic acid in the genomes of a polyploid subject. Rather, the level of annealing or hybridization of the probe or primer need only be increased to a higher level, and preferably significantly increased compared to the level of annealing or hybridization to another nucleic acid in the genomes of a polyploid subject.
By "specifically" is meant that a probe or primer detectably anneals to only one nucleic acid in the genomes of a polyploid subject.
Accordingly, the present invention provides a method for producing a probe or primer for detecting a polymorphism or mutation, said method comprising:
(i) performing a method as described herein according to any embodiment to identify a polymorphism or mutation and the sequence of a nucleic acid adjacent to said polymorphism or mutation or obtaining the sequence of a nucleic acid comprising a polymorphism or mutation and nucleic acid adjacent thereto identified by performing said method; and
(ii) producing or obtaining a probe or primer comprising the sequence determined at (i) and capable of preferentially or specifically annealing or hybridizing to a nucleic acid comprising said polymorphism or mutation.
Suitable methods for producing a probe or primer will be apparent to the skilled artisan and/or described herein.
In one example, the probe or primer preferentially or specifically anneals or hybridizes to a nucleic acid comprising an allele of the polymorphism or mutation.
The present invention also provides a method for producing a primer for amplifying a target sequence in a genome of a polyploid subject, said target sequence comprising a polymorphism or mutation, said method comprising:
(i) performing a method as described herein according to any embodiment to determine the sequence of a nucleic acid comprising a nucleotide specific to a genome of a polyploid subject in which a polymorphism or mutation occurs and that is linked to the polymorphism or mutation or obtaining the sequence of said nucleic acid determined by performing said method; and
(ii) producing or obtaining a probe or primer comprising the sequence determined at (i) and capable of preferentially or specifically annealing or hybridizing to nucleic acid linked to said polymorphism or mutation.
Suitable methods for producing a probe or primer will be apparent to the skilled artisan and/or described herein.
Clearly the present invention extends to any novel probe or primer that is a direct product of a method described herein according to any embodiment.
It will be understood by the skilled artisan that the inventive method has broad applicability to the identification of one or more polymorphisms and/or mutations in one or more homoeologous genes that are not conserved across all genomes of the polyploid, as is required for determining the breeding contribution of a particular parent. Thus, the present invention also has clear application to the discovery or identification of mutations causative of a trait of interest. Accordingly, the present invention provides a method for identifying a polymorphism or mutation associated with or causative of a trait, said method comprising:
(i) identifying a polymorphism or mutation by performing a method as described herein according to any embodiment;
(ii) analyzing a panel of subjects to determine those that comprise a trait, wherein not all members of the panel comprise the polymorphism or mutation; and
(iii) determining variation in the development of the trait wherein said variation indicates that the polymorphism or mutation is associated with the trait.
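One way to assess the association in steps (ii) and (iii) for a presence/absence marker and a binary trait is a test on a 2x2 table. The specification does not name a statistical test; Fisher's exact test is used here purely as one conventional choice, and the function name and panel counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: carriers with the trait,     b: carriers without the trait,
    c: non-carriers with the trait, d: non-carriers without the trait.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carriers showing the trait.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical panel: 9 of 10 carriers and 2 of 10 non-carriers show the trait.
p = fisher_exact_two_sided(9, 1, 2, 8)
print(f"p = {p:.4f}")  # p = 0.0055
```

A small p-value suggests association only; it does not by itself show that the polymorphism or mutation is causative of the trait.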
In one example, a polymorphism or mutation is a polymorphism or mutation in one or more genes associated with yield and/or flour quality and/or flowering and/or disease-resistance and/or tolerance to abiotic stress such as drought, salinity, frost or heat.
In another example, a polymorphism or mutation is a polymorphism or mutation associated with wheat flour quality, including suitability for bread or noodle or biscuit making, e.g., a Wx gene encoding granule-bound starch synthase (GBSS). Preferably, the mutation is an INDEL in a Wx gene.
In another example, a polymorphism or mutation is a polymorphism or mutation associated with wheat flour quality, including suitability for bread or noodle or biscuit making, e.g., a polymorphism or mutation in a gene encoding a glutenin subunit e.g., Glu-A1, Glu-B1, Glu-D1, Glu-A3, Glu-B3, Glu-D3, Glu-D4 or Glu-D5 or allele thereof, or alternatively, encoding a gliadin e.g., an allele of a Gli-1 or Gli-2 gene. Preferably, a polymorphism or mutation is a polymorphism or mutation in a Glu-D1d allele in hexaploid wheat that enhances bread-making quality of wheat e.g., by virtue of providing an extra cysteine residue in the encoded glutenin subunit, thereby improving bread dough strength. Alternatively, a polymorphism or mutation is a polymorphism or mutation in a Glu-B1al allele that enhances bread-making quality by virtue of being expressed at a high level.
In another example, a polymorphism or mutation is a polymorphism or mutation associated with flowering time and/or time to physiological maturity e.g., a polymorphism or mutation in a vernalization-response gene e.g., Vrn1, Vrn2, Vrn3 or allele thereof and/or a photoperiod gene e.g., Ppd1 or Ppd2 or allele thereof.
In another example, a polymorphism or mutation is a polymorphism or mutation associated with improved tolerance of plants to abiotic stress conditions such as drought or frost e.g., a polymorphism or mutation in a fructan biosynthesis gene of tetraploid and/or hexaploid wheat, including Chinese Spring wheat, e.g., a gene encoding fructan 1-exohydrolase (FEH) such as an allele in a 1-FEH gene e.g., 1-FEH-6A, 1-FEH-6B, 1-FEH-6D or allele thereof.
In another example, a polymorphism or mutation is a polymorphism or mutation associated with improved resistance to a nematode, preferably improved resistance to a root lesion nematode, e.g., Pratylenchus neglectus, e.g., a polymorphism or mutation in a Rlnn1 gene or allele thereof.
In another example, a polymorphism or mutation is a polymorphism or mutation in a nodulin-like gene of hexaploid bread wheat.
In yet another example, a polymorphism or mutation is a polymorphism or mutation associated with herbicide resistance of plants e.g., a polymorphism or mutation in acetohydroxyacid synthase (AHAS) gene, wherein the polymorphism or mutation confers resistance or tolerance to the herbicide imidazolinone.
In yet another example, a polymorphism or mutation is a polymorphism or mutation associated with tolerance to a heavy metal e.g., boron, such as a polymorphism or mutation linked to the Bo1 gene e.g., on chromosome 7B of hexaploid bread wheat.
In one example, the method additionally comprises determining or identifying a gene or other nucleic acid in a genome of a polyploid subject that is expressed in nature, e.g., microRNA encoding nucleic acid, and that is linked to the polymorphism or mutation.
Preferably, the method additionally comprises determining a gene or nucleic acid expressed in nature in a genome of a polyploid subject that causes the trait. In this manner, the present invention provides for positional cloning of a gene responsible for a trait.
The present invention is readily adapted to screening mutant polyploid subjects, e.g., polyploid plants having a desired trait. In such screening assays, mutations or polymorphisms may be known previously or produced de novo, and associated with a particular trait of interest. For example, mutations may be produced de novo in a zygote of a polyploid subject using, for example, a chemical mutagen, e.g., ethylnitrosourea (ENU) or ethylmethanesulfonate (EMS). The zygote is then used to produce a polyploid subject having one or more mutations. This polyploid subject, or offspring thereof, is then screened to detect a polyploid subject having a desired trait, and the mutation causative of this trait is detected by performing a method as described herein according to any embodiment. Alternatively, mutant polyploid subjects are screened to identify those comprising mutations in a particular target nucleic acid prior to phenotypic screening to identify those having a desired trait. Accordingly, the present invention also provides a method for identifying a mutation causative of a trait, said method comprising:
(i) inducing or producing a mutation in a polyploid subject;
(ii) identifying a mutant polyploid subject at (i) having a trait; and
(iii) performing a method described herein according to any embodiment to identify a mutation in the genome(s) of the polyploid subject at (ii), thereby identifying a mutation causative of the trait.
Alternatively, the present invention provides a method for identifying a mutation causative of a trait, said method comprising:
(i) inducing or producing a mutation in a polyploid subject;
(ii) performing the method described herein according to any embodiment to identify a polyploid subject comprising a mutation in a target nucleic acid in its genome(s); and
(iii) identifying a polyploid subject at (ii) having a trait, thereby identifying a mutation causative of the trait.
Definitions
Unless the context requires otherwise or specifically stated to the contrary, integers, steps, or elements of the invention recited herein as singular integers, steps or elements clearly encompass both singular and plural forms of the recited integers, steps or elements.
The designations of nucleotide residues referred to herein are those recommended by the IUPAC-IUB Biochemical Nomenclature Commission, wherein A represents Adenine, C represents Cytosine, G represents Guanine, T represents Thymine, Y represents a pyrimidine residue, R represents a purine residue, M represents Adenine or Cytosine, K represents Guanine or Thymine, S represents Guanine or Cytosine, W represents Adenine or Thymine, H represents a nucleotide other than Guanine, B represents a nucleotide other than Adenine, V represents a nucleotide other than Thymine, D represents a nucleotide other than Cytosine and N represents any nucleotide residue.
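By way of illustration only, the nucleotide designations listed above can be held in a simple lookup table. The following Python sketch is not part of the claimed method; the names IUPAC_CODES, matches and expand are hypothetical helpers introduced here for illustration.

```python
from itertools import product

# IUPAC nucleotide codes as listed in the paragraph above.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT",    # pyrimidine
    "R": "AG",    # purine
    "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT",   # any nucleotide other than G
    "B": "CGT",   # any nucleotide other than A
    "V": "ACG",   # any nucleotide other than T
    "D": "AGT",   # any nucleotide other than C
    "N": "ACGT",  # any nucleotide
}


def matches(code: str, base: str) -> bool:
    """Return True if a concrete base (A, C, G or T) is permitted by an IUPAC code."""
    return base.upper() in IUPAC_CODES[code.upper()]


def expand(sequence: str) -> list[str]:
    """List every concrete sequence encoded by a short degenerate sequence."""
    choices = (IUPAC_CODES[symbol] for symbol in sequence.upper())
    return ["".join(bases) for bases in product(*choices)]
```

For example, expand("AYG") lists the two sequences ACG and ATG, which is how a degenerate primer written with these codes would be read.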
As used herein the term "derived from" shall be taken to indicate that a specified integer may be obtained from a particular source albeit not necessarily directly from that source.
Throughout this specification, unless the context requires otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated step or element or integer or group of steps or elements or integers but not the exclusion of any other step or element or integer or group of elements or integers.
Throughout this specification, unless specifically stated otherwise or the context requires otherwise, reference to a single step, composition of matter, group of steps or group of compositions of matter shall be taken to encompass one and a plurality (i.e. one or more) of those steps, compositions of matter, groups of steps or group of compositions of matter.
Each embodiment described herein is to be applied mutatis mutandis to each and every other embodiment unless specifically stated otherwise.
Those skilled in the art will appreciate that the invention described herein is susceptible to variations and modifications other than those specifically described. It is to be understood that the invention includes all such variations and modifications. The invention also includes all of the steps, features, compositions and compounds referred to or indicated in this specification, individually or collectively, and any and all combinations of any two or more of said steps or features. The present invention is also not to be limited in scope by the specific embodiments described herein, which are intended for the purpose of exemplification only. Functionally-equivalent products, compositions and methods are clearly within the scope of the invention, as described herein.
Brief description of the drawings
Figure 1A is a graphical representation showing partial termination fragment profiles for aligned ddT electrotraces corresponding to the sequence of a nodulin-like gene amplified from the bread wheat varieties Excalibur and Kukri using the primer set 18B (SEQ ID NOs: 3 and 4). The two bread wheat varieties harbor a known SNP, a cytosine to guanosine transition. Sequencing assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
Figure 1B is a graphical representation showing partial termination fragment profiles for aligned ddC electrotraces corresponding to the sequence of a nodulin-like gene amplified from the bread wheat varieties Excalibur and Kukri using the primer set 18B (SEQ ID NOs: 3 and 4). The two bread wheat varieties harbor a known SNP, a cytosine to guanosine transition, indicated by an arrow, and have the homozygous genotypes GG and CC, respectively. Sequencing assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
Figure 1C is a graphical representation showing partial termination fragment profiles for aligned ddG electrotraces corresponding to the sequence of a nodulin-like gene amplified from the bread wheat varieties Excalibur and Kukri using the primer set 18B (SEQ ID NOs: 3 and 4). The two bread wheat varieties harbor a known SNP, a cytosine to guanosine transition, indicated by an arrow, and have the homozygous genotypes GG and CC, respectively. Sequencing assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
Figure 2 is a graphical representation showing results of sequencing reactions to identify a SNP conferring herbicide tolerance in the AHAS gene located on chromosome 6B of bread wheat using bulked segregant analysis. The herbicide tolerant DNA pool (MU) has the SNP genotype TT, while the herbicide susceptible DNA pool (WT) has the genotype CC. The homoeologous AHAS genes amplified from the A- and D-genomes in both DNA pools have the genotypes CC and CC, respectively, at the corresponding nucleotide position. The left-hand panel shows overlaid and individual electrotraces produced using ddT and the right-hand panel shows overlaid and individual electrotraces produced using ddC. The termination fragment corresponding to the target SNP is indicated with an arrow. Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
Figure 3 is a graphical representation showing a comparison of overlaid and individual termination fragment profiles for detection of a polymorphism in the AHAS gene. The termination fragment corresponding to the target SNP is indicated with an arrow. Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
Figure 4 panel A is a graphical representation showing the results of dideoxynucleotide-mediated sequencing termination fragment profiles produced using single ddNTPs and overlaid results to identify homeologue sequence variants in the AHAS gene of bread wheat using aneuploid stocks as indicated. The termination fragments corresponding to the HSVs are indicated with an arrow. Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
Figure 4 panel B depicts the published sequence for the region of the AHAS genes depicted in Figure 4 panel A.
Figure 5 is a graphical representation showing dideoxynucleotide-mediated sequencing termination fragment profiles produced using single ddNTPs and overlaid results to determine the nucleotide sequence of a region of the AHAS gene amplified from the bread wheat variety Excalibur. The cut-off threshold for nucleotide assignment is indicated by the solid line. Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
Figure 6 is a graphical representation showing an alignment of overlaid termination fragment profiles for the four dideoxynucleotides amplified from two homozygous samples harboring a SNP. Shown are partial termination fragment profiles for overlaid electrotraces corresponding to a nodulin-like gene amplified from the bread wheat varieties Kukri and Excalibur. Kukri and Excalibur have the SNP genotypes CC and GG, respectively. Aligned termination fragments (solid lines) correspond to nucleotides common to both samples, while misaligned termination fragments (dashed lines) correspond to sequence variants. The termination fragments corresponding to the SNP are indicated by an arrow. Assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
Figure 7A is a graphical representation showing alignment of overlaid partial termination fragment profiles for the four dideoxynucleotides amplified from three pooled samples harboring a SNP. Pool 1 consists of the wheat variety Kukri with the SNP genotype CC. Pool 2 comprises the varieties RAC875 and Excalibur with genotypes GG and CC, respectively. Pool 3 consists of the varieties Timgalen, Trident, Spear, Berkut and Krichauff with the genotypes CC, GG, GG, GG and GG, respectively. Aligned termination fragments (solid lines) correspond to nucleotides common to all samples. The termination fragments corresponding to the SNP are indicated by an arrow. Assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
Figure 7B is a graphical representation showing alignment of individual partial termination fragment profiles for the four dideoxynucleotides amplified from three pooled samples harboring a SNP. Pool 1 consists of the wheat variety Kukri with the SNP genotype CC. Pool 2 comprises the varieties RAC875 and Excalibur with genotypes GG and CC, respectively. Pool 3 consists of the varieties Timgalen, Trident, Spear, Berkut and Krichauff with the genotypes CC, GG, GG, GG and GG, respectively. Aligned termination fragments (solid lines) correspond to nucleotides common to all samples. The termination fragments corresponding to the SNP are indicated by an arrow. Assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
Figure 8 is a copy of a photographic representation showing the effect of dideoxynucleotide-termination on the mobility of oligonucleotides in denaturing polyacrylamide gel. The 3'-end of the oligonucleotide (5' HEX-ACG ACG TTG TAA AA 3' (SEQ ID NO: 9)) was labeled in four separate reactions with each of the four dideoxynucleotides. The reaction products were separated on a 10% (19:1) denaturing polyacrylamide gel using a GelScan3000 instrument and detected by fluorescence.
Figure 9A is a graphical representation showing the sensitivity of the method of the invention for detecting a polymorphism in pooled samples. Shown are the partial termination fragment profiles for individual electrotraces corresponding to a region of a triticin precursor gene. DNA pool 1 contains the barley line WABAR2096 with the SNP genotype GG. DNA pools 2-5 contain a mixture of the barley lines WABAR2096 and WI3385; the latter barley line has the SNP genotype TT. The frequency of the WI3385 SNP allele in DNA pools 2, 3, 4 and 5 is 50, 33, 17 and 8%, respectively. Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
Figure 9B is a graphical representation showing the sensitivity of the method of the invention for detecting a polymorphism in pooled samples. Shown are the partial termination fragment profiles for overlaid electrotraces corresponding to a region of a triticin precursor gene. DNA pool 1 contains the barley line WABAR2096 with the SNP genotype GG. DNA pools 2-5 contain a mixture of the barley lines WABAR2096 and WI3385; the latter barley line has the SNP genotype TT. The frequency of the WI3385 SNP allele in DNA pools 2, 3, 4 and 5 is 50, 33, 17 and 8%, respectively. Assays were performed using the M13 reverse sequencing primer (SEQ ID NO: 2).
Figure 10A is a graphical representation showing unaligned electrotraces produced using ddT-mediated sequencing for Stylet and Wylkatchem. Assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
Figure 10B is a graphical representation showing manually aligned electrotraces produced using ddT-mediated sequencing for Stylet and Wylkatchem showing two SNPs identified in a region flanking a microsatellite repeat sequence. The termination fragments corresponding to the SNPs are shown with an arrow. Assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1).
Figure 11 is a graphical representation showing a ClustalW alignment of the Sanger sequences for sub-cloned PCR fragments amplified from chromosomes 2A and 2D of Stylet and Wylkatchem. Three SNPs located on chromosome 2A are highlighted in grey.
Figure 12A is a graphical representation showing aligned ddA- or ddG-mediated sequencing results from a region of Gene 77 sequenced using the M13 forward sequencing primer (SEQ ID NO: 1). Target nucleic acid is from Chinese Spring or bulked sample A (as indicated). Results indicate the presence-absence of a peak in the ddA trace and the presence of a peak at the corresponding nucleotide position in the ddG trace, thereby indicating the presence of the G allele on all three genomes of Chinese Spring and the presence of A and G alleles in the bulked sample.
Figure 12B is a graphical representation showing aligned ddA- or ddG-mediated sequencing results from a region of Gene 77 sequenced using the M13 forward sequencing primer (SEQ ID NO: 1). Sequenced nucleic acid is from the four individual accessions present in bulked sample A showing the presence-absence of a peak in the ddA trace and presence of a peak in the ddG trace, indicating the presence of the A allele on at least one genome.
Figure 13 is a graphical representation demonstrating identification and validation of a SNP in a complex DNA mixture containing homoeologous sequences of different lengths. Gene 89 was amplified from the three wheat genomes using conserved primers, and sequencing assays were performed using the M13 forward sequencing primer (SEQ ID NO: 1). Shown are the aligned ddA traces of varieties Renan, Hartog, Chinese Spring and nullisomic-tetrasomic stocks for the group 3 chromosomes. The presence-absence of a peak indicates the presence of a SNP between Excalibur and Kukri that is shown to be located on the A genome by the peak's absence in the N3A nullisomic-tetrasomic stock.
Figure 14A is a graphical representation showing identification of allelic SNPs linked to the Bo1 gene by bulked segregant analysis. Gene CAT was amplified using conserved primers LS-CAT forward and reverse (SEQ ID NOs: 30 and 31, respectively), and a method of the present invention was performed using the M13 forward sequencing primer (SEQ ID NO: 1). Shown are the ddC traces for the sensitive and tolerant DNA bulks and parental lines. Arrows indicate the first allelic sequence variation identified.
Figure 14B is a graphical representation showing results of sequencing reactions used in a blinded genotyping assay using the method of the present invention to confirm linkage of allelic variation detected in the CAT gene with the Bo1 gene. Gene CAT was amplified using conserved primers LS-CAT forward and reverse (SEQ ID NOs: 30 and 31, respectively), and a method of the present invention was performed using the M13 forward sequencing primer (SEQ ID NO: 1). Shown are the ddC traces for eight doubled haploid lines with Bo1 genotypes: sensitive, tolerant, sensitive, tolerant, sensitive, sensitive, tolerant and sensitive, respectively. Arrows indicate the first allelic sequence variation linked to the Bo1 gene.
Figure 14C is a graphical representation showing HSV assignment using nullisomic-tetrasomic wheat stocks. Gene CAT was amplified using conserved primers LS-CAT forward and reverse (SEQ ID NOs: 30 and 31, respectively), and a method of the present invention was performed using the M13 forward sequencing primer (SEQ ID NO: 1). Shown are the ddC traces for the nullisomic stocks for group 7 chromosomes and the sensitive and tolerant DNA bulks. Double arrows indicate allelic variation linked to the Bo1 gene. A single arrow and asterisk (*) indicates HSV that can be assigned to a copy of chromosome 7.
Detailed description of preferred embodiments
Primer design
As will be apparent from the foregoing disclosure, embodiments of the present invention make use of a primer to amplify a target sequence and/or sequence a target sequence. As will be known to the skilled artisan, a "primer" is a nucleic acid molecule comprising any combination of ribonucleotides, deoxyribonucleotides and/or analogs thereof such that it comprises DNA, RNA or DNA/RNA with one or more ribonucleotide or deoxyribonucleotide analogs contained therein, and is capable of annealing to a nucleic acid template to act as a binding site for an enzyme, e.g., DNA or RNA polymerase, thereby providing a site for initiation of replication of a specific nucleic acid in the 5' to 3' direction. The nucleotide sequence of a primer is generally substantially complementary to the nucleotide sequence of a region of a template nucleic acid to be amplified, i.e., a target sequence, or at least comprises a region of complementarity sufficient for annealing to occur and extension in the 5' to 3' direction therefrom. However, as will be apparent to the skilled artisan, a degree of non-complementarity will not prevent a primer from initiating extension. Suitable methods for designing and/or producing a primer suitable for use in the method of the invention are known in the art and/or described herein. Primers are generally, but not necessarily, short synthetic nucleic acids of about 12-50 nucleotides in length. Preferably, a primer for amplifying or directly sequencing a target sequence comprises at least about 12-15 nucleotides in length and is capable of annealing to a strand of the nucleic acid template, i.e., target nucleic acid. Primers may also comprise at least about 20 or 25 or 30 nucleotides in length and are capable of annealing to a strand of the template.
In one example of the present invention, a primer specific to a target nucleic acid or comprising a region specific to a target nucleic acid is designed such that it comprises a sequence having at least about 80% identity overall to the sequence of a target nucleic acid. More preferably, the degree of sequence identity is at least about 85% or 90% or 95% or 98% or 99%.
To determine whether or not two nucleotide sequences fall within a particular percentage identity limitation recited herein, those skilled in the art will be aware that it is necessary to conduct a side-by-side comparison or multiple alignment of sequences. In such comparisons or alignments, differences may arise in the positioning of non-identical residues, depending upon the algorithm used to perform the alignment. In the present context, reference to a percentage identity between two or more nucleotide sequences shall be taken to refer to the number of identical residues between said sequences as determined using any standard algorithm known to those skilled in the art. For example, nucleotide sequences may be aligned and their identity calculated using the BESTFIT program or other appropriate program of the Genetics Computer Group, Inc., University Research Park, Madison, Wisconsin, United States of America (Devereux et al., Nucl. Acids Res. 12, 387-395, 1984).
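As a minimal sketch of the percentage identity calculation described above, the following Python function counts identical residues between two sequences that have already been aligned by any standard algorithm. The function name percent_identity is hypothetical, and the choice of denominator (all alignment columns, including gapped columns) is an assumption made for illustration; alignment programs differ on this point.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding identical residues.

    Both inputs must already be aligned (equal length, '-' marking gaps).
    Assumption: gapped columns count in the denominator but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)
```

Under these assumptions, two aligned 10-nucleotide sequences differing at one position score 90%, which would satisfy the "at least about 80%" and "85%" limitations recited above but not the 95% limitation.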
Alternatively, a suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215: 403-410, 1990), which is available from several sources, including the NCBI, Bethesda, Md. The BLAST software suite includes various sequence analysis programs including "blastn," that is used to align a known nucleotide sequence with other polynucleotide sequences from a variety of databases. Also available is a tool called "BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences.
As used herein the term "NCBI" shall be taken to mean the database of the National Center for Biotechnology Information at the National Library of Medicine at the National Institutes of Health of the Government of the United States of America, Bethesda, MD, 20894.
Clearly, the specific composition of a primer will depend upon the sequence of the target nucleic acid. Accordingly, the sequence of the primer or region specific for a target nucleic acid is not to be taken to be limited to a particular sequence. Rather the sequence need only be sufficient to allow for annealing of the primer to a template nucleic acid and initiation of an amplification reaction and/or a sequencing reaction.
Because a primer is generally extended in the 5' to 3' direction it is preferred that at least the 3'-terminal nucleotide is complementary to the relevant nucleotide in the target nucleic acid. More preferably, at least the 3 or 4 or 6 or 8 or 10 contiguous nucleotides at the 3'-terminus of the primer are complementary to the relevant nucleotides in the target nucleic acid. The complementarity of the 3'-terminus of the primer ensures that the extending end of the primer is capable of initiating amplification of the target nucleic acid, for example, by a polymerase.
Because regions of non-complementarity reduce the predicted Tm of a primer and may be associated with amplification of non-target nucleic acid it is preferred that a primer of the invention does not comprise multiple contiguous nucleotides that are not identical to a strand of the target nucleic acid. Preferably, the primer comprises no more than 6 or 5 or 4 or 3 or 2 contiguous nucleotides that are not identical to a strand of the target nucleic acid. More preferably, any nucleotides that are not identical to a strand of the target nucleic acid are non-contiguous.
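The two preferences above (a matched 3'-terminus and no long run of contiguous mismatches) can be checked with a short sketch such as the following. It assumes, for simplicity, that the primer is compared base-by-base with the same-sense strand of the target region, i.e., the sequence the primer would be identical to if it matched perfectly, with both written 5' to 3' and of equal length. The function names are hypothetical.

```python
def three_prime_match_length(primer: str, site: str) -> int:
    """Number of contiguous matching nucleotides counted back from the 3'-terminus."""
    matched = 0
    for p, s in zip(reversed(primer.upper()), reversed(site.upper())):
        if p != s:
            break
        matched += 1
    return matched


def longest_mismatch_run(primer: str, site: str) -> int:
    """Length of the longest stretch of contiguous non-identical nucleotides."""
    longest = current = 0
    for p, s in zip(primer.upper(), site.upper()):
        current = current + 1 if p != s else 0
        longest = max(longest, current)
    return longest
```

A primer with two adjacent internal mismatches and eleven matched nucleotides at its 3'-terminus would, on this check, meet the "no more than 2 contiguous" preference and the "at least 10 contiguous nucleotides at the 3'-terminus" preference.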
Generally, a primer comprises or consists of or consists essentially of at least about 10 nucleotides, more preferably at least about 12 nucleotides or at least about 15 or 20 nucleotides that anneal to a target nucleic acid or are complementary to the target nucleic acid. However, longer primers are also used in PCR reactions, for example, reactions in which a long region of nucleic acid (e.g., greater than 1000 bp) is amplified. Accordingly, the present invention additionally contemplates a primer comprising at least about 25 or 30 or 35 nucleotides that anneal to a target nucleic acid or are complementary to the target nucleic acid.
Alternatively, a primer comprising one or more modified bases, such as, for example, locked nucleic acid (LNA) or peptide nucleic acid (PNA) need only comprise a region of at least about 8 nucleotides that anneal to a target nucleic acid or are complementary to the target nucleic acid. Preferably, the complementary nucleotides or modified nucleotides are contiguous.
As will be apparent to the skilled artisan, the number of nucleotides capable of annealing to a target nucleic acid is related to the stringency under which the primer will anneal. Preferably, a primer of the invention anneals to a target nucleic acid under moderate to high stringency conditions.
In one embodiment, the stringency under which a primer of the invention anneals to a template nucleic acid is determined empirically. Generally, such a method requires performance of an amplification reaction using one or more primers under various conditions and determining the level of specific amplification produced.
Alternatively, a primer of the invention is labeled with a detectable marker (e.g., a radionucleotide or a fluorescent marker) and the level of primer that has annealed to a target nucleic acid under suitably stringent conditions is determined.
For the purposes of defining the level of stringency, moderate stringency annealing conditions will generally be achieved using a condition selected from the group consisting of:
(i) an incubation temperature between about 42°C and about 55°C;
(ii) an incubation temperature between about 15°C and 10°C less than the predicted Tm for a primer; and
(iii) a Mg2+ concentration of between about 2 mM and 3 mM.
High stringency annealing conditions will generally be achieved using a condition selected from the group consisting of:
(i) an incubation temperature above about 55°C and preferably above about 65°C;
(ii) an incubation temperature between about 10°C and 1°C less than the predicted Tm for a primer; and
(iii) a Mg2+ concentration of between about 1 mM and 1.9 mM.
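The Tm-offset and Mg2+ conditions stated above can be expressed as a simple classifier. The following Python sketch is illustrative only: it encodes moderate stringency as annealing 10°C to 15°C below the predicted Tm or 2 to 3 mM Mg2+, and high stringency as annealing 1°C to 10°C below the predicted Tm or 1 to 1.9 mM Mg2+. The function names, and the decision to treat an offset of exactly 10°C as high stringency, are assumptions.

```python
def stringency_from_tm_offset(annealing_c: float, predicted_tm_c: float) -> str:
    """Classify annealing stringency from how far the annealing temperature lies below Tm."""
    offset = predicted_tm_c - annealing_c
    if 1.0 <= offset <= 10.0:
        return "high"
    if 10.0 < offset <= 15.0:
        return "moderate"
    return "outside stated ranges"


def stringency_from_mg(mg_mM: float) -> str:
    """Classify annealing stringency from the Mg2+ concentration (in mM)."""
    if 1.0 <= mg_mM <= 1.9:
        return "high"
    if 2.0 <= mg_mM <= 3.0:
        return "moderate"
    return "outside stated ranges"
```

For a primer with a predicted Tm of 62°C, annealing at 58°C would be classed as high stringency and annealing at 50°C as moderate stringency on this scheme.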
Alternative or additional conditions for enhancing stringency of annealing will be apparent to the skilled artisan. For example, a reagent such as, for example, glycerol (5-10%), DMSO (2-10%), formamide (1-5%), betaine (0.5-2 M) or tetramethylammonium chloride (TMAC, >50 mM) is known to alter the annealing temperature of a primer and a nucleic acid template (Sarkar et al., Nucl. Acids Res. 18: 7465, 1990; Baskaran et al., Genome Res. 6: 633-638, 1996; and Frackman et al., Promega Notes 65: 27, 1998).
Alternatively, or in addition, a temperature that is within about 5°C or within about 10°C or equal to an estimated temperature at which a primer denatures from a target nucleic acid (Tm) is considered to be high stringency. Medium stringency is to be considered to be within 10°C to 20°C or 10°C to 15°C of the calculated Tm of the primer. Methods for determining Tm of a primer are described herein or are known in the art.
Additional conditions for altering the stringency of a PCR reaction are understood by those skilled in the art. For the purposes of further clarification only, reference to the parameters affecting annealing between nucleic acid molecules is found in Ausubel et al (Current Protocols in Molecular Biology, Wiley Interscience, ISBN 047150338, 1992), which is herein incorporated by reference.
In one example, the conditions under which a primer anneals to a target nucleic acid are determined in silico. For example, methods for determining the predicted melting temperature (or Tm) of a primer (or the temperature at which a primer denatures from a specific nucleic acid) are known in the art.
For example, the method of Wallace et al. (Nucleic Acids Res. 6, 3543, 1979) estimates the Tm of a primer based on the G, C, T and A content. In particular, the described method uses the formula 2(A + T) + 4(G + C) to estimate the Tm of a probe or primer.
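As a worked illustration, the Wallace rule in its commonly stated form, Tm = 2(A + T) + 4(G + C) in °C, can be computed as follows. The function name wallace_tm is hypothetical, and the rule is generally regarded as a rough estimate suited to short oligonucleotides.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm (in degrees C) as 2 x (A + T count) + 4 x (G + C count)."""
    seq = primer.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count
```

For the 14-nucleotide oligonucleotide ACGACGTTGTAAAA (nine A/T and five G/C residues), the estimate is 2 x 9 + 4 x 5 = 38°C.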
Alternatively, the nearest neighbor method described by Breslauer et al, Proc. Natl Acad. Sci. USA, 83:3746-3750, 1986 is useful for determining the Tm of a primer. The nearest neighbour method uses the formula:
$$T_m(\mathrm{calc}) = \frac{\sum \Delta H^{\circ}}{R \ln(C_t/n) + \sum \Delta S^{\circ}}$$
wherein ΔH° is standard enthalpy for helix formation, ΔS° is standard entropy for helix formation, Ct is the total strand concentration, n reflects the symmetry factor, which is 1 in the case of self-complementary strands and 4 in the case of non-self-complementary strands, and R is the gas constant (1.987).
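The nearest neighbor calculation can be sketched as below, taking the summed enthalpy and entropy as inputs rather than reproducing any published parameter table. The sketch assumes the enthalpy sum is supplied in cal/mol and the entropy sum in cal/(K mol), so that with R = 1.987 cal/(K mol) the result is in kelvin; the function name and the example values are hypothetical.

```python
import math

GAS_CONSTANT = 1.987  # cal/(K*mol)


def nearest_neighbor_tm_kelvin(
    sum_dh_cal_per_mol: float,
    sum_ds_cal_per_k_mol: float,
    total_strand_conc_molar: float,
    self_complementary: bool = False,
) -> float:
    """Tm = sum(dH) / (R * ln(Ct / n) + sum(dS)), returned in kelvin."""
    # Symmetry factor: 1 for self-complementary strands, otherwise 4.
    n = 1 if self_complementary else 4
    denominator = (
        GAS_CONSTANT * math.log(total_strand_conc_molar / n)
        + sum_ds_cal_per_k_mol
    )
    return sum_dh_cal_per_mol / denominator


# Hypothetical sums for illustration: dH = -100 kcal/mol, dS = -270 cal/(K*mol),
# total strand concentration 0.2 micromolar, non-self-complementary strands.
example_tm_k = nearest_neighbor_tm_kelvin(-100_000.0, -270.0, 2e-7)
example_tm_c = example_tm_k - 273.15
```

With these illustrative inputs the estimate falls between 329 K and 330 K, i.e., roughly 56°C.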
Rychlik et al., Nucl. Acids Res. 18: 6409-6412, 1990 describe an alternative formula for determining Tm of an oligonucleotide:
$$T_m^{\mathrm{primer}} = \frac{dH}{dS + R \ln(c/4)} + 16.6 \log_{10}[K^+] - 273.15$$
wherein dH is enthalpy for helix formation, dS is entropy for helix formation, R is the molar gas constant (1.987 cal/°C mol), "c" is the nucleic acid molar concentration (determined empirically, Rychlik et al., supra; default value is 0.2 μM for unified thermodynamic parameters), and [K+] is the salt molar concentration (default value is 50 mM).
Suitable software for determining the Tm of an oligonucleotide using the nearest neighbor method is known in the art and available from, for example, US Department of Commerce, Northwest Fisheries Service Center and Department of Molecular Genetics and Biochemistry, University of Pittsburgh School of Medicine.
Alternatively, for longer primers (i.e., a primer comprising at least about 200 nucleotides), the method of Meinkoth and Wahl (In: Anal Biochem, 138: 267-284, 1984), is useful for determining the Tm of the primer. This method uses the formula:
81.5 + 16.6(log10M) + 0.41 (% GC) - 0.61 (% form) - 500 / Length in bp,
wherein M is the molarity of Na+ and % form is the percentage of formamide (set to
Ό .
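The Meinkoth and Wahl formula given above can be evaluated directly. In the following sketch the function name and the example inputs (50 mM Na+, 50% GC, no formamide, 200 bp) are hypothetical values chosen only to show the arithmetic.

```python
import math


def meinkoth_wahl_tm(
    na_molar: float,
    gc_percent: float,
    formamide_percent: float,
    length_bp: int,
) -> float:
    """Tm = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(%form) - 500/length (degrees C)."""
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * gc_percent
        - 0.61 * formamide_percent
        - 500.0 / length_bp
    )


example_tm = meinkoth_wahl_tm(na_molar=0.05, gc_percent=50.0,
                              formamide_percent=0.0, length_bp=200)
```

For these inputs the terms are 81.5 - 21.6 + 20.5 - 0 - 2.5, giving approximately 77.9°C.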
For a primer that comprises or consists of PNA the Tm is determined using the formula (described by Giesen et al, Nucl. Acids Res., 26: 5004-5006):
$$T_{m,\mathrm{pred}} = c_0 + c_1 \cdot T_{m,\mathrm{nnDNA}} + c_2 \cdot f_{\mathrm{pyr}} + c_3 \cdot \mathrm{length}$$
wherein Tm,nnDNA is the melting temperature as calculated using a nearest neighbor model for the corresponding DNA/DNA duplex applying ΔH° and ΔS° values as described by SantaLucia et al., Biochemistry, 35: 3555-3562, 1995; fpyr denotes the fractional pyrimidine content; and length is the PNA sequence length in bases. The constants are c0 = 20.79, c1 = 0.83, c2 = -26.13 and c3 = 0.44.
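The PNA formula above is a linear correction applied to a DNA/DNA nearest neighbor estimate, and may be sketched as follows using the stated constants. The function name is hypothetical, and the DNA/DNA Tm is taken as an input computed elsewhere.

```python
# Constants as stated in the text above.
C0, C1, C2, C3 = 20.79, 0.83, -26.13, 0.44


def pna_tm_pred(tm_nn_dna_c: float, pna_sequence: str) -> float:
    """Predicted PNA/DNA Tm from the DNA/DNA nearest neighbor Tm (degrees C),
    the fractional pyrimidine content and the PNA length in bases."""
    seq = pna_sequence.upper()
    f_pyr = (seq.count("C") + seq.count("T")) / len(seq)
    return C0 + C1 * tm_nn_dna_c + C2 * f_pyr + C3 * len(seq)
```

For a hypothetical 10-base PNA with 50% pyrimidines and a DNA/DNA estimate of 50°C, the prediction is 20.79 + 41.5 - 13.065 + 4.4 = 53.625°C.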
To determine the Tm of a primer comprising one or more LNA residues a modified form of the formula of SantaLucia et al Biochemistry, 35: 3555-3562, 1995 may be used:
ΔΗ Tm
ΛS+/n([Naf ^C/4f)
A suitable program for determining the Tm of a primer comprising LNA is available from, for example, Exiqon, Vedbaek, Denmark.
A primers or primer sequence that is predicted to be or shown to be capable of selectively annealing to a target nucleic acid is also optionally analyzed for one or more additional characteristics that make it suitable for use as a primer in the method of the invention. For example, a primer is analyzed to ensure that it is unlikely to form secondary structures (i.e., the primer does not comprise regions of self- complementarity) .
Furthermore, should the primer be proposed to be used in a reaction with one or more other primers (e.g., a PCR reaction and/or a multiplex reaction) all primers may be assessed to determine their ability to anneal to one another and form "primer dimers". Methods for determining a primer that is capable of self-dimerization and/or primer dimer formation are known in the art and/or described supra.
Methods for designing and/or selecting a primer suitable for use in an amplification reaction are known in the art and described, for example, in Innis and Gelfand (1990) (In: Optimization of PCRs. pp. 3-12 in: PCR Protocols (Innis, Gelfand, Sninsky and White, eds.); Academic Press, New York) and Dieffenbach and Dveksler (Eds) (In: PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratories, NY, 1995). Such methods are particularly suited, for example, for designing a locus specific sequence of a primer of the invention.
Generally, it is recommended that a primer satisfies the following criteria:
(i) the primer comprises a region that anneals to a target nucleic acid and is about 17-28 bases in length;
(ii) the primer comprises about 50-60% (G+C);
(iii) the 3'-terminus of the primer is a G or C, or CG or GC (this prevents "breathing" of ends and increases efficiency of initiation of amplification);
(iv) preferably, the primer has a Tm between about 55°C and about 80°C;
(v) the primer does not comprise three or more contiguous Cs and/or Gs at its 3'-end (as this may promote mispriming at G or C-rich sequences due to the stability of annealing);
(vi) the 3'-end of the primer is not complementary to another primer in the reaction; and
(vii) the primer does not comprise a region of self-complementarity.
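The criteria listed above lend themselves to a simple automated screen. The following Python sketch is one possible illustration and is not the method used by any program named herein. The function names, the 4-base window used to approximate criteria (vi) and (vii), and the exact-match test for complementarity are all assumptions made for the example; the Tm is supplied by the caller.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def check_primer(primer, tm_celsius, other_primers=(), window=4):
    """Evaluate a primer against criteria (i)-(vii) and report each result.

    window is a hypothetical parameter: the length of exact reverse-complement
    match treated here as "complementary". Real tools use scoring models.
    """
    p = primer.upper()
    gc_percent = 100.0 * sum(b in "GC" for b in p) / len(p)
    kmers = {p[i:i + window] for i in range(len(p) - window + 1)}
    # (vii) any window-mer whose reverse complement also occurs in the primer.
    self_comp = any(reverse_complement(k) in kmers for k in kmers)
    # (vi) reverse complement of the 3'-end found in another primer.
    tail_rc = reverse_complement(p[-window:])
    cross = any(tail_rc in other.upper() for other in other_primers)
    return {
        "length_17_28": 17 <= len(p) <= 28,
        "gc_50_60": 50.0 <= gc_percent <= 60.0,
        "gc_at_3prime": p[-1] in "GC",
        "tm_55_80": 55.0 <= tm_celsius <= 80.0,
        "no_gc_run_at_3prime": not all(b in "GC" for b in p[-3:]),
        "no_3prime_cross_complementarity": not cross,
        "no_self_complementarity": not self_comp,
    }


# Example: a repetitive 20-mer that fails only the self-complementarity check.
report = check_primer("ATGCATGCATGCATGCATGC", tm_celsius=60.0)
```

Returning a per-criterion report rather than a single pass/fail value lets a user decide which recommendations to relax, which is consistent with the criteria being stated as general guidance.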
Several software programs are available that enable the design of one or more primers, or a region of a primer (e.g., a locus specific sequence of a first primer of the invention).
For example, a suitable program is selected from the group consisting of:
(i) Primer3, available from the Center for Genome Research, Cambridge, MA, USA, which designs one or more primers for use in an amplification reaction based upon a known template sequence;
(ii) Primer Premier 5, available from Biosoft International, Palo Alto, CA, USA, which designs and/or analyzes primers;
(iii) CODEHOP, available from Fred Hutchinson Cancer Research Centre, Seattle, Washington, USA, which designs primers based on multiple protein alignments; and
(iv) FastPCR, available from Institute of Biotechnology, University of Helsinki, Finland, which designs multiple primers, including primers for use in a multiplex reaction, based on one or more known sequences.
When designing a primer, the composition of the target nucleic acid is considered (i.e. the nucleotide sequence) as is the type of amplification reaction to be used.
Tag regions
As is discussed supra, in one preferred form of the invention a target nucleic acid is amplified with a primer comprising a region that anneals to the target nucleic acid and a tag region that provides an annealing site for a distinct labeled primer that facilitates sequencing.
A tag region that is unable to anneal to the template nucleic acid is selected to ensure that it does not cause non-specific annealing of the first primer in the first amplification reaction and the amplification of non-template nucleic acid. Preferably, the tag region is unable to anneal to a nucleic acid in a sample being assayed to such a degree as to amplify nucleic acid to a detectable level (i.e. background amplification).
As will be apparent to the skilled artisan, the requirement that the tag region not anneal to a template nucleic acid does not require that the tag region not anneal under any conditions. Rather, it is preferred that the tag region is not capable of annealing to the template nucleic acid under conditions sufficient for annealing of the region of the primer that anneals to the target nucleic acid. For example, the tag region may anneal to the target nucleic acid under low stringency conditions.
In one embodiment, it is preferred that the tag comprises a sequence of nucleotides that does not naturally occur in a sample being assayed. Methods for determining a sequence that is not present in a sample being assayed will be apparent to the skilled artisan. For example, the nucleotide sequence of the tag region is analyzed using a program, such as, for example, BLAST to determine whether or not that sequence (or its complement) occurs naturally in a subject being assayed.
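As a much simplified stand-in for a BLAST search, the following Python sketch checks whether a candidate tag, or its reverse complement, occurs exactly within a set of reference sequences. Exact substring matching is an assumption made for brevity; it does not detect the partial or mismatched similarity that BLAST reports, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tag_occurs(tag, reference_sequences):
    """Return True if the tag or its reverse complement is found verbatim
    in any reference sequence (exact match only, unlike BLAST)."""
    forward = tag.upper()
    reverse = reverse_complement(forward)
    for ref in reference_sequences:
        ref = ref.upper()
        if forward in ref or reverse in ref:
            return True
    return False


# Example with made-up sequences: the second reference carries the
# reverse complement of the tag, so the tag would be rejected.
found = tag_occurs("GATTACA", ["AAAAAAAA", "CCCTGTAATCCC"])
```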
Alternatively, or in addition, a nucleotide sequence is selected from a subject different to that from which a sample being assayed is derived. For example, should the sample being assayed be derived from a mammal or a plant, the tag is derived from, for example, an unrelated mammal or plant or a virus or a bacterium or a fungus that is not a pathogen of the mammal or plant. In one embodiment, a tag sequence is selected from a bacterial phage gene, e.g., the tag comprises a sequence from M13 phage GTAAACGACGGCCAGT (SEQ ID NO: 12) or a sequence from T7 phage TAATACGACTCACTATAGGG (SEQ ID NO: 13). Such a tag is useful as, for example, a tag sequence for a primer used to amplify a sequence from a polyploid plant.
Alternatively, an artificial sequence is used for a tag. For example, a tag sequence described by Heath et al., Med Genet 57:272-280, 2000 is used (i.e., a tag sequence comprises a nucleotide sequence selected from the group consisting of:
(i) TCCGTCTTAGCTGAGTGGCGTA (SEQ ID NO: 14);
(ii) AGGCAGAATCGACTCACCGCTA (SEQ ID NO: 15);
(iii) TCCGTCTTAGCTGAGTGGCGTA (SEQ ID NO: 16);
(iv) AGGCAGAATCGACTCACCGCTA (SEQ ID NO: 17);
(v) GCTAAATCGGACTAGCTACC (SEQ ID NO: 18); and
(vi) TAATCCAGCTACGCTGCATC (SEQ ID NO: 19)).
In a further embodiment, a zip-code sequence is used as a tag sequence. For example, a tag sequence comprises the nucleotide sequence GGAGCACGCTATCCCGTTAGAC (SEQ ID NO: 20) or CGCTGCCAACTACCGCACATG (SEQ ID NO: 21) or CCTCGTGCGAGGCGTATTCTG (SEQ ID NO: 22) or GCGACCTGACTTGCCGAAGAAC (SEQ ID NO: 23). A zip-code sequence is generally a sequence of nucleotides that has been produced synthetically and is predicted not to occur in a nucleic acid derived from a specific subject.
In another embodiment, the tag region comprises a nucleotide sequence CACGACGTTGTAAAACGAC (SEQ ID NO: 24) or GTACATTAAGTTCCCATTAC (SEQ ID NO: 25) as described in the applicant's International Patent Application No. PCT/AU2006/000318.
Preferably a tag sequence comprises a sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2 or the complement thereof.
In one embodiment, the tag comprises or consists of the sequence of the primer used to sequence a target nucleic acid.
As will be apparent to the skilled artisan, the tag sequence need not comprise only a region that provides a suitable template for annealing of a primer to an amplicon to thereby permit dideoxynucleotide-mediated sequencing. For example, the tag comprises additional sequence to facilitate binding of a polymerase to enable sequencing of amplified nucleic acid.
Alternatively, or in addition, the tag comprises a spacer sequence at the 5'-end such that it is positioned at the 3'-end of the region that anneals to a target nucleic acid. Preferably, such a spacer region is rich in adenosine and/or thymine rather than cytosine and/or guanine. This is because a spacer region rich in cytosine and/or guanine increases the Tm of the first primer more than a spacer region rich in adenosine and/or thymine. Accordingly, a GC-rich spacer may cause background by non-specific amplification of nucleic acids.
Primer synthesis
Following primer design and/or analysis, the primer is produced and/or synthesized. Methods for producing/synthesizing a primer are known in the art. For example, oligonucleotide synthesis is described in Gait (Ed) (In: Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford, 1984). For example, a probe or primer may be obtained by biological synthesis (e.g. by digestion of a nucleic acid with a restriction endonuclease) or by chemical synthesis. For short sequences (up to about 100 nucleotides) chemical synthesis is preferable.
In one embodiment, a primer comprising deoxynucleotides (e.g., a DNA-based oligonucleotide) is produced using standard solid-phase phosphoramidite chemistry. Essentially, this method uses protected nucleoside phosphoramidites to produce an oligonucleotide of up to about 80 nucleotides. Typically, an initial 5'-protected nucleoside is attached to a polymer resin by its 3'-hydroxy group. The 5'-hydroxyl group is then de-protected and the subsequent nucleoside-3'-phosphoramidite in the sequence is coupled to the de-protected group. An internucleotide bond is then formed by oxidizing the linked nucleosides to form a phosphotriester. By repeating the steps of de-protection, coupling and oxidation an oligonucleotide of desired length and sequence is obtained. Suitable methods of oligonucleotide synthesis are described, for example, in Caruthers, M. H., et al., "Methods in Enzymology," Vol. 154, pp. 287-314 (1988).
Other methods for oligonucleotide synthesis include, for example, phosphotriester and phosphodiester methods (Narang et al., Meth. Enzymol. 68: 90, 1979) and synthesis on a support (Beaucage et al., Tetrahedron Letters 22: 1859-1862, 1981), and others described in "Synthesis and Applications of DNA and RNA," S. A. Narang, editor, Academic Press, New York, 1987, and the references contained therein.
For longer sequences standard replication methods employed in molecular biology are useful, such as, for example, the use of M13 for single stranded DNA as described by J. Messing (1983) Methods Enzymol., 101, 20-78.
Alternatively, a plurality of primers are produced using standard techniques, each primer comprising a portion of a desired primer and a region that allows for annealing to another primer. The primers are then used in an overlap extension method that comprises allowing the primers to anneal and synthesizing copies of a complete primer using a polymerase. Such a method is described, for example, by Stemmer et al, Gene 164, 49-53, 1995.
As discussed supra, a primer can also include one or more nucleic acid analogs. For example, a primer comprises a phosphate ester analog and/or a pentose sugar analog. Alternatively, or in addition, a primer of the invention comprises a polynucleotide in which the phosphate ester and/or sugar phosphate ester linkages are replaced with other types of linkages, such as N-(2-aminoethyl)-glycine amides and other amides (see, e.g., Nielsen et al., Science 254: 1497-1500, 1991; WO 92/20702; and USSN 5,719,262); morpholinos (see, for example, USSN 5,698,685); carbamates (for example, as described in Stirchak & Summerton, J. Org. Chem. 52: 4202, 1987); methylene(methylimino) (as described, for example, in Vasseur et al., J. Am. Chem. Soc. 114: 4006, 1992); 3'-thioformacetals (see, for example, Jones et al., J. Org. Chem. 58: 2983, 1993); sulfamates (as described, for example, in USSN 5,470,967); 2-aminoethylglycine, commonly referred to as PNA (see, for example, WO 92/20702). Phosphate ester analogs include, but are not limited to, (i) C1-C4 alkylphosphonate, e.g. methylphosphonate; (ii) phosphoramidate; (iii) C1-C6 alkyl-phosphotriester; (iv) phosphorothioate; and (v) phosphorodithioate. Methods for the production of a primer comprising such a modified nucleotide or nucleotide linkage are known in the art and discussed in the documents referred to supra.
For example, a primer of the invention comprises one or more LNA and/or PNA residues. Primers comprising one or more LNA or PNA residues have been previously shown to anneal to nucleic acid template at a higher temperature than a primer that comprises substantially the same sequence but does not comprise the LNA or PNA residues.
Methods for the synthesis of an oligonucleotide comprising LNA are described, for example, in Nielsen et al., J. Chem. Soc. Perkin Trans., 1: 3423, 1997; Singh and Wengel, Chem. Commun. 1247, 1998. Methods for the synthesis of an oligonucleotide comprising PNA are described, for example, in Egholm et al., J. Am. Chem. Soc., 114: 1895, 1992; Egholm et al., Nature, 365: 566, 1993; and Oram et al., Nucl. Acids Res., 21: 5332, 1993.
As described herein, a primer used in a sequencing reaction preferably comprises a detectable marker (for example, a fluorescent dye) to enable detection of fragments to thereby determine the position of a nucleotide in a sequence. Accordingly, in one embodiment, a primer comprises or is conjugated to a detectable marker. As used herein, the term "detectable marker" refers to any moiety which can be attached to a primer and: (i) provides a detectable signal; and/or (ii) interacts with a second detectable marker to modify the detectable signal provided by the second detectable marker, e.g. FRET (Fluorescence Resonance Energy Transfer); and/or (iii) stabilizes annealing, e.g., duplex formation; and/or (iv) provides a member of a binding complex or affinity set, e.g., affinity, antibody/antigen, ionic complexation, hapten/ligand, e.g. biotin/avidin.
Labeling of a primer is accomplished using any one of a large number of known techniques employing known detectable markers, linkages, linking groups, reagents, reaction conditions, and analysis and purification methods. Detectable markers include, but are not limited to, light-emitting or light-absorbing compounds which generate or quench a detectable fluorescent, chemiluminescent, or bioluminescent signal (for example, as described in Kricka, L. in Nonisotopic DNA Probe Techniques (1992), Academic Press, San Diego, pp. 3-28). Fluorescent reporter dyes useful for labeling biomolecules include, but are not limited to, fluoresceins (see, for example, USSN 5,188,934; USSN 6,008,379; or USSN 6,020,481), rhodamines (as described, for example, in USSN 5,366,860; USSN 5,847,162; USSN 5,936,087; or USSN 6,051,719), benzophenoxazines (for example, as described in US Pat. No. 6,140,500), energy-transfer fluorescent dyes, comprising pairs of donors and acceptors (as described in USSN 5,863,727; USSN 5,800,996; or USSN 5,945,526), or a cyanine (as described, for example, in WO 97/45539). Exemplary fluorescein dyes include, but are not limited to, 6-carboxyfluorescein; 2',4',1,4-tetrachlorofluorescein; and 2',4',5',7',1,4-hexachlorofluorescein. Detectable markers also include, but are not limited to, semiconductor nanocrystals, or Quantum Dots® (as described, for example, in US Pat. No. 5,990,479 or US Pat. No. 6,207,392). Suitable methods for linking a detectable marker to a primer (or labeling a primer) are also described in the references supra.
Alternatively, or in addition, a primer is produced with a fluorescent nucleotide analog to facilitate detection. For example, coupling allylamine-dUTP to the succinimidyl-ester derivatives of a fluorescent dye or a hapten (such as biotin or digoxigenin) enables preparation of many common fluorescent nucleotides. Such a method is described in, for example, Henegariu, Nat. Biotechnol. 75:345-348, 2000. Other fluorescent nucleotide analogs are also known in the art and described, for example, in Jameson, Methods Enzymol. 278:363-390, 1997 or USSN 6,268,132. Such nucleotide analogs are incorporated into nucleic acids, e.g., DNA and/or RNA, or oligonucleotides, via either enzymatic or chemical synthesis (e.g., a method described supra).
In one preferred example of the present invention, a primer is labeled with a fluorescent dye, such as, for example, 6-carboxyfluorescein (FAM), VIC, NED or PET. To label a primer with a fluorescent dye a simple two-step process is used. In the first step, an amine-modified nucleotide, 5-(3-aminoallyl)-dUTP, is incorporated into DNA using conventional enzymatic labeling methods. This step ensures relatively uniform labeling of the probe with primary amine groups. In the second step, the amine-modified DNA is chemically labeled using an amine-reactive fluorescent dye. Various commercial kits for labeling a primer are known in the art and available from, for example, Molecular Probes (Invitrogen Detection Technologies) (Eugene, OR, USA) or Applied Biosystems (Foster City, CA, USA).
Commercial sources for the production of a labeled probe or primer or for a suitable label will be known to the skilled artisan, e.g., Sigma-Genosys, Sydney, Australia or Applied Biosystems, Foster City, CA, USA.
Using any method for oligonucleotide synthesis described herein and/or known in the art a set of first primers and/or a second primer or set thereof is synthesized.
In another example, a primer comprising a tag region is produced by coupling an oligonucleotide comprising a tag region to an oligonucleotide comprising an allele-specific region. For example, an oligonucleotide comprising a tag region is linked to another oligonucleotide using an RNA ligase, such as, for example, T4 RNA ligase (as available from New England Biolabs). An RNA ligase catalyzes ligation of a 5' phosphoryl-terminated nucleic acid donor to a 3' hydroxyl-terminated nucleic acid acceptor through the formation of a 3'-5' phosphodiester bond, with hydrolysis of ATP to AMP and PPi. Suitable methods for the ligation of DNA and/or RNA molecules using an RNA ligase are known in the art and/or described in Ausubel et al (In: Current Protocols in Molecular Biology. Wiley Interscience, ISBN 047 150338, 1987) and Sambrook et al (In: Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, New York, Third Edition 2001).
Sequencing nucleic acids
Methods for sequencing a nucleic acid will be apparent to the skilled artisan and/or are described in Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, New York, Third Edition (2001), whole of Vols I, II, and III; and/or Ausubel et al., Current Protocols in Molecular Biology. Wiley Interscience, ISBN 047 150338, 1987, whole of text.
Preferably, a method for sequencing, e.g., to determine one or a plurality of position(s) of a specific nucleotide in a target nucleic acid from genomes of a polyploid subject, comprises performing a dideoxynucleotide-mediated sequencing reaction with one dideoxynucleotide, i.e., either ddATP or ddTTP or ddGTP or ddCTP.
Dideoxynucleotide-mediated sequencing methods are known in the art. For example, one such method is also known as Sanger sequencing. In one example, a Sanger sequencing method comprises amplifying a target nucleic acid, e.g., using PCR and sequencing the PCR amplicon. Sequencing involves annealing a primer to the amplicon and extending the sequence of nucleotides from the primer using a DNA polymerase, an enzyme that replicates DNA. Included with the primer and DNA polymerase are the four naturally-occurring deoxynucleotide bases, along with a low concentration of a chain terminating nucleotide (most commonly a dideoxynucleotide). Limited incorporation of the chain terminating nucleotide by the DNA polymerase results in a series of related DNA fragments that are terminated only at positions where that particular nucleotide is used. The dideoxynucleotides are added in limited quantities. In one example, the dideoxynucleotides have a detectable label, e.g., a fluorescent label. Preferably, the primer comprises the detectable label. As the DNA strand is elongated the DNA polymerase catalyzes the production of a sequence of deoxynucleotides. However, if a dideoxynucleotide is joined to a base, then that fragment of DNA can no longer be elongated since a dideoxynucleotide lacks a crucial 3'-OH group. Fragments of nucleic acid are then generated that are terminated by a ddNTP at a position corresponding to the position of each corresponding dNTP in the target nucleic acid. The fragments are then size-separated by electrophoresis in a slab polyacrylamide gel, or in a narrow glass tube (capillary) filled with a viscous polymer (e.g., Pop 7; Applied Biosystems, USA). 
Detection of the size of labeled nucleic acid fragments then facilitates or permits determination of one or a plurality of position(s) of the specific nucleotide corresponding to the ddNTP used in the sequencing reaction in a target nucleic acid, i.e., relative to the position of the first nucleotide of the primer.
In a preferred example, each position of a nucleotide in a target nucleic acid is determined by performing a method comprising:
(i) performing an amplification reaction, e.g., a polymerase chain reaction (PCR) with a primer comprising a locus-specific region capable of annealing to a target nucleic acid and a tag-sequence that does not anneal to the target nucleic acid to thereby amplify the target nucleic acid, wherein amplicons of the target nucleic acid comprise the tag sequence;
(ii) performing a sequencing reaction with a primer capable of annealing to the tag sequence in the presence of one ddNTP; and
(iii) detecting the molecular weight of the nucleic acid fragments produced at (ii), wherein the molecular weight of each fragment corresponds to the position of the nucleotide corresponding to the ddNTP used in the sequencing reaction in the target nucleic acid. Preferably, the primer capable of annealing to the tag sequence is labeled with a detectable marker, e.g., a fluorescent marker to facilitate detection of the nucleic acid fragments.
In accordance with this embodiment, an unlabeled ddNTP is used in the sequencing reaction, so it is not necessary to use ddNTPs labeled with a detectable marker. The method can be performed to determine the position of any dNTP in a target nucleic acid, or may be performed several times, each with a different ddNTP, to determine the position of some or all dNTPs in the target nucleic acid. Such a method results in a cost saving insofar as it does not require multiple different labeled reagents.
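The relationship between fragment size and nucleotide position in the preferred example can be illustrated with the following minimal Python sketch. It assumes a single template, fragment sizes expressed in whole nucleotides counted from the first nucleotide of the labeled primer, and positions referring to the strand synthesized by the polymerase; the function names are hypothetical. In a polyploid sample the fragments from several genomes would be superimposed, which this sketch does not model.

```python
def expected_fragment_lengths(extended_strand, primer_length, ddntp_base):
    """Lengths (in nucleotides, counted from the first primer nucleotide)
    of fragments terminated at each occurrence of ddntp_base downstream
    of the primer in the synthesized strand."""
    strand = extended_strand.upper()
    base = ddntp_base.upper()
    return [i + 1 for i in range(primer_length, len(strand)) if strand[i] == base]


def rebuild_sequence(lengths_by_base, primer_length):
    """Combine single-ddNTP reactions into the sequence downstream of the
    primer. Positions with no fragment in any reaction are reported as N."""
    position_to_base = {}
    for base, lengths in lengths_by_base.items():
        for length in lengths:
            position_to_base[length] = base
    if not position_to_base:
        return ""
    last = max(position_to_base)
    return "".join(
        position_to_base.get(pos, "N")
        for pos in range(primer_length + 1, last + 1)
    )


# Made-up example: a 4-nucleotide primer followed by 10 synthesized bases.
strand = "ACGTACGTTAGCTA"
runs = {b: expected_fragment_lengths(strand, 4, b) for b in "ACGT"}
rebuilt = rebuild_sequence(runs, 4)
```

A single reaction (for example, only the "A" entry) already gives every position of that nucleotide, which is all that is needed when only one ddNTP is used.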
The skilled artisan will be aware of suitable conditions for performing a sequencing reaction. For example, methods of sequencing making use of a labeled primer are described, for example, in Smith et al., 1986, Nature 321:674-679 and US Pat. No. 5,075,216. The methods described in each of these documents are modified to make them suitable for performance in the method of the present invention by only incorporating a specific ddNTP in a sequencing reaction.
For example, reaction conditions include a reaction mixture comprising a buffer, e.g., comprising 50 mM KCl; 10 mM Tris-HCl, pH 8.4; 2.5 mM MgCl2; 200 μM of each dNTP; and 200 μg/mL of gelatine or 0.05% Tween 20, 0.05% NP40, 3 mM (or higher) MgCl2, in a buffer; 10 mM Tris-HCl is preferred, at pH 8.0 to 8.5. The reaction mixture also contains primer, template, a suitable polymerase, e.g., Taq polymerase, and a ddNTP. In this respect, ddNTPs are preferably included in a sequencing reaction at a ratio to dNTP of about dGTP:ddGTP (1:6), dATP:ddATP (1:32), TTP:ddTTP (1:48), and dCTP:ddCTP (1:16).
A sequencing reaction is then performed by incubating the reaction mixture for a time and under conditions sufficient to denature a double stranded nucleic acid (e.g., at about 95°C for at least about 30 seconds), then incubating the reaction mixture for a time and under conditions sufficient for a primer to anneal to a target nucleic acid (e.g., about 65°C-75°C for about 30 seconds), then incubating the reaction mixture for a time and under conditions sufficient for a polymerase to extend the primer and preferably incorporate a ddNTP into a nucleic acid (e.g., about 70°C-75°C for about 30 seconds to about 60 seconds).
As discussed supra, the resulting nucleic acid fragments are then resolved, e.g., using electrophoresis and the size of the fragments determined, e.g., by detecting a label using suitable means, e.g., in the case of a fluorescent label an electrophoresis gel or capillary is exposed to a laser emitting light at a suitable wavelength to stimulate the label and emission of light from the label is detected. The position of the light emission in the gel or capillary is indicative of the size of a nucleic acid fragment.
Preferably, one or a plurality of position(s) of a specific nucleotide in a target nucleic acid from one polyploid subject is compared to one or a plurality of position(s) of the specific nucleotide in a target nucleic acid from another polyploid subject. In one example, this comparison comprises aligning the sequence information, e.g., one or a plurality of position(s) of a specific nucleotide in the target nucleic acid from each subject. Such alignment may be performed by an individual, e.g. by eye, whereby the individual identifies patterns in the sequences or positions of the nucleotides. In another example, the sequence information or position of each nucleotide is determined using an algorithm, e.g., a computer implemented algorithm that recognizes patterns, e.g., an anti-correlation algorithm. Alignment of sequence information then permits differences between the sequences, e.g., as a result of a presence of a variable nucleotide, to be determined.
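A very simple form of the comparison described above is sketched below in Python. It is not the anti-correlation algorithm referred to in the text; it merely compares, as sets, the positions at which fragments terminated by a given ddNTP were detected in two subjects. The function name and the example positions are hypothetical.

```python
def compare_positions(positions_first, positions_second):
    """Compare positions of one specific nucleotide detected in two subjects.

    Positions present in only one subject are candidate sites of a
    variable nucleotide; shared positions are common to both subjects.
    """
    first, second = set(positions_first), set(positions_second)
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Made-up fragment positions for the same ddNTP reaction in two subjects.
result = compare_positions([25, 31, 47, 60], [25, 31, 52, 60])
```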
Nucleic acid amplification
Based on the foregoing description, the skilled artisan will also be aware that some embodiments of the present invention also involve amplifying a target nucleic acid. The skilled artisan will also be aware of suitable methods for amplifying a nucleic acid. For example, a target nucleic acid is amplified by polymerase chain reaction (PCR). Methods of PCR are known in the art and described, for example, in Dieffenbach (Ed) and Dveksler (Ed) (In: PCR Primer: A Laboratory Manual, Cold Spring Harbour Laboratories, NY, 1995). Generally, for PCR two non-complementary nucleic acid primers comprising at least about 20 nucleotides, and more preferably at least 30 nucleotides are hybridized to different strands of a target nucleic acid, and specific nucleic acid copies of the target are amplified enzymatically.
Reagents required for a PCR are known in the art and include for example, one or more primers (described herein), a suitable polymerase, deoxynucleotides and/or ribonucleotides, and a buffer. Suitable reagents are described for example, in Dieffenbach (ed) and Dveksler (ed) (In: PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratories, NY, 1995).
For example, a suitable polymerase for use in the method of the invention includes a DNA polymerase, an RNA polymerase, a reverse transcriptase, a T7 polymerase, an SP6 polymerase, a T3 polymerase, Sequenase™, a Klenow fragment, a Taq polymerase, a Taq polymerase derivative, a Taq polymerase variant, a Pfu polymerase, a Pfx polymerase, a Tfi polymerase, an AmpliTaq™ FS polymerase, a thermostable DNA polymerase with minimal or no 3'-5' exonuclease activity, or an enzymatically active variant or fragment of any of the above polymerases. Preferably, a polymerase used in the method of the invention is a thermostable polymerase.
In one example, a mixture of two or more polymerases is used. For example, the mixture of a Pfx or Pfu polymerase and a Taq polymerase has been previously shown to be useful for amplifying templates comprising a high GC content or for amplifying a large target nucleic acid.
Suitable commercial sources for a polymerase useful for the performance of the invention will be apparent to the skilled artisan and include, for example, Stratagene (La Jolla, CA, USA), Promega (Madison, WI, USA), Invitrogen (Carlsbad, CA, USA), Applied Biosystems (Foster City, CA, USA) and New England Biolabs (Beverly, MA, USA).
Other forms of amplification are also encompassed by the present invention. For example, strand displacement amplification (SDA) utilizes oligonucleotides, a DNA polymerase and a restriction endonuclease to amplify a target sequence. The oligonucleotides are hybridized to a target nucleic acid and the polymerase is used to produce a copy of this region. The duplexes of copied nucleic acid and target nucleic acid are then nicked with an endonuclease that specifically recognizes a sequence of nucleotides at the beginning of the copied nucleic acid. The DNA polymerase recognizes the nicked DNA and produces another copy of the target region, at the same time displacing the previously generated nucleic acid. The advantage of SDA is that it occurs in an isothermal format, thereby facilitating high-throughput automated amplification.
Following amplification, the target nucleic acid is then used in a sequencing reaction described herein according to any embodiment. For example, the amplification product is isolated, e.g., by gel electrophoresis, or is separated from unincorporated dNTPs and/or other components of an amplification reaction, e.g., by size exclusion chromatography prior to sequencing. Preferably, the sequencing reaction is performed without subcloning an amplicon, i.e., a product of an amplification reaction.
Determining linkage to a genome of a polyploid subject
As discussed herein above, a method for determining linkage to a genome of a polyploid subject preferably makes use of a polyploid subject in which at least a target nucleic acid in one genome of the subject is deleted. The skilled artisan will be aware of suitable mutant polyploid subjects.
For example, as exemplified herein the present inventors have made use of an aneuploid subject. In one example, the aneuploid or mutant polyploid subject is a nullisomic subject, e.g., a nullisomic-tetrasomic aneuploid of a hexaploid subject. Such a subject lacks one genome, i.e., one chromosome set. Suitable nullisomic polyploid subjects are known in the art and described, for example, in Sears 1966, In: Chromosome Manipulations and Plant Genetics, Riley and Lewis (Eds), Oliver and Boyd, Edinburgh.
In another example, an aneuploid subject or mutant polyploid subject is a ditelosomic subject. In this respect, a ditelosomic subject has two copies of one arm of a chromosome and lacks the other arm of that chromosome. A suitable ditelosomic subject is described, for example, in Sears and Sears Proc. 5th Int. Wheat Genetics Symposium.
In a further example, an aneuploid subject or a mutant polyploid subject is a monotelodisomic subject. Monotelodisomic subjects have at least one complete copy of a chromosome and another copy of a chromosome lacking one arm.
In a further example, a mutant polyploid subject comprises a mutation in one copy of a chromosome, wherein a region of the chromosome is deleted. In this respect, a collection of lines of wheat comprising overlapping deletions has been produced and is described, for example, in Endo and Gill, J. Hered., 87: 295-307, 1996. Methods for producing additional deletion lines will be apparent to the skilled artisan and/or are described, for example, in Endo and Gill, J. Hered., 87: 295-307, 1996.
Using one or more of the mutant polyploid subjects described herein, a variable nucleotide is mapped to a genome of the polyploid subject. For example, failure to detect the variable nucleotide in a mutant polyploid subject described herein above indicates that the variable nucleotide occurs within the genome comprising the mutation in the mutant subject. For example, the present inventors have used aneuploid stocks of wheat to determine the presence or absence of a variable nucleotide in a genome of hexaploid wheat.
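The mapping logic described above, in which failure to detect a variable nucleotide in a mutant line assigns it to the genome carrying the deletion, can be sketched in Python as follows. The line names, genome labels and function name are hypothetical and chosen only for illustration.

```python
def map_variable_nucleotide(detected_in_line, missing_genome_of_line):
    """Return the genome label(s) to which a variable nucleotide maps.

    detected_in_line       : line name -> True if the variable nucleotide
                             was detected in that mutant line.
    missing_genome_of_line : line name -> label of the genome (or region)
                             deleted in that mutant line.
    A line in which the nucleotide is NOT detected implicates the genome
    that the line lacks.
    """
    implicated = {
        missing_genome_of_line[line]
        for line, detected in detected_in_line.items()
        if not detected
    }
    return sorted(implicated)


# Made-up results for three lines of a hexaploid, each lacking one genome.
detections = {"lacks_A": True, "lacks_B": False, "lacks_D": True}
deleted = {"lacks_A": "A", "lacks_B": "B", "lacks_D": "D"}
mapped = map_variable_nucleotide(detections, deleted)
```

An empty result would indicate that the variable nucleotide was detected in every mutant line tested, so no genome could be assigned from those lines alone.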
Identification of mutations causative of a trait of interest
As will be apparent to the skilled artisan based on the description herein, the present invention is also useful for identifying a mutation responsible for or causative of a trait. In one preferred example, such a method comprises inducing or producing a mutation in a polyploid subject. Suitable methods for inducing or producing a mutation will be apparent to the skilled artisan. For example, the present invention is useful for detecting a mutation induced by TILLING. In the first step of the TILLING process, single base pair changes are induced in a population of plants by treating seeds (or pollen) with a chemical mutagen (e.g., ENU or EMS or MNU). For example, seeds or pollen of a polyploid subject are contacted with a mutagen for a time and under conditions sufficient to induce a mutation in at least one genome of said polyploid subject. For example, the seed or pollen is contacted with a chemical mutagen for a period of time from about 15 minutes to about 1 hour or about 30 minutes or about 45 minutes. Adult polyploid subjects are then grown from the seed and/or produced using the pollen and bred to produce a generation where mutations are stably inherited. Methods of TILLING are known in the art and/or described, for example, in Suzuki et al., Mol Genet Genomics, 279:213-223, 2007 or Slade et al., Nat Biotechnol. 25:75-81, 2005.
Following production of a plurality of mutant plants, the plants can be screened using, for example, a phenotypic assay to identify one or more plants having a trait of interest. Exemplary traits include, for example, resistance to a plant pathogen or toxin, resistance to drought, water stress or cold or an improved nutritional quality, e.g., wheat having an increased or reduced level of amylose relative to amylopectin for production of bread or noodles.
Following identification of a plant having a trait of interest, a method as described herein according to any embodiment is performed to identify the mutation in the plant, thereby identifying the mutation causative of the trait.
In another example, mutations in a specific target nucleic acid are selected prior to phenotypic screening. For example, a method as described herein according to any embodiment is performed to identify a polyploid subject comprising a mutation in a target nucleic acid. The identified subject(s) is(are) then screened to identify that subject (those subjects) having the desired trait. For example, plants produced using a TILLING method are screened to identify those plants having a mutation in a granule-bound starch synthase I gene (encoding GBSSI). Wheat flour from a plant having such a mutation is then assayed to determine the level of amylose relative to the level of amylopectin. For example, a plant having a reduced level of amylose relative to amylopectin is desirable for noodle making as it improves noodle texture. Other genes that may be involved in producing a trait of interest will be apparent to the skilled artisan and include, for example, a gene conferring pesticide resistance, e.g., a chitinase, γ-kafirin; wheatwin or WPR4; thionin; thaumatin or thaumatin-like protein such as zeamatin; or a gene encoding a protein involved in tolerance to desiccation, e.g., HVA1 or DREB1A.
Notwithstanding that the foregoing description is in respect of chemical-induced mutagenesis, the present invention clearly encompasses other forms of mutagenesis, such as, for example, radiation-induced mutagenesis.
The method described supra is clearly useful for producing a polyploid subject, e.g., a polyploid crop plant having a desired trait, e.g., resistance to a pest or to an abiotic stress or having an improved nutritional quality. The present invention also provides a polyploid subject, particularly, a polyploid crop plant produced by a method described herein according to any embodiment. For example, the present invention also provides a polyploid crop plant having a desirable trait produced by performing a method as described herein according to any embodiment.
Detection of polymorphisms/mutations
A polymorphism or mutation as identified using a method as described herein according to any embodiment is useful as a genetic marker, e.g., to identify a polyploid subject having a desired trait or for genetic mapping or for marker assisted breeding. Following identification of a polymorphism or mutation, and preferably a genetic marker using a method as described herein according to any embodiment, the mutation, polymorphism or genetic marker is detected using any method known in the art.
In one example, the mutation, polymorphism or genetic marker is detected by performing a method as described in the assignee's co-pending application USSN 60/973,928 filed in the United States Patent and Trademark Office on September 20, 2007 entitled "Method of amplifying nucleic acid". For example, a target nucleic acid on one genome of a polyploid subject and comprising the mutation, polymorphism or genetic marker is amplified in a first PCR amplification phase in which a set of first primers is used to selectively amplify the nucleic acid. A second phase amplification is performed using one or more second primers comprising (i) an allele specific region comprising a sequence complementary to the target nucleic acid adjacent to the polymorphism or mutation and that has a lower Tm than the first primers; and (ii) a tag region comprising a sequence that does not anneal to the template nucleic acid but increases the Tm of the second primer to about the Tm of the first primer. By reducing the annealing temperature in the second phase amplification, the allele specific region of the second primer(s) anneals to the amplification product of the first phase amplification, thereby permitting amplification with the second primer(s) and first primers. Following several amplification cycles, the sequence of the second primer(s) is incorporated into amplification products, thereby permitting the annealing temperature to be increased, and the entire second primer and the first primer to anneal to target sequences and prime amplification by PCR. By detecting one or more amplification products produced in this second phase of amplification a polymorphism or mutation is detected. For example, one or more nucleotide(s) positioned at the 3' end of the allele specific region is (are) complementary to an allele of the polymorphism or mutation. The 3' end of the second primer only anneals in the presence of that allele, and permits amplification by PCR.
Accordingly, by detecting the amplification product produced using this second primer and either another second primer or a first primer, an allele of a polymorphism or mutation is detected. On the other hand, failure to detect this amplification product indicates the presence of a different allele. Use of two or more second primers complementary to different alleles permits detection of different alleles. In this respect, the two or more second primers may be used in the same reaction if each primer is labeled so as to permit differentiation between amplification products produced by different primers, e.g., using tag regions having different molecular weights or different detectable markers. Alternatively, each second primer is used in a separate reaction.
Other methods for detecting a polymorphism or mutation are also clearly encompassed by the present invention. In one example, such a method involves amplifying a target nucleic acid from a genome of a polyploid subject comprising the mutation or polymorphism, e.g., using PCR. Methods for producing primers for selectively or preferentially amplifying one genome of a polyploid subject will be apparent to the skilled artisan and/or described herein. The polymorphism or mutation is then detected by any of a variety of methods.
For example, ligase chain reaction (described in, for example, EP 320,308 and US 4,883,750) uses two or more oligonucleotides that hybridize to adjacent target nucleic acids. A ligase enzyme is then used to link the oligonucleotides. In the presence of an allele of a polymorphism or mutation that is not complementary to the nucleotide at an end of one of the primers that is adjacent to the other primer, the ligase is unable to link the primers, thereby failing to produce a detectable amplification product. However, a ligation product is produced in the presence of an allele that is complementary to the end of the primer adjacent to the other primer when annealed to a target nucleic acid. Using thermocycling the ligated oligonucleotides then become a target for further oligonucleotides. The ligated fragments are then detected, for example, using electrophoresis, or MALDI-TOF. Alternatively, or in addition, one or more of the probes is labeled with a detectable marker, thereby facilitating rapid detection.
Cycling Probe Technology uses a chimeric synthetic probe that comprises DNA-RNA-DNA and that is capable of hybridizing to a target sequence. Upon hybridization to a target sequence the RNA-DNA duplex formed is a target for RNase H that cleaves the probe. The cleaved probe is then detected using, for example, electrophoresis or MALDI-TOF.
A polymorphism or mutation that introduces or alters a sequence that is a recognition sequence for a restriction endonuclease is detected by digesting DNA with the endonuclease and detecting the fragment of interest using, for example, Southern blotting (described in Ausubel et al (In: Current Protocols in Molecular Biology. Wiley Interscience, ISBN 047 150338, 1987) and Sambrook et al (In: Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, New York, Third Edition 2001)).
In another example, a target nucleic acid comprising a polymorphism or mutation is amplified, e.g., using PCR, and amplicons produced in the PCR are digested with a restriction endonuclease that cleaves an allele of the polymorphism or mutation. The resulting fragments are then detected by gel electrophoresis or capillary electrophoresis or mass spectrometry or Southern blotting.
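The restriction-based detection described above can be sketched as follows. This is a minimal illustration under stated assumptions: the two amplicon sequences are invented for the example, and EcoRI (recognition sequence GAATTC, cutting after the first G) is chosen only as a familiar enzyme; neither is taken from the specification.

```python
def digest(amplicon, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of site.

    cut_offset is the number of bases into the recognition sequence at
    which the enzyme cuts the top strand.
    """
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


# Hypothetical amplicons differing by one base inside the EcoRI site.
allele_1 = "ACGTGAATTCTTGCA"  # carries the recognition sequence
allele_2 = "ACGTGAGTTCTTGCA"  # single base change abolishes the site

print(digest(allele_1, "GAATTC", 1))  # [5, 10]
print(digest(allele_2, "GAATTC", 1))  # [15]
```

The allele that retains the recognition sequence yields two fragments, whereas the other allele remains uncut, so the fragment sizes distinguish the alleles.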
Alternatively, a polymorphism or mutation is detected using single stranded conformational polymorphism (SSCP) analysis. SSCP analysis relies upon the formation of secondary structures in nucleic acids and the sequence dependent nature of these secondary structures. In one form of this analysis an amplification method, preferably a PCR method, is used to selectively amplify a target nucleic acid from one genome of a polyploid subject that comprises a polymorphism or mutation. The amplified nucleic acids are then denatured, cooled and analyzed using, for example, non-denaturing polyacrylamide gel electrophoresis, mass spectrometry, or liquid chromatography (e.g. HPLC or dHPLC). Regions that comprise different sequences form different secondary structures, and as a consequence migrate at different rates through, for example, a gel and/or a charged field. Accordingly, both homozygous forms of a polymorphism or mutation and a heterozygous form of the polymorphism or mutation may be detected using such analysis. Clearly, a detectable marker may be incorporated into a probe/primer useful in SSCP analysis to facilitate rapid marker detection.
Allele specific PCR (as described, for example, in Liu et al, Genome Research, 7: 389-398, 1997) is also useful for determining the presence of one or other allele of a polymorphism or mutation. An oligonucleotide is designed in which the 3'-most base of the oligonucleotide hybridizes with the polymorphism or mutation. During PCR amplification, if the 3' end of the oligonucleotide does not hybridize to a target sequence, little or no PCR product is produced, indicating that a base other than that present in the oligonucleotide is present at the site of polymorphism or mutation in the sample. PCR products are then detected using, for example, gel or capillary electrophoresis or mass spectrometry. In another example, PCR products are detected using real-time PCR analysis. For example, a label that binds to double stranded nucleic acid and fluoresces is incorporated into a PCR, and the level of fluorescence is detected when the PCR is at a temperature permissive for double stranded nucleic acid formation.
Detection of increasing levels of fluorescence during PCR amplification indicates that PCR product is being produced and indicates the allele present at the site of a polymorphism or mutation. An exemplary real-time amplification method is described in US Pat. No. 6,174,670.
Primer extension methods (described, for example, in Dieffenbach (Ed) and Dveksler (Ed) (In: PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratories, NY, 1995)) are also useful for the detection of a polymorphism or mutation. An oligonucleotide is used that hybridizes to the region of a nucleic acid adjacent to the polymorphism or mutation. This oligonucleotide is then used in a primer extension protocol with a polymerase and a free nucleotide diphosphate or dideoxynucleotide triphosphate that corresponds to either or any of the possible nucleotides that occur at the polymorphism or mutation. Preferably the nucleotide-diphosphate is labeled with a detectable marker (e.g. a fluorophore). Following primer extension, unbound labeled nucleotide diphosphates are removed, e.g. using size exclusion chromatography or electrophoresis, or hydrolyzed, using for example, alkaline phosphatase, and the incorporation of the labeled nucleotide into the oligonucleotide is detected, indicating the nucleotide that is present at the site of the polymorphism or mutation.
The present invention also extends to high-throughput forms of primer extension analysis, such as, for example, minisequencing (Syvanen et al, Genomics 9: 341-342, 1995). In such a method, a probe or primer (or multiple probes or primers) is immobilized on a solid support (e.g. a glass slide). A sample comprising amplified target nucleic acid comprising a mutation or polymorphism is then brought into direct contact with the probe/s or primer/s, and a primer extension protocol performed with each of the free nucleotide bases labeled with a different detectable marker. The nucleotide present at a polymorphism or mutation is then determined by detecting the detectable marker bound to each probe and/or primer.
Fluorescently labeled locked nucleic acid (LNA) molecules or fluorescently labeled peptide nucleic acid (PNA) molecules are useful for the detection of polymorphisms or mutations (as described in Simeonov and Nikiforov, Nucleic Acids Research, 30(17): 1-5, 2002). LNA and PNA molecules bind, with high affinity, to nucleic acid, in particular, DNA. Fluorophores (in particular, rhodamine or hexachlorofluorescein) conjugated to the LNA or PNA probe fluoresce at a significantly greater level upon hybridization of the probe to target nucleic acid. However, the level of increase of fluorescence is not enhanced to the same level when even a single nucleotide mismatch occurs. Accordingly, the degree of fluorescence detected in a sample is indicative of the presence of a mismatch between the LNA or PNA probe and the target nucleic acid, such as, in the presence of a SNP. Preferably, fluorescently labeled LNA or PNA technology is used to detect a single base change in a nucleic acid that has been previously amplified using, for example, an amplification method known in the art and/or described herein.
As will be apparent to the skilled artisan, LNA or PNA detection technology is amenable to a high-throughput detection of one or more markers by immobilizing an LNA or PNA probe to a solid support, as described in Oram et al, Clin. Chem. 45: 1898-1905, 1999.
Molecular Beacons™ are also useful for detecting polymorphisms or mutations in an amplified product (see, for example, Mhlanga and Malmberg, Methods 25: 463-471, 2001). Molecular Beacons™ are single stranded nucleic acid molecules with a stem-and-loop structure. The loop structure is complementary to the region surrounding the SNP of interest. The stem structure is formed by annealing two "arms" complementary to each other that are on either side of the probe (loop). A fluorescent moiety is bound to one arm and the other arm comprises a quenching moiety that suppresses any detectable fluorescence when the molecular beacon is not bound to a target sequence. Upon binding of the loop region to its target nucleic acid the arms are separated and fluorescence is detectable. However, even a single base mismatch significantly alters the level of fluorescence detected in a sample. Accordingly, the presence or absence of a particular nucleotide at the site of a polymorphism or mutation is determined by the level of fluorescence detected.
Genetic mapping
As will be apparent to the skilled artisan the polymorphisms and/or mutations identified by a method described herein according to any embodiment are useful for genetic mapping, e.g., to identify a gene causative of a trait of interest. Accordingly, the present invention also provides a method for identifying a gene associated with or causative of a trait of interest, said method comprising:
(i) identifying a polymorphism or mutation by performing a method described herein according to any embodiment;
(ii) detecting a trait in a panel of polyploid subjects, wherein not all members of said panel comprise said polymorphism or mutation, and wherein variation in the trait between the members of said panel indicates that said polymorphism or mutation is associated with or causative of said trait; and
(iii) identifying a gene linked to or associated with said polymorphism or mutation thereby identifying a gene associated with or causative of said mutation.
The present invention also provides a method for identifying a gene associated with or causative of a trait in a polyploid subject, said method comprising:
(i) identifying a locus associated with genetic variation in a trait in a polyploid subject;
(ii) performing a method as described herein according to any embodiment to determine a plurality of polymorphisms or mutations linked to said locus;
(iii) detecting a trait in a panel of polyploid subjects, wherein not all members of said panel comprise at least one of said polymorphism or mutation, and wherein variation in the trait between the members of said panel indicates that said polymorphism or mutation is associated with or causative of said trait; and
(iv) identifying a gene linked to or associated with said polymorphism or mutation, thereby identifying a gene associated with or causative of said mutation.
In one example, each member of said panel comprises at least one of said polymorphisms or mutations.
Preferably, the polyploid subjects are plants.
Preferably, the panel of polyploid subjects is near-isogenic. In the present context, the term "near isogenic polyploid subjects" shall be taken to mean a population of polyploid subjects having identity over a substantial proportion of their genomes, notwithstanding the presence of sufficiently few differences to permit the contribution of a distinct allele or gene to the trait to be determined by a comparison of the trait phenotypes of the population. As will be known to the skilled artisan, recombinant inbred lines, lines produced by several generations of backcrossing, or siblings, are suitable near-isogenic lines for the present purpose.
To discover a marker/locus linkage, a segregating population is required. Experimental populations, such as, for example, an F2 generation, a backcross (BC) population, or recombinant inbred lines (RIL), can be used as a mapping population. Bulk segregant analysis, for the rapid detection of markers at specific genomic regions using segregating populations, is described by Michelmore et al, Proc. Natl Acad. Sci. (USA) 88, 9828-9832, 1991. In the case of F2 mapping populations, F2 plants are used to determine genotype, and F2 families to determine phenotype. Recombinant inbred lines are produced by single-seed descent.
As for statistical methods, Single Marker Analysis (Point Analysis) is used to detect a locus in the vicinity of a single genetic marker. The trait in a population of plants segregating for a particular marker is compared according to the marker class. Presence or absence or differences provides an estimate of the phenotypic effect of substituting one allele for another allele at the locus. To determine whether or not the inferred phenotypic effect is significantly different from zero, a simple statistical test, such as t-test or F-test, is used. A significant value indicates that a locus is located in the vicinity of the marker. Single point analysis does not require a complete molecular linkage map. The further the locus is from the marker, the less likely it is to be detected statistically, as a consequence of recombination between the marker and the gene.
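The comparison of trait values between marker classes can be sketched as follows. This is a minimal illustration, assuming two marker classes and using Welch's form of the t statistic as one possible "simple statistical test"; the trait values are invented, and the comparison of the statistic with a critical value is left to a statistical table or library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(class_a, class_b):
    """Return (effect estimate, t statistic) for two marker classes.

    class_a and class_b are lists of trait values for plants carrying
    each marker allele. The effect estimate is the difference in means.
    """
    diff = mean(class_a) - mean(class_b)
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(variance(class_a) / len(class_a) + variance(class_b) / len(class_b))
    return diff, diff / se


# Hypothetical trait values grouped by marker class.
effect, t = welch_t([10.0, 12.0, 14.0], [4.0, 6.0, 8.0])
print(effect, round(t, 3))  # 6.0 3.674
```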
In the regression approach, the association between marker genotype and phenotype is determined by a process comprising:
(i) assigning numeric codes to marker genotypes; and
(ii) determining the regression value r for the trait against the codes, wherein a significant value for r indicates that the marker is linked to the locus for the trait, and wherein the regression slope gives an estimate of the effect of a particular locus on the trait.
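Steps (i) and (ii) can be sketched as follows. This is a minimal illustration, assuming marker genotypes are coded 0, 1 and 2 (the number of copies of one allele), which is one common coding and is not specified in the text; the trait values are invented for the example.

```python
from math import sqrt
from statistics import mean


def marker_regression(codes, trait):
    """Regress trait values on numeric marker codes.

    Returns (slope, r): the slope estimates the effect of the locus on
    the trait, and r is the correlation between codes and trait.
    """
    mx, my = mean(codes), mean(trait)
    sxy = sum((x - mx) * (y - my) for x, y in zip(codes, trait))
    sxx = sum((x - mx) ** 2 for x in codes)
    syy = sum((y - my) ** 2 for y in trait)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical data: two plants per genotype class.
codes = [0, 0, 1, 1, 2, 2]
trait = [5.0, 7.0, 8.0, 10.0, 11.0, 13.0]
slope, r = marker_regression(codes, trait)
print(slope, round(r, 3))  # 3.0 0.926
```

Whether r is significant would then be judged against the sample size using a standard test, which is omitted here.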
For QTL interval mapping, the Mapmaker algorithm developed by Lincoln et al, Constructing genetic linkage maps with MAPMAKER/EXP version 3.0: A tutorial and reference manual. Whitehead Institute for Biomedical Research, Cambridge, MA, USA, 1993, can be used. The principle behind interval mapping is to test a model for the presence of a QTL at many positions between two mapped marker loci. This model is a fit of a presumptive QTL to the trait, wherein the suitability of the fit is tested by determining the maximum likelihood that a QTL for the trait lies between two segregating markers. For example, in the case of a QTL located between two segregating markers, the 2-loci marker genotypes of segregating progeny will each contain mixtures of QTL genotypes. Accordingly, it is possible to search for loci parameters that best approximate the distribution in the trait for each marker class. Models are evaluated by computing the likelihood of the observed distributions with and without fitting a QTL effect. The map position of a QTL is determined as the maximum likelihood from the distribution of likelihood values (LOD scores: the base-10 logarithm of the ratio of the likelihood that the effect occurs by linkage to the likelihood that the effect occurs by chance), calculated for each locus.
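The selection of a map position from LOD scores can be sketched as follows. This is a minimal illustration, assuming that natural-log likelihoods for the model with and without a QTL have already been computed at each tested position by some fitting procedure; the positions and likelihood values are invented, and the sketch does not reproduce the MAPMAKER implementation.

```python
from math import log


def lod_score(loglik_qtl, loglik_null):
    """LOD = log10 of the likelihood ratio, from natural-log likelihoods."""
    return (loglik_qtl - loglik_null) / log(10)


def best_position(loglik_by_position, loglik_null):
    """Return (position, LOD) for the position with the highest LOD."""
    lods = {pos: lod_score(ll, loglik_null)
            for pos, ll in loglik_by_position.items()}
    best = max(lods, key=lods.get)
    return best, lods[best]


# Hypothetical log-likelihoods at three positions (in cM) in an interval.
position, lod = best_position({0.0: -105.0, 5.0: -100.0, 10.0: -103.0}, -110.0)
print(position, round(lod, 2))  # 5.0 4.34
```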
Interval mapping by regression (Haley and Knott, Heredity 69, 315-324, 1992) is a simplification of the maximum likelihood method supra wherein basic QTL analysis or regression on coded marker genotypes is performed, except that phenotypes are regressed on the probability of a QTL genotype as determined from the linkage between the trait and the nearest flanking markers. In most cases, regression mapping gives estimates of QTL position and effect that are almost identical to those given by the maximum likelihood method. The approximation deviates only at places where there are large gaps, or many missing genotypes.
Those skilled in the art will also be aware that it is possible to detect multiple interacting alleles or genes for a particular trait, such as, for example, using composite interval mapping approaches. To achieve this end, the composite interval mapping may be repeated to look for additional loci. Alternatively, or in addition, two or more distinct regions of the genome can be nominated as candidate loci, and a gamete relationship matrix constructed for each candidate locus, and a 2-locus regression performed for each pair of loci, determining a best fit for the interacting effects between the two loci or alleles at those loci, including any dominance or additive effects. The algorithm described by Carlborg et al, Genetics (2000) can be used for simultaneous mapping.
The present invention will now be illustrated by the following examples, which are not intended to be limiting in any way. The teachings of all references cited herein are incorporated herein by reference.
Example 1
Materials and methods
The present example describes reagents that are used in one embodiment of a method of the invention to detect a mutation or polymorphism in nucleic acid from a polyploid subject, such as bread wheat.
Plant Materials
Genomic DNA was extracted from frozen leaf material of barley (Hordeum vulgare) and bread wheat (Triticum aestivum) as described by Devos et al. Theoretical and Applied Genetics, 83: 931-939, 1992. Mixed DNA samples were prepared by pooling equal amounts of genomic DNA for appropriate barley and wheat lines.
Primer Synthesis
Genome-specific primers for the amplification of target genes were obtained from published databases, or designed using Primer3 software (Rozen and Skaletsky, In: Krawetz and Misener (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, New Jersey, pp365-386, 2000) from expressed sequence tag (EST) sequence containing single nucleotide polymorphisms (SNPs) identified using AutoSNP (Barker et al. Bioinformatics, 19: 421-422, 2003) and QualitySNP (Tang et al. BMC Bioinformatics, 7: 438, 2006). Primers to amplify target genes located on the three homoeologous chromosomes of bread wheat were designed using Primer3 software to conserved sequence outside the polymorphic regions. Each set of primers was synthesized with a nucleotide sequence corresponding to a standard M13 sequencing primer at the 5'-end. Specifically, the forward and reverse primers for each target gene were synthesized with the nucleotide sequence 5' CAC GAC GTT GTA AAA CGA C 3' (SEQ ID NO: 1) and 5' GGT TTT CCC AGT CAC GAC 3' (SEQ ID NO: 2) respectively, hereafter referred to as the M13 forward and M13 reverse sequencing primers. The sequences of primers used in this study are listed in Table 1. M13 forward and reverse sequencing primers were synthesized with one of the following fluorescent dyes: VIC, FAM, NED and PET (Applied Biosystems).
Amplification of genomic sequences
PCR assays were performed in 25 μl reaction mixture containing 0.2 mM dNTP, 1x PCR buffer supplemented with 1.5 mM MgCl2 (Qiagen), 0.2 μM each of forward and reverse M13-tailed primer, 50-100 ng genomic DNA and 1 U Taq DNA polymerase (Qiagen). Following an initial denaturation step of 3 min at 94°C, PCR was performed for a total of 35 cycles with the profile: 30 s at 94°C, 30 s at 55°C, 60 s at 72°C, and a final extension step of 10 min at 72°C. PCR products were purified by ultrafiltration to remove residual primer and dNTP using a Multiscreen384 PCR cleanup plate (Millipore) according to the manufacturer's instructions. The purified PCR products were resuspended in 20 μl of sterile water, and quantified using a ND-1000 spectrophotometer (NanoDrop technologies) by measuring the absorbance at 260 nm wavelength.
Detection of polymorphisms
Twenty five ng of purified PCR product was added to each of four reaction wells and dried by evaporation by heating for 15 min at 80°C. Three μl of reaction mixture containing 1x Thermo Sequenase buffer (3 mM MgCl2, 12 mM Tris-HCl, pH 9.4), 225 μM each of dATP, dTTP, dCTP and 7-deaza-dGTP (Roche), 0.5 μM dye-labeled M13 forward (or M13 reverse) sequencing primer, 0.45 U Thermo Sequenase DNA polymerase (GE Healthcare) and 0.75 μM of either ddATP, ddTTP, ddCTP or ddGTP was added to each reaction well. Following an initial denaturation step of 5 min at 95°C, thermal cycling was performed for a total of 30 cycles consisting of 30 s at 95°C, 30 s at 50°C and 60 s at 72°C. One μl of the resulting reaction products was added to 5 μl of deionised formamide containing 0.05 μl of GeneScan 500 LIZ size standard (Applied Biosystems). The mixture was heated uncovered for 5 min at 75°C to concentrate the reaction products by evaporating the water, and electrophoresed on an ABI 3730 DNA fragment analyser (Applied Biosystems) using the standard POP7 Genotyping Run Module and spectral calibration filter set G5 (FAM, VIC, NED, PET, LIZ), with modification to the run time from 1200 s to 1500 s. Termination fragment sizing and alignment was performed using the AFLP Methods Module of GeneMapper v4.0 software (Applied Biosystems), according to the manufacturer's instructions.
Example 2
Detection of polymorphisms in samples from polyploid subjects
The present example describes an embodiment of a method of the present invention in which a polymorphism is detected in a target nucleic acid, in this case a nodulin-like gene in a genome of a polyploid subject, in this case bread wheat.
To detect a polymorphism in polyploid subjects, a target gene (i.e., nodulin-like gene) harboring known SNPs from bread wheat lines was amplified and the presence of a polymorphism detected using methods described in Example 1. The nodulin gene was amplified using the primers cacgacgttgtaaaacgacTACTTCCTCGAGAAGTACGCCG (18B forward; SEQ ID NO: 3); and ggttttcccagtcacgacGTAGAGCGTGATCACCGTGG (18B reverse; SEQ ID NO: 4). Initially, the repeatability of the termination fragment pattern amplified for individual dideoxynucleotides was assessed by replicated assay of the same samples. Next, the termination fragment patterns amplified for individual dideoxynucleotides from different samples harboring known SNPs were compared to assess reproducibility.
In general, the alignment of the termination fragment profiles amplified for each dideoxynucleotide was highly reproducible for replicate assays of the same sample and between different samples harboring known SNPs. Overlay of the electrotraces for individual dideoxynucleotides amplified from replicate and different samples revealed only minor differences, observed as slight offsets in the alignment of some of the termination fragments in the overlaid electrotraces (Figure 1A).
Alignment and overlay of electrotraces for individual dideoxynucleotides amplified from samples harboring a SNP revealed visually obvious differences in the termination fragment profiles at the position of the nucleotide variant. These differences were observed as the presence-absence of a termination fragment between the samples compared (Figure 1B), or a quantitative difference in the peak height of the termination fragments (Figure 1C). The relatively uniform peak height of termination fragments in overlaid electrotraces, except for those associated with sequence variation, facilitates the quantitative detection of mutations in complex DNA mixtures, e.g., genomic DNA from polyploid subjects. The high level of assay reproducibility also permits automation of the mutation discovery process using pattern recognition algorithms.
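The two kinds of difference described above, presence-absence of a termination fragment and a quantitative difference in peak height, can be sketched as a simple automated comparison. This is a minimal illustration under stated assumptions: each trace is represented as a mapping from fragment size to peak height, the threshold `min_ratio` is a hypothetical parameter, and the values are invented; it is not the pattern recognition algorithm of any particular software.

```python
def compare_profiles(trace_a, trace_b, min_ratio=1.5):
    """Compare two termination fragment profiles for one dideoxynucleotide.

    Each trace maps fragment size (bases) to peak height. Returns a dict
    mapping fragment size to "presence-absence" or "quantitative".
    """
    differences = {}
    for size in sorted(set(trace_a) | set(trace_b)):
        a = trace_a.get(size, 0.0)
        b = trace_b.get(size, 0.0)
        if (a == 0.0) != (b == 0.0):
            # Fragment observed in only one of the two samples.
            differences[size] = "presence-absence"
        elif a > 0.0 and b > 0.0 and max(a, b) / min(a, b) >= min_ratio:
            # Fragment observed in both, but with different peak heights.
            differences[size] = "quantitative"
    return differences


# Hypothetical traces for a single dideoxynucleotide from two samples.
sample_1 = {100: 1000.0, 101: 950.0, 102: 980.0, 103: 600.0}
sample_2 = {100: 1020.0, 101: 960.0, 103: 1200.0}
print(compare_profiles(sample_1, sample_2))
# {102: 'presence-absence', 103: 'quantitative'}
```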
Example 3
Targeted identification of polymorphisms in polyploid genomes
This example describes an embodiment of the present invention in which a mutation is detected in an acetohydroxyacid synthase (AHAS) gene on a genome of a polyploid subject using bulked segregant analysis, i.e., using pooled genomic DNA samples from a plurality of subjects having a similar phenotype. The mutation was mapped to a genome of the polyploid subject using aneuploid stocks. This example also describes the identification of HSVs in a genome of a polyploid subject that are linked to the mutation.
Bulked segregant analysis (BSA) is a rapid approach to identify genetic markers linked to a target gene. BSA involves genotyping two pools of DNA from phenotypically distinct subjects, e.g., plants, originating from a segregating population, for sequence polymorphisms with a skewed allele frequency (Michelmore et al. Proc. Natl. Acad. Sci. USA, 88: 9828-9832, 1991). The underlying principle of BSA is that each pool of DNA contains individual subjects, e.g., plants with substantially identical genotypes for the target genomic region, but comprises random genotypes at unlinked loci. Hence, sequence polymorphisms with a skewed allele frequency are considered to be genetically linked to the trait locus.
The use of BSA to identify SNPs in candidate genes relies on the availability of unique genomic sequence to design primers for PCR amplification of a target locus. This is especially important in higher plant genomes where gene duplication and polyploidy can confound the detection of allelic SNPs due to the presence of homoeologous and paralogous sequences.
The present invention provides an advantage over other methods in so far as it deconvolutes sequence complexity resulting from the amplification of homoeologous and paralogous genes when genomic sequence is unavailable for the design of specific primers to amplify the target locus. This is achieved by comparing the termination fragment profiles amplified from two pools of DNA for a single nucleotide at a time.
To demonstrate the method of the invention for identifying SNPs in candidate genes using BSA, a characterized mutation in an acetohydroxyacid synthase (AHAS) gene was detected in samples from bread wheat. This mutation confers tolerance to the herbicide imidazolinone. The AHAS gene is located on the three homoeologous group 6 chromosomes and contains numerous homoeologue sequence variants (HSVs). Two pools of DNA were prepared using wheat lines with contrasting phenotypes. The first DNA pool comprised wheat varieties Tasman, Kukri and Excalibur and was homozygous for the wild type (herbicide susceptible) AHAS allele. The second DNA pool comprised the wheat lines CF-Janz, CF-Sunstate and CF-Frame and was homozygous for the herbicide tolerance (mutant) allele on the B-genome, and fixed for the wild type allele on the A- and D-genomes. Hence, the two pools of DNA had identical AHAS genotypes for the A- and D-genomes but a different genotype for the B-genome, corresponding to the SNP (a cytosine to thymine transition) conferring herbicide tolerance. The SNP genotype for the herbicide susceptible DNA pool was CC/CC/CC for the A-, B- and D-genomes, respectively. The genotype for the herbicide tolerant DNA pool was CC/TT/CC for the A-, B- and D-genomes, respectively. The AHAS gene was amplified from the two DNA pools using primers that amplified conserved regions of the A-, B- and D-genomes (cacgacgttgtaaaacgacGTAGGACAAGAAACTTGCATG (LS-IMI900 forward; SEQ ID NO: 5), and ggttttcccagtcacgacGTAGGACAAGAAACTTGCATG (LS-IMI900 reverse; SEQ ID NO: 6)). Amplification, sequencing and comparison of the resulting termination fragment profiles was performed essentially as described in Example 1.
Comparison of the termination fragment profiles for the individual dideoxynucleotides from the two DNA pools allowed for unambiguous detection of the SNP conferring herbicide tolerance, despite the presence of the homoeologous genes amplified from the A- and D-genomes (Figure 2). Facilitating the identification of the target SNP was that the presence of HSVs was masked by the analysis of the termination fragment profiles for one nucleotide at a time. This masking effect results because the homoeologous genes are present in equal proportion in the two DNA pools, and therefore produce the same termination fragment profiles. Without being bound by theory or mode of action, a difference in a termination fragment profile between the two DNA pools can only result from sequence variation linked to the target genomic region. In contrast, allelic SNPs and HSVs would have been detected as mixed peaks in Sanger sequencing chromatograms produced using standard sequence analysis software. An advantage of the method of the invention compared to conventional Sanger sequencing, is that analyzing the termination fragment profiles for one dideoxynucleotide at a time allows the simplification of the complex DNA mixture such that allelic differences between the DNA pools can be readily identified (Figure 3).
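The pool comparison described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each termination fragment profile has been reduced to a mapping from fragment position to relative peak height for a single dideoxynucleotide. The function name, the tolerance value, the positions and the peak heights are illustrative assumptions and are not taken from the specification.

```python
def differing_positions(profile_a, profile_b, tolerance=0.1):
    """Return sorted positions whose peak heights differ between two pools.

    Each profile maps fragment position -> relative peak height for ONE ddNTP.
    A position missing from a profile is treated as height 0.0.
    """
    positions = set(profile_a) | set(profile_b)
    return sorted(
        pos for pos in positions
        if abs(profile_a.get(pos, 0.0) - profile_b.get(pos, 0.0)) > tolerance
    )

# Toy profiles (hypothetical positions and heights).
# Position 40 is an HSV: C on one genome, T on the other two, identical in both pools.
# Position 75 is the SNP: CC/CC/CC in the susceptible pool, CC/TT/CC in the tolerant pool.
susceptible = {"ddC": {12: 1.0, 40: 1 / 3, 75: 1.0}, "ddT": {40: 2 / 3}}
tolerant = {"ddC": {12: 1.0, 40: 1 / 3, 75: 2 / 3}, "ddT": {40: 2 / 3, 75: 1 / 3}}

for ddntp in ("ddC", "ddT"):
    print(ddntp, differing_positions(susceptible[ddntp], tolerant[ddntp]))
```

In this toy data the HSV at position 40 produces the same signal in both pools and is therefore not reported, whereas the SNP at position 75 is reported in both the ddC and ddT comparisons.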
To further illustrate the ability of the method of the invention to simplify the identification of SNPs in candidate genes using BSA and facilitate the development of genetic markers, aneuploid stocks for bread wheat (Sears In: Riley and Lewis (eds) Chromosome manipulation and plant genetics. Oliver and Boyd, Edinburgh, pp. 29-50, 1965) were used to determine the genotypes at the HSVs in the AHAS gene. To achieve this, conserved primers (LS-IMI900 forward and reverse) were used to amplify the AHAS gene from the nullisomic-tetrasomic wheat stocks for the group 6 chromosomes essentially as described in Example 1. Sequencing and analysis were performed essentially as described in Example 1 using the resulting PCR products, and the termination fragment profiles for each nullisomic-tetrasomic stock were compared with those of the euploid wheat Chinese Spring.
Comparison of the termination fragment profiles for the individual dideoxynucleotides from each aneuploid stock allowed the genotype at the HSVs to be determined for individual genomes (Figure 4). The HSV genotypes were inferred from the absence of termination fragments in the nullisomic-tetrasomic lines. For example, for one HSV shown in Figure 4 the absence of a termination fragment in the ddC electrotrace of the nullisomic 6B stock indicated that the HSV genotype on the B-genome was CC, while the presence of the termination fragment in the remaining two nullisomic lines and presence of a termination fragment in the ddT electrotrace at the same position indicated that the HSV genotypes for both the A- and D-genomes were TT. In all instances, the HSV genotypes determined from the electrotraces corresponded to the published sequence for the three AHAS genes. These results demonstrate that the analysis of termination fragment profiles for one dideoxynucleotide at a time allows for simplification of complex DNA mixtures such that sequence variation between samples can be readily identified.
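The inference of HSV genotypes from nullisomic-tetrasomic stocks can be sketched as follows. This is a simplified illustration that assumes each genome is homozygous at the HSV and that fragment presence has already been scored as true or false. The function name and the input layout are hypothetical.

```python
def infer_hsv_genotypes(presence, genomes=("A", "B", "D")):
    """Infer per-genome HSV genotypes at one position.

    presence maps a terminating base to {genome: bool}, where the bool states
    whether the fragment is observed in the stock LACKING that genome's chromosome.
    """
    calls = {}
    # A fragment that disappears only when one genome is absent is carried by that genome.
    for base, seen in presence.items():
        missing = [g for g in genomes if not seen[g]]
        if len(missing) == 1:
            calls[missing[0]] = base * 2
    # A fragment seen in every stock is carried by more than one genome;
    # assign it to the genomes not yet called.
    shared = [base for base, seen in presence.items() if all(seen[g] for g in genomes)]
    if len(shared) == 1:
        for g in genomes:
            calls.setdefault(g, shared[0] * 2)
    return calls

# The example from the text: the ddC fragment is absent only in the nullisomic 6B stock,
# and a ddT fragment is present at the same position in all stocks.
observed = {
    "C": {"A": True, "B": False, "D": True},
    "T": {"A": True, "B": True, "D": True},
}
print(infer_hsv_genotypes(observed))
```

For the observations shown, the sketch assigns CC to the B-genome and TT to the A- and D-genomes, matching the worked example in the text.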
Example 4 Deriving nucleotide sequence of region surrounding polymorphisms
This example describes an embodiment of the present invention in which the sequence of nucleic acid flanking or adjacent to a mutation in a genome of a polyploid subject is determined by separately determining one or a plurality of positions of each of the four naturally-occurring nucleotides in a target sequence from an AHAS gene in bread wheat, aligning those sequences and overlaying those sequences.
Characterization of both the position and nucleotide substitution of a mutation or polymorphism, as well as determination of the flanking nucleotide sequence in a single assay allows for the immediate development of a genetic marker to permit simple detection of the mutation or the SNP.
To assess the accuracy of the method of the present invention for determining the nucleotide sequence of a sample, target genes harboring known SNPs from barley and bread wheat lines were amplified. An exemplary gene is the AHAS gene amplified using the primers LS-IMI900 forward and LS-EVII reverse. For each target gene, assays were performed essentially as described in Example 1 to amplify termination fragment profiles for each of the four dideoxynucleotides. The nucleotide sequence of the sample was determined by overlaying the electrotraces for each of the dideoxynucleotides and assigning the nucleotide sequence from the termination fragments with the greatest peak heights (Figure 5), according to the principles for sequence determination described for manual Sanger sequencing (Sanger et al. Proc. Natl. Acad. Sci. USA, 74: 5463-5467, 1977). The accuracy of the derived nucleotide sequence was determined by comparison to the published sequence. An accuracy of more than 95% was achieved, on average, for sequence read lengths of up to 600 nucleotides, indicating that the nucleotide sequence of PCR fragments up to 1200 bp in length (two reads of up to 600 nucleotides each) may be derived if the method of the invention is performed from both ends of a PCR fragment. Sufficient nucleotide sequence flanking a SNP or mutation could always be obtained to permit the development of a genetic marker.
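The overlay step can be pictured with a short sketch. It assumes, for illustration only, that the four electrotraces have already been aligned and reduced to one peak height per fragment position. The function names and the toy values are hypothetical and do not come from the specification.

```python
def call_sequence(traces):
    """Assign, at each position, the base whose termination fragment has the greatest peak height.

    traces maps a base to a list of peak heights, one per fragment position.
    """
    length = len(next(iter(traces.values())))
    return "".join(
        max(traces, key=lambda base: traces[base][i]) for i in range(length)
    )

def accuracy(derived, published):
    """Fraction of positions at which the derived sequence matches the published one."""
    matches = sum(d == p for d, p in zip(derived, published))
    return matches / len(published)

# Toy peak heights for four positions (hypothetical values).
traces = {
    "A": [0.9, 0.1, 0.0, 0.2],
    "C": [0.0, 0.8, 0.1, 0.1],
    "G": [0.1, 0.0, 0.1, 0.7],
    "T": [0.0, 0.1, 0.9, 0.0],
}
derived = call_sequence(traces)
print(derived, accuracy(derived, "ACTA"))
```

A real electrotrace would also need peak detection and size calibration before this step; those are outside the scope of the sketch.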
Example 5 Polymorphisms are characterized by a mobility shift between termination fragments
This example describes an embodiment of the present invention in which a variable nucleotide is detected in a target sequence by detecting a reduced level of nucleic acid fragments corresponding to an allele of the variable nucleotide in samples from polyploid subjects comprising the variable nucleotide. Moreover, by aligning sequencing results that detect each allele of the variable nucleotide, a change in molecular weight of nucleic acid fragments comprising ddNTP corresponding to each allele was identified.
During the experiment performed to assess the accuracy of the method of the present invention for determining the nucleic acid sequence of a sample (Example 4), it was observed that the presence of a SNP between homozygous samples was characterized by a mobility shift between the termination fragments at the position of the nucleotide variation. In this respect, alignment of overlaid electrotraces for the termination fragment profiles of the four dideoxynucleotides for two or more samples harboring a known SNP revealed a visually distinct size difference for the termination fragments corresponding to the SNP (Figure 6).
To further investigate the association of the mobility shift observed for termination fragments corresponding to sequence variation, target genes harboring known SNPs from three pools of bread wheat DNA containing different doses of each SNP allele were amplified using the primers cacgacgttgtaaaacgacGCCAAATCTGTTGGCGATTA (37D forward; SEQ ID NO: 7) and ggttttcccagtcacgacCGTTCGCCAACGCCCGGA (37D reverse; SEQ ID NO: 8). The first pool of DNA comprised a single homozygous wheat line, and was therefore fixed for one of the SNP alleles present at the target locus. The second pool of DNA comprised two homozygous wheat lines, each with a different SNP allele at the target locus. Hence, the frequency of each SNP allele in the second pool was 50%. The third pool of DNA contained five homozygous wheat lines, four of which had the same SNP allele at the target locus while the fifth line had the alternate SNP allele. Hence, the frequency of the two alleles in the third DNA pool was 80 and 20%, respectively. For each target gene, assays were performed to amplify termination fragment profiles for each of the four dideoxynucleotides from each of the three DNA pools essentially as described in Example 1. The electrotraces for the four dideoxynucleotides for each DNA pool were overlaid, and the resulting composite electrotraces were aligned with one another as shown in Figure 7.
As shown in Figure 7, the presence of a SNP was associated with a shift in the mobility of the termination fragments corresponding to the sequence polymorphism when electrotraces overlaid for the four dideoxynucleotides were aligned for different samples. For homozygous samples, the presence of a SNP was detected as a distinct difference in the mobility of a single termination fragment corresponding to the SNP (Figure 6 and Figure 7A). In heterozygous samples, the presence of a SNP was observed as two distinct termination fragments with a smaller mobility difference than would be expected between termination fragments corresponding to adjacent nucleotides (Figure 7A). These termination fragments corresponded to the presence of the two SNP alleles in heterozygous samples, and were detectable even when the alleles were present at different frequencies (compare pools 2 and 3 in Figure 7A). Detection of a mobility shift in the termination fragments corresponding to a SNP in heterozygous samples was facilitated by examining individual electrotraces for the four dideoxynucleotides (as shown in Figures 7A and 7B).
Amplification of a nucleic acid harboring a SNP amplified from a heterozygous sample using the method of the invention is expected to generate two equally sized termination fragments at the position of the SNP, each of which is terminated by one of the dideoxynucleotides corresponding to the nucleotide variation. However, whilst the two termination fragments have equal nucleotide length, termination by different dideoxynucleotides results in a subtle difference in their electrophoretic mobility that can be visualized in the aligned electrotraces of the four dideoxynucleotides. To determine whether or not different nucleotides caused a change in mobility of a nucleic acid fragment when electrophoresed, the 3'-ends of synthetic oligonucleotides (SEQ ID NO: 9) were labeled with different dideoxynucleotides and their electrophoretic mobility compared in denaturing polyacrylamide gels (Figure 8). The results of these experiments showed that the electrophoretic mobility of dideoxynucleotide-terminated oligonucleotides was affected by the type of dideoxynucleotide attached, with ddC, ddA, ddT and ddG having an increasingly negative effect on oligonucleotide mobility, respectively. These results confirmed that the mobility shift observed for termination fragments corresponding to sequence variation in the aligned electrotraces was due to the effect of the type of dideoxynucleotide-termination on the electrophoretic mobility of the termination fragments.
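The reported ordering of dideoxynucleotide effects can be used to predict the direction of the mobility shift between two equal-length termination fragments. The sketch below encodes only the ordering stated above (ddC, ddA, ddT, ddG, with increasing retardation); it does not model the magnitude of the shift, and the names used are hypothetical.

```python
# Order reported in the text: ddC retards mobility least, ddG most.
RETARDATION_ORDER = ["C", "A", "T", "G"]

def faster_terminator(base_a, base_b):
    """Return the terminating base expected to migrate faster, or None if they are the same."""
    rank_a = RETARDATION_ORDER.index(base_a)
    rank_b = RETARDATION_ORDER.index(base_b)
    if rank_a == rank_b:
        return None
    return base_a if rank_a < rank_b else base_b

# For a C/T SNP, the ddC-terminated fragment is expected to run ahead of the ddT-terminated one.
print(faster_terminator("C", "T"))
```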
Example 6
Sensitivity of SNP detection in mixed samples
This example describes an application of an embodiment of the present invention in the identification of a polymorphism in a sample comprising pooled genomes from a plurality of subjects. As described below this embodiment of the invention detected an allele of the SNP in samples in which the frequency of the allele compared to another allele in the sample was 5:1.
The identification of SNPs in pooled samples presents an opportunity to increase assay throughput and reduce costs. However, it also presents several challenges due to the requirement to detect mutations that may be present at low frequency in a sample. To assess the ability of the method of the present invention to reliably detect mutations in pooled samples, target nucleic acids harboring SNPs from pooled genomic DNA of barley lines with known genotypes were amplified. An exemplary nucleic acid is from the triticin precursor gene amplified using the primers cacgacgttgtaaaacgacTGCAACTTGCGAAACGAACC (hvLSP45 forward; SEQ ID NO: 10) and ggttttcccagtcacgacAGTTGCCCCGGGCTAAGAAG (hvLSP45 reverse; SEQ ID NO: 11). Target nucleic acids were amplified from a total of five DNA pools. The first DNA pool consisted of a single homozygous barley line, while the remaining pools comprised a mixture of two homozygous lines with alternate SNP alleles. The ratio of the SNP alleles in the second, third, fourth and fifth DNA pools was 1:1, 2:1, 5:1 and 11:1, respectively. For each target gene, termination fragment profiles for the four dideoxynucleotides from each DNA pool were produced essentially as described in Example 1. The sensitivity of the assay for SNP detection was assessed using a two-tiered approach. First, the ability to detect the SNP in pooled samples with different allele dosage was assessed by overlaying the electrotraces for individual dideoxynucleotides from DNA pools 2 to 5 with the corresponding electrotrace from DNA pool 1 (Figure 9A). In general, the presence of the SNP in the overlaid electrotraces could be identified by a difference in the peak height of the termination fragments corresponding to the sequence polymorphism in pooled samples with an allele frequency of 17% (5:1 ratio of allele A to allele B; Figure 9A). Second, the overlaid electrotraces for the termination fragment profiles of the four dideoxynucleotides amplified from each DNA pool were aligned. Again, the presence of the SNP could be visually determined by a mobility shift in the termination fragments corresponding to the sequence variation in the DNA pools with an allele frequency of 17% (5:1 allele ratio; Figure 9B). Detection of both a peak height difference and a mobility shift facilitates reliable identification of unknown sequence variation in pooled samples containing a mutation with less than 20% representation.
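The relationship between the pooling ratios and the allele frequencies quoted above is simple arithmetic, shown here as a short sketch. The function name is hypothetical, and the calculation assumes that each homozygous line contributes DNA in proportion to its share of the stated ratio.

```python
from fractions import Fraction

def minor_allele_frequency(major_parts, minor_parts):
    """Frequency of the less common allele for a major:minor pooling ratio."""
    return Fraction(minor_parts, major_parts + minor_parts)

# Pooling ratios used for DNA pools 2 to 5.
for major, minor in [(1, 1), (2, 1), (5, 1), (11, 1)]:
    freq = minor_allele_frequency(major, minor)
    print(f"{major}:{minor} -> {float(freq):.1%}")
```

The 5:1 pool therefore corresponds to a minor allele frequency of one in six, which rounds to the 17% stated in the text, and the 11:1 pool corresponds to one in twelve.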
Example 7
Identification of SNPs in complex DNA mixtures
This example describes an embodiment of the invention in which a polymorphism is identified in a genome of a polyploid subject and in which target nucleic acids amplified from different genomes of the polyploid subject are different lengths as a result of a microsatellite in the sequence. By aligning sequence information from each subject the difference in length of the microsatellite is accounted for and a polymorphism is identified in a genome of a polyploid subject.
To investigate the ability of a method described herein to identify genetic variation in a complex DNA mixture, primers comprising the nucleotide sequence cacgacgttgtaaaacgacGCAAAGTGTAGCCGAGGAAG (SEQ ID NO: 26) and ggttttcccagtcacgacTTAGAGTTTTGCAGCGCCTT (SEQ ID NO: 27) were used to amplify homoeologous sequences containing a microsatellite repeat from chromosomes 2A and 2D of bread wheat. The target sequences were amplified using conserved primers from the varieties Stylet and Wylkatchem, and Chinese Spring nullisomic-tetrasomic stocks for the group 2 chromosomes. Separation of the PCR products on an 8% sequencing gel revealed the amplification of two fragments of 173-bp and 236-bp from Stylet, and 165-bp and 236-bp from Wylkatchem, which were assigned to chromosomes 2A and 2D using the nullisomic-tetrasomic stocks.
Assays of the present invention using each of the four dideoxynucleotides individually were performed using the resulting PCR products amplified from each wheat line. To identify allelic SNPs the termination fragment profiles for Stylet and Wylkatchem were aligned and compared for the presence of sequence variation. The chromosomal origin of putative SNPs was determined by further comparing the aligned termination fragment profiles of the two wheat varieties with those for the nullisomic-tetrasomic stocks.
An initial comparison of the termination fragment profiles for the individual dideoxynucleotides revealed that the 8-bp allele size difference observed for the PCR fragments amplified from chromosome 2A in Stylet and Wylkatchem was due to microsatellite repeat length variation. Manual alignment of the termination fragment profiles to correct for the microsatellite length variation allowed the identification of three putative SNPs in the flanking sequence. Each SNP was detected by the presence-absence of a termination fragment in the aligned electrotraces (Figure 10). Comparison of the aligned termination fragment profiles of the two wheat varieties with those for the nullisomic-tetrasomic stocks showed that the three SNPs were present in the sequences amplified from chromosome 2A.
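The alignment step that corrects for microsatellite length variation can be sketched as a position shift followed by a presence-absence comparison. This is an illustrative simplification of the manual alignment described above. The fragment positions and the location of the repeat are hypothetical; only the 8-bp difference (173 bp versus 165 bp) comes from the text.

```python
def align_after_repeat(positions, repeat_end, length_difference):
    """Shift fragment positions located after the microsatellite by the allele length difference."""
    return sorted(
        pos + length_difference if pos > repeat_end else pos for pos in positions
    )

def presence_absence(positions_a, positions_b):
    """Positions at which a termination fragment is observed in only one sample."""
    return sorted(set(positions_a) ^ set(positions_b))

# Hypothetical fragment positions for one ddNTP from the chromosome 2A amplicons.
stylet = [20, 35, 90, 110, 140]
wylkatchem = [20, 35, 82, 132]

shift = 173 - 165  # allele size difference reported in the text
aligned = align_after_repeat(wylkatchem, repeat_end=60, length_difference=shift)

print(presence_absence(stylet, wylkatchem))  # before alignment
print(presence_absence(stylet, aligned))     # after alignment
```

Before alignment, every fragment downstream of the repeat appears to differ between the two samples. After the shift, only the single fragment that is genuinely absent from one sample remains.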
To validate the results of this assay, the PCR fragments amplified from Stylet and Wylkatchem using the cfd36 forward primer comprising the sequence CACGACGTTGTAAAACGACGCAAAGTGTAGCCGAGGAAG (SEQ ID NO: 28) and the cfd36 reverse primer comprising the sequence GGTTTTCCCAGTCACGACTTAGAGTTTTGCAGCGCCTT (SEQ ID NO: 29) were sub-cloned and sequenced by Sanger sequencing. Alignment of the sequences amplified from chromosome 2A confirmed that microsatellite repeat length variation was responsible for the observed 8-bp allele size difference, and the presence of the three SNPs identified by the method of the present invention in the flanking region. No sequence variation was observed for the 236-bp PCR fragments amplified from chromosome 2D (Figure 11).
These results confirm that the method of the present invention can reliably detect genetic variation in a complex DNA mixture composed of homoeologous sequences with different lengths. In contrast, sequence analysis of this complex DNA mixture by Sanger sequencing would have been confounded by the homoeologue sequence variation unless a sub-cloning step was performed.
Example 8
Accuracy of the method of the invention for identifying unknown mutations
This example describes validation of an embodiment of the present invention in which two blinded studies were performed and polymorphisms or mutations were identified in genes on a genome of polyploid subjects. Also described is an embodiment of the present invention in which the sequence surrounding each polymorphism or mutation was determined and an allele specific primer produced that permitted an amplification reaction to be performed, e.g., PCR that distinguished between different alleles of the polymorphism or mutation.
To confirm the accuracy of the method of the invention for the identification of unknown mutations in complex DNA mixtures and to validate the estimate for the detection sensitivity, two blinded studies were performed in which candidate genes for a region of interest on the group 3 chromosomes of bread wheat were resequenced.
Blinded Study #1
To confirm the accuracy of the method of the invention for identifying unknown mutations and validate the estimate for the detection sensitivity in mixed samples, a blinded study was performed in which candidate genes for a region of interest on the group 3 chromosomes of bread wheat were resequenced. A total of 32 genes were amplified using essentially genome-specific primers from the wheat variety Kukri and a pooled sample comprised of the wheat varieties Timgalen, Trident, Spear, Berkut and Krichauff. Assays were performed using the resulting PCR products to amplify termination fragment profiles for the four dideoxynucleotides essentially as described in Example 1, and the electrotraces were assessed for the presence of SNPs. Overall, three putative SNPs were identified, one in each of three genes. To confirm the presence of the putative SNPs and determine the accuracy for SNP detection, the candidate genes were amplified from each of the wheat varieties and assays were performed essentially as described in Example 1. Comparison of the electrotraces for individual wheat lines for each of the candidate genes confirmed the presence of the three putative SNPs, and revealed that no mutations were missed in the initial screen. Two of the SNPs were represented in the pooled sample with an allele frequency of 40%, and one SNP was represented with 20% allele frequency. These results show that the method of the invention can achieve a high degree of accuracy for the identification of unknown mutations in mixed samples.
To illustrate the development of diagnostic molecular markers to assay genetic variation identified using the method of the invention, the nucleotide sequence flanking each SNP was determined from the termination fragment profiles of the four dideoxynucleotides and used to design allele-specific primers. DNA typing of the six wheat varieties on a Luminex100 instrument using an allele-specific primer extension assay (essentially as described in Lee et al., Theoretical and Applied Genetics 110, 167-174, 2004) revealed the expected genotypes (Table 1). Subsequent genetic mapping of the SNPs in segregating doubled haploid populations confirmed the location of the SNP loci on their expected group 3 chromosome.
Table 1. Sequences of primers used to develop diagnostic markers for SNPs identified using the method of the invention, and the genotypes observed for the wheat varieties using an allele-specific primer extension assay performed on a Luminex100 instrument.
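[Table 1 could not be recovered from the source text. The surviving fragments indicate entries for marker 18B (chromosome 3B) and marker 37D (chromosome 3D), each with forward (Fwd) and reverse (Rvs) primers, and a column of SNP genotypes.]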
Blinded Study #2
In this study, the ability to identify genetic variation in pooled DNA mixtures containing homoeologous sequences of the same and different lengths was investigated. A total of five genes were amplified using conserved primers from the wheat variety Chinese Spring and three pooled samples each comprising eight wheat accessions: bulk A (Balkan, Belliei 590, Chortandinka, Courtot, Glenlea, Renan, Synthetic W7984), bulk B (Chyamtang, Miskaagani, N46, NYU Bay, Seu Seun 27, Opata 85, Chinese Spring, Berkut) and bulk C (Janz, Hartog, Dagger, Machete, Westonia, Molineux, Gladius, Krichauff). Each primer set was expected to amplify homoeologous sequences from the three wheat genomes to produce a mixture of DNA fragments with either the same or different lengths. Each gene was also amplified from Chinese Spring nullisomic-tetrasomic wheat stocks for the group 3 chromosomes. An assay was performed essentially as described in Example 1 using the resulting PCR products to amplify termination fragment profiles for the four dideoxynucleotides, and the electrotraces were assessed for sequence variation. Allelic variation was identified by comparing the termination fragment profiles of the variety Chinese Spring with the pooled samples, while homoeologue sequence variants (HSVs) were identified by comparing the variety Chinese Spring (euploid) with the nullisomic-tetrasomic wheat stocks (aneuploids).
An allelic SNP was detected in each of two genes. In both instances, the SNP was homozygous in the variety Chinese Spring and heterozygous in the pooled samples (Figures 12 and 13). To confirm the presence of the SNPs, the candidate genes were reamplified from the individual accessions and an assay essentially as described in Example 1 was performed. Comparison of electrotraces for the individual accessions confirmed the presence of the SNPs and that no mutations were missed. The two SNPs were each represented in the pooled samples with an allele frequency of 25, 25 and 38% for bulked samples A, B and C, respectively. These results confirmed the accuracy of the method of the invention for the identification of sequence variation in pooled samples comprised of homoeologous DNA fragments of different lengths.
Example 9
Fine mapping of a boron toxicity tolerance gene in bread wheat
This example describes an embodiment of the present invention in which polymorphisms or mutations linked to a boron toxicity tolerance gene were identified. The method described in this example comprises performing a method of the present invention to identify polymorphisms or mutations in a region of chromosome 7B in bread wheat linked to boron toxicity tolerance using pooled or bulked DNA samples from bread wheat tolerant to boron toxicity or susceptible to boron toxicity. By identifying variable nucleotides between the two pools or bulks of DNA a polymorphism or mutation associated with or linked to boron toxicity tolerance is identified. Using nullisomic-tetrasomic lines of bread wheat the identified polymorphisms or mutations are mapped to a genome of bread wheat. HSVs were also identified to facilitate production of primers capable of amplifying the genome of bread wheat comprising a polymorphism or mutation associated with an allele of a gene conferring boron toxicity tolerance.
To demonstrate the utility of the method of the present invention for rapid fine-mapping of a genomic region containing an allele conferring a key trait, for example a quantitative trait locus (QTL), the Bo1 boron toxicity tolerance locus on chromosome 7B in bread wheat was analysed using a doubled haploid population derived from a cross between the varieties Cranbrook and Halberd (Schnurbusch et al. Theor. Appl. Genet. 115: 451-461, 2007).
Bulk segregant analysis was performed using doubled haploid lines phenotypically selected for tolerance or sensitivity to boron toxicity. Six candidate genes underlying the boron toxicity tolerance QTL interval were identified by comparative genomics using wheat-rice synteny and amplified by PCR from the parental lines and two DNA bulks. Each gene was also amplified from Chinese Spring nullisomic-tetrasomic wheat stocks for the group 7 chromosomes using PCR primers designed to anneal to conserved regions of the candidate genes, and to span putative intron-exon junctions. The sequences of the primers are shown in Table 2. Each primer pair was expected to amplify homoeologous nucleic acids from each of the three wheat genomes with, potentially, different length PCR fragments.
Table 2: Sequences of primers used to amplify candidate genes. Sequence (5'→3') corresponding to the M13 sequencing primer sequence and locus-specific sequence is in lower and upper case, respectively.
Sequencing reactions were performed using the PCR products as template and using a single ddNTP. The resulting electrotraces were assessed for sequence variation. Allelic variation linked to the Bo1 gene was identified by differences in the termination fragment profiles of the two bulked DNA samples, which corresponded to differences observed in the termination fragment profiles of the parental lines. HSVs and the chromosomal origin of allelic SNPs were determined by further comparing the aligned termination fragment profiles of the four samples with those for the nullisomic-tetrasomic stocks.
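The selection rule described above, in which a difference between the two bulks is retained only if it corresponds to a difference between the parental lines, can be sketched as a set intersection. The sketch treats each profile as a set of fragment positions per ddNTP and ignores peak heights, which is a simplification. The function name and all positions are hypothetical.

```python
def linked_variants(bulk_1, bulk_2, parent_1, parent_2):
    """Return {ddNTP: positions} that differ between the bulks AND between the parents.

    Each argument maps a ddNTP name to the set of fragment positions observed.
    """
    result = {}
    for ddntp in bulk_1:
        bulk_difference = bulk_1[ddntp] ^ bulk_2[ddntp]
        parent_difference = parent_1[ddntp] ^ parent_2[ddntp]
        shared = bulk_difference & parent_difference
        if shared:
            result[ddntp] = sorted(shared)
    return result

# Hypothetical profiles. Position 55 is a C/T SNP linked to the trait locus.
# Position 70 differs between the parents but is unlinked, so both bulks carry it.
tolerant_bulk = {"ddC": {10, 30, 55}, "ddT": {22, 41, 70}}
sensitive_bulk = {"ddC": {10, 30}, "ddT": {22, 41, 55, 70}}
parent_1 = {"ddC": {10, 30, 55}, "ddT": {22, 41}}
parent_2 = {"ddC": {10, 30}, "ddT": {22, 41, 55, 70}}

print(linked_variants(tolerant_bulk, sensitive_bulk, parent_1, parent_2))
```

Only position 55 is reported, in both the ddC and ddT comparisons, because the unlinked parental polymorphism at position 70 segregates randomly and is present in both bulks.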
Allelic SNPs linked to the Bo1 gene were identified in all six candidate genes. As represented in Figure 14A, complex termination fragment profiles were observed due to the amplification of all three wheat genomes including HSVs and allelic SNPs. However, allelic SNPs were readily identified by comparing the termination fragment profiles for the two DNA bulks, in which the presence of HSVs was masked as a result of genetic identity between the bulks except at locations of the genome linked to the Bo1 locus. The arrows in Figure 14A indicate the first sequence variation between the two DNA bulks that could be used either directly as, or to develop, a linked marker for resistance or susceptibility to boron toxicity.
The association between the identified allelic SNPs and the boron toxicity tolerance QTL was further confirmed in a blinded genotyping study using 20 randomly selected doubled haploid lines from the Cranbrook x Halberd cross with known Bo1 genotypes (Figure 14B).
To further demonstrate linkage between allelic variation observed in the candidate genes to chromosome 7B and boron toxicity tolerance, and to identify HSVs to develop genome-specific markers, the termination fragment profiles produced using DNA from the nullisomic-tetrasomic stocks were compared with the termination fragment profiles produced using DNA from bulked DNA samples. HSV genotypes were inferred by the absence of termination fragments in the nullisomic-tetrasomic lines and presence in bulked DNA samples. For example, the absence of a termination fragment in the ddC electrotrace of the nullisomic 7B stock indicated that the HSV genotype on the B-genome was CC, while the presence of the termination fragment in the remaining two nullisomic lines and presence of a termination fragment in an alternative electrotrace at the same position indicated the HSV genotypes for both the A- and D-genomes was GG (Figure 14C).
The sequential identification of allelic variation in candidate genes linked to the Bo1 gene using bulked segregant analysis, followed by assignment of HSV genotypes using the nullisomic-tetrasomic stocks provides an effective approach to identify sequence variation for the development of PCR-based markers. These markers will facilitate the rapid fine-mapping of the QTL interval to provide tightly-linked markers for direct selection of the Bo1 gene in wheat breeding, and to facilitate positional cloning of the Bo1 gene.
Claims
1. A method for identifying a polymorphism or mutation in a genome of a first polyploid subject, said method comprising:
(i) comparing a position or a plurality of positions at which a specific nucleotide residue occurs within a target nucleic acid from genomes of the first polyploid subject to the position(s) at which the specific nucleotide residue occurs in a corresponding target nucleic acid from genomes of a second polyploid subject to thereby identify one or more variable nucleotides between the genomes of the first and second polyploid subjects; and
(ii) determining linkage between the identified one or more variable nucleotides and one or more nucleotides within a genome of the first polyploid subject, thereby identifying a polymorphism or mutation in the genome of the first polyploid subject.
2. A method for identifying a polymorphism or mutation in a genome of a first polyploid subject, said method comprising:
(i) determining a position or a plurality of positions at which a specific nucleotide residue occurs within a target nucleic acid from genomes of the first polyploid subject; (ii) comparing the position(s) at which the specific nucleotide residue occurs in (i) to the position(s) at which the specific nucleotide residue occurs in a corresponding target nucleic acid from genomes of a second polyploid subject to thereby identify one or more variable nucleotides between the genomes of the first and second polyploid subjects; and
(iii) determining linkage between the identified one or more variable nucleotides and one or more nucleotides within a genome of the first polyploid subject, thereby identifying a polymorphism or mutation in the genome of the first polyploid subject.
3. The method according to claim 1 or 2, wherein the first and second polyploid subjects are from the same species, variety, cultivar or line.
4. The method according to any one of claims 1 to 3 additionally comprising amplifying a target nucleic acid prior to determining a position or plurality of positions of the specific nucleotide residue within the target nucleic acid.
5. The method according to any one of claims 1 to 4, wherein a position or a plurality of positions at which the specific nucleotide residue occurs within a target nucleic acid is determined by performing a dideoxynucleotide triphosphate (ddNTP) terminated sequencing method.
6. The method according to claim 5, wherein a position or a plurality of positions at which the specific nucleotide residue occurs within the target sequence is determined by performing a method comprising:
(i) performing an amplification reaction with a primer comprising a locus-specific region capable of annealing to the target nucleic acid and a tag-sequence that does not anneal to the target nucleic acid to thereby amplify the target nucleic acid, wherein amplicons of the target nucleic acid comprise the tag sequence;
(ii) performing a sequencing reaction with a primer capable of annealing to the tag sequence in the presence of a specific ddNTP; and
(iii) detecting the molecular weight of the nucleic acid fragments produced at (ii), wherein the molecular weight of each fragment corresponds to the position or positions of the specific nucleotide corresponding to the specific ddNTP in the target nucleic acid.
7. The method according to claim 6, wherein the primer capable of annealing to the tag sequence is labeled with a detectable marker to facilitate detection of the nucleic acid fragments.
8. The method according to any one of claims 1 to 7, comprising aligning the position or plurality of positions at which the specific nucleotide residue occurs within target nucleic acids from genomes of the first and second polyploid subjects prior to comparing a position(s) at which the specific nucleotide residue occurs within a target nucleic acid from genomes of the first and second polyploid subjects.
9. The method according to any one of claims 1 to 8, wherein determining the specific nucleotide that occurs in the first polyploid subject but not in the second polyploid subject comprises detecting the presence of the nucleotide at a position in the target nucleic acid from genomes of the first polyploid subject and the absence of the nucleotide at the position in the corresponding target nucleic acid from genomes of the second polyploid subject.
10. The method according to any one of claims 1 to 8, wherein determining the specific nucleotide that occurs in the first polyploid subject but not in the second polyploid subject comprises detecting an increased or decreased number of copies of the specific nucleotide at a position within the target nucleic acid from genomes of the first polyploid subject compared to within the corresponding target nucleic acid from genomes of the second polyploid subject.
11. The method according to claim 10, wherein an increase or decrease in the number of copies of the specific nucleotide is detected by detecting an increased or decreased amount of termination fragments produced by performing a ddNTP-mediated sequencing method.
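Claims 10 and 11 rely on a change in the amount of termination fragment at a position rather than on simple presence or absence. A minimal sketch of such a dosage comparison follows. The function name `dosage_changes`, the `min_fold` threshold and the peak values are assumptions made for illustration; the claims do not specify a normalisation scheme or a cut-off.

```python
def dosage_changes(first, second, min_fold=1.5):
    """Flag positions whose normalised fragment amount differs between subjects.

    `first` and `second` map position -> amount of termination fragment
    (for example a peak area). Amounts are normalised to each subject's
    total so that loading differences do not mimic a copy-number change.
    `min_fold` is an assumed, illustrative threshold.
    """
    total_1 = sum(first.values())
    total_2 = sum(second.values())
    if total_1 <= 0 or total_2 <= 0:
        raise ValueError("each subject needs a positive total fragment amount")

    changes = {}
    # Only positions detected in both subjects are compared here;
    # presence/absence differences are a separate case (claim 9).
    for pos in sorted(set(first) & set(second)):
        a = first[pos] / total_1
        b = second[pos] / total_2
        if a > b * min_fold:
            changes[pos] = "increased"
        elif b > a * min_fold:
            changes[pos] = "decreased"
    return changes


# Hypothetical peak areas: position 20 is weaker in the first subject.
peaks_1 = {5: 300, 12: 300, 20: 100, 27: 300}
peaks_2 = {5: 300, 12: 300, 20: 300, 27: 300}
changes = dosage_changes(peaks_1, peaks_2)
```

In this example only position 20 is flagged, as "decreased" in the first subject relative to the second.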
12. The method according to any one of claims 1 to 11, wherein linkage between the variable nucleotide and one or more nucleotides in a genome of a polyploid subject is determined by detecting the presence or absence of the variable nucleotide in one or more first mutant polyploid subject(s) in which at least a corresponding target nucleic acid in a genome has been deleted or otherwise removed, and in one or more second mutant polyploid subject(s) in which a corresponding target nucleic acid in other genome(s) has been deleted or otherwise removed, wherein absence of the variable nucleotide in the first mutant subject, and presence of the variable nucleotide in the second mutant polyploid subject indicates that the variable nucleotide is linked to the genome comprising the deletion in the first polyploid subject.
13. The method according to claim 12, wherein the mutant polyploid subjects are isogenic other than the deletion in the target regions.
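The linkage logic of claim 12 can be sketched as a lookup over deletion mutants: the variable nucleotide is attributed to the genome whose deletion makes it disappear. This is an illustrative sketch under stated assumptions. The function name `linked_genome` and the genome labels "A", "B" and "D" are hypothetical, and one mutant line per genome is assumed.

```python
def linked_genome(presence_by_deleted_genome):
    """Infer which genome carries a variable nucleotide from deletion mutants.

    `presence_by_deleted_genome` maps the label of the genome whose target
    region is deleted in a mutant line -> whether the variable nucleotide
    is still detected in that line. The nucleotide is assigned to a genome
    only if it is absent from exactly one deletion line; otherwise None is
    returned because the result is ambiguous.
    """
    absent = [
        genome
        for genome, present in presence_by_deleted_genome.items()
        if not present
    ]
    return absent[0] if len(absent) == 1 else None


# Hypothetical observations: the nucleotide vanishes only when genome B is deleted.
observations = {"A": True, "B": False, "D": True}
assigned = linked_genome(observations)
```

Returning `None` for zero or multiple absences keeps the sketch conservative; a real analysis would need to decide how to treat such cases.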
14. The method according to any one of claims 1 to 13 additionally comprising determining the sequence of nucleic acid adjacent to the polymorphism or mutation.
15. The method according to claim 14, wherein the sequence of nucleic acid adjacent to the polymorphism or mutation is determined by separately determining the position of each nucleotide within a region of the target sequence adjacent to the polymorphism or mutation and deriving the nucleotide sequence adjacent to the site of the polymorphism or mutation based on the position of each nucleotide residue so determined.
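Claim 15 derives the adjacent sequence from separately determined positions of each nucleotide. A minimal sketch of that merging step is shown below. The function name `sequence_from_positions` and the example positions are hypothetical; reporting "N" where no single base can be called is an assumption of this sketch, not something stated in the claim.

```python
def sequence_from_positions(positions_by_base, start, end):
    """Rebuild a sequence from per-base position lists.

    `positions_by_base` maps a base ("A", "C", "G", "T") to the positions
    at which that base was detected. Positions called for no base, or for
    more than one base (as may occur at a variable site in a polyploid),
    are reported as "N".
    """
    calls = {}
    for base, positions in positions_by_base.items():
        for pos in positions:
            calls.setdefault(pos, set()).add(base)

    sequence = []
    for pos in range(start, end + 1):
        bases = calls.get(pos, set())
        sequence.append(next(iter(bases)) if len(bases) == 1 else "N")
    return "".join(sequence)


# Hypothetical positions from four separate base-specific determinations.
positions = {"A": [1, 4], "C": [2, 7], "G": [3, 6], "T": [5]}
seq = sequence_from_positions(positions, 1, 8)
```

Position 8 has no call in this example, so the derived sequence ends in "N".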
16. The method according to claim 1, additionally comprising determining a nucleotide specific to the genome in which the polymorphism or mutation occurs and linked to the polymorphism or mutation in the target nucleic acid in which the polymorphism or mutation occurs.
17. The method according to claim 1, further comprising:
(i) determining a position or a plurality of positions at which the specific nucleotide residue occurs within a target nucleic acid in genomes of a first mutant polyploid subject in which at least the target nucleic acid in one genome of the first mutant polyploid subject has been deleted or otherwise removed, wherein the genome that has been deleted or otherwise removed also comprises the mutation or polymorphism; and
(ii) comparing the position(s) of the specific nucleotide residue determined at (i) to the position(s) of the specific nucleotide residue in the target nucleic acid in a corresponding target nucleic acid within genomes of a second mutant polyploid subject in which at least the target nucleic acid in a different genome to that of the first subject has been deleted or otherwise removed to determine a nucleotide that occurs in the first polyploid subject but not in the second polyploid subject thereby detecting a variable nucleotide, thereby detecting a nucleotide that occurs in one genome of a polyploid subject and not in another genome of the polyploid subject, wherein the nucleotide is linked to the polymorphism or mutation.
18. The method according to claim 16 or 17 additionally comprising determining the sequence of nucleic acid adjacent to the nucleotide specific to the genome in which the polymorphism or mutation occurs and linked to the polymorphism or mutation.
19. A process for producing a probe or primer for detecting a polymorphism or mutation, said process comprising:
(i) performing the method according to claim 14 or 15 to identify a polymorphism or mutation and the sequence of nucleic acid adjacent to said polymorphism or mutation; and
(ii) producing or obtaining a probe or primer comprising the sequence determined at (i) and capable of preferentially or specifically annealing or hybridizing to a nucleic acid comprising said polymorphism or mutation.
20. A process for producing a probe or primer for detecting a polymorphism or mutation, said process comprising:
(i) obtaining the sequence of a nucleic acid comprising a polymorphism or mutation and nucleic acid adjacent thereto determined by performing the method according to claim 14 or 15; and
(ii) producing or obtaining a probe or primer comprising the sequence obtained at (i) and capable of preferentially or specifically annealing or hybridizing to a nucleic acid comprising said polymorphism or mutation.
21. A process for producing a primer for amplifying a target sequence in a genome of a polyploid subject, said target sequence comprising a polymorphism or mutation, said process comprising:
(i) performing the method according to claim 18 to determine the sequence of a nucleic acid comprising a nucleotide specific to a genome of a polyploid subject in which a polymorphism or mutation occurs and that is linked to the polymorphism or mutation; and
(ii) producing or obtaining a probe or primer comprising the sequence determined at (i) and capable of preferentially or specifically annealing or hybridizing to nucleic acid linked to said polymorphism or mutation.
22. A process for producing a probe or primer for detecting a polymorphism or mutation, said process comprising:
(i) obtaining the sequence of a nucleic acid comprising a nucleotide specific to a genome of a polyploid subject in which a polymorphism or mutation occurs and that is linked to the polymorphism or mutation by performing the method according to claim 18; and
(ii) producing or obtaining a probe or primer comprising the sequence obtained at (i) and capable of preferentially or specifically annealing or hybridizing to a nucleic acid comprising said polymorphism or mutation.
23. A process for identifying a polymorphism or mutation associated with or causative of a trait, said process comprising:
(i) identifying a polymorphism or mutation by performing the method according to any one of claims 1 to 13;
(ii) analyzing a panel of subjects to determine those that comprise a trait, wherein not all members of the panel comprise the polymorphism or mutation; and
(iii) determining variation in the development of the trait wherein said variation indicates that the polymorphism or mutation is associated with or causative of the trait.
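Steps (ii) and (iii) of claim 23 amount to comparing how often the trait appears in panel members with and without the polymorphism or mutation. The sketch below only tabulates those frequencies. The function name `trait_frequency_by_genotype` and the panel data are hypothetical, and a difference in frequency is merely suggestive: a statistical test and further genetic evidence would be needed before inferring association or causation.

```python
def trait_frequency_by_genotype(panel):
    """Return the fraction of panel members showing the trait, by carrier status.

    `panel` is a list of (has_polymorphism, has_trait) boolean pairs, one
    per subject. A group with no members yields None.
    """
    counts = {True: [0, 0], False: [0, 0]}  # [members with trait, total members]
    for carrier, trait in panel:
        counts[carrier][1] += 1
        if trait:
            counts[carrier][0] += 1

    return {
        ("carriers" if carrier else "non_carriers"): (
            with_trait / total if total else None
        )
        for carrier, (with_trait, total) in counts.items()
    }


# Hypothetical panel of six subjects.
panel = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True),
]
freq = trait_frequency_by_genotype(panel)
```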
24. The method according to claim 23, additionally comprising determining or identifying a gene or nucleic acid expressed in nature linked to the polymorphism or mutation.
25. A process for identifying a mutation causative of a trait, said process comprising:
(i) inducing or producing a mutation in a polyploid subject;
(ii) screening the polyploid subject at (i) to identify a polyploid subject having a trait; and
(iii) performing the method according to any one of claims 1 to 13 to identify a mutation in the genome(s) of the polyploid subject at (ii), thereby identifying a mutation causative of the trait.
26. A method for identifying a mutation causative of a trait, said method comprising:
(i) inducing or producing a mutation in a polyploid subject;
(ii) performing the method according to any one of claims 1 to 13 to identify a polyploid subject comprising a mutation in a target nucleic acid in its genome(s); and
(iii) screening the polyploid subject identified at (ii) to identify a polyploid subject having a trait, thereby identifying a mutation causative of the trait.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US4254908P | 2008-04-04 | 2008-04-04 | |
| US61/042,549 | 2008-04-04 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2009121091A1 (en) | 2009-10-08 |
Family
ID=41134718
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/AU2008/001397 Ceased WO2009121091A1 (en) | 2008-04-04 | 2008-09-19 | Mapping method for polyploid subjects |
Country Status (2)
| Country | Link |
|---|---|
| AR (1) | AR068526A1 (en) |
| WO (1) | WO2009121091A1 (en) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101974629A (en) * | 2010-10-26 | 2011-02-16 | 西南大学 | Method for investigating origin of species of allopolyploid by virtual synthetic species |
| CN113474466A (en) * | 2019-02-21 | 2021-10-01 | 主基因有限公司 | Polyploid genotyping |
| CN116403642A (en) * | 2023-06-06 | 2023-07-07 | 中国农业科学院作物科学研究所 | A fast and fine-tuning method for QTL |
| CN116779035A (en) * | 2023-05-26 | 2023-09-19 | 成都基因汇科技有限公司 | Polyploid transcriptome subgenomic typing method and computer readable storage medium |
| CN120099172A (en) * | 2025-05-08 | 2025-06-06 | 北京嘉宝仁和医疗科技股份有限公司 | STR primer set, kit and method for detecting polyploidy, UPD and maternal contamination in samples |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030017487A1 (en) * | 2001-06-06 | 2003-01-23 | Pharmacogenetics, Ltd. | Method for detecting single nucleotide polymorphisms (SNP'S) and point mutations |
| WO2003041490A2 (en) * | 2001-11-12 | 2003-05-22 | Commonwealth Scientific And Industrial Research Organisation | Novel isoamylases and associated methods and products |
| WO2005039389A2 (en) * | 2003-10-22 | 2005-05-06 | 454 Corporation | Sequence-based karyotyping |
| WO2008006169A1 (en) * | 2006-07-12 | 2008-01-17 | Commonwealth Scientific And Industrial Research Organisation | Polynucleotides and methods for enhancing salinity tolerance in plants |
| WO2008025097A1 (en) * | 2006-08-31 | 2008-03-06 | Commonwealth Scientific And Industrial Research Organisation | Salt tolerant plants |
2008
- 2008-09-19: WO application PCT/AU2008/001397, publication WO2009121091A1 (en), status: not active, ceased
- 2008-09-19: AR application ARP080104101A, publication AR068526A1 (en), status: unknown
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101974629A (en) * | 2010-10-26 | 2011-02-16 | 西南大学 | Method for investigating origin of species of allopolyploid by virtual synthetic species |
| CN113474466A (en) * | 2019-02-21 | 2021-10-01 | 主基因有限公司 | Polyploid genotyping |
| CN116779035A (en) * | 2023-05-26 | 2023-09-19 | 成都基因汇科技有限公司 | Polyploid transcriptome subgenomic typing method and computer readable storage medium |
| CN116779035B (en) * | 2023-05-26 | 2024-03-15 | 成都基因汇科技有限公司 | Polyploid transcriptome subgenomic typing method and computer readable storage medium |
| CN116403642A (en) * | 2023-06-06 | 2023-07-07 | 中国农业科学院作物科学研究所 | A fast and fine-tuning method for QTL |
| CN116403642B (en) * | 2023-06-06 | 2023-08-04 | 中国农业科学院作物科学研究所 | Quick and fine positioning method for QTL |
| CN120099172A (en) * | 2025-05-08 | 2025-06-06 | 北京嘉宝仁和医疗科技股份有限公司 | STR primer set, kit and method for detecting polyploidy, UPD and maternal contamination in samples |
Also Published As
| Publication number | Publication date |
|---|---|
| AR068526A1 (en) | 2009-11-18 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Semagn et al. | An overview of molecular marker methods for plants | |
| Madhumati | Potential and application of molecular markers techniques for plant genome analysis | |
| Appleby et al. | New technologies for ultra-high throughput genotyping in plants | |
| US8771952B2 (en) | Substances and methods for a DNA based profiling assay | |
| JP6234463B2 (en) | Nucleic acid multiplex analysis method | |
| US20100223293A1 (en) | Polymorphic Markers and Methods of Genotyping Corn | |
| US20060135758A1 (en) | Soybean polymorphisms and methods of genotyping | |
| WO2003060163A2 (en) | Discrimination and detection of target nucleotide sequences using mass spectrometry | |
| WO2003020983A1 (en) | Allele specific pcr for genotyping | |
| HK1197270A1 (en) | Ssr markers for plants and uses thereof | |
| Bagge et al. | Functional markers in wheat: technical and economic aspects | |
| EP2195453A2 (en) | Method of amplifying nucleic acid | |
| US20170283854A1 (en) | Multiplexed pcr assay for high throughput genotyping | |
| WO2006094360A1 (en) | Method of amplifying nucleic acid | |
| WO2009121091A1 (en) | Mapping method for polyploid subjects | |
| Agrawal et al. | Molecular markers | |
| Karaca et al. | Molecular markers in Salvia L.: past, present and future | |
| KR102777919B1 (en) | KASP Primer Set Based on SNP for Discriminating Anser fabalis and Anser albifrons, and Uses thereof | |
| Singh et al. | Polymerase chain reaction-based markers | |
| Oefner | Sequence variation and the biological function of genes: methodological and biological considerations | |
| KR20230068537A (en) | Biomarkers for determining the maturation date of wheat | |
| Singh et al. | Molecular markers in plants | |
| KR102665741B1 (en) | Molecular marker for discriminating bacterial wilt-resistant pepper and uses thereof | |
| Xu | Molecular breeding tools: markers and maps. | |
| Singh et al. | Sequence-based markers |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08800032; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 08800032; Country of ref document: EP; Kind code of ref document: A1 |