
WO2014071250A1 - Methods for detecting and mapping modifications to nucleic acid polymers using nanopore systems - Google Patents

Info

Publication number
WO2014071250A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
nanopore
nucleotide
acid polymer
dna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2013/068162
Other languages
French (fr)
Inventor
Jens H. Gundlach
Ian DERRINGTON
Andrew Laszlo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Washington Center for Commercialization
Original Assignee
University of Washington Center for Commercialization
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Washington Center for Commercialization
Publication of WO2014071250A1
Anticipated expiration
Ceased

Links

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism

Definitions

  • sequence listing associated with this application is provided in text format in lieu of a paper copy and is hereby incorporated by reference into the specification.
  • the name of the text file containing the sequence listing is 43321_SeqList-FINAL.txt.
  • the text file is 19KB; was created on 01 November 2013; and is being submitted via EFS-Web with the filing of the specification.
  • The nucleic acid DNA is often referred to as the "blueprint for life." However, there is more to the code than DNA sequence alone. Modifications can occur on the canonical nucleotide subunits that can affect the functional information embedded in the DNA. For example, epigenetic factors govern the DNA blueprint's transcription and translation into protein. Thus, there is a rapidly growing interest in understanding these modifications, such as epigenetic factors.
  • Epigenetic modifications such as methylation and/or hydroxymethylation of DNA can be natural processes by which normal cells function and carry out or inhibit many cellular functions.
  • epigenetic modifications are known to be involved with normal silencing and/or prevention of gene expression, thereby enabling a cell to essentially turn off one or more genes.
  • the most common epigenetic DNA modification is the methylation of cytosine leading to 5-methylcytosine (mC).
  • cytosine methylations occur in C-G dinucleotides (CpG; the "CpG" shorthand indicates that the cytosine and guanine are linked in the same strand by a single phosphate and distinguishes this relationship from a C:G pairing of complementary strands, such as in double-stranded DNA).
  • Methylation is associated with gene regulation (i.e., highly methylated DNA tends to be less transcriptionally active) and therefore has implications for cell development, aging, and diseases such as cancer. Further oxidation of the methyl residue results in 5-hydroxymethylcytosine (hC). Because of its relatively recent discovery in mammalian tissue, the function of hC is less well explored.
  • methylation patterns are tissue specific and change over the life of an organism as it develops or is exposed to certain chemicals and environmental conditions. In some cases, these changes are heritable through multiple generations.
  • nucleotide modifications such as methylation
  • precise mapping of modifications may yield more pertinent information to research and ultimately to clinical diagnosis of gene-regulation-related disease, than sequencing the standard four bases alone.
  • Clinical uses will require fast, inexpensive, and reliable detection methods to map modifications such as methylation. Because such modification patterns vary between cells, it is preferable to use small, native, unamplified DNA samples, making this task suitable for single-molecule techniques.
  • SMRT Single-molecule real-time sequencing
  • The durations of the pauses are stochastically distributed, and the change in kinetics caused by mC is subtle. Thus, detection requires averaging over dozens of reads, complicating methylation detection.
  • nanochannels have been used as nano-Coulter counters to measure the correlation between DNA methylation and certain histone modifications for single chromatin molecules. However, this method lacks single-nucleotide resolution.
  • the present disclosure provides a method of detecting a nucleotide modification in a nucleic acid polymer.
  • the method comprises applying an electrical field to a nanopore system comprising a first conductive liquid medium in liquid communication with a second conductive liquid medium through a nanopore; translocating the nucleic acid polymer through a nanopore from the first conductive liquid medium to the second conductive liquid medium; detecting an ion current to provide a current pattern associated with a portion of the nucleic acid polymer; and comparing the current pattern to a reference current pattern associated with the same nucleotide sequence as the portion of the nucleic acid polymer without any modifications, wherein a difference between the current pattern and the reference current pattern indicates the presence of a modified nucleotide in the nucleic acid polymer.
  • the nucleic acid polymer can be DNA, RNA, mRNA, PNA, or a combination thereof.
  • the DNA is single stranded DNA (ssDNA).
  • the method further comprises identifying the type of nucleotide modification present in the polymer based on a character of the difference between the current pattern and the reference current pattern.
  • the character of the difference comprises the degree of current increase or decrease and/or the duration of the difference.
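The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `compare_to_reference`, the fixed threshold, and the example currents are assumptions made for this sketch and are not taken from the disclosure, which does not prescribe a particular comparison algorithm. The two patterns are assumed to be already aligned level by level.

```python
from typing import List, Tuple


def compare_to_reference(
    observed: List[float],
    reference: List[float],
    threshold: float,
) -> List[Tuple[int, float]]:
    """Return (level index, difference) for levels whose |difference| exceeds threshold."""
    if len(observed) != len(reference):
        raise ValueError("patterns must be aligned to the same number of levels")
    flagged = []
    for i, (obs, ref) in enumerate(zip(observed, reference)):
        diff = obs - ref  # positive: current increase, negative: current decrease
        if abs(diff) > threshold:
            flagged.append((i, diff))
    return flagged


# Hypothetical aligned level currents in pA; not data from the disclosure.
reference = [50.0, 42.0, 61.0, 55.0, 47.0]
observed = [50.2, 44.5, 63.1, 55.3, 46.9]
flagged = compare_to_reference(observed, reference, threshold=1.0)
```

The sign and the number of flagged levels correspond to the "character of the difference" (degree of increase or decrease, and duration) that the text uses to identify the type of modification.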
  • the nucleotide modification is an epigenetic modification or a modification resulting from DNA damage.
  • the nucleotide modification is a 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxycytosine, β-glucosyl-5-hydroxy-methylcytosine, 8-oxoguanine, 2-amino-adenosine, 2-amino-deoxyadenosine, 2-thiothymidine, pyrrolo-pyrimidine, 2-thiocytidine, a thymine dimer, or an abasic lesion.
  • the portion of the nucleic acid polymer comprises one or a plurality of contiguous nucleotides of the nucleic acid polymer. In some embodiments, the portion of the nucleic acid polymer comprises the nucleotide or nucleotide position with the modification. In some embodiments, the portion of the nucleic acid polymer further comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more additional nucleotides adjacent to the nucleotide or nucleotide position with the modification on one or both sides. In some embodiments, at least one additional nucleotide is adjacent at the 5' side of the nucleotide with the modification.
  • At least one additional nucleotide is adjacent at the 3' side of the nucleotide with the modification.
  • the portion of the nucleic acid polymer further comprises at least two additional nucleotides adjacent at the 5' side of the nucleotide with the modification and at least one nucleotide adjacent at the 3' side of the nucleotide with the modification.
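As an illustration of the "portion" described above, the following sketch extracts the modified position together with two nucleotides on its 5' side and one on its 3' side. The function name `modification_context`, its parameters, and the example sequence are hypothetical and are introduced only for this sketch.

```python
def modification_context(
    sequence: str,
    position: int,
    n_5prime: int = 2,
    n_3prime: int = 1,
) -> str:
    """Return the modified position plus flanking nucleotides.

    `sequence` is written 5' to 3' and `position` is the 0-based index
    of the nucleotide carrying the modification.
    """
    start = position - n_5prime
    end = position + n_3prime + 1
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking nucleotides around the position")
    return sequence[start:end]


# Hypothetical sequence; the cytosine at index 5 sits in a CpG site.
context = modification_context("ATTGACGTCA", position=5)
```

With the defaults, the returned four-nucleotide window has the form XYCG, where X and Y are the two nucleotides on the 5' side of the cytosine.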
  • the nanopore is a solid-state nanopore, protein nanopore, a hybrid solid state-protein nanopore, a biologically adapted solid-state nanopore, or a DNA origami nanopore.
  • the protein nanopore is a β-barrel pore, such as alpha-hemolysin or Mycobacterium smegmatis porin A (MspA), or a homolog thereof.
  • the protein nanopore sequence is modified from the wild-type sequence to contain at least one amino acid substitution, deletion, or addition. In some embodiments, the at least one amino acid substitution, deletion, or addition results in a net charge change in the nanopore.
  • the electric field is sufficient to cause the electrophoretic translocation of the nucleic acid polymer through the nanopore. In some embodiments, the electric field is between about 40 mV and 1 V.
  • the nanopore is associated with a molecular motor, wherein the molecular motor is capable of moving a nucleic acid polymer into or through the nanopore with an average translocation velocity that is less than the average translocation velocity at which the analyte translocates into or through the nanopore in the absence of the molecular motor.
  • the molecular motor is a polymerase, an exonuclease, a helicase, a topoisomerase, or a translocase. In some embodiments, the molecular motor is phi29.
  • the disclosure provides a method of detecting a nucleotide modification in a nucleic acid polymer, as generally described above, but including the step of generating a reference nucleic acid polymer that contains the same nucleotide sequence but does not contain any modifications to the canonical nucleotide structures.
  • the method comprises amplifying a target nucleic acid polymer that potentially contains at least one nucleotide modification to produce a reference nucleic acid polymer that does not contain a nucleotide modification; applying the target and reference nucleic acid polymers to a nanopore system comprising a first conductive liquid medium in liquid communication with a second conductive liquid medium through a nanopore; causing the translocation of the target nucleic acid polymer through the nanopore from the first conductive liquid medium to the second conductive liquid medium; detecting an ion current to provide a target current pattern associated with a portion of the target nucleic acid polymer; causing the translocation of the reference nucleic acid polymer through the nanopore from the first conductive liquid medium to the second conductive liquid medium; detecting an ion current to provide a reference current pattern associated with a portion of the reference nucleic acid polymer, wherein the portion of the reference nucleic acid polymer comprises the same nucleotide sequence as the portion of the target nucleic acid polymer;
  • the reference nucleic acid polymer is produced from the target nucleic acid polymer using at least one round of the polymerase chain reaction (PCR).
  • the method further comprises determining the position of the modified nucleotide in the target polymer.
  • the method further comprises identifying the modified nucleotide in the target polymer.
  • the method further comprises determining the sequence of at least a portion of the target nucleic acid polymer comprising the modified nucleotide.
  • the modified nucleotide is identified without knowledge of the nucleotide identity in the unmodified reference sequence.
  • FIGURE 1A-FIGURE 1C schematically illustrate a representative system useful for nanopore-based analysis of nucleic acid modifications according to the present disclosure.
  • FIGURE 1A is a diagram of an exemplary template configuration, wherein a hairpin primer oligonucleotide forms a loop and hybridizes to a portion of itself. The remainder of the hairpin primer hybridizes to a 3'-end domain of the template strand. The gap between the 3'-end of the template strand and the 5'-end of the hairpin primer is indicated. Adjacent to the hairpin primer, and hybridizing to an internal domain of the template strand, is a blocking oligomer.
  • FIGURE 1B schematically illustrates the use of a molecular motor in connection with a nanopore system to assist the controlled translocation of the nucleic acid template strand through the nanopore for analysis.
  • the nanopore e.g., MspA
  • the nanopore is embedded in a phospholipid bilayer and provides liquid communication between the upper chamber (i.e., cis side) and the lower chamber (i.e., trans side).
  • a voltage (e.g., 180 mV) is applied across the membrane, which causes an ion current to flow through the pore.
  • the molecular motor (shown in dark) is pulled into contact with the vestibule of the nanopore in the cis side, but cannot pass through, thus causing the unzipping of the blocking oligomer from the template strand.
  • the blocking oligomer is "unzipped" from the template strand.
  • the short and narrow constriction of MspA concentrates the ion current to resolve the relatively small differences between C, mC, and hC.
  • FIGURE 1C schematically illustrates the progression of template through the nanopore system over time.
  • the single stranded 5'-end of the template strand interacts with, or enters the vestibule of, the nanopore.
  • the molecular motor, which is attached to the double stranded portion of the complex, contacts the vestibule of the nanopore, but does not pass through, thus slowing translocation of the template strand through the nanopore.
  • the force of translocation eventually causes the complete unzipping of the blocking oligomer (the process of which is illustrated in FIGURE 1B), thus exposing the 3'-end of the hairpin primer to the molecular motor (e.g., a DNA polymerase such as phi29).
  • the molecular motor pulls the template strand back through the nanopore by enzymatic action where the hairpin primer is elongated based on the template strand.
  • FIGURE 2 graphically illustrates the mean consensus levels and standard deviation of the mean for the normalized current levels (ion current divided by the open-pore current) determined for several events of TGCC quadromers and modified TGCC quadromers (i.e., wherein the C is methylated or hydroxymethylated).
  • the current levels for the TGCC quadromers and modified TGCC quadromers are scaled and offset to each other on the same graph based on their calibration regions (i.e., a recognizable adapter sequence separated from the template strand by a single abasic nucleotide).
  • the unmodified levels are displayed in black and the modified current levels are displayed in gray and with a * symbol.
  • the quadromers and sequences illustrated in this figure are listed from 3' to 5', reflecting the order in which the data were recorded during phi29 synthesis. This order is the opposite of how DNA is typically listed; thus the CpGs appear as GpCs in this figure.
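The effect of the listing order can be seen in a small sketch. The sequence and the helper name `find_dinucleotide` are hypothetical. The reversal here is a plain reversal of the same strand (5' to 3' rewritten as 3' to 5'), not a reverse complement.

```python
def find_dinucleotide(sequence: str, dinucleotide: str) -> list:
    """Return 0-based start indices of every occurrence of a two-letter motif."""
    return [
        i for i in range(len(sequence) - 1)
        if sequence[i:i + 2] == dinucleotide
    ]


# Hypothetical strand written in the conventional 5' -> 3' order.
seq_5_to_3 = "TTCGAACGT"
# The same strand listed 3' -> 5', i.e. in the order of the recorded levels.
seq_3_to_5 = seq_5_to_3[::-1]

cpg_sites = find_dinucleotide(seq_5_to_3, "CG")
gpc_in_reversed = find_dinucleotide(seq_3_to_5, "GC")
```

Each CpG at index `i` of the 5' to 3' listing shows up as a GpC at index `len(seq) - 2 - i` of the 3' to 5' listing.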
  • the sequence of the unmodified (top) strand is set forth as SEQ ID NO:5.
  • the sequence of the modified strand is set forth herein as SEQ ID NO:6.
  • the location of hydroxymethylation and methylation in the modified strand TGCC quadromers are indicated by h and m, respectively. Below the levels, the quadromers corresponding to each level are indicated. Note that hydroxymethylation decreases the average current in the affected levels while methylation increases the average current of affected levels (see FIGURE 3).
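The normalization and summary statistics mentioned for FIGURE 2 can be sketched as follows. The sketch assumes that "standard deviation of the mean" denotes the standard error of the mean (sample standard deviation divided by the square root of the number of reads); the function names and the example currents are hypothetical.

```python
import math
import statistics


def normalize(ion_current_pa: float, open_pore_current_pa: float) -> float:
    """Normalized current: ion current divided by the open-pore current."""
    return ion_current_pa / open_pore_current_pa


def mean_and_sem(values):
    """Mean and standard error of the mean (assumed meaning of 'SD of the mean')."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical reads (pA) of one current level across several events.
level_reads_pa = [52.0, 54.0, 53.0, 55.0, 51.0]
open_pore_pa = 100.0

normalized = [normalize(x, open_pore_pa) for x in level_reads_pa]
mean, sem = mean_and_sem(normalized)
```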
  • FIGURE 3 graphically illustrates the normalized difference in mean current values for each current level.
  • a control cytosine remains unmodified and shows no significant difference in reads on both strands.
  • the sequence of the illustrated modified strand is set forth herein as SEQ ID NO:6.
  • FIGURE 4 graphically illustrates the normalized difference in mean current values for each level with seven Gaussians fit to the data.
  • the Gaussian peaks indicate the centroid of the effect, which corresponds well with the location of the epigenetic modifications.
  • the Gaussian fit for the unmodified GpC region has a very low amplitude relative to the modified GpCs.
  • This mapping difference yields a clear and easily interpretable way of mapping modifications in the nucleic acid polymer.
  • the sequence of the illustrated modified strand is set forth herein as SEQ ID NO:6.
  • FIGURE 5A-FIGURE 5D graphically illustrate the detection of methylation in a representative single-stranded DNA sequence containing a CpG site.
  • FIGURE 5A and FIGURE 5B illustrate segments of raw current traces for the DNA sequence with unmethylated and methylated cytosine, respectively. Ion current changes as DNA passes through the pore in single-nucleotide steps. The average current value for each current level is shown as a horizontal black or gray line, respectively.
  • the traces shown in FIGURE 5A and FIGURE 5B are for DNA with identical nucleotide sequence.
  • the current trace shown in FIGURE 5A contains a single unmethylated CpG site, whereas the trace in FIGURE 5B contains a single methylated CpG site.
  • FIGURE 5C illustrates the extracted average current values from each level from FIGURE 5A in a solid black line and from FIGURE 5B in a solid gray line, with the difference illustrated with shading.
  • the stochastic duration of current levels has been removed so that the DNA base sequence can be aligned to the observed current levels.
  • the DNA sequence, set forth herein as SEQ ID NO:7, is shown below (from 5' to 3', left to right) with the modified C indicated as "mC".
  • FIGURE 5D illustrates the current difference plot. The current levels obtained with methylated DNA were subtracted from the current levels obtained with unmethylated DNA. A single mCpG causes an ion current increase that persists over approximately four steps of the DNA through the pore.
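Removing the stochastic durations, as described for FIGURE 5C, amounts to replacing each current level by its average. The sketch below assumes that the level boundaries have already been found by some separate level-finding step, which is not shown and is not specified here. The function name `level_means` and the example trace are hypothetical.

```python
from statistics import mean


def level_means(trace, boundaries):
    """Collapse a raw trace into one average current per level.

    `boundaries` holds the sample indices at which a new level starts;
    how those indices are detected is outside this sketch.
    """
    edges = [0] + list(boundaries) + [len(trace)]
    return [mean(trace[a:b]) for a, b in zip(edges, edges[1:])]


# Hypothetical raw samples (pA): three levels lasting 4, 2, and 3 samples.
trace = [50.0, 50.0, 50.0, 50.0, 42.0, 42.0, 61.0, 61.0, 61.0]
levels = level_means(trace, boundaries=[4, 6])
```

After this step each level contributes exactly one value regardless of how long the motor paused, so two reads of the same sequence can be compared level by level.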
  • FIGURE 6 graphically illustrates the differences in the ion current level sequences resulting from DNA containing methylation or hydroxymethylation and from DNA without methylation or hydroxymethylation.
  • X is an abasic site.
  • The sequence illustrated in FIGURE 6A is set forth in SEQ ID NO:8, and the sequence illustrated in FIGURE 6B is set forth in SEQ ID NO:9.
  • the methylated positions are marked by a significant current increase that persists over approximately four steps of the DNA through the pore.
  • the amplitude and shape of the current difference depend on the nucleotides adjacent to the mC. In regions containing no methylation, current differences are insignificant.
  • hC results in a small reduction in current, although the magnitude of the current difference is less than observed for mC.
  • hC results in a current increase.
  • Error bars are the observed SD for single-molecule reads of methylated DNA and indicate the variation in single-molecule reads.
  • the gray boxes along the x axis are the SDs for reads of unmethylated DNA. See TABLE 2, for exact numbers of events.
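Because a methylated position is described as producing a current increase that persists over several consecutive steps, one simple way to locate candidate sites is to look for runs of consecutive levels whose difference exceeds a threshold. This is only a sketch under stated assumptions: the threshold, the minimum run length, the function name `significant_runs`, and the example values are all hypothetical, and only increases are considered here.

```python
def significant_runs(differences, threshold, min_length=3):
    """Return (first, last) level indices of runs where the difference stays above threshold."""
    runs = []
    start = None
    for i, d in enumerate(differences):
        if d > threshold:
            if start is None:
                start = i  # a new run begins at this level
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the read.
    if start is not None and len(differences) - start >= min_length:
        runs.append((start, len(differences) - 1))
    return runs


# Hypothetical current differences (pA) per level.
diffs = [0.1, -0.2, 1.5, 2.8, 3.1, 1.2, 0.0, 0.3, 1.4, 0.1]
runs = significant_runs(diffs, threshold=1.0)
```

The isolated value at index 8 is ignored because it does not persist over enough levels. Detecting reductions would require an analogous pass on the negated differences.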
  • FIGURE 7A and FIGURE 7B graphically illustrate that the DNA sequence context changes the resulting current difference pattern when a modified cytosine replaces a cytosine at a CpG site.
  • FIGURE 7A shows the current difference patterns caused by the sequence XYmCpG, where X and Y are any of the four nucleotides A, C, G, and T.
  • FIGURE 7B shows the current difference patterns caused by the sequence XYhCpG, where X and Y are any of the four nucleotides A, C, G, and T.
  • the right-most column and bottom row of each figure display the current differences averaged over the nucleotides X or Y, respectively.
  • FIGURE 7A illustrates that the maximum difference reaches 7 pA for AAmCpG and is only 1-2 pA when XY contains a thymine.
  • the average maximum difference is approximately 2 pA.
  • the number of levels showing a significant current difference varies from 3 to 5. The difference is maximal when the mC is immediately above the constriction of the nanopore (see FIGURE 8) and the distribution is skewed.
  • FIGURE 7B illustrates that the current deviations due to hC are more complex.
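The averaging over X or Y described for FIGURE 7 can be sketched as element-wise averages across a grid of difference profiles indexed by the context (X, Y). The data layout, the function names, and the placeholder profiles below are assumptions for illustration and carry no measured values.

```python
BASES = "ACGT"


def average_profiles(profiles):
    """Element-wise average of equally long difference profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def marginal_averages(grid):
    """Average the profiles over Y for each X, and over X for each Y."""
    over_y = {x: average_profiles([grid[(x, y)] for y in BASES]) for x in BASES}
    over_x = {y: average_profiles([grid[(x, y)] for x in BASES]) for y in BASES}
    return over_y, over_x


# Placeholder profiles (3 levels each) keyed by (X, Y); purely synthetic numbers.
grid = {
    (x, y): [float(BASES.index(x)), float(BASES.index(y)), 1.0]
    for x in BASES
    for y in BASES
}
over_y, over_x = marginal_averages(grid)
```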
  • FIGURE 8A and FIGURE 8B schematically illustrate spatial methylation sensitivity of MspA.
  • the variable shading indicates the region of higher electric field within MspA.
  • FIGURE 8A illustrates that when mC is cis of the constriction, it is in a high field region and it modulates the ion current. Other nucleotides that are also within the high field region determine the magnitude of the mC-specific signal.
  • FIGURE 8B illustrates that when mC is trans of the constriction, it is outside the high field region and no longer affects the current.
  • the sequence illustrated in FIGURE 8A is set forth as SEQ ID NO: 12 and the sequence illustrated in FIGURE 8B is set forth as SEQ ID NO: 13.
  • FIGURE 9A-FIGURE 9D graphically illustrate the differences in ion current resulting from multiple adjacent mCs and hCs.
  • Current differences [$I_{\mathrm{modified}} - I_{\mathrm{unmodified}}$] for four DNA strands containing different methylation (and hydroxymethylation) patterns.
  • Although CpGs rarely occur in such high density, the illustrated data demonstrate that it is possible to discern multiple adjacent mCpGs and hCpGs.
  • The sequences illustrated in FIGURE 9A-FIGURE 9D are set forth herein as SEQ ID NOS:14-17, respectively.
  • FIGURE 9A, which illustrates data from a strand containing one mC and one hC (as indicated at the bottom), demonstrates that one can simultaneously detect mC and hC in a single strand.
  • FIGURE 9B illustrates the current difference resulting from a strand with identical sequence to that shown in FIGURE 9A, but containing four mCs as well as two hCs (indicated at the bottom). As demonstrated, even with this density of modified CpGs, individual mCs and hCs can be resolved.
  • FIGURE 9C illustrates the current differences resulting from a strand with adjacent mCpG sites. The modification density results in wide and large current difference profiles. The current difference profiles for individual mCs seemingly superimpose.
  • FIGURE 9D illustrates the current differences for a strand with identical sequence to that in FIGURE 9C but with two mCs replaced by two hCs (indicated at the bottom).
  • the effects of mC and hC counteract one another.
  • the result is approximately a superposition of the signals shown in FIGURE 7.
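The approximate superposition described above can be illustrated by summing per-site difference profiles at their respective offsets. The profile shapes, their amplitudes, and the function name `superpose` are hypothetical; real profiles depend on sequence context, as noted for FIGURE 7.

```python
def superpose(n_levels, sites):
    """Sum per-site difference profiles into one combined difference trace.

    `sites` is a list of (start_level, profile) pairs.
    """
    total = [0.0] * n_levels
    for start, profile in sites:
        for offset, value in enumerate(profile):
            idx = start + offset
            if 0 <= idx < n_levels:
                total[idx] += value
    return total


# Hypothetical profiles (pA): an increase for mC and a smaller reduction for hC.
mc_profile = [1.0, 3.0, 2.0, 1.0]
hc_profile = [-0.5, -1.5, -1.0, -0.5]

combined = superpose(8, [(1, mc_profile), (3, hc_profile)])
```

Where the two profiles overlap (levels 3 and 4 in this example), the opposite signs partly cancel, which mirrors the counteracting effect described for closely spaced mC and hC.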
  • FIGURE 10A-FIGURE 10D graphically illustrate the classification power for individual-level positions surrounding CpG sites.
  • FIGURE 10A illustrates the t-test value for each level and each measured sequence context (XYCpG), testing the unmethylated hypothesis XYCpG, against the methylated hypothesis XYmCpG. Darker values indicate that a level has more power to call the methylation status of the CpG.
  • FIGURE 10B similarly illustrates the predictive power of single levels to call hydroxymethylation.
  • FIGURE 10C similarly illustrates the predictive power of single levels to call methylation and hydroxymethylation.
  • the lower plots in FIGURE 10A-FIGURE 10C show the t-test value for each position, averaged over all sequence contexts.
  • FIGURE 10D illustrates the classification frequency using just one specific level at the indicated positions.
  • FIGURE 10D, part (i) illustrates the rates that strands containing a mC were called correctly as mCpG (indicated as "mC"), or incorrectly as hCpG (indicated as "hC") and unmethylated CpG (indicated with "C").
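A per-level t statistic of the kind shown in FIGURE 10 can be sketched as follows. The text does not state which form of the t-test was used; Welch's unequal-variance form is an assumption made here, and the function name `welch_t` and the example currents are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (assumed test variant)."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical currents (pA) of one level across reads of each strand type.
methylated = [54.0, 55.0, 56.0, 55.0]
unmethylated = [50.0, 51.0, 52.0, 51.0]

t_value = welch_t(methylated, unmethylated)
```

A larger absolute t value for a level means that level separates the two hypotheses more clearly; repeating the calculation for every level position and sequence context would give a grid analogous to the one the figure describes.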
  • the present disclosure generally relates to methods for measuring, diagnosing, visualizing, and/or detecting modifications in nucleic acids through the use of nanopore-based analysis.
  • the methods and compositions are useful for accurately detecting, distinguishing, and mapping epigenetic modifications in nucleic acids such as 5-methylcytosine (mC) and 5-hydroxymethylcytosine (hC).
  • Nanopore analysis is an emerging single-molecule technique that has shown promise for DNA sequencing and analysis. As is described in more detail below, in nanopore sequencing, a thin membrane containing a single nanometer-sized pore divides a salt solution into two wells, cis and trans. A voltage across the membrane causes an ion current through the pore.
  • This current can also facilitate the interaction of analytes, such as DNA, with the nanopore, in some cases driving the analyte through the pore from one side to the other.
  • the nucleotides at the narrowest section of the pore modulate the ion current.
  • Solid-state nanopores have been used to detect the bulk presence of mC and hC in double-stranded DNA (dsDNA) (Wanunu, M., et al., "Discrimination of Methylcytosine From Hydroxymethylcytosine in DNA Molecules," Journal of the American Chemical Society 133(3):486-492, 2011). Recently, solid-state nanopores were also used to detect dsDNA complexed with methyl-binding proteins and thereby indirectly measured the approximate location of individual methylation sites (Shim, J., et al., "Detection and Quantification of Methylation in DNA Using Solid-State Nanopores," Scientific Reports 3:1389, 2013).
  • the present inventors have developed an approach using nanopore-based analysis that can detect, map, and distinguish (i.e., accurately identify) multiple, distinct DNA modifications within a single DNA template polymer.
  • the engineered biological protein pore Mycobacterium smegmatis porin A (MspA) was used to detect and map 5-methylcytosine and 5-hydroxymethylcytosine within single strands of DNA with single-nucleotide resolution.
  • DNAP phi29 DNA polymerase
  • a comparison of the current levels generated with DNA containing methylated or hydroxymethylated CpG sites to current levels obtained with unmethylated copies of the same DNA sequence resulted in a surprisingly precise indication of methylated or hydroxymethylated CpG sites.
  • the detection efficiency in a quasi-random DNA strand was 97.5 ± 0.7% for methylation and 97 ± 0.9% for hydroxymethylation.
  • the disclosed approach can be applied to detection and mapping of modifications, such as epigenetic modifications or the results of DNA damage, that occur in genomic DNA or RNA. Such information can be valuable for clinical uses, such as assessing associations of such changes with risk or presence of disease.
  • the present disclosure provides a method of detecting a nucleotide modification in a nucleic acid polymer.
  • the method comprises applying an electrical field to a nanopore system comprising a first conductive liquid medium in liquid communication with a second conductive liquid medium through a nanopore.
  • the nucleic acid polymer is translocated through a nanopore from the first conductive liquid medium to the second conductive liquid medium.
  • An ion current is detected to provide a current pattern associated with a portion of the nucleic acid polymer.
  • the current pattern is compared to a reference current pattern associated with the same nucleotide sequence as the portion of the nucleic acid polymer without any modifications, wherein a difference between the current pattern and the reference current pattern indicates the presence of a modified nucleotide in the polymer.
  • nucleic acid can refer to a deoxyribonucleotide polymer (DNA), ribonucleotide polymer (RNA, including mRNA), peptide nucleic acids (PNAs) and phosphorothioate DNA, in either single- or double-stranded form.
  • the nucleic acid subunits for each distinct nucleic acid polymer-type are commonly known.
  • the canonical polymer subunits of DNA are referred to herein as adenine (A), guanine (G), cytosine (C), and thymine (T).
  • these are generally referred to herein as nucleotides or nucleotide residues.
  • U uracil
  • T thymine
  • the present disclosure is directed to detecting modifications that can occur within the nucleic acid polymers, and in some embodiments, modifications that specifically occur to the individual subunits of the nucleic acid polymers, i.e., the individual nucleotides.
  • modification encompasses any chemical change in the structure of the nucleic acid polymer subunit that results in a noncanonical subunit structure. Such chemical changes can result from, for example, epigenetic modifications (such as to genomic DNA or RNA), or damage resulting from radiation, chemical, or other means.
  • the term "nucleotide modification" does not refer to simple additions or deletions of canonical nucleotides to the sequence of the polymer. Nor does the term refer to substitutions of one canonical nucleotide for another canonical nucleotide for that polymer-type.
  • uracil (U) is considered a noncanonical nucleotide structure for DNA polymers (and, conversely, thymine (T) is considered a noncanonical nucleotide structure for RNA polymers).
  • the present disclosure is directed to the detection of nucleotide modifications, as defined.
  • the disclosure encompasses the detection of a noncanonical nucleotide structure within a nucleic acid polymer, which results from the act of modification.
  • Any of the foregoing noncanonical subunits include analog structures.
  • noncanonical nucleic acid subunits include uracil (for DNA), thymine (for RNA), 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxylcytosine, β-glucosyl-5-hydroxy-methylcytosine, 8-oxoguanine, 2-amino-adenosine, 2-amino-deoxyadenosine, 2-thiothymidine, pyrrolo-pyrimidine, 2-thiocytidine, or an abasic lesion.
  • An abasic lesion is a location along the deoxyribose backbone that lacks a base.
  • noncanonical structures can incorporate more than one nucleic acid subunit, such as thymine dimers.
  • a single target nucleic acid polymer can comprise a combination of any of the foregoing polymers and/or polymer subunits.
  • the polymer analyte is a combination of any two or more of DNA, RNA, and PNA.
  • the present disclosure addresses the detection of modified nucleic acid subunits that are noncanonical (i.e., have been modified from the canonical structure) with reference to the canonical subunit structures for the two or more types of nucleic acids that make up the single polymer.
  • Nanopore specifically refers to a pore having an opening with a diameter at its most narrow point of about 0.3 nm to about 2 nm.
  • Nanopores useful in the present disclosure include any pore capable of permitting the linear translocation of a nucleic acid polymer from one side to the other at a velocity amenable to monitoring techniques, such as techniques to detect current fluctuations.
  • the nanopore comprises a protein.
  • proteins can be β-barrel pores, outer membrane proteins (often of bacterial origin), ⁇ -toxin porins, and transport proteins.
  • Exemplary pores include alpha-hemolysin, Mycobacterium smegmatis porin A (MspA) and related porins such as from Nocardia farcinica, outer membrane protein (OmpATb), outer membrane protein F (OmpF), outer membrane protein G (OmpG), outer membrane phospholipase A, Neisseria autotransporter lipoprotein (NalP), lysenin, anthrax toxin and leukocidins, and homologs thereof, or other porins, as described in U.S. Pub. No.
  • a "homolog," as used herein, is a gene or gene product from another bacterial species that has a similar structure and evolutionary origin.
  • homologs of wild-type MspA, such as MppA, PorM1, PorM2, and Mmcs4296, can serve as the nanopore in the present invention.
  • Protein nanopores have the advantage that, as biomolecules, they self-assemble and are essentially identical to one another.
  • protein nanopores can be wild-type or can be modified to contain at least one amino acid substitution, deletion, or addition.
  • the at least one amino acid substitution, deletion, or addition results in a different net charge of the nanopore.
  • the difference in net charge increases the difference of net charge as compared to the first charged moiety of the polymer analyte.
  • the at least one amino acid substitution, deletion, or addition results in a nanopore that is less negatively charged.
  • the resulting net charge is negative (but less so), is neutral (where it was previously negative), is positive (where it was previously negative or neutral), or is more positive (where it was previously positive but less so).
  • MspA nanopores can be modified with amino acid substitutions to result in a MspA mutant with a mutation at position 93, a mutation at position 90, position 91, or both positions 90 and 91, and optionally one or more mutations at any of the following amino acid positions: 88, 105, 108, 118, 134, or 139, with reference to the wild type amino acid sequence.
  • the MspA contains the mutations D90N/D91N/D93N, with reference to the wild type sequence positions (referred to therein as "M1MspA" or "M1-NNN"). In another embodiment, the MspA contains the mutations D90N/D91N/D93N/D118R/D134R/E139K, with reference to the wild type sequence positions (referred to therein as "M2MspA"). See U.S. Pub. No. 2012/0055792.
  • Such mutations can result in a MspA nanopore that comprises a vestibule having a length from about 2 to about 6 nm and a diameter from about 2 to about 6 nm, and a constriction zone having a length from about 0.3 to about 3 nm and a diameter from about 0.3 to about 3 nm, wherein the vestibule and constriction zone together define a tunnel.
  • the amino acid substitutions described in these examples provide a greater net positive charge in the vestibule of the nanopore, further enhancing the energetic favorability of interacting with a negatively charged analyte polymer end.
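  • As a rough illustration of how the listed substitutions shift the net charge, the sketch below sums formal side-chain charges for each substitution. The charge table, the function name `net_charge_shift`, and the parsing of one-letter mutation codes are assumptions made for illustration only. The result is a per-monomer count that ignores pKa shifts in the pore environment.

```python
# Formal side-chain charges near neutral pH for the residues involved here.
# This is a simplification: real charge states depend on local environment.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1, "N": 0}


def net_charge_shift(mutations):
    """Sum the change in formal charge for substitutions written like 'D90N'."""
    shift = 0
    for mutation in mutations:
        wild_type, mutant = mutation[0], mutation[-1]
        shift += SIDE_CHAIN_CHARGE[mutant] - SIDE_CHAIN_CHARGE[wild_type]
    return shift


M1 = ["D90N", "D91N", "D93N"]
M2 = M1 + ["D118R", "D134R", "E139K"]

print(net_charge_shift(M1))  # 3
print(net_charge_shift(M2))  # 9
```

  • Under these assumptions, each Asp-to-Asn substitution removes one negative charge, and each Asp-to-Arg or Glu-to-Lys substitution replaces a negative charge with a positive one, consistent with the less negative (or more positive) pore described above.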
  • the nanopores can include or comprise DNA-based structures, such as generated by DNA origami techniques.
  • For DNA origami-based nanopores for analyte detection, see PCT Pub. No. WO2013083983, incorporated herein by reference.
  • the nanopore can be a solid state nanopore.
  • Solid state nanopores can be produced as described in U.S. Patent Nos. 7,258,838 and 7,504,058, incorporated herein by reference in their entireties. Solid state nanopores have the advantage that they are more robust and stable. Furthermore, solid state nanopores can in some cases be multiplexed and batch fabricated in an efficient and cost-effective manner. Finally, they might be combined with micro-electronic fabrication technology.
  • the nanopore comprises a hybrid protein/solid state nanopore in which a nanopore protein is incorporated into a solid state nanopore.
  • the nanopore is a biologically adapted solid-state pore.
  • the nanopore comprises a vestibule and a constriction zone that together form a tunnel.
  • a "vestibule" refers to the cone-shaped portion of the interior of the nanopore whose diameter generally decreases from one end to the other along a central axis, where the narrowest portion of the vestibule is connected to the constriction zone.
  • a vestibule may generally be visualized as "goblet-shaped." Because the vestibule is goblet-shaped, the diameter changes along the path of a central axis, where the diameter is larger at one end than the opposite end. The diameter may range from about 2 nm to about 6 nm.
  • the diameter is about, at least about, or at most about 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, or 6.0 nm, or any range derivable therein.
  • the length of the central axis may range from about 2 nm to about 6 nm.
  • the length is about, at least about, or at most about 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, or 6.0 nm, or any range derivable therein.
  • When referring to "diameter" herein, one can determine a diameter by measuring center-to-center distances or atomic surface-to-surface distances.
  • a “constriction zone” refers to the narrowest portion of the tunnel of the nanopore, in terms of diameter, that is connected to the vestibule.
  • the length of the constriction zone can range, for example, from about 0.3 nm to about 20 nm. Optionally, the length is about, at most about, or at least about 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, or 3 nm, or any range derivable therein.
  • the diameter of the constriction zone can range from about 0.3 nm to about 2 nm.
  • the diameter is about, at most about, or at least about 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, or 3 nm, or any range derivable therein.
  • the range of dimension can extend up to about 20 nm.
  • the constriction zone of a solid state nanopore is about, at most about, or at least about 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nm, or any range derivable therein.
  • the nanopore is disposed within a membrane, thin film, or lipid bilayer, which can separate the first and second conductive liquid media and provide a nonconductive barrier between them.
  • the nanopore thus provides liquid communication between the first and second conductive liquid media.
  • the pore provides the only liquid communication between the first and second conductive liquid media.
  • the liquid media typically comprise electrolytes or ions that can flow from the first conductive liquid medium to the second conductive liquid medium through the interior of the nanopore. Liquids employable in methods described herein are well-known in the art. Descriptions and examples of such media, including conductive liquid media, are provided in U.S. Patent No.
  • the first and second liquid media may be the same or different, and either one or both may comprise one or more of a salt, a detergent, or a buffer. Indeed, any liquid media described herein may comprise one or more of a salt, a detergent, or a buffer. Additionally, any liquid medium described herein may comprise a viscosity-altering substance or a velocity-altering substance.
  • the nanopore is capable of interacting with the nucleic acid analyte polymer serving as the target or focus of a modification analysis herein.
  • the polymer and nanopore are capable of interacting such that the polymer can translocate through the nanopore from a first conductive liquid medium to a second conductive liquid medium.
  • the translocation is preferably in a linear fashion, through the pore to the other side.
  • the terms "interact" or "interacting" indicate that the analyte moves into at least an interior portion of the nanopore and, optionally, moves into the constriction zone so as to maximally affect the measurable current through the nanopore.
  • the terms "through the nanopore" or "translocate" are used to convey that at least some portion of the polymer analyte enters one side of the nanopore and moves to and out of the other side of the nanopore.
  • the first and second conductive liquid media located on either side of the nanopore are referred to as being on the cis and trans regions, where the analyte polymer to be measured generally translocates first from the cis region to the trans region through the nanopore.
  • the analyte polymer to be measured can translocate from the trans region to the cis region through the nanopore.
  • the nanopore system used incorporated a molecular motor, a blocking oligo, and a hairpin primer.
  • the blocking oligo is unzipped from the template strand as the template strand passes linearly from the cis to the trans side.
  • the molecular motor pulls the strand backwards through the nanopore, from the trans to the cis side, by virtue of the polymerase action that is "primed" by the hairpin primer.
  • the entire length of the polymer does not pass through the pore, but sub-portions or segments of the polymer complete the pass through the nanopore for analysis.
  • the analyte nucleic acid polymer can be translocated through the nanopore using a variety of mechanisms.
  • the analyte polymer and/or reference sequence can be electrophoretically translocated through the nanopore by virtue of the electrical field that is applied to the system.
  • some nanopore systems also incorporate structural elements to apply an electrical field across the nanopore-bearing membrane or film.
  • the system can include a pair of drive electrodes that drive current through the nanopores.
  • the system can include one or more measurement electrodes that measure the current through the nanopore. These can be, for example, a patch-clamp amplifier or a data acquisition device.
  • nanopore systems can include an Axopatch-1B patch-clamp amplifier (Axon Instruments, Union City, CA) to apply voltage across the bilayer and measure the ionic current flowing through the nanopore.
  • the electrical field is sufficient to translocate a polymer analyte through the nanopore.
  • the voltage range that can be used can depend on the type of nanopore system being used.
  • the applied electrical field is between about 20 mV and about 260 mV, for protein-based nanopores embedded in lipid membranes.
  • the applied electrical field is between about 40 mV and about 200 mV.
  • the applied electrical field is between about 100 mV and about 200 mV.
  • the applied electrical field is about 180 mV.
  • the applied electrical field can be in a similar range as described, up to as high as 1 V.
  • nanopore systems can include a component that translocates a polymer through the nanopore enzymatically.
  • a molecular motor can be included to influence the translocation of polymers through the nanopore.
  • a molecular motor can be useful for facilitating entry of a polymer into the nanopore and/or facilitating or modulating translocation of the polymer through the nanopore.
  • the translocation velocity, or an average translocation velocity, is less than the translocation velocity that would occur without the molecular motor.
  • the molecular motor can be an enzyme.
  • Illustrative, nonlimiting examples useful for nanopore systems include polymerases, exonucleases, a Klenow fragment, helicases (such as hel308/Mbu, T7hp4A, RecD, XpD), translocases, and topoisomerases.
  • a DNA polymerase such as phi29 can be used to facilitate movement in both directions. See Cherf, G.M., et al., "Automated Forward and Reverse Ratcheting of DNA in a Nanopore at 5-Å Precision," Nature Biotechnology 30:344-348, 2012; and Manrao et al., 2012, both of which are incorporated herein by reference in their entireties.
  • An embodiment of the present system that utilizes a molecular motor is also schematically illustrated in FIGURES 1B and 1C.
  • the present aspect of the disclosure includes the step of detecting an ion current to provide a current pattern associated with a portion of the nucleic acid polymer.
  • characteristics of the nucleic acid polymer analyte, or subunit(s) thereof can be determined based on the effect of the polymer, or subunit(s) thereof, on a measurable signal when interacting with the nanopore, such as interactions with the outer rim, vestibule, or constriction zone of the nanopore.
  • the output signal produced by the nanopore system is any measurable signal that provides a multitude of distinct and reproducible signals depending on the physical characteristics of the polymer or polymer subunit(s).
  • the polymer subunit(s) that determine(s) or influence(s) a measurable signal is/are the subunit(s) residing in the "constriction zone," i.e., the three-dimensional region in the interior of the pore with the narrowest diameter.
  • the number of polymer subunits that influence the co-passage of electrolytes and, thus, a current output signal can vary.
  • the ionic current level through the pore is an output signal that can vary depending on the particular polymer subunit(s) residing in the constriction zone of the nanopore at any given time.
  • the current levels can vary to create a trace, or "current pattern," of multiple output signals corresponding to the contiguous sequence of the polymer subunits that have affected the current at each iterative step.
  • The detection of current levels, or "blockade" events, has been used to characterize a host of information about the structure of polymers, such as DNA, passing through, or held in, a nanopore in various contexts.
  • a "blockade" is evidenced by a change in ion current that is clearly distinguishable from noise fluctuations and is usually associated with the presence of an analyte molecule, e.g., one or more polymer subunits, within the nanopore, such as in the constriction zone.
  • the strength of the blockade, or change in current, will depend on a characteristic of the polymer subunit(s) present. Accordingly, in some embodiments, a "blockade" is defined against a "blockade reference" current level.
  • the blockade reference current level corresponds to the current level when the nanopore is unblocked (i.e., has no analyte structures present in, or interacting with, the nanopore).
  • the blockade reference current level corresponds to the current level when the nanopore has a known analyte (e.g., a known analyte polymer subunit) residing in the nanopore.
  • the current level returns spontaneously to the blockade reference level (if the nanopore reverts to an empty state, or becomes occupied again by the known analyte).
  • the current level proceeds to a level that reflects the next iterative translocation event of the polymer analyte domain through the nanopore, and the particular subunit(s) residing in the nanopore change(s).
  • the blockade is established when the current is lower than the blockade reference current level by an amount of about 1-100% of the blockade reference current level. It will be understood that the blockade reference current level can immediately precede the blockade event or, alternatively, be separated from the blockade event by a period of time with intervening current measurements.
  • the ionic current may be lower than the blockade reference current level by a threshold amount of about, at least about, or at most about 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%, or any range derivable therein, of the blockade reference current level when a polymer analyte domain subunit enters the nanopore.
  • With respect to the blockade reference current level defined by the presence of a known analyte (e.g., known polymer subunit(s)), the blockade is established when the current is lower or higher than the reference level by an amount of about 1-100% of the reference current level. It will be understood that the blockade reference current level can immediately precede the blockade event or, alternatively, be separated from the blockade event by a period of time with intervening current measurements.
  • the ionic current may be lower or higher than the blockade reference current level by a threshold of about, at least about, or at most about 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%, or any range derivable therein, of the blockade reference current level when a polymer analyte domain subunit enters the nanopore.
  • "Deep blockades" can be identified as intervals where the ionic current is lower (or higher) by at least 50% of the blockade reference level. Intervals where the current drops by less than 50% of the blockade reference level can be identified as "partial blockades."
  • the current level in a blockade remains at the reduced (or elevated) level for at least about 1.0 μs.
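  • The blockade definitions above can be sketched as a small classifier. This is a minimal illustration, not the disclosed implementation: the function name `classify_blockade`, its parameters, the use of the 1% lower bound as a cutoff, and the assumption that the minimum duration is expressed in microseconds are all choices made here for the example.

```python
def classify_blockade(current_pa, reference_pa, duration_us, min_duration_us=1.0):
    """Label a current interval relative to a positive blockade reference level.

    Returns "deep" when the current deviates from the reference by at least
    50%, "partial" when it deviates by 1% to under 50%, and "none" otherwise
    or when the interval is shorter than the minimum duration.
    """
    if reference_pa <= 0:
        raise ValueError("reference level must be positive")
    if duration_us < min_duration_us:
        return "none"
    fraction = abs(reference_pa - current_pa) / reference_pa
    if fraction < 0.01:
        return "none"
    return "deep" if fraction >= 0.5 else "partial"


print(classify_blockade(40.0, 100.0, duration_us=5.0))  # deep
print(classify_blockade(80.0, 100.0, duration_us=5.0))  # partial
```

  • Taking the absolute deviation covers both cases described above: a reference defined by an open pore, where blockades lower the current, and a reference defined by a known analyte, where the current may be lower or higher.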
  • the present inventors have determined that the measurable current pattern, specifically one or more blockades in a trace, is associated with the structure(s) of one or more contiguous nucleotides of a nucleic acid polymer that reside in the constriction zone of the nanopore during translocation. Furthermore, the inventors have determined that slight modifications, such as methylation and hydroxymethylation, on one or more specific nucleotide residues differentially affect the current flow as compared to a polymer with the same nucleotide sequence, but with unmodified nucleotides. The influence on the current flow is detectable and, as demonstrated below, can be used to accurately detect the modification and map it to the specific nucleotide.
  • the current pattern associated with a portion of the nucleic acid polymer is compared to a "reference current pattern" (as distinct from the "blockade reference level") associated with the same nucleotide sequence as the portion of the nucleic acid polymer, but wherein the sequence associated with the reference current pattern does not have any modifications to any of the nucleotides.
  • the presence of a modified nucleotide in the analyte polymer (or a portion thereof) is indicated when a difference between the current pattern and the reference current pattern is detected.
  • the constriction zone, i.e., the shape of the narrowest portion of the pore tunnel
  • the output signal for each iterative step during translocation is often affected by multiple contiguous nucleotides in the polymer sequence, specifically those that reside in the constriction zone at each iterative passage step.
  • each blockade event in the trace is influenced mostly by a quadromer (or 4-mer) of contiguous nucleotides that reside in the constriction zone at that time.
  • each individual nucleotide in the sequence can ultimately contribute to four blockade events if it passes completely through the constriction zone.
  • the inventors have shown that the specific difference in the signal that results from a modification of a single nucleotide is similarly observed over four blockade events and the profile of the difference in current signal over at least four blockade events is indicative of the specific type of modification and the location of the modification.
  • the specific profile of the difference in current signal over at least four blockade events is also influenced by the sequence context of the quadromer, i.e., the surrounding nucleotide sequence of the quadromer.
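  • To illustrate why a single nucleotide can contribute to four blockade events, the sketch below lists every 4-nucleotide window (quadromer) that contains a given position. The function name `quadromers_covering` and the example sequence are hypothetical; the window size of four follows the quadromer description above.

```python
def quadromers_covering(sequence, index, k=4):
    """Return (start, k-mer) pairs for every k-mer window containing sequence[index]."""
    first = max(0, index - k + 1)
    last = min(index, len(sequence) - k)
    return [(start, sequence[start:start + k]) for start in range(first, last + 1)]


# The C at index 4 of this made-up sequence sits in four successive quadromers.
for start, kmer in quadromers_covering("ATGCCGTA", 4):
    print(start, kmer)
# 1 TGCC
# 2 GCCG
# 3 CCGT
# 4 CGTA
```

  • A nucleotide near either end of the strand appears in fewer windows, which matches the qualifier that four blockade events are expected only if the nucleotide passes completely through the constriction zone.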
  • the determination of the presence of a modified nucleotide can depend on whether the difference between the analyte polymer current pattern and the reference current pattern is associated with a particular portion of the analyte polymer.
  • the current difference is associated with a portion of the nucleic acid analyte polymer that comprises one or a plurality of contiguous nucleotides of the nucleic acid polymer.
  • the portion of the nucleic acid analyte polymer comprises two or more contiguous nucleotides of the nucleic acid polymer, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more contiguous nucleotides of the nucleic acid polymer. In some embodiments, the portion of the nucleic acid analyte polymer comprises 3, 4, 5, or 6 contiguous nucleotides of the nucleic acid polymer. In some embodiments, the portion of the nucleic acid polymer comprises the nucleotide or nucleotide position with the modification.
  • the portion of the nucleic acid polymer also comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 contiguous nucleotides immediately adjacent to the nucleotide or nucleotide position with the modification.
  • the additional contiguous nucleotides can be immediately adjacent at the 5'-, the 3'-, or both sides of the nucleotide or nucleotide position with the modification.
  • the portion of the nucleic acid polymer includes at least one nucleotide immediately 5'- to the modified nucleotide. In some embodiments, the portion of the nucleic acid polymer includes at least one nucleotide immediately 3'- to the modified nucleotide.
  • the portion of the nucleic acid polymer includes at least one nucleotide immediately 5'- and at least one nucleotide immediately 3' to the modified nucleotide.
  • the two nucleotides positioned immediately 5' to the modified nucleotide had the most influence on the current signal difference, followed by the single nucleotide position immediately 3' to the modified nucleotide. Persons of skill in the art would be able to determine the positions of maximal influence on the signal difference for any nanopore of interest.
  • the inventors discovered that the identity of the specific nucleotides that occupy each position in the nucleic acid polymer portion influences the identifiable character of the current difference.
  • every sequence variation of XY m CG (and XY h CG) was tested and the various current profiles (and differences with the reference sequences) were catalogued (see, e.g., FIGURES 7A and 7B).
  • Such data can be used as a reference, or "look-up" table, to assist the mapping and identification of the specific modifications that are detected.
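  • One way such a catalogue might be organized is as a table keyed by the 5' sequence context and modification type, queried by nearest profile. The sketch below is an assumption-laden illustration: the table contents are placeholder numbers rather than measured values, and the names `LOOKUP` and `best_match` as well as the least-squared-error matching rule are hypothetical choices, not taken from the disclosure.

```python
# Hypothetical catalogue: (5' context "XY", modification) -> current-difference
# profile in pA over four consecutive levels. Values are placeholders, not data.
LOOKUP = {
    ("AT", "mC"): [0.5, 2.0, 3.0, 1.0],
    ("AT", "hC"): [-0.5, -1.5, -2.5, -1.0],
    ("GG", "mC"): [1.0, 4.0, 2.0, 0.5],
}


def best_match(context, observed):
    """Return the catalogued modification whose profile for this context is
    closest (least squared error) to the observed profile, or None."""
    candidates = {mod: prof for (ctx, mod), prof in LOOKUP.items() if ctx == context}
    if not candidates:
        return None

    def squared_error(profile):
        return sum((o - p) ** 2 for o, p in zip(observed, profile))

    return min(candidates, key=lambda mod: squared_error(candidates[mod]))


print(best_match("AT", [0.4, 1.8, 2.9, 1.2]))  # mC
```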
  • the present method further comprises identifying the type of modification present in the nucleic acid polymer based on a character of the difference between the current pattern and the reference current pattern.
  • the character is the duration of the difference, e.g., how much time or many blockade events the difference is observed for.
  • the character is the degree of change, e.g., how much the current pattern differs from the reference pattern in terms of increase or decrease.
  • the character can also combine a particular range of current increase or decrease over a particular time (or number of blockade events).
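  • The "character" of a difference, as described above, combines how many blockade events it spans and how large the change is. A minimal sketch of one way to summarize it follows. The function name `difference_character`, the noise cutoff, and the rule of taking the longest run of same-signed differences are assumptions for illustration.

```python
def difference_character(differences_pa, noise_pa):
    """Summarise the longest run of same-signed differences exceeding the noise."""
    best = {"events": 0, "peak_pa": 0.0, "direction": None}
    run = []
    for d in list(differences_pa) + [0.0]:  # trailing 0.0 closes the final run
        same_sign = not run or (d > 0) == (run[-1] > 0)
        if abs(d) > noise_pa and same_sign:
            run.append(d)
            continue
        if len(run) > best["events"]:
            peak = max(run, key=abs)
            best = {
                "events": len(run),
                "peak_pa": peak,
                "direction": "increase" if peak > 0 else "decrease",
            }
        # Start a new run if this value is itself above the noise cutoff.
        run = [d] if abs(d) > noise_pa else []
    return best


print(difference_character([0.1, 1.5, 3.0, 2.0, 0.8, -0.1], noise_pa=0.5))
# {'events': 4, 'peak_pa': 3.0, 'direction': 'increase'}
```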
  • knowledge of the specific sequence of the nucleic acid portion can contribute to the determination of whether the difference is indicative of a modified nucleotide.
  • the comparable current patterns for the portion of the analyte nucleic acid polymer and reference current pattern must be associated with the same sequence, whether or not known.
  • a reference nucleic acid polymer can be replicated or amplified from the analyte nucleic acid polymer using conventional techniques that replicate the (potentially unknown) sequence, but that do not replicate the modification. Such techniques include using the polymerase chain reaction.
  • the reference current pattern can be obtained in a number of ways. In some embodiments, the reference current pattern can be generated de novo by similarly applying a reference nucleic acid polymer to nanopore analysis to generate a current pattern.
  • the reference current pattern was previously determined and is available in a reference or look-up table.
  • the reference current pattern can be derived from the modeling signals that would be expected from the structure of the reference portion, i.e., the sequence without modification.
  • the present disclosure provides a method that involves generating a reference current pattern for comparison.
  • this aspect provides a method of detecting a nucleotide modification in a nucleic acid polymer, comprising the step of amplifying a target nucleic acid polymer that potentially contains at least one nucleotide modification to produce a reference nucleic acid polymer that does not contain a nucleotide modification.
  • the method further comprises applying the target and reference nucleic acid polymers to a nanopore system comprising a first conductive liquid medium in liquid communication with a second conductive liquid medium through a nanopore.
  • the target nucleic acid polymer is caused to translocate through the nanopore from the first conductive liquid medium to the second conductive liquid medium, and an ion current is detected to provide a target current pattern associated with a portion of the target nucleic acid polymer.
  • the reference nucleic acid polymer is caused to translocate through the nanopore from the first conductive liquid medium to the second conductive liquid medium, and an ion current is detected to provide a reference current pattern associated with a portion of the reference nucleic acid polymer, wherein the portion of the reference nucleic acid polymer comprises the same nucleotide sequence as the portion of the target nucleic acid polymer.
  • the target current pattern is compared to the reference current pattern, wherein a difference between the current pattern and the reference current pattern indicates the presence of a modified nucleotide in the target nucleic acid polymer.
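  • The comparison step can be sketched as a level-by-level subtraction followed by a threshold test. This is a hedged illustration only: it assumes the target and reference current levels have already been extracted and aligned one-to-one, the current values are made-up placeholders, and the names `current_differences`, `flag_modified_levels`, and `threshold_pa` are hypothetical.

```python
def current_differences(target_levels, reference_levels):
    """Return target minus reference for each aligned current level (pA)."""
    if len(target_levels) != len(reference_levels):
        raise ValueError("patterns must be aligned to the same number of levels")
    return [t - r for t, r in zip(target_levels, reference_levels)]


def flag_modified_levels(target_levels, reference_levels, threshold_pa):
    """Return indices of levels whose difference meets or exceeds the threshold."""
    diffs = current_differences(target_levels, reference_levels)
    return [i for i, d in enumerate(diffs) if abs(d) >= threshold_pa]


# Placeholder current levels in pA, not measured data.
target = [50.0, 52.5, 55.0, 48.0, 47.0, 60.0]
reference = [50.0, 50.0, 51.0, 45.5, 46.0, 60.0]

print(current_differences(target, reference))  # [0.0, 2.5, 4.0, 2.5, 1.0, 0.0]
print(flag_modified_levels(target, reference, threshold_pa=2.0))  # [1, 2, 3]
```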
  • the technique of replication is limited to approaches that provide reference nucleic acid polymers that contain the same nucleotide sequences as the portion of the target nucleic acid polymer, but that do not retain the modifications to be detected.
  • the reference nucleic acid polymer is produced from the target nucleic acid polymer using at least one round of the polymerase chain reaction (PCR).
  • the following describes the application of a nanopore system to detect and map methylated and hydroxymethylated cytosine nucleotides in a DNA polymer analyte.
  • the M2-NNN-MspA protein was purified from Mycobacterium smegmatis as previously described in Butler et al. (2008).
  • DNA oligonucleotides were synthesized at Stanford University Protein and Nucleic Acid Facility and purified at their facility using column purification methods. See Table 1 for sequence of ssDNA strands used. The location of the modifications is indicated in all figures and in Table 1 with "m” or "h”.
  • DNA templates, primers, and blocking oligomers were mixed at relative molar concentrations of 1:1:1.2 and annealed by incubating at 95 °C for 3 min followed by slow-cooling to below 30 °C. DNA and phi29 DNAP were stored at -20 °C until immediately before use.
  • An Axopatch 200B integrating patch clamp amplifier (Axon Instruments) applied a 180 mV voltage across the bilayer (trans side positive) and measured the ionic current through the pore.
  • M2-NNN MspA was added to the grounded cis compartment, yielding a concentration of ~2.5 ng/ml. Once a single pore had inserted, the compartment was flushed with experimental buffer to avoid further insertions.
  • Annealed DNA, as shown in FIGURE 1A, was then added to the experimental volume to achieve a final concentration near ~1 μM.
  • DNAP DNA Polymerase
  • EDTA and DTT were added to the front well to final concentrations of 1 mM each. The EDTA and DTT bind up contaminant divalent ions and create the reducing conditions that phi29 requires for functionality.
  • phi29 DNAP is added to a final concentration of 1.5 μM
  • a dNTP mixture is added to ⁇ of each of the four standard dNTPs
  • MgCl2 is added to a final concentration of 10 mM.
  • The interaction of phi29 DNAP with the MspA nanopore and hybridized template construct is illustrated schematically in FIGURES 1B and 1C.
  • the reproducible current levels were generated, indicating the nucleotide-by-nucleotide movement of the ssDNA template strand lengthwise through the constriction zone of the nanopore.
  • the template strand threads through the nanopore from the cis to the trans side of the pore, while the blocking oligo unzips from the template strand.
  • the 3'-end of the hairpin primer is exposed to the phi29 DNAP active site, which enables the polymerase to extend the primer.
  • this extension action of the polymerase overcomes the original translocation direction of the template strand, resulting in the template strand being "pulled” back through the nanopore from the trans side to the cis side.
  • the electrical field is monitored for the current flowing through the nanopore. Fluctuations of current are indicative of the specific one or more nucleotides of DNA residing in the most constricted portion of the nanopore opening.
  • each blockade event is determined by a quadromer of contiguous nucleotides within the constriction zone of the nanopore.
  • sequence of the quadromer changes leading to a fluctuation in current level.
  • the current fluctuations are recorded to generate a "trace" of current fluctuations over time, with each nucleotide movement associated with a distinct blockade event.
  • Established relative blockades for the different combinations of nucleotides allow for the reconstruction of the DNA sequence with its associated levels. See International PCT Pub. No. WO2013/159042, incorporated herein by reference in its entirety.
  • It is noted that the sequences illustrated in FIGURES 2-4 (and set forth herein as SEQ ID NOS:5 and 6) are internal portions of the sequences set forth in SEQ ID NOS:1 and 2, respectively, but are illustrated from 3' to 5' to correspond to the temporal acquisition of the data as the phi29 DNAP pulled the template strand into the cis side of the pore.
  • FIGURE 3 which shows the normalized current differences (modified TGCC signal minus the TGCC signal)
  • the introduction of h C tended to result in a lower current level over the associated quadromers
  • the introduction of m C resulted in a consistently higher current level in the associated quadromers.
  • An unmodified CpG present in the modified TGCC template strand notably resulted in a similar current signal as the reference TGCC strand.
  • a statistical analysis wherein Gaussian peaks were fit to the data illustrated in FIGURE 3 resulted in the Gaussian peaks corresponding closely to the locations of the epigenetic modifications in the modified TGCC template.
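  • The source does not specify how the Gaussian peaks were fit. As a simplified stand-in for a full least-squares Gaussian fit (for which a routine such as SciPy's `curve_fit` could be used), the sketch below estimates the centre and width of a single difference peak from weighted moments. The function name `gaussian_moments` and the example numbers are hypothetical, and the method assumes one isolated peak on a near-zero baseline.

```python
import math


def gaussian_moments(positions, differences_pa):
    """Estimate the centre and width of one peak from weighted moments.

    The absolute current differences act as weights, so the centre is the
    weighted mean position and the width is the weighted standard deviation.
    """
    weights = [abs(d) for d in differences_pa]
    total = sum(weights)
    if total == 0:
        raise ValueError("no signal to localise")
    centre = sum(p * w for p, w in zip(positions, weights)) / total
    variance = sum(w * (p - centre) ** 2 for p, w in zip(positions, weights)) / total
    return centre, math.sqrt(variance)


# Placeholder difference profile centred on nucleotide position 12.
centre, width = gaussian_moments([10, 11, 12, 13, 14], [0.0, 1.0, 2.0, 1.0, 0.0])
print(centre)  # 12.0
```

  • The estimated centre can then be compared with the known position of the modification, which is the kind of correspondence reported for the Gaussian peaks above.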
  • the signal changes resulting from these epigenetic modifications can be precisely correlated to the location of and type of modification within the DNA polymer using a nanopore-based analysis.
  • the following is a description of an expanded study of a nanopore-based analysis to detect and distinguish multiple epigenetic modifications that appear in DNA polymers within a variety of different sequence contexts.
  • DNAP phi29 DNA polymerase
  • FIGURES 5A and 5B show raw current traces for unmethylated and methylated DNA, respectively.
  • the extracted average current levels are shown in FIGURE 5C for unmethylated DNA in black and methylated DNA in light gray.
  • FIGURE 5D shows the difference between the methylated and unmethylated current level sequences.
  • FIGURES 6A-6D show the average current level differences in 20 or more single-molecule comparisons for four different DNA constructs. Across all such comparisons, mC consistently increased current relative to C, whereas hC generally decreased current relative to C.
  • the current difference caused by a mC or a hC was found to be strongly affected by the sequence context in which it is embedded.
  • the nucleotides immediately adjacent to a mC or hC have the greatest influence on the size and shape of the current difference.
  • the nucleotides on the 5' side were varied and the nucleotide on the 3' side of the C was fixed as a G because of the biological relevance of CpG sites.
  • it was observed that the nucleotide two positions to the 5' side of the modified cytosine has a bigger influence than the nucleotide two positions toward the 3' side (see FIGURE 10).
  • Results for all 16 XYmCpGs and XYhCpGs are summarized in FIGURES 7A and 7B.
  • the maximum difference is up to 7 pA depending on sequence context. On average, the maximum difference caused by mC is approximately 2.5 pA (FIGURE 7A, Bottom Right Panel).
  • FIGURE 7A Bottom Right Panel
  • four nucleotides within MspA's constriction affect each current level (see, e.g., schematic illustrations in FIGURE 1B and FIGURE 8), with the two nucleotides centered in the pore's constriction affecting the current the most (Manrao et al., 2011; Manrao et al., 2012).
  • the present results are consistent in that the replacement of C for mC or hC affects approximately four consecutive current levels.
  • the current difference is maximal when the mC is positioned immediately to cis of the constriction and the shape of the difference peak exhibits skewness.
  • the schematic in FIGURE 8 shows how sequence context dependence arises.
  • MspA's cross-section is shown in solid black, and variable gray shading indicates the region of high electric field.
  • Nucleotides within the region of high electric field affect the ion current.
  • as mC or hC pass through the pore, their location relative to the pore constriction determines how much they affect the current.
  • All nucleotides within the high-field region of the constriction will influence the current, and therefore alter the influence of a mC and hC modification.
  • the nucleotide to the 3' of the CpG is also relevant, albeit to a smaller extent.
  • FIGURE 4 the 3' side of the current difference peak caused by mC is reduced.
  • the data in FIGURE 7 demonstrate that the four nucleotides X, Y, C, and G dominate the magnitude of the current difference caused by mCpG and hCpG (see also FIGURE 10).
  • FIGURES 9A-9D show a construct with several modified Cs spaced only five nucleotides apart.
  • the current difference peaks associated with the four mCs and two hCs are still easily distinguishable.
  • the difference peak is wider and higher than the signal for just one mCpG within the same context.
  • Placing a hCpG immediately adjacent to a mCpG (FIGURE 9D) reduces the signal of the nearby mCpGs. The signal is approximately a superposition of the individual mC and hC signals.
  • Methylated sites with smaller current differences, such as CTmCpG and TCmCpG, were detected with lower accuracy: ~86% and ~88%, respectively (see TABLE 2 for individual context-dependent detection rates).
  • hC true-positive rates were lower than for mC.
  • mC was distinct from hC; mCpGs were miscalled as hCpGs in 3 out of 478 occurrences, whereas hCpGs were never miscalled as mCpGs in 609 reads.
  • true negatives included non-CpG regions in addition to CpGs tested above, resulting in a higher true-negative detection rate than in the method described in the preceding paragraph. Rates from these two methods are not directly comparable.
  • a Bayesian classification measure was used to find mCs, yielding similar detection efficiencies (see the Materials and Methods in the below Examples section).
  • mC detection without reference to DNA sequence is useful for hypermethylation or hypomethylation detection and is comparable to other nanopore methylation detection techniques (Wanunu, M., et al., "Discrimination of Methylcytosine From Hydroxymethylcytosine in DNA Molecules," Journal of the American Chemical Society 133(3):486-492, 2011; Shim et al., 2013).
  • the nanopore strand sequencing method used in this work produces a second read of the same DNA molecule because of the bi-directional movement of the template strand through the nanopore (Manrao et al., 2012). Using this second read can improve calling accuracies. In contrast to other mC and hC detection techniques that rely on mC-specific chemical reactions and/or enzymatic kinetics, the present system detects the methylation directly.
  • the present methylation detection method does not require de novo sequencing with the nanopore to detect methylation. Given a previously measured reference current sequence for unmethylated DNA and known context-dependent methylation patterns as in FIGURES 7A and 7B, one can then take a single read of a methylated DNA molecule and detect methylation with confidence for most sequence contexts. Because PCR does not copy certain epigenetic modifications such as methylation, nanopore reads of amplified copies would serve as the unmethylated reference. Genomic DNA would then be extracted, given adapters to enable polymerase control, and then be presented to the pore. Individual reads of methylated DNA could then be aligned to the current level reference using a Smith-Waterman alignment algorithm (Manrao et al., 2012).
  • phi29 DNAP was used as a molecular motor to control the motion of DNA through a single MspA pore established in an unsupported phospholipid bilayer.
  • the buffer was 300 mM KCl, 10 mM HEPES buffered at pH 8.00 ± 0.05. Currents were recorded on an Axopatch 200B amplifier with custom LabVIEW software (National Instruments) at a voltage bias of 180 mV.
  • DNA template, primer, and blocking oligomer were mixed together in a 1:1:1.2 ratio to a final concentration of 50 µM.
  • DNA was then annealed by heating to 95 °C for 5 min, cooling to 60 °C for 2 min, and then cooling to 4 °C.
  • Experimental concentrations were ~500 nM for DNA, ~500 nM for phi29 DNAP, ~500 µM for dNTPs, ~10 mM for MgCl2, and ~1 mM for DTT.
  • This sequence creates a reproducible current motif that signals the end of the read. This region was used to calibrate currents and, thus, to control for small changes in buffer conductivity due to evaporation or temperature variation.
  • the sequence of interest followed this calibration sequence.
  • the DNA was designed to contain a variety of nucleotides adjacent to the CpGs. Each strand had at least three CpGs embedded in a random sequence, sufficiently spaced so that their current signatures did not overlap. In each strand, three of these CpGs were uniformly either unmethylated, methylated, or hydroxymethylated. Additionally, eight different DNA sequences were examined (PAN Laboratories, Stanford University, Stanford, CA) containing various methylation patterns (see TABLE 2 for sequences used). Some experiments were performed with a mixture of methylated, hydroxymethylated, and unmethylated DNA. Without calibration, these strands could still be sorted by methylation-specific currents.
  • blockade events were determined using a thresholding method on current data.
  • a feed-forward neural network removed events that did not correspond with phi29 polymerase activity. Once appropriate events were determined, raw current levels were discerned using a custom-written graphical user interface. Current level transition boundaries were selected, and the median current levels were extracted in the time order that they occurred for each event. The phi29 DNAP occasionally exhibited backstepping, causing repeated levels that were removed. Consensus current level sequences were found for each sequence type, and event levels associated with that sequence were automatically aligned using a Needleman-Wunsch algorithm. For experiments with DNA mixtures, a quality score from the Needleman-Wunsch algorithm was used to distinguish DNA with different types of methylation.
  • Events are found using a threshold detection algorithm, described in Butler et al. (2008). Events with durations less than 1 second or with average currents greater than 75 pA or less than 15 pA were rejected.
  • a feed-forward neural network consisting of 5 layers, each layer containing 20 neurons, was employed. The features used in the neural network were the event duration, the event average current and variance, and the outputs of a K-means clustering algorithm. The neural network was trained with ~200 events. The neural network removed 100% of the events that were not associated with polymerase activity.
  • DNA type classification. After extracting current level sequences, we constructed consensus current level sequences specific to unmethylated, methylated, and hydroxymethylated DNA constructs. These consensus current level sequences were constructed using events from experiments containing a single sequence and methylation pattern of DNA, without a mixture of other methylation patterns. Alignment to form consensus level sequences was performed using a Needleman-Wunsch algorithm with an affine gap, as used in Manrao et al., 2012. An event classification algorithm was developed to sort events from experiments that had mixtures of unmethylated, methylated, and hydroxymethylated DNA. The algorithm aligned each event to the consensus levels extracted above and produced a similarity score used to classify each event.
  • the classification method was tested on 297 events from experiments containing a single sequence and methylation pattern of DNA, yielding a classification accuracy of 99.7%.
  • This score-based whole-event classification algorithm was used to separate methylated, hydroxymethylated, and unmethylated events in experiments that were run with mixtures of DNA with various methylation patterns.
  • the consensus levels for unmethylated, methylated, or hydroxymethylated were updated as these events were classified.
  • With events classified and current levels aligned, we constructed the level differences, shown in FIGURES 5A-5D and FIGURES 6A-6D. From the known CpG location the level differences surrounding the CpG were extracted, as shown in FIGURE 7. These level differences are used in the Bayesian methylation classifier described below. Level differences near the CpG were used to distinguish mCpGs, hCpGs, and unmethylated CpGs. We estimated the ability of a single level near the CpG to accurately call the type of CpG's methylation with a t-test that compared the three hypotheses: methylated, hydroxymethylated, or unmethylated (FIGURES 10A-10C).
  • FIGURES 10A, 10B, and 10C also show the average of the t-test for different contexts, yielding the classification power of a given position for all contexts XYCpG.
  • the plots of classification power resemble the current difference plots shown in the right-most bottom plot in FIGURES 7A and 7B, but include level variance information and are independent of sign.
  • the magnitude of the classification power indicates how distinguishable mCpG is from CpG, hCpG is from CpG, and all three are from each other in FIGURES 10A, 10B, and 10C, respectively. It is observed that level positions -1, 0 and 1, corresponding to the two levels on the 5' side of the modified C site and one level to the 3' side of the modified C, have the highest discrimination power.
  • This table encompasses two similar tables, A and B. Within each table, the calling frequencies and detected count of XYCpGs are presented. Each row gives the calling frequency for a mC, hC, or C called as a mC, hC or C, as indicated for each row. The highlighted rows indicate CpG sites called as non-methylated CpG due to positive mCpG or hCpG detection within 2 levels. Each column indicates the context within which its calling frequencies were obtained. Numbers in parentheses are the count of observed CpG levels within the given context and type of CpG.
  • the fourth row in the first column states that in 81% of 75 observed events, an AAhCG was accurately called as an AAhCG.
  • the final column provides the average and the standard error of the calling frequency obtained by bootstrap resampling 1/5 of the observed events for each construct 5 times.
  • TABLE 3A has the classification results using three levels (-1, 0, and 1) and TABLE 3B has the classification results using two levels (-1 and 0).
  • a Bayesian probability classifier, as described below, was used. This started with $P(XYZ \mid \{\Delta_i\})$, which is the probability of the sequence hypothesis XYZ given the set of current differences $\{\Delta_i\}$.
  • the letters X and Y are each any standard nucleotide and Z is a C, mC, or hC.
  • $P(\Delta_i \mid XYZ)$ was defined, which is the probability of observing current difference $\Delta_i$ knowing the sequence XYZ at location i. This was modeled as
  • $P(\Delta_i \mid XYZ) \propto \exp\left(-\frac{(\Delta_i - \bar{\Delta}_{XYZ,i})^2}{2\sigma^2_{XYZ,i}}\right)$ (Equation 1), where $\bar{\Delta}_{XYZ,i}$ and $\sigma^2_{XYZ,i}$ are the mean and variance of the level difference at position i for context XYZ (as shown in FIGURES 7A and 7B).
  • Equation 3 The highest probability for equation 3 was used to classify the set of current differences as belonging to the sequence hypothesis XYZ.
  • the prior probability can be taken to be the expected probability of CpG methylation or hydroxymethylation for a given sample, or to be 1/3 as was chosen for the samples. Because XYZ were compared over all hypotheses, the factor $P(\{\Delta_i\})$ does not matter, and only the product of probabilities from Equation 1 was used.
  • Equation 3 Given the known sequence and location of the CpG site with known context XY, Equation 3 was used and considered only C, mC, or hC. Classification of the CpG was given by the highest value of P(XYZ). Classification frequency was calculated as the number of classifications divided by the number of expected classifications for the given XYZ context.
  • Equation 3 Given an unknown sequence, but the observed level differences, Equation 3 was extended to consider all dinucleotide hypotheses and all cytosine variants C, mC, or hC. The known level differences were compared to all sets of level differences within a given event, and observed peaks in probabilities for given XYZ hypotheses. Classification was given by the highest probability along the progression of level differences.
  • a peak detection algorithm was used to identify methylation sites without using sequence specific knowledge (i.e., events were not compared using known current difference patterns for various sequence contexts). Such detection can identify methylation sites independent of the sequence of the examined DNA with reasonable accuracy.
  • the peak detection required level differences that reach a maximal height of 1.1 pA and that have a separation of at least 6 levels from any adjacent peaks. With these parameters, ~93% true-positive methylation within 2 levels of the known methylation position were identified, and >99% true-negative (non-methylation) were identified. Increasing the requisite peak height improved true-negative detection, at the cost of true-positive detection. While illustrative embodiments have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.
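
The Bayesian site classification and the sequence-independent peak detection described in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the disclosure: the function names, the data layout (a dictionary mapping each hypothesis to per-position mean and variance pairs), and the example numbers are all hypothetical. The likelihood follows the unnormalized Gaussian form of Equation 1, the 1.1 pA figure is treated as a minimum required peak height, and only positive difference peaks are considered, on the assumption that mC raises the current.

```python
import math


def classify_site(deltas, models, priors=None):
    """Pick the most probable hypothesis (e.g. "C", "mC", "hC") for one CpG site.

    deltas: observed level differences (pA) at the positions around the site.
    models: hypothetical layout, label -> list of (mean, variance) per position.
    priors: label -> prior probability; defaults to equal priors (1/3 for three labels).
    """
    labels = list(models)
    if priors is None:
        priors = {label: 1.0 / len(labels) for label in labels}
    log_post = {}
    for label in labels:
        # Sum of log terms from Equation 1 (unnormalized Gaussian), one per level.
        log_like = sum(
            -((d - mean) ** 2) / (2.0 * var)
            for d, (mean, var) in zip(deltas, models[label])
        )
        log_post[label] = math.log(priors[label]) + log_like
    # Normalize over the hypotheses compared; the evidence term cancels out.
    top = max(log_post.values())
    weights = {label: math.exp(lp - top) for label, lp in log_post.items()}
    total = sum(weights.values())
    posterior = {label: w / total for label, w in weights.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


def find_difference_peaks(diffs, min_height=1.1, min_separation=6):
    """Return indices of positive difference peaks, tallest first when resolving conflicts.

    A level is a candidate if it is a local maximum of at least min_height (pA).
    Candidates closer than min_separation levels to a taller kept peak are dropped.
    """
    last = len(diffs) - 1
    candidates = [
        i for i, d in enumerate(diffs)
        if d >= min_height
        and (i == 0 or d >= diffs[i - 1])
        and (i == last or d >= diffs[i + 1])
    ]
    candidates.sort(key=lambda i: diffs[i], reverse=True)
    kept = []
    for i in candidates:
        if all(abs(i - j) >= min_separation for j in kept):
            kept.append(i)
    return sorted(kept)


# Made-up example values for levels -1, 0, and 1 around a CpG (not measured data).
example_models = {
    "C": [(0.0, 0.25), (0.0, 0.25), (0.0, 0.25)],
    "mC": [(1.0, 0.25), (2.5, 0.25), (1.0, 0.25)],
    "hC": [(-0.5, 0.25), (-1.5, 0.25), (-0.5, 0.25)],
}
best_label, posterior = classify_site([0.9, 2.4, 1.1], example_models)
peaks = find_difference_peaks(
    [0, 0.2, 1.5, 2.6, 1.2, 0.1, 0, 0, 0.3, 1.0, 0.2, 0, 1.3, 0.4]
)
```

With a known context, `classify_site` would be given the model entries for that XY context only; for sequence-independent screening, `find_difference_peaks` needs no context model at all, which mirrors the trade-off noted above between detection without sequence knowledge and context-aware calling.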

Abstract

The present disclosure generally relates to methods for measuring, diagnosing, visualizing, and/or detecting modifications in nucleic acids using nanopore-based analysis. The methods comprise translocating the nucleic acid polymer through a nanopore, detecting an ion current pattern associated with a portion of the nucleic acid polymer, and comparing the current pattern to a reference current pattern associated with the same nucleotide sequence as the portion of the nucleic acid polymer but which does not contain any modifications. In some embodiments, the methods and compositions are useful for accurately detecting, distinguishing, and mapping epigenetic modifications in nucleic acids such as 5-methylcytosine (mC) and 5-hydroxymethylcytosine (hC). In some embodiments, the methods comprise amplifying a target nucleic acid polymer with suspected epigenetic modifications to generate a reference nucleic acid polymer with the same nucleotide sequence but without any epigenetic modifications.

Description

METHODS FOR DETECTING AND MAPPING MODIFICATIONS TO NUCLEIC ACID POLYMERS USING NANOPORE SYSTEMS
CROSS-REFERENCE(S) TO RELATED APPLICATION(S)
This application claims the benefit of U.S. Application No. 61/721,430, filed November 1, 2012, and U.S. Application No. 61/841,824, filed July 1, 2013, both of which are incorporated herein by reference.
STATEMENT REGARDING SEQUENCE LISTING
The sequence listing associated with this application is provided in text format in lieu of a paper copy and is hereby incorporated by reference into the specification. The name of the text file containing the sequence listing is 43321_SeqList-FINAL.txt. The text file is 19KB; was created on 01 November 2013; and is being submitted via EFS-Web with the filing of the specification.
STATEMENT OF GOVERNMENT LICENSE RIGHTS
This invention was made with Government support under R01HG005115 and R01HG006321 awarded by National Institutes of Health. The Government has certain rights in the invention.
BACKGROUND
The nucleic acid, DNA, is often referred to as the "blueprint for life." However, there is more to the code than DNA sequence alone. Modifications can occur on the canonical nucleotide subunits that can affect the functional information embedded in the DNA. For example, epigenetic factors govern the DNA blueprint's transcription and translation into protein. Thus, there is a rapidly growing interest in understanding these modifications, such as epigenetic factors.
Epigenetic modifications such as methylation and/or hydroxymethylation of DNA can be natural processes by which normal cells function and carry out or inhibit many cellular functions. For example, epigenetic modifications are known to be involved with normal silencing and/or prevention of gene expression, thereby enabling a cell to essentially turn off one or more genes. The most common epigenetic DNA modification is the methylation of cytosine leading to 5-methylcytosine (mC). In mammals, most genetically relevant cytosine methylations (mC) occur in C-G dinucleotides (CpG; the "CpG" shorthand invokes the relationship that the cytosine and guanine are linked in the same strand by a single phosphate and distinguishes the relationship from a C:G pairing of complementary strands, such as in double-stranded DNA). Methylation is associated with gene regulation (i.e., highly methylated DNA tends to be less transcriptionally active) and therefore has implications for cell development, aging, and diseases such as cancer. Further oxidation of the methyl residue results in 5-hydroxymethylcytosine (hC). Because of its relatively recent discovery in mammalian tissue, the function of hC is less well explored. However, there is indication that it also has a role in regulation of chromatin structure and gene expression. Unlike DNA sequence, methylation patterns are tissue specific and change over the life of an organism as it develops or is exposed to certain chemicals and environmental conditions. In some cases, these changes are heritable through multiple generations.
Because nucleotide modifications, such as methylation, have a proven link to gene expression, precise mapping of modifications may yield more pertinent information to research and ultimately to clinical diagnosis of gene-regulation-related disease, than sequencing the standard four bases alone. Clinical uses will require fast, inexpensive, and reliable detection methods to map modifications such as methylation. Because such modification patterns vary between cells, it is preferable to use small, native, unamplified DNA samples, making this task suitable for single-molecule techniques.
Currently available techniques for mapping of DNA methylation include the following: bisulfite sequencing, methylation-specific enzyme restriction, affinity enrichment, and various single-molecule techniques. In bisulfite sequencing, all unmethylated cytosines are converted to deoxyuridine. Converted samples are amplified, sequenced, and compared with unmodified sequence information. Converted Cs become Ts in the amplified DNA, whereas mCs remain unchanged. Conditions required to bring this conversion close to 100% completion cause DNA damage by fragmentation. Conventional bisulfite sequencing cannot differentiate between mC and hC. Oxidative bisulfite sequencing can distinguish between mC and hC; however, this assay has significant sample losses with only 0.5% of the original DNA fragments remaining intact. In methylation-specific enzyme restriction, proteins recognize and cut DNA strands at mCs, and subsequent sequencing and alignment of the strands to the known genomic sequence reveal the locations of the mCs. While this technique works for broadly spaced mCs, it lacks sensitivity for densely packed mCs. Affinity enrichment assays are bulk assays and are unable to resolve mCs with nucleotide precision. Single-molecule real-time sequencing (SMRT) exploits polymerase incorporation kinetics to detect methylation while also sequencing via fluorescently tagged dNTPs. When encountering a mC, a polymerase pauses longer on average to incorporate deoxyguanosine triphosphate than when it encounters an unmethylated C. The durations of the pauses are stochastically distributed, and the change in kinetics caused by mC is subtle. Thus, detection requires averaging over dozens of reads, complicating methylation detection. Recently, nanochannels have been used as nano-Coulter counters to measure the correlation between DNA methylation and certain histone modifications for single chromatin molecules. However, this method lacks single-nucleotide resolution.
Accordingly, despite the advances in the field of detecting modifications to nucleic acids such as methylation, a need remains for a robust method to detect nucleotide modifications within nucleic acids that is accurate and simple to perform. The methods and compositions of the present disclosure address this and related needs of the art.
SUMMARY
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one aspect, the present disclosure provides a method of detecting a nucleotide modification in a nucleic acid polymer. The method comprises applying an electrical field to a nanopore system comprising a first conductive liquid medium in liquid communication with a second conductive liquid medium through a nanopore; translocating the nucleic acid polymer through a nanopore from the first conductive liquid medium to the second conductive liquid medium; detecting an ion current to provide a current pattern associated with a portion of the nucleic acid polymer; and comparing the current pattern to a reference current pattern associated with the same nucleotide sequence as the portion of the nucleic acid polymer without any modifications, wherein a difference between the current pattern and the reference current pattern indicates the presence of a modified nucleotide in the nucleic acid polymer.
In some embodiments, the nucleic acid polymer can be DNA, RNA, mRNA, PNA, or a combination thereof. In some embodiments, the DNA is single stranded DNA (ssDNA). In some embodiments, the method further comprises identifying the type of nucleotide modification present in the polymer based on a character of the difference between the current pattern and the reference current pattern. In some embodiments, the character of the difference comprises the degree of current increase or decrease and/or the duration of the difference.
In some embodiments, the nucleotide modification is an epigenetic modification or a modification resulting from DNA damage. In some embodiments, the nucleotide modification is a 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxycytosine, β-glucosyl-5-hydroxymethylcytosine, 8-oxoguanine, 2-amino-adenosine, 2-amino-deoxyadenosine, 2-thiothymidine, pyrrolo-pyrimidine, 2-thiocytidine, a thymine dimer, or an abasic lesion.
In some embodiments, the portion of the nucleic acid polymer comprises one or a plurality of contiguous nucleotides of the nucleic acid polymer. In some embodiments, the portion of the nucleic acid polymer comprises the nucleotide or nucleotide position with the modification. In some embodiments, the portion of the nucleic acid polymer further comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more additional nucleotides adjacent to the nucleotide or nucleotide position with the modification on one or both sides. In some embodiments, at least one additional nucleotide is adjacent at the 5' side of the nucleotide with the modification. In some embodiments, at least one additional nucleotide is adjacent at the 3' side of the nucleotide with the modification. In some embodiments, the portion of the nucleic acid polymer further comprises at least two additional nucleotides adjacent at the 5' side of the nucleotide with the modification and at least one nucleotide adjacent at the 3' side of the nucleotide with the modification.
In some embodiments, the nanopore is a solid-state nanopore, protein nanopore, a hybrid solid state-protein nanopore, a biologically adapted solid-state nanopore, or a DNA origami nanopore. In some embodiments, the protein nanopore is a β-barrel pore, such as alpha-hemolysin or Mycobacterium smegmatis porin A (MspA), or a homolog thereof. In some embodiments, the protein nanopore sequence is modified from the wild-type sequence to contain at least one amino acid substitution, deletion, or addition. In some embodiments, the at least one amino acid substitution, deletion, or addition results in a net charge change in the nanopore. In some embodiments, the electric field is sufficient to cause the electrophoretic translocation of the nucleic acid polymer through the nanopore. In some embodiments, the electric field is between about 40 mV and 1 V.
In some embodiments, the nanopore is associated with a molecular motor, wherein the molecular motor is capable of moving a nucleic acid polymer into or through the nanopore with an average translocation velocity that is less than the average translocation velocity at which the analyte translocates into or through the nanopore in the absence of the molecular motor. In some embodiments, the molecular motor is a polymerase, an exonuclease, a helicase, a topoisomerase, or a translocase. In some embodiments, the molecular motor is phi29.
In another aspect, the disclosure provides a method of detecting a nucleotide modification in a nucleic acid polymer, as generally described above, but including the step of generating a reference nucleic acid polymer that contains the same nucleotide sequence but that does not contain any modifications to the canonical nucleotide structures. The method comprises amplifying a target nucleic acid polymer that potentially contains at least one nucleotide modification to produce a reference nucleic acid polymer that does not contain a nucleotide modification; applying the target and reference nucleic acid polymers to a nanopore system comprising a first conductive liquid medium in liquid communication with a second conductive liquid medium through a nanopore; causing the translocation of the target nucleic acid polymer through the nanopore from the first conductive liquid medium to the second conductive liquid medium; detecting an ion current to provide a target current pattern associated with a portion of the target nucleic acid polymer; causing the translocation of the reference nucleic acid polymer through the nanopore from the first conductive liquid medium to the second conductive liquid medium; detecting an ion current to provide a reference current pattern associated with a portion of the reference nucleic acid polymer, wherein the portion of the reference nucleic acid polymer comprises the same nucleotide sequence as the portion of the target nucleic acid polymer; and comparing the target current pattern to the reference current pattern, wherein a difference between the target current pattern and the reference current pattern indicates the presence of a modified nucleotide in the target nucleic acid polymer.
In some embodiments, the reference nucleic acid polymer is produced from the target nucleic acid polymer using at least one round of the polymerase chain reaction (PCR). In some embodiments, the method further comprises determining the position of the modified nucleotide in the target polymer. In some embodiments, the method further comprises identifying the modified nucleotide in the target polymer. In some embodiments, the method further comprises determining the sequence of at least a portion of the target nucleic acid polymer comprising the modified nucleotide. In some embodiments, the modified nucleotide is identified without knowledge of the nucleotide identity in the unmodified reference sequence.
DESCRIPTION OF THE DRAWINGS
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
FIGURE 1A-FIGURE 1C schematically illustrate a representative system useful for nanopore-based analysis of nucleic acid modifications according to the present disclosure. FIGURE 1A is a diagram of an exemplary template configuration, wherein a hairpin primer oligonucleotide forms a loop and hybridizes to a portion of itself. The remainder of the hairpin primer hybridizes to a 3'-end domain of the template strand. The gap between the 3'-end of the template strand and the 5'-end of the hairpin primer is indicated. Adjacent to the hairpin primer, and hybridizing to an internal domain of the template strand, is a blocking oligomer. The gap between the 3'-end of the hairpin primer and the 5'-end of the blocking oligonucleotide is indicated. The blocking oligonucleotide has a series of abasic residues at the 3'-end that do not hybridize to the template strand. The template strand is what is "read" by the nanopore. FIGURE 1B schematically illustrates the use of a molecular motor in connection with a nanopore system to assist the controlled translocation of the nucleic acid template strand through the nanopore for analysis. The nanopore (e.g., MspA) is embedded in a phospholipid bilayer and provides liquid communication between the upper chamber (i.e., cis side) and the lower chamber (i.e., trans side). A voltage (e.g., 180 mV) is applied across the membrane, which causes an ion current to flow through the pore. The molecular motor (shown in dark) is pulled into contact with the vestibule of the nanopore in the cis side, but cannot pass through, thus causing the unzipping of the blocking oligomer from the template strand. As the template strand is electrophoretically translocated through the nanopore, the blocking oligomer is "unzipped" from the template strand. The short and narrow constriction of MspA concentrates the ion current to resolve the relatively small differences between C, mC, and hC.
FIGURE 1C schematically illustrates the progression of the template through the nanopore system over time. In part (i), the single stranded 5'-end of the template strand interacts with, or enters the vestibule of, the nanopore. In part (ii), the molecular motor, which is attached to the double stranded portion of the complex, contacts the vestibule of the nanopore, but does not pass through, thus slowing translocation of the template strand through the nanopore. In part (iii), the force of translocation eventually causes the complete unzipping of the blocking oligomer (the process of which is illustrated in FIGURE 1B), thus exposing the 3'-end of the hairpin primer to the molecular motor (e.g., a DNA polymerase such as phi29). In part (iv), the molecular motor pulls the template strand back through the nanopore by enzymatic action where the hairpin primer is elongated based on the template strand.
FIGURE 2 graphically illustrates the mean consensus levels and standard deviation of the mean for the normalized current levels (ion current divided by the open-pore current) determined for several events of TGCC quadromers and modified TGCC quadromers (i.e., wherein the C is methylated or hydroxymethylated). The current levels for the TGCC quadromers and modified TGCC quadromers are scaled and offset to each other on the same graph based on their calibration regions (i.e., a recognizable adapter sequence separated from the template strand by a single abasic nucleotide). The unmodified levels are displayed in black and the modified current levels are displayed in gray and with a * symbol. The quadromers and sequences illustrated in this figure (and in FIGURES 3 and 4) are listed from 3'-5', reflecting the order in which the data were recorded during phi29 synthesis. This order is the opposite of how DNA is typically listed, thus the CpGs appear as GpCs in this figure. The sequence of the unmodified (top) strand is set forth as SEQ ID NO:5. The sequence of the modified strand is set forth herein as SEQ ID NO:6. The locations of hydroxymethylation and methylation in the modified strand TGCC quadromers are indicated by h and m, respectively. Below the levels, the quadromers corresponding to each level are indicated. Note that hydroxymethylation decreases the average current in the affected levels while methylation increases the average current of affected levels (see FIGURE 3).
FIGURE 3 graphically illustrates the normalized difference in mean current values for each current level. Levels caused by quadromers that include 5-hydroxymethylcytosine show significant reduction in current while levels caused by quadromers that include a 5-methylcytosine show a significant increase in current (>1 of the open pore current; typical open pore current = 110 pA). A control cytosine remains unmodified and shows no significant difference in reads on both strands. The sequence of the illustrated modified strand is set forth herein as SEQ ID NO:6.
FIGURE 4 graphically illustrates the normalized difference in mean current values for each level with seven Gaussians fit to the data. The Gaussian peaks indicate the centroid of the effect, which corresponds well with the location of the epigenetic modifications. Of particular note is that the Gaussian fit for the unmodified GpC region has a very low amplitude relative to the modified GpCs. This mapping difference yields a clear and easily interpretable way of mapping modifications in the nucleic acid polymer. The sequence of the illustrated modified strand is set forth herein as SEQ ID NO:6.
FIGURE 5A-FIGURE 5D graphically illustrate the detection of methylation in a representative single stranded DNA sequence containing a CpG site. FIGURE 5A and FIGURE 5B illustrate segments of raw current traces for the DNA sequence with unmethylated and methylated cytosine, respectively. Ion current changes as DNA passes through the pore in single-nucleotide steps. The average current values for each current level are shown as horizontal black or gray lines, respectively. The traces shown in FIGURE 5A and FIGURE 5B are for DNA with identical nucleotide sequence. The current trace shown in FIGURE 5A contains a single unmethylated CpG site, whereas the trace in FIGURE 5B contains a single methylated CpG site. FIGURE 5C illustrates the extracted average current values from each level from FIGURE 5A in a solid black line and from FIGURE 5B in a solid gray line, with the difference illustrated with shading. The stochastic duration of current levels has been removed so that the DNA base sequence can be aligned to the observed current levels. The DNA sequence, set forth herein as SEQ ID NO:7, is shown below (from 5'- to 3'-, left to right) with the modified C indicated as "mC". FIGURE 5D illustrates the current difference plot. The current levels obtained with methylated DNA were subtracted from the current levels obtained with unmethylated DNA. The effect of a single mCpG causes an ion current increase that persists over approximately four steps of the DNA through the pore. The magnitude and duration (i.e., shape) of the current difference is determined by the nucleotides adjacent to the methylated C (see FIGURE 6 and FIGURE 7).
FIGURE 6 graphically illustrates the differences in the ion current level sequences resulting from DNA containing methylation or hydroxymethylation and from DNA without methylation or hydroxymethylation.
FIGURE 6A and FIGURE 6B illustrate current differences [$\Delta I = I_{\mathrm{meth}} - I_{\mathrm{unmeth}}$, where $I_{\mathrm{meth}}$ or $I_{\mathrm{unmeth}}$ is the average current for at least 20 reads of methylated or unmethylated DNA, respectively] obtained with two DNA strands each containing three methylated CpG sites, indicated by "mC" in the associated sequence. X is an abasic site. The sequence illustrated in FIGURE 6A is set forth in SEQ ID NO:8, and the sequence illustrated in FIGURE 6B is set forth in SEQ ID NO:9. The methylated positions are marked by a significant current increase that persists over approximately four steps of the DNA through the pore. The amplitude and shape of the current difference depend on the nucleotides adjacent to the mC. In regions containing no methylation, current differences are insignificant. FIGURE 6C and FIGURE 6D illustrate the current difference [$\Delta I = I_{\mathrm{hydroxy}} - I_{\mathrm{unmeth}}$, where $I_{\mathrm{hydroxy}}$ ($I_{\mathrm{unmeth}}$) is the average current for at least 23 reads of hydroxymethylated (unmethylated, i.e., unhydroxymethylated) DNA] obtained with two DNA strands each containing three hydroxymethylated CpG sites. In most cases, hC results in a small reduction in current, although the magnitude of the current difference is less than observed for mC. In a few cases, hC results in a current increase. Error bars are the observed SD for single-molecule reads of methylated DNA and indicate the variation in single-molecule reads. The gray boxes along the x axis are the SDs for reads of unmethylated DNA. See TABLE 2 for exact numbers of events.
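By way of a non-limiting illustration, the per-level current difference described for FIGURE 6 can be sketched as follows. The sketch assumes that each read has already been reduced to one average current value (in pA) per level and that all reads are aligned level-by-level; the function name `current_difference` and the numerical values are hypothetical and are not taken from the disclosure.

```python
from statistics import mean, stdev

def current_difference(modified_reads, unmodified_reads):
    """Return one (delta_I, sd_modified, sd_unmodified) tuple per level.

    Each argument is a list of reads; each read is a list of average
    current values (pA), one per level, aligned so that index k refers
    to the same level in every read (an assumption of this sketch).
    """
    n_levels = len(modified_reads[0])
    result = []
    for k in range(n_levels):
        mod = [read[k] for read in modified_reads]
        unmod = [read[k] for read in unmodified_reads]
        # delta_I = I_modified - I_unmodified, averaged over reads
        result.append((mean(mod) - mean(unmod), stdev(mod), stdev(unmod)))
    return result

# Made-up values for three levels and three reads per condition.
meth = [[50.0, 53.0, 47.0], [50.4, 53.4, 47.2], [49.6, 52.6, 46.8]]
unmeth = [[50.0, 51.0, 47.0], [50.2, 51.2, 47.1], [49.8, 50.8, 46.9]]
diffs = current_difference(meth, unmeth)
```

In this made-up data set only the middle level carries a difference (about 2 pA); the two standard deviations returned for each level correspond to the error bars and gray boxes described for the figure.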
FIGURE 7A and FIGURE 7B graphically illustrate that the DNA sequence context changes the resulting current difference pattern when a modified cytosine replaces a cytosine at a CpG site. FIGURE 7A shows the current difference patterns caused by the sequence XYmCpG, where X and Y are any of the four nucleotides A, C, G, and T. FIGURE 7B shows the current difference patterns caused by the sequence XYhCpG, where X and Y are any of the four nucleotides A, C, G, and T. The right-most column and bottom row of each figure display the current differences averaged over the nucleotides X or Y, respectively. The bottom right box in each figure displays the average current difference for all studied sequence contexts. Both the amplitude and the shape of current difference change with sequence context. FIGURE 7A illustrates that the maximum difference reaches 7 pA for AAmCpG and is only 1-2 pA when XY contains a thymine. The average maximum difference is approximately 2 pA. The number of levels showing a significant current difference varies from 3 to 5. The difference is maximal when the mC is immediately above the constriction of the nanopore (see FIGURE 8) and the distribution is skewed. FIGURE 7B illustrates that the current deviations due to hC are more complex. Generally, when the hC is centered within MspA's constriction, the difference is -2 to -1 pA. However, some contexts involve positive differences. The differences associated with sequences containing XThCpG, XAhCpG, AYhCpG, and CYhCpG are small, with only approximately 1σ differences. As seen for mC, difference patterns caused by hC involve between 3 and 5 levels and are also skewed. The average difference patterns due to mC and hC are similar; both difference patterns map out a single tight recognition site within MspA's constriction (see FIGURES 8A and 8B).
FIGURE 8A and FIGURE 8B schematically illustrate spatial methylation sensitivity of MspA. Each shows a schematic cross-section of MspA with mC held just above (FIGURE 8A) and just below (FIGURE 8B) MspA's constriction. The variable shading indicates the region of higher electric field within MspA. FIGURE 8A illustrates that when mC is cis of the constriction, it is in a high field region and it modulates the ion current. Other nucleotides that are also within the high field region determine the magnitude of the mC-specific signal. FIGURE 8B illustrates that when mC is trans of the constriction, it is outside the high field region and no longer affects the current. The sequence illustrated in FIGURE 8A is set forth as SEQ ID NO: 12 and the sequence illustrated in FIGURE 8B is set forth as SEQ ID NO: 13.
FIGURE 9A-FIGURE 9D graphically illustrate the differences in ion current resulting from multiple adjacent mCs and hCs. Current differences [$I_{\mathrm{modified}} - I_{\mathrm{unmodified}}$] are shown for four DNA strands containing different methylation (and hydroxymethylation) patterns. Although CpGs rarely occur in such high density, the illustrated data demonstrate that it is possible to discern multiple adjacent mCpGs and hCpGs. The sequences illustrated in FIGURE 9A-FIGURE 9D are set forth herein as SEQ ID NOS: 14-17, respectively. FIGURE 9A, which illustrates data from a strand containing one mC and one hC (as indicated at the bottom), demonstrates that one can simultaneously detect mC and hC in a single strand. FIGURE 9B illustrates the current difference resulting from a strand with identical sequence to that shown in FIGURE 9A, but containing four mCs as well as two hCs (indicated at the bottom). As demonstrated, even with this density of modified CpGs, individual mCs and hCs can be resolved. FIGURE 9C illustrates the current differences resulting from a strand with adjacent mCpG sites. The modification density results in wide and large current difference profiles. The current difference profiles for individual mCs seemingly superimpose. The current increase in the middle of the trace is due to only one mCpG and compares well with a mCpG embedded in the same sequence context shown in FIGURE 4B above. FIGURE 9D illustrates the current differences for a strand with identical sequence to that in FIGURE 9C but with two mCs replaced by two hCs (indicated at the bottom). Here, the effects of mC and hC counteract one another. As in FIGURE 9C, the result is approximately a superposition of the signals shown in FIGURE 7.
FIGURE 10A-FIGURE 10D graphically illustrate the classification power for individual-level positions surrounding CpG sites. FIGURE 10A illustrates the t-test value for each level and each measured sequence context (XYCpG), testing the unmethylated hypothesis, XYCpG, against the methylated hypothesis, XYmCpG. Darker values indicate that a level has more power to call the methylation status of the CpG. FIGURE 10B similarly illustrates the predictive power of single levels to call hydroxymethylation. FIGURE 10C similarly illustrates the predictive power of single levels to call methylation and hydroxymethylation. The lower plots in FIGURE 10A-FIGURE 10C show the t-test value for each position, averaged over all sequence contexts. Interestingly, the two nucleotide positions 5' of the methylated C have a larger effect on the methylation-specific current than the two nucleotide positions to the 3' side. FIGURE 10D illustrates the classification frequency using just one specific level at the indicated positions. FIGURE 10D, part (i), illustrates the rates at which strands containing a mC were called correctly as mCpG (indicated as "mC"), or incorrectly as hCpG (indicated as "hC") and unmethylated CpG (indicated with "C"). Using the level at position 0, the mCpGs were classified correctly ~90% of the time, miscalled as unmethylated 9% of the time, but were never miscalled as hydroxymethylated. The other plots are the same for CpGs that were hydroxymethylated (FIGURE 10D, part (ii)) and unmethylated (FIGURE 10D, part (iii)).
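As a hedged illustration of the per-level t-test values described for FIGURE 10, the following sketch computes a two-sample t statistic for each level position. The disclosure does not state which form of the t-test was used; Welch's unequal-variance form is assumed here, and the function names `welch_t` and `t_by_level` and all data values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic (unequal variances assumed)."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error

def t_by_level(reads_a, reads_b):
    """Return one t statistic per level for two sets of aligned reads."""
    n_levels = len(reads_a[0])
    return [
        welch_t([r[k] for r in reads_a], [r[k] for r in reads_b])
        for k in range(n_levels)
    ]

# Made-up values: only the middle level differs between the two sets.
methylated = [[50.0, 53.0, 47.0], [50.4, 53.4, 47.2], [49.6, 52.6, 46.8]]
unmethylated = [[50.0, 51.0, 47.0], [50.2, 51.2, 47.1], [49.8, 50.8, 46.9]]
t_values = t_by_level(methylated, unmethylated)
most_informative = max(range(len(t_values)), key=lambda k: abs(t_values[k]))
```

A level with a larger absolute t value separates the two hypotheses more cleanly, which is the sense in which a single level has "power" to call the modification status.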
DETAILED DESCRIPTION
The present disclosure generally relates to methods for measuring, diagnosing, visualizing, and/or detecting modifications in nucleic acids through the use of nanopore-based analysis. In some embodiments, the methods and compositions are useful for accurately detecting, distinguishing, and mapping epigenetic modifications in nucleic acids such as 5-methylcytosine (mC) and 5-hydroxymethylcytosine (hC). Nanopore analysis is an emerging single-molecule technique that has shown promise for DNA sequencing and analysis. As is described in more detail below, in nanopore sequencing, a thin membrane containing a single nanometer-sized pore divides a salt solution into two wells, cis and trans. A voltage across the membrane causes an ion current through the pore. This current can also facilitate the interaction of analytes, such as DNA, with the nanopore, in some cases driving the analyte through the pore from one side to the other. As a DNA polymer passes through the pore, the nucleotides at the narrowest section of the pore modulate the ion current. In principle, one can determine the identity and sequence of the nucleotides from the current recording (Manrao et al., 2012). Solid-state nanopores have been used to detect the bulk presence of mC and hC in double-stranded DNA (dsDNA) (Wanunu, M., et al., "Discrimination of Methylcytosine From Hydroxymethylcytosine in DNA Molecules," Journal of the American Chemical Society 133(3):486-492, 2011). Recently, solid-state nanopores were also used to detect dsDNA complexed with methyl-binding proteins and thereby indirectly measure the approximate location of individual methylation sites (Shim, J., et al., "Detection and Quantification of Methylation in DNA Using Solid-State Nanopores," Scientific Reports 3: 1389, 2013).
Experiments with ssDNA held statically in biological pores have distinguished C, mC, or hC directly (Manrao, E.A., Derrington, I.M., Pavlenok, M., Niederweis, M., Gundlach, J.H., "Nucleotide Discrimination With DNA Immobilized in the MspA Nanopore," PLoS One 6(10):e25723, 2011; Wallace, E.V.B., et al., "Identification of Epigenetic DNA Modifications With a Protein Nanopore," Chemical Communications 46(43):8195-8197, 2010). However, a challenge remains to implement nanopore technologies into a robust method to detect, distinguish, and even map the existence of such modifications at any position within the DNA or other nucleic acid polymer.
The present inventors have developed an approach using nanopore-based analysis that can detect, map, and distinguish (i.e., accurately identify) multiple, distinct DNA modifications within a single DNA template polymer. Specifically, the engineered biological protein pore, Mycobacterium smegmatis porin A (MspA), was used to detect and map 5-methylcytosine and 5-hydroxymethylcytosine within single strands of DNA with single-nucleotide resolution. As described in more detail below, a phi29 DNA polymerase (DNAP) (Manrao et al., 2012) was used to assist controlled translocation of ssDNA template through the MspA pore (Butler, T.Z., Pavlenok, M., Derrington, I.M., Niederweis, M., Gundlach, J.H., "Single-Molecule DNA Detection With an Engineered MspA Protein Nanopore," Proceedings of the National Academy of Sciences (PNAS) USA 105(52):20647-20652, 2008) in single-nucleotide steps, and the ion current through the pore was recorded. A comparison of the current levels generated with DNA containing methylated or hydroxymethylated CpG sites to current levels obtained with unmethylated copies of the same DNA sequence resulted in a surprisingly precise indication of methylated or hydroxymethylated CpG sites. With a single read, the detection efficiency in a quasi-random DNA strand was 97.5 ± 0.7% for methylation and 97 ± 0.9% for hydroxymethylation. The disclosed approach can be applied to detection and mapping of modifications, such as epigenetic modifications or the results of DNA damage, that occur in genomic DNA or RNA. Such information can be valuable for clinical uses, such as assessing associations of such changes with risk or presence of disease.
In accordance with the foregoing, the present disclosure provides a method of detecting a nucleotide modification in a nucleic acid polymer. The method comprises applying an electrical field to a nanopore system comprising a first conductive liquid medium in liquid communication with a second conductive liquid medium through a nanopore. The nucleic acid polymer is translocated through a nanopore from the first conductive liquid medium to the second conductive liquid medium. An ion current is detected to provide a current pattern associated with a portion of the nucleic acid polymer. Finally, the current pattern is compared to a reference current pattern associated with the same nucleotide sequence as the portion of the nucleic acid polymer without any modifications, wherein a difference between the current pattern and the reference current pattern indicates the presence of a modified nucleotide in the polymer.
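The comparison step of the foregoing method can be sketched, purely for illustration, as a level-by-level comparison of a measured current pattern against a reference pattern for the unmodified sequence. The decision rule used here (flagging a level when its deviation exceeds a multiple of the reference standard deviation) is an assumption of the sketch and is not prescribed by the disclosure; the function names, the `n_sigma` parameter, and all values are hypothetical.

```python
def flag_levels(pattern, reference, reference_sd, n_sigma=3.0):
    """Return indices of levels that deviate from the reference pattern.

    A level is flagged when |observed - reference| exceeds n_sigma times
    the reference standard deviation (a hypothetical decision rule).
    """
    return [
        k
        for k, (obs, ref, sd) in enumerate(zip(pattern, reference, reference_sd))
        if abs(obs - ref) > n_sigma * sd
    ]

def group_runs(indices):
    """Merge consecutive flagged indices into (first, last) runs."""
    runs = []
    for k in indices:
        if runs and k == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], k)  # extend the current run
        else:
            runs.append((k, k))          # start a new run
    return runs

# Made-up current levels (pA) for a six-level portion of a polymer.
reference = [50.0, 51.0, 47.0, 55.0, 60.0, 52.0]
reference_sd = [0.5] * 6
pattern = [50.1, 53.0, 49.5, 57.0, 60.2, 51.8]

flagged = flag_levels(pattern, reference, reference_sd)
runs = group_runs(flagged)
```

Grouping flagged levels into runs reflects the observation that a single modified nucleotide affects several consecutive levels, so each run, rather than each level, would be a candidate modification site under the assumptions of this sketch.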
As used herein, the term "polymer" refers to a chemical compound comprising two or more repeating structural units, referred to herein interchangeably as "subunits," "monomeric units," or "mers," where each subunit can be the same or different. The term "nucleic acid" can refer to a deoxyribonucleotide polymer (DNA), ribonucleotide polymer (RNA, including mRNA), peptide nucleic acids (PNAs) and phosphorothioate DNA, in either single- or double-stranded form. The nucleic acid subunits for each distinct nucleic acid polymer-type are commonly known. For example, the structure of the canonical polymer subunits of DNA are referred to herein as adenine (A), guanine (G), cytosine (C), and thymine (T). As a group, these are generally referred to herein as nucleotides or nucleotide residues. For RNA, the four canonical polymer subunits are the same, except with uracil (U) instead of thymine (T).
The present disclosure is directed to detecting modifications that can occur within the nucleic acid polymers, and in some embodiments, modifications that specifically occur to the individual subunits of the nucleic acid polymers, i.e., the individual nucleotides. The term "modification" encompasses any chemical change in the structure of the nucleic acid polymer subunit that results in a noncanonical subunit structure. Such chemical changes can result from, for example, epigenetic modifications (such as to genomic DNA or RNA), or damage resulting from radiation, chemical, or other means.
Accordingly, as used throughout this disclosure, the terms "nucleotide modification", "nucleic acid modification", and the like, do not refer to simple additions or deletions of canonical nucleotides to the sequence of the polymer. Nor do the terms refer to substitutions of one canonical nucleotide for another canonical nucleotide for that polymer-type. In this regard, it is noted that uracil (U) is considered a noncanonical nucleotide structure for DNA polymers (and, conversely, thymine (T) is considered a noncanonical nucleotide structure for RNA polymers).
Accordingly, the present disclosure is directed to the detection of nucleotide modifications, as defined. Thus, the disclosure encompasses the detection of a noncanonical nucleotide structure within a nucleic acid polymer, which results from the act of modification. Any of the foregoing noncanonical subunits include analog structures. Illustrative and nonlimiting examples of noncanonical nucleic acid subunits include uracil (for DNA), thymine (for RNA), 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxylcytosine, β-glucosyl-5-hydroxymethylcytosine, 8-oxoguanine, 2-amino-adenosine, 2-amino-deoxyadenosine, 2-thiothymidine, pyrrolo-pyrimidine, 2-thiocytidine, or an abasic lesion. An abasic lesion is a location along the deoxyribose backbone that lacks a base. Additionally, noncanonical structures can incorporate more than one nucleic acid subunit, such as thymine dimers.
In some embodiments, a single target nucleic acid polymer can comprise a combination of any of the foregoing polymers and/or polymer subunits. For example, in some embodiments, the polymer analyte is a combination of any two or more of DNA, RNA, PNA. In such embodiments, the present disclosure addresses the detection of modified nucleic acid subunits that are noncanonical (i.e., have been modified from the canonical structure) with reference to the canonical subunit structures for the two or more types of nucleic acids that make up the single polymer.
Various aspects of the nanopore and nanopore system will now be described. Methods and systems for nanopore-based polymer analysis, which are capable of use with the present disclosure, include systems such as those described in U.S. Pub. No. 2012/0055792, and International PCT Pub. Nos. WO2011/106456 and WO2011/106459, all of which are incorporated by reference herein in their entireties. A "nanopore" specifically refers to a pore having an opening with a diameter at its most narrow point of about 0.3 nm to about 2 nm. Nanopores useful in the present disclosure include any pore capable of permitting the linear translocation of a nucleic acid polymer from one side to the other at a velocity amenable to monitoring techniques, such as techniques to detect current fluctuations.
In some embodiments, the nanopore comprises a protein. Such proteins can be β-barrel pores, outer membrane proteins (often of bacterial origin), β-toxin porins, and transport proteins. Exemplary pores include alpha-hemolysin, Mycobacterium smegmatis porin A (MspA) and related porins such as from Nocardia farcinica, membrane outer protein (OmpATb), membrane outer protein F (OmpF), membrane outer protein G (OmpG), outer membrane phospholipase A, Neisseria autotransporter lipoprotein (NalP), lysenin, anthrax toxin and leukocidins, and homologs thereof, or other porins, as described in U.S. Pub. No. US2012/0055792, International PCT Pub. Nos. WO2011/106459, and WO2011/106456, incorporated herein by reference. A "homolog," as used herein, is a gene or gene product from another bacterial species that has a similar structure and evolutionary origin. By way of an example, homologs of wild-type MspA, such as MppA, PorM1, PorM2, and Mmcs4296, can serve as the nanopore in the present invention. Protein nanopores have the advantage that, as biomolecules, they self-assemble and are essentially identical to one another. In addition, it is possible to genetically engineer protein nanopores to confer desired attributes, such as substituting amino acid residues for amino acids with different charges, or to create a fusion protein (e.g., an exonuclease+alpha-hemolysin). Thus, the protein nanopores can be wild-type or can be modified to contain at least one amino acid substitution, deletion, or addition. In some embodiments, the at least one amino acid substitution, deletion, or addition results in a different net charge of the nanopore. In some embodiments, the difference in net charge increases the difference of net charge as compared to the first charged moiety of the polymer analyte. For example, considering that DNA has a net negative charge, in some embodiments, the at least one amino acid substitution, deletion, or addition results in a nanopore that is less negatively charged. 
In some cases, the resulting net charge is negative (but less so), is neutral (where it was previously negative), is positive (where it was previously negative or neutral), or is more positive (where it was previously positive but less so).
Specific modifications to MspA nanopores have been described; see U.S. Pub. No. 2012/0055792, incorporated herein by reference in its entirety. Briefly described, MspA nanopores can be modified with amino acid substitutions to result in a MspA mutant with a mutation at position 93, a mutation at position 90, position 91, or both positions 90 and 91, and optionally one or more mutations at any of the following amino acid positions: 88, 105, 108, 118, 134, or 139, with reference to the wild type amino acid sequence. In one specific embodiment, the MspA contains the mutations D90N/D91N/D93N, with reference to the wild type sequence positions (referred to therein as "M1MspA" or "M1-NNN"). In another embodiment, the MspA contains the mutations D90N/D91N/D93N/D118R/D134R/E139K, with reference to the wild type sequence positions (referred to therein as "M2MspA"). See U.S. Pub. No. 2012/0055792. Such mutations can result in a MspA nanopore that comprises a vestibule having a length from about 2 to about 6 nm and a diameter from about 2 to about 6 nm, and a constriction zone having a length from about 0.3 to about 3 nm and a diameter from about 0.3 to about 3 nm, wherein the vestibule and constriction zone together define a tunnel. Furthermore, the amino acid substitutions described in these examples provide a greater net positive charge in the vestibule of the nanopore, further enhancing the energetic favorability of interacting with a negatively charged analyte polymer end.
In some embodiments, the nanopores can include or comprise DNA-based structures, such as generated by DNA origami techniques. For descriptions of DNA origami-based nanopores for analyte detection, see PCT Pub. No. WO2013083983, incorporated herein by reference.
In some embodiments, the nanopore can be a solid state nanopore. Solid state nanopores can be produced as described in U.S. Patent Nos. 7,258,838 and 7,504,058, incorporated herein by reference in their entireties. Solid state nanopores have the advantage that they are more robust and stable. Furthermore, solid state nanopores can in some cases be multiplexed and batch fabricated in an efficient and cost-effective manner. Finally, they might be combined with micro-electronic fabrication technology. In some embodiments, the nanopore comprises a hybrid protein/solid state nanopore in which a nanopore protein is incorporated into a solid state nanopore. In some embodiments, the nanopore is a biologically adapted solid-state pore.
In some embodiments, such as those incorporating MspA protein nanopores, the nanopore comprises a vestibule and a constriction zone that together form a tunnel. A "vestibule" refers to the cone-shaped portion of the interior of the nanopore whose diameter generally decreases from one end to the other along a central axis, where the narrowest portion of the vestibule is connected to the constriction zone. A vestibule may generally be visualized as "goblet-shaped." Because the vestibule is goblet-shaped, the diameter changes along the path of a central axis, where the diameter is larger at one end than the opposite end. The diameter may range from about 2 nm to about 6 nm. Optionally, the diameter is about, at least about, or at most about 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, or 6.0 nm, or any range derivable therein. The length of the central axis may range from about 2 nm to about 6 nm. Optionally, the length is about, at least about, or at most about 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, or 6.0 nm, or any range derivable therein. When referring to "diameter" herein, one can determine a diameter by measuring center-to-center distances or atomic surface-to-surface distances.
A "constriction zone" refers to the narrowest portion of the tunnel of the nanopore, in terms of diameter, that is connected to the vestibule. The length of the constriction zone can range, for example, from about 0.3 nm to about 20 nm. Optionally, the length is about, at most about, or at least about 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, or 3 nm, or any range derivable therein. The diameter of the constriction zone can range from about 0.3 nm to about 2 nm. Optionally, the diameter is about, at most about, or at least about 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, or 3 nm, or any range derivable therein. In other embodiments, such as those incorporating solid state pores, the range of dimension (length or diameter) can extend up to about 20 nm. For example, the constriction zone of a solid state nanopore is about, at most about, or at least about 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nm, or any range derivable therein.
In some cases, the nanopore is disposed within a membrane, thin film, or lipid bilayer, which can separate the first and second conductive liquid media and provide a nonconductive barrier between them. The nanopore, thus, provides liquid communication between the first and second conductive liquid media. In some embodiments, the pore provides the only liquid communication between the first and second conductive liquid media. The liquid media typically comprise electrolytes or ions that can flow from the first conductive liquid medium to the second conductive liquid medium through the interior of the nanopore. Liquids employable in methods described herein are well-known in the art. Descriptions and examples of such media, including conductive liquid media, are provided in U.S. Patent No. 7,189,503, for example, which is incorporated herein by reference in its entirety. The first and second liquid media may be the same or different, and either one or both may comprise one or more of a salt, a detergent, or a buffer. Indeed, any liquid media described herein may comprise one or more of a salt, a detergent, or a buffer. Additionally, any liquid medium described herein may comprise a viscosity-altering substance or a velocity-altering substance.
The nanopore is capable of interacting with the nucleic acid analyte polymer serving as the target or focus of a modification analysis herein. Specifically, the polymer and nanopore are capable of interacting such that the polymer can translocate through the nanopore from a first conductive liquid medium to a second conductive liquid medium. The translocation is preferably in a linear fashion, through the pore to the other side. As used herein, the terms "interact" or "interacting" indicate that the analyte moves into at least an interior portion of the nanopore and, optionally, moves into the constriction zone so as to maximally affect the measurable current through the nanopore. As used herein, the terms "through the nanopore" or "translocate" are used to convey that at least some portion of the polymer analyte enters one side of the nanopore and moves to and out of the other side of the nanopore. In some cases, the first and second conductive liquid media located on either side of the nanopore are referred to as being on the cis and trans regions, where the analyte polymer to be measured generally translocates first from the cis region to the trans region through the nanopore. However, in some embodiments, the analyte polymer to be measured can translocate from the trans region to the cis region through the nanopore. For example, as described below (and illustrated in FIGURES 1A, 1B, and 1C), the nanopore system used incorporated a molecular motor, a blocking oligo, and a hairpin primer. The blocking oligo is unzipped from the template strand as the template strand passes linearly from the cis to the trans side. Once the blocking oligo completely dissociates from the template, the molecular motor pulls the strand backwards through the nanopore, from the trans to the cis side, by virtue of the polymerase action that is "primed" by the hairpin primer. 
In some cases, the entire length of the polymer does not pass through the pore, but sub-portions or segments of the polymer complete the pass through the nanopore for analysis.
The analyte nucleic acid polymer can be translocated through the nanopore using a variety of mechanisms. For example, the analyte polymer and/or reference sequence can be electrophoretically translocated through the nanopore by virtue of the electrical field that is applied to the system. Thus, some nanopore systems also incorporate structural elements to apply an electrical field across the nanopore-bearing membrane or film. For example, the system can include a pair of drive electrodes that drive current through the nanopores. Additionally, the system can include one or more measurement electrodes that measure the current through the nanopore. These can be, for example, a patch-clamp amplifier or a data acquisition device. For example, nanopore systems can include an Axopatch-1B patch-clamp amplifier (Axon Instruments, Union City, CA) to apply voltage across the bilayer and measure the ionic current flowing through the nanopore. The electrical field is sufficient to translocate a polymer analyte through the nanopore. As will be understood, the voltage range that can be used can depend on the type of nanopore system being used. For example, in some embodiments, the applied electrical field is between about 20 mV and about 260 mV, for protein-based nanopores embedded in lipid membranes. In some embodiments, the applied electrical field is between about 40 mV and about 200 mV. In some embodiments, the applied electrical field is between about 100 mV and about 200 mV. In some embodiments, the applied electrical field is about 180 mV. In other embodiments where solid state nanopores are used, the applied electrical field can be in a similar range as described, up to as high as 1 V.
Additionally or alternatively, nanopore systems can include a component that translocates a polymer through the nanopore enzymatically. For example, a molecular motor can be included to influence the translocation of polymers through the nanopore. A molecular motor can be useful for facilitating entry of a polymer into the nanopore and/or facilitating or modulating translocation of the polymer through the nanopore. Ideally, the translocation velocity, or an average translocation velocity, is less than the translocation velocity that would occur without the molecular motor. In any embodiment herein, the molecular motor can be an enzyme. Illustrative, nonlimiting examples useful for nanopore systems include polymerases, exonucleases, a Klenow fragment, helicases (such as hel308/Mbu, T7hp4A, RecD, XpD), translocases, and topoisomerases. In one example, described in more detail below, a DNA polymerase such as phi29 can be used to facilitate movement in both directions. See Cherf, G.M., et al., "Automated Forward and Reverse Ratcheting of DNA in a Nanopore at 5-Å Precision," Nature Biotechnology 30:344-348, 2012; and Manrao et al., 2012, both of which are incorporated herein by reference in their entireties. An embodiment of the present system that utilizes a molecular motor is also schematically illustrated in FIGURES 1B and 1C.
As indicated above, the present aspect of the disclosure includes the step of detecting an ion current to provide a current pattern associated with a portion of the nucleic acid polymer.
Generally, characteristics of the nucleic acid polymer analyte, or subunit(s) thereof, can be determined based on the effect of the polymer, or subunit(s) thereof, on a measurable signal when interacting with the nanopore, such as interactions with the outer rim, vestibule, or constriction zone of the nanopore. The output signal produced by the nanopore system is any measurable signal that provides a multitude of distinct and reproducible signals depending on the physical characteristics of the polymer or polymer subunit(s). To illustrate, the polymer subunit(s) that determine(s) or influence(s) a measurable signal is/are the subunit(s) residing in the "constriction zone," i.e., the three-dimensional region in the interior of the pore with the narrowest diameter. Depending on the length of the constriction zone, the number of polymer subunits that influence the co-passage of electrolytes and, thus, a current output signal can vary. The ionic current level through the pore is an output signal that can vary depending on the particular polymer subunit(s) residing in the constriction zone of the nanopore at any given time. As the polymer translocates in iterative steps (e.g., linearly, subunit by subunit through the pore), the current levels can vary to create a trace, or "current pattern," of multiple output signals corresponding to the contiguous sequence of the polymer subunits that have affected the current at each iterative step. This detection of current levels, or "blockade" events, has been used to characterize a host of information about the structure of polymers, such as DNA, passing through, or held in, a nanopore in various contexts.
In general, a "blockade" is evidenced by a change in ion current that is clearly distinguishable from noise fluctuations and is usually associated with the presence of an analyte molecule, e.g., one or more polymer subunits, within the nanopore such as in the constriction zone. The strength of the blockade, or change in current, will depend on a characteristic of the polymer subunit(s) present. Accordingly, in some embodiments, a "blockade" is defined against a "blockade reference" current level. In some embodiments, the blockade reference current level corresponds to the current level when the nanopore is unblocked (i.e., has no analyte structures present in, or interacting with, the nanopore). In some embodiments, the blockade reference current level corresponds to the current level when the nanopore has a known analyte (e.g., a known analyte polymer subunit) residing in the nanopore. In some embodiments, the current level returns spontaneously to the blockade reference level (if the nanopore reverts to an empty state, or becomes occupied again by the known analyte). In other embodiments, the current level proceeds to a level that reflects the next iterative translocation event of the polymer analyte domain through the nanopore, and the particular subunit(s) residing in the nanopore change(s). To illustrate, with respect to the blockade reference current level defined as an unblocked level, the blockade is established when the current is lower than the blockade reference current level by an amount of about 1-100% of the blockade reference current level. It will be understood that the blockade reference current level can immediately precede the blockade event or, alternatively, be separated from the blockade event by a period of time with intervening current measurements. 
For example, the ionic current may be lower than the blockade reference current level by a threshold amount of about, at least about, or at most about 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%, or any range derivable therein, of the blockade reference current level when a polymer analyte domain subunit enters the nanopore. With respect to the blockade reference current level defined by the presence of a known analyte (e.g., known polymer subunit(s)), the blockade is established when the current is lower or higher than the reference level by an amount of about 1-100% of the reference current level. It will be understood that the blockade reference current level can immediately precede the blockade event or, alternatively, be separated from the blockade event by a period of time with intervening current measurements. For example, the ionic current may be lower or higher than the blockade reference current level by a threshold of about, at least about, or at most about 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%, or any range derivable therein, of the blockade reference current level when a polymer analyte domain subunit enters the nanopore. "Deep blockades" can be identified as intervals where the ionic current is lower (or higher) by at least 50% of the blockade reference level. Intervals where the current drops by less than 50% of the blockade reference level can be identified as "partial blockades." In some embodiments, the current level in a blockade remains at the reduced (or elevated) level for at least about 1.0 μs.
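By way of illustration only, the threshold comparison described above can be sketched in Python as follows. The function name, the argument names, and the default minimum threshold of 1% are assumptions chosen for the example; the 50% boundary between "partial" and "deep" blockades is taken from the preceding paragraph.

```python
def classify_blockade(current_pa, reference_pa, min_fraction=0.01):
    """Classify a current level against a blockade reference level.

    Returns "none", "partial", or "deep". The names and the 1% default
    are illustrative assumptions, not part of the disclosed method.
    """
    if reference_pa <= 0:
        raise ValueError("reference current must be positive")
    # Fractional deviation from the reference level (a drop or a rise).
    fraction = abs(reference_pa - current_pa) / reference_pa
    if fraction < min_fraction:
        return "none"
    # At least 50% of the reference level counts as a deep blockade.
    return "deep" if fraction >= 0.5 else "partial"


if __name__ == "__main__":
    for level in (99.9, 80.0, 40.0):
        print(level, classify_blockade(level, 100.0))
```

A practical implementation would presumably also require the level to persist for a minimum duration before it is counted as a blockade.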
The present inventors have determined that the measurable current pattern, specifically one or more blockades in a trace, is associated with structure(s) of one or more contiguous nucleotides of a nucleic acid polymer that reside in the constriction zone of the nanopore during translocation. Furthermore, the inventors have determined that slight modifications, such as methylation and hydroxymethylation, on one or more specific nucleotide residues differentially affect the current flow as compared to a polymer with the same nucleotide sequence, but with unmodified nucleotides. The influence on the current flow is detectable and, as demonstrated below, can be used to accurately detect the modification and map it to the specific nucleotide.
Accordingly, to determine the presence of a modified nucleotide in the analyte nucleic acid polymer, the current pattern associated with a portion of the nucleic acid polymer is compared to a "reference current pattern" (as distinct from the "blockade reference level") associated with the same nucleotide sequence as the portion of the nucleic acid polymer, but wherein the sequence associated with the reference current pattern does not have any modifications to any of the nucleotides. The presence of a modified nucleotide in the analyte polymer (or a portion thereof) is indicated when a difference between the current pattern and the reference current pattern is detected.
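The comparison described above can be sketched, under stated assumptions, as a level-by-level subtraction. In this illustrative Python example the two patterns are assumed to be already aligned lists of average current levels in pA, and the 1 pA detection threshold is a hypothetical value chosen only for the example.

```python
def flag_differences(target_levels, reference_levels, threshold_pa=1.0):
    """Subtract an aligned reference pattern from a target pattern.

    Returns (differences, flagged_indices). The threshold is an
    illustrative assumption, not a value taken from the disclosure.
    """
    if len(target_levels) != len(reference_levels):
        raise ValueError("patterns must be aligned and of equal length")
    diffs = [t - r for t, r in zip(target_levels, reference_levels)]
    # Keep the indices of levels that deviate by at least the threshold.
    flagged = [i for i, d in enumerate(diffs) if abs(d) >= threshold_pa]
    return diffs, flagged


if __name__ == "__main__":
    target = [50.0, 52.5, 53.0, 48.0]
    reference = [50.0, 50.0, 50.5, 48.2]
    print(flag_differences(target, reference))
```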
As indicated above, various pores that are useful for this analysis have different three dimensional structures, especially in the tunnel. Thus, the constriction zone, i.e., the shape of the narrowest portion of the pore tunnel, can vary. The output signal for each iterative step during translocation, i.e., the signal reflecting the passage of a single nucleotide, is often affected by multiple contiguous nucleotides in the polymer sequence, specifically those that reside in the constriction zone at each iterative passage step. For example, it has been determined for MspA that each blockade event in the trace is influenced mostly by a quadromer (or 4-mer) of contiguous nucleotides that reside in the constriction zone at that time. Thus, each individual nucleotide in the sequence can ultimately contribute to four blockade events if it passes completely through the constriction zone. As described below, the inventors have shown that the specific difference in the signal that results from a modification of a single nucleotide is similarly observed over four blockade events and the profile of the difference in current signal over at least four blockade events is indicative of the specific type of modification and the location of the modification. Moreover, the specific profile of the difference in current signal over at least four blockade events is also influenced by the sequence context of the quadromer, i.e., the surrounding nucleotide sequence of the quadromer.
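The statement that a single nucleotide contributes to four blockade events can be illustrated by enumerating the quadromers that contain a given position. The following Python sketch uses a hypothetical function name and an arbitrary example sequence; it is offered only to make the counting explicit.

```python
def quadromers_containing(sequence, index, k=4):
    """Return (start, kmer) for every k-mer window covering sequence[index]."""
    windows = []
    # A window of length k covers `index` when start is in [index-k+1, index].
    for start in range(index - k + 1, index + 1):
        if start >= 0 and start + k <= len(sequence):
            windows.append((start, sequence[start:start + k]))
    return windows


if __name__ == "__main__":
    # Arbitrary example sequence; position 3 is an internal C.
    for start, kmer in quadromers_containing("ATGCCGTA", 3):
        print(start, kmer)
```

An internal nucleotide appears in four windows, while a nucleotide near either end of the read appears in fewer.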
Accordingly, the determination of the presence of a modified nucleotide can depend on whether the difference between the analyte polymer current pattern and the reference current pattern is associated with a particular portion of the analyte polymer. For example, in some embodiments, the current difference is associated with a portion of the nucleic acid analyte polymer that comprises one or a plurality of contiguous nucleotides of the nucleic acid polymer. In some embodiments, the portion of the nucleic acid analyte polymer comprises two or more contiguous nucleotides of the nucleic acid polymer, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more contiguous nucleotides of the nucleic acid polymer. In some embodiments, the portion of the nucleic acid analyte polymer comprises 3, 4, 5, or 6 contiguous nucleotides of the nucleic acid polymer. In some embodiments, the portion of the nucleic acid polymer comprises the nucleotide or nucleotide position with the modification. In further embodiments, the portion of the nucleic acid polymer also comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 contiguous nucleotides immediately adjacent to the nucleotide or nucleotide position with the modification. In these embodiments, the additional contiguous nucleotides can be immediately adjacent at the 5'-, the 3'-, or both sides of the nucleotide or nucleotide position with the modification. In some embodiments, the portion of the nucleic acid polymer includes at least one nucleotide immediately 5'- to the modified nucleotide. In some embodiments, the portion of the nucleic acid polymer includes at least one nucleotide immediately 3'- to the modified nucleotide. In some embodiments, the portion of the nucleic acid polymer includes at least one nucleotide immediately 5'- and at least one nucleotide immediately 3'- to the modified nucleotide.
In one example, as described below, it was determined that for the specific nanopore system employed, the two nucleotides positioned immediately 5' to the modified nucleotide had the most influence on the current signal difference, followed by the single nucleotide position immediately 3' to the modified nucleotide. Persons of skill in the art would be able to determine the positions of maximal influence on the signal difference for any nanopore of interest.
In addition to determining the particular size (i.e., length) of the nucleic acid polymer portion that corresponds with a current pattern difference (and hence the presence of a modified nucleotide), the inventors discovered that the identity of the specific nucleotides that occupy each position in the nucleic acid polymer portion influences the identifiable character of the current difference. As described below, every sequence variation of XYmCG (and XYhCG) was tested and the various current profiles (and differences with the reference sequences) were catalogued (see, e.g., FIGURES 7A and 7B). Such data can be used as a reference, or "look-up" table, to assist the mapping and identification of the specific modifications that are detected.
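One possible way to use such a look-up table is a nearest-profile search, sketched below in Python. The table keys, the least-squares criterion, and every numerical value are hypothetical placeholders invented for this illustration; they are not the catalogued measurements of FIGURES 7A and 7B.

```python
# Hypothetical placeholder profiles: difference in pA over four consecutive
# current levels. These numbers are invented and are NOT measured values.
LOOKUP = {
    ("TG", "mC"): [0.5, 2.0, 2.5, 1.0],
    ("TG", "hC"): [-0.5, -1.5, -1.0, 0.5],
    ("AA", "mC"): [1.0, 3.0, 2.0, 0.5],
}


def best_match(observed, table):
    """Return the table key whose stored profile is closest to `observed`.

    Closeness is the sum of squared differences, an assumed criterion.
    """
    def squared_error(profile):
        return sum((o - p) ** 2 for o, p in zip(observed, profile))

    return min(table, key=lambda key: squared_error(table[key]))


if __name__ == "__main__":
    print(best_match([0.4, 2.1, 2.4, 1.1], LOOKUP))
```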
In accordance with the above discoveries, in some embodiments, the present method further comprises identifying the type of modification present in the nucleic acid polymer based on a character of the difference between the current pattern and the reference current pattern. In some embodiments, the character is the duration of the difference, e.g., how much time or many blockade events the difference is observed for. In some embodiments, the character is the degree of change, e.g., how much the current pattern differs from the reference pattern in terms of increase or decrease. The character can also combine a particular range of current increase or decrease over a particular time (or number of blockade events). In some embodiments, knowledge of the specific sequence of the nucleic acid portion can contribute to the determination of whether the difference is indicative of a modified nucleotide. However, it will be recognized that knowledge of the specific nucleotide sequence is not necessary for the method. Instead, the comparable current patterns for the portion of the analyte nucleic acid polymer and reference current pattern must be associated with the same sequence, whether or not known. For example, a reference nucleic acid polymer can be replicated or amplified from the analyte nucleic acid polymer using conventional techniques that replicate the (potentially unknown) sequence, but that do not replicate the modification. Such techniques include using the polymerase chain reaction. The reference current pattern can be obtained in a number of ways. In some embodiments, the reference current pattern can be generated de novo by similarly applying a reference nucleic acid polymer to nanopore analysis to generate a current pattern. In some embodiments, the reference current pattern was previously determined and is available in a reference or look-up table. 
In some embodiments, the reference current pattern can be derived from the modeling signals that would be expected from the structure of the reference portion, i.e., the sequence without modification.
In another aspect, the present disclosure provides a method that involves generating a reference current pattern for comparison. Specifically, this aspect provides a method of detecting a nucleotide modification in a nucleic acid polymer, comprising the step of amplifying a target nucleic acid polymer that potentially contains at least one nucleotide modification to produce a reference nucleic acid polymer that does not contain a nucleotide modification. The method further comprises applying the target and reference nucleic acid polymers to a nanopore system comprising a first conductive liquid medium in liquid communication with a second conductive liquid medium through a nanopore. The target nucleic acid polymer is caused to translocate through the nanopore from the first conductive liquid medium to the second conductive liquid medium, and an ion current is detected to provide a target current pattern associated with a portion of the target nucleic acid polymer. Further, the reference nucleic acid polymer is caused to translocate through the nanopore from the first conductive liquid medium to the second conductive liquid medium, and an ion current is detected to provide a reference current pattern associated with a portion of the reference nucleic acid polymer, wherein the portion of the reference nucleic acid polymer comprises the same nucleotide sequence as the portion of the target nucleic acid polymer. The target current pattern is compared to the reference current pattern, wherein a difference between the target current pattern and the reference current pattern indicates the presence of a modified nucleotide in the target nucleic acid polymer.
As will be apparent to persons of skill in the art, the technique of replication is limited to approaches that provide reference nucleic acid polymers that contain the same nucleotide sequences as the portion of the target nucleic acid polymer, but that do not retain the modifications to be detected. In some embodiments, the reference nucleic acid polymer is produced from the target nucleic acid polymer using at least one round of the polymerase chain reaction (PCR). A skilled artisan can readily determine the reaction conditions to facilitate replication or amplification of the sequence so as to produce a reference sequence that is capable of producing a current signal to assist in the detection of modifications in the target nucleic acid polymer.
The use of the term "or" in the claims is used to mean "and/or" unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and "and/or."
Following long-standing patent law, the words "a" and "an," when used in conjunction with the word "comprising" in the claims or specification, denote one or more, unless specifically noted.
The use of the term "about" is intended to include a slight variation, such as 10%, above and below the stated value.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of "including, but not limited to." Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words "herein," "above," and "below," and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application.
Disclosed are materials, compositions, and components that can be used for, can be used in conjunction with, can be used in preparation for, or are products of the disclosed methods and compositions. It is understood that, when combinations, subsets, interactions, groups, etc., of these materials are disclosed, each of various individual and collective combinations is specifically contemplated, even though specific reference to each and every single combination and permutation of these compounds may not be explicitly disclosed. This concept applies to all aspects of this disclosure including, but not limited to, steps in the described methods. Thus, specific elements of any foregoing embodiments can be combined or substituted for elements in other embodiments. For example, if there are a variety of additional steps that can be performed, it is understood that each of these additional steps can be performed with any specific method steps or combination of method steps of the disclosed methods, and that each such combination or subset of combinations is specifically contemplated and should be considered disclosed. Additionally, it is understood that the embodiments described herein can be implemented using any suitable material such as those described elsewhere herein or as known in the art.
Publications cited herein and the subject matter for which they are cited are hereby specifically incorporated by reference in their entireties.
The following describes the application of a nanopore system to detect and map methylated and hydroxymethylated cytosine nucleotides in a DNA polymer analyte.
The M2-NNN-MspA protein was purified from Mycobacterium smegmatis as previously described in Butler et al. (2008). DNA oligonucleotides were synthesized at Stanford University Protein and Nucleic Acid Facility and purified at their facility using column purification methods. See Table 1 for sequence of ssDNA strands used. The location of the modifications is indicated in all figures and in Table 1 with "m" or "h". DNA templates, primers and blocking oligomers were mixed at relative molar concentrations of 1:1:1.2 and annealed by incubating at 95 °C for 3 min followed by slow-cooling to below 30 °C. DNA and phi29 DNAP were stored at -20 °C until immediately before use.
TABLE 1: Template strand and oligo sequences for use in a nanopore-based analysis
[TABLE 1 is reproduced as an image (imgf000028_0001) in the original publication.]
Single MspA pores were established in a lipid bilayer with previously described methods in Butler et al. (2008), which is incorporated by reference herein. See FIGURE 1B. Briefly, 1,2-diphytanoyl-sn-glycerol-3-phosphocholine (Avanti Polar Lipids) lipid bilayers were formed across a horizontal ~20 μm diameter Teflon aperture. The ~60 μl compartments on both sides of the bilayer contained experimental buffer of 0.3 M KCl, 1 mM EDTA, 1 mM DTT, and 10 mM HEPES/KOH buffered at pH 8.0 ± 0.05. An Axopatch 200B integrating patch clamp amplifier (Axon Instruments) applied a 180 mV voltage across the bilayer (trans side positive) and measured the ionic current through the pore. M2-NNN MspA was added to the grounded cis compartment, yielding a concentration of ~2.5 ng/ml. Once a single pore inserted, the compartment was flushed with experimental buffer to avoid further insertions.
Annealed DNA, as shown in FIGURE 1A, was then added to the experimental volume to achieve a final concentration of ~1 μM. Once DNA was added to the system, initial interactions with the pore were observed as previously described. phi29 DNA Polymerase (DNAP) was then utilized as described in Manrao et al. (2012). Briefly, once DNA interaction with the pore was confirmed, EDTA and DTT were added to the front well to final concentrations of 1 mM each. The EDTA and DTT bind up contaminant divalent ions and create the reducing conditions which phi29 requires for functionality. Then phi29 DNAP is added to a final concentration of 1.5 μM, a dNTP mixture is added to 100 μM of each of the four standard dNTPs, and MgCl2 is added to a final concentration of 10 mM. The interaction of phi29 DNAP with the MspA nanopore and hybridized template construct is illustrated schematically in FIGURES 1B and 1C. As illustrated, the reproducible current levels were generated indicating the nucleotide-by-nucleotide movement of the ssDNA template strand length-wise through the constriction zone of the nanopore. In the first stage, the template strand threads through the nanopore from the cis to the trans side of the pore, while the blocking oligo unzips from the template strand. Once the blocking oligo is completely disassociated from the template strand, the 3'-end of the hairpin primer is exposed to the phi29 DNAP active site, which enables the polymerase to extend the primer. In most cases, this extension action of the polymerase overcomes the original translocation direction of the template strand, resulting in the template strand being "pulled" back through the nanopore from the trans side to the cis side. During the entire process, the electrical field is monitored for the current flowing through the nanopore. Fluctuations of current are indicative of the specific one or more nucleotides of DNA residing in the most constricted portion of the nanopore opening.
With this particular nanopore, it was determined that each blockade event is determined by a quadromer of contiguous nucleotides within the constriction zone of the nanopore. As the polymer progresses through the nanopore, one nucleotide at a time, the sequence of the quadromer changes, leading to a fluctuation in current level. The current fluctuations are recorded to generate a "trace" of current fluctuations over time, with each nucleotide movement associated with a distinct blockade event. Established relative blockades for the different combinations of nucleotides (e.g., the quadromers that determine each detectable current signal) allow for the reconstruction of the DNA sequence with its associated levels. See International PCT Pub. No. WO2013/159042, incorporated herein by reference in its entirety.
After multiple runs, the average current levels from the modified TGCC template (i.e., with methylated and hydroxymethylated CpG sites) were compared to the average current levels for the TGCC template (i.e., the same DNA sequence as the modified TGCC template but with unmethylated CpG sites) (FIGURE 2). It is noted that the sequences illustrated in FIGURES 2-4 (and set forth herein as SEQ ID NOS:5 and 6) are internal portions of the sequences set forth in SEQ ID NOS:1 and 2, respectively, but are illustrated from 3' to 5' to correspond to the temporal acquisition of the data as the phi29 DNAP pulled the template strand into the cis side of the pore. As illustrated in FIGURE 3, which shows the normalized current differences (modified TGCC signal minus the TGCC signal), the introduction of hC tended to result in a lower current level over the associated quadromers, whereas the introduction of mC resulted in a consistently higher current level in the associated quadromers. An unmodified CpG present in the modified TGCC template strand notably resulted in a similar current signal as the reference TGCC strand. A statistical analysis wherein Gaussian peaks were fit to the data illustrated in FIGURE 3 resulted in the Gaussian peaks corresponding closely to the locations of the epigenetic modifications in the modified TGCC template. Accordingly, the signal changes resulting from these epigenetic modifications can be precisely correlated to the location of and type of modification within the DNA polymer using a nanopore-based analysis. The following is a description of an expanded study using a nanopore-based analysis to detect and distinguish multiple epigenetic modifications that appear in DNA polymers within a variety of different sequence contexts.
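The fitting procedure actually used for FIGURE 3 is not detailed here. As a simplified stand-in, the centre of a single positive Gaussian-shaped peak can be estimated from its highest level and the two neighboring levels, because the logarithm of a Gaussian is a parabola. The Python sketch below implements that three-point estimate; the function name is hypothetical, and the approach is an assumption made for illustration rather than the analysis performed by the inventors.

```python
import math


def gaussian_peak_centre(diffs):
    """Estimate the centre of a positive Gaussian-shaped peak.

    Uses three-point log-parabola interpolation around the maximum.
    This is an illustrative stand-in, not the disclosed fitting method.
    """
    peak = max(range(len(diffs)), key=lambda i: diffs[i])
    if peak == 0 or peak == len(diffs) - 1:
        raise ValueError("peak must not lie at the edge of the data")
    left, mid, right = diffs[peak - 1], diffs[peak], diffs[peak + 1]
    if min(left, mid, right) <= 0:
        raise ValueError("the three levels around the peak must be positive")
    a, b, c = math.log(left), math.log(mid), math.log(right)
    curvature = a - 2.0 * b + c
    if curvature == 0:
        raise ValueError("flat peak; centre is not defined")
    # Vertex of the parabola through the three log-values.
    return peak + 0.5 * (a - c) / curvature


if __name__ == "__main__":
    # Synthetic, noise-free Gaussian used only to exercise the function.
    levels = [2.5 * math.exp(-((i - 4.3) ** 2) / (2 * 1.2 ** 2)) for i in range(10)]
    print(gaussian_peak_centre(levels))
```

For a negative peak, such as one associated with hC, the same estimate could be applied to the sign-inverted differences.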
As described above, phi29 DNA polymerase (DNAP) was used to draw ssDNA through a mutated MspA protein pore in single-nucleotide steps. This yielded resolved current levels that could be associated with the DNA sequence (Manrao et al., 2012). As described hereinabove, an application of this system was able to detect and map mC and hC along single molecules of ssDNA with single-nucleotide resolution in a TGCC quadromer context.
Here, current traces were measured for methylated, hydroxymethylated, and unmethylated DNA passing through the pore. The effect of methylation on the ion current was observed by comparing the current level sequence from reads of methylated DNA with the current level sequence from reads of unmethylated DNA of the same sequence. Based on these analyses, it was apparent that a single methyl group on a mC consistently increases the current relative to unmethylated C, regardless of the sequence context. This increase persists for several current levels as the DNA passes through the pore. FIGURES 5A and 5B show raw current traces for unmethylated and methylated DNA, respectively. The extracted average current levels are shown in FIGURE 5C for unmethylated DNA in black and methylated DNA in light gray. As in Manrao et al. (2012), these current level sequences can be aligned to their DNA sequence (shown in FIGURE 5C). The difference between the methylated and unmethylated current level sequences (FIGURE 5D) isolates the effect.
Specifically, eight different DNA sequences were investigated with the nanopore system. Each sequence had multiple CpG sites. Several reads of unmethylated DNA were compared to reads of methylated and hydroxymethylated DNA. FIGURES 6A-6D show the average current level differences in 20 or more single-molecule comparisons for four different DNA constructs. Across all such comparisons, mC consistently increased current relative to C, whereas hC generally decreased current relative to C.
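The general trend stated above (mC raises the current, hC tends to lower it) suggests a very simple sign-based call, sketched below in Python. This is a deliberately naive illustration: the function name, the use of the mean over the affected levels, and the 0.5 pA noise floor are all assumptions, and, as discussed elsewhere herein, hC profiles in some sequence contexts include positive levels, so a sign test alone would not be expected to be reliable.

```python
def call_modification(diffs, noise_pa=0.5):
    """Make a naive call from the mean current difference (target - reference).

    Returns "mC" for a net increase, "hC" for a net decrease, and "C" when
    the mean stays within the assumed noise floor. Illustrative only.
    """
    if not diffs:
        raise ValueError("at least one difference value is required")
    mean_diff = sum(diffs) / len(diffs)
    if abs(mean_diff) < noise_pa:
        return "C"
    return "mC" if mean_diff > 0 else "hC"


if __name__ == "__main__":
    print(call_modification([0.5, 2.0, 2.5, 1.0]))
    print(call_modification([-0.5, -1.5, -2.0, -0.5]))
```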
The current difference caused by a mC or a hC was found to be strongly affected by the sequence context in which it is embedded. In particular, the nucleotides immediately adjacent to a mC or hC have the greatest influence on the size and shape of the current difference. To investigate this further, the nucleotides on the 5' side were varied and the nucleotide on the 3' side of the C was fixed as a G because of the biological relevance of CpG sites. In exploratory experiments, it was observed that the two nucleotide positions to the 5' side of the modified cytosine have a bigger influence than the two nucleotide positions toward the 3' side (see FIGURE 10). Therefore, all 16 two-nucleotide combinations of the form XYCpG were measured, where X and Y represent A, C, G, or T. Experiments with 115 MspA pores were conducted, using 22 different DNA constructs with each containing several CpG regions. These CpGs were either unmethylated (CpG), methylated (mCpG), or hydroxymethylated (hCpG). In total, 814 translocation events that contained full reads of the given DNA strand were analyzed. These events contained a total of 2,857 passages of various CpGs through the pore. Strand-specific statistics can be found in TABLE 2.
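The count of 16 contexts follows from choosing each of X and Y independently from four bases. For illustration, the contexts can be enumerated in Python as follows; the function name and the string labels are assumptions made for the example.

```python
from itertools import product

BASES = "ACGT"


def xy_contexts(cytosine="mC"):
    """List every XY<cytosine>pG context, with X and Y drawn from A, C, G, T."""
    return [x + y + cytosine + "pG" for x, y in product(BASES, repeat=2)]


if __name__ == "__main__":
    contexts = xy_contexts("mC")
    print(len(contexts))
    print(contexts)
```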
TABLE 2: Template strand and oligo sequences for use in a nanopore-based analysis of nucleotide modifications in various quadromer contexts.
[TABLE 2 is reproduced as images (imgf000033_0001, imgf000034_0001, imgf000035_0001) in the original publication.]
Results for all 16 XYmCpGs and XYhCpGs are summarized in FIGURES 7A and 7B. The maximum difference is up to 7 pA depending on sequence context. On average, the maximum difference caused by mC is approximately 2.5 pA (FIGURE 7A, Bottom Right Panel). Previously, it was observed that four nucleotides within MspA's constriction affect each current level (see, e.g., schematic illustrations in FIGURES 1B and 8), with the two nucleotides centered in the pore's constriction affecting the current the most (Manrao et al., 2011; Manrao et al., 2012). The present results are consistent in that the replacement of C with mC or hC affects approximately four consecutive current levels. The current difference is maximal when the mC is positioned immediately to cis of the constriction and the shape of the difference peak exhibits skewness.
Current differences due to hC are more complex than differences due to mC. Typically, when hC is centered within MspA's constriction, the difference is -2 to -1 pA. In some cases (GGhCpG, AAhCpG, AThCpG, TGhCpG, and CAhCpG), the current difference includes some levels with positive difference. The differences associated with some sequence contexts are small, with only ~1σ differences per level. Averaging over all sequence contexts (FIGURE 7B, Bottom Right Panel), the difference reaches a negative peak when the hC is near the cis side of the constriction. The locations of maximal difference caused by mC and hC differ by ~1 nt. Average difference patterns for both mC and hC map out a single, sharp recognition site within MspA's constriction. All current profiles caused by hC are very distinct from current profiles caused by mC within the same sequence context.
The schematic in FIGURE 8 shows how sequence context dependence arises. MspA's cross-section is shown in solid black, and variable gray shading indicates the region of high electric field. Nucleotides within the region of high electric field affect the ion current. As mC or hC pass through the pore, their location relative to the pore constriction determines how much they affect the current. All nucleotides within the high-field region of the constriction will influence the current, and therefore alter the influence of a mC or hC modification. Apart from the two nucleotides positioned immediately to the 5' of the CpG, the nucleotide to the 3' of the CpG is also relevant, albeit to a smaller extent. For example, when a T follows the CpG, as in TGmCpGT, the 3' side of the current difference peak caused by mC is reduced (FIGURE 4). The data in FIGURE 7 demonstrate that the four nucleotides X, Y, C, and G dominate the magnitude of the current difference caused by mCpG and hCpG (see also FIGURE 10).
The effect of several mCpGs near one another was also investigated. FIGURES 9A-9D show a construct with several modified Cs spaced only five nucleotides apart. The current difference peaks associated with the four mCs and two hCs are still easily distinguishable. When two or three mCpGs are immediately adjacent to one another, as in FIGURE 9C, the difference peak is wider and higher than the signal for just one mCpG within the same context. Placing a hCpG immediately adjacent to a mCpG (FIGURE 9D) reduces the signal of the nearby mCpGs. The signal is approximately a superposition of the individual mC and hC signals.
Using the current differences shown in FIGURE 7, a simple Bayesian probability methylation detection algorithm was implemented. Three consecutive current differences from single-molecule measurements were compared to the current difference patterns in FIGURE 7 (see the Materials and Methods in the below Examples section). This algorithm was used to distinguish between C, mC, and hC at known CpG sites. A mCpG true-positive detection rate of 97.5 ± 0.7% and a hCpG true-positive detection rate of 97.0 ± 0.9% were determined. The true-negative detection rate for unmethylated CpGs was 98.4 ± 0.6%. Many XYmCpGs, such as AAmCpG, were always properly called. Methylated sites with smaller current differences, such as CTmCpG and TCmCpG, were detected with lower accuracy: ~86% and ~88%, respectively (see TABLE 2 for individual context-dependent detection rates). As one would expect based on the comparatively smaller current differences shown, hC true-positive rates were lower than for mC. In all sequence contexts, mC was distinct from hC; mCpGs were miscalled as hCpGs in 3 out of 478 occurrences, whereas hCpGs were never miscalled as mCpGs in 609 reads.
The observed current differences were also effective in locating mCpG sites relative to one another without using any prior knowledge of the DNA sequence. Current level sequences of DNA containing CpG sites were aligned via Needleman-Wunsch alignment (Manrao EA, et al., 2012) to current level recordings of unmethylated DNA. Searches for methylation sites within the current difference between methylated event traces and an unmethylated consensus were performed using a peak detection algorithm. Detecting a mC within two nucleotides of its known position was considered a true-positive detection. A true-positive detection rate of 92.7% was determined for mC and a true-negative rate of 99.1% for all unmethylated regions. In this method, true negatives included non-CpG regions in addition to CpGs tested above, resulting in a higher true-negative detection rate than in the method described in the preceding paragraph. Rates from these two methods are not directly comparable. In another sequence-independent technique, a Bayesian classification measure was used to find mCs, yielding similar detection efficiencies (see the Materials and Methods in the below Examples section). mC detection without reference to DNA sequence is useful for hypermethylation or hypomethylation detection and is comparable to other nanopore methylation detection techniques (Wanunu, M., et al., "Discrimination of Methylcytosine From Hydroxymethylcytosine in DNA Molecules," Journal of the American Chemical Society 133(3):486-492, 2011; Shim et al., 2013).
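The sequence-independent peak search described above can be sketched as follows, using the thresholds reported in the Materials and Methods (a peak height of 1.1 pA and a separation of at least 6 levels). The function name, the greedy tallest-first selection, and the example trace are assumptions made for illustration; the text does not specify how competing peaks were resolved.

```python
def find_methylation_peaks(diffs, min_height=1.1, min_separation=6):
    """Return indices of local maxima in `diffs` (pA) that pass both thresholds.

    `diffs` is the aligned current difference (methylated minus unmethylated
    consensus), one value per current level.
    """
    candidates = []
    for i, d in enumerate(diffs):
        left = diffs[i - 1] if i > 0 else float("-inf")
        right = diffs[i + 1] if i + 1 < len(diffs) else float("-inf")
        # A candidate is a local maximum that reaches the required height.
        if d >= min_height and d >= left and d > right:
            candidates.append(i)

    # Assumed tie-break: keep the tallest peaks first and drop any candidate
    # that lies closer than `min_separation` levels to a peak already kept.
    kept = []
    for i in sorted(candidates, key=lambda k: diffs[k], reverse=True):
        if all(abs(i - j) >= min_separation for j in kept):
            kept.append(i)
    return sorted(kept)


# Invented example trace (pA); not measured data.
example_diffs = [0.1, 0.3, 2.4, 1.0, 0.2, 0.0, -0.1, 0.2,
                 0.1, 0.0, 0.4, 1.8, 0.5, 1.5, 0.1]
example_peaks = find_methylation_peaks(example_diffs)
```

Raising `min_height` in this sketch trades sensitivity for specificity, which matches the behavior reported for the requisite peak height.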
The present analysis shows that MspA-based nanopore sequencing can locate DNA methylation sites with near-unity efficiency. In this work, the single reads of DNA molecules containing mCpGs and hCpGs were compared to measured current references acquired with unmethylated DNA of the same sequence. It was found that mC and hC have distinct current signatures. The detection probabilities are expected to be reasonable estimates for mC and hC detection in genomic DNA because the constructs studied simulate a heterogeneous sequence. Although mC is distinct from both C and hC, confident detection of mC and hC in some contexts may require repeated reads. The nanopore strand sequencing method used in this work produces a second read of the same DNA molecule because of the bi-directional movement of the template strand through the nanopore (Manrao et al., 2012). Using this second read can improve calling accuracies. In contrast to other mC and hC detection techniques that rely on mC-specific chemical reactions and/or enzymatic kinetics, the present system detects the methylation directly. Unlike single-molecule detection via SMRT (Flusberg, B.A., et al., "Direct Detection of DNA Methylation During Single-Molecule, Real-Time Sequencing," Nature Methods 7(6):461-465, 2010; Lluch-Senar, M., et al., "Comprehensive Methylome Characterization of Mycoplasma genitalium and Mycoplasma pneumoniae at Single-Base Resolution," PLoS Genetics 9(1):e1003191, 2013), the methylation signal in MspA-based nanopore sequencing is carried in the primary signal: the ion current. Polymerase kinetics (Flusberg et al., 2010) may be used as an additional indicator of modified bases in the present method.
As was seen previously (Manrao et al., 2012; Cherf et al., 2012), the phi29 DNAP exhibited toggling/backstepping behavior that is thought to be related to the polymerase's proofreading function. This behavior and the stochastic level durations complicate level extraction. Optimized DNA translocation control will enhance the industrial application of the present method.
Because MspA demonstrated well-resolved signals for nucleotides that are differentiated by only a single methyl group, it is expected that other modified bases such as 8-oxo-guanine (Schibel, A.E., et al., "Nanopore Detection of 8-oxo-7,8-dihydro-2'-Deoxyguanosine in Immobilized Single-Stranded DNA Via Adduct Formation to the DNA Damage Site," Journal of the American Chemical Society 132(51):17992-17995, 2010), thymine dimers, 5-carboxylcytosine, and 5-formylcytosine (Ito S., et al., "Tet Proteins Can Convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine," Science 333(6047):1300-1303, 2011; Nabel, C.S., Manning, S.A., Kohli, R.M., "The Curious Chemical Biology of Cytosine: Deamination, Methylation, and Oxidation as Modulators of Genomic Potential," ACS Chemical Biology 7(1):20-30, 2012) will have equally well-resolved current signatures. MspA has already proved to be extremely sensitive to abasic residues, one of the most common DNA lesions (Manrao et al., 2012).
The present methylation detection method does not require de novo sequencing with the nanopore to detect methylation. Given a previously measured reference current sequence for unmethylated DNA and known context-dependent methylation patterns as in FIGURES 7A and 7B, one can then take a single read of a methylated DNA molecule and detect methylation with confidence for most sequence contexts. Because PCR does not copy certain epigenetic modifications such as methylation, nanopore reads of amplified copies would serve as the unmethylated reference. Genomic DNA would then be extracted, given adapters to enable polymerase control, and then be presented to the pore. Individual reads of methylated DNA could then be aligned to the current level reference using a Smith-Waterman alignment algorithm (Manrao et al., 2012). Once aligned, current level comparisons could be made and methylations detected. The unmethylated current reference would only need to be made once and could be reused as a reference for detection in other genomic samples. Because of the low copy number of DNA obtainable from mammalian samples, efficient transport of the DNA to the nanopore remains a technical challenge before clinical application of the technique. Industrial implementation can further include miniaturization and parallelization of the experimental setup. All of the intrinsic advantages of nanopore sequencing, such as long read lengths, speed, and minimal sample preparation, are transferable to MspA-based mapping of epigenetic modifications. The conceptual and practical simplicity, as well as the high sensitivity and robust data interpretation, favor conversion of this concept into an industrial platform. It is anticipated that fast and confident methylation detection will accelerate research and ultimately improve health care.
EXAMPLES
Materials and Methods
Nanopore system setup:
The general experimental setup involving MspA and phi29 has been previously described (Manrao et al., 2012). Briefly, a phi29 DNAP was used as a molecular motor to control the motion of DNA through a single MspA pore established in an unsupported phospholipid bilayer. The buffer was 300 mM KCl with 10 mM HEPES, buffered at pH 8.00 ± 0.05. Currents were recorded on an Axopatch 200B amplifier with custom LabVIEW software (National Instruments) at a voltage bias of 180 mV.
Before each experiment, the DNA template, primer, and blocking oligomer were mixed together in a 1:1:1.2 ratio to a final concentration of 50 μM. DNA was then annealed by heating to 95 °C for 5 min, cooling to 60 °C for 2 min, and then cooling to 4 °C. Experimental concentrations were ~500 nM for DNA, ~500 nM for phi29 DNAP, ~500 μM for dNTPs, ~10 mM for MgCl2, and ~1 mM for DTT.
During strand sequencing (Manrao et al., 2012; Cherf et al., 2012), the DNA is passed through the pore twice, once in the 5'-to-3' direction (unzipping mode) and once in the 3'-to-5' direction (synthesis mode). For the present disclosure, data from the synthesis mode of phi29 DNAP motion were used (hence, the listing of the sequences from 3' to 5' in FIGURES 2-4). All strands included the sequence 5'-PAAAAAAACCTTCCX-3' (set forth herein as SEQ ID NO:42) at the 5' end of the strand (where P is a phosphorylated 5' end and X is an abasic residue). This sequence creates a reproducible current motif that signals the end of the read. This region was used to calibrate currents and, thus, to control for small changes in buffer conductivity due to evaporation or temperature variation. The sequence of interest followed this calibration sequence. The DNA was designed to contain a variety of nucleotides adjacent to the CpGs. Each strand had at least three CpGs embedded in a random sequence, sufficiently spaced so that their current signatures did not overlap. In each strand, three of these CpGs were uniformly either unmethylated, methylated, or hydroxymethylated. Additionally, eight different DNA sequences (PAN Laboratories, Stanford University, Stanford, CA) containing various methylation patterns were examined (see TABLE 2 for sequences used). Some experiments were performed with a mixture of methylated, hydroxymethylated, and unmethylated DNA. Without calibration, these strands could still be sorted by methylation-specific currents.
Data analysis:
Briefly, blockade events were determined using a thresholding method on current data. A feed-forward neural network removed events that did not correspond with phi29 polymerase activity. Once appropriate events were determined, raw current levels were discerned using a custom-written graphical user interface. Current level transition boundaries were selected, and the median current levels were extracted in the time order that they occurred for each event. The phi29 DNAP occasionally exhibited backstepping, causing repeated levels that were removed. Consensus current level sequences were found for each sequence type, and event levels associated with that sequence were automatically aligned using a Needleman-Wunsch algorithm. For experiments with DNA mixtures, a quality score from the Needleman-Wunsch algorithm was used to distinguish DNA with different types of methylation. Once aligned, levels from methylated and unmethylated DNA were examined with a Bayesian probability measure to classify mCpG, hCpG, and CpG sites. The algorithm used current level differences for three consecutive levels, centered on the level associated with XYCpG. Expanded details for data analysis are provided below.
Event selection:
Events were found using a threshold detection algorithm, described in Butler et al. (2008). Events with durations less than 1 second or with average currents greater than 75 pA or less than 15 pA were rejected. To remove events not associated with the polymerase activity, a feed-forward neural network consisting of 5 layers, each layer containing 20 neurons, was employed. The features used in the neural network were the event duration, the event average current and variance, and the outputs of a K-means clustering algorithm. The neural network was trained with ~200 events. The neural network removed 100% of the events that were not associated with polymerase activity.
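The duration and average-current rejection criteria stated above amount to a simple pre-filter applied before the neural network stage. The sketch below shows only that pre-filter; the `Event` container and its field names are hypothetical, and the neural network itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical summary of one blockade event."""
    duration_s: float        # event duration in seconds
    mean_current_pA: float   # average ion current in picoamperes


def passes_prefilter(event, min_duration_s=1.0,
                     min_current_pA=15.0, max_current_pA=75.0):
    """Keep events lasting at least 1 s with a mean current of 15 to 75 pA."""
    return (event.duration_s >= min_duration_s
            and min_current_pA <= event.mean_current_pA <= max_current_pA)


def prefilter(events):
    """Return only the events that survive the duration and current cuts."""
    return [e for e in events if passes_prefilter(e)]
```

Events that survive this step would then be passed to the polymerase-activity classifier described in the text.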
DNA type classification:
After extracting current level sequences, we constructed consensus current level sequences specific to unmethylated, methylated, and hydroxymethylated DNA constructs. These consensus current level sequences were constructed using events from experiments containing a single sequence and methylation pattern of DNA, without a mixture of other methylation patterns. Alignment to form consensus level sequences was performed using a Needleman-Wunsch algorithm with an affine gap, as used in Manrao et al., 2012. An event classification algorithm was developed to sort events from experiments that had mixtures of unmethylated, methylated, and hydroxymethylated DNA. The algorithm aligned each event to the consensus levels extracted above and produced a similarity score used to classify each event. The classification method was tested on 297 events from experiments containing a single sequence and methylation pattern of DNA, yielding a classification accuracy of 99.7%. This score-based whole-event classification algorithm was used to separate methylated, hydroxymethylated, and unmethylated events in experiments that were run with mixtures of DNA with various methylation patterns. The consensus levels for unmethylated, methylated, or hydroxymethylated DNA were updated as these events were classified.
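The score-based whole-event classification can be illustrated with a simplified global alignment. The sketch below is not the implementation used in the work: it uses a linear gap penalty rather than the affine gap named in the text, and the similarity function, the `tolerance_pA` parameter, and the example level values are all assumptions.

```python
def nw_score(levels, consensus, gap=-1.0, tolerance_pA=2.0):
    """Needleman-Wunsch (global) alignment score of two current-level sequences.

    Simplification: a linear gap penalty is used here, not an affine one.
    """
    n, m = len(levels), len(consensus)
    # prev[j]: best score aligning the levels seen so far to consensus[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            # Assumed similarity: 1 for identical levels, falling linearly
            # with the absolute difference in pA.
            sim = 1.0 - abs(levels[i - 1] - consensus[j - 1]) / tolerance_pA
            curr[j] = max(prev[j - 1] + sim,   # align the two levels
                          prev[j] + gap,       # extra level in the event
                          curr[j - 1] + gap)   # level missing from the event
        prev = curr
    return prev[m]


def classify_event(levels, consensus_by_type):
    """Assign the event to the consensus with the highest alignment score."""
    scores = {name: nw_score(levels, ref)
              for name, ref in consensus_by_type.items()}
    return max(scores, key=scores.get), scores


# Invented consensus levels (pA) for illustration only.
consensus_by_type = {
    "C": [40.0, 35.0, 50.0, 45.0],
    "mC": [40.0, 37.5, 52.0, 45.0],
    "hC": [40.0, 33.5, 49.0, 45.0],
}
label, scores = classify_event([40.1, 37.3, 51.8, 45.2], consensus_by_type)
```

In a fuller treatment the gap model would distinguish gap opening from gap extension, as an affine scheme does, which better tolerates runs of skipped or repeated levels.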
Current difference plot construction:
With events classified and current levels aligned, we constructed the level differences, shown in FIGURES 5A-5D and FIGURES 6A-6D. From the known CpG location, the level differences surrounding the CpG were extracted, as shown in FIGURE 7. These level differences are used in the Bayesian methylation classifier described below. Level differences near the CpG were used to distinguish mCpGs, hCpGs, and unmethylated CpGs. We estimated the ability of a single level near the CpG to accurately call the type of CpG methylation with a t-test that compared the three hypotheses: methylated, hydroxymethylated, or unmethylated (FIGURES 10A-10C). FIGURES 10A, 10B, and 10C also show the average of the t-test for different contexts, yielding the classification power of a given position for all contexts XYCpG. The plots of classification power resemble the current difference plots shown in the right-most bottom plot in FIGURES 7A and 7B, but include level variance information and are independent of sign. The magnitude of the classification power indicates how distinguishable mCpG is from CpG, hCpG is from CpG, and all three are from each other in FIGURES 10A, 10B, and 10C, respectively. It is observed that level positions -1, 0 and 1, corresponding to the two levels on the 5' side of the modified C site and one level to the 3' side of the modified C, have the highest discrimination power.
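A rough sketch of the two quantities discussed above, the per-level current difference and a sign-independent, variance-aware measure of separation, is given below. The text does not state which t-test variant was used; Welch's unequal-variance statistic is an assumption here, as are the function names.

```python
import math
import statistics


def level_differences(event_levels, reference_levels):
    """Per-level current difference (pA) between aligned event and reference."""
    return [e - r for e, r in zip(event_levels, reference_levels)]


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(sample_a)
                                         + var_b / len(sample_b))


def classification_power(sample_a, sample_b):
    """Sign-independent separation of two hypotheses at one level position."""
    return abs(welch_t(sample_a, sample_b))
```

Computed at each level position around the CpG, such a statistic would be large where the two hypotheses differ by much more than their level-to-level scatter.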
Methylation calling:
The Bayesian probability classifier using each of the levels -1, 0, or 1 alone can call the methylation status of the CpG with 80-90% accuracy (FIGURE 10D). As anticipated, classification accuracy is directly correlated with the classification power shown in FIGURE 10C, having a Pearson correlation coefficient of 0.963 (p = 1.6e-4).
Using the Bayesian classifier with more than one current level difference improves the classification accuracy. Using the two level differences at positions -1 and 0, sensitivities of 94.6 ± 1.1%, 90.7 ± 1.2%, and 91.6 ± 1.2% are obtained for mC, hC, and C nucleotides, respectively. The values and errors are the average and standard error of the calling frequency when bootstrap resampling 1/5 of the observed events for each construct five times. See TABLE 3, below. Using three levels at positions -1, 0, and 1, average sensitivities of 97.7 ± 0.8%, 95.0 ± 1.3%, and 95.6 ± 1.3% to mC, hC, and C, respectively, are obtained. Including all levels did not increase the sensitivity above 98.4% for any hypothesis. See TABLE 3 below for the calling and miscalling accuracies of each sequence context. Because a single mCpG or hCpG influences surrounding levels, unmethylated CpGs that were within 2 nt of a mCpG or hCpG would occasionally be called incorrectly. These miscalls were identified and the calling frequencies were corrected, as shown in TABLE 3. The methylation calling accuracies presented above use the levels at positions -1, 0, and 1.
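The resampling procedure described above (1/5 of the observed events, repeated five times, reporting the average and standard error) might be sketched as follows. Sampling with replacement and the fixed random seed are assumptions; the text does not give these details.

```python
import math
import random
import statistics


def bootstrap_calling_frequency(correct_calls, fraction=0.2, rounds=5, seed=0):
    """Mean and standard error of the calling frequency over resampled subsets.

    `correct_calls` is a list of booleans, one per observed CpG passage,
    True where the call matched the known modification state.
    """
    rng = random.Random(seed)
    size = max(1, round(len(correct_calls) * fraction))
    frequencies = []
    for _ in range(rounds):
        # Assumed: draw the subset with replacement, as in a standard bootstrap.
        sample = rng.choices(correct_calls, k=size)
        frequencies.append(sum(sample) / size)
    mean = statistics.mean(frequencies)
    standard_error = statistics.stdev(frequencies) / math.sqrt(rounds)
    return mean, standard_error
```

With only five rounds the standard error is itself a coarse estimate, so small differences between contexts should be read with caution.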
TABLE 3A: Three-level context-dependent calling frequencies
Figure imgf000044_0001
TABLE 3B: Two-level context-dependent calling frequencies
Figure imgf000044_0002
Figure imgf000045_0001
TABLE 3: This table encompasses two similar tables, A and B. Within each table, the calling frequencies and detected count of XYCpGs are presented. Each row gives the calling frequency for a mC, hC, or C called as a mC, hC, or C, as indicated for each row. The highlighted rows indicate CpG sites called as non-methylated CpG due to positive mCpG or hCpG detection within 2 levels. Each column indicates the context within which its calling frequencies were obtained. Numbers in parentheses are the count of observed CpG levels within the given context and type of CpG. For example, the fourth row in the first column, with the entry 81.5 (75), states that in 81.5% of 75 observed events, an AAhCG was accurately called as an AAhCG. The final column provides the average and the standard error of the calling frequency obtained by bootstrap resampling 1/5 of the observed events for each construct 5 times. TABLE 3A gives the classification results using three levels (-1, 0, and 1) and TABLE 3B gives the classification results using two levels (-1 and 0).
Bayesian probability classifier:
To classify observed current differences as belonging to CpG, mCpG, and hCpG, a Bayesian probability classifier, as described below, was used. This started with $P(XYZ \mid \{\Delta_i\})$, which is the probability of the sequence hypothesis XYZ given the set of current differences $\{\Delta_i\}$. The letters X and Y are each any standard nucleotide and Z is a C, mC, or hC. Next, $P(\Delta_i \mid XYZ)$ was defined, which is the probability of observing current difference $\Delta_i$ knowing the sequence XYZ at location i. This was modeled as
$$P(\Delta_i \mid XYZ) = \frac{1}{\sqrt{2\pi\sigma^2_{XYZ,i}}} \exp\left(-\frac{(\Delta_i - \Delta_{XYZ,i})^2}{2\sigma^2_{XYZ,i}}\right), \quad \text{(equation 1)}$$
where $\Delta_{XYZ,i}$ and $\sigma^2_{XYZ,i}$ are the mean and variance of the level difference at position i for context XYZ (as shown in FIGURES 7A and 7B).
To define the total probability of classifying the levels $\{\Delta_i\}$, we used Bayes' theorem:
$$P(XYZ \mid \{\Delta_i\}) = \frac{P(\{\Delta_i\} \mid XYZ)\,P(XYZ)}{P(\{\Delta_i\})}. \quad \text{(equation 2)}$$
Taking each measurement i as independent gives
$$P(XYZ \mid \{\Delta_i\}) = \frac{\prod_i P(\Delta_i \mid XYZ)\,P(XYZ)}{P(\{\Delta_i\})}. \quad \text{(equation 3)}$$
The hypothesis with the highest probability in equation 3 was used to classify the set of current differences as belonging to the sequence hypothesis XYZ. The prior probability $P(XYZ)$ can be taken to be the expected probability of CpG methylation or hydroxymethylation for a given sample, or to be 1/3, as was chosen for the present samples. Because XYZ were compared over all hypotheses, the factor $P(\{\Delta_i\})$ does not matter, and only the product of probabilities from equation 1 was used.
Given the known sequence and location of the CpG site with known context XY, Equation 3 was used and considered only C, mC, or hC. Classification of the CpG was given by the highest value of $P(XYZ \mid \{\Delta_i\})$. Classification frequency was calculated as the number of classifications divided by the number of expected classifications for the given XYZ context.
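The known-site classification just described can be sketched as a Gaussian likelihood product over the level positions, combined with a prior and normalized over the three hypotheses. The code works in log space purely for numerical convenience, which is an implementation choice not stated in the text. The function names and the template means and variances are invented placeholders, not values from FIGURES 7A and 7B.

```python
import math


def log_gaussian(delta, mean, variance):
    """Log of the Gaussian density used for a single level difference."""
    return (-0.5 * math.log(2.0 * math.pi * variance)
            - (delta - mean) ** 2 / (2.0 * variance))


def classify_cpg(deltas, templates, priors=None):
    """Return the most probable hypothesis and the posterior for each one.

    deltas    : observed current differences (pA), e.g. at positions -1, 0, 1
    templates : {hypothesis: [(mean, variance), ...]} for the same positions
    priors    : {hypothesis: prior}; uniform (1/3 for three hypotheses) if omitted
    """
    if priors is None:
        priors = {h: 1.0 / len(templates) for h in templates}
    log_post = {}
    for hyp, params in templates.items():
        total = math.log(priors[hyp])
        # Independent measurements: the likelihoods multiply (logs add).
        for delta, (mean, variance) in zip(deltas, params):
            total += log_gaussian(delta, mean, variance)
        log_post[hyp] = total
    # Normalize over hypotheses; this plays the role of the shared denominator.
    top = max(log_post.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    posterior = {h: math.exp(v - log_norm) for h, v in log_post.items()}
    return max(posterior, key=posterior.get), posterior


# Placeholder templates for one hypothetical XY context (pA, pA^2).
templates = {
    "C": [(0.0, 0.25), (0.0, 0.25), (0.0, 0.25)],
    "mC": [(1.0, 0.25), (2.5, 0.25), (1.5, 0.25)],
    "hC": [(-0.5, 0.25), (-1.5, 0.25), (-1.0, 0.25)],
}
call, posterior = classify_cpg([0.9, 2.4, 1.6], templates)
```

Because the normalization is shared by all hypotheses, the winning call depends only on the prior-weighted likelihood product, as noted in the text.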
Given an unknown sequence but observed level differences, Equation 3 was extended to consider all dinucleotide hypotheses and all cytosine variants C, mC, or hC. The known level differences were compared to all sets of level differences within a given event, and peaks in the probabilities for given XYZ hypotheses were observed. Classification was given by the highest probability along the progression of level differences.
Sequence independent methylation detection:
A peak detection algorithm was used to identify methylation sites without using sequence-specific knowledge (i.e., events were not compared using known current difference patterns for various sequence contexts). Such detection can identify methylation sites independent of the sequence of the examined DNA with reasonable accuracy. The peak detection required level differences that reach a height of at least 1.1 pA and that have a separation of at least 6 levels from any adjacent peaks. With these parameters, ~93% of methylation sites were identified within 2 levels of the known methylation position (true positives), and >99% of unmethylated regions were correctly identified (true negatives). Increasing the requisite peak height improved true-negative detection at the cost of true-positive detection.
While illustrative embodiments have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.

Claims

CLAIMS The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. A method of detecting a nucleotide modification in a nucleic acid polymer, comprising:
applying an electrical field to a nanopore system comprising a first conductive liquid medium in liquid communication with a second conductive liquid medium through a nanopore;
translocating the nucleic acid polymer through a nanopore from the first conductive liquid medium to the second conductive liquid medium;
detecting an ion current to provide a current pattern associated with a portion of the nucleic acid polymer; and
comparing the current pattern to a reference current pattern associated with the same nucleotide sequence as the portion of the nucleic acid polymer without any modifications, wherein a difference between the current pattern and the reference current pattern indicates the presence of a modified nucleotide in the nucleic acid polymer.
2. The method of Claim 1, wherein the nucleic acid polymer is DNA, RNA, mRNA, PNA, or a combination thereof.
3. The method of Claim 2, wherein the DNA is single stranded DNA.
4. The method of Claim 1, further comprising identifying the type of nucleotide modification present in the polymer based on a character of the difference between the current pattern and the reference current pattern.
5. The method of Claim 4, wherein the character of the difference comprises the degree of current increase or decrease and/or the duration of the difference.
6. The method of Claim 1, wherein the nucleotide modification is an epigenetic modification or a modification resulting from DNA damage.
7. The method of Claim 1, wherein the nucleotide modification is a 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxylcytosine, b-glucosyl-5-hydroxy-methylcytosine, 8-oxoguanine, 2-amino-adenosine, 2-amino-deoxyadenosine, 2-thiothymidine, pyrrolo-pyrimidine, 2-thiocytidine, a thymine dimer, or an abasic lesion.
8. The method of Claim 1, wherein the portion of the nucleic acid polymer comprises one or a plurality of contiguous nucleotides of the nucleic acid polymer.
9. The method of Claim 8, wherein the portion of the nucleic acid polymer comprises the nucleotide or nucleotide position with the modification.
10. The method of Claim 9, wherein the portion of the nucleic acid polymer further comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more additional nucleotides adjacent to the nucleotide or nucleotide position with the modification on one or both sides.
11. The method of Claim 10, wherein at least one additional nucleotide is adjacent at the 5' side of the nucleotide with the modification.
12. The method of Claim 10, wherein at least one additional nucleotide is adjacent at the 3' side of the nucleotide with the modification.
13. The method of Claim 10, wherein the portion of the nucleic acid polymer further comprises at least two additional nucleotides adjacent at the 5' side of the nucleotide with the modification and at least one nucleotide adjacent at the 3' side of the nucleotide with the modification.
14. The method of Claim 1, wherein the nanopore is a solid-state nanopore, protein nanopore, a hybrid solid state-protein nanopore, a biologically adapted solid-state nanopore, or a DNA origami nanopore.
15. The method of Claim 14, wherein the protein nanopore is a β-barrel pore.
16. The method of Claim 15, wherein the protein nanopore is alpha-hemolysin or Mycobacterium smegmatis porin A (MspA), or a homolog thereof.
17. The method of Claim 14, wherein the protein nanopore sequence is modified from the wild-type sequence to contain at least one amino acid substitution, deletion, or addition.
18. The method of Claim 17, wherein the at least one amino acid substitution, deletion, or addition results in a net charge change in the nanopore.
19. The method of Claim 1, wherein the electric field is sufficient to cause the electrophoretic translocation of the nucleic acid polymer through the nanopore.
20. The method of Claim 1, wherein the electric field is between about 40 mV to 1 V.
21. The method of Claim 1, wherein the nanopore is associated with a molecular motor, wherein the molecular motor is capable of moving a nucleic acid polymer into or through the nanopore with an average translocation velocity that is less than the average translocation velocity at which the analyte translocates into or through the nanopore in the absence of the molecular motor.
22. The method of Claim 21, wherein the molecular motor is a polymerase, an exonuclease, a helicase, a topoisomerase, or a translocase.
23. The method of Claim 22, wherein the molecular motor is phi29.
24. A method of detecting a nucleotide modification in a nucleic acid polymer, comprising:
amplifying a target nucleic acid polymer that potentially contains at least one nucleotide modification to produce a reference nucleic acid polymer that does not contain a nucleotide modification;
applying the target and reference nucleic acid polymers to a nanopore system comprising a first conductive liquid medium in liquid communication with a second conductive liquid medium through a nanopore;
causing the translocation of the target nucleic acid polymer through the nanopore from the first conductive liquid medium to the second conductive liquid medium;
detecting an ion current to provide a target current pattern associated with a portion of the target nucleic acid polymer;
causing the translocation of the reference nucleic acid polymer through the nanopore from the first conductive liquid medium to the second conductive liquid medium; and detecting an ion current to provide a reference current pattern associated with a portion of the reference nucleic acid polymer, wherein the portion of the reference nucleic acid polymer comprises the same nucleotide sequence as the portion of the target nucleic acid polymer;
comparing the target current pattern to the reference current pattern, wherein a difference between the current pattern and the reference current pattern indicates the presence of a modified nucleotide in the target nucleic acid polymer.
25. The method of Claim 24, wherein the nucleic acid polymer is DNA, RNA, mRNA, PNA, or a combination thereof.
26. The method of Claim 25, wherein the DNA is single stranded DNA.
27. The method of Claim 25, wherein the DNA is genomic DNA.
28. The method of Claim 24, wherein the nucleotide modification is an epigenetic modification.
29. The method of Claim 28, wherein the epigenetic modification is a 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxylcytosine, b-glucosyl-5-hydroxy-methylcytosine, 8-oxoguanine, 2-amino-adenosine, 2-amino-deoxyadenosine, 2-thiothymidine, pyrrolo-pyrimidine, or 2-thiocytidine.
30. The method of Claim 24, wherein the reference nucleic acid polymer is produced from the target nucleic acid polymer using at least one round of the polymerase chain reaction (PCR).
31. The method of Claim 24, wherein the portion of the target nucleic acid polymer comprises one or a plurality of contiguous nucleotides of the target nucleic acid polymer.
32. The method of Claim 31, wherein the portion of the target nucleic acid polymer comprises the nucleotide or nucleotide position with the modification.
33. The method of Claim 32, wherein the portion of the target nucleic acid polymer further comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more additional nucleotides adjacent to the nucleotide or nucleotide position with the modification on one or both sides.
34. The method of Claim 24, further comprising determining the position of the modified nucleotide in the target polymer.
35. The method of Claim 24, further comprising identifying the modified nucleotide in the target polymer.
36. The method of Claim 35, further comprising determining the sequence of at least a portion of the target nucleic acid polymer comprising the modified nucleotide.
37. The method of Claim 35, wherein the modified nucleotide is identified without knowledge of the nucleotide identity in the unmodified reference sequence.
PCT/US2013/068162 2012-11-01 2013-11-01 Methods for detecting and mapping modifications to nucleic acid polymers using nanopore systems Ceased WO2014071250A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201261721430P 2012-11-01 2012-11-01
US61/721,430 2012-11-01
US201361841824P 2013-07-01 2013-07-01
US61/841,824 2013-07-01

Publications (1)

Publication Number Publication Date
WO2014071250A1 true WO2014071250A1 (en) 2014-05-08

Family

ID=50628120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/068162 Ceased WO2014071250A1 (en) 2012-11-01 2013-11-01 Methods for detecting and mapping modifications to nucleic acid polymers using nanopore systems

Country Status (1)

Country Link
WO (1) WO2014071250A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010034018A2 (en) * 2008-09-22 2010-03-25 University Of Washington Msp nanopores and related methods
US20100320094A1 (en) * 2006-05-05 2010-12-23 University Of Utah Research Foundation Nanopore Platforms for Ion Channel Recordings and Single Molecule Detection and Analysis
WO2011106456A2 (en) * 2010-02-23 2011-09-01 University Of Washington Artificial mycolic acid membranes

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VENKATESAN ET AL.: "Nanopore sensors for nucleic acid analysis", NATURE NANOTECHNOLOGY, vol. 6, no. 10, 18 September 2011 (2011-09-18), pages 615 - 624 *
WALLACE ET AL.: "Identification of epigenetic DNA modifications with a protein nanopore", CHEMICAL COMMUNICATIONS, vol. 46, no. 43, 21 November 2010 (2010-11-21), pages 8195 - 8197 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10822652B2 (en) * 2013-08-30 2020-11-03 University Of Washington Through Its Center For Commercialization Selective modification of polymer subunits to improve nanopore-based analysis
EP3137490B1 (en) 2014-05-02 2021-01-27 Oxford Nanopore Technologies Limited Mutant pores
WO2015175789A1 (en) * 2014-05-14 2015-11-19 Mcruer Robert N Translocation control for sensing by a nanopore
US10457979B2 (en) 2014-05-14 2019-10-29 Stratos Genomics, Inc. Translocation control for sensing by a nanopore
US10676782B2 (en) 2014-05-14 2020-06-09 Stratos Genomics, Inc. Translocation control for sensing by a nanopore
WO2015197355A1 (en) * 2014-06-25 2015-12-30 Joseph Prosser Sequencer
US10948455B2 (en) 2014-06-25 2021-03-16 Huldagate Technologies Limited Sequencer
GB2543217A (en) * 2014-06-25 2017-04-12 Huldagate Tech Ltd Sequencer
WO2016141221A1 (en) * 2015-03-03 2016-09-09 Stratos Genomics, Inc. Polynucleotide binding protein sequencing cross reference to related applications
US20180044725A1 (en) * 2015-03-03 2018-02-15 Stratos Genomics, Inc. Polynucleotide binding protein sequencing
WO2016164363A1 (en) * 2015-04-06 2016-10-13 The Regents Of The University Of California Methods for determing base locations in a polynucleotide
US10760117B2 (en) 2015-04-06 2020-09-01 The Regents Of The University Of California Methods for determining base locations in a polynucleotide
WO2017027518A1 (en) * 2015-08-10 2017-02-16 Stratos Genomics, Inc. Single molecule nucleic acid sequencing with molecular sensor complexes
WO2020043082A1 (en) * 2018-08-28 2020-03-05 Nanjing University Protein nanopore for identifying an analyte
WO2020168286A1 (en) * 2019-02-14 2020-08-20 University Of Washington Systems and methods for improved nanopore-based analysis of nucleic acids
US20220366313A1 (en) * 2019-02-14 2022-11-17 University Of Washington Systems and methods for improved nanopore-based analysis of nucleic acids
US12321837B2 (en) * 2019-02-14 2025-06-03 University Of Washington Systems and methods for improved nanopore-based analysis of nucleic acids
US20200333290A1 (en) * 2019-04-16 2020-10-22 The Board Of Trustees Of The University Of Illinois Classification of epigenetic biomarkers and/or dna conformational superstructures via use of atomically thin nanopores
US11860123B2 (en) * 2019-04-16 2024-01-02 The Board Of Trustees Of The University Of Illinois Classification of epigenetic biomarkers and/or DNA conformational superstructures via use of atomically thin nanopores
CN113470751A (en) * 2021-06-30 2021-10-01 南方科技大学 Protein nanopore amino acid sequence screening method, protein nanopore and application of protein nanopore
WO2023030146A1 (en) * 2021-08-30 2023-03-09 四川大学 Polypeptide composition analysis method based on copper ion modified mspa nanopores
CN115877018A (en) * 2022-08-05 2023-03-31 四川大学华西医院 Application of porin in preparation of kit for detecting dehydroepiandrosterone sulfate
WO2024156993A1 (en) * 2023-01-25 2024-08-02 Oxford Nanopore Technologies Plc Calibration of a nanopore array device
WO2025155921A1 (en) * 2024-01-17 2025-07-24 Government Of The United States Of America, As Represented By The Secretary Of Commerce Measuring radiation-induced damage to a biopolymer

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 13850279; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 13850279; Country of ref document: EP; Kind code of ref document: A1)